Run-time Software Upgrading Framework for Mission Critical Network Applications

Seung-Woo Hong, Seong Moon and Ho-Yong Ryu

ETRI (Electronics and Telecommunication Research Institute), 218 Gajeongno, Daejeon, Korea

Keywords: Live Software Upgrade, Run-time Upgrade, Mission-critical.

Abstract: In mission-critical and safety-critical software applications such as internet infrastructure, telecommunication, military and medical applications, service continuity is very important. Because it is unacceptable to shut down and restart such systems during a software upgrade, run-time software upgrade techniques, which allow online maintenance and upgrades without shutting down the system, can meet the demand for high system availability and service continuity. However, upgrading an application while it is running is a complex process. The new and old components may differ in functionality, interface, and performance. Only selected components of an application are changed while the other parts continue to function. It is important to safeguard the application’s integrity when changes are made at run time. Researchers have employed different tactics to solve the problem of run-time software upgrade, such as compiler-based, hardware-based, and analytic-redundancy-based methods. To ensure a reliable run-time upgrade, we designed and implemented a framework-based run-time software upgrading method in which the ability to make run-time modifications is considered at the software architecture level. In this paper, we present the software component architecture for run-time upgrade and the software upgrade procedure, and then show the implementation results.

1 INTRODUCTION

In information technology, high availability refers to a system or component that is continuously operational for a desirably long length of time. High availability for continuous service is important in safety-critical software applications such as internet infrastructure, aerospace, telecommunication, military and medical applications, since any moment of software failure can cause monetary loss, interruption of service and unpredictable damage. However, software change is unavoidable: requirements change, bugs are found, and opportunities for optimisation or functional enhancement are discovered. Upgrading the software for these reasons conventionally involves halting the existing software and restarting with the new version, and this approach results in software outage and service interruption (Jeff, 1996; Tewksbury, 2001).

The purpose of run-time software upgrade techniques is to dynamically upgrade the behaviour of a running software system without software outage or service interruption. Researchers have employed different tactics to solve the problem of run-time software upgrade, such as component based (Jeff, 1996; Peyman, 1998), process based (Deepak, 1993), analytic redundancy based (Jonathan, 1999; Mike, 1996), distributed object based (Tewksbury, 2001; Louise, 2000), dynamic module based (Michael, 1997; Wilson, 1991; Donn, 1990; Drossopoulou, 2002; Yu, 2002), and compiler patch based (Chen, 2007; Fahmi, 2008; Neamtiu, 2006; Makris, 2009; Chen, 2016). In practice, however, the most general and widely used way to achieve run-time software upgrade is replication based on redundant hardware (Deepak, 1993). The basic idea is that two machines, ‘A’ and ‘B’, are available. Both machines run the same application software; the software on machine ‘A’ runs in the active role and provides the actual application service, while ‘B’ runs in the standby role. The state of ‘A’ and ‘B’ is kept synchronized by a synchronization mechanism such as a check-point service. When the software needs to be upgraded, standby ‘B’ is brought up running the new software while active ‘A’ still provides the application service. After ‘B’ has been upgraded, a role switch-over takes place and ‘B’ runs in the active role, providing the application service with the new software. The drawbacks of the redundant-hardware approach are that it is expensive and that it still leaves the problem of transferring state information: how to extract the relevant state from the old software, transform it to be compatible with the new software, and inject it into the new software (Yu, 2002).

Hong, S-W., Moon, S. and Ryu, H-Y.
Run-time Software Upgrading Framework for Mission Critical Network Applications.
DOI: 10.5220/0006355007270732
In Proceedings of the 19th International Conference on Enterprise Information Systems (ICEIS 2017) - Volume 2, pages 727-732
ISBN: 978-989-758-248-6

In this article, we propose a run-time software upgrading method that avoids the temporary interruption caused by a software upgrade by allowing the system to be updated on the fly without hardware redundancy. We focus on a software-architecture-based approach, and have designed and implemented a run-time software upgrade framework. The proposed framework provides a dynamic software component architecture, a communication model between dynamic software components, and a run-time module upgrading procedure.

2 RUN-TIME UPGRADE

To support run-time software upgrade, we designed a run-time software upgrade framework. In the framework, a software component is defined as a single software process that performs a specific application role. Inter-process communication between software components is done via message passing over sockets or message queues. This section describes the internal software architecture of a dynamic component and gives details of the software upgrade procedure.

2.1 Software Component Architecture

Figure 1 shows the internal architecture of a dynamic software component that supports run-time software upgrading. The dynamic software component consists of two modules. The first is the main task module, which is the persistent and unchangeable part of a component while the component process is running. The other is the dynamic implementation module, which performs application-specific functions and can be updated dynamically. The characteristics of the modules are as follows:

Main Task Module: This module consists of a single task thread, which serves as the main loop that processes incoming events such as request messages from other components, timer events caused by time expiration, and signal events from the kernel. All events are buffered in an event-queue, and the single thread dispatches the events one by one and processes them sequentially. As mentioned above, this module is the unchangeable part of the component, and it controls the upgrading procedure of the dynamic implementation module (the user module); that is, it swaps the old user module with the new one when the dynamic implementation module is updated. In addition, it manages the user module data to preserve state consistency after an update.

Dynamic Implementation Module: This module contains all the application-specific implementation. It can be updated on the fly while the main task module is still running. The module consists of the task call-back functions and the user module implementation. The task call-back functions are a set of statically defined call-back functions such as process-message, process-timer and process-signal, which connect the task thread to the module implementation. The task thread re-maps its references to the task call-back functions whenever the dynamic module is changed, so that the connection between the main task module and the dynamic implementation module is retained. To support run-time evolution, we package the module as a shared library so that it can be loaded and unloaded dynamically at run time.

Figure 1: Architecture for dynamic software component.

Figure 2 depicts an example of message communication between dynamic components A and B, and shows how a component processes an incoming request message, as follows.


1. Component ‘A’ sends a request message via some inter-process communication method. In this example, we use a socket with a specific port number.
2. The socket bound to the port number is triggered in the kernel, which notifies the receiving component.
3. The task thread puts the socket event into the FIFO event-queue.
4. The task thread then dispatches one event from the event-queue and, if the event is of message type, calls the process-message call-back function in the task call-back functions module.
5. The called process-message call-back function calls a user-defined message processing function, which is defined by the module implementation.

[Figure 2 diagram labels: Component A; Dynamic Component B; Main Task Module; Dynamic Implementation Module; Task Thread; Task Callback Functions; Module Implementation; Module Data; steps 1 (send message), 2 (receive message via socket), 3 (input event-queue), 4 (task thread calls a task message call-back function), 5 (call implementation processMessage function).]

Figure 2: Example of message processing.
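The five message-processing steps can be illustrated with a minimal Python sketch. It is not the framework's code: the class names `TaskThread` and `UserModule`, the `post` method and the event payloads are hypothetical, and the socket notification of steps 1 and 2 is replaced by a direct call. It only shows a FIFO event-queue drained by a single dispatcher that looks up a call-back by event type.

```python
from collections import deque


class UserModule:
    """Hypothetical module implementation; it records what it handled."""

    def __init__(self):
        self.handled = []

    def process_message(self, payload):
        self.handled.append(("message", payload))

    def process_timer(self, payload):
        self.handled.append(("timer", payload))

    def process_signal(self, payload):
        self.handled.append(("signal", payload))


class TaskThread:
    """Single dispatcher: buffers events, then processes them one by one."""

    def __init__(self, module):
        self.queue = deque()  # FIFO event-queue
        # Task call-back table: event type -> call-back in the dynamic module.
        self.callbacks = {
            "message": module.process_message,
            "timer": module.process_timer,
            "signal": module.process_signal,
        }

    def post(self, kind, payload):
        # Step 3: the incoming event is put into the event-queue.
        self.queue.append((kind, payload))

    def run_until_idle(self):
        while self.queue:
            kind, payload = self.queue.popleft()  # step 4: dispatch one event
            self.callbacks[kind](payload)         # step 5: call the user function


module = UserModule()
task = TaskThread(module)
task.post("message", "route-request")
task.post("timer", "update-interval")
task.post("signal", "SIGHUP")
task.run_until_idle()
```

Because events are only ever taken from the head of the queue by one dispatcher, they reach the module in arrival order, which is the sequential processing the text describes.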

Processing procedures for timer and signal events are almost the same as for messages, except that user can […] expired. To maintain data consistency between the old and new user modules, we preserve the user module data inside the main task module, and the task thread provides a reference to the data to the user implementation module when the user module is loaded. Therefore, no synchronization of module data between the old module and the newly upgraded module is needed in our upgrade framework.
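A small Python sketch may make the data-ownership idea concrete. The names `MainTask`, `RoutingV1` and `RoutingV2` and the route values are hypothetical; the point is only that the data object lives in the main task, and each loaded module receives a reference to that same object instead of a copy.

```python
class MainTask:
    """Owns the user module data for the lifetime of the process."""

    def __init__(self):
        self.module_data = {}  # survives every module swap
        self.module = None

    def load(self, module_cls):
        # The new module gets a reference to the same data object, not a copy.
        self.module = module_cls(self.module_data)


class RoutingV1:
    def __init__(self, data):
        self.data = data

    def add_route(self, prefix, next_hop):
        self.data.setdefault("routes", {})[prefix] = next_hop


class RoutingV2:
    """Newer implementation working on the data left by the old one."""

    def __init__(self, data):
        self.data = data

    def lookup(self, prefix):
        return self.data.get("routes", {}).get(prefix)


main = MainTask()
main.load(RoutingV1)
main.module.add_route("10.0.0.0/8", "192.168.1.1")
main.load(RoutingV2)  # swap the implementation; the data is untouched
next_hop = main.module.lookup("10.0.0.0/8")
```

This works only while both module versions agree on the layout of the data, which matches the limitation the paper lists as future work.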

2.2 Software Upgrade Procedure

To deal with module replacement, creation and removal, we define a sequence of operations between the main task module and the dynamic implementation module as follows:

• INIT: the task thread calls this operation on the user module when the dynamic user module is initially loaded, so that the new user module initializes its application-specific functions.
• TERM: the task thread calls this operation on the user module when an upgrade is needed or when the process needs to be terminated. The dynamic user module should stop its work immediately.
• HOLD: the task thread calls this operation on the user module before starting an upgrade; it requests that the user module enter a quiescent state.
• RESTART: the task thread calls this operation on the user module after loading the new user module; it informs the new user module that it should restart its functions with the restored data reference.
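One way to read the four operations is as transitions of a small per-module state machine. The sketch below is an assumed interpretation, not something the paper specifies: the state names and the `ModuleLifecycle` class are hypothetical, and the ordering (TERM before HOLD on the old module, RESTART on a freshly loaded module) follows the upgrade steps described in this section.

```python
from enum import Enum


class State(Enum):
    LOADED = "loaded"
    RUNNING = "running"
    STOPPED = "stopped"
    QUIESCENT = "quiescent"


# Assumed transition table: operation -> (required state, resulting state).
TRANSITIONS = {
    "INIT": (State.LOADED, State.RUNNING),     # first load of a module
    "TERM": (State.RUNNING, State.STOPPED),    # stop work immediately
    "HOLD": (State.STOPPED, State.QUIESCENT),  # safe to unload
    "RESTART": (State.LOADED, State.RUNNING),  # new module after an upgrade
}


class ModuleLifecycle:
    def __init__(self):
        self.state = State.LOADED
        self.log = []

    def call(self, op):
        required, result = TRANSITIONS[op]
        if self.state is not required:
            raise RuntimeError(f"{op} not allowed in state {self.state.value}")
        self.state = result
        self.log.append(op)


old = ModuleLifecycle()
for op in ("INIT", "TERM", "HOLD"):
    old.call(op)

new = ModuleLifecycle()
new.call("RESTART")
```

Encoding the order in a table makes an out-of-sequence call, such as HOLD on a module that is still running, fail loudly instead of unloading code that is in use.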

Figure 3 shows the procedure for unloading the old dynamic implementation module and loading the new one. First, the new dynamic module needs to be compiled as a shared library and placed in the software repository. The user can then trigger the upgrade procedure through the command line interface, which sends an upgrade request message to the target software component. The task thread of the target component carries out the upgrade procedure as follows:

1. The task thread of the target component checks the incoming message and, if the message is an upgrade request, starts the dynamic upgrade procedure.
2. The task thread calls the TERM operation to stop the old module, and subsequently calls the HOLD operation to put the old module into a quiescent state.
3. The task thread unloads the old module from memory. The function of the target software component is temporarily stopped during the upgrade, but no incoming events are lost, since they are queued in the event-queue.
4. The task thread then loads the new module package into memory. The remapping procedure between the main task and the new user module is described in Figure 4.
5. After loading the new module, the task thread starts the remapping procedure, because the addresses of the task call-back functions in the new module might differ from those in the old one. The task thread relinks all the statically defined call-back functions of the new user module.
6. The task thread restores the reference to the user data by passing the reference through the remapped task call-back functions.
7. The task thread calls the RESTART operation to allow the new module to start its application functions, and then starts to dispatch the queued events from the event-queue.

Figure 3: Unload old module and load new module.

Figure 4: Re-mapping between main task and user module.
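The upgrade steps can be walked through in a short Python sketch. It simplifies under stated assumptions: modules are ordinary classes rather than shared libraries, the upgrade is split into hypothetical `begin_upgrade` and `finish_upgrade` calls so that an event can arrive in between, and all names (`Component`, `RipV1`, `RipV2`, `handled_by_v2`) are illustrative.

```python
from collections import deque


class BaseModule:
    """Lifecycle operations shared by both hypothetical module versions."""

    def __init__(self):
        self.data = None
        self.running = False

    def init(self, data):      # INIT
        self.data, self.running = data, True

    def term(self):            # TERM
        self.running = False

    def hold(self):            # HOLD: nothing is in flight in this sketch
        pass

    def restart(self, data):   # RESTART with the restored data reference
        self.data, self.running = data, True


class RipV1(BaseModule):
    VERSION = "0.0.1"

    def process_message(self, route):
        self.data["routes"].append(route)


class RipV2(BaseModule):
    VERSION = "0.0.2"

    def process_message(self, route):
        self.data["routes"].append(route)
        self.data["handled_by_v2"] = self.data.get("handled_by_v2", 0) + 1


class Component:
    def __init__(self, module_cls):
        self.data = {"routes": []}  # kept by the main task across upgrades
        self.queue = deque()
        self.module = module_cls()
        self.on_message = self.module.process_message
        self.module.init(self.data)

    def receive(self, msg):
        self.queue.append(msg)
        if self.module is not None:  # during an upgrade events stay queued
            self.drain()

    def drain(self):
        while self.queue:
            self.on_message(self.queue.popleft())

    def begin_upgrade(self):
        self.module.term()           # step 2: stop the old module
        self.module.hold()           # step 2: make it quiescent
        self.module = None           # step 3: unload
        self.on_message = None

    def finish_upgrade(self, new_cls):
        self.module = new_cls()                         # step 4: load
        self.on_message = self.module.process_message   # step 5: remap
        self.module.restart(self.data)                  # steps 6 and 7
        self.drain()                                    # dispatch queued events


comp = Component(RipV1)
comp.receive("10.0.0.0/8")
comp.begin_upgrade()
comp.receive("172.16.0.0/12")  # arrives mid-upgrade, stays in the queue
queued_during_upgrade = len(comp.queue)
comp.finish_upgrade(RipV2)
```

The event that arrived while no module was loaded is processed by the new version once the upgrade finishes, which is the "no incoming events are lost" property of step 3.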

3 IMPLEMENTATION RESULTS

3.1 Implementation Environment

As mentioned above, we implemented the main task module as the main process of the program, which is the main part of our run-time software upgrade framework. To build the main event loop that processes every event, such as IPC messages over network sockets, timer expiration events and signals triggered by the kernel, we used the well-known libevent software library. libevent provides a mechanism to execute a call-back function when a specific event occurs on a file descriptor (including a socket) or after a timeout has been reached, and it also supports call-backs triggered by signals and regular timeouts. As shown in Figure 5, the main task module forms a single-threaded event loop using libevent, and every event from outside the dynamic component is dispatched through the event-dispatch function, which subsequently calls the event-receive functions in the dynamic module, such as receive-message, receive-timer and receive-signal. In our implementation, the main task module also provides various library APIs, including IPC messaging, memory handling and logging, so a user who wants to build a dynamic component only needs to take care of the application logic in the dynamic module.

[Figure 5 diagram labels: Dynamic Component; Main Task (a.out); Dynamic Module (lib.so); Event (Message, Timer, Signal); Main Task Thread; LibEvent (Main Loop); Event Queue; Library API (IPC, Mem, Log, etc); IPC; Module Data; Reference; Library API; Func_impl; Recv_Ipc(); Recv_evt(); Recv_sig(); Symbol Mapper.]

Figure 5: Implemented architecture of dynamic component.
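The call-back-on-readiness pattern that the main loop relies on can be sketched with Python's standard `selectors` module. This is an analogue chosen for illustration, not libevent's API and not the authors' code; the socket pair, the `receive_message` function and the message content are hypothetical.

```python
import selectors
import socket

received = []


def receive_message(conn):
    # Event-receive function: called when the socket becomes readable.
    received.append(conn.recv(1024))


sel = selectors.DefaultSelector()
sender, receiver = socket.socketpair()  # stands in for an IPC socket
receiver.setblocking(False)
# Register the call-back as the data attached to the file object.
sel.register(receiver, selectors.EVENT_READ, receive_message)

sender.sendall(b"upgrade-request")

# First pass of the loop: the socket is readable, so the call-back runs.
for key, _mask in sel.select(timeout=1.0):
    key.data(key.fileobj)

# Second pass: nothing is pending, so select returns after the timeout.
# A real loop would run its timer call-backs at this point.
timed_out = not sel.select(timeout=0.05)

sel.unregister(receiver)
sel.close()
sender.close()
receiver.close()
```

A single thread blocked in one `select` call serves sockets and timeouts alike, which is why the main task module can stay single-threaded.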

We realize the dynamic implementation module as a shared library, because shared libraries are intended to be shared by executable modules and are loaded into memory at load time or run time, rather than being copied by a linker when it creates a single monolithic executable program. The only way into the dynamic module is through the event-receive functions implemented in the dynamic module, so every request or event for the dynamic module comes through the event loop in the main task module. As mentioned earlier, the main task module constructs a set of symbol mapping tables, called the symbol mapper, between the main task module and the dynamic module whenever it loads a dynamic module. The symbol mapper holds three symbol mapping tables: the first is the event-receive symbol mapping, which is used for the connection during event dispatch; the second is the library API symbol mapping; and the third is the permanent module data reference mapping. For dynamic linking between the main task module and the dynamic implementation module, the POSIX dynamic loading (DL) functions dlopen, dlsym and dlclose are used. dlopen loads the code and data of the new dynamic implementation module into the component process’s address space, dlsym locates functions in the dynamic implementation module by name, and dlclose unloads the old dynamic implementation module. As a result, the main task module and the dynamic module can be compiled separately, each producing its own binary. The main task module forms an executable program and the dynamic module takes the form of a shared library, as shown in Figure 5.
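The load, look-up-by-name and unload cycle can be mimicked in Python with `importlib`, as a rough analogue of dlopen, dlsym and dlclose. Everything here is an illustrative assumption: the module source, the file names, the lower-case function names `recv_ipc`, `recv_evt` and `recv_sig`, and the dictionary layout of the symbol mapper are not taken from the authors' implementation.

```python
import importlib.util
import pathlib
import tempfile

SOURCE = '''
VERSION = "{version}"

def recv_ipc(data, msg):
    data.append((VERSION, msg))

def recv_evt(data, evt):
    pass

def recv_sig(data, sig):
    pass
'''


def load_module(path, name):
    """dlopen analogue: bring code from a file into the running process."""
    spec = importlib.util.spec_from_file_location(name, str(path))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


def build_symbol_mapper(module, api, data):
    """dlsym analogue: resolve the event-receive functions by name."""
    names = ("recv_ipc", "recv_evt", "recv_sig")
    return {
        "event_receive": {name: getattr(module, name) for name in names},
        "library_api": api,   # functions the main task offers to the module
        "module_data": data,  # permanent data reference kept by the main task
    }


data = []                # owned by the main task, shared by both versions
api = {"log": print}     # hypothetical library API table
loaded_versions = []

with tempfile.TemporaryDirectory() as repo:
    for version in ("0.0.1", "0.0.2"):
        path = pathlib.Path(repo) / f"librip_{version}.py"
        path.write_text(SOURCE.format(version=version))
        module = load_module(path, "librip")
        mapper = build_symbol_mapper(module, api, data)  # rebuilt on each load
        mapper["event_receive"]["recv_ipc"](mapper["module_data"], "hello")
        loaded_versions.append(module.VERSION)
        del module, mapper  # dlclose analogue: drop references to the old code
```

Rebuilding the mapper after every load is the important part: the functions of the new module are new objects, just as the new shared library's functions may sit at different addresses.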

3.2 Test Results

The test application is a network router system, which is one of the core elements of the internet infrastructure. A router is connected to two or more data lines from different networks. When a data packet comes in on one of the lines, the router reads the address information in the packet to determine its ultimate destination. To decide the destination of a packet, each router maintains its own routing table, which is created by routing protocol software such as RIP. To upgrade the routing protocol conventionally, the network operator must shut down the old routing protocol and restart the new one, which loses the current routing table; the network connection is down until the new routing protocol constructs a new routing table. The objective of the test is to show that the routing protocol does not lose its routing table when it is upgraded with our run-time software upgrading framework.

Figure 6: Test network environment.

Figure 6 shows the test network environment. There are four RIP routers, A, B, C and D, and two hosts, Host-1 and Host-2. The four RIP routers provide a connection path between the two hosts. All routers have already built their routing tables, and connectivity is checked by pinging between Host-1 and Host-2. Furthermore, we continuously send ping messages from Host-1 to Host-2 to confirm the connection path during the upgrade. The ping responses stop if the network connection goes down, which would mean that the RIP protocol in router ‘A’ lost its routing table during the upgrade. A CLI (Command Line Interface) is used to configure the RIP protocol’s parameters and to show the current status of the protocol. Table 1 shows the CLI commands and the resulting protocol status. The current module version of RIP is 0.0.1 and the new version is 0.0.2, as shown in Table 1. We dynamically upgrade the RIP component from the old librip_0.0.1.so to the new librip_0.0.2.so through a CLI command on RIP router ‘A’. Host-1 still receives ping responses from Host-2 and the connection path is preserved, which means that RIP router ‘A’ keeps its routing state consistent after the dynamic module is changed.

Table 1: Test results.
CLI shows old RIP module (librip_0.0.1.so).
CLI to upgrade to version 0.0.2.
CLI shows new RIP module (librip_0.0.2.so).
ping result to Host-2 during upgrade.

4 CONCLUSIONS

Service continuity is very important in mission-critical and safety-critical software applications such as internet infrastructure, telecommunication, military and medical applications, since for these applications it is unacceptable to shut down and restart the system during a software upgrade. The purpose of run-time software upgrade techniques is to dynamically upgrade the behaviour of a running software system without software outage or service interruption. In this article, we presented a run-time software upgrading method based on software architecture. We proposed a software framework comprising a dynamic software module architecture and a run-time module upgrading procedure. We also implemented the proposed scheme and showed the results of a run-time upgrade of network router software. In future work, we will focus on further enhancements such as performance measurement and handling the case in which the user data structure changes at run time.

ACKNOWLEDGMENTS

This research was supported by the IT R&D program of MSIP (Ministry of Science, ICT and Future Planning) / IITP (Institute for Information & Communication Technology Promotion) [R0101-16-0070, Development of the High Availability Network Operating System for Supporting Non-Stop Active Routing]. This research was also supported by the National Research Foundation of Korea (NRF-2012R1A1A2044460).

REFERENCES

Jeff Magee and Jeff Kramer, “Dynamic Structure in Software Architectures,” Fourth SIGSOFT Symposium on the Foundations of Software Engineering (FSE), pp. 3-14, San Francisco, October 1996.

Peyman Oreizy and Richard N. Taylor, “On the Role of Software Architectures in Runtime System Reconfiguration,” Proceedings of the International Conference on Configurable Distributed Systems (ICCDS 4), Annapolis, Maryland, May 1998.

Deepak Gupta and Pankaj Jalote, “Increasing System Availability through On-Line Software Version Change,” Proceedings of the 1993 IEEE 23rd International Symposium on Fault-Tolerant Computing, pp. 30-35, August 1993.

Jonathan E. Cook and Jeffrey A. Dage, “Highly Reliable Upgrading of Components,” IEEE/ACM International Conference on Software Engineering (ICSE ’99), pp. 203-212, Los Angeles, CA, 1999.

Mike Gagliardi, Raj Rajkumar and Lui Sha, “Designing for Evolvability: Building Blocks for Evolvable Real-Time Systems,” Proceedings of the IEEE Real-Time Technology and Applications Symposium, pp. 100-109, June 1996.

L. A. Tewksbury, Louise E. Moser and P. M. Melliar-Smith, “Live Upgrades for CORBA Applications Using Object Replication,” IEEE International Conference on Software Maintenance, pp. 488-497, Florence, Italy, Nov. 2001.

Louise E. Moser, P. M. Melliar-Smith, P. Narasimhan, L. A. Tewksbury and V. Kalogeraki, “Eternal: Fault Tolerance and Live Upgrades for Distributed Object Systems,” Proceedings of the IEEE Information Survivability Conference and Exposition (DISCEX 2000), Vol. 2, pp. 184-196, 2000.

Michael Franz, “Dynamic Linking of Software Components,” IEEE Computer, Vol. 30, No. 3, pp. 74-81, March 1997.

W. Wilson Ho and Ronald A. Olsson, “An Approach to Genuine Dynamic Linking,” Software: Practice and Experience, Vol. 21, No. 4, pp. 375-390, April 1991.

Donn Seeley, “Shared Libraries as Objects,” USENIX Summer Conference Proceedings, pp. 25-37, 1990.

S. Drossopoulou and S. Eisenbach, “Manifestations of Dynamic Linking,” The First Workshop on Unanticipated Software Evolution (USE 2002), Málaga, Spain, June 2002.

L. Yu, G. C. Shoja, H. A. Muller and A. Srinivasan, “A Framework for Live Software Upgrade,” Proceedings of the International Symposium on Software Reliability Engineering (ISSRE), pp. 149-158, 2002.

H. Chen, J. Yu, R. Chen and B. Zang, “POLUS: A POwerful Live Updating System,” 29th International Conference on Software Engineering (ICSE ’07), May 2007.

S. Fahmi and H. Choi, “Life Cycles for Component Based Software Development,” IEEE 8th International Conference on Computer and Information Technology Workshops (CIT Workshops 2008), July 2008.

I. Neamtiu, M. Hicks, G. Stoyle and M. Oriol, “Practical Dynamic Software Updating for C,” Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 72-83, 2006.

Kristis Makris and Rida A. Bazzi, “Immediate Multi-Threaded Dynamic Software Updates Using Stack Reconstruction,” Proceedings of the 2009 USENIX Annual Technical Conference, p. 31, San Diego, California, June 14-19, 2009.

G. Chen, H. Jin, D. Zou, Z. Liang, B. B. Zhou and H. Wang, “A Framework for Practical Dynamic Software Updating,” IEEE Transactions on Parallel and Distributed Systems, pp. 941-950, April 2016.
