Study on a Fast OSPF Route Reconstruction Method

under Network Failures

Hiroki Doi

System Engineering Research Laboratory, Central Research Institute of Electric Power Industry,

Iwado kita 2-11-1, Komae-shi, Tokyo, Japan

Keywords:

OSPF, Router Dead Interval, Delay Time, Route, Designated Router.

Abstract:

The Great East Japan Earthquake occurred on March 11, 2011. Many Japanese people and Japanese companies

were damaged by the disaster. Also, network failures occurred over a wide area because many facilities of

commercial ISPs (Internet Service Providers) were damaged. Thus, there is a need to reexamine the disaster

estimation and reconstruct a robust network system against disasters in Japan. The network must have higher

reliability and fast recovery. Although OSPF (Open Shortest Path First) is used widely on networks, it has a

router dead interval problem. If a (backup) designated router has stopped operation due to failure, the other

OSPF routers miss the designated router and try to ﬁnd it by multiple hello packets. The OSPF routers await

a hello packet acknowledgment from the designated router for the router dead interval. After the router dead

interval, those routers can recognize that the designated router has ceased the operation. The router dead

interval is 40 seconds. This interval time is not only long for many real-time applications but also involves

huge buffering of data and a burst of trafﬁc after the router reconstruction. To avoid the router dead interval,

we propose a fast method of designated router detection by enhanced OSPF. In this report, we show how our

method reduces the route reconstruction time from 45 seconds to 10 seconds or less on OSPF networks.

1 INTRODUCTION

In Japan, many Japanese people and Japanese com-

panies were damaged by the Great East Japan Earth-

quake. Following this disaster, Japanese commer-

cial ISPs and the government reexamined the plan

for disaster estimation and protection against disas-

ters. According to this protection plan, commercial

ISPs must reconstruct robust networks against disas-

ters. Networks require high reliability and fast re-

covery. One of the important problems for these re-

quirements is that of routing, since considerable time

is required to reroute paths on IP networks, when

multiple routers have ceased operation due to failures. To study this problem, we focus on the behavior of OSPF (Open Shortest Path First) (Moy, 1998b)(Moy, 1998a), which is one of the major routing protocols used

worldwide, and presume a large company network,

namely a broadcast multi-access network with 400

OSPF routers.

OSPF works with 2 kinds of router, namely, the

Designated Router (DR) and its neighboring routers

(neighbors) on broadcast multi-access networks. An

adjacency should be formed with the DR and its neig-

hbor. The DR also has a list of all other routers at-

tached to the network. In this case, when the DR

has ceased the routing operation, neighbors attempt

to cast hello packets to the DR. If the DR does not

respond to 4 hello packets from a neighbor, that neighbor detects DR failure and all neighbors start to elect a new DR among themselves. The hello packet

interval is 10 seconds (Hello Interval, default value),

hence it takes 40 seconds (Router Dead Interval) for

neighbors to detect the DR failure. After the DR fail-

ure, it takes more than 40 seconds to reroute all paths

by original OSPF. Generally speaking, such a long communication outage is unacceptable for many applications on networks. Thus, when the DR has ceased the routing operation on OSPF networks due to a network failure, it takes a long time to recover the network operation.

There is a simple method to reduce the Router Dead Interval: we can set the hello packet interval to less than 10 seconds on an OSPF router. However, the paper (Goyal, 2003) reports that any Hello Interval value less than 10 seconds leads to an unacceptable number of false alarms, in which neighbors mistakenly detect DR failure due to successive discards of hello packets.

Doi H.
Study on a Fast OSPF Route Reconstruction Method under Network Failures.
DOI: 10.5220/0004065800130022
In Proceedings of the International Conference on Data Communication Networking, e-Business and Optical Communication Systems (DCNET-2012), pages 13-22
ISBN: 978-989-8565-23-5
© 2012 SCITEPRESS (Science and Technology Publications, Lda.)

There are other methods to detect OSPF failures. When links fail, OSPF multicasts LSA (Link State Advertisement) packets. The paper (Yuichiro Hei and Hasegawa, 2007) proposed a method of OSPF failure identification based on LSA flooding analysis taking these aspects into account. However, if the OSPF process on a router ceases operation or a Layer-2 (L2) link fails (in this case, the network topology contains an L2 network), the other OSPF routers cannot detect the failure and do not send LSA packets. Thus, a method based on monitoring LSA packets cannot detect OSPF failure in these cases and cannot avoid the Router Dead Interval.

To avoid this Router Dead Interval, we propose

an enhanced OSPF with a new DR failure detec-

tion mechanism added without the hello packet. Our

method uses user IP packets to detect the DR failure

and monitors user IP packets from the DR. When the

DR has ceased the operation, it no longer sends user

IP packets. Our method can detect DR failure faster

than the original OSPF by monitoring the behavior of

those IP packets.

This paper is organized as follows. In Section 2, we describe the behavior of the original OSPF under DR failure and its Router Dead Interval problem. In Section 3, we present our proposed method to solve this problem. In Section 4, we show behavior examples of our proposed method for several network facility failures. In Section 5, we evaluate the path reroute processing time of our proposed method and the original OSPF in a typical network model. Section 6 reviews related work. Finally, in Section 7, we summarize the effect of our proposed method and mention future work.

2 OSPF BEHAVIOR FOR THE DR FAILURE

OSPF can adapt to many network configurations: point-to-point networks, point-to-multipoint networks, broadcast multi-access networks and so on. We focus on the broadcast multi-access network, because it is a major network configuration of company private networks. OSPF works with 2 kinds of OSPF

router, DR and neighbors on broadcast multi-access

networks. The router will attempt to form adjacen-

cies with some of its newly acquired neighbors. Link-

state databases are synchronized between pairs of ad-

jacent routers. On broadcast multi-access networks,

the DR determines which routers should become ad-

jacent. Adjacencies control the distribution of routing

information. Routing updates are only sent and re-

ceived on adjacencies, hence the DR plays an impor-

tant role in OSPF networks.

If the DR has ceased routing operation due to fail-

ure, neighbors cannot detect this failure immediately

and cannot receive new link-state information from

the DR. Under these circumstances, the OSPF cannot

reroute paths to avoid failing routers or links until the

successful detection of DR failure. Neighbors send

hello packets to the DR to conﬁrm such failure. Hello

interval is 10 seconds as the default value on an OSPF

router. If the DR does not respond to 4 hello packets from a neighbor, the neighbor detects DR failure, meaning that neighbors need 40 seconds to detect DR failure. This time interval is called the Router Dead Interval.
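The relation between the two timers can be written as a one-line calculation. The following is a minimal sketch, assuming the default values quoted above (a Hello Interval of 10 seconds and 4 unanswered hello packets); the function and constant names are hypothetical and chosen only for illustration.

```python
# Default values quoted in the text; the names are illustrative assumptions.
HELLO_INTERVAL_S = 10   # default Hello Interval in seconds
MISSED_HELLOS = 4       # unanswered hello packets before the DR is declared down


def router_dead_interval(hello_interval_s, missed_hellos=MISSED_HELLOS):
    """Return the time needed to detect DR failure using hello packets only."""
    return hello_interval_s * missed_hellos


if __name__ == "__main__":
    # With the defaults this gives the 40 seconds discussed in this section.
    print(router_dead_interval(HELLO_INTERVAL_S))
```

The same function shows why simply shrinking the Hello Interval is tempting: the detection time scales linearly with it, which is exactly the option that the false alarm results reported in (Goyal, 2003) argue against.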

Of course, the Hello Interval is one of the OSPF

parameters and there is a simple way for Hello Inter-

val to be set to under 10 seconds to reduce Router

Dead Interval. However, this is not feasible for com-

mercial ISPs. This method was analyzed by pa-

per (Goyal, 2003) by measuring ISPs topologies and

it was reported that any Hello Interval value un-

der 10 seconds led to an unacceptable number of

false alarms. Thus, we think that the Hello Interval should remain 10 seconds and that we need to adopt a different method.

There is also a backup DR in the general OSPF

network. When the DR has ceased operation, the

backup DR becomes the DR and a new backup DR

is elected among other neighbors. In this paper, we

assume that a DR and a backup DR have ceased the

operation due to simultaneous multiple failure.

3 ENHANCEMENT OSPF FOR THE ROUTER DEAD INTERVAL

3.1 Outline for Enhancement OSPF

Our objective is to avoid using the hello packet to re-

alize the faster path reroute mechanism. To achieve

this objective, we enhance the DR failure detection

mechanism part of OSPF.

We have 2 simple key ideas for this enhancement:

1. When a link or router fails, the ﬂow of IP packets

stops or changes immediately.

2. An IP packet which traverses the DR has a hello

function.

For key idea 1, if the DR fails, a neighbor does not receive IP packets from the DR. Also, in the case of Fig. 1, if the DR or an L2-link fails, a neighbor does not receive IP packets. In other words, a neighbor can detect DR failure by monitoring IP packets from the DR.

Figure 1: Typical OSPF network failures. (Case 1: OSPF failure on the DR. Case 2: L2 link failure and OSPF failure on the DR, where the DR and the neighbor are connected through an L2 switch and L2 links.)

For key idea 2, we can substitute a user IP packet

for a hello packet to detect DR failure, because we can

use an IP header option within the private network and

the IP is at the same layer as the OSPF.

We show the outline of the new DR failure detec-

tion mechanism based on the ideas.

1. The user IP packet which traverses the DR is marked in an option of the IP header.

2. The neighbor monitors the marked IP packets.

3. If the DR does not receive a user IP packet within its threshold value R, the DR sends a marked dummy IP packet to its neighbor.

4. If the local time on a neighbor i exceeds its threshold value $R_i$ without a marked IP packet arriving, this neighbor multicasts missing message packets to all neighbors.

5. If another neighbor j receives a missing message packet, it monitors the arrival interval time of marked IP packets. If the marked IP packet interval time is under its threshold value $R_j$, this neighbor sends an alive message packet.

6. If neighbor i does not receive an alive message packet, it detects the DR failure. A new DR is elected among all neighbors and the routing table is reconstructed.

Here, we presume the DR writes 1 as a mark in an

option of the IP packet header, which is sent from the

DR to a neighbor. When a neighbor receives a marked

IP packet, it writes 0 as an unmark in an option and

sends the user IP packet.

Next, we deﬁne the threshold value R. To calcu-

late R, we borrow the idea of the TCP timeout mech-

anism (Stevens, 1994).

TCP monitors all RTT (Round Trip Time) of TCP

packets at the TCP interfaces and calculates the av-

erage RTT and its deviation. The time out value is

the average RTT + 2×deviation (Jacobson, 1988). (In

1990, the paper (Jacobson, 1990) revised this equa-

tion, average RTT + 4×deviation. We select the for-

mer equation for the performance of our method.)

TCP decides on the packet loss event based on this

time out value and retransmits the packet.

Our proposed method decides the DR failure event

by comparing the threshold value R with the arrival

interval time of the marked IP packets. R is calculated

by the following equation

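$$
\begin{aligned}
\mathit{Err} &= M - A \\
A &\leftarrow A + g \cdot \mathit{Err} \\
D &\leftarrow D + h \, (|\mathit{Err}| - D) \\
R &= A + 2D
\end{aligned}
$$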

where M is the measured arrival interval time of the marked IP packets, A is the smoothed average of M, Err is the difference between M and A, D is the smoothed mean deviation, g is the gain 1/8 for the average and h is the gain 1/4 for the deviation. The values of the coefficients are equal to those of the original TCP timeout mechanism.
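The update rule for A, D and R can be sketched in a few lines of code. This is only an illustrative sketch: the class name `ThresholdEstimator` and the initial values of A and D are assumptions, since the text does not say how the estimator is initialized.

```python
from dataclasses import dataclass

G = 1 / 8  # gain g for the smoothed average A
H = 1 / 4  # gain h for the smoothed mean deviation D


@dataclass
class ThresholdEstimator:
    """Tracks A and D for the arrival intervals M of marked IP packets."""
    a: float = 0.0  # smoothed average A (initial value is an assumption)
    d: float = 0.0  # smoothed mean deviation D (initial value is an assumption)

    def update(self, m: float) -> float:
        """Feed one measured arrival interval M and return the new threshold R."""
        err = m - self.a                   # Err = M - A
        self.a += G * err                  # A <- A + g * Err
        self.d += H * (abs(err) - self.d)  # D <- D + h * (|Err| - D)
        return self.threshold

    @property
    def threshold(self) -> float:
        """R = A + 2D."""
        return self.a + 2 * self.d


if __name__ == "__main__":
    est = ThresholdEstimator(a=1.0)
    for interval in (1.0, 2.0, 1.0):
        print(round(est.update(interval), 4))
```

Because D reacts to the absolute error, a single late packet raises R noticeably, so an occasional delay is less likely to be mistaken for a DR failure than it would be with a fixed multiple of the average.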

3.2 Our Proposal Algorithm

We describe our proposed new DR failure detection

mechanism. We show the state transitions diagram of

DR and its neighbor in Fig. 2.

Neighbor Side.

1. Measurement.

The neighbor monitors the marked IP packets and

calculates M and R. If the local time exceeds R,

this state transits into state 2. If M is less than R,

there is no transition of state. If a missing message

packet is received, this state transits into state 3.

2. Missing.

The neighbor multicasts a missing message packet to all OSPF routers. It collects the thresholds $R_i$ of the other neighbors i and calculates the maximum value $R_{\max}$ among them. If an alive message packet is received within $R_{\max}$, the neighbor knows that the DR is alive and that there is a path failure on its adjacency path. This state then transits into state 5 to reconstruct the adjacency with the DR. If an alive message packet is not received within $R_{\max}$, the neighbor detects DR failure and this state transits into state 6.

3. Confirm R.

The neighbor i that has received the missing message packet confirms its $R_i$ and sends it to the sender of the missing message packet, whereupon this state transits into state 4.

4. Confirmation.

If a marked IP packet is received within $R_i$, an alive message packet is multicast. Also, if an alive message packet is received from another neighbor, this state transits into state 1.

StudyonaFastOSPFRouteReconstructionMethodunderNetworkFailures

DR side

Neighbor side

1. Measurement

2. Dummy Packet Generation

3. Conﬁrmation Acknowledge

Rec : “Missing message”

Send : “Alive message”

M>R

Send : “Dummy IP packet”

8. Full*

1. Measurement

2. Missing

3. Conﬁrmation threshold R

6. OspfRestart

Rec : “Missing message”

M > R

Rec : “Alive message”

Rmax>timer

5. OspfInit

Rec/Send : “Alive message”

4. Conﬁrmation

Send: “Threshold R”

8. Full*

1. Down*

6. Exchange*

* original OSPF state.

Figure 2: Proposed state transition diagram.

5. Ospf-Init.

In this state, the neighbor sends an LSA to the DR.

6. Ospf-Restart.

In this state, the neighbor detects DR failure and

multicasts an init message packet. The state of all

neighbors which receive an init message packet

transits into the down state of OSPF.

DR Side.

1. Measurement.

The DR marks a user IP packet and sends it to a neighbor. Subsequently, the DR measures M and calculates its threshold R. If the DR does not receive a user IP packet within R, this state transits into state 2. If the DR receives a missing message packet, this state transits into state 3.

2. Dummy Packet Generation.

The DR generates a dummy marked IP packet and sends it to a neighbor.

3. Confirmation Acknowledgement.

The DR multicasts alive message packets and this state transits into state 1.
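The neighbor side transitions described in this section can be summarized as a small transition table. The sketch below is an illustration under stated assumptions, not the implementation evaluated in this paper: the event names are hypothetical, timers and packet handling are abstracted into events, and only the transitions stated in the text are included.

```python
from enum import Enum


class State(Enum):
    MEASUREMENT = 1
    MISSING = 2
    CONFIRM_R = 3
    CONFIRMATION = 4
    OSPF_INIT = 5
    OSPF_RESTART = 6


class Event(Enum):
    # Event names are assumptions introduced for this sketch.
    THRESHOLD_EXCEEDED = "no marked IP packet within R"
    MISSING_RECEIVED = "missing message packet received"
    R_SENT = "own R sent to the sender of the missing message"
    ALIVE_RECEIVED = "alive message packet received"
    RMAX_EXPIRED = "no alive message packet within R_max"


# (current state, event) -> next state, following the neighbor side description.
TRANSITIONS = {
    (State.MEASUREMENT, Event.THRESHOLD_EXCEEDED): State.MISSING,
    (State.MEASUREMENT, Event.MISSING_RECEIVED): State.CONFIRM_R,
    (State.MISSING, Event.ALIVE_RECEIVED): State.OSPF_INIT,
    (State.MISSING, Event.RMAX_EXPIRED): State.OSPF_RESTART,
    (State.CONFIRM_R, Event.R_SENT): State.CONFIRMATION,
    (State.CONFIRMATION, Event.ALIVE_RECEIVED): State.MEASUREMENT,
}


def step(state, event):
    """Return the next state; events with no listed transition leave it unchanged."""
    return TRANSITIONS.get((state, event), state)


def run(events, state=State.MEASUREMENT):
    """Apply a sequence of events starting from the Measurement state."""
    for event in events:
        state = step(state, event)
    return state


if __name__ == "__main__":
    # DR failure: the threshold is exceeded and no alive message arrives within R_max.
    print(run([Event.THRESHOLD_EXCEEDED, Event.RMAX_EXPIRED]))
    # Adjacency path failure: another neighbor reports that the DR is alive.
    print(run([Event.THRESHOLD_EXCEEDED, Event.ALIVE_RECEIVED]))
```

Keeping the transitions in a table makes it easy to compare the sketch against Fig. 2 entry by entry; the DR side could be modeled in the same way with its three states.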

3.3 Path Reroute Processing Time

In this section, we describe the path reroute processing flow of our proposed method for various network facility failures. Various network facilities and OSPF network configuration patterns exist. For simplicity, we assume that a DR, a neighbor, an L2 switch and a link comprise the main network facilities, and show the path reroute processing flow of our proposed method for failures of those facilities in Fig. 3.

The path reconstruction process is the original OSPF process (SPF calculation, SPF Delay, LSA processing and so on), and it is also used by our proposed method. The DR election includes hello processing.

Figure 3: Processing flow for network facilities failure. (Flowchart elements: failure occurred; OSPF router (excluding DR): failure / non-failure; DR failure detection: failure / non-failure; DR election; path reconstruction.)

Fig. 3 shows that there are 3 cases of processing flow, namely, (1) path reconstruction, (2) path reconstruction + DR failure detection and (3) path reconstruction + DR failure detection + DR election. However, only cases (2) and (3) involve DR failure detection, so only these 2 processing time cases are needed to evaluate our proposed method. We evaluate case (2) in Section 5.2 and case (3) in Section 5.1.

DCNET2012-InternationalConferenceonDataCommunicationNetworking

Marked IP packet

DRNeighbor

Router A

Router A sends missing packets

R < local time at router A

Other routers send Ri

NeighborNeighbor

Missing

Calculating Rmax

Init

After Rmax, router A sends init packets

The state of each router transitions into Down state.

Marked IP packet lost

Figure 4: Example 1: the DR failure.

4 EXAMPLES OF ENHANCEMENT OF OSPF BEHAVIOR

In this section, we show some examples of working

mechanisms of our proposed method in the event of

failure of various network facilities.

4.1 Example 1: The DR Failure

We assume that the DR is connected to a neighbor,

whereupon the DR has ceased operation due to OSPF

function failure but not link failure. For the original

OSPF, Router Dead Interval occurs in this case. We

explain our method with Fig. 4 in this case.

1. In a stable state, router A receives marked IP packets from the DR. Each router calculates its threshold value R.

2. The OSPF function on the DR stops due to failure, but the link state is ready.

3. Router A does not receive a marked IP packet within R and multicasts missing message packets.

4. The other routers multicast their R. Router A calculates $R_{\max}$.

5. The other routers do not receive a marked IP packet from the DR within R and do not send an alive message packet. Router A does not receive an alive message packet within $R_{\max}$ and multicasts init message packets. Subsequently, the state of all routers transits into the down state of OSPF.

4.2 Example 2: L2 Link Failure

In this case, we assume that there is an L2 link between router A and the DR. When an L2 link fails, neither router A nor the DR can detect it. Hence, the Router Dead Interval occurs in the case of the original OSPF. We explain our method with Fig. 5 in this case.

1. In this stable state, router A receives marked IP packets from the DR. Each router calculates its threshold value R.

2. The L2 link fails, but OSPF routers and other links are ready.

3. Router A does not receive a marked IP packet within R and multicasts missing message packets. The DR keeps sending marked IP packets to router A and cannot detect the failure on the L2 link.

4. The other routers receive a missing message packet from router A and multicast their R.

5. The other routers receive marked IP packets from the DR and multicast alive message packets.

6. Router A receives an alive message packet and sends an LSA to the DR.

4.3 Example 3: Few User IP Packets

In this example, there is no network failure. However,

few user IP packets traverse the DR. The detection

time of our proposed method depends on the aver-

age packet arrival interval time. If the amount of user

IP packets declines further, the packet arrival interval

time increases to an ever greater extent, and hence the

detection time of our proposed method follows suit.

StudyonaFastOSPFRouteReconstructionMethodunderNetworkFailures

Marked IP packet

DRNeighbor

Router A

Router A sends

missing packets

R < local time at router A

Neighbors send R packets

NeighborNeighbor

Missing

Alive

Router A calculates Rmax

Router A receives alive packets

L2 switch

Marked IP packet

Router A sends LSA to the DR

LSA

Figure 5: Example 2: L2 link failure.

Marked IP packet

DRNeighbor

Router A

After threshold R time,

Router A receives a marked (dummy) IP packet.

the DR generates a marked dummy IP packet.

Silent priod

(There is no user (marked) IP packet)

Figure 6: Example 3: few user IP packets.

We conﬁrm the mechanism of our proposal in this sit-

uation with Fig. 6.

1. In this stable state, router A receives marked IP packets from the DR. Each router calculates its threshold value R.

2. The user applications temporarily stop communications.

3. When the DR does not receive a user IP packet within R, it generates a marked dummy IP packet and sends it to router A.

4. Router A receives the marked dummy IP packet and can confirm that the DR is alive.

4.4 Example 4: Loss of Message Packets

In this example, we assume that some of the marked

IP packets, missing message packets and alive mes-

sage packets are lost. We conﬁrm the mechanism of

our proposal in this situation with Fig. 7.

1. In a stable state, router A receives marked IP packets from the DR. Each router calculates its threshold value R.

2. Marked IP packets are lost due to some failures.

3. Router A does not receive a marked IP packet within R and multicasts missing message packets. However, we assume that certain missing message packets are lost due to some failures.

4. Some neighbors receive missing message packets and send R to router A. Here, we also assume that some of those missing message packets are lost. However, router A can receive R from some neighbors, because there are many neighbors and we assume that some of their packets can reach router A. Router A calculates $R_{\max}$ and awaits an alive message packet.

5. Some neighbors can multicast alive message packets, because the DR is alive, and some of these packets can be received by router A. Subsequently, router A sends an LSA to the DR.

5 EVALUATION OF THE PATH REROUTING TIME

In Section 3.3, we explained that there are 2 cases of the path reroute processing time of our proposed method for network facility failures. We evaluate the path reroute processing time of our proposed method in those 2 cases.

Figure 7: Example 4: message packets lost. (Sequence: a marked IP packet is lost; router A sends missing packets when R < local time at router A; a missing packet is lost; neighbors send R packets; router A calculates $R_{\max}$; router A receives alive packets and reconstructs adjacencies.)

We show the network configuration in Fig. 8 as the typical network model. There are 2 types of network, a backbone network and many local networks. All local networks are connected to the backbone network. OSPF manages the network in areas: on typical OSPF networks, the backbone is area 0 and the local networks are area i (i = 1, 2, ..., N). However, we set only area 0 on all networks for simplicity, because we focus on the effect of our proposed method on the path reroute processing time. If the OSPF networks had many areas, the path reroute processing time would need to include the propagation time of path information from one area to another.

Figure 8: Evaluation network model. (Area 0: a backbone network connecting several local networks, with router A, router B and a backup DR.)

We assume that each local and backbone net-

work has a DR, a backup DR and 18 OSPF neighbor

routers. In this network conﬁguration, we evaluate

the processing time for path rerouting from router A

to router B. We assume that backup DR and DR fail

at the same time in this evaluation.

Next, we set the evaluation parameters. The paper (Goyal, 2003) lists different standard and vendor-introduced delays that affect the OSPF operation in networks of popular commercial routers. We show those delays which are used in our evaluation in Table 1.

Also, the DR failure detection time of our proposed method depends on the arrival interval time of user IP packets. In this evaluation, we set the following constant arrival interval times of user IP packets on each link for simplicity.

• Arrival interval: 1, 0.5, 0.1 seconds.

5.1 Case 1: DR Failure

Initially, we evaluate the path reroute processing time

for both our proposed method and the original OSPF

in the case of DR failure on the backbone network as

a typical case.

In the case of the original OSPF, a new DR and backup DR are elected among neighbors after the Router Dead Interval, whereupon OSPF routers reconstruct the path table.

Table 1: Various delays affecting the operation of OSPF protocol (Goyal, 2003)(CISCO Systems, 2007).

Name | Processing time and description
Hello Interval | The time delay between successive Hello packets. Usually 10 seconds.
Router Dead Interval | The time delay since the last Hello before a neighbor is declared to be down. Usually 4 times the Hello Interval.
SPF Delay | The delay between the shortest path calculation and the first topology change that triggered the calculation. Used to avoid frequent shortest path calculations. Usually 5 seconds.
SPF calculation delay | 0.00000247 × x + 0.000978 sec (Cisco 3600 series)
Route install delay | The delay between shortest path calculation and update of forwarding table. Observed to be 0.2 seconds.
LSA processing delay | <0.001 sec
Hello processing delay | <0.001 sec *

* In (CISCO Systems, 2007), CISCO Systems, Inc. showed the OSPF processing log with time stamps. The time resolution of this log is 0.001 seconds and we can see that the hello processing delay is less than 0.001 seconds. Thus, we set the hello processing delay to less than 0.001 seconds.

In the case of our proposed method, new DR and

backup DR are elected without Router Dead Interval

by a new failure detection mechanism using marked

IP packets.

We sum up the overall processing delay time of the path rerouting according to the original OSPF algorithm and our proposed method. Fig. 9 shows the path reroute processing time for the original OSPF and the proposed method. When the number of OSPF routers increases, so does the SPF calculation delay. However, this increase is minor in terms of total processing delay. Fig. 10 shows the details of the processing time in the case of 400 routers. The major contributions to the path reroute processing time are the SPF delay and the Router Dead Interval. Thus, we can say that our proposed method reduces this processing time very effectively, because it avoids the Router Dead Interval.
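The summation can be illustrated with a short calculation. This is a simplified sketch under several assumptions rather than the exact procedure behind Fig. 9 and Fig. 10: the upper bounds in Table 1 are used as the LSA and hello processing delays, each delay is counted once, the SPF calculation formula is taken as printed in Table 1 with x read as the number of routers, and the detection time of 0.603 seconds is the value labeled in Fig. 10. All function and constant names are hypothetical.

```python
# Delay values taken from Table 1; names are illustrative assumptions.
ROUTER_DEAD_INTERVAL_S = 40.0
SPF_DELAY_S = 5.0
ROUTE_INSTALL_DELAY_S = 0.2
LSA_PROCESSING_DELAY_S = 0.001    # upper bound from Table 1
HELLO_PROCESSING_DELAY_S = 0.001  # upper bound from Table 1


def spf_calculation_delay(num_routers):
    """SPF calculation delay as printed in Table 1 (x assumed to be the router count)."""
    return 0.00000247 * num_routers + 0.000978


def reroute_time(detection_time_s, num_routers):
    """Sum the failure detection time and the remaining OSPF processing delays."""
    return (
        detection_time_s
        + SPF_DELAY_S
        + spf_calculation_delay(num_routers)
        + ROUTE_INSTALL_DELAY_S
        + LSA_PROCESSING_DELAY_S
        + HELLO_PROCESSING_DELAY_S
    )


if __name__ == "__main__":
    routers = 400
    original = reroute_time(ROUTER_DEAD_INTERVAL_S, routers)
    proposed = reroute_time(0.603, routers)  # detection time labeled in Fig. 10
    print(f"original OSPF: {original:.2f} s, proposed method: {proposed:.2f} s")
```

Under these assumptions the original OSPF total is dominated by the 40 second Router Dead Interval and the 5 second SPF delay, while the proposed method is left with little more than the SPF delay.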

Figure 9: Path reroute processing time for case 1. (Horizontal axis: number of OSPF routers, 0 to 400. Vertical axis: path reroute processing time (sec). Curves: original OSPF; proposed method with IP packet arrival interval times of 1 sec, 0.5 sec and 0.1 sec.)

Figure 10: The details of path reroute processing time for case 1 (number of OSPF routers is 400). (Bars: original OSPF and proposed method. Labeled components: Router Dead Interval; DR failure detection time (0.603 sec); LSA processing time (0.001 sec); Hello processing time (0.02 sec and 0.4 sec); SPF calculation and SPF delay.)

Also, if the arrival interval time of the marked IP packets exceeds 0.1 seconds, our proposed method can send dummy marked IP packets every 0.1 seconds. In this case, the bandwidth consumed is 5.12 kbps (the size of a dummy packet is 64 bytes). This bandwidth consumption can be considered negligible.
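The bandwidth figure follows directly from the dummy packet size and the sending interval. A minimal sketch of the arithmetic, with a hypothetical function name:

```python
def dummy_packet_bandwidth_bps(packet_size_bytes, interval_s):
    """Bandwidth in bits per second used by one dummy packet every interval_s seconds."""
    return packet_size_bytes * 8 / interval_s


if __name__ == "__main__":
    # 64 byte dummy packets sent every 0.1 seconds, as assumed in the text.
    bps = dummy_packet_bandwidth_bps(64, 0.1)
    print(f"{bps / 1000:.2f} kbps")
```

Each 64 byte packet carries 512 bits, and 10 packets per second give 5120 bits per second, which matches the 5.12 kbps stated above.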

5.2 Case 2: Marked Packet Loss

In this case, we assume that certain marked IP pack-

ets, missing message packets and alive message pack-

ets are lost in the network. This case is similar to ex-

ample 4 in section 4.4.

Both the DR and backup DR are operating nor-

mally. However, the original OSPF and proposed

method determine that the DR and backup DR have

stopped the OSPF operation, because hello packets

and marked IP packets are lost.

In the case of the original OSPF, both the DR and backup DR are elected among OSPF routers after the Router Dead Interval and the path table is reconstructed.

Figure 11: Path reroute processing time for case 2. (Horizontal axis: number of OSPF routers, 0 to 400. Vertical axis: path reroute processing time (sec). Curves: original OSPF; proposed method with IP packet arrival interval times of 1 sec, 0.5 sec and 0.1 sec.)

Figure 12: The details of path reroute processing time for case 2 (number of OSPF routers is 400). (Bars: original OSPF and proposed method. Labeled components: Router Dead Interval; DR failure detection time (0.603 sec); LSA processing time (0.001 sec); Hello processing time (0.02 sec); SPF calculation and SPF delay.)

In the case of the proposed method, some neighbors cannot detect either the DR or backup DR. However, there are many OSPF routers (neighbors) and all routers monitor the marked IP packets. It is unrealistic to assume that all marked IP packets are lost. Thus, neighbors can receive some marked IP packets and multicast alive message packets. Also, we assume that some alive message packets can reach neighbors, even if some alive message packets are lost. Neighbors which receive alive message packets send LSA packets to the DR and reconstruct the routing table. In the case of the proposed method, the DR election process is omitted, because the neighbor can confirm that the DR is alive.

Fig. 11 shows the results of the path reroute processing time for the original OSPF and the proposed method in this case, and Fig. 12 shows the details of the results. We confirm that our proposed method can reduce the path reroute processing time, because it avoids the Router Dead Interval.

6 RELATED WORK

There have been several approaches and proposals for network failure detection on OSPF networks. OSPF has complex processing algorithms and many sources of processing delay in recovering from a link failure. There are mainly 2 kinds of delay. One is the compute part, such as generation of routing and forwarding tables, processing of hello packets or link state packets (LSP) and so on. The other is the wait or timeout part, such as SPF hold delay, Router Dead Interval and so on. The main cause of the former type is CPU load. However, the newest OSPF routers are equipped with high-performance CPUs and this type can be neglected (Goyal, 2003). The latter comes from OSPF algorithms and parameters. Thus, OSPF algorithms and parameters should be modified to achieve fast failure recovery. First, the simple way is to reduce the value of the wait timer. In the paper (Basu and Riecke, 2001), the authors analyzed the effect of Hello Interval reduction and reported 275 ms to be an optimal value for providing fast failure detection while not resulting in too many route flaps due to frequent timeouts. However, this paper did not consider network congestion and topology characteristics.

The paper (Goyal, 2003) examined the Hello Interval while considering network congestion and topology characteristics. The authors claimed that the optimal value for the Hello Interval is strongly influenced by the expected congestion levels and the number of links in the topology. The simulation results indicated that a Hello Interval under 10 seconds increases the frequency of false alarms, which are generated if the Hello message gets queued behind a huge burst of LSAs and cannot be processed in time. Although the false alarms can be suppressed by the RED mechanism, which can suppress network congestion, it is difficult in general to set suitable RED parameters for the network traffic characteristics.

The paper (Yuichiro Hei and Hasegawa, 2007) proposed a method of OSPF failure identification based on LSA flooding analysis taking these aspects into account. This approach works suitably on OSPF networks. Also, the paper (Nelakuditi et al., 2007) proposed failure insensitive routing (FIR). This method is a proactive routing approach and computes interface-specific forwarding and backwarding tables for link failures. When this method detects link failures, it can avoid them and reroute effectively. However, if the OSPF process on a router ceases operation or an L2 link fails (in this case, the network topology contains an L2 network), these proposed methods cannot detect those failures and cannot avoid the Router Dead Interval.

7 CONCLUSIONS

We proposed a fast DR failure detection mechanism for OSPF to reroute paths when the DR has ceased operation. The original OSPF uses hello packets to detect DR failure, but detection takes the full Router Dead Interval. Our new DR failure detection mechanism substitutes user IP packets for the hello packets to avoid the Router Dead Interval.

Our proposed method involves 2 processing procedures for network facility failures. We evaluated it in each case on typical OSPF network models, and the results showed that our proposed method can reduce the path reroute processing time by avoiding the Router Dead Interval. Our proposed method is very effective in rerouting paths when the DR and backup DR fail.

In this paper, we showed the results obtained by calculating the sum of the processing times according to the original algorithms and the proposed method. We will install our proposed method on a test OSPF router and evaluate the performance in the event of network failure.

REFERENCES

Basu, A. and Riecke, J. (2001). Stability issues in OSPF routing. In Proceedings of the 2001 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, SIGCOMM '01, pages 225–236, New York, NY, USA. ACM.

CISCO Systems, I. (2007). Troubleshooting the routing protocols: RST-3901. In Cisco Networkers.

Goyal, M. (2003). Achieving faster failure detection in OSPF networks. In Proceedings of the International Conference on Communications (ICC), pages 296–300.

Jacobson, V. (1988). Congestion avoidance and control. Computer Communication Review, 18(4):314–329.

Jacobson, V. (1990). Berkeley TCP evolution from 4.3-Tahoe to 4.3-Reno. In Proceedings of the Eighteenth Internet Engineering Task Force, page 365.

Moy, J. T. (1998a). OSPF: Anatomy of an Internet Routing Protocol. Addison-Wesley Professional.

Moy, J. T. (1998b). OSPF version 2. Request For Comments (Standard) RFC 2328, Internet Engineering Task Force.

Nelakuditi, S., Lee, S., Yu, Y., Zhang, Z.-L., and Chuah, C.-N. (2007). Fast local rerouting for handling transient link failures. IEEE/ACM Trans. Networking, 15:359–372.

Stevens, W. R. (1994). TCP/IP Illustrated, Volume 1: Protocols. Addison Wesley Longman.

Yuichiro Hei, Tomohiko Ogishi, S. A. and Hasegawa, T. (2007). OSPF failure identification based on LSA flooding analysis. In 10th IFIP/IEEE International Symposium on Integrated Network Management (IM), 2007, pages 717–720.

DCNET2012-InternationalConferenceonDataCommunicationNetworking