PARALLELIZATION OF A MATHEMATICAL MODEL

TO EVALUATE A CCA APPLICATION FOR VANETS

Rocío Murcia-Hernández, Carolina García-Costa, Juan Bautista Tomás-Gabarrón,

Esteban Egea-López and Joan García-Haro

Department of Information and Communications Technologies, Technical University of Cartagena (UPCT)

Plaza del Hospital 1, Cartagena, Spain

Keywords:

OpenMP, VANET, Supercomputing, Cooperative/Chain collision avoidance application.

Abstract:

Vehicular Ad-hoc Networks (VANETs) are becoming not only an extremely important factor in vehicle engineering development but also a key issue for improving road safety. The Cooperative/Chain Collision Avoidance (CCA) application emerges as a solution for decreasing accidents on the road; it is therefore highly convenient to study how the system of vehicles in a platoon will behave at different stages of technology deployment until full penetration in the market. We have developed an analytical model to compute the average number of accidents in a platoon of vehicles. However, due to the model structure, when the CCA technology penetration rate is taken into account, the increase in the number of operations of the analytical model is such that the sequential computation of a numerical solution is no longer feasible. In this paper, with the goal of reducing computation time, we show how we have implemented and parallelized our analytical model so that a solution can be obtained, using OpenMP parallelization techniques in a shared-memory supercomputing environment.

1 INTRODUCTION

Vehicular networks, also known as VANETs, are defined as ad-hoc mobile networks with two main communication features. On the one hand, VANETs are in charge of transmitting information among vehicles (V2V communications). In this first case, cars exchange information without any infrastructure support for regulating the access. On the other hand, communication between vehicles and infrastructure also exists (V2I communications), making possible a connection between cars and a backbone network and reaching in this way those vehicular entities located outside the direct communication range.

One of the aims of vehicular network development is the improvement of road safety. The main goal of these innovative systems is to provide drivers with better knowledge about road conditions, decreasing the number of accidents and their severity, and simultaneously contributing to more comfortable and fluent driving. Other vehicular applications are also considered, such as Internet access, driving cooperation and public information services support.

A Cooperative/Chain Collision Avoidance (CCA) application (Tomas-Gabarron et al., 2010) uses VANET communications for warning drivers and decreasing the number of traffic accidents. CCA takes advantage of vehicles with cooperative communication capabilities, so that these cars are able to react to possible accident risks or emergency situations. The CCA mechanism generates an encapsulated notification which is sent as a message through a one-hop communication scheme to all vehicles within a potential danger coverage (relay schemes are also possible). It should be noted that this VANET application will be deployed gradually, equipping vehicles with the proper hardware and software so that they can communicate effectively within the vehicular environment.

In our research we consider a platoon (or chain) of $N$ vehicles following a leading one, where each vehicle $C_i$, $i \in \{1, \ldots, N\}$, moves at constant velocity. The leading vehicle, $C_0$, stops instantly and the following vehicles start to brake when they are aware of the risk of collision, because of a warning message reception or the perception of a reduction in the speed of the vehicle immediately ahead. To test the worst case situation, vehicles cannot change lane or perform evasive maneuvers.

Murcia-Hernández R., García-Costa C., Bautista Tomás-Gabarrón J., Egea-López E. and García-Haro J. PARALLELIZATION OF A MATHEMATICAL MODEL TO EVALUATE A CCA APPLICATION FOR VANETS. DOI: 10.5220/0003570000050013. In Proceedings of 1st International Conference on Simulation and Modeling Methodologies, Technologies and Applications (SIMULTECH-2011), pages 5-13. ISBN: 978-989-8425-78-2. © 2011 SCITEPRESS (Science and Technology Publications, Lda.)

We have developed a first-approximation mathematical model to calculate the average percentage of accidents in the platoon, varying the number of considered vehicles, their average speed, the average inter-vehicle spacing and the penetration ratio of the CCA technology. Specifically, when the CCA penetration ratio is taken into account, the growth in the number of operations of the analytical model is such that the sequential computation of a numerical solution is no longer feasible. Consequently, we resort to parallelization techniques such as OpenMP to solve those cases that are computationally unapproachable by means of sequential procedures.

Additionally, we execute our programs in the Ben-Arabi supercomputing environment (FPCMur, 2011), taking advantage of the fourth fastest supercomputer in Spain.

In the current work we show how parallelization techniques combined with supercomputing resources make the simulation process more suitable and efficient, which allows us to evaluate the CCA application thoroughly.

The remainder of this paper is organized as follows. In Section 2 we briefly review the related work. In Section 3 the OpenMP environment is briefly reviewed and the Ben-Arabi supercomputer architecture is introduced. A description of the mathematical model, its implementation and its parallelization is provided in Sections 4 and 5. Some results are shown and discussed in Section 6 to illustrate the performance of the resulting parallel algorithm. Finally, conclusions and future work are presented in Section 7.

2 RELATED WORK

So far, most typical high performance computing (HPC) problems have focused either on fields related to the Grand Challenges, defined as fundamental problems in science and engineering, or on Web search databases (Barney, 2010). That is the reason why we consider our VANET mathematical model approximation as a non-classical issue to be solved under HPC conditions, contributing to extend the use of supercomputing to other fields of interest.

In the implementation of our mathematical model we parallelize a sparse matrix-vector multiplication. This operation is considered a relevant computational kernel in scientific applications, and it does not perform optimally on modern processors because of the imbalance between memory and computing power and because of irregular memory access patterns (Liu et al., 2009). In general, a considerable amount of work has been done on sparse matrix-vector multiplication using parallelization techniques (Kotakemori et al., 2008; Goumas et al., 2009; Williams et al., 2009). These papers study in depth the optimal performance of this operation, whereas in this paper we show that even a simpler parallelization routine noticeably shortens the computation time.

Several mathematical models have been developed to study different aspects of VANETs. Most of them are related to vehicle routing optimization (Ning et al., 2009; Wisitpongphan et al., 2007), broadcasting methods (Du et al., 2009; Fasolo et al., 2006; Li et al., 2010), the mobility of vehicles (Harri et al., 2009; Djenouri et al., 2008) and the communication delay time (Abboud and Zhuang, 2009; Fukuyama, 2009; Prasanth et al., 2009). Other related VANET issues have been studied as well, like network connectivity (Khabazian and Mehmet Ali, 2008) or survivability (Xie and Xiao, 2008). In this paper we focus on collision models for a chain of vehicles, particularly those based on physical parameters to assess the collision process itself (Glimm and Fenton, 1980; Touran and Brackstone, 1999; Kim and Jeong, 2010).

However, when searching for related work we find that little work has been done specifically on the parallelization of these VANET mathematical models. Moreover, to the best of our knowledge, only the vehicle routing problem has been approached using parallelization techniques (Cook and Rich, 1999; Ghiani and Guerriero, 2003; Bouthillier and Crainic, 2005).

In summary, in this paper we describe a preliminary (although computationally expensive) model for a CCA application to compute the number of chain collisions, and we address the benefits of using parallelization techniques in the VANET arena.

3 SUPPORTING TOOLS

3.1 The OpenMP Technique

OpenMP is a well-known open standard that provides parallelization mechanisms for shared-memory multiprocessors (Chandra et al., 2001). The OpenMP API supports multi-platform shared-memory programming in Fortran, C and C++ on a wide range of architectures, including Unix and Windows platforms. OpenMP is a scalable and portable model developed by hardware and software vendors which provides shared-memory programmers with a simple and flexible interface for developing parallel applications that can run not only on a personal computer but also on a supercomputer.

OpenMP uses the parallel paradigm known as fork-join with the generation of multiple threads, where a heavy computational task is divided among $k$ lighter threads (fork) and their results are afterwards collected and combined into a single result at the end of the execution (join).

The master thread runs sequentially until it finds an OpenMP directive, at which point a fork is generated with the corresponding slave threads. These threads can be distributed and executed on different processors, thereby decreasing the execution time.

3.2 The Ben-Arabi Supercomputer

Our model is executed on the Ben-Arabi supercomputer, which is located in the Scientific Park of Murcia (Spain). The Ben-Arabi system consists of two different architectures. On the one hand, the central node, called Ben, is an HP Integrity Superdome SX2000 with 128 cores of the Intel Itanium-2 dual-core Montvale processor (1.6 GHz, 18 MB of L3 cache) and 1.5 TB of shared memory. On the other hand, Arabi is a cluster consisting of 102 nodes, which offers a total of 816 Intel Xeon Quad-Core E5450 (3 GHz and 6 MB of L2 cache) processor cores and a total of 1072 GB of shared memory.

We run our mathematical model within a node of the Arabi cluster using 2, 4 and 8 processors to compare the resulting execution times. Since we are using a shared-memory parallelization technique, we are not allowed to combine processors from different nodes.

Next we summarize the technical features of the cluster:

• Capacity: 9.72 Tflops.

• Processor: Intel Xeon Quad-Core E5450.

• Number of nodes: 102.

• Number of processors: 816.

• Processors/Node: 8.

• Memory/Node: 32 nodes of 16 GB and 70 of 8 GB.

• Cache memory/Core: 3 MB (6 MB shared between 2 cores).

• Clock frequency: 3 GHz.

Figure 1: Probability tree that defines the model. $S_{i,j}$ represents the state with $i$ collided vehicles and $j$ successfully stopped vehicles.

4 MODEL DESCRIPTION

We are interested in evaluating the performance of a CCA application for a chain of $N$ vehicles when the technology penetration rate is not 100%. Vehicles drive in convoy, reacting to the first collision of another car according to two possible schemes: starting to brake because of a previously received warning message transmitted by a collided vehicle (if the vehicle is equipped with CCA technology), or starting to decelerate after noticing a reduction in the speed of the vehicle immediately ahead (if the vehicle under consideration is not equipped with CCA technology).

With this model the final outcome of a vehicle depends on the outcome of the preceding vehicles. Therefore, the collision model is based on the construction of the following probability tree. We consider an initial state in which no vehicle has collided. Once the danger of collision has been detected, the first vehicle in the chain, $C_1$ (immediately after the leading one), may collide or stop successfully. From each of these states two possible cases arise as well, that is, the following vehicle in the chain, $C_2$, may either collide or stop successfully, and so on until the last vehicle in the chain. At the last level of the tree we have $N + 1$ possible outcomes (final outcomes) which represent the number of collided vehicles in the chain, that is, from 0 to $N$ collisions (Figure 1).

The transition probability between the nodes of the tree is the probability of collision of the corresponding vehicle in the chain, $p_i$ (or its complement, $1 - p_i$). These probabilities are calculated recursively from different kinematic parameters, such as the average velocity of the vehicles in the chain (used to compute the distance to stop), the average inter-vehicle distance and the driver's reaction time, among others. The exact details of this calculation are crucial for our analytical model, but they are out of the scope of this paper. Let us remark that it is a recursive operation, that is, $p_i = f(p_{i-1})$, and must be done sequentially.

Let us note how every path in the tree from the root to the leaves leads to a possible outcome involving every vehicle in the chain. The probability of a particular path is the product of the transition probabilities that belong to the path. Since there are multiple paths that lead to the same final outcome (leaf node in the tree), the probability of that outcome will be the sum of the probabilities of every path reaching it.
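The path-sum rule can be checked by brute force for a short chain. The following is only an illustrative sketch: it assumes that the collision probability of each vehicle depends only on its position in the chain and is already known, and the function name and the sample probabilities are hypothetical, not taken from the model.

```python
from itertools import product


def outcome_probabilities(p):
    """Return pi, where pi[i] is the probability of exactly i collisions.

    p[k] is the (assumed, precomputed) collision probability of vehicle k+1.
    Every root-to-leaf path is enumerated explicitly, so this is only
    practical for small chains (2**len(p) paths).
    """
    n = len(p)
    pi = [0.0] * (n + 1)
    for path in product((0, 1), repeat=n):  # 1 = collides, 0 = stops
        prob = 1.0
        for collided, pk in zip(path, p):
            # Multiply the transition probabilities along the path.
            prob *= pk if collided else 1.0 - pk
        # Paths with the same number of collisions reach the same leaf.
        pi[sum(path)] += prob
    return pi


pi = outcome_probabilities([0.5, 0.2])  # hypothetical two-vehicle chain
```

The exponential number of paths is what motivates the Markov chain formulation, which reaches the same leaves with a number of states that grows only quadratically in the chain length.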

In order to compute the probabilities of the final outcomes, we can construct a Markov chain whose state diagram is shown in Figure 1 and is based on the previously discussed probability tree. It is a homogeneous Markov chain with $(N+1)(N+2)/2$ states,

$$S_{0,0}, S_{1,0}, S_{0,1}, \ldots, S_{N,0}, S_{N-1,1}, \ldots, S_{1,N-1}, S_{0,N}. \tag{1}$$

The transition matrix $P$ of the resulting Markov chain is a square matrix of dimension $(N+1)(N+2)/2$, which is a sparse matrix, since from each state it is only possible to move to two of the other subsequent states.

Then, we need to compute the probabilities of going from the initial state to each of the $N + 1$ final states in $N$ steps, which are given by the matrix $P^N$. Therefore, the final outcome probabilities are the last $N + 1$ entries of the first row of the matrix $P^N$. Let $\Pi_i$ be the probability of reaching the final outcome with $i$ collided vehicles, that is, state $S_{i,N-i}$. We obtain the average of the total number of accidents in the chain using the weighted sum:

$$N_{acc} = \sum_{i=0}^{N} i \cdot \Pi_i. \tag{2}$$
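As a hedged illustration of this procedure, the sketch below builds the sparse transition structure, multiplies the initial row vector by it $N$ times and evaluates the weighted sum (2). It is not the implementation used in this work: the state indexing, the self-loops added to the final states and the sample probabilities are assumptions made only for the example, and the collision probabilities are again taken to depend only on the vehicle position.

```python
def state_index(i, j):
    """Position of S_{i,j} in the ordering of (1): level i+j, then j."""
    k = i + j
    return k * (k + 1) // 2 + j


def build_transitions(p):
    """Sparse rows: rows[s] is a list of (target_state, probability)."""
    n = len(p)
    size = (n + 1) * (n + 2) // 2
    rows = [[] for _ in range(size)]
    for k in range(n):              # level k decides vehicle k+1
        for j in range(k + 1):
            i = k - j
            s = state_index(i, j)
            rows[s].append((state_index(i + 1, j), p[k]))        # collides
            rows[s].append((state_index(i, j + 1), 1.0 - p[k]))  # stops
    for j in range(n + 1):
        s = state_index(n - j, j)
        rows[s].append((s, 1.0))    # assumption: final states are absorbing
    return rows


def vec_mat(v, rows):
    """Row vector times sparse matrix."""
    out = [0.0] * len(v)
    for s, vs in enumerate(v):
        if vs:
            for t, pr in rows[s]:
                out[t] += vs * pr
    return out


def average_accidents(p):
    n = len(p)
    rows = build_transitions(p)
    v = [0.0] * len(rows)
    v[0] = 1.0                      # start in S_{0,0}
    for _ in range(n):              # N sequential products: first row of P^N
        v = vec_mat(v, rows)
    last = v[-(n + 1):]             # entries for S_{N,0}, ..., S_{0,N}
    pi = last[::-1]                 # pi[i]: probability of i collisions
    return pi, sum(i * x for i, x in enumerate(pi))


pi, n_acc = average_accidents([0.5, 0.2])  # hypothetical two-vehicle chain
```

Only the first row of $P^N$ is needed, so the sketch never forms the matrix power itself; it keeps a single vector and applies the sparse matrix to it repeatedly.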

Our purpose is to evaluate the functionality of the CCA system depending on the current penetration rate of this technology. Therefore, we have to solve the model assuming different technology penetration ratios. This implies that we have to calculate the number of collisions once for each of the possible combinations in the chain of vehicles equipped with and without CCA technology, that is,

$$\binom{N}{m} = \frac{N!}{(N - m)!\, m!}, \tag{3}$$

where $N$ is the total number of vehicles in the chain and $m$ is the number of vehicles equipped with the CCA technology. It is worth noticing that the number of combinations for $m$ vehicles equipped with CCA technology and $N - m$ without it is the same as for $N - m$ vehicles with CCA and $m$ without it. Therefore, in order to analyze the computation time, we solve the model varying the CCA penetration rate between 0% and 50%, since the rest of the cases are computationally (but not numerically) identical. As we can see in Table 1, the number of combinations grows quickly with an increase in the CCA penetration rate as well as with an increase in the number of vehicles.
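The counts given by (3) can be reproduced with a few lines of code. This is a minimal sketch using the Python standard library; the helper name `equipped_layouts` and the boolean layout representation are illustrative choices, not part of the model.

```python
from itertools import combinations
from math import comb


def equipped_layouts(n, m):
    """Yield every placement of m CCA-equipped vehicles in a chain of n.

    Each layout is a tuple of booleans; True marks an equipped vehicle.
    """
    for equipped in combinations(range(n), m):
        chosen = set(equipped)
        yield tuple(k in chosen for k in range(n))


# Number of combinations for 0%, 10%, ..., 50% penetration.
table = {
    n: [comb(n, n * pct // 100) for pct in range(0, 60, 10)]
    for n in (10, 20, 30)
}
```

For 20 vehicles the last entry of `table[20]` is 184756, the number of model evaluations required per inter-vehicle distance at 50% penetration.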

Table 1: Number of combinations of N = {10, 20, 30} vehicles with and without CCA technology.

CCA% | 10 veh. | 20 veh. | 30 veh.

0%   | 1       | 1       | 1

10%  | 10      | 190     | 4060

20%  | 45      | 4845    | 593775

30%  | 120     | 38760   | 14307150

40%  | 210     | 125970  | 86493225

50%  | 252     | 184756  | 155117520

In addition to that, we also aim at evaluating

the impact on the number of accidents of the inter-

vehicular distance, varying this parameter (dist) in a

wide range.

5 IMPLEMENTATION

In this section we ﬁrstly introduce the algorithm for

the model implementation (Algorithm 1) and then, we

explain the method we have used to parallelize it.

Algorithm 1: Computation of the number of collisions in a chain of vehicles.

for all comb in Combinations do
    for all dist in RangeOfDistances do
        for i = 1 to N do
            p_i = f(p_{i−1}, comb, dist, i, veloc, reactTime)
        end for
        for j = 0 to N do
            Π_j = P^N(1, (N+1)(N+2)/2 − j)
        end for
        N_acc = ∑_{j=0}^{N} j · Π_j
    end for
end for

Examining the algorithm we can make the follow-

ing observations:

1. The iterations of the for loop that covers the number of Combinations resulting from the CCA technology penetration rate are independent of each other, so they can be executed in parallel by different threads.

2. The same occurs with the for loop that covers the RangeOfDistances (for the inter-vehicular spacing) to be evaluated.

3. Since the collision probabilities of the vehicles in the platoon are computed recursively, each iteration of the for loop that considers each vehicle in the chain needs the results of the preceding iteration, so this loop should be executed sequentially.

4. To obtain the first row of matrix $P^N$ we have to multiply a vector of dimension $(N+1)(N+2)/2$ by a matrix of dimension $(N+1)(N+2)/2 \times (N+1)(N+2)/2$, $N$ times. The vector-matrix multiplication can also be parallelized so that each thread executes the multiplication of the vector by part of the matrix columns. However, the $N$ multiplications should be done one after the other, that is, sequentially.
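Observations 1 to 3 can be pictured with the following sketch, which distributes the independent (combination, distance) iterations among worker threads while keeping the per-vehicle recursion sequential. It is only an analogy written with the Python standard library, not the OpenMP code evaluated in this paper: `toy_f` is a made-up placeholder for the kinematic recursion, the starting value 1.0 for the leading vehicle is an assumption, and in standard CPython builds threads illustrate the decomposition rather than deliver the speedups reported later.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations, product


def toy_f(p_prev, has_cca, dist):
    """Placeholder recursion; NOT the kinematic formula of the model."""
    p = 0.9 * p_prev * min(1.0, 6.0 / dist)
    return p * 0.5 if has_cca else p


def solve_case(case):
    """One iteration of the two outer loops of Algorithm 1."""
    equipped, dist, n = case
    p_prev, p_values = 1.0, []   # assumption: the leading vehicle has stopped
    for i in range(n):           # sequential: p_i needs p_{i-1}
        p_prev = toy_f(p_prev, i in equipped, dist)
        p_values.append(p_prev)
    return p_values


def run_all(n, m, distances, workers=4):
    """Fork the independent cases across threads and join the results."""
    cases = [
        (frozenset(c), d, n)
        for c, d in product(combinations(range(n), m), distances)
    ]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map returns results in the same order as the input cases.
        return cases, list(pool.map(solve_case, cases))


cases, results = run_all(n=4, m=2, distances=[6, 20, 70])
```

Because no case reads data written by another, no synchronization is needed beyond the final join, which is the property that makes tasks B and C natural candidates for parallelization.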

For the sake of clarity, we will parallelize the fol-

lowing tasks:

• A: Vector-Matrix multiplication.

• B: Average inter-vehicular distance variation.

• C: Technology penetration rate variation.

Next, we will combine the different parallelized

tasks (see Table 2) and execute the resulting programs

in order to assess the actual improvement obtained

from each one.

Table 2: Resulting programs with different parallelized tasks. × means that the corresponding parallelization takes place.

Program   | A | B | C

Program 1 |   |   |

Program 2 | × |   |

Program 3 |   | × |

Program 4 |   |   | ×

Program 5 | × | × |

Program 6 | × |   | ×

Program 7 |   | × | ×

Program 8 | × | × | ×

6 RESULTS

In this section we execute the programs discussed in the previous section (shown in Table 2) in a node of the Arabi cluster using 2, 4 and 8 processors in order to assess the improvement in the execution time achieved by each one.

The parameters used to execute the model are the

following:

• CCA penetration rate: 0%− 50%, in 10% steps.

• Average inter-vehicular distance: 6 − 70 m, in 1

meter steps.

• Number of vehicles: 20 vehicles.

• Average velocity: 33 m/s.

• Average driver’s reaction time: 1 s.

6.1 Execution with 2 Processors

The computation times resulting from the execution of the eight programs with the selected penetration rates of CCA technology using 2 processors are gathered in Table 3 and illustrated in Figure 2.

Figure 2: Execution times in minutes for each program using 2 processors.

Now we focus on the results associated with the 50% CCA penetration rate, since for this value we obtain the highest number of combinations; specifically, for a chain of 20 vehicles we obtain a total of 184756 combinations. Therefore, this penetration rate yields the highest execution time and can be considered the critical case in terms of solving time.

The sequential program (Program 1) takes a total of 297.975 minutes, that is, approximately 5 hours of computation. If we compare the parallelized programs, we conclude that the best result is given by Program 7, with a computation time of 156.433 minutes, which implies around 2.6 hours of calculation time. It is worth mentioning that Program 7 combines the parallelized tasks B and C, parallelizing the for loops that cover the range of average inter-vehicular distances and the number of combinations resulting from the technology penetration rate, respectively. We thus obtain:


Table 3: Execution times in minutes and speedup (SU) for each program using 2 processors.

Program | 0%         | 10%        | 20%        | 30%         | 40%          | 50%

        | Time  SU   | Time  SU   | Time  SU   | Time   SU   | Time    SU   | Time    SU

P1      | 0.002 1.00 | 0.307 1.00 | 7.848 1.00 | 62.876 1.00 | 203.896 1.00 | 297.975 1.00

P2      | 0.002 1.00 | 0.247 1.24 | 6.694 1.17 | 40.693 1.54 | 125.658 1.62 | 188.396 1.58

P3      | 0.001 2.00 | 0.175 1.75 | 4.315 1.82 | 33.858 1.86 | 110.142 1.85 | 159.558 1.87

P4      | 0.003 0.67 | 0.147 2.09 | 3.655 2.15 | 29.323 2.14 | 95.671 2.13 | 157.483 1.90

P5      | 0.001 2.00 | 0.173 1.77 | 4.326 1.81 | 34.208 1.84 | 108.542 1.88 | 161.026 1.85

P6      | 0.004 0.50 | 0.167 1.84 | 4.227 1.86 | 33.009 1.90 | 107.534 1.90 | 156.688 1.90

P7      | 0.002 1.00 | 0.167 1.84 | 4.176 1.88 | 32.771 1.92 | 106.119 1.92 | 156.433 1.90

P8      | 0.002 1.00 | 0.168 1.83 | 4.226 1.86 | 32.962 1.91 | 107.422 1.90 | 158.509 1.88

• Sequential time (P1): 297.975 minutes.

• Parallel time (P7): 156.433 minutes.

The achieved speedup (P1/P7) is 1.9, which implies a reduction of around 47.5% in the execution time.
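The two figures quoted above follow directly from the measured times. A minimal sketch of the arithmetic, with hypothetical helper names:

```python
def speedup(t_seq, t_par):
    """Speedup as the ratio of sequential to parallel time."""
    return t_seq / t_par


def improvement_pct(t_seq, t_par):
    """Relative reduction of the execution time, in percent."""
    return 100.0 * (1.0 - t_par / t_seq)


# Program 1 versus Program 7 with 2 processors at 50% penetration (Table 3).
su = speedup(297.975, 156.433)
imp = improvement_pct(297.975, 156.433)
```

The same two functions applied to the 4-processor and 8-processor measurements give the speedups and percentages reported in the following subsections.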

6.2 Execution with 4 Processors

The computation times resulting from the execution of the eight programs with the selected penetration rates of CCA technology using 4 processors are presented in Table 4 and depicted in Figure 3.

Figure 3: Execution times in minutes for each program using 4 processors.

When the CCA penetration rate equals 50% we reach the highest computational load, so we also analyze the results with this penetration rate using 4 processors, focusing on the best and worst execution times achieved. The reference is still the sequential Program 1, with a duration of 297.93 minutes (around 5 hours). If we compare the parallelized programs, we conclude that the best result is given again by Program 7, with a calculation time of 85.988 minutes (around 1.43 hours). We thus obtain:

• Sequential time (P1): 297.93 minutes.

• Parallel time (P7): 85.988 minutes.

The achieved speedup is 3.46, which implies a reduction of around 71.1% in the execution time.

6.3 Execution with 8 Processors

The computation times resulting from the execution of the eight programs with the selected penetration rates of CCA technology using 8 processors are gathered in Table 5 and illustrated in Figure 4.

Figure 4: Execution times in minutes for each program using 8 processors.

Finally, we analyze what happens if we use 8 processors to solve the problem. Once more, the parallelized Program 7 gives the lowest computation time, 50.402 minutes with a 50% CCA penetration rate. If we compare this result with the execution time of the sequential program (296.691 minutes in Table 5), we obtain an improvement of 83%, that is, a speedup factor of 5.89.


Table 4: Execution times in minutes and speedup (SU) for each program using 4 processors.

Program | 0%         | 10%        | 20%        | 30%         | 40%          | 50%

        | Time  SU   | Time  SU   | Time  SU   | Time   SU   | Time    SU   | Time    SU

P1      | 0.002 1.00 | 0.308 1.00 | 7.838 1.00 | 62.653 1.00 | 203.757 1.00 | 297.930 1.00

P2      | 0.001 2.00 | 0.199 1.55 | 5.053 1.55 | 30.676 2.04 | 94.173 2.16 | 135.907 2.19

P3      | 0.001 2.00 | 0.098 3.14 | 2.473 3.17 | 19.488 3.21 | 59.724 3.41 | 95.360 3.12

P4      | 0.004 0.50 | 0.078 3.95 | 1.998 3.92 | 16.072 3.90 | 51.830 3.93 | 86.175 3.45

P5      | 0.002 1.00 | 0.101 3.05 | 2.494 3.14 | 19.933 3.14 | 63.464 3.21 | 95.158 3.13

P6      | 0.005 0.40 | 0.091 3.38 | 2.251 3.48 | 18.013 3.48 | 59.810 3.40 | 89.064 3.34

P7      | 0.004 0.50 | 0.089 3.46 | 2.232 3.51 | 17.754 3.53 | 57.699 3.53 | 85.988 3.46

P8      | 0.003 0.67 | 0.090 3.42 | 2.245 3.49 | 17.926 3.49 | 59.453 3.43 | 88.422 3.37

Table 5: Execution times in minutes and speedup (SU) for each program using 8 processors.

Program | 0%         | 10%        | 20%        | 30%         | 40%          | 50%

        | Time  SU   | Time  SU   | Time  SU   | Time   SU   | Time    SU   | Time    SU

P1      | 0.002 1.00 | 0.308 1.00 | 7.844 1.00 | 62.695 1.00 | 203.416 1.00 | 296.691 1.00

P2      | 0.003 0.67 | 0.193 1.59 | 4.610 1.70 | 26.578 2.36 | 78.644 2.58 | 117.415 2.53

P3      | 0.001 2.00 | 0.067 4.60 | 1.767 4.44 | 13.634 4.60 | 45.213 4.50 | 62.572 4.74

P4      | 0.008 0.25 | 0.047 6.55 | 1.155 6.79 | 9.310 6.73 | 32.142 6.33 | 54.165 5.48

P5      | 0.002 1.00 | 0.071 4.34 | 1.739 4.51 | 15.125 4.14 | 45.858 4.43 | 62.572 4.74

P6      | 0.005 0.40 | 0.055 5.60 | 1.258 6.23 | 10.158 6.17 | 35.275 5.76 | 54.006 5.49

P7      | 0.008 0.25 | 0.054 5.70 | 1.232 6.37 | 10.041 6.24 | 34.800 5.84 | 50.402 5.89

P8      | 0.007 0.28 | 0.051 6.04 | 1.248 6.28 | 10.143 6.18 | 35.376 5.75 | 53.031 5.59

Figure 5: Execution times for Program 7 with 50% of CCA

penetration rate using 2, 4 and 8 processors.

6.4 Results Discussion

In conclusion, on the one hand, we have achieved an improvement of 83% in the computation time of the most complex case, which can be considered a remarkable improvement. On the other hand, if we compare the best execution times between the two extremes under study, that is, the use of 2 or 8 processors of a shared-memory node in the Arabi cluster, we reach an improvement of 67.78%, which shows that the execution time keeps decreasing as the number of processors increases, as expected. Moreover, we can observe that the programs including the parallelization of task C, which accelerates the loop varying the CCA penetration rate, are the fastest ones. Nevertheless, the results obtained from Program 2 show that the improvement achieved by parallelizing only the vector-matrix multiplication (task A) is already significant, reaching 60.4% using 8 processors.

Analyzing the speedup for Programs 7 and 8, it is surprising that P7, with two parallelized tasks, outperforms P8, which includes one more task. However, this is a common fact in parallel computing due to load balancing and synchronization overhead (OpenMP, 2011). This also explains why all programs including the parallelized task C have similar execution times, since this is the heaviest computational task and it overshadows the improvement derived from the parallelization of tasks A and B.

Let us now compare the results obtained for Program 7, the one with the best execution times, focusing on the 50% CCA penetration rate since, as already mentioned, this is the heaviest option in terms of computational load. These results are depicted in Figure 5. We find an inverse relationship between computation time and the number of processors in use: when we double the number of processors, the execution time of Program 7 is reduced almost by half. Specifically, the speedup achieved when passing from 2 to 4 processors is 1.82, and from 4 to 8 processors, 1.7. However, this speedup is limited according to Amdahl's law (Amdahl, 1967). We have calculated for each program the theoretical speedup obtained from this law, as depicted in Figure 6.

Amdahl's law states that if $\alpha$ is the proportion of a program that can be made parallel, then the maximum speedup, $SU$, that can be achieved by using $n$ processors is:

$$SU = \frac{1}{(1 - \alpha) + \frac{\alpha}{n}}. \tag{4}$$

We can estimate $\alpha$ by using the measured speedup $SU$ on a specific number of processors $sn$ as follows:

$$\alpha_{estimated} = \frac{\frac{1}{SU} - 1}{\frac{1}{sn} - 1}. \tag{5}$$
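As a worked check of (4) and (5), the sketch below estimates the parallel fraction from the 8-processor speedups of Table 5 and evaluates the asymptotic limit $1/(1-\alpha)$ that (4) approaches as $n$ grows. The function names are illustrative only.

```python
def estimate_alpha(su, sn):
    """Equation (5): parallel fraction from a measured speedup on sn processors."""
    return (1.0 / su - 1.0) / (1.0 / sn - 1.0)


def amdahl_speedup(alpha, n):
    """Equation (4): maximum speedup with n processors."""
    return 1.0 / ((1.0 - alpha) + alpha / n)


def speedup_limit(alpha):
    """Limit of equation (4) as the number of processors grows without bound."""
    return 1.0 / (1.0 - alpha)


# Measured speedups at 50% penetration with 8 processors (Table 5).
alpha_p7 = estimate_alpha(5.89, 8)   # about 0.949
alpha_p2 = estimate_alpha(2.53, 8)   # about 0.691
limit_p7 = speedup_limit(alpha_p7)   # about 19.5
limit_p2 = speedup_limit(alpha_p2)   # about 3.2
```

With these estimates, Program 2 is bounded by a speedup of roughly 3.2, close to the 2.53 already measured, whereas Program 7 has a bound of roughly 19.5.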

The results show that for Program 2 the speedup obtained with 8 processors is almost at its limit, but the speedup for Program 7 can still grow up to about 20, which implies reducing the execution time to around 15 minutes. Therefore, as future work, we will try to execute our programs with a larger number of processors in order to evaluate the system with a larger number of vehicles.

Figure 6: Theoretical speedup limits calculated from Amdahl's law (speedup versus number of processors, from 1 to 16384; legend entries include P3,P5 and P4,P6).

7 CONCLUSIONS AND OUTLOOK

Thanks to OpenMP parallelization techniques running in a shared-memory supercomputing environment, we succeeded in evaluating the performance of a CCA application at different stages of technology deployment. To conclude, we were able to solve in only 50.402 minutes a program with a sequential execution time of 297.975 minutes.

As future work, on the one hand we will try to execute our programs with a larger number of processors in order to evaluate the system with a larger number of vehicles. On the other hand, we aim to deepen and improve our analytical model and to study further, within this model, the use of parallelization on a supercomputing platform. We are also working on similar tasks to improve the efficiency of the VANET simulation environments we are using in order to validate our mathematical analyses.

ACKNOWLEDGEMENTS

This research has been supported by the MICINN/FEDER project grant TEC2010-21405-C02-02/TCM (CALM) and Fundación Séneca RM grant 00002/CS/08 FORMA. It is also developed in the framework of "Programa de Ayudas a Grupos de Excelencia de la Región de Murcia, de la Fundación Séneca, Agencia de Ciencia y Tecnología de la RM". J. García-Haro acknowledges personal grant PR2009-0337 (MICINN/SEU/DGU Programa Nacional de Movilidad de Recursos Humanos de Investigación). J. B. Tomás-Gabarrón thanks the Spanish MICINN for an FPU (REF AP2008-02244) pre-doctoral fellowship. C. García-Costa acknowledges the Fundación Séneca for an FPI (REF 12347/FPI/09) pre-doctoral fellowship. Rocío Murcia-Hernández acknowledges the personal grant she holds within the FORMA project and, especially, the Supercomputing Center of the Scientific Park Foundation in Murcia (Spain).

REFERENCES

Abboud, K. and Zhuang, W. (2009). Modeling and analysis for emergency messaging delay in vehicular ad hoc networks. In IEEE Global Telecommunications Conference.

Amdahl, G. M. (1967). Validity of the single processor approach to achieving large scale computing capabilities. In Proceedings of the Spring Joint Computer Conference.

Barney, B. (2010). Introduction to parallel computing. https://computing.llnl.gov/tutorials/parallel_comp/.

Bouthillier, A. L. and Crainic, T. G. (2005). A cooperative parallel meta-heuristic for the vehicle routing problem with time windows. In Computers & Operations Research.


Chandra, R., Dagum, L., et al. (2001). Parallel Programming in OpenMP. Morgan Kaufmann Publishers.

Cook, W. and Rich, J. L. (1999). A parallel cutting-plane algorithm for the vehicle routing problem with time windows. In Computational and Applied Mathematics, Rice University.

Djenouri, D., Nekka, E., et al. (2008). Simulation of mobility models in vehicular ad hoc networks. In Proceedings of the 2008 Ambi-Sys workshop on Software Organisation and MonIToring of Ambient Systems.

Du, S., Liu, F. Q., et al. (2009). On-demand power control algorithm based on vehicle ad hoc network. In Jisuanji Gongcheng/Computer Engineering.

Fasolo, E., Zanella, A., et al. (2006). An effective broadcast scheme for alert message propagation in vehicular ad hoc networks. In IEEE International Conference on Communications.

FPCMur (2011). Scientific Park in Murcia, official webpage. http://www.cesmu.es.

Fukuyama, J. (2009). A delay time analysis for multi-hop V2V communications over a linear VANET. In IEEE Vehicular Networking Conference.

Ghiani, G., Guerriero, F., et al. (2003). Real-time vehicle routing: Solution concepts, algorithms and parallel computing strategies. In European Journal of Operational Research.

Glimm, J. and Fenton, R. E. (1980). An accident-severity analysis for a uniform-spacing headway policy. In IEEE Transactions on Vehicular Technology.

Goumas, G., Kourtis, K., et al. (2009). Performance evaluation of the sparse matrix-vector multiplication on modern architectures. In The Journal of Supercomputing.

Harri, J., Filali, F., et al. (2009). Mobility models for vehicular ad hoc networks: a survey and taxonomy. In IEEE Communications Surveys & Tutorials.

Khabazian, M. and Mehmet Ali, M. K. (2008). A performance modeling of connectivity in vehicular ad hoc networks. In IEEE Transactions on Vehicular Technology.

Kim, T. and Jeong, H. Y. (2010). Crash probability and error rates for head-on collisions based on stochastic analyses. In IEEE Transactions on Intelligent Transportation Systems.

Kotakemori, H., Hasegawa, H., et al. (2008). Performance evaluation of parallel sparse matrix-vector products on SGI Altix3700. In OpenMP Shared Memory Parallel Programming. Lecture Notes in Computer Science.

Li, L. J., Liu, H. F., et al. (2010). Broadcasting methods in vehicular ad hoc networks. In Ruan Jian Xue Bao (Journal of Software).

Liu, S., Zhang, Y., et al. (2009). Performance evaluation of multithreaded sparse matrix-vector multiplication using OpenMP. In IEEE International Conference on High Performance Computing and Communications.

Ning, Z., Jung, Y., et al. (2009). Route optimization for GPSR in VANET. In IEEE International Advance Computing Conference.

OpenMP (2011). OpenMP, official webpage. http://openmp.org.

Prasanth, K., Duraiswamy, K., et al. (2009). Minimizing end-to-end delay in vehicular ad hoc network using edge node based greedy routing. In First International Conference on Advanced Computing.

Tomas-Gabarron, J. B., Egea-Lopez, E., et al. (2010). Performance evaluation of a CCA application for VANETs using IEEE 802.11p. In IEEE International Conference on Communications Workshops (ICC).

Touran, A., Brackstone, M. A., et al. (1999). A collision model for safety evaluation of autonomous intelligent cruise control. In Accident Analysis and Prevention.

Williams, S., Oliker, L., et al. (2009). Optimization of sparse matrix-vector multiplication on emerging multicore platforms. In Parallel Computing.

Wisitpongphan, N., Bai, F., et al. (2007). On the routing problem in disconnected vehicular ad-hoc networks. In IEEE International Conference on Computer Communications.

Xie, B., Xiao, X. Q., et al. (2008). Survivability model for vehicular ad-hoc network based on Markov chain. In Jisuanji Yingyong/Journal of Computer Applications.
