Tsunami and Storm Surge Simulation Using Low Power Architectures
Concept and Evaluation
Dominik Schoenwetter¹, Alexander Ditter¹, Bruno Kleinert¹, Arne Hendricks¹, Vadym Aizinger², Harald Koestler³ and Dietmar Fey¹
¹Chair of Computer Science 3 (Computer Architecture), Friedrich-Alexander-University Erlangen-Nürnberg (FAU), Martensstr. 3, 91058 Erlangen, Germany
²Chair of Applied Mathematics (AM1), Friedrich-Alexander-University Erlangen-Nürnberg (FAU), Cauerstr. 11, 91058 Erlangen, Germany
³Chair of Computer Science 10 (System Simulation), Friedrich-Alexander-University Erlangen-Nürnberg (FAU), Cauerstr. 11, 91058 Erlangen, Germany
Keywords:
Multiscale Simulation, Environmental Modeling, Crisis Modeling and Simulation.
Abstract:
Performing a tsunami or storm surge simulation in real time is a highly challenging research topic that calls
for a collaboration between mathematicians and computer scientists. One must combine mathematical models
with numerical methods and rely on computational performance and code parallelization to produce accurate
simulation results as fast as possible. The traditional modeling approaches require a lot of computing power
and significant amounts of electrical energy; they are also highly dependent on uninterrupted access to a reliable power supply. This paper presents a concept for developing suitable low power hardware architectures for tsunami and storm surge simulations based on cooperative software and hardware simulation. The main
goal is to be able – if necessary – to perform simulations in-situ and battery-powered. For flood warning sys-
tems installed in regions with weak or unreliable power and computing infrastructure, this would significantly
decrease the risk of failure at the most critical moments.
1 INTRODUCTION
Accurately predicting floods in endangered coastal re-
gions precipitated by catastrophic geophysical events
such as earthquakes, landslides, hurricanes, etc. re-
quires accurate numerical simulations that utilize
two- or three-dimensional regional or global ocean
models. The computational resources needed to run
these models at a sufficient spatial resolution and in
real time, i.e., with a hard upper bound for the allo-
cated time to produce a usable solution, greatly exceed the capabilities of conventional workstations.
This currently precludes the use of accurate simula-
tion software in tsunami and flood warning systems
installed in areas with weak or unreliable compu-
tational, communication, and power infrastructure, which is the case in many of the affected regions.
Currently, a number of less sophisticated ap-
proaches are being offered for installation in flood
warning systems. These solutions rely on one
of the following techniques: (i) simulation at a coarser
grid resolution, (ii) using simpler or less accurate nu-
merical methods, (iii) searching in a database of pre-
computed scenarios, or (iv) running the simulation re-
motely (e.g., in a cloud). All these alternatives increase the risk of the flood prediction being either too late or too inaccurate, incurring a potentially high cost
in terms of human lives and property damage.
In this paper, we present a concept for an afford-
able, reliable, and energy-efficient flood simulation
system designed to mitigate the aforementioned prob-
lems of current systems. We analyze the requirements
for such a system in terms of performance, power ef-
ficiency, and reliability with the ultimate goal of de-
signing a combined hardware/software-system capa-
ble of carrying out a flood simulation in parallel in
an entirely battery-powered manner using low power
compute units.
In order to speed up the development of such a
system, we rely on its simulation as a first step. This
simulation must not only cover the functional proper-
ties, but also provide an estimate of the system’s energy consumption and other relevant non-functional
requirements. Even though our algorithm is highly
Schoenwetter D., Ditter A., Kleinert B., Hendricks A., Aizinger V., Koestler H. and Fey D.
Tsunami and Storm Surge Simulation Using Low Power Architectures - Concept and Evaluation.
DOI: 10.5220/0005566603770382
In Proceedings of the 5th International Conference on Simulation and Modeling Methodologies, Technologies and Applications (SIMULTECH-2015), pages 377-382
ISBN: 978-989-758-120-5
Copyright © 2015 SCITEPRESS (Science and Technology Publications, Lda.)
scalable (Aizinger et al., 2013), we do not consider
the use of either dedicated GPU-based compute units or GPGPUs, as we currently do not have any reliable power models or simulation tools for them. In addition, we want to focus our study on processor architectures available in open-source form, which also
precludes usage of commercially available GPUs.
Our simulation relies on 2D/3D shallow-water
solver UTBEST/UTBEST3D that is based on the dis-
continuous Galerkin finite element method (Aizinger
and Dawson, 2002; Dawson and Aizinger, 2005). We
decided on this type of numerical algorithm for a
number of reasons: It runs on unstructured meshes, which allows us to utilize computational grids with optimal spatial resolution in the areas of interest; the code has shown excellent parallel scalability; the implementation possesses adaptive refinement capabilities and is thus capable of automatically increasing the mesh resolution in critical locations not known in advance; and the user can choose between different approximation orders on the same mesh, providing a simple way to obtain optimal accuracy for a given performance/energy cost.
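For reference, a simplified form of the two-dimensional shallow-water equations underlying UTBEST can be stated as follows (this is a sketch; the full model in (Aizinger and Dawson, 2002) includes additional source terms, here lumped into F):

```latex
\begin{aligned}
  \partial_t \xi + \nabla \cdot (H\mathbf{u}) &= 0, \\
  \partial_t (H\mathbf{u}) + \nabla \cdot (H\mathbf{u}\otimes\mathbf{u})
    + gH\,\nabla \xi &= \mathbf{F},
\end{aligned}
```

where ξ denotes the free surface elevation, H the total water depth, u the depth-averaged horizontal velocity, g the gravitational acceleration, and F collects bottom friction, Coriolis, and external forcing terms.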
The rest of the paper is organized as follows: Sec-
tion 2 provides an overview of the state-of-the-art in
the use of low power architectures for high perfor-
mance computing and energy optimization. Section 3
presents our concept for the simulation based eval-
uation of suitable low power architectures. A pre-
liminary evaluation of available multi- and many-core
hardware simulation environments and their suitabil-
ity for our purposes is given in Section 4. The pa-
per concludes with a summary and outlook of future
work.
2 RELATED WORK
There is a significant body of research in the field
of utilizing low power architectures for high perfor-
mance computing (HPC) and in the optimization of
energy efficiency for HPC applications.
Rajovic et al. investigated the usage of low power ARM¹ architectures and SoCs (systems on chip) as a means to reduce the cost of high performance computing (Rajovic et al., 2013a). They conclude that low power ARM-based SoCs have promising characteristics for high performance computing.
In 2013, Göddeke et al. presented a paper on energy-to-solution comparisons between different architectures for different classes of numerical methods for partial differential equations (Göddeke et al., 2013).

¹ http://www.arm.com/
They showed that energy to solution and energy per
time step improvements up to a factor of three are
possible when using the ARM-based Tibidabo cluster
(Rajovic et al., 2014), instead of an x86-based clus-
ter. The x86 cluster used for the reference measure-
ments was the Nehalem sub-cluster of the LiDOng
machine provided by TU Dortmund (ITMC TU Dort-
mund, 2015).
A study comparing the performance as well as
the energy consumption of different low power and
general-purpose architectures was published by Cas-
tro et al. (Castro et al., 2013). Based on the Traveling-
Salesman problem (Applegate et al., 2011), they in-
vestigated time to solution and energy to solution for
an Intel Xeon E5-4640 Sandy Bridge-EP, a low power
Kalray MPPA-256 many-core processor (KALRAY
Corporation, 2015), as well as for a low power
CARMA board from SECO (NVIDIA Corporation,
2015a). The results show that the CARMA board and the MPPA-256 many-core processor achieve better results than the Xeon E5 in terms of energy to solution. Concerning time to solution, the Xeon E5 performed better than the CARMA board but not as well as the low power MPPA-256 many-core processor.
A work considering low power processors and ac-
celerators in terms of energy aware high performance
computing was published in 2013 (Rajovic et al.,
2013b). There, a number of different HPC micro-benchmarks were used to determine the energy to solution. The architectures evaluated were the NVIDIA
Tegra 2 (NVIDIA Corporation, 2015b) and Tegra 3
(NVIDIA Corporation, 2015c) SoCs. The results
show that drastic energy to solution improvements are
possible on the newer Tegra 3 SoC in comparison to
the Tegra 2 SoC (reduction of 67% on average). Fur-
thermore, the authors conclude that the usage of in-
tegrated GPUs in low power architectures, such as
Tegra 2 and Tegra 3, can improve the overall energy
efficiency.
3 CONCEPT
Our goal is to develop an integrated hard-
ware/software system that can satisfy (i) the
functional requirements, i.e., computational perfor-
mance, accuracy, and efficiency as well as (ii) the
non-functional requirements, such as the energy
efficiency and cost effectiveness. The project require-
ments are formulated in this slightly unusual vein,
where the non-functional requirements are given
comparable importance to the functional ones. This
is caused by the fact that the operating environment
SIMULTECH2015-5thInternationalConferenceonSimulationandModelingMethodologies,Technologiesand
Applications
378
of the system sets a number of rather rigid constraints
for the entire solution.
The flow of events in this setting provides the
most important boundary condition. As soon as re-
mote sensors provide a warning and supply data for
a geophysical event (e.g., an earthquake) capable of
causing flooding, the time remaining until the landfall
of the wave is clearly defined. The available infor-
mation then needs to be processed as fast as possi-
ble in order to predict the magnitude of the flood and
identify the affected areas. In addition, from this mo-
ment on, the availability of uninterrupted power sup-
ply is critical in order to complete the flood simula-
tion. One has to note here that this hard upper bound
for the time-to-solution can strongly vary on a case-
by-case basis depending on a variety of factors (dis-
tance from shore, wave speed, area and topography of
affected regions, etc.). If one also includes other non-functional requirements, such as the size and speed of the available computational hardware and the power source (external or battery), this clearly motivates the need for highly adaptive flood modeling software in the
warning system. Many of the regions at risk from
such catastrophic events cannot boast of either good
power infrastructure or reliable communication net-
works. Thus the minimum set of requirements must include the following two: (i) the hardware platform must be able, if the need arises, to complete the simulation on battery power, and (ii) the computation must be carried out locally.
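To illustrate how the hard upper bound on time-to-solution arises, the following sketch (not part of the paper; function name and all parameter values are hypothetical) estimates the remaining compute budget from the open-ocean shallow-water wave speed c = sqrt(gH):

```python
# Illustrative sketch: estimating the hard upper bound for
# time-to-solution once remote sensors report a wave-generating event.
# All names and numbers are hypothetical placeholder values.

def time_budget_seconds(distance_to_shore_m: float, mean_depth_m: float,
                        safety_margin_s: float = 600.0) -> float:
    """Estimate the compute time remaining before wave landfall.

    A tsunami in the open ocean travels at roughly c = sqrt(g * H),
    the shallow-water wave speed for mean water depth H.
    """
    g = 9.81  # gravitational acceleration, m/s^2
    wave_speed = (g * mean_depth_m) ** 0.5       # m/s
    arrival = distance_to_shore_m / wave_speed   # s until landfall
    return max(arrival - safety_margin_s, 0.0)

# Example: epicenter 200 km offshore over 4000 m mean depth leaves
# roughly seven minutes of compute budget after the safety margin.
budget = time_budget_seconds(200_000, 4000)
```

The strong dependence of this budget on distance and topography is exactly why the simulation software must be adaptive in accuracy and cost.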
Since the algorithm can be executed in a highly parallel manner, we will focus on parallel computer architectures. As mentioned in the introduction, GPGPU
computing is not in the scope of our work. We
will simulate the different functional as well as non-
functional properties of the computation of the flood
simulation on different virtual hardware platforms
first. This enables us to produce a preliminary deci-
sion on which hardware to use, thus saving resources
and time during the development phase. With re-
gard to the non-functional properties of the system,
we need to know the total computation time and to
obtain an estimate of the energy consumption for the
entire run.
In order to obtain optimal results for our simula-
tions, we need to limit the number of possible com-
puter architectures. For example, although the Intel x86 architecture is known to deliver high computational performance, it is not in our scope due to its high power consumption under heavy computational loads. Yet, driven by the advances in smartphones, the ARM CPU platform has evolved strongly and turned into a high performance and low power consumption architecture. It enables
today’s devices to run on battery power for a long time
under relatively high computational loads.
Another possible architecture is the well-known SPARC V8 architecture. Here we choose the LEON processor (an open source SPARC processor developed by AEROFLEX Gaisler), as there have been successful investigations into the estimation of its non-functional properties. Furthermore, the SPARC V8-based LEON processor is well suited to our purposes due to its fault tolerance, high configurability, and relatively cost-effective licensing. Therefore we concentrate our investigations on the promising ARM platform as well as on the LEON processor family.
4 PRELIMINARY EVALUATION
OF SIMULATION
ENVIRONMENTS
We intend to focus on building heterogeneous as well as homogeneous low power multi- and many-core architectures. Therefore, virtual environments that enable the simulation of those systems are of interest to
us and have to be evaluated with respect to their suit-
ability for our purposes. There are three further key
aspects that are of importance for our choice: Simula-
tion performance, the capability of power estimation
and modeling, and last but not least, the availability
of low power processor models. The next paragraphs
briefly describe available multi- and many-core sim-
ulation environments and discuss their suitability for
our purposes; Table 1 gives an overview of chief pa-
rameters of each simulation environment relevant in
the context of our project.
An open-source multi- and many-core simulator
is Graphite (Miller et al., 2010); it offers the possibil-
ity to simulate hundreds or even thousands of cores.
Graphite is not a fully cycle-accurate simulator; instead, it uses different techniques to provide accurate performance results. The simulation environment offers
processors, a memory subsystem, cache models as
well as a network for realizing interconnections. All
these models use further analytical timing models to
guarantee accurate results. Two processor models are supported: iocoom (in-order core model with out-of-
order memory completion) and simple (in-order core
model that adds all latencies). Power modeling for
the processor, the caches, and the network (Kurian
et al., 2014) is also supported. However, the focus of
Graphite is not on embedded systems and low power
architectures.
Based on the Graphite simulation infrastructure
Carlson et al. (Carlson et al., 2011) developed Sniper.
TsunamiandStormSurgeSimulationUsingLowPowerArchitectures-ConceptandEvaluation
379
Table 1: Comparison of different simulation environments for our purposes.

Simulation Environment | Multi- and Many-Core Simulations Possible | Simulation Speed¹ | Power Estimation Possible | Availability of Low Power Processor Models
-----------------------|-------------------------------------------|-------------------|---------------------------|-------------------------------------------
Graphite               | yes                                       | mid to high       | yes                       | none
Sniper                 | yes                                       | high              | yes                       | none
SoCLib                 | yes                                       | slow to mid       | yes                       | mid
HORNET                 | yes                                       | slow to mid       | yes                       | very low
gem5                   | yes                                       | slow to mid       | yes                       | mid
QEMU                   | yes                                       | very high         | no                        | low
OVP                    | yes                                       | very high         | yes                       | high

¹ The speed of instruction accurate simulation is used as the reference value and thus for the classification of simulation speed.
Sniper enhances Graphite by an interval simulation
approach, a more detailed timing model, and operat-
ing system modeling improvements. It allows faster
and more precise simulations for exploring homoge-
neous and heterogeneous multi- and many-core archi-
tectures than Graphite. Sniper supports power model-
ing and estimation using McPAT (Li et al., 2013) and
custom DRAM power models (Heirman et al., 2012).
Unfortunately, Sniper is an x86-only tool.
An environment that focuses on virtual prototyp-
ing of multi-processor system on chips (MP-SoC) is
SoCLib (SocLib Project, 2015). SoCLib provides a
wide range of processor and peripheral models, for
example MIPS32 and ARM. Furthermore, the usage
of real-time operating systems like eCos is supported.
This environment enables simulations at cycle- and
bit-accurate level. Since all models are written in Sys-
temC (Accellera Systems Initiative, 2015), the ability
to simulate at transaction level is provided as well.
To enable power and energy estimation, Atitallah et al. (Atitallah et al., 2006) developed energy
models for different hardware components that can be
used in conjunction with SoCLib. In terms of sim-
ulation speed, the cycle accurate simulation level is
very slow in comparison to the instruction accurate
level: Weaver and McKee (Weaver and McKee, 2008) showed that runtime differences ranging from hours up to days are possible. As a consequence, cycle accurate simulations
are not an option for the simulation of large many-
core systems in the context of our project.
A cycle-level multi- and many-core simulator for
Network on Chip (NoC) architectures is HORNET
(Lis et al., 2011). The simulator provides a variety of
memory hierarchies, interconnect geometries as well
as accurate power modeling. HORNET can operate in
full multi-core mode, i.e., using a built-in MIPS core
simulator in addition to the network model. Unfor-
tunately, HORNET only offers one single-cycle in-
order MIPS core. To increase simulation performance, a loose synchronization mechanism is supported; as a result of the loose synchronization, the accuracy of performance measurements suffers.
The gem5 simulation environment (Binkert et al.,
2011) combines the benefits of the M5 (Binkert et al.,
2006) and the GEMS (GEMS Development Team, 2015) environments. M5 is a configurable simula-
tion environment offering multiple ISAs (instruction
set architectures) as well as various CPU models. The
CPUs can be configured to operate on different levels
of detail and accuracy. In combination with GEMS,
gem5 provides a detailed and flexible memory system
as well as interconnection models. A wide range of
instruction set architectures (e.g. x86, ALPHA, ARM
or MIPS) is supported by gem5. Recently, power modeling and estimation for low power ARM architectures has also become possible (Endo et al., 2015). This simulation environment is not designed to be purely instruction accurate and targets low power architectures only partially.
QEMU (Bellard, 2005) is an emulator and virtual machine (VM) for the Intel x86 architecture that can also emulate and virtualize a variety of systems of
differing architectures. When used as an emulator,
QEMU operates on an instruction accurate level. Typ-
ically, QEMU is used as a VM in hosting centres, but
can also be used as a debugging platform for embed-
ded ARM systems. QEMU is not meant to be an ex-
tensible framework, even though it is possible to im-
plement new platforms. Among the emulated ARM
platforms are, e.g., the Nokia N810 tablet and the ARM Versatile Express for Cortex-A9. QEMU does not support
power measurements and estimations.
The instruction accurate simulation technology
from Open Virtual Platforms (OVP) was developed
for high performance simulation of low power multi-
and many-core architectures. Simulations can run at speeds of hundreds of MIPS, often faster than real time. Debugging applications, which run on the virtual hard-
ware, as well as analysis of virtual platforms contain-
ing multiple processor and peripheral models is pos-
sible. A wide range of older and current processor
models is available, e.g. for ARM, MIPS, Renesas,
SIMULTECH2015-5thInternationalConferenceonSimulationandModelingMethodologies,Technologiesand
Applications
380
or PowerPC processor families. A number of prede-
fined platforms (Altera Cyclone V SoC, MIPS malta,
etc.) is also available for the system. Furthermore,
the OVP simulation technology offers the ability to
create new processor architectures and other platform
components (Imperas Software Limited, 2014). The
OVP simulator supports measuring instruction counts
within a program, thus permitting in-depth analysis
of specific code fragments. Power modeling for selected processors is also possible using OVP, as introduced in (Rosa et al., 2014).
5 CONCLUSION
In this paper, we present a concept that enables the
determination of suitable low power multi- and many-
core architectures for tsunami and storm surge sim-
ulation. The realization of the concept relies on a
virtual environment that enables emulation and sim-
ulation of different low power multi- and many-core
hardware architectures. For that reason, we conducted
a preliminary investigation of available multi- and
many-core simulation environments. Three aspects
played an important role in our choice: Simulation
performance, the ability to estimate and model power
consumption, and the availability of low power pro-
cessor models in the simulation system. As can be
seen from Table 1, OVP appears to be the best solu-
tion for the realization of our concept.
6 FUTURE WORK
Preliminary estimations of time-to-solution and en-
ergy consumption are necessary to improve design de-
cisions when developing hardware architectures. This
is especially true in the case of a power-aware flood
warning system. As already discussed, cycle accu-
rate simulation is not an option for developing a many-core system due to the poor simulation speed. How-
ever, some work has already been done on utiliz-
ing high-level functional simulation enhanced by a
mechanistic extension, thus including non-functional
properties such as time and energy consumption on
given hardware. Functional simulators, such as instruction set simulators (ISS), can be considered interpreters of binary executables that simulate the internal registers of a given system. Our general ap-
proach is to estimate the energy- and time-to-solution
by classifying a given instruction set into instruction
categories, particularly regarding their non-functional
characteristics. This can be done by using micro-
benchmarks as well as existing data (Berschneider
et al., 2014). The process of categorizing and col-
lecting the information can be regarded as an initial
training phase. Once complete, the obtained infor-
mation from each single category can thus be used
to analyze an instruction mix with regard to its mean
energy consumption, computation time, or other non-
functional properties. The ISS is used to provide the instruction mix from a binary executable compiled with a given toolchain. This approach works well for
simple embedded architectures with simple pipeline
design, no caches, and in-order execution. Evaluations were already performed for different, mostly compute-bound algorithms, and the results were very promising so far (mean relative estimation errors of under 5%). Since the flood simulation algorithm is also compute bound, it seems promising to choose this approach for our system as well. How-
ever, further research has to be done to include effects
of more complicated pipeline structures, one or more
data/instruction caches and possible out-of-order exe-
cution.
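The instruction-category approach described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the category names and per-instruction costs are hypothetical stand-ins for the values a real training phase would obtain from micro-benchmarks.

```python
# Minimal sketch of the instruction-category estimation approach.
# Category names and per-instruction costs below are hypothetical
# placeholders for micro-benchmark ("training phase") results.

from collections import Counter

# Mean energy (nJ) and time (ns) per instruction, per category.
CATEGORY_COSTS = {
    "alu":    {"energy_nJ": 0.5, "time_ns": 1.0},
    "mul":    {"energy_nJ": 1.2, "time_ns": 3.0},
    "load":   {"energy_nJ": 2.0, "time_ns": 4.0},
    "store":  {"energy_nJ": 2.2, "time_ns": 4.0},
    "branch": {"energy_nJ": 0.8, "time_ns": 2.0},
}

def estimate(instruction_mix: Counter) -> tuple:
    """Fold an ISS-provided instruction mix into total energy (nJ)
    and total time (ns) using the per-category mean costs."""
    energy = sum(CATEGORY_COSTS[c]["energy_nJ"] * n
                 for c, n in instruction_mix.items())
    time = sum(CATEGORY_COSTS[c]["time_ns"] * n
               for c, n in instruction_mix.items())
    return energy, time

# Example mix as an ISS might report it for a compute-bound kernel.
mix = Counter(alu=8_000, mul=1_500, load=2_000, store=1_000, branch=500)
energy_nJ, time_ns = estimate(mix)
```

The simple weighted sum mirrors why the approach works well for in-order cores without caches: each instruction's cost is nearly independent of context, which no longer holds once pipelining effects, caches, or out-of-order execution dominate.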
REFERENCES
Accellera Systems Initiative (2015). Official systemc
website. http://www.systemc.org. Last visit on
02.02.2015.
Aizinger, V. and Dawson, C. (2002). A discontinuous Galerkin method for two-dimensional flow and transport in shallow water. Advances in Water Resources, 25(1):67–84.
Aizinger, V., Proft, J., Dawson, C., Pothina, D., and Negusse, S. (2013). A three-dimensional discontinuous Galerkin model applied to the baroclinic simulation of Corpus Christi Bay. Ocean Dynamics, 63(1):89–113.
Applegate, D., Bixby, R., Chvátal, V., and Cook, W. (2011). The Traveling Salesman Problem: A Computational Study. Princeton Series in Applied Mathematics. Princeton University Press.
Atitallah, R., Niar, S., Greiner, A., Meftali, S., and
Dekeyser, J. (2006). Estimating energy consumption for an MPSoC architectural exploration. In Grass, W.,
Sick, B., and Waldschmidt, K., editors, Architecture
of Computing Systems - ARCS 2006, volume 3894 of
Lecture Notes in Computer Science, pages 298–310.
Springer Berlin Heidelberg.
Bellard, F. (2005). QEMU, a Fast and Portable Dynamic
Translator. In USENIX Annual Technical Conference,
FREENIX Track, pages 41–46.
Berschneider, S., Herglotz, C., Reichenbach, M., Fey, D.,
and Kaup, A. (2014). Estimating video decoding en-
ergies and processing times utilizing virtual hardware.
In Proc. 3PMCES Workshop. Design, Automation &
Test in Europe (DATE).
Binkert, N., Beckmann, B., Black, G., Reinhardt, S. K., Saidi, A., Basu, A., Hestness, J., Hower, D. R., Krishna, T., Sardashti, S., Sen, R., Sewell, K., Shoaib, M., Vaish, N., Hill, M. D., and Wood, D. A. (2011). The gem5 simulator. SIGARCH Comput. Archit. News, 39(2):1–7.
Binkert, N. L., Dreslinski, R. G., Hsu, L. R., Lim, K. T.,
Saidi, A. G., and Reinhardt, S. K. (2006). The M5 simulator: Modeling networked systems. IEEE Micro,
26:52–60.
Carlson, T. E., Heirman, W., and Eeckhout, L. (2011).
Sniper: Exploring the level of abstraction for scalable
and accurate parallel multi-core simulations. In In-
ternational Conference for High Performance Com-
puting, Networking, Storage and Analysis (SC), pages
52:1–52:12.
Castro, M., Francesquini, E., Nguélé, T. M., and Méhaut, J.-F. (2013). Analysis of computing and energy performance of multicore, NUMA, and manycore platforms
for an irregular application. In Proceedings of the
3rd Workshop on Irregular Applications: Architec-
tures and Algorithms, IA3 ’13, pages 5:1–5:8, New
York, NY, USA. ACM.
Dawson, C. and Aizinger, V. (2005). A discontinuous Galerkin method for three-dimensional shallow wa-
ter equations. Journal of Scientific Computing, 22(1-
3):245–267.
Endo, F. A., Couroussé, D., and Charles, H.-P. (2015). Micro-architectural simulation of embedded core heterogeneity with gem5 and McPAT. In Proceedings of
the 2015 Workshop on Rapid Simulation and Perfor-
mance Evaluation: Methods and Tools, RAPIDO ’15,
pages 7:1–7:6, New York, NY, USA. ACM.
GEMS Development Team (2015). Official gems web-
site. http://research.cs.wisc.edu/gems/. Last visit on
02.02.2015.
Göddeke, D., Komatitsch, D., Geveler, M., Ribbrock, D., Rajovic, N., Puzovic, N., and Ramirez, A. (2013). Energy efficiency vs. performance of the numerical solution of PDEs: An application study on a low-power ARM-based cluster. J. Comput. Phys., 237:132–150.
Heirman, W., Sarkar, S., Carlson, T. E., Hur, I., and
Eeckhout, L. (2012). Power-aware multi-core sim-
ulation for early design stage hardware/software co-
optimization. In International Conference on Parallel
Architectures and Compilation Techniques (PACT).
Imperas Software Limited (2014). OVP Guide to Using
Processor Models. Imperas Buildings, North Weston,
Thame, Oxfordshire, OX9 2HA, UK. Version 0.5,
docs@imperas.com.
ITMC TU Dortmund (2015). Official lido website.
https://www.itmc.uni-dortmund.de/dienste/ hochleis-
tungsrechnen/lido.html. Last visit on 26.03.2015.
KALRAY Corporation (2015). Official kalray
mppa processor website. http://www. kalray-
inc.com/kalray/products/#processors. Last visit on
31.03.2015.
Kurian, G., Neuman, S., Bezerra, G., Giovinazzo, A., De-
vadas, S., and Miller, J. (2014). Power modeling and
other new features in the graphite simulator. In Per-
formance Analysis of Systems and Software (ISPASS),
2014 IEEE International Symposium on, pages 132–
134.
Li, S., Ahn, J. H., Strong, R. D., Brockman, J. B., Tullsen,
D. M., and Jouppi, N. P. (2013). The mcpat framework
for multicore and manycore architectures: Simultane-
ously modeling power, area, and timing. ACM Trans.
Archit. Code Optim., 10(1):5:1–5:29.
Lis, M., Ren, P., Cho, M. H., Shim, K. S., Fletcher, C.,
Khan, O., and Devadas, S. (2011). Scalable, accu-
rate multicore simulation in the 1000-core era. In Per-
formance Analysis of Systems and Software (ISPASS),
2011 IEEE International Symposium on, pages 175–
185.
Miller, J., Kasture, H., Kurian, G., Gruenwald, C., Beck-
mann, N., Celio, C., Eastep, J., and Agarwal, A.
(2010). Graphite: A distributed parallel simulator for
multicores. In High Performance Computer Architec-
ture (HPCA), 2010 IEEE 16th International Sympo-
sium on, pages 1–12.
NVIDIA Corporation (2015a). Official nvidia seco develop-
ment kit website. https://developer.nvidia.com/seco-
development-kit. Last visit on 31.03.2015.
NVIDIA Corporation (2015b). Official nvidia tegra
2 website. http://www.nvidia.com/object/tegra-
superchip.html. Last visit on 27.03.2015.
NVIDIA Corporation (2015c). Official nvidia tegra
3 website. http://www.nvidia.com/object/tegra-3-
processor.html. Last visit on 27.03.2015.
Rajovic, N., Carpenter, P. M., Gelado, I., Puzovic, N.,
Ramirez, A., and Valero, M. (2013a). Supercomput-
ing with commodity cpus: Are mobile socs ready for
hpc? In Proceedings of the International Conference
on High Performance Computing, Networking, Stor-
age and Analysis, SC ’13, pages 40:1–40:12, New
York, NY, USA. ACM.
Rajovic, N., Rico, A., Puzovic, N., Adeniyi-Jones, C., and Ramirez, A. (2014). Tibidabo: Making the case for an ARM-based HPC system. Future Generation Computer Systems, 36:322–334.
Rajovic, N., Rico, A., Vipond, J., Gelado, I., Puzovic, N.,
and Ramirez, A. (2013b). Experiences with mobile
processors for energy efficient hpc. In Proceedings of
the Conference on Design, Automation and Test in Eu-
rope, DATE ’13, pages 464–468, San Jose, CA, USA.
EDA Consortium.
Rosa, F., Ost, L., Raupp, T., Moraes, F., and Reis, R. (2014).
Fast energy evaluation of embedded applications for
many-core systems. In Power and Timing Modeling,
Optimization and Simulation (PATMOS), 2014 24th
International Workshop on, pages 1–6.
SocLib Project (2015). Official soclib developer web-
site. http://www.soclib.fr/trac/dev. Last visit on
01.02.2015.
Weaver, V. M. and McKee, S. A. (2008). Are cycle accurate
simulations a waste of time? In Proc. 7th Workshop
on Duplicating, Deconstructing, and Debunking.
SIMULTECH2015-5thInternationalConferenceonSimulationandModelingMethodologies,Technologiesand
Applications
382