An Application- and Platform-agnostic Runtime Management Framework for Multicore Systems

Graeme M. Bragg, Charles Leech, Domenico Balsamo, James J. Davis, Eduardo Wachter, Geoff V. Merrett, George A. Constantinides and Bashir M. Al-Hashimi

School of Electronics and Computer Science, University of Southampton, SO17 1BJ, U.K.

Department of Electrical and Electronic Engineering, Imperial College London, SW7 2AZ, U.K.

Keywords:

Heterogeneous Systems, Runtime Management, Software Framework.

Abstract:

Heterogeneous multiprocessor systems have increased in complexity to provide both high performance and

energy efﬁciency for a diverse range of applications. This motivates the need for a standard framework that

enables the management, at runtime, of software applications executing on these processors. This paper

proposes the ﬁrst fully application- and platform-agnostic framework for runtime management approaches

that control and optimise software applications and hardware resources. This is achieved by separating the

system into three distinct layers connected by an API and cross-layer constructs called knobs and monitors.

The proposed framework also supports the management of applications that are executing concurrently on

heterogeneous platforms. The operation of the proposed framework is experimentally validated using a basic

runtime controller and two heterogeneous platforms, to show how it is application- and platform-agnostic and

easy to use. Furthermore, the management of concurrently executing applications through the framework is

demonstrated. Finally, two recently reported runtime management approaches are implemented to demonstrate

how the framework enables their operation and comparison. The energy and latency overheads introduced by

the framework have been quantified and an open-source implementation has been released.

1 INTRODUCTION

The management and control of hardware settings at

runtime is crucial to the efﬁcient execution of applica-

tions with varying performance requirements on em-

bedded platforms. This has, however, become a non-

trivial task for multi-core and heterogeneous embed-

ded systems. In addition, applications have become

increasingly dynamic in order to exploit the capabili-

ties of these systems, with adjustable parameters that

must be tuned to optimise their behaviour. As a result,

the proactive optimisation of application performance

and system energy efﬁciency is a key research chal-

lenge. Runtime management is a solution that enables

optimisation of, and tradeoff between, quality, appli-

cation throughput and energy with varying require-

ments.

One way in which this can be achieved is by the exposure and adaptation of tunable parameters from the application and platform through a consistent framework interface. However, the majority of current frameworks only provide a mechanism to monitor application performance, and do not allow for the simultaneous monitoring and control of hardware components and applications at runtime. Moreover, most existing frameworks do not support heterogeneous platforms, which contain processors with differing capabilities, or the management of concurrent applications.

Framework implementation available at: https://github.com/PRiME-project/PRiME-Framework

This paper presents the ﬁrst framework for fully

application- and platform-agnostic runtime manage-

ment that enables the simultaneous control and op-

timisation of software applications and hardware re-

sources. This is achieved by separating systems

into three distinct layers: application, runtime ma-

nagement and device. These layers are connected

through cross-layer constructs called knobs and mo-

nitors, accessed through a novel application program-

ming interface (API), which enable the ﬂow of infor-

mation between layers and the control and monito-

ring of runtime-tunable and -observable parameters.

Bragg, G., Leech, C., Balsamo, D., Davis, J., Wachter, E., Merrett, G., Constantinides, G. and Al-Hashimi, B.

An Application- and Platform-agnostic Runtime Management Framework for Multicore Systems.

DOI: 10.5220/0006939100570066

In Proceedings of the 8th International Joint Conference on Pervasive and Embedded Computing and Communication Systems (PECCS 2018), pages 57-66

ISBN: 978-989-758-322-3

Table 1: Properties of state-of-the-art frameworks for runtime management of applications on multiprocessor systems. An entry of "Heartbeats" indicates that monitoring is provided through the Heartbeats API.

                                                     Application–RTM                           RTM–device
Framework                                            Knobs  Monitors    Non-temp.  Multiple    Knobs  Monitors  Monitor  Hetero.    Open
                                                                        monitors   monitors                     bounds   platforms  source
Heartbeats (Hoffmann et al., 2010)                   no     yes         no         no          no     no        no       no         yes
PowerDial (Hoffmann et al., 2011)                    yes    Heartbeats  no         no          no     no        no       no         no
Heterogeneous Heartbeats (Fleming and Thomas, 2014)  no     Heartbeats  no         no          no     yes       no       CPU+FPGA   yes
ARGO (Gadioli et al., 2015)                          yes    yes         yes        yes         no     no        no       no         no
AS-RTM (Paone et al., 2014)                          yes    Heartbeats  no         yes         no     no        no       no         no
PTRADE (Hoffmann et al., 2013)                       no     Heartbeats  no         no          yes    no        no       no         no
DRM (Baldassari et al., 2017)                        no     Heartbeats  no         no          no     no        no       no         no
BEEPS (Gaspar et al., 2015)                          no     Heartbeats  no         no          yes    no        no       no         no
Proposed                                             yes    yes         yes        yes         yes    yes       yes      yes        yes

This reduces the design complexity by enabling the

runtime management layer to provide a speciﬁc ser-

vice to the applications, e.g. to meet a performance

requirement, whilst meeting optimisation targets by

controlling the hardware. The framework’s novel fe-

atures include:

• The ability to control and monitor applicati-

ons and hardware simultaneously using a cross-

layered approach.

• An API that provides a consistent way in which

knobs and monitors are speciﬁed and monitored

across applications and platforms.

• A mechanism to enable the management of con-

currently executing applications and heterogene-

ous platforms.

Additionally, the framework enables the direct com-

parison of runtime management approaches and algo-

rithms, which has not previously been possible, and

simpliﬁes runtime manager (RTM) development.

In the remainder of this paper, a survey of existing

frameworks is carried out to contrast the proposed fra-

mework against the state of the art. The proposed fra-

mework is experimentally validated with a range of

applications and two different types of heterogeneous

platform to demonstrate its application- and platform-

agnostic properties and to illustrate its ease of use.

The management of two concurrently-executing ap-

plications is then demonstrated. In addition, two re-

cently reported runtime management approaches, the

ﬁrst based on performance counter-driven control and

the second using reinforcement learning, are imple-

mented with the framework to demonstrate how the

framework enables their operation and comparison.

Finally, the energy and latency overheads of the pro-

posed framework are quantiﬁed. An open-source C++

implementation of the framework and API has also

been released.

2 RELATED WORK

Various runtime management approaches exist in the

literature for optimising system behaviour, whilst sa-

tisfying application requirements. These include dy-

namic voltage and frequency scaling (DVFS) (Das

et al., 2014; Wang et al., 2017), per-core power ga-

ting (Rahmani et al., 2017), dynamic task mapping

and thread migration (Reddy et al., 2017). While

RTMs are typically designed to address general chal-

lenges, such as energy efﬁciency or thermal manage-

ment, they are largely implemented on speciﬁc plat-

forms or with speciﬁc classes of application, e.g. mul-

timedia (Kim et al., 2017) or image processing (Yang

et al., 2015).

In addition, benchmarks are typically used to as-

sess relative performance and measure speciﬁc as-

pects of RTMs and hardware platforms. However,

they do not typically expose application requirements

(e.g. error or accuracy) in addition to performance

PEC 2018 - International Conference on Pervasive and Embedded Computing

and this can limit the range of optimisation opportu-

nities of runtime management approaches. Further-

more, source code for RTMs is often not released,

with limited detail on implementation reported, ma-

king reproduction of results a non-trivial task. This

prevents the direct comparison of approaches, with

several works relying on comparison via Linux go-

vernors (Singla et al., 2015; Reddy et al., 2017).

Runtime management can be enhanced by the ex-

posure of dynamic knobs and monitors, which pro-

vide a mechanism to communicate with the appli-

cation and platform. Speciﬁcally, knobs allow the

tuning of hardware and application parameters by

the RTM, while monitors enable the measurement of

hardware properties and the observation of applica-

tion behaviour, including the setting of performance

targets by the application (Hoffmann et al., 2011;

Fleming and Thomas, 2014; Gadioli et al., 2015;

Leech et al., 2018b). In addition, knobs and monitors
can be used to explore application-device tradeoffs,
such as throughput-power (Hoffmann et al.,

2013) and precision-throughput (Sui et al., 2016), and

locate optimal operating points for applications (Vas-

siliadis et al., 2016). However, runtime management

lacks portability unless these knobs and monitors are

exposed through a consistent interface.

Several frameworks have been proposed in the

past to address the challenge of providing such inter-

faces. Table 1 summarises their features. The most

relevant framework is the Heartbeats API (Hoffmann

et al., 2010), which provides a standardised interface

for single or concurrent applications to communicate

their current and target performance to external ob-

servers, such as an RTM. The Heartbeats API only

allows applications to communicate their throughput

(i.e. the heart rate), therefore it does not allow ot-

her types of parameters to be exposed, such as accu-

racy and error (classed as non-temporal monitors in

column four of Table 1), and prevents tradeoffs bet-

ween them. In addition, it does not extend this inter-

face for monitoring or control of device parameters.

Most of the frameworks reported in Table 1 are based

on the Heartbeats concept and inherit its features, e.g.

application monitors (column three).

In order to perform tradeoffs within a single ap-

plication, multiple monitors of different types must

be exposed, e.g. throughput and error. Column ﬁve

of Table 1 shows that Heartbeats, and most of the fra-

meworks that rely on it, do not support this functiona-

lity. In addition, for an application to meet its require-

ments, a target can be speciﬁed with the monitor. Ho-

wever, there is no indication as to whether the target

is a maximisation or minimisation objective, as listed

in column eight. As a result, these approaches do not allow fully application-agnostic behaviour.

Figure 1: Cross-layer framework and API enabling communication between the application, runtime management and device layers using knobs and monitors. Examples are given for an image filter application on a CPU.

Columns six and seven show that current frame-

works only provide partial abstraction of RTM to de-

vice communication, and do not include both knobs

and monitors to control hardware components at run-

time. Moreover, most existing works do not operate

on heterogeneous platforms (column nine), which

provide both high performance and energy efﬁciency

by combining conventional CPUs with other accele-

rators. These platforms typically increase the scalabi-

lity of parallel applications and systems, and therefore

they need to be managed by a framework that sup-

ports device-agnostic control. One framework sup-

ports a heterogeneous platform; however, it has been

designed for a speciﬁc platform and introduces a har-

dware dependency in the process (Fleming and Tho-

mas, 2014). This restricts the cross-platform capabi-

lities of current frameworks, meaning that they do not

allow current RTM approaches to be portable across

multiple platforms.

3 PROPOSED FRAMEWORK

To address the limitations of existing frameworks discussed in Section 2, a framework for application- and platform-agnostic runtime management of heterogeneous systems is presented. Figure 1 shows the proposed framework and how the three layers are connected by novel APIs (App to RTM API and RTM to device API). This provides consistent interfaces from an RTM to both hardware platforms and applications, which enables the design and implementation of application- and platform-agnostic runtime management approaches. As discussed in Section 2, application knobs expose tunable application parameters, e.g. filter precision, while monitors convey information about the behaviour of the applications, e.g. frame rate. Similarly, device knobs expose tunable device parameters while monitors convey information about the status of devices. Exposing knobs and monitors at both the application and device layers enables tradeoffs, e.g. performance-energy or accuracy-temperature, to be explored and exploited by the runtime management layer.

Table 2: Application-to-RTM and RTM-to-device API functions for the proposed framework.

Layer  Construct  Space       Identifier  Input(s)      Output(s)  Description
app    knob       disc, cont  min         knob, min     –          Update application knob’s minimum allowed value
app    knob       disc, cont  max         knob, max     –          Update application knob’s maximum allowed value
app    knob       disc, cont  get         knob          value      Pull application knob’s current value
app    mon        disc, cont  min         mon, min      –          Update application monitor’s minimum desired value
app    mon        disc, cont  max         mon, max      –          Update application monitor’s maximum desired value
app    mon        disc, cont  weight      mon, weight   –          Update application monitor’s relative importance
app    mon        disc, cont  set         mon, value    –          Push application monitor’s current value
dev    knob       disc, cont  min         knob          min        Pull device knob’s minimum allowed value
dev    knob       disc, cont  max         knob          max        Pull device knob’s maximum allowed value
dev    knob       disc, cont  init        knob          init       Pull device knob’s initial (default) value
dev    knob       disc, cont  type        knob          type       Pull device knob’s type
dev    knob       disc, cont  set         knob, value   –          Push device knob’s current value
dev    mon        disc, cont  type        mon           type       Pull device monitor’s type
dev    mon        disc, cont  get         mon           value      Pull device monitor’s current value and bounds

In addition, the proposed framework facilitates the

comparison of existing RTMs as well as the manage-

ment of concurrently-executing applications and he-

terogeneous platforms. The remainder of this section

provides an overview of the technical concepts of the

proposed framework and details of the novel API.

3.1 Framework Concepts

Structure: The separation of the system into the

three distinct layers—application, runtime manage-

ment and device—shown in Figure 1 reduces de-

sign complexity and provides ﬂexibility during ope-

ration. The application layer comprises any number

of software processes, while the device layer inclu-

des the hardware and its software drivers. The run-

time management layer comprises an RTM responsi-

ble for the control and monitoring of the other two

layers. This separation ensures portability and cross-

compatibility; applications and device drivers only

need to be written once to be used with any imple-

mented RTM.

The framework can be viewed hierarchically “do-

wnwards” since, as far as knob and monitor control

is concerned, applications are masters of the RTM.

Applications make calls to the API, controlling the

presence and conﬁguration of each knob and monitor.

Devices, meanwhile, are the RTM’s slaves since they

must respond to requests to set and get knob and mo-

nitor values, respectively. Thus, applications “pull”

their knob settings from the RTM and “push” moni-

tor updates, while device knobs are pushed from the

RTM and monitor values pulled.

Communication: Knobs and monitors, shown in the

dashed regions of Figure 1, facilitate communica-

tion between the layers. Bounds are attached to both

knobs and monitors, in the form of minima and max-

ima, which allow applications and devices to inform

an RTM of targets and constraints. Knob bounds

represent a range of allowed values while monitor

bounds represent a range of desired values, rather

than a single target. An RTM’s primary objective is

to ensure that the monitor values of all applications

and the device remain within their speciﬁed bounds.

Beyond this, it is free to optimise any unbounded mo-

nitors in order to meet secondary objectives, e.g. to

reduce power consumption. Minimal modiﬁcation of

applications is required to expose knobs and monitors

through the framework.

The image filtering application shown in Figure 1 provides the option of selecting float or double precision for its numeric operations at runtime. This choice will be controlled by an RTM using an application knob with options {0, 1}. If the same application requires a minimum throughput, e.g. expressed as a frame rate α, an application monitor with this bound can be provided. In this case, the application will periodically update the current frame rate so that the RTM can keep it within the range [α, ∞). On the hardware side, DVFS of the CPU is achieved via a device knob with options {0, 1, ..., 9}, enabling the RTM to switch between ten distinct voltage-frequency pairs. Finally, to enable thermal management by the RTM, a temperature sensor is exposed as a device monitor.
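The bound semantics described above can be sketched in a few lines of C++. This is only an illustration: the names `Knob`, `Monitor`, `kUnbounded`, `within_bounds` and `clamp_to_knob` are assumptions made for this sketch and are not taken from the framework's API, and an open-ended range is modelled here with floating-point infinity.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>

// Hypothetical types for illustration; not the framework's data structures.
struct Knob {
    int min;  // lowest allowed value
    int max;  // highest allowed value
    int val;  // current setting
};

struct Monitor {
    double min;  // lowest desired value
    double max;  // highest desired value
    double val;  // most recent reading
};

// Assumption: an unbounded upper end is represented as +infinity.
constexpr double kUnbounded = std::numeric_limits<double>::infinity();

// True if the monitor's current value lies in its desired range [min, max].
inline bool within_bounds(const Monitor& m) {
    return m.val >= m.min && m.val <= m.max;
}

// Restrict a requested knob setting to the knob's allowed range.
inline int clamp_to_knob(const Knob& k, int requested) {
    assert(k.min <= k.max);
    return std::max(k.min, std::min(requested, k.max));
}
```

With a frame-rate monitor bounded by [25, ∞) and a ten-level DVFS knob, `within_bounds` tells an RTM whether its primary objective is currently met, and `clamp_to_knob` keeps any requested setting inside {0, ..., 9}.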

Weights: Individual applications may feature multi-

ple performance objectives with differing priorities.

For example, an application aware of both its throug-

hput and accuracy may wish to prioritise the optimisa-

tion of one over the other. In the proposed framework,

such priorities are expressed with a numeric weight

attached to each monitor. These weights instruct the

RTM to expend proportional effort in optimising each

monitor’s value. In a similar manner, application pri-

ority is indicated through attached weights such that,

for example, a higher level of performance can be en-

sured by foreground processes.
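One simple reading of "proportional effort" is to normalise the monitor weights into shares that sum to one. The sketch below shows that interpretation only; the function name `effort_shares` is hypothetical, and the paper does not prescribe how an RTM should turn weights into effort.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Convert non-negative monitor weights into fractions that sum to 1.
// Returns an empty vector when the weights sum to zero.
inline std::vector<double> effort_shares(const std::vector<double>& weights) {
    const double total = std::accumulate(weights.begin(), weights.end(), 0.0);
    std::vector<double> shares;
    if (total <= 0.0) return shares;
    shares.reserve(weights.size());
    for (double w : weights) {
        assert(w >= 0.0);  // weights are assumed to be non-negative
        shares.push_back(w / total);
    }
    return shares;
}
```

For example, an application that weights throughput at 3.0 and accuracy at 1.0 would, under this interpretation, ask the RTM to devote three quarters of its optimisation effort to throughput.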

Concurrency: Real-world systems commonly exe-

cute more than one application concurrently. Due

to this, an RTM is required to carefully manage sy-

stem resources so that each application can meet its

performance targets. When considering concurrently

executing applications, the framework provides a me-

chanism to identify and manage them simultaneously,

enabling inter-application tradeoffs by the RTM.

Types: Knobs and monitors each have a type selec-

table from a discrete set of options, e.g. TEMP for a

temperature monitor or FREQ for a frequency knob.

This represents a compromise between complete ag-

nosticism and the full provision of information. Pro-

viding “hints” to the RTM simpliﬁes the process of

determining the function of knobs and the properties

represented by monitors, e.g. “lower power is better.”

Spaces: All knob and monitor values are expressed

in standardised, unitless formats to maintain applica-

tion and device agnosticism. The proposed frame-

work allows discrete- and continuous-valued versions

of each knob and monitor so that appropriate optimi-

sation processes can be used by the RTM. These spa-

ces enable the translation of application-speciﬁc in-

formation into agnostic sets, as shown in Figure 1 for

the ranges of the knobs and monitors. Discrete versi-

ons use signed integer values while their continuous

counterparts operate using ﬂoating-point data.
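As an illustration of how a discrete space hides device-specific detail, the following sketch lets a driver keep its real frequency table private and expose only an index range as the knob. The class `FrequencyTable` and its methods are hypothetical and exist only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical driver-side helper: the RTM sees only the unitless index range
// [knob_min(), knob_max()], never the frequencies themselves.
class FrequencyTable {
public:
    explicit FrequencyTable(std::vector<unsigned> mhz) : mhz_(std::move(mhz)) {
        assert(!mhz_.empty());
    }

    int knob_min() const { return 0; }
    int knob_max() const { return static_cast<int>(mhz_.size()) - 1; }

    // Translate the agnostic knob value back into a device-specific frequency.
    unsigned to_mhz(int knob_value) const {
        assert(knob_value >= knob_min() && knob_value <= knob_max());
        return mhz_[static_cast<std::size_t>(knob_value)];
    }

private:
    std::vector<unsigned> mhz_;
};
```

A table with ten entries yields a knob with options {0, 1, ..., 9}, so the same RTM code can drive devices whose actual frequency steps differ.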

Adaptability: In order to provide maximal ﬂexibi-

lity, all bounds and weights are adjustable at runtime,

and no restrictions are placed on when updates to these

can occur. Most commonly, applications create their

knobs and monitors before being executed; however,

no limitation is imposed on such events occurring

partway through application execution instead. Ap-

plications are allowed to be attached to and detached

from the framework at any point during runtime. This

capability is in contrast to existing frameworks, most

of which assume a constant application set, contrary

to the typical use of many embedded systems.

3.2 API Speciﬁcation

The proposed framework is realised through novel

API calls that connect the system layers of Figure 1

and enable the exposure of knobs and monitors be-

tween them in a consistent manner across applica-

tions and hardware platforms. Table 2 illustrates

how the API functions are split into application (app)

and device (dev) categories, with subcategories for

knob (knob) and monitor (mon) interaction. Discrete-

(disc) and continuous-valued (cont) versions exist

across the API to indicate knob and monitor typology.

The RTM must be made aware of the allowable and desired values for knobs and monitors, respectively, in order to ensure that its optimisations have positive effects. For knobs, functions app_knob_(disc|cont)_(min|max)() facilitate this, letting the application indicate the range in which values can be chosen. Conversely, monitor functions app_mon_(disc|cont)_(min|max|weight)() allow the setting of RTM objectives, with *_min() and *_max() functions indicating desired lower and upper bounds. Where an application requires only a maximum or minimum bound, the other end of the range can be left unbounded using (DISC|CONT)_MIN or (DISC|CONT)_MAX. Intra-application weighting values between 0.0 and CONT_MAX can be used to indicate relative monitor importance to the RTM using *_weight() functions, guiding its optimisations. All of these settings can be updated during application execution if required. Functions app_knob_(disc|cont)_get() and app_mon_(disc|cont)_set() are used by the application to get the current value of a knob from the RTM and set a value for a monitor to the RTM, respectively. The timing of these actions is application-controlled.
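The application-side call pattern can be sketched against a mock interface. The class `MockAppApi` below is a stand-in written for this example: its method names follow the identifiers in Table 2, but the signatures, integer identifiers and storage are assumptions, and the released implementation should be consulted for the real API.

```cpp
#include <cassert>
#include <map>

// Mock stand-in for the application-to-RTM interface (signatures are assumed).
class MockAppApi {
public:
    struct DiscKnob { int min, max, val; };
    struct ContMon  { double min, max, weight, val; };

    void knob_disc_min(int id, int min) { knobs_[id].min = min; }
    void knob_disc_max(int id, int max) { knobs_[id].max = max; }
    int  knob_disc_get(int id) const    { return knobs_.at(id).val; }

    void mon_cont_min(int id, double min)       { mons_[id].min = min; }
    void mon_cont_max(int id, double max)       { mons_[id].max = max; }
    void mon_cont_weight(int id, double weight) { mons_[id].weight = weight; }
    void mon_cont_set(int id, double value)     { mons_[id].val = value; }

    // Hooks playing the role of the RTM side of the interface.
    void rtm_set_knob(int id, int value) {
        DiscKnob& k = knobs_.at(id);
        assert(value >= k.min && value <= k.max);  // respect the allowed range
        k.val = value;
    }
    const ContMon& rtm_read_mon(int id) const { return mons_.at(id); }

private:
    std::map<int, DiscKnob> knobs_;
    std::map<int, ContMon>  mons_;
};

// One iteration of the image-filter example: push the measured frame rate,
// then pull the precision setting chosen by the RTM.
inline int filter_step(MockAppApi& api, double measured_fps) {
    const int precision_knob = 0;
    const int fps_mon = 0;
    api.mon_cont_set(fps_mon, measured_fps);
    return api.knob_disc_get(precision_knob);
}
```

The application decides when `filter_step` runs, which mirrors the statement that the timing of pushes and pulls is application-controlled.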

Device-layer knobs and monitors are exposed and updated via the RTM-to-device API functions, as shown in the lower half of Table 2. Functions dev_knob_(disc|cont)_(min|max)() are equivalent to their application-layer counterparts, setting ranges of valid values. Additional functions dev_knob_(disc|cont)_(type|init)() return the type of the knob or its initial value, i.e. that from which the RTM starts its exploration. Type-related functions return values from defined sets and are called by the RTM using dev_mon_(disc|cont)_type(). The RTM uses functions dev_knob_(disc|cont)_set() and dev_mon_(disc|cont)_get() for setting device knob values and accessing monitor values and bounds from the device at runtime.


4 EVALUATION

In order to demonstrate the capabilities of the fra-

mework and validate its operation, a series of ex-

periments have been carried out. Illustrative RTMs

were used where appropriate to demonstrate speci-

ﬁc concepts. The experimental setup is discussed in

Section 4.1, after which the framework’s basic opera-

tion and ease of use are exempliﬁed in Section 4.2.

Application agnosticism is shown throughout this

section while platform agnosticism is demonstra-

ted in Section 4.3 with the same application-RTM

pair executing on two different heterogeneous plat-

forms. Support for concurrent applications is shown

in Section 4.4, with two different applications executing
on one platform. The ability of the framework

to enable direct comparison of RTMs is shown in

Section 4.5 with two recently reported runtime ma-

nagement approaches. Finally, framework overheads

are analysed in Section 4.6.

4.1 Experimental Setup

Two heterogeneous embedded platforms were used

to demonstrate the proposed framework. The

Odroid-XU3 development board, containing an ARM

big.LITTLE architecture with two quad-core CPU

clusters and a GPU, was used to demonstrate the ease

of use of the framework, the direct comparison of

RTMs and to assess overheads. The platform contains

ﬁve temperature sensors to monitor the CPU and GPU

and four power sensors to monitor each CPU cluster,

the GPU and memory. Each of these was exposed

to the framework as a device monitor. Three device

knobs were exposed to provide DVFS for each CPU

cluster and the GPU. Table 3 summarises the knobs

and monitors of the Odroid-XU3.

A Cyclone V SoC Development Kit was used

to demonstrate platform-agnostic operation of the

framework. This platform includes a heterogene-

ous CPU-FPGA system-on-chip containing two ARM

CPUs and FPGA fabric. Using OpenCL, applications

can execute on either the CPUs or the FPGA.

Four different applications from the numerical and

multimedia domains were used to demonstrate the

application-agnostic properties of the framework.

4.2 Agnostic Runtime Management

A basic controller was implemented within the runtime management layer to illustrate the use of knobs and monitors for maintaining an application performance target while optimising a given device monitor. Listing 1 shows the code for the controller, which ensures that the value of the application performance monitor remains within its bounds. This is achieved by adjusting the device frequency knob in order to avoid violations of the monitor bounds app_perf.min and app_perf.max (lines 6–9). The optimisation of device temperature (line 11) is the secondary objective and is achieved by decrementing the frequency knob (line 12), trading off excess application performance (lines 12–13).

Table 3: Device-level knobs and monitors for Odroid-XU3.

Const.  Space  Type  For                          No.
knob    disc   FREQ  LITTLE cluster               1
knob    disc   FREQ  big cluster                  1
knob    disc   FREQ  GPU                          1
mon     cont   POW   Clusters, RAM, GPU, SoC      5
mon     cont   TEMP  big cores                    4
mon     cont   TEMP  GPU                          1
mon     disc   PMC   LITTLE cores                 16
mon     disc   PMC   big cores                    24

Listing 1: RTM code for agnostic control and monitoring of application and device knobs and monitors.

 1  void rtm::control_loop() {
 2    while (1) {
 3      temp_mon = dev_api.mon_cont_get(temp_mons[2]);
 4      if (apps.size()) {
 5        app_perf = app_mons_cont[0];
 6        if (app_perf.val < app_perf.min) {
 7          if (freq_knob.val < freq_knob.max) {
 8            freq_knob.val++;
 9            dev_api.knob_disc_set(freq_knob, freq_knob.val);
10        }}
11        else if (temp_mon.val > temp_mon.max) {
12          freq_knob.val--;
13          dev_api.knob_disc_set(freq_knob, freq_knob.val);
14  }}}}
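To make the policy of Listing 1 easy to experiment with outside the framework, the sketch below restates one loop iteration as a pure function over plain structs. The types `DiscKnob` and `ContMon` and the function `control_step` are assumptions made for this example, and the check against the knob's lower bound is an addition that does not appear in Listing 1.

```cpp
#include <cassert>

// Hypothetical plain-data stand-ins for a discrete knob and continuous monitor.
struct DiscKnob { int min, max, val; };
struct ContMon  { double min, max, val; };

// One iteration of the Listing 1 policy. Returns the change applied to the
// frequency knob: +1, -1 or 0.
inline int control_step(DiscKnob& freq, const ContMon& perf, const ContMon& temp) {
    assert(freq.min <= freq.max);
    if (perf.val < perf.min) {
        // Primary objective: restore performance if the knob has headroom.
        if (freq.val < freq.max) { ++freq.val; return +1; }
        return 0;
    }
    // Secondary objective: cool the device when performance is satisfied.
    // The freq.min guard is an addition not present in Listing 1.
    if (temp.val > temp.max && freq.val > freq.min) { --freq.val; return -1; }
    return 0;
}
```

Feeding the function a performance reading above its minimum and a temperature above its maximum steps the knob down by one level, which is the trade-off described for lines 11–13.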

The behaviour of this controller is shown in Figure 2 while running a numerical benchmarking application (Whetstone). This benchmark performs numerical functions using integer and floating-point arithmetic. Its performance is measured in thousands of Whetstone instructions per second (KIPS), which is exposed as a continuous monitor with bounds of [2.30, ∞). Initially, the controller set the device frequency to maximum and observed the device temperature. As the temperature increased above the maximum threshold specified by temp_mon.max (80 °C), the controller reduced the frequency until the temperature was below the threshold whilst ensuring that the application performance was higher than app_perf.min. After 50 seconds, the platform reduced its temperature threshold to 60 °C and the RTM reduced the frequency in response until the updated monitor bound was satisfied while still meeting the application throughput requirement.

[Figure 2 panels: performance (KIPS) with the app_perf.min bound, frequency (GHz), and temperature (°C) with the temp_mon.max bound, each plotted against time (s).]

Figure 2: Device temperature optimisation under application performance constraints using the controller RTM, including dynamic adjustment of the temperature threshold from 80 to 60 °C.

Figure 3: Design-space exploration of the Jacobi application across the Odroid-XU3 and Cyclone V devices.

This experiment demonstrates the basic operation

of the framework and illustrates the dynamic nature of

its knobs and monitors. The controller is application-

and platform-agnostic as it could operate, without

modiﬁcation, with any application that exposes a per-

formance monitor and any platform that exposes a fre-

quency knob and temperature monitor.

4.3 Platform Agnosticism

The portability of RTMs and applications implemen-

ted within the framework is demonstrated in Figure 3,

which shows the design-space exploration (DSE) of

the same application across two heterogeneous plat-

forms using the same RTM code. A Jacobi iterative

solver was used as a case-study application.

The Jacobi method solves the system of N linear

equations Ax = b, where A is an N × N matrix and

x and b are N × 1 column vectors. If A is decompo-

sed into diagonal and remainder components D and

R, under suitable conditions x can be computed ite-

ratively, with later iterations containing more accu-

rate results. The application can operate a tradeoff

between the speed of calculation (solves per second)

and the accuracy of the result (mean squared error) by

adjusting the number of iterations performed and the

precision of the data type.
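The iteration-count knob can be illustrated with a small Jacobi solver. Writing A = D + R, each sweep computes x_i ← (b_i − Σ_{j≠i} A_ij x_j) / A_ii. The sketch below is not the application used in the paper: the function names are hypothetical, double precision is fixed, and the mean squared residual of Ax − b is used as a stand-in error measure.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;

// Run a fixed number of Jacobi sweeps on Ax = b, starting from x = 0.
inline Vec jacobi(const Mat& A, const Vec& b, int iterations) {
    const std::size_t n = b.size();
    assert(A.size() == n);
    Vec x(n, 0.0), next(n, 0.0);
    for (int k = 0; k < iterations; ++k) {
        for (std::size_t i = 0; i < n; ++i) {
            double sum = 0.0;
            for (std::size_t j = 0; j < n; ++j)
                if (j != i) sum += A[i][j] * x[j];
            next[i] = (b[i] - sum) / A[i][i];
        }
        x.swap(next);  // the new estimate becomes the input of the next sweep
    }
    return x;
}

// Mean squared residual of Ax - b (an assumed error measure for this sketch).
inline double mean_squared_residual(const Mat& A, const Vec& b, const Vec& x) {
    double acc = 0.0;
    for (std::size_t i = 0; i < b.size(); ++i) {
        double r = -b[i];
        for (std::size_t j = 0; j < x.size(); ++j) r += A[i][j] * x[j];
        acc += r * r;
    }
    return acc / static_cast<double>(b.size());
}
```

Raising `iterations` lowers the residual at the cost of more work per solve, which is the throughput-accuracy tradeoff that the knob exposes. Convergence relies on the "suitable conditions" mentioned above, for example a strictly diagonally dominant A.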

Throughput and accuracy were exposed as moni-

tors while iterations to perform and precision were

exposed as knobs. The DSE extended to application

execution on the heterogeneous components of both

platforms, including the GPU on the Odroid and the

FPGA on the Cyclone V, in addition to the CPUs.

Points in Figure 3 show the resultant throughput and

error for each combination of knob values, with blue

crosses for the Odroid and green triangles for the Cy-

clone V. This experiment demonstrates that the same

application and RTM code can be used on any plat-

form supported within the proposed framework.

4.4 Concurrency Management

This subsection demonstrates how the framework supports the management of concurrently executing applications. A runtime control algorithm was implemented with a target of keeping the throughput monitor of each application within its bounds, app_perf.min and app_perf.max, while minimising device frequency. The behaviour of this controller is shown in Figure 4, where the execution of two applications is indicated by their throughput over time. The top plot shows a video filtering application and the middle plot shows the Jacobi iterative solver.

[Figure 4 panels: video filter throughput (frames per second) and Jacobi throughput (solves per second), each with app_perf.min and app_perf.max bounds, and frequency (GHz), all plotted against time (s).]

Figure 4: Runtime management of the throughput of two concurrently-executing applications through the framework. The Jacobi application begins execution at 21 seconds and the device frequency is adjusted to compensate.

Initially, the video ﬁlter application was the only

application executing. As a result, the runtime con-

troller adjusted the CPU frequency to meet the appli-

cation throughput bounds at the lowest frequency pos-

sible. The Jacobi application began its execution af-

ter 21 seconds, shortly after which the RTM observed

that its throughput was below the desired minimum

bound. The throughput of the video ﬁlter also decrea-

sed due to competition for device resources. To com-

pensate, the controller increased the CPU frequency

such that the throughput of both applications returned

to within their bounds.
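One possible policy with this behaviour is sketched below. It is not the controller used in the experiment: the types and the function `manage_concurrent` are hypothetical, and the rule for lowering the frequency (only when every application exceeds its maximum bound) is an assumption chosen for simplicity.

```cpp
#include <cassert>
#include <vector>

// Hypothetical plain-data stand-ins for a discrete knob and continuous monitor.
struct DiscKnob { int min, max, val; };
struct ContMon  { double min, max, val; };

// Raise the frequency knob if any application is below its minimum bound;
// otherwise lower it if every application is above its maximum bound.
inline void manage_concurrent(DiscKnob& freq, const std::vector<ContMon>& apps) {
    assert(freq.min <= freq.max);
    if (apps.empty()) return;
    bool any_below = false;
    bool all_above = true;
    for (const ContMon& m : apps) {
        if (m.val < m.min) any_below = true;
        if (m.val <= m.max) all_above = false;
    }
    if (any_below) {
        if (freq.val < freq.max) ++freq.val;
    } else if (all_above) {
        if (freq.val > freq.min) --freq.val;
    }
}
```

When a second application attaches and reports a throughput below its minimum, the next call raises the knob by one level, matching the response described after the Jacobi solver starts.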

4.5 Comparison of RTM Approaches

To demonstrate the framework’s optimisation and comparative capabilities, two state-of-the-art runtime management approaches were implemented within the proposed framework. The first approach, RTM-A (Reddy et al., 2017), aims to optimise power consumption by monitoring hardware performance counters to identify opportunities where CPU frequency can be reduced without impacting application performance. The second approach, RTM-B (Maeda-Nunez et al., 2015), employs reinforcement learning to predict the frequency that should be selected to meet an application performance target based on previous application behaviour. RTM-A was originally evaluated on the Odroid-XU3 platform using standard benchmarks with a reported mean energy saving of 25% compared to the Linux Ondemand governor. RTM-B was evaluated on the BeagleBoard-xM platform using a video decoder application with a reported mean reduction in energy consumption of 30% when compared to the Ondemand governor.

Figure 5: Mean total energy consumed by the Odroid-XU3 running the video decoder application under the control of each RTM, both with and without the framework (FW). The experiment was repeated 50 times for each RTM.

These two approaches lack portability and direct comparisons cannot be made due to the different platforms used for experimental validation. Implementation within the proposed framework allows them to be directly compared, saving development time and improving the accuracy of the comparison. To demonstrate this, the RTMs were evaluated using an OpenCV video decoding application on the Odroid-XU3 platform. The application exposes a continuous monitor for the frame rate, with a minimum bound of 25 frames per second. The RTMs are directly compared in Figure 5, between bars two and four, showing that the application consumed a mean total energy of 381 J and 376 J under the control of RTM-A and RTM-B, respectively. Comparison with the Linux Ondemand governor (bar five) shows energy savings of 17.2% and 18.2%, respectively. This demonstrates that while RTM-B achieves the greater energy saving for this specific application and platform pair, it is less than that reported in the literature.

Figure 6: Breakdown of the sources of latency introduced by the framework for communication between the RTM and device layers.
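As a consistency check on the figures above, the short sketch below back-calculates the Ondemand energy implied by each reported saving. The text does not state the Ondemand energy directly, so the computed baseline is an inference, and the function names are hypothetical. It assumes that the saving is defined as one minus the ratio of RTM energy to Ondemand energy.

```python
def energy_saving(e_rtm_j, e_baseline_j):
    """Fractional saving of an RTM relative to a baseline governor."""
    return 1.0 - e_rtm_j / e_baseline_j


def implied_baseline(e_rtm_j, saving):
    """Baseline energy that would yield `saving` for a given RTM energy."""
    return e_rtm_j / (1.0 - saving)


# Reported mean energies and savings for RTM-A and RTM-B.
baseline_from_a = implied_baseline(381.0, 0.172)
baseline_from_b = implied_baseline(376.0, 0.182)

# Both estimates should agree if the two savings share one baseline.
difference_j = abs(baseline_from_a - baseline_from_b)
```

Both estimates land close to 460 J and differ by less than 1 J, which is consistent with the two savings having been computed against the same Ondemand measurement, allowing for rounding of the reported percentages.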

4.6 Overheads

As with any abstraction, the framework introduces an energy overhead due to the additional computation required. This overhead can be estimated by comparing standalone versions of RTM-A and RTM-B against their implementations within the framework. Results of these experiments can be seen in Figure 5 for RTM-A (bars one and two) and for RTM-B (bars three and four). In the minimum case, RTM-A required 19.6 J (5.48%) more energy, while RTM-B required only 15.2 J (4.23%) more energy. The minimum case was used to minimise the impact of other running processes on the result. Even with this overhead, the two RTMs still achieved significant savings when compared to the Ondemand governor.
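The relationship between the absolute and relative overheads can be written out as follows. This is a worked check rather than a result from the paper: it assumes that each percentage is expressed relative to the standalone (no framework) energy in the minimum case, which the text does not state explicitly, and the symbols $E_{\mathrm{FW}}$ and $E_{\mathrm{standalone}}$ are introduced here for illustration.

```latex
\begin{align*}
\text{overhead} &= \frac{E_{\mathrm{FW}} - E_{\mathrm{standalone}}}{E_{\mathrm{standalone}}} \\
E_{\mathrm{standalone}}^{\text{RTM-A}} &\approx \frac{19.6\ \mathrm{J}}{0.0548} \approx 358\ \mathrm{J} \\
E_{\mathrm{standalone}}^{\text{RTM-B}} &\approx \frac{15.2\ \mathrm{J}}{0.0423} \approx 359\ \mathrm{J}
\end{align*}
```

Under that assumption, the two standalone RTMs consume nearly the same minimum-case energy, and the difference in relative overhead comes almost entirely from the difference in absolute overhead.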

The framework also introduces latency overheads that limit RTM reaction rates. Figure 6 is a visualisation of the steps involved in reading a device monitor inside the framework, from which seven internal latency sources can be identified. $t_{\mathrm{asm}}$, $t$ and $t_{\mathrm{diss}}$ are the times to assemble, transmit and disassemble a message used for conveying monitor information. $t_{\mathrm{net}}$ is the message-passing interface latency and $t$ the time to search for and read a monitor.

The latency related to each API call was measured and found to be 80–200 µs, with 40% attributed to cross-layer communication. For an RTM reading one device monitor and setting one device knob per update, this limits the update rate to 1.67 kHz.
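To relate the per-call latency to the update rate, the sketch below converts between the two. It assumes that the achievable update rate is simply the reciprocal of the total latency spent per update. The function names are hypothetical. The text does not state how an update decomposes into individual API calls, so the three-call example is purely illustrative.

```python
def max_update_rate_hz(call_latencies_us):
    """Upper bound on update rate if calls run back to back (assumption)."""
    period_us = sum(call_latencies_us)
    return 1e6 / period_us


def implied_period_us(rate_hz):
    """Time available per update at a given update rate."""
    return 1e6 / rate_hz


# A rate of 1.67 kHz corresponds to a period of roughly 600 microseconds.
period_us = implied_period_us(1670.0)

# Illustrative only: three calls at the worst-case 200 microseconds each
# would account for a period of that size.
rate_hz = max_update_rate_hz([200.0, 200.0, 200.0])
```

The quoted 1.67 kHz therefore corresponds to about 600 µs per update, which is equivalent to three calls at the upper end of the measured 80–200 µs range.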

5 CONCLUSIONS

This paper has presented a framework that enables application- and platform-agnostic runtime management of concurrently executing applications on heterogeneous multi-core systems. This is achieved by visualising a system as three distinct layers connected by dynamic knobs and monitors that allow a range of tunable parameters and observable metrics to be exposed. Framework operation with concurrent applications has been demonstrated. The framework enables the direct comparison of competing RTM approaches, which was not previously possible, and simplifies RTM development. It also introduces very modest energy and latency overheads that have limited impact on the operation and performance of RTMs. An open-source C++ implementation is available.

In addition to the experiments presented in this paper, the framework has been used to explore temperature variability of a heterogeneous platform for reliability modelling (Tenentes et al., 2017) and to demonstrate how application knobs and monitors can provide additional opportunities for system optimisation (Leech et al., 2018a). Research is ongoing to provide further validation of the framework and to integrate additional applications, devices and RTMs.

ACKNOWLEDGEMENTS

This work was supported by the PRiME programme

grant EP/K034448/1 (http://www.prime-project.org)

and EPSRC grant EP/L000563/1.

Data supporting the results presented in this paper are openly available from the University of Southampton repository at https://doi.org/10.5258/SOTON/D0565.

An open source implementation of the framework can be found at https://github.com/PRiME-project/PRiME-Framework.

The authors would like to thank Joshua M. Levine and James R. B. Bantock for their role in the initial development of the PRiME Framework methodology and API. The authors would like to acknowledge Mohammad Sadegh Dalvandi and Basireddy Karunakar Reddy for contributions to the experimental results and the development of runtime algorithms.

REFERENCES

Baldassari, A., Bolchini, C., and Miele, A. (2017). A Dynamic Reliability Management Framework for Heterogeneous Multicore Systems. In IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems.

Das, A., Shafik, R. A., Merrett, G. V., Al-Hashimi, B. M., Kumar, A., and Veeravalli, B. (2014). Reinforcement Learning-based Inter- and Intra-application Thermal Optimization for Lifetime Improvement of Multicore Systems. In Design Automation Conference.

Fleming, S. T. and Thomas, D. B. (2014). Heterogeneous Heartbeats: A Framework for Dynamic Management of Autonomous SoCs. In International Conference on Field-Programmable Logic and Applications.

Gadioli, D., Palermo, G., and Silvano, C. (2015). Application Autotuning to Support Runtime Adaptivity in Multicore Architectures. In International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation.

Gaspar, F., Taniça, L., Tomás, P., Ilic, A., and Sousa, L. (2015). A Framework for Application-guided Task Management on Heterogeneous Embedded Systems. ACM Transactions on Architecture and Code Optimization, 12(4).

Hoffmann, H., Eastep, J., Santambrogio, M. D., Miller, J. E., and Agarwal, A. (2010). Application Heartbeats: A Generic Interface for Specifying Program Performance and Goals in Autonomous Computing Environments. In International Conference on Autonomic Computing.

Hoffmann, H., Maggio, M., Santambrogio, M. D., Leva, A., and Agarwal, A. (2013). A Generalized Software Framework for Accurate and Efficient Management of Performance Goals. In International Conference on Embedded Software.

Hoffmann, H., Sidiroglou, S., Carbin, M., Misailovic, S., Agarwal, A., and Rinard, M. (2011). Dynamic Knobs for Responsive Power-aware Computing. In International Conference on Architectural Support for Programming Languages and Operating Systems.

Kim, Y. G., Kim, M., and Chung, S. W. (2017). Enhancing Energy Efficiency of Multimedia Applications in Heterogeneous Mobile Multi-core Processors. IEEE Transactions on Computers, 66(11).

Leech, C., Bragg, G. M., Balsamo, D., Wachter, E., Merrett, G. V., and Al-Hashimi, B. M. (2018a). Application Control and Monitoring in Heterogeneous Multiprocessor Systems. In International Symposium on Reconfigurable Communication-centric Systems-on-Chip.

Leech, C., Kumar, C., Acharyya, A., Yang, S., Merrett, G. V., and Al-Hashimi, B. M. (2018b). Runtime performance and power optimization of parallel disparity estimation on many-core platforms. ACM Transactions on Embedded Computing Systems, 17(2).

Maeda-Nunez, L. A., Das, A. K., Shafik, R. A., Merrett, G. V., and Al-Hashimi, B. (2015). PoGo: An Application-specific Adaptive Energy Minimisation Approach for Embedded Systems. In HiPEAC Workshop on Energy Efficiency with Heterogeneous Computing.

Paone, E., Gadioli, D., Palermo, G., Zaccaria, V., and Silvano, C. (2014). Evaluating Orthogonality between Application Auto-tuning and Run-time Resource Management for Adaptive OpenCL Applications. In International Conference on Application-specific Systems, Architectures and Processors.

Rahmani, A. M., Haghbayan, M. H., Miele, A., Liljeberg, P., Jantsch, A., and Tenhunen, H. (2017). Reliability-aware runtime power management for many-core systems in the dark silicon era. IEEE Transactions on Very Large Scale Integration Systems, 25(2).

Reddy, B. K., Singh, A. K., Biswas, D., Merrett, G. V., and Al-Hashimi, B. M. (2017). Inter-cluster Thread-to-core Mapping and DVFS on Heterogeneous Multi-cores. IEEE Transactions on Multi-scale Computing Systems, PP(99):1–1.

Singla, G., Kaur, G., Unver, A. K., and Ogras, U. Y. (2015). Predictive dynamic thermal and power management for heterogeneous mobile platforms. In Design, Automation & Test in Europe.

Sui, X., Lenharth, A., Fussell, D. S., and Pingali, K. (2016). Proactive Control of Approximate Programs. In International Conference on Architectural Support for Programming Languages and Operating Systems.

Tenentes, V., Leech, C., Bragg, G. M., Merrett, G., Al-Hashimi, B. M., Amrouch, H., Henkel, J., and Das, S. (2017). Hardware and Software Innovations in Energy-efficient System-reliability Monitoring. In IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems.

Vassiliadis, V., Chalios, C., Parasyris, K., Antonopoulos, C. D., Lalis, S., Bellas, N., Vandierendonck, H., and Nikolopoulos, D. S. (2016). Exploiting Significance of Computations for Energy-constrained Approximate Computing. International Journal of Parallel Programming, 44(5).

Wang, Z., Tian, Z., Xu, J., Maeda, R. K. V., Li, H., Yang, P., Wang, Z., Duong, L. H. K., Wang, Z., and Chen, X. (2017). Modular Reinforcement Learning for Self-adaptive Energy Efficiency Optimization in Multicore System. In Asia and South Pacific Design Automation Conference.

Yang, S., Shafik, R. A., Merrett, G. V., Stott, E., Levine, J. M., Davis, J., and Al-Hashimi, B. M. (2015). Adaptive Energy Minimization of Embedded Heterogeneous Systems using Regression-based Learning. In International Workshop on Power and Timing Modeling, Optimization and Simulation.

PEC 2018 - International Conference on Pervasive and Embedded Computing