the context of real-time image processing on mobile devices by means of an AR manual. We examine remote execution in comparison to algorithm simplification and suggest it for smartphones and smartphone-based head-mounted displays (HMDs) such as Google Glass, both of which are subject to performance and energy constraints.
1.1 Taxonomy of Performance Augmentation
To enhance a system's performance, one can either develop better hardware (called hardware augmentation) or change the software design (software augmentation) (Abolfazli et al., 2012). Please note that in this case the term "augmentation" does not refer to the concept of Augmented Reality but to approaches used to enhance computational power (see Figure 1).
[Figure 1: Excerpt from the taxonomy of performance augmentation: augmentation splits into hardware and software; software augmentation is either autonomous (Fidelity Adaption, Resource Awareness) or server-based (Remote Execution, Remote Storage). Adapted from (Abolfazli et al., 2012).]
Fidelity Adaption, i.e. reducing the level of detail when a lower fidelity is sufficient, and Resource Awareness, i.e. using the cheapest resource that is just able to provide the needed functionality (e.g. using cheap, inaccurate cell-tower localization for a weather forecast service but expensive, accurate GPS for navigation), run entirely onboard. Remote Execution and Remote Storage rely on a supplemental remote system to enhance performance.
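As a toy illustration of Resource Awareness based on the localization example above (all names and accuracy figures are our own assumptions, not taken from (Abolfazli et al., 2012)), a task could state the accuracy it needs and receive the cheapest source that satisfies it:

// Hypothetical sketch of Resource Awareness: pick the cheapest
// location source that still satisfies the accuracy a task requires.
enum LocationSource { CELL_TOWER, WIFI, GPS } // ordered cheapest to most expensive

final class ResourceAwareLocator {

    /** Rough accuracy (in meters) each source can deliver (assumed values). */
    static int accuracyOf(LocationSource s) {
        switch (s) {
            case CELL_TOWER: return 1000; // cheap but coarse
            case WIFI:       return 50;
            case GPS:        return 5;    // precise but energy-hungry
            default: throw new AssertionError();
        }
    }

    /** Returns the cheapest source whose accuracy suffices for the task. */
    static LocationSource pick(int requiredAccuracyMeters) {
        for (LocationSource s : LocationSource.values()) {
            if (accuracyOf(s) <= requiredAccuracyMeters) {
                return s;
            }
        }
        return LocationSource.GPS; // fall back to the most accurate source
    }

    public static void main(String[] args) {
        // A weather service tolerates ~1 km error; navigation needs ~10 m.
        System.out.println(pick(1000)); // CELL_TOWER
        System.out.println(pick(10));   // GPS
    }
}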
1.2 The Example Use Case
Whether remote execution improves the processing speed depends on the actual application and the execution context (the executing mobile device, network quality, etc.). In this paper we examine mobile real-time computer vision with a prohibitively time-consuming processing pipeline. Our example is an Augmented Reality system that uses an extensive image classification approach to provide context-aware step-by-step workflow assistance (Petersen and Stricker, 2012; Petersen et al., 2013).
The idea of these so-called AR manuals, which guide the user through a procedure by displaying instructions in the user's field of view, is quite old (Caudell and Mizell, 1992). However, such manuals are still not widely used due to their work-intensive authoring process and the poor dissemination of suitable hardware (e.g. head-mounted displays).
The authors of (Petersen and Stricker, 2012) explain how the first problem can be solved by automatically deriving scene classifiers and instructions from a single reference video.
The second problem can be addressed by using mobile devices like smartphones and tablets for the presentation layer. Since the system's vision-based approach is computationally demanding, adapting it to such devices is not an easy task. We implement a simplified version of their system and compare its runtime behavior to a second implementation that uses remote execution. From this comparison we draw conclusions about the suitability of remote execution for mobile real-time computer vision.
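As an illustration of the two variants we compare, the following sketch (our own simplified Java illustration; the interface names, the HTTP exchange, and the integer result format are assumptions, not the actual implementation) hides both behind one classifier interface:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

// Illustrative only: a common interface for the two variants we compare.
interface FrameClassifier {
    /** Returns the index of the recognized workflow step for one camera frame. */
    int classify(byte[] jpegFrame) throws Exception;
}

/** Simplified onboard variant: the whole pipeline runs on the device. */
class LocalClassifier implements FrameClassifier {
    @Override public int classify(byte[] jpegFrame) {
        // ... feature extraction and scene classification on the phone ...
        return 0; // placeholder result
    }
}

/** Remote execution variant: the device only captures and transmits frames. */
class RemoteClassifier implements FrameClassifier {
    private final URL endpoint;
    RemoteClassifier(URL endpoint) { this.endpoint = endpoint; }

    @Override public int classify(byte[] jpegFrame) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) endpoint.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.getOutputStream().write(jpegFrame); // ship the frame to the server
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            return Integer.parseInt(in.readLine().trim()); // server's answer
        } finally {
            conn.disconnect();
        }
    }
}

In the remote variant a network round trip replaces the onboard computation; this is exactly the trade-off examined in our comparison.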
2 RELATED WORK
Since mobile devices like smartphones and tablets are continuously replacing stationary systems, more and more applications have to be adapted to these platforms regardless of their computational complexity. Consequently, multiple mobile applications of computer vision have been introduced in the past.
Reducing the mobile computational load by remote execution to achieve an adequate processing speed for such applications is not a new strategy: (Chun and Maniatis, 2009) distinguish different subclasses of this approach, such as Primary Functionality Outsourcing, i.e. retaining simple components on the client and offloading computationally complex ones, or Background Augmentation, i.e. offloading a huge one-time task. With a focus on image processing, (Wagner and Schmalstieg, 2003) differentiate between several client/server interaction types, such as a thin client, offloading of pose estimation, or offloading of both pose estimation and classification.
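These split points can be summarized as follows (our own Java shorthand, not notation from the cited work):

// Our shorthand for the client/server interaction types distinguished by
// (Wagner and Schmalstieg, 2003); each constant notes what the server does.
enum InteractionType {
    /** Client only captures and displays frames; the server does everything. */
    THIN_CLIENT,
    /** The server estimates the camera pose; the rest stays on the client. */
    OFFLOAD_POSE_ESTIMATION,
    /** The server performs both pose estimation and classification. */
    OFFLOAD_POSE_AND_CLASSIFICATION
}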
Early work in mobile AR with remote execution includes (Regenbrecht and Specht, 2000; Gausemeier et al., 2003), both using the client solely as an image source (thin client) and performing all processing steps on the server. With the improvement of mobile hardware it became feasible to involve the client in the computation to reduce network load and overall processing time. The client in the system of (Gammeter et al., 2010) uses object tracking to minimize the number of requests to the object recognition server. (Kumar et al., 2012) propose a client that performs both image tracking and feature extraction before sending a request, but they do not target interactive frame rates.
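The request-minimizing strategy of such systems can be sketched as follows (all types are hypothetical placeholders, not the cited authors' code): a cheap local tracker gates the expensive recognition requests, which are only issued when tracking is lost.

interface ObjectTracker {
    /** Updates the tracker with a new frame; returns false if tracking is lost. */
    boolean update(byte[] frame);
    String currentLabel();
    void reinitialize(byte[] frame, String label);
}

interface RecognitionServer {
    /** Expensive network round trip to the object recognition server. */
    String recognize(byte[] frame) throws Exception;
}

class TrackingGatedRecognizer {
    private final ObjectTracker tracker;    // cheap, runs on every frame
    private final RecognitionServer server; // costly, contacted only when needed

    TrackingGatedRecognizer(ObjectTracker tracker, RecognitionServer server) {
        this.tracker = tracker;
        this.server = server;
    }

    /** Processes one frame; contacts the server only when tracking fails. */
    String processFrame(byte[] frame) throws Exception {
        if (tracker.update(frame)) {
            return tracker.currentLabel(); // still locked on: no request needed
        }
        String label = server.recognize(frame); // tracking lost: ask the server
        tracker.reinitialize(frame, label);
        return label;
    }
}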