Efﬁcient Resource Allocation for Sparse Multiple Object Tracking

Rui Figueiredo

, Jo

ao Avelino

, Atabak Dehban

, Alexandre Bernardino

, Pedro Lima

and Helder Ara

ujo

Institute for Systems and Robotics, Instituto Superior T

ecnico, Lisboa, Portugal

Institute for Systems and Robotics, Universidade de Coimbra, Coimbra, Portugal

{ruiﬁgueiredo, joao.avelino, atabak, alex, pal}@isr.tecnico.ulisboa.pt, helder@isr.uc.pt

Keywords:

Active Sensing, Constrained Resource Allocation, Multiple Object Tracking.

Abstract:

In this work we address the multiple person tracking problem with resource constraints, which plays a funda-

mental role in the deployment of efﬁcient mobile robots for real-time applications involved in Human Robot

Interaction. We pose the multiple target tracking as a selective attention problem in which the perceptual agent

tries to optimize the overall expected tracking accuracy. More speciﬁcally, we propose a resource constrained

Partially Observable Markov Decision Process (POMDP) formulation that allows for real-time on-line plan-

ning. Using a transition model, we predict the true state from the current belief for a ﬁnite-horizon, and take

actions to maximize future expected belief-dependent rewards. These rewards are based on the anticipated

observation qualities, which are provided by an observation model that accounts for detection errors due to

the discrete nature of a state-of-the-art pedestrian detector. Finally, a Monte Carlo Tree Search method is em-

ployed to solve the planning problem in real-time. The experiments show that directing the attentional focci

to relevant image sub-regions allows for large detection speed-ups and improvements on tracking precision.

1 INTRODUCTION

Developing efﬁcient adaptive sensing systems that are

capable of dealing with computational and power lim-

itations as well as timing requirements is of the ut-

most importance in a wide range of ﬁelds, including

automatic surveillance (Sommerlade and Reid, 2010),

sports analysis (Wang and Parameswaran, 2004) and

human-robot interaction (HRI) (Mihaylova et al.,

2002).

In multiple object tracking with resource con-

straints scenarios, the observer’s goal is to predict the

best regions in the visual ﬁeld to attend, in the quest

to evaluate if they pertain to a given set of persons of

interest, and thus to prune the visual search space by

ﬁltering out irrelevant image locations. Current state-

of-the art object detection algorithms are based on ex-

haustive search, sliding window approaches, which

are typically inefﬁcient and agnostic to top-down tem-

poral context.

In this work, we propose a probabilistic frame-

work which poses the multiple object tracking-by-

detection problem as an on-line, resource constrained

decision making, aimed at minimizing the com-

bined targets’ state uncertainty, while coping with

computational processing limitations (see Figure 1).

More speciﬁcally, we pose our decision framework

within the Partially Observable Markov Decision

Processes (POMDPs) domain in order to account for

non-deterministic dynamics and partially observable

states. The derived dynamic resource allocation deci-

sion process combines prior knowledge about the tar-

gets’ state dynamics with accumulated probabilistic

information provided from sequentially gathered ob-

servations, in order to optimize multiple target loca-

tion estimation precision (i.e. minimize tracking un-

certainty). In the proposed formulation, actions are

taken from a low dimensional binary space. This

allows for ﬁnding decision policies in real-time us-

ing on-line, tree-based, planning algorithms for ﬁnite

horizon POMDPs (Ross et al., 2008). Our framework

relies on object detections with associated conﬁdence

measures, obtained from visual information, that are

used to drive the observer’s attentional focus during

multiple object tracking.

Our main contributions are the following. First,

we model the state-dependent uncertainty that arises

during detection due to the discrete nature of the slid-

ing window based detector. Then, we apply an online

Monte Carlo Tree Search method to solve the plan-

ning problem in real-time. The computational bene-

ﬁts of our methodology are demonstrated in a multi-

300

Figueiredo R., Avelino J., Dehban A., Bernardino A., Lima P. and AraÃžjo H.

Efﬁcient Resource Allocation for Sparse Multiple Object Tracking.

DOI: 10.5220/0006173103000307

In Proceedings of the 12th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2017), pages 300-307

ISBN: 978-989-758-227-1

ple person tracking scenario, by combining it with a

state-of-the-art pedestrian detection algorithm (Doll

et al., 2014). Moreover, we note that the proposed

decision making pipeline can be combined with any

general object detection algorithm. All the method-

ologies have been implemented in C++ to make them

suitable for video surveillance or real-time applica-

tions involving robotic platforms provided with vi-

sion.

The remainder of this paper is structured as fol-

lows. In section II we overview some related work

available in the literature. In section III we de-

scribe the various components involved in the pro-

posed adaptive tracking pipeline. In section IV we as-

sess the proposed methodology performance by eval-

uating the balance between efﬁciency (low computa-

tional requirements) and effectiveness in multiple ob-

ject tracking task-execution. Finally, in section V we

wrap up with some conclusions and future work.

2 BACKGROUND

Adaptive sensing is a trendy topic in many ar-

eas including computer vision (Gedikli et al.,

2007), robotics (Spaan et al., 2015) and neuro-

science (Van Rooij, 2008). Like biological systems,

artiﬁcial systems are equipped with limited compu-

tational and energetic resources, and thus, modeling

and replicating the observed mechanisms of selective

attention in humans, is of primordial importance to

develop more efﬁcient and robust strategies for visual

tasks.

Current state-of-the art object detectors are based

on expensive binary classiﬁers which typically op-

erate over the full image space, in a sliding win-

dow manner. When combined with fast bottom-

up saliency-based approaches that generate object

bounding box proposals, the overall detection process

becomes more efﬁcient (Zitnick and Doll

ar, 2014),

since regions that are unlikely to contain objects are

discarded for further processing. However, these ap-

proaches are agnostic to object dynamics, and are

solely based on low-level visual features.

Resource-constrained adaptive sensing, is within

a different line of research, and accounts for dynam-

ical uncertain environments and noisy sensors for se-

quential decision making. The temporal integration

of continuously gathered noisy detections is used to

predict future environment states and decide, in a top-

down manner, where to allocate the limited sensing

resources, according to some task-related goal. It has

been shown that adaptive sensing improves not only

processing efﬁciency but also estimation robustness

when compared to non-adaptive approaches (Malloy

and Nowak, 2014).

Adaptive sensing problems can be formulated as

POMDPs (Ahmad and Yu, 2013)(Butko and Movel-

lan, 2010)(Chong et al., 2008) that, depending on the

way they compute the policies, belong to two differ-

ent paradigms: Ofﬂine methods compute full policies

before run time. Despite achieving remarkable perfor-

mance in visual search tasks, these often require the

evaluation of many possible situations, via backward

induction, and hence take a considerable amount of

time (e.g. hours). Online decision approaches avoid

the computational burden of computing full policies

for many situations, by departing from the current be-

lief state and simulating future rewards for a ﬁnite

planning horizon (Ross et al., 2008).

Within the online POMDP domain, the work clos-

est to ours is the one in (Chong et al., 2008), which

proposed a formulation for general adaptive sensing

problems. The authors applied rollout techniques

which are guaranteed to improve upon a provided

base policy, that may be hard or impossible to com-

pute. Rollout techniques evaluate the candidate ac-

tions, by running many Monte-Carlo simulations and

returning the action with the best average outcome.

In this work we rely on a different, widely

known algorithm named Monte Carlo Tree Search

(MCTS) (Browne et al., 2012), which has recently

been given much attention by the Artiﬁcial Intelli-

gence community due to its outstanding performance

in the game Go (Gelly et al., 2012). MCTS combines

tree search with randomized rolllout simulations, be-

ing ideal for decision making under uncertainty. To

our knowledge we are the ﬁrst to apply an online

tree-based POMDP solver in a stochastic resource-

constrained multiple object tracking scenario.

3 ADAPTIVE SENSING:

PROBABILISTIC MULTIPLE

PERSON TRACKING UNDER

RESOURCE CONSTRAINTS

A POMDP for general active sensing can be deﬁned

as a 6-element tuple (X , A,Y , T,O,R) where X , A

and Y denote the set of the possible environment

states, perceptual actions and observations, respec-

tively. State transitions are modeled as a Markov

process and represented by the probability distribu-

tion function (pdf) T (x

t−1

) = p(x

t−1

). Observa-

tions are generated from states according to the pdf

O(x

) = p(y

Under the resource-constrained adaptive sensing

Efﬁcient Resource Allocation for Sparse Multiple Object Tracking

301

Search Regions Proposals

...

Sliding Window-Based Detections

Probabilistic Multiple Pedestrian Tracking

...

Figure 1: The proposed resource-constrained multiple pedestrian tracking pipeline. Given a set of persons being tracked, our

decision making algorithm decides which sub-regions of the visual scene to attend. Then, a sliding window-based detector

is applied to the selected search regions, instead of the whole image. For each region a winning candidate is obtained via

maximum suppression and fed to the associated tracker with probabilistic measures queried from the observation model.

domain, the goal of the planning agent is to de-

vise control strategies that generate perceptual actions

from belief states, such that some intrinsic cumulative

reward is maximized, while accounting for perceptual

limitations. In the rest of this section we describe our

resource-constrained POMDP formulation for multi-

ple pedestrian tracking scenarios.

Let us consider a set of targets indexed by K =

{1,...,K}, being tracked in a 2D image plane I , with

state x

∈ X ⊂ R

given by

k,c

k,s

(1)

where x

k,c

= (u, v) and x

k,s

represent the bounding box

centroid image coordinates and scale, respectively.

Moreover, let us assume a stationary Markov chain

p(x

t−1

) in order to model the object’s state transi-

tion between consecutive frames. Similarly to (Be-

wley et al., 2016) we assume sparsity-in-space and

independence among targets, and a linear constant-

velocity dynamics model, which is a good approxima-

tion for targets that move with low acceleration in 3D

and are not too close to the image plane. Finally, we

assume that the targets’ states are partially observable

and statistically explained by the observation model

distribution p(y

3.1 Recursive Bayesian Estimation

Object tracking can be achieved by means of recursive

Bayesian estimation, according to

def

=p(x

)

=ηp(y

)

(2)

where b

represents the belief posterior probability

over the target state x

, given the set of all gathered

observations y

taken up to time t, η is a normaliz-

ing factor and

p(x

t−1

(3)

represents the belief after the prediction step. Further-

more, we assume Gaussian state transition and ob-

servation noises and hence tracking is optimally per-

formed using K independent Kalman ﬁlters. At each

time instant, each Kalman ﬁlter provides a parametric

posterior probability distribution function (pdf) over

the target state

= N ( ˆx

,Σ

) (4)

where

ˆx

k,c

ˆx

k,s

(5)

is the estimated state and

k,c

0 σ

k,s

(6)

is the error covariance matrix. Note that here we con-

sider a diagonal covariance matrix and aggregate the

centroid components in order to ease the notation.

3.2 Observation Model

The observations provided by the object detector are

localized bounding boxes, obtained with a pedestrian

detection algorithm. More speciﬁcally, at each time

instant the agent collects a set of observations

,k = 1,...,K

(7)

VISAPP 2017 - International Conference on Computer Vision Theory and Applications

302

each corresponding to a noisy projection of the k tar-

get state.

Detection noise has several origins, the easiest to

model being the one originated by the discrete nature

of the detector. The noise affecting the center of a

bounding box ε

k,c

has two origins, both depending

on the scale of the bounding boxes: ε

, the error due

to the sliding window process and ε

, the error due to

the uncertainty of the size of the bounding box. The

value of sliding window jumps Q

depends on the

scale of the detection:

= s

(0) (8)

where Q

is the number of pixels between two con-

secutive sliding window positions at scale n and s

the value of scale n, deﬁned as:

= 2

(9)

where N is the number of scales per octave. The

present implementation of the detector has N = 8.

The value of the jumps of the bounding box

center-bottom due to scale change, depend on the

scale. The number of pixels is given by Q

scx

and

scy

, for the x and y coordinates, respectively. In the

worst case scenario, a jump from the actual scale to

the coarsest one, these values are given by:

scx

= w

n+1

− 2

)/2 (10)

scy

= h

n+1

− 2

)/2 (11)

where w

and h

are the width and the height of the

smallest bounding box (n = 0).

Assuming a Gaussian distribution for these quan-

tization errors, the statistics of ε

are given by





, Σ

)

0 (Q

)

(12)

Regarding ε

, we approximate the statistics of

these errors by the worst case which is given by





, Σ

≈

scx

)

0 (Q

scy

)

(13)

Since both sources of noise are independent but

not additive, our observation model considers the

largest one at each time. This yields the ﬁnal image

observation error ε

∼ N (0,Σ

) (14)

where

= max(Σ

,Σ

) (15)

3.3 Dynamic Search Regions

Let us now consider different time-varying (dynamic)

regions of interest (i.e. bounding boxes) to be at-

tended, each delimiting a target instance hypothesis

[

k∈K

where u

⊂ X (16)

Search regions are deterministically and analytically

determined from beliefs according to the following

mapping function

ˆx

,Σ

→ u

(17)

which is deﬁned as follows

ˆx

k,c

− α

k,c

, ˆx

k,c

+ α

k,c

× (18)

ˆx

k,s

− α

k,s

, ˆx

k,s

+ α

k,s

(19)

where α

and α

are user selected parameters that

control the width of the conﬁdence bounds and thus

the size of the search regions. This deﬁnition accounts

for the conﬁdence level of the true target state being

within the search region. The user selected parame-

ters permit balancing the trade-off between accuracy

and allocation effort (larger vs smaller regions).

Furthermore, we assume that each region has a de-

terministic, time-varying binary activation state

A = {a

∈ B, k ∈ K } = B

(20)

where B = {0, 1} with 0 and 1 meaning ”not process-

ing” and ”processing”, respectively. Decision making

is therefore performed in a ﬁnite multi-dimensional

binary action space and involves selecting which sub

regions of the image space to apply the sliding win-

dow detector to perform measurement update steps.

The belief becomes dependent on actions as follows

) =

(

if a

= 0

ηp(y

)

if a

= 1

(21)

where η is a normalizing constant. For attended re-

gions, the predicted belief is approximated by the ex-

pected expected observation uncertainty given by the

observation model, over a ﬁnite set of space points

corresponding to detection windows Y

⊂ X in the

search region k, according to

) ≈ c

∑

i=1

p(y

k,i

)

k,i

if a

= 1 (22)

where c is a normalizing constant, |Y

| is the num-

ber of detection windows and

k,i

= p(x

). Each

p(y

k,i

) is queried on-line from the learned observa-

tion model. Assessing multiple x

∈ u

instead of just

ˆx

should better approximate the error distribution.

Efﬁcient Resource Allocation for Sparse Multiple Object Tracking

303

3.4 Resource Constrained POMDP with

Belief-dependent Rewards

As previously noted the decision making involved in

resource constrained multiple target tracking scenar-

ios can be formulated within the POMDP framework.

The perceptual agent tries to minimize tracking uncer-

tainty by prioritizing its limited attentional resources

to promising image regions. The instantaneous re-

ward function should thus reﬂect the action contri-

bution to maximizing the information regarding the

targets’ states. Similarly to (Araya et al., 2010) let us

deﬁne the instantaneous reward at time t as the nega-

tive entropy of the belief state, given by the following

expectation

r(b

)) =

logb

(23)

For Gaussian beliefs this reward becomes simply

given by

r(b

)) ≈ − log(|Σ

|) (24)

Inspired by the evidence of visual processing ca-

pacity limitations in humans (Xu and Chun, 2009), we

formulate the proposed resource constrained informa-

tion maximization as follows:

maximize

= E

∑

τ=1

∑

k=1

r(b

t+τ

))

subject to

∑

k=1

t+τ

≤ K

max

∀

τ∈{1,...,T }

∑

k=1

t+τ

|≤ S

max

∀

τ∈{1,...,T }

where T is the planning horizon, E [·] is the expecta-

tion operation, r(·) is the reward function, γ ∈ ]0,1]

is a discount factor, |u

t+τ

| is the area of the k search

region, K

max

is the maximum region-based activation

capacity, A

is the image pixel area and S

max

is the rel-

ative maximum image area that the visual system may

process per time-instant. The ﬁrst constraint reﬂects

short-term memory limitations and allows reducing

the action space (assuming K

max

< K), and thus the

branching factor during planning. The second is mo-

tivated by computational effort and timing limitations

that arise during visual processing and contributes to

prune infeasible planning tree branches, by prioritiz-

ing resources to higher uncertainty targets.

3.5 Monte Carlo Tree Search (MCTS)

The MCTS algorithm relies on Monte-Carlo simula-

tions to assess the nodes of a search tree in a best-

ﬁrst order, by prioritizing the expansion of the most

promising nodes according to their expected reward.

In a nutshell, the algorithm runs Monte Carlo sim-

ulations from the current belief state (i.e. input root

node), and progressively builds a tree of belief states

and outcomes. In the end, the most promising ac-

tion is returned. Each run comprises four phases (see

Fig. 2):

1. Selection: In the selection step a sequence of

actions are chosen within the search tree. Tree

descending is performed from the root until a

leaf node is reached. Action selection is typi-

cally carried out using an algorithm named Up-

per Conﬁdence Bounds for Trees (UCT) (Kocsis

and Szepesv

ari, 2006), which elegantly balances

the exploration-exploitation trade-off, during ac-

tion selection. On the one hand, based on the cur-

rent accumulated simulated knowledge, the plan-

ning agent should select actions that may lead to

the best immediate payoffs (exploitation). On the

other hand, the agent should select unexplored ac-

tions since they may yield better long-term out-

comes;

2. Expansion: an action that leads to an unvisited

node is selected and the resulting expanded leaf

node is appended to the tree;

3. Simulation: From the expanded node, actions are

taken randomly in a Monte-Carlo depth-ﬁrst man-

ner, until a predeﬁned horizon or a terminal state

is reached. Simulation depth (i.e. time hori-

zon) is typically ﬁxed, to deal with real-time con-

straints. Since sampling from a uniform distri-

bution over actions may be suboptimal, problem

speciﬁc knowledge should be incorporated to give

larger sampling probabilities to more promising

actions. In our speciﬁc problem, we bias this sam-

pling distribution such that regions with higher

entropy are prioritized.

4. Back-propagation: Finally, the simulation re-

wards are back-propagated to the root node. This

includes updating the reward rate stored at each

node along the way.

Finally, runs are repeated until a computational bud-

get (i.e. a triggering timeout or a maximum number

of iterations) is reached, and the best action from the

root node is selected.

3.5.1 Upper Conﬁdence Bounds for Trees (UCT)

The idea of using Upper Conﬁdence Bounds (Auer

et al., 2002) on rewards to deal with the exploration

exploitation dilemma in the face of uncertainty, has

VISAPP 2017 - International Conference on Computer Vision Theory and Applications

304

Figure 2: Monte Carlo Tree Search (image taken from (Browne et al., 2012)).

been widely applied to reinforcement learning prob-

lems. In MCTS, Upper Conﬁdence Bounds for Trees

(UCT) are typically employed in the selection phase,

while descending the tree. The upper conﬁdence

bound accounts for the currently estimated value of

the action, and the estimated UCT variance, accord-

ing to

UCT (a) = r + c

logn

(25)

where r is the estimate for the value of the action

based on the simulated payoffs, n

is the number of

times the node has been visited, and n

is the num-

ber of times an action a has been tried from that node.

The constant c is a problem-dependent parameter that

balances the exploration-exploitation trade-off.

4 EXPERIMENTS

In order to evaluate the proposed resource-

constrained tracking approach we performed a

set of experiments on the TUD-Stadtmitte MOTChal-

lenge dataset (Leal-Taix

e et al., 2015), which allows

to evaluate tracking performance with the CLEAR

MOT metrics and known ground truth (Bernardin

and Stiefelhagen, 2008). This dataset comprises a

video sequence of 179 images, acquired with a static

camera with 640 × 480 image resolution. An average

of 8 pedestrians are present in the visual ﬁeld, during

the video. To quantitatively assess the performance

of our methodologies we focused our evaluation in

the time speed-up gains and in the multiple object

tracking precision (MOTP), which is the total error

in estimated position for matched object-hypothesis

pairs over all frames, averaged by the total number of

matches:

MOTP =

∑

i,t

∑

(26)

where d

∈ [0, 100] quantiﬁes the amount of overlap

(in percentage) between the true object o

and its as-

sociated hypothesis bounding boxes, and where c

the number of matches found for time t. The MOTP

shows the ability of the tracker to keep consistent tra-

jectories.

Our aim was to investigate the performance of the

proposed methodologies dependency on the resource-

constraints. We considered the following activa-

tion capacities K

max

∈ {1,2,3,4,5} and maximum

processing image areas S

max

∈ [0.1,1.0]. Since the

MCTS method is randomized, we performed 100 tri-

als for each combination of parameters. The region

size parameters where found empirically and were set

to α

= α

= 1. At each time step, the MCTS plan-

ning root node was set to the current tracking belief,

and the algorithm was allowed to run for 10ms. Fi-

nally, the simulation step depth was set to 3 and γ

was set to 0.9. The association between detections

and trackers was performed with the Hungarian Al-

gorithm (Burkard et al., 2009) using the Mahalanobis

distance. The tracking process is bootstrapped in the

ﬁrst frame, by applying the pedestrian detector to the

whole image and instantiating a tracker for each de-

tection. These trackers are kept during the entire

video sequence, and every non-assigned detection is

discarded, i.e., trackers are not further created.

The results presented in Figure 3 demonstrate that

planning future resource allocations in a constrained

setting, improves simultaneously detection times and

tracking precision, when compared with the baseline,

Efﬁcient Resource Allocation for Sparse Multiple Object Tracking

305

max

1 2 3 4

Gain

max

=1.0

max

=0.7

max

=0.4

max

=0.1

max

0.2 0.4 0.6 0.8 1

Gain

max

1 2 3 4

MOTP (%)

max

0.2 0.4 0.6 0.8 1

MOTP (%)

Figure 3: Speed-up gains and resulting Multiple Object Tracking Precision (MOTP). Bottom-row: Dashed black line repre-

sents the baseline full-window detector.

full-window detector.

As illustrated by the temporal gain plots (ﬁrst row

of Figure 3), our method achieves detection times

around 12 times faster than the baseline detector

applied to the full-window (0.02 against 0.24 sec-

onds, for S

max

= 0.1), with comparable tracking per-

formance. Furthermore, the MOTP metric results

demonstrate that, on the one hand, constraining the at-

tention to regions with high probability of pertaining

a person, allows to improve detection accuracy and to

reduce the possibility of erroneous detections in the

targets’ vicinities, which may lead to bad detection-

tracker associations and hence degrade tracking preci-

sion. On the other hand, ignoring regions that are un-

likely to contain a person allows to reduce the number

of spurious wrong detections (i.e. False positives) that

may also contribute to tracking performance degrada-

tion.

In conclusion, in the constrained setting the allo-

cation of more computational resources yields better

tracking precision, at the cost of increased computa-

tional effort. Therefore, depending on the application

requirements, this trade-off can be easily balanced

by carefully selecting the K

max

and S

max

resource-

constraints.

5 CONCLUSIONS AND FUTURE

WORK

In this paper we have addressed the multiple object

tracking (MOT) problem with constrained resources,

which plays a fundamental role in the deployment of

efﬁcient mobile robots for real-time applications in-

volved in HRI. We have framed the multiple object

tracking within the POMDP domain and proposed

a problem formulation that allows for on-line, real-

time, planning with a state-of-the-art Monte Carlo

Tree Search methodology. The results presented in

this work show that directing the attentional focci to

important image sub-regions allows for large detec-

tion speed-ups improvements on tracking precision.

The major limitation of our approach is still its

incapacity of dealing with non-sparse targets. In the

future, data association should also be considered dur-

ing planning by integrating data association method-

ologies such as joint probabilistic data-association

(JPDA) (Hamid Rezatoﬁghi et al., 2015). Another

shortcoming of our methodology is its incapacity of

locating new pedestrians appearing on the scene, in

an efﬁcient manner. However, this can be easily over-

come by considering proposals generated by bottom-

up saliency methods.

Finally, we note that the targets’ dynamics and the

observation distributions are extremely non-linear and

non-Gaussian. Therefore, a mixture of particle ﬁl-

ters (Okuma et al., 2004) would be more appropriate

VISAPP 2017 - International Conference on Computer Vision Theory and Applications

306

for our particular problem, and hence improve track-

ing accuracy at the cost of some additional computa-

tional effort.

ACKNOWLEDGMENT

This work has been partially supported by the

Portuguese Foundation for Science and Tech-

nology (FCT) project [UID/EEA/50009/2013].

Rui Figueiredo is funded by FCT PhD grant

PD/BD/105779/2014.

REFERENCES

Ahmad, S. and Yu, A. J. (2013). Active sensing as

bayes-optimal sequential decision making. CoRR,

abs/1305.6650.

Araya, M., Buffet, O., Thomas, V., and Charpillet, F.

(2010). A pomdp extension with belief-dependent re-

wards. In Advances in Neural Information Processing

Systems, pages 64–72.

Auer, P., Cesa-Bianchi, N., and Fischer, P. (2002). Finite-

time analysis of the multiarmed bandit problem. Ma-

chine learning, 47(2-3):235–256.

Bernardin, K. and Stiefelhagen, R. (2008). Evaluating mul-

tiple object tracking performance: the clear mot met-

rics. EURASIP Journal on Image and Video Process-

ing, 2008(1):1–10.

Bewley, A., Ge, Z., Ott, L., Ramos, F., and Upcroft, B.

(2016). Simple online and realtime tracking. CoRR,

abs/1602.00763.

Browne, C. B., Powley, E., Whitehouse, D., Lucas, S. M.,

Cowling, P. I., Rohlfshagen, P., Tavener, S., Perez,

D., Samothrakis, S., and Colton, S. (2012). A survey

of monte carlo tree search methods. IEEE Transac-

tions on Computational Intelligence and AI in Games,

4(1):1–43.

Burkard, R., Dell’Amico, M., and Martello, S. (2009). As-

signment Problems. Society for Industrial and Ap-

plied Mathematics, Philadelphia, PA, USA.

Butko, N. J. and Movellan, J. R. (2010). Infomax control

of eye movements. Autonomous Mental Development,

IEEE Transactions on, 2(2):91–107.

Chong, E. K., Kreucher, C. M., and Hero, A. O. (2008).

Monte-carlo-based partially observable markov deci-

sion process approximations for adaptive sensing. In

Discrete Event Systems, 2008. WODES 2008. 9th In-

ternational Workshop on, pages 173–180. IEEE.

Doll

ar, P., Appel, R., Belongie, S., and Perona, P. (2014).

Fast feature pyramids for object detection. PAMI.

Gedikli, S., Bandouch, J., von Hoyningen-Huene, N.,

Kirchlechner, B., and Beetz, M. (2007). An adaptive

vision system for tracking soccer players from vari-

able camera settings. In Proceedings of the 5th In-

ternational Conference on Computer Vision Systems

(ICVS).

Gelly, S., Kocsis, L., Schoenauer, M., Sebag, M., Silver,

D., Szepesv

ari, C., and Teytaud, O. (2012). The grand

challenge of computer go: Monte carlo tree search and

extensions. Communications of the ACM, 55(3):106–

113.

Hamid Rezatoﬁghi, S., Milan, A., Zhang, Z., Shi, Q., Dick,

A., and Reid, I. (2015). Joint probabilistic data asso-

ciation revisited. In Proceedings of the IEEE Interna-

tional Conference on Computer Vision, pages 3047–

3055.

Kocsis, L. and Szepesv

ari, C. (2006). Bandit based monte-

carlo planning. In European conference on machine

learning, pages 282–293. Springer.

Leal-Taix

e, L., Milan, A., Reid, I., Roth, S., and Schindler,

K. (2015). MOTChallenge 2015: Towards a bench-

mark for multi-target tracking. arXiv:1504.01942

[cs]. arXiv: 1504.01942.

Malloy, M. L. and Nowak, R. D. (2014). Near-optimal

adaptive compressed sensing. IEEE Transactions on

Information Theory, 60(7):4001–4012.

Mihaylova, L., Lefebvre, T., Bruyninckx, H., Gadeyne, K.,

and Schutter, J. D. (2002). Active sensing for robotics

- a survey. In in Proc. 5 th Intl Conf. On Numerical

Methods and Applications, pages 316–324.

Okuma, K., Taleghani, A., De Freitas, N., Little, J. J., and

Lowe, D. G. (2004). A boosted particle ﬁlter: Multi-

target detection and tracking. In European Conference

on Computer Vision, pages 28–39. Springer.

Ross, S., Pineau, J., Paquet, S., and Chaib-Draa, B. (2008).

Online planning algorithms for pomdps. Journal of

Artiﬁcial Intelligence Research, 32:663–704.

Sommerlade, E. and Reid, I. (2010). Probabilistic surveil-

lance with multiple active cameras. In Robotics and

Automation (ICRA), 2010 IEEE International Confer-

ence on, pages 440–445. IEEE.

Spaan, M. T., Veiga, T. S., and Lima, P. U. (2015).

Decision-theoretic planning under uncertainty with

information rewards for active cooperative percep-

tion. Autonomous Agents and Multi-Agent Systems,

29(6):1157–1185.

Van Rooij, I. (2008). The tractable cognition thesis. Cogni-

tive science, 32(6):939–984.

Wang, J. R. and Parameswaran, N. (2004). Survey of sports

video analysis: research issues and applications. In

Proceedings of the Pan-Sydney area workshop on Vi-

sual information processing, pages 87–90. Australian

Computer Society, Inc.

Xu, Y. and Chun, M. M. (2009). Selecting and perceiving

multiple visual objects. Trends in cognitive sciences,

13(4):167–174.

Zitnick, C. L. and Doll

ar, P. (2014). Edge boxes: Locating

object proposals from edges. In ECCV.

Efﬁcient Resource Allocation for Sparse Multiple Object Tracking

307