Pluggable Drone Imaging Analysis Framework for Mob Detection during
Open-air Events
Jerico Moeyersons, Brecht Verhoeve, Pieter-Jan Maenhaut, Bruno Volckaert and Filip De Turck
IDLab, Department of Information Technology at Ghent University - imec
Zwijnaarde-Technologiepark 15 B-9052 Ghent, Belgium
Keywords:
Drone Thermal Imaging, Video Streaming, Framework, Microservices, Object Detection, Plugin.
Abstract:
Drones and thermal cameras are often combined within applications such as search and rescue, and fire fight-
ing. Due to vendor specific hardware and software, applications for these drones are hard to develop and
maintain. As a result, a pluggable drone imaging analysis architecture is proposed that facilitates the develop-
ment of custom image processing applications. This architecture is prototyped as a microservice-based plugin
framework and allows users to build image processing applications by connecting media streams using mi-
croservices that connect inputs (e.g. regular or thermal camera image streams) to image analysis services. The
prototype framework is evaluated in terms of modifiability, interoperability and performance. This evaluation
has been carried out on the use case of detecting large crowds of people (mobs) during open-air events. The
framework achieves the modifiability and performance requirements, being able to work in soft real-time, and achieves interoperability with an average successful exchange ratio of 99.998%. A new dataset containing thermal
images of such mobs is presented, on which a YOLOv3 neural network is trained. The trained model is able to
detect mobs on new thermal images in real-time achieving frame rates of 55 frames per second when deployed
on a modern GPU.
1 INTRODUCTION
Throughout history, having an overview of the environment from high viewpoints has held many benefits. Nowadays, drones are relatively cheap and provide a rapid way to get an overview of a specific area.
The advent of drones and advanced cameras provides
low-cost aerial imaging that creates numerous novel
application opportunities. Traditional cameras for the
visible light spectrum offer high quality images, but
are limited to daytime or artificially lighted scenes.
Thermal cameras measure thermal radiation of ob-
jects in a scene and can therefore operate in complete darkness, revealing information not visible to the naked eye (Gade and Moeslund, 2014). The combination of
drones and thermal cameras is used in many different
applications such as geography (Harvey et al., 2016),
agriculture (Bendig et al., 2012), search and res-
cue (Rivera et al., 2017), wildlife monitoring (Chris-
tiansen et al., 2014), disaster response (Gonnissen,
2016), maintenance (Workswell, 2016), etc.
Several vendors offer thermal camera products,
some specifically designed for drone platforms.
These cameras often use different image formats,
color schemes and interfaces (Gade and Moeslund,
2014). This leads to issues when an application needs to switch to a different camera, or when a camera is no longer supported by its vendor, because new software must be built to interact with the new camera, often at a non-negligible cost. This results in vendor lock-in, which makes customers dependent on a certain vendor as they cannot switch products without incurring substantial costs, a problem already very tangible for cloud-based applications (Satzger et al., 2013). Applications across various fields often have slightly different functional and non-functional requirements. As a precursor, several Belgian firefighting departments were asked for requirements for a thermal drone platform application.
It quickly became clear that they had various prob-
lems that needed to be solved, such as finding zones
with potential danger of explosions (due to the combi-
nation of high temperature and explosive goods stored
in the vicinity), measuring temperatures in containers,
identifying hot entrances (to counter backdraft e.g.
when opening a door), detecting invisible methane
fires, finding missing persons, etc. Some use cases
need to be evaluated in real-time (during fires), while
others focus on accuracy (e.g. inspections). An appli-
cation should therefore be able to quickly implement
new detection and analysis features to meet these re-
quirements. Due to the lack of modifiability present
in current software solutions, applications built on
top of aerial thermal imaging remain largely niche
and vendor/equipment-locked (Divya, 2017). Appli-
cations would benefit from a software framework fo-
cused on modifiability and interoperability, to aid in
developing technology-agnostic end-to-end solutions
connecting thermal cameras to different image analy-
sis / detection modules.
This paper presents the requirements for such a
framework and proposes a suitable software architec-
ture. To test the viability of the architecture, a proof-
of-concept prototype is implemented and evaluated
against the initial requirements. As a use case, de-
tection of large crowds (so-called mobs) during open-
air events is investigated. Monitoring crowds dur-
ing open-air events is important, as mobs can cre-
ate potentially dangerous situations through bottle-
necks, blocking escape routes, etc. Detection of mobs
can also be used for rapid identification of fights
and brawls. Through monitoring and detecting these
mobs, these situations can be identified before they
become problematic (Steffen and Seyfried, 2010).
The remainder of this paper is organized as fol-
lows. Section 2 presents similar projects on the topic
of modifiable image analysis frameworks and thermal
object detection. Section 3 presents the requirements
of the framework and the software architecture de-
signed from these requirements. Section 4 presents
the implementation of the framework prototype. The
mob detection experiment is described in Section 5.
Evaluation of the framework and results of the mob
detection experiment are presented in Section 6. Fi-
nally, Section 7 draws conclusions from this research
and indicates where future efforts in this field should be directed.
2 RELATED WORK
The Irish start-up DroneSAR (Slattery, 2017) devel-
oped a search-and-rescue (SAR) drone platform al-
lowing users to stream live images and video from a
drone as it conducts a search for missing persons. The
platform works with any camera, visual or thermal, but focuses on drones from vendor DJI, DroneSAR's industry partner. The solution proposed in this paper is not restricted to DJI drones and can be employed for different use cases.
Amazon introduced the Amazon Kinesis Video Streams platform in January 2018 as a
new service for the Amazon Web Services (AWS)
cloud platform. It allows users to stream live video
from devices to the AWS cloud and build applications
for real-time video processing (Inc., 2018). While the
Amazon Kinesis Video Streams platform certainly has interesting features, it imposes a vendor lock-in as it
relies solely on Amazon Cloud services, whereas the
open-source solution proposed here can be deployed
on a variety of (cloud) resources, and it can even be deployed locally when there is no uplink to the cloud available.
The VIPER project by EAVISE and KU Leu-
ven investigated how thermal and video images could
be used for real-time detection of persons using
object detection algorithms based on deep learn-
ing (Goedeme, 2017). Our work evaluated a mob de-
tection experiment based on the YOLOv3 (Redmon
and Farhadi, 2018) object detection algorithm, using
a deep neural network. In previous work, a drone
was used to detect objects in a predefined incident
area such as barrels containing explosive fluid, fires,
firefighters, etc. (Tijtgat et al., 2017a; Tijtgat et al.,
2017b). The framework proposed in this paper offers
an abstraction allowing developers to connect micro-
service based flows in terms of input (e.g. thermal
video, regular video, images), processing (e.g. object
detection, classification, localization) and output (e.g.
REST API, custom applications).
3 REQUIREMENTS AND
SOFTWARE ARCHITECTURE
3.1 Functional Requirements
Three general actors are identified for the framework: end-users who want to build an image processing application for a specific use case, input sensor developers who integrate new cameras or streams into the framework, and stream processing module developers who integrate new analysis and detection algorithms (e.g. analysis, detection, localization, decision support) into the framework so that end-users can use them to build their applications. An end-user should
be able to construct an image processing application
by interconnecting predefined camera and analysis
modules. Applications built in this way should be adaptable to new use cases by adding or replacing selected modules while retaining the overall functionality of the application. Camera and analysis module developers
should be able to add and distribute new camera in-
tegration modules (e.g. new type of thermal camera
with support for per-pixel labeling of temperatures) and analysis modules (e.g. detection of people at risk of being crushed during mass events) to the framework. This allows end-users to focus on the use case, instead of the technical details of the hardware platforms or algorithms, and allows them to have a wider selection of hardware and algorithms.

Figure 1: Component-connector overview of the framework. The clear components are the core components of the framework that each user needs to install to use the framework. The colored components are used for the distribution of plugins.
3.2 Non-functional Requirements
Interoperability, modifiability and performance are
identified as the architecturally significant require-
ments. Interoperability specifies that the framework
should be able to interact with various cameras and
analysis software modules via interfaces. The number of systems the framework can successfully interact with increases the business value of the framework, as end-users can use more devices via the framework to build applications. The framework needs to be extendable with new thermal cameras and analysis modules. Applications built with the framework should
be modifiable to integrate new hardware and soft-
ware. The available hardware on-site for use cases
such as forest fire monitoring is not always power-
ful enough to support heavy image analysis software.
The framework should therefore be able to deploy in
a distributed fashion, to allow more computationally
expensive operations to be executed on more powerful
(potentially cloud-based) remote devices. Some use
cases such as live incident response require soft real-
time streaming of video and manipulation of these
video streams, which should be supported for the
framework to be relevant.
3.3 Software Architecture
An architectural pattern analysis based on the require-
ments presented in Section 3.2 was conducted, from
which a combination of the microservices and mi-
crokernel pattern was selected as the base pattern for
the software architecture. The microservice architec-
ture is an approach to developing a single application
as a suite of small services, each running in its own
process and communicating with lightweight mecha-
nisms, often an HTTP resource API (Lewis and Fowler,
2014). The microkernel pattern enables the frame-
work to be extended via a plugin system and allows
the framework to be deployed in a distributed and
scalable manner.
Figure 1 presents a high-level overview of the ar-
chitecture. End-users interact with the framework via
the Command Line Interface (CLI), a textual inter-
face. Commands are forwarded to the Stream module
that manages the layout and state of the components
in the media pipeline (stream) that the end-user builds
for their use case. To activate and place the components
in a certain layout, the Stream module connects with
the Producer and Consumer components that control
the plugins that process the media. Producer plugins
are devices that produce media, such as thermal cam-
eras. Consumer plugins process and consume media,
such as analysis software and displays. Once a stream
is established, the plugins forward media to each other
in the layout specified by the Stream module. Sup-
port for new cameras and analysis modules can be
added as plugins to the Producer/Consumer Distribu-
tion components that distribute this software so that
end-users can download and install the plugins. Each
module in the architecture is a microservice, allowing
for distributed deployment and enabling the modifia-
bility requirements through interchangeable modules.
Cameras and analysis modules are realized as plugins
for the Producer/Consumer modules implemented as
a microkernel. This allows the framework to easily
build new streams for different use cases and inter-
change components when needed.
3.3.1 Plugin Model
Figure 2 depicts the model of a general framework
plugin. A plugin is part of the microkernel pattern and
defines three interfaces: a source media endpoint to
receive media from different sources, a listener end-
point to forward the processed media to other listen-
ing plugins and an API for framework control. The
framework uses the API to change which sources and
listener a plugin has and to manage its state. By link-
ing plugins together by configuring the source and
listener resources, the framework can build a media
processing stream pipeline.

Figure 2: Schematic overview of a plugin.

Figure 3: State transition diagram of a plugin.

Producer plugins have no
sources since they produce media. States are used
to stop and start the media processing of the plug-
ins in the stream. A plugin has the following states:
INACTIVE, PLAY, STOP and PAUSE. Figure 3 de-
picts the state transition diagram for a plugin. A plu-
gin implements the visible states STOP, PAUSE and PLAY, describing whether the media processing of the plugin is stopped, paused or running, respectively. The
INACTIVE state is only visible to the framework as
it indicates that there is no active process that runs
the plugin. This is the initial state of a plugin in the
framework. When a plugin is added to a stream, the
plugin microservice is started, transitions to the STOP
state and waits for commands. The REST paradigm
is selected to build this API, with /state, /sources and
/listeners resources that need to be minimally imple-
mented.
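To make the plugin contract concrete, a minimal Flask sketch of such a control API is given below. Only the /state, /sources and /listeners resources come from the design above; the HTTP methods, JSON payloads and allowed state transitions are assumptions for illustration, not the framework's actual implementation.

```python
# Minimal sketch of a plugin control API (illustrative only).
# The /state, /sources and /listeners resources follow the plugin model;
# payloads and transition rules are assumed. INACTIVE is framework-internal.
from flask import Flask, jsonify, request

app = Flask(__name__)

plugin = {
    "state": "STOP",   # visible states: STOP, PAUSE, PLAY
    "sources": [],     # media endpoints this plugin reads from
    "listeners": [],   # plugins that receive the processed media
}

# Assumed legal transitions between the visible states.
ALLOWED = {"STOP": {"PLAY"}, "PLAY": {"PAUSE", "STOP"}, "PAUSE": {"PLAY", "STOP"}}

@app.route("/state", methods=["GET", "PUT"])
def state():
    if request.method == "PUT":
        new = request.get_json()["state"]
        if new not in ALLOWED[plugin["state"]]:
            return jsonify(error="illegal transition"), 409
        plugin["state"] = new
    return jsonify(state=plugin["state"])

@app.route("/sources", methods=["GET", "POST"])
def sources():
    if request.method == "POST":
        plugin["sources"].append(request.get_json())   # e.g. {"host": ..., "port": ...}
    return jsonify(sources=plugin["sources"])

@app.route("/listeners", methods=["GET", "POST"])
def listeners():
    if request.method == "POST":
        plugin["listeners"].append(request.get_json())
    return jsonify(listeners=plugin["listeners"])

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```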
3.3.2 Network Topology and Communication
Protocol
The microservices of the framework (Producer, Stream, Consumer and CLI) and the plugins need a communication protocol to exchange commands and data. Commands are sent over HTTP/TCP, which ensures that commands are always received and acknowledged, increasing the reliability of the communication. The asynchronous RTP/UDP protocol is selected for media transfer between the plugins, as its low latency enables soft real-time video streams. An example of a network topology for a stream with 2 Producer plugins and 3 Consumer plugins is depicted in Figure 4.

Figure 4: Network topology. The full lines represent HTTP/TCP communications, the dashed lines RTP/UDP communications.
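As a sketch of how the Stream module could wire such a topology together over HTTP/TCP, the snippet below links one producer to one consumer and starts both. The addresses, endpoint paths and JSON fields mirror the illustrative plugin API above and are assumptions rather than the prototype's exact interface.

```python
# Illustrative only: linking a producer plugin to a consumer plugin and
# starting the stream via the plugins' REST APIs (assumed URLs and payloads).
import requests

producer = "http://10.0.0.11:5000"   # hypothetical plugin addresses
consumer = "http://10.0.0.12:5000"

# Tell the consumer where its media comes from, and the producer where to send it.
requests.post(f"{consumer}/sources", json={"host": "10.0.0.11", "port": 6000})
requests.post(f"{producer}/listeners", json={"host": "10.0.0.12", "port": 6000})

# Media then flows over RTP/UDP once both plugins transition to PLAY.
for plugin in (consumer, producer):
    requests.put(f"{plugin}/state", json={"state": "PLAY"})
```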
4 PROTOTYPE
IMPLEMENTATION
The goal of the prototype implementation is to provide a proof-of-concept framework allowing evaluation with regard to adherence to the requirements imposed in Section 3. The core framework components are implemented; the distribution components are left out of scope, as they focus primarily on the distribution of supported plugins and as such any service repository technology can be used (e.g. Docker Hub (Docker, 2016)). The core microservices as well
as the plugins are implemented using the Docker soft-
ware containerization framework (Docker Inc., 2018).
Containers virtualize at the operating-system level and provide portable, lightweight software environments for processes with minor performance overhead. Using Docker, the core modules and plugins can be deployed locally or in a distributed fashion, scaled up and down according to the needs of the application, and set up and shut down easily. The microservice containers communicate via the protocols presented in Section 3.3.2. The REST APIs are built with
the Flask framework (Ronacher, 2017), a lightweight
Python web development framework well-suited for
rapid prototyping. The Producer/Consumer compo-
nents need access to the Docker daemon running on
the Docker host in order to spin up and shut down Docker containers containing the plugin processes. This is achieved by mounting the Docker client socket in the Producer/Consumer containers. As a downside, this gives the container a way to gain access to the host, and as such only trustworthy plugins should be used (Lvh, 2015; Yasrab, 2018).

Figure 5: Filecam plugin - GStreamer pipeline.

Figure 6: Local plugin - GStreamer pipeline.

Two sample plugins
were implemented: Filecam, a plugin that produces
video read from a file as shown in Figure 5, and Dis-
play, a plugin that forwards media to the display of
the local device as shown in Figure 6. The plugins
transmit media using the GStreamer video streaming
framework (GStreamer, 2018).
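The two pipelines can be approximated with standard GStreamer elements. The sketch below, using the GStreamer Python bindings, shows one plausible file-reading sender and one display receiver connected over RTP/UDP; the element choice, encoding, file name and port are assumptions for illustration, with Figures 5 and 6 defining the actual pipelines.

```python
# Illustrative GStreamer pipelines for a Filecam-like producer and a
# Display-like consumer; elements, codec, file name and port are assumed.
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst, GLib

Gst.init(None)

# Producer: read a file, decode, re-encode with low latency, send RTP over UDP.
sender = Gst.parse_launch(
    "filesrc location=lastpost.mp4 ! decodebin ! videoconvert ! "
    "x264enc tune=zerolatency ! rtph264pay ! udpsink host=127.0.0.1 port=6000"
)

# Consumer: receive RTP over UDP, depayload, decode and show on the local display.
receiver = Gst.parse_launch(
    'udpsrc port=6000 caps="application/x-rtp, media=video, clock-rate=90000, '
    'encoding-name=H264, payload=96" ! rtph264depay ! avdec_h264 ! '
    'videoconvert ! autovideosink'
)

receiver.set_state(Gst.State.PLAYING)
sender.set_state(Gst.State.PLAYING)
GLib.MainLoop().run()
```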
The prototype itself can be managed via the created CLI. Through the CLI, the framework can be started and stopped, and new streams connecting a producer plugin with a consumer plugin can be created, linked, started and managed.
5 MOB DETECTION
5.1 Dataset
Several publicly available datasets for thermal images
exist (Davis and Keck, 2005; Hwang et al., 2015; Wu
et al., 2014; Li et al., 2007). None of these include
large crowds of people, so a new dataset, called the Last Post dataset, was created (available at https://github.com/IBCNServices/Last-Post-Dataset). It consists of thermal
video captured at the Last Post ceremony in Ypres,
Belgium. The videos were captured using the FLIR One Pro thermal camera for Android, using the Iron color scheme. Two main scenes are present in the dataset, depicted in Figure 7. Mobs are present in the thermal images but not in the visual images, because the two sets of images were captured on separate days. The images captured for the experiment were manually annotated, outliers were removed, and the dataset was randomly split into a training and a validation set.
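A minimal sketch of such a random split is shown below; the directory layout, file names and 80/20 ratio are assumptions, as the paper does not specify them.

```python
# Illustrative random train/validation split of the annotated frames.
# Directory layout, list-file names and the 80/20 ratio are assumed.
import random
from pathlib import Path

random.seed(42)                                   # reproducible split
frames = sorted(Path("last_post/images").glob("*.jpg"))
random.shuffle(frames)

cut = int(0.8 * len(frames))
train, val = frames[:cut], frames[cut:]

for name, subset in (("train.txt", train), ("val.txt", val)):
    # darknet-style list files: one image path per line
    Path(name).write_text("\n".join(str(p) for p in subset) + "\n")
```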
5.2 Model
Detecting and classifying objects of interest in im-
ages is known as the object detection problem in
machine learning (Alpaydin, 2014). Several object detection algorithms and frameworks have been implemented in recent years.

Figure 7: The Last Post dataset main scenes: (a) thermal view of the square; (b) visual view of the square; (c) thermal view of the bridge; (d) visual view of the bridge.

A distinction is made
between traditional models, two-stage deep learning networks and dense deep learning networks. The traditional and two-stage methods make predictions relatively slowly (on the order of seconds on a GPU) compared to the dense networks (on the order of milliseconds on a GPU) (Redmon et al., 2015). Since the goal is to use the framework in soft real-time use cases, the latter is preferred. The YOLOv3 model is selected as it achieves state-of-the-art prediction performance, can make soft real-time predictions and is available via the open-source neural network framework darknet (Redmon and Farhadi, 2018; Redmon, 2016). The model is pre-trained on the ImageNet dataset (Deng et al., 2009), trained on an NVIDIA GeForce GTX 980 GPU, and the sum-of-squared-errors (SSE) loss is optimized using batch gradient descent (Redmon and Farhadi, 2018). To select the best weights, the average Intersection over Union (IoU) and mean Average Precision (mAP) are calculated on predictions on the validation set. The weights that achieve the highest mAP are selected as the final weights.
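For reference, the IoU used for this selection is the standard ratio of the overlap area to the union area of a predicted and a ground-truth bounding box; a minimal sketch, assuming boxes in (x1, y1, x2, y2) corner format, is given below.

```python
# Intersection over Union for two axis-aligned boxes given as (x1, y1, x2, y2).
def iou(box_a, box_b):
    # Coordinates of the intersection rectangle.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Example: a prediction overlapping roughly two thirds of the ground truth.
print(iou((10, 10, 110, 60), (30, 10, 130, 60)))  # ~0.667
```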
Table 1: Performance test statistics summary, measured in seconds. All the possible commands from the CLI are evaluated.
Statistic Play Stop Pause Add Delete Elements
Mean 0.690 0.804 0.634 1.363 8.402 0.562
Std deviation 0.050 0.059 0.088 1.037 4.669 0.070
Minimum 0.629 0.708 0.549 0.516 0.505 0.517
25 Percentile 0.665 0.775 0.594 1.049 1.154 0.534
Median 0.678 0.800 0.623 1.11 11.132 0.550
75 Percentile 0.700 0.820 0.653 1.233 11.189 0.562
Maximum 1.016 1.279 1.631 6.25 11.846 1.227
Statistic Print View On Off Link
Mean 0.564 1.22 3.58 24.023 0.849
Std deviation 0.0747 0.260 0.498 0.481 0.170
Minimum 0.517 0.757 3.015 23.707 0.637
25 Percentile 0.536 0.998 3.143 23.750 0.798
Median 0.552 1.214 3.500 23.886 0.853
75 Percentile 0.560 1.433 3.850 24.034 0.877
Maximum 1.149 1.691 4.562 25.326 1.261
6 RESULTS
6.1 Framework
First, acceptance tests for the requirements from Sec-
tion 3 were conducted. Common framework opera-
tions such as manipulating and building a stream have
an average execution time of 0.84 seconds with a stan-
dard deviation of 0.37 seconds. Less common operations such as deactivating a plugin, starting up the framework and shutting down the framework have average execution times of 8.40, 3.58 and 24.02 seconds, with standard deviations of 4.67, 0.50 and 0.48 seconds respectively. Deactivating plugins (STOP to INAC-
TIVE transitions) takes a long time, as the container
running the process needs to be removed. Real-time
streaming could not be tested due to the GStreamer
framework having no readily available end-to-end
tests. However, when streaming and displaying a video with the framework, a panel of five human users could not differentiate between the streamed video and a video played using a native media player. Great care
must be taken when building plugins, as their process-
ing speed has a direct impact on the real-time stream-
ing performance. The results of the acceptance tests
are summarized in Table 1.
Interoperability is achieved with the REST APIs
and plugin model presented in Section 3.3.2. The in-
teroperability is tested by having the framework ex-
change information with a mock plugin implement-
ing the specified interface and counting the number of
correct exchanges. The average successful exchange ratio is 99.998%; the occasional failures are caused by the Flask framework not supporting special characters in a URL. The framework can install and detect new plugins at runtime, achieving runtime modifiability at the plugin level. Full
results are summarized in Table 2. Different deploy-
ment schemes were not tested for the prototype.
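A sketch of how such an exchange test can be scripted against a mock plugin is shown below; the mock plugin address, payloads and success criterion are illustrative assumptions rather than the actual test harness.

```python
# Illustrative interoperability test: repeatedly exchange commands with a
# mock plugin and count correct (HTTP 200) responses. URL and payloads assumed.
import requests

MOCK = "http://localhost:5000"
N = 50000

correct = 0
for i in range(N):
    r = requests.post(f"{MOCK}/sources",
                      json={"host": "10.0.0.1", "port": 6000 + i % 100})
    if r.status_code == 200 and "sources" in r.json():
        correct += 1

print(f"successful exchange ratio: {100 * correct / N:.3f}%")
```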
6.2 Mob Detection
A YOLOv3 object detection consumer plugin is cre-
ated and added to a full stream for the evaluation.
YOLO creates a snapshot of the model weights every 100 epochs (an epoch is one pass over all the training data to update the weights of the neural network). This allows each set of weights to be validated on the validation set and shows the evolution of the validation performance. Figure 8 shows these evolutions for the average IoU and mAP metrics, based on a pre-defined model provided by YOLO. This pre-defined model is trained on color images from the COCO dataset (Lin et al., 2014), in which multiple classes such as dogs, bikes, cars, etc. can be detected. As shown in Figure 8, the
mAP gradually grows from epoch 4500 onwards and
stagnates around epoch 11500. This shows that the
model is not learning anymore and is at risk of over-
fitting. The mAP stagnates in the interval of [88%,
91%]. The average IoU shows a similar trend, but
varies more, because predictions on the same images are rarely exactly the same.
If we restart the training process without the pre-
defined model, the evolutions for the average IoU and
mAP metrics show a remarkable difference compared
to the training process with the pre-defined model, as
shown in Figure 9.
The best mAP value, 90.52%, is achieved at epoch 15700.
Table 2: Interoperability tests results (S.: Source, L.: Listener).
Value Play Pause Stop Add S. Update S. Delete S. Add L. Update L. Delete L.
Correct 50000 50000 50000 50000 50000 49999 50000 50000 49999
Incorrect 0 0 0 0 0 1 0 0 1
% 100 100 100 100 100 99.998 100 100 99.998
Figure 8: Validation metrics per epoch based on a pre-defined YOLO model: (a) mAP (%) vs. training epoch; (b) IoU (%) vs. training epoch.
Figure 9: Validation metrics per epoch without a pre-defined YOLO model: (a) mAP (%) vs. training epoch; (b) IoU (%) vs. training epoch.
Figure 10: Model predictions on validation set.
The weights from this epoch are used for further testing and validation. The mAP at the 0.5 IoU threshold of YOLOv3 on the COCO benchmark dataset is 74.8%; compared to this, the mAP achieved on the Last Post dataset is very high. The reason for this difference is that the validation set is highly correlated with the training set. Because the training and validation sets are extracted from videos, all images from one video are correlated in time with each other. Images from the validation set are thus correlated with images in the training set, and the model is optimized on these types of images, explaining the high mAP. This indicates that the
model is overfitting on the training data. This was
confirmed when testing the model on unseen videos.
Although the model could detect a mob, it produced
more visual errors. Because this data was not anno-
tated, no metrics could be extracted. Figure 10 de-
picts some predictions of the model on images from
the validation set. Visually, the predicted bounding boxes resemble the ground-truth bounding boxes quite closely.
To test the speed of the predictions of the model,
the total time to predict images in the validation set
was measured. For the NVIDIA GeForce GTX 980 GPU, the average prediction time for one image is
14.673 milliseconds, with a standard deviation of
0.517 milliseconds. This indicates that the upper limit
of the frame rate when making predictions on a video
is approximately 68 frames per second on the GPU.
For comparison, predictions with the model were also
made on a CPU, a 2.6 GHz Intel Core i5-2540 pro-
cessor with AVX instructions speedup. The average
prediction time on the CPU is 5.849 seconds with a
standard deviation of 0.438 seconds, resulting in an
upper limit for the frame rate on the CPU of 0.171 frames per second. Clearly, real-time object detection with this model is only possible on a GPU. When generating predictions on a test video, the average frame rate was 55 frames per second.
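These frame-rate ceilings follow directly from the mean prediction times, as the reciprocal of the per-image prediction time:

```latex
f_{\max} = \frac{1}{\bar{t}_{\text{pred}}}, \qquad
f_{\max}^{\text{GPU}} \approx \frac{1}{0.014673\,\text{s}} \approx 68\ \text{fps}, \qquad
f_{\max}^{\text{CPU}} \approx \frac{1}{5.849\,\text{s}} \approx 0.171\ \text{fps}.
```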
7 CONCLUSION AND FUTURE
WORK
In this paper a modifiable drone thermal imaging
analysis framework is proposed to allow end-users
to build flexible video processing pipelines using dif-
ferent sources and analysis modules. The framework
implements a microservice container plugin architec-
ture. Users can rapidly build image processing appli-
cations by interconnecting input, processing and out-
put plugins. The framework is evaluated by means
of a proof-of-concept implementation, which is tested against the initial requirements. The proposed frame-
work achieves the modifiability and interoperability
requirements at the cost of performance (long shut-
down time of a plugin). The framework is applied
for detecting large crowds of people (mobs) during
open-air events. A new dataset containing thermal im-
ages of such mobs is presented, on which a YOLOv3
neural network is trained. The trained model is able
to detect mobs on new thermal images in real-time
achieving frame rates of 55 frames per second when
deployed on a modern GPU. Cloud auto-scaling is an
extension to the framework that will be investigated
in future work.
REFERENCES
Alpaydin, E. (2014). Introduction to machine learning.
MIT Press, 3 edition.
Bendig, J., Bolten, A., and Bareth, G. (2012). Introducing a low-cost mini-UAV for thermal- and multispectral-imaging.
Christiansen, P., Steen, K. A., Jørgensen, R. N., and
Karstoft, H. (2014). Automated detection and recog-
nition of wildlife using thermal cameras. Sensors
(Basel, Switzerland), 14(8):13778–93.
Davis, J. W. and Keck, M. A. (2005). A Two-Stage Tem-
plate Approach to Person Detection in Thermal Im-
agery. Proc. Workshop on Applications of Computer
Vision.
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-
Fei, L. (2009). ImageNet: A Large-Scale Hierarchical
Image Database. In CVPR09.
Divya, J. (2017). Drone Technology and Usage: Current
Uses and Future Drone Technology.
Docker (2016). Docker Hub.
Docker Inc. (2018). Docker - Build, Ship, and Run Any
App, Anywhere.
Gade, R. and Moeslund, T. B. (2014). Thermal cameras and
applications: a survey. Machine Vision and Applica-
tions, 25:245–262.
Goedeme, T. (2017). Projectresultaten VLAIO TETRA-
project. Technical report, KU Leuven, Louvain.
Gonnissen, R. (2016). 3DSafeGuard-VL.
GStreamer (2018). GStreamer: open source multimedia
framework.
Harvey, M. C., Rowland, J. V., and Luketina, K. M. (2016).
Drone with thermal infrared camera provides high res-
olution georeferenced imagery of the Waikite geother-
mal area, New Zealand.
Hwang, S., Park, J., Kim, N., Choi, Y., and Kweon, I. S.
(2015). Multispectral Pedestrian Detection: Bench-
mark Dataset and Baseline. CVPR.
Inc., A. W. S. (2018). What Is Amazon Kinesis Video
Streams?
Lewis, J. and Fowler, M. (2014). Microservices Resource
Guide.
Li, S. Z., Chu, R., Liao, S., and Zhang, L. (2007). Illumina-
tion Invariant Face Recognition Using Near-Infrared
Images. IEEE Transactions on Pattern Analysis and
Machine Intelligence, 29(4):627–639.
Lin, T. Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., and Zitnick, C. L. (2014). Microsoft COCO: Common objects in context. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 8693 LNCS(PART 5):740–755.
Lvh (2015). Don’t expose the Docker socket (not even to a
container).
Redmon, J. (2013–2016). Darknet: Open source neural networks in C. http://pjreddie.com/darknet/.
Redmon, J., Divvala, S., Girshick, R., and Farhadi, A.
(2015). You Only Look Once: Unified, Real-Time
Object Detection.
Redmon, J. and Farhadi, A. (2018). YOLOv3: An Incremental Improvement. arXiv.
Rivera, A. J., Villalobos, A. D., Monje, J. C., Mariñas, J. A., and Oppus, C. M. (2017). Post-disaster rescue facility: Human detection and geolocation using aerial drones. IEEE Region 10 Annual International Conference, Proceedings/TENCON, pages 384–386.
Ronacher, A. (2017). Welcome to Flask. Flask Documentation (0.12).
Satzger, B., Hummer, W., Inzinger, C., Leitner, P., and
Dustdar, S. (2013). Winds of change: From vendor
lock-in to the meta cloud. IEEE Internet Computing,
17(1):69–73.
Slattery, L.-L. (2017). DroneSAR wants to turn drones into
search-and-rescue heroes.
Steffen, B. and Seyfried, A. (2010). Methods for measur-
ing pedestrian density, flow, speed and direction with
minimal scatter. Physica A: Statistical Mechanics and
its Applications, 389(9):1902–1910.
Tijtgat, N., Ranst, W. V., Volckaert, B., Goedemé, T., and De Turck, F. (2017a). Embedded Real-Time Object Detection for a UAV Warning System. 2017 IEEE International Conference on Computer Vision Workshops (ICCVW), pages 2110–2118.
Tijtgat, N., Volckaert, B., and De Turck, F. (2017b). Real-
Time Hazard Symbol Detection and Localization Us-
ing UAV Imagery. 2017 IEEE 86th Vehicular Tech-
nology Conference (VTC-Fall), pages 1–5.
Workswell (2016). Pipeline inspection with thermal diag-
nostics.
Wu, Z., Fuller, N., Theriault, D., and Betke, M. (2014). A
Thermal Infrared Video Benchmark for Visual Analy-
sis. IEEE Conference on Computer Vision and Pattern
Recognition Workshops.
Yasrab, R. (2018). Mitigating Docker Security Issues. Tech-
nical report, University of Science and Technology of
China, Hefei.