2 RELATED WORKS
2.1 Overview of the Object Detection Task
Object detection has applications in many areas of computer vision, including image retrieval and video surveillance. Variations in position, color, lighting conditions, size, and so on greatly affect the performance of the model. Object detection and tracking processes should therefore be fine-tuned to the kind of problem being solved, which is usually determined by the characteristics of the target. For instance, detecting vehicles requires different parameter tuning than detecting pedestrians, animals, or faces. Feature-based techniques exploit distinctive characteristics of objects derived from the 2D pixel information in an image (Gonzalez, R. C., and Woods, R. E., 2017). With 2D image features such as color, intensity, and background information, an object is easy to identify across frames as long as its appearance, position, and size do not change.
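To illustrate why simple feature-based detection is fragile, the following minimal Python sketch locates an object by thresholding pixel colors and returning the bounding box of all matching pixels. It is an illustrative assumption rather than part of our method: the function name `detect_by_color`, the nested-list image format, and the color range are hypothetical. Halving the brightness of the frame (a crude lighting change) makes the same color range miss the object entirely.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]     # (R, G, B), each 0-255
Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive


def detect_by_color(image: List[List[Pixel]], lower: Pixel, upper: Pixel) -> Optional[Box]:
    """Return the bounding box of all pixels whose color lies in [lower, upper].

    Hypothetical helper: a toy color-feature detector, not the paper's detector.
    """
    xs, ys = [], []
    for y, row in enumerate(image):
        for x, pixel in enumerate(row):
            # A pixel matches only if every channel falls inside its range.
            if all(lo <= c <= hi for c, lo, hi in zip(pixel, lower, upper)):
                xs.append(x)
                ys.append(y)
    if not xs:
        return None  # no pixel has the target color: the object is "lost"
    return (min(xs), min(ys), max(xs), max(ys))


if __name__ == "__main__":
    BG, RED = (20, 20, 20), (200, 30, 30)
    frame = [[BG] * 5 for _ in range(5)]
    for y in (1, 2):
        for x in (2, 3):
            frame[y][x] = RED
    print(detect_by_color(frame, (150, 0, 0), (255, 80, 80)))  # (2, 1, 3, 2)
    # Simulated lighting change: the same color range no longer matches.
    dark = [[(r // 2, g // 2, b // 2) for (r, g, b) in row] for row in frame]
    print(detect_by_color(dark, (150, 0, 0), (255, 80, 80)))   # None
```

This is exactly the failure mode that motivates learned detectors such as Faster RCNN: the hand-chosen feature range encodes one appearance of the target and nothing else.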
2.2 Faster RCNN Object Detection Architecture
Faster RCNN, presented by Sh. Ren, K. He, R. Girshick and J. Sun in 2015, has become one of the most widely used object detection architectures. Like other well-known detectors such as YOLO (You Only Look Once) and SSD (Single Shot Detector), it is built on a convolutional neural network. In general, Faster RCNN is composed of three main parts, which together make up the object detection model: a) convolution layers; b) a region proposal network; c) class and bounding box prediction (Faster RCNN, n.d.).
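The region proposal network (part b) scores a fixed set of reference boxes, called anchors, placed at every position of the convolutional feature map. The sketch below generates such anchors, assuming the defaults described in the original Faster RCNN paper: box areas of 128², 256², and 512² pixels, aspect ratios 1:2, 1:1, and 2:1 (nine anchors per position), and a feature stride of 16. The function names are hypothetical, and the sketch covers only anchor placement, not the objectness scoring or box regression that the network performs on them.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in image pixels


def anchors_at(cx: float, cy: float,
               sizes: Sequence[int] = (128, 256, 512),
               ratios: Sequence[float] = (0.5, 1.0, 2.0)) -> List[Box]:
    """Anchor boxes centered at (cx, cy): one per (size, aspect ratio) pair.

    `size` is the square root of the anchor area; `ratio` is height / width.
    """
    boxes = []
    for size in sizes:
        area = float(size * size)
        for ratio in ratios:
            w = math.sqrt(area / ratio)  # chosen so that w * h == area
            h = w * ratio                # and h / w == ratio
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes


def generate_anchors(feat_h: int, feat_w: int, stride: int = 16) -> List[Box]:
    """Visit every feature-map cell and map its center back to image pixels."""
    all_boxes: List[Box] = []
    for i in range(feat_h):
        for j in range(feat_w):
            cx = (j + 0.5) * stride
            cy = (i + 0.5) * stride
            all_boxes.extend(anchors_at(cx, cy))
    return all_boxes


if __name__ == "__main__":
    anchors = generate_anchors(2, 3)
    print(len(anchors))  # 2 * 3 cells * 9 anchors = 54
```

A real feature map is far larger than 2 × 3, so an image yields thousands of anchors; the region proposal network's objectness scores are what reduce them to a manageable set of proposals for the final class and bounding box prediction.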
2.3 Training Dataset
We used our own dataset of 600 images, 400 for training and 200 for testing, collected from internet sources such as blogs and posts. What distinguishes our dataset from other datasets is that its images were taken by drone cameras and capture the drone's environment.
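We do not state above how the 600 images were divided into the 400 training and 200 test images. One reproducible option, sketched here purely as an assumption, is a seeded random shuffle followed by a cut; the `split_dataset` helper and the file names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def split_dataset(paths: Sequence[str], n_train: int,
                  seed: int = 0) -> Tuple[List[str], List[str]]:
    """Shuffle image paths reproducibly and split them into train / test lists."""
    if not 0 <= n_train <= len(paths):
        raise ValueError("n_train must be between 0 and len(paths)")
    shuffled = list(paths)
    random.Random(seed).shuffle(shuffled)  # fixed seed: same split on every run
    return shuffled[:n_train], shuffled[n_train:]


if __name__ == "__main__":
    images = [f"drone_{i:03d}.jpg" for i in range(600)]  # hypothetical file names
    train, test = split_dataset(images, n_train=400)
    print(len(train), len(test))  # 400 200
```

Fixing the seed keeps the test set identical across training runs, so loss and accuracy figures from different runs remain comparable.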
Training speed and duration depend on the CPU and GPU hardware available: a system with a recent GPU obtains training results faster than a CPU-only system. If your computer lacks supporting hardware or a recent GPU, we recommend not running other memory- or performance-intensive processes during training, since they can affect the training outcome and make the training process considerably more time-consuming.
The classifier should be trained until the loss is consistently below roughly 0.05, the point at which the loss starts to plateau. The total loss graph summarizes, as one averaged curve, how much the model loses or misidentifies objects by their features, shapes, and other parameters while learning from the training images. Figure 1 shows the total loss of the training process together with the objectness loss, which reflects the objectness score (4e-3 ≈ 0.004) on the dataset's images, i.e., how well the model indicates during training whether a box contains an object or not.
Figure 1: Performance of the total loss along with the objectness loss.
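The two quantities discussed above can be made concrete with a short sketch. The objectness term is written as a binary cross-entropy between predicted object probabilities and object/background labels, which corresponds to the log loss Faster RCNN's region proposal network uses for objectness. The stopping rule encodes one possible reading of the criterion above: every loss in the last `window` iterations is below 0.05 and the average is no longer improving. Both helper names, the window size, and the `min_delta` threshold are hypothetical choices, not values from our training runs.

```python
import math
from typing import Sequence


def objectness_loss(scores: Sequence[float], labels: Sequence[int],
                    eps: float = 1e-7) -> float:
    """Mean binary cross-entropy between objectness scores and 0/1 labels.

    scores[i] is the predicted probability that box i contains an object;
    labels[i] is 1 for an object box and 0 for a background box.
    """
    total = 0.0
    for p, y in zip(scores, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(scores)


def should_stop(losses: Sequence[float], window: int = 100,
                target: float = 0.05, min_delta: float = 1e-4) -> bool:
    """Stop when recent losses are all below `target` and have plateaued.

    'Plateaued' means the mean of the last `window` losses improved by less
    than `min_delta` over the mean of the `window` losses before them.
    """
    if len(losses) < 2 * window:
        return False  # not enough history to judge
    recent = losses[-window:]
    previous = losses[-2 * window:-window]
    consistently_low = max(recent) < target
    improvement = sum(previous) / window - sum(recent) / window
    return consistently_low and improvement < min_delta


if __name__ == "__main__":
    print(round(objectness_loss([0.9, 0.2], [1, 0]), 4))  # 0.1643
    print(should_stop([0.03] * 200))                       # True
```

Requiring both conditions avoids stopping on a loss that is low but still falling, and avoids stopping on a plateau that sits above the target.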
2.4 Proposed Object Tracking Method
Our tracking method is based on the Discriminative Correlation Filter with Channel and Spatial Reliability (CSR-DCF) tracking algorithm (A. Lukezic, et al., 2018). This algorithm has been implemented in C++ and integrated into the OpenCV library, whose DNN (Deep Neural Networks) module can also run trained detection models (OpenCV dnn module, n.d.). We propose a tracking system that integrates Faster RCNN as the object detector and the OpenCV_CSRT_tracker as the tracking algorithm.
The implemented object tracking process has two parts. The first, explained above, is training the object classifier and generating the object detection model from the training output file. In the second, the algorithm takes a blob from each incoming frame and passes it to the object detection model, which predicts the location of the object and classifies it into an object class. The detection output is then passed to the tracking algorithm, which tracks the predicted bounding box of the object class in the following frames. Whenever the object prediction changes for any other reason, the tracker invokes the object classifier again and restarts the process from the beginning.
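The detect-then-track loop described above can be sketched in Python as follows. The detector and tracker are passed in as hypothetical callables so the control flow can be read without OpenCV installed; the tracker interface mirrors the `init(frame, bbox)` and `update(frame)` → `(ok, bbox)` pair exposed by OpenCV's Python trackers, including CSRT (the CSRT constructor name varies between OpenCV versions). Treating a failed `update` as the signal to re-run the detector is our assumption about what a change in the object prediction means in practice.

```python
from typing import Callable, Iterable, List, Optional, Protocol, Tuple

Box = Tuple[int, int, int, int]  # (x, y, w, h), as used by OpenCV trackers


class Tracker(Protocol):
    """Hypothetical interface modelled on OpenCV's tracker init/update pair."""
    def init(self, frame, box: Box) -> None: ...
    def update(self, frame) -> Tuple[bool, Box]: ...


def detect_and_track(frames: Iterable,
                     detect: Callable[[object], Optional[Box]],
                     make_tracker: Callable[[], Tracker]) -> List[Optional[Box]]:
    """Run the detector until it finds an object, then hand the box to a tracker.

    When the tracker reports failure, fall back to the detector in the same
    frame and restart tracking. Returns one box (or None) per frame.
    """
    tracker: Optional[Tracker] = None
    results: List[Optional[Box]] = []
    for frame in frames:
        box: Optional[Box] = None
        if tracker is not None:
            ok, box = tracker.update(frame)
            if not ok:
                tracker, box = None, None  # tracking lost: re-detect below
        if tracker is None:
            box = detect(frame)  # e.g. a Faster RCNN forward pass
            if box is not None:
                tracker = make_tracker()
                tracker.init(frame, box)
        results.append(box)
    return results
```

In a full pipeline, `detect` would build a blob from the frame (for example with `cv2.dnn.blobFromImage`), run the trained Faster RCNN model, and return the highest-scoring box, while `make_tracker` would construct a CSRT tracker. Running the comparatively expensive detector only when tracking fails is what lets the cheaper correlation-filter tracker carry most frames.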
3 EXPERIMENTAL RESULTS
After training on our dataset for 200K iterations, we got