a specific scale, orientation, ... and thus removing the
positive training samples that cover all other variants of
these scene parameters. This leads to a greatly reduced
positive training sample set.
Training object categorization algorithms also re-
quires negative training samples, which contain clutter
and random elements unrelated to the object class.
Given the constrained scene, we are confident that the
number of negative training samples can be lowered
drastically, perhaps even to a single negative training
example, e.g. an image of an empty conveyor belt.
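As an illustration of this idea, a single image of the empty scene can be turned into many negative patches by a sliding window. The sketch below is our own illustration, not part of the paper's method; the function name and parameters are hypothetical.

```python
import numpy as np

def sample_negatives(empty_scene, patch_size, stride):
    """Harvest negative training patches from one image of the empty
    scene (e.g. an empty conveyor belt) with a sliding window.
    Illustrative sketch; names and interface are assumptions."""
    h, w = empty_scene.shape[:2]
    ph, pw = patch_size
    patches = []
    for y in range(0, h - ph + 1, stride):
        for x in range(0, w - pw + 1, stride):
            patches.append(empty_scene[y:y + ph, x:x + pw])
    return patches

# A synthetic 64x96 "empty belt" image yields a 6x10 grid of 24x24 negatives.
belt = np.random.default_rng(0).integers(0, 256, (64, 96), dtype=np.uint8)
negatives = sample_negatives(belt, patch_size=(24, 24), stride=8)
```

Overlapping strides or random crops would give even more negatives from the same single exposure of the scene.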
Another industrially relevant question is how many
training examples are actually needed to obtain a
robust classifier. None of the papers describing the
state-of-the-art object categorization algorithms de-
fines a way to determine the exact size of the positive
and negative training sample sets needed to cover as
many object variations as possible.
4.2 The Annotation Phase
Since this preprocessing phase is actually the most
time-consuming part of training, owing to the multiple
prerequisites imposed on training samples, it is worth
improving as well. Traditionally many thousands
of training samples are required, each of which must
be formatted correctly by hand. Formatting consists
of grayscale conversion and adding a region of
interest for each object instance. Furthermore, the
latest techniques (such as (Leibe and Schiele, 2004) and
(Gall and Lempitsky, 2009)) also model the relation
of different parts to an object centre, which requires
the centre of each object instance to be annotated as well.
We plan to evolve from a fully supervised annota-
tion phase towards a semi-supervised one. To accomplish
this we will use a small set of manually annotated
training samples to create a first basic clas-
sifier, which in turn will be used to perform object
detections on the remaining training samples. Detec-
tions will be accepted based on a confidence score,
which indicates how certain the algorithm is that it ac-
tually detected an object. The manual annotation work
is thereby reduced to simply accepting or rejecting the
selections suggested by this basic classifier.
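The proposal loop of this semi-supervised scheme can be sketched as follows. The `detector` interface (a callable returning boxes with scores) and all names here are our own assumptions, not the paper's API.

```python
def propose_annotations(detector, unlabelled_images, threshold):
    """Semi-supervised annotation sketch: a basic classifier trained on
    a small hand-annotated seed set proposes detections on the remaining
    samples; only detections scoring above a confidence threshold are
    forwarded to the annotator for a quick accept/reject decision."""
    proposals = []
    for idx, image in enumerate(unlabelled_images):
        # detector(image) is assumed to yield (bounding_box, score) pairs.
        for box, score in detector(image):
            if score >= threshold:
                proposals.append((idx, box, score))
    return proposals

# Usage with a stand-in detector that scores two boxes per image.
fake_detector = lambda image: [((0, 0, 10, 10), 0.9), ((5, 5, 10, 10), 0.3)]
proposals = propose_annotations(fake_detector, ["img_a", "img_b"], threshold=0.5)
```

Accepted proposals would then be merged into the training set, after which the classifier can be retrained on the enlarged set.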
If this basic classifier does not succeed in achiev-
ing a decent performance rate, a possible solution
is to look deeper into combining our approach with
existing machine learning techniques such as boosting
and online learning.
A more detailed study of the approach described
above will show by how much we can actually reduce
the number of manually labelled training samples
compared to the complete training sample set.
5 CONCLUSIONS
Many state-of-the-art object categorization tech-
niques try to be robust against widely varying scene
parameters such as illumination and scale changes.
In doing so they become complex to handle and
result in algorithms that have a good detection
rate but a large computational cost.
However, in many real-life industrial machine vi-
sion applications, most of those changing parameters
are actually constant, except for the intra-class vari-
ability, which remains one of the greatest challenges
of today's object categorization algorithms. Industrial
applications need object categorization algo-
rithms that can handle this intra-class variability while
at the same time achieving real-time processing.
By making use of the known scene and translating
this knowledge into constraints for the algorithm, we
are confident that we can adapt existing approaches
and create a new set of fast, real-time object catego-
rization algorithms that answer the needs of this spe-
cific application area of industrial applications such as
random object picking and object counting.
REFERENCES
Abdel-Hakim, A. and Farag, A. (2006). CSIFT: A SIFT de-
scriptor with color invariant characteristics. In CVPR,
pages 1978–1983.
Bay, H., Tuytelaars, T., and Van Gool, L. (2006). SURF:
Speeded up robust features. In ECCV, pages 404–417.
Dollár, P., Tu, Z., Perona, P., and Belongie, S. (2009). Inte-
gral channel features. In BMVC.
Felzenszwalb, P., Girshick, R., and McAllester, D. (2010).
Cascade object detection with deformable part mod-
els. In CVPR, pages 2241–2248.
Gall, J. and Lempitsky, V. (2009). Class-specific Hough
forests for object detection. In CVPR, pages 1022–
1029.
Hsieh, J., Liao, H., Fan, K., Ko, M., and Hung, Y. (1997).
Image registration using a new edge-based approach.
CVIU, pages 112–130.
Leibe, B. and Schiele, B. (2004). Scale-invariant object cat-
egorization using a scale-adaptive mean-shift search.
Pattern Recognition, pages 145–153.
Lewis, J. (1995). Fast normalized cross-correlation. In Vi-
sion interface, volume 10, pages 120–123.
Mindru, F., Tuytelaars, T., Van Gool, L., and Moons, T.
(2004). Moment invariants for recognition under
changing viewpoint and illumination. CVIU, 94:3–27.
Van Beeck, K., Goedemé, T., and Tuytelaars, T. (2012). A
warping window approach to real-time vision-based
pedestrian detection in a truck's blind spot zone. In
ICINCO, volume 2, pages 561–568.
Viola, P. and Jones, M. (2001). Rapid object detection using
a boosted cascade of simple features. In CVPR, pages
I–511.
VISAPP 2013 - International Conference on Computer Vision Theory and Applications