6 LITERATURE
Generic recognition systems address a problem similar to that of FOREST. While generic recognition systems are able to recognize several object classes, a flexible recognition system like FOREST is meant to be adapted to an arbitrary recognition task. Notably, generic recognition systems have been found to perform better when multiple feature channels are used (Opelt et al., 2006; Zhang et al., 2005; Hegazy and Denzler, 2009; Varma and Ray, 2007).
The area of tangible user interfaces provides two examples of systems that require flexible rather than generic recognition functionality: Crayons (Fails and Olsen, 2003) and Papier-Mâché (Klemmer et al., 2004). Both systems allow the creation of simple gesture recognition systems for interaction purposes. The underlying recognition functionality is, however, limited to very basic color information.
Eyepatch is a recognition system designed to work with webcams. It requires no expert knowledge in image recognition, but the user must apply and combine predefined classifiers. Another system built on existing webcams is Vision on Tap (Chiu, 2011). It provides specific processing blocks which implement motion tracking, skin color recognition, or face recognition. These blocks can be combined in a visual computing application to create custom recognition systems. Although a wide variety of applications can be implemented using such building blocks, the resulting functionality is effectively limited to what the predefined blocks offer, as the sketch below illustrates.
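To make the building-block approach concrete, the following is a minimal, hypothetical sketch of how such a system composes predefined detectors into a "custom" recognizer. It is not the actual API of Eyepatch or Vision on Tap; the block names, the OpenCV-based detectors, and the combination rule are all illustrative assumptions.

```python
# Hypothetical sketch of the "building block" approach: predefined detectors
# are chained into a pipeline. NOT the actual Eyepatch/Vision on Tap API.
import cv2

def face_block(frame):
    """Predefined block: detect faces with OpenCV's stock Haar cascade."""
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    return cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

def skin_block(frame):
    """Predefined block: crude skin-color segmentation in HSV space."""
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    return cv2.inRange(hsv, (0, 40, 60), (25, 180, 255))

def custom_app(frame):
    """A 'custom' recognizer is just a fixed combination of the blocks:
    keep a face detection only where it overlaps a skin-colored region."""
    mask = skin_block(frame)
    hits = []
    for (x, y, w, h) in face_block(frame):
        if mask[y:y + h, x:x + w].mean() > 32:  # enough skin pixels in box
            hits.append((x, y, w, h))
    return hits
```

The limitation the text describes is visible here: any recognition task that falls outside the fixed set of blocks (faces, skin, motion) simply cannot be expressed, no matter how the blocks are combined.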
7 CONCLUSIONS
FOREST, a software framework for the development of custom, i.e., user-defined, recognition systems, was presented. To be usable by non-expert users, such a system has to fulfill a set of requirements, which were discussed and implemented. In contrast to other existing systems, FOREST considers all aspects of the development process from a non-expert user's point of view. The image processing functionality is fully automated, requiring no programming skills or expert knowledge. Interactive steps in the development process were enhanced using semi-automatic techniques, such as the identification of possible skews in the training data set (see the sketch below). The user is even supported in assessing classifier performance.
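As an illustration of what such a skew check could look like, the following is a minimal sketch that flags class imbalance in the ground truth labels. It assumes labels are given as a flat list of class names; FOREST's actual semi-automatic technique (cf. Moehrmann and Heidemann, 2013) may work differently.

```python
# Minimal sketch of a training-set skew check, assuming labels arrive as a
# list of class names. FOREST's actual semi-automatic technique may differ.
from collections import Counter

def report_label_skew(labels, warn_ratio=3.0):
    """Warn when the most frequent class outweighs the rarest one by more
    than warn_ratio, a simple proxy for a skewed training set."""
    counts = Counter(labels)
    most, least = max(counts.values()), min(counts.values())
    if most / least > warn_ratio:
        print(f"Possible skew: class sizes range from {least} to {most}.")
    for cls, n in counts.most_common():
        print(f"  {cls}: {n} samples")
    return counts

# Example: a clearly imbalanced two-class ground truth.
report_label_skew(["object"] * 40 + ["background"] * 5)
```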
In contrast to existing software frameworks, FOREST does not provide a collection of algorithms but instead allows the adaptation of its recognition functionality to a specific user-defined recognition task. FOREST achieves this by extracting a large heterogeneous feature set from the images and applying a boosting classifier which selects discriminative features based on the ground truth data provided by the user. The use of such a heterogeneous feature set allows important image properties to be identified even without knowledge about the (type of) recognition task, and even with weakly supervised learning.
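The following sketch illustrates the principle just described: several heterogeneous feature channels are concatenated per image, and a boosting classifier with decision stumps selects the discriminative dimensions. The channel choices, the use of scikit-learn's AdaBoost, and the stand-in data are assumptions for illustration, not FOREST's actual implementation.

```python
# Illustrative sketch: concatenate heterogeneous feature channels per image
# and let a boosting classifier pick the discriminative ones. The channels
# and classifier are assumptions, not FOREST's own feature set.
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

def extract_features(img):
    """img: HxWx3 uint8 array. Returns one heterogeneous feature vector."""
    color = np.histogram(img, bins=16, range=(0, 256))[0]          # color channel
    gray = img.mean(axis=2)
    gy, gx = np.gradient(gray)
    edges = np.histogram(np.hypot(gx, gy), bins=16)[0]             # edge channel
    texture = np.histogram(np.abs(np.fft.fft2(gray)), bins=16)[0]  # texture proxy
    return np.concatenate([color, edges, texture]).astype(float)

# Train on user-provided ground truth (here: random stand-in data).
rng = np.random.default_rng(0)
imgs = rng.integers(0, 256, size=(60, 32, 32, 3), dtype=np.uint8)
y = rng.integers(0, 2, size=60)
X = np.array([extract_features(im) for im in imgs])

# Depth-1 stumps mean each boosting round selects a single feature dimension,
# so feature_importances_ reveals which channels the task actually needed.
clf = AdaBoostClassifier(n_estimators=50).fit(X, y)
top = np.argsort(clf.feature_importances_)[::-1][:5]
print("most discriminative feature dimensions:", top)
```

Because the classifier, not the user, decides which feature dimensions matter, no prior knowledge of the recognition task is required, which is the key point the conclusion makes.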
REFERENCES
Chiu, K. (2011). Vision on Tap: An Online Computer Vision Toolkit. Master's thesis, Massachusetts Institute of Technology, Dept. of Architecture, Program in Media Arts and Sciences.
Fails, J. and Olsen, D. (2003). A Design Tool for Camera-based Interaction. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 449–456. ACM.
Fei-Fei, L., Fergus, R., and Perona, P. (2004). Learning Generative Visual Models from Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories. In IEEE CVPR Workshop on Generative-Model Based Vision.
Hegazy, D. and Denzler, J. (2009). Generic Object Recognition. In Computer Vision in Camera Networks for Analyzing Complex Dynamic Natural Scenes.
Klemmer, S., Li, J., Lin, J., and Landay, J. (2004). Papier-Mâché: Toolkit Support for Tangible Input. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 399–406. ACM.
Lowe, D. (2004). Distinctive Image Features from Scale-Invariant Keypoints. Intl. Journal of Computer Vision, 60:91–110.
Manjunath, B., Ohm, J.-R., Vasudevan, V., and Yamada, A. (2001). Color and Texture Descriptors. IEEE Transactions on Circuits and Systems for Video Technology, 11(6):703–715.
Matas, J., Chum, O., Urban, M., and Pajdla, T. (2002). Robust Wide Baseline Stereo from Maximally Stable Extremal Regions. In British Machine Vision Conference, volume 1, pages 384–393.
Mikolajczyk, K. and Schmid, C. (2004). Scale and Affine Invariant Interest Point Detectors. Intl. Journal of Computer Vision, 60(1):63–86.
Moehrmann, J. and Heidemann, G. (2013). Semi-Automatic Image Annotation. In Computer Analysis of Images and Patterns, volume 8048 of Lecture Notes in Computer Science, pages 266–273.
Nilsback, M.-E. and Zisserman, A. (2006). A Visual Vocabulary for Flower Classification. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), volume 2, pages 1447–1454. IEEE.