surrounding texture patches are extracted from the
image (Rosten, 2006), and synthetic views of the
plane are generated.
Based on the results described previously, the classifier is trained to recognize about 100-130 different classes (points). The forest is constructed with 15-20 trees and a training set composed of 500 synthetically generated examples, in less than 30 minutes. This size of the training set is a good compromise between training time and the final accuracy of the classifier. Training time is an important practical factor, for example when preparing an outdoor setup. Once the classifier is trained, the system is ready for tracking.
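For illustration, the following sketch outlines this off-line training stage. It assumes OpenCV for FAST corner detection and patch warping, and uses scikit-learn's RandomForestClassifier as a stand-in for the randomized-tree classifier of Lepetit and Fua; the patch size, warp ranges and parameter names are illustrative choices for this example, not those of our implementation.

```python
# Sketch of the off-line training stage (illustrative, not the paper's code).
import cv2
import numpy as np
from sklearn.ensemble import RandomForestClassifier

PATCH = 32      # patch size around each keypoint (assumption)
N_VIEWS = 500   # synthetic views of the plane (500 examples, as above)

def random_affine(w, h):
    """Random in-plane rotation, scale and translation (illustrative ranges)."""
    angle = np.random.uniform(-60, 60)
    scale = np.random.uniform(0.7, 1.3)
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, scale)
    M[:, 2] += np.random.uniform(-10, 10, 2)   # random translation
    return M

def train_classifier(reference_img, n_classes=120, n_trees=20):
    """Train a forest that maps texture patches to keypoint classes."""
    gray = cv2.cvtColor(reference_img, cv2.COLOR_BGR2GRAY)
    # FAST corners (Rosten and Drummond, 2006); keep the strongest responses
    kps = cv2.FastFeatureDetector_create().detect(gray)
    kps = sorted(kps, key=lambda k: -k.response)[:n_classes]
    ref_pts = np.float32([k.pt for k in kps])

    X, y = [], []
    h, w = gray.shape
    half = PATCH // 2
    for _ in range(N_VIEWS):
        # Generate a synthetic view of the plane and track where keypoints land
        M = random_affine(w, h)
        view = cv2.warpAffine(gray, M, (w, h))
        warped = cv2.transform(ref_pts.reshape(-1, 1, 2), M).reshape(-1, 2)
        for cls, (px, py) in enumerate(warped):
            px, py = int(px), int(py)
            if half <= px < w - half and half <= py < h - half:
                patch = view[py - half:py + half, px - half:px + half]
                X.append(patch.ravel())
                y.append(cls)

    forest = RandomForestClassifier(n_estimators=n_trees, max_depth=15)
    forest.fit(np.array(X), np.array(y))
    return forest, ref_pts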
The obtained frame rate is about 20-25 frames per second (near real-time) on a 1.6 GHz dual-core CPU. This frame rate varies with the accuracy of the tracker, i.e., with the number of different points to be recognized. Drift and jitter are well controlled, so the augmented objects do not move abruptly. On a slower CPU, such as the one installed in a JVC portable device, the obtained frame rate is about 5 frames per second for the same number of points.
Compared with a recursive tracking approach, tracking by detection runs faster and is more robust to partial object occlusion and fast camera movement. The tracker can run indefinitely without requiring re-initialisation.
6 CONCLUSION AND FUTURE
WORK
In this work we have presented a tracking-by-detection approach to plane homography estimation that uses a Random Forest classifier for interest point matching. An evaluation and a practical application of the approach in an augmented reality setup have been described. The proposed method robustly tracks a plane at real-time frame rates, even under partial occlusion of the plane.
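As an illustration of the run-time stage, the sketch below labels the keypoints of the current frame with the trained forest and feeds the resulting 2D-2D correspondences to a robust (RANSAC) homography estimation. The OpenCV calls and the track_frame interface are assumptions for this example and not our actual implementation.

```python
# Sketch of the run-time tracking-by-detection loop (illustrative).
import cv2
import numpy as np

def track_frame(frame, forest, ref_pts, patch=32):
    """Label FAST keypoints with the forest and estimate the plane homography."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    kps = cv2.FastFeatureDetector_create().detect(gray)
    h, w = gray.shape
    half = patch // 2
    src, dst = [], []
    for k in kps:
        x, y = int(k.pt[0]), int(k.pt[1])
        if half <= x < w - half and half <= y < h - half:
            p = gray[y - half:y + half, x - half:x + half].ravel()
            cls = int(forest.predict(p[None, :])[0])
            src.append(ref_pts[cls])   # keypoint position on the reference plane
            dst.append([x, y])         # its hypothesised match in the current frame
    if len(src) < 4:
        return None                    # not enough matches in this frame
    # Robust homography estimation; RANSAC rejects misclassified points
    H, _ = cv2.findHomography(np.float32(src), np.float32(dst), cv2.RANSAC, 3.0)
    return H
```

In this sketch every detected keypoint is accepted and the outliers produced by misclassification are left to RANSAC; a confidence threshold on the forest's class posterior would reduce the outlier rate before estimation.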
We believe that machine learning techniques such as Random Forests are very promising for optical marker-less tracking.
We plan to extend our work to support on-line training of the classifier (Özuysal, 2006). On-line training allows the tracker to update its model with new feature points that were not present in the original training set. As described in (Williams, 2007), on-line training can be exploited in frameworks such as Simultaneous Localization and Mapping (SLAM).
We also plan to use new-generation Graphics Processing Units (GPUs) to perform some tasks, such as the generation of warping transformations during the training step.
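For clarity, the sketch below shows the warping step in question on the CPU with OpenCV; it is this per-view warp that would be offloaded to the GPU. The corner-jitter range is an arbitrary choice for illustration.

```python
# Random perspective warp used to synthesise a training view (illustrative).
import cv2
import numpy as np

def random_view(image, max_shift=0.15):
    """Warp the reference image with a random homography to simulate a new viewpoint."""
    h, w = image.shape[:2]
    corners = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    # Jitter each corner by up to max_shift of the image size (arbitrary range)
    jittered = (corners + np.random.uniform(-max_shift, max_shift, (4, 2))
                * [w, h]).astype(np.float32)
    H = cv2.getPerspectiveTransform(corners, jittered)
    return cv2.warpPerspective(image, H, (w, h)), H
```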
ACKNOWLEDGEMENTS
This work has been partially funded under the 6th
Framework Programme of the European Union
within the IST project “IMPROVE” (IST FP6-
004785, http://www.improve-eu.info/).
REFERENCES
Breiman, L. 2001. Random Forests. Machine Learning Journal, Vol. 45, pages 5-32. ISSN: 0885-6125.
Hartley, R., Zisserman, A. 2004. Multiple View Geometry in Computer Vision, Cambridge University Press, 2nd edition. ISBN: 0-521-54051-8.
Lepetit, V., Fua, P. 2006. Keypoint Recognition Using
Randomized Trees. IEEE Transactions on Pattern
Analysis and Machine Intelligence, Vol. 28(9), pages
1465-1479. ISSN: 0162-8828.
Lepetit, V., Fua, P. 2005. Monocular Model-Based 3D Tracking of Rigid Objects: A Survey. Foundations and Trends® in Computer Graphics and Vision, Vol. 1, pages 1-89.
Lepetit, V., Pilet, J., Fua, P. 2004. Point Matching as a
Classification Problem for Fast and Robust Object
Pose Estimation. In Conference on Computer Vision
and Pattern Recognition. ISBN: 0-7695-2158-4.
Lowe, D. 2004. Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision, Vol. 60(2), pages 91-110.
Mikolajczyk, K., Tuytelaars, T., Schmid, C., Zisserman,
A., Matas, J., Schaffalitzky, F., Kadir, T., and Gool, L.
V. 2005. A Comparison of Affine Region Detectors.
Int. Journal of Computer Vision. Vol. 65(1-2), pages
43-72. ISSN:0920-5691.
Özuysal, M., Fua, P., Lepetit, V. 2006. Feature Harvesting
for Tracking-By-Detection. In Proc. European
Conference on Computer Vision, pages 592-605.
ISBN:3-540-33836-5.
Rosten, E., Drummond, T. 2006. Machine Learning for High-Speed Corner Detection. In Proc. European Conference on Computer Vision, pages 430-443. ISBN: 3-540-33832-2.
Vacchetti, L., Lepetit, V., Fua, P. 2004. Combining Edge and Texture Information for Real-Time Accurate 3D Camera Tracking. In Proc. IEEE and ACM International Symposium on Mixed and Augmented Reality, Vol. 4, pages 48-57. ISBN: 0-7695-2191-6.
Williams, B., Klein, G., Reid, I. 2007. Real-Time SLAM Relocalisation. In Proc. IEEE International Conference on Computer Vision.