on a smart board about registration of the participant in
the chair; (3) pronounce the digit sequence from one
to ten; (4) move to another chair.
During the experiments, 37 AVI files were recorded in the discussion work mode. Manual checking showed that 90% of the files contain speaker speech, while 10% are false files containing only noise. Such noise occurs when a tester stands up from a chair, because at that moment the chair's mechanical parts produce loud noise. Errors in detecting seated participants also give rise to false files. Table 1 shows the estimation results for the files containing speaker speech.
Table 1: Estimation of the algorithm for detecting and recording active participant speech.
          …, ms            File length, ms      Duplicated frames
    Min   Max   Mean      Min    Max    Mean     Min   Max   Mean
     80   2440   730      3104   6432   5530      23   104    58
The experimental results show that an AVI file consists on average of 137 frames, 59 of which are duplicated, and has a mean length of about 5.5 seconds. The calculated mean FPS in the video buffer is 24 frames per second (consistent with the mean file statistics: 137 frames over roughly 5.5 s is about 24.8 FPS); the shortfall relative to the camera frame rate is caused by rounding of values when calculating the required total number of additional frames in the image packets. The total number of duplicated frames also covers the initial delay between the audio and video streams. Further duplicated frames are produced when the camera FPS changes owing to disturbances in the network devices and to the limited writing speed of the storage devices. An analysis of the obtained data shows that the AVI files formed by the system contain all speech events and only a small percentage of false records.
4 CONCLUSIONS
The audiovisual system for e-learning applications was developed to automate the recording of events in the smart room. It consists of four main modules, which perform multichannel audio and video signal processing for participant localization, speaker detection, and speaker recording. The proposed system allows us to automate the control of the audio and video hardware, as well as of other devices installed in the smart room, by means of distant speech recognition of participant commands. The system was verified at the functional level, and estimates of participant detection quality, of camera pointing at the speaker, and of the speaker detection error were calculated.
ACKNOWLEDGEMENTS
This work is supported by the Federal Target Program “Research and Research-Human Resources for Innovating Russia in 2009-2013” (contract 14.740.11.0357).