In this paper, we propose a method for locating a speaker who is to be filmed automatically for distance learning and lecture archives. A lecture video should show the face of the student who is speaking, so the speaker's location must be detected first. Acoustic sensors such as microphone arrays are widely used to locate sound sources. However, in a large space such as a lecture room, noise makes it difficult to locate a sound source precisely with a microphone array alone. We therefore propose a method that locates a speaker in a lecture room more precisely by using visual sensors in addition to the microphone array. The results show that our sensor-fusion method improved the precision of speaker-location detection by about 20%.
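The abstract does not say how the acoustic and visual estimates are combined. As a rough illustration only, the sketch below assumes one simple fusion scheme. The microphone array yields a noisy direction-of-arrival (DOA) angle. The cameras yield candidate person positions in 2-D room coordinates. The candidate whose bearing from the array best matches the DOA, within a tolerance, is taken as the speaker. The function names, the 2-D coordinate model, and the 15-degree tolerance are all hypothetical and are not the authors' method.

```python
import math
from typing import Optional, Sequence, Tuple

# Hypothetical 2-D room coordinates (x, y), e.g. in metres.
Point = Tuple[float, float]


def bearing(origin: Point, target: Point) -> float:
    """Angle (radians, in (-pi, pi]) of target as seen from origin."""
    return math.atan2(target[1] - origin[1], target[0] - origin[0])


def angle_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in [0, pi]."""
    d = (a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def fuse_speaker_location(
    array_pos: Point,
    doa: float,
    candidates: Sequence[Point],
    max_error: float = math.radians(15),  # assumed tolerance, not from the paper
) -> Optional[Point]:
    """Pick the visually detected person whose bearing from the microphone
    array is closest to the acoustic DOA estimate.

    Returns None when no candidate lies within max_error of the DOA, i.e.
    the acoustic and visual evidence do not agree on anyone.
    """
    best: Optional[Point] = None
    best_err = math.inf
    for c in candidates:
        err = angle_diff(bearing(array_pos, c), doa)
        if err <= max_error and err < best_err:
            best, best_err = c, err
    return best
```

For example, with the array at the origin, a DOA of 40 degrees, and people detected at (3, 1), (2, 2), and (1, 3), their bearings are roughly 18.4, 45, and 71.6 degrees, so `fuse_speaker_location((0.0, 0.0), math.radians(40), [(3.0, 1.0), (2.0, 2.0), (1.0, 3.0)])` selects (2, 2). The visual detections supply the precise position, and the coarse acoustic estimate only decides which detected person is speaking.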