Stereoscopic, or more generally multiview, video can provide more vivid and accurate information about scene structure than monoview video. However, one major obstacle to using multiview video is the extremely large amount of data associated with it. This dissertation considers the problem of structure and motion estimation in multiview teleconferencing-type sequences and its application to video compression and intermediate view generation. First, we describe a novel image alignment approach that converts images captured by non-parallel cameras into coplanar-like images, that is, images as if taken by cameras sharing a common image plane. This approach greatly eases the computational burden imposed by non-parallel camera geometry, in which both horizontal and vertical disparities must be considered. Next, we introduce a new approach for estimating structure from a stereo pair acquired by two parallel cameras. It is based on a 3D mesh representation of the imaged object, with the structure information parameterized by the disparities between corresponding mesh nodes in the image pair. Finally, we present a coder for multiview sequences that exploits the proposed alignment and structure estimation algorithms. By extracting the foreground objects and estimating the disparity field between a selected view and a reference view, the coder can compress the image pair very efficiently. Meanwhile, using the coded structure information, the decoder can generate virtual viewpoints between the decoded views, which can be very helpful for telepresence applications.
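To illustrate why a disparity parameterization of mesh nodes supports both structure recovery and virtual view generation, the following is a minimal sketch of the textbook relations for two parallel (aligned) cameras. It is not the dissertation's algorithm. It assumes the sign convention x_right = x_left - d, a focal length `focal_px` in pixels and a baseline `baseline` in scene units, and it uses a simple linear interpolation of node positions for a virtual camera a fraction `alpha` of the way along the baseline. The names `MeshNode`, `depth_from_disparity`, and `virtual_view_nodes` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MeshNode:
    """Hypothetical mesh node in the left (reference) view of an aligned stereo pair."""
    x: float  # horizontal pixel coordinate in the left view
    y: float  # vertical coordinate; after alignment it is the same in both views
    d: float  # horizontal disparity, assuming x_right = x - d


def depth_from_disparity(d: float, focal_px: float, baseline: float) -> float:
    """Depth for parallel cameras: Z = f * B / d (assumes d > 0)."""
    if d <= 0:
        raise ValueError("disparity must be positive for a point in front of both cameras")
    return focal_px * baseline / d


def virtual_view_nodes(nodes: list[MeshNode], alpha: float) -> list[tuple[float, float]]:
    """Node positions in a virtual view located a fraction alpha of the way
    from the left camera (alpha = 0) to the right camera (alpha = 1).
    Under the parallel-camera assumption, each node shifts horizontally by alpha * d."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [(n.x - alpha * n.d, n.y) for n in nodes]


if __name__ == "__main__":
    nodes = [MeshNode(100.0, 50.0, 8.0), MeshNode(140.0, 50.0, 4.0)]
    print(virtual_view_nodes(nodes, 0.5))            # midpoint virtual view
    print(depth_from_disparity(8.0, 800.0, 0.5))     # depth of the first node
```

Because only the per-node disparities need to be transmitted, a decoder holding the reference view and the mesh can warp each mesh triangle to the interpolated node positions to synthesize an intermediate view. Occlusion handling and texture blending, which a practical system would need, are omitted here.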