IEEE Transactions on Multimedia

Optimizing Fixation Prediction Using Recurrent Neural Networks for 360° Video Streaming in Head-Mounted Virtual Reality



Abstract

We study the problem of predicting the viewing probability of different parts of 360° videos when streaming them to head-mounted displays. We propose a fixation prediction network based on recurrent neural networks, which leverages sensor and content features. The content features are derived by computer vision (CV) algorithms, which may suffer from inferior performance due to various types of distortion caused by diverse 360° video projection models. We propose a unified approach with overlapping virtual viewports to eliminate such negative effects, and we evaluate our proposed solution using several CV algorithms, such as saliency detection, face detection, and object detection. We find that overlapping virtual viewports increase the performance of these existing CV algorithms that were not trained for 360° videos. We next fine-tune our fixation prediction network with diverse design options, including: 1) with or without overlapping virtual viewports, 2) with or without future content features, and 3) different feature sampling rates. We empirically choose the best fixation prediction network and use it in a 360° video streaming system. We conduct extensive trace-driven simulations with a large-scale dataset to quantify the performance of the 360° video streaming system with different fixation prediction algorithms. The results show that our proposed fixation prediction network outperforms other algorithms in several aspects, such as: 1) achieving comparable video quality (average gaps between -0.05 and 0.92 dB), 2) consuming much less bandwidth (average bandwidth reduction of up to 8 Mb/s), 3) reducing rebuffering time (by 40 s on average in bandwidth-limited 4G cellular networks), and 4) running in real time (at most 124 ms).
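To make the described architecture concrete, below is a minimal PyTorch sketch of an RNN that fuses per-timestep sensor features (e.g., HMD orientation) with content features (e.g., tile-level saliency) and outputs per-tile viewing probabilities. All names, layer sizes, and the tiling granularity are illustrative assumptions, not the paper's actual configuration.

```python
import torch
import torch.nn as nn

class FixationPredictionNet(nn.Module):
    """Illustrative sketch: an LSTM over concatenated sensor + content
    features, followed by a linear head that scores each video tile."""
    def __init__(self, sensor_dim=4, content_dim=192, num_tiles=192, hidden=128):
        super().__init__()
        self.rnn = nn.LSTM(sensor_dim + content_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_tiles)

    def forward(self, sensor_feats, content_feats):
        # sensor_feats:  (batch, time, sensor_dim)   e.g., orientation quaternions
        # content_feats: (batch, time, content_dim)  e.g., per-tile saliency scores
        x = torch.cat([sensor_feats, content_feats], dim=-1)
        h, _ = self.rnn(x)
        # Sigmoid gives an independent viewing probability per tile,
        # which a tile-based streaming system can use to prioritize fetches.
        return torch.sigmoid(self.head(h))

# Example: predict tile viewing probabilities from a 1-s window sampled at 30 Hz.
net = FixationPredictionNet()
probs = net(torch.randn(1, 30, 4), torch.randn(1, 30, 192))
print(probs.shape)  # torch.Size([1, 30, 192])
```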
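The overlapping-virtual-viewport idea can likewise be sketched at a high level: render several overlapping, undistorted perspective viewports from the 360° frame, run an off-the-shelf CV algorithm on each, and remap the results. The helpers `render_viewport` and `detector` below are hypothetical placeholders; real viewport rendering requires a rectilinear (gnomonic) projection of the equirectangular frame.

```python
def viewport_centers(yaw_step=45, pitches=(-45, 0, 45)):
    """Yaw/pitch centers of overlapping virtual viewports. With a 90° FoV,
    a 45° yaw step gives roughly 50% horizontal overlap between neighbors
    (the step and pitch values here are illustrative assumptions)."""
    return [(yaw, pitch) for yaw in range(0, 360, yaw_step) for pitch in pitches]

def detect_on_viewports(frame, render_viewport, detector, fov=90):
    """Run a conventional CV detector on each undistorted viewport and tag
    each result with its viewport center, so detections can later be
    remapped onto the equirectangular frame and merged across overlaps.
    `render_viewport(frame, yaw, pitch, fov)` is an assumed helper."""
    results = []
    for yaw, pitch in viewport_centers():
        view = render_viewport(frame, yaw, pitch, fov)
        results.append(((yaw, pitch), detector(view)))
    return results
```

Running detectors on undistorted viewports rather than on the raw equirectangular frame is what lets CV models that were never trained on 360° content keep their accuracy.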
