WEAKLY-SUPERVISED TEXT-BASED VIDEO MOMENT RETRIEVAL VIA CROSS ATTENTION MODELING
Abstract
An electronic device obtains video content and a textual query associated with a video moment in the video content. The video content is divided into video segments, and the textual query includes one or more words. Visual features are extracted for each video segment, and textual features are extracted for each word. The visual features and the textual features are combined to generate a similarity matrix in which each element represents a similarity level between a respective video segment and a respective word. Segment-attended sentence features are generated for the textual query based on the textual features and the similarity matrix. The segment-attended sentence features are combined with the visual features of the video segments to determine a plurality of alignment scores, which are used to retrieve, from the video segments, a subset of the video content associated with the textual query.
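The data flow described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how similarity, attention, or alignment are computed, so everything concrete here is an assumption made for illustration: cosine similarity for the similarity matrix, a softmax over words as the attention weighting, cosine similarity again for the alignment score, and a simple top-k selection for retrieval. The function names and the `top_k` parameter are hypothetical, and the feature vectors are assumed to have already been extracted and to share one dimensionality.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    norm = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / norm if norm > 0 else 0.0


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def similarity_matrix(visual, textual):
    """Element [i][j] is the similarity between segment i and word j.

    Assumption: cosine similarity; the abstract names no measure.
    """
    return [[cosine(v, w) for w in textual] for v in visual]


def segment_attended_sentence_features(textual, sim):
    """For each segment, a weighted sum of the word features.

    Assumption: weights are a softmax over that segment's row of the
    similarity matrix.
    """
    dim = len(textual[0])
    attended = []
    for row in sim:
        weights = softmax(row)
        attended.append(
            [sum(wt * word[d] for wt, word in zip(weights, textual))
             for d in range(dim)]
        )
    return attended


def alignment_scores(visual, attended):
    """One score per segment (assumption: cosine similarity)."""
    return [cosine(v, s) for v, s in zip(visual, attended)]


def retrieve(visual, textual, top_k=1):
    """Return the indices of the top_k best-aligned segments and all scores."""
    sim = similarity_matrix(visual, textual)
    attended = segment_attended_sentence_features(textual, sim)
    scores = alignment_scores(visual, attended)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:top_k]), scores


# Toy inputs: three video segments and two query words, 2-D features.
visual = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
textual = [[1.0, 0.0], [0.9, 0.1]]
segments, scores = retrieve(visual, textual, top_k=1)
```

In this toy input both query words point along the first feature axis, so the first segment receives the highest alignment score. A trained system would learn the feature extractors and likely use a different scoring function; the sketch only mirrors the order of the steps.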
展开▼