Engineering Applications of Artificial Intelligence

The TTS-driven affective embodied conversational agent EVA, based on a novel conversational-behavior generation algorithm



Abstract

As a result of the convergence of different services delivered over the internet protocol, internet protocol television (IPTV) may be regarded as one of the most widespread user interfaces, accepted by a highly diverse user population. Every generation, from children to the elderly, can use IPTV for recreation, for maintaining social contact, and for stimulating the mind. However, technological advances in digital platforms go hand in hand with increasing complexity of their user interfaces, and thus induce technological disinterest and technological exclusion. Therefore, from the perspective of advanced user interfaces, interactivity and affective content presentation are two key factors in any application incorporating human-computer interaction (HCI). Furthermore, the perception and understanding of the information (meaning) conveyed are closely interlinked with the visual cues and non-verbal elements that speakers generate throughout human-human dialogue. In this regard, co-verbal behavior adds information to the communicative act: it supports the speaker's communicative goal and allows a variety of other information to be layered onto his/her messages, including (but not limited to) psychological states, attitudes, and personality. In the present paper, we address complexity and technological disinterest through the integration of natural, human-like multimodal output that incorporates a novel combined data- and rule-driven co-verbal behavior generator able to extract features from unannotated, general text. The core of the paper discusses the processes that model and synchronize non-verbal features with verbal features, even when dealing with unknown context and/or limited contextual information. In addition, the proposed algorithm combines data-driven resources (speech prosody, a repository of motor skills) with rule-based concepts (a grammar, a gesticon).
The algorithm first classifies the communicative intent, then plans the co-verbal cues and their form within the gesture unit, generates temporally synchronized co-verbal cues, and finally realizes them as human-like co-verbal movements. In this way, information can be presented as meaningfully and temporally synchronized co-verbal cues accompanying synthesized speech, using the communication channels to which people are most accustomed.
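The four-stage pipeline described above (intent classification, cue planning, temporal synchronization, realization) can be sketched as follows. This is a minimal illustrative sketch, not the EVA implementation: all class names, the keyword-based classifier, and the toy gesticon lookup are hypothetical stand-ins for the paper's data- and rule-driven components.

```python
from dataclasses import dataclass

# Hypothetical sketch of the four-stage co-verbal behavior pipeline.
# Names and rules here are illustrative assumptions, not the EVA system's own.

@dataclass
class CoVerbalCue:
    intent: str          # classified communicative intent (e.g. "deictic", "beat")
    form: str            # planned gesture form within the gesture unit
    start: float = 0.0   # onset in seconds, aligned to speech prosody
    duration: float = 0.0

def classify_intent(text: str) -> str:
    """Stage 1: classify communicative intent from unannotated text (toy rules)."""
    keywords = {"this": "deictic", "here": "deictic", "huge": "iconic"}
    for word in text.lower().split():
        if word in keywords:
            return keywords[word]
    return "beat"  # default rhythmic gesture when no stronger cue is found

def plan_cue(intent: str) -> CoVerbalCue:
    """Stage 2: choose a gesture form for the intent (toy gesticon lookup)."""
    gesticon = {"deictic": "point_right_hand",
                "iconic": "spread_hands",
                "beat": "small_beat"}
    return CoVerbalCue(intent=intent, form=gesticon[intent])

def synchronize(cue: CoVerbalCue, onset: float, duration: float) -> CoVerbalCue:
    """Stage 3: align the gesture stroke with the timing of the stressed word."""
    cue.start = onset
    cue.duration = duration
    return cue

def realize(cue: CoVerbalCue) -> str:
    """Stage 4: emit a motor command string for the embodied agent (toy)."""
    return f"{cue.form}@{cue.start:.2f}s/{cue.duration:.2f}s"

# Example: one TTS word with its prosodic timing (onset 0.42 s, duration 0.31 s).
cue = synchronize(plan_cue(classify_intent("look at this graph")), 0.42, 0.31)
print(realize(cue))  # -> point_right_hand@0.42s/0.31s
```

In the actual system, stages 1 and 2 would draw on the rule-based grammar and gesticon, while stage 3 would align cues with prosodic features extracted from the TTS output rather than hand-supplied timings.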
