Explicit and implicit discourse relations from a cross-lingual perspective - from experience in working on Chinese discourse annotation

Abstract

In computational linguistics and natural language processing, progress in discourse analysis has been relatively slow compared with syntactic parsing or semantic analysis (e.g., word sense disambiguation, semantic role labeling). In an age when statistical, data-driven approaches dominate the field, a common linguistic resource that is widely accepted by the community is key to advancing the state of the art in this area. Creating consistently annotated data for discourse analysis is particularly challenging because one has to deal with larger linguistic structures and there are few linguistic rules to follow. The key to successful discourse annotation is to identify a well-grounded linguistic theory that can be easily operationalized. In the Penn Discourse Treebank (PDTB; Prasad et al. 2008, Webber and Joshi 1998), the field may have found such a theory. In the PDTB conception, discourse relations revolve around discourse connectives, where each discourse connective is a predicate that takes two arguments. In this way, discourse annotations are anchored by discourse connectives and are thus lexicalized. In our view, lexicalization has been crucial to the success of the PDTB as an annotation project, a large-scale effort characterized by high inter-annotator agreement, a standard metric for annotation consistency. Lexicalization grounds highly abstract discourse relations in specific lexical items. In doing so, it localizes the ambiguity in discourse relations to discourse connectives: a lexical item can have either a discourse connective use or a non-discourse connective use (e.g., "when"), and one discourse connective can be ambiguous between different discourse relations (e.g., "since"). As a result, it reduces the cognitive load of the annotation task, because each annotator can focus on one discourse connective at a time instead of scores of discourse relations.
This in turn enlarges the annotator pool, since more annotators are able to perform the task without extensive training. The long list of annotators who worked on the PDTB annotation attests to this observation. A larger annotator pool and a shorter learning curve translate into the scalability of such an approach. If lexicalization is so important to discourse annotation, what about discourse relations that are not anchored by an explicit discourse connective? The PDTB addresses this by assuming there is an implicit discourse connective that connects its two arguments, which are typically (parts of) adjacent sentences. This is operationalized by treating the punctuation marks (e.g., periods) that serve as boundaries between two adjacent sentences as anchors of implicit discourse relations. The specific discourse relation is determined by testing which discourse connective can be plausibly inserted between these two adjacent sentences. In doing so, the PDTB assumes that (1) the range of possible discourse relations anchored by implicit discourse connectives is basically the same as the range anchored by explicit discourse connectives, and (2) discourse relations anchored by implicit discourse connectives are mostly local. The first assumption is largely borne out in the PDTB. Either a discourse connective can be inserted between two adjacent sentences, or they are related by the fact that they talk about the same entities, or there is no relation between them. The last possibility has a direct bearing on the second assumption: if there is no relation between two adjacent sentences, does that mean that these sentences have no discourse relations at all with the rest of the text, or that they are related to other discourse segments that are non-local? It is reasonable to assume that all discourse segments are related in a coherent piece of text, and a large number of such "no-relations" would call for a significant expansion of the PDTB approach.
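The lexicalized scheme described above can be pictured as a small data structure. The following is only an illustrative sketch, not the PDTB's actual file format or tooling: the class `DiscourseRelation`, its field names, the helper `implicit_candidates`, the naive sentence splitter, and the example sentences are all assumptions introduced here. It shows an explicit relation as a connective with two arguments, and candidate implicit relations anchored at the punctuation between adjacent sentences.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DiscourseRelation:
    """Hypothetical record: a relation anchored by a connective or by punctuation."""
    anchor: str                   # connective text, or the boundary punctuation mark
    arg1: str
    arg2: str
    explicit: bool
    sense: Optional[str] = None   # left unset; choosing a sense is the annotator's task


def implicit_candidates(text: str) -> List[DiscourseRelation]:
    """Pair adjacent sentences, anchoring each pair at the boundary punctuation."""
    # Naive splitter (an assumption): a sentence ends at '.', '!' or '?'.
    sentences = [s.strip() for s in re.findall(r"[^.!?]+[.!?]", text)]
    return [
        DiscourseRelation(anchor=left[-1], arg1=left, arg2=right, explicit=False)
        for left, right in zip(sentences, sentences[1:])
    ]


# An explicit relation: the connective acts as a predicate over two arguments.
explicit = DiscourseRelation(
    anchor="because", arg1="The game was cancelled", arg2="it rained", explicit=True
)

# Candidate implicit relations between adjacent sentences.
pairs = implicit_candidates("It rained. The game was cancelled. Fans went home.")
```

Each candidate pair would then go to an annotator, who tests which connective, if any, can plausibly be inserted at the anchor. The sketch deliberately leaves `sense` empty rather than guessing it.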
While it might not be too much to expect that the same high-level discourse relations

Bibliographic details

Source: Workshop on Advances in Discourse Analysis and its Computational Aspects, 2012, 2 pages
Author: Nianwen (Bert) Xue
Original format: PDF
Chinese Library Classification: Programming, software engineering

Similar literature

1. Cross-lingual implicit discourse relation recognition with co-training [J]. Yao-jie LU, Mu XU, Chang-xing WU. Journal of Zhejiang University (English Edition), Series C: Computers & Electronics. 2018, No. 5

2. Learning better discourse representation for implicit discourse relation recognition via attention networks [J]. Zhang Biao, Xiong Deyi, Su Jinsong. Neurocomputing. 2018, issue JANa31

3. A Survey of Discourse Representations for Chinese Discourse Annotation [J]. Kang Xiaomian, Zong Chengqing, Xue Nianwen. ACM Transactions on Asian Language Information Processing. 2019, No. 3

4. Explicit and implicit discourse relations from a cross-lingual perspective - from experience in working on Chinese discourse annotation [C]. Nianwen (Bert) Xue. Workshop on Advances in Discourse Analysis and its Computational Aspects. 2012

5. A narrative enquiry into the educational assistant's perspective: Power relations and discourse in inclusive education [D]. Workman, Paul M. 2007

6. Effects of Syntactic Complexity Semantic Reversibility and Explicitness on Discourse Comprehension in Persons with Aphasia and in Healthy Controls [O]. Joshua Levy, Elizabeth Hoover, Gloria Waters

7. Acquiring Annotated Data with Cross-lingual Explicitation for Implicit Discourse Relation Classification [O]. Wei Shi, Frances Yung, Vera Demberg. 2019
