Unsupervised identification of text reuse in early Chinese literature

Donald Sturgeon

Abstract

Text reuse in early Chinese transmitted texts is extensive and widespread, often reflecting complex textual histories involving repeated transcription, compilation, and editing spanning many centuries and involving the work of multiple authors and editors. In this study, a fully automated method of identifying and representing complex text reuse patterns is presented, and the results are evaluated by comparison to a manually compiled reference work. The resultant data are integrated into a widely used and publicly available online database system with browse, search, and visualization functionality. These same results are then aggregated to create a model of text reuse relationships at a corpus level, revealing patterns of systematic reuse among groups of texts. Lastly, the large number of reuse instances identified makes possible the analysis of frequently observed string substitutions, which are found to be strongly indicative of partial synonymy between strings.

Bibliographic details

Source
Literary & Linguistic Computing, 2018, No. 3, pp. 670-684 (15 pages)
Author
Sturgeon, Donald
Author affiliation
Harvard Univ Fairbank Ctr Chinese Studies Room S126 CGIS South Bldg 1730 Cambridge St Cambridge MA 02138 USA
Original format
PDF
Language of text
English
