On the Reconstruction of Text Phylogeny Trees: Evaluation and Analysis of Textual Relationships

Abstract

Throughout human history, textual records have changed, sometimes through mistakes during transcription and sometimes on purpose, as a way to rewrite facts and reinterpret history. There are several classical cases, such as the logarithmic tables and the transmission of antique and medieval scholarship. Today, text documents are largely edited and redistributed on the Web. Articles on news portals and collaborative platforms (such as Wikipedia), source code, posts on social networks, and even scientific publications or literary works are some examples in which textual content can be subject to changes in an evolutionary process. In this scenario, given a set of near-duplicate documents, it is worthwhile to find which one is the original and to recover the history of changes that created the whole set. Such functionality would have immediate applications in news tracking services, plagiarism detection, textual criticism, and copyright enforcement, for instance. However, this is not an easy task, as textual features pointing to the documents' evolutionary direction may not be evident and are often dataset dependent. Moreover, side information, such as time stamps, is not always available or reliable. In this paper, we propose a framework for reliably reconstructing text phylogeny trees and for seamlessly exploring new approaches across a wide range of text reuse scenarios. We employ distinct combinations of dissimilarity measures and reconstruction strategies within the proposed framework, and evaluate each approach with extensive experiments on a set of artificial near-duplicate documents with known phylogeny and on documents collected from Wikipedia, whose modifications were made by Internet users. We also present results from qualitative experiments in two different applications: text plagiarism and the reconstruction of evolutionary trees for manuscripts (stemmatology).


Bibliographic details

Journal: PLoS Clinical Trials
Authors: Guilherme D. Marmerola; Marina A. Oikawa; Zanoni Dias; Siome Goldenstein; Anderson Rocha
Year (volume), issue: 2011(11),12
Year: 2011
Pages: e0167822
Total pages: 35
Original format: PDF

