Using Word Embeddings for Computing Distances Between Texts and for Authorship Attribution


Abstract

In this paper, word embeddings are used for supervised authorship attribution. Whereas previous methods have examined, for instance, character n-grams, syntax and, most importantly, token frequencies, the method presented here focusses on the implications of semantic relationships between words. In this way, the semantic networks of entities as perceived by authors, rather than the authors' word choices, may come into closer focus. We find that these networks can be used reliably for authorship attribution. The method is generally applicable as a tool for comparing different texts and/or authors through word embeddings that have been trained separately. This is achieved not by comparing vectors directly, but by comparing the sets of most similar words for the words shared between two texts and then aggregating and averaging the similarities per text pair. On two literary corpora (German and English), we compute embeddings for each text separately. The resulting similarities are then used to detect the author of an unknown text.
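
The comparison described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not say how the neighbour sets are compared, so Jaccard overlap is assumed here, and the neighbourhood size `k`, the function names and the toy two-dimensional vectors are all hypothetical. The point it shows is that two separately trained embeddings can be compared even when their coordinate systems differ, because only the neighbour sets of shared words are matched.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def neighbours(emb, word, k):
    """Set of the k words most similar to `word` inside one embedding."""
    others = [w for w in emb if w != word]
    others.sort(key=lambda w: cosine(emb[word], emb[w]), reverse=True)
    return set(others[:k])

def text_similarity(emb_a, emb_b, k=2):
    """Average neighbour-set overlap over the words shared by both texts.

    Assumption: overlap is measured with the Jaccard index.
    """
    shared = set(emb_a) & set(emb_b)
    if not shared:
        return 0.0
    scores = []
    for w in shared:
        na, nb = neighbours(emb_a, w, k), neighbours(emb_b, w, k)
        union = na | nb
        scores.append(len(na & nb) / len(union) if union else 0.0)
    return sum(scores) / len(scores)

def attribute(unknown_emb, known, k=2):
    """Return the author whose known text is most similar to the unknown one."""
    return max(known, key=lambda a: text_similarity(unknown_emb, known[a], k))

# Toy embeddings (hypothetical values), one per text, trained "separately".
known = {
    "author_A": {"sea": (1.0, 0.0), "ship": (0.9, 0.1), "storm": (0.8, 0.3),
                 "garden": (0.0, 1.0), "rose": (0.1, 0.9)},
    "author_B": {"sea": (1.0, 0.0), "rose": (0.9, 0.1), "garden": (0.8, 0.3),
                 "ship": (0.0, 1.0), "storm": (0.1, 0.9)},
}
# Same word relationships as author_A, but with the axes swapped, so the raw
# vectors are not directly comparable across the two embeddings.
unknown = {"sea": (0.0, 1.0), "ship": (0.1, 0.9), "storm": (0.3, 0.8),
           "garden": (1.0, 0.0), "rose": (0.9, 0.1)}

print(attribute(unknown, known))
```

In practice the per-text embeddings would come from a trained model rather than hand-written vectors, and `k` would be considerably larger; both choices are left open here because the abstract does not specify them.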


Bibliographic details

Source
International Conference on Applications of Natural Language to Information Systems, 2017, pp. 274-277 (4 pages)
Author
Armin Hoenen
Original format: PDF
Keywords
Authorship attribution; Word embeddings; Text distance


