Dataset Construction Method for Word Reading Disambiguation


Abstract

The scarcity of large reading-disambiguated corpora is a major limitation for linguistic analysis and for developing a statistical approach to word reading disambiguation. Like the meanings of words, their readings are usually not written out in documents, so human annotation is necessary but expensive. In this study, a method is proposed to construct a reading-disambiguated dataset for word reading disambiguation. The method constructs a dataset of sentences in which words with ambiguous readings (pronunciations), called heteronyms, are tagged with the correct reading. In this method, a word with a unique reading is associated with a heteronym, and this unique-reading word is used as a query word to collect sentences that include it. The query word in the collected sentences is then replaced by the original ambiguous word, and the reading corresponding to the query word is tagged as the pronunciation of the heteronym. Experiments confirmed that the method collected data effectively and that the collected data were numerically balanced across all readings of the heteronym.
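The collection procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the English heteronym "tear", the reading labels `TIER` and `TAIR`, the query words "teardrop" and "rip", the toy `CORPUS`, and the function name `build_dataset` are all hypothetical and do not come from the paper.

```python
import re
from collections import Counter

# Hypothetical mapping (not from the paper): for one heteronym, each reading
# is paired with a query word whose own reading is unambiguous.
PROXIES = {
    "tear": {"TIER": "teardrop", "TAIR": "rip"},
}

# Toy stand-in for a large raw corpus or a set of search results.
CORPUS = [
    "A single teardrop rolled down her cheek.",
    "There was a rip in the old sail.",
    "He fixed the rip with tape.",
    "The weather was fine.",
]

def build_dataset(heteronym, proxies, corpus):
    """Collect sentences containing a query word, substitute the heteronym,
    and tag it with the reading associated with that query word."""
    examples = []
    for reading, query in proxies.items():
        # Match the query word as a whole word only (case-sensitive).
        pattern = re.compile(rf"\b{re.escape(query)}\b")
        for sentence in corpus:
            if pattern.search(sentence):
                replaced = pattern.sub(heteronym, sentence)
                examples.append((replaced, heteronym, reading))
    return examples

dataset = build_dataset("tear", PROXIES["tear"], CORPUS)

# Count examples per reading to inspect how balanced the collection is.
counts = Counter(reading for _, _, reading in dataset)
```

Each item in `dataset` is a `(sentence, heteronym, reading)` triple, and `counts` gives the number of collected sentences per reading, which is one simple way to check the kind of balance the abstract reports.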


Bibliographic details

Source: Pacific Asia Conference on Language, Information and Computation | 2018 | 858 p. | 8 pages
Authors: Koki Nishiyama; Kazuhide Yamamoto; Hideharu Nakajima
Original format: PDF
CLC classification: Computer networks

