Dzongkha Word Segmentation using Deep Learning

Abstract

Natural Language Processing (NLP) has been applied to machine translation, chatbots, speech recognition, question answering systems, document summarization, and so on. The Dzongkha language of Bhutan, however, has not been considered in NLP systems, presumably because the language is complex and is written as a string of syllables without proper word boundaries. Dzongkha word segmentation is therefore an essential first step in building NLP applications. The novelty of our research is in applying Deep Learning to the task of Dzongkha word segmentation, avoiding the need for manual feature engineering. The segmentation problem is formulated as a syllable tagging task. We also incorporate a window approach, in which the tag of a syllable depends on its surrounding syllables. Two sets of experiments were designed, with four models of varying context sizes in each set. We evaluated our models using the syllable-tagged corpus prepared by the Dzongkha Development Commission. The model with context size 2 achieved the highest F-score of 94.40%, with 94.47% Precision and 94.35% Recall.
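
The syllable tagging formulation and the window approach described in the abstract can be illustrated with a minimal sketch. The abstract does not state the tag set, the padding scheme, or the network architecture, so everything named here is an assumption made for illustration: a two-tag scheme (`B` for a syllable that begins a word, `I` for one that continues it), a hypothetical `<pad>` token, and placeholder syllables `s1` to `s6` in place of real Dzongkha text. The sketch only prepares the inputs and labels a tagger would be trained on; it is not the authors' model.

```python
from typing import List, Tuple

PAD = "<pad>"  # hypothetical padding token for positions outside the sentence


def words_to_tags(words: List[List[str]]) -> Tuple[List[str], List[str]]:
    """Flatten segmented words into syllables with B/I tags (assumed tag set)."""
    syllables: List[str] = []
    tags: List[str] = []
    for word in words:
        for i, syllable in enumerate(word):
            syllables.append(syllable)
            tags.append("B" if i == 0 else "I")
    return syllables, tags


def context_windows(syllables: List[str], k: int) -> List[List[str]]:
    """Return, for each syllable, itself plus k neighbours on each side."""
    padded = [PAD] * k + syllables + [PAD] * k
    return [padded[i:i + 2 * k + 1] for i in range(len(syllables))]


def tags_to_words(syllables: List[str], tags: List[str]) -> List[List[str]]:
    """Rebuild word boundaries from predicted tags."""
    words: List[List[str]] = []
    for syllable, tag in zip(syllables, tags):
        if tag == "B" or not words:
            words.append([syllable])  # start a new word
        else:
            words[-1].append(syllable)  # extend the current word
    return words


# Placeholder sentence of three words, each given as a list of syllables.
segmented = [["s1", "s2"], ["s3"], ["s4", "s5", "s6"]]
syllables, tags = words_to_tags(segmented)
windows = context_windows(syllables, k=2)  # context size 2, as in the best model
```

Each entry of `windows` is the input for one tagging decision, and the matching entry of `tags` is its label. Once a classifier has predicted a tag for every syllable, `tags_to_words` turns the tag sequence back into a segmentation.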

Bibliographic record

Source
International Conference on Knowledge and Smart Technology | 2020 | pp. 1-5 | 5 pages
Authors
Yeshi Jamtsho; Paisarn Muneesawang
Original format: PDF
Keywords
Natural language processing; Machine learning; Tagging; Context modeling; Task analysis; Feature extraction; Neural networks

Similar literature

1. Analysing the Methods of Dzongkha Word Segmentation [J]. Parshu Ram Dhungyel, Jānis Grundspeņķis. Applied Computer Systems. 2017, No. 1
2. Can infants map meaning to newly segmented words? Statistical segmentation and word learning [J]. Estes KG, Evans JL, Alibali MW. Psychological science: a journal of the American Psychological Society. 2007, No. 3
3. Dzongkha Word Segmentation using Deep Learning [C]. Yeshi Jamtsho, Paisarn Muneesawang. International Conference on Knowledge and Smart Technology. 2020
4. Word segmentation, word recognition, and word learning: A computational model of first language acquisition [D]. Daland, Robert. 2009
5. Generating segmentation masks of herbarium specimens and a data set for training segmentation models using deep learning [O]. Alexander E. White, Rebecca B. Dikow, Makinnon Baugh, 2020
6. Analysing the Methods of Dzongkha Word Segmentation [O]. Dhungyel Parshu Ram, Grundspeņķis Jānis. 2017
