ACM Transactions on Asian and Low-Resource Language Information Processing

Learning Syllables Using Conv-LSTM Model for Swahili Word Representation and Part-of-speech Tagging



Abstract

The need to capture intra-word information in natural language processing (NLP) tasks has inspired research into learning word representations at the word, character, or morpheme level, but little attention has been given to syllables from a syllabic alphabet. Motivated by the success of compositional models for morphological languages, we present a convolutional long short-term memory (Conv-LSTM) model for constructing Swahili word representation vectors from syllables. The unified architecture addresses the agglutinative and polysemous nature of Swahili by extracting high-level syllable features with a convolutional neural network (CNN) and then composing quality word embeddings with a long short-term memory (LSTM) network. The word embeddings are validated on a syllable-aware language model (perplexity 31.267) and a part-of-speech (POS) tagging task (98.78), both yielding results very competitive with state-of-the-art models in their respective domains. We further validate the language model on Xhosa and Shona, which are also syllabic-based languages. The novelty of the study lies in constructing quality word embeddings from syllables with a hybrid model that omits the max-over-time pooling common in CNNs, and in exploiting these embeddings for POS tagging. The study therefore contributes to the processing of agglutinative and syllabic-based languages in two ways: quality word embeddings built from syllable embeddings, and a robust Conv-LSTM model that learns syllables not only for language modeling and POS tagging but also for other downstream NLP tasks.
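The abstract describes a pipeline in which a CNN first extracts features from syllable embeddings (with no max-over-time pooling, so the full feature sequence survives) and an LSTM then composes those features into a single word embedding. A minimal sketch of that idea in PyTorch follows; all layer sizes, the syllable-vocabulary size, and the class name are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class ConvLSTMWordEncoder(nn.Module):
    """Sketch of a syllable-level Conv-LSTM word encoder (sizes assumed)."""

    def __init__(self, n_syllables=500, syl_dim=32, conv_channels=64, word_dim=128):
        super().__init__()
        self.syl_emb = nn.Embedding(n_syllables, syl_dim, padding_idx=0)
        # CNN extracts high-level syllable features; no max-over-time pooling,
        # so the convolved sequence keeps one feature vector per syllable.
        self.conv = nn.Conv1d(syl_dim, conv_channels, kernel_size=3, padding=1)
        # LSTM composes the convolved syllable features into a word embedding.
        self.lstm = nn.LSTM(conv_channels, word_dim, batch_first=True)

    def forward(self, syllable_ids):            # (batch, n_syl)
        x = self.syl_emb(syllable_ids)          # (batch, n_syl, syl_dim)
        x = self.conv(x.transpose(1, 2))        # (batch, channels, n_syl)
        x = torch.relu(x).transpose(1, 2)       # (batch, n_syl, channels)
        _, (h, _) = self.lstm(x)                # final hidden state
        return h[-1]                            # (batch, word_dim)

# Encode a batch of two words, each padded to four syllable ids.
encoder = ConvLSTMWordEncoder()
ids = torch.tensor([[5, 17, 3, 0], [8, 2, 0, 0]])
embeddings = encoder(ids)
print(embeddings.shape)  # torch.Size([2, 128])
```

The resulting per-word vectors could then feed a language model or a POS tagger, which is how the abstract says the embeddings were validated.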

