Conference: International conference on applied computing and information technology

A Comparative Study of Using Bag-of-Words and Word-Embedding Attributes in the Spoiler Classification of English and Thai Text



Abstract

This research compares the effectiveness of traditional bag-of-words attributes and word-embedding attributes for classifying movie comments as spoiler or non-spoiler. Both approaches were applied to comments in English, an inflectional language, and in Thai, a non-inflectional language. Experimental results suggested that, in terms of classification performance, word embedding was not clearly better than bag of words. Even so, word embedding might still be chosen over bag of words for its scalability. Between Word2Vec and FastText embeddings, Word2Vec was preferable when few out-of-vocabulary (OOV) words were present. Finally, although FastText was expected to help when many OOV words were present, its benefit was hardly seen for the Thai language.
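The contrast between the two attribute types, and the role of OOV words, can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not describe their tokenization, classifier, or embedding settings, so every function name, the toy vocabulary, and the two-dimensional vectors below are hypothetical. The subword fallback is a simplified stand-in for how FastText composes vectors from character n-grams, and averaging word vectors into a comment vector is likewise an assumption.

```python
from collections import Counter

def bag_of_words(tokens, vocabulary):
    """Count vector with one dimension per vocabulary word; OOV tokens are dropped."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]

def char_ngrams(word, n=3):
    """Character n-grams of a word wrapped in boundary markers '<' and '>'."""
    padded = f"<{word}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def average(vectors, dim):
    """Element-wise mean of a list of vectors; a zero vector if the list is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def embed_lookup(tokens, word_vectors, dim):
    """Word2Vec-style comment vector: average known word vectors, skip OOV words."""
    return average([word_vectors[t] for t in tokens if t in word_vectors], dim)

def embed_with_subword_fallback(tokens, word_vectors, ngram_vectors, dim):
    """Simplified FastText-style comment vector: an OOV word is represented by
    the mean of whichever of its character n-grams have a vector."""
    vectors = []
    for t in tokens:
        if t in word_vectors:
            vectors.append(word_vectors[t])
            continue
        known = [ngram_vectors[g] for g in char_ngrams(t) if g in ngram_vectors]
        if known:
            vectors.append(average(known, dim))
    return average(vectors, dim)

# Toy data (hypothetical): "died" is OOV but shares n-grams with "dies".
vocabulary = ["the", "hero", "dies", "ending"]
word_vectors = {"the": [0.0, 0.0], "hero": [1.0, 0.0], "dies": [0.0, 1.0]}
ngram_vectors = {"<di": [0.0, 1.0], "die": [0.0, 1.0]}

comment = ["the", "hero", "died"]
bow_features = bag_of_words(comment, vocabulary)           # [1, 1, 0, 0]
lookup_features = embed_lookup(comment, word_vectors, 2)   # [0.5, 0.0]
subword_features = embed_with_subword_fallback(
    comment, word_vectors, ngram_vectors, 2
)
```

In this toy setup the bag-of-words vector grows with the vocabulary, while both embedding vectors keep a fixed length, which is one plausible reading of the scalability point. The OOV word "died" contributes nothing to the bag-of-words or lookup features, but it does contribute through its shared n-grams in the subword variant.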
