Conference: Mexican Conference on Pattern Recognition

Semi-Supervised Approach to Named Entity Recognition in Spanish Applied to a Real-World Conversational System

Abstract

In this paper, we improve the named-entity recognition (NER) capabilities of an existing text-based dialog system (TDS) in Spanish. Our solution is twofold: first, we developed a hidden Markov model part-of-speech (POS) tagger trained with frequencies from over 120 million words; second, we obtained 2,283 real-world conversations from interactions between users and a TDS. All interactions occurred through a natural-language, text-based chat interface. The TDS was designed to help users decide which product from a well-defined catalog best suited their needs. The conversations were manually tagged using the classical Penn Treebank tag set, with the addition of an ENTITY tag for all words relating to a brand or product. The proposed system uses a hybrid approach to NER: it first looks up each word in a previously defined catalog; if the word is not found, it uses the tagger to assign the appropriate POS tag. When tested on an independent conversation set, our solution achieved higher accuracy and higher recall than a current development from industry.
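The hybrid lookup described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the bigram HMM with add-one smoothing, the function names (`train_hmm`, `viterbi`, `hybrid_tag`), the toy training sentences, and the two-word catalog are all assumptions made for illustration. The sketch also runs the tagger over the whole sentence and then overrides catalog hits with ENTITY, which is one possible reading of "look up first, tag otherwise".

```python
import math
from collections import defaultdict

ENTITY = "ENTITY"  # extra tag added on top of the Penn Treebank set
START = "<s>"      # hypothetical sentence-start state


def train_hmm(tagged_sentences):
    """Count tag-transition and word-emission frequencies."""
    trans = defaultdict(lambda: defaultdict(int))
    emit = defaultdict(lambda: defaultdict(int))
    for sentence in tagged_sentences:
        prev = START
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word.lower()] += 1
            prev = tag
    return trans, emit


def _logp(counts, key, n_outcomes):
    """Add-one smoothed log-probability (smoothing choice is an assumption)."""
    total = sum(counts.values())
    return math.log((counts.get(key, 0) + 1) / (total + n_outcomes))


def viterbi(words, trans, emit):
    """Return the most likely tag sequence under the bigram HMM."""
    tags = list(emit)
    n_tags = len(tags)
    # +1 reserves probability mass for unseen words
    n_words = len({w for t in tags for w in emit[t]}) + 1
    best = [{t: _logp(trans[START], t, n_tags)
                + _logp(emit[t], words[0].lower(), n_words) for t in tags}]
    back = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev, score = max(
                ((p, best[i - 1][p] + _logp(trans[p], t, n_tags)) for p in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = score + _logp(emit[t], words[i].lower(), n_words)
            back[i][t] = prev
    # Follow back-pointers from the best final tag
    path = [max(best[-1], key=best[-1].get)]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


def hybrid_tag(words, catalog, trans, emit):
    """Catalog lookup first; fall back to the HMM POS tag otherwise."""
    if not words:
        return []
    pos_tags = viterbi(words, trans, emit)
    return [(w, ENTITY if w.lower() in catalog else t)
            for w, t in zip(words, pos_tags)]


# Toy data, invented for this example only
train = [
    [("quiero", "VBP"), ("un", "DT"), ("telefono", "NN"), ("barato", "JJ")],
    [("busco", "VBP"), ("una", "DT"), ("camara", "NN"), ("buena", "JJ")],
    [("quiero", "VBP"), ("una", "DT"), ("laptop", "NN")],
]
catalog = {"samsung", "galaxy"}
trans, emit = train_hmm(train)
result = hybrid_tag(["quiero", "un", "galaxy", "barato"], catalog, trans, emit)
```

Here `result` pairs each word with either ENTITY (catalog hit) or the POS tag chosen by the tagger. Tagging the full sentence before applying the catalog keeps the tagger's context intact for the surrounding words; a system that skips tagging for catalog words entirely would be an equally valid reading of the abstract.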
