Journal: ACM Transactions on Asian Language Information Processing

Weighted Vote-Based Classifier Ensemble for Named Entity Recognition: A Genetic Algorithm-Based Approach


Abstract

In this article, we report on using the search capability of a genetic algorithm (GA) to construct a weighted vote-based classifier ensemble for named entity recognition (NER). Our underlying assumption is that the reliability of each classifier's predictions differs among the various named entity (NE) classes. Thus, it is necessary to quantify how much weight the vote of a particular classifier should carry for a particular output class. Here, we attempt to determine the appropriate voting weight for each class in each classifier using a GA. The proposed technique is evaluated on four leading Indian languages, namely Bengali, Hindi, Telugu, and Oriya, all of which are resource-poor. The evaluation yields recall, precision, and F-measure values of 92.08%, 92.22%, and 92.15%, respectively, for Bengali; 96.07%, 88.63%, and 92.20%, respectively, for Hindi; 78.82%, 91.26%, and 84.59%, respectively, for Telugu; and 88.56%, 89.98%, and 89.26%, respectively, for Oriya. Finally, we evaluate the proposed approach on the benchmark dataset of the CoNLL-2003 shared task, where it yields overall recall, precision, and F-measure values of 88.72%, 88.64%, and 88.68%, respectively. The results also show that the vote-based classifier ensemble identified by the GA-based approach outperforms all the individual classifiers, three conventional baseline ensembles, and some other existing ensemble techniques. In one part of the article, we formulate the problem of feature selection in any classifier under a single-objective optimization framework and show that the proposed classifier ensemble outperforms this feature-selection approach.
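
The idea of per-classifier, per-class voting weights searched by a GA can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the accuracy-based fitness, the truncation selection, the uniform crossover, and all parameter values are assumptions chosen for brevity, and the abstract does not specify the GA encoding or operators actually used.

```python
import random

def weighted_vote(predictions, weights, classes):
    """Combine one label per classifier into a single label.

    predictions[m] is the label output by classifier m;
    weights[m][c] is the voting weight of classifier m for class c.
    """
    scores = {c: 0.0 for c in classes}
    for m, label in enumerate(predictions):
        scores[label] += weights[m][label]
    # Ties are broken by the order of `classes` (an arbitrary choice).
    return max(classes, key=lambda c: scores[c])

def accuracy(weights, samples, classes):
    """Fraction of (predictions, gold) samples the ensemble labels correctly.

    Assumption: accuracy stands in for the fitness function here.
    """
    hits = sum(weighted_vote(p, weights, classes) == gold for p, gold in samples)
    return hits / len(samples)

def random_weights(n_classifiers, classes, rng):
    """One chromosome: a weight in [0, 1) for every (classifier, class) pair."""
    return [{c: rng.random() for c in classes} for _ in range(n_classifiers)]

def evolve(samples, n_classifiers, classes, pop_size=20,
           generations=30, mutation_rate=0.1, seed=0):
    """Search for voting weights with a simple elitist GA (illustrative only)."""
    rng = random.Random(seed)
    population = [random_weights(n_classifiers, classes, rng)
                  for _ in range(pop_size)]

    def fitness(w):
        return accuracy(w, samples, classes)

    for _ in range(generations):
        # Truncation selection: the better half survives unchanged.
        population.sort(key=fitness, reverse=True)
        parents = population[:pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover: each weight comes from either parent.
            child = [{c: (a[m][c] if rng.random() < 0.5 else b[m][c])
                      for c in classes} for m in range(n_classifiers)]
            # Mutation: occasionally replace a weight with a fresh random value.
            for m in range(n_classifiers):
                for c in classes:
                    if rng.random() < mutation_rate:
                        child[m][c] = rng.random()
            children.append(child)
        population = parents + children
    return max(population, key=fitness)

# Toy data (made up): three classifiers, three NE classes.
CLASSES = ["PER", "LOC", "O"]
SAMPLES = [
    (["PER", "LOC", "O"], "PER"),
    (["PER", "LOC", "LOC"], "LOC"),
    (["O", "LOC", "PER"], "LOC"),
    (["PER", "O", "O"], "PER"),
    (["O", "O", "LOC"], "O"),
]
best = evolve(SAMPLES, n_classifiers=3, classes=CLASSES)
print(accuracy(best, SAMPLES, CLASSES))
```

In this sketch a classifier that is reliable for one class but not another can receive a high weight for the first and a low weight for the second, which is the distinction a single per-classifier weight or plain majority voting cannot express.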
