IEEE International Conference on Big Data and Smart Computing

Attribute Extraction by Combing Feature Ranking and Sequence Labeling

Abstract

Knowledge extraction from Chinese text documents is challenging because of the characteristics of the language. This paper proposes an attribute extraction method based on feature ranking and sequence labeling. First, we build a training corpus by annotating Wikipedia texts with attribute information extracted from Wikipedia infoboxes. To improve the quality of the training corpus, trigger keywords are filtered according to their information entropy. Attribute extraction is then treated as a sequence labeling problem that exploits multidimensional features such as part of speech and word context, and a conditional random field (CRF) model is trained on the corpus to extract attributes from unstructured text. Experimental results show that keyword filtering effectively improves the quality of the training corpus and hence the performance of attribute extraction. Compared with rule-based attribute extraction methods, our method can be extended to other domains and is therefore more portable and extensible.
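
The abstract does not say how the entropy filter or the feature templates are defined, so the following is only a minimal sketch under stated assumptions. It assumes that a trigger keyword's entropy is measured over the attributes it co-occurs with, and that keywords with high entropy (ambiguous triggers) are dropped. It also assumes a one-token context window for the sequence labeling features. The function names, the threshold `max_entropy`, and the toy data are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def keyword_entropy(pairs):
    """Shannon entropy (in bits) of the attribute distribution per keyword.

    pairs: iterable of (keyword, attribute) co-occurrences.
    Assumption: this is one plausible reading of "filtered based on the
    information entropy"; the paper may define it differently.
    """
    counts = defaultdict(Counter)
    for keyword, attribute in pairs:
        counts[keyword][attribute] += 1

    entropy = {}
    for keyword, attr_counts in counts.items():
        total = sum(attr_counts.values())
        entropy[keyword] = -sum(
            (c / total) * math.log2(c / total) for c in attr_counts.values()
        )
    return entropy


def filter_keywords(pairs, max_entropy):
    """Keep keywords whose entropy is at most max_entropy (hypothetical cutoff)."""
    return {k for k, h in keyword_entropy(pairs).items() if h <= max_entropy}


def token_features(tokens, pos_tags, i):
    """Features for token i: the word, its POS tag, and its direct neighbours.

    Assumption: a +/-1 window stands in for the "word context" features.
    """
    feats = {"word": tokens[i], "pos": pos_tags[i]}
    if i > 0:
        feats["prev_word"] = tokens[i - 1]
        feats["prev_pos"] = pos_tags[i - 1]
    else:
        feats["BOS"] = True  # beginning of sentence
    if i < len(tokens) - 1:
        feats["next_word"] = tokens[i + 1]
        feats["next_pos"] = pos_tags[i + 1]
    else:
        feats["EOS"] = True  # end of sentence
    return feats


if __name__ == "__main__":
    # Toy co-occurrences of trigger keywords with infobox attributes.
    toy_pairs = [
        ("born", "birth_date"),
        ("born", "birth_date"),
        ("in", "birth_date"),
        ("in", "birth_place"),
    ]
    print(filter_keywords(toy_pairs, max_entropy=0.5))

    tokens = ["Ada", "was", "born"]
    pos_tags = ["NR", "VC", "VV"]
    print([token_features(tokens, pos_tags, i) for i in range(len(tokens))])
```

The per-token feature dictionaries are in the shape that common CRF toolkits accept as input, so a CRF trainer could consume them together with per-token attribute labels. Training itself is left out here because the abstract gives no details about the toolkit or label scheme.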
