Attribute Extraction by Combing Feature Ranking and Sequence Labeling

Abstract

Owing to the characteristics of the Chinese language, knowledge extraction from Chinese text documents is challenging. In this paper, an attribute extraction method based on feature ranking and sequence labeling is proposed. First, we obtain a training corpus by annotating Wikipedia texts with attribute information extracted from Wikipedia infoboxes. To improve the quality of the training corpus, trigger keywords are filtered based on information entropy. Attribute extraction is treated as a sequence labeling problem that exploits multidimensional features such as part of speech and word context. A conditional random field (CRF) model is then trained on the corpus to extract attributes from unstructured texts. Experimental results show that the keyword filtering technique effectively improves the quality of the training corpus and hence the performance of attribute extraction. Compared with rule-based attribute extraction methods, our method can be extended to other domains and therefore offers better portability and extensibility.
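The abstract names the pipeline steps but gives no formulas. The Python sketch below illustrates one plausible reading of three of them under assumptions made here, not in the paper: sentences are already word-segmented and POS-tagged by an external tool; an infobox value is projected onto a sentence as B-/I- tags (distant supervision); a trigger keyword's information entropy is taken to be the entropy of the attribute distribution it co-occurs with, kept only below an arbitrary 1-bit cut-off; and the context features use a window of one word on each side. All function names (`bio_tags`, `keyword_entropy`, `filter_triggers`, `token_features`), the POS tags and the example data are hypothetical.

```python
import math
from collections import Counter

# Hypothetical illustration only: function names, the entropy threshold and
# the feature template are assumptions, not the paper's implementation.
# Input is assumed to be already word-segmented and POS-tagged.


def bio_tags(tokens, value_tokens, attribute):
    """Label every occurrence of an infobox value in a sentence with
    B-/I-<attribute>; all other tokens get "O" (distant supervision)."""
    tags = ["O"] * len(tokens)
    n = len(value_tokens)
    if n == 0:
        return tags
    for start in range(len(tokens) - n + 1):
        if tokens[start:start + n] == value_tokens:
            tags[start] = "B-" + attribute
            for j in range(start + 1, start + n):
                tags[j] = "I-" + attribute
    return tags


def keyword_entropy(attribute_counts):
    """Shannon entropy (in bits) of the attributes a keyword co-occurs with."""
    total = sum(attribute_counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total)
                for c in attribute_counts.values() if c > 0)


def filter_triggers(cooccurrence, max_entropy=1.0):
    """Keep keywords whose co-occurrences concentrate on few attributes;
    keywords spread evenly over many attributes (high entropy) are dropped."""
    return {kw for kw, counts in cooccurrence.items()
            if keyword_entropy(counts) <= max_entropy}


def token_features(tokens, pos_tags, i, window=1):
    """Feature dict for token i: the word, its POS tag, and the words/POS
    tags within +/- window positions (padded at sentence boundaries)."""
    feats = {"word": tokens[i], "pos": pos_tags[i]}
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        inside = 0 <= j < len(tokens)
        feats[f"word[{offset:+d}]"] = tokens[j] if inside else "<PAD>"
        feats[f"pos[{offset:+d}]"] = pos_tags[j] if inside else "<PAD>"
    return feats


if __name__ == "__main__":
    tokens = ["姚明", "出生", "于", "上海", "。"]
    pos = ["nr", "v", "p", "ns", "w"]  # illustrative POS tags
    print(bio_tags(tokens, ["上海"], "birthplace"))
    # ['O', 'O', 'O', 'B-birthplace', 'O']

    cooccurrence = {
        "出生": Counter({"birthplace": 9, "birth_date": 1}),  # ~0.47 bits
        "是": Counter({"birthplace": 3, "occupation": 3,
                       "nationality": 3, "spouse": 3}),      # 2.0 bits
    }
    print(filter_triggers(cooccurrence))  # {'出生'}

    # One list of feature dicts per sentence, aligned with its tag list.
    X = [[token_features(tokens, pos, i) for i in range(len(tokens))]]
    print(X[0][3])
```

In this toy data, 出生 ("born") co-occurs almost only with one attribute and survives the filter, while 是 ("is") is spread evenly over four attributes and is discarded. The per-sentence lists of feature dicts, paired with the BIO tag lists, match the input shape that CRF toolkits such as sklearn-crfsuite accept for training; the CRF step itself is omitted to keep the sketch dependency-free.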

Bibliographic Record

Source
IEEE International Conference on Big Data and Smart Computing | 2018 | p. 788 | 4 pages
Authors
Bin Peng; Xiaoming Zhang; Yueying He; Zhoujun Li
Original format: PDF
Chinese Library Classification (CLC): TP18-53
Keywords
Feature extraction; Labeling; Training; Filtering; Data mining; Encyclopedias; Internet

Similar Literature

1. Consumer preferences for food labelling attributes: comparing direct ranking and best-worst scaling for measurement of attribute importance, preference intensity and attribute dominance [J]. Lagerkvist C. J. Food Quality and Preference, 2013, no. 2.

2. A sequence labeling approach to link medications and their attributes in clinical notes and clinical trial announcements for information extraction [J]. Li Q., Zhai H., Deleger L. Journal of the American Medical Informatics Association, 2013, no. 5.

3. A multi-label feature extraction algorithm via maximizing feature variance and feature-label dependence simultaneously [J]. Xu Jianhua, Liu Jiali, Yin Jing. Knowledge-Based Systems, 2016, Apr. 15 issue.

4. Attribute Extraction by Combing Feature Ranking and Sequence Labeling [C]. Bin Peng, Xiaoming Zhang, Yueying He. IEEE International Conference on Big Data and Smart Computing, 2018.

5. A bottom-up extraction of atomic feature vectors and action sequences for video representation [D]. Burlick, Matthew. 2013.

6. A sequence labeling approach to link medications and their attributes in clinical notes and clinical trial announcements for information extraction [O]. Qi Li, Haijun Zhai, Louise Deleger. 2013.

7. Feature Extraction From DNA Sequences by Multifractal Analysis [R]. Zhang H., Kinsner W. 2001.
