Unsupervised Learning of Tree Alignment Models for Information Extraction

Abstract

We propose an algorithm for extracting fields from HTML search results. The output of the algorithm is a database table, a data structure that lends itself better to high-level data mining and information exploitation. Our algorithm combines tree and string alignment algorithms with domain-specific feature extraction to match semantically related data across search results. Applications of our approach include hidden web crawling, semantic tagging, and federated search. We build on earlier research on the use of tree alignment for information extraction. In contrast to previous approaches that rely on hand-tuned parameters, our algorithm uses a variant of Support Vector Machines (SVMs) to learn a parameterized, site-independent tree alignment model. This model can then be used to deduce the common structural and textual elements of a set of HTML parse trees. We report preliminary results of our system's performance on data from websites with a variety of layouts.
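
The abstract does not spell out the alignment model itself. As a rough illustration of the kind of tree alignment it builds on, the sketch below uses Simple Tree Matching, a classic dynamic program over ordered trees, with a single hand-set weight. This is a swapped-in textbook technique, not the authors' learned model: the names `Node`, `align_score`, and `match_weight` are hypothetical, and the paper learns its parameters with an SVM variant rather than fixing them by hand.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A node of an ordered, labeled tree (for example an HTML parse tree)."""
    tag: str
    children: List["Node"] = field(default_factory=list)


def align_score(a: Node, b: Node, match_weight: float = 1.0) -> float:
    """Score of the best top-down, order-preserving alignment of trees a and b.

    match_weight is a hypothetical hand-set parameter (assumption); a learned
    alignment model would supply such weights instead.
    """
    # Roots with different tags cannot be aligned at all.
    if a.tag != b.tag:
        return 0.0
    m, n = len(a.children), len(b.children)
    # dp[i][j]: best score aligning the first i children of a
    # with the first j children of b, preserving sibling order.
    dp = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i][j] = max(
                dp[i - 1][j],  # leave child i of a unaligned
                dp[i][j - 1],  # leave child j of b unaligned
                dp[i - 1][j - 1]
                + align_score(a.children[i - 1], b.children[j - 1], match_weight),
            )
    # One matched root pair plus the best alignment of the child sequences.
    return match_weight + dp[m][n]


# Two toy "search result" subtrees; the second lacks the <span> field.
result_1 = Node("div", [Node("a"), Node("span"), Node("p")])
result_2 = Node("div", [Node("a"), Node("p")])
result_3 = Node("li", [Node("a")])

print(align_score(result_1, result_2))  # 3.0: div, a, and p are matched
print(align_score(result_1, result_3))  # 0.0: root tags differ
```

Nodes that align across many results are candidates for the same field (a column of the output table), while unaligned nodes such as the `span` above would correspond to optional fields.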

Bibliographic details

Source: IEEE International Conference on Data Mining, 2006, 5 pages
Authors: Philip Zigoris; Damian Eads; Yi Zhang
Original format: PDF
Chinese Library Classification: TP274.2-53

Similar literature

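1. Extraction of Residential Area Distribution Information in Western Jilin Based on a Decision Tree Model (in English) [J]. 连懿 (Lian Yi), 陈圣波 (Chen Shengbo), 王亚楠 (Wang Yanan). 景观研究：英文版 (Landscape Research: English Edition). 2010, No. 2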
2. Unsupervised Sub-tree Alignment for Tree-to-Tree Translation [J]. Xiao T., Zhu J. The Journal of Artificial Intelligence Research. 2013, No. 4
3. Unsupervised Sub-tree Alignment for Tree-to-Tree Translation [J]. Tong Xiao, Jingbo Zhu. The Journal of Artificial Intelligence Research. 2013 (issue not given)
4. Unsupervised-learning-based keyphrase extraction from a single document by the effective combination of the graph-based model and the modified C-value method [J]. Yeom Hongseon, Ko Youngjoong, Seo Jungyun. Computer Speech and Language. 2019, No. NOVa
5. Unsupervised Learning of Tree Alignment Models for Information Extraction [C]. Philip Zigoris, Damian Eads, Yi Zhang. IEEE International Conference on Data Mining. 2006
6. Deep Learning Models for Unsupervised and Transfer Learning [D]. Srivastava, Nitish. 2017
7. Unsupervised Machine Learning for Advanced Tolerance Monitoring of Wire Electrical Discharge Machining of Disc Turbine Fir-Tree Slots [O]. Jun Wang, Jose A. Sanchez, Izaro Ayesta. 2018
8. Unsupervised Learning of Tree Alignment Models for Information Extraction [O]. 2008
