Extracting Informative Sections of Web Documents Based on Scoring DOM Subtrees

Abstract

Web documents can be represented and manipulated as DOM trees. In this paper, we present a novel method for automatically extracting the informative sections of web documents using their DOM trees. It assigns a score to each DOM subtree of a web page and effectively extracts content by finding the subtree with the highest score.
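The abstract states the general procedure (score every DOM subtree, keep the highest-scoring one) but not the scoring function itself. The Python sketch below illustrates that procedure under stated assumptions. It builds a minimal tree with the standard-library `html.parser` and uses a hypothetical score (plain-text characters minus weighted anchor text minus a per-element penalty) that stands in for the authors' formula and is not taken from it. The weights `LINK_WEIGHT` and `TAG_PENALTY` and all function and class names are illustrative.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
# Elements whose text is never page content.
SKIP_TAGS = {"script", "style"}

# Hypothetical weights for the illustrative score (not taken from the paper).
LINK_WEIGHT = 2.0   # penalty per character of anchor text
TAG_PENALTY = 10.0  # penalty per element in the subtree


@dataclass
class Node:
    tag: str
    attrs: "dict[str, str | None]" = field(default_factory=dict)
    # Child elements and text chunks, in document order.
    children: "list[Node | str]" = field(default_factory=list)


class TreeBuilder(HTMLParser):
    """Builds a minimal DOM-like tree with the standard-library HTML parser."""

    def __init__(self) -> None:
        super().__init__()
        self.root = Node("#document")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, dict(attrs))
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest matching open element; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                return

    def handle_data(self, data):
        chunk = " ".join(data.split())  # collapse runs of whitespace
        if chunk:
            self.stack[-1].children.append(chunk)


def subtree_score(text: int, link: int, tags: int) -> float:
    """Hypothetical score: reward plain text, penalize anchor text and markup."""
    return text - LINK_WEIGHT * link - TAG_PENALTY * tags


def best_subtree(root: Node) -> Node:
    """Score every subtree bottom-up and return the highest-scoring one."""
    best_score, best_node = float("-inf"), root

    def visit(node: Node, in_link: bool) -> tuple[int, int, int]:
        # Returns (plain-text chars, anchor-text chars, element count).
        nonlocal best_score, best_node
        if node.tag in SKIP_TAGS:
            return 0, 0, 0
        in_link = in_link or node.tag == "a"
        text, link, tags = 0, 0, 1
        for child in node.children:
            if isinstance(child, str):
                if in_link:
                    link += len(child)
                else:
                    text += len(child)
            else:
                ct, cl, cg = visit(child, in_link)
                text, link, tags = text + ct, link + cl, tags + cg
        score = subtree_score(text, link, tags)
        if score > best_score:
            best_score, best_node = score, node
        return text, link, tags

    visit(root, False)
    return best_node


def text_of(node: Node) -> str:
    """Concatenate a subtree's text in document order, skipping scripts/styles."""
    if node.tag in SKIP_TAGS:
        return ""
    parts = (c if isinstance(c, str) else text_of(c) for c in node.children)
    return " ".join(p for p in parts if p)


def extract_main_content(html: str) -> tuple[Node, str]:
    """Parse `html`, pick the best-scoring subtree, and return it with its text."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    node = best_subtree(builder.root)
    return node, text_of(node)
```

Calling `extract_main_content(page)` returns the highest-scoring node together with its text. Because each score sums over a whole subtree, a plain character count would always favor the root, which contains every character on the page. The anchor-text and per-element penalties are what let a compact article container outscore its ancestors, which also carry navigation links and extra markup.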

Bibliographic Record

Source
Proceedings of the 2008 International Conference on Internet Computing | 2008 | pp. 321-324 | 4 pages
Conference location: Las Vegas, NV (US)
Authors
Yong-Hyuk Kim; Dong-ug Kim; Sejun Ahn
Affiliations

Department of Computer Science and Engineering, Kwangwoon University Wolgye-dong, Nowon-gu, Seoul, 139-701, Korea;

Daum Communications Corp., Daum GMC 1730-8 Odeung-dong, Jeju, 690-150, Korea;

Daum Communications Corp., Daum GMC 1730-8 Odeung-dong, Jeju, 690-150, Korea;

Original format: PDF
Language: English
Chinese Library Classification: Computer networks
Keywords
content extraction; DOM trees; HTML; reformatting

Similar Literature

1. An Informative DOM Subtree Identification Method from Web Pages in Unfamiliar Web Sites [J]. Masanobu TSURUTA, Hiroyuki SAKAI, Shigeru MASUYAMA. IEICE Transactions on Information and Systems, 2008, no. 4.

2. WISDOM: Web intrapage informative structure mining based on document object model [J]. Hung-Yu Kao, Jan-Ming Ho, Ming-Syan Chen. IEEE Transactions on Knowledge and Data Engineering, 2005, no. 5.

3. Extracting knowledge from XML document repository: a semantic Web-based approach [J]. Henry M. Kim, Arijit Sengupta. Information Technology & Management, 2007, no. 3.

4. Web Informative Content Block Detecting Based on Entropy and Parent-Child Relationship in DOM [C]. Zhu Lu, Hu Fei, Li Qingxia. 2008 IEEE International Conference on Information and Automation (ICIA 2008), 2008.

5. Mapping geospatial events based on extracted spatial information from web documents [D]. Rock, Nathaniel Robert, 2011.

6. Documenting Alerts within a Web-based Early Event Detection System [O]. Amy Ising, Meichun Li, Anna Waller, 2006.

7. An Informative DOM Subtree Identification Method from Web Pages in Unfamiliar Web Sites [O]. M. TSURUTA, H. SAKAI, S. MASUYAMA, 2008.

8. NAEP Scoring of Eight-Grade Informative Writing. NAEP Facts, Vol. 5, No. 2 [R], 2000.
