HTML Tree Parsing Algorithm Based on Pre-extracted Data

Abstract

This paper proposes a new method for extracting an HTML tree from web pages. Its main idea is to handle the parts of a web page that are hard to parse, including tags and attributes, in advance; the remaining parts are then tidied and parsed, and finally the previously extracted parts are deposited in the tree. Because it integrates the tidying and parsing processes, the method not only preserves the integrity of the web data but also reduces the complexity of the algorithm. Tests show that it can parse all kinds of web pages and provides concrete fault-tolerance mechanisms.
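
The abstract does not say which tags and attributes count as hard to parse or how tidying is performed, so the following Python sketch only illustrates the general pre-extract, tidy-and-parse, reinsert pipeline under stated assumptions. It treats `<script>`, `<style>` and comment blocks as the hard-to-parse parts, limits tidying to closing unclosed elements and dropping stray end tags while the tree is being built, and uses hypothetical names (`pre_extract`, `TidyingTreeBuilder`, `reinsert`, `parse_html_tree`, the `x-pre` placeholder tag). It is not the authors' implementation.

```python
import re
from dataclasses import dataclass, field
from html.parser import HTMLParser

# Assumption: these regions are the "hard-to-parse" parts that are pre-extracted.
HARD_PARTS = re.compile(
    r"<script\b.*?</script\s*>|<style\b.*?</style\s*>|<!--.*?-->",
    re.IGNORECASE | re.DOTALL,
)
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
PLACEHOLDER = "x-pre"  # hypothetical placeholder tag name


@dataclass
class Node:
    tag: str                     # element name, or "#root" / "#text" / "#raw"
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)
    text: str = ""               # payload of "#text" and "#raw" nodes


def pre_extract(html):
    """Step 1: cut out hard-to-parse regions, leaving numbered placeholders."""
    extracted = []

    def repl(match):
        extracted.append(match.group(0))
        return f'<{PLACEHOLDER} data-id="{len(extracted) - 1}"></{PLACEHOLDER}>'

    return HARD_PARTS.sub(repl, html), extracted


class TidyingTreeBuilder(HTMLParser):
    """Step 2: build the tree and tidy it in the same pass."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = Node("#root")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, dict(attrs))
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:  # void elements never get an end tag
            self.stack.append(node)

    def handle_startendtag(self, tag, attrs):
        self.stack[-1].children.append(Node(tag, dict(attrs)))

    def handle_endtag(self, tag):
        # Fault tolerance: an end tag implicitly closes every unclosed
        # element opened after its matching start tag.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                return
        # No matching start tag is open: ignore the stray end tag.

    def handle_data(self, data):
        if data.strip():
            self.stack[-1].children.append(Node("#text", text=data))


def reinsert(node, extracted):
    """Step 3: replace each placeholder with a raw node holding the original text."""
    for i, child in enumerate(node.children):
        if child.tag == PLACEHOLDER:
            node.children[i] = Node("#raw", text=extracted[int(child.attrs["data-id"])])
        else:
            reinsert(child, extracted)


def parse_html_tree(html):
    body, extracted = pre_extract(html)
    builder = TidyingTreeBuilder()
    builder.feed(body)
    builder.close()  # flush any buffered text
    reinsert(builder.root, extracted)
    return builder.root
```

For example, parsing `<p>Hi<b>there<script>if (a < b) {}</script></p>` keeps the script text verbatim as a `#raw` node inside `<b>`, even though its `<` would confuse a naive tag scanner, and the `</p>` end tag implicitly closes the unclosed `<b>`. Pre-extracting such regions keeps them out of the tidying pass entirely, which is one way to read the abstract's claim about preserving data integrity.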

Bibliographic Details

Source
Mobile Business, 2009. ICMB 2009, 2009, pp. 249-254 (6 pages)
Authors
Mingqiu Song; Ruixue Zhang; Duo Gang
Original format
PDF
Keywords
Internet; hypermedia markup languages; program compilers; tree data structures; HTML tree parsing algorithm; Web data integrity; Web pages; fault tolerance mechanisms; parsing process; pre-extracted data; HTML parsing; information extracting; web pages tidying

Similar Documents

1. A Novel Incremental Information Extraction Using Parse Tree Query Language And Parse Tree Databases [J]. Rajula Srilatha, K. Murali. International Journal of Computer Trends and Technology, 2013, No. 10.

2. MASCOT HTML and XML parser: An implementation of a novel object model for protein identification data [J]. Yang CGG, Granite SJ, Van Eyk JE. Proteomics, 2006, No. 21.

3. Tree kernel-based semantic role labeling with enriched parse tree structure [J]. Zhou GuoDong, Li Junhui, Fan Jianxi. Information Processing & Management, 2011, No. 3.

4. HTML Tree Parsing Algorithm Based on Pre-extracted Data [C]. Mingqiu Song, Ruixue Zhang, Duo Gang. Mobile Business, 2009. ICMB 2009, 2009.

5. Parallel algorithms for a highly unstructured problem: Natural language parsing using tree adjoining grammar [D]. Nurkkala, Thomas Benney, 1997.

6. Comparison of the Tree-Based Machine Learning Algorithms to Cox Regression in Predicting the Survival of Oral and Pharyngeal Cancers: Analyses Based on SEER Database [O]. Mi Du, Dandara G. Haag, John W. Lynch, 2020.

7. Parsing algorithms based on tree automata [O]. Andreas Maletti, Giorgio Satta, 2009.
