Classification of Web Documents Using a Graph-Based Model and Structural Patterns

Abstract

This paper studies the problem of classifying web documents. Documents are represented with a graph-based model instead of the traditional vector-based one. A novel classification algorithm is proposed that uses two types of structural patterns (subgraphs): contrast and common. The approach is closely related to the classical emerging-patterns techniques known from decision tables. The method is evaluated for classification accuracy on three benchmark collections of web documents. Results show that it can outperform other existing algorithms (based on vector, graph, and hybrid document representations) in terms of accuracy and document model complexity. A further advantage is that the classifier has a simple, understandable structure and can easily be extended with expert knowledge.
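The abstract does not give the details of the algorithm, so the following is only a minimal illustrative sketch of the general idea, not the authors' method. It assumes a deliberately simple graph model (words as nodes, a directed edge between adjacent words) and restricts patterns to single edges: a "contrast" pattern occurs in the training graphs of exactly one class, and a "common" pattern occurs in several classes. The function names `doc_to_graph`, `mine_patterns`, and `classify`, the toy data, and the scoring rule are all hypothetical.

```python
from collections import defaultdict


def doc_to_graph(text):
    """Assumed toy model: a document is the set of directed edges between adjacent words."""
    words = text.lower().split()
    return {(a, b) for a, b in zip(words, words[1:])}


def mine_patterns(graphs_by_class):
    """Split single-edge patterns into contrast (one class only) and common (several classes)."""
    support = defaultdict(lambda: defaultdict(int))  # edge -> class -> number of graphs
    for label, graphs in graphs_by_class.items():
        for graph in graphs:
            for edge in graph:
                support[edge][label] += 1

    contrast = defaultdict(set)  # class -> edges seen only in that class
    common = {}                  # edge -> {class: relative support}
    for edge, counts in support.items():
        if len(counts) == 1:
            (label,) = counts
            contrast[label].add(edge)
        else:
            common[edge] = {c: n / len(graphs_by_class[c]) for c, n in counts.items()}
    return contrast, common


def classify(graph, contrast, common, labels):
    """Score by matched contrast patterns; fall back to common patterns if none match."""
    scores = {c: float(len(graph & contrast.get(c, set()))) for c in labels}
    if max(scores.values()) == 0:
        for edge in graph:
            for c, rel_support in common.get(edge, {}).items():
                scores[c] += rel_support
    return max(scores, key=scores.get)


# Toy training data (invented for illustration only).
train = {
    "sports": ["the team won the match", "the team lost the game"],
    "tech": ["the chip runs the code", "new chip runs fast"],
}
graphs = {c: [doc_to_graph(d) for d in docs] for c, docs in train.items()}
contrast, common = mine_patterns(graphs)
predicted = classify(doc_to_graph("chip runs hot"), contrast, common, sorted(train))
```

In this toy run the edge `("chip", "runs")` appears only in the "tech" training graphs, so it acts as a contrast pattern for that class. A realistic implementation would mine larger subgraphs and use a more careful scoring scheme than this sketch.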

Bibliographic record

Source: European Conference on Principles and Practice of Knowledge Discovery in Databases; 17-21 September 2007; Warsaw (PL) | 2007 | pp. 67-78 | 12 pages
Conference location: Warsaw (PL)
Authors: Andrzej Dominik; Zbigniew Walczak; Jacek Wojciechowski
Author affiliation: Warsaw University of Technology, Institute of Radioelectronics, Nowowiejska 15/19, 00-665 Warsaw, Poland
Original format: PDF
Language of text: English
Chinese Library Classification: Artificial intelligence theory

Similar literature

1. Web document classification using topic modeling based document ranking [J]. Youngseok Lee, Jungwon Cho. International Journal of Electrical and Computer Engineering. 2021, no. 3.
2. Understanding web documents: finding pagelets for transformation using structural patterns [J]. Reza Ferrydiansyah, Bambang Parmanto. International Journal of Web Engineering and Technology. 2008, no. 3.
3. Graph vs. bag representation models for the topic classification of web documents [J]. Papadakis George, Giannakopoulos George, Paliouras Georgios. World Wide Web. 2016, no. 5.
4. Classification of Web Documents Using a Graph-Based Model and Structural Patterns [C]. Andrzej Dominik, Zbigniew Walczak, Jacek Wojciechowski. European Conference on Principles and Practice of Knowledge Discovery in Databases. 2007.
5. M-InfoSift: A graph-based approach for multiclass document classification [D]. Venkatachalam, Aravind. 2007.
6. Adding Dimensions to the Analysis of the Quality of Health Information of Websites Returned by Google: Cluster Analysis Identifies Patterns of Websites According to their Classification and the Type of Intervention Described [O]. Mubashar Yaqub, Pietro Ghezzi. 2015.
7. Classification of News Web Documents Based on Structural Features [O]. Shisanu Tongchim, Virach Sornlertlamvanich, Hitoshi Isahara. 2008.
8. Graph-Based Structural Pattern Learning [R]. Holder, L. B., Cook, D. J. 2006.
