Conference: European Conference on Principles and Practice of Knowledge Discovery in Databases; 17-21 September 2007; Warsaw (PL)

Classification of Web Documents Using a Graph-Based Model and Structural Patterns


Abstract

This paper studies the problem of classifying web documents. Documents are represented with a graph-based model rather than the traditional vector-based one. A novel classification algorithm is proposed that uses two types of structural patterns (subgraphs): contrast patterns and common patterns. The approach is closely related to the classical emerging-pattern techniques known from decision tables. The method is evaluated for classification accuracy on three benchmark collections of web documents. The results show that it can outperform existing algorithms (based on vector, graph, and hybrid document representations) in terms of both accuracy and document model complexity. A further advantage is that the classifier has a simple, understandable structure and can easily be extended with expert knowledge.
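The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that a document graph has one directed edge per pair of adjacent words, and that patterns are single edges (the smallest possible subgraphs). The names `doc_to_edges`, `mine_patterns`, and `classify`, the `min_support` threshold, the `common_weight` scoring rule, and the toy training data are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def doc_to_edges(text):
    """Assumed graph model: one directed edge per pair of adjacent words."""
    words = text.lower().split()
    return {(a, b) for a, b in zip(words, words[1:])}


def mine_patterns(train, min_support=0.5):
    """Return {label: (contrast_edges, common_edges)} from (text, label) pairs.

    contrast: edges seen in this class and in no other class.
    common:   edges seen in at least `min_support` of this class's documents.
    """
    graphs = defaultdict(list)
    for text, label in train:
        graphs[label].append(doc_to_edges(text))

    patterns = {}
    for label, docs in graphs.items():
        own = set().union(*docs)
        other = set()
        for other_label, other_docs in graphs.items():
            if other_label != label:
                other.update(*other_docs)
        contrast = own - other
        common = {
            e for e in own
            if sum(e in d for d in docs) / len(docs) >= min_support
        }
        patterns[label] = (contrast, common)
    return patterns


def classify(text, patterns, common_weight=0.5):
    """Hypothetical scoring: contrast matches plus down-weighted common matches."""
    edges = doc_to_edges(text)
    scores = {
        label: len(edges & contrast) + common_weight * len(edges & common)
        for label, (contrast, common) in patterns.items()
    }
    return max(scores, key=scores.get)


# Toy data, invented for this example only.
train = [
    ("stock market prices fall", "finance"),
    ("stock market prices rise", "finance"),
    ("football team wins match", "sport"),
    ("football team loses match", "sport"),
]
patterns = mine_patterns(train)
prediction = classify("the stock market prices today", patterns)
print(prediction)
```

Because the mined patterns are plain sets of edges per class, an expert could add or remove edges by hand, which is one way to read the abstract's remark about extending the classifier with expert knowledge.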
