Conference: International conference on information knowledge engineering

Libwebex: A Non-discriminating Web Content Extraction Library

Abstract

Libwebex is designed as a powerful and flexible tool for extracting web page content in the domain of Data Integration. Exploiting the semi-structured nature of web pages, the library gives users a skeletal view of a page's content: the data elements to be extracted are arranged into a hierarchical tree of related topics and content. On top of this representation, users can build custom applications using the Visitor design pattern from object-oriented design; all a programmer needs to provide is the custom behavior to execute at each node of the tree. The library also supplies several basic building blocks for further applications, such as archiving, searching, and printing mechanisms.
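The abstract does not show Libwebex's actual API, so the following is only a minimal Python sketch of the idea it describes, assuming a tree made of topic nodes and content nodes. Every name here (`TopicNode`, `ContentNode`, `Visitor`, `visit_topic`, `visit_content`, `PrintVisitor`, `SearchVisitor`) is hypothetical. The sketch shows the Visitor pattern's double dispatch: each node type calls the visitor method for its own type, and a new behavior (printing, searching) is added as a new visitor class without touching the node classes.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Union


class Visitor:
    """Hypothetical base visitor: subclasses override the per-node-type hooks."""

    def visit_topic(self, node: TopicNode, depth: int) -> None:
        pass  # default: do nothing for topic nodes

    def visit_content(self, node: ContentNode, depth: int) -> None:
        pass  # default: do nothing for content nodes


@dataclass
class ContentNode:
    """Hypothetical leaf node holding one extracted piece of text."""
    text: str

    def accept(self, visitor: Visitor, depth: int = 0) -> None:
        # Double dispatch: the node selects the visitor method for its own type.
        visitor.visit_content(self, depth)


@dataclass
class TopicNode:
    """Hypothetical inner node: a topic with related sub-topics and content."""
    title: str
    children: List[Union[TopicNode, ContentNode]] = field(default_factory=list)

    def accept(self, visitor: Visitor, depth: int = 0) -> None:
        visitor.visit_topic(self, depth)
        # Pre-order traversal: visit this topic, then each child one level deeper.
        for child in self.children:
            child.accept(visitor, depth + 1)


class PrintVisitor(Visitor):
    """Renders the tree as an indented outline (a 'printing' building block)."""

    def __init__(self) -> None:
        self.lines: List[str] = []

    def visit_topic(self, node: TopicNode, depth: int) -> None:
        self.lines.append("  " * depth + "# " + node.title)

    def visit_content(self, node: ContentNode, depth: int) -> None:
        self.lines.append("  " * depth + node.text)


class SearchVisitor(Visitor):
    """Collects content text containing a keyword, case-insensitively (a 'searching' block)."""

    def __init__(self, keyword: str) -> None:
        self.keyword = keyword.lower()
        self.hits: List[str] = []

    def visit_content(self, node: ContentNode, depth: int) -> None:
        if self.keyword in node.text.lower():
            self.hits.append(node.text)


if __name__ == "__main__":
    page = TopicNode("Products", [
        TopicNode("Laptops", [ContentNode("Model A, 14-inch, 1.2 kg")]),
        TopicNode("Tablets", [ContentNode("Model B, 11-inch")]),
    ])
    printer = PrintVisitor()
    page.accept(printer)
    print("\n".join(printer.lines))

    finder = SearchVisitor("inch")
    page.accept(finder)
    print(finder.hits)
```

Under this design, an archiving feature would be one more `Visitor` subclass that writes each node to storage; the tree classes stay unchanged, which is what lets a programmer supply only the per-node behavior.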