2010 IEEE Fourth International Conference on Semantic Computing

Achieving Classification and Clustering in One Shot: Lesson Learned from Labeling Anonymous Datasets


Abstract

This paper presents an algorithm, LadsComplete, that automatically assigns labels to HTML tabular web data based on syntactic similarities between the elements of a table. We categorize columns into three types: Disjoint Set Column (DSC), Repeated Prefix/Suffix Column (RPS), and Numeric Column (NUM). To label DSC columns, our method relies on hit counts from a web search engine. Experimental results on a large number of sites from different domains, together with a subjective evaluation, show that the proposed algorithm works fairly well. We hypothesize that LadsComplete will perform well for autonomous label assignment. We are not aware of any prior work that connects two orthogonal lines of research, namely wrapper generation and label extraction, for value-added services such as online comparison shopping.
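The abstract names the three column types and the search-engine step but gives no formulas, so what follows is only a minimal Python sketch under stated assumptions, not the authors' implementation. `classify_column`, `label_dsc_column`, the `min_affix` threshold, the comma-stripping rule for numbers, and the `hit_count` callable are all hypothetical. `hit_count` stands in for a web search API (stubbed offline here with canned counts), and scoring a candidate label by its summed co-occurrence hits with the column's values is an assumed rule.

```python
import os
from typing import Callable, Iterable, Sequence

# The three column types named in the abstract.
DSC, RPS, NUM = "DSC", "RPS", "NUM"


def _is_number(text: str) -> bool:
    """True if text parses as a number after removing ',' separators.

    Stripping thousands separators is an assumption about web-table formatting.
    """
    try:
        float(text.replace(",", "").strip())
        return True
    except ValueError:
        return False


def _common_suffix(values: Sequence[str]) -> str:
    # Reverse each string, take the common prefix, then reverse it back.
    return os.path.commonprefix([v[::-1] for v in values])[::-1]


def classify_column(values: Sequence[str], min_affix: int = 2) -> str:
    """Assign NUM, RPS, or DSC to a column of cell strings.

    min_affix is a hypothetical threshold: the shortest shared prefix or
    suffix that counts as "repeated".
    """
    cells = [v.strip() for v in values if v.strip()]
    if cells and all(_is_number(v) for v in cells):
        return NUM
    prefix = os.path.commonprefix(cells)
    suffix = _common_suffix(cells)
    if len(cells) > 1 and max(len(prefix), len(suffix)) >= min_affix:
        return RPS
    # No numeric pattern and no shared affix: treat as a disjoint set of values.
    return DSC


def label_dsc_column(
    values: Iterable[str],
    candidate_labels: Iterable[str],
    hit_count: Callable[[str], int],
) -> str:
    """Pick the candidate label that co-occurs most with the column's values.

    hit_count is a hypothetical stand-in for a search-engine query returning
    a hit count; summing co-occurrence hits is an assumed scoring rule.
    """
    values = list(values)

    def score(label: str) -> int:
        return sum(hit_count(f'"{label}" "{value}"') for value in values)

    return max(candidate_labels, key=score)


# Offline stub with made-up counts, used instead of a real search engine.
FAKE_HITS = {
    '"brand" "Canon"': 900, '"brand" "Nikon"': 800,
    '"color" "Canon"': 50, '"color" "Nikon"': 40,
}


def fake_hit_count(query: str) -> int:
    return FAKE_HITS.get(query, 0)


if __name__ == "__main__":
    print(classify_column(["Canon", "Nikon", "Sony"]))  # DSC
    print(classify_column(["3 GB", "4 GB", "8 GB"]))    # RPS (shared suffix " GB")
    print(classify_column(["1,299", "849", "15.5"]))    # NUM
    print(label_dsc_column(["Canon", "Nikon"], ["color", "brand"], fake_hit_count))  # brand
```

Summing raw hits favors labels that are common words in general; normalizing by the label's own hit count (a PMI-style score) is one plausible refinement, though the abstract does not say whether LadsComplete does this.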
