Mining Heterogeneous Transformations for Record Linkage

Abstract

Heterogeneous transformations are translations between strings that are not characterized by a single function. For example, nicknames, abbreviations and synonyms are heterogeneous transformations, while edit distances are not. Such transformations are useful for information retrieval, information extraction and text understanding. They are especially useful in record linkage, where we determine whether two records refer to the same entity by examining the similarities between their fields. However, heterogeneous transformations are usually created manually, with no assurance that they will be useful. This paper presents a data mining approach to discover heterogeneous transformations between two data sets without labeled training data. In addition to simple transformations, our algorithm finds combinatorial transformations, such as synonyms and abbreviations together. Our experiments demonstrate that we discover many types of specialized transformations, and we show that by exploiting these transformations we can improve record linkage. Our approach makes discovering and exploiting heterogeneous transformations more scalable and robust by lessening the domain and human dependencies.
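The contrast the abstract draws between edit distance and heterogeneous transformations can be illustrated with a minimal Python sketch. This is not the paper's mining algorithm: the transformation table is hand-written and purely hypothetical, and the names `TRANSFORMATIONS`, `tokens_match` and `field_similarity` are assumptions introduced for illustration. It only shows why a table of token pairs can link fields that a single string function such as edit distance would rank poorly.

```python
# Hypothetical transformation table: the pairs below are illustrative only,
# not transformations mined from any data set.
TRANSFORMATIONS = {
    ("robert", "bob"),      # nickname
    ("street", "st"),       # abbreviation
    ("automobile", "car"),  # synonym
}


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def tokens_match(x: str, y: str) -> bool:
    """True if the tokens are equal or related by a listed transformation."""
    x, y = x.lower(), y.lower()
    return x == y or (x, y) in TRANSFORMATIONS or (y, x) in TRANSFORMATIONS


def field_similarity(f1: str, f2: str) -> float:
    """Greedy token matching: matched tokens divided by the longer field's token count."""
    t1, t2 = f1.split(), f2.split()
    if not t1 or not t2:
        return 0.0
    remaining = list(t2)
    matched = 0
    for tok in t1:
        for cand in remaining:
            if tokens_match(tok, cand):
                remaining.remove(cand)  # each token may be matched only once
                matched += 1
                break
    return matched / max(len(t1), len(t2))


if __name__ == "__main__":
    # Edit distance rates "rupert" as closer to "robert" than "bob" is.
    print(edit_distance("robert", "rupert"), edit_distance("robert", "bob"))
    # The transformation table links the nickname instead.
    print(field_similarity("Robert Smith", "Bob Smith"))
    print(field_similarity("Robert Smith", "Rupert Smith"))
```

In this sketch the nickname pair scores as a full field match even though its edit distance is larger than that of the unrelated name, which is the kind of case the abstract argues a single similarity function cannot capture.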
