Source journal: Journal of the American Society for Information Science and Technology

A Semantic Similarity Approach to Predicting Library of Congress Subject Headings for Social Tags


Abstract

Social tagging, or collaborative tagging, has become a new trend in the organization, management, and discovery of digital information. The rapid growth of shared information organized mostly by social tags poses a new challenge for tag-based information organization and retrieval. A plausible response to this challenge is to link social tags to a controlled vocabulary. As an introductory step toward this approach, this study investigates ways of predicting relevant subject headings for resources from the social tags assigned to them. Subject headings were predicted with five different similarity measures: tf-idf, cosine-based similarity (CoS), Jaccard similarity (or Jaccard coefficient; JS), mutual information (MI), and information radius (IRad). The results were compared with those produced by professionals. The results show that the CoS measure based on the top five social tags was most effective; including more social tags only degraded performance. The performance of JS is comparable to that of CoS, while tf-idf performs up to 70% below the best result. MI and IRad perform worse than the other methods. This study demonstrates the application of similarity measures to the prediction of correct Library of Congress subject headings.
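
To make two of the measures named in the abstract concrete, the following is a minimal Python sketch of cosine-based similarity and Jaccard similarity applied to a resource's top five tags. It is an illustration under stated assumptions, not the study's implementation: the abstract does not say how tags or subject headings were represented, so the bag-of-words vectors, the function names (`cosine_similarity`, `jaccard_similarity`, `top_k_tags`, `rank_headings`), and the sample data are all hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def jaccard_similarity(a: Counter, b: Counter) -> float:
    """Size of the shared term set divided by the size of the combined term set."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def top_k_tags(tags: Counter, k: int = 5) -> Counter:
    """Keep only the k most frequently assigned tags of a resource."""
    return Counter(dict(tags.most_common(k)))


def rank_headings(tags, headings, k=5, measure=cosine_similarity):
    """Score every candidate heading against the top-k tags, best first."""
    query = top_k_tags(tags, k)
    scored = [(name, measure(query, terms)) for name, terms in headings.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical data: tag counts for one resource and a word vector per heading.
resource_tags = Counter(
    {"python": 12, "programming": 9, "tutorial": 4, "web": 2, "code": 2, "toread": 1}
)
headings = {
    "Python (Computer program language)": Counter(
        {"python": 1, "computer": 1, "program": 1, "language": 1}
    ),
    "Computer programming": Counter({"computer": 1, "programming": 1}),
    "Cooking": Counter({"cooking": 1}),
}

ranking = rank_headings(resource_tags, headings)
for name, score in ranking:
    print(f"{score:.3f}  {name}")
```

Passing `measure=jaccard_similarity` to `rank_headings` swaps in the Jaccard coefficient, and changing `k` varies how many tags are considered, which mirrors the comparison the abstract describes.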
