Conference: Workshop of the Cross-Language Evaluation Forum

Improving Web Pages Retrieval Using Combined Fields


Abstract

This article describes the participation of the REINA Research Group of the University of Salamanca in WebCLEF 2006. This year we participated in the Monolingual Mixed Task in Spanish. The entire EuroGOV collection was processed to select all the pages in Spanish; all the pages in the .es domain were also pre-selected. Our objective this year was to test pre-retrieval techniques for combining information fields or elements from web pages, and to assess the retrieval capability of each of these fields. In vector-based retrieval systems, terms coming from different sources can be combined by operating on the frequency of the terms in the document under a tf × idf weighting scheme. The BODY field is, of course, the most useful from the retrieval perspective, but the text of the backlinks brings considerable improvement. META fields or tags, however, contribute little to retrieval improvement.
