Web Spam Filtering in Internet Archives

Abstract

Web spam targets the high commercial value of top-ranked search-engine results, and Web archives suffer quality deterioration and wasted resources as a side effect. So far, Web archivists rarely use Web spam filtering technologies, but their use is planned for the future, as indicated by a survey with responses from more than 20 institutions worldwide. These archives typically operate on modest budgets that rule out running standalone Web spam filtering, but collaborative efforts could give them a high-quality solution. In this paper we illustrate the spam filtering needs, opportunities, and blockers for Internet archives by analyzing several crawl snapshots, and we show the difficulty of migrating filter models across different crawls using the example of the 13 .uk snapshots performed by UbiCrawler, which include WEBSPAM-UK2006 and WEBSPAM-UK2007.