Change-detection tool indicating degree and location of change of internet documents by comparison of cyclic-redundancy-check (CRC) signatures

Abstract

A change-detection web server automatically checks web-page documents for recent changes. The server retrieves and compares documents one or more times a week. The user is notified by electronic mail when a change is detected. The user registers a web-page document by submitting his e-mail address and the uniform-resource locator (URL) of the desired document. The document is fetched, and the user can select the text of interest on the page. Non-selected text is ignored; only changes in the selected text are reported back to the user. Thus, changes to less relevant parts of the document are ignored. The document is divided into sections bounded by hyper-text markup-language (HTML) tags. A checksum is generated and stored for each HTML-bound section. Storage requirements are reduced since only checksums are stored rather than the original documents. During periodic comparisons, a fresh copy of the document is retrieved and divided into HTML-bound sections, and a checksum is generated for each section. The freshly generated checksums are compared to the archived checksums. Sections with non-matching checksums are highlighted as changed, and the percentage of changed sections is reported. The user-defined selection is also stored as a checksum and compared to a freshly generated checksum. Changed checksums outside the user-defined selection do not generate a change notification. Re-ordering of sections does not generate a change notification when the checksums otherwise match. Thus, format and layout changes do not generate change notifications, and the frequency of notices to the user is reduced.
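The per-section comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names (`section_checksums`, `changed_sections`, `percent_changed`), the regular expression used to split on tags, the whitespace normalization, and the use of `zlib.crc32` as the CRC are all assumptions made for illustration. Archived and fresh checksums are matched as a multiset, which is one possible way to keep re-ordered sections from being flagged.

```python
import re
import zlib
from collections import Counter

# Assumption: any "<...>" run counts as an HTML tag boundary.
TAG_RE = re.compile(r"<[^>]+>")


def section_checksums(html: str) -> list[int]:
    """Split a document on HTML tags and return one CRC-32 per non-empty section."""
    # Collapse whitespace so pure layout changes do not alter a checksum.
    sections = (" ".join(part.split()) for part in TAG_RE.split(html))
    return [zlib.crc32(s.encode("utf-8")) for s in sections if s]


def changed_sections(archived: list[int], fresh: list[int]) -> list[int]:
    """Return indices of fresh sections whose checksum has no archived match.

    Matching consumes entries from a multiset of archived checksums, so a
    section that merely moved to another position is not reported.
    """
    remaining = Counter(archived)
    changed = []
    for index, crc in enumerate(fresh):
        if remaining[crc] > 0:
            remaining[crc] -= 1
        else:
            changed.append(index)
    return changed


def percent_changed(archived: list[int], fresh: list[int]) -> float:
    """Percentage of fresh sections that are flagged as changed."""
    if not fresh:
        return 100.0 if archived else 0.0
    return 100.0 * len(changed_sections(archived, fresh)) / len(fresh)


# Only the checksums of the first fetch are kept, not the document itself.
archived = section_checksums("<h1>News</h1><p>Alpha</p><p>Beta</p>")
reordered = section_checksums("<h1>News</h1><p>Beta</p><p>Alpha</p>")
edited = section_checksums("<h1>News</h1><p>Alpha</p><p>Gamma</p>")
```

In this sketch, `changed_sections(archived, reordered)` is empty because every checksum still has a match, while `changed_sections(archived, edited)` flags only the third section, so one of three sections is reported as changed. The user-defined selection could be handled in the same way by storing a single checksum for the selected text and sending a notification only when that checksum differs.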
