Improving Web retrieval by mining the HTML tags for keywords and exploring the hyperlink structures of Web pages.

Abstract

The increasing amount of data stored on the World Wide Web (WWW) demands efficient techniques for information retrieval. Search engines often answer queries with millions of URLs, some of which are not directly related to the query. We explore different aspects of the Web to improve the quality of retrieval results.

We show how to derive a numerical score from three types of links to a given page based on its "prestige". By using such a score, we are able to rank the importance of URLs returned by a search engine.

Similarities among Web documents can be employed to cluster and classify Web pages. We define a similarity measure among Web pages and among sets of Web pages using their hyperlink relationships, and then demonstrate how to use this measure to study clustering within a set of pages. Additionally, the locations of keywords in the structure of HTML documents are used to find pages similar to a given set of HTML documents. Our findings are used to re-rank the results obtained from popular search engines.

Keywords are used to index Web pages and facilitate search. However, not every document explicitly states its keywords; therefore, an algorithm is needed to discover the keywords from an HTML source file. We claim that there are relationships between the locations of the keywords and HTML tags, and employ data-mining techniques to discover association rules on such relationships; these rules can then be used to discover keywords hidden in documents.
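The abstract does not give the dissertation's actual formulas, so the following Python sketch is only an illustration of the two ideas it names, under stated assumptions. It stands in Jaccard overlap of link sets for the hyperlink-based similarity measure, and it computes the confidence of an association rule of the form "a term occurs inside tag T, therefore the term is a keyword". The function names, the choice of Jaccard overlap, and the sample data are all hypothetical and are not taken from the thesis.

```python
def link_similarity(links_a, links_b):
    """Jaccard overlap of two pages' hyperlink sets.

    Assumption: this is a stand-in measure, not the one defined in the thesis.
    Returns 0.0 when both pages have no links.
    """
    a, b = set(links_a), set(links_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def tag_rule_confidence(observations, tag):
    """Confidence of the rule 'term occurs inside `tag` => term is a keyword'.

    observations: (tag_name, is_keyword) pairs, one per term occurrence.
    Returns None when `tag` never occurs, since confidence is undefined then.
    """
    observations = list(observations)
    total = sum(1 for t, _ in observations if t == tag)
    hits = sum(1 for t, is_kw in observations if t == tag and is_kw)
    return hits / total if total else None


# Hypothetical sample data, for illustration only.
page_a_links = {"x.example/1", "x.example/2", "x.example/3"}
page_b_links = {"x.example/2", "x.example/3", "x.example/4"}
sample_terms = [("title", True), ("title", True), ("title", False), ("p", False)]

similarity = link_similarity(page_a_links, page_b_links)
title_confidence = tag_rule_confidence(sample_terms, "title")
```

A rule with high confidence for a tag such as `title` would suggest treating terms found in that tag as candidate keywords; which tags and thresholds the thesis actually uses is not stated in the abstract.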

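Bibliographic record

Author: Quevedo-Torrero, Jesus Ubaldo
Author affiliation: University of Houston
Degree-granting institution: University of Houston
Discipline: Computer Science
Degree: Ph.D.
Year: 2004
Pages: 115 p.
Total pages: 115
Original format: PDF
Language: English
Chinese Library Classification: Automation technology, computer technology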

Similar literature

1. Web Structure Mining: Exploring Hyperlinks and Algorithms for Information Retrieval | Science Publications [J]. Ashutosh K. Singh, P. R. Kumar. American Journal of Applied Sciences, 2010, no. 6.
2. Web Structure Mining: Exploring Hyperlinks and Algorithms for Information Retrieval [J]. P. Ravi Kumar, Ashutosh Kumar Singh. American Journal of Applied Sciences, 2010, no. 6.
3. Genetic mining of HTML structures for effective Web-document retrieval [J]. Kim S., Zhang BT. Applied Intelligence: The International Journal of Artificial Intelligence, Neural Networks, and Complex Problem-Solving Technologies, 2003, no. 3.
4. Exploring HTML Tags and Metadata to Improve the Expressiveness of Web Search Engine's Queries [C]. Escudeiro Nuno Filipe Fonseca Vasconcelos, Escudeiro Paula Maria de Sa Oliveira. International Conference on Computer and Electrical Engineering (ICCEE '09), 2009.
5. Learning Early-Stage Web Development at Scale: Exploring Methods to Assess Learning Through Analysis of HTML and CSS [D]. Kim, Meen Chul. 2021.
6. Improving Website Hyperlink Structure Using Server Logs [O]. Ashwin Paranjape, Robert West, Leila Zia.
7. Web Structure Mining: Exploring Hyperlinks and Algorithms for Information Retrieval [O]. P. R. Kumar, Ashutosh K. Singh. 2010.
8. Web Document Clustering Using Hyperlink Structures [R]. He, X., Zha, H., Ding, C. H. Q. 2003.
