The increasing amount of data stored on the World Wide Web (WWW) demands efficient techniques for information retrieval. Search engines often answer queries with millions of URLs, some of which are not directly related to the given inquiry. We explore different aspects of the Web to improve the quality of retrieval results.

We show how to derive a numerical score from three types of links to a given page based on its "prestige". Using this score, we are able to rank the importance of URLs returned by a search engine.

Similarities among Web documents can be employed to cluster and classify Web pages. We define a similarity measure among Web pages and among sets of Web pages using their hyperlink relationships, and then demonstrate how to use this measure to study clustering within a set of pages. Additionally, the locations of keywords in the structure of HTML documents are used to find pages similar to a given set of HTML documents. Our findings are used to re-rank the results obtained from popular search engines.

Keywords are used to index Web pages and facilitate search. However, not every document explicitly states its keywords; therefore, an algorithm is needed to discover the keywords from an HTML source file. We claim that there are relationships between the locations of keywords and HTML tags, and employ data-mining techniques to discover association rules on such relationships; these rules can then be used to discover keywords hidden in documents.