We present a novel approach to automatic text mining on the World Wide Web. Considering the fact that the enormously dynamic growth of the WWW results in a need for new, more powerful information extraction tools we designed and implemented a system, which adapts techniques originally introduced in the field of data mining. We believe that similar systems, Which usually base on machine learning or natural language processing methods, can prove to be ineffective When dealing with the very large numbers of hypertext documents of different structure and subject. Moreover, such systems tend to treat HTML documents as plain texts not taking into account the additional information contained in their markup tags.
展开▼