Method and system for calculating phrase-document importance
Abstract
A method and system for generating a weight for phrases within each document in a collection of documents. Each document contains terms, such as words and numbers, and each phrase comprises component terms. A term frequency is the number of occurrences of a term in a document, and a phrase frequency is the number of occurrences of a phrase in a document. The document frequency of a phrase is the number of documents that contain the phrase. To generate the weight, the weighting system first estimates the document frequency for the phrase by multiplying the estimated phrase probability of the phrase by the number of documents that contain every component term. The estimated phrase probability is an estimate of the probability that any given phrase in the documents containing every component term is the phrase whose weight is being generated. The weighting system then estimates a total phrase frequency for the phrase as the average phrase frequency for the phrase multiplied by the estimated document frequency for the phrase. The weighting system derives the average phrase frequency from the phrase probability of the phrase and the average number of terms per document. Finally, the weighting system combines the estimated document frequency with the estimated total phrase frequency to generate the weight of the phrase.
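The estimation steps in the abstract can be sketched in Python as follows. This is a minimal illustration, not the patented implementation. The abstract does not give the formula for the average phrase frequency or for the final combination, so both are assumptions here: the average phrase frequency is taken as the phrase probability times the average number of terms per document, floored at 1, and the weight uses a TF-IDF-style combination. All function and parameter names are hypothetical.

```python
import math
from typing import NamedTuple


class PhraseEstimates(NamedTuple):
    doc_freq: float            # estimated number of documents containing the phrase
    total_phrase_freq: float   # estimated occurrences of the phrase in the collection
    weight: float              # weight of the phrase within one document


def estimate_phrase_weight(
    phrase_probability: float,   # estimated phrase probability
    docs_with_all_terms: int,    # documents containing every component term
    avg_terms_per_doc: float,    # average number of terms per document
    phrase_freq_in_doc: int,     # occurrences of the phrase in the document being weighted
    total_docs: int,             # size of the collection
) -> PhraseEstimates:
    # Step 1 (from the abstract): estimated document frequency is the
    # phrase probability times the number of documents containing every term.
    doc_freq = phrase_probability * docs_with_all_terms
    if doc_freq <= 0:
        raise ValueError("estimated document frequency must be positive")

    # Step 2 (ASSUMPTION): the abstract only says the average phrase frequency
    # is derived from the phrase probability and the average number of terms
    # per document. A simple product, floored at one occurrence, is used here.
    avg_phrase_freq = max(1.0, phrase_probability * avg_terms_per_doc)

    # Step 3 (from the abstract): estimated total phrase frequency is the
    # average phrase frequency times the estimated document frequency.
    total_phrase_freq = avg_phrase_freq * doc_freq

    # Step 4 (ASSUMPTION): the abstract does not state how the two estimates
    # are combined. This TF-IDF-style form normalizes the in-document count by
    # the total phrase frequency and scales by inverse document frequency.
    weight = (phrase_freq_in_doc / total_phrase_freq) * math.log(total_docs / doc_freq)

    return PhraseEstimates(doc_freq, total_phrase_freq, weight)
```

Under these assumptions, a phrase with a lower estimated probability receives a higher weight for the same in-document count, which matches the usual intuition that rarer phrases are more distinctive.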