JACCARD SIMILARITY ESTIMATION OF WEIGHTED SAMPLES: SCALING AND RANDOMIZED ROUNDING SAMPLE SELECTION WITH CIRCULAR SMEARING
Abstract
The disclosed systems and methods pre-calculate, per object, feature bin values used to identify close matches between objects, such as text documents, that have numerous weighted features, such as word sequences of a specific length. Predetermined feature weights are scaled by two or more selected adjacent scaling factors and randomly rounded. The expanded set of weighted features of an object is min-hashed into a predetermined number of feature bins. For each feature that qualifies to be inserted by min-hashing into a particular feature bin, and across successive feature bins, the expanded set of weighted features is min-hashed and circularly smeared into the predetermined number of feature bins. The completed pre-calculated sets of feature bin values for each scaling of the object are stored together with the scaling factor, for use in comparing sampled features of the object with sampled features of other objects by calculating an estimated Jaccard similarity index.
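The scaling, randomized rounding, and min-hashing steps described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names (`round_randomly`, `bin_values`, `estimate_jaccard`), the use of SHA-256 as the hash, the hash-seeded rounding, and the choice of one independent hash per bin are all assumptions made for illustration, and the circular smearing step is not modeled because the abstract does not specify it in enough detail.

```python
import hashlib
import math


def _hash01(*parts):
    """Deterministically hash the given parts to a float in [0, 1)."""
    data = "\x1f".join(str(p) for p in parts).encode("utf-8")
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def round_randomly(feature, weight, scale):
    """Scale a weight, then round up with probability equal to the fraction.

    Assumption: the random draw is derived from a hash of the feature, so the
    same feature is rounded consistently in every object.
    """
    scaled = weight * scale
    base = math.floor(scaled)
    return base + (1 if _hash01("round", feature) < scaled - base else 0)


def bin_values(weights, scale, num_bins):
    """Min-hash the expanded (replicated) feature set into num_bins bins."""
    bins = [None] * num_bins
    for feature, weight in weights.items():
        # Expansion: a feature with rounded weight n contributes n copies.
        for copy in range(round_randomly(feature, weight, scale)):
            for b in range(num_bins):
                h = _hash01("bin", b, feature, copy)
                if bins[b] is None or h < bins[b]:
                    bins[b] = h
    return bins


def estimate_jaccard(bins_a, bins_b):
    """Estimate Jaccard similarity as the fraction of bins whose minima agree."""
    matches = sum(1 for a, b in zip(bins_a, bins_b) if a is not None and a == b)
    return matches / len(bins_a)


# Hypothetical objects: word-sequence features with illustrative weights.
doc_a = {"the quick brown": 2.0, "quick brown fox": 1.5}
doc_b = {"the quick brown": 2.0, "brown fox jumps": 1.0}

# Bin values would be pre-calculated once per object and stored with the scale.
stored_a = {"scale": 4, "bins": bin_values(doc_a, scale=4, num_bins=64)}
stored_b = {"scale": 4, "bins": bin_values(doc_b, scale=4, num_bins=64)}
similarity = estimate_jaccard(stored_a["bins"], stored_b["bins"])
```

Seeding the rounding from the feature itself, rather than from a per-object random source, keeps the expanded sets of two objects comparable: a feature with equal weights in both objects always expands to the same number of copies. Objects would only be compared using bin values computed with the same scaling factor, which is why the factor is kept alongside the bins.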