A HEURISTIC METHOD AND SYSTEM TO OPTIMIZE DATA STORAGE AND PROBABILISTIC SIMILARITY APPROACH IN DATA DEDUPLICATION.
Abstract
The present invention provides a method for resolving a major problem in data storage: the availability of storage space. The solution prevents duplicate data from being stored and archived. The storage-optimization technique includes content-aware and application-aware data storage. A clustered storage interface balances data scalability against metadata communication overhead over the repository, keeping that overhead to a minimum, and handles resource failures through proactive measures. The repository spans the pool members of a storage pool, so that similar files can be detected and a high deduplication rate achieved. The solution comprises a deterministic and unsupervised probabilistic duplicate-detection model that uses a similarity distance metric to measure resemblance, together with a uniform file distribution that avoids data skew. Incoming traffic reaches a storage pool associated with multiple pool members, and each file is sent to a pool member based on a high probability-of-similarity score and the availability of disk space. In addition, a queuing system handles ongoing backup instances across a failover, which avoids data loss. The approach shows significant improvement in freeing storage space, with less communication and processing overhead.
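The abstract describes the routing only at a high level. The following is a minimal sketch, under stated assumptions, of how a deterministic duplicate check, a probabilistic similarity score, and disk-space-aware routing could fit together; it is not the patented method. Assumptions: SHA-256 digests stand in for the deterministic check, MinHash estimates of Jaccard similarity over 4-byte shingles stand in for the unspecified similarity distance metric, and `PoolMember`, `store`, `NUM_HASHES`, `SHINGLE_SIZE`, and the 0.5 `threshold` are hypothetical names and values.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field

NUM_HASHES = 64   # hypothetical MinHash signature length
SHINGLE_SIZE = 4  # hypothetical shingle width in bytes


def minhash_signature(data: bytes) -> list[int]:
    """MinHash over byte shingles (assumed stand-in for the similarity model)."""
    last = max(1, len(data) - SHINGLE_SIZE + 1)
    shingles = {data[i:i + SHINGLE_SIZE] for i in range(last)}
    signature = []
    for seed in range(NUM_HASHES):
        salt = seed.to_bytes(8, "big")  # a distinct salt simulates an independent hash function
        signature.append(min(
            int.from_bytes(hashlib.blake2b(s, digest_size=8, salt=salt).digest(), "big")
            for s in shingles
        ))
    return signature


def estimated_similarity(a: list[int], b: list[int]) -> float:
    """The fraction of matching MinHash slots estimates Jaccard similarity."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


@dataclass
class PoolMember:  # hypothetical model of one pool member
    name: str
    free_bytes: int
    digests: set = field(default_factory=set)       # exact-content fingerprints
    signatures: list = field(default_factory=list)  # MinHash signatures of stored files

    def best_similarity(self, sig: list[int]) -> float:
        return max((estimated_similarity(sig, s) for s in self.signatures), default=0.0)


def store(pool: list[PoolMember], data: bytes, threshold: float = 0.5) -> str | None:
    """Return the member that stored `data`, or None if it was an exact duplicate."""
    digest = hashlib.sha256(data).hexdigest()
    # Deterministic step: exact duplicates are never stored twice.
    if any(digest in m.digests for m in pool):
        return None
    candidates = [m for m in pool if m.free_bytes >= len(data)]
    if not candidates:
        raise RuntimeError("no pool member has enough free space")
    sig = minhash_signature(data)
    # Probabilistic step: prefer the member already holding the most similar files.
    best = max(candidates, key=lambda m: m.best_similarity(sig))
    if best.best_similarity(sig) < threshold:
        # No strong resemblance: pick the member with the most free space to avoid skew.
        best = max(candidates, key=lambda m: m.free_bytes)
    best.digests.add(digest)
    best.signatures.append(sig)
    best.free_bytes -= len(data)
    return best.name
```

With two empty members, storing a file, then the same file, then a one-byte variant, the exact copy is rejected and the variant should land beside its source, while unrelated content falls back to the emptier member. Routing similar files to the same member is what lets a later chunk-level deduplication pass on that member find shared content.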