Knowledge reduction is one of the important research issues in rough set theory. Classical knowledge reduction algorithms assume that the entire dataset can be loaded into main memory, which is infeasible for large-scale data; massive, high-dimensional data makes attribute reduction a challenging task. To this end, starting from the discernibility and indiscernibility of attributes (and attribute sets), the concepts and properties of discernible and indiscernible object pairs are given, and their relationship with the discernibility matrix is illustrated in detail. A data-parallel method for computing equivalence classes is then designed with MapReduce, and corresponding knowledge reduction algorithms for large-scale data are proposed in a cloud computing environment; three parallelism strategies are discussed and implemented. Finally, experimental results demonstrate that the knowledge reduction algorithms are effective and feasible in the cloud computing environment, scale well, and can efficiently process massive datasets on commodity computers.
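The core parallel step the abstract describes, computing equivalence classes with MapReduce, can be illustrated with a minimal sketch. This is not the authors' exact algorithm, only the standard MapReduce grouping idea: the map phase emits each record keyed by its values on the chosen attribute subset, and the reduce phase collects records sharing a key into one equivalence class. All names and the toy table below are hypothetical.

```python
from collections import defaultdict

def map_phase(records, attrs):
    # Emit (key, record_id) pairs; the key is the tuple of the record's
    # values on the chosen attribute subset, so records that are
    # indiscernible on attrs get identical keys.
    for rid, row in records:
        yield tuple(row[a] for a in attrs), rid

def reduce_phase(pairs):
    # Group record ids by key: each group is one equivalence class
    # of the indiscernibility relation induced by attrs.
    classes = defaultdict(list)
    for key, rid in pairs:
        classes[key].append(rid)
    return dict(classes)

# Hypothetical toy information table: (record id, {attribute: value}).
records = [
    (1, {"a": 0, "b": 1}),
    (2, {"a": 0, "b": 1}),
    (3, {"a": 1, "b": 0}),
]
eq = reduce_phase(map_phase(records, ["a", "b"]))
# Records 1 and 2 agree on both attributes, so they fall into the
# same equivalence class; record 3 forms a class of its own.
```

In a real MapReduce job, the shuffle between the two phases would distribute keys across reducers, so each equivalence class can be assembled without ever holding the whole dataset in one machine's memory, which is the point of the data-parallel design described above.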