Computing fuzzy rough approximations in large scale information systems

Abstract

Rough set theory is a popular and powerful machine learning tool. It is especially suitable for dealing with information systems that exhibit inconsistencies, i.e. objects that have the same values for the conditional attributes but a different value for the decision attribute. In line with the emerging granular computing paradigm, rough set theory groups objects together based on the indiscernibility of their attribute values. Fuzzy rough set theory extends rough set theory to data with continuous attributes, and detects degrees of inconsistency in the data. Key to this is turning the indiscernibility relation into a gradual relation, acknowledging that objects can be similar to a certain extent. In very large datasets with millions of objects, computing the gradual indiscernibility relation (in other words, the soft granules) is very demanding in terms of both runtime and memory. It is, however, required for the computation of the lower and upper approximations of concepts in the fuzzy rough set analysis pipeline. Current non-distributed implementations in R are limited by memory capacity. For example, we found that a state-of-the-art non-distributed implementation in R could not handle 30,000 rows and 10 attributes on a node with 62 GB of memory. This is clearly insufficient to scale fuzzy rough set analysis to massive datasets. In this paper we present a parallel and distributed solution based on the Message Passing Interface (MPI) to compute fuzzy rough approximations in very large information systems. Our results show that our parallel approach scales with problem size to information systems with millions of objects. To the best of our knowledge, no other parallel and distributed solutions have been proposed so far in the literature for this problem.
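
As an illustration of the lower and upper approximations mentioned in the abstract, the following is a minimal sketch and not the authors' MPI implementation. It assumes choices the abstract does not specify: a per-attribute similarity of 1 - |difference| / range, aggregated with the minimum over attributes; the Lukasiewicz implicator for the lower approximation; and the minimum t-norm for the upper approximation. The function names `similarity` and `fuzzy_rough_approximations` are hypothetical. The sketch processes one object at a time, so the full n x n relation is never stored.

```python
from typing import List, Sequence, Tuple


def similarity(x: Sequence[float], y: Sequence[float], ranges: Sequence[float]) -> float:
    """Gradual indiscernibility of two objects (assumed form, not from the paper):
    per attribute 1 - |x_a - y_a| / range_a, aggregated with the minimum."""
    degrees = []
    for xa, ya, r in zip(x, y, ranges):
        # A constant attribute (range 0) cannot distinguish any two objects.
        degrees.append(1.0 if r == 0 else max(0.0, 1.0 - abs(xa - ya) / r))
    return min(degrees)


def fuzzy_rough_approximations(
    data: Sequence[Sequence[float]], membership: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Return (lower, upper) membership degrees of every object in a concept.

    data: conditional attribute values, one row per object.
    membership: degree to which each object belongs to the concept (decision class).
    """
    n_attrs = len(data[0])
    ranges = [
        max(row[a] for row in data) - min(row[a] for row in data)
        for a in range(n_attrs)
    ]
    lower, upper = [], []
    for x in data:  # each object is independent of the others
        lo, up = 1.0, 0.0
        for y, a_y in zip(data, membership):
            r = similarity(x, y, ranges)
            lo = min(lo, min(1.0, 1.0 - r + a_y))  # Lukasiewicz implicator I(r, a)
            up = max(up, min(r, a_y))              # minimum t-norm T(r, a)
        lower.append(lo)
        upper.append(up)
    return lower, upper


# Objects 0 and 1 agree on the conditional attribute but differ in the decision,
# which is exactly the kind of inconsistency the abstract describes.
data = [[0.0], [0.0], [1.0]]
membership = [1.0, 0.0, 1.0]
lower, upper = fuzzy_rough_approximations(data, membership)
print(lower, upper)  # [0.0, 0.0, 1.0] [1.0, 1.0, 1.0]
```

Because the outer loop treats each object independently, rows could in principle be split across processes, which is one plausible reason the problem suits a distributed approach. For scale, a dense n x n relation of 8-byte floats needs 30,000² x 8 bytes, about 7.2 GB, for 30,000 objects and about 8 TB for one million objects.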

Bibliographic details

Source
IEEE International Congress on Big Data | 2014 | pp. 9-16 | 8 pages
Authors
Asfoor Hasan; Srinivasan Rajagopalan; Vasudevan Gayathri; Verbiest Nele; Cornelis Chris; Tolentino Matthew; Teredesai Ankur; De Cock Martine;
Original format: PDF
Keywords
application program interfaces; fuzzy set theory; information systems; learning (artificial intelligence); mathematics computing; message passing; parallel processing; rough set theory; MPI; conditional attributes; decision attribute; fuzzy rough approximation; fuzzy rough set analysis pipeline; fuzzy rough set theory; gradual indiscernibility relation; granular computing paradigm; inconsistency degrees; indiscernibility relation; information systems; large scale information systems; lower approximations; machine learning tool; memory capacity; message passing interface; upper approximations; Approximation algorithms; Approximation methods; Data mining; Data processing; Set theory; Vectors;

Similar literature

1. Approximations on intuitionistic fuzzy predicate calculus through rough computing [J]. B. N. V. Satish, G. Ganesan. Journal of intelligent & fuzzy systems: Applications in Engineering and Technology. 2014, No. 4
2. The incremental method for fast computing the rough fuzzy approximations [J]. Yi Cheng. Data & Knowledge Engineering. 2011, No. 1
3. Fast algorithms for computing rough approximations in set-valued decision systems while updating criteria values [J]. Luo Chuan, Li Tianrui, Chen Hongmei. Information Sciences: An International Journal. 2015
4. Computing fuzzy rough approximations in large scale information systems [C]. Asfoor Hasan, Srinivasan Rajagopalan, Vasudevan Gayathri. IEEE International Congress on Big Data. 2014
5. Fuzzy Rough Set Approximations in Large Scale Information Systems [D]. Asfoor, Hasan. 2015
6. DTI-SNNFRA: Drug-target interaction prediction by shared nearest neighbors and fuzzy-rough approximation [O]. Sk Mazharul Islam, Sk Md Mosaddek Hossain, Sumanta Ray. 2021
7. Computing Fuzzy Rough Approximations in Large Scale Information Systems [O]. 2015
8. Seventh International Workshop on Rough Sets, Fuzzy Sets, Data Mining, and Granular-Soft Computing (RSFDGrC'99) [R]. Zhong, N., Skowron, A., Ohsuga, S. 1999
