Scalable and parallel machine learning algorithms for statistical data mining - Practice experience

Abstract

Many scientific datasets (e.g., in the earth sciences and medical sciences) are growing in volume or in dimensionality as measurement devices continue to improve. This contribution focuses on how such datasets can take advantage of new "big data" technologies and frameworks, which are often based on parallelization methods. A substantial part of the contribution covers lessons learned from medical and earth science data applications that require parallel clustering and classification techniques, such as support vector machines (SVMs) and density-based spatial clustering of applications with noise (DBSCAN). In addition, selected experiences with related "big data" approaches and concrete mining techniques (e.g., dimensionality reduction, feature selection, and extraction methods) are addressed. To overcome the identified challenges, we outline an architecture framework design that we implement with openly available tools to enable scalable and parallel machine learning applications in distributed systems.
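
To make the clustering technique named in the abstract concrete, here is a minimal sketch of DBSCAN in plain Python. It is not the authors' implementation, and the abstract does not describe their parallelization strategy. The sketch assumes that the per-point neighborhood search is the step worth parallelizing, and it uses a thread pool only as a stand-in for the process-level or distributed parallelism a real system would need. All function and parameter names (`region_query`, `dbscan`, `workers`) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from math import dist

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i] (including i)."""
    p = points[i]
    return [j for j, q in enumerate(points) if dist(p, q) <= eps]


def dbscan(points, eps, min_pts, workers=4):
    """Cluster points; returns one label per point (NOISE for outliers)."""
    n = len(points)

    # Step 1: neighborhood queries are independent per point, so they can
    # run concurrently. Threads are an assumption made for illustration;
    # they do not give real speedup for CPU-bound Python code.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        neighbors = list(
            pool.map(lambda i: region_query(points, i, eps), range(n))
        )
    core = [len(nb) >= min_pts for nb in neighbors]

    # Step 2: grow clusters outward from unvisited core points (sequential).
    labels = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None or not core[i]:
            continue
        labels[i] = cluster
        frontier = [i]
        while frontier:
            j = frontier.pop()
            if not core[j]:
                continue  # border points join a cluster but do not expand it
            for k in neighbors[j]:
                if labels[k] is None:
                    labels[k] = cluster
                    frontier.append(k)
        cluster += 1

    # Anything never reached from a core point is noise.
    return [NOISE if label is None else label for label in labels]


if __name__ == "__main__":
    sample = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
    print(dbscan(sample, eps=1.5, min_pts=3))  # -> [0, 0, 0, 1, 1, 1, -1]
```

The quadratic neighborhood search is kept for brevity; on large datasets a spatial index and a partitioning of the data across workers would be the usual next step.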

Bibliographic record

Source: International Convention on Information and Communication Technology, Electronics and Microelectronics, 2015, pp. 204-209 (6 pages)
Authors: Riedel M.; Goetz M.; Richerzhagen M.; Glock P.; Bodenstein C.; Memon A.S.; Memon M.S.
Original format: PDF

Similar literature

1. Why Dataset Properties Bound the Scalability of Parallel Machine Learning Training Algorithms [J]. Cheng Daning, Li Shigang, Zhang Hanping. IEEE Transactions on Parallel and Distributed Systems, 2021, No. 7.
2. Parallel and Distributed Machine Learning Algorithms for Scalable Big Data Analytics [J]. Henri Bal, Arindam Pal. Future Generation Computer Systems, 2020, No. Jula.
3. Knowledge Enrichment of Prediction Using Machine Learning Algorithms for Data Mining and Big Data: a Survey [J]. Karthik Elangovan, Dr. Sethukarasi T. Advances in Natural and Applied Sciences, 2016, No. 15.
4. Scalable and parallel machine learning algorithms for statistical data mining - Practice experience [C]. Riedel M., Goetz M., Richerzhagen M. International Convention on Information and Communication Technology, Electronics and Microelectronics, 2015.
5. Biomedical Text Mining Using Large Scale Distributed Machine Learning Algorithms [D]. Gupta, Neha. 2018.
6. Aligning text mining and machine learning algorithms with best practices for study selection in systematic literature reviews [O]. E. Popoff, M. Besada, J. P. Jansen. 2020.
7. A statistical learning framework for data mining of large-scale systems: algorithms, implementation, and applications [O]. Tsou Ching-Huei, 1973-. 2007.
