Conference: International Convention on Information and Communication Technology, Electronics and Microelectronics

Scalable and parallel machine learning algorithms for statistical data mining - Practice experience

Abstract

Many scientific datasets (e.g., in the earth sciences and medical sciences) are growing in volume or in dimensionality as measurement devices steadily improve. This contribution focuses on how such datasets can take advantage of new 'big data' technologies and frameworks, which are often based on parallelization methods. A substantial part of the contribution covers lessons learned from medical and earth science data applications that require parallel clustering and classification techniques, such as support vector machines (SVMs) and density-based spatial clustering of applications with noise (DBSCAN). In addition, selected experiences with related 'big data' approaches and concrete mining techniques (e.g., dimensionality reduction, feature selection, and feature extraction methods) are addressed. To overcome the identified challenges, we outline an architecture framework design that we implement with openly available tools to enable scalable and parallel machine learning applications in distributed systems.
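As an illustration of why DBSCAN lends itself to parallelization, the sketch below computes the neighborhood queries independently of one another and maps them over a worker pool before the sequential cluster-expansion step. It is a minimal sketch under stated assumptions, not the implementation described in the paper: the function names (`region_query`, `parallel_neighbors`, `dbscan`), the brute-force distance search, and the use of a thread pool are all hypothetical choices made for illustration. A real deployment would more likely use process-based or distributed workers and a spatial index.

```python
from concurrent.futures import ThreadPoolExecutor
from math import dist

NOISE = -1  # label for points that belong to no cluster


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i] (including i)."""
    p = points[i]
    return [j for j, q in enumerate(points) if dist(p, q) <= eps]


def parallel_neighbors(points, eps, workers=4):
    """Compute every point's eps-neighborhood.

    The queries do not depend on each other, so they can be mapped over
    a pool. A thread pool is used here only to show the structure
    (assumption: swap in processes or cluster workers for real speedup).
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda i: region_query(points, i, eps),
                             range(len(points))))


def dbscan(points, eps, min_pts, workers=4):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    neighbors = parallel_neighbors(points, eps, workers)
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        frontier = list(neighbors[i])
        while frontier:
            j = frontier.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                frontier.extend(neighbors[j])  # j is a core point: keep expanding
    return labels


# Two small groups and one isolated point (made-up illustrative data).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
# labels == [0, 0, 0, 1, 1, 1, -1]
```

Separating the neighborhood computation from the expansion step is one common way to expose parallelism; the expansion itself remains sequential in this sketch.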
