Implementation of scalable K-Means++ clustering for passengers temporal pattern analysis in public transportation system (BRT Trans Jogja case study)

Abstract

The popularity of Bus Rapid Transit (BRT) makes Trans Jogja an alternative mass public transportation system for urban mobility. However, without monitoring the temporal patterns of passenger behavior in Trans Jogja with respect to supply and demand, the number of BRT users will decrease and the number of private vehicle users will increase, so traffic jams will remain difficult to avoid. The Smart Card Automated Fare Collection System (SCAFCS), currently used for e-ticketing on Trans Jogja public transport, can be used to analyze passenger patterns with data mining approaches. This paper preprocesses SCAFCS data with a data warehouse mechanism and uses the Hadoop platform for distributed computing to improve the scalability of K-Means++ clustering on large datasets; in this case, the Trans Jogja SCAFCS dataset is large (volume) and grows rapidly (velocity). The scalable K-Means++ algorithm generates five clusters, labeled Very Low, Low, Average, High, and Very High. The clusters were used to analyze passenger patterns along three dimensions: time (temporal); passenger segmentation (structural), to determine the variability of passengers based on the card they used; and transaction peaks by boarding location (spatial). The experiments compared the Sum of Squared Errors (SSE), the total squared distance from the points in the k clusters to their centroids, across three algorithms: simple K-Means, K-Means++, and K-Means++ implemented on the Hadoop platform for parallel and distributed computing. K-Means++ on the Hadoop platform yields a smaller SSE than simple K-Means and K-Means++, indicating more compact clusters.
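
The comparison described in the abstract can be illustrated with a minimal single-machine sketch. This is not the paper's Hadoop implementation: it uses NumPy only, and the data is synthetic (hypothetical hourly boarding counts drawn around five made-up levels), since the SCAFCS dataset is not available here. The function names `kmeans_pp_init`, `random_init`, `assign`, and `lloyd` are illustrative choices. The sketch shows K-Means++ seeding (each new centroid is sampled with probability proportional to its squared distance from the nearest centroid already chosen), followed by standard Lloyd iterations and the SSE computation.

```python
import numpy as np


def kmeans_pp_init(X, k, rng):
    """K-Means++ seeding: sample each new centroid with probability
    proportional to its squared distance from the nearest chosen centroid."""
    n = X.shape[0]
    first = rng.integers(n)
    centroids = [X[first]]
    d2 = np.sum((X - X[first]) ** 2, axis=1)
    for _ in range(1, k):
        total = d2.sum()
        # Fall back to uniform sampling if every point coincides with a centroid.
        probs = d2 / total if total > 0 else np.full(n, 1.0 / n)
        idx = rng.choice(n, p=probs)
        centroids.append(X[idx])
        d2 = np.minimum(d2, np.sum((X - X[idx]) ** 2, axis=1))
    return np.array(centroids)


def random_init(X, k, rng):
    """Simple K-Means seeding: k distinct points chosen uniformly at random."""
    return X[rng.choice(X.shape[0], size=k, replace=False)]


def assign(X, centroids):
    """Return the nearest-centroid label and squared distance for each point."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1), d2.min(axis=1)


def lloyd(X, centroids, max_iter=100):
    """Lloyd iterations; returns final centroids, labels, and SSE."""
    for _ in range(max_iter):
        labels, _ = assign(X, centroids)
        new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(len(centroids))
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    labels, d2 = assign(X, centroids)
    return centroids, labels, d2.sum()


# Synthetic stand-in data: hypothetical boardings per hour around five levels.
rng = np.random.default_rng(0)
levels = [20, 80, 150, 260, 400]  # assumed values, not taken from the paper
X = np.concatenate([rng.normal(m, 10, size=(200, 1)) for m in levels])

for name, init in [("random init", random_init), ("k-means++ init", kmeans_pp_init)]:
    _, _, sse = lloyd(X, init(X, 5, rng))
    print(f"{name}: SSE = {sse:.1f}")
```

On any single run, K-Means++ seeding is not guaranteed to give a lower SSE than random seeding; a fair comparison would average over several seeds. A distributed version would typically split the assignment step across workers and aggregate per-cluster sums, which this sketch does not attempt.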
