Conference: IEEE International Conference on Semantic Computing

DistSim - Scalable Distributed in-Memory Semantic Similarity Estimation for RDF Knowledge Graphs



Abstract

In this paper, we present DistSim, a scalable, distributed, in-memory semantic similarity estimation framework for knowledge graphs. DistSim provides a multitude of state-of-the-art similarity estimators. We have developed the Similarity Estimation Pipeline by combining generic software modules. For large-scale RDF data, DistSim proposes MinHash with locality-sensitive hashing to achieve better scalability than all-pair similarity estimation. The modules of DistSim can be configured through a multitude of (hyper)parameters, which allow adjusting the tradeoff between the amount of information taken into account and the processing time. Furthermore, the output of the Similarity Estimation Pipeline is native RDF. DistSim is integrated into the SANSA stack, documented in scala-docs, and covered by unit tests. Additionally, the variables and provided methods follow the Apache Spark MLlib namespace conventions. The performance of DistSim was tested on a distributed cluster along the dimensions of data set size and processing power versus processing time; the results show the scalability of DistSim w.r.t. increasing data set sizes and processing power. DistSim is already in use for several RDF data analytics use cases. Additionally, DistSim is available as part of the open-source GitHub project SANSA.
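
To illustrate why MinHash with locality-sensitive hashing avoids an all-pair comparison, the following is a minimal, single-machine sketch in plain Python. It is not DistSim's implementation (which, per the abstract, runs on Apache Spark within SANSA). The feature encoding (`predicate|object` strings), the example entities, and the settings `NUM_HASHES` and `BANDS` are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict
from itertools import combinations

NUM_HASHES = 64  # signature length (hypothetical setting)
BANDS = 32       # number of LSH bands (hypothetical setting); 2 rows per band


def _hash(seed: int, feature: str) -> int:
    """Deterministic 64-bit hash of a feature string under a given seed."""
    digest = hashlib.blake2b(
        feature.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(features):
    """For each seeded hash function, keep the minimum hash over the set."""
    return tuple(
        min(_hash(seed, f) for f in features) for seed in range(NUM_HASHES)
    )


def estimated_jaccard(sig_a, sig_b):
    """The fraction of matching signature positions estimates Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def candidate_pairs(signatures):
    """LSH banding: entities sharing at least one identical band become candidates."""
    rows = NUM_HASHES // BANDS
    pairs = set()
    for band in range(BANDS):
        buckets = defaultdict(list)
        for entity, sig in signatures.items():
            buckets[sig[band * rows:(band + 1) * rows]].append(entity)
        for members in buckets.values():
            pairs.update(combinations(sorted(members), 2))
    return pairs


# Hypothetical RDF entities, each described by a set of "predicate|object" features.
entity_features = {
    "ex:film1": {"rdf:type|ex:Film", "ex:director|ex:d1",
                 "ex:actor|ex:a1", "ex:actor|ex:a2"},
    "ex:film2": {"rdf:type|ex:Film", "ex:director|ex:d1",
                 "ex:actor|ex:a1", "ex:actor|ex:a3"},
    "ex:city1": {"rdf:type|ex:City", "ex:country|ex:c1"},
}

signatures = {e: minhash_signature(f) for e, f in entity_features.items()}
pairs = candidate_pairs(signatures)

# Only candidate pairs are scored, instead of all n*(n-1)/2 entity pairs.
for a, b in sorted(pairs):
    print(a, b, estimated_jaccard(signatures[a], signatures[b]))
```

In this sketch, the signature length and the number of bands play the role of the (hyper)parameters mentioned in the abstract. Longer signatures give a more accurate estimate at a higher cost. More bands with fewer rows each raise recall but admit more false candidates.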
