An automatic email mining approach using semantic non-parametric K-Means++ clustering.

Abstract

Email inboxes are now filled with a huge variety and volume of messages, which aggravates the problem of "email overload" and places a financial burden on companies and individuals. Email mining addresses email overload by automatically grouping emails into meaningful groups of similar messages based on their subjects and contents. Existing email mining systems, such as Kernel-Selected clustering and BuzzTrack, do not consider the semantic similarity between email contents; moreover, when a large number of messages are clustered into a single folder, the email overload problem remains.

This thesis proposes a system named AEMS that automatically creates folders and sub-folders, builds an index of the created folders with a link to each folder and sub-folder, and produces an Apriori-based summary of each folder containing its important keywords. The thesis aims to solve the email overload problem through semantic restructuring of emails. Within the AEMS model, a novel approach named Semantic Non-parametric K-Means++ clustering is proposed for folder creation. It avoids (1) random seed selection, by selecting seeds according to email weights, and (2) a predefined number of clusters, by using the similarity between email contents. Experiments on large email datasets show the effectiveness and efficiency of the proposed techniques.

Keywords: Email Mining, Email Overload, Email Management, Data Mining, Clustering, Feature Selection, Folder Summarization.
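The abstract names two changes to K-Means++: seeds are chosen by email weight rather than at random, and the number of clusters is not fixed in advance. The sketch below shows one way those ideas could fit together. It is not the thesis's implementation, which the abstract does not specify. Assumptions: emails are already tokenized, an email's weight is the sum of its TF-IDF term weights, similarity is cosine similarity, and a hypothetical `threshold` parameter decides when enough seeds exist. In place of K-Means++'s D²-weighted random sampling, the sketch uses a deterministic farthest-first rule. The first seed is the heaviest email. Each subsequent seed is the email least similar to its closest current seed, and seeding stops once every email is at least `threshold`-similar to some seed. Standard assignment and update steps then run on cosine similarity.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Sparse TF-IDF vectors (term -> weight) for tokenized emails, smoothed IDF."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in Counter(doc).items()}
        for doc in docs
    ]


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def select_seeds(vecs, threshold):
    """Hypothetical deterministic seeding (assumption: weight = sum of TF-IDF weights).

    First seed: the heaviest email. Next seed: the email least similar to its
    closest existing seed (farthest-first). Stop once every email is at least
    `threshold`-similar to some seed, so k is never fixed in advance.
    """
    weights = [sum(v.values()) for v in vecs]
    seeds = [max(range(len(vecs)), key=lambda i: weights[i])]
    while True:
        closest = [max(cosine(v, vecs[s]) for s in seeds) for v in vecs]
        # Least-covered email; ties go to the heavier email.
        cand = min(range(len(vecs)), key=lambda i: (closest[i], -weights[i]))
        # The `cand in seeds` guard stops the loop on all-zero vectors.
        if closest[cand] >= threshold or cand in seeds:
            return seeds
        seeds.append(cand)


def centroid(members):
    """Mean of a non-empty list of sparse vectors."""
    total = Counter()
    for v in members:
        total.update(v)  # Counter.update adds weights term by term
    return {t: w / len(members) for t, w in total.items()}


def cluster_emails(docs, threshold=0.2, max_iter=20):
    """Return one cluster label per tokenized email (threshold is hypothetical)."""
    vecs = tfidf_vectors(docs)
    centroids = [vecs[s] for s in select_seeds(vecs, threshold)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each email joins its most similar centroid.
        new_labels = [
            max(range(len(centroids)), key=lambda k: cosine(v, centroids[k]))
            for v in vecs
        ]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: recompute centroids; keep the old one if a cluster empties.
        for k in range(len(centroids)):
            members = [v for v, lab in zip(vecs, labels) if lab == k]
            if members:
                centroids[k] = centroid(members)
    return labels


if __name__ == "__main__":
    emails = [
        "meeting agenda project budget".split(),
        "project budget meeting schedule".split(),
        "lunch pizza friday party".split(),
        "party friday pizza drinks".split(),
    ]
    print(cluster_emails(emails))  # emails 0-1 and 2-3 share labels
```

Raising `threshold` produces more, tighter clusters; at `0.99`, each of the four demo emails becomes its own cluster. Lowering it merges them. This approach replaces the usual choice of k with a similarity cutoff. That is closer in spirit to the "non-parametric" goal, but it still leaves one tuning parameter.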
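The abstract states only that folder summaries are Apriori-based and contain a folder's important keywords. One plausible reading, sketched below as an assumption, treats each email in a folder as a transaction of keywords. It mines frequent keyword sets with the classic Apriori algorithm and reports the largest, best-supported sets as the folder summary. The names `summarize_folder`, `min_support` and `top_n`, and the ranking rule, are hypothetical and not taken from the thesis.

```python
from collections import Counter
from itertools import combinations


def apriori(transactions, min_support):
    """Classic Apriori: return {frozenset(itemset): count} for every itemset
    contained in at least `min_support` transactions."""
    transactions = [frozenset(t) for t in transactions]
    counts = Counter(item for t in transactions for item in t)
    frequent = {frozenset([i]): c for i, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = list(frequent)
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        }
        # Count step: support = number of emails containing the whole candidate.
        support = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: n for s, n in support.items() if n >= min_support}
        result.update(frequent)
        k += 1
    return result


def summarize_folder(emails, min_support=2, top_n=3):
    """Hypothetical folder summary: the top_n frequent keyword sets, preferring
    larger sets, then higher support, then alphabetical order."""
    itemsets = apriori([set(e) for e in emails], min_support)
    ranked = sorted(itemsets.items(), key=lambda kv: (-len(kv[0]), -kv[1], sorted(kv[0])))
    return [sorted(s) for s, _ in ranked[:top_n]]


if __name__ == "__main__":
    folder = [
        ["project", "budget", "meeting"],
        ["project", "budget", "deadline"],
        ["project", "budget", "meeting"],
        ["lunch"],
    ]
    print(summarize_folder(folder))
    # [['budget', 'meeting', 'project'], ['budget', 'project'], ['budget', 'meeting']]
```

Here `min_support` is an absolute count of emails. A fraction of the folder size would work equally well and scales better across folders of different sizes.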

Bibliographic details

  • Author: Soni, Gunjan
  • Author affiliation: University of Windsor (Canada)
  • Degree-granting institution: University of Windsor (Canada)
  • Subject: Information Technology; Computer Science
  • Degree: M.Sc.
  • Year: 2013
  • Pages: 104
  • Original format: PDF
  • Text language: English
