Journal of ICT Research and Applications

Document Grouping by Using Meronyms and Type-2 Fuzzy Association Rule Mining


Abstract

The number of textual documents in the digital world, especially on the World Wide Web, is growing very quickly. The resulting accumulation of information calls for efficient ways to organize textual documents. One way to group documents accurately is to use fuzzy association rules. The quality of document clustering depends on the key term extraction phase and on the type of fuzzy logic system (FLS) used for clustering. Using meronyms when extracting key terms helps to produce meaningful cluster labels, and the ambiguities and uncertainties that arise in the rules of type-1 fuzzy logic systems can be overcome by using type-2 fuzzy sets. This study proposes a key term extraction method based on meronyms, with cluster initialization by fuzzy association rule mining, for document clustering. The method consists of four stages: document preprocessing, extraction of key terms with meronyms, extraction of candidate clusters, and cluster tree construction. The method was tested on three datasets: Classic, Reuters, and 20 Newsgroups. Testing compared the overall F-measure of the method with and without meronyms. With meronyms in key term extraction, the method achieved an overall F-measure of 0.5753 on the Classic dataset, 0.3984 on the Reuters dataset, and 0.6285 on the 20 Newsgroups dataset.
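The abstract reports results as an overall F-measure but does not define it. The sketch below assumes the definition commonly used in document clustering evaluation: for each true class, take the best F-measure over all clusters, then average these values weighted by class size. The function name `overall_f_measure` and the flat (non-hierarchical) treatment of clusters are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter


def overall_f_measure(true_labels, cluster_ids):
    """Overall F-measure of a flat clustering (assumed standard definition).

    For class i and cluster j with n_ij shared documents:
        precision = n_ij / |cluster j|
        recall    = n_ij / |class i|
        F(i, j)   = 2 * precision * recall / (precision + recall)
    The overall score is the class-size-weighted mean of max_j F(i, j).
    """
    if len(true_labels) != len(cluster_ids):
        raise ValueError("true_labels and cluster_ids must have equal length")
    n_docs = len(true_labels)
    if n_docs == 0:
        raise ValueError("at least one document is required")

    class_sizes = Counter(true_labels)
    cluster_sizes = Counter(cluster_ids)
    # Number of documents shared by each (class, cluster) pair.
    overlap = Counter(zip(true_labels, cluster_ids))

    # Best F-measure found so far for each class.
    best_f = {label: 0.0 for label in class_sizes}
    for (label, cluster), n_ij in overlap.items():
        precision = n_ij / cluster_sizes[cluster]
        recall = n_ij / class_sizes[label]
        f_score = 2 * precision * recall / (precision + recall)
        best_f[label] = max(best_f[label], f_score)

    return sum(class_sizes[label] / n_docs * best_f[label] for label in class_sizes)


if __name__ == "__main__":
    # Toy data, not taken from the paper.
    labels = ["a", "a", "b", "b"]
    clusters = [0, 0, 0, 1]
    print(round(overall_f_measure(labels, clusters), 4))
```

For a hierarchical cluster tree such as the one the method builds, a common convention is to count every document in a node's subtree as belonging to that node's cluster before applying the same formula; whether the study does so is not stated in the abstract.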
