Journal: Current Journal of Applied Science and Technology

Classification of News Document in English Based on Ontology


Abstract

Aims: This paper proposes an ontology-based method for classifying news documents. The common approach to document classification relies on the morphology of terms without considering their meaning. This affects the number of terms in the term-document representation and the computation time. Furthermore, performance decreases even as the amount of training data increases.

Methodology: The main idea of the ontology is to handle terms that differ in morphological form but share the same meaning (synonyms). The ontology is built from the WordNet database to find similarity of meaning among document terms. Terms with similar meaning are merged, together with their term frequencies, and represented in a vector space model. An unknown document is then classified using cosine similarity over the term weights. The documents used are general-topic English news texts in categories such as interest, money-fx, trade, and crude. The method is compared against the conventional approach, which is document classification without ontology.

Results: News documents can be classified using the cosine similarity method based on ontology. The performance measures of this method, namely precision, recall, and F-measure, increased even though the number of terms was reduced.
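The pipeline described in the methodology (merge synonymous terms, sum their frequencies, compare vectors by cosine similarity) can be illustrated with a minimal sketch. This is not the authors' implementation. The `SYNONYMS` table is a hypothetical hand-written stand-in for WordNet lookups, the weights are raw term frequencies, and each class is represented by the summed vector of its training documents. The abstract does not specify the weighting scheme or how training documents are aggregated, so all of these are assumptions, as are the function names and the toy training texts.

```python
import math
from collections import Counter

# Hypothetical stand-in for WordNet: maps each synonym to one canonical term.
# A real system would derive this mapping from WordNet synsets instead.
SYNONYMS = {
    "petroleum": "oil",
    "crude": "oil",
    "currency": "money",
    "cash": "money",
}


def canonical(term):
    """Return the canonical representative of a term (the term itself if unmapped)."""
    return SYNONYMS.get(term, term)


def term_vector(text):
    """Split on whitespace, merge synonyms, and sum their term frequencies."""
    return Counter(canonical(token) for token in text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(text, training):
    """Return the label whose summed training vector is most similar to the text.

    `training` maps a class label to a list of example texts (assumed layout).
    """
    query = term_vector(text)
    class_vectors = {}
    for label, docs in training.items():
        total = Counter()
        for doc in docs:
            total.update(term_vector(doc))
        class_vectors[label] = total
    return max(class_vectors, key=lambda label: cosine(query, class_vectors[label]))


# Toy training data, invented for illustration only.
TRAINING = {
    "crude": ["opec raises crude output", "oil prices fall as petroleum stocks rise"],
    "money-fx": ["central bank sells currency", "dollar falls in money market"],
}

if __name__ == "__main__":
    print(term_vector("crude petroleum oil"))
    print(classify("petroleum exports drop", TRAINING))
```

Because "crude", "petroleum", and "oil" collapse into a single dimension, the vector has fewer terms, and a query that mentions only "petroleum" still matches training documents that mention only "crude" or "oil". This is the term-reduction effect the abstract attributes to the ontology.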