An analysis of document category prediction responses to classifier model parameter treatment permutations within the software design patterns subject domain.

Abstract

This empirical study evaluates the document category prediction effectiveness of Naive Bayes (NB) and K-Nearest Neighbor (KNN) classifier treatments built from different feature selection and machine learning settings, and trained and tested against textual corpora of 2300 Gang-of-Four (GoF) design pattern documents.

Analysis of the experiment's trials, powered by a framework based on WordStat 5.1 with QDA Miner 1.1 by Provalis Research, shows a statistically significant correlation between category prediction success and classifier construction settings when assessed at the 5% significance level using the Friedman test. The best classifier was found to have a prediction success rate of just under 65 percent.

Results demonstrate that classifiers should be built using the chi-square statistic for feature selection, and that dictionary keywords should be selected on the basis of occurrence. To minimize Type I errors, classifiers should use the KNN machine learning algorithm and be trained using the percentage of keywords, weighted by inverse document frequency. To minimize Type II errors, the NB algorithm should be employed using keyword frequency with no weighting.
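
As an illustration of the chi-square, occurrence-based keyword selection named in the abstract, the following is a minimal Python sketch. It is not the WordStat implementation used in the study. The function name `chi_square_scores`, the toy documents, and the two category labels are assumptions made for this example only. Each term is scored from a 2x2 table of document occurrence (term present or absent) against category membership (in or out of the target category).

```python
from collections import Counter


def chi_square_scores(docs, labels, category):
    """Return {term: chi-square score} for one target category.

    docs   -- list of token lists, one per document (hypothetical input format)
    labels -- list of category names, parallel to docs
    Terms are counted by occurrence: at most once per document.
    """
    n = len(docs)
    in_cat = sum(1 for lab in labels if lab == category)
    present_all = Counter()  # number of documents containing the term
    present_in = Counter()   # ... of which belong to the target category
    for tokens, lab in zip(docs, labels):
        for term in set(tokens):  # set() ignores repeats inside a document
            present_all[term] += 1
            if lab == category:
                present_in[term] += 1

    scores = {}
    for term, df in present_all.items():
        a = present_in[term]   # term present, document in category
        b = df - a             # term present, document in another category
        c = in_cat - a         # term absent, document in category
        d = n - in_cat - b     # term absent, document in another category
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


# Toy data, invented for illustration; not taken from the study's corpora.
docs = [
    ["observer", "notify", "subject"],
    ["observer", "subject", "update"],
    ["factory", "create", "product"],
    ["factory", "product", "subject"],
]
labels = ["Observer", "Observer", "Factory Method", "Factory Method"]

scores = chi_square_scores(docs, labels, "Observer")
# Keep the highest-scoring terms as candidate dictionary keywords.
top_terms = sorted(scores, key=scores.get, reverse=True)[:3]
```

Note that the chi-square score measures the strength of association, not its direction: in this toy data "factory" scores as high for the "Observer" category as "observer" does, because its absence is just as informative. A real pipeline would typically compute scores per category and then apply the chosen weighting (for example inverse document frequency) in a separate step.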

Bibliographic record

  • Author

    Pankau, Brian L.

  • Author affiliation

    Colorado Technical University

  • Degree-granting institution: Colorado Technical University
  • Subjects: Library Science; Computer Science; Information Technology
  • Degree: D.C.S.
  • Year: 2009
  • Pagination: 314 p.
  • Total pages: 314
  • Original format: PDF
  • Language: English