
Predicting User Competence from Text

Abstract

We explore the possibility of learning user competence from text by using natural language processing and machine learning (ML) methods. In our context, competence is defined as the ability to identify the wildlife appearing in images and to classify it into species correctly. We evaluate and compare the performance (in terms of accuracy and F-measure) of three ML methods, Naive Bayes (NB), Decision Trees (DT), and K-nearest neighbors (KNN), applied to a text corpus obtained from Snapshot Serengeti discussion forum posts. The baseline results show that, regarding accuracy, DT outperforms NB and KNN by 16.00% and 15.00%, respectively. Regarding F-measure, KNN outperforms NB and DT by 12.08% and 1.17%, respectively. We also propose a hybrid model that combines the three models (DT, NB, and KNN). We improve the baseline results with a calibration technique and additional features. Adding a bi-gram feature produces a dramatic increase in accuracy for the NB model (from 48.38% to 64.40%). We push the accuracy limit of the baseline models from 93.39% to 94.09%.
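The abstract ranks the classifiers differently under accuracy and F-measure, and mentions a hybrid of the three models without saying how they are combined. The sketch below is a minimal illustration, not the authors' implementation: it computes both metrics for a binary "competent / not competent" label and combines three prediction lists by simple majority vote. The majority-vote rule, the function names, and the toy labels are all assumptions made for illustration; none of the numbers come from the paper.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f_measure(y_true, y_pred, positive=1):
    """F1 score: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def majority_vote(*predictions):
    """Hypothetical hybrid: pick the most common label per item across models."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


# Toy data (assumed, not from the paper): 1 = competent user, 0 = not competent.
y_true = [1, 0, 0, 0, 0, 0, 0, 0, 1, 1]
nb_pred = [1, 0, 0, 1, 0, 0, 0, 1, 0, 1]
dt_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
knn_pred = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]
hybrid_pred = majority_vote(nb_pred, dt_pred, knn_pred)

for name, pred in [("NB", nb_pred), ("DT", dt_pred),
                   ("KNN", knn_pred), ("Hybrid", hybrid_pred)]:
    print(f"{name}: accuracy={accuracy(y_true, pred):.2f}, "
          f"F-measure={f_measure(y_true, pred):.2f}")
```

On this imbalanced toy data, the DT predictions score higher on accuracy while the KNN predictions score higher on F-measure. This shows how the two metrics can disagree when one class dominates, which is the kind of split the abstract reports.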
