Journal: British Journal of Mathematics & Computer Science

Feature-based Model for Extraction and Classification of High Quality Questions in Online Forum


Abstract

Aims: To design and implement a classification-based model that uses specific features to identify and extract high-quality questions in a thread.
Study Design: The study design is divided into three modules: preprocessing, configuration, and question classification.
Place and Duration of Study: Department of Computer Science, Federal University of Technology, Akure, between June 2016 and December 2016.
Methodology: This research proposes a way of identifying, extracting and classifying questions in order to improve the quality of answers in an online forum. One of the major issues in question extraction and classification in forums is that the set of categories considered is usually restricted to words such as Who, What, Where, Which, Why and How, which are not sufficient to capture all possible questions. In this work, a number of parameters were proposed and aggregated using fuzzy logic for context-based spam detection and removal, in order to enhance question identification and classification. Part-of-speech (POS) tagging was applied to analyse the structure of each extracted sentence based on the presence and position of predefined question tags; this addresses issues such as case sensitivity, grammatical construction and synonyms. Question classification is carried out with Naïve Bayes, and semantic relationships between extracted questions are identified with a cosine similarity model. Experiments were performed on a dataset constructed from the ResearchGate website.
Results: Questions extracted from the ResearchGate website were presented to the system. The output consists of the corresponding POS tags and the category into which each question is classified. The number of questions extracted from the website depends on the number of questions available in a forum. The system correctly extracted and classified 3015 questions at 80% POS tag occurrence.
Conclusion: Our approach to question identification and classification was effective and covered more question categories. It can be applied to any question answering system.
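
The two techniques named in the Methodology, Naïve Bayes classification and cosine similarity between questions, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the tokenizer, the `NaiveBayes` class, the category labels ("procedure", "definition") and the toy training sentences are all hypothetical, and the fuzzy-logic spam filter and POS tagging steps are omitted. It assumes a multinomial model over word counts with add-one smoothing, which the abstract does not specify.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    words = (w.strip("?.,!;:").lower() for w in text.split())
    return [w for w in words if w]


def cosine_similarity(a, b):
    """Cosine of the angle between the term-count vectors of two texts."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            # Log prior plus summed log likelihoods of each token.
            score = math.log(n_docs / total_docs)
            for w in tokenize(text):
                count = self.word_counts[label][w]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data; the paper's categories and dataset are not shown here.
train = [
    ("How do I install the package", "procedure"),
    ("How can I train the model", "procedure"),
    ("What is a fuzzy set", "definition"),
    ("What is cosine similarity", "definition"),
]
clf = NaiveBayes().fit([t for t, _ in train], [label for _, label in train])

print(clf.predict("How do I train a classifier?"))
print(cosine_similarity("What is a fuzzy set", "What is cosine similarity"))
```

In this sketch a new question is assigned the category with the highest smoothed log-probability, and two questions that share more words score closer to 1 under cosine similarity. A production system would likely replace the raw word counts with weighted or POS-aware features.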
