International Conference on Data and Software Engineering

Focused crawler for the acquisition of health articles


Abstract

Health interventions delivered through technology can serve as an alternative to seeing a doctor, especially for common health problems. To support such technology, a health knowledge base is needed as a foundation, and current advances in artificial intelligence and hardware make this feasible. The broader goal of our research is to build an application that uses a health knowledge base to deliver health interventions. As a first step, we collect health-related articles. To do so, we build a focused crawler that combines multithreaded programming, the Larger-Sites-First scheduling algorithm, and a Naïve Bayes classifier. We find that article acquisition saturates as the number of threads increases. Furthermore, the Larger-Sites-First algorithm does increase the number of crawled articles, but the gain is not significant. In addition, Naïve Bayes correctly recognizes ≥ 90 percent of articles in the ideal case for both the health and non-health categories; however, its performance drops when recognizing non-health articles that contain health keywords.
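The paper does not publish its implementation, but the two core pieces named in the abstract can be sketched together: a Larger-Sites-First frontier (pages from the site with the most queued pages are crawled first) feeding a multinomial Naïve Bayes filter that keeps only health articles. The sketch below is a minimal single-threaded illustration under our own assumptions; the `fetch` callable, the toy training data, and all names are hypothetical, and the multithreading described in the paper is omitted for brevity.

```python
import heapq
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Minimal multinomial Naive Bayes for health vs. non-health text."""
    def __init__(self):
        self.word_counts = defaultdict(Counter)  # per-class word frequencies
        self.class_counts = Counter()            # per-class document counts
        self.vocab = set()

    def train(self, docs):
        for text, label in docs:
            words = text.lower().split()
            self.word_counts[label].update(words)
            self.class_counts[label] += 1
            self.vocab.update(words)

    def predict(self, text):
        words = text.lower().split()
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, float("-inf")
        for label in self.class_counts:
            # log prior + log likelihoods with Laplace (add-one) smoothing
            score = math.log(self.class_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for w in words:
                count = self.word_counts[label][w]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

def crawl(frontier, site_sizes, classifier, fetch):
    """Larger-Sites-First: always pop the URL whose site has the most
    queued pages. heapq is a min-heap, so sizes are negated."""
    heap = [(-site_sizes[site], site, url) for site, url in frontier]
    heapq.heapify(heap)
    health_articles = []
    while heap:
        _, site, url = heapq.heappop(heap)
        text = fetch(url)  # stand-in for an HTTP fetch + HTML-to-text step
        if classifier.predict(text) == "health":
            health_articles.append(url)
    return health_articles
```

Negating the site size turns Python's min-heap into the max-first ordering that Larger-Sites-First requires; in a real crawler the size would be updated as pages are discovered, and `fetch` would run on a thread pool.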
