Text Comprehensiveness Ranking

Abstract

When we use a search engine to find interesting texts to read, we often find material that is either too difficult to follow or too easy to teach us anything new. In this work, we propose an algorithm that ranks texts by their comprehensiveness, so that texts can be ordered from the most difficult to the easiest. Given the ranking result, a high school student and a researcher may find texts of different comprehensiveness levels to read even if their queries are identical. Specifically, given a set of articles with different comprehensiveness levels, the proposed ranking method recursively separates the articles into groups according to comprehensiveness level. The comprehensiveness measure is based on the observation that, given two groups of articles on the same subject but at different comprehensiveness levels, easy articles may not use the terms that are frequently used in difficult articles, while difficult articles may still use the terms used in easy articles. We tested the measure on an article database consisting of articles of different comprehensiveness levels and different subjects. The results show that the proposed ranking method can recursively separate texts of different comprehensiveness levels with very high accuracy. The algorithm can also separate two article groups that each contain a mix of comprehensiveness levels: using an EM-like procedure, we gradually refine the result so that, when the procedure converges, the set of articles considered more difficult than the rest has been filtered out. We also tested the proposed method on various databases, including the CCSS corpus and a database of research articles from journals and magazines.
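
The abstract does not give the exact formula for the comprehensiveness measure, so the following Python sketch only illustrates the stated observation: easy articles tend not to use the frequent terms of difficult articles, while difficult articles still use the terms of easy ones. The function names (`frequent_terms`, `coverage`, `harder_group`), the whitespace tokenization, and the `min_share` threshold are all assumptions made for illustration, not the authors' method. The EM-like refinement step is not covered.

```python
from collections import Counter


def frequent_terms(articles, min_share=0.5):
    """Terms that occur in at least `min_share` of the articles in a group.

    `min_share` is a hypothetical threshold, not taken from the paper.
    """
    doc_freq = Counter()
    for text in articles:
        # Count each term once per article (document frequency).
        doc_freq.update(set(text.lower().split()))
    threshold = min_share * len(articles)
    return {term for term, n in doc_freq.items() if n >= threshold}


def vocabulary(articles):
    """All distinct terms used anywhere in a group of articles."""
    vocab = set()
    for text in articles:
        vocab.update(text.lower().split())
    return vocab


def coverage(source, target, min_share=0.5):
    """Share of the frequent terms of `source` that also occur in `target`."""
    frequent = frequent_terms(source, min_share)
    if not frequent:
        return 0.0
    return len(frequent & vocabulary(target)) / len(frequent)


def harder_group(group_a, group_b):
    """Guess which group is more difficult: 'a', 'b', or 'tie'.

    If B rarely uses A's frequent terms while A still uses B's,
    A is taken to be the more difficult group.
    """
    a_in_b = coverage(group_a, group_b)
    b_in_a = coverage(group_b, group_a)
    if a_in_b < b_in_a:
        return "a"
    if b_in_a < a_in_b:
        return "b"
    return "tie"


# Toy data, invented for this example only.
easy = ["the cell is small", "the cell makes energy"]
hard = [
    "the cell mitochondria makes energy via phosphorylation",
    "the mitochondria is small and phosphorylation occurs in the cell",
]
print(harder_group(easy, hard))
```

On this toy data the second group is reported as the harder one, because every frequent term of the easy articles appears in the difficult ones, but not the reverse. A real implementation would need proper tokenization, stop-word handling, and the recursive splitting described in the abstract.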