Text Comprehensiveness Ranking


Abstract

When we use a search engine to find interesting texts to read, we often find results that are too difficult to follow or too easy for us to learn anything from. In this work, we propose an algorithm for ranking texts by their comprehensiveness, so that texts can be ranked from the most difficult to the easiest. Given the ranking result, a high school student and a researcher can find texts of different comprehensiveness levels to read even if their queries are identical. Specifically, given a set of articles with different comprehensiveness levels, the proposed ranking method recursively separates the articles into groups according to their comprehensiveness level. The comprehensiveness measure is based on the following observation: given two groups of articles on the same subject but at different comprehensiveness levels, easy articles may not use the terms that are frequently used in difficult articles, while difficult articles may still use the terms that are used by easy articles. We tested the measure on an article database that consists of articles of different comprehensiveness levels and different subjects. The results show that the proposed ranking method can recursively separate texts of different comprehensiveness levels with very high accuracy. The algorithm can also separate two article groups that each have a mixed comprehensiveness level. Based on an EM-like procedure, we gradually refine the result so that, when the procedure converges, the set of articles considered more difficult than the rest has been filtered out. We also tested the proposed method on various databases, including the CCSS corpus and an article database that consists of research articles from journals and magazines.
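The term-usage observation in the abstract can be illustrated with a minimal sketch. This is not the authors' measure, which the abstract does not specify. It assumes a simple proxy: for two groups of articles, compute how much of the other group's frequent vocabulary each group covers, and guess that the group covering more of the other's terms is the harder one. The function names (`frequent_terms`, `coverage`, `harder_group`), the `top_k` cutoff, and the regex tokenizer are all hypothetical choices, and no stop-word filtering is applied.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens (assumed tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def frequent_terms(articles, top_k):
    """Return the top_k terms of a group, ranked by document frequency."""
    doc_freq = Counter()
    for text in articles:
        doc_freq.update(set(tokenize(text)))
    # Break ties alphabetically so the result is deterministic.
    ranked = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return {term for term, _ in ranked[:top_k]}


def coverage(articles, terms):
    """Fraction of `terms` that appear at least once anywhere in `articles`."""
    if not terms:
        return 0.0
    vocab = set()
    for text in articles:
        vocab.update(tokenize(text))
    return len(terms & vocab) / len(terms)


def harder_group(group_a, group_b, top_k=20):
    """Guess which group is harder: 'a', 'b', or 'tie'.

    Assumption: the harder group still uses the easy group's frequent
    terms, while the easy group misses many of the harder group's terms.
    """
    a_covers_b = coverage(group_a, frequent_terms(group_b, top_k))
    b_covers_a = coverage(group_b, frequent_terms(group_a, top_k))
    if a_covers_b > b_covers_a:
        return "a"
    if b_covers_a > a_covers_b:
        return "b"
    return "tie"
```

A recursive or EM-like refinement, as mentioned in the abstract, would need to reassign articles between groups and repeat such a comparison until the assignment stops changing; the sketch covers only a single pairwise comparison.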


Bibliographic details

Source
IEEE/WIC/ACM International Conference on Web Intelligence | 2015 | pp. 21-25 | 5 pages
Authors
Ghaluh Indah P. S; Junaidillah Fadlil; Rudy Cahyadi H. P; Hsing-Kuo Pao
Original format: PDF
Keywords
Comprehensiveness measure; Document ranking; Expectation-Maximization; Recommendation; Search engine

