Rule Based Chunk Extraction from PDF Documents Using Regular Expressions and Natural Language Processing

Amol Rajaram Karad; Rahul Raghvendra Joshi

Abstract

Natural Language Processing (NLP) is an active and important field of Artificial Intelligence (AI). NLP can be used to build the required intelligence into a system so that it behaves with the convenience and efficiency the user expects. The proposed system applies NLP together with regular expressions to categorize and classify sentences in Word and PDF (Portable Document Format) documents according to rules provided by the user. Thousands of similar PDF documents can be processed by reading them page by page, and the system produces results according to user-defined rules that apply to all input PDF documents. A single rule is written against one input PDF document and then applied to all the other input documents to create individual data chunks from each of them, which are displayed in a table in the user interface.
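The workflow described in the abstract (one user-defined rule, applied page by page to many documents, with the results shown as a table) can be illustrated with a minimal Python sketch. This is not the authors' implementation, which the abstract does not show. The rule format (a regular expression with named groups), the names `INVOICE_RULE`, `extract_chunks`, `format_table`, and `SAMPLE_DOCUMENTS`, and the sample text are all assumptions made for illustration. The sketch also assumes that the text of each page has already been extracted from the PDF by some other tool, and it covers only the regular-expression part, not the NLP sentence classification.

```python
import re
from typing import Dict, List

# Hypothetical user-defined rule: a single regular expression with named
# groups, written against one sample document and reused for all others.
INVOICE_RULE = re.compile(
    r"Invoice\s+No[.:]?\s*(?P<invoice_no>\w+).*?"
    r"Total[.:]?\s*(?P<total>\d+(?:\.\d{2})?)",
    re.DOTALL,
)

# Assumed input: document name -> list of page texts (already extracted).
SAMPLE_DOCUMENTS: Dict[str, List[str]] = {
    "a.pdf": ["Invoice No: A100\nCustomer: Acme\nTotal: 250.00"],
    "b.pdf": ["Cover page", "Invoice No: B200\nTotal: 99.50"],
}


def extract_chunks(documents: Dict[str, List[str]],
                   rule: re.Pattern) -> List[Dict[str, str]]:
    """Apply one rule to every page of every document; one row per match."""
    rows: List[Dict[str, str]] = []
    for name, pages in documents.items():
        for page_no, text in enumerate(pages, start=1):
            for match in rule.finditer(text):
                row = {"document": name, "page": str(page_no)}
                row.update(match.groupdict())  # named groups become columns
                rows.append(row)
    return rows


def format_table(rows: List[Dict[str, str]]) -> str:
    """Render the extracted rows as a plain-text table."""
    if not rows:
        return ""
    headers = list(rows[0])
    widths = {h: max(len(h), *(len(r[h]) for r in rows)) for h in headers}
    lines = [" | ".join(h.ljust(widths[h]) for h in headers)]
    lines.append("-+-".join("-" * widths[h] for h in headers))
    for r in rows:
        lines.append(" | ".join(r[h].ljust(widths[h]) for h in headers))
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_table(extract_chunks(SAMPLE_DOCUMENTS, INVOICE_RULE)))
```

Keeping the rule as data (a compiled pattern passed into `extract_chunks`) mirrors the idea that a rule written for one document is reused unchanged for all the others; a page with no match simply contributes no row.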

Bibliographic details

Source: International Journal of Computational Intelligence Research, 2021, Issue 1, pp. 65-70 (6 pages)
Authors: Amol Rajaram Karad; Rahul Raghvendra Joshi
Author affiliation (both authors):

Department Symbiosis Institute of Technology (SIT) Affiliated to Symbiosis International University (SIU) Gram-Lavale Tal-Mulshi Pune 412115 Maharashtra INDIA;

Original format: PDF
Language: English
Keywords:
Natural Language Processing (NLP); Regular Expressions; Artificial Intelligence; PDF; data chunks; User Interface;
