NLP Analytics in Finance with DoRe: a French 257M Tokens Corpus of Corporate Annual Reports

Abstract

Recent advances in neural computing and word embeddings for semantic processing open up many new application areas that had so far been left unaddressed because of inadequate language understanding capacity. But this new kind of approach relies even more heavily on training data to be operational. Corpora for financial applications exist, but most of them concern stock market prediction and are in English. To address this need for the French language and for regulation-oriented applications, which require a deeper understanding of the text content, we present "DoRe", a French and dialectal French corpus for NLP analytics in Finance, Regulation and Investment. The corpus is composed of: (a) 2350 annual reports from 336 companies among the most capitalized companies in France (Euronext Paris) and Belgium (Euronext Brussels), covering 2009 to 2019, and (b) related metadata giving each company's ISIN code, capitalization and sector. The corpus is designed to be as modular as possible, to allow maximum reuse across different tasks pertaining to Economics, Finance and Regulation. After presenting existing resources, we describe the construction of the DoRe corpus and the rationale behind our choices, and conclude with the spectrum of possible uses of this new resource for NLP applications.
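
To give a feel for the scale described in the abstract, the following is a minimal sketch that derives a few averages from the quoted figures and outlines what a per-company metadata record might look like. The class name `CompanyMetadata`, its field names, and the placeholder values are assumptions made for illustration; the abstract only states that the metadata covers the company identifier code, capitalization and sector, and does not describe the actual file layout.

```python
from dataclasses import dataclass

# Figures quoted in the title and abstract (the token count is rounded there).
TOTAL_TOKENS = 257_000_000
NUM_REPORTS = 2350
NUM_COMPANIES = 336
FIRST_YEAR, LAST_YEAR = 2009, 2019


@dataclass(frozen=True)
class CompanyMetadata:
    """Hypothetical record layout; not the corpus's documented schema."""
    isin: str              # securities identifier code
    capitalization: float  # unit is not specified in the abstract
    sector: str


# Derived averages: simple arithmetic on the quoted totals.
tokens_per_report = TOTAL_TOKENS / NUM_REPORTS
reports_per_company = NUM_REPORTS / NUM_COMPANIES
years_covered = LAST_YEAR - FIRST_YEAR + 1

# Dummy record with placeholder values only (not real corpus data).
example = CompanyMetadata(isin="XX0000000000", capitalization=0.0, sector="placeholder")

print(f"~{tokens_per_report:,.0f} tokens per report")
print(f"~{reports_per_company:.2f} reports per company over {years_covered} years")
```

Under these figures a report averages roughly 109 thousand tokens, and each company contributes about 7 reports across the 11 calendar years, which suggests that not every company is covered for every year.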

Bibliographic details

Source
International Conference on Language Resources and Evaluation | 2020 | pp. 2261-2267 | 7 pages
Authors
Corentin Masson; Patrick Paroubek
Original format: PDF
Keywords
Corpus; French; Finance; Annual Reports
