International Conference on Language Resources and Evaluation (LREC)

NLP Analytics in Finance with DoRe: a French 257M Tokens Corpus of Corporate Annual Reports

Abstract

Recent advances in neural computing and word embeddings for semantic processing open up many new application areas that had so far been left unaddressed because of inadequate language understanding capacity. However, these new approaches rely even more heavily on training data to be operational. Corpora for financial applications exist, but most of them concern stock market prediction and are in English. To address this need for the French language and for regulation-oriented applications, which require a deeper understanding of the text content, we hereby present "DoRe", a French and dialectal French Corpus for NLP analytics in Finance, Regulation and Investment. This corpus is composed of: (a) 2350 Annual Reports from 336 companies among the most capitalized companies in France (Euronext Paris) and Belgium (Euronext Brussels), covering a time frame from 2009 to 2019, and (b) related metadata containing, for each company, its ISIN code, capitalization and sector. The corpus is designed to be as modular as possible in order to allow for maximum reuse in different tasks pertaining to Economics, Finance and Regulation. After presenting existing resources, we describe the construction of the DoRe corpus and the rationale behind our choices, and conclude with the range of possible uses of this new resource for NLP applications.
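The abstract describes the corpus as two parts: the annual reports themselves and per-company metadata (ISIN code, capitalization, sector). The sketch below shows one way such a modular layout might be joined and filtered, e.g. to select reports by exchange, sector, or year. It is a minimal illustration under assumptions: the class names, field names, and the `select_reports` helper are hypothetical and do not describe DoRe's actual file format or distribution schema.

```python
from collections import Counter
from dataclasses import dataclass


# Hypothetical schema: field names are illustrative, not DoRe's actual layout.
@dataclass(frozen=True)
class CompanyMeta:
    isin: str              # ISIN code identifying the company's security
    name: str
    sector: str
    capitalization: float  # unit is an assumption; the abstract does not state it


@dataclass(frozen=True)
class AnnualReport:
    isin: str      # join key to CompanyMeta
    year: int      # the corpus covers 2009 to 2019
    exchange: str  # "Euronext Paris" or "Euronext Brussels"
    text: str


def select_reports(reports, metadata, sector=None, exchange=None,
                   years=range(2009, 2020)):
    """Join reports with company metadata on ISIN and apply optional filters."""
    by_isin = {m.isin: m for m in metadata}
    selected = []
    for report in reports:
        meta = by_isin.get(report.isin)
        if meta is None:
            continue  # skip reports whose company has no metadata entry
        if report.year not in years:
            continue
        if exchange is not None and report.exchange != exchange:
            continue
        if sector is not None and meta.sector != sector:
            continue
        selected.append((report, meta))
    return selected


def reports_per_sector(pairs):
    """Count selected reports per sector (e.g. to check corpus balance)."""
    return Counter(meta.sector for _, meta in pairs)
```

For example, with placeholder entries (the ISINs and company names below are made up, not taken from the corpus):

```python
meta = [CompanyMeta("FR0000000001", "Company A", "Banks", 1.0e9),
        CompanyMeta("BE0000000002", "Company B", "Utilities", 2.0e9)]
reports = [AnnualReport("FR0000000001", 2015, "Euronext Paris", "..."),
           AnnualReport("BE0000000002", 2019, "Euronext Brussels", "...")]
paris_only = select_reports(reports, meta, exchange="Euronext Paris")
```

Keeping reports and metadata separate and joining on the ISIN mirrors the modularity the abstract emphasizes: each part can be reused or replaced without touching the other.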
