Chemical Science

Found in Translation: predicting outcomes of complex organic chemistry reactions using neural sequence-to-sequence models

Abstract

There is an intuitive analogy between an organic chemist's understanding of a compound and a language speaker's understanding of a word. Based on this analogy, it is possible to introduce the basic concepts of linguistic analysis to organic chemistry and to analyze their potential impact. In this work, we cast the reaction prediction task as a translation problem by introducing a template-free sequence-to-sequence model that is trained end-to-end and is fully data-driven. We propose a tokenization that is arbitrarily extensible with reaction information. Using an attention-based model borrowed from human language translation, we improve on state-of-the-art reaction prediction, achieving a top-1 accuracy of 80.3% without relying on auxiliary knowledge such as reaction templates or explicit atomic features. On a larger and noisier dataset, the model reaches a top-1 accuracy of 65.4%.
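To make the translation framing concrete, the sketch below shows one way a reaction written as a SMILES string could be split into tokens for a sequence-to-sequence model, and how top-1 accuracy could be scored. The abstract does not specify the tokenization rules, so the regular expression, the function names `tokenize` and `top1_accuracy`, and the example reaction are all assumptions made for illustration, not the authors' implementation.

```python
import re
from typing import List, Sequence

# Hypothetical token pattern (an assumption, not taken from the paper):
# bracket atoms, two-letter halogens, one-letter atoms, aromatic atoms,
# two-digit and one-digit ring closures, then bonds, branches, the
# molecule separator '.', and the reaction arrow '>'.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|[BCNOSPFI]|[bcnosp]|%\d{2}|\d|[()=#\-+\\/:~@.>*$])"
)


def tokenize(smiles: str) -> List[str]:
    """Split a SMILES string into tokens, rejecting unknown characters."""
    tokens = SMILES_TOKEN.findall(smiles)
    # If any character was skipped, the tokens no longer rebuild the input.
    if "".join(tokens) != smiles:
        raise ValueError(f"cannot tokenize SMILES: {smiles!r}")
    return tokens


def top1_accuracy(predictions: Sequence[str], targets: Sequence[str]) -> float:
    """Fraction of examples whose highest-ranked prediction equals the target."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets differ in length")
    if not targets:
        return 0.0
    hits = sum(p == t for p, t in zip(predictions, targets))
    return hits / len(targets)


if __name__ == "__main__":
    # Source "sentence": reactants before '>>'; target: the product after it.
    reaction = "CC(=O)Cl.OCC>>CC(=O)OCC"
    source, target = reaction.split(">>")
    print(" ".join(tokenize(source)))
    print(" ".join(tokenize(target)))
```

Exact string matching in `top1_accuracy` only makes sense if predictions and targets use the same canonical SMILES form; canonicalization would need a cheminformatics toolkit and is left out of this sketch.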