首页> 外文学位 >Context-aware scanning and determinism-preserving grammar composition, in theory and practice.
【24h】

Context-aware scanning and determinism-preserving grammar composition, in theory and practice.

机译:在理论和实践中都可以使用上下文感知扫描和确定性语法结构。

获取原文
获取原文并翻译 | 示例

摘要

This thesis documents several new developments in the theory of parsing, and also the practical value of their implementation in the Copper parser generator.;The most widely-used apparatus for syntactic analysis of programming languages consists of a scanner based on deterministic finite automata, built from a set of regular expressions called the lexical syntax, and a separate parser, operating on the output of this scanner, built from a context-free grammar and utilizing the LALR(1) algorithm.;While the LALR(1) algorithm has the advantage of guaranteeing a non-ambiguous parse, and the approach of keeping the scanner and parser separate make the compilation process more clean and modular, it is also a brittle approach. The class of grammars that can be parsed with an LALR(1) parser is not closed under grammar composition, and small changes to an LALR(1) grammar can remove the grammar from the LALR(1) class. Also, the separation of scanner and parser prevent the use, in any organized way, of parser context to resolve ambiguities in the lexical syntax.;One area in which both of these drawbacks pose a particular problem is that of parsing embedded and extensible languages. In particular, it forms one of the last major impediments to the development of an extensible compiler in which language extensions are imported and composed by the end user (programmer) in an analogous manner to the way libraries are presently imported. This is due not only to the problem of the LALR(1) grammar class not being closed under composition, but to the very real possibility that the lexical syntax of two different extensions will clash, making it impossible to construct a scanner without extensive manual resolution of ambiguities, if at all.;This thesis presents three innovations that are a large step towards eliminating parsing as an Achilles' heel in this application. Firstly, it describes a new algorithm of scanning called context-aware scanning, in which the scanner at each scan is made aware of what sorts of tokens are recognized as valid by the parser at that point. By allowing the use of parser context in the scanner to disambiguate, context-aware scanning makes the specification of embedded languages much simpler---instead of specifying a scanner that must explicitly switch "modes" to scan on the different embedded languages, one simply compiles a context-aware scanner from a single lexical specification, which has implicit "modes" to scan properly on each embedded language. Similarly, in an extensible language, this puts a degree of separation between the lexical syntax of different language extensions, so that any clashes of this sort will not be an issue.;Secondly, the thesis describes an analysis that operates on grammar extensions of a certain form, and can recognize their membership in a class of extensions that can be composed with each other and still produce a deterministic parser---enabling end-users to compose extensions written by different language developers with this guarantee of determinism. The analysis is made practical by context-aware scanning, which ensures a lack of lexical issues to go along with the lack of syntactic nondeterminism. It is this analysis---the first of its kind---that is the largest step toward realizing the sort of extensible compiler described above, as the writer of each extension can test it independently using the analysis and thus resolve any lexical or syntactic issues with the extension before the end user ever sees it.;Thirdly, the thesis describes a corollary of this analysis, which allows extensions that have passed the analysis to be distributed in parse table form and then composed on-the-fly by the end users, with the same guarantee of determinism. Besides expediting the operation of composition, this also enables the use of the analysis in situations where the writer of a language or language extension does not want to release its grammar to the public.;Finally, the thesis discusses how these approaches have been implemented and made practical in Copper, including runtime tests, implementations and analyses of real-world grammars and extensions.
机译:这篇论文记录了解析理论的一些新发展,以及它们在Copper解析器生成器中实现的实用价值。;用于编程语言语法分析的最广泛使用的设备包括一个基于确定性有限自动机的扫描仪,来自一组称为词法语法的正则表达式,以及一个单独的解析器,对该解析器的输出进行操作,该解析器是根据上下文无关的语法并使用LALR(1)算法构建的;而LALR(1)算法具有保证无歧义解析的优势,以及使扫描器和解析器保持分离的方法使编译过程更加简洁和模块化,这也是一种脆弱的方法。可以使用LALR(1)解析器解析的语法类别不会在语法组成下关闭,对LALR(1)语法进行的细微更改都可以将语法从LALR(1)类中删除。同样,扫描程序和解析器的分离会阻止以任何有组织的方式使用解析器上下文来解决词汇语法中的歧义。这两个缺点都构成一个特殊问题的一个领域是解析嵌入式语言和可扩展语言。特别是,它构成了可扩展编译器开发的最后主要障碍之一,在该可扩展编译器中,语言扩展是由最终用户(程序员)导入并由终端用户(程序员)以与当前导入库的方式类似的方式组成的。这不仅是由于LALR(1)语法类没有在合成下关闭的问题,而且还因为两个不同扩展名的词法语法发生冲突的可能性极高,这使得没有广泛的手动解决方案就无法构建扫描仪本文提出了三项创新,这是朝着消除此应用程序中的致命弱点迈出的一大步。首先,它描述了一种称为上下文感知扫描的新扫描算法,该扫描算法使每次扫描的扫描器都知道解析器在此时识别出哪种令牌是有效的。通过允许在扫描器中使用解析器上下文来消除歧义,上下文感知扫描使嵌入式语言的规范变得更加简单-而不是指定必须显式切换“模式”以扫描不同嵌入式语言的扫描器,一个简单从单个词汇规范编译一个上下文感知的扫描器,该规范具有隐式“模式”,可以在每种嵌入式语言上进行正确扫描。同样,在可扩展语言中,这将不同语言扩展的词法语法分隔开一定程度,因此,此类冲突不会成为问题。其次,本文描述了一种对a的语法扩展进行分析的方法。并以某种形式识别它们的成员身份,这些成员可以相互组成并且仍然可以产生确定性的解析器-使得最终用户可以在保证确定性的前提下编写由不同语言开发人员编写的扩展。通过上下文感知扫描使分析变得切实可行,这确保了缺乏词汇问题以及缺乏语法不确定性。这是分析(这是第一个此类分析),是实现上述可扩展编译器的最大步骤,因为每个扩展的作者都可以使用分析独立地对其进行测试,从而解析出任何词法或句法第三,论文描述了这种分析的推论,它允许通过分析的扩展以解析表的形式分发,然后在最后即时组成用户,并具有确定性的保证。除了加快作文的操作速度外,这还使得在语言或语言扩展的作者不想向公众发布其语法的情况下也可以使用分析。最后,本文讨论了如何实现这些方法以及在Copper中变得实用,包括运行时测试,现实语法和扩展的实现和分析。

著录项

  • 作者

    Schwerdfeger, August.;

  • 作者单位

    University of Minnesota.;

  • 授予单位 University of Minnesota.;
  • 学科 Computer Science.
  • 学位 Ph.D.
  • 年度 2010
  • 页码 318 p.
  • 总页数 318
  • 原文格式 PDF
  • 正文语种 eng
  • 中图分类
  • 关键词

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号