2012-11-02

PLY (Python-based Lex-Yacc) and pycparser

8.20: web.adda/dev.py/ply
11.2: PLY compared with ANTLR:
There are several reasons to use ANTLR
over one of the Python parsers
like PLY and PyParsing.
The GUI is very nice,
it has a sophisticated understanding of
how to define a grammar,
its top-down approach means the
generated code is easier to understand,
tree grammars simplify some otherwise
tedious parts of writing parsers,
and it supports multiple output languages:
it builds parsers best in C, C# and Java,
but also has some support for generating Python.
ANTLR is written in Java.
Unlike the standard yacc/lex combination,
it combines both lexing and parsing rules
into the same specification document,
and adds tree transformation rules
for manipulating the AST.
. so, if you prefer Python tools over Java, PLY ahead!

PLY (Python-based Lex-Yacc)
-- a dev.py parser --
Python Projects Using PLY:
    pycparser. A C parser implemented in Python.
non-python projects using PLY:
    cppheaderparser. A C++ header parser.
    ZXBasic. A cross compiler
that translates BASIC into Z80 assembler.
    CGCC. A C GCode compiler
for industrial CNC machines.
PLY's readme:
PLY requires the use of Python 2.2 or greater.
However, use the latest Python release if possible.
PLY-3.0 is the first PLY release to support Python 3.
However, backwards compatibility with Python 2.2 is still preserved.
PLY provides dual Python 2/3 compatibility by
restricting its implementation to a common subset.
You should not convert PLY using 2to3
--it is not necessary and may in fact break the implementation.
    It uses LR-parsing which is reasonably efficient
and well suited for larger grammars.
PLY provides most of the standard lex/yacc features
including support for empty productions,
precedence rules, error recovery,
and support for ambiguous grammars.
 -  PLY provides *very* extensive error reporting.
 The original was developed for instructional purposes.
 As a result, the system tries to identify
 the most common types of errors made by novice users.
 -  PLY uses Python introspection features
 to build lexers and parsers.
 This greatly simplifies the task of parser construction
 since it reduces the number of files
 and eliminates the need to run a separate lex/yacc
tool before running your program.
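To make the introspection idea concrete, here is a minimal stdlib-only sketch of how a lexer could be assembled from names found by reflection. It is not PLY's implementation: build_lexer and CalcRules are hypothetical names made up for this note, and only the t_ prefix for token rules is borrowed from PLY's convention.

```python
import re

def build_lexer(namespace, skip=("WS",)):
    """Collect every 't_NAME' string in `namespace` and build a tokenizer."""
    # Reflection step: the rule set is discovered by inspecting names,
    # so there is no separate specification file to compile.
    rules = [(name[2:], pattern)
             for name, pattern in namespace.items()
             if name.startswith("t_") and isinstance(pattern, str)]
    # One master regex with a named group per token type.
    master = re.compile("|".join("(?P<%s>%s)" % rule for rule in rules))

    def tokenize(text):
        pos = 0
        tokens = []
        while pos < len(text):
            match = master.match(text, pos)
            if match is None:
                raise SyntaxError("illegal character %r at position %d"
                                  % (text[pos], pos))
            if match.lastgroup not in skip:
                tokens.append((match.lastgroup, match.group()))
            pos = match.end()
        return tokens

    return tokenize

class CalcRules:
    # Token rules are plain attributes; their names carry the meaning.
    t_NUMBER = r"\d+"
    t_PLUS = r"\+"
    t_TIMES = r"\*"
    t_WS = r"[ \t]+"

tokenize = build_lexer(vars(CalcRules))
print(tokenize("2 + 3 * 4"))
# [('NUMBER', '2'), ('PLUS', '+'), ('NUMBER', '3'), ('TIMES', '*'), ('NUMBER', '4')]
```

The rules live in ordinary Python code, so the lexer is built when the program starts and no generated file has to be kept in sync with a grammar file.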
downloads
   v3.4 source[2012.8]:
The example directory includes a PLY specification
for ANSI C as given in K&R 2nd Ed. 
    current documentation.
    simple example.
    Slides from Dave's PyCon'07 Talk about PLY.
Support
    issues page(report bugs here)
    ply-hack group 
(devoted to PLY use and development)
   Development support group
(issue tracking, repository access, etc.)

11.2: summarizing this page's links:
Andrew Dalke`How to construct a parser using PLY
. general-purpose tools help build parsers
with machine understandable grammar;
eg, BNF (Backus-Naur Form).
. lex and yacc were introduced as part of Unix.
Lex is a lexer [breaks the text into lexical elements]
and yacc ("yet another compiler compiler") is a parser generator.
There are many parser tools for Python.
PLY, like SPARK, is one of the more modern ones.
As with the final code in the "parsing by hand" talk,
the lexer assumes all the tokens
are described by regular expressions
and [that their meaning
is not dependent on context.]
. techniques to resolve ambiguity include
lookahead (see what the next tokens are)
and precedence rules.
The limited parsing algorithms go by names like
LALR(1), SLR, LL and LR.
The most general-purpose one is the
Earley algorithm (which SPARK used).
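A small sketch of how one token of lookahead plus a precedence table can settle the ambiguity in an expression like 2 + 3 * 4. This is hand-written precedence climbing, a top-down technique, and not the LR table construction that yacc-style tools use; the PRECEDENCE table and all function names are assumptions made for illustration.

```python
import re

# Higher number binds tighter; all four operators are left-associative.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}

def tokenize(text):
    return re.findall(r"\d+|[-+*/()]", text)

def parse_atom(tokens, pos):
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[pos]
    if tok == "(":
        tree, pos = parse_expression(tokens, pos + 1, 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("expected ')'")
        return tree, pos + 1
    if tok.isdigit():
        return int(tok), pos + 1
    raise SyntaxError("unexpected token %r" % tok)

def parse_expression(tokens, pos, min_prec):
    """Return (tree, next_pos); trees are nested (op, left, right) tuples."""
    left, pos = parse_atom(tokens, pos)
    # Lookahead: peek at the next token without consuming it.
    while pos < len(tokens) and PRECEDENCE.get(tokens[pos], 0) >= min_prec:
        op = tokens[pos]
        # The right operand may only absorb operators that bind
        # strictly tighter than `op`, which gives left associativity.
        right, pos = parse_expression(tokens, pos + 1, PRECEDENCE[op] + 1)
        left = (op, left, right)
    return left, pos

def parse(text):
    tokens = tokenize(text)
    tree, pos = parse_expression(tokens, 0, 1)
    if pos != len(tokens):
        raise SyntaxError("unexpected token %r" % tokens[pos])
    return tree

print(parse("2 + 3 * 4"))   # ('+', 2, ('*', 3, 4))
print(parse("8 - 3 - 2"))   # ('-', ('-', 8, 3), 2)
```

Without the table, both (2 + 3) * 4 and 2 + (3 * 4) would be valid readings of the first input; the precedence numbers pick one.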
Dr. Dobb's Journal`Shannon Behrens`PLY tutorial:
. the Python Lex-Yacc (PLY) toolset is for
prototyping interpreter front-ends in Python:
it creates fully working interpreters
in surprisingly few lines of code.
. unlike unix-C's Lex and Yacc,
PLY relies on Python reflection,
so there are no separate Lex and Yacc files
—everything is in Python.

. the example language is lazily evaluated
and supports functional programming.
Python supports runtime type detection.
Using Python's type system
means not inventing your own.
. Python list operators make it easy to
work with the parse trees [at the back end].
. targeting Python means having
access to the Python libraries.

Lex: characters -> tokens
(ints, strings, keywords, and symbols);
Yacc: tokens -> language constructs [productions]
(eg, variable declarations, function definitions, and if statements).
Like Yacc in C, PLY uses LR-parsing
(LALR(1) tables by default, with SLR as an option),
which is reasonably efficient and well suited to larger grammars,
but slightly restricts the types of grammars
that can be successfully parsed.

The front end is implemented with PLY:
1: write the tokenizer using Python Lex;
Unlike Lex in C, which uses unions,
each returned token has an attribute named "value".
. due to Python's [2004] regular expression library,
the lexer can only read input from a string;
2: lex.lex() is used to build the tokenizer.
. support for being a main program is included
(if __name__ == '__main__':)
that tests the functionality of the tokenizer
using regression tests.
This is almost essential when writing a language.
3: write the parser using Python Yacc;
after importing the token list from the tokenizer,
a function is defined for each [production].
4: yacc.yacc() is called
to construct the LR parsing tables
for the grammar in the given parser rules.
5: implement the back end.
[your front end built the parse trees;
now you need to translate these
into your target language (eg, x86 ASM).]
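Step 5 can be sketched without any parser library. The example below assumes the front end hands over parse trees as nested (op, left, right) tuples with integer leaves, and it targets a made-up stack machine rather than x86; the instruction names, compile_tree and run are all hypothetical.

```python
import operator

# Hypothetical target: a stack machine with PUSH plus four arithmetic ops.
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}
OPERATIONS = {
    "ADD": operator.add,
    "SUB": operator.sub,
    "MUL": operator.mul,
    "DIV": operator.floordiv,  # integer division, to keep values as ints
}

def compile_tree(tree):
    """Post-order walk: emit code for both operands, then the operator."""
    if isinstance(tree, int):
        return [("PUSH", tree)]
    op, left, right = tree
    return compile_tree(left) + compile_tree(right) + [(OPCODES[op],)]

def run(program):
    """Tiny interpreter for the target code, used to check the translation."""
    stack = []
    for opcode, *args in program:
        if opcode == "PUSH":
            stack.append(args[0])
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(OPERATIONS[opcode](left, right))
    return stack.pop()

# Assumed shape of the tree a front end would build for "2 + 3 * 4".
tree = ("+", 2, ("*", 3, 4))
program = compile_tree(tree)
print(program)
# [('PUSH', 2), ('PUSH', 3), ('PUSH', 4), ('MUL',), ('ADD',)]
print(run(program))   # 14
```

Because the trees are ordinary tuples, the back end is just a recursive function over Python data, which is the point made above about Python's list operators making the trees easy to work with.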

Python Magazine`Pablo Troncoso`
"A Grammar-Based Approach for Decoding Binary Streams",
[11.2: this site was down and its archive was too!]

the license:
Copyright (C) 2001-2011, David M. Beazley (Dabeaz LLC)
* Redistributions of source code
must retain the above copyright notice,
  this list of conditions and the following disclaimer.
* Neither the name of the David Beazley or Dabeaz LLC
may be used to endorse or promote
products derived from this software
without specific prior written permission.
* Redistributions in binary form must do the same
  in the documentation and/or other materials.