Compilers - Am I writing one?

Michael Torrie torriem at gmail.com
Mon Jun 30 09:44:03 MDT 2008


Levi Pearson wrote:
> To clarify Jacob's comment, you've got to eliminate left-recursion in
> the grammar before recursive descent parsing will work *at all*.
> Consider the following grammar rule:
> 
>         <foo>: <foo> + <bar>
> 
> In a recursive descent parser, you move through the productions in a
> rule from left to right and call the parsers for those productions.
> So, when you hit a 'foo', you call the 'foo' parser.  The first
> production is 'foo', so you call the 'foo' parser, et voila, infinite
> recursion.

The problem with EBNF is that it doesn't easily allow the representation
of iteration without resorting to recursion.  For example, consider the
parsing of a mathematical expression, dealing with adding and
subtracting.  As you allude to, it looks something like this:

factor: factor + term |
        factor - term |
        term

We could put this as right-recursive, which is very easy to parse.  But
sometimes this would not be correct (it would be here, however).  What
we'd really like to say is that:

factor: term followed by 0 or more "+ term" or 0 or more "- term"

This is why I find it so much easier to use syntax diagrams rather than
EBNF both to show the formal grammar and to construct the parser, since
much of what you want to express is iterative.

So for the example above, a syntax diagram could take the form:

factor ---> term ------------------------------->
        /        \                      /
       |          |--> '+' ---> term --
       |          |          /          \
       |            -> '-' -             |
        \                               /
          -----------------------------

This diagram makes it very easy, provided you have at least a few token
lookup, to construct the recursive-descent parser.

def factor():
    result = term()

    while peak_next_token in ['+','-']:
        op = next_token()
        switch op:
            case '+':
                result += term()
            case '-':
                result -= term()
            else:
                raise ParseError
    return result

This is the way we did our compiler in CS 431 at BYU, and it works well
enough.  Certainly it's easier to create syntax diagrams than form EBNF
statements, although they are certainly equivalent.  This example does
convert fairly easily to EBNF, but other examples are not so easy.  For
example the syntax diagram for an if/then/elif/else block.  Using syntax
diagrams you can represent the entire thing in one production, and quite
easily build a recursive-descent parser for that.  Converting it to
EBNF, however, requires one to create quite a few productions.  It's
much more complicated.

Many parser generators, like ANTLR, allow you to define grammars in this
way and automatically generate the parser.



More information about the PLUG mailing list