Compilers - Am I writing one?

Hans Fugal hans at fugal.net
Mon Jun 30 09:49:32 MDT 2008


Michael Torrie wrote:
> Levi Pearson wrote:
>> To clarify Jacob's comment, you've got to eliminate left-recursion in
>> the grammar before recursive descent parsing will work *at all*.
>> Consider the following grammar rule:
>>
>>         <foo>: <foo> + <bar>
>>
>> In a recursive descent parser, you move through the productions in a
>> rule from left to right and call the parsers for those productions.
>> So, when you hit a 'foo', you call the 'foo' parser.  The first
>> production is 'foo', so you call the 'foo' parser, et voila, infinite
>> recursion.
> 
> The problem with EBNF is that it doesn't easily allow the representation
> of iteration without resorting to recursion.  For example, consider the
> parsing of a mathematical expression, dealing with adding and
> subtracting.  As you allude to, it looks something like this:
> 
> factor: factor + term |
>         factor - term |
>         term
> 
> We could put this as right-recursive, which is very easy to parse.  But
> sometimes this would not be correct (it would be here, however).  What
> we'd really like to say is that:
> 
> factor: term followed by 0 or more "+ term" or 0 or more "- term"
> 
> This is why I find it so much easier to use syntax diagrams rather than
> EBNF both to show the formal grammar and to construct the parser, since
> much of what you want to express is iterative.
> 
> So for the example above, a syntax diagram could take the form:
> 
> factor ---> term ------------------------------->
>         /        \                      /
>        |          |--> '+' ---> term --
>        |          |          /          \
>        |            -> '-' -             |
>         \                               /
>           -----------------------------
> 
> This diagram makes it very easy, provided you have at least a few token
> lookup, to construct the recursive-descent parser.
> 
> def factor():
>     result = term()
> 
>     while peak_next_token in ['+','-']:
>         op = next_token()
>         switch op:
>             case '+':
>                 result += term()
>             case '-':
>                 result -= term()
>             else:
>                 raise ParseError
>     return result
> 
> This is the way we did our compiler in CS 431 at BYU, and it works well
> enough.  Certainly it's easier to create syntax diagrams than form EBNF
> statements, although they are certainly equivalent.  This example does
> convert fairly easily to EBNF, but other examples are not so easy.  For
> example the syntax diagram for an if/then/elif/else block.  Using syntax
> diagrams you can represent the entire thing in one production, and quite
> easily build a recursive-descent parser for that.  Converting it to
> EBNF, however, requires one to create quite a few productions.  It's
> much more complicated.
> 
> Many parser generators, like ANTLR, allow you to define grammars in this
> way and automatically generate the parser.

In which way? They take ascii art as input? I had no trouble with the 
syntax diagrams in 431, but I found them hard to work with. I couldn't 
send one by email easily. I had to have a PDF viewer and scan through 
dozens of pages of diagrams (we know how much fun that can be) or lug 
the large book (which I printed out at Kinkos) around. I decided to 
write my own parser generator, and so I had to enter all those diagrams 
into BNF format (which seemed like a reasonable machine-readable 
choice), and while tedious I don't remember it being difficult. In fact, 
I seem to remember very quickly finding a few patterns, and converting 
to BNF was primarily a rote exercise.

-- 
Hans Fugal ; http://hans.fugal.net

There's nothing remarkable about it. All one has to do is hit the
right keys at the right time and the instrument plays itself.
     -- Johann Sebastian Bach



More information about the PLUG mailing list