Compilers - Am I writing one?
Hans Fugal
hans at fugal.net
Mon Jun 30 09:49:32 MDT 2008
Michael Torrie wrote:
> Levi Pearson wrote:
>> To clarify Jacob's comment, you've got to eliminate left-recursion in
>> the grammar before recursive descent parsing will work *at all*.
>> Consider the following grammar rule:
>>
>> <foo>: <foo> + <bar>
>>
>> In a recursive descent parser, you move through the productions in a
>> rule from left to right and call the parsers for those productions.
>> So, when you hit a 'foo', you call the 'foo' parser. The first
>> production is 'foo', so you call the 'foo' parser, et voila, infinite
>> recursion.
>
> The problem with EBNF is that it doesn't easily allow the representation
> of iteration without resorting to recursion. For example, consider the
> parsing of a mathematical expression, dealing with adding and
> subtracting. As you allude to, it looks something like this:
>
> factor: factor + term |
> factor - term |
> term
>
> We could put this as right-recursive, which is very easy to parse. But
> sometimes this would not be correct (it would be here, however). What
> we'd really like to say is that:
>
> factor: term followed by 0 or more "+ term" or 0 or more "- term"
>
> This is why I find it so much easier to use syntax diagrams rather than
> EBNF both to show the formal grammar and to construct the parser, since
> much of what you want to express is iterative.
>
> So for the example above, a syntax diagram could take the form:
>
> factor ---> term ------------------------------->
> / \ /
> | |--> '+' ---> term --
> | | / \
> | -> '-' - |
> \ /
> -----------------------------
>
> This diagram makes it very easy, provided you have at least a few token
> lookup, to construct the recursive-descent parser.
>
> def factor():
> result = term()
>
> while peak_next_token in ['+','-']:
> op = next_token()
> switch op:
> case '+':
> result += term()
> case '-':
> result -= term()
> else:
> raise ParseError
> return result
>
> This is the way we did our compiler in CS 431 at BYU, and it works well
> enough. Certainly it's easier to create syntax diagrams than form EBNF
> statements, although they are certainly equivalent. This example does
> convert fairly easily to EBNF, but other examples are not so easy. For
> example the syntax diagram for an if/then/elif/else block. Using syntax
> diagrams you can represent the entire thing in one production, and quite
> easily build a recursive-descent parser for that. Converting it to
> EBNF, however, requires one to create quite a few productions. It's
> much more complicated.
>
> Many parser generators, like ANTLR, allow you to define grammars in this
> way and automatically generate the parser.
In which way? They take ascii art as input? I had no trouble with the
syntax diagrams in 431, but I found them hard to work with. I couldn't
send one by email easily. I had to have a PDF viewer and scan through
dozens of pages of diagrams (we know how much fun that can be) or lug
the large book (which I printed out at Kinkos) around. I decided to
write my own parser generator, and so I had to enter all those diagrams
into BNF format (which seemed like a reasonable machine-readable
choice), and while tedious I don't remember it being difficult. In fact,
I seem to remember very quickly finding a few patterns, and converting
to BNF was primarily a rote exercise.
--
Hans Fugal ; http://hans.fugal.net
There's nothing remarkable about it. All one has to do is hit the
right keys at the right time and the instrument plays itself.
-- Johann Sebastian Bach
More information about the PLUG
mailing list