Wednesday, July 13, 2011

Generating syntax diagrams from Parcon parsers (and from regular expressions)

Parcon 0.1.23 is out, and it has what I think is the coolest feature of Parcon yet: syntax diagram generation. Parcon can now automatically generate syntax diagrams for Parcon grammars, and for regular expressions as well.

Let's take a look at the expression evaluator example provided on Parcon's PyPI page:
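from parcon import number, Forward, InfixExpr
import operator
# Forward lets expr be referenced before it is fully defined
expr = Forward()
term = number[float] | "(" + expr + ")"
# "*" and "/" are wrapped first, so they bind tighter than "+" and "-"
term = InfixExpr(term, [("*", operator.mul), ("/", operator.truediv)])
term = InfixExpr(term, [("+", operator.add), ("-", operator.sub)])
# Give the finished grammar a name so it shows up as a production
expr << term(name="expr")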

To generate a syntax diagram for this expression evaluator, all we have to do is this:
expr.draw_productions_to_png({}, "syntax-expr.png")

This produces the following image:



At this point, you might be wondering how the syntax diagram generator is able to work out what names to give all of the productions in the resulting syntax diagram. It can't tell what Python variables hold each parser; Python doesn't provide any means for figuring that out. So how does it do it?

The answer is that little bit at the end of the last line of the expression evaluator: (name="expr"). parser(name="example") is short for Name("example", parser). Name is a parser that behaves exactly like the parser it wraps, but it tags that parser with a name that is used when generating syntax diagrams. The number parser (which we imported from Parcon) is also named by Parcon itself, which is why number and digit show up as productions.
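To make the shortcut a bit more concrete, here's a rough toy model of how calling a parser with name=... could wrap it in a naming parser. ToyParser, ToyLiteral and ToyName are made-up stand-ins for illustration only; they are not Parcon's classes, and Parcon's real implementation may look quite different.

```python
class ToyParser(object):
    """Hypothetical base class; not part of Parcon."""

    def __call__(self, name=None):
        # Calling a parser with name=... wraps it instead of parsing anything.
        if name is None:
            raise TypeError("expected a name keyword argument")
        return ToyName(name, self)


class ToyLiteral(ToyParser):
    """Matches a fixed piece of text at the start of the input."""

    def __init__(self, text):
        self.text = text

    def parse(self, s):
        return self.text if s.startswith(self.text) else None


class ToyName(ToyParser):
    """Behaves like the wrapped parser, but carries a name as metadata."""

    def __init__(self, name, parser):
        self.name = name
        self.parser = parser

    def parse(self, s):
        # Parsing is delegated unchanged; only the name is added.
        return self.parser.parse(s)


# The call-style shortcut and the explicit wrapper build the same thing.
shortcut = ToyLiteral("hello")(name="greeting")
explicit = ToyName("greeting", ToyLiteral("hello"))
```

In this sketch the name never affects parsing; it is only there for something like a diagram generator to read later.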

So far, so good. But what if we wanted part of our grammar to be written as a regular expression? Regular expressions do have their advantages. Parcon supports this via the Regex parser, but you might think that the resulting syntax diagram would simply include a box containing the regular expression itself.

This, however, is not what happens. Parcon is smart enough that, for most regular expressions, it can actually parse through the regular expression and create a syntax diagram for it. For example:
from parcon import Regex
r = Regex("hello (world|james|alex)\\.( How are you\\?)?")(name="example regex")
r.draw_productions_to_png({}, "test8.png")

The resulting image, believe it or not, is this:



Parcon can decompose most regular expressions in this way. There are a few that it can't; regular expressions that include lookahead are one example. Such a regular expression is simply drawn as a single box containing the entire regex.
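To get a feel for what "parsing through" a regular expression involves, here's a small sketch that walks the parse tree produced by the regex parser bundled with CPython and checks for lookaround assertions. This is not Parcon's code, and I'm not claiming Parcon does it this way. The parser module used here (re._parser on newer Pythons, sre_parse on older ones) is an undocumented internal, so treat its use as an assumption, and contains_lookaround is a hypothetical helper name.

```python
try:
    from re import _parser as sre_parser  # Python 3.11 and later
except ImportError:
    import sre_parse as sre_parser  # older Pythons


def contains_lookaround(pattern):
    """Return True if the regex uses a lookahead or lookbehind assertion."""

    def walk(node):
        if isinstance(node, sre_parser.SubPattern):
            # A SubPattern is a sequence of (opcode, argument) pairs.
            for op, arg in node:
                if op in (sre_parser.ASSERT, sre_parser.ASSERT_NOT):
                    return True
                if walk(arg):
                    return True
            return False
        if isinstance(node, (tuple, list)):
            # Arguments nest further SubPatterns inside tuples and lists.
            return any(walk(child) for child in node)
        return False

    return walk(sre_parser.parse(pattern))


simple = contains_lookaround("hello (world|james|alex)\\.( How are you\\?)?")
tricky = contains_lookaround("foo(?=bar)")
```

A diagram generator could use a check like this to decide between drawing the decomposed regex and falling back to a single box.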

Now, of course, comes the "gotcha" in syntax diagram generation: not all Parcon parsers can be converted to syntax diagrams. Some, like Except, don't really have any sensible way in which they could be drawn as a syntax diagram. Some, like Bind, don't have enough information to actually generate a syntax diagram. Only parsers that subclass parcon.railroad.Railroadable (or one of its subclasses, such as parcon._GRParser) can be converted to syntax diagrams.

In fact, if you try to create a syntax diagram from a Parcon grammar that uses any of these parsers, you'll get an exception. But don't panic and start thinking that you'll have to avoid these parsers altogether; this is where Description comes in.

Description is similar to Name, both in use and in functionality; parser(description="example") and parser(desc="example") are both shortcuts for Description("example", parser). However, there's one major difference between Name and Description: the contents of Name instances are themselves converted to syntax diagrams and included in the resulting .png file, whereas the contents of Description instances are not. Thus:
from parcon import Literal
person = Literal("world")(name="person")
greeting = (Literal("hello") + person)(name="greeting")
greeting.draw_productions_to_png({}, "greeting-name.png")

results in:



but:
person = Literal("world")(description="person")
greeting = (Literal("hello") + person)(name="greeting")
greeting.draw_productions_to_png({}, "greeting-description.png")

results in:



This can be used to allow parsers that can't be converted to syntax diagrams to take part in your grammar: as long as they're wrapped in a Description, you can use them just fine, since Parcon won't ever try to convert them into a syntax diagram.
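Here's a toy model of that difference. It is only a sketch under my own assumptions: ToyNode, ToyName, ToyDescription, ToyUndrawable and collect_productions are invented for illustration and are not Parcon's API. The idea is that a traversal descends into named nodes, stops at described ones, and fails on anything it can't draw.

```python
class ToyNode(object):
    """Hypothetical grammar node with nested children; not Parcon's class."""
    drawable = True

    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)


class ToyName(ToyNode):
    """Named node: becomes its own production, contents are visited."""


class ToyDescription(ToyNode):
    """Described node: shown as a labelled box, contents are never visited."""


class ToyUndrawable(ToyNode):
    """Stands in for a parser that cannot be turned into a diagram."""
    drawable = False


def collect_productions(node, found=None):
    """Return the names of all productions reachable from node."""
    if found is None:
        found = []
    if isinstance(node, ToyDescription):
        return found  # stop here; whatever is inside stays hidden
    if not node.drawable:
        raise ValueError("cannot draw %r" % node.label)
    if isinstance(node, ToyName):
        found.append(node.label)
    for child in node.children:
        collect_productions(child, found)
    return found


named = ToyName("greeting", [ToyNode("hello"),
                             ToyName("person", [ToyNode("world")])])
described = ToyName("greeting", [ToyNode("hello"),
                                 ToyDescription("person", [ToyUndrawable("except")])])

named_result = collect_productions(named)
described_result = collect_productions(described)
```

In this model, the undrawable node inside the description never causes an error, because the traversal never reaches it.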

That's it for today, folks. The syntax diagram generator can actually be used by itself, independently of Parcon, to generate arbitrary syntax diagrams, but I'll save that for another blog post. I'll also save the explanation of that empty dict I keep passing to draw_productions_to_png for another post.

1 comment:

  1. Those diagrams are beautiful! They look like they come from a well-designed 1970s programming book — yet are produced from a program? Wow, very nice; I will have to remember the name of this tool!
