Sunday, June 26, 2011

Parsing into a dict

If you've ever used Pyparsing before, you probably know that you can "tag" certain parts of a Pyparsing grammar, and the results will provide a dictionary mapping those tags to the values that their corresponding grammar components parsed to:
>>> from pyparsing import *
>>> parser = Word(alphas).setResultsName("first") + Word(alphas).setResultsName("second")
>>> results = parser.parseString("one two")
>>> results["first"]
'one'
>>> results["second"]
'two'

So, how does one go about doing this in Parcon? The answer is not as obvious as it is with Pyparsing, since Parcon parsers result in a single value, not a composite of tokens and a dictionary in the form of Pyparsing's ParseResults. This doesn't mean that tagging's impossible, however, or even difficult. It's actually relatively simple:
>>> from parcon import *
>>> parser = (alpha_word["first"] + alpha_word["second"])[dict]
>>> results = parser.parse_string("one two")
>>> results["first"]
'one'
>>> results["second"]
'two'
>>> results
{'second': 'two', 'first': 'one'}
>>> type(results)
<type 'dict'>

What's actually going on here?

The first thing that you need to know is that parser["tag"] is short for Tag("tag", parser). This only works when "tag" is a string (unicode strings also work); Tag must be used explicitly for other types of values.

Tag wraps the result of whatever parser it's passed in a Pair, with the key set to "tag" (or whatever value was specified as the tag) and the value set to whatever the underlying parser resulted in.

Pair is a subclass of tuple (it's actually created by collections.namedtuple). Parcon's concatenation of tuples when using + does not apply to namedtuples or any value that simply subclasses tuple, so instances of Pair will be preserved even when using + to string things together.
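
To make that rule concrete, here's a rough sketch in plain Python. It is not Parcon's actual code: the Pair below is a hypothetical stand-in built with namedtuple (the field names key and value are my assumption), and concat is an invented helper that only imitates the "splice plain tuples, leave subclasses alone" behavior described above.

```python
from collections import namedtuple

# Hypothetical stand-in for Parcon's Pair; the field names are an assumption.
Pair = namedtuple("Pair", ["key", "value"])


def concat(left, right):
    """Join two results, splicing plain tuples but keeping everything else whole."""
    def as_items(value):
        # Only exact tuples are spliced; subclasses such as Pair count as one item.
        return value if type(value) is tuple else (value,)
    return as_items(left) + as_items(right)


first = Pair("first", "one")
second = Pair("second", "two")

# Both Pairs survive intact, giving a plain 2-tuple of Pairs.
print(concat(first, second))

# Plain tuples, by contrast, are merged into one flat tuple.
print(concat(("a", "b"), "c"))
```

The important bit is the exact type check: isinstance(value, tuple) would be true for a Pair as well, so a check like that would tear the Pair apart instead of preserving it.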

From this, we can tell that the parser:
alpha_word["first"] + alpha_word["second"]

will result in (Pair('first', 'one'), Pair('second', 'two')) when handed "one two" as input. So how do we get this into a dictionary?

Simple. You'll notice that in the example near the top of this post, that word-parsing parser snippet was wrapped in parentheses and then followed by [dict], which, as you probably know, is short for Translate. Here's the magical property: Pair, though it may be its own class, subclasses tuple, which means that dict will treat a tuple of Pairs as a sequence of key/value pairs and convert it into a dictionary. Bingo. We simply pass the result through a Translate with dict as the transformation function, and our result becomes:
{'second': 'two', 'first': 'one'}

which is exactly what we want.
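
You can check the dict half of this without Parcon installed at all. The sketch below again uses a hypothetical namedtuple stand-in for Pair (the field names are my assumption, not necessarily Parcon's) and hands a tuple of them straight to dict:

```python
from collections import namedtuple

# Hypothetical stand-in for Parcon's Pair; the field names are an assumption.
Pair = namedtuple("Pair", ["key", "value"])

# The shape the concatenated, tagged parser is described as producing.
parsed = (Pair("first", "one"), Pair("second", "two"))

# dict accepts any iterable of two-item sequences, and a Pair is one.
result = dict(parsed)

print(result["first"])
print(result["second"])
```

Nothing here is specific to Pair: any two-item tuple subclass would convert the same way.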

Of course, if you have a bunch of nested concatenations and list-producing parsers (such as ZeroOrMore), you'll probably want to change [dict] to [flatten][dict] to flatten them all out. flatten treats Pairs (and any other subclass of tuple) as individual objects and so does not flatten them out, so this works as expected.
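
Here's a small sketch of why flattening first helps. The flatten function below is my own illustration of the behavior described above, not Parcon's implementation (in particular, returning a list is an assumption), and Pair is once more a hypothetical namedtuple stand-in.

```python
from collections import namedtuple

# Hypothetical stand-in for Parcon's Pair; the field names are an assumption.
Pair = namedtuple("Pair", ["key", "value"])


def flatten(value):
    """Recursively flatten lists and plain tuples; leave tuple subclasses alone."""
    if type(value) in (list, tuple):
        flat = []
        for item in value:
            flat.extend(flatten(item))
        return flat
    # Anything else, including a Pair, is treated as a single item.
    return [value]


# A nested result of the kind that concatenations mixed with list-producing
# parsers might yield.
nested = (Pair("name", "x"), [Pair("a", "1"), (Pair("b", "2"), [Pair("c", "3")])])

pairs = flatten(nested)
result = dict(pairs)

print(pairs)
print(result)
```

Calling dict(nested) directly would fail here, because the second top-level item is a list of three things rather than a key/value pair. Flattening first leaves four Pairs in a single list, which dict handles without complaint.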
