>>> from pyparsing import *
>>> parser = Word(alphas).setResultsName("first") + Word(alphas).setResultsName("second")
>>> results = parser.parseString("one two")
>>> results["first"]
'one'
>>> results["second"]
'two'
So, how does one go about doing this in Parcon? The answer is not as obvious as it is with Pyparsing, since Parcon parsers result in a single value, not a composite of tokens and a dictionary in the form of Pyparsing's ParseResults. This doesn't mean that tagging's impossible, however, or even difficult. It's actually relatively simple:
>>> from parcon import *
>>> parser = (alpha_word["first"] + alpha_word["second"])[dict]
>>> results = parser.parse_string("one two")
>>> results["first"]
'one'
>>> results["second"]
'two'
>>> results
{'second': 'two', 'first': 'one'}
>>> type(results)
<type 'dict'>
What's actually going on here?
The first thing that you need to know is that parser["tag"] is short for Tag("tag", parser). This only works when "tag" is a string (unicode strings also work); Tag must be used explicitly for other types of values.
What Tag does is wrap the result of whatever parser it's passed in a Pair, with the key set to "tag" (or whatever value was specified as the tag) and the value set to whatever the underlying parser resulted in.
Pair is a subclass of tuple (it's actually created by collections.namedtuple). Parcon's concatenation of tuples when using + does not apply to namedtuples or any value that simply subclasses tuple, so instances of Pair will be preserved even when using + to string things together.
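The difference between a plain tuple and a tuple subclass can be sketched in plain Python, without Parcon installed. Pair and concat below are hypothetical stand-ins, not Parcon's actual definitions, and the field names are made up. The sketch assumes the check is an exact type comparison, which is one way to get the behavior described above.

```python
from collections import namedtuple

# Hypothetical stand-in for Parcon's Pair; the field names are assumptions.
Pair = namedtuple("Pair", ["key", "value"])

def concat(left, right):
    """Join two results the way + is described above: plain tuples are
    spliced together, while anything else (tuple subclasses included)
    is kept as a single item."""
    def as_items(value):
        # Exact type check, so a namedtuple is not mistaken for a plain tuple.
        return value if type(value) is tuple else (value,)
    return as_items(left) + as_items(right)

first = Pair("first", "one")
second = Pair("second", "two")

combined = concat(first, second)                    # two Pairs, not four strings
chained = concat(combined, Pair("third", "three"))  # the plain tuple is spliced
```

Had concat used isinstance(value, tuple) instead, each Pair would have been unpacked into its key and value, and the tags would have been lost.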
From this, we can tell that the parser:
alpha_word["first"] + alpha_word["second"]
will result in (Pair('first', 'one'), Pair('second', 'two')) when handed "one two" as input. So how do we get this into a dictionary?
Simple. You'll notice that in the Parcon example near the top of this post, that same word-parsing parser was wrapped in parentheses and then followed by [dict], which, as you probably know, is short for Transform. Here's the magical property: Pair, though it may be its own class, subclasses tuple, and dict accepts any sequence of two-item tuples, treating each one as a key and a value. Bingo. We simply pass the result through a Transform with dict as the transformation function, and our result becomes:
{'second': 'two', 'first': 'one'}
which is exactly what we want.
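The dict step on its own needs nothing from Parcon, so it can be tried in isolation. In this minimal sketch, Pair is again a hypothetical namedtuple stand-in for Parcon's class, and parsed is written out by hand to match the result shown above.

```python
from collections import namedtuple

# Hypothetical stand-in for Parcon's Pair; the field names are assumptions.
Pair = namedtuple("Pair", ["key", "value"])

# The shape of the result described above for the input "one two".
parsed = (Pair("first", "one"), Pair("second", "two"))

# dict() treats each two-item tuple as a (key, value) entry.
as_dict = dict(parsed)

print(as_dict["first"])   # one
print(as_dict["second"])  # two
```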
Of course, if you have a bunch of nested concatenations and list-producing parsers (such as ZeroOrMore), you'll probably want to change [dict] to [flatten][dict] to flatten them all out. flatten treats Pairs (and any other subclass of tuple) as individual objects and so does not flatten them out, so this works as expected.
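To see why flattening before dict works, here is a rough pure-Python sketch. flatten_sketch is a hypothetical illustration and not Parcon's flatten; in particular, returning a list and using an exact type check are assumptions made to match the behavior described above.

```python
from collections import namedtuple

# Hypothetical stand-in for Parcon's Pair; the field names are assumptions.
Pair = namedtuple("Pair", ["key", "value"])

def flatten_sketch(value):
    """Recursively flatten plain lists and tuples, leaving subclasses
    such as Pair intact as single items."""
    if type(value) in (list, tuple):
        flat = []
        for item in value:
            flat.extend(flatten_sketch(item))
        return flat
    return [value]

# A nested result, as might come from mixing concatenation with
# list-producing parsers.
nested = (Pair("first", "one"), [Pair("second", "two"), (Pair("third", "three"),)])

result = dict(flatten_sketch(nested))
```

Because every Pair survives flattening as one item, dict still receives a flat sequence of key and value entries.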