Dalke Scientific Software: More science. Less time.

[Download sre_dump.py]

Python's standard re regular expression module is built on the sre modules developed by Secret Labs AB. (Tack så mycket, effbot! That's Swedish for "thank you very much.")

These are internal modules, which means they aren't documented and you shouldn't use them. Despite that, they are very useful if, like me, you develop alternate ways to parse with regular expressions. With a bit of work you can generate special-purpose C code from a regular expression, or build a parser generator like Martel that emits SAX events.

Less esoterically, you can do as Jeff Petkau did and create strings that match a given pattern.
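Jeff Petkau's code isn't reproduced here. As a rough illustration of the idea, here is a minimal sketch for Python 3 that walks the parse tree and builds one random string the pattern matches. It assumes the private parser module (re._parser on Python 3.11+, sre_parse before that) keeps its current tuple layout. It handles only literals, '.', simple character sets, alternation, groups, greedy and lazy repeats, and ^/$ at the ends of the pattern. The names generate, _gen, _gen_set and MAX_UNBOUNDED are hypothetical.

```python
import random
import re

try:
    import re._parser as sre_parse  # private parser module on Python 3.11+
except ImportError:
    import sre_parse  # Python 3.10 and earlier

MAX_UNBOUNDED = 5  # hypothetical cap on extra copies for '*', '+' and '{n,}'


def generate(pattern, rng=None):
    """Return a random string that the (simple) pattern matches in full."""
    if rng is None:
        rng = random.Random()
    return _gen(sre_parse.parse(pattern), rng)


def _gen(subpattern, rng):
    out = []
    for op, av in subpattern:
        if op == sre_parse.LITERAL:
            out.append(chr(av))
        elif op == sre_parse.ANY:
            out.append(rng.choice("abcxyz0123"))  # '.' matches any non-newline
        elif op == sre_parse.IN:
            out.append(_gen_set(av, rng))
        elif op == sre_parse.BRANCH:
            out.append(_gen(rng.choice(av[1]), rng))  # pick one alternative
        elif op == sre_parse.SUBPATTERN:
            out.append(_gen(av[-1], rng))  # the last field is the group body
        elif op in (sre_parse.MAX_REPEAT, sre_parse.MIN_REPEAT):
            lo, hi, body = av
            if hi == sre_parse.MAXREPEAT:
                hi = lo + MAX_UNBOUNDED
            out.extend(_gen(body, rng) for _ in range(rng.randint(lo, hi)))
        elif op == sre_parse.AT and av in (sre_parse.AT_BEGINNING,
                                           sre_parse.AT_END):
            pass  # ^ and $ consume nothing; assumed to sit at the pattern ends
        else:
            raise NotImplementedError("opcode %s not handled" % (op,))
    return "".join(out)


def _gen_set(items, rng):
    choices = []
    for op, av in items:
        if op == sre_parse.LITERAL:
            choices.append(chr(av))
        elif op == sre_parse.RANGE:
            choices.append(chr(rng.randint(av[0], av[1])))
        else:  # NEGATE, CATEGORY (\d, \w, ...) and friends are left out
            raise NotImplementedError("set item %s not handled" % (op,))
    return rng.choice(choices)
```

Checking every generated string with re.fullmatch, for example against generate(r"(ab|cd)+x?[0-9a-f]{2,4}"), is a cheap way to catch opcodes the walker mishandles.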

When writing these tools, it's very helpful to know where you are in the parse tree. sre_dump converts the tree, or any subtree, back into a regular expression, as in:

>>> import sre_dump, sre_parse
>>> tree = sre_parse.parse("AB|CD")
>>> tree
[('branch', (None, [[('literal', 65), ('literal', 66)], [('literal', 67), ('literal', 68)]]))]
>>> sre_dump.dump(tree)
'AB|CD'
>>> sre_dump.dump(tree[0][1][1][0])
'AB'
>>>
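sre_dump's own implementation isn't shown on this page. The following is a hypothetical Python 3 sketch of the same idea, not sre_dump itself: dump_subset turns a tree or subtree back into pattern text for a small subset of opcodes (literals, '.', ^ and $, simple sets, groups without inline flags, alternation, and repeats). It assumes the same private parser module as above. It also adds (?:...) wherever the tree has lost the grouping, for example when the parser factors a common prefix out of an alternation or unwraps a non-capturing group under a repeat.

```python
import re

try:
    import re._parser as sre_parse  # private parser module on Python 3.11+
except ImportError:
    import sre_parse  # Python 3.10 and earlier

P = sre_parse  # short alias for the opcode constants


def dump_subset(subpattern):
    """Turn a parse tree (or any subtree) back into pattern text."""
    parts = []
    for op, av in subpattern:
        if op == P.LITERAL:
            parts.append(re.escape(chr(av)))
        elif op == P.ANY:
            parts.append(".")
        elif op == P.AT and av == P.AT_BEGINNING:
            parts.append("^")
        elif op == P.AT and av == P.AT_END:
            parts.append("$")
        elif op == P.IN:
            parts.append(_dump_set(av))
        elif op == P.SUBPATTERN:
            # av[0] is the group number (None if non-capturing); av[-1] the body
            opener = "(" if av[0] is not None else "(?:"
            parts.append(opener + dump_subset(av[-1]) + ")")
        elif op == P.BRANCH:
            text = "|".join(dump_subset(alt) for alt in av[1])
            # An alternation needs a group only when other items surround it.
            parts.append(text if len(subpattern) == 1 else "(?:" + text + ")")
        elif op in (P.MAX_REPEAT, P.MIN_REPEAT):
            lo, hi, body = av
            atom = dump_subset(body)
            single = len(body) == 1 and body[0][0] in (
                P.LITERAL, P.ANY, P.IN, P.SUBPATTERN)
            if not single:
                atom = "(?:" + atom + ")"  # quantifier must apply to it all
            lazy = "?" if op == P.MIN_REPEAT else ""
            parts.append(atom + _quantifier(lo, hi) + lazy)
        else:
            raise NotImplementedError("opcode %s not handled" % (op,))
    return "".join(parts)


def _dump_set(items):
    out = []
    for op, av in items:
        if op == P.NEGATE:
            out.append("^")
        elif op == P.LITERAL:
            out.append(re.escape(chr(av)))
        elif op == P.RANGE:
            out.append(re.escape(chr(av[0])) + "-" + re.escape(chr(av[1])))
        else:
            raise NotImplementedError("set item %s not handled" % (op,))
    return "[" + "".join(out) + "]"


def _quantifier(lo, hi):
    if hi == P.MAXREPEAT:
        return {0: "*", 1: "+"}.get(lo, "{%d,}" % lo)
    if (lo, hi) == (0, 1):
        return "?"
    if lo == hi:
        return "{%d}" % lo
    return "{%d,%d}" % (lo, hi)
```

With it, dump_subset(sre_parse.parse("xab|xcd")) gives "x(?:ab|cd)", because the parser moves the shared "x" out of the branch and the alternation then needs its own group.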

For more in-depth debugging of state diagrams generated from regular expressions, you will want to know where a given node came from. Viewing just the text of the subpattern doesn't help, because the same text can occur several times in the pattern. The 'dump_and_offsets' function returns the dumped string together with a list of locations, one entry per subexpression. Each entry is a tuple starting with (expression, start position, end position); the loop below also unpacks a fourth field, text.

>>> s, offsets = sre_dump.dump_and_offsets(tree)
>>> def show_offsets(s, offsets):
...     print s
...     for expr, i, j, text in offsets:
...         print " "*i + "-"*(j-i) + " "*(len(s)-j+1), s[i:j]
...
>>> show_offsets(s, offsets)
AB|CD
-      A
 -     B
   -   C
    -  D
-----  AB|CD
>>>
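The offset bookkeeping can be sketched the same way: note the output length before and after emitting each node. This hypothetical dump_with_spans is not sre_dump.dump_and_offsets. It returns plain (node, start, end) triples, covers only literals, '.', groups and alternation, and again assumes Python 3's private parser module. show_spans is a Python 3 port of the show_offsets helper above.

```python
import re

try:
    import re._parser as sre_parse  # private parser module on Python 3.11+
except ImportError:
    import sre_parse  # Python 3.10 and earlier


def dump_with_spans(tree):
    """Return (text, spans), where each span is (node, start, end) in text."""
    pieces = []
    spans = []
    pos = 0  # length of the text emitted so far

    def emit(piece):
        nonlocal pos
        pieces.append(piece)
        pos += len(piece)

    def walk(subpattern):
        for node in subpattern:
            op, av = node
            start = pos
            if op == sre_parse.LITERAL:
                emit(re.escape(chr(av)))
            elif op == sre_parse.ANY:
                emit(".")
            elif op == sre_parse.SUBPATTERN:
                emit("(" if av[0] is not None else "(?:")
                walk(av[-1])  # the last field is the group body
                emit(")")
            elif op == sre_parse.BRANCH:
                grouped = len(subpattern) > 1  # need (?:...) when not alone
                emit("(?:" if grouped else "")
                for k, alternative in enumerate(av[1]):
                    if k:
                        emit("|")
                    walk(alternative)
                emit(")" if grouped else "")
            else:
                raise NotImplementedError("opcode %s not handled" % (op,))
            # Children were recorded first, so inner nodes precede outer ones.
            spans.append((node, start, pos))

    walk(tree)
    return "".join(pieces), spans


def show_spans(text, spans):
    """Draw a dashed bar under each node's span, like show_offsets."""
    print(text)
    for _node, i, j in spans:
        print(" " * i + "-" * (j - i) + " " * (len(text) - j + 1), text[i:j])
```

Calling show_spans(*dump_with_spans(sre_parse.parse("AB|CD"))) prints the same diagram as the session above, with the four literals listed before the whole branch because children are recorded before their parent.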

sre_dump has been placed in the public domain.



Copyright © 2001-2020 Andrew Dalke Scientific AB