Slide 1
Bioinformatics
Biopython
Many Formats
Human Genome Project
What does a format look like?
Need to parse a format
lex/yacc are too complicated
Roll one by hand
What's needed? / Use cases
More requirements
Icarus
General form of most formats
Form of a parser
Arrangement of blocks
This is a regular format
Most bioinformatics formats are regular!
Slide 18
Parsing with regular expressions
Can't get all of the data
The regexp makes a parse tree
XML
SAX traversal of the parse tree
Martel - a new regexp engine
Martel (continued)
Why Plex is needed
With Plex
Example use - displaying as XML
Adding HTML markup
Marking up semi-structured formats
Using XML tools
Equivalent Bioperl code
Large Files
RecordReaders
Format definition using RecordReaders
Named Group Repeats
Other features
Timings
Validation
Version Detection? "Or" each format
Version Detection Overhead
XSLT
Iterators
make_iterator
Mindy
Creating and using a Mindy Database
Bugs
Future Work
Naming
Why "Martel"?
Author: Andrew Dalke
E-mail: dalke@dalkescientific.com