Dalke Scientific Software: More science.  Less time.

Presentations by Andrew Dalke
Dalke Scientific Software, LLC

O'Reilly Bioinformatics Conference
January 28-31, 2002

Biopython (www.biopython.org) is an international collaboration to collect and produce open source bioinformatics tools written in the Python programming language. Founded in 1999, it now has users and developers on four continents. Biopython is one of the projects developed under the umbrella of the Open Bioinformatics Foundation (www.open-bio.org).

This is an exciting time for Biopython. We are approaching the 1.0 release. The distribution includes interfaces to query local and distributed bioinformatics tools and databases, data structures for manipulating biological resources, a generalized parsing framework which unifies the many flat-file formats, libraries for working with different database systems, and servers to provide remote access to these services.

The presentation will start with an overview of the Biopython project, then show how its different parts can be used to build a bioinformatics platform for a research group. It will end with pointers to other Python projects in the life sciences, including projects in structural biology and chemical informatics.
  1. Title page
  2. My Background
  3. What is Python?
  4. What about Perl?
  5. My clients
  6. Adding array elements in Perl
  7. Biopython
  8. There are other modules
  9. Sequence as string
  10. Sequence as object
  11. Typed Sequences
  12. Sequence Record
  13. What's in a record?
  14. More record information
  15. Reading from any file
  16. Loading from the web
  17. Loading from the BioSQL database
  18. Writing a record
  19. Conversion by hand
  20. Easy conversion
  21. Similarity Searches
  22. What's in a search?
  23. Flat-file indexing
  24. Mindy
  25. Mindy lookups
  26. Python Elsewhere in the Life Sciences
  27. PyDaylight
  28. More PyDaylight
  29. Other small molecule chemistry tools
  30. Structural Biology
  31. PyMol
  32. Commercial Python Support
  33. For more information

Martel, Parsing, and Bioformats

One of the tedious problems in bioinformatics is dealing with legacy flat-file formats. In this talk, Andrew will present Martel, the Biopython parser generator which lets us handle existing files as if they were already in XML.
  1. Title page
  2. Motivation
  3. Requirements
  4. Guiding Problem
  5. Wait!
  6. "Traditional parsing"
  7. Event-driven parsers
  8. Fine grained events
  9. Regular Expressions
  10. Named Groups
  11. The parsed string is a tree
  12. Returning the parse tree
  13. What it looks like
  14. ContentHandler
  15. Using DOM
  16. LAX
  17. Markup
  18. A real format, SWISS-PROT 38
  19. Supporting Multiple Formats
  20. Std tags
  21. StdHandler
  22. Format hierarchies
  23. Testing
  24. Formats change
  25. Changed RX line
  26. Tweaking an existing format
  27. But wait, there's more!
  28. Call for Help
  29. More on Martel
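
The idea of treating a flat file "as if it were already in XML" can be illustrated with a short sketch. It is not Martel's API: it uses only the Python standard library, and the one-line format, the pattern, and the names parse_line and line_to_xml are assumptions made up for this example. A regular expression with named groups describes the line, and each named group is reported to a SAX-style handler as an element.

```python
import io
import re
from xml.sax.saxutils import XMLGenerator
from xml.sax.xmlreader import AttributesImpl

# Hypothetical one-line format, loosely modeled on an "ID" line.
# The group names double as the XML element names.
LINE_PATTERN = re.compile(r"ID   (?P<entry_name>\w+)\s+(?P<length>\d+) AA\.")


def parse_line(line, handler):
    """Report one line to a SAX ContentHandler.

    Each named group becomes an element; text outside the groups is
    passed through as plain character data.
    """
    match = LINE_PATTERN.fullmatch(line)
    if match is None:
        raise ValueError(f"line does not match the format: {line!r}")

    no_attrs = AttributesImpl({})
    handler.startElement("record", no_attrs)
    pos = 0
    # Visit the named groups in the order they appear in the line.
    for name in sorted(LINE_PATTERN.groupindex, key=match.start):
        start, end = match.span(name)
        handler.characters(line[pos:start])
        handler.startElement(name, no_attrs)
        handler.characters(line[start:end])
        handler.endElement(name)
        pos = end
    handler.characters(line[pos:])
    handler.endElement("record")


def line_to_xml(line):
    """Render the events as XML text using the standard XMLGenerator."""
    out = io.StringIO()
    parse_line(line, XMLGenerator(out))
    return out.getvalue()


print(line_to_xml("ID   CYC_BOVIN   104 AA."))
```

Because the parser only emits events, the same handler interface can feed an XML writer, a DOM builder, or application code that picks out just the fields it needs.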

A Plea for Paranoia

Code, documentation, and databases all have errors. All too often those errors are silent and hidden. Just because something seems to work and has been used for years doesn't mean it's right. I'll show problems taken from bio software and offer suggestions for how to write and test code to make errors noisy and visible.
  1. Title page
  2. Undocumented features
  3. Oops!
  4. Formats are not to spec
  5. Formats Change
  6. Be Loud - Be Noisy
  7. GenBank parsing
  8. Test the data
  9. Don't be doubly paranoid
  10. Suggestions on Paranoia
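
As a minimal sketch of what "noisy" can mean in practice, the reader below refuses to skip anything it does not understand. The record layout, the set of line codes, and the names FormatError and parse_record are assumptions invented for this illustration and are not taken from the talk or from any real parser.

```python
# Hypothetical record layout: a two-letter line code, three spaces, a value.
KNOWN_CODES = {"ID", "DE", "SQ"}


class FormatError(ValueError):
    """Raised when the input departs from the expected format."""


def parse_record(lines):
    """Parse one record, failing loudly instead of ignoring surprises."""
    record = {}
    for lineno, line in enumerate(lines, start=1):
        code, _, value = line.rstrip("\n").partition("   ")
        if code not in KNOWN_CODES:
            # A quiet parser would skip this line; a paranoid one stops.
            raise FormatError(f"line {lineno}: unknown line code {code!r}")
        if code in record:
            raise FormatError(f"line {lineno}: duplicate {code!r} line")
        record[code] = value

    missing = KNOWN_CODES - record.keys()
    if missing:
        raise FormatError(f"missing required lines: {sorted(missing)}")
    return record


print(parse_record(["ID   X1", "DE   demo entry", "SQ   ACGT"]))
```

The design choice is that an unexpected line code, a repeated line, or a missing line all stop the run with a message naming the problem, so a format change shows up on the first affected record rather than as quietly wrong results later.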