I teach training courses in Python programming for computational
chemistry, with an emphasis on cheminformatics.
No public courses are currently scheduled. I will be in Ireland for
OpenEye's CUP conference in September and in England in October. If
you are interested in on-site training during those trips,
let me know!
Computational chemists are not programmers, but programming is an
essential skill for developing new algorithms, generating data, and
analyzing results. Most working scientists have little training in
programming and end up spending a lot of time figuring out how to
parse a file format or work with a software library, rather than
figuring out the science.
My training course is designed for just these people.
I teach both corporate courses and public ones. The presentations are
a mixture of lecture and hands-on exercises in a similar style to my NBN
courses. The specific topics will vary based on the needs of the
audience. Contact me if there's something you specifically want me to
discuss, and see the bottom of this page for a
partial list of topics I can cover.
Who am I?
My name is Andrew Dalke. I am a professional software developer with
years of experience creating tools for cheminformatics, molecular
modeling, bioinformatics, and related fields. Some of the more public
projects I've been part of are VMD, NAMD, BioPython, PyDaylight, and
the Open Bioinformatics Foundation. Throughout my career I have worked
closely with chemists to help them be more effective, by developing
software, providing one-on-one advice, training, and writing essays
about the software side of this field.
Most of my work over the last 10 years has been in Python, which is
the most popular high-level language in computational chemistry. Many
tools, especially in molecular visualization and chemical informatics,
have Python interfaces. Python is also one of the most popular
computer languages in the world, with mature software libraries for
everything from image manipulation and SQL databases to GUI and web
development. I am a member of the Python Software Foundation, which
is the non-profit that holds the copyright to Python.
Scheduled courses
The following courses are meant for computational chemists with some
programming background. They are not introductory programming
courses. You must know how to write programs and should have some experience
with Python.
Course fees include coffee breaks, lunch, and all presentation
materials. Each day starts at 9.00 and ends around 17.30.
If you have any questions, send me an email at trainingdiscard@dalkescientific.com. If you are from an
academic or non-profit group then you may qualify for a discount.
Python programming
Leipzig, Germany, 14-15 February 2011
This intense two-day course covers the basics of how to use
Python. The first day is an overview of the core
Python language and OEChem. The second day
covers essential libraries for calling out to command-line programs, handling CSV files, making plots, and
more.
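To give a flavor of the second-day material, here is a minimal sketch that uses only the standard library's subprocess and csv modules. The helper names (run_command, mean_column), the child command, and the two-row data set are invented for illustration and are not taken from the course notes. It assumes Python 3.7 or later.

```python
import csv
import io
import subprocess
import sys


def run_command(args):
    """Run a command-line program and return its standard output as text."""
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout


def mean_column(csv_text, column):
    """Return the mean of a numeric column in CSV text that has a header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    values = [float(row[column]) for row in reader]
    return sum(values) / len(values)


# Illustrative data only: two compounds and their molecular weights.
CSV_TEXT = "name,mol_weight\nethanol,46.07\nbenzene,78.11\n"

# Use the running Python interpreter as the "external program" so the
# example does not depend on any other tool being installed.
output = run_command([sys.executable, "-c", "print('hello from a subprocess')"])
average_weight = mean_column(CSV_TEXT, "mol_weight")
```

Passing the command as a list of arguments, rather than as one shell string, avoids quoting problems when file names contain spaces.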
Attendees must have some programming experience (know how to use
variables, for-loops, and if-statements) and know how to use text editors
and command-line tools.
Registration will be €900 including VAT and is limited to 8
people. Special discount: attend this course plus the Django course for €2,000 instead of
€2,200.
The course will be hosted by and invoices sent by the Python Academy, located at
Zur Schule 20, Leipzig. Register
for the Python course or contact me at trainingdiscard@dalkescientific.com if you have any
questions.
Web Application Development with Django
Leipzig, Germany, 16-18 February 2011
This three-day course walks through two real-world examples based on
the Django web application framework: an interactive descriptor
calculator and a PubChem database search system. This course is meant
for computational chemists who want to set up an in-house
cheminformatics server for specific analysis tasks. The topics I'll
cover are:
Attendees must have some existing Python experience. I will be
teaching a two-day Python course
immediately before the Django course. If you attend both courses you
will get a discount.
Registration for the Django course is €1,300 and is limited to 8
people. Special discount: attend this course plus the Python
course for €2,000 instead of €2,200.
The course will be hosted by and invoices sent by the Python Academy, located at
Zur Schule 20, Leipzig. Register
for the Python course or contact me at trainingdiscard@dalkescientific.com if you have any
questions.
What experience do you need?
My courses are meant for computational chemists who are not programmers
but have some programming experience. Computational chemists in this
case means small-molecule chemistry with an emphasis on
cheminformatics and a bit of molecular modeling. You must already
know the basic science, like SMILES, molecular graph representation,
SMARTS and substructure searches.
The phrase "some programming experience" means people who are
comfortable with strings, integers, floats, variables, if-statements,
for-loops, lists/arrays, and defining functions. You must
also be comfortable working on the command-line and using a text
editor or IDE to write programs. You do not need to know
object-oriented programming.
You should have some experience with Python but that is not essential.
I'll teach the Python-specific features as I work through my examples.
Most of the code will look similar in other languages so it should be
easy to follow.
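As a rough self-check of that level, a program like the sketch below should be readable even if you have never used Python. It uses only variables, a list, a for-loop, an if-statement, and a function definition. The function name and the atom list are made up for illustration and are not course material.

```python
def count_heavy_atoms(symbols):
    """Count the atoms in a list of element symbols that are not hydrogen."""
    count = 0
    for symbol in symbols:
        if symbol != "H":
            count += 1
    return count


# Element symbols for ethanol (C2H6O), written out by hand.
atoms = ["C", "C", "O", "H", "H", "H", "H", "H", "H"]

heavy = count_heavy_atoms(atoms)
fraction_heavy = heavy / float(len(atoms))
```

The same logic in most other languages differs mainly in punctuation, which is why prior Python experience helps but is not essential.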
For those just starting off in Python, the Python Beginner's
Guide contains links to many resources including online tutorials
and a list of books. You might be interested in the tutor mailing
list "for folks who want to ask questions regarding how to learn
computer programming with the Python language."
I have taught beginning programming to bioinformatics graduate
students. You can see my lecture
notes under the header "Introduction to Programming for
Bioinformatics in Python."
is a
Python 2D plotting library which produces publication-quality figures
for hard-copy and interactive use. I'll work through several examples
based on chemistry data sets, such as producing a scatter plot and
exporting the result. See
for small molecule chemistry.
It's full-featured, fast, and powerful, but a bit on the hard side to
use. Some of the topics I can cover are:
OpenBabel and RDKit
OpenBabel and RDKit are two open source cheminformatics
toolkits with Python interfaces. I've used OpenBabel a bit, mostly
through the high-level pybel library.
You might be thinking "why not teach OpenBabel instead of OEChem?"
After all, the library is freely available so anyone can install it
without needing a special license. I could teach it, but I don't have
as much experience as I do with OEChem. Every library has its own
nuances, and I don't know OpenBabel's. I also think that
OEChem is a better library for most uses, if you have the money and
consider proprietary software acceptable.
As an example of a difference, OpenBabel follows the Daylight model
and assumes that the right behavior is to convert everything into the
same chemistry model. OEChem doesn't make that assumption and instead
has functions to convert between different chemistry models; you have
to call those functions. This causes small differences in how the
libraries treat conditions like aromaticity. I have more experience
with the OEChem way and can explain it better.
If you are interested in me teaching specifically OpenBabel or RDKit
then contact me.
developing web applications
Django is a popular web
development framework for Python which makes developing web
applications much simpler than traditional CGI programming.
These lectures will be based around developing a web application for
substructure searches and will cover how to work with the database,
generate templates, structure the URLs, use CSS, and add JavaScript
interactivity with jQuery.
Software development best practices
Software development is more than sitting down and programming. How
do you keep track of changes to the code over time? If you change
code, how do you figure out if the change broke something? How do
multiple people work together on the same code base? What are some of
the common development traps that people can get stuck in? Why do you
need to care about security?
Many of these are covered in Greg Wilson's Software Carpentry. I'll
specifically talk about version control, project builds, testing with
nose, and development practices like code reviews, YAGNI, and agile
development.
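As a small illustration of the "did my change break something?" question, here is a sketch of a function together with tests for it. The function parse_smiles_line and its behavior are hypothetical, chosen only to give the tests something to check. The tests are plain functions whose names start with test_, which is the naming pattern that test runners such as nose collect by default.

```python
def parse_smiles_line(line):
    """Split one line of a SMILES file into (smiles, title).

    Hypothetical helper: the title is whatever follows the first run of
    whitespace, or the empty string if there is none.
    """
    fields = line.split(None, 1)
    if not fields:
        raise ValueError("empty line")
    smiles = fields[0]
    title = fields[1].strip() if len(fields) > 1 else ""
    return smiles, title


def test_smiles_with_title():
    assert parse_smiles_line("CCO ethanol\n") == ("CCO", "ethanol")


def test_smiles_without_title():
    assert parse_smiles_line("c1ccccc1\n") == ("c1ccccc1", "")


def test_empty_line_is_rejected():
    try:
        parse_smiles_line("   \n")
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for an empty line")
```

Re-running tests like these after every change is what turns "I think it still works" into something you can check in a few seconds.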
R and Python
R is a great software
environment for statistical computing and generating plots. If you
are building models or doing data mining then you should know about
this project. R includes its own programming language and a number of
high-quality analysis packages. It's not a general purpose language
like Python and it doesn't include the diversity of modules that
Python has.
RPy is an interface module
which lets Python call R functions directly, including plotting. I've
used it in a model calculation system that uses OEChem and other tools to
compute descriptor values and then passes them over to R to evaluate the
model.
R and Python have different ways of doing things. RPy minimizes the
difference, but it's still there. I'll describe some of the basic R data
types, how to create them from Python, how to call the R functions
through RPy, and how to understand the R documentation enough to be
able to call it from Python.