Science. Shared. Data.: molecules

Monday, April 1, 2013

Extending FragIt for Molecular Fragmentation with Conjugate Caps

As a co-author of FragIt (web, code), a piece of software written in Python on top of the Open Babel API that helps set up and fragment large molecules for fragment-based methods, I have to return to this (excellent) piece of software to make it work with Molecular Fragmentation with Conjugate Caps (MFCC), which is the fragment-based method of choice at my new workplace. Currently, FragIt only (officially) supports the Fragment Molecular Orbital (FMO) method.

Step I of this transformation is to realize that FMO and MFCC are two very different beasts. While the theory of FMO is more hairy, FMO needs vastly less information to run than MFCC. In FMO, you specify pairs of atoms between which you wish to fragment and then everything goes along nicely, whereas in MFCC you build fragments, attach caps and build extra fragments from those caps (called conjugate caps). The latter is very tough to do in general, which is what FragIt aims for in the first place, mostly because getting the correct SMARTS is very cumbersome. As with the previous approach of FragIt, I'll make it work for proteins by selecting the appropriate SMARTS, and program it in such a way that extending it to other systems is straightforward: simply figure out additional patterns.

Right now my approach is to build “capping”-SMARTS that select atoms around places of fragmentation and build those caps in Open Babel. Non-trivial task is non-trivial I must admit.
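To make the difference between the two methods concrete, here is a minimal sketch on a toy molecule. It is not FragIt code and it does not use Open Babel or SMARTS: the molecule is just a chain of nine numbered atoms, and the names `neighbors_within`, `fragments`, `mfcc_pieces` and the `depth` parameter are made up for the illustration. The FMO-style input is only the list of atom pairs to cut between; the MFCC-style step additionally collects the atoms around each cut as caps and joins the two caps into a conjugate cap.

```python
from collections import deque


def neighbors_within(bonds, start, depth, blocked):
    """Atoms reachable from `start` in at most `depth` bonds without crossing a blocked bond."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        if distance[atom] == depth:
            continue
        for nxt in bonds[atom]:
            if frozenset((atom, nxt)) in blocked or nxt in distance:
                continue
            distance[nxt] = distance[atom] + 1
            queue.append(nxt)
    return set(distance)


def fragments(bonds, cuts):
    """FMO-style input: only the atom pairs to cut between are needed."""
    blocked = {frozenset(cut) for cut in cuts}
    unassigned = set(bonds)
    result = []
    while unassigned:
        seed = min(unassigned)
        # A depth of len(bonds) is enough to reach every atom of the fragment.
        frag = neighbors_within(bonds, seed, len(bonds), blocked)
        result.append(frag)
        unassigned -= frag
    return result


def mfcc_pieces(bonds, cuts, depth=1):
    """MFCC-style bookkeeping: capped fragments plus one conjugate cap per cut."""
    blocked = {frozenset(cut) for cut in cuts}
    bare = fragments(bonds, cuts)
    capped = [set(frag) for frag in bare]
    concaps = []
    for a, b in cuts:
        cap_a = neighbors_within(bonds, a, depth, blocked)  # atoms near a, on a's side
        cap_b = neighbors_within(bonds, b, depth, blocked)  # atoms near b, on b's side
        for frag, full in zip(bare, capped):
            if a in frag:
                full |= cap_b  # cap the fragment with atoms from across the cut
            if b in frag:
                full |= cap_a
        concaps.append(cap_a | cap_b)  # the two caps joined: the conjugate cap
    return capped, concaps


# Toy molecule: a linear chain 0-1-2-...-8, cut between atoms 2|3 and 5|6.
bonds = {i: [j for j in (i - 1, i + 1) if 0 <= j < 9] for i in range(9)}
cuts = [(2, 3), (5, 6)]

bare = fragments(bonds, cuts)
capped, concaps = mfcc_pieces(bonds, cuts, depth=1)
print("bare fragments:  ", bare)
print("capped fragments:", capped)
print("conjugate caps:  ", concaps)
```

In this toy the atoms in the capped fragments minus the atoms in the conjugate caps add up to the nine atoms of the chain, which is the bookkeeping MFCC relies on. A real implementation also has to decide what a chemically sensible cap is, and that is where the SMARTS come in.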

Stay tuned as I explore the SMARTS and code needed to accomplish this.

Saturday, April 28, 2012

Construction of a New Basis Set. Part I. Planning.

As the title suggests, I'd like to share my experience (over several parts) with the construction of a new basis set for use in the calculation of NMR spin-spin coupling constants (SSCC). There are several key aspects that one must take into consideration. To loosely name a few of them:

Which molecule(s) am I interested in developing a new basis set for?
What basis set will I use to create my new basis set? Will I generate it from scratch?
What geometry will I use?
What is my approach for making the new basis set? (Interestingly, this point has many sub-points)

Uncontract, add specific functions and then recontract?
Uncontract, remove specific functions and then recontract?

Publish my results.
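The uncontract / add / remove steps in the list above can be sketched on a toy data structure. This is only an illustration under assumptions: a shell is represented as a hypothetical tuple of (angular momentum label, exponents, contraction coefficients), the numbers are made up and are not taken from any real basis set, and the recontraction step is left out because the new coefficients have to come from an actual calculation.

```python
def uncontract(shells):
    """Turn every primitive of every contracted shell into its own shell."""
    seen = set()
    result = []
    for label, exponents, _coefficients in shells:
        for exponent in exponents:
            if (label, exponent) in seen:
                continue  # a primitive shared by two contractions is kept once
            seen.add((label, exponent))
            result.append((label, [exponent], [1.0]))
    return result


def add_function(shells, label, exponent):
    """Return a new basis with one extra uncontracted function."""
    return shells + [(label, [exponent], [1.0])]


def remove_function(shells, label, exponent):
    """Return a new basis without the uncontracted function (label, exponent)."""
    return [s for s in shells if not (s[0] == label and s[1] == [exponent])]


# Made-up toy basis: one contracted s shell, one diffuse s and one p function.
toy_basis = [
    ("s", [10.0, 2.0, 0.5], [0.1, 0.4, 0.6]),
    ("s", [0.5], [1.0]),
    ("p", [1.0], [1.0]),
]

uncontracted = uncontract(toy_basis)
with_tight_s = add_function(uncontracted, "s", 50.0)   # 50.0 is an arbitrary exponent
without_p = remove_function(uncontracted, "p", 1.0)

for shell in with_tight_s:
    print(shell)
```

Keeping each step as a function that returns a new list makes it easy to try several variants of the basis set side by side before deciding what to recontract.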

This post is about points one and two and consequently will not contain any data. This will change in the next post.

This whole project fell out of the sky and hit me in the head because I attended a course with associate professor Stephan Sauer on molecular electromagnetism and, instead of solving the exercises, I opted to do a mini-project. There is nothing wrong with broadening your academic abilities, I thought, and here I am. The project was: "Make new spin-spin coupling constant basis sets for Gallium, Germanium, Arsenic, Selenium and Bromine atoms". Since Stephan had participated in publishing one paper on H₂Se and the corresponding basis set (Warning - paywall), I thought why not use that as inspiration for making my own and actually see if I could improve on what he did in the first place. Given the project, item number one was pretty clear - use the most basic molecules you can think of (or draw) and use that as your starting point. I chose GaH, GeH₄, AsH₃, H₂Se and HBr as starting points.

Since Stephan has also contributed to building basis sets starting from Dunning's correlation-consistent aug-cc-pVTZ basis set, that was a natural starting point for me too. I guess that you need somewhat of an iron will if you want to start from scratch, but I suppose in some cases - why not?

In any case, the plan was:

See what was done for H₂Se (and, equally important, how). I also had another paper for the row of elements just above my row (Warning - paywall). It turned out to help a great deal too.
Try and be systematic about it. (I guess everyone is a bit sporadic with their placement of data once it starts rolling, am I right?)
Get started early, since these calculations do not scale very well with the number of basis functions we use, and let's face it - even for hydrogen, the aug-cc-pVTZ-J basis set is HUGE.
Share the data with the world when I get it.
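To put a number on the scaling remark in the plan above, here is a small sketch that counts spherical basis functions from the contracted shell composition. The compositions used are the standard ones for hydrogen in cc-pVTZ ([3s2p1d]) and aug-cc-pVTZ ([4s3p2d]); the composition of aug-cc-pVTZ-J is not given in this post, so it is not guessed here. The fourth-power cost model is only an assumption for illustration (the formal scaling of the two-electron integrals), not a measured timing.

```python
# Angular momentum quantum number for each shell label.
L_VALUES = {"s": 0, "p": 1, "d": 2, "f": 3, "g": 4}


def n_spherical(composition):
    """Number of spherical basis functions: each shell of type l holds 2l + 1 functions."""
    return sum(count * (2 * L_VALUES[label] + 1) for label, count in composition.items())


def relative_cost(n_small, n_large, power=4):
    """Cost ratio under an assumed N**power scaling (illustrative model only)."""
    return (n_large / n_small) ** power


cc_pvtz_h = {"s": 3, "p": 2, "d": 1}      # cc-pVTZ for hydrogen
aug_cc_pvtz_h = {"s": 4, "p": 3, "d": 2}  # aug-cc-pVTZ for hydrogen

n_small = n_spherical(cc_pvtz_h)
n_large = n_spherical(aug_cc_pvtz_h)
print(n_small, n_large)
print(round(relative_cost(n_small, n_large), 1))
```

Every function added per atom is multiplied by the number of atoms and then raised to a power in the cost, which is why even small additions to the hydrogen basis set are felt.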

It'll be a grand experience.