
Saturday, August 3, 2013

What is it with this linear scaling stuff anyway?

Enormous amounts of research time have gone into developing computational methods that are linear scaling with respect to system size. That is, double the size of your system and you only double the computation time. If all methods were like that, it would be easier to guess when computers on your local supercomputer cluster become available, instead of seeing a wall of 200+ hour jobs just sitting there because people don't give a crap.



Inspired by +Jan Jensen and a recent blog post of his (which I was reminded of when I wrote another blog post on the subject of many-body expansions), I set out to do the timing calculations myself, albeit with a different goal in mind.

2-body calculations
Even if you use the many-body expansion of the energy, I showed previously that the accumulated number of calculations one needs increases dramatically for large N-body expansions. If we focus only on one- and two-body calculations, the effect is barely visible in the previous plot, but computing the timings from Jan's linear model (nearest neighbors only) alongside a model where we do all pairs shows that even at the two-body level there is no linear scaling unless you make some approximations.

Here, I have assumed a computational scaling of $\alpha=2.8$ and uniform monomer sizes. I've also assumed that a monomer calculation takes 1 s and that there is no overhead or interaction at the monomer level.
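As a quick sanity check, the timing model can be sketched in a few lines of Python. This is my own toy code, not taken from Jan's post; the function names and the assumption that a dimer is twice the monomer size are mine.

```python
# Toy timing model: a monomer costs 1 s, cost scales as size**ALPHA,
# and a dimer is assumed to be twice the monomer size.

ALPHA = 2.8             # assumed computational scaling exponent
T_MONOMER = 1.0         # seconds per monomer calculation
T_DIMER = 2.0 ** ALPHA  # dimer cost: twice the size, raised to ALPHA

def t_full_2body(n):
    """All n monomers plus all n*(n-1)/2 unique pairs."""
    return n * T_MONOMER + n * (n - 1) // 2 * T_DIMER

def t_linear_2body(n):
    """Only the n-1 nearest-neighbor pairs: linear in n."""
    return n * T_MONOMER + (n - 1) * T_DIMER

for n in (10, 100, 1000):
    print(n, t_full_2body(n), t_linear_2body(n))
```

The all-pairs cost grows quadratically in the number of monomers, while the nearest-neighbor model stays linear, which is the gap between the two curves in the plot.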

Admittedly, the linear model is crude, but it shows the best scaling you could hope for by including the minimum amount of two-body calculations. In a more realistic case, you would end up somewhere between the red and the black line, but that is the subject for a future post.

This is why we need linear scaling!

3-body calculations
Just for the fun of it, here is the 3-body scaling
and I dare not think what the time would be, without approximations, for even higher n-body calculations.

I think we can all agree that approximations must be made or else we are doomed.

We need linear scaling!

Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 Unported License.

Wednesday, April 24, 2013

Visualizing a lot of FMO PIEDA numbers. Part II

In the last post on the subject of FMO PIEDA numbers, there was an issue with the fragmentation not matching between two different snapshots of the same reaction trajectory.

I fixed that fragmentation problem, ran the numbers again and it looks like some neat stuff can be extracted.

Remember that the structures are generated using PM6 with MOZYME enabled and that energy-refinement of the barrier is done at the FMO2/6-31G(d)/PCM level of theory. The CON structures have some part of the structure constrained (far away from the active site) and the UCO structures are allowed to fully relax. The problem still looks like this



To tease you with some data, here is a table from a short write-up I did. Columns two and three give the energy difference (by the method in the first column) between the TS and the reactant. The last column is the difference between columns two and three. The first row is thus the difference in the sum of one-body energies, the second row the difference in the sum of fragment interaction energies (FIEs), and the last row the difference in total FMO2 energy.

                                 dE(1,5)_con   dE(1,5)_uco   ddE(uco-con)
FMO1-MP2/6-31G(d)/PCM                   25.4          29.4            4.0
$\sum_{IJ} \Delta E^{MP2}_{IJ}$         -3.7          18.6           22.4
FMO2-MP2/6-31G(d)/PCM                   25.9          55.0           29.1

The one-body case is solved: inspection reveals that the internal energies of the substrate contribute +3.5 kcal/mol, ASP119 contributes +5.3 kcal/mol and ASN35 contributes -4.3 kcal/mol, which summed up is roughly the 4 kcal/mol we need. The rest is internal rearrangement that eventually cancels out.

The problem with the barrier height, however, lies at the two-body level (second row): the CON structures provide almost 4 kcal/mol of stabilization, whereas the UCO structures are destabilized by 18.6 kcal/mol. It is natural to investigate what happens if we look at the FIEs between the substrate and the entire enzyme. That looks like the following


What is curious is that we see distinct peaks. The largest (positive) peak can be attributed to an interaction with ARG112, but if we sum up all the FIEs they amount to +0.5 kcal/mol, which is very far from the +18.6 kcal/mol we are trying to account for. The (current) conclusion is that because the protein is allowed to fully relax, many small contributions add up to the 18.6 kcal/mol.

Credits to this site's table generator, because I lost all my HTML skills whilst fighting FORTRAN.

Monday, April 8, 2013

Visualizing a lot of FMO PIEDA numbers. A Preliminary look.

I've assisted in calculating some energy refinements using FMO2-MP2/PCM/6-31G(d) on top of a colleague's proposed trajectory, calculated using PM6//MOZYME, for part of the reaction step of Bacillus circulans xylanase. Two versions of this path were produced: one with constraints on some part of the structure (CON) and one without constraints (UCO). They are shown here


This FMO2 barrier is quite unrealistic, especially the unconstrained one, so what is causing this large reaction barrier?

To investigate this, we are running some PIEDA calculations to figure out which pairs are interacting strongly (and perhaps differently between the two paths). PIEDA gives you an energy decomposition analysis of the individual pairs in an FMO calculation, so we get electrostatic contributions, exchange-repulsion, charge-transfer (plus what is left over), dispersion, and solvation energy too.

The BCX-system we are looking at currently has 302 fragments (a total of 3172 atoms), which is actually the whole protein and its substrate. That means you get 45451 unique pairs, and each pair is decomposed into five different terms, giving you a wonderful 227255 numbers to visualize somehow. I haven't really figured out how to do this in a nice way, so instead I will plot the total interaction energies between each unique pair of fragments for snapshots 2 and 3 of the blue curve (UCO) in the above graph.
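The counting above is easy to verify (a trivial sketch; the variable names are mine):

```python
from math import comb

n_fragments = 302
n_pairs = comb(n_fragments, 2)  # unique fragment pairs: 302*301/2
n_terms = 5 * n_pairs           # five PIEDA terms per pair

print(n_pairs, n_terms)  # 45451 227255
```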

I discovered the problem ... the fragmentation was not exactly the same along the entire UCO path, which of course makes everything break down. Back to the drawing board, then. What gave it away? Look here at the difference in pair energies between frame 0 and frame 5 (click to see it in its full size; even that does not do it justice)



So there it is. An unrealistic result can be your own fault no matter how hard you actually try to convince yourself that you used the same scripts for both runs.