Committor Simulation and Analysis for Alanine Dipeptide
This sequence of notebooks provides an example of how to calculate the committor for a real biomolecular system (alanine dipeptide).
The first notebook deals with selecting the snapshots to use as starting
points for the committor. In this particular case, we didn’t care about the
statistics of those points, so we select them directly from storage
.
This is because these points were being harvested as part of an approach to
estimate the committor at any given point. If you need the statistics of
those points within the TPS ensemble, e.g., for the transition state
ensemble, then you should select your points differently (by looping over
storage.steps
and taking the active
sample set).
The second notebook runs the committor simulation. The third notebook
prepares the analysis data in a way that was needed for this particular
project, where each “shot” of the committor was treated as a separate
“experiment.” While this approach is not normally needed, if you already
have data that can easily be put into this format, the fourth notebook shows
how it can be used to construct a ShootingPointAnalysis
object.
That notebook also proceeds to demonstrate the kinds of analysis that object
can perform.
Select Snapshots¶
This notebook will take an old (and large) TPS simulation file, select some snapshots to use as input data.
Note: this first version is quick and dirty. There might be some points to consider to select better snapshots. But this is just intended to get initial data to our colleagues.
import openpathsampling as paths
import random
%%time
storage = paths.Storage("alanine_dipeptide_tps.nc", "r")
print storage.file_size_str
n_snapshots = len(storage.snapshots)
print n_snapshots
stateA = storage.volumes['C_7eq']
stateB = storage.volumes['alpha_R']
Now we do the main calculation: every snapshot must not be in a state, and we never re-use a snapshot. (In other words, randomly chosen without replacement.)
In addition, OPS snapshots are always listed in pairs, with velocities reversed. (The data is only stored once, but both can be accessed directly from the snapshot storage.) Because of this, we'll make sure we only take the even-numbered snapshots.
%%time
snapshots = []
while len(snapshots) < 1000:
random_choice = random.randint(0, (n_snapshots/2)-1)
snap = storage.snapshots[random_choice*2]
if not stateA(snap) and not stateB(snap) and snap not in snapshots:
snapshots.append(snap)
new_store = paths.Storage("snapshots.nc", "w")
new_store.save(snapshots);
# save the old engine because we'll re-use its topology later
new_store.save(storage.engines[0]);
new_store.sync()
new_store.close()
(1_select_snapshots.ipynb; 1_select_snapshots.py)
Committor Simulation¶
import openpathsampling as paths
import mdtraj as md
import numpy as np
from simtk import unit
input_storage = paths.Storage("snapshots.nc", "r")
print input_storage.file_size_str
print len(input_storage.snapshots)
# 2000 instead of 1000 because reversed snapshots are automatically counted
snapshots_to_run = input_storage.snapshots[::2]
# conveniently, we saved the engine; have to re-do everything else
engine = input_storage.engines[0]
# set up the collective variables for our states
phi = paths.MDTrajFunctionCV(name="phi",
f=md.compute_dihedrals,
topology=engine.topology,
indices=[[4, 6, 8, 14]])
psi = paths.MDTrajFunctionCV(name="psi",
f=md.compute_dihedrals,
topology=engine.topology,
indices=[[6, 8, 14, 16]])
# define our states
deg = 180.0/np.pi
C_7eq = (paths.PeriodicCVDefinedVolume(phi, lambda_min=-180/deg, lambda_max=0/deg,
period_min=-np.pi, period_max=np.pi) &
paths.PeriodicCVDefinedVolume(psi, lambda_min=100/deg, lambda_max=200/deg,
period_min=-np.pi, period_max=np.pi)
).named("C_7eq")
# similarly, without bothering with the labels:
alpha_R = (paths.PeriodicCVDefinedVolume(phi, -180/deg, 0/deg, -np.pi, np.pi) &
paths.PeriodicCVDefinedVolume(psi, -100/deg, 0/deg, -np.pi, np.pi)).named("alpha_R")
# OpenMM requires everything to have units
# beta = 1.0 / (k_B T)
temperature = 300.0 * unit.kelvin
beta = 1.0 / (temperature * unit.BOLTZMANN_CONSTANT_kB)
randomizer = paths.RandomVelocities(beta=beta, engine=engine)
output_storage = paths.Storage("committor_simulation.nc", "w")
simulation = paths.CommittorSimulation(storage=output_storage,
engine=engine,
states=[C_7eq, alpha_R],
randomizer=randomizer,
initial_snapshots=snapshots_to_run)
simulation.run(n_per_snapshot=10)
(2_committor_simulation.ipynb; 2_committor_simulation.py)
Committor Analysis¶
(Note: some of this requires OpenPathSampling 0.9.1 or later)
The particular approach used in this notebook is to save each shot as a separate "experiment," where each of the experiements is a tuple matching initial snapshot to final state. That was a peculiarity of the project from which this example originated. Typically, you could just go directly to the analysis of the shooting points. However, if you already have data from elsewhere that can be put into this format, the next notebook shows you how you can build a committor analysis from such a list.
import openpathsampling as paths
%%time
simulation_storage = paths.AnalysisStorage("committor_simulation.nc")
C_7eq = simulation_storage.volumes['C_7eq']
alpha_R = simulation_storage.volumes['alpha_R']
analyzer = paths.ShootingPointAnalysis(steps=None, states=[C_7eq, alpha_R])
%%time
# the shooting point snapshot for each
shooting_pts = [analyzer.step_key(step) for step in simulation_storage.steps]
%%time
# get the final states from each partial trajectory
final_states_list = [analyzer.analyze_single_step(step) for step in simulation_storage.steps]
# check that there's only one state per item in that list
for f in final_states_list:
assert len(f) == 1
# flatten the list
final_states = [f[0] for f in final_states_list]
experiments = zip(shooting_pts, final_states)
output_storage = paths.Storage("committor_results.nc", "w")
output_storage.save(C_7eq)
output_storage.save(alpha_R)
output_storage.tag['experiments'] = experiments
output_storage.sync()
output_storage.close()
(3_committor_analysis.ipynb; 3_committor_analysis.py)
Analysis help¶
This covers stuff that you will need to know in order to use the committor_results.nc
file.
%matplotlib inline
import matplotlib.pyplot as plt
import openpathsampling as paths
import numpy as np
import pandas as pd
pd.options.display.max_rows = 10
storage = paths.Storage("committor_results.nc", "r")
phi = storage.cvs['phi']
psi = storage.cvs['psi']
%%time
C_7eq = storage.volumes['C_7eq']
alpha_R = storage.volumes['alpha_R']
experiments = storage.tag['experiments']
The experiments
object is a list of tuples (snapshot, final_state)
. Each snapshot
is an OPS snapshot object (a point in phase space), and the final_state
is either the C_7eq
object or the alpha_R
object.
Directly obtaining a committor analysis¶
As it happens, experiments
is in precisely the correct format to be used in one of the approaches to constructing a committor analysis.
This section requires OpenPathSampling 0.9.1 or later.
%%time
committor_analyzer = paths.ShootingPointAnalysis.from_individual_runs(experiments)
Before going further, let's talk a little bit about the implementation of the ShootingPointAnalysis
object. The main thing to understand is that the purpose of that object is to histogram according to configuration. The first snapshot encountered is kept as a representative of that configuration.
So whereas there are 10000 snapshots in experiments
(containing the full data, including velocities), there are only 1000 entries in the committor_analyzer
(because, in this data set, I ran 1000 snapshots with 10 shots each.)
Per-configuration results¶
The .to_pandas()
function creates a pandas table with configurations as the index, the final states as columns, and the number of times that configuration led to that final state as entries. With no argument, to_pandas()
using the an integer for each configuration.
committor_analyzer.to_pandas()
You can also pass it a function that takes a snapshot and returns a (hashable) value. That value will be used for the index. These collective variables return numpy arrays, so we need to cast the 1D array to a float
.
psi_hash = lambda x : float(psi(x))
committor_analyzer.to_pandas(label_function=psi_hash)
You can also directly obtain the committor as a dictionary of (representative) snapshot to committor value. The committor here is defines as the probability of ending in a given state, so you must give the state.
committor = committor_analyzer.committor(alpha_R)
# show the first 10 values
{k: committor[k] for k in committor.keys()[:10]}
Committor histogram in 1D¶
hist1D, bins = committor_analyzer.committor_histogram(psi_hash, alpha_R, bins=20)
bin_widths = [bins[i+1]-bins[i] for i in range(len(bins)-1)]
plt.bar(left=bins[:-1], height=hist1D, width=bin_widths, log=True);
Committor histogram in 2D¶
ramachandran_hash = lambda x : (float(phi(x)), float(psi(x)))
hist2D, bins_phi, bins_psi = committor_analyzer.committor_histogram(ramachandran_hash, alpha_R, bins=20)
# not the best, since it doesn't distinguish NaNs, but that's just a matter of plotting
plt.pcolor(bins_phi, bins_psi, hist2D.T, cmap="winter")
plt.clim(0.0, 1.0)
plt.colorbar();
Obtaining information from the snapshots¶
The information committor_results.nc
should be everything you could want, including initial velocities for every system. In principle, you'll mainly access that information using collective variables (see documentation on using MDTraj to create OPS collective variables). However, you may decide to access that information directly, so here's how you do that.
# let's take the first shooting point snapshot
# experiments[N][0] gives shooting snapshot for experiment N
snapshot = experiments[0][0]
OpenMM-based objects come with units. So snapshot.coordinates
is a unitted value. This can be annoying in analysis, so we have a convenience snapshot.xyz
to get the version without units.
snapshot.coordinates
snapshot.xyz
For velocities, we don't have the convenience function, but if you want to remove units from velocities you can do so with velocity / velocity.unit
.
snapshot.velocities
snapshot.velocities / snapshot.velocities.unit
Note that snapshots include coordinates and velocities. We have several sets of initial velocities for each initial snapshot. Taking the second shooting snapshot and comparing coordinates and velocities:
snapshot2 = experiments[1][0]
np.all(snapshot.coordinates == snapshot2.coordinates)
np.any(snapshot.velocities == snapshot2.velocities)