# Committor Simulation and Analysis for Alanine Dipeptide

This sequence of notebooks provides an example of how to calculate the committor for a real biomolecular system (alanine dipeptide).

The first notebook deals with selecting the snapshots to use as starting
points for the committor. In this particular case, we didn’t care about the
statistics of those points, so we select them directly from `storage`

.
This is because these points were being harvested as part of an approach to
estimate the committor at any given point. If you need the statistics of
those points within the TPS ensemble, e.g., for the transition state
ensemble, then you should select your points differently (by looping over
`storage.steps`

and taking the `active`

sample set).

The second notebook runs the committor simulation. The third notebook
prepares the analysis data in a way that was needed for this particular
project, where each “shot” of the committor was treated as a separate
“experiment.” While this approach is not normally needed, if you already
have data that can easily be put into this format, the fourth notebook shows
how it can be used to construct a `ShootingPointAnalysis`

object.
That notebook also proceeds to demonstrate the kinds of analysis that object
can perform.

# Select Snapshots¶

This notebook will take an old (and large) TPS simulation file, select some snapshots to use as input data.

Note: this first version is quick and dirty. There might be some points to consider to select better snapshots. But this is just intended to get initial data to our colleagues.

```
import openpathsampling as paths
import random
```

```
%%time
storage = paths.Storage("alanine_dipeptide_tps.nc", "r")
```

```
print storage.file_size_str
```

```
n_snapshots = len(storage.snapshots)
print n_snapshots
```

```
stateA = storage.volumes['C_7eq']
stateB = storage.volumes['alpha_R']
```

Now we do the main calculation: every snapshot must not be in a state, and we never re-use a snapshot. (In other words, randomly chosen without replacement.)

In addition, OPS snapshots are always listed in pairs, with velocities reversed. (The data is only stored once, but both can be accessed directly from the snapshot storage.) Because of this, we'll make sure we only take the even-numbered snapshots.

```
%%time
snapshots = []
while len(snapshots) < 1000:
random_choice = random.randint(0, (n_snapshots/2)-1)
snap = storage.snapshots[random_choice*2]
if not stateA(snap) and not stateB(snap) and snap not in snapshots:
snapshots.append(snap)
```

```
new_store = paths.Storage("snapshots.nc", "w")
```

```
new_store.save(snapshots);
```

```
# save the old engine because we'll re-use its topology later
new_store.save(storage.engines[0]);
```

```
new_store.sync()
new_store.close()
```

(1_select_snapshots.ipynb; 1_select_snapshots.py)

# Committor Simulation¶

```
import openpathsampling as paths
import mdtraj as md
import numpy as np
from simtk import unit
```

```
input_storage = paths.Storage("snapshots.nc", "r")
```

```
print input_storage.file_size_str
print len(input_storage.snapshots)
```

```
# 2000 instead of 1000 because reversed snapshots are automatically counted
snapshots_to_run = input_storage.snapshots[::2]
```

```
# conveniently, we saved the engine; have to re-do everything else
engine = input_storage.engines[0]
# set up the collective variables for our states
phi = paths.MDTrajFunctionCV(name="phi",
f=md.compute_dihedrals,
topology=engine.topology,
indices=[[4, 6, 8, 14]])
psi = paths.MDTrajFunctionCV(name="psi",
f=md.compute_dihedrals,
topology=engine.topology,
indices=[[6, 8, 14, 16]])
```

```
# define our states
deg = 180.0/np.pi
C_7eq = (paths.PeriodicCVDefinedVolume(phi, lambda_min=-180/deg, lambda_max=0/deg,
period_min=-np.pi, period_max=np.pi) &
paths.PeriodicCVDefinedVolume(psi, lambda_min=100/deg, lambda_max=200/deg,
period_min=-np.pi, period_max=np.pi)
).named("C_7eq")
# similarly, without bothering with the labels:
alpha_R = (paths.PeriodicCVDefinedVolume(phi, -180/deg, 0/deg, -np.pi, np.pi) &
paths.PeriodicCVDefinedVolume(psi, -100/deg, 0/deg, -np.pi, np.pi)).named("alpha_R")
```

```
# OpenMM requires everything to have units
# beta = 1.0 / (k_B T)
temperature = 300.0 * unit.kelvin
beta = 1.0 / (temperature * unit.BOLTZMANN_CONSTANT_kB)
```

```
randomizer = paths.RandomVelocities(beta=beta, engine=engine)
```

```
output_storage = paths.Storage("committor_simulation.nc", "w")
```

```
simulation = paths.CommittorSimulation(storage=output_storage,
engine=engine,
states=[C_7eq, alpha_R],
randomizer=randomizer,
initial_snapshots=snapshots_to_run)
```

```
simulation.run(n_per_snapshot=10)
```

```
```

(2_committor_simulation.ipynb; 2_committor_simulation.py)

# Committor Analysis¶

(Note: some of this requires OpenPathSampling 0.9.1 or later)

The particular approach used in this notebook is to save each shot as a separate "experiment," where each of the experiements is a tuple matching initial snapshot to final state. That was a peculiarity of the project from which this example originated. Typically, you could just go directly to the analysis of the shooting points. However, if you already have data from elsewhere that can be put into this format, the next notebook shows you how you can build a committor analysis from such a list.

```
import openpathsampling as paths
```

```
%%time
simulation_storage = paths.AnalysisStorage("committor_simulation.nc")
C_7eq = simulation_storage.volumes['C_7eq']
alpha_R = simulation_storage.volumes['alpha_R']
```

```
analyzer = paths.ShootingPointAnalysis(steps=None, states=[C_7eq, alpha_R])
```

```
%%time
# the shooting point snapshot for each
shooting_pts = [analyzer.step_key(step) for step in simulation_storage.steps]
```

```
%%time
# get the final states from each partial trajectory
final_states_list = [analyzer.analyze_single_step(step) for step in simulation_storage.steps]
```

```
# check that there's only one state per item in that list
for f in final_states_list:
assert len(f) == 1
# flatten the list
final_states = [f[0] for f in final_states_list]
```

```
experiments = zip(shooting_pts, final_states)
```

```
output_storage = paths.Storage("committor_results.nc", "w")
```

```
output_storage.save(C_7eq)
output_storage.save(alpha_R)
output_storage.tag['experiments'] = experiments
```

```
output_storage.sync()
output_storage.close()
```

(3_committor_analysis.ipynb; 3_committor_analysis.py)

# Analysis help¶

This covers stuff that you will need to know in order to use the `committor_results.nc`

file.

```
%matplotlib inline
import matplotlib.pyplot as plt
import openpathsampling as paths
import numpy as np
import pandas as pd
pd.options.display.max_rows = 10
```

```
storage = paths.Storage("committor_results.nc", "r")
```

```
phi = storage.cvs['phi']
psi = storage.cvs['psi']
```

```
%%time
C_7eq = storage.volumes['C_7eq']
alpha_R = storage.volumes['alpha_R']
experiments = storage.tag['experiments']
```

The `experiments`

object is a list of tuples `(snapshot, final_state)`

. Each `snapshot`

is an OPS snapshot object (a point in phase space), and the `final_state`

is either the `C_7eq`

object or the `alpha_R`

object.

## Directly obtaining a committor analysis¶

As it happens, `experiments`

is in precisely the correct format to be used in one of the approaches to constructing a committor analysis.

This section requires OpenPathSampling 0.9.1 or later.

```
%%time
committor_analyzer = paths.ShootingPointAnalysis.from_individual_runs(experiments)
```

Before going further, let's talk a little bit about the implementation of the `ShootingPointAnalysis`

object. The main thing to understand is that the purpose of that object is to histogram according to configuration. The first snapshot encountered is kept as a representative of that configuration.

So whereas there are 10000 snapshots in `experiments`

(containing the full data, including velocities), there are only 1000 entries in the `committor_analyzer`

(because, in this data set, I ran 1000 snapshots with 10 shots each.)

### Per-configuration results¶

The `.to_pandas()`

function creates a pandas table with configurations as the index, the final states as columns, and the number of times that configuration led to that final state as entries. With no argument, `to_pandas()`

using the an integer for each configuration.

```
committor_analyzer.to_pandas()
```

You can also pass it a function that takes a snapshot and returns a (hashable) value. That value will be used for the index. These collective variables return numpy arrays, so we need to cast the 1D array to a `float`

.

```
psi_hash = lambda x : float(psi(x))
committor_analyzer.to_pandas(label_function=psi_hash)
```

You can also directly obtain the committor as a dictionary of (representative) snapshot to committor value. The committor here is defines as the probability of ending in a given state, so you must give the state.

```
committor = committor_analyzer.committor(alpha_R)
```

```
# show the first 10 values
{k: committor[k] for k in committor.keys()[:10]}
```

### Committor histogram in 1D¶

```
hist1D, bins = committor_analyzer.committor_histogram(psi_hash, alpha_R, bins=20)
```

```
bin_widths = [bins[i+1]-bins[i] for i in range(len(bins)-1)]
plt.bar(left=bins[:-1], height=hist1D, width=bin_widths, log=True);
```

### Committor histogram in 2D¶

```
ramachandran_hash = lambda x : (float(phi(x)), float(psi(x)))
hist2D, bins_phi, bins_psi = committor_analyzer.committor_histogram(ramachandran_hash, alpha_R, bins=20)
```

```
# not the best, since it doesn't distinguish NaNs, but that's just a matter of plotting
plt.pcolor(bins_phi, bins_psi, hist2D.T, cmap="winter")
plt.clim(0.0, 1.0)
plt.colorbar();
```

## Obtaining information from the snapshots¶

The information `committor_results.nc`

should be *everything* you could want, including initial velocities for every system. In principle, you'll mainly access that information using collective variables (see documentation on using MDTraj to create OPS collective variables). However, you may decide to access that information directly, so here's how you do that.

```
# let's take the first shooting point snapshot
# experiments[N][0] gives shooting snapshot for experiment N
snapshot = experiments[0][0]
```

OpenMM-based objects come with units. So `snapshot.coordinates`

is a unitted value. This can be annoying in analysis, so we have a convenience `snapshot.xyz`

to get the version without units.

```
snapshot.coordinates
```

```
snapshot.xyz
```

For velocities, we don't have the convenience function, but if you want to remove units from velocities you can do so with `velocity / velocity.unit`

.

```
snapshot.velocities
```

```
snapshot.velocities / snapshot.velocities.unit
```

Note that snapshots include coordinates and velocities. We have several sets of initial velocities for each initial snapshot. Taking the second shooting snapshot and comparing coordinates and velocities:

```
snapshot2 = experiments[1][0]
```

```
np.all(snapshot.coordinates == snapshot2.coordinates)
```

```
np.any(snapshot.velocities == snapshot2.velocities)
```