Creating collective variables
Why OPS wraps CVs
One of the powerful tools in OPS is its storage subsystem, which not only
stores the data generated by the path sampling simulation, but also the
objects necessary to run the simulation. In addition, OPS CVs can store
their results to disk (we call this “disk caching”), which facilitates
later analysis. In order to associate the function with its cached results,
the function must be wrapped in an OPS CollectiveVariable
object.
Subclasses of that class are designed to make practical use easier and more
efficient, and will be described below.
Integrations with other packages
There’s no need to reinvent the wheel: many other packages already implement the kinds of functions that you’re likely to use as CVs. And the process of communicating with one of those packages to get the CV out (e.g., converting to their trajectory format) tends to be standard within that package. As a result, we’ve created special wrappers that make it very easy to communicate with those packages. There are special CV wrappers for:
If you’re calculating something common (distances, angles, dihedrals, or any of the many other widely-used collective variables), then the best approach is to use one of those packages.
As an example, here’s how one calculates distances using MDTraj (assuming
MDTraj has been imported as md
):
distance_cv = paths.MDTrajFunctionCV("dist", md.compute_distances,
topology, atom_pairs=[[0, 1]])
A CV using the MDTraj integration requires a name, the MDTraj function to be
used, and an OPS topology object. Any additional parameters required by the
MDTraj function must be given with keyword arguments (or “kwargs”); i.e.,
the atom_pairs=
part above must be present. See the documentation of
the packages themselves for relevant keyword arguments.
Wrapping your own function in a CV
Sometimes you’ll need to write your own function. This section, and the ones that follow it, will explain how to do that. As an example, let’s consider the simple 2D toy models used in several examples. These can be thought of as a ball moving on a 2D surface.
As a first example, let’s create a CV that calculates the distance of that ball from the origin:
def distance(snapshot):
import math
x = snapshot.xyz[0][0]
y = snapshot.xyz[0][1]
return math.sqrt(x**2 + y**2)
distance_cv = paths.FunctionCV("dist", distance)
Note that the import
statement is including inside the function body.
This is normally discouraged in Python, but here we use this as a way to
ensure that the math
package is also imported when the CV is reloaded
from storage. The next two lines get the \(x\) and \(y\) positions
of the particle. The function then returns the distance. On the last line,
we wrap this function in an OPS FunctionCV
.
In principle, a CV can return anything. However, CVs that are intended to be
used as inputs to Volume
objects (states or interfaces) should
return a single numeric value.
For a case like this, it may be better to wrap in the function using a
CoordinateFunctionCV
, since the it only depends on the
coordinates. This can offer some small perfomance/storage improvements when
using methods that involve path reversal (changing the direction of time).
To test that your CV gives the result you expect, create a snapshot called
snap
and try:
distance_cv(snap)
Generalizing your CV function
In the toy examples, we use distances from a central point to define CVs related to each state. For different states, the central point is different. We could create a new function, as above, for each state. But those functions would be quite repetitive, so OPS enables an easier way to do this:
def distance(snapshot, center):
import math
x = snapshot.xyz[0][0]
y = snapshot.xyz[0][1]
return math.sqrt((x - center[0])**2 + (y - center[1])**2)
distance_cv = paths.CoordinateFunctionCV("dist", distance,
center=[-0.5, -0.5])
Now we can specify a second CV based on the same function, just by changing
the center
parameter.
Your input function can take arbitrary keyword arguments (or “kwargs”), and
you just need to pass those arguments to the CV wrapper. Note that, as with
the specialized wrappers above, you must create your CVs with explicit
keyword arguments. That is, the center=
part in the last line is
required.
Using CVs based on other CVs
What if I had a similar problem, but instead of the 2D plane being the actual coordinates, it was a plane defined by two other collective variables (such as two distances)? In OPS, we can create CVs that use other CVs as input. In the example below, I’ll first create CVs to get the \(x\) and \(y\) positions, then I’ll use those as inputs to a distance function:
def get_coord(snapshot, element):
return snapshot.xyz[0][element]
cv_x = paths.CoordinateFunctionCV("cv_x", get_coord, element=0)
cv_y = paths.CoordinateFunctionCV("cv_y", get_coord, element=1)
def distance(snapshot, cv_x, cv_y, center):
import math
x = cv_x(snapshot)
y = cv_y(snapshot)
return math.sqrt((x - center[0])**2 + (y - center[1])**2)
distance_cv = paths.CoordinateFunctionCV("dist", distance,
cv_x=cv_x, cv_y=cv_y,
center=[-0.5, -0.5])
In addition to being convenient, this can approach can also be useful if your CV calculations are expensive. If a part of your CV is recalculated, you can make it into its own CV. Once it has been calculated once, it will be cached (and disk-cached, if requested), meaning that it can be re-used without regenerating it.
Another way this can improve performance is that sometimes, it’s easier to calculate multiple CVs at once, rather than calculating them individually. For example, the overhead for converting to another trajectory format (such as MDTraj) may become large. Here’s a way to avoid that:
all_dists = paths.MDTrajFunctionCV("all_dists", md.compute_distances,
topology, atom_pairs=[[0, 1], [0, 2]])
def select_distance(snapshot, all_dists_cv, pair_number):
return all_dists_cv(snapshot)[pair_number]
dist_0_1 = paths.CoordinateFunctionCV("dist_0_1", select_distance,
all_dists_cv=all_dists,
pair_number=0)
dist_0_2 = paths.CoordinateFunctionCV("dist_0_2", select_distance,
all_dists_cv=all_dists,
pair_number=1)
Let’s say that your code later uses dist_0_1
and then dist_0_2
on a
snapshot called snap
. When dist_0_1(snap)
is called, the
all_dists
CV is used, meaning that we use MDTraj to calculate the
distance. But all the distances are calculated, and the result is cached
(always cached in memory, cached to disk if that has been enabled). When
dist_0_2(snap)
is called, it finds that all_dists
was already called
on snap
, so it pulls the result from the cache instead of actually using
MDTraj again.
Security: Loading from storage without the function
The ability to store an arbitrary function is extremely powerful. However, it can also be extremely dangerous. Since the function could, in principle, do anything, someone could create a malicious CV that does damage to your system. To prevent that, you can set a “safe mode” that prevents the storage from loading the Python functions associated with CVs. In safe mode, you can still read the disk-cached values of the CV, but you can’t calculate the value of the CV for a snapshot that doesn’t have a disk-cached result. If you need to calculate the value for new snapshots, you can associate the CV with a function after reloading it in safe mode. It is your responsibility to ensure that the function is the same as the original one! Note that this approach can also be used if Python changing its internal object code (as happened between Python 2 and Python 3).
If you are loading an OPS file from an unknown/untrusted source, we always recommend using safe mode.