Creating collective variables

Why OPS wraps CVs

One of the powerful tools in OPS is its storage subsystem, which not only stores the data generated by the path sampling simulation, but also the objects necessary to run the simulation. In addition, OPS CVs can store their results to disk (we call this “disk caching”), which facilitates later analysis. In order to associate the function with its cached results, the function must be wrapped in an OPS CollectiveVariable object. Subclasses of that class are designed to make practical use easier and more efficient, and will be described below.

Integrations with other packages

There’s no need to reinvent the wheel: many other packages already implement the kinds of functions that you’re likely to use as CVs. And the process of communicating with one of those packages to get the CV out (e.g., converting to their trajectory format) tends to be standard within that package. As a result, we’ve created special wrappers that make it very easy to communicate with those packages. There are special CV wrappers for:

If you’re calculating something common (distances, angles, dihedrals, or any of the many other widely-used collective variables), then the best approach is to use one of those packages.

As an example, here’s how one calculates distances using MDTraj (assuming MDTraj has been imported as md):

distance_cv = paths.MDTrajFunctionCV("dist", md.compute_distances,
                                     topology, atom_pairs=[[0, 1]])

A CV using the MDTraj integration requires a name, the MDTraj function to be used, and an OPS topology object. Any additional parameters required by the MDTraj function must be given with keyword arguments (or “kwargs”); i.e., the atom_pairs= part above must be present. See the documentation of the packages themselves for relevant keyword arguments.

Wrapping your own function in a CV

Sometimes you’ll need to write your own function. This section, and the ones that follow it, will explain how to do that. As an example, let’s consider the simple 2D toy models used in several examples. These can be thought of as a ball moving on a 2D surface.

As a first example, let’s create a CV that calculates the distance of that ball from the origin:

def distance(snapshot):
    import math
    x = snapshot.xyz[0][0]
    y = snapshot.xyz[0][1]
    return math.sqrt(x**2 + y**2)

distance_cv = paths.FunctionCV("dist", distance)

Note that the import statement is including inside the function body. This is normally discouraged in Python, but here we use this as a way to ensure that the math package is also imported when the CV is reloaded from storage. The next two lines get the \(x\) and \(y\) positions of the particle. The function then returns the distance. On the last line, we wrap this function in an OPS FunctionCV.

In principle, a CV can return anything. However, CVs that are intended to be used as inputs to Volume objects (states or interfaces) should return a single numeric value.

For a case like this, it may be better to wrap in the function using a CoordinateFunctionCV, since the it only depends on the coordinates. This can offer some small perfomance/storage improvements when using methods that involve path reversal (changing the direction of time).

To test that your CV gives the result you expect, create a snapshot called snap and try:

distance_cv(snap)

Generalizing your CV function

In the toy examples, we use distances from a central point to define CVs related to each state. For different states, the central point is different. We could create a new function, as above, for each state. But those functions would be quite repetitive, so OPS enables an easier way to do this:

def distance(snapshot, center):
    import math
    x = snapshot.xyz[0][0]
    y = snapshot.xyz[0][1]
    return math.sqrt((x - center[0])**2 + (y - center[1])**2)

distance_cv = paths.CoordinateFunctionCV("dist", distance,
                                         center=[-0.5, -0.5])

Now we can specify a second CV based on the same function, just by changing the center parameter.

Your input function can take arbitrary keyword arguments (or “kwargs”), and you just need to pass those arguments to the CV wrapper. Note that, as with the specialized wrappers above, you must create your CVs with explicit keyword arguments. That is, the center= part in the last line is required.

Using CVs based on other CVs

What if I had a similar problem, but instead of the 2D plane being the actual coordinates, it was a plane defined by two other collective variables (such as two distances)? In OPS, we can create CVs that use other CVs as input. In the example below, I’ll first create CVs to get the \(x\) and \(y\) positions, then I’ll use those as inputs to a distance function:

def get_coord(snapshot, element):
    return snapshot.xyz[0][element]

cv_x = paths.CoordinateFunctionCV("cv_x", get_coord, element=0)
cv_y = paths.CoordinateFunctionCV("cv_y", get_coord, element=1)

def distance(snapshot, cv_x, cv_y, center):
    import math
    x = cv_x(snapshot)
    y = cv_y(snapshot)
    return math.sqrt((x - center[0])**2 + (y - center[1])**2)

distance_cv = paths.CoordinateFunctionCV("dist", distance,
                                         cv_x=cv_x, cv_y=cv_y,
                                         center=[-0.5, -0.5])

In addition to being convenient, this can approach can also be useful if your CV calculations are expensive. If a part of your CV is recalculated, you can make it into its own CV. Once it has been calculated once, it will be cached (and disk-cached, if requested), meaning that it can be re-used without regenerating it.

Another way this can improve performance is that sometimes, it’s easier to calculate multiple CVs at once, rather than calculating them individually. For example, the overhead for converting to another trajectory format (such as MDTraj) may become large. Here’s a way to avoid that:

all_dists = paths.MDTrajFunctionCV("all_dists", md.compute_distances,
                                   topology, atom_pairs=[[0, 1], [0, 2]])

def select_distance(snapshot, all_dists_cv, pair_number):
    return all_dists_cv(snapshot)[pair_number]

dist_0_1 = paths.CoordinateFunctionCV("dist_0_1", select_distance,
                                      all_dists_cv=all_dists,
                                      pair_number=0)

dist_0_2 = paths.CoordinateFunctionCV("dist_0_2", select_distance,
                                      all_dists_cv=all_dists,
                                      pair_number=1)

Let’s say that your code later uses dist_0_1 and then dist_0_2 on a snapshot called snap. When dist_0_1(snap) is called, the all_dists CV is used, meaning that we use MDTraj to calculate the distance. But all the distances are calculated, and the result is cached (always cached in memory, cached to disk if that has been enabled). When dist_0_2(snap) is called, it finds that all_dists was already called on snap, so it pulls the result from the cache instead of actually using MDTraj again.

Security: Loading from storage without the function

The ability to store an arbitrary function is extremely powerful. However, it can also be extremely dangerous. Since the function could, in principle, do anything, someone could create a malicious CV that does damage to your system. To prevent that, you can set a “safe mode” that prevents the storage from loading the Python functions associated with CVs. In safe mode, you can still read the disk-cached values of the CV, but you can’t calculate the value of the CV for a snapshot that doesn’t have a disk-cached result. If you need to calculate the value for new snapshots, you can associate the CV with a function after reloading it in safe mode. It is your responsibility to ensure that the function is the same as the original one! Note that this approach can also be used if Python changing its internal object code (as happened between Python 2 and Python 3).

If you are loading an OPS file from an unknown/untrusted source, we always recommend using safe mode.