Setting up sample sets
Or: How do I get the initial conditions?
Path sampling methods such as TIS involve simultaneously sampling multiple
path ensembles. This means that we need to know not only the trajectory, but
also which ensemble it came from. For this reason, OPS uses the data object
Sample to associate a trajectory with an ensemble, and SampleSet to collect
multiple Sample instances.
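As a rough mental model, the relationship can be pictured with plain Python stand-ins. ToySample and ToySampleSet below are hypothetical names invented for this illustration; they are not the OPS classes, which carry more information and behavior.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class ToySample:
    """Hypothetical stand-in: ties one trajectory to one ensemble."""
    replica: int
    trajectory: Any   # in OPS this would be a trajectory object
    ensemble: str     # in OPS this would be an ensemble object


class ToySampleSet:
    """Hypothetical stand-in: a collection of samples, grouped on demand."""

    def __init__(self, samples: List[ToySample]):
        self.samples = list(samples)

    def by_ensemble(self) -> Dict[str, List[ToySample]]:
        # Group samples by the ensemble each one is associated with.
        grouped: Dict[str, List[ToySample]] = {}
        for sample in self.samples:
            grouped.setdefault(sample.ensemble, []).append(sample)
        return grouped


# Made-up trajectories (lists of numbers) and made-up ensemble labels.
toy_set = ToySampleSet([
    ToySample(replica=0, trajectory=[0.1, 0.4, 0.9], ensemble="interface_0"),
    ToySample(replica=1, trajectory=[0.1, 0.6, 1.2], ensemble="interface_1"),
])
grouped = toy_set.by_ensemble()
```

The point of the sketch is only that a trajectory on its own is not enough: each entry records which ensemble the trajectory belongs to.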
For simulations such as PathSampling, you must provide a SampleSet as
initial conditions for the simulation. This document deals with several ways
to associate trajectories with ensembles, assuming you’ve already generated
a valid trajectory. See Getting an initial trajectory for details on
generating the trajectory.
The following sections provide several options for how to get a SampleSet
once you’ve obtained the relevant trajectories and have the ensemble objects
(often contained in a TransitionNetwork that you’ve created). We’ll discuss
the advantages and disadvantages of each approach.
Loading from a file
This is the easiest approach, and probably the one you want to use whenever possible.
import openpathsampling as paths
# open an existing OPS storage file read-only
storage = paths.Storage("myfile.nc", mode='r')
# often you'll want to load the state of the last-saved MC step:
final_step = storage.steps[-1]
sample_set = final_step.active
# alternatively, you might want a specific sample set you stored
sample_set = storage.sample_sets[42] # if you know you want 42
This returns exactly the sample set that previously existed, including the connection to the previously used ensembles. Although this is probably the best approach for most use cases, there are important situations where you would not use it:
If you don’t already have a file with OPS sample sets (chicken and egg, right?)
If you don’t want to associate the trajectories with the same ensembles as before. This might be because you’re changing the network that you’re sampling, e.g., using TPS trajectories as initial conditions for TIS, or changing the TIS network you’re using.
Using the move scheme
The move scheme knows the list of all ensembles that it might require for the first move, so you should use it to ensure that your sample set includes representatives for every ensemble. It can also take trajectories and associate them with ensembles. This is a good approach for creating initial conditions the first time you set up a simulation.
# scheme is a MoveScheme object
# trajectories is a trajectory or list of trajectories
sample_set = scheme.initial_conditions_from_trajectories(trajectories)
This will also give some output on missing and extra ensembles. Ensembles are considered “missing” if they might be required as input for the move scheme, but they don’t have a trajectory associated with them in the sample set. Ensembles are considered “extra” if they have a representative in the sample set, but can’t be used by the move scheme (this cannot happen when the sample set is built through this setup process).
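The two definitions amount to set differences between the ensembles the scheme may need and the ensembles represented in the sample set. The sketch below is only that arithmetic on made-up string labels; classify_ensembles is a hypothetical helper, and this is not how OPS produces its report.

```python
def classify_ensembles(required, represented):
    """Return (missing, extra) ensemble labels as sorted lists.

    required:    labels the move scheme might need as input
    represented: labels that have a trajectory in the sample set
    """
    required = set(required)
    represented = set(represented)
    missing = sorted(required - represented)  # needed, but no trajectory
    extra = sorted(represented - required)    # has a trajectory, never used
    return missing, extra


# Hypothetical ensemble labels, used only for this illustration.
required_by_scheme = ["interface_0", "interface_1", "interface_2"]
in_sample_set = ["interface_0", "interface_2", "old_interface"]

missing, extra = classify_ensembles(required_by_scheme, in_sample_set)
print("missing:", missing)
print("extra:", extra)
```

A sample set is ready to use as initial conditions only when the missing list is empty.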
Aside: Sanity checks
There are a few ways to make sure that your sample set is reasonable for
your simulation. OPS will automatically run these before running a path
sampling simulation, but you can check them yourself. Note that they rely on
assert statements, so they won’t work if you disable asserts with python -O.
# assert that each trajectory can be in the associated ensemble
sample_set.sanity_check()
# assert that the sample set has the right ensembles represented to be
# initial conditions for the move scheme
scheme.assert_initial_conditions(sample_set)
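Because Python strips assert statements when run with -O (the built-in constant __debug__ is False in that mode), a check that must always run has to raise an exception explicitly. The following is a minimal sketch of that pattern on toy data; require and toy_check are hypothetical helpers, not part of OPS.

```python
def asserts_enabled() -> bool:
    """True unless Python was started with -O, which sets __debug__ to False."""
    return __debug__


def require(condition: bool, message: str) -> None:
    """Hypothetical helper: raises even when asserts are disabled."""
    if not condition:
        raise ValueError(message)


def toy_check(sample_ensembles, required_ensembles):
    """Toy validation: every required ensemble label must be represented."""
    absent = set(required_ensembles) - set(sample_ensembles)
    require(not absent, f"missing ensembles: {sorted(absent)}")


# A sample set (here just labels) that lacks one required ensemble.
try:
    toy_check(["interface_0"], ["interface_0", "interface_1"])
    check_failed = False
except ValueError as err:
    check_failed = True
    failure_message = str(err)
```

In a script that depends on the assert-based checks, calling something like asserts_enabled() first is one way to notice that they would be silently skipped.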
Other approaches for sample sets
The first two use cases, loading from a file and using the move scheme’s
initial_conditions_from_trajectories method, will probably meet nearly all
of your needs. However, there are a few other approaches. These are legacy
approaches that existed before the more general and simpler approaches were
fully stabilized, but they might still be useful.
Mapping equivalent ensembles
All objects in OpenPathSampling have a universally unique identifier (UUID) that gets set when they are created. However, it is possible to create two objects (e.g., two ensembles) that are equivalent, but do not share the same UUID. This would occur if you created the same ensemble in two different networks (e.g., by creating a new network with fewer ensembles than the original one). In that case, you can translate the old sample set onto the equivalent new ensembles:
sample_set = paths.SampleSet.translate_ensembles(old_sample_set, new_ensembles)
The main use case where this would make more sense than using the move scheme is when you want to ensure that the ensemble for each trajectory is preserved, e.g., when continuing a simulation with a modified network. However, be aware that there’s no guarantee that the analysis tools will correctly handle data that combines results from both networks.
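The distinction between "equivalent" and "same UUID" can be illustrated with a toy class whose equality ignores its identifier. Everything below (ToyEnsemble, translate, the interface values) is hypothetical and written for this illustration; in particular, dropping trajectories that have no equivalent ensemble is a choice made for the sketch, not a statement about what translate_ensembles does.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class ToyEnsemble:
    """Hypothetical stand-in for an ensemble defined by one interface value."""
    interface: float
    # compare=False: the identifier is ignored when testing equivalence
    uid: uuid.UUID = field(default_factory=uuid.uuid4, compare=False)


def translate(old_pairs, new_ensembles):
    """Re-associate trajectories with equivalent ensembles from a new list.

    old_pairs is a list of (trajectory, old_ensemble) tuples. In this toy,
    pairs whose ensemble has no equivalent in new_ensembles are dropped.
    """
    translated = []
    for trajectory, old_ens in old_pairs:
        for new_ens in new_ensembles:
            if new_ens == old_ens:  # same definition, possibly different uid
                translated.append((trajectory, new_ens))
                break
    return translated


old_a, old_b = ToyEnsemble(0.2), ToyEnsemble(0.3)
new_a = ToyEnsemble(0.2)   # same definition as old_a, fresh identifier

new_pairs = translate([("traj_a", old_a), ("traj_b", old_b)], [new_a])
```

Here traj_a ends up attached to new_a, which is equivalent to old_a but is a distinct object with its own identifier.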
Manually matching trajectories and ensembles
Of course, you can always manually create samples, and put them into a sample set:
samp0 = paths.Sample(replica=0, trajectory=traj0, ensemble=ens0)
samp1 = paths.Sample(replica=1, trajectory=traj1, ensemble=ens1)
...
sampN = paths.Sample(replica=N, trajectory=trajN, ensemble=ensN)
sample_set = paths.SampleSet([samp0, samp1, ..., sampN])
In all cases, we strongly recommend that you double-check the correctness of the sample set using the sanity checks listed above as soon as you’ve created it. This can prevent confusion later.