NaN Handling in Engines

Developers of DynamicsEngine subclasses have several options for how their engines should behave if they encounter a NaN.


tutorial_handle_nan

How to deal with errors in engines

Imports

In [1]:
import openpathsampling as paths

import openpathsampling.engines.openmm as dyn_omm
import openpathsampling.engines as dyn
from simtk.openmm import app
import simtk.openmm as mm
import simtk.unit as unit

import mdtraj as md

import numpy as np

Setting up the engine

Now we set things up for the OpenMM simulation. We will need a openmm.System object and an openmm.Integrator object.

To learn more about OpenMM, read the OpenMM documentation. The code we use here is based on output from the convenient web-based OpenMM builder.

In [2]:
# this cell is all OpenMM specific
forcefield = app.ForceField('amber96.xml', 'tip3p.xml')
pdb = app.PDBFile("../resources/AD_initial_frame.pdb")
system = forcefield.createSystem(
    pdb.topology, 
    nonbondedMethod=app.PME, 
    nonbondedCutoff=1.0*unit.nanometers,
    constraints=app.HBonds, 
    rigidWater=True,
    ewaldErrorTolerance=0.0005
)
hi_T_integrator = mm.LangevinIntegrator(
    500*unit.kelvin, 
    1.0/unit.picoseconds, 
    2.0*unit.femtoseconds)
hi_T_integrator.setConstraintTolerance(0.00001)

The storage file will need a template snapshot. In addition, the OPS OpenMM-based Engine has a few properties and options that are set by these dictionaries.

In [3]:
template = dyn_omm.snapshot_from_pdb("../resources/AD_initial_frame.pdb")
openmm_properties = {'OpenCLPrecision': 'mixed'}
engine_options = {
    'n_frames_max': 2000,
    'nsteps_per_frame': 10
}
In [4]:
engine = dyn_omm.Engine(
    template.topology, 
    system, 
    hi_T_integrator, 
    openmm_properties=openmm_properties,
    options=engine_options
).named('500K')
In [5]:
engine.initialize('OpenCL')

Defining states

We define stupid non-existant states which we can never hit. Good grounds to generate nan or too long trajectories.

In [6]:
volA = paths.EmptyVolume()
volB = paths.EmptyVolume()
In [7]:
init_traj_ensemble = paths.AllOutXEnsemble(volA) | paths.AllOutXEnsemble(volB)

Create a bad snapshot

In [8]:
nan_causing_template = template.copy()
kinetics = template.kinetics.copy()
# this is crude but does the trick
kinetics.velocities = kinetics.velocities.copy()
kinetics.velocities[0] = \
    (np.zeros(template.velocities.shape[1]) + 1000000.) * \
    unit.nanometers / unit.picoseconds
nan_causing_template.kinetics = kinetics
In [9]:
# generate trajectory that includes frame in both states
try:
    trajectory = engine.generate(nan_causing_template, [init_traj_ensemble.can_append])
except dyn.EngineNaNError as e:
    print 'we got NaNs, oh no.'
    print 'last valid trajectory was of length %d' % len(e.last_trajectory)
except dyn.EngineMaxLengthError as e:
    print 'we ran into max length.'
    print 'last valid trajectory was of length %d' % len(e.last_trajectory)
we got NaNs, oh no.
last valid trajectory was of length 1

Now we will make a long trajectory

In [10]:
engine.options['n_frames_max'] = 10
engine.options['on_max_length'] = 'fail'
In [11]:
# generate trajectory that includes frame in both states
try:
    trajectory = engine.generate(template, [init_traj_ensemble.can_append])
except dyn.EngineNaNError as e:
    print 'we got NaNs, oh no.'
    print 'last valid trajectory was of length %d' % len(e.last_trajectory)
except dyn.EngineMaxLengthError as e:
    print 'we ran into max length.'
    print 'last valid trajectory was of length %d' % len(e.last_trajectory)
0 0 2 10
0 0 3 10
0 0 4 10
0 0 5 10
0 0 6 10
0 0 7 10
0 0 8 10
0 0 9 10
0 0 10 10
we ran into max length.
last valid trajectory was of length 10

What, if that happens inside of a simulation?

In [12]:
mover = paths.ForwardShootMover(
    ensemble=init_traj_ensemble, 
    selector=paths.UniformSelector(),
    engine=engine)

Should run indefinitely and hit the max frames of 10.

In [13]:
init_sampleset = paths.SampleSet([paths.Sample(
            trajectory=paths.Trajectory([template] * 5),
            replica=0,
            ensemble = init_traj_ensemble
    )])

Run the PathMover and check the change

In [14]:
change = mover.move(init_sampleset)
0 0 2 10
0 0 3 10
0 0 4 10
0 0 5 10
0 0 6 10
0 0 7 10
0 0 8 10
0 0 9 10
0 0 10 10
In [15]:
assert(isinstance(change, paths.movechange.RejectedMaxLengthSampleMoveChange))
In [16]:
assert(not change.accepted)
In [17]:
change.samples[0].details.__dict__.get('stopping_reason')
Out[17]:
'max_length'

Let's try again what happens when nan is encountered

In [18]:
init_sampleset = paths.SampleSet([paths.Sample(
            trajectory=paths.Trajectory([nan_causing_template] * 5),
            replica=0,
            ensemble = init_traj_ensemble
    )])
In [19]:
change = mover.move(init_sampleset)
In [20]:
assert(isinstance(change, paths.movechange.RejectedNaNSampleMoveChange))
In [21]:
assert(not change.accepted)
In [22]:
change.samples[0].details.__dict__.get('stopping_reason')
Out[22]:
'nan'

Change the behaviour of the engine to ignore nans. This is really not advised, because not all platforms support this. CPU will always throw an nan error and

In [23]:
engine.options['on_nan'] = 'ignore'
engine.options
Out[23]:
{'n_frames_max': 10,
 'n_steps_per_frame': 10,
 'on_error': 'fail',
 'on_max_length': 'fail',
 'on_nan': 'ignore',
 'on_retry': 'full',
 'retries_when_error': 0,
 'retries_when_max_length': 0,
 'retries_when_nan': 2,
 'timestep': None}
In [24]:
change = mover.move(init_sampleset)
0 0 2 10
0 0 3 10
0 0 4 10
0 0 5 10
0 0 6 10
0 0 7 10
0 0 8 10
0 0 9 10
0 0 10 10
In [ ]:
 

(tutorial_handle_nan.ipynb; tutorial_handle_nan.py)