Compile input file: Overview

Input files to the compile command can be in either YAML or JSON format. Files in YAML format should have the extension .yaml or .yml and files in JSON format should have the extension .json or .jsn. Most examples here will use YAML format, although this page will include equivalent versions in JSON.

The advantages for the YAML format are that it generally requires less typing, and it allows comments. The main advantage of JSON is that it is easier to see the structure of mappings and lists than in YAML. Which you use is personal preference – the compile command reads both equally.

Terminology

There are a few terms that we will use to describe parts of the input file:

  • keyword: Like most input file formats, an import part of OPS compile files is mapping keywords that identify a property to a user-defined value for that property.

  • top-level keyword: Only certain keywords are allowed at the outermost level of the nested structure. These keywords are the top-level keywords. The top-level keywords in OPS are: engines, cvs, volumes, states, networks, and schemes.

Input files for OPS are structured as nested key-value pairs or lists. Each object that you will store in the database has certain keywords that are needed define it, and the values you assign to those keywords depend the simulation you intend to run.

Example: Basics and evaluated expressions

Let’s use an OpenMM engine as a simple example. Since this is an engine, we’ll put it under the engines top-level keyword. From the documentation on the OpenMM engine, we know that it requires several parameters:

engines:
  # note that this is a list of engines, even though we only have one!
  # this is why there is a '-' before the mapping
  - type: openmm  # must be exactly openmm to be identified correctly
    name: my_engine
    topology: topology.pdb       # note that filenames should be relative
    system: system.xml           # to the directory from which you run
    integrator: integrator.xml   # the compile command
    n_steps_per_frame: 1.0 / 0.002  # we allow math here!
    n_frames_max: 10000

We create this within the engines top-level keyword. An engine can only be created in a context where the CLI is expecting engines, and the top-level keyword gives us a way to create that context.

The type keyword is required for all engines (and, indeed, most objects), and it must exactly match the identifier string. Note, however, that comments are allowed.

We give this object the name my_engine. That name is a free choice for the user, but since it will be used on the command line, it can be useful to avoid spaces or other characters that require escaping in the shell.

The topology, system, and integrator parameters take filenames. These filenames should be relative to the path from which the openpathsampling compile command will be invoked.

The final two parameters, n_steps_per_frame and n_frames_max, are integers. However, for n_steps_per_frame, we provide a string which will be treated as an evaluated expression. In this example, we imagine that perhaps the integrator’s timestep is 0.002 ps and that the user wants to save a frame of the trajectory every 1.0 ps, so there are 1.0 / 0.002 = 500 steps per frame. Internally, this will be converted to an integer according to Python’s int builtin. For n_frames_max, we just give the integer value.

The same code could also be implemented in JSON:

{"engines": [{
  "type": "openmm",
  "name": "my_engine",
  "topology": "topology.pdb",
  "system": "system.xml",
  "integrator": "integrator.xml",
  "n_steps_per_frame": "1.0 / 0.002",
  "n_frames_max": 10000,
}]}

In JSON, the n_frames_max parameter could either be left as an integer (as it is) or it could be made a string. However, evaluated expressions, such as n_steps_per_frame, must be strings.

Example: Re-using a named object, optional arguments

CVs based on MDTraj require a topology, which can be provided either by a file or by an engine. Engines are always parsed before CVs, so we can use the engine that we defined above to provide the topology for our CV. Let’s create a couple of MDTraj-based CVs; see our documentation on compiling MDTraj CVs for more details on the keywords. Note that this assumes it is in the same file as the engine definition above.

cvs:
  # we're making 2 CVs; each one starts with a '-' so that YAML knows
  # this is a new item in our list
  - type: mdtraj
    name: dist_AB
    topology: my_engine
    func: compute_distances
    kwargs:
      atom_pairs: [[0, 10]]
    # not included: period_min, period_max. They're not required here.

  - type: mdtraj
    name: phi
    topology: my_engine
    func: compute_dihedrals
    kwargs:
      indices: [[4, 6, 8, 14]]
    period_min: -np.pi
    period_max: np.pi
    # Using evaluated expressions for the period

In both CVs, we give a type and a name, just as we did for the engine. In this case, we use the engine’s name for the topology argument. This is how you re-use objects within a file in input files for the compile command.