Run definition DSL
This page provides the full specification of the JSON-based DSL used to define runs. For a more progressive introduction to workflows, please refer to this guide.
JSON schema
A run file should contain a single JSON object formed of the following fields.
Name | Type | Description |
---|---|---|
workflow | string | Specification of the workflow to execute. |
name | string; optional | A human-readable name. |
notes | string; optional | Notes describing the purpose of this experiment. |
tags | string[]; optional | Some tags, used when searching for runs. |
seed | long; optional | Seed to be used for deterministic reproduction. If not specified, a random one will be generated. |
repeat | integer; optional; default: 1 | Number of times to repeat each run. |
params | object; optional | Mapping between parameter names and their values. |
Specifying parameters
When defining an experiment, you can specify values for parameters. You must specify a value for all parameters, except if they are only used by optional inputs. You can either specify a single value or multiple values for any parameter, triggering the execution of multiple versions of the same original workflow. Accio currently supports several ways to specify parameters, described in the next sections.
The different values that a parameter will take are defined via a JSON object whose only key is values
, mapped to a JSON array with all values taken by the parameter.
The order of values has no importance.
Values should be specified using the same format as for workflow input values.
For example:
{
"params": {
"epsilon": {
"values": [1, 0.1, 0.001, 0.0001]
}
}
}
Of course, a singleton can be passed as values
.
If at least one parameter has more than one possible value, the cross product of all values will be taken to determine the runs to actually trigger.
For example, 8 (4 x 2 x 1) different runs will be launched with the following configuration:
{
"params": {
"epsilon": {
"values": [1, 0.1, 0.001, 0.0001]
},
"uri": {
"values": ["/path/to/geolife", "/path/to/cabspotting"]
},
"level": {
"values": [12]
}
}
}
Controlling randomness with a seed
Some workflows may include operators marked as unstable, which means they need some source of randomness when being executed.
This randomness is provided through a seed.
By default, a random seed is generated for each run, but you may fix it through the seed
key.
If a run definition gives birth to several runs, the seed specified in the definition will be used to deterministically generate a seed for each child run.
If a run definition corresponds to a single run, the specified seed will be used directly.