An image definition CWL file is a YAML file containing objects. An object is a data structure equivalent to the "object" type in JSON, consisting of an unordered set of name/value pairs (referred to here as fields), where the name is a string and the value is a string, number, boolean, array, or object.
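As a minimal sketch of what such an object looks like in YAML, the fragment below shows one field for each kind of value. The field names are hypothetical, chosen only for illustration, and are not part of CWL or SCHeMa.

```yaml
# Hypothetical object; none of these field names are defined by CWL or SCHeMa.
name: example          # string value
count: 3               # number value
enabled: true          # boolean value
tags: [alpha, beta]    # array value
nested:                # object value, with its own fields
  key: value
```

Because the set of fields is unordered, listing the same fields in a different order describes the same object.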
A process is a basic unit of computation which accepts input data, performs some computation, and produces output data. Examples include CommandLineTools, Workflows, and ExpressionTools. In this tutorial we will only deal with CommandLineTools.
Each CWL file required by SCHeMa must contain the following sections:
- General information about the process.
- An input object, which describes the inputs to an invocation of a process. It is based on an input schema, which describes the valid format (required fields, data types) for an input object.
- An output object, which describes the output resulting from an invocation of a process. It is based on an output schema, which describes the valid format for an output object.
Each component will be described in detail in the following sections.
General process information
The following is an example of a process information specification:
cwlVersion: v1.0
class: CommandLineTool
baseCommand: /home/bufet/bufet.bin
hints:
  DockerRequirement:
    dockerPull: zagganas/bufet:latest
cwlVersion
The cwlVersion field indicates the version of the CWL spec used by the document. Currently it is v1.0.
class
The class field indicates this document describes a command line tool.
baseCommand
The baseCommand field provides the name of the program that will actually run inside the container.
Retrieving image from an external repository (optional)
We need to provide some hints on how to find the image we want. In this case, we list our requirements for the Docker image under DockerRequirement. The dockerPull parameter takes the same value that you would pass to a docker pull command, that is, the name of the container image. You can also specify the tag, which is good practice when using containers for reproducible research.
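To illustrate the point about tags, here is a sketch of the same hint with a fixed tag instead of latest. The tag 1.0 is an assumption made for this example; it may not exist in the repository, so check which tags are actually published before using one.

```yaml
hints:
  DockerRequirement:
    # "1.0" is a hypothetical tag used only for illustration.
    # A fixed tag keeps later runs on the same image, whereas
    # "latest" may point to a different image over time.
    dockerPull: zagganas/bufet:1.0
```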
Input specification
The inputs of a tool are a list of input parameters that control how the tool is run. An example of an input object specification can be seen below:
inputs:
  miRNA-Gene interactions file:
    type: file
    inputBinding:
      position: 1
  Output file name:
    type: string
    default: /data/output.txt
    inputBinding:
      position: 2
  miRNA query file:
    type: file
    inputBinding:
      position: 3
  Ontology file:
    type: file
    inputBinding:
      position: 4
  Number of random miRNA groups:
    type: int
    default: 1000000
    inputBinding:
      position: 5
  Number of threads:
    type: int
    default: 8
    inputBinding:
      position: 6
      prefix: -nt
Each parameter has an id, which specifies the name of the parameter, and a type, which describes what values are valid for that parameter. Available types are string, int, long, float, double and file. Furthermore, for parameters other than file, the user can specify a default value by using the default field. The inputBinding field is used to provide more details about an input. More specifically, position defines the order in which the parameter is passed to the script inside the container. Additionally, if the input requires a prefix (usually starting with a hyphen or a double hyphen), the user can specify it in the prefix field.
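The fragment below is a small sketch of how position and prefix are expected to shape the command line. The first input is taken from this tutorial; the second, "Random seed" with the prefix --seed, is hypothetical and only illustrates a double-hyphen prefix. The comments assume standard CWL behavior, where a prefix and its value are passed as two separate arguments by default.

```yaml
inputs:
  Number of threads:
    type: int
    default: 8
    inputBinding:
      position: 6
      prefix: -nt        # with the default value, this contributes: -nt 8
  Random seed:           # hypothetical input, not accepted by the real tool
    type: int
    default: 42
    inputBinding:
      position: 7        # placed after the input at position 6
      prefix: --seed     # with the default value, this contributes: --seed 42
```

Inputs without a prefix contribute only their value, in the order given by position.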
Putting it all together
cwlVersion: v1.0
class: CommandLineTool
baseCommand: /home/bufet/bufet.bin
hints:
  DockerRequirement:
    dockerPull: zagganas/bufet:latest
inputs:
  miRNA-Gene interactions file:
    type: file
    inputBinding:
      position: 1
  Output file name:
    type: string
    default: /data/output.txt
    inputBinding:
      position: 2
  miRNA query file:
    type: file
    inputBinding:
      position: 3
  Ontology file:
    type: file
    inputBinding:
      position: 4
  Number of random miRNA groups:
    type: int
    default: 1000000
    inputBinding:
      position: 5
  Number of threads:
    type: int
    default: 8
    inputBinding:
      position: 6
      prefix: -nt
outputs: []