# Exercise Configuration

ReCodEx has two kinds of exercise configuration: High Level Configuration (HiLC) and Low Level Configuration (LoLC). LoLC is used in the backend, for instance by workers, and should be general enough to describe all kinds of worker tasks. HiLC, on the other hand, should be simple enough to be written or composed by ordinary application users, preferably in a graphical editor. Either way, the configuration has to be stored somewhere, and this document describes how.

HiLC is divided into several parts, each of which takes care of a different thing: **ExerciseConfig**, **Pipelines**, **Limits** and **EnvironmentConfig**. The exercise configuration is composed from these components, and on every submission a new LoLC is compiled from it.

## Compilation

Compilation is the process which generates a [[Job Configuration]] from the _Exercise Configuration_ described below.

### Steps

The following steps are needed to compile an _Exercise Configuration_ into a _Job Configuration_. Each step is described further in its own section.

* Pipelines Merger
    * Variables Resolver
* Boxes Sorter
* Boxes Optimizer
* Test Directories Resolver
* Boxes Compiler

The first step of compiling an exercise configuration is to **merge the pipelines** of each test into one directed tree. The result is a list of tests, each containing a tree of all boxes needed for one execution of the exercise. At this point the pipelines are not yet connected to the variables from the environment or exercise config, which are needed later on. The second step therefore **resolves the variables** referenced in the ports of the boxes in all trees. We now have a list of trees, but execution requires an ordering, so the third step **topologically sorts** each tree into what is basically an array.
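
The topological sort can be illustrated with a short Python sketch. It is not the ReCodEx implementation: it assumes a hypothetical representation in which each box lists the variable names on its input and output ports, and that a box depends on whichever box produces a variable it consumes. The ordering itself uses Kahn's algorithm.

```python
from collections import deque


def sort_boxes(boxes):
    """Order boxes so that every producer precedes its consumers.

    `boxes` maps a box name to {"in": [...], "out": [...]}, lists of
    variable names (a hypothetical structure used only in this sketch).
    """
    # Which box produces which variable.
    producer = {var: name for name, box in boxes.items() for var in box["out"]}
    # Boxes each box depends on; inputs without a producer are ignored.
    deps = {
        name: {producer[var] for var in box["in"] if var in producer}
        for name, box in boxes.items()
    }
    consumers = {name: [] for name in boxes}
    for name, needed in deps.items():
        for dep in needed:
            consumers[dep].append(name)

    remaining = {name: len(needed) for name, needed in deps.items()}
    ready = deque(name for name, count in remaining.items() if count == 0)
    ordered = []
    while ready:
        name = ready.popleft()
        ordered.append(name)
        for consumer in consumers[name]:
            remaining[consumer] -= 1
            if remaining[consumer] == 0:
                ready.append(consumer)

    if len(ordered) != len(boxes):
        raise ValueError("boxes contain a dependency cycle")
    return ordered


example_boxes = {
    "judge": {"in": ["actual_output", "expected_output"], "out": ["score"]},
    "run": {"in": ["binary_file"], "out": ["actual_output"]},
    "compilation": {"in": ["source_file"], "out": ["binary_file"]},
    "source": {"in": [], "out": ["source_file"]},
    "test": {"in": [], "out": ["expected_output"]},
}
example_order = sort_boxes(example_boxes)
```

In `example_order` the `source` box comes before `compilation`, which comes before `run`, and `judge` comes last, regardless of the order in which the boxes were declared.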

That is still not all. We have one array per test, which is undesirable because the tests may share compilation and other tasks which are the same for all tests and produce the same output. We therefore need a component which merges the multiple test trees into one. The **boxes optimizer** is that component: it heuristically merges duplicate boxes from different tests and resolves any conflicts which may arise. Next, to better separate the executed solutions, every test should have its **individual execution folder**, and a subsequent compilation step does just that: it goes through all boxes and adds the test's execution folder to the relevant ports or, more precisely, to the variables concerned.

The last thing to be done is the **compilation of the boxes** themselves. The compiler takes the tree from the optimizer and walks all boxes in it. Each box is compiled into a job configuration task, and limits are assigned to the appropriate execution tasks. The result of the compilation is a _Job Configuration_ structure which can be handed over to the ReCodEx backend.

#### Pipelines Merger

TODO

#### Variables Resolver

TODO

#### Boxes Sorter

TODO

#### Boxes Optimizer

To be implemented. For the time being this feature is implemented only in a simple way which just connects the test trees into one, without any further optimizations.

#### Test Directories Resolver

TODO

#### Boxes Compiler

**Priorities of Tasks:**

* Compilation - 100
* Execution - 90
* Judge - 80
* {default} - 42
* Dump Results - 1
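
The table above can be read as a simple lookup with a fallback. The following Python sketch only illustrates that reading; the key names and the function name are hypothetical and do not come from the ReCodEx code base.

```python
# Priorities from the list above; the string keys are illustrative only.
TASK_PRIORITIES = {
    "compilation": 100,
    "execution": 90,
    "judge": 80,
    "dump-results": 1,
}
DEFAULT_PRIORITY = 42


def task_priority(kind):
    """Return the priority for a task kind, or the default for other kinds."""
    return TASK_PRIORITIES.get(kind, DEFAULT_PRIORITY)
```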

## Variables and Ports

Variables and ports are used throughout the exercise configuration and the related structures. Each of them has to have a type. Six types were chosen, which should be sufficient for every possible usage:

* **string** - textual value
* **string[]** - array of strings
* **file** - corresponds to a file created during evaluation of a submission
* **file[]** - array of files
* **remote-file** - corresponds to an external file which has to be downloaded during evaluation of a submission
* **remote-file[]** - array of remote files
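
A variable's value can be checked against its declared type along these lines. This is a minimal sketch which assumes that scalar values are represented as strings (for files, the file name) and arrays as lists of strings; the function name is hypothetical.

```python
SCALAR_TYPES = {"string", "file", "remote-file"}
ARRAY_TYPES = {name + "[]" for name in SCALAR_TYPES}


def check_variable(variable):
    """Return an error message for a variable dict, or None if it looks valid."""
    var_type = variable.get("type")
    value = variable.get("value")
    if var_type in SCALAR_TYPES:
        valid = isinstance(value, str)
    elif var_type in ARRAY_TYPES:
        valid = isinstance(value, list) and all(isinstance(v, str) for v in value)
    else:
        return f"unknown type: {var_type!r}"
    return None if valid else f"value does not match type {var_type}"
```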

## ExerciseConfig

Represents the basic exercise configuration which connects all things together. There are two formats of this configuration: one which is saved in the database and one which is sent back to the web application. Both formats are described below.

### Frontend Format

Returned as JSON.

Mandatory items are bold, optional italic, description of items follows:

* **${list of environments}** - root element is list of environments
    * **name** - identifier of the environment from database
    * **tests** - list of tests
        * **name** - identifier of the test which serves as unique identifier
        * **pipelines** - list of pipelines contained in test
            * **name** - identifier of pipeline database entity
            * _variables_ - list of variables for this pipeline
                * **name** - unique identifier of variable
                * **type** - one of the supported types
                * **value** - either a single scalar value or an array if the variable is of an array type

Example:

```
[
  {
    "name":"java8",
    "tests":[
      {
        "name":"Test 1",
        "pipelines":[
          {
            "name":"pipelineJava",
            "variables":[
              {
                "name":"varJava",
                "type":"string",
                "value":"valJava"
              }
            ]
          }
        ]
      },
      {
        "name":"Test 2",
        "pipelines":[
          {
            "name":"pipeline2",
            "variables":[
              {
                "name":"varB",
                "type":"file",
                "value":"valB"
              }
            ]
          }
        ]
      }
    ]
  },
  {
    "name":"cpp11",
    "tests":[
      {
        "name":"Test 1",
        "pipelines":[
          {
            "name":"pipeline1",
            "variables":[
              {
                "name":"varA",
                "type":"string",
                "value":"valA"
              }
            ]
          }
        ]
      },
      {
        "name":"Test 2",
        "pipelines":[
          {
            "name":"pipeline2",
            "variables":[
              {
                "name":"varCpp",
                "type":"file",
                "value":"valCpp"
              }
            ]
          }
        ]
      }
    ]
  }
]
```

### Backend Format

The whole configuration consists of tests, each of which should define the pipelines it is composed of. There are default pipelines and also special pipelines for runtime environments. If the definition of pipelines for some environment is missing, the default pipelines of the corresponding test are used.

Stored as YAML.

Mandatory items are bold, optional italic, description of items follows:

* **environments** - list of identifiers of the environments which belong to the exercise
* **tests** - map of tests indexed by unique test identifier
    * **${test identification}** - unique test identifier from the database
        * **environments** - map of environments which redefine the default pipelines of this test
            * **${environment identification}** - unique environment identifier from the database
                * **pipelines** - list of redefined pipelines; if this list is empty, the list of pipelines is replaced by the defaults
                    * **name** - name of the pipeline to which the following variables belong
                    * **variables** - list of variables
                        * **name** - unique identifier of the variable
                        * **type** - one of the supported variable types
                        * **value** - either a single scalar value or an array if the variable is of an array type

Example:

```
environments:
  - java8
  - cpp11
tests:
  "Test 1":
    environments:
      java8:
        pipelines:
          - name: pipelineJava
            variables:
              - name: varJava
                type: string
                value: valJava
      cpp11:
        pipelines: []
  "Test 2":
    environments:
      cpp11:
        pipelines:
          - name: pipeline2
            variables:
              - name: varCpp
                type: file
                value: valCpp
```
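
The fallback to default pipelines can be sketched in Python as follows. The sketch works on the already parsed YAML. The test-level `pipelines` key holding the defaults is an assumption made for illustration (it is not part of the item list above), and so is the function name.

```python
def pipelines_for(config, test_id, env_id):
    """Return the pipelines used by a test in the given environment.

    Falls back to the test's default pipelines when the environment is
    missing or its list of redefined pipelines is empty.
    """
    test = config["tests"][test_id]
    # `or {}` also tolerates an environment stored as an empty value.
    env = test.get("environments", {}).get(env_id) or {}
    # Assumption: defaults live under a test-level "pipelines" key.
    return env.get("pipelines") or test.get("pipelines", [])


example_config = {
    "environments": ["java8", "cpp11"],
    "tests": {
        "Test 1": {
            "pipelines": [{"name": "defaultPipeline", "variables": []}],
            "environments": {
                "java8": {"pipelines": [{"name": "pipelineJava", "variables": []}]},
                "cpp11": {"pipelines": []},
            },
        }
    },
}
```

With this data, `pipelines_for(example_config, "Test 1", "java8")` returns the redefined `pipelineJava`, while the same call for `cpp11` returns the default pipeline.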

## Pipeline

Pipelines are sent to clients in JSON format and are stored in the API in YAML with the same structure.

Important features:

* Every port either has to contain a reference to a variable or has to be blank. An actual value (for example a string) is not allowed in a port. If a variable name is given in a port, that variable has to exist in the variables table.
* A connection between ports can be **one-to-one** or **one-to-many** from the perspective of the output port. That means one output port can pass its variable to two or more input ports. There is one exception: a variable may be used only in an input port, in which case its value has to be defined in the pipeline variables table.
* The variables table of a pipeline can contain **references** to external variables, which can come from the environment configuration or the exercise configuration. A value is a reference if it starts with the character **'$'**. A reference cannot be embedded inside a value (the textual value "hello $world", where world should be a reference, is not allowed). If a value which starts with a dollar sign is needed, the dollar sign has to be escaped with a backslash, so "\$1 million" is an actual value and not a reference.
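
The reference and escaping rules can be illustrated with a short Python sketch. It is not taken from the ReCodEx sources; the function name is hypothetical. Two behaviours are assumptions: the escaping backslash is dropped from the resulting literal, and a dollar sign in the middle of a value is kept as plain text rather than rejected.

```python
def parse_value(raw):
    """Classify a variable value as a reference or a literal.

    Returns ("reference", name) or ("literal", text).
    """
    if raw.startswith("\\$"):
        # Escaped dollar sign: drop the backslash, keep the rest verbatim.
        return ("literal", raw[1:])
    if raw.startswith("$"):
        return ("reference", raw[1:])
    # A "$" anywhere else is not treated specially in this sketch.
    return ("literal", raw)
```

For example, `parse_value("$source_file")` yields a reference to `source_file`, whereas `parse_value("\\$1 million")` yields the literal text `$1 million`.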

### Boxes

* DataInBox and DataOutBox are special boxes which are treated differently from the others. This means that deleting them, or making breaking changes to them, may have unforeseen consequences. They are used for importing files into and exporting files from a pipeline. For importing a string or an array of strings, variable references have to be used. The inputs or outputs of a pipeline may be connected to another pipeline or to supervisor/student inputs.
* Data boxes always have to be used for importing files into or exporting files from pipelines. Variable references cannot be used here, since these references are only substitutions. For example, files uploaded by the supervisor (inputs and outputs) have to have input boxes in order to be properly downloaded from the fileserver during execution.
* Every box except the data boxes is referenced directly only in BoxService, where it is created; everywhere else it is used through the abstract Box interface, which relies on inheritance to provide a general usage schema. Thanks to this, creating new boxes is quite simple and straightforward.

### Configuration

Mandatory items are bold, optional italic, description of items follows:

* **variables** - list of variables for this pipeline
    * **name** - unique identifier of the variable
    * **type** - one of the supported variable types
    * **value** - either a single scalar value or an array if the variable is of an array type
* **boxes** - list of boxes which are defined in this pipeline
    * **name** - unique identification of box
    * **type** - one of the supported box types
    * **portsIn** - map of input ports
        * **${port identification}** - unique identification of port
            * **type** - one of the supported port types
            * **value** - reference to a variable which has to be defined in the pipeline variables table; the port and the variable also have to match
    * **portsOut** - map of output ports
        * **${port identification}** - unique identifier of port
            * **type** - one of the supported port types
            * **value** - reference to a variable which has to be defined in the pipeline variables table; the port and the variable also have to match

Example:

```
{
   "variables":[
      {
         "name":"source_file",
         "type":"file",
         "value":"source.cpp"
      }
   ],
   "boxes": [
      {
         "name":"source",
         "portsIn":[],
         "portsOut":[{ "source_file":[{"type":"file", "value":"source_file"}] }],
         "type":"data"
      },
      {
         "name":"test",
         "portsIn":[],
         "portsOut":[{
            "test_file":[{"type":"file", "value":"test_file"}],
            "expected_output":[{"type":"file", "value":"expected_output"}]
         }],
         "type":"data"
      },
      {
         "name":"compilation",
         "portsIn":[{ "input_file":[{"type":"file", "value":"source_file"}] }],
         "portsOut":[{ "output_file":[{"type":"file", "value":"binary_file"}] }],
         "type":"compilation"
      },
      {
         "name":"run",
         "portsIn":[{ "binary_file":[{"type":"file", "value":"binary_file"}] }],
         "portsOut":[{ "output_file":[{"type":"file", "value":"actual_output"}] }],
         "type":"execution"
      },
      {
         "name":"judge",
         "portsIn":[{
            "actual_output":[{"type":"file", "value":"actual_output"}],
            "expected_output":[{"type":"file", "value":"expected_output"}]
         }],
         "portsOut":[{ "score":[{"type":"file", "value":"score"}] }],
         "type":"evaluation"
      }
   ]
}
```
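
The rule that every non-blank port has to reference a variable from the pipeline variables table can be checked mechanically. The Python sketch below is an illustration under stated assumptions, not the ReCodEx validator: it expects an already parsed pipeline in which `portsIn` and `portsOut` are maps from a port name to a `type`/`value` pair, as described in the item list above, and the function name is hypothetical.

```python
def undefined_port_variables(pipeline):
    """List (box, port, variable) triples whose variable is not defined."""
    defined = {variable["name"] for variable in pipeline.get("variables", [])}
    problems = []
    for box in pipeline.get("boxes", []):
        for direction in ("portsIn", "portsOut"):
            for port_name, port in box.get(direction, {}).items():
                reference = port.get("value", "")
                # Blank ports are allowed; anything else must be defined.
                if reference and reference not in defined:
                    problems.append((box["name"], port_name, reference))
    return problems


example_pipeline = {
    "variables": [{"name": "source_file", "type": "file", "value": "source.cpp"}],
    "boxes": [
        {
            "name": "compilation",
            "type": "compilation",
            "portsIn": {"input_file": {"type": "file", "value": "source_file"}},
            "portsOut": {"output_file": {"type": "file", "value": "binary_file"}},
        }
    ],
}
```

Here `undefined_port_variables(example_pipeline)` reports the `output_file` port of the `compilation` box, because `binary_file` is missing from the variables table.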

## Limits

Limits are applied to the whole test, which means that if a test has multiple execution tasks, all of them get the same limits. The limits of a test have to contain at least one time limit and also a memory limit.

Mandatory items are bold, optional italic, description of items follows:

* **${test identification}** - identifier of the test from the database
    * _wall-time_ - elapsed real time in seconds, defined as float
    * _cpu-time_ - elapsed CPU time in seconds, defined as float
    * **memory** - maximal memory usage in kilobytes
    * _parallel_ - maximal number of threads/processes used

Example:

```
test-id-1:
  wall-time: 5
  cpu-time: 6.4
  memory: 50
  parallel: 500
test-id-2:
  wall-time: 6
  memory: 60
```
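
The requirement of a memory limit and at least one time limit per test can be checked as in the following Python sketch. It operates on the already parsed YAML (a plain dictionary); the function name and the wording of the error messages are illustrative only.

```python
def check_limits(limits):
    """Return a list of problems found in a parsed limits configuration."""
    errors = []
    for test_id, test_limits in limits.items():
        if "memory" not in test_limits:
            errors.append(f"{test_id}: missing memory limit")
        if "wall-time" not in test_limits and "cpu-time" not in test_limits:
            errors.append(f"{test_id}: missing wall-time or cpu-time limit")
    return errors


example_limits = {
    "test-id-1": {"wall-time": 5, "cpu-time": 6.4, "memory": 50, "parallel": 500},
    "test-id-2": {"wall-time": 6, "memory": 60},
}
```

For `example_limits`, which mirrors the example above, `check_limits` returns an empty list.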

## ExerciseEnvironmentConfig

The configuration for particular environments is stored here. It comes in two formats: the one returned to the web-app and the one in which the configuration is stored. Environment configuration is stored in individual database entities, but it is desirable to return it as a whole for the whole exercise, hence the two formats. Both are described below.

Important features:

* A variable of type `file` or `file[]` in the environment config can contain **wildcards**. These wildcards are then matched against the files submitted in the solution. For every wildcard/variable there has to be at least one file which matches it.
* The variables table in the exercise environment config can contain **references** to variables which should be given when a solution is submitted. A value is a reference if it starts with the character **'$'**. A reference cannot be embedded inside a value (the textual value "hello $world", where world should be a reference, is not allowed). If a value which starts with a dollar sign is needed, the dollar sign has to be escaped with a backslash, so "\$1 million" is an actual value and not a reference.
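
Wildcard matching against the submitted files might look like the following Python sketch. The exact wildcard syntax is not specified here, so the sketch assumes shell-style patterns as implemented by Python's `fnmatch` module; the function name is hypothetical.

```python
from fnmatch import fnmatchcase


def match_wildcards(patterns, submitted_files):
    """Map each wildcard to the submitted files it matches.

    Raises ValueError if some wildcard matches no file at all.
    """
    matched = {}
    for pattern in patterns:
        hits = [name for name in submitted_files if fnmatchcase(name, pattern)]
        if not hits:
            raise ValueError(f"no submitted file matches {pattern!r}")
        matched[pattern] = hits
    return matched
```

For instance, matching `*.cpp` against the files `main.cpp` and `util.h` selects only `main.cpp`, while a pattern such as `*.java` would raise an error for the same submission.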

### Frontend Format

Mandatory items are bold, optional italic, description of items follows:

* **${list of environments}** - root element is a list of exercise environment configurations
    * **runtimeEnvironmentId** - identification of the environment taken from the database
    * **variablesTable** - list of variables
        * **name** - unique identification of the variable
        * **type** - one of the supported variable types
        * **value** - either a single scalar value or an array if the variable is of an array type

Example:

```
[
  {
    "runtimeEnvironmentId":"CRuntime",
    "variablesTable":[
      {
        "name":"varA",
        "type":"string",
        "value":"valA"
      },
      {
        "name":"varB",
        "type":"file",
        "value":"valB"
      }
    ]
  },
  {
    "runtimeEnvironmentId":"JavaRuntime",
    "variablesTable":[
      {
        "name":"varA",
        "type":"file",
        "value":"javaA"
      },
      {
        "name":"varB",
        "type":"string",
        "value":"javaB"
      }
    ]
  }
]
```

### Backend Format

In the API, environment configurations are stored differently from how they are returned to the web-app. Every runtime environment has its own database entity which holds its environment configuration, so only the variables table needs to be stored.

Mandatory items are bold, optional italic, description of items follows:

* **variablesTable** - list of variables
    * **name** - unique identification of the variable
    * **type** - one of the supported variable types
    * **value** - either a single scalar value or an array if the variable is of an array type

Example:

```
variablesTable:
  - name: varName
    type: string
    value: varValue
  - name: source_file
    type: file
    value: source.cpp
```
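
Assembling the frontend format from the individually stored configurations can be sketched in Python as follows. The input shape, a dictionary from runtime environment identifier to the parsed stored configuration, is an assumption made for illustration, as is the function name.

```python
def to_frontend(stored_configs):
    """Combine per-environment stored configs into the frontend list."""
    return [
        {
            "runtimeEnvironmentId": env_id,
            "variablesTable": config["variablesTable"],
        }
        for env_id, config in stored_configs.items()
    ]


example_stored = {
    "CRuntime": {
        "variablesTable": [
            {"name": "source_file", "type": "file", "value": "source.cpp"}
        ]
    },
    "JavaRuntime": {"variablesTable": []},
}
```

Calling `to_frontend(example_stored)` produces one entry per runtime environment, each carrying its `runtimeEnvironmentId` together with the stored variables table.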