The following text assumes knowledge of the basic terminology used by ReCodEx. Please see the [separate Terminology page](https://github.com/ReCodEx/GlobalWiki/wiki/Terminology).
### Basics
A job is a set of tasks (it is generally a set, although the order of tasks carries some meaning). Tasks may have an arbitrary number of dependencies, which must be respected. When recodex-worker processes a job, it builds a task graph in which tasks are vertices and dependencies are edges (A -> B means that task A is on the dependency list of task B) and computes a linear ordering of it. The graph must be acyclic (otherwise no linear ordering exists), and recodex-worker attempts to execute as many tasks as possible. Tasks without dependencies can be executed directly; other tasks are executed once all their dependencies have completed successfully.
Tasks are executed sequentially, following the linear ordering of the task graph. Parallel tasks (tasks that do not directly depend on each other, so their relative order may be arbitrary) are ordered first by their priority (a higher number means a higher priority) and second by their order in the configuration file. Priority is important for specifying the evaluation flow. See the sample picture for better understanding.
![Picture of task serialization](https://github.com/ReCodEx/GlobalWiki/raw/master/images/Assignment_overview.png)
Each task has a unique ID (an alphanumeric string such as _CompileA_, _RunAA_, or _JudgeAB_ in the picture). These IDs are used to identify tasks (in dependency references, in the log, ...). The numbers in the bottom right corner are the priorities of the tasks; a higher number means a greater priority. This means that once task _RunAA_ is done, the next task must be _JudgeAA_ and not _RunAB_ (that would also be a valid linear ordering, but _RunAB_ has lower priority).
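
The ordering rule described above can be sketched as a topological sort that always picks, among the tasks whose dependencies are already done, the one with the highest priority, breaking ties by position in the configuration. This is only an illustration in Python and not the actual recodex-worker implementation; the function name `linear_order`, the dictionary layout, and the priorities in `EXAMPLE_TASKS` are assumptions made for the example (only the task IDs are borrowed from the picture).

```python
import heapq


def linear_order(tasks):
    """Return task IDs in execution order.

    `tasks` is a list of dicts in configuration order, each with a
    "task-id", a "priority" and an optional "dependencies" list.
    """
    position = {t["task-id"]: i for i, t in enumerate(tasks)}
    priority = {t["task-id"]: t["priority"] for t in tasks}
    unmet = {t["task-id"]: len(t.get("dependencies", [])) for t in tasks}
    dependents = {t["task-id"]: [] for t in tasks}
    for t in tasks:
        for dep in t.get("dependencies", []):
            dependents[dep].append(t["task-id"])

    # Min-heap keyed so that higher priority, then earlier position, pops first.
    ready = [(-priority[tid], position[tid], tid) for tid, n in unmet.items() if n == 0]
    heapq.heapify(ready)

    order = []
    while ready:
        _, _, tid = heapq.heappop(ready)
        order.append(tid)
        for nxt in dependents[tid]:
            unmet[nxt] -= 1
            if unmet[nxt] == 0:
                heapq.heappush(ready, (-priority[nxt], position[nxt], nxt))

    if len(order) != len(tasks):
        raise ValueError("task graph contains a cycle, no linear ordering exists")
    return order


# Hypothetical priorities; only the task IDs come from the picture.
EXAMPLE_TASKS = [
    {"task-id": "CompileA", "priority": 5},
    {"task-id": "RunAA", "priority": 4, "dependencies": ["CompileA"]},
    {"task-id": "RunAB", "priority": 3, "dependencies": ["CompileA"]},
    {"task-id": "JudgeAA", "priority": 4, "dependencies": ["RunAA"]},
    {"task-id": "JudgeAB", "priority": 3, "dependencies": ["RunAB"]},
]

print(linear_order(EXAMPLE_TASKS))
```

With these assumed priorities, _JudgeAA_ is scheduled right after _RunAA_ and before _RunAB_, which matches the rule stated above.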
### Task
A task is an atomic piece of work executed by recodex-worker. There are two basic types of tasks:
- **Execute external process** (optionally inside Isolate). On Linux, running inside Isolate will be mandatory by default; the option exists because of Windows.
- **Perform internal operation**. External processes are meant for compilation, testing, or execution of external judges. Internal operations comprise commands that are typically related to file/directory maintenance and other evaluation management. A few important examples:
    - Create/delete/move/rename file/directory
    - (un)zip/tar/gzip/bzip file(s)
    - fetch a file from the file repository (either from the worker cache, or by downloading it via HTTP GET or SFTP).
Even though the internal operations could be handled by external executables (`mv`, `tar`, `pkzip`, `wget`, ...), it might be better to keep them inside recodex-worker, as this would simplify the operations and their portability across platforms. Furthermore, they are quite easy to implement using common libraries (e.g., _zlib_, _curl_).
**External Tasks**
These tasks are typically executed in Isolate (with given parameters), and recodex-worker waits until they finish. The exit code determines whether the task succeeded (0) or failed (anything else). A task may be marked as essential; in that case, its failure immediately terminates the whole job.
- **stdin** - can be configured to read from an existing file or from `/dev/null`.
- **stdout** and **stderr** - can be individually redirected to a file or discarded. If these output options are specified, the output files can be uploaded with the results by copying them into the result directory.
- **limits** - a task has time and memory limits; if these limits are exceeded, the task fails as well.
The task results (exit code, time and memory consumption, etc.) are saved into a result YAML file and eventually sent back to the frontend application, to the address that was specified on input.
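
The behaviour of an external task described above (redirected streams, exit code 0 meaning success, a limit whose violation counts as failure) can be sketched as follows. This is a simplified illustration that uses Python's `subprocess` module instead of Isolate and enforces only a wall-clock limit; the function name `run_external_task`, its parameters, and the keys of the returned dictionary are assumptions, not the actual result format.

```python
import subprocess
import sys
from contextlib import ExitStack


def run_external_task(cmd, stdin=None, stdout=None, stderr=None, wall_time=None):
    """Run `cmd` (a list of strings) and report whether it succeeded.

    A file path redirects the corresponding stream; None means /dev/null
    for stdin and "discard" for stdout/stderr.
    """
    with ExitStack() as stack:
        def stream(path, mode):
            if path is None:
                return subprocess.DEVNULL
            return stack.enter_context(open(path, mode))

        try:
            proc = subprocess.run(
                cmd,
                stdin=stream(stdin, "rb"),
                stdout=stream(stdout, "wb"),
                stderr=stream(stderr, "wb"),
                timeout=wall_time,  # seconds; the child is killed when exceeded
            )
        except subprocess.TimeoutExpired:
            # Exceeding a limit counts as a failure as well.
            return {"exit-code": None, "succeeded": False, "limit-exceeded": True}
    return {
        "exit-code": proc.returncode,
        "succeeded": proc.returncode == 0,
        "limit-exceeded": False,
    }


# A child process that exits with a non-zero code is reported as failed.
result = run_external_task([sys.executable, "-c", "raise SystemExit(3)"], wall_time=5)
print(result)
```

Memory limits are deliberately left out of this sketch, since enforcing them is the job of the sandbox.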
### Directories and Files
A unique directory structure is created for each job execution. A job is not restricted to the directories listed here (tasks can do whatever the system allows), but it is advised to use them within the job. The recodex-worker configuration specifies a worker default directory, which is the base for every file produced by recodex-worker.
Inside this directory, temporary files for job execution are created:
- **${DEFAULT}/downloads/${WORKER_ID}/${JOB_ID}** - where the downloaded archive is saved
- **${DEFAULT}/submission/${WORKER_ID}/${JOB_ID}** - where the decompressed submission is stored
- **${DEFAULT}/eval/${WORKER_ID}/${JOB_ID}** - accessible from the job configuration through variables; all execution should happen here
- **${DEFAULT}/temp/${WORKER_ID}/${JOB_ID}** - where all sorts of temporary files can be stored
- **${DEFAULT}/results/${WORKER_ID}/${JOB_ID}** - also accessible from the job configuration; stores all files that will be uploaded to the fileserver
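
The per-job layout listed above can be expressed as a small helper that derives all five paths from the worker default directory, the worker ID, and the job ID. This is a sketch only; the helper name `job_directories` and the example base path `/var/recodex-worker` and worker ID `1` are hypothetical.

```python
from pathlib import PurePosixPath

# Subdirectories of the worker default directory, as listed above.
JOB_SUBDIRS = ("downloads", "submission", "eval", "temp", "results")


def job_directories(default_dir, worker_id, job_id):
    """Map each subdirectory name to ${DEFAULT}/<name>/${WORKER_ID}/${JOB_ID}."""
    base = PurePosixPath(default_dir)
    return {name: base / name / str(worker_id) / str(job_id) for name in JOB_SUBDIRS}


# Hypothetical default directory and worker ID.
dirs = job_directories("/var/recodex-worker", 1, "eval_5")
print(dirs["eval"])  # /var/recodex-worker/eval/1/eval_5
```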
### Configuration
The job configuration passed to the worker is generated from two parts:
- **template** - a common template for similar kinds of tasks. It contains almost all instructions (when to fetch, move, or rename files, run commands, judges, ...) as well as task dependencies and priorities. A template can be shared by several problem assignments, or every problem (probably in the compiler class) can have its own.
- **isoeval config** - includes the data for instantiating the template, e.g. input file names, ...
The final configuration for the worker is computer-generated from these two configs.
The job configuration consists of some general information followed by a list of tasks (one or more).
#### Configuration items
Unless specified otherwise, an item is mandatory. Mandatory items are bold, optional ones italic.
- **submission** - information about this particular submission
    - **job-id** - textual ID which should be unique in the whole ReCodEx
    - **language** - no specific function, just for debugging and clarity
    - **file-collector** - address from which fetch tasks will download data
    - _log_ - default is false, can be omitted; determines whether job execution will be logged into one shared log
- **tasks** - list (not map) of individual tasks
    - **task-id** - unique identifier of the task in the scope of one submission
    - **priority** - higher number, higher priority
    - **fatal-failure** - if true, execution of the whole job is stopped after this task fails
    - **dependencies** - list of dependencies which have to be fulfilled before this task; can be omitted if there are no dependencies
    - **cmd** - description of the command which will be executed
        - **bin** - the binary itself
        - _args_ - list of arguments which will be passed to the execution unit
    - _stdin_ - file from which standard input will be redirected, used only in external tasks, can be omitted
    - _stdout_ - file to which standard output will be redirected, used only in external tasks, can be omitted
    - _stderr_ - file to which error output will be redirected, used only in external tasks, can be omitted
    - _sandbox_ - wrapper for external tasks which will run in a sandbox; if defined, the task is automatically external
        - **name** - name of the sandbox used
        - **limits** - list of limits which can be passed to the sandbox
            - **hw-group-id** - determines specific limits for specific machines
            - _time_ - time of execution in seconds
            - _wall-time_ - wall time in seconds
            - _extra-time_ - extra time which will be added to execution
            - _stack-size_ - size of the stack of the executed program in kilobytes
            - _memory_ - overall memory limit for the application in kilobytes
            - _parallel_ - integer number of processes which can run simultaneously; time and memory limits are merged from all potential processes/threads
            - _disk-size_ - size of all I/O operations from/to files in kilobytes
            - _disk-files_ - number of files which can be opened
            - _environ-variable_ - wrapper for a map of environment variables, union with the default worker configuration
            - _chdir_ - working directory of the executed application
            - _bound-directories_ - list of structures representing directories which will be visible inside the sandbox, union with the default worker configuration
                - **src** - source pointing to the actual system directory
                - **dst** - destination inside the sandbox, which can have its own filesystem binding
                - **mode** - determines the connection mode of the specified directory, one of the values: RW, NOEXEC, FS, MAYBE, DEV
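
To make the mandatory/optional distinction above concrete, the following sketch checks an already parsed job configuration (plain Python dictionaries and lists) for missing mandatory items. It is an illustration only, not the validation performed by recodex-worker; the function name `missing_items`, the dotted path format of the report, and the task ID in `EXAMPLE_JOB` are assumptions. `dependencies` is treated as optional here, because the list above says it can be omitted, and stdin/stdout/stderr are not checked since they are optional.

```python
SUBMISSION_REQUIRED = ("job-id", "language", "file-collector")
TASK_REQUIRED = ("task-id", "priority", "fatal-failure", "cmd")


def missing_items(job):
    """Return dotted paths of mandatory items that are absent from `job`."""
    missing = []

    def require(mapping, keys, where):
        for key in keys:
            if key not in mapping:
                missing.append(f"{where}.{key}")

    if "submission" in job:
        require(job["submission"], SUBMISSION_REQUIRED, "submission")
    else:
        missing.append("submission")
    if "tasks" not in job:
        missing.append("tasks")

    for i, task in enumerate(job.get("tasks", [])):
        where = f"tasks[{i}]"
        require(task, TASK_REQUIRED, where)
        if "cmd" in task:
            require(task["cmd"], ("bin",), f"{where}.cmd")
        sandbox = task.get("sandbox")  # optional; makes the task external
        if sandbox is not None:
            require(sandbox, ("name", "limits"), f"{where}.sandbox")
            for j, limits in enumerate(sandbox.get("limits", [])):
                lwhere = f"{where}.sandbox.limits[{j}]"
                require(limits, ("hw-group-id",), lwhere)
                for k, bound in enumerate(limits.get("bound-directories", [])):
                    require(bound, ("src", "dst", "mode"), f"{lwhere}.bound-directories[{k}]")
    return missing


# Three mandatory items are left out on purpose; the task ID is hypothetical.
EXAMPLE_JOB = {
    "submission": {"job-id": "eval_5", "language": "cpp"},
    "tasks": [
        {
            "task-id": "eval_test01",
            "priority": 4,
            "fatal-failure": False,
            "cmd": {"args": ["-v"]},
            "sandbox": {"name": "isolate", "limits": [{"time": 5}]},
        }
    ],
}

print(missing_items(EXAMPLE_JOB))
```

For `EXAMPLE_JOB`, the report names the missing `file-collector`, `bin`, and `hw-group-id` items.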
#### Configuration example
This configuration example is written in YAML and serves only for demonstration purposes. It is therefore not a working example that could be used in production. Some items can be omitted, in which case defaults are used.
```
--- # only one document which contains job, aka. list of tasks and some general infos
submission:
    job-id: eval_5
    language: "cpp"
    file-collector: "http://localhost:36587"
    log: true
tasks:
    - task-id: "fetch_input"
      priority: 2
# ... (lines omitted) ...
    - task-id: "move_test01"
      priority: 3
      fatal-failure: true
      dependencies:
          - compile_test01
      cmd:
          bin: "mv"
# ... (lines omitted) ...
          args:
              - "-v"
              - "-f 01.in"
      stdin: "01.in"
      stdout: "01.out"
      stderr: "01.err"
      sandbox:
          name: "isolate"
          limits:
              - hw-group-id: group1
                time: 5 # seconds
                wall-time: 6 # seconds
                extra-time: 2 # seconds
# ... (lines omitted) ...
                    - src: /tmp/isoeval/eval_5
                      dst: /evaluate
                      mode: RW,NOEXEC
              - hw-group-id: group2 # determines specific limits for specific machines