# Job configuration
The following description covers the configuration as seen from the API's point
of view. Some items that are optional here may be mandatory for the worker (the
API fills them in automatically). In the lists describing the values, bold items
are mandatory and italic ones are optional.
## Variables
Because the frontend does not know which worker will get the job, the
configuration file has to be somewhat general. This means that some
worker-specific details have to be transparent. A good example is that some
(evaluation) directories may be placed differently on each worker. Variables
were introduced to solve this. They can be used only in certain places,
basically wherever filesystem paths are expected.

Using variables in the configuration is simple and shell-like. The name of a
variable is put inside braces preceded by a dollar sign, for example `${VAR}`.
There should be no quotes or apostrophes around the variable name, just plain
text in braces. Parsing is equally simple: whenever a dollar sign followed by
braces appears, the job execution unit assumes that it is a variable, so this
kind of substring cannot be used for anything else.

List of usable variables in a job configuration:
- **WORKER_ID** -- integral identification of worker, unique on a server
- **JOB_ID** -- identification of this job
- **SOURCE_DIR** -- directory where source codes of the job are stored
- **EVAL_DIR** -- evaluation directory which points inside the sandbox and is
automatically bound there
- **RESULT_DIR** -- results from the job can be copied here, but only with
internal copy task
- **TEMP_DIR** -- general temporary directory which is not dependent on
operating system
- **JUDGES_DIR** -- directory in which judges are stored (outside sandbox)
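
The substitution described above can be sketched in a few lines of Python. This
is only an illustration, not the worker's actual implementation: the function
name `expand_variables`, the regular expression for variable names, the choice
to raise an error on unknown variables, and the directory values are all
assumptions made for this example.

```python
import re

# Matches ${NAME}; the set of allowed name characters is an assumption.
_VAR_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_variables(text, variables):
    """Replace every ${NAME} in text with its value from variables."""
    def substitute(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"unknown job variable: {name}")
        return str(variables[name])

    return _VAR_PATTERN.sub(substitute, text)


# Hypothetical values; a real worker would take them from its own setup.
worker_variables = {
    "WORKER_ID": 1,
    "JOB_ID": "hippoes",
    "SOURCE_DIR": "/tmp/recodex/source",
    "JUDGES_DIR": "/usr/bin",
}

expanded = expand_variables("${SOURCE_DIR}/kuly.in", worker_variables)
print(expanded)  # /tmp/recodex/source/kuly.in
```

Because the same configuration is expanded separately on each worker, a path
such as `${SOURCE_DIR}/kuly.in` can resolve to a different location on every
machine without any change to the job configuration.
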
## Tasks
A task is an atomic piece of work executed by the worker. There are two basic
types of tasks:
- **Execute external process** (optionally inside isolate). External processes
  are meant for compilation, testing, or execution of judges. On Linux, using
  the isolate sandbox is mandatory by default; the option to run without it
  exists because of Windows, where no sandbox is currently available.
- **Perform internal operation**. Internal operations comprise commands that
  are typically related to file/directory maintenance and other evaluation
  management tasks. A few important examples:
    - Create/delete/move/rename file/directory
    - (un)zip/tar/gzip/bzip file(s)
    - fetch a file from the file repository (either from worker cache or
      download it by HTTP GET).

Even though the internal operations may be handled by external executables
(`mv`, `tar`, `pkzip`, `wget`, ...), it might be better to keep them inside the
worker as it would simplify these operations and their portability among
platforms. Furthermore, it is quite easy to implement them using common
libraries (e.g., _zlib_, _curl_).

A task may be marked as essential; in that case, its failure immediately
terminates the whole job. A good example is a compilation task: if the
compilation does not succeed, the job is obviously broken and every test would
fail anyway.
### Internal tasks
- **Archivate task** can be used for packing and compressing a directory. The
  calling command is `archivate`. It requires two arguments:
    - path and name of the directory to be archived
    - path and name of the target archive. Only the `.zip` format is supported.
- **Extract task** is the opposite of the archivate task. It can extract
  different types of archives. The supported formats are those supported by the
  `libarchive` library (see
  [libarchive wiki](https://github.com/libarchive/libarchive/wiki)), mainly
  `zip`, `tar`, `tar.gz`, `tar.bz2` and `7zip`. Note that the system
  administrator may not have installed all the needed packages, so some formats
  may not be available; consult your system administrator for more information.
  Archives may contain only regular files and directories (no symlinks, block
  or character devices, sockets, or pipes are allowed). The calling command is
  `extract` and it requires two arguments:
    - path and name of the archive to extract
    - directory where the archive will be extracted
- **Fetch task** gets a file. The file is either downloaded from a remote
  fileserver or just copied from the local cache if available. The calling
  command is `fetch` with two arguments:
    - name of the requested file without path (file sources are set up in the
      worker configuration file)
    - path and name of the destination. Providing a different destination name
      is an easy way to rename the file.
- **Copy task** can copy files and directories. Detailed information can be
  found on the reference page of
  [boost::filesystem::copy](http://www.boost.org/doc/libs/1_60_0/libs/filesystem/doc/reference.html#copy).
  The calling command is `cp` and it requires two arguments:
    - path and name of the source
    - path and name of the destination
- **Make directory task** can create an arbitrary number of directories. The
  calling command is `mkdir` and it requires at least one argument. For each
  provided argument,
  [boost::filesystem::create_directories](http://www.boost.org/doc/libs/1_60_0/libs/filesystem/doc/reference.html#create_directories)
  is called.
- **Rename task** renames files and directories. Detailed behavior can be found
  on the reference page of
  [boost::filesystem::rename](http://www.boost.org/doc/libs/1_60_0/libs/filesystem/doc/reference.html#rename).
  The calling command is `rename` and it requires two arguments:
    - path and name of the source
    - path and name of the destination
- **Remove task** deletes files and directories. The calling command is `rm`
  and it requires at least one argument. For each provided argument,
  [boost::filesystem::remove_all](http://www.boost.org/doc/libs/1_60_0/libs/filesystem/doc/reference.html#remove_all)
  is called.
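
To make the argument conventions of the filesystem tasks more concrete, here is
a rough Python analogue of `mkdir`, `rm`, `rename` and `cp`. It is a sketch
only: the worker itself uses the Boost functions linked above, whose exact
semantics are not reproduced here, and the function name `run_internal_task` is
hypothetical. The `archivate`, `extract` and `fetch` tasks are left out.

```python
import os
import shutil


def run_internal_task(command, args):
    """Run one internal task; an exception means that the task failed."""
    if command == "mkdir":
        # At least one argument; missing parent directories are created too.
        for path in args:
            os.makedirs(path, exist_ok=True)
    elif command == "rm":
        # At least one argument; directories are removed with their content.
        for path in args:
            if os.path.isdir(path) and not os.path.islink(path):
                shutil.rmtree(path)
            elif os.path.lexists(path):
                os.remove(path)
    elif command == "rename":
        source, destination = args  # exactly two arguments
        os.rename(source, destination)
    elif command == "cp":
        source, destination = args  # exactly two arguments
        if os.path.isdir(source):
            shutil.copytree(source, destination)
        else:
            shutil.copy2(source, destination)
    else:
        raise ValueError(f"unsupported internal task: {command}")
```

A call such as `run_internal_task("mkdir", ["/tmp/a", "/tmp/b/c"])` mirrors a
task whose `bin` is `mkdir` and whose `args` list holds the two paths.
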
### External tasks
External tasks are arbitrary executables, typically run inside isolate (with
given parameters); the worker waits until they finish. The exit code determines
whether the task succeeded (0) or failed (anything else).

There are several additional options:
- **stdin** -- the task can be configured to read from an existing file or from
  `/dev/null`.
- **stdout** and **stderr** -- the task output can be individually redirected to
  a file or discarded. If these output options are specified, the output files
  can be uploaded with the results by copying them into the result directory.
- **limits** -- the task has time and memory limits; if these limits are
  exceeded, the task fails.

The task results (exit code, time, memory consumption, etc.) are saved into the
result YAML file and sent back to the frontend application, to the address that
was specified when the job evaluation was initiated.
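
The exit code rule and the redirection options can be illustrated with a small
Python sketch. It runs the process directly, without isolate, and enforces only
a wall-clock timeout (no memory limit), so it is a simplification rather than a
description of the worker. The function name `run_external_task` and its
parameters are assumptions of this example.

```python
import subprocess
import sys
from contextlib import ExitStack


def run_external_task(binary, args, stdin_path=None, stdout_path=None,
                      time_limit=None):
    """Run a process without a sandbox; exit code 0 means OK."""
    with ExitStack() as stack:
        # Unset redirections behave like /dev/null (read) or discard (write).
        stdin = (stack.enter_context(open(stdin_path, "rb"))
                 if stdin_path else subprocess.DEVNULL)
        stdout = (stack.enter_context(open(stdout_path, "wb"))
                  if stdout_path else subprocess.DEVNULL)
        try:
            completed = subprocess.run(
                [binary, *args],
                stdin=stdin,
                stdout=stdout,
                stderr=subprocess.DEVNULL,
                timeout=time_limit,
            )
        except subprocess.TimeoutExpired:
            # Exceeding the limit fails the task.
            return {"status": "FAILED", "exitcode": None}
    status = "OK" if completed.returncode == 0 else "FAILED"
    return {"status": status, "exitcode": completed.returncode}


# Usage: the Python interpreter itself serves as a harmless external command.
result = run_external_task(sys.executable, ["-c", "raise SystemExit(2)"])
print(result)  # {'status': 'FAILED', 'exitcode': 2}
```
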
## Judges
Judges are treated as normal external commands, so there is no special task type
for them. Their binaries are installed alongside the worker executable in
standard directories (on both Linux and Windows systems).

All packaged judges are adopted from the old CodEx with only very small
modifications. The base directory of ReCodEx judges is stored in the
`${JUDGES_DIR}` variable, which can be used in the job config file.

- **recodex-judge-normal** is the base judge used by most of the exercises. This
  judge compares two text files. It compares only text tokens, regardless of the
  amount of whitespace between them.

  ```
  Usage: recodex-judge-normal [-r | -n | -rn] <file1> <file2>
  ```

    - `file1` and `file2` are paths to the files that will be compared
    - the switches `-r` and `-n` can be specified as optional arguments:
        - `-n` -- the judge treats newlines as ordinary whitespace (it ignores
          line breaks)
        - `-r` -- the judge treats tokens as real numbers and compares them
          accordingly (with some tolerance for error)
- **recodex-judge-filter** can be used for preprocessing output files before
  the real judging. This judge filters C-like comments from a text file. A
  comment starts with a double slash sequence (`//`) and ends with a newline. If
  the comment takes up a whole line, the whole line is filtered out.

  ```
  Usage: recodex-judge-filter [inputFile [outputFile]]
  ```

    - if `outputFile` is omitted, standard output is used instead
    - if both files are omitted, the application uses standard input and output
- **recodex-judge-shuffle** is for judging results with the semantics of a set,
  where ordering is not important. Two files are compared with no regard for
  whitespace (whitespace acts only as a token delimiter).

  ```
  Usage: recodex-judge-shuffle [-[n][i][r]] <file1> <file2>
  ```

    - `-n` ignore newlines (a newline is considered only whitespace)
    - `-i` ignore the order of items on a row (tokens on each row may be
      permuted)
    - `-r` ignore the order of rows (rows may be permuted); this option has no
      effect when `-n` is used
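
The comparison rules of the normal and shuffle judges can be approximated in
Python as follows. This is a sketch written from the descriptions above, not
the judges' source code: the numeric tolerance of `1e-6`, the handling of blank
lines, and the function names are assumptions, and the real judges may differ in
such details.

```python
import math


def _tokens_match(expected, actual, as_numbers, tolerance):
    """Compare two tokens, numerically when requested and possible."""
    if as_numbers:
        try:
            return math.isclose(float(expected), float(actual),
                                rel_tol=tolerance, abs_tol=tolerance)
        except ValueError:
            pass  # not numbers, fall back to plain text comparison
    return expected == actual


def _rows(text, ignore_newlines):
    """Split text into rows of tokens; with -n everything is one row."""
    if ignore_newlines:
        return [text.split()]
    return [line.split() for line in text.splitlines()]


def judge_normal(expected_text, actual_text, ignore_newlines=False,
                 as_numbers=False, tolerance=1e-6):
    """Token-by-token comparison; the flags mirror -n and -r."""
    expected_rows = _rows(expected_text, ignore_newlines)
    actual_rows = _rows(actual_text, ignore_newlines)
    if len(expected_rows) != len(actual_rows):
        return False
    for expected_row, actual_row in zip(expected_rows, actual_rows):
        if len(expected_row) != len(actual_row):
            return False
        for expected, actual in zip(expected_row, actual_row):
            if not _tokens_match(expected, actual, as_numbers, tolerance):
                return False
    return True


def judge_shuffle(expected_text, actual_text, ignore_newlines=False,
                  ignore_item_order=False, ignore_row_order=False):
    """Order-insensitive comparison; the flags mirror -n, -i and -r."""
    def normalized(text):
        rows = _rows(text, ignore_newlines)
        if ignore_item_order:
            rows = [sorted(row) for row in rows]
        if ignore_row_order:
            rows = sorted(rows)
        return rows

    return normalized(expected_text) == normalized(actual_text)
```

With `ignore_newlines` set, the whole file becomes a single row, which is why
`ignore_row_order` has no effect in that case, matching the note on `-r` above.
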
## Configuration items
- **submission** -- general information about this particular submission
    - **job-id** -- textual ID which should be unique in the whole ReCodEx
    - _file-collector_ -- URL from which fetch tasks will download data
      (filled in by the API)
    - _log_ -- determines whether job execution will be logged into one shared
      log file; can be omitted, defaults to false
    - **hw-groups** -- a list of hardware groups for which limits are
      specified in this configuration
- **tasks** -- list (not map) of individual tasks
    - **task-id** -- unique identifier of the task in the scope of one
      submission
    - _priority_ -- the higher the number, the higher the priority; defaults
      to 1
    - _fatal-failure_ -- if true, execution of the whole job is stopped when
      this task fails; defaults to false
    - _dependencies_ -- YAML list of dependencies which have to be fulfilled
      before this task is executed; can be omitted if there are no dependencies
    - **cmd** -- description of the command which will be executed
        - **bin** -- the binary itself (absolute path of an external command or
          name of an internal task; job variables can be used)
        - _args_ -- list of arguments which will be passed to the execution
          unit
    - _test-id_ -- ID of the test this task is part of; must be specified for
      tasks on which the result of that particular test depends
    - _type_ -- type of the task; can be omitted, the default value is
      _inner_. Possible values are _inner_, _initiation_, _execution_, and
      _evaluation_. Each logical test must contain zero or more _initiation_
      tasks, at least one task of type _execution_ (exceeded time and memory
      limits are presented to the user), and exactly one task of type
      _evaluation_ (typically a judge). The _inner_ task type is mainly for
      internal tasks, but it can also be used for external tasks which are not
      part of any test.
    - _sandbox_ -- wrapper for external tasks which will run in a sandbox; if
      defined, the task is automatically external
        - **name** -- name of the sandbox to use
        - _stdin_ -- file from which standard input will be redirected; can be
          omitted; job variables can be used, usually `${EVAL_DIR}`; the file
          has to be accessible inside the sandbox
        - _stdout_ -- file to which standard output will be redirected; can be
          omitted; job variables can be used, usually `${EVAL_DIR}`; the file
          has to be accessible inside the sandbox
        - _stderr_ -- file to which error output will be redirected; can be
          omitted; job variables can be used, usually `${EVAL_DIR}`; the file
          has to be accessible inside the sandbox
        - _output_ -- true/false; if true, the output from stdout and stderr
          (in that order) will be written into `result.yaml`; the length limit
          is defined in the worker configuration
        - _chdir_ -- working directory of the executed application
        - _limits_ -- list of limits which can be passed to the sandbox; can be
          omitted, in which case defaults will be used
            - **hw-group-id** -- determines specific limits for specific
              machines
            - _time_ -- time of execution in seconds
            - _wall-time_ -- wall time in seconds
            - _extra-time_ -- extra time which will be added to execution
            - _stack-size_ -- stack size of the executed program in kilobytes
            - _memory_ -- overall memory limit for the application in kilobytes
            - _extra-memory_ -- memory which will be added to the overall
              limit, in kilobytes
            - _parallel_ -- integral number of processes which can run
              simultaneously; time and memory limits are merged from all
              potential processes/threads; 0 for unlimited
            - _disk-size_ -- size of all IO operations from/to files in
              kilobytes
            - _disk-files_ -- number of files which can be opened
            - _environ-variable_ -- wrapper for a map of environment
              variables; merged (union) with the default worker configuration
            - _bound-directories_ -- list of structures representing
              directories which will be visible inside the sandbox; merged
              (union) with the default worker configuration. Each structure
              contains 3 suboptions:
                - **src** -- source pointing to the actual system directory
                  (absolute path)
                - **dst** -- destination inside the sandbox, which can have its
                  own filesystem binding (absolute path inside the sandboxed
                  directory structure)
                - **mode** -- determines the connection mode of the specified
                  directory, one of: RW (allow read-write access), NOEXEC
                  (disallow execution of binaries), FS (mount a device-less
                  filesystem like `/proc`), MAYBE (silently ignore the rule if
                  the bound directory does not exist), DEV (allow access to
                  character and block devices).
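
The interplay of _dependencies_ and _priority_ can be pictured as a topological
sort in which, among the tasks that are ready to run, the one with the highest
priority goes first. The Python sketch below shows one way to compute such an
order. It is an assumption about a reasonable ordering, not a description of
the worker's scheduler; the function name `order_tasks` and the tie-breaking by
position in the list are choices made for this example.

```python
import heapq


def order_tasks(tasks):
    """Return task IDs so that dependencies come first and, among ready
    tasks, higher priority wins."""
    by_id = {task["task-id"]: task for task in tasks}
    position = {task["task-id"]: index for index, task in enumerate(tasks)}
    remaining = {task_id: set(task.get("dependencies", []))
                 for task_id, task in by_id.items()}
    dependents = {task_id: [] for task_id in by_id}
    for task_id, dependencies in remaining.items():
        for dependency in dependencies:
            if dependency not in by_id:
                raise ValueError(f"{task_id} depends on unknown {dependency}")
            dependents[dependency].append(task_id)

    def entry(task_id):
        # heapq is a min-heap, so the priority (default 1) is negated.
        return (-by_id[task_id].get("priority", 1), position[task_id], task_id)

    ready = [entry(task_id) for task_id, deps in remaining.items() if not deps]
    heapq.heapify(ready)

    ordered = []
    while ready:
        _, _, task_id = heapq.heappop(ready)
        ordered.append(task_id)
        for child in dependents[task_id]:
            remaining[child].discard(task_id)
            if not remaining[child]:
                heapq.heappush(ready, entry(child))

    if len(ordered) != len(by_id):
        raise ValueError("dependency cycle detected")
    return ordered


tasks = [
    {"task-id": "a", "priority": 1},
    {"task-id": "b", "priority": 5},
    {"task-id": "c", "priority": 10, "dependencies": ["a"]},
]
print(order_tasks(tasks))  # ['b', 'a', 'c']
```

Task `c` has the highest priority but still runs last, because it cannot start
before its dependency `a` has finished.
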
## Configuration example
This configuration example serves only for demonstration purposes. Some items
can be omitted and defaults from worker configuration will be used.
```{.yml}
---
submission:  # happy hippoes fence
    job-id: hippoes
    file-collector: http://localhost:9999/exercises
    log: true
    hw-groups:
        - group1
tasks:
    - task-id: "compilation"
      priority: 2
      fatal-failure: true
      cmd:
          bin: "/usr/bin/gcc"
          args:
              - "solution.c"
              - "-o"
              - "a.out"
      sandbox:
          name: "isolate"
          limits:
              - hw-group-id: group1
                parallel: 0
    - task-id: "fetch_test_1"
      priority: 4
      fatal-failure: false
      dependencies:
          - compilation
      cmd:
          bin: "fetch"
          args:
              - "1.in"
              - "${SOURCE_DIR}/kuly.in"
    - task-id: "evaluation_test_1"
      priority: 5
      fatal-failure: false
      dependencies:
          - fetch_test_1
      cmd:
          bin: "a.out"
      sandbox:
          name: "isolate"
          limits:
              - hw-group-id: group1
                time: 0.5
                memory: 8192
    - task-id: "fetch_test_solution_1"
      priority: 6
      fatal-failure: false
      dependencies:
          - evaluation_test_1
      cmd:
          bin: "fetch"
          args:
              - "1.out"
              - "${SOURCE_DIR}/1.out"
    - task-id: "judging_test_1"
      priority: 7
      fatal-failure: false
      dependencies:
          - fetch_test_solution_1
      cmd:
          bin: "${JUDGES_DIR}/recodex-judge-normal"
          args:
              - "1.out"
              - "plot.out"
      sandbox:
          name: "isolate"
          limits:
              - hw-group-id: group1
                parallel: 0
    - task-id: "rm_junk_test_1"
      priority: 8
      fatal-failure: false
      dependencies:
          - judging_test_1
      cmd:
          bin: "rm"
          args:
              - "${SOURCE_DIR}/kuly.in"
              - "${SOURCE_DIR}/plot.out"
              - "${SOURCE_DIR}/1.out"
...
```
## Results
Task results are sent back as a YAML file inside a compressed results archive.
This archive can contain additional files, such as job logs and files that were
explicitly copied into the results directory. The results file contains the job
identification and the results of individual tasks.
### Results items
- **job-id** -- identification of the job
- **hw-group** -- hardware group identifier of the worker which performed the
  evaluation
- _error_message_ -- present only if the whole execution failed and none of the
  tasks were executed
- **results** -- list of task results
    - **task-id** -- unique identification of a task in the scope of this job
    - **status** -- one of three states: OK (execution of the task was
      successful; the sandboxed program could have been killed, but the sandbox
      exited normally), FAILED (error while executing the task), SKIPPED
      (execution of the task was skipped)
    - _error_message_ -- defined only for internal tasks on failure
    - _output_ -- output from stdout and stderr of the sandboxed program; the
      length limit is defined in the worker configuration
    - _sandbox_results_ -- if defined, then this task was external and was run
      in a sandbox
        - **exitcode** -- exit code integer
        - **time** -- time in which the program exited, in seconds
        - **wall-time** -- wall time in seconds
        - **memory** -- how much memory the program used, in kilobytes
        - _max-rss_ -- maximum resident set size used, in kilobytes (see the
          manual page of isolate)
        - **status** -- two-letter status code: OK (success), RE (runtime
          error), SG (program died on signal), TO (timed out), XX (internal
          error of the sandbox)
        - _exitsig_ -- description of the exit signal
        - **killed** -- boolean determining whether the program exited
          correctly or was killed
        - _message_ -- status message on failure
### Results example
```{.yml}
---
job-id: 5
hw-group: "group1"
results:
    - task-id: compile1
      status: OK
      sandbox_results:
          exitcode: 0
          time: 5
          wall-time: 5
          memory: 50000
          max-rss: 50000
          status: RE
          exitsig: 1
          killed: true
          message: "Time limit exceeded"
    - task-id: eval1
      status: FAILED
      error_message: "Task failed, something very bad happened!"
    .
    .
    .
...
```
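
A consumer of the results file typically needs only a few fields per task. The
following Python sketch reduces a parsed result document to such a summary. It
assumes the YAML has already been loaded into plain dictionaries and lists (for
example by a third-party YAML library); the function name `summarize_results`
and the shape of the summary are hypothetical.

```python
def summarize_results(result):
    """Reduce a parsed result document to one short record per task."""
    summary = {
        "job-id": result["job-id"],
        # Set only when the whole execution failed.
        "job-error": result.get("error_message"),
        "tasks": {},
    }
    for task in result.get("results", []):
        # sandbox_results is present only for external, sandboxed tasks.
        sandbox = task.get("sandbox_results")
        summary["tasks"][task["task-id"]] = {
            "status": task["status"],
            "sandbox-status": sandbox["status"] if sandbox else None,
            "killed": sandbox["killed"] if sandbox else None,
            "error": task.get("error_message"),
        }
    return summary


# The two tasks of the example, as a YAML parser would return them.
example_result = {
    "job-id": 5,
    "hw-group": "group1",
    "results": [
        {
            "task-id": "compile1",
            "status": "OK",
            "sandbox_results": {
                "exitcode": 0, "time": 5, "wall-time": 5,
                "memory": 50000, "max-rss": 50000, "status": "RE",
                "exitsig": 1, "killed": True,
                "message": "Time limit exceeded",
            },
        },
        {
            "task-id": "eval1",
            "status": "FAILED",
            "error_message": "Task failed, something very bad happened!",
        },
    ],
}

summary = summarize_results(example_result)
failed = [task_id for task_id, info in summary["tasks"].items()
          if info["status"] != "OK"]
print(failed)  # ['eval1']
```

Note that the task-level `status` and the sandbox `status` are independent: in
the example, `compile1` is reported as OK on the task level even though the
sandbox status is RE and the program was killed.
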
<!---
// vim: set formatoptions=tqn flp+=\\\|^\\*\\s* textwidth=80 colorcolumn=+1:
-->