# Assignments
Assignments are programming tasks that can be tested and evaluated by a worker after a user submits their solution. An assignment is described by a YAML file that contains information on how to build, run and test it. One submitted assignment is called a (worker) job.
## Basics
A job is a set of tasks (generally a set, although the order of tasks has some meaning). Tasks may have an arbitrary number of dependencies, which must be respected. When the worker processes a job, it creates a task graph where tasks are vertices and dependencies are edges (A -> B means that task A is on the dependency list of task B, so A must run earlier), and then computes a linear ordering of this graph. The graph must be acyclic (otherwise no linear ordering exists), and the worker attempts to execute as many tasks as possible. Tasks without dependencies can be executed directly; other tasks are executed once all their dependencies have completed successfully.
Tasks are executed sequentially, following the linear ordering of the task graph. Parallel tasks (tasks that do not depend on each other, so their relative order is arbitrary) are ordered first by priority (a higher number means a higher priority) and then by their order in the configuration file. Priorities are therefore important for specifying the evaluation flow. See the sample picture for illustration.
![Task serialization](https://github.com/ReCodEx/wiki/raw/master/images/Assignment_overview.png)
Each task has a unique ID (an alphanumeric string such as _CompileA_, _RunAA_, or _JudgeAB_ in the picture). These IDs identify tasks in dependency references, in the log, and so on. The numbers in the bottom right corner are the task priorities; a higher number means a higher priority. Consequently, once task _RunAA_ is done, the next task must be _JudgeAA_ and not _RunAB_ (which would also be a valid linear ordering, but _RunAB_ has a lower priority).
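The ordering rule can be sketched as Kahn's topological sort in which the set of ready tasks is a priority queue keyed by (priority, position in the configuration file). This is an illustrative sketch, not the worker's implementation; the `Task` class and `linear_order` function are hypothetical names, and the priorities in the usage example are made up rather than read from the picture.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class Task:
    task_id: str
    priority: int
    dependencies: list = field(default_factory=list)


def linear_order(tasks: list) -> list:
    """Topologically sort tasks; among ready tasks, pick the highest
    priority first, then the one listed earlier in the configuration."""
    index = {t.task_id: i for i, t in enumerate(tasks)}
    by_id = {t.task_id: t for t in tasks}
    missing = {t.task_id: len(t.dependencies) for t in tasks}
    dependents = {t.task_id: [] for t in tasks}
    for t in tasks:
        for dep in t.dependencies:
            dependents[dep].append(t.task_id)

    # heapq is a min-heap, so the priority is negated to pop the highest first
    ready = [(-t.priority, index[t.task_id], t.task_id)
             for t in tasks if not t.dependencies]
    heapq.heapify(ready)
    order = []
    while ready:
        _, _, tid = heapq.heappop(ready)
        order.append(tid)
        for child in dependents[tid]:
            missing[child] -= 1
            if missing[child] == 0:  # all dependencies of child are done
                c = by_id[child]
                heapq.heappush(ready, (-c.priority, index[child], child))
    if len(order) != len(tasks):
        raise ValueError("task graph contains a cycle")
    return order


if __name__ == "__main__":
    example = [
        Task("CompileA", 1),
        Task("RunAA", 4, ["CompileA"]),
        Task("JudgeAA", 5, ["RunAA"]),
        Task("RunAB", 2, ["CompileA"]),
        Task("JudgeAB", 3, ["RunAB"]),
    ]
    print(linear_order(example))
    # ['CompileA', 'RunAA', 'JudgeAA', 'RunAB', 'JudgeAB']
```

In the example, _JudgeAA_ (priority 5) is preferred over the already-ready _RunAB_ (priority 2), which matches the behavior described above. The sketch only computes the order; skipping tasks whose dependencies failed is not modeled.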
## Task
A task is an atomic piece of work executed by the worker. There are two basic types of tasks:
- **Execute external process** (optionally inside Isolate). External processes are meant for compilation, testing, or execution of external judges. On Linux, running inside the Isolate sandbox is mandatory; the option to run without it exists only because no sandbox is currently available on Windows.
- **Perform internal operation**. Internal operations comprise commands that are typically related to file/directory maintenance and other evaluation management tasks. A few important examples:
    - create/delete/move/rename a file/directory
    - (un)zip/tar/gzip/bzip file(s)
    - fetch a file from the file repository (either from the worker cache or by downloading it via HTTP GET)
Even though internal operations could be handled by external executables (`mv`, `tar`, `pkzip`, `wget`, ...), it might be better to keep them inside the worker, as this simplifies these operations and improves their portability across platforms. Furthermore, they are quite easy to implement using common libraries (e.g., _zlib_, _curl_).
A task may be marked as essential; in that case, its failure immediately terminates the whole job. A typical example is the compilation task: if compilation fails, the job is obviously broken and every test would fail.
### Internal tasks
- **Archivate task** can be used for packing and compressing a directory. The calling command is `archivate`, and it requires two arguments:
    - path and name of the directory to be archived
    - path and name of the target archive. Only the `.zip` format is supported.
- **Extract task** is the opposite of the archivate task. It can extract various types of archives. The supported formats are those supported by the `libarchive` library (see [libarchive wiki](https://github.com/libarchive/libarchive/wiki)), mainly `zip`, `tar`, `tar.gz`, `tar.bz2` and `7zip`. Note that the system administrator may not have installed all the required packages, so some formats may not work; consult your system administrator for more information. Archives may contain only regular files and directories (i.e., no symlinks, block or character devices, sockets, or pipes). The calling command is `extract`, and it requires two arguments:
    - path and name of the archive to extract
    - directory where the archive will be extracted
- **Fetch task** provides a file. The file is either downloaded from a remote file server or copied from the local cache if available. The calling command is `fetch`, with two arguments:
    - name of the requested file without a path (file sources are set up in the worker configuration file)
    - path and name of the destination. Providing a different destination name can be used to rename the file.
- **Copy task** copies files and directories. Detailed information can be found on the reference page of [boost::filesystem::copy](http://www.boost.org/doc/libs/1_60_0/libs/filesystem/doc/reference.html#copy). The calling command is `cp`, and it requires two arguments:
    - path and name of the source
    - path and name of the destination
- **Make directory task** creates an arbitrary number of directories. The calling command is `mkdir`, and it requires at least one argument. [boost::filesystem::create_directories](http://www.boost.org/doc/libs/1_60_0/libs/filesystem/doc/reference.html#create_directories) is called for each provided argument.
- **Rename task** renames files and directories. Detailed behavior can be found on the reference page of [boost::filesystem::rename](http://www.boost.org/doc/libs/1_60_0/libs/filesystem/doc/reference.html#rename). The calling command is `rename`, and it requires two arguments:
    - path and name of the source
    - path and name of the destination
- **Remove task** deletes files and directories. The calling command is `rm`, and it requires at least one argument. [boost::filesystem::remove_all](http://www.boost.org/doc/libs/1_60_0/libs/filesystem/doc/reference.html#remove_all) is called for each provided argument.
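To make the semantics of the internal tasks more concrete, here is a rough Python approximation built on the standard `os` and `shutil` modules. It is only a sketch under stated assumptions: the worker relies on Boost.Filesystem and libarchive as described above, the dispatch table `INTERNAL_TASKS` and the helper names are hypothetical, `fetch` is omitted because it depends on the worker's file server configuration, and `shutil` cannot unpack 7zip archives.

```python
import os
import shutil


def remove_all(path: str) -> None:
    """Approximates boost::filesystem::remove_all: delete a file or a whole
    directory tree; a path that does not exist is silently ignored."""
    if os.path.isdir(path) and not os.path.islink(path):
        shutil.rmtree(path)
    elif os.path.lexists(path):
        os.remove(path)


def archivate(src_dir: str, archive: str) -> None:
    # Only .zip output is supported, as with the archivate task.
    if not archive.endswith(".zip"):
        raise ValueError("only .zip archives are supported")
    shutil.make_archive(archive[: -len(".zip")], "zip", root_dir=src_dir)


def extract(archive: str, dst_dir: str) -> None:
    # shutil handles zip and tar variants, but not 7zip (unlike libarchive).
    shutil.unpack_archive(archive, dst_dir)


def cp(src: str, dst: str) -> None:
    # Directories are copied recursively here, which may differ from
    # the exact semantics of boost::filesystem::copy.
    if os.path.isdir(src):
        shutil.copytree(src, dst)
    else:
        shutil.copy2(src, dst)


def mkdir(*dirs: str) -> None:
    for d in dirs:
        # Like create_directories: missing parents are created as well.
        os.makedirs(d, exist_ok=True)


def rm(*paths: str) -> None:
    for p in paths:
        remove_all(p)


# Hypothetical dispatch table keyed by the `bin` value of an internal task.
INTERNAL_TASKS = {
    "archivate": archivate,
    "extract": extract,
    "cp": cp,
    "mkdir": mkdir,
    "rename": os.rename,
    "rm": rm,
}


def run_internal(bin_name: str, args: list) -> None:
    INTERNAL_TASKS[bin_name](*args)
```

For example, `run_internal("mkdir", ["/tmp/a/b", "/tmp/c"])` creates both directories including the missing parent `/tmp/a`.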
### External tasks
External tasks are arbitrary executables, typically run inside Isolate (with the given parameters); the worker waits until they finish. The exit code determines whether the task succeeded (0) or failed (anything else).
- **stdin** -- can be configured to read from an existing file or from `/dev/null`.
- **stdout** and **stderr** -- can be individually redirected to a file or discarded. If these outputs are redirected to files, the files can be uploaded with the results by copying them into the result directory.
- **limits** -- a task has time and memory limits; if these limits are exceeded, the task fails.
The task results (exit code, time and memory consumption, etc.) are saved into the result YAML file and sent back to the frontend application at the address specified in the input.
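The exit code convention and the stream options can be sketched with Python's `subprocess` module. This is not how the worker runs programs: there is no sandbox here, memory limits are not enforced, and the wall time limit is approximated by the `timeout` argument of `subprocess.run`. The function `run_external` and its parameters are hypothetical.

```python
import subprocess
from typing import Optional


def run_external(cmd: list, stdin_path: Optional[str] = None,
                 stdout_path: Optional[str] = None,
                 stderr_path: Optional[str] = None,
                 wall_time: Optional[float] = None) -> bool:
    """Run cmd without any sandbox and report success (exit code 0).

    stdin defaults to /dev/null; stdout and stderr are discarded unless a
    path is given. Exceeding wall_time seconds counts as a failure.
    """
    def open_or_null(path, mode):
        return open(path, mode) if path else subprocess.DEVNULL

    streams = [open_or_null(stdin_path, "rb"),
               open_or_null(stdout_path, "wb"),
               open_or_null(stderr_path, "wb")]
    try:
        result = subprocess.run(cmd, stdin=streams[0], stdout=streams[1],
                                stderr=streams[2], timeout=wall_time)
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed the child at this point
        return False
    finally:
        for s in streams:
            if not isinstance(s, int):  # DEVNULL is an int sentinel
                s.close()
```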
### Judges
Judges are treated as normal external commands, so there is no special task type for them. Their binaries are installed alongside the worker executable in standard directories (on both Linux and Windows).
Judges should be used to compare output files produced by execution tasks with sample outputs fetched from the fileserver. The result of the comparison should at least indicate whether the files are the same. As an extension, judges may return a percentage based on the correctness of the given files. All judge results have to be printed to standard output.
All bundled judges are adapted from the old Codex with only minor modifications. The ReCodEx judges base directory is stored in the `${JUDGES_DIR}` variable, which can be used in the job configuration file.
#### Judges interface
For future extensibility, it is critical that all judges share a common calling interface and return values.
- Parameters: two mandatory positional parameters, the paths of the two files to compare
- Results:
    - _comparison OK_
        - exitcode: 0
        - stdout: a single line with a floating-point value expressing the percentage of correctness (match quality) of the two given files
    - _error during execution_
        - exitcode: 1
        - stderr: a description of the error
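A minimal judge following this interface could look as follows. It is a hypothetical sketch, not one of the bundled judges: `simple-judge` is an invented name, it only distinguishes identical from different token sequences, and it prints `1.0` or `0.0` assuming the 0 to 1 scale used in the Scoring section. All whitespace, including newlines, acts as a token delimiter.

```python
import sys


def judge(path1: str, path2: str) -> float:
    """Compare two files token by token, ignoring the amount of whitespace
    (line breaks included)."""
    with open(path1) as f1, open(path2) as f2:
        return 1.0 if f1.read().split() == f2.read().split() else 0.0


def main(argv: list) -> int:
    if len(argv) != 3:
        print("usage: simple-judge <file1> <file2>", file=sys.stderr)
        return 1  # error during execution
    try:
        score = judge(argv[1], argv[2])
    except OSError as e:
        print(f"error: {e}", file=sys.stderr)
        return 1  # error during execution
    print(score)  # one line with the result on standard output
    return 0  # comparison OK


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```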
#### ReCodEx judges
Below is a list of judges that are bundled with the ReCodEx project and comply with the requirements above.
- **recodex-judge-normal** is the base judge used by most exercises. It compares two text files, considering only text tokens regardless of the amount of whitespace between them.
    ```
    Usage: recodex-judge-normal [-r | -n | -rn] <file1> <file2>
    ```
    - `file1` and `file2` are paths to the files to be compared
    - the switches `-r` and `-n` can be specified as an optional first argument:
        - `-n` the judge treats newlines as ordinary whitespace (it ignores line breaks)
        - `-r` the judge treats tokens as real numbers and compares them accordingly (with a small tolerance)
- **recodex-judge-filter** can be used to preprocess output files before the actual judging. It filters C-like comments out of a text file. A comment starts with a double slash (`//`) and ends at the end of the line. If a comment takes up the whole line, the whole line is removed.
    ```
    Usage: recodex-judge-filter [inputFile [outputFile]]
    ```
    - if `outputFile` is omitted, standard output is used instead
    - if both files are omitted, the application uses standard input and output
- **recodex-judge-shuffle** is for judging results with set semantics, where ordering is not important. It compares two text files and returns 0 if they match (and 1 otherwise). The files are compared regardless of whitespace (whitespace acts only as a token delimiter).
    ```
    Usage: recodex-judge-shuffle [-[n][i][r]] <file1> <file2>
    ```
    - `-n` ignore newlines (a newline is treated as ordinary whitespace)
    - `-i` ignore the order of items on a row (tokens on each row may be permuted)
    - `-r` ignore the order of rows (rows may be permuted); this option has no effect when `-n` is used
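The effect of the three `recodex-judge-shuffle` options can be sketched as a normalization applied to both files before comparing them. This is a hypothetical approximation, not the judge's source: `shuffle_equal` is an invented name, it returns a boolean instead of an exit code, tokens are compared as plain strings (as multisets, so duplicates matter), and blank lines are assumed to be ignored.

```python
def shuffle_equal(text1: str, text2: str, n: bool = False,
                  i: bool = False, r: bool = False) -> bool:
    """Compare two outputs under the -n / -i / -r options described above."""
    def normalize(text: str) -> list:
        if n:
            # -n: newlines are plain whitespace, so the file is one long row
            rows = [text.split()]
        else:
            # assumption: blank lines are ignored
            rows = [line.split() for line in text.splitlines() if line.split()]
        if i:
            # -i: the order of tokens within a row does not matter
            rows = [sorted(row) for row in rows]
        if r and not n:
            # -r: the order of rows does not matter (no effect with -n)
            rows = sorted(rows)
        return rows

    return normalize(text1) == normalize(text2)
```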
## Job configuration
The job configuration passed to the worker is generated on demand by the web API; each job has its own unique configuration.
### Configuration items
The following list describes the allowed options. Mandatory items are in bold, optional ones in italics.
- **submission** -- information about this particular submission
    - **job-id** -- textual ID, which should be unique across the whole ReCodEx instance
    - **language** -- has no specific function, used only for debugging and clarity
    - **file-collector** -- address from which fetch tasks download data
    - _log_ -- determines whether the job execution will be logged into one shared log; defaults to false and can be omitted
    - **hw-groups** -- list of hardware groups for which this configuration specifies limits
- **tasks** -- list (not map) of individual tasks
    - **task-id** -- unique identifier of the task within one submission
    - **priority** -- a higher number means a higher priority
    - **fatal-failure** -- if true, the execution of the whole job is stopped when this task fails
    - _dependencies_ -- list of tasks that have to be completed before this task; can be omitted if there are no dependencies
    - **cmd** -- description of the command to be executed
        - **bin** -- the binary itself (full path of an external command or name of an internal task)
        - _args_ -- list of arguments passed to the execution unit
    - _test-id_ -- ID of the test this task is part of; must be specified for tasks on which the particular test's result depends
    - _type_ -- type of the task; can be omitted, the default value is _inner_. Possible values are _inner_, _initiation_, _execution_ and _evaluation_. Each logical test must contain zero or more _initiation_ tasks, at least one task of type _execution_ (exceeded time and memory limits are presented to the user) and exactly one task of type _evaluation_ (typically a judge). The _inner_ type is mainly for internal tasks, but it can also be used for external tasks that are not part of any test.
    - _sandbox_ -- wrapper for external tasks that run in a sandbox; if defined, the task is automatically external
        - **name** -- name of the sandbox used
        - _stdin_ -- file to which standard input is redirected; can be omitted
        - _stdout_ -- file to which standard output is redirected; can be omitted
        - _stderr_ -- file to which error output is redirected; can be omitted
        - **limits** -- list of limits passed to the sandbox
            - **hw-group-id** -- determines specific limits for specific machines
            - _time_ -- execution time limit in seconds
            - _wall-time_ -- wall time limit in seconds
            - _extra-time_ -- extra time added to the execution
            - _stack-size_ -- stack size of the executed program in kilobytes
            - _memory_ -- overall memory limit for the application in kilobytes
            - _parallel_ -- integer number of processes that can run simultaneously; time and memory limits are aggregated over all potential processes/threads
            - _disk-size_ -- total size of all I/O operations from/to files in kilobytes
            - _disk-files_ -- number of files that can be opened
            - _environ-variable_ -- wrapper for a map of environment variables; merged (union) with the default worker configuration
            - _chdir_ -- working directory of the executed application
            - _bound-directories_ -- list of structures representing directories that will be visible inside the sandbox; merged (union) with the default worker configuration. It has 3 suboptions: **src** -- source pointing to the actual system directory, **dst** -- destination inside the sandbox, which can have its own filesystem binding, and **mode** -- connection mode of the specified directory, one of: RW (allow read-write access), NOEXEC (disallow execution of binaries), FS (mount a device-less filesystem like `/proc`), MAYBE (silently ignore the rule if the bound directory does not exist), DEV (allow access to character and block devices).
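As a rough illustration of the rules above, the following sketch checks the `tasks` section of a configuration that has already been parsed from YAML into Python dictionaries (parsing itself is not shown). The function `validate_tasks` and its messages are hypothetical, and the worker may validate differently. Of the per-test rules, only "exactly one _evaluation_ task" is checked, because the requirement on _execution_ tasks is worded differently in the Scoring section.

```python
from collections import Counter

MANDATORY_TASK_KEYS = ("task-id", "priority", "fatal-failure", "cmd")


def validate_tasks(tasks: list) -> list:
    """Return human-readable problems found in a parsed `tasks` list."""
    problems = []
    ids = [t.get("task-id") for t in tasks]
    for tid, count in Counter(ids).items():
        if count > 1:
            problems.append(f"duplicate task-id {tid!r}")
    known = set(ids)
    types_per_test = {}  # test-id -> Counter of task types
    for t in tasks:
        tid = t.get("task-id")
        for key in MANDATORY_TASK_KEYS:
            if key not in t:
                problems.append(f"task {tid!r}: missing {key!r}")
        if "cmd" in t and "bin" not in t["cmd"]:
            problems.append(f"task {tid!r}: cmd has no 'bin'")
        for dep in t.get("dependencies", []):
            if dep not in known:
                problems.append(f"task {tid!r}: unknown dependency {dep!r}")
        if "test-id" in t:
            counts = types_per_test.setdefault(t["test-id"], Counter())
            counts[t.get("type", "inner")] += 1  # default type is inner
    for test_id, counts in types_per_test.items():
        if counts["evaluation"] != 1:
            problems.append(f"test {test_id!r}: needs exactly one evaluation task")
    return problems
```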
### Configuration example
This configuration example is written in YAML and serves for demonstration purposes only. Some items can be omitted, in which case defaults from the worker configuration are used.
```{.yml}
---
submission: # happy hippoes fence
    job-id: hippoes
    language: c
    file-collector: http://localhost:9999/tasks
    log: true
    hw-groups:
        - group1
tasks:
    - task-id: "compilation"
      priority: 2
      fatal-failure: true
      cmd:
          bin: "/usr/bin/gcc"
          args:
              - "solution.c"
              - "-o"
              - "a.out"
      sandbox:
          name: "isolate"
          limits:
              - hw-group-id: group1
                parallel: 0
                chdir: ${EVAL_DIR}
                bound-directories:
                    - src: ${SOURCE_DIR}
                      dst: ${EVAL_DIR}
                      mode: RW
    - task-id: "fetch_test_1"
      priority: 4
      fatal-failure: false
      dependencies:
          - compilation
      cmd:
          bin: "fetch"
          args:
              - "1.in"
              - "${SOURCE_DIR}/kuly.in"
    - task-id: "evaluation_test_1"
      priority: 5
      fatal-failure: false
      dependencies:
          - fetch_test_1
      cmd:
          bin: "a.out"
      sandbox:
          name: "isolate"
          limits:
              - hw-group-id: group1
                time: 0.5
                memory: 8192
                chdir: ${EVAL_DIR}
                bound-directories:
                    - src: ${SOURCE_DIR}
                      dst: ${EVAL_DIR}
                      mode: RW
    - task-id: "fetch_test_solution_1"
      priority: 6
      fatal-failure: false
      dependencies:
          - evaluation_test_1
      cmd:
          bin: "fetch"
          args:
              - "1.out"
              - "${SOURCE_DIR}/1.out"
    - task-id: "judging_test_1"
      priority: 7
      fatal-failure: false
      dependencies:
          - fetch_test_solution_1
      cmd:
          bin: "${JUDGES_DIR}/recodex-judge-normal"
          args:
              - "1.out"
              - "plot.out"
      sandbox:
          name: "isolate"
          limits:
              - hw-group-id: group1
                parallel: 0
                chdir: ${EVAL_DIR}
                bound-directories:
                    - src: ${SOURCE_DIR}
                      dst: ${EVAL_DIR}
                      mode: RW
    - task-id: "rm_junk_test_1"
      priority: 8
      fatal-failure: false
      dependencies:
          - judging_test_1
      cmd:
          bin: "rm"
          args:
              - "${SOURCE_DIR}/kuly.in"
              - "${SOURCE_DIR}/plot.out"
              - "${SOURCE_DIR}/1.out"
...
```
## Job variables
Because the frontend does not know which worker will receive the job, the configuration file has to be somewhat generic, and worker-specific details must be abstracted away. For example, evaluation directories may be located differently on each worker. Variables were introduced to solve this. Variables cannot be used everywhere; in general, they can be used wherever a filesystem path is expected.
Variables are used in a simple, shell-like way: the variable name is enclosed in braces preceded by a dollar sign, for example `${VAR}`. There should be no quotes or apostrophes around the variable name, just plain text in braces. Parsing is simple: whenever the job execution unit encounters a dollar sign followed by braces, it assumes a variable, so such a substring cannot be used anywhere else in the configuration for any other purpose.
Variables available in the job configuration:
- **WORKER_ID** -- integer identifier of the worker, unique on the server
- **JOB_ID** -- identifier of this job
- **SOURCE_DIR** -- directory where the source codes of the job are stored
- **EVAL_DIR** -- evaluation directory, which should point inside the sandbox. Note that an existing directory must be bound inside the sandbox under the **EVAL_DIR** name using the _bound-directories_ directive in the limits section.
- **RESULT_DIR** -- results of the job can be copied here, but only by an internal task
- **TEMP_DIR** -- general temporary directory, independent of the operating system
- **JUDGES_DIR** -- directory in which the judges are stored (outside the sandbox)
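A minimal sketch of the substitution rule described above, assuming a plain mapping from variable names to values: every `${NAME}` is replaced, and an unknown name raises `KeyError`. How the worker actually treats unknown variables is not described here, so that behavior is an assumption, as are the function name `expand` and the sample paths.

```python
import re

# ${NAME}: a dollar sign followed by a name in braces, with no quotes
VARIABLE = re.compile(r"\$\{([^}]*)\}")


def expand(value: str, variables: dict) -> str:
    """Replace every ${NAME} in value; unknown names raise KeyError."""
    return VARIABLE.sub(lambda m: variables[m.group(1)], value)


if __name__ == "__main__":
    # Hypothetical worker-specific values
    worker_vars = {"SOURCE_DIR": "/tmp/src", "JUDGES_DIR": "/opt/judges"}
    print(expand("${SOURCE_DIR}/kuly.in", worker_vars))  # /tmp/src/kuly.in
```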
## Directories and Files
A unique directory structure is created for each job execution. A job is not restricted to these directories (tasks can do whatever the system allows), but using them within a job is advised. The DEFAULT variable represents the root working directory for all workers. This directory is specified in the worker's configuration and can be the same for multiple instances on the same server. No variable of this name is defined for use in the job YAML configuration; it is used only in this description.
List of temporary directories for job execution:
- **\${DEFAULT}/downloads/\${WORKER_ID}/\${JOB_ID}** -- where the downloaded archive is saved
- **\${DEFAULT}/submission/\${WORKER_ID}/\${JOB_ID}** -- the decompressed submission is stored here
- **\${DEFAULT}/eval/\${WORKER_ID}/\${JOB_ID}** -- this directory is accessible in the job configuration using variables, and all execution should happen here
- **\${DEFAULT}/temp/\${WORKER_ID}/\${JOB_ID}** -- directory where all sorts of temporary files can be stored
- **\${DEFAULT}/results/\${WORKER_ID}/\${JOB_ID}** -- also accessible from the job configuration; stores all files that will be uploaded to the fileserver. Usually these are only the YAML result file and optionally the log; any other file has to be copied here explicitly by the job
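The layout above can be expressed as a small helper that derives all per-job paths from DEFAULT, WORKER_ID and JOB_ID. This is a sketch only: `job_directories` is a hypothetical name, and `/var/recodex` in the test is an arbitrary stand-in for DEFAULT.

```python
from pathlib import Path

# Subdirectories listed above, each created per worker and per job
JOB_SUBDIRS = ("downloads", "submission", "eval", "temp", "results")


def job_directories(default: str, worker_id: int, job_id: str) -> dict:
    """Map each subdirectory name to DEFAULT/<name>/WORKER_ID/JOB_ID."""
    root = Path(default)
    return {name: root / name / str(worker_id) / job_id for name in JOB_SUBDIRS}
```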
## Results
Task results are sent back as a YAML file compressed into an archive. This archive can contain further files, such as job logging information and files that were explicitly copied into the results directory. The results file contains the job identification and the results of the individual tasks.
### Results items
Items of the results file. Mandatory items are in bold, optional ones in italics.
- **job-id** -- identification of the job to which these results belong
- **hw-group** -- hardware group identifier of the worker that performed the evaluation
- _error_message_ -- present only if the whole execution failed and none of the tasks were executed
- **results** -- list of task results
    - **task-id** -- unique identification of the task within this job
    - **status** -- one of three states: OK (the task was executed successfully; a sandboxed program may have been killed, but the sandbox itself exited normally), FAILED (an error occurred while executing the task), SKIPPED (the execution of the task was skipped)
    - _error_message_ -- defined only for internal tasks on failure
    - _sandbox_results_ -- if defined, the task was external and was run in a sandbox
        - **exitcode** -- integer exit code returned by the executed program
        - **time** -- run time of the program in seconds
        - **wall-time** -- wall time in seconds
        - **memory** -- memory used by the program in kilobytes
        - **max-rss** -- maximum resident set size used, in kilobytes
        - **status** -- two-letter status code: OK (success), RE (runtime error), SG (program died on a signal), TO (timed out), XX (internal error of the sandbox)
        - **exitsig** -- description of the exit signal
        - **killed** -- boolean indicating whether the program was killed instead of exiting on its own
        - **message** -- status message on failure
### Example result file
```{.yml}
---
job-id: 5
hw-group: "group1"
results:
    - task-id: compile1
      status: OK
      sandbox_results:
          exitcode: 0
          time: 5
          wall-time: 5
          memory: 50000
          max-rss: 50000
          status: RE
          exitsig: 1
          killed: true
          message: "Time limit exceeded"
    - task-id: eval1
      status: FAILED
      error_message: "Task failed, something very bad happened!"
    # ...
...
```
## Scoring
Every assignment consists of tasks, but only some of them are part of the evaluation. These tasks are grouped into **tests**. Each task may have a _test-id_ parameter, as described above. Every test must consist of at least one task: an _evaluation_ by a judge. There can be zero or more tasks of the _initiation_ and _execution_ types, but there is exactly one _evaluation_ task. _Execution_ tasks retrieve information about the execution, such as elapsed time and memory consumed, while the _evaluation_ task provides a score, a float number between 0 and 1. _Initiation_ tasks are used, for example, for input data generation.
The total score of the assignment submission is then calculated according to a supplied score config (described below). The total score is also a float between 0 and 1. This number is then multiplied by the maximum number of points for the assignment, as set by the teacher assigning the exercise (not by the exercise author).
### Simple score calculation
The first implemented calculator is a simple score calculator with test weights. It takes the score of each test and combines the scores according to the test weights specified in the assignment configuration: the resulting score is the sum of the products of each test's score and weight, divided by the sum of all weights. In Python, the algorithm looks roughly like this:
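```python
def simple_score(tests):
    """Weighted average of test scores; each test has .score and .weight."""
    total = 0.0
    weight_sum = 0.0
    for t in tests:
        total += t.score * t.weight
        weight_sum += t.weight
    # assumption: an empty or all-zero weight set yields a score of 0
    return total / weight_sum if weight_sum else 0.0
```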
Sample score config in YAML format:
```{.yml}
testWeights:
    a: 300 # test with id 'a' has a weight of 300
    b: 200
    c: 100
    d: 100
```
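As a worked example with the sample weights above, assume (hypothetically) that tests a, b, c and d score 1.0, 0.5, 0.0 and 1.0:

$$
\text{score} = \frac{300 \cdot 1.0 + 200 \cdot 0.5 + 100 \cdot 0.0 + 100 \cdot 1.0}{300 + 200 + 100 + 100} = \frac{500}{700} \approx 0.714
$$

With a hypothetical maximum of 10 points for the assignment, the submission would receive about 7.14 points.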
### Logs
During execution, tasks can use one shared log. Multiple logs (for example, one per task) would be of no use, because only a small amount of information is logged. Logging is disabled by default and can be enabled in the job configuration.
After execution, the log is packed with the results into an archive and sent back to the fileserver, where it is available for further processing.
## Case study
We present some of the courses that might use ReCodEx to evaluate homework
assignments and outline the setup of the evaluation with respect to the concept
of stages.
### Simple programming exercises
This category covers, for example, introductory programming courses such as
Programming I or Java programming.
In the simplest case we only need one stage that builds the program and passes
the test inputs to its standard input. Outputs are compared with the default judge.
### Compiler principles
This course uses multiple tools in a pipeline-like fashion -- for example `flex`
and `bison`.
We create a stage for each step of this pipeline: we run flex and test the
output, then we run bison on top of the previous stage's results and do the same. This is a more advanced configuration, and ReCodEx is specifically designed to support such evaluation pipelines.
### XML technologies
In this course, students choose a topic they model using XML -- for example a
library or a bulletin board. During the semester, they expand this project by
adding XSLT transformations, XQuery scripts, XPath queries, etc. These are
tested against fixed requirements (e.g. using some particular language
constructs).
This course already has a rather sophisticated application for testing homework
assignments, so we only include it for demonstration purposes.
Because every assignment focuses on a different technology, we would need a new
type of stage for each one. These stages would only run some checker programs
against the submitted sources (and possibly check their syntax, etc.). ReCodEx is not primarily intended for static analysis, but it is certainly possible.
### Non-procedural programming
This course differs from other programming courses because input/output
handling is taught only at the end of the semester. In their assignments,
students are mostly required to write a function/predicate that behaves
according to a specification (e.g. appends an item at the end of a list).
Due to this, we need to take the function submitted by a student and combine it
with a snippet of code that reads the standard input and calls the submitted
function. This could be nicely achieved by setting the build command.
### Operating systems
The operating systems course requires students to work on a simple OS kernel
that is then run in a MIPS simulator called `msim`. There are various tests that
check whether the student's implementation of core OS mechanisms is correct. These
tests are compiled into the kernel.
Each of these tests could be represented by a stage that compiles the kernel
with the test and then runs it against different configurations of `msim`.