From 1f67a55f71814fae5c41cbbeedbc0d9ca8a977b7 Mon Sep 17 00:00:00 2001 From: Petr Stefan Date: Fri, 20 Jan 2017 22:08:07 +0100 Subject: [PATCH] Shuffle some chapters --- Job-configuration.md | 270 ++++++++++++++++++++++++++++--------------- Rewritten-docs.md | 234 ++++++++++--------------------------- 2 files changed, 239 insertions(+), 265 deletions(-) diff --git a/Job-configuration.md b/Job-configuration.md index 5e3a77e..00242d4 100644 --- a/Job-configuration.md +++ b/Job-configuration.md @@ -2,9 +2,183 @@ Following description will cover configuration as seen from API point of view, worker may have some optional items mandatory (they are filled by API -automatically). Bold items in lists describing the values are mandatory, italic +automatically). Bold items in lists describing the values are mandatory, italic ones are optional. +## Variables + +Because frontend does not know which worker gets the job, it is necessary to be a +little general in configuration file. This means that some worker-specific +things have to be transparent. A good example of this is that some (evaluation) +directories may be placed differently across all workers. To provide a solution, +variables were established. There are of course some restrictions on where +variables can be used. Basically, wherever filesystem paths can be used, +variables can be used. + +Usage of variables in configuration is simple and kind of shell-like. The name of +the variable is put inside braces which are preceded by a dollar sign. Real usage is +then something like this: ${VAR}. There should be no quotes or apostrophes +around the variable name, just simple text in braces. Parsing is simple: whenever +there is a dollar sign with braces, the job execution unit automatically assumes that +this is a variable, so there is no chance to have this kind of substring +anywhere else. 
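The substitution rule described above can be sketched in a few lines of Python. This is only an illustration, not the worker's actual parser: the function name `expand_variables`, the set of characters allowed in a variable name, and raising an error for an unknown variable are all assumptions made for the example.

```python
import re

# Assumption: a variable name consists of letters, digits and underscores.
# The pattern matches a dollar sign followed by a name in braces, e.g. ${EVAL_DIR}.
VAR_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_variables(text, variables):
    """Replace every ${NAME} occurrence in text with variables[NAME]."""

    def substitute(match):
        name = match.group(1)
        # Assumption: an unknown variable is treated as an error.
        if name not in variables:
            raise KeyError("unknown job variable: " + name)
        return variables[name]

    return VAR_PATTERN.sub(substitute, text)


# Hypothetical value, chosen only for this example.
example_path = expand_variables("${EVAL_DIR}/output.txt", {"EVAL_DIR": "/evaluate"})
```

Because every `${...}` substring is treated as a variable, a path that should contain such a substring literally cannot be expressed, which matches the restriction stated above.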
+ +List of usable variables in job configuration: + +- **WORKER_ID** -- integral identification of worker, unique on server +- **JOB_ID** -- identification of this job +- **SOURCE_DIR** -- directory where source codes of job are stored +- **EVAL_DIR** -- evaluation directory which should point inside sandbox. Note + that some existing directory must be bound inside sandbox under **EVAL_DIR** + name using _bound-directories_ directive inside limits section. +- **RESULT_DIR** -- results from job can be copied here, but only with internal + task +- **TEMP_DIR** -- general temp directory which is not dependent on operating + system +- **JUDGES_DIR** -- directory in which judges are stored (outside sandbox) + +## Tasks + +Task is an atomic piece of work executed by worker. There are two basic types of +tasks: + +- **Execute external process** (optionally inside Isolate). External processes + are meant for compilation, testing, or execution of external judges. On Linux, + using the isolate sandbox is mandatory by default; this option is present because + of Windows, where no sandbox is currently available. +- **Perform internal operation**. Internal operations comprise commands which + are typically related to file/directory maintenance and other evaluation + management stuff. A few important examples: + - Create/delete/move/rename file/directory + - (un)zip/tar/gzip/bzip file(s) + - fetch a file from the file repository (either from worker cache or + download it by HTTP GET). + +Even though the internal operations may be handled by external executables +(`mv`, `tar`, `pkzip`, `wget`, ...), it might be better to keep them inside the +worker as it would simplify these operations and their portability among +platforms. Furthermore, it is quite easy to implement them using common +libraries (e.g., _zlib_, _curl_). + +A task may be marked as essential; in such a case, failure will immediately cause +termination of the whole job. A nice example is a task with program +compilation. 
If it does not succeed, it is obvious that the job is broken and every +test will fail. + +### Internal tasks + +- **Archivate task** can be used to pack and compress a directory. Calling + command is `archivate`. Requires two arguments: + - path and name of the directory to be archived + - path and name of the target archive. Only `.zip` format is supported. +- **Extract task** is opposite to archivate task. It can extract different types + of archives. Supported formats are those supported by the `libarchive` library + (see [libarchive wiki](https://github.com/libarchive/libarchive/wiki)), mainly + `zip`, `tar`, `tar.gz`, `tar.bz2` and `7zip`. Please note that the system + administrator may not have installed all packages needed, so some formats may not + work. Please consult your system administrator for more information. Archives + may contain only regular files or directories (i.e., no symlinks, block or + character devices, sockets, or pipes are allowed). Calling command is `extract` and + requires two arguments: + - path and name of the archive to extract + - directory where the archive will be extracted +- **Fetch task** will give you a file. It can be downloaded from remote file + server or just copied from local cache if available. Calling command is + `fetch` with two arguments: + - name of the requested file without path (file sources are set up in worker + configuration file) + - path and name on the destination. Providing a different destination name + can be used for easy rename. +- **Copy task** can copy files and directories. Detailed info can be found on + reference page of + [boost::filesystem::copy](http://www.boost.org/doc/libs/1_60_0/libs/filesystem/doc/reference.html#copy). + Calling command is `cp` and requires two arguments: + - path and name of source target + - path and name of destination target +- **Make directory task** can create an arbitrary number of directories. Calling + command is `mkdir` and requires at least one argument. 
For each provided + argument, the + [boost::filesystem::create_directories](http://www.boost.org/doc/libs/1_60_0/libs/filesystem/doc/reference.html#create_directories) + function is called. +- **Rename task** will rename files and directories. Detailed behavior can be + found on reference page of + [boost::filesystem::rename](http://www.boost.org/doc/libs/1_60_0/libs/filesystem/doc/reference.html#rename). + Calling command is `rename` and requires two arguments: + - path and name of source target + - path and name of destination target +- **Remove task** is for deleting files and directories. Calling command is `rm` + and requires at least one argument. For each provided argument, the + [boost::filesystem::remove_all](http://www.boost.org/doc/libs/1_60_0/libs/filesystem/doc/reference.html#remove_all) + function is called. + +### External tasks + +External tasks are arbitrary executables, typically run inside isolate (with +given parameters) and the worker waits until they finish. The exit code +determines whether the task succeeded (0) or failed (anything else). + +- **stdin** -- can be configured to read from existing file or from `/dev/null`. +- **stdout** and **stderr** -- can be individually redirected to a file or + discarded. If these output options are specified, then it is possible to upload + output files with results by copying them into result directory. +- **limits** -- task has time and memory limits; if these limits are exceeded, + the task fails. + +The task results (exit code, time, and memory consumption, etc.) are saved into +a result YAML file and sent back to frontend application to the address which was +specified on input. + +## Judges + +Judges are treated as normal external commands, so there is no special task type +for them. Binaries are installed alongside the worker executable in standard +directories (on both Linux and Windows systems). 
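Since judges run like any other external task, the exit-code rule from the External tasks section applies to them as well. The following Python sketch only illustrates that rule. It uses no sandbox and enforces no limits, and `run_external_task` is a hypothetical helper, not part of the worker.

```python
import subprocess
import sys


def run_external_task(command):
    """Run a command, wait until it finishes, and classify it by exit code."""
    completed = subprocess.run(
        command,
        stdin=subprocess.DEVNULL,   # corresponds to reading stdin from /dev/null
        stdout=subprocess.PIPE,     # captured here; a worker could redirect it to a file
        stderr=subprocess.PIPE,
    )
    return {
        "exit_code": completed.returncode,
        # Exit code 0 means success, anything else means failure.
        "succeeded": completed.returncode == 0,
        "stdout": completed.stdout.decode(errors="replace"),
    }


# Stand-in commands for the example; a real job would call a compiler or a judge.
ok_result = run_external_task([sys.executable, "-c", "print('1.0')"])
failed_result = run_external_task([sys.executable, "-c", "import sys; sys.exit(1)"])
```

A task marked as essential would additionally stop the whole job when `succeeded` is false; that logic is left out of the sketch.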
+ +Judges should be used for comparison of output files from execution tasks +and sample outputs fetched from fileserver. The result of this comparison should +be at least the information whether the files are the same or not. An extension of this is +a percentage result based on the correctness of the given files. All judge +results have to be printed to standard output. + +All bundled judges are adopted from the old CodEx with only very small modifications. +ReCodEx judges base directory is in `${JUDGES_DIR}` variable, which can be used +in job config file. + +- **recodex-judge-normal** is the base judge used by most exercises. This judge + compares two text files. It compares only text tokens regardless of the amount of + whitespace between them. + ``` + Usage: recodex-judge-normal [-r | -n | -rn] + ``` + - file1 and file2 are paths to files that will be compared + - switch options `-r` and `-n` can be specified as a first optional argument. + - `-n` judge will treat newlines as ordinary whitespace (it will ignore + line breaking) + - `-r` judge will treat tokens as real numbers and compare them + accordingly (with some amount of error) + +- **recodex-judge-filter** can be used to preprocess output files before real + judging. This judge filters C-like comments from a text file. The comment + starts with double slash sequence (`//`) and finishes with newline. If the + comment takes the whole line, then the whole line is filtered. + ``` + Usage: recodex-judge-filter [inputFile [outputFile]] + ``` + - if `outputFile` is omitted, std. output is used instead. + - if both files are omitted, application uses std. input and output. + +- **recodex-judge-shuffle** is for judging results with semantics of set, where + ordering is not important. This judge compares two text files and returns 0 + if they match (and 1 otherwise). Two files are compared with no regard for + whitespace (whitespace acts just as a token delimiter). 
+ ``` + Usage: recodex-judge-shuffle [-[n][i][r]] + ``` + - `-n` ignore newlines (newline is considered only a whitespace) + - `-i` ignore items order on the row (tokens on each row may be permutated) + - `-r` ignore order of rows (rows may be permutated); this option has no + effect when `-n` is used + ## Configuration items - **submission** -- information about this particular submission @@ -179,98 +353,6 @@ tasks: ... ``` - -## Job variables - -Because frontend does not know which worker gets the job, its necessary to be a -little general in configuration file. This means that some worker specific -things has to be transparent. Good example of this is that some (evaluation) -directories may be placed differently across all workers. To provide a solution, -variables were established. There are of course some restrictions where -variables can be used. Basically whenever filesystem paths can be used, -variables can be used. - -Usage of variables in configuration is simple and kind of shell-like. Name of -variable is put inside braces which are preceded with dollar sign. Real usage is -then something like this: ${VAR}. There should be no quotes or apostrophes -around variable name, just simple text in braces. Parsing is simple and whenever -there is dollar sign with braces job execution unit automatically assumes that -this is a variable, so there is no chance to have this kind of substring -anywhere else. - -List of usable variables in job configuration: - -- **WORKER_ID** -- integral identification of worker, unique on server -- **JOB_ID** -- identification of this job -- **SOURCE_DIR** -- directory where source codes of job are stored -- **EVAL_DIR** -- evaluation directory which should point inside sandbox. Note, - that some existing directory must be bound inside sanbox under **EVAL_DIR** - name using _bound-directories_ directive inside limits section. 
-- **RESULT_DIR** -- results from job can be copied here, but only with internal - task -- **TEMP_DIR** -- general temp directory which is not dependent on operating - system -- **JUDGES_DIR** -- directory in which judges are stored (outside sandbox) - -## Directories and Files - -For each job execution unique directory structure is created. Job is not -restricted to use only specified directories (tasks can do whatever is allowed -on system), but it is advised to use them inside a job. Following directories -are created under working directory of the worker for a job execution. This -directory is configurable and can be the same for multiple worker instances. - -- `downloads/${WORKER_ID}/${JOB_ID}` -- where the downloaded archive is saved -- `submission/${WORKER_ID}/${JOB_ID}` -- decompressed submission is stored here -- `eval/${WORKER_ID}/${JOB_ID}` -- this directory is accessible in job - configuration using variables and all execution should happen here -- `temp/${WORKER_ID}/${JOB_ID}` -- directory where all sort of temporary files - can be stored -- `results/${WORKER_ID}/${JOB_ID}` -- again accessible directory from job - configuration which is used to store all files which will be upload on - fileserver, usually there will be only yaml result file and optionally log, - every other file has to be copied here explicitly from job - -## ReCodEx judges - -Below is list of judges which are packed with ReCodEx project. - -- **recodex-judge-normal** is base judge used by most of exercises. This judge - compares two text files. It compares only text tokens regardless on amount of - whitespace between them. - ``` - Usage: recodex-judge-normal [-r | -n | -rn] - ``` - - file1 and file2 are paths to files that will be compared - - switch options `-r` and `-n` can be specified as a 1st optional argument. 
- - `-n` judge will treat newlines as ordinary whitespace (it will ignore - line breaking) - - `-r` judge will treat tokens as real numbers and compares them - accordingly (with some amount of error) - -- **recodex-judge-filter** can be used for preprocess output files before real - judging. This judge filters C-like comments from a text file. The comment - starts with double slash sequence (`//`) and finishes with newline. If the - comment takes whole line, then whole line is filtered. - ``` - Usage: recodex-judge-filter [inputFile [outputFile]] - ``` - - if `outputFile` is ommited, std. output is used instead. - - if both files are ommited, application uses std. input and output. - -- **recodex-judge-shuffle** is for judging results with semantics of set, where - ordering is not important. This judge compares two text files and returns 0 - if they matches (and 1 otherwise). Two files are compared with no regards for - whitespace (whitespace acts just like token delimiter). - ``` - Usage: recodex-judge-shuffle [-[n][i][r]] - ``` - - `-n` ignore newlines (newline is considered only a whitespace) - - `-i` ignore items order on the row (tokens on each row may be permutated) - - `-r` ignore order of rows (rows may be permutated); this option has no - effect when `-n` is used - - ## Results Results of tasks are sent back in YAML format compressed into archive. This @@ -306,7 +388,7 @@ identification and results of individual tasks. killed - **message** -- status message on failure -### Example result file +### Results example ```{.yml} --- diff --git a/Rewritten-docs.md b/Rewritten-docs.md index b9a79e2..529fea0 100644 --- a/Rewritten-docs.md +++ b/Rewritten-docs.md @@ -2567,155 +2567,10 @@ used. ``` - # Implementation -## The backend - -The backend is the part which is hidden to the user and which has only -one purpose: evaluate user’s solutions of their assignments. 
- -@todo: describe the configuration inputs of the Backend - -@todo: describe the outputs of the Backend - -@todo: describe how the backend receives the inputs and how it -communicates the results - -Whole backend is not just one service/component, it is quite complex system on its own. - -@todo: describe the inner parts of the Backend (and refer to the Wiki -for the technical description of the components) - -### Tasks - -Task is an atomic piece of work executed by worker. There are two basic types of -tasks: - -- **Execute external process** (optionally inside Isolate). External processes - are meant for compilation, testing, or execution of external judges. Linux - default is mandatory usage of isolate sandbox, this option is present because - of Windows, where is currently no sandbox available. -- **Perform internal operation**. Internal operations comprise commands, which - are typically related to file/directory maintenance and other evaluation - management stuff. Few important examples: - - Create/delete/move/rename file/directory - - (un)zip/tar/gzip/bzip file(s) - - fetch a file from the file repository (either from worker cache or - download it by HTTP GET). - -Even though the internal operations may be handled by external executables -(`mv`, `tar`, `pkzip`, `wget`, ...), it might be better to keep them inside the -worker as it would simplify these operations and their portability among -platforms. Furthermore, it is quite easy to implement them using common -libraries (e.g., _zlib_, _curl_). - -A task may be marked as essential; in such case, failure will immediately cause -termination of the whole job. Nice example usage is task with program -compilation. Without success it is obvious, that the job is broken and every -test will fail. - -#### Internal tasks - -- **Archivate task** can be used for pack and compress a directory. Calling - command is `archivate`. 
Requires two arguments: - - path and name of the directory to be archived - - path and name of the target archive. Only `.zip` format is supported. -- **Extract task** is opposite to archivate task. It can extract different types - of archives. Supported formats are the same as supports `libarchive` library - (see [libarchive wiki](https://github.com/libarchive/libarchive/wiki)), mainly - `zip`, `tar`, `tar.gz`, `tar.bz2` and `7zip`. Please note, that system - administrator may not install all packages needed, so some formats may not - work. Please, consult your system administrator for more information. Archives - could contain only regular files or directories (ie. no symlinks, block and - character devices sockets or pipes allowed). Calling command is `extract` and - requires two arguments: - - path and name of the archive to extract - - directory, where the archive will be extracted -- **Fetch task** will give you a file. It can be downloaded from remote file - server or just copied from local cache if available. Calling comand is - `fetch` with two arguments: - - name of the requested file without path (file sources are set up in worker - configuratin file) - - path and name on the destination. Providing a different destination name - can be used for easy rename. -- **Copy task** can copy files and directories. Detailed info can be found on - reference page of - [boost::filesystem::copy](http://www.boost.org/doc/libs/1_60_0/libs/filesystem/doc/reference.html#copy). - Calling command is `cp` and require two arguments: - - path and name of source target - - path and name of destination targer -- **Make directory task** can create arbitrary number of directories. Calling - command is `mkdir` and requires at least one argument. For each provided - argument will be called - [boost::filesystem::create_directories](http://www.boost.org/doc/libs/1_60_0/libs/filesystem/doc/reference.html#create_directories) - command. 
-- **Rename task** will rename files and directories. Detailed bahavior can be - found on reference page of - [boost::filesystem::rename](http://www.boost.org/doc/libs/1_60_0/libs/filesystem/doc/reference.html#rename). - Calling command is `rename` and require two arguments: - - path and name of source target - - path and name of destination target -- **Remove task** is for deleting files and directories. Calling command is `rm` - and require at least one argument. For each provided one will be called - [boost::filesystem::remove_all](http://www.boost.org/doc/libs/1_60_0/libs/filesystem/doc/reference.html#remove_all) - command. - - -#### External tasks - -External tasks are arbitrary executables, typically ran inside isolate (with -given parameters) and the worker waits until they finish. The exit code -determines, whether the task succeeded (0) or failed (anything else). - -- **stdin** -- can be configured to read from existing file or from `/dev/null`. -- **stdout** and **stderr** -- can be individually redirected to a file or - discarded. If this output options are specified, than it is possible to upload - output files with results by copying them into result directory. -- **limits** -- task has time and memory limits; if these limits are exceeded, - the task is failed. - -The task results (exit code, time, and memory consumption, etc.) are saved into -result yaml file and sent back to frontend application to address which was -specified on input. - - -#### Judges - -Judges are treated as normal external commands, so there is no special task type -for them. Binaries are installed alongside with worker executable in standard -directories (on both Linux and Windows systems). - -Judges should be used for comparision of outputted files from execution tasks -and sample outputs fetched from fileserver. Results of this comparision should -be at least information if files are same or not. Extension for this is -percentual results based on correctness of given files. 
All of the judges -results have to be printed to standard output. - -All packed judges are adopted from old Codex with only very small modifications. -ReCodEx judges base directory is in `${JUDGES_DIR}` variable, which can be used -in job config file. - -##### Judges interface - -For future extensibility is critical that judges have some shared interface of -calling and return values. - -- Parameters: There are two mandatory positional parameters which has to be - files for comparision -- Results: - - _comparison OK_ - - exitcode: 0 - - stdout: there is one line with double value which should be set to 1.0 - - _comparison BAD_ - - exitcode: 1 - - stdout: there is one line with double value which should be percentage - of correctness of quality of two given files - - _error during execution_ - - exitcode: 2 - - stderr: there should be description of error -### Broker +## Broker @todo: gets stuff done, single point of failure and center point of ReCodEx universe @@ -2724,11 +2579,11 @@ calling and return values. - API notification using curl, authentication using HTTP Basic Auth - asynchronous resending progress messages -### Fileserver +## Fileserver @todo: stores particular data from frontend and backend, hashing, HTTP API -### Worker +## Worker @todo: describe a bit of internal structure in general - two threads @@ -2740,7 +2595,7 @@ calling and return values. @todo: describe how jobs are generally executed -#### Runtime environments +### Runtime environments ReCodEx is designed to utilize a rather diverse set of workers -- there can be differences in many aspects, such as the actual hardware running the worker @@ -2764,17 +2619,53 @@ However, limits can differ between runtime environments -- formally speaking, limits are a function of three arguments: an assignment, a hardware group and a runtime environment. -### Monitor +### Directories and files + +For each job execution unique directory structure is created. 
A job is not +restricted to using only the specified directories (tasks can do whatever is allowed +on the system), but it is advised to use them inside a job. The following directories +are created under the working directory of the worker for a job execution. This +directory is configurable and can be the same for multiple worker instances. + +- `downloads/${WORKER_ID}/${JOB_ID}` -- where the downloaded archive is saved +- `submission/${WORKER_ID}/${JOB_ID}` -- decompressed submission is stored here +- `eval/${WORKER_ID}/${JOB_ID}` -- this directory is accessible in job + configuration using variables and all execution should happen here +- `temp/${WORKER_ID}/${JOB_ID}` -- directory where all sorts of temporary files + can be stored +- `results/${WORKER_ID}/${JOB_ID}` -- a directory again accessible from job + configuration which is used to store all files which will be uploaded to the + fileserver; usually there will be only the YAML result file and optionally a log, + every other file has to be copied here explicitly from the job + +### Judges interface + +For future extensibility it is critical that judges have some shared interface of +calling and return values. + +- Parameters: There are two mandatory positional parameters which have to be + the files for comparison +- Results: + - _comparison OK_ + - exitcode: 0 + - stdout: there is one line with double value which should be set to 1.0 + - _comparison BAD_ + - exitcode: 1 + - stdout: there is one line with double value which should be the percentage + of correctness of quality of the two given files + - _error during execution_ + - exitcode: 2 + - stderr: there should be a description of the error + +## Monitor @todo: not necessary component which can be omitted, proxy-like service -### Cleaner +## Cleaner @todo: if it is something what to say here -## The frontend - -### REST API +## REST API @todo: what to mention - basic - GET, POST, JSON, Header, ... @@ -2785,7 +2676,7 @@ runtime environment. 
- Automatic detection of the runtime environment - users must submit correctly named files, assuming the RTE from the extensions -#### Used technologies +### Used technologies @todo: PHP7 – how it is used for typehints, Nette framework – how it is used for routing, Presenters actions endpoints, exceptions and @@ -2795,7 +2686,7 @@ problem with the extension and how we reported it and how to treat it in the future when the bug is solved. Relational database – we use MariaDB, Doctine enables us to switch the engine to a different engine if needed -#### Data model +### Data model @todo: Describe the code-first approach using the Doctrine entities, how the entities map onto the database schema (refer to the attached schemas @@ -2809,7 +2700,7 @@ grouping of entities and how they are related: - submission + solution + reference solution + solution evaluation - comment threads + comments -#### Request handling +### Request handling A typical scenario for handling an API request is matching the HTTP request with a corresponding handler routine which creates a response object, that is then @@ -2829,11 +2720,11 @@ encoding. This method has to be called in every presenter action. An alternative approach would be using the internal payload object of the presenter, which is more convenient, but provides us with less control. -#### Authentication +### Authentication @todo -#### Permissions +### Permissions In a system storing user data has to be implemented some kind of permission checking. Each user has a role, which corresponds to his/her privileges. @@ -2856,7 +2747,7 @@ or two simple conditions. With this two concepts together it is possible to easily cover all cases of permission checking with quite a small amount of code. -#### Uploading files +### Uploading files There are two cases when users need to upload files using the API -- submitting solutions to an assignment and creating a new exercise. 
In both of these cases, @@ -2869,7 +2760,7 @@ Storing and removing files from the server is done through the `App\Helpers\UploadedFileStorage` class which maps the files to their records in the database using the `App\Model\Entity\UploadedFile` entity. -#### Forgotten password +### Forgotten password When user finds out that he/she does not remember a password, he/she requests a password reset and fills in his/her unique email. A temporary access token is @@ -2884,13 +2775,13 @@ and can be injected into any presenter. This solution is quite safe and user can handle it on its own, so administrator does not have to worry about it. -#### Job configuration parsing and modifying +### Job configuration parsing and modifying @todo how the YAML is parsed @todo how it can be changed and where it is used @todo how it can be stored to a new YAML -#### Solution loading +### Solution loading When a solution evaluation is finished by the backend, the results are saved to the fileserver and the API is notified by the broker. The results are parsed and @@ -2907,7 +2798,7 @@ we do not parse the results right away but we postpone this until the student save some resources when the solution results are not important (e.g., the student finds a bug in his solution before the submission has been evaluated). -##### Parsing of the results +#### Parsing of the results The results are stored in a YAML file. We map the contents of the file to the classes of the `App\Helpers\EvaluationResults` namespace. This process @@ -2915,12 +2806,12 @@ validates the file and gives us access to all of the information through an interface of a class and not only using associative arrays. This is very similar to how the job configuration files are processed. -### API endpoints +## API endpoints @todo: Tell the user about the generated API reference and how the Swagger UI can be used to access the API directly. -### Web application +## Web application @todo: what to mention: - used libraries, JSX, ... 
@@ -3151,11 +3042,12 @@ as a server, its IP address and port is configurable in the API. #### Asynchronous communication between broker and API -Only a fraction of the errors that can happen during evaluation can be detected -while there is a ZeroMQ connection between the API and broker. To notify the -frontend of the rest, the API exposes an endpoint for the broker for this purpose. -Broker uses this endpoint whenever the status of a job changes (it's finished, -it failed permanently, the only worker capable of processing it disconnected...). +Only a fraction of the errors that can happen during evaluation can be detected +while there is a ZeroMQ connection between the API and broker. To notify the +frontend of the rest, the API exposes an endpoint for the broker for this +purpose. Broker uses this endpoint whenever the status of a job changes +(it's finished, it failed permanently, the only worker capable of processing +it disconnected...). When a request for sending a report arrives from the backend then the type of the report is inferred and if it is an error which deserves attention of the