master
Martin Polanka 8 years ago
parent 73b5e69efe
commit e43f75195c

@ -6,7 +6,7 @@
![Overall Architecture](https://github.com/ReCodEx/wiki/blob/master/images/Overall_Architecture.png)
**Web app** is main part of whole project from user point of view. It provides nice user interface and it's the only part, that interacts with outside world directly. **Web API** contains almost all logic of the app including _user management and authentication_, _storing and versioning files_ (with help of **File server**), _counting and assigning points_ to users etc. Advanced users may connect to the API directly or may create custom fronends. **Broker** is essential part of whole architecture. It maintains list of available **Workers**, receives submissions from the **Web API** and routes them further and reports progress of evaluations back to the **Web app**. **Worker** securely runs each received job and evaluate it's results. **Monitor** resends evaluation progress messages to the **Web app** in order to be presented to users.
**Web app** is the main part of the whole project from the user's point of view. It provides the user interface and it is the only part that interacts with the outside world directly. **Web API** contains almost all logic of the app, including _user management and authentication_, _storing and versioning files_ (with the help of **File server**), _counting and assigning points_ to users, etc. Advanced users may connect to the API directly or may create custom frontends. **Broker** is an essential part of the whole architecture. It maintains a list of available **Workers**, receives submissions from the **Web API**, routes them further, and reports progress of evaluations back to the **Web app**. **Worker** securely runs each received job and evaluates its results. **Monitor** resends evaluation progress messages to the **Web app** so that they can be presented to users.
## Communication
@ -66,7 +66,7 @@ Broker acts as server when communicating with worker. Listening IP address and p
- FAILED - something bad happened and job was not executed at all
- UPLOADED - results are uploaded to fileserver
- STARTED - evaluation of tasks started
- ENDED - evaluation of tasks is finnished
- ENDED - evaluation of tasks is finished
- ABORTED - evaluation of job encountered internal error, job will be rescheduled to another worker
- FINISHED - whole execution is finished and worker ready for another job execution
- TASK - task state changed - see below
@ -107,14 +107,14 @@ Worker is communicating with file server only from _execution thread_ (see pictu
#### Worker side
Worker is cabable of 2 things - download file and upload file. Internally, worker is using libcurl C library with very similar setup. In both cases it can verify HTTPS certificate (on Linux against system cert list, on Windows against downloaded one from CURL website during installation), support basic HTTP authentication, offer HTTP/2 with fallback to HTTP/1.1 and fail on error (returned HTTP status code is >= 400). Worker have list of credentials to all available file servers in it's config file.
Worker is capable of 2 things - download file and upload file. Internally, worker uses the libcurl C library with a very similar setup for both. In both cases it can verify the HTTPS certificate (on Linux against the system cert list, on Windows against one downloaded from the CURL website during installation), support basic HTTP authentication, offer HTTP/2 with fallback to HTTP/1.1 and fail on error (returned HTTP status code is >= 400). Worker has a list of credentials to all available file servers in its config file.
- download file - standard HTTP GET request to given URL expectingi file content as response
- download file - standard HTTP GET request to given URL expecting file content as response
- upload file - standard HTTP PUT request to given URL with file data as body - same as command line tool `curl` with option `--upload-file`
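
The two operations can be sketched with Python's standard `urllib.request` in place of libcurl. This is only an illustration under stated assumptions: the worker itself uses libcurl, and the helper names, URL, user name, and password are hypothetical. `urlopen` raises `HTTPError` for 4xx and 5xx responses, which roughly corresponds to the fail-on-error behaviour described above.

```python
import base64
import urllib.request


def basic_auth_header(user, password):
    """Build the value of an HTTP Basic `Authorization` header."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def download_request(url, user, password):
    """Prepare a GET request; the response body is expected to be the file content."""
    req = urllib.request.Request(url, method="GET")
    req.add_header("Authorization", basic_auth_header(user, password))
    return req


def upload_request(url, data, user, password):
    """Prepare a PUT request with the file data as body (like `curl --upload-file`)."""
    req = urllib.request.Request(url, data=data, method="PUT")
    req.add_header("Authorization", basic_auth_header(user, password))
    return req


def send(req, timeout=30.0):
    """Send a prepared request and return the response body as bytes.

    urlopen raises urllib.error.HTTPError for 4xx/5xx status codes.
    """
    with urllib.request.urlopen(req, timeout=timeout) as response:
        return response.read()
```

Preparing the request separately from sending it keeps the authentication and method logic easy to inspect without a running file server.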
#### File server side
File server has it's own internal directory structure, where all the files are stored. It provides simple REST API to get them or create new ones. File server doesn't provide authentication or secured connection by itself, but it's supposed to run file server as WSGI script inside a web server (like Apache) with proper configuration. Relevant commands for communication with workers:
File server has its own internal directory structure, where all the files are stored. It provides a simple REST API to get them or create new ones. File server doesn't provide authentication or a secured connection by itself; it is supposed to run as a WSGI script inside a web server (like Apache) with proper configuration. Relevant commands for communication with workers:
- **GET /submission_archives/\<id\>.\<ext\>** - gets an archive with submitted source code and corresponding configuration of this job evaluation
- **GET /tasks/\<hash\>** - gets a file, common usage is for input files or reference result files
@ -153,7 +153,7 @@ Commands from broker to monitor:
### Broker - Web API communication
Broker communicates with main REST API through ZeroMQ connection over TCP. Socket
type on broker side is ROUTER, on frontend part it's REQ. Broker acts as a
type on broker side is ROUTER, on frontend part it's DEALER. Broker acts as a
server, its IP address and port is configurable in the API.
#### Commands from API to broker:
@ -168,7 +168,6 @@ server, its IP address and port is configurable in the API.
#### Commands from broker to API (all are responses to **eval** command):
- **ack** - this is first message which is sent back to frontend right after eval command arrives, basically it means "Hi, I am all right and am capable of receiving job requests", after sending this broker will try to find acceptable worker for arrived request
- **accept** - broker is capable of routing request to a worker
- **reject** - broker can't handle this job (for example when the requirements
specified by the headers cannot be met). There are (rare) cases when the
@ -182,7 +181,7 @@ File server has a REST API for interaction with other parts of ReCodEx. Descript
- **GET /results/\<id\>.\<ext\>** - download archive with evaluated results of job _id_
- **POST /submissions/\<id\>** - upload new submission with identifier _id_. Expects that the body of the POST request uses file paths as keys and the content of the files as values. On successful upload returns JSON `{ "archive_path": <archive_url>, "result_path": <result_url> }` in response body. From _archive_path_ the submission can be downloaded (by worker) and corresponding evaluation results should be uploaded to _result_path_.
- **POST /tasks** - upload new files, which will be available by names eqal to `sha1sum` of their content. There can be uploaded more files at once. On successful upload returns JSON `{ "result": "OK", "files": <file_list> }` in response body, where _file_list_ is dictionary of original file name as key and new URL with already hashed name as value.
- **POST /tasks** - upload new files, which will be available under names equal to the `sha1sum` of their content. More files can be uploaded at once. On successful upload returns JSON `{ "result": "OK", "files": <file_list> }` in response body, where _file_list_ is a dictionary with the original file name as key and the new URL with the already hashed name as value.
There are no plans yet to support deleting files from this API. This may change in time.
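
The naming scheme of **POST /tasks** can be illustrated by a small sketch. It only shows how a response of the documented shape could be built from file contents; the function name and the `/tasks/` URL prefix are assumptions for illustration, not the file server's actual code.

```python
import hashlib


def build_tasks_response(files, base_url="/tasks/"):
    """Map each original file name to a URL named by the SHA-1 of its content.

    `files` is a dictionary of original file name -> content as bytes.
    """
    file_list = {}
    for name, content in files.items():
        digest = hashlib.sha1(content).hexdigest()
        file_list[name] = base_url + digest
    return {"result": "OK", "files": file_list}
```

Because the name depends only on the content, two uploads with identical content end up under the same URL.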
@ -191,7 +190,7 @@ Web API calls these fileserver endpoints with standard HTTP requests. There are
### Monitor - Web app communication
Monitor interacts with web application through WebSocket connection. Monitor acts as server and browsers are connecting to it. IP address and port are configurable. When client connects to the monitor, it sends a message with string representation of channel id (which messages are interested in, usually id of evaluating job). There can be multiple listeners per channel, even (shortly) delayed connections will receive all messages from the very begining.
Monitor interacts with the web application through a WebSocket connection. Monitor acts as a server and browsers connect to it. IP address and port are configurable. When a client connects to the monitor, it sends a message with the string representation of a channel id (the channel whose messages it is interested in, usually the id of the evaluating job). There can be multiple listeners per channel, and even (shortly) delayed connections will receive all messages from the very beginning.
When monitor receives **progress** message from broker there are two options:
@ -205,7 +204,7 @@ Message JSON format is dictionary (associative array) with keys:
- FAILED - something bad happened and job was not executed at all
- UPLOADED - results are uploaded to fileserver
- STARTED - evaluation of tasks started
- ENDED - evaluation of tasks is finnished
- ENDED - evaluation of tasks is finished
- ABORTED - evaluation of job encountered internal error, job will be rescheduled to another worker
- FINISHED - whole execution is finished and worker ready for another job execution
- TASK - task state changed - see below
@ -213,17 +212,17 @@ Message JSON format is dictionary (associative array) with keys:
- **task_state** - state of task with id **task_id**. Present only if **command** is "TASK". Value is one of "COMPLETED", "FAILED" and "SKIPPED".
- COMPLETED - task was successfully executed without any error, subsequent task will be executed
- FAILED - task ended up with some error, subsequent task will be skipped
- SKIPPED - some of the previous dependencies failed to execute, so this task wont be executed at all
- SKIPPED - some of the previous dependencies failed to execute, so this task won't be executed at all
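
A client might interpret such a message as in the following sketch. It relies only on the keys named above (`command`, `task_id`, `task_state`); the function name and the returned text are hypothetical.

```python
import json

# Task states listed in the description above.
TASK_STATES = {"COMPLETED", "FAILED", "SKIPPED"}


def describe_progress(raw):
    """Turn one JSON progress message into a short human-readable line."""
    message = json.loads(raw)
    command = message["command"]
    if command != "TASK":
        # Job-level commands carry no task information.
        return f"job: {command}"
    state = message["task_state"]
    if state not in TASK_STATES:
        raise ValueError(f"unexpected task state: {state}")
    return f"task {message['task_id']}: {state}"
```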
### Web app - Web API communication
Provided web application runs as javascript client inside user's browser. It communicates with REST API on the server through standard HTTP requests. Documentation of the main REST API is in separate document due to it's extensiveness. Results are returned as JSON payload, which is simply parsed in web application and presented to the users.
The provided web application runs as a JavaScript client inside the user's browser. It communicates with the REST API on the server through standard HTTP requests. Documentation of the main REST API is in a separate document due to its extensiveness. Results are returned as a JSON payload, which is simply parsed in the web application and presented to the users.
## Assignments
Assignments are programming tasks that can be tested and evaluated by a worker after user submits his solution. An assignment is described by a YAML file that contains information on how to build, run and test it. One submitted assignment is called a (worker) job.
Assignments are programming tasks that can be tested and evaluated by a worker after a user submits his/her solution. An assignment is described by a YAML file that contains information on how to build, run and test it. One submitted assignment is called a (worker) job.
### Basics
@ -233,11 +232,11 @@ Tasks are executed sequentially -- by the linear ordering of the task graph. Par
![Picture of task serialization](https://github.com/ReCodEx/wiki/raw/master/images/Assignment_overview.png)
Each task has a unique ID (alphanum string like _CompileA_, _RunAA_, or _JudgeAB_ in the picture). These IDs are used to identify tasks (for dependency references, in the log, ...). Numbers in bottom right corner are priorities of each task. Higher number is greater priority. It means, that if task _RunAA_ is done, next must be _JudgeAA_ and not _RunAB_ (that will be also valid linear ordering, but _RunAB_ has lower priority).
Each task has a unique ID (alphanum string like _CompileA_, _RunAA_, or _JudgeAB_ in the picture). These IDs are used to identify tasks (for dependency references, in the log, ...). Numbers in the bottom right corner are priorities of each task. Higher number means greater priority. This means that if task _RunAA_ is done, the next one must be _JudgeAA_ and not _RunAB_ (that would also be a valid linear ordering, but _RunAB_ has lower priority).
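
One way to obtain such an ordering is a topological sort that always picks the ready task with the highest priority. The sketch below is an illustration under assumptions, not necessarily the worker's actual algorithm: the priority values are made up, and ties are broken by task ID.

```python
import heapq


def linear_order(priorities, dependencies):
    """Return task IDs in execution order.

    priorities:   task ID -> integer priority (higher runs earlier)
    dependencies: task ID -> list of task IDs that must finish first
    """
    remaining = {task: set(dependencies.get(task, ())) for task in priorities}
    dependents = {task: [] for task in priorities}
    for task, deps in remaining.items():
        for dep in deps:
            dependents[dep].append(task)

    # Max-heap on priority, simulated by negating the priority.
    ready = [(-priorities[t], t) for t, deps in remaining.items() if not deps]
    heapq.heapify(ready)

    order = []
    while ready:
        _, task = heapq.heappop(ready)
        order.append(task)
        for nxt in dependents[task]:
            remaining[nxt].discard(task)
            if not remaining[nxt]:
                heapq.heappush(ready, (-priorities[nxt], nxt))

    if len(order) != len(priorities):
        raise ValueError("dependency cycle detected")
    return order


# Hypothetical priorities for the task names mentioned in the text.
priorities = {"CompileA": 5, "RunAA": 4, "JudgeAA": 3, "RunAB": 2, "JudgeAB": 1}
dependencies = {
    "RunAA": ["CompileA"],
    "RunAB": ["CompileA"],
    "JudgeAA": ["RunAA"],
    "JudgeAB": ["RunAB"],
}
order = linear_order(priorities, dependencies)
```

With these values _JudgeAA_ is scheduled right after _RunAA_, because it has a higher priority than _RunAB_ at the moment both are ready.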
### Task
Task is an atomic piece of work executed by recodex-worker. There are two basic types of tasks:
Task is an atomic piece of work executed by worker. There are two basic types of tasks:
- **Execute external process** (optionally inside Isolate). External processes are meant for compilation, testing, or execution of external judges. On Linux, usage of the isolate sandbox is mandatory by default; the option exists because of Windows, where no sandbox is currently available.
- **Perform internal operation**. Internal operations comprise commands, which are typically related to file/directory maintenance and other evaluation management stuff. Few important examples:
@ -245,7 +244,7 @@ Task is an atomic piece of work executed by recodex-worker. There are two basic
- (un)zip/tar/gzip/bzip file(s)
- fetch a file from the file repository (either from worker cache or download it by HTTP GET or through SFTP).
Even though the internal operations may be handled by external executables (`mv`, `tar`, `pkzip`, `wget`, ...), it might be better to keep them inside the worker as it would simplify these operations and their portability among platforms. Furthermore, it is quite easy to implement them using common libraries (e.g., _zlib_, _curl_).
Even though the internal operations may be handled by external executables (`mv`, `tar`, `pkzip`, `wget`, ...), it might be better to keep them inside the worker as it would simplify these operations and their portability among platforms. Furthermore, it's quite easy to implement them using common libraries (e.g., _zlib_, _curl_).
#### Internal tasks
@ -290,8 +289,8 @@ Even though the internal operations may be handled by external executables (`mv`
External tasks are arbitrary executables, typically ran inside isolate (with given parameters) and the worker waits until they finish. The exit code determines, whether the task succeeded (0) or failed (anything else). A task may be marked as essential; in such case, failure will immediately cause termination of the whole job.
- **stdin** - can be configured to read from existing file or from `/dev/null`.
- **stdout** and **stderr** - can be individually redirected to a file or discarded. If this output options are specified, than it is possible to upload output files with results by copying them in result directory.
- **limits** - task have time and memory limits; if these limits are exceeded, the task also fails.
- **stdout** and **stderr** - can be individually redirected to a file or discarded. If these output options are specified, then it's possible to upload output files with results by copying them into the result directory.
- **limits** - task has time and memory limits; if these limits are exceeded, the task fails.
The task results (exit code, time, memory consumption, etc.) are saved into the result YAML file and sent back to the frontend application, to the address which was specified on input.
@ -364,7 +363,7 @@ Here is the list with description of allowed options. Mandatory items are bold,
- **file-collector** - address from which fetch tasks will download data
- _log_ - default is false, can be omitted, determines whether job execution will be logged into one shared log
- **tasks** - list (not map) of individual tasks
- **task-id** - unique indetifier of task in scope of one submission
- **task-id** - unique identifier of task in scope of one submission
- **priority** - higher number, higher priority
    - **fatal-failure** - if true, then execution of the whole job will be stopped after this task fails
    - _dependencies_ - list of dependencies which have to be fulfilled before this task, can be omitted if there are no dependencies
@ -386,16 +385,16 @@ Here is the list with description of allowed options. Mandatory items are bold,
- _stack-size_ - size of stack of executed program in kilobytes
- _memory_ - overall memory limit for application in kilobytes
- _parallel_ - integral number of processes which can run simultaneously, time and memory limits are merged from all potential processes/threads
- _disk-size_ - size of all io operations from/to files in kilobytes
- _disk-size_ - size of all IO operations from/to files in kilobytes
- _disk-files_ - number of files which can be opened
- _environ-variable_ - wrapper for map of environmental variables, union with default worker configuration
- _chdir_ - this will be working directory of executed application
- _bound-directories_ - list of structures reprezenting directories which will be visible inside sandbox, union with default worker configuration. Contains 3 suboptions: **src** - source pointing to actual system directory, **dst** - destination inside sandbox which can have its own filesystem binding and **mode** - determines connection mode of specified directory, one of values: RW, NOEXEC, FS, MAYBE, DEV
- _bound-directories_ - list of structures representing directories which will be visible inside sandbox, union with default worker configuration. Contains 3 suboptions: **src** - source pointing to actual system directory, **dst** - destination inside sandbox which can have its own filesystem binding and **mode** - determines connection mode of specified directory, one of values: RW, NOEXEC, FS, MAYBE, DEV
#### Configuration example
This configuration example is written in YAML and serves only for demostration purposes. Some items can be omitted and defaults from worker configuration will be used.
This configuration example is written in YAML and serves only for demonstration purposes. Some items can be omitted and defaults from worker configuration will be used.
```{.yml}
---
@ -501,7 +500,7 @@ tasks:
Because the frontend does not know which worker gets the job, it's necessary to be a little general in the configuration file. This means that some worker-specific things have to be transparent. A good example is that some (evaluation) directories may be placed differently on different workers. To provide a solution, variables were established. There are of course some restrictions on where variables can be used. Basically, whenever filesystem paths can be used, variables can be used.
Usage of variables in configuration is simple and kind of shell-like. Name of variable is put inside braces which are preceded with dollar sign. Real usage is than something like this: ${VAR}. There should be no quotes or apostrophies around variable name, just simple text in braces. Parsing is simple and whenever there is dollar sign with braces job execution unit automatically assumes that this is a variable, so there is no chance to have this kind of substring anywhere else.
Usage of variables in configuration is simple and kind of shell-like. Name of variable is put inside braces which are preceded with dollar sign. Real usage is then something like this: ${VAR}. There should be no quotes or apostrophes around variable name, just simple text in braces. Parsing is simple and whenever there is dollar sign with braces job execution unit automatically assumes that this is a variable, so there is no chance to have this kind of substring anywhere else.
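
The substitution described above can be sketched with a single regular expression. This is a minimal illustration, not the worker's implementation; the variable name `EXAMPLE_DIR` and the decision to raise an error on unknown names are assumptions.

```python
import re

# A dollar sign followed by a name in braces, e.g. ${EXAMPLE_DIR}.
_VARIABLE = re.compile(r"\$\{([^}]*)\}")


def expand_variables(text, variables):
    """Replace every ${NAME} in `text` with its value from `variables`."""

    def replace(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"unknown variable: {name}")
        return variables[name]

    return _VARIABLE.sub(replace, text)
```

For example, `expand_variables("${EXAMPLE_DIR}/out.txt", {"EXAMPLE_DIR": "/tmp/eval"})` yields `/tmp/eval/out.txt`.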
List of usable variables in job configuration:
@ -580,7 +579,7 @@ results:
### Scoring
Every assignment consists of tasks. Only some tasks however are part of the evaluation. Those tasks are grouped into **tests**. Each task might be assigned a _test-id_ parameter, as described above. Every test must consist of at least two tasks: execution and evaluation by a judge. The former retrieves information about the execution such as elapsed time and memory consumed, the latter result with a score - float between 0 and 1. There may be more than one execution tasks, but evaluation task must be exactly one.
Every assignment consists of tasks. However, only some tasks are part of the evaluation. Those tasks are grouped into **tests**. Each task might have a _test-id_ parameter assigned, as described above. Every test must consist of at least two tasks: execution and evaluation by a judge. The former retrieves information about the execution such as elapsed time and memory consumed, the latter results in a score - a float between 0 and 1. There may be more than one execution task, but there must be exactly one evaluation task.
Total resulting score of the assignment submission is then calculated according to a supplied score config (described below). Total score is also a float between 0 and 1. This number is then multiplied by the maximum of points awarded for the assignment by the teacher assigning the exercise - not the exercise author.
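
As an illustration of the arithmetic, the sketch below assumes that the score config assigns a weight to each test and that the total score is the weighted average of the test scores. That interpretation, the function names, and the sample numbers are assumptions; the actual calculation is defined by the supplied score config.

```python
def total_score(test_scores, test_weights):
    """Weighted average of per-test scores; the result stays between 0 and 1.

    test_scores:  test ID -> score reported by the judge (0.0 to 1.0)
    test_weights: test ID -> non-negative weight from the score config
    """
    total_weight = sum(test_weights.values())
    if total_weight <= 0:
        raise ValueError("at least one test must have a positive weight")
    weighted = sum(test_scores[name] * weight for name, weight in test_weights.items())
    return weighted / total_weight


def awarded_points(score, max_points):
    """Scale the total score by the maximum points set by the teacher."""
    return score * max_points
```

With scores `{"A": 1.0, "B": 0.5}`, equal weights, and a maximum of 10 points, the submission would receive 7.5 points.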
@ -609,7 +608,7 @@ testWeights:
#### Logs
During the execution tasks can use one shared log. There is no use for multiple logs, one per task for example, because of pretty small amount of information loged. By default loging is disabled, enabling can be done in job configuration.
During the execution tasks can use one shared log. There is no use for multiple logs, one per task for example, because of pretty small amount of information logged. By default logging is disabled, enabling can be done in job configuration.
After execution the log is packed with results into archive and sent back to fileserver. So the log can be found here for further processing.
@ -679,7 +678,7 @@ This article will describe in detail execution flow of submission from the point
### Web Application
First thing user has to submit his/her solution to web application. Generally web application has to store submitted files and hand over all needed information about submission to broker. More detailed description follows:
First, the user has to submit his/her solution to the web application. Generally, the web application has to store the submitted files and hand over all needed information about the submission to the broker. More detailed description follows:
- user submits his solution to web application
- T
@ -704,7 +703,7 @@ Broker gets information about new submission from web application. At this point
### Worker
Worker gets request from broker to evaluate particular submission. Next step is to evaluate given submission and results upload to fileserver. After this worker only send broker that submission was evaluated. More detailed description follows:
Worker gets a request from broker to evaluate a particular submission. The next step is to evaluate the given submission and upload the results to fileserver. After this, worker only notifies broker that the submission was evaluated. More detailed description follows:
- "listening" thread gets multipart message from `broker` with command "eval"
- "listening" thread hands over the whole message through `inproc` socket to "execution" thread
@ -733,11 +732,11 @@ Worker gets request from broker to evaluate particular submission. Next step is
- of course there has to be cleaning after whole evaluation which will deinitialize all needed variables and also delete all used temporary folders
- all of the previous steps happened in "execution" thread, which now has to tell "listening" thread that execution is done
- this is done through multipart message "done" with packed job identification addressed to "listening" thread
- action of "listening" is now pretty straightforward "done" message is resend to `broker`
- action of "listening" thread is now pretty straightforward: the "done" message is resent to `broker`
### Broker
Broker gets done message from worker and basically only mark submission as done in its internal structures. No messages are send to web application, because of lazy evaluation on frontend side. More detailed description follows:
Broker gets the done message from worker and basically only marks the submission as done in its internal structures. After that, broker has to tell Web API that execution of the particular job ended. More detailed description follows:
- broker gets "done" message from worker after successful execution of job
- appropriate `worker` structure is found based on its identification
@ -747,11 +746,12 @@ Broker gets done message from worker and basically only mark submission as done
- after that only missing thing is to send that request to worker and loop back to worker execution
- if worker queue is empty then appropriate worker remains free and waiting for another execution request
// TODO: broker -> api job_done message
### Web Application
Only remaining part is evaluation of results. This is provided on demand when user wants them. Results are obtained from fileserver and evaluated. More detailed description follows:
Only remaining part is evaluation of results. Results are obtained from fileserver and evaluated. More detailed description follows:
- evaluation of execution results is provided on user demand
- T
- O
- D
@ -765,9 +765,9 @@ Only remaining part is evaluation of results. This is provided on demand when us
Installation of whole ReCodEx solution is a very complex process. It's recommended to have good unix skills with basic knowledge of project architecture.
There are a lot of different GNU/Linux distributions with different package management, naming convenction and version release policies. So it's impossible to cover all of the possible variants. We picked one distribution, which is fully supported by automatic installation script, for others there are brief informations about installation in every project component's own chapter.
There are a lot of different GNU/Linux distributions with different package management, naming conventions and version release policies, so it's impossible to cover all possible variants. We picked one distribution, which is fully supported by the automatic installation script; for others there is brief information about installation in every project component's own chapter.
Distribution of our choice is CentOS, currently in version 7. It's a well known server distribution, derived from enterpreise distrubution from Red Hat, so it's very stable and widely used system with long term support. There are [EPEL](https://fedoraproject.org/wiki/EPEL) additional repositories from Fedora project, which adds newer version of some packages into CentOS, which allows us to use current environment. Also, _rpm_ packages are much easier to build (for example from Python sources) and maintain.
Distribution of our choice is CentOS, currently in version 7. It's a well known server distribution, derived from the enterprise distribution from Red Hat, so it's a very stable and widely used system with long term support. There are [EPEL](https://fedoraproject.org/wiki/EPEL) additional repositories from the Fedora project, which add newer versions of some packages into CentOS, which allows us to use a current environment. Also, _rpm_ packages are much easier to build (for example from Python sources) and maintain.
The big rival of CentOS in the field of server distributions is Debian. We're running one instance of ReCodEx on Debian too. You need to use _testing_ repositories to get some decent package versions. It's easy to mess up your system this way, so create file `/etc/apt/apt.conf` with content of `APT::Default-Release "stable";`. After you add testing repos to `/etc/apt/sources.list`, you can install packages from there like `$ sudo apt-get -t testing install gcc`.
@ -805,9 +805,9 @@ Configurable variables are saved in _group_vars/all.yml_ file. Syntax is basic k
- _mysql_root_password_ -- Password of root user of MySQL database. Will be set after installation and saved to `/root/.my.cnf` file.
- _mysql_recodex_username_ -- MySQL username for ReCodEx API access.
- _mysql_recodex_password_ -- Password for the user above.
- _admin_email_: Email of administrator. Used when configuring Apache webserver.
- _recodex_hostname: Hostname where the API and web app will be accessible. For example "recodex.projekty.ms.mff.cuni.cz".
- _webapp_node_addr_ -- IP address of NodeJS server running web app. Defaults to "127.0.0.1" and should not be chnaged.
- _admin_email_ -- Email of administrator. Used when configuring Apache webserver.
- _recodex_hostname_ -- Hostname where the API and web app will be accessible. For example "recodex.projekty.ms.mff.cuni.cz".
- _webapp_node_addr_ -- IP address of NodeJS server running web app. Defaults to "127.0.0.1" and should not be changed.
- _webapp_node_port_ -- Port to above.
- _webapp_public_addr_ -- Public address, where web server for web app will listen. Defaults to "*".
- _webapp_public_port_ -- Port to above.
@ -815,8 +815,8 @@ Configurable variables are saved in _group_vars/all.yml_ file. Syntax is basic k
- _webapi_public_endpoint_ -- Public URL where the API will be running, for example "https://recodex.projekty.ms.mff.cuni.cz:4000/v1".
- _webapi_public_addr_ -- Public address, where web server for API will listen. Defaults to "*".
- _webapi_public_port_ -- Port to above.
- _webapi_firewall_ - Open port for API in firewall, values "yes" or "no".
- _database_firewall_ - Open port for database in firewall, values "yes" or "no".
- _webapi_firewall_ -- Open port for API in firewall, values "yes" or "no".
- _database_firewall_ -- Open port for database in firewall, values "yes" or "no".
- _broker_to_webapi_addr_ -- Address, where API can reach broker. Private one is recommended.
- _broker_to_webapi_port_ -- Port to above.
- _broker_firewall_api_ -- Open above port in firewall, "yes" or "no".
@ -862,7 +862,7 @@ One of the most important aspects of ReCodEx instance is security. It's crutial
- Secure MySQL installation. The installation script doesn't do any security actions, so please run at least `mysql_secure_installation` script on database computer.
- Get HTTPS certificate and set it in Apache for web application and API. Monitor should be proxied through the web server too with valid certificate. You can get free DV certificate from [Let's Encrypt](https://letsencrypt.org/). Don't forget to set up automatic renewing!
- Hide broker, workers and fileserver behind firewall, private subnet or IPsec tunnel. They are not required to be reached from public internet, so it's better keep them isolated.
- Keep your saver updated and well configured. For automatic installation of security updates on CentOS system refer to `yum-cron` package. Configure SSH and Apache to use only strong ciphers, some recommendations can be found [here](https://bettercrypto.org/static/applied-crypto-hardening.pdf).
- Don't put actualy used credentials on web, for example don't commit your passwords (in Ansible variables file) on GitHub.
- Regullary check logs for anomalies.
- Keep your server updated and well configured. For automatic installation of security updates on CentOS system refer to `yum-cron` package. Configure SSH and Apache to use only strong ciphers, some recommendations can be found [here](https://bettercrypto.org/static/applied-crypto-hardening.pdf).
- Don't put actually used credentials on web, for example don't commit your passwords (in Ansible variables file) on GitHub.
- Regularly check logs for anomalies.
