master
Martin Polanka 8 years ago
parent 0ca188ea50
commit 78fd3de0ca

![Overall architecture](https://github.com/ReCodEx/wiki/blob/master/images/Overall_Architecture.png)
**Web app** is the main part of the whole project from the user's point of view. It provides a nice user interface and it is the only part that interacts with the outside world directly. **Web API** contains almost all logic of the app including _user management and authentication_, _storing and versioning files_ (with help of the **File server**), _counting and assigning points_ to users etc. Advanced users may connect to the API directly or may create custom frontends. **Broker** is an essential part of the whole architecture. It maintains the list of available **Workers**, receives submissions from the **Web API**, routes them further and reports progress of evaluations back to the **Web app**. **Worker** securely runs each received job and evaluates its results. **Monitor** resends evaluation progress messages to the **Web app** in order to be presented to users.
## Communication
Detailed communication inside the ReCodEx project is captured in the following image.
### Broker - Worker communication
Broker acts as a server when communicating with a worker. The listening IP address and port are configurable, the protocol family is TCP. The worker socket is of the DEALER type, the broker one is of the ROUTER type. Because of that, the very first part of every (multipart) message from broker to worker must be the target worker's socket identity (which is saved on its **init** command).
#### Commands from broker to worker:
- `description` -- a human readable description of the worker for
  administrators (it will show up in broker logs)
- `current_job` -- an identifier of a job the worker is now processing. This
  is useful when we are reassembling a connection to the broker and need it
  to know the worker will not accept a new job.
- **done** -- notifies about a finished job. Contains the following message frames:
  - `job_id` -- identifier of the finished job
  - `result` -- response result, possible values are:
- `task_state` -- only present for the "TASK" state -- result of the task evaluation. One of:
  - COMPLETED -- task was successfully executed without any error, subsequent task will be executed
  - FAILED -- task ended up with some error, subsequent task will be skipped
  - SKIPPED -- some of the previous dependencies failed to execute, so this task will not be executed at all
- **ping** -- tells the broker "I am alive", no arguments
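The worker itself is written in C++, but the multipart framing of the commands above can be sketched in Python as simple lists of frames (pyzmq style). The exact frame layout here is an assumption for illustration, not the authoritative wire format:

```python
def make_init(headers):
    """Compose an init command; headers such as description or current_job
    are sent as extra key=value frames (frame layout assumed for this sketch)."""
    frames = [b"init"]
    for key, value in headers.items():
        frames.append("{}={}".format(key, value).encode())
    return frames

def make_done(job_id, result):
    """Compose a done command carrying the job identifier and its result."""
    return [b"done", job_id.encode(), result.encode()]

def make_ping():
    """Compose a ping command; it carries no arguments."""
    return [b"ping"]
```

On the broker side, ZeroMQ's ROUTER socket would prepend the sending worker's identity frame to each such message, which is exactly the identity the broker stores when the **init** command arrives.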
#### Heartbeating
If a worker thinks the broker is dead, it tries to reconnect with a bounded,
exponentially increasing delay.
This protocol proved great robustness in real world testing. The whole backend is thus really reliable and can outlive short term connection issues without problems. Also, the increasing delay of ping messages does not flood the network when there are problems. We have experienced no issues since we started using this protocol.
### Worker - File Server communication
Worker communicates with the file server only from the _execution thread_ (see the picture above). The supported protocol is HTTP, optionally with SSL encryption (**recommended**; you can get a free trusted DV certificate from the [Let's Encrypt](https://letsencrypt.org/) authority if you do not have one yet). If supported by the server and the used version of libcurl, the HTTP/2 standard is also available. The file server should be set up to require basic HTTP authentication and the worker is capable of sending corresponding credentials with each request.
#### Worker side
Worker is capable of 2 things -- downloading a file and uploading a file.
#### File server side
File server has its own internal directory structure, where all the files are stored. It provides a simple REST API to get them or create new ones. The file server does not provide authentication or a secured connection by itself, but it is supposed to be run as a WSGI script inside a web server (like Apache) with proper configuration. Relevant commands for communication with workers:
- **GET /submission_archives/\<id\>.\<ext\>** -- gets an archive with submitted source code and corresponding configuration of this job evaluation
- **GET /tasks/\<hash\>** -- gets a file, common usage is for input files or reference result files
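A sketch of the worker side of this API in Python: building the request URLs and the basic HTTP authentication header that accompanies each request. The base URL and credentials below are hypothetical placeholders:

```python
import base64

def basic_auth_header(username, password):
    """HTTP Basic authentication header the worker sends with each request."""
    token = base64.b64encode("{}:{}".format(username, password).encode()).decode()
    return {"Authorization": "Basic " + token}

def submission_archive_url(base, job_id, ext):
    """URL of the archive with submitted sources and job configuration."""
    return "{}/submission_archives/{}.{}".format(base, job_id, ext)

def task_file_url(base, file_hash):
    """URL of a supplementary file (input or reference result) by its hash."""
    return "{}/tasks/{}".format(base, file_hash)
```

The real worker performs these requests through libcurl; the sketch only shows how the endpoints and credentials fit together.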
Note that the monitor is designed so that it can receive data both from the
broker and workers. The current architecture prefers the broker to do all the
communication so that the workers do not have to know too many network services.
Monitor is treated as a somewhat optional part of the whole solution, so no special
effort on communication reliability was made.
### Broker - Web API communication
Broker communicates with the main REST API through a ZeroMQ connection over TCP. The socket
type on the broker side is ROUTER, on the frontend part it is DEALER. Broker acts as a
server, its IP address and port are configurable in the API.
#### Commands from API to broker:
- **ack** -- this is the first message which is sent back to the frontend right after an eval command arrives; it basically means "Hi, I am all right and am capable of receiving job requests". After sending this, the broker will try to find an acceptable worker for the arrived request
- **accept** -- broker is capable of routing the request to a worker
- **reject** -- broker cannot handle this job (for example when the requirements
  specified by the headers cannot be met). There are (rare) cases when the
  broker finds that it cannot handle the job after it was confirmed. In such
  cases it uses the frontend REST API to mark the job as failed.
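The frontend side of this exchange can be sketched as a simple dispatch on the broker's reply; the callback names below are illustrative, not the real API:

```python
def handle_broker_reply(reply, mark_failed, mark_accepted):
    """Dispatch on the broker's reply to an eval command. An 'ack' only
    confirms the request was received; accept/reject arrives afterwards.
    The callbacks are hypothetical stand-ins for frontend actions."""
    if reply == "ack":
        return "waiting"    # broker is alive and looking for a worker
    if reply == "accept":
        mark_accepted()     # the job was routed to a worker
        return "routed"
    if reply == "reject":
        mark_failed()       # no worker satisfies the job's headers
        return "failed"
    raise ValueError("unknown broker reply: " + reply)
```

The rare late-rejection case (a confirmed job the broker later cannot handle) goes through the frontend REST API instead of this ZeroMQ channel, as noted above.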
Message JSON format is a dictionary (associative array) with keys:
- **task_state** -- state of the task with id **task_id**. Present only if **command** is "TASK". Value is one of "COMPLETED", "FAILED" and "SKIPPED".
  - COMPLETED -- task was successfully executed without any error, subsequent task will be executed
  - FAILED -- task ended up with some error, subsequent task will be skipped
  - SKIPPED -- some of the previous dependencies failed to execute, so this task will not be executed at all
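A hedged example of such a progress message (the identifier value is made up and only the keys described above are shown):

```python
import json

# Hypothetical progress message for a single finished task.
message = {
    "command": "TASK",
    "task_id": 3,                 # illustrative identifier
    "task_state": "COMPLETED",
}

# The monitor passes such messages on as JSON over its WebSocket side.
encoded = json.dumps(message)
```

A real message would carry whatever further keys the protocol defines for the given command; this sketch only demonstrates the shape of the dictionary.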
### Web app - Web API communication
Task is an atomic piece of work executed by a worker. There are two basic types of tasks:
- (un)zip/tar/gzip/bzip file(s)
- fetch a file from the file repository (either from the worker cache or download it by HTTP GET or through SFTP).

Even though the internal operations may be handled by external executables (`mv`, `tar`, `pkzip`, `wget`, ...), it might be better to keep them inside the worker as it would simplify these operations and their portability among platforms. Furthermore, it is quite easy to implement them using common libraries (e.g., _zlib_, _curl_).
#### Internal tasks
External tasks are arbitrary executables, typically run inside isolate (with given parameters), and the worker waits until they finish. The exit code determines whether the task succeeded (0) or failed (anything else). A task may be marked as essential; in such a case, failure will immediately cause termination of the whole job.

- **stdin** -- can be configured to read from an existing file or from `/dev/null`.
- **stdout** and **stderr** -- can be individually redirected to a file or discarded. If these output options are specified, then it is possible to upload output files with results by copying them into the result directory.
- **limits** -- task has time and memory limits; if these limits are exceeded, the task is failed.

The task results (exit code, time and memory consumption, etc.) are saved into a result YAML file and sent back to the frontend application to the address which was specified on input.
### Directories and Files
For each job execution a unique directory structure is created. The job is not restricted to using only the specified directories (tasks can do whatever is allowed on the system), but it is advised to use them inside a job. The DEFAULT variable represents the worker's working directory specified in its configuration. No variable of this name is defined for use in job YAML configuration; it is used just for this example.
List of temporary files for job execution:
Because every assignment focuses on a different technology, we would need a new
type of stage for each one. These stages would only run some checker programs
against the submitted sources (and possibly try to check their syntax etc.). ReCodEx is not primarily intended to perform static analysis, but it is surely possible.
#### Non-procedural programming
Worker gets a request from the broker to evaluate a particular submission. The next steps are:
- "listening" thread hands over the whole message through an `inproc` socket to the "execution" thread
- "execution" thread now has to prepare all things and get ready for execution
- temporary folder names are initiated (but not created); this includes the folder with source files, the folder with the downloaded submission, a temporary directory for all possible types of files and the folder which will contain results from the execution
- if some of the above stated folders already exists, then it is deleted
- after successful initiation the submission archive is downloaded to the created folder
- the submission archive is decompressed into the submission files folder
- all files from the decompressed archive are copied into the evaluation directory which can be used for execution in sandboxes
- all other folders which were not created are created just now
- it is time to build the `job` from the configuration
- the job configuration file is located in the evaluation directory, if it exists, and is loaded using the `yaml-cpp` library
- the loaded configuration is now parsed into a `job_metadata` structure which is handed over to the `job` execution class itself
- the `job` execution class will now initialize and construct particular `tasks` from `job_metadata` into a task tree
## Installation
Installation of the whole ReCodEx solution is a very complex process. It is recommended to have good Unix skills and basic knowledge of the project architecture.
There are a lot of different GNU/Linux distributions with different package management, naming conventions and version release policies, so it is impossible to cover all of the possible variants. We picked one distribution which is fully supported by the automatic installation script; for the others there is brief information about installation in every project component's own chapter.
The distribution of our choice is CentOS, currently in version 7. It is a well known server distribution, derived from the enterprise distribution from Red Hat, so it is a very stable and widely used system with long term support. There are [EPEL](https://fedoraproject.org/wiki/EPEL) additional repositories from the Fedora project, which add newer versions of some packages to CentOS, which allows us to use a current environment. Also, _rpm_ packages are much easier to build (for example from Python sources) and maintain.
The big rival of CentOS in the server distributions field is Debian. We are running one instance of ReCodEx on Debian too. You need to use the _testing_ repositories to get some decent package versions. It is easy to mess up your system this way, so create the file `/etc/apt/apt.conf` with the content `APT::Default-Release "stable";`. After you add the testing repos to `/etc/apt/sources.list`, you can install packages from there like `$ sudo apt-get -t testing install gcc`.
Some components are also capable of running in a Windows environment. However, setting up the Windows OS is a little bit of a pain and ReCodEx is not supposed to be run this way. Only the worker component may be needed to run on Windows, so we are providing a clickable installer including dependencies. Just for info, all components should be able to run on Windows; only the broker was not tested and may require small tweaks to work properly.
### Ansible installer
For automatic installation a set of Ansible scripts is used. Ansible is one of the best known and most used tools for automatic server management. It is only required to have SSH access to the server and Ansible installed on the client machine. Basic Ansible knowledge is assumed for further reading. For more info check their [documentation](http://docs.ansible.com/ansible/intro.html).
All Ansible scripts are located in the _utils_ repository, _installation_ [directory](https://github.com/ReCodEx/utils/tree/master/installation). The Ansible files are pretty self-describing; they can also be used as a template for installation to different systems. Before the installation itself it is required to edit two files -- set addresses of hosts and values of some variables.
#### Hosts configuration
First, it is needed to set the IP addresses of your computers. Common practice is to have multiple files with definitions, for example one for development and another for production. An example configuration is in the _development_ file. Each component of the ReCodEx project can be installed on a different server. Hosts can be specified as hostnames or IP addresses, optionally with the SSH port after a colon.
Shortened example of a hosts config:
Configurable variables are saved in the _group_vars/all.yml_ file. Syntax is basic key: value pairs.
#### Installation itself
With your computers installed with CentOS and the configuration modified, it is time to run the installation.
```
$ ansible-playbook -i development recodex.yml
```
This command installs all components of ReCodEx onto the machines listed in the _development_ file. It is possible to install only specified parts of the project, just use the component's YAML file instead of _recodex.yml_.
Ansible expects to have password-less access to the remote machines. If you do not have such a setup, use the options `--ask-pass` and `--ask-become-pass`.
### Security
One of the most important aspects of a ReCodEx instance is security. It is crucial to keep the gathered data safe and not to allow unauthorized users to modify restricted pieces of information. Here is a small list of recommendations to keep a running ReCodEx instance safe.
- Secure the MySQL installation. The installation script does not take any security actions, so please run at least the `mysql_secure_installation` script on the database computer.
- Get an HTTPS certificate and set it in Apache for the web application and API. Monitor should be proxied through the web server too, with a valid certificate. You can get a free DV certificate from [Let's Encrypt](https://letsencrypt.org/). Do not forget to set up automatic renewal!
- Hide the broker, workers and fileserver behind a firewall, private subnet or IPsec tunnel. They are not required to be reachable from the public internet, so it is better to keep them isolated.
- Keep your servers updated and well configured. For automatic installation of security updates on a CentOS system refer to the `yum-cron` package. Configure SSH and Apache to use only strong ciphers; some recommendations can be found [here](https://bettercrypto.org/static/applied-crypto-hardening.pdf).
- Do not put actually used credentials on the web; for example, do not commit your passwords (in the Ansible variables file) to GitHub.
- Regularly check logs for anomalies.
