From 78fd3de0ca902cda5ae3428707558e18c05fb605 Mon Sep 17 00:00:00 2001
From: Martin Polanka
Date: Sun, 6 Nov 2016 11:12:47 +0100
Subject: [PATCH] typos

---
 Overall-architecture.md | 72 ++++++++++++++++++++---------------------
 1 file changed, 36 insertions(+), 36 deletions(-)

diff --git a/Overall-architecture.md b/Overall-architecture.md
index 3eb8ec4..e587f97 100644
--- a/Overall-architecture.md
+++ b/Overall-architecture.md
@@ -6,7 +6,7 @@
![Overall architecture](https://github.com/ReCodEx/wiki/blob/master/images/Overall_Architecture.png)

-**Web app** is main part of whole project from user point of view. It provides nice user interface and it's the only part, that interacts with outside world directly. **Web API** contains almost all logic of the app including _user management and authentication_, _storing and versioning files_ (with help of **File server**), _counting and assigning points_ to users etc. Advanced users may connect to the API directly or may create custom frontends. **Broker** is essential part of whole architecture. It maintains list of available **Workers**, receives submissions from the **Web API** and routes them further and reports progress of evaluations back to the **Web app**. **Worker** securely runs each received job and evaluate it's results. **Monitor** resends evaluation progress messages to the **Web app** in order to be presented to users.
+**Web app** is the main part of the whole project from the user's point of view. It provides a nice user interface and it is the only part that interacts with the outside world directly. **Web API** contains almost all logic of the app including _user management and authentication_, _storing and versioning files_ (with help of the **File server**), _counting and assigning points_ to users etc. Advanced users may connect to the API directly or may create custom frontends. **Broker** is an essential part of the whole architecture. It maintains a list of available **Workers**, receives submissions from the **Web API**, routes them further and reports the progress of evaluations back to the **Web app**. **Worker** securely runs each received job and evaluates its results. **Monitor** resends evaluation progress messages to the **Web app** in order to be presented to users.

## Communication

@@ -18,7 +18,7 @@ Detailed communication inside the ReCodEx project is captured in the following i
### Broker - Worker communication

-Broker acts as server when communicating with worker. Listening IP address and port are configurable, protocol family is TCP. Worker socket is of DEALER type, broker one is ROUTER type. Because of that, very first part of every (multipart) message from broker to worker must be target worker's socket identity (which is saved on it's **init** command).
+Broker acts as a server when communicating with workers. The listening IP address and port are configurable; the protocol family is TCP. The worker socket is of the DEALER type, the broker one of the ROUTER type. Because of that, the very first part of every (multipart) message from broker to worker must be the target worker's socket identity (which is saved on its **init** command).

#### Commands from broker to worker:

@@ -48,8 +48,8 @@ Broker acts as server when communicating with worker. Listening IP address and p
- `description` -- a human readable description of the worker for administrators (it will show up in broker logs)
- `current_job` -- an identifier of a job the worker is now processing. This
-  is useful when we're reassembling a connection to the broker and need it
-  to know the worker won't accept a new job.
+  is useful when we are reassembling a connection to the broker and need it
+  to know the worker will not accept a new job.
- **done** -- notifying of finished job.
Contains following message frames:
- `job_id` -- identifier of finished job
- `result` -- response result, possible values are:

@@ -74,8 +74,8 @@ Broker acts as server when communicating with worker. Listening IP address and p
- `task_state` -- only present for "TASK" state -- result of task evaluation. One of:
  - COMPLETED -- task was successfully executed without any error, subsequent task will be executed
  - FAILED -- task ended up with some error, subsequent task will be skipped
-  - SKIPPED -- some of the previous dependencies failed to execute, so this task won't be executed at all
-- **ping** -- tell broker I'm alive, no arguments
+  - SKIPPED -- some of the previous dependencies failed to execute, so this task will not be executed at all
+- **ping** -- tell broker I am alive, no arguments

#### Heartbeating

@@ -98,12 +98,12 @@ workers.
If a worker thinks the broker is dead, it tries to reconnect with a
bounded, exponentially increasing delay.

-This protocol proved great robustness in real world testing. Thus whole backend is really reliable and can outlive short term issues with connection without problems. Also, increasing delay of ping messages doesn't flood the network when there are problems. We experienced no issues since we're using this protocol.
+This protocol proved very robust in real world testing, so the whole backend is reliable and can outlive short term connection issues without problems. Also, the increasing delay of ping messages does not flood the network when there are problems. We have experienced no issues since we started using this protocol.

### Worker - File Server communication

-Worker is communicating with file server only from _execution thread_ (see picture above). Supported protocol is HTTP optionally with SSL encryption (**recommended**, you can get free trusted DV certificate from [Let's Encrypt](https://letsencrypt.org/) authority if you haven't one yet). 
If supported by server and used version of libcurl, HTTP/2 standard is also available. File server should be set up to require basic HTTP authentication and worker is capable to send corresponding credentials with each request.
+Worker communicates with the file server only from the _execution thread_ (see picture above). The supported protocol is HTTP, optionally with SSL encryption (**recommended**; you can get a free trusted DV certificate from the [Let's Encrypt](https://letsencrypt.org/) authority if you do not have one yet). If supported by the server and the used version of libcurl, the HTTP/2 standard is also available. The file server should be set up to require basic HTTP authentication and the worker is capable of sending the corresponding credentials with each request.

#### Worker side

@@ -114,7 +114,7 @@ Worker is cabable of 2 things -- download file and upload file. Internally, work
#### File server side

-File server has its own internal directory structure, where all the files are stored. It provides simple REST API to get them or create new ones. File server doesn't provide authentication or secured connection by itself, but it's supposed to run file server as WSGI script inside a web server (like Apache) with proper configuration. Relevant commands for communication with workers:
+The file server has its own internal directory structure where all the files are stored. It provides a simple REST API to get them or create new ones. The file server does not provide authentication or a secured connection by itself; it is supposed to be run as a WSGI script inside a web server (like Apache) with proper configuration. Relevant commands for communication with workers:

- **GET /submission_archives/\.\** -- gets an archive with submitted source code and corresponding configuration of this job evaluation
- **GET /tasks/\** -- gets a file, common usage is for input files or reference result files
- **POST
@@ -134,7 +134,7 @@ documentation for more info.
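As an illustration of the exchange above, here is a minimal sketch of the worker side of the file-server protocol in Python. This is not the actual worker implementation (the worker uses libcurl); the server address and the credentials below are hypothetical placeholders -- the real values come from the worker configuration -- and only the documented `GET /tasks/<hash>` endpoint is used.

```python
import base64
import urllib.request

# Hypothetical values -- the real address and credentials
# come from the worker's configuration file.
FILESERVER = "https://fileserver.example.com"
USERNAME, PASSWORD = "worker", "secret"


def basic_auth_header(user, password):
    # The file server is set up to require HTTP basic authentication,
    # so every request carries an Authorization header.
    token = base64.b64encode("{}:{}".format(user, password).encode()).decode()
    return "Basic " + token


def task_url(file_hash):
    # GET /tasks/<hash> serves input files and reference result files.
    return "{}/tasks/{}".format(FILESERVER, file_hash)


def download_task_file(file_hash, dest_path):
    # Fetch one file addressed by its hash and store it locally.
    request = urllib.request.Request(task_url(file_hash))
    request.add_header("Authorization", basic_auth_header(USERNAME, PASSWORD))
    with urllib.request.urlopen(request) as response, open(dest_path, "wb") as out:
        out.write(response.read())
```

A worker cache would sit in front of `download_task_file`, so a file is fetched over HTTP only when its hash is not present locally.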
Note that the monitor is designed so that it can receive data both from the
broker and workers. The current architecture prefers the broker to do all the
-communication so that the workers don't have to know too many network services.
+communication so that the workers do not have to know too many network services.

-Monitor is treated as a somewhat optional part of whole solution, so no special effort on communication realibility was made.
+Monitor is treated as a somewhat optional part of the whole solution, so no special effort on communication reliability was made.

@@ -153,7 +153,7 @@ Commands from broker to monitor:
### Broker - Web API communication

Broker communicates with main REST API through ZeroMQ connection over TCP. Socket
-type on broker side is ROUTER, on frontend part it's DEALER. Broker acts as a
+type on broker side is ROUTER, on frontend part it is DEALER. Broker acts as a
server, its IP address and port is configurable in the API.

#### Commands from API to broker:

@@ -169,9 +169,9 @@ server, its IP address and port is configurable in the API.
- **ack** -- this is first message which is sent back to frontend right after eval command arrives, basically it means "Hi, I am all right and am capable of receiving job requests", after sending this broker will try to find acceptable worker for arrived request
- **accept** -- broker is capable of routing request to a worker
-- **reject** -- broker can't handle this job (for example when the requirements
+- **reject** -- broker cannot handle this job (for example when the requirements
specified by the headers cannot be met). There are (rare) cases when the
-  broker finds that it cannot handle the job after it's been confirmed. In such
+  broker finds that it cannot handle the job after it has been confirmed. In such
cases it uses the frontend REST API to mark the job as failed.

@@ -212,7 +212,7 @@ Message JSON format is dictionary (associative array) with keys:
- **task_state** -- state of task with id **task_id**. Present only if **command** is "TASK". Value is one of "COMPLETED", "FAILED" and "SKIPPED".
- COMPLETED -- task was successfully executed without any error, subsequent task will be executed - FAILED -- task ended up with some error, subsequent task will be skipped - - SKIPPED -- some of the previous dependencies failed to execute, so this task won't be executed at all + - SKIPPED -- some of the previous dependencies failed to execute, so this task will not be executed at all ### Web app - Web API communication @@ -244,7 +244,7 @@ Task is an atomic piece of work executed by worker. There are two basic types of - (un)zip/tar/gzip/bzip file(s) - fetch a file from the file repository (either from worker cache or download it by HTTP GET or through SFTP). -Even though the internal operations may be handled by external executables (`mv`, `tar`, `pkzip`, `wget`, ...), it might be better to keep them inside the worker as it would simplify these operations and their portability among platforms. Furthermore, it's quite easy to implement them using common libraries (e.g., _zlib_, _curl_). +Even though the internal operations may be handled by external executables (`mv`, `tar`, `pkzip`, `wget`, ...), it might be better to keep them inside the worker as it would simplify these operations and their portability among platforms. Furthermore, it is quite easy to implement them using common libraries (e.g., _zlib_, _curl_). #### Internal tasks @@ -289,7 +289,7 @@ Even though the internal operations may be handled by external executables (`mv` External tasks are arbitrary executables, typically ran inside isolate (with given parameters) and the worker waits until they finish. The exit code determines, whether the task succeeded (0) or failed (anything else). A task may be marked as essential; in such case, failure will immediately cause termination of the whole job. - **stdin** -- can be configured to read from existing file or from `/dev/null`. -- **stdout** and **stderr** -- can be individually redirected to a file or discarded. 
If this output options are specified, than it's possible to upload output files with results by copying them into result directory.
+- **stdout** and **stderr** -- can be individually redirected to a file or discarded. If these output options are specified, then it is possible to upload output files with the results by copying them into the result directory.
- **limits** -- task has time and memory limits; if these limits are exceeded, the task is failed.

The task results (exit code, time, and memory consumption, etc.) are saved into result yaml file and sent back to frontend application to address which was specified on input.

@@ -517,7 +517,7 @@ List of usable variables in job configuration:
### Directories and Files

-For each job execution unique directory structure is created. Job is not restricted to use only specified directories (tasks can do whatever is allowed on system), but it is advised to use them inside a job. DEFAULT variable represents worker's working directory specified in its configuration. No variable of this name is defined for use in job YAML configuration, it's used just for this example.
+For each job execution a unique directory structure is created. A job is not restricted to use only the specified directories (tasks can do whatever is allowed on the system), but it is advised to use them inside a job. The DEFAULT variable represents the worker's working directory specified in its configuration. No variable of this name is defined for use in the job YAML configuration; it is used just for this example.

List of temporary files for job execution:

@@ -651,7 +651,7 @@ assignments, so we only include it for demonstration purposes.
Because every assignment focuses on a different technology, we would need a new
type of stage for each one. These stages would only run some checker programs
-against the submitted sources (and possibly try to check their syntax etc.). ReCodEx is not primarily determined to perform static analysis, but surely it's also possible. 
+against the submitted sources (and possibly try to check their syntax etc.). ReCodEx is not primarily intended to perform static analysis, but it is surely also possible.

#### Non-procedural programming

@@ -720,12 +720,12 @@ Worker gets request from broker to evaluate particular submission. Next step is
- "listening" thread hand over whole message through `inproc` socket to "execution" thread
- "execution" thread now has to prepare all things and get ready for execution
-- temporary folders names are initated (but not created) this includes folder with source files, folder with downloaded submission, temporary directory for all possible types of files and folder which will contain results from execution
+- temporary folder names are initiated (but not created); this includes the folder with source files, the folder with the downloaded submission, a temporary directory for all possible types of files and the folder which will contain results from the execution
-- if some of the above stated folders is already existing, then it's deleted
+- if any of the above stated folders already exists, it is deleted
-- after successfull initiation submission archive is downloaded to created folder
+- after successful initiation the submission archive is downloaded to the created folder
- submission archive is decompressed into submission files folder
- all files from decompressed archive are copied into evaluation directory which can be used for execution in sandboxes
- all other folders which were not created are created just now
-- it's time to build `job` from configuration
+- it is time to build the `job` from the configuration
- job configuration file is located in evaluation directory if exists and is loaded using `yaml-cpp` library
- loaded configuration is now parsed into `job_metadata` structure which is handed over to `job` execution class itself
- `job` execution class will now initialize and construct particular `tasks` from `job_metadata` into task tree

@@ -804,25 +804,25 @@ Web Application has only a simple work to do. If results is obtained on demand t
## Installation

-Installation of whole ReCodEx solution is a very complex process. It's recommended to have good unix skills with basic knowledge of project architecture.
+Installation of the whole ReCodEx solution is a very complex process. 
It is recommended to have good Unix skills and basic knowledge of the project architecture.

-There are a lot of different GNU/Linux distributions with different package management, naming convention and version release policies. So it's impossible to cover all of the possible variants. We picked one distribution, which is fully supported by automatic installation script, for others there are brief information about installation in every project component's own chapter.
+There are a lot of different GNU/Linux distributions with different package management, naming conventions and version release policies, so it is impossible to cover all possible variants. We picked one distribution which is fully supported by an automatic installation script; for the others there is brief information about installation in each project component's own chapter.

-Distribution of our choice is CentOS, currently in version 7. It's a well known server distribution, derived from enterprise distrubution from Red Hat, so it's very stable and widely used system with long term support. There are [EPEL](https://fedoraproject.org/wiki/EPEL) additional repositories from Fedora project, which adds newer versions of some packages into CentOS, which allows us to use current environment. Also, _rpm_ packages are much easier to build (for example from Python sources) and maintain.
+The distribution of our choice is CentOS, currently in version 7. It is a well known server distribution, derived from the enterprise distribution from Red Hat, so it is a very stable and widely used system with long term support. There are additional [EPEL](https://fedoraproject.org/wiki/EPEL) repositories from the Fedora project, which add newer versions of some packages into CentOS and allow us to use a current environment. Also, _rpm_ packages are much easier to build (for example from Python sources) and maintain.

-The big rival of CentOS in server distributions field is Debian. 
We're running one instance of ReCodEx on Debian too. You need to use _testing_ repositories to use some decent package versions. It's easy to mess your system easily, so create file `/etc/apt/apt.conf` with content of `APT::Default-Release "stable";`. After you add testing repos to `/etc/apt/sources.list`, you can install packages from there like `$ sudo apt-get -t testing install gcc`.
+The big rival of CentOS in the server distributions field is Debian. We are running one instance of ReCodEx on Debian too. You need to use the _testing_ repositories to get some decent package versions. It is easy to mess up your system this way, so create a file `/etc/apt/apt.conf` with the content `APT::Default-Release "stable";`. After you add the testing repos to `/etc/apt/sources.list`, you can install packages from there like `$ sudo apt-get -t testing install gcc`.

-Some components are also capable of running in Windows environment. However setting up Windows OS is a little bit of pain and it's not supposed to run ReCodEx in this way. Only worker component may be needed to run on Windows, so we're providing clickable installer including dependencies. Just for info, all components should be able to run on Windows, only broker was not tested and may require small tweaks to properly work.
+Some components are also capable of running in a Windows environment. However, setting up Windows OS is a bit of a pain and ReCodEx is not supposed to be run this way. Only the worker component may need to run on Windows, so we provide a clickable installer including dependencies. Just for info, all components should be able to run on Windows; only the broker was not tested and may require small tweaks to work properly.

### Ansible installer

-For automatic installation is used set of Ansible scripts. Ansible is one of the best known and used tools for automatic server management. It's required only to have SSH access to the server and ansible installed on the client machine. 
For further reading is supposed basic Ansible knowledge. For more info check their [documentation](http://docs.ansible.com/ansible/intro.html).
+A set of Ansible scripts is used for the automatic installation. Ansible is one of the best known and most used tools for automatic server management. The only requirements are SSH access to the server and Ansible installed on the client machine. Basic Ansible knowledge is assumed in the following text. For more info check their [documentation](http://docs.ansible.com/ansible/intro.html).

-All Ansible scripts are located in _utils_ repository, _installation_ [directory](https://github.com/ReCodEx/utils/tree/master/installation). Ansible files are pretty self-describing, they can be also use as template for installation to different systems. Before installation itself it's required to edit two files -- set addresses of hosts and values of some variables.
+All Ansible scripts are located in the _utils_ repository, _installation_ [directory](https://github.com/ReCodEx/utils/tree/master/installation). The Ansible files are pretty self-describing and they can also be used as a template for installation on different systems. Before the installation itself it is required to edit two files -- to set the addresses of hosts and the values of some variables.

#### Hosts configuration

-First, it's needed to set ip addresses of your computers. Common practise is to have multiple files with definitions, one for development, another for production for example. Example configuration is in _development_ file. Each component of ReCodEx project can be installed on different server.
+First, it is necessary to set the IP addresses of your computers. Common practice is to have multiple files with definitions, for example one for development and another for production. An example configuration is in the _development_ file. Each component of the ReCodEx project can be installed on a different server. 
Hosts can be specified as hostnames or IP addresses, optionally with an SSH port after a colon.

-Shorten example of hosts config:
+Shortened example of hosts config:

@@ -885,25 +885,25 @@ Configurable variables are saved in _group_vars/all.yml_ file. Syntax is basic k
#### Installation itself

-With your computers installed with CentOS and configuration modified it's time to run the installation.
+With your machines installed with CentOS and the configuration modified, it is time to run the installation.

```
$ ansible-playbook -i development recodex.yml
```

-This command installs all components of ReCodEx onto machines listed in _development_ file. It's possible to install only specified parts of project, just use component's YAML file instead of _recodex.yml_.
+This command installs all components of ReCodEx onto the machines listed in the _development_ file. It is possible to install only selected parts of the project; just use the component's YAML file instead of _recodex.yml_.

-Ansible expects to have password-less access to the remote machines. If you haven't such setup, use options `--ask-pass` and `--ask-become-pass`.
+Ansible expects password-less access to the remote machines. If you do not have such a setup, use the options `--ask-pass` and `--ask-become-pass`.

### Security

-One of the most important aspects of ReCodEx instance is security. It's crutial to keep gathered data safe and not to allow unauthorized users modify restricted pieces of information. Here is a small list of recommendations to keep running ReCodEx instance safe.
+One of the most important aspects of a ReCodEx instance is security. It is crucial to keep the gathered data safe and not to allow unauthorized users to modify restricted pieces of information. Here is a small list of recommendations to keep a running ReCodEx instance safe.

-- Secure MySQL installation. The installation script doesn't do any security actions, so please run at least `mysql_secure_installation` script on database computer. 
-- Get HTTPS certificate and set it in Apache for web application and API. Monitor should be proxied through the web server too with valid certificate. You can get free DV certificate from [Let's Encrypt](https://letsencrypt.org/). Don't forget to set up automatic renewing!
-- Hide broker, workers and fileserver behind firewall, private subnet or IPsec tunnel. They are not required to be reached from public internet, so it's better keep them isolated.
+- Secure MySQL installation. The installation script does not do any security actions, so please run at least the `mysql_secure_installation` script on the database machine.
+- Get an HTTPS certificate and set it in Apache for the web application and the API. The monitor should be proxied through the web server too, with a valid certificate. You can get a free DV certificate from [Let's Encrypt](https://letsencrypt.org/). Do not forget to set up automatic renewal!
+- Hide the broker, workers and fileserver behind a firewall, private subnet or IPsec tunnel. They are not required to be reachable from the public internet, so it is better to keep them isolated.
- Keep your server updated and well configured. For automatic installation of security updates on CentOS system refer to `yum-cron` package. Configure SSH and Apache to use only strong ciphers, some recommendations can be found [here](https://bettercrypto.org/static/applied-crypto-hardening.pdf).
-- Don't put actually used credentials on web, for example don't commit your passwords (in Ansible variables file) on GitHub.
+- Do not put actually used credentials on the web; for example, do not commit your passwords (in the Ansible variables file) to GitHub.
- Regularly check logs for anomalies.
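As a closing footnote to the heartbeating hunk patched above: the "bounded, exponentially increasing delay" a worker uses when it thinks the broker is dead can be sketched in a few lines. All numeric parameters below are hypothetical defaults for illustration only; the protocol merely requires that the delay grows exponentially up to some cap.

```python
def reconnect_delays(initial=1.0, factor=2.0, maximum=32.0, attempts=6):
    """Return the waiting times (in seconds) a worker would use between
    reconnect attempts: exponentially increasing, but capped at `maximum`."""
    delays = []
    delay = initial
    for _ in range(attempts):
        delays.append(delay)
        # Grow the delay for the next attempt, but never exceed the bound.
        delay = min(delay * factor, maximum)
    return delays
```

With these illustrative defaults the delays are 1, 2, 4, 8, 16 and 32 seconds; every further attempt stays capped at 32 seconds, so a flaky network is never flooded with reconnects.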