**ReCodEx** is designed to be very modular and configurable. One such configuration is sketched in the following picture. There are two separate frontend instances with distinct databases sharing a common backend. This configuration may be suitable for MFF UK -- a basic programming course and the KSP competition. Note that the connections between components are not fully accurate.
**Web app** is the main part of the whole project from the user's point of view. It provides a user interface and is the only part that interacts with the outside world directly. **Web API** contains almost all logic of the app, including _user management and authentication_, _storing and versioning files_ (with the help of the **File server**), _counting and assigning points_ to users, etc. Advanced users may connect to the API directly or create custom frontends. **Broker** is an essential part of the whole architecture. It maintains a list of available **Workers**, receives submissions from the **Web API**, routes them further, and reports progress of evaluations back to the **Web app**. **Worker** securely runs each received job and evaluates its results. **Monitor** resends evaluation progress messages to the **Web app** so that they can be presented to users.
Detailed communication inside the ReCodEx project is captured in the following image and described in the sections below. Red connections go through ZeroMQ sockets, blue ones through WebSockets, and green ones through HTTP(S). All ZeroMQ messages are sent as multipart messages with one string (command, option) per part and with no empty frames (unless explicitly specified otherwise).
Broker acts as a server when communicating with a worker. The listening IP address and port are configurable; the protocol family is TCP. The worker socket is of DEALER type, the broker socket is of ROUTER type. Because of that, the very first part of every (multipart) message from broker to worker must be the target worker's socket identity (which is saved on its **init** command).
This protocol proved very robust in real-world testing. The whole backend is therefore reliable and can survive short-term connection issues without problems. Also, because the delay between ping messages increases, the network is not flooded when there are problems. We have experienced no issues since we started using this protocol.
Worker communicates with the file server only from the _execution thread_ (see picture above). The supported protocol is HTTP, optionally with SSL encryption (**recommended**; you can get a free trusted DV certificate from the [Let's Encrypt](https://letsencrypt.org/) authority if you don't have one yet). If supported by the server and the libcurl version in use, the HTTP/2 standard is also available. The file server should be set up to require basic HTTP authentication, and the worker is capable of sending the corresponding credentials with each request.
Worker is capable of two things -- downloading a file and uploading a file. Internally, the worker uses the libcurl C library with a very similar setup for both. In both cases it can verify the HTTPS certificate (on Linux against the system certificate list, on Windows against one downloaded from the cURL website during installation), supports basic HTTP authentication, offers HTTP/2 with fallback to HTTP/1.1, and fails on error (returned HTTP status code is >= 400). The worker has a list of credentials for all available file servers in its config file.
File server has its own internal directory structure where all the files are stored. It provides a simple REST API to get them or create new ones. The file server doesn't provide authentication or a secured connection by itself; it is meant to be run as a WSGI script inside a web server (like Apache) with proper configuration. Relevant commands for communication with workers:
- **GET /submission_archives/\<id\>.\<ext\>** -- gets an archive with submitted source code and corresponding configuration of this job evaluation
- **GET /tasks/\<hash\>** -- gets a file, common usage is for input files or reference result files
- **PUT /results/\<id\>.\<ext\>** -- upload archive with evaluation results under specified name (should be same _id_ as name of submission archive). On successful upload returns JSON `{ "result": "OK" }` as body of returned page.
If not specified otherwise, `zip` format of archives is used. Symbol `/` in API description is root of file server's domain. If the domain is for example `fs.recodex.org` with SSL support, getting input file for one task could look as GET request to `https://fs.recodex.org/tasks/8b31e12787bdae1b5766ebb8534b0adc10a1c34c`.
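As a rough illustration of the request described above, the following Python sketch builds an authenticated GET request for a task file. It uses only the standard library; the helper name `build_task_request` and the credentials are hypothetical, and the worker itself uses libcurl rather than Python.

```python
import base64
import urllib.request


def build_task_request(domain, file_hash, username, password):
    """Build a GET request for /tasks/<hash> with basic HTTP authentication."""
    url = f"https://{domain}/tasks/{file_hash}"
    # Basic authentication: base64 of "username:password" in the Authorization header.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", f"Basic {token}")
    return request


request = build_task_request(
    "fs.recodex.org",
    "8b31e12787bdae1b5766ebb8534b0adc10a1c34c",
    "user",  # hypothetical credentials
    "pass",
)
print(request.full_url)
```

Passing the request to `urllib.request.urlopen` would perform the download and raise `urllib.error.HTTPError` for error status codes, which roughly mirrors the fail-on-error behaviour of the worker.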
### Broker - Monitor communication
Broker communicates with monitor also through ZeroMQ over TCP protocol. Type of
- **eval** -- evaluate a job. Requires at least 4 frames:
- `job_id` -- identifier of this job (in ASCII representation -- we avoid endianness issues and also support alphabetic ids)
- `header` -- additional header describing worker capabilities. Format must be `header_name=value`, every header shall be in a separate message frame. There is no maximum limit on number of headers. There may be also no headers at all.
- **ack** -- the first message sent back to the frontend right after the eval command arrives. It basically means "Hi, I am all right and am capable of receiving job requests". After sending it, the broker tries to find an acceptable worker for the request
- **accept** -- broker is capable of routing request to a worker
- **reject** -- broker can't handle this job (for example when the requirements
File server has a REST API for interaction with other parts of ReCodEx. Description of communication with workers is in [File server side](#file-server-side) section. On top of that, there are other commands for interaction with the API:
- **GET /results/\<id\>.\<ext\>** -- download archive with evaluated results of job _id_
- **POST /submissions/\<id\>** -- upload new submission with identifier _id_. Expects that the body of the POST request uses file paths as keys and the content of the files as values. On successful upload returns JSON `{ "archive_path": <archive_url>, "result_path": <result_url> }` in response body. From _archive_path_ the submission can be downloaded (by worker) and corresponding evaluation results should be uploaded to _result_path_.
- **POST /tasks** -- upload new files, which will be available under names equal to the `sha1sum` of their content. Multiple files can be uploaded at once. On successful upload returns JSON `{ "result": "OK", "files": <file_list> }` in the response body, where _file_list_ is a dictionary with the original file name as key and the new URL with the hashed name as value.
Web API calls these fileserver endpoints with standard HTTP requests. There are no special commands involved. There is no communication in opposite direction.
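Since uploaded task files are addressed by the `sha1sum` of their content, a client can predict the resulting name before uploading. A minimal sketch, assuming the name is the plain hexadecimal SHA-1 digest of the raw file bytes (the helper name is hypothetical):

```python
import hashlib


def task_file_name(content: bytes) -> str:
    """Return the SHA-1 hex digest of the content, used here as the stored file name."""
    return hashlib.sha1(content).hexdigest()


print(task_file_name(b"abc"))
```

Two uploads with identical content therefore map to the same name, regardless of their original file names.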
Monitor interacts with the web application through a WebSocket connection. Monitor acts as a server and browsers connect to it. The IP address and port are configurable. When a client connects to the monitor, it sends a message with the string representation of the channel id it is interested in (usually the id of the job being evaluated). There can be multiple listeners per channel, and even (shortly) delayed connections will receive all messages from the very beginning.
- there is no WebSocket connection for listed channel (job id) -- message is dropped
- there is active WebSocket connection for listed channel -- message is parsed into JSON format (see below) and sent as a string to that established channel. Messages for active connections are queued, so no messages are discarded even under heavy workload.
The provided web application runs as a JavaScript client inside the user's browser. It communicates with the REST API on the server through standard HTTP requests. Documentation of the main REST API is in a separate document due to its extensiveness. Results are returned as a JSON payload, which is simply parsed in the web application and presented to the users.
Assignments are programming tasks that can be tested and evaluated by a worker after a user submits their solution. An assignment is described by a YAML file that contains information on how to build, run and test it. One submitted assignment is called a (worker) job.
A job is a set/list of tasks (it is generally a set, but the order of tasks has some meaning). These tasks may have an arbitrary number of dependencies, which need to be observed. When a worker processes a job, it creates a task graph, where tasks are vertices and dependencies are edges (A -> B means that task A is on the dependency list of task B, so A must be run earlier), and creates its linear ordering. The graph must be acyclic (otherwise a linear ordering does not exist) and the worker attempts to execute the maximal number of tasks possible. Tasks without dependencies can be executed directly; other tasks are executed when all their dependencies have been successfully completed.
Tasks are executed sequentially -- by the linear ordering of the task graph. Parallel tasks (tasks, which are not directly dependent and thus their linear ordering may be arbitrary) are ordered first by their priority (higher number means higher priority) and secondly by their order in the configuration file. Priority is important for specifying evaluation flow. See sample picture for better understanding.
Each task has a unique ID (an alphanumeric string like _CompileA_, _RunAA_, or _JudgeAB_ in the picture). These IDs are used to identify tasks (for dependency references, in the log, ...). Numbers in the bottom right corner are the priorities of each task. A higher number means greater priority. This means that once task _RunAA_ is done, the next one must be _JudgeAA_ and not _RunAB_ (that would also be a valid linear ordering, but _RunAB_ has lower priority).
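The ordering rules can be illustrated with a priority-aware topological sort. The sketch below is an assumption about how such an ordering could be computed (Kahn's algorithm with a heap), not the worker's actual implementation; the function name and the priorities in the example are hypothetical.

```python
import heapq


def order_tasks(tasks):
    """Return task IDs in execution order.

    `tasks` is a list of (task_id, priority, dependencies) tuples in
    configuration-file order. Among tasks that are ready to run, the one with
    the highest priority goes first; ties are broken by configuration order.
    """
    index = {task_id: i for i, (task_id, _, _) in enumerate(tasks)}
    priority = {task_id: prio for task_id, prio, _ in tasks}
    remaining = {task_id: len(deps) for task_id, _, deps in tasks}
    dependents = {task_id: [] for task_id in index}
    for task_id, _, deps in tasks:
        for dep in deps:
            dependents[dep].append(task_id)

    # heapq is a min-heap, so the priority is negated.
    ready = [(-priority[t], index[t], t) for t in index if remaining[t] == 0]
    heapq.heapify(ready)
    ordering = []
    while ready:
        _, _, task_id = heapq.heappop(ready)
        ordering.append(task_id)
        for child in dependents[task_id]:
            remaining[child] -= 1
            if remaining[child] == 0:
                heapq.heappush(ready, (-priority[child], index[child], child))

    if len(ordering) != len(tasks):
        raise ValueError("task graph contains a cycle")
    return ordering


# Hypothetical priorities; the values from the picture are not reproduced here.
tasks = [
    ("CompileA", 5, []),
    ("RunAA", 4, ["CompileA"]),
    ("RunAB", 2, ["CompileA"]),
    ("JudgeAA", 3, ["RunAA"]),
    ("JudgeAB", 1, ["RunAB"]),
]
print(order_tasks(tasks))
```

With these priorities, _JudgeAA_ is scheduled right after _RunAA_ and before _RunAB_, matching the behaviour described above.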
- **Execute external process** (optionally inside Isolate). External processes are meant for compilation, testing, or execution of external judges. On Linux, usage of the isolate sandbox is mandatory by default; this option exists because of Windows, where no sandbox is currently available.
- **Perform internal operation**. Internal operations comprise commands which are typically related to file/directory maintenance and other evaluation management tasks. A few important examples:
- Create/delete/move/rename file/directory
- (un)zip/tar/gzip/bzip file(s)
- fetch a file from the file repository (either from worker cache or download it by HTTP GET or through SFTP).
Even though the internal operations may be handled by external executables (`mv`, `tar`, `pkzip`, `wget`, ...), it might be better to keep them inside the worker as it would simplify these operations and their portability among platforms. Furthermore, it's quite easy to implement them using common libraries (e.g., _zlib_, _curl_).
- **Extract task** is the opposite of the archive task. It can extract different types of archives. Supported formats are the same as those supported by the `libarchive` library (see [libarchive wiki](https://github.com/libarchive/libarchive/wiki)), mainly `zip`, `tar`, `tar.gz`, `tar.bz2` and `7zip`. Please note that the system administrator may not have installed all the packages needed, so some formats may not work; consult your system administrator for more information. Archives may contain only regular files or directories (i.e. no symlinks, block and character devices, sockets or pipes are allowed). The calling command is `extract` and it requires two arguments:
- **Fetch task** will give you a file. It can be downloaded from a remote file server or just copied from the local cache if available. The calling command is `fetch` with two arguments:
- **Copy task** can copy files and directories. Detailed info can be found on the reference page of [boost::filesystem::copy](http://www.boost.org/doc/libs/1_60_0/libs/filesystem/doc/reference.html#copy). The calling command is `cp` and it requires two arguments:
- **Make directory task** can create an arbitrary number of directories. The calling command is `mkdir` and it requires at least one argument. [boost::filesystem::create_directories](http://www.boost.org/doc/libs/1_60_0/libs/filesystem/doc/reference.html#create_directories) is called for each provided argument.
- **Rename task** will rename files and directories. Detailed behavior can be found on the reference page of [boost::filesystem::rename](http://www.boost.org/doc/libs/1_60_0/libs/filesystem/doc/reference.html#rename). The calling command is `rename` and it requires two arguments:
- path and name of source target
- path and name of destination target
- **Remove task** is for deleting files and directories. The calling command is `rm` and it requires at least one argument. [boost::filesystem::remove_all](http://www.boost.org/doc/libs/1_60_0/libs/filesystem/doc/reference.html#remove_all) is called for each provided argument.
External tasks are arbitrary executables, typically run inside isolate (with given parameters), and the worker waits until they finish. The exit code determines whether the task succeeded (0) or failed (anything else). A task may be marked as essential; in such a case, its failure immediately causes termination of the whole job.
- **stdin** -- can be configured to read from existing file or from `/dev/null`.
- **stdout** and **stderr** -- can be individually redirected to a file or discarded. If these output options are specified, then it's possible to upload the output files with results by copying them into the result directory.
- **limits** -- task has time and memory limits; if these limits are exceeded, the task fails.
The task results (exit code, time, memory consumption, etc.) are saved into a result YAML file and sent back to the frontend application, to the address which was specified on input.
Judges are treated as normal external commands, so there is no special task type for them. Binaries are installed alongside the worker executable in standard directories (on both Linux and Windows systems).
Judges should be used for comparison of output files from execution tasks with sample outputs fetched from the fileserver. The result of this comparison should state at least whether the files are the same or not. As an extension, a judge may report a percentage result based on the similarity of the given files. All judge results have to be printed to standard output.
All bundled judges are adopted from old Codex with only very small modifications. The ReCodEx judges base directory is in the `${JUDGES_DIR}` variable, which can be used in the job config file.
- **recodex-judge-normal** is the base judge used by most exercises. This judge compares two text files. It compares only text tokens, regardless of the amount of whitespace between them.
- **recodex-judge-filter** can be used to preprocess output files before real judging. This judge filters C-like comments from a text file. A comment starts with a double slash sequence (`//`) and finishes with a newline. If the comment takes the whole line, then the whole line is filtered.
- if `outputFile` is omitted, standard output is used instead.
- if both files are omitted, the application uses standard input and output.
- **recodex-judge-shuffle** is for judging shuffled files. This judge compares two text files and returns 0 if they match (and 1 otherwise). Two files are compared with no regard for whitespace (whitespace acts just as a token delimiter).
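The token-based comparison described for **recodex-judge-normal** can be sketched in a few lines of Python. This is only an illustration of the idea, not the judge's actual implementation; the function names are hypothetical, and the 0/1 return convention is borrowed from the description of **recodex-judge-shuffle**.

```python
def same_tokens(expected: str, actual: str) -> bool:
    """Compare two texts token by token, ignoring the amount of whitespace."""
    # str.split() with no arguments splits on any run of whitespace.
    return expected.split() == actual.split()


def judge_exit_code(expected: str, actual: str) -> int:
    """Return 0 when the texts match and 1 otherwise."""
    return 0 if same_tokens(expected, actual) else 1


print(judge_exit_code("1 2  3\n", "1\t2 3"))
```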
- **submission** -- information about this particular submission
- **job-id** -- textual ID which should be unique in the whole ReCodEx
- **language** -- no specific function, just for debugging and clarity
- **file-collector** -- address from which fetch tasks will download data
- _log_ -- default is false, can be omitted, determines whether job execution will be logged into one shared log
- **hw-groups** -- list of hardware groups for which limits are specified in this configuration
- **tasks** -- list (not map) of individual tasks
- **task-id** -- unique identifier of task in scope of one submission
- **priority** -- higher number, higher priority
- **fatal-failure** -- if true, then execution of the whole job will be stopped after this task fails
- _dependencies_ -- list of dependencies which have to be fulfilled before this task, can be omitted if there are no dependencies
- **cmd** -- description of command which will be executed
- **bin** -- the binary itself (full path of external command or name of internal task)
- _args_ -- list of arguments which will be sent into execution unit
- _test-id_ -- ID of the test this task is part of -- must be specified for tasks which the particular test's result depends on
- _type_ -- type of the task, can be omitted, default value is _inner_ -- possible values are: _inner_, _initiation_, _execution_, _evaluation_
- _sandbox_ -- wrapper for external tasks which will run in sandbox, if defined task is automatically external
- **name** -- name of used sandbox
- _stdin_ -- file to which standard input will be redirected, can be omitted
- _stdout_ -- file to which standard output will be redirected, can be omitted
- _stderr_ -- file to which error output will be redirected, can be omitted
- **limits** -- list of limits which can be passed to sandbox
- **hw-group-id** -- determines specific limits for specific machines
- _time_ -- time of execution in seconds
- _wall-time_ -- wall time in seconds
- _extra-time_ -- extra time which will be added to execution
- _stack-size_ -- size of stack of executed program in kilobytes
- _memory_ -- overall memory limit for application in kilobytes
- _parallel_ -- integral number of processes which can run simultaneously, time and memory limits are merged from all potential processes/threads
- _disk-size_ -- size of all IO operations from/to files in kilobytes
- _disk-files_ -- number of files which can be opened
- _environ-variable_ -- wrapper for map of environmental variables, union with default worker configuration
- _chdir_ -- this will be working directory of executed application
- _bound-directories_ -- list of structures representing directories which will be visible inside sandbox, union with default worker configuration. Contains 3 suboptions: **src** -- source pointing to actual system directory, **dst** -- destination inside sandbox which can have its own filesystem binding and **mode** -- determines connection mode of specified directory, one of values: RW, NOEXEC, FS, MAYBE, DEV
This configuration example is written in YAML and serves only for demonstration purposes. Some items can be omitted and defaults from worker configuration will be used.
Because the frontend does not know which worker gets the job, it's necessary to be a little general in the configuration file. This means that some worker-specific things have to be transparent. A good example is that some (evaluation) directories may be placed differently on different workers. To provide a solution, variables were established. There are of course some restrictions on where variables can be used. Basically, wherever filesystem paths can be used, variables can be used.
Usage of variables in configuration is simple and shell-like. The name of the variable is put inside braces which are preceded by a dollar sign. Real usage then looks like this: `${VAR}`. There should be no quotes or apostrophes around the variable name, just simple text in braces. Parsing is simple: whenever there is a dollar sign followed by braces, the job execution unit automatically assumes that this is a variable, so this kind of substring cannot be used anywhere else.
- **WORKER_ID** -- integral identification of worker, unique on server
- **JOB_ID** -- identification of this job
- **SOURCE_DIR** -- directory where source codes of job are stored
- **EVAL_DIR** -- evaluation directory which should point inside sandbox. Note that some existing directory must be bound inside the sandbox under the **EVAL_DIR** name using the _bound-directories_ directive inside the limits section.
- **RESULT_DIR** -- results from job can be copied here, but only with internal task
- **TEMP_DIR** -- general temp directory which is not dependent on operating system
- **JUDGES_DIR** -- directory in which judges are stored (outside sandbox)
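The `${VAR}` substitution can be sketched with a regular expression. This is a minimal illustration, not the worker's parser; the function name, the sample values and the choice to raise an error for unknown variables are assumptions.

```python
import re

# Matches ${NAME}, where NAME consists of letters, digits and underscores.
VARIABLE_PATTERN = re.compile(r"\$\{([A-Za-z0-9_]+)\}")


def expand_variables(text, variables):
    """Replace every ${NAME} in `text` with its value from `variables`."""
    def substitute(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"unknown variable: {name}")
        return variables[name]

    return VARIABLE_PATTERN.sub(substitute, text)


# Hypothetical values; real ones are supplied by the worker.
variables = {"EVAL_DIR": "/evaluate", "JUDGES_DIR": "/usr/bin"}
print(expand_variables("${JUDGES_DIR}/recodex-judge-normal", variables))
```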
For each job execution unique directory structure is created. Job is not restricted to use only specified directories (tasks can do whatever is allowed on system), but it is advised to use them inside a job. DEFAULT variable represents worker's working directory specified in its configuration. No variable of this name is defined for use in job YAML configuration, it's used just for this example.
List of temporary files for job execution:
- **\${DEFAULT}/downloads/\${WORKER_ID}/\${JOB_ID}** -- where the downloaded archive is saved
- **\${DEFAULT}/submission/\${WORKER_ID}/\${JOB_ID}** -- decompressed submission is stored here
- **\${DEFAULT}/eval/\${WORKER_ID}/\${JOB_ID}** -- this directory is accessible in job configuration using variables and all execution should happen here
- **\${DEFAULT}/temp/\${WORKER_ID}/\${JOB_ID}** -- directory where all sort of temporary files can be stored
- **\${DEFAULT}/results/\${WORKER_ID}/\${JOB_ID}** -- again a directory accessible from job configuration, used to store all files which will be uploaded to the fileserver. Usually there will be only the YAML result file and optionally a log; every other file has to be copied here explicitly from the job
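The layout above follows one pattern, `<default>/<purpose>/<worker id>/<job id>`, which the following sketch reproduces. The function name and the sample working directory, worker ID and job ID are hypothetical.

```python
from pathlib import PurePosixPath


def job_directories(default_dir, worker_id, job_id):
    """Return the per-job directories keyed by their purpose."""
    base = PurePosixPath(default_dir)
    names = ["downloads", "submission", "eval", "temp", "results"]
    return {name: base / name / str(worker_id) / str(job_id) for name in names}


# Hypothetical working directory, worker ID and job ID.
dirs = job_directories("/var/recodex-worker", 1, "job42")
for name, path in dirs.items():
    print(f"{name}: {path}")
```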
Results of tasks are sent back in YAML format compressed into archive. This archive can contain further files, such as job logging information and files which were explicitly copied into results directory. Results file contains job identification and results of individual tasks.
Every assignment consists of tasks. However, only some tasks are part of the evaluation. Those tasks are grouped into **tests**. Each task might have a _test-id_ parameter assigned, as described above. Every test must consist of at least two tasks: execution and evaluation by a judge. The former provides information about the execution, such as elapsed time and memory consumed; the latter results in a score -- a float between 0 and 1. There may be more than one execution task, but there must be exactly one evaluation task.
Total resulting score of the assignment submission is then calculated according to a supplied score config (described below). Total score is also a float between 0 and 1. This number is then multiplied by the maximum of points awarded for the assignment by the teacher assigning the exercise -- not the exercise author.
The first implemented calculator is a simple score calculator with test weights. This calculator just looks at the score of each test and puts them together according to the test weights specified in the assignment configuration. The resulting score is calculated as the sum of the products of score and weight of each test, divided by the sum of all weights. The algorithm in Python would look something like this:
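A minimal sketch of the weighted calculation described above. The function name, the dictionary-based inputs and the sample values are assumptions made for illustration, and the handling of a zero weight sum is not specified by the text.

```python
def weighted_score(test_scores, test_weights):
    """Weighted average of per-test scores.

    `test_scores` maps test ID to a score between 0 and 1,
    `test_weights` maps test ID to its weight.
    """
    total = 0.0
    weight_sum = 0.0
    for test_id, score in test_scores.items():
        weight = test_weights[test_id]
        total += score * weight
        weight_sum += weight
    if weight_sum == 0:
        raise ValueError("sum of test weights must be non-zero")
    return total / weight_sum


# Hypothetical tests and weights.
scores = {"A": 1.0, "B": 0.5, "C": 0.0}
weights = {"A": 100, "B": 200, "C": 100}
print(weighted_score(scores, weights))  # 0.5
```

The awarded points would then be this value multiplied by the maximum number of points set for the assignment.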
During the execution tasks can use one shared log. There is no use for multiple logs, one per task for example, because of pretty small amount of information logged. By default logging is disabled, enabling can be done in job configuration.
the output, then we run bison on top of previous stage results and do the same. This is more advanced configuration and ReCodEx is specifically designed to support such evaluation pipeline.
against the submitted sources (and possibly try to check their syntax etc.). ReCodEx is not primarily intended to perform static analysis, but it is certainly possible.
This article will describe in detail execution flow of submission from the point of submission into web application to the point of evaluation of results from execution. Only hot path is considered in following description.
First, the user has to submit their solution to the web application. Generally, the web application has to store the submitted files and hand over all needed information about the submission to the broker. More detailed description follows:
Broker gets information about a new submission from the web application. At this point the broker has to find a suitable worker for execution of this particular submission. When a worker is found and is idle, the broker sends the detailed submission to the worker for evaluation. More detailed description follows:
- broker gets multipart "eval" message from web application with job identification, source archive URL, result URL and appropriate worker headers
- headers are parsed and worker which matches all of them is chosen as the one which will execute incoming submission
- whole execution request is saved into `worker` structure to waiting queue
- if chosen worker is not working right now then incoming request is forwarded directly from waiting queue to worker through multipart message
- if worker queue is not empty then nothing is done right now
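The header matching step can be sketched as follows. The `header_name=value` format comes from the protocol description; everything else is an assumption made for illustration: the function names, the worker records, the header names, and the rule that a worker matches when every requested header equals its advertised value exactly.

```python
def parse_headers(frames):
    """Parse `header_name=value` frames into a dictionary."""
    headers = {}
    for frame in frames:
        name, separator, value = frame.partition("=")
        if not separator:
            raise ValueError(f"malformed header: {frame!r}")
        headers[name] = value
    return headers


def find_worker(workers, headers):
    """Return the ID of the first worker whose capabilities match all headers, or None."""
    for worker_id, capabilities in workers.items():
        if all(capabilities.get(name) == value for name, value in headers.items()):
            return worker_id
    return None


# Hypothetical workers and header names.
workers = {
    "worker-1": {"env": "c", "threads": "2"},
    "worker-2": {"env": "python", "threads": "4"},
}
headers = parse_headers(["env=python", "threads=4"])
print(find_worker(workers, headers))
```

A request with no headers matches the first worker in this sketch, since there are no requirements to violate.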
Worker gets a request from the broker to evaluate a particular submission. The next step is to evaluate the given submission and upload the results to the fileserver. After this, the worker only notifies the broker that the submission was evaluated. More detailed description follows:
- "listening" thread gets multipart message from `broker` with command "eval"
- "listening" thread hands over the whole message through an `inproc` socket to the "execution" thread
- "execution" thread now has to prepare all things and get ready for execution
- temporary folder names are initialized (but not created); this includes the folder with source files, the folder with the downloaded submission, the temporary directory for all possible types of files and the folder which will contain results from execution
- if any of the above folders already exists, it's deleted
- after successful initialization the submission archive is downloaded to the created folder
- submission archive is decompressed into submission files folder
- all files from decompressed archive are copied into evaluation directory which can be used for execution in sandboxes
- all other folders which were not created are created just now
- it's time to build `job` from configuration
- job configuration file is located in evaluation directory if exists and is loaded using `yaml-cpp` library
- loaded configuration is now parsed into `job_metadata` structure which is handed over to `job` execution class itself
- `job` execution class will now initialize and construct particular `tasks` from `job_metadata` into task tree
- if there is some item which can use variables (e.g. binary path, cmd arguments, bound directories) it is done at this point
- all tasks from configuration are created and divided into external or internal tasks
Broker gets a done message from the worker and basically only marks the submission as done in its internal structures. After that the broker has to tell the Web API that execution of the particular job ended. More detailed description follows:
Installation of the whole ReCodEx solution is a very complex process. It's recommended to have good Unix skills with basic knowledge of the project architecture.
There are a lot of different GNU/Linux distributions with different package management, naming conventions and version release policies, so it's impossible to cover all possible variants. We picked one distribution, which is fully supported by the automatic installation script; for others there is brief information about installation in every project component's own chapter.
The distribution of our choice is CentOS, currently in version 7. It's a well-known server distribution, derived from the enterprise distribution from Red Hat, so it's a very stable and widely used system with long-term support. There are additional [EPEL](https://fedoraproject.org/wiki/EPEL) repositories from the Fedora project, which add newer versions of some packages to CentOS and allow us to use a current environment. Also, _rpm_ packages are much easier to build (for example from Python sources) and maintain.
The big rival of CentOS in the field of server distributions is Debian. We're running one instance of ReCodEx on Debian too. You need to use the _testing_ repositories to get some decent package versions. It's easy to mess up your system this way, so create the file `/etc/apt/apt.conf` with content `APT::Default-Release "stable";`. After you add the testing repos to `/etc/apt/sources.list`, you can install packages from there like `$ sudo apt-get -t testing install gcc`.
Some components are also capable of running in a Windows environment. However, setting up Windows OS is a little bit of a pain and ReCodEx is not meant to be run this way. Only the worker component may need to run on Windows, so we're providing a clickable installer including dependencies. Just for info, all components should be able to run on Windows; only the broker was not tested and may require small tweaks to work properly.
A set of Ansible scripts is used for automatic installation. Ansible is one of the best known and most used tools for automatic server management. It requires only SSH access to the server and Ansible installed on the client machine. Basic Ansible knowledge is assumed in the following text. For more info check their [documentation](http://docs.ansible.com/ansible/intro.html).
All Ansible scripts are located in the _utils_ repository, _installation_ [directory](https://github.com/ReCodEx/utils/tree/master/installation). Ansible files are pretty self-describing and can also be used as a template for installation on different systems. Before the installation itself, it's required to edit two files -- set addresses of hosts and values of some variables.
First, it's necessary to set the IP addresses of your computers. Common practice is to have multiple files with definitions, for example one for development and another for production. An example configuration is in the _development_ file. Each component of the ReCodEx project can be installed on a different server. Hosts can be specified as hostnames or IP addresses, optionally with the SSH port after a colon.
Configurable variables are saved in _group_vars/all.yml_ file. Syntax is basic key-value pair per line, separated by colon. Values with brief description:
- _source_dir_ -- Directory, where to store all sources from GitHub. Defaults to `/opt/recodex`.
- _mysql_root_password_ -- Password of root user of MySQL database. Will be set after installation and saved to `/root/.my.cnf` file.
- _mysql_recodex_username_ -- MySQL username for ReCodEx API access.
- _mysql_recodex_password_ -- Password for the user above.
- _broker_to_webapi_addr_ -- Address, where API can reach broker. Private one is recommended.
- _broker_to_webapi_port_ -- Port to above.
- _broker_firewall_api_ -- Open above port in firewall, "yes" or "no".
- _broker_to_workers_addr_ -- Address, where workers can reach broker. Private one is recommended.
- _broker_to_workers_port_ -- Port to above.
- _broker_firewall_workers_ -- Open above port in firewall, "yes" or "no".
- _broker_notifier_address_ -- URL (on API), where broker will send notifications, for example "https://recodex.projekty.ms.mff.cuni.cz/v1/broker-reports".
- _broker_notifier_port_ -- Port to above, should be the same as for API itself (_webapi_public_port_)
- _broker_notifier_username_ -- Username for HTTP Authentication for reports
- _broker_notifier_password_ -- Password for HTTP Authentication for reports
- _monitor_websocket_addr_ -- Address, where websocket connection from monitor will be available
- _monitor_websocket_port_ -- Port to above.
- _monitor_firewall_websocket_ -- Open above port in firewall, "yes" or "no".
- _monitor_zeromq_addr_ -- Address, where monitor will be available on ZeroMQ socket for broker to receive reports.
- _monitor_zeromq_port_ -- Port to above.
- _monitor_firewall_zeromq_ -- Open above port in firewall, "yes" or "no".
- _fileserver_addr_ -- Address, where fileserver will serve files.
- _fileserver_port_ -- Port to above.
- _fileserver_firewall_ -- Open above port in firewall, "yes" or "no".
- _fileserver_username_ -- Username for HTTP Authentication for access to the fileserver.
- _fileserver_password_ -- Password for HTTP Authentication for access to the fileserver.
- _worker_cache_dir_ -- File cache storage for workers. Defaults to "/tmp/recodex/cache".
- _worker_cache_age_ -- How long to hold fetched files in the worker cache, in seconds.
- _isolate_version_ -- Git tag of Isolate version worker depends on.
With your computers installed with CentOS and configuration modified it's time to run the installation.
```
$ ansible-playbook -i development recodex.yml
```
This command installs all components of ReCodEx onto machines listed in _development_ file. It's possible to install only specified parts of project, just use component's YAML file instead of _recodex.yml_.
Ansible expects to have password-less access to the remote machines. If you don't have such a setup, use the options `--ask-pass` and `--ask-become-pass`.
One of the most important aspects of a ReCodEx instance is security. It's crucial to keep gathered data safe and not to allow unauthorized users to modify restricted pieces of information. Here is a small list of recommendations to keep a running ReCodEx instance safe.
- Secure MySQL installation. The installation script doesn't do any security actions, so please run at least `mysql_secure_installation` script on database computer.
- Get HTTPS certificate and set it in Apache for web application and API. Monitor should be proxied through the web server too with valid certificate. You can get free DV certificate from [Let's Encrypt](https://letsencrypt.org/). Don't forget to set up automatic renewing!
- Hide broker, workers and fileserver behind a firewall, private subnet or IPsec tunnel. They are not required to be reachable from the public internet, so it's better to keep them isolated.
- Keep your server updated and well configured. For automatic installation of security updates on CentOS system refer to `yum-cron` package. Configure SSH and Apache to use only strong ciphers, some recommendations can be found [here](https://bettercrypto.org/static/applied-crypto-hardening.pdf).
- Don't put actually used credentials on web, for example don't commit your passwords (in Ansible variables file) on GitHub.