diff --git a/Broker.md b/Broker.md index 43285e6..77b95b2 100644 --- a/Broker.md +++ b/Broker.md @@ -70,121 +70,3 @@ forwarded to the frontend. The same goes for external failures. Jobs that fail internally cannot be reassigned, because the "new" broker does not know their headers -- they are reported as failed immediately. -## Installation - -### Dependencies - -Broker has similar basic dependencies as worker, for recapitulation: - -- ZeroMQ in version at least 4.0, packages `zeromq` and `zeromq-devel` (`libzmq3-dev` on Debian) -- YAML-CPP library, `yaml-cpp` and `yaml-cpp-devel` (`libyaml-cpp0.5v5` and `libyaml-cpp-dev` on Debian) -- libcurl library `libcurl-devel` (`libcurl4-gnutls-dev` on Debian) - -### Clone broker source code repository -``` -$ git clone https://github.com/ReCodEx/broker.git -$ git submodule update --init -``` - -### Install broker -It is supposed that your current working directory is that one with clonned worker source codes. - -- Prepare environment running `mkdir build && cd build` -- Build sources by `cmake ..` following by `make` -- Build binary package by `make package` (may require root permissions). -Note that `rpm` and `deb` packages are build in the same time. You may need to have `rpmbuild` command (usually as `rpmbuild` or `rpm` package) or edit CPACK_GENERATOR variable _CMakeLists.txt_ file in root of source code tree. -- Install generated package through your package manager (`yum`, `dnf`, `dpkg`). - -_Note:_ If you do not want to generate binary packages, you can just install the project with `make install` (as root). But installation through your distribution's package manager is preferred way to keep your system clean and manageable in long term horizon. - - -## Configuration and usage -Following text describes how to set up and run broker program. It is supposed to have required binaries installed. Also, using systemd is recommended for best user experience, but it is not required. Almost all modern Linux distributions are using systemd now. - -Installation of broker program does following step to your computer: - -- create config file `/etc/recodex/broker/config.yml` -- create _systemd_ unit file `/etc/systemd/system/recodex-broker.service` -- put main binary to `/usr/bin/recodex-broker` -- create system user and group `recodex` with nologin shell (if not existing) -- create log directory `/var/log/recodex` -- set ownership of config (`/etc/recodex`) and log (`/var/log/recodex`) directories to `recodex` user and group - -### Default broker configuration - -#### Configuration items - -Description of configurable items in broker's config. Mandatory items are bold, optional italic. - -- _clients_ -- specifies address and port to bind for clients (frontend instance) - - _address_ -- hostname or IP address as string (`*` for any) - - _port_ -- desired port -- _workers_ -- specifies address and port to bind for workers - - _address_ -- hostname or IP address as string (`*` for any) - - _port_ -- desired port - - _max_liveness_ -- maximum amount of pings the worker can fail to send before it is considered disconnected - - _max_request_failures_ -- maximum number of times a job can fail (due to e.g. 
worker disconnect or a network error when downloading something from the fileserver) and be assigned again -- _monitor_ -- settings of monitor service connection - - _address_ -- IP address of running monitor service - - _port_ -- desired port -- _notifier_ -- details of connection which is used in case of errors and good to know states - - _address_ -- address where frontend API runs - - _port_ -- desired port - - _username_ -- username which can be used for HTTP authentication - - _password_ -- password which can be used for HTTP authentication -- _logger_ -- settings of logging capabilities - - _file_ -- path to the logging file with name without suffix. `/var/log/recodex/broker` item will produce `broker.log`, `broker.1.log`, ... - - _level_ -- level of logging, one of `off`, `emerg`, `alert`, `critical`, `err`, `warn`, `notice`, `info` and `debug` - - _max-size_ -- maximal size of log file before rotating - - _rotations_ -- number of rotation kept - -#### Example config file - -```{.yml} -# Address and port for clients (frontend) -clients: - address: "*" - port: 9658 # Address and port for workers -workers: - address: "*" - port: 9657 - max_liveness: 10 - max_request_failures: 3 -monitor: - address: "127.0.0.1" - port: 7894 -notifier: - address: "127.0.0.1" - port: 8080 - username: "" - password: "" -logger: - file: "/var/log/recodex/broker" # w/o suffix - actual names will be - # broker.log, broker.1.log, ... - level: "debug" # level of logging - max-size: 1048576 # 1 MB; max size of file before log rotation - rotations: 3 # number of rotations kept -``` - -### Running broker - -Running broker is very similar to the worker setup. There is also provided systemd unit file for convenient usage. There is only one broker per whole ReCodEx solution, so there is no need for systemd templates. - -- Running broker can be done by following command: -``` -# systemctl start recodex-broker.service -``` -Check with -``` -# systemctl status recodex-broker.service -``` -if the broker is running. You should see "active (running)" message. - -- Broker can be stopped or restarted accordigly using `systemctl stop` and `systemctl restart` commands. -- If you want to run broker after system startup, run: -``` -# systemctl enable recodex-broker.service -``` - -For further information about using systemd please refer to systemd documentation. - diff --git a/Coding-style.md b/Coding-style.md deleted file mode 100644 index 20d87eb..0000000 --- a/Coding-style.md +++ /dev/null @@ -1,122 +0,0 @@ -# Coding style - -Every project should have some consistent coding style in which all contributors write. Bellow you can find our conventions on which we agreed on and which we try to keep. - -## C++ - -**NOTE, that C++ projects have set code linter (`cmake-format`) with custom format. To reformat code run `make format` inside `build` directory of the project (probably not working on Windows).** For quick introduction into our format, see following paragraphs. - -In C++ is written worker and broker. Generally it is used underscore style with all small letters. Inspired by [Google C++ style guide](https://google.github.io/styleguide/cppguide.html). If something is not defined than naming/formatting can be arbitrary, but should be similar to bellow-defined behaviour. - -### Naming convention -* For source codes use all lower case with underscores not dashes. Header files should end with `.h` and C++ files with `.cpp`. -* Typenames are all in lower case with underscores between words. 
This is applicable to classes, structs, typedefs, enums and type template parameters. -* Variable names can be divided on local variables and class members. Local variables are all lower case with underscores between words. Class members have in addition trailing underscore on the end (struct data members do not have underscore on the end). -* Constants are just like any other variables and do not have any specifics. -* All function names are again all lower case with underscores between words. -* Namespaces if there are ones they should have lower case and underscores. -* Macros are classical and should have all capitals and underscores. -* Comments can be two types documentational and ordinery ones in code. Documentation should start with `/**` and end with `*/`, convention inside them is javadoc documentation format. Classical comments in code are one liners which starts with `//` and end with the end of the line. - -### Formatting convention -* Line length is not explicitly defined, but should be reasonable. -* All files should use UTF-8 character set. -* For code indentation tabs (`\t`) are used. -* Function declaration/definition: return type should be on the same line as the rest of the declaration, if line is too long, than particular parameters are placed on new line. Opening parenthesis of function should be placed on new line bellow declaration. Its possible to write small function which can be on only one line. Between parameter and comma should be one space. -``` -int run(int id, string msg); - -void print_hello_world() -{ - std::cout << "Hello world" << std::endl; - return; -} - -int get_five() { return 5; } -``` -* Lambda expressions: same formatting as classical functions -``` -auto hello = [](int x) { std::cout << "hello_" << x << std::endl; } -``` -* Function calls: basically same as function header definition. -* Condition: after if, or else there always have to be one space in front of opening bracket and again one space after closing condition bracket (and in front of opening parenthesis). If and else always should be on separate lines. Inside condition there should not be any pointless spaces. -``` -if (x == 5) { - std::cout << "Exactly five!" << std::endl; -} else if (x < 5 && y > 5) { - std::cout << "Whoa, that is weird format!" << std::endl; -} else { - std::cout << "I dont know what is this!" << std::endl; -} -``` -* For and while cycles: basically same rules as for if condition. -* Try-catch blocks: again same rules as for if conditions. Closing parentheses of try block should be on the same line as catch block. -``` -try { - int a = 5 / 0; -} catch (...) { - std::cout << "Division by zero" << std::endl; -} -``` -* Switch: again basics are the same as for if condition. Case statements should not be indented and case body should be intended with 1 tab. -``` -switch (switched) { -case 0: // no tab indent - ... // 1 tab indent - break; -case 1: - ... - break; -default: - exit(1); -} -``` -* Pointers and references: no spaces between period or arrow in accessing type member. No spaces after asterisk or ampersand. In declaration of pointer or reference format should be that asterisk or ampersand is adjacent to name of the variable not type. -``` -number = *ptr; -ptr = &val; -number = ptr->number; -number = val_ref.number; - -int *i; -int &j; - -// bad format bellow -int* i; -int * i; -``` -* Boolean expression: long boolean expression should be divided into more lines. The division point should always be after logical operators. 
-``` -if (i > 10 && - j < 10 && - k > 20) { - std::cout << "Were here!" << std::endl; -} -``` -* Return values should not be generally wrapped with parentheses, only if needed. -* Preprocessor directives start with `#` and always should start at the beginning of the line. -* Classes: sections aka. public, protected, private should have same indentation as the class start itself. Opening parenthesis of class should be on the same line as class name. -``` -class my_class { -public: - void class_function(); -private: - int class_member_; -}; -``` -* Operators: around all binary operators there always should be spaces. -``` -int x = 5; -x = x * 5 / 5; -x = x + 5 * (10 - 5); -``` - -## Python - -Python code should correspond to [PEP 8](https://www.python.org/dev/peps/pep-0008/) style. - -## PHP -TODO: - -## JavaScript -TODO: \ No newline at end of file diff --git a/Fileserver.md b/Fileserver.md index 0c80987..59a88ea 100644 --- a/Fileserver.md +++ b/Fileserver.md @@ -32,87 +32,4 @@ the following subfolders: structure). -## Installation - -To install and use the fileserver, it is necessary to have Python3 with `pip` package manager installed. It is needed to install the dependencies. From clonned repository run the following command: - -``` -$ pip install -r requirements.txt -``` - -That is it. Fileserver does not need any special installation. It is possible to build and install _rpm_ package or install it without packaging the same way as monitor, but it is only optional. The installation would provide you with script `recodex-fileserver` in you `PATH`. No systemd unit files are provided, because of the configuration and usage of fileserver component is much different to our other Python parts. - - -## Configuration and usage - -There are several ways of running the ReCodEx fileserver. We will cover two -typical use cases. - -### Running in development mode - -For simple development usage, it is possible to run the fileserver in the command -line. Allowed options are described below. - -``` -usage: fileserver.py [--directory WORKING_DIRECTORY] - {runserver,shell} ... -``` - -- **runserver** argument starts the Flask development server (i.e. `app.run()`). As additional argument can be given a port number. -- **shell** argument instructs Flask to run a Python shell inside application context. - -Simple development server on port 9999 can be run as - -``` -$ python3 fileserver.py runserver 9999 -``` - -When run like this command, the fileserver creates a temporary directory where it stores all the files and which is deleted when it exits. - - -### Running as WSGI script in a web server - -If you need features such as HTTP authentication (recommended) or efficient serving of static -files, it is recommended to run the app in a full-fledged web server (such as -Apache or Nginx) using WSGI. Apache configuration can be generated by `mkconfig.py` script from the repository. - -``` -usage: mkconfig.py apache [-h] [--port PORT] --working-directory - WORKING_DIRECTORY [--htpasswd HTPASSWD] - [--user USER] -``` - -- **port** -- port where the fileserver should listen -- **working_directory** -- directory where the files should be stored -- **htpasswd** -- path to user file for HTTP Basic Authentication -- **user** -- user under which the server should be run - -### Running using uWSGI - -Another option is to run fileserver as a standalone app via uWSGI service. Setup is also quite simple, configuration file can be also generated by `mkconfig.py` script. - -1. 
(Optional) Create a user for running the fileserver
-2. Make sure that your user can access your clone of the repository
-3. Run `mkconfig.py` script.
-   ```
-   usage: mkconfig.py uwsgi [-h] [--user USER] [--port PORT]
-                            [--socket SOCKET]
-                            --working-directory WORKING_DIRECTORY
-   ```
-
-   - **user** -- user under which the server should be run
-   - **port** -- port where the fileserver should listen
-   - **socket** -- path to UNIX socket where the fileserver should listen
-   - **working_directory** -- directory where the files should be stored
-
-4. Save the configuration file generated by the script and run it with uWSGI,
-   either directly or using systemd. This depends heavily on your distribution.
-5. To integrate this with another web server, see the [uWSGI
-   documentation](http://uwsgi-docs.readthedocs.io/en/latest/WebServers.html)
-
-Note that the ways distributions package uWSGI can vary wildly. In Debian 8 it is
-necessary to convert the configuration file to XML and make sure that the
-python3 plugin is loaded instead of python. This plugin also uses Python 3.4,
-even though the rest of the system uses Python 3.5 - make sure to install
-dependencies for the correct version.
diff --git a/Installation.md b/Installation.md
index 903b617..587ec64 100644
--- a/Installation.md
+++ b/Installation.md
@@ -1,24 +1,59 @@
 # Installation
 
-Installation of whole ReCodEx solution is a very complex process. It is recommended to have good unix skills with basic knowledge of project architecture.
+Installation of the whole ReCodEx solution is a fairly complex process. Good
+Unix skills and basic knowledge of the project architecture are recommended.
 
-There are a lot of different GNU/Linux distributions with different package management, naming convention and version release policies. So it is impossible to cover all of the possible variants. We picked one distribution, which is fully supported by automatic installation script, for others there are brief information about installation in every project component's own chapter.
+There are a lot of different GNU/Linux distributions with different package
+management, naming conventions and release policies, so it is impossible to
+cover all of the possible variants. We picked one distribution that is fully
+supported by the automatic installation script, but there are also steps for
+manual installation of all components which should work on most Linux
+distributions.
 
-Distribution of our choice is CentOS, currently in version 7. It is a well known server distribution, derived from enterprise distrubution from Red Hat, so it is very stable and widely used system with long term support. There are [EPEL](https://fedoraproject.org/wiki/EPEL) additional repositories from Fedora project, which adds newer versions of some packages into CentOS, which allows us to use current environment. Also, _rpm_ packages are much easier to build (for example from Python sources) and maintain.
+The distribution of our choice is CentOS, currently in version 7. It is a well
+known server distribution, derived from the enterprise distribution from Red
+Hat, so it is a very stable and widely used system with long term support.
+There are additional [EPEL](https://fedoraproject.org/wiki/EPEL) repositories
+from the Fedora project, which add newer versions of some packages to CentOS
+and allow us to use a current environment. Also, _rpm_ packages are much easier
+to build than _deb_ packages (for example from Python sources).
-The big rival of CentOS in server distributions field is Debian. We are running one instance of ReCodEx on Debian too. You need to use _testing_ repositories to use some decent package versions. It is easy to mess your system easily, so create file `/etc/apt/apt.conf` with content of `APT::Default-Release "stable";`. After you add testing repos to `/etc/apt/sources.list`, you can install packages from there like `$ sudo apt-get -t testing install gcc`.
+The big rival of CentOS in the server distribution field is Debian. We are
+running one instance of ReCodEx on Debian too. You need to use the _testing_
+repositories to get some decent package versions. It is easy to mess up your
+system this way, so create the file `/etc/apt/apt.conf` with the content
+`APT::Default-Release "stable";`. After you add the testing repos to
+`/etc/apt/sources.list`, you can install packages from there like
+`$ sudo apt-get -t testing install gcc`.
 
-Some components are also capable of running in Windows environment. However setting up Windows OS is a little bit of pain and it is not supposed to run ReCodEx in this way. Only worker component may be needed to run on Windows, so we are providing clickable installer including dependencies. Just for info, all components should be able to run on Windows, only broker was not tested and may require small tweaks to properly work.
+Some components are also capable of running in a Windows environment. However,
+setting up a Windows OS is a bit of a pain and ReCodEx is not supposed to be
+run this way. Only the worker component may need to run on Windows, so we
+provide a clickable installer including dependencies. For the record, all
+components should be able to run on Windows; only the broker was not tested and
+may require small tweaks to work properly.
 
 ## Ansible installer
 
-For automatic installation is used set of Ansible scripts. Ansible is one of the best known and used tools for automatic server management. It is required only to have SSH access to the server and ansible installed on the client machine. For further reading is supposed basic Ansible knowledge. For more info check their [documentation](http://docs.ansible.com/ansible/intro.html).
+Automatic installation is handled by a set of Ansible scripts. Ansible is one
+of the best known and most used tools for automatic server management. All that
+is required is SSH access to the server and Ansible installed on the client
+machine. Basic Ansible knowledge is assumed in the following text. For more
+info check their [documentation](http://docs.ansible.com/ansible/intro.html).
 
-All Ansible scripts are located in _utils_ repository, _installation_ [directory](https://github.com/ReCodEx/utils/tree/master/installation). Ansible files are pretty self-describing, they can be also use as template for installation to different systems. Before installation itself it is required to edit two files -- set addresses of hosts and values of some variables.
+All Ansible scripts are located in the _utils_ repository, in the
+_installation_
+[directory](https://github.com/ReCodEx/utils/tree/master/installation). The
+Ansible files are pretty self-describing and can also be used as a template for
+installation on different systems. Before the installation itself, two files
+have to be edited -- to set the addresses of hosts and the values of some
+variables.
 
 ### Hosts configuration
 
-First, it is needed to set ip addresses of your computers. Common practise is to have multiple files with definitions, one for development, another for production for example. Example configuration is in _development_ file. Each component of ReCodEx project can be installed on different server. Hosts can be specified as hostnames or ip addresses, optionally with port of SSH after colon.
+First, the IP addresses of your computers need to be set. A common practice is
+to have multiple files with definitions, for example one for development and
+another for production. An example configuration is in the _development_ file.
+Each component of the ReCodEx project can be installed on a different server.
+Hosts can be specified as hostnames or IP addresses, optionally with the SSH
+port after a colon.
 
 Shorten example of hosts config:
 
@@ -36,69 +71,748 @@ broker
 
 ### Variables
 
-Configurable variables are saved in _group_vars/all.yml_ file. Syntax is basic key-value pair per line, separated by colon. Values with brief description:
+Configurable variables are saved in the _group_vars/all.yml_ file. The syntax
+is one key-value pair per line, separated by a colon. The values, with a brief
+description:
 
-- _source_dir_ -- Directory, where to store all sources from GitHub. Defaults `/opt/recodex`.
-- _mysql_root_password_ -- Password of root user of MySQL database. Will be set after installation and saved to `/root/.my.cnf` file.
+- _source_dir_ -- Directory where all sources from GitHub are stored. Defaults
+  to `/opt/recodex`.
+- _mysql_root_password_ -- Password of the root user of the MySQL database.
+  Will be set after installation and saved to the `/root/.my.cnf` file.
 - _mysql_recodex_username_ -- MySQL username for ReCodEx API access.
 - _mysql_recodex_password_ -- Password for the user above.
-- _admin_email_ -- Email of administrator. Used when configuring Apache webserver.
-- _recodex_hostname_ -- Hostname where the API and web app will be accessible. For example "recodex.projekty.ms.mff.cuni.cz".
-- _webapp_node_addr_ -- IP address of NodeJS server running web app. Defaults to "127.0.0.1" and should not be changed.
+- _admin_email_ -- Email of the administrator. Used when configuring the Apache
+  webserver.
+- _recodex_hostname_ -- Hostname where the API and web app will be accessible.
+  For example "recodex.projekty.ms.mff.cuni.cz".
+- _webapp_node_addr_ -- IP address of the NodeJS server running the web app.
+  Defaults to "127.0.0.1" and should not be changed.
 - _webapp_node_port_ -- Port to above.
-- _webapp_public_addr_ -- Public address, where web server for web app will listen. Defaults to "*".
+- _webapp_public_addr_ -- Public address where the web server for the web app
+  will listen. Defaults to "*".
 - _webapp_public_port_ -- Port to above.
 - _webapp_firewall_ -- Open port for web app in firewall, values "yes" or "no".
-- _webapi_public_endpoint_ -- Public URL when the API will be running, for example "https://recodex.projekty.ms.mff.cuni.cz:4000/v1".
-- _webapi_public_addr_ -- Public address, where web server for API will listen. Defaults to "*".
+- _webapi_public_endpoint_ -- Public URL where the API will be running, for
+  example "https://recodex.projekty.ms.mff.cuni.cz:4000/v1".
+- _webapi_public_addr_ -- Public address where the web server for the API will
+  listen. Defaults to "*".
 - _webapi_public_port_ -- Port to above.
 - _webapi_firewall_ -- Open port for API in firewall, values "yes" or "no".
-- _database_firewall_ -- Open port for database in firewall, values "yes" or "no".
-- _broker_to_webapi_addr_ -- Address, where API can reach broker. Private one is recommended.
+- _database_firewall_ -- Open port for database in firewall, values "yes" or
+  "no".
+- _broker_to_webapi_addr_ -- Address where the API can reach the broker. A
+  private one is recommended.
 - _broker_to_webapi_port_ -- Port to above.
 - _broker_firewall_api_ -- Open above port in firewall, "yes" or "no".
-- _broker_to_workers_addr_ -- Address, where workers can reach broker. Private one is recommended.
+- _broker_to_workers_addr_ -- Address where the workers can reach the broker. A
+  private one is recommended.
 - _broker_to_workers_port_ -- Port to above.
 - _broker_firewall_workers_ -- Open above port in firewall, "yes" or "no".
-- _broker_notifier_address_ -- URL (on API), where broker will send notifications, for example "https://recodex.projekty.ms.mff.cuni.cz/v1/broker-reports".
-- _broker_notifier_port_ -- Port to above, should be the same as for API itself (_webapi_public_port_)
+- _broker_notifier_address_ -- URL (on the API) where the broker will send
+  notifications, for example
+  "https://recodex.projekty.ms.mff.cuni.cz/v1/broker-reports".
+- _broker_notifier_port_ -- Port to above, should be the same as for the API
+  itself (_webapi_public_port_)
 - _broker_notifier_username_ -- Username for HTTP Authentication for reports
-- _broker_notifier_password_ -- Password for HTTP Authentication for reporst
+- _broker_notifier_password_ -- Password for HTTP Authentication for reports
-- _monitor_websocket_addr_ -- Address, where websocket connection from monitor will be available
+- _monitor_websocket_addr_ -- Address where the websocket connection from the
+  monitor will be available
 - _monitor_websocket_port_ -- Port to above.
 - _monitor_firewall_websocket_ -- Open above port in firewall, "yes" or "no".
-- _monitor_zeromq_addr_ -- Address, where monitor will be available on ZeroMQ socket for broker to receive reports.
+- _monitor_zeromq_addr_ -- Address where the monitor will be available on a
+  ZeroMQ socket to receive reports from the broker.
 - _monitor_zeromq_port_ -- Port to above.
 - _monitor_firewall_zeromq_ -- Open above port in firewall, "yes" or "no".
 - _fileserver_addr_ -- Address, where fileserver will serve files.
 - _fileserver_port_ -- Port to above.
 - _fileserver_firewall_ -- Open above port in firewall, "yes" or "no".
-- _fileserver_username_ -- Username for HTTP Authentication for access the fileserver.
-- _fileserver_password_ -- Password for HTTP Authentication for access the fileserver.
-- _worker_cache_dir_ -- File cache storage for workers. Defaults to "/tmp/recodex/cache".
+- _fileserver_username_ -- Username for HTTP Authentication for accessing the
+  fileserver.
+- _fileserver_password_ -- Password for HTTP Authentication for accessing the
+  fileserver.
+- _worker_cache_dir_ -- File cache storage for workers. Defaults to
+  "/tmp/recodex/cache".
-- _worker_cache_age_ -- How long hold fetched files in worker cache, in seconds.
+- _worker_cache_age_ -- How long fetched files are kept in the worker cache, in
+  seconds.
 - _isolate_version_ -- Git tag of Isolate version worker depends on.
 
 ### Installation itself
 
-With your computers installed with CentOS and configuration modified it is time to run the installation.
+With CentOS installed on your machines and the configuration modified, it is
+time to run the installation.
 
 ```
 $ ansible-playbook -i development recodex.yml
 ```
 
-This command installs all components of ReCodEx onto machines listed in _development_ file. It is possible to install only specified parts of project, just use component's YAML file instead of _recodex.yml_.
+This command installs all components of ReCodEx onto the machines listed in the
+_development_ file. It is possible to install only selected parts of the
+project; just use the component's YAML file instead of _recodex.yml_.
 
-Ansible expects to have password-less access to the remote machines. If you have not such setup, use options `--ask-pass` and `--ask-become-pass`.
+Ansible expects to have password-less access to the remote machines. If you do
+not have such a setup, use the options `--ask-pass` and `--ask-become-pass`.
+
+
+## Manual installation
+
+### Worker
+
+#### Dependencies
+
+Worker-specific requirements are described in this section. It covers only the
+basic requirements; additional runtimes or tools may be needed depending on the
+type of use. The package names are for CentOS unless specified otherwise.
+
+- ZeroMQ in version at least 4.0, packages `zeromq` and `zeromq-devel`
+  (`libzmq3-dev` on Debian)
+- YAML-CPP library, `yaml-cpp` and `yaml-cpp-devel` (`libyaml-cpp0.5v5` and
+  `libyaml-cpp-dev` on Debian)
+- libcurl library `libcurl-devel` (`libcurl4-gnutls-dev` on Debian)
+- libarchive library as an optional dependency. Installing it will speed up the
+  build process; otherwise libarchive is built from source during installation.
+  The package names are `libarchive` and `libarchive-devel` (`libarchive-dev`
+  on Debian)
+
+**Isolate** (only for Linux installations)
+
+First, we need to compile the Isolate sandbox from source and install it. The
+current worker is tested against version 1.3, so this version needs to be
+checked out. Assume that we keep source code in the `/opt/src` directory. For
+building the man page you need to have the `asciidoc` package installed.
+
+```
+$ cd /opt/src
+$ git clone https://github.com/ioi/isolate.git
+$ cd isolate
+$ git checkout v1.3
+$ make
+# make install && make install-doc
+```
+
+To work properly, Isolate depends on several advanced features of the Linux
+kernel. Make sure that your kernel is compiled with `CONFIG_PID_NS`,
+`CONFIG_IPC_NS`, `CONFIG_NET_NS`, `CONFIG_CPUSETS`, `CONFIG_CGROUP_CPUACCT` and
+`CONFIG_MEMCG`. If your machine has swap enabled, also check
+`CONFIG_MEMCG_SWAP`. The flags your kernel was compiled with can be found in
+the `/boot` directory, in the file `config-` followed by your kernel version.
+Red Hat based distributions should have these enabled by default; for Debian
+you may want to add the parameters `cgroup_enable=memory swapaccount=1` to the
+kernel command line, which can be done by adding them to the value of
+`GRUB_CMDLINE_LINUX_DEFAULT` in the `/etc/default/grub` file.
+
+For better reproducibility of results, some kernel parameters can be tweaked:
+
+- Disable address space randomization. Create the file
+  `/etc/sysctl.d/10-recodex.conf` with the content
+  `kernel.randomize_va_space=0`. The change will take effect after a restart,
+  or run the `sysctl kernel.randomize_va_space=0` command.
+- Disable dynamic CPU frequency scaling. This requires setting the cpufreq
+  scaling governor to _performance_.
+
+#### Clone worker source code repository
+
+```
+$ git clone https://github.com/ReCodEx/worker.git
+$ git submodule update --init
+```
+
+#### Install worker on Linux
+
+It is supposed that your current working directory is the one with the cloned
+worker sources.
+
+- Prepare the environment by running `mkdir build && cd build`
+- Build the sources with `cmake ..` followed by `make`
+- Build a binary package with `make package` (may require root permissions).
+  Note that `rpm` and `deb` packages are built at the same time. You may need
+  to have the `rpmbuild` command available (usually in the `rpmbuild` or `rpm`
+  package) or edit the CPACK_GENERATOR variable in the _CMakeLists.txt_ file in
+  the root of the source tree.
+- Install the generated package through your package manager (`yum`, `dnf`,
+  `dpkg`); see the consolidated session below.
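+
+Put together, a typical build-and-package session might look like this (a
+sketch only; the checkout path and the generated package file name are
+assumptions, not fixed by the build system):
+
+```
+$ cd /opt/src/worker          # assuming the repository was cloned here
+$ mkdir build && cd build
+$ cmake ..
+$ make
+$ make package                # may require root permissions
+# yum install ./recodex-worker-*.rpm
+```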
+
+The worker installation process is composed of the following steps:
+
+- create the config file `/etc/recodex/worker/config-1.yml`
+- create the systemd unit file `/etc/systemd/system/recodex-worker@.service`
+- put the main binary into `/usr/bin/recodex-worker`
+- put the judges binaries into the `/usr/bin/` directory
+- create the system user and group `recodex` with `/sbin/nologin` shell (if not
+  already existing)
+- create the log directory `/var/log/recodex`
+- set the ownership of the config (`/etc/recodex`) and log (`/var/log/recodex`)
+  directories to the `recodex` user and group
+
+_Note:_ If you do not want to generate binary packages, you can just install
+the project with `make install` (as root). But installation through your
+distribution's package manager is the preferred way to keep your system clean
+and manageable in the long term.
+
+#### Install worker on Windows
+
+There are basically two main dependencies: **Windows 7** or higher and
+**Visual Studio 2015+**. The provided installation batch script should do all
+the work on a Windows machine. Officially, only VS2015 and 32-bit compilation
+are supported because of hardcoded compile options in the installation script.
+If a different VS version or a different platform is needed, the script should
+be changed to the appropriate values.
+
+The mentioned script is placed in the *install* directory alongside supporting
+scripts for UNIX systems and is named *win-build.cmd*. It will do almost all
+the work connected with building and dependency resolution (using the **NuGet**
+package manager and the `msbuild` build system). The script should be run from
+the *install* directory under the 32-bit version of the _Developer Command
+Prompt for VS2015_.
+
+Building and installing the worker is then quite simple; the script has command
+line parameters which can be used to specify what will be done:
+
+- *-build* -- The default option if none is specified. Builds the worker and
+  its tests; everything is saved in the *build* folder and its subfolders.
+- *-clean* -- Cleanup of downloaded NuGet packages and built
+  applications/libraries.
+- *-test* -- Builds the worker and runs tests on the compiled test cases.
+- *-package* -- Generates a clickable installer using cpack and
+  [NSIS](http://nsis.sourceforge.net/) (which has to be installed on the
+  machine for this to work).
+
+```
+install> win-build.cmd # same as: win-build.cmd -build
+install> win-build.cmd -clean
+install> win-build.cmd -test
+install> win-build.cmd -package
+```
+
+All built binaries and CMake temporary files can be found in the *build*
+folder; typically there will be a *Release* subfolder containing the compiled
+application with all the needed DLLs. Once the clickable installer is created,
+it can be found in the *build* folder under the name
+*recodex-worker-VERSION-win32.exe*. A sample screenshot is shown in the
+following picture.
+
+![NSIS Installation](https://github.com/ReCodEx/wiki/blob/master/images/nsis_installation.png)
+
+#### Usage
+
+A systemd unit file is distributed with the worker to simplify its launch. It
+integrates the worker nicely into your Linux system and allows you to run it
+automatically on system startup. It is possible to have more than one worker on
+a single server, so the provided unit file is templated. Each instance of the
+worker unit has a unique string identifier, which is used for managing that
+instance through systemd. By default, only one worker instance is ready to use
+after installation and its ID is "1".
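+
+For illustration, the templating works roughly like the following unit file
+skeleton. This is a hedged sketch, not the unit file actually shipped (the `-c`
+option in particular is an assumption); systemd substitutes the instance ID for
+`%i`, which is how `recodex-worker@1.service` ends up using `config-1.yml`:
+
+```
+[Unit]
+Description=ReCodEx worker %i
+
+[Service]
+User=recodex
+# %i expands to the instance ID and selects the matching config file
+ExecStart=/usr/bin/recodex-worker -c /etc/recodex/worker/config-%i.yml
+
+[Install]
+WantedBy=multi-user.target
+```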
+
+- Starting the worker with ID "1" can be done this way:
+```
+# systemctl start recodex-worker@1.service
+```
+Check with
+```
+# systemctl status recodex-worker@1.service
+```
+whether the worker is running. You should see an "active (running)" message.
+
+- The worker can be stopped or restarted accordingly using `systemctl stop` and
+  `systemctl restart` commands.
+- If you want to run the worker after system startup, run:
+```
+# systemctl enable recodex-worker@1.service
+```
+For further information about using systemd please refer to the systemd
+documentation.
+
+##### Adding new worker
+
+To add a new worker you need to do a few steps:
+
+- Make up a unique string ID.
+- Copy the default configuration file `/etc/recodex/worker/config-1.yml` to the
+  same directory and name it `config-<id>.yml`.
+- Edit that config file to fit your needs. Note that you must at least change
+  the _worker-id_ and _logger file_ values to be unique.
+- Run the new instance using
+```
+# systemctl start recodex-worker@<id>.service
+```
+
+### Broker
+
+#### Dependencies
+
+The broker has similar basic dependencies as the worker; to recapitulate:
+
+- ZeroMQ in version at least 4.0, packages `zeromq` and `zeromq-devel`
+  (`libzmq3-dev` on Debian)
+- YAML-CPP library, `yaml-cpp` and `yaml-cpp-devel` (`libyaml-cpp0.5v5` and
+  `libyaml-cpp-dev` on Debian)
+- libcurl library `libcurl-devel` (`libcurl4-gnutls-dev` on Debian)
+
+#### Clone broker source code repository
+
+```
+$ git clone https://github.com/ReCodEx/broker.git
+$ git submodule update --init
+```
+
+#### Installation itself
+
+Installation of the broker program performs the following steps on your
+computer:
+
+- create the config file `/etc/recodex/broker/config.yml`
+- create the _systemd_ unit file `/etc/systemd/system/recodex-broker.service`
+- put the main binary into `/usr/bin/recodex-broker`
+- create the system user and group `recodex` with nologin shell (if not
+  existing)
+- create the log directory `/var/log/recodex`
+- set the ownership of the config (`/etc/recodex`) and log (`/var/log/recodex`)
+  directories to the `recodex` user and group
+
+It is supposed that your current working directory is the one with the cloned
+broker sources.
+
+- Prepare the environment by running `mkdir build && cd build`
+- Build the sources with `cmake ..` followed by `make`
+- Build a binary package with `make package` (may require root permissions).
+  Note that `rpm` and `deb` packages are built at the same time. You may need
+  to have the `rpmbuild` command available (usually in the `rpmbuild` or `rpm`
+  package) or edit the CPACK_GENERATOR variable in the _CMakeLists.txt_ file in
+  the root of the source tree.
+- Install the generated package through your package manager (`yum`, `dnf`,
+  `dpkg`).
+
+_Note:_ If you do not want to generate binary packages, you can just install
+the project with `make install` (as root). But installation through your
+distribution's package manager is the preferred way to keep your system clean
+and manageable in the long term.
+
+#### Usage
+
+Running the broker is very similar to the worker setup. A systemd unit file is
+also provided for convenient usage. There is only one broker per whole ReCodEx
+solution, so there is no need for systemd templates.
+
+- The broker can be started with the following command:
+```
+# systemctl start recodex-broker.service
+```
+Check with
+```
+# systemctl status recodex-broker.service
+```
+whether the broker is running. You should see an "active (running)" message.
+
+- The broker can be stopped or restarted accordingly using `systemctl stop` and
+  `systemctl restart` commands.
+
+- If you want to run the broker after system startup, run:
+```
+# systemctl enable recodex-broker.service
+```
+
+For further information about using systemd please refer to the systemd
+documentation.
+
+### Fileserver
+
+To install and use the fileserver, it is necessary to have Python3 with the
+`pip` package manager installed, which is needed to install the dependencies.
+From the cloned repository run the following command:
+
+```
+$ pip install -r requirements.txt
+```
+
+That is it. The fileserver does not need any special installation. It is
+possible to build and install an _rpm_ package, or to install it without
+packaging the same way as the monitor, but this is only optional. The
+installation would provide you with the script `recodex-fileserver` in your
+`PATH`. No systemd unit files are provided, because the configuration and usage
+of the fileserver component differ considerably from our other Python parts.
+
+#### Usage
+
+There are several ways of running the ReCodEx fileserver. We will cover three
+typical use cases.
+
+##### Running in development mode
+
+For simple development usage, it is possible to run the fileserver from the
+command line. The allowed options are described below.
+
+```
+usage: fileserver.py [--directory WORKING_DIRECTORY]
+                     {runserver,shell} ...
+```
+
+- the **runserver** argument starts the Flask development server (i.e.
+  `app.run()`). A port number can be given as an additional argument.
+- the **shell** argument instructs Flask to run a Python shell inside the
+  application context.
+
+A simple development server on port 9999 can be run as
+
+```
+$ python3 fileserver.py runserver 9999
+```
+
+When run like this, the fileserver creates a temporary directory where it
+stores all the files, which is deleted when it exits.
+
+##### Running as WSGI script in a web server
+
+If you need features such as HTTP authentication (recommended) or efficient
+serving of static files, it is recommended to run the app in a full-fledged web
+server (such as Apache or Nginx) using WSGI. An Apache configuration can be
+generated by the `mkconfig.py` script from the repository.
+
+```
+usage: mkconfig.py apache [-h] [--port PORT] --working-directory
+                          WORKING_DIRECTORY [--htpasswd HTPASSWD]
+                          [--user USER]
+```
+
+- **port** -- port where the fileserver should listen
+- **working_directory** -- directory where the files should be stored
+- **htpasswd** -- path to user file for HTTP Basic Authentication
+- **user** -- user under which the server should be run
+
+##### Running using uWSGI
+
+Another option is to run the fileserver as a standalone app via the uWSGI
+service. The setup is also quite simple; the configuration file can again be
+generated by the `mkconfig.py` script.
+
+1. (Optional) Create a user for running the fileserver
+2. Make sure that your user can access your clone of the repository
+3. Run the `mkconfig.py` script.
+   ```
+   usage: mkconfig.py uwsgi [-h] [--user USER] [--port PORT]
+                            [--socket SOCKET]
+                            --working-directory WORKING_DIRECTORY
+   ```
+
+   - **user** -- user under which the server should be run
+   - **port** -- port where the fileserver should listen
+   - **socket** -- path to UNIX socket where the fileserver should listen
+   - **working_directory** -- directory where the files should be stored
+
+4. Save the configuration file generated by the script and run it with uWSGI,
+   either directly or using systemd. This depends heavily on your distribution.
+5. To integrate this with another web server, see the [uWSGI
+   documentation](http://uwsgi-docs.readthedocs.io/en/latest/WebServers.html)
+
+Note that the ways distributions package uWSGI can vary wildly. In Debian 8 it
+is necessary to convert the configuration file to XML and make sure that the
+python3 plugin is loaded instead of python. This plugin also uses Python 3.4,
+even though the rest of the system uses Python 3.5 -- make sure to install
+dependencies for the correct version.
+
+### Monitor
+
+The monitor requires some packages to function. All of them are listed in the
+_requirements.txt_ file in the repository and can be installed by the `pip`
+package manager as
+
+```
+$ pip install -r requirements.txt
+```
+
+**Description of dependencies:**
+
+- zmq -- binding to the ZeroMQ framework
+- websockets -- framework for communication over WebSockets
+- asyncio -- library for fast asynchronous operations
+- pyyaml -- parsing YAML configuration files
+- argparse -- parsing command line arguments
+
+Installation will provide you with the following files:
+
+- `/usr/bin/recodex-monitor` -- simple startup script located in PATH
+- `/etc/recodex/monitor/config.yml` -- configuration file
+- `/etc/systemd/system/recodex-monitor.service` -- systemd startup script
+- code files will be installed in a location depending on your system settings,
+  mostly into `/usr/lib/python3.5/site-packages/monitor/` or similar
+
+The systemd script runs the monitor binary as the dedicated _recodex_ user, so
+a user and group of this name are created in the `postinst` script. Also,
+ownership of the configuration file will be granted to that user.
+
+- RPM distributions can build and install a binary package. This can be done
+  like this:
+  - run the command
+    ```
+    $ python3 setup.py bdist_rpm --post-install ./install/postinst
+    ```
+    to generate a binary `.rpm` package, or download a precompiled one from the
+    releases tab of the monitor GitHub repository (it is an architecture
+    independent package)
+  - install the package using
+    ```
+    # yum install ./dist/recodex-monitor-<version>-1.noarch.rpm
+    ```
+- Other Linux distributions can install the monitor directly
+  ```
+  $ python3 setup.py install --install-scripts /usr/bin
+  # ./install/postinst
+  ```
+
+#### Usage
+
+The preferred way to start the monitor as a service is via systemd, as with the
+other parts of the ReCodEx solution.
+
+- Running the monitor is fairly simple:
+```
+# systemctl start recodex-monitor.service
+```
+- The current state can be obtained by
+```
+# systemctl status recodex-monitor.service
+```
+You should see a green **Active (running)**.
+- Setting up the monitor to be started on system startup:
+```
+# systemctl enable recodex-monitor.service
+```
+
+Alternatively the monitor can be started directly from the command line by
+specifying the path to the configuration file. Note that this command will not
+start the monitor as a daemon.
+
+```
+$ recodex-monitor -c /etc/recodex/monitor/config.yml
+```
+
+
+### Cleaner
+
+To install and use the cleaner, it is necessary to have Python3 with the
+package manager `pip` installed.
+
+- The cleaner's dependencies have to be installed:
+```
+$ pip install -r requirements.txt
+```
+- RPM distributions can build and install a binary package. This can be done
+  like this:
+```
+$ python setup.py bdist_rpm --post-install ./cleaner/install/postinst
+```
+- Installing the generated package using YUM:
+```
+# yum install ./dist/recodex-cleaner-<version>-1.noarch.rpm
+```
+- Other Linux distributions can install the cleaner directly
+```
+$ python setup.py install --install-scripts /usr/bin
+# ./cleaner/install/postinst
+```
+- For Windows installation do the following:
+  - start `cmd` with administrator permissions
+  - run the installation with
+    ```
+    > python setup.py install --install-scripts \
+      "C:\Program Files\ReCodEx\cleaner"
+    ```
+    where the path specified with `--install-scripts` can be changed
+  - copy the configuration file alongside the installed executable using
+    ```
+    > copy install\config.yml \
+      "C:\Program Files\ReCodEx\cleaner\config.yml"
+    ```
+
+#### Usage
+
+As stated before, the cleaner should be run periodically. On Linux systems this
+can be done by the built-in `cron` service or, if `systemd` is present, by the
+`*.timer` file the cleaner itself provides. On Windows systems the internal
+scheduler should be used.
+
+- Running the cleaner from the command line is fairly simple:
+```
+$ recodex-cleaner -c /etc/recodex/cleaner
+```
+- Enable the cleaner service using systemd:
+```
+$ systemctl start recodex-cleaner.timer
+```
+- Add the cleaner to the Linux cron service using the following configuration
+  line:
+```
+0 0 * * * /usr/bin/recodex-cleaner -c /etc/recodex/cleaner/config.yml
+```
+- Add the cleaner to the Windows scheduler service with the following command:
+```
+> schtasks /create /sc daily /tn "ReCodEx Cleaner" /tr \
+  "\"C:\Program Files\ReCodEx\cleaner\recodex-cleaner.exe\" \
+  -c \"C:\Program Files\ReCodEx\cleaner\config.yml\""
+```
+
+### REST API
+
+The web API requires a PHP runtime, version at least 7. Which one to use
+depends on the actual configuration; there is a choice between _mod_php_ inside
+Apache, _php-fpm_ with Apache or Nginx as a proxy, or running it as a
+standalone uWSGI script. It is common that some PHP extensions have to be
+installed on the system. Namely the ZeroMQ binding (`php-zmq` package or
+similar), the MySQL module (`php-mysqlnd` package) and the LDAP extension
+module for CAS authentication (`php-ldap` package). Make sure that the
+extensions are loaded in your `php.ini` file (`/etc/php.ini` or files in
+`/etc/php.d/`).
+
+The API depends on some other projects and libraries. They are managed with
+[Composer](https://getcomposer.org/), which can be installed from system
+repositories or downloaded from the website, where detailed instructions can be
+found as well. Composer reads the `composer.json` file in the project root and
+installs the dependencies into the `vendor/` subdirectory. To do that, run:
+
+```
+$ composer install
+```
+
+#### Database preparation
+
+When the API is installed and configured (the _doctrine_ section is sufficient
+here), the database schema can be generated. There is a prepared command to do
+that from the command line:
+
+```
+$ php www/index.php orm:schema-tool:update --force
+```
+
+The API comes with some initial values, for example default user roles with
+proper permissions. To fill your database with these values there is another
+command line command:
+
+```
+$ php www/index.php db:fill
+```
+
+Check the outputs of both commands for errors. If there are any, try to clear
+the temporary API cache in the `temp/cache/` directory and repeat the action.
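+
+To double-check that the generated schema matches the entity mappings, Doctrine
+also provides a validation command through its console. Assuming it is exposed
+via the same entry point as the commands above (an assumption, not confirmed
+here), the call would be:
+
+```
+$ php www/index.php orm:validate-schema
+```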
+
+#### Webserver configuration
+
+The simplest way to get started is to start the built-in PHP server in the root
+directory of your project:
+
+```
+$ php -S localhost:4000 -t www
+```
+
+Then visit `http://localhost:4000` in your browser to see the welcome page of
+the API project.
+
+For Apache or Nginx, set up a virtual host to point to the `www/` directory of
+the project and you should be ready to go. It is **critical** that the whole
+`app/`, `log/` and `temp/` directories are not accessible directly via a web
+browser (see [security warning](https://nette.org/security-warning)). Also it
+is **highly recommended** to set up an HTTPS certificate for public access to
+the API.
+
+#### Troubleshooting
+
+In case of any issues, first remove the Nette cache directory `temp/cache/` and
+try again. This solves most of the errors. If it does not help, examine the API
+logs in the `log/` directory of the API source, or the logs of your webserver.
+
+### Web application
+
+The web application requires the [NodeJS](https://nodejs.org/en/) server as its
+runtime environment. This runtime is needed for executing JavaScript code on
+the server and sending pre-rendered parts of pages to clients, so the final
+rendering in browsers is a lot quicker and the page is accessible to search
+engines for indexing.
+
+Some functionality is however handled better by full-fledged web servers like
+*Apache* or *Nginx*, so the common practice is to use both in tandem. *NodeJS*
+takes care of the basic functionality of the app while the other server
+(Apache) is set up as a reverse proxy providing additional functionality like
+SSL encryption, load balancing or caching of static files. The recommended
+setup contains both NodeJS and one of the Apache and Nginx web servers for the
+reasons discussed above.
+
+Stable versions of the 4th and 6th series of the NodeJS server are sufficient;
+using at least the 6th series is highly recommended. Please check the most
+recent version of the packages in your distribution's repositories, as they are
+often outdated. However, there are third party repositories for all the main
+Linux distributions.
+
+The app depends on several libraries and components, all of which are listed in
+the `package.json` file in the source repository. Dependencies are managed with
+the node package manager (`npm`), which may come with the NodeJS installation
+or can be installed separately. To fetch and install all dependencies run:
+
+```
+$ npm install
+```
+
+For easy production usage there is an additional package for managing NodeJS
+processes, `pm2`. This tool can run your application as a daemon, monitor
+occupied resources, gather logs and provide a simple console interface for
+managing the app's state. To install it globally into your system run:
+
+```
+# npm install pm2 -g
+```
+
+#### Usage
+
+The application can be run in two modes, development and production.
+Development mode uses only client-side rendering and tracks code changes with
+rebuilds of the application in real time. In production mode the compilation
+(transpiling to the _ES5_ standard using *Babel* and bundling into a single
+file using *webpack*) has to be done separately, prior to running. The scripts
+for compilation are provided as additional `npm` commands.
+
+- Development mode can be used for local testing of the app. This mode uses the
+  webpack dev server, so all code runs on the client; there is no server side
+  rendering available. Starting it is a simple command and the default address
+  is http://localhost:8080.
+```
+$ npm run dev
+```
+- Production mode is mostly used on servers. It provides all features, such as
+  server side rendering, and can be run via:
+```
+$ npm run build
+$ npm start
+```
+
+Both modes can be configured to use different ports or to set the base address
+of the API server in use. This can be configured in the `.env` file in the root
+of the repository. There is an `.env-sample` file which can simply be copied
+and altered.
+
+The production mode can also be run as a daemon controlled by the `pm2` tool.
+First the web application has to be built, and then the server JavaScript file
+can be run as a daemon.
+
+```
+$ npm run build
+$ pm2 start bin/server.js
+```
+
+The `pm2` tool has several options, most notably _status_, _stop_, _restart_
+and _logs_. A further description is available on the project
+[website](http://pm2.keymetrics.io).
 
 ## Security
 
-One of the most important aspects of ReCodEx instance is security. It is crucial to keep gathered data safe and not to allow unauthorized users modify restricted pieces of information. Here is a small list of recommendations to keep running ReCodEx instance safe.
+One of the most important aspects of a ReCodEx instance is security. It is
+crucial to keep the gathered data safe and not to allow unauthorized users to
+modify restricted pieces of information. Here is a small list of
+recommendations to keep a running ReCodEx instance safe.
 
-- Secure MySQL installation. The installation script does not do any security actions, so please run at least `mysql_secure_installation` script on database computer.
-- Get HTTPS certificate and set it in Apache for web application and API. Monitor should be proxied through the web server too with valid certificate. You can get free DV certificate from [Let's Encrypt](https://letsencrypt.org/). Do not forget to set up automatic renewing!
-- Hide broker, workers and fileserver behind firewall, private subnet or IPsec tunnel. They are not required to be reached from public internet, so it is better keep them isolated.
-- Keep your server updated and well configured. For automatic installation of security updates on CentOS system refer to `yum-cron` package. Configure SSH and Apache to use only strong ciphers, some recommendations can be found [here](https://bettercrypto.org/static/applied-crypto-hardening.pdf).
-- Do not put actually used credentials on web, for example do not commit your passwords (in Ansible variables file) on GitHub.
+- Secure the MySQL installation. The installation script does not take any
+  security actions, so please run at least the `mysql_secure_installation`
+  script on the database computer.
+- Get an HTTPS certificate and set it up in Apache for the web application and
+  the API. The monitor should be proxied through the web server too, with a
+  valid certificate. You can get a free DV certificate from [Let's
+  Encrypt](https://letsencrypt.org/). Do not forget to set up automatic
+  renewal!
+- Hide the broker, workers and fileserver behind a firewall, a private subnet
+  or an IPsec tunnel. They are not required to be reachable from the public
+  internet, so it is better to keep them isolated (see the firewall sketch
+  below).
+- Keep your server updated and well configured. For automatic installation of
+  security updates on CentOS systems, refer to the `yum-cron` package.
+  Configure SSH and Apache to use only strong ciphers; some recommendations can
+  be found [here](https://bettercrypto.org/static/applied-crypto-hardening.pdf).
+- Do not put actually used credentials on the web; for example, do not commit
+  your passwords (in the Ansible variables file) to GitHub.
 - Regularly check logs for anomalies.
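+
+As a concrete sketch of the isolation recommendation above: on CentOS 7 with
+firewalld, the broker's worker port could be restricted to a private subnet
+like this (port 9657 comes from the example broker configuration; the subnet
+address is an assumption for illustration):
+
+```
+# firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="10.0.0.0/24" port port="9657" protocol="tcp" accept'
+# firewall-cmd --reload
+```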
+ + + + diff --git a/Monitor.md b/Monitor.md index 198a319..58f3e37 100644 --- a/Monitor.md +++ b/Monitor.md @@ -11,18 +11,6 @@ Monitor is needed one per broker, that is one per separate ReCodEx instance. Als Monitor is written in Python, tested versions are 3.4 and 3.5. This language was chosen because it is already in project requirements (fileserver) and there are great libraries for ZeroMQ, WebSockets and asynchronous operations. This library saves system resources and provides us great amount of processed messages. Also, coding in Python was pretty simple and saves us time for improving the other parts of ReCodEx. -For monitor functionality there are some required packages. All of them are listed in _requirements.txt_ file in the repository and can be installed by `pip` package manager as -``` -$ pip install -r requirements.txt -``` - -**Description of dependencies:** - -- zmq -- binding to ZeroMQ framework -- websockets -- framework for communication over WebSockets -- asyncio -- library for fast asynchronous operations -- pyyaml -- parsing YAML configuration files -- argparse -- parsing command line arguments ### Message flow @@ -40,94 +28,3 @@ There can be multiple receivers to one channel id. Each one has separate _asynci Messages from client's queue are sent through corresponding WebSocket connection via main event loop as soon as possible. This approach with separate queue per connection is easy to implement and guarantees reliability and order of message delivery. - -## Installation - -Installation will provide you following files: - -- `/usr/bin/recodex-monitor` -- simple startup script located in PATH -- `/etc/recodex/monitor/config.yml` -- configuration file -- `/etc/systemd/system/recodex-monitor.service` -- systemd startup script -- code files will be installed in location depending on your system settings, mostly into `/usr/lib/python3.5/site-packages/monitor/` or similar - -Systemd script runs monitor binary as specific _recodex_ user, so in `postinst` script user and group of this name are created. Also, ownership of configuration file will be granted to that user. - -- RPM distributions can make and install binary package. This can be done like this: - - run command - ``` - $ python3 setup.py bdist_rpm --post-install ./install/postints - ``` - to generate binary `.rpm` package or download precompiled one from releases tab of monitor GitHub repository (it is architecture independent package) - - install package using - ``` - # yum install ./dist/recodex-monitor--1.noarch.rpm - ``` -- Other Linux distributions can install cleaner straight - ``` - $ python3 setup.py install --install-scripts /usr/bin - # ./install/postinst - ``` - -## Configuration and usage - -### Configuration -Configuration file is located in subdirectory `monitor` of standard ReCodEx configuration folder `/etc/recodex/`. It is in YAML format as all of the other configurations. Format is very similar to configurations of broker or workers. - -### Configuration items - -Description of configurable items, bold ones are required, italics ones are optional. - -- _websocket_uri_ -- URI where is the endpoint of websocket connection. Must be visible to the clients (directly or through public proxy) - - string representation of IP address or a hostname - - port number -- _zeromq_uri_ -- URI where is the endpoint of zeromq connection from broker. Could be hidden from public internet. 
-  - string representation of IP address or a hostname
-  - port number
-- _logger_ -- settings of logging
-  - _file_ -- path with name of log file. Defaults to `/var/log/recodex/monitor.log`
-  - _level_ -- logging level, one of "debug", "info", "warning", "error" and "critical"
-  - _max-size_ -- maximum size of log file before rotation in bytes
-  - _rotations_ -- number of rotations kept
-
-### Example configuration file
-
-```{.yml}
----
-websocket_uri:
-  - "127.0.0.1"
-  - 4567
-zeromq_uri:
-  - "127.0.0.1"
-  - 7894
-logger:
-  file: "/var/log/recodex/monitor.log"
-  level: "debug"
-  max-size: 1048576  # 1 MB
-  rotations: 3
-...
-```
-
-### Usage
-
-Preferred way to start monitor as a service is via systemd as the other parts of ReCodEx solution.
-
-- Running monitor is fairly simple:
-```
-# systemctl start recodex-monitor.service
-```
-- Current state can be obtained by
-```
-# systemctl status recodex-monitor.service
-```
-You should see green **Active (running)**.
-- Setting up monitor to be started on system startup:
-```
-# systemctl enable recodex-monitor.service
-```
-
-Alternatively monitor can be started directly from command line. Note that this command will not start monitor as a daemon.
-```
-$ recodex-monitor -c /etc/recodex/monitor/config.yml
-```
-
diff --git a/Overall-architecture.md b/Overall-architecture.md
deleted file mode 100644
index 3e097a7..0000000
--- a/Overall-architecture.md
+++ /dev/null
@@ -1,251 +0,0 @@
-# Overall Architecture
-
-## Description
-
-**ReCodEx** is designed to be very modular and configurable. One such configuration is sketched in the following picture. There are two separate frontend instances with distinct databases sharing common backend part. This configuration may be suitable for MFF UK -- basic programming course and KSP competition. Note, that connections between components are not fully accurate.
-
-![Overall architecture](https://github.com/ReCodEx/wiki/blob/master/images/Overall_Architecture.png)
-
-**Web app** is main part of whole project from user point of view. It provides nice user interface and it is the only part, that interacts with outside world directly. **Web API** contains almost all logic of the app including _user management and authentication_, _storing and versioning files_ (with help of **File server**), _counting and assigning points_ to users etc. Advanced users may connect to the API directly or may create custom frontends. **Broker** is essential part of whole architecture. It maintains list of available **Workers**, receives submissions from the **Web API**, routes them further and reports progress of evaluations back to the **Web app**. **Worker** securely runs each received job and evaluates its results. **Monitor** resends evaluation progress messages to the **Web app** in order to be presented to users.
-
-## Communication
-
-Detailed communication inside the ReCodEx system is captured in the following image and described in sections below. Red connections are through ZeroMQ sockets, blue are through WebSockets and green are through HTTP(S). All ZeroMQ messages are sent as multipart with one string (command, option) per part, with no empty frames (unless explicitly specified otherwise).
-
-![Communication schema](https://github.com/ReCodEx/wiki/raw/master/images/Backend_Connections.png)
-
-### Broker - Worker communication
-
-Broker acts as server when communicating with worker. Listening IP address and port are configurable, protocol family is TCP.
-Worker socket is of DEALER type, broker one is ROUTER type. Because of that, the very first part of every (multipart) message from broker to worker must be the target worker's socket identity (which is saved on its **init** command).
-
-#### Commands from broker to worker:
-
-- **eval** -- evaluate a job. Requires 3 message frames:
-  - `job_id` -- identifier of the job (in ASCII representation -- we avoid endianness issues and also support alphabetic ids)
-  - `job_url` -- URL of the archive with job configuration and submitted source code
-  - `result_url` -- URL where the results should be stored after evaluation
-- **intro** -- introduce yourself to the broker (with **init** command) -- this is required when the broker loses track of the worker who sent the command. Possible reasons for such event are e.g. that one of the communicating sides shut down and restarted without the other side noticing.
-- **pong** -- reply to **ping** command, no arguments
-
-#### Commands from worker to broker:
-
-- **init** -- introduce self to the broker. Useful on startup or after reestablishing lost connection. Requires at least 2 arguments:
-  - `hwgroup` -- hardware group of this worker
-  - `header` -- additional header describing worker capabilities. Format must be `header_name=value`, every header shall be in a separate message frame. There is no limit on number of headers.
-  - There is also an optional third argument -- additional information. If present, it should be separated from the headers with an empty frame. The format is the same as headers. Supported keys for additional information are:
-    - `description` -- a human readable description of the worker for administrators (it will show up in broker logs)
-    - `current_job` -- an identifier of a job the worker is now processing. This is useful when we are reassembling a connection to the broker and need it to know the worker will not accept a new job.
-- **done** -- notifying of finished job. Contains following message frames:
-  - `job_id` -- identifier of finished job
-  - `result` -- response result, possible values are:
-    - OK -- evaluation finished successfully
-    - FAILED -- job failed and cannot be reassigned to another worker (e.g. due to error in configuration)
-    - INTERNAL_ERROR -- job failed due to internal worker error, but another worker might be able to process it (e.g. downloading a file failed)
-  - `message` -- a human readable error message
-- **progress** -- notice about current evaluation progress. Contains following message frames:
-  - `job_id` -- identifier of current job
-  - `state` -- what is happening now.
-    - DOWNLOADED -- submission successfully fetched from fileserver
-    - FAILED -- something bad happened and job was not executed at all
-    - UPLOADED -- results are uploaded to fileserver
-    - STARTED -- evaluation of tasks started
-    - ENDED -- evaluation of tasks is finished
-    - ABORTED -- evaluation of job encountered internal error, job will be rescheduled to another worker
-    - FINISHED -- whole execution is finished and worker is ready for another job execution
-    - TASK -- task state changed -- see below
-  - `task_id` -- only present for "TASK" state -- identifier of task in current job
-  - `task_state` -- only present for "TASK" state -- result of task evaluation. One of:
-    - COMPLETED -- task was successfully executed without any error, subsequent task will be executed
-    - FAILED -- task ended up with some error, subsequent task will be skipped
-    - SKIPPED -- some of the previous dependencies failed to execute, so this task will not be executed at all
-- **ping** -- tell broker I am alive, no arguments
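For illustration, a minimal sketch of the worker's side of this exchange with the `pyzmq` binding follows. The endpoint address, hardware group and headers are made-up example values, and the real worker is a C++ program; this only demonstrates the framing described above.

```python
import zmq

context = zmq.Context()
worker = context.socket(zmq.DEALER)       # worker side is a DEALER socket
worker.connect("tcp://127.0.0.1:9657")    # hypothetical broker endpoint

# init -- one string per frame: command, hardware group, then the headers
worker.send_multipart([b"init", b"group1", b"env=c", b"threads=2"])

# regular heartbeat; the broker answers every ping with a pong
worker.send_multipart([b"ping"])
print(worker.recv_multipart())            # [b'pong']
```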
-
-#### Heartbeating
-
-It is important for the broker and workers to know if the other side is still working (and connected). This is achieved with a simple heartbeating protocol.
-
-The protocol requires the workers to send a **ping** command regularly (the interval is configurable on both sides -- future releases might let the worker send its ping interval with the **init** command). Upon receiving a **ping** command, the broker responds with **pong**.
-
-Whenever a heartbeating message doesn't arrive, a counter called _liveness_ is decreased. When this counter drops to zero, the other side is considered disconnected. When a message arrives, the liveness counter is set back to its maximum value, which is configurable for both sides.
-
-When the broker decides a worker disconnected, it tries to reschedule its jobs to other workers.
-
-If a worker thinks the broker crashed, it tries to reconnect periodically, with a bounded, exponentially increasing delay.
-
-This protocol proved great robustness in real world testing. Thus the whole backend is reliable and can outlive short term connection issues without problems. Also, the increasing delay of ping messages does not flood the network when there are problems. We have experienced no issues since we started using this protocol.
-
-### Worker - File Server communication
-
-Worker is communicating with file server only from _execution thread_. Supported protocol is HTTP, optionally with SSL encryption (**recommended**). If supported by the server and the used version of libcurl, the HTTP/2 standard is also available. File server should be set up to require basic HTTP authentication and worker is capable of sending the corresponding credentials with each request.
-
-#### Worker side
-
-Workers communicate with the file server in both directions -- they download student's submissions and then upload evaluation results. Internally, worker is using the libcurl C library with a very similar setup for both. In both cases it can verify the HTTPS certificate (on Linux against system cert list, on Windows against one downloaded from the CURL website during installation), support basic HTTP authentication, offer HTTP/2 with fallback to HTTP/1.1 and fail on error (returned HTTP status code is >= 400). Worker has a list of credentials to all available file servers in its config file.
-
-- download file -- standard HTTP GET request to given URL expecting file content as response
-- upload file -- standard HTTP PUT request to given URL with file data as body -- same as command line tool `curl` with option `--upload-file`
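The same two operations expressed with Python's `requests` library, just to make the protocol concrete -- the file names and credentials are illustrative, the endpoints are those listed in the next section, and the worker itself uses libcurl:

```python
import requests

auth = ("worker", "secret")            # hypothetical basic-auth credentials
base = "https://fs.recodex.org"        # example fileserver domain used in this section

# download file -- plain GET, fail on HTTP status >= 400
response = requests.get(base + "/submission_archives/42.zip", auth=auth)
response.raise_for_status()
with open("42.zip", "wb") as archive:
    archive.write(response.content)

# upload file -- plain PUT with the file contents as the request body
with open("result.zip", "rb") as results:
    requests.put(base + "/results/42.zip", data=results, auth=auth).raise_for_status()
```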
-
-#### File server side
-
-File server has its own internal directory structure, where all the files are stored. It provides a simple REST API to get them or create new ones. File server does not provide authentication or secured connection by itself, but it is supposed to run as a WSGI script inside a web server (like Apache) with proper configuration. Relevant commands for communication with workers:
-
-- **GET /submission_archives/<id>.<ext>** -- gets an archive with submitted source code and corresponding configuration of this job evaluation
-- **GET /exercises/<hash>** -- gets a file, common usage is for input files or reference result files
-- **PUT /results/<id>.<ext>** -- upload archive with evaluation results under specified name (should be same _id_ as name of submission archive). On successful upload returns JSON `{ "result": "OK" }` as body of returned page.
-
-If not specified otherwise, `zip` format of archives is used. Symbol `/` in API description is root of file server's domain. If the domain is for example `fs.recodex.org` with SSL support, getting input file for one task could look as GET request to `https://fs.recodex.org/tasks/8b31e12787bdae1b5766ebb8534b0adc10a1c34c`.
-
-### Broker - Monitor communication
-
-Broker communicates with monitor also through ZeroMQ over TCP protocol. Type of socket is the same on both sides, ROUTER. Monitor is set to act as server in this communication, its IP address and port are configurable in monitor's config file. ZeroMQ socket ID (set on monitor's side) is "recodex-monitor" and must be sent as first frame of every multipart message -- see ZeroMQ ROUTER socket documentation for more info.
-
-Note that the monitor is designed so that it can receive data both from the broker and workers. The current architecture prefers the broker to do all the communication so that the workers do not have to know too many network services.
-
-Monitor is treated as a somewhat optional part of whole solution, so no special effort on communication reliability was made.
-
-#### Commands from monitor to broker:
-
-Because there is no need for the monitor to communicate with the broker, there are no commands so far. Any message from monitor to broker is logged and discarded.
-
-Commands from broker to monitor:
-
-- **progress** -- notification about progress with job evaluation. See [Progress callback](#progress-callback) section for more info.
-
-### Broker - Web API communication
-
-Broker communicates with main REST API through ZeroMQ connection over TCP. Socket type on broker side is ROUTER, on frontend part it is DEALER. Broker acts as a server, its IP address and port are configurable in the API.
-
-#### Commands from API to broker:
-
-- **eval** -- evaluate a job. Requires at least 4 frames:
-  - `job_id` -- identifier of this job (in ASCII representation -- we avoid endianness issues and also support alphabetic ids)
-  - `header` -- additional header describing worker capabilities. Format must be `header_name=value`, every header shall be in a separate message frame. There is no maximum limit on number of headers. There may be also no headers at all. A worker is considered suitable for the job if and only if it satisfies all of its headers.
-  - empty frame -- frame which contains only an empty string and serves only as a breakpoint after the headers
-  - `job_url` -- URI location of archive with job configuration and submitted source code
-  - `result_url` -- remote URI where results will be pushed to
-
-#### Commands from broker to API (all are responses to **eval** command):
-
-- **ack** -- this is the first message which is sent back to the frontend right after the eval command arrives; basically it means "Hi, I am all right and am capable of receiving job requests". After sending this, broker will try to find an acceptable worker for the arrived request.
-- **accept** -- broker is capable of routing the request to a worker
-- **reject** -- broker cannot handle this job (for example when the requirements specified by the headers cannot be met). There are (rare) cases when the broker finds that it cannot handle the job after it was confirmed. In such cases it uses the frontend REST API to mark the job as failed.
-
-#### Asynchronous communication between broker and API
-
-Only a fraction of the errors that can happen during evaluation can be detected while there is a ZeroMQ connection between the API and broker. To notify the frontend of the rest, we need an asynchronous communication channel that can be used by the broker when the status of a job changes (it's finished, it failed permanently, the only worker capable of processing it disconnected...).
-
-This functionality is supplied by the `broker-reports/` API endpoint group -- see its documentation for more details.
-
-### File Server - Web API communication
-
-File server has a REST API for interaction with other parts of ReCodEx. Description of communication with workers is in the [File server side](#file-server-side) section. On top of that, there are other commands for interaction with the API:
-
-- **GET /results/<id>.<ext>** -- download archive with evaluated results of job _id_
-- **POST /submissions/<id>** -- upload new submission with identifier _id_. Expects that the body of the POST request uses file paths as keys and the content of the files as values. On successful upload returns JSON `{ "archive_path": <path>, "result_path": <path> }` in response body. From _archive_path_ the submission can be downloaded (by worker) and corresponding evaluation results should be uploaded to _result_path_.
-- **POST /tasks** -- upload new files, which will be available by names equal to `sha1sum` of their content. More files can be uploaded at once. On successful upload returns JSON `{ "result": "OK", "files": <file_list> }` in response body, where _file_list_ is a dictionary with the original file name as key and the new URL with the already hashed name as value.
-
-There are no plans yet to support deleting files from this API. This may change in time.
-
-Web API calls these fileserver endpoints with standard HTTP requests. There are no special commands involved. There is no communication in the opposite direction.
-
-### Monitor - Web app communication
-
-Monitor interacts with web application through WebSocket connection. Monitor acts as server and browsers are connecting to it. IP address and port are configurable. When a client connects to the monitor, it sends a message with the string representation of a channel id (which messages it is interested in, usually the id of the job being evaluated). There can be multiple listeners per channel; even (shortly) delayed connections will receive all messages from the very beginning.
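For illustration, a minimal Python client for this protocol could look as follows. It uses the same `websockets` and `asyncio` libraries the monitor itself is built on; the address and port match the example monitor configuration shown earlier in this document, and the job id is a made-up value.

```python
import asyncio
import json

import websockets  # the same third-party library the monitor uses

async def watch(job_id):
    ws = await websockets.connect("ws://127.0.0.1:4567")
    await ws.send(job_id)                       # subscribe by sending the channel id
    try:
        while True:
            progress = json.loads(await ws.recv())  # messages are JSON strings
            print(progress["command"])
            if progress["command"] == "FINISHED":
                break
    finally:
        await ws.close()

asyncio.get_event_loop().run_until_complete(watch("42"))
```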
-
-When monitor receives **progress** message from broker there are two options:
-
-- there is no WebSocket connection for listed channel (job id) -- message is dropped
-- there is active WebSocket connection for listed channel -- message is parsed into JSON format (see below) and sent as string to that established channel. Messages for active connections are queued, so no messages are discarded even on heavy workload.
-
-Message JSON format is dictionary (associative array) with keys:
-
-- **command** -- type of progress, one of:
-  - DOWNLOADED -- submission successfully fetched from fileserver
-  - FAILED -- something bad happened and job was not executed at all
-  - UPLOADED -- results are uploaded to fileserver
-  - STARTED -- evaluation of tasks started
-  - ENDED -- evaluation of all tasks finished, worker now just has to send results and clean up after execution
-  - ABORTED -- evaluation of job encountered internal error, job will be rescheduled to another worker
-  - FINISHED -- whole execution finished and worker is ready for another job execution
-  - TASK -- task state changed, further information will be provided -- see below
-- **task_id** -- id of currently evaluated task. Present only if **command** is "TASK".
-- **task_state** -- state of task with id **task_id**. Present only if **command** is "TASK". Value is one of "COMPLETED", "FAILED" and "SKIPPED".
-  - COMPLETED -- task was successfully executed without any error, subsequent task will be executed
-  - FAILED -- task ended up with some error, subsequent task will be skipped
-  - SKIPPED -- some of the previous dependencies failed to execute, so this task will not be executed at all
-
-### Web app - Web API communication
-
-Provided web application runs as a javascript client inside user's browser. It communicates with REST API on the server through standard HTTP requests. Documentation of the main REST API is in a separate [document](https://recodex.github.io/api/) due to its extensiveness. Results are returned as JSON payload, which is simply parsed in the web application and presented to the users.
-
diff --git a/Rewritten-docs.md b/Rewritten-docs.md
index b2c6392..3cafd0a 100644
--- a/Rewritten-docs.md
+++ b/Rewritten-docs.md
@@ -139,19 +139,21 @@ corresponds to his/her privileges. There are user groups reflecting the structure of lectured courses.

A database of exercises (algorithmic problems) is another part of the project.
-Each exercise consists of a text describing the problem in multiple language
-variants, an evaluation configuration (machine-readable instructions on how to
-evaluate solutions to the exercise) and a set of inputs and reference outputs.
-Exercises are created by instructed privileged users. Assigning an exercise to a
-group means choosing one of the available exercises and specifying additional
-properties: a deadline (optionally a second deadline), a maximum amount of
-points, a configuration for calculating the score, a maximum number of
-submissions, and a list of supported runtime environments (e.g. programming
-languages) including specific time and memory limits for each one.
+Each exercise consists of a text describing the problem (optionally in two language variants -- Czech and English), an evaluation configuration (machine-readable instructions on how to evaluate solutions to the exercise) and a set of inputs and reference outputs. Exercises are created by instructed privileged users.
+Assigning an exercise to a group means choosing one of the available exercises and specifying additional properties: a deadline (optionally a second deadline), a maximum amount of points, a configuration for calculating the score, a maximum number of submissions, and a list of supported runtime environments (e.g. programming languages) including specific time and memory limits for each one.

Typical use cases for supported user roles are the following:

- **student**
+  - create a new user account via the registration form
  - join a group
  - get assignments in group
  - submit solution to assignment -- upload one source file and trigger
@@ -180,10 +182,10 @@ students.
Concepts of consecutive steps from source code to final results are described in more detail below to give readers a solid overview of what has to happen during the evaluation process.

-First thing users have to do is to submit their solutions through web user
+The first thing students have to do is to submit their solutions through the web user
interface. The system checks assignment invariants (deadlines, count of
-submissions, ...) and stores the submitted file. The runtime environment is
-automatically detected based on input file and a suitable evaluation
+submissions, ...) and stores the submitted code. The runtime environment is
+automatically detected based on the input file extension and a suitable evaluation
configuration variant is chosen (one exercise can have multiple variants, for example C and Java languages). This exercise configuration is then used for taking care of the evaluation process.

@@ -341,7 +343,7 @@ annoyed if they did not.
- _automated deployment_ -- all of the components of the system must be easy to deploy in an automated fashion
- _open source licensing_ -- the source code should be released under a
-  permissive license allowing further development; this also applies to used
+  permissive licence allowing further development; this also applies to used
  libraries and frameworks
- _multi-platform worker_ -- worker machines running Linux, Windows and potentially other operating systems must be supported

@@ -449,20 +451,18 @@ restarted.

At this point there is a clear idea how the new system will be used and what are the major enhancements for future releases. With this in mind, the overall
-architecture can be sketched. From the previous research, several goals are set
-up for the new project. They mostly reflect drawbacks of the current version of
-CodEx and some reasonable wishes of university users. Most notable features are
-following:
+architecture can be sketched. To sum up, here is a list of key features of the new system. They come from previous research of the current system's drawbacks, from reasonable wishes of university users, and from our major design choices.
- modern HTML5 web frontend written in JavaScript using a suitable framework
-- REST API implemented in PHP, communicating with database, evaluation backend
-  and a file server
+- REST API communicating with database, evaluation backend and a file server
- evaluation backend implemented as a distributed system on top of a message
-  queue framework (ZeroMQ) with master-worker architecture
+  queue framework with master-worker architecture
- multi-platform worker supporting Linux and Windows environment (latter without sandbox, no general purpose suitable tool available yet)
-- evaluation procedure configured in a YAML file, compound of small tasks
-  connected into an arbitrary oriented acyclic graph
+- evaluation procedure configured in a human readable text file, composed of small tasks connected into an arbitrary directed acyclic graph

The reasons supporting these decisions are explained in the rest of the analysis chapter. Also a lot of smaller design choices are mentioned, including possible

@@ -532,18 +532,18 @@ is implemented. The relative value is set in percents and is called threshold.

Our university has a few partner grammar schools. There was an idea that they could use CodEx for teaching informatics classes. To make the setup simple for
-them, all the software and hardware would be provided by university and hosted
-in their datacentre. However, CodEx were not prepared to support this kind of
-usage and no one had time to manage a separate instance. With ReCodEx it is
-possible to offer hosted environment as a service to other subjects. The concept
-we figured out is based on user and group separation inside the system. There
-are multiple _instances_ in the system, which means unit of separation. Each
-instance has own set of users and groups, exercises can be optionally shared.
-Evaluation backend is common for all instances. To keep track of active
-instances and paying customers, each instance must have a valid _license_ to
-allow users submit their solutions. License is granted for defined period of
-time and can be revoked in advance if the subject do not keep approved terms and
-conditions.
+them, all the software and hardware would be provided by the university as a completely ready-to-use remote service. However, CodEx was not prepared to support this kind of usage and no one had time to manage a separate instance. With ReCodEx it is possible to offer a hosted environment as a service to other subjects. The concept we figured out is based on user and group separation inside the system. There are multiple _instances_ in the system, which are the units of separation. Each instance has its own set of users and groups; exercises can be optionally shared. Evaluation backend is common for all instances. To keep track of active instances and paying customers, each instance must have a valid _licence_ to allow its users to submit their solutions. The licence is granted for a defined period of time and can be revoked in advance if the subject does not keep the approved terms and conditions.

The main work for the system is to evaluate programming exercises. The exercise is quite similar to a homework assignment during school labs. When a homework is

@@ -561,36 +561,6 @@ for every assignment of the same exercise. This separation is natural for all

users, in CodEx it is implemented in a similar way and no other considerable solution was found.

-### Forgotten password
-
-With authentication and some sort of dealing with passwords is related a problem with forgotten credentials, especially passwords.
-People easily forget them and there has to be some kind of mechanism to retrieve a new password or change the old one. Problem is that it cannot be done in totally secure way, but we can at least come quite close to it. First, there are absolutely not secure and recommendable ways how to handle that, for example sending the old password through email. A better, but still not secure solution is to generate a new one and again send it through email. This solution was provided in CodEx, users had to write an email to administrator, who generated a new password and sent it back to the sender. This simple solution could be also automated, but administrator had quite a big control over whole process. This might come in handy if there could be some additional checkups for example, but on the other hand it can be quite time consuming.
-
-Probably the best solution which is often used and is fairly secure is following. Let us consider only case in which all users have to fill their email addresses into the system and these addresses are safely in the hands of the right users. When user finds out that he/she does not remember a password, he/she requests a password reset and fills in his/her unique identifier; it might be email or unique nickname. Based on matched user account the system generates unique access token and sends it to user via email address. This token should be time limited and usable only once, so it cannot be misused. User then takes the token or URL address which is provided in the email and goes to the system's appropriate section, where new password can be set. After that user can sign in with his/her new password. As previously stated, this solution is quite safe and user can handle it on his/her own, so administrator does not have to worry about it. That is the main reason why this approach was chosen to be used.
-

### Evaluation unit executed by ReCodEx

One of the bigger requests for the new system is to support a complex
@@ -623,14 +593,23 @@ so no sandbox needs to be used as in external tasks case.

For a job evaluation, the tasks need to be executed sequentially in a specified order. The idea of running independent tasks in parallel is bad because exact time measurement needs a controlled environment on the target computer with
-minimization of interrupts by other processes. It seems that connecting tasks
-into directed acyclic graph (DAG) can handle all possible problem cases. None of
-the authors, supervisors and involved faculty staff can think of a problem that
-cannot be decomposed into tasks connected in a DAG. The goal of evaluation is
-to satisfy as many tasks as possible. During execution there are sometimes
-multiple choices of next task. To control that, each task can have a priority,
-which is used as a secondary ordering criterion. For better understanding, here
-is a small example.
+minimization of interrupts by other processes. It would be possible to run tasks which do not need exact time measurement in parallel, but in this case a synchronization mechanism has to be developed to exclude parallelism for measured tasks. Usually, there are about four times more unmeasured tasks than tasks with time measurement, but measured tasks tend to be much longer. With [Amdahl's law](https://en.wikipedia.org/wiki/Amdahl's_law) in mind, the parallelism seems not to provide a huge benefit in overall execution speed and brings troubles with synchronization. However, if there are speed issues, this approach could be reconsidered.
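To put rough numbers on this reasoning: the 80/20 time split below is an assumed, purely illustrative figure derived from the "four times more unmeasured tasks, but measured tasks are much longer" observation above.

```python
# Amdahl's law: overall speedup when only a fraction p of the work
# can be parallelized across n workers.
def amdahl_speedup(p, n):
    return 1 / ((1 - p) + p / n)

# assumed split: measured (serial) tasks take 80 % of the run time,
# unmeasured tasks the remaining 20 %, run on 4 cores
print(amdahl_speedup(0.2, 4))  # ~1.18 -- hardly worth the synchronization trouble
```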
+
+It seems that connecting tasks into a directed acyclic graph (DAG) can handle all possible problem cases. None of the authors, supervisors and involved faculty staff can think of a problem that cannot be decomposed into tasks connected in a DAG. The goal of evaluation is to satisfy as many tasks as possible. During execution there are sometimes multiple choices of the next task. To control that, each task can have a priority, which is used as a secondary ordering criterion. For better understanding, here is a small example.

![Task serialization](https://github.com/ReCodEx/wiki/raw/master/images/Assignment_overview.png)

@@ -639,20 +618,34 @@ _CompileA_ task is finished, the _RunAA_ task is started (or _RunAB_, but should
be deterministic by position in configuration file -- tasks stated earlier should be executed earlier). The task priorities guarantee that after the _CompileA_ task all dependent tasks are executed before the _CompileB_ task (they
-have higher priority number). For example this is useful to control which files
-are present in a working directory at every moment. To sum up, there are 3
-ordering criteria: dependencies, then priorities and finally position of task in
-configuration. Together, they define a unambiguous linear ordering of all tasks.
+have higher priority number). To sum up, connections of tasks represent dependencies and priorities can be used to order unrelated tasks; together they provide a total ordering of them. For well written jobs the priorities may not be so useful, but they can help control the execution order, for example to avoid a situation where each test of the job generates a large temporary file and the chosen order keeps all these temporary files on disk at the same time. A better approach is to finish execution of one test, clean up its big temporary file and proceed with the following test. If there is an ambiguity in task ordering at this point, tasks are executed in the order of the input configuration.
+
+A total linear ordering of tasks could be achieved more easily by just executing them in the order of the input configuration. But such a structure cannot handle well the cases when a task fails; there is no easy and nice way to tell which task should be executed next. This issue can be solved with graph structured dependencies of the tasks: in a graph structure, it is clear that all dependent tasks have to be skipped and execution continues with an unrelated task. This is the main reason why the tasks are connected in a DAG.

For grading there are several important tasks. First, tasks executing submitted code need to be checked for time and memory limits. Second, outputs of judging tasks need to be checked for correctness (represented by return value or by data
-on standard output) and should not fail on time or memory limits. This division
-can be transparent for backend, each task is executed the same way. But frontend
-must know which tasks from whole job are important and what is their kind. It is
-reasonable, to keep this piece of information alongside the tasks in job
-configuration, so each task can have a label about its purpose. Unlabeled tasks
-have an internal type _inner_. There are four categories of tasks:
+on standard output) and should not fail. This division can be transparent for the backend, where each task is executed the same way. But the frontend must know which tasks from the whole job are important and what their kind is.
+It is reasonable to keep this piece of information alongside the tasks in the job configuration, so each task can have a label about its purpose. Unlabeled tasks have an internal type _inner_. There are four categories of tasks:

- _initiation_ -- setting up the environment, compiling code, etc.; for users failure means an error in their sources which are not compatible with running it
@@ -673,47 +666,19 @@ arbitrary number of tasks with other types.

### Evaluation progress state

-Users surely want to know progress state of their submitted solution this kind of functionality comes particularly handy in long duration exercises. Because of reporting progress users have immediate knowledge if anything goes wrong, not mention psychological effect that whole system and its parts are working and doing something. That is why this feature was considered from beginning but there are multiple ways how to look at it in particular.
-
-The very first idea would be to provide progress state based on done messages from compilation, execution and evaluation. Which is something what a lot of evaluation systems are providing. These information are high level enough for users and they probably know what is going on and executing right now. If compilation fails users know that their solution is not compilable, if execution fails there were some problems with their program. The clarity of this kind of progress state is nice and understandable. But as we learnt ReCodEx has to have more advanced execution pipeline there can be more compilations or more executions. And in addition parts of the system which ensure execution of users solutions do not have to precisely know what they are executing at the moment. This kind of information may be meaningless for them.
-
-That is why another solution of progress state was considered. As we know right now one of the best ways how to ensure generality is to have jobs with single-purpose tasks. These tasks can be anything, some internal operation or execution of external and sandboxed program. Based on this there is one very simple solution how to provide general progress state which should be independent on task types. We know that job has some number of tasks which has to be executed so we can send state info after execution of every task. And that is how we get percentual completion of an execution. Yes, it is kind of boring and standard way but on top of that there can be built something else and more appealing to users.
-
-So displaying progress to users can be done numerous ways. We have percentual completion which is of course begging for simple solution which is displaying only the percentage or some kind of standard graphical progress bar. But that is too mainstream lets try something else. Very good idea is to have some kind of puzzled image or images which will be composed together according to progress. Nice way but kind of challenging if we do not have designer around. Another original solution is to have database of random kind-of-funny statements which will be displayed every time task is completed. It is easy enough for implementation and even for making up these messages and it is quite new and original. That is why this last solution was chosen for displaying progress state.
+Users surely want to know the progress state of their submitted solution. The very first idea would be to report the state based on "done" messages from compilation, execution and evaluation, as a lot of evaluation systems already do.
+
+However, ReCodEx has a more advanced execution pipeline where there can be more compilations or more executions per test, and also other technical tasks controlling the job execution flow. The users do not know about these technical details, and data from these tasks may confuse them.
+
+A solution is to show users only the percentual completion of the job as a plain progress bar, without additional information about task types. This solution works well for all of the jobs and is very user friendly. To make the output more interesting, there is a database of random kind-of-funny statements and a random new one is displayed every time a task is completed.

### Results of evaluation

@@ -721,32 +686,28 @@ There are lot of things which deserves discussion concerning results of
evaluation, how they should be displayed, what should be visible or not and also what kind of reward for users solutions should be chosen.

+#### Evaluation outputs
+
At first let us focus on all kinds of outputs from executed programs within a job. Out of discussion is that supervisors should be able to view almost all outputs from solutions if they choose them to be visible and recorded. This feature is
-critical in debugging either whole exercises or users solutions. But should it be default behaviour to record every output? Absolutely not, supervisor should have a choice to turn it on, but discarding the outputs has to be the default option. Even without this functionality a file base around whole ReCodEx system can become quite large and on top of that outputs from executed programs can be sometimes very extensive. Storing this amount of data is inefficient and unnecessary to most of the solutions. However, on supervisor request this feature should be available.
-
-More interesting question is what should regular users see from execution of their solution. Simple answer is of course that they should not see anything which is partly true. Outputs from their programs can be anything and users can somehow analyze inputs or even redirect them to output. So outputs from execution should not be visible at all or under very special circumstances. But that is not so straightforward for compilation or other kinds of initiation, where it really depends on the particular case. Generally it is quite harmless to display user some kind of compilation error which can help a lot during troubleshooting. Of course again this kind of functionality should be configurable by supervisors and disabled by default. There is also the last kind of tasks which can output some information which is evaluation tasks. Output of these tasks is somehow important to whole system and again can contain some information about inputs or reference outputs. So outputs of evaluation tasks should not be visible to regular users too.
+critical in debugging either whole exercises or users solutions. Supervisor should have a choice to turn on preserving the data, while the default behaviour is to discard them to keep the file base around the whole ReCodEx system within sensible limits.
+
+A more interesting question is whether students should see the logs from the execution of their solution. The usual approach is to keep this information private because of the possibility of leaking input data. Otherwise it may lead students to hack their solutions to pass just the ReCodEx testing cases instead of properly solving the assigned problem. Martin Mareš strongly recommended to use this strategy of hiding sensitive data too, so ReCodEx does.
+One exception is compilation outputs, which can help students a lot during troubleshooting. These logs shall be visible unless the supervisor decides otherwise. Note that due to a lack of frontend developers, this feature was not implemented in the very first release of ReCodEx, but it will definitely be available in the future.
+
+#### Scoring and assigning points

The overall concept of grading solutions was presented earlier. To briefly remind that, backend returns only exact measured values (used time and memory,
@@ -799,7 +760,7 @@ factor.
There are several ways how to save structured data:

- relational database

Another important factor is amount and size of stored data. Our guess is about
-1000 users, 100 exercises, 200 assignments per year and 400000 unique solutions
+1000 users, 100 exercises, 200 assignments per year and 200000 unique solutions
per year. The data are mostly structured and there are a lot of them with the same format. For example, there is a thousand of users and each one has the same values -- name, email, age, etc. These kind of data are relatively small, name
@@ -825,19 +786,6 @@ approaches are equally good, final decision depends on actual case.

## Structure of the project

-There are numerous ways how to divide some sort of system into separated services, from one single component to many and many single-purpose components. Having only one big service is not feasible, not scalable enough and mainly it would be one big blob of code which somehow works and is very complex, so this is not the way. The quite opposite, having a lot of single-purpose components is also somehow impractical. It is scalable by default and all services would have quite simple code but on the other hand communication requirements for such solution would be insane. So there has to be chosen approach which is somehow in the middle, that means services have to communicate in manner which will not bring network down, code basis should be reasonable and the whole system has to be scalable enough. With this being said there can be discussion over particular division for ReCodEx system.
-

The ReCodEx project is divided into two logical parts – the *backend* and the *frontend* – which interact with each other and which cover the whole area of code examination. Both of these logical parts are independent of each other in
@@ -861,8 +809,8 @@ progress of processing of the queued jobs and the results of the evaluations
can be queried after the job processing is finished. The backend produces a log of the evaluation and scores the solution based on the job configuration document.

-From the scalable point of view there are two necessary components, the one
-which will execute jobs and component which will distribute jobs to the
+To make the backend scalable, there are two necessary components -- the one
+which will execute jobs and the other which will distribute jobs to the
instances of the first one. This ensures scalability in manner of parallel execution of numerous jobs which is exactly what is needed. Implementation of these services are called **broker** and **worker**, first one handles
@@ -929,20 +877,15 @@ protocol between these two logical parts will be described as well.

## Implementation analysis

-When developing a project like ReCodEx there has to be some discussion over implementation details and how to solve some particular problems properly. This discussion is a never ending story which goes on through the whole development process.
-Some of the most important implementation problems or interesting observations will be discussed in this chapter.
+Some of the most important implementation problems or interesting observations
+will be discussed in this chapter.

-### General communication
+### Communication between the backend components

Overall design of the project is discussed above. There is a bunch of components, each with its own responsibility. An important thing to design is the communication of
-these components. All we can count with is that they are connected by network.
-
-To choose a suitable protocol, there are some additional requirements that
-should be met:
+these components. To choose a suitable protocol, there are some additional requirements that should be met:

- reliability -- if a message is sent between components, the protocol has to ensure that it is received by target component
@@ -955,26 +898,65 @@ Often way to reflect these reproaches is to use some framework which provides
better abstraction and a more suitable API. We decided to go this way, so the following options are considered:

-- CORBA -- Corba is a well known framework for remote object invocation. There are multiple implementations for almost every known programming language. It fits nicely into object oriented programming environment.
-- RabbitMQ -- RabbitMQ is a messaging framework written in Erlang. It has bindings to huge number of languages and large community. Also, it is capable of routing requests, which could be handy feature for job loadbalancing.
-- ZeroMQ -- ZeroMQ is another messaging framework, but instead of creating separate service this is a small library which can be embedded into own projects. It is written in C++ with huge number of bindings.
-
-We like CORBA, but our system should be more loosely-coupled, so (asynchronous) messaging is better approach in our minds. RabbitMQ seems nice with great advantage of routing capability, but it is quite heavy service written in language no one from the team knows, so we do not like it much. ZeroMQ is the best option for us. However, all of the three options would have been possible to use.
-
-Frontend communication follows the choice, that ReCodEx should be primary a web application. The communication protocol has to reflect client-server architecture. There are several options:
+- CORBA (or some other form of RPC) -- CORBA is a well known framework for remote procedure calls. There are multiple implementations for almost every known programming language. It fits nicely into an object oriented programming environment.
+- RabbitMQ -- RabbitMQ is a messaging framework written in Erlang. It features a message broker, to which nodes connect and declare the message queues they work with. It is also capable of routing requests, which could be a useful feature for job load-balancing. Bindings exist for a large number of languages and there is a large community supporting the project.
+- ZeroMQ -- ZeroMQ is another messaging framework, which is different from RabbitMQ and others (such as ActiveMQ) because it features a "brokerless design". This means there is no need to launch a message broker service to which clients have to connect -- ZeroMQ based clients are capable of communicating directly. However, it only provides an interface for passing messages (basically vectors of 255B strings) and any additional features such as load balancing or acknowledgement schemes have to be implemented on top of this.
+  The ZeroMQ library is written in C++ with a huge number of bindings.
+
+CORBA is a large framework that would satisfy all our needs, but we are aiming towards a more loosely-coupled system, and asynchronous messaging seems better for this approach than RPC. Moreover, we rarely need to receive replies to our requests immediately.
+
+RabbitMQ seems well suited for many use cases, but implementing a job routing mechanism between heterogeneous workers would be complicated -- we would probably have to create a separate load balancing service, which cancels the advantage of a message broker already being provided by the framework. It is also written in Erlang, which nobody from our team understands.
+
+ZeroMQ is the best option for us, even with the drawback of having to implement a load balancer ourselves (which could also be seen as a benefit, and there is a notable chance we would have to do the same with RabbitMQ). It also gives us complete control over the transmitted messages and communication patterns. However, all of the three options would have been possible to use.
+
+### File transfers
+
+There has to be a way to access files stored on the fileserver from both workers and clients. We will present some of the possible options:
+
+@todo elaborate this stuff
+
+- HTTP(S)
+- FTP
+- SFTP
+- A network-shared file system (such as NFS)
+- A custom protocol over ZeroMQ
+
+We chose HTTPS because it is widely used and clients exist in all relevant environments. In addition, it is highly probable we will have to run an HTTP server anyway, because it is intended for ReCodEx to have a web frontend.
+
+### Frontend - backend communication
+
+Our choices when considering how clients will communicate with the backend have to stem from the fact that ReCodEx should primarily be a web application. This rules out ZeroMQ -- while it is very useful for asynchronous communication between backend components, it is practically impossible to use it from a web browser. There are several other options:

- *TCP sockets* -- TCP sockets give a reliable means of a full-duplex communication. All major operating systems support this protocol and there are
@@ -1013,9 +995,10 @@ and testing, and it is understood by programmers so it should be easy for a new
developer with some experience in client-side applications to get to know with the ReCodEx API and develop a client application.

-To sum up, chosen ways of communication inside the ReCodEx system are captured
-in the following image. Red connections are through ZeroMQ sockets, blue are
-through WebSockets and green are through HTTP(S).
+A high level view of the chosen communication protocols in ReCodEx can be seen in the following image. Red arrows mark connections through ZeroMQ sockets, blue mark WebSockets communication and green arrows connect nodes that communicate through HTTP(S).

![Communication schema](https://github.com/ReCodEx/wiki/raw/master/images/Backend_Connections.png)

@@ -1108,114 +1091,132 @@ services, for example via HTTP.

Worker is the component which is supposed to execute incoming jobs from the broker. As such, worker should support a wide range of different infrastructures and maybe even platforms/operating systems. Support of at least two main operating
-systems is desirable and should be implemented. Worker as a service does not
-have to be much complicated, but a bit of complex behaviour is needed. Mentioned
-complexity is almost exclusively concerned about robust communication with
-broker which has to be regularly checked.
-Ping mechanism is usually used for this in all kind of projects. This means that worker should be able to send ping messages even during execution. So worker has to be divided into two separate parts, the one which will handle communication with broker and the another which will execute jobs. The easiest solution is to have these parts in separate threads which somehow tightly communicates with each other. For inter process communication there can be used numerous technologies, from shared memory to condition variables or some kind of in-process messages. Already used library ZeroMQ is possible to provide in-process messages working on the same principles as network communication which is quite handy and solves problems with threads synchronization and such.
+systems is desirable and should be implemented.
+
+Worker as a service does not have to be much complicated, but a bit of complex behaviour is needed. The mentioned complexity is almost exclusively concerned with robust communication with the broker, which has to be checked regularly. A ping mechanism is usually used for this in all kinds of projects. This means that worker should be able to send ping messages even during execution. So worker has to be divided into two separate parts, one which will handle communication with the broker and another which will execute jobs.
+
+The easiest solution is to have these parts in separate threads which tightly communicate with each other. For inter process communication numerous technologies can be used, from shared memory to condition variables or some kind of in-process messages. The already used ZeroMQ library can provide in-process messages working on the same principles as network communication, which is quite handy and solves problems with thread synchronization and such.
+
+#### Evaluation
+
At this point we have worker with two internal parts, a listening one and an execution
-one. Implementation of first one is quite straightforward and clear. So lets discuss what should be happening in execution subsystem. Jobs as work units can quite vary and do completely different things, that means configuration and worker has to be prepared for this kind of generality. Configuration and its solution was already discussed above, implementation in worker is then quite also quite straightforward. Worker has internal structures to which loads and which stores metadata given in configuration. Whole job is mapped to job metadata structure and tasks are mapped to either external ones or internal ones (internal commands has to be defined within worker), both are different whether they are executed in sandbox or as internal worker commands.
+one. Implementation of the first one is quite straightforward and clear. So let us discuss what should be happening in the execution subsystem.
+
+After the successful arrival of a job, the worker has to prepare a new execution environment, then the solution archive has to be downloaded from the fileserver and extracted. The job configuration is located within these files; it is loaded into internal structures and executed. After that, results are uploaded back to the fileserver. These steps are the basic ones which are really necessary for the whole execution and have to be executed in this precise order.
+
+#### Job configuration
+
+Jobs as work units can vary quite a lot and do completely different things; that means the configuration and the worker have to be prepared for this kind of generality.
+The configuration and its solution were already discussed above; the implementation in the worker is then also quite straightforward.
+
+Worker has internal structures into which it loads and in which it stores the metadata given in the configuration. The whole job is mapped to a job metadata structure and tasks are mapped to either external or internal ones (internal commands have to be defined within the worker); the difference is whether they are executed in a sandbox or as internal worker commands.

Another division of tasks is by the task-type field in the configuration. This field can have four values: initiation, execution, evaluation and inner. All was discussed and described above in the configuration analysis. What is important to the worker is
-how to behave if execution of task with some particular type fails. There are two possible situations execution fails due to bad user solution or due to some internal error. If execution fails on internal error solution cannot be declared overly as failed. User should not be punished for bad configuration or some network error. This is where task types are useful. Generally initiation, execution and evaluation are tasks which are somehow executing code which was given by users who submitted solution of exercise. If this kinds of tasks fail it is probably connected with bad user solution and can be evaluated. But if some inner task fails solution should be re-executed, in best case scenario on different worker. That is why if inner task fails it is sent back to broker which will reassign job to another worker. More on this subject should be
+how to behave if the execution of a task with some particular type fails.
+
+There are two possible situations: execution fails due to a bad user solution, or due to some internal error. If execution fails on an internal error, the solution cannot be simply declared as failed. The user should not be punished for a bad configuration or some network error. This is where task types are useful. Generally, initiation, execution and evaluation tasks are somehow executing code which was given by the users who submitted a solution of an exercise. If these kinds of tasks fail, it is probably connected with a bad user solution and the job can still be evaluated. But if some inner task fails, the solution should be re-executed, in the best case scenario on a different worker. That is why if an inner task fails, the job is sent back to the broker, which will reassign it to another worker. More on this subject should be
discussed in broker assigning algorithms section.

+#### Job working directories
+
There is also a question about the working directory or directories of a job: which directories should be used and what for. There is one simple answer to this: every job will have only one specified directory, which will contain every file
-with which worker will work in the scope of whole job execution. This is of course nonsense there has to be some logical division. The least which must be done are two folders one for internal temporary files and second one for evaluation. The directory for temporary files is enough to comprehend all kind of internal work with filesystem but only one directory for whole evaluation is somehow not enough. Users solutions are downloaded in form of zip archives so why these should be present during execution or why the results and files which should be uploaded back to fileserver should be cherry picked from the one big directory? The answer is of course another logical division into subfolders.
+#### Job working directories
+
There is also a question about the working directory or directories of a job:
which directories should be used and what for. There is one simple answer to
this: every job will have only one specified directory which will contain every file
-with which worker will work in the scope of whole job execution. This is of
-course nonsense there has to be some logical division. The least which must be
-done are two folders one for internal temporary files and second one for
-evaluation. The directory for temporary files is enough to comprehend all kind
-of internal work with filesystem but only one directory for whole evaluation is
-somehow not enough. Users solutions are downloaded in form of zip archives so
-why these should be present during execution or why the results and files which
-should be uploaded back to fileserver should be cherry picked from the one big
-directory? The answer is of course another logical division into subfolders. The
-solution which was chosen at the end is to have folders for downloaded archive,
-decompressed solution, evaluation directory in which user solution is executed
-and then folders for temporary files and for results and generally files which
-should be uploaded back to fileserver with solution results. Of course there has
-to be hierarchy which separate folders from different workers on the same
-machines. That is why paths to directories are in format:
+with which the worker will work in the scope of the whole job execution. This
+solution is easy but fails for logical and security reasons.
+
+The least which must be done are two folders: one for internal temporary files
+and a second one for the evaluation. The directory for temporary files is enough
+to cover all kinds of internal work with the filesystem, but only one directory
+for the whole evaluation is somehow not enough.
+
+The solution which was chosen in the end is to have folders for the downloaded
+archive, the decompressed solution, the evaluation directory in which the user
+solution is executed, and then folders for temporary files and for results and
+generally files which should be uploaded back to the fileserver with the
+solution results.
+
+There also has to be a hierarchy which separates folders of different workers on
+the same machine. That is why paths to directories are in the format:
`${DEFAULT}/${FOLDER}/${WORKER_ID}/${JOB_ID}` where default means the default
working directory of the whole worker and folder is a particular directory for some
-purpose (archives, evaluation, ...). Mentioned division of job directories
-proved to be flexible and detailed enough, everything is in logical units and
-where it is supposed to be which means that searching through this system should
-be easy. In addition if solutions of users have access only to evaluation
-directory then they do not have access to unnecessary files which is better for
-overall security of whole ReCodEx.
-
-As we discovered above worker has job directories but users who are writing and
+purpose (archives, evaluation, ...).
+
+The described division of job directories proved to be flexible and detailed
+enough; everything is in logical units and where it is supposed to be, which
+means that searching through this system should be easy. In addition, if the
+solutions of users have access only to the evaluation directory, then they do
+not have access to unnecessary files, which is better for the overall security
+of the whole ReCodEx.
+
+#### Job variables
+
+As mentioned above, the worker has job directories, but the users who are writing and
managing job configurations do not know where they are (on some particular
worker) and how they can be accessed and written into the configuration. For this
kind of task we have to introduce some kind of marks or signs which will
-represent particular folders. Marks or signs can have form of some kind of
-special strings which can be called variables. These variables then can be used
-everywhere where filesystem paths are used within configuration file. This will
-solve problem with specific worker environment and specific hierarchy of
-directories. Final form of variables is `${...}` where triple dot is textual
-description. This format was used because of special dollar sign character which
-cannot be used within filesystem path, braces are there only to border textual
-description of variable.
+represent particular folders. These marks or signs can have the form of widely
+used variables.
-#### Evaluation
+Variables can be used everywhere where filesystem paths are used within the
+configuration file. This solves the problem with the specific worker environment
+and the specific hierarchy of directories. The final form of variables is
+`${...}` where the triple dot is a textual description. This format was chosen
+because of the special dollar sign character, which cannot be used within
+filesystem paths; the braces only delimit the textual description of the
+variable.
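+
+As an illustration, expanding such variables boils down to a simple textual
+substitution (a hypothetical helper, not the actual worker implementation):
+
+```
+#include <map>
+#include <string>
+
+// Replace every ${NAME} mark in a configured path with its actual value.
+std::string expand_variables(std::string path,
+                             const std::map<std::string, std::string> &vars) {
+    for (const auto &var : vars) {
+        const std::string mark = "${" + var.first + "}";
+        for (std::size_t pos = path.find(mark); pos != std::string::npos;
+             pos = path.find(mark, pos + var.second.size())) {
+            path.replace(pos, mark.size(), var.second);
+        }
+    }
+    return path;
+}
+
+// expand_variables("${EVAL_DIR}/input.txt", {{"EVAL_DIR", "/recodex/eval/1/42"}})
+// yields "/recodex/eval/1/42/input.txt".
+```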
-After successful arrival of job, worker has to prepare new execution
-environment, then solution archive has to be downloaded from fileserver and
-extracted. Job configuration is located within these files and loaded into
-internal structures and executed. After that results are uploaded back to
-fileserver. These steps are the basic ones which are really necessary for whole
-execution and have to be executed in this precise order.
+#### Supplementary files

An interesting problem is posed by supplementary files (inputs, sample outputs). There
are two approaches which can be observed. Supplementary files can be downloaded
either at the start of the execution or during the execution. If the files are
-downloaded at the beginning execution does not really started at this point and
-if there are problems with network worker find it right away and can abort
+downloaded at the beginning, the execution has not really started yet, so if
+there are problems with the network, the worker will find out right away and
+can abort the
execution without executing a single task. Slight problems can arise if some of
the files need to have the same name (e.g. the solution assumes that the input is
`input.txt`); in this scenario the downloaded files cannot be renamed at the
beginning but only during the execution, which is impractical and not easily
-observed. Second solution of this problem when files are downloaded on the fly
-has quite opposite problem, if there are problems with network worker will find
-it during execution when for instance almost whole execution is done, this is
-also not ideal solution if we care about burnt hardware resources. On the other
-hand using this approach users have quite advanced control of execution flow and
-know what files exactly are available during execution which is from users
-perspective probably more appealing then the first solution. Based on that
+observed.
+
+The second solution, when the files are downloaded on the fly, has quite the
+opposite problem: if there are problems with the network, the worker will find
+out during the execution, for instance when almost the whole execution is done.
+This is not an ideal solution either if we care about wasted hardware resources.
+On the other hand, with this approach users have quite advanced control of the
+execution flow and know exactly which files are available during the execution,
+which is probably more appealing from the users' perspective than the first
+solution. Based on that,
downloading of supplementary files using 'fetch' tasks during execution was
chosen and implemented.
@@ -1297,7 +1298,7 @@ be fine.
Because fetch tasks should have the 'inner' task type, which implies that a failure in
this task will stop the whole execution and the job will be reassigned to another
worker. It should be like the last salvation in case everything else goes
wrong.
-#### Sandboxing
+### Sandboxing
@@ -1318,6 +1319,8 @@
There are numerous ways how to approach sandboxing on different platforms;
describing all possible approaches is out of the scope of this document. Instead of
@@ -1327,42 +1330,43 @@
implemented well are giving pretty safe sandbox which can be used for all kinds
of users' solutions and should be able to restrict and stop any standard way of
attacks or errors.

+#### Linux
+
Linux systems have quite extensive support of sandboxing in the kernel: kernel
namespaces and cgroups were introduced and implemented, which combined can
limit hardware resources (CPU, memory) and separate the executing program into its
new one. Luckily, an existing solution was found and its name is **isolate**.
Isolate does not use all possible kernel features, but only a subset which is
still enough to be used by ReCodEx.
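+
+Invoking isolate then boils down to spawning it with the desired limits (a
+simplified sketch; the flag values here are only an example, and the sandbox
+must have been initialized with `isolate --init` beforehand):
+
+```
+#include <sys/wait.h>
+#include <unistd.h>
+
+// Run a solution binary inside the isolate sandbox (illustrative only).
+int run_in_isolate() {
+    pid_t pid = fork();
+    if (pid == 0) {
+        execlp("isolate", "isolate",
+               "--box-id=0",           // sandbox instance used by this worker
+               "--time=2",             // CPU time limit [s]
+               "--mem=65536",          // memory limit [KiB]
+               "--meta=/tmp/meta.log", // measured resources end up here
+               "--run", "--", "./solution", (char *) nullptr);
+        _exit(1); // exec failed
+    }
+    int status = 0;
+    waitpid(pid, &status, 0);
+    return status;
+}
+```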
+#### Windows
+
The opposite situation is in the Windows world: there is limited support in its
kernel, which makes sandboxing a bit trickier. The Windows kernel only has ways to
restrict privileges of a process through restriction of internal access
tokens. Monitoring of hardware resources is not possible, but used resources can
-be obtained through newly created job objects. But find sandbox which can do all
-things needed for ReCodEx seems to be impossible. There are numerous sandboxes
-for Windows but they all are focused on different things in a lot of cases they
-serves as safe environment for malicious programs, viruses in particular. Or
-they are designed as a separate filesystem namespace for installing a lot of
-temporarily used programs. From all these we can mention Sandboxie, Comodo
-Internet Security, Cuckoo sandbox and many others. None of these is fitted as
-sandbox solution for ReCodEx. With this being said we can safely state that
-designing and implementing new general sandbox for Windows is out of scope of
-this project.
-
-New general sandbox for Windows is out of business but what about more
-specialized solution used for instance only for C#. CLR as a virtual machine and
-runtime environment has a pretty good security support for restrictions and
-separation which is also transferred to C#. This makes it quite easy to
-implement simple sandbox within C# but surprisingly there cannot be found some
-well known general purpose implementations. As said in previous paragraph
-implementing our own solution is out of scope of project there is simple not
-enough time. But C# sandbox is quite good topic for another project for example
-term project for C# course so it might be written and integrated in future.
+be obtained through newly created job objects.
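+
+For illustration, creating such a job object and limiting a process could look
+roughly like this (a minimal Win32 sketch, not actual ReCodEx code):
+
+```
+#include <windows.h>
+
+// Create a job object that caps the memory of every process assigned to it.
+HANDLE create_limited_job() {
+    HANDLE job = CreateJobObject(nullptr, nullptr);
+
+    JOBOBJECT_EXTENDED_LIMIT_INFORMATION limits = {};
+    limits.BasicLimitInformation.LimitFlags = JOB_OBJECT_LIMIT_PROCESS_MEMORY;
+    limits.ProcessMemoryLimit = 64 * 1024 * 1024; // 64 MiB per process
+    SetInformationJobObject(job, JobObjectExtendedLimitInformation,
+                            &limits, sizeof(limits));
+
+    // AssignProcessToJobObject(job, process) then places a process under
+    // these limits; used resources can be read back with
+    // QueryInformationJobObject.
+    return job;
+}
+```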
+
+There are numerous sandboxes for Windows, but they all focus on different
+things; in a lot of cases they serve as a safe environment for running
+malicious programs, viruses in particular, or they are designed as a separate
+filesystem namespace for installing a lot of temporarily used programs. From
+all these we can mention Sandboxie, Comodo Internet Security, Cuckoo sandbox
+and many others. None of these fits as a sandbox solution for ReCodEx. With
+this being said, we can safely state that designing and implementing a new
+general sandbox for Windows is out of the scope of this project.
+
+However, designing a sandbox only for a specific environment is possible,
+namely for C# and .NET. The CLR as a virtual machine and runtime environment
+has pretty good security support for restrictions and separation, which is also
+transferred to C#. This makes it quite easy to implement a simple sandbox
+within C#, but there are no well known general purpose implementations. As said
+in the previous paragraph, implementing our own solution is out of the scope of
+this project. But a C# sandbox is quite a good topic for another project, for
+example a term project for a C# course, so it might be written and integrated
+in the future.

### Fileserver

The fileserver provides access to a shared storage space that contains files
submitted by students, supplementary files such as test inputs and outputs and
-results of evaluation. In other words, it acts as an intermediate node for data
-passed between the frontend and the backend. This functionality can be easily
-separated from the rest of the backend features, which led to designing the
-fileserver as a standalone component. Such design helps encapsulate the details
-of how the files are stored (e.g. on a file system, in a database or using a
-cloud storage service), while also making it possible to share the storage
-between multiple ReCodEx frontends.
+results of evaluation. In other words, it acts as an intermediate storage node
+for data passed between the frontend and the backend. This functionality can be
+easily separated from the rest of the backend features, which led to designing
+the fileserver as a standalone component. Such a design helps encapsulate the
+details of how the files are stored (e.g. on a file system, in a database or
+using a cloud storage service), while also making it possible to share the
+storage between multiple ReCodEx frontends.
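+
+The other components reach these files over HTTP; in the C++ parts of the
+backend this is typically done with the libcurl library, which is already a
+dependency. A minimal sketch of downloading one file follows (the URL handling
+and error reporting are illustrative only):
+
+```
+#include <curl/curl.h>
+#include <cstdio>
+
+// Download one file from the fileserver into target_path.
+bool fetch_file(const char *url, const char *target_path) {
+    FILE *out = std::fopen(target_path, "wb");
+    if (!out) return false;
+
+    CURL *curl = curl_easy_init();
+    curl_easy_setopt(curl, CURLOPT_URL, url);
+    curl_easy_setopt(curl, CURLOPT_WRITEDATA, out);  // default callback fwrites
+    curl_easy_setopt(curl, CURLOPT_FAILONERROR, 1L); // HTTP >= 400 is an error
+    CURLcode res = curl_easy_perform(curl);
+
+    curl_easy_cleanup(curl);
+    std::fclose(out);
+    return res == CURLE_OK;
+}
+```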
For early releases of the system, we chose to store all files on the file system
-- it is the least complicated solution (in terms of implementation complexity)
@@ -1449,8 +1453,8 @@ of connection with no message loss.

### API server

The API server must handle HTTP requests and manage the state of the application
-in some kind of a database. It must also be able to communicate with the
-backend over ZeroMQ.
+in some kind of a database. It must also be able to communicate with the backend
+over ZeroMQ.

We considered several technologies which could be used:

@@ -1533,36 +1537,71 @@ more convenient, but provides us with less control.

#### Authentication

To make certain data and actions accessible only to some specific users, there
-must be a way how these users can prove their identity. We decided to avoid
-PHP sessions to make the server stateless (session ID is stored in the cookies
-of the HTTP requests and responses). The server issues a specific token for the
+must be a way how these users can prove their identity. We decided to avoid PHP
+sessions to make the server stateless (the session ID is stored in the cookies of
+the HTTP requests and responses). The server issues a specific token for the
user after his/her identity is verified (i.e., by providing email and password)
-and sent to the client in the body of the HTTP response. The client must remember
-this token and attach it to every following request in the *Authorization* header.
+and the token is then sent to the client in the body of the HTTP response. The
+client must remember this token and attach it to every following request in the
+*Authorization* header.

The token must be valid only for a certain time period ("log out" the user after
-a few hours of inactivity) and it must be protected against abuse (e.g., an attacker
-must not be able to issue a token which will be considered valid by the system and
-using which the attacker could pretend to be a different user). We decided to use
-the JWT standard (the JWS).
-
-The JWT is a base64-encoded string which contains three JSON documents - a header,
-some payload, and a signature. The interesting parts are the payload and the signature:
-the payload can contain any data which can identify the user and metadata of the token
-(i.e., the time when the token was issued, the time of expiration). The last part is a
-digital signature contains a digital signature of the header and payload and it
-ensures that nobody can issue their own token and steal someone's identity. Both of
-these characteristics give us the opportunity to validate the token without storing
-all of the tokens in the database.
+a few hours of inactivity) and it must be protected against abuse (e.g., an
+attacker must not be able to issue a token which will be considered valid by the
+system and using which the attacker could pretend to be a different user). We
+decided to use the JWT standard (the JWS).
+
+The JWT is a base64-encoded string which contains three JSON documents - a
+header, some payload, and a signature. The interesting parts are the payload and
+the signature: the payload can contain any data which can identify the user and
+metadata of the token (i.e., the time when the token was issued, the time of
+expiration). The last part contains a digital signature of the header and the
+payload, and it ensures that nobody can issue their own token and steal
+someone's identity. Both of these characteristics give us the opportunity to
+validate the token without storing all of the tokens in the database.

To implement JWT in Nette, we have to implement some of its security-related
-interfaces such as IAuthenticator and IUserStorage, which is rather easy
-thanks to the simple authentication flow. Replacing these services in a Nette
application is also straightforward, thanks to its dependency injection
container implementation. The encoding and decoding of the tokens itself
-including generating the signature and signature verification is done through
-a widely used third-party library which lowers the risk of having a bug
-in the implementation of this critical security feature.
+interfaces such as IAuthenticator and IUserStorage, which is rather easy thanks
+to the simple authentication flow. Replacing these services in a Nette
+application is also straightforward, thanks to its dependency injection
+container implementation. The encoding and decoding of the tokens themselves,
+including generating the signature and verifying it, is done through a
+widely used third-party library, which lowers the risk of having a bug in the
+implementation of this critical security feature.
+
+#### Forgotten password
+
+Closely related to authentication and passwords is the problem of forgotten
+credentials, especially passwords. There has to be some kind of mechanism to
+retrieve a new password or change the old one.
+
+First, there are ways to handle this which are absolutely insecure and not
+recommendable, for example sending the old password through email. A better,
+but still not secure, solution is to generate a new one and again send it
+through email.
+
+The latter solution was used in CodEx: users had to write an email to the
+administrator, who generated a new password and sent it back to the sender.
+This simple solution could also be automated, but the administrator kept quite
+a big control over the whole process. This might come in handy if some
+additional checkups are needed, but on the other hand it can be quite time
+consuming.
+
+Probably the best solution, which is often used and is fairly secure, follows.
+Let us consider only the case in which all users have to fill their email
+addresses into the system and these addresses are safely in the hands of the
+right users.
+
+When a user finds out that he/she does not remember the password, he/she
+requests a password reset and fills in his/her unique identifier; it might be
+an email or a unique nickname.
+Based on the matched user account, the system generates a unique access token
+and sends it to the user via email. This token should be time limited and
+usable only once, so it cannot be misused. The user then takes the token or the
+URL address which is provided in the email and goes to the appropriate section
+of the system, where a new password can be set. After that the user can sign in
+with his/her new password.
+
+As previously stated, this solution is quite safe and the user can handle it on
+his/her own, so the administrator does not have to worry about it. That is the
+main reason why this approach was chosen.

#### Uploading files

@@ -1585,7 +1624,7 @@ checking.

Previous chapters imply that each user has to have a role which corresponds
to his/her privileges. Our research showed that three roles are sufficient --
student, supervisor and administrator. The user role has to be checked with
every request. The good point is that the roles nicely match with the
-granuality of API endpoints, so the permission checking can be done at the
+granularity of API endpoints, so the permission checking can be done at the
beginning of each request. That is implemented using PHP annotations, which
allow specifying the allowed user roles for each request with very little code,
but all the business logic is the same, together in one place.

@@ -1624,10 +1663,10 @@
We decided for the lazy loading at the time when the results are requested for
the first time. However, the concept of asynchronous jobs is then introduced.
This type of job is useful for batch submitting of jobs, for example re-running
jobs which failed on a worker hardware issue. These jobs are typically submitted
-by different user than the author (an administrator for example), so the original
-authors should be notified. In this case it is more reasonable to load the results
-immediately and optionally send them a notification via an email. This is exactely
-what we do.
+by a different user than the author (an administrator, for example), so the
+original authors should be notified. In this case it is more reasonable to load
+the results immediately and optionally send the authors a notification via
+email. This is exactly what we do.

It seems with the benefit of hindsight that immediate loading of all jobs could
simplify the code and it has no major drawbacks. In the next version of ReCodEx
@@ -1635,23 +1674,48 @@ we will re-evaluate this decision.

#### Communication with the backend

-##### Backend failiure reporting
-
-The backend is a separate component which does not communicate with the administrators directly. When it encounters an error it stores it in a log file. It would be handy to inform the administrator directly at this moment so he can fix the cause of the error as soon as possible. The backend does not have any mechanism for notifying users using for example an email. The API server on the other hand has email sending implemented and it can easily forward any messages to the administrator. A secured communication protocol between the backend and the frontend already exists (it is used for the reporting of a finished job processing) and it is easy to add another endpoint for bug reporting.
-
-When a request for sending a report arrives from the backend then the type of the report is inferred and if it is an error which deserves attention of
-the administrator then an email is sent to him/her. There can also be errors which are not that important (e.g., it was somehow solved by the backend itself or it is only informative, then these do not have to be reported through an email but can only be stored in the persistent database for further consideration.
-
-On top of that the separate backend component does not have to be exposed to the outside network at all.
-
-If a job processing fails then the backend informs the API server which initiated processing of the job. If an error which is not related to job-processing occurs then the backend must communicate with a given API server which is configured by the administrator while the other API servers which are using the same backend are not informed.
+##### Backend failure reporting
+
+The backend is a separate component which does not communicate with the
+administrators directly. When it encounters an error, it stores it in a log
+file. It would be handy to inform the administrator directly at this moment so
+that he/she can fix the cause of the error as soon as possible. The backend does
+not have any mechanism for notifying users, for example via email. The API
+server, on the other hand, has email sending implemented and it can easily
+forward any messages to the administrator. A secured communication protocol
+between the backend and the frontend already exists (it is used for the
+reporting of a finished job processing) and it is easy to add another endpoint
+for bug reporting.
+
+When a request for sending a report arrives from the backend, the type of the
+report is inferred and, if it is an error which deserves the attention of the
+administrator, an email is sent to him/her. There can also be errors which are
+not that important (e.g., the backend somehow solved the problem itself or the
+report is only informative); these do not have to be reported through an email
+and can only be stored in the persistent database for further consideration.
+
+On top of that, the separate backend component does not have to be exposed to
+the outside network at all.
+
+If processing of a job fails, the backend informs the API server which initiated
+the processing of the job. If an error occurs which is not related to job
+processing, the backend must communicate with one given API server, which is
+configured by the administrator, while the other API servers which are using the
+same backend are not informed.

##### Backend state monitoring

-The next thing related to communication with the backend is monitoring its current state. This concerns namely which workers are available for processing different hardware groups and which languages can be therefore used in exercises.
+The next thing related to communication with the backend is monitoring its
+current state. This namely concerns which workers are available for processing
+different hardware groups and, therefore, which languages can be used in
+exercises.

-Another step would be the overall backend state like how many jobs were processed by some particular worker, workload of the broker and the workers, etc. The easiest solution is to manage this information by hand, every
-instance of the API server has to have an administrator which would have to fill them. This of course includes only the currently available workers and runtime environments which does not change very often. The real-time statistics of the backend cannot be made accesible this way in a reasonable way.
+Another step would be the overall backend state, like how many jobs were
+processed by some particular worker, the workload of the broker and the workers,
+etc. The easiest solution is to manage this information by hand: every instance
+of the API server has to have an administrator who would have to fill it in.
+This includes only the currently available workers and runtime environments,
+which do not change very often. The real-time statistics of the backend cannot
+be made accessible this way in a reasonable manner.

A better solution is to update this information automatically. This can be
done in two ways:

@@ -1659,9 +1723,13 @@ done in two ways:
- It can be provided by the backend on-demand if the API needs it
- The backend will send this information periodically to the API.

-Things like currently available workers or runtime environments are better to be really up-to-date so this could be provided on-demand if needed. Backend statistics are not that necessary and could be updated periodically.
+Things like currently available workers or runtime environments are better to be
+really up-to-date, so these could be provided on-demand if needed. Backend
+statistics are not that necessary and could be updated periodically.

-However due to the lack of time automatic monitoring of the backend state will not be implemented in the early versions of this project but might be implemented in some of the next releases.
+However, due to the lack of time, automatic monitoring of the backend state will
+not be implemented in the early versions of this project, but it might be
+implemented in some of the next releases.

### The WebApp

@@ -1677,34 +1745,45 @@
One of the downsides is the large number of different web browsers (including
the older versions of a specific browser) and their different interpretation of
the code (HTML, CSS, JS). Some features of the latest specifications of HTML5
are implemented in some browsers which are used by a subset of the Internet
-users. This has to be taken into account when choosing apropriate tools
+users. This has to be taken into account when choosing appropriate tools
for implementation of a website.

There are two basic ways how to create a website these days:

- **server-side approach** - user's actions are processed on the server and the
-HTML code with the results of the action is generated on the server and sent back
-to the user's Internet browser. The client does not handle any logic (apart from
-rendering of the user interface and some basic user interaction) and is therefore
-very simple. The server can use the API server for processing of the actions so
-the business logic of the server can be very simple as well. A disadvantage of
-this approach is that a lot of redundant data is transferred across the requests
-although some parts of the content can be cached (e.g., CSS files). This results
-in longer loading times of the website.
-- **server-side rendering with asynchronous updates (AJAX)** - a slightly different
-approach is to render the page on the server as in the previous case but then
-execute user's actions asynchronously using the `XMLHttpRequest` JavaScript
-functionality. Which creates a HTTP request and transfers only the part of the
-website which will be updated.
-- **Single Page Application (SPA)** - the opposite approach is to transfer the communication with the API server and the rendering of the HTML completely from the server directly to the client. The client runs the code (usually JavaScript) in his/her web browser and the content of the website is generated based on the data received from the API server. The script file is usually quite large but it can be cached and does not have to be downloaded from the server again (until the cached file expires). Only the data from the API server needs to be transfered over the Internet and thus reduce the volume of payload on each request which leads to a much more responsive user experience, especially on slower networks. Since the client-side code has full control over the UI and a more sophisticated user interactions with the UI can be achieved.
+ HTML code with the results of the action is generated on the server and sent
+ back to the user's Internet browser. The client does not handle any logic
+ (apart from rendering of the user interface and some basic user interaction)
+ and is therefore very simple. The server can use the API server for processing
+ of the actions, so the business logic of the server can be very simple as
+ well. A disadvantage of this approach is that a lot of redundant data is
+ transferred across the requests, although some parts of the content can be
+ cached (e.g., CSS files). This results in longer loading times of the website.
+- **server-side rendering with asynchronous updates (AJAX)** - a slightly
+ different approach is to render the page on the server as in the previous case
+ but then execute the user's actions asynchronously using the `XMLHttpRequest`
+ JavaScript functionality, which creates an HTTP request and transfers only the
+ part of the website which will be updated.
+- **client-side approach** - the opposite approach is to transfer the
+ communication with the API server and the rendering of the HTML completely
+ from the server directly to the client. The client runs the code (usually
+ JavaScript) in his/her web browser and the content of the website is generated
+ based on the data received from the API server. The script file is usually
+ quite large, but it can be cached and does not have to be downloaded from the
+ server again (until the cached file expires). Only the data from the API
+ server needs to be transferred over the Internet, which reduces the volume of
+ the payload on each request and leads to a much more responsive user
+ experience, especially on slower networks. The client-side code also has full
+ control over the UI, so more sophisticated user interactions with the UI can
+ be achieved.

All of these approaches are used in production by web developers and all of
them are well documented and there are mature tools for creating websites using
any of these approaches.

-We decided to use the third approach -- to create a fully client-side application
-which would be familiar and intuitive for a user who is used to modern web
-applications.
+We decided to use the third approach -- to create a fully client-side
+application which would be familiar and intuitive for a user who is used to
+modern web applications.

#### Used technologies

@@ -1738,194 +1817,626 @@ The whole project is written using the next generation of JavaScript referred to

# User documentation

-@todo: Describe different scenarios of the usage of the Web App
-
-@todo: Describe the requirements of running the web application (modern web browser, enabled CSS, JavaScript, Cookies & Local storage)
+Users interact with ReCodEx through the web application. It is required to
+use a modern web browser with good HTML5 and CSS3 support.
+Among others, cookies and local storage are used. Also, a decent JavaScript
+runtime must be provided by the browser.
+
+Supported and tested browsers are: Firefox 50+, Chrome 55+, Opera 42+ and Edge
+13+. Mobile devices often have problems with internationalization and possibly
+lack support for some common features of desktop browsers. At this stage of
+development it is not possible for us to fine tune the interface for major
+mobile browsers on all mobile platforms. However, it is confirmed to work with
+the latest Google Chrome and the Gello browser on Android 7.1+. Issues have
+been reported with Firefox and will be fixed in the future. Also, it is
+confirmed to work with the Safari browser on iOS 10.
+
+Usage of the web application is divided into sections concerning particular
+user roles. Under these sections all possible use cases can be found. The
+sections are inclusive, so more privileged users also need to read the
+instructions for all less privileged users. The described roles are:
+
+- Student
+- Group supervisor
+- Group administrator
+- Instance administrator
+- Superadministrator

## Terminology

-@todo: Describe the terminology: Instance, User, Group, Student,
-Supervisor, Admin
+**Instance** -- Represents a university, company or some other organization
+ unit. Multiple instances can exist in a single ReCodEx installation.

-## General basics
+**Group** -- A group of students to which exercises are assigned by a
+ supervisor. It should typically correspond with a real world lab group.
+
+**User** -- A person that interacts with the system using the web interface (or
+ an alternative client).
+
+**Student** -- A user with the least privileges who is subscribed to some groups
+ and submits solutions to exercise assignments.
+
+**Supervisor** -- A person responsible for assigning exercises to a group and
+ reviewing submissions.
+
+**Admin** -- A person responsible for the maintenance of the system and fixing
+ problems supervisors cannot solve.

-@todo: actions which are available for all users
+**Exercise** -- An algorithmic problem that can be assigned to a group. Exercises
+ can be shared by the teachers using an exercise database in ReCodEx.

-@todo: how to solve problems with ReCodEx, first supervisors, then administrators, etc...
+**Assignment** -- An exercise assigned to a group, possibly with modifications.
+
+**Runtime environment** -- A unique combination of a platform (OS) and a
+ programming language runtime/compiler in a specific version. Runtime
+ environments are managed by the administrators to reflect the abilities of the
+ whole system.
+
+**Hardware group** -- A set of workers with similar hardware. Its purpose is to
+ group workers that are likely to run a program using the same amount of
+ resources. Hardware groups are managed by the system administrators, who have
+ to keep them up-to-date.
+
+## General basics
+
+A description of the general basics, which are the same for all users of the
+ReCodEx web application, follows.

### First steps in ReCodEx

-You can create an account if you click on the “*Create account*” menu
-item in the left sidebar. You can choose between two types of
-registration methods – by creating a local account with a specific
-password, or pairing your new account with an existing CAS UK account.
-
-If you decide a new “*local*” account using the “*Create ReCodEx
-account*” form, you will have to provide your details and choose a
-password for your account. You will later sign in using your email
-address as your username and the password you select.
-
-If you decide to use the CAS UK, then we will verify your credentials
-and access your name and email stored in the system and create your
-account based on this information. You can change your personal
-information or email later on the “*Settings*” page.
-
-When creating your account both ways, you must select an instance your
-account will belong to by default. The instance you will select will be
-most likely your university or other organization you are a member of.
-
-To log in, go to the homepage of ReCodEx and in the left sidebar choose
-the menu item “*Sign in*”. Then you must enter your credentials into one
-of the two forms – if you selected a password during registration, then
-you should sign with your email and password in the first form called
-“*Sign into ReCodEx*”. If you registered using the Charles University
-Authentication Service (CAS), you should put your student’s number and
-your CAS password into the second form called “Sign into ReCodEx using
-CAS UK”.
+You can create an account by clicking the "Create account" menu item in the left
+sidebar. You can choose between two types of registration methods -- by creating
+a local account with a specific password, or pairing your new account with an
+existing CAS UK account.
+
+If you decide to create a new local account using the "Create ReCodEx account"
+form, you will have to provide your details and choose a password for your
+account. Although ReCodEx allows using quite weak passwords, it is wise to use a
+bit stronger one. The actual strength is shown in a progress bar near the
+password field during registration. You will later sign in using your email
+address as your username and the password you select.
+
+If you decide to use the CAS UK service, then ReCodEx will verify your CAS
+credentials and create a new account based on information stored there (name and
+email address). You can change your personal information later on the
+"Settings" page.
+
+Regardless of the desired account type, the instance the account will belong to
+must be selected. The instance will most likely be your university or another
+organization you are a member of.
+
+To log in, go to the homepage of ReCodEx and in the left sidebar choose the menu
+item "Sign in". Then you must enter your credentials into one of the two forms
+-- if you selected a password during registration, then you should sign in with
+your email and password in the first form called "Sign into ReCodEx". If you
+registered using the Charles University Authentication Service (CAS), you should
+put your student’s number and your CAS password into the second form called
+"Sign into ReCodEx using CAS UK".

There are several options you can edit in your user account:

- changing your personal information (i.e., name)
- changing your credentials (email and password)
- updating your preferences (source code viewer/editor settings, default
  language)

You can access the settings page through the "Settings" button right under
your name in the left sidebar.

-If you don’t use ReCodEx for a whole day, you will be logged out
-automatically. However, we recommend you sign out of the application
-after you finished your interaction with it. The logout button is placed
-in the top section of the left sidebar right under your name. You will
-have to expand the sidebar with a button next to the “*ReCodEx*” title
-(shown in the picture below).
+If you are not active in ReCodEx for a whole day, you will be logged out
+automatically. However, we recommend you sign out of the application after you
+finish your interaction with it. The logout button is placed in the top section
+of the left sidebar right under your name. You may need to expand the sidebar
+with a button next to the "ReCodEx" title (informally known as the _hamburger
+button_), depending on your screen size.

### Forgotten password

-If you can’t remember your password and you don’t use CAS UK
-authentication, then you can reset your password. You will find a link
-saying “*You cannot remember what your password was? Reset your
-password.*” under the sign in form. After you click on this link, you
-will be asked to submit your email address. An email with a link
-containing a special token will be sent to the address you fill in. We
-make sure that the person who requested password resetting is really
-you. When you click on the link (or you copy & paste it into your web
-browser) you will be able to select a new password for your account. The
-token is valid only for a couple of minutes, so do not forget to reset
-the password as soon as possible, or you will have to request a new link
+If you cannot remember your password and you do not use CAS UK authentication,
+then you can reset your password. You will find a link saying "Cannot remember
+what your password was? Reset your password." under the sign in form. After you
+click this link, you will be asked to submit your registration email address. A
+message with a link containing a special token will be sent to you by e-mail --
+this way we make sure that the person who requested the password reset is really
+you. When you visit the link, you will be able to enter a new password for your
+account. The token is valid only for a couple of minutes, so do not forget to
+reset the password as soon as possible, or you will have to request a new link
with a valid token.

If you sign in through CAS UK, then please follow the instructions provided by
the administrators of the service described on their website.

+### Dashboard
+
+When you log into the system you should be redirected to your "Dashboard". On
+this page you can see some brief information about the groups you are a member
+of. The information presented there varies with your role in the system -- a
+further description of the dashboard will be provided later on with the
+corresponding roles.

## Student

-@todo: describe what it means to be a “student” and what are the
-student’s rights
+Student is the default role for every newly registered user. This role has quite
+limited capabilities in ReCodEx. Generally, a student can only submit solutions
+of exercises in some particular groups. These groups should correspond to the
+courses he/she attends.
+
+On the "Dashboard" page there is a "Groups you are student of" section where you
+can find a list of your student groups. In the first column of every row there
+is a brief panel describing the respective group: it shows the name of the group
+and the percentage of points gained in the course. If you have enough points to
+successfully complete the course, this panel has a green background with a tick
+sign. In the second column there is a list of assigned exercises with their
+deadlines.
+If you want to quickly get to the group's page, you can use the provided "Show
+group's detail" button.

### Join group and start solving assignments

-@todo: How to join a specific group
+To be able to submit solutions you have to be a member of the right group. Each
+instance has its own group hierarchy, so you can choose only the groups within
+your instance. That is why a list of groups is available under the instance link
+located in the sidebar. This link brings you to the instance detail page.
+
+There you can see a description of the instance and, most importantly, the
+"Groups hierarchy" box with a hierarchical list of all public groups in the
+instance. Please note that groups with a plus sign are collapsible and can be
+further expanded. When you find a group you would like to join, continue by
+clicking the "See group's page" link followed by the "Join group" link.
+
+**Note:** Some groups can be marked as private; these groups are not visible
+ in the hierarchy and membership cannot be established by students themselves.
+ Management of students in this type of group is in the hands of supervisors.
+
+On the group detail page there are multiple interesting things for you. The
+first one is a brief overview with information describing the group, a list of
+supervisors and also the hierarchy of subgroups. Most importantly, there is the
+"Student's dashboard" section. This section contains a list of assignments and
+a list of fellow students. If the supervisors of the group allowed students to
+see each other's statistics, the number of points the students gained will also
+be shown.
+
+In the "Assignments" box on the group detail page there is a list of assigned
+exercises which students are supposed to solve. The assignments are displayed
+with their names and deadlines. There are possibly two deadlines: the first one
+means that up to this datetime the student will receive the full amount of
+points for a successful solution. The second deadline does not have to be set,
+but if it is, the maximum number of points for a successful solution submitted
+between the two deadlines can be different.
+
+An assignment link will lead you to the assignment detail page, where all known
+details about the assignment are presented. There are, of course, both
+deadlines, the limit on the number of submissions you can make and also the full
+description of the assignment, which can be localized. The localization can be
+switched on demand between all language variants in a tab-like box.
+
+Further on the page you can find the "Submitted solutions" box with a list of
+submissions and links to result details. Most importantly, there is a
+"Submit new solution" button on the assignment page which provides an interface
+for submitting a solution of the assignment.
+
+After clicking the submit button, a dialog window will show up. Here you can
+upload the files representing your solution and even add some notes to mark the
+solution. Your supervisor can also access this note. After you successfully
+upload all files necessary for your solution, click the "Submit your solution"
+button and let ReCodEx evaluate the solution.
+
+During the execution the ReCodEx backend might send the evaluation progress
+state to your browser, which will be displayed in another dialog window. When
+the whole execution is finished, a "See the results" button will appear and you
+can look at the results of your solution.
+
+On the results detail page there is a lot of information.
+Apart from the assignment description, which is not connected to your results,
+there is also the name of the solution submitter (a supervisor can submit a
+solution on your behalf), then there are the files which were uploaded on
+submission and, most importantly, the "Evaluation details" and "Test results"
+boxes.
+
+Evaluation details contain the overall results of your solution. There is
+information such as whether the solution was provided before the deadlines,
+whether the evaluation process successfully finished or whether the compilation
+succeeded. After that you can find a lot of values; the most important one is
+the last, "Total score", consisting of your score, a slash and the maximum
+number of points for this assignment. Interestingly, your score value can be
+higher than the maximum, which is caused by the "Bonus points" item above. If
+your solution is nice and the supervisor notices it, he/she can assign you
+additional points for the effort. On the other hand, points can also be
+subtracted for bad coding habits or even cheating.
+
+In the test results box there is a table of the results of all exercise tests.
+The columns represent the following information:
+
+- overall result of the test case, as a yes/no symbol
+- test case name
+- percentage of correctness of this particular test
+- evaluation status, i.e. whether the test was successfully executed or failed
+- memory limit; if the supervisor allowed it, the percentage of memory used is
+  displayed
+- time limit; if the supervisor allowed it, the percentage of time used is
+  displayed
+
+A new feature of the web application is the "Comments and notes" box, where you
+can communicate with your supervisors or just write random private notes for
+your submission. Adding a note is quite simple: you just write it into the text
+field at the bottom of the box and click the "Send" button. The button with the
+lock image underneath can switch the visibility of newly created comments.
+
+In case you think the ReCodEx evaluation of your solution is wrong, please use
+the comments system described above, or even better notify your supervisor by
+another channel (email). Unfortunately there is currently no notification
+mechanism for new comment messages.

## Group supervisor

-@todo: describe what it means to be a “supervisor” of a group and what
-are the supervisors rights
+Group supervisor is typically the lecturer of a course. A user in this role can
+modify the group description and properties, assign exercises or manage the list
+of students. Further permissions, like managing subgroups or supervisors, are
+available only to group administrators.
+
+On the "Dashboard" page you can find the "Groups you supervise" section. Here
+there are boxes representing your groups, with the list of students attending
+the course and their points. Student names are clickable and redirect to the
+user's profile, where further information about his/her assignments and
+solutions can be found. To quickly jump to a group's page, use the "Show
+group's detail" button at the bottom of the matching group box.
+
+### Manage group
+
+Locate the group you supervise and want to manage. All your supervised groups
+are available in the sidebar under the "Groups -- supervisor" collapsible menu.
+If you click on one of those, you will be redirected to the group detail page.
+In addition to basic group information you can also see the "Supervisor's
+controls" section. In this section there are lists of current students and
+assignments.
+
+As a supervisor of a group you are able to see the "Edit group settings" button
+at the top of the page. Following this link will take you to the group editing
+page with a form containing these fields:
+
+- group name which is visible to other users
+- external identification which may be used for pairing with entries in an
+  information system
+- description of the group which will be available to users in the instance (in
+  Markdown)
+- whether the group is publicly visible (and joinable by students) or private
+- options to set whether students should be able to see each other's statistics
+- minimal points threshold which students have to gain to successfully complete
+  the course
+
+After filling in all necessary fields, the form can be sent by clicking the
+"Edit group" button, and all changes will be applied.
+
+For student management there are the "Students" and "Add student" boxes. The
+first one is a simple list of all students attending the course, with the
+possibility of deleting them from the group. That can be done by hitting the
+"Leave group" button near the particular user. The second box is for adding
+students to the group. There is a text field for typing the name of the student;
+after clicking the magnifier image or pressing the enter key, a list of matched
+users will appear. At this moment just click the "Join group" button and the
+student will be added to your group.

### Assigning exercises

+Before assigning an exercise, you obviously have to know what exercises are
+available. A list of all exercises in the system can be found under the
+"Exercises" link in the sidebar. This page contains a table with exercise names,
+difficulties and the names of the exercise authors. Further information about an
+exercise is available by clicking its name.
+
+On the exercise details page there is plenty of information about it. There is a
+box with all possible localized descriptions and also a box with some additional
+information about the exercise author, its difficulty, version, etc. There is
+also a description for supervisors by the exercise author under the "Exercise
+overview" option, where some important information can be found. And most
+notably, there is information about the available programming languages for this
+exercise, under the "Supported runtime environments" section.
+
+If you decide that the exercise is suitable for one of your groups, look for the
+"Groups" box at the bottom of the page. There is a list of all groups you
+supervise, with an "Assign" button which will assign the exercise to the
+selected group.
+
+After clicking the "Assign" button you should be redirected to the assignment
+editing page. There you can find two forms: one for editing the assignment
+meta information and a second one for setting the exercise time and memory
+limits.
+
+In the meta information form you can fill in these options:
+
+- name of the assignment which will be visible in the group
+- visibility (if an assignment is under construction, you can mark it as not
+  visible and students will not see it)
+- subform for localized descriptions (a new localization can be added by
+  clicking the "Add language variant" button, the current one can be deleted
+  with the "Remove this language" button)
+    - language of the description from a dropdown field (English, Czech, German)
+    - description in the selected language
+- score configuration which will be used for the evaluation of the students'
+  solutions; you can find a very simple one already in here, and a description
+  of score configurations can be found further in the "Writing score
+  configuration" chapter
+- first submission deadline
+- maximum points that can be gained before the first deadline; if you want to
+  manage all points manually, set it to 0 and then use bonus points, which are
+  described in the next subchapter
+- second submission deadline; after it, students can still submit solutions but
+  they are given no points (it must be after the first deadline)
+- maximum points that can be gained between the first deadline and the second
+  deadline
+- submission count limit for students' solutions -- limits the number of
+  attempts a student has at solving the problem
+- visibility of memory and time ratios; if true, students can see the percentage
+  of used memory and time (with respect to the limit) for each test
+- minimum percentage of points which each submission must gain to be considered
+  correct (if it gets less, it will gain no points)
+- whether the assignment is marked as a bonus one, so that points from solving
+  it are not included in the group threshold limit (which means that solving it
+  can get you additional points over the limit)
+
+The form has to be submitted with the "Edit settings" button, otherwise the
+changes will not be saved.
+
+The same editing page also serves for modifying an existing assignment, not only
+for creating one. That is why a "Delete the assignment" box can be found at the
+bottom of the page. The "Delete" button in there can be used to unassign the
+exercise from the group.
+
+The last unexplored area is the time and memory limits form. The whole form is
+situated in a box with tabs leading to the particular runtime environments. If
+you do not wish to use one of those, locate the "Remove" button at the bottom of
+the box tab; it will delete this environment from the assignment. Please note
+that this action is irreversible.
+
+In general, every tab in the environments box contains some basic information
+about the runtime environment and another nested tabbed box. In there you can
+find all hardware groups which are available for the exercise and set limits for
+all test cases. The time limits have to be filled in seconds (float), the memory
+limits are in bytes (int). If you are interested in some reference values for a
+particular test case, you can take a peek at the collapsible "Reference
+solutions' evaluations" items. If you are satisfied with the changes you made to
+the limits, save the form with the "Change limits" button right under the
+environments box.

### Students' solutions management

+One of the most important tasks of a group supervisor is checking student
+solutions. As their automatic evaluation cannot catch all problems in the
+source code, it is advisable to do a brief manual review of the student's coding
+style and reflect that in the assignment bonus points.
+
+On the "Assignment detail" page there is a "View student results" button near
+the top of the page (next to the "Edit assignment settings" button). This will
+redirect you to a page where there is a list of boxes, one box per student. Each
+student box contains a list of submissions for this assignment. The row
+structure of the submission list is the same as the structure in the student's
+"Submitted solutions" box. More information about every solution can be shown by
+clicking the "Show details" link at the end of the solution row.
+
+This page is the same as for students, with one exception -- there is an
+additional collapsed box "Set bonus points". In the unfolded state, there is an
+input field for one number (a positive or negative integer) and the confirmation
+button "Set bonus points". After filling in the intended amount of points and
+submitting the form, the data in the "Evaluation details" box get immediately
+updated. To remove assigned bonus points, just submit zero. The bonus points are
+not additive; a newer value overrides older values.
+
+It is useful to give feedback about the solution back to the user. For this
+you can nicely use the "Comments and notes" box. Make sure that the messages are
+not private, so that the student can see them. A more detailed description of
+this box is available in the student part of the user documentation.
+
+One of the discussed concepts was marking one solution as accepted. However, due
+to the lack of frontend developers it is not yet present in the user interface.
+We hope it will be ready as soon as possible. The button for accepting a
+solution will most probably also be on this page.

### Creating exercises

+A link to exercise creation can be found in the exercises list, which is
+accessible through the "Exercises" link in the sidebar. At the bottom of the
+exercises list page you can find the "Add exercise" button, which will redirect
+you to the exercise editing page. At this moment the exercise is already
+created, so if you just leave this page, the exercise will stay in the database.
+This is also the reason why the exercise creation form is the same as the
+exercise editing form.
+
+The exercise editing page is divided into three separate forms. The first one is
+supposed to contain meta information about the exercise, the second one is used
+for uploading and managing supplementary files and the third one manages the
+runtime configurations in which the exercise can be executed.
+
+The first form is located in "Edit exercise settings" and generally contains
+meta information needed by the frontend, which is visible in various places.
+In here you can define:
+
+- exercise name which will be visible to other supervisors
+- difficulty of the exercise (easy, medium, hard)
+- description which will be available only for visitors, may be used for
+  further description of the exercise (for example information about test
+  cases and how they could be scored)
+- private/public switch; if the exercise is private then only you as the
+  author can see it, assign it or modify it
+- subform containing localized descriptions of the exercise, a new one can be
+  added with the "Add language variant" button and the current one deleted
+  with "Remove this language"
+    - language of this particular description (Czech, English, German)
+    - actual localized description of the exercise
+
+After all information is properly set, the form has to be submitted with the
+"Edit settings" button.
+
+Management of supplementary files can be found in the "Supplementary files"
+box. Supplementary files are files which you can further use in job
+configurations and which have to be provided in all runtime configurations.
+These files are uploaded directly to the fileserver, from where the workers
+can download them and use them during execution according to the job
+configuration.
+
+Files can be uploaded either by the drag and drop mechanism or by the standard
+"Add a file" button. In the opened dialog window choose the file which should
+be uploaded. All chosen files are immediately uploaded to the server, but to
+save the supplementary files list you have to hit the "Save supplementary
+files" button. All previously uploaded files are visible right under the drag
+and drop area; please note that the files are stored on the fileserver and
+cannot be deleted after upload.
+
+The last form on the exercise editing page is the runtime configurations
+editing form. An exercise can have multiple runtime configurations according
+to the number of programming languages in which it can be run. Every runtime
+configuration corresponds to one programming language, because each of them
+has to have a slightly different job configuration.
+
+A new runtime configuration can be added with the "Add new runtime
+configuration" button; this will spawn a new tab in the runtime configurations
+box. In here you can fill in the following:
+
+- human readable identifier of the runtime configuration
+- runtime environment which corresponds to the programming language
+- job configuration in YAML; a detailed description of job configurations can
+  be found further in this chapter in the "Writing job configuration" section
+
+If you are done with changes to the runtime configurations, save the form with
+the "Change runtime configurations" button. If you want to delete some
+particular runtime, just hit the "Remove" button in the right tab; please note
+that after this operation the runtime configurations form has to be saved
+again to apply the changes.
+
+All runtime configurations which were added to the exercise will be visible to
+supervisors and all of them can be used in assignments, so please make sure
+that all of the languages and job configurations are working.
+
+If you choose to delete the exercise, at the bottom of the exercise editing
+page you can find the "Delete the exercise" box where the "Delete" button is
+located. By clicking on it the exercise will be deleted from the exercises
+list and will no longer be available.
+
+### Exercise's reference solutions
+
+Each exercise should have a set of reference solutions, which are used to tune
+the time and memory limits of assignments. Values of used time and memory for
+each solution are displayed in yellow boxes under the forms for setting
+assignment limits as described earlier.
+
+However, there is currently no user interface to upload and evaluate reference
+solutions. It is possible to use direct REST API calls, but it is not very
+user friendly. If you are interested, please look at the [API
+documentation](https://recodex.github.io/api/), notably the sections
+_Uploaded-Files_ and _Reference-Exercise-Solutions_. You need to upload the
+reference solution files, create a new reference solution and then evaluate
+the solution. After that, the measured data will be available in the box on
+the assignment editing page (setting limits section).
+
+We are now working on a better user interface, which will be available soon.
+Its description will then be added here.
-@todo: How does a user become a supervisor of a group?
-@todo: How to add a specific student to a given group
+## Group administrator
-### Assigning exercises
+A group administrator is a group supervisor with some additional permissions
+in the particular group. Namely, a group administrator is capable of creating
+subgroups in the managed group and also of adding and deleting supervisors.
+There can be only one administrator of a particular group.
-@todo: Describe how to access the database of the exercises and what are
-the possibilities of assignment setup – availability, deadlines, points,
-score configuration, limits
+### Creating subgroups and managing supervisors
-@todo: How can I assign some exercises only to some students of the group? Describe how to achieve this using subgroups
+There is no special link which will get you to the groups in which you are an
+administrator. So you have to get there through the "Groups - supervisor" link
+in the sidebar and choose the right group detail page. Once you are there, you
+can see the "Administrator controls" section; here you can either add a
+supervisor to the group or create a new subgroup.
-### Students' solutions management
+A form for creating a subgroup is present right on the group detail page in
+the "Add subgroup" box. A group can be created with the following options:
-@todo Describe where all the students’ solutions for a given assignment
-can be found, where to look for all solutions of a given student, how to
-see results of a specific student’s solution’s evaluation result.
+- name which will be visible in the group hierarchy
+- external identification, which can be for instance the ID of the group in
+  the school system
+- a brief description of the group
+- allow or deny users to see each other's statistics from assignments
-@todo Can I assign points to my students’ solutions manually instead of depending on automatic scoring? If and how to change the score of a solution – assignment
-settings, setting points, bonus points, accepting a solution (*not
-implemented yet!*). Describe how the student and supervisor will still
-be able to see the percentage received from the automatic scoring, but
-the awarded points will be overridden.
+After filling in all the information a group can be created by clicking on the
+"Create new group" button. If the creation is successful then the group is
+visible in the "Groups hierarchy" box at the top of the page. All information
+filled in during creation can be modified later.
-@todo: Describe the comments thread behavior (public/private comments),
-who else can see the comments -- same as from the student perspective
+Adding a supervisor to a group is rather easy; on the group detail page there
+is an "Add supervisor" box which contains a text field. In there you can type
+the name or username of any user in the system.
+After filling in the user name, click on the magnifier image or press the
+enter key and all matching users are listed. If your chosen supervisor is in
+the updated list then just click on the "Make supervisor" button and the new
+supervisor should be successfully set.
-### Creating exercises
+Also, an existing supervisor can be removed from the group. On the group
+detail page there is a "Supervisors" box in which all supervisors of the group
+are visible. If you are the group administrator, you can see there "Remove
+supervisor" buttons right next to the supervisors' names. After clicking on
+one, that particular supervisor will no longer be a supervisor of the group.
-@todo: how to create exercise, what has to be provided during creation, who can create exercises
-@todo: Describe the form and explain the concept of reference solutions.
-How to evaluate the reference solutions for the exercise right now (to
-get the up-to-date information).
+## Instance administrator
+There can be only one instance administrator per instance. In addition to the
+previous roles this administrator should be able to modify the instance
+details, manage licences and take care of the top level groups which belong to
+the instance.
-## Group administrator
+### Instance management
-@todo: who is this?
+A list of all instances in the system can be found under the "Instances" link
+in the sidebar. On that page there is a table of instances with their
+respective admins. If you are one of them, you can visit its page by clicking
+on the instance name. On the instance details page you can find a description
+of the instance, the current groups hierarchy and a form for creating a new
+group.
-### Creating subgroups and managing supervisors
+If you want to change some of the instance settings, follow the "Edit
+instance" link on the instance details page. This will take you to the
+instance editing page with the corresponding form. In there you can fill in
+the following information:
+
+- name of the instance which will be visible to every other user
+- brief description of the instance and for whom it is intended
+- checkbox whether the instance is open or not, which means public or private
+  (hidden from potential users)
+
+When you are done with editing, save the filled-in information by clicking on
+the "Update instance" button.
+
+If you go back to the instance details page you can find there a "Create new
+group" box which is able to add a group to the instance. This form is the same
+as the one for creating a subgroup in an already existing group, so we can
+skip the description of the form fields. After successful creation of the
+group it will appear in the "Groups hierarchy" box at the top of the page.
+
+### Licences
-@todo: What it means to create a subgroup and how to do it.
+On the instance details page, there is a box "Licences". On the first line, it
+shows whether this instance currently has a valid licence or not. Then, there
+are multiple lines with all the licences assigned to this instance. Each line
+consists of a note, a validity status (if it is valid or revoked by the
+superadministrator) and the last date of licence validity.
-@todo: who can add another supervisor, what would be the rights of the
-second supervisor
+A box "Add new licence" is used for creating new licences. The required fields
+are the note and the last day of validity. It is not possible to extend a
+licence's lifetime, a new one should be generated instead. It is possible to
+have more than one valid licence at a time. Currently there is no user
+interface for revoking licences; this is done manually by the
+superadministrator.
+If an instance is to be disabled, all valid licences have to be revoked.
 
 ## Superadministrator
 
-Superadmin is user with the most priviledges and as such superadmin should be
-quite unique role. Ideally there should be only one of this kind, used with
-special caution and adequate security. With this stated it is obvious that
-superadmin can perform any action the API is capable of.
+Superadministrator is a user with the most privileges and as such superadmin
+should be quite a unique role. Ideally, there should be only one user of this
+kind, used with special caution and adequate security. With this stated it is
+obvious that superadmin can perform any action the API is capable of.
 
 ### Users management
 
-There are only few roles to which users can belong in ReCodEx. Basically there
-are only three: _student_, _supervisor_, and _superadmin_. Base role is student
-which is assigned to every registered user. Roles are stored in database
-alongside other information about user. One user always has only one role at the
-time. At first startup of ReCodEx administrator should create his account and
-then change role in database by hand. After that manual intervention into
-database should never be needed.
+There are only a few user roles in ReCodEx. Basically there are only three:
+_student_, _supervisor_, and _superadmin_. The base role is student, which is
+assigned to every registered user. Roles are stored in the database alongside
+other information about the user. One user always has only one role at a time.
+At the first startup of ReCodEx, the administrator has to change the role for
+his/her account manually in the database. After that, manual intervention into
+the database should never be needed.
 
 There is a little catch in groups and instances management. Groups can have
 admins and supervisors. This setting is valid only per one particular group
 and has to be separated from the basic role system. This implies that a
 supervisor in one group can be a student in another and simultaneously have
 the global supervisor role.
-Changing role from student to supervisor and back is done automatically by
-application and should not be managed by hand in database! Previously stated
-information can be applied to instances as well, but instances can only have
-admins.
+Changing the role from student to supervisor and back is done automatically
+when the new privileges are granted to the user, so managing roles by hand in
+the database is not needed. The previously stated information can be applied
+to instances as well, but instances can only have admins.
 
 Roles description:
 
@@ -1937,8 +2448,56 @@ Roles description:
   assigned exercises. On top of that supervisor can create/delete groups too,
   but only as subgroup of groups he/she belongs to.
 - Superadmin -- Inherits all permissions from supervisor role. Most powerful
-  user in ReCodEx who should be able to do everything which is provided by
-  application.
+  user in ReCodEx who should be able to access any functionality provided by
+  the application.
+
+
+## Writing score configuration
+
+An important thing about an assignment is how points are assigned to
+particular solutions. As mentioned previously, the whole job is composed of
+logical tests. All of these tests have to contain one essential "evaluation"
+task. An evaluation task should output one float number which can be further
+used for scoring of particular tests.
+
+The total resulting score of the student's solution is then calculated
+according to a supplied score config (described below) using the specified
+calculator.
+The total score is also a float between 0 and 1. This number is then
+multiplied by the maximum of points awarded for the assignment by the teacher
+assigning the exercise -- not the exercise author.
+
+For now, there is only one way to write a score configuration, using the
+simple score calculator. But the implementation in the API is flexible enough
+to handle upcoming score calculators which might use some more complex scoring
+algorithms. This also means that future calculators do not have to use the
+YAML format for configuration. In fact, the configuration can be a string in
+any format.
+
+### Simple score calculation
+
+The first implemented calculator is a simple score calculator with test
+weights. This calculator just looks at the score of each test and puts them
+together according to the test weights specified in the assignment
+configuration. The resulting score is calculated as the sum of products of
+score and weight of each test divided by the sum of all weights. The algorithm
+in Python would look something like this:
+
+```
+total = 0.0
+weight_sum = 0
+for t in tests:  # every test has a score (a float in [0, 1]) and a weight
+    total += t.score * t.weight
+    weight_sum += t.weight
+score = total / weight_sum
+```
+
+Sample score config in YAML format:
+
+```{.yml}
+testWeights:
+  a: 300 # test with id 'a' has a weight of 300
+  b: 200
+  c: 100
+  d: 100
+```
+
+With this sample config, a solution which fully passes tests `a` and `c`
+(score 1.0) and fails `b` and `d` (score 0.0) would receive
+(300 + 100) / (300 + 200 + 100 + 100) ≈ 0.57 of the assignment's maximum
+points.
 
 ## Writing job configuration
 
@@ -2054,6 +2613,10 @@ the output length (as long as the printing fits in the time limit).
     limits:
         - hw-group-id: group1
           chdir: ${EVAL_DIR}
+          bound-directories:
+              - src: ${SOURCE_DIR}
+                dst: ${EVAL_DIR}
+                mode: RW
           time: 0.5
           memory: 8192
 ```
@@ -2094,12 +2657,18 @@ used.
     name: "isolate"
     limits:
         - hw-group-id: group1
-          chdir: ${EVAL_DIR}
+          chdir: ${EVAL_DIR}
+          bound-directories:
+              - src: ${SOURCE_DIR}
+                dst: ${EVAL_DIR}
+                mode: RW
 ```
 
-# The Backend
+# Implementation
+
+## The backend
 
 The backend is the part which is hidden from the user and which has only
 one purpose: evaluate user’s solutions of their assignments.
 
@@ -2111,8 +2680,6 @@ one purpose: evaluate user’s solutions of their assignments.
 
 @todo: describe how the backend receives the inputs and how it communicates
 the results
 
-## Components
-
 The whole backend is not just one service/component, it is quite a complex
 system on its own.
@todo: describe the inner parts of the Backend (and refer to the Wiki @@ -2122,6 +2689,11 @@ for the technical description of the components) @todo: gets stuff done, single point of failure and center point of ReCodEx universe +@todo: what to mention: + - job scheduling, worker queues + - API notification using curl, authentication using HTTP Basic Auth + - asynchronous resending progress messages + ### Fileserver @todo: stores particular data from frontend and backend, hashing, HTTP API @@ -2129,219 +2701,16 @@ for the technical description of the components) ### Worker @todo: describe a bit of internal structure in general + - two threads + - number of ZeroMQ sockets, using it also for internal communication + - how sandboxes are fitted into worker, unix syscalls, #ifndef + - libcurl for fetchning, why not to use some object binding + - working with local filesystem, directory structure + - hardware groups in detail @todo: describe how jobs are generally executed -### Monitor - -@todo: not necessary component which can be omitted, proxy-like service - -## Backend internal communication - -@todo: internal backend communication, what communicates with what and why - -The Frontend -============ - -The frontend is the part which is visible to the user of ReCodEx and -which holds the state of the system – the user accounts, their roles in -the system, the database of exercises, the assignments of these -exercises to groups of users (i.e., students), and the solutions and -evaluations of them. - -Frontend is split into three parts: - -- the server-side REST API (“API”) which holds the business logic and - keeps the state of the system consistent - -- the relational database (“DB”) which persists the state of the - system - -- the client side application (“client”) which simplifies access to - the API for the common users - -The centerpiece of this architecture is the API. This component receives -requests from the users and from the Backend, validates them and -modifies the state of the system and persists this modified state in the -DB. - -We have created a web application which can communicate with the API -server and present the information received from the server to the user -in a convenient way. The client can be though any application, which can -send HTTP requests and receive the HTTP responses. Users can use general -applications like [cURL](https://github.com/curl/curl/), -[Postman](https://www.getpostman.com/), or create their own specific -client for ReCodEx API. - -Frontend capabilities ---------------------- - -@todo: describe what the frontend is capable of and how it really works, -what are the limitations and how it can be extended - -Terminology ------------ - -This project was created for the needs of a university and this fact is -reflected into the terminology used throughout the Frontend. A list of -important terms’ definitions follows to make the meaning unambiguous. - -### User and user roles - -*User* is a person who uses the application. User is granted access to -the application once he or she creates an account directly through the -API or the web application. There are several types of user accounts -depending on the set of permissions – a so called “role” – they have -been granted. Each user receives only the most basic set of permissions -after he or she creates an account and this role can be changed only by -the administrators of the service: - -- *Student* is the most basic role. 
Student can become member of a - group and submit his solutions to his assignments. - -- *Supervisor* can be entitled to manage a group of students. - Supervisor can assign exercises to the students who are members of - his groups and review their solutions submitted to - these assignments. - -- *Super-admin* is a user with unlimited rights. This user can perform - any action in the system. - -There are two implicit changes of roles: - -- Once a *student* is added to a group as its supervisor, his role is - upgraded to a *supervisor* role. - -- Once a *supervisor* is removed from the lasts group where he is a - supervisor then his role is downgraded to a *student* role. - -These mechanisms do not prevent a single user being a supervisor of one -group and student of a different group as supervisors’ permissions are -superset of students’ permissions. - -### Login - -*Login* is a set of user’s credentials he must submit to verify he can -be allowed to access the system as a specific user. We distinguish two -types of logins: local and external. - -- *Local login* is user’s email address and a password he chooses - during registration. - -- *External login* is a mapping of a user profile to an account of - some authentication service (e.g., [CAS](https://ldap1.cuni.cz/)). - -### Instance - -*An instance* of ReCodEx is in fact just a set of groups and user -accounts. An instance should correspond to a real entity as a -university, a high-school, an IT company or an HR agency. This approach -enables the system to be shared by multiple independent organizations -without interfering with each other. - -Usage of the system by the users of an instance can be limited by -possessing a valid license. It is up to the administrators of the system -to determine the conditions under which they will assign licenses to the -instances. - -### Group - -*Group* corresponds to a school class or some other unit which gathers -users who will be assigned the same set exercises. Each group can have -multiple supervisors who can manage the students and the list of -assignments. - -Groups can form a tree hierarchy of arbitrary depth. This is inspired by the -hierarchy of school classes belonging to the same subject over several school -years. For example, there can be a top level group for a programming class that -contains subgroups for every school year. These groups can then by divided into -actual student groups with respect to lab attendance. Supervisors can create -subgroups of their groups and further manage these subgroups. - -### Exercise - -*An exercise* consists of textual assignment of a task and a definition -of how a solution to this exercise should be processed and evaluated in -a specific runtime environment (i.e., how to compile a submitted source -code and how to test the correctness of the program). It is a template -which can be instantiated as an *assignment* by a supervisor of a group. - -### Assignment - -An assignment is an instance of an *exercise* assigned to a specific -*group*. An assignment can modify the text of the task assignment and it -has some additional information which is specific to the group (e.g., a -deadline, the number of points gained for a correct solution, additional -hints for the students in the assignment). The text of the assignment -can be edited and supervisors can translate the assignment into another -language. - -### Solution - -*A solution* is a set of files which a user submits to a given -*assignment*. 
- -### Submission - -*A submission* corresponds to a *solution* being evaluated by the -Backend. A single *solution* can be submitted repeatedly (e.g., when the -Backend encounters an error or when the supervisor changes the assignment). - -### Evaluation - -*An evaluation* is the processed report received from the Backend after -a *submission* is processed. Evaluation contains points given to the -user based on the quality of his solution measured by the Backend and -the settings of the assignment. Supervisors can review the evaluation -and add bonus points (both positive and negative) if the student -deserves some. - -### Runtime environment - -*A runtime environment* defines the used programming language or tools -which are needed to process and evaluate a solution. Examples of a -runtime environment can be: - -- *Linux + GCC* -- *Linux + Mono* -- *Windows + .NET 4* -- *Bison + Yacc* - -### Limits - -A correct *solution* of an *assignment* has to pass all specified tests (mostly -checks that it yields the correct output for various inputs) and typically must -also be effective in some sense. The Backend measures the time and memory -consumption of the solution while running. This consumption of resources can be -*limited* and the solution will receive fewer points if it exceeds the given -limits in some test cases defined by the *exercise*. - -User management ---------------- - -@todo: roles and their rights, adding/removing different users, how the -role of a specific user changes - -Instances and hierarchy of groups ---------------------------------- - -@todo: What is an instance, how to create one, what are the licenses and -how do they work. Why can the groups form hierarchies and what are the -benefits – what it means to be an admin of a group, hierarchy of roles -in the group hierarchy. - -Exercises database ------------------- - -@todo: How the exercises are stored, accessed, who can edit what - -### Creating a new exercise - -@todo Localized assignments, default settings - -### Runtime environments and hardware groups - -@todo read this later and see if it still makes sense +#### Runtime environments ReCodEx is designed to utilize a rather diverse set of workers -- there can be differences in many aspects, such as the actual hardware running the worker @@ -2365,69 +2734,28 @@ However, limits can differ between runtime environments -- formally speaking, limits are a function of three arguments: an assignment, a hardware group and a runtime environment. -### Reference solutions - -@todo: how to add one, how to evaluate it - -The task of determining appropriate resource limits for exercises is difficult -to do correctly. To aid exercise authors and group supervisors, ReCodEx supports -assigning reference solutions to exercises. Those are example programs that -should cover the main approaches to the implementation. For example, searching -for an integer in an ordered array can be done with a linear search, or better, -using a binary search. - -Reference solutions can be evaluated on demand, using a selected hardware group. -The evaluation results are stored and can be used later to determine limits. In -our example problem, we could configure the limits so that the linear -search-based program doesn't finish in time on larger inputs, but a binary -search does. - -Note that separate reference solutions should be supplied for all supported -runtime environments. 
- -### Exercise assignments - -@todo: Creating instances of an exercise for a specific group of users, -capabilities of settings. Editing limits according to the reference -solution. - -Evaluation process ------------------- - -@todo: How the evaluation process works on the Frontend side. - -### Uploading files and file storage - -@todo: One by one upload endpoint. Explain different types of the -Uploaded files. - -### Automatic detection of the runtime environment - -@todo: Users must submit correctly named files – assuming the RTE from -the extensions. - -REST API implementation ------------------------ - -@todo: What is the REST API, what are the basic principles – GET, POST, -Headers, JSON. +### Monitor -### Authentication and authorization scopes +@todo: not necessary component which can be omitted, proxy-like service -@todo: How authentication works – signed JWT, headers, expiration, -refreshing. Token scopes usage. +### Cleaner -### HTTP requests handling +@todo: if it is something what to say here -@todo: Router and routes with specific HTTP methods, preflight, required -headers +## The frontend -### HTTP responses format +### REST API -@todo: Describe the JSON structure convention of success and error -responses +@todo: what to mention + - basic - GET, POST, JSON, Header, ... + - endpoint structure, Swager UI + - handling requests, preflight, checking roles with annotation + - Uploading files and file storage - one by one upload endpoint. Explain + different types of the Uploaded files. + - Automatic detection of the runtime environment - users must submit + correctly named files, assuming the RTE from the extensions -### Used technologies +#### Used technologies @todo: PHP7 – how it is used for typehints, Nette framework – how it is used for routing, Presenters actions endpoints, exceptions and @@ -2437,7 +2765,7 @@ problem with the extension and how we reported it and how to treat it in the future when the bug is solved. Relational database – we use MariaDB, Doctine enables us to switch the engine to a different engine if needed -### Data model +#### Data model @todo: Describe the code-first approach using the Doctrine entities, how the entities map onto the database schema (refer to the attached schemas @@ -2445,7 +2773,7 @@ of entities and relational database models), describe the logical grouping of entities and how they are related: - user + settings + logins + ACL -- instance + licenses + groups + group membership +- instance + licences + groups + group membership - exercise + assignments + localized assignments + runtime environments + hardware groups - submission + solution + reference solution + solution evaluation @@ -2456,64 +2784,315 @@ grouping of entities and how they are related: @todo: Tell the user about the generated API reference and how the Swagger UI can be used to access the API directly. -Web Application ---------------- +### Web application -@todo: What is the purpose of the web application and how it interacts -with the REST API. +@todo: what to mention: + - used libraries, JSX, ... + - usage in user doc + - server side rendering + - maybe more ... -### Used technologies +## Communication protocol -@todo: Briefly introduce the used technologies like React, Redux and the -build process. For further details refer to the GitHub wiki +Detailed communication inside the ReCodEx system is captured in the following +image and described in sections below. Red connections are through ZeroMQ +sockets, blue are through WebSockets and green are through HTTP(S). 
All ZeroMQ
+messages are sent as multipart messages with one string (command or option)
+per part, with no empty frames (unless explicitly specified otherwise).
-### How to use the application
-
-@todo: Describe the user documentation and the FAQ page.
-
-Backend-Frontend communication protocol
-=======================================
-
-@todo: describe the exact methods and respective commands for the
-communication
-
-Initiation of a job evaluation
------------------------------
-
-@todo: How does the Frontend initiate the evaluation and how the Backend
-can accept it or decline it
-
-Job processing progress monitoring
----------------------------------
-
-When evaluating a job the worker sends progress messages on predefined points of
-evaluation chain. The sending place can be on very beginning of the job, when
-submit archive is downloaded or at the end of each simple task with its state
-(completed, failed, skipped). These messages are sent to broker through existing
-ZeroMQ connection. Detailed format of messages can be found on [communication
-page](https://github.com/ReCodEx/wiki/wiki/Overall-architecture#commands-from-worker-to-broker).
-
-Broker only resends received progress messages to the monitor component via
-ZeroMQ socket. The output message format is the same as the input format.
-
-Monitor parses received messages to JSON format, which is easy to work with in
-JavaScript inside web application. All messages are cached (one queue per job)
-and can be obtained multiple times through WebSocket communication channel. The
-cache is cleared 5 minutes after receiving last message.
-
-Publishing of the results
-------------------------
 
![Communication schema](https://github.com/ReCodEx/wiki/raw/master/images/Backend_Connections.png)
 
-After job finish the worker packs results directory into single archive and
-uploads it to the fileserver through HTTP protocol. The target URL is obtained
-from API in headers on job initiation. Then "job done" notification request is
-performed to API via broker. Special submissions (reference or asynchronous
-submissions) are loaded immediately, other types are loaded on-demand on first
-results request.
-Loading results means fetching archive from fileserver, parsing the main YAML
-file generated by worker and saving data to the database. Also, points are
-assigned by score calculator.
 
+### Broker - Worker communication
+
+The broker acts as a server when communicating with workers. The listening IP
+address and port are configurable, the protocol family is TCP. The worker
+socket is of the DEALER type, the broker one is of the ROUTER type. Because of
+that, the very first part of every (multipart) message from the broker to a
+worker must be the target worker's socket identity (which is saved on its
+**init** command).
+
+#### Commands from broker to worker:
+
+- **eval** -- evaluate a job. Requires 3 message frames:
+    - `job_id` -- identifier of the job (in ASCII representation -- we avoid
+      endianness issues and also support alphabetic ids)
+    - `job_url` -- URL of the archive with the job configuration and submitted
+      source code
+    - `result_url` -- URL where the results should be stored after evaluation
+- **intro** -- introduce yourself to the broker (with the **init** command) --
+  this is required when the broker loses track of the worker who sent the
+  command. Possible reasons for such an event are e.g. that one of the
+  communicating sides shut down and restarted without the other side noticing.
+- **pong** -- reply to the **ping** command, no arguments
+
+#### Commands from worker to broker:
+
+- **init** -- introduce self to the broker. Useful on startup or after
+  reestablishing a lost connection. Requires at least 2 arguments:
+    - `hwgroup` -- hardware group of this worker
+    - `header` -- additional header describing worker capabilities. The format
+      must be `header_name=value`, every header shall be in a separate message
+      frame. There is no limit on the number of headers. There is also an
+      optional third argument -- additional information. If present, it
+      should be separated from the headers with an empty frame. The format is
+      the same as for headers. Supported keys for additional information are:
+        - `description` -- a human readable description of the worker for
+          administrators (it will show up in broker logs)
+        - `current_job` -- an identifier of a job the worker is now
+          processing. This is useful when we are reassembling a connection to
+          the broker and need it to know the worker will not accept a new job.
+- **done** -- notification of a finished job. Contains the following message
+  frames:
+    - `job_id` -- identifier of the finished job
+    - `result` -- response result, possible values are:
+        - OK -- evaluation finished successfully
+        - FAILED -- the job failed and cannot be reassigned to another worker
+          (e.g. due to an error in the configuration)
+        - INTERNAL_ERROR -- the job failed due to an internal worker error,
+          but another worker might be able to process it (e.g. downloading a
+          file failed)
+    - `message` -- a human readable error message
+- **progress** -- notification about the current evaluation progress. Contains
+  the following message frames:
+    - `job_id` -- identifier of the current job
+    - `command` -- what is happening now.
+        - DOWNLOADED -- submission successfully fetched from the fileserver
+        - FAILED -- something bad happened and the job was not executed at all
+        - UPLOADED -- results are uploaded to the fileserver
+        - STARTED -- evaluation of tasks started
+        - ENDED -- evaluation of tasks is finished
+        - ABORTED -- evaluation of the job encountered an internal error, the
+          job will be rescheduled to another worker
+        - FINISHED -- the whole execution is finished and the worker is ready
+          for another job execution
+        - TASK -- task state changed -- see below
+    - `task_id` -- only present for the "TASK" state -- identifier of the task
+      in the current job
+    - `task_state` -- only present for the "TASK" state -- result of the task
+      evaluation. One of:
+        - COMPLETED -- the task was successfully executed without any error,
+          the subsequent task will be executed
+        - FAILED -- the task ended up with some error, the subsequent task
+          will be skipped
+        - SKIPPED -- some of the previous dependencies failed to execute, so
+          this task will not be executed at all
+- **ping** -- tell the broker I am alive, no arguments
+
+
+#### Heartbeating
+
+It is important for the broker and workers to know if the other side is still
+working (and connected). This is achieved with a simple heartbeating protocol.
+
+The protocol requires the workers to send a **ping** command regularly (the
+interval is configurable on both sides -- future releases might let the worker
+send its ping interval with the **init** command). Upon receiving a **ping**
+command, the broker responds with **pong**.
+
+Whenever a heartbeating message does not arrive, a counter called _liveness_
+is decreased. When this counter drops to zero, the other side is considered
+disconnected. When a message arrives, the liveness counter is set back to its
+maximum value, which is configurable for both sides.
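+
+A minimal sketch of this liveness bookkeeping on the broker's side follows. It
+is written in Python only for brevity (the real broker is a C++ application),
+and all names and values in it are illustrative, not taken from the actual
+code:
+
+```
+PING_INTERVAL = 1.0   # seconds between expected pings (configurable)
+MAX_LIVENESS = 10     # e.g. the max_liveness value from the broker config
+
+liveness = {}         # worker identity -> remaining liveness
+
+def reschedule_jobs(worker_id):
+    pass  # hand the worker's unfinished jobs to other workers (see below)
+
+def message_received(worker_id):
+    # any message from the worker (ping or otherwise) proves it is alive
+    liveness[worker_id] = MAX_LIVENESS
+
+def tick():
+    # called by the broker's event loop once per PING_INTERVAL
+    for worker_id in list(liveness):
+        liveness[worker_id] -= 1
+        if liveness[worker_id] <= 0:
+            del liveness[worker_id]
+            reschedule_jobs(worker_id)
+```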
+
+When the broker decides a worker has disconnected, it tries to reschedule its
+jobs to other workers.
+
+If a worker thinks the broker crashed, it tries to reconnect periodically,
+with a bounded, exponentially increasing delay.
+
+This protocol proved very robust in real world testing, so the whole backend
+is reliable and can outlive short term connection issues without problems.
+Also, the increasing reconnection delay does not flood the network when there
+are problems. We have experienced no issues since we started using this
+protocol.
+
+### Worker - File Server communication
+
+The worker communicates with the file server only from the _execution thread_.
+The supported protocol is HTTP, optionally with SSL encryption
+(**recommended**). If supported by the server and the used version of libcurl,
+the HTTP/2 standard is also available. The file server should be set up to
+require basic HTTP authentication and the worker is capable of sending the
+corresponding credentials with each request.
+
+#### Worker side
+
+Workers communicate with the file server in both directions -- they download
+students' submissions and then upload evaluation results. Internally, the
+worker uses the libcurl C library with a very similar setup for both cases. It
+can verify the HTTPS certificate (on Linux against the system certificate
+list, on Windows against one downloaded from the cURL website during
+installation), supports basic HTTP authentication, offers HTTP/2 with a
+fallback to HTTP/1.1 and fails on error (returned HTTP status code is >= 400).
+The worker has a list of credentials for all available file servers in its
+config file.
+
+- download file -- standard HTTP GET request to the given URL expecting the
+  file content as the response
+- upload file -- standard HTTP PUT request to the given URL with the file data
+  as the body -- the same as the command line tool `curl` with the option
+  `--upload-file`
+
+#### File server side
+
+The file server has its own internal directory structure, where all the files
+are stored. It provides a simple REST API to get them or create new ones. The
+file server does not provide authentication or a secured connection by itself,
+but it is supposed to run as a WSGI script inside a web server (like Apache)
+with a proper configuration. Relevant commands for communication with workers:
+
+- **GET /submission_archives/\<id\>.\<ext\>** -- gets an archive with the
+  submitted source code and the corresponding configuration of this job
+  evaluation
+- **GET /exercises/\<hash\>** -- gets a file, common usage is for input files
+  or reference result files
+- **PUT /results/\<id\>.\<ext\>** -- upload an archive with evaluation results
+  under the specified name (should be the same _id_ as the name of the
+  submission archive). On successful upload returns JSON `{ "result": "OK" }`
+  as the body of the returned page.
+
+If not specified otherwise, the `zip` format of archives is used. The symbol
+`/` in the API description is the root of the file server's domain. If the
+domain is for example `fs.recodex.org` with SSL support, getting an input file
+for one task could look like a GET request to
+`https://fs.recodex.org/tasks/8b31e12787bdae1b5766ebb8534b0adc10a1c34c`.
+
+
+### Broker - Monitor communication
+
+The broker communicates with the monitor also through ZeroMQ over the TCP
+protocol. The socket type is the same on both sides, ROUTER. The monitor is
+set to act as a server in this communication, its IP address and port are
+configurable in the monitor's config file. The ZeroMQ socket ID (set on the
+monitor's side) is "recodex-monitor" and must be sent as the first frame of
+every multipart message -- see the ZeroMQ ROUTER socket documentation for
+more info.
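+
+For illustration, a connecting peer might push a progress message to the
+monitor like this. The sketch uses Python with the pyzmq binding purely for
+brevity (the actual broker is written in C++), and the address and the frame
+contents are made up, not taken from a real deployment:
+
+```
+import time
+import zmq
+
+context = zmq.Context()
+socket = context.socket(zmq.ROUTER)
+socket.connect("tcp://127.0.0.1:7894")  # monitor address and port
+time.sleep(0.1)  # give the handshake a moment so the identity is routable
+
+# The first frame routes the message to the peer whose socket ID is
+# "recodex-monitor"; the rest is one string per frame, as described above.
+socket.send_multipart([
+    b"recodex-monitor",
+    b"progress",        # command
+    b"eval_job_42",     # hypothetical job (channel) id
+    b"TASK",
+    b"tA",              # task id
+    b"COMPLETED",       # task state
+])
+```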
+
+Note that the monitor is designed so that it can receive data both from the
+broker and from workers. The current architecture prefers the broker to do all
+the communication so that the workers do not have to know too many network
+services.
+
+The monitor is treated as a somewhat optional part of the whole solution, so
+no special effort on communication reliability was made.
+
+#### Commands from monitor to broker:
+
+Because there is no need for the monitor to communicate with the broker, there
+are no commands so far. Any message from the monitor to the broker is logged
+and discarded.
+
+#### Commands from broker to monitor:
+
+- **progress** -- notification about progress with job evaluation. This
+  communication is only redirected as is from the worker, more info can be
+  found in the "Broker - Worker communication" chapter above.
+
+
+### Broker - Web API communication
+
+The broker communicates with the main REST API through a ZeroMQ connection
+over TCP. The socket type on the broker side is ROUTER, on the frontend part
+it is DEALER. The broker acts as a server, its IP address and port are
+configurable in the API.
+
+#### Commands from API to broker:
+
+- **eval** -- evaluate a job. Requires at least 4 frames:
+    - `job_id` -- identifier of this job (in ASCII representation -- we avoid
+      endianness issues and also support alphabetic ids)
+    - `header` -- additional header describing worker capabilities. The format
+      must be `header_name=value`, every header shall be in a separate message
+      frame. There is no maximum limit on the number of headers. There may
+      also be no headers at all. A worker is considered suitable for the job
+      if and only if it satisfies all of its headers.
+    - empty frame -- a frame which contains only an empty string and serves
+      only as a separator after the headers
+    - `job_url` -- URI location of the archive with the job configuration and
+      submitted source code
+    - `result_url` -- remote URI where the results will be pushed to
+
+#### Commands from broker to API (all are responses to the **eval** command):
+
+- **ack** -- this is the first message which is sent back to the frontend
+  right after the **eval** command arrives; it basically means "Hi, I am all
+  right and am capable of receiving job requests". After sending this, the
+  broker will try to find an acceptable worker for the arrived request.
+- **accept** -- the broker is capable of routing the request to a worker
+- **reject** -- the broker cannot handle this job (for example when the
+  requirements specified by the headers cannot be met). There are (rare) cases
+  when the broker finds that it cannot handle the job after it was confirmed.
+  In such cases it uses the frontend REST API to mark the job as failed.
+
+
+#### Asynchronous communication between broker and API
+
+Only a fraction of the errors that can happen during evaluation can be
+detected while there is a ZeroMQ connection between the API and the broker. To
+notify the frontend of the rest, we need an asynchronous communication channel
+that can be used by the broker when the status of a job changes (it is
+finished, it failed permanently, the only worker capable of processing it
+disconnected...).
+
+This functionality is supplied by the `broker-reports/` API endpoint group --
+see its documentation for more details.
+
+### File Server - Web API communication
+
+The file server has a REST API for interaction with other parts of ReCodEx.
+The communication with workers is described in the "Worker - File Server
+communication" chapter above.
+On top of that, there are other commands for interaction with the API:
+
+- **GET /results/\<id\>.\<ext\>** -- download an archive with the evaluated
+  results of the job _id_
+- **POST /submissions/\<id\>** -- upload a new submission with identifier
+  _id_. Expects that the body of the POST request uses file paths as keys and
+  the content of the files as values. On successful upload returns JSON
+  `{ "archive_path": <archive_path>, "result_path": <result_path> }` in the
+  response body. From _archive_path_ the submission can be downloaded (by a
+  worker) and the corresponding evaluation results should be uploaded to
+  _result_path_.
+- **POST /tasks** -- upload new files, which will be available under names
+  equal to the `sha1sum` of their content. Multiple files can be uploaded at
+  once. On successful upload returns JSON
+  `{ "result": "OK", "files": <file_list> }` in the response body, where
+  _file_list_ is a dictionary with the original file names as keys and the new
+  URLs with the hashed names as values.
+
+There are no plans yet to support deleting files from this API. This may
+change in time.
+
+The web API calls these fileserver endpoints with standard HTTP requests.
+There are no special commands involved. There is no communication in the
+opposite direction.
+
+### Monitor - Web app communication
+
+The monitor interacts with the web application through a WebSocket connection.
+The monitor acts as a server and the browsers connect to it. The IP address
+and port are configurable. When a client connects to the monitor, it sends a
+message with the string representation of a channel id (i.e., which messages
+it is interested in, usually the id of the job being evaluated). There can be
+multiple listeners per channel, and even (shortly) delayed connections will
+receive all messages from the very beginning.
+
+When the monitor receives a **progress** message from the broker there are two
+options:
+
+- there is no WebSocket connection for the listed channel (job id) -- the
+  message is dropped
+- there is an active WebSocket connection for the listed channel -- the
+  message is parsed into JSON format (see below) and sent as a string to that
+  established channel. Messages for active connections are queued, so no
+  messages are discarded even on heavy workload.
+
+The message from the monitor to the web application is in JSON format and it
+has the form of a dictionary (associative array). The information contained in
+this message should correspond with the information given by the worker to the
+broker. For further description please read more in the "Broker - Worker
+communication" chapter under the "progress" command.
+
+Message format:
+
+- **command** -- type of progress, one of: DOWNLOADED, FAILED, UPLOADED,
+  STARTED, ENDED, ABORTED, FINISHED, TASK
+- **task_id** -- id of the currently evaluated task. Present only if
+  **command** is "TASK".
+- **task_state** -- state of the task with id **task_id**. Present only if
+  **command** is "TASK". Value is one of "COMPLETED", "FAILED" and "SKIPPED".
+
+### Web app - Web API communication
+
+The provided web application runs as a JavaScript client inside the user's
+browser. It communicates with the REST API on the server through standard HTTP
+requests. Documentation of the main REST API is in a separate
+[document](https://recodex.github.io/api/) due to its extensiveness. Results
+are returned as a JSON payload, which is simply parsed in the web application
+and presented to the users.
+
diff --git a/Web-API.md b/Web-API.md
index e0e8195..7202516 100644
--- a/Web-API.md
+++ b/Web-API.md
@@ -88,171 +88,3 @@ both our internal login service and CAS.
An advantage of this approach is being able control the authentication process completely instead of just receiving session data through a global variable. -## Installation - -The web API requires a PHP runtime version at least 7. Which one depends on actual configuration, there is a choice between _mod_php_ inside Apache, _php-fpm_ with Apache or Nginx proxy or running it as standalone uWSGI script. It is common that there are some PHP extensions, that have to be installed on the system. Namely ZeroMQ binding (`php-zmq` package or similar), MySQL module (`php-mysqlnd` package) and ldap extension module for CAS authentication (`php-ldap` package). Make sure that the extensions are loaded in your `php.ini` file (`/etc/php.ini` or files in `/etc/php.d/`). - -The API depends on some other projects and libraries. For managing them [Composer](https://getcomposer.org/) is used. It can be installed from system repositories or downloaded from the website, where detailed instructions are as well. Composer reads `composer.json` file in the project root and installs dependencies to the `vendor/` subdirectory. To do that, run: -``` -$ composer install -``` - -## Configuration and usage - -The API can be configured in `config.neon` and `config.local.neon` files in `app/config` directory. The first file is predefined by authors and should not be modified. The second one is not present and could be created by copying `config.local.neon.example` template in the config directory. Local configuration have higher precedence, so it will override default values from `config.neon`. - -### Configurable items - -Description of configurable items. All timeouts are in milliseconds if not stated otherwise. - -- accessManager -- configuration of access token in [JWT standard](https://www.rfc-editor.org/rfc/rfc7519.txt). Do **not** modify unless you really know what are you doing. 
-- fileServer -- connection to fileserver - - address -- URI of fileserver - - auth -- _username_ and _password_ for HTTP basic authentication - - timeouts -- _connection_ timeout for establishing new connection and _request_ timeout for completing one request -- broker -- connection to broker - - address -- URI of broker - - auth -- _username_ and _password_ for broker callback authentication back to API - - timeouts -- _ack_ timeout for first response that broker receives the message, _send_ timeout how long try to send new job to the broker and _result_ timeout how long to wait for confirmation if job can be processed or not -- monitor -- connection to monitor - - address -- URI of monitor -- CAS -- CAS external authentication - - serviceId -- visible identifier of this service - - ldapConnection -- parameters for connecting to LDAP, _hostname_, _base_dn_, _port_, _security_ and _bindName_ - - fields -- names of LDAP keys for informations as _email_, _firstName_ and _lastName_ -- emails -- common configuration for sending email (addresses and template variables) - - apiUrl -- base URL of API server including port (for referencing pictures in messages) - - footerUrl -- link in the message footer - - siteName -- name of frontend (ReCodEx, or KSP for unique instance for KSP course) - - githubUrl -- URL to GitHub repository of this project - - from -- sending email address -- failures -- admin messages on errors - - emails -- additional info for sending mails, _to_ is admin mail address, _from_ is source address, _subjectPrefix_ is prefix of mail subject -- forgottenPassword -- user messages for changing passwords - - redirectUrl -- URL of web application where the password can be changed - - tokenExpiration -- expiration timeout of temporary token (in seconds) - - emails -- additional info for sending mails, _from_ is source address and _subjectPrefix_ is prefix of mail subject -- mail -- configuration of sending mails - - smtp -- using SMTP server, have to be "true" - - host -- address of the server - - port -- sending port (common values are 25, 465, 587) - - username -- login to the server - - password -- password to the server - - secure -- security, values are empty for no security, "ssl" or "tls" - - context -- additional parameters, depending on used mail engine. For examle self-signed certificates can be allowed as _verify_peer_ and _verify_peer_name_ to false and _allow_self_signed_ to true under _ssl_ key (see example). - -Outside the parameters section of configuration is configuration for Doctrine. It is ORM framework which maps PHP objects (entities) into database tables and rows. The configuration is simple, required items are only _user_, _password_ and _host_ with _dbname_, i.e. address of database computer (mostly localhost) with name of ReCodEx database. 
- -### Example local configuration file - -```{.yml} -parameters: - accessManager: - leeway: 60 - issuer: https://recodex.projekty.ms.mff.cuni.cz - audience: https://recodex.projekty.ms.mff.cuni.cz - expiration: 86400 # 24 hours in seconds - usedAlgorithm: HS256 - allowedAlgorithms: - - HS256 - verificationKey: "recodex-123" - fileServer: - address: http://127.0.0.1:9999 - auth: - username: "user" - password: "pass" - timeouts: - connection: 500 - broker: - address: tcp://127.0.0.1:9658 - auth: - username: "user" - password: "pass" - timeouts: - ack: 100 - send: 5000 - result: 1000 - monitor: - address: wss://recodex.projekty.ms.mff.cuni.cz:4443/ws - CAS: - serviceId: "cas-uk" - ldapConnection: - hostname: "ldap.cuni.cz" - base_dn: "ou=people,dc=cuni,dc=cz" - port: 389 - security: SSL - bindName: "cunipersonalid" - fields: - email: "mail" - firstName: "givenName" - lastName: "sn" - emails: - apiUrl: https://recodex.projekty.ms.mff.cuni.cz:4000 - footerUrl: https://recodex.projekty.ms.mff.cuni.cz - siteName: "ReCodEx" - githubUrl: https://github.com/ReCodEx - from: "ReCodEx " - failures: - emails: - to: "Admin Name " - from: %emails.from% - subjectPrefix: "ReCodEx Failure Report - " - forgottenPassword: - redirectUrl: "https://recodex.projekty.ms.mff.cuni.cz/ - forgotten-password/change" - tokenExpiration: 600 # 10 minues - emails: - from: %emails.from% - subjectPrefix: "ReCodEx Forgotten Password Request - " - mail: - smtp: true - host: "smtp.ps.stdin.cz" - port: 587 - username: "user" - password: "pass" - secure: "tls" - context: - ssl: - verify_peer: false - verify_peer_name: false - allow_self_signed: true -doctrine: - user: "user" - password: "pass" - host: localhost - dbname: "recodex-api" -``` - -### Database preparation - -When the API is installed and configured (_doctrine_ section is sufficient here) the database schema can be generated. There is a prepared command to do that from command line: - -``` -$ php www/index.php orm:schema-tool:update --force -``` - -With API comes some initial values, for example default user roles with proper permissions. To fill your database with these values there is another command line command: - -``` -$ php www/index.php db:fill -``` - -Check the outputs of both commands for errors. If there are any, try to clean temporary API cache in `temp/cache/` directory and repeat the action. - - -### Webserver configuration - -The simplest way to get started is to start the built-in PHP server in the root directory of your project: - -``` -$ php -S localhost:4000 -t www -``` - -Then visit `http://localhost:4000` in your browser to see the welcome page of API project. - -For Apache or Nginx, setup a virtual host to point to the `www/` directory of the project and you should be ready to go. It is **critical** that whole `app/`, `log/` and `temp/` directories are not accessible directly via a web browser (see [security warning](https://nette.org/security-warning)). Also it is **highly recommended** to set up a HTTPS certificate for public access to the API. - -### Troubleshooting - -In case of any issues first remove the Nette cache directory `temp/cache/` and try again. This solves most of the errors. If it does not help, examine API logs from `log/` directory of the API source or logs of your webserver. 
- diff --git a/Web-application.md b/Web-application.md index 5b80dbe..84e2eb9 100644 --- a/Web-application.md +++ b/Web-application.md @@ -153,61 +153,3 @@ $ npm run exportStrings ``` This will create *JSON* files with the exported strings for the *'en'* and *'cs'* locale. If you want to export strings for more languages, you must edit the `/manageTranslations.js` script. The exported strings are placed in the `/src/locales` directory. -## Installation - -Web application requires [NodeJS](https://nodejs.org/en/) server as its runtime environment. This runtime is needed for executing JavaScript code on server and sending the pre-render parts of pages to clients, so the final rendering in browsers is a lot quicker and the page is accessible to search engines for indexing. - -But some functionality is better in other full fledged web servers like *Apache* or *Nginx*, so the common practice is to use a tandem of both. *NodeJS* takes care of basic functionality of the app while the other server (Apache) is set as reverse proxy and providing additional functionality like SSL encryption, load balancing or caching of static files. The recommended setup contains both NodeJS and one of Apache and Nginx web servers for the reasons discussed above. - -Stable versions of 4th and 6th series of NodeJS server are sufficient, using at least 6th series is highly recommended. Please check the most recent version of the packages in your distribution's repositories, there are often outdated ones. However, there are some third party repositories for all main Linux distributions. - -The app depends on several libraries and components, all of them are listed in `package.json` file in source repository. For managing dependencies is used node package manager (`npm`), which can come with NodeJS installation otherwise can be installed separately. To fetch and install all dependencies run: - -``` -$ npm install -``` - -For easy production usage there is an additional package for managing NodeJS processes, `pm2`. This tool can run your application as a daemon, monitor occupied resources, gather logs and provide simple console interface for managing app's state. To install it globally into your system run: -``` -# npm install pm2 -g -``` - -## Configuration and usage - -The application can be run in two modes, development and production. Development mode uses only client rendering and tracks code changes with rebuilds of the application in real time. In production mode the compilation (transpile to _ES5_ standard using *Babel* and bundle into single file using *webpack*) has to be done separately prior to running. The scripts for compilation are provided as additional `npm` commands. - -- Development mode can be use for local testing of the app. This mode uses webpack dev server, so all code runs on a client, there is no server side rendering available. Starting is simple command, default address is http://localhost:8080. -``` -$ npm run dev -``` -- Production mode is mostly used on the servers. It provides all features such as server side rendering. This can be run via: -``` -$ npm run build -$ npm start -``` - -Both modes can be configured to use different ports or set base address of used API server. This can be configured in `.env` file in root of the repository. There is `.env-sample` file which can be just copied and altered. - -The production mode can be run also as a demon controled by `pm2` tool. First the web application has to be built and then the server javascript file can run as a daemon. 
-```
-$ npm run build
-$ pm2 start bin/server.js
-```
-
-The `pm2` tool has several options, most notably _status_, _stop_, _restart_ and _logs_. A further description is available on the project [website](http://pm2.keymetrics.io).
-
-#### Configurable items
-
-Description of the configurable options. Required values are bold, optional ones are in italics.
-
-- **NODE_ENV** -- mode of the server
-- **API_BASE** -- base address of the API server, including port and API version
-- **PORT** -- port where the app is listening
-- _WEBPACK_DEV_SERVER_PORT_ -- port for the webpack dev server when running in development mode. The default is 8081; this option might be useful when that port is needed by some other service.
-
-#### Example configuration file
-```
-NODE_ENV=production
-API_BASE=https://recodex.projekty.ms.mff.cuni.cz:4000/v1
-PORT=8080
-```
diff --git a/Worker.md b/Worker.md
index bc4d8a6..1e55aa8 100644
--- a/Worker.md
+++ b/Worker.md
@@ -82,245 +82,6 @@ Isolate is executed in separate Linux process created by `fork` and `exec` syste
A sandbox in general has to be a command-line application taking parameters with arguments, standard input or files. Outputs should be written to files or standard output. There are no other requirements; the worker design is very versatile and can be adapted to different needs.
-## Installation
-
-### Dependencies
-
-Worker-specific requirements are described in this section. It covers only the basic requirements; additional runtimes or tools may be needed depending on the type of use. The package names are for CentOS unless specified otherwise.
-
-- ZeroMQ in version at least 4.0, packages `zeromq` and `zeromq-devel` (`libzmq3-dev` on Debian)
-- YAML-CPP library, `yaml-cpp` and `yaml-cpp-devel` (`libyaml-cpp0.5v5` and `libyaml-cpp-dev` on Debian)
-- libcurl library, `libcurl-devel` (`libcurl4-gnutls-dev` on Debian)
-- libarchive library as an optional dependency. Installing it will speed up the build process; otherwise libarchive is built from source during installation. The package names are `libarchive` and `libarchive-devel` (`libarchive-dev` on Debian)
-
-**Install Isolate from source**
-
-First, we need to compile the Isolate sandbox from source and install it. The current worker is tested against version 1.3, so this version needs to be checked out. Assume that we keep source code in the `/opt/src` dir. For building the man page you need to have the `asciidoc` package installed.
-```
-$ cd /opt/src
-$ git clone https://github.com/ioi/isolate.git
-$ cd isolate
-$ git checkout v1.3
-$ make
-# make install && make install-doc
-```
-To work properly, Isolate depends on several advanced features of the Linux kernel. Make sure that your kernel is compiled with `CONFIG_PID_NS`, `CONFIG_IPC_NS`, `CONFIG_NET_NS`, `CONFIG_CPUSETS`, `CONFIG_CGROUP_CPUACCT` and `CONFIG_MEMCG`. If your machine has swap enabled, also check `CONFIG_MEMCG_SWAP`. The flags your kernel was compiled with can be found in the `/boot` directory, in the `config-` file matching your kernel version. Red Hat based distributions should have these enabled by default; for Debian you may want to add the parameters `cgroup_enable=memory swapaccount=1` to the kernel command line, which can be done by adding them to the `GRUB_CMDLINE_LINUX_DEFAULT` value in the `/etc/default/grub` file.
-
-For better reproducibility of results, some kernel parameters can be tweaked:
-
-- Disable address space randomization. Create the file `/etc/sysctl.d/10-recodex.conf` with the content `kernel.randomize_va_space=0`. The change takes effect after a restart, or run the `sysctl kernel.randomize_va_space=0` command.
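-A minimal sketch of applying this setting right away (assuming root privileges; the file path follows the advice above):
-```
-# echo "kernel.randomize_va_space=0" > /etc/sysctl.d/10-recodex.conf
-# sysctl -p /etc/sysctl.d/10-recodex.conf
-```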
-- Disable dynamic CPU frequency scaling. This requires setting the cpufreq scaling governor to _performance_.
-
-### Clone worker source code repository
-```
-$ git clone https://github.com/ReCodEx/worker.git
-$ git submodule update --init
-```
-
-### Install worker on Linux
-It is supposed that your current working directory is the one with the cloned worker sources.
-
-- Prepare the environment by running `mkdir build && cd build`
-- Build the sources by `cmake ..` followed by `make`
-- Build the binary package by `make package` (may require root permissions).
-Note that `rpm` and `deb` packages are built at the same time. You may need to have the `rpmbuild` command (usually in the `rpmbuild` or `rpm` package) or edit the CPACK_GENERATOR variable in the _CMakeLists.txt_ file in the root of the source tree.
-- Install the generated package through your package manager (`yum`, `dnf`, `dpkg`).
-
-The worker installation process is composed of the following steps:
-
-- create the config file `/etc/recodex/worker/config-1.yml`
-- create the systemd unit file `/etc/systemd/system/recodex-worker@.service`
-- put the main binary in `/usr/bin/recodex-worker`
-- put the judges' binaries into the `/usr/bin/` directory
-- create the system user and group `recodex` with `/sbin/nologin` shell (if not already existing)
-- create the log directory `/var/log/recodex`
-- set the ownership of the config (`/etc/recodex`) and log (`/var/log/recodex`) directories to the `recodex` user and group
-
-_Note:_ If you do not want to generate binary packages, you can just install the project with `make install` (as root). However, installation through your distribution's package manager is the preferred way to keep your system clean and manageable in the long term.
-
-### Install worker on Windows
-From the beginning we were determined to support the Windows operating system, on which some of the workers may run (especially for projects in the C# programming language). Support for Windows is quite hard and time consuming and there were several problems during development. To ensure that compilation on Windows keeps working, we set up a Windows CI service, [AppVeyor](http://www.appveyor.com/). However, installation should be easy thanks to the provided installation script.
-
-There are only two additional dependencies, **Windows 7 or higher** and **Visual Studio 2015+**. The provided installation batch script should do all the work on a Windows machine. Officially, only VS2015 and 32-bit compilation are supported, because of hardcoded compile options in the installation script. If a different VS version or platform is needed, the script should be changed to the appropriate values, which is simple and straightforward.
-
-The script is named *win-build.cmd* and placed in the *install* directory alongside the supporting scripts for UNIX systems. It will do almost all the work connected with building and dependency resolution (using the **NuGet** package manager and the `msbuild` build system). The script should be run under the 32-bit version of the _Developer Command Prompt for VS2015_, from the *install* directory.
-
-Building and installing the worker is then quite simple; the script has command-line parameters which can be used to specify what will be done:
-
-- *-build* -- The default option if none is specified. Builds the worker and its tests; everything is saved in the *build* folder and its subfolders.
-- *-clean* -- Cleans up downloaded NuGet packages and the built application/libraries.
-- *-test* -- Builds the worker and runs tests on the compiled test cases.
-- *-package* -- Generates a clickable installer using cpack and [NSIS](http://nsis.sourceforge.net/) (which has to be installed on the machine for this to work).
-
-```
-install> win-build.cmd # same as: win-build.cmd -build
-install> win-build.cmd -clean
-install> win-build.cmd -test
-install> win-build.cmd -package
-```
-
-All built binaries and cmake temporary files can be found in the *build* folder; typically there will be a *Release* subfolder containing the compiled application with all needed DLLs. Once the clickable installer binary is created, it can be found in the *build* folder under the name *recodex-worker-VERSION-win32.exe*. A sample screenshot is shown in the following picture.
-
-![NSIS Installation](https://github.com/ReCodEx/wiki/blob/master/images/nsis_installation.png)
-
-
-## Configuration and usage
-
-The following text describes how to set up and run the **worker** program. It is supposed that the required binaries are installed. Also, using systemd is recommended for the best user experience, but it is not required. Almost all modern Linux distributions use systemd nowadays.
-
-### Default worker configuration
-
-The worker should have some default configuration which is applied to the worker itself or may be used in given jobs (implicitly if something is missing, or explicitly through special variables). This configuration is hardcoded and can be overridden by an explicitly declared configuration file. The format of this configuration is YAML, with a structure similar to the job configuration.
-
-#### Configuration items
-
-Mandatory items are bold, optional italic.
-
-- **worker-id** -- unique identification of the worker on one server. This id is used by the _isolate_ sandbox on Linux systems, so make sure to meet isolate's requirements (by default a number from 1 to 999).
-- _worker-description_ -- human-readable description of this worker
-- **broker-uri** -- URI of the broker (hostname or IP address, including port, ...)
-- _broker-ping-interval_ -- how often to send ping messages to the broker, in milliseconds.
-- _max-broker-liveness_ -- specifies how many pings in a row the broker can miss before the worker considers the connection dead.
-- _headers_ -- map of headers specifying the worker's capabilities
-  - _env_ -- list of environmental variables which are sent to the broker in the init command
-  - _threads_ -- information about available threads for this worker
-- **hwgroup** -- hardware group of this worker. The hardware group must specify the worker's hardware and software capabilities; it is the main item for the broker's routing decisions.
-- _working-directory_ -- where all needed files will be stored. Can be the same for multiple workers on one server.
-- **file-managers** -- addresses and credentials of all file managers used (i.e. all the different frontends using this worker)
  - **hostname** -- URI of the file manager
  - _username_ -- username for HTTP authentication (if needed)
  - _password_ -- password for HTTP authentication (if needed)
-- _file-cache_ -- configuration of the caching feature
  - _cache-dir_ -- path to the caching directory. Can be the same for multiple workers.
-- _logger_ -- settings of logging capabilities
  - _file_ -- path to the logging file, without suffix. The item `/var/log/recodex/worker` will produce `worker.log`, `worker.1.log`, ...
-  - _level_ -- level of logging, one of `off`, `emerg`, `alert`, `critical`, `err`, `warn`, `notice`, `info` and `debug`
  - _max-size_ -- maximal size of a log file before rotating
  - _rotations_ -- number of rotations kept
-- _limits_ -- default sandbox limits for this worker. All items are described in the assignments section of the job configuration description. If some limits are not set in the job configuration, the defaults from the worker config are used; in such a case the worker's defaults also act as the maximum for the job. Limits in the job configuration cannot exceed the limits from the worker.
-
-#### Example config file
-
-```{.yml}
-worker-id: 1
-broker-uri: tcp://localhost:9657
-broker-ping-interval: 10 # milliseconds
-max-broker-liveness: 10
-headers:
-  env:
-    - c
-    - cpp
-  threads: 2
-hwgroup: "group1"
-working-directory: /tmp/recodex
-file-managers:
-  - hostname: "http://localhost:9999" # port is optional
-    username: "" # can be ignored in specific modules
-    password: "" # can be ignored in specific modules
-file-cache: # only in case that there is cache module
-  cache-dir: "/tmp/recodex/cache"
-logger:
-  file: "/var/log/recodex/worker" # w/o suffix - actual names will
-                                  # be worker.log, worker.1.log,...
-  level: "debug" # level of logging
-  max-size: 1048576 # 1 MB; max size of file before log rotation
-  rotations: 3 # number of rotations kept
-limits:
-  time: 5 # in secs
-  wall-time: 6 # seconds
-  extra-time: 2 # seconds
-  stack-size: 0 # normal in KB, but 0 means no special limit
-  memory: 50000 # in KB
-  parallel: 1
-  disk-size: 50
-  disk-files: 5
-  environ-variable:
-    ISOLATE_BOX: "/box"
-    ISOLATE_TMP: "/tmp"
-  bound-directories:
-    - src: /tmp/recodex/eval_5
-      dst: /evaluate
-      mode: RW,NOEXEC
-```
-
-### Running the worker
-
-A systemd unit file is distributed with the worker to simplify its launch. It integrates the worker nicely into your Linux system and allows you to run it automatically on system startup. It is possible to have more than one worker on a server, so the provided unit file is templated. Each instance of the worker unit has a unique string identifier, which is used for managing that instance through systemd. By default, only one worker instance is ready to use after installation; its ID is "1".
-
-- Starting the worker with ID "1" can be done this way:
-```
-# systemctl start recodex-worker@1.service
-```
-Check with
-```
-# systemctl status recodex-worker@1.service
-```
-whether the worker is running. You should see an "active (running)" message.
-
-- The worker can be stopped or restarted accordingly using the `systemctl stop` and `systemctl restart` commands.
-- If you want to run the worker after system startup, run:
-```
-# systemctl enable recodex-worker@1.service
-```
-For further information about using systemd, please refer to the systemd documentation.
-
-### Adding a new worker
-
-To add a new worker you need to take a few steps:
-
-- Make up a unique string ID.
-- Copy the default configuration file `/etc/recodex/worker/config-1.yml` to the same directory and name it `config-.yml`
-- Edit the new config file to fit your needs. Note that you must at least change the _worker-id_ and _logger file_ values to be unique.
-- Run the new instance using
-```
-# systemctl start recodex-worker@.service
-```
-
-
-## Sandboxes
-
-### Isolate
-
-Isolate is used as the one and only sandbox for Linux-based operating systems. The home of this project can be found on [GitHub](https://github.com/ioi/isolate) and more about its installation and setup can be found in the [installation](#installation) section.
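-
-As a quick illustration, a manual Isolate session might look like the following sketch (the option names follow the Isolate man page; the program name, limits and default box path are made up for this example and may differ on your installation):
-```
-$ isolate --box-id=0 --init     # prepare box 0 and print its directory
-$ cp solution /var/local/lib/isolate/0/box  # copy the tested program into the box (path printed by --init)
-$ isolate --box-id=0 --time=5 --mem=65536 --processes=1 --run -- ./solution
-$ isolate --box-id=0 --cleanup  # remove the box
-```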
-
-Isolate uses Linux kernel features for sandboxing and thus its security depends on them; namely, _kernel namespaces_ and _cgroups_ are used. Similar functionality can now be partially achieved with systemd.
-
-From the very beginning of the ReCodEx project it was clear that the Isolate sandbox would be used for the Linux environment. There is no suitable general-purpose sandbox on the Windows platform, so the main operating system of the whole backend should be Linux-based. The set of operations supported by Isolate seems reasonable for any sandbox, so most of its functionality is accessible from the job configuration. As there is no other sandbox, the naming often reflects Isolate's names. However, the worker is prepared to run on Windows too, so integrating other sandboxes (as libraries or command-line tools) is possible.
-
-As a sandbox, Isolate provides a wide range of functionality which can be used to limit resources or even cut off particular resources from the sandboxed program. There are of course basics like limiting CPU time and memory consumption, but there is also wall-time (the human perception of time) and extra-time, an extra allowance added to the other time limits to increase the chance of the sandboxed program exiting successfully. Other features include limiting the stack size and redirecting stdin, stdout or stderr from/to a file. Worth mentioning is also the possibility to limit the number of processes/threads which can be created, or to define environment variables which are passed to the sandboxed program.
-
-Filesystem handling is a chapter of its own. Isolate uses the mount kernel namespace to create a "virtual" filesystem which is mounted for the sandboxed program. By default only a few read-only files/directories are mapped into the sandbox (described in the Isolate man page). This can of course be changed by providing additional folders as Isolate parameters. By default, folders are mapped as read-only, but Isolate has a few access options which can be set on a mount point.

-#### Limiting Isolate boxes to particular CPU or memory nodes
-
-A new feature in version 1.3 is the possibility to limit an Isolate box to one or more CPU or memory nodes. This functionality is provided by the _cpusets_ kernel mechanism and is now integrated into Isolate. Only `cpuset.cpus` and `cpuset.mems` can be set, which should be just fine for sandbox purposes. Further description of this kernel functionality can be found in the _cpuset_ manual page or in the Linux documentation, section `linux/Documentation/cgroups/cpusets.txt`. As previously stated, these settings apply to particular Isolate boxes and have to be written in the Isolate configuration. The standard configuration path should be `/usr/local/etc/isolate`, but it may depend on your installation process. The _cpuset_ configuration there is really simple and is described in the example below.
-
-```
-box0.cpus = 0 # assign processor with ID 0 to isolate box with ID 0
-box0.mems = 0 # assign memory node with ID 0
-# if not set, Linux itself will decide where
-# the sandboxed programs should run
-box2.cpus = 1-3 # assign a range of processors to isolate box 2
-box2.mems = 4-7 # assign a range of memory nodes
-box3.cpus = 1,2,3 # assign a list of processors to isolate box 3
-```
-
-- **cpuset.cpus:** The cpus limitation restricts the sandboxed program to only the processor threads set in the configuration. On hyper-threaded processors this means that all virtual threads are assignable, not only the physical ones. The value can be a single number, a list of numbers separated by commas, or a range with a hyphen delimiter; see the snippet below for a quick way to inspect your machine's topology.
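-
-For orientation, the CPU and memory node IDs available on a machine can be inspected, for instance, like this (exact output format varies between distributions):
-```
-$ lscpu | grep -i numa                 # NUMA node count and per-node CPU lists
-$ cat /sys/devices/system/node/online  # IDs of online memory nodes, e.g. 0
-```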
-
-- **cpuset.mems:** This value is particularly handy on NUMA systems, which have several memory nodes. On standard desktop computers this value should always be zero, because only one independent memory node is present. As with the `cpus` limitation, the value can be a single number, a list of values separated by commas, or a range with a hyphen.
-
-### WrapSharp
-
-WrapSharp is a sandbox for C# programs, itself written in C#. We wrote it as a proof-of-concept sandbox for use in the Windows environment. However, it is not properly tested and integrated into the worker yet. A security audit should be done before using it in production. After that, with just a little bit of integration effort, there can be a running sandbox for C# programs on a Windows system.
-
-
 ## Cleaner

### Description
@@ -335,69 +96,3 @@ There is a bit of catch with cleaner service, to work properly, server filesyste
 
Another possibility seems to be to update the last modified timestamp when accessing the file. This timestamp is used in most major filesystems, so there are fewer compatibility issues than with the last access timestamp. The modified timestamp then must be updated by the workers at each access, for example using the `touch` command or similar. The final decision on the better of these two ways will be made after practical experience with running the production system.
 
-### Installation
-
-To install and use the cleaner, it is necessary to have Python 3 with the package manager `pip` installed.
-
-- The cleaner's dependencies have to be installed:
-```
-$ pip install -r requirements.txt
-```
-- RPM distributions can build and install a binary package. This can be done like this:
-```
-$ python setup.py bdist_rpm --post-install ./cleaner/install/postinst
-# yum install ./dist/recodex-cleaner--1.noarch.rpm
-```
-- Other Linux distributions can install the cleaner directly:
-```
-$ python setup.py install --install-scripts /usr/bin
-# ./cleaner/install/postinst
-```
-- For installation on Windows, do the following:
  - start `cmd` with administrator permissions
  - run the installation with
    ```
    > python setup.py install --install-scripts "C:\Program Files\ReCodEx\cleaner"
    ```
    where the path specified with `--install-scripts` can be changed
  - copy the configuration file alongside the installed executable using
    ```
    > copy install\config.yml "C:\Program Files\ReCodEx\cleaner\config.yml"
    ```
-
-### Configuration and usage
-
-#### Configuration items
-- **cache-dir** -- directory which the cleaner manages
-- **file-age** -- age in seconds after which files are considered outdated and will be deleted
-
-#### Example configuration
-```{.yml}
-cache-dir: "/tmp"
-file-age: "3600" # in seconds
-```
-
-#### Usage
-As stated before, the cleaner should be run periodically. On Linux systems this can be done by the built-in `cron` service; alternatively, if `systemd` is present, the cleaner itself provides a `*.timer` file which can be used for scheduling from `systemd`. On Windows systems the internal scheduler should be used.
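-
-For illustration, a daily `systemd` timer unit might look like the following sketch (the actual `*.timer` file shipped with the cleaner may differ; a matching `recodex-cleaner.service` unit is assumed, since a timer activates the service of the same name by default):
-```
-[Unit]
-Description=Periodically run the ReCodEx cleaner
-
-[Timer]
-OnCalendar=daily
-Persistent=true
-
-[Install]
-WantedBy=timers.target
-```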
-
-- Running the cleaner from the command line is fairly simple:
-```
-$ recodex-cleaner -c /etc/recodex/cleaner
-```
-- Enable the cleaner service using systemd:
-```
-$ systemctl start recodex-cleaner.timer
-```
-- Add the cleaner to the Linux cron service using the following configuration line:
-```
-0 0 * * * /usr/bin/recodex-cleaner -c /etc/recodex/cleaner/config.yml
-```
-- Add the cleaner to the Windows scheduler service with the following command:
-```
-> schtasks /create /sc daily /tn "ReCodEx Cleaner" /tr "\"C:\Program Files\ReCodEx\cleaner\recodex-cleaner.exe\" -c \"C:\Program Files\ReCodEx\cleaner\config.yml\""
-```
-
diff --git a/_Sidebar.md b/_Sidebar.md
deleted file mode 100644
index 36062a4..0000000
--- a/_Sidebar.md
+++ /dev/null
@@ -1,24 +0,0 @@
-### [[Home]]
-
-### Content
-* [[Introduction]]
-* [[User documentation]]
-* [[Overall architecture]]
-* [[Assignments]]
-* [[Submission flow]]
-* [[Installation]]
-* [[Worker]]
-* [[Broker]]
-* [[Monitor]]
-* [[Fileserver]]
-* [[Web API]]
-* [[Web application]]
-* [[Database]]
-* [[Conclusion]]
-
-### Separated pages
-* [[FAQ]]
-* [[Logo]]
-* [[Coding style]]
-* [[Database schema]]