# Broker
The broker is a central part of the ReCodEx backend that directs almost all communication. It was designed to handle a heavy load of messages by performing only small actions in the main communication thread and executing other actions asynchronously.
## Description
The broker's responsibilities are:
- allowing workers to register themselves and keeping track of their capabilities
- tracking the status of each worker and handling cases when they crash
- accepting assignment evaluation requests from the frontend and forwarding them
to workers
- receiving job status information from workers and forwarding it to the frontend
## Architecture
The broker uses our ZeroMQ _reactor_ to bind events on sockets to handler classes.
There are currently two handlers - one that handles the main functionality and
another one that sends status reports to the REST API asynchronously so that the
broker doesn't have to wait for HTTP requests, which can take a lot of time.
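The following sketch shows the reactor idea in Python with `pyzmq` (the broker itself is written in C++ and uses our own reactor implementation; the addresses, ports and handler bodies below are purely illustrative):

```python
# Sketch of the reactor pattern: readable sockets are mapped to handlers.
import zmq

context = zmq.Context()
workers = context.socket(zmq.ROUTER)   # socket for workers
clients = context.socket(zmq.ROUTER)   # socket for frontends
workers.bind("tcp://*:9657")           # illustrative ports
clients.bind("tcp://*:9658")

# The reactor binds each socket to its handler (here just functions).
handlers = {
    workers: lambda msg: print("worker message:", msg),
    clients: lambda msg: print("client message:", msg),
}

poller = zmq.Poller()
for socket in handlers:
    poller.register(socket, zmq.POLLIN)

while True:
    # Only small, fast actions run here; slow work (such as HTTP status
    # reports) is delegated elsewhere so this loop is never blocked.
    for socket, _event in poller.poll():
        handlers[socket](socket.recv_multipart())
```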
If no worker could process a job (i.e. it cannot be reassigned), the job is reported as
failed to the frontend via REST API.
**Broker failure** - when the broker itself crashes and is restarted, workers
will reconnect automatically. However, all jobs in their queues are lost. If a
worker manages to finish a job and notifies the "new" broker, the report is
forwarded to the frontend. The same goes for external failures. Jobs that fail because there is no worker with matching headers are reported as failed immediately.
### Dependencies
Broker has similar basic dependencies as worker; for recapitulation:
- ZeroMQ in version at least 4.0, packages `zeromq` and `zeromq-devel` (`libzmq3-dev` on Debian)
- YAML-CPP library, packages `yaml-cpp` and `yaml-cpp-devel` (`libyaml-cpp0.5v5` and `libyaml-cpp-dev` on Debian)
- libcurl library, package `libcurl-devel` (`libcurl4-gnutls-dev` on Debian)
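For convenience, the dependencies can be installed like this (package names as listed above; the exact invocation depends on your distribution):

```
# RPM based distributions:
$ sudo yum install zeromq zeromq-devel yaml-cpp yaml-cpp-devel libcurl-devel
# Debian:
$ sudo apt-get install libzmq3-dev libyaml-cpp0.5v5 libyaml-cpp-dev libcurl4-gnutls-dev
```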
### Clone broker source code repository
```
$ git clone https://github.com/ReCodEx/broker.git
$ git submodule update --init
```
It's supposed that your current working directory is the one with the cloned broker source code.
- Prepare environment by running `mkdir build && cd build`
- Build sources by `cmake ..` followed by `make -j#`, where `#` stands for the number of your CPU threads.
- Build binary package by `make package` (may require root permissions).
Note that `rpm` and `deb` packages are built at the same time. You may need to have the `rpmbuild` command available (usually in the `rpmbuild` or `rpm` package) or edit the `CPACK_GENERATOR` variable in the _CMakeLists.txt_ file in the root of the source tree.
- Install generated package through your package manager (`yum`, `dnf`, `dpkg`).
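Put together, the whole build might look like this (4 CPU threads taken as an example):

```
$ mkdir build && cd build
$ cmake ..
$ make -j4
$ make package
```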
_Note:_ If you don't want to generate binary packages, you can just install the application directly, for example with `make install` (as root).
## Configuration and usage
Following text describes how to set up and run the **broker** program. It's supposed to have the required binaries installed. Also, using systemd is recommended for the best user experience, but it's not required. Almost all modern Linux distributions are using systemd now.
Installation of **broker** program does the following steps to your computer:
- create config file `/etc/recodex/broker/config.yml`
- create _systemd_ unit file `/etc/systemd/system/recodex-broker.service`
- put main binary to `/usr/bin/recodex-broker`
- create system user and group `recodex` with nologin shell (if not existing)
#### Configuration items
Description of configurable items in broker's config. Mandatory items are bold, optional italic.

- _clients_ -- specifies address and port to bind for clients (e.g. frontends)
    - _address_ -- hostname or IP address as string (`*` for any)
    - _port_ -- desired port
- _workers_ -- specifies address and port to bind for workers
    - _address_ -- hostname or IP address as string (`*` for any)
    - _port_ -- desired port
    - _max_liveness_ -- maximum number of pings a worker can fail to send before it is considered disconnected
    - _max_request_failures_ -- maximum number of times a job can fail (due to e.g. worker disconnect or a network error when downloading something from the fileserver) and be assigned again
- _monitor_ -- settings of monitor service connection
    - _address_ -- IP address of running monitor service
    - _port_ -- desired port
- _notifier_ -- details of connection which is used in case of errors and good-to-know states
    - _address_ -- address where frontend API runs
    - _port_ -- desired port
    - _username_ -- username which can be used for HTTP authentication
    - _password_ -- password which can be used for HTTP authentication
- _logger_ -- settings of logging capabilities
    - _file_ -- path to the logging file with name without suffix. `/var/log/recodex/broker` item will produce `broker.log`, `broker.1.log`, ...
    - _level_ -- level of logging, one of `off`, `emerg`, `alert`, `critical`, `err`, `warn`, `notice`, `info` and `debug`
    - _max-size_ -- maximal size of log file before rotating
    - _rotations_ -- number of rotations kept
#### Example config file
Addresses and port numbers in this example are illustrative.

```{.yml}
---
clients:
    address: "*"
    port: 9658
workers:
    address: "*"
    port: 9657
    max_liveness: 10
    max_request_failures: 3
monitor:
    address: "127.0.0.1"
    port: 7894
notifier:
    address: "127.0.0.1"
    port: 8080
    username: ""
    password: ""
logger:
    file: "/var/log/recodex/broker" # w/o suffix - actual names will be
                                    # broker.log, broker.1.log, ...
    level: "debug" # level of logging
    max-size: 1048576 # 1 MB; max size of file before log rotation
    rotations: 3 # number of rotations kept
...
```
### Running broker
Running broker is very similar to the worker setup. A systemd unit file is also provided for convenient usage. There is only one broker per whole ReCodEx solution, so there is no need for systemd templates.
- Running broker can be done by the following command:
```
# systemctl start recodex-broker.service
```
Check with
```
# systemctl status recodex-broker.service
```
if the broker is running. You should see an "active (running)" message.
- Broker can be stopped or restarted accordingly using `systemctl stop` and `systemctl restart` commands.
- If you want to run broker after system startup, run:
```
# systemctl enable recodex-broker.service
```
For further information about using systemd please refer to systemd documentation.

# Monitor
Monitor is part of the ReCodEx solution for reporting progress of job evaluation back to users in real time. It gets progress notifications from broker and sends them through WebSockets to clients' browsers. For now, it's meant as an optional part of the whole solution, but for the full experience it's recommended to use one.
Monitor is needed one per broker, that is one per separate ReCodEx instance. Also, monitor has to be publicly visible (it has to have a public IP address or be behind a public proxy server) and also needs a connection to the broker. If the web application is using HTTPS, it's required to use a proxy for monitor to provide encryption over WebSockets. If this is not done, browsers of the users will block the unencrypted connection and won't show the progress to the users.
## Architecture
Monitor is written in Python, tested versions are 3.4 and 3.5. This language was chosen because it's already in the project requirements (fileserver) and there are great libraries for ZeroMQ, WebSockets and asynchronous operations. These libraries save system resources and allow processing a great amount of messages. Also, coding in Python is pretty simple, which saved us time for improving the other parts of ReCodEx.
For monitor functionality some packages are required. All of them are listed in the _requirements.txt_ file in the repository and can be installed by the `pip` package manager as
```
$ pip install -r requirements.txt
```
**Description of dependencies:**
- zmq -- binding to ZeroMQ framework
- websockets -- framework for communication over WebSockets
- asyncio -- library for fast asynchronous operations
- pyyaml -- parsing YAML configuration files
- argparse -- parsing command line arguments
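The _requirements.txt_ file might then look something like this (PyPI package names; `asyncio` and `argparse` ship with Python 3.4+, so they need no entry):

```
pyzmq
websockets
PyYAML
```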
### Message flow
![Message flow inside monitor](https://raw.githubusercontent.com/ReCodEx/wiki/master/images/Monitor_arch.png)
Monitor runs in 2 threads. _Thread 1_ is the main thread, which initializes all components (the logger for example), starts the other thread and runs the ZeroMQ part of the application. This thread receives and parses incoming messages from broker and forwards them to the _thread 2_ sending logic.
_Thread 2_ is responsible for managing all of the WebSocket connections asynchronously. The whole thread is one big _asyncio_ event loop through which all actions are processed. None of the custom data types in Python are thread-safe, so all events from other threads (actually only the `send_message` method invocation) must be called within the event loop (via the `asyncio.loop.call_soon_threadsafe` function). Please note that most of the Python interpreters use a [Global Interpreter Lock](https://wiki.python.org/moin/GlobalInterpreterLock), so there is actually no parallelism from the performance point of view, but proper synchronization is still required!
### Handling of incoming messages
An incoming ZeroMQ progress message is received and parsed to JSON format (the same as our WebSocket communication format). The JSON string is then passed to _thread 2_ for asynchronous sending. Each message has an identifier of the channel where to send it.
There can be multiple receivers to one channel id. Each one has a separate _asyncio.Queue_ instance where new messages are added. In addition to that, there is one list of all messages per channel. If a client connects a bit later than the point when monitor starts to receive messages, it'll receive all messages from the beginning. Messages are stored for 5 minutes after the last progress command (normally FINISHED) is received, then they are permanently deleted.
Messages from a client's queue are sent through the corresponding WebSocket connection via the main event loop as soon as possible. This approach with a separate queue per connection is easy to implement and guarantees reliability and order of message delivery.
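A minimal sketch of this queue-per-connection scheme follows; the names are illustrative and this is not the actual monitor source:

```python
# Cross-thread hand-off between the ZeroMQ thread and the asyncio loop.
import asyncio
import collections

channels = collections.defaultdict(list)  # channel id -> queues of connected clients

def send_message(loop, channel_id, message):
    # Called from the ZeroMQ thread (thread 1). The queues are not
    # thread-safe, so the actual append must run inside the event loop.
    loop.call_soon_threadsafe(_dispatch, channel_id, message)

def _dispatch(channel_id, message):
    # Runs in the event loop thread (thread 2).
    for queue in channels[channel_id]:
        queue.put_nowait(message)

async def client_writer(websocket, channel_id):
    # One queue per WebSocket connection keeps delivery reliable and ordered.
    queue = asyncio.Queue()
    channels[channel_id].append(queue)
    try:
        while True:
            await websocket.send(await queue.get())
    finally:
        channels[channel_id].remove(queue)
```

The per-channel message history (for clients that connect later) is omitted here for brevity.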
## Installation
Installation will provide you with the following files:
- `/usr/bin/recodex-monitor` -- simple startup script located in PATH
- `/etc/recodex/monitor/config.yml` -- configuration file
- `/etc/systemd/system/recodex-monitor.service` -- systemd startup script
- code files will be installed in a location depending on your system settings, mostly into `/usr/lib/python3.5/site-packages/monitor/` or similar
The systemd script runs the monitor binary as a specific _recodex_ user, so in the `postinst` script a user and group of this name are created. Also, ownership of the configuration file will be granted to that user.
Make sure to allow TCP connections to the WebSocket address and port specified in the configuration in your firewall, otherwise monitor won't work.
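For example, with `firewalld` (the port must match the WebSocket port from your configuration):

```
# firewall-cmd --permanent --add-port=4567/tcp
# firewall-cmd --reload
```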
- RPM distributions can build and install a binary package. This can be done like this:
- run command
```
$ python3 setup.py bdist_rpm --post-install ./install/postinst
```
to generate the binary `.rpm` package, or download a precompiled one from the releases tab of the monitor GitHub repository (it's an architecture independent package)
- install package using
```
# yum install ./dist/recodex-monitor-<version>-1.noarch.rpm
```
- Other Linux distributions can install monitor straight from the sources:
```
$ python3 setup.py install --install-scripts /usr/bin
# ./install/postinst
```
## Configuration and usage

### Configuration

Configuration file is located in subdirectory `monitor` of standard ReCodEx configuration folder `/etc/recodex/`. It's in YAML format as all of the other configurations. Format is very similar to configurations of broker or workers.

### Configuration items

Description of configurable items; bold ones are required, italic ones are optional.
- _websocket_uri_ -- URI of the WebSocket connection endpoint. Must be visible to the clients (directly or through a public proxy)
    - string representation of IP address or a hostname
    - port number
- _zeromq_uri_ -- URI of the ZeroMQ connection endpoint used by the broker. Could be hidden from public internet.
    - string representation of IP address or a hostname
    - port number
- _logger_ -- settings of logging
    - _file_ -- path with name of log file. Defaults to `/var/log/recodex/monitor.log`
    - _level_ -- logging level, one of "debug", "info", "warning", "error" and "critical"
    - _max-size_ -- maximum size of log file before rotation in bytes
    - _rotations_ -- number of rotations kept
### Example configuration file
Addresses and port numbers in this example are illustrative.

```{.yml}
---
websocket_uri:
    - "127.0.0.1"
    - 4567
zeromq_uri:
    - "127.0.0.1"
    - 7894
logger:
    file: "/var/log/recodex/monitor.log"  # name of log file
    level: "debug"                        # level of logging
    max-size: 1048576                     # 1 MB; max size of file before log rotation
    rotations: 3                          # number of rotations kept
...
```
### Usage

Preferred way to start monitor as a service is via systemd, as with the other parts of the ReCodEx solution.
- Running monitor is fairly simple:
```
# systemctl start recodex-monitor.service
```
- Current state can be obtained by
```
# systemctl status recodex-monitor.service
```
You should see a green **active (running)** status.
- Setting up monitor to be started on system startup:
```
# systemctl enable recodex-monitor.service
```
Alternatively monitor can be started directly from command line.
Note that this command won't start monitor as a daemon.
```
$ recodex-monitor -c /etc/recodex/monitor/config.yml
```

### Isolate
Isolate is used as the one and only sandbox for Linux-based operating systems. The headquarters of this project can be found at [GitHub](https://github.com/ioi/isolate) and more about its installation and setup can be found in the [installation](#installation) section. Isolate uses Linux kernel features for sandboxing and thus its security depends on them, namely _kernel namespaces_ and _cgroups_ are used. Similar functionality can now be partially achieved with systemd.
From the very beginning of the ReCodEx project only one thing was certain: Isolate will be used. Almost everything else changed, but Isolate persisted. This of course has some implications: the main operating system of the whole backend should be Linux-based, and worker was designed to interact well with Isolate. This precondition was fulfilled and worker has fully integrated Isolate with almost all of the functionality which Isolate provides. This also means that job configuration was heavily affected and reflects Isolate's capabilities.
Isolate as a sandbox provides a wide range of functionality which can be used to limit resources or even cut off particular resources from the sandboxed program. There are of course basics like limiting cpu-time and memory consumption, but also wall-time (human perception of time) or extra-time, which is an extra limit added to the other time limits to increase the chance that the sandboxed program exits successfully. Other features include limiting stack size and redirecting stdin, stdout or stderr from/to a file. Worth mentioning is also defining the number of processes/threads which can be created, or defining environment variables which are passed to the sandboxed program.
A chapter by itself is filesystem handling. Isolate uses the mount kernel namespace to create a "virtual" filesystem which will be mounted for the sandboxed program. By default only a few read-only files/directories are mapped into the sandbox (described in the Isolate man page). This can of course be changed by providing additional folders as Isolate parameters. By default folders are mapped as read-only, but Isolate has a few access options which can be set on each mount point.
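To illustrate, one Isolate run combining several of the limits above with a custom directory rule might look like this (box ID, limit values and paths are examples only; see the Isolate man page for the authoritative option list):

```
$ isolate --box-id=1 --init
$ isolate --box-id=1 \
    --time=2 --wall-time=5 --extra-time=0.5 \
    --mem=65536 --processes=1 \
    --stdin=input.txt --stdout=output.txt \
    --dir=/data=/home/recodex/data:noexec \
    --run -- ./solution
$ isolate --box-id=1 --cleanup
```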
#### Limit isolate boxes to particular cpu or memory node
A new feature in version 1.3 is the possibility to limit an Isolate box to one or more CPUs or memory nodes. This functionality is provided by the _cpusets_ kernel mechanism and is now integrated into Isolate. It is allowed to set only `cpuset.cpus` and `cpuset.mems`, which should be just fine for sandbox purposes. As this is kernel functionality, further description can be found in the manual page of _cpuset_ or in the Linux documentation in `linux/Documentation/cgroups/cpusets.txt`. As previously stated, these settings can be applied to particular Isolate boxes and have to be written in the Isolate configuration. The standard configuration path should be `/usr/local/etc/isolate`, but it may depend on your installation process. Configuration of _cpuset_ there is really simple and is described in the example below.
```
box0.cpus = 0     # assign processor ID 0 to isolate box with ID 0
box0.mems = 0     # assign memory node 0 to isolate box 0
box2.cpus = 1-3   # assign range of processors to isolate box 2
box2.mems = 4-7   # assign range of memory nodes
box3.cpus = 1,2,3 # assign list of processors to isolate box 3
```
- **cpuset.cpus:** The cpus limitation will restrict the sandboxed program only to the processor threads set in the configuration. On hyperthreaded processors this means that all virtual threads are assignable, not only the physical ones. The value can be a single number, a list of numbers separated by commas, or a range with a hyphen delimiter.
- **cpuset.mems:** This value is particularly handy on NUMA systems, which have several memory nodes. On standard desktop computers this value should always be zero because only one independent memory node is present. As stated in the `cpus` limitation, there can be a single value, a list of values separated by commas, or a range stated with a hyphen.
### WrapSharp
