# Broker

The broker is a central part of the ReCodEx backend that directs almost all
communication.

## Description

The broker's responsibilities are:

- allowing workers to register themselves and keeping track of their capabilities
- tracking workers' status and handling cases when they crash
- accepting assignment evaluation requests from the frontend and forwarding them
  to workers
- receiving job status information from workers and forwarding it to the frontend
  either via the monitor or the REST API
- notifying the frontend of errors in the backend

## Architecture

The broker uses our ZeroMQ reactor to bind events on sockets to handler classes.
There are currently two handlers: one that handles the main functionality and
another one that sends status reports to the REST API asynchronously, so that the
broker doesn't have to wait for HTTP requests, which can take a lot of time,
especially when some kind of error happens on the server.

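The split between a synchronous main handler and an asynchronous status notifier can be illustrated with a small Python sketch. It is not the broker's actual reactor: the names `Reactor`, `AsyncHandler`, `add_handler` and `dispatch` are hypothetical, plain strings stand in for ZeroMQ sockets, and appending to a list stands in for the HTTP request.

```python
import queue
import threading


class Reactor:
    """Maps an event origin (here just a name) to the handlers bound to it."""

    def __init__(self):
        self._handlers = {}

    def add_handler(self, origins, handler):
        for origin in origins:
            self._handlers.setdefault(origin, []).append(handler)

    def dispatch(self, origin, message):
        for handler in self._handlers.get(origin, []):
            handler(message)


class AsyncHandler:
    """Runs a slow callback on a background thread so dispatch() returns at once."""

    def __init__(self, callback):
        self._callback = callback
        self._inbox = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def __call__(self, message):
        # Only enqueue the message; the slow work happens on the other thread.
        self._inbox.put(message)

    def _run(self):
        while True:
            message = self._inbox.get()
            try:
                self._callback(message)
            finally:
                self._inbox.task_done()

    def wait(self):
        """Block until every queued message has been processed."""
        self._inbox.join()


handled, reported = [], []
reactor = Reactor()
reactor.add_handler(["workers", "clients"], handled.append)  # main handler
status_notifier = AsyncHandler(reported.append)  # stand-in for an HTTP call
reactor.add_handler(["status"], status_notifier)

reactor.dispatch("workers", "init")
reactor.dispatch("status", "job failed")
status_notifier.wait()
```

The point of the design is that `dispatch` never waits for the notifier: a slow or failing HTTP server only delays the background thread, not the handling of worker and client messages.
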
### Worker registry

The `worker_registry` class is used to store information about workers, their
status and the jobs in their queue. It can look up a worker using the headers
received with a request. It also uses a basic load balancing algorithm: the
workers are contained in a queue and whenever one of them receives a job, it's
moved to the back, which makes it less likely to receive another job soon.

When a worker is assigned a job, it won't be assigned another one until we
receive a `done` message from it.

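The load balancing described above can be sketched in Python as follows. This is a simplified illustration rather than the real `worker_registry` code: the `Worker` and `WorkerRegistry` classes, the exact-match header comparison and the per-worker `pending` queue are assumptions made for the example.

```python
from collections import deque


class Worker:
    def __init__(self, name, headers):
        self.name = name
        self.headers = headers   # capabilities, e.g. {"env": "c"}
        self.current = None      # job being evaluated right now
        self.pending = deque()   # jobs waiting for this worker


class WorkerRegistry:
    def __init__(self):
        self.workers = deque()

    def add(self, worker):
        self.workers.append(worker)

    def assign(self, job, headers):
        """Give the job to the first matching worker, then move it to the back."""
        for worker in self.workers:
            if all(worker.headers.get(k) == v for k, v in headers.items()):
                self.workers.remove(worker)
                self.workers.append(worker)
                if worker.current is None:
                    worker.current = job
                else:
                    # Busy worker: hold the job until a `done` message arrives.
                    worker.pending.append(job)
                return worker
        return None  # no registered worker can process this job

    def done(self, worker):
        """Handle a `done` message: the worker may now take its next job."""
        worker.current = worker.pending.popleft() if worker.pending else None
        return worker.current


registry = WorkerRegistry()
a, b = Worker("a", {"env": "c"}), Worker("b", {"env": "c"})
registry.add(a)
registry.add(b)
first = registry.assign("job1", {"env": "c"})   # goes to a, a moves to the back
second = registry.assign("job2", {"env": "c"})  # goes to b
third = registry.assign("job3", {"env": "c"})   # a again, queued because a is busy
```

Moving the chosen worker to the back gives a round-robin effect among workers with equal capabilities, without tracking any load metrics.
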
### Error handling

**Job failure** - we recognize two ways a job can fail: internally and
externally. An internal failure is the worker's fault, for example when it
can't download a file needed for the evaluation for some reason. An external
failure is, for example, when the job configuration is malformed. Note that we don't
consider a student entering an incorrect solution a job failure.

Jobs that failed internally are reassigned until a limit on the number of
reassignments (configurable with the `max_request_failures` option) is reached.
External failures are reported to the frontend immediately.

**Worker failure** - when a worker crash is detected, we attempt to reassign its
current job and also all the jobs from its queue. Because the current job might
be the reason for the crash, its reassignment is also counted towards the
`max_request_failures` limit (the counter is shared). If there is no worker that
could process a job (i.e. it cannot be reassigned), the job is reported as
failed to the frontend via the REST API.

**Broker failure** - when the broker itself crashes and is restarted, workers
will reconnect automatically. However, all jobs in their queues are lost. If a
worker manages to finish a job and notifies the "new" broker, the report is
forwarded to the frontend. The same goes for external failures. Jobs that fail
internally cannot be reassigned, because the "new" broker doesn't know their
headers, so they are reported as failed immediately.

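The reassignment rule for internal failures and worker crashes can be sketched as a single decision function. This is a hedged Python illustration, not the broker's implementation: `handle_job_failure`, the `failures` key and the two callbacks are hypothetical, and the exact boundary (whether the limit counts failures or reassignments) is an assumption.

```python
def handle_job_failure(job, kind, max_request_failures, reassign, report):
    """Decide what to do with a failed job.

    `job` is a dict carrying a shared `failures` counter; `kind` is
    "internal" (also used for a crashed worker's current job) or "external".
    `reassign` and `report` are callbacks supplied by the caller.
    """
    if kind == "external":
        # External failures are never retried.
        report(job, "external failure")
        return "reported"
    job["failures"] = job.get("failures", 0) + 1
    # Assumption: at most `max_request_failures` reassignments are allowed.
    if job["failures"] > max_request_failures:
        report(job, "too many failures")
        return "reported"
    reassign(job)
    return "reassigned"


reassigned, reported = [], []
job = {"id": "job-1"}
outcomes = [
    handle_job_failure(job, "internal", 3, reassigned.append,
                       lambda failed_job, reason: reported.append(reason))
    for _ in range(4)
]
```

Keeping one counter per job means a job that repeatedly crashes workers is eventually given up on, just like a job that repeatedly fails for other internal reasons.
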
## Installation

### Dependencies

The broker has no special dependencies, only the ones listed in the [[Common install|Overall architecture#common-install]] chapter.

### Clone broker source code repository

```
$ git clone https://github.com/ReCodEx/broker.git
$ cd broker
$ git submodule update --init
```

### Install broker

It is assumed that your current working directory is the one with the cloned broker source code.

- Prepare the environment by running `mkdir build && cd build`.
- Build the sources with `cmake ..` followed by `make -j#`, where `#` is the number of your CPU threads.
- Build the binary package with `make package` (may require root permissions).
  Note that `rpm` and `deb` packages are built at the same time. You may need to have the `rpmbuild` command (usually in the `rpmbuild` or `rpm` package) or edit the CPACK_GENERATOR variable in the _CMakeLists.txt_ file in the root of the source code tree.
- Install the generated package through your package manager (`yum`, `dnf`, `dpkg`).

_Note:_ If you don't want to generate binary packages, you can just install the project with `make install` (as root). However, installing through your distribution's package manager is the preferred way to keep your system clean and manageable in the long term.

## Configuration and usage

The following text describes how to set up and run the **broker** program. It assumes the required binaries are already installed; for instructions see the [[Installation|Broker#installation]] section. Using systemd is recommended for the best user experience, but it is not required. Almost all modern Linux distributions use systemd now.

Installing the **broker** program performs the following steps on your computer:

- creates the config file `/etc/recodex/broker/config.yml`
- creates the _systemd_ unit file `/etc/systemd/system/recodex-broker.service`
- puts the main binary in `/usr/bin/recodex-broker`
- creates the system user and group `recodex` with a nologin shell (if they do not exist)
- creates the log directory `/var/log/recodex`
- sets ownership of the config (`/etc/recodex`) and log (`/var/log/recodex`) directories to the `recodex` user and group

### Default broker configuration

#### Configuration items

Mandatory items are bold, optional italic.

- _clients_ - specifies address and port to bind for clients (e.g. frontends)
    - _address_ - hostname or IP address as string (`*` for any)
    - _port_ - desired port
- _workers_ - specifies address and port to bind for workers
    - _address_ - hostname or IP address as string (`*` for any)
    - _port_ - desired port
    - _max_liveness_ - maximum number of pings the worker can fail to send before it is considered disconnected
    - _max_request_failures_ - maximum number of times a job can fail (due to e.g. a worker disconnect or a network error when downloading something from the fileserver) and be assigned again
- _monitor_ - settings of the monitor service connection
    - _address_ - IP address of the running monitor service
    - _port_ - desired port
- _notifier_ - details of the connection used to report errors and other noteworthy states
    - _address_ - address where the frontend API runs
    - _port_ - desired port
    - _username_ - username which can be used for HTTP authentication
    - _password_ - password which can be used for HTTP authentication
- _logger_ - settings of logging capabilities
    - _file_ - path to the log file, with the file name given without a suffix. `/var/log/recodex/broker` will produce `broker.log`, `broker.1.log`, ...
    - _level_ - level of logging, one of `off`, `emerg`, `alert`, `critical`, `err`, `warn`, `notice`, `info` and `debug`
    - _max-size_ - maximum size of the log file before rotating
    - _rotations_ - number of rotations kept

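To make the meaning of `max_liveness` more concrete, here is a minimal Python sketch of a ping counter. It is only an illustration under assumptions: the `LivenessTracker` class and its method names are hypothetical, and it assumes the broker decrements a per-worker counter once per expected ping interval and resets it whenever a ping arrives.

```python
class LivenessTracker:
    def __init__(self, max_liveness):
        self.max_liveness = max_liveness
        self.remaining = {}  # worker name -> missed pings still tolerated

    def register(self, worker):
        self.remaining[worker] = self.max_liveness

    def on_ping(self, worker):
        # The worker is alive, so its counter is reset to the maximum.
        self.remaining[worker] = self.max_liveness

    def on_timer_tick(self):
        """Call once per expected ping interval; returns workers deemed dead."""
        dead = []
        for worker in list(self.remaining):
            self.remaining[worker] -= 1
            if self.remaining[worker] <= 0:
                dead.append(worker)
                del self.remaining[worker]
        return dead


tracker = LivenessTracker(max_liveness=2)
tracker.register("a")
tracker.register("b")
first = tracker.on_timer_tick()   # nobody has run out of liveness yet
tracker.on_ping("a")              # only worker "a" answers
second = tracker.on_timer_tick()  # worker "b" missed two intervals in a row
```

A higher `max_liveness` tolerates longer network hiccups at the cost of detecting crashed workers later.
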
#### Example config file

```{.yml}
# Address and port for clients (frontend)
clients:
    address: "*"
    port: 9658
# Address and port for workers
workers:
    address: "*"
    port: 9657
    max_liveness: 10
    max_request_failures: 3
monitor:
    address: "127.0.0.1"
    port: 7894
notifier:
    address: "127.0.0.1"
    port: 8080
    username: ""
    password: ""
logger:
    file: "/var/log/recodex/broker"  # w/o suffix - actual names will be broker.log, broker.1.log, ...
    level: "debug"  # level of logging
    max-size: 1048576  # 1 MB; max size of file before log rotation
    rotations: 3  # number of rotations kept
```

### Running broker

Running the broker is very similar to the worker setup. There is only one broker per ReCodEx installation, so there is no need for systemd templates. Starting the broker is therefore just:

```
# systemctl start recodex-broker.service
```

For more information, please refer to the worker part of this page or the systemd documentation.