Detailed communication inside the ReCodEx project is captured in the following image and described in sections below. Red connections are through ZeroMQ sockets, blue are through WebSockets and green are through HTTP(S). All ZeroMQ messages are sent as multipart with one string (command, option) per part, with no empty frames (unles explicitly specified otherwise).
Detailed communication inside the ReCodEx system is captured in the following
image and described in sections below. Red connections are through ZeroMQ
sockets, blue are through WebSockets and green are through HTTP(S). All ZeroMQ
messages are sent as multipart with one string (command, option) per part, with
no empty frames (unles explicitly specified otherwise).
@ -88,26 +92,40 @@ interval is configurable on both sides -- future releases might let the worker
send its ping interval with the **init** command). Upon receiving a **ping**
command, the broker responds with **pong**.
Both sides keep track of missing heartbeating messages since the last one was
received. When this number reaches a threshold (called maximum liveness), the
other side is considered dead.
Whenever a heartbeating message doesn't arrive, a counter called _liveness_ is
decreased. When this counter drops to zero, the other side is considered
disconnected. When a message arrives, the liveness counter is set back to its
maximum value, which is configurable for both sides.
When the broker decides a worker died, it tries to reschedule its jobs to other
workers.
When the broker decides a worker disconnected, it tries to reschedule its jobs
to other workers.
If a worker thinks the broker is dead, it tries to reconnect with a bounded,
exponentially increasing delay.
This protocol proved great robustness in real world testing. Thus whole backend is really reliable and can outlive short term issues with connection without problems. Also, increasing delay of ping messages does not flood the network when there are problems. We experienced no issues since we are using this protocol.
If a worker thinks the broker crashed, it tries to reconnect periodically, with
a bounded, exponentially increasing delay.
This protocol proved great robustness in real world testing. Thus whole backend
is reliable and can outlive short term issues with connection without problems.
Also, increasing delay of ping messages does not flood the network when there
are problems. We experienced no issues since we are using this protocol.
### Worker - File Server communication
Worker is communicating with file server only from _execution thread_. Supported protocol is HTTP optionally with SSL encryption (**recommended**, you can get free trusted DV certificate from [Let's Encrypt](https://letsencrypt.org/) authority if you have not one yet). If supported by server and used version of libcurl, HTTP/2 standard is also available. File server should be set up to require basic HTTP authentication and worker is capable to send corresponding credentials with each request.
Worker is communicating with file server only from _execution thread_. Supported
protocol is HTTP optionally with SSL encryption (**recommended**). If supported
by server and used version of libcurl, HTTP/2 standard is also available. File
server should be set up to require basic HTTP authentication and worker is
capable to send corresponding credentials with each request.
#### Worker side
Worker is cabable of 2 things -- download file and upload file. Internally, worker is using libcurl C library with very similar setup. In both cases it can verify HTTPS certificate (on Linux against system cert list, on Windows against downloaded one from CURL website during installation), support basic HTTP authentication, offer HTTP/2 with fallback to HTTP/1.1 and fail on error (returned HTTP status code is >= 400). Worker have list of credentials to all available file servers in its config file.
Workers comunicate with the file server in both directions -- they download
student's submissions and then upload evaluation results. Internally, worker is
using libcurl C library with very similar setup. In both cases it can verify
HTTPS certificate (on Linux against system cert list, on Windows against
downloaded one from CURL website during installation), support basic HTTP
authentication, offer HTTP/2 with fallback to HTTP/1.1 and fail on error
(returned HTTP status code is >=400). Worker have list of credentials to all
available file servers in its config file.
- download file -- standard HTTP GET request to given URL expecting file content as response
- upload file -- standard HTTP PUT request to given URL with file data as body -- same as command line tool `curl` with option `--upload-file`