diff --git a/Overall-architecture.md b/Overall-architecture.md
index 7906e91..1b68954 100644
--- a/Overall-architecture.md
+++ b/Overall-architecture.md
@@ -11,7 +11,11 @@
 
 ## Communication
 
-Detailed communication inside the ReCodEx project is captured in the following image and described in sections below. Red connections are through ZeroMQ sockets, blue are through WebSockets and green are through HTTP(S). All ZeroMQ messages are sent as multipart with one string (command, option) per part, with no empty frames (unles explicitly specified otherwise).
+Detailed communication inside the ReCodEx system is captured in the following
+image and described in sections below. Red connections are through ZeroMQ
+sockets, blue are through WebSockets and green are through HTTP(S). All ZeroMQ
+messages are sent as multipart with one string (command, option) per part, with
+no empty frames (unless explicitly specified otherwise).
 
 ![Communication schema](https://github.com/ReCodEx/wiki/raw/master/images/Backend_Connections.png)
 
@@ -88,26 +92,40 @@ interval is configurable on both sides -- future releases might let the worker
 send its ping interval with the **init** command). Upon receiving a **ping**
 command, the broker responds with **pong**.
 
-Both sides keep track of missing heartbeating messages since the last one was
-received. When this number reaches a threshold (called maximum liveness), the
-other side is considered dead.
+Whenever a heartbeating message doesn't arrive, a counter called _liveness_ is
+decreased. When this counter drops to zero, the other side is considered
+disconnected. When a message arrives, the liveness counter is set back to its
+maximum value, which is configurable for both sides.
 
-When the broker decides a worker died, it tries to reschedule its jobs to other
-workers.
+When the broker decides a worker disconnected, it tries to reschedule its jobs
+to other workers.
 
-If a worker thinks the broker is dead, it tries to reconnect with a bounded,
-exponentially increasing delay.
-
-This protocol proved great robustness in real world testing. Thus whole backend is really reliable and can outlive short term issues with connection without problems. Also, increasing delay of ping messages does not flood the network when there are problems. We experienced no issues since we are using this protocol.
+If a worker thinks the broker crashed, it tries to reconnect periodically, with
+a bounded, exponentially increasing delay.
+This protocol proved great robustness in real world testing. Thus whole backend
+is reliable and can outlive short term issues with connection without problems.
+Also, increasing delay of ping messages does not flood the network when there
+are problems. We experienced no issues since we are using this protocol.
 
 ### Worker - File Server communication
 
-Worker is communicating with file server only from _execution thread_. Supported protocol is HTTP optionally with SSL encryption (**recommended**, you can get free trusted DV certificate from [Let's Encrypt](https://letsencrypt.org/) authority if you have not one yet). If supported by server and used version of libcurl, HTTP/2 standard is also available. File server should be set up to require basic HTTP authentication and worker is capable to send corresponding credentials with each request.
+Worker is communicating with file server only from _execution thread_. Supported
+protocol is HTTP optionally with SSL encryption (**recommended**). If supported
+by server and used version of libcurl, HTTP/2 standard is also available. File
+server should be set up to require basic HTTP authentication and worker is
+capable to send corresponding credentials with each request.
 
 #### Worker side
 
-Worker is cabable of 2 things -- download file and upload file. Internally, worker is using libcurl C library with very similar setup. In both cases it can verify HTTPS certificate (on Linux against system cert list, on Windows against downloaded one from CURL website during installation), support basic HTTP authentication, offer HTTP/2 with fallback to HTTP/1.1 and fail on error (returned HTTP status code is >= 400). Worker have list of credentials to all available file servers in its config file.
+Workers communicate with the file server in both directions -- they download
+student's submissions and then upload evaluation results. Internally, worker is
+using libcurl C library with very similar setup. In both cases it can verify
+HTTPS certificate (on Linux against system cert list, on Windows against
+downloaded one from CURL website during installation), support basic HTTP
+authentication, offer HTTP/2 with fallback to HTTP/1.1 and fail on error
+(returned HTTP status code is >=400). Worker have list of credentials to all
+available file servers in its config file.
 
 - download file -- standard HTTP GET request to given URL expecting file content as response
 - upload file -- standard HTTP PUT request to given URL with file data as body -- same as command line tool `curl` with option `--upload-file`