diff --git a/Rewritten-docs.md b/Rewritten-docs.md
index 5735bc3..6b648ea 100644
--- a/Rewritten-docs.md
+++ b/Rewritten-docs.md
@@ -866,10 +866,24 @@ distributing jobs that it receives from the frontend between them.
 
 #### Worker management
 
-@todo initialization - broker is fixed, workers connect to it
-
-@todo heartbeating - workers send ping, the inverse is possible, too (doesn't
-really matter)
+It is intended for the broker to be a fixed part of the backend infrastructure
+to which workers connect at will. Thanks to this design, workers can be added
+and removed when necessary (and possibly in an automated fashion), without
+changing the configuration of the broker. An alternative solution would be
+configuring a list of workers before startup, thus making them passive in the
+communication (in the sense that they just wait for incoming jobs instead of
+connecting to the broker). However, this approach comes with a notable
+administration overhead -- in addition to starting a worker, the administrator
+would have to update the worker list.
+
+Worker management must also take into account the possibility of worker
+disconnection, either because of a network or software failure (or termination).
+A common way to detect such events in distributed systems is to periodically
+send short messages to other nodes and expect a response. When these messages
+stop arriving, we presume that the other node encountered a failure. Both the
+broker and workers can be made responsible for initiating these exchanges and it
+seems that there are no differences stemming from this choice. We decided that
+the workers will be the active party that initiates the exchange.
 
 #### Scheduling
 
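
The worker management text in this change describes two behaviours: workers announce themselves by connecting to a fixed broker, and the broker presumes a worker has failed once its periodic messages stop arriving. The following is a minimal sketch of the broker-side bookkeeping that this implies, not the project's actual implementation. The class name `WorkerRegistry`, its methods, the string worker identifiers and the timeout value are all hypothetical; the text does not specify a message format, a transport or a timeout.

```python
import time
from typing import Callable, Dict, List


class WorkerRegistry:
    """Hypothetical broker-side table of workers that connect on their own."""

    def __init__(self, timeout: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._timeout = timeout  # seconds of silence tolerated (assumed value)
        self._clock = clock      # injectable so the sketch can be tested
        self._last_seen: Dict[str, float] = {}

    def on_message(self, worker_id: str) -> bool:
        """Record an initial connect or a ping from a worker.

        Returns True if the worker was not known before, so no
        preconfigured worker list is needed on the broker.
        """
        is_new = worker_id not in self._last_seen
        self._last_seen[worker_id] = self._clock()
        return is_new

    def expire(self) -> List[str]:
        """Remove and return workers silent for longer than the timeout."""
        now = self._clock()
        dead = [w for w, seen in self._last_seen.items()
                if now - seen > self._timeout]
        for worker_id in dead:
            del self._last_seen[worker_id]
        return sorted(dead)

    def alive(self) -> List[str]:
        """Return the identifiers of workers currently presumed alive."""
        return sorted(self._last_seen)
```

In this sketch the broker would call `on_message` for every message a worker sends and `expire` at regular intervals. Because the workers initiate the exchange, the broker only needs to track the time of the last message per worker and never has to contact a worker that it does not already know.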