@@ -866,10 +866,24 @@ distributing jobs that it receives from the frontend between them.
#### Worker management
It is intended for the broker to be a fixed part of the backend infrastructure
to which workers connect at will. Thanks to this design, workers can be added
and removed when necessary (and possibly in an automated fashion) without
changing the configuration of the broker. An alternative solution would be to
configure a list of workers before startup, thus making them passive in the
communication (in the sense that they would just wait for incoming jobs
instead of connecting to the broker). However, this approach comes with a
notable administration overhead -- in addition to starting a worker, the
administrator would have to update the worker list.

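The dynamic registration described above can be sketched as follows. This is a
minimal illustration only; the class, message, and header names are
hypothetical and not taken from the actual implementation:

```python
class Broker:
    """Maintains a dynamic registry of workers -- no list is configured
    up front. Workers are the active party: they connect and announce
    themselves, so they can be added and removed at runtime without
    touching the broker's configuration. (All names are illustrative.)"""

    def __init__(self):
        self.workers = {}  # worker id -> metadata (e.g. supported runtimes)

    def on_init(self, worker_id, headers):
        # An unknown worker becomes eligible for jobs immediately;
        # a known worker re-registering just refreshes its metadata.
        self.workers[worker_id] = headers

    def on_disconnect(self, worker_id):
        # Removal is equally configuration-free.
        self.workers.pop(worker_id, None)


broker = Broker()
broker.on_init("worker-1", {"env": "c-gcc-linux"})
broker.on_init("worker-2", {"env": "python3"})
broker.on_disconnect("worker-1")  # only worker-2 remains registered
```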
Worker management must also take into account the possibility of worker
disconnection, whether caused by a network failure, a software failure, or
termination. A common way to detect such events in distributed systems is to
periodically send short messages to other nodes and expect a response. When
these messages stop arriving, we presume that the other node has encountered a
failure. Either the broker or the workers could be made responsible for
initiating these exchanges; the choice seems to make no practical difference.
We decided that the workers will be the active party that initiates the
exchange.

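This failure-detection scheme can be sketched as a broker-side monitor that
records the time of each worker's last ping and presumes failure after a few
silent intervals. The interval and threshold values below are illustrative
assumptions, not taken from any particular implementation:

```python
class HeartbeatMonitor:
    """Broker-side liveness tracking for worker-initiated pings.

    A worker is presumed to have failed once more than `liveness`
    ping intervals pass without a message from it."""

    def __init__(self, interval=1.0, liveness=3):
        self.timeout = interval * liveness
        self.last_seen = {}  # worker id -> timestamp of last ping

    def on_ping(self, worker_id, now):
        self.last_seen[worker_id] = now

    def expired(self, now):
        # Workers whose pings stopped arriving are presumed failed.
        return [w for w, t in self.last_seen.items() if now - t > self.timeout]


monitor = HeartbeatMonitor(interval=1.0, liveness=3)
monitor.on_ping("worker-1", now=0.0)
monitor.on_ping("worker-2", now=2.5)
monitor.expired(now=4.0)  # -> ["worker-1"]; worker-2 pinged recently enough
```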
#### Scheduling