some juicy information on heartbeating

master
Teyras 9 years ago
parent 826cf6f8d8
commit a642941556

@ -64,7 +64,10 @@ Broker is server when comminicating with worker. IP address and port are configu
Commands from broker to worker: Commands from broker to worker:
- **eval** - evaluate a job. See **eval** command in [[Communication#main-communication]] - **eval** - evaluate a job. See **eval** command in [[Communication#main-communication]]
- **intro** - introduce yourself to the broker (with **init** command) - **intro** - introduce yourself to the broker (with **init** command) - this is
required when the broker loses track of the worker who sent the command.
Possible reasons for such event are e.g. that one of the communicating sides
shut down and restarted whithout the other side noticing.
- **pong** - reply to **ping** command, no arguments - **pong** - reply to **ping** command, no arguments
Commands from worker to broker: Commands from worker to broker:
@ -76,6 +79,25 @@ Commands from worker to broker:
- **progress** - evaluation progress report, see **progress** command in [[Communication#progress-callback]] - **progress** - evaluation progress report, see **progress** command in [[Communication#progress-callback]]
- **ping** - tell broker I'm alive, no arguments - **ping** - tell broker I'm alive, no arguments
### Heartbeating
It is important for the broker and workers to know if the other side is still
working (and connected). This is achieved with a simple heartbeating protocol.
The protocol requires the workers to send a **ping** command regularly (the
interval is configurable on both sides - future releases might let the worker
send its ping interval with the **init** command). Upon receiving a **ping**
command, the broker responds with **pong**.
Both sides keep track of missing heartbeating messages since the last one was
received. When this number reaches a threshold (called maximum liveness), the
other side is considered dead.
When the broker decides a worker died, it tries to reschedule its jobs to other
workers.
If a worker thinks the broker is dead, it tries to reconnect with a bounded,
exponentially increasing delay.
## Worker - File Server communication ## Worker - File Server communication

Loading…
Cancel
Save