
Communication

This section gives a detailed overview of communication in the ReCodEx solution. The basic concept is captured in the following image:

Communication Img

Red connections use ZeroMQ sockets, blue connections use WebSockets and green connections use HTTP. All ZeroMQ messages are sent as multipart messages with one string (command or option) per part and no empty frames, unless explicitly specified otherwise.
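The framing rule can be illustrated with a small Python sketch. The helper names `to_frames` and `from_frames` are hypothetical, and ASCII encoding is an assumption based on the ASCII job ids used below. The resulting list of byte strings has the shape that pyzmq's `Socket.send_multipart()` accepts and `Socket.recv_multipart()` returns.

```python
from typing import List


def to_frames(*parts: str) -> List[bytes]:
    """Encode a command and its options as one frame (bytes) per part."""
    frames = [part.encode("ascii") for part in parts]
    # The protocol forbids empty frames unless a command explicitly requires one,
    # so this general-purpose helper refuses them.
    if any(not frame for frame in frames):
        raise ValueError("empty frames are not allowed here")
    return frames


def from_frames(frames: List[bytes]) -> List[str]:
    """Decode received frames back into a list of strings."""
    return [frame.decode("ascii") for frame in frames]


frames = to_frames("progress", "job42", "STARTED")
print(frames)  # [b'progress', b'job42', b'STARTED']
```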

Internal worker communication

Communication between the two worker threads is split into two separate parts, each with its own dedicated connection. These internal connections are realized with ZeroMQ inproc PAIR sockets. In this section, the worker thread that communicates with the broker is called the listening thread, and the other one, which evaluates incoming jobs, is called the job thread. The listening thread is the server in both cases (it calls the bind() method), but due to how ZeroMQ works this is not very important: a client's connect() call can precede the server's bind() call without any issue.

Main communication

The main communication goes through the inproc://jobs sockets. The listening thread waits for messages on all of its sockets (broker, jobs and progress) and handles incoming requests accordingly.

Commands from listening thread to job thread:

  • eval - evaluate a job. Requires 3 arguments:
    • job_id - identifier of this job (in ASCII representation, which avoids endianness issues and also allows alphabetic ids)
    • job_url - URI location of archive with job configuration and submitted source code
    • result_url - remote URI where results will be pushed to

Commands from job thread to listening thread:

  • done - notification of a finished job. Requires 2 arguments:
    • job_id - identifier of finished job
    • result - response result, one of "OK" and "ERR"
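The eval/done exchange between the two threads can be sketched in Python. This is only an illustration: two `queue.Queue` objects stand in for the bidirectional inproc://jobs PAIR connection, and `evaluate`, `job_thread` and `dispatch` are hypothetical names. A real listening thread would also keep polling the broker and progress sockets instead of blocking on the reply.

```python
import queue
import threading
from typing import List


def evaluate(job_url: str, result_url: str) -> bool:
    """Hypothetical stand-in for fetching, evaluating and uploading a job."""
    return job_url.startswith("http://") and result_url.startswith("http://")


def job_thread(inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Job thread: take one 'eval' command and answer with 'done'."""
    _command, job_id, job_url, result_url = inbox.get()
    result = "OK" if evaluate(job_url, result_url) else "ERR"
    outbox.put(["done", job_id, result])


def dispatch(eval_msg: List[str]) -> List[str]:
    """Listening thread: forward 'eval' to the job thread and wait for 'done'."""
    if len(eval_msg) != 4 or eval_msg[0] != "eval":
        raise ValueError("eval requires job_id, job_url and result_url")
    # Two queues play the role of the inproc://jobs PAIR connection.
    to_job: queue.Queue = queue.Queue()
    to_listener: queue.Queue = queue.Queue()
    worker = threading.Thread(target=job_thread, args=(to_job, to_listener))
    worker.start()
    to_job.put(eval_msg)
    reply = to_listener.get()
    worker.join()
    return reply


print(dispatch(["eval", "job42", "http://fs.example/job42.zip",
                "http://fs.example/results/job42"]))
# ['done', 'job42', 'OK']
```

The argument count is checked on the listening side before forwarding, so the job thread can assume a well-formed eval command.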

Note that we will need to store the job ID and the assignment configuration somewhere close to the submitted files so it's possible to check how a submission was evaluated. The job ID will likely be a part of the submission's path. The configuration could be linked there under some well-known name.

Progress callback

Progress messages are sent through the inproc://progress sockets. This is one-way communication only, from the job thread to the listening thread.

Commands:

  • progress - notice about evaluation progress. Requires 2 or 4 arguments:
    • job_id - identifier of current job
    • state - what is happening now. One of "DOWNLOADED" (submission successfully fetched), "UPLOADED" (results are uploaded to the file server), "STARTED" (evaluation started), "ENDED" (evaluation is finished) and "TASK" (task state changed, see below)
    • task_id - only present for "TASK" state - identifier of task in current job
    • task_state - only present for "TASK" state - result of task evaluation. One of "COMPLETED" and "FAILED".
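The "2 or 4 arguments" rule can be checked mechanically. The following Python sketch is a hypothetical validator (the name `parse_progress` and the tuple return shape are assumptions) that accepts exactly the combinations listed above.

```python
from typing import List, Optional, Tuple

JOB_STATES = {"DOWNLOADED", "UPLOADED", "STARTED", "ENDED", "TASK"}
TASK_STATES = {"COMPLETED", "FAILED"}


def parse_progress(frames: List[str]) -> Tuple[str, str, Optional[str], Optional[str]]:
    """Validate a 'progress' message and return (job_id, state, task_id, task_state)."""
    if not frames or frames[0] != "progress":
        raise ValueError("not a progress message")
    args = frames[1:]
    # Two arguments: job_id and a job-level state.
    if len(args) == 2 and args[1] in JOB_STATES - {"TASK"}:
        return args[0], args[1], None, None
    # Four arguments: only for the "TASK" state, with a valid task result.
    if len(args) == 4 and args[1] == "TASK" and args[3] in TASK_STATES:
        return args[0], args[1], args[2], args[3]
    raise ValueError(f"malformed progress message: {frames}")


print(parse_progress(["progress", "job42", "TASK", "compile", "COMPLETED"]))
# ('job42', 'TASK', 'compile', 'COMPLETED')
```

The task id "compile" in the usage line is made up for illustration.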

Broker - Worker communication

The broker is the server when communicating with workers. The IP address and port are configurable; the protocol is TCP. The worker socket is of DEALER type and the broker socket is of ROUTER type. Because of that, the very first part of every (multipart) message from the broker to a worker must be the target worker's socket identity (which the broker saves when it receives the worker's init command).
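A ROUTER socket prepends the sender's identity frame to every incoming message and expects an identity frame in front of every outgoing one. The Python sketch below models that envelope handling on plain lists; the class `WorkerTable` and the identity `b"worker-1"` are hypothetical, and keying the table by identity is an assumption about how the broker stores workers.

```python
from typing import Dict, List


class WorkerTable:
    """Hypothetical broker-side map from worker socket identity to hwgroup."""

    def __init__(self) -> None:
        self.hwgroups: Dict[bytes, str] = {}

    def on_receive(self, frames: List[bytes]) -> List[bytes]:
        """A ROUTER socket prepends the sender's identity frame; strip it off."""
        identity, payload = frames[0], frames[1:]
        if len(payload) >= 2 and payload[0] == b"init":
            # Remember the identity so later messages can be routed back.
            self.hwgroups[identity] = payload[1].decode("ascii")
        return payload

    def frames_to(self, identity: bytes, *parts: str) -> List[bytes]:
        """Build a message for one worker: its identity must come first."""
        if identity not in self.hwgroups:
            raise KeyError("unknown worker (no init received yet)")
        return [identity] + [part.encode("ascii") for part in parts]


table = WorkerTable()
table.on_receive([b"worker-1", b"init", b"group_1", b"env=c"])
print(table.frames_to(b"worker-1", "pong"))  # [b'worker-1', b'pong']
```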

Commands from broker to worker:

  • eval - evaluate a job. See eval command in Communication#main-communication
  • intro - introduce yourself to the broker (with init command)
  • pong - reply to ping command, no arguments

Commands from worker to broker:

  • init - introduce yourself to the broker. Useful on startup or after reestablishing a lost connection. Requires at least two arguments:
    • hwgroup - hardware group of this worker
    • header - additional header describing worker capabilities. The format must be header_name=value, and every header shall be in a separate message frame. There is no upper limit on the number of headers.
  • done - job evaluation finished, see done command in Communication#main-communication.
  • progress - evaluation progress report, see progress command in Communication#progress-callback
  • ping - tell broker I'm alive, no arguments
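The layout of the init command can be sketched in Python as a pair of hypothetical helpers. The header names `env` and `threads` and the hardware group `group_1` are invented examples, not values defined by ReCodEx. Because the worker uses a DEALER socket, it sends these frames without any identity prefix; `parse_init` assumes the broker has already stripped the identity frame added by its ROUTER socket.

```python
from typing import Dict, List, Tuple


def build_init(hwgroup: str, headers: Dict[str, str]) -> List[str]:
    """Frames of the worker's init command: hwgroup, then one header per frame."""
    if not headers:
        # "At least two arguments": hwgroup plus at least one header.
        raise ValueError("init needs at least one header")
    frames = ["init", hwgroup]
    for name, value in headers.items():
        if not name or "=" in name:
            raise ValueError(f"invalid header name: {name!r}")
        frames.append(f"{name}={value}")
    return frames


def parse_init(frames: List[str]) -> Tuple[str, Dict[str, str]]:
    """Broker side: recover hwgroup and headers (identity frame already removed)."""
    if len(frames) < 3 or frames[0] != "init":
        raise ValueError("malformed init message")
    headers = {}
    for header in frames[2:]:
        name, sep, value = header.partition("=")
        if not sep:
            raise ValueError(f"header without '=': {header!r}")
        headers[name] = value
    return frames[1], headers


msg = build_init("group_1", {"env": "c", "threads": "2"})
print(msg)  # ['init', 'group_1', 'env=c', 'threads=2']
```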

Worker - File Server communication

TODO:

Broker - Monitor communication

The broker also communicates with the monitor through ZeroMQ over TCP. The socket type is the same on both sides: ROUTER. The monitor is the server in this communication; its IP address and port are configurable in the monitor's config file. The ZeroMQ socket identity (set on the monitor's side) is "recodex-monitor" and must be sent as the first frame of every multipart message; see the ZeroMQ ROUTER socket documentation for more info.
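On the broker side, addressing the monitor amounts to prefixing every message with the fixed identity. The Python sketch below assumes a "progress" payload (the broker-to-monitor commands are not listed yet), and `frames_for_monitor` is a hypothetical name. The monitor presumably assigns its identity through the `ZMQ_ROUTING_ID` socket option (formerly `ZMQ_IDENTITY`) before binding. Note that, unless `ZMQ_ROUTER_MANDATORY` is enabled, a ROUTER socket silently drops messages addressed to an identity that is not connected.

```python
from typing import List

# Identity the monitor assigns to its ROUTER socket (from the text above).
MONITOR_IDENTITY = b"recodex-monitor"


def frames_for_monitor(*parts: str) -> List[bytes]:
    """Broker side: prefix a message with the monitor's identity so ROUTER routes it."""
    payload = [part.encode("ascii") for part in parts]
    if not payload or any(not frame for frame in payload):
        raise ValueError("message must be non-empty and contain no empty frames")
    return [MONITOR_IDENTITY] + payload


print(frames_for_monitor("progress", "job42", "STARTED"))
# [b'recodex-monitor', b'progress', b'job42', b'STARTED']
```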

The monitor is treated as a somewhat optional part of the whole solution, so no special effort was made to ensure communication reliability.

Commands from monitor to broker:

There are no commands yet. Any message from the monitor to the broker is logged and discarded.

Commands from broker to monitor:

Broker - Frontend communication

The broker communicates with the frontend through a ZeroMQ connection over TCP. The socket type is ROUTER on the broker side and REQ on the frontend side. The broker has the server role; its IP address and port are configurable in the frontend.

Commands from frontend to broker:

  • eval - evaluate a job. Requires 3 arguments (job_id, job_url and result_url), with any number of headers followed by an empty frame between job_id and job_url, in this order:
    • job_id - identifier of this job (in ASCII representation, which avoids endianness issues and also allows alphabetic ids)
    • header - additional header describing the worker capabilities this job requires. The format must be header_name=value, and every header shall be in a separate message frame. There is no upper limit on the number of headers.
    • empty frame (with empty string), marking the end of the headers
    • job_url - URI location of archive with job configuration and submitted source code
    • result_url - remote URI where results will be pushed to
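The empty frame is what lets the broker find the end of the variable-length header list. The Python sketch below parses the payload of a frontend eval request; `EvalRequest` and `parse_frontend_eval` are hypothetical names. It assumes the ROUTER envelope has already been removed: a ROUTER socket receives the REQ peer's identity and the empty delimiter frame that REQ adds before the payload.

```python
from typing import Dict, List, NamedTuple


class EvalRequest(NamedTuple):
    job_id: str
    headers: Dict[str, str]
    job_url: str
    result_url: str


def parse_frontend_eval(frames: List[str]) -> EvalRequest:
    """Split 'eval, job_id, headers..., "", job_url, result_url' into its parts."""
    if len(frames) < 5 or frames[0] != "eval":
        raise ValueError("malformed eval request")
    try:
        sep = frames.index("", 2)  # the empty frame ends the header list
    except ValueError:
        raise ValueError("missing empty frame after headers") from None
    headers = {}
    for header in frames[2:sep]:
        name, eq, value = header.partition("=")
        if not eq:
            raise ValueError(f"header without '=': {header!r}")
        headers[name] = value
    rest = frames[sep + 1:]
    if len(rest) != 2:
        raise ValueError("expected job_url and result_url after the empty frame")
    return EvalRequest(frames[1], headers, rest[0], rest[1])


print(parse_frontend_eval(["eval", "42", "env=c", "", "http://fs/j.zip", "http://fs/r"]))
# EvalRequest(job_id='42', headers={'env': 'c'}, job_url='http://fs/j.zip', result_url='http://fs/r')
```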

Commands from broker to frontend (all are responses to eval command):

  • accept - the broker is capable of routing the request to a worker
  • reject - broker can't handle this job (for example when the requirements specified by the headers cannot be met)
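One possible accept/reject rule is sketched below in Python. The matching semantics are an assumption made for illustration: a request is accepted when at least one known worker advertises every required header with exactly the same value. The real broker may compare headers differently (for example, numeric limits), and `choose_reply` is a hypothetical name.

```python
from typing import Dict, List


def choose_reply(required: Dict[str, str], workers: List[Dict[str, str]]) -> str:
    """Return 'accept' if some worker advertises every required header, else 'reject'."""
    for capabilities in workers:
        # Exact string match per header (an assumption of this sketch).
        if all(capabilities.get(name) == value for name, value in required.items()):
            return "accept"
    return "reject"


workers = [{"env": "python"}, {"env": "c", "threads": "2"}]
print(choose_reply({"env": "c"}, workers))     # accept
print(choose_reply({"env": "java"}, workers))  # reject
```

With no registered workers the function always rejects, since nothing could evaluate the job.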

File Server - Frontend communication

TODO:

Monitor - Browser communication

The monitor interacts with browsers through WebSocket connections. The monitor acts as the server and browsers connect to it; its IP address and port are also configurable. When a client connects to the monitor, it sends a message with the string representation of a channel id (the channel whose messages it is interested in, usually the id of the job being evaluated). There can be at most one listener per channel; a later connection replaces the previous one. After the connection is established, the monitor sends the message "Connection established" to the browser.

When the monitor receives a "progress" message from the broker, there are two cases:

  • there is no WebSocket connection for the listed channel (job id) - the message is dropped
  • there is an active WebSocket connection for the listed channel - the message is converted into JSON format (see below) and sent as a string to the browser. Messages for active connections are queued, so no messages are discarded even under heavy workload.
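The channel bookkeeping described above can be sketched in Python. `ChannelRegistry` is a hypothetical class, and the listener objects stand in for WebSocket connections. Keeping already queued messages when a new listener replaces an old one is a design choice of this sketch, consistent with the goal of not discarding messages on active channels.

```python
from collections import deque
from typing import Deque, Dict


class ChannelRegistry:
    """Sketch of the monitor's per-channel listeners and message queues."""

    def __init__(self) -> None:
        self.listeners: Dict[str, object] = {}
        self.queues: Dict[str, Deque[str]] = {}

    def subscribe(self, channel_id: str, listener: object) -> str:
        # At most one listener per channel: a later connection replaces the earlier one.
        self.listeners[channel_id] = listener
        self.queues.setdefault(channel_id, deque())
        return "Connection established"

    def publish(self, channel_id: str, message: str) -> bool:
        """Queue a message for the channel's listener; drop it if nobody listens."""
        if channel_id not in self.listeners:
            return False  # no WebSocket connection for this channel: dropped
        self.queues[channel_id].append(message)
        return True


registry = ChannelRegistry()
print(registry.subscribe("job42", "browser-a"))  # Connection established
print(registry.publish("job7", "{}"))            # False
```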

The message JSON format is a dictionary with the following keys:

  • command - type of progress. One of "STARTED" (evaluation started), "DOWNLOADED" (submission source downloaded), "TASK" (progress on one of the tasks), "UPLOADED" (results are uploaded), "ENDED" (evaluation ended)
  • task_id - id of currently evaluated task. Present only if command is "TASK".
  • task_state - state of task with id task_id. Present only if command is "TASK".
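The conversion into this JSON format can be sketched in Python. The sketch assumes that the broker forwards progress messages to the monitor in the same layout as the worker's progress command (progress, job_id, state and, for "TASK", task_id and task_state). The function name `progress_to_json` is hypothetical.

```python
import json
from typing import Dict, List, Tuple


def progress_to_json(frames: List[str]) -> Tuple[str, str]:
    """Return (channel_id, JSON string) for a broker 'progress' message."""
    if len(frames) not in (3, 5) or frames[0] != "progress":
        raise ValueError("malformed progress message")
    channel_id, state = frames[1], frames[2]
    # task_id and task_state are present exactly when the state is "TASK".
    if (state == "TASK") != (len(frames) == 5):
        raise ValueError("task fields must accompany exactly the TASK state")
    message: Dict[str, str] = {"command": state}
    if state == "TASK":
        message["task_id"] = frames[3]
        message["task_state"] = frames[4]
    return channel_id, json.dumps(message)


print(progress_to_json(["progress", "job42", "TASK", "t1", "COMPLETED"]))
# ('job42', '{"command": "TASK", "task_id": "t1", "task_state": "COMPLETED"}')
```

The returned channel id is the job id, which selects the WebSocket listener that receives the JSON string.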

Frontend - Browser communication

TODO: