communication

8 years ago · 597060e9c3
parent 7cecbabdc2
commit 597060e9c3
1 changed files with 224 additions and 1 deletions
--- a/Overall-architecture.md
+++ b/Overall-architecture.md
@@ -2,8 +2,231 @@
 ## Components
 TODO:
 ## Communication
-TODO:
+This section gives a detailed overview of communication in the ReCodEx solution. The basic concept is captured in the following image:
 ![Communication Img](https://github.com/ReCodEx/GlobalWiki/raw/master/images/Backend_Connections.png)
 Red connections use ZeroMQ sockets, blue ones use WebSockets, and green ones use HTTP. All ZeroMQ messages are sent as multipart messages with one string (command or option) per part and no empty frames (unless explicitly specified otherwise).
 ### Internal worker communication
 Communication between the two worker threads is split into two separate parts, 
 each with its own dedicated connection. These internal connections are implemented 
 with ZeroMQ inproc PAIR sockets. In this section, the worker thread that 
 communicates with the broker is called the _listening thread_ and the other 
 one, which evaluates incoming jobs, is called the _execution thread_. The _listening 
 thread_ is the server in both cases (the side that calls the `bind()` method), but 
 because of how ZeroMQ works, this is not very important (a `connect()` call in 
 a client can precede the server's `bind()` call with no issue).
 #### Main communication
 Main communication runs over `inproc://jobs` sockets. The _listening thread_ waits 
 for messages on all its sockets (broker, jobs and progress) and passes incoming 
 requests to the _execution thread_, which handles them.
 Commands from _listening thread_ to _execution thread_:
 - **eval** - evaluate a job. Requires 3 message frames:
    - `job_id` - identifier of this job (in ASCII representation -- we avoid endianness issues and also support alphabetic ids)
    - `job_url` - URI location of archive with job configuration and submitted source code
    - `result_url` - remote URI where results will be pushed to
 Commands from _execution thread_ to _listening thread_:
 - **done** - notification of a finished job. Requires 2 or 3 message frames:
    - `job_id` - identifier of finished job
    - `result` - response result, possible values below
        - OK - everything ok
        - FAILED - execution failed and cannot be reassigned to another worker (due to error in configuration for example)
         - INTERNAL_ERROR - execution failed due to an internal worker error, but another worker may be able to execute the job without error
    - `message` - non-empty error description if result was not "OK"
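The frame layout of these two commands can be sketched in Python as follows. This is only an illustration, not the worker's actual code: the helper names `build_eval` and `build_done` are hypothetical, and it assumes that the command name travels as the first frame and that the `message` frame is left out when the result is "OK".

```python
from typing import List, Optional

# Result values listed for the done command.
DONE_RESULTS = ("OK", "FAILED", "INTERNAL_ERROR")


def build_eval(job_id: str, job_url: str, result_url: str) -> List[bytes]:
    """Return the frames of an eval command (command name + 3 arguments)."""
    return [s.encode("utf-8") for s in ("eval", job_id, job_url, result_url)]


def build_done(job_id: str, result: str, message: Optional[str] = None) -> List[bytes]:
    """Return the frames of a done command (command name + 2 or 3 arguments)."""
    if result not in DONE_RESULTS:
        raise ValueError(f"unknown result: {result!r}")
    if result != "OK" and not message:
        raise ValueError("a non-empty message is required when result is not OK")
    frames = ["done", job_id, result]
    if result != "OK":
        # Assumption: the message frame is omitted for an OK result.
        frames.append(message)
    return [s.encode("utf-8") for s in frames]
```

With a Python ZeroMQ binding such as pyzmq, a list like this could be handed to `socket.send_multipart(frames)`, which sends one frame per list item.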
 #### Progress callback
 Progress messages are sent through `inproc://progress` sockets. This is one-way communication only, from the _execution thread_ to the _listening thread_.
 Commands:
 - **progress** - notice about evaluation progress. Requires 2 or 4 arguments:
    - `job_id` - identifier of current job
    - `state` - what is happening now. 
         - DOWNLOADED - submission successfully fetched from fileserver
        - FAILED - something bad happened and job was not executed at all
        - UPLOADED - results are uploaded to fileserver
        - STARTED - evaluation of tasks started
         - ENDED - evaluation of tasks is finished
        - ABORTED - evaluation of job encountered internal error, job will be rescheduled to another worker
        - FINISHED - whole execution is finished and worker ready for another job execution
        - TASK - task state changed - see below
    - `task_id` - only present for "TASK" state - identifier of task in current job
    - `task_state` - only present for "TASK" state - result of task evaluation. One of "COMPLETED", "FAILED" and "SKIPPED".
        - COMPLETED - task was successfully executed without any error, subsequent task will be executed
        - FAILED - task ended up with some error, subsequent task will be skipped
         - SKIPPED - some of the previous dependencies failed to execute, so this task won't be executed at all
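To make the "2 or 4 arguments" rule concrete, here is a minimal validation sketch in Python. The function name `parse_progress` and the returned dictionary layout are hypothetical, and the sketch assumes the command name is the first frame of the message.

```python
from typing import Dict, List

JOB_STATES = {"DOWNLOADED", "FAILED", "UPLOADED", "STARTED",
              "ENDED", "ABORTED", "FINISHED", "TASK"}
TASK_STATES = {"COMPLETED", "FAILED", "SKIPPED"}


def parse_progress(frames: List[bytes]) -> Dict[str, str]:
    """Validate a progress message and return its arguments by name."""
    parts = [f.decode("utf-8") for f in frames]
    if not parts or parts[0] != "progress":
        raise ValueError("not a progress command")
    args = parts[1:]
    if len(args) not in (2, 4):
        raise ValueError("progress takes 2 or 4 arguments")
    job_id, state = args[0], args[1]
    if state not in JOB_STATES:
        raise ValueError(f"unknown state: {state!r}")
    # The two task arguments are present exactly when the state is TASK.
    if (state == "TASK") != (len(args) == 4):
        raise ValueError("task_id and task_state belong to the TASK state only")
    result = {"job_id": job_id, "state": state}
    if state == "TASK":
        if args[3] not in TASK_STATES:
            raise ValueError(f"unknown task state: {args[3]!r}")
        result["task_id"] = args[2]
        result["task_state"] = args[3]
    return result
```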
 ### Broker - Worker communication
 The broker is the server when communicating with a worker. Its IP address and port are configurable, and the protocol is TCP. The worker socket is of type DEALER, the broker socket of type ROUTER. Because of that, the very first part of every (multipart) message from the broker to a worker must be the target worker's socket identity (which is saved on its **init** command).
 Commands from broker to worker:
 - **eval** - evaluate a job. See **eval** command in [[Communication#main-communication]]
 - **intro** - introduce yourself to the broker (with **init** command) - this is 
  required when the broker loses track of the worker who sent the command. 
  Possible reasons for such event are e.g. that one of the communicating sides 
  shut down and restarted without the other side noticing.
 - **pong** - reply to **ping** command, no arguments
 Commands from worker to broker:
 - **init** - introduce yourself to the broker. Useful on startup or after reestablishing lost connection. Requires at least 2 arguments:
    - `hwgroup` - hardware group of this worker
    - `header` - additional header describing worker capabilities. Format must be `header_name=value`, every header shall be in a separate message frame. There is no maximum limit on number of headers.
 - **done** - job evaluation finished, see **done** command in [[Communication#main-communication]].
 - **progress** - evaluation progress report, see **progress** command in [[Communication#progress-callback]]
 - **ping** - tell broker I'm alive, no arguments
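As an illustration of the **init** frame layout and of the identity prefix required by the ROUTER socket, consider the following Python sketch. The helper names and the header names used in the test data (`env`, `threads`) are made up for this example, and storing headers in a dictionary assumes that header names are unique.

```python
from typing import Dict, List


def build_init(hwgroup: str, headers: Dict[str, str]) -> List[bytes]:
    """Frames of an init command: name, hwgroup, then one frame per header."""
    frames = ["init", hwgroup] + [f"{name}={value}" for name, value in headers.items()]
    return [s.encode("utf-8") for s in frames]


def parse_headers(frames: List[bytes]) -> Dict[str, str]:
    """Parse `header_name=value` frames into a dictionary."""
    headers = {}
    for frame in frames:
        name, sep, value = frame.decode("utf-8").partition("=")
        if not sep or not name:
            raise ValueError(f"malformed header frame: {frame!r}")
        headers[name] = value
    return headers


def address_to_worker(identity: bytes, frames: List[bytes]) -> List[bytes]:
    """Prefix a broker message with the target worker's socket identity."""
    return [identity] + frames
```

On the broker side, the identity passed to `address_to_worker` would be the one recorded when the worker's **init** message arrived.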
 #### Heartbeating
 It is important for the broker and workers to know if the other side is still 
 working (and connected). This is achieved with a simple heartbeating protocol.
 The protocol requires the workers to send a **ping** command regularly (the 
 interval is configurable on both sides - future releases might let the worker 
 send its ping interval with the **init** command). Upon receiving a **ping** 
 command, the broker responds with **pong**.
 Both sides keep track of missing heartbeating messages since the last one was 
 received. When this number reaches a threshold (called maximum liveness), the 
 other side is considered dead.
 When the broker decides a worker died, it tries to reschedule its jobs to other 
 workers.
 If a worker thinks the broker is dead, it tries to reconnect with a bounded,
 exponentially increasing delay.
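The liveness counting and the bounded exponential reconnect delay can be sketched like this. It is a simplified model under stated assumptions, not the actual broker or worker code: the class and function names are hypothetical, and doubling the delay is just one possible choice of exponential growth.

```python
from typing import Iterator


class Liveness:
    """Counts heartbeat intervals that passed without a message from the peer."""

    def __init__(self, max_liveness: int) -> None:
        self.max_liveness = max_liveness
        self.missed = 0

    def heartbeat_received(self) -> None:
        # Any ping (or pong) from the peer resets the counter.
        self.missed = 0

    def interval_elapsed(self) -> bool:
        """Record one silent interval; return True if the peer counts as dead."""
        self.missed += 1
        return self.missed >= self.max_liveness


def reconnect_delays(initial: float, maximum: float) -> Iterator[float]:
    """Yield reconnect delays that double each time, capped at `maximum`."""
    delay = initial
    while True:
        yield delay
        delay = min(delay * 2, maximum)
```

A worker loop would call `interval_elapsed()` whenever a ping interval passes without a **pong** and, once it returns `True`, take successive values from `reconnect_delays(...)` between reconnect attempts.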
 ### Worker - File Server communication
 The worker communicates with the file server only from the _execution thread_. The supported protocol is HTTP, optionally with SSL encryption (**recommended**; you can get a free certificate from [Let's Encrypt](https://letsencrypt.org/) if you don't have one yet). If supported by the server and by the version of libcurl in use, HTTP/2 is also available. The file server should be set up to require basic HTTP authentication, and the worker can send the corresponding credentials with each request.
 #### Worker point of view
 The worker is capable of two things: downloading a file and uploading a file. Internally, the worker uses the libcurl C library with a very similar setup for both. In both cases it can verify the HTTPS certificate (on Linux against the system certificate list, on Windows against one downloaded from their website during installation), support basic HTTP authentication, offer HTTP/2 with fallback to HTTP/1.1, and fail on error (returned HTTP status code is >= 400). The worker has a list of credentials for all available file servers in its config file.
 - download file - standard HTTP GET request to given URL expecting content as response
 - upload file - standard HTTP PUT request to given URL with file data as body - same as command line tool `curl` with option `--upload-file`
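The worker itself uses libcurl from C; the following Python sketch only illustrates the shape of the two requests with the standard `urllib.request` module. The function names and the credentials in the test data are hypothetical.

```python
import base64
import urllib.request


def basic_auth_header(user: str, password: str) -> str:
    """Value of the Authorization header for basic HTTP authentication."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token


def download_request(url: str, user: str, password: str) -> urllib.request.Request:
    """GET request; the response body is the requested file."""
    headers = {"Authorization": basic_auth_header(user, password)}
    return urllib.request.Request(url, headers=headers, method="GET")


def upload_request(url: str, data: bytes, user: str, password: str) -> urllib.request.Request:
    """PUT request with the file content as the request body."""
    headers = {"Authorization": basic_auth_header(user, password)}
    return urllib.request.Request(url, data=data, headers=headers, method="PUT")
```

Passing such a request to `urllib.request.urlopen()` raises `urllib.error.HTTPError` for error responses, which corresponds to the "fail on error" behaviour described above.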
 #### File server point of view
 The file server has its own internal directory structure where all files are stored. It provides a REST API to get them or create new ones. The file server doesn't provide authentication or a secured connection by itself; it is supposed to run as a WSGI script inside a web server (like Apache) with proper configuration. The following commands are relevant for communication with the worker:
 - **GET /submission_archives/\<id\>.\<ext\>** - gets an archive with submitted source code and corresponding configuration of this job evaluation
 - **GET /tasks/\<hash\>** - gets a file, common usage is for input files or reference result files
 - **PUT /results/\<id\>.\<ext\>** - upload archive with evaluation results under specified name (should be same _id_ as name of submission archive). On successful upload returns JSON `{ "result": "OK" }` as body of returned page.
 Unless specified otherwise, archives use the `zip` format. The symbol `/` in the API description is the root of the file server's domain. If the domain is for example `fs.recodex.org` with SSL support, getting an input file for one task could be a GET request to `https://fs.recodex.org/tasks/8b31e12787bdae1b5766ebb8534b0adc10a1c34c`.
 ### Broker - Monitor communication
 Broker communicates with monitor also through ZeroMQ over TCP protocol. Type of 
 socket is same on both sides, ROUTER. Monitor is set as server in this 
 communication, its IP address and port are configurable in monitor's config 
 file. ZeroMQ socket ID (set on monitor's side) is "recodex-monitor" and must be 
 sent as first frame of every multipart message - see ZeroMQ ROUTER socket 
 documentation for more info. 
 Note that the monitor is designed so that it can receive data both from the 
 broker and workers. The current architecture prefers the broker to do all the 
 communication so that the workers don't have to know too many network services.
 The monitor is treated as a somewhat optional part of the whole solution, so no special 
 effort was made to ensure communication reliability.
 Commands from monitor to broker:
 Because there is no need for the monitor to communicate with the broker, there 
 are no commands so far. Any message from monitor to broker is logged and 
 discarded.
 Commands from broker to monitor:
 - **progress** - notification about progress with job evaluation. See [[Communication#progress-callback]] for more info.
 ### Broker - Frontend communication
 The broker communicates with the frontend through a ZeroMQ connection over TCP. The socket 
 type on the broker side is ROUTER, on the frontend side it's REQ. The broker acts as the 
 server; its IP address and port are configurable in the frontend.
 Commands from frontend to broker:
 - **eval** - evaluate a job. Requires at least 4 frames:
    - `job_id` - identifier of this job (in ASCII representation -- we avoid endianness issues and also support alphabetic ids)
    - `header` - additional header describing worker capabilities. Format must be `header_name=value`, every header shall be in a separate message frame. There is no maximum limit on number of headers. There may be also no headers at all.
    - empty frame (with empty string)
    - `job_url` - URI location of archive with job configuration and submitted source code
    - `result_url` - remote URI where results will be pushed to
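The role of the empty frame as a delimiter between the variable-length header list and the two URLs can be sketched as follows. The helper names are hypothetical, the command name is assumed to be the first frame, and keeping headers in a dictionary assumes unique header names.

```python
from typing import Dict, List


def build_frontend_eval(job_id: str, headers: Dict[str, str],
                        job_url: str, result_url: str) -> List[bytes]:
    """Frames of an eval request: id, headers, empty delimiter, two URLs."""
    header_frames = [f"{name}={value}" for name, value in headers.items()]
    frames = ["eval", job_id] + header_frames + ["", job_url, result_url]
    return [s.encode("utf-8") for s in frames]


def parse_frontend_eval(frames: List[bytes]) -> Dict[str, object]:
    """Split an eval request back into its named parts."""
    parts = [f.decode("utf-8") for f in frames]
    if len(parts) < 5 or parts[0] != "eval":
        raise ValueError("expected eval with at least 4 argument frames")
    try:
        # The first empty frame after job_id ends the header list.
        delimiter = parts.index("", 2)
    except ValueError:
        raise ValueError("missing empty delimiter frame") from None
    headers = {}
    for item in parts[2:delimiter]:
        name, sep, value = item.partition("=")
        if not sep or not name:
            raise ValueError(f"malformed header: {item!r}")
        headers[name] = value
    rest = parts[delimiter + 1:]
    if len(rest) != 2:
        raise ValueError("expected job_url and result_url after the delimiter")
    return {"job_id": parts[1], "headers": headers,
            "job_url": rest[0], "result_url": rest[1]}
```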
 Commands from broker to frontend (all are responses to **eval** command):
 - **accept** - broker is capable of routing request to a worker
 - **reject** - broker can't handle this job (for example when the requirements 
  specified by the headers cannot be met). There are (rare) cases when the 
  broker finds that it cannot handle the job after it's been confirmed. In such 
  cases it uses the frontend REST API to mark the job as failed.
 ### File Server - Frontend communication
 The file server has a REST API for interaction with other parts of ReCodEx. Communication with workers is described in [[Communication#file-server-point-of-view]]. On top of that, there are other commands for interaction with the frontend:
 - **GET /results/\<id\>.\<ext\>** - download archive with evaluated results of job _id_
 - **POST /submissions/\<id\>** - upload a new submission with identifier _id_. Expects the body of the POST request to use file paths as keys and the content of the files as values. On successful upload returns JSON `{ "archive_path": <archive_url>, "result_path": <result_url> }` in the response body. The submission can be downloaded (by a worker) from _archive_path_, and the corresponding evaluation results should be uploaded to _result_path_.
 - **POST /tasks** - upload new files, which will be available under names equal to the `sha1sum` of their content. Multiple files can be uploaded at once. On successful upload returns JSON `{ "result": "OK", "files": <file_list> }` in the response body, where _file_list_ is a dictionary with the original file name as key and the new URL with the already hashed name as value.
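The name under which an uploaded task file becomes available can be computed on the client side as well. A small sketch, with hypothetical helper names and assuming the `/tasks/<hash>` URL layout described for the worker API:

```python
import hashlib


def task_file_name(content: bytes) -> str:
    """Hex SHA-1 digest of the content, as printed by `sha1sum`."""
    return hashlib.sha1(content).hexdigest()


def task_file_url(base_url: str, content: bytes) -> str:
    """URL under which a file with this content would be served."""
    return f"{base_url.rstrip('/')}/tasks/{task_file_name(content)}"
```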
 There are no plans yet to support deleting files from this API. This may change in time.
 **TODO: frontend side**
 ### Monitor - Browser communication
 The monitor interacts with the browser through a WebSocket connection. The monitor acts as the server and browsers connect to it. Its IP address and port are also configurable. When a client connects to the monitor, it sends a message with the string representation of a channel id (the channel whose messages it is interested in, usually the id of the job being evaluated). There can be at most one listener per channel; a later connection replaces the previous one.
 When the monitor receives a "progress" message from the broker, there are two options:
 - there is no WebSocket connection for the listed channel (job id) - the message is dropped
 - there is an active WebSocket connection for the listed channel - the message is converted into JSON format (see below) and sent as a string to the browser. Messages for active connections are queued, so no messages are discarded even under heavy workload.
 Message JSON format is dictionary with keys:
 - **command** - type of progress.
     - DOWNLOADED - submission successfully fetched from fileserver
    - FAILED - something bad happened and job was not executed at all
    - UPLOADED - results are uploaded to fileserver
    - STARTED - evaluation of tasks started
     - ENDED - evaluation of tasks is finished
    - ABORTED - evaluation of job encountered internal error, job will be rescheduled to another worker
    - FINISHED - whole execution is finished and worker ready for another job execution
    - TASK - task state changed - see below
 - **task_id** - id of currently evaluated task. Present only if **command** is "TASK".
 - **task_state** - state of task with id **task_id**. Present only if **command** is "TASK". Value is one of "COMPLETED", "FAILED" and "SKIPPED".
    - COMPLETED - task was successfully executed without any error, subsequent task will be executed
    - FAILED - task ended up with some error, subsequent task will be skipped
     - SKIPPED - some of the previous dependencies failed to execute, so this task won't be executed at all
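The conversion to JSON and the "one listener per channel" rule can be sketched together in Python. This is a simplified model, not the monitor's implementation: the names `progress_to_json` and `Channels` are hypothetical, a plain callable stands in for a WebSocket connection, and the queueing of messages is left out.

```python
import json
from typing import Callable, Dict, Optional


def progress_to_json(state: str, task_id: Optional[str] = None,
                     task_state: Optional[str] = None) -> str:
    """Build the JSON string sent to the browser for one progress message."""
    message = {"command": state}
    if state == "TASK":
        # The task keys are present only for the TASK command.
        message["task_id"] = task_id
        message["task_state"] = task_state
    return json.dumps(message)


class Channels:
    """Keeps at most one listener per channel id."""

    def __init__(self) -> None:
        self._listeners: Dict[str, Callable[[str], None]] = {}

    def subscribe(self, channel_id: str, send: Callable[[str], None]) -> None:
        # A later subscription replaces the previous listener.
        self._listeners[channel_id] = send

    def publish(self, channel_id: str, payload: str) -> bool:
        """Deliver the payload; return False if it was dropped."""
        send = self._listeners.get(channel_id)
        if send is None:
            return False
        send(payload)
        return True
```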
 ### Frontend - Browser communication
 **TODO:**
 ## Assignments