## Worker

The worker gets a request from the broker to evaluate a particular submission. The next step is to evaluate the given submission and upload the results to the fileserver. After this, the worker only notifies the broker that the submission was evaluated. A more detailed description follows:
- "listening" thread gets multipart message from `broker` with command "eval"
- "listening" thread hand over whole message through `inproc` socket to "execution" thread
- "execution" thread now has to prepare all things and get ready for execution
- temporary folders names are initated (but not created) this includes folder with source files, folder with downloaded submission, temporary directory for all possible types of files and folder which will contain results from execution
- if some of the above stated folders is already existing, then it is deleted
- after successfull initiation submission archive is downloaded to created folder
- submission archive is decompressed into submission files folder
- all files from decompressed archive are copied into evaluation directory which can be used for execution in sandboxes
- all other folders which were not created are created just now
- it is time to build `job` from configuration
- job configuration file is located in evaluation directory if exists and is loaded using `yaml-cpp` library
- loaded configuration is now parsed into `job_metadata` structure which is handed over to `job` execution class itself
- `job` execution class will now initialize and construct particular `tasks` from `job_metadata` into task tree
- if there is some item which can use variables (e.g. binary path, cmd arguments, bound directories) it is done at this point
- all tasks from configuration are created and divided into external or internal tasks
- external tasks have to load limits from configuration for this workers hwgroup which was loaded from worker configuration
- if limits were not defined default worker limits are loaded
- internal tasks are just created without any further processing like external tasks
- after successfull creation of all tasks, they are connected into graph structure using dependencies
- next and last step of building `job` structure is to execute topological sorting and get queue of tasks which will be executed in order
- topological sorting take into account priority of tasks and sequential order from job configuration
- running all tasks in order follows
- after that results have to be obtained from all executed tasks and given into corresponding yaml file
- result yaml file alongside with content of result folder is sent back to fileserver in compressed form
- of course there has to be cleaning after whole evaluation which will deinitialize all needed variables and also delete all used temporary folders
- all of previous was in "execution" thread which now have to tell "listening" thread that execution is done
- this is done through multipart message "done" with packed job identification addressed to "listening" thread
- action of "listening" is now pretty straightforward "done" message is resent to `broker`

The whole worker workflow can be summarized in the following steps:

1. the worker gets an evaluation request from the broker
2. the worker initializes directories and some internal structures for the new incoming evaluation
3. the user's submission archive is downloaded from the address given in the evaluation request and decompressed
4. the job configuration file is located and parsed into an internal job structure (see the first sketch after this list)
5. tasks present in the configuration are loaded into a tree-like structure with dependencies and divided into internal and external ones
    1. internal tasks have no defined limits and contain some internal action
    2. external tasks have limits defined in the configuration and are executed in a sandbox
6. the last step of the job initialization is to topologically sort the tasks and prepare a queue with the right execution order
7. after that, the topologically sorted execution queue is processed
    1. if execution of an `inner` task fails, the worker immediately stops the execution and sends an 'internal error during execution' message back to the broker
    2. if execution of an `execution` or `evaluation` task fails, the worker stops and sends an 'execution failed' message back to the broker
8. when the execution successfully ends, results are collected from all executed tasks and written into a YAML file (see the second sketch after this list)
9. the results YAML file, along with the content of the job results folder, is sent back to the fileserver in compressed form
10. of course there has to be some cleanup after the whole evaluation, which mostly deletes the content of the temporarily created folders
11. the last step of the worker is to send a message back to the broker that the execution successfully ended
## Broker
The broker gets a "done" message from the worker and basically only marks the submission as done in its internal structures. After that, the broker has to tell the Web API that execution of the particular job ended. A more detailed description follows:
- broker gets "done" message from worker after successfull execution of job
- appropriate `worker` structure is found based on its identification
- some checks of invariants (current job identification, right amount of arguments) are executed
- job results arrived with status "OK"
- frontend is notified that job ended successfully
- deletion of current execution request in `worker` structure follows and appropriate worker is now considered free
- if worker execution queue is not empty than next waiting request is taken and given as current one
- after that only missing thing is to send that request to worker and loop back to worker execution
- if worker queue is empty then appropriate worker remains free and waiting for another execution request
- job results arrived with status "INTERNAL_ERROR"
- current request is retrieved and deleted from current worker
- request can be reassigned then it is assigned to another worker
- request was assigned too many times
- frontend is notified about it and request is thrown away
- job results arrived with status "FAILED"
- frontend is notified through REST API call that job with particular identification failed
- request is canceled and if there is some other waiting it is assigned to worker
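
A condensed sketch of the three status branches above might look as follows; the `MAX_REASSIGNMENTS` constant, the `request` structure and the printed actions are invented names standing in for the broker's real bookkeeping and REST calls:

```cpp
#include <iostream>
#include <string>

// Hypothetical request record kept in the broker's `worker` structure.
struct request {
    std::string job_id;
    int attempts = 0;  // how many times this job has been assigned
};

constexpr int MAX_REASSIGNMENTS = 3;  // assumed predefined constant

// Sketch of handling a "done" message; the printed lines stand in for
// frontend notifications and worker bookkeeping.
void handle_done(request &req, const std::string &status) {
    if (status == "OK") {
        std::cout << "notify frontend: job " << req.job_id << " succeeded\n";
        // ... delete request, mark worker free, pop next job from its queue
    } else if (status == "INTERNAL_ERROR") {
        if (++req.attempts >= MAX_REASSIGNMENTS) {
            std::cout << "notify frontend: job " << req.job_id << " failed\n";
            // ... request was assigned too many times, throw it away
        } else {
            std::cout << "reassign job " << req.job_id << " to another worker\n";
        }
    } else if (status == "FAILED") {
        std::cout << "notify frontend: job " << req.job_id << " failed\n";
        // ... cancel request, assign the next waiting one if present
    }
}

int main() {
    request req{"job_42", 0};
    handle_done(req, "INTERNAL_ERROR");  // first failure -> reassigned
    handle_done(req, "OK");              // eventually succeeds
}
```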

The broker's reaction can be summarized in the following steps:

1. the broker gets a message from the worker that the execution of a job ended
2. the job execution on the worker was successful
    1. the Web API is notified that the job ended successfully
    2. if there are some jobs waiting for the worker, the first one is sent for execution (the queue handling is sketched after this list)
3. the job execution on the worker ended with an internal error
    1. if a job ends with an internal error, it is possible to reassign it to another worker
    2. the broker keeps track of job reassignments and if their number reaches a predefined constant, the job is declared failed
    3. a suitable worker different from the original one is picked and the evaluation request is sent to it
4. the job execution on the worker failed
    1. the Web API is notified that the execution of this particular job failed
    2. again, if there is a job waiting for execution on the worker, it is sent for execution
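
Freeing a worker and dispatching the next waiting request (points 2.2 and 4.2) is essentially a queue pop. The sketch below assumes a per-worker `std::queue` of job identifiers, which is an illustration rather than the broker's actual data structure:

```cpp
#include <iostream>
#include <optional>
#include <queue>
#include <string>

// Hypothetical per-worker state inside the broker.
struct worker_state {
    std::optional<std::string> current_job;  // job being executed, if any
    std::queue<std::string> waiting;         // jobs queued for this worker
};

// Called when the current job finished (in whatever way): the worker either
// picks up the next waiting request or stays free.
void finish_current(worker_state &w) {
    w.current_job.reset();
    if (!w.waiting.empty()) {
        w.current_job = w.waiting.front();
        w.waiting.pop();
        std::cout << "send request " << *w.current_job << " to worker\n";
    } else {
        std::cout << "worker is free, waiting for another request\n";
    }
}

int main() {
    worker_state w;
    w.current_job = "job_1";
    w.waiting.push("job_2");

    finish_current(w);  // job_2 becomes current and is sent out
    finish_current(w);  // queue is empty, worker stays free
}
```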
## Web API
