Broker gets information about a new submission from the Web API. At this point the broker has to find a suitable worker and forward the evaluation request to it.
## Worker

Worker gets a request from the broker to evaluate a particular submission. The next step is to evaluate the given submission and upload the results to the fileserver. After this, the worker only notifies the broker that the submission was evaluated. More detailed description follows:
- "listening" thread gets multipart message from `broker` with command "eval"
1. worker gets evaluation request from broker
- "listening" thread hand over whole message through `inproc` socket to "execution" thread
2. worker now has to do some initialization of directories and some internal structures for new incoming evaluation
- "execution" thread now has to prepare all things and get ready for execution
3. users submission archive from given address from evaluation request is downloaded and decompressed
- temporary folders names are initated (but not created) this includes folder with source files, folder with downloaded submission, temporary directory for all possible types of files and folder which will contain results from execution
4. job configuration file is located and parsed into internal job structure
- if some of the above stated folders is already existing, then it is deleted
5. tasks which are present in configuration are loaded into tree like structure with dependencies and are divided into internal or external ones
- after successfull initiation submission archive is downloaded to created folder
1. internal tasks are tasks which has no defined limits and contains some internal action
- submission archive is decompressed into submission files folder
2. external tasks have defined limits in configuration and are executed in sandbox
- all files from decompressed archive are copied into evaluation directory which can be used for execution in sandboxes
6. last step of initializing of job is to topologically sort tasks and prepare queue with right execution order
- all other folders which were not created are created just now
7. after that topologically sorted execution queue is processed
- it is time to build `job` from configuration
1. if execution of `inner` task fails worker will immediatelly stop execution and send back to broker 'internal error during execution' message
- job configuration file is located in evaluation directory if exists and is loaded using `yaml-cpp` library
2. execution of `execution` or `evaluation` task fails then worker stops and send back to broker 'execution failed' message
- loaded configuration is now parsed into `job_metadata` structure which is handed over to `job` execution class itself
8. when execution successfully ends results are collected from all executed tasks and written into yaml file
- `job` execution class will now initialize and construct particular `tasks` from `job_metadata` into task tree
9. results yaml file alongside with content of job result folder is sent back to fileserver in compressed form
- if there is some item which can use variables (e.g. binary path, cmd arguments, bound directories) it is done at this point
10. of course there has to be some cleaning after whole evaluation which will mostly delete content of temporarily created folders
- all tasks from configuration are created and divided into external or internal tasks
11. last step of worker is to send back to broker message that execution successfully ended
- external tasks have to load limits from configuration for this workers hwgroup which was loaded from worker configuration
- if limits were not defined default worker limits are loaded
- internal tasks are just created without any further processing like external tasks
- after successfull creation of all tasks, they are connected into graph structure using dependencies
- next and last step of building `job` structure is to execute topological sorting and get queue of tasks which will be executed in order
- topological sorting take into account priority of tasks and sequential order from job configuration
- running all tasks in order follows
- after that results have to be obtained from all executed tasks and given into corresponding yaml file
- result yaml file alongside with content of result folder is sent back to fileserver in compressed form
- of course there has to be cleaning after whole evaluation which will deinitialize all needed variables and also delete all used temporary folders
- all of previous was in "execution" thread which now have to tell "listening" thread that execution is done
- this is done through multipart message "done" with packed job identification addressed to "listening" thread
- action of "listening" is now pretty straightforward "done" message is resent to `broker`
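To make the thread hand-off in steps 1 and 11 concrete, here is a minimal sketch of the "listening" thread, assuming the cppzmq bindings (`zmq.hpp`, `zmq_addon.hpp`). The socket types, the broker address and the `inproc://execution` endpoint are illustrative assumptions, not the worker's actual wiring.

```cpp
#include <zmq.hpp>
#include <zmq_addon.hpp>

// Sketch of the worker's "listening" thread: route whole multipart
// messages between the broker and the "execution" thread without
// interpreting their payload.
void listening_thread(zmq::context_t &context)
{
    zmq::socket_t broker(context, zmq::socket_type::dealer);
    broker.connect("tcp://localhost:9657"); // assumed broker address

    zmq::socket_t execution(context, zmq::socket_type::pair);
    execution.bind("inproc://execution"); // assumed inproc endpoint

    zmq::pollitem_t items[] = {
        {broker.handle(), 0, ZMQ_POLLIN, 0},
        {execution.handle(), 0, ZMQ_POLLIN, 0},
    };

    while (true) {
        zmq::poll(items, 2);

        if (items[0].revents & ZMQ_POLLIN) {
            // "eval" request from the broker is handed over, whole,
            // to the "execution" thread
            zmq::multipart_t message;
            message.recv(broker);
            if (message.peekstr(0) == "eval") message.send(execution);
        }
        if (items[1].revents & ZMQ_POLLIN) {
            // "done" message from the "execution" thread is resent
            // to the broker
            zmq::multipart_t message;
            message.recv(execution);
            message.send(broker);
        }
    }
}
```

The point of this arrangement is that the listening thread never interprets the payload; it only forwards whole multipart messages between the broker and the execution thread.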
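Step 4 can be illustrated with `yaml-cpp`, which the worker uses to load the job configuration. The node names and the trimmed-down `job_metadata`/`task_metadata` structures below are simplified assumptions; the real configuration schema is richer.

```cpp
#include <yaml-cpp/yaml.h>

#include <cstddef>
#include <string>
#include <vector>

// Trimmed-down stand-ins for the worker's internal structures;
// the real job_metadata holds far more fields.
struct task_metadata {
    std::string name;
    std::size_t priority;
    std::vector<std::string> dependencies;
    bool external; // external tasks have limits and run in a sandbox
};

struct job_metadata {
    std::string job_id;
    std::vector<task_metadata> tasks;
};

// Parse the located job configuration file into job_metadata.
job_metadata load_job_config(const std::string &path)
{
    YAML::Node config = YAML::LoadFile(path);

    job_metadata job;
    job.job_id = config["job-id"].as<std::string>();

    for (const auto &node : config["tasks"]) {
        task_metadata task;
        task.name = node["task-id"].as<std::string>();
        task.priority = node["priority"].as<std::size_t>();
        if (node["dependencies"]) {
            for (const auto &dep : node["dependencies"]) {
                task.dependencies.push_back(dep.as<std::string>());
            }
        }
        // tasks with a sandbox section are treated as external here
        task.external = static_cast<bool>(node["sandbox"]);
        job.tasks.push_back(task);
    }
    return job;
}
```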
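The topological sorting from step 6 could look roughly like the following variant of Kahn's algorithm, where ready tasks are ordered by priority first (assuming a higher number means higher priority) and by their sequential order in the configuration second. It reuses the hypothetical `task_metadata` from the previous sketch.

```cpp
#include <cstddef>
#include <map>
#include <queue>
#include <stdexcept>
#include <string>
#include <vector>

// Same hypothetical structure as in the parsing sketch above.
struct task_metadata {
    std::string name;
    std::size_t priority;
    std::vector<std::string> dependencies;
    bool external;
};

// Order tasks so that every task comes after all of its dependencies;
// among the ready tasks, higher priority wins and ties keep the
// sequential order from the job configuration.
std::vector<task_metadata> sort_tasks(const std::vector<task_metadata> &tasks)
{
    std::map<std::string, std::size_t> index; // task name -> config position
    for (std::size_t i = 0; i < tasks.size(); ++i) index[tasks[i].name] = i;

    std::vector<std::size_t> indegree(tasks.size(), 0);
    std::vector<std::vector<std::size_t>> successors(tasks.size());
    for (std::size_t i = 0; i < tasks.size(); ++i) {
        for (const auto &dep : tasks[i].dependencies) {
            successors[index.at(dep)].push_back(i); // throws on unknown names
            ++indegree[i];
        }
    }

    // comparator: true when a should come after b in the ready set
    auto after = [&tasks](std::size_t a, std::size_t b) {
        if (tasks[a].priority != tasks[b].priority)
            return tasks[a].priority < tasks[b].priority; // max-heap on priority
        return a > b; // earlier configuration order first
    };
    std::priority_queue<std::size_t, std::vector<std::size_t>, decltype(after)> ready(after);
    for (std::size_t i = 0; i < tasks.size(); ++i)
        if (indegree[i] == 0) ready.push(i);

    std::vector<task_metadata> queue;
    while (!ready.empty()) {
        std::size_t current = ready.top();
        ready.pop();
        queue.push_back(tasks[current]);
        for (std::size_t next : successors[current])
            if (--indegree[next] == 0) ready.push(next);
    }
    if (queue.size() != tasks.size())
        throw std::runtime_error("cycle in task dependencies");
    return queue;
}
```

If the queue comes out shorter than the task list, some tasks were never ready, which is exactly how a dependency cycle in the configuration shows up.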
## Broker

Broker gets the "done" message from the worker and basically only marks the submission as finished in its internal structures. After that, the broker has to tell the Web API that execution of the particular job ended. More detailed description follows:
1. broker gets the "done" message from the worker after execution of the job ended; the appropriate `worker` structure is found based on the worker's identification and some invariants are checked (current job identification, right amount of arguments)
2. job execution on the worker was successful (results arrived with status "OK")
    1. Web API is notified that the job ended successfully
    2. the current execution request is deleted from the `worker` structure; if there are jobs waiting in the worker's queue, the first one is sent for execution, otherwise the worker remains free and waits for another request
3. job execution on the worker ended with an internal error (results arrived with status "INTERNAL_ERROR")
    1. in this case it is possible to reassign the job to another worker
    2. broker keeps track of job reassignments and if their number reaches some predefined constant, the job is declared as failed and the frontend is notified (a sketch of this bookkeeping follows the list)
    3. otherwise a suitable worker different from the original one is picked and the evaluation request is sent to it
4. job execution on the worker failed (results arrived with status "FAILED")
    1. Web API is notified through a REST API call that the job with this particular identification failed
    2. the request is canceled and, again, if there is a job waiting in the worker's queue it is sent for execution
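The reassignment bookkeeping from point 3 can be sketched as a simple per-job counter. The class name, the `max_attempts` parameter and the `try_reassign` helper are hypothetical; the real broker keeps this state inside its own structures.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical bookkeeping for job reassignments after internal errors.
class reassignment_tracker {
public:
    // max_attempts is the "predefined constant" from the text: how many
    // times a job may be reassigned before it is declared failed.
    explicit reassignment_tracker(std::size_t max_attempts) : max_attempts_(max_attempts) {}

    // Returns true if the job may be sent to another worker, false if it
    // has to be declared failed and reported to the frontend.
    bool try_reassign(const std::string &job_id)
    {
        std::size_t &count = attempts_[job_id];
        if (count >= max_attempts_) {
            attempts_.erase(job_id); // the request is thrown away
            return false;
        }
        ++count;
        return true;
    }

private:
    std::size_t max_attempts_;
    std::map<std::string, std::size_t> attempts_;
};
```

A caller would consult `try_reassign(job_id)` when an "INTERNAL_ERROR" result arrives, and either pick a different worker or notify the frontend that the job failed.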