Impl analysis - worker

Martin Polanka 8 years ago
#### Worker
@todo: worker and its internal structure, why there are two threads and what they can do, mention also multiplatform approach during development
The worker is the component that executes incoming jobs from the broker. As such, the worker should support a wide range of infrastructures and possibly even multiple platforms/operating systems; support for at least the two main operating systems is desirable and should be implemented. The worker as a service does not have to be very complicated, but a bit of complexity is needed. This complexity is almost exclusively concerned with robust communication with the broker, which has to be checked regularly. A ping mechanism is usually used for this in all kinds of projects. This means the worker should be able to send ping messages even during job execution, so the worker has to be divided into two separate parts: one which handles communication with the broker and another which executes jobs. The easiest solution is to run these parts in separate threads which communicate tightly with each other. Numerous technologies can be used for this inner-process communication, from shared memory to condition variables or some kind of in-process messages. ZeroMQ, the library already used for network communication, can also provide in-process messaging working on the same principles, which is quite handy and solves the problems of thread synchronization and the like.
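The two-thread split above can be sketched as follows. This is a minimal illustrative model, not the actual worker implementation (which is written against ZeroMQ's `inproc://` sockets): Python's `queue.Queue` stands in for the in-process messaging layer, the broker is mocked, and all names are hypothetical.

```python
import queue
import threading
import time

# Sketch of the worker's two internal parts: a broker-communication
# thread that keeps pinging even while jobs run, and an execution
# thread that processes jobs. Queues replace ZeroMQ inproc sockets.

job_queue = queue.Queue()     # communication thread -> execution thread
result_queue = queue.Queue()  # execution thread -> communication thread
pings = []                    # pings sent to the (mocked) broker

def broker_thread(jobs, timeout):
    """Hands jobs over to the execution thread and pings the broker
    regularly until all results have come back."""
    for job in jobs:
        job_queue.put(job)
    deadline = time.monotonic() + timeout
    done = 0
    while done < len(jobs) and time.monotonic() < deadline:
        pings.append("ping")  # ping keeps flowing during execution
        try:
            result_queue.get(timeout=0.05)
            done += 1
        except queue.Empty:
            pass  # job still running, just keep pinging

def execution_thread(job_count):
    """Executes jobs independently of broker communication."""
    for _ in range(job_count):
        job = job_queue.get()
        time.sleep(0.1)       # simulate evaluation work
        result_queue.put("result of " + job)

jobs = ["job-1", "job-2"]
t1 = threading.Thread(target=broker_thread, args=(jobs, 5.0))
t2 = threading.Thread(target=execution_thread, args=(len(jobs),))
t1.start(); t2.start()
t1.join(); t2.join()
```

Because each job takes longer than one ping interval, several pings are recorded while the jobs execute, which is exactly the behaviour the thread split is meant to guarantee.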
@todo: execution of job on worker, how it is done, what steps are necessary and general for all jobs
At this point we have a worker with two internal parts: a listening one and an executing one. The implementation of the first one is quite straightforward and clear, so let us discuss what should happen in the execution subsystem...
@todo: complete paragraph above... execution of job on worker, how it is done, what steps are necessary and general for all jobs
@todo: how can inputs and outputs (and supplementary files) be handled (they can be downloaded on start of execution, or during...)
@todo: caching of supplementary files (link to hashing above), describe cleaner and why it is a separate component
As described in the fileserver section, stored supplementary files have special filenames which reflect the hashes of their content, so no duplicates are stored on the fileserver. The worker can use this feature too, caching these files for a while and saving precious bandwidth. This requires a system which can download a file, store it in a cache, and delete it after some period of inactivity. Because there can be multiple worker instances on a particular server, it is not efficient for every worker to run such a system on its own, so it is feasible to share this feature among all workers on the same machine. One solution would again be a separate service connected to the workers through the network, but this would add another component and another communication channel where it is not really needed. The implemented solution assumes the worker has access to a specified cache folder; the worker can download supplementary files into this folder and copy them from there. This means every worker is able to maintain downloads to the cache, but what a worker cannot properly do is delete unused files after some time. For that, a single-purpose component called 'cleaner' is introduced. It is a simple script, executed from cron, which deletes files that have been unused for some time. Together with the worker's fetching feature, the cleaner completes the machine-specific caching system.
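The worker-side half of this scheme (download on cache miss, copy on cache hit) can be sketched as below. This is a hypothetical illustration, not the real worker's fetch code: the download step is mocked with a dictionary, and SHA-1 is assumed as the content hash only for the sake of the example.

```python
import hashlib
import pathlib
import shutil
import tempfile

downloads = []  # records cache misses, to show caching works

def content_hash(data):
    """Fileserver-style name: hash of the file's content (SHA-1 assumed)."""
    return hashlib.sha1(data).hexdigest()

def fetch_supplementary_file(file_hash, cache_dir, destination, download):
    """Deliver a supplementary file to `destination`, going through the
    shared cache folder and downloading only on a cache miss."""
    cached = pathlib.Path(cache_dir) / file_hash
    if not cached.exists():
        downloads.append(file_hash)          # cache miss
        cached.write_bytes(download(file_hash))
    shutil.copy(cached, destination)         # cache hit path is just a copy

# usage with a mocked fileserver (a dict from hash to content)
data = b"test inputs"
fileserver = {content_hash(data): data}

with tempfile.TemporaryDirectory() as cache, \
     tempfile.TemporaryDirectory() as workdir:
    target = pathlib.Path(workdir) / "input.txt"
    fetch_supplementary_file(content_hash(data), cache, target,
                             fileserver.__getitem__)
    fetched = target.read_bytes()
    # second fetch of the same hash must be served from the cache
    fetch_supplementary_file(content_hash(data), cache, target,
                             fileserver.__getitem__)
```

The second fetch triggers no download, which is the bandwidth saving the hash-based naming makes possible: identical content always maps to the same cache entry, regardless of which worker requested it.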
@todo: describe a bit more cleaner functionality and that it is safe and there are no unrecoverable races
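The cleaner's core job (delete cache entries unused for longer than a threshold) can be sketched as below. This is a simplified model, not the actual cleaner script: "unused" is approximated here by file modification time, and the real cleaner's policy and race-safety guarantees may differ.

```python
import os
import pathlib
import tempfile
import time

def clean_cache(cache_dir, max_age_seconds):
    """Delete cache files older than `max_age_seconds`; intended to be
    run periodically (e.g. from cron). Returns the names removed."""
    now = time.time()
    removed = []
    for entry in pathlib.Path(cache_dir).iterdir():
        if entry.is_file() and now - entry.stat().st_mtime > max_age_seconds:
            entry.unlink()
            removed.append(entry.name)
    return removed

# usage: one stale entry, one fresh entry
with tempfile.TemporaryDirectory() as cache:
    stale = pathlib.Path(cache) / "aabbcc"
    fresh = pathlib.Path(cache) / "ddeeff"
    stale.write_bytes(b"old"); fresh.write_bytes(b"new")
    hour_ago = time.time() - 3600
    os.utime(stale, (hour_ago, hour_ago))  # pretend last use was an hour ago
    removed = clean_cache(cache, max_age_seconds=600)
    fresh_survived = fresh.exists() and not stale.exists()
```

Running the cleaner from cron keeps it decoupled from the workers themselves: the workers only ever add to the cache, and a single machine-wide process is responsible for expiry.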
