Move caching to separate cleaner chapter

master
Martin Polanka 8 years ago
parent efc8276fc5
commit 69e3a17429

@@ -1216,84 +1216,6 @@ perspective probably more appealing than the first solution. Based on that,
downloading of supplementary files using 'fetch' tasks during execution was
chosen and implemented.
#### Caching mechanism
The worker can cache files downloaded from the fileserver under one condition:
the provided files have to have unique names. If uniqueness is guaranteed,
precious bandwidth can be saved by using the cache. This requires a system
which can download a file, store it in the cache and delete it after some
period of inactivity. Because there can be multiple worker instances on a
particular server, it is not efficient for every worker to implement such a
system on its own, so it is preferable to share this feature among all workers
on the same machine. One solution would again be a separate service, connected
to the workers through the network, which would provide this functionality,
but that would introduce another component and another communication channel
where it is not really needed. More importantly, it would be a single point of
failure; if it stopped working, the caching mechanism would break down.
Instead, another solution was chosen which assumes that the worker has access
to a specified cache folder, into which it can download supplementary files
and from which it can copy them. This means every worker is able to populate
the cache on its own, but what a worker cannot properly do is delete files
that have been unused for some time. For that purpose a single-purpose
component called the 'cleaner' is introduced. It is a simple script, executed
periodically by cron, which deletes files that have not been used for some
time. Together with the fetching feature of the worker, the cleaner completes
the machine-specific caching system.
As mentioned, the cleaner is a simple script which is executed regularly as a
cron job. With a caching system like the one introduced in the paragraph
above, there are few choices for how the cleaner can be implemented. Most
filesystems support two relevant timestamps, `last access time` and
`last modification time`. Files in the cache are downloaded once and then only
copied, which means that the last modification time is set only once, when the
file is created, while the last access time should be updated on every copy.
This implies that the last access time is what is needed here. However, while
the last modification time is widely used by operating systems, the last
access time is often not maintained by default. More on this subject can be
found
[here](https://en.wikipedia.org/wiki/Stat_%28system_call%29#Criticism_of_atime).
For the cleaner to work properly, the filesystem used by the worker for
caching has to have last access time tracking enabled.
Having the cleaner as a separate component while caching itself is handled in
the worker is somewhat blurry, and it is not immediately obvious that the
combination works without race conditions. The goal here is not to have a
system without races, but a system which can recover from them. The
implementation of the caching system is based on atomic operations of the
underlying filesystem. A description of one possible robust implementation
follows, starting with the worker:
- the worker discovers a fetch task which should download a supplementary file
- the worker takes the name of the file and tries to copy it from the cache
  folder to its working folder
    - if the copy succeeds, the last access time is updated (by the filesystem
      itself) and the whole operation is done
    - if the copy fails, the file has to be downloaded
        - the file is downloaded from the fileserver to the working folder
        - the downloaded file is then copied to the cache
The previous steps take place only within the worker; the cleaner can
intervene at any time and delete files. The cleaner itself works as follows:
- on startup, the cleaner stores the current timestamp, which will be used as
  a reference for comparison, and loads the configuration values for the cache
  folder and the maximal file age
- it then loops through all files and directories in the specified cache folder
    - the last access time of the file or folder is read
    - the last access time is subtracted from the reference timestamp to
      obtain the age of the entry
    - the age is compared against the specified maximal file age; if it is
      greater, the file or folder is deleted
The previous description implies that there is a gap between reading the last
access time and deleting the file in the cleaner. A worker may access the file
within this gap and the file gets deleted anyway, but this is fine: the file
is gone, but the worker has already copied it. Another potential problem is
two workers downloading the same file, but this is not an issue either,
because the file is first downloaded to the working folder and only then
copied to the cache. And even if something else unexpectedly fails and the
fetch task therefore fails during execution, that should still be acceptable,
because fetch tasks should have the 'inner' task type, which means that a
failure in such a task stops the whole execution and the job is reassigned to
another worker. This serves as the last resort in case everything else goes
wrong.
### Sandboxing
There are numerous ways to approach sandboxing on different platforms,
@@ -1393,6 +1315,90 @@ possible to express the logic in ~100 SLOC and also provides means to run the
fileserver as a standalone service (without a web server), which is useful for
development.
### Cleaner
The worker can cache files downloaded from the fileserver under one condition:
the provided files have to have unique names. This requires a system which can
download a file, store it in the cache and delete it after some period of
inactivity. Because there can be multiple worker instances on a particular
server, it is not efficient for every worker to implement such a system on its
own, so it is preferable to share this feature among all workers on the same
machine.
One solution would again be a separate service, connected to the workers
through the network, which would provide this functionality, but that would
introduce another component and another communication channel where it is not
really needed. More importantly, it would be a single point of failure; if it
stopped working, the caching mechanism would break down.
Instead, another solution was chosen which assumes that the worker has access
to a specified cache folder. The worker can download supplementary files into
this folder and copy them from it into its working directory. This means every
worker is able to populate the cache on its own, but what a worker cannot
properly do is delete files that have been unused for some time.
#### Architecture
For this purpose a single-purpose component called the 'cleaner' is
introduced. It is a simple script, executed periodically by cron, which
deletes files that have not been used for some time. Together with the
fetching feature of the worker, the cleaner completes the per-server caching
system.
As mentioned, the cleaner is a simple script which is executed regularly as a
cron job. With a caching system like the one introduced in the paragraph
above, there are few choices for how the cleaner can be implemented. Most
filesystems support two relevant timestamps, `last access time` and
`last modification time`. Files in the cache are downloaded once and then only
copied, which means that the last modification time is set only once, when the
file is created, while the last access time should be updated on every copy.
This implies that the last access time is what is needed here. However, while
the last modification time is widely used by operating systems, the last
access time is often not maintained by default. More on this subject can be
found
[here](https://en.wikipedia.org/wiki/Stat_%28system_call%29#Criticism_of_atime).
For the cleaner to work properly, the filesystem used by the worker for
caching has to have last access time tracking enabled.
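
As an illustration, the following minimal Python sketch reads both timestamps
with `os.stat` and derives a file's age from each. The helper name and the
throwaway file are only for the example, and whether the access time is really
updated on reads depends on the filesystem and its mount options.

```python
import os
import tempfile
import time

def cache_entry_ages(path: str) -> tuple[float, float]:
    """Return (seconds since last access, seconds since last modification)."""
    st = os.stat(path)
    now = time.time()
    return now - st.st_atime, now - st.st_mtime

# Demonstration on a throwaway file; a real cache entry would live in the
# worker's cache folder (path taken from its configuration).
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"supplementary data")
    name = f.name

access_age, modification_age = cache_entry_ages(name)
# The cleaner must use the access-based age: the modification time is set only
# when the file is first downloaded, so it keeps growing even for files that
# are still copied out of the cache regularly.
print(f"accessed {access_age:.0f}s ago, modified {modification_age:.0f}s ago")
os.unlink(name)
```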
Having the cleaner as a separate component while caching itself is handled in
the worker is somewhat blurry, and it is not immediately obvious that the
combination works without race conditions. The goal here is not to have a
system without races, but a system which can recover from them.
#### Caching flow
A description of one possible robust implementation follows, starting with the
worker (a sketch of this logic is given after the list):
- the worker discovers a fetch task which should download a supplementary file
- the worker takes the name of the file and tries to copy it from the cache
  folder to its working folder
    - if the copy succeeds, the last access time is updated (by the filesystem
      itself) and the whole operation is done
    - if the copy fails, the file has to be downloaded
        - the file is downloaded from the fileserver to the working folder
        - the downloaded file is then copied to the cache
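
Below is a minimal Python sketch of this worker-side logic. The cache and
working folder paths, the fileserver URL and the function name are
hypothetical stand-ins; the real worker obtains them from its configuration
and the job description and may implement the steps differently.

```python
import shutil
import urllib.request
from pathlib import Path

# Hypothetical locations and URL; the real worker takes these from its
# configuration and job description.
CACHE_DIR = Path("/var/cache/worker")
WORK_DIR = Path("/tmp/worker/job")
FILESERVER_URL = "https://fileserver.example.com/supplementary"

def fetch_supplementary(name: str) -> Path:
    """Obtain a supplementary file, preferring the local cache."""
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    WORK_DIR.mkdir(parents=True, exist_ok=True)
    cached = CACHE_DIR / name
    target = WORK_DIR / name
    try:
        # Copying from the cache also updates the file's last access time,
        # which keeps the cleaner from deleting it.
        shutil.copyfile(cached, target)
    except FileNotFoundError:
        # Cache miss: download into the working folder first...
        urllib.request.urlretrieve(f"{FILESERVER_URL}/{name}", target)
        # ...and only then copy the file into the cache for later jobs.
        shutil.copyfile(target, cached)
    return target
```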
The previous steps take place only within the worker; the cleaner can
intervene at any time and delete files. The cleaner itself works as follows (a
sketch is given after the list):
- on startup, the cleaner stores the current timestamp, which will be used as
  a reference for comparison, and loads the configuration values for the cache
  folder and the maximal file age
- it then loops through all files and directories in the specified cache folder
    - if the difference between the reference timestamp and the last access
      time is greater than the specified maximal file age, the file or folder
      is deleted
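
A minimal Python sketch of a cleaner along these lines might look as follows;
the cache folder path and the age limit are stand-ins for the real
configuration values.

```python
import shutil
import time
from pathlib import Path

# Stand-ins for the real configuration values (cache folder and maximal age).
CACHE_DIR = Path("/var/cache/worker")
MAX_AGE_SECONDS = 7 * 24 * 3600  # one week

def clean_cache() -> None:
    """Delete cache entries whose last access time is older than the limit."""
    reference = time.time()  # taken once, at startup
    if not CACHE_DIR.is_dir():
        return
    for entry in CACHE_DIR.iterdir():
        age = reference - entry.stat().st_atime
        if age > MAX_AGE_SECONDS:
            if entry.is_dir():
                shutil.rmtree(entry, ignore_errors=True)
            else:
                entry.unlink(missing_ok=True)

if __name__ == "__main__":
    clean_cache()
```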
The previous description implies that there is a gap between reading the last
access time and deleting the file in the cleaner. A worker may access the file
within this gap and the file gets deleted anyway, but this is fine: the file
is gone, but the worker has already copied it. Another potential problem is
two workers downloading the same file, but this is not an issue either,
because the file is first downloaded to the working folder and only then
copied to the cache. And even if something else unexpectedly fails and the
fetch task therefore fails during execution, that should still be acceptable,
because fetch tasks should have the 'inner' task type, which means that a
failure in such a task stops the whole execution and the job is reassigned to
another worker. This serves as the last resort in case everything else goes
wrong.
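
One common way to make the copy into the cache robust against concurrent
readers is to write the file under a temporary name and then atomically rename
it. The sketch below only illustrates this idea; it is not necessarily how the
real worker performs the copy.

```python
import os
import shutil
import uuid
from pathlib import Path

def copy_into_cache(src: Path, cache_dir: Path) -> None:
    """Publish a file into the cache without ever exposing a partial copy."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Write under a unique temporary name first...
    tmp = cache_dir / f".{src.name}.{uuid.uuid4().hex}.tmp"
    shutil.copyfile(src, tmp)
    # ...then rename; on a single POSIX filesystem the rename is atomic, so
    # other workers see either the complete file or no file at all.
    os.replace(tmp, cache_dir / src.name)
```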
### Monitor
Users want to view real time evaluation progress of their solution. It can be
