perspective probably more appealing than the first solution. Based on that,
downloading of supplementary files using 'fetch' tasks during execution was
chosen and implemented.

### Sandboxing

There are numerous ways to approach sandboxing on different platforms,

possible to express the logic in ~100 SLOC and also provides means to run the
fileserver as a standalone service (without a web server), which is useful for
development.

### Cleaner

The worker can use a caching mechanism based on files from the fileserver under
one condition: the provided files have to have unique names. If uniqueness is
guaranteed, precious bandwidth can be saved by using the cache. This means
there has to be a system which can download a file, store it in the cache and
delete it after some period of inactivity. Because there can be multiple worker
instances on a particular server, it is not efficient for each of them to
maintain such a system on its own, so it is desirable to share this
functionality among all workers on the same machine.

One solution would again be a separate service which provides this
functionality and is connected to the workers through the network. However,
that would mean another component with its own communication channel for a
purpose where it is not really needed, and above all, it would be a single
point of failure; if it stopped working, it would be quite a problem.

Therefore another solution was chosen, which assumes that the worker has access
to a specified cache folder. The worker can download supplementary files into
this folder and copy them from there. This means every worker is able to add
downloads to the cache, but what a worker cannot properly do is delete files
which have been unused for some time.
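
The uniqueness requirement can be satisfied, for instance, by deriving file
names from a hash of the file content; the helper below is only an
illustration of one possible scheme (its name and the choice of digest are
arbitrary), not a description of what the fileserver actually does:

```python
import hashlib

def cache_name(path: str) -> str:
    """Derive a practically collision-free cache name from file content."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        # Hash in chunks so large supplementary files do not exhaust memory.
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

With such names, two different files can never silently collide in the cache,
while identical content maps to the same entry and is downloaded only once.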

#### Architecture

For this purpose, a single-purpose component called the 'cleaner' is
introduced. It is a simple script executed from cron which is able to delete
files that have been unused for some time. Together with the worker fetching
feature, the cleaner completes the caching system of a particular machine.

As mentioned, the cleaner is a simple script which is executed regularly as a
cron job. Given a caching system like the one introduced above, there are only
a few ways in which the cleaner can be implemented. Most filesystems support
two relevant timestamps, `last access time` and `last modification time`.
Files in the cache are downloaded once and then only copied, which means that
the last modification time is set only once, on creation of the file, whereas
the last access time should be updated on every copy. This implies that the
last access time is the one needed here. However, while the last modification
time is widely used by operating systems, the last access time is often not
enabled by default. More on this subject can be found
[here](https://en.wikipedia.org/wiki/Stat_%28system_call%29#Criticism_of_atime).
For the cleaner to work properly, the filesystem which the worker uses for
caching has to have last access time tracking enabled for files.
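
Whether access times are actually being updated can be checked with a few
lines of Python; this is only an illustrative snippet, the paths are made up,
and note that the common `relatime` mount option updates the access time only
in certain cases:

```python
import os
import shutil

# Made-up paths, purely for illustration.
cached = "/var/cache/worker/example.txt"
scratch = "/tmp/example.txt"

before = os.stat(cached).st_atime
shutil.copyfile(cached, scratch)  # reading the file should refresh its atime
after = os.stat(cached).st_atime

# On a filesystem mounted with 'noatime' (or when 'relatime' decides not to
# update), the two values stay equal, which would break the cleaner.
print("atime updated:", after > before)
```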

Having the cleaner as a separate component while the caching itself is handled
by the worker may seem blurry, and it is not obvious at first sight that the
whole thing works without race conditions. The goal here is not to have a
system without races, but a system which can recover from them.

#### Caching flow

The implementation of the caching system is based on atomic operations of the
underlying filesystem. A description of one possible robust implementation
follows, starting with the worker side (a sketch in code is shown after the
list):

- the worker discovers a fetch task which should download a supplementary file
- the worker takes the name of the file and tries to copy it from the cache
  folder to its working folder
    - if the copy succeeds, the last access time is updated (by the filesystem
      itself) and the whole operation is done
    - if the copy fails, the file has to be downloaded
        - the file is downloaded from the fileserver to the working folder
        - the downloaded file is then copied to the cache
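
A minimal sketch of the worker side of this flow, assuming a plain HTTP
fileserver and a hypothetical folder layout (the real worker is implemented
differently, so this only illustrates the ordering of operations):

```python
import os
import shutil
import urllib.request

def fetch(file_name: str, cache_dir: str, work_dir: str, fileserver_url: str) -> None:
    """Obtain a supplementary file, preferring the machine-wide cache."""
    cached = os.path.join(cache_dir, file_name)
    target = os.path.join(work_dir, file_name)
    try:
        # A successful copy also refreshes the cached file's last access time.
        shutil.copyfile(cached, target)
    except FileNotFoundError:
        # Cache miss (or the cleaner removed the file in the meantime):
        # download to the working folder first, publish to the cache second.
        urllib.request.urlretrieve(fileserver_url + "/" + file_name, target)
        shutil.copyfile(target, cached)
```

The ordering matters: the working copy is secured before the cache is touched,
which is exactly what makes the races discussed below recoverable.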

The previous steps take place only within the worker; the cleaner can
intervene at any time and delete files. The cleaner proceeds as follows
(again, a sketch is shown after the list):

- on its start, the cleaner stores the current timestamp, which will be used
  as a reference for comparison, and loads the configuration values of the
  caching folder and the maximal file age
- it then loops through all files and even directories in the specified cache
  folder
    - if the difference between the reference timestamp and the last access
      time is greater than the specified maximal file age, the file or folder
      is deleted
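
The whole cleaner then fits into a script of roughly the following size; the
configuration constants stand in for values the real script would load from
its configuration file:

```python
import os
import shutil
import time

# Assumed configuration values; the real cleaner reads them from its config.
CACHE_DIR = "/var/cache/worker"
MAX_AGE = 7 * 24 * 3600  # maximal file age in seconds

def clean() -> None:
    reference = time.time()  # reference timestamp taken once at start
    for entry in os.listdir(CACHE_DIR):
        path = os.path.join(CACHE_DIR, entry)
        if reference - os.stat(path).st_atime > MAX_AGE:
            # Old enough: delete files and whole directories alike.
            if os.path.isdir(path):
                shutil.rmtree(path)
            else:
                os.remove(path)

if __name__ == "__main__":
    clean()
```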

The previous description implies that there is a gap between the detection of
the last access time and the deletion of the file within the cleaner. A worker
may access the file in this gap and the file is deleted anyway, but this is
fine; the file is deleted, but the worker already has its own copy. Another
potential problem is two workers downloading the same file at the same time,
but this is not a problem either, because a file is first downloaded to the
working folder and only then copied to the cache. And even if something else
unexpectedly fails and the fetch task therefore fails during execution, this
should still be fine, because fetch tasks should have the 'inner' task type,
which implies that a failure in such a task stops the whole execution and the
job is reassigned to another worker. This serves as the last resort in case
everything else goes wrong.
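
If stronger guarantees are desired, the copy into the cache can be published
through a temporary name followed by a rename, one of the atomic filesystem
operations this design counts on; the helper below is a hypothetical sketch,
not part of the actual worker:

```python
import os
import shutil
import tempfile

def publish_to_cache(local_path: str, cache_dir: str, file_name: str) -> None:
    """Expose a downloaded file in the cache without partial writes.

    A rename within one filesystem is atomic, so even when two workers
    publish the same file concurrently, readers only ever see a complete
    file under the final name.
    """
    fd, tmp_path = tempfile.mkstemp(dir=cache_dir)
    os.close(fd)
    shutil.copyfile(local_path, tmp_path)
    os.replace(tmp_path, os.path.join(cache_dir, file_name))
```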

### Monitor

Users want to view the real-time evaluation progress of their solutions. It
can be