From 69e3a174297e7f04297887cec335a92c6402e29d Mon Sep 17 00:00:00 2001 From: Martin Polanka Date: Sat, 21 Jan 2017 22:55:41 +0100 Subject: [PATCH] Move caching to separate cleaner chapter --- Rewritten-docs.md | 162 ++++++++++++++++++++++++---------------------- 1 file changed, 84 insertions(+), 78 deletions(-) diff --git a/Rewritten-docs.md b/Rewritten-docs.md index 8f010f3..4080ac0 100644 --- a/Rewritten-docs.md +++ b/Rewritten-docs.md @@ -1216,84 +1216,6 @@ perspective probably more appealing then the first solution. Based on that, downloading of supplementary files using 'fetch' tasks during execution was chosen and implemented. -#### Caching mechanism - -Worker can use caching mechanism based on files from fileserver under one -condition, provided files has to have unique name. If uniqueness is fulfilled -then precious bandwidth can be saved using cache. This means there has to be -system which can download file, store it in cache and after some time of -inactivity delete it. Because there can be multiple worker instances on some -particular server it is not efficient to have this system in every worker on its -own. So it is feasible to have this feature somehow shared among all workers on -the same machine. Solution may be again having separate service connected -through network with workers which would provide such functionality but this -would mean component with another communication for the purpose where it is not -exactly needed. But mainly it would be single-failure component if it would stop -working it is quite problem. So there was chosen another solution which assumes -worker has access to specified cache folder, to this folder worker can download -supplementary files and copy them from here. This means every worker has the -possibility to maintain downloads to cache, but what is worker not able to -properly do is deletion of unused files after some time. For that single-purpose -component is introduced which is called 'cleaner'. 
It is simple script executed -within cron which is able to delete files which were unused for some time. -Together with worker fetching feature cleaner completes machine specific caching -system. - -Cleaner as mentioned is simple script which is executed regularly as cron job. -If there is caching system like it was introduced in paragraph above there are -little possibilities how cleaner should be implemented. On various filesystems -there is usually support for two particular timestamps, `last access time` and -`last modification time`. Files in cache are once downloaded and then just -copied, this means that last modification time is set only once on creation of -file and last access time should be set every time on copy. This imply last -access time is what is needed here. But last modification time is widely used by -operating systems, on the other hand last access time is not by default. More on -this subject can be found -[here](https://en.wikipedia.org/wiki/Stat_%28system_call%29#Criticism_of_atime). -For proper cleaner functionality filesystem which is used by worker for caching -has to have last access time for files enabled. - -Having cleaner as separated component and caching itself handled in worker is -kind of blurry and is not clearly observable that it works without any race -conditions. The goal here is not to have system without races but to have system -which can recover from them. Implementation of caching system is based upon -atomic operations of underlying filesystem. Follows description of one possible -robust implementation. 
First start with worker implementation: - -- worker discovers fetch task which should download supplementary file -- worker takes name of file and tries to copy it from cache folder to its - working folder - - if successful then last access time should be rewritten (by filesystem - itself) and whole operation is done - - if not successful then file has to be downloaded - - file is downloaded from fileserver to working folder - - downloaded file is then copied to cache - -Previous implementation is only within worker, cleaner can anytime intervene and -delete files. Implementation in cleaner follows: - -- cleaner on its start stores current reference timestamp which will be used for - comparison and load configuration values of caching folder and maximal file - age -- there is a loop going through all files and even directories in specified - cache folder - - last access time of file or folder is detected - - last access time is subtracted from reference timestamp into - difference - - difference is compared against specified maximal file age, if - difference is greater, file or folder is deleted - -Previous description implies that there is gap between detection of last access -time and deleting file within cleaner. In the gap there can be worker which will -access file and the file is anyway deleted but this is fine, file is deleted but -worker has it copied. Another problem can be with two workers downloading the -same file, but this is also not a problem file is firstly downloaded to working -folder and after that copied to cache. And even if something else unexpectedly -fails and because of that fetch task will fail during execution even that should -be fine. Because fetch tasks should have 'inner' task type which implies that -fail in this task will stop all execution and job will be reassigned to another -worker. It should be like the last salvation in case everything else goes wrong. 
-
 ### Sandboxing
 
 There are numerous ways how to approach sandboxing on different platforms,
@@ -1393,6 +1315,90 @@ possible to express the logic in ~100 SLOC and also provides means to run the
 fileserver as a standalone service (without a web server), which is useful for
 development.
 
+### Cleaner
+
+The worker can use a caching mechanism for files from the fileserver under one
+condition: the provided files have to have unique names. This means there has
+to be a system which can download a file, store it in the cache and delete it
+after some period of inactivity. Because multiple worker instances can run on
+one particular server, it is not efficient for every worker to implement this
+system on its own, so it is preferable to share this feature among all workers
+on the same machine.
+
+One solution would again be a separate service connected to the workers
+through the network which would provide this functionality, but that would
+mean another component with its own communication where none is really needed.
+More importantly, it would be a single point of failure, and it would be quite
+a problem if it stopped working.
+
+Therefore, another solution was chosen which assumes that the worker has
+access to a specified cache folder. The worker can download supplementary
+files into this folder and copy them from there. Every worker is thus able to
+add downloads to the cache, but what a worker cannot properly do is delete
+files which have been unused for some time.
+
+#### Architecture
+
+For that functionality a single-purpose component called 'cleaner' is
+introduced. It is a simple script executed by cron which is able to delete
+files that have been unused for some time. Together with the fetching feature
+of the worker, the cleaner completes a server-specific caching system.
+
+As mentioned, the cleaner is a simple script which is executed regularly as a
+cron job.
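As an illustration only, a minimal sketch of such a cron-invoked cleanup script might look as follows (the function name and the example values are assumptions, not the actual ReCodEx implementation):

```python
import shutil
import time
from pathlib import Path

def clean_cache(cache_dir: Path, max_age_seconds: float) -> None:
    """Delete cache entries that have not been accessed for too long."""
    # take one reference timestamp so that all entries are
    # compared against the same moment in time
    reference = time.time()
    for entry in cache_dir.iterdir():
        # st_atime is the last access time; the filesystem must
        # have atime recording enabled for this to be meaningful
        last_access = entry.stat().st_atime
        if reference - last_access > max_age_seconds:
            if entry.is_dir():
                shutil.rmtree(entry, ignore_errors=True)
            else:
                entry.unlink()
```

A cron entry would then simply run such a script periodically (e.g. once a day), with the cache folder and the maximal age taken from a configuration file.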
+Given the caching system introduced above, there are only a few reasonable
+ways to implement the cleaner. Filesystems usually support two particular
+timestamps, `last access time` and `last modification time`. Files in the
+cache are downloaded once and then only copied, which means that the last
+modification time is set only once, on creation of the file, while the last
+access time should be updated on every copy. This implies that the last
+access time is what is needed here. However, while the last modification time
+is widely used by operating systems, the last access time is often not
+enabled by default. More on this subject can be found
+[here](https://en.wikipedia.org/wiki/Stat_%28system_call%29#Criticism_of_atime).
+For the cleaner to work properly, the filesystem which the worker uses for
+caching has to have recording of the last access time enabled.
+
+Having the cleaner as a separate component while caching itself is handled in
+the worker may seem blurry, and it is not immediately obvious that it works
+without race conditions. The goal here is not to have a system without races,
+but a system which can recover from them.
+
+#### Caching flow
+
+A description of one possible robust implementation follows. Let us start
+with the worker:
+
+- the worker discovers a fetch task which should download a supplementary file
+- the worker takes the name of the file and tries to copy it from the cache
+  folder to its working folder
+    - if successful, the last access time is updated (by the filesystem
+      itself) and the whole operation is done
+    - if not successful, the file has to be downloaded
+        - the file is downloaded from the fileserver to the working folder
+        - the downloaded file is then copied to the cache
+
+The implementation above concerns only the worker; the cleaner can intervene
+at any time and delete files.
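The worker-side flow above can be sketched roughly as follows (the function and parameter names are assumptions made for illustration, not the actual worker code):

```python
import shutil
import urllib.request
from pathlib import Path

def fetch_supplementary_file(name: str, cache_dir: Path, work_dir: Path,
                             fileserver_url: str) -> Path:
    """Obtain a supplementary file, preferring the local cache."""
    cached = cache_dir / name
    target = work_dir / name
    if cached.exists():
        # cache hit: copying the file also updates its last access
        # time, which protects it from the cleaner for a while
        shutil.copyfile(cached, target)
        return target
    # cache miss: download into the working folder first...
    urllib.request.urlretrieve(f"{fileserver_url}/{name}", str(target))
    # ...and only then copy into the cache, so that other workers
    # can never see a partially downloaded file there
    shutil.copyfile(target, cached)
    return target
```

Note that even in this sketch the download-then-copy order matters: the cache only ever contains complete files, which is exactly what makes concurrent downloads of the same file harmless.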
+The implementation in the cleaner follows:
+
+- on startup, the cleaner stores the current timestamp, which will be used as
+  a reference for comparison, and loads the configuration values of the cache
+  folder and the maximal file age
+- a loop goes through all files and directories in the specified cache folder
+    - if the difference between the reference timestamp and the last access
+      time is greater than the specified maximal file age, the file or folder
+      is deleted
+
+The previous description implies that there is a gap between detecting the
+last access time and deleting the file within the cleaner. A worker may
+access the file in this gap; the file is deleted anyway, but this is fine,
+because the worker already has its copy. Another potential problem is two
+workers downloading the same file, but this is not a problem either, since
+the file is first downloaded to the working folder and only then copied to
+the cache. And even if something else unexpectedly fails and the fetch task
+therefore fails during execution, that should still be fine, because fetch
+tasks should have the 'inner' task type, which implies that a failure in such
+a task stops the whole execution and the job is reassigned to another worker.
+This serves as the last salvation in case everything else goes wrong.
+
 ### Monitor
 
 Users want to view real time evaluation progress of their solution. It can be