perspective probably more appealing than the first solution. Based on that,
downloading of supplementary files using 'fetch' tasks during execution was
chosen and implemented.

### Sandboxing

There are numerous ways to approach sandboxing on different platforms,

possible to express the logic in ~100 SLOC and also provides means to run the
fileserver as a standalone service (without a web server), which is useful for
development.

### Cleaner

The worker can use a caching mechanism based on files from the fileserver under
one condition: the provided files have to have unique names. If uniqueness is
guaranteed, precious bandwidth can be saved by using the cache. This means
there has to be a system which can download a file, store it in the cache and
delete it after some period of inactivity. Because there can be multiple worker
instances on a particular server, it is not efficient for each of them to
maintain such a system on its own, so it is desirable to share this
functionality among all workers on the same machine.

One solution would again be a separate service which provides this
functionality and is connected to the workers through the network. However,
that would mean another component with its own communication channel for a
purpose where it is not really needed, and above all, it would be a single
point of failure; if it stopped working, it would be quite a problem.

Therefore another solution was chosen, which assumes that the worker has access
to a specified cache folder. The worker can download supplementary files into
this folder and copy them from there. This means every worker is able to add
downloads to the cache, but what a worker cannot properly do is delete files
which have been unused for some time.
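
The uniqueness requirement can be satisfied, for instance, by deriving file
names from a hash of the file content; the helper below is only an
illustration of one possible scheme (its name and the choice of digest are
arbitrary), not a description of what the fileserver actually does:

```python
import hashlib

def cache_name(path: str) -> str:
    """Derive a practically collision-free cache name from file content."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        # Hash in chunks so large supplementary files do not exhaust memory.
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

With such names, two different files can never silently collide in the cache,
while identical content maps to the same entry and is downloaded only once.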

#### Architecture

For this purpose, a single-purpose component called the 'cleaner' is
introduced. It is a simple script executed from cron which is able to delete
files that have been unused for some time. Together with the worker fetching
feature, the cleaner completes the caching system of a particular machine.

As mentioned, the cleaner is a simple script which is executed regularly as a
cron job. Given a caching system like the one introduced above, there are only
a few ways in which the cleaner can be implemented. Most filesystems support
two relevant timestamps, `last access time` and `last modification time`.
Files in the cache are downloaded once and then only copied, which means that
the last modification time is set only once, on creation of the file, whereas
the last access time should be updated on every copy. This implies that the
last access time is the one needed here. However, while the last modification
time is widely used by operating systems, the last access time is often not
enabled by default. More on this subject can be found
[here](https://en.wikipedia.org/wiki/Stat_%28system_call%29#Criticism_of_atime).
For the cleaner to work properly, the filesystem which the worker uses for
caching has to have last access time tracking enabled for files.
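
Whether access times are actually being updated can be checked with a few
lines of Python; this is only an illustrative snippet, the paths are made up,
and note that the common `relatime` mount option updates the access time only
in certain cases:

```python
import os
import shutil

# Made-up paths, purely for illustration.
cached = "/var/cache/worker/example.txt"
scratch = "/tmp/example.txt"

before = os.stat(cached).st_atime
shutil.copyfile(cached, scratch)  # reading the file should refresh its atime
after = os.stat(cached).st_atime

# On a filesystem mounted with 'noatime' (or when 'relatime' decides not to
# update), the two values stay equal, which would break the cleaner.
print("atime updated:", after > before)
```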

Having the cleaner as a separate component while the caching itself is handled
by the worker may seem blurry, and it is not obvious at first sight that the
whole thing works without race conditions. The goal here is not to have a
system without races, but a system which can recover from them.

#### Caching flow

The implementation of the caching system is based on atomic operations of the
underlying filesystem. A description of one possible robust implementation
follows, starting with the worker side (a sketch in code is shown after the
list):

- the worker discovers a fetch task which should download a supplementary file
- the worker takes the name of the file and tries to copy it from the cache
  folder to its working folder
    - if the copy succeeds, the last access time is updated (by the filesystem
      itself) and the whole operation is done
    - if the copy fails, the file has to be downloaded
        - the file is downloaded from the fileserver to the working folder
        - the downloaded file is then copied to the cache
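
A minimal sketch of the worker side of this flow, assuming a plain HTTP
fileserver and a hypothetical folder layout (the real worker is implemented
differently, so this only illustrates the ordering of operations):

```python
import os
import shutil
import urllib.request

def fetch(file_name: str, cache_dir: str, work_dir: str, fileserver_url: str) -> None:
    """Obtain a supplementary file, preferring the machine-wide cache."""
    cached = os.path.join(cache_dir, file_name)
    target = os.path.join(work_dir, file_name)
    try:
        # A successful copy also refreshes the cached file's last access time.
        shutil.copyfile(cached, target)
    except FileNotFoundError:
        # Cache miss (or the cleaner removed the file in the meantime):
        # download to the working folder first, publish to the cache second.
        urllib.request.urlretrieve(fileserver_url + "/" + file_name, target)
        shutil.copyfile(target, cached)
```

The ordering matters: the working copy is secured before the cache is touched,
which is exactly what makes the races discussed below recoverable.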

The previous steps take place only within the worker; the cleaner can
intervene at any time and delete files. The cleaner proceeds as follows
(again, a sketch is shown after the list):

- on its start, the cleaner stores the current timestamp, which will be used
  as a reference for comparison, and loads the configuration values of the
  caching folder and the maximal file age
- it then loops through all files and even directories in the specified cache
  folder
    - if the difference between the reference timestamp and the last access
      time is greater than the specified maximal file age, the file or folder
      is deleted
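
The whole cleaner then fits into a script of roughly the following size; the
configuration constants stand in for values the real script would load from
its configuration file:

```python
import os
import shutil
import time

# Assumed configuration values; the real cleaner reads them from its config.
CACHE_DIR = "/var/cache/worker"
MAX_AGE = 7 * 24 * 3600  # maximal file age in seconds

def clean() -> None:
    reference = time.time()  # reference timestamp taken once at start
    for entry in os.listdir(CACHE_DIR):
        path = os.path.join(CACHE_DIR, entry)
        if reference - os.stat(path).st_atime > MAX_AGE:
            # Old enough: delete files and whole directories alike.
            if os.path.isdir(path):
                shutil.rmtree(path)
            else:
                os.remove(path)

if __name__ == "__main__":
    clean()
```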

The previous description implies that there is a gap between the detection of
the last access time and the deletion of the file within the cleaner. A worker
may access the file in this gap and the file is deleted anyway, but this is
fine; the file is deleted, but the worker already has its own copy. Another
potential problem is two workers downloading the same file at the same time,
but this is not a problem either, because a file is first downloaded to the
working folder and only then copied to the cache. And even if something else
unexpectedly fails and the fetch task therefore fails during execution, this
should still be fine, because fetch tasks should have the 'inner' task type,
which implies that a failure in such a task stops the whole execution and the
job is reassigned to another worker. This serves as the last resort in case
everything else goes wrong.
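
If stronger guarantees are desired, the copy into the cache can be published
through a temporary name followed by a rename, one of the atomic filesystem
operations this design counts on; the helper below is a hypothetical sketch,
not part of the actual worker:

```python
import os
import shutil
import tempfile

def publish_to_cache(local_path: str, cache_dir: str, file_name: str) -> None:
    """Expose a downloaded file in the cache without partial writes.

    A rename within one filesystem is atomic, so even when two workers
    publish the same file concurrently, readers only ever see a complete
    file under the final name.
    """
    fd, tmp_path = tempfile.mkstemp(dir=cache_dir)
    os.close(fd)
    shutil.copyfile(local_path, tmp_path)
    os.replace(tmp_path, os.path.join(cache_dir, file_name))
```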

### Monitor

Users want to view the real-time evaluation progress of their solutions. It
can be