Cleaner reorg

8 years ago · f453a3978b
parent 6048f50293
commit f453a3978b
1 changed files with 42 additions and 26 deletions
--- a/Rewritten-docs.md
+++ b/Rewritten-docs.md
@@ -1352,27 +1352,38 @@ cleaner completes particular server specific caching system.
 Cleaner as mentioned is simple script which is executed regularly as a cron job.
 If there is caching system like it was introduced in paragraph above there are
-little possibilities how cleaner should be implemented. On various filesystems
+little possibilities how cleaner should be implemented.
-there is usually support for two  particular timestamps, `last access time` and
+
-`last modification time`. Files in cache are once downloaded and then just
+On various filesystems there is usually support for two  particular timestamps,
-copied, this means that last modification time is set only once on creation of
+`last access time` and `last modification time`. Files in cache are once
-file and last access time should be set every time on copy. This imply last
+downloaded and then just copied, this means that last modification time is set
-access time is what is needed here. But last modification time is widely used by
+only once on creation of file and last access time should be set every time on
-operating systems, on the other hand last access time is not by default. More on
+copy. From this we can conclude that last access time is what is needed here.
-this subject can be found
+
-[here](https://en.wikipedia.org/wiki/Stat_%28system_call%29#Criticism_of_atime).
+But unlike last modification time, last access time is not usually enabled on
-For proper cleaner functionality filesystem which is used by worker for caching
+conventional filesystems (more on this subject can be found
-has to have last access time for files enabled.
+[here](https://en.wikipedia.org/wiki/Stat_%28system_call%29#Criticism_of_atime)).
 So if we choose to use last access time, filesystem used for cache folder has to
 have last access time for files enabled. Last access time was chosen for
 implementation in ReCodEx but this might change in further releases.
 However, there is another way, last modification time which is broadly supported
 can be used. But this solution is not automatic and worker would have to 'touch'
 cache files whenever they are accessed. This solution is maybe a bit better than
 the one with last access time and might be implemented in future releases.
 #### Caching flow
 Having cleaner as separated component and caching itself handled in worker is
-kind of blurry and is not clearly observable that it works without any race
+kind of blurry and is not clearly observable that it works without problems.
-conditions. The goal here is not to have system without races but to have system
+The goal is to have system which can recover from every kind of errors.
 which can recover from them.
-#### Caching flow
+Follows description of one possible implementation. This whole mechanism relies
 on worker ability to recover from internal fetch task failure. In case of error
 here job will be reassigned to another worker where problem hopefully does not
 arise.
-Follows description of one possible robust implementation. First start with
+First start with worker implementation:
 worker implementation:
 - worker discovers fetch task which should download supplementary file
 - worker takes name of file and tries to copy it from cache folder to its
@@ -1380,8 +1391,8 @@ worker implementation:
 	- if successful then last access time should be rewritten (by filesystem
 	  itself) and whole operation is done
 	- if not successful then file has to be downloaded
-		- file is downloaded from fileserver to working folder
+		- file is downloaded from fileserver to working folder and then
-		- downloaded file is then copied to cache
+		  copied to cache
 Previous implementation is only within worker, cleaner can anytime intervene and
 delete files. Implementation in cleaner follows:
@@ -1397,13 +1408,18 @@ delete files. Implementation in cleaner follows:
 Previous description implies that there is gap between detection of last access
 time and deleting file within cleaner. In the gap there can be worker which will
 access file and the file is anyway deleted but this is fine, file is deleted but
-worker has it copied. Another problem can be with two workers downloading the
+worker has it copied. If worker does not copy whole file or even do not start to
-same file, but this is also not a problem file is firstly downloaded to working
+copy it and the file is deleted then copy process will fail. This will cause
-folder and after that copied to cache. And even if something else unexpectedly
+internal task failure which will be handled by reassigning job to another
-fails and because of that fetch task will fail during execution even that should
+worker.
-be fine. Because fetch tasks should have 'inner' task type which implies that
+
-fail in this task will stop all execution and job will be reassigned to another
+Another problem can be with two workers downloading the same file, but this is
-worker. It should be like the last salvation in case everything else goes wrong.
+also not a problem, file is firstly downloaded to working folder and after that
 copied to cache.
 And even if something else unexpectedly fails and because of that fetch task
 will fail during execution, even that should be fine as mentioned previously.
 This should be the last salvation in case everything else goes wrong.
 ### Monitor
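
Reviewer sketch (not part of this commit): the worker and cleaner steps described in the hunks above could look roughly like the Python below. The names `fetch_to_workdir` and `clean_cache`, the `download` callback, and the `max_age_seconds` threshold are all hypothetical; the diff does not name them, and the cleaner's actual steps fall outside the hunk context shown here. The sketch also assumes the cache filesystem updates last access time on read.

```python
import shutil
import time
from pathlib import Path
from typing import Callable, List, Optional


def fetch_to_workdir(name: str, cache_dir: Path, work_dir: Path,
                     download: Callable[[str, Path], None]) -> Path:
    """Worker side: copy from the cache, or download and then fill the cache."""
    target = work_dir / name
    try:
        # Reading the cached file is what lets the filesystem refresh its
        # last access time (assumption: atime updates are enabled).
        shutil.copyfile(cache_dir / name, target)
        return target
    except FileNotFoundError:
        # Cache miss, possibly because the cleaner just removed the file.
        pass
    # Download to the working folder first, then copy into the cache.
    # Any other error propagates and fails the fetch task, so the job
    # can be reassigned to another worker.
    download(name, target)
    shutil.copyfile(target, cache_dir / name)
    return target


def clean_cache(cache_dir: Path, max_age_seconds: float,
                now: Optional[float] = None) -> List[str]:
    """Cleaner side: delete cached files not accessed for max_age_seconds."""
    now = time.time() if now is None else now
    removed = []
    for entry in sorted(cache_dir.iterdir()):
        if not entry.is_file():
            continue
        try:
            # A worker may access the file between this check and the
            # unlink; per the text above that is acceptable.
            if now - entry.stat().st_atime > max_age_seconds:
                entry.unlink()
                removed.append(entry.name)
        except FileNotFoundError:
            continue  # already gone, nothing to do
    return removed
```

The cleaner would be run from cron with a threshold chosen by the administrator. If the 'touch' variant based on last modification time were adopted instead, the worker would update the timestamp explicitly after a cache hit and the cleaner would compare `st_mtime`.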