It is reasonable to keep this piece of information alongside the tasks in job
configuration, so each task can have a label about its purpose. Unlabeled tasks
have an internal type _inner_. There are four categories of tasks:
- _initiation_ -- setting up the environment, compiling code, etc.; for users a
  failure means an error in their sources, which are not compatible with
  running them with examination data
- _execution_ -- running the user code with examination data; it must not
  exceed the given limits (time, memory), and a failure usually means a wrong
  solution
- _evaluation_ -- comparing the outputs of the execution with the expected
  results; a failure means the solution is wrong
- _inner_ -- no special meaning for the frontend; technical tasks for fetching
  and copying files, creating directories, etc.
Each job is composed of multiple tasks of these types which are semantically
grouped into tests. A test can represent one set of examination data for user
code. To mark the grouping, another task label can be used. Each test must have
exactly one _evaluation_ task (to show success or failure to users) and
an arbitrary number of tasks with other types.
## Implementation analysis
The worker has to be able to respond to broker messages even during execution.
So the worker has to be divided into two separate parts, one which handles
communication with the broker and another which executes jobs. The easiest
solution is to have these parts in separate threads which somehow tightly
communicate with each other. For this in-process communication numerous
technologies can be used, from shared memory to condition variables or some
kind of in-process messages. The already used ZeroMQ library can provide
in-process messages working on the same principles as network communication,
which is quite handy and solves problems with thread synchronization and such.
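
As an illustration, here is a minimal sketch of this two-thread layout using
ZeroMQ in-process messaging; Python with pyzmq is used for brevity, and the
endpoint name and message format are made up:

```python
import threading
import zmq

ctx = zmq.Context.instance()
bound = threading.Event()

def listening_thread():
    # Talks to the broker (omitted here) and forwards jobs to the executor.
    pipe = ctx.socket(zmq.PAIR)
    pipe.bind("inproc://jobs")
    bound.set()                      # inproc requires bind before connect
    pipe.send_json({"job_id": "42", "archive": "submission.zip"})
    print("job finished:", pipe.recv_string())
    pipe.close()

def execution_thread():
    bound.wait()
    pipe = ctx.socket(zmq.PAIR)
    pipe.connect("inproc://jobs")
    job = pipe.recv_json()
    # ... download, build and evaluate the job here ...
    pipe.send_string("job {} done".format(job["job_id"]))
    pipe.close()

threads = [threading.Thread(target=listening_thread),
           threading.Thread(target=execution_thread)]
for t in threads: t.start()
for t in threads: t.join()
```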
At this point we have a worker with two internal parts, a listening one and an
executing one. The implementation of the first one is quite straightforward and
clear, so let us discuss what should be happening in the execution subsystem.
Jobs as work units can vary quite a lot and do completely different things,
which means the configuration and the worker have to be prepared for this kind
of generality. The configuration and its solution were already discussed above;
the implementation in the worker is then also quite straightforward. The worker
has internal structures into which it loads and stores the metadata given in
the configuration. The whole job is mapped to a job metadata structure and
tasks are mapped to either external or internal ones (internal commands have to
be defined within the worker); the two kinds differ in whether they are
executed in a sandbox or as internal worker commands.
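
A rough sketch of such metadata structures might look as follows; the names and
fields are illustrative, not the worker's actual classes:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class TaskMetadata:
    task_id: str
    binary: str                    # external command or internal command name
    args: List[str] = field(default_factory=list)
    task_type: str = "inner"       # initiation | execution | evaluation | inner
    test_id: Optional[str] = None  # grouping of tasks into tests
    sandbox: Optional[Dict] = None # present for external (sandboxed) tasks

    def is_internal(self) -> bool:
        # Internal commands (fetch, cp, mkdir, ...) run without a sandbox.
        return self.sandbox is None

@dataclass
class JobMetadata:
    job_id: str
    tasks: List[TaskMetadata] = field(default_factory=list)
```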
Another division of tasks is by the task-type field in the configuration. This
field can have four values: initiation, execution, evaluation and inner. All of
them were discussed and described above in the configuration analysis. What is
important to the worker is how to behave if execution of a task with some
particular type fails. There are two possible situations: execution fails due
to a bad user solution, or due to some internal error. If execution fails on an
internal error, the solution cannot be declared as failed overall; users should
not be punished for a bad configuration or some network error. This is where
task types are useful. Generally, initiation, execution and evaluation tasks
are somehow executing code which was given by the user who submitted the
solution of an exercise. If these kinds of tasks fail, it is probably connected
with a bad user solution and the job can be evaluated. But if some inner task
fails, the solution should be re-executed, in the best-case scenario on a
different worker. That is why, if an inner task fails, the job is sent back to
the broker, which will reassign it to another worker. More on this subject is
discussed in the broker assigning algorithms section.
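
The failure policy can be sketched like this; the reporting helpers are
placeholders, not real worker functions:

```python
def send_to_broker(message: str, job_id: str) -> None:
    print("-> broker:", message, job_id)       # placeholder for a ZeroMQ send

def mark_test_failed(job_id: str, test_id: str) -> None:
    print("test", test_id, "of job", job_id, "failed")  # placeholder

def handle_task_failure(task_type: str, job_id: str, test_id: str) -> None:
    if task_type == "inner":
        # Internal error (fetching, copying, ...): do not blame the user and
        # hand the job back so the broker can reassign it to another worker.
        send_to_broker("job_failed", job_id)
    else:
        # initiation/execution/evaluation run user-supplied code, so their
        # failure becomes part of the evaluation result shown to the user.
        mark_test_failed(job_id, test_id)
```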
There is also a question about the working directory or directories of a job:
which directories should be used and what for. One simple answer would be that
every job has only one specified directory which contains every file the worker
works with in the scope of the whole job execution. This is of course nonsense;
there has to be some logical division. The least which must be done are two
folders, one for internal temporary files and a second one for evaluation. The
directory for temporary files is enough to cover all kinds of internal work
with the filesystem, but only one directory for the whole evaluation is not
enough. User solutions are downloaded in the form of ZIP archives, so why
should these be present during execution, and why should the results and the
files which are uploaded back to the fileserver be cherry-picked from one big
directory? The answer is of course another logical division into subfolders.
The solution which was chosen in the end is to have folders for the downloaded
archive, the decompressed solution, an evaluation directory in which the user
solution is executed, and then folders for temporary files and for results and
generally files which should be uploaded back to the fileserver with the
solution results. Of course there has to be a hierarchy which separates folders
of different workers on the same machine. That is why paths to directories are
in the format ${DEFAULT}/${FOLDER}/${WORKER_ID}/${JOB_ID}, where default means
the default working directory of the whole worker and folder is a particular
directory for some purpose (archives, evaluation, ...). The described division
of job directories proved to be flexible and detailed enough; everything is in
logical units and where it is supposed to be, which means that searching
through this system should be easy. In addition, if user solutions have access
only to the evaluation directory, they do not have access to unnecessary files,
which is better for the overall security of ReCodEx.
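
For illustration, this hierarchy could be materialized as follows; the folder
names are assumptions, not necessarily the exact ones used by the worker:

```python
from pathlib import Path

def job_dirs(default: str, worker_id: str, job_id: str) -> dict:
    # ${DEFAULT}/${FOLDER}/${WORKER_ID}/${JOB_ID} for every purpose folder
    dirs = {}
    for folder in ("archives", "submission", "eval", "temp", "results"):
        path = Path(default) / folder / worker_id / job_id
        path.mkdir(parents=True, exist_ok=True)
        dirs[folder] = path
    return dirs

dirs = job_dirs("/var/recodex-worker", "worker-1", "job-42")
# the user solution is executed only inside dirs["eval"]
```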
As we discovered above, the worker has job directories, but users who are
writing and managing job configurations do not know where they are (on some
particular worker) and how they can be accessed and written into the
configuration. For this kind of task we have to introduce some kind of marks or
signs which will represent particular folders. These marks can have the form of
special strings which can be called variables. These variables can then be used
everywhere filesystem paths are used within the configuration file. This solves
the problem of the specific worker environment and the specific hierarchy of
directories. The final form of the variables is ${...}, where the triple dot is
a textual description. This format was chosen because of the special dollar
sign character, which cannot be used within a filesystem path; the braces only
delimit the textual description of the variable.
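
A variable resolver of this kind fits into a few lines; the variable names
below are only examples:

```python
import re

def expand(path: str, variables: dict) -> str:
    # Replace every ${NAME} occurrence with its value from the mapping.
    return re.sub(r"\$\{([^}]+)\}", lambda m: variables[m.group(1)], path)

variables = {"EVAL_DIR": "/var/recodex-worker/eval/worker-1/job-42"}
print(expand("${EVAL_DIR}/solution", variables))
# -> /var/recodex-worker/eval/worker-1/job-42/solution
```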
#### Evaluation
An interesting problem is with supplementary files (inputs, sample outputs). As
described in the fileserver section, stored supplementary files have special
filenames which reflect hashes of their content. As such, there are no
duplicates stored on the fileserver. The worker can use this feature too, cache
these files for a while and save precious bandwidth. This means there has to be
a system which can download a file, store it in a cache and after some time of
inactivity delete it. Because there can be multiple worker instances on some
particular server, it is not efficient to have this system in every worker on
its own.
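
For illustration, a content-derived name might be computed as follows; SHA-1 is
used here only as an example, the actual hash function of the fileserver is not
assumed:

```python
import hashlib

def storage_name(path: str) -> str:
    # Two files with identical content get the same name, so the fileserver
    # never stores duplicates and workers can cache files under this name.
    with open(path, "rb") as f:
        return hashlib.sha1(f.read()).hexdigest()
```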
The cleaner, as mentioned, is a simple script which is executed regularly as a
cron job. Given the caching system introduced in the paragraph above, there are
few choices how the cleaner should be implemented. On various filesystems there
is usually support for two particular timestamps, `last access time` and `last
modification time`. Files in the cache are downloaded once and then just
copied; this means that the last modification time is set only once, on
creation of the file, while the last access time should be updated on every
copy. This implies that the last access time is what is needed here. But while
the last modification time is widely used by operating systems, the last access
time is not enabled by default. More on this subject can be found
[here](https://en.wikipedia.org/wiki/Stat_%28system_call%29#Criticism_of_atime).
For proper cleaner functionality, the filesystem used by the worker for caching
has to have last access time for files enabled.
Having the cleaner as a separated component and the caching itself handled in
the worker is kind of blurry, and it is not clearly observable that it works
without any race conditions. The goal here is not to have a system without
races, but to have a system which can recover from them. The implementation of
the caching system is based upon atomic operations of the underlying
filesystem. A description of one possible robust implementation follows. First
start with the worker implementation (a code sketch is given after the list):
- the worker discovers a fetch task which should download a supplementary file
- the worker takes the name of the file and tries to copy it from the cache
  folder to its working folder
    - if successful, the last access time is rewritten (by the filesystem
      itself) and the whole operation is done
    - if not successful, the file has to be downloaded
- the file is downloaded from the fileserver to the working folder
- the downloaded file is then copied to the cache
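
A sketch of these steps follows; the final copy into the cache goes through a
temporary name and a rename, which is atomic on POSIX filesystems (paths and
the fileserver URL are illustrative):

```python
import shutil
import tempfile
import urllib.request
from pathlib import Path

def fetch(name: str, cache: Path, workdir: Path, fileserver: str) -> None:
    cached, target = cache / name, workdir / name
    try:
        # Cache hit: copying reads the file, which updates its access time.
        shutil.copyfile(cached, target)
        return
    except FileNotFoundError:
        pass                                   # cache miss, download instead
    urllib.request.urlretrieve(fileserver + "/" + name, target)
    # Publish into the cache: the rename makes the file appear atomically,
    # so other workers never see a half-copied file.
    tmp = tempfile.NamedTemporaryFile(dir=cache, delete=False)
    tmp.close()
    shutil.copyfile(target, tmp.name)
    Path(tmp.name).rename(cached)
```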
The previous steps cover only the worker; the cleaner can intervene at any time
and delete files. The implementation in the cleaner follows (again with a code
sketch after the list):
- the cleaner on its start stores the current reference timestamp which will be
  used for comparison, and loads the configuration values of the caching folder
  and the maximal file age
- there is a loop going through all files and even directories in the specified
  cache folder
    - the last access time of the file or folder is detected
    - the last access time is subtracted from the reference timestamp into a
      difference
    - the difference is compared against the specified maximal file age; if the
      difference is greater, the file or folder is deleted
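
The whole cleaner then fits into a short script along these lines; the
configuration values are hard-coded here for brevity:

```python
import shutil
import time
from pathlib import Path

CACHE_DIR = Path("/var/recodex-cache")     # assumed caching folder
MAX_AGE = 24 * 3600                        # assumed maximal file age (seconds)

reference = time.time()                    # reference timestamp taken at start
for entry in CACHE_DIR.iterdir():
    atime = entry.stat().st_atime          # last access time (must be enabled)
    if reference - atime > MAX_AGE:
        if entry.is_dir():
            shutil.rmtree(entry, ignore_errors=True)
        else:
            entry.unlink(missing_ok=True)  # the file may already be gone
```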
The previous description implies that there is a gap between the detection of
the last access time and the deletion of the file within the cleaner. In this
gap a worker can access the file while the file is deleted anyway, but this is
fine: the file is deleted, but the worker has it copied. Another problem can be
two workers downloading the same file, but this is also not a problem, because
the file is first downloaded to the working folder and only after that copied
to the cache. And even if something else unexpectedly fails and because of that
a fetch task fails during execution, even that should be fine, because fetch
tasks should have the 'inner' task type, which implies that a failure in this
task stops the whole execution and the job is reassigned to another worker.
This is the last salvation in case everything else goes wrong.
Users want to view the real-time evaluation progress of their solution. This
can easily be done with an established double-sided connection stream, but it
is hard to achieve with web technologies. The HTTP protocol works differently,
on a separate-requests basis with no long-term connection. However, there is a
widely used technology which solves this problem, the WebSocket protocol.
On the other hand, this needs an additional publicly open endpoint, which
enlarges the surface for possible attacks. With this in mind, there are two
possible options:

- embed the progress-message functionality into the existing API
- make a separate component for progress messages
Each of the two possibilities has some pros and cons. The first one is good
because there is no additional component and the API is already publicly
visible. On the other side, working with the WebSocket protocol from PHP is not
very pleasant (but it is possible) and embedding this functionality into the
API is not extensible. The second approach is better for future changes of the
protocol or the underlying technology.
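
To make the second option more concrete, here is a minimal sketch of such a
separate progress component; it assumes progress messages arrive over a ZeroMQ
SUB socket (the endpoint, port and message format are made up) and uses the
Python websockets library (version 10 or newer):

```python
import asyncio
import zmq
import zmq.asyncio
import websockets

ctx = zmq.asyncio.Context()

async def handler(websocket):
    # Each browser connection first sends the job id it wants to follow.
    job_id = await websocket.recv()
    sub = ctx.socket(zmq.SUB)
    sub.connect("tcp://localhost:5571")        # assumed progress endpoint
    sub.setsockopt_string(zmq.SUBSCRIBE, job_id)
    try:
        while True:
            msg = await sub.recv_string()      # e.g. "42 TASK 3/10 OK"
            await websocket.send(msg)          # push it to the browser
    finally:
        sub.close()

async def main():
    async with websockets.serve(handler, "0.0.0.0", 4567):
        await asyncio.Future()                 # serve forever

asyncio.run(main())
```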