### Job Configuration File

As discussed previously in 'Evaluation Unit Executed by ReCodEx', an evaluation
unit has the form of a job which contains small tasks, each representing one
piece of work executed by the worker. This implies that jobs have to be passed
from the frontend to the backend. The best option for this is some kind of
configuration file which represents the job details. The configuration file is
specified in the frontend; in the backend, namely in the worker, it is parsed
and executed.

There are many formats which can be used for configuration representation. The
considered ones are:

- *XML* -- a broadly used general markup language which can be accompanied by a
  document type definition (DTD) that expresses and checks the structure of an
  XML file, so the structure does not have to be checked within the application.
  However, XML with its tags can sometimes be quite 'chatty' and extensive,
  which is not desirable, and overall XML with all its features and properties
  can be a bit heavy-weight.
- *JSON* -- a notation which was developed to represent JavaScript objects. As
  such it is quite simple; only key-value structures, arrays and primitive
  values can be expressed, and the structure and hierarchy of the data is
  expressed with braces and brackets.
- *INI* -- a very simple configuration format which is able to represent only
  key-value structures grouped into sections. This is not enough to represent a
  job and its task hierarchy.
- *YAML* -- a format which is very similar to JSON in its capabilities, with the
  small difference that the structure and hierarchy of the configuration is
  expressed not with braces but with indentation. This makes YAML easily
  readable by both humans and machines.

In the end the YAML format was chosen, mainly because there are existing parsers
for most of the programming languages and it is easy enough to learn and
understand. JSON would also be a sensible choice, but YAML seemed to be the
better option.
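
To illustrate the difference in notation, the following snippet shows a small,
purely made-up structure written in YAML. The hierarchy which JSON would express
with braces and brackets is expressed here only with indentation; none of the
keys below are part of the real job configuration schema.

```yml
# Illustrative YAML only -- the real job configuration schema is described in
# the 'Job configuration' appendix.
job:
  id: example-job        # simple key-value pair
  logging: true
  tasks:                 # a list of nested structures
    - name: first-task
      priority: 1
    - name: second-task
      priority: 2
```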

The job configuration, including design and implementation notes, is described
in the 'Job configuration' appendix.

#### Task Types

From the low-level point of view there are only two types of tasks in a job.
The first type are tasks performing some internal operation which should work
the same way on all platforms and operating systems. The second type are
external tasks which execute an external binary.

Internal tasks should handle at least basic operations with files and
directories, for instance fetching supplementary files from the fileserver; the
complete list of supported operations is given in the 'Job configuration'
appendix.

External tasks executing an external binary should optionally be runnable in a
sandbox, but for security's sake there is no reason to execute them outside of
one. Therefore all external tasks are executed within a general and
configurable sandbox. Configuration options for sandboxes will be called limits
and can specify, for example, time or memory limits.
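
A sketch of how such an external task could look in the configuration is shown
below. The key names (`cmd`, `sandbox`, `limits`) and the sandbox name are only
illustrative assumptions here; the exact structure is given in the 'Job
configuration' appendix.

```yml
# Illustrative external task -- runs a binary inside a sandbox with limits.
- task-id: "compile"
  cmd:
    bin: "/usr/bin/gcc"            # external binary to execute
    args: ["solution.c", "-o", "solution"]
  sandbox:
    name: "isolate"                # example sandbox implementation
    limits:
      - time: 5                    # time limit (seconds)
        memory: 262144             # memory limit (kilobytes)
```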

#### Configuration File Content

The content of the configuration file can be divided into two parts: the first
concerns the job in general and its metadata, the second relates to the tasks
and their specification.

There is not much to express in the general job metadata. There can be an
identification of the job and some general options, like enabling or disabling
logging. But a really necessary item is the address of the fileserver from
which the supplementary files are downloaded. This option is crucial because
there can be multiple fileservers and the worker has no other way to figure out
where the files might be.
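
A minimal sketch of this general part of the configuration could look as
follows; the key names (`submission`, `job-id`, `log`, `file-server-url`) are
illustrative assumptions, not the final schema.

```yml
# Illustrative job metadata -- the authoritative structure is described in the
# 'Job configuration' appendix.
submission:
  job-id: "student-submission-42"   # identification of the job
  log: true                         # enable/disable logging
  # address from which supplementary files are downloaded
  file-server-url: "https://fileserver.example.org/submits"
```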

A more interesting situation is the metadata of tasks. From the initial
analysis of the evaluation unit and its structure, at least these generally
needed items can be derived:

- *task identification* -- an identifier used at least for specifying

Besides these, external tasks have additional attributes exclusively related to
sandboxing and limitation.

#### Supplementary Files

An interesting problem arises with supplementary files (e.g., inputs, sample
outputs). There are two main approaches which can be observed: supplementary
files can be downloaded either at the start of the execution or during the
execution.

If the files are downloaded at the beginning, the execution has not really
started at that point, and thus if there are network problems, the worker will
find out right away and can abort the execution without executing a single
task. A slight problem can arise if some of the files need to have a specific
name (e.g. the solution assumes that the input is named `input.txt`). In this
scenario the downloaded files cannot be renamed at the beginning but only
during the execution, which is impractical and not easily observed by the
authors of job configurations.

The second solution, where files are downloaded on the fly, has quite the
opposite problem: if there are network problems, the worker will discover them
during the execution, possibly when almost the whole execution is done, which
is not ideal if we care about wasted hardware resources. On the other hand,
with this approach users have advanced control of the execution flow and know
exactly which files are available during the execution, which is probably more
appealing from the users' perspective than the first solution. Based on that,
downloading of supplementary files using 'fetch' tasks during the execution was
chosen.
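
A fetch task can then be placed in the configuration right before the task
which needs the file, so the downloaded file directly gets the local name the
solution expects. The following sketch is illustrative only; the task keys and
the remote file name are assumptions, the real structure is described in the
'Job configuration' appendix.

```yml
# Illustrative 'fetch' task followed by the task that consumes the file.
- task-id: "fetch-input"
  cmd:
    bin: "fetch"                   # internal fetch operation
    args:
      - "abc123"                   # name of the file on the fileserver
      - "input.txt"                # local name the tested solution expects
- task-id: "run-solution"
  dependencies: ["fetch-input"]    # run only after the input is in place
  cmd:
    bin: "./solution"
```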