Analysis job config update

master
Petr Stefan 8 years ago
parent a35975951a
commit cde0dbb161

@@ -974,29 +974,29 @@ HTTP(S).
### Job Configuration File
As discussed previously in 'Evaluation Unit Executed by ReCodEx', an evaluation
unit has the form of a job which contains small tasks, each representing one
piece of work executed by a worker. This implies that jobs have to be passed
from the frontend to the backend. The best option for this is some kind of
configuration file which represents the job details. The configuration file
should be specified in the frontend and then parsed and executed in the
backend, namely by the worker.
There are many formats which can be used for configuration representation. The
considered ones are:
- *XML* -- a broadly used general markup language which can be accompanied by a
  document type definition (DTD) that expresses and checks the structure of an
  XML file, so the structure does not have to be checked within the
  application. However, XML with its tags can be quite 'chatty' and verbose,
  which is not desirable, and overall XML with all its features and properties
  can be a bit heavy-weight.
- *JSON* -- a notation which was developed to represent JavaScript objects. As
  such it is quite simple; it can express only key-value structures, arrays and
  primitive values. Structure and hierarchy of the data are expressed with
  braces and brackets.
- *INI* -- a very simple configuration format which is able to represent only
  key-value structures grouped into sections. This is not enough to represent a
  job and its task hierarchy.
- *YAML* -- a format very similar to JSON in its capabilities, but with a small
  difference: structure and hierarchy of the configuration are expressed not
  with braces but with indentation. This means that YAML is easily readable
@@ -1010,16 +1010,15 @@ existing parsers for most of the programming languages and it is easy enough to
learn and understand. Another choice which makes sense is JSON, but in the end
YAML seemed to be better.
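
To illustrate the difference in verbosity, a small nested structure can be
written purely with indentation in YAML (the keys below are made up for this
illustration and are not the actual job schema):

```yaml
# Generic illustration of YAML hierarchy expressed by indentation;
# the keys are made up for this example, not the real job schema.
tasks:
  - task-id: compile
    cmd:
      bin: /usr/bin/gcc
      args: ["solution.c", "-o", "solution"]
```

The same data in JSON would need braces, brackets and quotes around every key,
which is exactly the extra noise that the indentation-based syntax avoids.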
Job configuration, including design and implementation notes, is described in
the 'Job configuration' appendix.
#### Task Types
From the low-level point of view there are only two types of tasks in a job.
The first ones perform some internal operation which should work the same way
on all platforms and operating systems. The second type are external tasks
which execute an external binary.
Internal tasks should handle at least these operations:
@@ -1033,25 +1032,25 @@ implemented.
External tasks executing an external binary should be optionally runnable in a
sandbox. But for security's sake there is no reason to execute them outside of
a sandbox, so all external tasks are executed within a general and configurable
sandbox. Configuration options for sandboxes are called limits; for example,
time or memory limits can be specified there.
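
As a sketch, an external task executed in a sandbox with limits might be
written roughly like this (the key names such as `sandbox`, `limits`, `time`
and `memory` are illustrative assumptions; the authoritative structure is given
in the 'Job configuration' appendix):

```yaml
# Illustrative sketch of a sandboxed external task; key names are
# assumptions for this example, see the 'Job configuration' appendix.
- task-id: run-solution
  cmd:
    bin: ./solution
  sandbox:
    name: isolate        # which sandbox implementation to use
    limits:
      - time: 2          # time limit in seconds
        memory: 65536    # memory limit in kilobytes
```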
#### Configuration File Content
Content of the configuration file can be divided into two parts: the first
concerns the job in general and its metadata, the second one relates to the
tasks and their specification.
There is not much to express in the general job metadata. There can be an
identification of the job and some general options, like enabling or disabling
logging. But the really necessary item is the address of the fileserver from
where supplementary files are downloaded. This option is crucial because there
can be multiple fileservers and the worker has no other way to figure out where
the files might be.
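
A sketch of the general metadata part could therefore look roughly as follows
(key names are illustrative, not the definitive schema):

```yaml
# Rough sketch of general job metadata; key names are illustrative.
submission:
  job-id: 42                                     # identification of the job
  log: true                                      # enable/disable logging
  fileserver-url: http://fileserver.example.com  # where supplementary files live
```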
A more interesting situation concerns the metadata of tasks. From the initial
analysis of the evaluation unit and its structure at least these generally
needed items are derived:
- *task identification* -- identifier used at least for specifying
@@ -1078,22 +1077,22 @@ exclusively related to sandboxing and limitation:
#### Supplementary Files
An interesting problem arises with supplementary files (e.g., inputs, sample
outputs). There are two main approaches which can be observed: supplementary
files can be downloaded either at the start of the execution or during the
execution.

If the files are downloaded at the beginning, the execution has not really
started at that point, and thus if there are problems with the network, the
worker will find them right away and can abort the execution without running a
single task. Slight problems can arise if some of the files need to have a
specific name (e.g. the solution assumes that the input is `input.txt`). In
this scenario the downloaded files cannot be renamed at the beginning but only
during the execution, which is impractical and not easily observed by the
authors of job configurations.
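
To make this drawback concrete, with the up-front download the configuration
author would have to insert an explicit copy or rename step into the job,
something along these lines (the internal `cp` command and the file names are
made up for this sketch):

```yaml
# Hypothetical internal task that renames a pre-downloaded file to the
# name the solution expects; command and file names are illustrative.
- task-id: prepare-input
  cmd:
    bin: cp
    args: ["downloaded_data_file", "input.txt"]
```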
The second solution to this problem, where files are downloaded on the fly, has
quite the opposite problem. If there are problems with the network, the worker
will find them during the execution, for instance when almost the whole
execution is done. This is also not an ideal solution if we care about wasted
hardware resources. On the other hand, using this approach users have advanced
control of the execution flow and know exactly which files are available during
the execution, which is from the users' perspective probably more appealing
than the first solution. Based on that, downloading of supplementary files
using 'fetch' tasks during execution was
