From cde0dbb161df6925c9ca396a81575ff7483ea105 Mon Sep 17 00:00:00 2001 From: Petr Stefan Date: Sun, 29 Jan 2017 23:42:35 +0100 Subject: [PATCH] Analysis job config update --- Rewritten-docs.md | 97 +++++++++++++++++++++++------------------------ 1 file changed, 48 insertions(+), 49 deletions(-) diff --git a/Rewritten-docs.md b/Rewritten-docs.md index 653f3b0..fae2b51 100644 --- a/Rewritten-docs.md +++ b/Rewritten-docs.md @@ -974,29 +974,29 @@ HTTP(S). ### Job Configuration File -As discussed previously in 'Evaluation Unit Executed by ReCodEx' evaluation unit -will have form of job which will contain small tasks representing one piece of -work executed by worker. This implies jobs have to be somehow given from -frontend to backend. The best option for this is to use some kind of -configuration file which will represent particular jobs. Mentioned configuration -file should be specified in frontend and in backend, namely worker, will be -parsed and executed. +As discussed previously in 'Evaluation Unit Executed by ReCodEx' an evaluation +unit has the form of a job which contains small tasks representing one piece of +work executed by the worker. This implies that jobs have to be passed from the +frontend to the backend. The best option for this is to use some kind of +configuration file which represents job details. The configuration file should +be specified in the frontend, and in the backend, namely in the worker, it will +be parsed and executed. There are many formats which can be used for configuration representation. The -ones which make sense are: - -- *XML* -- is broadly used general markup language which is flavoured with DTD - definition which can express and check XML file structure, so it does not have - to be checked within application. But XML with its tags can be sometimes quite - 'chatty' and extensive which does not have to be desirable. And overally XML - with all its features and properties can be a bit heavy-weight. 
-- *JSON* -- is notation which was developed to represent javascript objects. As +considered ones are: + +- *XML* -- a broadly used general markup language which is flavoured with document + type definition (DTD) which can express and check XML file structure, so it + does not have to be checked within the application. But XML with its tags can + sometimes be quite 'chatty' and extensive which is not desirable. And overall, + XML with all its features and properties can be a bit heavy-weight. +- *JSON* -- a notation which was developed to represent JavaScript objects. As such it is quite simple, there can be expressed only: key-value structures, arrays and primitive values. Structure and hierarchy of data is solved by braces and brackets. -- *INI* -- is very simple configuration format which is able to represents only - key-value structures which can be grouped into sections. Which is not enough - to represent job and its tasks hierarchy. +- *INI* -- a very simple configuration format which is able to represent only + key-value structures which can be grouped into sections. This is not enough + to represent a job and its tasks hierarchy. - *YAML* -- format which is very similar to JSON with its capabilities. But with small difference in structure and hirarchy of configuration which is solved not with braces but with indentation. This means that YAML is easily readable @@ -1010,16 +1010,15 @@ existing parsers for most of the programming languages and it is easy enough to learn and understand. Another choice which make sense is JSON but at the end YAML seemed to be better. -Job configuration as it was implemented and designed is described in 'Job -configuration' appendix where list of all task types is present alongside with -whole configuration structure and much more. +Job configuration including design and implementation notes is described in the +'Job configuration' appendix. #### Task Types From the low-level point of view there are only two types of tasks in the job. 
First ones are doing some internal operation which should work on all platforms -or operating systems same way. Second type of tasks are external ones which are -executing external binary. +or operating systems the same way. The second type of tasks are external ones +which execute an external binary. Internal tasks should handle at least these operations: @@ -1033,25 +1032,25 @@ implemented. External tasks executing external binary should be optionally runnable in sandbox. But for security sake there is no reason to execute them outside of -sandbox. So all external tasks are executed within sandbox which should be -general and configurable. Configuration options for sandboxes will be called -limits and there can be specified for example time or memory limits. +sandbox. So all external tasks are executed within a general and configurable +sandbox. Configuration options for sandboxes will be called limits and they can +specify, for example, time or memory limits. #### Configuration File Content -Content of configuration file can be divided in two parts, first concerns about -job in general and its metadata, second one relates to tasks and their -specification. +Content of the configuration file can be divided into two parts: the first +concerns the job in general and its metadata, the second relates to the tasks +and their specification. There is not much to express in general job metadata. There can be -identification of job and some general options, like enable/disable logging. But -really necessary item is address of fileserver from where supplementary files -should be downloaded. This option is crucial because there can be more -fileservers and worker have no other way how to figure out where the files might -be. - -More interesting situation is about metadata of tasks. From the initial analysis -of evaluation unit and its structure there can be derived at least these +identification of the job and some general options, like enable/disable logging. 
+But a really necessary item is the address of the fileserver from where +supplementary files are downloaded. This option is crucial because there can be +more fileservers and the worker has no other way to figure out where the files +might be. + +A more interesting situation concerns the metadata of tasks. From the initial +analysis of the evaluation unit and its structure we can derive at least these generally needed items: - *task identification* -- identificator used at least for specifying @@ -1078,22 +1077,22 @@ exclusively related to sandboxing and limitation: #### Supplementary Files Interesting problem arise with supplementary files (e.g., inputs, sample -outputs). There are two approaches which can be observed. Supplementary files -can be downloaded either on the start of the execution or during execution. +outputs). There are two main approaches which can be considered. Supplementary +files can be downloaded either at the start of the execution or during the +execution. -If the files are downloaded at the beginning, execution does not really started -at this point and thus if there are problems with network, worker will find it -right away and can abort execution without executing single task. Slight -problems can arise if some of the files needs to have same name (e.g. solution -assumes that input is `input.txt`), in this scenario downloaded files cannot be -renamed at the beginning but during execution which is somehow impractical and -not easily observed by the authors of job configurations. +If the files are downloaded at the beginning, the execution has not really +started at this point and thus if there are problems with the network, the +worker will find it right away and can abort the execution without executing a +single task. Slight problems can arise if some of the files need to have a +specific name (e.g. the solution assumes that the input is `input.txt`). 
In this scenario the downloaded +files cannot be renamed at the beginning but only during the execution, which is +impractical and not easily observed by the authors of job configurations. Second solution of this problem when files are downloaded on the fly has quite -opposite problem, if there are problems with network, worker will find it during -execution when for instance almost whole execution is done, this is also not +the opposite problem. If there are problems with the network, the worker will +find it during execution, for instance when almost the whole execution is done. +This is also not an ideal solution if we care about burnt hardware resources. On the other hand -using this approach users have quite advanced control of execution flow and know +using this approach users have advanced control of the execution flow and know what files exactly are available during execution which is from users perspective probably more appealing then the first solution. Based on that, downloading of supplementary files using 'fetch' tasks during execution was