Analysis job config update

master
Petr Stefan 8 years ago
parent a35975951a
commit cde0dbb161

@@ -974,29 +974,29 @@ HTTP(S).
### Job Configuration File
As discussed previously in 'Evaluation Unit Executed by ReCodEx', an evaluation
unit has the form of a job which contains small tasks, each representing one
piece of work executed by a worker. This implies that jobs have to be passed
from the frontend to the backend. The best option for this is some kind of
configuration file which represents the job details. The configuration file
should be specified in the frontend and then parsed and executed in the
backend, namely by the worker.
There are many formats which can be used for configuration representation. The
considered ones are:
- *XML* -- a broadly used general markup language which can be accompanied by a
  document type definition (DTD) that expresses and checks the structure of an
  XML file, so the structure does not have to be checked within the
  application. However, XML with its tags can be quite 'chatty' and verbose,
  which is not desirable, and overall XML with all its features and properties
  can be a bit heavy-weight.
- *JSON* -- a notation which was developed to represent JavaScript objects. As
  such it is quite simple; it can express only key-value structures, arrays and
  primitive values. Structure and hierarchy of the data are expressed with
  braces and brackets.
- *INI* -- a very simple configuration format which is able to represent only
  key-value structures grouped into sections. This is not enough to represent a
  job and its task hierarchy.
- *YAML* -- a format very similar to JSON in its capabilities, but with a small
  difference: structure and hierarchy of the configuration are expressed not
  with braces but with indentation. This means that YAML is easily readable
@@ -1010,16 +1010,15 @@ existing parsers for most of the programming languages and it is easy enough to
learn and understand. Another choice which makes sense is JSON, but in the end
YAML seemed to be better.
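
To illustrate the difference in verbosity, a small nested structure can be
written purely with indentation in YAML (the keys below are made up for this
illustration and are not the actual job schema):

```yaml
# Generic illustration of YAML hierarchy expressed by indentation;
# the keys are made up for this example, not the real job schema.
tasks:
  - task-id: compile
    cmd:
      bin: /usr/bin/gcc
      args: ["solution.c", "-o", "solution"]
```

The same data in JSON would need braces, brackets and quotes around every key,
which is exactly the extra noise that the indentation-based syntax avoids.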
Job configuration, including design and implementation notes, is described in
the 'Job configuration' appendix.
#### Task Types
From the low-level point of view there are only two types of tasks in a job.
The first ones perform some internal operation which should work the same way
on all platforms and operating systems. The second type are external tasks
which execute an external binary.
Internal tasks should handle at least these operations:
@@ -1033,25 +1032,25 @@ implemented.
External tasks executing an external binary should be optionally runnable in a
sandbox. But for security's sake there is no reason to execute them outside of
a sandbox, so all external tasks are executed within a general and configurable
sandbox. Configuration options for sandboxes are called limits; for example,
time or memory limits can be specified there.
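
As a sketch, an external task executed in a sandbox with limits might be
written roughly like this (the key names such as `sandbox`, `limits`, `time`
and `memory` are illustrative assumptions; the authoritative structure is given
in the 'Job configuration' appendix):

```yaml
# Illustrative sketch of a sandboxed external task; key names are
# assumptions for this example, see the 'Job configuration' appendix.
- task-id: run-solution
  cmd:
    bin: ./solution
  sandbox:
    name: isolate        # which sandbox implementation to use
    limits:
      - time: 2          # time limit in seconds
        memory: 65536    # memory limit in kilobytes
```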
#### Configuration File Content
Content of the configuration file can be divided into two parts: the first
concerns the job in general and its metadata, the second one relates to the
tasks and their specification.
There is not much to express in the general job metadata. There can be an
identification of the job and some general options, like enabling or disabling
logging. But the really necessary item is the address of the fileserver from
where supplementary files are downloaded. This option is crucial because there
can be multiple fileservers and the worker has no other way to figure out where
the files might be.
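
A sketch of the general metadata part could therefore look roughly as follows
(key names are illustrative, not the definitive schema):

```yaml
# Rough sketch of general job metadata; key names are illustrative.
submission:
  job-id: 42                                     # identification of the job
  log: true                                      # enable/disable logging
  fileserver-url: http://fileserver.example.com  # where supplementary files live
```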
A more interesting situation concerns the metadata of tasks. From the initial
analysis of the evaluation unit and its structure at least these generally
needed items are derived:
- *task identification* -- identifier used at least for specifying
@@ -1078,22 +1077,22 @@ exclusively related to sandboxing and limitation:
#### Supplementary Files
An interesting problem arises with supplementary files (e.g., inputs, sample
outputs). There are two main approaches which can be observed: supplementary
files can be downloaded either at the start of the execution or during the
execution.

If the files are downloaded at the beginning, the execution has not really
started at that point, and thus if there are problems with the network, the
worker will find them right away and can abort the execution without running a
single task. Slight problems can arise if some of the files need to have a
specific name (e.g. the solution assumes that the input is `input.txt`). In
this scenario the downloaded files cannot be renamed at the beginning but only
during the execution, which is impractical and not easily observed by the
authors of job configurations.
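
To make this drawback concrete, with the up-front download the configuration
author would have to insert an explicit copy or rename step into the job,
something along these lines (the internal `cp` command and the file names are
made up for this sketch):

```yaml
# Hypothetical internal task that renames a pre-downloaded file to the
# name the solution expects; command and file names are illustrative.
- task-id: prepare-input
  cmd:
    bin: cp
    args: ["downloaded_data_file", "input.txt"]
```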
The second solution to this problem, where files are downloaded on the fly, has
quite the opposite problem. If there are problems with the network, the worker
will find them during the execution, for instance when almost the whole
execution is done. This is also not an ideal solution if we care about wasted
hardware resources. On the other hand, using this approach users have advanced
control of the execution flow and know exactly which files are available during
the execution, which is from the users' perspective probably more appealing
than the first solution. Based on that, downloading of supplementary files
using 'fetch' tasks during execution was
