Updated Assignments overview (markdown)

master
Martin Polanka 9 years ago
parent fd60596575
commit 7b9f4b2e93

@@ -42,7 +42,7 @@ Inside this directory temporary files for job execution are created:
- **${DEFAULT}/submission/${WORKER_ID}/${JOB_ID}** - the decompressed submission is stored here
- **${DEFAULT}/eval/${WORKER_ID}/${JOB_ID}** - this directory is accessible in the job configuration through variables and all execution should happen here
- **${DEFAULT}/temp/${WORKER_ID}/${JOB_ID}** - directory where all sorts of temporary files can be stored
- **${DEFAULT}/results/${WORKER_ID}/${JOB_ID}** - also accessible from the job configuration; used to store all files which will be uploaded to the fileserver. Usually it contains only the YAML result file and optionally a log; every other file has to be copied here explicitly by the job (example paths are sketched below)
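For illustration only, assuming `${DEFAULT}` is configured as `/var/recodex/worker` (an arbitrary example path, not a documented default) and worker 1 is processing a job with id `eval5`, the directories would expand roughly like this:
```
# Illustrative expansion only -- the base path and both identifiers are made up.
submission: /var/recodex/worker/submission/1/eval5
eval:       /var/recodex/worker/eval/1/eval5
temp:       /var/recodex/worker/temp/1/eval5
results:    /var/recodex/worker/results/1/eval5
```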
### Configuration
Configuration of the job which is passed to the worker is generated from two parts:
@@ -160,20 +160,43 @@ tasks:
...
```
### Job variables
Because the frontend does not know which worker gets the job, the configuration file has to stay somewhat general. This means that some worker-specific things have to be kept transparent; a good example is directories, which can be placed wherever the worker wants. Variables were introduced for this purpose.
Of course there are some restrictions on where variables can be used. Basically, variables can be used wherever filesystem paths can be used.
List of usable variables in the job configuration (a usage sketch follows the list):
- **WORKER_ID** - integral identification of the worker, unique on the server
- **JOB_ID** - identification of this job
- **SOURCE_DIR** - directory where the source codes of the job are stored
- **EVAL_DIR** - evaluation directory which should point inside the sandbox
- **RESULT_DIR** - results from the job can be copied here, but only by an internal task
- **TEMP_DIR** - general temporary directory which is not dependent on the operating system
- **JUDGES_DIR** - directory in which judges are stored (outside the sandbox)
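The snippet below only sketches how these variables might be referenced from tasks; the task field names used here (`cmd`, `bin`, `args`) are assumptions for illustration and may differ from the actual job configuration format described above.
```
# Sketch only -- task field names (cmd, bin, args) are assumed for illustration.
- task-id: "compile"
  cmd:
      bin: "/usr/bin/gcc"
      args:
          - "${SOURCE_DIR}/main.c"      # expands to the job's source directory
          - "-o"
          - "${EVAL_DIR}/a.out"         # output stays inside the sandboxed eval directory
- task-id: "copy-result"
  cmd:
      bin: "cp"                         # internal task, so it may write to RESULT_DIR
      args:
          - "${TEMP_DIR}/compile.log"
          - "${RESULT_DIR}/compile.log"
```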
## Results
Results of tasks are sent back in YAML format compressed into an archive. This archive can contain further files, such as job logging information and files which were explicitly copied into the results directory.
The results file contains the job identification and the results of the individual tasks.
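As a rough sketch (the file names here are hypothetical and not prescribed by this document), the uploaded archive could contain something like:
```
# Hypothetical archive contents -- names are illustrative only.
- result.yml       # the YAML results file described below
- job_log.txt      # shared job log, if logging was enabled
- compile.log      # an extra file copied into ${RESULT_DIR} by a task
```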
### Results items
Mandatory items are in bold, optional ones in italics.
- **job-id** - identification of the job to which these results belong
- **results** - list of task results
    - **task-id** - unique identification of the task within the scope of this job
    - **status** - two states: OK, FAILED
    - _error_message_ - present only for internal tasks, on failure
    - _sandbox_results_ - if present, this task was external and was run in a sandbox
        - **exitcode** - integer exit code returned by the executed program
        - **time** - time in seconds in which the program finished
        - **wall-time** - wall time in seconds
        - **memory** - how much memory the program used, in kilobytes
        - **max-rss** - maximum resident set size used, in kilobytes
        - **status** - two letter status code: OK, RE, SG, TO, XX
        - **exitsig** - description of the exit signal
        - **killed** - boolean determining whether the program exited correctly or was killed
        - **message** - status message on failure
### Example result file
```
--- # only one document which contains list of results
job-id: 5
results:
    - task-id: compile1
@@ -186,7 +209,7 @@ results:
          max-rss: 50000
          status: RE # two letter status code: OK, RE, SG, TO, XX
          exitsig: 1
          killed: true
          message: "Time limit exceeded" # status message
    - task-id: eval1
      status: FAILED
@@ -198,10 +221,8 @@ results:
```
### Logs
During execution, tasks can use only one shared log. There is no use for multiple logs shared across all tasks, because only a fairly small amount of information is logged. The log is disabled by default and can be enabled in the job configuration; all actions logged by the tasks will then be visible here.
After execution the log is packed and sent back to the fileserver, where it can be further processed.
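A minimal sketch of enabling the log, assuming the job configuration header accepts a boolean `log` flag (the field name and the surrounding `submission` structure are assumptions for illustration, not confirmed by this page):
```
# Assumed structure -- field names are illustrative; check the actual job configuration format.
submission:
    job-id: 5
    log: true    # when true, everything the tasks log is collected and sent back with the results
```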
## Case study
