Reorganization of implementation text

master
Teyras 8 years ago
parent 492bbd2ab2
commit 538959f2b0

@@ -2629,22 +2629,30 @@ used.
## Fileserver
The fileserver component provides shared file storage between the frontend and
the backend. It is written in Python 3 using the Flask web framework. The
fileserver stores files in a configurable filesystem directory, provides file
deduplication, and offers HTTP access. To keep the stored data safe, the
fileserver should not be visible from the public internet. Instead, it should
be accessed indirectly through the REST API.
### File deduplication
From our analysis of the requirements, it is certain we need to implement a
means of dealing with duplicate files.
File deduplication is implemented by storing files under the hashes of their
content. This procedure is done completely inside the fileserver. Plain files
are uploaded into the fileserver, hashed, saved, and the new filename is
returned back to the uploader.
SHA1 is used as the hashing function because it is fast to compute and provides
reasonable collision safety for non-cryptographic purposes. Files with the same
hash are treated as the same; no additional checks for collisions are
performed, and finding one is extremely unlikely anyway. If SHA1 proves
insufficient, it is possible to change the hash function to something else,
because the naming strategy is fully contained in the fileserver (although
special care must be taken to maintain backward compatibility).
### Storage structure
@@ -2656,11 +2664,11 @@ Fileserver stores its data in following structure:
- `./submission_archives/<id>.zip` -- ZIP archives of all submissions. These are
created automatically when a submission is uploaded. `<id>` is an identifier
of the corresponding submission.
- `./exercises/<subkey>/<key>` -- supplementary exercise files (e.g. test inputs
and outputs). `<key>` is a hash of the file content (`sha1` is used) and
`<subkey>` is its first letter (this is an attempt to prevent creating a flat
directory structure).
- `./results/<id>.zip` -- ZIP archives of results for the submission with the
  `<id>` identifier.
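
To make the deduplication scheme concrete, here is a minimal sketch in Python;
the function name `store_file` and its interface are illustrative, only the
SHA1 hashing and the `<subkey>/<key>` layout come from the text above:

```python
import hashlib
from pathlib import Path

def store_file(data: bytes, root: Path) -> str:
    """Store `data` under the SHA1 hash of its content and return the hash,
    which also serves as the new file name reported back to the uploader."""
    key = hashlib.sha1(data).hexdigest()
    subkey = key[0]                  # first letter, to avoid a flat directory
    target = root / subkey / key
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():          # duplicate content maps to the same path
        target.write_bytes(data)
    return key
```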
@@ -2702,52 +2710,54 @@ processing a job.
### Capability identification
There are possibly multiple worker instances in a ReCodEx installation and each
one can run on different hardware, under a different operating system, or with
different tools installed. To identify the hardware capabilities of a worker,
we use the concept of **hardware groups**. Each worker belongs to exactly one
group that specifies the hardware and operating system on which the submitted
programs will be run. A worker also has a set of additional properties called
**headers**. Together they help the broker decide which worker is suitable for
processing a job evaluation request. This information is sent to the broker on
worker startup.
The hardware group is a string identifier of the hardware configuration, for
example "i7-4560-quad-ssd-linux", configured by the administrator for each
worker instance. If this is done correctly, performance measurements of a
submission
should yield the same results on all computers from the same hardware group.
Thanks to this fact, we can use the same resource limits on every worker in a
hardware group.
The headers are a set of key-value pairs that describe the worker capabilities.
For example, they can show which runtime environments are installed or whether
this worker measures time precisely. Headers are also configured manually by an
administrator.
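
The matching rule can be sketched as follows; Python is used here for brevity
and the flat key-value header structure is an assumption, so this is not the
actual broker code:

```python
def worker_is_suitable(worker_hwgroup: str, worker_headers: dict,
                       job_hwgroup: str, job_headers: dict) -> bool:
    """A worker can process a job if its hardware group matches the one
    requested by the job and it provides every required header value."""
    if worker_hwgroup != job_hwgroup:
        return False
    return all(worker_headers.get(key) == value
               for key, value in job_headers.items())

# A worker that measures time precisely satisfies a job that requires it:
print(worker_is_suitable("i7-4560-quad-ssd-linux", {"precise-time": "yes"},
                         "i7-4560-quad-ssd-linux", {"precise-time": "yes"}))
```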
### Running student submissions
Student submissions are executed in a sandbox environment to prevent them from
damaging the host system and also to restrict the amount of used resources.
Currently, only support for the Isolate sandbox is implemented, but it is
possible to add support for other sandboxes.
Every sandbox, regardless of the concrete implementation, has to be a command
line application that takes parameters with arguments and reads input from
standard input or from a file. Outputs should be written to a file or to
standard output. There are no other requirements; the design of the worker is
very versatile and can be adapted to different needs.
The sandbox part of the worker is the only one which is not portable, so
conditional compilation is used to include only the supported parts of the
project. Isolate does not work in a Windows environment, so its invocation is
done through native Linux calls (`fork`, `exec`). To disable compilation of
this part on Windows, the `#ifndef _WIN32` guard is used around the affected
files.
Isolate in particular is executed in a separate Linux process created by the
`fork` and `exec` system calls. Communication between the processes is
performed through an unnamed pipe, with redirection of the standard input and
output descriptors. To guard against Isolate failures there is another safety
measure -- the whole sandbox is killed when it does not end in
`(time + 300) * 1.2` seconds, where `time` is the original maximum time allowed
for the task. This formula works well both for short and long tasks, but the
timeout should never be reached if Isolate works properly -- it should always
end by itself in time.
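
As an illustration of this safety guard, here is a rough Python sketch; the
real worker is not written in Python and invokes the sandbox through `fork` and
`exec` directly:

```python
import subprocess

def run_sandboxed(cmd, time):
    """Run a sandbox command, killing it if the safety timeout elapses."""
    hard_limit = (time + 300) * 1.2   # the safety formula described above
    try:
        return subprocess.run(cmd, timeout=hard_limit).returncode
    except subprocess.TimeoutExpired:
        # should never happen if Isolate works properly and ends by itself
        raise RuntimeError("sandbox did not terminate within the safety limit")
```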
### Directories and files
@@ -2771,27 +2781,28 @@ directory is configurable and can be the same for multiple worker instances.
### Judges
ReCodEx provides a few initial judge programs. They are mostly adopted from
CodEx and installed automatically with the worker component. Judging programs
have to meet some requirements. The basic ones are inspired by the standard
`diff` application -- two mandatory positional parameters which have to be the
files for comparison, and an exit code reflecting whether the result is correct
(0) or wrong (1).
This interface lacks support for returning additional data by the judges, for
example the similarity of the two files calculated as the Levenshtein edit
distance. To allow passing these additional values, an extended judge interface
can be implemented:
- Parameters: There are two mandatory positional parameters which have to be
  the files for comparison
- Results:
    - _comparison OK_
        - exitcode: 0
        - stdout: there is a single line with a double value which should be
          1.0
    - _comparison BAD_
        - exitcode: 1
        - stdout: there is a single line with a double value which should be
          the quality percentage of the judged file
    - _error during execution_
        - exitcode: 2
        - stderr: there should be a description of the error
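
A trivial judge conforming to this extended interface could look like the
following sketch; the exact-match comparison stands in for a real judging
algorithm, which would compute a meaningful quality value:

```python
#!/usr/bin/env python3
import sys

def main():
    if len(sys.argv) != 3:
        print("expected two file arguments", file=sys.stderr)
        return 2                       # error during execution
    try:
        with open(sys.argv[1]) as a, open(sys.argv[2]) as b:
            expected, actual = a.read(), b.read()
    except OSError as error:
        print(error, file=sys.stderr)  # description of the error on stderr
        return 2
    if expected == actual:
        print("1.0")                   # comparison OK
        return 0
    print("0.0")                       # quality percentage of the judged file
    return 1                           # comparison BAD

if __name__ == "__main__":
    sys.exit(main())
```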
@@ -2810,15 +2821,14 @@ comply most of possible use cases.
## Monitor
Monitor is an optional part of the ReCodEx solution for reporting progress of
job evaluation back to users in real time. It is written in Python; the tested
versions are 3.4 and 3.5. The following dependencies are used:
- zmq -- binding for the ZeroMQ messaging framework
- websockets -- framework for communication over WebSockets
- asyncio -- library for fast asynchronous operations
- pyyaml -- parsing YAML configuration files
- argparse -- parsing command line arguments
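
A rough sketch of how these dependencies fit together follows; the address,
port, and message handling are illustrative assumptions, not the actual ReCodEx
protocol:

```python
import asyncio
import zmq
import zmq.asyncio
import websockets

async def forward_progress(websocket):
    """Forward progress messages from the broker to one WebSocket client."""
    ctx = zmq.asyncio.Context.instance()
    sub = ctx.socket(zmq.SUB)
    sub.connect("tcp://127.0.0.1:5555")        # illustrative broker address
    sub.setsockopt_string(zmq.SUBSCRIBE, "")
    try:
        while True:
            message = await sub.recv_string()  # progress event from the broker
            await websocket.send(message)      # pushed on to the user's browser
    finally:
        sub.close()

async def main():
    async with websockets.serve(forward_progress, "0.0.0.0", 4567):
        await asyncio.Future()                 # serve forever

asyncio.run(main())
```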
There is just one monitor instance required per broker. Also, the monitor has
to be publicly visible (it has to have a public IP address or be behind a
public proxy
