Reorganization of implementation text

8 years ago · 538959f2b0
parent 492bbd2ab2
commit 538959f2b0
1 changed files with 74 additions and 64 deletions
--- a/Rewritten-docs.md
+++ b/Rewritten-docs.md
@@ -2629,38 +2629,46 @@ used.
 ## Fileserver
-Fileserver component provides shared storage between frontend and backend. It is
+The fileserver component provides a shared file storage between the frontend and
-writtend in Python 3 using Flask web framework. Fileserver stores files in
+the backend. It is written in Python 3 using the Flask web framework. Fileserver
-configurable filesystem directory, provides file deduplication and HTTP access.
+stores files in a configurable filesystem directory, provides file deduplication
-To keep the stored data safe, fileserver is not visible from public internet.
+and HTTP access. To keep the stored data safe, the fileserver should not be
 visible from the public internet. Instead, it should be accessed indirectly through
 the REST API.
 ### File deduplication
-File deduplication is designed as storing files under the hashes of their
+From our analysis of the requirements, it is certain we need to implement a
 means of dealing with duplicate files.
 File deduplication is implemented by storing files under the hashes of their
 content. This procedure is done completely inside fileserver. Plain files are
-uploaded into fileserver, hashed, saved and the new filename returned back to
+uploaded into fileserver, hashed, saved and the new filename is returned back to
 the uploader.
 SHA1 is used as the hashing function, because it is fast to compute and provides
-better collision safety than MD5 hashing function. Files with the same hash are
+reasonable collision safety for non-cryptographic purposes. Files with the same
-treated as the same, no additional checks for collisions are performed. However,
+hash are treated as the same; no additional checks for collisions are performed.
-it is really unlikely to find one.
+However, a collision is very unlikely in practice. If SHA1 proves insufficient, it is
 possible to change the hash function to something else, because the naming
 strategy is fully contained in the fileserver (special care must be taken to
 maintain backward compatibility).
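
The naming strategy described above can be illustrated with a minimal Python sketch. It is not the fileserver's actual code: the function name `store_deduplicated` and the flat target directory are assumptions made only for this example.

```python
import hashlib
from pathlib import Path


def store_deduplicated(storage_dir: Path, content: bytes) -> str:
    """Save content under the SHA1 hex digest of its bytes and return that name.

    Hypothetical helper for illustration; the real fileserver may differ.
    """
    key = hashlib.sha1(content).hexdigest()
    target = storage_dir / key
    # Files with the same hash are treated as the same file, so an existing
    # entry is reused and no collision check is performed.
    if not target.exists():
        target.write_bytes(content)
    return key
```

Uploading the same content twice returns the same name and leaves a single file on disk. Because callers only ever see the returned name, the hash function could be swapped inside this one helper.
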
 ### Storage structure
 Fileserver stores its data in the following structure:
- `./submissions/<id>/` -- folder that contains files submitted by users 
+- `./submissions/<id>/` -- folder that contains files submitted by users
-  (student's solutions to assignments). `<id>` is an identifier received from 
+  (students' solutions to assignments). `<id>` is an identifier received from
  the REST API.
- `./submission_archives/<id>.zip` -- ZIP archives of all submissions. These are 
+- `./submission_archives/<id>.zip` -- ZIP archives of all submissions. These are
-  created automatically when a submission is uploaded. `<id>` is an identifier 
+  created automatically when a submission is uploaded. `<id>` is an identifier
  of the corresponding submission.
- `./tasks/<subkey>/<key>` -- supplementary task files (e.g. test inputs and
+- `./exercises/<subkey>/<key>` -- supplementary exercise files (e.g. test inputs
-  outputs). `<key>` is a hash of the file content (`sha1` is used) and
+  and outputs). `<key>` is a hash of the file content (`sha1` is used) and
  `<subkey>` is its first letter (this is an attempt to prevent creating a flat
  directory structure).
- `./results/<id>.zip` -- ZIP archive of results for submission with `<id>`
+- `./results/<id>.zip` -- ZIP archives of results for submission with `<id>`
  identifier.
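
As a rough illustration of the layout listed above, the following Python sketch maps identifiers and hashes to paths relative to the storage root. All function names are hypothetical and exist only for this example; the directory names are taken from the list.

```python
from pathlib import PurePosixPath


def submission_dir(submission_id: str) -> PurePosixPath:
    # Folder with the files submitted by a user.
    return PurePosixPath("submissions") / submission_id


def submission_archive_path(submission_id: str) -> PurePosixPath:
    # ZIP archive created when a submission is uploaded.
    return PurePosixPath("submission_archives") / f"{submission_id}.zip"


def exercise_file_path(key: str) -> PurePosixPath:
    # <key> is the SHA1 hash of the content, <subkey> is its first letter.
    subkey = key[0]
    return PurePosixPath("exercises") / subkey / key


def result_archive_path(submission_id: str) -> PurePosixPath:
    # ZIP archive with the results of a submission.
    return PurePosixPath("results") / f"{submission_id}.zip"
```

Using the first letter of the hash as `<subkey>` spreads exercise files over at most 16 subdirectories, since a SHA1 hex digest starts with one of `0`-`9` or `a`-`f`.
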
@@ -2702,52 +2710,54 @@ processing a job.
 ### Capability identification
-There are possibly multiple worker in a ReCodEx instance and each one can run on
+There are possibly multiple worker instances in a ReCodEx installation and each
-different computer or have installed different tools. To identify worker's
+one can run on different hardware, operating system, or have different tools
-hardware capabilities is used concept of **hardware groups**. Every worker
+installed. To identify the hardware capabilities of a worker, we use the concept
-belongs to exactly one group with set of additional properties called
+of **hardware groups**. Each worker belongs to exactly one group that specifies
-**headers**. Together they help the broker to decide which worker is suitable
+the hardware and operating system on which the submitted programs will be run. A
-for processing a job evaluation request. These information are sent to the
+worker also has a set of additional properties called **headers**. Together they
-broker on worker startup.
+help the broker to decide which worker is suitable for processing a job
-
+evaluation request. This information is sent to the broker on worker startup.
-The hardware group is a string identifier used to group worker machines with
+
-similar hardware configuration, for example "i7-4560-quad-ssd". The hardware
+The hardware group is a string identifier of the hardware configuration, for
-groups and headers are configured by the administrator for each worker instance.
+example "i7-4560-quad-ssd-linux", configured by the administrator for each worker
-If this is done correctly, performance measurements of a submission should yield
+instance. If this is done correctly, performance measurements of a submission
-the same results on all computer from the same hardware group. Thanks to this
+should yield the same results on all computers from the same hardware group.
-fact, we can use the same resource limits on every worker in a hardware group.
+Thanks to this fact, we can use the same resource limits on every worker in a
 hardware group.
 The headers are a set of key-value pairs that describe the worker capabilities.
 For example, they can show which runtime environments are installed or whether
-this worker measures time precisely.
+this worker measures time precisely. Headers are also configured manually by an
 administrator.
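
The text does not spell out how the broker matches jobs to workers, so the following Python sketch is only one plausible reading. It assumes that a job names one hardware group and a set of required header values, and that a worker may advertise several values per header. The `Worker` class, the function names, and the header keys `env` and `threads` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Worker:
    name: str
    hwgroup: str  # exactly one hardware group per worker
    headers: Dict[str, Set[str]] = field(default_factory=dict)


def is_suitable(worker: Worker, hwgroup: str, required: Dict[str, str]) -> bool:
    """Return True if the worker is in the group and offers every required header."""
    if worker.hwgroup != hwgroup:
        return False
    return all(
        value in worker.headers.get(key, set())
        for key, value in required.items()
    )


def pick_worker(
    workers: List[Worker], hwgroup: str, required: Dict[str, str]
) -> Optional[Worker]:
    """Return the first suitable worker, or None if there is none."""
    for worker in workers:
        if is_suitable(worker, hwgroup, required):
            return worker
    return None
```

For example, a job requesting `"i7-4560-quad-ssd-linux"` with `{"env": "python3"}` would be given only to a worker in that group whose `env` header contains `python3`.
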
 ### Running student submissions
-Student submissions are executed inside sandboxing environment to prevent damage
+Student submissions are executed in a sandbox environment to prevent them from
-of host system and also to restrict amount of used resources. Now only the
+damaging the host system and also to restrict the amount of used resources.
-Isolate sandbox support is implemented in worker, but there is a possibility of
+Currently, only the Isolate sandbox support is implemented, but it is possible
-easy extending list of supported sandboxes.
+to add support for another sandbox.
 Isolate is executed in separate Linux process created by `fork` and `exec`
 system calls. Communication between processes is performed through unnamed pipe
 with standard input and output descriptors redirection. To prevent Isolate
 failure there is another safety guard -- whole sandbox is killed when it does
 not end in `(time + 300) * 1.2` seconds for `time` as original maximum time
 allowed for the task. This formula works well both for short and long tasks,
 but is not meant to be used unless there is a really big trouble. Isolate
 should always end itself in time, so this additional safety should never be
 used.
-Sandbox in general has to be command line application taking parameters with
+Every sandbox, regardless of the concrete implementation, has to be a command
-arguments, standard input or file. Outputs should be written to file or standard
+line application taking parameters with arguments, standard input or file.
-output. There are no other requirements, worker design is very versatile and can
+Outputs should be written to a file or to the standard output. There are no
-be adapted to different needs.
+other requirements, the design of the worker is very versatile and can be
 adapted to different needs.
 The sandbox part of the worker is the only one which is not portable, so
 conditional compilation is used to include only supported parts of the project.
 Isolate does not work on Windows, so its invocation is also done through
 native Linux system calls (`fork`, `exec`). To disable compilation of
-this part on Windows, guard `#ifndef _WIN32` is used around affected files.
+this part on Windows, the `#ifndef _WIN32` guard is used around affected files.
 Isolate in particular is executed in a separate Linux process created by `fork`
 and `exec` system calls. Communication between processes is performed through an
 unnamed pipe with standard input and output descriptors redirection. To guard
 against an Isolate failure, there is another safety measure -- the whole sandbox is killed when it
 does not end in `(time + 300) * 1.2` seconds where `time` is the original
 maximum time allowed for the task. This formula works well both for short and
 long tasks, but the timeout should never be reached if Isolate works properly --
 it should always end itself in time.
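
The safety timeout can be worked through in a short Python sketch. The worker itself uses `fork` and `exec`; the `subprocess`-based wrapper here is only a stand-in to show where the computed limit would apply, and both function names are assumptions.

```python
import subprocess
from typing import List, Optional


def sandbox_kill_timeout(time_limit: float) -> float:
    """Seconds after which the whole sandbox is killed: (time + 300) * 1.2."""
    return (time_limit + 300) * 1.2


def run_guarded(command: List[str], time_limit: float) -> Optional[int]:
    """Run a command and return its exit code, or None if the guard fired."""
    try:
        completed = subprocess.run(command, timeout=sandbox_kill_timeout(time_limit))
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired.
        return None
    return completed.returncode
```

For a task limited to 2 seconds the guard fires after (2 + 300) * 1.2 = 362.4 seconds, and for a one-hour task after (3600 + 300) * 1.2 = 4680 seconds. The additive term dominates for short tasks and the multiplicative one for long tasks.
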
 ### Directories and files
@@ -2771,27 +2781,28 @@ directory is configurable and can be the same for multiple worker instances.
 ### Judges
 ReCodEx provides a few initial judge programs. They are mostly adopted from
-CodEx system and installed with worker component. Judging programs have to meet
+CodEx and installed automatically with the worker component. Judging programs
-some requirements. Basic ones are inspired by standard `diff` application -- two
+have to meet some requirements. Basic ones are inspired by the standard `diff`
-mandatory positional parameters which have to be the files for comparison and
+application -- two mandatory positional parameters which have to be the files
-exit code reflecting if the results is correct (0) of wrong (1).
+for comparison and an exit code reflecting whether the result is correct (0) or wrong
 (1).
 This interface lacks support for returning additional data by the judges, for
 example similarity of the two files calculated as Levenshtein's edit distance.
 To allow passing these additional values an extended judge interface can be
 implemented:
- Parameters: There are two mandatory positional parameters which has to be
+- Parameters: There are two mandatory positional parameters which have to be
  files for comparison
 - Results:
    - _comparison OK_
-        - exitcode: 0
+	- exitcode: 0
-		- stdout: there is one line with a double value which should be set to
+		- stdout: there is a single line with a double value which
-		  1.0
+		  should be 1.0
    - _comparison BAD_
-        - exitcode: 1
+	- exitcode: 1
-		- stdout: there is one line with a double value which should be
+		- stdout: there is a single line with a double value which
-		  quality percentage of the two given files
+		  should be quality percentage of the judged file
 	- _error during execution_
 		- exitcode: 2
 		- stderr: there should be description of error
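
A judge following the extended interface above could look roughly like the Python sketch below. It is not one of the judges shipped with ReCodEx. The quality value is computed with `difflib.SequenceMatcher.ratio()` from the standard library rather than Levenshtein distance, and reporting it on a 0.0 to 1.0 scale is an assumption; the function names and the usage text are hypothetical as well.

```python
import difflib
import sys
from typing import List, TextIO


def judge(expected_path: str, actual_path: str, stdout: TextIO, stderr: TextIO) -> int:
    """Compare two files and return the exit code of the extended interface."""
    try:
        with open(expected_path, encoding="utf-8") as handle:
            expected = handle.read()
        with open(actual_path, encoding="utf-8") as handle:
            actual = handle.read()
    except (OSError, UnicodeDecodeError) as error:
        # Error during execution: description on stderr, exit code 2.
        stderr.write(f"cannot read input: {error}\n")
        return 2
    if expected == actual:
        # Comparison OK: a single line with 1.0, exit code 0.
        stdout.write("1.0\n")
        return 0
    # Comparison BAD: a single line with the quality value, exit code 1.
    quality = difflib.SequenceMatcher(None, expected, actual).ratio()
    stdout.write(f"{quality}\n")
    return 1


def main(argv: List[str]) -> int:
    """Entry point taking the two mandatory positional parameters."""
    if len(argv) != 3:
        sys.stderr.write("usage: judge <expected> <actual>\n")
        return 2
    return judge(argv[1], argv[2], sys.stdout, sys.stderr)
```

A standalone script would end with `sys.exit(main(sys.argv))` so that the return value becomes the process exit code.
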
@@ -2810,15 +2821,14 @@ comply most of possible use cases.
 ## Monitor
-Monitor is optional part of the ReCodEx solution for reporting progress of job
+Monitor is an optional part of the ReCodEx solution for reporting progress of
-evaluation back to users in the real time. It is written in Python, tested
+job evaluation back to users in real time. It is written in Python; the tested
 versions are 3.4 and 3.5. The following dependencies are used:
 - zmq -- binding to ZeroMQ message framework
 - websockets -- framework for communication over WebSockets
 - asyncio -- library for fast asynchronous operations
 - pyyaml -- parsing YAML configuration files
 - argparse -- parsing command line arguments
 There is just one monitor instance required per broker. Also, monitor has to be
 publicly visible (has to have public IP address or be behind public proxy