Reorganization of implementation text

master
Teyras 8 years ago
parent 492bbd2ab2
commit 538959f2b0

## Fileserver
The fileserver component provides a shared file storage between the frontend
and the backend. It is written in Python 3 using the Flask web framework. The
fileserver stores files in a configurable filesystem directory and provides
file deduplication and HTTP access. To keep the stored data safe, the
fileserver should not be visible from the public internet. Instead, it should
be accessed indirectly through the REST API.
### File deduplication
From our analysis of the requirements, it is clear that we need a means of
dealing with duplicate files. File deduplication is implemented by storing
files under the hashes of their content. This procedure is done completely
inside the fileserver. Plain files are uploaded into the fileserver, hashed
and saved, and the new filename is returned back to the uploader.

SHA1 is used as the hashing function, because it is fast to compute and
provides reasonable collision safety for non-cryptographic purposes. Files
with the same hash are treated as identical and no additional checks for
collisions are performed; an accidental collision is extremely unlikely. If
SHA1 proves insufficient, it is possible to change the hash function to
something else, because the naming strategy is fully contained in the
fileserver (special care must be taken to maintain backward compatibility).
### Storage structure
The fileserver stores its data in the following structure:

- `./submissions/<id>/` -- a folder that contains the files submitted by users
  (student's solutions to assignments). `<id>` is an identifier received from
  the REST API.
- `./submission_archives/<id>.zip` -- ZIP archives of all submissions. These
  are created automatically when a submission is uploaded. `<id>` is the
  identifier of the corresponding submission.
- `./exercises/<subkey>/<key>` -- supplementary exercise files (e.g. test
  inputs and outputs). `<key>` is a hash of the file content (`sha1` is used)
  and `<subkey>` is its first letter (this is an attempt to prevent creating a
  flat directory structure).
- `./results/<id>.zip` -- a ZIP archive of the results for the submission with
  identifier `<id>`.
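The layout above maps naturally to a few path helpers. The helper names below
are hypothetical, chosen only to illustrate the directory scheme; the real
fileserver builds these paths internally.

```python
from pathlib import Path


def submission_dir(root: Path, submission_id: str) -> Path:
    """Folder holding the files of one submitted solution."""
    return root / "submissions" / submission_id


def submission_archive(root: Path, submission_id: str) -> Path:
    """ZIP archive created automatically when the submission is uploaded."""
    return root / "submission_archives" / f"{submission_id}.zip"


def exercise_file(root: Path, sha1_hex: str) -> Path:
    """Supplementary exercise file, keyed by the SHA1 of its content.

    The first letter of the hash becomes a subdirectory so the exercise
    store does not degenerate into one huge flat directory.
    """
    return root / "exercises" / sha1_hex[0] / sha1_hex


def result_archive(root: Path, submission_id: str) -> Path:
    """ZIP archive with the evaluation results of one submission."""
    return root / "results" / f"{submission_id}.zip"
```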
### Capability identification
There are possibly multiple worker instances in a ReCodEx installation and
each one can run on different hardware, a different operating system, or have
different tools installed. To identify the hardware capabilities of a worker,
we use the concept of **hardware groups**. Each worker belongs to exactly one
group that specifies the hardware and operating system on which the submitted
programs will be run. A worker also has a set of additional properties called
**headers**. Together they help the broker decide which worker is suitable for
processing a job evaluation request. This information is sent to the broker on
worker startup.
The hardware group is a string identifier of the hardware configuration, for
example "i7-4560-quad-ssd-linux", configured by the administrator for each
worker instance. If this is done correctly, performance measurements of a
submission should yield the same results on all computers from the same
hardware group. Thanks to this fact, we can use the same resource limits on
every worker in a hardware group.
The headers are a set of key-value pairs that describe the worker
capabilities. For example, they can show which runtime environments are
installed or whether this worker measures time precisely. Headers are also
configured manually by an administrator.
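The decision the broker makes from this information can be illustrated with a
small sketch. The dictionary shapes and the exact-match rule are our
simplification of the real broker protocol, shown only to make the matching
idea concrete.

```python
def worker_matches(worker: dict, job: dict) -> bool:
    """Broker-side sketch: can this worker process the job request?

    The hardware group must match exactly, and every header required by
    the job must be present in the worker's headers with the same value.
    """
    if worker["hwgroup"] != job["hwgroup"]:
        return False
    return all(worker["headers"].get(key) == value
               for key, value in job["headers"].items())
```

A worker advertising extra headers still matches a job that does not require
them; only the headers the job asks for are checked.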
### Running student submissions
Student submissions are executed in a sandbox environment to prevent them from
damaging the host system and also to restrict the amount of used resources.
Currently, only the Isolate sandbox is supported, but it is possible to add
support for another sandbox.
Every sandbox, regardless of the concrete implementation, has to be a command
line application that takes its input as parameters with arguments, on
standard input, or from a file. Outputs should be written to a file or to the
standard output. There are no other requirements; the design of the worker is
very versatile and can be adapted to different needs.
The sandbox part of the worker is the only one which is not portable, so
conditional compilation is used to include only the supported parts of the
project. Isolate does not work in a Windows environment, so its invocation is
done through native Linux OS calls (`fork`, `exec`). To disable compilation of
this part on Windows, the `#ifndef _WIN32` guard is used around the affected
files.
Isolate in particular is executed in a separate Linux process created by the
`fork` and `exec` system calls. Communication between the processes is
performed through an unnamed pipe with redirection of the standard input and
output descriptors. To prevent a failure of Isolate from blocking the worker,
there is another safety guard -- the whole sandbox is killed when it does not
end in `(time + 300) * 1.2` seconds, where `time` is the original maximum time
allowed for the task. This formula works well both for short and long tasks,
but the timeout should never be reached if Isolate works properly -- it should
always end itself in time.
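The safety timeout can be expressed directly. The formula is taken from the
text above; only the function name is ours.

```python
def kill_timeout(time_limit: float) -> float:
    """Wall-clock limit in seconds after which the whole sandbox is killed.

    `time_limit` is the task's configured maximum run time; the fixed
    300 s cushion and the 20 % margin absorb sandbox startup and cleanup,
    so the guard only ever fires when Isolate itself misbehaves.
    """
    return (time_limit + 300) * 1.2
```

For a task limited to 60 seconds this yields 432 seconds before the worker
forcibly kills the sandbox.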
### Directories and files
### Judges
ReCodEx provides a few initial judge programs. They are mostly adopted from
CodEx and installed automatically with the worker component. Judging programs
have to meet some requirements. The basic ones are inspired by the standard
`diff` application -- two mandatory positional parameters, which have to be
the files to compare, and an exit code reflecting whether the result is
correct (0) or wrong (1).
This interface lacks support for returning additional data by the judges, for
example the similarity of the two files calculated as the Levenshtein edit
distance. To allow passing these additional values, an extended judge
interface can be implemented:
- Parameters: There are two mandatory positional parameters which have to be
  the files for comparison
- Results:
    - _comparison OK_
        - exitcode: 0
        - stdout: there is a single line with a double value which should
          be 1.0
    - _comparison BAD_
        - exitcode: 1
        - stdout: there is a single line with a double value which should
          be the quality percentage of the judged file
    - _error during execution_
        - exitcode: 2
        - stderr: there should be a description of the error
## Monitor
Monitor is an optional part of the ReCodEx solution for reporting the progress
of job evaluation back to users in real time. It is written in Python; the
tested versions are 3.4 and 3.5. The following dependencies are used:

- zmq -- binding to the ZeroMQ message framework
- websockets -- framework for communication over WebSockets
- asyncio -- library for fast asynchronous operations
- pyyaml -- parsing YAML configuration files
- argparse -- parsing command line arguments
There is just one monitor instance required per broker. Also, the monitor has
to be publicly visible (it has to have a public IP address or be behind a
public proxy
