Reorganization of implementation text

master
Teyras 8 years ago
parent 492bbd2ab2
commit 538959f2b0

@@ -2629,22 +2629,30 @@ used.
## Fileserver
The fileserver component provides shared file storage between the frontend and
the backend. It is written in Python 3 using the Flask web framework. The
fileserver stores files in a configurable filesystem directory, provides file
deduplication, and offers HTTP access. To keep the stored data safe, the
fileserver should not be visible from the public internet. Instead, it should
be accessed indirectly through the REST API.
### File deduplication
From our analysis of the requirements, it is certain we need to implement a
means of dealing with duplicate files.
File deduplication is implemented by storing files under the hashes of their
content. This procedure is done completely inside the fileserver. Plain files
are uploaded into the fileserver, hashed, saved, and the new filename is
returned back to the uploader.
SHA1 is used as the hashing function because it is fast to compute and provides
reasonable collision safety for non-cryptographic purposes. Files with the same
hash are treated as the same; no additional checks for collisions are
performed, and finding one is extremely unlikely anyway. If SHA1 proves
insufficient, it is possible to change the hash function to something else,
because the naming strategy is fully contained in the fileserver (although
special care must be taken to maintain backward compatibility).
### Storage structure
@@ -2656,11 +2664,11 @@ Fileserver stores its data in following structure:
- `./submission_archives/<id>.zip` -- ZIP archives of all submissions. These are
created automatically when a submission is uploaded. `<id>` is an identifier
of the corresponding submission.
- `./exercises/<subkey>/<key>` -- supplementary exercise files (e.g. test inputs
and outputs). `<key>` is a hash of the file content (`sha1` is used) and
`<subkey>` is its first letter (this is an attempt to prevent creating a flat
directory structure).
- `./results/<id>.zip` -- ZIP archives of results for the submission with the
  `<id>` identifier.
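
To make the deduplication scheme concrete, here is a minimal sketch in Python;
the function name `store_file` and its interface are illustrative, only the
SHA1 hashing and the `<subkey>/<key>` layout come from the text above:

```python
import hashlib
from pathlib import Path

def store_file(data: bytes, root: Path) -> str:
    """Store `data` under the SHA1 hash of its content and return the hash,
    which also serves as the new file name reported back to the uploader."""
    key = hashlib.sha1(data).hexdigest()
    subkey = key[0]                  # first letter, to avoid a flat directory
    target = root / subkey / key
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():          # duplicate content maps to the same path
        target.write_bytes(data)
    return key
```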
@@ -2702,52 +2710,54 @@ processing a job.
### Capability identification
There are possibly multiple worker instances in a ReCodEx installation and each
one can run on different hardware, under a different operating system, or with
different tools installed. To identify the hardware capabilities of a worker,
we use the concept of **hardware groups**. Each worker belongs to exactly one
group that specifies the hardware and operating system on which the submitted
programs will be run. A worker also has a set of additional properties called
**headers**. Together they help the broker decide which worker is suitable for
processing a job evaluation request. This information is sent to the broker on
worker startup.
The hardware group is a string identifier of the hardware configuration, for
example "i7-4560-quad-ssd-linux", configured by the administrator for each
worker instance. If this is done correctly, performance measurements of a
submission
should yield the same results on all computers from the same hardware group.
Thanks to this fact, we can use the same resource limits on every worker in a
hardware group.
The headers are a set of key-value pairs that describe the worker capabilities.
For example, they can show which runtime environments are installed or whether
this worker measures time precisely. Headers are also configured manually by an
administrator.
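
The matching rule can be sketched as follows; Python is used here for brevity
and the flat key-value header structure is an assumption, so this is not the
actual broker code:

```python
def worker_is_suitable(worker_hwgroup: str, worker_headers: dict,
                       job_hwgroup: str, job_headers: dict) -> bool:
    """A worker can process a job if its hardware group matches the one
    requested by the job and it provides every required header value."""
    if worker_hwgroup != job_hwgroup:
        return False
    return all(worker_headers.get(key) == value
               for key, value in job_headers.items())

# A worker that measures time precisely satisfies a job that requires it:
print(worker_is_suitable("i7-4560-quad-ssd-linux", {"precise-time": "yes"},
                         "i7-4560-quad-ssd-linux", {"precise-time": "yes"}))
```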
### Running student submissions
Student submissions are executed in a sandbox environment to prevent them from
damaging the host system and also to restrict the amount of used resources.
Currently, only support for the Isolate sandbox is implemented, but it is
possible to add support for other sandboxes.
Every sandbox, regardless of the concrete implementation, has to be a command
line application that takes parameters with arguments and reads input from
standard input or from a file. Outputs should be written to a file or to
standard output. There are no other requirements; the design of the worker is
very versatile and can be adapted to different needs.
The sandbox part of the worker is the only one which is not portable, so
conditional compilation is used to include only the supported parts of the
project. Isolate does not work in a Windows environment, so its invocation is
done through native Linux calls (`fork`, `exec`). To disable compilation of
this part on Windows, the `#ifndef _WIN32` guard is used around the affected
files.
Isolate in particular is executed in a separate Linux process created by the
`fork` and `exec` system calls. Communication between the processes is
performed through an unnamed pipe, with redirection of the standard input and
output descriptors. To guard against Isolate failures there is another safety
measure -- the whole sandbox is killed when it does not end in
`(time + 300) * 1.2` seconds, where `time` is the original maximum time allowed
for the task. This formula works well both for short and long tasks, but the
timeout should never be reached if Isolate works properly -- it should always
end by itself in time.
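
As an illustration of this safety guard, here is a rough Python sketch; the
real worker is not written in Python and invokes the sandbox through `fork` and
`exec` directly:

```python
import subprocess

def run_sandboxed(cmd, time):
    """Run a sandbox command, killing it if the safety timeout elapses."""
    hard_limit = (time + 300) * 1.2   # the safety formula described above
    try:
        return subprocess.run(cmd, timeout=hard_limit).returncode
    except subprocess.TimeoutExpired:
        # should never happen if Isolate works properly and ends by itself
        raise RuntimeError("sandbox did not terminate within the safety limit")
```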
### Directories and files
@@ -2771,27 +2781,28 @@ directory is configurable and can be the same for multiple worker instances.
### Judges
ReCodEx provides a few initial judge programs. They are mostly adopted from
CodEx and installed automatically with the worker component. Judging programs
have to meet some requirements. The basic ones are inspired by the standard
`diff` application -- two mandatory positional parameters which have to be the
files for comparison, and an exit code reflecting whether the result is correct
(0) or wrong (1).
This interface lacks support for returning additional data by the judges, for
example the similarity of the two files calculated as the Levenshtein edit
distance. To allow passing these additional values, an extended judge interface
can be implemented:
- Parameters: There are two mandatory positional parameters which have to be
  the files for comparison
- Results:
    - _comparison OK_
        - exitcode: 0
        - stdout: there is a single line with a double value which should be
          1.0
    - _comparison BAD_
        - exitcode: 1
        - stdout: there is a single line with a double value which should be
          the quality percentage of the judged file
    - _error during execution_
        - exitcode: 2
        - stderr: there should be a description of the error
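
A trivial judge conforming to this extended interface could look like the
following sketch; the exact-match comparison stands in for a real judging
algorithm, which would compute a meaningful quality value:

```python
#!/usr/bin/env python3
import sys

def main():
    if len(sys.argv) != 3:
        print("expected two file arguments", file=sys.stderr)
        return 2                       # error during execution
    try:
        with open(sys.argv[1]) as a, open(sys.argv[2]) as b:
            expected, actual = a.read(), b.read()
    except OSError as error:
        print(error, file=sys.stderr)  # description of the error on stderr
        return 2
    if expected == actual:
        print("1.0")                   # comparison OK
        return 0
    print("0.0")                       # quality percentage of the judged file
    return 1                           # comparison BAD

if __name__ == "__main__":
    sys.exit(main())
```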
@@ -2810,15 +2821,14 @@ comply most of possible use cases.
## Monitor
Monitor is an optional part of the ReCodEx solution for reporting progress of
job evaluation back to users in real time. It is written in Python; the tested
versions are 3.4 and 3.5. The following dependencies are used:
- zmq -- binding for the ZeroMQ messaging framework
- websockets -- framework for communication over WebSockets
- asyncio -- library for fast asynchronous operations
- pyyaml -- parsing YAML configuration files
- argparse -- parsing command line arguments
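
A rough sketch of how these dependencies fit together follows; the address,
port, and message handling are illustrative assumptions, not the actual ReCodEx
protocol:

```python
import asyncio
import zmq
import zmq.asyncio
import websockets

async def forward_progress(websocket):
    """Forward progress messages from the broker to one WebSocket client."""
    ctx = zmq.asyncio.Context.instance()
    sub = ctx.socket(zmq.SUB)
    sub.connect("tcp://127.0.0.1:5555")        # illustrative broker address
    sub.setsockopt_string(zmq.SUBSCRIBE, "")
    try:
        while True:
            message = await sub.recv_string()  # progress event from the broker
            await websocket.send(message)      # pushed on to the user's browser
    finally:
        sub.close()

async def main():
    async with websockets.serve(forward_progress, "0.0.0.0", 4567):
        await asyncio.Future()                 # serve forever

asyncio.run(main())
```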
There is just one monitor instance required per broker. Also, the monitor has
to be publicly visible (it has to have a public IP address or be behind a
public proxy
