## Fileserver

The fileserver component provides shared file storage between the frontend and
the backend. It is written in Python 3 using the Flask web framework. The
fileserver stores files in a configurable filesystem directory, provides file
deduplication and HTTP access. To keep the stored data safe, the fileserver
should not be visible from the public internet. Instead, it should be accessed
indirectly through the REST API.

### File deduplication

From our analysis of the requirements, it is certain that we need to implement
a means of dealing with duplicate files.

File deduplication is implemented by storing files under the hashes of their
content. This procedure is done completely inside the fileserver. Plain files
are uploaded into the fileserver, hashed, saved, and the new filename is
returned back to the uploader.

SHA1 is used as the hashing function, because it is fast to compute and
provides reasonable collision safety for non-cryptographic purposes. Files with
the same hash are treated as identical and no additional checks for collisions
are performed. However, it is really unlikely to find one. If SHA1 proves
insufficient, it is possible to change the hash function to something else,
because the naming strategy is fully contained in the fileserver (special care
must be taken to maintain backward compatibility).
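
The naming scheme can be illustrated with a short Python sketch (the function
name `content_name` is hypothetical, not the actual fileserver code):

```python
import hashlib

def content_name(data: bytes) -> str:
    """Return the storage name of a file: the SHA1 hash of its content."""
    return hashlib.sha1(data).hexdigest()

# Two uploads with identical content map to the same name, so the
# file is physically stored only once.
assert content_name(b"print(42)") == content_name(b"print(42)")
```
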
### Storage structure

The fileserver stores its data in the following structure:

- `./submission_archives/<id>.zip` -- ZIP archives of all submissions. These
  are created automatically when a submission is uploaded. `<id>` is an
  identifier of the corresponding submission.
- `./exercises/<subkey>/<key>` -- supplementary exercise files (e.g. test
  inputs and outputs). `<key>` is a hash of the file content (`sha1` is used)
  and `<subkey>` is its first letter (this is an attempt to prevent creating a
  flat directory structure).
- `./results/<id>.zip` -- ZIP archive of results for the submission with the
  `<id>` identifier.
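
The directory layout for supplementary files can be sketched as follows
(`storage_root` is a hypothetical configuration value, not a name taken from
the actual implementation):

```python
import hashlib
import os.path

def exercise_file_path(storage_root: str, content: bytes) -> str:
    """Build the ./exercises/<subkey>/<key> path for a supplementary file."""
    key = hashlib.sha1(content).hexdigest()
    subkey = key[0]  # first hash character, avoids one huge flat directory
    return os.path.join(storage_root, "exercises", subkey, key)
```

Because the path depends only on the file content, uploading the same file
twice yields the same path and the data is deduplicated for free.
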
### Capability identification

A ReCodEx installation may contain multiple worker instances, and each one can
run on different hardware, a different operating system, or have different
tools installed. To identify the hardware capabilities of a worker, we use the
concept of **hardware groups**. Each worker belongs to exactly one group that
specifies the hardware and operating system on which the submitted programs
will be run. A worker also has a set of additional properties called
**headers**. Together they help the broker to decide which worker is suitable
for processing a job evaluation request. This information is sent to the
broker on worker startup.

The hardware group is a string identifier of the hardware configuration, for
example "i7-4560-quad-ssd-linux", configured by the administrator for each
worker instance. If this is done correctly, performance measurements of a
submission should yield the same results on all computers from the same
hardware group. Thanks to this fact, we can use the same resource limits on
every worker in a hardware group.

The headers are a set of key-value pairs that describe the worker
capabilities. For example, they can show which runtime environments are
installed or whether this worker measures time precisely. Headers are also
configured manually by an administrator.
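
The broker's matching rule can be sketched like this (a simplified Python
model; the actual broker is written in C++ and its matching logic may be more
involved):

```python
def worker_matches(worker_hwgroup, worker_headers, job_hwgroup, job_headers):
    """A worker is suitable for a job when its hardware group equals the
    requested one and every requested header is satisfied."""
    if worker_hwgroup != job_hwgroup:
        return False
    return all(worker_headers.get(key) == value
               for key, value in job_headers.items())

# A worker advertising a C environment and a precise timer matches a job
# that requests any subset of those capabilities.
worker = {"env": "c-gcc-linux", "precise-time": "yes"}
assert worker_matches("i7-4560-quad-ssd-linux", worker,
                      "i7-4560-quad-ssd-linux", {"env": "c-gcc-linux"})
```
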
### Running student submissions

Student submissions are executed in a sandbox environment to prevent them from
damaging the host system and also to restrict the amount of used resources.
Currently, only the Isolate sandbox is supported, but it is possible to add
support for another sandbox.

Every sandbox, regardless of the concrete implementation, has to be a command
line application taking parameters with arguments, standard input or file.
Outputs should be written to a file or to the standard output. There are no
other requirements, the design of the worker is very versatile and can be
adapted to different needs.

The sandbox part of the worker is the only one which is not portable, so
conditional compilation is used to include only the supported parts of the
project. Isolate does not work on Windows, so its invocation is also done
through native Linux system calls (`fork`, `exec`). To disable compilation of
this part on Windows, the `#ifndef _WIN32` guard is used around the affected
files.

Isolate in particular is executed in a separate Linux process created by the
`fork` and `exec` system calls. Communication between the processes is
performed through an unnamed pipe with standard input and output descriptor
redirection. To prevent an Isolate failure from blocking the worker, there is
another safety guard -- the whole sandbox is killed when it does not end in
`(time + 300) * 1.2` seconds, where `time` is the original maximum time
allowed for the task. This formula works well both for short and long tasks,
but the timeout should never be reached if Isolate works properly -- it should
always end itself in time.
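
For illustration, the safety timeout can be computed as follows (a hypothetical
helper, not part of the worker code):

```python
def kill_timeout(time_limit: float) -> float:
    """Hard deadline after which the whole sandbox is killed."""
    return (time_limit + 300) * 1.2

# Even a short task gets a generous hard deadline, so the guard only
# triggers when Isolate itself misbehaves.
assert kill_timeout(5) == 366.0
assert kill_timeout(3600) == 4680.0
```
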
### Directories and files
### Judges

ReCodEx provides a few initial judge programs. They are mostly adopted from
CodEx and installed automatically with the worker component. Judging programs
have to meet some requirements. The basic ones are inspired by the standard
`diff` application -- two mandatory positional parameters which have to be the
files for comparison, and an exit code reflecting whether the result is
correct (0) or wrong (1).

This interface lacks support for returning additional data by the judges, for
example the similarity of the two files calculated as the Levenshtein edit
distance. To allow passing these additional values, an extended judge
interface can be implemented:

- Parameters: There are two mandatory positional parameters which have to be
  the files for comparison
- Results:
    - _comparison OK_
        - exitcode: 0
        - stdout: there is a single line with a double value which should
          be 1.0
    - _comparison BAD_
        - exitcode: 1
        - stdout: there is a single line with a double value which should be
          the quality percentage of the judged file
    - _error during execution_
        - exitcode: 2
        - stderr: there should be a description of the error
## Monitor

Monitor is an optional part of the ReCodEx solution for reporting the progress
of job evaluation back to users in real time. It is written in Python; the
tested versions are 3.4 and 3.5. The following dependencies are used:

- zmq -- bindings for the ZeroMQ messaging framework
- websockets -- framework for communication over WebSockets
- asyncio -- library for fast asynchronous operations
- pyyaml -- parsing of YAML configuration files
- argparse -- parsing of command line arguments

There is just one monitor instance required per broker. Also, the monitor has
to be publicly visible (it has to have a public IP address or be behind a
public proxy