## Worker

The worker's job is to securely execute submitted assignments and possibly
evaluate results against model solutions provided by the exercise author. After
receiving an evaluation request, the worker has to:

- download the archive containing the submitted source files and the
  configuration file
- download any supplementary files based on the configuration file, such as test
  inputs
- upload the results of the evaluation to the fileserver
- notify the broker that the evaluation finished

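The sequence of steps above can be sketched as follows. All the names here
(`handle_request`, `download`, `evaluate`, `upload`, `notify`) are hypothetical
stand-ins for illustration, not the actual worker API:

```python
# Sketch of the worker's handling of one evaluation request.
# The injected callables are illustrative stand-ins, not real ReCodEx code.

def handle_request(job_url, download, evaluate, upload, notify):
    archive = download(job_url)      # submitted sources + job configuration
    results = evaluate(archive)      # run the job, possibly in a sandbox
    results_url = upload(results)    # push detailed results to the fileserver
    notify(results_url)              # tell the broker the evaluation finished
    return results_url

# Example wiring with trivial stand-ins:
log = []
url = handle_request(
    "http://fileserver/submission.zip",
    download=lambda u: "archive(" + u + ")",
    evaluate=lambda a: "results(" + a + ")",
    upload=lambda r: "http://fileserver/results.zip",
    notify=log.append,
)
```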
### Internal structure

Worker is logically divided into three parts:

- **Listener** -- communicates with the broker through
  [ZeroMQ](http://zeromq.org/). On startup, it introduces itself to the broker.
  Then it receives new jobs, passes them to the evaluator part and sends back
  results and progress reports.
- **Evaluator** -- gets jobs from the listener part, evaluates them (possibly in
  a sandbox) and notifies the other part when the evaluation ends. The evaluator
  also communicates with the fileserver, downloads supplementary files and
  uploads detailed results.
- **Progress callback** -- receives information about the progress of an
  evaluation from the evaluator and forwards it to the broker.

These parts run in separate threads of the same process and communicate through
ZeroMQ in-process sockets. An alternative approach would be using shared memory,
but the exchanged messages are small, so there is no big overhead copying data
between threads. This multi-threaded design allows the worker to keep sending
`ping` messages even when it is processing a job.

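The thread layout can be illustrated with a small sketch. Standard library
queues stand in for the ZeroMQ in-process sockets here, which is an analogy
only -- the real worker uses zmq `inproc://` transports:

```python
import queue
import threading

# Stand-ins for the ZeroMQ in-process sockets connecting the threads.
jobs = queue.Queue()       # listener -> evaluator
progress = queue.Queue()   # evaluator -> progress callback

reports = []

def evaluator():
    # Receives jobs from the listener side and reports progress.
    while True:
        job = jobs.get()
        if job is None:            # shutdown marker
            progress.put(None)
            break
        progress.put(job + " done")

def progress_callback():
    # Forwards progress messages (to the broker, in the real worker).
    while True:
        msg = progress.get()
        if msg is None:
            break
        reports.append(msg)

threads = [threading.Thread(target=evaluator),
           threading.Thread(target=progress_callback)]
for t in threads:
    t.start()

# The listener side hands over a job and then signals shutdown.
jobs.put("job-1")
jobs.put(None)
for t in threads:
    t.join()
```

Because the listener thread only forwards messages, it stays responsive while
the evaluator thread is busy with a job.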
### Capability identification

There are possibly multiple workers in a ReCodEx instance and each one can run
on a different computer or have different tools installed. To identify the
hardware capabilities of a worker, the concept of **hardware groups** is used.
Every worker belongs to exactly one group and has a set of additional properties
called **headers**. Together they help the broker decide which worker is
suitable for processing a job evaluation request. This information is sent to
the broker on worker startup using the `init` command.

The hardware group is a string identifier used to group worker machines with
similar hardware configuration, for example "i7-4560-quad-ssd". The hardware
groups and headers are configured by the administrator for each worker instance.
If this is done correctly, performance measurements of a submission should yield
the same results on all computers from the same hardware group. Thanks to this
fact, we can use the same resource limits on every worker in a hardware group.

The headers are a set of key-value pairs that describe the worker capabilities
-- for example, they can show which runtime environments are installed or
whether this worker measures time precisely.

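A sketch of how the broker could use these properties when assigning a job. The
matching rule shown (exact hardware group match plus a subset check on headers)
is a simplification for illustration, and all the data values are made up:

```python
def find_worker(workers, hw_group, required_headers):
    """Pick the first worker in the requested hardware group whose
    headers satisfy all of the job's requirements."""
    for worker in workers:
        if worker["hwgroup"] != hw_group:
            continue
        if all(worker["headers"].get(k) == v
               for k, v in required_headers.items()):
            return worker["id"]
    return None  # no suitable worker registered

# Illustrative worker registry, as the broker might build it from
# the data received on worker startup:
workers = [
    {"id": "w1", "hwgroup": "i7-4560-quad-ssd",
     "headers": {"env": "c", "threads": "2"}},
    {"id": "w2", "hwgroup": "i7-4560-quad-ssd",
     "headers": {"env": "python", "threads": "4"}},
]
```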
### Running student submissions

Communication between processes is performed through unnamed pipes with standard
input and output descriptor redirection. To prevent an Isolate failure there is
another safety guard -- the whole sandbox is killed when it does not end in
`(time + 300) * 1.2` seconds, where `time` is the original maximum time allowed
for the task. This formula works well both for short and long tasks, but it is
not meant to be used unless there is really big trouble. Isolate should always
end itself in time, so this additional safety should never be used.

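The safety-guard deadline follows directly from the formula above:

```python
def watchdog_timeout(time_limit):
    """Hard kill deadline (in seconds) for the whole sandbox, derived
    from the task's maximum allowed running time."""
    return (time_limit + 300) * 1.2

# Even a 5-second task gets a very generous hard deadline (roughly
# 366 seconds), so this guard only fires when Isolate itself is stuck.
```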
A sandbox in general has to be a command line application taking parameters with
arguments, standard input or files. Outputs should be written to files or to
standard output. There are no other requirements; the worker design is very
versatile and can be adapted to different needs.

The sandbox part of the worker is the only one which is not portable, so
conditional compilation is used to include only supported parts of the project.
Isolate does not work in the Windows environment, so its invocation is done
through native Linux OS calls (`fork`, `exec`). To disable compilation of this
part on Windows, the guard `#ifndef _WIN32` is used around the affected files.

### Runtime environments

ReCodEx is designed to utilize a rather diverse set of workers -- there can be
differences in many aspects, such as the actual hardware running the worker
(which impacts the results of measuring) or the installed compilers,
interpreters and other tools needed for evaluation. To address these two
examples in particular, we assign runtime environments and hardware groups to
exercises.

The purpose of runtime environments is to specify which tools (and often also
the operating system) are required to evaluate a solution of the exercise -- for
example, a C# programming exercise can be evaluated on a Linux worker running
Mono or a Windows worker with the .NET runtime. Such an exercise would be
assigned two runtime environments, `Linux+Mono` and `Windows+.NET` (the
environment names are arbitrary strings configured by the administrator).

A hardware group is a set of workers that run on similar hardware (e.g. a
particular quad-core processor model and an SSD hard drive). Workers are
assigned to these groups by the administrator. If this is done correctly,
performance measurements of a submission should yield the same results. Thanks
to this fact, we can use the same resource limits on every worker in a hardware
group. However, limits can differ between runtime environments -- formally
speaking, limits are a function of three arguments: an assignment, a hardware
group and a runtime environment.

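That three-argument function can be sketched as a plain lookup keyed by the
triple. The concrete keys and limit values below are made up for illustration:

```python
# Limits as a function of (assignment, hardware group, runtime environment).
# All keys and values are illustrative, not real ReCodEx data.
limits = {
    ("hello-world", "i7-4560-quad-ssd", "Linux+Mono"):
        {"time": 1.0, "memory": 65536},
    ("hello-world", "i7-4560-quad-ssd", "Windows+.NET"):
        {"time": 1.5, "memory": 65536},
}

def limits_for(assignment, hw_group, runtime_env):
    """Resolve the resource limits for one evaluation."""
    return limits[(assignment, hw_group, runtime_env)]
```

Note that the two runtime environments of the same assignment may get different
time limits even within one hardware group, exactly as the text describes.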
### Directories and files

Of the files sent back to the fileserver, usually there will be only a YAML
result file and optionally a log; every other file has to be copied there
explicitly from the job.

### Judges

For future extensibility it is critical that judges share a common interface for
invocation and return values.

- exitcode: 2
- stderr: should contain a description of the error

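A sketch of invoking a judge under this interface. The exit code 2 convention
(judge failure, description on stderr) comes from the list above; treating exit
code 0 as "outputs match" is an assumption made only for this illustration:

```python
import subprocess

def run_judge(judge_cmd, expected_path, actual_path):
    """Run a judge over two files and interpret its exit code.

    Exit code 2 means the judge itself failed and stderr carries a
    description of the error. Interpreting 0 as a match is an assumed
    convention for this sketch.
    """
    proc = subprocess.run(judge_cmd + [expected_path, actual_path],
                          capture_output=True, text=True)
    if proc.returncode == 2:
        raise RuntimeError("judge error: " + proc.stderr.strip())
    return proc.returncode == 0
```

Any command line program following the exit code convention can be plugged in
as `judge_cmd`, which is exactly the versatility the shared interface buys.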
### Additional libraries

Worker uses a handful of external C++ libraries: libcurl for fetching files over
HTTP, spdlog for logging, boost for general utilities, yaml-cpp for parsing YAML
configuration and job descriptions, libarchive for working with archives and
cppzmq as the C++ binding of ZeroMQ.

## Monitor

Monitor is an optional part of the ReCodEx solution for reporting progress of
job evaluation back to users in real time. It is written in Python; tested
versions are 3.4 and 3.5. The following dependencies are used:

- zmq -- binding to the ZeroMQ messaging framework
- websockets -- framework for communication over WebSockets
- asyncio -- library for asynchronous operations
- pyyaml -- parsing YAML configuration files
- argparse -- parsing command line arguments

There is just one monitor instance required per broker. Also, the monitor has to
be publicly visible (it must have a public IP address or be behind a public
proxy server) and it also needs a connection to the broker. If the web
application uses HTTPS, it is required to use a proxy for the monitor to provide
encryption over WebSockets. If this is not done, the users' browsers will block
the unencrypted connection and will not show the progress.

### Message flow

There can be numerous instances of workers with the same cache folder, but there
should be only one cleaner instance.

Cleaner is written in the Python 3 programming language, so it works well across
platforms. It uses only the `pyyaml` library for reading the configuration file
and the `argparse` library for processing command line arguments.

It is a simple script which checks the cache folder, possibly deletes old files
and then ends. This means that the cleaner has to be run repeatedly, for example
using cron, a systemd timer or the Windows task scheduler. For proper function
of the cleaner a suitable interval has to be chosen; a 24-hour interval is
recommended and sufficient for the intended usage. The value is set in the
cleaner's configuration file.

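The cleaner's core logic can be sketched in a few lines. The age threshold
parameter and the use of modification time are illustrative choices; the real
cleaner reads its settings from its YAML configuration file:

```python
import os
import time

def clean_cache(cache_dir, max_age_seconds):
    """Delete cache files untouched for longer than the threshold
    and return the names of the removed files."""
    now = time.time()
    removed = []
    for name in os.listdir(cache_dir):
        path = os.path.join(cache_dir, name)
        if os.path.isfile(path) and now - os.path.getmtime(path) > max_age_seconds:
            os.remove(path)
            removed.append(name)
    return removed
```

Run from cron (or a systemd timer), this is the whole job: scan once, delete
what is stale, exit.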
## REST API