Part of worker implementation

master
Petr Stefan 8 years ago
parent 8eb071f902
commit 6e07c308e2

@@ -2666,19 +2666,9 @@ Fileserver stores its data in following structure:
## Worker
The worker's job is to securely execute a job according to its configuration
and upload the results back for later processing. After receiving an
evaluation request, the worker has to do the following:
- download the archive containing submitted source files and configuration file
- download any supplementary files based on the configuration file, such as test
@@ -2689,37 +2679,17 @@ receiving an evaluation request, worker has to:
- upload the results of the evaluation to the fileserver
- notify broker that the evaluation finished
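The C++ sketch below only makes this control flow explicit; every name in it
is a hypothetical stub, not the worker's real API:

```cpp
#include <string>

// All names below are hypothetical stubs -- they only mirror the
// evaluation sequence from the list above, not the worker's real API.
struct job_request {
	std::string job_id;
	std::string archive_url; // URL of the archive with sources and config
};

void download_and_extract(const job_request &job) { /* fetch submission archive */ }
void fetch_supplementary_files(const job_request &job) { /* test inputs, scripts */ }
bool evaluate_in_sandbox(const job_request &job) { return true; /* run the job */ }
void upload_results(const job_request &job) { /* send results to fileserver */ }
void notify_broker(const job_request &job, bool success) { /* job done message */ }

// One evaluation pass, step by step as listed above.
void process_job(const job_request &job)
{
	download_and_extract(job);
	fetch_supplementary_files(job);
	bool success = evaluate_in_sandbox(job);
	upload_results(job);
	notify_broker(job, success);
}

int main()
{
	process_job({"job-42", "http://fileserver.example/archive/job-42.zip"});
	return 0;
}
```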
### Internal structure
Worker is logically divided into two parts:
- **Listener** -- communicates with the broker through ZeroMQ. On startup, it
  introduces itself to the broker. Then it receives new jobs, passes them to
  the evaluator part and sends back results and progress reports.
- **Evaluator** -- gets jobs from the listener part, evaluates them (possibly
  in a sandbox) and notifies the other part when the evaluation ends. Evaluator
  also communicates with the fileserver, downloads supplementary files and
  uploads detailed results.
These parts run in separate threads of the same process and communicate through
ZeroMQ in-process sockets. An alternative approach would be using shared memory
@@ -2730,19 +2700,26 @@ there is no big overhead copying data between threads. This multi-threaded
design allows the worker to keep sending `ping` messages even when it is
processing a job.
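A minimal cppzmq sketch of this pattern -- two threads of one process joined
by an in-process PAIR socket -- follows; the endpoint name and messages are
illustrative, not the worker's actual ones:

```cpp
#include <zmq.hpp>

#include <iostream>
#include <string>
#include <thread>

int main()
{
	// Threads must share one context; inproc endpoints exist only inside it.
	zmq::context_t ctx{1};

	// Listener side: bind first so the other thread can connect safely.
	zmq::socket_t listener(ctx, zmq::socket_type::pair);
	listener.bind("inproc://jobs"); // endpoint name is illustrative

	std::thread evaluator([&ctx] {
		zmq::socket_t jobs(ctx, zmq::socket_type::pair);
		jobs.connect("inproc://jobs");
		zmq::message_t job;
		(void) jobs.recv(job, zmq::recv_flags::none);
		std::cout << "evaluating " << job.to_string() << std::endl;
		std::string result = "done";
		(void) jobs.send(zmq::buffer(result), zmq::send_flags::none);
	});

	// Hand a job over and wait for the result; nothing is shared between
	// the threads beyond the messages themselves.
	std::string job = "job-42";
	(void) listener.send(zmq::buffer(job), zmq::send_flags::none);
	zmq::message_t result;
	(void) listener.recv(result, zmq::recv_flags::none);
	std::cout << "result: " << result.to_string() << std::endl;

	evaluator.join();
	return 0;
}
```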
### Capability identification
There are possibly multiple workers in a ReCodEx instance and each one can run
on a different computer or have different tools installed. The concept of
**hardware groups** is used to identify the hardware capabilities of a worker.
Every worker belongs to exactly one group and has a set of additional
properties called **headers**. Together they help the broker decide which
worker is suitable for processing a job evaluation request. This information
is sent to the broker on worker startup.
The hardware group is a string identifier used to group worker machines with
similar hardware configuration, for example "i7-4560-quad-ssd". The hardware
groups and headers are configured by the administrator for each worker
instance. If this is done correctly, performance measurements of a submission
should yield the same results on all computers from the same hardware group.
Thanks to this fact, we can use the same resource limits on every worker in a
hardware group.
The headers are a set of key-value pairs that describe the worker capabilities.
For example, they can show which runtime environments are installed or whether
this worker measures time precisely.
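For illustration, such a setup could look roughly like this in a worker's YAML
configuration; the key names are an assumption for this sketch, not the
authoritative schema:

```yaml
# Illustrative sketch only -- consult the real worker configuration schema.
hwgroup: "i7-4560-quad-ssd"
headers:
    env:
        - c
        - cpp
    threads: 2
```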
### Running student submissions
@@ -2756,37 +2733,21 @@ system calls. Communication between processes is performed through unnamed pipe
with standard input and output descriptors redirection. To prevent Isolate
failure there is another safety guard -- the whole sandbox is killed when it
does not end in `(time + 300) * 1.2` seconds, where `time` is the original
maximum time allowed for the task. For example, a task with a 2 second limit
would be forcibly killed after `(2 + 300) * 1.2 = 362.4` seconds. This formula
works well both for short and long tasks, but the margin is not meant to be
consumed unless something goes really wrong. Isolate should always end itself
in time, so this additional safety should never be used.
A sandbox in general has to be a command line application taking parameters
with arguments, standard input or a file. Outputs should be written to a file
or standard output. There are no other requirements; the worker design is very
versatile and can be adapted to different needs.
The sandbox part of the worker is the only one which is not portable, so
conditional compilation is used to include only supported parts of the project.
Isolate does not work in a Windows environment, so its invocation is done
through native Linux OS calls (`fork`, `exec`). To disable compilation of this
part on Windows, the `#ifndef _WIN32` guard is used around the affected files.
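A condensed sketch of this pattern (simplified, not the worker's actual code)
could look like this:

```cpp
#ifndef _WIN32 // the whole sandboxing module is compiled on Linux only

#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Simplified sketch: spawn a sandbox binary and wait for it to finish.
// The real worker passes many more arguments and redirects descriptors
// through pipes, as described above.
int run_sandboxed(char *const argv[])
{
	pid_t pid = fork();
	if (pid < 0) { return -1; } // fork failed
	if (pid == 0) {
		execvp(argv[0], argv); // child: replace image with the sandbox
		_exit(127);            // reached only when exec itself failed
	}
	int status = 0;
	waitpid(pid, &status, 0); // parent: wait for the sandbox to end
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

int main()
{
	char *const argv[] = {const_cast<char *>("isolate"),
		const_cast<char *>("--version"), nullptr};
	return run_sandboxed(argv);
}

#endif // _WIN32
```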
### Directories and files
@@ -2807,7 +2768,7 @@ directory is configurable and can be the same for multiple worker instances.
fileserver, usually there will be only the yaml result file and optionally a
log; every other file has to be copied here explicitly from the job
### Judges
For future extensibility it is critical that judges have a shared interface of
calling conventions and return values.
@@ -2826,16 +2787,28 @@ calling and return values.
- exitcode: 2
- stderr: there should be a description of the error
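For illustration, a trivial judge honouring this interface could be as simple
as the sketch below. The byte-for-byte comparison and the meaning of exit
codes 0 (outputs match) and 1 (outputs differ) are assumptions here; only the
error code 2 with a stderr message is quoted above.

```cpp
#include <fstream>
#include <iostream>
#include <iterator>
#include <string>

// Sketch of a judge: exit code 0 = outputs match, 1 = outputs differ
// (both assumed), 2 = error with a description on stderr (quoted above).
int main(int argc, char *argv[])
{
	if (argc != 3) {
		std::cerr << "usage: judge <expected output> <actual output>" << std::endl;
		return 2;
	}
	std::ifstream expected(argv[1]), actual(argv[2]);
	if (!expected || !actual) {
		std::cerr << "cannot open input files" << std::endl;
		return 2;
	}
	std::string a((std::istreambuf_iterator<char>(expected)),
		std::istreambuf_iterator<char>());
	std::string b((std::istreambuf_iterator<char>(actual)),
		std::istreambuf_iterator<char>());
	return a == b ? 0 : 1;
}
```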
## Monitor
Monitor is an optional part of the ReCodEx solution for reporting progress of
job evaluation back to users in real time. It is written in Python; tested
versions are 3.4 and 3.5. The following dependencies are used:
- zmq -- binding to ZeroMQ message framework
- websockets -- framework for communication over WebSockets
- asyncio -- library for fast asynchronous operations
- pyyaml -- parsing YAML configuration files
- argparse -- parsing command line arguments
There is just one monitor instance required per broker. Also, the monitor has
to be publicly visible (it has to have a public IP address or be behind a
public proxy server) and it also needs a connection to the broker. If the web
application is using HTTPS, it is required to use a proxy for the monitor to
provide encryption over WebSockets. Otherwise, the users' browsers will block
the unencrypted connection and will not show the progress to the users.
### Message flow
@@ -2886,13 +2859,15 @@ there can be numerous instances of workers with the same cache folder, but there
should be only one cleaner instance.
Cleaner is written in the Python 3 programming language, so it works well
multi-platform. It uses only the `pyyaml` library for reading the
configuration file and the `argparse` library for processing command line
arguments.

It is a simple script which checks the cache folder, possibly deletes old
files and then ends. This means that the cleaner has to be run repeatedly, for
example using cron, a systemd timer or the Windows task scheduler. For proper
function of the cleaner a suitable scheduling interval has to be used. A 24
hour interval is recommended and sufficient for the intended usage. The value
is set in the cleaner's configuration file.
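For example, a crontab entry running the cleaner once a day could look like
the following line; the binary name, option and configuration path are
illustrative, not the actual packaged defaults:

```
0 3 * * * recodex-cleaner -c /etc/recodex/cleaner/config.yml
```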
## REST API
