## Worker

The worker's job is to securely execute submitted assignments and possibly
evaluate results against model solutions provided by the exercise author. After
receiving an evaluation request, the worker has to:

- download the archive containing the submitted source files and the
  configuration file
- download any supplementary files based on the configuration file, such as test
  inputs
- upload the results of the evaluation to the fileserver
- notify the broker that the evaluation finished

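The sequence of steps above can be sketched as follows. All the names here
(`handle_request`, `download`, `evaluate`, `upload`, `notify`) are hypothetical
stand-ins for illustration, not the actual worker API:

```python
# Sketch of the worker's handling of one evaluation request.
# The injected callables are illustrative stand-ins, not real ReCodEx code.

def handle_request(job_url, download, evaluate, upload, notify):
    archive = download(job_url)      # submitted sources + job configuration
    results = evaluate(archive)      # run the job, possibly in a sandbox
    results_url = upload(results)    # push detailed results to the fileserver
    notify(results_url)              # tell the broker the evaluation finished
    return results_url

# Example wiring with trivial stand-ins:
log = []
url = handle_request(
    "http://fileserver/submission.zip",
    download=lambda u: "archive(" + u + ")",
    evaluate=lambda a: "results(" + a + ")",
    upload=lambda r: "http://fileserver/results.zip",
    notify=log.append,
)
```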
### Internal structure

Worker is logically divided into three parts:

- **Listener** -- communicates with the broker through
  [ZeroMQ](http://zeromq.org/). On startup, it introduces itself to the broker.
  Then it receives new jobs, passes them to the evaluator part and sends back
  results and progress reports.
- **Evaluator** -- gets jobs from the listener part, evaluates them (possibly in
  a sandbox) and notifies the other part when the evaluation ends. The evaluator
  also communicates with the fileserver, downloads supplementary files and
  uploads detailed results.
- **Progress callback** -- receives information about the progress of an
  evaluation from the evaluator and forwards it to the broker.

These parts run in separate threads of the same process and communicate through
ZeroMQ in-process sockets. An alternative approach would be using shared memory,
but the exchanged messages are small, so there is no big overhead copying data
between threads. This multi-threaded design allows the worker to keep sending
`ping` messages even when it is processing a job.

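The thread layout can be illustrated with a small sketch. Standard library
queues stand in for the ZeroMQ in-process sockets here, which is an analogy
only -- the real worker uses zmq `inproc://` transports:

```python
import queue
import threading

# Stand-ins for the ZeroMQ in-process sockets connecting the threads.
jobs = queue.Queue()       # listener -> evaluator
progress = queue.Queue()   # evaluator -> progress callback

reports = []

def evaluator():
    # Receives jobs from the listener side and reports progress.
    while True:
        job = jobs.get()
        if job is None:            # shutdown marker
            progress.put(None)
            break
        progress.put(job + " done")

def progress_callback():
    # Forwards progress messages (to the broker, in the real worker).
    while True:
        msg = progress.get()
        if msg is None:
            break
        reports.append(msg)

threads = [threading.Thread(target=evaluator),
           threading.Thread(target=progress_callback)]
for t in threads:
    t.start()

# The listener side hands over a job and then signals shutdown.
jobs.put("job-1")
jobs.put(None)
for t in threads:
    t.join()
```

Because the listener thread only forwards messages, it stays responsive while
the evaluator thread is busy with a job.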
### Capability identification

There are possibly multiple workers in a ReCodEx instance and each one can run
on a different computer or have different tools installed. To identify the
hardware capabilities of a worker, the concept of **hardware groups** is used.
Every worker belongs to exactly one group and has a set of additional properties
called **headers**. Together they help the broker decide which worker is
suitable for processing a job evaluation request. This information is sent to
the broker on worker startup using the `init` command.

The hardware group is a string identifier used to group worker machines with
similar hardware configuration, for example "i7-4560-quad-ssd". The hardware
groups and headers are configured by the administrator for each worker instance.
If this is done correctly, performance measurements of a submission should yield
the same results on all computers from the same hardware group. Thanks to this
fact, we can use the same resource limits on every worker in a hardware group.

The headers are a set of key-value pairs that describe the worker capabilities
-- for example, they can show which runtime environments are installed or
whether this worker measures time precisely.

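A sketch of how the broker could use these properties when assigning a job. The
matching rule shown (exact hardware group match plus a subset check on headers)
is a simplification for illustration, and all the data values are made up:

```python
def find_worker(workers, hw_group, required_headers):
    """Pick the first worker in the requested hardware group whose
    headers satisfy all of the job's requirements."""
    for worker in workers:
        if worker["hwgroup"] != hw_group:
            continue
        if all(worker["headers"].get(k) == v
               for k, v in required_headers.items()):
            return worker["id"]
    return None  # no suitable worker registered

# Illustrative worker registry, as the broker might build it from
# the data received on worker startup:
workers = [
    {"id": "w1", "hwgroup": "i7-4560-quad-ssd",
     "headers": {"env": "c", "threads": "2"}},
    {"id": "w2", "hwgroup": "i7-4560-quad-ssd",
     "headers": {"env": "python", "threads": "4"}},
]
```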
### Running student submissions

Communication between processes is performed through unnamed pipes with standard
input and output descriptor redirection. To prevent an Isolate failure there is
another safety guard -- the whole sandbox is killed when it does not end in
`(time + 300) * 1.2` seconds, where `time` is the original maximum time allowed
for the task. This formula works well both for short and long tasks, but it is
not meant to be used unless there is really big trouble. Isolate should always
end itself in time, so this additional safety should never be used.

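The safety-guard deadline follows directly from the formula above:

```python
def watchdog_timeout(time_limit):
    """Hard kill deadline (in seconds) for the whole sandbox, derived
    from the task's maximum allowed running time."""
    return (time_limit + 300) * 1.2

# Even a 5-second task gets a very generous hard deadline (roughly
# 366 seconds), so this guard only fires when Isolate itself is stuck.
```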
A sandbox in general has to be a command line application taking parameters with
arguments, standard input or files. Outputs should be written to files or to
standard output. There are no other requirements; the worker design is very
versatile and can be adapted to different needs.

The sandbox part of the worker is the only one which is not portable, so
conditional compilation is used to include only supported parts of the project.
Isolate does not work in the Windows environment, so its invocation is done
through native Linux OS calls (`fork`, `exec`). To disable compilation of this
part on Windows, the guard `#ifndef _WIN32` is used around the affected files.

### Runtime environments

ReCodEx is designed to utilize a rather diverse set of workers -- there can be
differences in many aspects, such as the actual hardware running the worker
(which impacts the results of measuring) or the installed compilers,
interpreters and other tools needed for evaluation. To address these two
examples in particular, we assign runtime environments and hardware groups to
exercises.

The purpose of runtime environments is to specify which tools (and often also
the operating system) are required to evaluate a solution of the exercise -- for
example, a C# programming exercise can be evaluated on a Linux worker running
Mono or a Windows worker with the .NET runtime. Such an exercise would be
assigned two runtime environments, `Linux+Mono` and `Windows+.NET` (the
environment names are arbitrary strings configured by the administrator).

A hardware group is a set of workers that run on similar hardware (e.g. a
particular quad-core processor model and an SSD hard drive). Workers are
assigned to these groups by the administrator. If this is done correctly,
performance measurements of a submission should yield the same results. Thanks
to this fact, we can use the same resource limits on every worker in a hardware
group. However, limits can differ between runtime environments -- formally
speaking, limits are a function of three arguments: an assignment, a hardware
group and a runtime environment.

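That three-argument function can be sketched as a plain lookup keyed by the
triple. The concrete keys and limit values below are made up for illustration:

```python
# Limits as a function of (assignment, hardware group, runtime environment).
# All keys and values are illustrative, not real ReCodEx data.
limits = {
    ("hello-world", "i7-4560-quad-ssd", "Linux+Mono"):
        {"time": 1.0, "memory": 65536},
    ("hello-world", "i7-4560-quad-ssd", "Windows+.NET"):
        {"time": 1.5, "memory": 65536},
}

def limits_for(assignment, hw_group, runtime_env):
    """Resolve the resource limits for one evaluation."""
    return limits[(assignment, hw_group, runtime_env)]
```

Note that the two runtime environments of the same assignment may get different
time limits even within one hardware group, exactly as the text describes.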
### Directories and files

Of the files sent back to the fileserver, usually there will be only a YAML
result file and optionally a log; every other file has to be copied there
explicitly from the job.

### Judges

For future extensibility it is critical that judges share a common interface for
invocation and return values.

- exitcode: 2
- stderr: should contain a description of the error

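A sketch of invoking a judge under this interface. The exit code 2 convention
(judge failure, description on stderr) comes from the list above; treating exit
code 0 as "outputs match" is an assumption made only for this illustration:

```python
import subprocess

def run_judge(judge_cmd, expected_path, actual_path):
    """Run a judge over two files and interpret its exit code.

    Exit code 2 means the judge itself failed and stderr carries a
    description of the error. Interpreting 0 as a match is an assumed
    convention for this sketch.
    """
    proc = subprocess.run(judge_cmd + [expected_path, actual_path],
                          capture_output=True, text=True)
    if proc.returncode == 2:
        raise RuntimeError("judge error: " + proc.stderr.strip())
    return proc.returncode == 0
```

Any command line program following the exit code convention can be plugged in
as `judge_cmd`, which is exactly the versatility the shared interface buys.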
### Additional libraries

Worker uses a handful of external C++ libraries: libcurl for fetching files over
HTTP, spdlog for logging, boost for general utilities, yaml-cpp for parsing YAML
configuration and job descriptions, libarchive for working with archives and
cppzmq as the C++ binding of ZeroMQ.

## Monitor

Monitor is an optional part of the ReCodEx solution for reporting progress of
job evaluation back to users in real time. It is written in Python; tested
versions are 3.4 and 3.5. The following dependencies are used:

- zmq -- binding to the ZeroMQ messaging framework
- websockets -- framework for communication over WebSockets
- asyncio -- library for asynchronous operations
- pyyaml -- parsing YAML configuration files
- argparse -- parsing command line arguments

There is just one monitor instance required per broker. Also, the monitor has to
be publicly visible (it must have a public IP address or be behind a public
proxy server) and it also needs a connection to the broker. If the web
application uses HTTPS, it is required to use a proxy for the monitor to provide
encryption over WebSockets. If this is not done, the users' browsers will block
the unencrypted connection and will not show the progress.

### Message flow

There can be numerous instances of workers with the same cache folder, but there
should be only one cleaner instance.

Cleaner is written in the Python 3 programming language, so it works well across
platforms. It uses only the `pyyaml` library for reading the configuration file
and the `argparse` library for processing command line arguments.

It is a simple script which checks the cache folder, possibly deletes old files
and then ends. This means that the cleaner has to be run repeatedly, for example
using cron, a systemd timer or the Windows task scheduler. For proper function
of the cleaner a suitable interval has to be chosen; a 24-hour interval is
recommended and sufficient for the intended usage. The value is set in the
cleaner's configuration file.

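The cleaner's core logic can be sketched in a few lines. The age threshold
parameter and the use of modification time are illustrative choices; the real
cleaner reads its settings from its YAML configuration file:

```python
import os
import time

def clean_cache(cache_dir, max_age_seconds):
    """Delete cache files untouched for longer than the threshold
    and return the names of the removed files."""
    now = time.time()
    removed = []
    for name in os.listdir(cache_dir):
        path = os.path.join(cache_dir, name)
        if os.path.isfile(path) and now - os.path.getmtime(path) > max_age_seconds:
            os.remove(path)
            removed.append(name)
    return removed
```

Run from cron (or a systemd timer), this is the whole job: scan once, delete
what is stale, exit.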
## REST API