Merge branch 'master' of ssh://github.com/ReCodEx/wiki.wiki

Simon Rozsival 8 years ago
commit 24cf916121

<!---
Notes:
* Two-page introduction - what the system should be able to do
* Analysis - what we decide to do, how it could be done, assign priorities
  - it can then be referenced to explain why some things were not finished;
    include advanced features too
  - for features, note that they are planned for future versions - what is
    important and what is not!! This justifies which subset of features to
    keep; the architecture will then be easier to describe
* Explain the architecture in the analysis
* Keep related works as a separate chapter
* Order - requirements -> related works -> analysis
* The interconnection of the components must be understood by the administrator
  and the exercise author - a general chapter in the analysis - the original
  analysis chapter was good, it just mixes in a list of messages or something -
  not everyone cares about that
* After the general introduction - split by potential reader - teacher user,
  then admin user
* Installation documentation aside, as the last part
* User documentation - admin: description of permissions, exercise author: the
  most extensive, script format - but phrase it as a description of what to
  click where, describe the language separately - in the future it will be
  irrelevant, it needs to go much deeper - describe in detail what they do,
  including relative/absolute paths, macros, where the compiler finds libraries
  and headers... - a chapter at the end
* User documentation for the student: explanation
* How an exercise is scored - hard to say where it belongs - somewhere at the
  beginning? But it concerns all roles, the teacher must know how to configure
  it - also mention how to score by time and memory (in the analysis or the
  introduction) - multiple outputs from the judge, interpolation of points by
  memory usage... it is rather outside the user documentation
* Do not write where to click which button
* Tutorials - scenarios, what to do when I want something, sample walkthroughs
* For forms it is best when there is no documentation at all; add labels to the
  form fields instead
* Describe the configs separately somewhere in the documentation - score, yaml -
  reference documentation
* Definitely not a FAQ, more structured
* Installation together at the end
* Programmer documentation - "the fewest readers" - we already have something
  there, it does not need to go into the printed documentation - put a link to
  the wiki into the printed documentation, but something must be in print -
  which language, design decisions - do not put the justification into the
  introductory analysis - write an introduction to the reference documentation -
  "we approached the REST API in this way, it is divided into these groups, ..."
* What the chosen architecture means; it should also tell a user who does not
  know the architecture where the state is kept
* It must be clear from the documentation what the library does and what must
  be done manually - how much work it is - write it more for a user who knows
  the technologies but does not know the libraries
* Have compassion for those who do not know that much - the technologies, the
  architecture and the CodEx system
* Page numbers do not match
* Downloading a ZIP with the outputs of the Backend - sort them into public and
  private, public ones also for the student
-->
# Introduction
Generally, there are many different ways and opinions on how to teach people
corresponds to his/her privileges. There are user groups reflecting the
structure of lectured courses.
A database of exercises (algorithmic problems) is another part of the project.
Each exercise consists of a text describing the problem, an evaluation
configuration (machine-readable instructions on how to evaluate solutions to the
exercise), time and memory limits for all supported runtimes (e.g. programming
languages), a configuration for calculating the final score and a set of inputs
and reference outputs. Exercises are created by instructed privileged users.
Assigning an exercise to a group means choosing one of the available exercises
and specifying additional properties: a deadline (optionally a second deadline),
a maximum number of points, a maximum number of submissions and a list of
supported runtime environments.
Typical use cases for supported user roles are the following:
evaluation process
- view solution results -- which parts succeeded and failed, total number of
acquired points, bonus points
- **supervisor** (similar to CodEx **operator**)
- create exercise -- create description text and evaluation configuration
(for each programming environment), upload testing inputs and outputs
- assign exercise to group -- choose exercise and set deadlines, number of
Incoming jobs are kept in a queue until a free worker picks them. Workers are
capable of sequential evaluation of jobs, one at a time.
The worker obtains the solution and its evaluation configuration, parses it and
starts executing the contained instructions. Each job should have multiple test
cases, which examine invalid inputs, corner values and data of different sizes
to estimate the program complexity. It is crucial to keep the worker computer
secure and stable, so a sandboxed environment is used for dealing with unknown
source code. When the execution is finished, results are saved and the
submitter is notified.
The output of the worker contains data about the evaluation, such as time and
memory spent on running the program for each test input and whether its output
several drawbacks. The main ones are:
test multi-threaded applications as well.
- **instances** -- Different CodEx usage scenarios require separate
  installations (Programming I and II, Java, C#, etc.). This configuration is
  not user friendly (students have to register in each installation separately)
  and burdens administrators with unnecessary work. The CodEx architecture does
  not allow sharing workers between installations, which results in an
  inefficient use of hardware for evaluation.
- **task extensibility** -- There is a need to test and evaluate complicated
programs for classes such as Parallel programming or Compiler principles,
which have a more difficult evaluation chain than simple
In general, CodEx features should be preserved, so only differences are
presented here. For clarity, all the requirements and wishes are grouped by
categories.
### Requirements of The Users
- _group hierarchy_ -- creating an arbitrarily nested tree structure should be
supported to allow keeping related groups together, such as in the example
below. CodEx supported only a flat group structure. A group hierarchy also
allows archiving data from past courses.
```
Summer term 2016
...
```
- _a database of exercises_ -- teachers should be able to filter viewed
exercises according to several criteria, for example supported runtime
environment or author. It should also be possible to link exercises to a group
so that group supervisors do not have to browse hundreds of exercises when
their group only uses five of them
- _advanced exercises_ -- the system should support a more advanced evaluation
  pipeline than the basic compilation/execution/evaluation pipeline of CodEx
- _customizable grading system_ -- teachers need to specify the way of
computation of the final score, which will be awarded to the submissions of
the student depending on their quality
- _marking a solution as accepted_ -- the system should allow marking one
particular solution as accepted (used for grading the assignment) by the
supervisor
- _solution resubmission_ -- teachers should be able to edit the solutions of
  the student and privately resubmit them, optionally saving all results
  (including temporary ones); this feature can be used to quickly fix obvious
  errors in the solution and see if it is otherwise viable
- _localization_ -- all texts (UI and exercises) should be translatable
- _formatted exercise texts_ -- Markdown or another lightweight markup language
should be supported for formatting exercise texts
- _exercise tags_ -- the system should support tagging exercises and searching
  by these tags
- _comments_ -- adding both private and public comments to exercises, tests and
solutions should be supported
- _plagiarism detection_
### Administrative Requirements
- _pluggable user interface_ -- the system should allow using an alternative
user interface, such as a command line client; implementation of such clients
OAuth, should be supported
- _querying SIS_ -- loading user data from the university information system
should be supported
- _sandboxing_ -- there should be more advanced sandboxing which supports
execution of parallel programs and easy integration of different programming
environments and tools; the sandboxed environment should have the least
possible impact on measurement results (most importantly on measured times)
- _heterogeneous worker pool_ -- there must be support for submission evaluation
in multiple programming environments in a single installation to avoid
unacceptable workload for the administrator (maintaining a separate
### Non-functional Requirements
Non-functional requirements are requirements of technical character with no
direct mapping to visible parts of the system. In an ideal world, users should
not notice these features when they work properly, but they would be at least
annoyed if they did not.
- _no installation_ -- the primary user interface of the system must be
accessible on the computers of the users without the need to install any
additional software except for a web browser (which is installed on a vast
majority of personal computers)
- _performance_ -- the system must be ready for at least hundreds of students
and tens of supervisors using it at once
- _automated deployment_ -- all of the components of the system must be easy to
HTTP(S).
![Communication schema](https://github.com/ReCodEx/wiki/raw/master/images/Backend_Connections.png)
### Job Configuration File
As discussed previously in 'Evaluation Unit Executed by ReCodEx', the
evaluation unit has the form of a job which contains small tasks, each
representing one piece of work executed by the worker. This implies that jobs
have to be passed from the frontend to the backend somehow. The best option for
this is some kind of configuration file which represents a particular job. This
configuration file is composed in the frontend and then parsed and executed in
the backend, namely in the worker.
There are many formats which can be used to represent the configuration. The
ones which make sense are:
- *XML* -- a broadly used general markup language which comes with DTD
  definitions that can express and check the structure of an XML file, so the
  structure does not have to be checked within the application. However, XML
  with its tags can sometimes be quite 'chatty' and verbose, which may not be
  desirable, and overall XML with all its features and properties can be a bit
  heavy-weight.
- *JSON* -- a notation which was developed to represent JavaScript objects. As
  such it is quite simple; it can express only key-value structures, arrays and
  primitive values. Structure and hierarchy of data is expressed with braces
  and brackets.
- *INI* -- a very simple configuration format which can represent only
  key-value structures grouped into sections. That is not enough to represent a
  job and its task hierarchy.
- *YAML* -- a format very similar to JSON in its capabilities, with the small
  difference that structure and hierarchy are expressed not with braces but
  with indentation. This makes YAML easily readable by both humans and
  machines.
- *a specific format* -- a newly created format used just for the job
  configuration. The obvious drawback is the non-existence of parsers, which
  would have to be written from scratch.
Given the previous list of different formats we decided to use YAML. There are
existing parsers for most programming languages and it is easy enough to learn
and understand. Another choice which makes sense is JSON, but in the end YAML
seemed to be better.
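To give an idea of what such a configuration may look like, the following is a
minimal sketch of a YAML job configuration. The key names and values here are
illustrative only; the actual format is the one described in the 'Job
configuration' appendix.

```yaml
# Illustrative sketch only -- key names are hypothetical, see the
# 'Job configuration' appendix for the real format.
submission:
    job-id: hello-world-job
tasks:
    - task-id: compile            # one small unit of work
      priority: 1
      cmd:
          bin: /usr/bin/gcc
          args: ["solution.c", "-o", "solution"]
    - task-id: run
      priority: 2
      dependencies: [compile]     # executed only after compilation succeeds
      cmd:
          bin: ./solution
```

The indentation-based hierarchy shown above is exactly what makes YAML a good
fit for expressing a job and its nested tasks.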
#### Configuration File Content
@todo: discuss what should be in configuration: limits, dependencies, priorities... whatever
#### Supplementary Files
An interesting problem arises with supplementary files (e.g., inputs, sample
outputs). There are two approaches which can be considered. Supplementary files
can be downloaded either at the start of the execution or during execution.
If the files are downloaded at the beginning, execution has not really started
at this point, and thus if there are problems with the network, the worker will
find out right away and can abort the execution without executing a single
task. Slight problems can arise if some of the files need to have the same name
(e.g. the solution assumes that the input is `input.txt`); in this scenario the
downloaded files cannot be renamed at the beginning but only during execution,
which is somewhat impractical and not easily noticed by the authors of job
configurations.
The second solution, where files are downloaded on the fly, has quite the
opposite problem: if there are problems with the network, the worker will find
out during execution, possibly when almost the whole execution is done, which
is also not ideal if we care about wasted hardware resources. On the other
hand, with this approach users have quite advanced control of the execution
flow and know exactly which files are available during execution, which is
probably more appealing from the perspective of the users than the first
solution. Based on that, downloading supplementary files using 'fetch' tasks
during execution was chosen and implemented.
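A 'fetch' task then appears in the configuration among the other tasks of the
job. The sketch below is illustrative only; the task id, command name and file
name are made up for the example:

```yaml
# Hypothetical 'fetch' task placed among the other tasks of a job; it
# downloads one supplementary file while the job is already running.
- task-id: fetch-input
  type: inner                 # internal worker command, not sandboxed
  cmd:
      bin: fetch
      args: ["input.txt"]     # the file name is illustrative
```

Because the fetch happens as a regular task, the author of the configuration
controls exactly when each supplementary file becomes available.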
#### Job Variables
Considering the fact that jobs can be executed by workers on different machines
with specific settings, it can be handy to have some kind of mechanism in the
job configuration which hides these particular worker details, most notably the
specific directory structure. For this purpose marks or signs can be used, and
they can have the form of broadly used variables.
Variables in general can be used everywhere where configuration values (not
keys) are expected. This implies that substitution should be done after the job
configuration is parsed, not before. The only usage of variables which was
considered is for directories within the worker, but in the future this might
be subject to change.
The final form of variables is `${...}` where the triple dot is a textual
description. This format was chosen because of the special dollar sign
character, which cannot be used within paths on regular filesystems. The braces
only delimit the textual description of the variable.
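For illustration, a variable may appear in configuration values as sketched
below; the variable name `EVAL_DIR` is hypothetical and stands for whatever
directory names a worker actually provides:

```yaml
# The worker substitutes ${...} variables after parsing the configuration;
# the variable name EVAL_DIR is illustrative only.
cmd:
    bin: ${EVAL_DIR}/solution
stdin: ${EVAL_DIR}/input.txt
```

Since substitution happens only in values, the structure of the configuration
stays fixed while the paths adapt to each worker machine.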
### Broker
The broker is responsible for keeping track of available workers and
kind of in-process messages. The ZeroMQ library which we already use provides
in-process messages that work on the same principles as network communication,
which is convenient and solves problems with thread synchronization.
#### Execution of Jobs
At this point we have a worker with two internal parts, a listening one and an
executing one. The implementation of the first one is quite straightforward and
clear, so let us discuss what should be happening in the execution subsystem.
After the successful arrival of the job from the broker to the listening
thread, the job is immediately redirected to the execution thread. There the
worker has to prepare a new execution environment, then the solution archive
has to be downloaded from the fileserver and extracted. The job configuration
is located within these files and is loaded into internal structures and
executed. After that, the results are uploaded back to the fileserver. These
steps are the basic ones which are really necessary for the whole execution and
have to be executed in this precise order.
#### Job Configuration
The evaluation unit executed by ReCodEx and the job configuration were already
discussed above. The conclusion was that jobs containing small tasks will be
used. The particular format of the actual job configuration can be found in the
'Job configuration' appendix. The implementation of parsing and storing these
data in the worker is then quite straightforward.
The worker has internal structures into which it loads and stores the metadata
given in the configuration. The whole job is mapped to a job metadata structure
and tasks are mapped to either external ones or internal ones (internal
commands have to be defined within the worker); the two kinds differ in whether
they are executed in the sandbox or as internal worker commands.
#### Task Execution Failure
Another division of tasks is by the task-type field in the configuration. This
field can have four values: initiation, execution, evaluation and inner. All of
this was discussed and described above in the evaluation unit analysis. What is
important to the worker is how to behave if the execution of a task of some
particular type fails.
There are two possible situations: execution fails due to a bad user solution,
or due to some internal error. If execution fails on an internal error, the
solution cannot be simply declared as failed. The user should not be punished
for a bad configuration or some network error. This is where task types are
useful.
Initiation, execution and evaluation tasks are usually executing code which was
given by the users who submitted the solution of the exercise. If these kinds
of tasks fail, it is probably connected with a bad user solution and the job
can be evaluated. But if some inner task fails, the solution should be
re-executed, in the best-case scenario on a different worker. That is why if an
inner task fails it is sent back to the broker, which will reassign the job to
another worker. More on this subject should be
searching through this system should be easy. In addition, if the solutions of
users have access only to the evaluation directory, then they do not have
access to unnecessary files, which is better for the overall security of the
whole ReCodEx.
### Sandboxing
There are numerous ways to approach sandboxing on different platforms,
But designing a sandbox only for a specific environment is possible, namely for
C# and .NET. The CLR as a virtual machine and runtime environment has pretty
good security support for restrictions and separation, which is also
transferred to C#. This makes it quite easy to implement a simple sandbox
within C#, but there are no well-known general purpose implementations.
As mentioned in the previous paragraphs, implementing our own solution is out
of the scope of the project. But a C# sandbox is quite a good topic for another
project, for example a term project for a C# course, so it might be written and
integrated in the future.
### Fileserver
for implementation of a website.
There are two basic ways to create a website these days:
- **server-side approach** - the actions of the user are processed on the server
and the HTML code with the results of the action is generated on the server
and sent back to the web browser of the user. The client does not handle any
logic (apart from rendering of the user interface and some basic user
interaction) and is therefore very simple. The server can use the API server
for processing of the actions so the business logic of the server can be very
simple as well. A disadvantage of this approach is that a lot of redundant
data is transferred across the requests although some parts of the content can
be cached (e.g., CSS files). This results in longer loading times of the
website.
- **server-side rendering with asynchronous updates (AJAX)** - a slightly
different approach is to render the page on the server as in the previous case
but then execute the actions of the user asynchronously using the
`XMLHttpRequest` JavaScript functionality, which creates an HTTP request and
transfers only the part of the website which will be updated.
- **client-side approach** - the opposite approach is to transfer the
communication with the API server and the rendering of the HTML completely
from the server directly to the client. The client runs the code (usually
modern web applications.
We examined several frameworks which are commonly used to speed up the
development of a web application. There are several open source options
available with a large number of tools, tutorials, and libraries. From the many
options (Backbone, Ember, Vue, Cycle.js, ...) there are two main frameworks
worth considering:
- **Angular 2** - it is a new framework which was developed by Google. This
framework is very complex and provides the developer with many tools which
sign. In the second column there is a list of assigned exercises with their
deadlines. If you want to quickly get to the page of the group you might want
to use the provided "Show group's detail" button.
### Join Group
To be able to submit solutions you have to be a member of the right group. Each
instance has its own group hierarchy, so you can choose only those within your
clicking on the "See group's page" link followed by the "Join group" link.
in hierarchy and membership cannot be established by students themselves.
Management of students in this type of groups is in the hands of supervisors.
On the group detail page there are multiple interesting things for you. The
first one is a brief overview containing the information describing the group;
there is a list of supervisors and also the hierarchy of the subgroups. The
most important section is the "Student's dashboard" section. This section
contains the list of assignments and the list of fellow students. If the
supervisors of the group allowed students to see the statistics of their fellow
students, then the number of points each of the students has gained so far will
also be shown.
### Start Solving Assignments
In the "Assignments" box on the group detail page there is a list of assigned
exercises which students are supposed to solve. The assignments are displayed
your browser which will be displayed in another dialog window. When the whole
execution is finished then a "See the results" button will appear and you can
look at the results of your solution.
### View Results of Submission
On the results detail page there is a lot of information. Apart from the
assignment description, which is not connected to your results, there is also
the name of the submitter of the solution (a supervisor can submit a solution
on your behalf); further there
available only for group administrators.
On the "Dashboard" page you can find the "Groups you supervise" section. Here
there are boxes representing your groups with the list of students attending
the course and their points. Student names are clickable, redirecting to the
profile of the user where further information about his/her assignments and
solutions can be found. To quickly jump to the page of the group, use the "Show
group's detail" button at the bottom of the matching group box.
### Manage Group
appear in "Groups hierarchy" box at the top of the page.
On the instance details page, there is a box "Licences". On the first line, it
shows whether this instance currently has a valid licence or not. Then, there
are multiple lines with all licences assigned to this instance. Each line
consists of a note, validity status (if it is valid or revoked by a
superadministrator) and the last date of licence validity.
A box "Add new licence" is used for creating new licences. Required fields are
submission:
```
Basically it means that the job _hello-world-job_ needs to be run on workers
that belong to the `group_1` hardware group. Reference files are downloaded
from the default location configured in the API (such as
`http://localhost:9999/exercises`) if not stated explicitly otherwise. The job
execution log will not be saved to the result archive.
Next, the tasks have to be constructed under the _tasks_ section. In this demo
job,
Broker implementation depends on several open-source C and C++ libraries.
YAML format.
- **boost-filesystem** -- Boost filesystem is used for managing logging
directory (creating it if necessary) and parsing filesystem paths from strings
as written in the configuration of the broker. Filesystem operations will be
included in a future release of the C++ standard, so this dependency may be
removed in the future.
- **boost-program_options** -- Boost program options is used for
parsing of command line positional arguments. It is possible to use the POSIX
`getopt` C function, but we decided to use Boost, which provides a nicer API and
The fileserver stores its data in the following structure:
- `./submissions/<id>/` -- folder that contains files submitted by users (the
solutions of students to their assignments). `<id>` is an identifier received
from the REST API.
- `./submission_archives/<id>.zip` -- ZIP archives of all submissions. These are
created automatically when a submission is uploaded. `<id>` is an identifier
of the corresponding submission.
## Worker
The job of the worker is to securely execute a job according to its
configuration and upload the results back for later processing. After receiving
an evaluation request, the worker has to do the following:
- download the archive containing the submitted source files and the configuration file
- download any supplementary files based on the configuration file, such as test
The files are stored in the local filesystem of the worker machine in a
configurable location. The job is not restricted to the specified
directories (tasks can do anything that is allowed by the system), but it is
advised not to write outside them. In addition, sandboxed tasks are usually
restricted to use only a specific (evaluation) directory.
The following directory structure is used for execution. The working directory
of the worker (root of the following paths) is shared for multiple instances on
the same computer.
- `downloads/${WORKER_ID}/${JOB_ID}` -- place to store the downloaded archive
with submitted sources and job configuration
for comparison and an exit code reflecting whether the result is correct (0) or
wrong (1).
This interface lacks support for returning additional data by the judges, for
example the similarity of the two files calculated as the Levenshtein edit
distance.
To allow passing these additional values, an extended judge interface can be
implemented:
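A minimal sketch of what such an extended judge could look like, written here in
Python for brevity (the real judges are standalone binaries; the function names
and the exact output format below are illustrative assumptions, not the actual
interface):

```python
import sys

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def judge(expected_path: str, actual_path: str) -> int:
    """Compare two files; print the extra similarity value on stdout while
    keeping the original exit-code contract (0 = correct, 1 = wrong)."""
    with open(expected_path) as f:
        expected = f.read()
    with open(actual_path) as f:
        actual = f.read()
    distance = levenshtein(expected, actual)
    print(distance)  # additional data exposed by the extended interface
    return 0 if distance == 0 else 1

if __name__ == "__main__":
    sys.exit(judge(sys.argv[1], sys.argv[2]))
```

The exit code stays backward compatible with the basic interface, so a caller
that ignores stdout can still use such a judge unchanged.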
The actually supported formats depend on the packages installed on the target
system, but at least ZIP and TAR.GZ should be available.
- **cppzmq** -- Cppzmq is a simple C++ wrapper for core ZeroMQ C API. It
basically contains only one header file, but its API fits into the object
architecture of the worker.
- **spdlog** -- Spdlog is a small, fast and modern logging library. It is used for
all of the logging, both system and job logs. It is highly customizable and
configurable from the configuration of the worker.
- **yaml-cpp** -- Yaml-cpp is used for parsing and creating text files in YAML
format. That includes the configuration of the worker as well as the
configuration and results of a job.
- **boost-filesystem** -- Boost filesystem is used for multi-platform
manipulation of files and directories. However, these operations will be
included in a future release of the C++ standard, so this dependency may be
removed in the future.
- **boost-program_options** -- Boost program options is used for multi-platform
parsing of command line positional arguments. It is not necessary to use it, as
similar functionality could be implemented by ourselves, but this well-known
command (normally FINISHED) is received; then they are permanently deleted. This
caching mechanism was implemented because early testing showed that the first
couple of messages were missed quite often.
Messages from the queue of the client are sent through the corresponding
WebSocket connection via the main event loop as soon as possible. This approach
with a separate queue per connection is easy to implement and guarantees
reliability and order of message delivery.
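The queue-per-connection idea can be sketched as follows (a simplified Python
illustration, not the actual monitor code; the class and method names are made
up):

```python
from collections import defaultdict, deque

class ChannelQueues:
    """Per-channel FIFO buffers: messages are cached until the WebSocket
    for the channel can send, then drained in order of arrival."""

    def __init__(self):
        self._queues = defaultdict(deque)

    def push(self, channel_id: str, message: bytes) -> None:
        # Buffer the message; FIFO insertion order guarantees that delivery
        # order matches the order in which messages arrived from the broker.
        self._queues[channel_id].append(message)

    def drain(self, channel_id: str):
        # Called from the main event loop when the connection is writable.
        q = self._queues[channel_id]
        while q:
            yield q.popleft()
```

Because each channel has its own queue, a slow client only delays its own
messages and never blocks delivery to other connections.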
## Cleaner
The cleaner component is tightly bound to the worker. It manages the cache
folder of the worker, mainly deleting outdated files. Every cleaner instance
maintains one cache folder, which can be used by multiple workers. This means
that on one server there can be numerous worker instances with the same cache
folder, but there should be only one cleaner instance.
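The core of such a cleanup pass can be sketched in a few lines of Python
(illustrative only; the function name is made up, and judging file age by last
access time is an assumption here, not necessarily what the real cleaner does):

```python
import os
import time

def clean_cache(cache_dir: str, max_age_seconds: float) -> list:
    """Delete regular files in the cache folder that have not been
    accessed for longer than the given threshold; return their names."""
    now = time.time()
    removed = []
    for name in os.listdir(cache_dir):
        path = os.path.join(cache_dir, name)
        # Only plain files are considered; recently accessed ones survive.
        if os.path.isfile(path) and now - os.path.getatime(path) > max_age_seconds:
            os.remove(path)
            removed.append(name)
    return removed
```

Running one such pass periodically (e.g. from a scheduler) is enough, which is
why a single cleaner instance per cache folder suffices.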
The broker acts as the server when communicating with workers. The listening IP
address and port are configurable; the protocol family is TCP. The worker socket
is of the DEALER type, the broker one of the ROUTER type. Because of that, the
very first frame of every (multipart) message from the broker to a worker must
be the socket identity of the worker (which is saved on its **init** command).
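The resulting frame layout can be illustrated without any ZeroMQ code at all; a
multipart message is just an ordered list of byte frames, the first of which
must be the worker's identity (the identity and command values below are
made-up examples, not the actual protocol vocabulary):

```python
def build_broker_message(worker_identity: bytes, command: bytes, *args: bytes) -> list:
    """Assemble the frame list a ROUTER socket would pass to
    send_multipart(): identity first, then the command and its arguments."""
    return [worker_identity, command, *args]

# Hypothetical command addressed to one particular worker:
message = build_broker_message(b"worker_1", b"eval", b"job_id_42")
```

The ROUTER socket consumes the identity frame for routing, so the worker's
DEALER socket receives only the command and argument frames.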
#### Commands from Broker to Worker:
#### Worker Side
Workers communicate with the file server in both directions -- they download
the submissions of the students and then upload evaluation results. Internally,
the worker uses the libcurl C library with a very similar setup. In both cases
it can verify the HTTPS certificate (on Linux against the system certificate
list, on Windows against one downloaded from the cURL website during
installation), supports basic HTTP authentication, offers HTTP/2 with fallback
to HTTP/1.1 and fails on error (returned HTTP status code >= 400). The worker
has a list of credentials for all
available file servers in its config file.
- download file -- standard HTTP GET request to given URL expecting file content
successful upload returns JSON `{ "result": "OK" }` as the body of the returned page.
If not specified otherwise, the `zip` archive format is used. The symbol `/` in
the API description is the root of the file server domain. If the domain is, for
example, `fs.recodex.org` with SSL support, getting an input file for one task
could look like a GET request to
`https://fs.recodex.org/tasks/8b31e12787bdae1b5766ebb8534b0adc10a1c34c`.
### Broker - Monitor Communication
The broker also communicates with the monitor through ZeroMQ over TCP. The
socket type is the same on both sides, ROUTER. The monitor is set to act as the
server in this communication; its IP address and port are configurable in the
configuration file of the monitor. The ZeroMQ socket ID (set on the side of the
monitor) is "recodex-monitor" and must be sent as the first frame of every
multipart message -- see the ZeroMQ ROUTER socket documentation for more info.
Note that the monitor is designed so that it can receive data both from the
broker and the workers. The current architecture prefers the broker to do all the
### Web App - Web API Communication
The provided web application runs as a JavaScript process inside the browser of
the user. It communicates with the REST API on the server through standard HTTP
requests. Documentation of the main REST API is in a separate
[document](https://recodex.github.io/api/) due to its extensiveness. The results
are returned encoded in JSON, which is processed by the web application and
presented to the user in an appropriate way.
<!---