master
Petr Stefan 8 years ago
parent c906ed83af
commit 91fa5d0af1

@ -69,7 +69,7 @@ logical mistakes is really hard to automate and requires manpower.
Checking programs written by students takes a lot of time and requires a lot of
mechanical, repetitive work. The first idea of an automatic evaluation system
comes from Stanford University profesors in 1965. They implemented a system
comes from Stanford University professors in 1965. They implemented a system
which evaluated code in Algol submitted on punch cards. In following years, many
similar products were written.
@ -101,7 +101,7 @@ that following four basic steps have to be supported:
1. compile the code and check for compilation errors
2. run compiled binary in a sandbox with predefined inputs
3. check constraints on used amount of memory and time
4. compare program outpus with predefined values
4. compare program outputs with predefined values
The project has a great starting point -- there is an old grading system
currently used at the university (CodEx), so its flaws and weaknesses can be
@ -141,14 +141,14 @@ which is a member.
Database of exercises (algorithmic problems) is another part of the project.
Each exercise consists of a text in multiple language variants, an evaluation
configuration and a set of inputs and reference outputs. Exercises are created
by instructed priviledged users. Assigning an exercise to a group means to
by instructed privileged users. Assigning an exercise to a group means to
choose one of the available exercises and specify additional properties. An
assignment has a deadline (optionally a second deadline), a maximum amount of
points, a configuration for calculating the final score, a maximum number of
submissions, and a list of supported runtime environemnts (e.g., programming
submissions, and a list of supported runtime environments (e.g., programming
languages) including specific time and memory limits for the sandboxed tasks.
Typical use cases for supported user roles are ilustrated on following UML
Typical use cases for supported user roles are illustrated in the following UML
diagram:
![System use case diagram](https://github.com/ReCodEx/wiki/raw/master/images/System_use_case.png)
@ -199,7 +199,7 @@ came from administrators and supervisors. The ideas were gathered mostly from
our personal experience with the system and from meetings with faculty staff
involved with the current system.
For clear arragement all the requirements and wishes are presented grouped by
For clear arrangement all the requirements and wishes are presented grouped by
categories.
### System features
@ -228,7 +228,7 @@ They describe the evaluation system in general and also university addons
reviewed, commented and assigned additional points (positive or negative)
- one particular solution can be marked as accepted (used for grading this
assignment)
- teacher can edit student solution and privately resubmit it; optionaly saving
- teacher can edit student solution and privately resubmit it; optionally saving
all results (including temporary ones)
- localization of all texts (UI and exercises)
- Markdown support for creating exercise texts
@ -242,7 +242,7 @@ They describe the evaluation system in general and also university addons
mainly for viewing assigned exercises, uploading their own solutions to the
assignments, and viewing the results of the solutions after an automatic
evaluation is finished; the two desired interfaces are web and command-line based
- user priviledge separation (at least two roles -- _student_ and _supervisor_)
- user privilege separation (at least two roles -- _student_ and _supervisor_)
- logging in through a university authentication system (e.g. LDAP)
- SIS (university information system) integration for fetching personal user
data
@ -264,7 +264,7 @@ met. Most notably they are these ones:
- user interface of the system accessible on users' computers without
installation of any kind of additional software
- easy implementation of different user interfaces
- be ready for workload hundreads of students and tens of supervisors
- be ready for a workload of hundreds of students and tens of supervisors
- automated installation of all components
@todo: fill some nonfunctional requirements;
@ -303,7 +303,7 @@ for adapting it for many different subjects.
CodEx is based on dynamic analysis. It features a web-based interface, where
supervisors can assign exercises to their students and the students have a time
window to submit their solutions. Each solution is compiled and run in sandbox
(MO-Eval). The metrics which are checked are: corectness of the output, time
(MO-Eval). The metrics which are checked are: correctness of the output, time
and memory limits. It supports programs written in C, C++, C#, Java, Pascal,
Python and Haskell.
@ -378,7 +378,7 @@ the system is generally obsolete.
and functional web UI, but the rest of the application is too simple. A nice
feature is the usage of a [standardized
format](http://www.problemarchive.org/wiki/index.php/Problem_Format) for
exercises. Kattis is primarily used by programming contest organizators, company
exercises. Kattis is primarily used by programming contest organizers, company
recruiters and also some universities.
@ -386,7 +386,7 @@ recruiters and also some universities.
## ReCodEx goals
@todo: improve and extend this chapter - analysis of user requrements and way we
@todo: improve and extend this chapter - analysis of user requirements and way we
solve them; exercise is a template for assignment, users are in groups, what is
group, how points are assigned for solutions, ...
@ -446,8 +446,9 @@ notable features are following:
- which problems are they? ... these ones below:
- what type of users there should be, why they are needed
- explain why there is exercise and assignment division, what means what and how they are used
- explain instances why they are usefull what they solve and also discuss licences concept
- groups, they can be public and private and why is that, what it solves, explain amd discuss treshold and other group features
- explain instances why they are useful what they solve and also discuss licenses concept
- groups, they can be public and private and why is that, what it solves,
explain and discuss threshold and other group features
- extended execution pipeline (not just compilation/execution/evaluation) and why it is needed
- progress state, how it can be done and displayed to user, why random messages
- how to display generally all outputs of executed programs to user (supervisor, student), what students can or cannot see and why
@ -500,7 +501,7 @@ which will execute jobs and component which will distribute jobs to the
instances of the first one. This ensures scalability in manner of parallel
execution of numerous jobs which is exactly what is needed. Implementation of
these services are called **broker** and **worker**, first one handles
distribution, latter execution. These components should be enough to fulfil all
distribution, latter execution. These components should be enough to fulfill all
above said, but for the sake of simplicity and better communication gateways
with frontend two other components were added, **fileserver** and **monitor**.
Fileserver is simple component whose purpose is to store files which are
@ -556,7 +557,7 @@ protocol between these two logical parts will be described as well.
One of the bigger requests for the new system is to support a complex
configuration of execution pipeline. The idea comes from lecturers of Compiler
principles class who want to migrate their semi-manual evaluation process to
CodEx. Unfortunately, CodEx is not capable of such compilicated exercise setup.
CodEx. Unfortunately, CodEx is not capable of such complicated exercise setup.
None of the evaluation systems we found can handle such a task, so a design from
scratch is needed.
@ -578,18 +579,18 @@ systems it is better to implement reasonable subset of operations directly
without calling system provided binaries. These operations are copy file, create
new directory, extract archive and so on, altogether called internal tasks.
Another benefit from custom implementation of these tasks is guaranteed safety,
so no sandbox needs to be used as in exernal tasks case.
so no sandbox needs to be used as in external tasks case.
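To make the idea of internal tasks more concrete, here is a minimal sketch (in
Python for brevity, although the worker itself is written in C++; the operation
names are made up) of dispatching such operations to plain library calls
instead of external binaries:

```python
import os
import shutil
import zipfile

# map of internal task names to plain library calls; no sandbox and no
# external binaries are involved (the operation names are made up)
INTERNAL_TASKS = {
    "cp":      lambda src, dst: shutil.copy(src, dst),
    "mkdir":   lambda path: os.makedirs(path, exist_ok=True),
    "extract": lambda archive, dst: zipfile.ZipFile(archive).extractall(dst),
}

def run_internal(name, *args):
    """Execute one internal task; unknown names raise KeyError."""
    return INTERNAL_TASKS[name](*args)
```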
For a job evaluation, the tasks needs to be executed sequentialy in a specified
For a job evaluation, the tasks need to be executed sequentially in a specified
order. The idea of running independent tasks in parallel is bad because exact
time measurement needs controled environment on target computer with
minimalization of interrupts by other processes. It seems that connecting tasks
time measurement needs a controlled environment on the target computer with
minimization of interruptions by other processes. It seems that connecting tasks
into directed acyclic graph (DAG) can handle all possible problem cases. None of
the authors, supervisors and involved faculty staff can think of a problem that
cannot be decomposed into tasks connected in a DAG. The goal of evaluation is
to satisfy as many tasks as possible. During execution there are sometimes
multiple choices of next task. To control that, each task can have a priority,
which is used as a secondary ordering criterium. For better understanding, here
which is used as a secondary ordering criterion. For better understanding, here
is a small example.
![Task serialization](https://github.com/ReCodEx/wiki/raw/master/images/Assignment_overview.png)
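To complement the picture, the following minimal sketch (Python for brevity,
with made-up task names, dependencies and priorities; it is not the actual
worker code) shows how tasks connected in a DAG can be ordered for sequential
execution, with the priority used as a secondary criterion when several tasks
are ready at once (assuming a higher number means a higher priority):

```python
import heapq

def order_tasks(tasks):
    """tasks: dict of task id -> {"deps": [...], "priority": int}."""
    remaining = {tid: set(spec["deps"]) for tid, spec in tasks.items()}
    # tasks whose dependencies are satisfied, ordered by priority (higher first)
    ready = [(-tasks[tid]["priority"], tid)
             for tid, deps in remaining.items() if not deps]
    heapq.heapify(ready)
    order = []
    while ready:
        _, tid = heapq.heappop(ready)
        order.append(tid)
        for other, deps in remaining.items():
            if tid in deps:
                deps.discard(tid)
                if not deps:
                    heapq.heappush(ready, (-tasks[other]["priority"], other))
    return order

# made-up example: compilation first, then two tests where "test-1" has a
# higher priority than "test-2"
print(order_tasks({
    "compile": {"deps": [], "priority": 1},
    "test-1":  {"deps": ["compile"], "priority": 2},
    "test-2":  {"deps": ["compile"], "priority": 1},
}))   # -> ['compile', 'test-1', 'test-2']
```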
@ -614,7 +615,7 @@ reasonable, to keep this piece of information alongside the tasks in job
configuration, so each task can have a label about its purpose. Unlabeled tasks
have an internal type _inner_. There are four categories of tasks:
- _initiation_ -- setting up the environment, compilling code, etc.; for users
- _initiation_ -- setting up the environment, compiling code, etc.; for users
failure means error in their sources which are not compatible with running it
with examination data
- _execution_ -- running the user code with examination data, must not exceed
@ -625,11 +626,11 @@ have an internal type _inner_. There are four categories of tasks:
- _inner_ -- no special meaning for frontend, technical tasks for fetching and
copying files, creating directories, etc.
Each job is composed of multiple tasks of these types which are semanticaly
grupped into tests. A test can represent one set of examination data for user
code. To mark the groupping, another task label can be used. Each test must have
Each job is composed of multiple tasks of these types which are semantically
grouped into tests. A test can represent one set of examination data for user
code. To mark the grouping, another task label can be used. Each test must have
exactly one _evaluation_ task (to show success or failure to users) and
arbitraty number of tasks with other types.
an arbitrary number of tasks with other types.
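The grouping can be illustrated with the following sketch (Python for brevity;
the field names are only illustrative and do not reflect the real configuration
syntax), including a check of the rule that each test has exactly one
evaluation task:

```python
from collections import defaultdict

# made-up task list: a global compilation task plus two tests (A, B), each
# with exactly one evaluation task
tasks = [
    {"id": "compile", "type": "initiation", "test": None},
    {"id": "run-A",   "type": "execution",  "test": "A"},
    {"id": "judge-A", "type": "evaluation", "test": "A"},
    {"id": "fetch-B", "type": "inner",      "test": "B"},
    {"id": "run-B",   "type": "execution",  "test": "B"},
    {"id": "judge-B", "type": "evaluation", "test": "B"},
]

def every_test_has_one_evaluation(tasks):
    test_ids = {t["test"] for t in tasks if t["test"] is not None}
    counts = defaultdict(int)
    for task in tasks:
        if task["test"] is not None and task["type"] == "evaluation":
            counts[task["test"]] += 1
    return all(counts[test] == 1 for test in test_ids)

print(every_test_has_one_evaluation(tasks))  # True
```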
## Implementation analysis
@ -666,7 +667,7 @@ messages even during execution. So worker has to be divided into two separate
parts, the one which will handle communication with broker and the another which
will execute jobs. The easiest solution is to have these parts in separate
threads which somehow tightly communicate with each other. For in-process
commucation there can be used numerous technologies, from shared memory to
communication numerous technologies can be used, from shared memory to
condition variables or some kind of in-process messages. The already used ZeroMQ
library can provide in-process messages working on the same principles as
network communication, which is quite handy and solves problems with thread
@ -674,11 +675,22 @@ synchronization and such.
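The following minimal sketch shows the idea of two threads in one process
communicating through ZeroMQ in-process sockets (Python with pyzmq for brevity,
although the worker itself is written in C++; the socket type, endpoint name
and messages are illustrative):

```python
import threading
import zmq

context = zmq.Context.instance()

def execution_part():
    # the execution thread receives job identifiers and processes them
    sock = context.socket(zmq.PAIR)
    sock.connect("inproc://jobs")
    while True:
        job_id = sock.recv_string()
        if job_id == "quit":
            break
        # ... here the job would be evaluated ...
        sock.send_string("done " + job_id)
    sock.close()

listening = context.socket(zmq.PAIR)
listening.bind("inproc://jobs")        # bind before the other thread connects
thread = threading.Thread(target=execution_part)
thread.start()

listening.send_string("job-42")        # made-up job identifier
print(listening.recv_string())         # -> "done job-42"
listening.send_string("quit")
thread.join()
listening.close()
context.term()
```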
At this point we have a worker with two internal parts, a listening one and an execution one. Implementation of the first one is quite straightforward and clear, so let us discuss what should be happening in the execution subsystem. Jobs as work units can vary quite a lot and do completely different things, which means the configuration and the worker have to be prepared for this kind of generality. The configuration and its solution were already discussed above; the implementation in the worker is then also quite straightforward. The worker has internal structures into which it loads and in which it stores the metadata given in the configuration. The whole job is mapped to a job metadata structure and tasks are mapped to either external or internal ones (internal commands have to be defined within the worker); the two kinds differ in whether they are executed in the sandbox or as internal worker commands.
Another division of tasks is by task-type field in configuration. This field can have four values: initiation, execution, evaluation and inner. All was discussed and described above in configuration analysis. What is important to worker is how to behave if execution of task with some particular type fails. There are two possible situations execution fails due to bad user solution or due to some internal error. If execution fails on internal error solution cannot be declared overally as failed. User should not be punished for bad configuration or some network error. This is where task types are usefull. Generally initiation, execution and evaluation are tasks which are somehow executing code which was given by users who submitted solution of exercise. If this kinds of tasks fail it is probably connected with bad user solution and can be evaluated. But if some inner task fails solution should be re-executed, in best case scenario on different worker. That is why if inner task fails it is sent back to broker which will reassign job to another worker. More on this subject should be discussed in broker assigning algorithms section.
Another division of tasks is by the task-type field in the configuration. This field can have four values: initiation, execution, evaluation and inner. All of this was discussed and described above in the configuration analysis. What is important to the worker is how to behave if execution of a task with some particular type fails. There are two possible situations: execution fails due to a bad user solution or due to some internal error. If execution fails on an internal error, the solution cannot be declared as failed overall; the user should not be punished for a bad configuration or some network error. This is where task types are useful. Generally initiation, execution and evaluation are tasks which somehow execute code given by the users who submitted the solution of an exercise. If these kinds of tasks fail, it is probably connected with a bad user solution and the solution can be evaluated. But if some inner task fails, the solution should be re-executed, in the best case scenario on a different worker. That is why if an inner task fails, the job is sent back to the broker which will reassign it to another worker. More on this subject is discussed in the broker assigning algorithms section.
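A minimal sketch of this failure-handling rule (illustrative only, not the
actual worker code) might look as follows:

```python
def run_job(tasks, execute):
    """tasks: list of dicts with a "type" key; execute returns True on success."""
    for task in tasks:
        if execute(task):
            continue
        if task["type"] == "inner":
            return "reassign"      # internal error, not the user's fault
        task["failed"] = True      # counts against the submitted solution
    return "finished"
```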
There is also a question about the working directory or directories of a job, which directories should be used and what for. There is one simple answer to this: every job will have only one specified directory which will contain every file with which the worker will work in the scope of the whole job execution. This is of course nonsense, there has to be some logical division. The least which must be done are two folders, one for internal temporary files and a second one for evaluation. The directory for temporary files is enough to cover all kinds of internal work with the filesystem, but only one directory for the whole evaluation is somehow not enough. Users' solutions are downloaded in the form of zip archives, so why should these be present during execution, and why should the results and files which are to be uploaded back to the fileserver be cherry-picked from one big directory? The answer is of course another logical division into subfolders. The solution which was chosen in the end is to have folders for the downloaded archive, the decompressed solution, the evaluation directory in which the user solution is executed, and then folders for temporary files and for results and generally files which should be uploaded back to the fileserver with the solution results. Of course there has to be a hierarchy which separates folders from different workers on the same machines. That is why paths to directories are in the format ${DEFAULT}/${FOLDER}/${WORKER_ID}/${JOB_ID}, where default means the default working directory of the whole worker and folder is a particular directory for some purpose (archives, evaluation...). The mentioned division of job directories proved to be flexible and detailed enough, everything is in logical units and where it is supposed to be, which means that searching through this system should be easy. In addition, if solutions of users have access only to the evaluation directory, then they do not have access to unnecessary files, which is better for the overall security of the whole ReCodEx.
As we discovered above worker has job directories but users who are writing and managing job configurations do not know where they are (on some particular worker) and how they can be accessed and written into configuration. For this kind of task we have to introduce some kind of marks or signs which will represent particular folders. Marks or signs can have form of some kind of special strings which can be called variables. These variables then can be used everywhere where filesystems paths are used within configuration file. This will solve problem with specific worker environment and specific hierarchy of directories. Final form of variables is ${...} where triple dot is textual description. This format was used because of special dolar sign character which cannot be used within filesystem path, braces are there only to border textual description of variable.
As we discovered above, the worker has job directories, but users who are
writing and managing job configurations do not know where they are (on some
particular worker) and how they can be accessed and written into the
configuration. For this kind of task we have to introduce some kind of marks or
signs which will represent particular folders. These marks or signs can have
the form of special strings which can be called variables. Such variables can
then be used everywhere where filesystem paths are used within the
configuration file. This will solve the problem with the specific worker
environment and the specific hierarchy of directories. The final form of
variables is ${...} where the triple dot is a textual description. This format
was used because of the special dollar sign character which cannot be used
within a filesystem path; the braces are there only to delimit the textual
description of the variable.
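A minimal sketch of such variable expansion (Python for brevity; the variable
names and paths are only illustrative):

```python
import re

def expand(template, variables):
    """Replace every ${NAME} in the template with its value."""
    return re.sub(r"\$\{([^}]+)\}", lambda m: variables[m.group(1)], template)

variables = {                      # made-up values for one particular worker
    "WORKER_ID": "1",
    "JOB_ID": "42",
    "EVAL_DIR": "/var/recodex/worker/eval/1/42",
}
print(expand("${EVAL_DIR}/solution.out", variables))
# -> /var/recodex/worker/eval/1/42/solution.out
```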
#### Evaluation
@ -691,7 +703,7 @@ Interesting problem is with supplementary files (inputs, sample outputs). There
As described in the fileserver section, stored supplementary files have special
filenames which reflect hashes of their content. As such there are no
duplicates stored in the fileserver. The worker can use this feature too and cache these
files for some while and saves precious bandwith. This means there has to be
files for a while and save precious bandwidth. This means there has to be a
system which can download a file, store it in the cache and after some time of
inactivity delete it. Because there can be multiple worker instances on some
particular server it is not efficient to have this system in every worker on its
@ -712,22 +724,34 @@ system.
The cleaner, as mentioned, is a simple script which is executed regularly as a cron job. If there is a caching system like the one introduced in the paragraph above, there are few possibilities how the cleaner should be implemented. On various filesystems there is usually support for two particular timestamps, `last access time` and `last modification time`. Files in the cache are downloaded once and then just copied, which means that the last modification time is set only once on creation of the file and the last access time should be set every time on copy. This implies that the last access time is what is needed here. But while the last modification time is widely used by operating systems, the last access time is not enabled by default. More on this subject can be found [here](https://en.wikipedia.org/wiki/Stat_%28system_call%29#Criticism_of_atime). For proper cleaner functionality, the filesystem which is used by the worker for caching has to have the last access time for files enabled.
Having cleaner as separated component and caching itself handled in worker is kind of blury and is not clearly observable that it works without any race conditions. The goal here is not to have system without races but to have system which can recover from them. Implementation of caching system is based upon atomic operations of underlying filesystem. Follows description of one possible robust implementation. First start with worker implementation:
Having the cleaner as a separate component and the caching itself handled in
the worker is kind of blurry, and it is not clearly observable that it works
without any race conditions. The goal here is not to have a system without
races but to have a system which can recover from them. The implementation of
the caching system is based upon atomic operations of the underlying
filesystem. A description of one possible robust implementation follows,
starting with the worker implementation:
- worker discovers fetch task which should download supplementary file
- worker takes name of file and tries to copy it from cache folder to its working folder
- if successful then last access time should be rewritten (by filesystem itself) and whole operation is done
- worker takes name of file and tries to copy it from cache folder to its
working folder
- if successful then last access time should be rewritten (by filesystem
itself) and whole operation is done
- if not successful then file has to be downloaded
- file is downloaded from fileserver to working folder
- downloaded file is then copied to cache
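The steps above can be sketched as follows (Python for brevity; function and
parameter names are made up):

```python
import os
import shutil

def fetch_file(name, cache_dir, work_dir, download):
    """Try the cache first, fall back to downloading from the fileserver."""
    cached = os.path.join(cache_dir, name)
    target = os.path.join(work_dir, name)
    try:
        shutil.copy(cached, target)    # the copy updates the last access time
        return target
    except FileNotFoundError:
        pass                           # not cached (or deleted by the cleaner)
    download(name, target)             # download from the fileserver
    shutil.copy(target, cached)        # make it available for the next job
    return target
```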
Previous implementation is only within worker, cleaner can anytime intervene and delete files. Implementation in cleaner follows:
The previous implementation exists only within the worker; the cleaner can
intervene at any time and delete files. The implementation in the cleaner follows:
- cleaner on its start stores current reference timestamp which will be used for comparision and load configuration values of caching folder and maximal file age
- there is a loop going through all files and even directories in specified cache folder
- cleaner on its start stores the current reference timestamp which will be
  used for comparison and loads configuration values of the caching folder and
  the maximal file age
- there is a loop going through all files and even directories in specified
cache folder
- last access time of file or folder is detected
- last access time is subtracted from reference timestamp into difference
- difference is compared against specified maximal file age, if difference is greater, file or folder is deleted
- difference is compared against the specified maximal file age; if the
  difference is greater, the file or folder is deleted
The previous description implies that there is a gap between the detection of the last access time and the deletion of the file within the cleaner. In this gap a worker can access the file while the file is deleted anyway, but this is fine: the file is deleted, but the worker has already copied it. Another problem can be two workers downloading the same file, but this is also not a problem, since a file is first downloaded to the working folder and only after that copied to the cache. And even if something else unexpectedly fails and the fetch task therefore fails during execution, even that should be fine, because fetch tasks should have the 'inner' task type, which implies that a failure in this task will stop the whole execution and the job will be reassigned to another worker. This serves as the last salvation in case everything else goes wrong.
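For illustration, a minimal sketch of such a cleaner follows (Python; the
cache path and the age limit are placeholder configuration values, not the
real ones):

```python
import os
import shutil
import time

CACHE_DIR = "/var/recodex/cache"   # placeholder cache folder
MAX_AGE = 24 * 3600                # placeholder maximal file age in seconds

now = time.time()                  # reference timestamp taken at start
for entry in os.listdir(CACHE_DIR):
    path = os.path.join(CACHE_DIR, entry)
    # delete files and folders whose last access time is too old
    if now - os.path.getatime(path) > MAX_AGE:
        if os.path.isdir(path):
            shutil.rmtree(path, ignore_errors=True)
        else:
            os.remove(path)
```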
@ -749,7 +773,7 @@ Previous description implies that there is gap between detection of last access
Users want to view real time evaluation progress of their solution. It can be
easily done with established double-sided connection stream, but it is hard to
achive with web technologies. HTTP protocol works differently on separate
achieve with web technologies. HTTP protocol works differently on separate
requests basis with no long-term connection. However, there is a widely used
technology to solve this problem, the WebSocket protocol.
@ -761,7 +785,7 @@ surface for possible attacks. With this in mind, there are two possible options:
- make separate component for progress messages
Each of the two possibilities has some pros and cons. The first one is good
beacuse there is no additional component and API is already publicly visible. On
because there is no additional component and API is already publicly visible. On
the other side, working with the WebSocket protocol from PHP is not very
pleasant (but it is possible) and embedding this functionality into the API is
not extendable. The second approach is better for future changes of the protocol or
@ -784,7 +808,7 @@ following picture.
![Message flow inside monitor](https://raw.githubusercontent.com/ReCodEx/wiki/master/images/Monitor_arch.png)
The message channel feeding the monitor uses ZeroMQ, the main message framework
used by backend. This decission keeps rest of backend avare of used
used by the backend. This decision keeps the rest of the backend aware of the used
communication protocol and related libraries. Output channel is WebSocket as a
protocol for sending messages to web browsers. In Python, there are several
WebSocket libraries. The most popular one is `websockets` in cooperation with
@ -792,7 +816,7 @@ WebSocket libraries. The most popular one is `websockets` in cooperation with
monitor component too. For ZeroMQ, there is `zmq` library with binding to
framework core in C++.
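A minimal sketch of this bridging idea follows (it assumes a recent version of
the `websockets` library and the asyncio support of pyzmq; socket types, port
numbers and the message format are illustrative, not the actual monitor
protocol):

```python
import asyncio
import websockets
import zmq, zmq.asyncio

ctx = zmq.asyncio.Context()
queues = {}                                  # job id -> queue of messages

async def zmq_loop():
    # receive progress messages from the backend
    sock = ctx.socket(zmq.PULL)
    sock.bind("tcp://*:7894")
    while True:
        job_id, message = await sock.recv_multipart()
        queues.setdefault(job_id, asyncio.Queue()).put_nowait(message)

async def ws_handler(websocket):
    # the browser names the job it wants to follow, then receives its messages
    job_id = (await websocket.recv()).encode()
    queue = queues.setdefault(job_id, asyncio.Queue())
    while True:
        await websocket.send(await queue.get())

async def main():
    async with websockets.serve(ws_handler, "0.0.0.0", 4567):
        await zmq_loop()

asyncio.run(main())
```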
Incomming messages are cached for short period of time. Early testing shows,
Incoming messages are cached for a short period of time. Early testing showed
that the backend can start sending progress messages sooner than the client connects to
the monitor. To solve this, messages for each job are held for 5 minutes after
reception of the last message. The client gets all already received messages at the time
@ -816,7 +840,7 @@ client-server architecture. There are several options:
a standard.
- *HTTP protocol* -- The HTTP protocol is a state-less protocol implemented on
top of the TCP protocol. The communication between the client and server
consists of a requests sent by the client and reponses to these requests sent
consists of requests sent by the client and responses to these requests sent
back by the server. The client can send as many requests as needed and it may
ignore the responses from the server, but the server must respond only to the
requests of the client and it cannot initiate communication on its own.
@ -858,7 +882,8 @@ We considered several technologies which could be used:
Linux servers (ASP.NET using the .NET Core).
- JavaScript (Node.js) -- it is a quite new technology and it is being used to
create REST APIs lately. Applications running on Node.js are quite performant
and the number of open-source libraries avialble on the Internet is very huge.
and the number of open-source libraries available on the Internet is very
large.
We chose PHP and Apache mainly because we were familiar with these technologies
and we were able to develop all the features we needed without learning to use a
@ -879,7 +904,7 @@ framework is very common in the Czech Republic -- its main developer is a
well-known Czech programmer David Grudl -- and we were already familiar with the
patterns used in this framework (e.g., dependency injection, authentication,
routing). There is a good extension for the Nette framework which makes usage of
Doctrine 2 very straighforward.
Doctrine 2 very straightforward.
@todo: what database can be used, how it is mapped and used within code
