# Introduction

Generally, there are many different ways and opinions on how to teach people something new. However, most people agree that a hands-on experience is one of the best ways to make the human brain remember a new skill. Learning must be entertaining and interactive, with fast and frequent feedback. Some kinds of knowledge are more suitable for this practical type of learning than others, and fortunately, programming is one of them.

The university education system is one of the areas where this knowledge can be applied. In computer programming, there are several requirements a program should satisfy, such as the code being syntactically correct, efficient, and easy to read, maintain and extend.

Checking programs written by students takes time and requires a lot of mechanical, repetitive work -- reviewing source code, compiling it and running it through testing scenarios. It is therefore desirable to automate as much of this work as possible. The first idea of an automatic evaluation system came from Stanford University professors in 1965. They implemented a system which evaluated Algol code submitted on punch cards. In the following years, many similar products were written.

In today's world, properties like correctness and efficiency can be tested automatically to a large extent. This fact should be exploited to help teachers save time for tasks such as examining bad design, bad coding habits and logical mistakes, which are difficult to perform automatically.

There are two basic ways of automatically evaluating code -- statically (checking the source code without running it; safe, but not very precise) or dynamically (running the code on test inputs and checking the correctness of its outputs; provides good real-world experience, but requires extensive security measures).

This project focuses on the machine-controlled part of source code evaluation. First, general concepts of grading systems are observed and problems of the software previously used at Charles University in Prague are briefly discussed. Then new requirements are specified and projects with similar functionality are examined. With the knowledge acquired from such projects in production, we set up goals for the new evaluation system, designed the architecture and implemented a fully operational solution based on dynamic evaluation. The system is now ready for production testing at the university.

Assignment

The major goal of this project is to create a grading application that will be used for programming classes at the Faculty of Mathematics and Physics of Charles University in Prague. However, the application should be designed in a modular fashion so that it can be easily extended or even modified to allow other uses.

The system should be capable of dynamic analysis of submitted source code. This consists of the following basic steps:

- compile the code and check for compilation errors
- run the compiled binary in a sandbox with predefined inputs
- check constraints on the amount of memory and time used
- compare program outputs with predefined values
- award the code a numeric score
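
The steps above can be pictured with a minimal sketch. It is only an illustration under stated assumptions, not the project's implementation: the submission is a Python script, the "compilation" step is reduced to a syntax check, a plain subprocess stands in for the sandbox (it provides no isolation), memory is not measured, and the function name `evaluate` is hypothetical.

```python
import subprocess
import sys


def evaluate(source, tests, time_limit=2.0):
    """Return the fraction of passed tests (0.0 to 1.0) for a Python submission.

    `tests` is a list of (stdin_text, expected_stdout) pairs.
    """
    # Step 1: "compile" the code and check for errors (syntax check only).
    try:
        compile(source, "<submission>", "exec")
    except SyntaxError:
        return 0.0

    passed = 0
    for stdin_text, expected in tests:
        try:
            # Step 2: run the program on a predefined input.
            # NOTE: a plain subprocess is NOT a sandbox; it is a stand-in here.
            result = subprocess.run(
                [sys.executable, "-c", source],
                input=stdin_text,
                capture_output=True,
                text=True,
                timeout=time_limit,  # Step 3: time limit (memory is not checked)
            )
        except subprocess.TimeoutExpired:
            continue  # time limit exceeded, the test fails
        # Step 4: compare the program output with the predefined value.
        if result.returncode == 0 and result.stdout.strip() == expected.strip():
            passed += 1

    # Step 5: award a numeric score.
    return passed / len(tests) if tests else 0.0
```

A real evaluator would replace the subprocess call with a sandboxed execution and would also enforce the memory limit.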

The whole system is intended to help both teachers (supervisors) and students. To achieve this, it is crucial to keep in mind the typical usage scenarios of the system and to make these tasks as simple as possible. The project has a great starting point for this -- there is an old grading system currently used at the university (CodEx), so its flaws and weaknesses can be addressed. Furthermore, many teachers want to use and test the new system and are willing to discuss ideas and problems with us during development.

Current system

The grading solution currently used at the Faculty of Mathematics and Physics of the Charles University in Prague was implemented in 2006 by a group of students. It is called CodEx -- The Code Examiner and it has been used with some improvements since then. The original plan was to use the system only for basic programming courses, but there was a demand for adapting it for many different subjects.

CodEx is based on dynamic analysis. It features a web-based interface, where supervisors can assign exercises to their students and the students have a time window to submit their solutions. Each solution is compiled and run in a sandbox (MO-Eval). The checked metrics are correctness of the output and compliance with time and memory limits. It supports programs written in C, C++, C#, Java, Pascal, Python and Haskell.

The system has a database of users. Each user is assigned a role, which corresponds to his/her privileges. There are user groups reflecting the structure of lectured courses.

A database of exercises (algorithmic problems) is another part of the project. Each exercise consists of a text describing the problem (optionally in two language variants -- Czech and English), an evaluation configuration (machine-readable instructions on how to evaluate solutions to the exercise) and a set of inputs and reference outputs. Exercises are created by instructed privileged users. Assigning an exercise to a group means choosing one of the available exercises and specifying additional properties: a deadline (optionally a second deadline), a maximum amount of points, a configuration for calculating the score, a maximum number of submissions, and a list of supported runtime environments (e.g. programming languages) including specific time and memory limits for each one.

Typical use cases for the supported user roles are as follows:

student
- create new user account via registration form
- join a group
- get assignments in group
- submit solution to assignment -- upload one source file and trigger evaluation process
- view solution results -- which parts succeeded and failed, total number of acquired points, bonus points
supervisor
- create exercise -- create description text and evaluation configuration (for each programming environment), upload testing inputs and outputs
- assign exercise to group -- choose exercise and set deadlines, number of allowed submissions, weights of all testing cases and amount of points for correct solutions
- modify assignment
- view all results in group
- check automatic solution grading -- view submitted source and optionally set bonus points
administrator
- create groups
- alter user privileges -- make supervisor accounts
- check system logs, upgrades and other management

Exercise evaluation chain

The most important part of the system is the evaluation of solutions submitted by students. The consecutive steps from source code to final results are described in more detail below to give readers a solid overview of what has to happen during the evaluation process.

First, students submit their solutions through the web user interface. The system checks assignment invariants (deadlines, number of submissions, ...) and stores the submitted code. The runtime environment is automatically detected based on the input file extension and a suitable evaluation configuration variant is chosen (one exercise can have multiple variants, for example for the C and Java languages). This exercise configuration then drives the evaluation process.
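
As a rough illustration of this submission step, the sketch below detects the runtime environment from the file extension and checks the two invariants named above. The extension table, the environment identifiers and the function names are assumptions made for this example; they are not taken from CodEx.

```python
from datetime import datetime
from pathlib import PurePath

# Hypothetical extension-to-environment table; identifiers are illustrative only.
ENVIRONMENTS = {
    ".c": "c",
    ".cpp": "cpp",
    ".cs": "csharp",
    ".java": "java",
    ".pas": "pascal",
    ".py": "python",
    ".hs": "haskell",
}


def detect_environment(filename: str) -> str:
    """Pick the runtime environment from the submitted file's extension."""
    extension = PurePath(filename).suffix.lower()
    try:
        return ENVIRONMENTS[extension]
    except KeyError:
        raise ValueError(f"unsupported file extension: {extension!r}") from None


def may_submit(now: datetime, deadline: datetime,
               submitted: int, max_submissions: int) -> bool:
    """Check the assignment invariants: the deadline and the submission count."""
    return now <= deadline and submitted < max_submissions
```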

There is a pool of uniform worker engines dedicated to evaluation jobs. Incoming jobs are kept in a queue until a free worker picks them up. Each worker evaluates jobs sequentially, one at a time.
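
The queue-and-workers arrangement can be sketched in a few lines. This is a simplified single-process model using threads, assuming all jobs are known up front; the function `run_workers` and its parameters are hypothetical and say nothing about how CodEx distributes jobs between machines.

```python
import queue
import threading


def run_workers(jobs, evaluate, worker_count=2):
    """Evaluate all jobs with a pool of uniform workers and return the results."""
    job_queue = queue.Queue()
    for job in jobs:
        job_queue.put(job)  # incoming jobs wait in the queue

    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                job = job_queue.get_nowait()  # a free worker picks up a job
            except queue.Empty:
                return  # nothing left to evaluate
            outcome = evaluate(job)  # one job at a time per worker
            with lock:
                results[job] = outcome

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```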

The worker obtains the solution and its evaluation configuration, parses it and starts executing the contained instructions. It is crucial to keep the worker computer secure and stable, so a sandboxed environment is used for dealing with unknown source code. When the execution is finished, results are saved and the submitter is notified.

The output of the worker contains data about the evaluation, such as time and memory spent on running the program for each test input and whether its output was correct. The system then calculates a numeric score from this data, which is presented to the student. If the solution is wrong (incorrect output, too much memory used, ...), error messages are also displayed to the submitter.
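
One possible way to turn such per-test data into a score is a weighted share of passed tests. The sketch below assumes that scheme; the text does not specify the formula CodEx uses, and the names `CaseResult`, `compute_score` and `error_messages` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CaseResult:
    test_name: str
    weight: int
    output_correct: bool
    time_ok: bool
    memory_ok: bool

    @property
    def passed(self) -> bool:
        return self.output_correct and self.time_ok and self.memory_ok


def compute_score(results, max_points):
    """Assumed formula: points proportional to the weight of passed tests."""
    total_weight = sum(r.weight for r in results)
    if total_weight == 0:
        return 0.0
    earned = sum(r.weight for r in results if r.passed)
    return max_points * earned / total_weight


def error_messages(results):
    """Collect the messages shown to the submitter for failed tests."""
    messages = []
    for r in results:
        if not r.output_correct:
            messages.append(f"{r.test_name}: incorrect output")
        if not r.time_ok:
            messages.append(f"{r.test_name}: time limit exceeded")
        if not r.memory_ok:
            messages.append(f"{r.test_name}: memory limit exceeded")
    return messages
```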

Weaknesses

The current system is old, but robust. There were no major security incidents during its production usage. However, from today's perspective there are several drawbacks. The main ones are:

- web interface -- The web interface is simple and fully functional, but rapid development in web technologies opens new possibilities for how a web interface can be built.
- web API -- CodEx offers a very limited XML API based on outdated technologies, which is not sufficient for users who would like to create custom interfaces such as a command line tool or a mobile application.
- sandboxing -- The MO-Eval sandbox is based on the principle of monitoring system calls and blocking the bad ones. This can be easily done for single-threaded applications, but proves difficult with multi-threaded ones. Nowadays, parallelism is a very important area of computing, so there is a requirement to test multi-threaded applications too.
- instances -- Different CodEx usage scenarios require separate instances (Programming I and II, Java, C#, etc.). This configuration is not user friendly (students have to register in each instance separately) and burdens administrators with unnecessary work. The CodEx architecture does not allow sharing hardware between instances, which results in an inefficient use of hardware for evaluation.
- task extensibility -- There is a need to test and evaluate complicated programs for classes such as Parallel programming or Compiler principles, which have a more complicated evaluation chain than the simple compilation/execution/evaluation provided by CodEx.

Requirements

There are many different formal requirements for the system. Some of them are necessary for any system for source code evaluation, some of them are specific to university deployment and some of them arose during the ten-year lifetime of the old system. There are not many ways to improve the CodEx experience from the perspective of a student, but a lot of feature requests come from administrators and supervisors. The ideas were gathered mostly from our personal experience with the system and from meetings with faculty staff involved with the current system.

In general, CodEx features should be preserved, so only differences are presented here. For clarity, the requirements and wishes are grouped by category.

System features

System features represent functionality directly accessible to users of the system. They describe the evaluation system in general and also university add-ons (mostly administrative features).

Requirements of the users

- group hierarchy -- creating an arbitrarily nested tree structure should be supported to allow keeping related groups together, such as in the example below. A group hierarchy also allows archiving data from past courses.

  Summer term 2016
  |-- Language C# and .NET platform
  |   |-- Labs Monday 10:30
  |   `-- Labs Thursday 9:00
  |-- Programming I
  |   |-- Labs Monday 14:00
   ...

- a database of exercises -- teachers should be able to create exercises including a textual description, sample inputs and correct reference outputs (for example "sum all numbers from a given file and write the result to the standard output") and to browse this database
- customizable grading system -- teachers need to be able to specify how the final score awarded to a student's submission is computed, depending on its quality
- viewing student details -- teachers should be able to view the details of their students (members of their groups), including all submitted solutions
- awarding additional points -- adding (or subtracting) points from the final score of a submission by a supervisor must be supported
- marking a solution as accepted -- the system should allow marking one particular solution as accepted (used for grading the assignment) by the supervisor
- solution resubmission -- teachers should be able to edit students' solutions and privately resubmit them, optionally saving all results (including temporary ones); this feature can be used to quickly fix errors in the solution
- localization -- all texts (UI and exercises) should be translatable
- formatted exercise texts -- Markdown or another lightweight markup language should be supported for formatting exercise texts
- exercise tags -- the system should support tagging exercises and searching by these tags
- comments -- adding both private and public comments to exercises, tests and solutions should be supported
- plagiarism detection

Administrative requirements

- pluggable user interface -- the system should allow using an alternative user interface, such as a command line client; implementation of such clients should be as straightforward as possible
- privilege separation -- there should be at least two roles -- student and supervisor. Cases where a student of one course is also a teacher of another lab must be handled correctly
- alternate authentication methods -- logging in through a university authentication system (e.g. LDAP) and potentially other services, such as OAuth, should be supported
- querying SIS -- loading user data from the university information system should be supported
- sandboxing -- there should be a safe environment in which the students' solutions are executed to prevent system failures due to malicious code being submitted; the sandboxed environment should have the least possible impact on measurement results (most importantly on measured times)
- heterogeneous worker pool -- there must be support for submission evaluation in multiple programming environments in a single installation to avoid unacceptable workload for the administrator (maintaining a separate installation for every course) and high hardware occupation
- advanced low-level evaluation flow configuration with a high-level abstraction layer for ordinary configuration cases; the configuration should be able to express more complicated flows than just compiling source code and running the program against test inputs -- for example, some exercises need to build the source code with a tool, run some tests, then run the program through another tool and perform additional tests
- use of modern technologies with state-of-the-art compilers

Non-functional requirements

Non-functional requirements are requirements of technical character with no direct mapping to visible parts of the system. In an ideal world, users should not know about these features if they work properly, but would be at least annoyed if they did not.

- no installation -- the primary user interface of the system must be accessible on users' computers without the need to install any additional software
- performance -- the system must be ready for at least hundreds of students and tens of supervisors using it at once
- automated deployment -- all of the components of the system must be easy to deploy in an automated fashion
- open source licensing -- the source code should be released under a permissive licence allowing further development; this also applies to used libraries and frameworks
- multi-platform worker -- worker machines running Linux, Windows and potentially other operating systems must be supported

Conclusion

The survey shows that there are a lot of different requirements and wishes for the new system. When the system is ready, it is likely that there will be new ideas of how to use the system and thus the system must be designed to be easily extendable, so that these new ideas can be easily implemented, either by us or community members. This also means that widely used programming languages and techniques should be used, so that users can quickly understand the code and make changes.

To find out the current state of the field of automatic grading systems, we did a short market survey covering universities, programming contests, and possibly other places where similar tools are available.

This is not a complete list of available evaluators, but only a few projects which are used these days and can be an inspiration for our project. Each project from the list has a brief description and some key features mentioned.

Progtest

Progtest is a private project of FIT ČVUT in Prague. As far as we know, it is used for C/C++ and Bash programming and for knowledge-based quizzes. It awards several kinds of bonus points and penalties and gives a few hints about what is failing in the submitted solution. It is very strict about source code quality, using, for example, the -pedantic option of GCC, Valgrind for detecting memory leaks, and array boundary checks via the mudflap library.

Codility

Codility is a web-based solution primarily targeted at company recruiters. It is a commercial product available as SaaS and it supports 16 programming languages. The UI of Codility is open source; the rest of the source code is not available. One interesting feature is the 'task timeline' -- a captured record of each user's progress while writing code.

CMS

CMS is an open source distributed system for running and organizing programming contests. It is written in Python and consists of several modules. CMS supports the C/C++, Pascal, Python, PHP, and Java programming languages. PostgreSQL is a single point of failure, as all modules heavily depend on the database connection. Task evaluation can only be a three-step pipeline -- compilation, execution, evaluation. Execution is performed in Isolate, a sandbox written by the consultant of our project, Mgr. Martin Mareš, Ph.D.

MOE

MOE is a grading system written in shell scripts, C and Python. It does not provide a default GUI; all actions have to be performed from the command line. The system does not evaluate submissions in real time; results are computed in batch mode after the exercise deadline, using Isolate for sandboxing. Parts of MOE are used in other systems like CodEx or CMS, but the system is generally obsolete.

Kattis

Kattis is another SaaS solution. It provides a clean and functional web UI, but the rest of the application is too simple. A nice feature is the usage of a standardized format for exercises. Kattis is primarily used by programming contest organizers, company recruiters and also some universities.

Analysis

None of the existing projects we came across provides all the features requested for the new system. There is no grading system which supports an arbitrary-length evaluation pipeline, so we have to implement this feature ourselves, cautiously treading through unexplored fields. Also, no existing solution is extensible enough to be used as a base for the new system. After considering all these facts, it is clear that a new system has to be written from scratch. This implies that only a subset of all the features will be implemented in the first version, with the others coming in the following releases.

The gathered features are categorized by their priority for the whole system. The highest priority is given to the main functionality similar to the current CodEx. This is the baseline for the system to be useful in a production environment, while the new design allows easy further development. On top of that, most of the ideas from faculty staff belong to the second priority bucket, which will be implemented as part of the project. The most complicated tasks in this category are an advanced low-level evaluation configuration format, the use of modern tools, connecting to university systems and merging separate system instances into a single one. Other tasks are scheduled for the releases following a successful project defense. Namely, these are a high-level exercise evaluation configuration with a user-friendly interface for common exercise types, SIS integration (once an API is available from their side) and a command-line submission tool. Plagiarism detection is not likely to be part of any release in the near future unless someone else builds the engine; the detection problem is too hard to be solved as part of this project.

We named the new project ReCodEx -- ReCodEx Code Examiner. The name should point to the old CodEx, but also reflect the new approach to solving its issues. The Re in the name stands for redesigned, rewritten, renewed, or restarted.

At this point there is a clear idea of how the new system will be used and what the major enhancements for future releases are. With this in mind, the overall architecture can be sketched. To sum up, here is a list of key features of the new system. They come from the previous research into the current system's drawbacks, reasonable wishes of university users and our major design choices.

- modern HTML5 web frontend written in JavaScript using a suitable framework
- REST API communicating with a database, the evaluation backend and a file server
- evaluation backend implemented as a distributed system on top of a message queue framework with a master-worker architecture
- multi-platform worker supporting Linux and Windows environments (the latter without a sandbox, as no suitable general-purpose tool is available yet)
- evaluation procedure configured in a human-readable text file, composed of small tasks connected into an arbitrary directed acyclic graph

The reasons supporting these decisions are explained in the rest of the analysis chapter. A lot of smaller design choices are also mentioned, including the possible options, what was picked for implementation and why. But first, we discuss the basic concepts of the system.

Basic concepts

The system is designed as a web application. The requirements say that the user interface must be accessible from students' computers without the need to install additional software. This immediately implies that users have to be connected to the internet, so it is used as the communication medium. Today, there are two main ways of designing a graphical user interface -- as a native application or as a web page. Creating a nice multi-platform application with a graphical interface is almost impossible because of the large number of different environments. Also, such applications often require installation or at least downloading their files (sources or binaries). On the other hand, distributing a web application is easier, because every personal computer has a web browser installed. Also, browsers provide a (mostly) unified and standardized environment of HTML5 and JavaScript. CodEx is also a web application and everybody seems satisfied with it. There are other communication channels most programmers have available, such as e-mail or git, but they are not appropriate for building user interfaces on top of them.

The application interacts with users. From the project assignment it is clear that the system has to keep personalized data about users and adapt the presented content according to this knowledge. User data cannot be publicly visible, which implies the necessity of user authentication. The application also has to support multiple ways of authentication (university authentication systems, a company LDAP server, an OAuth server...) and permit adding more security measures in the future, such as two-factor authentication.

User data also includes a privilege level. The assignment requires at least two roles, student and supervisor. However, it is wise to add an administrator level, which takes care of the system as a whole and is responsible for the core setup, monitoring, updates and so on. The student role has the least power; students can basically just view assignments and submit solutions. Supervisors have more authority, so they can create exercises and assignments, view results of students, etc. Following the university organization, one more level could be introduced, the course guarantor. However, in practice all duties related to teaching labs are already associated with supervisors, so this role does not seem very useful. In addition, no one requested more than a three-level privilege scheme.

School labs are lessons for some students led by supervisors. The students have the same homework and the supervisors evaluate their solutions. This organization has to be carried over into the new system. The counterparts of real labs are virtual groups. This concept was already discussed in the previous chapter, including the need for a hierarchical structure of groups. Only a person who is a student of the university and is recorded in the university information system has the right to attend labs. To allow restricting group membership in ReCodEx, there are two types of groups -- public and private. Public groups are open to all registered users, but to become a member of a private group, a user has to be added by one of its supervisors. This could be done automatically at the beginning of the term with data from the information system, but unfortunately there is no such API yet. However, creating this API is now being considered by the university leadership. Another equally good solution for restricting group membership is to allow anyone to join the group subject to additional confirmation by a supervisor. It has no additional benefits, so the approach with public and private groups is implemented.

Supervisors using CodEx in their labs usually set a minimum number of points required to get a credit. These points can be earned by solving assigned exercises. To show users visually whether they already have enough points, ReCodEx groups support setting this limit. There are two equivalent ways to set the limit -- as an absolute value or as a value relative to the maximum. The latter seems nicer, so it is implemented. The relative value is set as a percentage and is called the threshold.
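
A relative threshold can be checked with integer arithmetic alone. In the sketch below the function name and parameters are hypothetical, and treating a student who reaches the threshold exactly as having enough points is an assumption.

```python
def has_enough_points(earned: int, maximum: int, threshold_percent: int) -> bool:
    """Return True if `earned` reaches `threshold_percent` of `maximum`.

    Cross-multiplication avoids floating-point rounding:
    earned / maximum >= threshold_percent / 100
    """
    return earned * 100 >= threshold_percent * maximum
```

For example, with a maximum of 50 points and a threshold of 60 percent, a student needs at least 30 points.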

Our university has a few partner grammar schools. There was an idea that they could use CodEx for teaching informatics classes. To make the setup simple for them, all the software and hardware would be provided by the university as a completely ready-to-use remote service. However, CodEx was not prepared to support this kind of usage and no one had time to manage a separate instance. With ReCodEx it is possible to offer a hosted environment as a service to other organizations. The concept we came up with is based on user and group separation inside the system. The system contains multiple instances, which serve as the units of separation. Each instance has its own set of users and groups; exercises can optionally be shared. The evaluation backend is common to all instances. To keep track of active instances and paying customers, each instance must have a valid licence to allow its users to submit solutions. A licence is granted for a defined period of time and can be revoked early if the organization does not comply with the agreed terms and conditions.

The main job of the system is to evaluate programming exercises. An exercise is quite similar to a homework assignment in school labs. When homework is assigned, two things are important for users to know:

- description of the problem
- metadata -- when and to whom to submit solutions, grading scale, penalties, etc.

To reflect this idea, which teachers and students are already familiar with, we decided to keep the separation between the problem itself (exercise) and its assignment. An exercise only describes one problem and provides testing data with a description of how to evaluate it. In fact, it is a template for assignments. An assignment then contains data from its exercise and additional metadata, which can be different for every assignment of the same exercise. This separation is natural for all users; in CodEx it is implemented in a similar way and no other reasonable solution was found.
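
The exercise/assignment split can be pictured as two record types, where the assignment refers to its exercise and adds the per-group metadata. The field names below are illustrative assumptions, not the actual ReCodEx data model.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Exercise:
    """The problem itself: a template shared by all of its assignments."""
    name: str
    description: str
    tests: tuple  # (input, reference output) pairs


@dataclass
class Assignment:
    """One use of an exercise in a group, with its own metadata."""
    exercise: Exercise
    group: str
    deadline: datetime
    max_points: int
    max_submissions: int


# Hypothetical example: one exercise assigned to two groups with different terms.
sum_numbers = Exercise(
    name="sum-numbers",
    description="Sum all numbers from the input and print the result.",
    tests=(("1 2 3", "6"),),
)
monday = Assignment(sum_numbers, "Labs Monday 14:00", datetime(2016, 4, 4), 10, 20)
thursday = Assignment(sum_numbers, "Labs Thursday 9:00", datetime(2016, 4, 7), 8, 50)
```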

Evaluation unit executed by ReCodEx

One of the bigger requests for the new system is to support a complex configuration of the execution pipeline. The idea comes from lecturers of the Compiler principles class who want to migrate their semi-manual evaluation process to CodEx. Unfortunately, CodEx is not capable of such a complicated exercise setup. None of the evaluation systems we found can handle such a task, so a design from scratch is needed.

There are two main approaches to designing a complex execution configuration. It can be composed of a small number of relatively big components or of a larger number of small tasks. Big components are easy to write and the whole configuration is reasonably small. However, the components are designed for current problems, so this approach is not flexible enough for comfortable future usage. This can be solved by introducing a small set of single-purpose tasks which can be composed together. The whole configuration is then considerably bigger, but it adapts well to new conditions and the tasks themselves take less work to program. For a better user experience, configuration generators for some common cases can be introduced.

The goal of ReCodEx is to be continuously developed and used for many years, so the smaller tasks are the right choice. Observation of the CodEx system shows that only a few tasks are needed. In the extreme case, only one task is enough -- execute a binary. However, for better portability of configurations across different systems it is better to implement a reasonable subset of operations directly without calling system-provided binaries. These operations are copying a file, creating a new directory, extracting an archive and so on, altogether called internal tasks. Another benefit of a custom implementation of these tasks is guaranteed safety, so no sandbox needs to be used, unlike in the case of external tasks.

For a job evaluation, the tasks need to be executed sequentially in a specified order. Running independent tasks in parallel is a bad idea, because exact time measurement needs a controlled environment on the target computer with minimal interruption by other processes. It would be possible to run tasks which do not need exact time measurement in parallel, but in this case a synchronization mechanism would have to be developed to exclude parallelism for measured tasks. Usually, there are about four times more unmeasured tasks than tasks with time measurement, but measured tasks tend to be much longer. With Amdahl's law in mind, parallelism does not seem to provide a huge benefit in overall execution speed and brings trouble with synchronization. However, if there are speed issues, this approach could be reconsidered.
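
The Amdahl's law argument can be made concrete with a small calculation. The task durations below are purely hypothetical numbers chosen for illustration (four unmeasured tasks of 0.1 s each for every measured task of 2 s); only the 4:1 ratio of task counts comes from the text.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Amdahl's law: overall speedup when only a fraction of the work
    can be spread over `workers` parallel units."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / workers)


# Hypothetical durations in seconds, not measurements.
unmeasured_time = 4 * 0.1  # tasks that could run in parallel
measured_time = 1 * 2.0    # tasks that must run alone for exact timing

parallel_fraction = unmeasured_time / (unmeasured_time + measured_time)
speedup = amdahl_speedup(parallel_fraction, workers=4)
print(f"parallel fraction: {parallel_fraction:.3f}, speedup: {speedup:.3f}")
```

Under these assumed durations only one sixth of the job time is parallelizable, so four parallel units give a speedup of about 1.14, and no number of units can push it beyond 1.2.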

It seems that connecting tasks into a directed acyclic graph (DAG) can handle all possible problem cases. None of the authors, supervisors and involved faculty staff can think of a problem that cannot be decomposed into tasks connected in a DAG. The goal of evaluation is to satisfy as many tasks as possible. During execution there are sometimes multiple choices for the next task. To control that, each task can have a priority, which is used as a secondary ordering criterion. For better understanding, here is a small example.

The job root task is an imaginary single starting point of each job. When the CompileA task is finished, the RunAA task is started (or RunAB, but the choice should be deterministic based on position in the configuration file -- tasks stated earlier should be executed earlier). The task priorities guarantee that after the CompileA task all dependent tasks are executed before the CompileB task (they have a higher priority number). To sum up, connections between tasks represent dependencies, and priorities can be used to order unrelated tasks and thus provide a total ordering of them. For well-written jobs the priorities may not be so useful, but they can help control the execution order, for example to avoid a situation where each test of the job generates a large temporary file and a valid execution order keeps all the temporary files for later processing at the same time. A better approach is to finish the execution of one test, clean up the big temporary file and proceed with the following test. If there is an ambiguity in task ordering at this point, the tasks are executed in the order of the input task configuration.

A total linear ordering of tasks could be achieved more easily by just executing them in the order of the input configuration. But this structure does not handle cases where a task fails well. There is no easy and nice way to tell which task should be executed next. However, this issue can be solved with graph-structured dependencies of the tasks. In a graph structure, it is clear that all dependent tasks have to be skipped and execution continues with an unrelated task. This is the main reason why the tasks are connected in a DAG.
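
The ordering and skipping rules described above can be sketched as follows. This is a minimal model, not the ReCodEx worker: the priority values, the `RunBA` task and the function `run_job` are hypothetical, while the other task names follow the example in the text. A higher priority number runs earlier, ties are broken by position in the configuration, and every task depending on a failed or skipped task is skipped.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    priority: int
    deps: list = field(default_factory=list)


def run_job(tasks, execute):
    """Run tasks in dependency order and return {name: status} in run order.

    `execute(name)` returns True on success. Status is "ok", "failed"
    or "skipped".
    """
    position = {task.name: i for i, task in enumerate(tasks)}
    status = {}
    pending = list(tasks)
    while pending:
        # A task is ready once all its dependencies have been resolved.
        ready = [t for t in pending if all(d in status for d in t.deps)]
        if not ready:
            raise ValueError("dependency cycle or unknown dependency")
        # Higher priority first, then configuration order.
        task = min(ready, key=lambda t: (-t.priority, position[t.name]))
        pending.remove(task)
        if any(status[d] != "ok" for d in task.deps):
            status[task.name] = "skipped"
        else:
            status[task.name] = "ok" if execute(task.name) else "failed"
    return status


# Hypothetical priorities: the A branch outranks the B branch.
job = [
    Task("CompileA", 2),
    Task("RunAA", 2, ["CompileA"]),
    Task("RunAB", 2, ["CompileA"]),
    Task("CompileB", 1),
    Task("RunBA", 1, ["CompileB"]),
]
```

If `CompileA` fails in this job, `RunAA` and `RunAB` are skipped while `CompileB` and `RunBA` still run, which is the behaviour a plain linear list cannot express.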

For grading there are several important tasks. First, tasks executing submitted code need to be checked for time and memory limits. Second, outputs of judging tasks need to be checked for correctness (represented by return value or by data on standard output), and the judging tasks themselves should not fail. This division can be transparent for the backend, as each task is executed the same way. But the frontend must know which tasks of the whole job are important and of what kind they are. It is reasonable to keep this piece of information alongside the tasks in the job configuration, so each task can have a label about its purpose. Unlabeled tasks have an internal type inner. There are four categories of tasks:

initiation -- setting up the environment, compiling code, etc.; for users, a failure means an error in their sources, which cannot be run with the examination data
execution -- running the user code with examination data, must not exceed time and memory limits; for users, a failure means wrong design, slow data structures, etc.
evaluation -- comparing user and examination outputs; for users, a failure means that the program does not compute the right results
inner -- no special meaning for the frontend; technical tasks for fetching and copying files, creating directories, etc.

Each job is composed of multiple tasks of these types which are semantically grouped into tests. A test can represent one set of examination data for user code. To mark the grouping, another task label can be used. Each test must have exactly one evaluation task (to show success or failure to users) and an arbitrary number of tasks of other types.
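The rule that every test has exactly one evaluation task is easy to check mechanically. The following sketch assumes a hypothetical task representation in which the purpose label is stored under a `type` key and the test grouping under a `test-id` key; neither name is taken from the actual configuration format.

```python
TASK_TYPES = {"initiation", "execution", "evaluation", "inner"}


def invalid_tests(tasks):
    """Return sorted ids of tests that do not have exactly one evaluation task.

    `tasks` is a list of dicts; "type" and "test-id" are hypothetical keys.
    Tasks without a "type" label are treated as inner tasks, and tasks
    without a "test-id" do not belong to any test.
    """
    evaluation_count = {}
    for task in tasks:
        task_type = task.get("type", "inner")
        if task_type not in TASK_TYPES:
            raise ValueError(f"unknown task type: {task_type}")
        test_id = task.get("test-id")
        if test_id is None:
            continue
        evaluation_count.setdefault(test_id, 0)
        if task_type == "evaluation":
            evaluation_count[test_id] += 1
    return sorted(t for t, count in evaluation_count.items() if count != 1)
```

A frontend could run such a check when a job configuration is saved, so that a malformed test is reported to the exercise author rather than discovered during evaluation.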

Evaluation progress state

Users surely want to know the progress state of their submitted solution. The very first idea would be to report the state based on done messages from compilation, execution and evaluation, as many evaluation systems already do. However, ReCodEx has a more advanced execution pipeline where there can be more compilations or more executions per test and also other technical tasks controlling the job execution flow. The users do not know about these technical details, and data from these tasks may confuse them.

A solution is to show users only the percentage of completion of the job as a plain progress bar without additional information about task types. This solution works well for all of the jobs and is very user friendly. To make the output more interesting, there is a database of random kind-of-funny statements and a random new one is displayed every time a task is completed.

Results of evaluation

There are a lot of things which deserve discussion concerning the results of evaluation: how they should be displayed, what should be visible or not, and also what kind of reward for users' solutions should be chosen.

Evaluation outputs

At first, let us focus on all kinds of outputs from executed programs within a job. It is beyond discussion that supervisors should be able to view almost all outputs from solutions if they choose them to be visible and recorded. This feature is critical in debugging either whole exercises or users' solutions. A supervisor should have a choice to turn on preserving the data, while the default behaviour is to discard them to keep the file base around the whole ReCodEx system within sensible limits.

A more interesting question is whether students should see the logs from the execution of their solution. The usual approach is to keep this information private because of the possibility of leaking input data. Leaked data may lead students to hack their solutions to pass just the ReCodEx testing cases instead of properly solving the assigned problem. Martin Mareš strongly recommended using this strategy of hiding sensitive data too, so ReCodEx does. One exception is compilation outputs, which can help students a lot during troubleshooting. These logs shall be visible unless the supervisor decides otherwise. Note that due to a lack of frontend developers, this feature was not implemented in the very first release of ReCodEx, but it will definitely be available in the future.

Scoring and assigning points

The overall concept of grading solutions was presented earlier. As a brief reminder, the backend returns only exact measured values (used time and memory, return code of the judging task, ...) and on top of that one value is computed. The way of this computation can differ greatly across supervisors, so it has to be easily extendable. The best way is to provide an interface which can be implemented, so that any sort of magic can return the final value.

We found several computational possibilities. There is the basic arithmetic, weighted arithmetic, geometric and harmonic mean of the results of each test (the result is a logical value succeeded/failed, optionally with a weight), some kind of interpolation of the amount of time used for each test, the same with the amount of memory used, and surely many others. To keep the project simple, we decided to design an appropriate interface and implement only the weighted arithmetic mean computation, which is used in about 90% of all assignments. Of course, a different scheme can be chosen for every assignment and can also be configured -- for example by specifying test weights for the implemented weighted arithmetic mean. Advanced ways of computation can be implemented when there is a real demand for them.
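As an illustration of such an interface, a minimal Python sketch follows. The class and method names are hypothetical and do not mirror the actual ReCodEx code; the sketch also assumes that a test missing from the results counts as failed.

```python
from abc import ABC, abstractmethod


class ScoreCalculator(ABC):
    """Hypothetical interface: turns per-test results into a score in [0, 1]."""

    @abstractmethod
    def compute(self, results):
        """`results` maps a test name to True (succeeded) or False (failed)."""


class WeightedMeanCalculator(ScoreCalculator):
    """Weighted arithmetic mean of logical test results."""

    def __init__(self, weights):
        # `weights` maps a test name to its non-negative weight.
        self.weights = dict(weights)

    def compute(self, results):
        total = sum(self.weights.values())
        if total == 0:
            return 0.0
        # Tests missing from `results` are counted as failed (an assumption).
        passed = sum(weight for test, weight in self.weights.items()
                     if results.get(test, False))
        return passed / total
```

With weights `{"A": 1, "B": 3}` and only test A succeeding, `compute` yields 0.25. Another scheme (for example a geometric mean) would be added as another subclass without touching the callers.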

To avoid assigning points for insufficient solutions (like only printing "File error", which is the valid answer in two tests), a minimal point threshold can be specified. If the solution is to get fewer points than specified, it will get zero points instead. This functionality could be embedded into the grading computation algorithm itself, but it would have to be present in each implementation separately, which is a bit ugly. So, this feature is separated from the point computation.
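Keeping the threshold outside the score computation can look roughly like the sketch below. The function name and parameters are hypothetical, and rounding the points down is an assumption made only for this example.

```python
import math


def assign_points(score, max_points, threshold):
    """Convert a score in [0, 1] to points, applying the minimal point threshold.

    The threshold is applied here, independently of how `score` was computed,
    so every scoring scheme gets it for free. Rounding down is an assumption.
    """
    points = math.floor(score * max_points)
    return points if points >= threshold else 0
```

For example, with `max_points=10` and `threshold=5`, a score of 0.25 yields zero points, while a score of 0.75 yields 7 points.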

Automatic grading cannot reflect all aspects of submitted code, for example the structure of the code, the number and quality of comments and so on. To allow supervisors to bring these manually checked things into grading, there is a concept of bonus points. They can be positive or negative. Generally, the solution with the most assigned points is marked for grading that particular assignment. However, if a supervisor is not satisfied with a student's solution (really bad code, cheating, ...), he/she assigns the student negative bonus points. To prevent this decision from being overridden by the system choosing another solution with more points, or even by the student submitting the same code again which evaluates to more points, the supervisor can explicitly mark a particular solution, which is then used for grading instead of the solution with the most points.

Persistence

Previous parts of the analysis show that the system has to keep some state. This could be user settings, group membership, evaluated assignments and so on. The data have to be kept across restarts, so persistence is an important decision factor. There are several ways to save structured data:

plain files
NoSQL database
relational database

Another important factor is the amount and size of stored data. Our guess is about 1000 users, 100 exercises, 200 assignments per year and 200000 unique solutions per year. The data are mostly structured and there are a lot of them with the same format. For example, there are a thousand users and each one has the same attributes -- name, email, age, etc. This kind of data is relatively small: name and email are short strings, age is an integer. Considering this, relational databases or formatted plain files (CSV for example) fit them best. However, the data often have to support a find operation, so they have to be sorted and allow random access for resolving cross references. Also, addition and deletion of entries should take reasonable time (at most logarithmic time complexity in the number of saved values). This practically excludes plain files, so a relational database is used instead.

On the other hand, there are some data with no such clear structure and a much larger size. These can be evaluation logs, sample input files for exercises or sources submitted by students. Saving this kind of data into a relational database is not suitable; it is better to keep them as ordinary files or store them in some kind of NoSQL database. Since they are already files and do not need to be backed up in multiple copies, it is easier to keep them as ordinary files in the filesystem. Also, this solution is more lightweight and does not require additional dependencies on third-party software. A file can be identified using its filesystem path or a unique index stored as a value in the relational database. Both approaches are equally good; the final decision depends on the actual case.

Structure of the project

The ReCodEx project is divided into two logical parts -- the backend and the frontend -- which interact with each other and which cover the whole area of code examination. Both of these logical parts are independent of each other in the sense that they can be installed on separate machines at different locations and that one of the parts can be replaced with a different implementation; as long as the communication protocols are preserved, the system will continue working as expected.

Backend is the part which is responsible solely for the process of evaluating a solution of an exercise. Each evaluation of a solution is referred to as a job. For each job, the system expects a configuration document of the job, supplementary files for the exercise (e.g., test inputs, expected outputs, predefined header files), and the solution of the exercise (typically source codes created by a student). There might be some specific requirements for the job, such as a specific runtime environment, a specific version of a compiler, or evaluation on a processor with a specific number of cores. The backend infrastructure decides whether it will accept a job or decline it based on the specified requirements. In case it accepts the job, it will be placed in a queue and it will be processed as soon as possible. The backend publishes the progress of processing of the queued jobs and the results of the evaluations can be queried after the job processing is finished. The backend produces a log of the evaluation and scores the solution based on the job configuration document.

To make the backend scalable, there are two necessary components -- one which executes jobs and another which distributes jobs to the instances of the first one. This ensures scalability in terms of parallel execution of numerous jobs, which is exactly what is needed. The implementations of these services are called broker and worker; the first one handles distribution, the latter execution. These components should be enough to fulfill everything said above, but for the sake of simplicity and better communication gateways with the frontend, two other components were added, fileserver and monitor. Fileserver is a simple component whose purpose is to store files which are exchanged between frontend and backend. Monitor is also a quite simple service which is able to serve job progress state from the worker to the web application. These two additional services are on the edge of frontend and backend (like gateways), but logically they are more connected with the backend, so they are considered to belong there.

Frontend, on the other hand, is responsible for the communication with the users and provides them with convenient access to the backend infrastructure. The frontend manages user accounts and gathers them into units called groups. There is a database of exercises which can be assigned to the groups and the users of these groups can submit their solutions for these assignments. The frontend will initiate evaluation of these solutions by the backend and it will store the results afterwards. The results will be visible to authorized users and the results will be awarded with points according to the score given by the backend in the evaluation process. The supervisors of the groups can edit the parameters of the assignments, review the solutions and the evaluations in detail, award the solutions with bonus points (both positive and negative) and discuss the solution with its author. Some of the users can be entitled to create new exercises and extend the database of exercises which can be assigned to the groups later on.

There are two main purposes of the frontend -- holding the state of the whole system (database of users, exercises, solutions, points, etc.) and presenting the state to users through some kind of user interface (e.g., a web application, mobile application, or a command-line tool). According to contemporary trends in development of frontend parts of applications, we decided to split the frontend into two logical parts -- a server side and a client side. The server side is responsible for managing the state and the client side gives instructions to the server side based on the inputs from the user. This decoupling gives us the ability to create multiple client side tools which may address different needs of the users.

The frontend developed as part of this project is a web application created with the needs of the Faculty of Mathematics and Physics of the Charles University in Prague in mind. The users are the students and their teachers, groups correspond to the different courses, the teachers are the supervisors of these groups. We believe that this model is applicable to the needs of other universities, schools, and IT companies, which can use the same system for their needs. It is also possible to develop their own frontend with their own user management system for their specific needs and use the possibilities of the backend without any changes, as was mentioned in the previous paragraphs.

One possible configuration of the ReCodEx system is illustrated in the following picture, where there is one shared backend with three workers and two separate instances of the whole frontend. This configuration may be suitable for MFF UK -- basic programming course and KSP competition. However, sharing the web API and fileserver, with only custom instances of the client (web app or own implementation), is perhaps more likely to be used. Note that the connections between components are not fully accurate.

In the later parts of the documentation, both the backend and the frontend will be introduced separately and covered in more detail. The communication protocol between these two logical parts will be described as well.

Implementation analysis

Some of the most important implementation problems or interesting observations will be discussed in this chapter.

Communication between the backend components

The overall design of the project is discussed above. There is a bunch of components, each with its own responsibility. An important thing to design is the communication between these components. To choose a suitable protocol, there are some additional requirements that should be met:

reliability -- if a message is sent between components, the protocol has to ensure that it is received by target component
working over IP protocol
multi-platform and multi-language usage

The TCP/IP protocol meets these conditions; however, it is quite low level and working with it usually requires working with a platform-dependent, non-object-oriented API. A common way to address these drawbacks is to use a framework which provides a better abstraction and a more suitable API. We decided to go this way, so the following options were considered:

CORBA (or some other form of RPC) -- CORBA is a well known framework for remote procedure calls. There are multiple implementations for almost every known programming language. It fits nicely into object oriented programming environment.
RabbitMQ -- RabbitMQ is a messaging framework written in Erlang. It features a message broker, to which nodes connect and declare the message queues they work with. It is also capable of routing requests, which could be a useful feature for job load-balancing. Bindings exist for a large number of languages and there is a large community supporting the project.
ZeroMQ -- ZeroMQ is another messaging framework, which is different from RabbitMQ and others (such as ActiveMQ) because it features a "brokerless design". This means there is no need to launch a message broker service to which clients have to connect -- ZeroMQ based clients are capable of communicating directly. However, it only provides an interface for passing messages (basically vectors of byte strings) and any additional features such as load balancing or acknowledgement schemes have to be implemented on top of this. The ZeroMQ library is written in C++ with a huge number of bindings.

CORBA is a large framework that would satisfy all our needs, but we are aiming towards a more loosely-coupled system, and asynchronous messaging seems better for this approach than RPC. Moreover, we rarely need to receive replies to our requests immediately.

RabbitMQ seems well suited for many use cases, but implementing a job routing mechanism between heterogeneous workers would be complicated -- we would probably have to create a separate load balancing service, which cancels the advantage of a message broker already being provided by the framework. It is also written in Erlang, which nobody from our team understands.

ZeroMQ is the best option for us, even with the drawback of having to implement a load balancer ourselves (which could also be seen as a benefit and there is a notable chance we would have to do the same with RabbitMQ). It also gives us complete control over the transmitted messages and communication patterns. However, any of the three options could have been used.

Frontend - backend communication

Our choices when considering how clients will communicate with the backend have to stem from the fact that ReCodEx should primarily be a web application. This rules out ZeroMQ -- while it is very useful for asynchronous communication between backend components, it is practically impossible to use it from a web browser. There are several other options:

TCP sockets -- TCP sockets give a reliable means of full-duplex communication. All major operating systems support this protocol and there are libraries which simplify the implementation. On the other hand, it is not possible to initiate a TCP socket from a web browser.
WebSockets -- The WebSocket standard is built on top of TCP. It enables a web browser to connect to a server over a TCP socket. WebSockets are implemented in recent versions of all modern web browsers and there are libraries for several programming languages like Python or JavaScript (running in Node.js). Encryption of the communication over a WebSocket is supported as a standard.
HTTP protocol -- The HTTP protocol is a stateless protocol implemented on top of the TCP protocol. The communication between the client and server consists of requests sent by the client and responses to these requests sent back by the server. The client can send as many requests as needed and it may ignore the responses from the server, but the server must respond only to the requests of the client and it cannot initiate communication on its own. End-to-end encryption can be achieved easily using SSL/TLS (HTTPS).

We chose the HTTP(S) protocol because of the simple implementation in all sorts of operating systems and runtime environments on both the client and the server side.

The API of the server should expose basic CRUD (Create, Read, Update, Delete) operations. There are some options on what kind of messages to send over the HTTP:

SOAP -- a protocol for exchanging XML messages. It is very robust and complex.
REST -- a stateless architectural style, not a protocol or a technology. It usually (but not necessarily) relies on HTTP and its method verbs (e.g., GET, POST, PUT, DELETE). It can fully implement the CRUD operations.

Even though there are some other technologies, we chose the REST style over the HTTP protocol. It is widely used, there are many tools available for development and testing, and it is understood by programmers, so it should be easy for a new developer with some experience in client-side applications to get to know the ReCodEx API and develop a client application.

To sum up, the chosen ways of communication inside the ReCodEx system are captured in the following image. Red connections are through ZeroMQ sockets, blue are through WebSockets and green are through HTTP(S).

Broker

The broker is responsible for keeping track of available workers and distributing jobs that it receives from the frontend between them.

Worker management

It is intended for the broker to be a fixed part of the backend infrastructure to which workers connect at will. Thanks to this design, workers can be added and removed when necessary (and possibly in an automated fashion), without changing the configuration of the broker. An alternative solution would be configuring a list of workers before startup, thus making them passive in the communication (in the sense that they just wait for incoming jobs instead of connecting to the broker). However, this approach comes with a notable administration overhead -- in addition to starting a worker, the administrator would have to update the worker list.

Worker management must also take into account the possibility of worker disconnection, either because of a network or software failure (or termination). A common way to detect such events in distributed systems is to periodically send short messages to other nodes and expect a response. When these messages stop arriving, we presume that the other node encountered a failure. Both the broker and workers can be made responsible for initiating these exchanges and it seems that there are no differences stemming from this choice. We decided that the workers will be the active party that initiates the exchange.
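The broker side of this scheme can be sketched as a small bookkeeping structure. The class and method names below are hypothetical, and timestamps are passed in explicitly so that the logic stays independent of any particular messaging library or clock.

```python
class LivenessTracker:
    """Tracks when each worker was last heard from (illustrative sketch)."""

    def __init__(self, timeout):
        # Seconds of silence after which a worker is presumed dead.
        self.timeout = timeout
        self.last_seen = {}

    def ping(self, worker_id, now):
        """Record a ping (or any other message) received from a worker."""
        self.last_seen[worker_id] = now

    def expire(self, now):
        """Forget and return the workers that have been silent for too long."""
        dead = [worker for worker, seen in self.last_seen.items()
                if now - seen > self.timeout]
        for worker in dead:
            del self.last_seen[worker]
        return dead
```

The broker would call `ping` whenever a message from a worker arrives and call `expire` periodically; jobs assigned to the returned workers can then be handled as failed or reassigned.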

Scheduling

Jobs should be scheduled in a way that ensures that they will be processed without unnecessary waiting. This depends on the fairness of the scheduling algorithm (no worker machine should be overloaded).

The design of such scheduling algorithm is complicated by the requirements on the diversity of workers -- they can differ in operating systems, available software, computing power and many other aspects.

We decided to keep the details of connected workers hidden from the frontend, which should lead to a better separation of responsibilities and flexibility. Therefore, the frontend needs a way of communicating its requirements on the machine that processes a job without knowing anything about the available workers. A key-value structure is suitable for representing such requirements.

With respect to these constraints, and because the analysis and design of a more sophisticated solution was declared out of scope of our project assignment, a rather simple scheduling algorithm was chosen. The broker shall maintain a queue of available workers. When assigning a job, it traverses this queue and chooses the first machine that matches the requirements of the job. This machine is then moved to the end of the queue.

The presented algorithm results in a simple round-robin load balancing strategy, which should be sufficient for small-scale deployments (such as a single university). However, with a large number of jobs, some workers will easily become overloaded. The implementation must allow for a simple replacement of the load balancing strategy so that this problem can be solved in the near future.
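The queue traversal described above can be sketched as follows. The names are hypothetical, and the sketch assumes that a worker matches a job when every required key is present among the worker's properties with exactly the same value; richer matching rules would only change the `_matches` method.

```python
class WorkerQueue:
    """Round-robin selection of the first worker matching the job requirements."""

    def __init__(self):
        self.workers = []  # list of (worker_id, properties), front of queue first

    def add_worker(self, worker_id, properties):
        self.workers.append((worker_id, dict(properties)))

    @staticmethod
    def _matches(properties, requirements):
        # Assumption: simple equality of key-value pairs.
        return all(properties.get(key) == value
                   for key, value in requirements.items())

    def assign(self, requirements):
        """Return the id of the chosen worker, or None if no worker matches."""
        for index, (worker_id, properties) in enumerate(self.workers):
            if self._matches(properties, requirements):
                # Move the chosen worker to the end of the queue.
                self.workers.append(self.workers.pop(index))
                return worker_id
        return None
```

With two workers offering `{"env": "c"}`, consecutive jobs requiring that environment alternate between them, which is the round-robin behaviour mentioned above.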

Forwarding jobs

Information about a job can be divided in two disjoint parts -- what the worker needs to know to process it and what the broker needs to forward it to the correct worker. It remains to be decided how this information will be transferred to its destination.

It is technically possible to transfer all the data required by the worker at once through the broker. This package could contain submitted files, test data, requirements on the worker, etc. A drawback of this solution is that both submitted files and test data can be rather large. Furthermore, it is likely that test data would be transferred many times.

Because of these facts, we decided to store data required by the worker using a shared storage space and only send a link to this data through the broker. This approach leads to a more efficient network and resource utilization (the broker doesn't have to process data that it doesn't need), but also makes the job submission flow more complicated.

Further requirements

The broker can be viewed as a central point of the backend. While it has only two primary, closely related responsibilities, other requirements have arisen (forwarding messages about job evaluation progress back to the frontend) and will arise in the future. To facilitate such requirements, its architecture should allow simply adding new communication flows. It should also be as asynchronous as possible to enable efficient communication with external services, for example via HTTP.

Worker

Worker is a component which is supposed to execute incoming jobs from the broker. As such, the worker should support a wide range of different infrastructures and maybe even platforms/operating systems. Support of at least two main operating systems is desirable and should be implemented.

The worker as a service does not have to be very complicated, but a bit of complex behaviour is needed. This complexity is almost exclusively concerned with robust communication with the broker, which has to be regularly checked. A ping mechanism is usually used for this in all kinds of projects. This means that the worker should be able to send ping messages even during execution. So the worker has to be divided into two separate parts, one which handles communication with the broker and another which executes jobs.

The easiest solution is to have these parts in separate threads which tightly communicate with each other. For the communication between them, numerous technologies can be used, from shared memory to condition variables or some kind of in-process messages. The already used ZeroMQ library can provide in-process messages working on the same principles as network communication, which is quite handy and solves problems with thread synchronization and such.
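The split into a listening and an executing thread connected by in-process messages can be illustrated as follows. Note that this sketch deliberately substitutes the Python standard library `queue.Queue` for ZeroMQ in-process sockets to stay dependency-free; the message shapes and the `work` callable are hypothetical.

```python
import queue
import threading


def execution_loop(jobs, events):
    """Executing part: process jobs from an in-process queue until None arrives.

    `jobs` carries dicts {"id": ..., "work": callable} (hypothetical format),
    `events` carries progress messages back to the listening part.
    """
    while (job := jobs.get()) is not None:
        events.put(("started", job["id"]))
        job["work"]()  # may take a long time; the listening part is not blocked
        events.put(("finished", job["id"]))


def start_executor():
    """Start the executing part in its own thread and return its channels."""
    jobs, events = queue.Queue(), queue.Queue()
    thread = threading.Thread(target=execution_loop, args=(jobs, events))
    thread.start()
    return thread, jobs, events
```

The listening part keeps running in the original thread: it puts incoming jobs into `jobs`, forwards whatever appears in `events`, and is free to answer pings from the broker while a job is being executed. Putting `None` into `jobs` shuts the executor down.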

Evaluation

At this point we have a worker with two internal parts, a listening one and an executing one. The implementation of the first one is quite straightforward and clear. So let us discuss what should be happening in the execution subsystem.

After the successful arrival of a job, the worker has to prepare a new execution environment, then the solution archive has to be downloaded from the fileserver and extracted. The job configuration is located within these files; it is loaded into internal structures and executed. After that, the results are uploaded back to the fileserver. These steps are the basic ones which are really necessary for the whole execution and have to be executed in this precise order.

Job configuration

Jobs as work units can vary quite a lot and do completely different things, which means that the configuration and the worker have to be prepared for this kind of generality. The configuration and its design were already discussed above; the implementation in the worker is then quite straightforward.

The worker has internal structures into which it loads and in which it stores the metadata given in the configuration. The whole job is mapped to a job metadata structure and tasks are mapped to either external or internal ones (internal commands have to be defined within the worker); the two kinds differ in whether they are executed in a sandbox or as internal worker commands.

Another division of tasks is by the task-type field in the configuration. This field can have four values: initiation, execution, evaluation and inner. All of them were discussed and described above in the configuration analysis. What is important to the worker is how to behave if the execution of a task of some particular type fails.

There are two possible situations: execution fails due to a bad user solution or due to some internal error. If execution fails on an internal error, the solution cannot be declared as failed overall. The user should not be punished for a bad configuration or some network error. This is where task types are useful. Generally, initiation, execution and evaluation are tasks which in some way execute code given by the users who submitted the solution of the exercise. If tasks of these kinds fail, it is probably connected with a bad user solution and the job can be evaluated. But if some inner task fails, the solution should be re-executed, in the best case on a different worker. That is why, if an inner task fails, the job is sent back to the broker, which will reassign it to another worker. More on this subject should be discussed in the broker assigning algorithms section.
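The decision described above amounts to a small dispatch on the task type. In the sketch below, the function name and the returned action labels are hypothetical; only the four task types come from the text.

```python
# Task types whose failure is attributed to the submitted solution.
USER_RELATED_TYPES = {"initiation", "execution", "evaluation"}


def action_on_failure(task_type):
    """Decide what the worker should do when a task of the given type fails.

    Returns "evaluate" when the failure is most likely caused by the user
    solution (the job result is reported as usual), or "reassign" when an
    inner task failed and the job should go back to the broker.
    """
    if task_type in USER_RELATED_TYPES:
        return "evaluate"
    if task_type == "inner":
        return "reassign"
    raise ValueError(f"unknown task type: {task_type}")
```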

Job working directories

There is also the question of the working directory or directories of a job: which directories should be used and what for. There is one simple answer to this: every job will have only one specified directory which will contain every file the worker works with in the scope of the whole job execution. This solution is easy but fails for logical and security reasons.

The least that must be done is to have two folders, one for internal temporary files and a second one for evaluation. The directory for temporary files is enough to cover all kinds of internal work with the filesystem, but only one directory for the whole evaluation is not enough.

The solution which was chosen in the end is to have folders for the downloaded archive, the decompressed solution, an evaluation directory in which the user solution is executed, and then folders for temporary files and for results and generally files which should be uploaded back to the fileserver with the solution results.

There also has to be a hierarchy which separates folders of different workers on the same machine. That is why paths to directories are in the format ${DEFAULT}/${FOLDER}/${WORKER_ID}/${JOB_ID}, where default means the default working directory of the whole worker and folder is a particular directory for some purpose (archives, evaluation, ...).

The mentioned division of job directories proved to be flexible and detailed enough; everything is in logical units and where it is supposed to be, which means that searching through this system should be easy. In addition, if users' solutions have access only to the evaluation directory, then they do not have access to unnecessary files, which is better for the overall security of the whole ReCodEx.
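The ${DEFAULT}/${FOLDER}/${WORKER_ID}/${JOB_ID} layout can be expressed as a one-line helper. The function name and the concrete folder names used in the example are hypothetical; POSIX-style paths are assumed for readability.

```python
from pathlib import PurePosixPath


def job_directory(default, folder, worker_id, job_id):
    """Build the path ${DEFAULT}/${FOLDER}/${WORKER_ID}/${JOB_ID}.

    `default` is the working directory of the whole worker, `folder` names
    the purpose (e.g. archives or evaluation; concrete names are assumed).
    """
    return PurePosixPath(default) / folder / str(worker_id) / str(job_id)
```

For example, `job_directory("/var/worker", "eval", 1, "job42")` gives `/var/worker/eval/1/job42`, so two workers on the same machine never share a job directory.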

Job variables

As mentioned above, the worker has job directories, but users who are writing and managing job configurations do not know where they are (on some particular worker) and how they can be accessed and written into the configuration. For this purpose we have to introduce some kind of marks or signs which will represent particular folders. These marks or signs can have the form of commonly used variables.

Variables can be used everywhere where filesystem paths are used within the configuration file. This solves the problem with a specific worker environment and a specific hierarchy of directories. The final form of variables is ${...} where the triple dot is a textual description. This format was chosen because of the special dollar sign character, which is unlikely to appear within a filesystem path; the braces are there only to delimit the textual description of the variable.
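Expanding such variables inside a path is a simple textual substitution, sketched below. The function name and the variable name `EVAL_DIR` used in the example are hypothetical, and the sketch assumes variable names consist of letters, digits and underscores; unknown variables are reported as errors rather than silently left in the path.

```python
import re

# Matches ${NAME}; the allowed characters in NAME are an assumption.
_VARIABLE = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_variables(path, variables):
    """Replace every ${NAME} in `path` with its value from `variables`."""
    def substitute(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"unknown variable: {name}")
        return variables[name]

    return _VARIABLE.sub(substitute, path)
```

For example, `expand_variables("${EVAL_DIR}/input.txt", {"EVAL_DIR": "/var/worker/eval/1/job42"})` returns `/var/worker/eval/1/job42/input.txt`, so the same configuration works on any worker regardless of its directory hierarchy.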

Supplementary files

An interesting problem concerns supplementary files (inputs, sample outputs). There are two possible approaches. Supplementary files can be downloaded either at the start of the execution or during execution. If the files are downloaded at the beginning, the execution has not really started at this point, and if there are problems with the network, the worker will find out right away and can abort the execution without executing a single task. Slight problems can arise if some of the files need to have the same name (e.g. the solution assumes that the input is input.txt); in this scenario the downloaded files cannot be renamed at the beginning but only during execution, which is impractical and not easily observed.

The second solution, in which files are downloaded on the fly, has quite the opposite problem. If there are problems with the network, the worker will find out during execution, for instance when almost the whole execution is done. This is also not an ideal solution if we care about wasted hardware resources. On the other hand, with this approach users have quite advanced control of the execution flow and know exactly what files are available during execution, which is from the users' perspective probably more appealing than the first solution. Based on that, downloading of supplementary files using 'fetch' tasks during execution was chosen and implemented.

Caching mechanism

A worker can use a caching mechanism for files from the fileserver under one condition: the provided files have to have unique names. If uniqueness is guaranteed, precious bandwidth can be saved using a cache. This requires a system which can download a file, store it in the cache and delete it after some time of inactivity. Because there can be multiple worker instances on a particular server, it is not efficient for every worker to have such a system of its own. It is therefore desirable to share this feature among all workers on the same machine.

One solution would again be a separate service connected with the workers through the network, but this would mean another component with its own communication for a purpose where it is not really needed. More importantly, it would be a single point of failure, and it would be quite a problem if it stopped working. For this reason another solution was chosen, which assumes that the worker has access to a specified cache folder into which it can download supplementary files and from which it can copy them. This means every worker is able to add downloads to the cache, but what a worker cannot properly do is delete unused files after some time. For that purpose a single-purpose component called 'cleaner' is introduced. It is a simple script executed by cron which deletes files that have been unused for some time. Together with the fetching feature of the worker, the cleaner completes the machine-specific caching system.

As mentioned, the cleaner is a simple script which is executed regularly as a cron job. With the caching system introduced in the paragraph above, there are few options for how the cleaner can be implemented. Filesystems usually support two relevant timestamps: the last access time and the last modification time. Files in the cache are downloaded once and then only copied, which means that the last modification time is set only once, when the file is created, while the last access time should be updated on every copy. This implies that the last access time is what is needed here. However, while the last modification time is widely maintained by operating systems, the last access time is often not updated by default. More on this subject can be found here. For the cleaner to work properly, the filesystem used by the worker for caching has to have last access time updates enabled.

With the cleaner as a separate component and the caching itself handled in the worker, the design is somewhat blurry and it is not obvious that it works without race conditions. The goal here is not to have a system without races, but a system which can recover from them. The implementation of the caching system is based upon atomic operations of the underlying filesystem. A description of one possible robust implementation follows, starting with the worker:

the worker discovers a fetch task which should download a supplementary file
the worker takes the name of the file and tries to copy it from the cache folder to its working folder
- if successful, the last access time is updated (by the filesystem itself) and the whole operation is done
- if not successful, the file has to be downloaded
  - the file is downloaded from the fileserver to the working folder
  - the downloaded file is then copied to the cache
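The worker-side steps above can be sketched as follows. This is only an illustration, not the actual worker code: the `download` callable is a hypothetical stand-in for the transfer from the fileserver, and publishing to the cache through a temporary file and an atomic rename is an assumption of this sketch.

```python
import os
import shutil
import tempfile


def fetch(name, cache_dir, work_dir, download):
    """Place file `name` into work_dir, using cache_dir when possible.

    `download(name, destination)` is a hypothetical callable which stands in
    for the real transfer from the fileserver.
    """
    cached = os.path.join(cache_dir, name)
    target = os.path.join(work_dir, name)
    try:
        # Reading the cached file lets the filesystem update its access time.
        shutil.copyfile(cached, target)
        return "cache"
    except FileNotFoundError:
        pass  # cache miss, or the cleaner has just removed the file

    # Download into the working folder first ...
    download(name, target)

    # ... then publish a copy to the cache. The copy goes to a temporary name
    # and is renamed afterwards (an assumption of this sketch); the rename is
    # atomic, so other workers never see a half-written file.
    fd, temporary = tempfile.mkstemp(dir=cache_dir)
    os.close(fd)
    shutil.copyfile(target, temporary)
    os.replace(temporary, cached)
    return "download"
```

If two workers miss the cache at the same time, both download the file and the later rename simply overwrites the earlier one with identical content.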

The steps above concern the worker only; the cleaner can intervene at any time and delete files. The implementation of the cleaner follows:

on its start, the cleaner stores the current timestamp as a reference for comparison and loads the configuration values of the cache folder and the maximal file age
a loop goes through all files and directories in the specified cache folder
- the last access time of the file or folder is read
- the last access time is subtracted from the reference timestamp, giving a difference
- the difference is compared against the specified maximal file age; if it is greater, the file or folder is deleted
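A minimal sketch of the cleaner loop described above is shown below. The function name and its parameters are illustrative assumptions; in a real deployment the cache folder and the maximal age would come from the cleaner's configuration, and the script would be started by cron.

```python
import os
import shutil
import time


def clean_cache(cache_dir, max_age_seconds, now=None):
    """Delete entries in cache_dir not accessed for more than max_age_seconds."""
    # The reference timestamp is taken once, at the start of the run.
    reference = time.time() if now is None else now
    removed = []
    for entry in os.scandir(cache_dir):
        try:
            age = reference - entry.stat(follow_symlinks=False).st_atime
            if age <= max_age_seconds:
                continue
            if entry.is_dir(follow_symlinks=False):
                shutil.rmtree(entry.path)
            else:
                os.remove(entry.path)
            removed.append(entry.name)
        except FileNotFoundError:
            # The entry vanished in the meantime; nothing is left to do.
            continue
    return removed
```

The sketch relies on the filesystem updating the last access time, as discussed above; on a filesystem mounted without access time updates every file would eventually look unused.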

The previous description implies that there is a gap between reading the last access time and deleting the file within the cleaner. A worker can access the file during this gap and the file is deleted anyway, but this is fine: the file is deleted, but the worker already has its copy. Another problem might arise when two workers download the same file, but this is not an issue either, because the file is first downloaded to the working folder and only then copied to the cache. Even if something else fails unexpectedly and the fetch task fails during execution, that should still be fine. Fetch tasks should have the 'inner' task type, which implies that a failure in this task stops the whole execution and the job is reassigned to another worker. This serves as the last resort in case everything else goes wrong.

Sandboxing

There are numerous ways to approach sandboxing on different platforms, and describing all of them is out of the scope of this document. Instead, let us have a look at some of the features which are certainly needed for ReCodEx and propose some particular sandbox implementations for Linux and Windows.

The general purpose of a sandbox is to safely execute software in any form, from scripts to binaries. Sandboxes differ in how safe they are and which limiting features they offer. Ideally, a sandbox has numerous options and corresponding features which allow administrators to set up the environment as they like and which do not allow user programs to damage the executing machine in any way.

ReCodEx and its evaluation process need at least these features: execution time and memory limits, a limit on disk operations, disk accessibility restrictions and network restrictions. Combined and implemented well, these features give a reasonably safe sandbox which can be used for all kinds of user solutions and which should be able to restrict and stop any standard kind of attack or error.

Linux

Linux systems have quite extensive support for sandboxing in the kernel. Kernel namespaces and cgroups were introduced, which combined can limit hardware resources (CPU, memory) and separate the executed program into its own namespace (PID, network). These two features satisfy the sandbox requirements of ReCodEx, so there were two options: either find an existing solution or implement a new one. Luckily, an existing solution was found; its name is isolate. Isolate does not use all available kernel features, but only a subset which is still sufficient for ReCodEx.

Windows

The situation in the Windows world is the opposite: kernel support is limited, which makes sandboxing a bit trickier. The Windows kernel only offers ways to restrict the privileges of a process through restriction of internal access tokens. Monitoring of hardware resources is not possible, but the resources used can be obtained through newly created job objects.

There are numerous sandboxes for Windows, but they all focus on different things. In many cases they serve as a safe environment for running malicious programs, viruses in particular, or they are designed as a separate filesystem namespace for installing many temporarily used programs. Among these we can mention Sandboxie, Comodo Internet Security, Cuckoo sandbox and many others. None of them fits as a sandbox solution for ReCodEx. With that said, we can safely state that designing and implementing a new general sandbox for Windows is out of the scope of this project.

Designing a sandbox for one specific environment is possible, though, namely for C# and .NET. The CLR, as a virtual machine and runtime environment, has fairly good security support for restrictions and separation, which is also available from C#. This makes it quite easy to implement a simple sandbox within C#, but there are no well-known general-purpose implementations. As stated in the previous paragraph, implementing our own solution is out of the scope of this project. However, a C# sandbox is a good topic for another project, for example a term project for a C# course, so it might be written and integrated in the future.

Fileserver

The fileserver provides access to a shared storage space that contains files submitted by students, supplementary files such as test inputs and outputs and results of evaluation. In other words, it acts as an intermediate storage node for data passed between the frontend and the backend. This functionality can be easily separated from the rest of the backend features, which led to designing the fileserver as a standalone component. Such design helps encapsulate the details of how the files are stored (e.g. on a file system, in a database or using a cloud storage service), while also making it possible to share the storage between multiple ReCodEx frontends.

For early releases of the system, we chose to store all files on the file system -- it is the least complicated solution (in terms of implementation complexity) and the storage backend can be rather easily migrated to a different technology.

One of the facts we learned from CodEx is that many exercises share test input and output files, and also that these files can be rather large (hundreds of megabytes). A direct consequence of this is that we cannot add these files to submission archives that are to be downloaded by workers -- the combined size of the archives would quickly exceed gigabytes, which is impractical. Another conclusion we made is that a way to deal with duplicate files must be introduced.

A simple solution to this problem is storing supplementary files under the hashes of their content. This ensures that every file is stored only once. On the other hand, it makes it more difficult to understand what the content of a file is at a glance, which might prove problematic for the administrator.
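Storing files under the hash of their content can be sketched as follows. The choice of SHA-1 and the flat directory layout are assumptions made for this illustration; they are not necessarily what the fileserver uses.

```python
import hashlib
import os
import shutil


def store_by_hash(source_path, storage_dir):
    """Copy a file into storage_dir under the hash of its content."""
    digest = hashlib.sha1()  # the hash function is an assumption of this sketch
    with open(source_path, "rb") as source:
        # Read in chunks so that large test inputs do not have to fit in memory.
        for chunk in iter(lambda: source.read(65536), b""):
            digest.update(chunk)
    name = digest.hexdigest()
    destination = os.path.join(storage_dir, name)
    if not os.path.exists(destination):
        # Identical content is stored only once.
        shutil.copyfile(source_path, destination)
    return name
```

Two exercises which upload the same input file receive the same name back, so the storage holds a single copy regardless of how the file was originally called.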

A notable part of the fileserver's work is done by a web server (e.g. listening to HTTP requests and caching recently accessed files in memory for faster access). What remains to be implemented is handling requests that upload files -- student submissions should be stored in archives to facilitate simple downloading and supplementary exercise files need to be stored under their hashes.

We decided to use Python and the Flask web framework. This combination makes it possible to express the logic in ~100 SLOC and also provides means to run the fileserver as a standalone service (without a web server), which is useful for development.

Monitor

Users want to view the evaluation progress of their solution in real time. This can be easily done over an established bidirectional connection stream, but that is hard to achieve with plain web technologies. The HTTP protocol works differently, on the basis of separate requests with no long-term connection. However, there is a widely used technology which solves this problem: the WebSocket protocol.

Working with the WebSocket protocol from the backend is possible, but not ideal from the design point of view. The backend should be hidden from the public internet to minimize the surface for possible attacks. With this in mind, there are two possible options:

send progress messages through API
make separate component for progress messages

Each of the two possibilities has its pros and cons. The first one is good because there is no additional component and the API is already publicly visible. On the other hand, working with the WebSocket protocol from PHP is not very pleasant (although it is possible) and embedding this functionality into the API is not extensible. The second approach makes it easier to change the protocol in the future or to implement extensions such as caching of messages. Also, the progress feature is considered optional, because there may be clients for which it is useless. The major drawback of a separate component is that it is another part which needs to be publicly exposed.

We decided to make a separate component, mainly because it is a smaller component with only one role, it is easier to maintain and the progress callback remains optional.

There are several possibilities for how to write the component. The options considered were the languages already used in the project: C++, PHP, JavaScript and Python. In the end, Python was chosen for its simplicity and great support for all the technologies used, and also because there are available Python developers in our team. Next, the responsibility of this component has to be determined. The concept of the message flow is shown in the following picture.

The input message channel of the monitor uses ZeroMQ, the main messaging framework used by the backend. This decision keeps the rest of the backend unaware of the communication protocol used towards the clients and of its related libraries. The output channel is WebSocket, a protocol for sending messages to web browsers. In Python, there are several WebSocket libraries. The most popular one is websockets in cooperation with asyncio. This combination is easy to use and well documented, so it is used in the monitor component too. For ZeroMQ, there is the zmq library, which is a binding to the framework core written in C++.

Incoming messages are cached for a short period of time. Early testing showed that the backend can start sending progress messages before the client connects to the monitor. To solve this, the messages of each job are held for 5 minutes after the reception of the last message. At the time of connection, the client receives all messages received so far, so no message is lost.
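The caching behaviour described above can be illustrated by the following sketch. The class and method names are hypothetical and the ZeroMQ and WebSocket parts are left out; only the five-minute retention after the last message is taken from the text.

```python
import time


class ProgressCache:
    """Keep the progress messages of each job for a while after the last one."""

    def __init__(self, ttl_seconds=300, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._jobs = {}  # job id -> (time of last message, list of messages)

    def add(self, job_id, message):
        """Store a message which arrived from the backend."""
        _, messages = self._jobs.get(job_id, (None, []))
        messages.append(message)
        self._jobs[job_id] = (self._clock(), messages)

    def replay(self, job_id):
        """Return all messages received so far, e.g. for a newly connected client."""
        self._expire()
        _, messages = self._jobs.get(job_id, (None, []))
        return list(messages)

    def _expire(self):
        # Drop jobs whose last message is older than the retention period.
        now = self._clock()
        stale = [job for job, (last, _) in self._jobs.items()
                 if now - last > self._ttl]
        for job in stale:
            del self._jobs[job]
```

A client which connects late calls `replay` once and then continues with live messages, so it sees the whole progress of the job.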

API server

The API server must handle HTTP requests and manage the state of the application in some kind of a database. It must also be able to communicate with the backend over ZeroMQ.

We considered several technologies which could be used:

PHP + Apache -- one of the most widely used technologies for creating web servers. It is a suitable technology for this kind of a project. It has all the features we need when some additional extensions are installed (to support LDAP or ZeroMQ).
Ruby on Rails, Python (Django), etc. -- popular web technologies that appeared in the last decade. Both support ZeroMQ and LDAP via extensions and have large developer communities.
ASP.NET (C#), JSP (Java) -- these technologies are very robust and are used to create server technologies in many big enterprises. Both can run on Windows and Linux servers (ASP.NET using the .NET Core).
JavaScript (Node.js) -- a fairly new technology which has lately been used to create REST APIs. Applications running on Node.js are quite performant and the number of open-source libraries available on the Internet is very large.

We chose PHP and Apache mainly because we were familiar with these technologies and we were able to develop all the features we needed without learning to use a new technology. This was important since the number of features was quite high and we needed to meet a strict deadline. This does not mean that we find all the other technologies superior to PHP in all other aspects -- PHP 7 is a mature language with a huge community and a wide range of tools, libraries, and frameworks.

We decided to use an ORM framework to manage the database, namely the widely used PHP ORM Doctrine 2. Using an ORM tool means we do not have to write SQL queries by hand. Instead, we work with persistent objects, which provides a higher level of abstraction. Doctrine also has a robust database abstraction layer so the database engine is not very important and it can be changed without any need for changing the code. MariaDB was chosen as the storage backend.

To speed up the development process of the PHP server application we decided to use a web framework. After evaluating and trying several frameworks, such as Lumen, Laravel, and Symfony, we ended up using Nette. This framework is very common in the Czech Republic -- its lead developer is the well-known Czech programmer David Grudl -- and we were already familiar with the patterns used in this framework, such as dependency injection, authentication and routing. These concepts are useful even when developing a REST application, which might be a surprise considering that Nette focuses on "traditional" web applications. There is also a Nette extension which makes integration of Doctrine 2 very straightforward.

Architecture of the system

The Nette framework is an MVP (Model, View, Presenter) framework. It has many tools for creating complex websites, but we need only a subset of them, and for some parts we use different libraries which suit our purposes better:

Model - the model layer is implemented using the Doctrine 2 ORM instead of Nette Database
View - the whole view layer of the Nette framework (e.g., the Latte engine used for HTML template rendering) is unnecessary since we will return all the responses encoded in JSON. JSON is a common format used in APIs and we decided to prefer it to XML or a custom format.
Presenter - the whole lifecycle of a request processing of the Nette framework is used. The Presenters are used to group the logic of the individual API endpoints. The routing mechanism is modified to distinguish the actions by both the URL and the HTTP method of the request.

Request handling

A typical scenario for handling an API request is matching the HTTP request with a corresponding handler routine which creates a response object, that is then sent back to the client, encoded with JSON. The Nette\Application package can be used to achieve this with Nette, although it is meant to be used mainly in MVP applications.

Matching HTTP requests with handlers can be done using standard Nette URL routing -- we will create a Nette route for each API endpoint. Using the routing mechanism from Nette logically leads to implementing handler routines as Nette Presenter actions. Each presenter should serve logically related endpoints.

The last step is encoding the response as JSON. In Nette\Application, HTTP responses are returned using the Presenter::sendResponse() method. We decided to write a method that calls sendResponse internally and takes care of the encoding. This method has to be called in every presenter action. An alternative approach would be using the internal payload object of the presenter, which is more convenient, but provides us with less control.

Authentication

To make certain data and actions accessible only to specific users, there must be a way for these users to prove their identity. We decided to avoid PHP sessions in order to keep the server stateless (with sessions, the session ID is stored in the cookies of the HTTP requests and responses). After the identity of the user is verified (e.g., by providing an email and a password), the server issues a specific token for the user and sends it to the client in the body of the HTTP response. The client must remember this token and attach it to every following request in the Authorization header.

The token must be valid only for a certain time period ("log out" the user after a few hours of inactivity) and it must be protected against abuse (e.g., an attacker must not be able to issue a token which will be considered valid by the system and using which the attacker could pretend to be a different user). We decided to use the JWT standard (the JWS).

The JWT is a string which consists of three base64url-encoded parts separated by dots: a header, a payload, and a signature. The header and the payload are JSON documents. The interesting parts are the payload and the signature: the payload can contain any data which identifies the user, together with metadata of the token (e.g., the time when the token was issued and the time of expiration). The last part is a digital signature of the header and the payload, which ensures that nobody can issue their own token and steal someone's identity. Together, these characteristics give us the opportunity to validate a token without storing all of the tokens in the database.

To implement JWT in Nette, we have to implement some of its security-related interfaces such as IAuthenticator and IUserStorage, which is rather easy thanks to the simple authentication flow. Replacing these services in a Nette application is also straightforward, thanks to its dependency injection container implementation. The encoding and decoding of the tokens themselves, including generating and verifying the signature, is done through a widely used third-party library, which lowers the risk of having a bug in the implementation of this critical security feature.

Forgotten password

Authentication and the handling of passwords bring along the problem of forgotten credentials, especially passwords. There has to be some kind of mechanism for obtaining a new password or changing the old one.

First, there are ways of handling this which are not secure at all and cannot be recommended, for example sending the old password through email. A better, but still not secure, solution is to generate a new password and again send it through email.

The latter solution was used in CodEx: users had to write an email to the administrator, who generated a new password and sent it back to the sender. This simple solution could also be automated, but handling it manually gave the administrator a lot of control over the whole process. This might come in handy if some additional checks are needed, but on the other hand it can be quite time consuming.

Probably the best solution, which is often used and is fairly secure, is described below. Let us consider only the case in which all users have to fill in their email addresses in the system and these addresses are safely in the hands of the right users.

When a user finds out that he/she does not remember the password, he/she requests a password reset and fills in his/her unique identifier, which might be an email address or a unique nickname. Based on the matched user account, the system generates a unique access token and sends it to the user's email address. This token should be time limited and usable only once, so that it cannot be misused. The user then takes the token or the URL provided in the email and goes to the appropriate section of the system, where a new password can be set. After that, the user can sign in with his/her new password.
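The two key properties of the reset token, a limited lifetime and single use, can be sketched as follows. The class name, the one-hour lifetime and the in-memory storage are assumptions made for the illustration; a real system would keep the pending tokens in its database and send the token by email.

```python
import hashlib
import secrets
import time


class PasswordResetTokens:
    def __init__(self, lifetime_seconds=3600, clock=time.time):
        self._lifetime = lifetime_seconds
        self._clock = clock
        self._pending = {}  # hash of token -> (user id, expiration time)

    def issue(self, user_id):
        """Create a random token; the caller would send it to the user's email."""
        token = secrets.token_urlsafe(32)
        expires = self._clock() + self._lifetime
        # Only a hash is stored, so a leaked table does not reveal valid tokens.
        self._pending[self._digest(token)] = (user_id, expires)
        return token

    def redeem(self, token):
        """Return the user id for a valid token and invalidate it, else None."""
        entry = self._pending.pop(self._digest(token), None)
        if entry is None:
            return None
        user_id, expires = entry
        return user_id if self._clock() <= expires else None

    @staticmethod
    def _digest(token):
        return hashlib.sha256(token.encode("utf-8")).hexdigest()
```

Because `redeem` removes the entry before checking anything else, a token can never be used twice, even if the first attempt comes after the expiration.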

As previously stated, this solution is quite safe and users can handle it on their own, so the administrator does not have to worry about it. That is the main reason why this approach was chosen.

Uploading files

There are two cases when users need to upload files using the API -- submitting solutions to an assignment and creating a new exercise. In both of these cases, the final destination of the files is the fileserver. However, the fileserver is not publicly accessible, so the files have to be uploaded through the API.

The files can be either forwarded to the fileserver directly, without any interference from the API server, or stored and forwarded later. We chose the second approach, which is harder to implement, but more convenient -- it lets exercise authors double-check what they upload to the fileserver and solutions to assignments can be uploaded in a single request, which makes it easy for the fileserver to create an archive of the solution files.

Permissions

A system which stores user data has to implement some kind of permission checking. The previous chapters imply that each user has to have a role which corresponds to his/her privileges. Our research showed that three roles are sufficient -- student, supervisor and administrator. The user role has to be checked with every request. The good point is that the roles match the granularity of the API endpoints nicely, so the permission check can be done at the beginning of each request. This is implemented using PHP annotations, which allow specifying the allowed user roles for each request with very little code, while all the business logic stays together in one place.

However, roles cannot cover all cases. For example, if a user is a supervisor, this relates only to the groups in which he/she is a supervisor, but using roles alone would allow him/her to act as a supervisor in all groups in the system. Unfortunately, this cannot be easily fixed using annotations, because there are many different cases in which this problem occurs. To fix that, some additional checks can be performed at the beginning of request processing. Usually these are only one or two simple conditions.
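The combination of a declarative role check and an additional condition inside the handler can be illustrated with a Python decorator, which plays a role similar to the PHP annotations mentioned above. All names in this sketch (`allowed_roles`, `assign_exercise`, the dictionary-based user and group) are hypothetical.

```python
import functools


class Forbidden(Exception):
    """Raised when the current user may not perform the action."""


def allowed_roles(*roles):
    """Reject the request unless the user's role is one of `roles`."""
    def decorate(handler):
        @functools.wraps(handler)
        def wrapper(user, *args, **kwargs):
            if user["role"] not in roles:
                raise Forbidden(f"role {user['role']!r} may not call {handler.__name__}")
            return handler(user, *args, **kwargs)
        return wrapper
    return decorate


@allowed_roles("supervisor", "administrator")
def assign_exercise(user, group, exercise):
    # Additional check which a role alone cannot express: a supervisor may
    # act only in the groups which he/she actually supervises.
    if user["role"] == "supervisor" and user["id"] not in group["supervisors"]:
        raise Forbidden("not a supervisor of this group")
    group["assignments"].append(exercise)
    return group["assignments"]
```

The role list stays next to the endpoint definition, while the group-specific condition is a single `if` at the beginning of the handler.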

With these two concepts together it is possible to cover all cases of permission checking with quite a small amount of code.

Solution loading

When the evaluation of a solution on the backend is finished, the results are saved to the fileserver and the API is notified by the broker. Some further steps need to be done at that moment before the results can be presented to the users, such as parsing the results, calculating the final score, or saving the structured data into the database. There are two main possibilities for when to process the results:

immediately after the API server is notified by the backend
when a user requests the results for the first time

These options are almost equal; neither of them provides a big advantage. Loading solutions immediately has the benefit that fetching the results by the client for the first time can be a bit faster, as the results are already processed. On the other hand, processing the results on demand can save some resources when the results are not important (e.g., the student finds a bug in his/her solution before the submission has been evaluated).

We decided on lazy loading at the time when the results are requested for the first time. However, we also introduced the concept of asynchronous jobs. This type of job is useful for batch submitting of jobs, for example re-running jobs which failed because of a hardware issue on a worker. These jobs are typically submitted by a different user than the author (an administrator, for example), so the original authors should be notified. In this case it is more reasonable to load the results immediately and optionally send the authors a notification via email. This is exactly what we do.
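The difference between the two strategies can be shown on a short sketch. The names below are hypothetical; `fetch_raw_results` stands for downloading the results from the fileserver and `parse_results` for parsing them and computing the score.

```python
class Submission:
    """Holds the evaluation of one solution and loads it at most once."""

    def __init__(self, fetch_raw_results, parse_results):
        self._fetch = fetch_raw_results
        self._parse = parse_results
        self._evaluation = None

    def load_evaluation(self):
        # Lazy loading: the results are processed on the first request only.
        if self._evaluation is None:
            self._evaluation = self._parse(self._fetch())
        return self._evaluation


def on_job_finished(submission, asynchronous, notify):
    """Called when the backend reports that a job has finished."""
    if asynchronous:
        # Asynchronous jobs are loaded immediately so that the author can be
        # notified about the outcome right away.
        notify(submission.load_evaluation())
    # Other jobs are left untouched until a user asks for the results.
```

For ordinary jobs nothing is processed until `load_evaluation` is called by a request handler, while asynchronous jobs are processed directly in `on_job_finished`.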

It seems with the benefit of hindsight that immediate loading of all jobs could simplify the code and it has no major drawbacks. In the next version of ReCodEx we will re-evaluate this decision.

Communication with the backend

Backend failure reporting

The backend is a separate component which does not communicate with the administrators directly. When it encounters an error, it stores it in a log file. It would be handy to inform the administrator directly at this moment so that he/she can fix the cause of the error as soon as possible. The backend does not have any mechanism for notifying users, for example by email. The API server, on the other hand, has email sending implemented and can easily forward any messages to the administrator. A secured communication protocol between the backend and the frontend already exists (it is used for reporting that the processing of a job has finished) and it is easy to add another endpoint for error reporting.

When a request for sending a report arrives from the backend, the type of the report is inferred, and if it is an error which deserves the attention of the administrator, an email is sent to him/her. There can also be errors which are not that important (e.g., errors which were resolved by the backend itself or which are only informative). These do not have to be reported by email and can simply be stored in the persistent database for further consideration.

On top of that the separate backend component does not have to be exposed to the outside network at all.

If a job processing fails then the backend informs the API server which initiated processing of the job. If an error which is not related to job-processing occurs then the backend must communicate with a given API server which is configured by the administrator while the other API servers which are using the same backend are not informed.

Backend state monitoring

The next thing related to communication with the backend is monitoring its current state. This mainly concerns which workers are available for processing in the different hardware groups and which languages can therefore be used in exercises.

A further step would be monitoring the overall backend state, such as how many jobs were processed by a particular worker, the workload of the broker and the workers, etc. The easiest solution is to manage this information by hand: every instance of the API server has to have an administrator who fills it in. This covers only the currently available workers and runtime environments, which do not change very often. Real-time statistics of the backend cannot reasonably be made accessible this way.

A better solution is to update this information automatically. This can be done in two ways:

the backend provides this information on demand when the API needs it
the backend sends this information to the API periodically

Information such as the currently available workers or runtime environments should be really up to date, so it could be provided on demand when needed. Backend statistics are less critical and could be updated periodically.

However, due to the lack of time, automatic monitoring of the backend state will not be implemented in the early versions of this project, but it might be implemented in one of the next releases.

Web-app

The web application is one of the possible client applications of the ReCodEx system. Creating a web application as a client has several advantages:

no installation or setup is required on the user's device
works on all platforms including mobile platforms
when a new version is rolled out all the clients will use this version without any need for installing an update manually

One of the downsides is the large number of different web browsers (including older versions of a specific browser) and their different interpretations of the code (HTML, CSS, JS). Some features of the latest HTML5 specifications are implemented only in some browsers, which are used by only a subset of Internet users. This has to be taken into account when choosing appropriate tools for the implementation of a website.

There are three basic ways to create a website these days:

server-side approach - user's actions are processed on the server and the HTML code with the results of the action is generated on the server and sent back to the user's Internet browser. The client does not handle any logic (apart from rendering of the user interface and some basic user interaction) and is therefore very simple. The server can use the API server for processing of the actions, so the business logic of the server can be very simple as well. A disadvantage of this approach is that a lot of redundant data is transferred across the requests, although some parts of the content can be cached (e.g., CSS files). This results in longer loading times of the website.
server-side rendering with asynchronous updates (AJAX) - a slightly different approach is to render the page on the server as in the previous case, but then execute user's actions asynchronously using the XMLHttpRequest JavaScript functionality, which creates an HTTP request and transfers only the part of the website which will be updated.
client-side approach - the opposite approach is to move the communication with the API server and the rendering of the HTML completely from the server to the client. The client runs the code (usually JavaScript) in his/her web browser and the content of the website is generated based on the data received from the API server. The script file is usually quite large, but it can be cached and does not have to be downloaded from the server again (until the cached file expires). Only the data from the API server needs to be transferred over the Internet, which reduces the volume of the payload of each request and leads to a much more responsive user experience, especially on slower networks. Since the client-side code has full control over the UI, more sophisticated user interactions with the UI can be achieved.

All of these approaches are used in production by web developers, all of them are well documented, and there are mature tools for creating websites using any of them.

We decided to use the third approach -- to create a fully client-side application which would be familiar and intuitive for a user who is used to modern web applications.

User documentation

Users interact with ReCodEx through the web application. A modern web browser with good HTML5 and CSS3 support is required. Among other features, cookies and local storage are used. The browser must also provide a decent JavaScript runtime.

Supported and tested browsers are: Firefox 50+, Chrome 55+, Opera 42+ and Edge 13+. Mobile devices often have problems with internationalization and may lack support for some common features of desktop browsers. At this stage of development, it is not possible for us to fine-tune the interface for major mobile browsers on all mobile platforms. However, it is confirmed to work with the latest Google Chrome and Gello browser on Android 7.1+. Issues have been reported with Firefox that will be fixed in the future. Also, it is confirmed to work with the Safari browser on iOS 10.

Usage of the web application is divided into sections concerning particular user roles. All possible use cases can be found under these sections. The sections are inclusive, so more privileged users need to read the instructions for all less privileged roles as well. The described roles are:

Student
Group supervisor
Group administrator
Instance administrator
Superadministrator

Terminology

Instance -- Represents a university, company or some other organization unit. Multiple instances can exist in a single ReCodEx installation.

Group -- A group of students to which exercises are assigned by a supervisor. It should typically correspond with a real world lab group.

User -- A person that interacts with the system using the web interface (or an alternative client).

Student -- A user with least privileges who is subscribed to some groups and submits solutions to exercise assignments.

Supervisor -- A person responsible for assigning exercises to a group and reviewing submissions.

Admin -- A person responsible for the maintenance of the system and fixing problems supervisors cannot solve.

Exercise -- An algorithmic problem that can be assigned to a group. Exercises can be shared by the teachers using an exercise database in ReCodEx.

Assignment -- An exercise assigned to a group, possibly with modifications.

Runtime environment -- A unique combination of a platform (OS) and a programming language runtime/compiler in a specific version. Runtime environments are managed by the administrators to reflect the abilities of the whole system.

Hardware group -- A set of workers with similar hardware. Its purpose is to group workers that are likely to run a program using the same amount of resources. Hardware groups are managed by the system administrators, who have to keep them up-to-date.

General basics

This section describes the basics which are the same for all users of the ReCodEx web application.

First steps in ReCodEx

You can create an account by clicking the "Create account" menu item in the left sidebar. You can choose between two types of registration methods -- by creating a local account with a specific password, or pairing your new account with an existing CAS UK account.

If you decide to create a new local account using the "Create ReCodEx account" form, you will have to provide your details and choose a password for your account. Although ReCodEx allows quite weak passwords, it is wise to use a stronger one. The actual strength is shown in a progress bar near the password field during registration. You will later sign in using your email address as your username and the password you select.

If you decide to use the CAS UK service, then ReCodEx will verify your CAS credentials and create a new account based on information stored there (name and email address). You can change your personal information later on the "Settings" page.

Regardless of the account type, you must select the instance the account will belong to. The instance will most likely be your university or another organization you are a member of.

To log in, go to the homepage of ReCodEx and choose the menu item "Sign in" in the left sidebar. Then enter your credentials into one of the two forms -- if you selected a password during registration, you should sign in with your email and password in the first form called "Sign into ReCodEx". If you registered using the Charles University Authentication Service (CAS), you should put your student number and your CAS password into the second form called "Sign into ReCodEx using CAS UK".

There are several options you can edit in your user account:

changing your personal information (i.e., name)
changing your credentials (email and password)
updating your preferences (source code viewer/editor settings, default language)

You can access the settings page through the "Settings" button right under your name in the left sidebar.

If you are not active in ReCodEx for a whole day, you will be logged out automatically. However, we recommend you sign out of the application after you finish your interaction with it. The logout button is placed in the top section of the left sidebar right under your name. You may need to expand the sidebar with the button next to the "ReCodEx" title (informally known as the hamburger button), depending on your screen size.

Forgotten password

If you cannot remember your password and you do not use CAS UK authentication, then you can reset your password. You will find a link saying "Cannot remember what your password was? Reset your password." under the sign in form. After you click this link, you will be asked to submit your registration email address. A message with a link containing a special token will be sent to you by e-mail -- this way we make sure that the person who requested the password reset is really you. When you visit the link, you will be able to enter a new password for your account. The token is valid only for a couple of minutes, so do not forget to reset the password as soon as possible, or you will have to request a new link with a valid token.

If you sign in through CAS UK, then please follow the instructions provided by the administrators of the service described on their website.

Dashboard

When you log into the system, you should be redirected to your "Dashboard". On this page you can see some brief information about the groups you are a member of. The information presented there varies with your role in the system -- further description of the dashboard is provided later on with the respective roles.

Student

Student is the default role for every newly registered user. This role has quite limited capabilities in ReCodEx. Generally, a student can only submit solutions of exercises in some particular groups. These groups should correspond to the courses he/she attends.

On the "Dashboard" page there is a "Groups you are student of" section where you can find a list of your student groups. In the first column of every row there is a brief panel describing the group. It contains the name of the group and the percentage of points gained in the course. If you have enough points to successfully complete the course, this panel has a green background with a tick sign. In the second column there is a list of assigned exercises with their deadlines. If you want to quickly get to the group's page, you can use the provided "Show group's detail" button.

Join group and start solving assignments

To be able to submit solutions, you have to be a member of the right group. Each instance has its own group hierarchy, so you can choose only the groups within your instance. That is why the list of groups is available under the instance link located in the sidebar. This link brings you to the instance detail page.

There you can see a description of the instance and, most importantly, in the "Groups hierarchy" box there is a hierarchical list of all public groups in the instance. Please note that groups with a plus sign are collapsed and can be expanded further. When you find a group you would like to join, continue by clicking on the "See group's page" link followed by the "Join group" link.

Note: Some groups can be marked as private. These groups are not visible in the hierarchy and students cannot join them by themselves. Management of students in groups of this type is in the hands of supervisors.

On the group detail page there are several interesting things for you. The first one is a brief overview with information describing the group, a list of supervisors and the hierarchy of subgroups. Most importantly, there is the "Student's dashboard" section. This section contains a list of assignments and a list of fellow students. If the supervisors of the group allowed students to see each other's statistics, the number of points the students gained will also be displayed.

In the "Assignments" box on the group detail page there is a list of assigned exercises which students are supposed to solve. The assignments are displayed with their names and deadlines. There can be two deadlines. A successful solution submitted before the first deadline receives the full number of points. The second deadline does not have to be set, but if it is, the maximum number of points for a successful solution submitted between the two deadlines can be different.
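
The deadline rule can be summarized by a small Python sketch. The function name and its parameters are hypothetical and do not come from the ReCodEx code base. Treating a missing second deadline as "no points after the first deadline" is an assumption made only for this illustration.

```python
from datetime import datetime
from typing import Optional


def max_points_at(
    submitted_at: datetime,
    first_deadline: datetime,
    max_points_first: int,
    second_deadline: Optional[datetime] = None,
    max_points_second: int = 0,
) -> int:
    """Return the maximum number of points a solution submitted at the given time can gain."""
    if submitted_at <= first_deadline:
        return max_points_first
    # Assumption: without a second deadline, late solutions gain no points.
    if second_deadline is not None and submitted_at <= second_deadline:
        return max_points_second
    return 0


first = datetime(2017, 3, 1, 23, 59)
second = datetime(2017, 3, 8, 23, 59)
print(max_points_at(datetime(2017, 2, 28, 12, 0), first, 10, second, 5))  # 10
print(max_points_at(datetime(2017, 3, 5, 12, 0), first, 10, second, 5))   # 5
print(max_points_at(datetime(2017, 3, 9, 12, 0), first, 10, second, 5))   # 0
```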

An assignment link will lead you to the assignment detail page, where all known details about the assignment are presented. There are both deadlines, the limit of submissions you can make and also a full description of the assignment, which can be localized. You can switch between all language variants of the description on demand in a tab-like box.

Further down the page you can find the "Submitted solutions" box with a list of submissions and links to result details. Most importantly, there is a "Submit new solution" button on the assignment page which provides an interface for submitting a solution of the assignment.

After clicking on the submit button, a dialog window will show up. Here you can upload the files representing your solution and add a note to mark the solution. Your supervisor can also access this note. After you successfully upload all files necessary for your solution, click the "Submit your solution" button and let ReCodEx evaluate the solution.

During the execution, the ReCodEx backend might send the evaluation progress to your browser, where it will be displayed in another dialog window. When the whole execution is finished, a "See the results" button will appear and you can look at the results of your solution.

The results detail page contains a lot of information. Apart from the assignment description, which is not connected to your results, there is also the name of the solution submitter (a supervisor can submit a solution on your behalf), the files which were uploaded on submission and, most importantly, the "Evaluation details" and "Test results" boxes.

Evaluation details contain the overall results of your solution. There is information such as whether the solution was submitted before the deadlines, whether the evaluation process finished successfully and whether the compilation succeeded. After that you can find several values; the most important one is the last, "Total score", consisting of your score, a slash and the maximum number of points for this assignment. Interestingly, your score can be higher than the maximum, which is caused by the "Bonus points" item above. If your solution is nice and the supervisor notices it, he/she can assign you additional points for effort. On the other hand, points can also be subtracted for bad coding habits or even cheating.

In the "Test results" box there is a table with the results of all exercise tests. The columns represent the following information:

test case overall result, symbol of yes/no option
test case name
percentage of correctness of this particular test
evaluation status, if test was successfully executed or failed
memory limit; if the supervisor allowed it, the percentage of used memory is displayed
time limit; if the supervisor allowed it, the percentage of used time is displayed
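
The percentages in the last two columns relate the measured value to the configured limit. A minimal Python sketch of this arithmetic follows; the function name `usage_percent` is hypothetical and the exact rounding used by the web application is not specified here.

```python
def usage_percent(used: float, limit: float) -> float:
    """Return how many percent of the limit were used by the solution."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return 100.0 * used / limit


# A test that ran 0.25 s with a 2 s time limit.
print(usage_percent(0.25, 2.0))  # 12.5
# A test that used 48 MiB with a 64 MiB memory limit (both values in bytes).
print(usage_percent(48 * 1024 * 1024, 64 * 1024 * 1024))  # 75.0
```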

A new feature of the web application is the "Comments and notes" box, where you can communicate with your supervisors or just write private notes to your submission. Adding a note is quite simple: write it into the text field at the bottom of the box and click on the "Send" button. The button with the lock image underneath switches the visibility of newly created comments.

In case you think the ReCodEx evaluation of your solution is wrong, please use the comments system described above, or even better notify your supervisor by another channel (email). Unfortunately there is currently no notification mechanism for new comment messages.

Group supervisor

Group supervisor is typically the lecturer of a course. A user in this role can modify the group description and properties, assign exercises or manage the list of students. Further permissions, like managing subgroups or supervisors, are available only to group administrators.

On the "Dashboard" page you can find the "Groups you supervise" section. It contains boxes representing your groups with the list of students attending the course and their points. Student names are clickable and lead to the user's profile, where further information about his/her assignments and solutions can be found. To quickly jump to the group's page, use the "Show group's detail" button at the bottom of the matching group box.

Manage group

Locate the group you supervise and want to manage. All your supervised groups are available in the sidebar under the "Groups -- supervisor" collapsible menu. If you click on one of them, you will be redirected to the group detail page. In addition to basic group information, you can also see the "Supervisor's controls" section. In this section there are lists of current students and assignments.

As a supervisor of a group you can see the "Edit group settings" button at the top of the page. Following this link will take you to the group editing page with a form containing these fields:

group name which is visible to other users
external identification which may be used for pairing with entries in an information system
description of group which will be available to users in instance (in Markdown)
whether the group is publicly visible (and joinable by students) or private
whether students should be able to see each other's statistics
minimal points threshold which students have to gain to successfully complete the course

After filling in all necessary fields, the form can be submitted by clicking on the "Edit group" button and all changes will be applied.

For managing students there are the "Students" and "Add student" boxes. The first one is a simple list of all students attending the course, with the possibility of removing them from the group. That can be done by hitting the "Leave group" button next to the particular user. The second box is for adding students to the group. There is a text field for typing the name of the student; after clicking on the magnifier image or pressing the enter key, a list of matching users will appear. Then just click on the "Join group" button and the student will be added to your group.

Assigning exercises

Before assigning an exercise, you obviously have to know what exercises are available. A list of all exercises in the system can be found under the "Exercises" link in the sidebar. This page contains a table with exercise names, difficulties and names of the exercise authors. Further information about an exercise is available by clicking on its name.

The exercise details page contains a lot of information about the exercise. There is a box with all available localized descriptions and also a box with additional information such as the exercise author, difficulty, version, etc. There is also a description for supervisors written by the exercise author under the "Exercise overview" option, where some important information can be found. Most notably, the programming languages available for this exercise are listed in the "Supported runtime environments" section.

If you decide that the exercise is suitable for one of your groups, look for the "Groups" box at the bottom of the page. There is a list of all groups you supervise with an "Assign" button which will assign the exercise to the selected group.

After clicking on the "Assign" button you should be redirected to the assignment editing page. There you can find two forms, one for editing the assignment meta information and the second one for setting the time and memory limits of the exercise.

In the meta information form you can fill in these options:

name of the assignment which will be visible in a group
visibility (if an assignment is under construction then you can mark it as not visible and students will not see it)
subform for localized descriptions (new localization can be added by clicking on "Add language variant" button, current one can be deleted with "Remove this language" button)
- language of description from dropdown field (English, Czech, German)
- description in selected language
score configuration which will be used for the evaluation of student solutions; a very simple one is already filled in, and a description of the score configuration can be found further in the "Writing score configuration" chapter
first submission deadline
maximum points that can be gained before the first deadline; if you want to manage all points manually, set it to 0 and then use bonus points, which are described in the next subchapter
second submission deadline (must be after the first deadline); after it students can still submit solutions, but they are given no points
maximum points that can be gained between first deadline and second deadline
submission count limit for students' solutions -- limits the number of attempts a student has at solving the problem
visibility of memory and time ratios; if true students can see the percentage of used memory and time (with respect to the limit) for each test
minimum percentage of points which each submission must gain to be considered correct (if it gets less, it will gain no points)
whether the assignment is marked as a bonus one, in which case points from solving it are not included in the group threshold limit (that means solving it can get you additional points over the limit)
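
How the maximum points and the minimum percentage interact can be shown with a short Python sketch. It assumes that the total score of a solution is a float between 0 and 1 (as described in the "Writing score configuration" chapter) and that the threshold is given on a 0 to 100 scale. The function name `awarded_points` is hypothetical, and any rounding the system might apply is left out.

```python
def awarded_points(score: float, max_points: int, min_percent: float) -> float:
    """Return the points for one submission.

    score       -- total score of the solution, a float between 0 and 1
    max_points  -- maximum points valid at the time of submission
    min_percent -- minimum percentage for a submission to be considered correct
                   (assumed to be on a 0-100 scale)
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score * 100.0 < min_percent:
        # Below the threshold the submission gains no points at all.
        return 0.0
    return score * max_points


print(awarded_points(0.75, 10, 50.0))  # 7.5
print(awarded_points(0.25, 10, 50.0))  # 0.0
```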

The form has to be submitted with the "Edit settings" button, otherwise the changes will not be saved.

The same editing page is used for modifying an existing assignment, not only for creating one. That is why the "Delete the assignment" box can be found at the bottom of the page. The "Delete" button in it can be used to unassign the exercise from the group.

The last unexplored area is the time and memory limits form. The whole form is situated in a box with tabs, one for each runtime environment. If you do not wish to use one of them, locate the "Remove" button at the bottom of its tab, which will delete this environment from the assignment. Please note that this action is irreversible.

In general, every tab in the environments box contains some basic information about the runtime environment and another nested tabbed box. There you can find all hardware groups which are available for the exercise and set limits for all test cases. The time limits have to be filled in seconds (float), the memory limits in bytes (int). If you are interested in reference values for a particular test case, you can take a look at the collapsible "Reference solutions' evaluations" items. If you are satisfied with the changes you made to the limits, save the form with the "Change limits" button right under the environments box.
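
Since memory limits are entered in bytes, it may help to compute them from more readable units. The following Python sketch is only an illustration: the `CaseLimits` class, the `mebibytes` helper and the test case name are hypothetical and do not mirror the data structures used by ReCodEx.

```python
from dataclasses import dataclass


@dataclass
class CaseLimits:
    """Limits of a single test case (hypothetical structure)."""
    time: float   # time limit in seconds
    memory: int   # memory limit in bytes

    def __post_init__(self):
        if self.time <= 0:
            raise ValueError("time limit must be a positive number of seconds")
        if not isinstance(self.memory, int) or self.memory <= 0:
            raise ValueError("memory limit must be a positive whole number of bytes")


def mebibytes(value: int) -> int:
    """Convert mebibytes to bytes."""
    return value * 1024 * 1024


# Half a second and 64 MiB for a hypothetical test case called "test-1".
limits = {"test-1": CaseLimits(time=0.5, memory=mebibytes(64))}
print(limits["test-1"].memory)  # 67108864
```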

Students' solutions management

One of the most important tasks of a group supervisor is checking student solutions. As automatic evaluation cannot catch all problems in the source code, it is advisable to do a brief manual review of the student's coding style and reflect it in the assignment bonus points.

On the "Assignment detail" page there is a "View student results" button near the top of the page (next to the "Edit assignment settings" button). It will redirect you to a page with a list of boxes, one box per student. Each student box contains a list of submissions for this assignment. The row structure of the submission list is the same as in the student's "Submitted solutions" box. More information about every solution can be shown by clicking on the "Show details" link at the end of the solution row.

This page is the same as for students with one exception -- there is an additional collapsed box "Set bonus points". When unfolded, it shows an input field for one number (a positive or negative integer) and a confirmation button "Set bonus points". After filling in the intended number of points and submitting the form, the data in the "Evaluation details" box get updated immediately. To remove assigned bonus points, submit zero. The bonus points are not additive; a newer value overrides the older one.
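
The override behavior of bonus points can be sketched in a few lines of Python. The `EvaluationPoints` class is a hypothetical model written for this illustration, not a part of ReCodEx. It shows that each new value replaces the previous one and that the displayed total can exceed the maximum.

```python
class EvaluationPoints:
    """Hypothetical model of the points shown in the "Evaluation details" box."""

    def __init__(self, points: int, max_points: int):
        self.points = points
        self.max_points = max_points
        self.bonus_points = 0

    def set_bonus_points(self, value: int) -> None:
        # Not additive: the new value replaces the previous one.
        self.bonus_points = value

    def total_score(self) -> str:
        # Score (including bonus points), a slash and the maximum number of points.
        return "{}/{}".format(self.points + self.bonus_points, self.max_points)


evaluation = EvaluationPoints(points=9, max_points=10)
evaluation.set_bonus_points(3)
print(evaluation.total_score())  # 12/10
evaluation.set_bonus_points(-2)  # overrides the previous +3
print(evaluation.total_score())  # 7/10
evaluation.set_bonus_points(0)   # removes the bonus points
print(evaluation.total_score())  # 9/10
```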

It is useful to give feedback about the solution back to the user. For this you can use the "Comments and notes" box. Make sure that the messages are not private, so that the student can see them. A more detailed description of this box is available in the student part of the user documentation.

One of the discussed concepts was marking one solution as accepted. However, due to a lack of frontend developers, it is not yet available in the user interface. We hope it will be ready as soon as possible. The button for accepting a solution will most probably be on this page as well.

Creating exercises

The link to exercise creation can be found in the exercises list, which is accessible through the "Exercises" link in the sidebar. At the bottom of the exercises list page you can find the "Add exercise" button, which will redirect you to the exercise editing page. At this moment the exercise is already created, so if you just leave this page, the exercise will stay in the database. This is also the reason why the exercise creation form is the same as the exercise editing form.

The exercise editing page is divided into three separate forms. The first one contains meta information about the exercise, the second one is used for uploading and managing supplementary files and the third one manages the runtime configurations in which the exercise can be executed.

The first form is located in "Edit exercise settings" and contains meta information needed by the frontend and displayed in various places of the user interface. Here you can define:

exercise name which will be visible to other supervisors
difficulty of exercise (easy, medium, hard)
description which will be available only for visitors, may be used for further description of exercise (for example information about test cases and how they could be scored)
private/public switch, if exercise is private then only you as author can see it, assign it or modify it
subform containing localized descriptions of exercise, new one can be added with "Add language variant" button and current one deleted with "Remove this language"
- language in which this particular description is in (Czech, English, German)
- actual localized description of exercise

After all the information is properly set, the form has to be submitted with the "Edit settings" button.

Management of supplementary files can be found in the "Supplementary files" box. Supplementary files are files which you can use in job configurations, which have to be provided in all runtime configurations. These files are uploaded directly to the fileserver, from where a worker can download them and use them during execution according to the job configuration.

Files can be uploaded either by the drag and drop mechanism or by the standard "Add a file" button. In the opened dialog window, choose the file which should be uploaded. All chosen files are immediately uploaded to the server, but to save the list of supplementary files you have to hit the "Save supplementary files" button. All previously uploaded files are visible right under the drag and drop area. Please note that the files are stored on the fileserver and cannot be deleted after upload.

The last form on the exercise editing page is the runtime configurations form. An exercise can have multiple runtime configurations, one for each programming language in which it can be run. Every runtime configuration corresponds to one programming language, because each of them needs a slightly different job configuration.

A new runtime configuration can be added with the "Add new runtime configuration" button, which will spawn a new tab in the runtime configurations box. Here you can fill in the following:

human readable identifier of runtime configuration
runtime environment which corresponds to programming language
job configuration in YAML, detailed description of job configuration can be found further in this chapter in "Writing job configuration" section

When you are done with the changes to runtime configurations, save the form with the "Change runtime configurations" button. If you want to delete a particular runtime configuration, just hit the "Remove" button in the corresponding tab. Please note that after this operation the runtime configurations form has to be saved again to apply the changes.

All runtime configurations which were added to exercise will be visible to supervisors and all can be used in assignment, so please be sure that all of the languages and job configurations are working.

If you choose to delete the exercise, you can find the "Delete the exercise" box with the "Delete" button at the bottom of the exercise editing page. By clicking on it, the exercise will be deleted from the exercises list and will no longer be available.

Exercise's reference solutions

Each exercise should have a set of reference solutions, which are used to tune time and memory limits of assignments. Values of used time and memory for each solution are displayed in yellow boxes under forms for setting assignment limits as described earlier.

However, there is currently no user interface for uploading and evaluating reference solutions. It is possible to use direct REST API calls, but this is not very user friendly. If you are interested, please look at the API documentation, notably the sections Uploaded-Files and Reference-Exercise-Solutions. You need to upload the reference solution files, create a new reference solution and then evaluate the solution. After that, the measured data will be available in the box on the assignment editing page (setting limits section).

We are now working on a better user interface, which will be available soon. Then its description will be added here.

Group administrator

Group administrator is a group supervisor with some additional permissions in a particular group. Namely, a group administrator is capable of creating subgroups in the managed group and also of adding and deleting supervisors. Only one person can be the administrator of a particular group.

Creating subgroups and managing supervisors

There is no special link which will get you to the groups in which you are an administrator, so you have to get there through the "Groups - supervisor" link in the sidebar and choose the right group detail page. There you can see the "Administrator controls" section, where you can either add a supervisor to the group or create a new subgroup.

The form for creating a subgroup is present right on the group detail page in the "Add subgroup" box. A group can be created with the following options:

name which will be visible in group hierarchy
external identification, can be for instance ID of group from school system
some brief description about group
allow or deny users to see each other's statistics from assignments

After filling in all the information, a group can be created by clicking on the "Create new group" button. If the creation is successful, the group is visible in the "Groups hierarchy" box at the top of the page. All information filled in during creation can be modified later.

Adding a supervisor to a group is rather easy: on the group detail page there is an "Add supervisor" box which contains a text field. There you can type the name or username of any user in the system. After filling in the user name, click on the magnifier image or press the enter key and all matching users will be found. If your chosen supervisor is in the updated list, just click on the "Make supervisor" button and the new supervisor should be successfully set.

An existing supervisor can also be removed from the group. On the group detail page there is the "Supervisors" box in which all supervisors of the group are visible. If you are the group administrator, you can see "Remove supervisor" buttons right next to the supervisors' names. After clicking on one of them, the particular supervisor will no longer be a supervisor of the group.

Instance administrator

There can be only one instance administrator per instance. In addition to the previous roles, this administrator should be able to modify the instance details, manage licences and take care of the top level groups which belong to the instance.

Instance management

List of all instances in the system can be found under "Instances" link in the sidebar. On that page there is a table of instances with their respective admins. If you are one of them, you can visit its page by clicking on the instance name. On the instance details page you can find a description of the instance, current groups hierarchy and a form for creating a new group.

If you want to change some of the instance settings, follow the "Edit instance" link on the instance details page. This will take you to the instance editing page with the corresponding form. There you can fill in the following information:

name of the instance which will be visible to every other user
brief description of instance and for whom it is intended
checkbox if instance is open or not which means public or private (hidden from potential users)

When you are done editing, save the filled information by clicking on the "Update instance" button.

If you go back to the instance details page you can find there a "Create new group" box which is able to add a group to the instance. This form is the same as the one for creating subgroup in already existing group so we can skip description of the form fields. After successful creation of the group it will appear in "Groups hierarchy" box at the top of the page.

Licences

On the instance details page, there is a box "Licences". On the first line, it shows whether this instance currently has a valid licence or not. Then, there are multiple lines with all licences assigned to this instance. Each line consists of a note, validity status (if it is valid or revoked by superadministrator) and the last date of licence validity.

A box "Add new licence" is used for creating new licences. Required fields are the note and the last day of validity. It is not possible to extend the lifetime of a licence; a new one should be generated instead. It is possible to have more than one valid licence at a time. Currently there is no user interface for revoking licences; this is done manually by the superadministrator. If an instance is to be disabled, all valid licences have to be revoked.
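
The validity rules above can be expressed as a short Python sketch. The `Licence` class and the `instance_has_valid_licence` function are hypothetical names chosen for this illustration. The sketch assumes that a licence is valid when it has not been revoked and its last day of validity has not passed.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable


@dataclass
class Licence:
    """Hypothetical model of one line in the "Licences" box."""
    note: str
    valid_until: date      # the last day of validity
    revoked: bool = False  # set manually by the superadministrator

    def is_valid(self, today: date) -> bool:
        return not self.revoked and today <= self.valid_until


def instance_has_valid_licence(licences: Iterable[Licence], today: date) -> bool:
    """An instance is enabled as long as at least one of its licences is valid."""
    return any(licence.is_valid(today) for licence in licences)


licences = [
    Licence("academic year 2016", date(2017, 6, 30), revoked=True),
    Licence("academic year 2017", date(2018, 6, 30)),
]
print(instance_has_valid_licence(licences, date(2017, 10, 1)))  # True
print(instance_has_valid_licence(licences, date(2018, 10, 1)))  # False
```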

Superadministrator

Superadministrator is a user with the most privileges and as such superadmin should be quite a unique role. Ideally, there should be only one user of this kind, used with special caution and adequate security. With this stated it is obvious that superadmin can perform any action the API is capable of.

Users management

There are only a few user roles in ReCodEx. Basically there are only three: student, supervisor, and superadmin. The base role is student, which is assigned to every registered user. Roles are stored in the database alongside other information about the user. A user always has only one role at a time. At the first startup of ReCodEx, the administrator has to change the role of his/her account manually in the database. After that, manual intervention into the database should never be needed.

There is a little catch in the management of groups and instances. Groups can have admins and supervisors. This setting is valid only for one particular group and has to be separated from the basic role system. This implies that a supervisor in one group can be a student in another and simultaneously have the global supervisor role. Changing the role from student to supervisor and back is done automatically when new privileges are granted to the user, so managing roles by hand in the database is not needed. The same applies to instances as well, but instances can only have admins.

Roles description:

Student -- Default role which is used for newly created accounts. Student can join or leave public groups and submit solutions of assigned exercises.
Supervisor -- Inherits all permissions from the student role. Can manage groups to which he/she belongs. Supervisor can also view and change group details, manage assigned exercises, and view students in the group and their solutions for assigned exercises. On top of that, a supervisor can create and delete groups, but only as subgroups of groups he/she belongs to.
Superadmin -- Inherits all permissions from the supervisor role. The most powerful user in ReCodEx, who should be able to access any functionality provided by the application.

Writing score configuration

An important thing about assignment is how to assign points to particular solutions. As mentioned previously, the whole job is composed of logical tests. All of these tests have to contain one essential "evaluation" task. Evaluation task should output one float number which can be further used for scoring of particular tests.

The total score of the student's solution is then calculated according to a supplied score config (described below) using the specified calculator. The total score is also a float between 0 and 1. This number is then multiplied by the maximum number of points awarded for the assignment, which is set by the teacher assigning the exercise, not by the exercise author.

For now, the only way to write a score configuration is to use the simple score calculator. However, the implementation in the API is flexible enough to handle future score calculators that might use more complex scoring algorithms. This also means that future calculators do not have to use the YAML format for configuration. In fact, the configuration can be a string in any format.

Simple score calculation

The first implemented calculator is the simple score calculator with test weights. This calculator just looks at the score of each test and puts them together according to the test weights specified in the assignment configuration. The resulting score is calculated as the sum of the products of score and weight of each test, divided by the sum of all weights. The algorithm in Python would look something like this:

weighted_sum = 0  # sum of score * weight over all tests
weight_sum = 0    # sum of all weights
for t in tests:
  weighted_sum += t.score * t.weight
  weight_sum += t.weight
score = weighted_sum / weight_sum

Sample score config in YAML format:

testWeights:
  a: 300   # test with id 'a' has a weight of 300
  b: 200
  c: 100
  d: 100
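
As a worked example, the weights from the sample config can be combined with per-test scores as sketched below. The scores, the function name `simple_score` and the `max_points` value are made up for illustration; only the weights come from the config above.

```python
def simple_score(scores, weights):
    """Weighted average of per-test scores; both dicts are keyed by test id."""
    weight_sum = sum(weights.values())
    if weight_sum == 0:
        raise ValueError("at least one test must have a non-zero weight")
    return sum(scores[test_id] * w for test_id, w in weights.items()) / weight_sum


# Weights taken from the sample config above.
test_weights = {"a": 300, "b": 200, "c": 100, "d": 100}
# Hypothetical results: tests 'a' and 'c' passed, 'b' and 'd' failed.
test_scores = {"a": 1.0, "b": 0.0, "c": 1.0, "d": 0.0}

score = simple_score(test_scores, test_weights)  # (300 + 100) / 700
max_points = 10                                  # assumed value chosen by the teacher
points = score * max_points
```

With these assumed scores the solution gets 4/7 of the maximum points, because the passed tests carry 400 of the 700 total weight.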

Writing job configuration

To run and evaluate an exercise, the backend needs to know the steps required to do so. These differ for each environment (operating system, programming language, etc.), so each environment needs a separate configuration.

The backend works with a powerful but quite low-level description of simple connected tasks written in YAML syntax. More about the syntax and a general task overview can be found on a separate page. One of the planned features was a user-friendly configuration editor, but due to a tight deadline and the team composition it did not make it into the first release. However, writing configuration in the basic format will always be available and allows users to use the full expressive power of the system.

This section walks through the creation of a job configuration for a hello world exercise. The goal is to compile the file source.c and check whether it prints Hello World! to the standard output. This is the only test case; let's call it A.

The problem can be split into several tasks:

compile source.c into helloworld with /usr/bin/gcc
run helloworld and save standard output into out.txt
fetch predefined output (suppose it is already uploaded to fileserver) with hash a0b65939670bc2c010f4d5d6a0b3e4e4590fb92b to reference.txt
compare out.txt and reference.txt by /usr/bin/diff

The absolute paths of the tools can be obtained from the system administrator. However, the GCC binary is available at /usr/bin/gcc on almost every system, so the location of some tools can be (professionally) guessed.

First, write header of the job to the configuration file.

submission:
    job-id: hello-world-job
    hw-groups:
        - group1

Basically, it means that the job hello-world-job needs to be run on workers that belong to the group1 hardware group. Reference files are downloaded from the default location configured in the API (such as http://localhost:9999/exercises) unless explicitly stated otherwise. The job execution log will not be saved to the result archive.

Next, the tasks have to be constructed under the tasks section. In this demo job, every task depends only on the previous one. The first task has the input file source.c (if submitted by the user) already available in the working directory, so it just calls GCC. Compilation is run in a sandbox like any other external program and should have relaxed time and memory limits. In this scenario, worker defaults are used. If compilation fails, the whole job is immediately terminated (because the fatal-failure bit is set). Because the bound-directories option in the sandbox limits section is mostly shared between all tasks, it can be set in the worker configuration instead of the job configuration (suppose this for the following tasks). For configuration of workers, please contact your administrator.

- task-id: "compilation"
  type: "initiation"
  fatal-failure: true
  cmd:
      bin: "/usr/bin/gcc"
      args:
          - "source.c"
          - "-o"
          - "helloworld"
  sandbox:
      name: "isolate"
      limits:
          - hw-group-id: group1
            chdir: ${EVAL_DIR}
            bound-directories:
                - src: ${SOURCE_DIR}
                  dst: ${EVAL_DIR}
                  mode: RW

The compiled program is executed with time and memory limits set and the standard output is redirected to a file. This task depends on the compilation task, because the program cannot be executed without being compiled first. It is important to mark this task with the execution type, so that exceeded limits are reported in the frontend.

Time and memory limits set directly for a task have higher priority than worker defaults. One important constraint is that these limits cannot exceed the limits set by workers. Worker defaults are present as a safety measure so that a malformed job configuration cannot block the worker forever. Worker default limits should be reasonably high, like a gigabyte of memory and several hours of execution time. For exact numbers, please contact your administrator.
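
The precedence rule can be sketched as follows. This is only an illustration, not the worker's actual code: the function name, the units (seconds and kilobytes) and the decision to reject a too-high task limit with an error are all assumptions, since the text only says that task limits cannot exceed the worker limits.

```python
def effective_limits(worker_defaults, task_limits):
    """Return the limits used for a task.

    Task values override worker defaults. A task value above the worker
    value is rejected (assumed behaviour).
    """
    result = dict(worker_defaults)
    for name, value in task_limits.items():
        if name in worker_defaults and value > worker_defaults[name]:
            raise ValueError(
                f"{name}={value} exceeds worker limit {worker_defaults[name]}"
            )
        result[name] = value
    return result


# Assumed worker defaults: four hours (in seconds) and one gigabyte (in kilobytes).
worker_defaults = {"time": 4 * 3600, "memory": 1024 * 1024}

# Limits of the execution task from this walkthrough override the defaults.
limits = effective_limits(worker_defaults, {"time": 0.5, "memory": 8192})

# A task without its own limits simply gets the worker defaults.
compilation_limits = effective_limits(worker_defaults, {})
```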

It is important to know that if the output of a program (both standard and error) is redirected to a file, the sandbox disk quotas apply to that file, as well as the files created directly by the program. In case the outputs are ignored, they are redirected to /dev/null, which means there is no limit on the output length (as long as the printing fits in the time limit).

- task-id: "execution_1"
  test-id: "A"
  type: "execution"
  dependencies:
      - compilation
  cmd:
      bin: "helloworld"
  sandbox:
      name: "isolate"
      stdout: ${EVAL_DIR}/out.txt
      limits:
          - hw-group-id: group1
            chdir: ${EVAL_DIR}
            bound-directories:
                - src: ${SOURCE_DIR}
                  dst: ${EVAL_DIR}
                  mode: RW
            time: 0.5
            memory: 8192

Fetch the sample solution from the file server. The base URL of the file server is in the header of the job configuration, so only the name of the required file (its sha1sum in our case) is necessary.

- task-id: "fetch_solution_1"
  test-id: "A"
  dependencies:
      - execution_1
  cmd:
      bin: "fetch"
      args:
          - "a0b65939670bc2c010f4d5d6a0b3e4e4590fb92b"
          - "${SOURCE_DIR}/reference.txt"

Comparison of results is quite straightforward. It is important to set the task type to evaluation, so that the return code is set to 0 if the program is correct and 1 otherwise. We do not set our own limits, so the default limits are used.

- task-id: "judge_1"
  test-id: "A"
  type: "evaluation"
  dependencies:
      - fetch_solution_1
  cmd:
      bin: "/usr/bin/diff"
      args:
          - "out.txt"
          - "reference.txt"
  sandbox:
      name: "isolate"
      limits:
          - hw-group-id: group1
            chdir: ${EVAL_DIR}
            bound-directories:
                - src: ${SOURCE_DIR}
                  dst: ${EVAL_DIR}
                  mode: RW
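
Because a mistyped task id in a dependencies list is easy to overlook, it can help to check the task graph before submitting a job. The sketch below is not part of ReCodEx; it uses the standard `graphlib` module (Python 3.9+) and assumes that the fetch task depends on the `execution_1` task. The function name is hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Task ids of the demo job mapped to the task ids they depend on.
tasks = {
    "compilation": [],
    "execution_1": ["compilation"],
    "fetch_solution_1": ["execution_1"],
    "judge_1": ["fetch_solution_1"],
}


def execution_order(tasks):
    """Return task ids in an order that respects the dependencies."""
    for task_id, deps in tasks.items():
        missing = [d for d in deps if d not in tasks]
        if missing:
            raise ValueError(f"task {task_id!r} depends on unknown tasks: {missing}")
    # TopologicalSorter expects a mapping of node -> predecessors
    # and raises graphlib.CycleError if the dependencies form a cycle.
    return list(TopologicalSorter(tasks).static_order())


order = execution_order(tasks)
```

For the demo job the order is simply the order in which the tasks were written, since each task depends only on the previous one.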

# Implementation

The backend

The backend is the part which is hidden from the user and which has only one purpose: to evaluate users' solutions of their assignments.

@todo: describe the configuration inputs of the Backend

@todo: describe the outputs of the Backend

@todo: describe how the backend receives the inputs and how it communicates the results

The whole backend is not just one service or component; it is quite a complex system on its own.

@todo: describe the inner parts of the Backend (and refer to the Wiki for the technical description of the components)

Broker

@todo: gets stuff done, single point of failure and center point of ReCodEx universe

@todo: what to mention: - job scheduling, worker queues - API notification using curl, authentication using HTTP Basic Auth - asynchronous resending progress messages

Fileserver

@todo: stores particular data from frontend and backend, hashing, HTTP API

Worker

@todo: describe a bit of internal structure in general - two threads - number of ZeroMQ sockets, using it also for internal communication - how sandboxes are fitted into worker, unix syscalls, #ifndef - libcurl for fetching, why not to use some object binding - working with local filesystem, directory structure - hardware groups in detail

@todo: describe how jobs are generally executed

Runtime environments

ReCodEx is designed to utilize a rather diverse set of workers -- there can be differences in many aspects, such as the actual hardware running the worker (which impacts the results of measuring) or installed compilers, interpreters and other tools needed for evaluation. To address these two examples in particular, we assign runtime environments and hardware groups to exercises.

The purpose of runtime environments is to specify which tools (and often also operating system) are required to evaluate a solution of the exercise -- for example, a C# programming exercise can be evaluated on a Linux worker running Mono or a Windows worker with the .NET runtime. Such an exercise would be assigned two runtime environments, Linux+Mono and Windows+.NET (the environment names are arbitrary strings configured by the administrator).

A hardware group is a set of workers that run on similar hardware (e.g. a particular quad-core processor model and an SSD drive). Workers are assigned to these groups by the administrator. If this is done correctly, performance measurements of a submission should yield the same results on every worker in the group. Thanks to this, we can use the same resource limits on every worker in a hardware group. However, limits can differ between runtime environments -- formally speaking, limits are a function of three arguments: an assignment, a hardware group and a runtime environment.
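
Read literally, "a function of three arguments" can be pictured as a lookup table keyed by those three values. The sketch below only illustrates that idea; the table, its values, the units and the function name are hypothetical and do not describe how ReCodEx stores limits.

```python
# Keys are (assignment, hardware group, runtime environment); all values are made up.
limits_table = {
    ("hello-world", "group1", "Linux+Mono"): {"time": 1.0, "memory": 65536},
    ("hello-world", "group1", "Windows+.NET"): {"time": 0.8, "memory": 65536},
}


def limits_for(assignment, hw_group, runtime_env):
    """Look up the limits for one combination of the three arguments."""
    key = (assignment, hw_group, runtime_env)
    try:
        return limits_table[key]
    except KeyError:
        # A missing combination means the exercise is not configured for it.
        raise LookupError(f"no limits configured for {key}") from None


mono_limits = limits_for("hello-world", "group1", "Linux+Mono")
```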

Monitor

@todo: not necessary component which can be omitted, proxy-like service

Cleaner

@todo: if it is something what to say here

The frontend

REST API

@todo: what to mention - basic - GET, POST, JSON, Header, ... - endpoint structure, Swagger UI - handling requests, preflight, checking roles with annotation - Uploading files and file storage - one by one upload endpoint. Explain different types of the Uploaded files. - Automatic detection of the runtime environment - users must submit correctly named files, assuming the RTE from the extensions

Used technologies

@todo: PHP7 – how it is used for typehints, Nette framework – how it is used for routing, Presenters actions endpoints, exceptions and ErrorPresenter, Doctrine 2 – database abstraction, entities and repositories + conventions, Communication over ZMQ – describe the problem with the extension and how we reported it and how to treat it in the future when the bug is solved. Relational database – we use MariaDB, Doctrine enables us to switch the engine to a different engine if needed

Data model

@todo: Describe the code-first approach using the Doctrine entities, how the entities map onto the database schema (refer to the attached schemas of entities and relational database models), describe the logical grouping of entities and how they are related:

user + settings + logins + ACL
instance + licences + groups + group membership
exercise + assignments + localized assignments + runtime environments + hardware groups
submission + solution + reference solution + solution evaluation
comment threads + comments

API endpoints

@todo: Tell the user about the generated API reference and how the Swagger UI can be used to access the API directly.

Web application

@todo: what to mention: - used libraries, JSX, ... - usage in user doc - server side rendering - maybe more ...

Communication protocol

Detailed communication inside the ReCodEx system is captured in the following image and described in sections below. Red connections are through ZeroMQ sockets, blue are through WebSockets and green are through HTTP(S). All ZeroMQ messages are sent as multipart with one string (command, option) per part, with no empty frames (unless explicitly specified otherwise).

Broker - Worker communication

The broker acts as a server when communicating with a worker. The listening IP address and port are configurable; the protocol family is TCP. The worker socket is of DEALER type, the broker one is of ROUTER type. Because of that, the very first part of every (multipart) message from the broker to a worker must be the target worker's socket identity (which is saved on its init command).

Commands from broker to worker:

eval -- evaluate a job. Requires 3 message frames:
- job_id -- identifier of the job (in ASCII representation -- we avoid endianness issues and also support alphabetic ids)
- job_url -- URL of the archive with job configuration and submitted source code
- result_url -- URL where the results should be stored after evaluation
intro -- introduce yourself to the broker (with init command) -- this is required when the broker loses track of the worker who sent the command. A possible reason for this is that one of the communicating sides shut down and restarted without the other side noticing.
pong -- reply to ping command, no arguments

Commands from worker to broker:

init -- introduce self to the broker. Useful on startup or after reestablishing lost connection. Requires at least 2 arguments:
- hwgroup -- hardware group of this worker
- header -- additional header describing worker capabilities. The format must be header_name=value and every header must be in a separate message frame. There is no limit on the number of headers.
- additional information -- an optional third argument. If present, it must be separated from the headers with an empty frame. The format is the same as for headers. Supported keys for additional information are:
  - description -- a human readable description of the worker for administrators (it will show up in broker logs)
  - current_job -- an identifier of the job the worker is now processing. This is useful when the worker is re-establishing a connection to the broker and needs it to know that the worker will not accept a new job.
done -- notifying of finished job. Contains following message frames:
- job_id -- identifier of finished job
- result -- response result, possible values are:
  - OK -- evaluation finished successfully
  - FAILED -- job failed and cannot be reassigned to another worker (e.g. due to error in configuration)
  - INTERNAL_ERROR -- job failed due to internal worker error, but another worker might be able to process it (e.g. downloading a file failed)
- message -- a human readable error message
progress -- notice about current evaluation progress. Contains following message frames:
- job_id -- identifier of current job
- command -- what is happening now, one of:
  - DOWNLOADED -- submission successfully fetched from fileserver
  - FAILED -- something bad happened and job was not executed at all
  - UPLOADED -- results are uploaded to fileserver
  - STARTED -- evaluation of tasks started
  - ENDED -- evaluation of tasks is finished
  - ABORTED -- evaluation of job encountered internal error, job will be rescheduled to another worker
  - FINISHED -- whole execution is finished and worker ready for another job execution
  - TASK -- task state changed -- see below
- task_id -- only present for "TASK" state -- identifier of task in current job
- task_state -- only present for "TASK" state -- result of task evaluation. One of:
  - COMPLETED -- task was successfully executed without any error, subsequent task will be executed
  - FAILED -- task ended up with some error, subsequent task will be skipped
  - SKIPPED -- some of the previous dependencies failed to execute, so this task will not be executed at all
ping -- tell broker I am alive, no arguments
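
To make the frame layout of the init command more concrete, here is a minimal sketch that builds and parses its frames as plain byte strings. It is not taken from the worker or broker sources; the function names and the example header `env=c` are hypothetical, and no ZeroMQ socket is involved.

```python
def build_init_frames(hwgroup, headers, info=None):
    """Frames of an init command in the order described above."""
    frames = [b"init", hwgroup.encode()]
    frames += [f"{key}={value}".encode() for key, value in headers.items()]
    if info:
        frames.append(b"")  # empty frame separates headers from additional information
        frames += [f"{key}={value}".encode() for key, value in info.items()]
    return frames


def parse_init_frames(frames):
    """Inverse of build_init_frames; returns (hwgroup, headers, info)."""
    if len(frames) < 2 or frames[0] != b"init":
        raise ValueError("not an init command")
    hwgroup = frames[1].decode()
    headers, info = {}, {}
    target = headers
    for frame in frames[2:]:
        if frame == b"":
            target = info  # everything after the empty frame is additional information
            continue
        key, _, value = frame.decode().partition("=")
        target[key] = value
    return hwgroup, headers, info


# Hypothetical worker: hardware group from the job walkthrough, made-up header.
frames = build_init_frames("group1", {"env": "c"}, {"description": "test worker"})
```

With a ZeroMQ binding such as pyzmq, a list like `frames` could be passed to the socket's `send_multipart` method; on the broker's ROUTER socket the worker identity would arrive as an extra first frame.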

Heartbeating

It is important for the broker and workers to know if the other side is still working (and connected). This is achieved with a simple heartbeating protocol.

The protocol requires the workers to send a ping command regularly (the interval is configurable on both sides -- future releases might let the worker send its ping interval with the init command). Upon receiving a ping command, the broker responds with pong.

Whenever a heartbeating message doesn't arrive, a counter called liveness is decreased. When this counter drops to zero, the other side is considered disconnected. When a message arrives, the liveness counter is set back to its maximum value, which is configurable for both sides.

When the broker decides a worker disconnected, it tries to reschedule its jobs to other workers.

If a worker thinks the broker crashed, it tries to reconnect periodically, with a bounded, exponentially increasing delay.

This protocol proved to be robust in real-world testing. Thus the whole backend is reliable and can survive short-term connection issues without problems. Also, the increasing delay between reconnection attempts means the network is not flooded when there are problems. We have experienced no issues since we started using this protocol.
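
The two mechanisms of the heartbeating protocol, the liveness counter and the bounded exponential delay, can be sketched as follows. The class and function names are hypothetical and the doubling factor is an assumption; the text only says the delay grows exponentially and is bounded.

```python
from itertools import islice


class Liveness:
    """Counts missed heartbeats for one peer."""

    def __init__(self, max_liveness):
        self.max_liveness = max_liveness
        self.value = max_liveness

    def message_received(self):
        """Any message from the peer resets the counter to its maximum."""
        self.value = self.max_liveness

    def interval_expired(self):
        """Call when an expected message did not arrive.

        Returns True while the peer is still considered connected.
        """
        self.value = max(self.value - 1, 0)
        return self.value > 0


def reconnect_delays(initial, maximum):
    """Yield reconnect delays that double each time, capped at `maximum`."""
    delay = initial
    while True:
        yield min(delay, maximum)
        delay = min(delay * 2, maximum)


# First six delays for an assumed initial delay of 1 and a bound of 10 (seconds).
first_delays = list(islice(reconnect_delays(1, 10), 6))
```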

Worker - File Server communication

The worker communicates with the file server only from the execution thread. The supported protocol is HTTP, optionally with SSL encryption (recommended). If supported by the server and the used version of libcurl, HTTP/2 is also available. The file server should be set up to require basic HTTP authentication, and the worker is capable of sending the corresponding credentials with each request.

Worker side

Workers communicate with the file server in both directions -- they download students' submissions and then upload evaluation results. Internally, the worker uses the libcurl C library with a very similar setup for both. In both cases it can verify the HTTPS certificate (on Linux against the system certificate list, on Windows against one downloaded from the CURL website during installation), supports basic HTTP authentication, offers HTTP/2 with fallback to HTTP/1.1 and fails on error (returned HTTP status code is >=400). The worker has a list of credentials for all available file servers in its config file.

download file -- standard HTTP GET request to given URL expecting file content as response
upload file -- standard HTTP PUT request to given URL with file data as body -- same as command line tool curl with option --upload-file
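
The worker itself uses libcurl, but the shape of the two requests can be illustrated with Python's standard library. The sketch below only constructs the requests and does not send them; the function names and the credentials are placeholders.

```python
import base64
import urllib.request


def basic_auth_header(username, password):
    """Value of the Authorization header for HTTP Basic authentication."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return f"Basic {token}"


def download_request(url, username, password):
    """GET request; the response body is expected to be the file content."""
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", basic_auth_header(username, password))
    return request


def upload_request(url, data, username, password):
    """PUT request with the file data as the body."""
    request = urllib.request.Request(url, data=data, method="PUT")
    request.add_header("Authorization", basic_auth_header(username, password))
    return request


# Sending needs a reachable file server, so it is only shown as a comment.
# urlopen raises urllib.error.HTTPError for status codes >= 400:
#
# with urllib.request.urlopen(download_request(url, "user", "pass")) as response:
#     content = response.read()
```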

File server side

The file server has its own internal directory structure, where all the files are stored. It provides a simple REST API to get them or create new ones. The file server does not provide authentication or a secured connection by itself; it is supposed to be run as a WSGI script inside a web server (like Apache) with proper configuration. Relevant commands for communication with workers:

GET /submission_archives/<id>.<ext> -- gets an archive with submitted source code and corresponding configuration of this job evaluation
GET /exercises/<hash> -- gets a file, common usage is for input files or reference result files
PUT /results/<id>.<ext> -- upload archive with evaluation results under specified name (should be same id as name of submission archive). On successful upload returns JSON { "result": "OK" } as body of returned page.

Unless specified otherwise, the zip format is used for archives. The symbol / in the API description is the root of the file server's domain. If the domain is for example fs.recodex.org with SSL support, getting an input file for one task could look like a GET request to https://fs.recodex.org/tasks/8b31e12787bdae1b5766ebb8534b0adc10a1c34c.

Broker - Monitor communication

The broker also communicates with the monitor through ZeroMQ over TCP. The socket type is the same on both sides, ROUTER. The monitor acts as a server in this communication; its IP address and port are configurable in the monitor's config file. The ZeroMQ socket ID (set on the monitor's side) is "recodex-monitor" and must be sent as the first frame of every multipart message -- see ZeroMQ ROUTER socket documentation for more info.

Note that the monitor is designed so that it can receive data both from the broker and workers. The current architecture prefers the broker to do all the communication so that the workers do not have to know too many network services.

The monitor is treated as a somewhat optional part of the whole solution, so no special effort was made to ensure communication reliability.

Commands from monitor to broker:

Because there is no need for the monitor to communicate with the broker, there are no commands so far. Any message from monitor to broker is logged and discarded.

Commands from broker to monitor:

progress -- notification about progress with job evaluation. This communication is only redirected as is from worker, more info can be found in "Broker - Worker Communication" chapter above.

Broker - Web API communication

The broker communicates with the main REST API through a ZeroMQ connection over TCP. The socket type on the broker side is ROUTER, on the frontend side it is DEALER. The broker acts as a server; its IP address and port are configurable in the API.

Commands from API to broker:

eval -- evaluate a job. Requires at least 4 frames:
- job_id -- identifier of this job (in ASCII representation -- we avoid endianness issues and also support alphabetic ids)
- header -- additional header describing worker capabilities. Format must be header_name=value, every header shall be in a separate message frame. There is no maximum limit on number of headers. There may be also no headers at all. A worker is considered suitable for the job if and only if it satisfies all of its headers.
- empty frame -- frame which contains only empty string and serves only as breakpoint after headers
- job_url -- URI location of archive with job configuration and submitted source code
- result_url -- remote URI where results will be pushed to
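
The frame order of the eval command, and the rule that a worker must satisfy all headers of a job, can be sketched as below. This is an illustration only: the function names, the example header and the URLs are made up, and treating "satisfies" as exact string equality of header values is an assumption.

```python
def build_eval_frames(job_id, headers, job_url, result_url):
    """Frames of an eval command sent from the API to the broker."""
    frames = [b"eval", job_id.encode("ascii")]
    frames += [f"{key}={value}".encode() for key, value in headers.items()]
    frames.append(b"")  # breakpoint after the headers, present even if there are none
    frames += [job_url.encode(), result_url.encode()]
    return frames


def worker_is_suitable(worker_headers, job_headers):
    """A worker is suitable if and only if it satisfies every header of the job.

    Assumption: a header is satisfied when the worker announced the same value.
    """
    return all(worker_headers.get(key) == value for key, value in job_headers.items())


# Hypothetical job; the URLs follow the file server endpoints but are made up.
eval_frames = build_eval_frames(
    "job1",
    {"env": "c"},
    "http://localhost:9999/submission_archives/job1.zip",
    "http://localhost:9999/results/job1.zip",
)
```

A job without any headers is accepted by every worker under this rule, which matches the statement that there may be no headers at all.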

Commands from broker to API (all are responses to eval command):

ack -- the first message sent back to the frontend right after the eval command arrives. Basically it means "Hi, I am all right and am capable of receiving job requests". After sending it, the broker tries to find an acceptable worker for the request.
accept -- broker is capable of routing request to a worker
reject -- broker cannot handle this job (for example when the requirements specified by the headers cannot be met). There are (rare) cases when the broker finds that it cannot handle the job after it was confirmed. In such cases it uses the frontend REST API to mark the job as failed.

Asynchronous communication between broker and API

Only a fraction of the errors that can happen during evaluation can be detected while there is a ZeroMQ connection between the API and broker. To notify the frontend of the rest, we need an asynchronous communication channel that can be used by the broker when the status of a job changes (it's finished, it failed permanently, the only worker capable of processing it disconnected...).

This functionality is supplied by the broker-reports/ API endpoint group -- see its documentation for more details.

File Server - Web API communication

File server has a REST API for interaction with other parts of ReCodEx. Description of communication with workers is in "Worker - File Server Communication" chapter above. On top of that, there are other commands for interaction with the API:

GET /results/<id>.<ext> -- download archive with evaluated results of job id
POST /submissions/<id> -- upload new submission with identifier id. Expects that the body of the POST request uses file paths as keys and the content of the files as values. On successful upload returns JSON { "archive_path": <archive_url>, "result_path": <result_url> } in response body. From archive_path the submission can be downloaded (by worker) and corresponding evaluation results should be uploaded to result_path.
POST /tasks -- upload new files, which will be available under names equal to the sha1sum of their content. Multiple files can be uploaded at once. On successful upload returns JSON { "result": "OK", "files": <file_list> } in response body, where file_list is a dictionary with the original file name as key and the new URL with the already hashed name as value.
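
The hashed names can be reproduced on the client side with the standard `hashlib` module, for example to predict under which name an uploaded file will be available. The function names and the base URL passed in are hypothetical; the real URLs are those returned by the file server.

```python
import hashlib


def hashed_name(content: bytes) -> str:
    """Hex SHA-1 digest of the content, i.e. what the sha1sum tool prints."""
    return hashlib.sha1(content).hexdigest()


def expected_files(files, base_url):
    """Map original file names to URLs with hashed names.

    Mirrors the shape of the `files` dictionary in the response;
    `base_url` is an assumption supplied by the caller.
    """
    return {name: f"{base_url}/{hashed_name(data)}" for name, data in files.items()}


# Two files with identical content end up under the same hashed name.
mapping = expected_files({"a.txt": b"abc", "b.txt": b"abc"}, "https://fs.recodex.org/tasks")
```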

There are no plans yet to support deleting files from this API. This may change in time.

Web API calls these fileserver endpoints with standard HTTP requests. There are no special commands involved. There is no communication in opposite direction.

Monitor - Web app communication

The monitor interacts with the web application through a WebSocket connection. The monitor acts as a server and browsers connect to it. The IP address and port are configurable. When a client connects to the monitor, it sends a message with the string representation of the channel id (the channel whose messages it is interested in, usually the id of the job being evaluated). There can be multiple listeners per channel, and even (slightly) delayed connections will receive all messages from the very beginning.

When the monitor receives a progress message from the broker, there are two options:

there is no WebSocket connection for listed channel (job id) -- message is dropped
there is active WebSocket connection for listed channel -- message is parsed into JSON format (see below) and sent as string to that established channel. Messages for active connections are queued, so no messages are discarded even on heavy workload.

A message from the monitor to the web application is in JSON format and has the form of a dictionary (associative array). The information contained in this message should correspond to that given by the worker to the broker. For further description, see the "progress" command in the "Broker - Worker communication" chapter.

Message format:

command -- type of progress, one of: DOWNLOADED, FAILED, UPLOADED, STARTED, ENDED, ABORTED, FINISHED, TASK
task_id -- id of currently evaluated task. Present only if command is "TASK".
task_state -- state of task with id task_id. Present only if command is "TASK". Value is one of "COMPLETED", "FAILED" and "SKIPPED".
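
A message in this format could be produced as sketched below. The function name is hypothetical and the validation is stricter than the text requires; only the three keys and their allowed values come from the description above.

```python
import json

COMMANDS = {
    "DOWNLOADED", "FAILED", "UPLOADED", "STARTED",
    "ENDED", "ABORTED", "FINISHED", "TASK",
}
TASK_STATES = {"COMPLETED", "FAILED", "SKIPPED"}


def progress_to_json(command, task_id=None, task_state=None):
    """Serialize one progress message for the web application."""
    if command not in COMMANDS:
        raise ValueError(f"unknown command {command!r}")
    message = {"command": command}
    if command == "TASK":
        # task_id and task_state are present only for the TASK command
        if task_id is None or task_state not in TASK_STATES:
            raise ValueError("TASK requires task_id and a valid task_state")
        message["task_id"] = task_id
        message["task_state"] = task_state
    return json.dumps(message)


started = progress_to_json("STARTED")
task_done = progress_to_json("TASK", task_id="judge_1", task_state="COMPLETED")
```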

Web app - Web API communication

The provided web application runs as a JavaScript client inside the user's browser. It communicates with the REST API on the server through standard HTTP requests. Documentation of the main REST API is in a separate document due to its extensiveness. Results are returned as a JSON payload, which is simply parsed in the web application and presented to the users.
