# Introduction

Generally, there are many different ways and opinions on how to teach people something new. However, most people agree that hands-on experience is one of the best ways to make the human brain remember a new skill. Learning must be entertaining and interactive, with fast and frequent feedback. Some kinds of knowledge are more suitable for this practical type of learning than others, and fortunately, programming is one of them. The university education system is one of the areas where this approach can be applied.

In computer programming, there are several requirements, such as the code being syntactically correct, efficient and easy to read, maintain and extend. Correctness and efficiency can be tested automatically, which helps teachers save time for their research, but checking for bad design, habits and mistakes is very hard to automate and requires manpower. Checking programs written by students takes a lot of time and requires a lot of mechanical, repetitive work.

The first idea of an automatic evaluation system comes from Stanford University professors in 1965. They implemented a system which evaluated code in Algol submitted on punch cards. In the following years, many similar products were written.

There are two basic ways of automatically evaluating code -- statically (checking the code without running it; safe, but not very precise) or dynamically (running the code on test inputs and checking the outputs against reference ones; this needs sandboxing, but provides a good real-world experience).
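To illustrate the dynamic approach, the following is a minimal sketch of such an evaluation loop. It is not taken from any of the systems discussed below; the file layout (`*.in`/`*.out` pairs) and the time limit are illustrative assumptions, and a real evaluator would run the program inside a sandbox rather than directly.

```python
import subprocess
from pathlib import Path

def evaluate(program, test_dir, time_limit=1.0):
    """Run `program` on every *.in file and compare stdout with *.out."""
    passed = 0
    tests = sorted(Path(test_dir).glob("*.in"))
    for test_input in tests:
        expected = test_input.with_suffix(".out").read_text()
        try:
            result = subprocess.run(
                [program], stdin=test_input.open(),
                capture_output=True, text=True, timeout=time_limit)
        except subprocess.TimeoutExpired:
            continue  # time limit exceeded -> this test failed
        if result.returncode == 0 and result.stdout == expected:
            passed += 1
    return passed, len(tests)

# Usage: evaluate("./solution", "tests") -> (number passed, total count)
```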
This project focuses on the machine-controlled part of source code evaluation. First, the problems of the present software at our university were discussed and similar projects at other educational institutions were examined. With the knowledge acquired from such projects in production, we set up goals for the new evaluation system, designed the architecture and implemented a fully operational solution. The system is now ready for production testing at our university.

## Current solution at MFF UK

The ideas presented above are not completely new. A group of students already implemented an evaluation solution for students' homework in 2006. Its name is [CodEx -- The Code Examiner](http://codex.ms.mff.cuni.cz/project/) and it has been used, with some improvements, ever since. The original plan was to use the system only for basic programming courses, but there is demand for adapting it to many different subjects.

CodEx is based on dynamic analysis. It features a web-based interface where supervisors assign exercises to their students and the students have a time window to submit their solutions. Each solution is compiled and run in a sandbox (MO-Eval). The checked metrics are correctness of the output and compliance with time and memory limits. It supports programs written in C, C++, C#, Java, Pascal, Python and Haskell.

The current system is old but robust. There were no major security incidents during its production usage. However, from today's perspective it has several drawbacks. The main ones are:

- **web interface** -- The web interface is simple and fully functional, but the rapid development of web technologies opens new horizons for how a web interface can be designed.
- **web API** -- CodEx offers a very limited XML API based on outdated technologies that is not sufficient for users who would like to create custom interfaces such as a command line tool or a mobile application.
- **sandboxing** -- The MO-Eval sandbox is based on the principle of monitoring system calls and blocking the dangerous ones. This can easily be done for single-threaded applications, but proves difficult with multi-threaded ones. Nowadays, parallelism is a very important area of computing, so there is a requirement to test multi-threaded applications as well.
- **instances** -- Different CodEx usage scenarios require separate instances (Programming I and II, Java, C#, etc.). This configuration is not user friendly (students have to register in each instance separately) and burdens administrators with unnecessary work. The CodEx architecture does not allow sharing hardware between instances, which results in an inefficient use of the evaluation hardware.
- **task extensibility** -- There is a need to test and evaluate complicated programs for classes such as Parallel Programming or Compiler Principles, which require a more complex evaluation chain than the simple compilation/execution/evaluation pipeline provided by CodEx.

After considering all these facts, it is clear that CodEx cannot be used any longer. The project is too old to be merely maintained and extended with modern technologies, so it needs to be completely rewritten, or another solution must be found.

## Analysis of related projects

First of all, several code evaluation projects were found and examined. This is not a complete list of such evaluators, just a few projects which are in use these days and can serve as an inspiration for our project.

### Progtest

[Progtest](https://progtest.fit.cvut.cz/) is a private project from FIT ČVUT in Prague. As far as we know, it is used for C/C++ and Bash programming and for knowledge-based quizzes. The system awards bonus points and penalties, and gives a few hints about what is failing in the submitted solution. It is very strict about source code quality, using for example the `-pedantic` option of GCC, Valgrind for memory leaks, or array bounds checks via the `mudflap` library.

### Codility

[Codility](https://codility.com/) is a web-based solution primarily targeted at company recruiters. It is a commercial product of the SaaS type, supporting 16 programming languages. The [UI](http://1.bp.blogspot.com/-_isqWtuEvvY/U8_SbkUMP-I/AAAAAAAAAL0/Hup_amNYU2s/s1600/cui.png) of Codility is [open source](https://github.com/Codility/cui); the rest of the source code is not available. One interesting feature is the 'task timeline' -- the captured progress of writing code for each user.

### CMS

[CMS](http://cms-dev.github.io/index.html) is an open-source distributed system for running and organizing programming contests. It is written in Python and contains several modules. CMS supports C/C++, Pascal, Python, PHP and Java. PostgreSQL is a single point of failure; all modules heavily depend on the database connection. Task evaluation can only be a three-step pipeline -- compilation, execution, evaluation. Execution is performed in [Isolate](https://github.com/ioi/isolate), a sandbox written by the consultant of our project, Mgr. Martin Mareš, Ph.D.
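For illustration, the following sketch drives Isolate from Python to run a tested binary under resource limits. The limits, file names and the use of Python as the driver are assumptions made for this example; consult the Isolate documentation for the authoritative interface.

```python
import subprocess

# Prepare a fresh sandbox; isolate prints the box directory path.
box = subprocess.run(["isolate", "--init"], capture_output=True,
                     text=True, check=True).stdout.strip()

# The tested binary must first be copied into the box working
# directory (f"{box}/box/") -- omitted here for brevity.

# Run it with a CPU time limit (seconds) and a memory limit (KB);
# a machine-readable summary is written to the meta file.
subprocess.run(["isolate", "--time=1", "--mem=65536",
                "--meta=meta.txt", "--run", "--", "./solution"])

# Destroy the sandbox when done.
subprocess.run(["isolate", "--cleanup"], check=True)
```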
### MOE

[MOE](http://www.ucw.cz/moe/) is a grading system written in shell scripts, C and Python. It does not provide a default GUI; all actions have to be performed from the command line. The system does not evaluate submissions in real time -- the results are computed in batch mode after the exercise deadline, using Isolate for sandboxing. Parts of MOE are used in other systems like CodEx or CMS, but the system as a whole is obsolete.

### Kattis

[Kattis](http://www.kattis.com/) is another SaaS solution. It provides a clean and functional web UI, but the rest of the application is too simple. A nice feature is the usage of a [standardized format](http://www.problemarchive.org/wiki/index.php/Problem_Format) for exercises. Kattis is primarily used by programming contest organizers, company recruiters and also some universities.

## ReCodEx goals

Based on the research above, we set up several goals which the new system should meet. They mostly reflect the drawbacks of the current version of CodEx. No existing tool fits our needs; for example, none of the examined projects provides a complex execution/evaluation pipeline that could support the needs of courses like Compiler Principles. Modifying CodEx is not an option either -- the required scope of changes is too big. To sum up, a new evaluation system has to be written, reusing only small parts of the CodEx code (for example the judges).

The new project is **ReCodEx -- ReCodEx Code Examiner**. The name points to CodEx, the previous evaluation solution, but also reflects the new approach to solving its issues. **Re** as part of the name means redesigned, rewritten, renewed or restarted. The official assignment of the project is available on the [website of the software project committee](http://www.ksi.mff.cuni.cz/sw-projekty/zadani/recodex.pdf) (only in Czech). The most notable features are the following:

- a modern HTML5 web frontend written in JavaScript using a suitable framework
- a REST API implemented in PHP, communicating with the database, the backend and the file server
- a backend implemented as a distributed system on top of a message queue framework (ZeroMQ) with a master-worker architecture
- a worker with basic support for the Windows environment (without a sandbox -- no suitable general-purpose tool is available yet)
- an evaluation procedure configured in a YAML file, composed of small tasks connected into an arbitrary directed acyclic graph (a sketch follows this list)
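To give an idea of the intended format, below is a hypothetical sketch of such a YAML job configuration. The keys and structure are illustrative only, not the final specification; the directed acyclic graph is formed by each task naming the tasks it depends on.

```yaml
# Hypothetical job configuration -- illustrative structure only
job-id: 42
tasks:
  - task-id: compile
    cmd: /usr/bin/gcc
    args: [-o, solution, solution.c]
  - task-id: run-test-1
    dependencies: [compile]      # an edge of the task graph
    cmd: ./solution
    sandbox:
      limits:
        time: 1                  # seconds
        memory: 65536            # kilobytes
  - task-id: judge-test-1
    dependencies: [run-test-1]
    cmd: judge
    args: [test1.actual, test1.expected]
```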
## Terminology

This is the official terminology of ReCodEx, which will be used in the documentation and within the code.

* **Exercise** -- An exercise is a template of a programming problem, including a detailed text description, evaluation instructions, a sample implementation and reference inputs and outputs. Typically, the author of an exercise is a lecturer of a programming class.
* **Assignment** -- An assignment is basically an instance of an exercise which was assigned to a group of students by their supervisor. The supervisor can alter the predefined restrictions for the resulting code (execution time limit, etc.), the deadlines and the maximal amount of points for correct solutions.
* **Reference solution** -- A solution of an exercise provided by its author. This solution should pass all test cases and can also be used for auto-calibration of the exercise. One exercise can have multiple reference solutions, for example in different programming languages or with varied levels of efficiency.
* **Submission** -- A submission is one student's solution of an assignment received by the ReCodEx API. A submission contains the submitted source code and additional information about the assignment, the exercise and the submitter.
* **Job** -- A piece of work for a worker, generally corresponding to the evaluation of one submission. There are also other types of jobs, such as benchmarking a submission for memory and time limit configuration, but this classification has no effect on the evaluation. Internally, a job is a set of small tasks defined in the exercise configuration. The job itself is transferred in the form of an archive with the submitted source codes and a configuration file written in YAML.
* **Task** -- An atomic piece of work defined in the job configuration, which can execute an external program or some internal command. External program execution is (mostly) performed in a sandboxed environment; internal commands are executed directly. For example, one task could create a new directory, copy a file or compile source codes using GCC.
* **Test** -- A test is a logical part of a job that checks the correctness of a program. There can be multiple tests inside a job, which together prove the validity and correctness of all aspects of the solution. In the simplest case, testing is done by providing reference inputs to the tested program, and the results are compared with the reference outputs. One test consists of multiple tasks.
* **Judge** -- A judge is a standalone comparison program that compares the reference outputs against the outputs of the tested program (a minimal sketch follows this list).
* **Limits** -- Tasks executing external programs are usually run in a sandbox with defined limits on execution time, allocated memory, used disk space and others. These limits are specified in the job configuration. The term _limits_ in this context means all the restrictions together.
* **Hwgroup** -- A hardware group is a set of workers with similar hardware. Its purpose is to group workers that are likely to run a program using the same amount of resources. Test limits are defined separately for each group. A group has a unique string identifier and every worker in a group has this identifier in its configuration file. Hardware group management is done manually by the system administrator. Jobs can be routed to workers based on their hwgroup.
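For illustration, a trivial token-based judge might look like the following sketch. The two-file command line interface and the exit code convention are assumptions made for this example; real judges (such as those reused from CodEx) are more elaborate, handling for instance numeric tolerances and other comparison modes.

```python
import sys

def main():
    # argv[1]: reference output file, argv[2]: tested program's output file
    with open(sys.argv[1]) as expected, open(sys.argv[2]) as actual:
        # Compare token streams, ignoring whitespace differences.
        matches = expected.read().split() == actual.read().split()
    # Assumed convention: exit code 0 = outputs match, 1 = mismatch.
    sys.exit(0 if matches else 1)

if __name__ == "__main__":
    main()
```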