Checking programs written by students is very time-consuming and also boring.
There are two main ways to automatically evaluate source code -- statically (checking the code without running it; safe, but not very precise) or dynamically (running the code on testing inputs and checking the outputs against reference ones; this requires sandboxing, but provides good real-world experience).
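To make the dynamic approach concrete, here is a minimal sketch in Python of its core loop: run the (already compiled) submission on one testing input under a crude time limit and compare its output with the reference one. All file names are hypothetical, and a real system would run the program inside a sandbox, which this sketch omits.

```python
import subprocess

def evaluate_dynamically(binary, input_file, reference_output, time_limit=2.0):
    """Run a compiled submission on one testing input and compare
    its standard output with the reference output."""
    with open(input_file, "rb") as stdin:
        try:
            result = subprocess.run(
                [binary], stdin=stdin, capture_output=True,
                timeout=time_limit,  # crude limit; real graders enforce this in a sandbox
            )
        except subprocess.TimeoutExpired:
            return "time limit exceeded"
    if result.returncode != 0:
        return "runtime error"
    with open(reference_output, "rb") as ref:
        expected = ref.read()
    # Byte-wise comparison as the simplest possible "judge"; real judges
    # typically tolerate trailing whitespace or compare token by token.
    return "accepted" if result.stdout == expected else "wrong answer"

# Hypothetical usage:
# print(evaluate_dynamically("./solution", "test01.in", "test01.out"))
```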
This project focuses on the machine-controlled part of source code evaluation. First, the problems of the present software at our university were discussed and similar projects at other educational institutions were examined. With the knowledge acquired from such projects in production, we set up goals for the new evaluation system, designed its architecture and implemented a working solution. The system is now ready for production testing at our university.
## Current solution at MFF UK
The ideas presented above are not completely new. A group of students already implemented an evaluation system for students' homework back in 2006. Its name is [CodEx - The Code Examiner](http://codex.ms.mff.cuni.cz/project/) and it is still used, with some improvements, to this day. The original plan was to use the system only for basic programming courses, but it is now used much more widely and in different conditions than intended.
CodEx is based on dynamic analysis. It is a system with a web-based interface, where supervisors assign exercises to their students and the students have a time window to submit their solutions. Each solution is compiled and run in a sandbox (MO-Eval). The checked metrics are correctness of the output and compliance with time and memory limits. Supported languages are C, C++, C#, Java, Pascal, Python and Haskell.
The current system is old but robust. There has been no major security incident during its production usage. However, from today's perspective it has several drawbacks. The main ones are:
- **web interface** -- The web interface is simple and fully functional. However, the rapid development in web technologies opens new horizons of how a web interface can be built.
- **web API** -- Current CodEx has only a very limited XML API based on outdated technologies. This prevents users from creating custom interfaces such as a command line tool or a mobile application.
- **sandboxing** -- The MO-Eval sandbox is based on the principle of monitoring system calls to the operating system and blocking the bad ones. This can be done easily only for single-threaded applications. These days parallelism is a very important part of computing, so there is a requirement to test multi-threaded applications as well.
- **instances** -- Different usage scenarios of CodEx require separate instances (Programming I and II, Java, C#, etc.). This configuration is not user friendly (students have to register in each instance separately) and puts unnecessary work on administrators. The CodEx architecture does not allow sharing hardware between instances, so a non-trivial amount of additional hardware is occupied.
- **task extensibility** -- There is a need to test and evaluate complicated programs from Parallel programming or Compiler principles classes, which have a more difficult evaluation chain than the simple compilation/execution/evaluation one. CodEx is only capable of the simple case, without the possibility of easy extension.
After considering all these facts, CodEx cannot be used anymore. The project is too old to just be maintained and extended with modern technologies. Thus, it either needs to be completely rewritten or another solution must be found.
## Analysis of related projects
First of all, several code evaluation projects were found and examined. This is not a complete list of such evaluators, just a few projects that are in use these days and can serve as an inspiration for our project.
### Progtest
[Progtest](https://progtest.fit.cvut.cz/) is a private project from FIT ČVUT in Prague. As far as we know, it is used for C/C++ and Bash programming and for knowledge-based quizzes. There are several bonus points and penalties, and also a few hints about what is failing in the submitted solution. It is very strict on source code quality, enforcing for example the `-pedantic` option of GCC, Valgrind for memory leaks, and array boundary checks via the `mudflap` library.
### Codility
[Codility](https://codility.com/) is a web-based solution primarily targeted at company recruiters. It is a commercial SaaS product supporting 16 programming languages. The [UI](http://1.bp.blogspot.com/-_isqWtuEvvY/U8_SbkUMP-I/AAAAAAAAAL0/Hup_amNYU2s/s1600/cui.png) of Codility is [open source](https://github.com/Codility/cui); the rest of the source code is not available. One interesting feature is the 'task timeline' -- the captured progress of writing the code for each user.
### CMS
[CMS](http://cms-dev.github.io/index.html) is an open-source distributed system for running and organizing programming contests. It is written in Python and contains several modules. CMS supports the C/C++, Pascal, Python, PHP and Java languages. PostgreSQL is a single point of failure; all modules heavily depend on the database connection. Task evaluation can only be a three-step pipeline -- compilation, execution, evaluation. Execution is performed in [Isolate](https://github.com/ioi/isolate), a sandbox written by the consultant of our project, Mgr. Martin Mareš, Ph.D.
### MOE
[MOE](http://www.ucw.cz/moe/) is a grading system written in shell scripts, C and Python. It does not provide a default GUI; all actions have to be performed from the command line. The system does not evaluate submissions in real time; results are computed in batch mode after the exercise deadline. The sandboxing environment used is Isolate. Parts of MOE are used in other systems like CodEx or CMS, but the system as a whole is obsolete.
### Kattis
[Kattis](http://www.kattis.com/) is another SaaS solution. It provides a pretty nice web UI, but the rest of the application is overly simple. A nice point is the [standardized format](http://www.problemarchive.org/wiki/index.php/Problem_Format) of its exercises. Kattis is primarily used by programming contest organizers, company recruiters and also some universities.
## ReCodEx goals
Official assignment of the project is available [here](http://www.ksi.mff.cuni.c
Official terminology of ReCodEx, which will be used in the documentation and within the code:
* **Exercise** -- An exercise is a template of a programming problem, including a detailed text description, evaluation instructions, a sample implementation and reference inputs and outputs. The author of an exercise is typically a lecturer of a programming class.
* **Assignment** -- An assignment is basically an instance of an exercise which was assigned to a group of students by their supervisor. The supervisor can alter the predefined restrictions on the resulting code (execution time limit, etc.), deadlines and the maximal number of points for correct solutions.
* **Reference solution** -- A solution of an exercise provided by its author. This solution should pass all test cases and can also be used for auto-calibration of the exercise. One exercise can have multiple reference solutions, for example in different programming languages or with various levels of complexity.
* **Submission** -- A submission is one student solution of an assignment, received by the ReCodEx API. A submission contains the submitted source code and additional information about the assignment, exercise and submitter.
* **Job** -- A piece of work for a worker, generally corresponding to the evaluation of one submission. There are also other types of jobs, such as benchmarking submissions for configuring memory and time limits, but this classification has no effect on the evaluation. Internally, a job is a set of small tasks defined in the exercise configuration. The job itself is transferred in the form of an archive with the submitted source codes and a configuration file written in YAML (a hypothetical sketch of such a configuration is shown after this list).
* **Task** -- An atomic piece of work, which can execute an external program or some internal command. External program execution is (mostly) performed in a sandboxed environment; internal commands are executed directly. For example, one task can make a new directory, copy a file or compile source code using GCC.
* **Test** -- A test is a piece of work that checks the correctness of a program. There are multiple tests inside a job, which together check the validity and correctness of all aspects of the exercise solution. In the simplest case, testing is done by providing reference inputs to the tested program and comparing the results with reference outputs. One test consists of multiple tasks.
* **Judge** -- A judge is a standalone comparison program which compares the output of the tested program against the sample (reference) outputs.
* **Limits** -- Tasks executing an external program usually run it in a sandbox with defined limits on running time, consumed memory, used disk space and so on. The term 'limits' in this context means all these restrictions on program execution together.
* **Hwgroup** -- A hardware group is a set of workers with similar hardware capabilities. Each group has a unique string identifier, and every worker in a particular group has that identifier in its configuration. Hardware group management is done manually by the system administrator. Jobs are routed to workers according to their hwgroup; limits are also tied to a specific hwgroup.
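To illustrate how the terms above fit together, the following is a hypothetical sketch of a job configuration in YAML; the key names and the overall structure are illustrative only and are not fixed by this text. The job evaluates one submission with a single test consisting of three tasks: compile the source, execute the binary in a sandbox with limits tied to a hwgroup, and judge the output.

```yaml
# Hypothetical job configuration -- all key names are illustrative only.
job-id: student-submission-42
tasks:
  - task-id: compile                  # compile the submitted source codes
    cmd: /usr/bin/gcc
    args: [-O2, -o, solution, solution.c]
  - task-id: run-test01               # execute the binary in a sandbox
    depends-on: [compile]
    cmd: ./solution
    stdin: test01.in
    sandbox:
      limits:
        hw-group-id: group1           # limits are tied to a specific hwgroup
        time: 2                       # seconds
        memory: 65536                 # kilobytes
  - task-id: judge-test01             # compare produced and reference outputs
    depends-on: [run-test01]
    cmd: /usr/bin/judge
    args: [test01.out, run-test01.stdout]
```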