diff --git a/Rewritten-docs.md b/Rewritten-docs.md
index b9e20a2..6891262 100644
--- a/Rewritten-docs.md
+++ b/Rewritten-docs.md
@@ -1,13 +1,162 @@
 Introduction
 ============
-@todo: Describe who we are and what is the nature of the project.
+There are many different ways and opinions on how to teach people something
+new. However, most people agree that hands-on experience is one of the best
+ways to make the human brain remember a new skill. Learning must be
+entertaining and interactive, with fast and frequent feedback. Some kinds of
+knowledge are better suited to this practical style of learning than others,
+and fortunately, programming is one of them.
+
+The university education system is one of the areas where this insight can be
+applied. In computer programming, there are several requirements, such as the
+code being syntactically correct, efficient, and easy to read, maintain and
+extend. Correctness and efficiency can be tested automatically, which helps
+teachers save time for their research, but checking for bad design, habits and
+mistakes is hard to automate and still requires human effort.
+
+Checking programs written by students takes a lot of time and requires a great
+deal of mechanical, repetitive work. The first idea of an automatic evaluation
+system comes from Stanford University professors in 1965. They implemented a
+system which evaluated Algol code submitted on punch cards. Many similar
+products were written in the following years.
+
+There are two basic ways of evaluating code automatically -- statically
+(checking the code without running it; safe, but not very precise) or
+dynamically (running the code on test inputs and checking the outputs against
+reference ones; requires sandboxing, but gives a good picture of real-world
+behaviour).
+
+This project focuses on the machine-controlled part of source code evaluation.
+First, the problems of the current software at our university were discussed
+and similar projects at other educational institutions were examined. With the
+knowledge acquired from such projects in production, we set goals for the new
+evaluation system, designed the architecture and implemented a fully
+operational solution. The system is now ready for production testing at our
+university.

 Analysis
 --------
-@todo: Describe how the idea of ReCodEx originated and how we came up
-with the stuff we implemented.
+### Current solution at MFF UK
+
+The ideas presented above are not completely new. A group of students already
+implemented an evaluation solution for students' homework in 2006. Its name is
+[CodEx - The Code Examiner](http://codex.ms.mff.cuni.cz/project/) and it has
+been used, with some improvements, ever since. The original plan was to use
+the system only for basic programming courses, but there is demand for
+adapting it to many different subjects.
+
+CodEx is based on dynamic analysis. It features a web-based interface where
+supervisors assign exercises to their students, and the students have a time
+window in which to submit their solutions. Each solution is compiled and run
+in a sandbox (MO-Eval). The checked metrics are correctness of the output and
+compliance with time and memory limits. It supports programs written in C,
+C++, C#, Java, Pascal, Python and Haskell.
+
+The current system is old but robust. There were no major security incidents
+during its production usage. However, from today's perspective it has several
+drawbacks. The main ones are:
+
+- **web interface** -- The web interface is simple and fully functional, but
+  rapid development in web technologies opens new possibilities of how a web
+  interface can be designed.
+- **web api** -- CodEx offers a very limited XML API based on outdated
+  technologies that is not sufficient for users who would like to create
+  custom interfaces such as a command-line tool or a mobile application.
+- **sandboxing** -- The MO-Eval sandbox is based on the principle of
+  monitoring system calls and blocking the dangerous ones. This can easily be
+  done for single-threaded applications, but proves difficult with
+  multi-threaded ones. Nowadays, parallelism is a very important area of
+  computing, so there is a requirement to test multi-threaded applications as
+  well.
+- **instances** -- Different usage scenarios of CodEx require separate
+  instances (Programming I and II, Java, C#, etc.). This configuration is not
+  user friendly (students have to register in each instance separately) and
+  burdens administrators with unnecessary work. The CodEx architecture does
+  not allow sharing hardware between instances, which results in inefficient
+  use of the evaluation hardware.
+- **task extensibility** -- There is a need to test and evaluate complicated
+  programs for classes such as Parallel Programming or Compiler Principles,
+  which require a more complex evaluation chain than the simple
+  compilation/execution/evaluation sequence provided by CodEx.
+
+After considering all these facts, it is clear that CodEx cannot be used any
+longer. The project is too old to simply be maintained and extended with
+modern technologies, so it needs to be completely rewritten or another
+solution must be found.
+
+### Related projects
+
+First of all, several code evaluation projects were found and examined. This
+is not a complete list of such evaluators, just a few projects which are in
+use today and can serve as an inspiration for our project.
+
+#### Progtest
+
+[Progtest](https://progtest.fit.cvut.cz/) is a private project from FIT ČVUT
+in Prague. As far as we know, it is used for C/C++ and Bash programming and
+for knowledge-based quizzes. It awards bonus points and penalties and also
+gives a few hints about what is failing in a submitted solution. It is very
+strict about source code quality, using for example the `-pedantic` option of
+GCC, Valgrind for memory leaks or array boundary checks via the `mudflap`
+library.
+
+#### Codility
+
+[Codility](https://codility.com/) is a web-based solution primarily targeted
+at company recruiters. It is a commercial SaaS product supporting 16
+programming languages. The
+[UI](http://1.bp.blogspot.com/-_isqWtuEvvY/U8_SbkUMP-I/AAAAAAAAAL0/Hup_amNYU2s/s1600/cui.png)
+of Codility is [open source](https://github.com/Codility/cui); the rest of the
+source code is not available. One interesting feature is the 'task timeline'
+-- a captured progress of code writing for each user.
+
+#### CMS
+
+[CMS](http://cms-dev.github.io/index.html) is an open-source distributed
+system for running and organizing programming contests. It is written in
+Python and consists of several modules. CMS supports C/C++, Pascal, Python,
+PHP and Java. PostgreSQL is a single point of failure; all modules heavily
+depend on the database connection. Task evaluation can only be a three-step
+pipeline -- compilation, execution, evaluation. Execution is performed in
+[Isolate](https://github.com/ioi/isolate), a sandbox written by the consultant
+of our project, Mgr. Martin Mareš, Ph.D.
+
+#### MOE
+
+[MOE](http://www.ucw.cz/moe/) is a grading system written in shell scripts, C
+and Python.
+It does not provide a default GUI; all actions have to be performed from the
+command line. The system does not evaluate submissions in real time; results
+are computed in batch mode after the exercise deadline, using Isolate for
+sandboxing. Parts of MOE are used in other systems like CodEx or CMS, but the
+system as a whole is generally obsolete.
+
+#### Kattis
+
+[Kattis](http://www.kattis.com/) is another SaaS solution. It provides a clean
+and functional web UI, but the rest of the application is fairly basic. A nice
+feature is the use of a [standardized
+format](http://www.problemarchive.org/wiki/index.php/Problem_Format) for
+exercises. Kattis is primarily used by programming contest organizers, company
+recruiters and also some universities.
+
+
+### ReCodEx goals
+
+Based on the research above, we set several goals which the new system should
+meet. They mostly reflect the drawbacks of the current version of CodEx. No
+existing tool fits our needs; for example, no examined project provides a
+complex execution/evaluation pipeline that would support the needs of courses
+like Compiler Principles. Modifying CodEx is not an option either -- the
+required scope of the new solution is too large. To sum up, a new evaluation
+system has to be written, reusing only small parts of CodEx code (for example
+the judges).
+
+The new project is **ReCodEx -- ReCodEx Code Examiner**. The name refers to
+CodEx, the previous evaluation solution, but also reflects a new approach to
+solving its issues. The **Re** part of the name stands for redesigned,
+rewritten, renewed or restarted.
+
+The official assignment of the project is available on the [website of the
+software project committee](http://www.ksi.mff.cuni.cz/sw-projekty/zadani/recodex.pdf)
+(in Czech only). The most notable features are the following:
+
+- modern HTML5 web frontend written in JavaScript using a suitable framework
+- REST API implemented in PHP, communicating with the database, the backend
+  and the file server
+- backend implemented as a distributed system on top of a message queue
+  framework (ZeroMQ) with a master-worker architecture
+- worker with basic support for the Windows environment (without a sandbox, as
+  no suitable general-purpose tool is available yet)
+- evaluation procedure configured in a YAML file, composed of small tasks
+  connected into an arbitrary directed acyclic graph
+
 Structure of the project
 ------------------------
@@ -669,14 +818,15 @@ First, write header of the job to the configuration file.
 ```{.yml}
 submission:
     job-id: hello-world-job
-    file-collector: http://localhost:9999/exercises
     hw-groups:
        - group1
 ```

-Basically it means, that the job _hello-world-job_ is for C language and needs
-to be run on workers with capabilities of _group1_ group. Reference files are
-downloaded from http://localhost:9999/exercises.
+Basically it means that the job _hello-world-job_ needs to be run on workers
+with the capabilities of the _group1_ group. Unless explicitly stated
+otherwise, reference files are downloaded from the default location configured
+in the API (probably `http://localhost:9999/exercises`). The job execution log
+will not be saved to the result archive.

 Next the tasks have to be constructed under _tasks_ section. In this demo job,
 every task depends only on previous one. The first task has input file
@@ -716,11 +866,18 @@
 the program cannot be executed without being compiled first. It is important to
 mark this task with _execution_ type, so exceeded limits will be reported in
 frontend.
-@todo describe overriding of default (per worker) limits and that we cannot
-relax them
-@todo state clearly that the limits on stdout, stderr and file IO are shared
-(but if outputs are ignored (not redirected to a file), the program can use them
-at will)
+Time and memory limits set directly for a task have a higher priority than the
+worker defaults. One important constraint is that these limits cannot exceed
+the limits set by the workers; they can only be made stricter. The worker
+defaults serve as a safety measure, so that a wrong job configuration cannot
+block a whole worker forever. The worker default limits should therefore be
+set reasonably high, such as a gigabyte of memory and a couple of hours of
+execution time. For the exact numbers, please contact your administrator.
+
+It is worth noting here that if the output of a program (both standard and
+error) is redirected to a file, the sandbox disk quotas apply to these files
+just as they do to files created directly by the program. However, if the
+outputs are ignored, they are redirected to `/dev/null`, where an arbitrary
+amount of data can be written.

 ```{.yml}
 - task-id: "execution_1"
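+  # Limits configured directly for this task take precedence over the worker
+  # defaults, but may only tighten them; they can never exceed the limits set
+  # by the worker. Redirected stdout/stderr files count towards the sandbox
+  # disk quota, while ignored outputs go to /dev/null.
+  # An illustrative sketch of such an override (field names are examples only,
+  # not necessarily the exact schema):
+  #     limits:
+  #         - hw-group-id: group1
+  #           time: 0.5      # seconds
+  #           memory: 16384  # kilobytes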