diff --git a/Introduction.md b/Introduction.md index 1bcdd10..f96289e 100644 --- a/Introduction.md +++ b/Introduction.md @@ -1,31 +1,82 @@ # Introduction -Generally, there are a lot of different ways and opinions how to teach people something new. However, the significant majority of them suggested hands-on experience as one of the best techniques to make human brain to remember. Learning must be fun. Some kinds of knowledge are more suitable for this practical type of learning than others, programming is fortunatelly one of the better ones. Possibility of trying things in real time is just amazing. - -This knowledge is needed to reflect also in university education system. In computer science, there are several specific requirements -- the code must be efficient and easy to read, maintain and extend. Correctness and efficiency can be tested automatically to help teachers save time for their research, but checking for bad code, habbits and mistakes is really hard to automate and requires manpower. - -Checking programs written by students is very timeconsuming and also boring. First idea of automatic evaluation system comes from Standford University profesors in 1965. They implemented such software, which evaluates code in Algol submitted on punch cards. In following years, many similar products were written. - -There are two main ways how to automatically evaluate source code -- statically (check the code without running it; safe, but not much precise) or dynamically (run the code on testing inputs with checking the outputs against reference ones; needs sandboxing, but provides good real world experience). - -This project focuses on the machine-controlled part of source code evaluation. First, problems of present software at our university were discussed and similar projects at other educational institutions were examined. 
With acquired knowledge from such projects in production, we set up goals for the new evaluation system, designed the architecture and implemented working solution. The system is now ready for pruduction testing at our university. +Generally, there are a lot of different ways and opinions on how to teach people +something new. However, most people agree that a hands-on experience is one of +the best ways to make the human brain remember a new skill. Learning must be +entertaining and interactive, with fast and frequent feedback. Some kinds of +knowledge are more suitable for this practical type of learning than others, and +fortunately, programming is one of them. + +The university education system is one of the areas where this knowledge can be +applied. In computer programming, there are several requirements such as the +code being syntactically correct, efficient and easy to read, maintain and +extend. Correctness and efficiency can be tested automatically to help teachers +save time for their research, but checking for bad design, habits and mistakes +is really hard to automate and requires manpower. + +Checking programs written by students takes a lot of time and requires a lot of +mechanical, repetitive work. The first idea of an automatic evaluation system +comes from Stanford University professors in 1965. They implemented a system +which evaluated code in Algol submitted on punch cards. In the following years, +many similar products were written. + +There are two basic ways of automatically evaluating code -- statically (check +the code without running it; safe, but not very precise) or dynamically (run the +code on testing inputs and check the outputs against reference ones; needs +sandboxing, but provides good real-world experience). + +This project focuses on the machine-controlled part of source code evaluation. +First, problems of present software at our university were discussed and similar +projects at other educational institutions were examined. 
With acquired +knowledge from such projects in production, we set up goals for the new +evaluation system, designed the architecture and implemented a fully operational +solution. The system is now ready for production testing at our university. ## Current solution at MFF UK -Ideas presented above are not completely new. There was a group of students, who already implemented an evaluation solution for student's homeworks back in 2006. Its name is [CodEx - The Code Examiner](http://codex.ms.mff.cuni.cz/project/) and it is used with some improvements till now. Original plan was to use the system only for basic programming courses, but now it is used much more widely and in different conditions than intended. +The ideas presented above are not completely new. A group of students +already implemented an evaluation solution for students' homework in 2006. +Its name is [CodEx - The Code Examiner](http://codex.ms.mff.cuni.cz/project/) +and it has been used with some improvements since then. The original plan was to +use the system only for basic programming courses, but there is demand for +adapting it for many different subjects. -CodEx is based on dynamic analysis. It is a system with web-based interface, where supervisors assign exercises to their students and the students have a time window to submit the solution. Each solution is compiled and run in sandbox (MO-Eval). The metrics which are checked are: corectness of the output, time and memory limits. Supported languages are C, C++, C#, Java, Pascal, Python and Haskel. +CodEx is based on dynamic analysis. It features a web-based interface, where +supervisors assign exercises to their students and the students have a time +window to submit the solution. Each solution is compiled and run in a sandbox +(MO-Eval). The metrics which are checked are: correctness of the output, time and +memory limits. It supports programs written in C, C++, C#, Java, Pascal, Python +and Haskell. 
Current system is old, but robust. There were no major security incidents during its production usage. However, from today's perspective there are several drawbacks. The main ones are: -- **web interface** -- The web interface is simple and fully functional. But rapid development in web technologies opens new horizons of how web interface can be made. -- **web api** -- There is only very limited XML API based on outdated technologies in current CodEx. This locks users from creating custom interfaces like command line tool or mobile application. -- **sandboxing** -- MO-Eval sandbox is based on principle of monitoring system calls into operating system and blocking the bad ones. This could be easily done only for single-threaded applications. These days parallelism is very important part of computing, so there is requirement to test multi-threaded applications too. -- **instances** -- Different ways of CodEx usage scenarios requires separate instances (Programming I and II, Java, C#, etc.). This configuration is not user friendly (students have to register to each instance again) and puts unnecessary work to administrators. CodEx architecture does not allow to share hardware between instances, so not trivial amount of additional hardware is occupied. -- **task extensibility** -- There is a need to test and evaluate complicated programs from Parallel programming or Compiler principles classes, which have more difficult evaluation chain than simple compilation/execution/evaluation. CodEx is only capable of the simple solution without possibility of easy extension. - -After considerring all these facts, CodEx cannot be used anymore. The project is too old to just maintain it and extend for modern technologies. Thus, it needs to be completely rewritten or another solution must be found. +- **web interface** -- The web interface is simple and fully functional. But + rapid development in web technologies opens new horizons of how web interface + can be made. 
+- **web api** -- CodEx offers a very limited XML API based on outdated + technologies that isn't sufficient for users who would like to create custom + interfaces such as a command line tool or mobile application. +- **sandboxing** -- MO-Eval sandbox is based on the principle of monitoring system + calls and blocking the bad ones. This can be easily done for single-threaded + applications, but proves difficult with multi-threaded ones. Nowadays, + parallelism is a very important area of computing, so there is a requirement to + test multi-threaded applications too. +- **instances** -- Different CodEx usage scenarios require separate + instances (Programming I and II, Java, C#, etc.). This configuration is not + user friendly (students have to register in each instance separately) and + burdens administrators with unnecessary work. CodEx architecture does not + allow sharing hardware between instances, which results in an inefficient use + of hardware for evaluation. +- **task extensibility** -- There is a need to test and evaluate complicated + programs for classes such as Parallel programming or Compiler principles, + which have a more difficult evaluation chain than simple + compilation/execution/evaluation provided by CodEx. + +After considering all these facts, it is clear that CodEx cannot be used +anymore. The project is too old to just maintain it and extend it for modern +technologies. Thus, it needs to be completely rewritten or another solution must +be found. ## Analysis of related projects @@ -34,7 +85,12 @@ First of all, some code evaluating projects were found and examined. It is not a ### Progtest -[Progtest](https://progtest.fit.cvut.cz/) is private project from FIT ČVUT in Prague. As far as we know it is used for C/C++, Bash programming and knowledge-based quizzes. There are several bonus points and penalties and also a few hints what is failing in submitted solution. 
It is very strict on source code quality, for example `-pedantic` option of GCC, Valgring for memory leaks or array boundaries checks via `mudflap` library. +[Progtest](https://progtest.fit.cvut.cz/) is a private project from FIT ČVUT in +Prague. As far as we know it is used for C/C++, Bash programming and +knowledge-based quizzes. There are several bonus points and penalties and also a +few hints about what is failing in the submitted solution. It is very strict on +source code quality, using for example the `-pedantic` option of GCC, Valgrind +for memory leaks or array boundary checks via the `mudflap` library. ### Codility @@ -42,26 +98,52 @@ First of all, some code evaluating projects were found and examined. It is not a ### CMS -[CMS](http://cms-dev.github.io/index.html) is an opensource distributed system for running and organizing programming contests. It is written in Python and contain several modules. CMS supports C/C++, Pascal, Python, PHP and Java languages. PostgreSQL is single point of failure, all modules heavily depends on database connection. Task evaluation can be only three step pipeline -- compilation, execution, evaluation. Execution is performed in [Isolate](https://github.com/ioi/isolate), sandbox written by consultant of our project, Mgr. Martin Mareš, Ph.D. +[CMS](http://cms-dev.github.io/index.html) is an open-source distributed system +for running and organizing programming contests. It is written in Python and +contains several modules. CMS supports C/C++, Pascal, Python, PHP and Java. +PostgreSQL is a single point of failure; all modules heavily depend on the +database connection. Task evaluation can only be a three-step pipeline -- +compilation, execution, evaluation. Execution is performed in +[Isolate](https://github.com/ioi/isolate), a sandbox written by the consultant of +our project, Mgr. Martin Mareš, Ph.D. ### MOE -[MOE](http://www.ucw.cz/moe/) is grading system written in Shell scripts, C and Python. 
It does not provide default GUI interface, all actions have to be performed from command line. The system does not evaluate submissions in real time, results are computed in batch mode after exercise deadline. Used sandboxing environment is Isolate. Parts of MOE are used in other systems like CodEx or CMS, but the system is generally obsolete. +[MOE](http://www.ucw.cz/moe/) is a grading system written in Shell scripts, C +and Python. It does not provide a default GUI; all actions have to be +performed from the command line. The system does not evaluate submissions in real +time; results are computed in batch mode after the exercise deadline, using +Isolate for sandboxing. Parts of MOE are used in other systems like CodEx or CMS, +but the system is generally obsolete. ### Kattis -[Kattis](http://www.kattis.com/) is another SaaS solution. It provides pretty nice web UI, but the rest of the application is too simple. Nice feature is usage of [standartized format](http://www.problemarchive.org/wiki/index.php/Problem_Format) for exercises. Kattis is primarily used by programming contest organizators, company recruiters and also some universities. +[Kattis](http://www.kattis.com/) is another SaaS solution. It provides a clean +and functional web UI, but the rest of the application is too simple. A nice +feature is the usage of a [standardized +format](http://www.problemarchive.org/wiki/index.php/Problem_Format) for +exercises. Kattis is primarily used by programming contest organizers, company +recruiters and also some universities. ## ReCodEx goals -From the research above, we set up several goals, which a new system should have. They mostly reflect drawbacks of current version of CodEx. No existing tool fits our needs, for example no examined project provides complex execution/evaluation pipeline to support needs of courses like Compiler principles. Modifying existing project is also not an option, because of specific university environment. 
To sum up, existing CodEx has to be completely rewritten, with only small parts of adopted code (for example judges). +From the research above, we set up several goals, which a new system should +have. They mostly reflect drawbacks of current version of CodEx. No existing +tool fits our needs, for example no examined project provides complex +execution/evaluation pipeline to support needs of courses like Compiler +principles. Modifying CodEx is also not an option -- the required scope of a new +solution is too big. To sum up, a new evaluation system has to be written, with +only small parts of reused code from CodEx (for example judges). -The new project is **ReCodEx - ReCodEx Code Examiner**. The name should point to CodEx, previous evaluation solution, but also reflect new approach to solve issues. **Re** as part of the name means redesigned, rewrited, renewed or restarted. +The new project is **ReCodEx - ReCodEx Code Examiner**. The name should point to +CodEx, previous evaluation solution, but also reflect new approach to solve +issues. **Re** as part of the name means redesigned, rewritten, renewed or +restarted. Official assignment of the project is available [here](http://www.ksi.mff.cuni.cz/sw-projekty/zadani/recodex.pdf) (only in czech). Most notable features are following: -- modern HTML5 web application written in Javascript using suitable framework +- modern HTML5 web frontend written in Javascript using a suitable framework - REST API implemented in PHP, communicating with database, backend and file server - backend is implemented as distributed system on top of message queue framework (ZeroMQ) with master-worker architecture - worker with basic support of Windows environment (without sandbox, no general purpose suitable tool available yet) @@ -72,23 +154,59 @@ Official assignment of the project is available [here](http://www.ksi.mff.cuni.c Official terminology of ReCodEx which will be used in documentation and within code. 
-* **Exercise** -- Exercise is a template of programming problem including detailed text description, evaluation instructions, sample implementation and reference inputs and outputs. Author of exercise is mostly lecturer of a programming class. - -* **Assignment** -- Assignment is basically instance of exercise which was assigned to a group of students by their supervisor. Supervisor can alter predefined restrictions for resulting code (execution time limit, etc.), deadlines and maximal amount of points for correct solutions. - -* **Reference solution** -- Solution of exercise provided by author. This solution should pass all test cases and could be also used for auto-calibration of the exercise. One exercise could have more reference solutions, for example in different programming languages or with various level of complexity. - -* **Submission** -- Submission is one student solution of an assignment received by ReCodEx API. Submission can contain submitted source code and additional information about assignment, exercise or submitter. - -* **Job** -- Piece of work for worker, generally corresponding to evaluation of one submission. There are also other types of jobs like benchmarking submission for memory and time limits configuration, but this classification has no effect for evaluation. Internally job is set of small tasks defined in exercise configuration. Job itself is transfered in form of archive with submitted source codes and configuration file written in YAML. - -* **Task** -- Atomic piece of work defined in job configuration which can execute external program or some internal command. External program execution is (mostly) performed in sandboxed environment, internal commands are executed directly. For example, one task could make a new directory, copy a file or compile source codes using GCC. - -* **Test** -- Test is a piece of work to check correctness of a program. 
There are multiple tests inside job, which together checks validity and correctness of all aspects of exercise solution. In easiest case, testing is done by providing reference inputs to the tested program and results are compared with reference outputs. One test consists of multiple tasks. - -* **Judge** -- Judge is a standalone comparision program which compares sample outputs against output from tested program. - -* **Limits** -- Tasks executing external programs are usually executed in sandbox with defined limits on run time, consumed memory, used disk space and others. These limits are specified in job configuration. The term _limits_ in this context means all the restrictions together. - -* **Hwgroup** -- Hardware group is set of workers with similar hardware capabilities. Each group has unique string identifier and every worker in particular group has that identifier inside its configuration. Hardware group management is done manually by system administrator. Jobs are routed to the workers according to hwgroup, limits are also tied up with specific hwgroup. +* **Exercise** -- Exercise is a template of programming problem including + detailed text description, evaluation instructions, sample implementation and + reference inputs and outputs. Typically, an author of exercise is a lecturer + of a programming class. + +* **Assignment** -- Assignment is basically an instance of an exercise which was + assigned to a group of students by their supervisor. Supervisor can alter + predefined restrictions for resulting code (execution time limit, etc.), + deadlines and maximal amount of points for correct solutions. + +* **Reference solution** -- Solution of exercise provided by author. This + solution should pass all test cases and could be also used for + auto-calibration of the exercise. One exercise could have more reference + solutions, for example in different programming languages or with varied + levels of efficiency. 
+ +* **Submission** -- Submission is one student's solution of an assignment + received by ReCodEx API. Submission can contain submitted source code and + additional information about assignment, exercise or submitter. + +* **Job** -- Piece of work for a worker, generally corresponding to evaluation + of one submission. There are also other types of jobs like benchmarking + submission for memory and time limits configuration, but this classification + has no effect for evaluation. Internally, a job is a set of small tasks defined + in exercise configuration. The job itself is transferred in the form of an archive + with submitted source code and a configuration file written in YAML. + +* **Task** -- Atomic piece of work defined in job configuration which can + execute external program or some internal command. External program execution + is (mostly) performed in sandboxed environment, internal commands are executed + directly. For example, one task could make a new directory, copy a file or + compile source code using GCC. + +* **Test** -- Test is a logical part of a job that checks the correctness of a + program. There can be multiple tests inside a job, which together prove the + validity and correctness of all aspects of the solution. In the simplest case, + testing is done by providing reference inputs to the tested program and + results are compared with reference outputs. One test consists of multiple + tasks. + +* **Judge** -- Judge is a standalone comparison program that compares sample + outputs against output from tested programs. + +* **Limits** -- Tasks executing external programs are usually executed in a + sandbox with defined limits on execution time, allocated memory, used disk + space and others. These limits are specified in job configuration. The term + _limits_ in this context means all the restrictions together. + +* **Hwgroup** -- Hardware group is a set of workers with similar hardware. 
Its + purpose is to group workers that are likely to run a program using the same + amount of resources. Test limits are defined separately for each group. A + group has a unique string identifier and every worker in a group has this + identifier in its configuration file. Hardware group management is done + manually by the system administrator. Jobs can be routed to workers based on + hwgroup.
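
The terminology added in this diff (job, test, task, limits, hwgroup, judge) can be pictured as a small data model. The following is only a minimal Python sketch under stated assumptions, not ReCodEx code: every class, field and function name (`Limits`, `Task`, `Test`, `Job`, `limits_for`, `judge`), the hwgroup identifier `group1` and the token-wise comparison are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Limits:
    """All restrictions applied to one sandboxed task (hypothetical fields)."""
    time_s: float
    memory_kb: int


@dataclass
class Task:
    """Atomic piece of work: an internal command or an external program."""
    name: str
    command: List[str]
    sandboxed: bool = False
    # Limits are defined separately per hwgroup, so key them by its identifier.
    limits: Dict[str, Limits] = field(default_factory=dict)

    def limits_for(self, hwgroup: str) -> Optional[Limits]:
        """Return the limits for the given hwgroup, or None if undefined."""
        return self.limits.get(hwgroup)


@dataclass
class Test:
    """Logical part of a job; one test consists of multiple tasks."""
    name: str
    tasks: List[Task]


@dataclass
class Job:
    """Piece of work for a worker, usually the evaluation of one submission."""
    submission_id: str
    tests: List[Test]

    def all_tasks(self) -> List[Task]:
        """Flatten the job into its tasks, in test order."""
        return [task for test in self.tests for task in test.tasks]


def judge(expected: str, actual: str) -> bool:
    """Assumed judge: compare outputs token by token, ignoring whitespace."""
    return expected.split() == actual.split()


# Hypothetical job with one test made of a compile task and a run task.
job = Job(
    submission_id="demo",
    tests=[Test("test01", [
        Task("compile", ["gcc", "solution.c", "-o", "solution"], sandboxed=True,
             limits={"group1": Limits(time_s=10.0, memory_kb=262144)}),
        Task("run", ["./solution"], sandboxed=True,
             limits={"group1": Limits(time_s=1.0, memory_kb=65536)}),
    ])],
)
```

Keying limits by hwgroup identifier mirrors the statement that test limits are defined separately for each group; a worker would look up the entry matching the identifier from its own configuration file.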