master
Martin Polanka 8 years ago
parent 77f8873ea3
commit 55bfd6b5a1

@ -6,20 +6,20 @@ This knowledge is needed to reflect also in university education system. In comp
Checking programs written by students is very time-consuming and also boring. The first idea of an automatic evaluation system came from Stanford University professors in 1965. They implemented software that evaluated code in Algol submitted on punch cards. In the following years, many similar products were written.
There are two main ways to evaluate source code automatically -- statically (check the code without running it; safe, but not very precise) or dynamically (run the code on test inputs and check the outputs against reference ones; needs sandboxing, but provides good real-world experience).
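The dynamic approach can be sketched in a few lines of Python. This is a minimal illustration, not the method of any system discussed here: the function name `evaluate_dynamically`, the token-based output comparison and the default time limit are all assumptions, and no sandboxing is performed, so it must not be used on untrusted code.

```python
import subprocess
import sys


def evaluate_dynamically(command, test_input, expected_output, time_limit=5.0):
    """Run `command` on `test_input` and compare stdout with the reference.

    Hypothetical helper for illustration only: there is NO sandboxing here.
    """
    try:
        result = subprocess.run(
            command,
            input=test_input,
            capture_output=True,
            text=True,
            timeout=time_limit,
        )
    except subprocess.TimeoutExpired:
        return "time limit exceeded"
    if result.returncode != 0:
        return "runtime error"
    # Assumed judging rule: compare whitespace-separated tokens only.
    if result.stdout.split() == expected_output.split():
        return "ok"
    return "wrong answer"


# A stand-in "student solution" that doubles the number read from stdin.
solution = [sys.executable, "-c", "print(int(input()) * 2)"]
outcome = evaluate_dynamically(solution, "21\n", "42\n")
```

A real evaluator would run the command inside a sandbox and would also measure memory usage, which this sketch leaves out.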
In this project, we'll focus on the machine-controlled part of source code evaluation. First, problems of the present software at our university will be discussed, and then similar projects at other educational institutions will be examined. With the knowledge acquired from such projects in production, we'll set up goals for the new evaluation system, design the architecture and implement a working solution. If there is enough time, we'll test it in production at our university.
## Current solution at MFF UK
The ideas presented above aren't completely new. A group of students already implemented an evaluation solution for students' homework back in 2006. The system was rewritten several times after that, but since 2010 there has been only one update. Its name is [CodEx - The Code Examiner](http://codex.ms.mff.cuni.cz/project/) and it's still in use.
CodEx is based on dynamic analysis. It's a system with a web-based interface, where supervisors assign exercises to their students and the students have a time window to submit their solutions. Each solution is compiled and run in a sandbox (MO-Eval). The metrics checked are correctness of the output and time and memory limits. Supported languages are C, C++, C#, Pascal, Java and Haskell.
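The three metrics could be combined into a single verdict roughly as follows. This is a sketch under assumptions, not CodEx's actual logic: the `Limits` and `Measurement` types, their field names, the verdict strings and the order in which the checks are applied are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Limits:
    time_s: float      # hypothetical: maximum run time in seconds
    memory_kib: int    # hypothetical: maximum memory in KiB


@dataclass
class Measurement:
    time_s: float
    memory_kib: int
    output_correct: bool


def classify(measured: Measurement, limits: Limits) -> str:
    # Assumed order: resource limits are checked before output correctness.
    if measured.time_s > limits.time_s:
        return "time limit exceeded"
    if measured.memory_kib > limits.memory_kib:
        return "memory limit exceeded"
    return "ok" if measured.output_correct else "wrong output"


limits = Limits(time_s=1.0, memory_kib=65536)
fast_and_correct = classify(Measurement(0.2, 1024, True), limits)
too_slow = classify(Measurement(1.5, 1024, True), limits)
```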
The current system is old, but robust. There was no major security incident during its production usage. However, from today's perspective there are several drawbacks. The main ones are:
- **web interface** - The web interface is simple and fully functional. But rapid development in web technologies opens new horizons of how a web interface can be made.
- **web api** - There is no API support in the current CodEx. This prevents users from creating custom interfaces like a command line tool or a mobile application.
- **sandboxing** - The MO-Eval sandbox is based on the principle of monitoring system calls into the operating system and blocking the bad ones. This can be done easily only for single-threaded applications. These days parallelism is a very important part of computing, so there is a requirement to test multi-threaded applications too.
- **hardware occupation** - The configuration of CodEx doesn't allow sharing hardware between instances. Due to the current architecture there are several separate instances (Programming I and II, Java, C#, etc.) which occupy a non-trivial amount of hardware.
@ -34,7 +34,7 @@ First of all, some code evaluating projects were found and examined. It's not a
### Progtest
[Progtest](https://progtest.fit.cvut.cz/) is a private project from FIT ČVUT in Prague. As far as we know, it's used for C/C++, Bash programming and knowledge-based quizzes. There are several bonus points and penalties and also a few hints about what is failing in the submitted solution. It's very strict on source code quality, for example the `-pedantic` option of GCC, Valgrind for memory leaks or array boundary checks via the `mudflap` library.
### Codility
@ -42,7 +42,7 @@ First of all, some code evaluating projects were found and examined. It's not a
### CMS
[CMS](http://cms-dev.github.io/index.html) is an open-source distributed system for running and organizing programming contests. It's written in Python and contains several modules. CMS supports the C/C++, Pascal, Python, PHP and Java languages. PostgreSQL is a single point of failure; all modules depend heavily on the DB connection. Task evaluation can only be a three-step pipeline -- compilation, execution, evaluation. Execution is performed in [Isolate](https://github.com/ioi/isolate), a sandbox written by the consultant of our project, Mgr. Martin Mareš, Ph.D.
### MOE
@ -53,9 +53,9 @@ First of all, some code evaluating projects were found and examined. It's not a
[Kattis](http://www.kattis.com/) is another SaaS solution. It's used for contests, companies and also some universities. The web UI is pretty nice, but everything is too simple. They use a [standardized format](http://www.problemarchive.org/wiki/index.php/Problem_Format) for their exercises.
## ReCodEx goals
From the research above, we set up several goals that a new system should meet. They mostly reflect drawbacks of the current version of CodEx. No existing tool fits our needs; for example, no examined project provides a complex execution/evaluation pipeline to support the needs of courses like Compiler principles. Modifying an existing project is also not an option because of the specific university environment. To sum up, the existing CodEx has to be completely rewritten, with only small parts of adopted code (for example judges).
The new project is **ReCodEx - ReCodEx Code Examiner**. The name should point to CodEx, the previous evaluation solution, but also reflect the new approach to solving issues. **Re** as part of the name means redesigned, rewritten, renewed or restarted.
@ -80,7 +80,7 @@ Official terminology of ReCodEx which will be used in documentation and within c
* **Submission** - A submission is one solution of a given exercise, sent by a student to the frontend of ReCodEx. This term can also include all additional information about the source code or the submitter.
* **Job** - A piece of work for a worker. Internally, it performs a set of small tasks from the job configuration. The job itself is transferred in the form of an archive with the submitted source code and a configuration file written in YAML. Typically, a job is one standard submission, but there could also be a benchmarking submission for configuring limits of an exercise, or maybe a submission for determining the hardware and software configuration of a given worker. This classification has no effect on evaluation of the job.
* **Task** - An atomic piece of work which can execute an external program or some internal command.
@ -93,4 +93,3 @@ Official terminology of ReCodEx which will be used in documentation and within c
* **Limits** - Particular tasks usually run in a sandbox; these limits are forwarded to the sandbox.
* **Hwgroup** - A hardware group reflects the hardware capabilities of each worker. It's just a string identifier assigned by the administrator to each worker. Jobs are routed to the workers according to hwgroup; limits are also tied to a specific hwgroup.
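
To make the relation between jobs, tasks, limits and hwgroups more concrete, here is a small Python sketch. All class names, field names and the `route` function are hypothetical and chosen only for illustration; the real job configuration is a YAML file whose format is not described in this section.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Task:
    task_id: str
    command: str  # external program or internal command
    # Limits are tied to a hwgroup: hwgroup name -> time limit in seconds.
    time_limits: Dict[str, float] = field(default_factory=dict)


@dataclass
class Job:
    job_id: str
    hwgroup: str  # assumption: the job names the hwgroup it should run on
    tasks: List[Task] = field(default_factory=list)


@dataclass
class Worker:
    name: str
    hwgroup: str  # string identifier set by the administrator


def route(job: Job, workers: List[Worker]) -> Optional[Worker]:
    """Return the first worker whose hwgroup matches the job, if any."""
    for worker in workers:
        if worker.hwgroup == job.hwgroup:
            return worker
    return None


workers = [Worker("w1", "group1"), Worker("w2", "group2")]
job = Job("job-1", "group2", [Task("compile", "gcc", {"group2": 5.0})])
chosen = route(job, workers)
```

Keying the limits by hwgroup name mirrors the statement that limits are tied to a specific hwgroup: the same task may get different limits on different hardware.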
