# Introduction

Generally, there are many different ways and opinions on how to teach people
something new. However, most people agree that hands-on experience is one of
the best ways to make the human brain remember a new skill. Learning must be
entertaining and interactive, with fast and frequent feedback. Some kinds of
knowledge are more suitable for this practical type of learning than others, and
fortunately, programming is one of them.

The university education system is one of the areas where this knowledge can be
applied. In computer programming, there are several requirements on the code: it
should be syntactically correct, efficient and easy to read, maintain and
extend. Correctness and efficiency can be tested automatically to help teachers
save time for their research, but checking for bad design, habits and mistakes
is really hard to automate and requires manpower.

Checking programs written by students takes a lot of time and requires a lot of
mechanical, repetitive work. The first idea of an automatic evaluation system
came from Stanford University professors in 1965. They implemented a system
which evaluated code in Algol submitted on punch cards. In the following years,
many similar products were written.

There are two basic ways of automatically evaluating code -- statically
(checking the code without running it; safe, but not very precise) or
dynamically (running the code on test inputs and checking the outputs against
reference ones; needs sandboxing, but provides good real-world experience).

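The dynamic approach can be illustrated with a minimal Python sketch. It is only
an illustration under stated assumptions, not code from CodEx or ReCodEx: the
function name `evaluate`, the verdict strings and the stand-in `SOLUTION`
program are all hypothetical, and the sketch deliberately omits sandboxing,
which a real evaluator must provide.

```python
import subprocess
import sys


def evaluate(command, test_input, expected_output, time_limit=2.0):
    """Run `command` on one test input and compare its output to the reference.

    Returns "OK", "WRONG_ANSWER", "RUNTIME_ERROR" or "TIME_LIMIT".
    NOTE: there is no sandboxing here; the process runs with full privileges.
    """
    try:
        result = subprocess.run(
            command,
            input=test_input,
            capture_output=True,
            text=True,
            timeout=time_limit,
        )
    except subprocess.TimeoutExpired:
        return "TIME_LIMIT"
    if result.returncode != 0:
        return "RUNTIME_ERROR"
    # Compare whitespace-separated tokens so trailing newlines do not matter.
    if result.stdout.split() == expected_output.split():
        return "OK"
    return "WRONG_ANSWER"


# Hypothetical stand-in for a student's solution: sums integers from stdin.
SOLUTION = [sys.executable, "-c", "print(sum(map(int, input().split())))"]
```

Calling `evaluate(SOLUTION, "1 2 3\n", "6\n")` runs the stand-in program on one
reference input and reports whether its output matches the reference output.
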
This project focuses on the machine-controlled part of source code evaluation.
First, problems of the present software at our university were discussed and
similar projects at other educational institutions were examined. With the
knowledge acquired from such projects in production, we set up goals for the new
evaluation system, designed the architecture and implemented a fully operational
solution. The system is now ready for production testing at our university.

## Current solution at MFF UK

The ideas presented above are not completely new. A group of students already
implemented an evaluation solution for student homework in 2006.
Its name is [CodEx - The Code Examiner](http://codex.ms.mff.cuni.cz/project/)
and it has been used with some improvements since then. The original plan was to
use the system only for basic programming courses, but there is demand for
adapting it for many different subjects.

CodEx is based on dynamic analysis. It features a web-based interface, where
supervisors assign exercises to their students and the students have a time
window to submit the solution. Each solution is compiled and run in a sandbox
(MO-Eval). The metrics which are checked are: correctness of the output, time
and memory limits. It supports programs written in C, C++, C#, Java, Pascal,
Python and Haskell.

The current system is old, but robust. There were no major security incidents during its production usage. However, from today's perspective there are several drawbacks. The main ones are:

- **web interface** -- The web interface is simple and fully functional, but
  rapid development in web technologies opens new possibilities for how a web
  interface can be built.
- **web api** -- CodEx offers a very limited XML API based on outdated
  technologies, which is not sufficient for users who would like to create
  custom interfaces such as a command line tool or a mobile application.
- **sandboxing** -- The MO-Eval sandbox is based on the principle of monitoring
  system calls and blocking the bad ones. This can be easily done for
  single-threaded applications, but proves difficult with multi-threaded ones.
  Nowadays, parallelism is a very important area of computing, so there is a
  requirement to test multi-threaded applications too.
- **instances** -- Different CodEx usage scenarios require separate
  instances (Programming I and II, Java, C#, etc.). This configuration is not
  user friendly (students have to register in each instance separately) and
  burdens administrators with unnecessary work. The CodEx architecture does not
  allow sharing hardware between instances, which results in an inefficient use
  of hardware for evaluation.
- **task extensibility** -- There is a need to test and evaluate complicated
  programs for classes such as Parallel programming or Compiler principles,
  which have a more difficult evaluation chain than the simple
  compilation/execution/evaluation provided by CodEx.

After considering all these facts, it is clear that CodEx cannot be used
anymore. The project is too old to just maintain it and extend it for modern
technologies. Thus, it needs to be completely rewritten or another solution must
be found.

## Analysis of related projects

First of all, some code evaluating projects were found and examined. This is not a complete list of such evaluators, but just a few projects which are used these days and can be an inspiration for our project.

### Progtest

[Progtest](https://progtest.fit.cvut.cz/) is a private project from FIT ČVUT in
Prague. As far as we know, it is used for C/C++, Bash programming and
knowledge-based quizzes. There are several bonus points and penalties and also a
few hints about what is failing in the submitted solution. It is very strict on
source code quality, for example using the `-pedantic` option of GCC, Valgrind
for memory leaks or array boundary checks via the `mudflap` library.

### Codility

[Codility](https://codility.com/) is a web-based solution primarily targeted at company recruiters. It is a commercial SaaS product supporting 16 programming languages. The [UI](http://1.bp.blogspot.com/-_isqWtuEvvY/U8_SbkUMP-I/AAAAAAAAAL0/Hup_amNYU2s/s1600/cui.png) of Codility is [open source](https://github.com/Codility/cui), the rest of the source code is not available. One interesting feature is the 'task timeline' -- the captured progress of writing code for each user.

### CMS

[CMS](http://cms-dev.github.io/index.html) is an open source distributed system
for running and organizing programming contests. It is written in Python and
contains several modules. CMS supports C/C++, Pascal, Python, PHP and Java.
PostgreSQL is a single point of failure, because all modules heavily depend on
the database connection. Task evaluation can only be a three-step pipeline --
compilation, execution, evaluation. Execution is performed in
[Isolate](https://github.com/ioi/isolate), a sandbox written by the consultant
of our project, Mgr. Martin Mareš, Ph.D.

### MOE

[MOE](http://www.ucw.cz/moe/) is a grading system written in Shell scripts, C
and Python. It does not provide a default GUI, so all actions have to be
performed from the command line. The system does not evaluate submissions in
real time; results are computed in batch mode after the exercise deadline, using
Isolate for sandboxing. Parts of MOE are used in other systems like CodEx or
CMS, but the system is generally obsolete.

### Kattis

[Kattis](http://www.kattis.com/) is another SaaS solution. It provides a clean
and functional web UI, but the rest of the application is too simple. A nice
feature is the usage of a
[standardized format](http://www.problemarchive.org/wiki/index.php/Problem_Format)
for exercises. Kattis is primarily used by programming contest organizers,
company recruiters and also some universities.

## ReCodEx goals

From the research above, we set up several goals which the new system should
meet. They mostly reflect drawbacks of the current version of CodEx. No existing
tool fits our needs; for example, no examined project provides a complex
execution/evaluation pipeline to support the needs of courses like Compiler
principles. Modifying CodEx is also not an option -- the required scope of a new
solution is too big. To sum up, a new evaluation system has to be written, with
only small parts of code reused from CodEx (for example judges).

The new project is **ReCodEx -- ReCodEx Code Examiner**. The name should point
to CodEx, the previous evaluation solution, but also reflect the new approach to
solving its issues. **Re** as part of the name means redesigned, rewritten,
renewed or restarted.

The official assignment of the project is available on the [web of the software project committee](http://www.ksi.mff.cuni.cz/sw-projekty/zadani/recodex.pdf) (only in Czech). The most notable features are the following:

- modern HTML5 web frontend written in JavaScript using a suitable framework
- REST API implemented in PHP, communicating with the database, backend and file server
- backend implemented as a distributed system on top of a message queue framework (ZeroMQ) with a master-worker architecture
- worker with basic support of the Windows environment (without a sandbox, as no suitable general-purpose tool is available yet)
- evaluation procedure configured in a YAML file, composed of small tasks connected into an arbitrary directed acyclic graph

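The last point, tasks connected into a directed acyclic graph, can be
illustrated with a short Python sketch. It assumes a hypothetical job described
as a dictionary mapping each task name to the tasks it depends on; the task
names, the `TASKS` dictionary and the `execution_order` helper are illustrative
only and do not reflect the actual ReCodEx configuration format. The ordering
itself uses `graphlib.TopologicalSorter` from the Python standard library
(Python 3.9 or newer).

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical job: each task lists the tasks that must finish before it.
TASKS = {
    "make_dir": [],
    "compile": ["make_dir"],
    "run_test_1": ["compile"],
    "run_test_2": ["compile"],
    "judge_test_1": ["run_test_1"],
    "judge_test_2": ["run_test_2"],
}


def execution_order(tasks):
    """Return one valid order in which the tasks can be executed.

    Raises ValueError when the dependencies contain a cycle, because such a
    configuration is not an acyclic graph and cannot be evaluated.
    """
    try:
        return list(TopologicalSorter(tasks).static_order())
    except CycleError as error:
        # The second element of `args` holds the detected cycle.
        raise ValueError(f"task dependencies contain a cycle: {error.args[1]}") from error
```

Any order returned by `execution_order(TASKS)` places `make_dir` before
`compile`, and each `run_test_*` task before the matching `judge_test_*` task;
the relative order of independent tasks is not guaranteed.
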
## Terminology

This section defines the official ReCodEx terminology used in the documentation and within the code.

* **Exercise** -- An exercise is a template of a programming problem including
  a detailed text description, evaluation instructions, a sample implementation
  and reference inputs and outputs. Typically, the author of an exercise is a
  lecturer of a programming class.

* **Assignment** -- An assignment is an instance of an exercise which was
  assigned to a group of students by their supervisor. The supervisor can alter
  predefined restrictions for the resulting code (execution time limit, etc.),
  deadlines and the maximum number of points for correct solutions.

* **Reference solution** -- A solution of an exercise provided by its author.
  This solution should pass all test cases and can also be used for
  auto-calibration of the exercise. One exercise can have multiple reference
  solutions, for example in different programming languages or with varied
  levels of efficiency.

* **Submission** -- A submission is one student's solution of an assignment
  received by the ReCodEx API. A submission can contain the submitted source
  code and additional information about the assignment, exercise or submitter.

* **Job** -- A piece of work for a worker, generally corresponding to the
  evaluation of one submission. There are also other types of jobs, like
  benchmarking a submission to configure memory and time limits, but this
  classification has no effect on evaluation. Internally, a job is a set of
  small tasks defined in the exercise configuration. The job itself is
  transferred in the form of an archive with the submitted source code and a
  configuration file written in YAML.

* **Task** -- An atomic piece of work defined in the job configuration which can
  execute an external program or some internal command. External program
  execution is (mostly) performed in a sandboxed environment, while internal
  commands are executed directly. For example, one task could make a new
  directory, copy a file or compile source code using GCC.

* **Test** -- A test is a logical part of a job that checks the correctness of a
  program. There can be multiple tests inside a job, which together check the
  validity and correctness of all aspects of the solution. In the simplest case,
  testing is done by providing reference inputs to the tested program and
  comparing its results with reference outputs. One test consists of multiple
  tasks.

* **Judge** -- A judge is a standalone comparison program that compares sample
  outputs against the output of tested programs.

* **Limits** -- Tasks executing external programs are usually executed in a
  sandbox with defined limits on execution time, allocated memory, used disk
  space and others. These limits are specified in the job configuration. The
  term _limits_ in this context means all the restrictions together.

* **Hwgroup** -- A hardware group is a set of workers with similar hardware. Its
  purpose is to group workers that are likely to run a program using the same
  amount of resources. Test limits are defined separately for each group. A
  group has a unique string identifier and every worker in a group has this
  identifier in its configuration file. Hardware group management is done
  manually by the system administrator. Jobs can be routed to workers based on
  hwgroup.

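To make the relation between limits, hwgroups and workers more concrete, the
following Python sketch models them with plain data classes. It is a
hypothetical illustration only: the class names, field names, group identifiers
and the `pick_worker` helper are assumptions made for this example and are not
taken from the ReCodEx code base.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    """All restrictions applied together to one sandboxed task."""
    time_seconds: float
    memory_kib: int


@dataclass(frozen=True)
class Worker:
    """A worker; the hwgroup identifier comes from its configuration file."""
    name: str
    hwgroup: str


def pick_worker(workers, limits_by_hwgroup):
    """Return (worker, limits) for the first worker whose hwgroup has limits.

    `limits_by_hwgroup` maps a hwgroup identifier to the limits defined for
    that group. Returns None when no worker belongs to a listed group.
    """
    for worker in workers:
        limits = limits_by_hwgroup.get(worker.hwgroup)
        if limits is not None:
            return worker, limits
    return None
```

For example, with workers in the hypothetical groups `"group1"` and `"group2"`
and limits defined only for `"group2"`, `pick_worker` skips the first worker and
returns the second one together with the limits of its group.
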