Introduction

Generally, there are many different ways and opinions on how to teach people something new. However, the significant majority of them recommend hands-on experience as one of the best techniques to make the human brain remember. Learning must be fun. Some kinds of knowledge are more suitable for this practical type of learning than others; programming is fortunately one of the better ones. The possibility of trying things out in real time is just amazing.

This insight also needs to be reflected in the university education system. In computer science, there are several specific requirements -- the code must be efficient and easy to read, maintain and extend. Correctness and efficiency can be tested automatically, which helps teachers save time for their research, but checking for bad code, poor habits and common mistakes is really hard to automate and requires manpower.

Checking programs written by students is very time-consuming and also tedious. The first idea of an automatic evaluation system came from Stanford University professors in 1965. They implemented software that evaluated Algol code submitted on punch cards. In the following years, many similar products were written.

There are two main ways to automatically evaluate source code -- statically (check the code without running it; safe, but not very precise) or dynamically (run the code on test inputs and check the outputs against reference ones; needs sandboxing, but provides good real-world results).

In this project, we'll focus on the machine-controlled part of source code evaluation. First, the problems of the software currently used at our university will be discussed, and then similar projects at other educational institutions will be examined. With knowledge acquired from such projects in production, we'll set up goals for the new evaluation system, design the architecture and implement a working version. If there is enough time, we'll test it in production at our university.

Current solution at MFF UK

The ideas presented above aren't completely new. A group of students already implemented an evaluation solution for students' homework back in 2006. The system was rewritten several times after that, but after 2010 there was only one update. Its name is CodEx - The Code Examiner, and it is still in use today.

CodEx is based on dynamic analysis. It's a system with a web-based interface, where supervisors assign exercises to their students and the students have a time window to submit their solutions. Each solution is compiled and run in a sandbox (MO-Eval). The checked metrics are correctness of the output and compliance with time and memory limits. Supported languages are C, C++, C#, Pascal, Java and Haskell.

The current system is old, but robust. There was no major security incident during its production usage. However, from today's perspective there are several drawbacks. The main ones are:

  • web interface - The web interface is simple and fully functional. But rapid development in web technologies opens new horizons for how a web interface can be designed.
  • web API - There is no API support in the current CodEx. This prevents users from creating custom interfaces like a command line tool or a mobile application.
  • sandboxing - The MO-Eval sandbox is based on the principle of monitoring system calls and blocking the bad ones. This can be done easily only for single-threaded applications. These days parallelism is a very important part of computing, so there is a requirement to test multi-threaded applications as well.
  • hardware occupation - The configuration of CodEx doesn't allow sharing hardware between instances. Due to the current architecture, there are several separate instances (Programming I and II, Java, C#, etc.) which occupy a non-trivial amount of hardware.
  • task extensibility - There is a need to test more advanced exercises with a complex evaluation chain. Parallel programming and Compiler principles are such examples.

After considering all these facts, CodEx can't be used anymore. The project is too old to just maintain and extend with modern technologies. Thus, it needs to be completely rewritten, or another solution must be found.

First of all, some code evaluation projects were found and examined. This is not a complete list of such evaluators, just a few projects which are used these days and can serve as an inspiration for our project.

Progtest

Progtest is a private project from FIT ČVUT in Prague. As far as we know, it's used for C/C++ and Bash programming and knowledge-based quizzes. There are several bonus points and penalties and also a few hints about what is failing in a submitted solution. It's very strict on source code quality, using for example the -pedantic option of GCC, Valgrind for memory leaks, or array boundary checks via the mudflap library.

Codility

Codility is a web-based solution primarily targeted at company recruiters. It's a commercial SaaS product. An interesting feature is the "task timeline", which shows the progress of writing code in a browser on a timeline. The UI of Codility is open source. Codility supports 16 programming languages.

CMS

CMS is an open-source distributed system for running and organizing programming contests. It's written in Python and contains several modules. CMS supports the C/C++, Pascal, Python, PHP and Java languages. PostgreSQL is a single point of failure; all modules heavily depend on the DB connection. Task evaluation can only be a three-step pipeline -- compilation, execution, evaluation. Execution is performed in Isolate, a sandbox written by the consultant of our project, Mgr. Martin Mareš, Ph.D.

MOE

MOE is an old grading system which is mostly obsolete. Parts of it are used in other systems like CodEx or CMS. It's written in shell scripts, C and Python. MOE doesn't provide a default GUI; everything is managed from the command line. It has a simple configuration, but it doesn't evaluate submissions in real time. Isolate is part of this project too.

Kattis

Kattis is another SaaS solution. It's used for contests, companies and also some universities. The web UI is pretty nice, but everything is overly simple. They use a standardized format for their exercises.

Survey results

From the survey above, we set up several goals which the new system should meet. They mostly reflect the drawbacks of the current version of CodEx. No existing tool fits our needs; for example, no examined project provides a complex execution/evaluation pipeline to support the needs of courses like Compiler principles. Modifying an existing project is also not an option because of our specific university environment. To sum up, the existing CodEx has to be completely rewritten, with only small parts of the code adopted (for example the judges).

The new project is ReCodEx - ReCodEx Code Examiner. The name should point to CodEx, the previous evaluation solution, but also reflect a new approach to solving its issues. Re as part of the name means redesigned, rewritten, renewed or restarted.

The official assignment of the project is available here (only in Czech). The most notable features are the following:

  • modern HTML5 web application written in JavaScript using a suitable framework
  • REST API implemented in PHP, communicating with the database, backend and file server
  • backend implemented as a distributed system on top of a message queue framework (ZeroMQ) with a master-worker architecture
  • worker with basic support for the Windows environment (without a sandbox, as no suitable general-purpose tool is available yet)
  • evaluation procedure configured in a YAML file, composed of small tasks connected into an arbitrary directed acyclic graph (see the sketch below)
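
To illustrate the last point, the following is a minimal sketch of what such a YAML job configuration could look like. The concrete keys (task-id, dependencies, sandbox, hw-group-id) and the judge binary name are only illustrative assumptions, not a final ReCodEx schema:

    # Hypothetical job configuration sketch -- key names are illustrative only.
    tasks:
      - task-id: "compile"
        type: initiation
        cmd:
          bin: "/usr/bin/gcc"
          args: ["solution.c", "-o", "solution"]
      - task-id: "run-test-01"
        dependencies: ["compile"]          # edges of the acyclic graph
        type: execution
        sandbox:
          name: "isolate"
          limits:
            - hw-group-id: "group1"
              time: 2                      # seconds
              memory: 65536                # kilobytes
      - task-id: "judge-test-01"
        dependencies: ["run-test-01"]
        type: evaluation
        cmd:
          bin: "judge"                     # compares produced output with the sample output
          args: ["sample.out", "actual.out"]

Each task becomes a node of the graph and its dependencies form the edges, so arbitrarily long evaluation chains (for example for the Compiler principles course) can be expressed.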

Terminology

This is the official terminology of ReCodEx which will be used in the documentation and within the code.

  • Exercise - An exercise is the basic unit of all evaluation. Students receive assignments from their teachers, but first there have to be authors of the exercises from which the assignments are derived. An assignment is basically an exercise which was assigned to some group of students for evaluation.

  • Assignment - Teachers create assignments from exercises, and the assignments are solved by students. When an assignment is solved, the students submit their solution; this solution with all other information (needed by the worker) is called a submission.

  • Reference solution - When authors create exercises, they should provide a sample solution. This solution should pass all test cases within the specified limits. It can also be used for auto-calibration of the exercise.

  • Submission - A submission is one solution of a given exercise, sent by a student to the frontend of ReCodEx. The term also covers all additional information about the source code or the submitter.

  • Job - A piece of work for a worker. Internally it means performing the set of small tasks from the job configuration. The job itself is transferred in the form of an archive with the submitted source codes and a configuration file written in YAML. Typically, a job is one standard submission, but it could also be a benchmarking submission for configuring the limits of an exercise, or a submission for determining the hardware and software configuration of a given worker. This classification has no effect on evaluating the job.

  • Task - An atomic piece of work which can execute an external program or some internal command.

  • Tests - Tests are inputs to a given student solution of an exercise. On specified inputs the program should produce specified outputs; this is checked with judges. Test files can be sent to standard input or just be present for the program to use.

  • Sample outputs - Sample outputs are the expected results of the tests. The output from the program is compared with these files by a judge.

  • Judge - A judge is a comparison program which compares the sample outputs against the output of the program.

  • Limits - Particular tasks usually run in a sandbox; the limits (such as time and memory) are forwarded to the sandbox.

  • Hwgroup - A hardware group reflects the hardware capabilities of a worker. It's just a string identifier set by the administrator for each worker. Jobs are routed to workers according to the hwgroup, and limits are also tied to a specific hwgroup (see the sketch below).
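
As a small illustration of the last two terms, the limits of a sandboxed task could be written per hardware group, so the same exercise can run on differently equipped workers. Again, the key names below are only an assumption for illustration, not a final schema:

    # Hypothetical per-hwgroup limits of one sandboxed task.
    sandbox:
      name: "isolate"
      limits:
        - hw-group-id: "group1"    # e.g. older machines
          time: 4
          memory: 65536
        - hw-group-id: "group2"    # e.g. newer, faster machines
          time: 2
          memory: 131072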