Merge branch 'master' of github.com:ReCodEx/GlobalWiki.wiki

master
Petr Stefan 8 years ago
commit bb049ca2ea

.gitignore

@ -0,0 +1,2 @@
.vscode
.markdownlint.json

@ -51,8 +51,7 @@ Notes:
studenta
-->
# Introduction
Generally, there are many different ways and opinions on how to teach people
something new. However, most people agree that a hands-on experience is one of
@ -65,8 +64,8 @@ University education system is one of the areas where this knowledge can be
applied. In computer programming, there are several requirements such as the
code being syntactically correct, efficient and easy to read, maintain and
extend. Correctness and efficiency can be tested automatically to help teachers
save time for their research, but reviewing bad design, bad coding habits and
logical mistakes is really hard to automate and requires manpower.
Checking programs written by students takes a lot of time and requires a lot of
mechanical, repetitive work. The first idea of an automatic evaluation system
@ -75,70 +74,73 @@ which evaluated code in Algol submitted on punch cards. In following years, many
similar products were written.
There are two basic ways of automatically evaluating code -- statically (check
the code without running it; safe, but not very precise) or dynamically (run the
code on test inputs and check the outputs against reference ones; this needs
sandboxing, but reflects how the program really behaves when executed).
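A minimal sketch of dynamic evaluation in Python, assuming the solution reads its input from standard input. The names `run_test` and `Verdict` are hypothetical and not part of CodEx or ReCodEx. The sketch enforces only a wall-clock time limit and uses no sandbox, which is exactly why a real evaluator needs one.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class Verdict:
    """Outcome of running one solution on one test input (hypothetical type)."""
    passed: bool
    reason: str


def run_test(command: list[str], test_input: str, reference_output: str,
             time_limit_s: float) -> Verdict:
    """Run `command` on `test_input` and compare its stdout with the reference.

    Illustration only: no sandbox, and only a wall-clock time limit is enforced.
    """
    try:
        completed = subprocess.run(
            command,
            input=test_input,
            capture_output=True,
            text=True,
            timeout=time_limit_s,  # subprocess.run kills the child when this expires
        )
    except subprocess.TimeoutExpired:
        return Verdict(False, "time limit exceeded")
    if completed.returncode != 0:
        return Verdict(False, f"runtime error (exit code {completed.returncode})")
    # Compare whitespace-separated tokens, so trailing spaces/newlines do not matter.
    if completed.stdout.split() != reference_output.split():
        return Verdict(False, "wrong answer")
    return Verdict(True, "OK")


if __name__ == "__main__":
    # A "student solution" summing all numbers from standard input.
    summing = [sys.executable, "-c",
               "import sys; print(sum(int(x) for x in sys.stdin.read().split()))"]
    print(run_test(summing, "1 2 3\n", "6\n", time_limit_s=5.0))
    # prints Verdict(passed=True, reason='OK')
```

Token-wise comparison is one simple choice of output checker; stricter or more tolerant comparisons are equally possible.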
<!--
*Simon*: I am not very sure about the formulation of 'our university' - shouldn't
we rather say 'The Charles University in Prague' instead?
-->
This project focuses on the machine-controlled part of source code evaluation.
First, we discussed the problems of the software previously used at our university
and examined similar projects at other educational institutions. Using the knowledge
acquired from such projects in production, we set up goals for the new evaluation
system, designed its architecture and implemented a fully operational solution.
The system is now ready for production testing at our university.
## Assignment
The major goal of this project is to create a grading application that will be
used for programming classes at the Faculty of Mathematics and Physics of the Charles
University in Prague. However, the application should be designed in a modular fashion so
that it can be easily extended or modified to support other ways of using it.
The project has a great starting point -- there is an old grading system
currently used at the university (CodEx), so its flaws and weaknesses can be
addressed. Furthermore, many teachers are willing to use and test the new system.
The following requirements were collected both from our personal experience with
CodEx and from teachers' requests.
### Basic grading system requirements:
These are the features which are necessary for any system for evaluation of
programming assignments used in a university programming course:
<!---
@todo maybe group the requirements by role (student might want to do XYZ...)
- it's ok as is (unless requested differently)
-->
- students can use an intuitive user interface for interaction with the system,
mainly for viewing assigned exercises, uploading their own solutions to the assignments,
and viewing the results of the solutions after an automatic evaluation is finished
- teachers can create exercises including textual description, sample inputs and correct
reference outputs (for example "sum all numbers from given file and write the
result to the standard output")
- teachers can assign an existing exercise to their class with some specific
properties set (deadlines, etc.)
- teachers can specify their scale of points which will be awarded to the students
depending on the correctness of their solutions (expressed in percentage points)
- teachers can view all of the solutions their students submitted and also the
results of the evaluations and they can override the automatically assigned points
to the solutions manually
- teachers can see the statistics of their classes and individual students
of these classes
- administrators can depend on a safe environment in which the students' solutions
will be executed
- administrators can manage users with support of roles (at least two -- _student_ and
_supervisor_)
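One possible reading of the teacher-defined point scale is a list of correctness thresholds, with a manual override by the teacher taking precedence. The sketch below is an assumption for illustration only; `PointScale` and `EvaluatedSolution` are hypothetical names and the real scoring configuration may work differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PointScale:
    """Hypothetical teacher-defined scale of (minimal correctness in %, points) steps."""
    steps: list[tuple[float, int]]

    def points_for(self, correctness_percent: float) -> int:
        """Return the points of the highest threshold reached (0 if none is reached)."""
        awarded = 0
        for threshold, points in sorted(self.steps):  # ascending thresholds
            if correctness_percent >= threshold:
                awarded = points
        return awarded


@dataclass
class EvaluatedSolution:
    correctness_percent: float             # result of the automatic evaluation
    override_points: Optional[int] = None  # set manually by the teacher, if at all

    def final_points(self, scale: PointScale) -> int:
        # A manual override always takes precedence over the automatic score.
        if self.override_points is not None:
            return self.override_points
        return scale.points_for(self.correctness_percent)


scale = PointScale([(50, 5), (80, 8), (100, 10)])
print(EvaluatedSolution(85).final_points(scale))                      # 8
print(EvaluatedSolution(85, override_points=10).final_points(scale))  # 10
```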
CodEx satisfies all these requirements and a few more that originate from the
way courses are organized at our university -- for example, users have roles
(_student_, _supervisor_ and _administrator_) that determine their capabilities
in the system and students are divided into groups that correspond to lab groups.
However, further requirements arose during the ten-year lifetime of the old
system. There are not many ways to improve it from the perspective of a student,
but a lot of feature requests came from administrators and supervisors.
The ideas were mostly gathered from meetings with faculty staff involved
with the current system.
### Requested features for the new system:
- logging in through a university authentication system (e.g. LDAP)
- support for multiple programming environments at once to avoid unacceptable
@ -150,45 +152,42 @@ with the current system.
- comments, comments, comments (exercises, tests, solutions, ...)
- edit student solution and privately resubmit it
- resubmit solution with saving all results (including temporary ones)
- mark one student's solution as accepted (used for grading this assignment)
- web and command-line submission tool
- SIS (university information system) integration for fetching personal user data
- plagiarism detection
- advanced low-level evaluation flow configuration with high-level abstraction
layer for ordinary configuration cases
- use of modern technologies with state-of-the-art compilers
The survey shows that the system is used in many different ways, but the core
functionality is the same for all of them. When the system is ready, it is
likely that new ideas of how to use it will emerge; thus the system
must be designed to be easily extensible, so that everyone can develop their own features.
This also means that widely used programming languages and techniques should be used,
so that users can quickly understand the code and make changes.
To find out the current state in the field of automatic grading systems, we did a
short survey of systems used at universities, in programming contests, and of other available tools.
## Related work
This is not a complete list of available evaluators, only a few projects which
are used these days and can be an inspiration for our project. Each project from
the list has a brief description and some of its key features mentioned.
### CodEx
CodEx is the grading solution currently used at the Faculty of Mathematics and Physics of
the Charles University in Prague. It was implemented in 2006 by a group of students
and its full name is [CodEx -- The Code Examiner](http://codex.ms.mff.cuni.cz/project/).
It has been used with some improvements since then. The original plan was
to use the system only for basic programming courses, but there was a demand
for adapting it for many different subjects.
CodEx is based on dynamic analysis. It features a web-based interface, where
supervisors can assign exercises to their students and the students have a time
window to submit their solutions. Each solution is compiled and run in a sandbox
(MO-Eval). The metrics which are checked are: correctness of the output, time
and memory limits. It supports programs written in C, C++, C#, Java, Pascal,
Python and Haskell.
@ -200,7 +199,7 @@ several drawbacks. The main ones are:
- **web interface** -- The web interface is simple and fully functional. However,
the rapid development of web technologies opens new possibilities for how a web
interface can be built.
- **web API** -- CodEx offers a very limited XML API based on outdated
technologies that is not sufficient for users who would like to create custom
interfaces such as a command line tool or mobile application.
- **sandboxing** -- MO-Eval sandbox is based on principle of monitoring system
@ -221,17 +220,16 @@ several drawbacks. The main ones are:
### Progtest
[Progtest](https://progtest.fit.cvut.cz/) is a private project of [FIT ČVUT](https://fit.cvut.cz)
in Prague. As far as we know, it is used for C/C++, Bash programming and knowledge-based quizzes.
There are several bonus points and penalties and also a few hints on what is failing in the submitted
solution. It is very strict on source code quality, using for example the `-pedantic` option of GCC,
Valgrind for memory leaks, and array boundary checks via the `mudflap` library.
### Codility
[Codility](https://codility.com/) is a web-based solution primarily targeted at
company recruiters. It is a commercial product available as a SaaS and it supports 16
programming languages. The
[UI](http://1.bp.blogspot.com/-_isqWtuEvvY/U8_SbkUMP-I/AAAAAAAAAL0/Hup_amNYU2s/s1600/cui.png)
of Codility is [open source](https://github.com/Codility/cui), the rest of
@ -242,12 +240,12 @@ captured progress of writing code for each user.
[CMS](http://cms-dev.github.io/index.html) is an open-source distributed system
for running and organizing programming contests. It is written in Python and
contains several modules. CMS supports C/C++, Pascal, Python, PHP, and Java
programming languages. PostgreSQL is a single point of failure, since all modules
heavily depend on the database connection. Task evaluation can only be a three-step
pipeline -- compilation, execution, evaluation. Execution is performed in
[Isolate](https://github.com/ioi/isolate), a sandbox written by the consultant
of our project, Mgr. Martin Mareš, Ph.D.
### MOE
@ -270,14 +268,13 @@ recruiters and also some universities.
## ReCodEx goals
None of the existing systems we came across provides all the features required
for the new system. There is no grading system designed to support a complicated
evaluation pipeline, so this part is an unexplored field and has to be designed with caution.
Also, no project is modern and extensible enough to be used as a base for ReCodEx.
After considering all these facts, it was clear that a new system had to be written
from scratch. This implies that only a subset of all the features will be implemented
in the first version and the others in the following ones.
The gathered features are categorized by their priority for the whole system. The
highest priority is given to the main functionality similar to the current CodEx. It is a base
@ -294,41 +291,40 @@ side) and command-line submit tool. Plagiarism detection is not likely to be
part of any release in the near future unless someone else develops the engine. The
detection problem is too hard to be solved as part of this project.
We named the project **ReCodEx -- ReCodEx Code Examiner**. The name should refer
to the old CodEx, but also reflect the new approach to solving its issues.
**Re** as part of the name means redesigned, rewritten, renewed, or restarted.
At this point there is a clear idea of how the new system will be used and what the
major enhancements for future releases are. With this in mind, the overall
architecture can be sketched. From the previous research, we set up several
goals which the new system should meet. They mostly reflect drawbacks of the current
version of CodEx and some reasonable wishes of university users. The most notable
features are the following:
- modern HTML5 web frontend written in JavaScript using a suitable framework
- REST API implemented in PHP, communicating with the database, the evaluation backend
and a file server
- evaluation backend implemented as a distributed system on top of a message queue framework
(ZeroMQ) with master-worker architecture
<!-- @todo: WTF is worker??? The concept has not been introduced yet! -->
- worker with basic support of the Windows environment (without a sandbox, since no
suitable general-purpose tool is available yet)
- evaluation procedure configured in a YAML file, composed of small tasks
connected into an arbitrary directed acyclic graph
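To illustrate why a directed acyclic graph is more flexible than a fixed compile/execute/evaluate pipeline, the following sketch orders hypothetical tasks with Python's standard `graphlib.TopologicalSorter` (Python 3.9+). The task names and the dictionary representation are assumptions; in ReCodEx the graph is described in a YAML job configuration.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Order tasks so that each one runs after all the tasks it depends on.

    `dependencies` maps a task name to the set of task names it depends on.
    Raises ValueError if the task graph contains a cycle.
    """
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as err:
        raise ValueError("task graph contains a cycle") from err


# Hypothetical job: two independent fetch tasks, then compile, run and judge.
job = {
    "fetch_source": set(),
    "fetch_input": set(),
    "compile": {"fetch_source"},
    "run": {"compile", "fetch_input"},
    "judge": {"run"},
}
print(execution_order(job))
```

Because `fetch_input` does not depend on compilation, a worker would be free to schedule it independently, something a strictly linear three-step pipeline cannot express.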
### Intended usage
The whole system is intended to help both teachers (supervisors) and students.
To achieve this, it is crucial to keep in mind typical usage scenarios of the
system and try to make these typical tasks as simple as possible.
The system has a database of users. Each user has a role assigned,
which corresponds to his/her privileges. Users can log in either with
an email and password or through the university system. There are groups of users, which
correspond to the lectured courses. Groups can be hierarchically ordered to reflect
additional metadata such as the academic year. For example, a reasonable group hierarchy
can look like this:
```
Summer term 2016
@ -341,32 +337,34 @@ Summer term 2016
```
In this example, students are members of the leaf groups; the higher-level groups
only keep the related groups together. The hierarchy tree can be modified and
altered to fit specific needs of the university or any other organization; even a
flat structure (i.e., no hierarchy) is possible.
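The group tree can be modelled as a simple recursive structure. In this hypothetical sketch, `Group` holds its own students and its child groups, so the students of a whole term are the union over the subtree. Apart from "Summer term 2016", the group and student names are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    """A node in a hypothetical group tree; students normally sit in leaf groups."""
    name: str
    students: set[str] = field(default_factory=set)
    children: list["Group"] = field(default_factory=list)

    def all_students(self) -> set[str]:
        """Students of this group together with those of all descendant groups."""
        result = set(self.students)
        for child in self.children:
            result |= child.all_students()
        return result

    def leaf_groups(self) -> list["Group"]:
        """Groups without subgroups, e.g. the individual lab groups."""
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaf_groups()]


lab1 = Group("Lab group 1", students={"alice", "bob"})
lab2 = Group("Lab group 2", students={"carol"})
course = Group("Programming 1", children=[lab1, lab2])
term = Group("Summer term 2016", children=[course])
print(sorted(term.all_students()))           # ['alice', 'bob', 'carol']
print([g.name for g in term.leaf_groups()])  # ['Lab group 1', 'Lab group 2']
```

A flat structure is just a single level of groups with no children, so the same code covers it.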
A user can be a member of multiple groups and a group can of course have multiple
users. Each user in a group also has a specific role for the given group.
A privileged user (supervisor) can assign a new exercise in his/her group, change assignment
details, view the results of other users and manually change them. A normal user (student) can
join a group, get a list of assigned exercises, view assignment details, submit
his/her solution and view the results of the evaluation.
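Because roles are assigned per group rather than globally, a permission check has to look at the (user, group) pair. In this sketch the action names and the `can` helper are hypothetical labels for the capabilities listed above, not the actual ReCodEx access-control API.

```python
from enum import Enum


class GroupRole(Enum):
    STUDENT = "student"
    SUPERVISOR = "supervisor"


# Hypothetical action names for the capabilities described in the text.
CAPABILITIES: dict[GroupRole, frozenset[str]] = {
    GroupRole.STUDENT: frozenset({
        "view_assignments", "submit_solution", "view_own_results"}),
    GroupRole.SUPERVISOR: frozenset({
        "assign_exercise", "edit_assignment", "view_all_results", "override_results"}),
}


def can(memberships: dict[tuple[str, str], GroupRole],
        user: str, group: str, action: str) -> bool:
    """Check an action using the role the user has in this particular group."""
    role = memberships.get((user, group))
    return role is not None and action in CAPABILITIES[role]


memberships = {
    ("alice", "Lab group 1"): GroupRole.STUDENT,
    ("alice", "Lab group 2"): GroupRole.SUPERVISOR,  # same user, different role
}
print(can(memberships, "alice", "Lab group 1", "override_results"))  # False
print(can(memberships, "alice", "Lab group 2", "override_results"))  # True
```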
The database of exercises (algorithmic problems) is another part of the project.
Each exercise consists of a text in multiple language variants, an evaluation
configuration and a set of inputs and reference outputs. Exercises are created by
instructed privileged users. Assigning an exercise to a group means choosing
one of the available exercises and specifying additional properties. An assignment
has a deadline (optionally a second deadline), a maximum amount of points,
a configuration for calculating the final score, a maximum number of submissions,
and a list of supported runtime environments (e.g., programming languages) including
specific time and memory limits for the sandboxed tasks.
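The assignment properties listed above map naturally onto a small record type. This sketch is hypothetical: the field names, the runtime name `c-gcc`, the KiB unit of the memory limit and the rule that submissions close at the second deadline (if present) are all assumptions, and the final-score configuration is left out.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class RuntimeLimits:
    time_limit_s: float
    memory_limit_kib: int  # the unit is an assumption


@dataclass
class Assignment:
    exercise_id: str
    deadline: datetime
    max_points: int
    max_submissions: int
    runtimes: dict[str, RuntimeLimits]  # runtime name -> limits (names hypothetical)
    second_deadline: Optional[datetime] = None

    def accepts(self, runtime: str, submitted_at: datetime,
                previous_submissions: int) -> bool:
        """Decide whether a new submission may be evaluated.

        Assumption: submissions close at the second deadline when one is set,
        otherwise at the first; scoring after the first deadline is not modelled.
        """
        closing_time = self.second_deadline or self.deadline
        return (runtime in self.runtimes
                and submitted_at <= closing_time
                and previous_submissions < self.max_submissions)


hw1 = Assignment(
    exercise_id="sum-numbers",
    deadline=datetime(2016, 4, 1, 23, 59),
    max_points=10,
    max_submissions=5,
    runtimes={"c-gcc": RuntimeLimits(time_limit_s=1.0, memory_limit_kib=8192)},
    second_deadline=datetime(2016, 4, 8, 23, 59),
)
print(hw1.accepts("c-gcc", datetime(2016, 4, 5, 12, 0), previous_submissions=2))  # True
```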
#### Exercise evaluation chain
The most important part of the system is the evaluation of the solutions
submitted by the users for their assigned exercises.
~~For imaginary system architecture _UI_, _API_, _Broker_ and _Worker_ this goes as follows.~~
The first thing users have to do is to submit their solutions to the _UI_, which provides an
interface to upload files and then submit them. _UI_ sends a request to _API_
@ -400,25 +398,24 @@ includes overview which part succeeded and which failed (optionally with reason
like "memory limit exceeded") and the amount of awarded points.
# Analysis
## Solution concepts analysis
@todo: what problems were solved on abstract and high levels, how they can be solved and what was the final solution
- which problems are they? ... these ones ↓
- what type of users there should be, why they are needed
- explain why there is exercise and assignment division, what means what and how they are used
- explain instances, why they are useful, what they solve, and also discuss the licenses concept
- groups, they can be public and private and why is that, what it solves, explain and discuss threshold and other group features
- extended execution pipeline (not just compilation/execution/evaluation) and why it is needed
- progress state, how it can be done and displayed to user, why random messages
- how to display generally all outputs of executed programs to user (supervisor, student), what students can or cannot see and why
- judges, discuss what they possibly can do and what it can be used for (returning for instance 2 numbers instead of 1 and why we return just one)
- discuss points assigned to solution, why are there bonus points, explain minimal point threshold
- discuss several ways how points can be assigned to solution, propose basic systems but also general systems which can use outputs from judges or other executed programs, there is need for variables or other concept, explain why
- and many many more general concepts which can be discussed and solved... please append more of them if something comes to your mind... thanks
### Structure of the project
@ -426,9 +423,9 @@ The ReCodEx project is divided into two logical parts the *Backend*
and the *Frontend* which interact with each other and which cover the
whole area of code examination. Both of these logical parts are
independent of each other in the sense of being installed on separate
machines at different locations and that one of the parts can be
replaced with a different implementation and as long as the communication
protocols are preserved, the system will continue working as expected.
*Backend* is the part which is responsible solely for the process of
evaluating a solution of an exercise. Each evaluation of a solution is
@ -441,7 +438,7 @@ environment, specific version of a compiler or the job must be evaluated
on a processor with a specific number of cores. The backend
infrastructure decides whether it will accept a job or decline it based
on the specified requirements. In case it accepts the job, it will be
placed in a queue and it will be processed as soon as possible. The backend
publishes the progress of processing of the queued jobs and the results
of the evaluations can be queried after the job processing is finished.
The backend produces a log of the evaluation and scores the solution
@ -649,8 +646,7 @@ cleaner completes machine specific caching system.
# The Backend
The backend is the part which is hidden from the user and which has only
one purpose: evaluate users' solutions of their assignments.
@ -675,7 +671,7 @@ for the technical description of the components)
### Fileserver
@todo: stores particular data from frontend and backend, hashing, HTTP API
### Worker
@ -996,7 +992,7 @@ of entities and relational database models), describe the logical
grouping of entities and how they are related:
- user + settings + logins + ACL
- instance + licenses + groups + group membership
- exercise + assignments + localized assignments + runtime
environments + hardware groups
- submission + solution + reference solution + solution evaluation
@ -1373,7 +1369,7 @@ the output length (as long as the printing fits in the time limit).
memory: 8192
```
Fetch the sample solution from the file server. The base URL of the file server is in the header
of the job configuration, so only the name of the required file (its `sha1sum` in
our case) is necessary.
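Naming files by a hash of their content means the job configuration only has to carry the hash. A minimal sketch using Python's `hashlib`, assuming the file URL is simply the base URL joined with the SHA-1 hex digest (the same value the `sha1sum` tool prints); `stored_name` and `file_url` are hypothetical helpers and the actual URL layout of the file server may differ.

```python
import hashlib


def stored_name(content: bytes) -> str:
    """Name under which the file is stored: the SHA-1 hex digest of its content."""
    return hashlib.sha1(content).hexdigest()


def file_url(base_url: str, content: bytes) -> str:
    """Join the base URL from the job header with the hashed file name.

    The '/'-separated layout is an assumption; the real server may differ.
    """
    return base_url.rstrip("/") + "/" + stored_name(content)


print(stored_name(b"abc"))  # a9993e364706816aba3e25717850c26c9cd0d89d
```

A side effect of this naming scheme is that identical files get identical names, so uploading the same file twice does not need to store it twice.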
