corrected english in the introduction

master
Simon Rozsival 8 years ago
parent 3da7b457bc
commit 50854ba9cb

2
.gitignore vendored

@ -0,0 +1,2 @@
.vscode
.markdownlint.json

@ -51,8 +51,7 @@ Notes:
studenta
-->
Introduction
============
# Introduction
Generally, there are many different ways and opinions on how to teach people
something new. However, most people agree that a hands-on experience is one of
@ -65,8 +64,8 @@ University education system is one of the areas where this knowledge can be
applied. In computer programming, there are several requirements such as the
code being syntactically correct, efficient and easy to read, maintain and
extend. Correctness and efficiency can be tested automatically to help teachers
save time for their research, but checking for bad design, habits and mistakes
is really hard to automate and requires manpower.
save time for their research, but reviewing bad design, bad coding habits and
logical mistakes is really hard to automate and requires manpower.
Checking programs written by students takes a lot of time and requires a lot of
mechanical, repetitive work. The first idea of an automatic evaluation system
@ -75,70 +74,73 @@ which evaluated code in Algol submitted on punch cards. In following years, many
similar products were written.
There are two basic ways of automatically evaluating code -- statically (check
the code without running it; safe, but not much precise) or dynamically (run the
the code without running it; safe, but not very precise) or dynamically (run the
code on testing inputs with checking the outputs against reference ones; needs
sandboxing, but provides good real world experience).
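
The dynamic approach can be illustrated with a minimal sketch. It is not the evaluator described in this document: the function name `run_and_check`, the verdict strings and the whitespace-insensitive comparison are assumptions made for illustration, and no sandboxing is performed.

```python
import subprocess
import sys


def run_and_check(command, test_input, expected_output, time_limit=2.0):
    """Run `command` on `test_input` and compare its stdout with the reference.

    Returns "ok", "wrong output", "runtime error" or "time limit exceeded".
    No sandboxing is done here; a real evaluator must isolate the process.
    """
    try:
        completed = subprocess.run(
            command,
            input=test_input,
            capture_output=True,
            text=True,
            timeout=time_limit,
        )
    except subprocess.TimeoutExpired:
        return "time limit exceeded"
    if completed.returncode != 0:
        return "runtime error"
    # Hypothetical judging rule: differences in whitespace are ignored.
    if completed.stdout.split() == expected_output.split():
        return "ok"
    return "wrong output"


# A stand-in "student solution" that sums the numbers on standard input.
SOLUTION = [
    sys.executable,
    "-c",
    "import sys; print(sum(map(int, sys.stdin.read().split())))",
]

verdict = run_and_check(SOLUTION, "1 2 3\n", "6\n")
```

The time limit is enforced only through the `timeout` argument of `subprocess.run`; memory limits and isolation from the host system are what the sandbox mentioned above would have to add.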
This project focuses on the machine-controlled part of source code evaluation.
First, problems of present software at our university were discussed and similar
projects at other educational institutions were examined. With acquired
knowledge from such projects in production, we set up goals for the new
evaluation system, designed the architecture and implemented a fully operational
solution. The system is now ready for production testing at our university.
<!--
*Simon*: I am not very sure about the formulation of 'our university' - shouldn't
we rather say 'The Charles University in Prague' instead?
-->
This project focuses on the machine-controlled part of source code evaluation.
First, the problems of the software used at our university previously were
discussed and similar projects at other educational institutions were examined.
With acquired knowledge from such projects in production, we set up goals for
the new evaluation system, designed the architecture and implemented a fully
operational solution. The system is now ready for production testing
at our university.
## Assignment
The major goal of this project is to create a grading application that will be
used for programming classes at the Faculty of Mathematics and Physics, Charles
University. However, the application should be designed in a modular fashion so
that it can be easily extended to make other ways of using it possible.
used for programming classes at the Faculty of Mathematics and Physics of the Charles
University in Prague. However, the application should be designed in a modular fashion so
that it can be easily extended or modified to make other ways of using it possible.
The project has a great starting point -- there is an old grading system
currently used at our university (CodEx), so its mistakes and weaknesses can be
adressed. Furthermore, many teachers are willing to use and test the new system.
currently used at the university (CodEx), so its flaws and weaknesses can be
addressed. Furthermore, many teachers are willing to use and test the new system.
Following requirements were collected both from our personal experience with
CodEx and from teachers' requests.
**Basic grading system requirements:**
### Basic grading system requirements:
These are features that are necessary for any system for evaluation of
programming homework assignments used in a university programming course.
<!---
@todo maybe group the requirements by role (student might want to do XYZ...)
- it's ok as is (unless requested differently)
-->
These are the features which are necessary for any system for evaluation of
coding assignments used in any university programming course:
- creating exercises including textual description, sample inputs and correct
- students can use an intuitive user interface for interaction with the system,
mainly for viewing assigned exercises, uploading their own solutions to the assignments,
and viewing the results of the solutions after an automatic evaluation is finished
- teachers can create exercises including textual description, sample inputs and correct
reference outputs (for example "sum all numbers from given file and write the
result to the standard output")
- assigning the exercise to a group of users with some additional properties set
(deadlines, etc.)
- user interface for interaction with the system, mainly for showing assigned
exercises, uploading solution sources and presenting evaluated results
- safe environment to execute student solutions withing prescribed time and
memory limits and check corectness of outputs
- assigning points to users depending of correctness of his/her solution
- user management with support of roles (at least two -- _student_ and
- teachers can assign an existing exercise to their class with some specific
properties set (deadlines, etc.)
- teachers can specify their scale of points which will be awarded to the students
depending on the correctness of their solutions (expressed in percentage points)
- teachers can view all of the solutions their students submitted and also the
results of the evaluations and they can override the automatically assigned points
to the solutions manually
- teachers can see the statistics of their classes and of the individual students
in these classes
- administrators can depend on a safe environment in which the students' solutions
will be executed
- administrators can manage users with support of roles (at least two -- _student_ and
_supervisor_)
- administrative interface for manual checking of solutions, overriding
automatically assigned amount of points and viewing of overall statistics
about users
CodEx satisfies all these requirements and a few more that originate from the
way courses are organized at our university -- for example, users have roles
(_student_, _supervisor_ and _administrator_) that determine their capabilities
in the system and students are divided into groups that correspond to lab
groups.
in the system and students are divided into groups that correspond to lab groups.
However, further requirements arose during the ten year long lifetime of the old
system. There are not many ways to improve it from the perspective of a student,
but a lot of feature requests came from administrators and supervisors.
Collected ideas were mostly gathered from meetings with faculty staff involved
The ideas were mostly gathered from meetings with faculty staff involved
with the current system.
**Requested features for the new system:**
### Requested features for the new system:
- logging in through a university authentication system (e.g. LDAP)
- support for multiple programming environments at once to avoid unacceptable
@ -150,45 +152,42 @@ with the current system.
- comments, comments, comments (exercises, tests, solutions, ...)
- edit student solution and privately resubmit it
- resubmit solution with saving all results (including temporary ones)
- mark one student solution as accepted (used for grading this assignment)
- web and command-line submit tool
- SIS (university information system) integration for fetching personal user
data
- mark one student's solution as accepted (used for grading this assignment)
- web and command-line submission tool
- SIS (university information system) integration for fetching personal user data
- plagiarism detection
- advanced low-level evaluation flow configuration with high-level abstraction
layer for ordinary configuration cases
- use of modern technologies with state-of-the-art compilers
The survey shows that the system is used in many different ways, but the core
functionality is the same for all of them. When the system is ready, it is
likely that new ideas are figured out, thus the system must be designed to be
easily extendable, so everyone can develop his dream feature. This also means,
that widely used programming languages and techniques should be used, so users
can quickly understand the code and make changes.
To find out current state in the field of automatic grading systems, let's do a
short survey at universities, programming contests or online tools.
functionality is the same for all of them. When the system is ready it is
likely that there will be new ideas of how to use the system and thus the system
must be designed to be easily extendable, so everyone can develop their own feature.
This also means that widely used programming languages and techniques should be used,
so users can quickly understand the code and make changes.
To find out the current state in the field of automatic grading systems we did a
short survey at universities, programming contests, and other available tools.
## Related work
First of all, some code evaluating projects were found and examined. It is not
a complete list of such evaluators, but just a few projects which are used
these days and can be an inspiration for our project. Each project from the
This is not a complete list of available evaluators, but only a few projects
which are used these days and can be an inspiration for our project. Each project from the
list has a brief description and some key features mentioned.
### CodEx
There already is a grading solution at MFF UK, which was implemented in 2006 by
group of students. Its name is [CodEx -- The Code
Examiner](http://codex.ms.mff.cuni.cz/project/) and it has been used with some
improvements since then. The original plan was to use the system only for basic
programming courses, but there is demand for adapting it for many different
subjects.
The grading solution currently used at the Faculty of Mathematics and Physics of
the Charles University in Prague was implemented in 2006 by a group
of students. It is called [CodEx -- The Code Examiner](http://codex.ms.mff.cuni.cz/project/)
and it has been used with some improvements since then. The original plan was
to use the system only for basic programming courses, but there was a demand
for adapting it for many different subjects.
CodEx is based on dynamic analysis. It features a web-based interface, where
supervisors assign exercises to their students and the students have a time
window to submit the solution. Each solution is compiled and run in sandbox
supervisors can assign exercises to their students and the students have a time
window to submit their solutions. Each solution is compiled and run in a sandbox
(MO-Eval). The metrics which are checked are: correctness of the output, time
and memory limits. It supports programs written in C, C++, C#, Java, Pascal,
Python and Haskell.
@ -200,7 +199,7 @@ several drawbacks. The main ones are:
- **web interface** -- The web interface is simple and fully functional. But
rapid development in web technologies opens new horizons of how web interface
can be made.
- **web api** -- CodEx offers a very limited XML API based on outdated
- **web API** -- CodEx offers a very limited XML API based on outdated
technologies that is not sufficient for users who would like to create custom
interfaces such as a command line tool or mobile application.
- **sandboxing** -- MO-Eval sandbox is based on principle of monitoring system
@ -221,17 +220,16 @@ several drawbacks. The main ones are:
### Progtest
[Progtest](https://progtest.fit.cvut.cz/) is private project from FIT ČVUT in
Prague. As far as we know it is used for C/C++, Bash programming and
knowledge-based quizzes. There are several bonus points and penalties and also a
few hints what is failing in submitted solution. It is very strict on source
code quality, for example `-pedantic` option of GCC, Valgrind for memory leaks
or array boundaries checks via `mudflap` library.
[Progtest](https://progtest.fit.cvut.cz/) is a private project of [FIT ČVUT](https://fit.cvut.cz)
in Prague. As far as we know it is used for C/C++, Bash programming and knowledge-based quizzes.
There are several bonus points and penalties and also a few hints about what is failing in the submitted
solution. It is very strict on source code quality, for example it uses the `-pedantic` option of GCC,
Valgrind for memory leaks or array boundaries checks via the `mudflap` library.
### Codility
[Codility](https://codility.com/) is web based solution primary targeted to
company recruiters. It is commercial product of SaaS type supporting 16
[Codility](https://codility.com/) is a web-based solution primarily targeted at
company recruiters. It is a commercial product available as SaaS and it supports 16
programming languages. The
[UI](http://1.bp.blogspot.com/-_isqWtuEvvY/U8_SbkUMP-I/AAAAAAAAAL0/Hup_amNYU2s/s1600/cui.png)
of Codility is [opensource](https://github.com/Codility/cui), the rest of
@ -242,12 +240,12 @@ captured progress of writing code for each user.
[CMS](http://cms-dev.github.io/index.html) is an opensource distributed system
for running and organizing programming contests. It is written in Python and
contain several modules. CMS supports C/C++, Pascal, Python, PHP and Java.
PostgreSQL is a single point of failure, all modules heavily depend on database
connection. Task evaluation can be only three step pipeline -- compilation,
execution, evaluation. Execution is performed in
[Isolate](https://github.com/ioi/isolate), sandbox written by consultant of our
project, Mgr. Martin Mareš, Ph.D.
contains several modules. CMS supports the C/C++, Pascal, Python, PHP, and Java
programming languages. PostgreSQL is a single point of failure because all modules
heavily depend on the database connection. Task evaluation can only be a three-step
pipeline -- compilation, execution, evaluation. Execution is performed in
[Isolate](https://github.com/ioi/isolate), a sandbox written by the consultant
of our project, Mgr. Martin Mareš, Ph.D.
### MOE
@ -267,53 +265,50 @@ format](http://www.problemarchive.org/wiki/index.php/Problem_Format) for
exercises. Kattis is primarily used by programming contest organizators, company
recruiters and also some universities.
## ReCodEx goals
From the survey above it is clear, that none of the existing systems is capable
of all the features collected for the new system. No grading system is designed
to support complicated evaluation pipeline, so this part is unexplored field and
has to be designed with caution. Also, no project is modern and extendable in a
way that it can be used as a base for ReCodEx. After considering all these
facts, it is clear that the new system has to be written from scratch. This
implies, that only subset of all features will be implemented in the first
version, the others following later.
The new project is **ReCodEx -- ReCodEx Code Examiner**. The name should point
to CodEx, previous evaluation solution, but also reflect new approach to solve
issues. **Re** as part of the name means redesigned, rewritten, renewed or
restarted.
At this point there is a clear idea how the new system will be used and what are
None of the existing systems we came across is capable of all the required features
of the new system. There is no grading system which is designed to support a complicated
evaluation pipeline, so this part is an unexplored field and has to be designed with caution.
Also, no project is modern and extensible enough to be used as a base for ReCodEx.
After considering all these facts, it was clear that a new system has to be written
from scratch. This implies that only a subset of all the features will be implemented
in the first version, the others in the following ones.
We named the project **ReCodEx -- ReCodEx Code Examiner**. The name should point
to the old CodEx, but also reflect the new approach to solving its issues.
**Re** as part of the name means redesigned, rewritten, renewed, or restarted.
At this point there is a clear idea how the new system will be used and what are the
major enhancements for future releases. With this in mind, the overall
architecture can be sketched. From the previous research, we set up several
goals, which a new system should have. They mostly reflect drawbacks of current
version of CodEx and reasonable wishes of university users. Most notable
goals, which the new system should have. They mostly reflect drawbacks of the current
version of CodEx and some reasonable wishes of university users. Most notable
features are following:
- modern HTML5 web frontend written in Javascript using a suitable framework
- REST API implemented in PHP, communicating with database, backend and file
- modern HTML5 web frontend written in JavaScript using a suitable framework
- REST API implemented in PHP, communicating with database, evaluation backend and a file
server
- backend is implemented as distributed system on top of message queue framework
- evaluation backend implemented as a distributed system on top of a message queue framework
(ZeroMQ) with master-worker architecture
- worker with basic support of Windows environment (without sandbox, no general
<!-- @todo: WTF is worker??? The concept has not been introduced yet! -->
- worker with basic support of the Windows environment (without sandbox, no general
purpose suitable tool available yet)
- evaluation procedure configured in YAML file, compound of small tasks
connected into arbitrary oriented acyclic graph
- evaluation procedure configured in a YAML file, compound of small tasks
connected into an arbitrary oriented acyclic graph
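
The last goal, an evaluation procedure composed of small tasks connected into an oriented acyclic graph, implies that the tasks have to be ordered so that each one runs only after the tasks it depends on. The following is a minimal sketch of such an ordering using Kahn's algorithm. The task names, the dictionary layout and the function `execution_order` are hypothetical and do not reflect the actual ReCodEx configuration format.

```python
from collections import deque


def execution_order(tasks):
    """Return task names so that every task follows its dependencies.

    `tasks` maps a task name to the list of task names it depends on.
    Raises ValueError if the dependencies contain a cycle.
    """
    remaining = {name: len(deps) for name, deps in tasks.items()}
    dependents = {name: [] for name in tasks}
    for name, deps in tasks.items():
        for dep in deps:
            dependents[dep].append(name)

    # Start with the tasks that do not depend on anything.
    ready = deque(sorted(name for name, count in remaining.items() if count == 0))
    order = []
    while ready:
        current = ready.popleft()
        order.append(current)
        for nxt in dependents[current]:
            remaining[nxt] -= 1
            if remaining[nxt] == 0:
                ready.append(nxt)

    if len(order) != len(tasks):
        raise ValueError("task dependencies contain a cycle")
    return order


# Hypothetical job: task names and dependencies are made up for illustration.
job = {
    "compile": [],
    "fetch-input": [],
    "run": ["compile", "fetch-input"],
    "judge": ["run"],
}
order = execution_order(job)
```

Rejecting cyclic configurations up front means a malformed job can be refused before any of its tasks is executed.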
### Intended usage
Whole system is intended to help both supervisors and students. To achieve this,
it is crucial to keep in mind typical usage scenarios of the system and try to
make these typical tasks as simple as possible. To synchronize visions of
readers, basic concepts are recapitulated.
The whole system is intended to help both teachers (supervisors) and students.
To achieve this, it is crucial to keep in mind typical usage scenarios of the
system and try to make these typical tasks as simple as possible.
First of all, the system has database of users. Each user has assigned a role,
which correspond to his/her privileges. User can be logged in via local
authentication service or university system. There are groups of users, which
corresponds to lectured courses. Groups can be hierarchically ordered to reflect
additional metadata like academic year. For example, reasonable group hierarchy
is like this:
The system has a database of users. Each user has a role assigned,
which corresponds to his/her privileges. A user can log in via
email and password or using the university system. There are groups of users, which
correspond to the lectured courses. Groups can be hierarchically ordered to reflect
additional metadata such as the academic year. For example, a reasonable group hierarchy
can look like this:
```
Summer term 2016
@ -326,32 +321,34 @@ Summer term 2016
```
In this example, student users are part of the leaf groups, higher groups are
just for keeping related groups together. The hierarchy tree can be modified and
altered to fit specific needs for each organization, even the flat structure is
possible.
In this example, students are members of the leaf groups, the higher level groups
are just for keeping the related groups together. The hierarchy tree can be modified and
altered to fit specific needs of the university or any other organization, even the
flat structure (i.e., no hierarchy) is possible.
One user can be part of multiple groups and also one group can have multiple
users. Each user in a group has a role which defines its capabilities.
Priviledged user can assign a new exercise in his/her group, change assignment
details, view results of other users and manually change them. Normal user can
One user can be part of multiple groups and also one group can of course have multiple
users. Each user in a group also has a specific role for the given group.
A privileged user (supervisor) can assign a new exercise in his/her group, change assignment
details, view results of other users and manually change them. A normal user (student) can
join a group, get list of assigned exercises, view assignment detail, submit
his/her solution and of course view the results.
his/her solution and view the results of the evaluation.
Database of exercises (algorithmic problems) is another part of the project.
Each exercise consists of text in multiple language variants, evaluation
configuration and set of inputs and reference outputs. Exercises are created by
instructed priviledged users. Assigning exercise to a group means choose one of
the exercises in the list and specify additional data. Assignment has a
deadline, maximum amount of points and configuration for calculating the final
amount, number of tries and supported runtimes (programming languages) including
specific time and memory limits for sandboxed tasks.
Each exercise consists of a text in multiple language variants, an evaluation
configuration and a set of inputs and reference outputs. Exercises are created by
instructed privileged users. Assigning an exercise to a group means choosing
one of the available exercises and specifying additional properties. An assignment
has a deadline (optionally a second deadline), a maximum amount of points,
a configuration for calculating the final score, a maximum number of submissions,
and a list of supported runtime environments (e.g., programming languages) including
specific time and memory limits for the sandboxed tasks.
#### Exercise evaluation chain
The most important part of the application is evaluating exercises for solutions
submitted by users. For imaginary system architecture _UI_, _API_, _Broker_ and
_Worker_ this goes as follows.
The most important part of the system is the evaluation of the solutions
submitted by the users for their assigned exercises.
~~For imaginary system architecture _UI_, _API_, _Broker_ and _Worker_ this goes as follows.~~
First thing users have to do is to submit their solutions to _UI_ which provides
interface to upload files and then submit them. _UI_ sends a request to _API_
@ -385,8 +382,7 @@ includes overview which part succeeded and which failed (optionally with reason
like "memory limit exceeded") and amount of awarded points.
Analysis
========
# Analysis
## Solution concepts analysis
@ -411,9 +407,9 @@ The ReCodEx project is divided into two logical parts the *Backend*
and the *Frontend* which interact with each other and which cover the
whole area of code examination. Both of these logical parts are
independent of each other in the sense of being installed on separate
machines on different locations and that one of the parts can be
replaced with different implementation and as long as the communication
protocols are preserved, the system will continue to work as expected.
machines at different locations and that one of the parts can be
replaced with a different implementation and as long as the communication
protocols are preserved, the system will continue working as expected.
*Backend* is the part which is responsible solely for the process of
evaluating a solution of an exercise. Each evaluation of a solution is
@ -426,7 +422,7 @@ environment, specific version of a compiler or the job must be evaluated
on a processor with a specific number of cores. The backend
infrastructure decides whether it will accept a job or decline it based
on the specified requirements. In case it accepts the job, it will be
placed in a queue and processed as soon as possible. The backend
placed in a queue and it will be processed as soon as possible. The backend
publishes the progress of processing of the queued jobs and the results
of the evaluations can be queried after the job processing is finished.
The backend produces a log of the evaluation and scores the solution
@ -558,13 +554,13 @@ synchronization and such.
At this point we have a worker with two internal parts, a listening one and an execution
one. The implementation of the first one is quite straightforward and clear. So let's
discuss what should be happening in execution subsystem. Jobs as work units can quite vary and do completely different things, that means configuration and worker has to be prepared for this kind of generality. Configuration and its solution was already discussed above, implementation in worker is then quite straightforward. Worker has internal structures to which loads and which stores metadata given in configuration. Whole job is mapped to job metadata structure and tasks are mapped to either external ones or internal ones (internal commands has to be defined within worker), both are different whether they are executed in sandbox or as internal worker commands.
@todo: maybe describe folders within execution and what they can be used for?
discuss what should be happening in execution subsystem...
After successful arrival of job, worker has to prepare new execution environment, then solution archive has to be downloaded from fileserver and extracted. Job configuration is located within these files and loaded into internal structures and executed. After that results are uploaded back to fileserver. These steps are the basic ones which are really necessary for whole execution and have to be executed in this precise order.
@todo: complete paragraph above... execution of job on worker, how it is done,
what steps are necessary and general for all jobs
Interesting problem is with supplementary files (inputs, sample outputs). There are two approaches which can be observed. Supplementary files can be downloaded either on the start of the execution or during execution. If the files are downloaded at the beginning execution does not really started at this point and if there are problems with network worker find it right away and can abort execution without executing single task. Slight problems can arise if some of the files needs to have same name (e.g. solution assumes that input is `input.txt`), in this scenario downloaded files cannot be renamed at the beginning but during execution which is somehow impractical and not easily observed. Second solution of this problem when files are downloaded on the fly has quite opposite problem, if there are problems with network worker will find it during execution when for instance almost whole execution is done, this is also not ideal solution if we care about burnt hardware resources. On the other hand using this approach users have quite advanced control of execution flow and know what files exactly are available during execution which is from users perspective probably more appealing then the first solution. Based on that downloading of supplementary files using 'fetch' tasks during execution was chosen and implemented.
@todo: how can inputs and outputs (and supplementary files) be handled (they can
be downloaded on start of execution, or during...)
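
The idea of fetching a supplementary file during execution and giving it the name the solution expects (such as `input.txt`) can be sketched as follows. This is only an illustration under stated assumptions: a local directory stands in for the fileserver, SHA-1 is an assumed choice of hash, and the functions `store` and `fetch` are hypothetical names, not the worker's real interface.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def store(storage_dir, content):
    """Save `content` under a name derived from its hash and return that name."""
    name = hashlib.sha1(content).hexdigest()  # assumed hash function
    (storage_dir / name).write_bytes(content)
    return name


def fetch(storage_dir, hashed_name, job_dir, local_name):
    """Copy a stored file into the job directory under the name the job expects."""
    source = storage_dir / hashed_name
    if not source.exists():
        # With on-the-fly fetching, a missing file is discovered mid-job.
        raise FileNotFoundError(f"supplementary file {hashed_name} is not available")
    target = job_dir / local_name
    shutil.copyfile(source, target)
    return target


with tempfile.TemporaryDirectory() as tmp:
    storage = Path(tmp) / "storage"  # stands in for the fileserver
    job_dir = Path(tmp) / "job"      # stands in for the job's working directory
    storage.mkdir()
    job_dir.mkdir()

    hashed = store(storage, b"1 2 3\n")
    fetched = fetch(storage, hashed, job_dir, "input.txt")
    fetched_text = fetched.read_text()
```

Because the local name is chosen at fetch time, two tests can each use a file called `input.txt` without the files colliding in storage.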
As described in fileserver section stored supplementary files have special
filenames which reflects hashes of their content. As such there are no
@ -634,8 +630,7 @@ cleaner completes machine specific caching system.
The Backend
===========
# The Backend
The backend is the part which is hidden from the user and which has only
one purpose: to evaluate users' solutions of their assignments.
@ -660,7 +655,7 @@ for the technical description of the components)
### Fileserver
@todo: stores particular datas from frontend and backend, hashing, HTTP API
@todo: stores particular data from frontend and backend, hashing, HTTP API
### Worker
@ -981,7 +976,7 @@ of entities and relational database models), describe the logical
grouping of entities and how they are related:
- user + settings + logins + ACL
- instance + licences + groups + group membership
- instance + licenses + groups + group membership
- exercise + assignments + localized assignments + runtime
environments + hardware groups
- submission + solution + reference solution + solution evaluation
