@ -78,7 +78,7 @@ the code without running it; safe, but not very precise) or dynamically (run the
code on test inputs and check the outputs against reference ones; this needs
sandboxing, but provides a good real-world experience).

<!---
*Simon*: I am not very sure about the formulation of 'our university' - shouldn't
we rather say 'The Charles University in Prague' instead?
-->
@ -94,15 +94,16 @@ at our university.
## Assignment

The major goal of this project is to create a grading application that will be
used for programming classes at the Faculty of Mathematics and Physics of
Charles University in Prague. However, the application should be designed in a
modular fashion so that it can be easily extended or modified to support other
ways of using it.

The project has a great starting point -- there is an old grading system
currently used at the university (CodEx), so its flaws and weaknesses can be
addressed. Furthermore, many teachers are willing to use and test the new
system. The following requirements were collected both from our personal
experience with CodEx and from teachers' requests.

### Basic grading system requirements:
@ -110,35 +111,38 @@ These are the features which are necessary for any system for evaluation of
programming assignments used in any university programming course:

- students can use an intuitive user interface for interaction with the system,
  mainly for viewing assigned exercises, uploading their own solutions to the
  assignments, and viewing the results of the solutions after an automatic
  evaluation is finished
- teachers can create exercises including a textual description, sample inputs
  and correct reference outputs (for example "sum all numbers from a given file
  and write the result to the standard output")
- teachers can assign an existing exercise to their class with some specific
  properties set (deadlines, etc.)
- teachers can specify their scale of points that will be awarded to the
  students depending on the correctness of their solutions (expressed as a
  percentage)
- teachers can view all of the solutions their students submitted and also the
  results of the evaluations, and they can manually override the automatically
  assigned points
- teachers can see the statistics of their classes and of the individual
  students in these classes
- administrators can depend on a safe environment in which the students'
  solutions will be executed
- administrators can manage users with support of roles (at least two --
  _student_ and _supervisor_)

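
To make the point-scale requirement concrete, here is a minimal sketch. It
assumes, purely for illustration, that a teacher's scale is a sorted list of
(minimal correctness in percent, points) thresholds; neither `PointScale` nor
`award_points` comes from CodEx or ReCodEx.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class PointScale:
    """Hypothetical teacher-defined scale of (minimal correctness %, points) pairs."""
    thresholds: list[tuple[float, int]]

    def __post_init__(self) -> None:
        percents = [percent for percent, _ in self.thresholds]
        if percents != sorted(percents):
            raise ValueError("thresholds must be sorted by percentage")


def award_points(scale: PointScale, correctness_percent: float) -> int:
    """Return the points of the highest threshold reached, or 0 if none is reached."""
    percents = [percent for percent, _ in scale.thresholds]
    index = bisect_right(percents, correctness_percent) - 1
    return scale.thresholds[index][1] if index >= 0 else 0


# 5 points from 50 % correctness, the full 10 points only for 100 %.
scale = PointScale([(50.0, 5), (100.0, 10)])
print(award_points(scale, 75.0))  # 5
```

A threshold table is only one possible encoding; a linear scale (points
proportional to correctness) would satisfy the requirement equally well.
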
CodEx satisfies all these requirements and a few more that originate from the
way courses are organized at our university -- for example, users have roles
(_student_, _supervisor_ and _administrator_) that determine their capabilities
in the system, and students are divided into groups that correspond to lab
groups.

However, further requirements arose during the ten-year lifetime of the old
system. There are not many ways to improve it from the perspective of a student,
but a lot of feature requests came from administrators and supervisors. The
ideas were mostly gathered from meetings with faculty staff involved with the
current system.

### Requested features for the new system:
@ -173,8 +177,8 @@ short survey at universities, programming contests, and other available tools.
## Related work

This is not a complete list of available evaluators, but only a few projects
that are in use today and can serve as an inspiration for our project. Each
project in the list has a brief description and a mention of some of its key
features.

### CodEx
@ -220,10 +224,11 @@ several drawbacks. The main ones are:
### Progtest

[Progtest](https://progtest.fit.cvut.cz/) is a private project of
[FIT ČVUT](https://fit.cvut.cz) in Prague. As far as we know, it is used for
C/C++ and Bash programming and for knowledge-based quizzes. It uses several
kinds of bonus points and penalties and also gives a few hints about what is
failing in the submitted solution. It is very strict on source code quality,
using for example the `-pedantic` option of GCC, Valgrind for memory leaks, or
array boundary checks via the `mudflap` library.

### Codility
@ -268,13 +273,14 @@ recruiters and also some universities.
## ReCodEx goals

None of the existing systems we came across is capable of all the required
features of the new system. There is no grading system designed to support a
complicated evaluation pipeline, so this part is an unexplored field and has to
be designed with caution. Also, no project is modern and extensible enough to
be used as a base for ReCodEx. After considering all these facts, it was clear
that a new system had to be written from scratch. This implies that only a
subset of all the features will be implemented in the first version, and the
others in the following ones.

Gathered features are categorized based on priorities for the whole system. The
highest priority is given to the main functionality, similar to the current
CodEx. It is a base
@ -291,25 +297,25 @@ side) and command-line submit tool. Plagiarism detection is not likely to be
part of any release in the near future unless someone else develops the
detection engine. The detection problem is too hard to be solved as part of
this project.

We named the project **ReCodEx -- ReCodEx Code Examiner**. The name should
point to the old CodEx, but also reflect the new approach to solving its issues.
**Re** as part of the name means redesigned, rewritten, renewed, or restarted.

At this point there is a clear idea of how the new system will be used and what
the major enhancements for future releases are. With this in mind, the overall
architecture can be sketched. Based on the previous research, we set up several
goals for the new system. They mostly reflect drawbacks of the current version
of CodEx and some reasonable wishes of university users. The most notable
features are the following:

- modern HTML5 web frontend written in JavaScript using a suitable framework
- REST API implemented in PHP, communicating with a database, the evaluation
  backend and a file server
- evaluation backend implemented as a distributed system on top of a message
  queue framework (ZeroMQ) with master-worker architecture <!-- @todo: WTF is
  worker??? The concept has not been introduced yet! -->
- worker with basic support of the Windows environment (without sandbox, since
  no suitable general-purpose tool is available yet)
- evaluation procedure configured in a YAML file, composed of small tasks
  connected into an arbitrary directed acyclic graph

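
Connecting tasks into a directed acyclic graph guarantees that the evaluation
backend can always find an order in which every task runs after the tasks it
depends on. The sketch below uses Python's standard `graphlib` module (3.9+) to
compute such an order and to reject cyclic configurations. The task names are
hypothetical, and the graph is written as a Python dictionary rather than in
the YAML format, whose structure is not described here.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical evaluation procedure: each task maps to the tasks it depends on.
tasks: dict[str, set[str]] = {
    "fetch_sources": set(),
    "compile": {"fetch_sources"},
    "run_test_1": {"compile"},
    "run_test_2": {"compile"},
    "judge_test_1": {"run_test_1"},
    "judge_test_2": {"run_test_2"},
}


def execution_order(graph: dict[str, set[str]]) -> list[str]:
    """Return one valid sequential order of the tasks (dependencies first)."""
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as error:
        # error.args[1] holds the nodes that form the cycle.
        raise ValueError(f"evaluation graph is not acyclic: {error.args[1]}") from error


order = execution_order(tasks)
print(order)
```

Tasks that do not depend on each other (here the two test runs) could also be
executed in parallel; a sequential topological order is simply the easiest
valid schedule.
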
@ -319,11 +325,11 @@ The whole system is intended to help both teachers (supervisors) and students.
To achieve this, it is crucial to keep in mind typical usage scenarios of the
system and try to make these typical tasks as simple as possible.

The system has a database of users. Each user has a role assigned, which
corresponds to his/her privileges. Users can log in via email and password or
using the university system. There are groups of users, which correspond to
the lectured courses. Groups can be hierarchically ordered to reflect
additional metadata such as the academic year. For example, a reasonable group
hierarchy can look like this:

```
@ -337,67 +343,63 @@ Summer term 2016
```

In this example, students are members of the leaf groups, while the
higher-level groups only keep related groups together. The hierarchy tree can
be adapted to fit the specific needs of the university or any other
organization; even a flat structure (i.e., no hierarchy) is possible.

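
The group hierarchy can be modeled as a tree in which each node knows its
parent, so metadata such as the term is reachable from any leaf group. A
minimal sketch under that assumption; the `Group` class, the `groups_of` helper
and the course and lab group names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)  # identity comparison avoids recursing through parent links
class Group:
    """Hypothetical group node; higher levels organize, leaf groups hold students."""
    name: str
    parent: Group | None = field(default=None, repr=False)
    children: list[Group] = field(default_factory=list)
    members: set[str] = field(default_factory=set)

    def add_child(self, name: str) -> Group:
        child = Group(name, parent=self)
        self.children.append(child)
        return child

    def path(self) -> list[str]:
        """Group names from the root down to this group."""
        names: list[str] = []
        node: Group | None = self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return names[::-1]


def groups_of(root: Group, user: str) -> list[Group]:
    """All groups in the tree that have the user as a member (depth-first)."""
    found = [root] if user in root.members else []
    for child in root.children:
        found.extend(groups_of(child, user))
    return found


term = Group("Summer term 2016")
course = term.add_child("Course A")      # hypothetical course name
lab = course.add_child("Lab group 1")    # hypothetical lab group name
lab.members.add("student42")
print([group.path() for group in groups_of(term, "student42")])
# [['Summer term 2016', 'Course A', 'Lab group 1']]
```

A flat structure is the special case where every group is a direct child of a
single root (or simply has no parent at all).
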
One user can be part of multiple groups, and one group can of course have
multiple users. Each user in a group also has a specific role in that group. A
privileged user (supervisor) can assign a new exercise in his/her group, change
assignment details, view the results of other users and manually change them.
A normal user (student) can join a group, get a list of assigned exercises,
view assignment details, submit his/her solution and view the results of the
evaluation.

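
Because roles are assigned per group, a natural representation is a membership
table keyed by (user, group), with each role mapped to the actions listed above.
The role and action identifiers and the user and group names below are
hypothetical labels for this sketch, not identifiers of the real system;
joining a group is left out because it concerns users who are not members yet.

```python
from enum import Enum


class Role(Enum):
    STUDENT = "student"
    SUPERVISOR = "supervisor"


# Actions from the description above; the identifiers are illustrative only.
PERMISSIONS: dict[Role, frozenset[str]] = {
    Role.STUDENT: frozenset({"list_assignments", "view_assignment",
                             "submit_solution", "view_own_results"}),
    Role.SUPERVISOR: frozenset({"assign_exercise", "edit_assignment",
                                "view_all_results", "override_results"}),
}

# Roles are stored per (user, group) pair, so one user can have different roles.
memberships: dict[tuple[str, str], Role] = {
    ("alice", "lab-1"): Role.SUPERVISOR,
    ("alice", "lab-2"): Role.STUDENT,
    ("bob", "lab-1"): Role.STUDENT,
}


def can(user: str, group: str, action: str) -> bool:
    """True if the user's role in this particular group allows the action."""
    role = memberships.get((user, group))
    return role is not None and action in PERMISSIONS[role]


print(can("alice", "lab-1", "override_results"))  # True
print(can("alice", "lab-2", "override_results"))  # False
```
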
A database of exercises (algorithmic problems) is another part of the project.
Each exercise consists of a text in multiple language variants, an evaluation
configuration and a set of inputs and reference outputs. Exercises are created
by instructed privileged users. Assigning an exercise to a group means choosing
one of the available exercises and specifying additional properties. An
assignment has a deadline (optionally a second deadline), a maximum amount of
points, a configuration for calculating the final score, a maximum number of
submissions, and a list of supported runtime environments (e.g., programming
languages) including specific time and memory limits for the sandboxed tasks.

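
The assignment properties map naturally onto a record type. The sketch below
uses hypothetical field names and also shows one possible check of the
assignment invariants (deadlines, number of submissions, supported environment)
before a submission is accepted. It assumes that submissions are accepted until
the second deadline when one is set, which the text does not specify, and it
leaves out the score configuration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime


@dataclass(frozen=True)
class Limits:
    """Hypothetical sandbox limits for one runtime environment."""
    time_seconds: float
    memory_kib: int


@dataclass
class Assignment:
    """Hypothetical assignment record; the field names are illustrative only."""
    exercise_id: str
    first_deadline: datetime
    max_points: int
    max_submissions: int
    second_deadline: datetime | None = None
    # Supported runtime environment (e.g. a programming language) -> its limits.
    environments: dict[str, Limits] = field(default_factory=dict)

    def accepts_submission(self, now: datetime, submitted_so_far: int,
                           environment: str) -> bool:
        """Check the deadline, submission count and environment invariants.

        Assumption: submissions are accepted until the second deadline if set.
        """
        last_deadline = self.second_deadline or self.first_deadline
        return (now <= last_deadline
                and submitted_so_far < self.max_submissions
                and environment in self.environments)


assignment = Assignment(
    exercise_id="sum-numbers",                    # hypothetical identifier
    first_deadline=datetime(2016, 5, 1, 23, 59),
    second_deadline=datetime(2016, 5, 15, 23, 59),
    max_points=10,
    max_submissions=50,
    environments={"c-gcc": Limits(time_seconds=1.0, memory_kib=65536)},
)
print(assignment.accepts_submission(datetime(2016, 5, 10), 3, "c-gcc"))  # True
```
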
#### Exercise evaluation chain

The most important part of the system is the evaluation of the solutions
submitted by the users for their assigned exercises. The consecutive steps from
the source code of a solution to its results are described on an architecture
with two layers -- presentation (frontend) and execution (backend).

First, users submit their solutions to the _frontend_, which provides an
interface to upload files and then submit them. The frontend checks the
assignment invariants (deadlines, number of submissions, ...) and stores the
submitted files. The runtime environment is automatically detected based on
the input files, and a suitable exercise configuration variant is chosen (one
exercise can have multiple variants, for example for the C and Java languages).
The matching exercise configuration is then sent to the _backend_ alongside the
solution source files.

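
The automatic detection of the runtime environment can be sketched as a lookup
from file extensions to environment identifiers, followed by choosing the
configuration variant for the detected environment. The extension table, the
environment identifiers, the variant file names and both functions are
hypothetical; the real detection rules are not described in this section.

```python
from pathlib import PurePath

# Hypothetical mapping of source file extensions to runtime environment IDs.
EXTENSION_TO_ENVIRONMENT = {
    ".c": "c-gcc",
    ".cpp": "cxx-gcc",
    ".java": "java",
    ".py": "python3",
}


def detect_environment(file_names: list[str]) -> str:
    """Return the one environment implied by the recognized source files."""
    detected: set[str] = set()
    for name in file_names:
        suffix = PurePath(name).suffix.lower()
        if suffix in EXTENSION_TO_ENVIRONMENT:  # unknown files (e.g. README) are ignored
            detected.add(EXTENSION_TO_ENVIRONMENT[suffix])
    if len(detected) != 1:
        raise ValueError(f"cannot determine runtime environment: {sorted(detected)}")
    return detected.pop()


def choose_variant(variants: dict[str, str], file_names: list[str]) -> str:
    """Pick the exercise configuration variant for the detected environment."""
    environment = detect_environment(file_names)
    if environment not in variants:
        raise ValueError(f"exercise has no variant for {environment}")
    return variants[environment]


# Hypothetical exercise with a C and a Java variant of its configuration.
variants = {"c-gcc": "config-c.yml", "java": "config-java.yml"}
print(choose_variant(variants, ["main.c", "README.txt"]))  # config-c.yml
```

Rejecting ambiguous submissions (for example C and Java files mixed together)
keeps the choice of the variant deterministic.
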
The _backend_ can have multiple engines to allow processing more jobs in
parallel, and a load balancer, which tracks the states of incoming jobs and
schedules them. The scheduling decision is based on the capabilities of each
engine and on the job requirements. When a match is found, the job is held
until the particular engine is idle and can receive an evaluation request.

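
The scheduling rule above (match job requirements against engine capabilities,
then hold the job until a matching engine is idle) can be sketched in memory as
follows. Capabilities and requirements are modeled as plain sets of strings,
and the `Engine`, `Job` and `LoadBalancer` classes are hypothetical; the
message-queue communication of the real backend is left out entirely.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field


@dataclass
class Engine:
    name: str
    capabilities: frozenset[str]
    busy: bool = False


@dataclass
class Job:
    job_id: str
    requirements: frozenset[str]


@dataclass
class LoadBalancer:
    """Hypothetical in-memory scheduler with a FIFO queue of waiting jobs."""
    engines: list[Engine]
    waiting: deque[Job] = field(default_factory=deque)

    def submit(self, job: Job) -> None:
        # Refuse jobs that no engine could ever run instead of holding them forever.
        if not any(job.requirements <= e.capabilities for e in self.engines):
            raise ValueError(f"no engine can run job {job.job_id}")
        self.waiting.append(job)

    def dispatch(self) -> list[tuple[str, str]]:
        """Assign waiting jobs to idle matching engines; return (job, engine) pairs."""
        assigned: list[tuple[str, str]] = []
        still_waiting: deque[Job] = deque()
        while self.waiting:
            job = self.waiting.popleft()
            engine = next((e for e in self.engines
                           if not e.busy and job.requirements <= e.capabilities), None)
            if engine is None:
                still_waiting.append(job)  # held until a matching engine is idle
            else:
                engine.busy = True
                assigned.append((job.job_id, engine.name))
        self.waiting = still_waiting
        return assigned

    def finished(self, engine_name: str) -> None:
        for engine in self.engines:
            if engine.name == engine_name:
                engine.busy = False


balancer = LoadBalancer([Engine("linux-1", frozenset({"linux", "gcc"})),
                         Engine("windows-1", frozenset({"windows"}))])
balancer.submit(Job("j1", frozenset({"gcc"})))
balancer.submit(Job("j2", frozenset({"gcc"})))
print(balancer.dispatch())  # [('j1', 'linux-1')], j2 keeps waiting
balancer.finished("linux-1")
print(balancer.dispatch())  # [('j2', 'linux-1')]
```

A job that cannot be placed yet does not block the jobs behind it, so an idle
engine with different capabilities is not left unused.
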
Job processing itself starts with obtaining the source files and the job
configuration. The configuration is parsed into small tasks, each performing a
simple piece of work. The tasks are then executed in the order given by their
dependencies. It is crucial to keep the executing computer secure and stable,
so an isolated sandboxed environment is used when dealing with unknown source
code. When the execution is finished, the results are uploaded back to the
_frontend_.

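
Real sandboxing needs dedicated isolation tooling, which does not fit into a
short example. The sketch below only illustrates the resource-limit part on a
POSIX system: it runs a program with a CPU time limit (`resource.RLIMIT_CPU`)
and a wall-clock timeout and classifies the outcome. The function name and the
verdict strings are hypothetical, and the code provides no isolation of the
file system, the network or other processes.

```python
import resource
import signal
import subprocess
import sys


def run_limited(command: list[str], stdin_data: str,
                cpu_seconds: int, wall_seconds: float) -> tuple[str, str]:
    """Run a command with CPU and wall-clock limits; return (verdict, stdout).

    POSIX only. This limits resources but does NOT isolate the program.
    """
    def apply_limits() -> None:
        # Runs in the child before exec. The soft limit triggers SIGXCPU; the hard
        # limit (one second later, never above the inherited hard limit) kills it.
        _, inherited_hard = resource.getrlimit(resource.RLIMIT_CPU)
        hard = cpu_seconds + 1
        if inherited_hard != resource.RLIM_INFINITY:
            hard = min(hard, inherited_hard)
        resource.setrlimit(resource.RLIMIT_CPU, (min(cpu_seconds, hard), hard))

    try:
        completed = subprocess.run(
            command, input=stdin_data, capture_output=True, text=True,
            timeout=wall_seconds, preexec_fn=apply_limits,
        )
    except subprocess.TimeoutExpired:  # subprocess.run kills the child here
        return "time limit exceeded (wall clock)", ""
    if completed.returncode == -signal.SIGXCPU:
        return "time limit exceeded (CPU)", completed.stdout
    if completed.returncode != 0:
        return "runtime error", completed.stdout
    return "ok", completed.stdout


# A tiny "solution" that sums the numbers on its standard input.
verdict, output = run_limited(
    [sys.executable, "-c", "print(sum(int(x) for x in input().split()))"],
    "1 2 3\n", cpu_seconds=2, wall_seconds=5.0,
)
print(verdict, output.strip())  # ok 6
```
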
The _frontend_ is immediately notified about the finished job. The outcomes are
parsed, and the results of important tasks (those comparing actual and expected
results) are saved into storage. Points are also calculated, depending on the
solution correctness and the assignment configuration. Data presented back to
users include an overview of which parts succeeded and which failed (optionally
with a reason like "memory limit exceeded") and the amount of awarded points.

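
Comparing actual and expected results is the job of a judging step. A minimal
sketch, assuming a token-based comparison that ignores differences in
whitespace (real judges often offer several comparison modes), followed by a
summary in the spirit of the overview described above. `JudgeResult`, `judge`
and `correctness` are hypothetical names; the resulting fraction could then be
turned into points according to the assignment configuration.

```python
from dataclasses import dataclass


def outputs_match(actual: str, expected: str) -> bool:
    """Token-based comparison: differences in whitespace layout are ignored."""
    return actual.split() == expected.split()


@dataclass(frozen=True)
class JudgeResult:
    """Hypothetical per-test outcome shown in the results overview."""
    name: str
    passed: bool
    reason: str = ""


def judge(name: str, actual: str, expected: str, run_error: str = "") -> JudgeResult:
    """Fail on a run error (e.g. an exceeded limit), otherwise compare outputs."""
    if run_error:
        return JudgeResult(name, False, run_error)
    if outputs_match(actual, expected):
        return JudgeResult(name, True)
    return JudgeResult(name, False, "wrong answer")


def correctness(results: list[JudgeResult]) -> float:
    """Fraction of passed tests; 0.0 when there are no tests."""
    if not results:
        return 0.0
    return sum(result.passed for result in results) / len(results)


results = [
    judge("test 1", "6\n", "6"),
    judge("test 2", "", "15", run_error="memory limit exceeded"),
]
for result in results:
    print(result.name, "OK" if result.passed else f"FAILED ({result.reason})")
print(f"correctness: {correctness(results):.0%}")
# test 1 OK
# test 2 FAILED (memory limit exceeded)
# correctness: 50%
```
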
# Analysis