|
|
|
@ -79,41 +79,130 @@ code on testing inputs with checking the outputs against reference ones; needs
|
|
|
|
|
sandboxing, but provides good real world experience).
|
|
|
|
|
|
|
|
|
|
This project focuses on the machine-controlled part of source code evaluation.
|
|
|
|
|
First, the problems of the software used at Charles University in Prague
|
|
|
|
|
previously were discussed and similar projects at other educational institutions
|
|
|
|
|
were examined. With acquired knowledge from such projects in production, we set
|
|
|
|
|
up goals for the new evaluation system, designed the architecture and
|
|
|
|
|
implemented a fully operational solution. The system is now ready for production
|
|
|
|
|
testing at our university.
|
|
|
|
|
First, general concepts of grading systems are observed, new requirements are
|
|
|
|
|
specified and projects with similar functionality are examined. Also, problems of
|
|
|
|
|
the software previously used at Charles University in Prague are briefly
|
|
|
|
|
discussed. With acquired knowledge from such projects in production, we set up
|
|
|
|
|
goals for the new evaluation system, designed the architecture and implemented a
|
|
|
|
|
fully operational solution. The system is now ready for production testing at
|
|
|
|
|
the university.
|
|
|
|
|
|
|
|
|
|
## Assignment
|
|
|
|
|
|
|
|
|
|
The major goal of this project is to create a grading application that will be
|
|
|
|
|
used for programming classes at the Faculty of Mathematics and Physics of the
|
|
|
|
|
Charles University in Prague. However, the application should be designed in a
|
|
|
|
|
modular fashion so that it can be easily extended or modified to make other ways
|
|
|
|
|
of using it possible.
|
|
|
|
|
modular fashion to be easily extended or even modified to make other ways of
|
|
|
|
|
usage possible.
|
|
|
|
|
|
|
|
|
|
The system should be capable of dynamic analysis of programming code. This means
|
|
|
|
|
that the following four basic steps have to be supported:
|
|
|
|
|
|
|
|
|
|
1. compile the code and check for compilation errors
|
|
|
|
|
2. run the compiled binary in a sandbox with predefined inputs
|
|
|
|
|
3. check constraints on the amount of memory and time used
|
|
|
|
|
4. compare program outputs with predefined values
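
The four steps above can be sketched as a single judging function. This is only
an illustration, not the system's actual code: the names `Verdict`, `RunResult`
and `judge` are hypothetical, the measurements are assumed to come from the
sandbox, and comparing outputs token by token (ignoring whitespace differences)
is an assumption as well.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    COMPILATION_ERROR = "compilation error"
    TIME_LIMIT_EXCEEDED = "time limit exceeded"
    MEMORY_LIMIT_EXCEEDED = "memory limit exceeded"
    WRONG_OUTPUT = "wrong output"
    OK = "ok"


@dataclass
class RunResult:
    """Hypothetical record of what the compiler and the sandbox reported."""
    compiled: bool     # step 1: did the compilation succeed?
    time_s: float      # step 2: time consumed by the sandboxed run
    memory_kib: int    # step 2: memory consumed by the sandboxed run
    output: str        # step 2: captured program output


def judge(result, expected_output, time_limit_s, memory_limit_kib):
    """Apply steps 1, 3 and 4 to the measurements of one sandboxed run."""
    if not result.compiled:
        return Verdict.COMPILATION_ERROR
    if result.time_s > time_limit_s:
        return Verdict.TIME_LIMIT_EXCEEDED
    if result.memory_kib > memory_limit_kib:
        return Verdict.MEMORY_LIMIT_EXCEEDED
    # Assumption: outputs are compared as whitespace-separated tokens.
    if result.output.split() != expected_output.split():
        return Verdict.WRONG_OUTPUT
    return Verdict.OK
```

The order of the checks is a design choice of this sketch: a run that exceeded
its limits is reported as such even if its output happens to be correct.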
|
|
|
|
|
|
|
|
|
|
The project has a great starting point -- there is an old grading system
|
|
|
|
|
currently used at the university (CodEx), so its flaws and weaknesses can be
|
|
|
|
|
addressed. Furthermore, many teachers are willing to use and test the new
|
|
|
|
|
system. The following requirements were collected both from our personal experience
|
|
|
|
|
with CodEx and from teachers' requests.
|
|
|
|
|
addressed. Furthermore, many teachers desire to use and test the new system and
|
|
|
|
|
they are willing to consult ideas or problems during development.
|
|
|
|
|
|
|
|
|
|
### Intended usage
|
|
|
|
|
|
|
|
|
|
The whole system is intended to help both teachers (supervisors) and students.
|
|
|
|
|
To achieve this, it is crucial to keep in mind typical usage scenarios of the
|
|
|
|
|
system and try to make these tasks as simple as possible.
|
|
|
|
|
|
|
|
|
|
The system has a database of users. Each user is assigned one role, which
|
|
|
|
|
corresponds to his/her privileges. There are user groups reflecting the structure of
|
|
|
|
|
lectured courses. Groups can be hierarchically ordered to reflect additional
|
|
|
|
|
metadata such as the academic year. For example, a reasonable group hierarchy
|
|
|
|
|
can look like this:
|
|
|
|
|
|
|
|
|
|
```
|
|
|
|
|
Summer term 2016
|
|
|
|
|
|-- Language C# and .NET platform
|
|
|
|
|
| |-- Labs Monday 10:30
|
|
|
|
|
| `-- Labs Thursday 9:00
|
|
|
|
|
|-- Programming I
|
|
|
|
|
| |-- Labs Monday 14:00
|
|
|
|
|
...
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
In this example, students are members of the leaf groups, the higher level
|
|
|
|
|
entities are just for keeping the related groups together. The hierarchy
|
|
|
|
|
structure can be modified and altered to fit specific needs of the university or
|
|
|
|
|
any other organization, even the flat structure (i.e., no hierarchy) is
|
|
|
|
|
possible. One user can be part of multiple groups and, conversely, one
|
|
|
|
|
group can have multiple users. Each user has a specific role for every group in
|
|
|
|
|
which he/she is a member.
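
A minimal sketch of such a group tree with per-group roles follows. The class
`Group` and its methods are hypothetical names chosen for illustration, and the
user name is made up; the real data model may differ.

```python
class Group:
    """Hypothetical node of the group hierarchy."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.members = {}  # user name -> role of that user in this group

    def add_child(self, name):
        child = Group(name, parent=self)
        self.children.append(child)
        return child

    def path(self):
        """Return the names from the root down to this group."""
        parts = []
        node = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return " / ".join(reversed(parts))


term = Group("Summer term 2016")
course = term.add_child("Language C# and .NET platform")
monday = course.add_child("Labs Monday 10:30")
thursday = course.add_child("Labs Thursday 9:00")

# One user can be a member of several groups, with a role in each of them.
monday.members["alice"] = "student"
thursday.members["alice"] = "supervisor"
```

A flat structure is simply the case where every group is a root without
children.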
|
|
|
|
|
|
|
|
|
|
A database of exercises (algorithmic problems) is another part of the project.
|
|
|
|
|
Each exercise consists of a text in multiple language variants, an evaluation
|
|
|
|
|
configuration and a set of inputs and reference outputs. Exercises are created
|
|
|
|
|
by instructed privileged users. Assigning an exercise to a group means to
|
|
|
|
|
choose one of the available exercises and specify additional properties. An
|
|
|
|
|
assignment has a deadline (optionally a second deadline), a maximum amount of
|
|
|
|
|
points, a configuration for calculating the final score, a maximum number of
|
|
|
|
|
submissions, and a list of supported runtime environments (e.g., programming
|
|
|
|
|
languages) including specific time and memory limits for the sandboxed tasks.
|
|
|
|
|
|
|
|
|
|
Typical use cases for supported user roles are illustrated in the following picture:
|
|
|
|
|
|
|
|
|
|
@todo: UML use case diagram (and improve or delete following paragraph)
|
|
|
|
|
|
|
|
|
|
A privileged user (supervisor) can create an exercise, assign it in his/her group,
|
|
|
|
|
change assignment details, view results of his/her students and manually alter
|
|
|
|
|
them. A normal user (student) can join a group, get a list of assigned exercises,
|
|
|
|
|
view assignment details, submit his/her solution and view the results of the
|
|
|
|
|
evaluation.
|
|
|
|
|
|
|
|
|
|
#### Exercise evaluation chain
|
|
|
|
|
|
|
|
|
|
The most important part of the system is evaluation of solutions submitted by
|
|
|
|
|
students. The concept of consecutive steps from source code to final results
|
|
|
|
|
is described in more detail below to give readers a solid overview of what has to
|
|
|
|
|
happen during the evaluation process.
|
|
|
|
|
|
|
|
|
|
The first thing users have to do is to submit their solutions through some user
|
|
|
|
|
interface. Then, the system checks assignment invariants (deadlines, count
|
|
|
|
|
of submissions, ...) and stores submitted files. The runtime environment is
|
|
|
|
|
automatically detected based on input files and a suitable exercise configuration
|
|
|
|
|
variant is chosen (one exercise can have multiple variants, for example C and
|
|
|
|
|
Java languages). The matching exercise configuration is then used for taking care of
|
|
|
|
|
the evaluation process.
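
The submission checks described above could look roughly like this. It is a
sketch under stated assumptions: detection by file extension, the mapping
`EXTENSION_TO_ENVIRONMENT` and both function names are hypothetical, and the
source does not say how the detection is actually done.

```python
from datetime import datetime
from pathlib import PurePath

# Hypothetical mapping from file extension to runtime environment.
EXTENSION_TO_ENVIRONMENT = {".c": "c", ".java": "java"}


def check_invariants(now, deadline, submissions_so_far, max_submissions):
    """Reject a submission that violates the assignment invariants."""
    if now > deadline:
        raise ValueError("deadline has passed")
    if submissions_so_far >= max_submissions:
        raise ValueError("maximum number of submissions reached")


def detect_environment(filenames):
    """Pick the single environment implied by the submitted files."""
    found = set()
    for name in filenames:
        environment = EXTENSION_TO_ENVIRONMENT.get(PurePath(name).suffix.lower())
        if environment is not None:  # unknown files (e.g. headers) are ignored
            found.add(environment)
    if len(found) != 1:
        raise ValueError("cannot determine a unique runtime environment")
    return found.pop()
```

For example, `detect_environment(["main.c", "util.h"])` selects the C variant of
the exercise configuration, while a submission mixing `.c` and `.java` files is
rejected as ambiguous.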
|
|
|
|
|
|
|
|
|
|
There is a pool of worker computers dedicated to processing jobs. Some of them
|
|
|
|
|
may have a different environment to allow testing programs in different
|
|
|
|
|
conditions. Incoming jobs are scheduled to a particular worker depending on its
|
|
|
|
|
capabilities and job requirements.
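
One simple way to express this matching is to treat worker capabilities and job
requirements as sets. The function `pick_worker`, the worker names and the
capability labels below are assumptions made for illustration; the scheduling
policy of the real system is not specified here.

```python
from typing import Dict, Optional, Set


def pick_worker(requirements: Set[str], workers: Dict[str, Set[str]]) -> Optional[str]:
    """Return the first worker whose capabilities cover all job requirements."""
    for name, capabilities in workers.items():
        if requirements <= capabilities:  # subset test
            return name
    return None  # no suitable worker; the job has to wait or be rejected


workers = {
    "worker-1": {"c"},
    "worker-2": {"c", "java"},
}
```

With these workers, a job requiring `{"java"}` goes to `worker-2`, while a job
requiring `{"c"}` goes to `worker-1` because it is listed first.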
|
|
|
|
|
|
|
|
|
|
Job processing itself starts with obtaining source files and job configuration.
|
|
|
|
|
The configuration is parsed into small tasks, each with a simple piece of work.
|
|
|
|
|
The evaluation itself proceeds in the order of the tasks. It is crucial to keep
|
|
|
|
|
the executing computer secure and stable, so an isolated sandboxed environment is used
|
|
|
|
|
when dealing with unknown source code. When the execution is finished, results
|
|
|
|
|
are saved.
|
|
|
|
|
|
|
|
|
|
Results from a worker contain only output data from processed tasks (this could
|
|
|
|
|
be a return value, consumed time, ...). On top of that, one value is calculated to
|
|
|
|
|
express overall quality of the tested job. It is used as points for final
|
|
|
|
|
student grading. The calculation method of this value may be different for each
|
|
|
|
|
assignment. Data presented back to users include an overview of job parts (which
|
|
|
|
|
succeeded and which failed, optionally with a reason like "memory limit exceeded")
|
|
|
|
|
and the achieved score (amount of awarded points).
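
As one possible calculation method, the score can be the share of passed tests
scaled to the maximum amount of points. The function `score`, the test names and
the rounding down are assumptions of this sketch; as stated above, each
assignment may use a different method.

```python
def score(test_results, max_points):
    """Scale the fraction of passed tests to the maximum amount of points."""
    if not test_results:
        return 0
    passed = sum(1 for ok in test_results.values() if ok)
    # Integer arithmetic avoids floating point surprises; rounds down.
    return max_points * passed // len(test_results)


results = {"test-1": True, "test-2": True, "test-3": False}
```

Here `score(results, 10)` awards 6 points, because two of three tests passed and
the result is rounded down.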
|
|
|
|
|
|
|
|
|
|
## Requirements
|
|
|
|
|
|
|
|
|
|
There are many different requirements for the system. Some of them are
|
|
|
|
|
features which are necessary for any system for evaluation of programming coding
|
|
|
|
|
assignments. Some of them are specific for university deployment and some are
|
|
|
|
|
wishes for new features collected over the period of CodEx operation.
|
|
|
|
|
|
|
|
|
|
CodEx satisfies all the basic requirements and a few more that originate from
|
|
|
|
|
the way courses are organized in the university environment -- for example, students
|
|
|
|
|
are divided into groups that correspond to lab groups. New wishes arose during
|
|
|
|
|
the ten year long lifetime of the old system. There are not many ways to improve
|
|
|
|
|
it from the perspective of a student, but a lot of feature requests came from
|
|
|
|
|
administrators and supervisors. The ideas were mostly gathered from meetings
|
|
|
|
|
with faculty staff involved with the current system.
|
|
|
|
|
necessary for any system for source code evaluation. Some of them are specific
|
|
|
|
|
for university deployment and some of them arose during the ten year long
|
|
|
|
|
lifetime of the old system. There are not many ways to improve the CodEx
|
|
|
|
|
experience from the perspective of a student, but a lot of feature requests
|
|
|
|
|
came from administrators and supervisors. The ideas were gathered mostly from our
|
|
|
|
|
personal experience with the system and from meetings with faculty staff
|
|
|
|
|
involved with the current system.
|
|
|
|
|
|
|
|
|
|
For clear arrangement, all the requirements and wishes are presented grouped by
|
|
|
|
|
categories.
|
|
|
|
@ -136,9 +225,9 @@ They describe the evaluation system in general and also university addons
|
|
|
|
|
specific properties set (deadlines, etc.)
|
|
|
|
|
- there is a list of submitted solutions for each assignment with corresponding
|
|
|
|
|
results
|
|
|
|
|
- teachers can specify a scale of points which will be awarded to the students
|
|
|
|
|
  depending on the correctness of his/her solution for each assignment separately
|
|
|
|
|
(expressed in percentage points)
|
|
|
|
|
- teachers can specify the way of computing grading points which will be awarded
|
|
|
|
|
to the students depending on the quality of his/her solution for each
|
|
|
|
|
  assignment separately
|
|
|
|
|
- teachers can view detailed data about their students (users of their groups)
|
|
|
|
|
  including all submitted solutions; also, each of the solutions can be manually
|
|
|
|
|
reviewed, commented and assigned additional points (positive or negative)
|
|
|
|
@ -158,13 +247,11 @@ They describe the evaluation system in general and also university addons
|
|
|
|
|
mainly for viewing assigned exercises, uploading their own solutions to the
|
|
|
|
|
assignments, and viewing the results of the solutions after an automatic
|
|
|
|
|
  evaluation is finished; the two wanted interfaces are web and command-line based
|
|
|
|
|
- administrators can manage users with support of roles (at least two --
|
|
|
|
|
_student_ and _supervisor_)
|
|
|
|
|
- user privilege separation (at least two roles -- _student_ and _supervisor_)
|
|
|
|
|
- logging in through a university authentication system (e.g. LDAP)
|
|
|
|
|
- SIS (university information system) integration for fetching personal user
|
|
|
|
|
data
|
|
|
|
|
- administrators can depend on a safe environment in which the students'
|
|
|
|
|
solutions will be executed
|
|
|
|
|
- safe environment in which the students' solutions are executed
|
|
|
|
|
- support for multiple programming environments at once to avoid unacceptable
|
|
|
|
|
  workload for the administrator (maintaining separate installations for many courses)
|
|
|
|
|
and high hardware occupation
|
|
|
|
@ -263,12 +350,12 @@ Valgrind for memory leaks or array boundaries checks via `mudflap` library.
|
|
|
|
|
### Codility
|
|
|
|
|
|
|
|
|
|
[Codility](https://codility.com/) is a web-based solution primarily targeted at
|
|
|
|
|
company recruiters. It is a commercial product available as a SaaS and it supports 16
|
|
|
|
|
programming languages. The
|
|
|
|
|
company recruiters. It is a commercial product available as a SaaS and it
|
|
|
|
|
supports 16 programming languages. The
|
|
|
|
|
[UI](http://1.bp.blogspot.com/-_isqWtuEvvY/U8_SbkUMP-I/AAAAAAAAAL0/Hup_amNYU2s/s1600/cui.png)
|
|
|
|
|
of Codility is [open source](https://github.com/Codility/cui); the rest of the
|
|
|
|
|
source code is not available. One interesting feature is 'task timeline' --
|
|
|
|
|
captured progress of writing code for each user.
|
|
|
|
|
of Codility is [open source](https://github.com/Codility/cui); the rest of the source
|
|
|
|
|
code is not available. One interesting feature is 'task timeline' -- captured
|
|
|
|
|
progress of writing code for each user.
|
|
|
|
|
|
|
|
|
|
### CMS
|
|
|
|
|
|
|
|
|
@ -300,90 +387,6 @@ exercises. Kattis is primarily used by programming contest organizators, company
|
|
|
|
|
recruiters and also some universities.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
### Intended usage
|
|
|
|
|
|
|
|
|
|
The whole system is intended to help both teachers (supervisors) and students.
|
|
|
|
|
To achieve this, it is crucial to keep in mind typical usage scenarios of the
|
|
|
|
|
system and try to make these typical tasks as simple as possible.
|
|
|
|
|
|
|
|
|
|
The system has a database of users. Each user has a role assigned, which
|
|
|
|
|
corresponds to his/her privileges. A user can log in via email and password
|
|
|
|
|
or using the university system. There are groups of users, which correspond to
|
|
|
|
|
the lectured courses. Groups can be hierarchically ordered to reflect additional
|
|
|
|
|
metadata such as the academic year. For example, a reasonable group hierarchy
|
|
|
|
|
can look like this:
|
|
|
|
|
|
|
|
|
|
```
|
|
|
|
|
Summer term 2016
|
|
|
|
|
|-- Language C# and .NET platform
|
|
|
|
|
| |-- Labs Monday 10:30
|
|
|
|
|
| `-- Labs Thursday 9:00
|
|
|
|
|
|-- Programming I
|
|
|
|
|
| |-- Labs Monday 14:00
|
|
|
|
|
...
|
|
|
|
|
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
In this example, students are members of the leaf groups, the higher level
|
|
|
|
|
groups are just for keeping the related groups together. The hierarchy tree can
|
|
|
|
|
be modified and altered to fit specific needs of the university or any other
|
|
|
|
|
organization, even the flat structure (i.e., no hierarchy) is possible.
|
|
|
|
|
|
|
|
|
|
One user can be part of multiple groups and also one group can of course have
|
|
|
|
|
multiple users. Each user in a group has also a specific role for the given
|
|
|
|
|
group. A privileged user (supervisor) can assign a new exercise in his/her
|
|
|
|
|
group, change assignment details, view results of other users and manually
|
|
|
|
|
change them. A normal user (student) can join a group, get a list of assigned
|
|
|
|
|
exercises, view assignment detail, submit his/her solution and view the results
|
|
|
|
|
of the evaluation.
|
|
|
|
|
|
|
|
|
|
Database of exercises (algorithmic problems) is another part of the project.
|
|
|
|
|
Each exercise consists of a text in multiple language variants, an evaluation
|
|
|
|
|
configuration and a set of inputs and reference outputs. Exercises are created
|
|
|
|
|
by instructed privileged users. Assigning an exercise to a group means to
|
|
|
|
|
choose one of the available exercises and specify additional properties. An
|
|
|
|
|
assignment has a deadline (optionally a second deadline), a maximum amount of
|
|
|
|
|
points, a configuration for calculating the final score, a maximum number of
|
|
|
|
|
submissions, and a list of supported runtime environments (e.g., programming
|
|
|
|
|
languages) including specific time and memory limits for the sandboxed tasks.
|
|
|
|
|
|
|
|
|
|
#### Exercise evaluation chain
|
|
|
|
|
|
|
|
|
|
The most important part of the system is the evaluation of the solutions
|
|
|
|
|
submitted by the users for their assigned exercises. Concepts of consecutive
|
|
|
|
|
steps from source code of solution to results are described on an architecture with
|
|
|
|
|
two layers -- presentation (_frontend_) and executive (_backend_).
|
|
|
|
|
|
|
|
|
|
First thing users have to do is to submit their solutions to _frontend_ which
|
|
|
|
|
provides interface to upload files and then submit them. It checks the
|
|
|
|
|
assignment invariants (deadlines, count of submissions, ...) and stores
|
|
|
|
|
submitted files. The runtime environment is automatically detected based on
|
|
|
|
|
input files and suitable exercise configuration variant is chosen (one exercise
|
|
|
|
|
can have multiple variants, for example C and Java languages). Matching exercise
|
|
|
|
|
configuration is then sent to _backend_ alongside solution source files.
|
|
|
|
|
|
|
|
|
|
_Backend_ can have multiple engines to allow processing more jobs in parallel
|
|
|
|
|
and a loadbalancer, which tracks states of incoming jobs and performs scheduling
|
|
|
|
|
of them. The decision is made based on capabilities of each engine and also job
|
|
|
|
|
requirements. When a match is found, the job is held until the particular engine
|
|
|
|
|
is jobless and can receive an evaluation request.
|
|
|
|
|
|
|
|
|
|
Job processing itself starts with obtaining source files and job configuration.
|
|
|
|
|
The configuration is parsed into small tasks, each with a simple piece of work.
|
|
|
|
|
The evaluation itself proceeds in the order of the tasks. It is crucial to keep
|
|
|
|
|
the executing computer secure and stable, so an isolated sandboxed environment is used
|
|
|
|
|
when dealing with unknown source code. When the execution is finished, results
|
|
|
|
|
are uploaded back to _frontend_.
|
|
|
|
|
|
|
|
|
|
The _frontend_ is immediately notified about the finished job. The outcomes are
|
|
|
|
|
parsed and results of important tasks (comparing actual and expected results)
|
|
|
|
|
are saved into storage. Also, points are calculated depending on solution
|
|
|
|
|
correctness and assignment configuration. Data presented back to users include
|
|
|
|
|
an overview of which parts succeeded and which failed (optionally with a reason like
|
|
|
|
|
"memory limit exceeded") and the amount of awarded points.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
# Analysis
|
|
|
|
|
|
|
|
|
|
## ReCodEx goals
|
|
|
|
@ -1671,6 +1674,6 @@ used.
|
|
|
|
|
chdir: ${EVAL_DIR}
|
|
|
|
|
```
|
|
|
|
|
<!---
|
|
|
|
|
// vim: set formatoptions=tqan flp+=\\\|^\\*\\s* textwidth=80 colorcolumn=+1:
|
|
|
|
|
// vim: set formatoptions=tqn flp+=\\\|^\\*\\s* textwidth=80 colorcolumn=+1:
|
|
|
|
|
-->
|
|
|
|
|
|
|
|
|
|