Intro update

master
Petr Stefan 8 years ago
parent 58aeaf5cce
commit 4c9dac022f

@ -79,41 +79,130 @@ code on testing inputs with checking the outputs against reference ones; needs
sandboxing, but provides good real world experience).
This project focuses on the machine-controlled part of source code evaluation.
First, the problems of the software used at Charles University in Prague
previously were discussed and similar projects at other educational institutions
were examined. With acquired knowledge from such projects in production, we set
up goals for the new evaluation system, designed the architecture and
implemented a fully operational solution. The system is now ready for production
testing at our university.
First, general concepts of grading systems are observed, new requirements are
specified and projects with similar functionality are examined. Also, problems of
the software previously used at Charles University in Prague are briefly
discussed. With acquired knowledge from such projects in production, we set up
goals for the new evaluation system, designed the architecture and implemented a
fully operational solution. The system is now ready for production testing at
the university.
## Assignment
The major goal of this project is to create a grading application that will be
used for programming classes at the Faculty of Mathematics and Physics of the
Charles University in Prague. However, the application should be designed in a
modular fashion so that it can be easily extended or modified to make other ways
of using it possible.
modular fashion to be easily extended or even modified to make other ways of
usage possible.
The system should be capable of dynamic analysis of program code. This means
that the following four basic steps have to be supported:
1. compile the code and check for compilation errors
2. run compiled binary in a sandbox with predefined inputs
3. check constraints on used amount of memory and time
4. compare program outputs with predefined values
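The decision logic behind these four steps can be sketched as follows. This is only a minimal illustration, assuming that compilation and the sandboxed run have already happened and produced a measured result; the names `RunResult`, `Limits` and `judge`, the verdict strings and the token-wise output comparison are all hypothetical and not taken from the project.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    """Hypothetical record of one compile-and-run attempt (names are assumptions)."""
    compiled: bool    # step 1: did the compilation succeed?
    exit_code: int    # step 2: exit code of the sandboxed binary
    time_s: float     # step 3: consumed time in seconds
    memory_kib: int   # step 3: peak memory in KiB
    output: str       # step 4: captured standard output


@dataclass
class Limits:
    """Hypothetical time and memory constraints for one test."""
    time_s: float
    memory_kib: int


def judge(result, limits, expected_output):
    """Map one measured run to a verdict string, following the four steps in order."""
    if not result.compiled:
        return "compilation error"
    if result.time_s > limits.time_s:
        return "time limit exceeded"
    if result.memory_kib > limits.memory_kib:
        return "memory limit exceeded"
    if result.exit_code != 0:
        return "runtime error"
    # Assumed comparison rule: compare whitespace-separated tokens,
    # so trailing newlines and spacing differences are ignored.
    if result.output.split() != expected_output.split():
        return "wrong answer"
    return "ok"
```

A real judge would obtain the measured values from the sandbox; here they are passed in directly so that the ordering of the checks stays visible.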
The project has a great starting point -- there is an old grading system
currently used at the university (CodEx), so its flaws and weaknesses can be
addressed. Furthermore, many teachers are willing to use and test the new
system. Following requirements were collected both from our personal experience
with CodEx and from teachers' requests.
addressed. Furthermore, many teachers desire to use and test the new system and
they are willing to consult ideas or problems during development.
### Intended usage
The whole system is intended to help both teachers (supervisors) and students.
To achieve this, it is crucial to keep in mind typical usage scenarios of the
system and try to make these tasks as simple as possible.
The system has a database of users. Each user is assigned one role, which
corresponds to his/her privileges. There are user groups reflecting the structure of
lectured courses. Groups can be hierarchically ordered to reflect additional
metadata such as the academic year. For example, a reasonable group hierarchy
can look like this:
```
Summer term 2016
|-- Language C# and .NET platform
|   |-- Labs Monday 10:30
|   `-- Labs Thursday 9:00
|-- Programming I
|   |-- Labs Monday 14:00
...
```
In this example, students are members of the leaf groups, the higher level
entities are just for keeping the related groups together. The hierarchy
structure can be modified and altered to fit specific needs of the university or
any other organization, even the flat structure (i.e., no hierarchy) is
possible. One user can be part of multiple groups and on the other hand one
group can have multiple users. Each user has a specific role in every group of
which he/she is a member.
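The group hierarchy with per-group roles could be modelled roughly like this. It is a sketch under assumptions: the class name `Group`, its fields and the role strings are hypothetical, and the real system may store this data quite differently (for example in a relational database).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)  # identity comparison; avoids recursing through parent/children
class Group:
    """Hypothetical group node; names and fields are assumptions."""
    name: str
    parent: Optional["Group"] = None
    children: List["Group"] = field(default_factory=list)
    roles: Dict[str, str] = field(default_factory=dict)  # user login -> role in this group

    def add_child(self, name):
        """Create a subgroup and link it to this group."""
        child = Group(name, parent=self)
        self.children.append(child)
        return child

    def path(self):
        """Return the names from the root group down to this group."""
        parts = []
        node = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return " / ".join(reversed(parts))


term = Group("Summer term 2016")
course = term.add_child("Programming I")
lab = course.add_child("Labs Monday 14:00")
lab.roles["alice"] = "student"      # one user can hold different roles
term.roles["alice"] = "supervisor"  # in different groups
```

A flat structure is simply the case where no group has children.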
Database of exercises (algorithmic problems) is another part of the project.
Each exercise consists of a text in multiple language variants, an evaluation
configuration and a set of inputs and reference outputs. Exercises are created
by instructed privileged users. Assigning an exercise to a group means
choosing one of the available exercises and specifying additional properties. An
assignment has a deadline (optionally a second deadline), a maximum amount of
points, a configuration for calculating the final score, a maximum number of
submissions, and a list of supported runtime environments (e.g., programming
languages) including specific time and memory limits for the sandboxed tasks.
Typical use cases for supported user roles are illustrated in the following picture:
@todo: UML use case diagram (and improve or delete following paragraph)
A privileged user (supervisor) can create an exercise, assign it in his/her
group, change assignment details, view results of his/her students and manually
alter them. A normal user (student) can join a group, get a list of assigned
exercises, view assignment details, submit his/her solution and view the results
of the evaluation.
#### Exercise evaluation chain
The most important part of the system is the evaluation of solutions submitted
by students. The concept of consecutive steps from source code to final results
is described in more detail below to give readers a solid overview of what has
to happen during the evaluation process.
The first thing users have to do is to submit their solutions through some user
interface. Then, the system checks assignment invariants (deadlines, count
of submissions, ...) and stores the submitted files. The runtime environment is
automatically detected based on the input files and a suitable exercise
configuration variant is chosen (one exercise can have multiple variants, for
example for the C and Java languages). The matching exercise configuration is
then used to drive the evaluation process.
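The invariant check at submission time might look like the sketch below. The function name `may_submit`, its parameters and the returned reason strings are hypothetical; the assumption is that a submission is accepted until the last configured deadline and while the submission limit is not exhausted.

```python
from datetime import datetime


def may_submit(now, first_deadline, second_deadline, submissions_used, max_submissions):
    """Return (allowed, reason) for a new submission; all names are assumptions."""
    if submissions_used >= max_submissions:
        return False, "submission limit reached"
    # The second deadline is optional; fall back to the first one when it is missing.
    last_deadline = second_deadline if second_deadline is not None else first_deadline
    if now > last_deadline:
        return False, "deadline passed"
    return True, "ok"


first = datetime(2016, 5, 1, 23, 59)
second = datetime(2016, 5, 8, 23, 59)
verdict = may_submit(datetime(2016, 5, 3), first, second, 2, 10)
```

How points are reduced between the first and the second deadline is a separate concern and is deliberately left out here.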
There is a pool of worker computers dedicated to processing jobs. Some of them
may have different environments to allow testing programs under different
conditions. Incoming jobs are scheduled to a particular worker depending on its
capabilities and the job requirements.
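Matching a job to a worker can be illustrated with a simple first-fit rule. This is an assumed strategy, not necessarily the one the system uses; the dictionary keys (`name`, `idle`, `capabilities`) and the capability names in the example are hypothetical.

```python
def pick_worker(workers, job_requirements):
    """Return the name of the first idle worker that satisfies every requirement.

    A worker qualifies when each required key/value pair is also present in its
    capabilities. Returns None when no worker currently qualifies.
    """
    for worker in workers:
        if not worker["idle"]:
            continue
        # dict.items() views are set-like, so "<=" is a subset test on key/value pairs.
        if job_requirements.items() <= worker["capabilities"].items():
            return worker["name"]
    return None


workers = [
    {"name": "w1", "idle": False, "capabilities": {"os": "linux", "env": "c"}},
    {"name": "w2", "idle": True, "capabilities": {"os": "linux", "env": "java"}},
    {"name": "w3", "idle": True, "capabilities": {"os": "linux", "env": "c"}},
]
chosen = pick_worker(workers, {"env": "c"})
```

When `pick_worker` returns `None`, a scheduler would typically keep the job in a queue until a suitable worker becomes idle.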
Job processing itself starts with obtaining the source files and the job
configuration. The configuration is parsed into small tasks, each with a simple
piece of work. The evaluation proceeds in the order of the tasks. It is crucial
to keep the executing computer secure and stable, so an isolated sandboxed
environment is used when dealing with unknown source code. When the execution is
finished, the results are saved.
Results from a worker contain only output data from the processed tasks (this
could be a return value, consumed time, ...). On top of that, one value is
calculated to express the overall quality of the tested job. It is used as points
for the final student grading. The calculation method of this value may be
different for each assignment. Data presented back to users include an overview
of job parts (which succeeded and which failed, optionally with a reason like
"memory limit exceeded") and the achieved score (amount of awarded points).
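Since the calculation method may differ per assignment, the following is just one possible method, shown as a sketch: points proportional to the fraction of passed tests, rounded down. The function name `score` and the shape of `test_results` are assumptions.

```python
def score(test_results, max_points):
    """Award points proportionally to the number of passed tests (rounded down).

    test_results maps a test name to True (passed) or False (failed).
    """
    if not test_results:
        return 0
    passed = sum(1 for ok in test_results.values() if ok)
    return max_points * passed // len(test_results)


results = {"test1": True, "test2": True, "test3": False, "test4": True}
points = score(results, 10)
```

Other assignments could weight individual tests differently or require all tests to pass; only the interface (task results in, one number out) stays the same.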
## Requirements
There are a number of different requirements for the system. Some of them are
features which are necessary for any system for evaluation of programming coding
assignments. Some of them are specific for university deployment and some are
wishes for new features collected for period of CodEx operation.
CodEx satisfies all the basic requirements and a few more that originate from
the way courses are organized at university environment -- for example students
are divided into groups that correspond to lab groups. New wishes arose during
the ten year long lifetime of the old system. There are not many ways to improve
it from the perspective of a student, but a lot of feature requests came from
administrators and supervisors. The ideas were mostly gathered from meetings
with faculty staff involved with the current system.
necessary for any system for source code evaluation. Some of them are specific
for university deployment and some of them arose during the ten year long
lifetime of the old system. There are not many ways to improve CodEx
experience from the perspective of a student, but a lot of feature requests
came from administrators and supervisors. The ideas were gathered mostly from our
personal experience with the system and from meetings with faculty staff
involved with the current system.
For clarity, all the requirements and wishes are presented grouped by
categories.
@ -136,9 +225,9 @@ They describe the evaluation system in general and also university addons
specific properties set (deadlines, etc.)
- there is a list of submitted solutions for each assignment with corresponding
results
- teachers can specify, separately for each assignment, a scale of points which
will be awarded to students depending on the correctness of their solutions
(expressed in percentage points)
- teachers can specify, separately for each assignment, the way of computing the
grading points which will be awarded to students depending on the quality of
their solutions
- teachers can view detailed data about their students (users of their groups)
including all submitted solutions; also, each of the solution can be manually
reviewed, commented and assigned additional points (positive or negative)
@ -158,13 +247,11 @@ They describe the evaluation system in general and also university addons
mainly for viewing assigned exercises, uploading their own solutions to the
assignments, and viewing the results of the solutions after an automatic
evaluation is finished; the two wanted interfaces are web and command-line based
- administrators can manage users with support of roles (at least two --
_student_ and _supervisor_)
- user privilege separation (at least two roles -- _student_ and _supervisor_)
- logging in through a university authentication system (e.g. LDAP)
- SIS (university information system) integration for fetching personal user
data
- administrators can depend on a safe environment in which the students'
solutions will be executed
- safe environment in which the students' solutions are executed
- support for multiple programming environments at once to avoid unacceptable
workload for administrator (maintain separate installations for many courses)
and high hardware occupation
@ -263,12 +350,12 @@ Valgrind for memory leaks or array boundaries checks via `mudflap` library.
### Codility
[Codility](https://codility.com/) is a web based solution primarily targeted at
company recruiters. It is a commercial product available as a SaaS and it supports 16
programming languages. The
[Codility](https://codility.com/) is a web based solution primarily targeted at
company recruiters. It is a commercial product available as a SaaS and it
supports 16 programming languages. The
[UI](http://1.bp.blogspot.com/-_isqWtuEvvY/U8_SbkUMP-I/AAAAAAAAAL0/Hup_amNYU2s/s1600/cui.png)
of Codility is [opensource](https://github.com/Codility/cui), the rest of
source code is not available. One interesting feature is 'task timeline' --
captured progress of writing code for each user.
of Codility is [opensource](https://github.com/Codility/cui), the rest of source
code is not available. One interesting feature is 'task timeline' -- captured
progress of writing code for each user.
### CMS
@ -300,90 +387,6 @@ exercises. Kattis is primarily used by programming contest organizators, company
recruiters and also some universities.
### Intended usage
The whole system is intended to help both teachers (supervisors) and students.
To achieve this, it is crucial to keep in mind typical usage scenarios of the
system and try to make these typical tasks as simple as possible.
The system has a database of users. Each user has a role assigned, which
corresponds to his/her privileges. A user can log in via email and password
or using the university system. There are groups of users, which correspond to
the lectured courses. Groups can be hierarchically ordered to reflect additional
metadata such as the academic year. For example, a reasonable group hierarchy
can look like this:
```
Summer term 2016
|-- Language C# and .NET platform
|   |-- Labs Monday 10:30
|   `-- Labs Thursday 9:00
|-- Programming I
|   |-- Labs Monday 14:00
...
```
In this example, students are members of the leaf groups, the higher level
groups are just for keeping the related groups together. The hierarchy tree can
be modified and altered to fit specific needs of the university or any other
organization, even the flat structure (i.e., no hierarchy) is possible.
One user can be part of multiple groups and also one group can of course have
multiple users. Each user in a group has also a specific role for the given
group. A privileged user (supervisor) can assign a new exercise in his/her
group, change assignment details, view results of other users and manually
change them. A normal user (student) can join a group, get a list of assigned
exercises, view assignment detail, submit his/her solution and view the results
of the evaluation.
Database of exercises (algorithmic problems) is another part of the project.
Each exercise consists of a text in multiple language variants, an evaluation
configuration and a set of inputs and reference outputs. Exercises are created
by instructed privileged users. Assigning an exercise to a group means
choosing one of the available exercises and specifying additional properties. An
assignment has a deadline (optionally a second deadline), a maximum amount of
points, a configuration for calculating the final score, a maximum number of
submissions, and a list of supported runtime environments (e.g., programming
languages) including specific time and memory limits for the sandboxed tasks.
#### Exercise evaluation chain
The most important part of the system is the evaluation of the solutions
submitted by the users for their assigned exercises. The concept of consecutive
steps from the source code of a solution to the results is described on an
architecture with two layers -- presentation (_frontend_) and executive (_backend_).
First thing users have to do is to submit their solutions to _frontend_ which
provides interface to upload files and then submit them. It checks the
assignment invariants (deadlines, count of submissions, ...) and stores
submitted files. The runtime environment is automatically detected based on
input files and suitable exercise configuration variant is chosen (one exercise
can have multiple variants, for example C and Java languages). Matching exercise
configuration is then sent to _backend_ alongside the solution source files.
_Backend_ can have multiple engines to allow processing more jobs in parallel
and a loadbalancer, which tracks states of incoming jobs and performs scheduling
of them. The decision is made based on capabilities of each engine and also job
requirements. When a match is found, the job is held until the particular engine
is jobless and can receive an evaluation request.
Job processing itself starts with obtaining source files and job configuration.
The configuration is parsed into small tasks with simple piece of work.
Evaluation itself goes in direction of tasks ordering. It is crucial to keep
executive computer secure and stable, so isolated sandboxed environment is used
when dealing with unknown source code. When the execution is finished, results
are uploaded back to _frontend_.
The _frontend_ is immediately notified about finished job. The outcomes are
parsed and results of important tasks (comparing actual and expected results)
saved into storage. Also, points are calculated depending on solution
correctness and assignment configuration. Data presented back to users includes
overview which part succeeded and which failed (optionally with reason like
"memory limit exceeded") and amount of awarded points.
# Analysis
## ReCodEx goals
@ -1671,6 +1674,6 @@ used.
chdir: ${EVAL_DIR}
```
<!---
// vim: set formatoptions=tqan flp+=\\\|^\\*\\s* textwidth=80 colorcolumn=+1:
// vim: set formatoptions=tqn flp+=\\\|^\\*\\s* textwidth=80 colorcolumn=+1:
-->
