@ -61,22 +61,28 @@ knowledge are more suitable for this practical type of learning than others, and
fortunately, programming is one of them.

The university education system is one of the areas where this knowledge can
be applied. In computer programming, there are several requirements a program
should satisfy, such as the code being syntactically correct, efficient and
easy to read, maintain and extend.

Checking programs written by students takes a lot of time and requires a lot of
mechanical, repetitive work. The first idea of an automatic evaluation system
came from Stanford University professors in 1965. They implemented a system
which evaluated code in Algol submitted on punch cards. In the following years,
many similar products were written.

In today's world, properties like correctness and efficiency can be tested
automatically to a large extent. This fact should be exploited to help teachers
save time for tasks such as examining bad design, bad coding habits and logical
mistakes, which are difficult to perform automatically.

There are two basic ways of automatically evaluating code -- statically
(checking the source code without running it; safe, but not very precise) or
dynamically (running the code on test inputs and checking the correctness of
the outputs; provides good real-world experience, but requires extensive
security measures).

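The following minimal Python sketch makes the static approach concrete. It assumes that submissions are Python scripts. The set of forbidden modules is a hypothetical policy, not something prescribed by the system described here.

The code is only parsed into a syntax tree and never executed, which is what makes it safe. The same property makes it imprecise: an endless loop, for example, passes the check unnoticed.

```python
import ast

# Hypothetical policy: top-level modules a submission must not import.
FORBIDDEN_MODULES = {"os", "subprocess", "socket"}


def static_check(source: str) -> list[str]:
    """Analyse Python source code without executing it; return a list of findings."""
    try:
        tree = ast.parse(source)
    except SyntaxError as err:
        return [f"syntax error on line {err.lineno}: {err.msg}"]

    findings = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]  # relative imports may have no module name
        else:
            continue
        for name in names:
            if name.split(".")[0] in FORBIDDEN_MODULES:
                findings.append(f"line {node.lineno}: forbidden import '{name}'")
    return findings


# An infinite loop is syntactically fine, so static analysis alone reports nothing.
print(static_check("while True:\n    pass\n"))  # []
print(static_check("import os.path\n"))         # ["line 1: forbidden import 'os.path'"]
```

Telling a correct program from an incorrect one still requires the dynamic approach, i.e., actually running it on test inputs.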

This project focuses on the machine-controlled part of source code evaluation.
First, general concepts of grading systems are observed, new requirements are

@ -84,8 +90,8 @@ specified and project with similar functionality are examined. Also, problems of
the software previously used at Charles University in Prague are briefly
discussed. With the knowledge acquired from such projects in production, we set
up goals for the new evaluation system, designed its architecture and
implemented a fully operational solution based on dynamic evaluation. The system
is now ready for production testing at the university.

## Assignment

@ -95,13 +101,14 @@ Charles University in Prague. However, the application should be designed in a
modular fashion to be easily extended or even modified to make other ways of
usage possible.

The system should be capable of dynamic analysis of submitted source code. This
consists of the following basic steps:

1. compile the code and check for compilation errors
2. run the compiled binary in a sandbox with predefined inputs
3. check constraints on the amount of used memory and time
4. compare program outputs with predefined values
5. award the code a numeric score

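The five steps above could be sketched in Python as follows. This is a sketch under simplifying assumptions, not the project's implementation:

* Submissions are Python scripts, so "compilation" is only byte-compilation with `py_compile`.
* Limits are enforced with a wall-clock timeout and a POSIX address-space limit (`resource.RLIMIT_AS`) instead of a real sandbox.
* The score is simply proportional to the number of passed tests.

The names `TestCase` and `evaluate` are hypothetical.

```python
import py_compile
import resource
import subprocess
import sys
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class TestCase:
    stdin: str     # predefined input
    expected: str  # reference output


def _address_space_limit(limit_bytes: int):
    """Return a callable run in the child process just before it starts (POSIX only)."""
    def apply() -> None:
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        limit = limit_bytes if hard == resource.RLIM_INFINITY else min(limit_bytes, hard)
        resource.setrlimit(resource.RLIMIT_AS, (limit, limit))
    return apply


def evaluate(source: str, tests: list[TestCase], max_points: int,
             time_limit_s: float = 2.0, memory_limit_mb: int = 512) -> tuple[int, list[str]]:
    if not tests:
        raise ValueError("at least one test case is required")
    with tempfile.TemporaryDirectory() as workdir:
        program = Path(workdir) / "solution.py"
        program.write_text(source)

        # 1. Compile the code and check for compilation (here: syntax) errors.
        try:
            py_compile.compile(str(program), doraise=True)
        except py_compile.PyCompileError as err:
            return 0, [f"compilation failed: {err.msg}"]

        passed, messages = 0, []
        for i, test in enumerate(tests, start=1):
            try:
                # 2. Run the program on a predefined input (minimal isolation, NOT a sandbox).
                result = subprocess.run(
                    [sys.executable, str(program)],
                    input=test.stdin, capture_output=True, text=True, cwd=workdir,
                    timeout=time_limit_s,  # 3a. time limit (wall clock)
                    preexec_fn=_address_space_limit(memory_limit_mb * 1024 * 1024),  # 3b. memory limit
                )
            except subprocess.TimeoutExpired:
                messages.append(f"test {i}: time limit exceeded")
                continue
            if result.returncode != 0:
                # Exceeding the memory limit typically shows up here as a MemoryError crash.
                messages.append(f"test {i}: runtime error (exit code {result.returncode})")
            # 4. Compare the output with the reference output, ignoring trailing whitespace.
            elif result.stdout.rstrip() != test.expected.rstrip():
                messages.append(f"test {i}: wrong answer")
            else:
                passed += 1

        # 5. Award a numeric score proportional to the share of passed tests.
        return max_points * passed // len(tests), messages
```

Consider the tests `TestCase("1 2\n", "3\n")` and `TestCase("10 -4\n", "6\n")`:

* A program printing the sum of the two numbers gets 10 of 10 points.
* A program that always prints `3` passes only the first test and gets 5 points.

Both `preexec_fn` and `resource` are POSIX-only. These limits are no substitute for the sandbox required in step 2.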

The project has a great starting point -- there is an old grading system
currently used at the university (CodEx), so its flaws and weaknesses can be

@ -111,14 +118,14 @@ they are willing to consult ideas or problems during development with us.
### Intended usage

The whole system is intended to help both teachers (supervisors) and students.
To achieve this, it is crucial to keep in mind the typical usage scenarios of
the system and to try to make these tasks as simple as possible.

The system has a database of users. Each user is assigned a role, which
corresponds to his/her privileges. There are user groups reflecting the
structure of lectured courses. Groups can be hierarchically ordered to reflect
additional metadata such as the academic year. For example, a reasonable group
hierarchy could look like this:

```
Summer term 2016
@ -130,22 +137,22 @@ Summer term 2016
...
```

In this example, students are members of the leaf groups and the higher-level
nodes only serve to keep related groups together. The structure can be
modified to fit the specific needs of the university or any other
organization; even a flat structure is possible. One user can be a member of
multiple groups and have a different role in each of them (a student can attend
labs for several courses while also teaching one).

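The user and group model described above could be represented roughly as in this Python sketch. It rests on several assumptions:

* The role names are hypothetical.
* A role set for a specific group overrides the user's default role within that group.
* Group names other than "Summer term 2016" (e.g. "Lab 1") would be illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):  # hypothetical role names
    STUDENT = "student"
    SUPERVISOR = "supervisor"
    ADMIN = "admin"


@dataclass
class User:
    login: str
    default_role: Role  # role assigned in the user database


@dataclass(eq=False)  # compare groups by identity; also avoids recursive comparisons
class Group:
    name: str
    parent: Group | None = field(default=None, repr=False)
    children: list[Group] = field(default_factory=list)
    members: dict[str, Role] = field(default_factory=dict)  # login -> role within this group

    def add_subgroup(self, name: str) -> Group:
        child = Group(name, parent=self)
        self.children.append(child)
        return child

    def path(self) -> str:
        return self.name if self.parent is None else f"{self.parent.path()} / {self.name}"


def role_in(user: User, group: Group) -> Role:
    """A group-specific role overrides the user's default role."""
    return group.members.get(user.login, user.default_role)


def groups_of(user: User, root: Group) -> list[Group]:
    """Depth-first search for all groups (at any level) the user is a member of."""
    found = [root] if user.login in root.members else []
    for child in root.children:
        found.extend(groups_of(user, child))
    return found
```

A flat structure is simply a root whose children have no subgroups. Storing the role per group, rather than per user, is what lets one person be a student in one lab and a supervisor in another.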

A database of exercises (algorithmic problems) is another part of the project.
Each exercise consists of a text describing the problem in multiple language
variants, an evaluation configuration (machine-readable instructions on how to
evaluate solutions to the exercise) and a set of inputs and reference outputs.
Exercises are created by instructed privileged users. Assigning an exercise to a
group means choosing one of the available exercises and specifying additional
properties: a deadline (optionally a second deadline), a maximum amount of
points, a configuration for calculating the score, a maximum number of
submissions, and a list of supported runtime environments (e.g., programming
languages) including specific time and memory limits for each one.

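Here is one possible shape of the exercise and assignment records, as a Python sketch. The text fixes none of the following, so they are assumptions made for illustration:

* the field names;
* the use of plain strings for problem texts and for the evaluation configuration;
* the rule that a second deadline must follow the first one.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class TestCase:
    input_data: str
    reference_output: str


@dataclass
class Exercise:
    texts: dict[str, str]   # language code -> problem statement
    evaluation_config: str  # machine-readable evaluation instructions (format left open here)
    tests: list[TestCase]


@dataclass
class Limits:
    time_s: float
    memory_mb: int


@dataclass
class Assignment:
    exercise: Exercise
    deadline: datetime
    max_points: int
    max_submissions: int
    environments: dict[str, Limits]  # supported runtime environment -> its limits
    second_deadline: datetime | None = None
    scoring: dict[str, float] = field(default_factory=dict)  # hypothetical, e.g. per-test weights

    def __post_init__(self) -> None:
        # Assumed invariant: an optional second deadline comes after the first one.
        if self.second_deadline is not None and self.second_deadline <= self.deadline:
            raise ValueError("the second deadline must come after the first one")

    def limits_for(self, environment: str) -> Limits:
        if environment not in self.environments:
            raise ValueError(f"runtime environment {environment!r} is not supported")
        return self.environments[environment]
```

Keeping the limits inside the assignment rather than the exercise means one exercise can be reused in several groups with different time and memory budgets.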

Typical use cases for supported user roles are illustrated in the following UML

@ -161,32 +168,29 @@ is described in more detail below to give readers solid overview of what have to
happen during the evaluation process.

First, users have to submit their solutions through some user interface. Then,
the system checks assignment invariants (deadlines, number of submissions, ...)
and stores the submitted files. The runtime environment is automatically
detected based on the input files, and a suitable evaluation configuration
variant is chosen (one exercise can have multiple variants, for example for C
and Java). The chosen configuration is then used to control the evaluation
process.

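The acceptance step could look like the following Python sketch. The text only states that the environment is detected from the input files and that a matching configuration variant is chosen. The rest is assumed:

* the environment is detected from file extensions, using a hypothetical extension table;
* `deadline` is treated as the last moment a submission is accepted.

```python
from __future__ import annotations

from datetime import datetime
from pathlib import PurePath

# Hypothetical mapping from source file extensions to runtime environment names.
ENVIRONMENT_BY_EXTENSION = {".c": "c", ".h": "c", ".java": "java", ".py": "python"}


def detect_environment(filenames: list[str]) -> str:
    """Guess the runtime environment from the extensions of the submitted files."""
    detected = {ENVIRONMENT_BY_EXTENSION.get(PurePath(name).suffix.lower()) for name in filenames}
    detected.discard(None)  # unknown extensions (e.g. data files) do not decide
    if len(detected) != 1:
        raise ValueError(f"cannot determine a single runtime environment, found {sorted(detected)}")
    return detected.pop()


def accept_submission(filenames: list[str], submitted_at: datetime, deadline: datetime,
                      previous_submissions: int, max_submissions: int,
                      config_variants: dict[str, str]) -> tuple[str, str]:
    """Check assignment invariants, then pick the matching evaluation configuration variant."""
    if submitted_at > deadline:
        raise ValueError("the deadline has passed")
    if previous_submissions >= max_submissions:
        raise ValueError("the maximum number of submissions has been reached")
    environment = detect_environment(filenames)
    if environment not in config_variants:
        raise ValueError(f"the exercise has no configuration variant for {environment!r}")
    return environment, config_variants[environment]
```

Rejecting ambiguous submissions (e.g. both `.c` and `.java` files) is a design choice of this sketch. A real system might instead let the user pick the environment explicitly.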

There is a pool of worker computers dedicated to evaluation jobs. Each of them
can support different environments and programming languages to allow testing
programs on as many platforms as possible. Incoming jobs are scheduled to a
worker that is capable of running the job.

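Here is a minimal Python sketch of capability-based scheduling. Two parts are assumptions: capabilities are represented as plain string labels, and the least-loaded capable worker is preferred. The text itself only requires that a job goes to a worker capable of running it.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Worker:
    name: str
    capabilities: frozenset[str]                    # e.g. supported environments (hypothetical labels)
    queue: list[str] = field(default_factory=list)  # IDs of jobs waiting on this worker


def schedule(job_id: str, requirements: set[str], workers: list[Worker]) -> Worker:
    """Assign the job to a worker that satisfies all requirements (least loaded first)."""
    capable = [w for w in workers if requirements <= w.capabilities]
    if not capable:
        raise LookupError(f"no worker is capable of running job {job_id}")
    chosen = min(capable, key=lambda w: len(w.queue))  # ties go to the first listed worker
    chosen.queue.append(job_id)
    return chosen
```

The subset test `requirements <= w.capabilities` is the whole capability check. The load-balancing rule on top of it can be swapped without changing how capabilities are matched.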

The worker obtains the solution and its evaluation configuration, parses the
configuration and starts executing the contained instructions. It is crucial to
keep the worker computer secure and stable, so a sandboxed environment is used
for dealing with unknown source code. When the execution is finished, the
results are saved and the submitter is notified.

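This Python sketch shows how a worker might parse a configuration and execute its instructions in order. The JSON layout is entirely hypothetical: a `tasks` list whose items have `name` and `cmd` keys plus optional `timeout` and `fatal` keys. The commands run directly on the host, without the sandbox the text requires.

```python
from __future__ import annotations

import json
import subprocess
from dataclasses import dataclass


@dataclass
class TaskResult:
    name: str
    status: str            # "OK", "FAILED" or "SKIPPED"
    exit_code: int | None  # None when the task timed out or did not run
    stdout: str


def run_job(config_text: str, workdir: str) -> list[TaskResult]:
    """Parse a (hypothetical) JSON job configuration and execute its tasks in order."""
    config = json.loads(config_text)
    results: list[TaskResult] = []
    aborted = False
    for task in config["tasks"]:
        if aborted:
            results.append(TaskResult(task["name"], "SKIPPED", None, ""))
            continue
        try:
            proc = subprocess.run(task["cmd"], cwd=workdir, capture_output=True,
                                  text=True, timeout=task.get("timeout", 10))
        except subprocess.TimeoutExpired:
            results.append(TaskResult(task["name"], "FAILED", None, ""))
            aborted = task.get("fatal", True)
            continue
        ok = proc.returncode == 0
        results.append(TaskResult(task["name"], "OK" if ok else "FAILED", proc.returncode, proc.stdout))
        if not ok and task.get("fatal", True):
            aborted = True  # a failed fatal task (e.g. compilation) makes later tasks pointless
    return results
```

Marking tasks as fatal or non-fatal lets a failed compilation stop the job early. A single failed test, by contrast, can be recorded while the remaining tests still run.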
The output of the worker contains data about the evaluation, such as the time
and memory spent running the program on each test input and whether its output
was correct. The system then calculates a numeric score from this data, which is
presented to the student. If the solution is wrong (incorrect output, too much
memory used, ...), error messages are also displayed to the submitter.

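This Python sketch turns the worker's per-test data into a score and error messages. The following are assumptions for illustration:

* the record fields;
* the optional per-test weights;
* the rule that a test contributes its weight only when it stays within both limits and produces correct output.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TestOutcome:
    name: str
    time_s: float
    memory_kb: int
    output_correct: bool


def grade(outcomes: list[TestOutcome], max_points: int, time_limit_s: float,
          memory_limit_kb: int, weights: dict[str, float] | None = None) -> tuple[int, list[str]]:
    """Turn per-test data reported by a worker into a score and error messages."""
    if weights is None:
        weights = {o.name: 1.0 for o in outcomes}  # default: all tests are worth the same
    total = sum(weights[o.name] for o in outcomes)
    earned = 0.0
    messages = []
    for o in outcomes:
        if o.time_s > time_limit_s:
            messages.append(f"{o.name}: time limit exceeded")
        elif o.memory_kb > memory_limit_kb:
            messages.append(f"{o.name}: memory limit exceeded")
        elif not o.output_correct:
            messages.append(f"{o.name}: wrong output")
        else:
            earned += weights[o.name]
    return round(max_points * earned / total), messages
```

Separating the raw measurements from the scoring rule means the same worker output can be graded differently per assignment just by changing `weights` or the limits.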

## Requirements