introduction and intended usage polishing

master
Teyras 8 years ago
parent fbcfe96261
commit 543794bd49

... knowledge are more suitable for this practical type of learning than others, and
fortunately, programming is one of them.
The university education system is one of the areas where this knowledge can be
applied. In computer programming, there are several requirements a program
should satisfy, such as the code being syntactically correct, efficient and easy
to read, maintain and extend.

Checking programs written by students takes time and requires a lot of
mechanical, repetitive work -- reviewing source code, compiling it and running
it through testing scenarios. It is therefore desirable to automate as much of
this work as possible. The idea of automatic evaluation dates back to 1965, when
professors at Stanford University implemented a system that evaluated Algol
code submitted on punch cards. Many similar products were written in the
following years.

Today, properties like correctness and efficiency can be tested automatically
to a large extent. This should be exploited to save teachers' time for tasks
that are difficult to automate, such as examining bad design, bad coding habits
and logical mistakes.

There are two basic ways of automatically evaluating code -- statically
(checking the source code without running it; safe, but not very precise) or
dynamically (running the code on test inputs and checking the correctness of
the outputs; provides good real-world experience, but requires extensive
security measures).

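
A minimal sketch of the dynamic approach follows. It runs a program on one test input and compares its output with the reference output. The sketch assumes the program is given as a command line and that output comparison may ignore trailing whitespace. The function `run_test` and its verdict strings are hypothetical, and only a wall-clock timeout is enforced. There is no sandboxing at all, so this is not safe for untrusted code.

```python
import subprocess
import sys


def run_test(command: list[str], test_input: str, expected: str,
             time_limit: float = 1.0) -> str:
    """Run `command` on `test_input` and compare its stdout with `expected`.

    Returns a hypothetical verdict: "OK", "WRONG_ANSWER",
    "TIME_LIMIT_EXCEEDED" or "RUNTIME_ERROR". No sandboxing is done here.
    """
    try:
        result = subprocess.run(
            command,
            input=test_input,
            capture_output=True,
            text=True,
            timeout=time_limit,  # wall-clock limit only
        )
    except subprocess.TimeoutExpired:
        return "TIME_LIMIT_EXCEEDED"
    if result.returncode != 0:
        return "RUNTIME_ERROR"
    # Compare line by line, ignoring trailing whitespace (an assumption).
    actual = [line.rstrip() for line in result.stdout.rstrip().splitlines()]
    reference = [line.rstrip() for line in expected.rstrip().splitlines()]
    return "OK" if actual == reference else "WRONG_ANSWER"


if __name__ == "__main__":
    # A tiny "student program" that sums two integers read from stdin.
    program = [sys.executable, "-c", "a, b = map(int, input().split()); print(a + b)"]
    print(run_test(program, "2 3\n", "5\n"))  # OK
    print(run_test(program, "2 3\n", "6\n"))  # WRONG_ANSWER
```

The sketch treats a non-zero exit code as a runtime error. This is a common convention, but a real evaluation system may need finer distinctions.
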
This project focuses on the machine-controlled part of source code evaluation.
First, general concepts of grading systems are surveyed, new requirements are
specified and projects with similar functionality are examined. Also, problems
of the software previously used at Charles University in Prague are briefly
discussed. Using the knowledge acquired from such projects in production, we set
goals for the new evaluation system, designed the architecture and implemented a
fully operational solution based on dynamic evaluation. The system is now ready
for production testing at the university.
## Assignment
... Charles University in Prague. However, the application should be designed in a
modular fashion to be easily extended or even modified to make other ways of
usage possible.
The system should be capable of dynamic analysis of submitted source code. This
consists of the following basic steps:

1. compile the code and check for compilation errors
2. run the compiled binary in a sandbox with predefined inputs
3. check constraints on the amount of memory and time used
4. compare program outputs with predefined values
5. award the code a numeric score
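
The list does not say how the score in step 5 is derived from the individual tests. One simple possibility is to give every test a weight and award the maximum points in proportion to the weight of the passed tests, rounding down. The sketch below follows that assumption. `TestResult`, the weights and the rounding rule are illustrative choices and are not part of the specification.

```python
from dataclasses import dataclass


@dataclass
class TestResult:
    """Outcome of one test after steps 1-4 (hypothetical shape)."""
    name: str
    passed: bool      # output matched and limits were respected
    weight: int = 1   # relative importance of the test (assumption)


def compute_score(results: list[TestResult], max_points: int) -> int:
    """Award points in proportion to the total weight of passed tests.

    Integer division rounds down, so a solution that fails any test
    never receives full points.
    """
    total = sum(r.weight for r in results)
    if total == 0:
        return 0
    passed = sum(r.weight for r in results if r.passed)
    return max_points * passed // total


if __name__ == "__main__":
    results = [
        TestResult("small", passed=True, weight=1),
        TestResult("medium", passed=True, weight=2),
        TestResult("large", passed=False, weight=3),  # e.g. time limit exceeded
    ]
    print(compute_score(results, max_points=10))  # 5
```
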
The project has a great starting point -- there is an old grading system
currently used at the university (CodEx), so its flaws and weaknesses can be
... they are willing to consult ideas or problems during development with us.

### Intended usage
The whole system is intended to help both teachers (supervisors) and students.
To achieve this, it is crucial to keep in mind the typical usage scenarios of
the system and to try to make these tasks as simple as possible.
The system has a database of users. Each user is assigned a role, which
corresponds to his/her privileges. There are user groups reflecting the
structure of lectured courses. Groups can be hierarchically ordered to reflect
additional metadata such as the academic year. For example, a reasonable group
hierarchy could look like this:
```
Summer term 2016
...
```
In this example, students are members of the leaf groups and the higher-level
nodes only serve to keep related groups together. The structure can be adapted
to the specific needs of the university or any other organization; even a flat
structure is possible. One user can be a member of multiple groups and have a
different role in each of them (for example, a student can attend labs for
several courses while also teaching one).

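
The sketch below shows one possible model of the group hierarchy and per-group roles. Each group points to its parent, and each membership stores the user's role in that particular group. A member added without an explicit role keeps their default role. The class names, role names and course or lab names are hypothetical and do not describe the project's actual data model.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class User:
    name: str
    default_role: str  # e.g. "student" or "supervisor" (hypothetical names)


@dataclass
class Group:
    name: str
    parent: Optional["Group"] = None
    # Maps user name -> role of that user within this particular group.
    members: dict[str, str] = field(default_factory=dict)

    def path(self) -> str:
        """Position of the group in the hierarchy, from the root down."""
        names = []
        group = self
        while group is not None:
            names.append(group.name)
            group = group.parent
        return " / ".join(reversed(names))

    def add_member(self, user: User, role: Optional[str] = None) -> None:
        """Add a user, optionally with a group-specific role."""
        self.members[user.name] = role or user.default_role

    def role_of(self, user: User) -> Optional[str]:
        """Role of `user` in this group, or None if not a member."""
        return self.members.get(user.name)


if __name__ == "__main__":
    term = Group("Summer term 2016")
    course = Group("Programming I", parent=term)  # hypothetical course
    labs = Group("Labs 1", parent=course)         # hypothetical leaf group
    other_labs = Group("Labs 2", parent=Group("Algorithms", parent=term))

    alice = User("alice", default_role="student")
    labs.add_member(alice)                           # attends as a student
    other_labs.add_member(alice, role="supervisor")  # teaches elsewhere

    print(labs.path())                # Summer term 2016 / Programming I / Labs 1
    print(labs.role_of(alice))        # student
    print(other_labs.role_of(alice))  # supervisor
```
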
A database of exercises (algorithmic problems) is another part of the project.
Each exercise consists of a text describing the problem in multiple language
variants, an evaluation configuration (machine-readable instructions on how to
evaluate solutions to the exercise) and a set of inputs and reference outputs.
Exercises are created by instructed privileged users. Assigning an exercise to a
group means choosing one of the available exercises and specifying additional
properties: a deadline (optionally a second deadline), a maximum number of
points, a configuration for calculating the score, a maximum number of
submissions, and a list of supported runtime environments (e.g. programming
languages) including specific time and memory limits for each one.

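
The assignment properties above map naturally onto a record type. The sketch below covers the deadlines, points, submission count and per-environment limits. The field names, environment identifiers (`c-gcc`, `java`) and units (seconds, megabytes) are assumptions. The score calculation configuration is left out for brevity.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Limits:
    time_seconds: float
    memory_mb: int


@dataclass
class Assignment:
    exercise_id: str
    deadline: datetime
    max_points: int
    max_submissions: int
    second_deadline: Optional[datetime] = None
    # Supported runtime environments with their limits (hypothetical ids).
    limits: dict[str, Limits] = field(default_factory=dict)

    def limits_for(self, environment: str) -> Limits:
        """Limits for a runtime environment; raises if it is not supported."""
        try:
            return self.limits[environment]
        except KeyError:
            raise ValueError(f"environment {environment!r} is not supported") from None


if __name__ == "__main__":
    assignment = Assignment(
        exercise_id="sum-of-two-numbers",  # hypothetical identifier
        deadline=datetime(2016, 5, 1, 23, 59),
        max_points=10,
        max_submissions=5,
        limits={
            "c-gcc": Limits(time_seconds=1.0, memory_mb=64),
            "java": Limits(time_seconds=2.0, memory_mb=256),  # looser limits
        },
    )
    print(assignment.limits_for("java"))  # Limits(time_seconds=2.0, memory_mb=256)
```
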
Typical use cases for supported user roles are illustrated in the following UML
... is described in more detail below to give readers a solid overview of what
has to happen during the evaluation process.

The first thing users have to do is to submit their solutions through some user
interface. Then, the system checks assignment invariants (deadlines, number of
submissions, ...) and stores the submitted files. The runtime environment is
automatically detected based on the submitted files and a suitable evaluation
configuration variant is chosen (one exercise can have multiple variants, for
example for C and Java). This configuration is then used to drive the
evaluation process.

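
The two checks performed on submission can be sketched as separate functions. One verifies the assignment invariants. The other infers the runtime environment from file extensions. The extension table and the environment names are assumptions made for this illustration. So is the rule that submissions are accepted until the second deadline when one is set.

```python
from datetime import datetime
from pathlib import PurePath
from typing import Optional

# Hypothetical mapping from file extensions to runtime environments.
EXTENSION_TO_ENVIRONMENT = {
    ".c": "c-gcc",
    ".cpp": "cxx-gcc",
    ".java": "java",
    ".py": "python3",
}


def check_invariants(now: datetime, deadline: datetime,
                     second_deadline: Optional[datetime],
                     submissions_so_far: int,
                     max_submissions: int) -> Optional[str]:
    """Return an error message if the submission must be rejected, else None."""
    last_deadline = second_deadline or deadline  # assumption, see lead-in
    if now > last_deadline:
        return "deadline has passed"
    if submissions_so_far >= max_submissions:
        return "maximum number of submissions reached"
    return None


def detect_environment(filenames: list[str], supported: set[str]) -> str:
    """Choose the environment implied by the file extensions.

    All recognized files must agree on a single environment, and that
    environment must be supported by the assignment.
    """
    found = {EXTENSION_TO_ENVIRONMENT.get(PurePath(n).suffix) for n in filenames}
    found.discard(None)  # ignore unrecognized files such as README.txt
    if len(found) != 1:
        raise ValueError(f"cannot determine a single environment: {sorted(found)}")
    environment = found.pop()
    if environment not in supported:
        raise ValueError(f"environment {environment!r} is not allowed here")
    return environment


if __name__ == "__main__":
    print(detect_environment(["main.c", "util.c", "README.txt"], {"c-gcc", "java"}))  # c-gcc
    print(check_invariants(datetime(2016, 5, 2), datetime(2016, 5, 1), None, 0, 5))  # deadline has passed
```
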
There is a pool of worker computers dedicated to evaluation jobs. Each of them
can support different environments and programming languages, which allows
testing programs on as many platforms as possible. Incoming jobs are scheduled
to a worker that is capable of running the job.

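
Scheduling can be reduced to comparing capabilities. The sketch below assumes each worker advertises the set of environments it supports and its current queue length. The job goes to the least loaded capable worker. The load-balancing rule and all names are hypothetical; the text only requires that the chosen worker is capable of running the job.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Worker:
    name: str
    environments: set[str]  # e.g. {"c-gcc", "java"} (hypothetical ids)
    queued_jobs: int = 0


@dataclass
class Job:
    job_id: str
    required: set[str] = field(default_factory=set)  # e.g. {"java"}


def choose_worker(job: Job, workers: list[Worker]) -> Optional[Worker]:
    """Return the least loaded worker able to run `job`, or None."""
    capable = [w for w in workers if job.required <= w.environments]
    if not capable:
        return None
    chosen = min(capable, key=lambda w: w.queued_jobs)  # ties: first in list
    chosen.queued_jobs += 1  # the job is now queued on that worker
    return chosen


if __name__ == "__main__":
    workers = [
        Worker("w1", {"c-gcc"}, queued_jobs=0),
        Worker("w2", {"c-gcc", "java"}, queued_jobs=3),
        Worker("w3", {"c-gcc", "java"}, queued_jobs=1),
    ]
    print(choose_worker(Job("42", {"java"}), workers).name)  # w3
    print(choose_worker(Job("43", {"python3"}), workers))    # None
```
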
The worker obtains the solution and its evaluation configuration, parses the
configuration and starts executing the contained instructions. It is crucial to
keep the worker computer secure and stable, so a sandboxed environment is used
for dealing with unknown source code. When the execution is finished, the
results are saved and the submitter is notified.

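
The format of the evaluation configuration is not specified here. Purely as an illustration, the sketch below assumes a JSON document containing a list of instructions. Each instruction has an `id`, a `type` and optional `args`. The sketch executes them in order through caller-supplied handlers and stops at the first failure. The format, the instruction types and the stop-on-failure policy are hypothetical. The handlers in the usage example are stand-ins that execute nothing.

```python
import json
from typing import Callable

# A handler receives the instruction's arguments and returns True on success.
Handler = Callable[[dict], bool]


def run_configuration(config_text: str,
                      handlers: dict[str, Handler]) -> list[tuple[str, bool]]:
    """Parse a (hypothetical) JSON configuration and run its instructions.

    Instructions run in the order listed; execution stops after the first
    failure. Returns (instruction id, success) pairs for executed steps.
    """
    config = json.loads(config_text)
    results = []
    for instruction in config["instructions"]:
        handler = handlers[instruction["type"]]  # KeyError for unknown types
        ok = handler(instruction.get("args", {}))
        results.append((instruction["id"], ok))
        if not ok:
            break
    return results


if __name__ == "__main__":
    config = """
    {"instructions": [
        {"id": "compile", "type": "exec", "args": {"cmd": "gcc main.c"}},
        {"id": "test-1", "type": "exec", "args": {"cmd": "./a.out"}},
        {"id": "judge-1", "type": "compare", "args": {"file": "out1.txt"}}
    ]}
    """
    # Stand-in handlers; a real worker would execute commands in a sandbox.
    fake_handlers = {
        "exec": lambda args: args["cmd"] != "./a.out",  # pretend the test crashes
        "compare": lambda args: True,
    }
    print(run_configuration(config, fake_handlers))
    # [('compile', True), ('test-1', False)]
```

Stopping early reflects the fact that, for example, there is no point in running tests when compilation fails.
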
The output of the worker contains data about the evaluation, such as the time
and memory spent on running the program for each test input and whether its
output was correct. The system then calculates a numeric score from this data,
which is presented to the student. If the solution is wrong (incorrect output,
excessive memory usage, ...), error messages are also displayed to the
submitter.

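
The error messages mentioned above could be derived from the per-test data roughly as follows. The sketch assumes the worker reports, for each test, the time in seconds, the peak memory in megabytes and whether the output was correct. The `TestOutcome` fields and the message texts are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TestOutcome:
    """Per-test data reported by a worker (hypothetical shape)."""
    name: str
    time_seconds: float
    memory_mb: float
    output_correct: bool


def feedback(outcomes: list[TestOutcome], time_limit: float,
             memory_limit: float) -> list[str]:
    """Return one human-readable message per failed test."""
    messages = []
    for o in outcomes:
        if o.time_seconds > time_limit:
            messages.append(f"{o.name}: time limit exceeded")
        elif o.memory_mb > memory_limit:
            messages.append(f"{o.name}: memory limit exceeded")
        elif not o.output_correct:
            messages.append(f"{o.name}: incorrect output")
    return messages


if __name__ == "__main__":
    outcomes = [
        TestOutcome("test-1", 0.2, 10.0, True),
        TestOutcome("test-2", 0.3, 90.0, True),
        TestOutcome("test-3", 0.1, 12.0, False),
    ]
    for message in feedback(outcomes, time_limit=1.0, memory_limit=64.0):
        print(message)
    # test-2: memory limit exceeded
    # test-3: incorrect output
```

The order of the checks is a design choice: a limit violation is reported in preference to an incorrect output.
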
## Requirements
