|
|
|
@@ -30,7 +30,7 @@ mistakes, which are difficult to perform automatically.
|
|
|
|
|
There are two basic ways of automatically evaluating code:
|
|
|
|
|
|
|
|
|
|
- **statically** -- by checking the source code without running it.
|
|
|
|
|
This is safe, but not very precical.
|
|
|
|
|
This is safe, but not very practical.
|
|
|
|
|
- **dynamically** -- by running the code on test inputs and checking the correctness of
|
|
|
|
|
the outputs. This provides good real-world experience, but requires extensive
|
|
|
|
|
security measures.
|
|
|
|
@@ -127,86 +127,86 @@ The typical use cases for the user roles are the following:
|
|
|
|
|
|
|
|
|
|
### Exercise Evaluation Chain
|
|
|
|
|
|
|
|
|
|
The most important part of the system is evaluation of solutions submitted by
|
|
|
|
|
students. The process leading from source code to final results (score) is
|
|
|
|
|
described in more detail below to give readers a solid overview of what happens
|
|
|
|
|
The most important part of the system is the evaluation of solutions submitted by
|
|
|
|
|
the students. The process leading from the source code to the final results (score) is
|
|
|
|
|
described in more detail below to give readers a solid overview of what is happening
|
|
|
|
|
during the evaluation process.
|
|
|
|
|
|
|
|
|
|
First thing students have to do is to submit their solutions through web user
|
|
|
|
|
interface. The system checks assignment invariants (deadlines, count of
|
|
|
|
|
submissions, ...) and stores the submitted code. The runtime environment is
|
|
|
|
|
automatically detected based on input file extension and a suitable evaluation
|
|
|
|
|
configuration variant is chosen (one exercise can have multiple variants, for
|
|
|
|
|
example C and Java languages). This exercise configuration is then used for
|
|
|
|
|
taking care of evaluation process.
|
|
|
|
|
The first thing students have to do is to submit their solutions through the web user
|
|
|
|
|
interface. The system checks assignment invariants (e.g., deadlines, number of
|
|
|
|
|
submissions) and stores the submitted code. The runtime environment is
|
|
|
|
|
automatically detected based on the extension of the input file, and a suitable evaluation
|
|
|
|
|
configuration type is chosen (one exercise can have multiple variants, for
|
|
|
|
|
example both C and Java may be allowed). This exercise configuration is then used for
|
|
|
|
|
the evaluation process.
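
The detection step described above can be sketched as follows. This is only an illustration under stated assumptions: the extension table, the runtime names and the functions `detect_runtime` and `choose_variant` are hypothetical and are not taken from CodEx or its successor.

```python
from pathlib import PurePath

# Hypothetical mapping from file extension to runtime environment name.
EXTENSION_TO_RUNTIME = {
    ".c": "c",
    ".java": "java",
}


def detect_runtime(filename: str) -> str:
    """Guess the runtime environment from the extension of the submitted file."""
    extension = PurePath(filename).suffix.lower()
    try:
        return EXTENSION_TO_RUNTIME[extension]
    except KeyError:
        raise ValueError(f"no runtime environment for extension {extension!r}") from None


def choose_variant(filename: str, variants: dict) -> object:
    """Pick the configuration variant of an exercise that matches the submission.

    `variants` maps a runtime name to the configuration for that runtime.
    """
    runtime = detect_runtime(filename)
    if runtime not in variants:
        raise ValueError(f"exercise has no variant for runtime {runtime!r}")
    return variants[runtime]
```

An exercise that allows both C and Java would pass a `variants` dictionary with two keys; a submission with an unknown extension, or one for a language the exercise does not allow, is rejected before any evaluation starts.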
|
|
|
|
|
|
|
|
|
|
There is a pool of uniform worker machines dedicated to evaluation jobs.
|
|
|
|
|
Incoming jobs are kept in a queue until a free worker picks them. Workers are
|
|
|
|
|
capable of sequential evaluation of jobs, one at a time.
|
|
|
|
|
capable of a sequential evaluation of jobs, one at a time.
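
A minimal sketch of such a queue with a pool of workers is shown below. It uses threads inside one process purely for illustration; the real workers are separate machines, and the names `run_workers` and `evaluate` are assumptions made for this example.

```python
import queue
import threading


def run_workers(jobs, evaluate, worker_count=2):
    """Evaluate all jobs with a fixed pool of workers.

    Jobs wait in a shared queue; each worker takes one job at a time and
    processes it sequentially. Jobs must be hashable, since they are used
    as keys of the returned dictionary.
    """
    job_queue = queue.Queue()
    for job in jobs:
        job_queue.put(job)

    results = {}
    results_lock = threading.Lock()

    def worker():
        while True:
            try:
                job = job_queue.get_nowait()
            except queue.Empty:
                return  # no more jobs, the worker becomes idle
            outcome = evaluate(job)
            with results_lock:
                results[job] = outcome

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Because every worker handles a single job at a time, the throughput of the system grows with the number of workers, while a single job never competes for resources with another job on the same worker.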
|
|
|
|
|
|
|
|
|
|
The worker obtains the solution and its evaluation configuration, parses it and
|
|
|
|
|
starts executing the contained instructions. Each job should have more testing
|
|
|
|
|
cases, which examine wrong inputs, corner values and data of different sizes to
|
|
|
|
|
guess the program complexity. It is crucial to keep the worker computer secure
|
|
|
|
|
and stable, so a sandboxed environment is used for dealing with unknown source
|
|
|
|
|
code. When the execution is finished, results are saved and the submitter is
|
|
|
|
|
notified.
|
|
|
|
|
starts executing the instructions it contains. Each job should have several test
|
|
|
|
|
cases which examine invalid inputs, corner cases and data of different sizes to
|
|
|
|
|
estimate the program complexity. It is crucial to keep the computer running the worker
|
|
|
|
|
secure and stable, so a sandboxed environment is used for dealing with
|
|
|
|
|
unknown source code. When the execution is finished, results are saved, and the
|
|
|
|
|
student is notified.
|
|
|
|
|
|
|
|
|
|
The output of the worker contains data about the evaluation, such as time and
|
|
|
|
|
memory spent on running the program for each test input and whether its output
|
|
|
|
|
was correct. The system then calculates a numeric score from this data, which is
|
|
|
|
|
presented to the student. If the solution is wrong (incorrect output, uses too
|
|
|
|
|
much memory,..), error messages are also displayed to the submitter.
|
|
|
|
|
is correct. The system then calculates a numeric score from the data which is
|
|
|
|
|
presented to the student. If the solution is incorrect (e.g., incorrect output,
|
|
|
|
|
exceeds memory or time limits), error messages are also displayed to the student.
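
The text does not specify how the numeric score is computed, so the following sketch assumes the simplest possible rule: the score is the fraction of test cases that produced correct output within the limits. The `CaseResult` record and both functions are hypothetical names used only for this example.

```python
from dataclasses import dataclass


@dataclass
class CaseResult:
    """Data reported by the worker for one test input (assumed shape)."""
    name: str
    output_correct: bool
    time_s: float
    memory_kib: int


def problems(result, time_limit_s, memory_limit_kib):
    """Return the list of reasons why a single test case failed."""
    found = []
    if not result.output_correct:
        found.append("incorrect output")
    if result.time_s > time_limit_s:
        found.append("time limit exceeded")
    if result.memory_kib > memory_limit_kib:
        found.append("memory limit exceeded")
    return found


def score(results, time_limit_s, memory_limit_kib):
    """Assumed scoring rule: fraction of test cases without any problem."""
    if not results:
        return 0.0
    passed = sum(1 for r in results if not problems(r, time_limit_s, memory_limit_kib))
    return passed / len(results)


def error_messages(results, time_limit_s, memory_limit_kib):
    """Messages shown to the student for every failed test case."""
    return [
        f"{r.name}: {problem}"
        for r in results
        for problem in problems(r, time_limit_s, memory_limit_kib)
    ]
```

A real grading system would likely let teachers replace the scoring rule, for example by weighting test cases differently.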
|
|
|
|
|
|
|
|
|
|
### Possible Improvements
|
|
|
|
|
|
|
|
|
|
Current system is old, but robust. There were no major security incidents
|
|
|
|
|
during its production usage. However, from the perspective of today there are
|
|
|
|
|
several drawbacks. The main ones are:
|
|
|
|
|
The current system is old, but robust. There were no major security incidents
|
|
|
|
|
in the course of its usage. However, from the present-day perspective, there are
|
|
|
|
|
several major drawbacks:
|
|
|
|
|
|
|
|
|
|
- **web interface** -- The web interface is simple and fully functional.
|
|
|
|
|
However, recent rapid development in web technologies opens new horizons of
|
|
|
|
|
how web interfaces can be made.
|
|
|
|
|
- **web API** -- CodEx offers a very limited XML API based on outdated
|
|
|
|
|
technologies that is not sufficient for users who would like to create custom
|
|
|
|
|
interfaces such as a command line tool or mobile application.
|
|
|
|
|
- **sandboxing** -- MO-Eval sandbox is based on principle of monitoring system
|
|
|
|
|
calls and blocking the bad ones. This can be easily done for single-threaded
|
|
|
|
|
applications, but proves difficult with multi-threaded ones. In present day,
|
|
|
|
|
parallelism is a very important area of computing, so there is requirement to
|
|
|
|
|
test multi-threaded applications as well.
|
|
|
|
|
- **instances** -- Different ways of CodEx usage scenarios requires separate
|
|
|
|
|
installations (Programming I and II, Java, C#, etc.). This configuration is
|
|
|
|
|
not user friendly (students have to register in each installation separately)
|
|
|
|
|
and burdens administrators with unnecessary work. CodEx architecture does not
|
|
|
|
|
allow sharing workers between installations, which results in an inefficient
|
|
|
|
|
However, the recent rapid development in web technologies provides us with new
|
|
|
|
|
possibilities of making web interfaces.
|
|
|
|
|
- **public API** -- CodEx offers a very limited public XML API based on outdated
|
|
|
|
|
technologies that are not sufficient for users who would like to create their
|
|
|
|
|
custom interfaces such as a command line tool or a mobile application.
|
|
|
|
|
- **sandboxing** -- the MO-Eval sandbox is based on the principle of monitoring
|
|
|
|
|
system calls and blocking the forbidden ones. This can be sufficient with
|
|
|
|
|
single-threaded programs, but proves to be difficult with multi-threaded ones.
|
|
|
|
|
Nowadays, parallelism is a very important area of computing, so it is required that
|
|
|
|
|
multi-threaded programs can be securely tested as well.
|
|
|
|
|
- **instances** -- Different ways of CodEx use require separate
|
|
|
|
|
installations (e.g., Programming I and II, Java, C#). This configuration is
|
|
|
|
|
not user-friendly, as students have to register in each installation separately,
|
|
|
|
|
and burdens administrators with unnecessary work. The CodEx architecture does not
|
|
|
|
|
allow sharing workers between installations, which results in an inefficient
|
|
|
|
|
use of hardware for evaluation.
|
|
|
|
|
- **task extensibility** -- There is a need to test and evaluate complicated
|
|
|
|
|
programs for classes such as Parallel programming or Compiler principles,
|
|
|
|
|
programs for courses such as *Parallel programming* or *Compiler principles*,
|
|
|
|
|
which have a more difficult evaluation chain than simple
|
|
|
|
|
compilation/execution/evaluation provided by CodEx.
|
|
|
|
|
*compilation/execution/evaluation* provided by CodEx.
|
|
|
|
|
|
|
|
|
|
## Requirements
|
|
|
|
|
|
|
|
|
|
There are many different formal requirements for the system. Some of them
|
|
|
|
|
are necessary for any system for source code evaluation, some of them are
|
|
|
|
|
specific for university deployment and some of them arose during the ten year
|
|
|
|
|
long lifetime of the old system. There are not many ways to improve CodEx
|
|
|
|
|
long lifetime of the old system. There are not many ways of improving CodEx
|
|
|
|
|
experience from the perspective of a student, but a lot of feature requests come
|
|
|
|
|
from administrators and supervisors. The ideas were gathered mostly from our
|
|
|
|
|
personal experience with the system and from meetings with faculty staff
|
|
|
|
|
involved with the current system.
|
|
|
|
|
personal experience with the system and from meetings with the faculty staff
|
|
|
|
|
who use the current system.
|
|
|
|
|
|
|
|
|
|
In general, CodEx features should be preserved, so only differences are
|
|
|
|
|
presented here. For clear arrangement all the requirements and wishes are
|
|
|
|
|
presented grouped by categories.
|
|
|
|
|
In general, CodEx features should be preserved, so only the differences are
|
|
|
|
|
presented here. For clarity, all the requirements and wishes are
|
|
|
|
|
presented in groups by the user categories.
|
|
|
|
|
|
|
|
|
|
### Requirements of the Users
|
|
|
|
|
|
|
|
|
|
- _group hierarchy_ -- creating an arbitrarily nested tree structure should be
|
|
|
|
|
supported to allow keeping related groups together, such as in the example
|
|
|
|
|
supported to keep related groups together, such as in the example
|
|
|
|
|
below. CodEx supported only a flat group structure. A group hierarchy also
|
|
|
|
|
allows archiving data from past courses.
|
|
|
|
|
makes it possible to archive data from past courses.
|
|
|
|
|
|
|
|
|
|
```
|
|
|
|
|
Summer term 2016
|
|
|
|
@@ -218,26 +218,28 @@ presented grouped by categories.
|
|
|
|
|
...
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
- _a database of exercises_ -- teachers should be able to filter viewed
|
|
|
|
|
exercises according to several criteria, for example supported runtime
|
|
|
|
|
environment or author. It should also be possible to link exercises to a group
|
|
|
|
|
so that groups supervisors do not have to browse hundreds of exercises when
|
|
|
|
|
their group only uses five of them
|
|
|
|
|
- _advanced exercises_ -- the system should support more advanced evaluation
|
|
|
|
|
pipeline than basic compilation/execution/evaluation which is in CodEx
|
|
|
|
|
- _a database of exercises_ -- teachers should be able to filter the displayed
|
|
|
|
|
exercises according to several criteria, for example by the supported runtime
|
|
|
|
|
environments or by the author. It should also be possible to link exercises to a group
|
|
|
|
|
so that group supervisors do not have to browse hundreds of exercises when
|
|
|
|
|
their group only uses a few of them
|
|
|
|
|
- _advanced exercises_ -- the system should support a more advanced evaluation
|
|
|
|
|
pipeline than the basic *compilation/execution/evaluation* one available in CodEx
|
|
|
|
|
- _customizable grading system_ -- teachers need to specify the way of
|
|
|
|
|
computation of the final score, which will be awarded to the submissions of
|
|
|
|
|
the student depending on their quality
|
|
|
|
|
- _marking a solution as accepted_ -- the system should allow marking one
|
|
|
|
|
particular solution as accepted (used for grading the assignment) by the
|
|
|
|
|
supervisor
|
|
|
|
|
- _solution resubmission_ -- teachers should be able edit the solutions of the
|
|
|
|
|
calculating the final score which will be allocated to the submissions
|
|
|
|
|
depending on their correctness and quality
|
|
|
|
|
- _marking a solution as accepted_ -- a supervisor should be able to choose
|
|
|
|
|
one of the submitted solutions of a student as accepted. The score of this
|
|
|
|
|
particular solution will be used as the score which the student receives
|
|
|
|
|
for the given assignment instead of the one with the highest score.
|
|
|
|
|
- _solution resubmission_ -- teachers should be able to edit the solutions of the
|
|
|
|
|
student and privately resubmit them, optionally saving all results (including
|
|
|
|
|
temporary ones); this feature can be used to quickly fix obvious errors in the
|
|
|
|
|
solution and see if it is otherwise viable
|
|
|
|
|
- _localization_ -- all texts (UI and exercises) should be translatable
|
|
|
|
|
- _formatted exercise texts_ -- Markdown or another lightweight markup language
|
|
|
|
|
should be supported for formatting exercise texts
|
|
|
|
|
solution and see if it is otherwise correct
|
|
|
|
|
- _localization_ -- all texts (the UI and the assignments of the exercises) should
|
|
|
|
|
be translatable into several languages
|
|
|
|
|
- _formatted texts of assignments_ -- Markdown or another lightweight markup language
|
|
|
|
|
should be supported for formatting the texts of the exercises
|
|
|
|
|
- _comments_ -- adding both private and public comments to exercises, tests and
|
|
|
|
|
solutions should be supported
|
|
|
|
|
- _plagiarism detection_
|
|
|
|
|