@ -85,10 +85,10 @@ outputs ones; provides good real world experience, but requires extensive
security measures).

This project focuses on the machine-controlled part of source code evaluation.
First, general concepts of grading systems are outlined and problems of the
software previously used at Charles University in Prague are briefly discussed.
Then new requirements are specified and projects with similar functionality are
examined. With the knowledge acquired from such projects in production, we set
up goals for the new evaluation system, designed its architecture and
implemented a fully operational solution based on dynamic evaluation. The
system is now ready for production testing at the university.
@ -110,7 +110,10 @@ consists of following basic steps:
4. compare program outputs with predefined values
5. award the code with a numeric score
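
Step 4 can be implemented in several ways. Below is a minimal Python sketch that assumes a token-based comparison that ignores whitespace layout. It also allows an optional tolerance for numeric tokens. The function name `outputs_match` and the tolerance parameter are illustrative and not taken from CodEx.

```python
import math


def outputs_match(actual, expected, float_tolerance=None):
    """Compare outputs token by token, ignoring whitespace layout.

    If float_tolerance is given, numeric tokens may differ by at most that
    relative or absolute tolerance (see math.isclose).
    """
    got, want = actual.split(), expected.split()
    if len(got) != len(want):
        return False
    for g, w in zip(got, want):
        if g == w:
            continue
        if float_tolerance is None:
            return False
        try:
            if not math.isclose(float(g), float(w), rel_tol=float_tolerance,
                                abs_tol=float_tolerance):
                return False
        except ValueError:  # at least one of the tokens is not a number
            return False
    return True
```

A byte-exact comparison would be stricter. Which policy fits depends on the exercise.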

The whole system is intended to help both teachers (supervisors) and students.
To achieve this, it is crucial to keep in mind the typical usage scenarios of
the system and to make these tasks as simple as possible. Fortunately, the
project has a great starting point -- there is an old grading system currently
used at the university (CodEx), so its flaws and weaknesses can be addressed.
Furthermore, many teachers want to use and test the new system and are willing
to discuss ideas and problems with us during development.
@ -131,10 +134,6 @@ window to submit their solutions. Each solution is compiled and run in sandbox
and memory limits. It supports programs written in C, C++, C#, Java, Pascal,
Python and Haskell.

The system has a database of users. Each user is assigned a role, which
corresponds to their privileges. There are user groups reflecting the
structure of the taught courses.
@ -155,18 +154,58 @@ Typical use cases for supported user roles are following:
- **student**
  - join a group
  - get assignments in the group
  - submit solution to assignment -- upload one source file and trigger the
    evaluation process
  - view solution results -- which parts succeeded and failed, total number of
    acquired points, bonus points
- **supervisor**
  - create exercise -- create description text and evaluation configuration
    (for each programming environment), upload testing inputs and outputs
  - assign exercise to group -- choose exercise and set deadlines, number of
    allowed submissions, weights of all test cases and number of points for
    correct solutions
  - modify assignment
  - view all results in group
  - check automatic solution grading -- view submitted source and optionally
    set bonus points
- **administrator**
  - create groups
  - alter user privileges -- make supervisor accounts
  - check system logs, upgrades and other management
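
The roles and use cases above can be sketched as a plain lookup table. This Python sketch assumes that every user has exactly one role and that roles do not inherit each other's use cases. `Role`, `ALLOWED_ACTIONS`, `User` and `can` are hypothetical names chosen for illustration, not the system's actual data model.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    STUDENT = "student"
    SUPERVISOR = "supervisor"
    ADMINISTRATOR = "administrator"


# Hypothetical action identifiers mirroring the use case list, per role.
ALLOWED_ACTIONS: dict[Role, frozenset[str]] = {
    Role.STUDENT: frozenset({
        "join_group", "get_assignments", "submit_solution", "view_results",
    }),
    Role.SUPERVISOR: frozenset({
        "create_exercise", "assign_exercise", "modify_assignment",
        "view_group_results", "check_grading",
    }),
    Role.ADMINISTRATOR: frozenset({
        "create_group", "alter_privileges", "manage_system",
    }),
}


@dataclass
class User:
    name: str
    role: Role  # exactly one role per user (assumption)


def can(user: User, action: str) -> bool:
    """Return True if the user's role includes the given use case."""
    return action in ALLOWED_ACTIONS[user.role]
```

This sketch leaves one design decision open: whether roles should instead inherit use cases (e.g. an administrator also acting as a supervisor).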

### Exercise evaluation chain

The most important part of the system is the evaluation of solutions submitted
by students. The consecutive steps from source code to final results are
described in more detail below to give readers a solid overview of what has to
happen during the evaluation process.

First, users have to submit their solutions through the web user interface.
The system checks assignment invariants (deadlines, count of submissions, ...)
and stores the submitted file. The runtime environment is automatically
detected based on the input file and a suitable evaluation configuration
variant is chosen (one exercise can have multiple variants, for example for C
and Java). This exercise configuration then drives the evaluation process.
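
The submission step can be sketched as follows. This Python sketch makes two assumptions. First, the runtime environment is detected from the file extension. Second, each assignment stores its deadline, submission limit and configuration variants in a simple structure. `Assignment`, `EXTENSION_TO_RUNTIME` and `accept_submission` are hypothetical names, and the detection in CodEx may work differently.

```python
from dataclasses import dataclass
from datetime import datetime
from pathlib import PurePath


# Hypothetical runtime detection by file extension (languages named above).
EXTENSION_TO_RUNTIME = {
    ".c": "c", ".cpp": "cpp", ".cs": "csharp", ".java": "java",
    ".pas": "pascal", ".py": "python", ".hs": "haskell",
}


@dataclass
class Assignment:
    deadline: datetime
    max_submissions: int
    variants: dict[str, str]  # runtime id -> evaluation configuration name


def accept_submission(assignment, filename, submitted_at, previous_count):
    """Check assignment invariants and choose a configuration variant.

    Returns the name of the chosen variant; raises ValueError if the
    submission has to be rejected.
    """
    if submitted_at > assignment.deadline:
        raise ValueError("deadline has passed")
    if previous_count >= assignment.max_submissions:
        raise ValueError("submission limit reached")
    runtime = EXTENSION_TO_RUNTIME.get(PurePath(filename).suffix.lower())
    if runtime not in assignment.variants:
        raise ValueError("no evaluation variant for this file type")
    return assignment.variants[runtime]
```

An unknown extension and a missing variant both lead to rejection, so only submissions that can actually be evaluated are accepted.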

There is a pool of uniform worker engines dedicated to evaluation jobs.
Incoming jobs are kept in a queue until a free worker picks them up. Each
worker evaluates jobs sequentially, one at a time.
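
A queue of jobs served by a pool of uniform workers can be modelled with Python's standard `queue.Queue` and threads. In this minimal sketch, `evaluate` is a hypothetical placeholder for the real evaluation. Each thread stands in for one worker that processes jobs strictly one at a time.

```python
import queue
import threading


def evaluate(job):
    # Hypothetical placeholder for evaluating one submitted solution.
    return f"evaluated {job}"


def worker(worker_id, jobs, results):
    """One uniform worker: picks a job, evaluates it, then picks the next."""
    while True:
        job = jobs.get()      # blocks until some job is queued
        if job is None:       # sentinel value: no more jobs
            break
        results.put((worker_id, job, evaluate(job)))


def run_pool(job_ids, worker_count=3):
    jobs = queue.Queue()
    results = queue.Queue()
    threads = [threading.Thread(target=worker, args=(i, jobs, results))
               for i in range(worker_count)]
    for t in threads:
        t.start()
    for job in job_ids:
        jobs.put(job)         # jobs wait in FIFO order until a worker is free
    for _ in threads:
        jobs.put(None)        # one sentinel per worker, queued after all jobs
    for t in threads:
        t.join()
    return [results.get() for _ in range(results.qsize())]
```

Because every worker is identical, any free worker can take the next job. This scheme would have to change if workers supported different environments.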

The worker obtains the solution and its evaluation configuration, parses the
configuration and starts executing the contained instructions. It is crucial
to keep the worker computer secure and stable, so a sandboxed environment is
used for running the unknown source code. When the execution is finished,
results are saved and the submitter is notified.
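
Executing the instructions of an evaluation configuration can be sketched as running a list of commands in order, each with a time limit. This Python sketch assumes a hypothetical configuration format: a dictionary with a `tasks` list. It only enforces wall-clock time via `subprocess.run(..., timeout=...)`. It is not a sandbox and provides none of the isolation described above. `run_limited` and `execute_config` are illustrative names.

```python
import subprocess


def run_limited(cmd, stdin_data, time_limit_s):
    """Run one command with a wall-clock limit and capture its output.

    NOTE: this only limits time; it is NOT a sandbox. Untrusted code needs
    a real isolation layer (restricted filesystem, memory, processes, network).
    """
    try:
        proc = subprocess.run(cmd, input=stdin_data, capture_output=True,
                              text=True, timeout=time_limit_s)
    except subprocess.TimeoutExpired:
        return {"status": "TIME_LIMIT", "stdout": ""}
    status = "OK" if proc.returncode == 0 else "RUNTIME_ERROR"
    return {"status": status, "stdout": proc.stdout}


def execute_config(config):
    """Execute the tasks of a (hypothetical) configuration in order.

    Stops at the first task that does not finish with status OK.
    """
    results = []
    for task in config["tasks"]:
        result = run_limited(task["cmd"], task.get("stdin", ""),
                             task["time_limit"])
        results.append((task["name"], result))
        if result["status"] != "OK":
            break
    return results
```

Stopping at the first failing task mirrors a typical compile-then-run chain, where later steps make no sense if an earlier one failed.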

The output of the worker contains data about the evaluation, such as the time
and memory spent running the program on each test input and whether its
output was correct. The system then calculates a numeric score from this data,
which is presented to the student. If the solution is wrong (incorrect output,
too much memory used, ...), error messages are also displayed to the
submitter.
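
One way to turn per-test data into a numeric score is a weighted share of passed test cases. It uses the weights, points and bonus points that a supervisor sets. The formula, the field names and the `evaluate_results` helper below are assumptions for illustration, not the scoring rule used by CodEx.

```python
from dataclasses import dataclass


@dataclass
class CaseResult:
    name: str
    output_correct: bool
    time_s: float
    memory_kib: int


def passed(r, time_limit_s, memory_limit_kib):
    """A test case passes only with correct output and within both limits."""
    return (r.output_correct and r.time_s <= time_limit_s
            and r.memory_kib <= memory_limit_kib)


def evaluate_results(results, weights, max_points, time_limit_s,
                     memory_limit_kib, bonus=0):
    """Return (points, failed test names) for one evaluated submission.

    Points are the assignment's maximum scaled by the weight share of passed
    test cases, rounded to an integer, plus optional bonus points.
    """
    total = sum(weights[r.name] for r in results)
    failed = [r.name for r in results
              if not passed(r, time_limit_s, memory_limit_kib)]
    gained = total - sum(weights[name] for name in failed)
    points = round(max_points * gained / total) if total else 0
    return points + bonus, failed
```

For example, with weights 1, 1 and 2, ten points and only the weight-2 test failing, the student gets half of the points (5). The list of failed test names is what the error messages for the submitter could be built from.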

### Weaknesses

The current system is old, but robust. There were no major security incidents
during its production usage. However, from today's perspective, it has several
drawbacks. The main ones are:
@ -193,37 +232,6 @@ several drawbacks. The main ones are:
which have a more difficult evaluation chain than simple
compilation/execution/evaluation provided by CodEx.

## Requirements
@ -292,9 +300,9 @@ addons (mostly administrative features).
another tool and perform additional tests
- use of modern technologies with state-of-the-art compilers

### Non-functional requirements

Non-functional requirements are requirements of a technical character with no
direct mapping to visible parts of the system. In an ideal world, users should
not know about them if they work properly, but they would be at least annoyed
if these requirements were not met. The most notable ones are:
@ -317,14 +325,14 @@ extendable, so everyone can develop their own feature. This also means that
widely used programming languages and techniques should be used, so users can
quickly understand the code and make changes.

## Related work

To find out the current state of the field, we did a short market survey of
automatic grading systems at universities, programming contests and possibly
other places where similar tools are available.

This is not a complete list of available evaluators, but only a few projects
which are used these days and can be an inspiration for our project. Each
project from the list has a brief description and some of its key features
mentioned.