There are a lot of things that deserve discussion concerning the results of
evaluation: how they should be displayed, what should or should not be visible,
and also what kind of reward for users' solutions should be chosen.

At first, let us focus on all kinds of outputs from programs executed within a
job. It is beyond discussion that supervisors should be able to view almost all
outputs from solutions if they choose them to be visible and recorded. This
feature is critical for debugging both whole exercises and users' solutions. But
should it
interesting. The simple answer, which is partly true, is of course that they
should not see anything. Outputs from their programs can be anything, and users
can analyse the inputs or even redirect them to the output. Therefore, outputs
from execution should not be visible at all, or only under very special
circumstances. That is not so straightforward for compilation or other kinds of
initiation, where it really depends on the particular case. Generally, though,
it is quite harmless to show users a compilation error, which can help a lot
during troubleshooting. Of course, this kind of functionality should again be
configurable by supervisors and disabled by default. The last kind of tasks that
can output some information are evaluation tasks. Their output is important to
the whole system and can again contain information about inputs or reference
outputs, so outputs of evaluation tasks should not be visible to regular users
either.
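The visibility rules for regular users can be summarised as a small policy
function. This is only a sketch under assumptions: the names `TaskKind`,
`VisibilityConfig`, `show_initiation_output` and `visible_to_user` are
hypothetical, and the "very special circumstances" under which execution output
might be shown are deliberately left out.

```python
from dataclasses import dataclass
from enum import Enum


class TaskKind(Enum):
    """Hypothetical classification of tasks inside an evaluation job."""
    INITIATION = "initiation"  # e.g. compilation
    EXECUTION = "execution"    # running the user's program on test inputs
    EVALUATION = "evaluation"  # judges comparing outputs with reference data


@dataclass(frozen=True)
class VisibilityConfig:
    """Hypothetical per-exercise settings chosen by a supervisor."""
    # Showing compilation errors and similar output is opt-in (off by default).
    show_initiation_output: bool = False


def visible_to_user(kind: TaskKind, config: VisibilityConfig) -> bool:
    """Decide whether a regular user may see the output of a task of this kind."""
    if kind is TaskKind.INITIATION:
        return config.show_initiation_output
    # Execution output may leak test inputs and evaluation output may leak
    # inputs or reference outputs, so neither is shown to regular users here.
    return False
```

Keeping the supervisor's choice in a separate configuration object means the
default (everything hidden) holds unless a supervisor explicitly opts in.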

The overall concept of grading solutions was presented earlier. To recap
briefly, the backend returns only exact measured values (used time and memory,
return code of the judging task, ...), and on top of them a single value is
computed. The way of this computation can differ greatly between supervisors, so
it has to be easily extensible. The best way is to provide an interface which
can be implemented so that any sort of magic can return the final value.
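A minimal sketch of such an interface, assuming the backend's measured values
arrive as one record per test and that a scheme returns a fraction of the
maximum score. The names `MeasuredResult`, `ScoreCalculator`, `compute` and
`AllOrNothing` are hypothetical and not part of the described system.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class MeasuredResult:
    """Hypothetical record of the raw values the backend reports for one test."""
    test_name: str
    passed: bool       # outcome of the judging task
    time_used: float   # seconds
    memory_used: int   # kilobytes
    exit_code: int     # return code of the judging task


class ScoreCalculator(ABC):
    """Hypothetical interface: each grading scheme turns raw results into one value."""

    @abstractmethod
    def compute(self, results: Sequence[MeasuredResult]) -> float:
        """Return the final value as a fraction of the maximum, in [0, 1]."""


class AllOrNothing(ScoreCalculator):
    """Deliberately trivial scheme: full score only if every test passed."""

    def compute(self, results: Sequence[MeasuredResult]) -> float:
        return 1.0 if results and all(r.passed for r in results) else 0.0
```

Further schemes would be additional subclasses chosen per assignment; the rest
of the system only needs to call `compute`.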

We found several computational possibilities: the basic arithmetic, weighted
arithmetic, geometric and harmonic mean of the results of individual tests (each
result is a boolean succeeded/failed, optionally with a weight), some kind of
interpolation of the time used by each test, the same with the amount of memory
used, and surely many others. To keep the project simple, we decided to design
an appropriate interface and implement only the weighted arithmetic mean, which
is used in about 90% of all assignments. Of course, a different scheme can be
chosen for every assignment and also configured; for example, test weights can
be specified for the implemented weighted arithmetic mean. More advanced ways of
computation can be implemented when there is a real demand for them.
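The weighted arithmetic mean can be sketched as follows, assuming each test
reports only passed/failed and carries a supervisor-specified weight. The score
is the sum of the weights of passed tests divided by the sum of all weights. The
names `WeightedTest`, `weighted_arithmetic_mean` and `to_points`, as well as
scaling to a maximum number of points, are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class WeightedTest:
    """Hypothetical per-test input: pass/fail outcome plus its weight."""
    name: str
    passed: bool
    weight: float = 1.0


def weighted_arithmetic_mean(tests: Sequence[WeightedTest]) -> float:
    """Return sum(weight of passed tests) / sum(all weights), a value in [0, 1]."""
    total_weight = sum(t.weight for t in tests)
    if total_weight <= 0:
        # No positively weighted tests: nothing can be earned.
        return 0.0
    earned = sum(t.weight for t in tests if t.passed)
    return earned / total_weight


def to_points(score: float, max_points: float) -> float:
    """Scale the fraction to the assignment's (hypothetical) maximum points."""
    return score * max_points
```

For weights 1, 1 and 2 where only the first two tests pass, the score is
2 / 4 = 0.5, i.e. 5 points out of 10. With equal weights the scheme reduces to
the basic arithmetic mean, the fraction of passed tests.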

To avoid assigning points to insufficient solutions (like one that only prints
"File error", which happens to be the valid answer in two tests), a minimal
point threshold can be specified. If the solution would get fewer points than
specified, it gets zero points instead. This functionality could be embedded
into the grading computation algorithm itself, but then it would have to be
present in each implementation separately, which is a bit ugly. So this feature
is separated from the point computation.
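Separating the threshold from the computation can be sketched as a wrapper that
applies to any grading scheme. This assumes a scheme is simply a function from
per-test outcomes to points; `with_min_threshold` and
`fraction_passed_times_ten` are hypothetical names used only for illustration.

```python
from typing import Callable, Sequence

# Hypothetical type: any grading scheme mapping per-test outcomes to points.
Calculator = Callable[[Sequence[bool]], float]


def with_min_threshold(calculator: Calculator, threshold: float) -> Calculator:
    """Wrap a grading scheme so that results below `threshold` become zero.

    The check lives here, outside the scheme, so individual implementations
    do not have to repeat it.
    """
    def wrapped(outcomes: Sequence[bool]) -> float:
        points = calculator(outcomes)
        return points if points >= threshold else 0.0
    return wrapped


def fraction_passed_times_ten(outcomes: Sequence[bool]) -> float:
    """Example scheme: 10 points scaled by the fraction of passed tests."""
    return 10 * sum(outcomes) / len(outcomes) if outcomes else 0.0
```

With ten tests and a threshold of 5 points, a solution that passes only the two
"File error" tests would earn 2 points and is therefore cut to 0, while one
passing seven tests keeps its 7 points.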

Automatic grading cannot reflect all aspects of the submitted code, for example
its structure or the number and quality of comments. To allow supervisors to
bring these manually checked aspects into grading, there is a concept of bonus
points. They can be positive or negative. Generally, the solution with the most
assigned points is marked as the one used for grading. However, if a supervisor
is not satisfied with a student's solution (really bad code, cheating, ...),
he/she assigns the student negative bonus points. But to prevent the system from
choosing another solution with more points, or students from submitting the same
code again to get more points, the supervisor can explicitly mark a particular
solution as the one used for grading instead of the solution with the