diff --git a/Rewritten-docs.md b/Rewritten-docs.md
index 9e809cd..70ed342 100644
--- a/Rewritten-docs.md
+++ b/Rewritten-docs.md
@@ -85,10 +85,10 @@ outputs ones; provides good real world experience, but requires
 extensive security measures). This project focuses on the machine-controlled
 part of source code evaluation.
 
-First, general concepts of grading systems are observed, new requirements are
-specified and project with similar functionality are examined. Also, problems of
-the software previously used at Charles University in Prague are briefly
-discussed. With acquired knowledge from such projects in production, we set up
+First, general concepts of grading systems are observed and problems of the
+software previously used at Charles University in Prague are briefly discussed.
+Then new requirements are specified and projects with similar functionality are
+examined. With acquired knowledge from such projects in production, we set up
 goals for the new evaluation system, designed the architecture and implemented
 a fully operational solution based on dynamic evaluation. The system is now
 ready for production testing at the university.
@@ -110,7 +110,10 @@ consists of following basic steps:
 4. compare program outputs with predefined values
 5. award the code with a numeric score
 
-The project has a great starting point -- there is an old grading system
+The whole system is intended to help both teachers (supervisors) and students.
+To achieve this, it is crucial to keep in mind the typical usage scenarios of
+the system and to try to make these tasks as simple as possible. To fulfil this
+task, the project has a great starting point -- there is an old grading system
 currently used at the university (CodEx), so its flaws and weaknesses can be
 addressed. Furthermore, many teachers desire to use and test the new system and
 they are willing to consult ideas or problems during development with us.
@@ -131,10 +134,6 @@ window to submit their solutions. Each solution is compiled and run in sandbox
 and memory limits. It supports programs written in C, C++, C#, Java, Pascal,
 Python and Haskell.
 
-The whole system is intended to help both teachers (supervisors) and students.
-To achieve this, it is crucial to keep in mind the typical usage scenarios of
-the system and to try to make these tasks as simple as possible.
-
 The system has a database of users. Each user is assigned a role, which
 corresponds to his/her privileges. There are user groups reflecting the
 structure of lectured courses.
@@ -155,18 +154,58 @@ Typical use cases for supported user roles are following:
 - **student**
   - join a group
   - get assignments in group
-  - submit solution to assignment
-  - view solution results
+  - submit solution to assignment -- upload one source file and trigger
+    evaluation process
+  - view solution results -- which parts succeeded and failed, total number of
+    acquired points, bonus points
 - **supervisor**
-  - create exercise
-  - assign exercise to group, modify assignment
+  - create exercise -- create description text and evaluation configuration
+    (for each programming environment), upload testing inputs and outputs
+  - assign exercise to group -- choose exercise and set deadlines, number of
+    allowed submissions, weights of all testing cases and number of points for
+    correct solutions
+  - modify assignment
   - view all results in group
-  - alter automatic solution grading
+  - check automatic solution grading -- view submitted source and optionally
+    set bonus points
 - **administrator**
   - create groups
-  - alter user privileges
+  - alter user privileges -- make supervisor accounts
   - check system logs, upgrades and other management
 
+### Exercise evaluation chain
+
+The most important part of the system is the evaluation of solutions submitted
+by students. The sequence of steps that leads from submitted source code to
+final results is described in more detail below to give the reader a solid
+overview of what has to happen during the evaluation process.
+
+The first thing users have to do is to submit their solutions through the web
+user interface. The system checks assignment invariants (deadlines, number of
+submissions, ...) and stores the submitted file. The runtime environment is
+detected automatically from the input file and a suitable evaluation
+configuration variant is chosen (one exercise can have multiple variants, for
+example for the C and Java languages). This exercise configuration then drives
+the whole evaluation process.
+
+There is a pool of uniform worker engines dedicated to evaluation jobs.
+Incoming jobs are kept in a queue until a free worker picks them up. A worker
+processes jobs sequentially, one at a time.
+
+The worker obtains the solution and its evaluation configuration, parses the
+configuration and starts executing the contained instructions. It is crucial
+to keep the worker computer secure and stable, so a sandboxed environment is
+used for dealing with unknown source code. When the execution is finished, the
+results are saved and the submitter is notified.
+
+The output of the worker contains data about the evaluation, such as the time
+and memory spent on running the program for each test input and whether its
+output was correct. The system then calculates a numeric score from this data,
+which is presented to the student. If the solution is wrong (incorrect output,
+uses too much memory, ...), error messages are also displayed to the submitter.
+
+### Weaknesses
+
 Current system is old, but robust. There were no major security incidents
 during its production usage. However, from today's perspective there are
 several drawbacks. The main ones are:
@@ -193,37 +232,6 @@
   which have a more difficult evaluation chain than simple
   compilation/execution/evaluation provided by CodEx.
 
-### Exercise evaluation chain
-
-The most important part of the system is evaluation of solutions submitted by
-students. Concepts of consecutive steps from source code to final results
-is described in more detail below to give readers solid overview of what have to
-happen during evaluation process.
-
-First thing users have to do is to submit their solutions through some user
-interface. Then, the system checks assignment invariants (deadlines, count of
-submissions, ...) and stores submitted files. The runtime environment is
-automatically detected based on input files and a suitable evaluation
-configuration variant is chosen (one exercise can have multiple variants, for
-example C and Java languages). This exercise configuration is then used for
-taking care of evaluation process.
-
-There is a pool of worker computers dedicated to evaluation jobs. Each one of
-them can support different environments and programming languages to allow
-testing programs for as many platforms as possible. Incoming jobs are scheduled
-to a worker that is capable of running the job.
-
-The worker obtains the solution and its evaluation configuration, parses it and
-starts executing the contained instructions. It is crucial to keep the worker
-computer secure and stable, so a sandboxed environment is used for dealing with
-unknown source code. When the execution is finished, results are saved and the
-submitter is notified.
-
-The output of the worker contains data about the evaluation, such as time and
-memory spent on running the program for each test input and whether its output
-was correct. The system then calculates a numeric score from this data, which is
-presented to the student. If the solution is wrong (incorrect output, uses too
-much memory,..), error messages are also displayed to the submitter.
 
 ## Requirements
 
@@ -292,9 +300,9 @@ addons (mostly administrative features).
   another tool and perform additional tests
 - use of modern technologies with state-of-the-art compilers
 
-### Nonfunctional requirements
+### Non-functional requirements
 
-Nonfunctional requirements are requirements of technical character with no
+Non-functional requirements are requirements of technical character with no
 direct mapping to visible parts of system. In ideal word, users should not
 know about these if they work properly, but would be at least annoyed if these
 requirements were not met. Most notably they are these ones:
@@ -317,14 +325,14 @@ extendable, so everyone can develop their own feature. This also means that
 widely used programming languages and techniques should be used, so users can
 quickly understand the code and make changes.
 
+
+## Related work
+
 To find out the current state in the field of automatic grading systems we did
 a short market survey on the field of automatic grading systems at
 universities, programming contests, and possibly other places where similar
 tools are available.
 
-
-## Related work
-
 This is not a complete list of available evaluators, but only a few projects
 which are used these days and can be an inspiration for our project. Each
 project from the list has a brief description and some key features mentioned.
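
The runtime environment detection and variant selection added in the evaluation
chain above can be illustrated with a minimal sketch. The extension table, the
environment names and the helper functions (`detect_environment`,
`choose_variant`) are assumptions made only for this illustration, not the
actual configuration of the system.

```python
# A minimal sketch of the environment detection and variant selection described
# above. The extension table, environment names and helper names are
# illustrative assumptions, not the system's real configuration.
import pathlib

EXTENSION_TO_ENVIRONMENT = {
    ".c": "c-gcc",          # C
    ".cpp": "cxx-gcc",      # C++
    ".cs": "mono",          # C#
    ".java": "java",        # Java
    ".pas": "freepascal",   # Pascal
    ".py": "python3",       # Python
    ".hs": "haskell",       # Haskell
}

def detect_environment(submitted_file: str) -> str:
    """Guess the runtime environment from the submitted file name."""
    suffix = pathlib.Path(submitted_file).suffix.lower()
    try:
        return EXTENSION_TO_ENVIRONMENT[suffix]
    except KeyError:
        raise ValueError(f"unsupported file type: {suffix!r}") from None

def choose_variant(exercise_config: dict, environment: str) -> dict:
    """Pick the exercise configuration variant matching the environment."""
    try:
        return exercise_config["variants"][environment]
    except KeyError:
        raise ValueError(f"no variant for environment {environment!r}") from None

if __name__ == "__main__":
    exercise = {"variants": {"c-gcc": {"compiler": "gcc"},
                             "java": {"compiler": "javac"}}}
    env = detect_environment("solution.c")
    print(env, choose_variant(exercise, env))
```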
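
The job distribution described above (a queue of incoming jobs served by a pool
of uniform workers, each evaluating one job at a time) can be sketched roughly
as follows. Everything runs in a single process here for brevity and the names
(`Job`, `evaluate`, `worker_loop`) are illustrative only.

```python
# A rough sketch of the job distribution described above: incoming jobs wait in
# a queue and each worker from a uniform pool evaluates one job at a time.
import queue
import threading

class Job:
    def __init__(self, job_id: str, config: dict):
        self.job_id = job_id
        self.config = config   # parsed evaluation configuration of the solution

def evaluate(job: Job) -> dict:
    # placeholder for the real work: execute the configured steps in a sandbox,
    # measure time and memory, and compare outputs with the expected ones
    return {"job": job.job_id, "status": "OK"}

def worker_loop(jobs: queue.Queue, results: queue.Queue) -> None:
    # a worker is strictly sequential: pick a job, finish it, ask for another
    while True:
        job = jobs.get()
        if job is None:          # sentinel that stops this sketch
            break
        results.put(evaluate(job))

if __name__ == "__main__":
    jobs: queue.Queue = queue.Queue()
    results: queue.Queue = queue.Queue()
    pool = [threading.Thread(target=worker_loop, args=(jobs, results))
            for _ in range(3)]   # a pool of uniform workers
    for worker in pool:
        worker.start()
    for i in range(5):
        jobs.put(Job(f"job-{i}", {"environment": "c-gcc"}))
    for _ in pool:
        jobs.put(None)           # one sentinel per worker
    for worker in pool:
        worker.join()
    while not results.empty():
        print(results.get())
```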
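
Finally, the calculation of the numeric score can be thought of as a weighted
fraction of passed test cases, with the weights taken from the assignment. The
`TestResult` layout and the exact formula below are assumptions for
illustration; they only demonstrate how the collected time, memory and
correctness data may be turned into a score.

```python
# A minimal sketch of the scoring step described above: the final score is the
# weighted fraction of test cases that produced correct output within their
# time and memory limits. The data layout and formula are assumptions.
from dataclasses import dataclass

@dataclass
class TestResult:
    name: str
    weight: float        # weight of the test case, set in the assignment
    output_ok: bool      # program output matched the expected output
    time: float          # wall time used by the program [s]
    memory: int          # peak memory used [kB]
    time_limit: float    # [s]
    memory_limit: int    # [kB]

    def passed(self) -> bool:
        # a test counts only if the output is correct and the limits were kept
        return (self.output_ok
                and self.time <= self.time_limit
                and self.memory <= self.memory_limit)

def weighted_score(results: list) -> float:
    """Return a score between 0 and 1."""
    total = sum(r.weight for r in results)
    if total == 0:
        return 0.0
    return sum(r.weight for r in results if r.passed()) / total

if __name__ == "__main__":
    results = [
        TestResult("test01", 1.0, True, 0.12, 10240, 1.0, 65536),
        TestResult("test02", 2.0, True, 0.50, 20480, 1.0, 65536),
        TestResult("test03", 2.0, False, 0.90, 30720, 1.0, 65536),  # wrong output
    ]
    print(f"score: {weighted_score(results):.2f}")   # prints: score: 0.60
```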