diff --git a/Rewritten-docs.md b/Rewritten-docs.md
index c23dc31..7277ece 100644
--- a/Rewritten-docs.md
+++ b/Rewritten-docs.md
@@ -61,22 +61,28 @@ knowledge are more suitable for this practical type of
 learning than others, and fortunately, programming is one of them.
 
 University education system is one of the areas where this knowledge can be
-applied. In computer programming, there are several requirements such as the
-code being syntactically correct, efficient and easy to read, maintain and
-extend. Correctness and efficiency can be tested automatically to help teachers
-save time for their research, but reviewing bad design, bad coding habits and
-logical mistakes is really hard to automate and requires manpower.
-
-Checking programs written by students takes a lot of time and requires a lot of
-mechanical, repetitive work. The first idea of an automatic evaluation system
+applied. In computer programming, there are several requirements a program
+should satisfy, such as the code being syntactically correct, efficient and easy
+to read, maintain and extend.
+
+Checking programs written by students takes time and requires a lot of
+mechanical, repetitive work -- reviewing source codes, compiling them and
+running them through testing scenarios. It is therefore desirable to automate as
+much of this work as possible. The first idea of an automatic evaluation system
 comes from Stanford University professors in 1965. They implemented a system
 which evaluated code in Algol submitted on punch cards. In following years,
 many similar products were written.
 
-There are two basic ways of automatically evaluating code -- statically (check
-the code without running it; safe, but not very precise) or dynamically (run the
-code on testing inputs with checking the outputs against reference ones; needs
-sandboxing, but provides good real world experience).
+In today's world, properties like correctness and efficiency can be tested
+automatically to a large extent. This fact should be exploited to help teachers
+save time for tasks such as examining bad design, bad coding habits and logical
+mistakes, which are difficult to automate.
+
+There are two basic ways of automatically evaluating code -- statically
+(checking the source code without running it; safe, but not very precise) or
+dynamically (running the code on test inputs and checking its outputs against
+the reference ones; provides good real world experience, but requires extensive
+security measures).
 
 This project focuses on the machine-controlled part of source code evaluation.
 First, general concepts of grading systems are observed, new requirements are
@@ -84,8 +90,8 @@ specified and project with similar functionality are examined. Also, problems
 of the software previously used at Charles University in Prague are briefly
 discussed. With acquired knowledge from such projects in production, we set up
 goals for the new evaluation system, designed the architecture and implemented a
-fully operational solution. The system is now ready for production testing at
-the university.
+fully operational solution based on dynamic evaluation. The system is now ready
+for production testing at the university.
 
 ## Assignment
 
@@ -95,13 +101,14 @@ Charles University in Prague. However, the application should be designed in a
 modular fashion to be easily extended or even modified to make other ways of
 usage possible.
 
-The system should be capable of dynamic analysis of programming code. It means,
-that following four basic steps have to be supported:
+The system should be capable of dynamic analysis of submitted source codes. This
+consists of the following basic steps:
 
 1. compile the code and check for compilation errors
 2. run compiled binary in a sandbox with predefined inputs
 3. check constraints on used amount of memory and time
 4. compare program outputs with predefined values
+5. award the code with a numeric score
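+
+To illustrate these steps, the following sketch shows how one solution written
+in C could be processed. It is a simplified illustration only -- the helper
+function, the direct compiler invocation and the scoring formula are assumptions
+of this example, not a description of the actual system:
+
+```
+import subprocess
+import tempfile
+
+def evaluate(source_path, test_cases, time_limit=1.0, max_points=10):
+    """Toy evaluation of one C solution; illustrative only."""
+    # 1. compile the code and check for compilation errors
+    binary = tempfile.mktemp()
+    compiled = subprocess.run(["gcc", source_path, "-o", binary],
+                              capture_output=True, text=True)
+    if compiled.returncode != 0:
+        return 0, "compilation error"
+
+    passed = 0
+    for given_input, expected_output in test_cases:
+        try:
+            # 2. run the compiled binary with a predefined input
+            #    (a real sandbox would also enforce memory and syscall limits)
+            result = subprocess.run([binary], input=given_input, text=True,
+                                    capture_output=True, timeout=time_limit)
+        except subprocess.TimeoutExpired:
+            continue  # 3. time limit exceeded, this test fails
+        # 4. compare the program output with the predefined value
+        if (result.returncode == 0
+                and result.stdout.strip() == expected_output.strip()):
+            passed += 1
+
+    # 5. award the code with a numeric score
+    return max_points * passed / len(test_cases), "ok"
+```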
 
 The project has a great starting point -- there is an old grading system
 currently used at the university (CodEx), so its flaws and weaknesses can be
@@ -111,14 +118,14 @@ they are willing to consult ideas or problems during development with us.
 ### Intended usage
 
 The whole system is intended to help both teachers (supervisors) and students.
-To achieve this, it is crucial to keep in mind typical usage scenarios of the
-system and try to make these tasks as simple as possible.
+To achieve this, it is crucial to keep in mind the typical usage scenarios of
+the system and to try to make these tasks as simple as possible.
 
-The system has a database of users. Each user has assigned a role, which
-corresponds to his/her privileges. There are user groups reflecting structure of
-lectured courses. Groups can be hierarchically ordered to reflect additional
-metadata such as the academic year. For example, a reasonable group hierarchy
-can look like this:
+The system has a database of users. Each user is assigned a role, which
+corresponds to his/her privileges. There are user groups reflecting the
+structure of lectured courses. Groups can be hierarchically ordered to reflect
+additional metadata such as the academic year. For example, a reasonable group
+hierarchy could look like this:
 
 ```
 Summer term 2016
@@ -130,22 +137,22 @@ Summer term 2016
    ...
 ```
 
-In this example, students are members of the leaf groups, the higher level
-entities are just for keeping the related groups together. The hierarchy
-structure can be modified and altered to fit specific needs of the university or
-any other organization, even the flat structure (i.e., no hierarchy) is
-possible. One user can be part of multiple groups and on the other hand one
-group can have multiple users. Each user can have a specific role for every
-group in which is a member, overriding his/her default role in this context.
-
-Database of exercises (algorithmic problems) is another part of the project.
-Each exercise consists of a text in multiple language variants, an evaluation
-configuration and a set of inputs and reference outputs. Exercises are created
-by instructed privileged users. Assigning an exercise to a group means to
-choose one of the available exercises and specifying additional properties. An
-assignment has a deadline (optionally a second deadline), a maximum amount of
-points, a configuration for calculating the final score, a maximum number of
-submissions, and a list of supported runtime environments (e.g., programming
+In this example, students are members of the leaf groups and the higher level
+nodes are just for keeping related groups together. The structure can be
+modified and altered to fit specific needs of the university or any other
+organization, even a flat structure is possible. One user can be a member of
+multiple groups and have a different role in each of them (a student can attend
+labs for several courses while also teaching one).
+
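+One possible way to picture this membership relation is sketched below; the
+role names and identifiers are made up for illustration and do not prescribe
+the real data model:
+
+```
+from dataclasses import dataclass
+from enum import Enum
+
+class Role(Enum):
+    STUDENT = "student"
+    SUPERVISOR = "supervisor"
+    ADMINISTRATOR = "administrator"
+
+@dataclass
+class Membership:
+    user_id: str
+    group_id: str
+    role: Role    # the role is tied to the membership, not to the user alone
+
+# one user, two groups, a different role in each of them
+memberships = [
+    Membership("alice", "programming-i-labs-monday", Role.STUDENT),
+    Membership("alice", "programming-ii-labs-thursday", Role.SUPERVISOR),
+]
+```
+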
+A database of exercises (algorithmic problems) is another part of the project.
+Each exercise consists of a text describing the problem in multiple language
+variants, an evaluation configuration (machine-readable instructions on how to
+evaluate solutions to the exercise) and a set of inputs and reference outputs.
+Exercises are created by instructed privileged users. Assigning an exercise to a
+group means choosing one of the available exercises and specifying additional
+properties: a deadline (optionally a second deadline), a maximum amount of
+points, a configuration for calculating the score, a maximum number of
+submissions, and a list of supported runtime environments (e.g., programming
 languages) including specific time and memory limits for each one.
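+
+For illustration, the properties of a single assignment could be captured by a
+structure similar to the following sketch (the field names are only
+illustrative, not an actual database schema):
+
+```
+from dataclasses import dataclass, field
+from datetime import datetime
+from typing import Dict, Optional
+
+@dataclass
+class Limits:
+    time_seconds: float      # time limit for one test run
+    memory_kilobytes: int    # memory limit for one test run
+
+@dataclass
+class Assignment:
+    exercise_id: str                     # which exercise is assigned
+    group_id: str                        # which group receives it
+    deadline: datetime
+    second_deadline: Optional[datetime]  # the optional second deadline
+    max_points: int
+    score_config: str                    # how the final score is calculated
+    max_submissions: int
+    # supported runtime environments (e.g., "c", "java") with their own limits
+    limits: Dict[str, Limits] = field(default_factory=dict)
+```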
 
 Typical use cases for supported user roles are illustrated on following UML
@@ -161,32 +168,29 @@ is described in more detail below to give readers solid overview of what have
 to happen during evaluation process.
 
 First thing users have to do is to submit their solutions through some user
-interface. Then, the system checks assignment invariants (deadlines, count
-of submissions, ...) and stores submitted files. The runtime environment is
-automatically detected based on input files and suitable exercise configuration
-variant is chosen (one exercise can have multiple variants, for example C and
-Java languages). Matching exercise configuration is then used for taking care of
-evaluation process.
-
-There is a pool of worker computers dedicated to processing jobs. Some of them
-may have different environment to allow testing programs in more conditions.
-Incoming jobs are scheduled to particular worker depending on its capabilities
-and job requirements.
-
-Job processing itself starts with obtaining source files and job configuration.
-The configuration is parsed into small tasks with simple piece of work.
-Evaluation itself goes in direction of tasks ordering. It is crucial to keep
-executive computer secure and stable, so isolated sandboxed environment is used
-when dealing with unknown source code. When the execution is finished, results
-are saved.
-
-Results from worker contains only output data from processed tasks (this could
-be return value, consumed time, ...). On top of that, one value is calculated to
-express overall quality of the tested job. It is used as points for final
-student grading. Calculation method of this value may be different for each
-assignment. Data presented back to users include overview of job parts (which
-succeeded and which failed, optionally with reason like "memory limit exceeded")
-and achieved score (amount of awarded points).
+interface. Then, the system checks assignment invariants (deadlines, count of
+submissions, ...) and stores submitted files. The runtime environment is
+automatically detected based on input files and a suitable evaluation
+configuration variant is chosen (one exercise can have multiple variants, for
+example C and Java languages). This exercise configuration is then used to
+control the evaluation process.
+
+There is a pool of worker computers dedicated to evaluation jobs. Each one of
+them can support different environments and programming languages to allow
+testing programs for as many platforms as possible. Incoming jobs are scheduled
+to a worker that is capable of running the job.
+
+The worker obtains the solution and its evaluation configuration, parses it and
+starts executing the contained instructions. It is crucial to keep the worker
+computer secure and stable, so a sandboxed environment is used for dealing with
+unknown source code. When the execution is finished, results are saved and the
+submitter is notified.
+
+The output of the worker contains data about the evaluation, such as time and
+memory spent on running the program for each test input and whether its output
+was correct. The system then calculates a numeric score from this data, which is
+presented to the student. If the solution is wrong (incorrect output, uses too
+much memory, ...), error messages are also displayed to the submitter.
 
 ## Requirements