diff --git a/.gitignore b/.gitignore new file mode 100644 index 0000000..75255c0 --- /dev/null +++ b/.gitignore @@ -0,0 +1,2 @@ +.vscode +.markdownlint.json \ No newline at end of file diff --git a/Rewritten-docs.md b/Rewritten-docs.md index 1357a53..1194598 100644 --- a/Rewritten-docs.md +++ b/Rewritten-docs.md @@ -51,8 +51,7 @@ Notes: studenta --> -Introduction -============ +# Introduction Generally, there are many different ways and opinions on how to teach people something new. However, most people agree that a hands-on experience is one of @@ -65,8 +64,8 @@ University education system is one of the areas where this knowledge can be applied. In computer programming, there are several requirements such as the code being syntactically correct, efficient and easy to read, maintain and extend. Correctness and efficiency can be tested automatically to help teachers -save time for their research, but checking for bad design, habits and mistakes -is really hard to automate and requires manpower. +save time for their research, but reviewing bad design, bad coding habits and +logical mistakes is really hard to automate and requires manpower. Checking programs written by students takes a lot of time and requires a lot of mechanical, repetitive work. The first idea of an automatic evaluation system @@ -75,70 +74,73 @@ which evaluated code in Algol submitted on punch cards. In following years, many similar products were written. There are two basic ways of automatically evaluating code -- statically (check -the code without running it; safe, but not much precise) or dynamically (run the +the code without running it; safe, but not very precise) or dynamically (run the code on testing inputs with checking the outputs against reference ones; needs sandboxing, but provides good real world experience). -This project focuses on the machine-controlled part of source code evaluation. 
-First, problems of present software at our university were discussed and similar -projects at other educational institutions were examined. With acquired -knowledge from such projects in production, we set up goals for the new -evaluation system, designed the architecture and implemented a fully operational -solution. The system is now ready for production testing at our university. + +This project focuses on the machine-controlled part of source code evaluation. +First, the problems of the software used at our university previously were +discussed and similar projects at other educational institutions were examined. +With acquired knowledge from such projects in production, we set up goals for +the new evaluation system, designed the architecture and implemented a fully +operational solution. The system is now ready for production testing +at our university. ## Assignment The major goal of this project is to create a grading application that will be -used for programming classes at the Faculty of Mathematics and Physics, Charles -University. However, the application should be designed in a modular fashion so -that it can be easily extended to make other ways of using it possible. +used for programming classes at the Faculty of Mathematics and Physics of the Charles +University in Prague. However, the application should be designed in a modular fashion so +that it can be easily extended or modified to make other ways of using it possible. The project has a great starting point -- there is an old grading system -currently used at our university (CodEx), so its mistakes and weaknesses can be -adressed. Furthermore, many teachers are willing to use and test the new system. +currently used at the university (CodEx), so its flaws and weaknesses can be +addressed. Furthermore, many teachers are willing to use and test the new system. Following requirements were collected both from our personal experience with CodEx and from teachers' requests. 
-**Basic grading system requirements:** - -These are features that are necessary for any system for evaluation of -programming homework assignments used in a university programming course. +### Basic grading system requirements: - +These are the features which are necessary for any system for evaluation of +coding assignments used in any university programming course: -- creating exercises including textual description, sample inputs and correct +- students can use an intuitive user interface for interaction with the system, + mainly for viewing assigned exercises, uploading their own solutions to the assignments, + and viewing the results of the solutions after an automatic evaluation is finished +- teachers can create exercises including textual description, sample inputs and correct reference outputs (for example "sum all numbers from given file and write the result to the standard output") -- assigning the exercise to a group of users with some additional properties set - (deadlines, etc.) -- user interface for interaction with the system, mainly for showing assigned - exercises, uploading solution sources and presenting evaluated results -- safe environment to execute student solutions withing prescribed time and - memory limits and check corectness of outputs -- assigning points to users depending of correctness of his/her solution -- user management with support of roles (at least two -- _student_ and +- teachers can assign an existing exercise to their class with some specific + properties set (deadlines, etc.) 
+- teachers can specify their scale of points which will be awarded to the students + depending on the correctness of their solutions (expressed in percentage points) +- teachers can view all of the solutions their students submitted and also the + results of the evaluations and they can override the automatically assigned points + to the solutions manually +- teachers can see the statistics of their classes and individual students + of these classes +- administrators can depend on a safe environment in which the students' solutions + will be executed +- administrators can manage users with support of roles (at least two -- _student_ and _supervisor_) -- administrative interface for manual checking of solutions, overriding - automatically assigned amount of points and viewing of overall statistics - about users -CodEx satisfies all these requirements and a few more that originate from the -way courses are organized at our university -- for example, users have roles -(_student_, _supervisor_ and _administrator_) that determine their capabilities -in the system and students are divided into groups that correspond to lab -groups. +CodEx satisfies all these requirements and a few more that originate from the +way courses are organized at our university -- for example, users have roles +(_student_, _supervisor_ and _administrator_) that determine their capabilities +in the system and students are divided into groups that correspond to lab groups. However, further requirements arose during the ten year long lifetime of the old system. There are not many ways to improve it from the perspective of a student, but a lot of feature requests came from administrators and supervisors. -Collected ideas were mostly gathered from meetings with faculty staff involved +The ideas were mostly gathered from meetings with faculty staff involved with the current system. 
-**Requested features for the new system:** +### Requested features for the new system: - logging in through a university authentication system (e.g. LDAP) - support for multiple programming environments at once to avoid unacceptable @@ -150,45 +152,42 @@ with the current system. - comments, comments, comments (exercises, tests, solutions, ...) - edit student solution and privately resubmit it - resubmit solution with saving all results (including temporary ones) -- mark one student solution as accepted (used for grading this assignment) -- web and command-line submit tool -- SIS (university information system) integration for fetching personal user - data +- mark one student's solution as accepted (used for grading this assignment) +- web and command-line submission tool +- SIS (university information system) integration for fetching personal user data - plagiarism detection - advanced low-level evaluation flow configuration with high-level abstraction layer for ordinary configuration cases - use of modern technologies with state-of-the-art compilers The survey shows that the system is used in many different ways, but the core -functionality is the same for all of them. When the system is ready, it is -likely that new ideas are figured out, thus the system must be designed to be -easily extendable, so everyone can develop his dream feature. This also means, -that widely used programming languages and techniques should be used, so users -can quickly understand the code and make changes. - -To find out current state in the field of automatic grading systems, let's do a -short survey at universities, programming contests or online tools. +functionality is the same for all of them. When the system is ready it is +likely that there will be new ideas of how to use the system and thus the system +must be designed to be easily extendable, so everyone can develop their own feature. 
+This also means that widely used programming languages and techniques should be used, +so users can quickly understand the code and make changes. +To find out the current state in the field of automatic grading systems we did a +short survey at universities, programming contests, and other available tools. ## Related work -First of all, some code evaluating projects were found and examined. It is not -a complete list of such evaluators, but just a few projects which are used -these days and can be an inspiration for our project. Each project from the +This is not a complete list of available evaluators, but only a few projects +which are used these days and can be an inspiration for our project. Each project from the list has a brief description and some key features mentioned. ### CodEx -There already is a grading solution at MFF UK, which was implemented in 2006 by -group of students. Its name is [CodEx -- The Code -Examiner](http://codex.ms.mff.cuni.cz/project/) and it has been used with some -improvements since then. The original plan was to use the system only for basic -programming courses, but there is demand for adapting it for many different -subjects. +Currently used grading solution at the Faculty of Mathematics and Physics of +the Charles University in Prague which was implemented in 2006 by a group +of students. It is called [CodEx -- The Code Examiner](http://codex.ms.mff.cuni.cz/project/) +and it has been used with some improvements since then. The original plan was +to use the system only for basic programming courses, but there was a demand +for adapting it for many different subjects. CodEx is based on dynamic analysis. It features a web-based interface, where -supervisors assign exercises to their students and the students have a time -window to submit the solution. Each solution is compiled and run in sandbox +supervisors can assign exercises to their students and the students have a time +window to submit their solutions. 
Each solution is compiled and run in sandbox (MO-Eval). The metrics which are checked are: corectness of the output, time and memory limits. It supports programs written in C, C++, C#, Java, Pascal, Python and Haskell. @@ -200,7 +199,7 @@ several drawbacks. The main ones are: - **web interface** -- The web interface is simple and fully functional. But rapid development in web technologies opens new horizons of how web interface can be made. -- **web api** -- CodEx offers a very limited XML API based on outdated +- **web API** -- CodEx offers a very limited XML API based on outdated technologies that is not sufficient for users who would like to create custom interfaces such as a command line tool or mobile application. - **sandboxing** -- MO-Eval sandbox is based on principle of monitoring system @@ -218,20 +217,19 @@ several drawbacks. The main ones are: programs for classes such as Parallel programming or Compiler principles, which have a more difficult evaluation chain than simple compilation/execution/evaluation provided by CodEx. - + ### Progtest -[Progtest](https://progtest.fit.cvut.cz/) is private project from FIT ČVUT in -Prague. As far as we know it is used for C/C++, Bash programming and -knowledge-based quizzes. There are several bonus points and penalties and also a -few hints what is failing in submitted solution. It is very strict on source -code quality, for example `-pedantic` option of GCC, Valgrind for memory leaks -or array boundaries checks via `mudflap` library. +[Progtest](https://progtest.fit.cvut.cz/) is private project of [FIT ČVUT](https://fit.cvut.cz) +in Prague. As far as we know it is used for C/C++, Bash programming and knowledge-based quizzes. +There are several bonus points and penalties and also a few hints what is failing in the submitted +solution. It is very strict on source code quality, for example `-pedantic` option of GCC, +Valgrind for memory leaks or array boundaries checks via `mudflap` library. 
### Codility -[Codility](https://codility.com/) is web based solution primary targeted to -company recruiters. It is commercial product of SaaS type supporting 16 +[Codility](https://codility.com/) is a web based solution primary targeted to +company recruiters. It is a commercial product available as a SaaS and it supports 16 programming languages. The [UI](http://1.bp.blogspot.com/-_isqWtuEvvY/U8_SbkUMP-I/AAAAAAAAAL0/Hup_amNYU2s/s1600/cui.png) of Codility is [opensource](https://github.com/Codility/cui), the rest of @@ -240,44 +238,43 @@ captured progress of writing code for each user. ### CMS -[CMS](http://cms-dev.github.io/index.html) is an opensource distributed system -for running and organizing programming contests. It is written in Python and -contain several modules. CMS supports C/C++, Pascal, Python, PHP and Java. -PostgreSQL is a single point of failure, all modules heavily depend on database -connection. Task evaluation can be only three step pipeline -- compilation, -execution, evaluation. Execution is performed in -[Isolate](https://github.com/ioi/isolate), sandbox written by consultant of our -project, Mgr. Martin Mareš, Ph.D. +[CMS](http://cms-dev.github.io/index.html) is an opensource distributed system +for running and organizing programming contests. It is written in Python and +contains several modules. CMS supports C/C++, Pascal, Python, PHP, and Java +programming languages. PostgreSQL is a single point of failure, all modules +heavily depend on the database connection. Task evaluation can be only a three +step pipeline -- compilation, execution, evaluation. Execution is performed in +[Isolate](https://github.com/ioi/isolate), sandbox written by the consultant +of our project, Mgr. Martin Mareš, Ph.D. ### MOE -[MOE](http://www.ucw.cz/moe/) is a grading system written in Shell scripts, C -and Python. It does not provide a default GUI interface, all actions have to be -performed from command line. 
The system does not evaluate submissions in real -time, results are computed in batch mode after exercise deadline, using Isolate -for sandboxing. Parts of MOE are used in other systems like CodEx or CMS, but +[MOE](http://www.ucw.cz/moe/) is a grading system written in Shell scripts, C +and Python. It does not provide a default GUI interface, all actions have to be +performed from command line. The system does not evaluate submissions in real +time, results are computed in batch mode after exercise deadline, using Isolate +for sandboxing. Parts of MOE are used in other systems like CodEx or CMS, but the system is generally obsolete. ### Kattis -[Kattis](http://www.kattis.com/) is another SaaS solution. It provides a clean -and functional web UI, but the rest of the application is too simple. A nice +[Kattis](http://www.kattis.com/) is another SaaS solution. It provides a clean +and functional web UI, but the rest of the application is too simple. A nice feature is the usage of a [standardized -format](http://www.problemarchive.org/wiki/index.php/Problem_Format) for -exercises. Kattis is primarily used by programming contest organizators, company +format](http://www.problemarchive.org/wiki/index.php/Problem_Format) for +exercises. Kattis is primarily used by programming contest organizators, company recruiters and also some universities. ## ReCodEx goals -From the survey above it is clear, that none of the existing systems is capable -of all the features collected for the new system. No grading system is designed -to support complicated evaluation pipeline, so this part is unexplored field and -has to be designed with caution. Also, no project is modern and extendable in a -way that it can be used as a base for ReCodEx. After considering all these -facts, it is clear that the new system has to be written from scratch. This -implies, that only subset of all features will be implemented in the first -version, the others following later. 
+None of the existing systems we came across is capable of all the required features +of the new system. There is no grading system which is designed to support a complicated +evaluation pipeline, so this part is an unexplored field and has to be designed with caution. +Also, no project is modern and extensible enough to be used as a base for ReCodEx. +After considering all these facts, it was clear that a new system had to be written +from scratch. This implies that only a subset of all the features will be implemented +in the first version, the others in the following ones. Gathered features are categorized based on priorities for the whole system. The highest priority has main functionality similar to current CodEx. It is a base @@ -294,41 +291,40 @@ side) and command-line submit tool. Plagiarism detection is not likely to be part of any release in near future unless someone other makes the engine. The detection problem is too hard to be solved as part of this project. -The new project is **ReCodEx -- ReCodEx Code Examiner**. The name should point -to CodEx, previous evaluation solution, but also reflect new approach to solve -issues. **Re** as part of the name means redesigned, rewritten, renewed or -restarted. +We named the project **ReCodEx -- ReCodEx Code Examiner**. The name should point +to the old CodEx, but also reflect the new approach to solve issues. +**Re** as part of the name means redesigned, rewritten, renewed, or restarted. -At this point there is a clear idea how the new system will be used and what are +At this point there is a clear idea how the new system will be used and what are the major enhancements for future releases. With this in mind, the overall architecture can be sketched. From the previous research, we set up several -goals, which a new system should have. They mostly reflect drawbacks of current -version of CodEx and reasonable wishes of university users. Most notable +goals, which the new system should have. 
They mostly reflect drawbacks of the current +version of CodEx and some reasonable wishes of university users. Most notable features are following: -- modern HTML5 web frontend written in Javascript using a suitable framework -- REST API implemented in PHP, communicating with database, backend and file +- modern HTML5 web frontend written in JavaScript using a suitable framework +- REST API implemented in PHP, communicating with database, evaluation backend and a file server -- backend is implemented as distributed system on top of message queue framework +- evaluation backend implemented as a distributed system on top of a message queue framework (ZeroMQ) with master-worker architecture -- worker with basic support of Windows environment (without sandbox, no general + +- worker with basic support of the Windows environment (without sandbox, no general purpose suitable tool available yet) -- evaluation procedure configured in YAML file, compound of small tasks - connected into arbitrary oriented acyclic graph +- evaluation procedure configured in a YAML file, compound of small tasks + connected into an arbitrary oriented acyclic graph ### Intended usage -Whole system is intended to help both supervisors and students. To achieve this, -it is crucial to keep in mind typical usage scenarios of the system and try to -make these typical tasks as simple as possible. To synchronize visions of -readers, basic concepts are recapitulated. +The whole system is intended to help both teachers (supervisors) and students. +To achieve this, it is crucial to keep in mind typical usage scenarios of the +system and try to make these typical tasks as simple as possible. -First of all, the system has database of users. Each user has assigned a role, -which correspond to his/her privileges. User can be logged in via local -authentication service or university system. There are groups of users, which -corresponds to lectured courses. 
Groups can be hierarchically ordered to reflect -additional metadata like academic year. For example, reasonable group hierarchy -is like this: +The system has a database of users. Each user has a role assigned, +which corresponds to his/her privileges. A user can log in via +email and password or using the university system. There are groups of users, which +correspond to the lectured courses. Groups can be hierarchically ordered to reflect +additional metadata such as the academic year. For example, a reasonable group hierarchy +can look like this: ``` Summer term 2016 @@ -341,32 +337,34 @@ Summer term 2016 ``` -In this example, student users are part of the leaf groups, higher groups are -just for keeping related groups together. The hierarchy tree can be modified and -altered to fit specific needs for each organization, even the flat structure is -possible. +In this example, students are members of the leaf groups, the higher level groups +are just for keeping the related groups together. The hierarchy tree can be modified and +altered to fit specific needs of the university or any other organization, even the +flat structure (i.e., no hierarchy) is possible. -One user can be part of multiple groups and also one group can have multiple -users. Each user in a group has a role which defines its capabilities. -Priviledged user can assign a new exercise in his/her group, change assignment -details, view results of other users and manually change them. Normal user can +One user can be part of multiple groups and also one group can of course have multiple +users. Each user in a group also has a specific role for the given group. +A privileged user (supervisor) can assign a new exercise in his/her group, change assignment +details, view results of other users and manually change them. A normal user (student) can join a group, get list of assigned exercises, view assignment detail, submit -his/her solution and of course view the results. 
+his/her solution and view the results of the evaluation. Database of exercises (algorithmic problems) is another part of the project. -Each exercise consists of text in multiple language variants, evaluation -configuration and set of inputs and reference outputs. Exercises are created by -instructed priviledged users. Assigning exercise to a group means choose one of -the exercises in the list and specify additional data. Assignment has a -deadline, maximum amount of points and configuration for calculating the final -amount, number of tries and supported runtimes (programming languages) including -specific time and memory limits for sandboxed tasks. +Each exercise consists of a text in multiple language variants, an evaluation +configuration and a set of inputs and reference outputs. Exercises are created by +instructed privileged users. Assigning an exercise to a group means choosing +one of the available exercises and specifying additional properties. An assignment +has a deadline (optionally a second deadline), a maximum amount of points, +a configuration for calculating the final score, a maximum number of submissions, +and a list of supported runtime environments (e.g., programming languages) including +specific time and memory limits for the sandboxed tasks. #### Exercise evaluation chain -The most important part of the application is evaluating exercises for solutions -submitted by users. For imaginary system architecture _UI_, _API_, _Broker_ and -_Worker_ this goes as follows. +The most important part of the system is the evaluation of the solutions +submitted by the users for their assigned exercises. + +~~For imaginary system architecture _UI_, _API_, _Broker_ and _Worker_ this goes as follows.~~ First thing users have to do is to submit their solutions to _UI_ which provides interface to upload files and then submit them. 
_UI_ sends a request to _API_ @@ -400,25 +398,24 @@ includes overview which part succeeded and which failed (optionally with reason like "memory limit exceeded") and amount of awarded points. -Analysis -======== +# Analysis ## Solution concepts analysis @todo: what problems were solved on abstract and high levels, how they can be solved and what was the final solution - - which problems are they? ... these ones ↓ - - what type of users there should be, why they are needed - - explain why there is exercise and assignment division, what means what and how they are used - - explain instances why they are usefull what they solve and also discuss licences concept - - groups, they can be public and private and why is that, what it solves, explain amd discuss treshold and other group features - - extended execution pipeline (not just compilation/execution/evaluation) and why it is needed - - progress state, how it can be done and displayed to user, why random messages - - how to display generally all outputs of executed programs to user (supervisor, student), what students can or cannot see and why - - judges, discuss what they possibly can do and what it can be used for (returning for instance 2 numbers instead of 1 and why we return just one) - - discuss points assigned to solution, why are there bonus points, explain minimal point threshold - - discuss several ways how points can be assigned to solution, propose basic systems but also general systems which can use outputs from judges or other executed programs, there is need for variables or other concept, explain why - - and many many more general concepts which can be discussed and solved... please append more of them if something comes to your mind... thanks +- which problems are they? ... 
these ones ↓ +- what type of users there should be, why they are needed +- explain why there is exercise and assignment division, what means what and how they are used +- explain instances why they are usefull what they solve and also discuss licences concept +- groups, they can be public and private and why is that, what it solves, explain amd discuss treshold and other group features +- extended execution pipeline (not just compilation/execution/evaluation) and why it is needed +- progress state, how it can be done and displayed to user, why random messages +- how to display generally all outputs of executed programs to user (supervisor, student), what students can or cannot see and why +- judges, discuss what they possibly can do and what it can be used for (returning for instance 2 numbers instead of 1 and why we return just one) +- discuss points assigned to solution, why are there bonus points, explain minimal point threshold +- discuss several ways how points can be assigned to solution, propose basic systems but also general systems which can use outputs from judges or other executed programs, there is need for variables or other concept, explain why +- and many many more general concepts which can be discussed and solved... please append more of them if something comes to your mind... thanks ### Structure of the project @@ -426,9 +423,9 @@ The ReCodEx project is divided into two logical parts – the *Backend* and the *Frontend* – which interact which each other and which cover the whole area of code examination. Both of these logical parts are independent of each other in the sense of being installed on separate -machines on different locations and that one of the parts can be -replaced with different implementation and as long as the communication -protocols are preserved, the system will continue to work as expected. 
+machines at different locations and that one of the parts can be +replaced with a different implementation and as long as the communication +protocols are preserved, the system will continue working as expected. *Backend* is the part which is responsible solely for the process of evaluation a solution of an exercise. Each evaluation of a solution is @@ -441,7 +438,7 @@ environment, specific version of a compiler or the job must be evaluated on a processor with a specific number of cores. The backend infrastructure decides whether it will accept a job or decline it based on the specified requirements. In case it accepts the job, it will be -placed in a queue and processed as soon as possible. The backend +placed in a queue and it will be processed as soon as possible. The backend publishes the progress of processing of the queued jobs and the results of the evaluations can be queried after the job processing is finished. The backend produces a log of the evaluation and scores the solution @@ -649,8 +646,7 @@ cleaner completes machine specific caching system. -The Backend -=========== +# The Backend The backend is the part which is hidden to the user and which has only one purpose: evaluate user’s solutions of their assignments. 
@@ -675,7 +671,7 @@ for the technical description of the components) ### Fileserver -@todo: stores particular datas from frontend and backend, hashing, HTTP API +@todo: stores particular data from frontend and backend, hashing, HTTP API ### Worker @@ -996,7 +992,7 @@ of entities and relational database models), describe the logical grouping of entities and how they are related: - user + settings + logins + ACL -- instance + licences + groups + group membership +- instance + licenses + groups + group membership - exercise + assignments + localized assignments + runtime environments + hardware groups - submission + solution + reference solution + solution evaluation @@ -1373,7 +1369,7 @@ the output length (as long as the printing fits in the time limit). memory: 8192 ``` -Fetch sample solution from fileserver. Base URL of fileserver is in the header +Fetch sample solution from file server. Base URL of file server is in the header of the job configuration, so only the name of required file (its `sha1sum` in our case) is necessary.