From 0d97741e8ae312233b3f25e4dc578e8818996413 Mon Sep 17 00:00:00 2001
From: Petr Stefan
Date: Wed, 28 Dec 2016 11:45:03 +0100
Subject: [PATCH] Analysis structure flattened

---
 Rewritten-docs.md | 107 +++++++++++++++++++++++++++++++---------------------
 1 file changed, 73 insertions(+), 34 deletions(-)

diff --git a/Rewritten-docs.md b/Rewritten-docs.md
index aa491a3..fccccc8 100644
--- a/Rewritten-docs.md
+++ b/Rewritten-docs.md
@@ -86,10 +86,8 @@
knowledge from such projects in production, we set up goals for the new
evaluation system, designed the architecture and implemented a fully
operational solution. The system is now ready for production testing at our
university.

-Analysis
---------
-
-### Assignment
+
+## Assignment

The major goal of this project is to create a grading application that will be
used for programming classes at the Faculty of Mathematics and Physics, Charles

To find out the current state in the field of automatic grading systems, let us
make a short survey of solutions used at universities, in programming contests
and in online tools.

-### Related work
+## Related work

First of all, we searched for and examined existing code evaluation projects.
This is not a complete list of such evaluators, just a few projects that are in
use these days and can serve as an inspiration for our project. Each project on
the list has a brief description and its key features mentioned.

-#### CodEx
+### CodEx

There already is a grading solution at MFF UK, which was implemented in 2006 by
a group of students. Its name is [CodEx -- The Code

several drawbacks. The main ones are:

which have a more complicated evaluation chain than the simple
compilation/execution/evaluation provided by CodEx.

-#### Progtest
+### Progtest

[Progtest](https://progtest.fit.cvut.cz/) is a private project of FIT ČVUT in
Prague. As far as we know, it is used for C/C++, Bash programming and

few hints about what is failing in the submitted solution. It is very strict
about source code quality, using for example the `-pedantic` option of GCC,
Valgrind for memory leaks and array boundary checks via the `mudflap` library.

-#### Codility
+### Codility

[Codility](https://codility.com/) is a web based solution primarily targeted at
company recruiters. It is a commercial SaaS product supporting 16

of Codility is [opensource](https://github.com/Codility/cui), the rest of the
source code is not available. One interesting feature is the 'task timeline' --
the captured progress of writing the code for each user.

-#### CMS
+### CMS

[CMS](http://cms-dev.github.io/index.html) is an opensource distributed system
for running and organizing programming contests. It is written in Python and

execution, evaluation. Execution is performed in
[Isolate](https://github.com/ioi/isolate), a sandbox written by the consultant
of our project, Mgr. Martin Mareš, Ph.D.

-#### MOE
+### MOE

[MOE](http://www.ucw.cz/moe/) is a grading system written in shell scripts, C
and Python. It does not provide a default GUI interface; all actions have to be

time, results are computed in batch mode after the exercise deadline, using
Isolate for sandboxing. Parts of MOE are used in other systems like CodEx or
CMS, but the system itself is generally obsolete.

-#### Kattis
+### Kattis

[Kattis](http://www.kattis.com/) is another SaaS solution. It provides a clean
and functional web UI, but the rest of the application is rather simple. A nice
feature is the use of a [standardized
format](http://www.problemarchive.org/wiki/index.php/Problem_Format) of
exercises. Kattis is primarily used by programming contest organizers, company
recruiters and also some universities.

-### ReCodEx goals
+## ReCodEx goals

From the survey above it is clear that none of the existing systems is capable
of all the features collected for the new system. No grading system is designed

features are the following:

- evaluation procedure configured in a YAML file, composed of small tasks
  connected into an arbitrary oriented acyclic graph
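As a brief illustration of the last point above, the following sketch shows how
an evaluation procedure can be viewed as small tasks connected into an oriented
acyclic graph and ordered for execution. The task names are hypothetical and the
snippet is not the actual YAML configuration format -- it only demonstrates the
concept.

```python
# Hypothetical sketch: an evaluation procedure as a DAG of small tasks.
# Task names are illustrative only, not the real configuration format.
from graphlib import TopologicalSorter

# Each task lists the tasks it depends on.
evaluation = {
    "compile": [],
    "run": ["compile"],                        # run only after compilation
    "fetch_reference_output": [],
    "judge": ["run", "fetch_reference_output"],
}

# One valid execution order that respects the dependencies:
order = list(TopologicalSorter(evaluation).static_order())
print(order)  # e.g. ['compile', 'fetch_reference_output', 'run', 'judge']
```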
-#### Intended usage
+### Intended usage

The whole system is intended to help both supervisors and students. To achieve
this, it is crucial to keep in mind the typical usage scenarios of the system
and try to

deadline, the maximum amount of points and the configuration for calculating
the final amount, the number of tries and the supported runtimes (programming
languages), including specific time and memory limits for the sandboxed tasks.

-##### Exercise evaluation chain
+#### Exercise evaluation chain

The most important part of the application is the evaluation of exercise
solutions submitted by users. For an imaginary system architecture _UI_, _API_,
_Broker_ and

includes an overview of which parts succeeded and which failed (optionally with
a reason like "memory limit exceeded") and the amount of awarded points.
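A result overview of this kind could be represented roughly as follows. The
field names are hypothetical and serve only to illustrate the shape of the data;
they are not the actual API of the system.

```python
# Illustrative sketch of a per-submission evaluation overview.
# Field names are hypothetical, not the actual ReCodEx data model.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class TaskResult:
    name: str                      # e.g. "compilation", "test-01"
    passed: bool
    reason: Optional[str] = None   # e.g. "memory limit exceeded"

@dataclass
class EvaluationOverview:
    tasks: list[TaskResult] = field(default_factory=list)
    points: int = 0                # awarded points for the submission

overview = EvaluationOverview(
    tasks=[
        TaskResult("compilation", True),
        TaskResult("test-01", False, "memory limit exceeded"),
    ],
    points=4,
)
```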
-### Solution concepts analysis
+Analysis
+========
+
+## Solution concepts analysis

@todo: what problems were solved on abstract and high levels, how they can be
solved and what was the final solution

- discuss several ways in which points can be assigned to a solution; propose
  basic systems but also general systems which can use outputs from judges or
  other executed programs; there is a need for variables or some other concept,
  explain why
- and many many more general concepts which can be discussed and solved...
  please append more of them if something comes to your mind... thanks

-#### Structure of the project
+### Structure of the project

The ReCodEx project is divided into two logical parts – the *Backend* and the
*Frontend* – which interact with each other and which cover the

described as well.

@todo: move "General backend implementation" here

@todo: move "General frontend implementation" here

-#### Evaluation unit executed on backend
+### Evaluation unit executed on backend

@todo: describe possibilities of "piece of work" which the backend can execute,
what they can look like, describe our job and its tasks

@todo: how to solve the problem with a specific worker environment, mention
internal job variables

-### Implementation analysis
+## Implementation analysis

Developing a project like ReCodEx requires a discussion of implementation
details and of how to solve particular problems properly. This discussion is a
never ending story that goes on throughout the whole development process. Some
of the most important implementation problems and interesting observations are
discussed in this chapter.

-#### General backend implementation
+### General backend implementation

There are numerous ways to divide a system into separate services, from one
single component to many and many single-purpose components.

they are more connected with the backend, so it is considered that they belong
there.

@todo: what type of communication within backend could be used, mention some
frameworks, queue managers, protocols, which was considered

-#### Fileserver
+### Fileserver

@todo: fileserver and why is separated

@todo: how can files be uploaded and downloaded from fileserver, what
technologies are possible (maybe even nfs and similar), give reasons why this
(http api) was chosen

@todo: how can jobs be stored on fileserver, mainly mention that it is nonsense
to store inputs and outputs within the job archive

-#### Broker
+### Broker

@todo: assigning of jobs to workers, which are possible algorithms, queues,
which one was chosen

@todo: making action and reaction over zeromq more general and easily
extensible, mention reactor and why is needed and what it solves

-#### Worker
+### Worker

The worker is the component which executes incoming jobs from the broker. As
such, the worker should support a wide range of different infrastructures and
maybe even platforms and operating systems; support of at least the two main
operating systems is desirable and should be implemented. The worker as a
service does not have to be very complicated, but a bit of complex behaviour is
still needed. This complexity is almost exclusively concerned with robust
communication with the broker, which has to be checked regularly. A ping
mechanism is usually used for this purpose, which means that the worker should
be able to send ping messages even during execution. The worker therefore has
to be divided into two separate parts: one which handles the communication with
the broker and another which executes jobs. The easiest solution is to run
these parts in separate threads which communicate tightly with each other. For
this in-process communication numerous technologies can be used, from shared
memory to condition variables or some kind of in-process messages. The already
used ZeroMQ library can provide in-process messages working on the same
principles as network communication, which is quite handy and also solves the
problems of thread synchronization and the like.
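The following minimal sketch illustrates this two-thread layout with ZeroMQ
in-process messaging. It is only an illustration of the concept -- the endpoint
name, the message format and the use of Python are assumptions made for the
example, not the actual worker implementation.

```python
# Minimal sketch: a "listening" thread and a "job execution" thread
# talking over a ZeroMQ in-process (inproc) socket pair.
import threading
import zmq

context = zmq.Context.instance()   # inproc sockets must share one context

def execution_thread():
    sock = context.socket(zmq.PAIR)
    sock.connect("inproc://jobs")           # hypothetical endpoint name
    while True:
        job_id = sock.recv_string()
        if job_id == "quit":
            break
        # ... evaluate the job here (sandboxed execution, judging, ...) ...
        sock.send_string(f"done {job_id}")   # report back to the listener
    sock.close()

listener = context.socket(zmq.PAIR)
listener.bind("inproc://jobs")               # bind before the thread connects
worker = threading.Thread(target=execution_thread)
worker.start()

# The listening part stays free to ping the broker while a job runs;
# in this sketch it simply hands one job over and waits for the result.
listener.send_string("job-42")
print(listener.recv_string())                # -> "done job-42"
listener.send_string("quit")
worker.join()
listener.close()
```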
At this point we have a worker with two internal parts: a listening one and an
executing one. The implementation of the listening part is quite
straightforward and clear, so let us discuss what should be happening in the
execution subsystem...

@todo: complete paragraph above... execution of job on worker, how it is done,
what steps are necessary and general for all jobs

@todo: how can inputs and outputs (and supplementary files) be handled (they can
be downloaded on start of execution, or during...)

As described in the fileserver section, stored supplementary files have special
filenames which reflect the hashes of their content. As a result, no duplicates
are stored on the fileserver. The worker can use this feature as well and cache
these files for a while, saving precious bandwidth. This means there has to be
a system which can download a file, store it in a cache and delete it after
some time of inactivity. Because there can be multiple worker instances on a
particular server, it is not efficient for every worker to run such a system on
its own, so it is feasible to share this feature among all workers on the same
machine. One solution would be yet another service connected to the workers
over the network, but this would add another component and communication
channel for a purpose where it is not really needed. The implemented solution
instead assumes that the worker has access to a specified cache folder; the
worker can download supplementary files into this folder and copy them from
there.

This means every worker is able to add downloads to the cache, but what a
worker cannot do properly is the deletion of files unused for some time. For
that, a single-purpose component called the 'cleaner' is introduced. It is a
simple script executed from cron which deletes files that have not been used
for some time. Together with the worker fetching feature, the cleaner completes
the machine-specific caching system.
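The following sketch illustrates how these two cooperating mechanisms --
fetching into a shared cache folder and a cron-driven cleaner -- might look.
The paths, the age limit and the function names are illustrative assumptions
only, not the actual implementation.

```python
# Illustrative sketch of the machine-wide cache: workers fetch files into a
# shared folder named by content hash, a separate cleaner removes stale ones.
import shutil
import time
from pathlib import Path
from urllib.request import urlretrieve

CACHE_DIR = Path("/var/cache/worker-files")     # assumed shared cache folder

def fetch_supplementary_file(file_hash: str, url: str, dest: Path) -> None:
    """Copy a file for a job, downloading it only on a cache miss."""
    cached = CACHE_DIR / file_hash              # filename = hash of the content
    if not cached.exists():
        # A real implementation would download to a temporary file and rename
        # it, to avoid races with concurrent readers and the cleaner.
        urlretrieve(url, cached)                # download from the fileserver
    cached.touch()                              # mark as recently used
    shutil.copy(cached, dest)

def clean_cache(max_age_days: int = 7) -> None:
    """Cron-style cleaner: delete cached files unused for a given time."""
    limit = time.time() - max_age_days * 24 * 3600
    for f in CACHE_DIR.iterdir():
        if f.is_file() and f.stat().st_mtime < limit:
            f.unlink()
```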
@todo: describe a bit more cleaner functionality and that it is safe and there
are no unrecoverable races

@todo: sandboxing, what possibilities are out there (linux, Windows), what are
general and really needed features, mention isolate, what are isolate features

-#### Monitor
+### Monitor

@todo: how progress status can be sent, why is there a separate component of
the system (monitor) and why is this feature only optional

@todo: monitor and what is done there, mention caching and why it is needed

-#### General frontend implementation
+### General frontend implementation

@todo: communication between backend and frontend

@todo: what apis can be used on server frontend side, why rest in particular

-#### API
+### API

@todo: php frameworks, why nette

@todo: on demand loading of students submission, in-time loading of every other
submission, why

-#### Web-app
+### Web-app

@todo: what technologies can be used on client frontend side, why react was
used