master
Petr Stefan 8 years ago
parent c906ed83af
commit 91fa5d0af1

@ -69,7 +69,7 @@ logical mistakes is really hard to automate and requires manpower.
Checking programs written by students takes a lot of time and requires a lot of
mechanical, repetitive work. The first idea of an automatic evaluation system
comes from Stanford University profesors in 1965. They implemented a system
comes from Stanford University professors in 1965. They implemented a system
which evaluated code in Algol submitted on punch cards. In following years, many
similar products were written.
@ -101,7 +101,7 @@ that following four basic steps have to be supported:
1. compile the code and check for compilation errors
2. run compiled binary in a sandbox with predefined inputs
3. check constraints on used amount of memory and time
4. compare program outpus with predefined values
4. compare program outputs with predefined values
The project has a great starting point -- there is an old grading system
currently used at the university (CodEx), so its flaws and weaknesses can be
@ -141,14 +141,14 @@ which is a member.
Database of exercises (algorithmic problems) is another part of the project.
Each exercise consists of a text in multiple language variants, an evaluation
configuration and a set of inputs and reference outputs. Exercises are created
by instructed priviledged users. Assigning an exercise to a group means to
by instructed privileged users. Assigning an exercise to a group means to
choose one of the available exercises and specify additional properties. An
assignment has a deadline (optionally a second deadline), a maximum amount of
points, a configuration for calculating the final score, a maximum number of
submissions, and a list of supported runtime environemnts (e.g., programming
submissions, and a list of supported runtime environments (e.g., programming
languages) including specific time and memory limits for the sandboxed tasks.
Typical use cases for supported user roles are ilustrated on following UML
Typical use cases for supported user roles are illustrated in the following UML
diagram:
![System use case diagram](https://github.com/ReCodEx/wiki/raw/master/images/System_use_case.png)
@ -199,7 +199,7 @@ came from administrators and supervisors. The ideas were gathered mostly from
our personal experience with the system and from meetings with faculty staff
involved with the current system.
For clear arragement all the requirements and wishes are presented grouped by
For clear arrangement all the requirements and wishes are presented grouped by
categories.
### System features
@ -228,7 +228,7 @@ They describe the evaluation system in general and also university addons
reviewed, commented and assigned additional points (positive or negative)
- one particular solution can be marked as accepted (used for grading this
assignment)
- teacher can edit student solution and privately resubmit it; optionaly saving
- teacher can edit student solution and privately resubmit it; optionally saving
all results (including temporary ones)
- localization of all texts (UI and exercises)
- Markdown support for creating exercise texts
@ -242,7 +242,7 @@ They describe the evaluation system in general and also university addons
mainly for viewing assigned exercises, uploading their own solutions to the
assignments, and viewing the results of the solutions after an automatic
evaluation is finished; the two desired interfaces are web and command-line based
- user priviledge separation (at least two roles -- _student_ and _supervisor_)
- user privilege separation (at least two roles -- _student_ and _supervisor_)
- logging in through a university authentication system (e.g. LDAP)
- SIS (university information system) integration for fetching personal user
data
@ -264,7 +264,7 @@ met. Most notably they are these ones:
- user interface of the system accessible on users' computers without
installation of any kind of additional software
- easy implementation of different user interfaces
- be ready for workload hundreads of students and tens of supervisors
- be ready for a workload of hundreds of students and tens of supervisors
- automated installation of all components
@todo: fill some nonfunctional requirements;
@ -303,7 +303,7 @@ for adapting it for many different subjects.
CodEx is based on dynamic analysis. It features a web-based interface, where
supervisors can assign exercises to their students and the students have a time
window to submit their solutions. Each solution is compiled and run in sandbox
(MO-Eval). The metrics which are checked are: corectness of the output, time
(MO-Eval). The metrics which are checked are: correctness of the output, time
and memory limits. It supports programs written in C, C++, C#, Java, Pascal,
Python and Haskell.
@ -378,7 +378,7 @@ the system is generally obsolete.
and functional web UI, but the rest of the application is too simple. A nice
feature is the usage of a [standardized
format](http://www.problemarchive.org/wiki/index.php/Problem_Format) for
exercises. Kattis is primarily used by programming contest organizators, company
exercises. Kattis is primarily used by programming contest organizers, company
recruiters and also some universities.
@ -386,7 +386,7 @@ recruiters and also some universities.
## ReCodEx goals
@todo: improve and extend this chapter - analysis of user requrements and way we
@todo: improve and extend this chapter - analysis of user requirements and way we
solve them; exercise is a template for assignment, users are in groups, what is
group, how points are assigned for solutions, ...
@ -446,8 +446,9 @@ notable features are following:
- which problems are they? ... these ones below:
- what type of users there should be, why they are needed
- explain why there is exercise and assignment division, what means what and how they are used
- explain instances why they are usefull what they solve and also discuss licences concept
- groups, they can be public and private and why is that, what it solves, explain amd discuss treshold and other group features
- explain instances why they are useful what they solve and also discuss licenses concept
- groups, they can be public and private and why is that, what it solves,
explain and discuss threshold and other group features
- extended execution pipeline (not just compilation/execution/evaluation) and why it is needed
- progress state, how it can be done and displayed to user, why random messages
- how to display generally all outputs of executed programs to user (supervisor, student), what students can or cannot see and why
@ -500,7 +501,7 @@ which will execute jobs and component which will distribute jobs to the
instances of the first one. This ensures scalability in manner of parallel
execution of numerous jobs which is exactly what is needed. Implementation of
these services are called **broker** and **worker**, first one handles
distribution, latter execution. These components should be enough to fulfil all
distribution, latter execution. These components should be enough to fulfill all
above said, but for the sake of simplicity and better communication gateways
with frontend two other components were added, **fileserver** and **monitor**.
Fileserver is simple component whose purpose is to store files which are
@ -556,7 +557,7 @@ protocol between these two logical parts will be described as well.
One of the bigger requests for the new system is to support a complex
configuration of execution pipeline. The idea comes from lecturers of Compiler
principles class who want to migrate their semi-manual evaluation process to
CodEx. Unfortunately, CodEx is not capable of such compilicated exercise setup.
CodEx. Unfortunately, CodEx is not capable of such complicated exercise setup.
None of the evaluation systems we found can handle such a task, so a design from
scratch is needed.
@ -578,18 +579,18 @@ systems it is better to implement reasonable subset of operations directly
without calling system provided binaries. These operations are copy file, create
new directory, extract archive and so on, altogether called internal tasks.
Another benefit from custom implementation of these tasks is guaranteed safety,
so no sandbox needs to be used as in exernal tasks case.
so no sandbox needs to be used as in external tasks case.
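To make the idea of internal tasks more concrete, here is a minimal sketch (in
Python for brevity, although the worker itself is written in C++; the operation
names are made up) of dispatching such operations to plain library calls
instead of external binaries:

```python
import os
import shutil
import zipfile

# map of internal task names to plain library calls; no sandbox and no
# external binaries are involved (the operation names are made up)
INTERNAL_TASKS = {
    "cp":      lambda src, dst: shutil.copy(src, dst),
    "mkdir":   lambda path: os.makedirs(path, exist_ok=True),
    "extract": lambda archive, dst: zipfile.ZipFile(archive).extractall(dst),
}

def run_internal(name, *args):
    """Execute one internal task; unknown names raise KeyError."""
    return INTERNAL_TASKS[name](*args)
```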
For a job evaluation, the tasks needs to be executed sequentialy in a specified
For a job evaluation, the tasks need to be executed sequentially in a specified
order. The idea of running independent tasks in parallel is bad because exact
time measurement needs controled environment on target computer with
minimalization of interrupts by other processes. It seems that connecting tasks
time measurement needs a controlled environment on the target computer with
minimization of interruptions by other processes. It seems that connecting tasks
into directed acyclic graph (DAG) can handle all possible problem cases. None of
the authors, supervisors and involved faculty staff can think of a problem that
cannot be decomposed into tasks connected in a DAG. The goal of evaluation is
to satisfy as many tasks as possible. During execution there are sometimes
multiple choices of next task. To control that, each task can have a priority,
which is used as a secondary ordering criterium. For better understanding, here
which is used as a secondary ordering criterion. For better understanding, here
is a small example.
![Task serialization](https://github.com/ReCodEx/wiki/raw/master/images/Assignment_overview.png)
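To complement the picture, the following minimal sketch (Python for brevity,
with made-up task names, dependencies and priorities; it is not the actual
worker code) shows how tasks connected in a DAG can be ordered for sequential
execution, with the priority used as a secondary criterion when several tasks
are ready at once (assuming a higher number means a higher priority):

```python
import heapq

def order_tasks(tasks):
    """tasks: dict of task id -> {"deps": [...], "priority": int}."""
    remaining = {tid: set(spec["deps"]) for tid, spec in tasks.items()}
    # tasks whose dependencies are satisfied, ordered by priority (higher first)
    ready = [(-tasks[tid]["priority"], tid)
             for tid, deps in remaining.items() if not deps]
    heapq.heapify(ready)
    order = []
    while ready:
        _, tid = heapq.heappop(ready)
        order.append(tid)
        for other, deps in remaining.items():
            if tid in deps:
                deps.discard(tid)
                if not deps:
                    heapq.heappush(ready, (-tasks[other]["priority"], other))
    return order

# made-up example: compilation first, then two tests where "test-1" has a
# higher priority than "test-2"
print(order_tasks({
    "compile": {"deps": [], "priority": 1},
    "test-1":  {"deps": ["compile"], "priority": 2},
    "test-2":  {"deps": ["compile"], "priority": 1},
}))   # -> ['compile', 'test-1', 'test-2']
```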
@ -614,7 +615,7 @@ reasonable, to keep this piece of information alongside the tasks in job
configuration, so each task can have a label about its purpose. Unlabeled tasks
have an internal type _inner_. There are four categories of tasks:
- _initiation_ -- setting up the environment, compilling code, etc.; for users
- _initiation_ -- setting up the environment, compiling code, etc.; for users
failure means error in their sources which are not compatible with running it
with examination data
- _execution_ -- running the user code with examination data, must not exceed
@ -625,11 +626,11 @@ have an internal type _inner_. There are four categories of tasks:
- _inner_ -- no special meaning for frontend, technical tasks for fetching and
copying files, creating directories, etc.
Each job is composed of multiple tasks of these types which are semanticaly
grupped into tests. A test can represent one set of examination data for user
code. To mark the groupping, another task label can be used. Each test must have
Each job is composed of multiple tasks of these types which are semantically
grouped into tests. A test can represent one set of examination data for user
code. To mark the grouping, another task label can be used. Each test must have
exactly one _evaluation_ task (to show success or failure to users) and
arbitraty number of tasks with other types.
an arbitrary number of tasks with other types.
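The grouping can be illustrated with the following sketch (Python for brevity;
the field names are only illustrative and do not reflect the real configuration
syntax), including a check of the rule that each test has exactly one
evaluation task:

```python
from collections import defaultdict

# made-up task list: a global compilation task plus two tests (A, B), each
# with exactly one evaluation task
tasks = [
    {"id": "compile", "type": "initiation", "test": None},
    {"id": "run-A",   "type": "execution",  "test": "A"},
    {"id": "judge-A", "type": "evaluation", "test": "A"},
    {"id": "fetch-B", "type": "inner",      "test": "B"},
    {"id": "run-B",   "type": "execution",  "test": "B"},
    {"id": "judge-B", "type": "evaluation", "test": "B"},
]

def every_test_has_one_evaluation(tasks):
    test_ids = {t["test"] for t in tasks if t["test"] is not None}
    counts = defaultdict(int)
    for task in tasks:
        if task["test"] is not None and task["type"] == "evaluation":
            counts[task["test"]] += 1
    return all(counts[test] == 1 for test in test_ids)

print(every_test_has_one_evaluation(tasks))  # True
```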
## Implementation analysis
@ -666,7 +667,7 @@ messages even during execution. So worker has to be divided into two separate
parts, the one which will handle communication with broker and the another which
will execute jobs. The easiest solution is to have these parts in separate
threads which somehow tightly communicate with each other. For in-process
commucation there can be used numerous technologies, from shared memory to
communication numerous technologies can be used, from shared memory to
condition variables or some kind of in-process messages. The already used ZeroMQ
library can provide in-process messages working on the same principles as
network communication, which is quite handy and solves problems with thread
@ -674,11 +675,22 @@ synchronization and such.
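The following minimal sketch shows the idea of two threads in one process
communicating through ZeroMQ in-process sockets (Python with pyzmq for brevity,
although the worker itself is written in C++; the socket type, endpoint name
and messages are illustrative):

```python
import threading
import zmq

context = zmq.Context.instance()

def execution_part():
    # the execution thread receives job identifiers and processes them
    sock = context.socket(zmq.PAIR)
    sock.connect("inproc://jobs")
    while True:
        job_id = sock.recv_string()
        if job_id == "quit":
            break
        # ... here the job would be evaluated ...
        sock.send_string("done " + job_id)
    sock.close()

listening = context.socket(zmq.PAIR)
listening.bind("inproc://jobs")        # bind before the other thread connects
thread = threading.Thread(target=execution_part)
thread.start()

listening.send_string("job-42")        # made-up job identifier
print(listening.recv_string())         # -> "done job-42"
listening.send_string("quit")
thread.join()
listening.close()
context.term()
```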
At this point we have a worker with two internal parts, a listening one and an execution one. Implementation of the first one is quite straightforward and clear, so let us discuss what should be happening in the execution subsystem. Jobs as work units can vary quite a lot and do completely different things, which means the configuration and the worker have to be prepared for this kind of generality. The configuration and its solution were already discussed above; the implementation in the worker is then also quite straightforward. The worker has internal structures into which it loads and in which it stores the metadata given in the configuration. The whole job is mapped to a job metadata structure and tasks are mapped to either external or internal ones (internal commands have to be defined within the worker); the two kinds differ in whether they are executed in the sandbox or as internal worker commands.
Another division of tasks is by task-type field in configuration. This field can have four values: initiation, execution, evaluation and inner. All was discussed and described above in configuration analysis. What is important to worker is how to behave if execution of task with some particular type fails. There are two possible situations execution fails due to bad user solution or due to some internal error. If execution fails on internal error solution cannot be declared overally as failed. User should not be punished for bad configuration or some network error. This is where task types are usefull. Generally initiation, execution and evaluation are tasks which are somehow executing code which was given by users who submitted solution of exercise. If this kinds of tasks fail it is probably connected with bad user solution and can be evaluated. But if some inner task fails solution should be re-executed, in best case scenario on different worker. That is why if inner task fails it is sent back to broker which will reassign job to another worker. More on this subject should be discussed in broker assigning algorithms section.
Another division of tasks is by the task-type field in the configuration. This field can have four values: initiation, execution, evaluation and inner. All of this was discussed and described above in the configuration analysis. What is important to the worker is how to behave if execution of a task with some particular type fails. There are two possible situations: execution fails due to a bad user solution or due to some internal error. If execution fails on an internal error, the solution cannot be declared as failed overall; the user should not be punished for a bad configuration or some network error. This is where task types are useful. Generally initiation, execution and evaluation are tasks which somehow execute code given by the users who submitted the solution of an exercise. If these kinds of tasks fail, it is probably connected with a bad user solution and the solution can be evaluated. But if some inner task fails, the solution should be re-executed, in the best case scenario on a different worker. That is why if an inner task fails, the job is sent back to the broker which will reassign it to another worker. More on this subject is discussed in the broker assigning algorithms section.
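A minimal sketch of this failure-handling rule (illustrative only, not the
actual worker code) might look as follows:

```python
def run_job(tasks, execute):
    """tasks: list of dicts with a "type" key; execute returns True on success."""
    for task in tasks:
        if execute(task):
            continue
        if task["type"] == "inner":
            return "reassign"      # internal error, not the user's fault
        task["failed"] = True      # counts against the submitted solution
    return "finished"
```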
There is also a question about the working directory or directories of a job, which directories should be used and what for. There is one simple answer to this: every job will have only one specified directory which will contain every file with which the worker will work in the scope of the whole job execution. This is of course nonsense, there has to be some logical division. The least which must be done are two folders, one for internal temporary files and a second one for evaluation. The directory for temporary files is enough to cover all kinds of internal work with the filesystem, but only one directory for the whole evaluation is somehow not enough. Users' solutions are downloaded in the form of zip archives, so why should these be present during execution, and why should the results and files which are to be uploaded back to the fileserver be cherry-picked from one big directory? The answer is of course another logical division into subfolders. The solution which was chosen in the end is to have folders for the downloaded archive, the decompressed solution, the evaluation directory in which the user solution is executed, and then folders for temporary files and for results and generally files which should be uploaded back to the fileserver with the solution results. Of course there has to be a hierarchy which separates folders from different workers on the same machines. That is why paths to directories are in the format ${DEFAULT}/${FOLDER}/${WORKER_ID}/${JOB_ID}, where default means the default working directory of the whole worker and folder is a particular directory for some purpose (archives, evaluation...). The mentioned division of job directories proved to be flexible and detailed enough, everything is in logical units and where it is supposed to be, which means that searching through this system should be easy. In addition, if solutions of users have access only to the evaluation directory, then they do not have access to unnecessary files, which is better for the overall security of the whole ReCodEx.
As we discovered above worker has job directories but users who are writing and managing job configurations do not know where they are (on some particular worker) and how they can be accessed and written into configuration. For this kind of task we have to introduce some kind of marks or signs which will represent particular folders. Marks or signs can have form of some kind of special strings which can be called variables. These variables then can be used everywhere where filesystems paths are used within configuration file. This will solve problem with specific worker environment and specific hierarchy of directories. Final form of variables is ${...} where triple dot is textual description. This format was used because of special dolar sign character which cannot be used within filesystem path, braces are there only to border textual description of variable.
As we discovered above, the worker has job directories, but users who are
writing and managing job configurations do not know where they are (on some
particular worker) and how they can be accessed and written into the
configuration. For this kind of task we have to introduce some kind of marks or
signs which will represent particular folders. These marks or signs can have
the form of special strings which can be called variables. Such variables can
then be used everywhere where filesystem paths are used within the
configuration file. This will solve the problem with the specific worker
environment and the specific hierarchy of directories. The final form of
variables is ${...} where the triple dot is a textual description. This format
was used because of the special dollar sign character which cannot be used
within a filesystem path; the braces are there only to delimit the textual
description of the variable.
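A minimal sketch of such variable expansion (Python for brevity; the variable
names and paths are only illustrative):

```python
import re

def expand(template, variables):
    """Replace every ${NAME} in the template with its value."""
    return re.sub(r"\$\{([^}]+)\}", lambda m: variables[m.group(1)], template)

variables = {                      # made-up values for one particular worker
    "WORKER_ID": "1",
    "JOB_ID": "42",
    "EVAL_DIR": "/var/recodex/worker/eval/1/42",
}
print(expand("${EVAL_DIR}/solution.out", variables))
# -> /var/recodex/worker/eval/1/42/solution.out
```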
#### Evaluation
@ -691,7 +703,7 @@ Interesting problem is with supplementary files (inputs, sample outputs). There
As described in the fileserver section, stored supplementary files have special
filenames which reflect hashes of their content. As such there are no
duplicates stored in the fileserver. The worker can use this feature too and cache these
files for some while and saves precious bandwith. This means there has to be
files for a while and save precious bandwidth. This means there has to be a
system which can download a file, store it in the cache and after some time of
inactivity delete it. Because there can be multiple worker instances on some
particular server it is not efficient to have this system in every worker on its
@ -712,22 +724,34 @@ system.
The cleaner, as mentioned, is a simple script which is executed regularly as a cron job. If there is a caching system like the one introduced in the paragraph above, there are few possibilities how the cleaner should be implemented. On various filesystems there is usually support for two particular timestamps, `last access time` and `last modification time`. Files in the cache are downloaded once and then just copied, which means that the last modification time is set only once on creation of the file and the last access time should be set every time on copy. This implies that the last access time is what is needed here. But while the last modification time is widely used by operating systems, the last access time is not enabled by default. More on this subject can be found [here](https://en.wikipedia.org/wiki/Stat_%28system_call%29#Criticism_of_atime). For proper cleaner functionality, the filesystem which is used by the worker for caching has to have the last access time for files enabled.
Having cleaner as separated component and caching itself handled in worker is kind of blury and is not clearly observable that it works without any race conditions. The goal here is not to have system without races but to have system which can recover from them. Implementation of caching system is based upon atomic operations of underlying filesystem. Follows description of one possible robust implementation. First start with worker implementation:
Having the cleaner as a separate component and the caching itself handled in
the worker is kind of blurry, and it is not clearly observable that it works
without any race conditions. The goal here is not to have a system without
races but to have a system which can recover from them. The implementation of
the caching system is based upon atomic operations of the underlying
filesystem. A description of one possible robust implementation follows,
starting with the worker implementation:
- worker discovers fetch task which should download supplementary file
- worker takes name of file and tries to copy it from cache folder to its working folder
- if successful then last access time should be rewritten (by filesystem itself) and whole operation is done
- worker takes name of file and tries to copy it from cache folder to its
working folder
- if successful then last access time should be rewritten (by filesystem
itself) and whole operation is done
- if not successful then file has to be downloaded
- file is downloaded from fileserver to working folder
- downloaded file is then copied to cache
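The steps above can be sketched as follows (Python for brevity; function and
parameter names are made up):

```python
import os
import shutil

def fetch_file(name, cache_dir, work_dir, download):
    """Try the cache first, fall back to downloading from the fileserver."""
    cached = os.path.join(cache_dir, name)
    target = os.path.join(work_dir, name)
    try:
        shutil.copy(cached, target)    # the copy updates the last access time
        return target
    except FileNotFoundError:
        pass                           # not cached (or deleted by the cleaner)
    download(name, target)             # download from the fileserver
    shutil.copy(target, cached)        # make it available for the next job
    return target
```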
Previous implementation is only within worker, cleaner can anytime intervene and delete files. Implementation in cleaner follows:
The previous implementation exists only within the worker; the cleaner can
intervene at any time and delete files. The implementation in the cleaner follows:
- cleaner on its start stores current reference timestamp which will be used for comparision and load configuration values of caching folder and maximal file age
- there is a loop going through all files and even directories in specified cache folder
- cleaner on its start stores the current reference timestamp which will be
  used for comparison and loads configuration values of the caching folder and
  the maximal file age
- there is a loop going through all files and even directories in specified
cache folder
- last access time of file or folder is detected
- last access time is subtracted from reference timestamp into difference
- difference is compared against specified maximal file age, if difference is greater, file or folder is deleted
- difference is compared against the specified maximal file age; if the
  difference is greater, the file or folder is deleted
The previous description implies that there is a gap between the detection of the last access time and the deletion of the file within the cleaner. In this gap a worker can access the file while the file is deleted anyway, but this is fine: the file is deleted, but the worker has already copied it. Another problem can be two workers downloading the same file, but this is also not a problem, since a file is first downloaded to the working folder and only after that copied to the cache. And even if something else unexpectedly fails and the fetch task therefore fails during execution, even that should be fine, because fetch tasks should have the 'inner' task type, which implies that a failure in this task will stop the whole execution and the job will be reassigned to another worker. This serves as the last salvation in case everything else goes wrong.
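For illustration, a minimal sketch of such a cleaner follows (Python; the
cache path and the age limit are placeholder configuration values, not the
real ones):

```python
import os
import shutil
import time

CACHE_DIR = "/var/recodex/cache"   # placeholder cache folder
MAX_AGE = 24 * 3600                # placeholder maximal file age in seconds

now = time.time()                  # reference timestamp taken at start
for entry in os.listdir(CACHE_DIR):
    path = os.path.join(CACHE_DIR, entry)
    # delete files and folders whose last access time is too old
    if now - os.path.getatime(path) > MAX_AGE:
        if os.path.isdir(path):
            shutil.rmtree(path, ignore_errors=True)
        else:
            os.remove(path)
```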
@ -749,7 +773,7 @@ Previous description implies that there is gap between detection of last access
Users want to view real time evaluation progress of their solution. It can be
easily done with established double-sided connection stream, but it is hard to
achive with web technologies. HTTP protocol works differently on separate
achieve with web technologies. HTTP protocol works differently on separate
requests basis with no long-term connection. However, there is a widely used
technology to solve this problem, the WebSocket protocol.
@ -761,7 +785,7 @@ surface for possible attacks. With this in mind, there are two possible options:
- make separate component for progress messages
Each of the two possibilities has some pros and cons. The first one is good
beacuse there is no additional component and API is already publicly visible. On
because there is no additional component and API is already publicly visible. On
the other side, working with the WebSocket protocol from PHP is not very
pleasant (but it is possible) and embedding this functionality into the API is
not extendable. The second approach is better for future changes of the protocol or
@ -784,7 +808,7 @@ following picture.
![Message flow inside monitor](https://raw.githubusercontent.com/ReCodEx/wiki/master/images/Monitor_arch.png)
The message channel feeding the monitor uses ZeroMQ, the main message framework
used by backend. This decission keeps rest of backend avare of used
used by the backend. This decision keeps the rest of the backend aware of the used
communication protocol and related libraries. Output channel is WebSocket as a
protocol for sending messages to web browsers. In Python, there are several
WebSocket libraries. The most popular one is `websockets` in cooperation with
@ -792,7 +816,7 @@ WebSocket libraries. The most popular one is `websockets` in cooperation with
monitor component too. For ZeroMQ, there is `zmq` library with binding to
framework core in C++.
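A minimal sketch of this bridging idea follows (it assumes a recent version of
the `websockets` library and the asyncio support of pyzmq; socket types, port
numbers and the message format are illustrative, not the actual monitor
protocol):

```python
import asyncio
import websockets
import zmq, zmq.asyncio

ctx = zmq.asyncio.Context()
queues = {}                                  # job id -> queue of messages

async def zmq_loop():
    # receive progress messages from the backend
    sock = ctx.socket(zmq.PULL)
    sock.bind("tcp://*:7894")
    while True:
        job_id, message = await sock.recv_multipart()
        queues.setdefault(job_id, asyncio.Queue()).put_nowait(message)

async def ws_handler(websocket):
    # the browser names the job it wants to follow, then receives its messages
    job_id = (await websocket.recv()).encode()
    queue = queues.setdefault(job_id, asyncio.Queue())
    while True:
        await websocket.send(await queue.get())

async def main():
    async with websockets.serve(ws_handler, "0.0.0.0", 4567):
        await zmq_loop()

asyncio.run(main())
```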
Incomming messages are cached for short period of time. Early testing shows,
Incoming messages are cached for a short period of time. Early testing showed
that the backend can start sending progress messages sooner than the client connects to
the monitor. To solve this, messages for each job are held for 5 minutes after
reception of the last message. The client gets all already received messages at the time
@ -816,7 +840,7 @@ client-server architecture. There are several options:
a standard.
- *HTTP protocol* -- The HTTP protocol is a state-less protocol implemented on
top of the TCP protocol. The communication between the client and server
consists of a requests sent by the client and reponses to these requests sent
consists of requests sent by the client and responses to these requests sent
back by the server. The client can send as many requests as needed and it may
ignore the responses from the server, but the server must respond only to the
requests of the client and it cannot initiate communication on its own.
@ -858,7 +882,8 @@ We considered several technologies which could be used:
Linux servers (ASP.NET using the .NET Core).
- JavaScript (Node.js) -- it is a quite new technology and it is being used to
create REST APIs lately. Applications running on Node.js are quite performant
and the number of open-source libraries avialble on the Internet is very huge.
and the number of open-source libraries available on the Internet is very
large.
We chose PHP and Apache mainly because we were familiar with these technologies
and we were able to develop all the features we needed without learning to use a
@ -879,7 +904,7 @@ framework is very common in the Czech Republic -- its main developer is a
well-known Czech programmer David Grudl -- and we were already familiar with the
patterns used in this framework (e.g., dependency injection, authentication,
routing). There is a good extension for the Nette framework which makes usage of
Doctrine 2 very straighforward.
Doctrine 2 very straightforward.
@todo: what database can be used, how it is mapped and used within code
