typos, grammar, wording

8 years ago · ebb1892c74
parent 4ffae99721
commit ebb1892c74
1 changed files with 137 additions and 124 deletions
--- a/Rewritten-docs.md
+++ b/Rewritten-docs.md
@@ -487,7 +487,7 @@ everybody seems satisfied with this fact. There are other communicating channels
 most programmers use, such as e-mail or git, but they are inappropriate for
 designing user interfaces on top of them.
-The application interacts with users. From the project assignment it is clear,
+The application interacts with users. From the project assignment it is clear
 that the system has to keep personalized data about users and adapt presented
 content according to this knowledge. User data cannot be publicly visible, which
 implies necessity of user authentication. The application also has to support
@@ -542,16 +542,16 @@ ReCodEx it is possible to offer hosted environment as a service to other
 subjects. 
 The concept we came up with is based on user and group separation inside the
-system. There are multiple _instances_ in the system, which means unit of
+system. The system is divided into multiple separated units called _instances_.
-separation. Each instance has own set of users and groups, exercises can be
+Each instance has its own set of users and groups. Exercises can be optionally
-optionally shared. Evaluation backend is common for all instances. To keep track
+shared. The rest of the system (API server and evaluation backend) is shared
-of active instances and paying customers, each instance must have a valid
+between the instances. To keep track of active instances and paying customers,
-_licence_ to allow users submit their solutions. licence is granted for defined
+each instance must have a valid _licence_ to allow users to submit their solutions.
-period of time and can be revoked in advance if the subject do not keep approved
+A licence is granted for a definite period of time and can be revoked in advance
-terms and conditions.
+if the subject does not conform to the approved terms and conditions.
-
+
-The primary task of the system is to evaluate programming exercises. The
+The primary task of the system is to evaluate programming exercises. An
-exercise is quite similar to homework assignment during school labs. When a
+exercise is quite similar to a homework assignment during school labs. When a
 homework is assigned, two things are important to know for users:
 - description of the problem
@@ -560,11 +560,11 @@ homework is assigned, two things are important to know for users:
 To reflect this idea teachers and students are already familiar with, we decided
 to keep separation between problem itself (_exercise_) and its _assignment_.
 Exercise only describes one problem and provides testing data with description
-of how to evaluate it. In fact, it is template for assignments.  Assignment then
+of how to evaluate it. In fact, it is a template for assignments.  Assignment
-contains data from its exercise and additional metadata, which can be different
+then contains data from its exercise and additional metadata, which can be
-for every assignment of the same exercise. This separation is natural for all
+different for every assignment of the same exercise. This separation is natural
-users, in CodEx it is implemented in similar way and no other considerable
+for all users; in CodEx it is implemented in a similar way and no other
-solution was found.
+considerable solution was found.
 ### Evaluation unit executed by ReCodEx
@@ -577,36 +577,41 @@ scratch is needed.
 There are two main approaches to design a complex execution configuration. It
 can be composed of small amount of relatively big components or much more small
-tasks. Big components are easy to write and whole configuration is reasonably
+tasks. Big components are easy to write and help keep the configuration
-small. The components are designed for current problems, so it is not scalable
+reasonably small. However, these components are designed for current problems
-enough for pleasant future usage. This can be solved by introducing small set of
+and they might not hold well against future requirements. This can be solved by
-single-purposed tasks which can be composed together. The whole configuration is
+introducing a small set of single-purposed tasks which can be composed together.
-then quite bigger, but with great adaptation ability for new conditions and also
+The whole configuration becomes bigger, but more flexible for new conditions.
-less amount of work programming them. For better user experience, configuration
+Moreover, they will not require as much programming effort as bigger evaluation
-generators for some common cases can be introduced.
+units. For better user experience, configuration generators for some common
-
+cases can be introduced.
-ReCodEx target is to be continuously developed and used for many years, so the
+
-smaller tasks are the right choice. Observation of CodEx system shows that
+A goal of ReCodEx is to be continuously developed and used for many years.
-only a few tasks are needed. In extreme case, only one task is enough -- execute
+Therefore, we chose to use smaller tasks, because this approach is better for
-a binary. However, for better portability of configurations along different
+future extensibility. Observation of CodEx system shows that only a few tasks
-systems it is better to implement reasonable subset of operations directly
+are needed. In an extreme case, only one task is enough -- execute a binary.
-without calling system provided binaries. These operations are copy file, create
+However, for better portability of configurations between different systems it
-new directory, extract archive and so on, altogether called internal tasks.
+is better to implement a reasonable subset of operations ourselves without
-Another benefit from custom implementation of these tasks is guarantied safety,
+calling binaries provided by the system directly. These operations are copy
-so no sandbox needs to be used as in external tasks case.
+file, create new directory, extract archive and so on, altogether called
-
+internal tasks. Another benefit from custom implementation of these tasks is
-For a job evaluation, the tasks needs to be executed sequentially in a specified
+guaranteed safety, so no sandbox needs to be used, unlike for external tasks.
-order. The idea of running independent tasks in parallel is bad because exact
+
-time measurement needs controlled environment on target computer with
+For a job evaluation, the tasks need to be executed sequentially in a specified
-minimization of interrupts by other processes. It would be possible to run tasks
+order. Running independent tasks in parallel is possible, but there are complications --
-which does not need exact time measuremet in parallel, but in this case a
+exact time measurement requires a controlled environment with as few
+interruptions as possible from other processes. It would be possible to run
+tasks that do not need exact time measurement in parallel, but in this case a
 synchronization mechanism has to be developed to exclude parallelism for
 measured tasks. Usually, there are about four times more unmeasured tasks than
-tasks with time measurement, but measured tasks tends to be much longer. With
+tasks with time measurement, but measured tasks tend to be much longer. With
 [Amdahl's law](https://en.wikipedia.org/wiki/Amdahl's_law) in mind, the
-parallelism seems not to provide a huge benefit in overall execution speed and
+parallelism does not seem to provide a notable benefit in overall execution
-brings troubles with synchronization. However, it there will be speed issues,
+speed and brings trouble with synchronization. Moreover, most of the internal
-this approach could be reconsiderred.
+tasks are also limited by IO speed (most notably copying and downloading files
+and reading archives). However, if there are performance issues, this approach
+could be reconsidered, along with using a RAM disk for storing supplementary
+files.
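The Amdahl's law argument in this paragraph can be checked with a few lines of arithmetic. The following is a minimal Python sketch; the function name `amdahl_speedup` and the example numbers are illustrative assumptions, not measurements taken from ReCodEx.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Upper bound on the speedup of a job according to Amdahl's law.

    parallel_fraction: share of the total run time spent in tasks that may
                       run in parallel (here: tasks without time measurement)
    workers:           number of tasks that can run at the same time
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)
```

For instance, if unmeasured tasks made up 20 % of the run time of a job (an assumed figure) and were spread over four threads, `amdahl_speedup(0.2, 4)` gives a bound of roughly 1.18, which matches the modest benefit argued for in the text.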
 It seems that connecting tasks into directed acyclic graph (DAG) can handle all
 possible problem cases. None of the authors, supervisors and involved faculty
@@ -618,7 +623,7 @@ For better understanding, here is a small example.
 ![Task serialization](https://github.com/ReCodEx/wiki/raw/master/images/Assignment_overview.png)
-The _job root_ task is imaginary single starting point of each job. When the
+The _job root_ task is an imaginary single starting point of each job. When the
 _CompileA_ task is finished, the _RunAA_ task is started (or _RunAB_, but should
 be deterministic by position in configuration file -- tasks stated earlier
 should be executed earlier). The task priorities guarantee that after
@@ -634,13 +639,13 @@ clean the big temporary file and proceed with following test. If there is an
 ambiguity in task ordering at this point, they are executed in order of input
 task configuration.
-The total linear ordering of tasks can be done easier with just executing them
+A total linear ordering of tasks can be obtained more easily by just executing them
-in order of input configuration. But this structure cannot handle well cases,
+in order of input configuration. But this structure does not cope well when a
-when a task fails. There is not a easy and nice way how to tell which task
+task fails. There is no easy way of telling which task should be
-should be executed next. However, this issue can be solved with graph structured
+executed next. However, this issue can be solved with graph structured
 dependencies of the tasks. In graph structure, it is clear that all dependent
-tasks has to be skipped and continue execution with a non related task. This is
+tasks have to be skipped and execution must be resumed with an unrelated task.
-the main reason, why the tasks are connected in a DAG.
+This is the main reason why the tasks are connected in a DAG.
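The failure handling described above can be illustrated with a minimal Python sketch. It is not the worker's actual implementation: the function names, the `deps` dictionary and the `execute` callback are assumptions made for this example, and task priorities are left out, so ties are broken only by position in the configuration.

```python
import heapq


def topological_order(tasks, deps):
    """Order tasks so that dependencies come first; ties follow config order."""
    index = {name: i for i, name in enumerate(tasks)}
    remaining = {name: len(deps.get(name, [])) for name in tasks}
    dependents = {name: [] for name in tasks}
    for name in tasks:
        for parent in deps.get(name, []):
            dependents[parent].append(name)
    # Heap of configuration positions of tasks whose dependencies are done.
    ready = [index[name] for name in tasks if remaining[name] == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        name = tasks[heapq.heappop(ready)]
        order.append(name)
        for child in dependents[name]:
            remaining[child] -= 1
            if remaining[child] == 0:
                heapq.heappush(ready, index[child])
    if len(order) != len(tasks):
        raise ValueError("task dependencies contain a cycle")
    return order


def run_job(tasks, deps, execute):
    """Run tasks one by one; skip every task that depends on a failed one."""
    status = {}
    for name in topological_order(tasks, deps):
        if any(status[parent] != "ok" for parent in deps.get(name, [])):
            status[name] = "skipped"  # a dependency failed or was skipped
        else:
            status[name] = "ok" if execute(name) else "failed"
    return status
```

With tasks `CompileA`, `RunAA`, `RunAB` and two hypothetical unrelated tasks `CompileB` and `RunBA`, a failing `CompileA` causes `RunAA` and `RunAB` to be skipped while `CompileB` and `RunBA` still run.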
 For grading there are several important tasks. First, tasks executing submitted
 code need to be checked for time and memory limits. Second, outputs of judging
@@ -719,11 +724,11 @@ them.
 To avoid assigning points for insufficient solutions (like only printing "File
 error" which is the valid answer in two tests), a minimal point threshold can be
-specified. It the solution is to get less points than specified, it will get
+specified. If the solution is to get fewer points than specified, it will get
 zero points instead. This functionality can be embedded into grading computation
 algorithm itself, but it would have to be present in each implementation
-separately, which is not maintainable. Because of this the the threshold feature
+separately, which is not maintainable. Because of this, the threshold feature is
-is separated from point computation.
+separated from score computation.
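The separation of the threshold from score computation can be sketched as follows. This is a hedged illustration only: the function name `apply_threshold`, the assumption that the score is a number between 0 and 1, and the rounding down to whole points are choices made for this example, not details given in the text.

```python
import math


def apply_threshold(score: float, max_points: int, threshold: int) -> int:
    """Turn a score into points, giving zero when the threshold is not met.

    score:      result of any score calculator, assumed to be in [0, 1]
    max_points: points awarded for a fully correct solution
    threshold:  minimal number of points a solution has to reach
    """
    points = math.floor(score * max_points)  # assumed rounding rule
    return points if points >= threshold else 0
```

Because the function only takes the final score as input, any score calculator can be swapped in without reimplementing the threshold.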
 Automatic grading cannot reflect all aspects of submitted code. For example,
 structuring the code, number and quality of comments and so on. To allow
@@ -744,16 +749,16 @@ previous chapter, there are also text or binary outputs of the executed tasks.
 Knowing them helps users identify and solve their potential issues, but on the
 other hand this can lead to possibility of leaking input data. This may lead
 students to hack their solutions to pass just the ReCodEx testing cases instead
-of properly solving the assigned problem. The usual approach is to keep these
+of properly solving the assigned problem. The usual approach is to keep this
-information private and so does strongly recommended Martin Mareš, who has
+information private. This was also strongly recommended by Martin Mareš, who has
 experience with several programming contests.
-The only one exception of hiding the logs are compilation outputs, which can
+The only exception to hiding the logs is the compilation output, which can
-help students a lot during troubleshooting and there is only small possibility
+help students a lot during troubleshooting and there is only a small possibility
 of input data leakage. The supervisors have access to all of the logs and they
 can decide if students are allowed to see the compilation outputs.
-Note, that due to lack of frontend developers, showing compilation logs to the
+Note that due to lack of frontend developers, showing compilation logs to the
 students is not implemented in the very first release of ReCodEx.
 ### Persistence
@@ -768,29 +773,29 @@ factor. There are several ways how to save structured data:
 - relational database
 Another important factor is amount and size of stored data. Our guess is about
-1000 users, 100 exercises, 200 assignments per year and 200000 unique solutions
+1000 users, 100 exercises, 200 assignments per year and 20000 unique solutions
 per year. The data are mostly structured and there are a lot of them with the
 same format. For example, there is a thousand of users and each one has the same
-values -- name, email, age, etc. These kind of data are relatively small, name
+values -- name, email, age, etc. These data items are relatively small, name
 and email are short strings, age is an integer. Considering this, relational
-databases or formatted plain files (CSV for example) fits best for them.
+databases or formatted plain files (CSV for example) fit best for them.
-However, the data often have to support find operation, so they have to be
+However, the data often have to support searching, so they have to be
-sorted and allow random access for resolving cross references. Also, addition a
+sorted and allow random access for resolving cross references. Also, addition
-deletion of entries should take reasonable time (at most logarithmic time
+and deletion of entries should take reasonable time (at most logarithmic time
 complexity in the number of saved values). This practically excludes plain files, so
-relational database is used instead.
+we decided to use a relational database.
-
+
-On the other hand, there are some data with no such great structure and much
+On the other hand, there is data with basically no structure and much larger
-larger size. These can be evaluation logs, sample input files for exercises or
+size. These can be evaluation logs, sample input files for exercises or sources
-submitted sources by students. Saving this kind of data into relational database
+submitted by students. Saving this kind of data into a relational database is
-is not suitable, but it is better to keep them as ordinary files or store them
+not appropriate. It is better to keep them as ordinary files or store them in
-into some kind of NoSQL database. Since they are already files and does not need
+some kind of NoSQL database. Since they are already files and do not need to be
-to be backed up in multiple copies, it is easier to keep them as ordinary files
+backed up in multiple copies, it is easier to keep them as ordinary files in the
-in filesystem. Also, this solution is more lightweight and does not require
+filesystem. Also, this solution is more lightweight and does not require
-additional dependencies on third-party software. File can be identified using
+additional dependencies on third-party software. Files can be identified using
-its filesystem path or unique index stored as value in relational database. Both
+their filesystem paths or a unique index stored as a value in a relational
-approaches are equally good, final decision depends on actual case.
+database. Both approaches are equally good; the final decision depends on the actual
-
+implementation.
 ## Structure of the project
@@ -804,7 +809,7 @@ working as expected.
 ### Backend
-Backend is the part which is responsible solely for the process of evaluation
+Backend is the part which is responsible solely for the process of evaluating
 a solution of an exercise. Each evaluation of a solution is referred to as a
 *job*. For each job, the system expects a configuration document of the job,
 supplementary files for the exercise (e.g., test inputs, expected outputs,
@@ -814,40 +819,46 @@ job, such as a specific runtime environment, specific version of a compiler or
 the job must be evaluated on a processor with a specific number of cores. The
 backend infrastructure decides whether it will accept a job or decline it based
 on the specified requirements. In case it accepts the job, it will be placed in
-a queue and it will be processed as soon as possible. The backend publishes the
+a queue and it will be processed as soon as possible. 
-progress of processing of the queued jobs and the results of the evaluations can
+
-be queried after the job processing is finished. The backend produces a log of
+The backend publishes the progress of processing of the queued jobs and the
-the evaluation which can be used for further score calculation or debugging.
+results of the evaluations can be queried after the job processing is finished.
+The backend produces a log of the evaluation which can be used for further score
+calculation or debugging.
 To make the backend scalable, there are two necessary components -- the one
 which will execute jobs and the other which will distribute jobs to the
 instances of the first one. This ensures scalability in manner of parallel
-execution of numerous jobs. Implementation of these services are called
+execution of numerous jobs, which is exactly what is needed. Implementations of
-**broker** and **worker**, the first one handles distribution, the latter one
+these services are called **broker** and **worker**; the first one handles
-execution. These components could handle the whole evaluation process, but for
+distribution, the other one handles execution.
-cleaner design and better communication gateways with frontend two other
+
-components were added, **fileserver** and **monitor**. Fileserver is simple
+These components should be enough to fulfill all tasks mentioned above, but for
-component whose purpose is to store files which are exchanged between frontend
+the sake of simplicity and better communication gateways with the frontend, two
-and backend. Monitor is a simple service which is able to serve job progress
+other components were added -- **fileserver** and **monitor**. Fileserver is a
-state from worker to web application. These two additional components are on
+simple component whose purpose is to store files which are exchanged between
-the edge of frontend and backend (like gateways) but logically they are more
+frontend and backend. Monitor is also quite a simple service which is able to
-connected with backend, so it is considered they belong there.
+forward job progress data from worker to web application. These two additional
+services are at the border between frontend and backend (like gateways) but
+logically they are more connected with backend, so they are considered to belong
+there.
 ### Frontend
-Frontend on the other hand is responsible for providing users a convenient
+Frontend on the other hand is responsible for providing users with convenient
 access to the backend infrastructure and interpreting raw data from backend
-evaluation. There are two main purposes of frontend -- holding the state of
+evaluation. 
-whole system (database of users, exercises, solutions, points, etc.) and
+
-presenting the state to users through some kind of an user interface (e.g., a
+There are two main purposes of the frontend -- holding the state of the whole
-web application, mobile application, or a command-line tool). According to
+system (database of users, exercises, solutions, points, etc.) and presenting
-contemporary trends in development of frontend parts of applications, we
+the state to users through some kind of a user interface (e.g., a web
-decided to split the frontend in two logical parts -- a server side and a
+application, mobile application, or a command-line tool). According to
-client side. The server side is responsible for managing the state and the
+contemporary trends in development of frontend parts of applications, we decided
-client side gives instructions to the server side based on the inputs from the
+to split the frontend in two logical parts -- a server side and a client side.
-user. This decoupling gives us the ability to create multiple client side tools
+The server side is responsible for managing the state and the client side gives
-which may address different needs of the users with preserving single server
+instructions to the server side based on the inputs from the user. This
-side component.
+decoupling gives us the ability to create multiple client side tools which may
+address different needs of the users.
 The frontend developed as part of this project is a web application created with
 the needs of the Faculty of Mathematics and Physics of the Charles university in
@@ -870,7 +881,7 @@ fully accurate.
 ![Overall architecture](https://github.com/ReCodEx/wiki/blob/master/images/Overall_Architecture.png)
-In the latter parts of the documentation, both of the backend and frontend parts
+In the following parts of the documentation, both the backend and frontend parts
 will be introduced separately and covered in more detail. The communication
 protocol between these two logical parts will be described as well.
@@ -935,15 +946,16 @@ However, all of the three options would have been possible to use.
 ### File transfers
-There has to be a way to access files stored on the fileserver from both worker
+There has to be a way to access files stored on the fileserver (and also upload
-and frontend server machines. The protocol used for this should handle large
+them) from both worker and frontend server machines. The protocol used for this
-files efficiently and be resilient to network failures. Security features are
+should handle large files efficiently and be resilient to network failures.
-not a primary concern, because all communication with the fileserver will happen
+Security features are not a primary concern, because all communication with the
-in an internal network. However, a basic form of authentication can be useful to
+fileserver will happen in an internal network. However, a basic form of
-ensure correct configuration (if a development fileserver uses different
+authentication can be useful to ensure correct configuration (if a development
-credentials than production, production workers will not be able to use it by
+fileserver uses different credentials than production, production workers will
-accident). Lastly, the protocol must have a client library for platforms
+not be able to use it by accident). Lastly, the protocol must have a client
-(languages) used in the backend. We will present some of the possible options:
+library for platforms (languages) used in the backend. We will present some of
+the possible options:
 - HTTP(S) -- a de-facto standard for web communication that has far more
  features than just file transfers. Thanks to being used on the web, a large
@@ -1110,15 +1122,15 @@ services, for example via HTTP.
 ### Worker
-Worker is component which is supposed to execute incoming jobs from broker. As
+Worker is a component which is supposed to execute incoming jobs from broker. As
 such worker should work and support wide range of different infrastructures and
 maybe even platforms/operating systems. Support of at least two main operating
 systems is desirable and should be implemented.
-Worker as a service does not have to be much complicated, but a bit of complex
+Worker as a service does not have to be very complicated, but a bit of complex
 behaviour is needed. Mentioned complexity is almost exclusively concerned about
 robust communication with broker which has to be regularly checked. Ping
-mechanism is usually used for this in all kind of projects. This means that
+mechanism is usually used for this in all kinds of projects. This means that the
 worker should be able to send ping messages even during execution. So worker has
 to be divided into two separate parts, the one which will handle communication
 with broker and the another which will execute jobs.
@@ -1126,9 +1138,9 @@ with broker and the another which will execute jobs.
 The easiest solution is to have these parts in separate threads which somehow
 tightly communicates with each other. For inter process communication there can
 be used numerous technologies, from shared memory to condition variables or some
-kind of in-process messages. Already used library ZeroMQ is possible to provide
+kind of in-process messages. The ZeroMQ library which we already use provides
-in-process messages working on the same principles as network communication
+in-process messages that work on the same principles as network communication,
-which is quite handy and solves problems with threads synchronization and such.
+which is convenient and solves problems with thread synchronization.
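The split into a communicating part and an executing part can be sketched in Python as follows. This is only an illustration of the idea: the real worker uses ZeroMQ in-process messages, whereas this sketch swaps in the standard `queue` and `threading` modules, and the name `run_worker` and the message tuples are assumptions made for the example.

```python
import queue
import threading


def run_worker(jobs, outbox, ping_interval=0.01):
    """Execute `jobs` on a separate thread while this thread keeps pinging.

    jobs:    list of zero-argument callables (hypothetical stand-ins for jobs)
    outbox:  queue.Queue standing in for the connection to the broker
    """
    finished = threading.Event()

    def execute_all():
        try:
            for job in jobs:
                outbox.put(("result", job()))
        finally:
            finished.set()  # wake the communication loop even if a job raises

    executor = threading.Thread(target=execute_all)
    executor.start()
    # Event.wait returns False on timeout, so one ping is sent per interval
    # for as long as the executor thread is still busy.
    while not finished.wait(timeout=ping_interval):
        outbox.put(("ping", None))
    executor.join()
```

The point of the design is that a long-running job never blocks the pings, because the two parts share nothing except the message channel.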
 #### Evaluation
@@ -1629,13 +1641,14 @@ implemented in some of the next releases.
 ### The WebApp
-The web application ("WebApp") is one of the possible client applications of the ReCodEx
+The web application ("WebApp") is one of the possible client applications of the
-system. Creating a web application as the first client application has several advantages:
+ReCodEx system. Creating a web application as the first client application has
+several advantages:
 - no installation or setup is required on the user's device
 - works on all platforms including mobile devices
-- when a new version is rolled out all the clients will use this version without
+- when a new version is released, all the clients will use this version without
-any need for manula instalation of the update
+any need for manual installation of the update
 One of the downsides is the large number of different web browsers (including
 the older versions of a specific browser) and their different interpretation