Divide monolithic paragraphs into small ones

master
Martin Polanka 8 years ago
parent 56f4298f15
commit d92799206c

@@ -672,12 +672,13 @@ execution and evaluation as a lot of evaluation systems are already providing.
However, ReCodEx has a more advanced execution pipeline where there can be
more compilations or more executions per test and also other technical tasks
controlling the job execution flow. The users do not know about these technical
details and data from these tasks may confuse them.

A solution is to show users only the percentage of completion of the job as a
plain progress bar without additional information about task types. This
solution works well for all of the jobs and is very user friendly. To make the
output more interesting, there is a database of random kind-of-funny statements
and a new random one is displayed every time a task is completed.
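Just to illustrate the idea, a toy sketch follows; the task count and the pool
of statements are made-up placeholders and do not come from the actual ReCodEx
frontend code.

```cpp
// Toy illustration: report only the percentage of finished tasks, plus a
// random "kind-of-funny" statement after each completed task.
#include <cstddef>
#include <iostream>
#include <random>
#include <string>
#include <vector>

int main() {
    const std::size_t total_tasks = 8;  // hypothetical number of tasks in a job
    const std::vector<std::string> statements = {
        "Reticulating splines...",
        "Feeding the hamsters...",
        "Counting to infinity...",
    };

    std::mt19937 rng{std::random_device{}()};
    std::uniform_int_distribution<std::size_t> pick(0, statements.size() - 1);

    for (std::size_t done = 1; done <= total_tasks; ++done) {
        const int percent = static_cast<int>(done * 100 / total_tasks);
        std::cout << "[" << percent << "%] " << statements[pick(rng)] << "\n";
    }
    return 0;
}
```
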
### Results of evaluation
@@ -685,6 +686,8 @@ There are lot of things which deserves discussion concerning results of
evaluation, how they should be displayed, what should be visible or not, and
also what kind of reward for users' solutions should be chosen.

#### Evaluation outputs

At first, let us focus on all kinds of outputs from the executed programs within
a job. It is beyond discussion that supervisors should be able to view almost
all outputs from solutions if they choose them to be visible and recorded. This
feature is
@@ -704,6 +707,8 @@ be visible unless the supervisor decides otherwise. Note, that due to lack of
frontend developers, this feature was not implemented in the very first release
of ReCodEx, but it will definitely be available in the future.

#### Scoring and assigning points

The overall concept of grading solutions was presented earlier. To briefly
recap, the backend returns only exact measured values (used time and memory,
return code of the judging task, ...) and on top of that a single value is
computed.
@@ -1059,114 +1064,132 @@ services, for example via HTTP.
The worker is the component which is supposed to execute incoming jobs from the
broker. As such, the worker should work on and support a wide range of different
infrastructures and maybe even platforms/operating systems. Support of at least
two main operating systems is desirable and should be implemented.

The worker as a service does not have to be very complicated, but a bit of
complex behaviour is needed. The mentioned complexity is almost exclusively
concerned with robust communication with the broker, which has to be regularly
checked. A ping mechanism is usually used for this in all kinds of projects.
This means that the worker should be able to send ping messages even during
execution. So the worker has to be divided into two separate parts, one which
will handle the communication with the broker and another which will execute
jobs.

The easiest solution is to have these parts in separate threads which somehow
tightly communicate with each other. For this internal communication numerous
technologies can be used, from shared memory to condition variables or some kind
of in-process messages. The already used ZeroMQ library can provide in-process
messages working on the same principles as network communication, which is quite
handy and solves problems with thread synchronization and such.
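A minimal sketch of this two-thread layout, assuming the plain libzmq C API and
a PAIR socket over an `inproc://` endpoint; it only illustrates the idea and is
not the actual ReCodEx worker code.

```cpp
// Sketch: a broker-facing thread and an execution thread connected through a
// ZeroMQ in-process PAIR socket, so pings can be answered while a job runs.
#include <zmq.h>
#include <iostream>
#include <string>
#include <thread>

int main() {
    void *ctx = zmq_ctx_new();

    // The broker-facing side binds the in-process endpoint first.
    void *to_execution = zmq_socket(ctx, ZMQ_PAIR);
    zmq_bind(to_execution, "inproc://jobs");

    // Execution thread: waits for a job id, "executes" it and reports back.
    std::thread execution([ctx]() {
        void *from_broker = zmq_socket(ctx, ZMQ_PAIR);
        zmq_connect(from_broker, "inproc://jobs");

        char job[64];
        int len = zmq_recv(from_broker, job, sizeof(job) - 1, 0);
        if (len >= 0) {
            job[len] = '\0';
            std::cout << "executing " << job << std::endl;  // long-running work here
            zmq_send(from_broker, "done", 4, 0);
        }
        zmq_close(from_broker);
    });

    // Meanwhile this thread stays free to talk to the broker; a real worker
    // would poll the broker socket and this socket together (e.g. zmq_poll).
    std::string job_id = "job-42";  // hypothetical identifier
    zmq_send(to_execution, job_id.c_str(), job_id.size(), 0);

    char reply[16];
    int len = zmq_recv(to_execution, reply, sizeof(reply) - 1, 0);
    if (len >= 0) {
        reply[len] = '\0';
        std::cout << "result: " << reply << std::endl;
    }

    execution.join();
    zmq_close(to_execution);
    zmq_ctx_term(ctx);
    return 0;
}
```
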
#### Evaluation

At this point we have a worker with two internal parts, a listening one and an
executing one. The implementation of the first one is quite straightforward and
clear, so let us discuss what should be happening in the execution subsystem.

After the successful arrival of a job, the worker has to prepare a new execution
environment, then the solution archive has to be downloaded from the fileserver
and extracted. The job configuration is located within these files; it is loaded
into internal structures and executed. After that, the results are uploaded back
to the fileserver. These steps are the basic ones which are really necessary for
the whole execution and have to be executed in this precise order.
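Purely as an illustration of that fixed order (the step names are ours, not the
worker's real interface), the sequence and its abort-on-failure behaviour could
be pictured like this:

```cpp
// Illustration of the fixed order of the basic execution steps; every step is
// stubbed out and the real worker of course does much more in each of them.
#include <functional>
#include <iostream>
#include <string>
#include <utility>
#include <vector>

int main() {
    const std::vector<std::pair<std::string, std::function<bool()>>> steps = {
        {"prepare execution environment", [] { return true; }},
        {"download solution archive from fileserver", [] { return true; }},
        {"extract archive", [] { return true; }},
        {"load job configuration", [] { return true; }},
        {"execute tasks", [] { return true; }},
        {"upload results to fileserver", [] { return true; }},
    };

    for (const auto &step : steps) {
        std::cout << step.first << std::endl;
        if (!step.second()) return 1;  // a failed step aborts the rest
    }
    return 0;
}
```
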
#### Job configuration

Jobs as work units can vary quite a lot and do completely different things,
which means that the configuration and the worker have to be prepared for this
kind of generality. The configuration and its solution were already discussed
above; the implementation in the worker is then quite straightforward.

The worker has internal structures into which it loads and stores the metadata
given in the configuration. The whole job is mapped to a job metadata structure
and the tasks are mapped to either external or internal ones (internal commands
have to be defined within the worker); the two kinds differ in whether they are
executed in the sandbox or as internal worker commands.
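A rough sketch of such structures follows; the field names are illustrative and
are not taken from the real worker headers.

```cpp
// Illustrative metadata structures: the whole job maps onto job_metadata and
// every task onto task_metadata (all field names are hypothetical).
#include <string>
#include <vector>

struct task_metadata {
    std::string id;
    bool external = false;              // external tasks run in the sandbox,
                                        // internal ones are worker commands
    std::string command;                // binary to execute or internal command name
    std::vector<std::string> arguments;
};

struct job_metadata {
    std::string job_id;
    std::vector<task_metadata> tasks;
};
```
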

Another division of tasks is by the task-type field in the configuration. This
field can have four values: initiation, execution, evaluation and inner. All of
this was discussed and described above in the configuration analysis. What is
important to the worker is how to behave if the execution of a task with some
particular type fails.

There are two possible situations: the execution fails due to a bad user
solution or due to some internal error. If the execution fails on an internal
error, the solution cannot be declared as failed overall. The user should not be
punished for a bad configuration or some network error. This is where task types
are useful. Generally, initiation, execution and evaluation are tasks which are
somehow executing the code which was given by the user who submitted the
solution of the exercise. If these kinds of tasks fail, it is probably connected
with a bad user solution and the job can be evaluated. But if some inner task
fails, the solution should be re-executed, in the best case scenario on a
different worker. That is why a job with a failed inner task is sent back to the
broker, which will reassign the job to another worker. More on this subject is
discussed in the broker assigning algorithms section.
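The decision itself can be condensed into a small sketch; the enum values mirror
the four task types, while the names and the simplified logic are ours, not the
worker's real code.

```cpp
// Sketch of the failure decision: inner tasks indicate our fault (job goes
// back to the broker), the other types indicate a bad user solution.
#include <iostream>

enum class task_type { initiation, execution, evaluation, inner };

enum class failure_action { evaluate_as_failed_solution, reassign_to_other_worker };

failure_action on_task_failure(task_type type) {
    switch (type) {
        case task_type::inner:
            // Technical task failed (network, configuration, ...): do not
            // punish the user, let the broker reassign the job.
            return failure_action::reassign_to_other_worker;
        case task_type::initiation:
        case task_type::execution:
        case task_type::evaluation:
        default:
            // These run code from the submitted solution, so the failure can
            // be graded as a wrong solution.
            return failure_action::evaluate_as_failed_solution;
    }
}

int main() {
    std::cout << (on_task_failure(task_type::inner) ==
                  failure_action::reassign_to_other_worker)
              << std::endl;  // prints 1
}
```
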
#### Job working directories

There is also the question about the working directory or directories of the
job: which directories should be used and what for. There is one simple answer
to this: every job will have only one specified directory which will contain
every file the worker works with in the scope of the whole job execution. This
solution is easy but fails due to logical and security reasons.

The least which must be done are two folders, one for internal temporary files
and a second one for the evaluation. The directory for temporary files is enough
to cover all kinds of internal work with the filesystem, but only one directory
for the whole evaluation is somehow not enough.

The solution which was chosen in the end is to have folders for the downloaded
archive, the decompressed solution, the evaluation directory in which the user
solution is executed, and then folders for temporary files and for results and
generally files which should be uploaded back to the fileserver with the
solution results.

There also has to be a hierarchy which separates folders of different workers on
the same machine. That is why paths to directories are in the format
`${DEFAULT}/${FOLDER}/${WORKER_ID}/${JOB_ID}`, where default means the default
working directory of the whole worker and folder is the particular directory for
some purpose (archives, evaluation, ...).

The described division of job directories proved to be flexible and detailed
enough; everything is in logical units and where it is supposed to be, which
means that searching through this system should be easy. In addition, if user
solutions have access only to the evaluation directory, then they do not have
access to unnecessary files, which is better for the overall security of the
whole ReCodEx.
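The layout can be sketched with `std::filesystem`; the base directory and the
folder names below are examples only, not the worker's real configuration.

```cpp
// Sketch of building job directories in the ${DEFAULT}/${FOLDER}/${WORKER_ID}/${JOB_ID}
// format described above; the folder names are illustrative.
#include <filesystem>
#include <iostream>
#include <string>
#include <vector>

namespace fs = std::filesystem;

int main() {
    const fs::path default_dir = "/var/recodex-worker";  // hypothetical worker base dir
    const std::string worker_id = "1";
    const std::string job_id = "42";

    const std::vector<std::string> folders = {
        "downloads", "submission", "eval", "temp", "results"};

    for (const auto &folder : folders) {
        fs::path dir = default_dir / folder / worker_id / job_id;
        std::cout << dir.string() << std::endl;
        // fs::create_directories(dir);  // a worker would create these per job
    }
    return 0;
}
```
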
#### Job variables

As mentioned above, the worker has job directories, but users who are writing
and managing job configurations do not know where they are (on some particular
worker) and how they can be accessed and written into the configuration. For
this kind of task we have to introduce some kind of marks or signs which will
represent particular folders. These marks or signs can have the form of broadly
used variables.

Variables can then be used everywhere where filesystem paths are used within the
configuration file. This solves the problem with the specific worker environment
and the specific hierarchy of directories. The final form of the variables is
`${...}`, where the triple dot is a textual description. This format was used
because of the special dollar sign character, which cannot be used within a
filesystem path; the braces are there only to delimit the textual description of
the variable.
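A minimal sketch of resolving such variables into concrete worker paths; the
variable names and values are hypothetical, not the actual set the worker
defines.

```cpp
// Sketch: replace ${NAME} occurrences in a configured path with the values a
// particular worker would supply for its own directory hierarchy.
#include <iostream>
#include <map>
#include <string>

std::string resolve_variables(std::string path,
                              const std::map<std::string, std::string> &vars) {
    for (const auto &[name, value] : vars) {
        const std::string token = "${" + name + "}";
        for (std::size_t pos = path.find(token); pos != std::string::npos;
             pos = path.find(token))
            path.replace(pos, token.size(), value);
    }
    return path;
}

int main() {
    const std::map<std::string, std::string> vars = {
        {"EVAL_DIR", "/var/recodex-worker/eval/1/42"},      // hypothetical values
        {"RESULT_DIR", "/var/recodex-worker/results/1/42"},
    };
    std::cout << resolve_variables("${EVAL_DIR}/input.txt", vars) << std::endl;
    std::cout << resolve_variables("${RESULT_DIR}/out.log", vars) << std::endl;
}
```
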

#### Supplementary files

An interesting problem is with supplementary files (inputs, sample outputs).
There are two approaches which can be observed: supplementary files can be
downloaded either at the start of the execution or during the execution. If the
files are downloaded at the beginning, the execution has not really started at
this point, and if there are problems with the network, the worker will find out
right away and can abort the execution without executing a single task. Slight
problems can arise if some of the files need to have the same name (e.g. the
solution assumes that the input is `input.txt`); in this scenario the downloaded
files cannot be renamed at the beginning but only during the execution, which is
somehow impractical and not easily observed.

The second solution, when the files are downloaded on the fly, has quite the
opposite problem: if there are problems with the network, the worker will find
out during the execution, for instance when almost the whole execution is done,
which is also not an ideal solution if we care about burnt hardware resources.
On the other hand, using this approach the users have quite advanced control of
the execution flow and know exactly which files are available during the
execution, which is from the users' perspective probably more appealing than the
first solution. Based on that, downloading of supplementary files using 'fetch'
tasks during the execution was chosen and implemented.
@@ -1248,7 +1271,7 @@ be fine. Because fetch tasks should have 'inner' task type which implies that
a failure in this task will stop the whole execution and the job will be
reassigned to another worker. It should be the last salvation in case everything
else goes wrong.

### Sandboxing

There are numerous ways how to approach sandboxing on different platforms;
describing all possible approaches is out of the scope of this document. Instead of
@@ -1269,6 +1292,8 @@ implemented well are giving pretty safe sandbox which can be used for all kinds
of user solutions and should be able to restrict and stop any standard way of
attacks or errors.

#### Linux

Linux systems have quite extensive support of sandboxing in the kernel: kernel
namespaces and cgroups were introduced and implemented, and combined they can
limit hardware resources (cpu, memory) and separate the executing program into its
@@ -1278,30 +1303,31 @@ new one. Luckily existing solution was found and its name is **isolate**.
Isolate does not use all possible kernel features, but only a subset which is
still enough to be used by ReCodEx.

#### Windows

The opposite situation is in the Windows world; there is limited support in its
kernel, which makes sandboxing a bit trickier. The Windows kernel only has ways
to restrict the privileges of a process through restriction of its internal
access tokens. Monitoring of hardware resources is not possible, but used
resources can be obtained through newly created job objects.
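As an illustration of the job object part, a small sketch follows; it assumes
the Win32 API, is by no means a complete sandbox, and is not ReCodEx code.

```cpp
// Windows-only sketch: a job object caps per-process memory and CPU time and
// can report accumulated usage of the processes assigned to it.
#include <windows.h>
#include <iostream>

int main() {
    HANDLE job = CreateJobObjectW(nullptr, nullptr);
    if (!job) return 1;

    JOBOBJECT_EXTENDED_LIMIT_INFORMATION limits{};
    limits.BasicLimitInformation.LimitFlags =
        JOB_OBJECT_LIMIT_PROCESS_MEMORY | JOB_OBJECT_LIMIT_PROCESS_TIME;
    limits.ProcessMemoryLimit = 256 * 1024 * 1024;             // 256 MiB per process
    limits.BasicLimitInformation.PerProcessUserTimeLimit.QuadPart =
        10LL * 10'000'000LL;                                    // 10 s in 100 ns units
    SetInformationJobObject(job, JobObjectExtendedLimitInformation,
                            &limits, sizeof(limits));

    // A newly created (suspended) child process would be attached here with
    // AssignProcessToJobObject(job, processHandle) and then resumed.

    JOBOBJECT_BASIC_ACCOUNTING_INFORMATION usage{};
    QueryInformationJobObject(job, JobObjectBasicAccountingInformation,
                              &usage, sizeof(usage), nullptr);
    std::cout << "user time so far (100 ns units): "
              << usage.TotalUserTime.QuadPart << std::endl;

    CloseHandle(job);
    return 0;
}
```
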

There are numerous sandboxes for Windows, but they are all focused on different
things; in a lot of cases they serve as a safe environment for malicious
programs, viruses in particular, or they are designed as a separate filesystem
namespace for installing a lot of temporarily used programs. From all these we
can mention: Sandboxie, Comodo Internet Security, Cuckoo sandbox and many
others. None of these is fitted as a sandbox solution for ReCodEx. With this
being said, we can safely state that designing and implementing a new general
sandbox for Windows is out of the scope of this project.

But designing a sandbox only for a specific environment is possible, namely for
C# and .NET. The CLR as a virtual machine and runtime environment has pretty
good security support for restrictions and separation, which is also transferred
to C#. This makes it quite easy to implement a simple sandbox within C#, but
there are no well known general purpose implementations. As said in the previous
paragraph, implementing our own solution is out of the scope of the project. But
a C# sandbox is quite a good topic for another project, for example a term
project for the C# course, so it might be written and integrated in the future.

### Fileserver
@@ -1520,32 +1546,35 @@ implementation of this critical security feature.
#### Forgotten password

Authentication and any handling of passwords is related to the problem of
forgotten credentials, especially passwords. There has to be some kind of
mechanism to retrieve a new password or change the old one.

First, there are absolutely insecure and non-recommendable ways how to handle
that, for example sending the old password through email. A better, but still
not secure, solution is to generate a new one and again send it through email.

The mentioned solution was provided in CodEx; users had to write an email to the
administrator, who generated a new password and sent it back to the sender. This
simple solution could also be automated, but the administrator had quite a big
control over the whole process. This might come in handy if there should be some
additional checkups, but on the other hand it can be quite time consuming.

Probably the best solution, which is often used and is fairly secure, follows.
Let us consider only the case in which all users have to fill their email
addresses into the system and these addresses are safely in the hands of the
right users.

When a user finds out that he/she does not remember a password, he/she requests
a password reset and fills in his/her unique identifier; it might be an email or
a unique nickname. Based on the matched user account, the system generates a
unique access token and sends it to the user via the email address. This token
should be time limited and usable only once, so it cannot be misused. The user
then takes the token or the URL address which is provided in the email and goes
to the appropriate section of the system, where a new password can be set. After
that the user can sign in with his/her new password.

As previously stated, this solution is quite safe and the user can handle it on
his/her own, so the administrator does not have to worry about it. That is the
main reason why this approach was chosen to be used.
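The token part of this flow can be sketched as follows; this is an illustration
only (the real ReCodEx API lives in the PHP backend, and a production
implementation would use a cryptographically secure generator and persistent
storage).

```cpp
// Sketch of a time-limited, single-use password reset token.
#include <chrono>
#include <cstddef>
#include <iostream>
#include <optional>
#include <random>
#include <string>
#include <unordered_map>

struct ResetToken {
    std::string user;
    std::chrono::steady_clock::time_point expires;
    bool used = false;
};

std::unordered_map<std::string, ResetToken> tokens;  // token value -> record

std::string issue_token(const std::string &user) {
    static const char alphabet[] = "abcdefghijklmnopqrstuvwxyz0123456789";
    static std::mt19937_64 rng{std::random_device{}()};  // not a CSPRNG, demo only
    std::uniform_int_distribution<std::size_t> pick(0, sizeof(alphabet) - 2);

    std::string value;
    for (int i = 0; i < 32; ++i) value += alphabet[pick(rng)];

    ResetToken record{user, std::chrono::steady_clock::now() + std::chrono::minutes(30)};
    tokens[value] = record;
    return value;  // would be sent to the user's registered email address
}

// Returns the user allowed to set a new password, or nothing if the token is
// unknown, expired or already used.
std::optional<std::string> redeem_token(const std::string &value) {
    auto it = tokens.find(value);
    if (it == tokens.end() || it->second.used ||
        std::chrono::steady_clock::now() > it->second.expires)
        return std::nullopt;
    it->second.used = true;
    return it->second.user;
}

int main() {
    auto t = issue_token("alice@example.com");
    std::cout << (redeem_token(t) ? "password reset allowed" : "rejected") << std::endl;
    std::cout << (redeem_token(t) ? "allowed again" : "single use only") << std::endl;
}
```
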
#### Uploading files
@@ -1657,7 +1686,7 @@ Another step would be the overall backend state like how many jobs were
processed by some particular worker, the workload of the broker and the workers,
etc. The easiest solution is to manage this information by hand: every instance
of the API server has to have an administrator who would have to fill it in.
This includes only the currently available workers and runtime environments,
which do not change very often. The real-time statistics of the backend cannot
be made accessible this way in a reasonable manner.
