# Analysis

None of the existing projects we came across meets all the features requested
for the new system. There is no grading system which supports an arbitrary-length
evaluation pipeline, so we have to implement this feature ourselves.
No existing solution is extensible enough to be used as a base for the new system.
Considering all these facts, a new system has to be written from scratch. This
implies that only a subset of all the features will be implemented in the first
version, and more of them will come in the following releases.

The requested features are categorized based on their priorities for the whole
system. The highest priority is the functionality present in the current CodEx.
It is the baseline for being useful in the production environment. The design of
the new solution should allow the system to be extended easily. The ideas from
faculty staff have a lower priority, but most of them will be implemented as
part of the project. The most complicated tasks from this category are an
advanced low-level evaluation configuration format, use of modern tools,
connection to a university system, and combining the currently separate
instances into one installation of the system.

Other tasks are scheduled for the releases after the first version of the
project is completed. Namely, these are a high-level exercise evaluation
configuration with a user-friendly UI, SIS integration (when a public API
becomes available for the system), and a command-line submission tool.
Plagiarism detection is not likely to be part of any release in the near future
unless someone else implements a sufficiently capable and extendable solution --
this problem is too complex to be solved as a part of this project.

We named the new project **ReCodEx -- ReCodEx Code Examiner**. The name alludes
to the old CodEx; the **Re** prefix stands for redesigned, rewritten, renewed,
or restarted.

At this point there is a clear idea of how the new system will be used and what
the major enhancements for the future releases are. With this in mind, it is
possible to sketch the overall architecture. To sum this up, here is a list of
the key features of the new system. They come from the previous research of the
drawbacks of the current system, reasonable wishes of university users, and our
major design choices:

- modern HTML5 web frontend written in JavaScript using a suitable framework
- REST API communicating with a persistent database, evaluation backend, and a
  file server
- evaluation backend implemented as a distributed system on top of a messaging
  framework with a master-worker architecture
- multi-platform worker supporting Linux and Windows environments (the latter
  without a sandbox, as no suitable general-purpose tool is available yet)
- evaluation procedure configured in a human-readable text file, consisting of
  small tasks forming an arbitrary oriented acyclic dependency graph

## Basic Concepts

The requirements specify that the user interface must be accessible to students
without the need to install additional software. This immediately implies that
users have to be connected to the Internet. Nowadays, there are two main ways of
designing graphical user interfaces -- as a native application or as a web page.
Creating a user-friendly and multi-platform application with a graphical UI is
almost impossible because of the large number of different operating systems.
Such applications typically require installation, or at least downloading their
files (source codes or binaries). On the other hand, distributing a web
application is easier, because every personal computer has an internet browser
installed. Browsers support a (mostly) unified and standardized environment of
HTML5 and JavaScript. CodEx is also a web application and everybody seems to be
satisfied with this fact. There are other communication channels most
programmers use, such as e-mail or git, but they are inappropriate for building
user interfaces on top of them.

It is clear from the assignment of the project that the system has to keep
personalized data of the users. User data cannot be publicly available, which
implies the necessity of user authentication. The application also has to
support multiple ways of authentication (e.g., university authentication
systems, a company LDAP server, an OAuth server), and permit adding more
security measures in the future, such as two-factor authentication.

Each user has a specific role in the system. The assignment requires at least
two such roles, _student_ and _supervisor_. However, it is advisable to add an
_administrator_ level for users who take care of the system as a whole and are
responsible for its setup, monitoring, or updates. The student role has the
minimum access rights; basically, a student can only view assignments and
submit solutions. Supervisors have more authority, so they can create exercises
and assignments, and view the results of their students. Following the
organization of the university, one more level could be introduced, _course
guarantor_. However, real experience shows that all duties related to lecturing
labs are already associated with supervisors, so this role does not seem
useful. In addition, no one requested more than a three-level privilege scheme.

School labs are lessons for groups of students led by supervisors. All students
in a lab have the same homework and supervisors evaluate their solutions. This
arrangement has to be transferred into the new system. The groups in the system
correspond to the real-life labs. This concept was already discussed in the
previous chapter, including the need for a hierarchical structure of the groups.

To allow restricting group membership in ReCodEx, there are two types of groups
-- _public_ and _private_. Public groups are open to all registered users, but
to become a member of a private group, one of its supervisors has to add the
user to the group. This could be done automatically at the beginning of a term
with data from the university information system, but unfortunately there is no
API for this yet. However, creating this API is now being considered by
university staff.

Supervisors using CodEx in their labs usually set a minimum number of points
required to get a credit. These points can be acquired by solving assigned
exercises. To show users whether they already have enough points, ReCodEx also
supports setting this limit for the groups. There are two equivalent ways of
setting a limit -- an absolute number of points or a percentage of the total
possible number of points. We decided to implement the latter of these
possibilities and we call it the threshold.

Our university has a few partners among grammar schools. There was an idea that
they could use CodEx for teaching IT classes. To simplify the setup for them,
all the software and hardware would be provided by the university as SaaS.
However, CodEx is not prepared for this kind of usage and no one has the time
to manage another separate instance. With ReCodEx it is possible to offer a
hosted environment as a service to other institutions.

The system is divided into multiple separate units called _instances_. Each
instance has its own set of users and groups. Exercises can optionally be
shared. The rest of the system (the API server and the evaluation backend) is
shared between the instances. To keep track of the active instances and to
allow access to the infrastructure to other, paying, customers, each instance
must have a valid _licence_ that allows its users to submit their solutions.
Each licence is granted for a specific period of time and can be revoked in
advance if the licensee does not comply with the approved terms and conditions.

The problems the students solve are broken down into two parts in the system:

- the problem itself (an _exercise_),
- and its _assignment_

Exercises only describe the problem and provide testing data with a description
of how to evaluate it. In fact, they are templates for the assignments. A
particular assignment then contains the data from the exercise and some
additional metadata, which can be different for every assignment of the same
exercise (e.g., the deadline, the maximum number of points).

### Evaluation Unit Executed by ReCodEx

One of the bigger requests for the new system is to support a complex
configuration of the execution pipeline. The idea comes from the lecturers of
the Compiler Principles class who want to migrate their semi-manual evaluation
process to CodEx. Unfortunately, CodEx is not capable of such a complicated
exercise setup. None of the evaluation systems we found can handle such a task,
so a design from scratch is needed.

There are two main approaches to designing a complex execution configuration.
It can be composed of a small number of relatively big components, or of many
more small tasks. Big components are easy to write and help keep the
configuration reasonably small. However, such components are designed for
current problems and might not hold up well against future requirements. This
can be solved by introducing a small set of single-purpose tasks which can be
composed together. The whole configuration becomes bigger, but more flexible
with respect to new conditions. Moreover, the small tasks will not require as
much programming effort as bigger evaluation units. For a better user
experience, configuration generators for some common cases can be introduced.

A goal of ReCodEx is to be continuously developed and used for many years.
Therefore, we chose to use smaller tasks, because this approach is better for
future extensibility. Observation of the CodEx system shows that only a few
kinds of tasks are needed. In an extreme case, only one task is enough --
execute a binary. However, for better portability of configurations between
different systems, it is better to implement a reasonable subset of operations
ourselves instead of directly calling binaries provided by the system. These
operations are copying a file, creating a new directory, extracting an archive
and so on, and are altogether called internal tasks. Another benefit of a
custom implementation of these tasks is guaranteed safety, so no sandbox needs
to be used, as is the case for external tasks.

For a job evaluation, the tasks need to be executed sequentially in a specified
order. Running independent tasks in parallel is possible, but there are
complications -- exact time measurement requires a controlled environment with
as few interruptions from other processes as possible. It would be possible to
run tasks that do not need exact time measurement in parallel, but in that case
a synchronization mechanism has to be developed to exclude parallelism for
measured tasks. Usually, there are about four times more unmeasured tasks than
tasks with time measurement, but measured tasks tend to run much longer. With
[Amdahl's law](https://en.wikipedia.org/wiki/Amdahl's_law) in mind, parallelism
does not seem to provide a notable benefit in overall execution speed and
brings trouble with synchronization. Moreover, most of the internal tasks are
also limited by IO speed (most notably copying and downloading files and
reading archives). However, if there are performance issues, this approach
could be reconsidered, along with using a RAM disk for storing supplementary
files.

It seems that connecting the tasks into a directed acyclic graph (DAG) can
handle all possible problem cases. None of the authors, supervisors and
involved faculty staff can think of a problem that cannot be decomposed into
tasks connected in a DAG. The goal of evaluation is to satisfy as many tasks as
possible. During execution there are sometimes multiple choices for the next
task. To control this, each task can have a priority, which is used as a
secondary ordering criterion. For better understanding, here is a small
example.

![Task serialization](https://github.com/ReCodEx/wiki/raw/master/images/Assignment_overview.png)

The _job root_ task is an imaginary single starting point of each job. When the
_CompileA_ task is finished, the _RunAA_ task is started (or _RunAB_, but the
choice should be deterministic, given by the position in the configuration file
-- tasks stated earlier should be executed earlier). The task priorities
guarantee that after the _CompileA_ task, all of its dependent tasks are
executed before the _CompileB_ task (they have a higher priority number). To
sum up, connections between tasks represent dependencies, and priorities can be
used to order unrelated tasks, together providing a total ordering of them. For
well-written jobs the priorities may not be very useful, but they can help
control the execution order, for example to avoid a situation where each test
of the job generates a large temporary file and the only valid execution order
would keep all the temporary files around at the same time for later
processing. A better approach is to finish the execution of one test, clean up
its big temporary file, and proceed with the following test. If there is an
ambiguity in the task ordering at this point, tasks are executed in the order
of the input configuration.
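
To make the ordering criteria concrete, here is a minimal sketch of how a
worker could pick the next task (the `Task` structure and its field names are
illustrative assumptions, not the actual ReCodEx implementation):

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical task record, only for this illustration.
struct Task {
    std::string id;
    std::vector<std::size_t> dependencies; // indices of prerequisite tasks
    int priority;                          // higher number runs earlier
    bool done = false;
};

// Pick the next task to run: all dependencies satisfied, highest priority
// first, ties broken by position in the configuration file.
std::size_t next_task(const std::vector<Task>& tasks) {
    std::size_t best = tasks.size(); // "none" sentinel
    for (std::size_t i = 0; i < tasks.size(); ++i) {
        const Task& t = tasks[i];
        if (t.done) continue;
        bool ready = std::all_of(t.dependencies.begin(), t.dependencies.end(),
                                 [&](std::size_t d) { return tasks[d].done; });
        if (!ready) continue;
        // Strictly higher priority wins; on equal priority the earlier
        // (lower index) task is kept, i.e., configuration order.
        if (best == tasks.size() || t.priority > tasks[best].priority)
            best = i;
    }
    return best;
}
```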

A total linear ordering of the tasks could be achieved more simply by just
executing them in the order of the input configuration. However, this structure
does not handle the cases when a task fails very well -- there is no easy way
of telling which task should be executed next. This issue can be solved with
graph-structured dependencies of the tasks. In a graph structure, it is clear
that all dependent tasks have to be skipped and that execution must be resumed
with an unrelated task. This is the main reason why the tasks are connected in
a DAG.

For grading, there are several important tasks. First, the tasks executing the
submitted code need to be checked against time and memory limits. Second, the
outputs of judging tasks need to be checked for correctness (represented by a
return value or by data on the standard output) and should not fail. This
division can be transparent for the backend, where each task is executed the
same way. But the frontend must know which tasks of the whole job are important
and what their kind is. It is reasonable to keep this piece of information
alongside the tasks in the job configuration, so each task can have a label
describing its purpose. Unlabeled tasks have the internal type _inner_. There
are four categories of tasks:

- _initiation_ -- setting up the environment, compiling code, etc.; for users,
  a failure means an error in their sources which prevents running them with
  the examination data
- _execution_ -- running the user code with the examination data; it must not
  exceed the time and memory limits; for users, a failure means a wrong design,
  slow data structures, etc.
- _evaluation_ -- comparing user and examination outputs; for users, a failure
  means that the program does not compute the right results
- _inner_ -- no special meaning for the frontend; technical tasks for fetching
  and copying files, creating directories, etc.

Each job is composed of multiple tasks of these types, which are semantically
grouped into tests. A test can represent one set of examination data for the
user code. To mark the grouping, another task label can be used. Each test must
have exactly one _evaluation_ task (to show success or failure to users) and an
arbitrary number of tasks of the other types.

### Evaluation Progress State

Users want to know the state of their submitted solution (whether it is waiting
in a queue, compiling, etc.). The very first idea would be to report the state
based on "done" messages from compilation, execution and evaluation, as many
evaluation systems already do. However, ReCodEx has a more complicated
execution pipeline, where there can be multiple compilation or execution tasks
per test and also other internal tasks that control the job execution flow.

The users do not know the technical details of the evaluation, and data about
the completion of individual tasks might confuse them. A solution is to show
users only the percentage of completion of the job, without any additional
information about task types. This solution works well for all of the jobs and
is very user-friendly.

It would be possible to expand upon this by adding a special "send progress
message" task to the job configuration that would mark the completion of a
specific part of the evaluation. However, the benefits of this feature are not
worth the effort of implementing it and unnecessarily complicating the job
configuration files.

### Results of Evaluation

The evaluation data have to be processed and then presented in a human-readable
form. This is done through a single numeric value called points. Also, the
results of the job tests should be available, to show what kind of error is in
the solution. For deeper debugging, the outputs of tasks could optionally be
made available to the users.

#### Scoring and Assigning Points

The overall concept of grading solutions was presented earlier. As a brief
reminder, the backend returns only the exact measured values (used time and
memory, return code of the judging task, ...) and one value is computed on top
of them. The way of this computation can differ greatly between supervisors, so
it has to be easily extendable. The best way is to provide an interface which
can be implemented, and behind which any sort of magic can return the final
value.

We identified several computational possibilities. There is the basic
arithmetic, weighted arithmetic, geometric or harmonic mean of the results of
each test (the result is a logical value succeeded/failed, optionally with a
weight), some kind of interpolation of the amount of time used in each test,
the same with the amount of memory used, and surely many others. To keep the
project simple, we decided to design an appropriate interface and implement
only the weighted arithmetic mean computation, which is used in about 90% of
all assignments. Of course, a different scheme can be chosen for every
assignment and it can also be configured -- for example by specifying the test
weights for the implemented weighted arithmetic mean. Advanced ways of
computation can be implemented later, when there is a real demand for them.

To avoid assigning points to insufficient solutions (like one only printing
"File error", which happens to be the valid answer in two tests), a minimal
point threshold can be specified. If the solution would receive fewer points
than specified, it receives zero points instead. This functionality could be
embedded into the grading computation algorithm itself, but then it would have
to be present in each implementation separately, which is not maintainable.
Because of this, the threshold feature is separated from the score computation.
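
The following is a minimal sketch of such an interface, with the weighted
arithmetic mean implementation and the separate threshold step (the type and
function names are illustrative assumptions, not the actual ReCodEx code):

```cpp
#include <vector>

// One judged test: pass/fail flag plus a configurable weight.
struct TestResult {
    bool passed;
    double weight;
};

// Extensible interface: every grading scheme returns a score in [0, 1].
class ScoreCalculator {
public:
    virtual ~ScoreCalculator() = default;
    virtual double compute(const std::vector<TestResult>& tests) const = 0;
};

// Weighted arithmetic mean -- the only scheme implemented initially.
class WeightedMeanCalculator : public ScoreCalculator {
public:
    double compute(const std::vector<TestResult>& tests) const override {
        double total = 0, passed = 0;
        for (const TestResult& t : tests) {
            total += t.weight;
            if (t.passed) passed += t.weight;
        }
        return total > 0 ? passed / total : 0;
    }
};

// The threshold is applied outside the calculator, so every implementation
// gets it for free: below the threshold, the solution receives zero points.
double assign_points(const ScoreCalculator& calc,
                     const std::vector<TestResult>& tests,
                     double max_points, double threshold) {
    double score = calc.compute(tests);
    return score < threshold ? 0 : score * max_points;
}
```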

Automatic grading cannot reflect all aspects of the submitted code, for example
its structure or the number and quality of comments. To allow supervisors to
bring these manually checked aspects into the grading, there is a concept of
bonus points. They can be positive or negative. Generally, the solution with
the most points is the one used for grading the particular assignment. However,
if a supervisor is not satisfied with a student's solution (really bad code,
cheating, ...), he or she assigns the student negative bonus points. To prevent
this decision from being overridden by the system choosing another solution
with more points, or by the student submitting the same code again and having
it evaluated to more points, the supervisor can explicitly mark a particular
solution, and it is then used for grading instead of the solution with the most
points.

#### Evaluation Outputs

In addition to the exact measured values used for the score calculation
described in the previous chapter, there are also the text or binary outputs of
the executed tasks. Knowing them helps users identify and solve their potential
issues, but on the other hand it opens a possibility of leaking the input data.
This may lead students to hack their solutions to pass just the ReCodEx test
cases instead of properly solving the assigned problem. The usual approach is
to keep this information private. This was also strongly recommended by Martin
Mareš, who has experience with several programming contests.

The only exception to hiding the logs are the compilation outputs, which can
help students a lot during troubleshooting, and where there is only a small
possibility of input data leakage. The supervisors have access to all of the
logs and they can decide whether the students are allowed to see the
compilation outputs.

Note that due to a lack of frontend developers, showing compilation logs to the
students is not implemented in the very first release of ReCodEx.

### Persistence

The previous parts of the analysis show that the system has to keep some state.
This could be user settings, group memberships, evaluated assignments and so
on. The data have to be kept across restarts, so persistence is an important
decision factor. There are several ways to save structured data:

- plain files
- NoSQL database
- relational database

Another important factor is the amount and size of the stored data. Our guess
is about 1000 users, 100 exercises, 200 assignments per year and 20000 unique
solutions per year. The data are mostly structured and there are a lot of
records with the same format. For example, there is a thousand users and each
one has the same attributes -- name, email, age, etc. These data items are
relatively small; the name and email are short strings, the age is an integer.
Considering this, relational databases or formatted plain files (CSV, for
example) fit them best. However, the data often have to support searching, so
they have to be sorted and allow random access for resolving cross references.
Also, the addition and deletion of entries should take a reasonable time (at
most logarithmic time complexity with respect to the number of saved values).
This practically excludes plain files, so we decided to use a relational
database.

On the other hand, there are data with basically no structure and a much larger
size. These can be evaluation logs, sample input files for exercises or sources
submitted by students. Saving this kind of data into a relational database is
not appropriate. It is better to keep them as ordinary files or store them in
some kind of NoSQL database. Since they are already files and do not need to be
backed up in multiple copies, it is easier to keep them as ordinary files in
the filesystem. Also, this solution is more lightweight and does not require
additional dependencies on third-party software. Files can be identified using
their filesystem paths or a unique index stored as a value in the relational
database. Both approaches are equally good; the final decision depends on the
actual implementation.

## Structure of The Project

The ReCodEx project is divided into two logical parts -- the *backend* and the
*frontend* -- which interact with each other and together cover the whole area
of code examination. Both of these logical parts are independent of each other
in the sense that they can be installed on separate machines at different
locations, and that one of the parts can be replaced with a different
implementation; as long as the communication protocols are preserved, the
system will continue working as expected.

### Backend

The backend is the part which is responsible solely for the process of
evaluating a solution of an exercise. Each evaluation of a solution is referred
to as a *job*. For each job, the system expects a configuration document of the
job, supplementary files for the exercise (e.g., test inputs, expected outputs,
predefined header files), and the solution of the exercise (typically source
code created by a student). There might be some specific requirements for the
job, such as a specific runtime environment, a specific version of a compiler,
or that the job must be evaluated on a processor with a specific number of
cores. The backend infrastructure decides whether it will accept or decline a
job based on the specified requirements. In case it accepts the job, the job
will be placed in a queue and processed as soon as possible.

The backend publishes the progress of processing of the queued jobs, and the
results of the evaluations can be queried after the job processing is finished.
The backend produces a log of the evaluation, which can be used for further
score calculation or debugging.

To make the backend scalable, two components are necessary -- one which will
execute jobs and another which will distribute jobs to the instances of the
first one. This ensures scalability in the manner of parallel execution of
numerous jobs, which is exactly what is needed. The implementations of these
services are called the **broker** and the **worker**; the first one handles
the distribution, the other one the execution.

These components should be enough to fulfill all the tasks mentioned above, but
for the sake of simplicity and better communication with the frontend, two
other components were added as gateways -- the **fileserver** and the
**monitor**. The fileserver is a simple component whose purpose is to store
files which are exchanged between the frontend and the backend. The monitor is
also quite a simple service which is able to forward job progress data from the
workers to the web application. These two additional services sit at the border
between the frontend and the backend (like gateways), but logically they are
more connected with the backend, so we consider them to belong there.

### Frontend

The frontend, on the other hand, is responsible for providing users with
convenient access to the backend infrastructure and for interpreting the raw
data from the backend evaluation.

There are two main purposes of the frontend -- holding the state of the whole
system (the database of users, exercises, solutions, points, etc.) and
presenting the state to users through some kind of a user interface (e.g., a
web application, a mobile application, or a command-line tool). Following
contemporary trends in the development of frontend parts of applications, we
decided to split the frontend into two logical parts -- a server side and a
client side. The server side is responsible for managing the state, and the
client side gives instructions to the server side based on the inputs from the
user. This decoupling gives us the ability to create multiple client-side tools
which may address different needs of the users.

The frontend developed as part of this project is a web application created
with the needs of the Faculty of Mathematics and Physics of Charles University
in Prague in mind. The users are the students and their teachers, the groups
correspond to the different courses, and the teachers are the supervisors of
these groups. This model is also applicable to the needs of other universities,
schools, and IT companies, which can use the same system for their own
purposes. It is also possible to develop a custom frontend with its own user
management system and use the possibilities of the backend without any changes.

### Possible Connection

One possible configuration of the ReCodEx system is illustrated in the
following picture, where there is one shared backend with three workers, and
two separate instances of the whole frontend. This configuration may be
suitable for MFF UK -- a basic programming course and the KSP competition.
However, a setup sharing the web API and the fileserver as well, with only
custom instances of the client (the web application or a custom
implementation), is perhaps even more likely to be used. Note that the
connections between the components are not fully accurate.

![Overall architecture](https://github.com/ReCodEx/wiki/blob/master/images/Overall_Architecture.png)

In the following parts of the documentation, both the backend and the frontend
will be introduced separately and covered in more detail. The communication
protocol between these two logical parts will be described as well.

## Implementation Analysis

Some of the most important implementation problems and interesting observations
will be discussed in this chapter.

### Communication Between the Backend Components

The overall design of the project is discussed above. There is a bunch of
components, each with its own responsibilities. An important thing to design is
the communication between these components. To choose a suitable protocol,
there are some additional requirements that should be met:

- reliability -- if a message is sent between components, the protocol has to
  ensure that it is received by the target component
- working over the IP protocol
- multi-platform and multi-language usage

The TCP/IP protocol meets these conditions, however it is quite low-level and
working with it usually means working with a platform-dependent, non-object
API. A common way to address these issues is to use a framework which provides
a better abstraction and a more suitable API. We decided to go this way, so the
following options were considered:

- CORBA (or some other form of RPC) -- CORBA is a well known framework for
  remote procedure calls. There are multiple implementations for almost every
  known programming language. It fits nicely into an object oriented
  programming environment.
- RabbitMQ -- RabbitMQ is a messaging framework written in Erlang. It features
  a message broker, to which nodes connect and declare the message queues they
  work with. It is also capable of routing requests, which could be a useful
  feature for job load-balancing. Bindings exist for a large number of
  languages and there is a large community supporting the project.
- ZeroMQ -- ZeroMQ is another messaging framework, which is different from
  RabbitMQ and others (such as ActiveMQ) because it features a "brokerless
  design". This means there is no need to launch a message broker service to
  which clients have to connect -- ZeroMQ based clients are capable of
  communicating directly. However, it only provides an interface for passing
  messages (basically vectors of 255B strings) and any additional features such
  as load balancing or acknowledgement schemes have to be implemented on top of
  this. The ZeroMQ library is written in C++ with a huge number of bindings.

CORBA is a large framework that would satisfy all our needs, but we are aiming
towards a more loosely-coupled system, and asynchronous messaging seems better
for this approach than RPC. Moreover, we rarely need to receive replies to our
requests immediately.

RabbitMQ seems well suited for many use cases, but implementing a job routing
mechanism between heterogeneous workers would be complicated -- we would
probably have to create a separate load balancing service, which cancels the
advantage of a message broker already being provided by the framework. It is
also written in Erlang, which nobody from our team understands.

ZeroMQ is the best option for us, even with the drawback of having to implement
a load balancer ourselves (which could also be seen as a benefit, and there is
a notable chance we would have to do the same with RabbitMQ). It also gives us
complete control over the transmitted messages and communication patterns.
However, any of the three options would have been possible to use.
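
For illustration, a minimal ZeroMQ exchange looks like the following sketch
(using the cppzmq binding; the endpoint, port and message contents are
illustrative, not the actual ReCodEx protocol, and both peers run in one
process only to keep the example self-contained):

```cpp
#include <iostream>
#include <string>
#include <zmq.hpp>

int main() {
    zmq::context_t ctx;

    // The broker is a fixed part of the infrastructure: it binds a socket
    // and workers connect to it at will.
    zmq::socket_t broker(ctx, zmq::socket_type::rep);
    broker.bind("tcp://*:9657");

    // A worker connects and announces itself.
    zmq::socket_t worker(ctx, zmq::socket_type::req);
    worker.connect("tcp://localhost:9657");

    // Messages are just short string frames here; the real protocol would
    // use multipart messages with a command and its arguments.
    worker.send(zmq::buffer(std::string("init")), zmq::send_flags::none);

    zmq::message_t request;
    (void)broker.recv(request, zmq::recv_flags::none);
    std::cout << "broker received: " << request.to_string() << std::endl;

    broker.send(zmq::buffer(std::string("ack")), zmq::send_flags::none);

    zmq::message_t reply;
    (void)worker.recv(reply, zmq::recv_flags::none);
    std::cout << "worker received: " << reply.to_string() << std::endl;
    return 0;
}
```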

### File Transfers

There has to be a way to access files stored on the fileserver (and also to
upload them) from both the worker and the frontend server machines. The
protocol used for this should handle large files efficiently and be resilient
to network failures. Security features are not a primary concern, because all
communication with the fileserver will happen in an internal network. However,
a basic form of authentication can be useful to ensure a correct configuration
(if a development fileserver uses different credentials than production,
production workers will not be able to use it by accident). Lastly, the
protocol must have a client library for the platforms (languages) used in the
backend. We will present some of the possible options:

- HTTP(S) -- a de-facto standard for web communication that has far more
  features than just file transfers. Thanks to being used on the web, a large
  effort has been put into the development of its servers. It supports
  authentication and it can handle short-term network failures (thanks to being
  built on TCP and supporting resuming interrupted transfers). We will use HTTP
  for communication with clients, so there is no added cost in maintaining a
  server. HTTP requests can be made using libcurl.
- FTP -- an old protocol designed only for transferring files. It has all the
  required features, but does not offer anything over HTTP. It is also
  supported by libcurl.
- SFTP -- a file transfer protocol most frequently used as a subsystem of SSH
  protocol implementations. It relies on the underlying SSH layer for
  authentication, and it supports working with large files and resuming failed
  transfers. The libcurl library supports SFTP.
- A network-shared file system (such as NFS) -- an obvious advantage of a
  network-shared file system is that applications can work with remote files
  the same way they would with local files. However, it brings an overhead for
  the administrator, who has to configure access to this filesystem for every
  machine that needs to access the storage.
- A custom protocol over ZeroMQ -- it is possible to design a custom file
  transfer protocol that uses ZeroMQ for sending data, but it is not a trivial
  task -- we would have to find a way to transfer large files efficiently,
  implement an acknowledgement scheme and support resuming transfers. Using
  ZeroMQ as the underlying layer does not help a lot with this. The sole
  advantage of this option is that the backend components would not need
  another library for communication.

We chose HTTP(S) because it is widely used and client libraries exist in all
relevant environments. In addition, it is highly probable we will have to run
an HTTP server anyway, because ReCodEx is intended to have a web frontend.
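
As a sketch of what a backend-side fetch could look like with libcurl (the
URL, credentials and helper name are placeholders, not the actual component
code):

```cpp
#include <cstdio>
#include <curl/curl.h>

// Download one file from the fileserver over HTTP(S) with basic
// authentication. Returns true on success. Note that
// curl_global_init(CURL_GLOBAL_DEFAULT) should be called once at startup.
bool fetch_file(const char* url, const char* user_pwd, const char* dest) {
    CURL* curl = curl_easy_init();
    if (!curl) return false;

    std::FILE* out = std::fopen(dest, "wb");
    if (!out) { curl_easy_cleanup(curl); return false; }

    curl_easy_setopt(curl, CURLOPT_URL, url);
    curl_easy_setopt(curl, CURLOPT_USERPWD, user_pwd); // "user:password"
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, out);    // default callback writes to FILE*
    curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L);
    curl_easy_setopt(curl, CURLOPT_FAILONERROR, 1L);   // treat HTTP >= 400 as error

    CURLcode res = curl_easy_perform(curl);

    std::fclose(out);
    curl_easy_cleanup(curl);
    return res == CURLE_OK;
}

// Example use (placeholder URL):
// fetch_file("https://fileserver.local/submits/42/solution.zip",
//            "worker:secret", "solution.zip");
```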

### Frontend - Backend Communication

Our choices when considering how clients will communicate with the backend
have to stem from the fact that ReCodEx should primarily be a web application.
This rules out ZeroMQ -- while it is very useful for asynchronous communication
between backend components, it is practically impossible to use it from a web
browser. There are several other options:

- *WebSockets* -- The WebSocket standard is built on top of TCP. It enables a
  web browser to connect to a server over a TCP socket. WebSockets are
  implemented in recent versions of all modern web browsers and there are
  libraries for several programming languages like Python or JavaScript
  (running in Node.js). Encryption of the communication over a WebSocket is
  supported as a standard.
- *HTTP protocol* -- The HTTP protocol is a stateless protocol implemented on
  top of the TCP protocol. The communication between the client and the server
  consists of requests sent by the client and responses to these requests sent
  back by the server. The client can send as many requests as needed and it may
  ignore the responses from the server, but the server must respond only to the
  requests of the client and it cannot initiate communication on its own.
  End-to-end encryption can be achieved easily using SSL (HTTPS).

We chose the HTTP(S) protocol because of its simple implementation in all
sorts of operating systems and runtime environments on both the client and the
server side.

The API of the server should expose basic CRUD (Create, Read, Update, Delete)
operations. There are some options on what kind of messages to send over the
HTTP:

- SOAP -- a protocol for exchanging XML messages. It is very robust and
  complex.
- REST -- a stateless architecture style, not a protocol or a technology. It
  relies on HTTP (but not necessarily) and its method verbs (e.g., GET, POST,
  PUT, DELETE). It can fully implement the CRUD operations.

Even though there are some other technologies, we chose the REST style over
the HTTP protocol. It is widely used, there are many tools available for
development and testing, and it is well understood by programmers, so it should
be easy for a new developer with some experience in client-side applications to
get acquainted with the ReCodEx API and develop a client application.

A high-level view of the chosen communication protocols in ReCodEx can be seen
in the following image. Red arrows mark connections through ZeroMQ sockets,
blue arrows mark WebSocket communication, and green arrows connect nodes that
communicate through HTTP(S).

![Communication schema](https://github.com/ReCodEx/wiki/raw/master/images/Backend_Connections.png)

### Job Configuration File

As discussed previously in 'Evaluation Unit Executed by ReCodEx', an evaluation
unit has the form of a job, which contains small tasks, each representing one
piece of work executed by the worker. This implies that jobs have to be passed
from the frontend to the backend. The best option for this is to use some kind
of configuration file which describes the job. The configuration file is
created in the frontend and then parsed and executed in the backend, namely by
the worker.

There are many formats which can be used to represent such a configuration.
The considered ones are:

- *XML* -- a broadly used general markup language which comes with a document
  type definition (DTD) that can express and check the XML file structure, so
  it does not have to be checked within the application. But XML with its tags
  can sometimes be quite 'chatty' and extensive, which is not desirable. And
  overall, XML with all its features and properties can be a bit heavy-weight.
- *JSON* -- a notation which was developed to represent JavaScript objects. As
  such, it is quite simple: it can express only key-value structures, arrays
  and primitive values. The structure and hierarchy of the data is expressed by
  braces and brackets.
- *INI* -- a very simple configuration format which is able to represent only
  key-value structures grouped into sections. This is not enough to represent a
  job and its hierarchy of tasks.
- *YAML* -- a format very similar to JSON in its capabilities, with the small
  difference that the structure and hierarchy of the configuration is expressed
  by indentation rather than braces. This makes YAML easily readable by both
  humans and machines.
- *a specific format* -- a newly created format used just for the job
  configuration. The obvious drawback is the non-existence of parsers, which
  would have to be written from scratch.

Given the previous list of different formats, we decided to use YAML. There
are existing parsers for most of the programming languages and it is easy
enough to learn and understand. The other choice which would make sense is
JSON, but in the end YAML seemed to be better.

The job configuration, including design and implementation notes, is described
in the 'Job configuration' appendix.

#### Task Types

From the low-level point of view, there are only two types of tasks in a job.
The first type performs internal operations which should work the same way on
all platforms and operating systems. The second type of tasks are the external
ones, which execute an external binary.

Internal tasks should handle at least these operations:

- *fetch* -- fetch a single file from the fileserver
- *copy* -- copy a file between directories
- *remove* -- remove a single file or folder
- *extract* -- extract files from a downloaded archive

These internal operations are essential, but many more can eventually be
implemented.

External tasks, which execute an external binary, should optionally be
runnable in a sandbox. But for security's sake, there is no reason to execute
them outside of a sandbox. So all external tasks are executed within a general,
configurable sandbox. The configuration options for sandboxes are called
limits, and they can specify, for example, time or memory limits.

#### Configuration File Content

The content of the configuration file can be divided into two parts: the first
concerns the job in general and its metadata, the second one relates to the
tasks and their specification.

There is not much to express in the general job metadata. There can be an
identification of the job and some general options, like enabling or disabling
logging. The one really necessary item is the address of the fileserver from
which the supplementary files are downloaded. This option is crucial because
there can be multiple fileservers and the worker has no other way to figure out
where the files might be.

A more interesting situation arises with the metadata of tasks. From the
initial analysis of the evaluation unit and its structure, at least these
generally needed items were derived:

- *task identification* -- an identifier used at least for specifying
  dependencies
- *type* -- as described before, one of: 'initiation', 'execution',
  'evaluation' or 'inner'
- *priority* -- the priority can additionally control the execution flow in
  the task graph
- *dependencies* -- a necessary item for constructing the hierarchy of tasks
  into a DAG
- *execution command* -- the command which should be executed within this
  task, with its possible parameters

The previous list of items is applicable to both internal and external tasks.
Internal tasks do not need any further items, but external ones do. The
additional items are exclusively related to sandboxing and resource limits (a
configuration sketch follows the list):

- *sandbox name* -- there should be a possibility to have multiple sandboxes,
  so an identification of the right one is needed
- *limits* -- hardware and software resource limitations
    - *time limit* -- limits the execution time
    - *memory limit* -- the maximum memory which can be consumed by the
      external program
    - *I/O operations* -- a limitation concerning disk operations
    - *restrict filesystem* -- restrict or enable access to directories
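
For illustration, a single task entry carrying these items could look roughly
like this YAML fragment (the key names and values are a sketch based on the
lists above; the authoritative schema is described in the 'Job configuration'
appendix):

```yaml
tasks:
  - task-id: "eval_test01"
    type: evaluation          # initiation | execution | evaluation | inner
    priority: 5               # higher number runs earlier
    dependencies:
      - "run_test01"
    cmd:
      bin: "judge"
      args: ["expected.out", "actual.out"]
    sandbox:                  # present only for external tasks
      name: "isolate"
      limits:
        time: 2               # seconds
        memory: 65536         # kilobytes
```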

#### Supplementary Files

An interesting problem arises with supplementary files (e.g., inputs, sample
outputs). Two main approaches can be considered: the supplementary files can
be downloaded either at the start of the execution, or during the execution.

If the files are downloaded at the beginning, the execution has not really
started at that point, and thus if there are network problems, the worker will
find out right away and can abort the execution without running a single task.
Slight problems can arise if some of the files need to have a specific name
(e.g., the solution assumes that the input is named `input.txt`). In this
scenario the downloaded files cannot be renamed at the beginning, only during
the execution, which is impractical and not easily observed by the authors of
job configurations.

The second solution, downloading the files on the fly, has quite the opposite
problem. If there are network problems, the worker discovers them during the
execution, possibly when almost the whole execution is done. This is also not
an ideal solution if we care about wasted hardware resources. On the other
hand, with this approach users have advanced control of the execution flow and
know exactly which files are available during the execution, which is probably
more appealing to users than the first solution. Based on that, downloading the
supplementary files using 'fetch' tasks during the execution was chosen and
implemented.

#### Job Variables

Considering the fact that jobs can be executed by workers on different machines
with specific settings, it can be handy to have some kind of mechanism in the
job configuration which will hide these particular worker details, most notably
the specific directory structure. For this purpose, placeholders in the form of
broadly used variables can be introduced.

Variables in general can be used everywhere where configuration values (not
keys) are expected. This implies that the substitution should be done after the
parsing of the job configuration, not before. The only usage of variables which
was considered is for directories within the worker, but in the future this
might be subject to change.

The final form of the variables is `${...}`, where the triple dot stands for a
textual name. This format was chosen because of the special dollar sign
character, which does not normally appear within paths on regular filesystems.
The braces only delimit the textual name of the variable.
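
A minimal sketch of such a substitution, assuming a simple map from variable
names to worker-specific values (illustrative only, not the actual worker
code):

```cpp
#include <map>
#include <string>

// Replace every ${NAME} occurrence in a configuration value with the
// worker-specific value from the map; unknown variables are left intact.
std::string expand_vars(std::string value,
                        const std::map<std::string, std::string>& vars) {
    std::size_t pos = 0;
    while ((pos = value.find("${", pos)) != std::string::npos) {
        std::size_t end = value.find('}', pos + 2);
        if (end == std::string::npos) break; // unterminated, leave as is
        std::string name = value.substr(pos + 2, end - pos - 2);
        auto it = vars.find(name);
        if (it != vars.end()) {
            value.replace(pos, end - pos + 1, it->second);
            pos += it->second.size();
        } else {
            pos = end + 1;
        }
    }
    return value;
}

// Example: expand_vars("${EVAL_DIR}/input.txt", {{"EVAL_DIR", "/work/eval"}})
// yields "/work/eval/input.txt".
```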

### Broker

The broker is responsible for keeping track of available workers and
distributing jobs that it receives from the frontend between them.

#### Worker Management

It is intended for the broker to be a fixed part of the backend infrastructure
to which workers connect at will. Thanks to this design, workers can be added
and removed when necessary (and possibly in an automated fashion), without
changing the configuration of the broker. An alternative solution would be
configuring a list of workers before startup, thus making them passive in the
communication (in the sense that they just wait for incoming jobs instead of
connecting to the broker). However, this approach comes with a notable
administration overhead -- in addition to starting a worker, the administrator
would have to update the worker list.

Worker management must also take into account the possibility of worker
disconnection, either because of a network or software failure (or
termination). A common way to detect such events in distributed systems is to
periodically send short messages to other nodes and expect a response. When
these messages stop arriving, we presume that the other node encountered a
failure. Both the broker and the workers can be made responsible for initiating
these exchanges, and it seems that there are no differences stemming from this
choice. We decided that the workers will be the active party that initiates the
exchange.

#### Scheduling

Jobs should be scheduled in a way that ensures that they will be processed
without unnecessary waiting. This depends on the fairness of the scheduling
algorithm (no worker machine should be overloaded).

The design of such a scheduling algorithm is complicated by the requirements
on the diversity of the workers -- they can differ in operating systems,
available software, computing power and many other aspects.

We decided to keep the details of the connected workers hidden from the
frontend, which should lead to a better separation of responsibilities and
flexibility. Therefore, the frontend needs a way of communicating its
requirements on the machine that processes a job without knowing anything about
the available workers. A key-value structure is suitable for representing such
requirements.

With respect to these constraints, and because the analysis and design of a
more sophisticated solution was declared out of the scope of our project
assignment, a rather simple scheduling algorithm was chosen. The broker shall
maintain a queue of available workers. When assigning a job, it traverses this
queue and chooses the first machine that matches the requirements of the job.
This machine is then moved to the end of the queue.
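
The described strategy can be sketched as follows (the `Worker` and `Job`
types and the `matches` predicate are illustrative assumptions, not the actual
broker code):

```cpp
#include <cstddef>
#include <deque>
#include <map>
#include <string>

// Illustrative types: a worker advertises key-value headers, a job carries
// key-value requirements that must all be satisfied.
using Headers = std::map<std::string, std::string>;
struct Worker { std::string id; Headers headers; };
struct Job    { std::string id; Headers requirements; };

bool matches(const Worker& w, const Job& j) {
    for (const auto& [key, value] : j.requirements) {
        auto it = w.headers.find(key);
        if (it == w.headers.end() || it->second != value) return false;
    }
    return true;
}

// Round-robin assignment: pick the first matching worker in the queue and
// move it to the back, so the load spreads across equivalent workers.
Worker* assign(std::deque<Worker>& queue, const Job& job) {
    for (std::size_t i = 0; i < queue.size(); ++i) {
        if (matches(queue[i], job)) {
            Worker chosen = queue[i];
            queue.erase(queue.begin() + i);
            queue.push_back(chosen);
            return &queue.back();
        }
    }
    return nullptr; // no worker satisfies the requirements
}
```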

The presented algorithm results in a simple round-robin load balancing
strategy, which should be sufficient for small-scale deployments (such as a
single university). However, with a large number of jobs, some workers could
easily become overloaded. The implementation must allow for a simple
replacement of the load balancing strategy, so that this problem can be solved
in the near future.

#### Forwarding Jobs

The information about a job can be divided into two disjoint parts -- what the
worker needs to know to process it, and what the broker needs to forward it to
the correct worker. It remains to be decided how this information will be
transferred to its destination.

It is technically possible to transfer all the data required by the worker at
once through the broker. This package could contain the submitted files, test
data, requirements on the worker, etc. A drawback of this solution is that
both the submitted files and the test data can be rather large. Furthermore,
it is likely that the test data would be transferred many times.

Because of these facts, we decided to store the data required by the worker in
a shared storage space and only send a link to this data through the broker.
This approach leads to a more efficient network and resource utilization (the
broker does not have to process data that it does not need), but it also makes
the job submission flow more complicated.

#### Further Requirements

The broker can be viewed as the central point of the backend. While it has
only two primary, closely related responsibilities, other requirements have
arisen (forwarding messages about the job evaluation progress back to the
frontend) and will arise in the future. To facilitate such requirements, its
architecture should allow new communication flows to be added simply. It
should also be as asynchronous as possible to enable efficient communication
with external services, for example via HTTP.

### Worker

The worker is the component which executes incoming jobs from the broker. As
such, the worker should run on and support a wide range of different
infrastructures and platforms/operating systems. Support of at least the two
main operating systems is desirable and should be implemented.

The worker as a service does not have to be very complicated, but a bit of
complex behaviour is needed. The mentioned complexity is almost exclusively
concerned with robust communication with the broker, which has to be checked
regularly. A ping mechanism is usually used for this in all kinds of projects.
This means that the worker should be able to send ping messages even during
execution. So the worker has to be divided into two separate parts: one which
handles the communication with the broker, and another which executes jobs.

The easiest solution is to have these parts in separate threads which
communicate tightly with each other. For communication between the threads,
numerous technologies can be used, from shared memory to condition variables
or some kind of in-process messages. The ZeroMQ library which we already use
provides in-process messages that work on the same principles as network
communication, which is convenient and solves the problems with thread
synchronization.
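
A sketch of the in-process wiring between the two parts (the endpoint name and
socket types are illustrative; error handling is omitted):

```cpp
#include <string>
#include <thread>
#include <zmq.hpp>

int main() {
    // Both threads must share one ZeroMQ context for the inproc transport.
    zmq::context_t ctx;

    // Listening part (here the main thread): binds an in-process endpoint
    // through which it forwards incoming jobs to the execution part.
    zmq::socket_t to_execution(ctx, zmq::socket_type::push);
    to_execution.bind("inproc://jobs");

    // Execution thread: receives jobs exactly as if they came over the
    // network, which keeps both parts nicely decoupled.
    std::thread execution([&ctx] {
        zmq::socket_t jobs(ctx, zmq::socket_type::pull);
        jobs.connect("inproc://jobs");
        zmq::message_t job;
        (void)jobs.recv(job, zmq::recv_flags::none);
        // ... prepare the environment, run the tasks, upload the results ...
    });

    to_execution.send(zmq::buffer(std::string("job #42")),
                      zmq::send_flags::none);
    execution.join();
    return 0;
}
```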

#### Execution of Jobs

At this point, the worker consists of two internal parts, the listening one
and the executing one. The implementation of the first one is quite
straightforward, so let us discuss what should be happening in the execution
subsystem.

After a job successfully arrives from the broker to the listening thread, it
is immediately redirected to the execution thread. There, the worker has to
prepare a new execution environment; the solution archive has to be downloaded
from the fileserver and extracted. The job configuration, located within these
files, is loaded into internal structures and executed. After that, the
results are uploaded back to the fileserver. These are the basic steps which
are really necessary for the whole execution and which have to be performed in
this precise order.

The evaluation unit executed by ReCodEx and the job configuration were already
discussed above. The conclusion was that jobs containing small tasks will be
used. The particular format of the actual job configuration can be found in
the 'Job configuration' appendix. The implementation of parsing and storing
these data in the worker is then quite straightforward.

The worker has internal structures into which it loads and in which it stores
the metadata given in the configuration. The whole job is mapped to a job
metadata structure, and the tasks are mapped either to external ones or to
internal ones (internal commands have to be defined within the worker); the
two kinds differ in whether they are executed in a sandbox or as internal
worker commands.
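
Such structures could look roughly like the following sketch (the field names
are illustrative and do not mirror the actual worker code):

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Illustrative metadata structures the worker could parse the YAML job
// configuration into.
struct SandboxLimits {
    double time_limit = 0;        // seconds
    std::size_t memory_limit = 0; // kilobytes
};

struct TaskMetadata {
    std::string id;
    std::string type;                      // initiation/execution/evaluation/inner
    unsigned priority = 1;
    std::vector<std::string> dependencies;
    std::string binary;                    // command to run
    std::vector<std::string> arguments;
    // Present only for external tasks; internal tasks run unsandboxed
    // worker commands (fetch, copy, remove, extract, ...).
    std::unique_ptr<SandboxLimits> sandbox;
};

struct JobMetadata {
    std::string job_id;
    std::string fileserver_url;      // where the supplementary files live
    bool logging = false;
    std::vector<TaskMetadata> tasks; // in configuration order
};
```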
|
||
|
|
||
|
#### Task Execution Failure

Tasks are further divided by the task-type field in the configuration. This
field can have four values: initiation, execution, evaluation, and inner. All
of them were discussed and described above in the evaluation unit analysis.
What is important to the worker is how to behave if the execution of a task
with some particular type fails.

There are two possible situations: execution fails due to a bad user solution,
or due to some internal error. If the execution fails because of an internal
error, the solution cannot be outright declared as failed; the user should not
be punished for a bad configuration or some network error. This is where task
types are useful.

Initiation, execution, and evaluation tasks usually execute code submitted by
the users as solutions of exercises. If these kinds of tasks fail, the failure
is probably connected with a bad user solution and the job can still be
evaluated.

But if some inner task fails, the solution should be re-executed, in the best
case scenario on a different worker. That is why when an inner task fails, the
job is sent back to the broker, which will reassign it to another worker. More
on this subject is discussed in the section on broker assigning algorithms.

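A sketch of this decision rule follows (in Python for brevity; the `task` and
`job` objects and their methods are hypothetical, introduced only to
illustrate the rule):

```python
USER_TASK_TYPES = {"initiation", "execution", "evaluation"}

def handle_task_failure(task, job):
    if task.type in USER_TASK_TYPES:
        # Very likely the user's fault -- keep the results and let the
        # job be evaluated (and graded) as a failed solution.
        job.mark_solution_failed(task)
    else:  # "inner" task
        # Internal error (bad configuration, network outage, ...) --
        # the user must not be punished, so ask the broker to reassign
        # the job to another worker.
        job.report_internal_error_to_broker(task)
```
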
#### Job Working Directories

There is also the question of the working directory or directories of a job:
which directories should be used and what for. One simple answer would be that
every job has only one specified directory containing every file that the
worker works with in the scope of the whole job execution. This solution is
easy, but fails for logical and security reasons.

The least that must be done is two folders, one for internal temporary files
and a second one for the evaluation. The directory for temporary files is
enough to cover all kinds of internal work with the filesystem, but only one
directory for the whole evaluation is not sufficient.

The solution which was chosen in the end is to have separate folders for the
downloaded archive, the decompressed solution, the evaluation directory in
which the user solution is executed, and then folders for temporary files and
for results and generally files which should be uploaded back to the
fileserver with the solution results.

There also has to be a hierarchy which separates folders of different workers
on the same machine. That is why paths to the directories have the format
`${DEFAULT}/${FOLDER}/${WORKER_ID}/${JOB_ID}`, where `DEFAULT` means the
default working directory of the whole worker and `FOLDER` is the particular
directory for some purpose (archives, evaluation, ...).

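A trivial sketch of constructing such a path (in Python; the concrete base
path is illustrative):

```python
import os

# Mirrors the ${DEFAULT}/${FOLDER}/${WORKER_ID}/${JOB_ID} template.
def job_dir(default, folder, worker_id, job_id):
    return os.path.join(default, folder, str(worker_id), str(job_id))

# e.g. /var/recodex/worker/archives/1/job-42 (illustrative values)
print(job_dir("/var/recodex/worker", "archives", 1, "job-42"))
```
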
The described division of job directories proved to be flexible and detailed
enough. Everything is in logical units and located where it is supposed to be,
which means that searching through this structure should be easy. In addition,
if the solutions of users have access only to the evaluation directory, then
they do not have access to unnecessary files, which is better for the overall
security of ReCodEx.

### Sandboxing

There are numerous ways to approach sandboxing on different platforms, and
describing all possible approaches is out of the scope of this document.
Instead, let us have a look at some of the features which are certainly needed
for ReCodEx and propose particular sandbox implementations on Linux and
Windows.

The general purpose of a sandbox is to safely execute software in any form,
from scripts to binaries. Various sandboxes differ in how safe they are and in
what limiting features they offer. Ideally, a sandbox has numerous options and
corresponding features which allow administrators to set up the environment as
they like and which do not allow user programs to damage the executing machine
in any way possible.

For ReCodEx and its evaluation, at least these features are needed: execution
time and memory limits, a disk operations limit, disk accessibility
restrictions, and network restrictions. All these features, if combined and
implemented well, give a reasonably safe sandbox which can be used for all
kinds of user solutions and should be able to restrict and stop any standard
kind of attack or error.

#### Linux

Linux systems have quite extensive support for sandboxing in the kernel:
kernel namespaces and cgroups were introduced and implemented, and combined
they can limit hardware resources (CPU, memory) and separate the executed
program into its own namespaces (PID, network). These two features satisfy the
sandbox requirements of ReCodEx, so there were two options: either find an
existing solution or implement a new one. Luckily, an existing solution was
found, and its name is **isolate**. Isolate does not use all the available
kernel features, but only a subset which is still sufficient for ReCodEx.

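A minimal sketch of driving isolate from a script follows (in Python; the
option names reflect our understanding of the isolate command line, so the
isolate manual page should be consulted for the authoritative names, units,
and semantics):

```python
import subprocess

BOX = "0"  # isolate supports several numbered sandboxes per machine

subprocess.run(["isolate", "--box-id=" + BOX, "--init"], check=True)
try:
    subprocess.run(
        ["isolate", "--box-id=" + BOX,
         "--time=2",         # CPU time limit (seconds)
         "--wall-time=4",    # wall-clock time limit (seconds)
         "--mem=262144",     # memory limit (kilobytes)
         "--meta=meta.txt",  # machine-readable results (status, time used)
         "--run", "--", "./solution"],
        check=False,  # a non-zero exit code here may just mean the
    )                 # sandboxed program failed, not isolate itself
finally:
    subprocess.run(["isolate", "--box-id=" + BOX, "--cleanup"], check=True)
```
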
#### Windows

The opposite situation is in the Windows world: there is only limited support
in its kernel, which makes sandboxing a bit trickier. The Windows kernel only
has ways to restrict the privileges of a process through restriction of its
internal access tokens. Monitoring of hardware resources is not possible, but
the used resources can be obtained through newly created job objects.

There are numerous sandboxes for Windows, but they all focus on different
things; in a lot of cases they serve as safe environments for malicious
programs, viruses in particular. Others are designed as separate filesystem
namespaces for installing many temporarily used programs. Of these we can
mention Sandboxie, Comodo Internet Security, the Cuckoo sandbox, and many
others. None of them is fitted as a sandbox solution for ReCodEx. With this
being said, we can safely state that designing and implementing a new general
sandbox for Windows is out of the scope of this project.

But designing a sandbox only for a specific environment is possible, namely
for C# and .NET. The CLR as a virtual machine and runtime environment has
pretty good security support for restrictions and separation, which is also
available from C#. This makes it quite easy to implement a simple sandbox
within C#, but there are no well-known general purpose implementations.

As mentioned in the previous paragraphs, implementing our own solution is out
of the scope of this project. But a C# sandbox is quite a good topic for
another project, for example a term project for a C# course, so it might be
written and integrated in the future.

### Fileserver

The fileserver provides access over HTTP to a shared storage space that contains
files submitted by students, supplementary files such as test inputs and outputs
and results of evaluation. In other words, it acts as an intermediate storage
node for data passed between the frontend and the backend. This functionality
can be easily separated from the rest of the backend features, which led to
designing the fileserver as a standalone component. Such design helps
encapsulate the details of how the files are stored (e.g. on a file system, in a
database or using a cloud storage service), while also making it possible to
share the storage between multiple ReCodEx frontends.

For early releases of the system, we chose to store all files on the file system
-- it is the least complicated solution (in terms of implementation complexity)
and the storage backend can be rather easily migrated to a different technology.

One of the facts we learned from CodEx is that many exercises share test input
and output files, and also that these files can be rather large (hundreds of
megabytes). A direct consequence of this is that we cannot add these files to
submission archives that are to be downloaded by workers -- the combined size of
the archives would quickly exceed gigabytes, which is impractical. Another
conclusion we made is that a way to deal with duplicate files must be
introduced.

A simple solution to this problem is storing supplementary files under the
hashes of their content. This ensures that every file is stored only once. On
the other hand, it makes it more difficult to understand what the content of a
file is at a glance, which might prove problematic for the administrator.
However, human-readable identification is not as important as removing
duplicates -- administrators rarely need to inspect stored files (and when they
do, they should know their hashes), but duplicate files occupied a large part of
the disk space used by CodEx.

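A sketch of such content-addressed storage follows (in Python; SHA-1 and the
storage path are illustrative choices, not something the design prescribes):

```python
import hashlib
import os
import shutil

STORAGE_ROOT = "/var/recodex/fileserver"  # illustrative path

def store_supplementary_file(path):
    # Hash the file content in chunks so large files fit in memory.
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    name = digest.hexdigest()
    target = os.path.join(STORAGE_ROOT, name)
    if not os.path.exists(target):  # deduplication happens here
        shutil.copyfile(path, target)
    return name  # the hash serves as the file identifier from now on
```
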
A notable part of the work of the fileserver is done by a web server (e.g.
listening to HTTP requests and caching recently accessed files in memory for
faster access). What remains to be implemented is handling requests that upload
files -- student submissions should be stored in archives to facilitate simple
downloading and supplementary exercise files need to be stored under their
hashes.

We decided to use Python and the Flask web framework. This combination makes it
possible to express the logic in ~100 SLOC and also provides means to run the
fileserver as a standalone service (without a web server), which is useful for
development.

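To give an impression of how small such a service can be, here is a minimal
Flask sketch; the endpoints, paths, and field names are illustrative, not the
actual ReCodEx fileserver API:

```python
from flask import Flask, request, send_from_directory

app = Flask(__name__)
STORAGE = "/var/recodex/fileserver"  # illustrative path

@app.route("/submissions/<job_id>", methods=["PUT"])
def upload_submission(job_id):
    # Store the submitted archive as-is for later download by a worker.
    request.files["archive"].save(f"{STORAGE}/submissions/{job_id}.zip")
    return "", 204

@app.route("/supplementary/<name>", methods=["GET"])
def download_supplementary(name):
    # Supplementary files are addressed by the hash of their content.
    return send_from_directory(f"{STORAGE}/supplementary", name)

if __name__ == "__main__":
    app.run()  # standalone development server, as mentioned above
```
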
### Cleaner

The worker can use a caching mechanism for files from the fileserver under one
condition: the provided files have to have unique names. This means there has
to be a system which can download a file, store it in a cache and, after some
time of inactivity, delete it. Because there can be multiple worker instances
on a particular server, it is not efficient for every worker to have such a
system of its own. Instead, it is feasible to share this functionality among
all workers on the same machine.

One solution might again be a separate service connected to the workers
through the network, but this would mean a component with yet another
communication channel for a purpose where it is not exactly needed. Mainly, it
would be a single point of failure; if it stopped working, it would be quite a
problem.

Instead, another solution was chosen, which assumes that the worker has access
to a specified cache folder. The worker can download supplementary files into
this folder and copy them from there. This means every worker is able to
maintain downloads to the cache, but what a worker cannot properly do is the
deletion of unused files after some time.

#### Architecture
|
||
|
|
||
|
For that functionality single-purpose component is introduced which is called
|
||
|
'cleaner'. It is simple script executed within cron which is able to delete
|
||
|
files which were unused for some time. Together with worker fetching feature
|
||
|
cleaner completes particular server specific caching system.
|
||
|
|
||
|
The cleaner, as mentioned, is a simple script which is executed regularly as a
cron job. Given the caching system introduced in the paragraph above, there
are only a few ways in which the cleaner can be implemented.

Filesystems usually support two relevant timestamps, the last access time and
the last modification time. Files in the cache are downloaded once and then
just copied; this means that the last modification time is set only once, on
the creation of the file, while the last access time should be updated on
every copy. From this we can conclude that the last access time is what is
needed here.

But unlike the last modification time, the last access time is not always
enabled on conventional filesystems (more on this subject can be found
[here](https://en.wikipedia.org/wiki/Stat_%28system_call%29#Criticism_of_atime)).
So if we choose to use the last access time, the filesystem used for the cache
folder has to have it enabled for files. The last access time was chosen for
the implementation in ReCodEx, but this might change in further releases.

However, there is another way: the last modification time, which is broadly
supported, can be used instead. But this solution is not automatic and the
worker would have to 'touch' the cache files whenever they are accessed. This
solution is arguably a bit better than the one with the last access time and
might be implemented in future releases.

#### Caching Flow

Having the cleaner as a separate component and the caching itself handled in
the worker is somewhat blurry, and it is not immediately obvious that the
whole mechanism works without problems. The goal is to have a system which can
recover from all kinds of errors.

A description of one possible implementation follows. The whole mechanism
relies on the ability of the worker to recover from an internal fetch task
failure. In case of an error, the job will be reassigned to another worker
where the problem hopefully does not arise.

Let us start with the worker implementation (see the sketch below the list):

- the worker discovers a fetch task which should download a supplementary file
- the worker takes the name of the file and tries to copy it from the cache
  folder to its working folder
    - if successful, the last access time is rewritten (by the filesystem
      itself) and the whole operation is done
    - if not successful, the file has to be downloaded
        - the file is downloaded from the fileserver to the working folder and
          then copied to the cache

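A sketch of this worker-side logic (in Python for brevity; the `fileserver`
object and the cache path are hypothetical, introduced only for illustration):

```python
import os
import shutil

CACHE_DIR = "/var/recodex/cache"  # illustrative path

def fetch_supplementary_file(name, workdir, fileserver):
    cached = os.path.join(CACHE_DIR, name)
    target = os.path.join(workdir, name)
    try:
        # Copying also updates the last access time of the cached file,
        # which keeps the cleaner from deleting it.
        shutil.copyfile(cached, target)
    except OSError:
        # Cache miss (or the cleaner removed the file mid-flight) --
        # download to the working folder first, then publish to cache.
        fileserver.download(name, target)
        shutil.copyfile(target, cached)
```
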
The previous implementation exists only within the worker; the cleaner can
intervene at any time and delete files. The implementation of the cleaner
follows (a sketch is given below the list):

- on startup, the cleaner stores the current reference timestamp, which will
  be used for comparison, and loads the configuration values of the cache
  folder and the maximal file age
- it loops through all files and even directories in the specified cache
  folder
- if the difference between the last access time and the reference timestamp
  is greater than the specified maximal file age, the file or folder is
  deleted

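The whole cleaner can be expressed in a few lines; a minimal sketch follows,
assuming the configuration values shown at the top (both are illustrative):

```python
import os
import shutil
import time

CACHE_DIR = "/var/recodex/cache"   # illustrative configuration values
MAX_AGE = 24 * 3600                # maximal file age in seconds

now = time.time()  # reference timestamp taken once, at startup
for entry in os.listdir(CACHE_DIR):
    path = os.path.join(CACHE_DIR, entry)
    # Delete anything whose last access time exceeds the maximal age;
    # this requires atime to be enabled on the cache filesystem.
    if now - os.stat(path).st_atime > MAX_AGE:
        if os.path.isdir(path):
            shutil.rmtree(path)
        else:
            os.remove(path)
```
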
The previous description implies that there is a gap between the detection of
the last access time and the deletion of the file by the cleaner. Within this
gap a worker may access the file; the file is deleted anyway, but this is fine
because the worker already has its copy. If the worker has not copied the
whole file, or has not even started copying it, and the file is deleted, the
copy operation will fail. This will cause an internal task failure, which will
be handled by reassigning the job to another worker.

Another problem could be two workers downloading the same file, but this is
not a problem either: the file is first downloaded to the working folder and
only after that copied to the cache.

And even if something else unexpectedly fails, causing the fetch task to fail
during execution, this should be fine, as mentioned previously. Reassigning
the job should be the last salvation in case everything else goes wrong.

### Monitor

Users want to see the real-time evaluation progress of their solutions. This
can be easily done with an established two-way connection stream, but it is
hard to achieve with plain HTTP. HTTP itself works on a separate request basis
with no long term connection. The HTML5 specification contains Server-Sent
Events -- a means of sending text messages unidirectionally from an HTTP server
to a subscribed website. Sadly, it is not supported in Internet Explorer and
Edge.

However, there is another widely used technology that can solve this problem --
the WebSocket protocol. It is more general than necessary (it enables
bidirectional communication) and requires additional web server configuration,
but it is supported in recent versions of all major web browsers.

Working with the WebSocket protocol from the backend is possible, but not ideal
from the design point of view. The backend should be hidden from the public
internet to minimize the surface for possible attacks. With this in mind, there
are two possible options:

- send progress messages through the API
- make a separate component that forwards progress messages to clients

Both of these possibilities have their benefits and drawbacks. The first one
requires no additional component and the API is already publicly visible. On
the other hand, working with WebSockets from PHP is complicated (although
possible with the help of third-party libraries) and embedding this
functionality into the API is not extensible. The second approach is better
for future changes of the protocol or for implementing extensions like caching
of messages. Also, the progress feature is considered optional, because there
may be clients for which this feature is useless. The major drawback of a
separate component is that it is another part which needs to be publicly
exposed.

We decided to make a separate component, mainly because it is a smaller
component with only one role, it is easier to maintain, and the demand for the
progress callback is only optional.

There are several possibilities how to write the component. Notably, the
considered options were the languages already used in the project -- C++, PHP,
JavaScript, and Python. In the end, the Python language was chosen for its
simplicity, its great support for all the used technologies, and because there
were available Python developers in our team.

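A minimal sketch of such a forwarding component follows, assuming pyzmq and
the modern `websockets` library (version 10 or newer); the socket addresses
and the plain-string message format are illustrative, not the actual monitor
protocol:

```python
import asyncio
import zmq
import zmq.asyncio
import websockets

clients = set()

async def websocket_handler(websocket):
    # Keep track of subscribed clients for the lifetime of the connection.
    clients.add(websocket)
    try:
        await websocket.wait_closed()
    finally:
        clients.remove(websocket)

async def forward_progress():
    ctx = zmq.asyncio.Context()
    sock = ctx.socket(zmq.SUB)
    sock.connect("tcp://127.0.0.1:7894")       # backend progress endpoint
    sock.setsockopt_string(zmq.SUBSCRIBE, "")  # subscribe to everything
    while True:
        message = await sock.recv_string()
        websockets.broadcast(clients, message)

async def main():
    async with websockets.serve(websocket_handler, "0.0.0.0", 4567):
        await forward_progress()

asyncio.run(main())
```
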
### API Server

The API server must handle HTTP requests and manage the state of the application
in some kind of a database. The API server will be a RESTful service and will
return data encoded as JSON documents. It must also be able to communicate with
the backend over ZeroMQ.

We considered several technologies which could be used:

- PHP + Apache -- one of the most widely used technologies for creating web
  servers. It is a suitable technology for this kind of a project. It has all
  the features we need when some additional extensions are installed (to support
  LDAP or ZeroMQ).
- Ruby on Rails, Python (Django), etc. -- popular web technologies that appeared
  in the last decade. Both support ZeroMQ and LDAP via extensions and have large
  developer communities.
- ASP.NET (C#), JSP (Java) -- these technologies are very robust and are used to
  create server applications in many big enterprises. Both can run on Windows
  and Linux servers (ASP.NET using the .NET Core).
- JavaScript (Node.js) -- a rather new technology which has lately been used to
  create REST APIs. Applications running on Node.js are quite performant and
  the number of open-source libraries available on the Internet is very large.

We chose PHP and Apache mainly because we were familiar with these technologies
and we were able to develop all the features we needed without learning to use
a new technology, which mattered since the number of features was quite high
and we needed to meet a strict deadline. This does not mean that we would find
all the other technologies superior to PHP in all other aspects -- PHP 7 is a
mature language with a huge community and a wide range of tools, libraries,
and frameworks.

We decided to use an ORM framework to manage the database, namely the widely
used PHP ORM Doctrine 2. Using an ORM tool means we do not have to write SQL
queries by hand. Instead, we work with persistent objects, which provides a
higher level of abstraction. Doctrine also has a robust database abstraction
layer so the database engine is not very important and it can be changed without
any need for changing the code. MariaDB was chosen as the storage backend.

To speed up the development process of the PHP server application we decided to
use a web framework. After evaluating and trying several frameworks, such as
Lumen, Laravel, and Symfony, we ended up using Nette.

- **Lumen** and **Laravel** seemed promising but the default ORM framework
  Eloquent is an implementation of ActiveRecord, which we wanted to avoid. It
  was also surprisingly complicated to implement custom middleware for
  validation of access tokens in the headers of incoming HTTP requests.
- **Symfony** is a very good framework and has Doctrine "built-in". The reason
  why we did not use Symfony in the end was our lack of experience with this
  framework.
- **Nette framework** is very popular in the Czech Republic -- its lead
  developer is a well-known Czech programmer, David Grudl. We were already
  familiar with the patterns used in this framework, such as dependency
  injection, authentication, and routing. These concepts are useful even when
  developing a REST application, which might be a surprise considering that
  Nette focuses on "traditional" web applications.
  Nette is inspired by Symfony and many of the Symfony bundles are available
  as components or extensions for Nette. There is for example a Nette
  extension which makes the integration of Doctrine 2 very straightforward.

#### Architecture of The System

The Nette framework is an MVP (Model, View, Presenter) framework. It has many
tools for creating complex websites; we need only a subset of them, or we use
different libraries which suit our purposes better:

- **Model** - the model layer is implemented using the Doctrine 2 ORM instead
  of Nette Database
- **View** - the whole view layer of the Nette framework (e.g., the Latte
  engine used for HTML template rendering) is unnecessary since we will return
  all the responses encoded in JSON. JSON is a common format used in APIs and
  we decided to prefer it to XML or a custom format.
- **Presenter** - the whole request processing lifecycle of the Nette
  framework is used. The Presenters are used to group the logic of the
  individual API endpoints. The routing mechanism is modified to distinguish
  the actions by both the URL and the HTTP method of the request.

#### Authentication

To make certain data and actions accessible only to specific users, there must
be a way for these users to prove their identity. We decided to avoid PHP
sessions to keep the server stateless (a session ID would be stored in the
cookies of the HTTP requests and responses). Instead, the server issues a
specific token for the user after his/her identity is verified (i.e., by
providing an email and password); the token is sent to the client in the body
of the HTTP response. The client must remember this token and attach it to
every following request in the *Authorization* header.

The token must be valid only for a certain time period ("logging out" the user
after a few hours of inactivity) and it must be protected against abuse (e.g.,
an attacker must not be able to issue a token which will be considered valid
by the system and using which the attacker could pretend to be a different
user). We decided to use the JWT standard (specifically, the JWS).

A JWT is a base64-encoded string which contains three JSON documents -- a
header, a payload, and a signature. The interesting parts are the payload and
the signature: the payload can contain any data which can identify the user
and metadata of the token (i.e., the time when the token was issued, the time
of expiration). The last part is a digital signature of the header and the
payload, which ensures that nobody can issue their own token and steal
someone's identity. Both of these characteristics give us the opportunity to
validate the token without storing all of the tokens in the database.

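To illustrate this token lifecycle, here is a conceptual sketch in Python
using the PyJWT library (the actual API server is written in PHP and uses an
analogous third-party library; the secret and validity period are
illustrative):

```python
import time
import jwt  # PyJWT

SECRET = "server-side secret key"  # illustrative

# Issue a token after verifying the user's credentials.
def issue_token(user_id, validity=2 * 3600):
    now = int(time.time())
    payload = {"sub": user_id, "iat": now, "exp": now + validity}
    return jwt.encode(payload, SECRET, algorithm="HS256")

# Validate a token from the Authorization header; the signature and
# the expiration are checked without any database lookup.
def authenticate(token):
    try:
        return jwt.decode(token, SECRET, algorithms=["HS256"])["sub"]
    except jwt.InvalidTokenError:
        return None
```
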
To implement JWT in Nette, we have to implement some of its security-related
interfaces, such as IAuthenticator and IUserStorage, which is rather easy
thanks to the simple authentication flow. Replacing these services in a Nette
application is also straightforward, thanks to its dependency injection
container implementation. The encoding and decoding of the tokens themselves,
including generating and verifying the signature, is done through a widely
used third-party library, which lowers the risk of having a bug in the
implementation of this critical security feature.

##### Backend Monitoring

The next thing related to communication with the backend is monitoring its
current state. This concerns namely which workers are available for processing
different hardware groups and, therefore, which languages can be used in
exercises.

Another step would be monitoring the overall backend state, like how many jobs
were processed by a particular worker, the workload of the broker and the
workers, etc. The easiest solution is to manage this information by hand;
every instance of the API server would need an administrator who would have to
fill it in. This is feasible only for the currently available workers and
runtime environments, which do not change very often. The real-time statistics
of the backend cannot reasonably be made accessible this way.

A better solution is to update this information automatically. This can be
done in two ways:

- it can be provided by the backend on demand when the API needs it
- the backend can send this information periodically to the API

Things like the currently available workers or runtime environments should be
really up-to-date, so they are better provided on demand when needed. Backend
statistics are not that necessary and could be updated periodically.

However, due to the lack of time, automatic monitoring of the backend state
will not be implemented in the early versions of this project, but it might be
implemented in one of the next releases.

### Web Application

The web application ("WebApp") is one of the possible client applications of
the ReCodEx system. Creating a web application as the first client application
has several advantages:

- no installation or setup is required on the device of the user
- it works on all platforms including mobile devices
- when a new version is released, all the clients will use this version without
  any need for manual installation of the update

One of the downsides is the large number of different web browsers (including
the older versions of a specific browser) and their different interpretations
of the code (HTML, CSS, JS). Some features of the latest HTML5 specifications
are implemented only in some browsers, which are used by only a subset of the
Internet users. This has to be taken into account when choosing appropriate
tools for the implementation of a website.

There are a few basic ways to create a website these days:

- **server-side approach** - the actions of the user are processed on the
  server and the HTML code with the results of the action is generated on the
  server and sent back to the web browser of the user. The client does not
  handle any logic (apart from rendering of the user interface and some basic
  user interaction) and is therefore very simple. The server can use the API
  server for processing of the actions so the business logic of the server can
  be very simple as well. A disadvantage of this approach is that a lot of
  redundant data is transferred across the requests, although some parts of the
  content can be cached (e.g., CSS files). This results in longer loading times
  of the website.
- **server-side rendering with asynchronous updates (AJAX)** - a slightly
  different approach is to render the page on the server as in the previous
  case but then execute the actions of the user asynchronously using the
  `XMLHttpRequest` JavaScript functionality, which creates an HTTP request and
  transfers only the part of the website which will be updated.
- **client-side approach** - the opposite approach is to move the communication
  with the API server and the rendering of the HTML completely from the server
  directly to the client. The client runs the code (usually JavaScript) in
  his/her web browser and the content of the website is generated based on the
  data received from the API server. The script file is usually quite large,
  but it can be cached and does not have to be downloaded from the server again
  (until the cached file expires). Only the data from the API server needs to
  be transferred over the Internet, which reduces the volume of payload on each
  request and leads to a much more responsive user experience, especially on
  slower networks. Since the client-side code has full control over the UI,
  more sophisticated user interactions with the UI can also be achieved.

All of these approaches are used in production by web developers, all of them
are well documented, and there are mature tools for creating websites using
any of them.

We decided to use the third approach -- to create a fully client-side
application which would be familiar and intuitive for a user who is used to
modern web applications.

#### Used Technologies

We examined several frameworks which are commonly used to speed up the
development of a web application. There are several open source options
available with a large number of tools, tutorials, and libraries. From the many
options (Backbone, Ember, Vue, Cycle.js, ...) there are two main frameworks
worth considering:

- **Angular 2** - a new framework developed by Google. This framework is very
  complex and provides the developer with many tools which make creating a
  website very straightforward. The code can be written in pure JavaScript
  (ES5) or using the TypeScript language which is then transpiled into
  JavaScript. Creating a web application in Angular 2 is based on creating and
  composing components. The previous version of Angular is not compatible with
  this new version.
- **React and Redux** - [React](https://facebook.github.io/react) is a fast
  library for rendering of the user interface developed by Facebook. It is
  based on component composition as well. A React application is usually
  written in EcmaScript 6 with the JSX syntax for defining the component tree.
  This code is usually transpiled to JavaScript (ES5) using some kind of a
  transpiler like Babel. [Redux](http://redux.js.org/) is a library for
  managing the state of the application and it implements a modification of
  the so-called Flux architecture introduced by Facebook. React and Redux have
  been in use for a longer time than Angular 2 and both are still actively
  developed. There are many open-source components and addons available for
  both React and Redux.

We decided to use React and Redux over Angular 2 for several reasons:

- There is a large community around these libraries and there is a large number
  of tutorials, libraries, and other resources available online.
- Many web frontend developers are familiar with React and Redux and
  contributing to the project should be easy for them.
- A stable version of Angular 2 had still not been released at the time we
  started developing the web application.
- We had previous experience with React and Redux, and Angular 2 did not bring
  any significant improvements or features over React, so it would not be worth
  learning the paradigms of a new framework.
- It is easy to debug the React component tree and Redux state transitions
  using extensions for Google Chrome and Firefox.

##### Internationalization And Globalization

The user interface must be accessible in multiple languages and should be
easily translatable into more languages in the future. The most promising
library which enables React applications to translate all of the messages of
the UI is [react-intl](https://github.com/yahoo/react-intl).

A good JavaScript library for the manipulation of dates and times is
[Moment.js](http://momentjs.com). It is used by many open-source React
components like date and time pickers.

#### User Interface Design

There is no artist on the team, so we had to come up with an idea how to create
a visually appealing application with this handicap. User interfaces created by
programmers are notoriously ugly and unintuitive. Luckily we found the
[AdminLTE](https://almsaeedstudio.com/) theme by Abdullah Almsaeed, which is
built on top of the [Bootstrap framework](http://getbootstrap.com/) by Twitter.

This is a great combination because there is an open-source implementation of
the Bootstrap components for React, and with the stylesheets from AdminLTE the
application looks good and is distinguishable from the many websites using the
Bootstrap framework with very little work.

<!---
// vim: set formatoptions=tqn flp+=\\\|^\\*\\s* textwidth=80 colorcolumn=+1:
-->