# Implementation

## Broker

The broker is a central part of the ReCodEx backend that directs most of the communication. It was designed to sustain a heavy load of messages by performing only small actions in the main communication thread and executing other actions asynchronously. The responsibilities of the broker are:

- allowing workers to register themselves and keeping track of their capabilities
- tracking the status of each worker and handling cases when they crash
- accepting assignment evaluation requests from the frontend and forwarding them to workers
- receiving job status information from workers and forwarding it to the frontend, either via the monitor or the REST API
- notifying the frontend about errors in the backend

### Internal Structure

The main work of the broker is to handle incoming messages. For that purpose a _reactor_ subcomponent was written which binds events on sockets to handler classes. There are currently two handlers -- one that implements the main functionality and another that sends status reports to the REST API asynchronously. This prevents the broker from freezing when it waits synchronously for responses to HTTP requests, especially when some kind of error happens on the server.

The main handler takes care of requests from workers and API servers:

- *init* -- initial connection from a worker to the broker
- *done* -- the job currently processed by a worker was executed and is done
- *ping* -- a worker proving that it is still alive
- *progress* -- job progress state from a worker which is immediately forwarded to the monitor
- *eval* -- request from the API server to execute a given job

The second handler is an asynchronous status notifier which is able to execute HTTP requests. This notifier is used for error reporting from the backend to the frontend API.

#### Worker Registry

The `worker_registry` class is used to store information about workers, their status and the jobs in their queue. It can look up a worker using the headers received with a request (a worker is considered suitable if and only if it satisfies all the job headers). The headers are arbitrary key-value pairs, which are checked for equality by the broker. However, some headers require special handling, namely `threads`, for which we check whether the value in the request is less than or equal to the value advertised by the worker, and `hwgroup`, for which we support requesting one of multiple hardware groups by listing multiple names separated with a `|` symbol (e.g. `group_1|group_2|group_3`).

The registry also implements a basic load balancing algorithm -- the workers are kept in a queue and whenever one of them receives a job, it is moved to the end, which makes it less likely to receive another job soon. When a worker is assigned a job, it will not be assigned another one until a `done` message is received.
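
The matching rules can be summarized by the following sketch. It is only an illustration in Python (the broker itself is written in C++) and the function name is made up for this example.

```python
# Illustrative sketch (the actual broker is written in C++); it mirrors the
# matching rules described above: plain headers are compared for equality,
# "threads" is an upper bound and "hwgroup" accepts a "|"-separated list.
def worker_satisfies(job_headers: dict, worker_headers: dict) -> bool:
    for name, requested in job_headers.items():
        advertised = worker_headers.get(name)
        if advertised is None:
            return False
        if name == "threads":
            # the worker must offer at least the requested number of threads
            if int(requested) > int(advertised):
                return False
        elif name == "hwgroup":
            # the request may list several acceptable groups, e.g. "group_1|group_2"
            if advertised not in requested.split("|"):
                return False
        elif requested != advertised:
            return False
    return True
```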
#### Error Reporting

The broker is the only backend component that can report errors directly to the REST API. Other components have to notify the broker first and it forwards the messages to the API. The *libcurl* library is used for the HTTP communication. To address security concerns, *HTTP Basic Auth* is configured on the particular API endpoints and correct credentials have to be supplied. The following types of failures are distinguished:

**Job failure** -- there are two ways a job can fail, an internal and an external one. An internal failure is the fault of the worker, for example when it cannot download a file needed for the evaluation. An external failure is, for example, a malformed job configuration. Note that a wrong student solution is not considered a job failure. Jobs that failed internally are reassigned until a limit on the number of reassignments (configurable with the `max_request_failures` option) is reached. External failures are reported to the frontend immediately.

**Worker failure** -- when a worker crash is detected, an attempt is made to reassign its current job and also all the jobs from its queue. Because the current job might be the reason of the crash, its reassignment is also counted towards the `max_request_failures` limit (the counter is shared). If there is no worker available that could process a job (i.e. it cannot be reassigned), the job is reported as failed to the frontend via the REST API.

**Broker failure** -- when the broker itself crashes and is restarted, workers will reconnect automatically. However, all jobs in their queues are lost. If a worker manages to finish a job and notifies the "new" broker, the report is forwarded to the frontend. The same goes for external failures. Jobs that fail internally cannot be reassigned, because the "new" broker does not know their headers -- they are reported as failed immediately.

### Additional Libraries

The broker implementation depends on several open-source C and C++ libraries.

- **libcurl** -- Libcurl is used for notifying the REST API about job finish events over the HTTP protocol. Due to the lack of documentation of its C++ bindings, the plain C API is used.
- **cppzmq** -- Cppzmq is a simple C++ wrapper of the core ZeroMQ C API. It basically contains only one header file, but its API fits into the object architecture of the broker.
- **spdlog** -- Spdlog is a small, fast and modern logging library used for system logging. It is highly customizable and can be configured from the configuration of the broker.
- **yaml-cpp** -- Yaml-cpp is used for parsing the broker configuration text file in YAML format.
- **boost-filesystem** -- Boost filesystem is used for managing the logging directory (creating it if necessary) and for parsing filesystem paths from strings as written in the configuration of the broker. Filesystem operations will be included in future releases of the C++ standard, so this dependency may be removed in the future.
- **boost-program_options** -- Boost program options is used for parsing command line positional arguments. It would be possible to use the POSIX `getopt` C function, but we decided to use Boost, which provides a nicer API and is already used by the worker component.

## Fileserver

The fileserver component provides a shared file storage between the frontend and the backend. It is written in Python 3 using the Flask web framework. The fileserver stores files in a configurable filesystem directory, provides file deduplication and HTTP access. To keep the stored data safe, the fileserver should not be visible from the public internet. Instead, it should be accessed indirectly through the REST API.

### File Deduplication

From our analysis of the requirements, it is certain we need to implement a means of dealing with duplicate files. File deduplication is implemented by storing files under the hashes of their content. This procedure is done completely inside the fileserver. Plain files are uploaded into the fileserver, hashed, saved and the new filename is returned back to the uploader. SHA1 is used as the hashing function, because it is fast to compute and provides reasonable collision safety for non-cryptographic purposes. Files with the same hash are treated as identical and no additional checks for collisions are performed; however, finding a collision is very unlikely. If SHA1 proves insufficient, it is possible to change the hash function to something else, because the naming strategy is fully contained in the fileserver (special care must be taken to maintain backward compatibility).
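
A minimal sketch of this content-addressed naming scheme follows. The helper function and directory layout are illustrative only, not the actual fileserver code.

```python
import hashlib
import os
import shutil

# Minimal sketch of content-addressed storage as described above; the function
# and the directory layout are illustrative, not the actual fileserver code.
def store_uploaded_file(upload_path: str, storage_root: str) -> str:
    sha1 = hashlib.sha1()
    with open(upload_path, "rb") as f:
        for chunk in iter(lambda: f.read(64 * 1024), b""):
            sha1.update(chunk)
    name = sha1.hexdigest()

    # files with the same content end up under the same name (deduplication)
    subdir = os.path.join(storage_root, name[0])
    os.makedirs(subdir, exist_ok=True)
    target = os.path.join(subdir, name)
    if not os.path.exists(target):
        shutil.copyfile(upload_path, target)
    return name  # returned to the uploader as the new filename
```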
### Storage Structure

The fileserver stores its data in the following structure:

- `./submissions/<id>/` -- a folder that contains the files submitted by users (the solutions to the assignments of the student). `<id>` is an identifier received from the REST API.
- `./submission_archives/<id>.zip` -- ZIP archives of all submissions. These are created automatically when a submission is uploaded. `<id>` is an identifier of the corresponding submission.
- `./exercises/<subkey>/<key>` -- supplementary exercise files (e.g. test inputs and outputs). `<key>` is a hash of the file content (`sha1` is used) and `<subkey>` is its first letter (this is an attempt to prevent creating a flat directory structure).
- `./results/<id>.zip` -- ZIP archives of results for the submission with the `<id>` identifier.

## Worker

The job of the worker is to securely execute a job according to its configuration and upload the results back for later processing. After receiving an evaluation request, the worker has to do the following:

- download the archive containing the submitted source files and the configuration file
- download any supplementary files based on the configuration file, such as test inputs or helper programs (this is done on demand, using a `fetch` command in the assignment configuration)
- evaluate the submission according to the job configuration
- send progress messages back to the broker during the evaluation
- upload the results of the evaluation to the fileserver
- notify the broker that the evaluation finished

### Internal Structure

The worker is logically divided into two parts:

- **Listener** -- communicates with the broker through ZeroMQ. On startup, it introduces itself to the broker. Then it receives new jobs, passes them to the evaluator part and sends back results and progress reports.
- **Evaluator** -- gets jobs from the listener part, evaluates them (possibly in a sandbox) and notifies the other part when the evaluation ends. The evaluator also communicates with the fileserver, downloads supplementary files and uploads detailed results.

These parts run in separate threads of the same process and communicate through ZeroMQ in-process sockets. An alternative approach would be using a shared memory region with exclusive access, but messaging is generally considered safer. Shared memory has to be used very carefully because of race conditions when reading and writing concurrently. Also, the messages inside the worker are small, so copying data between threads does not introduce significant overhead. This multi-threaded design allows the worker to keep sending `ping` messages even when it is processing a job.

### Capability Identification

There are possibly multiple worker instances in a ReCodEx installation and each one can run on different hardware, a different operating system, or have different tools installed. To identify the hardware capabilities of a worker, we use the concept of **hardware groups**. Each worker belongs to exactly one group that specifies the hardware and operating system on which the submitted programs will be run. A worker also has a set of additional properties called **headers**. Together they help the broker to decide which worker is suitable for processing a job evaluation request. This information is sent to the broker on worker startup.
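
The startup introduction can be sketched as follows. This is an illustrative pyzmq snippet (the worker itself is written in C++); the frame layout follows the **init** command described in the Communication Protocol chapter, while the address and header values are examples only.

```python
import zmq

# Illustrative pyzmq sketch of the startup introduction; the address, the
# hardware group and the header values below are examples only.
context = zmq.Context()
socket = context.socket(zmq.DEALER)
socket.connect("tcp://broker.example:9657")

socket.send_multipart([
    b"init",
    b"group_1",        # hardware group of this worker
    b"env=c",          # capability headers, one per frame
    b"threads=4",
])
```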
The hardware group is a string identifier of the hardware configuration, for example "i7-4560-quad-ssd-linux", configured by the administrator for each worker instance. If this is done correctly, performance measurements of a submission should yield the same results on all computers from the same hardware group. Thanks to this fact, we can use the same resource limits on every worker in a hardware group.

The headers are a set of key-value pairs that describe the worker capabilities. For example, they can show which runtime environments are installed or whether this worker measures time precisely. Headers are also configured manually by an administrator.

### Running Student Submissions

Student submissions are executed in a sandbox environment to prevent them from damaging the host system and also to restrict the amount of used resources. Currently, only the Isolate sandbox is supported, but it is possible to add support for another sandbox. Every sandbox, regardless of the concrete implementation, has to be a command line application taking parameters with arguments, standard input or a file. Outputs should be written to a file or to the standard output. There are no other requirements; the design of the worker is very versatile and can be adapted to different needs.

The sandbox part of the worker is the only one which is not portable, so conditional compilation is used to include only the supported parts of the project. Isolate does not work in a Windows environment, so its invocation is done through native Linux system calls (`fork`, `exec`). To disable compilation of this part on Windows, the `#ifndef _WIN32` guard is used around the affected files.

Isolate in particular is executed in a separate Linux process created by the `fork` and `exec` system calls. Communication between the processes is performed through an unnamed pipe with redirection of the standard input and output descriptors. To prevent an Isolate failure there is another safety guard -- the whole sandbox is killed when it does not end within `(time + 300) * 1.2` seconds, where `time` is the original maximum time allowed for the task. This formula works well both for short and long tasks, but the timeout should never be reached if Isolate works properly -- it should always end itself in time.
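
The watchdog idea can be illustrated with the following Python sketch; the real worker spawns Isolate via `fork` and `exec` in C++, and the command and function names here are hypothetical.

```python
import subprocess

# Illustrative sketch of the safety-guard timeout described above; the real
# worker spawns Isolate via fork/exec in C++. The command is a placeholder.
def run_in_sandbox(cmd: list, time_limit: float) -> int:
    watchdog = (time_limit + 300) * 1.2
    try:
        proc = subprocess.run(cmd, timeout=watchdog)
        return proc.returncode
    except subprocess.TimeoutExpired:
        # the sandbox did not end in time, treat the task as failed
        return -1
```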
### Directories and Files

During a job execution the worker has to handle several files -- the input archive with the submitted sources and the job configuration, temporary files generated during execution, and fetched testing inputs and outputs. A separate directory structure is created for each job and removed after the job finishes. The files are stored in the local filesystem of the worker computer in a configurable location. The job is not restricted to the specified directories (tasks can do anything that is allowed by the system), but it is advised not to write outside them. In addition, sandboxed tasks are usually restricted to a specific (evaluation) directory.

The following directory structure is used for execution. The working directory of the worker (the root of the following paths) is shared by multiple instances on the same computer.

- `downloads/${WORKER_ID}/${JOB_ID}` -- place to store the downloaded archive with the submitted sources and the job configuration
- `submission/${WORKER_ID}/${JOB_ID}` -- place to store the decompressed submission archive
- `eval/${WORKER_ID}/${JOB_ID}` -- place where all the execution should happen
- `temp/${WORKER_ID}/${JOB_ID}` -- place for temporary files
- `results/${WORKER_ID}/${JOB_ID}` -- place to store all files which will be uploaded to the fileserver, usually only the YAML result file and optionally a log file; other files have to be explicitly copied here if requested

Some of the directories are accessible during job execution from within the sandbox through predefined variables. A list of these is described in the job configuration appendix.

### Judges

ReCodEx provides a few initial judge programs. They are mostly adopted from CodEx and installed automatically with the worker component. Judging programs have to meet some requirements. The basic ones are inspired by the standard `diff` application -- two mandatory positional parameters, which have to be the files for comparison, and an exit code reflecting whether the result is correct (0) or wrong (1).

This interface lacks support for returning additional data by the judges, for example the similarity of the two files calculated as the Levenshtein edit distance. To allow passing these additional values, an extended judge interface can be implemented:

- Parameters: there are two mandatory positional parameters which have to be the files for comparison
- Results:
    - _comparison OK_
        - exitcode: 0
        - stdout: there is a single line with a double value which should be the quality percentage of the judged file
    - _comparison BAD_
        - exitcode: 1
        - stdout: can be empty
    - _error during execution_
        - exitcode: 2
        - stderr: there should be a description of the error

The additional double value is saved to the results file and can be used for score calculation in the frontend. If just the basic judge is used, the values are 1.0 for exit code 0 and 0.0 for exit code 1. If more values are needed for score computation, multiple judges can be used in sequence and the values combined. However, the extended judge interface should cover most of the possible use cases. A sketch of a judge following this extended interface is shown below.
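
The following is a minimal example of a custom judge complying with the extended interface -- a hypothetical exact-match judge written in Python (the judges actually shipped with the worker are separate programs adopted from CodEx).

```python
#!/usr/bin/env python3
# Hypothetical judge implementing the extended interface described above:
# exit code 0 (OK, quality on stdout), 1 (wrong), 2 (error, message on stderr).
import sys

def main() -> int:
    if len(sys.argv) != 3:
        print("usage: judge <expected> <actual>", file=sys.stderr)
        return 2
    try:
        with open(sys.argv[1]) as f1, open(sys.argv[2]) as f2:
            expected, actual = f1.read(), f2.read()
    except OSError as e:
        print(f"cannot read input files: {e}", file=sys.stderr)
        return 2

    if expected == actual:
        print("1.0")   # quality percentage of the judged file
        return 0
    return 1

if __name__ == "__main__":
    sys.exit(main())
```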
### Additional Libraries

The worker implementation depends on several open-source C and C++ libraries. All of them are multi-platform, so both Linux and Windows builds are possible.

- **libcurl** -- Libcurl is used for all HTTP communication, that is downloading and uploading files. Due to the lack of documentation of its C++ bindings, the plain C API is used.
- **libarchive** -- Libarchive is used for compressing and extracting archives. The actually supported formats depend on the packages installed on the target system, but at least ZIP and TAR.GZ should be available.
- **cppzmq** -- Cppzmq is a simple C++ wrapper of the core ZeroMQ C API. It basically contains only one header file, but its API fits into the object architecture of the worker.
- **spdlog** -- Spdlog is a small, fast and modern logging library. It is used for all of the logging, both system and job logs. It is highly customizable and can be configured from the configuration of the worker.
- **yaml-cpp** -- Yaml-cpp is used for parsing and creating text files in YAML format. That includes the configuration of the worker and the configuration and results of a job.
- **boost-filesystem** -- Boost filesystem is used for multi-platform manipulation with files and directories. However, these operations will be included in future releases of the C++ standard, so this dependency may be removed in the future.
- **boost-program_options** -- Boost program options is used for multi-platform parsing of command line positional arguments. It is not strictly necessary, since similar functionality could be implemented by ourselves, but this well-known library is effortless to use.

## Monitor

Monitor is an optional part of the ReCodEx solution for reporting the progress of job evaluation back to users in real time. It is written in Python; the tested versions are 3.4 and 3.5. The following dependencies are used:

- **zmq** -- binding to the ZeroMQ messaging framework
- **websockets** -- framework for communication over WebSockets
- **asyncio** -- library for fast asynchronous operations
- **pyyaml** -- parsing YAML configuration files

Just one monitor instance is required per broker. The monitor has to be publicly visible (it has to have a public IP address or be behind a public proxy server) and it also needs a connection to the broker. If the web application uses HTTPS, it is required to use a proxy for the monitor to provide encryption over WebSockets. If this is not done, the browsers of the users will block the unencrypted connection and will not show the progress to the users.

### Message Flow

![Message flow inside monitor](https://raw.githubusercontent.com/ReCodEx/wiki/master/images/Monitor_arch.png)

The monitor runs in two threads. _Thread 1_ is the main thread, which initializes all components (the logger, for example), starts the other thread and runs the ZeroMQ part of the application. This thread receives and parses incoming messages from the broker and forwards them to the sending logic of _thread 2_.

_Thread 2_ is responsible for managing all WebSocket connections asynchronously. The whole thread is one big _asyncio_ event loop through which all actions are processed. None of the custom data types in Python are thread-safe, so all events from other threads (actually only the `send_message` method invocation) must be called within the event loop (via the `asyncio.loop.call_soon_threadsafe` function). Please note that most Python interpreters use the [Global Interpreter Lock](https://wiki.python.org/moin/GlobalInterpreterLock), so from the performance point of view there is actually no parallelism, but proper synchronization is still required.

### Handling of Incoming Messages

An incoming ZeroMQ progress message is received and parsed into the JSON format (the same as our WebSocket communication format). The JSON string is then passed to _thread 2_ for asynchronous sending. Each message has an identifier of the channel where it should be sent. There can be multiple receivers of one channel id; each one has a separate _asyncio.Queue_ instance where new messages are added. In addition to that, there is one list of all messages per channel. If a client connects a bit later than the point when the monitor starts to receive messages, it will still receive all messages from the beginning. Messages are stored for 5 minutes after the last progress command (normally FINISHED) is received, then they are permanently deleted. This caching mechanism was implemented because early testing showed that the first couple of messages were missed quite often. Messages from the queue of the client are sent through the corresponding WebSocket connection via the main event loop as soon as possible. This approach with a separate queue per connection is easy to implement and guarantees reliability and order of message delivery.
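
The hand-off between the ZeroMQ thread and the asyncio thread can be sketched as follows. This is a simplified illustration, not the monitor source; the function names and the exact frame layout are assumptions made for this example.

```python
import asyncio
import json
import zmq

# Simplified sketch of the hand-off between the ZeroMQ thread and the asyncio
# thread described above; names and the exact frame layout are illustrative.
def zeromq_thread(socket: zmq.Socket, loop: asyncio.AbstractEventLoop, channels: dict) -> None:
    while True:
        frames = socket.recv_multipart()               # blocking receive in thread 1
        message = {"channel_id": frames[0].decode(),   # channel the message belongs to
                   "payload": frames[1].decode()}
        # schedule delivery inside the event loop of thread 2 (thread-safe call)
        loop.call_soon_threadsafe(deliver, channels, message)

def deliver(channels: dict, message: dict) -> None:
    # every connected client of the channel has its own asyncio.Queue
    for queue in channels.get(message["channel_id"], []):
        queue.put_nowait(json.dumps(message))
```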
## Cleaner

The cleaner component is tightly bound to the worker. It manages the cache folder of the worker, mainly by deleting outdated files. Every cleaner instance maintains one cache folder, which can be used by multiple workers. This means that on one server there can be numerous instances of workers with the same cache folder, but there should be only one cleaner instance.

The cleaner is written in Python 3, so it works well on multiple platforms. It uses only the `pyyaml` library for reading the configuration file and the `argparse` library for processing command line arguments. It is a simple script which checks the cache folder, possibly deletes old files and then ends. This means that the cleaner has to be run repeatedly, for example using cron, a systemd timer or the Windows task scheduler. For the cleaner to work properly, a suitable scheduling interval has to be used. A 24 hour interval is recommended and is sufficient for the intended usage. The value is set in the configuration file of the cleaner.

## REST API

The REST API is a PHP application running in an HTTP server. Its purpose is to provide controlled access to the evaluation backend and to store the state of the application.

### Used Technologies

We chose to use PHP in version 7.0, which was the most recent version at the time of starting the project. The most notable new feature is optional static typing of function parameters and return values. We use this as much as possible to enable easy static analysis with tools like PHPStan. Using static analysis leads to less error-prone code that does not need as many tests as code that uses duck typing and relies on automatic type conversions. We aim to keep our codebase compatible with new releases of PHP.

To speed up the development and to make it easier to follow best practices, we decided to use the Nette framework. The framework itself is focused on creating applications that render HTML output, but a lot of its features can be used in a REST application, too.

Doctrine 2 ORM is used to provide a layer of abstraction over storing objects in a database. This framework also makes it possible to change the database server. The current implementation uses MariaDB, an open-source fork of MySQL.

To communicate with the evaluation backend, we need to use ZeroMQ. This functionality is provided by the `php_zmq` plugin that is shipped with most PHP distributions.

### Data model

We decided to use a code-first approach when designing our data model. This approach is greatly aided by the Doctrine 2 ORM framework, which works with entities -- PHP classes for which we specify which attributes should be persisted in a database. The database schema is generated from the entity classes. This way, the exact details of how our data is stored are a secondary concern for us and we can focus on the implementation of the business logic instead.

The rest of this section is a description of our data model and how it relates to the real world. All entities are stored in the `App\Model\Entity` namespace. There are repository classes that are used to work with entities without calling the Doctrine `EntityManager` directly. These are in the `App\Model\Repository` namespace.

#### User Account Management

The `User` entity class contains data about users registered in ReCodEx. To allow extending the system with additional authentication methods, login details are stored in separate entities. There is the `Login` entity class, which contains a user name and password for our internal authentication system, and the `ExternalLogin` entity class, which contains an identifier for an external login service such as LDAP.
Currently, each user can only have a single authentication method (account type). The entity with login information is created along with the `User` entity when a user signs up. If a user requests a password reset, a `ForgottenPassword` entity is created for the request.

A user needs a way to adjust settings such as their preferred language or theme. This is the purpose of the `UserSettings` entity class. Each possible option has its own attribute (database column). The currently supported options are `darkTheme`, `defaultLanguage` and `vimMode`.

Every user has a role in the system. The basic ones are student, supervisor and administrator, but new roles can be created by adding `Role` entities. Roles can have permissions associated with them. These associations are represented by `Permission` entities. Each permission consists of a role, resource, action and an `isAllowed` flag. If the `isAllowed` flag is set to true, the permission is positive (it lets the role access the resource), and if it is false, it denies access. The `Resource` entity contains just a string identifier of a resource (e.g., group, user, exercise). The action is another string that describes what the permission allows or denies for the role and resource (e.g., edit, delete, view).

The `Role` entity can be associated with a parent entity. If this is the case, the role inherits all the permissions of its parent.

All actions done by a user are logged using the `UserAction` entity for debugging purposes.

#### Instances and Groups

Users of ReCodEx are divided into groups that correspond to school lab groups for a single course. Each group has a textual name and description. It can have a parent group so that it is possible to create tree hierarchies of groups.

Group membership is realized using the `GroupMembership` entity class. It is a joining entity for the `Group` and `User` entities, but it also contains additional information, most importantly `type`, which helps to distinguish students from group supervisors.

Groups are organized into instances. Every `Instance` entity corresponds to an organization that uses the ReCodEx installation, for example a university or a company that organizes programming workshops. Every user and group belongs to exactly one instance (users choose an instance when they create their account). Every instance can be associated with multiple `Licence` entities. Licences are used to determine whether an instance can currently be used (access to instances without a valid licence will be denied). They can correspond to billing periods if needed.

#### Exercises

The `Exercise` entity class is used to represent exercises -- programming tasks that can be assigned to student groups. It contains data that does not relate to a "concrete" assignment, such as the name, version and a private description.

Some exercise descriptions need to be translated into multiple languages. Because of this, the `Exercise` entity is associated with the `LocalizedText` entity, one for each translation of the text.

An exercise can support multiple programming runtime environments. These environments are represented by `RuntimeEnvironment` entities. Apart from a name and description, they contain details of the language and operating system that is being used. There is also a list of extensions that is used for detecting which environment should be used for student submissions. `RuntimeEnvironment` entities are not linked directly to exercises.
Instead, the `Exercise` entity has an M:N relation with the `RuntimeConfig` entity, which is associated with `RuntimeEnvironment`. It also contains a path to a job configuration file template that will be used to create a job configuration file for the worker that processes solutions of the exercise.

Resource limits are stored outside the database, in the job configuration file template.

#### Reference Solutions

To make it objectively possible to set resource limits for a potentially diverse set of worker machines, there should be multiple reference solutions for every exercise in all supported languages that can be used to measure the resource usage of different approaches to the problem on various hardware and platforms.

Reference solutions are contained in `ReferenceSolution` entities. These entities can have multiple `ReferenceSolutionEvaluation` entities associated with them that link to evaluation results (the `SolutionEvaluation` entity). Details of this structure will be described in the section about student solutions.

Source codes of the reference solutions can be accessed using the `Solution` entity associated with `ReferenceSolution`. This entity is also used for student submissions.

#### Assignments

The `Assignment` entity is created from an `Exercise` entity when an exercise is assigned to a group. Most details of the exercise can be overwritten (see the reference documentation for a detailed overview). Additional information such as deadlines or point values for individual tests is also configured for the assignment and not for the exercise. Assignments can also have their own `LocalizedText` entities. If the assignment texts are not changed, they are shared between the exercise and its assignment.

Runtime configurations can also be changed for the assignment. This way, a supervisor can for example alter the resource limits for the tests. They could also alter the way submissions are evaluated, which is discouraged.

#### Student Solutions

Solutions submitted by students are represented by the `Submission` entity. It contains data such as when and by whom the solution was submitted. There is also a timestamp, a note for the supervisor and a URL of the location where the evaluation results should be stored. However, the most important part of a submission are the source files. These are stored using the `SolutionFile` entity and they can be accessed through the `Solution` entity, which is associated with `Submission`.

When the evaluation is finished, the results are stored using the `SolutionEvaluation` entity. This entity can have multiple `TestResult` entities associated with it, which describe the result of a test and also contain additional information for failing tests (such as which limits were exceeded). Every `TestResult` can contain multiple `TaskResult` entities that provide details about the results of individual tasks. This reflects the fact that "tests" are just logical groups of tasks.

#### Comment Threads

The `Comment` entity contains the author of the comment, a date and the text of the comment. In addition to this, there is a `CommentThread` entity associated with it that groups comments on a single entity (such as a student submission). This makes it easy to add support for comments to various entities -- it is enough to add an association with the `CommentThread` entity. An even simpler way is to just use the identifier of the commented entity as the identifier of the comment thread, which is how submission comments are implemented.
#### Uploaded Files

Uploaded files are stored directly on the filesystem instead of in the database. The `UploadedFile` entity is used to store their metadata. This entity is extended by `SolutionFile` and `ExerciseFile` using the Single Table Inheritance pattern provided by Doctrine. Thanks to this, we can access all files uploaded to the API through the same repository while also having data related to e.g. supplementary exercise files present only in the related objects.

### Request Handling

A typical scenario for handling an API request is matching the HTTP request with a corresponding handler routine which creates a response object that is then sent back to the client, encoded with JSON. The `Nette\Application` package can be used to achieve this with Nette, although it is meant to be used mainly in MVP applications.

Matching HTTP requests with handlers can be done using standard Nette URL routing -- we will create a Nette route for each API endpoint. Using the routing mechanism from Nette logically leads to implementing handler routines as Nette Presenter actions. Each presenter should serve logically related endpoints.

The last step is encoding the response as JSON. In `Nette\Application`, HTTP responses are returned using the `Presenter::sendResponse()` method. We decided to write a method that calls `sendResponse` internally and takes care of the encoding. This method has to be called in every presenter action. An alternative approach would be using the internal payload object of the presenter, which is more convenient, but provides us with less control.

### Authentication

Instead of relying on PHP sessions, we decided to use an authentication flow based on JWT tokens (RFC 7519). On successful login, the user is issued an access token that they have to send with subsequent requests using the HTTP Authorization header (`Authorization: Bearer <token>`). The token has a limited validity period and has to be renewed periodically using a dedicated API endpoint.

To implement this behavior in the Nette framework, a new IUserStorage implementation was created (`App\Security\UserStorage`), along with an IIdentity and authenticators for both our internal login service and CAS. The authenticators are not registered in the DI container, they are invoked directly instead. On successful authentication, the returned `App\Security\Identity` object is stored using the `Nette\Security\User::login()` method. The user storage service works with the HTTP request to extract the access token if possible.

The logic of issuing tokens is contained in the `App\Security\AccessManager` class. Internally, it uses the Firebase JWT library. The authentication flow is contained in the `LoginPresenter` class, which serves the `/login` endpoint group. An advantage of this approach is being able to control the authentication process completely instead of just receiving session data through a global variable.

### Accessing Endpoints

The REST API has a [generated documentation](https://recodex.github.io/api/) describing the detailed format of input values as well as response structures, including samples. Knowing the exact format of the endpoints allows interacting directly with the API using any available REST client, for example `curl` or the `Postman` Chrome extension. There is also a generated [REST client](https://recodex.github.io/api/ui.html) for the ReCodEx API structure using the Swagger UI tool. For each endpoint there is a form with boxes for all the input parameters, including their description and data type.
The responses are shown as highlighted JSON. The authorization can be set for the whole session at once using the "Authorize" button at the top of the page.

### Permissions

A system storing user data has to implement some kind of permission checking. Each user has a role, which corresponds to his/her privileges. Our research showed that three roles are sufficient -- student, supervisor and administrator. The user role has to be checked with every request. The good thing is that roles nicely match the granularity of the API endpoints, so the permission checking can be done at the beginning of each request. This is implemented using PHP annotations, which allow specifying the allowed user roles for each request with very little code, while all the business logic stays the same, together in one place.

However, roles cannot cover all cases. For example, if a user is a supervisor, the role relates only to the groups where he/she is a supervisor, but using only roles would allow him/her to act as a supervisor in all groups in the system. Unfortunately, this cannot be easily fixed using annotations, because there are many different cases when this problem occurs. To fix that, some additional checks can be performed at the beginning of request processing. Usually it is only one or two simple conditions. With these two concepts together it is possible to easily cover all cases of permission checking with quite a small amount of code.

### Uploading Files

There are two cases when users need to upload files using the API -- submitting solutions to an assignment and creating a new exercise. In both of these cases, the final destination of the files is the fileserver. However, the fileserver is not publicly accessible, so the files have to be uploaded through the API. Each file is uploaded separately and is given a unique ID. The uploaded file can then be attached to an exercise or a submitted solution of an exercise. Storing and removing files from the server is done through the `App\Helpers\UploadedFileStorage` class, which maps the files to their records in the database using the `App\Model\Entity\UploadedFile` entity.

### Forgotten Password

When users find out that they do not remember their password, they request a password reset and fill in their unique email. A temporary access token is generated for the user corresponding to the given email address and sent to this address, encoded in a URL leading to a client application. The user then follows the URL and can choose a new password. The temporary token is generated and emailed by the `App\Helpers\ForgottenPasswordHelper` class, which is registered as a service and can be injected into any presenter. This solution is quite safe and users can handle it on their own, so the administrator does not have to worry about it.

### Job Configuration Parsing and Modifying

The job configuration file can be loaded into the corresponding internal structures in the API as well. This is necessary because there has to be a possibility to modify particular job details, such as the job identification or the fileserver address, during the submission. The whole codebase concerning the job configuration is present in the `App\Helpers\JobConfig` namespace. The job configuration is represented by the `JobConfig` class, which directly contains structures like `SubmissionHeader` or `Tasks\Task` and indirectly `SandboxConfig`, `JobId` and more. All these classes have a parameterless constructor which should set all values to their defaults or construct the appropriate classes.
Modifying values in the configuration classes is possible through *fluent interfaces* and *setters*. Getting values is also possible and all setters should have *get* counterparts. The job configuration is serialized through `__toString()` methods.

For loading the job configuration there is a separate `Storage` class which can be used for loading, saving or archiving a job configuration. For parsing, the storage uses the `Loader` class which does all the checks and loads the data from given strings into the appropriate structures. In case of a parser error, `App\Exceptions\JobConfigLoadingException` is thrown. Worth mentioning is also the `App\Helpers\UploadedJobConfigStorage` class, which takes care of where the uploaded job configuration files should be saved on the API filesystem. It can also be used for copying all job configurations during the assignment of an exercise.

### Solution Loading

When a solution evaluation is finished by the backend, the results are saved to the fileserver and the API is notified by the broker. The results are parsed and stored in the database. For the results of the evaluations of reference solutions and for the asynchronously evaluated solutions of students (e.g., resubmitted by the administrator), the result is processed right after the notification from the backend is received and the author of the solution is notified by an email after the results are processed.

When a student submits his/her solution directly through the client application, we do not parse the results right away but postpone this until the student (or a supervisor) wants to display the results for the first time. This may save some resources when the solution results are not important (e.g., the student finds a bug in his solution before the submission has been evaluated).

#### Parsing of The Results

The results are stored in a YAML file. We map the contents of the file to the classes of the `App\Helpers\EvaluationResults` namespace. This process validates the file and gives us access to all of the information through the interface of a class and not only through associative arrays. This is very similar to how the job configuration files are processed.

## Web Application

The whole project is written using the next generation of JavaScript referred to as *ECMAScript 6* (also known as *ES6*, *ES.next*, or *Harmony*). Since not all of the features introduced in this standard are implemented in today's modern web browsers (like classes and the spread operator) and hardly any are implemented in the older versions of the web browsers which are currently still in use, the source code is transpiled into the older standard *ES5* using the [Babel.js](https://babeljs.io/) transpiler and bundled into a single script file using the [webpack](https://webpack.github.io/) module bundler. The need for a transpiler also arises from the usage of the *JSX* syntax for declaring React components. To read more about these tools and their usage please refer to the [installation documentation](#Installation). The whole bundling process takes place at deployment and is not repeated afterwards when running in production.

### State Management

The web application is a SPA (Single Page Application). When the user accesses the page, the source code is downloaded and interpreted by the web browser. The communication between the browser and the server then runs in the background without reloading the page.
The application keeps its internal state, which can be altered by the actions of the user (e.g., clicking on links and buttons, filling in input fields of forms) and by the outcomes of HTTP requests to the API server. This internal state is kept in the memory of the web browser and is not persisted in any way -- when the page is refreshed, the internal state is deleted and a new one is created from scratch (i.e., all of the data is fetched from the API server again). The only part of the state which is persisted is the token of the logged in user. This token is kept in cookies and in the local storage. Keeping the token in the cookies is necessary for server-side rendering.

#### Redux

The in-memory state is handled by the *redux* library. This library is strongly inspired by the [Flux](https://facebook.github.io/flux/) architecture but it has some specifics. The whole state is in a single serializable tree structure called the *store*. This store can be modified only by dispatching *actions*, which are Plain Old JavaScript Objects (POJO) processed by *reducers*. A reducer is a pure function which takes the state object and the action object and creates a new state. This process is very easy to reason about and is also very easy to test using unit tests. Please read the [redux documentation](http://redux.js.org/) for detailed information about the library.

![Redux state handling schema](https://github.com/ReCodEx/wiki/raw/master/images/redux.png)

The main difference between *Flux* and *redux* is the fact that there is only one store with one reducer in redux. The single reducer might be composed from several simple reducers, which might be composed from other simple reducers as well, therefore the single reducer of the store is often referred to as the root reducer. Each of the simple reducers receives all the dispatched actions and decides which actions it will process and which it will ignore based on the *type* of the action. The simple reducers can change only a specific subtree of the whole state tree and these subtrees do not overlap.

##### Redux Middleware

A middleware in redux is a function which can process actions before they are passed to the reducers to update the state. The middleware used by the ReCodEx store is defined in the `src/redux/store.js` script. Several open source libraries are used:

- [redux-promise-middleware](https://github.com/pburtchaell/redux-promise-middleware)
- [redux-thunk](https://github.com/gaearon/redux-thunk)
- [react-router-redux](https://github.com/reactjs/react-router-redux)

We created two other custom middleware functions for our needs:

- **API middleware** -- This middleware filters out all actions with the *type* set to `recodex-api/CALL` and sends a real HTTP request according to the information in the action.
- **Access Token Middleware** -- This middleware persists the access token into the local storage and the cookies each time the user signs into the application. The token is removed when the user decides to sign out. The middleware also attaches the token to each `recodex-api/CALL` action when it does not have an access token set explicitly.

##### Accessing The Store Using Selectors

The components of the application are connected to the redux store using the higher order function `connect` from the *react-redux* binding library. This connection ensures that the react components will re-render every time some of the specified subtrees of the main state change. The specific subtrees of interest are defined for every connection.
These definitions are called *selectors* and they are simple pure functions which take the state and return its subtree. To avoid unnecessary re-renders and selections, a small library called [reselect](https://github.com/reactjs/reselect) is used. This library allows us to compose the selectors in a similar way the reducers are composed and therefore simply reflect the structure of the whole state tree. The selectors for each reducer are stored in a separate file in the `src/redux/selectors` directory.

#### Routing

The page should not be reloaded after the initial render, but the current location of the user in the system must be reflected in the URL. This is achieved through the [react-router](https://github.com/ReactTraining/react-router) and [react-router-redux](https://github.com/reactjs/react-router-redux) libraries. These libraries use the `pushState` method of the `history` object, a living standard supported by all of the modern browsers. The mapping of the URLs to the components is defined in the `src/pages/routes.js` file. To create links between pages, either the `Link` component from the `react-router` library is used or an action created by the `push` action creator from the `react-router-redux` library is dispatched. All the navigations are mapped to redux actions and can be handled by any reducer.

Having up-to-date URLs gives the users the possibility to reload the page if some error occurs on it and land at the same page as they would expect. Users can also send links to the very page they want to share.

### Creating HTTP Requests

All of the HTTP requests are made by dispatching a specific action which will be processed by our custom *API middleware*. The action must have the *type* property set to `recodex-api/CALL`. The middleware catches the action and sends a real HTTP request created according to the information in the `request` property of the action:

- **type** -- Type prefix of the actions which will be dispatched automatically during the lifecycle of the request (pending, fulfilled, failed).
- **endpoint** -- The URI to which the request should be sent. All endpoints will be prefixed with the base URL of the API server.
- **method** (*optional*) -- A string containing the name of the HTTP method which should be used. The default method is `GET`.
- **query** (*optional*) -- An object containing key-value pairs which will be put into the query part of the URL of the request.
- **headers** (*optional*) -- An object containing key-value pairs which will be appended to the headers of the HTTP request.
- **accessToken** (*optional*) -- Explicitly sets the access token for the request. The token will be put into the *Authorization* header.
- **body** (*optional*) -- An object or an array which will be recursively flattened into the `FormData` structure with correct usage of square brackets for nested (associative) arrays. It is worth mentioning that the keys must not contain a colon.
- **doNotProcess** (*optional*) -- A boolean value which can disable the default processing of the response to the request, which includes showing a notification to the user in case of a failure of the request. All requests are processed in the way described above by default.

The HTTP requests are sent using the `fetch` API which returns a *Promise* of the request. This promise is put into a new action containing the promise and the type specified in the `request` description.
This action is then caught by the promise middleware, which dispatches new actions whenever the state of the promise changes during its lifecycle. The new actions have specific types:

- `{$TYPE}_PENDING` -- Dispatched immediately after the action is processed by the promise middleware. The `payload` property of the action contains the body of the request.
- `{$TYPE}_FAILED` -- Dispatched if the promise of the request is rejected.
- `{$TYPE}_FULFILLED` -- Dispatched when the response to the request is received and the promise is resolved. The `payload` property of the action contains the body of the HTTP response parsed as JSON.

### Routine CRUD Operations

For routine CRUD (Create, Read, Update, Delete) operations which are common to most of the resources used in ReCodEx (e.g., groups, users, assignments, solutions, solution evaluations, source code files), a set of functions called the *Resource manager* was implemented. It contains a factory which creates basic actions (e.g., `fetchResource`, `addResource`, `updateResource`, `removeResource`, `fetchMany`) and handlers for all of the lifecycle actions created by both the API middleware and the promise middleware, which can be used to create a basic reducer. The *resource manager* is spread over several files in the `src/redux/helpers/resourceManager` directory and is covered with unit tests in scripts located at `test/redux/helpers/resourceManager`.

### Server-side Rendering

To speed up the initial rendering of the web application, a technique called server-side rendering (SSR) is used. The same code which is executed in the web browser of the client can run on the server using [Node.js](https://nodejs.org). React can serialize its HTML output into a string which can be sent to the client and displayed before the (potentially large) JavaScript source code starts being executed by the browser. The redux store is in fact just a large JSON tree which can be easily serialized as well. If the user is logged in, then the access token should be in the cookies of the web browser and it should be attached to the HTTP request when the user navigates to the ReCodEx web page. This token is then put into the redux store and so the user is logged in on the server.

The whole logic of the SSR is in a single file called `src/server.js`. It contains only a definition of a simple HTTP server (using the [express](http://expressjs.com/) framework) and some necessary boilerplate of the routing library. All the components which are associated with the matched route can have a class property `loadAsync` which should contain a function returning a *Promise*. The SSR calls all these functions and delays the response of the HTTP server until all of the promises are resolved (or some of them fail).

### Localization and globalization

The whole application is prepared for localization and globalization. All of the translatable texts can be extracted from the user interface and translated into several languages. The numbers, dates, and time values are also formatted with respect to the selected language. The [react-intl](https://github.com/yahoo/react-intl) and [Moment.js](http://momentjs.com/) libraries are used to achieve this. All the strings can be extracted from the application using a command:

```
$ npm run exportStrings
```

This will create JSON files with the exported strings for the 'en' and 'cs' locales. If you want to export strings for more languages, you must edit the `/manageTranslations.js` script.
The exported strings are placed in the `/src/locales` directory.

## Communication Protocol

Detailed communication inside the ReCodEx system is captured in the following image and described in the sections below. Red connections are through ZeroMQ sockets, blue are through WebSockets and green are through HTTP(S). All ZeroMQ messages are sent as multipart with one string (command, option) per part, with no empty frames (unless explicitly specified otherwise).

![Communication schema](https://github.com/ReCodEx/wiki/raw/master/images/Backend_Connections.png)

### Broker - Worker Communication

The broker acts as a server when communicating with a worker. The listening IP address and port are configurable, the protocol family is TCP. The worker socket is of the DEALER type, the broker one is of the ROUTER type. Because of that, the very first part of every (multipart) message from the broker to a worker must be the socket identity of the target worker (which is saved on its **init** command).

#### Commands from Broker to Worker:

- **eval** -- evaluate a job. Requires 3 message frames:
    - `job_id` -- identifier of the job (in ASCII representation -- we avoid endianness issues and also support alphabetic ids)
    - `job_url` -- URL of the archive with the job configuration and submitted source code
    - `result_url` -- URL where the results should be stored after evaluation
- **intro** -- introduce yourself to the broker (with the **init** command) -- this is required when the broker loses track of the worker who sent the command. Possible reasons for such an event are e.g. that one of the communicating sides shut down and restarted without the other side noticing.
- **pong** -- reply to the **ping** command, no arguments

#### Commands from Worker to Broker:

- **init** -- introduce self to the broker. Useful on startup or after reestablishing a lost connection. Requires at least 2 arguments:
    - `hwgroup` -- hardware group of this worker
    - `header` -- additional header describing worker capabilities. The format must be `header_name=value`, every header shall be in a separate message frame. There is no limit on the number of headers.

  There is also an optional third argument -- additional information. If present, it should be separated from the headers with an empty frame. The format is the same as for headers. Supported keys for additional information are:
    - `description` -- a human readable description of the worker for administrators (it will show up in broker logs)
    - `current_job` -- an identifier of a job the worker is now processing. This is useful when we are reestablishing a connection to the broker and need it to know that the worker will not accept a new job.
- **done** -- notification of a finished job. Contains the following message frames:
    - `job_id` -- identifier of the finished job
    - `result` -- response result, possible values are:
        - OK -- evaluation finished successfully
        - FAILED -- job failed and cannot be reassigned to another worker (e.g. due to an error in the configuration)
        - INTERNAL_ERROR -- job failed due to an internal worker error, but another worker might be able to process it (e.g. downloading a file failed)
    - `message` -- a human readable error message
- **progress** -- notice about the current evaluation progress. Contains the following message frames:
    - `job_id` -- identifier of the current job
    - `command` -- what is happening now. Possible values are:
        - DOWNLOADED -- submission successfully fetched from the fileserver
        - FAILED -- something bad happened and the job was not executed at all
        - UPLOADED -- results have been uploaded to the fileserver
        - STARTED -- evaluation of tasks started
        - ENDED -- evaluation of tasks is finished
        - ABORTED -- evaluation of the job encountered an internal error, the job will be rescheduled to another worker
        - FINISHED -- the whole execution is finished and the worker is ready for another job
        - TASK -- task state changed -- see below
    - `task_id` -- only present for the "TASK" state -- identifier of the task in the current job
    - `task_state` -- only present for the "TASK" state -- result of the task evaluation. One of:
        - COMPLETED -- the task was successfully executed without any error, the subsequent task will be executed
        - FAILED -- the task ended up with some error, the subsequent task will be skipped
        - SKIPPED -- some of the previous dependencies failed to execute, so this task will not be executed at all
- **ping** -- tell the broker I am alive, no arguments

#### Heartbeating

It is important for the broker and the workers to know if the other side is still working (and connected). This is achieved with a simple heartbeating protocol. The protocol requires the workers to send a **ping** command regularly (the interval is configurable on both sides -- future releases might let the worker send its ping interval with the **init** command). Upon receiving a **ping** command, the broker responds with **pong**.

Whenever a heartbeating message does not arrive, a counter called _liveness_ is decreased. When this counter drops to zero, the other side is considered disconnected. When a message arrives, the liveness counter is set back to its maximum value, which is configurable for both sides. When the broker decides a worker has disconnected, it tries to reschedule its jobs to other workers. If a worker thinks the broker crashed, it tries to reconnect periodically, with a bounded, exponentially increasing delay.

This protocol proved very robust in real world testing; the whole backend is reliable and can outlive short term connection issues without problems. Also, the increasing delay of ping messages does not flood the network when there are problems. We have experienced no issues since we started using this protocol.

### Worker - Fileserver Communication

The worker communicates with the fileserver only from its _execution thread_. The supported protocol is HTTP, optionally with SSL encryption (**recommended**). If supported by the server and by the used version of libcurl, the HTTP/2 standard is also available. The fileserver should be set up to require basic HTTP authentication and the worker is capable of sending the corresponding credentials with each request.

#### Worker Side

Workers communicate with the fileserver in both directions -- they download the submissions of the students and then upload the evaluation results. Internally, the worker uses the libcurl C library with a very similar setup in both cases: it can verify the HTTPS certificate (on Linux against the system certificate list, on Windows against one downloaded from the cURL website during installation), supports basic HTTP authentication, offers HTTP/2 with a fallback to HTTP/1.1 and fails on error (when the returned HTTP status code is >= 400). The worker has a list of credentials to all available fileservers in its configuration file.

- download file -- a standard HTTP GET request to the given URL expecting the file content as the response
- upload file -- a standard HTTP PUT request to the given URL with the file data as the body -- the same as the command line tool `curl` with the `--upload-file` option
### Worker - Fileserver Communication

The worker communicates with the file server only from its _execution thread_. The supported protocol is HTTP, optionally with SSL encryption (**recommended**). If both the server and the used version of libcurl support it, the HTTP/2 standard is also available. The file server should be set up to require basic HTTP authentication and the worker is capable of sending the corresponding credentials with each request.

#### Worker Side

Workers communicate with the file server in both directions -- they download the submissions of the students and then upload the evaluation results. Internally, the worker uses the libcurl C library with a very similar setup for both cases. It can verify the HTTPS certificate (on Linux against the system certificate list, on Windows against a list downloaded from the curl website during installation), supports basic HTTP authentication, offers HTTP/2 with a fallback to HTTP/1.1 and fails on error (when the returned HTTP status code is >= 400). The worker has a list of credentials for all available file servers in its configuration file.

- download file -- standard HTTP GET request to the given URL, expecting the file content as the response
- upload file -- standard HTTP PUT request to the given URL with the file data as the body -- the same as the command line tool `curl` with the `--upload-file` option

#### File Server Side

The file server has its own internal directory structure where all the files are stored. It provides a simple REST API to get them or create new ones. The file server does not provide authentication or a secured connection by itself, but it is supposed to be run as a WSGI script inside a web server (like Apache) with a proper configuration. Relevant commands for communication with workers:

- **GET /submission_archives/\<id\>.\<ext\>** -- gets an archive with the submitted source code and the corresponding configuration of this job evaluation
- **GET /exercises/\<hash\>** -- gets a file, common usage is for input files or reference result files
- **PUT /results/\<id\>.\<ext\>** -- uploads an archive with the evaluation results under the specified name (it should be the same _id_ as the name of the submission archive). On successful upload returns JSON `{ "result": "OK" }` as the body of the returned page.

If not specified otherwise, the `zip` format of archives is used. The symbol `/` in the API description is the root of the file server domain. If the domain is for example `fs.recodex.org` with SSL support, getting an input file for one task could look like a GET request to `https://fs.recodex.org/tasks/8b31e12787bdae1b5766ebb8534b0adc10a1c34c`.
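The two operations above map directly onto plain HTTP calls. The following sketch mimics them with Python and the requests library (the worker itself uses libcurl); the URLs, file names and credentials are hypothetical.

```python
import requests

# Illustrative sketch of the worker's two file server operations.
# URLs, file names and credentials are made up; the real worker uses libcurl.
AUTH = ("worker", "secret")     # HTTP Basic Auth credentials from the config

# download the submission archive: plain HTTP GET, TLS verification enabled,
# failing on any status code >= 400
response = requests.get(
    "https://fs.example.org/submission_archives/job_42.zip",
    auth=AUTH, verify=True, timeout=30,
)
response.raise_for_status()
with open("job_42.zip", "wb") as archive:
    archive.write(response.content)

# upload the results archive: HTTP PUT with the file data as the body,
# equivalent to `curl --upload-file results.zip <result_url>`
with open("results.zip", "rb") as results:
    response = requests.put(
        "https://fs.example.org/results/job_42.zip",
        data=results, auth=AUTH, verify=True, timeout=30,
    )
response.raise_for_status()
```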
### Broker - Monitor Communication

The broker communicates with the monitor through ZeroMQ over TCP as well. The socket type is the same on both sides, ROUTER. The monitor is set to act as the server in this communication; its IP address and port are configurable in the monitor configuration file. The ZeroMQ socket ID (set on the side of the monitor) is "recodex-monitor" and must be sent as the first frame of every multipart message -- see the ZeroMQ ROUTER socket documentation for more info.

Note that the monitor is designed so that it can receive data both from the broker and from workers. The current architecture prefers the broker to do all the communication so that the workers do not have to know too many network services.

The monitor is treated as a somewhat optional part of the whole solution, so no special effort on communication reliability was made.

#### Commands from Monitor to Broker:

Because there is no need for the monitor to communicate with the broker, there are no commands so far. Any message from the monitor to the broker is logged and discarded.

#### Commands from Broker to Monitor:

- **progress** -- notification about progress with job evaluation. This communication is usually redirected as is from the worker; more info can be found in the "Broker - Worker Communication" chapter above.

### Broker - REST API Communication

The broker communicates with the main REST API through a ZeroMQ connection over TCP. The socket type on the broker side is ROUTER, on the frontend part it is DEALER. The broker acts as a server; its IP address and port are configurable in the API.

#### Commands from API to Broker:

- **eval** -- evaluate a job. Requires at least 4 frames:
    - `job_id` -- identifier of this job (in ASCII representation -- we avoid endianness issues and also support alphabetic ids)
    - `header` -- additional header describing worker capabilities. The format must be `header_name=value`, every header shall be in a separate message frame. There is no limit on the number of headers, and there may also be no headers at all. A worker is considered suitable for the job if and only if it satisfies all of its headers.
    - empty frame -- a frame which contains only an empty string and serves only as a separator after the headers
    - `job_url` -- URI location of the archive with the job configuration and submitted source code
    - `result_url` -- remote URI where the results will be pushed to

#### Commands from Broker to API:

All of them are responses to the **eval** command.

- **ack** -- the first message sent back to the frontend right after the **eval** command arrives; it basically means "Hi, I am all right and capable of receiving job requests". After sending this, the broker will try to find an acceptable worker for the arrived request.
- **accept** -- the broker is capable of routing the request to a worker
- **reject** -- the broker cannot handle this job (for example when the requirements specified by the headers cannot be met). There are (rare) cases when the broker finds that it cannot handle the job after it was confirmed. In such cases it uses the frontend REST API to mark the job as failed.

#### Asynchronous Communication Between Broker And API

Only a fraction of the errors that can happen during evaluation can be detected while there is a ZeroMQ connection between the API and the broker. To notify the frontend of the rest, the API exposes a dedicated endpoint for the broker. The broker uses this endpoint whenever the status of a job changes (it is finished, it failed permanently, the only worker capable of processing it disconnected...).

When a report arrives from the backend, the type of the report is inferred; if it is an error which deserves the attention of the administrator, an email is sent to them. There can also be errors which are not that important (e.g., the problem was somehow resolved by the backend itself, or it is only informative); these do not have to be reported through email, but they are stored in the persistent database for further consideration.

For the details of this interface please refer to the attached API documentation and the `broker-reports/` endpoint group.
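For illustration, the frontend side of this exchange could look like the following sketch (Python with pyzmq; the broker address, job id and headers are hypothetical, and the real API server is not written in Python). It sends an **eval** request and then waits for the **ack** and the final **accept**/**reject** replies.

```python
import zmq

# Illustrative sketch of the API-side "eval" request; all values are made up.
context = zmq.Context()
api = context.socket(zmq.DEALER)
api.connect("tcp://127.0.0.1:5556")  # broker address (hypothetical)

api.send_multipart([
    b"eval",
    b"job_42",                        # job_id (ASCII representation)
    b"hwgroup=group_1",               # headers, one key=value pair per frame
    b"threads=2",
    b"",                              # empty frame separating headers from URLs
    b"https://fs.example.org/submission_archives/job_42.zip",  # job_url
    b"https://fs.example.org/results/job_42.zip",              # result_url
])

print(api.recv_multipart())  # expected: [b"ack"]
print(api.recv_multipart())  # expected: [b"accept"] or [b"reject"]
```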
### Fileserver - REST API Communication

The file server has a REST API for interaction with other parts of ReCodEx. A description of the communication with workers is in the "Worker - Fileserver Communication" chapter above. On top of that, there are other commands for interaction with the API:

- **GET /results/\<id\>.\<ext\>** -- downloads an archive with the evaluated results of job _id_
- **POST /submissions/\<id\>** -- uploads a new submission with identifier _id_. Expects that the body of the POST request uses file paths as keys and the content of the files as values. On successful upload returns JSON `{ "archive_path": <archive_path>, "result_path": <result_path> }` in the response body. From _archive_path_ the submission can be downloaded (by a worker) and the corresponding evaluation results should be uploaded to _result_path_.
- **POST /tasks** -- uploads new files, which will be available under names equal to the `sha1sum` of their content. More files can be uploaded at once. On successful upload returns JSON `{ "result": "OK", "files": <file_list> }` in the response body, where _file_list_ is a dictionary with the original file names as keys and the new URLs with the already hashed names as values.

There are no plans yet to support deleting files from this API; this may change in time.

The REST API calls these fileserver endpoints with standard HTTP requests. There are no special commands involved. There is no communication in the opposite direction.

### Monitor - Web App Communication

The monitor interacts with the web application through a WebSocket connection. The monitor acts as a server and browsers connect to it. The IP address and port are configurable.

When a client connects to the monitor, it sends a message with the string representation of the channel id it is interested in (usually the id of the job being evaluated). There can be multiple listeners per channel; even (slightly) delayed connections will receive all messages from the very beginning.

When the monitor receives a **progress** message from the broker, there are two options:

- there is no WebSocket connection for the listed channel (job id) -- the message is dropped
- there is an active WebSocket connection for the listed channel -- the message is parsed into JSON format (see below) and sent as a string to that established channel. Messages for active connections are queued, so no messages are discarded even under heavy workload.

A message from the monitor to the web application is in JSON format and has the form of a dictionary (associative array). The information contained in this message should correspond to what the worker sends to the broker. For a further description please read more in the "Broker - Worker Communication" chapter under the "progress" command. Message format:

- **command** -- type of progress, one of: DOWNLOADED, FAILED, UPLOADED, STARTED, ENDED, ABORTED, FINISHED, TASK
- **task_id** -- id of the currently evaluated task. Present only if **command** is "TASK".
- **task_state** -- state of the task with id **task_id**. Present only if **command** is "TASK". The value is one of "COMPLETED", "FAILED" and "SKIPPED".

A minimal sketch of a client consuming these messages is included at the end of this chapter.

### Web App - REST API Communication

The provided web application runs as a JavaScript process inside the browser of the user. It communicates with the REST API on the server through standard HTTP requests. Documentation of the main REST API is in a separate [document](https://recodex.github.io/api/) due to its extensiveness. The results are returned encoded in JSON, which is simply processed by the web application and presented to the user in an appropriate way.
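To close the protocol overview, here is the minimal listening-client sketch promised in the "Monitor - Web App Communication" section above. It uses Python with the third-party websockets package instead of a browser; the monitor address and the channel (job) id are hypothetical.

```python
import asyncio
import json

import websockets  # third-party "websockets" package

# Illustrative monitor client: subscribe to one channel (job id) and print
# progress messages until the job finishes. Address and id are made up.
async def watch(channel_id: str) -> None:
    async with websockets.connect("ws://monitor.example.org:4567") as socket:
        await socket.send(channel_id)        # plain string with the channel id
        async for raw in socket:             # each message is a JSON string
            progress = json.loads(raw)
            print(progress["command"],
                  progress.get("task_id"),
                  progress.get("task_state"))
            if progress["command"] in ("FINISHED", "FAILED"):
                break

asyncio.run(watch("job_42"))
```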