From 783250e090edc80e89dbc59addf941ca9e392cd6 Mon Sep 17 00:00:00 2001 From: Simon Rozsival Date: Thu, 12 Jan 2017 17:44:14 +0100 Subject: [PATCH] backend monitoring rewritten --- Rewritten-docs.md | 82 +++++++++++++++++------------------------------ 1 file changed, 29 insertions(+), 53 deletions(-) diff --git a/Rewritten-docs.md b/Rewritten-docs.md index 99d5ab3..15646a4 100644 --- a/Rewritten-docs.md +++ b/Rewritten-docs.md @@ -1633,59 +1633,35 @@ It seems with the benefit of hindsight that immediate loading of all jobs could simplify the code and it has no major drawbacks. In the next version of ReCodEx we will re-evaluate this decision. -#### Backend management - -Considering the fact that we have the backend as a separate component which has no -clue about administrators and uses only logging as some kind of failure -reporting. It can be handy to provide this functionality to backend from -frontend which manages users. The simplest solution would be again to have -separate component with some sort of public interface. It can be for example -REST or some other communication which backend can handle. Functionality of this -kind of component is then quite easy. When request for report arrives from -backend then type is inferred and if it is error which deserves attention of -administrator then email is sent to him/her. There can also be errors which are -not that important, was somehow solved by backend itself or are only -informative, these do not have to be reported by email but only stored in -persistent database for further consideration. On top of that separate component -can be internal and not exposed to outside network. Disadvantage is that -database layer which is used in some particular API instance cannot be used here -because multiple instances of API can use one backend. - -Another solution which was at the end implemented is to integrate backend -failure reporting feature to API. 
Problem with previous one is that if job -execution fails backend has to report this error to some particular API server -from which request for evaluation came. This information is essential and has to -be stored there and not in some general component and general error database. -Obviously if there are multiple API servers connected to one backend there has -to be some API server configured in backend as the main one which receives -reports about general backend errors which are not connected to jobs. This -solution was chosen because as stated we have to implement job error reporting -in API and having separate component only for general errors is not feasible. In -the end error reporting should be available under different route which is -secured by basic HTTP authentication, because basic authentication is easy -enough to implement in low-level backend components. That also means this -feature is visible and can be exploited but from our points of view it seems as -appropriate compromise in simplicity. - -Next thing relating backend management is storing its current state. This namely -concerns which workers are available for processing with what hardware and which -languages can be used in exercises. Another step is overall backend state like -how many jobs were processed on some particular worker, workload of broker and -workers, etc. The easiest solution is to manage these information by hand, every -instance of API has to have administrator which would have to fill them. This of -course includes only currently available workers and runtime environments, -backend statistics cannot be provided this way. - -Better solution is to let these information update automatically. This can be -done two ways either it can be provided by backend on-demand if API needs them -or backend will send these information periodically to API. Things like -currently available workers or environments are better to be really up-to-date -so this can provided on-demand if needed. 
Backend statistics are not necessary -stuff which can be updated periodically. But it really depends on the period of -updates, if it is short enough then even available workers, etc. could be -updated this way and be quite up-to-date. However due to lack of time automatic -refreshing of backend state will not be implemented in early versions but might -be implemented in next releases. +#### Communication with the backend + +##### Backend failure reporting + +The backend is a separate component which does not communicate with the administrators directly. When it encounters an error, it only writes it to a log file. It would be useful to notify the administrator at that moment so that the cause of the error can be fixed as soon as possible. The backend has no mechanism of its own for notifying users, for example by email. The API server, on the other hand, already implements email sending and can easily forward any message to the administrator. A secured communication protocol between the backend and the frontend already exists (it is used for reporting that a job has finished processing), so it is easy to add another endpoint for error reporting. + +When a request for sending a report arrives from the backend, the type of the report is inferred, and if it is an error which deserves the attention of +the administrator, an email is sent to them. There can also be errors which are less important (e.g., errors which the backend resolved by itself, or purely informative messages); these do not have to be reported by email and can simply be stored in the persistent database for further consideration. + +On top of that, the separate backend component does not have to be exposed to the outside network at all. + +If the processing of a job fails, the backend informs the API server which initiated the processing of that job. 
If an error unrelated to job processing occurs, the backend reports it to one particular API server, which is configured by the administrator; the other API servers using the same backend are not informed. + +##### Backend state monitoring + +The next thing related to communication with the backend is monitoring its current state. This mainly concerns which workers are available for processing in the different hardware groups and which languages can therefore be used in exercises. + +Another step would be monitoring the overall backend state, such as how many jobs were processed by a particular worker, the workload of the broker and the workers, etc. The easiest solution is to manage this information by hand: every +instance of the API server has to have an administrator who fills it in. This of course covers only the currently available workers and runtime environments, which do not change very often. The real-time statistics of the backend cannot reasonably be made accessible this way. + +A better solution is to update this information automatically. This can be +done in two ways: + +- The backend can provide it on demand whenever the API needs it. +- The backend can send this information to the API periodically. + +Information such as the currently available workers or runtime environments should be really up to date, so it could be provided on demand. Backend statistics are less critical and could be updated periodically. + +However, due to lack of time, automatic monitoring of the backend state will not be implemented in the early versions of this project, but it might be implemented in one of the next releases. ### Web-app