From 73bea479815935e7d73980ace13a4f1b44c411c4 Mon Sep 17 00:00:00 2001
From: Teyras
Date: Mon, 9 Jan 2017 23:03:37 +0100
Subject: [PATCH] fileserver analysis

---
 Rewritten-docs.md | 48 +++++++++++++++++++++++++++++++++++------------
 1 file changed, 36 insertions(+), 12 deletions(-)

diff --git a/Rewritten-docs.md b/Rewritten-docs.md
index 0778df7..ee74a1c 100644
--- a/Rewritten-docs.md
+++ b/Rewritten-docs.md
@@ -1250,18 +1250,42 @@ term project for C# course so it might be written and integrated in future.
 
 The fileserver provides access to a shared storage space that contains files
 submitted by students, supplementary files such as test inputs and outputs and
-results of evaluation. This functionality can be easily separated from the rest
-of the backend features, which led to designing the fileserver as a
-standalone component. Such design helps encapsulate the details of how the files
-are stored (e.g. on a file system, in a database or using a cloud storage
-service), while also making it possible to share the storage between multiple
-ReCodEx frontends.
-
-@todo: mention hashing on fileserver and why this approach was chosen
-
-@todo: what can be stored on fileserver
-
-@todo: how can jobs be stored on fileserver, mainly mention that it is nonsense to store inputs and outputs within job archive
+results of evaluation. In other words, it acts as an intermediate node for data
+passed between the frontend and the backend. This functionality can be easily
+separated from the rest of the backend features, which led to designing the
+fileserver as a standalone component. Such design helps encapsulate the details
+of how the files are stored (e.g. on a file system, in a database or using a
+cloud storage service), while also making it possible to share the storage
+between multiple ReCodEx frontends.
+
+For early releases of the system, we chose to store all files on the file system
+-- it is the least complicated solution (in terms of implementation complexity)
+and the storage backend can be rather easily migrated to a different technology.
+
+One of the facts we learned from CodEx is that many exercises share test input
+and output files, and also that these files can be rather large (hundreds of
+megabytes). A direct consequence of this is that we cannot add these files to
+submission archives that are to be downloaded by workers -- the combined size of
+the archives would quickly exceed gigabytes, which is impractical. Another
+conclusion we drew is that a way to deal with duplicate files must be
+introduced.
+
+A simple solution to this problem is storing supplementary files under the
+hashes of their content. This ensures that every file is stored only once. On
+the other hand, it makes it more difficult to understand what the content of a
+file is at a glance, which might prove problematic for the administrator.
+
+A notable part of the fileserver's work is done by a web server (e.g. listening
+for HTTP requests and caching recently accessed files in memory for faster
+access). What remains to be implemented is handling requests that upload files
+-- student submissions should be stored in archives to facilitate simple
+downloading and supplementary exercise files need to be stored under their
+hashes.
+
+We decided to use Python and the Flask web framework. This combination makes it
+possible to express the logic in ~100 SLOC and also provides means to run the
+fileserver as a standalone service (without a web server), which is useful for
+development.
 
 ### Monitor
 
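
Note on the fileserver paragraphs added by this patch: the idea of storing supplementary files under the hashes of their content can be illustrated with a minimal Python sketch. It is not the fileserver's actual code. The function names `store_by_hash` and `load_by_hash`, the flat directory layout, and the choice of SHA-256 are all assumptions made for illustration; the patch does not say which hash function is used.

```python
import hashlib
import tempfile
from pathlib import Path


def store_by_hash(root: Path, content: bytes) -> str:
    """Store content under the hex digest of its bytes and return the digest."""
    # SHA-256 is an assumed choice; the patch does not name a hash function.
    digest = hashlib.sha256(content).hexdigest()
    target = root / digest
    if not target.exists():
        # Identical content always maps to the same name, so a duplicate
        # upload finds the file already present and writes nothing.
        root.mkdir(parents=True, exist_ok=True)
        target.write_bytes(content)
    return digest


def load_by_hash(root: Path, digest: str) -> bytes:
    """Read back a file previously stored under its digest."""
    return (root / digest).read_bytes()


if __name__ == "__main__":
    storage = Path(tempfile.mkdtemp())
    first = store_by_hash(storage, b"1 2 3\n")
    second = store_by_hash(storage, b"1 2 3\n")  # same test input, uploaded again
    print(first == second, len(list(storage.iterdir())))
```

The sketch also shows the drawback the text mentions: the stored file names are opaque digests, so an administrator browsing the directory cannot tell what a file contains without a separate mapping from exercise file names to hashes.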