From 73bea479815935e7d73980ace13a4f1b44c411c4 Mon Sep 17 00:00:00 2001
From: Teyras <teyras@gmail.com>
Date: Mon, 9 Jan 2017 23:03:37 +0100
Subject: [PATCH] fileserver analysis

---
 Rewritten-docs.md | 48 +++++++++++++++++++++++++++++++++++------------
 1 file changed, 36 insertions(+), 12 deletions(-)

diff --git a/Rewritten-docs.md b/Rewritten-docs.md
index 0778df7..ee74a1c 100644
--- a/Rewritten-docs.md
+++ b/Rewritten-docs.md
@@ -1250,18 +1250,42 @@ term project for C# course so it might be written and integrated in future.
 
 The fileserver provides access to a shared storage space that contains files
 submitted by students, supplementary files such as test inputs and outputs and
-results of evaluation. This functionality can be easily separated from the rest
-of the backend features, which led to designing the fileserver as a
-standalone component. Such design helps encapsulate the details of how the files
-are stored (e.g. on a file system, in a database or using a cloud storage
-service), while also making it possible to share the storage between multiple
-ReCodEx frontends.
-
-@todo: mention hashing on fileserver and why this approach was chosen
-
-@todo: what can be stored on fileserver
-
-@todo: how can jobs be stored on fileserver, mainly mention that it is nonsense to store inputs and outputs within job archive
+results of evaluation. In other words, it acts as an intermediate node for data
+passed between the frontend and the backend. This functionality can be easily
+separated from the rest of the backend features, which led to designing the
+fileserver as a standalone component. Such design helps encapsulate the details
+of how the files are stored (e.g. on a file system, in a database or using a
+cloud storage service), while also making it possible to share the storage
+between multiple ReCodEx frontends.
+
+For early releases of the system, we chose to store all files on the file system
+-- it is the simplest solution to implement, and the storage backend can later
+be migrated to a different technology fairly easily.
+
+One of the facts we learned from CodEx is that many exercises share test input
+and output files, and also that these files can be rather large (hundreds of
+megabytes). A direct consequence of this is that we cannot add these files to
+submission archives that are to be downloaded by workers -- the combined size of
+the archives would quickly grow to gigabytes, which is impractical. Another
+conclusion we drew is that a way to deal with duplicate files must be
+introduced.
+
+A simple solution to this problem is storing supplementary files under the
+hashes of their content. This ensures that every file is stored only once. On
+the other hand, it makes it harder to tell at a glance what a file contains,
+which might prove problematic for the administrator.
+
+A notable part of the fileserver's work is done by a web server (e.g. listening
+for HTTP requests and caching recently accessed files in memory for faster
+access). What remains to be implemented is handling requests that upload files
+-- student submissions should be stored in archives to facilitate simple
+downloading, and supplementary exercise files need to be stored under their
+hashes.
+
+We decided to use Python and the Flask web framework. This combination makes it
+possible to express the logic in ~100 SLOC and also provides a means to run the
+fileserver as a standalone service (without a web server), which is useful for
+development.
 
 ### Monitor