diff --git a/Rewritten-docs.md b/Rewritten-docs.md
index 33109fd..bd1dad6 100644
--- a/Rewritten-docs.md
+++ b/Rewritten-docs.md
@@ -1311,15 +1311,15 @@ course so it might be written and integrated in future.
 
 ### Fileserver
 
-The fileserver provides access to a shared storage space that contains files
-submitted by students, supplementary files such as test inputs and outputs and
-results of evaluation. In other words, it acts as an intermediate storage node
-for data passed between the frontend and the backend. This functionality can be
-easily separated from the rest of the backend features, which led to designing
-the fileserver as a standalone component. Such design helps encapsulate the
-details of how the files are stored (e.g. on a file system, in a database or
-using a cloud storage service), while also making it possible to share the
-storage between multiple ReCodEx frontends.
+The fileserver provides access over HTTP to a shared storage space that contains
+files submitted by students, supplementary files such as test inputs and outputs
+and results of evaluation. In other words, it acts as an intermediate storage
+node for data passed between the frontend and the backend. This functionality
+can be easily separated from the rest of the backend features, which led to
+designing the fileserver as a standalone component. Such design helps
+encapsulate the details of how the files are stored (e.g. on a file system, in a
+database or using a cloud storage service), while also making it possible to
+share the storage between multiple ReCodEx frontends.
 
 For early releases of the system, we chose to store all files on the file system
 -- it is the least complicated solution (in terms of implementation complexity)
@@ -1337,11 +1337,15 @@ A simple solution to this problem is storing supplementary files under the
 hashes of their content. This ensures that every file is stored only once. 
 On the other hand, it makes it more difficult to understand what the content of
 a file is at a glance, which might prove problematic for the administrator.
-
-A notable part of the fileserver's work is done by a web server (e.g. listening
-to HTTP requests and caching recently accessed files in memory for faster
-access). What remains to be implemented is handling requests that upload files
--- student submissions should be stored in archives to facilitate simple
+However, human-readable identification is not as important as removing
+duplicates -- administrators rarely need to inspect stored files (and when they
+do, they should know their hashes), but duplicate files occupied a large part of
+the disk space used by CodEx.
+
+A notable part of the work of the fileserver is done by a web server (e.g.
+listening to HTTP requests and caching recently accessed files in memory for
+faster access). What remains to be implemented is handling requests that upload
+files -- student submissions should be stored in archives to facilitate simple
 downloading and supplementary exercise files need to be stored under their
 hashes.
 
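
Note on the hash-based storage described in the second hunk: the idea of storing supplementary files under the hashes of their content can be illustrated with a minimal sketch. This is not the ReCodEx fileserver implementation. The function names `store_supplementary_file` and `load_supplementary_file`, the flat directory layout, and the choice of SHA-256 are all assumptions made for illustration, since the text does not specify a hash function or an on-disk layout.

```python
import hashlib
import tempfile
from pathlib import Path


def store_supplementary_file(storage_dir: Path, content: bytes) -> str:
    """Store content under the hex digest of its hash and return that digest.

    Hypothetical helper: SHA-256 and a flat directory are assumptions,
    not details taken from the documentation.
    """
    digest = hashlib.sha256(content).hexdigest()
    target = storage_dir / digest
    # Identical content always maps to the same name, so a duplicate
    # upload finds the file already present and writes nothing.
    if not target.exists():
        target.write_bytes(content)
    return digest


def load_supplementary_file(storage_dir: Path, digest: str) -> bytes:
    """Read back a stored file by the digest returned at upload time."""
    return (storage_dir / digest).read_bytes()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        storage = Path(tmp)
        first = store_supplementary_file(storage, b"1 2 3\n")
        second = store_supplementary_file(storage, b"1 2 3\n")
        # Both uploads yield the same digest and occupy a single file.
        print(first == second, len(list(storage.iterdir())))
```

The sketch also shows the trade-off mentioned in the text: the stored file name carries no hint of what the file contains, so anyone inspecting the storage has to know the digest in advance.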