fileserver analysis

master
Teyras 8 years ago
parent 1c44f40547
commit 73bea47981

@ -1250,18 +1250,42 @@ term project for C# course so it might be written and integrated in future.
The fileserver provides access to a shared storage space that contains files
submitted by students, supplementary files such as test inputs and outputs and
results of evaluation. In other words, it acts as an intermediate node for data
passed between the frontend and the backend. This functionality can be easily
separated from the rest of the backend features, which led to designing the
fileserver as a standalone component. Such design helps encapsulate the details
of how the files are stored (e.g. on a file system, in a database or using a
cloud storage service), while also making it possible to share the storage
between multiple ReCodEx frontends.
For early releases of the system, we chose to store all files on the file system
-- it is the simplest solution to implement, and the storage backend can later
be migrated to a different technology fairly easily.
One of the facts we learned from CodEx is that many exercises share test input
and output files, and also that these files can be rather large (hundreds of
megabytes). A direct consequence of this is that we cannot add these files to
submission archives that are to be downloaded by workers -- the combined size of
the archives would quickly exceed gigabytes, which is impractical. Another
conclusion we drew is that we need a way to deal with duplicate files.
A simple solution to this problem is storing supplementary files under the
hashes of their content. This ensures that every file is stored only once. On
the other hand, it makes it more difficult to understand what the content of a
file is at a glance, which might prove problematic for the administrator.
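
The idea of storing files under the hashes of their content can be illustrated
with a minimal sketch. The function name `store_by_hash` and the choice of SHA-1
are assumptions made for this example only; the text above does not say which
hash function is used.

```python
import hashlib
from pathlib import Path


def store_by_hash(storage_dir: Path, content: bytes) -> str:
    """Store content under the hex digest of its hash and return the digest.

    Hypothetical helper: SHA-1 is an assumption, any stable content hash works.
    """
    digest = hashlib.sha1(content).hexdigest()
    target = storage_dir / digest
    # Identical content always maps to the same name, so it is written only once.
    if not target.exists():
        target.write_bytes(content)
    return digest
```

Calling the function twice with the same bytes returns the same digest and
leaves a single file in the storage directory, which is the deduplication
property described above. The file name alone no longer says what the file
contains, which is the drawback mentioned for administrators.
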
A notable part of the fileserver's work is done by a web server (e.g. listening
for HTTP requests and caching recently accessed files in memory for faster
access). What remains to be implemented is the handling of upload requests --
student submissions should be stored in archives so that they can be downloaded
easily, and supplementary exercise files need to be stored under their hashes.
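
Packing a student submission into a single archive could look roughly like the
sketch below. The helper name `pack_submission` and the ZIP format are
assumptions for illustration; the text does not prescribe an archive format.

```python
import io
import zipfile
from typing import Mapping


def pack_submission(files: Mapping[str, bytes]) -> bytes:
    """Pack submitted files (name -> content) into one in-memory ZIP archive.

    Hypothetical helper: the archive format is an assumption.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # Sort the names so the member order does not depend on upload order.
        for name in sorted(files):
            archive.writestr(name, files[name])
    return buffer.getvalue()
```

A worker then downloads one small archive per submission, while the large
supplementary files stay outside of it and are fetched separately.
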
We decided to use Python and the Flask web framework. This combination makes it
possible to express the logic in ~100 SLOC and also provides a way to run the
fileserver as a standalone service (without a separate web server), which is
useful for development.
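
To give an impression of how little code such a service needs, here is a hedged
Flask sketch of uploading and downloading supplementary files. The route paths,
the `STORAGE` location and the use of SHA-1 are assumptions for this example and
do not describe the actual fileserver API.

```python
import hashlib
import tempfile
from pathlib import Path

from flask import Flask, jsonify, request, send_from_directory

# Hypothetical storage location; a real deployment would use a configured path.
STORAGE = Path(tempfile.mkdtemp())

app = Flask(__name__)


@app.route("/supplementary", methods=["POST"])
def upload_supplementary():
    """Store the raw request body under the hash of its content."""
    content = request.get_data()
    digest = hashlib.sha1(content).hexdigest()  # SHA-1 is an assumption
    (STORAGE / digest).write_bytes(content)
    return jsonify(hash=digest)


@app.route("/supplementary/<digest>", methods=["GET"])
def download_supplementary(digest):
    """Serve a stored file; send_from_directory rejects paths outside STORAGE."""
    return send_from_directory(str(STORAGE), digest)
```

During development, calling `app.run()` starts Flask's built-in server, so no
separate web server is needed. In production, a regular web server would
typically sit in front of the application and serve the stored files itself.
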
### Monitor
