Fileserver implementation

master
Petr Stefan 8 years ago
parent ebb1892c74
commit 63762cab74

@ -1,35 +0,0 @@
# Fileserver
The fileserver is a simple frontend to a disk storage space that contains auxiliary files for assignments, archives with job configuration and files submitted by users, and evaluation results. These are the only files the backend needs in order to run, so a dedicated fileserver makes it possible to test the backend separately. Also, one fileserver instance can be shared among multiple API instances (with the same broker), so common files do not need to be duplicated in each API instance.
One exception is that important files which behave like database entries (but are not stored in the database because of their size) are stored directly in the filesystem of the API server. This does not diminish the benefit of a separate fileserver. From a security point of view, the fileserver should be completely isolated from the public internet to keep the data safe, while the API server must be public by its nature.
For a description of the communication protocol used by the frontend
and workers, see the [Communication](#communication) chapter.
## Description
The storage is implemented in Python, using the Flask web framework. This
particular implementation evolved from a simple mock fileserver we used in the early
stages of development. It proved to be very reliable, so we decided to keep the fileserver
as a separate component instead of integrating this functionality into the main API.
### Internal storage structure
The fileserver stores its data in a configurable filesystem folder. This folder has
the following subfolders:
- `./submissions/<id>` -- folders that contain files submitted by users
(students' solutions to assignments). `<id>` is an identifier received from
the ReCodEx API.
- `./submission_archives/<id>.zip` -- ZIP archives of all submissions. These are
created automatically when a submission is uploaded. `<id>` is an identifier
of the corresponding submission.
- `./tasks/<subkey>/<key>` -- supplementary task files (e.g. test inputs and
outputs). `<key>` is a hash of the file content (sha-1 is used) and `<subkey>`
is its first letter (this is an attempt to prevent creating a flat directory
structure).

@ -2629,7 +2629,40 @@ used.
## Fileserver
@todo: stores particular data from frontend and backend, hashing, HTTP API
The fileserver component provides shared storage between the frontend and the backend. It is
written in Python 3 using the Flask web framework. The fileserver stores files in
a configurable filesystem directory and provides file deduplication and HTTP access.
To keep the stored data safe, the fileserver is not visible from the public internet.
### File deduplication
File deduplication works by storing each file under the hash of its
content. The whole procedure happens inside the fileserver: a plain file is
uploaded, hashed and saved, and the new filename is returned to
the uploader.
SHA1 is used as the hashing function because it is fast to compute and is
more collision-resistant than MD5. Files with the same hash are
treated as identical; no additional checks for collisions are performed. However,
an accidental collision is very unlikely.
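The upload flow described above can be sketched as follows. This is a minimal illustration, not the fileserver's actual code: the function name `store_deduplicated` and the flat `storage_root` directory are assumptions made for the example, and the real component exposes this behaviour over HTTP.

```python
import hashlib
from pathlib import Path


def store_deduplicated(content: bytes, storage_root: Path) -> str:
    """Save content under its SHA-1 hex digest and return that name.

    Hypothetical helper: name and signature are illustrative only.
    """
    key = hashlib.sha1(content).hexdigest()
    target = storage_root / key
    # A file that already exists under this name is assumed to hold the
    # same content, so nothing is written and no collision check is done.
    if not target.exists():
        storage_root.mkdir(parents=True, exist_ok=True)
        target.write_bytes(content)
    return key
```

Uploading the same bytes twice returns the same name and leaves a single file on disk, which is the deduplication effect the text describes.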
### Storage structure
The fileserver stores its data in the following structure:
- `./submissions/<id>/` -- folder that contains files submitted by users
(students' solutions to assignments). `<id>` is an identifier received from
the REST API.
- `./submission_archives/<id>.zip` -- ZIP archives of all submissions. These are
created automatically when a submission is uploaded. `<id>` is an identifier
of the corresponding submission.
- `./tasks/<subkey>/<key>` -- supplementary task files (e.g. test inputs and
outputs). `<key>` is a hash of the file content (`sha1` is used) and
`<subkey>` is its first letter (this is an attempt to prevent creating a flat
directory structure).
- `./results/<id>.zip` -- ZIP archive of results for the submission with
identifier `<id>`.
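
As a rough illustration of the layout listed above, the paths can be derived from an identifier or a content hash as shown below. The helper names and the `root` argument are hypothetical and exist only for this sketch; only the directory names and the first-letter `<subkey>` rule come from the list.

```python
from pathlib import PurePosixPath


def submission_dir(root: PurePosixPath, submission_id: str) -> PurePosixPath:
    # ./submissions/<id>/
    return root / "submissions" / submission_id


def submission_archive(root: PurePosixPath, submission_id: str) -> PurePosixPath:
    # ./submission_archives/<id>.zip
    return root / "submission_archives" / f"{submission_id}.zip"


def task_file(root: PurePosixPath, key: str) -> PurePosixPath:
    # ./tasks/<subkey>/<key>, where <subkey> is the first character of the hash
    return root / "tasks" / key[0] / key


def result_archive(root: PurePosixPath, submission_id: str) -> PurePosixPath:
    # ./results/<id>.zip
    return root / "results" / f"{submission_id}.zip"
```

Because a SHA-1 hex digest starts with one of 16 characters, the `<subkey>` level spreads task files over at most 16 subdirectories.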
## Worker
