## File Server

**File Server** stores data that should be kept outside of the **WebApp's** 
database, both because storing files in a database is inefficient and because 
the workers need to access the files in the simplest possible way. It should 
meet the following requirements:
- store files without duplicates
- keep its state consistent with the main database
- serve files to workers on demand
- allow versioning of tasks, including reverting to an earlier version

To meet these requirements, **Storage** and **Database** must be set up as described below.

### Storage
**Storage** means disk space with a commonly used filesystem. We'll use `ext4`, but other filesystems should work too. The **Storage** file structure is:
```
.
├── submits
│   └── user_id
│       └── advanced_dot_net_1
│           └── submit_id
│               ├── eval.yml
│               └── source.cs
├── submit_archives
│   └── submit_id.tar.gz
├── tasks
│   ├── a
│   │   ├── a014ed2abb56371bfaf2b4298a85d5dfb56509ed
│   │   └── a5edbd8b12e670ed1e3110d6c0524000cd4c3c7a
│   └── b
│       └── b1696358b8540923eb79b68f95c0f94c13a83fa7
└── temp
    └── 1795184136b8bdddabe50453cc2cc2d46f0f7c5e
```
- **submits** keeps all files submitted by users to ReCodEx. The subdirectories 
  _user_id_ and _advanced_dot_net_1_ group submits by user and by the course 
  they were submitted for. This structure is easy to maintain as users are 
  added and deleted.
- **submit_archives** contains the student submissions in compressed archives so 
  that they can be easily downloaded by workers.
- **tasks** contains supplementary files (such as test inputs or helper 
  programs) for all existing tasks in ReCodEx. To avoid too many files in one 
  directory, files are split into subfolders by the first character of their 
  name.
- **temp** is dedicated to temporarily storing outputs of programs at teachers' request. This directory will be erased by a cron job on a daily basis.
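
The layout of the **tasks** directory can be illustrated with a minimal sketch. It assumes the file name is the SHA-1 hex digest of the file content and that the subfolder is the first character of that digest, as in the tree above. The function names `sha1_of_file` and `store_task_file` and the `storage_root` parameter are hypothetical and are not part of ReCodEx.

```python
import hashlib
import shutil
from pathlib import Path


def sha1_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-1 hex digest of a file, read in chunks."""
    digest = hashlib.sha1()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def store_task_file(storage_root: Path, source: Path) -> Path:
    """Copy `source` to tasks/<first hash character>/<hash> (hypothetical helper).

    If a file with the same content is already stored, nothing is copied,
    so identical content occupies the storage only once.
    """
    file_hash = sha1_of_file(source)
    target_dir = storage_root / "tasks" / file_hash[0]
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / file_hash
    if not target.exists():
        shutil.copyfile(source, target)
    return target
```

Storing two files with identical content returns the same path both times, which is how duplicates are avoided in this sketch.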

### Database
For user-friendly access to tasks and for modifying them, the following information should be stored in the database:
- a list of tasks with their newest version number
- for every task and version, a list of the files used (by their hashed names)
- for every hashed name, one human-readable filename
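
One possible shape for this information is sketched below with SQLite. The table and column names (`tasks`, `task_files`, `file_names`, and so on) and the helper functions are assumptions made for illustration; the actual **WebApp** schema may look different.

```python
import sqlite3

# Hypothetical schema: one table per item in the list above.
SCHEMA = """
CREATE TABLE tasks (
    task_id        TEXT PRIMARY KEY,
    newest_version INTEGER NOT NULL
);
CREATE TABLE file_names (
    file_hash TEXT PRIMARY KEY,   -- hashed name used in Storage
    filename  TEXT NOT NULL       -- human-readable name shown to users
);
CREATE TABLE task_files (
    task_id   TEXT NOT NULL REFERENCES tasks(task_id),
    version   INTEGER NOT NULL,
    file_hash TEXT NOT NULL REFERENCES file_names(file_hash),
    PRIMARY KEY (task_id, version, file_hash)
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the three illustrative tables."""
    conn.executescript(SCHEMA)


def files_for_version(conn: sqlite3.Connection, task_id: str, version: int):
    """Return (filename, file_hash) pairs used by one version of a task."""
    rows = conn.execute(
        "SELECT n.filename, f.file_hash "
        "FROM task_files AS f "
        "JOIN file_names AS n ON n.file_hash = f.file_hash "
        "WHERE f.task_id = ? AND f.version = ? "
        "ORDER BY n.filename",
        (task_id, version),
    )
    return rows.fetchall()
```

Because every version keeps its own list of hashes, looking up the files of an older version is the same query with a smaller version number.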

### Conclusion
Files are stored internally under their SHA-1 hashes (as produced by 
`sha1sum`), so it's easy to implement versioning and to get rid of files with 
duplicate content: multiple files can have the same content, which is stored 
only once. **Worker** also refers to files by their hashes, which makes local 
caching simple, because a hash identifies the content regardless of which 
version of a task uses it. **Database**, on the other hand, stores the 
human-readable names, so that the files are presented in a friendly way to 
users (teachers) in **WebApp**.
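
The caching argument can be sketched as follows. This is only an illustration under stated assumptions: the function `get_file`, the `cache_dir` parameter, and the `download` callback (standing in for whatever transport the worker uses to reach the **File Server**) are hypothetical names, not the actual **Worker** interface.

```python
import hashlib
from pathlib import Path
from typing import Callable


def get_file(cache_dir: Path, file_hash: str,
             download: Callable[[str], bytes]) -> Path:
    """Return a local path for `file_hash`, downloading it only on a cache miss.

    The hash names the content itself, so a cached copy never becomes stale
    and no version number has to be checked.
    """
    cached = cache_dir / file_hash
    if cached.exists():
        return cached
    data = download(file_hash)
    # Verify that the received content matches the requested hash.
    if hashlib.sha1(data).hexdigest() != file_hash:
        raise ValueError(f"content does not match hash {file_hash}")
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached.write_bytes(data)
    return cached
```

A second request for the same hash is served from the local directory without contacting the server.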