diff --git a/Rewritten-docs.md b/Rewritten-docs.md
index fa0fa21..4c4d1af 100644
--- a/Rewritten-docs.md
+++ b/Rewritten-docs.md
@@ -665,18 +665,44 @@ state.
 
 @todo: how to display generally all outputs of executed programs to user (supervisor, student), what students can or cannot see and why
 
-@todo: judges, discuss what they possibly can do and what it can be used for (returning for instance 2 numbers instead of 1 and why we return just one)
-
 @todo: discuss points assigned to solution, why are there bonus points, explain minimal point threshold
 
 @todo: discuss several ways how points can be assigned to solution, propose basic systems but also general systems which can use outputs from judges or other executed programs, there is need for variables or other concept, explain why
 
 ### Persistence
 
-@todo: where is kept the state (MariaDB)
-
-
-@todo: and many many more general concepts which can be discussed and solved... please append more of them if something comes to your mind... thanks
+The previous parts of the analysis show that the system has to keep some state,
+such as user settings, group membership, evaluated assignments and so on. The
+data have to survive restarts, so persistence is an important decision factor.
+There are several ways to save structured data:
+
+- plain files
+- NoSQL database
+- relational database
+
+Another important factor is the amount and size of stored data. Our estimate is
+about 1000 users, 100 exercises, 200 assignments per year and 400000 unique
+solutions per year. The data are mostly structured and many records share the
+same format. For example, there are a thousand users and each one has the same
+values -- name, email, age, etc. Such data are relatively small: name and
+email are short strings, age is an integer. Considering this, a relational
+database or formatted plain files (CSV for example) fit them best.
+However, the data often have to support lookups, so they have to be indexed
+and allow random access for resolving cross-references. Also, adding and
+deleting entries should take reasonable time (at most logarithmic in the
+number of stored values). This practically excludes plain files, so a
+relational database is used instead.
+
+On the other hand, some data have much less structure and are much larger, such
+as evaluation logs, sample input files for exercises or sources submitted by
+students. A relational database is not suitable for this kind of data; it is
+better to keep them as ordinary files or store them in a NoSQL database. Since
+they are already files and do not need to be backed up in multiple copies, it
+is easier to keep them as ordinary files in the filesystem. This solution is
+also more lightweight and does not require additional dependencies on
+third-party software. A file can be identified by its filesystem path or by a
+unique index stored as a value in the relational database. Both approaches are
+equally good; the final decision depends on the actual case.
 
 ## Structure of the project