Student submissions are executed inside a sandboxing environment to protect the host system from damage and to restrict the amount of resources used. Currently only Isolate sandbox support is implemented in the worker, but the list of supported sandboxes can easily be extended.
Isolate is executed in a separate Linux process created by the `fork` and `exec` system calls. Communication between the processes goes through an unnamed pipe with the standard input and output descriptors redirected. To guard against an Isolate failure there is one more safety measure: the whole sandbox is killed if it does not finish within `(time + 300) * 1.2` seconds, where `time` is the original maximum time allowed for the task. However, Isolate should always terminate in time on its own, so this additional safeguard should never be needed.
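The guard timeout can be illustrated with a short Python sketch. This is not the worker's actual code: `subprocess` stands in for the `fork`/`exec` and pipe handling, and the helper names `hard_timeout` and `run_sandbox` are hypothetical.

```python
import subprocess


def hard_timeout(task_time_limit: float) -> float:
    """Guard timeout in seconds, following the (time + 300) * 1.2 rule."""
    return (task_time_limit + 300) * 1.2


def run_sandbox(argv, stdin_data: bytes, task_time_limit: float):
    """Spawn the sandbox as a child process and talk to it over pipes.

    Returns (exit code, captured stdout). If the child outlives the guard
    timeout it is killed and subprocess.TimeoutExpired is re-raised.
    """
    proc = subprocess.Popen(argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    try:
        out, _ = proc.communicate(stdin_data, timeout=hard_timeout(task_time_limit))
    except subprocess.TimeoutExpired:
        proc.kill()         # last-resort guard; the sandbox should exit by itself
        proc.communicate()  # reap the killed child and close the pipes
        raise
    return proc.returncode, out
```

For a task with a 10 second limit the guard fires only after `(10 + 300) * 1.2` seconds, which is far beyond anything the sandbox's own limits should allow.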
In general, a sandbox has to be a command line application that takes its parameters as arguments, from standard input, or from a file. Outputs should be written to a file or to standard output. There are no other requirements; the worker design is very versatile and can be adapted to different needs.
The following text describes how to set up and run the **worker** program. It assumes that the required binaries are already installed. Using systemd is recommended for the best user experience, but it is not required. Almost all modern Linux distributions use systemd nowadays.
The worker should have a default configuration which applies to the worker itself or may be used in given jobs (implicitly if something is missing, or explicitly with special variables). This default configuration should be hardcoded and can be overridden by an explicitly declared configuration file. The configuration format is YAML, with a structure similar to the job configuration.
- **worker-id** -- unique identification of the worker on one server. This id is used by the _isolate_ sandbox on Linux systems, so make sure it meets isolate's requirements (the default is a number from 1 to 999).
- **broker-uri** -- URI of the broker (hostname, IP address, including port, ...)
- _broker-ping-interval_ -- how often ping messages are sent to the broker, in milliseconds.
- _max-broker-liveness_ -- specifies how many pings in a row the broker can miss without making the worker dead.
- _headers_ -- map of headers specifying the worker's capabilities
    - _env_ -- list of environment variables which are sent to the broker in the init command
    - _threads_ -- information about the threads available to this worker
- **hwgroup** -- hardware group of this worker. The hardware group must specify the worker's hardware and software capabilities, and it is the main item for the broker's routing decisions.
- _working-directory_ -- where all needed files will be stored. Can be the same for multiple workers on one server.
- **file-managers** -- addresses and credentials of all file managers used (e.g. all the different frontends using this worker)
    - **hostname** -- URI of the file manager
    - _username_ -- username for HTTP authentication (if needed)
    - _password_ -- password for HTTP authentication (if needed)
- _file-cache_ -- configuration of the caching feature
    - _cache-dir_ -- path to the caching directory. Can be the same for multiple workers.
- _logger_ -- settings of the logging capabilities
    - _file_ -- path to the logging file, with the file name given without a suffix. The value `/var/log/recodex/worker` will produce `worker.log`, `worker.1.log`, ...
    - _level_ -- level of logging, one of `off`, `emerg`, `alert`, `critical`, `err`, `warn`, `notice`, `info` and `debug`
    - _max-size_ -- maximum size of the log file before rotating
    - _rotations_ -- number of rotated log files kept
- _limits_ -- default sandbox limits for this worker. All items are described in the assignments section of the job configuration description. If some limits are not set in the job configuration, the defaults from the worker config are used. In such a case the worker's defaults are set as the maximum for the job. Also, limits in the job configuration cannot exceed the limits from the worker.
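The fallback and ceiling rule for _limits_ can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `resolve_limits` and the limit names `time` and `memory` are hypothetical, and the text does not say whether an exceeding job limit is rejected or clamped, so rejecting it with an error is an assumption of this sketch.

```python
def resolve_limits(job_limits: dict, worker_limits: dict) -> dict:
    """Combine job limits with the worker's default limits.

    A limit missing from the job falls back to the worker's default.
    A job limit above the worker's value is rejected (assumption: the
    real worker might clamp or report the error differently).
    """
    resolved = {}
    for name, worker_max in worker_limits.items():
        value = job_limits.get(name, worker_max)
        if value > worker_max:
            raise ValueError(
                f"limit '{name}' = {value} exceeds worker maximum {worker_max}"
            )
        resolved[name] = value
    return resolved


# The job sets only the time limit, so memory falls back to the worker default.
print(resolve_limits({"time": 2.0}, {"time": 5.0, "memory": 1024}))
```

Limits that the worker does not define are ignored here to keep the sketch short.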
Isolate is the one and only sandbox used on Linux-based operating systems. The project is hosted on [GitHub](https://github.com/ioi/isolate), and more about its installation and setup can be found in the [installation](#installation) section. Isolate relies on Linux kernel features for sandboxing, namely _kernel namespaces_ and _cgroups_, so its security depends on them. Similar functionality can now be partially achieved with systemd.
From the very beginning of the ReCodEx project it was clear that the Isolate sandbox would be used in the Linux environment. There is no suitable general-purpose sandbox for the Windows platform, so the main operating system of the whole backend should be Linux-based. The set of operations supported by Isolate seems reasonable for any sandbox, so most of its functionality is accessible from the job configuration. As there is no other sandbox, naming often mirrors Isolate's names. However, the worker is prepared to run on Windows too, so integration with other sandboxes (as libraries or command line tools) is possible.
As a sandbox, Isolate provides a wide range of functionality which can be used to limit resources or even cut particular resources off from the sandboxed program. There are of course the basics, such as limiting CPU time and memory consumption, but there is also wall time (time as perceived by a human) and extra time, an additional allowance on top of the other time limits that increases the chance of the sandboxed program exiting successfully. Other features include limiting the stack size and redirecting stdin, stdout or stderr from/to a file. Also worth mentioning is the ability to set the number of processes/threads which can be created and to define environment variables which are passed to the sandboxed program.
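To make the mapping from these features to an Isolate invocation more concrete, here is a small Python sketch that assembles a command line. The helper `isolate_run_args` and its parameter names are hypothetical, and the option spellings follow the Isolate manual page as far as we know it, so they should be checked against the installed version.

```python
from typing import Dict, List, Optional, Sequence


def isolate_run_args(
    box_id: int,
    command: Sequence[str],
    *,
    time: Optional[float] = None,        # CPU time limit in seconds
    wall_time: Optional[float] = None,   # wall-clock limit in seconds
    extra_time: Optional[float] = None,  # grace period before the kill
    mem_kb: Optional[int] = None,        # memory limit in kilobytes
    stack_kb: Optional[int] = None,      # stack size limit in kilobytes
    processes: Optional[int] = None,     # max number of processes/threads
    stdin: Optional[str] = None,
    stdout: Optional[str] = None,
    stderr: Optional[str] = None,
    env: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Build an argument list for `isolate --run`; unset options are omitted."""
    args = ["isolate", f"--box-id={box_id}"]
    options = [
        ("--time", time), ("--wall-time", wall_time), ("--extra-time", extra_time),
        ("--mem", mem_kb), ("--stack", stack_kb), ("--processes", processes),
        ("--stdin", stdin), ("--stdout", stdout), ("--stderr", stderr),
    ]
    for flag, value in options:
        if value is not None:
            args.append(f"{flag}={value}")
    for name, value in (env or {}).items():
        args.append(f"--env={name}={value}")
    args += ["--run", "--", *command]
    return args


print(" ".join(isolate_run_args(1, ["./solution"], time=2, wall_time=5, mem_kb=65536)))
```

The sketch only builds the argument list; actually running it requires an initialized Isolate box.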
Filesystem handling is a chapter of its own. Isolate uses the mount kernel namespace to create a "virtual" filesystem which is mounted for the sandboxed program. By default only a few files/directories are mapped into the sandbox (they are described in the Isolate man page). This can of course be changed by passing further directories as Isolate parameters. Directories are mapped read-only by default, but Isolate has a few access options which can be set for a mount point.
A new feature in version 1.3 is the possibility to restrict an Isolate box to one or more CPUs or memory nodes. This functionality is provided by the _cpusets_ kernel mechanism and is now integrated in Isolate. Only `cpuset.cpus` and `cpuset.mems` can be set, which should be just fine for sandbox purposes. As this is kernel functionality, further description can be found in the manual page of _cpuset_ or in the Linux documentation in `linux/Documentation/cgroups/cpusets.txt`. As previously stated, these settings can be applied to particular Isolate boxes and have to be written in the Isolate configuration. The standard configuration path should be `/usr/local/etc/isolate`, but it may depend on your installation process. Configuring _cpuset_ there is really simple; the two settings are described below.
- **cpuset.cpus:** The CPU limitation restricts the sandboxed program to the processor threads set in the configuration. On hyperthreaded processors this means that all virtual threads are assignable, not only the physical ones. The value can be a single number, a list of numbers separated by commas, or a range with a hyphen delimiter.
- **cpuset.mems:** This value is particularly handy on NUMA systems, which have several memory nodes. On standard desktop computers this value should always be zero because only one independent memory node is present. As with the `cpus` limitation, the value can be a single number, a list of numbers separated by commas, or a range written with a hyphen.
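The three accepted value forms can be illustrated with a small Python parser. It is only a sketch for checking a value before writing it into the configuration: the function name `parse_cpuset` is hypothetical, and the sketch assumes that single numbers and ranges may be mixed in one comma-separated list.

```python
from typing import List


def parse_cpuset(value: str) -> List[int]:
    """Expand a cpuset value such as "0", "0,2,4" or "4-7" into sorted ids."""
    ids = set()
    for part in value.split(","):
        part = part.strip()
        if "-" in part:
            low_text, high_text = part.split("-", 1)
            low, high = int(low_text), int(high_text)
            if low > high:
                raise ValueError(f"invalid range: {part!r}")
            ids.update(range(low, high + 1))  # hyphen ranges are inclusive
        else:
            ids.add(int(part))
    return sorted(ids)


print(parse_cpuset("0"))      # single number
print(parse_cpuset("0,2,4"))  # comma-separated list
print(parse_cpuset("4-7"))    # range with hyphen
```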
WrapSharp is a sandbox for C# programs which is itself written in C#. We wrote it as a proof-of-concept sandbox for use in the Windows environment. However, it is not yet properly tested or integrated into the worker, and a security audit should be done before using it in production. After that, with just a little effort spent on integrating it into the worker, there could be a running sandbox for C# programs on Windows.
The cleaner is an integral part of the worker which manages its cache folder, mainly by deleting outdated files. Every cleaner instance maintains one cache folder, which can be used by multiple workers. This means that on one server there can be numerous worker instances sharing the same cache folder, but there should be only one cleaner.
The cleaner is written in Python and is used as a simple script which just does its job and exits, so it has to be run periodically by cron. For the cleaner to function properly, a suitable cron interval has to be chosen. A 24 hour interval is recommended and should be sufficient.
There is a catch with the cleaner service: to work properly, the server filesystem has to have the last access timestamp enabled. The cleaner checks these timestamps and decides based on them whether a file will be deleted or not; the last write or creation timestamps alone are not enough to reflect the real usage and need of a particular file. The last access timestamp feature is a bit controversial (more on this subject can be found [here](https://en.wikipedia.org/wiki/Stat_%28system_call%29#Criticism_of_atime)) and it is not enabled by default on conventional filesystems. On Linux this can be solved by adding the `strictatime` option in the `fstab` file. On Windows the following command has to be executed (as administrator): `fsutil behavior set disablelastaccess 0`.
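The decision based on the last access timestamp can be sketched in a few lines of Python. This is not the actual cleaner script: the function name `clean_cache` and the `max_age_seconds` retention threshold are hypothetical and only illustrate the idea.

```python
import os
import time


def clean_cache(cache_dir: str, max_age_seconds: float, now: float = None) -> list:
    """Delete regular files in cache_dir not accessed for max_age_seconds.

    Returns the sorted names of the removed files. Subdirectories and
    symbolic links are left untouched.
    """
    now = time.time() if now is None else now
    with os.scandir(cache_dir) as entries:
        candidates = [e for e in entries if e.is_file(follow_symlinks=False)]
    removed = []
    for entry in candidates:
        last_access = entry.stat(follow_symlinks=False).st_atime
        if now - last_access > max_age_seconds:
            os.remove(entry.path)
            removed.append(entry.name)
    return sorted(removed)
```

Without `strictatime` (or with last access updates disabled on Windows) `st_atime` may lag behind the real usage, which is exactly the catch described above.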
Another possibility seems to be updating the last modified timestamp whenever the file is accessed. This timestamp is supported by most major filesystems, so there are fewer compatibility issues than with the last access timestamp. The modified timestamp must then be updated by the workers on each access, for example using the `touch` command or similar. The final decision on which of these approaches is better will be made after practical experience with running a production system.
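A minimal Python sketch of this alternative is shown below, assuming the worker calls a helper on every cache hit and the cleaner compares modification times. Both function names and the `max_age_seconds` threshold are hypothetical.

```python
import os
import time


def touch_on_access(path: str) -> None:
    """Mark a cached file as recently used, like the `touch` command."""
    os.utime(path, None)  # sets access and modification time to now


def is_stale(path: str, max_age_seconds: float, now: float = None) -> bool:
    """Cleaner-side check based on the last modified timestamp."""
    now = time.time() if now is None else now
    return now - os.stat(path).st_mtime > max_age_seconds
```

The price of this approach is that the modification time no longer says when the file content last changed.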