This section lists worker-specific requirements. It covers only the basic ones; additional runtimes or tools may be needed depending on the type of use. Package names are for CentOS unless specified otherwise.
- ZeroMQ in version at least 4.0, packages `zeromq` and `zeromq-devel` (`libzmq3-dev` on Debian)
- YAML-CPP library, `yaml-cpp` and `yaml-cpp-devel` (`libyaml-cpp0.5v5` and `libyaml-cpp-dev` on Debian)
- libcurl library, `libcurl-devel` (`libcurl4-gnutls-dev` on Debian)
- libarchive library as an optional dependency. Installing it speeds up the build process; otherwise libarchive is built from source during installation. Package names are `libarchive` and `libarchive-devel` (`libarchive-dev` on Debian)
First, compile the Isolate sandbox from source and install it. The current worker is tested against version 1.3, so this version needs to be checked out. We assume that the source code is kept in the `/opt/src` directory. Building the man page requires the `asciidoc` package.
To work properly, Isolate depends on several advanced features of the Linux kernel. Make sure that your kernel is compiled with `CONFIG_PID_NS`, `CONFIG_IPC_NS`, `CONFIG_NET_NS`, `CONFIG_CPUSETS`, `CONFIG_CGROUP_CPUACCT`, `CONFIG_MEMCG`. If your machine has swap enabled, also check `CONFIG_MEMCG_SWAP`. The flags your kernel was compiled with can be found in the `/boot` directory, in the file named `config-` followed by your kernel version. Red Hat based distributions should have these enabled by default. On Debian you may want to add the parameters `cgroup_enable=memory swapaccount=1` to the kernel command line, which can be done by adding them to the `GRUB_CMDLINE_LINUX_DEFAULT` value in the `/etc/default/grub` file.
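The check of kernel options described above can be automated. The following is a minimal sketch in Python, assuming the kernel config is a plain-text file with lines such as `CONFIG_PID_NS=y`. The function name `missing_kernel_options` and the sample text are illustrative only and are not part of ReCodEx or Isolate.

```python
import re

# Options named in the text above. CONFIG_MEMCG_SWAP is left out because it
# only matters on machines with swap enabled.
REQUIRED_OPTIONS = [
    "CONFIG_PID_NS",
    "CONFIG_IPC_NS",
    "CONFIG_NET_NS",
    "CONFIG_CPUSETS",
    "CONFIG_CGROUP_CPUACCT",
    "CONFIG_MEMCG",
]


def missing_kernel_options(config_text, required=REQUIRED_OPTIONS):
    """Return the required options that are not set to 'y' in config_text."""
    enabled = set()
    for line in config_text.splitlines():
        # Enabled built-in options look like "CONFIG_FOO=y"; disabled ones
        # usually appear as "# CONFIG_FOO is not set" and are skipped here.
        match = re.match(r"^(CONFIG_\w+)=y\s*$", line)
        if match:
            enabled.add(match.group(1))
    return [option for option in required if option not in enabled]


# Hypothetical excerpt of a kernel config file, used only for demonstration.
sample = """CONFIG_PID_NS=y
CONFIG_IPC_NS=y
# CONFIG_NET_NS is not set
CONFIG_CPUSETS=y
CONFIG_CGROUP_CPUACCT=y
CONFIG_MEMCG=y
"""
print(missing_kernel_options(sample))  # prints ['CONFIG_NET_NS']
```

On a real machine the text would be read from the file in `/boot` mentioned above, for example with `open("/boot/config-" + platform.release()).read()`.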
- Disable address space randomization. Create the file `/etc/sysctl.d/10-recodex.conf` with the content `kernel.randomize_va_space=0`. The change takes effect after a restart, or immediately after running the `sysctl kernel.randomize_va_space=0` command.
- Disable dynamic CPU frequency scaling. This requires setting the cpufreq scaling governor to _performance_.
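Both settings above can be verified from a script. The sketch below assumes the usual Linux locations `/proc/sys/kernel/randomize_va_space` and `/sys/devices/system/cpu/cpu*/cpufreq/scaling_governor`; the function names are hypothetical and the checker takes plain values so it can be tried without root access.

```python
from pathlib import Path

# Assumed standard procfs/sysfs locations on Linux.
ASLR_PATH = Path("/proc/sys/kernel/randomize_va_space")
CPU_ROOT = Path("/sys/devices/system/cpu")
GOVERNOR_GLOB = "cpu[0-9]*/cpufreq/scaling_governor"


def check_measurement_settings(aslr_value, governors):
    """Return a list of problems; an empty list means both settings are fine."""
    problems = []
    if aslr_value.strip() != "0":
        problems.append(
            "kernel.randomize_va_space is %s, expected 0" % aslr_value.strip()
        )
    for cpu, governor in sorted(governors.items()):
        if governor.strip() != "performance":
            problems.append(
                "%s uses governor %s, expected performance" % (cpu, governor.strip())
            )
    return problems


def read_current_settings():
    """Read the live values from procfs/sysfs (Linux only)."""
    aslr_value = ASLR_PATH.read_text()
    governors = {
        # .../cpu0/cpufreq/scaling_governor -> "cpu0"
        path.parent.parent.name: path.read_text()
        for path in CPU_ROOT.glob(GOVERNOR_GLOB)
    }
    return aslr_value, governors


# Demonstration with made-up values instead of the live system state.
for problem in check_measurement_settings("2\n", {"cpu0": "performance\n", "cpu1": "powersave\n"}):
    print(problem)
```

On a worker machine, `check_measurement_settings(*read_current_settings())` would report the live state.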
Note that `rpm` and `deb` packages are built at the same time. You may need the `rpmbuild` command (usually provided by the `rpmbuild` or `rpm` package), or you can edit the `CPACK_GENERATOR` variable in the _CMakeLists.txt_ file in the root of the source tree.
_Note:_ If you don't want to generate binary packages, you can install the project directly with `make install` (as root). However, installing through your distribution's package manager is the preferred way to keep your system clean and manageable in the long term.
From the beginning we have been determined to support the Windows operating system, on which some of the workers may run (especially for projects in the C# programming language). Supporting Windows is quite hard and time consuming, and there were several problems during development. To make sure the worker keeps compiling on Windows, we set up a Windows CI service named [Appveyor](http://www.appveyor.com/). Nevertheless, installation should be easy thanks to the provided installation script.
There are only two additional dependencies: **Windows 7 or higher** and **Visual Studio 2015+**. The provided installation batch script should do all the work on a Windows machine. Officially, only VS2015 and 32-bit compilation are supported, because the compile options are hardcoded in the installation script. If a different VS version or platform is needed, the script has to be changed to the appropriate values, which is simple and straightforward.
The script is named *win-build.cmd* and is placed in the *install* directory alongside the supporting scripts for UNIX systems. It does almost all the work connected with building and dependency resolution (using the **NuGet** package manager and the `msbuild` build system). The script should be run from the *install* directory under the 32-bit version of the _Developer Command Prompt for VS2015_.
- *-build* -- The default option if none is specified. Builds the worker and its tests; everything is saved in the *build* folder and its subfolders.
- *-clean* -- Cleans up downloaded NuGet packages and built application/libraries.
- *-test* -- Builds the worker and runs tests on compiled test cases.
- *-package* -- Generates a clickable installer using cpack and [NSIS](http://nsis.sourceforge.net/) (which has to be installed on the machine for this to work).
The following text describes how to set up and run the **worker** program. It assumes that the required binaries are installed. Using systemd is recommended for the best user experience, but it is not required. Almost all modern Linux distributions use systemd now.
The worker should have some default configuration which is applied to the worker itself or may be used in given jobs (implicitly if something is missing, or explicitly with special variables). This configuration should be hardcoded and can be overridden by an explicitly declared configuration file. The format of this configuration is YAML with a structure similar to the job configuration.
- **worker-id** - unique identification of the worker on one server. This id is used by the _isolate_ sandbox on Linux systems, so make sure it meets isolate's requirements (by default a number from 1 to 999).
- **broker-uri** - URI of the broker (hostname, IP address, including port, ...)
- _broker-ping-interval_ - how often ping messages are sent to the broker, in milliseconds.
- _max-broker-liveness_ - specifies how many pings in a row the broker can miss without the worker being considered dead.
- _threads_ - information about available threads for this worker
- **hwgroup** - hardware group of this worker. The hardware group must specify the worker's hardware and software capabilities, and it is the main item for broker routing decisions.
- _working-directory_ - where all needed files will be stored. Can be the same for multiple workers on one server.
- **file-managers** - addresses and credentials of all file managers used (e.g. all different frontends using this worker)
    - **hostname** - URI of file manager
    - _username_ - username for HTTP authentication (if needed)
    - _password_ - password for HTTP authentication (if needed)
- _file-cache_ - configuration of caching feature
    - _cache-dir_ - path to caching directory. Can be the same for multiple workers.
- _logger_ - settings of logging capabilities
    - _file_ - path to the logging file with name without suffix. `/var/log/recodex/worker` item will produce `worker.log`, `worker.1.log`, ...
    - _level_ - level of logging, one of `off`, `emerg`, `alert`, `critical`, `err`, `warn`, `notice`, `info` and `debug`
    - _max-size_ - maximal size of log file before rotating
- _limits_ - default sandbox limits for this worker. All items are described in the assignments section of the job configuration description. If some limits are not set in the job configuration, the defaults from the worker config are used. Limits in the job configuration cannot exceed the limits from the worker; if they do, the worker's defaults are used as the maximum for the job.
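The rule for combining worker defaults with job limits can be sketched in a few lines of Python. This is only an illustration of the behaviour described for _limits_, not the worker's actual code. It assumes all limits are numeric with "larger means more permissive", and the limit names `time` and `memory` as well as the function name are hypothetical.

```python
def effective_limits(worker_limits, job_limits):
    """Combine worker defaults with job limits as described for _limits_."""
    result = {}
    for name, worker_value in worker_limits.items():
        job_value = job_limits.get(name)
        if job_value is None:
            # Limit missing in the job: fall back to the worker default.
            result[name] = worker_value
        else:
            # Limit present in the job: it may not exceed the worker's value.
            result[name] = min(job_value, worker_value)
    return result


# Hypothetical values: the job asks for less time but too much memory.
worker = {"time": 5.0, "memory": 50000}
job = {"time": 2.0, "memory": 900000}
print(effective_limits(worker, job))  # prints {'time': 2.0, 'memory': 50000}
```

Limits that appear only in the job and not in the worker defaults are ignored by this sketch.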
Isolate is the one and only sandbox used for Linux-based operating systems. The project is hosted on [GitHub](https://github.com/ioi/isolate), and more about its installation and setup can be found in the [installation](#installation) section.
A new feature in isolate is the possibility to limit an isolate box to one or more CPUs or memory nodes. This functionality is provided by the cpusets kernel mechanism and is now integrated in isolate. Only `cpuset.cpus` and `cpuset.mems` can be set, which should be sufficient for sandbox purposes. As this is kernel functionality, further description can be found in the manual page of cpuset or in the Linux documentation in `linux/Documentation/cgroups/cpusets.txt`. As previously stated, these settings can be applied to particular isolate boxes and have to be written in the isolate configuration. The standard configuration path should be `/usr/local/etc/isolate`, but it may depend on your installation process. Configuring cpusets there is simple; the two values are described below.
**cpuset.cpus:** The CPU limitation restricts the sandboxed program to the processor threads set in the configuration. On hyperthreaded processors this means that all virtual threads are assignable, not only the physical ones. The value can be a single number, a list of numbers separated by commas, or a range with a hyphen delimiter.
**cpuset.mems:** This value is particularly handy on NUMA systems, which have several memory nodes. On standard desktop computers this value should always be zero, because only one independent memory node is present. As with the `cpus` limitation, the value can be a single number, a list of numbers separated by commas, or a range with a hyphen.
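To make the accepted value formats concrete, here is a small Python sketch that expands such a value into explicit CPU or memory node numbers. It is an illustration only; the function name `parse_cpuset_list` is hypothetical and isolate does its own parsing.

```python
def parse_cpuset_list(value):
    """Expand a cpuset list such as "0,2,4-6" into a sorted list of ints."""
    ids = set()
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # A range "first-last" includes both endpoints.
            first, last = part.split("-", 1)
            first, last = int(first), int(last)
            if first > last:
                raise ValueError("invalid range: %s" % part)
            ids.update(range(first, last + 1))
        else:
            ids.add(int(part))
    return sorted(ids)


print(parse_cpuset_list("0"))        # prints [0]
print(parse_cpuset_list("0,2,4-6"))  # prints [0, 2, 4, 5, 6]
```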
WrapSharp is a sandbox for C# programs, itself written in C#. We wrote it as a proof-of-concept sandbox for use in the Windows environment. However, it is not properly tested or integrated into the worker yet. With a little effort it could become a working sandbox for C# programs on Windows.
The cleaner is an integral part of the **worker** which manages its cache folder, mainly by deleting outdated files. Every cleaner maintains exactly one cache folder, which can be used by multiple workers. This means that on one server there can be numerous worker instances sharing the same cache folder, but there can be (and should be) only one cleaner.
The cleaner is written in **Python** as a simple script which does its job and exits, so it has to be run periodically (for example by cron). For the cleaner to work properly, a suitable interval has to be used. A 24 hour interval is recommended and should be sufficient.
There is one catch with the cleaner: to work properly, the server filesystem has to have last access timestamps enabled. The cleaner checks these timestamps and decides based on them whether a file will be deleted; the modification or creation timestamps alone are not enough to reflect real usage and need of a particular file. The last access timestamp feature is a bit controversial (more on this subject can be found [here](https://en.wikipedia.org/wiki/Stat_%28system_call%29#Criticism_of_atime)) and it is not enabled by default on conventional filesystems. On Linux this can be solved by adding the `strictatime` option to the `fstab` file. On Windows the following command has to be executed (as administrator): `fsutil behavior set disablelastaccess 0`.
- run `python setup.py bdist_rpm --post-install ./cleaner/install/postinst` to generate binary `.rpm` package
- install package using `sudo dnf install ./dist/recodex-cleaner-0.1.0-1.noarch.rpm` (depends on actual version)
#### Other Linux systems
- run installation as `python setup.py install --install-scripts /usr/bin`
- run postinst script as root with `sudo ./cleaner/install/postinst`
#### Windows
- start `cmd` with administrator permissions
- decide in which folder cleaner should be installed, `C:\Program Files\ReCodEx\cleaner` is assumed
- run installation with `python setup.py install --install-scripts "C:\Program Files\ReCodEx\cleaner"` where path specified with `--install-scripts` can be changed
- copy the configuration file alongside the installed executable using `copy install\config.yml "C:\Program Files\ReCodEx\cleaner\config.yml"`
### Configuration and usage
#### Configuration items
- **cache-dir** - directory which cleaner manages
- **file-age** - age in seconds after which files are considered outdated and will be deleted
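How the two configuration items interact with last access timestamps can be sketched as follows. This is not the cleaner's actual source code; it is a minimal illustration, assuming a flat cache directory, and the function name `find_outdated_files` is hypothetical.

```python
import os
import tempfile
import time


def find_outdated_files(cache_dir, file_age, now=None):
    """Return files in cache_dir last accessed more than file_age seconds ago."""
    now = time.time() if now is None else now
    outdated = []
    for entry in os.scandir(cache_dir):
        if not entry.is_file():
            continue
        # st_atime is the last access timestamp discussed above.
        if now - entry.stat().st_atime > file_age:
            outdated.append(entry.path)
    return sorted(outdated)


# Demonstration on a temporary directory; access times are set explicitly
# with os.utime so the result does not depend on filesystem atime settings.
with tempfile.TemporaryDirectory() as cache_dir:
    now = time.time()
    for name, age in (("old.bin", 3 * 86400), ("fresh.bin", 60)):
        path = os.path.join(cache_dir, name)
        with open(path, "w") as handle:
            handle.write("data")
        os.utime(path, (now - age, now - age))
    stale = find_outdated_files(cache_dir, file_age=86400, now=now)
    print([os.path.basename(p) for p in stale])  # prints ['old.bin']
```

A real cleanup run would then remove each returned path, for example with `os.remove`.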
As stated before, the cleaner should be run periodically. On Linux systems this can be done with the built-in `cron` service, or, if `systemd` is present, with the `*.timer` file that the cleaner itself provides. On Windows systems the built-in scheduler should be used.
- Running cleaner from command line is fairly simple: `recodex-cleaner -c /etc/recodex/cleaner`
- Add the cleaner to the Windows scheduler service with the following command: `schtasks /create /sc daily /tn "ReCodEx Cleaner" /tr "\"C:\Program Files\ReCodEx\cleaner\recodex-cleaner.exe\" -c \"C:\Program Files\ReCodEx\cleaner\config.yml\""`