Worker

Description

The worker's job is to securely execute submitted assignments and possibly evaluate results against model solutions provided by the submitter. The worker is divided into two parts:

  • Listener - communicates with Broker through ZeroMQ. On startup, it introduces itself to the broker. Then it receives new jobs, passes them to the Evaluator part and sends back results and progress reports.
  • Evaluator - gets jobs from the Listener part, evaluates them (possibly in sandbox) and notifies the other part when the evaluation ends. This part also communicates with Fileserver, downloads supplementary files and uploads detailed results.

These parts run in separate threads that communicate through a ZeroMQ in-process socket. This design allows the worker to keep sending ping messages even when it's processing a job.
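The two-thread design can be illustrated with a minimal sketch. It is not the worker's actual code: Python's queue.Queue stands in for the ZeroMQ in-process socket, and the names evaluator and run_listener are hypothetical. It only shows why the listening side stays free to send pings while a job is being evaluated.

```python
import queue
import threading


def evaluator(jobs: queue.Queue, results: queue.Queue) -> None:
    """Take jobs from the listener, 'evaluate' them and report back."""
    while True:
        job = jobs.get()
        if job is None:  # sentinel: the listener asks us to stop
            break
        # Placeholder for the real evaluation (possibly inside a sandbox).
        results.put((job, "OK"))


def run_listener(incoming_jobs: list) -> list:
    """Hand jobs to the evaluator thread while staying free to send pings."""
    jobs: queue.Queue = queue.Queue()
    results: queue.Queue = queue.Queue()
    worker = threading.Thread(target=evaluator, args=(jobs, results))
    worker.start()

    for job in incoming_jobs:
        jobs.put(job)

    log = []
    finished = 0
    while finished < len(incoming_jobs):
        try:
            # Wait briefly for a result; a timeout models the ping interval.
            job, status = results.get(timeout=0.05)
            log.append(("result", job, status))
            finished += 1
        except queue.Empty:
            # The listener is not blocked by the running job.
            log.append(("ping",))

    jobs.put(None)
    worker.join()
    return log
```

Calling run_listener(["job-1", "job-2"]) returns a log that contains one result entry per job, possibly interleaved with ping entries.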

After receiving an evaluation request, Worker has to:

  • Download the archive containing submitted source files and configuration file
  • Download any supplementary files based on the configuration file, such as test inputs or helper programs (This is done on demand, using a fetch command in the assignment configuration)
  • Evaluate the submission according to the job configuration
  • Optionally send progress messages back to the Broker during evaluation
  • Upload the results of the evaluation to the Fileserver
  • Notify Broker that the evaluation finished

Header matching

Every worker belongs to exactly one hardware group and has a set of headers. These properties help the broker decide which worker is suitable for processing a request.

The hardware group is a string identifier used to group worker machines with similar hardware configuration -- for example "i7-4560-quad-ssd". It's important for assignments where running times are compared to those of reference solutions -- we have to make sure that both programs run on similar hardware.

The headers are a set of key-value pairs that describe the worker's capabilities -- which runtime environments are installed, how many threads the worker can run, or whether it measures time precisely.

This information is sent to the broker on startup using the init command.
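How such a match could work is sketched below. The matching rule itself is an assumption made for illustration (the request lists acceptable hardware groups and required header values); the broker's real rules may differ, and the function name worker_matches and the dictionary layout are hypothetical.

```python
def worker_matches(worker: dict, request: dict) -> bool:
    """Return True if the worker may process the request.

    Assumed rule (hypothetical): the worker's hardware group must be one
    of the groups the request accepts, and every header value the request
    asks for must be offered by the worker.
    """
    if worker["hwgroup"] not in request["hwgroups"]:
        return False
    for key, wanted in request["headers"].items():
        offered = worker["headers"].get(key)
        # A header may hold one value or a list (e.g. several environments).
        offered_values = offered if isinstance(offered, list) else [offered]
        if wanted not in offered_values:
            return False
    return True
```

For example, a worker with hwgroup "group1" and headers {"env": ["c", "python"], "threads": 2} would match a request for hardware group "group1" and env "c", but not a request for env "java".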

Architecture

The picture below shows the internal architecture of the worker, including its classes with their private variables and public functions.

Figure: Internal Worker architecture (Worker class diagram)

Installation

Dependencies

Worker-specific requirements are described in this section. Some parts of this guide may differ for each type of worker; for example, the Compiler principles worker is not fully covered here.

Install libarchive library (optional)

Installing this package only speeds up the build process; otherwise libarchive is built from source.

  • Debian package is libarchive-dev.
  • RedHat packages are libarchive and libarchive-devel. These are probably not available for RHEL.

Install Isolate from source

First, we need to compile the Isolate sandbox from source and install it. The current worker is tested against version 1.3, so this version needs to be checked out. Assume that we keep source code in the /opt/src directory. To build the man page, you need to have the asciidoc package installed.

$ cd /opt/src
$ git clone https://github.com/ioi/isolate.git
$ cd isolate
$ git checkout v1.3
$ make
# make install && make install-doc

For proper operation, Isolate depends on several advanced features of the Linux kernel. Make sure that your kernel is compiled with CONFIG_PID_NS, CONFIG_IPC_NS, CONFIG_NET_NS, CONFIG_CPUSETS, CONFIG_CGROUP_CPUACCT, CONFIG_MEMCG. If your machine has swap enabled, also check CONFIG_MEMCG_SWAP. The flags your kernel was compiled with can be found in the /boot directory, for example in the /boot/config-4.2.6-301.fc23.x86_64 file for kernel version 4.2.6-301. Red Hat distros should have these enabled by default; for Debian you may want to add the parameters cgroup_enable=memory swapaccount=1 to the kernel command line, which can be set using GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub.
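Checking the flags by hand is error-prone, so a small helper may be useful. The following is a sketch, not part of the worker or of Isolate; the function name missing_kernel_flags is hypothetical. It treats both built-in (=y) and module (=m) options as enabled, which is an assumption.

```python
REQUIRED_FLAGS = [
    "CONFIG_PID_NS", "CONFIG_IPC_NS", "CONFIG_NET_NS",
    "CONFIG_CPUSETS", "CONFIG_CGROUP_CPUACCT", "CONFIG_MEMCG",
]


def missing_kernel_flags(config_text: str, required=REQUIRED_FLAGS) -> list:
    """Return the required flags that are not enabled in a kernel config."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Disabled options appear as comments: "# CONFIG_X is not set".
        if line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if value in ("y", "m"):
            enabled.add(name)
    return [flag for flag in required if flag not in enabled]
```

The text passed in would be the content of the kernel config file, for example /boot/config-4.2.6-301.fc23.x86_64 from the paragraph above. An empty result means that all listed flags are enabled.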

For better reproducibility of results, some kernel parameters can be tweaked:

  • Disable address space randomization. Create the file /etc/sysctl.d/10-recodex.conf with the content kernel.randomize_va_space=0. The change takes effect after a restart, or immediately after running the sysctl kernel.randomize_va_space=0 command.
  • Disable dynamic CPU frequency scaling. This requires setting the cpufreq scaling governor to performance. TODO - do we really need it and is it worth higher power consumption? Red Hat setup, Debian setup

Clone worker source code repository

$ git clone https://github.com/ReCodEx/worker.git
$ cd worker
$ git submodule update --init

Install worker on Linux

It is assumed that your current working directory is the one with the cloned worker source code.

  • Prepare the environment by running mkdir build && cd build
  • Build the sources with cmake .. followed by make -j# where the '#' symbol refers to the number of your CPU threads.
  • Build the binary package with make package (may require root permissions). Note that rpm and deb packages are built at the same time. You may need to have the rpmbuild command available (usually in the rpmbuild or rpm package) or edit the CPACK_GENERATOR variable in the CMakeLists.txt file in the root of the source code tree.
  • Install the generated package through your package manager (yum, dnf, dpkg).

Note: If you don't want to generate binary packages, you can just install the project with make install (as root). However, installation through your distribution's package manager is the preferred way to keep your system clean and manageable in the long term.

Install worker on Windows

From the beginning we were determined to support the Windows operating system, on which some of the workers may run (especially for projects in the C# programming language). Supporting Windows is quite hard and time consuming, and several problems came up along the way. To make sure the worker keeps compiling on Windows, we set up Appveyor as CI for Windows. Installation itself should nevertheless be easy thanks to the provided installation script.

There are only two additional dependencies: Windows 7 or higher and Visual Studio 2015+. The provided installation batch script should do all the work on a Windows machine. Officially, only VS2015 and 32-bit compilation are supported, because of compile options hardcoded in the installation script. If a different VS version or a different platform is needed, the script should be changed to the appropriate values, which should be simple and straightforward.

The script is named win-build.cmd and is placed in the install directory alongside supporting scripts for UNIX systems. It does almost all the work connected with building and dependency resolution (using the NuGet package manager and the msbuild build system). The script should be run from the install directory under the 32-bit version of the Developer Command Prompt for VS2015.

Building and installing the worker is then quite simple. The script has command line parameters that specify what will be done:

  • -build - Does not have to be specified. Builds the worker and all its tests; everything is saved in the build folder and its subfolders.
  • -clean - Cleans up downloaded NuGet packages and built application/libraries.
  • -test - Builds the worker and runs tests on compiled test cases.
  • -package - Generates a clickable installer using cpack and NSIS. NSIS has to be installed on the machine for this to work.
install> win-build.cmd  # same as: win-build.cmd -build
install> win-build.cmd -clean
install> win-build.cmd -test
install> win-build.cmd -package

All built binaries and cmake temporary files can be found in the build folder; typically there will be a Release subfolder containing the compiled application with all needed DLLs. Once the clickable installer is created, it can be found in the build folder under a name like recodex-worker-VERSION-win32.exe.

Clickable installation feature (figure: NSIS Installation)

Compilers

To evaluate jobs, you have to install the tools that fit your needs. Here are some useful tips on installing different compilers.

C/C++

The GCC compiler is used for compiling C and C++ programs. You can install most of the tools at once by executing the following command, but it is not necessary.

# yum group install "Development Tools"

To install only the compiler, use the distribution's repositories.

# yum install gcc gcc-c++ make

To get a reasonably new version, you may consider installing Red Hat Developer Toolset 4, or installing these packages from the Fedora repo. In Debian, the testing repo can be used.

C#

For new versions of Mono, we'll use Xamarin repositories. For Red Hat based OS:

# rpm --import "http://keyserver.ubuntu.com/pks/lookup?op=get&search=0x3FA7E0328081BFF6A14DA29AA6A19B38D3D831EF"
# yum-config-manager --add-repo http://download.mono-project.com/repo/centos/
# yum upgrade
# yum install mono-complete

For Debian based OS:

$ sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys 3FA7E0328081BFF6A14DA29AA6A19B38D3D831EF
$ echo "deb http://download.mono-project.com/repo/debian wheezy main" | sudo tee /etc/apt/sources.list.d/mono-xamarin.list
$ sudo apt-get update
$ sudo apt-get install mono-complete

Free Pascal

Free Pascal compiler 3.0.0 can be downloaded as rpm packages from the official website. In Debian, this version is experimental, but it seems to be stable enough.

Java TODO

Configuration and usage

The following text describes how to set up and run the worker program. It assumes that the required binaries are installed; for instructions see the Installation section. Using systemd is recommended for the best user experience, but it is not required. Almost all modern Linux distributions use systemd now.

The worker installation performs the following steps:

  • create config file /etc/recodex/worker/config-1.yml
  • create systemd unit file /etc/systemd/system/recodex-worker@.service
  • put main binary to /usr/bin/recodex-worker
  • put judges binaries to /usr/bin/recodex-judge-normal, /usr/bin/recodex-judge-shuffle and /usr/bin/recodex-judge-filter
  • create system user and group recodex with /sbin/nologin shell (if not existing)
  • create log directory /var/log/recodex
  • set ownership of config (/etc/recodex) and log (/var/log/recodex) directories to recodex user and group

Default worker configuration

The worker should have a default configuration which is applied to the worker itself or may be used in given jobs (implicitly if something is missing, or explicitly with special variables). This configuration should be hardcoded and can be overridden by an explicitly declared configuration file. The format of this configuration is YAML, as in the job config.

Configuration items

Mandatory items are bold, optional italic.

  • worker-id - unique identification of the worker on one server. This id is used by the isolate sandbox on Linux systems, so make sure it meets isolate's requirements (by default a number from 1 to 999).
  • broker-uri - URI of the broker (hostname, IP address, including port, ...)
  • broker-ping-interval - time interval at which ping messages are sent to the broker, in milliseconds.
  • max-broker-liveness - specifies how many pings in a row the broker can miss without making the worker dead.
  • headers - specifies the worker's capabilities
    • env - map of environmental variables
    • threads - information about available threads for this worker
  • hwgroup - hardware group of this worker. The hardware group must specify the worker's hardware and software capabilities, and it is the main item for the broker's routing decisions.
  • working-directory - where all needed files will be stored. Can be the same for multiple workers on one server.
  • file-managers - addresses and credentials of all file managers used (e.g. all different frontends using this worker)
    • hostname - URI of the file manager
    • username - username for HTTP authentication (if needed)
    • password - password for HTTP authentication (if needed)
  • file-cache - configuration of the caching feature
    • cache-dir - path to the caching directory. Can be the same for multiple workers.
  • logger - settings of logging capabilities
    • file - path to the log file, with the name given without suffix. The value /var/log/recodex/worker will produce worker.log, worker.1.log, ...
    • level - level of logging, one of off, emerg, alert, critical, err, warn, notice, info and debug
    • max-size - maximal size of the log file before rotating
    • rotations - number of rotations kept
  • limits - default sandbox limits for this worker. All items are described in Assignments section.
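The interplay of broker-ping-interval and max-broker-liveness can be pictured as a simple counter. The sketch below shows one possible reading of the max-broker-liveness item, not the worker's actual implementation: the class name BrokerLiveness and the exact counting (the connection is given up on the N-th consecutive missed ping) are assumptions.

```python
class BrokerLiveness:
    """Count consecutive ping intervals without a message (hypothetical)."""

    def __init__(self, max_liveness: int) -> None:
        self.max_liveness = max_liveness
        self.remaining = max_liveness

    def message_received(self) -> None:
        """Any message from the other side resets the counter."""
        self.remaining = self.max_liveness

    def ping_missed(self) -> bool:
        """Record one missed ping; return True while still considered alive."""
        self.remaining -= 1
        return self.remaining > 0
```

With max-broker-liveness set to 3, the first two missed pings leave the connection alive and the third one marks it as dead, unless a message arrives in between and resets the counter.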

Example config file

worker-id: 1
broker-uri: tcp://localhost:9657
broker-ping-interval: 10  # milliseconds
max-broker-liveness: 10
headers:
    env:
        - c
        - python
    threads: 2
hwgroup: "group1"
working-directory: /tmp/recodex
file-managers:
    - hostname: "http://localhost:9999"  # port is optional
      username: ""  # can be ignored in specific modules
      password: ""  # can be ignored in specific modules
file-cache:  # only in case that there is cache module
    cache-dir: "/tmp/recodex/cache"
logger:
    file: "/var/log/recodex/worker"  # w/o suffix - actual names will be worker.log, worker.1.log, ...
    level: "debug"  # level of logging
    max-size: 1048576  # 1 MB; max size of file before log rotation
    rotations: 3  # number of rotations kept
limits:
    time: 5  # in secs
    wall-time: 6  # seconds
    extra-time: 2  # seconds
    stack-size: 0  # normal in KB, but 0 means no special limit
    memory: 50000  # in KB
    parallel: 1  # time and memory limits are merged
    disk-size: 50
    disk-files: 5
    environ-variable:
        ISOLATE_BOX: "/box"
        ISOLATE_TMP: "/tmp"
    bound-directories:
        - src: /tmp/recodex/eval_5
          dst: /evaluate
          mode: RW,NOEXEC

Running the worker

A systemd unit file is distributed with the worker to simplify its launch. It integrates worker nicely into your Linux system and allows you to run it automatically on system startup. It's possible to have more than one worker on every server, so the provided unit file is templated. Each instance of the worker unit has a unique string identifier, which is used for managing that instance through systemd. By default, only one worker instance is ready to use after installation and its ID is "1".

  • Starting worker with id "1" can be done this way:
# systemctl start recodex-worker@1.service

Check with

# systemctl status recodex-worker@1.service

if the worker is running. You should see "active (running)" message.

  • Worker can be stopped or restarted accordingly using the systemctl stop and systemctl restart commands.
  • If you want to run worker after system startup, run:
# systemctl enable recodex-worker@1.service

For further information about using systemd please refer to systemd documentation.

Adding new worker

To add a new worker you need to do a few steps:

  • Make up a unique string ID.
  • Copy default configuration file /etc/recodex/worker/config-1.yml to the same directory and name it config-<your_unique_ID>.yml
  • Edit that config file to fit your needs. Note that you must at least change worker-id and logger file values to be unique.
  • Run new instance using
# systemctl start recodex-worker@<your_unique_ID>.service
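The copy-and-edit step can also be scripted. The following sketch derives a new configuration text from the default one by replacing the worker-id and logger file values. It assumes the layout of the example config file (a top-level worker-id key and a single indented file key under logger), and the function name make_worker_config is hypothetical. It works on plain text, so any comment on the replaced lines is dropped.

```python
import re


def make_worker_config(default_text: str, worker_id: int, log_file: str) -> str:
    """Return the default config text with worker-id and logger file replaced."""
    # worker-id is a top-level key, so it starts at column 0.
    text, id_count = re.subn(
        r"(?m)^worker-id:.*$", f"worker-id: {worker_id}", default_text
    )
    # The logger's 'file' key is nested, so it is indented.
    text, file_count = re.subn(
        r"(?m)^([ \t]+)file:.*$",
        lambda match: f'{match.group(1)}file: "{log_file}"',
        text,
    )
    if id_count != 1 or file_count != 1:
        raise ValueError("unexpected config layout")
    return text
```

The returned text would then be saved as config-<your_unique_ID>.yml next to the default configuration file.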

Sandboxes

Isolate

Isolate is used as the one and only sandbox for Linux-based operating systems. The project's home is at https://github.com/ioi/isolate, and more about its installation and setup can be found in the Installation section.

// TODO: further desc

Limit isolate boxes to particular cpu or memory node

A new feature in isolate is the possibility to limit an isolate box to one or more CPUs or memory nodes. This functionality is provided by the kernel's cpusets mechanism and is now integrated in isolate. Only cpuset.cpus and cpuset.mems can be set, which should be sufficient for sandbox purposes. As this is kernel functionality, a further description can be found in the cpuset manual page or in the Linux documentation in linux/Documentation/cgroups/cpusets.txt. These settings can be applied to particular isolate boxes and have to be written in the isolate configuration. The standard configuration path should be /usr/local/etc/isolate, but it may depend on your installation process. Configuring cpusets there is simple and is shown in the example below.

box0.cpus = 0  # assign processor with id 0 to isolate box with id 0
box0.mems = 0  # assign memory node with id 0
# if not set linux by itself will decide where should sandboxed program run
box2.cpus = 1-3  # assign range of processors to isolate box with id 2
box2.mems = 4-7  # assign range of memory nodes 
box3.cpus = 1,2,3  # assign list of processors to isolate box with id 3

cpuset.cpus: The cpus limitation restricts the sandboxed program to the processor threads set in the configuration. On hyperthreaded processors this means that all virtual threads are assignable, not only the physical ones. The value can be a single id, a list of ids separated by commas, or a range with a hyphen delimiter.

cpuset.mems: This value is particularly handy on NUMA systems, which have several memory nodes. On standard desktop computers this value should always be zero, because only one independent memory node is present. As with the cpus limitation, the value can be a single id, a list of ids separated by commas, or a range with a hyphen.
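The three value forms (single id, comma-separated list, hyphenated range) can be expanded into a set of ids with a few lines of code. This is only an illustrative sketch with a hypothetical function name; isolate and the kernel do their own parsing.

```python
def parse_cpuset(value: str) -> set:
    """Expand a cpuset value such as "0", "1,2,3" or "1-3" into a set of ids."""
    ids = set()
    for part in value.split(","):
        part = part.strip()
        if "-" in part:
            # A range includes both of its endpoints.
            first, last = part.split("-", 1)
            ids.update(range(int(first), int(last) + 1))
        else:
            ids.add(int(part))
    return ids
```

Both parse_cpuset("1-3") and parse_cpuset("1,2,3") yield the ids 1, 2 and 3, which is why the box2.cpus and box3.cpus lines in the example describe the same set of processors.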

Cleaner

Description

The cleaner is an integral part of the worker which manages its cache folder, mainly by deleting outdated files. Every cleaner maintains exactly one cache folder, which can be used by multiple workers. This means that on one server there can be numerous worker instances sharing the same cache folder, but there can be (and should be) only one cleaner.

The cleaner is written in Python and is used as a simple script which does its job and exits, so it has to be run periodically (for example by cron). For the cleaner to work properly, a suitable interval has to be used. A 24 hour interval is recommended and should be sufficient.

Last access timestamp

There is a catch with the cleaner service: to work properly, the server filesystem has to have the last access timestamp enabled. The cleaner checks these timestamps and decides based on them whether a file will be deleted; the last write timestamp or the creation timestamp alone is not enough to reflect the real usage of and need for a particular file. The last access timestamp feature is a bit controversial and is not enabled by default on conventional filesystems. On Linux this can be solved by adding the strictatime option to the fstab file. On Windows the following command has to be executed (as administrator): fsutil behavior set disablelastaccess 0.
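The decision based on last access timestamps can be sketched as follows. This is not the cleaner's actual code: the function name outdated_files is hypothetical, and it only inspects regular files directly inside the cache directory, which is an assumption. The file_age parameter corresponds to a maximum age in seconds.

```python
import os
import time


def outdated_files(cache_dir: str, file_age: int, now: float = None) -> list:
    """List files in cache_dir whose last access is older than file_age seconds."""
    now = time.time() if now is None else now
    outdated = []
    for entry in os.scandir(cache_dir):
        if not entry.is_file():
            continue
        # st_atime is only meaningful if the filesystem records access times.
        last_access = entry.stat().st_atime
        if now - last_access > file_age:
            outdated.append(entry.path)
    return sorted(outdated)
```

A cleaner built this way would then delete the returned paths, for example with os.remove. If access times are not recorded by the filesystem, st_atime stops reflecting real usage and files that are still needed may be removed.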

Installation

To install and use the cleaner, it is necessary to have Python 3 and the corresponding pip installed.

  • First, the cleaner's dependencies have to be installed:
pip install -r requirements.txt

Fedora (and other RPM distributions)

  • run python setup.py bdist_rpm --post-install ./cleaner/install/postinst to generate binary .rpm package
  • install package using sudo dnf install ./dist/recodex-cleaner-0.1.0-1.noarch.rpm (depends on actual version)

Other Linux systems

  • run installation as python setup.py install --install-scripts /usr/bin
  • run postinst script as root with sudo ./cleaner/install/postinst

Windows

  • start cmd with administrator permissions
  • decide in which folder cleaner should be installed, C:\Program Files\ReCodEx\cleaner is assumed
  • run installation with python setup.py install --install-scripts "C:\Program Files\ReCodEx\cleaner" where path specified with --install-scripts can be changed
  • copy configuration file alongside with installed executable using copy install\config.yml "C:\Program Files\ReCodEx\cleaner\config.yml"

Configuration and usage

Configuration items

  • cache-dir - directory which cleaner manages
  • file-age - age in seconds after which files are considered outdated and will be deleted

Example configuration

cache-dir: "/tmp"
file-age: "3600"  # in seconds

Usage

As stated before, the cleaner should be run periodically. On Linux systems this can be done with the built-in cron service or, if systemd is present, with the *.timer file which the cleaner itself provides for scheduling from systemd. On Windows systems the internal scheduler should be used.

  • Running the cleaner from the command line is fairly simple: recodex-cleaner -c /etc/recodex/cleaner
  • Enable the cleaner service using systemd: systemctl start recodex-cleaner.timer
  • Add the cleaner to the Linux cron service using the following configuration line: 0 0 * * * /usr/bin/recodex-cleaner -c /etc/recodex/cleaner/config.yml
  • Add the cleaner to the Windows scheduler service with the following command: schtasks /create /sc daily /tn "ReCodEx Cleaner" /tr "\"C:\Program Files\ReCodEx\cleaner\recodex-cleaner.exe\" -c \"C:\Program Files\ReCodEx\cleaner\config.yml\""