Assignments

Assignments are programming tasks that can be tested by a worker after a user submits their solution.

Configuration format

An assignment is described by a YAML file that contains information on how to build, run and test it. The testing process is divided into stages.

Tests

A test is a pair of files: the first specifies an input and the second the expected output. Tests can be organized into groups, which keeps large test sets manageable and enables more complex grading scenarios (e.g. "An assignment passes only if it passes all tests in group X").

Tests are run by a special program called a judge, which compares the output of the program to the expected output. The choice of judge determines how strict the testing is: some assignments require the solution to output exactly the same bytes as expected, while others permit any amount of whitespace between words of the output.
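
The difference between a strict and a lenient judge can be illustrated with a minimal Python sketch. The function names `judge_exact` and `judge_tokens` are hypothetical and chosen only for this example; they are not the judges shipped with the system.

```python
def judge_exact(actual: bytes, expected: bytes) -> bool:
    """Strict judge: the output must match the expected output byte for byte."""
    return actual == expected


def judge_tokens(actual: bytes, expected: bytes) -> bool:
    """Lenient judge: compare whitespace-separated words only.

    bytes.split() with no argument splits on any run of ASCII whitespace,
    so extra spaces, tabs and trailing newlines are ignored.
    """
    return actual.split() == expected.split()
```

With these two functions, an output of `b"1  2\n"` would be rejected by the strict judge but accepted by the lenient one when the expected output is `b"1 2\n"`.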

Stages

A stage is a logical unit of the testing process. It specifies how to do a step in the build process and how to test if the student's submission behaves correctly. After the evaluation, the worker outputs a log for every processed stage that contains information such as which tests passed and how many resources were used.

A stage's configuration contains the following (this doesn't yet map exactly to a particular configuration file format):

Name - a unique string identifier of the stage
Build command (optional) - used to prepare the submitted files for this test stage
Test list - specifies the tests (or test groups) to be run during this stage
Test command - used to run one specific test
Test input policy - how to pass the test input to the program?
- redirect it to its standard input (default)
- pass the path to an input file as an argument
Judge - which judge should be used to evaluate the solution's output? Custom judges can be supplied with the assignment.
Limits - how much memory, time, etc. can be used when evaluating a test
Error policy (optional) - what should we do when a test fails?
- interrupt the evaluation (default)
- continue with another test
- continue with another group
- jump to another stage
Success policy (optional) - what to do when all tests pass?
- jump to another stage (the next one by default)
- end the evaluation, even if there are still unprocessed stages

When jumping between stages, it's only possible to jump forward, so that no stage is evaluated multiple times.
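
The stage fields and the forward-only jump rule can be modelled in a few lines of Python. This is an illustrative sketch, not the actual configuration format: the class `Stage`, its field names, the policy strings such as `"jump:<stage>"`, and the helper `validate_jumps` are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Stage:
    # All field names and policy strings below are hypothetical.
    name: str                             # unique identifier of the stage
    test_command: str                     # runs one specific test
    tests: list = field(default_factory=list)
    build_command: Optional[str] = None   # optional preparation step
    input_policy: str = "stdin"           # or "argument"
    judge: str = "exact"
    limits: dict = field(default_factory=dict)
    on_error: str = "interrupt"           # or "next-test", "next-group", "jump:<stage>"
    on_success: Optional[str] = None      # None = next stage, "end", or "jump:<stage>"


def validate_jumps(stages):
    """Raise ValueError if a name repeats or a jump does not point to a later stage."""
    index = {}
    for position, stage in enumerate(stages):
        if stage.name in index:
            raise ValueError(f"duplicate stage name: {stage.name}")
        index[stage.name] = position
    for position, stage in enumerate(stages):
        for policy in (stage.on_error, stage.on_success):
            if policy and policy.startswith("jump:"):
                target = policy[len("jump:"):]
                # Unknown targets map to -1 and are rejected as well.
                if index.get(target, -1) <= position:
                    raise ValueError(
                        f"{stage.name}: jump target '{target}' must be a later stage"
                    )
```

Checking the jumps once, before evaluation starts, is enough to guarantee that no stage is evaluated more than once, because every jump strictly increases the position in the stage list.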

Case study

We present some of the courses that might use ReCodEx to evaluate homework assignments and outline the setup of the evaluation with respect to the concept of stages.

Simple programming exercises

This category covers introductory programming courses such as Programming I or Java programming.

In the simplest case, we only need one stage that builds the program and passes the test inputs to its standard input. We will use the C language for this example. The build command is gcc source.c and the test command is ./a.out.
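
A single stage of this kind could be evaluated roughly as follows. This is a minimal Python sketch under stated assumptions: the function `run_stage` and its parameters are hypothetical, commands are given as argument lists, tests are in-memory pairs of input and expected output, and the comparison is an exact byte match. Resource limits and logging are left out.

```python
import subprocess


def run_stage(build_command, test_command, tests, cwd="."):
    """Build once, then feed every test input to the program's standard input.

    `tests` is a list of (input_bytes, expected_bytes) pairs.
    Returns one bool per test, True where the output matched exactly.
    """
    if build_command:
        # check=True raises CalledProcessError if the build fails.
        subprocess.run(build_command, cwd=cwd, check=True)
    results = []
    for test_input, expected in tests:
        proc = subprocess.run(
            test_command, cwd=cwd, input=test_input, stdout=subprocess.PIPE
        )
        results.append(proc.stdout == expected)
    return results
```

For the C example, the call would be `run_stage(["gcc", "source.c"], ["./a.out"], tests)`, executed in a directory that contains the submitted source.c.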

Compiler principles

This course uses multiple tools in a pipeline-like fashion - for example flex and bison.

We create a stage for each of the steps of this pipeline - we run flex and test the output, then we run bison and do the same.

XML technologies

In this course, students choose a topic they model using XML - for example a library or a bulletin board. During the semester, they expand this project by adding XSLT transformations, XQuery scripts, XPath queries, etc. These are tested against fixed requirements (e.g. using some particular language constructs).

This course already has a rather sophisticated application for testing homework assignments, so we only include it for demonstration purposes.

Because every assignment focuses on a different technology, we would need a new type of stage for each one. These stages would only run some checker programs against the submitted sources (and possibly try to check their syntax etc.).

Non-procedural programming

This course differs from other programming courses because it teaches input/output manipulation only toward the end of the semester. In their assignments, students are mostly required to write a function or predicate that behaves according to a specification (e.g. appends an item to the end of a list).

Due to this, we need to take the function submitted by a student and combine it with a snippet of code that reads the standard input and calls the submitted function. This could be achieved by setting the build command.
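
One way such a build step could work is to concatenate the submitted source with a driver file supplied by the course. The Python sketch below is only an illustration: the function `assemble_program` and the file roles are assumptions, and a real build command might do the combination differently (for example through the language's own import mechanism).

```python
from pathlib import Path


def assemble_program(submission: Path, driver: Path, output: Path) -> None:
    """Write the student's source followed by the course-provided driver.

    The driver is expected to read the standard input and call the
    submitted function or predicate.
    """
    parts = [submission.read_text(), driver.read_text()]
    # Normalize trailing newlines so the two parts are separated by exactly one.
    combined = "\n".join(part.rstrip("\n") for part in parts) + "\n"
    output.write_text(combined)
```

The build command of the stage would then invoke a script like this before compiling or interpreting the combined file.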

Operating systems

The operating systems course requires students to work on a simple OS kernel that is then run in a MIPS simulator called msim. There are various tests that check if the student's implementation of core OS mechanisms is correct. These tests are compiled into the kernel.

Each of these tests could be represented by a stage that compiles the kernel with the test and then runs it against different configurations of msim.
