Assignments
Assignments are programming tasks that can be tested by a worker after a user submits their solution.
Configuration format
An assignment is described by a YAML file that contains information on how to build, run and test it. The testing process is divided into stages.
Tests
A test is a pair of files: the first specifies an input and the second the expected output. Tests can be organized into groups, which makes them easier to manage and facilitates more complex grading scenarios (e.g. "An assignment passes only if it passes all tests in group X").
Tests are run by a special program called a judge, which compares the output of the program to the expected output. By setting a judge, we can specify how strict the testing is - for example, some assignments require the solution to output exactly the same bytes as expected. Others permit any number of whitespace characters between words of the output.
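The two levels of strictness mentioned above can be sketched as follows. This is only an illustration: the function names are hypothetical, and a real judge would be a separate program rather than a Python function.

```python
def strict_judge(actual: bytes, expected: bytes) -> bool:
    """Pass only if the output is byte-for-byte identical to the expected output."""
    return actual == expected


def whitespace_tolerant_judge(actual: bytes, expected: bytes) -> bool:
    """Pass if both outputs contain the same words in the same order.

    bytes.split() without arguments splits on runs of ASCII whitespace,
    so any amount of whitespace between words is accepted.
    """
    return actual.split() == expected.split()
```

An output such as `b"1  2\n"` would fail the strict judge against an expected `b"1 2\n"`, but pass the whitespace-tolerant one.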
Stages
A stage is a logical unit of the testing process. It specifies how to perform one step of the build process and how to test whether the result behaves correctly. Its configuration contains the following:
- Name - a unique string identifier of the stage
- Build command (optional) - used to prepare the submitted files for this test stage
- Test list - specifies the tests (or test groups) to be run during this stage
- Test command - used to run one specific test
- Test input policy - how to pass the test input to the program?
  - redirect it to its standard input
  - pass the name of the input file as an argument
- Judge - which judge should be used to evaluate the solution's output? Custom judges can be supplied with the assignment.
- Error policy (optional) - what should we do when a test fails?
  - interrupt the stage (default)
  - continue with the next test
  - jump to another stage (TODO cycle detection?)
- Success policy (optional) - what to do when all tests pass?
  - jump to another stage (the next one by default)
  - end the evaluation, even if there are still unprocessed stages
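To illustrate how the error and success policies could interact, here is a minimal sketch of a stage loop. All field names and policy strings are hypothetical and do not reflect the actual YAML keys. It also makes two assumptions the text above leaves open: evaluation stops after a stage with failed tests unless the error policy jumps elsewhere, and a jump to an already visited stage is rejected as a cycle.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Stage:
    name: str
    tests: List[str]
    error_policy: str = "interrupt"       # "interrupt", "continue" or "jump" (hypothetical names)
    error_target: Optional[str] = None    # stage to jump to when error_policy == "jump"
    success_target: Optional[str] = None  # None means "the next stage"
    end_on_success: bool = False          # end the evaluation once all tests pass


def evaluate(stages: List[Stage], run_test: Callable[[Stage, str], bool]) -> List[str]:
    """Walk through the stages and return the names of the stages that were run."""
    index = {stage.name: i for i, stage in enumerate(stages)}
    visited: List[str] = []
    pos = 0
    while pos < len(stages):
        stage = stages[pos]
        if stage.name in visited:
            # Assumption: revisiting a stage is treated as a cycle.
            raise RuntimeError(f"cycle detected at stage {stage.name!r}")
        visited.append(stage.name)

        failed = False
        for test in stage.tests:
            if run_test(stage, test):
                continue
            failed = True
            if stage.error_policy != "continue":
                break  # "interrupt" and "jump" both stop running further tests

        if failed:
            if stage.error_policy == "jump":
                pos = index[stage.error_target]
                continue
            break  # Assumption: a failed stage ends the evaluation.
        if stage.end_on_success:
            break
        pos = index[stage.success_target] if stage.success_target else pos + 1
    return visited
```

Here `run_test` stands in for running the test command and consulting the judge; it returns `True` when the test passes.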
Case study
We present some of the courses that might use ReCodEx to evaluate homework assignments and outline the setup of the evaluation with respect to the concept of stages.
Simple programming exercises
This category covers introductory programming courses such as Programming I or Programming in Java.
In the simplest case we only need one stage that builds the program and passes the test inputs to its standard input. We will use the C language for this example. The build command is gcc source.c, the test command is ./a.out.
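How a worker might run a single test under either test input policy can be sketched with Python's subprocess module. The function name and the policy strings "stdin" and "argument" are made up for this illustration.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def run_test(command: List[str], input_file: Path, policy: str) -> bytes:
    """Run the test command on one input file and return its standard output.

    policy is a hypothetical name for the test input policy:
    "stdin" redirects the file to the program's standard input,
    "argument" appends the file name to the command line.
    """
    if policy == "stdin":
        with input_file.open("rb") as stdin:
            result = subprocess.run(command, stdin=stdin, stdout=subprocess.PIPE)
    elif policy == "argument":
        result = subprocess.run(command + [str(input_file)], stdout=subprocess.PIPE)
    else:
        raise ValueError(f"unknown input policy: {policy}")
    return result.stdout
```

For the C example, the call would be along the lines of `run_test(["./a.out"], Path("test1.in"), "stdin")` after the build command has produced a.out; the input file name is hypothetical. The returned bytes would then be handed to the judge together with the expected output.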
Compiler principles
This course uses multiple tools in a pipeline-like fashion - for example flex and bison.
We create a stage for each of the steps of this pipeline - we run flex and test the output, then we run bison and do the same.
XML technologies
In this course, students choose a topic they model using XML - for example a library or a bulletin board. During the semester, they expand this project by adding XSLT transformations, XQuery scripts, XPath queries, etc. These are tested against fixed requirements (e.g. using some particular language constructs).
This course already has a rather sophisticated application for testing homework assignments, so we only include it for demonstration purposes.
Because every assignment focuses on a different technology, we would need a new type of stage for each one. These stages would only run some checker programs against the submitted sources (and possibly try to check their syntax etc.).
Non-procedural programming
This course is different from other programming courses, because it does not teach input/output manipulation until the end of the semester. In their assignments, students are mostly required to write a function/predicate that behaves according to a specification (e.g. appends an item at the end of a list).
Due to this, we need to take the function submitted by a student and combine it with a snippet of code that reads the standard input and calls the submitted function. This could be achieved by setting the build command.
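One way such a build step could work is to concatenate the submitted source with a driver snippet shipped with the assignment. The sketch below assumes that plain concatenation yields a valid program in the target language; the function name and file names are hypothetical.

```python
from pathlib import Path


def combine_with_driver(submission: Path, driver: Path, output: Path) -> None:
    """Write the submitted definitions followed by the driver snippet into one source file.

    The driver is assumed to read the standard input and call the
    function/predicate defined in the submission.
    """
    parts = [submission.read_text(), driver.read_text()]
    output.write_text("\n".join(part.rstrip("\n") for part in parts) + "\n")
```

The build command of the stage would then invoke a script like this before the test command runs the combined file.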
Operating systems
The operating systems course requires students to work on a simple OS kernel that is then run in a MIPS simulator called msim. There are various tests that check if the student's implementation of core OS mechanisms is correct. These tests are compiled into the kernel.
Each of these tests could be represented by a stage that compiles the kernel with the test and then runs it against different configurations of msim.