Competition Track

Functional Errors

This track focuses on fixing functional correctness errors in real-world software programs. Repair tools will fix bugs exposed by a failing test case, accompanied by a set of passing test cases, from real-world applications.

The competition will be run on the Cerberus platform for program repair. All repair tools will be executed in Docker containers on machines at the National University of Singapore. The server machines have the specifications listed under Specification below.

The competition platform provides the repair tools with parameters and meta-data when they are started for a repair task. To participate in APR-Comp, a repair tool needs to be integrated into the Cerberus platform under the directory app/drivers/tools/repair via a pull request.
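As an illustration, the sketch below shows roughly what such a driver might look like. The class, method, and attribute names (AbstractRepairTool, run_repair, self.dir_expr, self.dir_output, run_command) are assumptions modelled on existing Cerberus drivers, not a definitive interface; consult the drivers already present under app/drivers/tools/repair for the actual API.

    # Hypothetical driver sketch; class, method, and attribute names are assumptions
    # modelled on existing Cerberus drivers -- check the drivers already in
    # app/drivers/tools/repair for the real interface.
    import os

    from app.drivers.tools.repair.AbstractRepairTool import AbstractRepairTool


    class MyRepairTool(AbstractRepairTool):
        def __init__(self):
            super().__init__(self.__class__.__name__)
            # Docker image that packages the tool (hypothetical name).
            self.image_name = "myorg/my-repair-tool:latest"

        def run_repair(self, bug_info, repair_config_info):
            # Cerberus passes the task parameters and meta-data in bug_info /
            # repair_config_info; build the tool's command line from them.
            super().run_repair(bug_info, repair_config_info)
            patch_dir = os.path.join(self.dir_output, "patches")
            repair_command = (
                f"my-repair-tool --src {self.dir_expr} "
                f"--failing-test {bug_info.get('failing_test', '')} "
                f"--out {patch_dir}"
            )
            status = self.run_command(repair_command, log_file_path=self.log_output_path)
            self.process_status(status)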

Specification

Each repair task will be executed in a separate container created from the author-provided Docker image. For each track, the following resource limits will be enforced on the running container.


    Processor: Intel(R) Xeon(R) Platinum 8468V CPU @ 2.40GHz
    Cache: 97.5 MB
    Number of Cores: 8
    Memory: 64 GB
    GPU: 2 x NVIDIA A40 GPU (48 GB GDDR6 RAM)
    Time Duration: 1 hour
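For illustration only, the sketch below shows one way such limits could be placed on a container using the Docker SDK for Python. The image name, entry point, and exact values are assumptions; the competition harness itself is operated by the organizers and may enforce the limits differently.

    # Illustrative only: applying per-task limits with the Docker SDK for Python.
    # Image name and entry point are hypothetical; values mirror the list above.
    import docker

    client = docker.from_env()
    container = client.containers.run(
        "example/repair-tool:latest",      # hypothetical author-provided image
        command="run-repair-task",         # hypothetical entry point
        cpuset_cpus="0-7",                 # 8 cores
        mem_limit="64g",                   # 64 GB memory
        device_requests=[docker.types.DeviceRequest(count=2, capabilities=[["gpu"]])],
        detach=True,
    )
    try:
        container.wait(timeout=3600)       # 1 hour time budget
    except Exception:
        container.kill()                   # stop the task once the budget is exceeded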

Setup

Each track will contain N repair tasks; for each repair task the tool will be invoked once, with the specified time and resource constraints applied. The exact value of N will differ between tracks but will be consistent across all sub-tracks of a track.

Expected Output

The repair tool should generate at most 5 patch files in unified diff format for the provided source code. Each patch will be applied to the original source code; patches that fail to apply cleanly will be considered invalid. Please ensure your repair driver places the generated patches in a subdirectory named patches within the self.output directory.
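For illustration, a driver might gather its candidate patches as in the sketch below. The function name and arguments are hypothetical; only the patches subdirectory and the 5-patch limit come from the description above.

    # Hypothetical helper for placing patches; only the "patches" subdirectory and
    # the limit of 5 unified-diff files come from the Expected Output description.
    import os
    import shutil


    def collect_patches(output_dir, candidate_patch_files):
        """Copy at most 5 unified-diff patches into <output_dir>/patches."""
        patches_dir = os.path.join(output_dir, "patches")
        os.makedirs(patches_dir, exist_ok=True)
        for index, patch_path in enumerate(candidate_patch_files[:5], start=1):
            # Each file should be a unified diff (e.g. produced with `diff -u` or
            # `git diff`) that applies cleanly to the original source tree.
            shutil.copy(patch_path, os.path.join(patches_dir, f"patch_{index}.diff"))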

Sub Tracks

The first edition of APR-Comp will target logical error repair in the two most widely supported programming languages: C and Java.

C Track
Benchmark programs will be selected from ManyBugs and DBGBench.
Demo Benchmark: GitHub Repo
Time Duration: 1 hour
Input to Repair: source code, test-suite (at least one failing test case)

Java Track
Benchmark programs will be selected from Defects4J, Bears and Bugs.Jar.
Demo Benchmark: GitHub Repo
Time Duration: 1 hour
Input to Repair: source code, test-suite (at least one failing test case)

Timeout

If a tool becomes unresponsive after the specified timeout (i.e. it hangs), the tool is killed after 5 minutes of additional wall-clock time, and the resulting runtime is recorded as the specified timeout plus 5 minutes.
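The sketch below illustrates this rule with a simple subprocess-based harness; it is an assumption-level illustration, not the organizers' actual infrastructure, and the tool command is hypothetical.

    # Illustration of the timeout rule, assuming a simple subprocess-based harness
    # (hypothetical; not the organizers' implementation).
    import subprocess

    SPECIFIED_TIMEOUT = 60 * 60   # 1 hour, in seconds
    GRACE_PERIOD = 5 * 60         # extra 5 minutes before a hung tool is killed

    proc = subprocess.Popen(["my-repair-tool", "--task", "task-001"])  # hypothetical command
    try:
        proc.wait(timeout=SPECIFIED_TIMEOUT)
    except subprocess.TimeoutExpired:
        proc.terminate()                       # ask the tool to stop at the timeout
        try:
            proc.wait(timeout=GRACE_PERIOD)    # give it up to 5 more minutes of wall time
        except subprocess.TimeoutExpired:
            proc.kill()                        # unresponsive tool is force-killed
            # the recorded runtime becomes: specified timeout + 5 minutes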

Disclaimer

The organizers reserve the right to make software, hardware and configuration changes before the competition. Please exercise your tools on a few benchmarks to ensure that they run successfully!