Artifact Evaluation — Reviewer Instructions

First, please carefully read the author instructions. In your evaluation of an artifact, make sure that it adheres to the instructions for authors. In this document, we emphasize certain points that are especially relevant to reviewers.

Timeline

There will be two rounds of reviews: one for tool-paper artifacts and a later one for regular-paper artifacts. We expect that most reviewers will be assigned artifacts from both categories and will therefore need to review artifacts in two separate periods.

Tool paper artifacts:

  • Artifact submission deadline for tool papers: May 9
  • Artifact bidding due: May 9 (note that to bid for an artifact, it is enough to look at the paper)
  • Smoke test reviews due: May 16
  • Possible fixed artifacts due: May 21
  • [AEC] Artifact reviews due: June 11
  • [PC] Post-response discussion: June 15 – June 23
  • Author notification: June 26

Regular paper artifacts:

  • Artifact submission for regular papers: July 2
  • Bidding due: July 7 (note that to bid for an artifact, it is enough to look at the paper)
  • Smoke test reviews due: July 15
  • Possible fixed artifacts due: July 18
  • Artifact reviews due: August 8
  • Artifact notification: August 17
  • Camera-ready version: August 22

Smoke Test

  • The goal of the smoke test is to avoid having good artifacts rejected for bad reasons, such as minor technical issues or limited resources.
  • This is an opportunity for the authors to fix very minor issues with their artifacts.
  • There is no planned interaction with the authors after the smoke test period, so please verify by the end of this period that the artifact can indeed be evaluated.
  • Please do not use the smoke test period to check that the artifact corresponds to the results in the paper; focus instead on whether it is technically possible to run the artifact.

Evaluation Process

  • Be prepared: make sure in advance that you have access to a machine that can run the VM and Docker (a minimal environment check is sketched after this list).
  • Allocate enough time for the evaluation: some artifacts take a long time to run and reproduce, so leave enough of a buffer. If you identify any potential issues, please contact us as soon as possible.
  • Pay special attention to the differences between the three evaluation criteria from the instructions for authors: functional means you can reproduce the results of the paper, while reusable means you can use the tool as a standalone, independently of the provided scripts and of the rest of the artifact. So, for reusability, try to use the tool on a new input, or on a variant of an input from the artifact; the documentation of the submitted tool should include instructions for using it as a standalone. For availability, simply verify that the artifact is available online in a complete manner.
  • You are expected to use the scripts and README from the authors. For scripts, please also briefly look at the scripts themselves to make sure that they do what you expect them to do.
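
As a rough pre-check before the evaluation period starts, the following Python sketch verifies that the docker CLI is available and can actually run a container. It is only an illustration, not part of the official process; the hello-world image and the timeout value are arbitrary choices.

# Minimal sketch: check that Docker is installed and can run containers.
# The image name ("hello-world") and the timeout are illustrative only.
import shutil
import subprocess
import sys

def docker_ready() -> bool:
    """Return True if the docker CLI is on PATH and can run a container."""
    if shutil.which("docker") is None:
        print("docker CLI not found on PATH")
        return False
    try:
        # Run a tiny container; a failure here usually means the daemon
        # is unreachable or the current user lacks permissions.
        subprocess.run(
            ["docker", "run", "--rm", "hello-world"],
            check=True,
            capture_output=True,
            timeout=120,
        )
        return True
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired) as exc:
        print(f"docker test run failed: {exc}")
        return False

if __name__ == "__main__":
    sys.exit(0 if docker_ready() else 1)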

Your Review

While we do not ask for any specific structure of your reviews, we do ask that you keep in mind the following considerations:

  • Your review should be constructive and respectful. The idea is that the authors of the artifact will be able to improve their future artifacts based on your feedback.
  • If something was done in a way that exceeded your expectations, don’t hesitate to mention it.
  • Evaluate the artifact itself carefully: we do not want to accept artifacts that are not of high enough quality, but we also do not want to reject great artifacts. Check everything thoroughly before making your recommendation.
  • Before you decide that a part of the artifact is not reproducible, make sure you’ve carefully read the README submitted by the authors.
  • On the other hand, try not to fill in missing details yourself — the README submitted by the authors should be self-contained and detailed enough for any computer-science professional, without assuming any particular expertise beyond that.
  • Check trends, not numbers: it is expected that you will not get exactly the same results as reported in the paper, because of differences between your machine and the authors'. Therefore, focus your attention on trends rather than on exact numbers: for example, it is fine if the paper reports that tool A runs in 3 seconds and tool B in 6 seconds, while on your machine the numbers are around 5 and 10 (see the sketch after this list).
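
As a rough illustration only, using the hypothetical numbers from the example above, such a trend check amounts to comparing relative ratios rather than absolute times:

# Minimal sketch of a "trends, not numbers" check: compare the relative
# speedup reported in the paper with the one observed on your machine.
# The concrete numbers below are the hypothetical ones from the example
# in these instructions (3s vs. 6s in the paper, ~5s vs. ~10s locally).

paper_times = {"tool_A": 3.0, "tool_B": 6.0}   # seconds, as reported
local_times = {"tool_A": 5.0, "tool_B": 10.0}  # seconds, as measured

paper_ratio = paper_times["tool_B"] / paper_times["tool_A"]
local_ratio = local_times["tool_B"] / local_times["tool_A"]

# Absolute times differ, but the trend (tool A roughly twice as fast
# as tool B) is preserved, which is what the evaluation should check.
print(f"paper: B/A = {paper_ratio:.1f}, local: B/A = {local_ratio:.1f}")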

For Questions

Please try to identify any potential problems as early as you can. If you have any questions, feel free to contact the Artifact Evaluation Committee chairs.

  • Martin Jonáš, Masaryk University (martin.jonas@mail.muni.cz)
  • Mathias Preiner, Stanford University (preiner@cs.stanford.edu)
  • Yoni Zohar, Bar-Ilan University (yoni.zohar@biu.ac.il)