ACM SIGMOD/PODS Conference: Vancouver, 2008
SIGMOD: Experimental Repeatability Requirements FAQ
- The requirement to repeat the experiment is a
penalty on faculty, given that the industry gets
a blank check (and some industry people say
that part of their work will always be in this
situation).
Response: There are many differences between academia and industry -- teaching, resources etc. Another difference is that IP constraints may prevent industry people from giving away source code. This does not mean their work does not need to be repeatable. Instead, industrial researchers may be able to use other means to make their experiments repeatable: give away restricted executables or small samples of anonymized data, provide detailed algorithm descriptions etc. A working group is being set up to establish acceptable procedures for industry submissions to future SIGMOD conferences.
Repeatability is not a wedge concern that gives part of the community an advantage over the rest. It is a general goal for any scientific activity. Some experiments will always be more difficult to repeat than others (for instance, long-running experiments over a period of 1 year will never be repeatable in the time frame of a SIGMOD submission). Repeatable experiments have more value in terms of insight gained.
- Even if people do provide their code for testing,
(or archiving), comparing their code
implementing algorithm A with your code
implementing algorithm B gives no reasonable
conclusion, because the implementation quality
is not the same (if the sub-routines are
not the same, the language is not the same etc.)
Response:
Running algorithm A coded by somebody else
may give many insights other than its running time:
its results (e.g. a ranking algorithm), the size of its
result (e.g. a compression algorithm), its overall
data or query complexity, its behavior on inputs
not considered by the authors of the original paper
etc.
One should also keep in mind that response time is hardly the only (or the best) metric in many cases. Often it makes more sense to count I/Os, cache misses or other features of the algorithm, and to back everything up with sound analytical models.
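As an illustration of counting I/Os rather than timing, the sketch below compares two lookup strategies by the number of page reads they issue. It is a minimal example under stated assumptions, not part of the SIGMOD process: `CountingPageStore`, `build_store`, `scan_lookup`, `binary_lookup` and `count_reads` are hypothetical names, and a "page" is simply a sorted list of integers held in memory.

```python
from dataclasses import dataclass


@dataclass
class CountingPageStore:
    """Hypothetical in-memory page store that counts logical page reads."""
    pages: list      # each page is a non-empty sorted list of integer keys
    reads: int = 0   # number of page reads issued so far

    def read_page(self, page_no):
        self.reads += 1
        return self.pages[page_no]


def build_store(keys, page_size):
    """Sort the keys and cut them into fixed-size pages."""
    ordered = sorted(keys)
    pages = [ordered[i:i + page_size] for i in range(0, len(ordered), page_size)]
    return CountingPageStore(pages)


def scan_lookup(store, key):
    """Read pages in order until the key is found."""
    for page_no in range(len(store.pages)):
        if key in store.read_page(page_no):
            return True
    return False


def binary_lookup(store, key):
    """Binary search over pages, using each page's first and last key."""
    lo, hi = 0, len(store.pages) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        page = store.read_page(mid)
        if key < page[0]:
            hi = mid - 1
        elif key > page[-1]:
            lo = mid + 1
        else:
            return key in page
    return False


def count_reads(lookup, keys, page_size, probe):
    """Run one lookup on a fresh store and report (found, page reads)."""
    store = build_store(keys, page_size)
    found = lookup(store, probe)
    return found, store.reads
```

Calling `count_reads(scan_lookup, range(1000), 10, 999)` and `count_reads(binary_lookup, range(1000), 10, 999)` yields read counts that do not depend on the machine, the language or the quality of the implementation, which is what makes such a metric easier to compare across groups than wall-clock time.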
Finally, if the authors of algorithm B want to compare the running time of their implementation with that of the original implementation of algorithm A, the archive rules will require them to contact the authors of algorithm A to establish whether the comparison is relevant, and/or to make their own judgement. Reviewers (the normal conference PC) will then form their own opinion. This situation is not new. For example, many recent works have shown that their engine is faster than a free engine, the latter being known for its completeness, not for being optimized. Reviewers do reject comparisons they find unfair.
- Rather than giving reviewers access to
the authors' code, why don't we ask authors to
provide all information needed in order
to re-implement their code?
This serves for comparison among successive
works, not to check that the authors of the first
proposed version actually implemented
the approach.
For instance, we could add appendices to the
regular paper submissions, where authors would
provide all necessary detail so that a graduate
student can implement it again relatively fast.
Response: Detailed pseudo-code is always helpful. For SIGMOD 2008, the research paper submission format remains 12 pages. However, more documentation (e.g. an appendix with a detailed algorithmic description, which reviewers are not obliged to read) could also be archived for accepted papers. Initially these appendices should be sent to the repeatability FTP site.
Repeated experiments still remain the gold standard of confidence.
- Complying with the requirement
is too hard.
Response: Repeatability is fundamental to science, and is easier in our science than in most. However, sometimes it is not easy to achieve. We will work to make it easier over time. The SIGMOD repeatability evaluation process merely requires a running interface (shell scripts are OK), not complete GUIs. A large majority of the research done in our community is not extremely hardware- or OS-specific. (Notable exceptions exist, such as work on new hardware. If free or commercial simulators of the special hardware you are working with exist, provide pointers to these simulators on the FTP site. If none is available, specify this in a comment on the FTP site. The unavailability of special hardware, or of a simulator thereof, to the repeatability committee is one of the legitimate reasons why some experiments may not be repeated during the SIGMOD evaluation process.)
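To give a sense of how little a "running interface" needs to be, here is a minimal sketch of a command-line entry point, written in Python rather than shell. Everything in it is an assumption made for illustration: the flags `--n`, `--seed` and `--out`, the function `run_experiment`, and the placeholder workload are hypothetical and are not prescribed by the SIGMOD process.

```python
import argparse
import csv
import random
import sys


def run_experiment(n, seed):
    """Placeholder workload: sort n pseudo-random integers (n >= 1)."""
    rng = random.Random(seed)  # fixed seed so every run sees the same input
    data = [rng.randrange(1_000_000) for _ in range(n)]
    result = sorted(data)
    return {"n": n, "seed": seed, "min": result[0], "max": result[-1]}


def main(argv=None):
    parser = argparse.ArgumentParser(description="Re-run one experiment.")
    parser.add_argument("--n", type=int, default=1000, help="input size")
    parser.add_argument("--seed", type=int, default=0, help="random seed")
    parser.add_argument("--out", default="-", help="CSV path, or - for stdout")
    args = parser.parse_args(argv)

    row = run_experiment(args.n, args.seed)
    out = sys.stdout if args.out == "-" else open(args.out, "w", newline="")
    try:
        writer = csv.DictWriter(out, fieldnames=list(row))
        writer.writeheader()
        writer.writerow(row)
    finally:
        if out is not sys.stdout:
            out.close()
    return row


if __name__ == "__main__":
    main()
```

Saved under a hypothetical name such as `run_experiment.py`, this could be invoked as `python run_experiment.py --n 5000 --seed 42 --out results.csv`. It depends only on the standard library, which keeps it portable across the operating systems a committee is likely to use.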
When possible, writing portable code with few dependencies will increase its scientific value.
- What if my data is proprietary?
Response: One solution is to use proprietary data for some experiments and synthetic data for others. The synthetic data can then be used by other groups. As for the software, if you can find out before your submission whether a demo version can be obtained from some vendor, then you should include that information in your README.
- I come from industry and our lawyers
forbid us from sending anything out before
it's thoroughly scrubbed.
What should I do?
Response: You should claim an Intellectual Property exemption in the explanation you provide on the code Web site, together with a justification of the form "I plan to commercialize this or my company intends to commercialize this." We don't need to know who you are at this point.
- Do I give up my rights to software by
sending it in?
Response: No. Not at all. Our only purpose is to test it. This does not constitute a public release at all.
- Giving away code makes it easier for
people to steal your work.
Response: Code given for repeatability assessment will not be archived or re-used without the authors' permission. If some code is archived and re-used, the original authors' work will be cited.
- Assume I don't want to put my code into an
archive. Then, what do I stand to gain by providing
code to the SIGMOD 2008 evaluation? Why
shouldn't I just write "it is too difficult" in the
explanation file?
Response: There are two benefits to you.
- You enjoy the ability to put a sentence in the final publication indicating that your code has been found to be repeatable. Some colleagues consider this a stamp of scientific maturity.
- We will provide you with feedback. This may help you in the future if you must or choose to produce repeatable code.
- Assume I don't want to put my code into an archive.
Then, what is the purpose of asking for my code in a future
(e.g. SIGMOD 2009) evaluation? I do not like to be
considered guilty of fraud unless I can prove otherwise.
Response: Repeatability is fundamental to every science. This has been the case for natural scientists since Francis Bacon. As our field matures, articles in a prestigious venue like SIGMOD should, when possible, obtain added value by having their experiments repeated.
Questions about any of the requirements or reviewing process listed on this page should be directed to the SIGMOD 2008 Program Committee Chair Dennis Shasha (shasha@cs.nyu.edu).