Abstract
Jupyter notebooks (documents that contain live code, equations, visualizations, and narrative text) are now among the most popular means to compute, present, discuss, and disseminate scientific findings. In principle, Jupyter notebooks should make it easy to reproduce and extend scientific computations and their findings; in practice, this is not the case. The individual code cells in Jupyter notebooks can be executed in any order, with identifier usages preceding their definitions and results preceding their computations. In a sample of 936 published notebooks that would be executable in principle, we found that 73% of them would not be reproducible with straightforward approaches, requiring humans to infer (and often guess) the order in which the authors created the cells. In this paper, we present an approach to (1) automatically satisfy dependencies between code cells to reconstruct possible execution orders of the cells; and (2) instrument code cells to mitigate the impact of non-reproducible statements (i.e., random functions) in Jupyter notebooks. Our Osiris prototype takes a notebook as input and outputs the possible execution schemes that reproduce the exact notebook results. In our sample, Osiris was able to reconstruct such schemes for 82.23% of all executable notebooks, which is more than three times better than the state of the art; the resulting reordered code is valid program code and thus available for further testing and analysis.
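The two ideas in the abstract, ordering cells by their dependencies and neutralizing random statements, can be illustrated with a minimal Python sketch. This is not the Osiris implementation: the helper names (`names_defined_and_used`, `execution_order`, `with_fixed_seed`) are hypothetical, the analysis only looks at identifier names via the standard `ast` module, and seeding the generator is one assumed way to instrument a cell. It requires Python 3.9 or later for `graphlib`.

```python
import ast
import builtins
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def names_defined_and_used(source):
    """Return (defined, used) identifier sets for one cell (rough approximation)."""
    defined, used, local = set(), set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            # Store context means assignment; anything else counts as a use.
            (defined if isinstance(node.ctx, ast.Store) else used).add(node.id)
        elif isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            defined.add(node.name)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                defined.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, ast.arg):
            local.add(node.arg)  # function parameters are not cell dependencies
    # Simplification: a name both defined and used in a cell is treated as local.
    return defined, used - defined - local - set(dir(builtins))


def execution_order(cells):
    """Return cell indices ordered so that definitions precede their usages."""
    info = [names_defined_and_used(src) for src in cells]
    graph = {i: set() for i in range(len(cells))}
    for i, (_, used) in enumerate(info):
        for j, (defined, _) in enumerate(info):
            if i != j and used & defined:
                graph[i].add(j)  # cell i must run after cell j
    # Raises graphlib.CycleError if the cells depend on each other cyclically.
    return list(TopologicalSorter(graph).static_order())


def with_fixed_seed(source, seed=0):
    """Assumed instrumentation: seed the random generator before the cell runs."""
    return f"import random\nrandom.seed({seed})\n{source}"


cells = ["print(total)", "total = sum(values)", "values = [1, 2, 3]"]
print(execution_order(cells))  # → [2, 1, 0]
```

A name-based analysis like this ignores reassignment across cells, attribute mutation, and side effects, so it yields at best a candidate order that still has to be checked against the stored notebook outputs.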
Original language | English |
---|---|
Title of host publication | Proceedings - 2020 ACM/IEEE 42nd International Conference on Software Engineering |
Subtitle of host publication | Companion Proceedings, ICSE-Companion 2020 |
Editors | Hyunsook Do, Tien N. Nguyen |
Place of Publication | New York NY USA |
Publisher | IEEE, Institute of Electrical and Electronics Engineers |
Pages | 288-289 |
Number of pages | 2 |
ISBN (Electronic) | 9781450371223 |
Publication status | Published - 2020 |
Event | International Conference on Software Engineering 2020 - Online, Seoul, Korea, Republic of (South); Duration: 27 Jun 2020 → 19 Jul 2020; Conference number: 42nd; Proceedings: https://dl.acm.org/doi/proceedings/10.1145/3377811 ; Website: https://conf.researchr.org/home/icse-2020 |
Publication series
Name | Proceedings - International Conference on Software Engineering |
---|---|
Publisher | The Association for Computing Machinery |
ISSN (Print) | 0270-5257 |
Conference
Conference | International Conference on Software Engineering 2020 |
---|---|
Abbreviated title | ICSE 2020 |
Country/Territory | Korea, Republic of (South) |
City | Seoul |
Period | 27/06/20 → 19/07/20 |
Keywords
- Jupyter Notebooks
- Osiris
- Python
- Reproducibility