Invited Keynote
Prof. Lorena Barba, George Washington University, USA
On undertaking a full replication study of a previous publication by our own research group, we learned new lessons about reproducible computational research. The original study used an in-house code to solve the Navier-Stokes equations for flow around an object on GPU hardware. As is common in CFD applications, it relied on an external library for solving linear systems of equations, in this case NVIDIA's Cusp library. We later rewrote the CFD solver for a distributed-parallel setting on CPU hardware; this version uses the PETSc library for solving linear systems. In addition, we used two open-source CFD libraries: OpenFOAM and IBAMR (from New York University). Beyond the many things that can go wrong with discretization meshes and boundary conditions, we found, for example, that simply using a different version of an external library can lead to different results.

In light of this exercise, we tightened our reproducibility practices even further. Open data and code are the minimum requirement; we also require a fully automated workflow, systematic records of dependencies, environment, and system, and only scripted visualizations (no GUI manipulation). We must also raise awareness of the numerical non-reproducibility of parallel software and include this topic in the training of HPC researchers.
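The root cause of numerical non-reproducibility in parallel software can be shown with a small sketch (an illustration of the general phenomenon, not code from the talk): floating-point addition is not associative, so a parallel reduction that combines partial sums in a different order than a serial loop can yield a slightly different total.

```python
# Illustrative only: simulate how a parallel reduction reorders additions.
import random

random.seed(0)
# Values spanning many orders of magnitude, as in a real simulation.
values = [random.uniform(-1.0, 1.0) * 10.0 ** random.randint(-8, 8)
          for _ in range(100_000)]

# Serial sum: strict left-to-right order.
serial_sum = sum(values)

# Simulated 8-"rank" reduction: each rank sums its own chunk, then the
# partial sums are combined -- a different association of the same additions.
chunks = [values[i::8] for i in range(8)]
parallel_sum = sum(sum(chunk) for chunk in chunks)

print(serial_sum)
print(parallel_sum)
print(abs(serial_sum - parallel_sum))  # typically small but nonzero
```

The two results agree to high relative precision but are generally not bit-identical; in an iterative solver such differences can be amplified, which is why run-to-run and machine-to-machine comparisons of parallel CFD codes need tolerance-based, not bitwise, checks.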