Image credit: Department of Defense. American Forces Information Service. Defense Visual Information Center. 1994. Retrieved from the Digital Public Library of America.
Much of the documentation this guide discusses involves writing detailed instructions, intended for other people as well as other computers, that describe how to recreate the environment in which you originally wrote your software. A container takes these instructions a step further by actually creating a virtual environment in which a user can run the software. When you run software in a container, it can access only the programs, libraries, and data you have placed inside the container. This isolation ensures that other versions of software installed on someone else's computer won't sneak into the workflow you've designed and cause problems.
This is useful for several reasons:
Docker and Singularity are both open source tools for containerizing software. They provide operating-system-level virtualization, which requires fewer computing resources than running an entire virtual machine. Docker is more widely used in the tech industry and consequently has more features and integrations. However, the NYU HPC and most other university-based high performance computing environments require Singularity because of how it handles permissions and security. Luckily, Singularity can open and convert Docker containers.
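The following minimal sketch shows the Docker-to-Singularity conversion. The Python helpers assemble the Singularity commands but do not run them. One command builds a Singularity image file from a Docker image, and the other runs a program inside that image. The helpers assume the standard `singularity build` and `singularity exec` subcommands and the `docker://` source prefix. The image `python:3.11-slim`, the file `analysis.sif`, and the script `analysis.py` are hypothetical placeholders.

```python
import shlex
from typing import List


def singularity_build_cmd(docker_image: str, sif_path: str) -> List[str]:
    """Command that converts a Docker image into a Singularity image file (.sif)."""
    # The docker:// prefix tells Singularity to fetch the image from a Docker registry.
    return ["singularity", "build", sif_path, f"docker://{docker_image}"]


def singularity_exec_cmd(sif_path: str, *program: str) -> List[str]:
    """Command that runs a program inside the converted container."""
    return ["singularity", "exec", sif_path, *program]


if __name__ == "__main__":
    # Hypothetical image, file, and script names, for illustration only.
    build = singularity_build_cmd("python:3.11-slim", "analysis.sif")
    run = singularity_exec_cmd("analysis.sif", "python", "analysis.py")
    # Print shell-ready strings; nothing is executed here.
    print(shlex.join(build))
    print(shlex.join(run))
```

The commands are kept as lists, so you can pass them to `subprocess.run(cmd, check=True)` once you are ready to execute them.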
You can also export a Docker or Singularity container to a file, which you can then use to share or archive your project and its environment.
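For Docker, one way to do this is `docker save`. It writes an image and all of its layers to a single tar archive, which `docker load` can restore on another machine. By contrast, `docker export` flattens a single container's filesystem and drops the image history. A Singularity image is already a single `.sif` file, so you can simply copy it. The sketch below builds the Docker commands and executes them only when `dry_run` is turned off. It assumes a hypothetical image tag `my-project:1.0`.

```python
import subprocess
from typing import List


def docker_save_cmd(image: str, archive_path: str) -> List[str]:
    """Write an image and all of its layers to one tar archive."""
    return ["docker", "save", "-o", archive_path, image]


def docker_load_cmd(archive_path: str) -> List[str]:
    """Restore an image from a tar archive produced by `docker save`."""
    return ["docker", "load", "-i", archive_path]


def run(cmd: List[str], dry_run: bool = True) -> None:
    """Print the command; execute it only when dry_run is False."""
    print(" ".join(cmd))
    if not dry_run:
        # check=True raises CalledProcessError if the command fails.
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical image tag and archive name, for illustration only.
    run(docker_save_cmd("my-project:1.0", "my-project-1.0.tar"))
    run(docker_load_cmd("my-project-1.0.tar"))
```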
Several projects focused on scientific computation and reproducibility allow you to test, run, and display your project online, either using a container or using a well-documented GitHub repository. NYU provides access to several of these services.
We strongly recommend using the NYU-created tool ReproZip to preserve projects at their conclusion.
ReproZip is software developed by the ViDA (Visualization and Data Analysis) Center at NYU. It aims to simplify the process of creating reproducible research from command-line executions. It creates a self-contained package with all the binaries, files, and dependencies required to reproduce the research in the author's computational environment. A reviewer can then unpack the research in their own environment to reproduce the results, even if that environment runs a different operating system from the original one.
ReproZip has two main steps:
The packing step happens in the original environment and generates a compendium of the experiment so as to make it reproducible. ReproZip creates a .rpz file, which contains all the necessary information and components for the experiment.
The unpacking step reproduces the experiment from the .rpz file. ReproUnzip, the unpacking component of ReproZip, offers several ways to unpack an experiment. These range from simply decompressing the files into a directory to starting a full virtual machine, and they can be used interchangeably with the same packed experiment.
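The two steps can be sketched as the command lines below. This minimal Python illustration only assembles the commands. It assumes the `reprozip trace`, `reprozip pack`, and `reprounzip <unpacker> setup`/`run` subcommands as we understand them from the ReproZip documentation. The experiment command `python analysis.py`, the package name `experiment.rpz`, and the target directories are hypothetical. Check `reprozip --help` and `reprounzip --help` on your installation for the exact options.

```python
from typing import List

# Unpackers (assumed from the ReproZip docs) that can all read the same .rpz package,
# from the lightest (a plain directory) to the heaviest (a full Vagrant virtual machine).
UNPACKERS = ("directory", "chroot", "docker", "vagrant")


def pack_cmds(experiment: List[str], rpz_path: str) -> List[List[str]]:
    """Packing step, run in the original environment."""
    return [
        # Run the experiment while ReproZip records the files and binaries it uses.
        ["reprozip", "trace", *experiment],
        # Bundle the recorded trace and files into a single .rpz package.
        ["reprozip", "pack", rpz_path],
    ]


def unpack_cmds(rpz_path: str, target_dir: str, unpacker: str = "directory") -> List[List[str]]:
    """Unpacking step, run by the reviewer with ReproUnzip."""
    if unpacker not in UNPACKERS:
        raise ValueError(f"unknown unpacker: {unpacker!r}")
    return [
        # Prepare the environment (extract files, build an image, or set up a VM).
        ["reprounzip", unpacker, "setup", rpz_path, target_dir],
        # Re-run the packed experiment inside that environment.
        ["reprounzip", unpacker, "run", target_dir],
    ]


if __name__ == "__main__":
    for cmd in pack_cmds(["python", "analysis.py"], "experiment.rpz"):
        print(" ".join(cmd))
    # The same package, unpacked two different ways.
    for unpacker, target in (("directory", "exp-dir"), ("vagrant", "exp-vm")):
        for cmd in unpack_cmds("experiment.rpz", target, unpacker):
            print(" ".join(cmd))
```

Every unpacker reads the same `experiment.rpz`. A reviewer can therefore start with a lightweight unpacker and switch to a heavier one, such as a virtual machine, if their system differs from the author's.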
ReproZip has users across domains, from digital humanities to machine learning. Check out some multi-disciplinary examples and ReproZip video demos on YouTube!