|Mondays:|12pm - 8pm|
|Tuesdays:|12pm - 8pm|
|Wednesdays:|12pm - 6pm|
|Thursdays:|12pm - 6pm|
|Fridays:|12pm - 6pm|
Data Services workstations are available for walk-in use whenever the library stacks are open.
The ultimate goal of data management is the reproducibility of an experiment and the reuse of its results.
Reproducibility of scientific research is imperative: it helps researchers verify results, and it allows others to build on them, advancing the global body of scientific knowledge.
However, as experiments become increasingly complex and digital, researchers often have to rely only on the data described in a paper, or on secondary data if it is supplied. This leaves out information critical to understanding how an experiment was put together: descriptions of column names in tabular data, libraries used in scripts or computational experiments, algorithms used in machine learning, even the software used to view files.
Funders, award-granting institutions, and peer-reviewed journals are beginning to take notice of the general lack of reproducibility plaguing many scientific communities. Websites such as Retraction Watch have sprung up to track which journal articles are being retracted, most of the time because of issues with the data, mainly failures to reproduce it.
By taking proper care of your data throughout its lifecycle, as detailed in this guide, you can avoid the horribly embarrassing fate of getting a paper retracted or research defunded.
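The details that tend to go missing, such as column descriptions and library versions, can be written down next to the data itself. The following is a minimal sketch of one way to do that in Python, not a prescribed method. The function name `build_provenance`, the example columns `site_id` and `temp_c`, and the choice of JSON as the output format are all assumptions made for illustration.

```python
import json
import platform
from importlib import metadata


def build_provenance(column_descriptions, packages):
    """Collect context a reader needs to interpret and rerun an analysis."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment.
            versions[name] = None
    return {
        "python_version": platform.python_version(),
        "operating_system": platform.platform(),
        "package_versions": versions,
        "column_descriptions": column_descriptions,
    }


# Hypothetical columns for an illustrative tabular dataset.
columns = {
    "site_id": "Identifier of the sampling site",
    "temp_c": "Water temperature in degrees Celsius",
}

# List the libraries your own scripts import; "pip" is only a placeholder.
record = build_provenance(columns, packages=["pip"])

# Serialize the record so it can be saved alongside the data files.
text = json.dumps(record, indent=2)
print(text)
```

Saving the printed JSON in the same folder as the dataset gives a later reader the column meanings and software versions without having to reconstruct them from the paper.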
ReproZip is software developed by the ViDA (Visualization and Data Analysis) group at NYU. It is a tool aimed at simplifying the process of creating reproducible research from command-line executions. It creates a self-contained package that has all the binaries, files, and dependencies required to reproduce the research as it ran in the author’s computational environment. A reviewer can then unpack the package in their own environment to reproduce the results, even if that environment has a different operating system from the original one.
ReproZip has two main steps: