Hello! I am Vicky Rampin, the Librarian for Research Data Management and Reproducibility. I am also the liaison to computer science and data science programs at NYU! I am here to help you navigate the resources for both at NYU and beyond. You can set up an appointment with me or always email me at: vs77@nyu.edu.
If you need help with a specific quantitative, GIS, or qualitative software package, you should reach out to Data Services.
You will be working with a LOT of different project files when doing your data science work -- code, data, documentation, presentations, visualizations, articles, and more! So you really have to pay attention to how you store your materials. Even if your data management practices are pristine, your data is still at risk if there are no backups of it or the storage medium isn't reliable. Luckily, we have some resources at NYU and a few good rules of thumb to help you!
NYU ITS has a helpful chart comparing NYU storage options. They include:
NYU Drive for faculty, staff, and students (all-purpose file sharing via Google Apps for Education)
NYU Research Workspace for faculty, staff, and by request, students, designed for fast access to large datasets. You can get access to up to 5TB for free after a consultation.
NYU Box for faculty, staff, and by request, students, geared towards secure data needs
NYU Stream for faculty, staff, and students, specifically for audio, video, and image files with a focus on collaborative editing and linking with NYU Classes
NYU High Performance Computing Backups and Storage for those already using HPC for a project via the /archive data storage
We recommend that backups be saved in open or standard file formats, and not be compressed or encrypted (though sensitive data may require encryption). The UK Data Service also has a nice guide to data backups. Do not use CDs or DVDs as these have been known to fail frequently.
If you are planning on working with sensitive data, you should first review NYU's policy on transmitting and storing sensitive data and NYU's policy on data classification.
There are two options for storing secure data:
To keep data safe, it is recommended that folks follow the 3-2-1 Rule, which suggests you maintain three copies of your data on two different storage types, with one of those being offsite:
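One way to make the rule concrete is to write it down as a small check. The sketch below is only an illustration in Python: the `DataCopy` class, the `satisfies_321` function, and the three example copies are all hypothetical names made up here, not part of any NYU service or tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCopy:
    """One copy of a dataset (hypothetical structure, for illustration only)."""
    label: str      # human-readable name of the copy
    medium: str     # storage type, e.g. "internal disk" or "cloud"
    offsite: bool   # True if the copy lives away from your main location


def satisfies_321(copies):
    """Return True if the copies meet the 3-2-1 rule:
    at least three copies, on at least two storage types,
    with at least one copy offsite."""
    storage_types = {c.medium for c in copies}
    has_offsite = any(c.offsite for c in copies)
    return len(copies) >= 3 and len(storage_types) >= 2 and has_offsite


# Example inventory (assumed, not a recommendation of specific devices)
copies = [
    DataCopy("laptop working folder", "internal disk", offsite=False),
    DataCopy("external hard drive", "external disk", offsite=False),
    DataCopy("cloud storage folder", "cloud", offsite=True),
]

print(satisfies_321(copies))
```

Keeping a short inventory like this alongside your project makes it easier to notice when a copy is missing, for example after you retire an old external drive.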
Both Google Drive and Box have desktop applications (Google Drive for Desktop, Box Drive) where folks can mount and access files quickly. When downloaded and installed, the applications create a folder that appears just like a My Documents folder, only it’s connected to your account on whatever service (so it’s Google Drive or Box in your file explorer). Then it operates like a two-way door: changes are synced in both directions between your local computer and the service in the cloud.
This helps us stick to the 3-2-1 rule pretty nicely as well:
This looks something like this in practice:
During the data cleaning and data analysis phases, it is often necessary to push and pull data from an external storage source efficiently so as to integrate that data into a workflow. The following tools can provide useful ways of doing this: