Hello! I am Vicky Rampin, the Librarian for Research Data Management and Reproducibility. I am also the liaison to the computer science and data science programs at NYU. I am here to help you navigate the resources for both, at NYU and beyond. You can set up an appointment with me or email me any time at vs77@nyu.edu.
If you need help with specific quantitative, GIS, or qualitative software, reach out to Data Services.
NYU's High Speed Research Network department has an alpha version of Kubernetes available for the NYU community to test out. More information here: https://k8s-docs.hsrn.nyu.edu/
At some point in your data science career, you will probably need access to medium- or high-performance computing (HPC) infrastructure. You would use HPC when your data is too large to process on your local machine.
HPC infrastructure consists of clusters (groups of networked computers) to which people submit jobs (scripts). Each job waits in a queue until it reaches the front, then runs. Runs last from minutes to days, depending on the size of the input data. Typically you access a cluster from the command line. A typical workflow is therefore to log in to the cluster, submit a job script, wait for the job to be scheduled and run, and then collect its output.
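To make the job-submission step more concrete, here is a minimal sketch in Python that generates the text of a batch job script. It assumes the cluster uses the Slurm scheduler, which this guide does not state; the job name, the resource values, and the `count_rows.py` script are hypothetical placeholders, so check the HPC documentation for the settings that apply to your cluster.

```python
def build_job_script(job_name, command, hours=1, cpus=1, mem_gb=4):
    """Return the text of a batch script, assuming a Slurm scheduler.

    All resource values here are illustrative, not recommendations.
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --time={hours:02d}:00:00",    # wall-clock limit, HH:MM:SS
        f"#SBATCH --cpus-per-task={cpus}",
        f"#SBATCH --mem={mem_gb}G",
        f"#SBATCH --output={job_name}-%j.out",  # Slurm replaces %j with the job ID
        "",
        command,                                # the work the job actually does
    ]
    return "\n".join(lines) + "\n"


# Hypothetical job: run an analysis script on a file too big for a laptop.
script = build_job_script(
    "count-rows", "python count_rows.py big_file.csv", hours=2
)
print(script)
```

On a Slurm cluster, you would save the printed text to a file (for example `count-rows.sbatch`), submit it with `sbatch count-rows.sbatch`, and check its place in the queue with `squeue -u $USER`.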
NYU Research Technology's High Performance Computing department maintains computing infrastructure that is available to the NYU community for research, teaching, and learning. The HPC team also provides classes (live and online) and support for using the clusters.
Below is a table showing the different compute infrastructure offered. There is also the Secure Research Data Environment service, which is available on a case-by-case basis. This is a custom secure research environment in which you can store and analyze sensitive data.
Cluster | Short info, use cases |
Greene (HPC cluster) | The Greene cluster is named after Greene Street in SoHo, a neighborhood in Lower Manhattan in New York City, near NYU. The cluster also has "green" characteristics: most of its nodes are water cooled, and it is deployed in a power-efficient data center. |
Cloud | There are a few options for those who need cloud infrastructure for their work: |
The HPC team has also put together documentation on how to transfer large amounts of data to and from the NYU HPC: