Hello! I am Vicky Rampin, the Librarian for Research Data Management and Reproducibility. I am also the liaison to the computer science and data science programs at NYU! I am here to help you navigate the resources for both at NYU and beyond. You can set up an appointment with me or email me anytime at vs77@nyu.edu.
If you need help with a specific quantitative, GIS, or qualitative software, you should reach out to Data Services.
Programming is a key activity for all data scientists. The languages data scientists use vary, so one key skill to cultivate is learning how to learn new programming languages. Do you learn best by taking on a project and following a tutorial as you go? Do you prefer to read about a language first, then practice through an open course? Or do you follow YouTube videos? However you learn technology, that process is a skill you'll want to refine as you move through your data science career.
This page lists some resources for learning Python and R, the two most popular programming languages for academic data science work. I also recommend learning how to version control your code as a core data science skill.
NYU Library databases likely to contain relevant resources:
Skillsoft Books is an online collection of computer technology-related ebooks. It contains hundreds of books and videos from respected IT publishers such as MIT Press, Microsoft Press, Osborne/McGraw-Hill, Que, Sams, Sybex, and Wiley. Use it to search for a wide variety of books and videos, ranging from beginner level (e.g., Microsoft Word for beginners) to advanced (e.g., an advanced programming language).
Selected books on the topic (available online):
Projects that use data software such as R, Stata, SPSS, SAS, and MATLAB, as well as programming languages like Python, should follow best practices in writing scripts, do-files, and documentation for the many steps of the data transformation and analysis process. This is part of good internal research data management for individuals and collaborators, and it is also becoming increasingly vital (even mandatory) to meet the demand for reproducibility in data-driven research.
Do | Don't |
---|---|
Comment your script frequently | Embed comments in the middle of a line of code |
Load dependencies (libraries, input data) at the beginning of a script | Undermine readability by skipping indentation, bracketing, and other stylistic conventions |
Follow style conventions appropriate for the language being used | Leave code that no longer does anything in your script |
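To make the practices in the table above more concrete, here is a minimal Python sketch of a small analysis script written with them in mind. It is one possible layout, not a required template: the dataset, the `site` and `value` column names, and the helper functions `load_rows` and `mean_by_site` are all hypothetical, invented for illustration. Imports sit at the top, comments go on their own lines rather than inside statements, the layout follows PEP 8 (Python's common style guide), and no unused code is left behind.

```python
"""Summarize a small dataset of measurements by site.

Illustrative sketch only: the data and column names are hypothetical.
"""

# Load all dependencies at the beginning of the script.
import csv
import io
import statistics

# Hypothetical input data. In a real project you would read a file here,
# with its path declared near the top so collaborators can find it easily.
RAW_DATA = """site,value
A,3.2
A,4.1
B,5.0
B,6.2
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, converting 'value' to float."""
    reader = csv.DictReader(io.StringIO(text))
    return [{"site": row["site"], "value": float(row["value"])} for row in reader]


def mean_by_site(rows):
    """Return a dict mapping each site to the mean of its values."""
    grouped = {}
    for row in rows:
        # Collect every value for a site before averaging.
        grouped.setdefault(row["site"], []).append(row["value"])
    return {site: statistics.mean(values) for site, values in grouped.items()}


if __name__ == "__main__":
    rows = load_rows(RAW_DATA)
    # Print one summary line per site, in a stable (sorted) order.
    for site, mean in sorted(mean_by_site(rows).items()):
        print(f"{site}: {mean:.2f}")
```

Splitting the work into small, documented functions also makes each step easier to test and reuse, which supports the reproducibility goals described above. The same principles carry over to R scripts and Stata do-files, using each language's own style conventions.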