Data Science

A guide with resources for the data science community on campus.

PROGRAMMING

Programming is a key activity for all data scientists. The languages data scientists use vary, so one key skill to cultivate is learning how to learn new programming languages. Do you learn best by taking on a project and learning as you go from a tutorial? Do you prefer to read about a language first, then practice through an open course? Or do you follow YouTube videos? However you learn technology, that is a skill you'll want to refine as you move through your data science career.

This page lists some resources for learning Python and R, the two most popular programming languages for academic data science work. I also recommend learning how to version control your code, which is a core data science skill.

Python Learning Resources

NYU Library databases likely to contain relevant resources:

Selected books on the topic (available online):

R Learning Resources

NYU Library databases likely to contain relevant resources:

Selected books on the topic (available online):

Commenting your code

Projects that use data software such as R, Stata, SPSS, SAS, and Matlab, or programming languages like Python, should follow best practices for writing scripts, do-files, and documentation across the many steps of data transformation and analysis. This is part of good internal research data management for individuals and collaborators, and it is becoming increasingly vital (even mandatory) for meeting the demand for reproducibility in data-driven research.

Do | Don't
Comment your script frequently | Embed ##comments## within a line of code
Load dependencies (libraries, input data) at the beginning of a script | Undermine readability by not using indentation, bracketing, and other stylistic conventions
Follow style conventions appropriate for the language being used | Leave code that doesn't actually do anything anymore in your script
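
As a rough illustration of the practices above, here is a minimal Python sketch. It is one possible layout, not a required template: the function name `mean_score_by_group`, the constant `SAMPLE_CSV`, and the sample data are all invented for this example. It loads its dependencies and input data at the top, puts comments on their own lines, follows common Python (PEP 8) style conventions, and contains no leftover unused code.

```python
"""Summarize scores by group (illustrative script with made-up data)."""

# Load dependencies at the beginning of the script.
import csv
import io
import statistics
from collections import defaultdict

# Input data is also declared up front. This inline sample is hypothetical;
# a real script would more likely define a file path here and read from it.
SAMPLE_CSV = """group,score
control,71
control,65
treatment,80
treatment,84
"""


def mean_score_by_group(csv_text):
    """Return a dict mapping each group name to its mean score."""
    scores = defaultdict(list)

    # Collect every score under its group label.
    for row in csv.DictReader(io.StringIO(csv_text)):
        scores[row["group"]].append(float(row["score"]))

    # Reduce each group's list of scores to a single mean.
    return {group: statistics.mean(values) for group, values in scores.items()}


if __name__ == "__main__":
    # Print the groups in alphabetical order so the output is reproducible.
    for group, mean in sorted(mean_score_by_group(SAMPLE_CSV).items()):
        print(f"{group}: {mean:.1f}")
```

Each comment sits on its own line above the step it describes rather than inside a line of code, and every statement in the script contributes to the result.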