A guide with resources for the data science community on campus.


Hello! I am **Vicky Rampin**, the Librarian for Research Data Management and Reproducibility. I am also the liaison to the computer science and data science programs at NYU! I am here to help you navigate the resources for both, at NYU and beyond. You can set up an appointment with me or email me anytime at vs77@nyu.edu.

If you need help with specific quantitative, GIS, or qualitative software, you should reach out to Data Services.

Original work in this LibGuide is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Programming is a key activity of all data scientists. The languages data scientists use vary, so one key skill to cultivate is to learn how to learn new programming languages. Do you learn best through taking on a project and learning as you go from a tutorial? Or do you prefer to read about the languages, then try through an open course? Or, do you follow YouTube videos? However you learn technology, that is a skill you'll definitely want to refine as you move through your data science career.

This page lists some resources for learning Python and R, the two most popular programming languages for academic data science work. I also recommend learning how to version control your code, which is a core data science skill.

- Python Essential Training: A 5-hour course for beginners that covers syntax, objects, loops, conditionals, regular expressions, functions, and more.
- Master Python for Data Science: A collection of courses that introduce Python and apply it to data analytics.
- Advance Your Skills as a Python Data Expert: A collection of courses covering NumPy, pandas, NLP, and recommenders.

- Codecademy Python: Interactively learn Python through Codecademy.
- Online Python Tutor: A tool to help students understand what happens as the computer executes code.
- Google's Python Class: A nice introduction to Python including both text resources and in-depth videos.
- Python for Non-Programmers: A collection of resources from Python for users who have little-to-no programming experience.
- Tutorialspoint Python Tutorial: A quick introductory guide to using Python.
- Automate the Boring Stuff with Python: Teaches students, administrators, office workers, and anyone else who uses a computer how to write small, practical programs to automate tasks on their computer.
- Think Python: A step-by-step introduction to the practice of programming.
- NumPy for R Users: A cheat sheet for users who are familiar with R but are now using NumPy.

*NYU Library databases likely to contain relevant resources:*

- Ebook Central: Ebook Central is NYU's preferred ebook provider. Users can search, read, highlight, and annotate full-text books in many subject areas, including the social sciences and humanities.
- Skillsoft Books (formerly Books24x7): Skillsoft Books is an online collection of computer technology-related ebooks. It contains hundreds of books and videos from respected IT publishers such as MIT Press, Microsoft Press, Osborne/McGraw-Hill, Que, Sams, Sybex, and Wiley. Use it to search for a wide variety of books and videos, ranging from beginner level to advanced (from Microsoft Word for beginners to an advanced programming language).
- O'Reilly Online Learning: O'Reilly's Safari Books Online provides access to ebooks related to technology, coding, developing, web design, and data visualization. If the database asks for "Sign In" information to access content, please clear your browser cache and cookies and try the link again.

*Selected books on the topic (available online):*

- Python for Data Analysis by Wes McKinney: Python for Data Analysis is concerned with the nuts and bolts of manipulating, processing, cleaning, and crunching data in Python. It is also a practical, modern introduction to scientific computing in Python, tailored for data-intensive applications. This is a book about the parts of the Python language and libraries you'll need to effectively solve a broad set of data analysis problems. This book is not an exposition on analytical methods using Python as the implementation language. Written by Wes McKinney, the main author of the pandas library, this hands-on book is packed with practical case studies. It's ideal for analysts new to Python and for Python programmers new to scientific computing. Use the IPython interactive shell as your primary development environment; learn basic and advanced NumPy (Numerical Python) features; get started with data analysis tools in the pandas library; use high-performance tools to load, clean, transform, merge, and reshape data; create scatter plots and static or interactive visualizations with matplotlib; apply the pandas groupby facility to slice, dice, and summarize datasets; measure data by points in time, whether it's specific instances, fixed periods, or intervals; and learn how to solve problems in web analytics, social sciences, finance, and economics through detailed examples. ISBN: 9781449319793. Publication Date: 2012-11-06
- Python Data Analytics: With Pandas, NumPy, and Matplotlib by Fabio Nelli. Contents: An introduction to data analysis -- Introduction to the Python world -- The NumPy library -- The pandas library: an introduction -- Pandas: reading and writing data -- Pandas in depth: data manipulation -- Data visualization with matplotlib -- Machine learning with scikit-learn -- Deep learning with TensorFlow -- An example: meteorological data -- Embedding the JavaScript D3 library in the IPython notebook -- Recognizing handwritten digits -- Textual data analysis with NLTK -- Image analysis and computer vision with OpenCV -- Appendix A: Writing mathematical expressions with LaTeX -- Appendix B: Open data sources. Summary: Explore the latest Python tools and techniques to help you tackle the world of data acquisition and analysis. You'll review scientific computing with NumPy, visualization with matplotlib, and machine learning with scikit-learn. This revision is fully updated with new content on social media data analysis, image analysis with OpenCV, and deep learning libraries. Each chapter includes multiple examples demonstrating how to work with each library. At its heart lies the coverage of pandas, for high-performance, easy-to-use data structures and tools for data manipulation. Author Fabio Nelli expertly demonstrates using Python for data processing, management, and information retrieval. Later chapters apply what you've learned to handwriting recognition and extending graphical capabilities with the JavaScript D3 library. Whether you are dealing with sales data, investment data, medical data, web page usage, or other data sets, Python Data Analytics, Second Edition is an invaluable reference with its examples of storing, accessing, and analyzing data. ISBN: 9781484239131. Publication Date: 2018
- Python for Data Science for Dummies by John Paul Mueller; Luca Massaron: The fast and easy way to learn Python programming and statistics. Python is a general-purpose programming language created in the late 1980s--and named after Monty Python--that's used by thousands of people to do things from testing microchips at Intel, to powering Instagram, to building video games with the PyGame library. Python For Data Science For Dummies is written for people who are new to data analysis, and discusses the basics of Python data analysis programming and statistics. The book also discusses Google Colab, which makes it possible to write Python code in the cloud. Get started with data science and Python; visualize information; wrangle data; learn from data. The book provides the statistical background needed to get started in data science programming, including probability, random distributions, hypothesis testing, confidence intervals, and building regression models for prediction. ISBN: 9781119547648. Publication Date: 2019-01-25
- Data Science Using Python and R by Chantal D. Larose; Daniel T. Larose: Learn data science by doing data science! Data Science Using Python and R will get you plugged into the world's two most widespread open-source platforms for data science: Python and R. Data science is hot. Bloomberg called data scientist "the hottest job in America." Python and R are the top two open-source data science tools in the world. In Data Science Using Python and R, you will learn step-by-step how to produce hands-on solutions to real-world business problems, using state-of-the-art techniques. Data Science Using Python and R is written for the general reader with no previous analytics or programming experience. An entire chapter is dedicated to learning the basics of Python and R. Then, each chapter presents step-by-step instructions and walkthroughs for solving data science problems using Python and R. Those with analytics experience will appreciate having a one-stop shop for learning how to do data science using Python and R. Topics covered include data preparation, exploratory data analysis, preparing to model the data, decision trees, model evaluation, misclassification costs, naïve Bayes classification, neural networks, clustering, regression modeling, dimension reduction, and association rules mining. Further, exciting new topics such as random forests and general linear models are also included. The book emphasizes data-driven error costs to enhance profitability, which avoids the common pitfalls that may cost a company millions of dollars. Data Science Using Python and R provides exercises at the end of every chapter, totaling over 500 exercises in the book. Readers will therefore have plenty of opportunity to test their newfound data science skills and expertise. In the Hands-on Analysis exercises, readers are challenged to solve interesting business problems using real-world data sets. ISBN: 9781119526841. Publication Date: 2019-03-21
- Python Data Visualization Cookbook by Igor Milovanović. ISBN: 9781782163374. Publication Date: 2013-01-01

- R Statistics Essential Training: A detailed 6-hour tutorial focusing on using R for basic statistics.

- Quick-R: A great quick reference that covers many common topics in R.
- RStudio Online Learning: Resources provided by the RStudio team which cover the basics of R programming and other tools the RStudio team has developed.
- UCLA Statistical Computing (R): A variety of learning modules, FAQs, and case examples of using R for statistical computing.
- Rdocumentation.org: Search through all available R packages on CRAN, GitHub, and Bioconductor.
- R Reference Card: A PDF guide that highlights important commands under several main topics in R programming.
- Cookbook for R: Provides examples of common problems and their solutions in R.
- Advanced R: Hadley Wickham's reference for more advanced R users who want to improve their R programming skills.
- R-bloggers: A collection of articles and blogs from around the R community.
- Code School: Interactive exercises for R beginners.
- Handling and Processing Strings in R: Gaston Sanchez's guide to handling strings in R.
- swirl: Interactive courses through the swirl package.
- RStudio Cheatsheets: Cheatsheets created by the RStudio team which go over topics such as Shiny, R Markdown, and dplyr.
- R Tutorial: R tutorial from Clarkson University.
- R for Data Science: An online book with examples and exercises comprehensively covering basic and intermediate topics in R.

*NYU Library databases likely to contain relevant resources:*

- Ebook Central: Ebook Central is NYU's preferred ebook provider. Users can search, read, highlight, and annotate full-text books in many subject areas, including the social sciences and humanities.
- Skillsoft Books (formerly Books24x7): Skillsoft Books is an online collection of computer technology-related ebooks. It contains hundreds of books and videos from respected IT publishers such as MIT Press, Microsoft Press, Osborne/McGraw-Hill, Que, Sams, Sybex, and Wiley. Use it to search for a wide variety of books and videos, ranging from beginner level to advanced (from Microsoft Word for beginners to an advanced programming language).
- O'Reilly Online Learning: O'Reilly's Safari Books Online provides access to ebooks related to technology, coding, developing, web design, and data visualization. If the database asks for "Sign In" information to access content, please clear your browser cache and cookies and try the link again.

*Selected books on the topic (available online):*

- R Recipes by Larry A. Pace: R Recipes is your handy problem-solution reference for learning and using the popular R programming language for statistics and other numerical analysis. Packed with hundreds of code and visual recipes, this book helps you to quickly learn the fundamentals and explore the frontiers of programming, analyzing and using R. R Recipes provides textual and visual recipes for easy and productive templates for use and re-use in your day-to-day R programming and data analysis practice. Whether you're in finance, cloud computing, big or small data analytics, or other applied computational and data science, R Recipes should be a staple for your code reference library. ISBN: 9781484201312. Publication Date: 2014-12-18
- R in Action by Rob Kabacoff. Summary: R in Action, Second Edition presents both the R language and the examples that make it so useful for business developers. Focusing on practical solutions, the book offers a crash course in statistics and covers elegant methods for dealing with messy and incomplete data that are difficult to analyze using traditional methods. You'll also master R's extensive graphical capabilities for exploring and presenting data visually. And this expanded second edition includes new chapters on time series analysis, cluster analysis, and classification methodologies, including decision trees, random forests, and support vector machines. Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications. About the Technology: Business pros and researchers thrive on data, and R speaks the language of data analysis. R is a powerful programming language for statistical computing. Unlike general-purpose tools, R provides thousands of modules for solving just about any data-crunching or presentation challenge you're likely to face. R runs on all important platforms and is used by thousands of major corporations and institutions worldwide. About the Book: R in Action, Second Edition teaches you how to use the R language by presenting examples relevant to scientific, technical, and business developers. Focusing on practical solutions, the book offers a crash course in statistics, including elegant methods for dealing with messy and incomplete data. You'll also master R's extensive graphical capabilities for exploring and presenting data visually. And this expanded second edition includes new chapters on forecasting, data mining, and dynamic report writing. What's Inside: Complete R language tutorial; Using R to manage, analyze, and visualize data; Techniques for debugging programs and creating packages; OOP in R; Over 160 graphs. About the Author: Dr. Rob Kabacoff is a seasoned researcher and teacher who specializes in data analysis. He also maintains the popular Quick-R website at statmethods.net. Table of Contents: PART 1 GETTING STARTED: Introduction to R; Creating a dataset; Getting started with graphs; Basic data management; Advanced data management. PART 2 BASIC METHODS: Basic graphs; Basic statistics. PART 3 INTERMEDIATE METHODS: Regression; Analysis of variance; Power analysis; Intermediate graphs; Resampling statistics and bootstrapping. PART 4 ADVANCED METHODS: Generalized linear models; Principal components and factor analysis; Time series; Cluster analysis; Classification; Advanced methods for missing data. PART 5 EXPANDING YOUR SKILLS: Advanced graphics with ggplot2; Advanced programming; Creating a package; Creating dynamic reports; Advanced graphics with the lattice package (available online only from manning.com/kabacoff2). ISBN: 9781617291388. Publication Date: 2015-06-06
- Hands-On Programming with R by Garrett Grolemund: Learn how to program by diving into the R language, and then use your newfound skills to solve practical data science problems. With this book, you'll learn how to load data, assemble and disassemble data objects, navigate R's environment system, write your own functions, and use all of R's programming tools. RStudio Master Instructor Garrett Grolemund not only teaches you how to program, but also shows you how to get more from R than just visualizing and modeling data. You'll gain valuable programming skills and support your work as a data scientist at the same time. Work hands-on with three practical data analysis projects based on casino games; store, retrieve, and change data values in your computer's memory; write programs and simulations that outperform those written by typical R users; use R programming tools such as if else statements, for loops, and S3 classes; learn how to write lightning-fast vectorized R code; take advantage of R's package system and debugging tools; practice and apply R programming concepts as you learn them. ISBN: 9781449359010. Publication Date: 2014-08-02
- R for Everyone by Jared P. Lander: Statistical Computation for Programmers, Scientists, Quants, Excel Users, and Other Professionals. Using the open source R language, you can build powerful statistical models to answer many of your most challenging questions. R has traditionally been difficult for non-statisticians to learn, and most R books assume far too much knowledge to be of help. R for Everyone is the solution. Drawing on his unsurpassed experience teaching new users, professional data scientist Jared P. Lander has written the perfect tutorial for anyone new to statistical programming and modeling. Organized to make learning easy and intuitive, this guide focuses on the 20 percent of R functionality you'll need to accomplish 80 percent of modern data tasks. Lander's self-contained chapters start with the absolute basics, offering extensive hands-on practice and sample code. You'll download and install R; navigate and use the R environment; master basic program control, data import, and manipulation; and walk through several essential tests. Then, building on this foundation, you'll construct several complete models, both linear and nonlinear, and use some data mining techniques. By the time you're done, you won't just know how to write R programs, you'll be ready to tackle the statistical problems you care about most. Coverage includes: Exploring R, RStudio, and R packages; Using R for math (variable types, vectors, calling functions, and more); Exploiting data structures, including data.frames, matrices, and lists; Creating attractive, intuitive statistical graphics; Writing user-defined functions; Controlling program flow with if, ifelse, and complex checks; Improving program efficiency with group manipulations; Combining and reshaping multiple datasets; Manipulating strings using R's facilities and regular expressions; Creating normal, binomial, and Poisson probability distributions; Programming basic statistics: mean, standard deviation, and t-tests; Building linear, generalized linear, and nonlinear models; Assessing the quality of models and variable selection; Preventing overfitting, using the Elastic Net and Bayesian methods; Analyzing univariate and multivariate time series data; Grouping data via K-means and hierarchical clustering; Preparing reports, slideshows, and web pages with knitr; Building reusable R packages with devtools and Rcpp; Getting involved with the R global community. ISBN: 9780321888037. Publication Date: 2013-12-19
- R Cookbook by Paul Teetor: With more than 200 practical recipes, this book helps you perform data analysis with R quickly and efficiently. The R language provides everything you need to do statistical work, but its structure can be difficult to master. This collection of concise, task-oriented recipes makes you productive with R immediately, with solutions ranging from basic tasks to input and output, general statistics, graphics, and linear regression. Each recipe addresses a specific problem, with a discussion that explains the solution and offers insight into how it works. If you're a beginner, R Cookbook will help get you started. If you're an experienced data programmer, it will jog your memory and expand your horizons. You'll get the job done faster and learn more about R in the process. Create vectors, handle variables, and perform other basic functions; input and output data; tackle data structures such as matrices, lists, factors, and data frames; work with probability, probability distributions, and random variables; calculate statistics and confidence intervals, and perform statistical tests; create a variety of graphic displays; build statistical models with linear regressions and analysis of variance (ANOVA); explore advanced statistical techniques, such as finding clusters in your data. "Wonderfully readable, R Cookbook serves not only as a solutions manual of sorts, but as a truly enjoyable way to explore the R language--one practical example at a time."--Jeffrey Ryan, software consultant and R package author. ISBN: 9780596809157. Publication Date: 2011-03-25
- The Art of R Programming by Norman Matloff: R is the world's most popular language for developing statistical software: archaeologists use it to track the spread of ancient civilizations, drug companies use it to discover which medications are safe and effective, and actuaries use it to assess financial risks and keep economies running smoothly. The Art of R Programming takes you on a guided tour of software development with R, from basic types and data structures to advanced topics like closures, recursion, and anonymous functions. No statistical knowledge is required, and your programming skills can range from hobbyist to pro. Along the way, you'll learn about functional and object-oriented programming, running mathematical simulations, and rearranging complex data into simpler, more useful formats. You'll also learn to: create artful graphs to visualize complex data sets and functions; write more efficient code using parallel R and vectorization; interface R with C/C++ and Python for increased speed or functionality; find new R packages for text analysis, image manipulation, and more; squash annoying bugs with advanced debugging techniques. Whether you're designing aircraft, forecasting the weather, or you just need to tame your data, The Art of R Programming is your guide to harnessing the power of statistical computing. ISBN: 9781593273842. Publication Date: 2011-10-11
- Text Analysis with R for Students of Literature by Matthew L. Jockers: Text Analysis with R for Students of Literature is written with students and scholars of literature in mind but will be applicable to other humanists and social scientists wishing to extend their methodological tool kit to include quantitative and computational approaches to the study of text. Computation provides access to information in text that we simply cannot gather using traditional qualitative methods of close reading and human synthesis. Text Analysis with R for Students of Literature provides a practical introduction to computational text analysis using the open source programming language R. R is extremely popular throughout the sciences and because of its accessibility, R is now used increasingly in other research areas. Readers begin working with text right away and each chapter works through a new technique or process such that readers gain a broad exposure to core R procedures and a basic understanding of the possibilities of computational text analysis at both the micro and macro scale. Each chapter builds on the previous as readers move from small scale "microanalysis" of single texts to large scale "macroanalysis" of text corpora, and each chapter concludes with a set of practice exercises that reinforce and expand upon the chapter lessons. The book's focus is on making the technical palatable and making the technical useful and immediately gratifying. ISBN: 9783319031637. Publication Date: 2014-07-03
- R Graphics Cookbook by Winston Chang: This practical guide provides more than 150 recipes to help you generate high-quality graphs quickly, without having to comb through all the details of R's graphing systems. Each recipe tackles a specific problem with a solution you can apply to your own project, and includes a discussion of how and why the recipe works. Most of the recipes use the ggplot2 package, a powerful and flexible way to make graphs in R. If you have a basic understanding of the R language, you're ready to get started. Use R's default graphics for quick exploration of data; create a variety of bar graphs, line graphs, and scatter plots; summarize data distributions with histograms, density curves, box plots, and other examples; provide annotations to help viewers interpret data; control the overall appearance of graphics; render data groups alongside each other for easy comparison; use colors in plots; create network graphs, heat maps, and 3D scatter plots; structure data for graphing. ISBN: 9781449316952. Publication Date: 2013-01-06
- ggplot2 by Hadley Wickham: This new edition to the classic book by ggplot2 creator Hadley Wickham highlights compatibility with knitr and RStudio. ggplot2 is a data visualization package for R that helps users create data graphics, including those that are multi-layered, with ease. With ggplot2, it's easy to: produce handsome, publication-quality plots with automatic legends created from the plot specification; superimpose multiple layers (points, lines, maps, tiles, box plots) from different data sources with automatically adjusted common scales; add customizable smoothers that use powerful modeling capabilities of R, such as loess, linear models, generalized additive models, and robust regression; save any ggplot2 plot (or part thereof) for later modification or reuse; create custom themes that capture in-house or journal style requirements and that can easily be applied to multiple plots; approach a graph from a visual perspective, thinking about how each component of the data is represented on the final plot. This book will be useful to everyone who has struggled with displaying data in an informative and attractive way. Some basic knowledge of R is necessary (e.g., importing data into R). ggplot2 is a mini-language specifically tailored for producing graphics, and you'll learn everything you need in the book. After reading this book you'll be able to produce graphics customized precisely for your problems, and you'll find it easy to get graphics out of your head and on to the screen or page. ISBN: 9783319242774. Publication Date: 2016-06-08
- An Introduction to Statistical Learning by Gareth James; Trevor Hastie; Robert Tibshirani; Daniela Witten: An Introduction to Statistical Learning provides an accessible overview of the field of statistical learning, an essential toolset for making sense of the vast and complex data sets that have emerged in fields ranging from biology to finance to marketing to astrophysics in the past twenty years. This book presents some of the most important modeling and prediction techniques, along with relevant applications. Topics include linear regression, classification, resampling methods, shrinkage approaches, tree-based methods, support vector machines, clustering, and more. Color graphics and real-world examples are used to illustrate the methods presented. Since the goal of this textbook is to facilitate the use of these statistical learning techniques by practitioners in science, industry, and other fields, each chapter contains a tutorial on implementing the analyses and methods presented in R, an extremely popular open source statistical software platform. Two of the authors co-wrote The Elements of Statistical Learning (Hastie, Tibshirani and Friedman, 2nd edition 2009), a popular reference book for statistics and machine learning researchers. An Introduction to Statistical Learning covers many of the same topics, but at a level accessible to a much broader audience. This book is targeted at statisticians and non-statisticians alike who wish to use cutting-edge statistical learning techniques to analyze their data. The text assumes only a previous course in linear regression and no knowledge of matrix algebra. ISBN: 9781461471370. Publication Date: 2017-09-01

Projects that use data software such as R, Stata, SPSS, SAS, and Matlab, or programming languages like Python, should follow best practices for writing scripts, do-files, and documentation at each step of the data transformation and analysis process. This is part of good internal research data management for individuals and collaborators, and it is becoming increasingly vital (even mandatory) for meeting the demand for reproducibility in data-driven research.

| Do | Don't |
|---|---|
| Comment your script frequently | Embed ##comments## within a line of code |
| Load dependencies (libraries, input data) at the beginning of a script | Undermine readability by not using indentation, bracketing, and other stylistic conventions |
| Follow style conventions appropriate for the language being used | Leave code that doesn't actually do anything anymore in your script |
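
As a rough illustration of the practices in the table above, here is a minimal Python sketch. Everything in it is hypothetical: the inlined data, the column names `group` and `score`, and the function name `mean_score_by_group` are made up for the example, and the layout follows common Python (PEP 8) conventions rather than any NYU-specific standard. It loads its dependencies at the top, keeps comments on their own lines rather than embedding them in code, and contains no leftover code.

```python
"""Summarize scores by group (illustrative example; the data is made up)."""

# Load all dependencies at the top of the script.
import csv
import io
from collections import defaultdict
from statistics import mean

# Hypothetical input data, inlined so the example is self-contained.
# In a real project, the input file would be named here, near the top.
RAW_CSV = """group,score
a,80
a,90
b,70
b,75
"""


def mean_score_by_group(csv_text):
    """Return a dict mapping each group label to its mean score."""
    scores = defaultdict(list)
    # Parse the CSV text row by row; each row is a dict keyed by column name.
    for row in csv.DictReader(io.StringIO(csv_text)):
        scores[row["group"]].append(float(row["score"]))
    # Collapse each group's list of scores into a single mean.
    return {group: mean(values) for group, values in scores.items()}


if __name__ == "__main__":
    # Print one line per group, in alphabetical order.
    for group, average in sorted(mean_score_by_group(RAW_CSV).items()):
        print(f"{group}: {average:.1f}")
```

Run as a script, this prints `a: 85.0` and `b: 72.5`. The same habits carry over to R, Stata do-files, and other scripting environments, though the specific style conventions differ by language.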

- Last Updated: Jun 28, 2024 3:45 PM
- URL: https://guides.nyu.edu/datascience

Subjects: Data Science