Hello! I am Vicky Rampin, the Librarian for Research Data Management and Reproducibility. I am also the liaison to computer science and data science programs at NYU! I am here to help you navigate the resources for both at NYU and beyond. You can set up an appointment with me or always email me at: vs77@nyu.edu.
If you need help with a specific quantitative, GIS, or qualitative software, you should reach out to Data Services.
Donald Knuth first defined literate programming as a script, notebook, or computational document that contains an explanation of the program logic in a natural language (e.g. English or Mandarin), interspersed with snippets of macros and source code, which can be compiled and rerun. You can think of it as an executable paper!
No matter which literate programming tool you use, only run the cells from top to bottom – ONLY. The number 1 cause of irreproducible jupyter notebooks is that the original authors run the cells out of order, which can't be reproduced without documentation about which cells in which order. So run your literate programming notebooks from top to bottom only.
Jupyter Notebook is an interactive computing environment that enables users to author notebook documents that include code, interactive widgets, plots, narrative text, equations, images and even videos! Jupyter notebooks are heavily used in data science, and it would behoove you to get comfortable with the tool. The jupyter name comes from 3 programming languages: Julia, Python, and R. You can use one programming language per document, and it is done through choosing a kernel (e.g. Python, R, Go, and more -- get the full list of kernels from the wiki).
Jupyter notebooks can be comprised mainly of two types of cells (though more can be added with plugins).
Some key jupyter notebook shortcuts to keep in mind while you work:
Use shift + enter to run an active cell
Use esc in highlighted cell to toggle command options:
esc + L - show line numbers
esc + M - format cell as Markdown cell
esc + a - insert cell above current cell
esc + b - insert cell below current cell
Check all current variables: run %whos
from a code cell
RMarkdown is another popular literate programming tool and can be considered an extension of Markdown. Like all literate programming tools, it mixes documentation & code, and not just R code either! You can insert code snippets from other languages (SQL, bash, Python, etc.) into ONE DOCUMENT! Incorporating results directly into your documents is an important step in reproducible research. Any changes that occur in either your data set or the analysis are automatically updated in your document the next time the document is created.
Typically, RMarkdown files are edited from within RStudio. The R for Data Science book contains a great chapter on RMarkdown for more information: https://r4ds.had.co.nz/r-markdown.html.