Hello! I am Vicky Rampin, the Librarian for Research Data Management and Reproducibility. I am also the liaison to computer science and data science programs at NYU! I am here to help you navigate the resources for both at NYU and beyond. You can set up an appointment with me or always email me at: vs77@nyu.edu.
If you need help with specific quantitative, GIS, or qualitative software, you should reach out to Data Services.
Data science is a collaborative field and there is a lot of open source code for you to utilize during your data science career. Given that you'll use code you don't write, it's natural that you should cite that code!
It can be hard to know which resources need to be cited and which don't. Generally, if you used a function or algorithm that someone else created, you should cite it to give the creator credit. Ideas and programs that are "common knowledge" do not generally need to be referenced or cited. If there is essentially only one way to program a specific task and it is very commonly used (like printing "hello world"), it may not need to be cited.
I would recommend in general citing things that affect the analytical result you are presenting. For example, you don't cite Microsoft Office for helping you write your paper, but DO cite scikit-learn as software that contributed to your analysis!
You should cite specific code snippets you use from StackOverflow. If you haven't come across it, it is a popular place to look for help with writing code, and people often re-use code snippets posted there. User-submitted content, including code snippets, is licensed under a Creative Commons Attribution-ShareAlike license (read more from StackOverflow directly). Content contributed on or after May 2, 2018 is licensed under CC BY-SA 4.0, and earlier content uses an earlier version of the license. You can view the terms of the 4.0 license here: https://creativecommons.org/licenses/by-sa/4.0/.
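One way to credit a reused snippet is a short comment directly above it in your code. The sketch below is only an illustration: the helper name attribution_comment and the comment layout are my own assumptions, not a format required by StackOverflow or by the license, and the author and URL shown are placeholders.

```python
def attribution_comment(author: str, url: str, license_name: str) -> str:
    """Build a comment block crediting a reused snippet (hypothetical helper)."""
    lines = [
        f"# Adapted from a Stack Overflow answer by {author}",
        f"# Source: {url}",
        f"# License: {license_name}",
    ]
    return "\n".join(lines)


# Placeholder values for illustration only; substitute the real answer's details.
print(attribution_comment(
    author="<answer author>",
    url="https://stackoverflow.com/a/<answer-id>",
    license_name="CC BY-SA 4.0",
))
```

Paste the resulting comment above the snippet you reused, and pick the license version that matches the date the content was contributed.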
A code citation should include the following fields:
Author or creator: the entity/entities responsible for creating the code (e.g. maintainers)
Date of publication: the date the code was published or otherwise released to the public
Title: the title of the code/software package or a brief description of it if missing a title
Publisher: entity responsible for hosting the code
URL or, preferably, a DOI: where on the web the code can be found
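If it helps to see the fields above as a record, here is a minimal sketch in Python. The class name CodeCitation, the field name location, and the output layout are assumptions made for illustration; the layout is not an official citation style, and the example values describe a made-up package.

```python
from dataclasses import dataclass


@dataclass
class CodeCitation:
    author: str      # entity or entities responsible for creating the code
    date: str        # date the code was published or released
    title: str       # title of the package, or a brief description
    publisher: str   # entity responsible for hosting the code
    location: str    # URL or, preferably, a DOI

    def format(self) -> str:
        """Join the fields into one line (an assumed layout, not a formal style)."""
        return (
            f"{self.author}. ({self.date}). {self.title}. "
            f"{self.publisher}. {self.location}"
        )


# Hypothetical values for illustration only; this is not a real package.
example = CodeCitation(
    author="Example Maintainers",
    date="2021",
    title="example-package",
    publisher="GitHub",
    location="https://example.org/example-package",
)
print(example.format())
```

For real work, check the style guide your venue requires and rearrange the fields to match it.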
If you are citing a Git repository, I recommend also adding the commit hash. If you don't have access to the commit hash, consider adding the version or release number that you used, or the date that you accessed the code online. This tells your readers exactly which version of the code you used, which is helpful for reproducibility.
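The fallback described above (commit hash first, then version or release number, then access date) can be sketched as follows. The function names current_commit_hash and version_note are hypothetical, and the first one assumes git is installed and that you have a local clone of the repository; git rev-parse HEAD is the standard command for printing the checked-out commit's hash.

```python
import subprocess
from typing import Optional


def current_commit_hash(repo_path: str = ".") -> str:
    """Return the hash of the checked-out commit (requires git on PATH)."""
    result = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=repo_path,
        capture_output=True,
        text=True,
        check=True,  # raise an error if this is not a Git repository
    )
    return result.stdout.strip()


def version_note(commit: Optional[str] = None,
                 release: Optional[str] = None,
                 accessed: Optional[str] = None) -> str:
    """Pick the most precise version identifier that is available."""
    if commit:
        return f"commit {commit}"
    if release:
        return f"version {release}"
    if accessed:
        return f"accessed {accessed}"
    return ""
```

For example, version_note(commit=current_commit_hash("path/to/clone")) gives a note you can append to the citation, where path/to/clone stands in for wherever you cloned the repository.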
You should also use a citation manager to keep track of all these citations! I recommend Zotero - check out our Zotero guide.