Data Science

A guide with resources for the data science community on campus.

CITING DATA

Data should be cited within our work for the same reasons journal articles are cited: to give credit where credit is due (to the original author or producer) and to help other researchers find the material. Using data without citing it undermines both academic integrity and reproducibility. Pay attention to licenses (here's a page on those) and give attribution!

A data citation includes the typical components of other citations:

Author or creator: the entity/entities responsible for creating the data
Date of publication: the date the data was published or otherwise released to the public
Title: the title of the dataset or a brief description of it if it's missing a title
Publisher: entity responsible for hosting the data (like a repository or archive)
URL or, preferably, a DOI: a link that points to the data
Date accessed: since most data are published without versions, it's important to note the date you accessed the data in case newer releases are made over time.
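
As a rough illustration, the components above can be kept together as a small record and assembled into a citation string. The sketch below is in Python; the class name `DataCitation`, its field names, the ordering of the parts, and the sample values are all made up for illustration and do not follow any particular style manual.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataCitation:
    """Hypothetical record holding the citation components listed above."""
    author: str                     # author or creator
    year: str                       # date of publication (year only, for brevity)
    title: str                      # title or brief description of the dataset
    publisher: str                  # repository or archive hosting the data
    locator: str                    # DOI (preferred) or URL
    accessed: Optional[str] = None  # date accessed, e.g. "2024-03-01"

    def format(self) -> str:
        # Generic ordering chosen for this sketch; not an official style.
        text = f"{self.author} ({self.year}). {self.title}. {self.publisher}. {self.locator}"
        if self.accessed:
            text += f" (accessed {self.accessed})"
        return text


# Made-up example values; this is not a real dataset or DOI.
example = DataCitation(
    author="Doe, J.",
    year="2023",
    title="Example Survey Responses",
    publisher="Example Repository",
    locator="https://doi.org/10.0000/example",
    accessed="2024-03-01",
)
print(example.format())
```

Keeping the components as separate fields, rather than as one pre-formatted string, makes it easier to reorder them later for whatever style a publisher asks for.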

Citation standards for datasets differ by journal, publisher, and conference, but you generally have two options, depending on the situation:

  1. Use the format of a style manual as determined by a publisher or conference, such as IEEE or ACM. If you use a citation manager (highly recommended for organizing research reading!) like Zotero (which we support at NYU - check out our Zotero guide), you can have it export your citations in whatever format you need.
  2. Use the author's or repository's preferred citation, as listed on the page where you originally downloaded the data.

Here's an example of how to find the citation information for a dataset hosted on Zenodo, a generalist repository that houses data, code, and more:

[Animated GIF: how to find the citation for a dataset hosted on Zenodo]