Cite data in your work for the same reasons you cite journal articles: to give credit to the original author or producer, and to help other researchers find the material. Using data without citing it undermines both academic integrity and reproducibility. Pay attention to the license attached to the data and give attribution!
A data citation includes the typical components of other citations:
Author or creator: the entity/entities responsible for creating the data
Date of publication: the date the data was published or otherwise released to the public
Title: the title of the dataset or a brief description of it if it's missing a title
Publisher: entity responsible for hosting the data (like a repository or archive)
URL or, preferably, a DOI: a link that points to the data
Date accessed: since most data are published without version numbers, it's important to note when you accessed the data in case newer releases are made over time.
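As a rough illustration, the components listed above can be collected into a small record and joined into a citation string. This is a minimal sketch, not a prescribed style: the `DataCitation` class, its field names, and the ordering of the parts are assumptions made for this example, and the sample values are invented. Check the style required by your journal or instructor before reusing the output.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataCitation:
    """Hypothetical container for the components of a data citation."""
    creator: str                    # author or creator of the data
    year: str                       # date of publication or release
    title: str                      # dataset title, or a brief description
    publisher: str                  # repository or archive hosting the data
    locator: str                    # DOI (preferred) or URL
    accessed: Optional[str] = None  # date you accessed the data, if noted

    def format(self) -> str:
        # Join the parts in one generic order; real styles differ in detail.
        parts = [f"{self.creator} ({self.year})", self.title,
                 self.publisher, self.locator]
        if self.accessed:
            parts.append(f"Accessed {self.accessed}")
        return ". ".join(parts) + "."


# Invented example values, for illustration only.
example = DataCitation(
    creator="Doe, J.",
    year="2020",
    title="Example Survey Responses",
    publisher="Example Repository",
    locator="https://doi.org/10.1234/example",
    accessed="2024-01-15",
)
print(example.format())
```

Keeping the access date optional reflects the point above: it matters most when the data carry no version number.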
Citation standards for datasets differ by journal, publisher, and conference, but you generally have a few options, depending on the situation:
Here's an example of how to find the citation information for a dataset hosted on Zenodo, a generalist repository that houses data, code, and more:
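Repository record pages typically display a suggested citation that you can copy. As a programmatic alternative, a formatted citation can usually be requested for a DOI through DOI content negotiation. The sketch below assumes the DOI's registration agency supports the `text/x-bibliography` media type at `https://doi.org/`. The function names are made up for this example, and the DOI in the usage comment is a placeholder, not a real record.

```python
import urllib.parse
import urllib.request


def build_citation_request(doi: str, style: str = "apa") -> urllib.request.Request:
    """Build (but do not send) a request for a formatted citation of a DOI."""
    url = "https://doi.org/" + urllib.parse.quote(doi, safe="/")
    # Assumption: the resolver honours this Accept header for the given DOI.
    headers = {"Accept": f"text/x-bibliography; style={style}"}
    return urllib.request.Request(url, headers=headers)


def fetch_citation(doi: str, style: str = "apa") -> str:
    """Send the request and return the citation text (needs network access)."""
    request = build_citation_request(doi, style)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8").strip()


# Usage (placeholder DOI; substitute the DOI shown on the dataset's record page):
# print(fetch_citation("10.5281/zenodo.0000000"))
```

Whatever the service returns, compare it against the record page and your target citation style before using it.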
All scholarly or academic work requires that you cite your sources, whether you are writing a long paper or a quick report. Why is citing your sources so important?
Researching and writing a paper ideally involves a process of exploring and learning. By citing your sources, you show your reader how you came to your conclusions and acknowledge the work of others that led you to them. Citing sources:
Partially adapted from "When and Why to Cite Sources." SUNY Albany. 2008. Retrieved 14 Jan 2009.