
Data Management Planning: Selecting a Repository

Information on best practices and standards for data management planning.


A simple and effective way to make your data accessible is to store it in a repository. A repository is a storage facility (sometimes also a curation facility) where users can upload and download data and make it accessible and discoverable, whether to fulfill grant requirements or to support the free sharing of scientific knowledge.

General Information



There are some things to keep in mind when selecting a repository. Data in a repository should be:

  • Persistent (not likely to be modified)

  • Searchable and browsable

  • Easy to retrieve or download

  • Citable

A wide variety of institution-based and discipline-specific repositories exist for digital data. The repository itself should be: 

  • Appropriate for the type of data you generate
  • Appropriate for the audience of the repository (so they will make use of your data!)
  • Open access

If both a discipline-specific repository and an institution-based one exist for your data, then consider depositing in both locations to maximize discovery and safety of the data. 


Many more data repositories are available online than can be listed here. Consult, an external resource, for an extensive list of discipline-specific repositories.

Other tools to help you find a repository:

  • Repository Finder

    About: A tool launched by DataCite to help people identify and locate online repositories of research data. It draws its repository information from the re3data listings.

  • Open Access Directory's Data Repositories Wiki
    About: A list of repositories and databases for locating and depositing open data.

  • Dataverse Network Project
    About: The Dataverse Network is an application to publish, share, reference, extract, and analyze research data. It makes data available to others and allows researchers to replicate others' work. Researchers and data authors, publishers and distributors, and affiliated institutions all receive credit.
    How to archive your data: Create your own Dataverse at Harvard's IQSS. Once you have created a Dataverse, you are free to upload, describe, and share datasets on your own.

  • NYU Faculty Digital Archive (FDA)
    About: The Faculty Digital Archive is a place where full-time NYU faculty can deposit their work in digital form. FDA collections can be shared with the world, or restricted to selected people. The FDA is intended to be a highly visible repository of NYU faculty digital scholarship.
    How to archive your data: For more information on the FDA or to request space on the FDA for your materials, please e-mail



  • The Cell: An Image Library
    About: Images of all cell types from all organisms, including intracellular structures and movies or animations demonstrating functions. This project relies upon the cell biology community to populate the library. Freely accessible, easy-to-search, public repository of reviewed and annotated images, videos, and animations of cells from a variety of organisms, showcasing cell architecture, intracellular functionalities, and both normal and abnormal processes.

  • Morphbank
    About: Holds biological images documenting a wide variety of research, including specimen-based research in comparative anatomy, morphological phylogenetics, taxonomy, and related fields focused on increasing our knowledge of biodiversity. The project receives its main funding from the Biological Databases and Informatics program of the National Science Foundation (Grant DBI-0446224).

  • National Biological Information Infrastructure
    About: A broad, collaborative program to provide increased access to data and information on the nation's biological resources. The NBII links diverse, high-quality biological databases, information products, and analytical tools maintained by NBII partners and other contributors in government agencies, academic institutions, non-government organizations, and private industry. (Note: The NBII was terminated in the President's budget for Fiscal Year 2012.)

  • PaleoBiology Database
    About: "We are bringing together taxonomic and distributional information about the entire fossil record of plants and animals." Data are contributed by many researchers at many institutions.


Computer Science

  • GitHub
    About: Keeps your public and private code available, secure, and backed up.

  • SourceForge
    About: 2.7 million developers create powerful software in over 260,000 projects. Our popular directory connects more than 46 million consumers with these open source projects and serves more than 2,000,000 downloads a day. SourceForge is where open source happens.

  • SNAP
    About: Stanford Large Network Dataset Collection. The SNAP library has been actively developed since 2004 and has grown organically out of research on the analysis of large social and information networks. The largest network analyzed with the library so far was the 2006 Microsoft Instant Messenger network, with 240 million nodes and 1.3 billion edges.


Environmental Sciences

  • The Marine Geoscience Data System (MGDS)
    About: The Marine Geoscience Data System (MGDS) provides access to data portals for the NSF-supported Ridge 2000 and MARGINS programs, the Antarctic and Southern Ocean Data Synthesis, the Global Multi-Resolution Topography Synthesis, and Seismic Reflection Field Data Portal.


  • IRIS (Incorporated Research Institutions for Seismology).
    About: Data from a consortium of 100+ US universities, with support from the National Science Foundation.

Geosciences & Geospatial Data

  • EarthChem
    About: Holds data systems and services for geochemical, geochronological, and petrological data, developed and maintained by EarthChem, including the EarthChem Library, the EarthChem Portal, PetDB, NAVDAT, SedDB, and Geochron. EarthChem is operated by a joint team of disciplinary scientists, data scientists, data managers and information technology developers who are part of the NSF-funded data facility Integrated Earth Data Applications (IEDA).

  • The Geosciences Network (GEON)
    About: GEON is a collaboration among a dozen PI institutions and a number of other partner projects, institutions, and agencies to develop cyberinfrastructure that supports an environment for integrative geoscience research. GEON is funded by the NSF Information Technology Research (ITR) program.

  • The National Space Science Data Center
    About: Serves as the permanent archive for NASA space science mission data. "Space science" means astronomy and astrophysics, solar and space plasma physics, and planetary and lunar science. As the permanent archive, NSSDC teams with NASA's discipline-specific space science "active archives," which provide access to data to researchers and, in some cases, to the general public.


  • MIRAGE (Middlesex medical Image Repository with a CBIR ArchivinG Environment).
    About: From JISC and Middlesex University.


  • NIST Atomic Spectra Database
    About: The Atomic Spectra Database (ASD) contains data for radiative transitions and energy levels in atoms and atomic ions. Data are included for observed transitions of 99 elements and energy levels of 56 elements.

  • CORE Repository (MLA)
    A service offered as part of the MLA Commons, the Commons Open Repository Exchange offers a place to store and publish digital assets and data in the humanities.
  • Humanities Commons
    Humanities Commons is a repository for the humanities. Discover the latest open-access scholarship and teaching materials, make interdisciplinary connections, build a WordPress Web site, and increase the impact of your work by sharing it in the repository.
  • DataONE 
    An international federation of data repositories containing earth observations data, including data from fields such as ecology, biology, evolution, and environmental sciences such as hydrology, oceanography, and atmospheric science. DataONE is a federation with participation from hundreds of field stations, universities, and government agencies through the DataONE Member Nodes.

  • Dryad 
    An international repository of data underlying scientific and medical publications, particularly data for which no specialized repository exists. All material in Dryad is associated with a scholarly publication. Most data in the repository are associated with peer-reviewed articles, although data associated with non-peer reviewed publications from reputable academic sources, such as dissertations, are also accepted. Dryad is a non-profit organization.

  • FigShare 
    Scientific publishing as it stands is an inefficient way to do science on a global scale. FigShare allows you to share all of your data, negative results, and unpublished figures.

  • Freebase
    A repository for data in all fields from Metaweb.

  • KNB
    The Knowledge Network for Biocomplexity (KNB) is an international data repository containing ecology, biology, and environmental science data with a global distribution. The KNB is a grass-roots partnership of collaborating field stations, laboratories, and research networks that openly publish and share data. The KNB is a Member Node within the DataONE data federation.


Creative Commons License
Original work in this LibGuide is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.