Data Science

A guide with resources for the data science community on campus.

SHARING YOUR WORK

Openness is a shared value in data science. Sharing code, data, research articles, and other materials is part and parcel of participating in the wider data science community. You might consider sharing your work as well!

Sharing via repository

A simple and effective way to share your research materials is to publish them in a repository. A repository is a storage facility (often also a preservation and curation facility) where you can upload your materials and make them accessible and discoverable, whether to fulfill grant requirements or to support the free sharing of scholarly knowledge. Materials deposited in a repository should be:

  • Persistent (not likely to be modified or removed)

  • Searchable and browsable

  • Retrieved or downloaded easily

  • Citable

A wide variety of institution-based and discipline-specific repositories exist for researchers to choose from. The repository itself should be: 

  • Appropriate for the type of data you generate
  • Used by the audience you want to reach (so they will make use of your data!)
  • Open access

If both a discipline-specific repository and an institution-based one exist for your data, then consider depositing in both locations to maximize discovery and safety of the data. If you need some help finding an appropriate repository for your work, don't hesitate to reach out to us!

Licensing

When publishing your data and code, it's crucial that you apply a license. A license is a document that acts as your official permission for others to use something you hold the copyright to (like code you've written, or data you've gathered). It's a key part of scholarly communications -- your colleagues need to know exactly how they can use your materials, and you can establish boundaries that you are comfortable with. Releasing your materials without a license creates ambiguity about whether and how others may reuse them.

Licenses exist on a spectrum from totally open (like CC0, a public domain dedication that says anyone can do anything with your materials and doesn't need to cite you) to more restrictive (like CC-BY-NC-ND, which says that anyone can use your materials as long as it's for non-commercial use, no derivatives are distributed, and they attribute you). You can choose a license that you are comfortable with -- there is no one-size-fits-all solution.
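
Creative Commons license codes are built from a small set of two-letter elements, so reading a code is mostly a matter of knowing what each element means. The short Python sketch below illustrates that idea. The function name describe_license and the wording of each condition are our own for illustration, not official Creative Commons text, so always check the actual license deed before applying one.

```python
from __future__ import annotations

# Illustrative only: plain-language summaries of the Creative Commons
# license elements. The wording is ours, not official license text.
CC_ELEMENTS = {
    "BY": "attribution required",
    "SA": "derivatives must be shared under the same license",
    "NC": "non-commercial use only",
    "ND": "no derivatives may be distributed",
}


def describe_license(code: str) -> list[str]:
    """Return the conditions implied by a code such as 'CC-BY-NC-ND'.

    describe_license is a hypothetical helper written for this guide.
    """
    normalized = code.strip().upper()
    if normalized == "CC0":
        # CC0 is a public domain dedication: no conditions at all.
        return []
    parts = normalized.split("-")
    if parts[0] != "CC" or len(parts) < 2:
        raise ValueError(f"not a Creative Commons code: {code!r}")
    unknown = [p for p in parts[1:] if p not in CC_ELEMENTS]
    if unknown:
        raise ValueError(f"unknown license element(s): {unknown}")
    return [CC_ELEMENTS[p] for p in parts[1:]]


if __name__ == "__main__":
    for example in ("CC0", "CC-BY", "CC-BY-NC-ND"):
        print(example, "->", describe_license(example) or "no conditions")
```

The more elements a code carries, the more restrictive the license, which is one way to picture the spectrum described above.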

Finding a repository

Publishing Data

There are many more repositories than we could list here, so we'll include our institutional repository and some up-to-date aggregators of repositories that can help you search for the right repository in your field:

  • OAD list of data repositories: a list of repositories and databases for open data maintained by the Open Access Directory.
  • re3data: a repository finder that can help you find an appropriate repository to deposit your research data.
  • NYU UltraViolet (UV): our research repository at NYU. Get a permanent home and a DOI for any of your research materials.
  • Open Science Framework (OSF): we have an institutional membership with the OSF, a project management tool that can also be used to publish data and code. You can make a registration of an OSF project to make a "read-only, frozen" copy of it that is assigned a DOI for sharing and citation.

Here are some resources to help you pick a license for data:

Publishing Code

Your options for publishing research code are somewhat more limited. There are:

Here are some resources to help you pick a license for code: