Data Science

A guide with resources for the data science community on campus.

FIND CODE

In your data science work, you will likely want to look for code that performs common operations, such as converting data from CSV to JSON. You can absolutely use code that you find on the Internet, as long as you cite it. Here are the steps for finding code online that you can use:

  1. Plan your search and identify the following:
    1. What platform does the code need to work with?
  2. Scour different hosting platforms (GitHub, GitLab, Bitbucket), software publications (e.g., the Journal of Open Source Software, JOSS), and awesome-lists. For each candidate, ask:
    1. What data does it handle?
    2. Is it written in a language you understand?
    3. Does it have good enough documentation to use if you get stuck?
  3. Cite accordingly!
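
As an illustration of the "cite accordingly" step, here is a minimal sketch of the CSV-to-JSON operation mentioned above, with a citation comment at the top. The function name `csv_to_json`, the sample data, and the citation fields are placeholders chosen for this example, not a required format; fill in the details of the code you actually reuse, and follow any citation style your course or publisher asks for.

```python
# Citation header (placeholder fields, fill in for the code you reuse):
#   Source:    <URL of the repository or page where you found the code>
#   Author:    <author or organization>
#   License:   <license named in the repository>
#   Retrieved: <date you copied the code>
#   Changes:   <short note on what you modified, if anything>
import csv
import io
import json


def csv_to_json(csv_text: str) -> str:
    """Convert CSV text (first row = column names) to a JSON array of objects."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Each CSV row becomes one JSON object; all values stay strings.
    return json.dumps(list(reader), indent=2)


if __name__ == "__main__":
    sample = "name,year\nAda,1843\nGrace,1952\n"  # made-up sample data
    print(csv_to_json(sample))
```

Keeping the citation next to the code means it travels with the file if you share or publish your project.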

If you can't find what you are looking for in the sources listed above, you might consider the following:

  1. Think about who might publish code that you could use:
    1. Academic researchers or labs?
    2. Government agencies?
    3. A nonprofit/nongovernmental organization?
    4. A private business or industry group?
  2. Once you know that the code you want exists:
    1. Is it openly available for you to use?
    2. Can it be requested directly from the original authors?
  3. Does it have a license that allows you to use and edit the code?
    1. Does the license let you republish any changes you make?
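
One way to start on the license questions above is to check whether a copy of the code you downloaded includes a license file at all. The sketch below assumes the license sits in the top-level folder under one of a few commonly used file names; the name list and the function `find_license_files` are assumptions made for this example, and some projects state their license elsewhere (for example, in a README). Finding a file only tells you where to look; you still need to read the license to see what it permits.

```python
from pathlib import Path

# File names commonly used for license texts (an assumed, non-exhaustive list).
COMMON_LICENSE_NAMES = {
    "license",
    "license.txt",
    "license.md",
    "copying",
    "copying.txt",
}


def find_license_files(repo_dir):
    """Return the names of likely license files in the top level of repo_dir."""
    root = Path(repo_dir)
    return sorted(
        p.name
        for p in root.iterdir()
        if p.is_file() and p.name.lower() in COMMON_LICENSE_NAMES
    )


if __name__ == "__main__":
    # Check the current folder; an empty list means no file matched.
    print(find_license_files("."))
```

If the list comes back empty and no license is stated anywhere else, ask the authors before reusing or republishing the code.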

Sources to find code