
Research Software: Designing for Publication and Reproducibility

How to prepare scholarly code for submission to journals or repositories.

But I Am Still Learning To Code!

A photo from the 1940s shows 2 rows of white women sitting at desks and wearing headsets. Each one has a book propped open behind a telegraph machine, and is taking notes with a pen and paper.

Image credit: WACs Learning Code. War Department. Army Service Forces. Office of the Chief of Transportation. 3/12/1943-6/11/1946. Retrieved from the Digital Public Library of America.

 

Some scholars love to create software. They arrived at NYU with extensive coding experience, they planned to do computational research, and they hope to continue making software after they leave. Other scholars end up creating software out of necessity. Maybe you came to NYU thinking you'd be interviewing people about economic inequality, figuring out how to restore a polluted waterway, or studying genetics, and suddenly you had to write scripts to analyze large data sets and generate visualizations for your analyses. You've had to pick up your software skills ad hoc, and now, faced with a data sharing mandate, you're being asked to make your code public. On top of that, perhaps you are a grad student who already feels overburdened and vulnerable to scrutiny.

Here's why, all those factors notwithstanding, we still encourage you to publish your code, particularly if it relates to conclusions you plan to publish or present. 

  • First of all, no one writes perfect, well-formatted, completely flawless code. As the software engineer Nick Barnes says in a Nature op-ed (linked below), "software in all trades is written to be good enough for the job intended. So if your code is good enough to do the job, then it is good enough to release — and releasing it will help your research and your field."

 

  • You can start small. A short README with installation instructions, a license, a list of dependencies, and a script that runs are not everything we recommend, but they are all it takes to make software shareable.

 

  • The world at large will probably not flock to your scientific code repo. The people who understand your research well enough to seek out and read your code are usually not looking for errors; they want to use it as a resource.

 

  • Publishing your code is a sign of your commitment to the values of open scholarship and access.

 

  • Publishing your code means that you will be able to find it again. If you have code on GitHub or Zenodo and you change jobs or your computer breaks, you can still recover your work. A published copy also counts as the off-site copy in the 3-2-1 rule for backup (three copies, on two kinds of storage, with one off-site), as described in our Data Management Guide.
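
To make the "start small" advice concrete, here is a minimal sketch in Python that assembles a short README with the pieces named above: installation instructions, a list of dependencies, and a license. The function name build_readme, the project name, and every value passed to it are hypothetical placeholders rather than a required format, and writing a README by hand works just as well.

```python
from pathlib import Path


def build_readme(title, description, install_steps, dependencies, license_name):
    """Return the text of a minimal README with the basics a reader needs."""
    lines = [title, "=" * len(title), "", description, ""]
    lines += ["Installation", "------------"]
    # Number the installation steps so readers can follow them in order.
    lines += [f"{i}. {step}" for i, step in enumerate(install_steps, start=1)]
    lines += ["", "Dependencies", "------------"]
    lines += [f"- {dep}" for dep in dependencies]
    lines += ["", "License", "-------", license_name, ""]
    return "\n".join(lines)


if __name__ == "__main__":
    # Every value below is a placeholder, not a recommendation.
    text = build_readme(
        title="waterway-analysis",
        description="Scripts that produce the figures for a hypothetical study.",
        install_steps=[
            "Install Python 3.10 or later.",
            "Run: pip install -r requirements.txt",
        ],
        dependencies=["pandas", "matplotlib"],
        license_name="MIT License (see the LICENSE file).",
    )
    Path("README.md").write_text(text, encoding="utf-8")
```

Running the script writes a README.md file into the current folder, which you can then edit by hand.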

For specific trainings and classes, please see the Data Services calendar. We are also always willing to discuss other reasons why you feel hesitant to publish your code, so we can better understand how we at the Libraries can support scholars working on computational research.

Resources

Using Version Control

Our number one piece of advice as you embark on a research software project, even if you are taking over a project from someone else, is to implement version control. If you are unfamiliar with Git and Git hosting platforms like GitHub, GitLab, and Bitbucket, you will face a slight learning curve. Luckily, Data Services regularly offers GitHub classes and has an excellent guide to using version control systems. Using a version control system:

  • allows you to see what changes were made, when they were made, and who made them;
  • lets you create branches, so you can work on and experiment with aspects of the project without affecting the "main" project until you're ready;
  • creates a space for communication with collaborators; 
  • serves as a backup system, so if you leave your lab or something happens to your computer, you and others will be able to continue developing the code;
  • makes it easy to share your work on other scholarly platforms, including OSF, figshare, and Zenodo; 
  • greatly reduces the risk of accidentally and permanently deleting the entire project;
  • can help minimize frustration

and more! 

It's important to note that Git (a software tool for versioning files) and GitHub (a cloud-based hosting service, owned by Microsoft, that uses Git) are not the same. You can use Git with a different cloud-based service, or entirely locally on your own computer. Also, not everything you make using Git or GitHub is necessarily public right away. If you feel uncomfortable letting others see evidence of the early stages of your work, you can keep your repository private, still get the benefits of version control, and make it public later on.
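
As a rough illustration of using Git locally, with no GitHub account or internet connection involved, the sketch below uses Python to run a few standard Git commands (init, add, commit, log) in a folder on your own computer. It assumes Git is already installed. The helper names git and make_local_history, the file name analysis.py, and the name and email address are hypothetical placeholders.

```python
import subprocess
import tempfile
from pathlib import Path


def git(repo, *args):
    """Run one git command inside `repo` and return its standard output."""
    result = subprocess.run(
        ["git", *args], cwd=repo, check=True, capture_output=True, text=True
    )
    return result.stdout


def make_local_history(repo):
    """Create a repository with two commits, entirely on this computer."""
    repo = Path(repo)
    git(repo, "init")
    # The identity is set for this repository only; both values are placeholders.
    git(repo, "config", "user.name", "Example Researcher")
    git(repo, "config", "user.email", "researcher@example.org")

    script = repo / "analysis.py"  # hypothetical file name
    script.write_text("print('first draft')\n", encoding="utf-8")
    git(repo, "add", "analysis.py")
    git(repo, "commit", "-m", "Add first draft of analysis script")

    script.write_text("print('revised draft')\n", encoding="utf-8")
    git(repo, "add", "analysis.py")
    git(repo, "commit", "-m", "Revise analysis script")

    # One line per commit, newest first.
    return git(repo, "log", "--oneline").splitlines()


if __name__ == "__main__":
    # A temporary folder keeps the demonstration away from your real files.
    with tempfile.TemporaryDirectory() as folder:
        for entry in make_local_history(folder):
            print(entry)
```

Each printed line is one saved version of the project, listed newest first. Nothing in this example leaves your computer; sharing the repository on a hosting platform is a separate, optional step.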

If the idea of using Git stresses you out, try another tool that has some ability to track history and changes, such as Google Colab.

 

Resources