
Research Data Management

Information on best practices and standards for data and code management.

FILE ORGANIZATION

File naming, when done in a well-organized fashion, can contribute to project documentation, workflow organization, and sharing. Moreover, certain choices in file naming are essential to accessing and sharing files across computing systems.

Projects often develop over the course of many years, and usually involve periodic work interrupted by spans of inactivity. To ensure that naming conventions are understood months or years after they are initially conceived, include a readme.txt file or some kind of file manifest (in a plain text or other sustainable format) in your directory that explains the contents of files and the naming system developed.

ORGANIZING PROJECT FOLDERS

Sometimes people try to document their data through file paths. In the image below, on the right, you can tell that the researchers wanted to record the date, time, and quality of the experiment. However, instead of writing documentation, they encoded that information in the file path.

But what happens when they want to send someone else 0034tz.tiff? The recipient will have no access to the file path or to any of the context embedded within it. This is a huge problem!

An image showing a table of contents pointing to specific files on the left and a nest of folders on the right
Instead of putting your documentation in your file path, it is more effective to put all that contextual information in a README file, which is simply a document that describes the files in a folder. Listing your files alongside their documentation there, rather than encoding it in a file path, makes your data easier to understand, both for you and for anyone you share it with.
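As a rough illustration of the README approach, the sketch below lists every file in a folder next to a short description. The function name `build_readme`, the README layout, and the sample description are all hypothetical; adapt them to whatever convention your project uses.

```python
from pathlib import Path


def build_readme(folder: Path, descriptions: dict[str, str]) -> str:
    """Return README text listing each file in `folder` with a description.

    `descriptions` maps file names to notes; files without a note get a
    TODO marker so missing documentation is easy to spot.
    """
    lines = [f"README for {folder.name}", "", "Files:"]
    for path in sorted(folder.iterdir()):
        # Skip subfolders and the README itself.
        if not path.is_file() or path.name == "README.txt":
            continue
        note = descriptions.get(path.name, "TODO: describe this file")
        lines.append(f"  {path.name}: {note}")
    return "\n".join(lines) + "\n"
```

A possible use is `Path("README.txt").write_text(build_readme(Path("."), {"0034tz.tiff": "hypothetical description of the image"}))`, which stores the context in the folder rather than in the path.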

It’s useful, then, to keep a standard way of organizing your projects to help avoid the nested-folder rabbit hole. I’ve found the following layout to be one of the most helpful across different domains of research:

  • Put each project in its own directory, which is named after the project and perhaps prefixed with the date the project started, in YYYY-MM-DD format.
  • Put text documents and relevant supplementary documentation associated with the project in the docs folder.
  • Put raw data and metadata in the data folder (which should be read-only; do not change your raw data directly!).
  • Put files generated during cleanup and analysis (like processed data or visualizations) in the results folder.
  • Put source for the project’s scripts and programs in the src folder.

Visually, it would look like this:

An image that shows a folder structure for research projects
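The layout described above can be sketched in a few lines of Python. This is a minimal example, not a prescribed tool: the function name `create_project` and the underscore between the date and the project name are assumptions, while the four subfolder names come from the list above.

```python
from datetime import date
from pathlib import Path

# Subfolders named in the guide's list.
SUBFOLDERS = ("docs", "data", "results", "src")


def create_project(root: Path, name: str, started: date) -> Path:
    """Create <root>/<YYYY-MM-DD>_<name>/ with the standard subfolders."""
    project = root / f"{started.isoformat()}_{name}"
    for sub in SUBFOLDERS:
        # exist_ok=True makes the call safe to repeat on an existing project.
        (project / sub).mkdir(parents=True, exist_ok=True)
    return project
```

Making the data folder read-only is left to your operating system's permission settings, since the right mechanism differs between platforms.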

FILE NAMING BEST PRACTICES

Naming your files consistently is one low-hanging RDM fruit that will really help you in your research projects. Certain choices in file naming are essential to accessing and sharing files across different types of computer environments.

You should follow these practices as you implement a file naming convention for your project:

  • Prefix your files with the date created using a YYYY-MM-DD format
  • Avoid special characters like &, %, $, #, @, and *. Just use letters and numbers.
  • Do not make file identity dependent on capitalization unless implementing camel case (e.g. fileName.xml).
  • Never use spaces in filenames – many systems and software will not recognize them or will give errors unless such filenames are treated specially. Use an underscore _ instead of a space.
  • Use short file names, both for your own sake and for the sake of systems that will fail when given a file name of, say, 50 characters.

It’s the difference between VS_IMG%Archive2&3 Jan 2018.tiff and 2018-01-04_VS-Archive2-3.tiff. The second is far easier to understand later on. Another hint: you don’t want all the metadata about your files in the file name, because the name then becomes long and unwieldy.
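The practices above can be applied mechanically. The sketch below is one possible cleanup routine, assuming that hyphens and underscores are kept as separators (as in the example name above) and that everything else besides letters and numbers is dropped. The function name `clean_name` is hypothetical.

```python
import re
from datetime import date


def clean_name(stem: str, created: date, extension: str) -> str:
    """Build a date-prefixed file name with no spaces or special characters."""
    # Replace each run of whitespace with a single underscore.
    stem = re.sub(r"\s+", "_", stem.strip())
    # Drop anything that is not a letter, digit, underscore, or hyphen.
    stem = re.sub(r"[^A-Za-z0-9_-]", "", stem)
    return f"{created.isoformat()}_{stem}.{extension.lstrip('.')}"


print(clean_name("VS IMG%Archive2&3", date(2018, 1, 4), "tiff"))
# prints 2018-01-04_VS_IMGArchive23.tiff
```

A routine like this only removes problem characters. Choosing a short, meaningful name is still up to you.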

Other practices to keep in mind:

  • Use 001, 002, 003 instead of 1, 2, 3 to help sort and search through the data more effectively.
  • Choose file names that are recognizable to humans and that make sense within the project environment by including information such as:

    • Name of creator (say, in a collaboratively built project)

    • Date of creation

    • Version number (avoid terms like "final" or "latest," since file versions are usually not final)

    • Descriptive term for object referenced by the file (a text title, a specimen name, a geographical location, a scientific instrument type)
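To see why zero-padding matters and how the elements above might be combined, here is a small sketch. The function name `build_name`, the field order, and the `v02`-style version tag are assumptions chosen for illustration, not a standard.

```python
from datetime import date


def build_name(created: date, description: str, creator: str,
               version: int, sequence: int, extension: str) -> str:
    """Combine date, descriptive term, creator, version, and a padded sequence number."""
    return (f"{created.isoformat()}_{description}_{creator}"
            f"_v{version:02d}_{sequence:03d}.{extension}")


# Unpadded numbers sort alphabetically, which puts 10 before 2.
unpadded = sorted(f"scan_{n}.tiff" for n in (1, 2, 10))
# Padded numbers sort in numeric order.
padded = sorted(f"scan_{n:03d}.tiff" for n in (1, 2, 10))

print(unpadded)  # ['scan_1.tiff', 'scan_10.tiff', 'scan_2.tiff']
print(padded)    # ['scan_001.tiff', 'scan_002.tiff', 'scan_010.tiff']
```

Including every element makes names long, so in practice you would probably pick only the fields your project needs and record the rest in the README.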

Original work in this LibGuide is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.