
Data Management Planning: File Format Selection

Information on best practices and standards for data management planning.

FILE FORMAT SELECTION

General Information

NYU Data Services, NYU Libraries and Information Technology

THE IMPORTANCE OF FILE FORMAT SELECTION

Ideally, the file formats used in a project should be standard, non-proprietary, and open. Where that is not possible, choose formats with the stated preferences of a digital archive in mind, or favor the format with the greatest stability, the longest history of use, the most widespread community, and the best-organized governing body for standards.

Analysis software often relies on proprietary file formats that become obsolete as new versions are released or as tools lose relevance. Where possible, export data files to stable formats for long-term preservation, or convert proprietary files into equivalent standardized files that can represent the same data for preservation purposes.
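As an illustration of exporting to a stable format, the following is a minimal sketch in Python using only the standard library. It assumes the data have already been read out of the analysis software as a list of dictionaries; the function name `export_for_preservation` and the `_columns.txt` sidecar file are hypothetical choices made for this example, not part of any archive's requirements.

```python
import csv
from pathlib import Path


def export_for_preservation(records, out_dir, stem):
    """Write records (a list of dicts) to a UTF-8 CSV file plus a plain-text column list.

    Hypothetical helper for illustration only.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Collect column names in first-seen order so the CSV header is stable.
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)

    # newline="" lets the csv module control line endings itself.
    csv_path = out_dir / f"{stem}.csv"
    with csv_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(records)

    # A sidecar text file lists the columns without requiring special software.
    columns_path = out_dir / f"{stem}_columns.txt"
    columns_path.write_text("\n".join(fieldnames) + "\n", encoding="utf-8")
    return csv_path, columns_path
```

Calling `export_for_preservation(rows, "archive", "survey")` would write `archive/survey.csv` and `archive/survey_columns.txt`. Values missing from a record are written as empty cells, so it is worth documenting separately how missing data were coded in the original file.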

CRITERIA FOR CHOOSING A FILE FORMAT

Common File Types for Top-Level Digital Preservation and Archiving

Text

  • XML (.xml)

  • HTML (.html, .htm)

  • OpenDocument Format (e.g. OpenDocument Text, .odt)

  • Plain text (.txt)

  • Markdown and other human-readable markup languages deploying plain-text editing

Tabular

Media

  • Uncompressed TIFF (.tif)

  • JPEG 2000 (.jp2 for still images, .mj2 for Motion JPEG 2000)

  • MPEG-4 (.mp4)

  • Free Lossless Audio Codec (.flac)

Geospatial

  • ESRI Shapefiles and supporting files (.shp, .shx, .dbf, .prj, .sbx, .sbn)

  • KML (.kml)

  • GML (.gml)

  • GeoTIFF (.tif, .tfw)

Resources for File Format Conversion

File Types for Sharing, Long-Term Preservation, and Mid-Level Preservation

Text

  • PDF/A

Statistical

  • SPSS portable format (.por)

  • R file formats, i.e. script files (.R), data files (.Rda, .RData), or R Markdown files (.Rmd)

  • Stata file formats, i.e. do-files (.do) and data files (.dta)

  • SAS file formats (.sas, .xpt, etc.)

Media

  • JPEG (.jpeg, .jpg)

  • MP3 (.mp3)

  • Photoshop files (.psd)

Geospatial

Encoding

Where the file format allows, encode text using Unicode (UTF-8 or UTF-16), or using the older ASCII character set, which has been incorporated into Unicode.
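The sketch below shows one way to check whether a text file is already valid UTF-8 and to re-encode it if it is not. It is written in Python with the standard library only; the function names are hypothetical, and it assumes you already know the file's original encoding, because that generally cannot be detected reliably from the bytes alone.

```python
from pathlib import Path


def is_valid_utf8(path):
    """Return True if the file's bytes decode cleanly as UTF-8."""
    try:
        Path(path).read_bytes().decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def transcode_to_utf8(src, dst, source_encoding):
    """Re-encode a text file as UTF-8.

    source_encoding must be supplied by the caller (an assumption of this
    sketch). Some encodings, such as latin-1, accept any byte sequence, so a
    wrong guess may produce garbled text instead of raising an error.
    """
    text = Path(src).read_bytes().decode(source_encoding)
    Path(dst).write_bytes(text.encode("utf-8"))
```

Because ASCII is a subset of UTF-8, a pure ASCII file passes `is_valid_utf8` unchanged and needs no conversion. After transcoding any other file, open the result and spot-check accented or non-Latin characters before discarding the original.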