
Data Management Planning: File Format Selection

Information on best practices and standards for data management planning.

FILE FORMAT SELECTION

Ideally, file formats for a project should be standard, non-proprietary, and open. If that is not possible, choose formats with the stated preferences of your intended digital archive in mind, or favor the format with the greatest stability, the longest history of use, the most widespread user community, and the best-organized standards body.

Analysis software often relies on proprietary file formats that become obsolete as new versions are released or tools fall out of use. Where possible, export data files to stable formats for long-term preservation, or convert proprietary files into equivalent standardized formats that can represent the same data.
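One way to find files that may need exporting or converting is to audit a project folder by file extension. The sketch below is illustrative only: the `PREFERRED` and `ACCEPTABLE` sets are a partial selection based on the format lists in this guide, and the function names `classify` and `audit` are hypothetical. Extensions alone cannot confirm a format (for example, a `.pdf` file may or may not be PDF/A), so anything outside the two sets is simply flagged for manual review.

```python
from collections import Counter
from pathlib import Path

# Illustrative, partial extension sets (an assumption; adjust to the
# preferences of the archive you plan to deposit with).
PREFERRED = {".xml", ".html", ".htm", ".odt", ".txt", ".md",
             ".tif", ".tiff", ".flac", ".kml", ".gml"}
ACCEPTABLE = {".por", ".r", ".rda", ".rdata", ".rmd", ".do", ".dta",
              ".sas", ".xpt", ".jpg", ".jpeg", ".mp3", ".psd"}


def classify(path):
    """Return 'preferred', 'acceptable', or 'review' based on the extension."""
    ext = Path(path).suffix.lower()
    if ext in PREFERRED:
        return "preferred"
    if ext in ACCEPTABLE:
        return "acceptable"
    # Unknown or ambiguous extension: a person should look at this file.
    return "review"


def audit(root):
    """Count the files under `root` (recursively) in each category."""
    counts = Counter()
    for p in Path(root).rglob("*"):
        if p.is_file():
            counts[classify(p)] += 1
    return counts


if __name__ == "__main__":
    # Audit the current working directory and print a short summary.
    for category, n in sorted(audit(".").items()):
        print(f"{category}: {n}")
```

Files reported as `review` are candidates for export to one of the stable formats listed in this guide.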



CRITERIA FOR CHOOSING A FILE FORMAT

Common File Types for Top-Level Digital Preservation and Archiving

Text

  • XML (.xml)

  • HTML (.html, .htm)

  • OpenDocument Format (e.g. OpenDocument Text, .odt)

  • Plain text (.txt)

  • Markdown and other human-readable markup languages deploying plain-text editing

Tabular

Media

  • Uncompressed TIFF (.tif)

  • JPEG 2000 (.jp2 for still images; .mj2 for Motion JPEG 2000 video)

  • MPEG-4 (.mp4)

  • Free Lossless Audio Codec (.flac)

Geospatial

  • ESRI Shapefiles and supporting files (.shp, .shx, .dbf, .prj, .sbx, .sbn)

  • KML (.kml)

  • GML (.gml)

  • GeoTIFF (.tif, .tfw)
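Because a shapefile is a set of files rather than a single file, it is worth checking that the supporting files travel with the `.shp` before deposit. The following is a minimal sketch, not an official validation tool: the function name `missing_sidecars` is hypothetical, and the split into required files (`.shp`, `.shx`, `.dbf`) and a recommended projection file (`.prj`) is an assumption about what an archive will want. It checks only for the presence of files, not their contents.

```python
from pathlib import Path

# Files every shapefile needs to be readable.
REQUIRED = (".shp", ".shx", ".dbf")
# Strongly recommended: without .prj the coordinate system is unknown.
RECOMMENDED = (".prj",)


def missing_sidecars(shp_path):
    """Report which companion files are absent next to a .shp file."""
    base = Path(shp_path)
    # Collect extensions of all files in the same folder sharing the base name.
    present = {
        p.suffix.lower()
        for p in base.parent.iterdir()
        if p.is_file() and p.stem == base.stem
    }
    return {
        "required": [ext for ext in REQUIRED if ext not in present],
        "recommended": [ext for ext in RECOMMENDED if ext not in present],
    }
```

For example, calling `missing_sidecars("data/roads.shp")` on a folder that holds only `roads.shp` and `roads.dbf` would list `.shx` as a missing required file and `.prj` as a missing recommended file.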

Resources for File Format Conversion

File Types for Sharing, Long-Term Preservation, and Mid-Level Preservation

Text

  • PDF/A

Statistical

  • SPSS portable format (.por)

  • R file formats, i.e. script files (.R), data files (.Rda, .RData), or R Markdown files (.Rmd)

  • Stata file formats, i.e. do-files (.do) and data files (.dta)

  • SAS file formats (.sas, .xpt, etc.)

Media

  • JPEG (.jpeg, .jpg)

  • MP3 (.mp3)

  • Photoshop files (.psd)

Geospatial

 

Encoding

Where the file format allows, text should be encoded in Unicode (UTF-8 or UTF-16), or in ASCII, the older standard that Unicode incorporates as a subset.
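As an illustration, a text file in an older encoding can be re-saved as UTF-8 with a few lines of Python. This is a sketch under stated assumptions: the function names `convert_to_utf8` and `is_valid_utf8` are hypothetical, and the source encoding must already be known, since it cannot be detected reliably from the bytes alone. Passing the wrong source encoding may silently produce incorrect characters.

```python
def convert_to_utf8(src, dst, source_encoding):
    """Re-encode a text file as UTF-8; the source encoding must be supplied."""
    # newline="" preserves the original line endings unchanged.
    with open(src, "r", encoding=source_encoding, newline="") as f:
        text = f.read()
    with open(dst, "w", encoding="utf-8", newline="") as f:
        f.write(text)


def is_valid_utf8(path):
    """Return True if the file's bytes decode cleanly as UTF-8."""
    with open(path, "rb") as f:
        data = f.read()
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

A pure ASCII file already passes `is_valid_utf8`, because every ASCII file is also valid UTF-8. Write the converted copy to a new path and keep the original until the result has been checked by eye.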