
Research life cycle: 2. Research phase: File formats

The importance of open file formats

Digital data is stored in file formats. Researchers often simply use the default format of the software they work with. However, this does not guarantee that the file content can still be opened, or displayed in the same way, in the future.
Reasons why data may become unreadable in the future:

  • Formats may be dependent on particular software.
  • Software (as well as hardware) can become obsolete.
  • Software may support only certain versions of a format.
  • Some format features may only work in the software that created the file, or even only in a specific version of that software.
  • Files may depend on expensive or proprietary software that not everyone can access.

To reduce the risk of files becoming obsolete, and to keep their essential characteristics accessible and usable, you can take several precautions. One of the most effective is to save your data in file formats that are likely to remain usable for many years.

In practice, this means choosing an open file format, i.e. a format that is publicly documented and does not depend on particular software. This considerably improves the chances that your research data will remain usable in the long term.
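As a small illustration of saving data in an open format, the Python sketch below writes a table to a CSV file with UTF-8 (Unicode) encoding and reads it back using only the standard library. The function names `export_to_csv` and `read_back`, the file name `measurements.csv` and the sample rows are hypothetical; adapt them to your own data.

```python
import csv
from pathlib import Path


def export_to_csv(rows, header, path):
    """Write a table to a UTF-8 encoded CSV file (an open, software-independent format)."""
    path = Path(path)
    # newline="" lets the csv module control line endings, as its documentation recommends
    with path.open("w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return path


def read_back(path):
    """Read the CSV again with plain Python to confirm it opens without the original software."""
    with Path(path).open(encoding="utf-8", newline="") as f:
        return list(csv.reader(f))


if __name__ == "__main__":
    # Hypothetical example data; the non-ASCII city name shows why Unicode (UTF-8) matters
    header = ["participant", "age", "city"]
    rows = [["P01", 34, "Maastricht"], ["P02", 29, "São Paulo"]]
    out = export_to_csv(rows, header, "measurements.csv")
    print(read_back(out))
```

Note that every value comes back as text (the age 34 is read as the string "34"). CSV does not store variable types, labels or units, so these need to be documented separately, for example in the setup (.txt) file that accompanies data (.csv) in the preferred formats list.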

Preferred formats

These formats offer the best long-term guarantee of usability, accessibility and sustainability.

Type: preferred format(s)

Text documents: PDF/A (.pdf)
Plain text: Unicode text (.txt)
Spreadsheets:
  • ODS (.ods)
  • CSV (.csv)
Statistical data:
  • SPSS portable (.por)
  • SPSS (.sav)
  • STATA (.dta)
  • DDI (.xml)
  • data (.csv) + setup (.txt)
Images (raster):
  • JPEG (.jpg) (.jpeg)
  • TIFF (.tif) (.tiff)
  • PNG (.png)
  • JPEG 2000 (.jp2)
Images (vector): SVG (.svg)
Images (georeferenced): GeoTIFF (.tif) (.tiff)
Audio:
  • WAVE (.wav)
  • BWF (.wav)
  • FLAC (.flac)
Video:
  • MPEG-2 (.mpg) (.mpeg)
  • MPEG-4 H.264 (.mp4)
  • Lossless AVI (.avi)
  • QuickTime (.mov)
Markup language:
  • XML (.xml)
  • HTML (.html) (.xhtml)
  NB: only if valid and complete. If necessary, include related files: (.css) (.xslt) (.js) (.es)
Databases:
  • SQL (.sql)
  • SIARD (.siard)
  • tables exported from the database (.csv)
Computer Aided Design (CAD): AutoCAD DXF version R12 (.dxf)
3D:
  • Wavefront Object (.obj)
  • X3D (.x3d)
RDF: W3C standards
Geographic information (GIS):
  • GML (.gml)
  • MIF/MID (MapInfo Interchange Format) (.mif) (.mid)
Raster GIS: ASCII GRID (.asc) (.txt)
Computer Assisted Qualitative Data Analysis (CAQDAS): REFI-QDA (.qdpx)

Acceptable formats

These formats are widely used alongside the preferred formats, but score only moderately to reasonably well on long-term usability, accessibility and robustness.

Type: acceptable format(s)

Text documents:
  • ODT (.odt)
  • MS Word (.doc) (.docx)
  • RTF (.rtf)
  • PDF (.pdf)
Plain text: Non-Unicode text (.txt)
Spreadsheets:
  • MS Excel (.xls) (.xlsx)
  • PDF/A (.pdf)
  • OOXML (.docx) (.docm)
Statistical data: SAS (.sas7bdat) (.sd2) (.tpt)
Images (raster): DICOM (.dcm)
Images (vector):
  • Illustrator (.ai)
  • EPS (.eps)
Images (georeferenced): TIFF World File (.tfw) (.tif)
Audio:
  • AIFF (.aif) (.aiff)
  • MP3 (.mp3)
  • AAC (.aac) (.m4a)
Video: MKV (.mkv)
Markup language: SGML (.sgml)
Databases:
  • MS Access (.mdb) (.accdb) (version 2000 or later)
  • dBase (.dbf) (version 7 or later)
  • HDF5 (.hdf5) (.he5) (.h5)
Computer Aided Design (CAD): AutoCAD, other versions (.dwg) (.dxf)
3D:
  • COLLADA (.dae)
  • Autodesk FBX (.fbx)
RDF: (none listed)
Geographic information (GIS):
  • ESRI shapefiles (.shp and associated files)
  • MapInfo (.tab and associated files)
  • KML (.kml)
Raster GIS: ESRI GRID (.grd and associated files)
Computer Assisted Qualitative Data Analysis (CAQDAS):
  • ATLAS.ti copy bundle
  • NVivo project file
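To get a quick overview of which files in a project folder already use a recommended format, the two tables can be turned into a simple lookup by file extension. The Python sketch below is only a rough check under stated assumptions: it assumes a file's extension reflects its real format (which is not always true), it includes only a subset of the extensions listed, and the function name `classify` and its category labels are hypothetical. It also cannot check version conditions such as "version 2000 or later", and it flags extensions that the tables use for both a preferred and an acceptable format.

```python
from pathlib import Path

# Subset of the extensions in the "Preferred formats" table.
PREFERRED = {
    ".ods", ".csv", ".por", ".sav", ".dta", ".jpg", ".jpeg", ".tif", ".tiff",
    ".png", ".jp2", ".svg", ".wav", ".flac", ".mp4", ".mov", ".xml", ".html",
    ".sql", ".siard", ".obj", ".x3d", ".gml", ".asc", ".qdpx",
}

# Subset of the extensions in the "Acceptable formats" table.
ACCEPTABLE = {
    ".odt", ".doc", ".docx", ".rtf", ".xls", ".xlsx", ".dcm", ".ai", ".eps",
    ".aif", ".aiff", ".mp3", ".m4a", ".mkv", ".sgml", ".mdb", ".accdb",
    ".h5", ".dwg", ".dae", ".fbx", ".shp", ".kml",
}

# Extensions shared by a preferred and an acceptable format, so the extension
# alone cannot decide: PDF/A vs. PDF, Unicode vs. non-Unicode text,
# DXF R12 vs. other DXF versions.
NEEDS_CHECK = {".pdf", ".txt", ".dxf"}


def classify(filename: str) -> str:
    """Return 'preferred', 'acceptable', 'check' or 'unknown' based on the extension."""
    ext = Path(filename).suffix.lower()
    if ext in NEEDS_CHECK:
        return "check"
    if ext in PREFERRED:
        return "preferred"
    if ext in ACCEPTABLE:
        return "acceptable"
    return "unknown"


if __name__ == "__main__":
    # Hypothetical file names from a project folder
    for name in ["survey.sav", "interview.docx", "report.pdf", "notes.pages"]:
        print(f"{name}: {classify(name)}")
```

Files reported as "check" or "unknown" need a closer look, for example opening a .pdf to see whether it is PDF/A, or converting an unlisted format to one of the preferred formats.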

More information

The tables were retrieved from:

Data Archiving and Networked Services (DANS). (2015, September). Preferred formats [version 3.0]. The Hague: DANS.

On this website you will find additional information for each type of data:

www.zuyd.nl | Disclaimer | About Zuyd Library