Archiving research data

Research data must remain accessible, readable and usable over the long term. To be preserved digitally after publication, the digital objects must meet certain requirements.

For example, research data are created with different software and are therefore available in different file formats.
However, not all file formats are equally suited for digital preservation. As a general rule, digital preservation favours open, well-documented file formats in widespread use (e.g. CSV, XML, DOCX, TXT, PDF/A, …) over proprietary formats (e.g. XLS, DOC, …). Certain file formats have been declared suitable for long-term digital archiving (see table).
In some cases, digital preservation may require format migration or emulation of a format’s original system environment. Such measures may be necessary to keep the data technically interpretable and readable over the long term and to avoid losing information. ZB MED carries out both tasks.
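As a rough illustration of what a format migration can look like, the following Python sketch rewrites a delimited text export as comma-separated CSV and returns the number of rows written, so that the result can be compared with the source as a basic check against information loss. The function name `migrate_to_csv` and the semicolon default are assumptions made for this example; it does not describe the workflow used by ZB MED.

```python
import csv


def migrate_to_csv(source, target, delimiter=";"):
    """Copy delimited rows from `source` to `target` as comma-separated CSV.

    `source` and `target` are open text streams (hypothetical interface).
    Returns the number of rows written, which the caller can compare with
    the source as a simple sanity check.
    """
    reader = csv.reader(source, delimiter=delimiter)
    writer = csv.writer(target, lineterminator="\n")
    rows = 0
    for row in reader:
        # Fields containing commas or quotes are quoted by the writer,
        # so the cell contents survive the change of delimiter.
        writer.writerow(row)
        rows += 1
    return rows
```

For files on disk, both streams would typically be opened with `newline=""`, reading with the legacy encoding of the source (for example `encoding="latin-1"`) and writing with `encoding="utf-8"`.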

ZB MED uses the Rosetta system from Ex Libris as its technical infrastructure for digital preservation.

For more information, see our pages on digital preservation.

Recommended preservation formats for research data

| Type of data | File formats suitable for digital preservation | Standard, widespread file formats | Examples of sources and applications |
| --- | --- | --- | --- |
| Audio | AIFF (*.aiff, *.aif), Matroska (*.mka), MXF (*.mxf), WAVE (*.wav) | AAC (*.aac, *.m4a, *.mp4), AIFF (*.aiff, *.aif), BWF (*.bwf), FLAC (*.flac), Matroska (*.mka), MP3 (*.mp3), MXF (*.mxf), OGG (*.ogg), OPUS (*.opus), WAVE (*.wav) | Interviews, surveys |
| Biomaterial data | CSV (*.csv), TXT (*.txt), XML (*.xml) | CSV (*.csv), FASTA (*.fasta), FASTQ (*.fq, *.fastq), PDB (*.pdb, *.ent, *.brk), TXT (*.txt), XLS (*.xls), XML (*.xml) | DNA sequencers, mass spectrometers, microarrays, spectrophotometers |
| Classifications, thesauri, codes | PDF/A (*.pdf), XML (*.xml) | DOC (*.doc, *.docx), PDF (*.pdf), XML (*.xml) | Institutions |
| Databases | SQL (*.sql) | CSV (*.csv), HDF5 (*.hdf5, *.he5, *.h5), MS Access (*.mdb, *.accdb), dBase (*.dbf), SIARD (*.siard), SQL (*.sql) | Institutions |
| Geospatial data | GML (*.gml), MIF/MID (*.mif, *.mid) | ESRI Shapefiles (*.shp), GML (*.gml), KML (*.kml), MapInfo (*.tab), MID (*.mid), MIF (*.mif) | Vector and raster data |
| Image data | JPEG 2000 (*.jp2), PNG (*.png), SVG (*.svg), TIFF (*.tif, *.tiff) | DICOM (*.dcm), EPS (*.eps), GIF (*.gif), Illustrator (*.ai), JPEG 2000 (*.jp2), JPG (*.jpg, *.jpeg), PDF (*.pdf), PNG (*.png), STL (*.stl), SVG (*.svg), TIFF (*.tif, *.tiff) | Cameras, microscopes, MRI and CT scans, ultrasonic, X-ray and sonography instruments |
| Image data 3D | OBJ (*.obj, *.mod, in ASCII format), VRML (*.vrml, *.wrl), X3D (*.x3d) | COLLADA (*.dae), DXF (*.dxf), FBX (*.fbx), OBJ (*.obj, *.mod), PLY (*.ply), STL (*.stl), VRML (*.vrml, *.wrl), X3D (*.x3d) | 3D technologies such as stereolithography |
| Markup language | XML (*.xml) | HTML (*.html), SGML (*.sgml), XML (*.xml) | Websites |
| Sensor data | CSV (*.csv), PDF (*.pdf), TXT (*.txt) | CSV (*.csv), PDF (*.pdf), TXT (*.txt), XLS (*.xls, *.xlsx), XML (*.xml) | Thermal sensors, pressure sensors, polysomnography, ECG, EEG |
| Spreadsheets | CSV (*.csv) | CSV (*.csv), ODS (*.ods, *.odt, *.odg, *.odc, *.odf), OOXML (*.docx, *.docm), PDF/A (*.pdf), XLS (*.xls, *.xlsx) | Data from research, clinical care |
| Statistical data | CSV (*.csv), R (*.r) | CSV (*.csv), data (*.csv, *.txt), DDI (*.xml), R (*.r), SAS (*.7dat, *.sd2, *.tpt), SPSS (*.sav), SPSS Portable (*.por), STATA (*.dta) | Data from research, clinical care |
| Text files | PDF/A (*.pdf), TXT Unicode (*.txt, *.asc, *.c, *.h, *.cpp, *.m, *.py etc. in ASCII format), XML (*.xml) | DOC (*.doc, *.docx), ODT (*.odt), PDF (*.pdf), PowerPoint (*.ppt), RTF (*.rtf), TXT (*.txt) | Documentation, reports, findings, administrative data |
| Video | Matroska (*.mkv), MXF (*.mxf) | AVI (*.avi), Matroska (*.mka, *.mkv), MPEG-2 (*.mpg, *.mpeg, *.m2v, *.mpg2), MPEG-4 (*.mp4, *.m4a, *.m4v), MXF (*.mxf), QuickTime (*.mov, *.qt), Windows Media (*.wmv) | Cameras, CT scans, ultrasonic instruments |
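The table above can be used to pre-screen a data set before deposit. The following Python sketch copies a few rows of the "suitable for digital preservation" column into a lookup and flags files whose extension is not listed for the given data type. The names `PRESERVATION_FORMATS`, `is_preservation_format` and `needs_review` are hypothetical and chosen for this example only; the lookup is a partial copy of the table, not a complete or official list.

```python
from pathlib import Path

# Partial copy of the "suitable for digital preservation" column,
# keyed by data type. Extend with further rows of the table as needed.
PRESERVATION_FORMATS = {
    "Audio": {".aiff", ".aif", ".mka", ".mxf", ".wav"},
    "Databases": {".sql"},
    "Image data": {".jp2", ".png", ".svg", ".tif", ".tiff"},
    "Spreadsheets": {".csv"},
    "Video": {".mkv", ".mxf"},
}


def is_preservation_format(filename, data_type):
    """Return True if the file extension is listed as suitable for the data type."""
    suffix = Path(filename).suffix.lower()
    return suffix in PRESERVATION_FORMATS.get(data_type, set())


def needs_review(filenames, data_type):
    """Return the files whose extension is not listed as suitable for the data type."""
    return [name for name in filenames if not is_preservation_format(name, data_type)]
```

For example, `needs_review(["scan.TIF", "photo.jpg", "figure.png"], "Image data")` returns `["photo.jpg"]`. An extension check is only a first filter: PDF/A and ordinary PDF share the extension *.pdf, so such cases would need a format validator rather than a file-name test.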

Sources

ETH Zurich: suitable file formats for digital preservation
DANS: suitable file formats for digital preservation
DARIAH-DE (humanities): suitable file formats for digital preservation
Nestor Handbook: Digital Curation of Research Data: Experiences of a Baseline Study in Germany
Forschungsdaten-Info (in German)

Contact

Birte Lindstädt
Head of Research Data Management

Phone: +49 (0)221 478-97803

Uta Parmaksiz
Digital Preservation of Research Data

Phone: +49 (0)221 999 892 648

Related links

Digital preservation at ZB MED 
Metadata in digital preservation
OAIS