LibGuides: Research life cycle: 2. Research phase: Data documentation

Why have data documentation?

Under the FAIR principles (Findable, Accessible, Interoperable, Reusable), research data should be delivered as FAIR as possible. To make data (re-)usable by researchers who have not worked with it before, data documentation is essential. Clear and detailed documentation improves the quality of the data and increases the chance that others will understand the data, now and in the future. Metadata are a special form of data documentation.

The boundary between data, data documentation and metadata is a grey area. Certain data formats automatically embed metadata in the data file; digital photos, for example, record exposure, aperture, etc. Ultimately, what matters is not whether something is called data, metadata or data documentation, but the underlying goal: to describe the data in enough detail that the chance of reproducibility and re-use increases.

Source for this page: Essentials 4 Data Support by Research Data Netherlands

What to document?

The characteristics of a dataset are defined at different levels:

1. At data level: description of the data itself
Create a ReadMe text file that gives an overview of all files in the dataset, with a description
of the content of each file. This could include a description or explanation of:

- data format
- versions
- transcripts
- variables used and what they mean
- used/suitable software to read the data
- parameter settings
- software code used to edit and analyze the data
- licenses or restrictions placed on the data (terms of use)
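
The ReadMe overview described above can be drafted automatically and then completed by hand. The following is a minimal sketch, not a prescribed method: the function name `build_readme`, the layout of the text, and the "TODO" placeholder are all assumptions made for illustration.

```python
from pathlib import Path


def build_readme(dataset_dir, descriptions, license_text):
    """Return ReadMe text listing every file in dataset_dir with a description.

    descriptions maps a relative file path to a human-written description
    (hypothetical structure, chosen for this sketch).
    """
    root = Path(dataset_dir)
    lines = [f"ReadMe for dataset: {root.name}", "", "Files:"]
    # Walk the dataset folder and list each file once, in a stable order.
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root).as_posix()
        fmt = path.suffix.lstrip(".") or "unknown"
        desc = descriptions.get(rel, "TODO: describe this file")
        lines.append(f"- {rel} (format: {fmt}): {desc}")
    # Close with the terms of use so restrictions travel with the data.
    lines += ["", f"Terms of use: {license_text}"]
    return "\n".join(lines) + "\n"
```

Files without a description are flagged with a placeholder, so gaps in the documentation stay visible until someone fills them in.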

You can also embed data-level information in the data files themselves. For interviews, for example,
it is best to record the contextual and descriptive information of each interview at the beginning of
its file. For quantitative data, variable and value names can be embedded within the data file itself.
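
As an illustration of recording contextual information at the beginning of an interview file, the sketch below prepends a small header to a transcript. The function name, the separator lines and the example field names are hypothetical; use whatever fields your project has agreed on.

```python
def add_interview_header(transcript_text, context):
    """Return the transcript with a block of contextual information on top.

    context is a dict of field name -> value (hypothetical fields, e.g.
    interview ID, date, location); its order is kept in the header.
    """
    header_lines = ["--- Interview context ---"]
    for key, value in context.items():
        header_lines.append(f"{key}: {value}")
    header_lines.append("--- Transcript ---")
    return "\n".join(header_lines) + "\n" + transcript_text
```

For example, `add_interview_header(text, {"Interview ID": "INT-01", "Date": "2020-01-15"})` returns the same transcript with those two fields stated before the first line of the conversation.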

2. At project level: description of the data collection process
Explain here what the aims of the study are, what the research questions/hypotheses are, what
methodologies were used, what instruments and measures were used, etc.
The questions that your documentation should answer are:
- For what purpose were the data created?
- What does the dataset contain?
- How were the data collected?
- Which instruments have been used? (e.g. codebook, lab journal, questionnaire, diary, manual, etc.)
- Who collected the data and when?
- How were the data processed?
- Which manipulations, if any, were applied to the data?
- What were the quality assurance procedures?
- How can the data be accessed?

More information about data documentation for quantitative and qualitative data (part 1) and additional information per question (part 2) can be found on this website:

Data documentation: CESSDA training guide

3. Description of the changes to the dataset over time
A chronological account of how the research data have been moved and processed over time
(= data provenance). To create this, the descriptions from levels 1 and 2 are necessary.
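
One possible way to keep such a historical account is a simple log in which every processing step is recorded as it happens. The sketch below is an assumption about how this could look, not an established provenance standard; the field names (`step`, `action`, `performed_by`, `details`) are hypothetical.

```python
import json
from datetime import datetime, timezone


def record_step(log, action, performed_by, details=""):
    """Append one processing step to the provenance log and return it."""
    entry = {
        "step": len(log) + 1,  # running number of the step
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "action": action,
        "performed_by": performed_by,
        "details": details,
    }
    log.append(entry)
    return entry


def provenance_to_text(log):
    """Serialize the log as one JSON object per line, ready to save to a file."""
    return "\n".join(json.dumps(entry, sort_keys=True) for entry in log)
```

Starting from an empty list, each call to `record_step` adds a numbered, timestamped entry, so the order of collection, cleaning and analysis steps can be reconstructed later.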

Metadata

Metadata are 'data about data' and facilitate cataloguing and discovery of data. They are an important element in creating a FAIR data infrastructure: not only humans but also computers can read, interpret and combine metadata. Metadata can help to explain the purpose, origin, time, location, creator(s), terms of use, and access conditions of research data.

Metadata types:

- Descriptive metadata: the minimum requirement to find a (digital) object.
  e.g.: title, author, abstract, date
- Structural metadata: defines the relationship between individual objects that together form a whole.
  e.g.: links to related objects, such as an article written on the basis of the data
- Technical metadata: information on the technical aspects of a dataset.
  e.g.: data format, hardware/software used, calibration, version, authentication, encryption
- Administrative metadata: information on use, usage rights and management.
  e.g.: license, reason for embargo, waivers, search logs, user tracking

Metadata are assigned according to a metadata scheme: a set of individual metadata elements that can be used to describe data. There are many different metadata schemes, depending on the discipline, the archive or the platform.
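
To make the idea of a scheme concrete, the sketch below treats a scheme as a list of required elements per metadata type and checks a record against it. `EXAMPLE_SCHEME` is invented for this illustration and is not a real standard; an actual project would take its elements from the scheme prescribed by its discipline, archive or platform.

```python
# Hypothetical scheme: required elements per metadata type (not a real standard).
EXAMPLE_SCHEME = {
    "descriptive": ["title", "author", "abstract", "date"],
    "structural": ["related_objects"],
    "technical": ["data_format", "software"],
    "administrative": ["license"],
}


def missing_elements(record, scheme=EXAMPLE_SCHEME):
    """Return the scheme elements that are absent or empty in the record."""
    missing = []
    for metadata_type, elements in scheme.items():
        section = record.get(metadata_type, {})
        for element in elements:
            if not section.get(element):
                missing.append(f"{metadata_type}.{element}")
    return missing
```

A record that fills in every element yields an empty list; anything else is reported as `type.element`, which shows at a glance which part of the description is still incomplete.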