
Research life cycle: 2. Research phase: Data documentation

Why have data documentation?

Within the framework of the FAIR principles (Findable, Accessible, Interoperable, Reusable), research data should be delivered as FAIR as possible. Data documentation is essential for making data (re)usable by researchers who have not worked with them before. Clear and detailed documentation improves the quality of the data and makes it more likely that others will understand them, now and in the future. Metadata are a special form of data documentation.

The boundaries between data, data documentation and metadata are blurred. Some data formats automatically embed metadata in the data file; digital photos, for example, store exposure, aperture and similar settings. Ultimately, it is not a question of whether something is called data, metadata or data documentation, but of the underlying goal: to describe the data in enough detail to increase the chance of reproducibility and re-use.

Source for this page: Essentials 4 Data Support by Research Data Netherlands

What to document?

The characteristics of a dataset are defined at different levels:

1. At data level: description of the data itself
    Create a ReadMe text file in which you give an overview of all files of the dataset with a description
    of the content per file. This could be a description/explanation of:

  • data format
  • versions
  • transcripts
  • variables used and what they mean
  • used/suitable software to read the data
  • parameter settings
  • software code used to edit and analyze the data
  • licenses or restrictions placed on the data (terms of use)

    You can also embed data-level information in the data files themselves. For interviews, for example,
    it is best to record the contextual and descriptive information at the beginning of each interview file.
    For quantitative data, variable and value labels can be embedded within the data file itself.

2. At project level: description of the data collection process
    Explain here what the aims of the study are, what the research questions/hypotheses are, what
    methodologies were used, what instruments and measures were used, etc.
    The questions that your documentation should answer are:
    - For what purpose were the data created?
    - What does the dataset contain?
    - How were the data collected?
    - Which instruments have been used? (e.g. codebook, lab journal, questionnaire, diary, manual, etc.)
    - Who collected the data and when?
    - How were the data processed?
    - Which manipulations, if any, were applied to the data?
    - What were the quality assurance procedures?
    - How can the data be accessed?

More information about data documentation for quantitative and qualitative data (part 1) and additional information per question (part 2) can be found on this website:

3. Description of the changes of the dataset over time
    A historical account of how the research data were moved and processed over time
    (= data provenance). Creating it requires the documentation from parts 1 and 2.
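
The ReadMe file from part 1 and the change history from part 3 could, for instance, be generated from a small script. The following is a minimal sketch only: the function name build_readme, the dictionary keys (name, format, description) and all example values are made up for illustration and are not prescribed by any standard.

```python
from datetime import date


def build_readme(title, files, changes):
    """Return ReadMe text: a per-file overview followed by a change history.

    files   -- list of dicts with the hypothetical keys name, format, description
    changes -- list of (date, description) tuples, oldest first
    """
    lines = [f"ReadMe for dataset: {title}", "", "FILES"]
    for f in files:
        lines.append(f"- {f['name']} ({f['format']}): {f['description']}")
    lines += ["", "CHANGE HISTORY"]
    for when, what in changes:
        lines.append(f"- {when.isoformat()}: {what}")
    return "\n".join(lines) + "\n"


# Illustrative values only
readme_text = build_readme(
    title="Example interview study",
    files=[
        {"name": "interviews.csv", "format": "CSV",
         "description": "one row per interview"},
        {"name": "codebook.txt", "format": "plain text",
         "description": "variables used and what they mean"},
    ],
    changes=[
        (date(2024, 1, 15), "raw data collected"),
        (date(2024, 2, 1), "participant names removed"),
    ],
)
print(readme_text)
```

Writing readme_text to a plain text file next to the data keeps the description and the change history together with the dataset.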

Metadata

Metadata are 'data about data' and facilitate cataloguing and discovery of data. They are an important element in creating a FAIR data infrastructure: not only humans but also computers can read, interpret and combine metadata. Metadata can help to explain the purpose, origin, time, location, creator(s), terms of use, and access conditions of research data.

Metadata types:

  1. Descriptive metadata
    Minimum requirement to find a (digital) object.
    e.g.: title, author, abstract, date
  2. Structural metadata
    Defines the relationships between individual objects that together form a whole.
    e.g.: links to related objects, such as an article written on the basis of the data
  3. Technical metadata
    Information on the technical aspects of a dataset.
    e.g.: data format, hardware/software used, calibration, version, authentication, encryption
  4. Administrative metadata
    Information on use, usage rights and management.
    e.g.: license, reason for embargo, waivers, search logs, user tracking
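
To illustrate how the four types above can sit side by side in one machine-readable record, here is a minimal sketch. The field names and values are invented examples, not the elements of any particular metadata scheme. JSON is used only because it is readable by both humans and computers.

```python
import json

# Illustrative values only; the grouping follows the four metadata types above.
record = {
    "descriptive": {
        "title": "Example interview study",
        "author": "A. Researcher",
        "date": "2024-02-01",
    },
    "structural": {"related_objects": ["article based on this dataset"]},
    "technical": {"data_format": "CSV", "version": "1.0"},
    "administrative": {"license": "CC BY 4.0", "embargo_reason": None},
}

# Serialise to JSON text that can be stored next to the data
as_json = json.dumps(record, indent=2, sort_keys=True)
print(as_json)

# Reading the text back gives the same structure
restored = json.loads(as_json)
```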

Metadata are assigned using a metadata scheme: a set of individual metadata elements that can be used to describe data. There are many different metadata schemes, depending on the discipline, the archive or the platform.
Zuyd works with figshare, which uses its own metadata scheme.
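
The idea of a scheme as a fixed set of elements can be sketched as a simple check of a record against that set. The scheme below is hypothetical: its element names and required flags are placeholders and do not reproduce the figshare scheme or any other real scheme.

```python
# Hypothetical scheme: element name -> whether the element is required.
SCHEME = {
    "title": True,
    "author": True,
    "date": True,
    "license": True,
    "abstract": False,
}


def check_record(record, scheme=SCHEME):
    """Return (missing, unknown) for a flat metadata record.

    missing -- required scheme elements that are absent or empty in the record
    unknown -- record elements that the scheme does not define
    """
    missing = sorted(
        name for name, required in scheme.items()
        if required and not record.get(name)
    )
    unknown = sorted(name for name in record if name not in scheme)
    return missing, unknown


# Illustrative record with two required elements missing and one extra element
missing, unknown = check_record(
    {"title": "Example interview study", "author": "A. Researcher", "colour": "blue"}
)
print("missing:", missing)
print("unknown:", unknown)
```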
