Data Management Toolkit @ UNH

This guide provides information on effectively managing research data and developing data management plans.

Data documentation

Data documentation is like a guidebook for a dataset. It explains what the data is about, where it came from, how it was collected, and much more.

Good documentation:

  • ensures that data can be shared, discovered, and reused
  • ensures that anyone using the dataset can understand its structure and meaning without needing to ask the original creators for help
  • emphasizes transparency in the research process
  • facilitates replication, reproducibility, and scientific rigor

This guide summarizes various methods for documenting research data: metadata standards, README.txt files, data dictionaries, codebooks, and code documentation. Data documentation also includes supplemental information such as protocols, software code, survey instruments, and interview guides.

Metadata standards

Metadata standards define explicit, machine-readable fields or elements for describing data. If you are depositing your data into a repository, the repository may have a preferred metadata standard. Some research fields also have agreed-upon metadata standards.

Wherever possible, use existing standards for documenting research data. This can include a set list of terminology (sometimes called a “taxonomy”), a specific file format, or a documentation/metadata standard. When many researchers use the same format to document their data, it becomes easier for everyone to understand and reuse data. There are different metadata schemas, standards, and taxonomies for different disciplines, so you will need to learn what’s applicable to your data. [Source: Briney, Kristin A. 2022. “Research Data Documentation Methods”. November 22. https://doi.org/10.7907/rt4k-2d76]
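As a rough illustration of what "machine-readable fields" can look like, the sketch below builds a small metadata record and serializes it as JSON. It assumes, purely for illustration, a handful of element names from the Dublin Core element set (title, creator, date, description, format, rights); the guide does not prescribe this standard, the helper `build_record` is hypothetical, and every value is a made-up placeholder. Your repository or discipline may require a different standard with different required fields.

```python
import json

# Element names are taken from the Dublin Core element set; which ones are
# treated as required here is an assumption made for this example only.
DUBLIN_CORE_ELEMENTS = ("title", "creator", "date", "description", "format", "rights")


def build_record(**fields):
    """Return a metadata record, rejecting missing or unknown elements."""
    missing = [name for name in DUBLIN_CORE_ELEMENTS if name not in fields]
    unknown = [name for name in fields if name not in DUBLIN_CORE_ELEMENTS]
    if missing or unknown:
        raise ValueError(f"missing={missing}, unknown={unknown}")
    # Emit elements in a fixed order so records are easy to compare.
    return {name: fields[name] for name in DUBLIN_CORE_ELEMENTS}


# All values below are placeholders, not a real dataset.
record = build_record(
    title="Example stream temperature survey",
    creator="Example Researcher",
    date="2024-01-15",
    description="Hypothetical hourly water temperature readings.",
    format="text/csv",
    rights="CC-BY-4.0",
)
metadata_json = json.dumps(record, indent=2)
print(metadata_json)
```

Because every record uses the same element names, software can search, validate, and exchange these descriptions without human interpretation, which is the main benefit of adopting a shared standard.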

Examples of metadata standards

README.txt

A README file is the starting point for understanding the group of files it accompanies. README files are digital text files in a common format (often .txt or .md, but not exclusively) that provide information about a group of files. READMEs are flexible and can be used for anything from describing a whole project (giving a project overview and general file layout) to providing nuanced documentation for a small subset of files. For those using a paper notebook, READMEs can supplement written notes by keeping a copy of the documentation alongside digital data. [Source: Briney, Kristin A. 2022. “Research Data Documentation Methods”. November 22. https://doi.org/10.7907/rt4k-2d76]
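One way to get a README started is to generate a skeleton that lists every file in a project folder and leaves a blank description next to each. The sketch below is a minimal example, not an official template: the section names ("PROJECT OVERVIEW", "FILE LAYOUT"), the placeholder fields, and the function name `readme_skeleton` are all assumptions chosen for illustration.

```python
from pathlib import Path


def readme_skeleton(project_dir):
    """Build README text listing every file in project_dir with a blank description."""
    root = Path(project_dir)
    # Collect paths relative to the project folder, sorted for a stable listing.
    files = sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )
    lines = [
        "PROJECT OVERVIEW",
        "Title: [fill in]",
        "Contact: [fill in]",
        "Summary: [fill in]",
        "",
        "FILE LAYOUT",
    ]
    lines += [f"{name}: [describe this file]" for name in files]
    return "\n".join(lines) + "\n"
```

Calling `print(readme_skeleton("my_project"))` (with a folder name of your own) shows the draft text. Review it and fill in the bracketed placeholders before saving it as README.txt alongside the data.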

README templates

Data dictionary

Spreadsheets are rich in data but short on details. A data dictionary provides necessary context for interpreting tabular datasets by defining variables, units, codes, and more. Data dictionaries generally record the following:

  • Variable name and description

  • Variable units

  • Variable coding values and meanings

  • Known issues with the data (systematic errors, missing values, etc.)

  • Relationships between variables

  • Other details needed to better understand the data

[Source: Briney, Kristin A. 2022. “Research Data Documentation Methods”. November 22. https://doi.org/10.7907/rt4k-2d76]
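The items listed above map naturally onto the columns of a small table with one row per variable. The following sketch reads the header row of a tabular dataset and produces a blank data dictionary to fill in. The column names in `DICTIONARY_COLUMNS`, the function name `dictionary_template`, and the sample data are assumptions made for this example; adapt them to whatever your project needs to record.

```python
import csv
import io

# One dictionary column per item in the list above (names are illustrative).
DICTIONARY_COLUMNS = [
    "variable_name", "description", "units",
    "coded_values", "known_issues", "related_variables",
]


def dictionary_template(data_csv_text):
    """Return CSV text with one blank data-dictionary row per dataset column."""
    # Only the header row of the dataset is needed to list its variables.
    header = next(csv.reader(io.StringIO(data_csv_text)))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=DICTIONARY_COLUMNS, lineterminator="\n")
    writer.writeheader()
    for name in header:
        row = dict.fromkeys(DICTIONARY_COLUMNS, "")
        row["variable_name"] = name
        writer.writerow(row)
    return out.getvalue()


# Made-up sample data: a site identifier, a temperature, and a quality flag.
sample = "site_id,temp_c,flag\nA1,12.5,0\n"
template = dictionary_template(sample)
print(template)
```

The generated table only guarantees that no variable is forgotten. The descriptions, units, and code meanings still have to be written by someone who knows the data.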

More on data dictionaries

Codebook

Like data dictionaries, codebooks are supplemental documents that contextualize a dataset, which is often encoded. Codebooks are regularly used to document survey data, and can include the same information as a data dictionary as well as summary statistics and further documentation on the survey and its methodology. Codebooks can also be used for analyzing qualitative data, where they describe the codes used to categorize the data, the code definitions, the relationships between codes, and examples of when each code applies. [Source: Briney, Kristin A. 2022. “Research Data Documentation Methods”. November 22. https://doi.org/10.7907/rt4k-2d76]
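To show how a codebook ties encoded values to their meanings and to summary statistics, here is a minimal sketch for a single survey question. The variable name `q1_satisfaction`, the question wording, the code values, and the responses are all hypothetical; a real codebook would cover every variable and describe the survey methodology as well.

```python
from collections import Counter

# Hypothetical codebook entry: each numeric code maps to its meaning.
CODEBOOK = {
    "q1_satisfaction": {
        "question": "How satisfied are you with the service?",
        "codes": {1: "Dissatisfied", 2: "Neutral", 3: "Satisfied", -9: "No response"},
    }
}


def frequency_table(variable, responses):
    """Return (code, label, count) rows for every code defined in the codebook."""
    codes = CODEBOOK[variable]["codes"]
    # Values missing from the codebook signal a data or documentation problem.
    undefined = sorted(set(responses) - set(codes))
    if undefined:
        raise ValueError(f"codes not in codebook: {undefined}")
    counts = Counter(responses)
    return [(code, label, counts[code]) for code, label in codes.items()]


# Made-up encoded responses for the question above.
responses = [3, 3, 2, -9, 1, 3]
table = frequency_table("q1_satisfaction", responses)
for code, label, count in table:
    print(f"{code:>3}  {label:<12} {count}")
```

Raising an error for undefined codes is a design choice in this sketch: it surfaces values that the codebook does not explain instead of silently counting them.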

More on codebooks