Addressing what metadata standards will be used or how data will be documented is a common component of data management plans.
Responsibly sharing data requires providing sufficient metadata and documentation so that data are FAIR: findable, accessible, interoperable, and reusable.
Data documentation is like a guidebook for a dataset. It explains what the data are about, where they came from, how they were collected, and much more. Good documentation makes a dataset easier to understand and reuse, both for other researchers and for your future self.
This guide summarizes various methods for documenting research data: metadata standards, README files, data dictionaries, codebooks, and code documentation. Data documentation also includes supplemental information such as protocols, software code, survey instruments, and interview guides.
Metadata standards specify explicit, machine-readable fields or elements for describing data. If you are depositing your data into a repository, the repository may have a preferred metadata standard, and some research fields have agreed-upon standards of their own. Check your target repository or a disciplinary metadata directory for field-specific metadata standards.
Wherever possible, use existing standards for documenting research data. This can include a set list of terminology (sometimes called a “taxonomy”), a specific file format, or a documentation/metadata standard. When many researchers use the same format to document their data, it becomes easier for everyone to understand and reuse data. There are different metadata schemas, standards, and taxonomies for different disciplines, so you will need to learn what’s applicable to your data. [Source: Briney, Kristin A. 2022. “Research Data Documentation Methods”. November 22. https://doi.org/10.7907/rt4k-2d76]
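To make the idea concrete, here is a minimal sketch of a machine-readable metadata record using generic Dublin Core-style elements; the project name, field values, and output file are hypothetical, and any particular repository's required schema will differ.

```python
import json

# Hypothetical, minimal dataset description using generic
# Dublin Core-style elements; real repositories and disciplinary
# standards define their own required fields.
dataset_metadata = {
    "title": "Stream temperature measurements, 2021-2023",
    "creator": "Example Research Group",
    "date": "2024-01-15",
    "description": "Hourly stream temperature logged at three sites.",
    "format": "text/csv",
    "rights": "CC-BY-4.0",
    "subject": ["hydrology", "temperature"],
}

# Writing the record as JSON keeps it machine-readable.
with open("dataset_metadata.json", "w") as f:
    json.dump(dataset_metadata, f, indent=2)
```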
A README file is the starting point for understanding the group of files it accompanies. README files are digital text files in a common format (often .txt or .md, but not exclusively) that provide information about a group of files. READMEs are flexible and can be used for anything from describing a whole project (giving a project overview and general file layout) to providing nuanced documentation for a small subset of files. For those using a paper notebook, READMEs can supplement written notes by keeping a copy of the documentation alongside digital data. [Source: Briney, Kristin A. 2022. “Research Data Documentation Methods”. November 22. https://doi.org/10.7907/rt4k-2d76]
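As a rough illustration, the following sketch writes a bare-bones README.txt skeleton; the headings are common conventions rather than a required format, and the project details and file names are placeholders.

```python
from pathlib import Path

# Illustrative README.txt skeleton; headings are common conventions,
# not a required standard, and the values are placeholders.
readme_text = """\
PROJECT: Example stream temperature study
AUTHOR / CONTACT: Example Research Group (contact@example.org)
DATE CREATED: 2024-01-15

DESCRIPTION:
Brief overview of the project and what the accompanying files contain.

FILE LIST:
- data/temperature_2021_2023.csv : hourly logger readings
- docs/data_dictionary.csv       : variable definitions and units

METHODS:
Short summary of how the data were collected and processed.
"""

Path("README.txt").write_text(readme_text)
```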
Spreadsheets are rich in data but short on details. A data dictionary provides necessary context for interpreting tabular datasets by defining variables, units, codes, and more. Data dictionaries generally record the following (a brief example sketch follows this list):
Variable name and description
Variable units
Variable coding values and meanings
Known issues with the data (systematic errors, missing values, etc.)
Relationships between variables
Other details needed to better understand the data
[Source: Briney, Kristin A. 2022. “Research Data Documentation Methods”. November 22. https://doi.org/10.7907/rt4k-2d76]
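As a brief sketch, a data dictionary for a hypothetical temperature dataset could be saved as a small table alongside the data; the variable names, units, and codes below are purely illustrative.

```python
import csv

# Illustrative data dictionary for a hypothetical tabular dataset;
# the columns mirror the elements listed above.
rows = [
    {
        "variable": "site_id",
        "description": "Monitoring site identifier",
        "units": "n/a",
        "allowed_values": "S1, S2, S3",
        "missing_code": "n/a",
    },
    {
        "variable": "water_temp",
        "description": "Stream water temperature",
        "units": "degrees Celsius",
        "allowed_values": "-5 to 40",
        "missing_code": "-999 = sensor failure",
    },
]

with open("data_dictionary.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=rows[0].keys())
    writer.writeheader()
    writer.writerows(rows)
```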
Like data dictionaries, codebooks are supplemental documents that contextualize a dataset, which is often encoded. Codebooks are regularly used to document survey data and can include the same information as a data dictionary, as well as summary statistics and further documentation on the survey and its methodology. Codebooks can also be used for analyzing qualitative data, where they describe the codes used to categorize the data, the code definitions, the relationships between codes, and examples of when each code applies. [Source: Briney, Kristin A. 2022. “Research Data Documentation Methods”. November 22. https://doi.org/10.7907/rt4k-2d76]
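A qualitative codebook can be kept in the same spirit. The sketch below records hypothetical codes, their definitions, related codes, and example applications; all names and quotations are invented for illustration.

```python
# Illustrative qualitative codebook entries: code name, definition,
# relationship to other codes, and an example of when the code applies.
codebook = [
    {
        "code": "barrier_time",
        "definition": "Participant describes lack of time as an obstacle",
        "related_codes": "barrier_resources",
        "example": '"I just never have an afternoon free to do it."',
    },
    {
        "code": "barrier_resources",
        "definition": "Participant describes missing tools, funds, or staff",
        "related_codes": "barrier_time",
        "example": '"We don\'t have the software licenses for that."',
    },
]

for entry in codebook:
    print(f"{entry['code']}: {entry['definition']}")
```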
Documenting code is the practice of creating a README file for your code and adding comments and explanations within the code itself to help others (and your future self) understand and use it.
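In practice, in-code documentation can be as simple as a docstring plus inline comments explaining non-obvious steps, as in this small, hypothetical Python function:

```python
def celsius_to_fahrenheit(temp_c: float) -> float:
    """Convert a temperature from degrees Celsius to degrees Fahrenheit.

    Parameters
    ----------
    temp_c : float
        Temperature in degrees Celsius.

    Returns
    -------
    float
        Temperature in degrees Fahrenheit.
    """
    # Standard conversion formula: F = C * 9/5 + 32
    return temp_c * 9 / 5 + 32


if __name__ == "__main__":
    print(celsius_to_fahrenheit(21.5))  # 70.7
```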