Addressing what metadata standards will be used or how data will be documented is a common component of data management plans.
Responsibly sharing data requires providing sufficient metadata and documentation so that data are FAIR: findable, accessible, interoperable, and reusable.
Data documentation is like a guidebook for a dataset. It explains what the data are about, where they came from, how they were collected, and much more. Good documentation makes a dataset easier to understand and reuse, both for other researchers and for your future self.
This guide summarizes various methods for documenting research data: metadata standards, README files, data dictionaries, codebooks, and code documentation. Data documentation also includes supplemental information such as protocols, software code, survey instruments, and interview guides.
Metadata standards specify explicit, machine-readable fields or elements for describing data. If you are depositing your data into a repository, the repository may have a preferred metadata standard, and some research fields have agreed-upon standards of their own. Check your target repository or a disciplinary metadata directory for field-specific metadata standards.
Wherever possible, use existing standards for documenting research data. This can include a set list of terminology (sometimes called a “taxonomy”), a specific file format, or a documentation/metadata standard. When many researchers use the same format to document their data, it becomes easier for everyone to understand and reuse data. There are different metadata schemas, standards, and taxonomies for different disciplines, so you will need to learn what’s applicable to your data. [Source: Briney, Kristin A. 2022. “Research Data Documentation Methods”. November 22. https://doi.org/10.7907/rt4k-2d76]
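To make the idea concrete, here is a minimal sketch of a machine-readable metadata record using generic Dublin Core-style elements; the project name, field values, and output file are hypothetical, and any particular repository's required schema will differ.

```python
import json

# Hypothetical, minimal dataset description using generic
# Dublin Core-style elements; real repositories and disciplinary
# standards define their own required fields.
dataset_metadata = {
    "title": "Stream temperature measurements, 2021-2023",
    "creator": "Example Research Group",
    "date": "2024-01-15",
    "description": "Hourly stream temperature logged at three sites.",
    "format": "text/csv",
    "rights": "CC-BY-4.0",
    "subject": ["hydrology", "temperature"],
}

# Writing the record as JSON keeps it machine-readable.
with open("dataset_metadata.json", "w") as f:
    json.dump(dataset_metadata, f, indent=2)
```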
A README file is the starting point for understanding the group of files it accompanies. README files are digital text files in a common format (often .txt or .md, but not exclusively) that provide information about a group of files. READMEs are flexible and can be used for anything from describing a whole project (giving a project overview and general file layout) to providing nuanced documentation for a small subset of files. For those using a paper notebook, READMEs can supplement written notes by keeping a copy of the documentation alongside digital data. [Source: Briney, Kristin A. 2022. “Research Data Documentation Methods”. November 22. https://doi.org/10.7907/rt4k-2d76]
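As a rough illustration, the following sketch writes a bare-bones README.txt skeleton; the headings are common conventions rather than a required format, and the project details and file names are placeholders.

```python
from pathlib import Path

# Illustrative README.txt skeleton; headings are common conventions,
# not a required standard, and the values are placeholders.
readme_text = """\
PROJECT: Example stream temperature study
AUTHOR / CONTACT: Example Research Group (contact@example.org)
DATE CREATED: 2024-01-15

DESCRIPTION:
Brief overview of the project and what the accompanying files contain.

FILE LIST:
- data/temperature_2021_2023.csv : hourly logger readings
- docs/data_dictionary.csv       : variable definitions and units

METHODS:
Short summary of how the data were collected and processed.
"""

Path("README.txt").write_text(readme_text)
```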
Spreadsheets are rich in data but short on details. A data dictionary provides necessary context for interpreting tabular datasets by defining variables, units, codes, and more. Data dictionaries generally record the following (a brief example sketch follows this list):
Variable name and description
Variable units
Variable coding values and meanings
Known issues with the data (systematic errors, missing values, etc.)
Relationships between variables
Other details needed to better understand the data
[Source: Briney, Kristin A. 2022. “Research Data Documentation Methods”. November 22. https://doi.org/10.7907/rt4k-2d76]
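As a brief sketch, a data dictionary for a hypothetical temperature dataset could be saved as a small table alongside the data; the variable names, units, and codes below are purely illustrative.

```python
import csv

# Illustrative data dictionary for a hypothetical tabular dataset;
# the columns mirror the elements listed above.
rows = [
    {
        "variable": "site_id",
        "description": "Monitoring site identifier",
        "units": "n/a",
        "allowed_values": "S1, S2, S3",
        "missing_code": "n/a",
    },
    {
        "variable": "water_temp",
        "description": "Stream water temperature",
        "units": "degrees Celsius",
        "allowed_values": "-5 to 40",
        "missing_code": "-999 = sensor failure",
    },
]

with open("data_dictionary.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=rows[0].keys())
    writer.writeheader()
    writer.writerows(rows)
```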
Like data dictionaries, codebooks are supplemental documents that contextualize a dataset, which is often encoded. Codebooks are regularly used to document survey data and can include the same information as a data dictionary, as well as summary statistics and further documentation on the survey and its methodology. Codebooks can also be used for analyzing qualitative data, where they describe the codes used to categorize the data, the code definitions, the relationships between codes, and examples of when each code applies. [Source: Briney, Kristin A. 2022. “Research Data Documentation Methods”. November 22. https://doi.org/10.7907/rt4k-2d76]
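A qualitative codebook can be kept in the same spirit. The sketch below records hypothetical codes, their definitions, related codes, and example applications; all names and quotations are invented for illustration.

```python
# Illustrative qualitative codebook entries: code name, definition,
# relationship to other codes, and an example of when the code applies.
codebook = [
    {
        "code": "barrier_time",
        "definition": "Participant describes lack of time as an obstacle",
        "related_codes": "barrier_resources",
        "example": '"I just never have an afternoon free to do it."',
    },
    {
        "code": "barrier_resources",
        "definition": "Participant describes missing tools, funds, or staff",
        "related_codes": "barrier_time",
        "example": '"We don\'t have the software licenses for that."',
    },
]

for entry in codebook:
    print(f"{entry['code']}: {entry['definition']}")
```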
Documenting code is the practice of creating a README file for your code and adding comments and explanations within the code itself to help others (and your future self) understand and use it.
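In practice, in-code documentation can be as simple as a docstring plus inline comments explaining non-obvious steps, as in this small, hypothetical Python function:

```python
def celsius_to_fahrenheit(temp_c: float) -> float:
    """Convert a temperature from degrees Celsius to degrees Fahrenheit.

    Parameters
    ----------
    temp_c : float
        Temperature in degrees Celsius.

    Returns
    -------
    float
        Temperature in degrees Fahrenheit.
    """
    # Standard conversion formula: F = C * 9/5 + 32
    return temp_c * 9 / 5 + 32


if __name__ == "__main__":
    print(celsius_to_fahrenheit(21.5))  # 70.7
```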