Why do I need a plan to manage my data?
Planning for your data management needs at the beginning of your research project will save you time and resources in the long run and ensure that your data will be compliant with standards in your field and usable in the future. A formal plan can be valuable to you and may be required by your funding agency.
Data constitute the core of a research project. Maintaining data reliability is key to ensuring the integrity of data-based conclusions. Without proper data management, the validity of research results can be questioned, jeopardizing not only your own reputation, but also the work of others and the reputation of the University.
Responsible data management:
- Protects data from falsification.
- Preserves confidential human subject information.
- Protects proprietary knowledge.
- Provides evidence of inventorship.
- Clarifies ownership of intellectual property rights.
- Assures that University policies and procedures are followed.
- Supports compliance with sponsor requirements.
- Protects the PI, investigators, and the University from consequences of poor data management.
What do we mean by data?
When we talk about data in this Toolkit, we are referring to systematically recorded information that is produced as part of a research process and is the basis of research findings. Funders with data sharing and data management policies may have their own definitions.
Here are some examples of research data types and formats:
Audio or video recordings
The Data Management Toolkit
This Toolkit is meant to help you manage your data by providing practical information on a range of topics, including creating formal Data Management Plans for NSF grants and other funding opportunities. This guide contains the following sections:
Evaluating your data needs
To manage your data properly, you need to understand the nature of the data, its audience and ownership, and its long-term viability. Reviewing the following questions will help you get started:
- What type of data are you producing?
  - Gather a clear picture of what your data will look like. Is it, for example, numerical data, image data, text sequences, or modeling data? Knowing this precisely will inform many decisions you need to make about storage, backups, and more. Image data, for instance, requires a lot of storage space, so you'll want to decide which of your images, if not all, you want to retain, and where such large data sets can be housed.
- How much data, and at what growth rate?
  - Once you know what kind of data you're producing, you'll be able to assess the growth rate. For example, are you gathering data by hand or using sophisticated instrumentation that is able to capture a lot of data at once? Will there be more data as time goes on? If so, you will need to plan for the growth. What amounts to enough storage this year may not be sufficient for next year.
- Will it change frequently?
  - The answer to this question affects how you organize the data as well as the level of versioning you will need to undertake. Keeping track of rapidly changing datasets can be a challenge, so it's imperative that you begin with a plan that will carry you through the data management process.
- Who will use the data?
  - Who is your audience for the data? How will they use the data? The answers to these questions will tell you how to structure the data and where to distribute it.
- Who controls the data (you, UNH, a research center, the funder)?
  - Before you decide how you will manage the data, you need to know if you have the authority to control it or if you have to abide by external requirements.
- How long should the data be retained (e.g., 3-5 years, 10-20 years, permanently)?
  - Not all data needs to be retained indefinitely. Figure out what's important to keep long-term and make sure your plan for those datasets is solid.
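As a rough illustration of planning for growth, the short Python sketch below projects how much storage a project might need in each coming year. It assumes a constant annual growth rate compounded yearly, which is a simplification. The function name `project_storage_gb` and the starting size and growth rate in the example are hypothetical, so substitute estimates from your own project.

```python
def project_storage_gb(current_gb, annual_growth_rate, years):
    """Project storage needs per year, assuming constant compound growth.

    current_gb: storage used today, in gigabytes (your own estimate)
    annual_growth_rate: expected yearly growth as a fraction (0.40 = 40%)
    years: number of years to project
    """
    projections = []
    size = current_gb
    for year in range(1, years + 1):
        size *= 1 + annual_growth_rate  # grow by the assumed rate each year
        projections.append((year, round(size, 1)))
    return projections


if __name__ == "__main__":
    # Hypothetical example: 500 GB today, growing 40% per year, over 5 years.
    for year, size_gb in project_storage_gb(500, 0.40, 5):
        print(f"Year {year}: about {size_gb} GB")
```

Even a simple projection like this can show whether storage that is adequate this year will still be adequate by the end of your retention period.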
License and acknowledgement
The Data Management Toolkit was created by Sherry Vellucci and Eleta Exline and is maintained by Eleta Exline. It is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
This guide was initially adapted from the MIT Libraries Data Management and Publishing Guide, with additional content from the Caltech Library Data Management Guide (with grateful acknowledgement to MIT Librarians and George Porter at the California Institute of Technology).