Clean, Correlate, and Compare: The Importance of Having a Data Analysis Plan
By Dr. Jennifer Ann Morrow
Data Cleaning Step 2: Create a Data Analysis Plan
Hi again! Those of you who read my earlier blog post, Data Cleaning Step 1: Create a Data Codebook, know that I love data cleaning! My colleagues, Dr. Louis Rocconi and Dr. Gary Skolits, also love to nerd out and talk about data cleaning and why it is such an important part of analyzing your evaluation data. As I mentioned in that post, before we can tackle our evaluation or assessment questions, we need to get our data organized. Creating a data analysis plan is an important part of the data management process. Once I create the first draft of my data codebook (Step 1), I draft a data analysis plan…and I update both as I make changes to my evaluation/assessment dataset.
Why a Data Analysis Plan?
While it can be tempting to just dive right on in and conduct your proposed analyses (I mean, who doesn’t want to run a multiple regression right away?!?), it’s good practice to have a detailed plan for how you intend to clean your data and how you will address your evaluation/assessment questions. Creating a data analysis plan BEFORE you start working with your dataset helps you think through what data you need to collect to address your questions, which specific pieces of that data you will use, how you will analyze the data you collect, and the most appropriate ways to disseminate the findings. While creating a data analysis plan can be time-consuming, it is an invaluable part of the data management and analysis process. Also, if you are working with a team (as many of us evaluation/assessment professionals do!), it makes collaboration, replication, and report generation easier. Just like the data codebook, the data analysis plan is a living document that changes as you make decisions about, and modifications to, your dataset and planned analyses.
I share the data analysis plan with my clients throughout the life of the project so they are aware of the process and can chime in if they have questions or want the analysis of their data approached differently. At the end of my time with a project, I routinely share a copy of the data codebook, the data analysis plan, and a cleaned/sanitized dataset that the client can continue to use to inform their program and organization.
What is in a Data Analysis Plan?
Whether you create your data analysis plan in Excel, Word, or some other software platform (I tend to prefer Word), these are my suggestions for what to include:
1. General Instructions to Data Analysts
2. List of Datasets for the Project
3. Who Is Responsible for Each Section of the Analysis Plan
4. Evaluation/Assessment Questions
5. Variables that You Will Use in Your Analyses
6. Step-by-Step Description of Your Data Cleaning Process
7. Specific Analyses that You Will Use to Address Each Evaluation/Assessment Question
8. Proposed Data Visualizations that You Will Use for Each Analysis
9. Software Syntax/Code (e.g., SPSS, R) that You Will Use to Analyze Your Data
Since multiple people often work with my datasets (boy…did it take me a long time to get used to giving up control here!), including step-by-step instructions for how your data analysts should name, label, and save files is extremely important. Providing guidance on how data analysts should document what they do (see the project notebook in your data codebook!) and how they arrived at their decisions is also invaluable for keeping the evaluation/assessment team aware of each step of the data analysis process.
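If your team works in a scripting language, file-naming instructions can be built into a small helper so nobody has to remember them. Below is a minimal Python sketch, assuming a hypothetical convention of `<project>_<dataset>_v<version>_<YYYY-MM-DD>_<INITIALS>.<extension>`. The convention, the function names `slug` and `build_filename`, and the example project are all illustrative assumptions, not a standard; substitute whatever rules your own plan spells out.

```python
from datetime import date


def slug(text: str) -> str:
    """Lowercase a name part and join its words with hyphens.

    Underscores are reserved as separators between the parts of the file name.
    """
    return "-".join(text.lower().replace("_", " ").split())


def build_filename(project: str, dataset: str, version: int, initials: str,
                   saved_on: date, extension: str = "csv") -> str:
    """Build a file name under a HYPOTHETICAL team convention:
    <project>_<dataset>_v<version>_<YYYY-MM-DD>_<INITIALS>.<extension>
    """
    if version < 1:
        raise ValueError("version must be a positive integer")
    return (
        f"{slug(project)}_{slug(dataset)}_v{version:02d}_"
        f"{saved_on.isoformat()}_{initials.upper()}.{extension.lstrip('.')}"
    )


# Hypothetical example: third version of a fall survey file, saved by analyst "jam"
print(build_filename("Youth Mentoring", "Fall Survey", 3, "jam", date(2024, 7, 15)))
# prints youth-mentoring_fall-survey_v03_2024-07-15_JAM.csv
```

Putting the date in ISO format and zero-padding the version number means files sort in a sensible order in most file browsers, which makes it easier to spot the most recent copy.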
I typically organize my data analysis plan by first listing any data cleaning that needs to be completed, followed by each of my evaluation/assessment questions. This way all of my analyses are organized by the questions that my client wants me to address…and this helps immensely when writing up my evaluation/assessment report for them.
Including either the software syntax/code (if you use something like SPSS or R) or a step-by-step description of how you use the software tool (if you use something like Excel) to clean and analyze the data is helpful not only to your team members but also to your clients. It allows them to easily rerun analyses and critique the steps that you took to analyze the data. I also include notes about my decision-making process in my syntax/code so anyone can easily follow how and why I approached the analyses the way that I did.
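To show what decision notes inside syntax might look like, here is a small Python sketch (Python is used only for illustration; the same idea applies to comments in SPSS or R syntax). The variable name `satisfaction`, the 1–5 scale, the `id` field, and the rule that out-of-range codes such as 99 become missing are all hypothetical assumptions standing in for whatever your codebook and plan define.

```python
# HYPOTHETICAL example: the variable name, scale, and missing-data rule below are
# assumptions standing in for whatever your codebook and analysis plan specify.

VALID_SATISFACTION = range(1, 6)  # ASSUMPTION: codebook defines satisfaction as a 1-5 rating


def clean_satisfaction(records, decision_log):
    """Return cleaned copies of survey records, logging every change made."""
    cleaned = []
    for record in records:
        row = dict(record)  # work on a copy so the raw data stay untouched
        value = row.get("satisfaction")
        # DECISION: out-of-range values (e.g., 99 entered as a "not applicable" code)
        # are set to missing instead of deleting the respondent, so their answers to
        # other items can still be analyzed.
        if value is not None and value not in VALID_SATISFACTION:
            decision_log.append(
                f"id {row['id']}: satisfaction={value} out of range; set to missing"
            )
            row["satisfaction"] = None
        cleaned.append(row)
    return cleaned


raw = [
    {"id": 1, "satisfaction": 4},
    {"id": 2, "satisfaction": 99},
    {"id": 3, "satisfaction": None},
]
log = []
clean = clean_satisfaction(raw, log)
print(clean)
print(log)  # entries to copy into the project notebook
```

Writing each change to a log that can be pasted into the project notebook keeps the code and the written record in sync. Whether out-of-range values should become missing, be recoded, or be handled some other way is exactly the kind of decision the plan should state explicitly.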
Additional Advice
While it is important to develop your data analysis plan early in your project, always remember that it is a living document and it will definitely change as you collect data, meet with your client to discuss the evaluation/assessment, and clean your data. Your “perfect” plan may not work once you have collected your data, so be flexible in your approach. Just remember to document any changes that you make to the plan and to your data in your project notebook!
Resources
- 12 Steps of Data Cleaning Handout: https://www.dropbox.com/scl/fi/x2bf2t0q134p0cx4kvej0/TWELVE-STEPS-OF-DATA-CLEANING-BRIEF-HANDOUT-MORROW-2017.pdf?rlkey=lfrllz3zya83qzeny6ubwzvjj&dl=0
- http://fogartyfellows.org/wp-content/uploads/2015/09/SAP_workbook.pdf
- https://cghlewis.com/blog/project_beginning
- https://learn.crenc.org/how-to-create-a-data-analysis-plan
- https://pmc.ncbi.nlm.nih.gov/articles/PMC4552232/pdf/cjhp-68-311.pdf
- https://the.datastory.guide/hc/en-us/articles/360003250516-Creating-Analysis-Plans-for-Surveys
- https://www.slideshare.net/slideshow/brief-introduction-to-the-12-steps-of-evaluagio/26168236#1
- https://www.surveymonkey.com/mp/developing-data-analysis-plan