• Request Info
  • Visit
  • Apply
  • Give
  • Request Info
  • Visit
  • Apply
  • Give

Search

  • A-Z Index
  • Map

Educational Leadership and Policy Studies

  • About
  • Our People
    • Our People Overview
    • Faculty
    • Staff
    • Students
  • Academic Programs
    • Academic Programs Overview
    • Adult & Continuing Education
    • College Student Personnel
    • Educational Administration
    • Evaluation Programs
    • Higher Education Administration
    • Undergraduate Studies
  • Education Research & Opportunity Center
  • Admissions & Information
    • Admissions Overview
    • Graduate Forms, Handbooks, and Resources
    • Contact ELPS
  • About
  • Our People
    • Our People Overview
    • Faculty
    • Staff
    • Students
  • Academic Programs
    • Academic Programs Overview
    • Adult & Continuing Education
    • College Student Personnel
    • Educational Administration
    • Evaluation Programs
    • Higher Education Administration
    • Undergraduate Studies
  • Education Research & Opportunity Center
  • Admissions & Information
    • Admissions Overview
    • Graduate Forms, Handbooks, and Resources
    • Contact ELPS
Home » Is Your Data Dirty? The Importance of Conducting Frequencies First

Is Your Data Dirty? The Importance of Conducting Frequencies First

Is Your Data Dirty? The Importance of Conducting Frequencies First

Is Your Data Dirty? The Importance of Conducting Frequencies First

March 1, 2025 by Jonah Hall

By Jennifer Ann Morrow, Ph.D.

Data, like life, can be messy. I’ve worked with all types of data, both collected by me and by my clients, for over 25 years and I ALWAYS check my data before conducting my proposed analyses. Sometimes, this part of the analysis process is quick and easy but most of the time it’s like an investigation…you need to be thorough, take your time, and provide evidence for your decision making. 

Data Cleaning Step 3: Perform Initial Frequencies 

After you have drafted your codebook and analysis plan you should conduct frequencies on all of your variables in your dataset, both numeric and string variables. I typically use Excel or SPSS to do this, my colleague Dr. Louis Rocconi prefers R, but you can use any statistical software that you feel most comfortable with. At this step I conduct frequencies and request graphics (e.g., bar chart, histogram) for every variable. This output will be invaluable as you work through your next data cleaning steps. 

So, what should you be looking at when reviewing your frequencies? One thing that I will make note of is any discrepancies in coding between my data and what is listed in my codebook. I’ll flag any spelling issues in my variable names/labels and note anything that doesn’t match my codebook. One thing that I always check is that my value labels (what labels are given to my numeric categories) are the same as my codebook and consistent across sets of variables. Many times, if you are using an online survey software package to collect your data there can easily have been programming mistakes when creating the survey that results in mislabeled values. Also, if you have had many individuals enter data into your database it can increase the chances that mistakes were made during the data entry process. During this step I will also check to make sure that I have properly labeled any values that I’m using to designate missing data and that this is consistent with what I have listed in my codebook.  

Lastly, I will highlight when I see variables that may have extreme scores (i.e., potential outliers), variables with more than 5% missing data, and variables with very low sample size in any of their response categories. I’ll use this output in future data cleaning steps to aid in my decision making on variable modification. 

Data Cleaning Step 4: Check for Coding Mistakes 

At this step I will take my output that I highlighted potential issues with coding and start reviewing and making variable modification decisions at this step. Coding issues are more common when you have data that has been manually entered but you can still have coding errors in online data collection! Any variables that have coding issues I first determine if I can verify the data from the original/another source. For data that has been manually entered I’ll go back to the organization/paper survey/data form to verify the data. If it needs to be changed to the correct response I will make a note of this to fix in my next data cleaning step. If I cannot verify the datapoint (like when you have collected your data anonymously) and the value doesn’t fall in the possible values listed in my codebook then I make a note to set the value as missing when I get to the next data cleaning step.  

Additional Advice 

As I am going through my frequencies I will highlight/enter notes directly in the output to make things easier as I move forward through the data cleaning process. I’ll also put notes in my project notebook summarizing any issues and then once I make decisions on variable modifications, I note these in my notebook as well. You will use the output from Step 3 in the next few data cleaning steps to aid in your decision making so keep it handy! 

Resources

12 Steps of Data Cleaning Handout: https://www.dropbox.com/scl/fi/x2bf2t0q134p0cx4kvej0/TWELVE-STEPS-OF-DATA-CLEANING-BRIEF-HANDOUT-MORROW-2017.pdf?rlkey=lfrllz3zya83qzeny6ubwzvjj&dl=0 

Step 1: https://cehhs.utk.edu/elps/organizing-your-evaluation-data-the-importance-of-having-a-comprehensive-data-codebook/ 

Step 2: https://cehhs.utk.edu/elps/clean-correlate-and-compare-the-importance-of-having-a-data-analysis-plan/ 

https://davenport.libguides.com/data275/spss-tutorial/cleaning

https://libguides.library.kent.edu/SPSS/FrequenciesCategorical

https://www.datacamp.com/tutorial/tutorial-data-cleaning-tutorial

https://www.geeksforgeeks.org/frequency-table-in-r

https://www.goskills.com/Excel/Resources/FREQUENCY-Excel

Filed Under: Evaluation Methodology Blog

Educational Leadership and Policy Studies

325 Bailey Education Complex
Knoxville, Tennessee 37996

Phone: 865-974-2214
Fax: 865.974.6146

The University of Tennessee, Knoxville
Knoxville, Tennessee 37996
865-974-1000

The flagship campus of the University of Tennessee System and partner in the Tennessee Transfer Pathway.

ADA Privacy Safety Title IX