{"id":23756,"date":"2024-10-01T07:00:00","date_gmt":"2024-10-01T11:00:00","guid":{"rendered":"https:\/\/cehhs.utk.edu\/elps\/?p=23756"},"modified":"2025-01-02T10:28:50","modified_gmt":"2025-01-02T15:28:50","slug":"common-dirty-data-problems-i-encounter-and-how-to-save-time-fixing-them","status":"publish","type":"post","link":"https:\/\/cehhs.utk.edu\/elps\/common-dirty-data-problems-i-encounter-and-how-to-save-time-fixing-them\/","title":{"rendered":"Common &#8220;Dirty Data&#8221; Problems I Encounter and How to Save Time Fixing Them"},"content":{"rendered":"\n<p><strong>By M. Andrew Young <\/strong><\/p>\n\n\n\n<div style=\"height:22px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<div class=\"wp-block-media-text is-stacked-on-mobile\" style=\"grid-template-columns:16% auto\"><figure class=\"wp-block-media-text__media\"><img loading=\"lazy\" decoding=\"async\" width=\"841\" height=\"1024\" src=\"https:\/\/cehhs.utk.edu\/elps\/wp-content\/uploads\/sites\/9\/2024\/05\/Headshot-1-841x1024.png\" alt=\"\" class=\"wp-image-23409 size-full\" srcset=\"https:\/\/cehhs.utk.edu\/elps\/wp-content\/uploads\/sites\/9\/2024\/05\/Headshot-1-841x1024.png 841w, https:\/\/cehhs.utk.edu\/elps\/wp-content\/uploads\/sites\/9\/2024\/05\/Headshot-1-246x300.png 246w, https:\/\/cehhs.utk.edu\/elps\/wp-content\/uploads\/sites\/9\/2024\/05\/Headshot-1-768x935.png 768w, https:\/\/cehhs.utk.edu\/elps\/wp-content\/uploads\/sites\/9\/2024\/05\/Headshot-1-1262x1536.png 1262w, https:\/\/cehhs.utk.edu\/elps\/wp-content\/uploads\/sites\/9\/2024\/05\/Headshot-1.png 1347w\" sizes=\"auto, (max-width: 841px) 100vw, 841px\" \/><\/figure><div class=\"wp-block-media-text__content\">\n<p>Hello, my name is M. Andrew Young. I\u2019m a third-year Ph.D. student in the Evaluation, Statistics and Methodology program in the Educational Leadership &amp; Policy Studies department at the University of Tennessee. 
For nearly five years now, I have served as a higher education evaluator in the role of Director of Assessment. In every job I\u2019ve had since I finished my undergraduate degree in 2011, I have dealt with dirty data. Now that I deal with data daily from a variety of sources and from people who are content experts in their field, but not necessarily research methodologists, I encounter a lot of creative, but not useful, solutions for managing data. If you, like me, have a full plate every single day, shaving seconds and minutes off your cleaning tasks can really make your life easier.\u00a0\u00a0<\/p>\n<\/div><\/div>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p>We are often told \u201cthere is no perfect evaluation\u201d, \u201cthere is no perfect survey\u201d, or even \u201cthere is no perfect data set\u201d, but what does that look like in practical terms? Even when we are the designer of the data collection instrument(s), our data can be messy, but what happens when we come in well after the fact to someone\u2019s dataset for an instrument we didn\u2019t design, administer, or manage? In those instances, we can find ourselves having to riddle out someone else\u2019s solution to data management. Sometimes those solutions are good, but we weren\u2019t given the key to know how they evaluated the data, and sometimes they are downright horrible because they were designed by a human to appeal to human senses instead of to be interpreted by a computer.&nbsp;&nbsp;<\/p>\n\n\n\n<p>I don\u2019t have a ton of programming language experience, so I have had to rely on ChatGPT, for which I pay a premium subscription, to help write code. <strong>CAVEAT: ChatGPT can be <\/strong><strong>highly inaccurate<\/strong><strong>, devise clunky or improper solutions based on the information you give it, and its knowledge of Python and R packages can be woefully out-of-date! 
I suggest contacting a local programming community. Use GitHub with the AI plugins and debuggers to help you! <\/strong>I had to learn how to debug and evaluate ChatGPT\u2019s code, which took a long time and iterative rounds of testing to see what happened and where it failed.&nbsp;&nbsp;<\/p>\n\n\n\n<p>So, let\u2019s get right down to it. I will share the most common dirty data problems I encounter, how to identify them, and what my solution is. They are in no particular order, but I have encountered them all:&nbsp;&nbsp;<\/p>\n\n\n\n<div style=\"height:31px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p><strong>File formats that aren\u2019t usable.<\/strong> \u00a0<br><br>Some data repositories that I have had to analyze data from will export a file with a .xls extension when the actual content is something else, such as HTML. That sounds pretty trivial, but if you must download dozens of files, it can be a time-waster.\u00a0 \u00a0<br>\u00a0<br>Solution: Python does some cool stuff, and if you can learn to use pandas, openpyxl, and Beautiful Soup, you can convert every file in an entire folder quickly. At the end of this blog post, I\u2019ll place a share link to some extra resources including my Python script for this solution.\u00a0 \u00a0<br>\u00a0\u00a0<\/p>\n\n\n\n<div style=\"height:23px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p><strong>Merged Cells, empty rows, leading\/trailing spaces, carriage returns, color formatting as data, etc.<\/strong> \u00a0<br><br>In my workplace, and I suspect in many other workplaces, Excel is the preferred place to put data. It isn\u2019t always the best, but it is what people are used to. Sometimes people will attempt to make Excel sheets pleasing to the eye, or easy for people to read, but this often makes the file unusable in Excel or other packages like R without modifications. 
Since I am a novice R user, and I like to be able to see my data while I\u2019m cleaning it in a dynamic environment, I use Excel for most of my cleaning unless the dataset is too large and unwieldy to utilize Excel. \u00a0<br>\u00a0<br>Leading and trailing spaces, carriage returns, and special characters that we can\u2019t see in a cell can make a unique identifier such as a first\/last name combo or email address \u201clook different\u201d to Excel, meaning it doesn\u2019t find your match unless you use \u201cfuzzy\u201d matching formulae, which I tend to avoid. Cleaning the data is, in my opinion, better in the long run. I have provided a VBA script that does that. I have written it so that it allows you to choose the sheet to run the script for instead of the active sheet. You can change that chunk of code if you want it to behave differently. The carriage return remover can be modified to remove other special characters or search for all of them. Here is a link to that list:<a href=\"https:\/\/excelx.com\/characters\/list\/\" target=\"_blank\" rel=\"noreferrer noopener\"> https:\/\/excelx.com\/characters\/list\/<\/a> \u00a0<br>\u00a0<br>What about colors? I encountered a dataset where the person\u2019s solution to designating different statuses for participant records was color-coding. Unfortunately, those color codes were not mutually-exclusive and some depended on each other in a hierarchical or funnel-flow manner. I always tell people \u201cColumns are free!\u201d, meaning, create an additional column and code those data with numbers, oh, and provide a key in your data journal so the person behind you can figure out what precisely you were doing. \u00a0<\/p>\n\n\n\n<p>I don\u2019t have an elegant solution for this other than formatting the range of data as a table and using the filter and sort options to filter for color. Copy, and paste your numeric code in those spaces for each filtering option. 
&nbsp;<br>&nbsp;&nbsp;<\/p>\n\n\n\n<div style=\"height:24px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p><strong>Reconciling mismatches due to form design<\/strong> &nbsp;<br>&nbsp;<br>I encounter this all the time in repeated-measures designs. A participant is asked to do a pretest in one semester\/year, then a posttest in a separate semester\/year. The way to confirm that Participant1 at the pretest is Participant1 at the posttest is to have a unique identifier. The form designer asks for email. Great! Except it\u2019s free text, and Participant1 entered a different email at each time point, had a typo in one of them, and used their full name at one time point but a shortened nickname or a misspelled name at the other. Removing leading and trailing spaces won\u2019t help you there. &nbsp;<br>&nbsp;<br>I encountered a situation where the data collectors administered a pre\/posttest design in the same semester. They even used a forced-response option for the students to indicate what course they were enrolled in for the evaluation. Sounds good so far. However, I found out later that many of the courses were cross-listed, or had different names and numbers altogether depending on the host department of the enrollee\u2019s major field of study. All of the cross-listed courses were options and there were no screening questions to filter for that, so Participant234 selected one course at the pretest and a different one at the posttest, even though they were the same course held at the same time and taught by the same faculty. Excel doesn\u2019t know that. In large datasets, this can be challenging, but going back to your client and asking questions can yield a solution. My solution was to get a cross-listed course crosswalk, set a single identifier, and then use a formula to replace all of the cross-listed courses (into a new column, of course) with a single descriptor. 
&nbsp;<br>&nbsp;<br>There are more scenarios, but this is the one I encounter most often. &nbsp;<\/p>\n\n\n\n<p><strong>Connecting data for participants from multiple datasets <\/strong>&nbsp;<br>&nbsp;<br>Client A shares a folder with you containing 3 different forms, all with multiple tabs, and is scratching their head over how to connect the datasets with participant data, because the answer to their question lies in the connection of the three sources. Unfortunately for you, there was no unique identifier created to link all three, and <em>that\u2019s why you are there<\/em> (according to them). If I knew SQL, it might not be as big an issue, but I got my start in Excel, so I\u2019ll show you what I do in Excel to connect those sources, before OR after I\u2019ve created a UniqueID. Sometimes I use this method to HELP create a UniqueID: &nbsp;<br>&nbsp;<br>Excel has VLOOKUP, HLOOKUP, and now, XLOOKUP, but a nested INDEX(MATCH()) formula can be faster than those in larger sets, so I always use it (Excel XLOOKUP vs INDEX MATCH, 2024).&nbsp;<br>&nbsp;<br><strong>First, using table references is much less typing than ranges, so my first step in Excel is to ALWAYS create a table AND name it.<\/strong> &nbsp;<br>&nbsp;<br>How to use =INDEX(MATCH()) properly:&nbsp;&nbsp;<\/p>\n\n\n\n<p>1) For when you have a SINGLE UniqueID: Start in the table or sheet you want to pull data INTO, type =INDEX(OtherTable[ColumnName of data you want to get],MATCH([@[same row, but the column where your UniqueID lives]],OtherTable[Column where the same UniqueID lives],0))&nbsp;&nbsp;<\/p>\n\n\n\n<p>This will bring over the data you want, matching on a SINGLE criterion using an exact match (that\u2019s done by that \u201c0\u201d before the closing parentheses).&nbsp;<\/p>\n\n\n\n<p>2) For when you need to match on multiple criteria: {=INDEX(OtherTable[ColumnName of data you want to get],MATCH(1,([@[criteria col1]]=OtherTable[matching criteria column1])*([@[criteria col2]]=OtherTable[matching criteria 
column2])*(etc.),0))} <strong>&lt;&#8211; you get the {} by pressing CTRL + SHIFT + ENTER at the end of the formula to designate an array formula. In versions of Excel without dynamic arrays, it will return a whole column of #N\/A\u2019s if you don\u2019t! Also, you need to set your table to auto-write or flash-fill formulae to save time.<\/strong>&nbsp;&nbsp;<\/p>\n\n\n\n<div style=\"height:26px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p><strong>Finding duplicate entries <\/strong>\u00a0<br>Some of the most common mistakes I find in dirty data are duplicate entries the original owners didn\u2019t know they had. This is common when data collectors don\u2019t configure their survey platform to block duplicate entries. The result is that you will have two different answers from the same person, submitted within days or weeks of each other, for the same form. If Participant30 took the pretest twice and the posttest once, which pretest entry do you keep? \u00a0<br>\u00a0<br>a) Look for completion first, and if there is a deep disparity, keep the more complete submission. \u00a0<br>b) If they are both equally complete, negotiate with the client on what they believe is the more \u201cvalid\u201d response. In my references is a cool study about how this is done in a manufacturing process environment. That is the article by Eckert et al. (2022). If you don\u2019t have access to an institutional library, you may not be able to view it.\u00a0<\/p>\n\n\n\n<p><strong>Pairwise, listwise, or analysis-specific deletion, and why<\/strong> &nbsp;<br>&nbsp;<br>When do you use pairwise, listwise, or analysis-specific \u201cdeletion\u201d? I will say, in the famous words of Dr. Morrow (<a href=\"https:\/\/faculty.utk.edu\/Jennifer.Morrow\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/faculty.utk.edu\/Jennifer.Morrow<\/a>) \u201cIt depends\u201d. 
Each case calls for different handling, and there are several ways to go about this, but these two resources may help: &nbsp;<br>&nbsp;<br><a href=\"https:\/\/www.ibm.com\/support\/pages\/pairwise-vs-listwise-deletion-what-are-they-and-when-should-i-use-them\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/www.ibm.com\/support\/pages\/pairwise-vs-listwise-deletion-what-are-they-and-when-should-i-use-them<\/a> &nbsp;<br>&nbsp;<br><em>TWELVE STEPS OF QUANTITATIVE DATA CLEANING: STRATEGIES FOR DEALING WITH DIRTY DATA<\/em> by Morrow &amp; Skolits (2017) &nbsp;<br>&nbsp;&nbsp;<\/p>\n\n\n\n<div style=\"height:22px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p><strong>Trust, but verify <\/strong>&nbsp;<br>&nbsp;<br>Last, but not least: what if, just by chance, the original data owners made a data entry error themselves?&nbsp; &nbsp;<br>&nbsp;<br>*GASP* \u201cNEVER!\u201d It happens, trust me.&nbsp; &nbsp;<br>&nbsp;<br>I have encountered cases where, in the same column for the same survey item the categorical data in the cells had \u201c5 &#8211; Strongly Agree\u201d, and \u201c5 &#8211; Strongly Disagree\u201d, and \u201c1 &#8211; Strongly Disagree\u201d. Well, which is the right entry for those participants? The client did not have a copy of the originally developed form, and we had to go back and figure out the original scale, and since there were many entries where the categorical data were overwritten with straight numerical data in the same column (probably an errant \u201cfind &amp; replace\u201d operation), it was even harder to determine whether 5\u2019s were positive or negative, and if the \u201c5 &#8211; Strongly Disagree\u201d entries were supposed to be \u201c5 &#8211; Strongly Agree\u201d or \u201c1 &#8211; Strongly Disagree\u201d. 
&nbsp;<br>&nbsp;<br>Again, it took a negotiation with the client and a bit of data inference, guided by Morrow &amp; Skolits (2017) and Enders (2022), to infer their responses.&nbsp;&nbsp;<\/p>\n\n\n\n<p><strong>All in all, a lot of dealing with dirty data, especially when that data isn\u2019t your own, is, in my opinion, making collaborative choices with the owners of the data, documenting those choices, and defending those choices<\/strong>. The phrase \u201cgarbage in, garbage out\u201d may feel overused, but it is, nevertheless, true. Data cleaning, particularly in light of data equity concerns, is a much larger topic than this tiny little blog post can cover, but I hope this helps you along your journey of tidy data. If you have solutions that I just am not aware of (very likely), then feel free to pass them along to <a href=\"mailto:myoung96@vols.utk.edu\" target=\"_blank\" rel=\"noreferrer noopener\">myoung96@vols.utk.edu<\/a> (my UTK email). I love learning time-saving techniques, and I am willing to share my dirty data secrets too!&nbsp;<\/p>\n\n\n\n<div style=\"height:50px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n\n\n\n<p><strong>Additional Resources<\/strong>&nbsp;<\/p>\n\n\n\n<p><strong>Link to additional resources<\/strong><strong>: <\/strong><a href=\"https:\/\/etsu365-my.sharepoint.com\/:f:\/g\/personal\/youngma_etsu_edu\/Eux3mprTbkFBmNQspR3fXq8BT3lDaNzYJPM8YqA_o0Pu4g?e=DTD8OD\" target=\"_blank\" rel=\"noreferrer noopener\"><strong>Dirty Data<\/strong><\/a><strong><\/strong>&nbsp;<\/p>\n\n\n\n<p>Eckert, C., Isaksson, O., Hane-Hagstr\u00f6m, M., &amp; Eckert, C. (2022). My Facts Are not Your Facts: Data Wrangling as a Socially Negotiated Process, A Case Study in a Multisite Manufacturing Company. 
<em>Journal of Computing and Information Science in Engineering<\/em>, <em>22<\/em>(6), 060906.<a href=\"https:\/\/doi.org\/10.1115\/1.4055953\" target=\"_blank\" rel=\"noreferrer noopener\"> <\/a><a href=\"https:\/\/doi.org\/10.1115\/1.4055953\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/doi.org\/10.1115\/1.4055953<\/a>&nbsp;<\/p>\n\n\n\n<p>Enders, C. K. (2022). <em>Applied missing data analysis<\/em> (Second Edition). The Guilford Press.&nbsp;<\/p>\n\n\n\n<p><em>Excel XLOOKUP vs INDEX MATCH: Which is better and faster?<\/em> (2024, January 24). Ablebits.Com.<a href=\"https:\/\/www.ablebits.com\/office-addins-blog\/xlookup-vs-index-match-excel\/\" target=\"_blank\" rel=\"noreferrer noopener\"> <\/a><a href=\"https:\/\/www.ablebits.com\/office-addins-blog\/xlookup-vs-index-match-excel\/\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/www.ablebits.com\/office-addins-blog\/xlookup-vs-index-match-excel\/<\/a>&nbsp;<\/p>\n\n\n\n<p>JanChaPatGud36850. (2019, August 13). Characters in Excel. <em>Excel<\/em>.<a href=\"https:\/\/excelx.com\/characters\/list\/\" target=\"_blank\" rel=\"noreferrer noopener\"> <\/a><a href=\"https:\/\/excelx.com\/characters\/list\/\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/excelx.com\/characters\/list\/<\/a>&nbsp;<\/p>\n\n\n\n<p><em>Jennifer Ann Morrow Profile | University of Tennessee Knoxville<\/em>. (n.d.). Retrieved September 9, 2024, from<a href=\"https:\/\/faculty.utk.edu\/Jennifer.Morrow\" target=\"_blank\" rel=\"noreferrer noopener\"> <\/a><a href=\"https:\/\/faculty.utk.edu\/Jennifer.Morrow\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/faculty.utk.edu\/Jennifer.Morrow<\/a>&nbsp;<\/p>\n\n\n\n<p>Morrow, J. A., &amp; Skolits, G. (2017). <em>TWELVE STEPS OF QUANTITATIVE DATA CLEANING: STRATEGIES FOR DEALING WITH DIRTY DATA<\/em>. AEA 2017.&nbsp;<\/p>\n\n\n\n<p><em>Pairwise vs. Listwise deletion: What are they and when should I use them?<\/em> (2020, April 16). 
[CT741].<a href=\"https:\/\/www.ibm.com\/support\/pages\/pairwise-vs-listwise-deletion-what-are-they-and-when-should-i-use-them\" target=\"_blank\" rel=\"noreferrer noopener\"> <\/a><a href=\"https:\/\/www.ibm.com\/support\/pages\/pairwise-vs-listwise-deletion-what-are-they-and-when-should-i-use-them\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/www.ibm.com\/support\/pages\/pairwise-vs-listwise-deletion-what-are-they-and-when-should-i-use-them<\/a>&nbsp;<\/p>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>By M. Andrew Young Hello, my name is M. Andrew Young. I\u2019m a third-year Ph.D. student in the Evaluation, Statistics and Methodology program in the Educational Leadership &amp; Policy Studies department at the University of Tennessee. For the past 4, nearly 5 years now, I have served as a higher education evaluator as a Director [&hellip;]<\/p>\n","protected":false},"author":86,"featured_media":23696,"comment_status":"closed","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"_eb_attr":"","inline_featured_image":false,"_genesis_hide_title":false,"_genesis_hide_breadcrumbs":false,"_genesis_hide_singular_image":false,"_genesis_hide_footer_widgets":false,"_genesis_custom_body_class":"","_genesis_custom_post_class":"","_genesis_layout":"","footnotes":"","_links_to":"","_links_to_target":""},"categories":[44],"tags":[],"class_list":{"0":"post-23756","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-utkmadblog","8":"entry"},"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.8 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Common &quot;Dirty Data&quot; Problems I Encounter and How to Save Time Fixing Them - Educational Leadership and Policy Studies<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link 
rel=\"canonical\" href=\"https:\/\/cehhs.utk.edu\/elps\/common-dirty-data-problems-i-encounter-and-how-to-save-time-fixing-them\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Common &quot;Dirty Data&quot; Problems I Encounter and How to Save Time Fixing Them - Educational Leadership and Policy Studies\" \/>\n<meta property=\"og:description\" content=\"By M. Andrew Young Hello, my name is M. Andrew Young. I\u2019m a third-year Ph.D. student in the Evaluation, Statistics and Methodology program in the Educational Leadership &amp; Policy Studies department at the University of Tennessee. For the past 4, nearly 5 years now, I have served as a higher education evaluator as a Director [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/cehhs.utk.edu\/elps\/common-dirty-data-problems-i-encounter-and-how-to-save-time-fixing-them\/\" \/>\n<meta property=\"og:site_name\" content=\"Educational Leadership and Policy Studies\" \/>\n<meta property=\"article:published_time\" content=\"2024-10-01T11:00:00+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-01-02T15:28:50+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/cehhs.utk.edu\/elps\/wp-content\/uploads\/sites\/9\/2024\/07\/New-MAD-Blog-Design-Final-1-3.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"2240\" \/>\n\t<meta property=\"og:image:height\" content=\"1260\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Jonah Hall\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Jonah Hall\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"12 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/cehhs.utk.edu\\\/elps\\\/common-dirty-data-problems-i-encounter-and-how-to-save-time-fixing-them\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/cehhs.utk.edu\\\/elps\\\/common-dirty-data-problems-i-encounter-and-how-to-save-time-fixing-them\\\/\"},\"author\":{\"name\":\"Jonah Hall\",\"@id\":\"https:\\\/\\\/cehhs.utk.edu\\\/elps\\\/#\\\/schema\\\/person\\\/6bfd2c32cad7f6a411dfcf7e5dd65055\"},\"headline\":\"Common &#8220;Dirty Data&#8221; Problems I Encounter and How to Save Time Fixing Them\",\"datePublished\":\"2024-10-01T11:00:00+00:00\",\"dateModified\":\"2025-01-02T15:28:50+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/cehhs.utk.edu\\\/elps\\\/common-dirty-data-problems-i-encounter-and-how-to-save-time-fixing-them\\\/\"},\"wordCount\":2341,\"image\":{\"@id\":\"https:\\\/\\\/cehhs.utk.edu\\\/elps\\\/common-dirty-data-problems-i-encounter-and-how-to-save-time-fixing-them\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/cehhs.utk.edu\\\/elps\\\/wp-content\\\/uploads\\\/sites\\\/9\\\/2024\\\/07\\\/New-MAD-Blog-Design-Final-1-3.jpg\",\"articleSection\":[\"Evaluation Methodology Blog\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/cehhs.utk.edu\\\/elps\\\/common-dirty-data-problems-i-encounter-and-how-to-save-time-fixing-them\\\/\",\"url\":\"https:\\\/\\\/cehhs.utk.edu\\\/elps\\\/common-dirty-data-problems-i-encounter-and-how-to-save-time-fixing-them\\\/\",\"name\":\"Common \\\"Dirty Data\\\" Problems I Encounter and How to Save Time Fixing Them - Educational Leadership and Policy 
Studies\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/cehhs.utk.edu\\\/elps\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/cehhs.utk.edu\\\/elps\\\/common-dirty-data-problems-i-encounter-and-how-to-save-time-fixing-them\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/cehhs.utk.edu\\\/elps\\\/common-dirty-data-problems-i-encounter-and-how-to-save-time-fixing-them\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/cehhs.utk.edu\\\/elps\\\/wp-content\\\/uploads\\\/sites\\\/9\\\/2024\\\/07\\\/New-MAD-Blog-Design-Final-1-3.jpg\",\"datePublished\":\"2024-10-01T11:00:00+00:00\",\"dateModified\":\"2025-01-02T15:28:50+00:00\",\"author\":{\"@id\":\"https:\\\/\\\/cehhs.utk.edu\\\/elps\\\/#\\\/schema\\\/person\\\/6bfd2c32cad7f6a411dfcf7e5dd65055\"},\"breadcrumb\":{\"@id\":\"https:\\\/\\\/cehhs.utk.edu\\\/elps\\\/common-dirty-data-problems-i-encounter-and-how-to-save-time-fixing-them\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/cehhs.utk.edu\\\/elps\\\/common-dirty-data-problems-i-encounter-and-how-to-save-time-fixing-them\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/cehhs.utk.edu\\\/elps\\\/common-dirty-data-problems-i-encounter-and-how-to-save-time-fixing-them\\\/#primaryimage\",\"url\":\"https:\\\/\\\/cehhs.utk.edu\\\/elps\\\/wp-content\\\/uploads\\\/sites\\\/9\\\/2024\\\/07\\\/New-MAD-Blog-Design-Final-1-3.jpg\",\"contentUrl\":\"https:\\\/\\\/cehhs.utk.edu\\\/elps\\\/wp-content\\\/uploads\\\/sites\\\/9\\\/2024\\\/07\\\/New-MAD-Blog-Design-Final-1-3.jpg\",\"width\":2240,\"height\":1260},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/cehhs.utk.edu\\\/elps\\\/common-dirty-data-problems-i-encounter-and-how-to-save-time-fixing-them\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/cehhs.utk.edu\\\/elps\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Common 
&#8220;Dirty Data&#8221; Problems I Encounter and How to Save Time Fixing Them\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/cehhs.utk.edu\\\/elps\\\/#website\",\"url\":\"https:\\\/\\\/cehhs.utk.edu\\\/elps\\\/\",\"name\":\"Educational Leadership and Policy Studies\",\"description\":\"University of Tennessee, Knoxville\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/cehhs.utk.edu\\\/elps\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/cehhs.utk.edu\\\/elps\\\/#\\\/schema\\\/person\\\/6bfd2c32cad7f6a411dfcf7e5dd65055\",\"name\":\"Jonah Hall\",\"url\":\"https:\\\/\\\/cehhs.utk.edu\\\/elps\\\/author\\\/jhall152\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Common \"Dirty Data\" Problems I Encounter and How to Save Time Fixing Them - Educational Leadership and Policy Studies","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/cehhs.utk.edu\/elps\/common-dirty-data-problems-i-encounter-and-how-to-save-time-fixing-them\/","og_locale":"en_US","og_type":"article","og_title":"Common \"Dirty Data\" Problems I Encounter and How to Save Time Fixing Them - Educational Leadership and Policy Studies","og_description":"By M. Andrew Young Hello, my name is M. Andrew Young. I\u2019m a third-year Ph.D. student in the Evaluation, Statistics and Methodology program in the Educational Leadership &amp; Policy Studies department at the University of Tennessee. 
For the past 4, nearly 5 years now, I have served as a higher education evaluator as a Director [&hellip;]","og_url":"https:\/\/cehhs.utk.edu\/elps\/common-dirty-data-problems-i-encounter-and-how-to-save-time-fixing-them\/","og_site_name":"Educational Leadership and Policy Studies","article_published_time":"2024-10-01T11:00:00+00:00","article_modified_time":"2025-01-02T15:28:50+00:00","og_image":[{"width":2240,"height":1260,"url":"https:\/\/cehhs.utk.edu\/elps\/wp-content\/uploads\/sites\/9\/2024\/07\/New-MAD-Blog-Design-Final-1-3.jpg","type":"image\/jpeg"}],"author":"Jonah Hall","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Jonah Hall","Est. reading time":"12 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/cehhs.utk.edu\/elps\/common-dirty-data-problems-i-encounter-and-how-to-save-time-fixing-them\/#article","isPartOf":{"@id":"https:\/\/cehhs.utk.edu\/elps\/common-dirty-data-problems-i-encounter-and-how-to-save-time-fixing-them\/"},"author":{"name":"Jonah Hall","@id":"https:\/\/cehhs.utk.edu\/elps\/#\/schema\/person\/6bfd2c32cad7f6a411dfcf7e5dd65055"},"headline":"Common &#8220;Dirty Data&#8221; Problems I Encounter and How to Save Time Fixing Them","datePublished":"2024-10-01T11:00:00+00:00","dateModified":"2025-01-02T15:28:50+00:00","mainEntityOfPage":{"@id":"https:\/\/cehhs.utk.edu\/elps\/common-dirty-data-problems-i-encounter-and-how-to-save-time-fixing-them\/"},"wordCount":2341,"image":{"@id":"https:\/\/cehhs.utk.edu\/elps\/common-dirty-data-problems-i-encounter-and-how-to-save-time-fixing-them\/#primaryimage"},"thumbnailUrl":"https:\/\/cehhs.utk.edu\/elps\/wp-content\/uploads\/sites\/9\/2024\/07\/New-MAD-Blog-Design-Final-1-3.jpg","articleSection":["Evaluation Methodology 
Blog"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/cehhs.utk.edu\/elps\/common-dirty-data-problems-i-encounter-and-how-to-save-time-fixing-them\/","url":"https:\/\/cehhs.utk.edu\/elps\/common-dirty-data-problems-i-encounter-and-how-to-save-time-fixing-them\/","name":"Common \"Dirty Data\" Problems I Encounter and How to Save Time Fixing Them - Educational Leadership and Policy Studies","isPartOf":{"@id":"https:\/\/cehhs.utk.edu\/elps\/#website"},"primaryImageOfPage":{"@id":"https:\/\/cehhs.utk.edu\/elps\/common-dirty-data-problems-i-encounter-and-how-to-save-time-fixing-them\/#primaryimage"},"image":{"@id":"https:\/\/cehhs.utk.edu\/elps\/common-dirty-data-problems-i-encounter-and-how-to-save-time-fixing-them\/#primaryimage"},"thumbnailUrl":"https:\/\/cehhs.utk.edu\/elps\/wp-content\/uploads\/sites\/9\/2024\/07\/New-MAD-Blog-Design-Final-1-3.jpg","datePublished":"2024-10-01T11:00:00+00:00","dateModified":"2025-01-02T15:28:50+00:00","author":{"@id":"https:\/\/cehhs.utk.edu\/elps\/#\/schema\/person\/6bfd2c32cad7f6a411dfcf7e5dd65055"},"breadcrumb":{"@id":"https:\/\/cehhs.utk.edu\/elps\/common-dirty-data-problems-i-encounter-and-how-to-save-time-fixing-them\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/cehhs.utk.edu\/elps\/common-dirty-data-problems-i-encounter-and-how-to-save-time-fixing-them\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/cehhs.utk.edu\/elps\/common-dirty-data-problems-i-encounter-and-how-to-save-time-fixing-them\/#primaryimage","url":"https:\/\/cehhs.utk.edu\/elps\/wp-content\/uploads\/sites\/9\/2024\/07\/New-MAD-Blog-Design-Final-1-3.jpg","contentUrl":"https:\/\/cehhs.utk.edu\/elps\/wp-content\/uploads\/sites\/9\/2024\/07\/New-MAD-Blog-Design-Final-1-3.jpg","width":2240,"height":1260},{"@type":"BreadcrumbList","@id":"https:\/\/cehhs.utk.edu\/elps\/common-dirty-data-problems-i-encounter-and-how-to-save-time-fixing-them\/#breadcrumb","itemListElement":[{"@type":"ListI
tem","position":1,"name":"Home","item":"https:\/\/cehhs.utk.edu\/elps\/"},{"@type":"ListItem","position":2,"name":"Common &#8220;Dirty Data&#8221; Problems I Encounter and How to Save Time Fixing Them"}]},{"@type":"WebSite","@id":"https:\/\/cehhs.utk.edu\/elps\/#website","url":"https:\/\/cehhs.utk.edu\/elps\/","name":"Educational Leadership and Policy Studies","description":"University of Tennessee, Knoxville","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/cehhs.utk.edu\/elps\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/cehhs.utk.edu\/elps\/#\/schema\/person\/6bfd2c32cad7f6a411dfcf7e5dd65055","name":"Jonah Hall","url":"https:\/\/cehhs.utk.edu\/elps\/author\/jhall152\/"}]}},"featured_image_src":"https:\/\/cehhs.utk.edu\/elps\/wp-content\/uploads\/sites\/9\/2024\/07\/New-MAD-Blog-Design-Final-1-3-600x400.jpg","featured_image_src_square":"https:\/\/cehhs.utk.edu\/elps\/wp-content\/uploads\/sites\/9\/2024\/07\/New-MAD-Blog-Design-Final-1-3-600x600.jpg","author_info":{"display_name":"Jonah 
Hall","author_link":"https:\/\/cehhs.utk.edu\/elps\/author\/jhall152\/"},"_links":{"self":[{"href":"https:\/\/cehhs.utk.edu\/elps\/wp-json\/wp\/v2\/posts\/23756","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/cehhs.utk.edu\/elps\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/cehhs.utk.edu\/elps\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/cehhs.utk.edu\/elps\/wp-json\/wp\/v2\/users\/86"}],"replies":[{"embeddable":true,"href":"https:\/\/cehhs.utk.edu\/elps\/wp-json\/wp\/v2\/comments?post=23756"}],"version-history":[{"count":0,"href":"https:\/\/cehhs.utk.edu\/elps\/wp-json\/wp\/v2\/posts\/23756\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/cehhs.utk.edu\/elps\/wp-json\/wp\/v2\/media\/23696"}],"wp:attachment":[{"href":"https:\/\/cehhs.utk.edu\/elps\/wp-json\/wp\/v2\/media?parent=23756"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/cehhs.utk.edu\/elps\/wp-json\/wp\/v2\/categories?post=23756"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/cehhs.utk.edu\/elps\/wp-json\/wp\/v2\/tags?post=23756"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}