The process of data collection has inherent flaws.

Most noticeably, some of the surveys fielded come back incomplete, and some of the information supplied is incorrect. That doesn’t stop companies from collecting millions of records and making decisions based on them, even though some of those records are incomplete or incorrect. The key to making use of this potentially invalid data is data validation, and data visualization provides a new way to determine whether your data is valid.

The concept of data validation is not new – most companies already believe they are doing data validation in one form or another. Whether they employ partial-record abandonment, cross-referencing for outliers, or some combination of the two, the data is being scoured for possible irregularities. However, to truly see whether data has been validated, a better way of viewing the potential problems with the dataset is needed.
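To make the two traditional checks concrete, here is a minimal Python sketch. It assumes survey records held as dictionaries; the field names, the sample values, and the choice of Tukey's interquartile-range rule for spotting outliers are illustrative assumptions, not a description of any particular company's process.

```python
from statistics import quantiles

# Hypothetical survey records; None marks an unanswered question.
records = [
    {"id": 1, "age": 34, "income": 52000},
    {"id": 2, "age": 29, "income": None},
    {"id": 3, "age": 41, "income": 61000},
    {"id": 4, "age": 38, "income": 58000},
    {"id": 5, "age": 45, "income": 950000},
    {"id": 6, "age": 31, "income": 49000},
    {"id": 7, "age": 27, "income": 47000},
    {"id": 8, "age": 52, "income": 55000},
]
REQUIRED = ("age", "income")  # assumed set of required fields


def drop_partial(rows, required=REQUIRED):
    """Partial-record abandonment: keep only rows with every required field."""
    return [r for r in rows if all(r.get(f) is not None for f in required)]


def flag_outliers(rows, field, k=1.5):
    """Return rows whose value falls outside the Tukey fences (k * IQR)."""
    values = [r[field] for r in rows]
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [r for r in rows if not low <= r[field] <= high]


complete = drop_partial(records)
suspect = flag_outliers(complete, "income")
print(len(complete), "complete records; suspect ids:", [r["id"] for r in suspect])
```

With this sample data, record 2 is dropped as partial and record 5 is flagged for its income. Note that both checks return lists of rows rather than a picture, which is the limitation the rest of this article addresses.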

Where Does Flawed Data Come From?

The most common flaw in a dataset is missing data, typically in the form of partial records. This can be a big problem – especially when inferences are drawn from multiple pieces of combined data, because a gap in any one piece weakens the whole inference.

However, missing data is just a symptom of a larger issue. Frequently, gaps in data come from respondent fatigue – the tendency for people to abandon a survey when they grow tired of the task. More concerning is the tendency of respondents to rush through a survey and provide incorrect data, simply so they can finish.

While this false data can be found and removed using traditional statistical analysis, this process is:

  • Expensive
  • Laborious
  • Still not 100% effective
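As a rough illustration of what such an analysis involves, the sketch below flags responses that look rushed. It assumes the survey tool records completion time in seconds, which not every tool does. The one-third-of-median speed cutoff and the "identical answer to every question" test are hypothetical rules of thumb chosen for the example.

```python
from statistics import median

# Hypothetical responses: seconds spent and answers on a 1-5 rating scale.
responses = [
    {"id": "a", "seconds": 310, "answers": [4, 2, 5, 3, 4, 1]},
    {"id": "b", "seconds": 45, "answers": [3, 3, 3, 3, 3, 3]},
    {"id": "c", "seconds": 280, "answers": [5, 4, 4, 2, 3, 5]},
    {"id": "d", "seconds": 295, "answers": [2, 2, 2, 2, 2, 2]},
    {"id": "e", "seconds": 60, "answers": [1, 4, 2, 5, 3, 2]},
]


def flag_rushed(rows, speed_ratio=1 / 3):
    """Map response id -> reasons the response looks rushed."""
    # Assumed rule: anything faster than a fraction of the median is suspect.
    cutoff = median(r["seconds"] for r in rows) * speed_ratio
    flags = {}
    for r in rows:
        reasons = []
        if r["seconds"] < cutoff:
            reasons.append("too fast")
        if len(set(r["answers"])) == 1:  # same answer to every question
            reasons.append("straight-lined")
        if reasons:
            flags[r["id"]] = reasons
    return flags


print(flag_rushed(responses))
```

Even in this small example the thresholds are judgment calls: respondent "d" may sincerely hold the same opinion on every question, which is one reason this kind of filtering is never fully effective.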

Identify and Prevent the Gaps in Data with Data Visualization

Identifying problem areas in your data can be done with a visualization designed to highlight the “Swiss cheese” effect that comes from incomplete data. This use of visualization for data validation is basic but important: it shows at a glance where the holes are.
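A plain-text stand-in for such a visualization can be sketched in a few lines of Python. The question names and sample answers are hypothetical, and a real project would render the same grid as a chart rather than as characters.

```python
# Hypothetical survey export: one row per respondent, None = unanswered.
FIELDS = ["q1", "q2", "q3", "q4", "q5"]
rows = [
    [5, 3, 4, 2, 1],
    [4, 4, None, None, None],
    [2, None, 3, 5, 4],
    [3, 5, 2, None, None],
    [1, 2, 4, 3, None],
]


def missing_map(data):
    """Render one text line per record: '#' = answered, '.' = missing."""
    return ["".join("." if v is None else "#" for v in row) for row in data]


def missing_rate(data, fields=FIELDS):
    """Share of records with no answer, per field."""
    return {
        f: sum(row[i] is None for row in data) / len(data)
        for i, f in enumerate(fields)
    }


for line in missing_map(rows):
    print(line)
print(missing_rate(rows))
```

Each printed line is one respondent, and the dots are the holes in the cheese. In this sample the share of missing answers climbs from q1 to q5, the kind of pattern that would point to respondents giving up partway through.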

Where data visualization really shines is in helping to create a data collection process that is as streamlined as possible, one that collects only the information actually needed to make decisions.

Understand What Data is Necessary with Data Visualization

Many projects implement data visualization as one of the final steps of the process. They start by focusing on collecting the data, specifically collecting ALL the data they feel will be necessary.

This is a mistake.

Instead, visualization should be planned from the beginning. By going through the data visualization process while designing and planning a project, companies can see how the data they collect will be used. This benefit comes from two aspects of the process:

  • Project Planning – During the project planning phase, a good data visualization firm will analyze the data itself (or the planned data, if none has been collected). This analysis, combined with an understanding of the visualization’s objectives, will allow the firm to show which data is relevant and which is not.
  • Basic Preliminary Visualizations – The early stages of project design and planning should involve the creation of preliminary visualizations. These will show what the collected data is saying and allow trends and outliers to be seen easily. A preliminary visualization is often the tool that finds previously unseen flaws in the data gathering process.

Data visualization plays a key role in collecting valid data. Initially, the visualization planning process highlights exactly what data should be collected and allows for the creation of a streamlined collection process (thereby reducing respondent fatigue). Then, a separate visualization can show where data gaps are appearing so that, if necessary, the collection process can be adapted. Making these changes mid-stream, rather than re-collecting data after the original process has finished, saves significant cost and time (and makes for a more successful project).

If your data validation process isn’t as proactive as it could be, or if you’re seeing a significant amount of your data invalidated, contact Boost Labs to find out more about how a visualization can help you.