Taking full advantage of the relative freedom that comes with adolescence, my buddies and I regularly made a pit stop at the local pizza parlor in lieu of heading directly home to hours of homework. Apt to spend any money I had on a slice or two and Italian donuts, I was more than willing to take the Pepper Challenge – eat a whole habanero and earn not just one, but a whole bag of sugar-dusted treats! I was sure to be the envy of my peers. As a regular consumer of pepper-loaded pizza, I didn’t see how the challenge could be much different. Suspicious nonetheless, I covered my bases by confirming with the head of house that the pizza used the same variety: check! Armed with my friends’ collective testimony to my proven pepper tolerance, I was ready for go time… followed by a wildly unpleasant surprise. It turns out the habaneros used for pizza toppings didn’t include the fire-wielding seeds. My mouth was in flames for longer than I care to remember, and needless to say, I was not the one enjoying my hard-earned dessert.
I never took the Pepper Challenge again, but the lesson stuck with me: acting on incomplete information is risky and prone to poor outcomes. I was acting rationally with the information I had, but I didn’t ask the right questions to accurately weigh cost against benefit. Having lived in the world of data for more than two decades now, I’ve seen parallel yet far more unfortunate scenarios when businesses don’t ask key questions about the data they use to guide important decisions.
In 2016, IBM estimated that bad data cost the US over $3 trillion annually. With so much at stake, it’s imperative that organizations ask at least 5 key questions about the data they use, to avoid making improper and potentially costly decisions driven by distorted information.
To understand data in proper context, here’s what to ask:
The most critical question – and the one most often overlooked – is who the data actually represents. When data is being used to drive decisions, it needs to be as accurate as possible. In the world of research, the sample describes the pool of respondents from whom data is collected and the population they represent. Ideally, the sample is an identical but much smaller subset of the population of interest. In national studies, random address-based probability sampling achieves this by actively recruiting the right mix of participants, determined by balancing the sample’s distribution of key demographics to match the US Census. Alternatively, convenience sampling, which is typically faster and cheaper, obtains respondents who opt in to participate. This method of recruitment often skews the representativeness of the data and can lead to biases in analysis that are difficult to identify. To account for these skews, good providers routinely calibrate, re-balance, and/or supplement with information from broadly recognized sources of truth. Be sure to ask your data providers about their sample design and how they correct for any skews that might be present in their data.
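To make the re-balancing idea concrete, here is a minimal sketch of post-stratification weighting: each demographic cell in an opt-in sample is assigned a weight equal to its target population share divided by its observed share. The cells, counts, and "census" shares below are invented for illustration, and real providers use far more elaborate procedures (raking across many variables, trimming extreme weights, and so on).

```python
# Illustrative post-stratification: re-weight an opt-in sample so its
# demographic mix matches external population targets (e.g., the US Census).
# All numbers here are hypothetical, not actual census or survey figures.

def poststratify(sample_counts, population_shares):
    """Return a weight per demographic cell: target share / observed share."""
    total = sum(sample_counts.values())
    return {
        cell: population_shares[cell] / (count / total)
        for cell, count in sample_counts.items()
    }

# Hypothetical opt-in panel skewed toward younger respondents (n = 1,000).
sample_counts = {"18-34": 600, "35-54": 250, "55+": 150}
population_shares = {"18-34": 0.30, "35-54": 0.33, "55+": 0.37}

weights = poststratify(sample_counts, population_shares)
# Over-represented cells get weights below 1; under-represented cells above 1.
for cell, w in weights.items():
    print(f"{cell}: weight {w:.2f}")
```

In this toy example the over-sampled 18–34 group is cut to half weight, while the under-sampled 55+ group counts roughly two and a half times.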
For automatically collected passive or transactional data, it is important to know where the data comes from and which consumers/devices are being measured. Using additional external sources can verify whether these data provide an accurate perspective. For example, if you are trying to understand smartphone behaviors and you’re looking at a dataset composed solely of signals from Android devices, you would be missing over half of US smartphone users1 – which could lead to missteps.
The manner and context in which data is collected greatly impacts its reliability. This is especially true of survey questions, which, when biased, can heavily skew responses. Imagine trying to understand the relative popularity of bourbon versus scotch. Asking consumers directly, “Which do you prefer, bourbon or scotch?” is one option, but it implies the respondent has both experience with and a preference for one of the two liquors. An unbiased approach asks about behavior instead, for example: “In the past 6 months, which types of liquor have been purchased in your household, and how many times?” This manner of questioning yields more accurate responses and provides additional context: consumption of other liquors (rum, for example) and households that don’t consume liquor at all.
Just as apples can’t be compared to oranges, results from two similar but different datasets can’t be reliably compared without accounting for their differences. Understanding inconsistencies arms data-driven decision-makers with the information needed to best interpret, compare, or integrate the data they are working with. In a survey, maintaining consistent verbiage, volume levels, and agreement scales is crucial to making accurate comparisons within a single dataset and across time periods. Ensuring consistency is even more important when comparing and contrasting data from disparate datasets.
Knowing how long a data provider has been around is an important consideration. The ability to compare a current data point to a trend over time provides meaningful context for interpretation. Without it, inferring reliable insights is a difficult if not impossible task. The MRI-Simmons Cord Evolution Study tells us that in 2019, 32% of American adults were cordless TV viewers2. What does that mean, though? Is that a lot, or a little? Stable, or growing? The answer is that the cordless population is growing, and quickly: since 2016, it has grown a relative 52%, up from 21% of Americans3. The ability to perceive this upward trend puts 32% in context and sets a frame of reference for future expectations.
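The distinction between absolute and relative change in the cord-cutting example is worth making explicit; this short sketch simply reproduces the arithmetic behind the percentages quoted above.

```python
# Trend math behind the cord-cutting example: a move from 21% of adults
# in 2016 to 32% in 2019 is 11 percentage points in absolute terms,
# but a relative increase of roughly 52% in the size of the group.
cordless_2016 = 0.21
cordless_2019 = 0.32

absolute_change = cordless_2019 - cordless_2016    # 11 percentage points
relative_change = absolute_change / cordless_2016  # ~0.52, i.e. ~52% growth

print(f"Absolute: +{absolute_change:.0%} pts, relative: +{relative_change:.0%}")
```

Note that without the 2016 baseline, neither figure could be computed – which is exactly why a provider’s trend history matters.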
The world of data collection is fraught with potential pitfalls: inconsistent sampling, poor survey design, skewed data collection, sub-ideal respondents, methodological changes, human error, and more. While consistent and sound research design is foundational to mitigating these problems, backend systems must also be employed to identify and correct for unforeseen anomalies. Volatile swings in aggregate behaviors and attitudes are not common; when they are observed in data, organizations need the peace of mind that they are seeing an accurate reflection of the environment being measured – not issues with data quality. Make it a priority to confirm that your data providers routinely examine anomalies and take the necessary steps to ensure a truthful reflection of reality. And don’t just take their word for it – ask if they’re regularly audited to verify that their practices are indeed what they say+.
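As a toy illustration of the kind of automated screening such backend systems might perform (not any provider’s actual QC pipeline), here is a robust z-score check that flags volatile swings in a hypothetical weekly series for human review:

```python
# A minimal sketch of automated anomaly screening: flag any period whose
# value sits far from the series' typical range (here, more than 3 robust
# z-scores from the median). Real QC pipelines are far more sophisticated;
# the data below is invented purely for illustration.
from statistics import median

def flag_anomalies(series, threshold=3.0):
    med = median(series)
    mad = median(abs(x - med) for x in series) or 1e-9  # median absolute deviation
    # 1.4826 scales MAD to approximate a standard deviation for normal data.
    return [i for i, x in enumerate(series)
            if abs(x - med) / (1.4826 * mad) > threshold]

# Hypothetical weekly "share of adults reporting behavior X" (percent).
weekly_share = [31.8, 32.1, 31.9, 32.3, 45.0, 32.0, 31.7]
print(flag_anomalies(weekly_share))  # week 4's spike is flagged for review
```

A flagged point is not automatically an error – it may be a genuine shift in behavior – but it is a prompt for a human to investigate before the number reaches a decision-maker.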
Now more than ever, there is no shortage of available data – which can be overwhelming and often conflicting. By asking the 5 questions above, however, organizations can protect themselves from getting burned and instead enjoy the rewards of data-driven decision making.
At MRI-Simmons, we understand that achieving a reliable dataset doesn’t happen overnight. It takes years and even decades of experience to be able to benchmark the consumption of over 6,500 products and services, across roughly 600 categories. Our team of seasoned professionals has the flexibility to shift methods and content when appropriate, all the while maintaining transparency and above all, accuracy.
+MRI’s Survey of the American Consumer has held MRC accreditation since 1998. The MRC assures that audience measurement services are valid, reliable, and effective. As part of its annual accreditation process, the MRC conducts rigorous audits of MRI’s methodology, fieldwork, analytics, and data handling systems.
Discover the latest about our evolving country by downloading our free COVID tracking reports and webinars.
1Source: Spring 2020 NHCS Adult Study 12-month
2Source: 2019 MRI November Cord Evolution W12
3Source: 2016 October Cord Evolution W3