The Simplified Guide to Understanding Statistics in the Social Sciences, Part II: Reliability and Validity

By Kelly Tabbutt

This post is the second in a two-part series called “The Simplified Guide to Understanding Statistics in the Social Sciences” on our blog. The first post in this series was about decoding numbers and graphs in statistics.

Valid and reliable data is the lens by which one can gain insight into the world at large.

Numbers You Can Count On

If numbers and graphs (in other words, the data) are the bread and butter of social statistics, the validity and reliability of that data are the measuring cup and oven.

Validity concerns whether the data is an accurate reflection of reality.

Reliability concerns whether the data is consistent for the same respondent across surveys and is entered consistently by each analyst and across analysts.

When dealing with statistics, consider whether the data you are looking at accurately reflects reality (is valid) and was collected in a consistent manner (is reliable).

Let’s discuss reliability and validity in more detail to help you understand how they help present a more truthful (complete and accurate) picture of the data.

Validity

Validity, as I mentioned, refers to whether the data accurately reflects the real world. In other words, validity asks the question, “Is the data measuring what it purports to be measuring: the right population, characteristic, category, or phenomenon?” Further, validity refers to whether the sample of people you are looking at mirrors the entire population of people you are trying to learn about, in terms of gender, ideology, race/ethnicity, education, income, etc.

Validity is important in statistics. If the data doesn’t measure what you say it is measuring, your data cannot be trusted to provide a clear picture of what you set out to study. In fact, one might argue that data which does not meet the criteria of validity is effectively useless. For example, many population surveys measure the immigrant population. If you are conducting a study of immigrant populations, but you include people whose parents are immigrants but who themselves were born and raised in the US, you have an invalid measure.

Skilled statisticians and pollsters know how to bake validity into their research and data collection endeavors for the best results which accurately reflect things happening in the real world. The best, most effective polls and surveys, after all, tell you something you did not know about the world.

A key to measurement validity is to construct clear and complete definitions of exactly what you are seeking to measure. Then use these definitions as you design your survey or poll measurements.

Reliability

While validity concerns whether the data is truly measuring what it set out to measure, reliability refers to the way the data was collected. Reliable data is collected in a consistent, complete, and accurate fashion to meet the intended purposes. Reliable data also has not been inappropriately altered. In this way, reliability ensures the completeness and accuracy of the data.

Issues with reliability typically come about as a result of human error or due to bugs and malware that corrupt data. Data reliability problems can also result from poor survey design. For example, someone answering a survey twice could respond differently on the same questions, which introduces unreliability in the data. If the survey responses are qualitative and are being scored (quantified) by different analysts, this could also introduce unreliability.

Data reliability should be double-checked by briefing and checking in on data entry and data analyst staff to make sure everyone is using the same definitions of responses and entering the data correctly. 

Who Counts in Statistics?

Understanding Population Sampling

A political pollster who wants to learn about people’s preferences for an upcoming presidential election does not have the time or money to ask every single person who may vote. Instead, the pollster constructs what is called a “sample population” which can be thought of as a representative subset of the entire population of registered voters who will likely be voting. The sample population is a mere fraction of the overall population, but its goal is to capture the overall characteristics of the larger population.

As I just mentioned, the sample population is never the entire population of a city, state, or nation. Even nationally representative data such as that used in the US Census Bureau’s American Community Survey (ACS) — the long form version of the census — is only collected from a randomly selected subset of the national population. 

While every household in the US is invited to participate in the short form version of the census (which collects information such as the number of individuals in the household, gender, and racial and ethnic identification), the long form ACS is only sent to about one in every 480 households. In other words, the sample population for the ACS is one US household out of every 480 US households.

In statistics, the sample population is meant to be an accurate subset of the entire population. In the case of the ACS, the sample population, while randomly selected, is also selected in such a way as to ensure demographic representativeness and “good geographic coverage.” In other words, the sample population for the ACS is chosen in a way that ensures variation in respondents along the lines of gender, race and ethnicity, household income, and geographic location, among other characteristics. 

When interpreting statistics, it is vital to understand who is included, or at least targeted, in the sample population. That is, you must understand who this population is supposed to represent, and how the sample population and the population it represents compare to other populations.

How to Create the Best Sample Population

The ideal sample population would represent everyone in the general population at large, so that you can easily generalize from the sample population to everyone in the population. The truth is that it’s impossible to create a sample population that exactly mirrors every single person in the world. However, being aware of potential biases in sampling — conveniently called sampling bias — can go a long way in understanding whether a sample population is representative enough to be useful for your poll, survey, or other research study.

To understand the sample population and who it represents (and who it doesn’t represent), you want to ask these types of questions:

  1. Who is the sample population? In other words, who was surveyed? (for example, those attending an event)
  2. Who do those surveyed represent? (for example, all voters in a large urban city)
  3. How does the sample population compare to those who were not sampled? (for example, are those who attended the event where they were surveyed likely to differ from those who didn’t in any important way, such as in political leaning or level of political participation?)
  4. How does the group represented in your sample population compare to other populations? (for example, are the voting patterns of a population in a large urban city comparable to those of a small rural city?)

Sampling Strategies for the Best Sample Population

Two important strategies which should both be used in population sampling are representative sampling and random sampling. These sampling strategies ensure that the data collected is as free of sampling bias as possible, and therefore can be generalized to the entire population.

Representative Sampling

One sampling strategy is called representative sampling. According to Investopedia, a representative sample is “a group or set chosen from a larger statistical population according to specified characteristics.” Obtaining a representative sample is a goal in social sciences research, because insights gleaned from a representative sample can be generalized to the population at large.

A representative sample population, also called a representative sample, is a subset of the entire population that accurately reflects the characteristics of the entire population. A representative sample has what is called “generalizability” — the findings from the sample population are generalizable to the entire population.

For a sample population to be representative, that is, for the findings from this sample population to be generalizable to the entire population they represent, the sample population must have been selected randomly and the characteristics – or demographics – of the sample population must mirror those of the larger population. 

When a sample is not representative of the larger population, the resulting gap between what the sample shows and what is true of the population is known as sampling error.

Random Sampling

There are entire volumes of books on statistical methods that are devoted to how to conduct random sampling. It can involve complicated mathematical equations and tedious methods of collecting information from individuals.

The basic concept in random sampling is easy to understand: you choose people from the general population at random, so that everyone has the same chance of being selected. Choosing people this way reduces the chance that you oversample one particular group of people or another, which reduces sampling bias.

Random sampling is designed to ensure that you are capturing the experiences of a variety of people in your sample, which helps ensure that your findings are not an artifact of whom you chose for your sample or how you chose them.
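To make the idea concrete, here is a minimal Python sketch of a simple random sample. The population, its 30% rural share, and the function names are all hypothetical and chosen only for illustration; real surveys use far more elaborate designs.

```python
import random

# Hypothetical population: 10,000 people, 30% of whom live in rural areas.
population = ["rural"] * 3000 + ["urban"] * 7000

def simple_random_sample(people, size, seed=None):
    """Draw `size` people without replacement; each person is equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(people, size)

def share(people, label):
    """Fraction of `people` carrying the given label."""
    return sum(1 for p in people if p == label) / len(people)

sample = simple_random_sample(population, 1000, seed=42)
print(f"rural share in population: {share(population, 'rural'):.3f}")
print(f"rural share in sample:     {share(sample, 'rural'):.3f}")
```

Because nobody is hand-picked, the rural share in the sample should land close to the rural share in the population, though rarely on exactly the same number.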

Random Sampling Reduces Sample Bias and Sampling Error

Individuals with different life experiences, circumstances, and characteristics will have different points of view and patterns of behavior. Sampling error and sampling bias both can be a product of a pollster or researcher’s inability to tap into this diversity of opinion.

In order to generalize from a sample population to the entire population, the sample must mirror the demographics of that entire population. Demographics refer to the proportions of the population that share certain characteristics, for example the percent who are a certain race or ethnicity, the percent male, or the percent at incomes below $50,000 per year.
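One rough way to check whether a sample mirrors the population is to compare the share of each demographic group in the sample against a known benchmark. The sketch below does this in Python; the income brackets, the benchmark shares, and the sample counts are made up for illustration.

```python
from collections import Counter

def shares(groups):
    """Map each group label to its proportion of the list."""
    counts = Counter(groups)
    total = len(groups)
    return {label: count / total for label, count in counts.items()}

def largest_gap(sample_groups, population_shares):
    """Largest absolute difference between sample and population shares."""
    sample_shares = shares(sample_groups)
    return max(
        abs(sample_shares.get(label, 0.0) - target)
        for label, target in population_shares.items()
    )

# Hypothetical benchmark: income brackets in the full population.
population_shares = {"under_50k": 0.40, "50k_to_100k": 0.35, "over_100k": 0.25}
# Hypothetical sample of 200 respondents.
sample_groups = ["under_50k"] * 60 + ["50k_to_100k"] * 80 + ["over_100k"] * 60

gap = largest_gap(sample_groups, population_shares)
print(f"largest gap between sample and population: {gap:.2f}")
```

In this made-up sample, people earning under $50,000 make up 30% of respondents but 40% of the population, a 10-point gap that would be worth investigating before generalizing.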

Random sampling is one of the best tools to be able to produce a useful population sample. The point of random sampling is to make sure that the people you survey do not all share the same characteristics such as political leaning, income level, gender, race, education level, area of residence (such as neighborhood), etc. With a large enough sample, random sampling also makes it likely that the demographic makeup of the sample population mirrors the proportions of individuals who have those characteristics in the entire population.

Garbage In, Garbage Out

As the saying goes, “garbage in, garbage out.”  Quality data yields quality insights. When it comes to statistics, data quality is largely based on how representative the sample population is. Representativeness or generalizability is a product of how well the sample represents the entire population. Thus, to judge how accurate the statistics are, you must think about how the sample was gathered; how the data was interpreted; and who these methods might have excluded.

What is Included in the Data? 

Ask the following important questions to determine whether the method of gathering information might have missed important segments of the population.

  1. How was the data gathered? For example, was it gathered by calling telephone landlines? In that case, it would have missed those who do not have a landline as well as those with landlines who ignore unknown numbers. Was the data gathered at a local church? If so, it would have missed those who either do not practice that faith, or otherwise do not regularly attend that church.
  2. Who gathered the data? Was the data gathered by professionals or lay people? Those who conduct surveys, questionnaires, or interviews can either encourage or alienate certain individuals or groups from participating, simply by nature of belonging (or not) to the same social categories (e.g. gender, race, class) as those who are answering the questions. 
  3. How were the questions worded? If the questions were worded using complex or technical language, those who do not understand this language may have skipped those questions or misunderstood them. If the wording includes sensitive language, it may have alienated certain individuals.

What Isn’t Included in the Data?

Equally important in determining the accuracy of statistics in terms of the quality of data is knowing what data might be missing. What information wasn’t gathered? Ask yourself the following questions:

  1. How were survey answers coded by the data analysts? If the questionnaire, interview, or survey included an option analogous to “none of the above” or “choose not to answer”, or alternatively “all of the above”, how were these selections interpreted? Were they disregarded? Were they categorized as negative or affirmative? If these types of answers were ignored or miscategorized, the true nature of the respondents’ answers may have been misrepresented or misinterpreted, leading to an effective lack of information.
  2. Did the questions cover the full spectrum of relevant questions? For example, if surveying individuals about whether they would vote for a political candidate, were they also asked about which political party they belong to? Were they asked how they voted in the last election? These questions are secondary to the purpose of the survey, but provide key information for understanding the context of the survey or poll participants’ voting behavior. Omitting these questions in the survey may obscure your understanding of the meaning and context of the data collected.
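The first question above, about how answers were coded, can change a headline number more than you might expect. Here is a small Python sketch with hypothetical tallies for a single yes/no question; the counts and the two coding rules are assumptions for illustration only.

```python
REFUSAL = "choose not to answer"

# Hypothetical tallies for a single yes/no question.
responses = ["yes"] * 45 + ["no"] * 35 + [REFUSAL] * 20

def percent_yes(answers, refusals_as):
    """Percent answering "yes" under a given coding rule for refusals.

    refusals_as="missing" drops refusals from the calculation;
    refusals_as="no" counts each refusal as a "no".
    """
    if refusals_as == "missing":
        coded = [a for a in answers if a != REFUSAL]
    elif refusals_as == "no":
        coded = ["no" if a == REFUSAL else a for a in answers]
    else:
        raise ValueError("refusals_as must be 'missing' or 'no'")
    return 100 * coded.count("yes") / len(coded)

print(percent_yes(responses, "missing"))  # refusals dropped
print(percent_yes(responses, "no"))       # refusals counted as "no"
```

With these made-up numbers, the same answers produce 56.25% “yes” when refusals are dropped and 45% “yes” when refusals are counted as “no”. One coding choice reports a majority and the other does not.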

Interpreting the Data: Is it Reliable and Valid?

As you’ve learned, a large part of interpreting data is making sure statistics mean what they say they mean. So far, in this blog post, we’ve talked about several ways to construct good surveys. Another important aspect is being able to interpret the data properly, looking at sources and places where validity and reliability may be compromised.

Can I Trust this Data?

When you first see a social sciences statistic, such as a political survey or poll, ask yourself two questions:

  1. Are these numbers from a legitimate source?
  2. What’s at stake here?

Look at where the statistic comes from – any reliable news source should list this. It is best practice to analyze the reliability of the source. Find out what, if any, conflicts of interest may have come up when collecting and interpreting these statistics.

To give a real-world example, think about political polling – that is, the polls that gather information on the number of Democratic, Republican, or Independent (or other third-party) voters. Particularly in a presidential election year, there are numerous organizations, companies, and media outlets which gather information to try to predict how people will vote in the upcoming election. One goal of these organizations is to predict the winner of the election or identify areas in which a competing candidate can improve their messaging by learning about voter preferences. However, many of these sources are not neutral (nonpartisan), meaning that many of these sources support either a particular candidate or a particular party. These polls may even be funded by sources representing that candidate or party.

Partisan polling is the most obvious example of a biased and unreliable source for statistics on voter preferences. Such poll results, which are typically skewed toward one candidate or another, are sometimes also announced in an effort to sway an election towards one candidate or another. For example, if a new poll states that one candidate has a 10-point lead over another, this may encourage undecided voters to opt for the candidate that appears to be winning. In fact, this problem is so bad that public opinion of political polls may be faltering.

When considering what’s at stake for polling organizations, also consider who is funding the polling organization. If the organization is funded by a political party or Political Action Committee (PAC), producing a “favorable” report could affect the amount of funding it receives, whether it receives any funding at all, and even its reputation with that funder. If poll results are not favorable to the funding source, the polling organization’s future work with the political group may be in jeopardy.

Go to the Source (And Favor Objective Sources)

When looking at a statistic, it is best practice to go to the source of the statistic. The source of the statistic should tell you what population the statistic is drawn from and may tell you who collected it, when and how they collected it, and whom to contact if you have any questions.

When reading statistics such as political or opinion polling, always consider the source.

The most reliable statistics will be produced and funded in a non-partisan context in which the statistician has nothing to gain (for example, funding or a prestigious position) from their results. The Pew Research Center is an excellent example of one such unbiased source. The Pew Research Center is a nonpartisan, non-governmental, non-institutional (meaning that they are not associated with any academic, religious, political, or other type of institution) “fact tank.” Primary funding for The Pew Research Center comes from The Pew Charitable Trusts. This means The Pew Research Center is not dependent upon private, academic, or governmental funding to conduct polls or surveys or produce other research.

Are These Numbers Reliable? 

Reliability, as a concept, simply refers to consistency in the data — including in how it was collected and the way it was analyzed. In reality, ensuring reliability requires a lot of math and statistical tests.

Threats to Data Reliability

There are three major threats to reliability in data collection, analysis, and interpretation.

First, the data itself can be low quality. Remember the adage “garbage in, garbage out” and that bad source data does not lend itself to useful insights. Unreliable base data can become a problem if the data is recorded or given incorrectly: for example, if the person conducting the survey inputs the data incorrectly, or if the person being questioned gives an incorrect answer, either because they are being deceitful or because they are misremembering.

Second, the data could be coded incorrectly. Coding refers to the ways that data collected is organized. Coding is the process of classifying, summarizing, or otherwise labeling data. This is similar to the issue of incorrect recording of data but occurs at the analysis stage rather than the collection stage.

Finally, data can be incorrectly interpreted. Specifically, data can be interpreted as having a greater probability of accuracy than is warranted by the statistics.

Ensuring Reliability with Good Study Design

Researchers and pollsters can incorporate practices into their study design to improve reliability. One way is to test out the survey questions in what is called a pilot or pretest to make sure everyone understands them correctly. Another method is ensuring that if the data is coded, all the people coding are doing it the same way.

Pre-testing or Piloting

Pre-testing, or piloting, refers to the practice of testing your survey or interview on a small sample of individuals – ideally, a randomly selected, representative sub-sample of your sample population. A pilot or pre-test is designed to check that all participants (and surveyors or interviewers) have the same understanding of the meaning of each question, ensuring reliability at the collection stage.

Intercoder Reliability

Intercoder reliability means that all of those responsible for coding (classifying and summarizing) the data have understood the meaning of the data in the same way and have coded the data the same way. This is achieved by training coders carefully, creating a comprehensive coding guide, and conducting coding checks, where multiple coders code the same data to make sure they are coding it the same way.
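A coding check can be summarized with a number. The Python sketch below computes two common measures for a pair of coders: simple percent agreement, and Cohen's kappa, which discounts the agreement you would expect by chance. Cohen's kappa is not mentioned in this post; I am bringing it in as one widely used option, and the codes assigned by the two coders are hypothetical.

```python
from collections import Counter

def percent_agreement(coder_a, coder_b):
    """Share of items that both coders labeled identically."""
    if len(coder_a) != len(coder_b):
        raise ValueError("both coders must code the same items")
    matches = sum(1 for a, b in zip(coder_a, coder_b) if a == b)
    return matches / len(coder_a)

def cohens_kappa(coder_a, coder_b):
    """Agreement between two coders, corrected for chance agreement.

    Undefined (division by zero) if chance agreement is exactly 1,
    i.e. both coders used a single identical label for every item.
    """
    n = len(coder_a)
    observed = percent_agreement(coder_a, coder_b)
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement: probability both coders pick the same label independently.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical codes assigned by two coders to the same ten open-ended responses.
coder_a = ["pos", "pos", "neg", "neg", "pos", "neg", "pos", "neg", "pos", "neg"]
coder_b = ["pos", "pos", "neg", "pos", "pos", "neg", "neg", "neg", "pos", "neg"]

print(f"percent agreement: {percent_agreement(coder_a, coder_b):.2f}")
print(f"Cohen's kappa:     {cohens_kappa(coder_a, coder_b):.2f}")
```

Here the coders agree on 8 of 10 responses (0.80), but because they would agree half the time by chance alone, kappa comes out lower, at 0.60.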

While you will rarely hear about these practices from pollsters, they represent best practices in social science research and should be part of any researcher or pollster’s strategy.

Gauging Reliability

In statistics, looking at the confidence interval and the margin of error are two ways to gauge the level of statistical reliability.

Confidence Intervals

A confidence interval, or CI, is a range of values, calculated from the sample, that is likely to contain the true value in the population. Each CI comes with a confidence level, which is expressed as a percentage. Confidence intervals answer the question, “Given this sample, what range of values plausibly contains the true value of this parameter in the population?”

For example, a 95% CI is built using a method that captures the true population value 95% of the time. Another way to say this is that if the same survey were re-run many times with new random samples, about 95% of the resulting intervals would contain the true value. It does not mean that the exact same results would be produced each time.
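One common way to read a 95% confidence level is in terms of repeated sampling: if you ran the same poll over and over, about 95% of the intervals you computed would contain the true population value. The Python sketch below simulates this. The true support of 52%, the sample size of 500, and the use of the simple normal-approximation interval are all assumptions made for illustration.

```python
import math
import random

def wald_interval(successes, n, z=1.96):
    """Approximate 95% interval for a proportion (normal approximation)."""
    p_hat = successes / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

def coverage(true_p, n, trials, seed=0):
    """Fraction of simulated polls whose interval contains the true value."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Simulate one poll: each of n respondents says "yes" with probability true_p.
        successes = sum(1 for _ in range(n) if rng.random() < true_p)
        low, high = wald_interval(successes, n)
        if low <= true_p <= high:
            hits += 1
    return hits / trials

print(coverage(true_p=0.52, n=500, trials=2000))
```

The printed fraction should come out close to 0.95. Each simulated poll gives a slightly different estimate, which is exactly why the interval, rather than the single number, is the thing to look at.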

The Margin of Error

The margin of error is a measure of how far a particular statistical result is likely to fall from the true value (the reality). Margins of error are calculated at a specific confidence level (for example, 95%). Margins of error are always described as “plus or minus,” which is often symbolized as +/-, especially in political polling graphs.

Margins of error are frequently seen in presidential polls. For example, a +/- 4% margin of error means that if the statistic you see says 52% with a margin of error of +/- 4%, then the statistic which would be found if you surveyed everyone rather than a sample population would most likely be between 48% and 56%.
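For the curious, here is a rough Python sketch of where a number like +/- 4% can come from. It uses the standard normal-approximation formula for a proportion from a simple random sample at a 95% confidence level. The sample size of 600 is a hypothetical value I picked because it yields a margin of about 4 points; real polls often adjust this calculation for weighting and survey design.

```python
import math

def margin_of_error(p_hat, n, z=1.96):
    """Margin of error for a sample proportion; z=1.96 corresponds to 95% confidence."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

p_hat = 0.52  # share of the sample supporting the candidate
n = 600       # hypothetical number of respondents

moe = margin_of_error(p_hat, n)
low, high = p_hat - moe, p_hat + moe
print(f"{p_hat:.0%} +/- {moe:.1%}, or roughly {low:.0%} to {high:.0%}")
```

Notice that the margin shrinks as the sample grows, but only slowly: you need four times as many respondents to cut the margin of error in half.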

Keep in mind that while these measures tell you how precise and reproducible your statistic is, the CI and margin of error do not guarantee that the values you have discovered are the “true” value. They account for random variation in who was sampled, not for flawed questions or inaccurate answers. That’s related to validity, which we will examine next.

Are These Numbers Valid?

We’ve just talked about statistical reliability, which is all about consistency of obtaining a statistical result. Let’s now discuss validity in more detail.

Validity relates to the truthfulness of statistical analysis, rather than reproducibility of statistical values under the same conditions. Truthfulness, here, means that the statistical results are reflective of the true experiences or sentiments of not only the sample population, but the entire population. This is different from reliability, which ensures that the statistical result you have obtained is reproducible under the same conditions.

While validity relates to the truthfulness of statistics, reliability concerns statistics’ reproducibility.

Threats to Data Validity

In social and behavioral statistics, specifically those dealing with sentiments or other self-reported information, validity can be difficult, if not impossible, to confirm with 100% confidence. It is difficult for a pollster to know or verify the true political preference of a respondent. Individuals may knowingly or unknowingly misrepresent themselves or fail to answer questions properly. They may lie, forget things, or even fail to pay attention to survey questions. This makes figuring out their real opinions a major challenge.

In the case of political polling, people may often not reveal their true opinions or offer a filtered version of their political views, depending on to whom they are speaking. People may also be uninformed on the issues of interest, which means that they may not have an opinion on the issues of the day that could be accurately assessed via polling. The non-partisan Brookings Institution states that “public opinion is an illusive commodity,” one that can be difficult, if not impossible, to truly understand.

Assessing validity is, in reality, quite complex in behavioral research such as political polling. However, there are checks for statistical validity that are used by social and behavioral scientists that are recognized as indicative of validity. Common flags for invalidity or assurances for validity include question repetition and “attention checks” such as “trick” questions.

Repetition and “Attention Checks” for Validity

To get at people’s true opinions, many survey and interview creators will build in questions that ask the same thing in various ways. For example, a researcher may ask the same question twice in both the affirmative and the negative. In one question, they may ask, “Do you believe everyone should have health insurance?” They will also ask the same question in a different way, such as: “Do you believe that no one should go without health insurance?”

Researchers will also include “trick” questions that test whether the respondent is actually reading the questions. One type of trick question requires the participant to follow instructions rather than answer a question. If the instruction is followed, the respondent is paying attention; if the instruction is not followed, they are likely not paying attention, so their other responses may not be useful, either. Another type of trick question is a nonsensical or irrelevant question, where an inattentive respondent tends to give an answer that makes no sense.
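Here is a minimal Python sketch of how an analyst might apply both kinds of checks to incoming responses. The field names, the instructed answer “blue”, and the three respondents are all hypothetical; the paired question follows the health insurance example above, where a consistent respondent gives the same answer to both wordings.

```python
# Hypothetical respondents. Each record holds answers to a paired question
# (asked two ways) and to an instructed-response attention item
# ("Select 'blue' to show you are reading").
respondents = [
    {"id": 1, "everyone_insured": "yes", "no_one_uninsured": "yes", "attention_item": "blue"},
    {"id": 2, "everyone_insured": "yes", "no_one_uninsured": "no", "attention_item": "blue"},
    {"id": 3, "everyone_insured": "no", "no_one_uninsured": "no", "attention_item": "red"},
]

def quality_flags(respondent, expected_attention_answer="blue"):
    """Return a list of reasons to treat this respondent's answers with caution."""
    flags = []
    if respondent["attention_item"] != expected_attention_answer:
        flags.append("failed_attention_check")
    # Both wordings ask the same thing, so the answers should match.
    if respondent["everyone_insured"] != respondent["no_one_uninsured"]:
        flags.append("inconsistent_answers")
    return flags

for r in respondents:
    print(r["id"], quality_flags(r))
```

In this made-up data, respondent 1 passes both checks, respondent 2 contradicts themselves on the paired question, and respondent 3 did not follow the instruction. What to do with flagged responses (drop them, report them separately, or follow up) is a judgment call for the researcher.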

Despite these checks, it can be difficult to ascertain people’s true preferences. When surveys do not measure what they set out to measure, this is called measurement error. Measurement error, according to the Cornell Roper Center for Political Research, “results from flaws in the instrument, question wording, question order, interviewer error, timing, question response options, etc.” and thereby undermines validity. This type of error is one of the most common and the biggest challenge for pollsters.

Representativeness as Validity

Researchers take pains, using highly complex sampling techniques, to ensure the representativeness and randomness of their samples. The goal in behavioral statistics is to ensure that the same characteristics associated with the diversity of behaviors, background, identities, and sentiments or beliefs found in the whole population are captured in the sample population. 

You can confirm the use of these types of validity checks by reading the statistical report or, if they are not mentioned in the report, by reaching out to the researcher(s) or agency responsible for collecting and analyzing the data. 

How to Not Lie (or be Lied to) with Statistics 

In the first part of this two-part blog: Part I: Numbers and Graphs, I said that “numbers cannot in themselves be ‘lies’.” This is assuming that the numbers come from reputable sources using accurate counts and are taken from representative population samples.

Now that we have discussed how to read – and here more importantly, how to judge the value of – statistics, we can begin to understand the caveat about reputability, accuracy, and representativeness included with the statement that “numbers cannot in themselves be ‘lies.’” 

In short, we could say that valid and reliable numbers cannot lie, but invalid and unreliable statistics are always lies whether they are flat out incorrect or simply misrepresentations (i.e. half-truths). 

If you missed Part 1 of this series, you can check it out here.
