The African American Research Collaborative led a team of diverse experts, including data scientists, pollsters, social scientists, public health professionals, and medical doctors, to study knowledge of and attitudes toward COVID-19 vaccines nationwide. The study was designed to create nationally representative samples across race and ethnicity, with large respondent bases of Black, Latino, Asian American and Pacific Islander (AAPI), Native American, and White populations. Overall, the survey interviewed 12,887 adults from May 7 to June 7, 2021. The survey was available in English, Spanish, Chinese, Korean, and Vietnamese, and had a median length of 20 minutes. The sample sizes and margins of error for each component of the survey are: full national, 12,288, +/-0.9%; Black, 2,281, +/-2.1%; Latino, 2,944, +/-1.8%; AAPI, 2,281, +/-2.1%; Native American, 1,921, +/-2.2%; White, non-Hispanic, 2,861, +/-1.8%. The AAPI sample included a small oversample of Pacific Islanders (n=246, +/-6.2%). The survey also contained an oversample of New Mexico adults (2,057, +/-2.1%), which is why the Latino and White components are somewhat larger; in all estimates, however, the New Mexico sample is weighted to reflect its correct share of the population. For each racial group, post-stratification weights were implemented using a raking algorithm to balance the sample to the 2019 Census ACS estimates for gender, age, education, nativity, and geography.
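The raking step mentioned above can be sketched with a minimal implementation of iterative proportional fitting. This is an illustrative reconstruction, not the study's actual weighting code; the function and variable names, the target format, and the convergence tolerance are all assumptions.

```python
import numpy as np
import pandas as pd

def rake(df, targets, max_iter=50, tol=1e-6):
    """Iteratively adjust unit weights so that the weighted margin for each
    demographic variable matches its population target (raking / IPF).

    targets: dict mapping a column name to {category: population proportion};
    proportions within each variable are assumed to sum to 1.
    """
    w = np.ones(len(df))
    for _ in range(max_iter):
        max_change = 0.0
        # Cycle through each margin, rescaling weights toward its target.
        for col, dist in targets.items():
            for cat, target_prop in dist.items():
                mask = (df[col] == cat).to_numpy()
                current = w[mask].sum() / w.sum()
                if current > 0:
                    factor = target_prop / current
                    w[mask] *= factor
                    max_change = max(max_change, abs(factor - 1))
        if max_change < tol:  # stop once all margins are (nearly) matched
            break
    return w * len(df) / w.sum()  # normalize weights to a mean of 1
```

In practice the targets would come from 2019 ACS estimates for gender, age, education, nativity, and geography, computed separately within each racial group.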
Given the rapidly changing landscape of survey research, with new modes and falling response rates, almost all social science has moved beyond random digit dial (RDD) landline surveys, which were thought to achieve nearly 99% coverage in the 1980s and 1990s. Today, researchers must employ best practices in data science to accurately sample America’s increasingly diverse and hard-to-reach populations. Building on multiple successful large-n survey projects by our team in 2016, 2018, and 2020, we implemented a mixed-mode randomized stratified sample that offered respondents the opportunity to be interviewed by a live interviewer on a cell phone or landline, or to complete a self-administered online survey reached via text-to-web, email invitation, or panel listed sample. Working across a wide range of sample providers, our sample started with a very large and comprehensive list of the adult population. Our sample sources are well known to reduce non-coverage bias and have been used extensively in academic social science publications. This approach allows our research team to maximize coverage, increase response rates, and reduce overall design effects. Pre-stratification randomized quota sampling was used as a starting point, and post-stratification weights were then applied to bring the resulting sample into balance with known census demographic estimates for each racial and ethnic group in the sample.
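The pre-stratification step, random selection within strata up to fixed quotas, can be illustrated with a short sketch. The contact-list format, the stratum key, and the quota targets here are hypothetical stand-ins for the study's actual sampling frame.

```python
import random
from collections import defaultdict

def quota_sample(frame, stratum_key, quotas, seed=0):
    """Randomly select records within each stratum until its quota is filled.

    frame: list of contact records (dicts); stratum_key: function mapping a
    record to its stratum label; quotas: dict of stratum -> target n.
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for rec in frame:
        by_stratum[stratum_key(rec)].append(rec)
    sample = []
    for stratum, target in quotas.items():
        pool = by_stratum.get(stratum, [])
        # Random selection within the stratum; cap at the available pool.
        sample.extend(rng.sample(pool, min(target, len(pool))))
    return sample
```

The random draw within each stratum preserves an element of chance selection, while the quotas guarantee large subgroup sample sizes even when response rates differ across groups.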
The survey was implemented with a mix of phone and online lists that are nationally representative for each racial group. Overall, 31% of respondents completed the survey by phone and 69% online. The phone sample included both cell-only households and those with landlines. The online sample included a comprehensive mix of text-to-web, email-invitation, and online panels. In particular, we worked with online panel companies with expertise in hard-to-reach populations and racial/ethnic minorities. Given low and unequal response rates in modern survey research, we relied on pre-stratification quotas, which required random selection within strata but ensured large and near-representative sample sizes for key demographics, including gender; race/ethnicity; age; geography (urban, suburban, rural); immigrant status (native-born, foreign-born, undocumented); education; and political ideology. Demographic analysis of our approach shows that we achieved a sample that matches the population across key markers that are often out of balance in other surveys. In particular, foreign-born and lower socioeconomic status respondents are well represented in our survey. We thereby ensure that we interview harder-to-reach populations that even the best theoretically designed probability sample cannot reach, owing to lower response rates and trust issues. To address questions of survey response quality, we implemented multiple checks. First, our survey contains two attention checks that direct respondents to pick a specific answer, verifying that participants read and respond to questions appropriately. Next, we use an algorithm to detect and remove “bots” or pseudo-bots, where people automate or speed through the survey. Third, we de-dupe across panels: if a potential respondent is registered with two or more online panels, the duplicate records are removed so that they have only one chance of being selected.
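The cross-panel de-duplication step can be sketched as follows. This is a simplified illustration under the assumption that each panel record carries a contact identifier (such as an email address or phone number); the study's actual matching keys and logic are not specified in the text, and the names here are hypothetical.

```python
import hashlib

def dedupe_across_panels(panels):
    """Keep only the first occurrence of each respondent across panel sources.

    panels: list of (panel_name, records) pairs, where each record is a dict
    with a "contact" field. The normalized contact is hashed so that raw
    identifiers need not be retained in the de-duplication index.
    """
    seen = set()
    kept = []
    for panel_name, records in panels:
        for rec in records:
            key = hashlib.sha256(
                rec["contact"].strip().lower().encode()
            ).hexdigest()
            if key not in seen:
                seen.add(key)
                kept.append({**rec, "panel": panel_name})
    return kept
```

Removing duplicate registrations before selection means each person appears in the sampling frame once, so no one has a doubled chance of being invited.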
Last, after the data are fully collected, we perform a final quality check to remove any responses that were completed in too little or too much time, or that contain multiple nonsensical answers.
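A minimal sketch of this final filter is shown below. The specific thresholds (fraction of the 20-minute median for "too fast," a multiple of it for "too slow," and a flag count for inconsistent answers) are illustrative assumptions; the text does not state the cutoffs the study actually used.

```python
def quality_filter(responses, median_minutes=20.0,
                   fast_frac=0.3, slow_mult=4.0, max_flags=2):
    """Drop completes that are implausibly fast, extremely slow, or carry
    multiple internally inconsistent ("nonsensical") answers.

    responses: list of dicts with "duration_min" and an optional
    "inconsistency_flags" count from upstream logic checks.
    """
    kept = []
    for r in responses:
        t = r["duration_min"]
        too_fast = t < fast_frac * median_minutes      # e.g. under 6 minutes
        too_slow = t > slow_mult * median_minutes      # e.g. over 80 minutes
        inconsistent = r.get("inconsistency_flags", 0) >= max_flags
        if not (too_fast or too_slow or inconsistent):
            kept.append(r)
    return kept
```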