Advertising Research Foundation (ARF) FoQ 2 Committee Leaders
Chris Bacon
Advertising Research Foundation, chris{at}thearf.org
Bill Cook
e-Strategic Advantage, billcook2{at}live.com
Bob Walker
Surveys & Forecasts, LLC, rww{at}safllc.com
INTRODUCTION
The six-billion-dollar, 19-year-old enterprise of online survey research with nonprobability samples has grown rapidly through the years (ESOMAR, 2015). Contributing to this growth is demand from the advertising community. Through their research agencies, advertisers spend approximately two billion dollars each year on copy testing, advertising tracking, and market measurement research alone (ESOMAR, 2015). An implication is that many advertisers consider online research good enough to influence certain important decisions they make.
In contrast, U.S. public opinion researchers continue to question the trustworthiness of information produced through surveys of nonprobability samples. Many do not consider the method satisfactory for measuring public opinion. They would prefer to interview by telephone individuals selected by probability sampling.
In theory, probability sampling provides a reasonable assurance that those individuals would represent the broader population (e.g., all adults) on known and unknown characteristics. The idea of relying on a model to transform a self-selecting sample (e.g., members of an opt-in panel) to a representative one is less appealing. It would require that they identify the correct variables to include in that model. Some view the challenge as daunting. At the very least, “survey researchers who pursue model building will have their work cut out for them” (Terhanian, 2013, p. 126). That may help to explain why spending on online public opinion research represents only a small slice of all public opinion research spending (ESOMAR, 2015).
There is evidence to support public opinion researchers' concerns and decisions. For instance, the Advertising Research Foundation (ARF) found “wide variance…on attitudinal and/or opinion questions,” after holding constant sociodemographic and other factors (Walker, Pettit, and Rubinson, 2009) in its 2008 assessment of major opt-in panels. Similarly, a team led by Stanford University researchers (Yeager, Krosnick, Chang, Javitz, et al., 2011) raised doubts about the believability of online research after reviewing evidence from a large 2005 study.
Other parties, including the American Association for Public Opinion Research (AAPOR), have discouraged their members and others from attempting to draw inferences by means other than probability sampling. AAPOR did so based partly on a review of research with opt-in panels (Baker, Blumberg, Brick, Couper, et al., 2010). A possible implication is that advertisers should revisit their decision to depend on online research.
Challenges with Telephone Research
A return to telephone research (for advertising tracking, for example) is not a practical alternative. During the same 19-year period, it has become more complicated to conduct telephone surveys, a function of
declining coverage,
declining response rates,
the increasing use of mobile phones, and
legal prohibitions associated with using autodialers.
As one consequence, a completed telephone interview can cost at least five times more than a comparable online interview (Terhanian, 2012). Reliability and validity are major concerns now, too. As Cliff Zukin, Rutgers University public opinion researcher and professor, said in a New York Times article: “We are less sure how to conduct good survey research now than we were four years ago, and much less than eight years ago….Polls and pollsters are going to be less reliable. We may not even know when we're off base.”1
Coming up with a solution to this problem, the current authors believe, is important for at least three reasons:
Telephone survey response rates, now in the single digits for typical public opinion polls (Pew Research Center, 2012), may decrease further. Although low response rates do not necessarily lead to increased bias (Groves and Peytcheva, 2008; Pew Research Center, 2012), most researchers, including Robert Groves, former director of the U.S. Census Bureau, view them as a major threat. Groves, whom Zukin referenced in the same New York Times article, warned that “with the very low response rates now common, we should expect more failed predictions based on surveys.”1
The stakes are high, given the continuing need to measure and understand society's opinions, attitudes, behaviors, needs, and preferences. George Gallup, an advertiser before he became a public opinion researcher, viewed public opinion polls as “the machinery for directly approaching the mass of the people and hearing what they have to say” (Gallup and Rae, 1940, p. 13). For Gallup, the ability to conduct polls was crucial because “public opinion can only be of service to democracy if it can be heard” (Ibid, p. 13).
Immediate opportunities are available to learn more about research with nonprobability samples through analysis of FoQ 2 data.
An Industry Deep-Dive
FoQ 2's main objective was to identify the steps that researchers might take to improve online research quality, with quality defined as “the accuracy of online research with nonprobability samples compared to other methods, such as government-sponsored telephone and face-to-face surveys that employ forms of population-wide probability sampling” (Terhanian, 2012).
Seventeen online sample providers took part, leading to more than 70,000 completed interviews. The ARF estimates that those providers handle more than 100 million completed online interviews each year, a substantial proportion of all online interviews (Terhanian, 2012).
All 17 providers relied on e-mail to invite opt-in panelists to take part, while five also directed river respondents to the survey:
A panelist, or panel respondent, has agreed to receive e-mail invitations to participate in online surveys.
A river respondent, in comparison, has not agreed to do so. Instead, that person has been directed to an online survey in real-time after clicking through a link or advertisement on the Internet, or the metaphorical river.
The FoQ 2 survey included questions that explored 14 substantive areas, such as
purchase behavior,
brand and advertising awareness and attitudes, and
community and political involvement and behavior.
Many of the questions came from ongoing, high-quality national surveys, such as the General Social Survey and the Behavioral Risk Factor Surveillance System.2 Organizations such as the U.S. Census Bureau and the Centers for Disease Control and Prevention depend on evidence from these kinds of surveys to establish benchmarks, monitor trends, and inform policy.
The ARF also commissioned a dual-frame, dual-mode telephone survey, with respondents selected by probability sampling. Fielded at the same time as the online survey, it included many of the same questions. It is possible, therefore, to compare FoQ 2 online responses to benchmarks from the nine surveys to measure bias. As defined in the current paper and elsewhere, bias is the difference between what survey respondents report and what researchers know or believe to be true (Bohrnstedt, 1983).
Researchers versed in preelection polling methods should be familiar with this approach. They rely on it, or a close variation, to judge the accuracy of pollsters' final forecasts each time a major election takes place. The British Polling Council (2010), for example, reported that ICM Research produced the most accurate final forecast of the 2010 British General Election. It had an average absolute error, or bias, of 1.25 percentage points (See Table 1).
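To make the measure concrete, the short sketch below computes average absolute bias for a handful of invented estimates and benchmarks; the question names and figures are illustrative only, not FoQ 2 or ICM results.

```python
# Hypothetical illustration of the bias measure used in this paper: the mean
# absolute difference (in percentage points) between survey estimates and
# benchmark values. The numbers below are invented.
survey_estimates = {"own_home": 68.0, "voted_2012": 61.5, "smoker": 21.0}
benchmarks = {"own_home": 65.0, "voted_2012": 58.0, "smoker": 19.5}

abs_errors = [abs(survey_estimates[q] - benchmarks[q]) for q in benchmarks]
bias = sum(abs_errors) / len(abs_errors)
print(f"Average absolute bias: {bias:.2f} percentage points")  # 2.67
```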
Identifying Variables to Minimize Bias
Although research with nonprobability samples is good enough for many advertisers, it is not yet adequate for many, possibly even most, public opinion researchers. Both groups undoubtedly would agree there is room for improvement. To this end, the authors of the current paper have developed a process for identifying variables that researchers might include in sampling and weighting models to minimize bias. A related aim was to identify the specific variables to include in these models.
Trying to identify those variables through a nonautomated process (e.g., Duffy, Smith, Terhanian, and Bremer, 2005; Terhanian and Bremer, 2012) did not seem prudent in light of the enormous amount of FoQ 2 data available for analysis. The decision, instead, was made to rely on statistical optimization methods (e.g., Torczon, 1989). They speed up and can possibly improve the variable identification process, although they are not part of survey researchers' standard tool kit. A literature review by the authors of the current study, in fact, uncovered no published accounts of efforts to use optimization methods to identify the optimal combination of variables to include in sampling or weighting models.
The current authors thus intended to break new ground by applying optimization methods to FoQ 2's 17 sample sources and two sample types to identify an optimal set of variables (i.e., questions). If used in sampling or weighting schemes, the authors hypothesized, these variables should minimize bias among a second set of more than 100 questions that represent major areas of inquiry for advertisers and public opinion researchers.
METHODOLOGY
Surveys
The 23-minute telephone survey used a probability-based, dual-frame method. Mobile phones accounted for 40 percent of all numbers dialed between January 10 and January 24, 2013. In all, 1,008 adults ages 18 and older completed the interview: 312 did so by mobile phone, and the remaining 696 did so by landline. The overall response rate was 19.9 percent. Figures for age, sex, region, race-ethnicity, and education were weighted, where necessary, to bring them into line with their population proportions.
The 26-minute online survey was fielded between January 9 and 24, 2013. The ARF asked the 17 providers to direct to the survey as many respondents as they judged necessary to achieve at least 1,000 completed interviews. The ARF also specified age, gender, and regional (AGR) requirements (See Table 2).
The online providers fulfilled these requirements well, with weighting by AGR producing an average efficiency of 0.98 across all sample sources (Bremer, 2013). A value of 1.00 would have represented a perfect match to the AGR characteristics. Each sample, in effect, is a replicate of a generalized AGR sample; the 17 replicates, totaling 18,231 completed interviews, are the starting point for the analyses described in the current paper.
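The ARF's exact efficiency formula is not reproduced here; a common convention, assumed in the sketch below, is the Kish ratio of effective to nominal sample size, which equals 1.00 when all weights are equal.

```python
import numpy as np

def weighting_efficiency(weights):
    """Kish effective-sample-size ratio: (sum w)^2 / (n * sum w^2).
    Equals 1.0 when all weights are identical and falls as weights vary."""
    w = np.asarray(weights, dtype=float)
    return w.sum() ** 2 / (len(w) * (w ** 2).sum())

# Near-uniform weights (a close match to the AGR targets) give a value near 1.0.
near_uniform = np.random.normal(loc=1.0, scale=0.1, size=1000)
print(round(weighting_efficiency(near_uniform), 2))   # approximately 0.99
```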
These analyses focus only on what the ARF referred to as its “Method A” AGR specification (Terhanian, 2012), primarily because it was implemented near-perfectly. Three additional specifications with more complex sampling requirements were implemented less perfectly (Bremer, 2013; Gittelman, Thomas, Lavrakas, and Lange, 2015).
Benchmarks
The authors have argued previously that when choosing benchmarks, it is sensible to exclude “questions with responses that are embarrassing or socially desirable…particularly if those questions were administered aurally or in the presence of an interviewer” (Terhanian and Bremer, 2012, p. 762). Others have made similar arguments (e.g., Gittelman, Lange, Cook, Frede, et al., 2015).
For expediency in the current study, the authors chose to include all questions, other than those in optimization models, as benchmarks. Altogether, more than 100 questions were identified. Of these, 29 were asked previously in government-sponsored surveys. The remaining ones came from the concurrent telephone survey, removing the passage of time as a possible cause of bias for these questions. A copy of the FoQ 2 questionnaire is available at the ARF (http://m.thearf.org/foq2).
Optimization Methods
Optimization methods have been used to apparent good effect for the past 65 years in fields such as mathematics, statistics, economics, engineering, and marketing. Although survey researchers do not rely on these methods, many people, including survey researchers, face optimization problems each day.
A patron entering a McDonald's restaurant in New York City, for instance, would see information on price and number of calories for every menu item. If he or she decided to spend no more than $7 on items containing 1,000 calories or less, one set of choices would be available. If calories were not a concern, there would be a larger choice set. A vegetarian would have a smaller choice set. A researcher with an expertise in optimization methods could assess all possibilities and produce the optimal list of items customers might choose while also accounting for constraints.
Most optimization problems include at least three core components:
An objective function, which is a mathematical representation of the quantity to be maximized or minimized. That quantity could be the number of calories consumed at McDonald's for $7 or a measure of bias in survey research.
A set of candidate variables that affect the value of the objective function. The set might include McDonald's menu items, plus the price and number of calories of each. Or it might include the variables a researcher decides to use in a sampling or weighting model.
A set of constraints. A customer, for example, might decide to buy no more than three items summing to between 950 and 1,000 calories for $7. Or, a researcher might include no more than 10 variables in a sampling or weighting model.
The goal of the optimization method, through its algorithm, would be to search for and identify the set of variables that maximizes or minimizes the objective function.
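A toy version of the menu problem makes the three components concrete. The items, prices, and calorie counts below are invented, and a brute-force enumeration stands in for a true optimization algorithm, which matters only when the search space grows too large to enumerate.

```python
from itertools import combinations

# Hypothetical menu: (item, price in dollars, calories). Values are invented.
menu = [
    ("burger", 3.99, 550),
    ("fries", 1.89, 320),
    ("salad", 4.79, 210),
    ("apple slices", 1.00, 15),
    ("soft drink", 1.29, 150),
    ("wrap", 3.49, 400),
]

best_combo, best_calories = None, -1
for r in range(1, 4):                               # constraint: at most three items
    for combo in combinations(menu, r):
        price = sum(item[1] for item in combo)
        calories = sum(item[2] for item in combo)
        if price <= 7.00 and calories <= 1000:      # constraints: budget and calorie cap
            if calories > best_calories:            # objective: maximize calories
                best_combo, best_calories = combo, calories

print([item[0] for item in best_combo], best_calories)
```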
Optimization Problem and Algorithm
The FoQ 2 optimization problem is specified as a mathematical function below (See Equation 1). The function minimizes the average distance, or bias, between FoQ 2 results for each sample provider and the benchmarks, subject to the variables in the adjustment model. It also limits the number of variables in any model to 10 or fewer, primarily because preliminary analyses indicated that exceeding a 10-variable threshold would yield little return in bias reduction.
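In notation, that problem might be written as follows. This is a hedged reconstruction from the verbal description above, not a reproduction of the published Equation 1: the set of adjustment variables S (always containing age, gender, and region) is chosen, subject to the 10-variable limit, to minimize the average absolute distance between each of the P = 17 providers' weighted estimates and the benchmark values across the K benchmark items.

\[
\min_{S \,:\, |S| \le 10} \;\; \frac{1}{P\,K} \sum_{p=1}^{P} \sum_{k=1}^{K} \bigl|\hat{y}_{p,k}(S) - b_k\bigr|
\]

Here \(\hat{y}_{p,k}(S)\) is provider p's estimate for benchmark item k after adjustment on the variables in S, and \(b_k\) is the corresponding benchmark value.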
Equation 1. The Optimization Problem as a Mathematical Function
For the choice of algorithm, several factors made it plain that a multidirectional, multidimensional search algorithm (Torczon, 1989) would be required. Those factors, or challenges, included
the large number of candidate variables, both continuous and discrete,
the decision for practical purposes to seed each model with variables representing age, gender, and region—most market and advertising research practices mandate that these variables be used in the sampling and weighting specification,
the need to create an additional dimension each time a new variable is added to the search, and
the need for sufficient flexibility and power to navigate the search space without overlooking the true optimal solution.
To the current authors' knowledge, this marks the first time that optimization methods have been used in this manner. For this reason, the authors did not spend a great deal of time testing the comparative effectiveness of many different possible algorithms. It is possible, therefore, that different algorithms could be more effective than the one used here, although it is unlikely that the size of the effect would be meaningful. Investigating this topic in more detail may be an area for future research.
How the Algorithm Works
The algorithm's starting point is the AGR (age/gender/region) specification. It then identifies the variable that makes the largest impact, as specified by the objective function. That variable may not reduce bias the most. In certain cases, a variable can be biased on its own dimension but uncorrelated with the benchmark variables. If that variable were included in a sampling or weighting model, it would affect bias minimally. Yet it would increase the complexity of the design.
Relying on a new measure, the impact score, eliminates this problem. To produce the impact score, the average distance, or bias, between each FoQ 2 online survey response and the benchmark is measured first. Next, the correlation between that measure and all benchmark variables is calculated. The impact score is then produced by multiplying the two terms: bias and the correlation.
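A minimal sketch of that calculation follows. It assumes that "bias" is the candidate variable's own average absolute deviation from its benchmark and that the correlation is averaged, in absolute value, across the benchmark items; the paper does not spell out either convention.

```python
import numpy as np

def impact_score(candidate_bias, candidate_values, benchmark_values):
    """Hypothetical rendering of the impact score: the candidate variable's own
    bias multiplied by its (average absolute) correlation with the benchmarks.
    candidate_bias   -- percentage-point bias of the candidate variable
    candidate_values -- 1-D array of respondent-level values for the candidate
    benchmark_values -- 2-D array (respondents x benchmark items)"""
    corrs = [abs(np.corrcoef(candidate_values, benchmark_values[:, j])[0, 1])
             for j in range(benchmark_values.shape[1])]
    return candidate_bias * float(np.mean(corrs))
```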
The algorithm, as its next step, calculates the change in the objective function the new specification produces for each of the 17 sample sources. It then measures the average change. If the new specification reduces bias beyond a prespecified amount—the convergence criterion—then the process repeats itself. It adds a second variable based on the size of its recalculated impact score. When the change in the objective function, compared with the previous iteration, is less than the convergence criterion—that is, when it is essentially zero—the process starts over.
In the start-over case, the algorithm begins by selecting the variable with the second largest impact. It then proceeds in a different direction across the search space. The algorithm continues to run this way until it exhausts all possibilities.
A summary of the steps involved in identifying an optimal model follows (a code sketch of this loop appears after the list):
Specify the benchmark items.
Establish the convergence criterion (a tenth of a percentage point in the analyses described here).
Calculate the baseline AGR bias, by sample source and overall, for each benchmark item.
Calculate the correlation between each candidate variable and all benchmark variables.
Calculate the impact score by multiplying the correlation identified in Step 4 and bias.
Select the variable to be added to the specification based on the size of its impact score.
Weight each of the 17 sample sources (to simulate bias reduction processes, whether they involve sampling or weighting) to the correct distributions for each variable in the specification.
Determine the change in bias for each sample source, and overall.
If the overall change exceeds the convergence criterion, return to Step 6 and add another variable to the specification.
If the change is smaller than the convergence criterion, then start anew with a different seed variable.
Sort through the results and identify the model with the smallest bias.
Repeat four times to account for two sample source types (i.e., panel and river) and two sets of variables (i.e., “nonattitudinal” and “anything goes,” which are defined later).
Report the best models—one for panels (i.e., panel respondents only), the other for rivers (i.e., river respondents only)—for each set of variables.
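Under the simplifying assumption that the helper functions behave as Steps 4 through 8 describe, the loop might be sketched as follows; weighted_bias() and impact() are hypothetical stand-ins, not the ARF's implementation.

```python
def find_optimal_model(candidates, agr=("age", "gender", "region"),
                       weighted_bias=None, impact=None,
                       max_vars=10, convergence=0.1):
    """Greedy, restartable forward search loosely following Steps 1-13.
    weighted_bias(spec) -> overall average absolute bias (percentage points)
                           after weighting every sample source to the targets
                           for the variables in spec (Steps 7-8)
    impact(var, spec)   -> impact score of a candidate variable given the
                           current specification (Steps 4-6)
    Both helpers are hypothetical placeholders."""
    base_spec = list(agr)
    best_spec, best_bias = base_spec, weighted_bias(base_spec)

    # Each restart seeds the search with a different first variable: the one
    # with the largest impact score, then the second largest, and so on.
    for first in sorted(candidates, key=lambda v: impact(v, base_spec), reverse=True):
        spec = base_spec + [first]
        current = weighted_bias(spec)
        while len(spec) < max_vars:
            pool = [v for v in candidates if v not in spec]
            if not pool:
                break
            nxt = max(pool, key=lambda v: impact(v, spec))
            new = weighted_bias(spec + [nxt])
            if current - new < convergence:   # change below the convergence criterion
                break
            spec, current = spec + [nxt], new
        if current < best_bias:
            best_spec, best_bias = spec, current

    return best_spec, best_bias
```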
ANALYSIS AND RESULTS
The Models
Traditionally, many researchers have included only nonattitudinal variables such as AGR in sampling and weighting models (Taylor, 1995); so, too, did the first part of the current analysis (Models 1, 1a, 2, and 2a). Along with AGR, variables such as race-ethnicity, education, occupation, marital status, political party, political ideology, and time spent online each week were considered. Several variables pertaining to the household, such as income, number of adults, number of children, housing status (e.g., own or rent), number of cars, and presence of different types of telephones were assessed as well.
Some evidence (e.g., Fahimi, Gross, and Barlas, 2014; Terhanian and Bremer, 2012) suggested that researchers can improve sampling and weighting models by including attitudinal items. To test the proposition, no restrictions were placed on candidate variables in the second part of the analysis, referred to as the “anything goes” approach (Models 3 and 3a).
Effectiveness at Reducing Bias
The bias in Models 1 (optimized for panels) and 1a (optimized for rivers)—the baseline AGR case—was 4.8 percentage points, overall (See Table 3).
When six nonattitudinal variables—those that the optimization algorithm identified—were added to the AGR case, overall bias was reduced by an additional 1.1 percentage points, or 23 percent, as Models 2 and 2a show.
Models 2 and 2a shared seven variables:
Age,
Gender,
Region,
Time Spent Online,
Race-Ethnicity,
Housing Status, and
Presence of a Landline Telephone.
The variables representing Education, Income, Number of Adults in the Household, and Political Party appeared in at least one model.
Models 3 and 3a, the “anything goes” cases, suggested that when no restrictions were placed on candidate variable types, overall bias was reduced by 1.7 points, or 35 percent, beyond the AGR baseline. Each model included 10 variables, the maximum number permitted. Several of the 10 were of the nonattitudinal type; the remaining variables represented four segments:
Hopeful/Optimistic,
Privacy Concerned/Time Spent Online,
Open Minded, and
Product Ownership.
The variables representing Education and Income traded off in the “nonattitudinal” and “anything goes” models. It was unnecessary to include both variables in a sampling or weighting model, no matter the sample type (i.e., panel or river).
The variable representing time spent online appeared in Models 2, 2a, 3, and 3a, even though the target population was the general population, not the online population. Although the matter merits more consideration, there is at least one reason why that variable may be important. In probability samples, the objective of weighting is to correct for each respondent's selection probability. For instance, individuals living in households with multiple phone numbers have a higher probability of being dialed than those in homes with a single line, all else equal.
Likewise, the more time an individual spends online, the greater the probability of receiving an online survey opportunity. That may explain why respondents from each of the 17 sample sources, on average, reported spending more time online than online users identified in benchmark surveys. Correcting for time spent online may also have corrected for the probability of selection, thereby enhancing sample representativeness.
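In standard design-based terms (the paper gives no formula), correcting for unequal selection chances amounts to weighting each respondent by the inverse of his or her selection probability,

\[
w_i \propto \frac{1}{\pi_i},
\]

so that, for example, a heavy Internet user with twice the chance of encountering a survey invitation would receive half the weight of a comparable light user.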
The analyses suggested that each sample type—panel and river—required a somewhat different set of variables when the aim was to reduce as much bias as possible. More research is required to understand the reasons.
The overall benefits of these models did not come at the expense of certain sample providers or sample types. The “nonattitudinal” and “anything goes” models reduced bias overall, within each sample source, and within each sample type (See Figure 1 and Table 4).
CONCLUSIONS
Many studies (e.g., Duffy et al., 2005; Gittelman, Thomas, et al., 2015; Terhanian and Bremer, 2012) have described efforts to reduce bias in surveys of nonprobability samples by improving sampling or weighting models. Unlike those studies, the analyses described in the current paper were not limited to a single sample source, or even a handful. Nor did they involve only one sample type. By applying optimization methods to a large, diverse data set, several key variables were identified. They minimized bias (for more than 100 questions covering 14 substantive areas) across 17 sample sources and two sample types.
The evidence suggests that some or all of the following nonattitudinal variables, if included in an adjustment model, can reduce bias by 23 percent beyond the AGR baseline case:
Time Spent Online
Race-Ethnicity
Education
Income
Housing Status
Political Party
Presence of a Landline Telephone
Number of Adults in the Household
Number of Vehicles in the Household.
The bias reduction can increase to 35 percent by adding to the model some or all of the following attitudinal variables:
“Hopeful”: How often do you feel hopeful?
“Privacy Concerned”: How concerned are you about having records containing your personal information stolen over the Internet?
“Open Minded”: How much do you agree with the following statement? “It is best to treat those who disagree with you with leniency and an open mind as they may be proven right.”
“Optimistic”: How much do you agree with the following statement? “I am optimistic about my future.”
Limitations, Implications, and Recommendations
Many studies have limitations, and this one is no exception. It is possible, for example, that the questions chosen as benchmarks were biased, possibly negating the recommendations and conclusions given here. It is also possible that the models developed here may not be portable to future surveys of U.S. adults, or other populations. That is a question for future research. Future research also should explore how effective these models are at reducing bias by substantive area and individual question. The focus here was on bias across all areas and questions.
Even if the models identified in this study prove to be the key to minimizing bias in nonprobability samples, other considerations might affect their use and usefulness.
Organizational culture and previous experience might persuade some practitioners to pursue one approach (e.g., sampling) rather than another (e.g., weighting). That choice does not guarantee success, however, given that evidence from other FoQ 2 analyses suggests that sample providers struggle to implement complex sampling designs.
In one scenario, the weighting efficiency of the 17 sample sources decreased from 0.98 to 0.87 when variables representing race-ethnicity and education were added to the baseline AGR scheme (Bremer, 2013). In another, the weighting efficiency “dropped off a cliff” when more variables were added to the scheme (Bremer, 2013). An implication is that those survey practitioners who move forward should not place all of their eggs in the sampling basket.
To rely mainly on weighting may not make sense either because the size of the weights will hinge partly on how respondents were selected. Although a good weighting plan can compensate for a sampling plan that yields low numbers of people with the characteristics of interest, no amount of weighting can compensate for a poor sampling plan: “however much you weight zero, [the answer] is still zero” (Taylor and Terhanian, 1999, p. 22). Inefficient weighting schemes also reduce the effective sample size, making it more difficult to determine meaningful differences when comparing subgroups on variables of interest.
Some researchers may decide to take no action at all, perhaps believing that the effort involved does not justify the possible payoff: a bias reduction of about a third. In some cases, that may be the correct decision to make. In others, particularly when major investment decisions hinge heavily on the outcomes of surveys, it may be an unwise or even irresponsible decision.
Without additional evidence and a deeper understanding of context, the best advice the current authors can give to those who attempt to use these models, or slight variations, is to take a balanced approach. They might begin by including variables such as age, gender, region, and race-ethnicity in the sampling scheme. They might then include in the weighting scheme these same variables and some of the sociodemographic and attitudinal ones identified earlier.
To add efficiency, they might consider using in one or both schemes a summary measure, such as a propensity score (Rosenbaum and Rubin, 1983, 1984). This might enable them to replace hundreds of traditional sampling quota cells with five propensity score cells (Terhanian and Bremer, 2012). Similarly, within the weighting scheme, practitioners could rely on a propensity score to function as a single, summary measure representing some or all variables in the model. Among other benefits, this would lessen shrinkage of the effective sample size.
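A minimal sketch of the propensity-cell idea follows, assuming a logistic-regression propensity model that distinguishes the online sample from a reference sample and the five-cell approach cited above; the variable names and data are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical combined file: rows from a reference (benchmark) sample and the
# online sample, with a shared set of model variables.
df = pd.DataFrame({
    "is_online":    np.random.randint(0, 2, 2000),   # 1 = online sample member
    "age":          np.random.randint(18, 85, 2000),
    "hours_online": np.random.exponential(10, 2000),
    "hopeful":      np.random.randint(1, 6, 2000),
})

X = df[["age", "hours_online", "hopeful"]]
model = LogisticRegression(max_iter=1000).fit(X, df["is_online"])
propensity = model.predict_proba(X)[:, 1]

# Collapse the score into five cells; online respondents can then be sampled or
# weighted so each cell matches the reference sample's share of that cell.
df["ps_cell"] = pd.qcut(propensity, 5, labels=False)
print(df.groupby("ps_cell")["is_online"].mean())
```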
Some researchers may want to develop their own models, perhaps relying or otherwise building on the current study's optimization methods and process. There is precedent. Researchers at The NPD Group, where the current authors are employed and which is an ARF member firm, conduct surveys daily to understand what products people buy, where they buy them, how much they paid, and the reasons behind their purchase decisions. They also work with hundreds of retailers that make available point-of-sale information. That information includes a list of each product purchased each week; the number of times it was purchased; the date and store of purchase; and the amount paid.
The point-of-sale information can serve as criteria against which to measure the accuracy of self-reported survey data, just as the results of elections can function as criteria to judge the accuracy of preelection polls. Viewed in this way, it is then possible for NPD researchers to ask themselves, “What might we have done differently to improve the accuracy of our survey estimates? If we had used different variables to weight the data, for instance, would we have increased our accuracy?” By regarding this as an optimization problem, and by using optimization methods to identify such variables, they establish a basis for refining and even validating their methods and models.
Any attempt to reduce or minimize bias also should take into account the burden it imposes on survey respondents. At first glance, for instance, it would appear that the optimal models identified in this paper increase respondent burden because they can require as many as 10 variables. In practice, the technology that survey researchers—particularly online survey researchers—use today possesses the capability of storing and reusing a variety of respondent information, including their responses to questions asked previously. Several sample providers that participated in the FoQ 2 study rely heavily on this technology. An implication is that model-based approaches, particularly ones that prove to be reliable and valid, can be implemented without increasing respondent burden, all other factors being equal.
The ARF's FoQ 2 dataset and optimization methods are excellent raw material for learning how to reduce bias in nonprobability samples. This learning may be important for policy, practice, and possibly even the future of survey research. Those who attempt to build on the current authors' efforts should find the environment more supportive than in the past, when outspoken critics questioned the merit of such pursuits. As an illustration, consider the comments of a respected public opinion researcher:
“I can see no valid survey purpose to the current Internet enterprise. All that will happen will be the accumulation of thousands upon thousands of interviews of dubious merit that will mislead the public and destroy whatever credibility surveys and polls now have. A growing number of survey researchers are unfortunately being led to the rocks like Ulysses' sailors following the siren call of cheap, but worthless, data” (Mitofsky, 1999, p. 26).
At that time, that scholar, Warren Mitofsky, did not envision a future in which survey response rates, then in the mid-30s (Pew Research Center, 2012), would approach zero. If he had, he may have taken a more open posture to new methods and approaches, particularly those with less restrictive respondent selection requirements than probability sampling.
That was then and this is now. Ideally, the current authors hope, the current study will serve as a lantern in the night to guide survey researchers, and those who depend on evidence from surveys, to safer ground.
ABOUT THE AUTHORS
George Terhanian is chief research and analytics officer and president of solutions at The NPD Group, Inc. in Port Washington, NY. His research expertise lies in the design and analysis of multimode studies that employ methods other than probability sampling. His work is published in International Journal of Market Research, Journal of Survey Statistics and Methodology, and Journal of Elections, Public Opinion and Parties. Terhanian played a lead role in the design of the Advertising Research Foundation (ARF)'s Foundations of Quality 2 (FoQ 2) initiative. He serves on the board of directors of both the ARF and Council of American Survey Research Organizations.
John Bremer is executive vice president of research science at The NPD Group. He leads the team that designs, maintains, defends, and enhances the methodologies behind NPD's products. Bremer chaired the weighting subcommittee of the ARF's FoQ 2 initiative and speaks frequently at industry events on topics of market research, sampling, weighting, survey design, and statistical methodology. Bremer's work is published in Journal of Advertising Research and International Journal of Market Research.
Jonathan Olmsted is a manager in The NPD Group's solutions group. His background is in computationally demanding statistical methodology, particularly Bayesian inference. Previously, Olmsted was a senior research specialist at Princeton University.
Jiqiang Guo is a data scientist at The NPD Group. His research interests include marketing data analysis, statistical modeling, statistical software development, and Bayesian statistics. Guo made a substantial contribution to Stan (a C++ library and package for Bayesian sampling), which won a Gold prize in the 6th Open Source Software World Challenge 2012, the annual competition hosted by the Ministry of Science, ICT, and Future Planning of Korea. His research can be found in such journals as Technometrics and Journal of Education and Behavioral Statistics.
APPENDIX Substantive Areas, Surveys, and Variables Appearing in Optimization Models
Substantive Areas:
Cognitive orientation
Socioemotional psychographics
Consumer orientation
Consumer purchase behavior
Media and Internet orientation
Technology orientation
Health-related behaviors
New concept tests
Brand and ad awareness and attitudes
Community and political involvement and behavior
Satisficing behavior
Respondent motivation to participate in surveys and panel activity
Demographics
Miscellaneous attitudes and behaviors
Surveys:
American National Election Study
General Social Survey
National Health Interview Survey
Behavioral Risk Factor Surveillance System
Pew Internet Surveys
American Time Use Study–Bureau of Labor Statistics
Current Population Survey
American Community Survey
Variables Appearing in Optimization Models:
“Age”: In what year were you born? (Year)
“Gender”: Are you male or female? (Male, Female)
“Region”: In which state or territory do you currently live? (State or Territory)
“Time Spent Online”: And about how many hours in a typical week do you spend online? (Number)
“Race”: What race or races do you consider yourself to be? (White or European American; Black or African American; American Indian or Alaskan native; Asian or Asian American; Native Hawaiian, or Pacific Islander; Another race)
“Ethnicity”: Are you of Hispanic, Latino, or Spanish origin? (No, not of Hispanic, Latino, or Spanish origin; Yes, Cuban; Yes, Mexican, Mexican American, Chicano; Yes, Puerto Rican; Yes, another Hispanic, Latino, or Spanish origin)
“Race-Ethnicity”: The combination of the variables “race” and “ethnicity.”
“Education”: What is the highest level of education you have completed or the highest degree you have received? (Less than high school; Some high school; Completed high school – regular diploma; Completed high school – GED or alternative credential; Some college, but no degree; Completed Associate's degree; Completed Bachelor's degree; Some graduate/professional school, but no degree; Completed graduate/professional school)
“Income”: Which of the following income categories best describes your total 2012 income before taxes? If more than one person earns income within the household, please estimate the total amount for everyone. (Less than $10,000; $10,000–$14,999; $15,000–$24,999; $25,000–$34,999; $35,000–$49,999; $50,000–$74,999; $75,000–$99,999; $100,000–$149,999; $150,000–$199,999; $200,000 or more; Decline to answer)
“Housing Status”: Do you own or rent your home? (Yes, No)
“Political Party”: Regardless of how you may vote, what do you usually consider yourself? (Republican, Democrat, Independent, No political party affiliation, Another political party)
“Landline Telephone”: Is there at least one telephone inside your home that is currently working and is not a cell phone? (Yes, No)
“Number of Adults in the Household”: Excluding you, how many other adults (age 18 or over) live in your household? (Number)
“Hopeful”: How often do you feel hopeful? (Always, Usually, Sometimes, Rarely, Never)
“Privacy Concerned”: How concerned are you about having records containing your personal information stolen over the Internet? (Very concerned, Somewhat concerned, Not too concerned, Not at all concerned)
“Open-Minded”: How much do you agree with the following statement: ‘It is best to treat those who disagree with you with leniency and an open mind as they may be proven right’? (Strongly disagree, Disagree, Neither agree nor disagree, Agree, Strongly agree)
“Optimistic”: How much do you agree with the following statement: ‘I am optimistic about my future’? (Strongly disagree, Disagree, Neither agree nor disagree, Agree, Strongly agree)
“Number of Vehicles in the Household”: How many automobiles, vans or trucks of one-ton capacity or less are kept at home for use by members of your household? (Number)
Footnotes
Editors' Note
In 2010, the Advertising Research Foundation (ARF) began a groundbreaking study—Foundations of Quality 2 (FoQ 2)—to explore and potentially identify practices, methodologies, and technologies to improve the accuracy of online survey research. Previous analyses of FoQ 2 data have suggested, in fact, that variables other than traditional demographic ones, if included in sampling or weighting models, can increase accuracy. But there is still no consensus on what specific variables to include in those models. In this edition of Research Quality, the authors identify several such variables after applying statistical optimization methods to FoQ 2 data. The evidence suggests these variables can reduce bias (and thereby increase accuracy) by a third for more than 100 questions across 17 sample sources and two sample types. This learning may be important for policy, practice, and even the future of survey research.
1 “What's the Matter With Polling?” The New York Times, June 20, 2015. Retrieved December 14, 2015, from http://www.nytimes.com/2015/06/21/opinion/sunday/whats-the-matter-with-polling.html?_r=0
2 See the appendix for a full listing of substantive areas and surveys.
© Copyright 2015 The ARF. All rights reserved.