ABSTRACT
Since its modern inception about two decades ago, the use of neuroscience tools and insights in studying advertising has gained increasing prominence in the researcher's toolbox. In this branch of applied neuroscience, labels such as “neuromarketing” and “consumer neuroscience” often are used interchangeably, and the emerging field suffers from many inconsistencies. Methodological differences, conceptual inconsistencies, a lack of systematic validation of neuroscience-based metrics, and questionable business practices are all symptoms of a discipline in need of rigor and maturation. The goal of this article is to suggest a basic foundation for the use of neuroscience and related methods in studying advertising effects. Three main elements are suggested: a distinction among basic, translational, and applied research; a conceptual clarification; and a framework for the validation of neuroscience-based metrics.
MANAGEMENT SLANT
The inclusion of neuroscience and psychology has benefited researchers' understanding of consumers and communication effects.
Conceptual inconsistencies, poor business practices, and a lack of academic rigor have hampered the emerging fields of consumer neuroscience and neuromarketing.
Applied research needs to provide a way to ensure that basic research is translated, validated, and tested against the initial claims.
The language of cognitive psychology and cognitive neuroscience should be employed, because these fields have the longest and most substantial record of research on the faculties of the mind.
Collaborative efforts are needed to reduce conceptual confusion and increase the validity of neuromarketing and consumer neuroscience.
INTRODUCTION
When the term “neuromarketing” first officially was published in academic journals (Ariely and Berns, 2010; Hotz, 2008; Lee, Broderick, and Chamberlain, 2007; Wilson, Gaines, and Hill, 2008), it was preceded by both academic research and commercial attempts to employ neuroscience to provide answers to challenges in business practices, especially in advertising and marketing research. Around the same time, a novel breed of research firms that offered neuroscience tests to corporations emerged, selling solutions that purportedly measured true, subconscious emotional and cognitive responses. This line of thinking dates back at least 40 years (Kroeber-Riel, 1979), to studies that used different neuroscience and physiology measures to test the efficacy of advertising. Although these early steps failed to consolidate, probably because neither the technology nor the science was sufficiently mature at the time, the initiatives at the turn of the millennium and especially since 2010 have had much more success, both commercially and academically.
Articles also emerged suggesting that neuroscience tools could be used to boost understanding of consumer behavior and psychology. Early hallmark studies suggested that measuring brain responses could allow researchers to understand better
why brands add value to products (McClure, Li, Tomlin, Cypert, et al., 2004),
the role of emotional engagement in advertisements (Marci, 2006),
how price influences the experience of products (Schmidt, Skvortsova, Kullen, Weber, et al., 2017),
the causal mechanisms underlying value-based consumer choice (Knutson, Rick, Wimmer, Prelec, et al., 2007), and even
how in-store emotions can affect store purchase (Groeppel-Klein, 2005).
More recent research has built on these studies and now demonstrates how price modulates the appeal and enjoyment of products (Bogomolova, Oppewal, Cohen, and Yao, 2015; Garaus, Wolfsteiner, and Wagner, 2016; Karmarkar, Shiv, and Knutson, 2015; Plassmann, Doherty, Shiv, and Rangel, 2008; Votinov, Aso, Fukuyama, and Mima, 2016) and how responses in smaller samples can be predictive of market responses (Berns and Moore, 2012; Boksem and Smidts, 2015; Christoforou, Papadopoulos, Constantinidou, and Theodorou, 2017; Dmochowski, Bezdek, Abelson, Johnson, et al., 2014; Shen and Morris, 2016). Other studies have shown the field making advances in theoretical models of advertising effects (Reynolds and Phillips, 2019, p. 268), branding (Plassmann, Ramsøy, and Milosavljevic, 2012), and other aspects of the consumer journey. These findings demonstrate how the inclusion of neuroscience and psychology has benefited researchers' understanding of consumers and communication effects.
This branch of science initially used the term “neuromarketing” but soon also included “consumer neuroscience.” One originally intended distinction was between
neuromarketing as the commercial use of neuroscience tools and insights, and
consumer neuroscience as denoting the more basic research questions addressing how consumption choices are made (Ramsøy, 2015).
Many companies, however, now use the term “consumer neuroscience” to describe their commercial solutions, thus confounding the original intention with the terminological distinction.
From the very emergence of neuromarketing and consumer neuroscience, many articles have circled around a few central themes, such as
the possible added value of neuromarketing to market research,
basic research into the brain bases of consumer preference and choice, and
the ethical aspects of using neuroscience to boost advertising effects.
Topics that were then and still are covered only marginally include the translation of basic research into applicable practices and the validity of neuromarketing claims. While such critical aspects were absent from this emerging literature, the commercial side saw the first of two hype cycles, in which vendor companies offered more than they could deliver, validate, or make actionable to their clients (Neff, 2016). This dissociation between the promise of neuroscience applied to advertising and marketing, on the one hand, and what actually was provided, on the other, led many to abandon the approach altogether.
Similarly, the scientific literature of consumer neuroscience has been described as, at best, highly fragmented and with a lack of user-friendly and high-quality primers (Harris, Ciorciari, and Gountas, 2018; Lee, Chamberlain, and Brandes, 2018). Despite great scientific and commercial developments, therefore, the discipline as a whole still faces multiple issues, including an uncertainty in terms of the methods and metrics, uncertainty in the interpretation of findings, and a lingering commercial distrust in the methods and metrics offered. This is not helped by offers and claims for which the scientific credibility is nonexistent and that would not have been accepted at scientific conferences or in peer-reviewed journals.
Today, one may encounter claims that (nondescribed) brain-imaging methods can predict price sensitivity, measure deep brain activity with mastoid-positioned electrodes on a pair of Google Glass glasses, or use a single frontal electrode to measure a vast array of cognitive and emotional states—all claims with no independent scientific publications as support. At the intersection between commercial interests and scientific legitimacy, one often may find frustration, in that intellectual property becomes a winning argument at the cost of transparency and independent scientific assessments of the legitimacy of a given claim.
Finally, the commercial presentation of neuroscience often does not reflect the deep and fundamental debates that are ongoing in basic neuroscience research. Although commercial uses of a more basic science always need to strike the right balance between being accurate and being informative, a few examples should serve as a reminder that the claims of neuromarketing sometimes can be more grand than the claims of the basic scientists working in non-commercial settings.
One example is the equation of a brain structure and mental operations such as emotions or memory. Commercial presentations sometimes refer to the brain as a collection of “centers,” which are specialized units for processing certain types of information. In this view, there is a center for attention, a center for reward, and a center for memory.
This view is highly reminiscent of the antiquated view of the brain as a collection of neatly organized compartments, known as phrenology (Diener, 2010). Such notions long have been abandoned in basic neuroscience research, to the benefit of a more complex and comprehensive model of the brain as a dynamic network of semi-specialized structures that both are involved in multiple processes and can take on new roles. These properties go under headings such as redundancy, pluripotentiality, and degeneracy (Friston and Price, 2003), concepts that also are found in modern genetics. Recent work suggests that this notion is spreading within advertising research (Kennedy and Northover, 2016), although time will tell whether it will replace the frequently seen notions of a “center for X” in commercial presentations and media outlets alike.
It is notable that even more extreme and long-abandoned brain myths are long overdue for being taken off commercial sales materials, including
the right–left brain hemisphere idea of creative–logical differences (Corballis, 2014; Hines, 1987);
the assertion that people use only 10 percent of their brain capacity (Higbee and Clay, 1998; Jarrett, 2014); and
the triune brain model, whereby the brain is divided into three distinct brain systems (LeDoux, 1996; Pogliano, 2017).
Although many of these mythical narratives can serve as attention grabbers and selling points, they more likely will become obstacles to valid metrics, proper understanding of the subject matter, and interpretations that can drive insights to better practices.
Still, this article is not intended as an exhaustive list of the scientific literature on consumer neuroscience or neuromarketing, or an evaluation of the many different commercial claims in neuromarketing, although such a review is long overdue. The purpose, rather, is to take a perspective from someone who works both in academic research and in applied, commercial research to highlight three key areas that should be considered foundational for a science that is both basic and applied, as is the case with neuromarketing and consumer neuroscience. These key areas are
a distinction among types of research and how they should be employed;
conceptual clarifications, which will allow the field to align on a mutual nomenclature and understanding of even the most basic phenomena at stake; and
a framework for creating and validating neuroscience-based metrics.
THREE TYPES OF RESEARCH
Within many disciplines it is possible to distinguish among three types of research: basic, translational, and applied.
Basic research. This term denotes research that aims to improve theories and models of a given set of phenomena. In consumer neuroscience, several studies have used neuroscience methods to understand better the causal mechanisms of consumer psychology and choice. A few notable studies should be mentioned here.
First, in the previously mentioned hallmark neuroimaging study (McClure et al., 2004), it was reported that brand-induced changes in product taste were related to increased activation in the brain's memory system, such as the dorsolateral prefrontal cortex and hippocampus. This finding suggested that brands give value to products through an automatic memory-related associative spreading of thoughts. More recent studies have extended this finding by showing that brands and brand derivatives (e.g., brand mascots such as the Michelin Man) implicitly can lead to an associative spreading that is related to the number of brand associations (Hulme, Skov, Chadwick, Siebner, et al., 2014).
Second, in a study of neural correlates of purchasing behaviors (Knutson et al., 2007), participants were given money to choose whether to buy products while being scanned with functional magnetic resonance imaging (fMRI). The researchers found that during product viewing, responses in deep brain structures, such as the nucleus accumbens, were highly predictive of choice several seconds later. Additionally, engagement of the insula during price viewing was related to reduced likelihood of purchase. This study demonstrated that customer choice is related to early emotional product evaluation, which has a significant impact on subsequent conscious choice. More recent studies have provided a closer link to this by demonstrating how other types of early emotional responses, such as the frontal brain asymmetry response, are related to subsequent consumer choice (Ravaja, Somervuori, and Salminen, 2012) and the willingness to pay for products (Ramsøy, Skov, Christensen, and Stahlhut, 2018).
A third example comes from studies showing that brain responses to advertisements in a smaller sample of participants can predict market responses. In one study (Dmochowski et al., 2014), it was found that coherent responses in a study sample were related to both Nielsen ratings and Twitter trends when the same episodes and advertisements were aired. A similar study showed that two types of brain responses to movie trailers were predictive of either individual preference or market responses (Boksem and Smidts, 2015). Frontal brain activity in the beta-frequency band was related negatively to individual trailer preference, whereas frontal gamma was associated with U.S. box office sales for the related movies.
These studies demonstrate that coherent brain responses in a smaller subset of people can be highly predictive of market responses. One implication is that there may be general, archetypal responses that are highly reliable across a culture, so that smaller samples can be used to predict cultural responses reliably. It is important and interesting that these responses seem not to be related to subjective preference formation.
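The notion of coherent responses across viewers can be illustrated with inter-subject correlation, a common way of quantifying how similarly a group responds to the same stimulus. The sketch below runs on simulated data; the function and the data are hypothetical illustrations of the logic, not the analysis pipeline of the cited studies.

```python
import numpy as np

def intersubject_correlation(responses):
    """Mean pairwise Pearson correlation across participants.

    responses: array of shape (n_participants, n_timepoints), e.g., a
    neural signal recorded while each participant views the same ad.
    """
    n = responses.shape[0]
    pairs = [np.corrcoef(responses[i], responses[j])[0, 1]
             for i in range(n) for j in range(i + 1, n)]
    return float(np.mean(pairs))

# Simulated data: one ad evokes a shared response component across
# 20 viewers; the other evokes only idiosyncratic noise.
rng = np.random.default_rng(0)
shared_signal = rng.standard_normal(200)
coherent_ad = shared_signal + 0.5 * rng.standard_normal((20, 200))
incoherent_ad = rng.standard_normal((20, 200))

coherent_isc = intersubject_correlation(coherent_ad)      # high
incoherent_isc = intersubject_correlation(incoherent_ad)  # near zero
```

On this logic, advertisements that evoke high inter-subject correlation in a small sample would be candidates for strong market-level responses, which is what the cited studies tested against ratings, Twitter activity, and box-office sales.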
Translational research. This term stems mainly from the clinical sciences, which view this type of research as the exercise of applying findings from basic science to enhance human health and well-being (Woolf, 2008). In this context, translational research in consumer neuroscience is the type of research in which one applies the insights from basic research to practical usage in advertising. A general example is the use of recent theoretical advances in the understanding of the brain bases of perception, preference, and choice. One study employed these advances to put forward a model of how advertising and branding work, through four “powers”:
stopping power, which denotes an advertisement's ability to grab and maintain attention;
transmission power, which stands for the advertisement's ability to convey the crucial message and link to the brand;
persuasion power, which is the advertisement's ability to convey an emotionally compelling and persuasive offer; and
locking power, which denotes the advertisement's ability to ensure sustained advertising and brand memory (Shiv, 2011).
This model, overlapping with other models, currently is being put to use with specific neuroscience-based measures, so that stopping power is assessed with eye tracking, transmission power is measured with cognition and related brain responses, persuasion power is related to emotion-based brain responses (or other measures of emotions), and locking power is related to different types of memory tests (e.g., implicit measures, free recall, recognition tests) that assess both advertising memory and brand memory.
A different approach is found in attention research, where recent studies have demonstrated that in Western countries (where people read from left to right), information that is placed at the bottom right corner receives less visual attention (Hernandez, Wang, Sheng, Kalliny, et al., 2017). This insight, grounded in other eye-tracking studies, has direct practical application within advertising. Similarly, studies have demonstrated that boosting visual saliency can boost not only visual attention but even the likelihood that products will be chosen (Milosavljevic and Cerf, 2008). Such findings suggest that basic knowledge about drivers and mechanisms of visual attention can lead to more effective advertising, both in helping advertisers avoid making bad choices and in providing a causal understanding of not only how something happens but also why it happens.
Applied research. This approach is used to solve a practical problem. In the context of consumer neuroscience and neuromarketing, applied research is found in commercial studies that compare two or more versions of the same advertisement or explore how the same advertisement works on different platforms. In two neuromarketing studies by Canada Post (2016), it first was found that direct mail produced 21 percent higher comprehension of and 20 percent stronger emotional responses to advertisements than digital platforms (i.e., e-mail and display) did. A second study looked at the dynamic combination of advertising on print and digital media; it found that print advertising produced 40 percent more brand recall when it followed digital advertising, compared with other media orders. Such studies demonstrate how neuroscience tools and methods can be used to provide tangible, quantifiable results that go beyond self-reported measures. Similar studies have been made comparing advertising on publishers' websites with social news feeds (Swant, 2017) and suggesting that combining neurometric assessment with self-reports provides superior predictive power of market effects (Nielsen Company, 2017).
Applied research is needed to support translational research in avoiding making logical errors. In particular, one challenge with translational research is the concept of reverse inference, which in logical terms is referred to as “affirming the consequent.” This is explained best through a famous example, the study described in the best-selling book Buyology (Lindström, 2010), which was also an early defining publication for this discipline.
As part of the author's many studies, he found that when smokers were scanned with fMRI and were shown a cigarette package with health warnings, they displayed a higher degree of activation of the ventral striatum (Lindström, 2010). This region previously had been shown to be responsive to the expectation of rewards (Haber and Knutson, 2010), which led the author to assert that smokers watching the warning signs were expecting a reward. The implicit assumption, however, is that there is a 1:1 relationship between brain activity in the ventral striatum and reward expectation.
Other studies, on the contrary, have demonstrated that the ventral striatum also can be engaged in the expectation of punishments (Levita, Hare, Voss, Glover, et al., 2009). Merely reading off the activity of the ventral striatum thus cannot be used as a direct measure of reward—smokers watching the warning signs just as well could be engaging the ventral striatum as a part of expecting a punishment or a bad experience (perhaps bad conscience). Several emotional structures show a so-called bivalent response pattern, which means that they can be engaged by positive and negative events (Gelskov, Henningsson, Madsen, Siebner, et al., 2015; Ramsøy and Skov, 2010). Just as the ventral striatum cannot be seen as a “reward” structure, therefore, the amygdala cannot be seen as a “fear” structure.
What is needed from applied research, therefore, is a way to ensure that basic research is translated, validated, and tested against the initial claims from basic research. If it is suggested that, for instance, a particular brain response is related to brand loyalty (Plassmann et al., 2012), then applied research should seek to validate this in independent studies, by testing both the accuracy of this claim and whether the response is specifically about brand loyalty. In doing this, applied research needs a solid methodological footing, something that the author will return to in the last section, on establishing a neurometric validation framework.
CONCEPTUAL CLARIFICATIONS
A conceptual confusion seems to be at play in the intersection between the different sciences of economics, psychology, and neuroscience, especially in business. Words such as “emotions” can have different meanings depending on whether one takes an approach from psychology, where emotions often mean bodily responses, typically with a subconscious origin, or behavioral economics, where emotions may be seen as conscious feelings. There are existing definitions from cognitive neuroscience and neuroscience, with support from psychology, that could be recommended. The primary point is that concepts need to be deconstructed further into smaller subcomponents, so that “attention” and “memory” are not treated as single entities and measures. The author strongly recommends that the language of cognitive psychology and cognitive neuroscience be employed, because it has by far the longest and most substantial record of research on the faculties of the mind.
Consider a couple of examples, noting that they are offered only to illustrate the conceptual issues and are not to be seen as fully developed concepts to be applied in consumer neuroscience or neuromarketing. Start with attention. This is well described in a famous quote by the nineteenth-century psychologist William James:
“Everyone knows what attention is. It is taking possession of the mind, in clear and vivid form, of one out of what seems several simultaneously possible objects or trains of thought. Focalization, concentration of consciousness are of its essence. It implies a withdrawal from some things in order to deal effectively with others.” (James, 1890, p. 403)
In everyday language, one knows what attention is. Probing slightly deeper, however, one also can recognize that the term “attention” covers a host of different phenomena and mechanisms. In cognitive psychology and neuropsychology, it long has been known that attention needs to be divided into several forms.
Bottom-Up Attention
Bottom-up attention is fast, nonvolitional, and driven by the senses, which in turn respond to features of the object of attention. In visual attention, aspects such as contrast, density, angles, movement, and color composition all can operate as indices of visual salience and affect the likelihood that an item is seen. In many ways, bottom-up attention is always on.
For example, a study of attention to billboard advertisements during driving showed that attention was driven mainly by billboard position, but there was also a small but significant contribution of visual saliency (Wilson and Casper, 2016). In the domain of food choice, a forthcoming paper demonstrates how products' ability to capture attention can increase the likelihood of these products being chosen (Peschel, Orquin, and Mueller Loose, 2019).
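The idea that low-level features such as contrast act as indices of visual salience can be made concrete with a toy computation: score each location in an image by its local contrast. This is a simplified, hypothetical stand-in for full computational saliency models (which also weigh color, orientation, and motion), not the method used in the cited studies.

```python
import numpy as np

def local_contrast_saliency(image, window=3):
    """Crude bottom-up saliency proxy: local standard deviation.

    image: 2-D grayscale array with values in [0, 1]. Regions of high
    local contrast score high, mimicking one driver of the likelihood
    that an item captures bottom-up attention.
    """
    h, w = image.shape
    pad = window // 2
    padded = np.pad(image, pad, mode="edge")
    saliency = np.empty((h, w))
    for y in range(h):
        for x in range(w):
            saliency[y, x] = padded[y:y + window, x:x + window].std()
    return saliency

# A uniform gray field with one bright patch: only the border of the
# patch, where local contrast is high, scores above zero.
image = np.full((20, 20), 0.5)
image[8:12, 8:12] = 1.0
saliency_map = local_contrast_saliency(image)
```

In a layout test, such a map would predict that the high-contrast patch, not the uniform background, is seen first; in practice, this is what the eye-tracking studies cited above measure directly.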
Top-Down Attention
Best equated to concentration, this type of attention is related to a slow, effortful, and volitional mobilization of one's mind to an object of interest. By contrast to bottom-up attention, top-down attention needs time to be mobilized and, as such, is not always on. In a study of attention-related brain responses during product viewing, for example, it was found that watching branded luxury products with another person led to an increase in attention toward and desire for the product (Pozharliev, Verbeke, Van Strien, and Bagozzi, 2015).
Emotion-Driven Attention
When something triggers an emotional response, one immediate effect is that it boosts attention toward the item that triggered the event. From neuroscience, researchers know that an “emotional” brain region, such as the amygdala, sends more signals back to the visual cortex than it receives from the visual cortex (Morris, Friston, Büchel, Frith, et al., 2015), thus exerting emotional control over visual attention. This leads not only to a brain-based boost in activity but also to other behaviors, such as stronger pupil dilation and longer fixation to the item of interest.
In terms of advertising, the literature initially may seem slightly divided. Some studies have suggested that strong emotional responses are able to grab and sustain attention (Teixeira, Wedel, and Pieters, 2012), whereas other studies have suggested that emotional advertising can lead to lower advertising attention (Heath, Nairn, and Bottomley, 2009).
Cognition-Driven Attention
Items that are attended, even for a brief moment, can lead to automatic cognitive responses. When the eyes fall on a text, it is practically impossible not to read the text. This acquired automatic reading leads to slightly longer attention to the items that contain words, at least when they are seen in the first place (Bang and Wojdynski, 2016). One cognitive driver of attention is different aspects of visual complexity. A study comparing levels of advertising complexity reported that only advertisements that had higher visual complexity were associated with lower brand attention. Advertisements with a higher creative complexity, in contrast, were associated with attention that was more dedicated to relevant information, such as product, text, and brand (Pieters, Wedel, and Batra, 2010).
This subdivision demonstrates that attention is not a single thing or process but a phenomenon that covers multiple causes. As such, the term “attention” can be used only as a general reference. Research questions such as, “What are people paying attention to?” can be considered only in a more detailed context and should rather be asked as, “What are people paying attention to automatically, what needs more time, and what is driven by emotional and cognitive responses?” Although this may seem unnecessarily complex, such questions not only allow an answer to the general question of what people are looking at but also allow a better understanding of why people are looking there in the first place. Such causal understanding allows advertisers to have a better model of how they can change and boost attention to the items that they are most interested in having seen.
A second mongrel concept, as noted earlier, is that of emotions. One knows almost intuitively what emotions and feelings are but is more challenged when pressed to come up with a crisp definition. In the literature, as in the English language, the words “emotions” and “feelings” often are used to denote the same mental response.
The Merriam-Webster Dictionary defines “emotion” as “a conscious mental reaction (such as anger or fear) subjectively experienced as strong feeling usually directed toward a specific object and typically accompanied by physiological and behavioral changes in the body.” Merriam-Webster offers multiple definitions for “feelings,” among them “an emotional state or reaction,” “the undifferentiated background of one's awareness considered apart from any identifiable sensation, perception, or thought,” “the overall quality of one's awareness,” and “conscious recognition.” In the psychology and neuroscience literature, conversely, there is a strong tendency to use emotions to mean neural and bodily responses to events, often occurring without subjective awareness, and feelings to address the conscious experience of (some) emotional responses (Ramsøy, 2015). This distinction typically is not made in marketing or in other disciplines, which often leads to unnecessary confusion.
Decades of research in psychology and neuroscience clearly have demonstrated that there is a distinction between conscious and subconscious responses that needs a better nomenclature. The aforementioned study (McClure et al., 2004) found that hedonic experience was related to the engagement of specific parts of the frontal lobe, such as the ventromedial prefrontal cortex. Similar studies of hedonic experience support this finding and have extended it into different domains of hedonic experience, such as whether the reward experience is abstract, concrete, positive, or negative (Kringelbach and Radcliffe, 2005).
Decades of studies, conversely, have demonstrated neural and physiological responses that occur before or below the threshold of conscious detection and that are related to valuation and decision making. Studies of negative emotions have shown that the fear response occurs much faster than conscious detection, leading to physiological (e.g., the heart starts racing) and bodily responses (e.g., the body starts to tense up) before people become aware of being scared (Bishop, 2008; Hariri, Mattay, Tessitore, Fera, et al., 2003; Morris et al., 2015). Such examples, also supported by the wiring of the brain (Bishop, 2008; Linke, Kirsch, King, Gass, et al., 2010; Vuilleumier, 2005), demonstrate that early valuation occurs before conscious experience and becomes embedded as part of the conscious experience of fear (e.g., one feels scared at the same time as one jumps in the chair).
An fMRI gambling study showed that participants were able to learn reward–punishment contingencies for subliminally presented visual cues before the gamble choice was to be made (Pessiglione, Petrovic, Daunizeau, Palminteri, et al., 2008). These cues were contingently related to a subsequent win or loss. Because the cues were presented subliminally, the participant had no way of detecting this contingency consciously and thereby of using such information consciously to steer his or her choice.
The subliminally presented cues led to the engagement of deep structures of the brain, such as the nucleus accumbens, without the participant noticing the stimulus or any kind of evaluation process. Still, this response led to changes in stimulus-based decisions, with the participants feeling that they were guessing the right choice. This study also demonstrates that a subconscious valuation process is at work, driving behavior and preceding conscious experience. Such studies are supported further by other studies showing that conscious valuation occurs after the behavioral aspects of a choice have been made (Santos, Seixas, Brandão, and Moutinho, 2011).
Researchers now know that valuation processes, which eventually affect behavior and choice, are related to at least two types of mental responses—one conscious, and another subconscious. Capturing this in terms such as “emotions” and “feelings” to describe the subconscious and conscious nature of valuation, respectively, both is supported scientifically and is valuable for the discourse of an emerging multidisciplinary discipline.
A NEUROMETRIC VALIDATION FRAMEWORK
The validity of claims in commercialized neuroscience, in the form of neuromarketing practices, has been a critical issue since the earliest days of the field. There have been laudable efforts in comparing methods and establishing standards, such as the 2010 NeuroStandards 1.0 and 2.0 projects by the Advertising Research Foundation (ARF; www.thearf.org) (Varan, Lang, Barwise, Weber, et al., 2015) and work by the Neuromarketing Science and Business Association (NMSBA; http://www.nmsba.com/) as well as the European Society for Opinion and Marketing Research (ESOMAR; www.esomar.org) (Nosworthy, Marci, Sockut, de Balanzó, et al., 2013).
In these publications and efforts, however, the main findings and recommendations have been that neuromarketing measures assumed to assess the same thing (e.g., emotional responses) diverge substantially when compared against each other and that more research is needed to validate each method and to demonstrate the predictive ability of these measures. One may claim, however, that a general shortcoming of these approaches has been the failure to establish true standards for the emerging discipline, especially in ensuring that metrics are validated and documented, that any claim is supported fully by the science on which it rests, and that independent research can be conducted to ensure that metrics are valid and reliable.
Scientific validity and reliability are not a challenge in applied sciences alone. Recent debates in basic science, and perhaps psychology in particular (Open Science Collaboration, 2015), show that the validity and reliability of scientific claims can be challenged. The current replication crisis in science is also tied to the commercialization of research results, which may, in turn, lead to selective reporting and inflation of otherwise small effects (Aguinis, Cascio, and Ramani, 2017). Neuromarketing and consumer neuroscience can make use of this valuable discourse. The issues pointed out in this article, on the lack of validation studies in consumer neuroscience and neuromarketing, are shared more broadly with science as a whole.
Instead of reinventing the wheel, therefore, researchers can adopt existing terms and approaches for ensuring a validated science of neuromarketing and consumer neuroscience. Some of these can be borrowed from the clinical sciences, as often seen in medical companies developing drugs. If a pharmacological company claims to have developed a drug that cures 80 percent of depressed patients, the company will not be trusted on the results of internally conducted studies. Rather, independent research groups will test the purported effects of the drug as well as adjacent topics, such as side effects. It is notable that the medical company does not need to disclose the exact contents of the drug to have its medication tested. In the same vein, neuromarketing companies need not disclose the exact calculation of their metric to have it tested by an independent agent. Companies thus can retain intellectual property while independent validation is performed.
When establishing new metrics or assessing existing ones, researchers should draw on several established criteria. Borrowing heavily from psychometrics and clinical neuropsychology, researchers can focus on four crucial properties: sensitivity, specificity, validity, and reliability. This article also considers what commercialization of metrics means for normalizing scores and establishing normative ranges and industry benchmarks.
Sensitivity
A metric needs to respond to what it is supposed to measure. A metric claiming to assess emotional valence should show a response to positive and negative stimuli that differs from its response to neutral stimuli. Measures of cognitive load, or mental demand, likewise should respond when people are indeed under mental demand. Sensitivity can be established through well-controlled experiments, in which only the critical factor (e.g., emotional valence) is varied and all other factors (e.g., cognitive demand) are kept constant. Alternatively, sensitivity can be established by comparison with another well-established measure, an approach often used when the existing measure is expensive or difficult to implement in the current setting.
Specificity
A metric should not only be sensitive to the response it is intended to measure; it should also be insensitive to responses outside its main target. A measure of emotional valence, for example, should not respond to mental demand, unless high cognitive demand leads to negative emotions or other nonvalence properties. Tests with good experimental control over the intended measure and other effects should be conducted.
For the aforementioned interpretation that activity in the nucleus accumbens was indicative of reward expectation (Lindström, 2010) to be true, the nucleus accumbens should be responding only during reward expectation and not otherwise. As shown, however, a single study demonstrating that the same structure can be engaged by expected negative outcomes suffices to reject this specificity assumption (Levita et al., 2009). The same goes for assumptions that the amygdala is a fear structure (LeDoux and Ravalomanana, 2004); studies have shown that this structure is responsive to rewards (Fellows, 2004; Kühn, Strelow, and Gallinat, 2016; Murray, 2007) and novelty (Balderston, Schultz, and Helmstetter, 2011; Blackford, Buckholtz, Avery, and Zald, 2010). The use of sensitivity and specificity allows a good designation of the metric's positive predictive value and negative predictive value, which denote the metric's ability to respond correctly to the real effect and to indicate correctly when there is no effect, respectively.
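The relationship among sensitivity, specificity, and the two predictive values can be sketched in a few lines. This is an illustrative example, not from the article, and the counts below are hypothetical validation results for a metric claimed to flag a target response:

```python
# Minimal sketch: deriving sensitivity, specificity, and the predictive
# values from confusion-matrix counts. All counts are hypothetical.

def diagnostic_scores(tp, fn, fp, tn):
    """Return sensitivity, specificity, PPV, and NPV from raw counts."""
    sensitivity = tp / (tp + fn)   # responds when the real effect is present
    specificity = tn / (tn + fp)   # stays silent when the effect is absent
    ppv = tp / (tp + fp)           # positive predictive value
    npv = tn / (tn + fn)           # negative predictive value
    return sensitivity, specificity, ppv, npv

# Hypothetical counts: 80 hits, 20 misses, 10 false alarms, 90 correct rejections
sens, spec, ppv, npv = diagnostic_scores(tp=80, fn=20, fp=10, tn=90)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} PPV={ppv:.2f} NPV={npv:.2f}")
```

Note that a metric can be highly sensitive yet have a modest positive predictive value when false alarms are common, which is exactly why both families of scores should be reported.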
Validity
The concept of validity in metrics is an enormous area and too large for a substantial treatment in this article. In general, validity is concerned with whether a given claim can be supported. If a metric is supposed to measure emotional valence, can this be seen in a well-controlled study? Is the measure not only working inside a controlled lab environment but also predictive of responses and behaviors outside the lab? A few selected validity types include the following.
Construct Validity. Construct validity typically is broken down into two main components: convergent validity and discriminant validity. Convergent validity concerns whether two metrics that are intended to measure the same thing indeed do so. This is related to the term "sensitivity," as described earlier, although the focus here is on comparing two metrics. One should expect the two measures to correlate highly with each other and to show no significant differences in direct comparisons, such as a t test.
Discriminant validity is demonstrated when two or more metrics that are not intended to measure the same thing correctly show no relationship. This can be assessed through a combination of direct comparisons: a t test should show that they differ, and a correlation analysis should show neither a positive nor a negative correlation. Previous studies have demonstrated clearly that vendor definitions and measures of single terms, such as engagement and positive emotions, varied widely (Varan et al., 2015). There was basically no convergent validity between these vendors, moreover, even though the statistical approach used in that article did not employ the recommended statistical approaches for evaluating construct validity (Campbell and Fiske, 1959; Duckworth and Kern, 2011).
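The correlational side of both checks can be sketched as follows. All scores here are invented for illustration, and the thresholds (0.9 for convergence, 0.2 for discrimination) are assumptions rather than standards from the literature:

```python
# Hypothetical sketch of convergent and discriminant validity checks.
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Two vendors' "emotional valence" scores for the same ten advertisements
vendor_a = [0.2, 0.5, 0.9, 0.1, 0.7, 0.4, 0.8, 0.3, 0.6, 0.95]
vendor_b = [0.25, 0.45, 0.85, 0.15, 0.75, 0.35, 0.9, 0.3, 0.55, 1.0]
# An unrelated "cognitive demand" score for the same advertisements
demand = [0.6, 0.3, 0.5, 0.2, 0.4, 0.7, 0.45, 0.8, 0.1, 0.55]

r_conv = pearson_r(vendor_a, vendor_b)  # convergent: should be strongly positive
r_disc = pearson_r(vendor_a, demand)    # discriminant: should be near zero
print(f"convergent r = {r_conv:.2f}, discriminant r = {r_disc:.2f}")
```

A fuller assessment would add the direct-comparison tests (e.g., paired t tests) and the multitrait–multimethod logic of Campbell and Fiske (1959), which this sketch omits.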
External Validity. This type of validity is concerned with whether the result from a study can be extrapolated to a broader population. If an effect can be observed only in a highly controlled lab environment and is not related to any response in the population as a whole, external validity is said to be low. Researchers would be in trouble if a lab-based study showed strong emotional responses to advertisements with a particular feature but this was not the case in the broader population. The previously mentioned studies serve as good examples of external validity, because their metrics demonstrate a clear and direct relation to independent market behaviors, such as Twitter behavior and Nielsen ratings (Dmochowski et al., 2014), music hits that have gone viral (Berns and Moore, 2012), and box office sales of movies (Boksem and Smidts, 2015).
Internal Validity. This type of validity is concerned with how the test itself is conducted, both when one is constructing a metric and when one is employing it to test a given condition, such as advertisements. Internal validity is concerned with whether the test controls for important factors so that irrelevant aspects do not affect the results. In a gender-comparison test of advertisement responses, there would be low internal validity if men and women also were exposed to different testing conditions, such as one group sitting in a noisier room or seeing advertisements in a systematically different order than the other group. Internal validity sometimes can conflict with external validity: the need for good experimental control can make the experiment less like everyday situations. The parties involved should discuss how to balance making a study that is both well controlled and representative of a broader population.
Reliability
This concept is related to whether a given metric consistently produces the same response and thereby the same conclusions. Reliability comes in four types.
Test–Retest Reliability. Test–retest reliability is a crucial score whereby a metric should lead to the same result and conclusion in two separate yet comparable samples. If a given metric scores low on test–retest reliability, it cannot be trusted in use. Related to this score is the split-half test, whereby a larger sample is assigned randomly (or pseudorandomly, to balance groups) to two groups, between which a strong positive correlation on the metric should be expected. In doing so, it is also possible to test different group sizes to better understand at what group size a test shows sufficiently high test–retest reliability.
To the author's knowledge—although the literature on reliability in neuroimaging methods is growing and supporting the methods and basic metrics (Angelidis, van der Does, Schakel, and Putman, 2016; Aron, Gluck, and Poldrack, 2006; Ettinger, Kumari, Crawford, Davis, et al., 2003; Farzin, Scaggs, Hervey, Berry-Kravis, et al., 2011; Mathewson, Hashemi, Sheng, Sekuler, et al., 2015)—there are at this point no known publications of test–retest reliability in consumer neuroscience research. This strongly suggests that, as a field, consumer neuroscience methods need a concerted effort to engage in projects that assess the reliability of their measures and metrics beyond what is found in the basic neuroscience methods.
Parallel Forms Reliability. A metric also can be confirmed by comparison with other measures of the same phenomenon. If a metric claims to assess cognitive demand, the score can be compared with existing and established scores of cognitive demand. Such a measure could be compared with the engagement of the dorsolateral prefrontal cortex, as measured with fMRI (Rypma, Berger, and D'Esposito, 2002), and even with pupil dilation, as measured with high-resolution eye tracking and pupillometry (Hyönä, Tommola, and Alaja, 1995).
Internal Consistency Reliability. This type of reliability is concerned with whether a given metric corresponds to other metrics intended to measure the same construct. Imagine that there are five different measures of emotional valence, each using a different method, and researchers call them the "emotion test battery" and claim that they measure exactly the same thing. The hypothetical battery has one measure using electroencephalography, one using fMRI, one using galvanic skin response, one using pupil dilation, and one using facial expressions.
In testing internal consistency reliability, one would expect the correlation for each pairwise comparison of these measures to be highly positive. If it is not, one cannot claim that these measures assess exactly the same response. The same argument applies if one finds, for instance, four different brain responses of emotional valence using a single brain-scanning method: the claim that these all reflect the same response can be tested with internal consistency reliability. Although one would not expect independent commercial vendors to come up with multiple measures of the same variable, this approach still would be very valuable in ensuring that measures that vendors claim to be the same do, in fact, measure the same thing.
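A standard summary statistic for this kind of battery is Cronbach's alpha, which can be sketched briefly. The battery below is invented: five hypothetical valence measures that differ only by a constant bias, so they are perfectly parallel and alpha reaches its ceiling of 1.0; real batteries fall below this:

```python
# Illustrative sketch (assumed data): Cronbach's alpha for an "emotion test
# battery" of five hypothetical valence measures scored on six advertisements.
from statistics import variance

def cronbach_alpha(items):
    """items: list of k score lists (one per measure), same length each."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]   # per-ad total score
    item_var = sum(variance(scores) for scores in items)
    return (k / (k - 1)) * (1 - item_var / variance(totals))

base = [0.2, 0.4, 0.6, 0.8, 0.5, 0.3]        # assumed "true" valence per ad
offsets = [0.00, 0.02, -0.01, 0.03, -0.02]   # small constant per-measure biases
battery = [[b + off for b in base] for off in offsets]

alpha = cronbach_alpha(battery)
print(f"Cronbach's alpha = {alpha:.2f}")  # perfectly parallel measures give 1.00
```

In practice one would inspect the pairwise correlations as well, because a high alpha can mask one deviant measure when the others agree strongly.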
Interrater Reliability. This type of reliability should be assessed mostly within an experiment, to make sure that results and conclusions do not depend on the person running the test. By extension, there should be no effect of which technician runs the test, who performs the data preprocessing, or who runs the analyses. Interrater reliability can be tested through means such as intraclass-correlation analyses.
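One common intraclass-correlation variant, the one-way random-effects ICC(1,1), can be sketched as follows; the two raters' scores are invented for illustration:

```python
# Hypothetical sketch: one-way random-effects intraclass correlation,
# ICC(1,1), for two raters scoring the same five advertisements.

def icc_oneway(ratings):
    """ratings: list of [rater1, rater2, ...] scores, one row per subject."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    subj_means = [sum(row) / k for row in ratings]
    # Between-subject and within-subject mean squares
    msb = k * sum((m - grand) ** 2 for m in subj_means) / (n - 1)
    msw = sum((x - m) ** 2
              for row, m in zip(ratings, subj_means)
              for x in row) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

ratings = [[4, 4], [3, 2], [5, 5], [2, 2], [4, 5]]  # two raters, five ads
print(f"ICC(1,1) = {icc_oneway(ratings):.2f}")      # high agreement expected
```

Other ICC forms (e.g., two-way models treating raters as fixed) exist and should be chosen to match the study design; this sketch shows only the simplest case.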
These four properties (sensitivity, specificity, validity, and reliability) are crucial in ensuring a robust and valid science and commercialization of neuromarketing and consumer neuroscience. A few other notable areas that may be more or less specific to this industry should be addressed. First, recent advances in cognitive neuroscience imply that consistency in a smaller sample can be considered a predictor of market effects. Studies have shown that responses within a smaller sample can be highly predictive of population responses.
An early, small study showed that neural responses in a small sample were predictive of later music hits (Berns and Moore, 2012). The aforementioned study (Boksem and Smidts, 2015) found that brain responses to movie trailers in a small sample were relatively predictive of box office sales. Some studies have suggested that these responses are related to the degree to which the smaller group shows coherent responses, as demonstrated by one study (Dmochowski et al., 2014) that showed that coherent responses in a sample of 16 participants were predictive of both Nielsen ratings and Twitter feed responses, which prior research had also shown to be correlated (see Spangler, 2018).
Together, these studies suggest that a neuroscience-based metric also can be assessed on whether it is predictive of behaviors outside the study itself. Although such studies demonstrate that consumer neuroscience studies with smaller sample sizes can produce predictive results and significant insights, larger sample sizes more often are recommended. A sample size of 30 typically is a recommended minimum for a coherent sample, and this number should be multiplied by the number of groups one wants to study, on the basis of aspects such as age, income, geography, education, and ethnicity. Such studies often reach sample sizes of 120, 180, or other multiples of 30.
Commercial solutions also should strive to establish metrics with additional properties. Normalization and benchmarking are probably among the most crucial elements. In making use of commercial metrics, buyers of such services are interested in understanding both how well an advertisement performs in absolute terms and how it stacks up against some industry benchmark. Two steps should be taken to obtain this.
First, companies offering neurometric solutions should normalize their scores, because this allows a normative read-out of whether a score in itself is positive, neutral, or negative. This is crucial: the difference in emotional performance between two or more advertisements is far more informative if one also can say whether each score is positive or negative, for example, whether the highest-performing advertisement evokes positive emotion while the rest evoke negative emotion. This additional discriminatory ability allows a better understanding of the results.
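Such a normative read-out can be sketched with a simple z-score transformation. The normative mean, standard deviation, and band cut-offs below are invented values, not industry standards:

```python
# Illustrative sketch: normalizing a raw metric against an assumed normative
# database and mapping the result to a qualitative band.

def normalize(raw, norm_mean, norm_sd):
    """Convert a raw metric score to a z-score against a normative sample."""
    return (raw - norm_mean) / norm_sd

def band(z, cutoff=0.5):
    """Label a z-score as positive, neutral, or negative (assumed cut-off)."""
    if z > cutoff:
        return "positive"
    if z < -cutoff:
        return "negative"
    return "neutral"

# Assumed normative database for "emotional valence": mean 50, SD 10
for ad, raw in [("ad_A", 62), ("ad_B", 51), ("ad_C", 38)]:
    z = normalize(raw, norm_mean=50, norm_sd=10)
    print(ad, f"z={z:+.1f}", band(z))
```

With this kind of read-out, the comparison of two advertisements carries the extra information the text calls for: not only which scored higher, but whether each score is itself positive, neutral, or negative against the norm.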
Second, the establishment of industry benchmarks for advertisements also can be highly relevant; perhaps advertisements for fast-moving consumer goods, in general, produce much lower scores on cognitive demand than insurance advertisements do. Similarly, benchmarks for types of products (e.g., according to the divisions by Rossiter, Percy, and Donovan, 1991) and for advertising platforms (e.g., mobile, print, desktop) should be considered. These and other, similar efforts to standardize metrics will very likely not only increase the scientific validity and reliability of these new metrics but also foster better usage and more trust in them.
CONCLUSION
For many years, the promise of neuromarketing and consumer neuroscience has been touted as unparalleled access to consumers' subconscious minds. Despite these assertions, however, there instead has been a fragmentation of academic research, a generally subpar publication level, industrial overpromising and underdelivering, and a generally nonexistent validation of the metrics that are offered. The current article suggests three necessary first steps to ensuring that neuromarketing and consumer neuroscience can become a valid, coherent field of inquiry.
First, making a distinction among basic, translational, and applied research will allow researchers to better navigate the different types of insight and how they can be used for inspiration and for application.
Second, researchers need to clear up the conceptual confusion that litters this field. A full nomenclature of consumer neuroscience and neuromarketing is needed, and proper definitions should be established and followed by the industry.
Finally, researchers need to have a rigorous means of ensuring the validity of neurometric approaches and measures. Independent research would be the gold standard, although other standards are also acceptable, such as publication in esteemed peer-reviewed science journals. The Advertising Research Foundation is continuing efforts to raise standards and quality in all research, including neuroscience.
These three steps in no way are sufficient to create a valid neuromarketing and consumer neuroscience discipline. They may serve as a foundation and even a roadmap, but more steps should be considered. Organizational leverage should be applied through bodies such as the ARF, the NMSBA, ESOMAR, and related organizations. Further publication efforts should be made through special issues, such as the one in which this article is embedded; through traditional journal publications; and through independent journal and publication initiatives. Researchers and practitioners alike should strive to lift the quality of accepted papers to the highest available publication standards. Ensuring the validity of neuroscience measures should be a regular topic at conference sessions.
The current state of neuromarketing and consumer neuroscience is far from where it is intended and has been promised to be. A recent business report suggested that total market interest in consumer neuroscience, neuromarketing, and nonconscious assessment methods is about 80 percent and thus has “reached a tipping point and is now being embraced by the majority of the industry using one or more of the [non-conscious] key methods available” (GRIT, 2017, p. 23). This suggests that consumer neuroscience and neuromarketing are on the cusp of going big.
If the issues mentioned in this article persist, however, one can assume that such development will stall and possibly reverse, because of an increase in confusion and distrust. To avoid this, this author contends that collaborative efforts are needed to reduce conceptual confusion and increase the validity of neuromarketing and consumer neuroscience. The upside of this will be easier access for newcomers to the field, improved translation of insights from basic research, a better clarification of what can be delivered with neuroscience-based metrics, and a higher degree of transparency for buyers and vendors of this technology.
ABOUT THE AUTHOR
Thomas Zoëga Ramsøy is the founder of Neurons, Inc., a consumer neuroscience company, through which he consults leading corporations on how to apply neuroscience to better understand consumer psychology and behavior. In 2007, he established the Center for Decision Neuroscience, a collaboration between Copenhagen Business School and Copenhagen University Hospital. Ramsøy is also a member of the Faculty of Neuroscience at Singularity University, Santa Clara, CA.
- Received July 30, 2018.
- Received (in revised form) December 28, 2018.
- Accepted May 27, 2018.
- Copyright© 2019 ARF. All rights reserved.