Validity and Test Development

Order Description

A review paper describes many published research studies on a particular topic. For this project, you will be critiquing an article that is a review paper (Mitchell, 2012) related to the external validity of psychological research. External validity refers to the generalizability of a study's results and how well those results will hold up outside a laboratory setting, in the real world. Complete a three- to four-page paper that summarizes the main points and findings. Describe it in a scholarly manner (using critical thinking), but also in a way that someone from outside the area can understand. Be sure to include the reference of the paper in your submission.

Mitchell, G. (2012). Revisiting truth or triviality: The external validity of research in the psychological laboratory. Perspectives on Psychological Science, 7(2), 109–117. doi:10.1177/1745691611432343

Revisiting Truth or Triviality: The External Validity of Research in the Psychological Laboratory

Gregory Mitchell, University of Virginia

Abstract

Anderson, Lindsay, and Bushman (1999) compared effect sizes from laboratory and field studies of 38 research topics compiled in 21 meta-analyses and concluded that psychological laboratories produced externally valid results. A replication and extension of Anderson et al. (1999) using 217 lab-field comparisons from 82 meta-analyses found that the external validity of laboratory research differed considerably by psychological subfield, research topic, and effect size. Laboratory results from industrial-organizational psychology most reliably predicted field results, effects found in social psychology laboratories most frequently changed signs in the field (from positive to negative or vice versa), and large laboratory effects were more reliably replicated in the field than medium and small laboratory effects.

Keywords: external validity, generalizability, meta-analysis, effect size

A widely held assumption within the social sciences is that the rigor of experimental research is purchased at the price of generalizability of results (Black, 1955; Locke, 1986; Wilson, Aronson, & Carlsmith, 2010). This trade-off plays out most directly in those fields that use laboratory experiments to study how humans navigate complex social environments, such as in social and industrial-organizational (I-O) psychology. In these fields, highly controlled experiments produce internally valid findings with suspect external validity (e.g., Flowe, Finklea, & Ebbesen, 2009; Greenwood, 2004; Harré & Secord, 1972).

Researchers typically respond to external validity suspicions in one of three ways: by arguing that findings from even highly artificial laboratory studies advance theories that explain behavior outside the laboratory (e.g., Mook, 1983; Wilson et al., 2010), by conducting field studies that demonstrate that causal relations observed in the laboratory hold in the field (e.g., Behrman & Davey, 2001), or by conducting a meta-analysis of laboratory and field studies to assess the impact of research setting on results within a particular area of research (e.g., Avolio, Reichard, Hannah, Walumbwa, & Chan, 2009). Anderson, Lindsay, and Bushman (1999) offered a novel and broad response to the external validity question by comparing 38 pairs of effect sizes from laboratory and field studies of various psychological phenomena as compiled in 21 meta-analyses (i.e., each meta-analysis compared the mean effect size found in the laboratory to that found in the field for the particular phenomenon under investigation).[1] Anderson and colleagues found a high correlation between these meta-analyzed laboratory and field effects (r = .73), leading them to conclude that "the psychological laboratory is doing quite well in terms of external validity; it has been discovering truth, not triviality" (Anderson et al., 1999, p. 8).

Anderson et al. (1999) has been widely cited (as of this writing, 150 times in PsycINFO), often for the proposition that psychological laboratory research in general possesses external validity and, thus, that a new laboratory finding being reported is likely to generalize (e.g., Ellis, Humphrey, Conlon, & Tinsley, 2006; von Wittich & Antonakis, 2011; West, Patera, & Carsten, 2009).
This proposition, and its use to allay external validity concerns about new laboratory findings, assumes the external validity of Anderson and colleagues' conclusion about the external validity of laboratory studies. However, Anderson and colleagues' conclusion was based on a fairly small number of paired effect sizes that show considerable variation despite the strong overall correlation between laboratory and field results. For instance, their six comparisons of laboratory and field effect sizes from meta-analyses of gender differences in behavior reached inconsistent results (r = -.03). Furthermore, their correlational result indicated the direction and magnitude of the relationship, but not the magnitude of differences in effect sizes between the laboratory and the field (i.e., the rank ordering of effects could be quite consistent despite large differences in effect size between the lab and field). Because the small sample examined by Anderson and his colleagues limited the analyses that could be performed and the conclusions that could be drawn from their study, a replication and extension of Anderson et al. (1999) was undertaken to examine the external validity of psychological laboratory research after 10 years, using a larger database of effect sizes covering a wider range of psychological phenomena. This larger data set permitted a more detailed examination of external validity by psychological subfield and area of research.[2]

The goal of my study, therefore, was to replicate Anderson et al.'s (1999) study using a larger data set to determine whether their broad positive conclusion about the external validity of laboratory research remains defensible or whether there are identifiable patterns of external validity variation. This study, like Anderson and colleagues' study, is focused on whether laboratory and field results agree and thus employs a coarse distinction between research settings, comparing results obtained under laboratory conditions to those found in the field or under more mundanely realistic conditions.
To the extent that variation between the laboratory and field is observed, a more detailed inquiry is called for, because many different design variables could account for the variation: differences in participant characteristics between lab and field studies and across cultures (Henrich, Heine, & Norenzayan, 2010; Henry, 2009); differences in guiding design principles, such as the use of "mundane realism" versus "psychological realism" (Aronson, Wilson, & Akert, 1994, p. 58) versus representative sampling of stimuli to develop participant tasks, environments, and measures (Dhami, Hertwig, & Hoffrage, 2004); or differences in the timing of the research that may be related to larger societal or historical changes (Cook, 2001). Also, there may be fundamental differences in the generalizability of the processes or phenomena studied across psychological subfields: Some phenomena at some levels of analysis may not vary with the characteristics of the individual and situation, some phenomena may be unique to particular laboratory designs using particular types of participants (i.e., some phenomena may be created in the laboratory rather than be brought into the laboratory for study), and some phenomena may generalize across a narrow range of persons and situations.

In short, examining the consistency of meta-analytic estimates of effects across research settings provides a good first test of the generalizability of laboratory results, but the limits of this approach must be acknowledged. The inferences to be drawn from positive results are limited by the diversity of the participant and situation samples found in the synthesized studies, and negative results call for deeper inquiry into the causes of external invalidity. The meta-analytic data examined here cover a wide range of psychological topics, research settings, and participants. Therefore, if results based on this data set approximate those found by Anderson et al. (1999), then we should have greater confidence in their conclusion that psychological laboratories reveal truths rather than trivialities. If results based on this larger data set differ, then the task will be to understand why some laboratory results generalize while others do not.

Meta-Analytic Data on Effects Studied in the Laboratory and the Field

An effort was made to identify all meta-analyses that synthesized research on some aspect of human psychology conducted in a laboratory setting and in an alternative research setting (see the Appendix for details on the literature search). In keeping with the approach taken by Anderson et al. (1999), comparisons were not limited strictly to laboratory versus field research on the same topic but also included comparisons of results found under less and more mundanely realistic conditions (e.g., the use of experimentally created versus real groups in the study of group behavior and the use of hypothetical versus real transgressions in the study of forgiveness).
A review of over 1,100 papers located in the literature search identified 82 meta-analyses reporting effect sizes for at least two research settings, for a total of 217 comparisons of results found under laboratory, or less realistic, conditions to results found under field, or more realistic, conditions (including two dissertations that contributed six lab-field comparisons).[3] The full data set is provided in an online supplement.

Most meta-analyses reported effect sizes in terms of r. When an effect size was reported in a unit other than r, the effect size was converted to r using standard conversion formulas (Cohen, 1988; Rosenthal, 1994). When both weighted and unweighted effect sizes were reported, the weighted effect sizes were used in the analyses reported here.
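To make the conversion step concrete, here is a minimal sketch of one such standard formula, the conversion of Cohen's d to r, assuming two groups of equal size (Cohen, 1988). The paper does not say which specific formulas were applied to which entries, so treat this as an illustration of the idea rather than the author's exact procedure.

import math

def d_to_r(d: float) -> float:
    """Convert Cohen's d to a correlation r, assuming equal group sizes.
    General form: r = d / sqrt(d^2 + 1/(p*q)), with p and q the group
    proportions; with p = q = .5 this reduces to r = d / sqrt(d^2 + 4)."""
    return d / math.sqrt(d ** 2 + 4)

# Example: a medium standardized mean difference of d = 0.50
# corresponds to roughly r = .24.
print(round(d_to_r(0.50), 2))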
Four of the meta-analyses compared two types of laboratory studies with one or more types of field studies, and 17 of the meta-analyses compared two or more types of field studies with a single type of laboratory study (see the online supplement for details). The results discussed below focus on the comparison of laboratory effects with true field studies or with conditions that differ most from the laboratory conditions, because these research settings possess the least "proximal similarity" (Cook, 1990) to the laboratory and thus are likely to raise the greatest generalizability concerns (e.g., McKay & Schare's, 1999, comparison of results found in a traditional laboratory to those found in the field serves as the focal comparison, rather than their comparison of a traditional lab to a "bar lab").[4]

In order to examine possible variation in generalizability across research domains, I classified the meta-analytic data in a number of ways: (a) by PsycINFO group codes that are used to classify studies by primary subject matter (for more information on this classification system, see http://www.apa.org/pubs/databases/training/class-codes.aspx), (b) by psychological subfield as classified by the present author before knowing the PsycINFO classifications of the meta-analyses, (c) by psychological subfield of the meta-analysis's first author as determined by the affiliation disclosed in the meta-analysis or from information available on the Web if the first author's subfield affiliation was not apparent from the meta-analysis, and (d) by research topics according to PsycINFO subgroup codes and classification by the present author. Results using the PsycINFO classifications are emphasized because those classifications were made by independent coders, show consistency over time, and cover more of the data than some alternative classifications.[5]

Consistency and Variation in Effects in the Laboratory and Field

Aggregate results

A plot of the data reveals considerable correspondence in paired laboratory and field effects (see Fig. 1). When one potential outlier is removed, the overall correlation between lab and field effects in this expanded sample approximates that found in Anderson et al.'s (1999) sample: r = .71 versus the r = .73 reported by Anderson and colleagues (see Table 1 for the full correlation matrix).[6]

As a measure of the reliability of the direction of effects found in the laboratory, the number of times a laboratory effect changed its sign in the field (from positive to negative or vice versa) was counted: Overall, 30 of 215 laboratory effects changed signs (14%).[7] Thus, a nontrivial number of effects observed in the laboratory produced opposite effects in the field. With respect to the relative magnitude of effects, the mean difference between laboratory and field effects was only .01, but this difference had a standard deviation of .18 on a scale in which the average laboratory and field effects were both r = .17.

[Fig. 1. Scatter plot of paired lab and field effects across all meta-analyses; fitted line y = .639x + .062.]

Table 1. Correlation of Lab-Field Effects

                  Lab             Lab2            Field           Field2
Lab2 (n = 216)    .99 [.99, .99]
Field (n = 216)   .71 [.64, .77]  .70 [.63, .76]
Field2 (n = 42)   .68 [.48, .82]  .69 [.49, .82]  .57 [.32, .74]
Field3 (n = 21)   .49 [.07, .76]  .49 [.07, .76]  .63 [.27, .83]  .43 [.00, .73]

Note: "Lab" represents the collection of primary lab results; "Lab2" substitutes the second lab result for the primary lab result from the four meta-analyses that examined two types of lab studies. "Field" represents the collection of primary field results; "Field2" and "Field3" represent field studies from meta-analyses examining two or three different types of field studies. Sample sizes reflect the number of paired effect sizes. Brackets present 95% confidence intervals. Results exclude the possible outlier paired effects from Mullen et al. (1991).
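As a rough illustration of the aggregate statistics reported above (the lab-field correlation, the bracketed 95% confidence intervals, the sign-change count, and the mean lab-field difference), here is a minimal sketch in Python. The variable names and the toy data are hypothetical; the actual paired effects are available only in the paper's online supplement.

import math

def pearson_r(xs, ys):
    """Pearson correlation between paired lab and field mean effect sizes."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def fisher_ci(r, n, z_crit=1.96):
    """Approximate 95% CI for r via the Fisher z transformation."""
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

# Hypothetical paired mean effects (lab, field) from several meta-analyses.
lab =   [0.10, 0.25, -0.05, 0.40, 0.15]
field = [0.12, 0.20,  0.04, 0.35, 0.10]

r = pearson_r(lab, field)
lo, hi = fisher_ci(r, len(lab))
# Pairs in which either effect is exactly zero are not counted as sign
# changes by this test, mirroring the exclusion described in Note 7.
sign_changes = sum(1 for l, f in zip(lab, field) if l * f < 0)
mean_diff = sum(l - f for l, f in zip(lab, field)) / len(lab)
print(f"r = {r:.2f}, 95% CI [{lo:.2f}, {hi:.2f}], "
      f"sign changes = {sign_changes}, mean lab-field diff = {mean_diff:.2f}")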
Results by subfield

It is possible that the dispersion seen in Figure 1 is random across research topics and domains, or it may be that the aggregate results mask systematic differences in lab-field correspondence. To examine possible differences in lab-field correspondence across traditional divisions of psychological inquiry, the paired effects were divided by two alternative subfield classifications: first by the subfield into which PsycINFO classified each meta-analysis, and second by the subfield into which I classified each lab-field comparison (see Table 2). Subfield assignments and results converged under the two approaches to classification, indicating that there was meaning and consistency to the partitioning of the research by psychological subfield.

The two subfields with the greatest number of paired effects, I-O psychology and social psychology, differed considerably in the degree of correspondence between the lab and the field. Laboratory and field effects from I-O psychology correlate very highly (r = .89, n = 72, 95% CI [.83, .93]), whereas laboratory and field effects from social psychology show a lower correlation (r = .53, n = 80, 95% CI [.35, .67]).[8] A similar result holds if we partition effects by the subfield affiliation of the first author of each meta-analysis: The lab-field correlation from meta-analyses conducted by I-O authors is .82 (n = 107, 95% CI [.75, .87]), whereas the lab-field correlation from meta-analyses conducted by social psychology authors is .53 (n = 76, 95% CI [.35, .67]).[9]

A plot of paired lab and field effects for I-O psychology and social psychology illustrates the greater convergence of lab and field results within I-O psychology: The slope of the fitted line is steeper for I-O psychology, with I-O lab effects thus being better predictors of field effects (see Fig. 2).[10] Also, the paired effects from I-O psychology differed less in their magnitude, as the distribution around zero difference is steeper for I-O psychology than for social psychology (kurtosis = 2.318 for I-O vs. -.03 for social). For comparison purposes, a boxplot of the differences in effect size between the laboratory and field across all subfields is provided in Figure 3.

Furthermore, most of the 30 laboratory effects that changed signs in the field came from social psychology. Twenty-one of 80 (26.3%) laboratory effects from social psychology changed signs between research settings, but only 2 of 71 (2.8%) laboratory effects from I-O psychology changed signs; as an additional reference point, only 1 of 22 (4.5%) laboratory effects from personality psychology changed signs, χ²(2) = 19.12, p < .001.[11]

Table 2. Correlation of Lab-Field Effects by Subfield Classifications

PsycINFO classification (n)               r        Author's classification (n)    r
Social (80)                              .53       Social (79)                   .60
I-O (72)                                 .89       I-O (98)                      .82
Personality (22)                         .83       Clinical (19)                 .84
Consumer (7)                             .59       Marketing (7)                 .59
Education (7)                            .71       Education (5)                 .87
Developmental (3)                       -.82       Developmental (6)            -.88
Psychometrics/Statistics/Methods (19)    .61
Human Experimental (5)                   .61

Note: Sample sizes reflect the number of paired effect sizes. The PsycINFO classification excludes one pair of effects classified as "Environmental Psychology," and the author classification excludes two pairs of effects classified as "Health Psychology." Results exclude possible outlier effects from Mullen et al. (1991).

[Fig. 2. Scatter plots of paired lab and field effects from social and I-O psychology; fitted lines y = .522x + .087 (social) and y = .819x + .02 (I-O).]

Results by effect size

A partial explanation for the relatively weaker external validity of social psychology laboratory results appears to be a disproportionate focus on small effect sizes. Using Cohen's rule of thumb to categorize laboratory effect sizes, meta-analyses within I-O psychology examined 29 small, 22 medium, and 21 large laboratory effects, and meta-analyses within social psychology examined 53 small, 20 medium, and 8 large laboratory effects.[12] Small laboratory effects studied by social psychologists varied more in the field than medium effects from social psychology labs: r = .30 for small effects (n = 53, 95% CI [.03, .53]) vs. r = .57 for medium effects (n = 20, 95% CI [.17, .81]).[13] Small laboratory effects from I-O psychology likewise varied more in the field than larger effects: r = .53 for small effects (n = 29, 95% CI [.20, .75]) vs. r = .84 for medium effects (n = 22, 95% CI [.65, .93]) vs. r = .90 for large effects (n = 21, 95% CI [.77, .96]). This trend held across all studies, r = .47 for small effects (n = 112, 95% CI [.31, .60]) vs. r = .56 for medium effects (n = 66, 95% CI [.37, .71]) vs. r = .83 for large effects (n = 38, 95% CI [.70, .91]), and small laboratory effects more frequently changed signs in the field than medium and large effects (22.7% vs. 6.1% vs. 2.6%, respectively).
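The size categories above follow the cutoffs given in Note 12 (absolute r of .20 or less is small, .201 to .40 is medium, .401 or greater is large). A minimal sketch of that binning, with hypothetical inputs:

def size_category(r: float) -> str:
    """Bin a lab effect size per the cutoffs in Note 12, which are based
    on Cohen's (1988) rule of thumb: |r| <= .20 small, .201 to .40 medium,
    .401 or greater large."""
    magnitude = abs(r)
    if magnitude <= 0.20:
        return "small"
    if magnitude <= 0.40:
        return "medium"
    return "large"

# Hypothetical lab effects; in the paper, paired comparisons are grouped
# by the lab effect's bin before correlating lab with field within bins.
for r in (0.08, -0.15, 0.25, 0.38, 0.45, 0.62):
    print(f"{r:+.2f} -> {size_category(r)}")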
Results by research topic

Lab-field correlations for specific areas of research (e.g., aggression studies, leadership studies) with at least nine meta-analytic comparisons of laboratory and field effects were examined. These results should be interpreted cautiously, because they are more sensitive to extreme values given the smaller number of comparisons, but they do converge with the subfield results: Topics of primary interest to I-O psychologists showed the highest correlations, and topics of primary interest to social psychologists showed greater variation (see Table 3).

However, these results also illustrate the hazard of assuming that aggregate correlations of lab-field effects are representative of the external validity of all laboratory research within a subfield. There were large differences in the relative magnitude of laboratory and field results across research topics (see the standard deviations in mean effect size differences in Table 3) and in the magnitude of the correlations. For instance, although results from I-O laboratories tended to be good predictors of field results, I-O laboratory studies of performance evaluations were less predictive than I-O laboratory studies of other topics, and leadership studies within I-O psychology were less predictive than leadership studies within social psychology (r = .63 for 10 paired laboratory and field effects from leadership meta-analyses conducted by I-O-affiliated authors vs. r = .93 for 7 paired effects from leadership meta-analyses conducted by social-affiliated authors). Laboratory studies of gender differences fared particularly poorly compared with other types of social psychological research, which may be due to the small effect sizes found in these studies.[14]

Table 3. Correlation of Lab-Field Effects and Standard Deviations of Effect Size Differences by Research Topic Classifications

Classification (n)                                                  r     SD
PsycINFO classification
  Group Processes & Interpersonal Processes (33)                   .58   .18
  Social Perception & Cognition (9)                                .53   .17
  Personality Traits & Processes (20)                              .83   .13
  Behavior Disorders & Antisocial Behavior [aggression] (14)       .68   .14
  Personnel Management & Selection & Training (14)                 .92   .12
  Personnel Evaluation & Job Performance (21)                      .74   .16
  Organizational Behavior (18)                                     .97   .09
Author classification
  Aggression-focused comparisons (17)                              .63   .13
  Gender-focused comparisons (22)                                  .28   .13
  Group-focused comparisons (43)                                   .63   .19
  Leader-focused comparisons (18)                                  .69   .21

Note: Sample sizes reflect the number of paired effect sizes. Results exclude possible outlier effects from Mullen et al. (1991).

[Fig. 3. Boxplot of differences between lab and field effect sizes (lab effect minus field effect) by subfield.]

Discussion

This expanded comparison of laboratory and field effects replicated Anderson and colleagues' (1999) basic result, but it also raises questions about treating the external validity of psychological laboratory research as an undifferentiated whole: In the aggregate, laboratory and field effect sizes tended to covary (r = .71 vs. Anderson et al.'s r = .73, if we exclude a potential outlier from social psychology), but this result depended on the extremely high correlation of laboratory and field effects from I-O psychology. If we exclude I-O effects, the aggregate correlation drops considerably (to r = .55).

External validity differed across psychological subfields and across research topics within each subfield, and all subfields showed considerable variation in the relative size of effects found in the laboratory versus the field. External validity also differed by effect size: Small laboratory effects were less likely to replicate in the field than larger effects. This latter result empirically demonstrates the importance of considering effect size when planning a field test, not only to determine sample size but also to determine the sensitivity with which measurements should be made and the type of research design needed to isolate the influence of the variables of interest (Cohen, 1988).
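One way to make the sample-size point concrete is the standard Fisher-z approximation for the n needed to detect a correlation of a given size. This is a generic power calculation, not a procedure from the paper, and the alpha and power values below are illustrative assumptions.

import math

def n_to_detect_r(r: float, z_alpha: float = 1.96, z_beta: float = 0.84) -> int:
    """Approximate sample size needed to detect correlation r with
    two-tailed alpha = .05 (z = 1.96) and power = .80 (z = 0.84),
    using the Fisher z transformation: n = ((z_a + z_b) / atanh(r))^2 + 3."""
    return math.ceil(((z_alpha + z_beta) / math.atanh(r)) ** 2 + 3)

# A small lab effect (r = .10) needs far more field participants
# to detect than a large one (r = .50): roughly 782 vs. 29 here.
for r in (0.10, 0.30, 0.50):
    print(f"r = {r:.2f}: n needed ~ {n_to_detect_r(r)}")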
Despite the variations in generalizability observed, it is tempting to invoke Cohen's effect size rule of thumb and conclude that all of psychology is performing well in terms of external validity because all subfields showed large lab-field correlations, but doing so would ignore Cohen's (1988) injunction that "the size of an effect can only be appraised in the context of the substantive issues involved" (p. 534). For an investigator considering whether to pursue a new line of research building on prior work, even small lab-field correlations may be sufficient to proceed. For an organization or government agency considering whether to implement a program based on psychological research, even large lab-field correlations may be insufficient, particularly if the costs of implementation are high relative to the likely benefits. To determine likely benefits, the constancy of effect direction and the relative magnitude of the effect in the lab versus that found in the field should be considered, but aggregate correlations between lab and field effects do not provide this information.

Reliance on a subfield's "external validity effect size" could be particularly misleading for results from social psychology, where more than 20% of the laboratory effects changed signs between research settings. Shadish, Cook, and Campbell (2002) emphasize constancy of causal direction over constancy of effect size in their discussion of external validity, on the grounds that constancy of relations among variables is more important to theory development and the success of applications. The number of sign reversals observed across domains should be cause for concern among those seeking to extend any psychological result to a new setting before any cross-validation work has occurred.

Whether these sign reversals should be cause for concern in any particular case depends on the goals of the research. Mook (1983) correctly noted that some studies require external invalidity to test a prediction or determine what is possible. In such studies, what matters is whether the study helps advance a theory, not whether a specific finding will generalize. But Mook (1983) also noted that "[u]ltimately, what makes research findings of interest is that they help us understand everyday life" (p. 386). Psychologists often examine minimal, manageable interventions to open a window on psychological processes and causal relations among variables (Prentice & Miller, 1992), and that approach is justifiable if it ultimately produces theories that explain and predict behavior outside the laboratory. Small effects found in the lab can be important, and large effects found in the lab can be unimportant (Cortina & Landis, 2009); whichever is the case must eventually be established in the field.

Conclusion

My results qualify the conclusion reached by Anderson et al. (1999): Many psychological results found in the laboratory can be replicated in the field, but the effects often differ greatly in their size and less often (though still with disappointing frequency) differ in their directions.
The pattern of results suggests that there are systematic differences in the reliability of laboratory results across subfields, research topics, and effect sizes, but the reliability of these patterns depends on the representativeness of the laboratory studies synthesized in the meta-analyses that provided the data for this study.

Also, it is possible that alternative divisions of the data would yield different patterns. The data divisions that were chosen reflect two ideas: (a) different subfields develop and teach unique research design customs and norms (see, e.g., Rozin, 2001), and (b) different research topics require different compromises to enable their study in the laboratory (e.g., prejudice and stereotyping research in the laboratory must often use simulated work situations, whereas research into the accuracy of impressions based on thin slices of behavior may be well suited for laboratory study;[15] Secord, 1982). Determining the mix of factors responsible for the observed variations in external validity will require further research.

A good starting place for such further inquiry is I-O psychology. Results from I-O labs varied in their generalizability, but the high degree of convergence in I-O effects across research settings indicates that something about this subfield's practices or research topics tends to produce externally valid laboratory research. It may be that I-O psychologists' traditional skepticism of laboratory studies (Stone-Romero, 2002) is adaptive: In a culture that trusts well-done laboratory studies, internal validity challenges will likely command the researcher's (and journal editor's) attention, whereas in a culture that distrusts even well-done laboratory studies, external validity challenges may grab much more of the researcher's (and editor's) attention.[16] It may be that the topics I-O psychologists study are more amenable to laboratory study than those studied by social psychologists, but that seems unlikely given the focus in both subfields on behavior in complex social settings. It may be that I-O psychologists, as primarily applied researchers, benefit from the trial and error of basic researchers in other subfields and are able to devote their attention to robust results. If the explanations all reduce down to the applied focus of I-O psychology, then the external and internal validity of research within the basic research subfields could benefit from greater attention to applications, for replication in the field reduces the chances that relations observed in the laboratory were spurious (Anderson et al., 1999).
Anderson et al. (1999) presented a positive message about the generalizability of psychological laboratory research, but the message here is mixed. We should recognize those domains of research that produce externally valid research, and we should learn from those domains to improve the generalizability of laboratory research in other domains. Applied lessons are often drawn from laboratory research before any cross-validation work has occurred, yet many small effects from the laboratory will turn out to be unreliable, and a surprising number of laboratory findings may turn out to be affirmatively misleading about the nature of relations among variables outside the laboratory.

Appendix

Literature Search

Several exhaustive searches were employed in an effort to locate all meta-analyses of psychological studies in which mean effect sizes in the laboratory and field were computed. First, the EBSCO social science database (which included all psychology journals indexed in the PsycINFO database as well as business, communications, education, health, political science, and sociology journals) and the SAGE psychology database were searched for items with abstracts containing one or more terms from each of the following three sets of terms: (a) meta-analysis, meta analysis, research synthesis, systematic review, systematic analysis, integrative review, or quantitative review; (b) lab, laboratory, artificial, experiment, simulation, or simulated; and (c) field, quasi-experiment, quasi-experimental, real, realistic, real world, or naturalistic. This search was repeated in the PsycINFO database, but with the terms allowed to appear in any search field. Another PsycINFO search was conducted for any term from the first list of terms above in the keywords or methodology field and the term research setting in any field. These searches produced over 1,100 hits, and the abstracts of all hits were reviewed to eliminate obviously inapplicable materials (e.g., articles focused on research methodology that did not report meta-analytic findings and single studies making reference to meta-analyses of laboratory and field studies) before the texts of hits were examined in detail.

To ensure that the search terms employed in the searches described above did not exclude relevant articles, an additional search was performed in the following journals for any articles containing the term meta-analysis: Academy of Management Journal, Academy of Management Review, American Psychologist, Journal of Experimental Social Psychology, Personnel Psychology, Psychological Bulletin, Journal of Applied Psychology, Journal of Social and Personality Psychology, Personality and Social Psychology Bulletin, and any additional journal within the EBSCO database with applied, cognition, cognitive psychology, or decision in its publication name.[17] Finally, the reference sections of Richard, Bond, and Stokes-Zoota (2003) and Dieckmann, Malle, and Bodner (2009) and the chapters in Locke (1986) were reviewed for candidates for possible inclusion.
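The search strategy is an AND of three OR-groups of terms: A hit must match at least one term from each set. As a generic illustration of that structure (real EBSCO and PsycINFO query syntax differs, so this is only a sketch, not the queries actually run):

# Compose a boolean abstract query from the three term sets in the Appendix.
TERM_SETS = [
    ["meta-analysis", "meta analysis", "research synthesis", "systematic review",
     "systematic analysis", "integrative review", "quantitative review"],
    ["lab", "laboratory", "artificial", "experiment", "simulation", "simulated"],
    ["field", "quasi-experiment", "quasi-experimental", "real", "realistic",
     "real world", "naturalistic"],
]

def build_query(term_sets: list[list[str]]) -> str:
    """Join each set with OR, then join the parenthesized sets with AND."""
    groups = ["(" + " OR ".join(f'"{t}"' for t in terms) + ")" for terms in term_sets]
    return " AND ".join(groups)

print(build_query(TERM_SETS))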
The online supplement, which is provided as a downloadable spreadsheet at http://pps.sagepub.com/supplemental-data, lists the meta-analyses included; the research question(s) addressed for each lab-field comparison; and the meta-analytic results for each research setting that was compared, including the number of effects and sample size included in each meta-analytic comparison where this information was reported and the mean effect size associated with each research setting. The supplement also indicates the subfield of psychology into which each meta-analysis was classified by PsycINFO, independently by the present author, and by psychological subfield of the meta-analysis's first author.

Acknowledgments

Hart Blanton, John Monahan, and Fred Oswald provided helpful comments.

Declaration of Conflicting Interests

The author declared that he had no conflicts of interest with respect to his authorship or the publication of this article.

Notes

1. It is more accurate to say that Anderson, Lindsay, and Bushman (1999) primarily compared effects in the lab with those in the field; they did not strictly limit their comparisons to lab versus field studies but also compared findings for real versus artificial groups and for real versus hypothetical events.

2. Proctor and Capaldi (2001) called for an extension of Anderson et al. (1999) to include more research domains, but no such extension has previously been reported.

3. There are a few meta-analyses that examined effects under different research settings, but they could not be included because they did not report effect size information for each of the settings (e.g., Frattaroli, 2006).

4. The results also include those meta-analyses that had some overlap in coverage (these overlapping meta-analyses are identified in the notes to the online supplement). None of the results differ greatly if the earlier of the overlapping meta-analyses are excluded (e.g., the aggregate lab-field r = .64 with overlapping studies both included and excluded).

5. For instance, a journal-based approach to classifying research by subfields (e.g., comparing traditional social to I-O journals) leads to a loss of data because several meta-analyses from different subfields were published in Psychological Bulletin. Nevertheless, every alternative classification of the effects examined produced results similar to those reported here, including classification of the effects by journal subfield.

6. One set of paired effects from Mullen et al. (1991), comparing the effect of interpersonal distance on permeability of group boundaries in imaginary and real groups, showed an extreme disparity between lab and field results (see the lower right quadrant of Fig. 1). Accordingly, the results reported in the text do not include this pair of effects. With Mullen et al. included in the analysis, the overall r = .64.

7. This count excluded two comparisons (one from social and one from I-O) in which one of the paired effect sizes equaled zero.

8. Mullen et al. (1991) fell within the domain of social psychology; with Mullen et al. included in this analysis, the correlation for social psychology drops to r = .29 (n = 81, 95% CI [.08, .48]).

9. The first author of Mullen et al. (1991) was a social psychologist; with Mullen et al. included in this analysis, the correlation for social psychology drops to r = .27 (n = 77, 95% CI [.05, .47]).

10. When Mullen et al. (1991) is included in the social psychology effects, y = .325x + .098.

11. Four of 19 paired effects within PsycINFO's "Psychometrics & Statistics & Methodology" classification changed signs (21%), but meta-analyses in this method-focused classification implicated subject matter from other subfields (the four sign reversals within this classification involved the impact of test expectancies on multiple-choice tests, the relation of two different aspects of leader styles to work performance, and the impact of question wording on causal attributions for success and failure).
Using my subfield classifications, which distributed these 19 studies into other subject matter subfields, 18 of 80 (23%) social psychology comparisons, 8 of 96 (8%) I-O psychology comparisons, and 1 of 19 (5%) clinical psychology comparisons produced sign changes, χ²(2) = 8.64, p = .013.

12. Lab effect sizes were categorized based on Cohen's (1988) rule of thumb for the size of correlation coefficients (small r = .10, medium r = .30, and large r = .50) using the following ranges: Small effects are absolute effect sizes of .20 or less, medium effects are absolute effect sizes from .201 to .40, and large effects are absolute effect sizes of .401 or greater.

13. Only eight large laboratory effect sizes were found for social psychology, one of which was the possible outlier; the lab-field correlation based on the remaining seven large effects from social psychology laboratories (r = -.13) is thus susceptible to considerable influence by new results.

14. With gender studies excluded, the lab-field correlation increases slightly for social psychology (from r = .53 to r = .56) and does not change for I-O psychology (r = .89).

15. Suitability for study in the lab does not ensure generalizability; many factors on the design side will also come into play (Dhami, Hertwig, & Hoffrage, 2004; Hammond, Hamm, & Grassia, 1986).

16. Attempts to pre-empt external validity challenges may explain why laboratory studies of aggression by social psychologists performed better in the field than some other areas of social psychological research. Aggression researchers have long faced skepticism about their work's applied implications (Berkowitz & Donnerstein, 1982); indeed, such skepticism seems to have been part of the reason for the study by Anderson et al. (1999).

17. Only post-1998 issues of Psychological Bulletin, Journal of Applied Psychology, Journal of Social and Personality Psychology, and Personality and Social Psychology Bulletin were searched to supplement the relevant articles found in pre-1999 issues of these journals by Anderson et al. (1999).

References

Anderson, C. A., Lindsay, J. J., & Bushman, B. J. (1999). Research in the psychological laboratory: Truth or triviality? Current Directions in Psychological Science, 8, 3–9.

Aronson, E., Wilson, T. D., & Akert, R. M. (1994). Social psychology: The heart and mind. New York, NY: Harper Collins.

Avolio, B. J., Reichard, R. J., Hannah, S. T., Walumbwa, F. O., & Chan, A. (2009). A meta-analytic review of leadership impact research: Experimental and quasi-experimental studies. Leadership Quarterly, 20, 764–784.

Behrman, B. W., & Davey, S. L. (2001). Eyewitness identification in actual criminal cases: An archival analysis. Law and Human Behavior, 25, 475–491.

Berkowitz, L., & Donnerstein, E. (1982). External validity is more than skin deep: Some answers to criticisms of laboratory experiments. American Psychologist, 37, 245–257.

Black, V. (1955). Laboratory versus field research in psychology and the social sciences. British Journal for the Philosophy of Science, 5, 319–330.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.

Cook, T. D. (1990). The generalization of causal connections: Multiple theories in search of clear practice. In L. Sechrest, E. Perrin, & J. Bunker (Eds.), Research methodology: Strengthening causal interpretations of nonexperimental data (DHHS Publication No. PHS 90-3454, pp. 9–31). Rockville, MD: U.S. Department of Health and Human Services.
Cook, T. D. (2001). Generalization: Conceptions in the social sciences. In N. J. Smelser, J. Wright, & P. B. Baltes (Eds.), International encyclopedia of the social and behavioral sciences (Vol. 9, pp. 6037–6043). Oxford, UK: Pergamon-Elsevier.

Cortina, J. M., & Landis, R. S. (2009). When small effect sizes tell a big story, and when large effect sizes don't. In C. E. Lance & R. J. Vandenberg (Eds.), Statistical and methodological myths and urban legends (pp. 287–308). New York, NY: Routledge.

Dhami, M. K., Hertwig, R., & Hoffrage, U. (2004). The role of representative design in an ecological approach to cognition. Psychological Bulletin, 130, 959–988.

Dieckmann, N. F., Malle, B. F., & Bodner, T. E. (2009). An empirical assessment of meta-analytic practice. Review of General Psychology, 13, 101–115.

Ellis, A. K. J., Humphrey, S. E., Conlon, D. E., & Tinsley, C. H. (2006). Improving customer reactions to electronic brokered ultimatums: The benefits of prior experience and explanations. Journal of Applied Social Psychology, 36, 2293–2324.

Flowe, H. D., Finklea, K. M., & Ebbesen, E. B. (2009). Limitations of expert psychology testimony on eyewitness identification. In B. L. Cutler (Ed.), Expert testimony on the psychology of eyewitness identification (pp. 201–221). New York, NY: Oxford University Press.

Frattaroli, J. (2006). Experimental disclosure and its moderators: A meta-analysis. Psychological Bulletin, 132, 823–865.

Greenwood, J. D. (2004). What happened to the "social" in social psychology? Journal for the Theory of Social Behaviour, 34, 19–34.

Hammond, K. R., Hamm, R. M., & Grassia, J. (1986). Generalizing over conditions by combining the multitrait-multimethod matrix and the representative design of experiments. Psychological Bulletin, 100, 257–269.

Harré, R., & Secord, P. F. (1972). The explanation of social behavior. Lanham, MD: Rowman & Littlefield.

Henrich, J., Heine, S. J., & Norenzayan, A. (2010). The weirdest people in the world? Behavioral and Brain Sciences, 33, 61–83.

Henry, P. J. (2009). College sophomores in the laboratory redux: Influences of a narrow data base on social psychology's view of the nature of prejudice. Psychological Inquiry, 19, 49–71.

Locke, E. A. (Ed.). (1986). Generalizing from laboratory to field settings. Lexington, MA: Lexington Books.

McKay, D., & Schare, M. L. (1999). The effects of alcohol and alcohol expectancies on subjective reports and physiological reactivity: A meta-analysis. Addictive Behaviors, 24, 633–647.

Mook, D. G. (1983). In defense of external invalidity. American Psychologist, 38, 379–387.

Mullen, B., Copper, C., Cox, P., Fraser, C., Hu, L., Meisler, A., . . . Symons, C. (1991). Boundaries around group interaction: A meta-analytic integration of the effects of group size. The Journal of Social Psychology, 131, 271–283.

Prentice, D. A., & Miller, D. T. (1992). When small effects are impressive. Psychological Bulletin, 112, 160–164.

Proctor, R. W., & Capaldi, E. J. (2001). Empirical evaluation and justification of methodologies in psychological science. Psychological Bulletin, 127, 759–772.

Richard, F. D., Bond, C. F., & Stokes-Zoota, J. J. (2003). One hundred years of social psychology quantitatively described. Review of General Psychology, 7, 331–363.

Rosenthal, R. (1994). Parametric measures of effect size. In H. Cooper & L. V. Hedges (Eds.), The handbook of research synthesis (pp. 231–244). New York, NY: Russell Sage Foundation.

Rozin, P. (2001). Social psychology and science: Some lessons from Solomon Asch. Personality and Social Psychology Review, 5, 2–14.
Secord, P. F. (1982). The behavior identity problem in generalizing from experiments. American Psychologist, 37, 1408.

Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Boston, MA: Houghton Mifflin.

Stone-Romero, E. F. (2002). The relative validity and usefulness of various empirical research designs. In S. G. Rogelberg (Ed.), Handbook of research methods in industrial and organizational psychology (pp. 77–98). Malden, MA: Blackwell.

von Wittich, D., & Antonakis, J. (2011). The KAI cognitive style inventory: Was it personality all along? Personality and Individual Differences, 50, 1044–1049.

West, B. J., Patera, J. L., & Carsten, M. K. (2009). Team level positivity: Investigating positive psychological capacities and team level outcomes. Journal of Organizational Behavior, 30, 249–267.

Wilson, T. D., Aronson, E., & Carlsmith, K. (2010). The art of laboratory experimentation. In S. T. Fiske, D. T. Gilbert, & G. Lindzey (Eds.), Handbook of social psychology (Vol. 1, pp. 51–81). Hoboken, NJ: Wiley.

