Predeliberation Juror Discussion Leads to Bias in Jury Deliberation

Discussion of trial evidence by jurors, prior to jury deliberation, can introduce a systematic bias in jury verdicts. This is the bottom line of a recently published article in Law and Human Behavior. Below is a summary of the research and findings as well as a translation of this research into practice.

Featured Article | Law and Human Behavior | 2018, Vol. 42, No. 5, 413-426

Should Jurors Be Allowed to Discuss Trial Evidence Before Deliberation?: New Research Evidence


Norbert L. Kerr, Michigan State University and Claremont Graduate University
Jiin Jung, Claremont Graduate University


Traditionally, jurors are not permitted to discuss trial evidence with one another prior to jury deliberation. Allowing such discussions, at least in civil trials, is a jury innovation that has become increasingly popular. Prior field research has generally supported the assumption that this innovation is benign and, in particular, introduces no systematic bias in jury verdicts. These issues are examined again here within an experimental jury simulation study. The opportunity for predeliberation juror discussion (PJD) between the plaintiff and defense cases-in-chief was manipulated. The results revealed that PJD biased jury verdicts. The nature of this bias was not, as commonly suspected, a commitment to evidence heard prior to PJD, but rather a greater weight placed on evidence heard following the PJD. One good explanation of this bias was that jurors acted as if evidence heard prior to PJD had “already been covered” during the PJD, and so primary attention was given to post-PJD evidence in jury deliberations. Little evidence was found to corroborate several other purported benefits or drawbacks of PJD.


jury, predeliberation discussion, jury innovation, bias, recency effect

Summary of the Research

“Traditionally, the first real opportunity that jurors have to discuss a trial’s evidence with one another occurs during jury deliberations at the end of the trial. Indeed, jurors are routinely instructed at the outset of the trial that they must not discuss the trial evidence with anyone—fellow jurors, friends, or even a spouse—before such deliberations begin. The primary reason for this prohibition is a concern that such discussions could lead jurors to make up their minds about the key issues in the case prematurely—that is, before they have heard all the evidence or been instructed on the law governing their verdicts” (p. 413).

“However, in the last few years a number of states (e.g., Arizona, Colorado, the District of Columbia, Indiana, Maryland, Michigan) have relaxed this prohibition, permitting jurors to discuss the evidence prior to deliberation under certain conditions. For example, in Arizona civil trials jurors are now permitted to discuss the evidence during trial recesses, but only among themselves in the jury room and only when all jurors are present. Furthermore, jurors are cautioned that they must not form final opinions about any fact or about the outcome of the case until they have heard and considered all of the trial evidence. A number of other states (e.g., California, North Dakota) have actively considered making similar changes to their procedures. Others (e.g., Anderson, 2002) have called for permitting such discussion to occur in criminal or military juries as well. And there are indications that even in states where predeliberation discussion is prohibited, judges are allowing it if counsel consent” (p. 413-414).

“The general structure of trials require that one side present its case before the other—defendants cannot answer charges until the plaintiff/prosecution first present their case. If jurors have heard one side (e.g., the plaintiff’s) first, and then discussed the case before hearing the other side (e.g., the defendant’s), prejudgment and early commitment would appear to advantage the plaintiff in civil trials (and the prosecution in criminal trials). This reasoning has reasonably made a prediction of a type of “primacy effect” (viz., more verdicts for the side presenting first, the plaintiff or prosecution) in prior PJD [predeliberation juror discussion] research the most popular alternative to the null hypothesis (i.e., same verdicts in juries permitted and forbidden to discuss). But an opposite, “recency effect” could also be predicted. For example, if jurors tended to discount, ignore, or underweigh in final deliberations evidence presented prior to their discussions (which would tend to favor the side that presents first) for any of several reasons (e.g., “we’ve already covered that”; more confidence in one’s evaluation of evidence after an opportunity to socially validate one’s understanding of evidence heard prior to discussion), then a recency/prodefense effect would result” (p. 415).

“Both of these predictions also make a simplifying but questionable assumption—that each side’s prospects for winning the trial hinge primarily on the evidence presented in each side’s case-in-chief (i.e., during the early plaintiff/prosecution case or the late defense case). Although this may often be true, at the end of the trial it is not the relative strength of evidence presented early versus late that is crucial, but rather the relative strength of the totality of each side’s supporting evidence that should determine the trial outcome. This means that a primacy effect—greater weight placed on information presented early—need not result in more proplaintiff/prosecution verdicts, and that a recency effect—greater weight placed on information presented late—need not result in more prodefense verdicts. For example, suppose in a civil trial the plaintiff’s case-in-chief is much weaker than the defense’s case-in-chief. A primacy effect might manifest itself as highlighting the weakness of the plaintiff’s case, and hence lead to more prodefense verdicts. Conversely, a recency effect might result in more proplaintiff/prosecution verdicts if the defense case-in-chief were extremely weak” (p. 415-416).

“The primary objective of this article was to explore whether predeliberation juror discussion (PJD) is verdict neutral—that is, whether such discussion has no systematic impact on juror/jury verdicts, as the prior literature has suggested, or whether such discussion does have some impact. Our results clearly indicated that PJD is not verdict neutral, at least under the conditions examined here. The impact of PJD on verdicts was significant and strong (e.g., overall, the difference in jury pro-plaintiff-verdict rates between those denied and permitted PJD was 26.5%). However, the effect of PJD was not a simple proplaintiff/proprosecution bias, as has been suspected in most prior commentary and research. Rather, the effect of PJD was a type of recency effect—the evidence presented later in the trial (and after the jury’s PJD) had relatively greater impact on the jury’s verdict than the evidence presented early in the trial (and prior to the jury’s PJD). This kind of recency effect would not produce a simple proplaintiff or prodefense bias unless the timing of evidence (early vs. late in the trial, and hence, usually before vs. after PJD) was strongly correlated with which side the evidence favored (plaintiff vs. defendant). It may well be true that the strongest plaintiff evidence often appears early (during the plaintiff’s case-in-chief) and the strongest defense evidence often appears late (during the defense’s case-in-chief). But it is also quite possible for the opposite to occur—strong defense evidence appearing early or strong plaintiff evidence appearing late—or for there to be no clear correlation between timing and side favored. If across all trials, this correlation were weak or absent, we would not expect any net effect of PJD on verdict, which is just the general pattern observed in the prior field research. Our experimental design permitted us to tease apart the timing of strong evidence (early vs. late) and the side favored by that evidence (plaintiff vs. defense). And our results suggest that the net effect of permitting PJD will be to bias verdicts in favor of whichever side would profit more from the jury paying greater attention and giving greater weight to the evidence presented after PJD than before PJD” (p. 422).
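
The reasoning in this passage can be illustrated with a toy calculation. The sketch below is not the authors' model or analysis. It simply assumes that each piece of evidence has a signed strength (positive favors the plaintiff, negative favors the defense) and that a hypothetical multiplier, `post_weight`, inflates the weight of evidence heard after the PJD. All function names and numbers are made up for illustration.

```python
def net_support(evidence, post_weight=1.0):
    """Weighted support for the plaintiff minus support for the defense.

    evidence: list of (phase, strength) pairs. phase is "pre" or "post"
    (relative to the PJD); strength > 0 favors the plaintiff and
    strength < 0 favors the defense.
    post_weight: hypothetical multiplier applied to post-PJD evidence.
    1.0 means no recency bias; values above 1.0 mean post-PJD evidence
    counts for more.
    """
    total = 0.0
    for phase, strength in evidence:
        weight = post_weight if phase == "post" else 1.0
        total += weight * strength
    return total


def pjd_shift(evidence, post_weight=1.5):
    """Change in net plaintiff support caused by the assumed recency weight."""
    return net_support(evidence, post_weight) - net_support(evidence, 1.0)


# Balanced trial: strong plaintiff evidence early, strong defense evidence late.
typical = [("pre", 2.0), ("post", -2.0)]
# Same evidence with the timing reversed.
reversed_timing = [("pre", -2.0), ("post", 2.0)]

print(pjd_shift(typical))          # negative: shift toward the defense
print(pjd_shift(reversed_timing))  # positive: shift toward the plaintiff
```

Under these assumptions the two shifts are equal and opposite, so averaging over a mix of trials in which timing and side favored are uncorrelated yields no net effect, which is consistent with the pattern the authors describe for the prior field research.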

Translating Research into Practice

“For the sake of argument, let us momentarily assume that the recency bias found here will occur for a wide range of civil (or criminal) trials; what might be done to minimize it? If it could be shown that some ways of timing PJD were less likely to produce the bias (e.g., regular and frequent PJDs), perhaps juries might be encouraged or required to time their discussions accordingly. However, the evidence for such an ideal patterning of discussions would have to be compelling to justify such an intrusive remedy. Judges’ instructions might describe the bias and caution the jury not to consider evidence discussed during a PJD session as “already covered” and hence, worth less consideration during their final deliberations. Unfortunately, the research evidence on the effectiveness of such cautionary judicial instructions is not encouraging. Pending the research required to understand the full impact of permitting PJDs, and the effectiveness of alternative remedies, the safest option would appear to be to follow the long-standing tradition of prohibiting PJD” (p. 424).

“There are many trial practices which jurors dislike, such as being denied information on a defendant’s past criminal history, being denied access to sidebar conversations, or reaching verdicts without knowing exactly what sentence might be imposed. But in these and many other practices, the goal of unbiased jury decision making trumps juror preferences. Our results suggest that prohibiting predeliberation juror discussion might well be another such practice, and that the rush to implement this jury innovation should be reconsidered” (p. 425).

Other Interesting Tidbits for Researchers and Clinicians

“An ever-present issue for experimental jury simulation studies like this one is whether the key findings would be materially different under more realistic conditions (e.g., more representative jurors; with a live trial; if the verdicts determined tangible consequences for the litigants). Fortunately, there is practically no evidence that results from mock jury simulation studies are materially altered by increasing realism along such dimensions. A separate issue is the particular form the PJD took in our study—a brief discussion of the evidence between the two cases in chief. Of course, there are many other forms that PJD might take in actual trials, and some of these seem likely to modify the recency effect we observed. For example, the closer the last juror discussion occurred to the start or the end of the trial, the less impact a greater focus on the postdiscussion evidence should have. At the limits, all/none of the trial evidence would remain to be heard if there were only a single discussion at the start/end of the trial. And the length of a trial or of a jury’s discussion might well affect any recency bias; for example, the shorter the discussion, the harder it would be to maintain that the jury had “already covered” all the evidence presented prediscussion. Also, for good experimental reasons, our mock juries only considered trials in which the cases-in-chief for both sides were nicely balanced. But if there were a strong contrast between the strength of the plaintiff’s and the defense’s cases, the recency bias might be altered—it might be attenuated/bolstered if the defense case were patently weaker/stronger than the plaintiff case. Clearly, much more research is required to settle such external validity questions” (p. 424).


Authored by Amanda Beltrani

Amanda Beltrani is a current doctoral student at Fairleigh Dickinson University. Her professional interests include forensic assessments, professional decision making, and cognitive biases.

Fighting for objectivity: Cognitive bias in forensic examinations

Forensic evaluations are not immune to various cognitive biases, but there are ways to mitigate them. This is the bottom line of a recently published article in International Journal of Forensic Mental Health. Below is a summary of the research and findings as well as a translation of this research into practice.

Featured Article | International Journal of Forensic Mental Health | 2017, Vol. 16, No. 3, 227–238

Understanding and Mitigating Bias in Forensic Evaluation: Lessons from Forensic Science


Patricia A. Zapf, John Jay College of Criminal Justice
Itiel E. Dror, University College London


Criticism has emerged in the last decade surrounding cognitive bias in forensic examinations. The National Research Council (NRC, 2009) issued a report that delineated weaknesses within various forensic science domains. The purpose of this article is to examine and consider the various influences that can bias observations and inferences in forensic evaluation and to apply what we know from forensic science to propose possible solutions to these problems. We use Sir Francis Bacon’s doctrine of idols—which underpins modern scientific method—to expand Dror’s (2015) five-level taxonomy of the various stages at which bias can originate within forensic science to create a seven-level taxonomy. We describe the ways in which biases can arise and impact work in forensic evaluation at these seven levels, highlighting potential solutions and various means of mitigating the impact of these biases, and conclude with a proposal for using scientific principles to improve forensic evaluation.


Bias, cognitive bias, cognitive factors, forensic evaluation, forensic psychology

Summary of the Research

“Research and commentary have emerged in the last decade surrounding cognitive bias in forensic examinations, both with respect to various domains within forensic science […] as well as with respect to forensic psychology. […] Indeed, in 2009 the National Research Council (NRC) issued a 352-page report entitled, Strengthening Forensic Science in the United States: A Path Forward that delineated several weaknesses within the various forensic science domains and proposed a series of reforms to improve the issue of reliability within the forensic sciences. Prominent among these weaknesses was the issue of cognitive factors, which impact an examiner’s understanding, analysis, and interpretation of data.” (p. 227)

“While we acknowledge differences between the workflow and roles of various forensic science practitioners and forensic mental health evaluators, we also believe that there are overarching similarities in the tasks required between the forensic science and forensic mental health evaluation domains. Across these two domains, examiners and evaluators are tasked with collecting and considering various relevant pieces of data in arriving at a conclusion or opinion and, across both of these domains, irrelevant information can change the way an examiner/evaluator interprets the relevant data. Bias mechanisms, such as bias cascade and bias snowball, can impact examiners in forensic science as well as in forensic psychology.” (p. 227–228)

“The purpose of this article is to examine and consider the various influences that can bias observations and inferences in forensic evaluation and to apply what we know from forensic science to propose possible solutions to these problems. […] We describe the ways in which biases can arise and impact work in forensic evaluation at these various levels, highlighting potential solutions and various means of attempting to mitigate the impact of these biases, and conclude with a proposal for next steps on the path forward with the hope that increased awareness of and exposure to these issues will continue to stimulate further research and discussion in this area.” (p. 228)

“Sir Francis Bacon, who laid the foundations for modern science, believed that scientific knowledge could only arise if we avoid factors that distort and prevent objectivity. Nearly 400 years ago, Bacon developed the doctrine of “idols,” in which he set out the various obstacles that he believed stood in the way of truth and science—false idols that prevent us from making accurate observations and achieving understanding by distorting the truth and, therefore, stand in the way of science. […] In parallel and in addition to Bacon’s four idols, Dror and his colleagues have discussed various levels at which cognitive factors might interfere with objective observations and inferences and contribute to bias within the forensic sciences. […] Here we present a seven-level taxonomy that integrates Bacon’s doctrine of idols with the previous work of Dror and colleagues on the various sources of bias that might be introduced, and apply these to forensic evaluation.” (p. 228)

“Forensic evaluation requires the collection and examination of various pieces of data to arrive at an opinion regarding a particular legal issue at hand. […] The common components of all forensic evaluations include the collection of data relevant to the issue at hand […] and the consideration and weighting of these various pieces of data, according to relevance and information source, to arrive at an opinion/conclusion regarding the legal issue being evaluated.” (p. 228–229)

“Forensic evaluation is distinct from clinical evaluation, which relies primarily on limited self-report data from the individual being evaluated. Forensic evaluation places great importance on collecting and considering third party and collateral information in conjunction with an evaluee’s self-report data, and forensic evaluators are expected to consider the impact and relevance of the various pieces of data on their overall conclusions. In addition, forensic evaluators are expected to strive to be as impartial, objective, and unbiased as possible in arriving at their conclusions and opinions about the legal issue at hand. […] Hence, it can be argued that forensic evaluations should aspire to be more similar to scientific investigations—where the emphasis is placed on using observations and data to test alternate hypotheses—than to unstructured clinical assessments, which accept an evaluee’s self-report at face value without attempts to corroborate or confirm the details of the evaluee’s account and with less emphasis on alternate hypothesis testing.” (p. 229)

“If we accept the premise that forensic evaluations should be more akin to scientific investigations than clinical evaluations, then forensic evaluators should conduct their work more like scientists than clinicians, using scientific methods to inform their conceptualization of the case and opinions regarding the legal issue at hand. […] We take the lessons from forensic science and apply these to forensic evaluation with the aim of making forensic evaluation as objective and scientific as possible within the confines and limitations of attempting to apply group-to-individual inferences. […] We do so by developing the framework of a seven-level taxonomy delineating the various influences that might interfere with objective observations and inferences, potentially resulting in biased conclusions in forensic evaluation. The taxonomy starts at the bottom with innate sources that have to do with being human. As we ascend the taxonomy, we discuss sources related to nurture—such as experience, training, and ideology—that can cause bias and, as we near the top of the taxonomy, the sources related to the specific case at hand. So, the order of the taxonomy is from general, basic innate sources derived from human nature, to sources that derive from nurture, and then to those sources that derive from the specifics of the case at hand.” (p. 229)

“At the very base of the taxonomy are potentially biasing influences that result from our basic human nature and the cognitive architecture of the brain. […] These obstacles or influences result from the way in which our brains are built […] The human brain has a limited capacity to represent and process all of the information presented to it and so it relies upon techniques such as chunking information (binding individual pieces of information into a meaningful whole), selective attention (attending to specific pieces of information while ignoring other information), and top-down processing (conceptually driven processing that uses context to make sense of information) to efficiently process information […] We actively process information by selectively attending to that which we assume to be relevant and interpret this information in light of that which we already know.” (p. 230)

“Ironically, this automaticity and efficiency—which serves as the bedrock for expertise—also serves as the source of much bias. That is, the more we develop our expertise in a particular area, the more efficient we become at processing information in that area, but this enhanced performance results in cognitive tradeoffs that result in a lack of flexibility and error. […] For example, information that we encounter first is more influential than information we encounter later. This anchoring bias can result in a forensic evaluator being overly influenced by or giving greater weight to information that is initially presented or reviewed. Thus, initial information communicated to the forensic evaluator by the referring party will likely be more influential, and serve as an anchor for, subsequent information reviewed by the evaluator.” (p. 230)

“We also have a tendency to overestimate the probability of an event or an occurrence when other instances of that event or occurrence are easily recalled. This availability bias can result in a forensic evaluator overestimating the likelihood of a particular outcome on the basis of being able to readily recall similar instances of that same outcome. Confirmation bias results from our natural inclination to rush to conclusions that confirm what we want, believe, or accept to be true. […] In the forensic evaluation domain, […] the confirmation bias can exert its influence on evaluators who share a preliminary opinion before the evaluation is complete by committing the evaluator in a way that makes it difficult to resist or overcome this bias in the final interpretation of the data. […] What is important is that we recognize our limits and cognitive imperfections so that we might try to address them by using countermeasures.” (p. 230–231)

“Moving up the taxonomy, the next three sources of influences that can affect our perception and decision-making result from our environment, culture, and experience. First among these are those influences that are brought about by our upbringing—our training and motivations […] Our personal motivations and preferences, developed through our upbringing, affect our perception, reasoning, and decision-making.” (p. 231)

“Closely related to an individual’s motivations are how one sees oneself and with whom that individual identifies. One particularly salient and concerning influence in this realm for forensic evaluators is that of adversarial allegiance; that is, the tendency to arrive at an opinion or conclusion that is consistent with the side that retained the evaluator. [The research shows that] forensic evaluators working for the prosecution assign higher psychopathy scores to the same individual as compared to forensic evaluators working for the defense. […] forensic evaluators assign higher scores on actuarial risk assessment instruments—known to be less subjective than other types of risk assessment instruments—when retained by the prosecution and lower scores when retained by the defense” (p. 231)

“In addition to the pull to affiliate with the side that retained the forensic evaluator is the issue of pre-existing attitudes that forensic evaluators hold and how these might impact the forensic evaluation process.” (p. 231)

“Language has a profound effect on how we perceive and think about information. The words we use to convey knowledge—terminology, vocabulary, and even jargon—can cause errors in how we understand and interpret information when we use them without attention and proper focus on the true meaning, or without definition, measurable criteria, and quantification. It is important to consider the meaning and interpretation of the words we use and how these might differ by organization, discipline, or culture. It is easy to assume that we know what someone means when they tell us something—whether it be an evaluee, a retaining party, or a collateral informant—but we must be cautious about both interpreting the language of others and using language to convey what we mean.” (p. 231–232)

“In the forensic assessment domain, different methods of conducting risk assessments (using dynamic risk assessment measures versus static risk assessment measures) have been demonstrated to affect the predictive accuracy of the conclusions reached by evaluators. […] highly structured methods with explicit decision rules and little room for discretion outperform unstructured clinical methods and show higher rates of reliability and less bias in the predicted outcomes.” (p. 232)

“Within existing organizational structures, using language with specific definition and meaning that serves to increase error detection and prevention is important for creating a more scientific discipline.” (p. 232)

“The ways in which forensic evaluators produce knowledge within their discipline can serve as an impediment to accurate observations and objective inferences. Anecdotal observations or information based on unsupported or blind beliefs can serve to create expectations about conclusions or outcomes before an evaluation is even conducted. Similarly, using methods or procedures that have not been adequately validated or that have been based on narrow, in-house research for which generalizability is unknown can result in inaccurate conclusions. Drawing inferences on the basis of untested assumptions or base rate expectations can lead to erroneous outcomes.” (p. 232)

“Perhaps one of the most potentially biasing considerations at the [level that deals with influences that result from information that is obtained or reviewed for a specific case but that is irrelevant to the referral question] involves the inferences made by others. […] Detailed information about an evaluee’s criminal history (offenses committed prior to the index offense), in most instances, is irrelevant to the issue of his or her criminal responsibility, which is an inquiry that focuses on the mental state of the individual at the time of the index offense. This irrelevant information, however, can become biasing for an evaluator. Even more potentially biasing can be the inferences and conclusions that others make about an evaluee—including collateral informants as well as retaining and opposing parties—since evaluators typically do not have access to the data or the logic used by others in arriving at these inferences and conclusions. […] It is naive to think that a forensic evaluator can only collect and consider relevant information, especially since many times it is not clear what is relevant and what is irrelevant until all collected materials have been reviewed; however, disregarding irrelevant information is nearly impossible.” (p. 233)

“Attempting to limit, as much as possible, the irrelevant information that is reviewed or considered as part of a forensic evaluation is one means of mitigating bias. Having a third-party take an initial pass through documents and records provided for an evaluation to compile relevant information for the evaluator’s consideration is one way of potentially mitigating against biasing irrelevant information. Another potentially mitigating strategy might be to engage in a systematic process of review where clear and specific documentation of what was reviewed, when it was reviewed, in the order in which it was reviewed, and with the evaluator detailing his or her thoughts, formulations, and inferences after each round of review, beginning with the most explicitly relevant case information (e.g., the police report for the index offense in a criminal responsibility evaluation) and moving toward the least explicitly relevant case information (e.g., elementary school records in a criminal responsibility evaluation).” (p. 233)

“Just as irrelevant case material can be biasing, so too can contextual information included in the reference materials for a forensic evaluation. […] reference materials would include whatever it is that the evaluator is supposed to be evaluating the evidence against and, of course, can include potentially biasing contextual information.” (p. 234)

“The reference materials also underpin the well-documented phenomenon of “rater drift,” wherein one’s ratings shift over time or drift from standard levels or anchors by unintentionally redefining criteria. This means that evaluators should be careful to consult the relevant legal tests, statutes, or standards for each evaluation conducted and not assume that memory for or conceptualization of the standard or reference material is accurate.” (p. 234)

“In addition to irrelevant case information and contextual information included as part of the reference materials for a case, the actual case evidence itself might also include some irrelevant, contextual, or biasing information. Here we conceptualize case evidence as information germane to the focus of the inquiry that must be considered by any forensic evaluator in arriving at an opinion about the particular legal issue. […] Influences at the case evidence level include biasing contextual information from the actual police reports or other data that must be considered for the referral question. Thus, contextual information that is inherent to the case evidence and that cannot be easily separated from it can influence and bias an evaluator’s inferences about the data.” (p. 234–235)

“Irrelevant or contextual information can influence the way in which evaluators perceive and interpret data at any of these seven levels—ranging from the most basic aspects of human nature and the cognitive architecture of the brain, through one’s environment, culture, and experiences, and including specific aspects of the case at hand—but it is important to note that biased perceptions or inferences at any of these levels do not necessarily mean that the outcome, conclusion, or opinion will be biased. […] Even if the bias is in a contradictory direction from the correct decision, the evidentiary data might affect the considerations of the evaluator to some extent but not enough to impact the actual outcome of the evaluation or ultimate opinion of the evaluator. What appears important to the outcome is the degree to which the data are ambiguous; the more ambiguous the data, the more likely it will be that a bias will affect the actual decision or outcome.” (p. 235)

“Consideration of the various influences that might bias an evaluator’s ability to objectively evaluate and interpret data is an important component of forensic evaluation. […] Knowledge about the ways in which bias can impact forensic evaluation is an important first step; however, the path forward also includes the use of scientific principles to test alternative hypotheses, methods, and strategies for minimizing the impact of bias in forensic evaluation. Using scientific principles to continue to improve forensic evaluation will bring us closer to the aspirational goal of objective, impartial, and unbiased evaluations.” (p. 236–237)

Translating Research into Practice

“The presence of a bias blind spot—the tendency of individuals to perceive greater cognitive and motivational bias in others than in themselves—has been well documented. […] forensic psychologists are occupationally socialized to believe that they can and do practice objectively (recall the discussion of training and motivational influences); however, emerging research on bias in forensic evaluation has demonstrated that this belief may not be accurate […] In addition, it appears that many forensic evaluators report using de-biasing strategies, such as introspection, which have been proven ineffective and some even deny the presence of any bias at all.” (p. 235)

“For forensic evaluation to advance and improve, we must behave as scientists. […] Approaching forensic evaluations like scientific inquiries and using rival hypothesis testing might place the necessary structure on the evaluation process to determine the differential impact of the various data considered.” (p. 235–236)

“Identifying weaknesses in forensic evaluation and conducting research and hypothesis testing on proposed counter measures to reduce the impact of bias will serve to improve the methods and procedures in this area. Being scientific about forensic evaluation and using scientific principles to understand and improve it appears to be a reasonable path forward for reducing and mitigating bias.” (p. 236)

“The need for reliability among evaluators (as well as by the same evaluator at different times—inter- and intra-evaluator consistency) is a cornerstone for establishing forensic evaluation as a science. By understanding the characteristics of evaluators—including training, culture, and experience—that contribute to their opinions we can begin to propose and study different ways of limiting the impact of these characteristics on objective observation and inferences in forensic evaluation.” (p. 236)

“Research has demonstrated that reliability improves when standardized inquiries are used for competence evaluation. […] Conducting systematic research on the methods and procedures used in forensic evaluation and the impact of these on evaluation outcomes and bias will ultimately allow for development of the most effective strategies for forensic evaluation.” (p. 236)

“Implementing professional training programs that address cognitive factors and bias in forensic evaluation and conducting systematic research on the impact of various training techniques for increasing understanding of these issues will likely improve the methods that forensic evaluators currently use to mitigate the impact of bias in their work. […] Understanding the most effective ways of training evaluators to perform forensic evaluations in a consistent and reliable way while limiting the impact of bias will allow for the implementation of best practices, both with respect to the evaluations themselves as well as with respect to training procedures and outcomes.” (p. 236)

Other Interesting Tidbits for Researchers and Clinicians

“[Sir Francis Bacon’s idols] were categorized into idola tribus (idols of the tribe), idola specus (idols of the den or cave), idola fori (idols of the market), and idola theatri (idols of the theater).” (p. 228)

“Bacon makes the case that experiences, education, training, and other personal traits (the idola specus) that derive from nurture, can cause people to misperceive and misinterpret nature differently. That is, because of individual differences in their upbringing, experiences, and professional affiliations, people develop personal allegiances, ideologies, theories, and beliefs, and these may ‘corrupt the light of nature’” (p. 228)

“Bacon’s doctrine of idols distinguishes between idols that are a result of our physical nature (e.g., human cognitive architecture) and the ways in which we were nurtured (e.g., experiences), and those that result from our social nature and the fact that we are social animals who interact with others in communities and work together. The first two idols—those of the tribe and the den—result from our physical nature and upbringing respectively, whereas the others—those of the market and theater result from our social nature and our interactions with others.” (p. 228)

Join the Discussion

As always, please join the discussion below if you have thoughts or comments to add!

Authored by Kseniya Katsman

Kseniya Katsman is a Master’s student in the Forensic Psychology program at John Jay College of Criminal Justice. Her interests include forensic application of dialectical behavior therapy, cultural competence in forensic assessment, and risk assessment, specifically suicide risk. She plans to continue her education and pursue a doctoral degree in clinical psychology.

Why Do Forensic Experts Disagree? Suggestions for Policy and Practice Changes

Unreliable opinions can result in arbitrary or unjust legal outcomes for forensic examinees, as well as diminish confidence in psychological expertise within the legal system. This is the bottom line of a recently published article in Translational Issues in Psychological Science. Below is a summary of the research and findings as well as a translation of this research into practice.

Featured Article | Translational Issues in Psychological Science | 2017, Vol. 3, No. 2, 143-152

Why Do Forensic Experts Disagree? Sources of Unreliability and Bias in Forensic Psychology Evaluations


Lucy A. Guarnera, University of Virginia
Daniel C. Murrie, University of Virginia School of Medicine
Marcus T. Boccaccini, Sam Houston State University


Recently, the National Research Council, Committee on Identifying the Needs of the Forensic Science Community (2009) and President’s Council of Advisors on Science and Technology (PCAST; 2016) identified significant concerns about unreliability and bias in the forensic sciences. Two broad categories of problems also appear applicable to forensic psychology: (1) unknown or insufficient field reliability of forensic procedures, and (2) experts’ lack of independence from those requesting their services. We overview and integrate research documenting sources of disagreement and bias in forensic psychology evaluations, including limited training and certification for forensic evaluators, unstandardized methods, individual evaluator differences, and adversarial allegiance. Unreliable opinions can result in arbitrary or unjust legal outcomes for forensic examinees, as well as diminish confidence in psychological expertise within the legal system. We present recommendations for translating these research findings into policy and practice reforms intended to improve reliability and reduce bias in forensic psychology. We also recommend avenues for future research to continue to monitor progress and suggest new reforms.


forensic evaluation, forensic instrument, adversarial allegiance, human factors, bias

Summary of the Research

“Imagine you are a criminal defendant or civil litigant undergoing a forensic evaluation by a psychologist, psychiatrist, or other clinician. The forensic evaluator has been tasked with answering a difficult psycholegal question about you and your case. For example, ‘Were you sane or insane at the time of the offense? How likely is it that you will be violent in the future? Are you psychologically stable enough to fulfill your job duties?’ The forensic evaluator interviews you, reads records about your history, speaks to some sources close to you, and perhaps administers some psychological tests. The evaluator then forms a forensic opinion about your case—and the opinion is not in your favor. You might wonder whether most forensic clinicians would have reached this same opinion. Would a second (or third, or fourth) evaluator have come to a different, perhaps more favorable conclusion? In other words, how often do forensic psychologists disagree? And why does such disagreement occur?” (p. 143-144)

“While forensic evaluators strive for objectivity and seek to avoid conflicts of interest, a forensic opinion may be influenced by multiple sources of variability and bias that can be powerful enough to cause independent evaluators to form different opinions about the same defendant” (p. 144).

“Interrater reliability is the degree of consensus among multiple independent raters. Of particular interest within forensic psychology is field reliability—the interrater reliability among practitioners performing under routine practice conditions typical of real-world work. In general, the field reliability of forensic opinions is either unknown or far from perfect” (p. 144).
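As a rough illustration of what "degree of consensus" can mean numerically, the sketch below computes raw percent agreement and Cohen's kappa for two evaluators rating the same cases. Cohen's kappa is a common chance-corrected agreement statistic, but it is an assumption here: the quoted article does not say which statistic any given study used. The ratings and the function names are invented for illustration only.

```python
from collections import Counter


def percent_agreement(ratings_a, ratings_b):
    """Share of cases on which the two evaluators gave the same opinion."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return matches / len(ratings_a)


def cohens_kappa(ratings_a, ratings_b):
    """Agreement between two raters, corrected for agreement expected by chance."""
    n = len(ratings_a)
    observed = percent_agreement(ratings_a, ratings_b)
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    categories = set(ratings_a) | set(ratings_b)
    # Chance agreement: probability both raters pick the same category
    # if each rated independently at their own observed base rates.
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical opinions from two evaluators on the same six defendants.
evaluator_a = ["competent", "competent", "incompetent",
               "competent", "incompetent", "competent"]
evaluator_b = ["competent", "incompetent", "incompetent",
               "competent", "incompetent", "competent"]

agreement = percent_agreement(evaluator_a, evaluator_b)
kappa = cohens_kappa(evaluator_a, evaluator_b)
print(f"percent agreement = {agreement:.2f}, kappa = {kappa:.2f}")
```

In this made-up example the evaluators agree on five of six cases, yet kappa is noticeably lower than raw agreement because some of that agreement would be expected by chance alone.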

“Besides the unreliability that may be intrinsic to a complex, ambiguous task such as forensic evaluation, research has identified multiple extrinsic sources of expert disagreement. One such source is limited training and certification for forensic evaluators. While specialized training programs and board certifications have become far more commonplace and rigorous since the early days of the field in the 1970s and 1980s, the training and certification of typical clinicians conducting forensic evaluations today remains variable and often poor” (p. 145).

“This training gap is important because empirical research suggests that evaluators with greater training produce more reliable forensic opinions” (p. 145).

“One likely reason why training and certification increase interrater reliability is that they promote standardized evaluation methods among forensic clinicians. While there are now greater resources and consensus concerning appropriate practice than even a decade ago, forensic psychologists still vary widely in what they actually do during any particular forensic evaluation… This diversity of methods—including the variety and at times total lack of structured tools—is likely a major contributor to disagreement among forensic evaluators” (p. 146).

“Even within the category of structured tools, research shows that forensic assessment instruments with explicit scoring rules based on objective criteria yield higher field reliability than instruments involving more holistic or subjective judgments” (p. 146).

“In addition to evaluators’ inconsistent training and methods, patterns of stable individual differences among evaluators—as opposed to mere inaccuracy or random variation—seem to contribute to divergent forensic opinions… Stable patterns of differences suggest that evaluators may adopt idiosyncratic decision thresholds that consistently shift their forensic opinions or instrument scores in a particular direction, especially when faced with ambiguous cases” (p. 146).

“Upon these concerns about unknown or less-than-ideal field reliability of forensic psychology procedures, we now add concerns about forensic experts’ lack of independence from those requesting their services. As far back as the 1800s, legal experts have lamented the apparent frequency of scientific experts espousing the views of the side that hired them (perhaps for financial gain), leading one judge to comment, ‘[T]he vicious method of the Law, which permits and requires each of the opposing parties to summon the witnesses on the party’s own account[,] . . . naturally makes the witness himself a partisan’. More modern surveys continue to identify partisan bias as judges’ main concern about expert testimony, citing experts who appear to “abandon objectivity” and “become advocates” for the retaining party” (p. 147).

Translating Research into Practice

“While many clinicians cite introspection (i.e., looking inward in order to identify one’s own biases) as a primary method to counteract personal ideology, idiosyncratic responses to examinees, and other individual differences, research suggests that introspection is ineffective and may even be counterproductive. Thus, more disciplined changes to personal practice are needed. For example, when conducting evaluations for which well-validated structured tools exist, evaluators could commit to using such tools as a personal standard of practice. This would entail justifying to themselves (or preferably colleagues) why they did or did not use an available tool for a particular case. Practicing forensic evaluators could also use simple debiasing methods to counteract confirmation bias, such as the ‘consider-the-opposite’ technique in which evaluators ask themselves, ‘What are some reasons my initial judgment might be wrong?’ To increase personal accountability, evaluators could keep organized records of their own forensic opinions and instrument scores, or even help organize larger databases for evaluators within their own institution or locality. Using these personal data sets, evaluators might look for mean differences in their own instrument scores when retained by the prosecution versus the defense, or compare their own base rates of incompetency and insanity findings to those of their colleagues. Ambitious evaluators could even experiment with blinding themselves to the source of referral in order to counteract adversarial allegiance” (p. 149).
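The record-keeping suggestion above can be sketched in a few lines of code. This is a minimal illustration only, assuming an evaluator keeps one record per case with the retaining party, an instrument score, and the opinion reached. The field names, helper functions, and all values are hypothetical and are not taken from the article.

```python
from statistics import mean

# Hypothetical personal case log; every value here is invented.
records = [
    {"retained_by": "prosecution", "score": 24, "finding": "competent"},
    {"retained_by": "prosecution", "score": 27, "finding": "competent"},
    {"retained_by": "prosecution", "score": 21, "finding": "incompetent"},
    {"retained_by": "defense", "score": 18, "finding": "competent"},
    {"retained_by": "defense", "score": 20, "finding": "incompetent"},
    {"retained_by": "defense", "score": 22, "finding": "competent"},
]


def mean_score_by_side(case_records):
    """Average instrument score, grouped by the party that retained the evaluator."""
    scores = {}
    for record in case_records:
        scores.setdefault(record["retained_by"], []).append(record["score"])
    return {side: mean(values) for side, values in scores.items()}


def base_rate(case_records, finding):
    """Proportion of cases in which the evaluator reached the given finding."""
    if not case_records:
        raise ValueError("no records to summarize")
    hits = sum(record["finding"] == finding for record in case_records)
    return hits / len(case_records)


means = mean_score_by_side(records)
difference = means["prosecution"] - means["defense"]
incompetency_rate = base_rate(records, "incompetent")

print(f"mean scores by side: {means}")
print(f"prosecution minus defense: {difference}")
print(f"incompetency base rate: {incompetency_rate:.2f}")
```

A difference in such a small personal data set does not by itself show allegiance, since the cases referred by each side may differ. It is a behavioral marker worth tracking over time and comparing with colleagues' figures.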

“Although individual evaluators can make many voluntary changes today in order to reduce the impact of unreliability and bias on their forensic opinions, other reforms require wider-ranging structural transformation. For example, state-level legislative action is needed to mandate more than one independent forensic opinion. Requiring more than one independent opinion is a powerful way to combat unreliability and bias by reducing the impact of any one evaluator’s error” (p. 149).

“Even slower to change than state legislation and infrastructure might be existing legal norms, such as judges’ current willingness to admit nonblinded, partisan experts. While authoritative calls to action like the NRC and PCAST reports may have some influence, most legal change only happens by the accretion of legal precedent, which is a slow and unpredictable process” (p. 149-150).

Other Interesting Tidbits for Researchers and Clinicians

“Foundational research should establish field reliability rates for various types of forensic evaluations in order to assess the current situation and gauge progress toward improvement. Only a handful of field reliability studies exist for a few types of forensic evaluations (i.e., adjudicative competency, legal sanity, conditional release), and virtually nothing is known about the field reliability of other types of evaluations, particularly civil evaluations” (p. 144-145).

“Given that increased standardization of forensic methods has the potential to ameliorate multiple sources of unreliability and bias described here, more investigation of forensic instruments, checklists, practice guidelines, and other methods of standardization is a second research priority. Some of this research should continue to focus on creating standardized tools for forensic evaluations and populations for which none are currently available, particularly civil evaluations such as guardianship, child protection, fitness for duty, and civil torts like emotional injury. Future research can also continue to seek improvements to the currently modest predictive accuracy of risk assessment instruments. However, given the current gap between the availability of forensic instruments and their limited use by forensic evaluators in the field, perhaps more pressing is research on the implementation of forensic instruments in routine practice. More qualitative and quantitative investigations of how instruments are administered in routine practice, why instruments are or are not used, and what practical obstacles evaluators encounter are needed. Without greater understanding of how instruments are (or are not) implemented in practice—particularly in rural or other under-resourced areas—continuing to develop new tools may not translate to their increased use in the field” (p. 148).

“A clear recommendation for improving evaluator reliability is that states without standards for the training and certification of forensic experts should adopt them, and states with weak standards (e.g., mere workshop attendance) should strengthen them. What is less clear, however, is what kinds and doses of training can improve reliability with the greatest efficiency. Drawing from extensive research in industrial and organizational psychology, credentialing requirements that mimic the type of work evaluators do as part of their job (e.g., mock reports, peer review, apprenticing) may foster professional competency better than requirements dissimilar to job duties (e.g., written tests). Given that both evaluators and certifying bodies have limited time and resources, research into the most potent ingredients of successful forensic credentialing is a third research priority” (p. 148-149).

Join the Discussion

As always, please join the discussion below if you have thoughts or comments to add!

Authored by Amanda Beltrani

Amanda Beltrani is a current graduate student in the Forensic Psychology Master’s program at John Jay College of Criminal Justice in New York. Her professional interests include forensic assessments, specifically, criminal matter evaluations. Amanda plans to continue her studies in a doctoral program after completion of her Master’s degree.

What Works and What Doesn’t – Bias Awareness and Correction Strategies Among Forensic Evaluators

Initial inquiry into bias awareness in forensic psychological evaluation sheds light on bias awareness and correction strategies and informs practice and future research. This is the bottom line of a recently published article in Psychology, Public Policy, and Law. Below is a summary of the research and findings as well as a translation of this research into practice.

Featured Article | Psychology, Public Policy, and Law | 2016, Vol. 22, No. 1, 58-76

Forensic Psychologists’ Perceptions of Bias and Potential Correction Strategies in Forensic Mental Health Evaluation


Tess M.S. Neal, Arizona State University
Stanley L. Brodsky, The University of Alabama


A qualitative study with 20 board-certified forensic psychologists was followed up by a mail survey of 351 forensic psychologists in this mixed-methods investigation of examiner bias awareness and strategies used to debias forensic judgments. Rich qualitative data emerged about awareness of bias, specific biasing situations that recur in forensic evaluations, and potential debiasing strategies. The continuum of bias awareness in forensic evaluators mapped cogently onto the “stages of change” model. Evaluators perceived themselves as less vulnerable to bias than their colleagues, consistent with the phenomenon called the “bias blind spot.” Recurring situations that posed challenges for forensic clinicians included disliking or feeling sympathy for the defendant, disgust or anger toward the offense, limited cultural competency, preexisting values, colleagues’ influences, and protecting referral streams. Twenty-five debiasing strategies emerged in the qualitative study, all but 1 of which rated as highly useful in the quantitative survey. Some of those strategies are consistent with empirical evidence about their effectiveness, but others have been shown to be ineffective. We identified which strategies do not help (such as introspection), focused on promising strategies with empirical support, discussed additional promising strategies not mentioned by participants, and described new strategies generated by these participants that have not yet been subjected to empirical examination. Finally, debiasing strategies were considered with respect to future directions for research and forensic practice.


bias, forensic, expert judgment, decision making, mixed methods, qualitative

Summary of the Research

“A historical controversy has existed in the legal and psychological literature regarding whether objectivity on part of the expert is possible. Social psychological literature attests to the difficulty people may have in divorcing their decisions from cognitive and emotional biases, and experts are not immune from these biases. Some researchers suggest forensic mental health evaluators underestimate the prevalence and severity of such influences on their work, but the literature does not address forensic clinicians’ personal experiences with bias or how they try to correct for them” (p. 58).

“To address these gaps in the literature, the present study was designed to explore forensic clinicians’ experiences with and perceptions of biases, and to investigate the strategies they use to try to mitigate perceived biases. A few suggestions exist for how forensic clinicians might consider the impact of bias, such as actively generating alternative conclusions, identifying and using relevant base rates, minimizing the role of memory, and identifying and weighing the most valid sources of data. However, much remains to be learned about strategies used to manage bias” (p. 59).

“We asked what influences might bias clinicians in the forensic context, and what strategies forensic clinicians would report using to reduce bias. Qualitative methods were used in the first study, as little previous literature has asked these kinds of questions. We followed up with a complementary large-scale survey of forensic mental health professionals to generate more representative answers to our questions, asking participants to rate the perceived usefulness of various bias-correction strategies that emerged from the qualitative analysis” (p. 60).

The first, qualitative study concluded that awareness of bias falls on a continuum. “Some of the clinicians immediately dismissed the possibility of bias in their work. For these psychologists, the objectivity mandate in forensic work may be so salient and accessible that it generates defensiveness when thinking or talking about any possibility of bias. For others, the topic may be less threatening, and they were able to reflect on the possibility of bias in their work, with some clinicians able to identify specific areas of potential bias in their work” (p. 63). Another interesting finding was that, “Participants had no trouble identifying bias in their colleagues, but fewer reported ever having any concern about their own potential biases… These results are consistent with research showing that people perceive themselves as less vulnerable to bias than others. Pronin [and colleagues] found that this ‘bias blind spot’ persisted even when people were explicitly taught how various specific biases could have affected their assessments” (p. 69).

The second, quantitative study used a survey to assess the use of debiasing strategies of 351 forensic psychologists. The items on this survey included the 25 debiasing strategies generated during study 1 including: critically examining conclusions, basing conclusions on sound data, taking careful notes during evaluation, reading professional literature, consulting with colleagues, continuous introspection about personal biases, emotional disengagement, and many other methods for counteracting bias. Most of these strategies were endorsed as being useful by the participants in the second study. While some of these strategies have been identified by relevant literature as useful techniques (i.e., professional training, taking time to think before writing a report, consulting colleagues, using structured evaluation methods), other strategies suggested by forensic mental health professionals, such as introspection, have been deemed ineffective by research.

“Consistent with what the social psychological literature about the bias blind spot would suggest, all of the participants in the qualitative study emphasized introspection as their primary strategy for identifying potential biases. Introspection was similarly one of the highest rated strategies in the follow-up quantitative study with a much larger and more representative sample of forensic clinicians. Unfortunately, introspection is not a realistic strategy for debiasing success. Classic psychological science shows that people have little or no direct introspective access to higher-order cognitive processes. In fact, Pronin and Kugler described introspection as a cognitive illusion that actually functions as a source of the bias blind spot. They showed that people rely on overt behavior to assess bias in other people but that they look inward for biased motives when assessing for bias in themselves.… Forensic clinicians may believe that they can identify and then work on their biases after identifying them via introspection, however, the common cognitive “bias blind spot” is likely to prevent the success of this endeavor, as described previously. After engaging in the ineffective strategy of introspection, forensic clinicians may develop a false confidence that they are bias-free, a confidence they may convey to the courts. Pronin and Kugler did report one encouraging finding for forensic mental health professionals. The one situation in which their participants ceased denying their relative susceptibility to bias was when they were educated about the fallibility of introspection” (p. 72).

Translating Research into Practice

“…participants described the importance of training about “objectivity” in the abstract, whereas research indicates educational curricula need to focus concretely on how humans make decisions and what can go awry (and why) to educate clinicians effectively about bias and correction strategies. Graduate school coursework, internship and postdoctoral didactics, and continuing education workshops should explicitly focus on the psychology of decision making to train clinicians how bias might affect their work and what they can do about it” (p. 72).

“One of the strategies that emerged in our qualitative study was examining patterns of personal decision making (e.g., agreement with referral party preferences), which was also rated as useful in the larger survey. This strategy represents a behavioral marker forensic clinicians could use to examine their potential biases rather than introspection. These findings suggest that educational curricula in forensic psychology might focus directly on the reasons why introspection is not a useful bias recognition or mitigation strategy and stress attention to behavioral markers instead of introspection to examine one’s own biases” (p. 72-73).

Other Interesting Tidbits for Researchers and Clinicians

“In addition to ideas for future directions as discussed above, other future directions include investigating individual differences in biases and bias awareness. For example, how do individual differences in personality traits (e.g., openness to experience) and cognitive styles (e.g., rational vs. experiential modes of thinking) relate to biases and bias awareness? Is the “size” or “strength” of an individual forensic clinician’s bias blind spot systematically related to level of confidence, perhaps with larger or stronger blind spots associated with overconfidence? Neal and Brodsky showed that forensic clinicians with higher occupational socialization scores were more likely to believe in their ability to be objective in forensic work. Four questions follow: How does the occupational socialization process affect the perceived need for relying on bias-correction strategies? How does the strength of belief in one’s objectivity relate to the perceived need for implementing strategies to reduce bias (for oneself and others)? Might clinicians with more pride in their professional identity be paradoxically more biased because of desire for and confidence in objectivity? Does confidence in one’s ability to be objective paradoxically increase bias by preventing the use of strategies to mitigate bias?” (p. 73).

“If a single overarching research need calls out from these two studies, it is to investigate systematically the degree to which the promising strategies actually reduce bias. Once that becomes known, then the task is to mobilize ways in which such effective strategies become part of routine practice. The field is ripe for controlled and responsible study of issues of bias and generating empirically supported methods for improving clinical judgment and decision-making. On that foundation can we develop pedagogical and workplace interventions for implementing accountability and reducing bias in assessments” (p. 73).

Join the Discussion

As always, please join the discussion below if you have thoughts or comments to add!

Authored by Marissa Zappala

Marissa Zappala is currently a second-year Master’s student in the Forensic Psychology program at John Jay College of Criminal Justice in New York. Her main research interests include cognitive biases, forensic assessment, and evaluator training and education. Following her Master’s, Marissa plans to pursue a doctoral degree in clinical psychology and an eventual career in psychological assessment.