Why Do Forensic Experts Disagree? Suggestions for Policy and Practice Changes

Unreliable opinions can result in arbitrary or unjust legal outcomes for forensic examinees, as well as diminish confidence in psychological expertise within the legal system. This is the bottom line of a recently published article in Translational Issues in Psychological Science. Below is a summary of the research and findings as well as a translation of this research into practice.

Featured Article | Translational Issues in Psychological Science | 2017, Vol. 3, No. 2, 143-152

Why Do Forensic Experts Disagree? Sources of Unreliability and Bias in Forensic Psychology Evaluations


Lucy A. Guarnera, University of Virginia
Daniel C. Murrie, University of Virginia School of Medicine
Marcus T. Boccaccini, Sam Houston State University


Recently, the National Research Council, Committee on Identifying the Needs of the Forensic Science Community (2009) and President’s Council of Advisors on Science and Technology (PCAST; 2016) identified significant concerns about unreliability and bias in the forensic sciences. Two broad categories of problems also appear applicable to forensic psychology: (1) unknown or insufficient field reliability of forensic procedures, and (2) experts’ lack of independence from those requesting their services. We overview and integrate research documenting sources of disagreement and bias in forensic psychology evaluations, including limited training and certification for forensic evaluators, unstandardized methods, individual evaluator differences, and adversarial allegiance. Unreliable opinions can result in arbitrary or unjust legal outcomes for forensic examinees, as well as diminish confidence in psychological expertise within the legal system. We present recommendations for translating these research findings into policy and practice reforms intended to improve reliability and reduce bias in forensic psychology. We also recommend avenues for future research to continue to monitor progress and suggest new reforms.


forensic evaluation, forensic instrument, adversarial allegiance, human factors, bias

Summary of the Research

“Imagine you are a criminal defendant or civil litigant undergoing a forensic evaluation by a psychologist, psychiatrist, or other clinician. The forensic evaluator has been tasked with answering a difficult psycholegal question about you and your case. For example, ‘Were you sane or insane at the time of the offense? How likely is it that you will be violent in the future? Are you psychologically stable enough to fulfill your job duties?’ The forensic evaluator interviews you, reads records about your history, speaks to some sources close to you, and perhaps administers some psychological tests. The evaluator then forms a forensic opinion about your case—and the opinion is not in your favor. You might wonder whether most forensic clinicians would have reached this same opinion. Would a second (or third, or fourth) evaluator have come to a different, perhaps more favorable conclusion? In other words, how often do forensic psychologists disagree? And why does such disagreement occur?” (p. 143-144)

“While forensic evaluators strive for objectivity and seek to avoid conflicts of interest, a forensic opinion may be influenced by multiple sources of variability and bias that can be powerful enough to cause independent evaluators to form different opinions about the same defendant” (p. 144).

“Interrater reliability is the degree of consensus among multiple independent raters. Of particular interest within forensic psychology is field reliability—the interrater reliability among practitioners performing under routine practice conditions typical of real-world work. In general, the field reliability of forensic opinions is either unknown or far from perfect” (p. 144).

“Besides the unreliability that may be intrinsic to a complex, ambiguous task such as forensic evaluation, research has identified multiple extrinsic sources of expert disagreement. One such source is limited training and certification for forensic evaluators. While specialized training programs and board certifications have become far more commonplace and rigorous since the early days of the field in the 1970s and 1980s, the training and certification of typical clinicians conducting forensic evaluations today remains variable and often poor” (p. 145).

“This training gap is important because empirical research suggests that evaluators with greater training produce more reliable forensic opinions” (p. 145).

“One likely reason why training and certification increase interrater reliability is that they promote standardized evaluation methods among forensic clinicians. While there are now greater resources and consensus concerning appropriate practice than even a decade ago, forensic psychologists still vary widely in what they actually do during any particular forensic evaluation… This diversity of methods—including the variety and at times total lack of structured tools—is likely a major contributor to disagreement among forensic evaluators” (p. 146).

“Even within the category of structured tools, research shows that forensic assessment instruments with explicit scoring rules based on objective criteria yield higher field reliability than instruments involving more holistic or subjective judgments” (p. 146).

“In addition to evaluators’ inconsistent training and methods, patterns of stable individual differences among evaluators—as opposed to mere inaccuracy or random variation—seem to contribute to divergent forensic opinions… Stable patterns of differences suggest that evaluators may adopt idiosyncratic decision thresholds that consistently shift their forensic opinions or instrument scores in a particular direction, especially when faced with ambiguous cases” (p. 146).

“Upon these concerns about unknown or less-than-ideal field reliability of forensic psychology procedures, we now add concerns about forensic experts’ lack of independence from those requesting their services. As far back as the 1800s, legal experts have lamented the apparent frequency of scientific experts espousing the views of the side that hired them (perhaps for financial gain), leading one judge to comment, ‘[T]he vicious method of the Law, which permits and requires each of the opposing parties to summon the witnesses on the party’s own account[,] . . . naturally makes the witness himself a partisan’. More modern surveys continue to identify partisan bias as judges’ main concern about expert testimony, citing experts who appear to “abandon objectivity” and “become advocates” for the retaining party” (p. 147).

Translating Research into Practice

“While many clinicians cite introspection (i.e., looking inward in order to identify one’s own biases) as a primary method to counteract personal ideology, idiosyncratic responses to examinees, and other individual differences, research suggests that introspection is ineffective and may even be counterproductive. Thus, more disciplined changes to personal practice are needed. For example, when conducting evaluations for which well-validated structured tools exist, evaluators could commit to using such tools as a personal standard of practice. This would entail justifying to themselves (or preferably colleagues) why they did or did not use an available tool for a particular case. Practicing forensic evaluators could also use simple debiasing methods to counteract confirmation bias, such as the ‘consider-the-opposite’ technique in which evaluators ask themselves, ‘What are some reasons my initial judgment might be wrong?’ To increase personal accountability, evaluators could keep organized records of their own forensic opinions and instrument scores, or even help organize larger databases for evaluators within their own institution or locality. Using these personal data sets, evaluators might look for mean differences in their own instrument scores when retained by the prosecution versus the defense, or compare their own base rates of incompetency and insanity findings to those of their colleagues. Ambitious evaluators could even experiment with blinding themselves to the source of referral in order to counteract adversarial allegiance” (p. 149).

“Although individual evaluators can make many voluntary changes today in order to reduce the impact of unreliability and bias on their forensic opinions, other reforms require wider-ranging structural transformation. For example, state-level legislative action is needed to mandate more than one independent forensic opinion. Requiring more than one independent opinion is a powerful way to combat unreliability and bias by reducing the impact of any one evaluator’s error” (p. 149).

“Even slower to change than state legislation and infrastructure might be existing legal norms, such as judges’ current willingness to admit nonblinded, partisan experts. While authoritative calls to action like the NRC and PCAST reports may have some influence, most legal change only happens by the accretion of legal precedent, which is a slow and unpredictable process” (p. 149-150).

Other Interesting Tidbits for Researchers and Clinicians

“Foundational research should establish field reliability rates for various types of forensic evaluations in order to assess the current situation and gauge progress toward improvement. Only a handful of field reliability studies exist for a few types of forensic evaluations (i.e., adjudicative competency, legal sanity, conditional release), and virtually nothing is known about the field reliability of other types of evaluations, particularly civil evaluations” (p. 144-145).

“Given that increased standardization of forensic methods has the potential to ameliorate multiple sources of unreliability and bias described here, more investigation of forensic instruments, checklists, practice guidelines, and other methods of standardization is a second research priority. Some of this research should continue to focus on creating standardized tools for forensic evaluations and populations for which none are currently available, particularly civil evaluations such as guardianship, child protection, fitness for duty, and civil torts like emotional injury. Future research can also continue to seek improvements to the currently modest predictive accuracy of risk assessment instruments. However, given the current gap between the availability of forensic instruments and their limited use by forensic evaluators in the field, perhaps more pressing is research on the implementation of forensic instruments in routine practice. More qualitative and quantitative investigations of how instruments are administered in routine practice, why instruments are or are not used, and what practical obstacles evaluators encounter are needed. Without greater understanding of how instruments are (or are not) implemented in practice—particularly in rural or other under-resourced areas—continuing to develop new tools may not translate to their increased use in the field” (p. 148).

“A clear recommendation for improving evaluator reliability is that states without standards for the training and certification of forensic experts should adopt them, and states with weak standards (e.g., mere workshop attendance) should strengthen them. What is less clear, however, is what kinds and doses of training can improve reliability with the greatest efficiency. Drawing from extensive research in industrial and organizational psychology, credentialing requirements that mimic the type of work evaluators do as part of their job (e.g., mock reports, peer review, apprenticing) may foster professional competency better than requirements dissimilar to job duties (e.g., written tests). Given that both evaluators and certifying bodies have limited time and resources, research into the most potent ingredients of successful forensic credentialing is a third research priority” (p. 148-149).

Join the Discussion

As always, please join the discussion below if you have thoughts or comments to add!

Authored by Amanda Beltrani

Amanda Beltrani is a current graduate student in the Forensic Psychology Master’s program at John Jay College of Criminal Justice in New York. Her professional interests include forensic assessments, specifically criminal matter evaluations. Amanda plans to continue her studies in a doctoral program after completion of her Master’s degree.

Adversarial Allegiance May Be More Likely When Evidence Is Flawed

Evidence features may have an effect on the presence of adversarial allegiance in forensic evaluators. This is the bottom line of a recently published article in Law and Human Behavior. Below is a summary of the research and findings as well as a translation of this research into practice.

Featured Article | Law and Human Behavior | 2016, Vol. 40, No. 5, 524-535


Adversarial Allegiance: The Devil is in the Evidence Details, Not Just on the Witness Stand


Bradley D. McAuliff, California State University, Northridge
Jeana L. Arter, California State University, Northridge


This study examined the potential influence of adversarial allegiance on expert testimony in a simulated child sexual abuse case. A national sample of 100 witness suggestibility experts reviewed a police interview of an alleged 5-year-old female victim. Retaining party (prosecution, defense) and interview suggestibility (low, high) varied across experts. Experts were very willing to testify, but more so for the prosecution than the defense when interview suggestibility was low and vice versa when interview suggestibility was high. Experts’ anticipated testimony focused more on prodefense aspects of the police interview and child’s memory overall (negativity bias), but favored retaining party only when interview suggestibility was low. Prosecution-retained experts shifted their focus from prodefense aspects of the case in the high suggestibility interview to proprosecution aspects in the low suggestibility interview; defense experts did not. Blind raters’ perceptions of expert focus mirrored those findings. Despite an initial bias toward retaining party, experts’ evaluations of child victim accuracy and police interview quality were lower in the high versus low interview suggestibility condition only. Our data suggest that adversarial allegiance exists, that it can (but not always) influence how experts process evidence, and that it may be more likely in cases involving evidence that is not blatantly flawed. Defense experts may evaluate this type of evidence more negatively than prosecution experts because of negativity bias and positive testing strategies associated with confirmation bias.


expert testimony, adversarial allegiance, negativity bias, confirmation bias, suggestibility

Summary of the Research

“The use of expert witnesses in jury trials is common, especially when evidence is technical or difficult to understand. Experts are assumed to be neutral, objective parties who disseminate information to jurors; however, the adversarial system may bias experts in favor of the side that retained them (prosecution or defense). This tendency is known as adversarial allegiance.” (p. 524).

Adversarial allegiance means “that retention by or affiliation with a party in a legal proceeding may create bias that influences the expert’s thoughts, feelings, and behavior in favor of the retaining or affiliated party. Of course experts, like any other witness, can intentionally and consciously bias their testimony in favor of one party. However, such behavior would be unethical and violate established professional practice guidelines (e.g., Ethical Principles of Psychologists and Code of Conduct [American Psychological Association, 2010] and Specialty Guidelines for Forensic Psychology [American Psychological Association, 2013]). Furthermore, procedural safeguards such as cross-examination, opposing expert testimony, and even the threat of prosecution in extreme cases hopefully should minimize the presence of deliberate bias in the courtroom. As a result, researchers have concentrated primarily on the unintentional and unconscious bias stemming from adversarial allegiance” (p. 525).

“In the present study, we sought to advance our scientific understanding of adversarial allegiance. We used an experimental paradigm to randomly assign a previously unstudied population (witness suggestibility experts) from across the United States to conditions that systematically varied the retaining party in a simulated child sexual abuse case. We also manipulated certain features of the evidence—specifically, whether the police interview of the alleged child victim was low or high in suggestibility—to determine whether this variable moderated adversarial allegiance. No other published research has examined how evidence features might interact with retaining party to influence adversarial allegiance. Identifying the mechanisms underlying adversarial allegiance, the researchers note, “is a crucial next step in understanding allegiance and, in turn, intervening to reduce allegiance” (p. 525).

The current study manipulated the suggestibility evidence provided to each group. “Suggestibility is the extent to which certain cognitive, social, and developmental factors influence an individual’s ability to encode, store, retrieve, and report an event. Scholars have studied suggestibility since the early 1900s; however, an unprecedented surge in research occurred after extreme allegations of child sexual abuse in preschool daycare settings surfaced in the early 1990s (e.g., McMartin Preschool case in California, Kelly Michaels case in New Jersey). Since that time, our scientific understanding of how the accuracy of memory can be influenced by suggestive questions has increased dramatically. For example, scholars in the field know that younger (vs. older) children answering leading (vs. open-ended questions) from a high (vs. low) authority interviewer are less accurate” (p. 525).

“Experts may impart their knowledge of witness suggestibility and interview protocols to jurors in court. Although jurors understand age-related trends in suggestibility, they lack crucial knowledge about how factors such as leading questions and interviewer authority can increase suggestibility and reduce accuracy. Expert testimony on these issues has been shown to improve jurors’ understanding and decision-making” (p. 525).

“The need for and helpfulness of expert testimony on suggestibility make it a fitting backdrop to examine the potential effects of adversarial allegiance. Do experts in a child sexual abuse case selectively focus their testimony on aspects of the police interview that favor the retaining party? Do prosecution-retained experts evaluate child accuracy and interview quality more favorably than experts retained by the defense or vice versa? Our study is the first to provide answers to these important questions” (p. 525-526).

“As predicted, experts asked by the prosecution were more willing to testify when interview suggestibility was low than experts asked by the defense and vice versa when interview suggestibility was high. This finding makes sense: Experts should be more willing to testify when they have something to say and believe their testimony will help jurors understand the evidence. Presumably this would be the case for prosecution experts reviewing a low suggestibility interview (“The interview was sound and does not raise concerns about the child’s accuracy”) and for defense experts reviewing a high suggestibility interview (“The interview was unsound and raises concerns about the child’s accuracy”)… Experts in our study may have perceived their testimony as being more relevant to the case and more helpful to jurors when the evidence favored the party soliciting their testimony than when it did not” (p. 531).

“Experts focused more on prodefense aspects of the case overall, but favored the retaining party only when interview suggestibility was low. Blind raters detected this bias. These results did not support our crossover interaction hypothesis. Prosecution-retained experts were more proprosecution (both in terms of the number and proportion of statements reported) and defense-retained experts were more prodefense (proportion of statements only) when interview suggestibility was low but not high. This evidence of adversarial allegiance is consistent with previous research” (p. 531).

Translating Research into Practice

“One potential theoretical explanation for the adversarial allegiance in our study is confirmation bias, which refers to “an inclination to retain, or a disinclination to abandon, a currently favored hypothesis.” A key determinant in this process is information gathering and assimilation. Wason’s “rule discovery” paradigm demonstrated that people engage in positive testing strategies in which they search for evidence that confirms, rather than disconfirms, a current belief.” (p. 531).

“Much like the clinicians and medical students in previous research, experts in our study engaged in a positive testing strategy consistent with confirmation bias. Yet clinical and medical students were provided a specific hypothesis to test (i.e., whether a patient had a particular medical or psychological condition), but experts in our study were not—they simply were asked by the prosecution or defense to review case materials and testify. This highlights a vexing aspect of adversarial allegiance that we touched on in the Introduction. Experts appear to have developed their own hypotheses and implemented a positive testing strategy that was influenced by retaining party even though they were not explicitly instructed to do so. This effect cannot be attributed to preexisting individual differences (recall experts were randomly assigned to condition and there was no systematic difference in how often they had testified for the prosecution or defense in the past)” (p. 531).

“Yet confirmation bias alone cannot entirely explain the adversarial allegiance that emerged in our study. Recall interview suggestibility moderated the effects of retaining party on what aspects of the police officer’s interview and the child’s memory that experts reported they would focus on if called to testify in the case… These findings are consistent with a negativity bias or a “bad is stronger than good” effect that researchers have observed in a variety of judgment and information processing tasks. In essence, negative stimuli attract more attention, receive greater weight in evaluations, and are recalled more frequently than positive stimuli” (p. 532).

“People’s penchant for negative information helps explain why experts disproportionately focused on prodefense aspects of the case. Prodefense is synonymous with unreliable evidence, and in our simulated child sexual abuse case, the key evidence against the defendant was the police officer’s interview and the child’s memory. Experts retained by both sides appear to have been naturally inclined to focus on weaknesses rather than strengths in how the police officer interviewed the child victim and what she said in response. This negativity bias may have been enhanced by the fact that it is easier to define, and therefore, pinpoint examples of what constitutes a bad versus good interview. A single flaw (inadequate ground rules, no narrative practice, or excessive direct questions) can compromise the quality of an entire interview, but an entire interview must be practically flawless to be considered good by some experts” (p. 532).

“With respect to the legal community, our results suggest that overly simplistic conclusions about whether adversarial allegiance exists and why are dangerous. In reality, this phenomenon is quite complex and depends on myriad factors. Based on our data, we know that the evidence features and the type of measures matter; however, more work is needed before drawing definitive conclusions for legal professionals. Tentatively we can suggest to judges and attorneys that adversarial allegiance exists, that it can (but not always) influence how experts process evidence, and that it may be more likely in cases involving evidence that is not blatantly flawed. What is striking about this conclusion is that from a statistical standpoint, experts are more likely to encounter evidence that rests at the middle of the quality distribution than either extreme end. Completely good or completely bad police interviews of children are much less common than interviews that are “sort of” good or bad. From this perspective, adversarial allegiance is probably more common than previously thought” (p. 533).

“That said, we must not overlook that experts in our study were able to correctly distinguish between a low versus high suggestibility police interview of a child and that adversarial allegiance did not significantly influence their evaluations of the child’s accuracy and quality of the police interview. Experts’ understanding of witness suggestibility and jurors’ lack thereof demonstrates that expert testimony on these issues should satisfy the helpfulness requirement of FRE Rule 702 and, therefore, be admissible in court. Courts that routinely disallow witness suggestibility expert testimony on the grounds that it is not helpful to jurors would be wise to reconsider their reasoning accordingly” (p. 534).

Other Interesting Tidbits for Researchers and Clinicians

“Our adversarial allegiance results have implications for the social scientific and legal communities. No other published research has examined how evidence features might interact with retaining party to influence adversarial allegiance. We did and observed that adversarial allegiance influenced experts only when the evidence did not contain egregious errors. In this sense, the devil is in the evidence details, not just on the witness stand” (p. 533).

“Future research examining different features of other types of evidence and experts with varying degrees of prior testimony for the prosecution, defense, or both will help advance the state of social science on adversarial allegiance. Our study also suggests that the distinction between process- and outcome-oriented variables is important. Focusing exclusively on one or the other in our study would have dramatically changed the conclusions we drew. Researchers should strive to include both types of variables in future work so that we can make more sophisticated conclusions about the effects of adversarial allegiance on how experts examine evidence, as well as what they conclude” (p. 533).

Join the Discussion

As always, please join the discussion below if you have thoughts or comments to add!

Authored by Amanda Reed

Amanda L. Reed is a first year student in John Jay College of Criminal Justice’s clinical psychology doctoral program. She is the Lab Coordinator for the Forensic Training Academy. Amanda received her Bachelor’s degree in psychology from Wellesley College and a Master’s degree in Forensic Psychology from John Jay College of Criminal Justice. Her research interests include evaluator bias and training in forensic evaluation.