When assessing risk for sexual recidivism, actuarial tools developed on relevant local samples are recommended over professional judgment and global actuarial tools. This is the bottom line of a recently published article in Law and Human Behavior. Below is a summary of the research and findings as well as a translation of this research into practice.
Featured Article | Law and Human Behavior | 2018, Vol. 42, No. 3, 269–279
The Home-Field Advantage and the Perils of Professional Judgment: Evaluating the Performance of the Static-99R and the MnSOST-3 in Predicting Sexual Recidivism
Grant Duwe, Minnesota Department of Corrections, St. Paul, Minnesota
Michael Rocque, Bates College
When sex offenders in Minnesota are assigned risk levels prior to their release from prison, correctional staff frequently exercise professional judgment by overriding the presumptive risk level per an offender’s score on the Minnesota Sex Offender Screening Tool – 3 (MnSOST-3), a sexual recidivism risk-assessment instrument. These overrides enabled us to evaluate whether the use of professional judgment resulted in better predictive performance than did reliance on “actuarial” judgment (MnSOST-3). Using multiple metrics, we also compared the performance of a home-grown instrument (the MnSOST-3) with a global assessment (the revised version of the Static-99 [Static-99R]) in predicting sexual recidivism for 650 sex offenders released from Minnesota prisons in 2012. The results showed that use of professional judgment led to a significant degradation in predictive performance. Likewise, the MnSOST-3 outperformed the Static-99R for both sexual recidivism measures (rearrest and reconviction) across most of the performance metrics we used. These results imply that actuarial tools and home-grown tools are preferred relative to those that include professional judgment and those developed on different populations.
Keywords: risk assessment, recidivism, sex offender, MnSOST-3, Static-99R
Summary of the Research
“Meta-analyses have indicated the average recidivism rate for sex offenders tends to be around 13% within 4–5 years, which is lower than estimates made by the public. This does not mean, of course, that sex offenders necessarily represent a low threat to public safety, because sexual offending is often seen as more dangerous and potentially damaging than are other types of criminal acts. Yet, not all sex offenders are created equally, for some are more at risk for sexual recidivism than are others.” (p. 269)
“Because sex offenders do not represent a monolithic class of high-risk offenders but rather vary tremendously with respect to recidivism risk, assessing their sexual recidivism risk is important for guiding treatment strategies and improving public safety. Given that research has demonstrated that clinical judgment does a poor job in predicting recidivism, a number of actuarial risk-assessment tools have been created specifically to classify sex offenders. […] Although risk-assessment tools have long been utilized, the ongoing revisions to the primary tools available suggest they are works in progress. Among the unresolved issues within the sex offender risk-assessment literature, there are two in particular that have received relatively little empirical scrutiny to date. First, even though it is now generally accepted that actuarial instruments outperform clinical judgment in predicting recidivism, the question of whether clinical judgment is a useful supplement to actuarial tools remains open. […] Second, it is unclear whether tools developed and validated specifically for one population are appropriate or as effective for other populations.” (p. 269–270)
“To address these questions, we analyzed sexual recidivism outcomes over a 4-year follow-up period for 650 sex offenders who had been scored on both the Static-99R and the Minnesota Sex Offender Screening Tool–3 prior to their release from Minnesota prisons in 2012. […] Although most of the sex offenders in our sample received a presumptive risk level according to their MnSOST-3 score, MnDOC [Minnesota Department of Corrections] staff can override the MnSOST-3 and assign a different risk level based on their professional judgment. The presence of these overrides enabled us to assess whether the use of professional judgment, in addition to actuarial tools, increases the accuracy of classification decisions. Moreover, because the 650 offenders were each assessed on the Static-99R and the MnSOST-3, we compared the predictive performance of these two instruments to determine whether there is a home-field advantage in sex offender risk assessment. Finally, we carried out a comprehensive assessment of predictive performance by using six different metrics.” (p. 270)
“Research has shown that clinical observations are relatively ineffective in discriminating between those who present higher from lower risk of reoffending. Studies evaluating the performance of actuarial tools and unguided clinical observation have tended to indicate clinical observation degrades predictive ability. […] In analyses of whether professional overrides improve predictive performance, research has also suggested actuarial tools work best without such changes. […] Although actuarial instruments generally outperform clinical judgment, their overall performance in predicting recidivism has varied widely across validation studies. Therefore, the question remains as to whether clinical judgment remains a useful tool for practitioners in the face of uncertainty or when information not considered by actuarial instruments is available. […] Some have suggested that due to the highly political nature of sex offender management, as well as the highly variable nature of the population, some degree of professional judgment is needed. Others, however, have suggested that risk-assessment approaches using actuarial tools often fail to translate to risk reduction. […] Whether some degree of “judgment” is necessary or even practical as a supplement to actuarial tools has not been determined.” (p. 270)
“Prior to their release from prison, sex offenders in Minnesota are assigned risk levels, which, in turn, determine the extent to which the community will be notified. Prisoners subject to predatory offender registration are assigned a risk level prior to their release from prison by an End of Confinement Review Committee (ECRC), which is composed of the prison warden or treatment facility head where the offender is confined, a law enforcement officer, a sex offender treatment professional, a caseworker experienced in supervising sex offenders, and a victim services professional. Following the ECRC meetings, sex offenders are assigned a Level 1 (lower risk), Level 2 (moderate risk), or Level 3 (higher risk). […] Before receiving a risk-level assignment from ECRCs, offenders are assessed for sexual recidivism risk by MnDOC staff from the Risk Assessment/Community Notification (RACN) unit. […] In assigning risk levels, ECRCs consider scores from actuarial risk-assessment tools as well as additional factors that ostensibly increase or decrease the risk of reoffense (e.g., an offender’s stated intention to reoffend following release or a debilitating illness or physical condition). As a result, ECRCs may override the risk level suggested by the risk-assessment tool. […] ECRCs overrode the MnSOST-3’s presumptive risk level in roughly half the cases involving offenders released from prison in 2012.” (p. 270–271)
“Actuarial tools, which draw upon a combination of empirically informed measures to create an overall risk score, can provide both absolute and relative risk assessments of offenders. Relative risk assessment simply provides information concerning whether an individual is more or less likely to reoffend than are others. Absolute risk assessment, on the other hand, provides an estimate of how likely it is that the individual will reoffend within a specific period of time. […] Estimates of absolute recidivism risk, however, are influenced by the base rate observed within the offender sample used to develop an instrument. […] In addition to the base rate, other differences between a tool’s development sample and the population on which the instrument is administered could potentially affect predictive validity. […] [It is imperative] to ensure tools are effective in populations outside of those in which they were developed.” (p. 271)
“One of the earlier actuarial tools developed was the MnSOST, which was updated to the MnSOST-3. […] In 2012, Duwe and Freske (2012) significantly revised the MnSOST–R with their development of the MnSOST-3. The sample used to develop the MnSOST-3 consisted of 2,535 sex offenders released from Minnesota prisons. […] The most popular tool in North America among criminal justice agencies is the Static-99, developed in the 1990s and updated to its Static-99R version. […] Originally developed using data from samples of sex offenders in Canada and the United Kingdom, the Static-99 is a “global” risk-assessment instrument that is the most widely used around the world. […]” (p. 276, 271–272)
“Our overall sample consists of 650 sex offenders released from Minnesota prisons in 2012 who had been scored on both the MnSOST-3 and the Static-99R. […] In comparing professional judgment with actuarial assessments in predicting recidivism, we used a subsample of 441 cases from the overall sample of 650 offenders. […] The predicted outcome in this study is sex offense recidivism, which we measured as a rearrest and reconviction. Consistent with the development of the MnSOST-3, we measured recidivism over a 4-year follow-up period from the date of the offender’s release from prison in 2012. Recidivism data were collected on offenders through December 31, 2016.” (p. 272)
“Among the 650 sex offenders in this study, 26 (4.0%) were rearrested for a new sex offense within 4 years of their release from prison in 2012. Of the 26 who were rearrested, 13 (2.0% of the 650) were reconvicted.” (p. 273)
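To make the base rates quoted above concrete, here is a minimal Python sketch of the arithmetic. The counts (650, 26, 13) come from the quoted passage. The "trivial rule" comparison is an added illustration and not an analysis from the article; it only shows why raw accuracy says little when the outcome is this rare.

```python
# Counts quoted above from the article (p. 273).
TOTAL_RELEASED = 650
REARRESTED = 26
RECONVICTED = 13


def base_rate(events: int, total: int) -> float:
    """Proportion of the sample that experienced the outcome."""
    return events / total


rearrest_rate = base_rate(REARRESTED, TOTAL_RELEASED)
reconviction_rate = base_rate(RECONVICTED, TOTAL_RELEASED)

# Illustrative assumption (not from the article): a rule that predicts
# "no rearrest" for everyone is correct for every non-recidivist.
trivial_accuracy = (TOTAL_RELEASED - REARRESTED) / TOTAL_RELEASED

print(f"Rearrest base rate:     {rearrest_rate:.1%}")
print(f"Reconviction base rate: {reconviction_rate:.1%}")
print(f"Accuracy of predicting no rearrest for anyone: {trivial_accuracy:.1%}")
```

With outcomes this rare, a rule that flags no one is right 96% of the time while identifying none of the people who were rearrested, which is one reason a single accuracy figure gives an incomplete picture.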
“This study directly compared the MnSOST-3 and the Static-99R within a sample of Minnesota sex offenders who were scored with each tool. Findings demonstrated that the MnSOST-3 performed better than did the Static-99R on virtually all the metrics we used for both measures of sexual recidivism. Moreover, we examined the impact of professional judgment or clinical override on classification decisions by comparing the performance of presumptive and assigned risk levels in predicting sexual recidivism. If the ECRC overrides, which are professional judgment supplements to the actuarial tool, add incremental predictive validity, this would be evidence of the value of professional judgment. However, our results indicated unequivocally that clinical judgment in the form of overrides decreased predictive performance, which offers additional evidence that empirically based actuarial tools are superior to professional judgment.” (p. 276)
“It is interesting that the literature seems clear that professional judgment performs worse than do actuarial methods irrespective of the background of the professional making the observation or whether that judgment is structured or unstructured. This is true even for clinical judgment used in combination with actuarial tools. Some research has noted that raters are unfamiliar with or do not use base rate information appropriately in assigning risk. Another possibility is that judgment, whether structured or not, necessarily involves a higher degree of subjectivity than do actuarial measures and therefore are poorer in terms of prediction. Finally, it may be the case […] that clinical judgment often utilizes factors that are not related to recidivism.” (p. 276)
Translating Research into Practice
“Our study holds several important implications for research, policy, and practice. […] Given that the MnSOST-3 outperformed the Static-99R for our sample of Minnesota sex offenders, the results suggest local instruments may have a home-field advantage. To be sure, there are differences between the two instruments in terms of the items included and the classification methods used to develop the tools. In fact, to better demonstrate whether local instruments have a home-field advantage over global assessments, future research should attempt to more effectively isolate the effects of using a customized assessment compared to an imported instrument. Still, the evidence presented here suggests there may be value in applying an instrument to the same, or at least similar, population on which it was developed and validated.” (p. 277)
“In our view, home-grown instruments developed and validated within a particular population are the best option when considering tools for that population. Of course, many jurisdictions will not have a validated actuarial tool that was customized specifically for their own offender populations. In that case, universal tools (i.e., those developed using several populations, such as the Static-99 family) may be a good option, although such tools should be developed and validated on samples that are truly universal. Put another way, the population on which an instrument is being used should be very similar to the one on which the assessment was developed and validated. When a global instrument is used, it cannot be assumed the tool will deliver the same performance for a different assessment population. […] To understand whether a particular tool is effective with an agency’s population, one must evaluate the tool’s predictive performance on that population.”
“Our findings provide one more “nail in the coffin” for the value of clinical judgment in making recidivism predictions. Although some evidence exists that certain factors (dynamic ones in particular) may improve tools like the Static-99, the vast majority of empirical research has demonstrated that actuarial tools significantly outperform professional judgment. This does not mean clinical judgment is not important for the purposes of guiding treatment. Rather, when sex offenders are classified for recidivism risk-assessment purposes, actuarial tools should be the preferred method.” (p. 277)
“Given the consistently demonstrated superiority of actuarial assessments in predicting recidivism, we suggest it may be prudent to limit the extent to which professional judgment is used. Reducing the use of clinical judgment may involve restricting not only the types of cases in which overrides would be admissible but also how much an override would be allowed to deviate from an actuarial assessment. […] To develop guidelines that provide greater structure and clarity on when overrides are permissible, future research is needed to examine the conditions under which clinical judgment actually improves classification decisions or, at a minimum, does no worse than do actuarial assessments.” (p. 277)
Other Interesting Tidbits for Researchers and Clinicians
“Existing research on the validation of sex offender risk-assessment tools has often relied on a single metric, namely, the AUC. As we noted earlier, the AUC has its strengths, but it also has some weaknesses. We suggest that future validation research begin using alternative measures of predictive discrimination such as Hand’s (2009) H measure and the precision-recall curve. But given that predictive discrimination addresses only one dimension of predictive validity, metrics that assess accuracy and calibration should also be used to provide a more comprehensive evaluation of predictive performance. As this study illustrates, accuracy metrics are informative for imbalanced data sets so long as there are at least some predicted positives in the data set. Moreover, if researchers and practitioners must rely on a single metric, we suggest that either the SAR or SHARP statistics would be preferable because both tap into multiple dimensions of predictive validity.” (p. 277)
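For readers less familiar with these metrics, the sketch below shows one common way to compute the AUC (as the share of recidivist/non-recidivist pairs in which the recidivist has the higher score) along with precision and recall at a cutoff. The scores, labels, and thresholds are hypothetical values invented for illustration. They are not data from the study, and this is not the authors' code.

```python
from itertools import product


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correct ranking.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


def precision_recall(scores, labels, threshold):
    """Precision and recall when scores >= threshold are predicted positive.

    Precision is None when nothing is predicted positive (it is undefined).
    """
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y == 1)
    fp = sum(1 for p, y in zip(predicted, labels) if p and y == 0)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else None
    recall = tp / (tp + fn) if (tp + fn) else None
    return precision, recall


# Hypothetical risk scores and outcomes (1 = recidivated), illustration only.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]

print(auc(scores, labels))                    # discrimination
print(precision_recall(scores, labels, 0.6))  # some predicted positives
print(precision_recall(scores, labels, 1.0))  # no predicted positives
```

The last call illustrates the caveat in the quotation: when a cutoff produces no predicted positives, precision is undefined, so metrics of this kind carry no information for that cutoff.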
“The AUC values for both the Static-99R and MnSOST-3.1 were lower in comparison to what most of the existing research has reported for either instrument. Much of this research, as we indicated earlier, has consisted of assessments that were scored for research purposes. In this study, we used assessments that had been scored by correctional staff for operational purposes, which provide what is arguably a truer test of predictive performance. Compared to field assessments, those administered strictly for the sake of research may yield overly optimistic estimates of predictive performance due to more favorable conditions in which raters are likely to have had more recent, thorough training. To provide a more realistic estimate of how sex offender risk-assessment tools perform in practice, future research should begin relying more on assessments performed by field staff. In addition, the results suggest that caution may be warranted in using an instrument whose predictive performance has yet to be evaluated on real-world assessments.” (p. 277)
“Due to several limitations, however, these findings should be regarded as somewhat preliminary. First, because our study was confined to sex offenders from a single jurisdiction, it is unclear the extent to which the findings are generalizable. Second, the sample we used was relatively small (N = 650), and it was limited to releases over one calendar year. Third, similar to the case in prior research, the better findings for the MnSOST-3 may reflect an “allegiance effect” in which its scoring and use by MnDOC staff has been more consistent with its design in comparison to the Static-99R.” (p. 276)
Join the Discussion
As always, please join the discussion below if you have thoughts or comments to add!
Authored by Kseniya Katsman
Kseniya Katsman is a Master’s student in the Forensic Psychology program at John Jay College of Criminal Justice. Her interests include forensic application of dialectical behavior therapy, cultural competence in forensic assessment, and risk assessment, specifically suicide risk. She plans to continue her education and pursue a doctoral degree in clinical psychology.