Addressing Trauma via Juvenile Probation Officers’ Treatment Planning

Juvenile probation officers recognize information about trauma exposure and posttraumatic stress symptoms, but they do not prioritize that information as a rehabilitation target during case planning. This is the bottom line of a recently published article in Law and Human Behavior. Below is a summary of the research and findings as well as a translation of this research into practice.

Featured Article | Law and Human Behavior | 2018, Vol. 42, No. 4, 369-384

Juvenile Probation Officers’ Evaluation of Traumatic Event Exposures and Traumatic Stress Symptoms as Responsivity Factors in Risk Assessment and Case Planning

Authors

Evan D. Holloway Fordham University
Keith R. Cruise Fordham University
Samantha L. Morin Fordham University
Holly Kaufman Fordham University
Richard D. Steele Pennsylvania Juvenile Court Judges’ Commission (JCJC), Harrisburg, Pennsylvania

Abstract

Juvenile probation officers (JPOs) are increasingly using risk/needs assessments to evaluate delinquency risk, identify criminogenic needs and specific responsivity factors, and use this information in case planning. Justice-involved youth are exposed to traumatic events and experience traumatic stress symptoms at a high rate; such information warrants attention during the case planning process. The extent to which JPOs identify specific responsivity factors, in general, and trauma history, specifically, when scoring risk/need assessments is understudied. In the current study, 147 JPOs reviewed case vignettes that varied by the adolescents’ gender (male vs. female), traumatic event exposure (present vs. absent), and traumatic stress symptoms (present vs. absent), and then scored the YLS/CMI and developed case plans based on that information. JPOs who received a vignette that included trauma information identified a higher number of trauma-specific responsivity factors on the YLS/CMI. Despite an overall high needs match ratio (57.2%), few JPOs prioritized trauma as a target on case plans. The findings underscore the importance of incorporating trauma screening into risk/needs assessment and case planning.

Keywords

juvenile justice, responsivity, risk assessment, RNR, trauma

Summary of the Research

“Approximately 1.5 million youth under the age of 18 are arrested each year. Regardless of whether they are detained or released, the most common disposition in the juvenile justice system is supervised probation in the community. Whether immediately following disposition or post release from an out-of-home placement, many justice-involved youth are supervised by juvenile probation officers (JPOs) in the community. JPOs develop individualized case plans that guide specific case management and supervision strategies as well as service referrals. Increasingly, case plans are developed based on the results of structured risk assessment tools that facilitate identification of criminogenic needs (e.g., educational difficulties, unstructured leisure time) or impaired functioning (e.g., adverse living conditions, mental health problems)” (p. 369).

“Case planning should also account for current mental health symptoms given converging evidence of the elevated prevalence of mental health disorders among justice-involved youth. Often, justice-involved youth are screened for mental health concerns at probation intake and screening results inform referrals for subsequent mental health services. Researchers have begun to examine how JPOs analyze and translate results of risk assessment and mental health screening information into case plans and pre-dispositional reports. The focus of this research has been to identify how JPOs consider criminogenic needs when making case planning decisions; however, less attention has been paid to how JPO case plan decision making is affected by responsivity factors (e.g., learning styles, mental health symptoms). Thus, the aims of the current study were to examine how justice-involved youths’ histories of traumatic event exposure (TEE) and current traumatic stress symptoms (TSS) impacted JPO scoring of a risk assessment tool and whether such information was incorporated into case plans” (p. 369).

The specific aims of the current study were to “(a) examine whether information about TEE and TSS impacted JPO scoring of the YLS/CMI, (b) identify whether the presence of TEE and TSS affected summary risk ratings on the YLS/CMI, (c) identify whether the presence of TEE and TSS affected the number of criminogenic needs and trauma-based specific responsivity ratings on the YLS/CMI, and (d) examine how often JPOs considered TEE and TSS as a relevant target on case plans. These aims were addressed through a field-based study utilizing a large sample of JPOs who have received extensive training in scoring the YLS/CMI and using risk/needs assessment results to develop case plans. Mirroring the process employed in the participants’ annual booster training, a vignette was developed that manipulated the presence of TEE and TSS to examine the impact of this information on YLS/CMI scoring and case plan development” (p. 379).

“Results were mixed regarding the impact of TEE and TSS on YLS/CMI scoring and case plans. First, there were no differences in overall risk rating between participants who received a vignette describing TEE or TSS and those who received a vignette with no mention of trauma. Similarly, the number of high-risk needs identified on the YLS/CMI did not differ by vignette type. Second, JPOs who received a vignette describing a youth with TEE or TSS scored more trauma-relevant YLS/CMI responsivity factors. Therefore, JPOs correctly scored trauma-related information from the vignette on the corresponding section of the YLS/CMI. Although JPOs identified trauma-specific responsivity factors on the YLS/CMI, only three JPOs specifically targeted this information on the case plan. Likewise, JPOs who received a vignette with trauma information were not more likely to make a recommendation for further mental health evaluation or treatment” (p. 379).

Translating Research into Practice

“The presence of TEE and TSS did not result in elevated YLS/CMI risk scores. Although contrary to the hypothesis, this null result is in fact a positive indicator that information about history of traumatic events and specific trauma reactions do not bias ratings of criminogenic needs or inflate the overall risk level” (p. 380).

“TEE and TSS did not affect the number of high-risk needs documented on the YLS/CMI, the number of those needs targeted on the case plan, or the needs-match ratio. This finding is consistent with research demonstrating that TEE and TSS are associated with factors that interact or are related to criminogenic needs, but are not viewed as criminogenic needs on their own. The presence of TEE or TSS could have impacted the scoring of individual items comprising YLS/CMI domains…This finding has both positive and negative implications for case planning. On a positive note, the presence of TEE or TSS did not bias scoring of needs or inflate overall decisions about risk. However, when these same needs were elevated in the presence of TEE or TSS, the overall case plan results suggested that JPOs may be less likely to consider trauma as a driver of such behaviors and not consider to what extent these needs could be addressed through trauma-specific or trauma-informed interventions” (p. 380).
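
The needs-match ratio discussed above is simply the share of identified high-risk needs that are actually targeted on the resulting case plan. A minimal sketch, using hypothetical YLS/CMI-style need labels (the function name and example domains are illustrative, not the study's coding scheme):

```python
# Hypothetical sketch of a "needs match ratio": the proportion of
# high-risk criminogenic needs identified on a risk/needs assessment
# that are actually targeted on the resulting case plan.

def needs_match_ratio(identified_needs, targeted_needs):
    """Fraction of identified needs that appear on the case plan."""
    identified = set(identified_needs)
    if not identified:
        return 0.0
    return len(identified & set(targeted_needs)) / len(identified)

# Illustrative example: four needs identified, two targeted on the plan.
# As in the study's findings, a trauma-linked need can be identified on
# the assessment yet still go untargeted on the plan.
ratio = needs_match_ratio(
    ["education", "leisure", "peer relations", "substance use"],
    ["education", "substance use"],
)
```

On this toy example the ratio is 0.5; the study's observed average of 57.2% would mean that, on average, just over half of identified high-risk needs made it onto case plans.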

“The RNR model clearly delineates the relevance of specific responsivity factors when developing overall rehabilitation plans. Despite extensive training on the YLS/CMI and case planning, the fact that 30% of participants scored no specific responsivity factors suggests the need for additional training on the responsivity principle” (p. 380).

“JPOs very rarely targeted TEE or TSS for intervention on case plans; only three case plans specifically targeted trauma…This is a particularly troubling finding given the high rate of TEE and PTSD diagnoses among justice-involved youth” (p. 381).

“Just under half of the case plans included a recommendation for mental health services (counseling, therapy, or an evaluation), indicating that a number of JPOs recognized the importance of mental health services for the youth described in the vignette. However, JPOs in the TEE+ and TSS+ conditions were no more likely to recommend a general mental health evaluation or services, which indicates that the presence of trauma information did not result in a greater likelihood of mental health referrals” (p. 381).

“These findings suggest that youth under probation supervision who have a history of TEE, or are currently experiencing TSS, are unlikely to be referred or connected to trauma-specific services by their JPO. Given that youth rarely seek care on their own, such youth are unlikely to receive the potential benefits of trauma-specific assessment or treatment unless JPOs are able to identify trauma and develop case plans that support such referrals. These findings are generally consistent with previous research findings that JPOs are better able to identify externalizing symptoms (e.g., aggressive or delinquent behavior) than internalizing symptoms (e.g., sleep difficulties, negative mood, or PTSD). About 50% of JPOs included general mental health referrals in their case plans. This is a generally positive finding if it can be assumed that clinicians receiving that referral will accurately identify the specific mental health problems contributing to delinquent behavior. However, a generic mental health recommendation, in the presence of specific information about trauma-related symptoms, provides little guarantee that these symptoms will either be further evaluated or effectively treated. The purpose of rating responsivity factors on the YLS/CMI is to ensure that case planning and service referrals are properly informed and targeted. Thus, the fact that almost 30% of the current sample did not utilize the responsivity section of the YLS/CMI indicates that JPOs prioritize criminogenic needs over responsivity factors in case planning” (p. 381).

“Taken together, these findings suggest that JPOs may feel more comfortable deferring to clinicians to confirm a diagnosis and provide guidance as to how mental health information in general, and trauma information in particular, should guide case management practices. However, the relative lack of case plan strategies specifically targeting trauma in the presence of TEE and TSS is problematic; youth with this history will not be identified for further trauma screening and assessment, which represents a missed opportunity to link trauma-exposed youth to appropriate treatment services. This finding also has implications for JPOs’ role as gateway providers to mental health care among justice-involved youth with mental health concerns. For example, a recent study found that when justice-involved youth who screened positive for mental health concerns in juvenile detention were connected to mental health care, clients and their caregivers perceived their JPO as playing a gatekeeper role in their connection to care. Additionally, recent findings suggest that receipt of mental health treatment is associated with addressing more criminogenic needs, and when case plans addressed both areas, recidivism rates were lower compared with youth with only one or neither area addressed. These findings underscore the importance of identifying and targeting mental health-based specific responsivity factors on case plans and connecting youth to appropriately matched services” (p. 381).

Other Interesting Tidbits for Researchers and Clinicians

“Future research should examine how JPOs consider the relevance of mental health-related specific responsivity factors. It is possible that JPO orientation, whether JPOs see their role as being more aligned with law enforcement or rehabilitation efforts, impacts identification of mental health difficulties and prioritizing this information on case plans. Regardless of orientation, evidence suggests that JPOs who do not feel competent to address mental health concerns with youth on their caseload may be less likely to use strategies associated with treatment” (p. 381).

Join the Discussion

As always, please join the discussion below if you have thoughts or comments to add!

Authored by Amanda Beltrani

Amanda Beltrani is a current doctoral student at Fairleigh Dickinson University. Her professional interests include forensic assessments, professional decision making, and cognitive biases.

Locally- v. globally-developed actuarial tools and professional judgment in predicting sexual recidivism

When assessing risk for sexual recidivism, use of actuarial tools that were developed using relevant local samples—as opposed to professional judgment and global actuarial tools—is recommended. This is the bottom line of a recently published article in Law and Human Behavior. Below is a summary of the research and findings as well as a translation of this research into practice.

Featured Article | Law and Human Behavior | 2018, Vol. 42, No. 3, 269–279

The Home-Field Advantage and the Perils of Professional Judgment: Evaluating the Performance of the Static-99R and the MnSOST-3 in Predicting Sexual Recidivism

Authors

Grant Duwe, Minnesota Department of Corrections, St. Paul, Minnesota
Michael Rocque, Bates College

Abstract

When sex offenders in Minnesota are assigned risk levels prior to their release from prison, correctional staff frequently exercise professional judgment by overriding the presumptive risk level per an offender’s score on the Minnesota Sex Offender Screening Tool – 3 (MnSOST-3), a sexual recidivism risk-assessment instrument. These overrides enabled us to evaluate whether the use of professional judgment resulted in better predictive performance than did reliance on “actuarial” judgment (MnSOST-3). Using multiple metrics, we also compared the performance of a home-grown instrument (the MnSOST-3) with a global assessment (the revised version of the Static-99 [Static-99R]) in predicting sexual recidivism for 650 sex offenders released from Minnesota prisons in 2012. The results showed that use of professional judgment led to a significant degradation in predictive performance. Likewise, the MnSOST-3 outperformed the Static-99R for both sexual recidivism measures (rearrest and reconviction) across most of the performance metrics we used. These results imply that actuarial tools and home-grown tools are preferred relative to those that include professional judgment and those developed on different populations.

Keywords

risk assessment, recidivism, sex offender, MnSOST-3, Static-99R

Summary of the Research

“Meta-analyses have indicated the average recidivism rate for sex offenders tends to be around 13% within 4–5 years, which is lower than estimates made by the public. This does not mean, of course, that sex offenders necessarily represent a low threat to public safety, because sexual offending is often seen as more dangerous and potentially damaging than are other types of criminal acts. Yet, not all sex offenders are created equally, for some are more at risk for sexual recidivism than are others.” (p. 269)

“Because sex offenders do not represent a monolithic class of high-risk offenders but rather vary tremendously with respect to recidivism risk, assessing their sexual recidivism risk is important for guiding treatment strategies and improving public safety. Given that research has demonstrated that clinical judgment does a poor job in predicting recidivism, a number of actuarial risk-assessment tools have been created specifically to classify sex offenders. […] Although risk-assessment tools have long been utilized, the ongoing revisions to the primary tools available suggest they are works in progress. Among the unresolved issues within the sex offender risk-assessment literature, there are two in particular that have received relatively little empirical scrutiny to date. First, even though it is now generally accepted that actuarial instruments outperform clinical judgment in predicting recidivism, the question of whether clinical judgment is a useful supplement to actuarial tools remains open. […] Second, it is unclear whether tools developed and validated specifically for one population are appropriate or as effective for other populations.” (p. 269–270)

“To address these questions, we analyzed sexual recidivism outcomes over a 4-year follow-up period for 650 sex offenders who had been scored on both the Static-99R and the Minnesota Sex Offender Screening Tool–3 prior to their release from Minnesota prisons in 2012. […] Although most of the sex offenders in our sample received a presumptive risk level according to their MnSOST-3 score, MnDOC [Minnesota Department of Corrections] staff can override the MnSOST-3 and assign a different risk level based on their professional judgment. The presence of these overrides enabled us to assess whether the use of professional judgment, in addition to actuarial tools, increases the accuracy of classification decisions. Moreover, because the 650 offenders were each assessed on the Static-99R and the MnSOST-3, we compared the predictive performance of these two instruments to determine whether there is a home-field advantage in sex offender risk assessment. Finally, we carried out a comprehensive assessment of predictive performance by using six different metrics.” (p. 270)

“Research has shown that clinical observations are relatively ineffective in discriminating between those who present higher from lower risk of reoffending. Studies evaluating the performance of actuarial tools and unguided clinical observation have tended to indicate clinical observation degrades predictive ability. […] In analyses of whether professional overrides improve predictive performance, research has also suggested actuarial tools work best without such changes. […] Although actuarial instruments generally outperform clinical judgment, their overall performance in predicting recidivism has varied widely across validation studies. Therefore, the question remains as to whether clinical judgment remains a useful tool for practitioners in the face of uncertainty or when information not considered by actuarial instruments is available. […] Some have suggested that due to the highly political nature of sex offender management, as well as the highly variable nature of the population, some degree of professional judgment is needed. Others, however, have suggested that risk-assessment approaches using actuarial tools often fail to translate to risk reduction. […] Whether some degree of “judgment” is necessary or even practical as a supplement to actuarial tools has not been determined.” (p. 270)

“Prior to their release from prison, sex offenders in Minnesota are assigned risk levels, which, in turn, determine the extent to which the community will be notified. Prisoners subject to predatory offender registration are assigned a risk level prior to their release from prison by an End of Confinement Review Committee (ECRC), which is composed of the prison warden or treatment facility head where the offender is confined, a law enforcement officer, a sex offender treatment professional, a caseworker experienced in supervising sex offenders, and a victim services professional. Following the ECRC meetings, sex offenders are assigned a Level 1 (lower risk), Level 2 (moderate risk), or Level 3 (higher risk). […] Before receiving a risk-level assignment from ECRCs, offenders are assessed for sexual recidivism risk by MnDOC staff from the Risk Assessment/Community Notification (RACN) unit. […] In assigning risk levels, ECRCs consider scores from actuarial risk-assessment tools as well as additional factors that ostensibly increase or decrease the risk of reoffense (e.g., an offender’s stated intention to reoffend following release or a debilitating illness or physical condition). As a result, ECRCs may override the risk level suggested by the risk-assessment tool. […] ECRCs overrode the MnSOST-3’s presumptive risk level in roughly half the cases involving offenders released from prison in 2012.” (p. 270–271)

“Actuarial tools, which draw upon a combination of empirically informed measures to create an overall risk score, can provide both absolute and relative risk assessments of offenders. Relative risk assessment simply provides information concerning whether an individual is more or less likely to reoffend than are others. Absolute risk assessment, on the other hand, provides an estimate of how likely it is that the individual will reoffend within a specific period of time. […] Estimates of absolute recidivism risk, however, are influenced by the base rate observed within the offender sample used to develop an instrument. […] In addition to the base rate, other differences between a tool’s development sample and the population on which the instrument is administered could potentially affect predictive validity. […] [It is imperative] to ensure tools are effective in populations outside of those in which they were developed.” (p. 271)
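
The base-rate point above can be made concrete. One standard way to think about it (a textbook log-odds prior shift, offered here as an illustration, not as the article's method) is that an absolute risk estimate calibrated to a development sample's base rate must be rescaled when the local population's base rate differs, even though the relative ordering of offenders is untouched:

```python
import math

def shift_base_rate(p_dev, base_dev, base_local):
    """Standard log-odds prior shift (an illustration, not the article's
    method): rescale an absolute risk estimate calibrated to the
    development sample's base rate to a local population's base rate."""
    logit = lambda p: math.log(p / (1.0 - p))
    shifted = logit(p_dev) - logit(base_dev) + logit(base_local)
    return 1.0 / (1.0 + math.exp(-shifted))

# An offender estimated at 20% absolute risk in a development sample
# with a 10% base rate. Applied to a population with a 4% base rate
# (close to the rearrest rate reported in this study), the implied
# absolute risk drops well below 20%, while the relative ordering of
# offenders is unchanged.
local_risk = shift_base_rate(p_dev=0.20, base_dev=0.10, base_local=0.04)
```

This is why a tool can retain its relative (rank-order) validity in a new population while its absolute risk estimates are badly miscalibrated, which is the crux of the "home-field" concern.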

“One of the earlier actuarial tools developed was the MnSOST, which was updated to the MnSOST-3. […] In 2012, Duwe and Freske (2012) significantly revised the MnSOST–R with their development of the MnSOST-3. The sample used to develop the MnSOST-3 consisted of 2,535 sex offenders released from Minnesota prisons. […] The most popular tool in North America among criminal justice agencies is the Static-99, developed in the 1990s and updated to its Static-99R version. […] Originally developed using data from samples of sex offenders in Canada and the United Kingdom, the Static-99 is a “global” risk-assessment instrument that is the most widely used around the world. […]” (p. 276, 271–272)

“Our overall sample consists of 650 sex offenders released from Minnesota prisons in 2012 who had been scored on both the MnSOST-3 and the Static-99R. […] In comparing professional judgment with actuarial assessments in predicting recidivism, we used a subsample of 441 cases from the overall sample of 650 offenders. […] The predicted outcome in this study is sex offense recidivism, which we measured as a rearrest and reconviction. Consistent with the development of the MnSOST-3, we measured recidivism over a 4-year follow-up period from the date of the offender’s release from prison in 2012. Recidivism data were collected on offenders through December 31, 2016.” (p. 272)

“Among the 650 sex offenders in this study, 26 (4.0%) were rearrested for a new sex offense within 4 years of their release from prison in 2012. Of the 26 who were rearrested, 13 (2.0% of the 650) were reconvicted.” (p. 273)

“This study directly compared the MnSOST-3 and the Static-99R within a sample of Minnesota sex offenders who were scored with each tool. Findings demonstrated that the MnSOST-3 performed better than did the Static-99R on virtually all the metrics we used for both measures of sexual recidivism. Moreover, we examined the impact of professional judgment or clinical override on classification decisions by comparing the performance of presumptive and assigned risk levels in predicting sexual recidivism. If the ECRC overrides, which are professional judgment supplements to the actuarial tool, add incremental predictive validity, this would be evidence of the value of professional judgment. However, our results indicated unequivocally that clinical judgment in the form of overrides decreased predictive performance, which offers additional evidence that empirically based actuarial tools are superior to professional judgment.” (p. 276)

“It is interesting that the literature seems clear that professional judgment performs worse than do actuarial methods irrespective of the background of the professional making the observation or whether that judgment is structured or unstructured. This is true even for clinical judgment used in combination with actuarial tools. Some research has noted that raters are unfamiliar with or do not use base rate information appropriately in assigning risk. Another possibility is that judgment, whether structured or not, necessarily involves a higher degree of subjectivity than do actuarial measures and therefore are poorer in terms of prediction. Finally, it may be the case […] that clinical judgment often utilizes factors that are not related to recidivism.” (p. 276)

Translating Research into Practice

“Our study holds several important implications for research, policy, and practice. […] Given that the MnSOST-3 outperformed the Static-99R for our sample of Minnesota sex offenders, the results suggest local instruments may have a home-field advantage. To be sure, there are differences between the two instruments in terms of the items included and the classification methods used to develop the tools. In fact, to better demonstrate whether local instruments have a home-field advantage over global assessments, future research should attempt to more effectively isolate the effects of using a customized assessment compared to an imported instrument. Still, the evidence presented here suggests there may be value in applying an instrument to the same, or at least similar, population on which it was developed and validated.” (p. 277)

“In our view, home-grown instruments developed and validated within a particular population are the best option when considering tools for that population. Of course, many jurisdictions will not have a validated actuarial tool that was customized specifically for their own offender populations. In that case, universal tools (i.e., those developed using several populations, such as the Static-99 family) may be a good option, although such tools should be developed and validated on samples that are truly universal. Put another way, the population on which an instrument is being used should be very similar to the one on which the assessment was developed and validated. When a global instrument is used, it cannot be assumed the tool will deliver the same performance for a different assessment population. […] To understand whether a particular tool is effective with an agency’s population, one must evaluate the tool’s predictive performance on that population.”

“Our findings provide one more “nail in the coffin” for the value of clinical judgment in making recidivism predictions. Although some evidence exists that certain factors (dynamic ones in particular) may improve tools like the Static-99, the vast majority of empirical research has demonstrated that actuarial tools significantly outperform professional judgment. This does not mean clinical judgment is not important for the purposes of guiding treatment. Rather, when sex offenders are classified for recidivism risk-assessment purposes, actuarial tools should be the preferred method.” (p. 277)

“Given the consistently demonstrated superiority of actuarial assessments in predicting recidivism, we suggest it may be prudent to limit the extent to which professional judgment is used. Reducing the use of clinical judgment may involve restricting not only the types of cases in which overrides would be admissible but also how much an override would be allowed to deviate from an actuarial assessment. […] To develop guidelines that provide greater structure and clarity on when overrides are permissible, future research is needed to examine the conditions under which clinical judgment actually improves classification decisions or, at a minimum, does no worse than do actuarial assessments.” (p. 277)

Other Interesting Tidbits for Researchers and Clinicians

“Existing research on the validation of sex offender risk-assessment tools has often relied on a single metric—namely, the AUC. As we noted earlier, the AUC has its strengths, but it also has some weaknesses. We suggest that future validation research begin using alternative measures of predictive discrimination such as Hand’s (2009) H measure and the precision-recall curve. But given that predictive discrimination addresses only one dimension of predictive validity, metrics that assess accuracy and calibration should also be used to provide a more comprehensive evaluation of predictive performance. As this study illustrates, accuracy metrics are informative for imbalanced data sets so long as there are at least some predicted positives in the data set. Moreover, if researchers and practitioners must rely on a single metric, we suggest that either the SAR or SHARP statistics would be preferable because both tap into multiple dimensions of predictive validity.” (p. 277)
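
The authors' caution about the AUC on imbalanced outcomes is easy to demonstrate. The sketch below uses simulated data (not the study's data or the authors' code) with a base rate near the 4% rearrest rate reported here: the AUC, being rank-based, can look respectable even when precision at any workable threshold is low, which is exactly what a precision-recall view exposes.

```python
# Hypothetical simulation: why a single rank-based metric like the AUC
# can be misleading for an imbalanced outcome such as a ~4% base rate.
import random

random.seed(1)
n = 650
# Simulate a ~4% base rate and weakly informative risk scores.
outcomes = [1 if random.random() < 0.04 else 0 for _ in range(n)]
scores = [random.gauss(0.6 if y else 0.4, 0.15) for y in outcomes]

def auc(y, s):
    """AUC as the probability a random positive outscores a random negative."""
    pos = [si for yi, si in zip(y, s) if yi == 1]
    neg = [si for yi, si in zip(y, s) if yi == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def precision_recall_at(y, s, thresh):
    """Precision and recall when flagging everyone scoring above `thresh`."""
    tp = sum(1 for yi, si in zip(y, s) if yi == 1 and si > thresh)
    fp = sum(1 for yi, si in zip(y, s) if yi == 0 and si > thresh)
    fn = sum(1 for yi, si in zip(y, s) if yi == 1 and si <= thresh)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# With so few true positives, most flagged cases are false positives at
# any workable threshold, so precision stays low even when AUC is decent.
a = auc(outcomes, scores)
p, r = precision_recall_at(outcomes, scores, thresh=0.6)
```

This is also why the authors note that accuracy metrics remain informative on imbalanced data only "so long as there are at least some predicted positives."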

“The AUC values for both the Static-99R and MnSOST-3.1 were lower in comparison to what most of the existing research has reported for either instrument. Much of this research, as we indicated earlier, has consisted of assessments that were scored for research purposes. In this study, we used assessments that had been scored by correctional staff for operational purposes, which provide what is arguably a truer test of predictive performance. Compared to field assessments, those administered strictly for the sake of research may yield overly optimistic estimates of predictive performance due to more favorable conditions in which raters are likely to have had more recent, thorough training. To provide a more realistic estimate of how sex offender risk-assessment tools perform in practice, future research should begin relying more on assessments performed by field staff. In addition, the results suggest that caution may be warranted in using an instrument whose predictive performance has yet to be evaluated on real-world assessments.” (p. 277)

“Due to several limitations, however, these findings should be regarded as somewhat preliminary. First, because our study was confined to sex offenders from a single jurisdiction, it is unclear the extent to which the findings are generalizable. Second, the sample we used was relatively small (N = 650), and it was limited to releases over one calendar year. Third, similar to the case in prior research, the better findings for the MnSOST-3 may reflect an “allegiance effect” in which its scoring and use by MnDOC staff has been more consistent with its design in comparison to the Static-99R.” (p. 276)

Join the Discussion

As always, please join the discussion below if you have thoughts or comments to add!

Authored by Kseniya Katsman

Kseniya Katsman is a Master’s student in the Forensic Psychology program at John Jay College of Criminal Justice. Her interests include forensic application of dialectical behavior therapy, cultural competence in forensic assessment, and risk assessment, specifically suicide risk. She plans to continue her education and pursue a doctoral degree in clinical psychology.

The link between risk assessment and management: Not as straightforward as it seems

When applied to risk management, risk assessment tools are not sufficient in and of themselves: They should be considered in context of their implementation and potential utility. This is the bottom line of a recently published article in Law and Human Behavior. Below is a summary of the research and findings as well as a translation of this research into practice.

Featured Article | Law and Human Behavior | 2018, Vol. 42, No. 3, 181–214

Do Risk Assessment Tools Help Manage and Reduce Risk of Violence and Reoffending? A Systematic Review

Authors

Jodi L. Viljoen, Simon Fraser University
Dana M. Cochrane, Simon Fraser University
Melissa R. Johnson, Simon Fraser University

Abstract

Although it is widely believed that risk assessment tools can help manage risk of violence and offending, it is unclear what evidence exists to support this view. As such, we conducted a systematic review and narrative synthesis. To identify studies, we searched 13 databases, reviewed reference lists, and contacted experts. Through this review, we identified 73 published and unpublished studies (N = 31,551 psychiatric patients and offenders, N = 10,002 professionals) that examined either professionals’ risk management efforts following the use of a tool, or rates of violence or offending following the implementation of a tool. These studies included a variety of populations (e.g., adults, adolescents), tools, and study designs. The primary findings were as follows: (a) despite some promising findings, professionals do not consistently adhere to tools or apply them to guide their risk management efforts; (b) following the use of a tool, match to the risk principle is moderate and match to the needs principle is limited, as many needs remained unaddressed; (c) there is insufficient evidence to conclude that tools directly reduce violence or reoffending, as findings are mixed; and (d) tools appear to have a more beneficial impact on risk management when agencies use careful implementation procedures and provide staff with training and guidelines related to risk management. In sum, although risk assessment tools may be an important starting point, they do not guarantee effective treatment or risk management. However, certain strategies may bolster their utility.

Keywords

systematic review, risk assessment, violence, offending, risk management

Summary of the Research

“In the past several decades, researchers have developed over 400 different tools designed to assess risk of violence and offending. Professionals, such as psychologists, probation officers, nurses, psychiatrists, and police, have widely adopted these tools in as many as 44 countries. In addition, administrators and policymakers have created policies and, in some cases, laws mandating the use of tools. […] Risk assessment tools are also commonly used in forensic psychiatric facilities, general psychiatric hospitals, treatment programs, correctional centers, and in a variety of court evaluations, such as those involving the transfer of adolescents to adult court and the civil commitment of individuals who have sexually offended.” (p. 181)

“Broadly speaking, risk management refers to the process of planning and implementing strategies to help prevent violence and other forms of offending. It is carried out by a variety of professionals (e.g., psychologists, probation officers, nurses, police) and encompasses not only treatment (e.g., therapy), but also strategies such as supervision, case management, and placement decisions. […] There is widespread agreement, among risk assessment researchers, that risk management is a key goal of risk assessment. […] As a result, the field of risk assessment has evolved to focus increasingly on the management and reduction of risk, rather than solely its prediction. […] However, even though many risk assessment tools aim to help manage and reduce risk, it is unclear whether tools do, in fact, achieve this goal.” (p. 182)

“Although the mechanism between risk assessment and risk management is not especially well-articulated, the primary hypothesized mechanism appears to be twofold. First, it is thought that tools will increase professionals’ level of adherence to the risk-need-responsivity (RNR) model of offender treatment. Second, it is believed that this will, in turn, decrease offending. […] [Risk assessment] tools are viewed as a means by which to deploy interventions that are evidence-based and individually tailored, thereby avoiding a “one size fits all” approach.” (p. 182)

“Although many researchers believe that risk assessment tools can help manage and reduce risk, some researchers have noted that the impact of tools may be contingent upon other factors, such as the type of tool and whether the tool is followed through on to match individuals to appropriate interventions. Whereas some tools focus primarily on historical or static factors, such as history of offending, other tools include dynamic or modifiable factors, such as anger management difficulties (i.e., criminogenic needs). In addition, on some tools (i.e., actuarial tools), evaluators add up risk factors to generate a total score. On other tools (i.e., structured professional judgment tools), evaluators use their discretion to make a separate summary risk rating after considering the items and additional case-specific considerations. As a result of these variations, tools may differ in their utility for various risk management decisions.” (p. 183)

“Besides variation in item content, risk assessment tools also vary in terms of their validity or their ability to predict reoffending. Some tools have been found to significantly predict reoffending in multiple studies, with effect sizes falling in the moderate range. However, other risk assessment tools have poor predictive validity, or have not yet been tested at all. Presumably, unless a tool has adequate validity in predicting reoffending, it will have limited value for risk management.” (p. 183)

“Finally, tools vary in the extent to which they include a focus on risk management. Whereas most risk assessment tools do not include explicit instructions or guidance regarding how to manage risk, some tools aim to bridge risk assessment to risk management by providing greater structure and support for risk management, such as by including case management planning forms.” (p. 183)

“Not only does the utility of risk assessment depend on the nature of the tool, it could also depend on how professionals use and apply tools. In particular, as is true of any type of assessment, the value of risk assessment likely lies primarily in what happens after the assessment. Although risk assessment tools may be a starting point for treatment and risk management, they are not a treatment in and of themselves. As such, it may be unrealistic to expect that using a tool will help manage risk or reduce reoffending unless (a) appropriate treatments are, in fact, available, and (b) professionals meaningfully apply tools to match individuals to these treatments.” (p. 183)

“Despite the debate about whether some tools may be better suited for risk management than others, most risk assessment researchers appear to believe that tools can, in principle, aid in risk management. However, this viewpoint is not embraced by everyone. Some critics have expressed concern that risk assessment tools, in general, are not only ineffective in managing risk, they might even cause harm to patients and offenders. […] Furthermore, some critics have questioned the motives that underlie the adoption of risk assessment tools. […] Finally, critics have pointed out that despite claims that risk assessment tools help to manage risk, there is little evidence to support such assertions.” (p. 183–184)

“In sum, opinions about the value of risk assessment tools for risk management efforts range considerably. […] It is concerning that many assertions about the utility of tools (or lack of utility) have been offered without reference to research findings. One possible explanation for the lack of empirical grounding for these assertions is that relevant research simply does not exist, as of yet. However, another possibility is that existing research has not yet been adequately integrated into the literature due to a lack of comprehensive reviews.” (p. 184)

“To our knowledge, only one systematic review has examined the utility of risk assessment tools for risk management, and this review was not designed to focus on risk assessment per se. […] The goal of the present systematic review was to expand on [that] review. First, rather than focusing exclusively on psychiatric patients in acute care settings, we included a range of populations (e.g., patients, offenders) and settings (e.g., jails, forensic hospitals). Second, rather than restricting our review to RCTs, our review encompassed a variety of designs (e.g., RCTs, prepost studies). […] Third, instead of solely examining whether the use of risk assessment tools reduce violence, we also investigated their impact on professional practices. […] As such, we reviewed research on whether tools facilitate professionals’ adherence to the risk and need principles, as these are the hypothesized mechanism by which tools might help manage risk. We also examined whether professionals perceive tools as useful for risk management and whether they use tools to guide their risk management effort; tools are unlikely to be effective if professionals do not apply them or view them as useful. […] Our goal was to understand, more thoroughly, the pathway between risk assessment and risk management. Furthermore, to develop an agenda for future research, we reviewed studies on strategies to enhance the utility of risk assessment tools for risk management, such as staff training.” (p. 184)

“To examine our research questions, we chose to conduct a systematic review rather than a traditional literature review because systematic reviews are more transparent, comprehensive, and objective. […] To synthesize findings, we used a narrative approach. An empirical synthesis (i.e., meta-analysis) was neither feasible nor appropriate because our review included a wide range of designs (e.g., RCTs, surveys), populations (e.g., offenders, patients), tools (i.e., 34 different risk assessment tools), and outcomes.” (p. 184–185)

“In total, 73 studies met inclusion criteria. Sixteen of these studies were unpublished; nine of the unpublished studies were dissertations or theses and the remaining seven were reports by researchers, government, or other organizations. These studies included 31,551 offenders or patients, 10,002 professionals, and 34 risk assessment tools. Most tools included dynamic or modifiable factors (i.e., criminogenic needs; 76.5%, k = 26), and were validated (i.e., have been found to significantly predict violence or reoffending; 82.4%, k = 28).” (p. 187)

“If professionals do not “buy-in” to tools or perceive them as useful, they may not adequately utilize them. Thus, as an initial step, we examined professionals’ attitudes toward risk assessment tools. We found that, although some professionals held positive views about tools, in many studies, professionals had mixed views about the utility of tools for risk management (e.g., treatment planning, placement decisions). This is not particularly surprising; professionals often feel reluctant to adopt new assessment and intervention approaches even when these approaches have strong research support. Furthermore, manuals and training for risk assessment tools often focus on how to complete item ratings rather than how to apply the tool to risk management efforts. As such, professionals’ questions about the utility of tools may be understandable.” (p. 203)

“Not only did professionals have mixed views about the utility of tools for risk management, in most of the identified studies, the use of risk assessment tools for risk management was mixed. Specifically, although some professionals reported that they relied on tools to guide their risk management decisions (e.g., decisions about services or placements), others reported that they did not use tools, even when employers mandated their use. As such, these findings illustrate that risk assessments do not necessarily flow through to risk management efforts. Slippage might be more likely to occur when risk assessors do not have direct control over risk management decisions, but instead act as intermediaries to decision-makers (e.g., judges). In such cases, the application of tools to risk management may depend not only on evaluators’ use of tools, but also on whether subsequent decision-makers perceive tools as useful and relevant.” (p. 203)

“Despite the mixed application of tools to risk management efforts overall, match to the risk principle was moderate following the use of risk assessments tools. In a number of studies, high-risk individuals were referred to more services than low-risk individuals. They were also more likely to receive secure placements. However, most studies did not have a comparison group of individuals who did not receive risk assessments. As such, it is difficult to determine if such findings are attributable to the use of the tool; some research suggests that high-risk individuals receive more intensive risk management strategies than low-risk individuals even when a risk assessment tool is not used.” (p. 203)

“Contrary to the positive findings relating to the risk principle, match to the need principle appeared limited following the use of risk assessment tools (match was rated as mixed or low in all but one study). This means that many of offenders’ and patients’ needs remained unaddressed even when risk assessment tools were used. This could indicate that professionals are not paying adequate attention to risk assessments when they are making decisions about services. Alternatively, these low rates of overall match could occur because professionals opt to focus on only a couple “high impact” needs at a given time, as it may not be feasible to simultaneously target all needs. Another possibility is that low rates of match occur because services to address needs are simply not available. Clearly, identifying needs has limited value if there are no viable means by which to address these needs. Finally, low match to the needs principle may, in part, arise from limited compliance; offenders and patients may not necessarily attend or engage in the services to which they are referred.” (p. 203–204)

“In light of the preceding findings, it is perhaps not surprising that evidence on whether tools reduce violence and offending was inconsistent. Although two RCTs found that the use of the BVC resulted in decreases in violence, another RCT did not find significant changes in violence or other criminal incidents when another tool, the START, was implemented. In addition, although two pre-post studies found that the implementation of risk assessment tools was associated with decreases in violence or offending, the bulk of pre-post studies (k = 7) did not. Thus, at present, there is insufficient evidence to conclude that tools reduce violence or offending. One possible explanation for these findings is that it may be unrealistic to expect risk assessment tools to directly reduce violence or offending. […] Another possible explanation for the modest findings is that the effectiveness of tools might vary by factors such as the setting, population, or tool. […] It is also possible that some tools may be more effective than other tools.” (p. 204)

“Overall, our results suggest that there is a disconnect between the theory of risk assessment and what actually happens in real-world practice. However, although research is limited, preliminary evidence suggests that it may be possible to enhance professionals’ risk management practices by combining the use of risk assessment tools with approaches such as risk management training and structured risk management guidelines. Though such approaches may not directly reduce violence and offending, they have been found to improve match to the risk and need principles, thus providing potential avenues by which to enhance the utility of tools. Sound implementation practices, such as policies and protocols to guide the use of tools, are also critical.” (p. 204)

“Overall, this review suggests that even though risk assessment tools may be a starting point for risk management, they are not sufficient in and of themselves. Although some studies found positive results, indicating that tools might help achieve better match to the risk principle or even reductions in violence in some circumstances, the findings also revealed that “there is no guarantee that the results of these protocols flow through to front line service provision”. […] To ensure that risk assessment instruments are optimally used and do not degenerate into merely a bureaucratic exercise, further efforts are needed. In particular, rather than focusing exclusively on predictive validity studies and the development of new tools, researchers need to pay greater attention to how tools are applied to guide real-world decisions, such as by testing the pathways between risk assessment and risk management, identifying areas of slippage, and developing strategies to facilitate the ability of risk assessments to translate into better risk management efforts.” (p. 205)

Translating Research into Practice

“Our findings suggest that risk assessment tools are not sufficient to guarantee sound risk management practices or reductions in violence. Thus, researchers and tool developers should be careful to not overstate the potential value of risk assessment for risk management. Likewise, professionals, agencies, and policymakers should not rely on risk assessment tools as their sole or primary risk management strategy. Instead, agencies who use risk assessment tools should work to build staff buy-in, regularly monitor adherence, and ensure that they are providing effective treatment, rather than funneling all their resources into assessment.” (p. 204)

“That said, even though risk assessment tools have limitations, they remain a best available practice. Although tools may not reduce violence or offending in and of themselves, there is no evidence that alternative approaches, such as assessing risk via unstructured clinical judgment or not assessing risk at all, do so either.” (p. 204)

Other Interesting Tidbits for Researchers and Clinicians

“To ensure that our systematic review met best practice standards and followed relevant reporting guidelines, we followed the criteria set forth in the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement and the Assessment of Multiple Systematic Reviews (AMSTAR) tool.” (p. 185)

“Studies were required to meet the following inclusion criteria: (a) empirical study that was published or disseminated in English; (b) included a sample of individuals who were assessed with a structured risk assessment tool in real-world practice and/or a sample of professionals who used risk assessment tools in practice; and (c) included an outcome relevant to at least one of our research questions (e.g., perceived utility, adherence to the risk principle). We defined structured risk assessment tools as tools that included a designated list of risk factors and an overall rating of risk level for violence or offending. Thus, we did not include measures of psychopathy. Also, we did not include qualitative research, as systematic reviews of qualitative studies use different methodologies, nor did we include research on the responsivity principle, as this principle encompasses a wide range of constructs (i.e., culture, trauma, mental health) and is not as well-researched as the risk and need principles. When disseminations were based on the same sample, we selected the study that was the most comprehensive and rigorous (e.g., largest sample).” (p. 185)

“In interpreting our findings, several limitations of this review are important to note. Although we systematically searched 13 databases, reviewed reference lists, and contacted experts, our review likely missed some studies, such as studies written in languages other than English. In addition, although we attempted to summarize our findings with terms such as low, mixed, or high, definitions of such terms are somewhat subjective by nature. As such, to increase transparency and objectivity, we provided operational definitions of our summary terms, and summarized study findings in more detail using evidence tables. In addition, two independent raters coded each study and we conducted consensus ratings. Another limitation of this systematic review is that there is a lack of appropriate tools for appraising risk of bias in risk assessment studies. As such, we drew items from other tools, and adapted the wording for this context. However, the approach that we used to appraise observational and survey studies was brief and, as such, our review likely failed to capture some relevant study limitations. Finally, although we examined differences in general patterns of results across published and unpublished studies, it is difficult to evaluate publication bias in narrative reviews.” (p. 204)

“On the basis of this review, there are [a] number of important areas for future research. First, many studies have lacked appropriate comparison groups, making it impossible to determine if tools improve practices per se. As such, there is a strong need for further research with comparison groups of individuals who were not assessed with a risk assessment tool, including studies with both mental health and justice populations. Second, to determine if certain tools may have a more beneficial impact on risk management than other tools, head-to-head comparisons of tools are needed. Third, adherence to risk assessment tools appears to be poor in some cases, making it difficult to evaluate the impact of tools. As such, research should routinely measure and report adherence. Fourth, given that the utility of tools for risk management likely depends heavily on what happens after the risk assessment, researchers [should] examine the pathway between risk assessment and risk management, such as by developing and testing conceptual models. Finally, researchers should create and evaluate approaches to improve the utility of tools for risk management, such as training initiatives, structured risk management guidelines, and quality improvement or audit systems.” (p. 204–205)

Join the Discussion

As always, please join the discussion below if you have thoughts or comments to add!

Authored by Kseniya Katsman

Kseniya Katsman is a Master’s student in the Forensic Psychology program at John Jay College of Criminal Justice. Her interests include forensic application of dialectical behavior therapy, cultural competence in forensic assessment, and risk assessment, specifically suicide risk. She plans to continue her education and pursue a doctoral degree in clinical psychology.

Examining the Validity of Risk Assessments in Predicting General, Nonsexual Violence in Sexual Offenders

This study, published in Law and Human Behavior, provides implications with respect to the accurate assessment of institutional violence risk within a sex offender population. Below is a summary of the research and findings as well as a translation of this research into practice.

Featured Article | Law and Human Behavior | 2018, Vol. 42, No. 1, 13–25

Predictive Validity of HCR-20, START, and Static-99 Assessments in Predicting Institutional Aggression Among Sexual Offenders

Authors

Joel K. Cartwright, North Carolina State University and RTI International, Research Triangle Park, North Carolina
Sarah L. Desmarais, North Carolina State University
Justin Hazel, California Department of State Hospitals–Coalinga, Coalinga, California
Travis Griffith, California Department of State Hospitals–Coalinga, Coalinga, California
Allen Azizian, California Department of State Hospitals–Coalinga, Coalinga, California, and California State University, Fresno

Abstract

Sexual offenders are at greater risk of nonsexual than sexual violence. Yet, only a handful of studies have examined the validity of risk assessments in predicting general, nonsexual violence in this population. This study examined the predictive validity of assessments completed using the Historical-Clinical-Risk Management-20 Version 2 (HCR-20; Webster, Douglas, Eaves, & Hart, 1997), Short-Term Assessment of Risk and Treatability (START; Webster, Martin, Brink, Nicholls, & Desmarais, 2009), and Static-99R (Hanson & Thornton, 1999) in predicting institutional (nonsexual) aggression among 152 sexual offenders in a large secure forensic state hospital. Aggression data were gathered from institutional records over 90-day and 180-day follow-up periods. Results support the predictive validity of HCR-20 and START, and to a lesser extent, Static-99R assessments in predicting institutional aggression among patients detained or civilly committed pursuant to the sexually violent predator (SVP) law. In general, HCR-20 and START assessments demonstrated greater predictive validity—specifically, the HCR-20 Clinical subscale scores and START Vulnerability total scores—than Static-99R assessments across types of aggression and follow-up periods.

Keywords

HCR-20, Static-99R, START, risk assessment, sexual offender

Summary of the Research

Background
“Structured risk assessment protocols have become integral components of the criminal justice system, as part of efforts to mitigate the potential risk to public safety, as well as strategies designed to rehabilitate offenders. One subgroup of offenders for which this is especially true is sexual offenders. Indeed, many jurisdictions mandate the use of risk assessment instruments with sexual offenders to assist in making decisions regarding placement, treatment, and other management concerns within the institutions. As a result, the use of instruments designed to inform assessments of institutional aggression among sexual offenders is common practice in correctional and forensic psychiatric settings” (p. 13).

“Accordingly, much empirical attention has been focused on testing the psychometric properties and establishing the validity of assessments completed using these instruments for predicting sexual violence in this population. Overall, findings of the extant research provide overwhelming support for the validity of risk assessment tools in predicting sexual recidivism . . . In contrast, less empirical attention has focused on the assessment of risk for general (nonsexual) aggression among sexual offenders, and institutional aggression specifically, which is the focus of the present study” (p. 14).

“While the importance of assessing risk for sexual recidivism is without question, sexual offenders demonstrate higher rates of nonsexual violent recidivism than rates of sexual recidivism . . . Many sexual offenders are hospitalized for very long periods, particularly under SVP laws. Thus, risk of aggression posed by sexual offenders against staff, peers, and property within the institution is a pressing safety and risk management concern. Consequently, the accurate assessment of institutional violence risk within sexual offenders would benefit case management and treatment, as well as assist in decisions regarding supervision and release. However, there has been limited evaluation of the validity of assessments completed using tools designed to forecast general (nonsexual) violence risk among sexual offenders” (p. 14).

“Meta-analytic research shows that many violence risk assessment instruments can have good validity in predicting violence. Research also shows that instruments developed for predicting violent offending perform better than those developed to predict sexual or general offending. When risk for general violence is assessed among sexual offenders, tools designed to evaluate risk for sexual recidivism, such as the Static-99R, are frequently used rather than tools designed to evaluate risk for general (nonsexual) violence. However, the HCR-20 and, most recently, the START are two risk assessment instruments now used to assess risk for general violence in this population” (p. 14).

Current Study
“This study examines the predictive validity of the HCR-20 and START assessments, as well as Static-99R assessment, in predicting institutional aggression in a sample of 152 male sexual offenders. Our specific aims were to: (a) examine the distribution of HCR-20, START, and Static-99R assessment scores and risk estimates; (b) evaluate concordance among the HCR-20, START, and Static-99R assessments; and (c) test the predictive validity of the HCR-20, START, and Static-99R assessments in predicting institutional aggression over 90 and 180 days” (p. 15).

Results
“Overall, almost a quarter of the sample engaged in some form of aggression during the 90-day follow-up period and over a third of the sample at 180-day follow-up . . . Across both START and HCR-20 assessments, very few patients were rated as high risk for violence (4.1% and 6.5%, respectively). Using the HCR-20, the majority were rated as low risk (67.4%); few were rated as moderate (26.1%) or high (6.5%). START final risk estimates showed a similar pattern of results: most participants were rated as low risk (83.7%), followed by moderate (12.2%) and high (4.1%). The Static-99R risk classifications demonstrated an inverse pattern of results, with more than half of participants classified as high risk (54.5%). Approximately one third (32.1%) were classified as moderate risk by the Static-99R, and relatively few (13.4%) as low” (p. 17–19).

“Associations were moderate to strong between START Strength total scores and Vulnerability total scores, and between HCR-20 subscale and total scores. Conversely, START Strength and Vulnerability total scores were weakly associated with Static-99R scores, if at all. The HCR-20 and START risk estimates showed moderate agreement. There were no instances in which a patient was identified as high risk on the HCR-20 or the START and low risk on the other instrument, and both HCR-20 and START assessments showed similar distributions of violence risk estimates” (p. 19).

“Over the 90-day follow-up period, HCR-20 total scores predicted all forms of aggression except physical aggression toward objects. The HCR-20 Clinical subscale score was the most predictive of the HCR-20 subscales, predicting all forms of aggression. All three HCR-20 subscale scores and total score predicted both any aggression and verbal aggression. START Strength total scores predicted all forms of aggression with the exception of physical aggression toward objects. START Vulnerability total scores predicted all forms of aggression. Static-99R total scores predicted any aggression and verbal aggression, but not physical aggression toward others or toward objects” (p. 19).

“We found significant discrimination among participants classified as low compared with moderate and high risk on the HCR-20 for any aggression, verbal aggression, and physical aggression toward others. For example, those rated high risk on the HCR-20 were almost 20 times more likely, and those rated as moderate risk were over 4 times more likely, to engage in any aggression compared with those rated as low risk. For physical aggression toward objects, there was only significant discrimination between those classified as low versus high risk, but not moderate versus high risk. For the START assessments, there was significant discrimination among participants classified as low compared with moderate and high risk for any aggression and verbal aggression. To demonstrate, those rated as high risk on the START were almost 15 times more likely, and those rated as moderate risk were over 7 times more likely, to engage in any aggression compared with those rated as low risk. For physical aggression toward objects, significant discrimination was only found for those identified as low versus moderate risk. START violence risk estimates did not discriminate among participants in the prediction of physical aggression toward others. Statistics for Static-99R risk categories could not be calculated due to empty cells” (p. 19–20).

“Over the 180-day follow-up period, HCR-20 Clinical and Risk Management subscale scores, as well as the HCR-20 total score, predicted all outcomes, with one exception: Historical subscale scores were not associated with physical aggression toward objects. START Vulnerability total scores showed strong predictive validity across outcomes. START Strength total scores predicted any aggression and verbal aggression, but not physical aggression toward others or toward objects. Static-99R total scores showed moderate associations with any, verbal, and physical aggression toward others, but not physical aggression toward objects” (p. 20–21).

“HCR-20 risk estimates predicted all forms of aggression during this time frame, with the greatest discrimination appropriately found between those estimated as low and high risk. To demonstrate, those estimated as high risk using the HCR-20 were over 20 times more likely to engage in any or verbal aggression, over 70 times more likely to engage in property damage, and almost 15 times more likely to engage in aggression toward others compared with those classified as low risk. In contrast, we did not find discrimination between those rated low versus high risk on the START. However, those estimated as moderate risk using the START were approximately 8 times more likely to engage in any or verbal aggression, over 12 times as likely to engage in physical aggression toward objects, and over 3 times more likely to engage in physical aggression toward others when compared with those classified as low risk. For the Static-99R, ORs were significant for only one comparison: those classified as high risk on the Static-99R were almost 5 times more likely to engage in verbal aggression than those classified as low risk” (p. 21).

Translating Research into Practice

“Although we may have anticipated ceiling effects on the HCR-20 subscale and total scores, and START Vulnerability total scores, as well as floor effects for the START Strength total scores, in our relatively homogenous sample of male sexual offenders, this was not the case. Instead, assessments made use of the full range of possible scores and violence risk estimates for both HCR-20 and START ratings. This finding suggests that HCR-20 and START assessments may be useful for distinguishing between patients more or less likely to engage in aggressive behaviors even within a somewhat homogenous, high-risk population. They also suggest that HCR-20 and START assessments may be useful for informing supervision decisions and risk management strategies (e.g., identifying which patients require higher security levels)” (p. 21).

“Further, we found high rates of concordance between the results of HCR-20 and START assessments, but low rates of concordance among HCR-20 and START total scores with Static-99R total scores and high rates of discordance among HCR-20 and START total scores with Static-99R. These patterns of results are not surprising, given that both instruments were developed to predict violence risk over the short-to-medium term. In contrast, the Static-99R is designed to predict sexual recidivism over much longer time frames. Nonetheless, these findings indicate that the HCR-20 and START are measuring constructs and risks that are distinct from those measured by the Static-99R. And, as such, they support the use of the HCR-20 and START in addition to the use of the Static-99R in clinical practice with sexual offenders. Indeed, results of the predictive validity analyses provided stronger support for the use of the HCR-20 and START, compared with the Static-99R, in assessing risk for different forms of institutional aggression among sexual offenders” (p. 21-22).

“Consistent with prior research examining the predictive validity of the HCR-20, HCR-20 assessments performed well across outcomes. Like prior studies, however, the HCR-20 Historical subscale demonstrated the lowest levels of predictive validity of the HCR-20 assessment components and failed to predict physical aggression toward objects or others. In contrast, the HCR-20 Clinical subscale performed the best of the HCR-20 scales and predicted all forms of aggression at good or excellent levels. Generally, performance of HCR-20 assessments was greater for the prediction of aggression over the 180-day than 90-day follow-up period, demonstrating good to excellent predictive validity. Taken together, these findings add to the empirical evidence supporting the use of the HCR-20 for identifying violence risk over the medium term (i.e., 6 months) among sexual offenders” (p. 22).

“START assessments, including the Vulnerability and Strength total scores, as well as violence risk estimates, showed good to excellent validity in predicting any aggression and verbal aggression over both 90-day and 180-day follow-up periods. In fact, of all the assessments, the START Vulnerability total score outperformed any other HCR-20, START, or Static-99R subscale or total score. The extant literature varies on whether the START Strength or Vulnerability total scores perform better than the other, but the current results suggest greater validity of the Vulnerability than Strength total scores in the prediction of institutional aggression among sexual offenders. Strength total scores nonetheless demonstrated good validity in predicting any aggression, verbal aggression, and physical aggression, particularly over the 90-day follow-up period. This finding is consistent with the START’s intended 3-month assessment and prediction time frame and is similar to, if not slightly better than, findings reported in prior studies of START assessments in forensic psychiatric patients. Overall, findings support the use of the START in the assessment and management of risk for short-term institutional aggression among sexual offenders” (p. 22).

“Finally, the Static-99R assessments showed fair to good validity in predicting any aggression and verbal aggression, as well as physical aggression toward objects, but not physical aggression toward others. Further, although the Static-99R assessments demonstrated validity in predicting these forms of aggression, performance was consistently poorer compared with the performance of both the HCR-20 and START assessments. Prior research has found good validity of Static-99R assessments in predicting general aggression; however the majority of these studies have focused on community-based rather than institutional aggression, have aggregated sexual offenses with general offenses, and have investigated much longer follow-up periods. For these reasons, findings of the current study suggest that the Static-99R is most appropriately used for estimating sexual recidivism risk and that general violence risk assessment instruments, such as the HCR-20 or START, should be used to assess general aggression within sexual offenders” (p. 22).

Other Interesting Tidbits for Researchers and Clinicians

“This study supports the validity of the HCR-20, START, and to a lesser extent, the Static-99R assessments in predicting institutional aggression among patients detained or civilly committed pursuant to the SVP law. Typically, risk assessment instruments have shown lower levels of predictive validity in field studies compared with development studies. However, this does not appear to be the case in this sample of SVPs” (p. 23).

“This study adds to the body of literature supporting the application of structured violence risk assessments across diverse populations in the criminal justice system, and sexual offenders specifically. Beyond the assessment of risk for sexual recidivism, our findings suggest that general violence risk assessment instruments, such as the HCR-20 or START, have a place in the assessment and management of sexual offenders. Indeed, results indicate that instruments designed to assess sexual recidivism risk, and the Static-99R in particular, are limited in their ability to assess risk for general (nonsexual) violence. This is in keeping with the recommendations of the Static-99R authors to administer the Brief Assessment for Recidivism Risk – 2002R (BARR-2002R) for predicting nonsexual violence among sexual offenders, though this is not always done in practice. Consistent with the risk-need-responsivity model, findings suggest that using the HCR-20 or START to identify general violence risk among sexual offenders would benefit case management and treatment, as well as assist in decisions regarding supervision and release. Yet, the contributions of such assessments to clinical practice and, ultimately, violence prevention among sexual offenders remain to be tested in future research” (p. 23).

Join the Discussion

As always, please join the discussion below if you have thoughts or comments to add! To read the full article, click here.

Authored by Becca Cheiffetz

Becca Cheiffetz is a master’s student in the Forensic Psychology program at John Jay College of Criminal Justice. She graduated in 2015 from Sam Houston State University with a BS in Psychology and plans to continue her studies in a Clinical/Forensic Psychology PhD program in the near future. Her professional interests include providing clinical evaluations and treatment for individuals in prison as a prison psychologist and conducting forensic assessments for defendants in criminal court.

Keep out of trouble: Validation of a risk assessment measure in a correctional sample

Despite high interrater reliability and relative ease of administration, caution is advised when utilizing the VRAG–R in predicting and managing recidivism risk. This is the bottom line of a recently published article in Law and Human Behavior. Below is a summary of the research and findings as well as a translation of this research into practice.

Featured Article | Law and Human Behavior | 2017, Vol. 41, No. 5, 507–518

A Cross-Validation of the Violence Risk Appraisal Guide—Revised (VRAG–R) Within a Correctional Sample

Authors

Anthony J.J. Glover, Correctional Services Canada, Kingston, Ontario, Canada
Frances P. Churcher, Carleton University
Andrew L. Gray, Simon Fraser University
Jeremy F. Mills, Carleton University
Diane E. Nicholson, Correctional Services Canada, Kingston, Ontario, Canada

Abstract

The Violence Risk Appraisal Guide—Revised (VRAG–R) was developed to replace the original VRAG based on an updated and larger sample with an extended follow-up period. Using a sample of 120 adult male correctional offenders, the current study examined the interrater reliability and predictive and comparative validity of the VRAG–R to the VRAG, the Psychopathy Checklist—Revised, the Statistical Information on Recidivism—Revised, and the Two-Tiered Violence Risk Estimate over a follow-up period of up to 22 years postrelease. The VRAG–R achieved moderate levels of predictive validity for both general and violent recidivism that was sustained over time as evidenced by time-dependent area under the curve (AUC) analysis. Further, moderate predictive validity was evident when the Antisociality item was both removed and then subsequently replaced with a substitute measure of antisociality. Results of the individual item analyses for the VRAG and VRAG–R revealed that only a small number of items are significant predictors of violent recidivism. The results of this study have implications for the application of the VRAG–R to the assessment of violent recidivism among correctional offenders.

Keywords

VRAG–R, risk assessment, violence, recidivism, offenders

Summary of the Research

“Risk assessment of offenders, particularly the assessment of violence risk, has long played a role within the criminal justice process. Use of structured risk assessment measures is increasing among clinicians, with 50% to 75% of clinicians using structured risk measures during forensic assessments. […] Structured risk assessment should serve four goals. First, salient risk factors for an individual should be identified. Second, an appropriate level of risk, known as a risk estimate, should be determined. Third, clinicians should identify strategies to reduce or manage risk. Finally, risk information should be effectively communicated.” (p. 507)

“Actuarial risk assessment measures are commonly used to appraise risk for various forms of recidivism (e.g., sexual, violent, and general). For the purposes of the current study, actuarial methods will be defined as measures that use empirically relevant items where their aggregate scores are then associated with a probability of future recidivism.” (p. 507)

“A recent update of the VRAG (i.e., the Violence Risk Appraisal Guide—Revised [VRAG–R; Rice, Harris, & Lang, 2013]) was undertaken to simplify scoring, integrate the VRAG and an actuarial measure designed to predict sexual recidivism (i.e., the Sex Offender Risk Appraisal Guide [SORAG; Quinsey et al., 2006]), and reduce time spent on scoring items.” (p. 508)

“A revised version of the VRAG, referred to as the VRAG–R, was recently developed, and has since been incorporated into clinical practice. […] A major strength of the revision was the extended length of the follow-up period for the sample (which ranged up to 49 years in length), which now afforded the inclusion of several participants who had yet to be released at the time of the earlier follow-up studies. […] Preliminary evaluations have found similar predictive validity for the VRAG–R relative to the VRAG. In the validation sample the VRAG–R obtained an AUC [area under the curve] value of .75 for violent recidivism and an AUC value of .76 for the entire sample. […] These values were similar to those obtained in using the VRAG in the same sample group. Furthermore, the authors tested the predictive validity of the VRAG–R after removing the Antisociality item, as this item requires training to score and may not always be readily available using file data. The VRAG–R obtained an AUC value of .75, indicating that its predictive accuracy is not limited if this item is missing. In contrast, however, preliminary research of the VRAG–R in psychiatric samples has shown that it is not predictive of inpatient aggression. Given the mixed results, it is important that the VRAG–R undergo cross-validation if it is to be used by clinicians in a broader forensic context.” (p. 508)
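An AUC of .75 can be read as: a randomly chosen recidivist has a 75% chance of scoring higher on the instrument than a randomly chosen non-recidivist. A minimal sketch of that pairwise interpretation (the scores below are illustrative, not study data):

```python
def auc(recidivist_scores, nonrecidivist_scores):
    """Probability that a randomly drawn recidivist outscores a randomly
    drawn non-recidivist; tied scores count as half a 'win'."""
    pairs = [(r, n) for r in recidivist_scores for n in nonrecidivist_scores]
    wins = sum(1.0 if r > n else 0.5 if r == n else 0.0 for r, n in pairs)
    return wins / len(pairs)

# illustrative instrument scores only
print(auc([3, 5], [2, 4]))  # -> 0.75
```

This is why an AUC of .50 corresponds to chance-level discrimination and 1.0 to perfect rank-ordering of recidivists above non-recidivists.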

“The current study is a cross-validation of the VRAG–R in a correctional sample of adult male offenders that includes a comparative analysis with existing risk assessment measures (i.e., the VRAG, PCL–R, SIR–R1, and the Two-Tiered Violence Risk Estimates) […] In addition, our study will evaluate the interrater reliability of the VRAG–R among trained clinicians, which has not been previously examined for this measure. Establishing interrater reliability is important as it examines the consistency of the scoring and poor interrater reliability has been found to be associated with lower predictive accuracy. Finally, we will examine the predictive utility of the VRAG–R without the Antisociality (Facet 4) item, as well as with a substitute measure of antisociality.” (p. 508–509)

The sample included 120 federal male offenders from Canadian correctional facilities. The majority were Caucasian (78.3%), with ages ranging from 19 to 48 years (M = 30.37, SD = 7.48). A little over 49% of the sample had an index offense of robbery, and at the time of outcome data collection, 71.7% had completed their sentence. In addition to the aforementioned measures, recidivism information was collected from Canadian Police Information Centre records, and time-at-risk was calculated as the number of days from the offender’s release to the date of the first postrelease conviction. The first author scored the items for all measures except the SIR–R1 during the original incarceration; the SIR–R1 was administered at the time of admission by parole staff. The TTV and VRAG–R were scored postrelease using archival information, the TTV by one of the authors and the VRAG–R by the lead author. An independent rater coded 30 randomly selected files to assess interrater reliability.
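The time-at-risk variable described above is a simple date difference. A minimal sketch of the calculation (the dates are hypothetical, not drawn from the study's files):

```python
from datetime import date

def time_at_risk(release_date, first_conviction_date):
    """Number of days from release to the first postrelease conviction."""
    return (first_conviction_date - release_date).days

# hypothetical offender record
print(time_at_risk(date(1995, 3, 1), date(1997, 6, 15)))  # -> 837
```

In practice, offenders who never recidivate are censored at the end of follow-up rather than dropped, which is what makes survival-style analyses over a 22-year window possible.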

“Results of the current study demonstrated an overall modest predictive validity of the VRAG–R within our correctional sample, but failed to support its application using the associated risk likelihood bins. Although the VRAG–R showed a high level of association with other measures utilizing historical items, it demonstrated only a moderate degree of predictive validity for both general and violent recidivism. […] It is interesting to note that little change in predictive validity was observed when Facet 4 was both removed from the VRAG–R, as well as replaced with the ARE of the TTV suggesting that the Antisociality item of the VRAG–R could be removed without changing the predictive utility of the measure.” (p. 514)

“When the predictive validity of the VRAG and VRAG–R was examined over time, both measures displayed poor short-term predictive accuracy. […] Despite the performance of the two measures appearing to increase over time and maintaining a relatively moderate level of predictive accuracy, the poor short-term performance of the two measures is worrisome as the greatest proportion of recidivism occurs early after the initial release from an institution. It may be that the fluctuation in predictive validity seen within the short-term is reflective of the impact of environmental factors on risk (e.g., community supervision, short-term treatment effects). Such factors may diminish with the passage of time, resulting in greater predictive accuracy in the long-term due to the influence of the underlying risk (i.e., static risk) posed by the offender (e.g., the offender reaches the expiry of his sentence and is no longer under the jurisdiction of the criminal justice system).” (p. 514)

“The VRAG–R’s high level of interrater reliability in the present study was consistent with the values found for actuarial measures in previous prediction studies. The items of the VRAG–R are clearly defined, easy to score, and less prone to scoring error. Moreover, the ability to remove the Antisociality item from the measure without compromising predictive accuracy could facilitate more efficient administration and less need for intensive training (e.g., PCL–R training). […] As the VRAG–R has replaced [the total PCL-R score] with the simpler Facet 4 (Antisociality) score, it may prove to have more consistent scoring between raters. Similarly, the VRAG–R does not contain the diagnostic items of the original VRAG such as schizophrenia and personality disorder which, like the PCL–R, require clinical judgment.” (p. 514)

Translating Research into Practice

“The VRAG–R may hold some promise in terms of clinical practice for risk assessment purposes. Much like the SIR–R1, it identifies salient historical risk factors that contribute to an offender’s likelihood of risk, provides a risk estimate of future offending, and effectively communicates this risk estimate by stating it as a percentage of reoffending at two future time points. However, as it is a measure that relies solely on static risk factors, the VRAG–R does not meet the criteria of helping to provide strategies for managing or reducing an offender’s level of risk, and is therefore unsuitable for this purpose. It must therefore be used in conjunction with a measure that would provide this information.” (p. 515)

“Overall, while providing some support for the use of the VRAG–R with male offenders, results of the current study have implications for clinical practice. With respect to positive aspects of the VRAG–R, first, results of the current study demonstrate that the predictive validity of the revised VRAG is comparable to that of the original version. Second, our results replicate earlier research findings regarding the limited utility of the PCL–R as part of the VRAG. Third, the strong interrater reliability of the measure between trained clinicians shows that the VRAG–R is both relatively easy to score and can be scored consistently across raters. This is important, as this consistent scoring reflects the stringent scoring criteria intended by the authors as described by Harris et al. (2015). Despite these positive aspects, caution is warranted when interpreting the results for short-term outcomes given the low AUC values observed for both the VRAG and VRAG–R following initial release from custody. However, given the increase in AUC values over time, clinicians may be somewhat more confident in using the VRAG and VRAG–R for making long-term predictions. However, we recommend that cross-validation with a larger sample is required before the VRAG–R can be adopted for clinical use in correctional settings.” (p. 515)

Other Interesting Tidbits for Researchers and Clinicians

“There are several limitations in the current study. For instance, the use of file information to retrospectively code some of the measures for the current study may limit the usefulness of the results due to missing information or a lack of opportunity to clarify file information. Despite this, every effort was made to ensure that all data could be accurately coded. […] Larger sample sizes will be required to provide reliable estimates of risk among correctional offenders.” (p. 515)

“Concerning statistical power, attempts were made to account for the sample size through the statistical methods selected (e.g., nonparametric statistical analyses). […] The sample size for the current study was sufficient for these types of analyses. Indeed, statistical significance was achieved for effect sizes considered small to moderate in magnitude and the sample size of the current study is not unlike the sample sizes in applied risk assessment studies previously conducted with Canadian offenders.” (p. 515)

“Another potential limitation concerns the generalizability of the results, which may be limited due to the homogenous nature of the sample given that the majority of the offenders within the current cross-validation sample were Caucasian. Validations with samples that are more racially diverse are needed before conclusions about the breadth of effectiveness of the VRAG–R can be drawn.” (p. 515)

Join the Discussion

As always, please join the discussion below if you have thoughts or comments to add! To read the full article, click here.

Authored by Kseniya Katsman

Kseniya Katsman is a master’s student in the Forensic Psychology program at John Jay College of Criminal Justice. Her interests include forensic application of dialectical behavior therapy, cultural competence in forensic assessment, and risk assessment, specifically suicide risk. She plans to continue her education and pursue a doctoral degree in clinical psychology.

Impact of Risk Assessment on Recidivism Rates in Intimate Partner Violence

The level of assessed risk moderates the relation between risk management and recidivism; the greater the risk, the more serious the legal sentence. This is the bottom line of a recently published article in Law and Human Behavior. Below is a summary of the research and findings as well as a translation of this research into practice.

Featured Article | Law and Human Behavior | 2017, Vol. 41, No. 4, 344–353

Disentangling the Risk Assessment and Intimate Partner Violence Relation: Estimating Mediating and Moderating Effects

Authors

Kirk R. Williams, University of California, Irvine
Richard Stansfield, Rutgers University—Camden

Abstract

To manage intimate partner violence (IPV), the criminal justice system has turned to risk assessment instruments to predict if a perpetrator will reoffend. Empirically determining whether offenders assessed as high risk are those who recidivate is critical for establishing the predictive validity of IPV risk assessment instruments and for guiding the supervision of perpetrators. But by focusing solely on the relation between calculated risk scores and subsequent IPV recidivism, previous studies of the predictive validity of risk assessment instruments omitted mediating factors intended to mitigate the risk of this behavioral recidivism. The purpose of this study was to examine the mediating effects of such factors and the moderating effects of risk assessment on the relation between assessed risk (using the Domestic Violence Screening Instrument-Revised [DVSI-R]) and recidivistic IPV. Using a sample of 2,520 perpetrators of IPV, results revealed that time sentenced to jail and time sentenced to probation each significantly mediated the relation between DVSI-R risk level and frequency of reoffending. The results also revealed that assessed risk moderated the relation between these mediating factors and IPV recidivism, with reduced recidivism (negative estimated effects) for high-risk perpetrators but increased recidivism (positive estimated effects) for low-risk perpetrators. The implication is to assign interventions to the level of risk so that no harm is done.

Keywords

intimate partner violence, risk assessment, risk management, punishment, recidivism

Summary of the Research

“The present research … analyz[es] the interrelations between risk assessment (in this case using the DVSI-R), risk management strategies related to court orders for incarceration and/or supervision, and IPV recidivism during an average 18-month follow-up period postassessment. The present research seeks to replicate some aspects of prior work, but it focuses on the impact of risk management decisions on recidivism by exploring how sentences imposed for jail or probation mediate the risk-IPV recidivism relation” (p. 345).

“The first objective was to determine how much of the estimated effect of risk assessment on recidivism was mediated by risk management strategies … The second objective of the present research was to assess the extent to which the level of assessed risk moderates the relation between risk management strategies and recidivism” (p. 349).

“‘[H]igher-risk offenders need more intensive and extensive services if we are to hope for a significant reduction in recidivism. For the low-risk offenders, minimal or even no intervention is sufficient.’ This assumption implies that the relation between risk assessments and risk management strategies is a monotonic, linear function: the greater the assessed risk, the greater the intensity of interventions … more intense interventions were associated with an increase in recidivism for low-risk offenders, but they were associated with a decrease in recidivism for high-risk offenders” (p. 346).

“Higher risk perpetrators were likely to be sentenced to more days on probation or in jail, in addition to being more likely to recidivate. In short, the greater the risk level, the greater was the seriousness of sentencing (probation or jail time), and the greater was the frequency and likelihood of IPV recidivism … That is to say, greater time sentenced to probation or jail was more likely to be associated with IPV recidivism among low-risk perpetrators than high-risk perpetrators … the relation for low-risk perpetrators is positive, but it is negative for high-risk perpetrators, meaning that as the likelihood of any sentence for jail time increased, the probability of rearrest for IPV recidivism increased for low-risk perpetrators but decreased for high-risk perpetrators” (pp. 349-351).
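The moderation pattern quoted above, where the sign of the sentencing effect flips with risk level, is what an interaction term in a regression model captures. A minimal sketch with purely illustrative coefficients (not the study's estimates):

```python
def recidivism_slope_for_jail_time(high_risk, b_jail=0.04, b_interaction=-0.09):
    """Slope of recidivism on jail time under a model with an interaction:
    outcome = b0 + b_jail*jail + b_risk*high_risk + b_interaction*jail*high_risk.
    Coefficients are illustrative only; high_risk is 0 (low) or 1 (high)."""
    return b_jail + b_interaction * high_risk

# positive slope for low-risk perpetrators: more jail time, more recidivism
print(recidivism_slope_for_jail_time(high_risk=0))
# negative slope for high-risk perpetrators: more jail time, less recidivism
print(recidivism_slope_for_jail_time(high_risk=1))
```

The sign flip is the whole point: averaging over risk levels would hide the harm that intensive sanctions do to low-risk perpetrators.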

Translating Research into Practice

“[R]isk management strategies are typically recommended for IPV perpetrators, based on the assessment of risk, with the intent of altering their behavior … the overall total effect of risk assessment on recidivism is a combination of the direct effect of risk assessment on recidivism and the indirect effects of risk assessment on recidivism as mediated by risk management strategies. The interrelations between risk assessment, risk management, and recidivism … will influence the nature of the bias in estimating the total effect. Hence, formally incorporating risk management strategies as mediators of the relation between assessed risk and actual subsequent recidivism becomes vital to delineate empirically the overall relation between risk and recidivism and to account for any confounding influences” (p. 346).

“Virtually all previous studies of IPV risk assessment, particularly those designed to determine the predictive validity of risk assessment instruments, focused on the relation between calculated risk scores and subsequent reoffending. As discussed earlier, by focusing solely on this relation, previous studies of the predictive validity of risk assessment instruments, including the DVSI-R, omitted an integral component of the RNR model—risk management. This omission increases the chances that the estimated relation between those scores and recidivism will be contaminated by what happened to supervised offenders as a result of risk management efforts” (p. 348).

“The results also revealed that assessed risk moderated the relation between risk management strategies and IPV recidivism, with low-risk perpetrators worse off following a recommendation of probation time or jail time” (p. 351).

Other Interesting Tidbits for Researchers and Clinicians

“The ‘risk principle’ assumes that criminal and violent behavior can be predicted. More importantly, this principle further assumes the intensity of interventions should be aligned with the risk level of perpetrators. Their greater need means that programs serving a greater percentage of high-risk offenders may be more effective than programs focused on individuals with a lower risk for reoffending” (p. 344).

“The core of the need principle is to identify criminogenic needs and align treatment to those needs. The principle of responsivity requires that programs also match their services to the general and specific ways that offenders respond best to services: cognitive-behavioral and social learning approaches, in general, and individual characteristics and preferences, in particular” (p. 344).

“[S]entencing outcomes, such as probation and jail, may place low-risk and high-risk perpetrators in close contact with each other, thus facilitating the transmission of proviolence normative beliefs. Further, such outcomes may undermine many of the characteristics rendering low-risk perpetrators as low-risk—especially the prevalence of prosocial activities and individuals in their lives … imposing interventions on low-risk perpetrators that are better suited for high-risk perpetrators violates a principle of the RNR model: ‘match intensity of services to the risk level of cases’” (p. 351).

Join the Discussion

As always, please join the discussion below if you have thoughts or comments to add!

Authored by Eliza Kopelman

Eliza Kopelman is a first-year master’s student in the Forensic Psychology program at John Jay College. She graduated in 2015 with her B.A. in psychology and English from Brandeis University, and then went on to work as a community residence counselor at McLean Hospital in Belmont, MA before coming back to school. Eliza’s research experience is on levels of psychopathy in sex offenders, and her professional interests include crime scene analysis and violence risk assessment.

ODARA + SARA = Prediction of Intimate Partner Violence and General Criminal Recidivism

The combination of two risk measures, the Ontario Domestic Assault Risk Assessment (ODARA) and the Spousal Assault Risk Assessment (SARA), has been effective in assessing violence risk and predicting recidivism in intimate partner violence offenders. This is the bottom line of a recently published article in Law and Human Behavior. Below is a summary of the research and findings as well as a translation of this research into practice.

Featured Article | Law and Human Behavior | 2017, Vol. 41, No. 5, 440-453

Incremental Prediction of Intimate Partner Violence: An Examination of Three Risk Measures

Authors

Mark E. Olver, Department of Psychology, University of Saskatchewan
Sandy Jung, Department of Psychology, MacEwan University

Abstract

Improvements in the risk prediction of domestic violence against intimate partners have the potential to inform policing practices in the prevention of further victimization. The present study examined the incremental predictive validity of 3 measures of risk for intimate partner violence (IPV)—Spousal Assault Risk Assessment (SARA), Ontario Domestic Assault Risk Assessment (ODARA), and the Family Violence Investigative Report (FVIR)—for IPV, general violence, and general recidivism outcomes. The sample featured 289 men and women who were reported to police for IPV and followed up approximately 3 years post release. Archival ratings of the 3 measures demonstrated that SARA scores showed incremental validity for IPV recidivism, ODARA scores incrementally predicted general violence, and both tools incrementally predicted general recidivism. The FVIR did not incrementally predict any outcomes. Fine grained analyses demonstrated that the Psychosocial Adjustment domain of the SARA contributed most uniquely to the prediction of IPV. Survival analysis supported the use of the SARA and ODARA in tandem for appraising risk for IPV or general criminal recidivism. Calibration analyses using logistic regression modeling also demonstrated 3-year recidivism estimates for SARA and ODARA scores. Implications for the use of multiple tools in clinical practice are discussed, particularly for combining the SARA and ODARA measures to augment IPV risk assessment and management.

Keywords

Intimate partner violence, SARA, ODARA, risk assessment, recidivism

Summary of the Research

“Although existing research has demonstrated that the SARA and the ODARA perform comparably in predicting IPV [Intimate Partner Violence] recidivism, these tools were developed for related but slightly different purposes. The ODARA was originally developed as an easy-to-use actuarial tool to assess risk of spousal assault…However, more than three quarters of the items are static in nature and the ODARA may thus be limited when informing intervention services or assessing changes in risk; that is, it would do little to inform dynamic areas to target for IPV intervention, and the tool is not structured to assess possible reductions in risk from treatment or other change agents. In contrast…nearly half of the items on the SARA are ostensibly dynamic variables that are potentially changeable and which have conceptual overlap with the Central Eight (e.g., recent employment instability, recent relationship instability, substance abuse concerns, assault supportive attitudes)” (p. 442).

“[Q]uantitative reviews of the IPV risk literature reveal that most, if not all, studies have heretofore examined the instruments in isolation, as IPV ‘risk instrument silos’; it has been our observation that efforts to examine the integration of any of these tools have been noticeably absent from the literature” (p. 442).

“Data was coded from a random selection of 300 cases of IPV that were reported to a local police service over a 4-year period from 2010 to 2013. Eligibility criteria included IPV cases that involved a perpetrator and a complainant who were, at one time, intimate partners (e.g., break-up prior to the occurrence, currently partners), the incident led to formal charges for direct violence or the threat of direct violence to an intimate partner, and it was clear who was the perpetrator (i.e., cases were excluded in scenarios where both parties were assaultive in the incident and both were charged or not charged)” (p. 442).

“Items from the three measures described above [the ODARA, the SARA, the FVIR] and the recidivism outcomes were coded from extensive review of police file documentation. As the present study is retrospective in nature, these instruments were coded prior to obtaining recidivism data and thus blind to outcome to prevent criterion contamination” (p. 443).

“The sample was followed up an average of 3.30 years (SD = 1.16) in the community, during which time, 9.3% (n = 27) were convicted for a new intimate partner violent offense, 15.6% (n = 45) were convicted for any new violent offense, and 47.4% (n = 137) were convicted for any new offense” (p. 444).

“In the first block, ODARA scores were significantly associated with increased IPV, any violent, and general recidivism, with each one-point increase in ODARA score being associated with a 28% to 39% increase in the hazard of one of those recidivism outcomes. SARA scores were incrementally predictive of IPV recidivism at p = .063 (Model 1) after controlling for the ODARA, and significantly incrementally predicted general recidivism (Model 5); however, SARA scores did not incrementally predict general violent recidivism above and beyond the ODARA (Model 3). FVIR scores did not incrementally predict any of the three recidivism outcomes controlling for the ODARA (Models 2, 4, and 6)” (p. 445).
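To make the reported effect sizes concrete: in a Cox regression, each coefficient is a log hazard ratio, so exponentiating it gives the multiplicative change in the hazard per one-point increase in an instrument's score. The sketch below works backward from the quoted "28% to 39% increase in the hazard" range; the coefficients are illustrative, derived from those percentages, not the study's actual estimates.

```python
import math

def hazard_increase_pct(coef: float) -> float:
    """Percent increase in the hazard per one-unit increase in a predictor,
    given a Cox regression coefficient (a log hazard ratio)."""
    return (math.exp(coef) - 1) * 100

# Hazard ratios of 1.28 and 1.39 correspond to the 28%-39% range reported
# for one-point increases in ODARA score (coefficients are illustrative).
for hr in (1.28, 1.39):
    coef = math.log(hr)
    print(f"coef = {coef:.3f} -> {hazard_increase_pct(coef):.0f}% hazard increase per point")
```

This is also why "incremental" prediction is assessed by entering one tool's score after another: the second coefficient reflects only the hazard change not already explained by the first tool.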

“First, the SARA seemed to outperform the ODARA and the FVIR in the prediction of IPV recidivism; it was the only measure to come close to the incremental prediction of recidivism when controlling for the ODARA, but not vice versa. By contrast, if there was a winner in the prediction of general violence, it would be the ODARA in this sample, as only it uniquely incrementally predicted this outcome irrespective of the instrument controlled; however, neither the SARA nor FVIR uniquely predicted general violence when controlling for the ODARA…finally, both the SARA and ODARA demonstrated incremental predictive validity in the prediction of general recidivism over time; as such, there was unique variance within each instrument that was informative in the prediction of this broad outcome” (p. 451).

Translating Research into Practice

“The implications are that use of a single instrument scoring in the medium range could result in an overestimate or underestimate of risk if a second instrument generated a different estimate. The results would seem to provide at least partial support for the incremental predictive validity of the tools at the categorical risk level for these two recidivism outcomes” (p. 451).

“The FVIR struggled the most of the three instruments and did not have incremental value in the prediction of any of the three recidivism outcomes after controlling for two well-established IPV risk assessment measures” (p. 451).

“These findings would suggest that there is value added in using more than one tool in IPV risk assessment, such as the SARA and ODARA…from a purely empirical standpoint, on their own they each predict their targeted recidivism outcomes, and when taken together, they seem to have different strengths in prediction tasks, at least in the present sample…the tools in tandem yield a more comprehensive volume of information to inform risk assessment and management” (p. 452).

Other Interesting Tidbits for Researchers and Clinicians

“Follow-up analyses demonstrated that the SARA Criminal History and Psychosocial Adjustment subscales contained most of the risk variance in the prediction of the three outcomes. The results of Cox regression analyses utilizing the scale components demonstrated that it was the Psychosocial Adjustment subscale that drove the SARA’s incremental prediction of IPV recidivism, while this and the Criminal History subscale performed well relative to the ODARA in the prediction of any recidivism” (p. 451).

“The results suggest that pairing the latter two measures [SARA and ODARA] in IPV assessment may be fruitful and that tools may complement one another; while the tools independently or in combination can estimate risk for IPV, any violence, and general recidivism, the SARA has domains that may be targeted for intervention and risk management” (p. 452).

“[M]ost measures are examined by networks of people linked to their development, and thus the present body of work provides an independent examination of important psychometric properties of the ODARA and the SARA. The inclusion of female perpetrators also builds on existing IPV risk assessment research with female perpetrators…further research may profit from examinations of larger numbers of female perpetrators to replicate and extend these findings” (p. 452).

“The dynamic predictive validity of the SARA, and more specifically, its capacity to measure changes in IPV risk, is currently an unknown property; further research on a treated or supervised sample of IPV men and women may be fruitful in evaluating its capacity to assess dynamic IPV risk” (p. 452).

Join the Discussion

As always, please join the discussion below if you have thoughts or comments to add!

Authored by Casey Buonocore

Casey Buonocore is currently a student in John Jay’s BA/MA Program in Forensic Psychology. Her research interests include serious mental illness, risk assessments, and competency evaluations. After earning her Master’s, she plans to pursue a doctoral degree in clinical psychology.

Clinical Experience May Affect The Predictive Validity of the PCL-R

Clinical experience may impact the variability in scores on the PCL-R and its predictive capacity for future violence. This is the bottom line of a recently published article in International Journal of Forensic Mental Health. Below is a summary of the research and findings as well as a translation of this research into practice.

Featured Article | International Journal of Forensic Mental Health | 2017, Vol. 16, No. 2, 130-138

Does the Predictive Validity of Psychopathy Ratings Depend on the Clinical Experience of the Raters?

Authors

Marcus T. Boccaccini, Sam Houston State University
Katrina A. Rufino, University of Houston
Hyemin Jeon, Sam Houston State University
Daniel C. Murrie, Institute of Law, Psychiatry, and Public Policy, University of Virginia

Abstract

We compared the predictive validity of Psychopathy Checklist-Revised (PCL-R; Hare, 2003) scores assigned to 80 sexual offenders by two trained graduate students (using file review) and at least one licensed clinician (using file review and interview). There was more variability among the scores assigned by the licensed clinicians than those assigned by the graduate students, and the scores assigned by licensed clinicians were stronger predictors of future offending than those assigned by the graduate students. Findings raise questions about a possible association between clinical experience and PCL-R score validity and highlight the need for more research in this area.

Keywords

Psychopathy checklist; PCL-R; risk assessment; clinical experience

Summary of the Research

“International surveys show that the Psychopathy Checklist-Revised (Hare, 2003; PCL-R) is among the most widely used measures in risk assessment, and is often introduced as evidence in legal proceedings addressing risk. Meta-analyses tend to support the use of these Psychopathy Checklist (PCL) measures, consistently showing that PCL measure scores are small to moderate-sized predictors of recidivism and misconduct. There is, however, a significant amount of heterogeneity in PCL score predictive effects across studies, with some studies reporting much larger effects than others. Sample and study design features that explain some of this variability include sex, race, location of study (e.g., country), and whether scores are assigned on the basis of file review only or both file review and interview. Existing meta-analyses have not examined whether rater training or experience might also explain variability in predictive effects” (p. 130).

“The PCL-R manual does not require users to have any specific level of prior experience or training, although the PCL-R publisher (MHS) lists the PCL-R as a “Level C” product, which requires an advanced degree in psychology or psychiatry and completion of a graduate-level course in psychometrics. The PCL-R manual does however recommend that users have “extensive relevant experience” with forensic populations, demonstrated by “completion of a practicum or internship in a clinical-forensic setting, or several years of relevant work-related experience” (Hare, 2003, pp. 16–17). In one recent field study examining PCL-R scores in sex offender risk assessment, scores from evaluators who appear to have been more experienced (i.e., had conducted 35 or more assessments) were predictive of future violent offending, while scores from less experienced evaluators were not. It could be that the less forensically experienced clinicians were unduly influenced by crime details (i.e., sexual offenses) or based their scoring on a more global “nice- or bad-guy” impression of the offender, which the PCL-R manual describes as a common scoring bias among novice raters” (p. 130 – 131).

“Ultimately, the extent to which clinical experience and training are related to PCL scoring accuracy is unclear because none of the existing PCL studies have compared predictive effects from more experienced to less experienced raters. It may be that predictive effects from studies with research assistant raters would be even stronger if researchers used experienced clinicians” (p. 131).

“In this study, we compare the predictive validity of PCL-R scores for 80 sexual offenders who had been scored by two graduate student raters—both of whom had ample PCL-R training and several years of supervised clinical experience—and at least one licensed clinician. The 80 offenders were part of a larger sample of 124 offenders in a treatment program for civilly committed sexual offenders. Each offender had been civilly committed as a sexually violent predator in the state of Texas, and each had been evaluated by a state-contracted psychologist or psychiatrist prior to commitment. These evaluators were required by statute to assess for psychopathy (Texas Health and Safety Code §841.023) and all of the evaluators used the PCL-R” (p. 131).

The graduate student raters had a high level of agreement with each other. However, when compared to the licensed clinicians from the original evaluations, there were only low to moderate levels of agreement. The expert evaluators also had greater variability in their scores than the graduate student raters. Even though the graduate students evidenced greater reliability in their scoring, the PCL-R scores assigned by the licensed clinicians were the only ones that predicted offender outcomes.

Translating Research into Practice

“In this study, PCL-R scores from licensed clinicians outperformed those from graduate student raters with MA degrees and several years of supervised clinical experience, suggesting a possible association between clinical experience and the validity of PCL-R scores. This finding seems unexpected when considered alongside the larger body of research examining the association between experience and accuracy in clinical psychology. Recent reviews show that there is only a very small association between experience and accuracy, and graduate students tend to perform no better or worse than practicing clinicians in many contexts” (p. 134).

“One possible explanation for our findings and the recent PCL:YV findings is that PCL-R assessments require what one researcher has described as “skillful inquiry”; the ability to focus information gathering resources on diagnostically relevant issues. It may be that more experienced clinicians are better than less experienced clinicians at knowing what information to collect. In one of the few studies to examine the associations between training, experience, interview strategy, and diagnostic accuracy, graduate students and practicing doctoral level psychologists asked questions to a computer-simulated patient. The computer program used the content of the evaluator’s first question to generate one of 203 possible answers. The evaluators could ask as many follow-up questions as they wanted, each followed by a computer-generated response. Years of experience and level of training were associated with the number of diagnostically relevant questions evaluators asked and diagnostic accuracy, but not the number of non-diagnostically relevant questions (e.g., background, history) they asked. In other words, more experienced evaluators made more accurate diagnoses because they asked more diagnostically relevant questions” (p. 135).

“Of course, the concept of “skillful” inquiry need not apply narrowly, only to interviews, but seems relevant to the task of skillfully identifying relevant details amid lengthy records (a skill that may be especially relevant in this study given that the graduate student raters could not perform interviews). Skillful inquiry may help explain our finding that experienced clinicians showed more variability than graduate students in the scores they assigned. If experienced evaluators ask more diagnostically relevant questions and fewer non-diagnostically relevant questions, their scores should vary more than those from less experienced clinicians due to them picking up on valid indicators of psychopathic traits and being less affected by the types of nondiagnostic information that can lead to score inflation (e.g., offense details)” (p. 135).

“This study adds to a growing body of research that addresses the complexity that underlies real world PCL-R scoring. While there is evidence that the reliability and validity of PCL-R scores may be weaker in the field than in structured research studies, recent findings suggest that scores from some evaluators are more predictive than scores from other evaluators. This study examined clinician experience as one variable that may explain some of the variability in the predictive validity of PCL measure scores. Although findings from our study must be interpreted cautiously due to the more experienced raters having access to more data (i.e., clinical interview), our study adds to the small, but growing empirical literature suggesting that evaluator experience might matter in the context of risk assessment. Our findings, along with those from other recent studies, suggest that it is time to reexamine what we know about the role of experience in the accuracy of forensic assessment. Rather than answering complex questions about experience and accuracy, these exploratory findings should prompt further studies carefully designed to better explore the role of training and experience in assessment” (p. 136).

Other Interesting Tidbits for Researchers and Clinicians

“Although we found an association between experience and predictive validity, findings from this one study alone certainly do not provide conclusive evidence that PCL-R scores from more experienced raters outperform those from less experienced raters. Our findings are limited to a setting in which we have already documented especially large amounts of measurement error in PCL-R scores, and the graduate students were not able to interview offenders. Because those with more experience always had access to more data (i.e., interview), it is impossible to know the extent to which the differences in predictive validity we observed were attributable to rater experience, access to interview data, both experience and access, or some other factor (e.g., other rater characteristics). Thus, it is best to view our findings as preliminary, documenting the need for further research that examines the possible role of experience and conducting interviews in PCL-R and forensic assessment instrument scoring” (p. 134).

Join the Discussion

As always, please join the discussion below if you have thoughts or comments to add!

Authored by Amanda Reed

Amanda L. Reed is a doctoral student in John Jay College of Criminal Justice’s clinical psychology program. She is the Lab Coordinator for the Forensic Training Academy. Amanda received her Bachelor’s degree in psychology from Wellesley College and a Master’s degree in Forensic Psychology from John Jay College of Criminal Justice. Her research interests include evaluator bias and training in forensic evaluation.

New Fordham Risk Screening Tool May Be Able to Accurately Identify Patients In Need Of A Full Violence Risk Assessment

The Fordham Risk Screening Tool may be able to identify patients in need of further violence risk assessment and screen out patients who would receive low risk ratings. This is the bottom line of a recently published article in Law and Human Behavior. Below is a summary of the research and findings as well as a translation of this research into practice.

Featured Article | Law and Human Behavior | 2017, Vol. 41, No. 4, 325-332

Determining When to Conduct a Violence Risk Assessment: Development and Initial Validation of the Fordham Risk Screening Tool (FRST)

Authors

Barry Rosenfeld Fordham University
Melodie Foellmi Fordham University
Ali Khadivi Albert Einstein College of Medicine
Charity Wijetunga Fordham University
Jacqueline Howe Fordham University
Alicia Nijdam-Jones Fordham University
Shana Grover New School for Social Research, New York, New York
Merrill Rotter Albert Einstein College of Medicine

Abstract

Techniques to assess violence risk are increasingly common, but no systematic approach exists to help clinicians decide which psychiatric patients are most in need of a violence risk assessment. The Fordham Risk Screening Tool (FRST) was designed to fill this void, providing a structured, systematic approach to screening psychiatric patients and determining the need for further, more thorough violence risk assessment. The FRST was administered to a sample of 210 consecutive admissions to the civil psychiatric units of an urban medical center, 159 of whom were subsequently evaluated using the Historical Clinical Risk Management-20, version 3, to determine violence risk. The FRST showed a high degree of sensitivity (93%) in identifying patients subsequently deemed to be at high risk for violence (based on the Case Prioritization risk rating). The FRST also identified all of the patients (100%) rated high in potential for severe violence (based on the Serious Physical Harm Historical Clinical Risk Management-20, version 3, summary risk rating). Sensitivity was more modest when individuals rated as moderate risk were included as the criterion (rather than only those identified as high risk). Specificity was also moderate, screening out approximately half of all participants as not needing further risk assessment. A systematic approach to risk screening is clearly needed to prioritize psychiatric admissions for thorough risk assessment, and the FRST appears to be a potentially valuable step in that process.

Keywords

violence, risk assessment, screening, triage, psychiatric patients

Summary of the Research

“The use of structured instruments that can help guide clinical decisions about violence risk is well established. In some settings, a thorough violence risk assessment is a required element of the clinical or forensic evaluation (e.g., determining whether a patient acquitted not guilty by reason of insanity is suitable for release into the community). In many settings, however, the decision as to whether a thorough evaluation of violence risk is necessary is made on a case-by-case basis. Unfortunately, little guidance exists as to how clinicians or administrators should make the decision to utilize a risk assessment instrument. Although tools to assess violence risk are readily available, a thorough violence risk assessment requires considerable time and resources, both of which are in short supply in most clinical settings. [It has been] estimated that a thorough violence risk assessment requires approximately 15 hours for a trained evaluator, far exceeding the resources available in most clinical settings (although a less thorough assessment can certainly be done more quickly). Hence, institutions are likely forced to triage admissions to determine where to focus their resources. To date, the process of determining when or whether to conduct a violence risk assessment has been largely unstructured, untested, and inconsistent, with considerable potential for error” (p. 325).

“Several published instruments have been described as violence screening tools; however, these instruments are more accurately characterized as brief measures that gauge the likelihood of future violence. Instruments such as the V-RISK-10, the Clinically Feasible Iterative Classification Tree, and the Violence Screening Checklist do not provide a true screening function, which is typically conceptualized as casting a broad net to identify a subgroup of individuals who require further examination. Rather, these instruments are intended to efficiently differentiate higher and lower risk individuals, a goal more consistent with triage—a rapid approach designed to determine how to prioritize assessment or intervention resources. In the context of violence risk, a triage approach would determine which patients need a violence risk assessment most urgently. A violence risk-screening instrument, on the other hand, would identify those patients who need further evaluation—that is, a comprehensive violence risk assessment. Hence, screening tools are most effective when they have a very high degree of sensitivity (i.e., identifying all or most of the individuals with a designated condition) and a meaningful level of specificity (i.e., to eliminate a sufficient number of cases as not needing further attention) or the reverse (near perfect specificity and an adequate level of positive predictive accuracy). Other indices of classification accuracy, such as the tool’s positive predictive value, are less useful in the context of screening because they are unduly influenced by the base rate of the condition under investigation” (p. 326).
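The point that positive predictive value is "unduly influenced by the base rate" can be shown with simple arithmetic. The sketch below holds a screen's sensitivity and specificity fixed (93% and 50%, roughly the figures reported for the FRST elsewhere in this summary) and varies only a hypothetical base rate of high-risk patients; the PPV moves substantially even though the screen itself is unchanged.

```python
def confusion_counts(sensitivity: float, specificity: float,
                     base_rate: float, n: int = 1000):
    """Expected confusion-matrix counts when a screen with the given
    sensitivity and specificity is applied to n patients."""
    positives = base_rate * n          # truly high-risk patients
    negatives = n - positives          # truly low-risk patients
    tp = sensitivity * positives       # correctly screened in
    fn = positives - tp                # missed high-risk patients
    tn = specificity * negatives       # correctly screened out
    fp = negatives - tn                # unnecessary full assessments
    return tp, fp, tn, fn

def ppv(sensitivity: float, specificity: float, base_rate: float) -> float:
    """Positive predictive value: P(truly high risk | screened in)."""
    tp, fp, _, _ = confusion_counts(sensitivity, specificity, base_rate)
    return tp / (tp + fp)

# Same screen, two hypothetical base rates of high-risk admissions:
print(round(ppv(0.93, 0.50, 0.05), 3))  # low base rate  -> low PPV
print(round(ppv(0.93, 0.50, 0.30), 3))  # higher base rate -> much higher PPV
```

Because sensitivity and specificity describe the screen rather than the population, they are the appropriate yardsticks for a screening tool; PPV will look poor in any low-base-rate setting no matter how good the instrument is.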

“In response to the need for a systematic approach to violence risk screening and following the recommendations of the New York State/New York City Mental Health-Criminal Justice Panel’s report, our research team created the Fordham Risk Screening Tool [FRST]. Drawing on face-valid content, the FRST is a flow chart designed to help clinicians decide which patients need a thorough violence risk assessment” (p. 326).

“[T]he FRST is an algorithm intended to determine the need for a comprehensive violence risk assessment in psychiatric inpatients. Based on face-valid variables that should raise concerns about the possibility of violence, the FRST classifies an individual as needing a violence risk assessment when the patient displays recent and severe violent behavior, threats or ideation. Recent is operationalized as the preceding 6 months, and severe reflects behavior, threats, or ideation that has or could plausibly result in physical harm that requires medical attention. In addition to the historical FRST risk factors that form the basis of the core algorithm, the tool also elicits clinician ratings of three current risk factors that should be considered: agitation/hostility, paranoia or threat/control override symptoms, and refusal of medication” (p. 327).
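The core algorithm described above can be expressed as a short decision rule. This is only a sketch of the published description, not the tool itself: the actual FRST is a structured flow chart administered by a clinician, and the field names below are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class ScreenInput:
    """Hypothetical inputs mirroring the FRST's core historical factors."""
    violent_behavior: bool           # history of violent behavior
    threats: bool                    # threats of violence
    ideation: bool                   # violent ideation
    months_since_most_recent: float  # recency of the most recent indicator
    severe: bool                     # has or could plausibly result in harm
                                     # requiring medical attention

def needs_full_assessment(s: ScreenInput) -> bool:
    """Flag a patient for a comprehensive violence risk assessment when any
    indicator (behavior, threats, or ideation) is both recent (preceding
    6 months) and severe, per the published description of the algorithm."""
    any_indicator = s.violent_behavior or s.threats or s.ideation
    recent = s.months_since_most_recent <= 6
    return any_indicator and recent and s.severe
```

Note that the three current risk factors the tool also elicits (agitation/hostility, paranoia or threat/control override symptoms, and refusal of medication) sit outside this core screen-in rule; they inform the clinician's consideration rather than the algorithm's output.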

“Given that the goal of screening is to identify cases in which a condition might be present (in this case, high risk of future violence), the criterion that the FRST is designed to predict is a high risk rating on a well-validated risk assessment instrument. This criterion is ideal for a screening instrument, given that a wide range of intervening variables can impact whether an individual actually engages in violence (e.g., intervention, incapacitation). Indeed, one result of a thorough risk assessment is the identification and implementation of effective risk-management strategies, hopefully resulting in a lower risk of violence in the future. It follows that individuals deemed to represent a high risk of future violence should receive aggressive interventions intended to reduce the likelihood of actual violence, whereas individuals deemed to be low risk may require little or no further intervention” (p. 326).

“Study participants were a nonoverlapping, consecutive sample of psychiatric patients (N = 210) admitted to a large, private, nonprofit hospital in New York City. Most participants were brought to the hospital by the police (n = 34, 16.0%) or emergency medical personnel (n = 97, 45.5%), and the vast majority of patients were involuntarily hospitalized (n = 158, 75.2%)… The sample included 120 males (57.1%) and 90 females (42.9%), ranging in age from 18 to 68 years old (M = 37.5, SD = 14.2). Half of the participants were African American (n = 108, 53.5%), whereas 62 (30.7%) identified as Hispanic and 17 (8.4%) as Caucasian, non-Hispanic; 15 individuals (7.4%) were classified as an other racial/ethnic group, and these data were missing for eight individuals (3.8%)” (p. 327).

“Following admission to one of the study institution’s three psychiatric units, patients were interviewed using the FRST by a study research assistant (all of whom were doctoral students in clinical psychology). Approximately 1 week after admission, each patient was also interviewed by a second graduate student using a structured interview developed for this study to rate the HCR-20V3 and generate risk estimates” (p. 327).

“The results of this study provide preliminary support for the FRST, demonstrating a very high degree of sensitivity in identifying high-risk individuals. Moreover, the FRST identified more than 80% of those individuals rated as moderate or high risk. Simultaneously, the screening process guided by the FRST eliminated approximately half of all patients from needing further evaluation regarding risk of violence. Indeed, even when the criterion was expanded to include individuals identified as posing a moderate risk of violence on the summary risk ratings, the FRST retained sensitivity rates of approximately 80%. Thus, this study provides strong support for the FRST in differentiating those psychiatric patients who require a more comprehensive risk assessment from those who do not” (p. 329).

Translating Research into Practice

“Few issues generate as much concern in mental health settings as the potential for violence. Many expect that mental health professionals will be able to identify those individuals who represent a serious risk of violence and thereby prevent or minimize the occurrence of violence. Although important advances have occurred in the field of violence risk assessment, the time and resources needed to adequately assess whether an individual poses a significant risk of violence are substantial. Given this dilemma, there is a clear need for an effective method of screening psychiatric patients to determine where to apply those scarce resources. The FRST is intended to provide this function by utilizing a structured, reliable, and objective approach to identifying those psychiatric patients that are most likely to require a further, more comprehensive assessment” (p. 328-329).

“Given the time and resources needed to conduct a violence risk assessment, it may be impractical to evaluate even half of all admissions to a psychiatric facility… administrators and clinicians will need to determine whether a tool such as the FRST is appropriate for their setting. These decisions involve weighing the risks and benefits of not only sensitivity and specificity rates but also the cost of maintaining the status quo… Of course, the FRST need not (and should not) necessarily be considered a final judgment; decisions not to conduct further assessment can always be revisited if additional information or behavioral changes heightened concerns about possible violence” (p. 329).

Other Interesting Tidbits for Researchers and Clinicians

“Obviously, determining an adequate level of sensitivity for a screening instrument is far more complex than simply generating seemingly strong classification accuracy. Even a single false-negative result (i.e., the failure to identify an individual who poses a high risk of violence) could result in severe consequences, both for the individuals who are the target of the violence as well as the clinicians and administrative staff who might be held responsible for failing to prevent the harm. Ideally, a risk-screening instrument would have perfect sensitivity, but that goal is likely impossible unless virtually all individuals are screened in. Indeed, the results of this study are perhaps as close to perfect accuracy as could be hoped for, albeit not eliminating as many false-positive cases as might be desired. Nevertheless, continued research is necessary to identify additional indicators that might help identify those high-risk individuals who are missed by the FRST” (p. 329).

Join the Discussion

As always, please join the discussion below if you have thoughts or comments to add!

Authored by Amanda Reed

Amanda L. Reed is a doctoral student in John Jay College of Criminal Justice’s clinical psychology program. She is the Lab Coordinator for the Forensic Training Academy. Amanda received her Bachelor’s degree in psychology from Wellesley College and a Master’s degree in Forensic Psychology from John Jay College of Criminal Justice. Her research interests include evaluator bias and training in forensic evaluation.

Sex offenders with “deadly combination” of psychopathy and deviant sexual interests are not more likely to reoffend

Research found no evidence that sexual offenders who possess high levels of both psychopathic traits and deviant sexual interests are at a higher risk for reoffending than other sexual offenders. This is the bottom line of a recently published article in Psychological Assessment. Below is a summary of the research and findings as well as a translation of this research into practice.

Featured Article | Psychological Assessment | 2017, Vol. 29, No. 6, 639–651

Field Measures of Psychopathy and Sexual Deviance as Predictors of Recidivism Among Sexual Offenders

Authors

Paige B. Harris, Sam Houston State University
Marcus T. Boccaccini, Sam Houston State University
Amanda K. Rice, Sam Houston State University

Abstract

Offenders with high levels of both psychopathy and deviant sexual interests are often described as being more prone to recidivate than other sexual offenders, and many forensic evaluators report considering this psychopathy and sexual deviance interaction when coming to conclusions about sex offender risk. However, empirical support for the interaction comes from studies using sexual deviance measures that are rarely used in the field. We examined the ability of Psychopathy Checklist-Revised (PCL-R) field scores and possible field measures of sexual deviance (e.g., paraphilia diagnosis, offense characteristics) to predict sexual recidivism among 687 offenders released after being evaluated for postrelease civil commitment (M follow-up = 10.5 years). PCL-R total scores and antisocial personality diagnoses were predictive of a combined category of violent or sexual recidivism, but not sexual recidivism. Paraphilia diagnoses and offense characteristics were not associated with an increased likelihood of reoffending. There was no evidence that those with high levels of both psychopathy and sexual deviance were more likely than others to reoffend. Although the psychopathy and sexual deviance interaction findings from prior studies are large and compelling, our findings highlight the need for research examining the best ways to translate those findings into routine practice.

Keywords

psychopathy, PCL-R, sexual deviance, risk assessment, sexual offenders

Summary of the Research

“Offenders with high levels of both psychopathy and deviant sexual interests—the so-called ‘deadly combination’ of sex offender traits (Hare, 1999, p. 189)—are often described as more prone to recidivate than other sexual offenders (Hare, 2003; Witt & Conroy, 2008).” (p. 639)
“A recent survey of sex offender risk assessment practices revealed that the two most commonly used field measures of sexual deviance were a documented history of deviant sexual behavior (96%) and a paraphilia diagnosis (83%; Boccaccini et al., 2017). None of the psychopathy and deviance interaction studies have directly examined these commonly used field measures of sexual deviance.” (p. 639–640)
“Although standardized deviance instruments rely, to varying extents, on some of the same diagnosable and documented behaviors used by clinicians in the field, findings from standardized measures may not generalize to the less structured diagnostic and clinical judgment practices evaluators use in the field.” (p. 640)
“Evaluators in field settings have varying levels of training and experience, and there is often no oversight of their assessment and scoring practices. Research has shown that systematic evaluator differences in psychopathy measure scoring tendencies can lead to higher levels of measurement error in field scores than research scores.” (p. 640)
“There are also reasons to expect a significant amount of measurement error in some field measures of sexual deviance, particularly diagnoses. […] Although diagnoses are widely applied in practice, there is no specific training required for assigning diagnoses and almost certainly a large amount of variability in the practice of assigning them.” (p. 640)
“Although many evaluators report using information about the combined pattern of psychopathy and sexual deviance when coming to conclusions about sex offender risk, the interaction study literature is probably smaller and more variable than many evaluators suspect.” (p. 640–641)
“Our goal was to conduct the first field validity study of the sexual deviance measures that evaluators report using in the field and to examine whether offenders with high levels of deviance and psychopathy were more likely to reoffend than other offenders.” (p. 641)
“We obtained PCL-R scores, diagnoses, and information about postrelease sexual and violent offenses from 687 sexual offenders who were released from custody after being evaluated for civil commitment as sexually violent predators (SVP). […] Participants were 687 male sexual offenders who were evaluated for civil commitment as SVPs, but released (i.e., not committed) after their evaluations.” (p. 641)
“We collected information for this study from evaluators’ behavioral abnormality evaluation reports and a copy of the records that the Texas Department of Criminal Justice (TDCJ) provides to the evaluators. These records include information about index and prior offenses, prison disciplinary infractions, and prior testing conducted by TDCJ staff (e.g., Static-99, Personality Assessment Inventory).” (p. 642)
Measures used: the Psychopathy Checklist-Revised (PCL-R), paraphilia and personality disorder diagnoses, a documented history of deviant sexual behavior, and the Screening Scale for Pedophilic Interests (SSPI). Postrelease arrest data were used to assess recidivism rates.
“Using 10 possible field measures of sexual deviance, we found little evidence that offenders with higher levels of deviance were more likely to reoffend than others.” (p. 645)
“There was also no evidence that offenders with high levels of both psychopathy and sexual deviance were more likely to reoffend than other offenders.” (p. 646)
“There are at least three possible explanations for our deviance measure findings. First, it may be that our field measures of sexual deviance do not measure sexual deviance, or do not measure sexual deviance adequately enough for there to be an interaction effect. […] A second possible explanation for our findings is the questionable field reliability of our psychopathy and deviance measures. […] A third possible explanation is that there is something unique about our sample.” (p. 646–648)
“It seems more likely that evaluators rely on paraphilia diagnoses and offense characteristics when coming to conclusions about deviance. […] However, none of these deviance indicators were predictors of recidivism in this study.” (p. 648)
“PCL-R scores and antisocial personality disorder diagnoses were predictive of the combined category of violent or sexual recidivism, but not sexual recidivism. […] Although some have argued that this combined category of violent and sexual arrests may be a better indicator of true sexual recidivism than sexual arrests alone (Rice, Harris, Lang, & Cormier, 2006), the small negative effect for post-release sexual arrests in our sample argues against using antisocial personality disorder diagnoses for predicting sexual recidivism.” (p. 648)

Translating Research into Practice

“This study found that the types of offender and offense characteristics that field evaluators report using as indicators of sexual deviance were not predictive of postrelease sexual offending among a large sample of sexual offenders who underwent risk assessments before release from prison. In this field study, there was no evidence that offenders with high levels of both psychopathic traits and deviant sexual interests—the so-called ‘deadly combination’ of sex offender traits—were more likely to reoffend than other sexual offenders.” (p. 639)
“Although evaluators report using paraphilia diagnoses and documented incidents of deviant sexual behavior as their primary field measures of sexual deviance, we found that no field measure produced the same type of interaction effect documented in prior studies. Thus, at this point, there does not appear to be sufficient empirical support for using the combination of PCL-R scores and these field measures for coming to conclusions about offender risk, at least in the context of SVP evaluations. Evaluators who wish to base their risk assessment practices on documented empirical support should look to studies reporting significant interaction effects (e.g., Harris et al., 2003; Olver & Wong, 2006; Seto et al., 2004), and the deviance measures used in those studies, such as plethysmography or the Violence Risk Scale: Sex Offender version (Wong, Olver, Nicholaichuk, & Gordon, 2004).” (p. 649)

Other Interesting Tidbits for Researchers and Clinicians

“The average age at the time of the evaluation was 42.82 years (SD = 11.44). The number of sexual offense victims ranged from one to eight (M = 2.54, SD = 1.08). Although offenders eligible for SVP civil commitment must have been convicted of at least two contact sexual offenses and be serving a sentence for a sex offense at the time of evaluation (Texas Health & Safety Code, Title 11, Chapter 841, 2000), the qualifying sexual offenses can be against the same victim.” (p. 641)
“We did not include offenders who had been civilly committed because of the intensive monitoring and supervision within the SVP program. […] There is no doubt that the exclusion of committed offenders affected the study sample, but the extent to which their exclusion might explain the difference between our findings and prior studies is less clear. […] Although there would have been more variability in our sample if we had been able to study recidivism among the committed offenders, there was no evidence that our sample differed dramatically from the samples used in other psychopathy and sexual deviance studies. […] We also have no information about participation in sexual offender treatment or postrelease supervision, factors which may help explain the low base rate of recidivism in this study.” (p. 648)
“Although we did not find any evidence of an interaction between antisocial personality diagnoses and sexual deviance for predicting recidivism, this is clearly an area in need of more research.” (p. 648)
“Although our findings add to a growing body of research suggesting weaker reliability and validity in field settings, they also highlight possible areas for growth. Nonfield studies show us that instruments and assessment practices can attain desired levels of reliability and validity, and it is possible for field practices to improve. […] Documenting the current performance of field practices allows us to better understand where we are underperforming, helps us to identify areas in need of improvement, and provides a baseline for future studies aimed at improving in-field performance.” (p. 649)

Join the Discussion

As always, please join the discussion below if you have thoughts or comments to add!

Authored by Kseniya Katsman

Kseniya Katsman is a Master’s student in the Forensic Psychology program at John Jay College of Criminal Justice. Her interests include the forensic application of dialectical behavior therapy, cultural competence in forensic assessment, and risk assessment, specifically suicide risk. She plans to continue her education and pursue a doctoral degree in clinical psychology.