The Evaluating the Effectiveness of Interventions Workshop named several concerns unique to the evaluation of sexual harassment interventions, including “selection bias, emotional reactivity,” and “the fact that sexual harassment often goes unreported” (NASEM, 2021). The hidden nature of sexual harassment makes it difficult to evaluate, as does the fact that it is often minimized or stigmatized. Those who experience it may be “reluctant to respond to a survey on the topic or to admit being a target or victim because sexual harassment can be stigmatizing, humiliating, and traumatizing” (NASEM, 2018).
Although sexual harassment is a complex problem that poses many challenges to successful evaluation, these challenges can be minimized through careful planning and design. For example, sexual harassment is often underreported due to fear of retaliation, feelings of institutional betrayal, and other concerns about the negative consequences of reporting (NASEM, 2018). Because of this, “relying on the number of official reports of sexual harassment made to an organization is not an accurate method for determining the prevalence” (NASEM, 2018). As discussed in the “Survey Methods” section of this paper, rates of sexual harassment can be estimated using validated survey instruments such as the Sexual Experiences Questionnaire (SEQ) (Fitzgerald et al., 1995). These instruments use descriptive, behavior-based questions to gather accurate information about the incidence of experiences that constitute sexual harassment, regardless of whether respondents would label them that way.
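The gap between behavior-based measurement and self-labeling can be made concrete with a small illustration. The sketch below (with hypothetical item names that are not drawn from the actual SEQ) estimates prevalence two ways: the share of respondents endorsing at least one behavior-based item, and the share who self-label their experience as harassment:

```python
def behavioral_prevalence(responses, behavior_items, label_item="self_label"):
    """Estimate prevalence two ways: the share of respondents who endorse
    at least one behavior-based item, and the (often lower) share who
    self-label their experience as harassment."""
    n = len(responses)
    endorsed_any = sum(any(r[item] for item in behavior_items) for r in responses)
    self_labeled = sum(1 for r in responses if r[label_item])
    return endorsed_any / n, self_labeled / n

# Hypothetical responses (item names are illustrative, not actual SEQ items)
responses = [
    {"unwanted_comments": True,  "coercive_pressure": False, "self_label": False},
    {"unwanted_comments": False, "coercive_pressure": True,  "self_label": True},
    {"unwanted_comments": False, "coercive_pressure": False, "self_label": False},
]

behavioral_rate, labeled_rate = behavioral_prevalence(
    responses, ["unwanted_comments", "coercive_pressure"])
# behavioral_rate (2/3) exceeds labeled_rate (1/3): behavior-based items
# capture experiences respondents would not themselves call harassment.
```

In practice the behavior-based rate typically exceeds the self-labeled rate, which is precisely why validated instruments ask about concrete behaviors rather than asking respondents to apply the label themselves.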
One common concern with sexual harassment evaluation is whether recounting experiences of sexual harassment or assault may lead to the retraumatization of the respondent. Some research indicates that carefully designed studies can successfully minimize this risk, leaving participants feeling neutral or positive about participation (Cook et al., 2015; Cromer et al., 2006; DePrince and Chu, 2008). However, even within these studies, there is variation in whether participants experience stress or negative effects at the time of participation. Therefore, researchers should always consult institutional review boards when conducting evaluations that ask about sexual harassment experiences. They should also consider how much information about an individual’s experience with harassment or other stressful life events is necessary to the evaluation and establish clear plans within the research protocol to mitigate any potential harms (McMahon et al., 2022). For example, the protocol could include a straightforward process for participants to access crisis counselor services at any time if they are triggered by the assessment.
Relatedly, evaluation methods that rely on the voluntary participation of those who have experienced sexual harassment (e.g., focus groups as opposed to unobtrusive observation) run into the problem of selection bias. Because focus groups involve sharing potentially sensitive information without a promise of confidentiality, those who feel especially traumatized or ashamed of their experiences may be less likely to participate compared to those who are not distressed by sharing information about their experiences. Because of the potential differences between respondents and non-respondents, the information gathered through these methods may be biased. Similarly, some populations are less likely to participate in research efforts for a variety of reasons. For example, men may be less likely to participate in research on sexual harassment
compared to women because they may not feel that the issue pertains to them. Black women may be less likely to participate in research performed by their organizations compared to white women due to the compounding effects of institutional betrayal based on both race and gender.
To address these biases, researchers sometimes oversample populations that are expected to have lower response rates, advertise the research activities through trusted community members who can act as liaisons, and diversify their outreach methods (e.g., using multiple social media platforms). Researchers can also use multiple methods of follow-up (e.g., email, phone calls, physical mail) to generate higher participation rates.
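When groups respond at different rates, responses are typically reweighted so that estimates reflect the population rather than the sample. Below is a minimal sketch of post-stratification weighting; the group names, shares, and counts are hypothetical, and real evaluations would use established survey-weighting software:

```python
def poststratification_weights(population_shares, sample_counts):
    """Compute a per-respondent weight for each group so that weighted
    estimates match known population shares despite unequal response rates."""
    n = sum(sample_counts.values())
    return {g: (population_shares[g] * n) / sample_counts[g]
            for g in sample_counts}

# Hypothetical: men are 50% of the population but only 20% of respondents.
weights = poststratification_weights(
    population_shares={"men": 0.5, "women": 0.5},
    sample_counts={"men": 20, "women": 80},
)
# weights["men"] == 2.5 and weights["women"] == 0.625: each man's response
# counts for more, offsetting the lower male response rate noted above.
```

The weighted group totals (20 × 2.5 and 80 × 0.625) both equal 50, matching the assumed population shares.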
Certain evaluation methods may be considered research and therefore must meet the legal and ethical requirements for human subjects research. This includes, but is not limited to, institutional review board (IRB) oversight and training requirements on human subjects research, responsible conduct of research, and research with vulnerable populations. IRBs are charged with independently reviewing proposed research applications to protect participants from potential harm (Grady, 2015). If an institution does not have its own IRB, that institution could coordinate with another institution that has one or submit proposals to a commercial IRB (Rice, 2008).
For research on efforts related to sexual harassment, IRBs are likely to be particularly concerned with mitigating the risk of retraumatization; however, they may not have specific expertise on sexual harassment or gender-based violence. Involving subject-matter experts in drafting the protocol would help ensure adequate protections for participants, including the following:
Thoughtful preparation of informed consent documents is important, with unambiguous, detailed information about reporting requirements related to completion of the evaluation instrument, reporting resources in general, crisis counseling and mental health resources, and participant privacy and confidentiality.3 Any obligation the researcher has to report allegations or knowledge of sexual harassment should be disclosed as part of the informed consent process so that participants can choose whether they want to report or complete the evaluation instrument (McMahon et al., 2022). Providing training in university and federal reporting obligations for everyone on the research team would help ensure adherence to these protective measures (McMahon et al., 2022). Finally, if a researcher can obtain an exemption from mandatory reporting requirements, the greater confidentiality that can be offered may encourage more engagement and openness from participants.
At all times, but especially when working with groups who are underrepresented within a campus community, it is important to consider how best to protect participant privacy. Evaluation methods have different privacy and confidentiality concerns. For example, it is not possible to control the information shared in a group setting (e.g., focus groups) because the participants may tell others about what was discussed, even if asked not to. However, even with interview or survey data that is not disclosed in a group setting, it may be difficult to completely de-identify the data depending on how it is collected and stored. A combination of basic demographic questions may lead to the identification of a participant or group of participants, specifically those who are underrepresented. For example, if there are only six Black, transgender students at an institution, and public datasets for a survey that collected data on race and gender link respondents’ answers to those demographic fields, it would be easy for the public to determine the identity of the respondents. For these reasons, the maintenance of participant privacy and confidentiality would need to be addressed at every stage of the evaluation process, from data collection to storage to dissemination. Furthermore, public institutions may have different obligations for data transparency compared with private institutions, depending on the reporting requirements tied to state-funded activities.
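The re-identification risk in this example can be screened for mechanically before any data release. The sketch below is a hypothetical illustration of a k-anonymity check (the field names and the threshold k are assumptions, not part of any cited protocol): it counts how many respondents share each combination of quasi-identifying demographic fields and flags combinations held by fewer than k people:

```python
from collections import Counter

def flag_small_cells(records, quasi_identifiers, k=5):
    """Return demographic combinations shared by fewer than k respondents.

    Releasing rows in these cells risks re-identification, because a
    combination held by only a handful of people (e.g., the six students
    in the example above) effectively names them.
    """
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return {combo: n for combo, n in counts.items() if n < k}

# Hypothetical survey records (field names are illustrative assumptions)
records = [
    {"race": "Black", "gender": "transgender", "year": "sophomore"},
    {"race": "white", "gender": "woman", "year": "junior"},
    {"race": "white", "gender": "woman", "year": "junior"},
]

risky = flag_small_cells(records, ["race", "gender"], k=5)
# Both combinations are held by fewer than 5 respondents here, so both
# would be flagged for suppression or aggregation before public release.
```

Flagged cells would then be suppressed or aggregated (e.g., by collapsing demographic categories) before the data is shared publicly.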
Data ecosystems provide worthwhile opportunities for linking data and understanding services and interventions. Before sharing or centralizing data, it is important to consider the ethical uses of the data collected. Was it collected as part of a research project? Is the proposed data sharing or data centralization described in the research protocol? Was the potential for data sharing disclosed to participants when the data was collected? Is it publicly available? Could data sharing harm any of the participants involved in any way? What procedures and processes are in place to protect the confidentiality of the data? This is especially important to consider when the data cannot be fully anonymized, such as interview or exit survey data.
An important first step in evaluating an intervention is considering the composition of the evaluation team. Some researchers, such as McMahon et al. (2022), recommend involving anyone who might use the data or services from the very start of the project. Individuals with expertise in diversity, equity, and inclusion are also important to include, as are representatives from any special populations within the campus
___________________
3 For more information on the components of an informed consent procedure, see Chapter 8 of Collecting Qualitative Data: A Field Manual for Applied Research (Guest, G., E. Namey, and M. Mitchell, 2013).
community. Applying an intersectional4 lens at the beginning of the project will help ensure that the intervention and evaluation processes address the needs and experiences of groups traditionally excluded (socially, politically, and institutionally) from both the community and research itself.
When developing interventions and designing an evaluation, it is important to consider whom the evaluation team represents and how that could potentially influence the questions, approach, and results. Specifically: What population is being researched? Is the population of interest represented in the design and evaluation team? Are any forms of bias potentially being introduced through the design, survey methodology, or questions? Do any groups need to be engaged in a different way?
It is also important to involve individuals with expertise in the method and analysis that will be used for evaluation. In particular, local faculty or research personnel may be well suited for these roles, especially those involved in evaluation through the education and medical fields as well as implementation science. Graduate students and postdoctoral researchers may also be a source of expertise who could benefit from the additional research experience. Allowing faculty, graduate students, and other research personnel to publish the results of these evaluations not only provides motivation for them to participate, as publishing is a critical aspect of advancement in their careers, but also contributes to the breadth of knowledge available about the effectiveness of various interventions.
Institutions lacking specific expertise might consider partnering with regional or national institutions for data collection and analysis (Driver-Linn and Svensen, 2017). It is neither feasible nor sustainable for each institution to maintain in-house expertise inclusive of all potential evaluation methods. At some institutions, the Office of Institutional Research may be able to help with collection, analysis, or synthesis of data, but these resources may not be available on all campuses. Another approach to sharing expertise is through the creation of research networks or research centers5 for the study of sexual harassment prevention and response efforts. Examples of single-institution, multidisciplinary research groups are the University of New Hampshire’s Prevention Innovations Research Center, Johns Hopkins University’s Center for Injury Research and Policy, and Wellesley College’s Justice and Gender Based Violence Research Initiative. The National Violence Against Women Prevention Research Center is a multi-institution research initiative sponsored by the Centers for Disease Control and Prevention (CDC).
These centers and networks can bridge research and applied expertise across institutions and create opportunities for shared project development, implementation, and evaluation. They can also support staff
___________________
4 As coined by Kimberlé Crenshaw (1989) and described by Kleinman and Thomas (2023), “intersectionality describes how the overlapping nature of social categories (such as being a woman and being Asian) combine to create an experience, such as advantage or disadvantage, that is greater than just the sum of the experience for those with only one identity.”
5 One example of a research network from a different field is the Pediatric Emergency Care Applied Research Network (PECARN), a national, multi-institutional network created to address the small sample sizes of single-institution studies in pediatrics. PECARN is organized into six nodes, each representing three to four children’s hospitals in a variety of settings, ranging from rural and community to urban and academic, located across the country. In addition to increasing sample size through a multi-institutional model, this design also supports demographic and geographic diversity, which improves research generalizability. Data cleaning, analysis, project management, and study oversight are centralized through a Data Coordinating Center to ensure uniformity, maintain high-quality data, and monitor safety. The Data Coordinating Center maintains a centralized data repository and also guides study implementation at each site to ensure rigorous research standards are applied uniformly across the network. Although there is no similar model for studying the effectiveness of sexual harassment interventions in higher education, the field may consider some of the lessons learned from PECARN.
to centralize data collection and analysis, coordinate regulatory review, oversee protocol adherence, and maintain a data clearinghouse. The cross-institutional nature of research centers and networks can increase generalizability, sample size, and power to detect differences. Lessons learned from differences or barriers to protocol implementation at each institution can also inform program development and design.
Another overarching concern with evaluating interventions is the rigor of the design process and the resulting quality of the data. Rigorous designs are transparent, systematic, and adhere to methodologies that have been established as valid,6 reliable, and able to control for confounding factors. In fields such as medicine, randomized controlled trials (RCTs) are considered the gold standard of rigorous experimental design because they can establish a causal relationship between the intervention and the outcome of interest. RCTs are the cornerstone of the drug development field and often employ quantitative methods such as the analysis of biological specimens.
Research on sexual harassment, by contrast, is rarely able to utilize RCTs (Perry, 2020).7 This is partially because it would be unethical to expose participants to traumatic or stressful experiences with little to no individual benefit to be gained from participation. However, it is also due to the complexity of the problem of sexual harassment and the broad range of interventions that may be employed in addressing sexual harassment.
Some sexual harassment interventions, such as Cornell University’s Intervene program (see Box 1), can be rigorously evaluated using the concepts of RCTs.
___________________
6 Validity refers to the ability of a tool to accurately measure what it intends to measure, and reliability refers to the consistency of the tool’s measurements across different times and contexts (Sullivan, 2011).
7 “Complex social programs often do not lend themselves to the most rigorous experimental designs; they are difficult to standardize due to human delivery agents and to randomly assign and manipulate, and often more than one program can influence outcomes of interest. As a result, it is difficult to isolate the net effects of the entire program on targeted and measurable outcomes over and above the effects of other contextual factors. This reality runs counter to assumptions of a traditional evaluation approach, in which a standardized treatment can be experimentally manipulated and randomly assigned to a clearly defined target group, resulting in an indisputable causal impact on targeted outcomes” (Perry, 2020).
Although it is unlikely that researchers will be able to employ an RCT design for sexual harassment interventions (Perry, 2020), they can still prioritize rigor by carefully matching the goal of the intervention to the appropriate set of research questions and corresponding evaluation methods. Key to making these connections is first establishing a theory of change for the intervention itself. “Theory of change explains how each activity contributes to a chain of outcomes, each outcome being built upon the prior outcome(s), such that the final intended impact is observed” (Mahoney et al., 2022). Figure 8 in Environmental and Situational Strategies for Sexual Violence Prevention shows a modified logic model that can be useful for defining the expected short-term, intermediate, and long-term outcomes of any particular intervention (Mahoney et al., 2022). Other guides to creating a theory of change include Futures Without Violence’s “Theory of Change” and the Annie E. Casey Foundation’s four-part series “Developing a Theory of Change: Practical Theory of Change Guidance, Templates and Examples.”
However, “more rigorous designs, which include use of control groups and pretests… are not always necessary depending on the purpose of the evaluation” (McLinden, 1995, as cited in Perry, 2020). Argonne National Laboratory’s evaluation of its Core Values Shout-Outs Program is an example of an evaluation whose purpose called for a less rigorous design (see Box 2).
Overall, the process of alignment is intended to promote transparency and systematic design, even in a field where gold standards of rigor have not yet been established or are not strictly necessary. The figure below (Figure 1) illustrates the connection between intervention goals, research questions, and evaluation methods.
Intervention goals, research questions, and evaluation methods require alignment because the kinds of methods that most appropriately gather information about one type of outcome may not provide useful information for other outcomes. For instance, two outcomes discussed in the Sexual Harassment of Women report are the prevalence of sexual harassment and the perception of institutional betrayal (NASEM, 2018). The methods and tools that will best answer research questions around prevalence (e.g., validated surveys) will not necessarily be the same tools that can best assess whether the community feels betrayed by their institution (e.g., interviews and focus groups).
Given the pervasive and harmful nature of sexual harassment, we need, as a field and a society, real metrics of what is effective: rigorous evidence derived from high-quality survey data and strong designs that enable us to understand changes in impact. The methods described in this paper are not presented as equally effective strategies, nor are they intended to always be used individually. In most cases, a combination of methods will allow for a more comprehensive understanding of the data. The most appropriate method or combination of methods will depend on the intervention being evaluated and the research questions associated with the evaluation itself.