Journal of Research on Educational Effectiveness

Taylor & Francis Group

ISSN: 1934-5747 (Print) 1934-5739 (Online)

Building Student Ownership and Responsibility: Examining Student Outcomes from a Research-Practice Partnership

Marisa Cannata, Christopher Redding & Tuan D. Nguyen

To cite this article: Marisa Cannata, Christopher Redding & Tuan D. Nguyen (2019): Building Student Ownership and Responsibility: Examining Student Outcomes from a Research-Practice Partnership, Journal of Research on Educational Effectiveness, DOI: 10.1080/19345747.2019.1615157


Published online: 31 Jul 2019.




Building Student Ownership and Responsibility: Examining Student Outcomes from a Research-Practice Partnership

Marisa Cannata, Christopher Redding, and Tuan D. Nguyen

ABSTRACT

This article is situated at the intersection of two trends in education research: a growing emphasis on the importance of co-cognitive traits and the emergence of research-practice partnerships to more effectively scale effective practices. Our partnership focused on building student ownership and responsibility for their learning, which means creating school-wide practices that foster a culture of learning and engagement among students. We find no evidence of an overall relationship between the student ownership and responsibility innovation and student outcomes that is robust to model specification. However, when results are separated by school, two schools each saw increased student grades and fewer absences that persisted across both years of implementation. We also use qualitative data about the quality of implementation to understand how school-level engagement in the improvement partnership may be related to observed outcomes.

KEYWORDS

student ownership and responsibility; co-cognitive traits; research-practice partnership

Despite decades of ambitious high school reform, substantial evidence demonstrates that reforms are inconsistently implemented and struggle to impact student learning (Datnow, Hubbard, & Mehan, 2002; Mazzeo, Fleischman, Heppen, & Jahangir, 2016). In response, there has been a proliferation of new approaches to achieving school improvement at scale, such as improvement science and design-based implementation research (Bryk, Gomez, Grunow, & LeMahieu, 2015; Cohen-Vogel et al., 2015; Fishman, Penuel, Allen, & Cheng, 2013). While these methods differ in specifics, they share an assumption that improvement at scale comes not from replicating a proven program, but from practitioners and researchers working together with iterative, continuous improvement approaches to design and implementation (Bryk et al., 2015; Cohen-Vogel et al., 2015; Fishman et al., 2013).

These new approaches to scale reflect an increasing demand for a new type of research and design infrastructure such as research-practice partnerships (RPPs) (Tseng & Nutley, 2014). RPPs are long-term, mutualistic, and intentionally structured collaborations between researchers and practitioners that bring original, rigorous research to bear on a particular problem of practice (Coburn, Penuel, & Geil, 2013). RPPs can take multiple forms, such as research alliances, design-research projects, and Networked Improvement Communities (NICs), the latter of which has particularly grown in prominence due to work by the Carnegie Foundation for the Advancement of Teaching to advance improvement science and continuous improvement approaches (Coburn et al., 2013; Cohen-Vogel et al., 2015). With an increasing array of funding sources emphasizing RPPs, such as the Institute of Education Sciences (IES), National Science Foundation, Spencer Foundation, and William T. Grant Foundation, research on RPPs is accumulating. For example, innovations developed through research-practice partnerships have been tested in rigorous efficacy studies with desirable outcomes (Booth et al., 2015; Sowers & Yamada, 2015). There is also research on the challenges of engaging in RPPs and the internal structures that support RPP work (Coburn & Penuel, 2016; Conaway, Keesler, & Schwartz, 2015; Lopez Turley & Stevens, 2015), including the role of rapid-cycle continuous improvement processes in NICs (Hannan, Russell, Takahashi, & Park, 2015; Russell et al., 2017; Tichnor-Wagner, Wachen, Cannata, & Cohen-Vogel, 2017).

CONTACT Marisa Cannata, Vanderbilt University, PMB 414, 230 Appleton Place, Nashville, TN 37203, USA

Peabody College of Education and Human Development, Vanderbilt University, Nashville, Tennessee, USA; School of Human Development and Organizational Studies in Education, University of Florida, Gainesville, Florida, USA

Tuan D. Nguyen is now affiliated with the College of Education, Kansas State University, Manhattan, Kansas, USA.

A version of this paper was presented at the annual meeting of the Association of Education Finance and Policy in Denver, CO, March 17-19, 2016.

Supplemental data for this article is available online on the publisher's website.

© 2019 Taylor & Francis Group, LLC

While the research base on RPPs in general, and NICs in particular, is growing, so is the recognition that improving these partnerships requires greater attention to how specific dynamics of RPPs are related to the outcomes they achieve (Coburn & Penuel, 2016). That is, in the words of IES director Mark Schneider (2018, para. 8), research is needed “[identifying] the functions, structures, or processes that work best for increasing the impact of RPPs.” In this article, we report evidence of student outcomes from a multiyear partnership within one large, urban district. In this partnership, we established an NIC with a shared theory of improvement in which three schools co-developed practices to improve student ownership and responsibility using a continuous improvement process. We evaluate evidence by assessing changes in grades, course failures, discipline, and attendance. We adopt a mixed-methods framework to describe both evidence of student outcomes and the features of the RPP and improvement approach that may have shaped these outcomes. We seek to answer two research questions:

1. To what extent did the co-developed practices reduce students’ disciplinary infractions and the number of failed courses and improve student grades and attendance?

2. How do specific features of our improvement approach (shared theory of improvement, rapid-cycle testing, and research-practice partnership) explain differences in implementation quality and observed outcomes?

We begin by describing the three core features of our improvement approach and situate those features within the broader literature on NICs and RPPs. We then detail the specific context of our partnership and how these improvement features were enacted. Next, we describe the data used for this study, as well as the quantitative and qualitative methods. We then present our results, first providing quantitative evidence on four student outcomes: attendance, discipline, grades, and course passing. We then describe how each school enacted the improvement approach and their level of implementation to explain school-level differences in outcomes.

Figure 1. Conceptual framework of how networked improvement communities shape implementation and student outcomes. [Diagram panels: Partnership-level Improvement Features (building theory of improvement; developing capacity for rapid-cycle testing and measurement infrastructure; building norms and capacity to engage in partnership; leading and organizing the network); School-level Improvement Features (understanding theory of improvement; rapid-cycle testing; capacity to engage in partnership); Implementation of deep and sustained change; Short-term outcome indicators; Long-term outcomes.]

Networked Improvement Communities and Continuous Improvement

Several models of RPPs have been proposed to address the contextual factors that shape implementation and scale up (Coburn et al., 2013). Understanding how the structures and processes of RPPs contribute to student outcomes requires unpacking the different features of RPPs and theorizing how those features contribute to positive outcomes. Because our partnership was structured as a Networked Improvement Community, we draw heavily from four frameworks about RPPs and NICs (Coburn et al., 2013; Cohen-Vogel, Cannata, Rutledge, & Socol, 2016; Henrick, Cobb, Penuel, Jackson, & Clark, 2017; Russell et al., 2017). Looking across these frameworks and the literature on scaling up school reform, we identify three core features of NICs that seem particularly poised to shape the successful implementation and scaling of improvement initiatives: deep understanding of a theory of improvement, rapid-cycle testing, and building educator capacity to engage in the network and lead improvement in their school. Further, we argue that the ways in which NIC work shapes outcomes depend on both partnership-level activities and school-level activities. Frameworks to assess RPPs attend both to indicators that reflect the partnership overall, such as communication processes and research infrastructure, and to the work of individual members, such as capacity to engage in new roles. Figure 1 presents a visual representation of our framework and illustrates how both school-level and partnership-level features shape implementation and outcomes. We focus first on the school-level improvement features and how they contribute to successful implementation at scale.


Understanding the Theory of Improvement

RPPs are defined, in part, by a shared focus on a particular problem of practice (Coburn et al., 2013), and NICs are distinguished by their use of a theory of improvement. Russell and colleagues describe the importance of this shared theory of improvement by noting that “theory grounds the collaborative work of the NIC by specifying the problem and aim that the NIC is pursuing and unpacking the systemic context that produces the problem” (Russell et al., 2017, p. 17). Importantly, the theory of improvement is not devoid of context, but is grounded in the context in which the improvement work is occurring (Cohen-Vogel, Cannata, Rutledge, & Socol, 2016). An NIC’s theory of improvement reflects both expertise from the research community and an understanding of the system that is producing the current problem (Henrick, Cobb, Penuel, Jackson, & Clark, 2017). The development of the theory of improvement that connects the ultimate goal with a shared understanding of the drivers that contribute to that goal is critical to successfully launching an NIC (Russell et al., 2017).

Shared understanding of the theory of improvement is important for achieving success at scale because many school reform efforts result in superficial changes in classroom practices or grafting new practices onto old routines without shifts in deeper pedagogical principles (Elmore, 1996; Spillane, Reiser, & Reimer, 2002; Supovitz, 2008). Deep instructional change requires altering teachers’ beliefs about how students learn, expectations for students, and the norms of interaction in schools and classrooms (Coburn, 2003). Educators are often not aware of how the theories of learning embedded in reform initiatives may conflict with their own unstated theories of learning, which then creates challenges for implementation (Hatch, 2002). On the other hand, when educators have a deep, internalized understanding of the ideas embedded in a reform, they can apply them in situations where the reform itself does not offer explicit guidance (Honig, Venkateswaran, McNeil, & Twitchell, 2014). Attending to how educators understand the theory of improvement underlying the reform initiative is even more important as NICs shift from a focus on fidelity of implementation to maintaining integrity with adaptive integration (Cannata & Rutledge, 2017; Hannan et al., 2015). Successful adaptations hinge on whether educators understand not only the innovation practices themselves, but the theory behind them (Dede, 2006; Thompson & Wiliam, 2008).

Rapid-Cycle Testing

NICs seek to engage educators in collecting data for rapid-cycle improvement efforts and build up to larger scale change through continuous improvement (Coburn et al., 2013; LeMahieu, Grunow, Baker, Nordstrum, & Gomez, 2017; Russell et al., 2017). Bringing educators and researchers together to collaboratively design, study, and iterate on effective practices as educators adapt them to their specific contexts is a common element of a variety of emerging approaches to collaborative reform initiatives (Cohen-Vogel et al., 2015). The Plan, Do, Study, Act (PDSA) cycle is one common approach that requires identifying the aim of a particular improvement, testing the change idea, and monitoring whether the observed changes led to the intended improvement (Langley, 2009). Rapid-cycle testing should be iterative, as results from an individual test can lead to either revising and testing the change again, or deciding to scale it into more diverse contexts. Rapid-cycle testing should also be problem-focused and tied to the theory of improvement (Langley, 2009).

Rapid-cycle testing is an important component of this improvement paradigm because, while there are many innovations that have positive outcomes in rigorous efficacy trials, it is less clear whether these innovations are always usable for schools (Coburn & Penuel, 2016). Reforms that are not consistent with the local organizational context, no matter how effective they may be in controlled trials, face serious difficulties with implementation (Bodilly, 1998; Bryk et al., 2015; Elmore, 1996; Fullan, 2001). Further, educational implementation research has long noted that schools adapt innovations to focus on their unique needs, sometimes to ill effect (Datnow & Park, 2009; Siskin, 2016). Continuous improvement approaches to scale can bring discipline to the adaptation process as school teams share evidence of what they have accomplished with others focused on the same problem (Cannata, Cohen-Vogel, & Sorum, 2017; LeMahieu et al., 2017). Rapid-cycle testing can also address another challenge to scaling up reform (buy-in and ownership) because local practitioners are involved in developing and testing the innovation (Cohen-Vogel et al., 2016; Datnow, Hubbard, & Mehan, 2002). This attention to local context is particularly important for achieving scale because innovations must be able to fit within contexts that vary greatly while coping with change, promoting ownership, building capacity, and enabling effective decision-making (Cohen et al., 2013). At the same time, NIC members must engage in rapid-cycle testing and adaptive integration through a disciplined process and have sufficient capacity (Russell et al., 2017). This disciplined approach to improvement ensures that educators are making evidence-based decisions about which practices to implement and how to implement them, as multiple forms of evidence are examined to test and refine the practices in their context.

Educator Capacity to Engage in Partnership

Research on school reform demonstrates that successful implementation of change initiatives requires some existing capacity at the school level (Hatch, 2002). Scholarship on RPPs makes clear that engaging in an RPP requires new roles for both researchers and practitioners. For example, Coburn et al. describe a core component of NICs as having a “primary focus on developing local capacity” (2013, p. 13). Indicators of this dimension include whether members “develop professional identities” consistent with their work, “assume new roles and develop the capacity to conduct partnership activities,” and experience “change in the practice organization’s norms, culture, and routines around the use of research and evidence” (Henrick et al., 2017, p. 25).

The delineation of these different dimensions of capacity reflects the need for human, social, and cultural capital to engage in an NIC (Rubin, Nguyen, & Cannata, 2015). At both an individual and organizational level, schools need to have sufficient human capital, including the knowledge, skills, resources, and personnel to engage in the work expected of them (Bryk, Sebring, Allensworth, Luppescu, & Easton, 2010; Durlak & Dupre, 2008; Spillane et al., 2002). At the organizational level, a productive social infrastructure, such as a history of collaboration in the school, stability of faculty, and trust, allows individuals to access social capital to support implementation (Bryk et al., 2010; Murphy & Torre, 2014; Redding, Cannata, & Taylor Haynes, 2017). Finally, capacity in an NIC involves normative components because educators must adopt new professional identities, suggesting that new forms of cultural capital need to be established (Henrick et al., 2017; Russell et al., 2017).

Partnership-Level Improvement Features

School improvement research suggests that understanding how these improvement features are enacted at the school level helps to predict the quality of implementation and ultimately student outcomes. Yet, within the context of a collaborative reform model, school-level features are supported by partnership-level dynamics. For example, building the theory of improvement based on evidence from participating sites is a core element of what the NIC’s hub organization should be doing when launching the NIC (Russell et al., 2017). The other core elements are helping NIC members learn how to use improvement research methods and building the analytic infrastructure (Russell et al., 2017), both of which support school-level engagement in rapid-cycle testing. In addition to supporting the school-level improvement features, the partnership level should also be focused on leading and operating the network as a whole, including establishing collective norms, maintaining the partnership’s focus on learning, and providing evidence that connects classrooms to district-level processes (Coburn et al., 2013; Henrick et al., 2017; Jaquith, 2017; Russell et al., 2017).

The presence of both school- and partnership-level contributions to NICs brings methodological challenges when assessing the impact of the innovations developed through them. First, as all schools benefit (or suffer) from the partnership-level work, there is no way to distinguish the effect of this overall partnership from the specific practices that are developed. In other words, the treatment comprises both the partnership itself and the innovation design. That said, the school-level improvement features can be assessed separately at each school. Second, while the innovation was rooted in a common theory of improvement, the use of rapid-cycle testing and adaptive integration resulted in differences in how the innovation was implemented across these schools. Consequently, to provide lower and upper bounds of the effects of the innovation on course failure, grades, days absent, and number of disciplinary infractions, we adopt two estimation strategies, a gains model and a difference-in-differences estimation strategy, to compare student outcomes among the innovation schools to the remaining schools in the district (Angrist & Pischke, 2009). To address the concern with school-level variation, we examine both the overall effect of participating in this collaborative reform initiative and differences among the innovation schools to examine how each school’s enactment of the improvement features is associated with student outcomes. In the next section, we describe the particular context of the NIC from which the data are drawn.
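As a rough illustration of the difference-in-differences logic named above, the following sketch computes the simplest two-group, two-period estimate: the change in an outcome among innovation schools minus the change among comparison schools. It is not the authors' specification, which would typically add student covariates and standard errors clustered by school. The records, the function names `group_mean` and `diff_in_diff`, and every number are hypothetical and invented purely for illustration.

```python
from statistics import mean

# Hypothetical toy records: (school_type, period, days_absent).
# These values are invented for illustration and are NOT from the study.
records = [
    ("innovation", "pre", 10.0), ("innovation", "pre", 12.0),
    ("innovation", "post", 8.0), ("innovation", "post", 9.0),
    ("comparison", "pre", 11.0), ("comparison", "pre", 13.0),
    ("comparison", "post", 11.0), ("comparison", "post", 12.0),
]


def group_mean(rows, school_type, period):
    """Mean outcome for one school type in one period."""
    values = [y for s, p, y in rows if s == school_type and p == period]
    return mean(values)


def diff_in_diff(rows):
    """Two-group, two-period difference-in-differences estimate."""
    change_innovation = (
        group_mean(rows, "innovation", "post")
        - group_mean(rows, "innovation", "pre")
    )
    change_comparison = (
        group_mean(rows, "comparison", "post")
        - group_mean(rows, "comparison", "pre")
    )
    # The comparison-group change stands in for what would have happened
    # to innovation schools without the innovation (parallel trends).
    return change_innovation - change_comparison


estimate = diff_in_diff(records)
print(f"DiD estimate (days absent): {estimate:+.2f}")
```

In this toy example a negative estimate would mean absences fell more in the innovation group than in the comparison group. The interpretation rests on the parallel-trends assumption, which the sketch does not test.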

Research-Practice Partnership: Developing the SOAR Innovation

This partnership began with intensive study of higher- and lower-performing high schools in the district, and identified Student Ownership and Responsibility (SOAR) as the key differentiating feature. The NIC was launched in 2012-13, when the SOAR design team was established. The SOAR design team comprised about 25 individuals, including teachers, school administrators, central office administrators, program developers, and researchers. The first author, whose background is in studying school reform and educational policy and who led the research that identified SOAR, was part of the research team and on the district design team. The central office administrators included the deputy superintendent, who oversaw teaching and learning, leadership, and student support; the data and accountability director; curricular specialists; and advanced academics. School-level members included assistant principals and teachers across subject areas. A retired principal served as a coordinator who supported logistics and acted as a liaison to both the research team and schools. The deputy superintendent and coordinator recommended central-office members of the design team. School-level members were recommended by their principal. In 2012-13, the design team met for two days every month to examine the initial research, conduct needs analysis related to SOAR, engage in capacity-building activities, and design a SOAR prototype aimed at creating norms and school-wide practices that foster learning and engagement among students (see Table 1 for timeline of design activities and meetings). These meetings were organized around two connected learning goals: learning about implementation and scale and learning about SOAR and its enactment within the district context. Thus, the district design team spent time building the theory of improvement around SOAR while also building capacity and norms to engage in the NIC.

Table 1. Timeline of partnership activities.

| Phase | Time period | District design team activity | School SOAR team activity | Research activity |
|---|---|---|---|---|
| Design and Development | Winter/Spring 2013 | Develop initial innovation prototype; monthly two-day meetings | n/a | Observation of design team meetings |
| Piloting | 2013-14 | Monthly meeting to oversee PDSA cycles and work with SOAR teams to develop innovation | Monthly two-day network meetings; engage in PDSA cycles to develop the innovation; initial teacher professional development; biweekly check-in meetings | Lead PDSA trainings and facilitate cycles; observations of network meetings; one research visit to each school |
| | | Quarterly meetings to plan for scale out and sustainability | Implementation of fully developed innovation; continued engagement in PDSA cycles; quarterly network meetings to share learning; monthly check-ins; teacher professional development approximately monthly | Support PDSA cycles; two research visits to each school; observations of network meetings |
| Scale out | | District offices gradually assume responsibility for facilitating network and supporting work in schools; quarterly meetings; four schools join network | Year 2 of full implementation in innovation schools; continue to engage in PDSA and share learning in quarterly meetings; professional development as necessary | Support PDSA cycles; one research visit to each school; observations of network meetings |

The theory of improvement around SOAR was grounded in both the specific findings from this district and the broader literature on the importance of co-cognitive student attributes such as efficacy, problem solving, and academic and behavioral engagement (Dweck, 2007; Fredricks, Blumenfeld, & Paris, 2004; Schunk, 1991). This focus on changing students’ mindsets and providing them problem-solving skills to engage in academic work builds on a robust empirical research base on co-cognitive factors (Farrington et al., 2012). Specifically, SOAR focused on building a student growth mindset and developing problem-solving skills to improve student engagement (Blackwell, Trzesniewski, & Dweck, 2007).

With the districtwide team outlining the theory of improvement, each innovation school established a SOAR team in 2013-2014 to pilot practices and further develop them within their context. SOAR teams had six to eight members who were almost all teachers (one school had an assistant principal on the team). Each school team had a teacher in the Advancement Via Individual Determination (AVID) program, which was an existing district program considered to be related to SOAR. The SOAR team was responsible for leading implementation in their school, often by working with the administration, developing SOAR practices, using PDSA, and providing training for other teachers in the school to enact SOAR practices. During this year, school teams met as a whole group once a month to deepen their knowledge of SOAR, learn how to engage in rapid-cycle testing, and share what they were learning through their PDSA cycles. The district design team continued to provide overall leadership for the network. Specifically, they organized trainings around PDSA, determined the capacities the network members needed, designed learning activities around those areas, and facilitated network sharing of what each school was learning. This contributed to revising the shared practices to develop SOAR in students. Through this development process, school teams were also given leeway in customizing the common district design to their particular context. While each school design team implemented these common elements of the design, its delivery varied in ways that may shape student outcomes. By the end of the 2013-14 development year, the core practices of the innovation included (1) teaching about growth mindset, (2) student grade monitoring and goal-setting activities, (3) problem-solving activities that supported students in improving their grades, and (4) a behavioral reflection form designed to get students to reframe problematic behaviors before creating a disciplinary referral. The final SOAR component focused on building a school culture around SOAR. Full implementation began in 2014-15 and continued in 2015-16.

The research team engaged in other activities that supported the NIC’s work. First, the research team conducted visits to each innovation school to learn about implementation and shared memos about implementation with the schools and district design team. More details on these visits are provided below. Second, the research team, along with the program developers, provided training and coaching to support teams in conducting rapid-cycle testing. Finally, the research team worked with the district design team to develop outcome indicators by which the NIC would assess their overall progress. These outcome measures were designed to capture both shorter- and longer-term outcomes that reflected the SOAR theory of improvement and were tied to important district goals. The long-term outcomes of GPA and course failure reflect the focus on co-cognitive traits. Recent research has indicated that high school course grades are better predictors of college access, college graduation, and longer-term life outcomes than test scores. GPA, for example, is a consistent predictor of graduating from both high school and college, and a “primary driver of differences by race/ethnicity and gender in educational attainment” (Farrington et al., 2012, p. 3). Further, failing a course predicts dropping out of high school (Bowers, Sprott, & Taff, 2013). The short-term measures are attendance and disciplinary infractions, which reflect the SOAR theory of action of academic and behavioral engagement. Improving attendance can also improve high school graduation and college enrollment (Faria et al., 2017; Mac Iver & Messel, 2013). In the district in this study, student disciplinary infractions cover a range of behaviors, such as bullying, fighting, or disrespect to teachers, but are often met with a similar outcome: in-school or out-of-school suspension. Such disciplinary action is associated with student grades and achievement test scores (Arcia, 2006).

Research Design

Study Sample

This southwestern district served approximately 80,000 students; the majority were low income or from traditionally underserved racial/ethnic groups. The innovation schools were selected through a collaborative process with district personnel and school administrators. The selected schools expressed an interest and willingness to participate in this innovative reform model. Although value-added performance was not used to select these schools, their value-added scores suggest that they were moderately performing schools in the district.

Table 2 presents the characteristics of the three innovation schools and other high schools in the district. Fewer innovation school students received free or reduced-price lunch or identified as Black, although more innovation school students identified as Hispanic. Students from Hancock failed more classes and had lower average grades than students in non-innovation schools in the district. Students at Williams and Smith had higher grades than students in non-innovation schools, and students at Smith also failed fewer courses. Students at Smith and Hancock also were absent less frequently. Compared to the district, Smith and Hancock had fewer Black students but more Hispanic students.

Table 2. Descriptive characteristics prior to implementation.

| | Non-Innovation Schools | Innovation Schools (all) | Williams | Smith | Hancock |
|---|---|---|---|---|---|
| Number of failed classes | 1.10 | 1.07 | 1.25 | 0.74** | 1.35* |
| Average grade | 82.22 | 83.03*** | 83.19% | 83.84** | 81.28% |
| Days absent | 11.36 | 9.98*** | 10.39 | 9.91* | 9.32** |
| Number of disciplinary infractions | 0.58 | 0.45*** | 0.40 | 0.58 | 0.33 |
| Free or reduced-price lunch | 0.69 | 0.64*** | 0.44*** | 0.74 | 0.84* |
| Black student | 0.25 | 0.14*** | 0.20 | 0.12* | 0.05** |
| Hispanic student | 0.59 | 0.69*** | 0.46 | 0.80* | 0.92*** |
| Other race | 0.04 | 0.03*** | 0.04 | 0.03 | 0.01** |
| Gifted | 0.12 | 0.12 | 0.17 | 0.07* | 0.11 |
| Days enrolled | 169.42 | 169.70* | 170.74 | 168.73 | 169.45 |
| Withdrew | 0.13 | 0.13 | 0.16 | 0.11 | 0.13 |
| Late start | 0.10 | 0.10 | 0.10 | 0.09 | 0.09 |
| Number of courses | 13.21 | 12.85*** | 12.99 | 12.89 | 12.52** |
| Fraction of Black students | 0.25 | 0.14 | 0.21 | 0.12 | 0.05* |
| Fraction of Hispanic students | 0.58 | 0.68 | 0.45 | 0.79* | 0.92** |
| Fraction FRPL | 0.62 | 0.59 | 0.40** | 0.68 | 0.79** |
| School size | 1766.30 | 1740.51 | 2010.00 | 1859.00 | 1016.00 |
| Observations | 14406 | 4439 | 1798 | 1695 | 946 |

Note. t-test of significant differences accounts for school-level clustering. Descriptive statistics reported for 2013-2014 school year. *p < 0.05; **p < 0.01; ***p < 0.001.

We use qualitative and quantitative data to understand outcomes of the partnership in one district over two years of implementation. The qualitative data for this study come from two sources: observations of NIC meetings and field visits in these three innovation high schools. During the 2014-15 and 2015-16 school years, we observed all 13 meetings where SOAR teams met together as an entire network. The first author, as a member of the design team, was a participant observer in these meetings, and the second and third authors (along with additional researchers) took field notes, collected feedback forms from NIC members, and collected artifacts of documents shared or created during the meeting. Further, two four-day field visits occurred in the first year of implementation (October 2014 and April 2015) and one three-day visit in March 2016, the second year of implementation. Over these three visits, we conducted nine principal interviews, 17 interviews with other administrators, 72 interviews with members of the SOAR team, 173 interviews with other teachers, 19 focus groups with teachers or support staff, and 34 student focus groups. We use the fieldwork data to provide evidence on enactment of the practices and how participants described the outcomes they were achieving as a result of this work. The interviews and focus groups focused on participants’ understanding of student ownership and responsibility and specific innovation practices, support for the innovation, the extent to which they enact SOAR practices, how the SOAR team worked as a group, the capacities of the SOAR team, and how they engaged in rapid-cycle testing. Interview and focus group guides are in the Supplementary Appendix.

We also take advantage of rich administrative data from the district for all high school students enrolled in the 2010-2011 to 2015-2016 school years. The data used for this study include 91,410 student-year observations. About 3% are dropped from the analysis due to missing data. The analytic sample includes 33,215 unique student observations.

Improvement Approach and Implementation Measures

Following each research visit, data were coded using an a priori framework for implementation that focused on facilitating conditions (will, capacity, beliefs about SOAR, and alignment to context); implementation supports (implementation team dynamics, engagement in rapid-cycle testing, leadership, resources/training); implementation quality, which itself involved teacher experiences with implementation (enactment of innovation practices, feedback on practices); and student experiences with implementation (responsiveness, perceived outcomes). The coding team first coded several transcripts independently, and then compared coded transcripts to ensure they were applying codes consistently. Through multiple rounds, the coding framework was revised or clarified. For example, capacity was expanded to differentiate between capacity of teachers to enact SOAR practices, capacity of the implementation team to lead the work, and organizational capacity of the school.

Once the coding team agreed on the final coding scheme, they independently coded all transcripts. After coding was complete, a researcher prepared detailed memos for each school for each major theme in the coding framework. This process was repeated after each field visit. In Year 2, the coding scheme was further expanded to include antecedents to sustaining and scaling the practices. Memos around these themes at each time point served as the primary documents for investigating the enactment of the improvement approach and quality of implementation. Specifically, three coders independently categorized each school on the three improvement features (understanding the theory of improvement, engagement in rapid-cycle testing, and capacity to engage in the partnership). Detailed rubrics that guided these categories are included in Supplementary Table A1. For understanding the theory of improvement, we sought out evidence that both the SOAR team and other school stakeholders demonstrated an understanding of SOAR and how the specific practices were theorized to contribute to student ownership. For rapid-cycle testing, we sought out evidence that the SOAR team's enactment of PDSA was problem centered, iterative, used multiple forms of evidence, and resulted in evidence-based decisions on how to improve SOAR practices. For capacity to engage in the partnership, we sought evidence that the SOAR team had the human, social, and cultural capital necessary.

For implementation quality, we analyzed how well the schools enacted five SOAR practices: teaching growth mindset, goal-setting and grade-monitoring practices, problem-solving practices, rewarding positive behavior, and building a school culture around these practices. For each of these practices, four researchers independently read memos on implementation quality and categorized each school as high, medium, or low enactment. High implementation quality existed when the practice was consistently implemented throughout the year. Medium implementation quality existed when the practice was implemented, but inconsistently throughout the year. For example, a practice may have had high implementation in the beginning of the year, but waned over time. Low implementation quality reflected little to no indication that the practice was implemented. For both the improvement approach and implementation quality measures, researchers met to reconcile their independent categorization, using a consensus process to determine the final rating.
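The independent-rating step described above can be illustrated with a minimal sketch. It assumes a simple dictionary of ratings per school and practice. The school labels, ratings, and the `summarize` helper are hypothetical and are not drawn from the study's data or procedures. The sketch only flags where raters disagree, since the final rating in the study came from a consensus discussion, not a vote.

```python
from collections import Counter

# Hypothetical independent ratings: (school, practice) -> one rating per researcher.
# School labels and ratings are invented for illustration only.
ratings = {
    ("School A", "growth mindset"): ["high", "high", "high", "high"],
    ("School A", "goal setting"): ["high", "medium", "medium", "medium"],
    ("School B", "growth mindset"): ["low", "medium", "medium", "low"],
}

def summarize(codes):
    """Return (most common rating, or None on a tie; whether all raters agreed)."""
    counts = Counter(codes).most_common()
    unanimous = len(counts) == 1
    tie = len(counts) > 1 and counts[0][1] == counts[1][1]
    return (None if tie else counts[0][0]), unanimous

# Cells without unanimous agreement are the ones to bring to a consensus meeting.
to_discuss = []
for cell, codes in ratings.items():
    modal, unanimous = summarize(codes)
    if not unanimous:
        to_discuss.append((cell, modal))

for cell, modal in to_discuss:
    print(cell, "-> starting point:", modal or "no majority")
```

The modal rating here is only a starting point for discussion; the sketch does not replace the consensus process.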

Outcome Measures

Outcomes include students' grades, passing rates, absences, and disciplinary infractions. Students' grades are the averages of the students' scores for each class. In 2013-2014, this measure ranged from 0 to 100, with an average student grade of 82 (see Table 2). When operationalizing a student's passing rate, we focus on the number of courses a student did not pass throughout the school year. Students were considered to be failing a course if they did not score at least 70% in a course. As students could be registered for up to nine courses per semester, the maximum value for this variable is 18. Although the modal value for this variable is 0, on average, students did not pass one course. The measure for days absent is the number of days a student did not attend in a particular year. Student infractions is a measure of the number of infractions a student received in a particular school year. Infractions include code of conduct violations for behaviors such as cheating, disrespect toward teachers, bullying, fighting, disobeying school rules, dress code violations, or possession of tobacco. Infractions also include more serious offenses such as drug or alcohol use, criminal mischief, assault, arson, felony, possession of a weapon, public lewdness, gang violence, or serious misbehavior.
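As a minimal sketch of how the two academic outcomes could be derived from course-level scores, the example below assumes an unweighted average across courses and applies the 70% passing threshold stated above. The function name and the sample scores are hypothetical.

```python
PASSING_SCORE = 70  # per the text: scoring below 70% counts as not passing

def student_outcomes(course_scores):
    """Return (average grade, number of failed courses) for one student-year.

    Assumes an unweighted mean across courses (an illustrative assumption).
    """
    average_grade = sum(course_scores) / len(course_scores)
    failed = sum(1 for score in course_scores if score < PASSING_SCORE)
    return average_grade, failed

# Hypothetical student with four course scores on a 0-100 scale.
scores = [88, 69, 70, 93]
avg, failed = student_outcomes(scores)
print(avg, failed)
```

A score of exactly 70 counts as passing under this threshold, so only the 69 is counted as a failed course.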

We also include controls for binary indicators of student race/ethnicity (Black, Hispanic, or other race/ethnicity), free and reduced-price lunch (FRPL) status, gifted status, and grade level. Additionally, we control for the number of days in which a student was enrolled in a school, indicators of whether or not they withdrew or started after the beginning of the school year, and the number of courses in which a student is registered throughout the school year. At the school level, we control for student enrollment as well as the proportions of Black students, Hispanic students, and students who receive FRPL.


For this study, we adopted a sequential mixed methods research design (Smith, Cannata, & Taylor Haynes, 2016; Teddlie & Tashakkori, 2006). We first conducted the quantitative analysis to ascertain the extent to which students in the innovation schools benefited from the partnership. We then drew on qualitative fieldwork data to determine the degree of engagement in the NIC, quality of implementation, and participant understandings of accomplishments. This analytic process used several strategies to address potential threats to the validity of our inferences from the qualitative data, including cross-validation between researchers, triangulation among sources and perspectives, and member checking (Miles & Huberman, 1994; Patton, 2002). For example, we sought out comparisons between perspectives of the SOAR team and perspectives of others in the school, recognizing that overreliance on the SOAR team may reflect elite bias (Miles & Huberman, 1994). We also shared versions of the researcher-developed memos on implementation with the SOAR teams and district design team. Triangulating between the qualitative and quantitative findings also encouraged us to consider rival hypotheses.

For the quantitative analysis, we used both a lagged dependent variable (gains) model and a difference-in-difference (DD) model to give us plausible bounds on the estimated treatment effect of the SOAR innovation under different assumptions (Angrist & Pischke, 2009). The ordinary least squares (OLS) gains model can be estimated as:

$$Y_{ist} = \beta_0 + \beta_1 \text{Innovation}_{st} + \beta_2 Y_{ist-1} + \beta_3 X_{ist} + \beta_4 S_{st} + \gamma_t + \varepsilon_{ist} \tag{1}$$


where $Y_{ist}$ is the outcome for student $i$ in school $s$ in year $t$ and $\text{Innovation}_{st}$ is a dummy variable for whether and when the school implemented the SOAR innovation,¹ $Y_{ist-1}$ is the lagged dependent variable, $X_{ist}$ is a vector of student controls, $S_{st}$ is a vector of time-varying school characteristics, $\gamma_t$ is a year fixed effect, and $\varepsilon_{ist}$ is an error term. In this model, $\beta_1$ can be interpreted as gains in each outcome among students in the innovation schools in the post-treatment period.
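To make the logic of the gains model concrete, the following is a stripped-down sketch, not the authors' estimation code. It regresses an outcome on an innovation indicator and the lagged outcome only. It omits the student and school controls, the year fixed effect, and the clustered standard errors. The data are hypothetical and generated without noise from a known coefficient of 2.0, so the regression should recover that value. The helper functions `solve` and `ols` are written here for illustration.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def ols(rows, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical student records: (innovation indicator, prior-year grade).
students = [(0, 70.0), (0, 80.0), (0, 90.0), (1, 72.0), (1, 78.0), (1, 91.0)]
# Noise-free outcomes built with a known innovation coefficient of 2.0.
y = [10.0 + 2.0 * d + 0.8 * lag for d, lag in students]
X = [[1.0, float(d), lag] for d, lag in students]  # intercept, Innovation, lagged outcome
b0, b1, b2 = ols(X, y)
print(round(b1, 6), round(b2, 6))
```

Here `b1` plays the role of $\beta_1$ in Equation 1: the difference in the outcome between innovation and comparison students, conditional on the prior-year outcome.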

This gains model would be biased by unobserved school-level factors that differ between innovation and non-innovation schools in the district and influence student performance on any of the outcome variables. To address this concern, we add to the model a school fixed effect ($\delta_s$) to compare students' prior outcomes to average student outcomes in non-innovation schools. When innovation and non-innovation schools have a similar pretreatment trend, the DD model represents the counterfactual change of implementing the co-developed innovation. This ordinary least squares (OLS) model can be estimated as:

$$Y_{ist} = \beta_0 + \beta_1 \text{Innovation}_{st} + \beta_2 X_{ist} + \beta_3 S_{st} + \delta_s + \gamma_t + \varepsilon_{ist} \tag{2}$$

In this model, $\beta_1$ can be interpreted as the difference in student outcomes between innovation and non-innovation schools after implementation. To account for repeated observations of students over time, standard errors were clustered at the school level in Models 1 and 2 (Bertrand, Duflo, & Mullainathan, 2004).
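The intuition behind the DD coefficient can be shown in the simplest two-group, two-period case with no covariates. There, the DD estimate equals the change in the innovation group minus the change in the comparison group. The sketch below uses hypothetical grades and is not the fixed-effects regression in Equation 2, which also includes covariates and multiple years.

```python
from statistics import mean

# Hypothetical average grades by group and period (invented for illustration).
outcomes = {
    ("innovation", "pre"): [80.0, 82.0, 84.0],
    ("innovation", "post"): [83.0, 85.0, 87.0],
    ("comparison", "pre"): [79.0, 81.0, 83.0],
    ("comparison", "post"): [80.0, 82.0, 84.0],
}

def diff_in_diff(data):
    """Change in the innovation group minus change in the comparison group."""
    change_innovation = mean(data[("innovation", "post")]) - mean(data[("innovation", "pre")])
    change_comparison = mean(data[("comparison", "post")]) - mean(data[("comparison", "pre")])
    return change_innovation - change_comparison

dd = diff_in_diff(outcomes)
print(dd)
```

The comparison group's change stands in for what would have happened to the innovation group without the innovation, which is why the parallel-trends assumption matters.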

This initial analysis estimated an average treatment effect for the students in schools that participated in this continuous improvement process. In addition to this overall treatment effect, we examined several heterogeneous treatment effects. These include differences across the three innovation schools and post-treatment year.

¹The post-treatment period does not include the year when the SOAR team developed and piloted the practices of the SOAR innovation. This decision is justified for two reasons. First, the piloting that did occur was limited to members of the SOAR team. Second, when a practice was piloted, it tended to only be implemented once or twice, limiting its potential impact on student outcomes. Nevertheless, it is possible that this piloting would lead to pre-treatment differences.

In additional analysis, we examined the robustness of the DD research design. An assumption of this research design is that innovation and non-innovation schools had similar pretreatment trends in the outcome variables. We tested for this assumption in two ways. We first estimated the relationship between innovation school participation and student outcomes not only in the post-treatment period but also in all years in which we have data. These estimates function as a placebo test where, conditional on covariates, any significant differences in the slope between innovation and non-innovation schools prior to treatment indicate a violation of the parallel-trends assumption. We also plot the predictions from this regression, holding all other variables at their means, to visually examine the presence of pretreatment trends. Evidence of a violation of this assumption would indicate pretreatment differences between innovation and non-innovation schools that could explain why innovation schools have more positive outcomes in the post-treatment period, outside of their participation in the improvement process. In general, we found evidence of parallel trends in terms of student grades and number of failed courses but not attendance and disciplinary infractions. In addition, we found evidence that Williams High School and Smith High School pretreatment trends generally resembled the non-innovation schools. The evidence of pretreatment differences at Hancock High School limits our inference of the effect of SOAR at this school. We discuss the full results of this sensitivity analysis below.
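One common way to write a placebo test of this kind is an event-study version of the DD model, in which the innovation-school indicator is interacted with each year. The specification below is an illustrative assumption, not necessarily the authors' exact model. The symbols $\text{InnovationSchool}_s$ (an indicator for ever being an innovation school), $\theta_k$, and the omitted reference year $k_0$ are notation introduced here.

$$Y_{ist} = \beta_0 + \sum_{k \neq k_0} \theta_k \left(\text{InnovationSchool}_s \times \mathbb{1}[t = k]\right) + \beta_2 X_{ist} + \beta_3 S_{st} + \delta_s + \gamma_t + \varepsilon_{ist}$$

Under parallel trends, the $\theta_k$ for pretreatment years $k$ should be close to zero and statistically indistinguishable from it. Significant pretreatment $\theta_k$ would signal differences that predate the innovation.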

Findings

Impact of the Co-Developed Innovation on Student Outcomes

Our first research question asks about the extent to which the co-developed innovation reduced students’ disciplinary infractions and the number of failed courses and improved student grades and attendance. We found no evidence of an overall relationship between the SOAR innovation and student outcomes that is robust to model specification. However, when results are separated by school, Williams and Smith each saw increased student grades and fewer absences persisting across both years of implementation.

Table 3 reports the gains model and DD estimates of the four outcomes: days absent, the number of disciplinary infractions, the number of classes failed, and average grades (full results are in Supplementary Appendix Table A2). In the gains model, the coefficient on innovation school indicates that students in these schools had average decreases in the number of days absent (-1.17, p=0.02) and increases in their grades (0.95, p=0.08). These slight improvements translate to relatively small effect sizes: a 0.04 standard deviation decrease in days absent and a 0.04 standard deviation increase in average grades. At the same time, we found a slight increase in the average number of infractions (0.08, p=0.002). However, when conditioning on unobserved school-level characteristics in the DD model, we find no evidence of a relationship between the SOAR innovation and any student outcomes.² This finding suggests that any overall gains experienced by students in the innovation schools can be explained by unobserved, but fixed, school characteristics that differ between innovation and non-innovation schools.

When the results were separated by innovation school (Table 4), we found notable heterogeneity across the innovation schools, with fairly robust evidence of improvements in Williams and Smith High Schools. In Williams High School, we found students were absent between 1.05 and 1.25 fewer days. Students' grades improved by 0.74 to 1.42 points, on average, depending on the model. We also found improvements in student grades, number of failed classes, and absences in Smith High School. Students' grades improved by an average of 1.24 to 1.64 points; they failed between 0.33 and 0.44 fewer courses, and they were absent between 1.10 and 1.36 fewer days. For both schools, the estimates were less consistent for the number of disciplinary infractions in terms of the magnitude, direction, and level of significance. The estimated effects of the SOAR innovation were less consistent in Hancock High School. In the gains model, there was no evidence of a relationship between the SOAR innovation and grades or course failure, a marginally significant decrease in absences, and a slight increase in disciplinary infractions. Estimates from the DD model show that SOAR was linked with worse student outcomes, thereby offsetting the positive educational effects of the innovation in the other two schools.

²In Supplementary Tables A3 and A4, we included the lagged dependent variable when estimating the school fixed effects model. In the Supplementary Table A3 model, the estimates on the number of failed classes and average grades are significant. In Supplementary Table A4, the estimates on the number of failed classes, average grades, and days absent are significant or marginally significant for all schools, with the exception of days absent for Smith High School.


Table 3. Estimates of the effect of the innovation on student passing rates, average grades, attendance, and number of infractions.

                            Days       Number of      Number of   Average     Days       Number of      Number of   Average
                            absent     disciplinary   failed      grades      absent     disciplinary   failed      grades
                                       infractions    classes                            infractions    classes
                            (1)        (2)            (3)         (4)         (5)        (6)            (7)         (8)
Innovation school           -1.17*     0.08**         -0.21       0.95+       -0.77      -0.02          -0.23       0.67
                            (0.46)     (0.02)         (0.15)      (0.50)      (0.64)     (0.08)         (0.20)      (0.82)
Constant                    -5.04***   -0.10          0.22        17.45***    = 5 36***  0.30           0.32+       77.60***
                            (0.98)     (0.18)         (0.16)      (1.41)      (0.88)     (0.20)         (0.16)      (0.99)
Lagged dependent variable   X          X              X           X
Year fixed effect           X          X              X           X           X          X              X           X
School fixed effect                                                           X          X              X           X
Observations                60,456     62,408         58,817      58,811      85,680     85,680         85,680      85,673
R²                          0.41       0.27           0.27        0.54        0.12       0.12           0.11        0.21

Note. All models control for FRPL, student race/ethnicity (Black, Hispanic, other race), gifted status, days enrolled, number of courses, grade level, and indicators if the student started school after the beginning of the school year or withdrew before the end of the year. Robust standard errors clustered at the school level in parentheses.

*p < 0.05; **p < 0.01; ***p < 0.001.

Table 4. Estimates of the effect of the innovation on student passing rates, average grades, attendance, and number of infractions, by innovation school.

                            Days       Number of      Number of   Average     Days       Number of      Number of   Average
                            absent     disciplinary   failed      grades      absent     disciplinary   failed      grades
                                       infractions    classes                            infractions    classes
                            (1)        (2)            (3)         (4)         (5)        (6)            (7)         (8)

Williams HS                 -1.25**    0.05*          -0.11       0.74*       -1.05*     0.04           -0.45***    1.42**

(0.38) (0.02) (0.08) (0.30) (0.46) (0.05)