The Making of a Metric: Co-Producing Decision-Relevant Climate Science

Developing decision-relevant science for adaptation requires the identification of climatic parameters that are both actionable for practitioners and tractable for modelers. In many sectors, these decision-relevant climatic metrics and the approaches that enable their identification remain largely unknown. “Co-production” of science with scientists and decision-makers is one potential way to identify these metrics, but there is little research describing specific and successful co-production approaches. This paper examines the negotiations and outcomes from Project Hyperion, wherein scientists and water managers jointly developed decision-relevant climatic metrics for adaptive water management. We identify successful co-production strategies by analyzing the project’s numerous back-and-forth engagements and tracing the evolution of the science during these engagements. We found that effective mediation between scientists and managers needed dedicated “boundary spanners” with significant modeling expertise. Translating practitioners’ information needs into tractable climatic metrics required direct and indirect methods of eliciting knowledge. We identified four indirect methods that were particularly salient for extracting tacitly held knowledge and enabling shared learning: developing a hierarchical framework linking management issues with metrics, starting discussions from the planning challenges, collaboratively exploring the planning relevance of new scientific capabilities, and using analogies of other “good” metrics. The decision-relevant metrics we developed provide insights into advancing adaptation-relevant climate science in the water sector. The co-production strategies we identified can be used to design and implement productive scientist–decision-maker interactions. Overall, the approaches and metrics we developed can help climate science to expand in new and more use-inspired directions.

Adaptation practitioners across many sectors, including resource management, land-use planning, and public health, urgently need decision-relevant science to plan for and manage the impacts of climate change (ACCNRS 2015; Moss et al. 2013; Lemos and Morehouse 2005; Kirchhoff et al. 2013a; Kerr 2011). There have been several efforts toward developing actionable (or decision-relevant) science broadly, and more specifically toward providing scientific details of the climate impacts that planners need to account for (Mach et al. 2020; Bremer and Meisch 2017; Beier et al. 2017). Resource managers, however, still report that climate information that can help to develop adaptation decisions is not readily available to them (Moss et al. 2019; Barsugli et al. 2013; USGAO 2015; Vogel et al. 2016). This is partly on account of unresolved mismatches between scientists' and decision-makers' perceptions of what constitutes "actionable" climate information (Lemos et al. 2012; McNie 2007). One important example of this mismatch is that current climate modeling and model evaluation efforts typically focus on broad climatological metrics, such as averages or extremes in temperature and precipitation. However, for climate information to be actionable, resource managers need information on management-specific metrics, such as the start date of the rainy season or number of extreme heat days in the summer (Briley et al. 2015; Roncoli et al. 2009; Moss et al. 2019; Bornemann et al. 2019). This lack of focus on management-specific climate science can preclude its use in adaptation decisions, as even translation or communication of such broader information cannot move the science "off the shelf" to make it usable (Moss et al. 2019; Lemos et al. 2012; Hackenbruch et al. 2017).
The literature recognizes the importance of determining specific climatic metrics that could be most applicable for specific problems (Hackenbruch et al. 2017; Briley et al. 2015; Bornemann et al. 2019). But this task is often assumed to be solely the decision-makers' responsibility (Briley et al. 2015), and is not considered a research problem per se. However, resource managers may not know, a priori, the types of climatic metrics that could be most useful, and scientists may not always know whether they can provide information on decision-relevant metrics with reasonable skill (Briley et al. 2015; Porter and Dessai 2017; Lemos et al. 2012). This means that directly asking decision-makers to explain the types of climate information they need is rarely sufficient. Therefore, few studies have systematically identified decision-relevant metrics for sectoral adaptations (Hackenbruch et al. 2017; Vano et al. 2019; Bornemann et al. 2019). "Co-production," or iterative and continual engagement between scientists and decision-makers, is often suggested as a means to enable mutual learning and reconciliation between managers' needs and scientific priorities (Lemos 2015; Kirchhoff et al. 2013a; Weaver et al. 2014; Vogel et al. 2016; Kolstad et al. 2019). It can thus help to identify decision-relevant climatic metrics that are also tractable for modelers.
That being said, not all co-production efforts have led to positive outcomes (Lemos et al. 2018), or have been successful at understanding and responding to resource managers' needs (Lemos et al. 2018; Porter and Dessai 2017). The success of co-production is predicated on the level and quality of interactions between (and within) different groups (Porter and Dessai 2017; Wall et al. 2017; Kirchhoff et al. 2013b; Mach et al. 2020; Lemos et al. 2018; Meinke et al. 2006). While the literature provides rich guidance on the general principles and prerequisites for successful co-production (Hegger et al. 2012; Meadow et al. 2015; Lemos and Morehouse 2005; Beier et al. 2017), there is a dearth of empirically grounded guidance on co-production processes that have worked in practice (Djenontin 2018; Lemos et al. 2018; Parker and Lusk 2019).
Hence, the process of co-production is often a black box; there is no clarity on the types of scientist-decision-maker engagement processes that can be expected to result in effective two-way communications and to enable the creation of usable climate science (Porter and Dessai 2017; Mach et al. 2020; Jagannathan et al. 2020a).
In this paper we present both the process of, and outcomes from, a case of co-production, Project Hyperion, that (eventually) led to the identification of decision-relevant climatic metrics for water management decisions. As a response to calls to detail the practice of "how" co-production works (Porter and Dessai 2017; Lemos et al. 2018; Mach et al. 2020), we focus this paper on not just the knowledge outcomes from the effort (i.e., the decision-relevant metrics), but also on how the metrics evolved iteratively through multiple engagements over the course of a year. The rest of the paper details the boundary spanning and engagement strategies that enabled the project to overcome institutional and epistemological barriers, and allowed a shared understanding across professional communities to emerge.

Project Hyperion and the process of co-production
Project Hyperion is a basic science project that aims to advance climate modeling by evaluating regional climate datasets for decision-relevant metrics. While there has been an explosive growth in the number of regional climate datasets available to users, there is limited understanding of the credibility and suitability of these datasets for use in different management decisions (Moss et al. 2019; Barsugli et al. 2013; Jones et al. 2016; Jagannathan et al. 2020b; VanderMolen et al. 2019). Hyperion aims to address this need by developing comprehensive assessment capabilities to evaluate the credibility of regional climate datasets, understand the processes that contribute to model biases, and improve the ability of models to predict management-relevant outcomes.
Since decision-relevance is a core motivation for the project, Hyperion is designed on the principles of co-production. The project brings together scientists from nine research institutions with managers from 12 water agencies in four watersheds: Sacramento/San Joaquin, Upper Colorado, South Florida, and Susquehanna. The project structure also explicitly allows both groups to co-develop the science plan and research questions, in addition to co-producing the science itself. The scientists include atmospheric and Earth system scientists as well as hydrologists. The water managers, depending on the agency, have functions including planning, operating and managing water quality, water supply, stormwater management, flood control, and water infrastructure design. These water managers have high levels of technical expertise in engineering, hydrology, or other sciences, and were purposefully selected because of their interest in the project concept and their willingness to dedicate time to the engagement efforts. Finally, the project team for Hyperion includes three dedicated "boundary spanners" (including two of the authors), i.e., people whose primary role is to facilitate and mediate the scientist-water manager boundary.
In this paper we focus on Phase 1 of the project and describe how decision-relevant metrics in each of the study regions were co-produced by this group. From the water managers' perspective, such metrics quantitatively describe climatic phenomena that are directly related to practical management problems; changes in these quantities would necessitate shifts in water infrastructure planning and operations. From the scientists' perspective, these metrics can be used to test model fidelity for decision-relevant phenomena and hence push model development and scientific inquiry in more use-inspired directions. To identify these metrics, a series of iterative engagement methods were used. Structured engagement methods included workshops, remote and in-person focus-group discussions, and quarterly project update calls. There were also continual less-structured, informal conversations between scientists, managers, and boundary spanners over phone calls or emails. Approval from Lawrence Berkeley Laboratory's Human Subjects Committee Institutional Review Board was obtained for key engagements. The timeline of engagement activities, along with goals and milestones at each stage, is presented in Fig. 1.

The role of boundary spanners
The boundary spanners in Project Hyperion had varying degrees of social science, climate science, and adaptation expertise; they also had prior experience in co-production and similar participatory research activities. It is generally acknowledged that boundary spanners are necessary for the translation of jargon and assumptions among different actors and across epistemic divides (Bednarek et al. 2016b; Kirchhoff et al. 2013b; Cash et al. 2003). At the same time, the literature recognizes that this role is challenging in practice (Bednarek et al. 2018; Safford et al. 2017) and that the functions and attributes of effective boundary spanning are not well understood (Goodrich et al. 2020; Bednarek et al. 2016a).
The challenges of boundary spanning are often discussed in instances where actors are resistant to crossing epistemic boundaries or "compromising" their expertise (Cash et al. 2003). In Hyperion, most of the water managers wanted to incorporate climate change information in their decisions, and most scientists were committed to developing decision-relevant science. This collective goodwill notwithstanding, several rounds of deliberations were needed to mediate differences in incentives and priorities, and to translate the water managers' needs into quantitative metrics and scientific research questions. The boundary spanners needed to actively ensure that feedback from both groups was not just heard and documented, but also incorporated into the overall science plan for the project.
The mediation of the scientist-manager boundary to arrive at actionable rainfall metrics illustrates these tensions and also their eventual resolution. Several of the managers wanted information on intensity-duration-frequency (IDF) curves for rainfall events (Srivastava et al. 2019) that formed the basis of their flood-related decisions. The scientists, based on their expertise and modeling capabilities, prioritized metrics such as frequency and intensity of specific storm events (e.g., tropical cyclones) and associated rainfall. While these storm metrics were related to decision-relevant rainfall quantities, they were often one step "upstream" (in both the hydrological and metaphorical senses) of what the water managers wanted for detailed planning. The upstream metrics represented drivers of phenomena of interest rather than the decision-relevant phenomena themselves. Recognizing this tension, the boundary spanners worked with the group to co-create a shared understanding of the term "metric." We introduced a hierarchical framework that distinguished decision-relevant from upstream metrics, illustrating the overlaps and linkages between the two, and showing how both types of metrics could fit within the project's larger goals. With the explicit linking of metric types, managers could better appreciate the scientists' focus on upstream storm metrics for modeling causal processes that could eventually make IDF predictions more accurate. Scientists saw why it was necessary to include the metric of interest to managers, i.e., IDF curves, in the science plan, and how linking their storm metrics with IDF results added to the novelty and impact of their efforts.

Fig. 1. Co-production process and timeline summarizing key engagement activities over the course of a year, along with the most important outcomes at each stage (depicted by the blue document icon). "Sci" refers to scientists, "WM" refers to water manager, and "HC ph." refers to hydroclimatic phenomena. For details of each of these activities, please see the supplement. There was constant boundary spanning work during and between each of these activities.

This and similar resolutions were highly dependent on the presence of a boundary spanner with domain expertise in climate modeling. While the literature recognizes the importance of "background and experience" in the subject matter (Safford et al. 2017; Meadow et al. 2015; Bednarek et al. 2016b), there is, we would argue, less appreciation of the technical expertise required to execute techno-scientific translations (Bednarek et al. 2018). For our project, having a boundary spanner who was also a modeler proved essential. Given the aims of Hyperion, many boundary functions toward the later stages of the project needed in-depth (and often painful) discussions on model parameters, types of simulations, decision-relevant thresholds, statistical measures of model performance, etc., which were beyond the technical capacities of the non-modeler boundary spanners (Fig. 2). In hindsight, we believe that a boundary spanner with expertise in water management could have been equally beneficial, and may have augmented our eventual list of metrics. Overall, we found that, depending on the nature of what is being co-produced, boundary spanners need considerably higher levels of domain expertise than is generally acknowledged in the literature.

Direct and indirect approaches to "making" metrics
A common approach to user needs assessments in conventionally designed as well as co-production projects is to directly ask decision-makers for the types of information they want (Hudlicka 1996; Briley et al. 2015). This approach is based on the prevalent assumption that decision-makers not only know the climatic metrics they want, but are also able to articulate their knowledge in response to direct questions (Hudlicka 1996). Neither of these assumptions is true for every engagement. We found that determining the quantitative details of decision-relevant information required both direct and indirect approaches. We did explicitly ask managers to identify any metrics for which they required projections, and this direct approach was partially successful. But it put the onus of metric identification on the water managers, who did not always know what to ask for or what the scientists had to offer by way of quantification. For example, the direct approach revealed water supply and floods as key climate-related management issues in California, with snowpack, snowmelt, streamflow, dry spells, and rainfall as hydroclimatic phenomena of interest. But managers were not used to translating these phenomena into tractable parameters or thresholds (Briley et al. 2015; Hackenbruch et al. 2017).
We therefore supplemented the direct approach with an indirect approach that assumed that relevant knowledge cannot be revealed by direct questions, but needs to be extracted through more open-ended scenario analysis and contextual inquiry. Although such discussions are a time-intensive way to access internal knowledge structures (Hudlicka 1996), combining direct and indirect conversational methods has been shown to be an effective way of eliciting user needs (Zhang 2007). This indirect approach is used in software development for user requirements engineering (Hudlicka 1996; Zhang 2007), but is not commonly used in the co-production or actionable environmental science literatures. Partly guided by research on tacitly held knowledge, and partly through trial and error, we developed four indirect strategies that enabled scientists and water managers to collaboratively identify decision-relevant metrics.
1) Developing hierarchical frameworks: There was often confusion among scientists and managers on how specific a metric needs to be to have an unambiguous interpretation from a modeling perspective. For example, in the initial engagements, the whole group understood "peak streamflow" or "flooding" to be potential metrics. However, when modeling methods were being developed, the scientists had questions as to what peak might mean or how flooding was defined by the managers. Further direct questions that probed the managers for "more specific" metrics were unsuccessful in eliciting the details that scientists were looking for. At the same time, scientists were not able to clearly articulate what constituted an unambiguous metric. To resolve this stalemate, the boundary spanners asked the scientists to provide examples of what might constitute a specific metric for their modeling exercises. The group then decided to contextualize metrics by developing a hierarchical framework: a management issue came first, then the hydroclimatic phenomena related to the issue, then the aspects of each phenomenon that were of most relevance to the water managers, and finally a tractable metric for each aspect (Fig. 3) (see also Maraun et al. 2015). For Hyperion, the hierarchy represented a logical framework that helped us to understand that peak streamflow could have varied interpretations for modeling; it could be daily maximum flow, or the high end of streamflow distribution, or values above certain thresholds. Each interpretation represented a very different metric with unique results. Through the framework we collectively understood that peak streamflow was best characterized as an "aspect" of a hydroclimatic phenomenon, and one step ahead of being an unambiguous metric, which required further quantitative details describing the characteristics of the peak that were important to managers. 
2) Starting from the planning challenge/goal rather than the science question: A focus on current and future planning challenges or goals as they related to different hydroclimatic phenomena was a productive path toward metric identification. For example, when asked about planning goals with respect to streamflow quantity, some managers suggested that the aim was to have a full reservoir on 1 July. Through this exchange we identified cumulative runoff on 1 July as a decision-relevant metric. Another discussion centered on recent climate- or weather-related planning challenges (such as Hurricane Irma, or the Oroville Dam failure) in the managers' regions. One of the managers discussed an ice-jam-related flooding event and described how warm temperatures and heavy rain conditions in early spring caused the snow to melt rapidly, leading to flooding. This prompted a collective discussion about whether frequency of rain-on-snow events and the associated runoff could be an actionable metric to help anticipate and manage such events. These results support recommendations from other studies that also suggest starting the co-production process from the management goal rather than from a scientific "puzzle" (Beier et al. 2017; Kolstad et al. 2019).
3) Collaboratively exploring the planning relevance of new models, tools, or datasets: It is often assumed that practitioners are mainly interested in pragmatic solutions and may be less open to exploring novel models and tools (Vogel et al. 2016). However, in Hyperion, collaboratively and critically examining whether and how new models, datasets, or tools could be relevant to managers' contexts proved to be a productive strategy for identifying metrics. For example, one of the scientists sought the water managers' opinion on a new type of satellite data on terrestrial water storage (TWS) that had the potential to aid in flood/drought prediction.
Managers responded that their agencies mainly used 10-yr groundwater (GW) baseflow as a key metric for drought predictions, but that it was not easy to collect data for computing GW baseflow. They were interested in alternatives to this metric, whereupon the scientist explained that new findings suggested that TWS can be a good predictor of GW flow (in some regions). The group collectively agreed that both TWS and 10-yr GW baseflow would be good metrics, and that TWS would be explored as a potential proxy or upstream metric to GW baseflow.
4) Using analogies for "good" metrics: Finally, some of the new metrics identified in our project came from discussions of other good metrics. For example, one well-received set of metrics was visualized through the "snow water equivalent (SWE) triangle," which uses a fitted triangle to characterize the annual cycle of snow accumulation and melt (Rhoades et al. 2018). The SWE triangle represents a composite of six metrics of management relevance: peak water volume and timing, snow accumulation and melt rates, and the lengths of the accumulation and melt seasons. Each metric is tractable as well as decision-relevant, and the triangle itself presents a visually digestible linear approximation of all six metrics comprising the snow cycle (Rhoades et al. 2018). The water managers thought this was a "nifty" multimetric representation as it allowed for both a comprehensive and an individual examination of the management-relevant components of seasonal snow dynamics.
Their response led to discussions on whether a similar set of metrics describing the annual cycle of rainfall would also be useful. A new composite approach, tentatively termed "rainfall geometry" (to signify whatever geometric figure fits the annual cycle of rainfall in a given location), and which includes the start date of the wet season, peak rainfall, and length of the wet season, was co-developed as a promising multimetric representation of key management-relevant components of rainfall.

Fig. 3. [Beginning of caption missing in source] ... (e.g., flooding), and moves to the "hydroclimatic phenomenon" related to the issue (e.g., precipitation is a hydroclimatic phenomenon related to the issue of flooding), and then to the "aspect of the phenomenon" that is of specific interest for the management decision (e.g., extreme precipitation is the aspect of precipitation that is of specific management interest).

Fig. 4. Examples showing the evolution of decision-relevant metrics. (a) The evolution of the metric that represents the 3-yr critical duration of October-March high flows at a 10-yr recurrence interval. The initial direct identification approach gave a broad understanding of the importance of runoff for nutrients and sediments, and then a discussion of runoff-based planning led to identifying hydrologic extremes as one of the important components of runoff. Using the hierarchy (Fig. 3), we came to understand that "extremes" were an "aspect of phenomenon," and we probed further to find that extremes actually meant flows above certain thresholds. We derived the final unambiguous metric at the next iteration, where we interrogated the types of exceedance thresholds that impact water quality management in the region. (b) The making of a rainfall metric. First, the direct approach highlighted that changes in rainfall patterns were an important challenge for the region. In the next two iterations, which also used direct engagements, we identified the specific aspects of rainfall that were of importance. Finally, with the analogy of the "good metrics" of the SWE triangle, we identified "rainfall geometry" as a promising concept for additional decision-relevant metrics.

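To make the SWE triangle discussed under strategy 4 more concrete, the following is a minimal Python sketch of how its six component metrics might be derived from one water year of daily SWE values. It is not the fitting procedure of Rhoades et al. (2018): the function name `swe_triangle`, the 10% cutoff used to mark the start and end of the snow season, and the straight-line rate estimates are all assumptions made purely for illustration.

```python
def swe_triangle(daily_swe, cutoff_fraction=0.1):
    """Approximate the six SWE-triangle metrics from one year of daily SWE.

    Assumption: the snow season starts on the first day SWE reaches
    cutoff_fraction * peak, and ends on the first day after the peak when
    SWE drops below that same level. Both choices are illustrative only.
    """
    peak = max(daily_swe)
    peak_day = daily_swe.index(peak)  # first day on which the peak occurs
    cutoff = cutoff_fraction * peak

    # First day, at or before the peak, with SWE at or above the cutoff.
    start_day = next(d for d in range(peak_day + 1) if daily_swe[d] >= cutoff)
    # First day after the peak with SWE below the cutoff (else last day).
    end_day = next(
        (d for d in range(peak_day, len(daily_swe)) if daily_swe[d] < cutoff),
        len(daily_swe) - 1,
    )

    accum_days = peak_day - start_day
    melt_days = end_day - peak_day
    nan = float("nan")
    return {
        "peak_swe": peak,                # peak water volume
        "peak_day": peak_day,            # peak timing (index into the series)
        "accum_season_days": accum_days,
        "melt_season_days": melt_days,
        # Straight-line slopes along the two sides of the triangle.
        "accum_rate": (peak - daily_swe[start_day]) / accum_days if accum_days else nan,
        "melt_rate": (peak - daily_swe[end_day]) / melt_days if melt_days else nan,
    }


if __name__ == "__main__":
    # Synthetic series: linear build-up over 100 days, faster linear melt.
    series = [0] * 10 + list(range(1, 101)) + [100 - 2 * i for i in range(1, 51)] + [0] * 10
    for name, value in swe_triangle(series).items():
        print(name, value)
```

A composite such as "rainfall geometry" could follow the same pattern (season start, peak, season length), although the source does not specify how those quantities would be defined.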
Overall, we found that the making of decision-relevant metrics needed an iteratively derived mix of direct and indirect engagement approaches to capture the information needs of the water managers, and to translate them into tractable quantitative metrics for the scientists. Figure 4 shows the evolution of two decision-relevant metrics using different direct and indirect strategies. Table 1 presents examples of the metrics identified in the project (Table ES1 in the online supplement has the full list for all four regions). In some cases, these metrics already existed in other contexts (such as in engineering or hydrology manuals), but had not been recognized as metrics relevant for climate modeling prior to our co-production process. We also observed that not every identified metric mapped onto a specific management decision. Some metrics, such as deviations from historical mean snowpack, were more useful for understanding the future state of watersheds than for making decisions. The interest in snowpack shows that there are overlaps between upstream and decision-relevant metrics; several water managers were, in fact, interested in understanding upstream processes in addition to working with actionable metrics (Vano et al. 2019). Finally, we found that the relevance of metrics depends on, and evolves with, the availability of climate information. In regions with limited availability of climate data, even simple climatic metrics such as monthly or annual runoff were considered relevant enough. In regions with more information, such simple metrics were not as useful; managers identified more detailed metrics, such as the runoff associated with highest snowmelt rate, or maximum daily or 3-day flow volumes, as actionable. An analysis of how and why the characteristics of decision-relevant metrics differed among the water management agencies is planned for the next phase of the project.

Table 1. Examples of decision-relevant metrics for each region, highlighting management issues, hydroclimatic phenomena, aspect of phenomena and then each decision-relevant metric. "CA" refers to the Sacramento/San Joaquin watershed, "CO" is Upper Colorado, "FL" is South Florida, and "SQ" is Susquehanna. The last column also describes some of the potential decisions or uses for these metrics that were identified by the case study water managers.
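The gap between an "aspect" such as peak streamflow and an unambiguous metric can be illustrated with a short Python sketch. It computes the three interpretations of peak streamflow mentioned under strategy 1 (daily maximum, high end of the distribution, exceedance of a threshold) together with a maximum 3-day flow volume. The function names, the 95th-percentile choice, the threshold, and the sample data are hypothetical and are not values from Project Hyperion.

```python
from statistics import quantiles


def max_daily_flow(daily_flow):
    """Interpretation 1: the single largest daily flow in the record."""
    return max(daily_flow)


def high_flow_percentile(daily_flow, pct=95):
    """Interpretation 2: the high end of the flow distribution.

    The 95th percentile is an illustrative choice, not a project value.
    """
    # quantiles(..., n=100) returns 99 cut points; index pct - 1 is the pct-th.
    return quantiles(daily_flow, n=100, method="inclusive")[pct - 1]


def days_above_threshold(daily_flow, threshold):
    """Interpretation 3: how many days the flow exceeds a given threshold."""
    return sum(1 for q in daily_flow if q > threshold)


def max_n_day_volume(daily_volume, n_days=3):
    """Largest total over any n_days consecutive days (e.g., 3-day volume)."""
    if len(daily_volume) < n_days:
        raise ValueError("record is shorter than the window")
    return max(
        sum(daily_volume[i:i + n_days])
        for i in range(len(daily_volume) - n_days + 1)
    )


if __name__ == "__main__":
    # Synthetic, made-up daily values purely for illustration.
    flows = [12, 15, 40, 95, 60, 30, 22, 18, 80, 85, 20, 14]
    print("max daily flow:", max_daily_flow(flows))
    print("95th percentile:", high_flow_percentile(flows))
    print("days above 50:", days_above_threshold(flows, threshold=50))
    print("max 3-day volume:", max_n_day_volume(flows))
```

Each function returns a different number for the same record, which is why the group needed the managers to state which characteristic of the peak mattered for a given decision.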

Discussion and conclusions
In this paper, we open up the black box of co-production and document in detail the strategies that enabled (and did not enable) the creation of decision-relevant science. We illustrate how co-production works in practice by analyzing the numerous back-and-forth collaborative engagements of Project Hyperion, and describing how the science changed and evolved during the process. By describing how climate scientists and water managers (eventually) crossed the boundaries of both mandate and epistemology to co-produce decision-relevant metrics, we add to the sparse literature on "how and when" co-production works. To our knowledge, this is the first study to document in detail the actionable climatic metrics for adaptive water management, and the co-production processes needed to arrive at such metrics. Our outcomes (i.e., the co-produced decision-relevant metrics) can be used as inputs for developing actionable climate science for adaptation in the water sector. Our learnings on engagement approaches provide co-production scholars with insights on how to design and implement productive scientist-decision-maker interactions.
We found that identifying problem-specific climatic metrics is even more iterative, and needs more social and technical negotiations, than is generally implied in the literature promoting co-production. These metrics often represent new scientific directions for the scientists as well as new ways of management for the water managers. The commonly used direct approach to identifying decision-makers' information needs was insufficient for getting at the quantitative details of climatic metrics, even when the decision-makers had high levels of scientific knowledge. We found that the task of translating user needs into quantitative metrics needs the expertise of both resource managers and climate scientists, as well as an enabling process for both groups' knowledge(s) to evolve. Hence, a judicious mix of direct and indirect approaches was needed to "make" these metrics. The indirect methods, in particular, revealed the groups' tacitly held knowledge and allowed a comprehensive set of shared learnings to emerge. Key indirect strategies included developing a hierarchical framework linking management issues with actionable metrics and upstream phenomena; starting discussions from the planning challenges and then moving to the model-specific metrics; collaboratively exploring the planning relevance of new models, datasets, and scientific findings that managers did not yet know about; and using analogies of good metrics from other hydroclimatic phenomena. Eventually, the twin functions of the metrics (being decision-relevant and extending model capability) spoke to both the decision-makers' and the scientists' priorities, and allowed both groups to co-exist within the project. Additionally, the institutionalization of the boundary spanning role, and the domain expertise of at least one boundary spanner (an underappreciated phenomenon in the co-production literature), proved to be crucial for effective transboundary translation.
Although the co-production was time consuming, the richness of our understanding came from analyzing the many iterative back-and-forth engagements, where even the processes that did not fully work were essential to get to the processes that did eventually work. Co-production is often presented as an outcome in itself, rather than as a means to an end (Lemos et al. 2018). This perspective may have its merits, but we argue that the ability to achieve desired outcomes is quite sensitive to how the co-production process is structured and implemented. More critical assessments of specific co-production processes would help to move the practice forward more efficiently, and to meet the growing need for actionable climate science across many sectors of society.