Abstract

Researchers are producing an ever greater number of web-based climate data and analysis tools in support of natural resource research and management. Yet the apparent absence or underreporting of evaluation in the development of these applications has raised questions as to whether, by whom, and for what they are used, and, relatedly, whether they meet the rationale for their development. This paper joins recent efforts to address these questions by introducing one approach to evaluation, developmental evaluation, and reporting on its use in the evaluation of the Southwest Climate and Environmental Information Collaborative (SCENIC). A web interface under development at the Western Regional Climate Center, SCENIC provides access to climate data and analysis tools for environmental scientists in support of natural resource research and management in the southwestern United States. Evaluation findings highlight subtle but necessary improvements to the application's usefulness and usability that could not have been identified in the absence of end-user feedback. We therefore urge researchers to systematically evaluate web-based climate data and analysis tools in the interest of ensuring their usefulness, usability, and fulfillment of their proposed rationale. In so doing, we recommend that researchers test and apply established evaluation frameworks, thereby engaging end users directly in the process of application development.

Systematically evaluating web-based climate data and analysis tools can improve their usefulness and usability, and help to ensure that they meet the rationale for their development.

As the influence of climate variability on natural systems has prompted growth in the number of scientists and resource managers seeking climate information to inform their research and decision-making, an ever greater number of web-based climate data and analysis tools have been produced for their use (Barnard 2011; Lourenço et al. 2016; National Research Council 2010; Overpeck et al. 2011; Rossing et al. 2014). Often these are developed with some degree of end-user input, but either they have not been evaluated or their evaluation has not been reported in the literature, leaving questions as to whether, by whom, and for what they are used (Hammill et al. 2013; Swart et al. 2017). Meanwhile, researchers add new applications regularly, often as key project deliverables, leading some to caution against duplication, or "portal proliferation," for its potential to dilute messaging and to overwhelm and possibly discourage information seekers (Barnard 2011; Brown and Bachelet 2017; Narayanaswamy 2016, p. 69; Rossing et al. 2014). The apparent absence of evaluation in the development of existing applications returns attention to the question of whether publicly funded applied climate science is meeting the rationale for its realization [see Bozeman and Sarewitz (2011), Meyer (2011), and Swart et al. (2017)].

In an effort to reverse this trend, researchers have recently begun to explore different aspects of and approaches to the evaluation of such applications, for example, outlining methods for usability testing (Oakley and Daudert 2016) and examining questions of access and use through the collection of qualitative end-user feedback (Brown and Bachelet 2017; Swart et al. 2017). Findings highlight a clear role for evaluation in uncovering barriers to use, such as confusion in labeling (Oakley and Daudert 2016) and poor documentation of data sourcing (Brown and Bachelet 2017; Swart et al. 2017), and in helping researchers overcome them. This paper builds on those efforts by reporting on the developmental evaluation (DE) of the Southwest Climate and Environmental Information Collaborative (SCENIC), an ongoing process of evaluation that began with the usability testing of Oakley and Daudert (2016). A web interface under development at the Western Regional Climate Center (WRCC), SCENIC provides climate data and analysis tools to environmental scientists whose research supports natural resource managers in the southwestern United States. Our aim in reporting on our evaluation of SCENIC is to urge the systematic evaluation of web-based climate data and analysis tools for its potential to increase their usefulness and usability and, with it, the likelihood that they meet the rationale for their development. To this end, we encourage researchers to test and apply established evaluation frameworks in which they engage end users for feedback during application development, and we introduce DE as one means for doing so.

In what follows, we provide brief introductions to user-informed approaches to climate science and to DE. We then relate our experience, together with selected findings and lessons learned, from two phases of DE applied to SCENIC between 2014 and the present. In concluding, we emphasize the importance of targeted feedback in increasing the usefulness and usability of web-based climate data and analysis tools to environmental scientists and other potential end users.

MOVING TOWARD USER-INFORMED CLIMATE SCIENCE.

Within the last few decades, individual scholars and government and nongovernmental entities have drawn attention to the often limited applicability of publicly funded applied climate science to policy and decision-making [U.S. Government Accountability Office (GAO) 2015; National Research Council 2007], with some asserting that it has failed to meet the rationale for its realization (Meyer 2011). Persistent pressure to increase applicability (GAO 2015; Dilling and Lemos 2011; Meyer 2011) has thus urged a transition from a "loading dock" approach, wherein researchers produce climate information independently of end users, to more collaborative approaches ranging from contractual to coproduced (Cash et al. 2003; Dilling and Lemos 2011; Lemos et al. 2012; McNie 2007; Meadow et al. 2015). As researchers increasingly have pursued these efforts, they have identified several qualities key to making climate information more useful and usable in policy, research, and decision-making contexts, including its salience, credibility, legitimacy (Cash et al. 2003, 2006; McNie 2007, 2013), and comprehensibility (Dilling and Lemos 2011). That is, climate data and analysis tools must be salient, or relevant, to the context (Cash et al. 2006; McNie 2007), reflecting appropriate spatial and temporal scales (Lemos et al. 2012) and agency or organizational research and decision frameworks (Briley et al. 2015; Vogel et al. 2016). They also must be credible in their accuracy, validity, and quality; legitimate in their transparency and lack of bias (Cash et al. 2006; Cash and Buizer 2005; McNie 2007); and comprehensible in their language and format (Dilling and Lemos 2011).

The contextual specificity of these qualities suggests they would be difficult to realize in the absence of end-user feedback, yet there is no prescribed level of engagement that ensures their achievement. Rather, the appropriate level can depend on any number of factors, such as the type of research or management question, or the human, financial, and other resources available to support the development process (Meadow et al. 2015). Nonetheless, researchers have recently begun to develop a set of key indicators by which to evaluate different approaches to such engaged research in order to explore which best support the production of useful and usable science (Wall et al. 2017). These indicators reflect consistent levels of end-user participation and feedback appropriate to the project objectives [Wall et al. (2017); see also Lemos and Morehouse (2005)]. We suggest that where researchers are engaged meaningfully with end users in this way, evaluation can be incorporated at one or more iterative stages to assess progress toward and achievement of usefulness and usability. Or, as here, evaluation itself can be the medium for engagement, serving as an important guidepost in the development of the data and tools in question.

WHAT IS DEVELOPMENTAL EVALUATION? A BRIEF HISTORY.

Often mistaken for a more recent phenomenon, evaluation has a centuries-long history and is characterized today as a mature field and profession (Hogan 2007; Madaus et al. 1983). Here, we take as a point of departure the distinction between "formative" and "summative" evaluation that philosopher Michael Scriven introduced in the late 1960s, in order to explain the origins of DE, which emerged in direct response to it (Patton 1994, 1996). Since the 1960s, this distinction has served as the predominant heuristic device for suggesting general differences in evaluation type, based largely on timing and purpose (see Scriven 1967). Simply put, formative evaluation takes place at one or more intervals during the development of the evaluand, be it a policy, program, or product, to assess its progress against predetermined goals and to make modifications as necessary to ensure those goals are achieved. Summative evaluation, in contrast, takes place once development of the evaluand is complete, to assess its outcomes against those same goals (Scriven 1993; Stufflebeam 2001). Formative and summative evaluation approaches therefore are designed to answer questions about evaluands supported by relatively stable program theories, that is, theories for how and why implementation of the policy, program, or product will lead to a predicted set of outcomes. Often these are depicted graphically in logic models, or diagrams, showing the relationships between the inputs, activities, and outputs that are intended to produce the desired results (see Funnell and Rogers 2011).

Several variations of and alternatives to formative and summative evaluation have emerged since (Carden and Alkin 2012; Chen 2014; Morell 2010; Patton 2008), among them DE, which evaluator Michael Quinn Patton developed in the mid-1990s in response to situations in which formative and summative evaluation did not fit the development context (Patton 1994, 1996). These were situations in which the evaluand was subject to uncertainty, making it difficult to theorize how the implementation or use of the policy, program, or product would lead to a predicted set of outcomes (Patton 1994, 1996, 2011). DE therefore diverges from the more traditional formative and summative approaches, in which progress and results are measured against predetermined goals, to support innovation and adaptation where multiple paths exist for moving the evaluand forward (Langlois et al. 2013; Patton 2011). Gamble (2008) explains: "Developmental evaluation applies to an ongoing process of innovation in which both the path and the destination are evolving…facilitates assessments of where things are and reveals how things are unfolding; helps to discern which directions hold promise and which ought to be abandoned; and suggests what new experiments should be tried" (pp. 15 and 18). As such, DE typically does not depart from an immutable program theory or logic, but rather from one that evolves in step with emergent needs, reflecting creativity in development and responsiveness to change (Fagen et al. 2011; Patton 2011; Rogers 2008). None of this is to suggest that formative and summative evaluation must remain separate from DE; they may in fact be integrated with it once the evaluand becomes sufficiently stable (Dickson and Saunders 2014; Lam and Shulha 2015), for example, where DE has served in "preformative development," or the readying of an evaluand for traditional evaluation (Fagen et al. 2011; Patton 2011; Honadle et al. 2014). It is also important to note that the methods employed within DE may be quantitative, qualitative, or a combination thereof; the primary criterion for their selection should be whether they are "utilization focused" or "in service to developmental use," implying that they will directly inform the development of the evaluand (Patton 2011, p. 25). In this sense, researchers may find it helpful, though they need not limit themselves, to explore and apply methods from user experience (UX) design theory (Lachner et al. 2016), as relevant.

Finally, situation recognition, that is, matching the type of evaluation to the context and needs of its intended users, is essential to evaluation (Chen 2014; Glouberman and Zimmerman 2002) and so is a core competency for evaluators (Ghere et al. 2006; Stevahn et al. 2005). As Patton (2011) explains: "There is no one best way to conduct an evaluation…The standards and principles of evaluation provide overall direction, a foundation of ethical guidance, and a commitment to professional competence and integrity, but there are no absolute rules that an evaluator can follow to know exactly what to do with specific users in a particular situation" (p. 15). We therefore do not intend to generalize the applicability of DE to all web-based climate data and analysis tools, adding by the same principle that certain evaluation approaches will likely be better suited to certain types of end-user engagement (see Meadow et al. 2015). In the case of SCENIC, we consider DE appropriate given our objective to develop the application in step with the emergent and changing needs of the diverse end users whom we engage "consultatively" (Meadow et al. 2015), that is, not continuously but at specific stages of development for input.

CASE STUDY: DEVELOPMENTAL EVALUATION OF SCENIC.

SCENIC serves as an access point for climate data and analysis tools for environmental scientists conducting research in support of natural resource management in the southwestern United States. Available data include daily weather observations from stations across the United States, localized constructed analogs (LOCAs; statistically downscaled climate projections) developed at the Scripps Institution of Oceanography, dynamically downscaled climate data and model outputs from the North American Regional Climate Change Assessment Program (NARCCAP), and historic gridded datasets from the Parameter-Elevation Regressions on Independent Slopes Model (PRISM). SCENIC additionally provides a variety of analysis and visualization tools that may be used to summarize the data available, identify extremes, and generate custom time series graphs. SCENIC also offers a climate dashboard that allows end users to monitor current climate and weather conditions through access to anomaly maps; water, snow, and drought information; ENSO, Arctic Oscillation–North Atlantic Oscillation (AO–NAO), and Madden–Julian oscillation (MJO) updates; and climate outlooks, among other options.

The WRCC initiated development of SCENIC in 2013, at the time relying on decades of cumulative in-house experience and interaction with users of climate information to guide the selection of the data and analysis tools produced. Since then, different combinations of WRCC researchers and programmers have participated in two distinct phases of evaluation. As has occurred elsewhere (Dickson and Saunders 2014; see "retrospective developmental evaluation" in Patton 2011, p. 294), during the first phase the evaluators did not deliberately follow a DE approach, though they enacted one in practice, given their purpose: to develop a suite of climate data and analysis tools in step with the emergent and changing needs of environmental scientists in the U.S. Southwest. It was not until the second and most recent phase that the WRCC formally framed the evaluation of SCENIC as DE.

In the following, we summarize the methods employed in each of those phases and highlight important insights in the development of SCENIC with regard to end-user feedback. We refrain from detailing all results as many were specific to SCENIC and not generalizable. A summary of the highlights combined with lessons learned from similar efforts (Brown and Bachelet 2017; Oakley and Daudert 2016; Swart et al. 2017) is included in Table 1. Examples of the proposed changes to SCENIC following the most recent phase of evaluation are provided in Figs. 1 and 2, which show the pre- and (a mock-up of) postevaluation station finders.

Table 1. Basic principles of application development and evaluation.

Fig. 1. SCENIC station finder preevaluation.

Fig. 2. Mock-up of proposed SCENIC station finder postevaluation.

Phase 1.

In 2014, as a first phase of evaluation, the WRCC conducted usability testing of SCENIC, wherein usability may be understood as "the extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use" [from the International Organization for Standardization, as cited in Oakley and Daudert (2016)]. As such, the purpose of this phase was to ensure the presentation of a coherent, user-friendly application to the public prior to its release (Oakley and Daudert 2016). The evaluation team included the primary developer of SCENIC and a WRCC climatologist and social scientist, who together designed the evaluation consistent with established standards in usability and human–computer interaction (see Krug 2014; Nielsen 2000). The evaluation consisted of a three-part test administered in person to 10 target end users (see Faulkner 2003). These included environmental scientists and representatives of local and regional government natural resource management agencies who were known or recommended to the WRCC. Such purposive (nonrandom) sampling is common in usability testing given the objective of deriving feedback from target user groups (Krug 2014). Here, we provide only an overview; a more detailed account of the methods and results can be found in Oakley and Daudert (2016).

In brief, during part one, to gauge ease of use, the team asked testers to perform three common tasks (see Nielsen 2000). These were to 1) list data for all stations in Shasta County, California, that recorded snowfall and precipitation for all dates from 15 to 31 December 2013; 2) find the highest temperature ever recorded during March at Winnemucca Municipal Airport, Winnemucca, Nevada; and 3) find the lowest minimum temperature among grid points approximately covering the area of Pyramid Lake, Nevada, during December 2013 [using the Northeast Regional Climate Center (NRCC) interpolated dataset]. The three tasks were developed according to two criteria: first, whether they were representative of the questions the target audience commonly asks of WRCC service climatologists with respect to other locations; and second, whether they required different user capabilities and modes of access (Oakley and Daudert 2016). Testers performed these tasks in a usability laboratory created on site at the WRCC: a small conference room isolated for quiet and equipped with a computer, full-sized screen, mouse, and keyboard (Oakley and Daudert 2016). Camtasia (by TechSmith; www.techsmith.com/camtasia.html) screen-recording software was used in conjunction with a microphone to record testers' screen movements and speech. Analysis was largely observational: through direct observation and playback, the team examined the fluidity with which end users completed each task.
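
For readers curious how a task like task 1 translates into an underlying data request, the sketch below shows one way it might be scripted against the publicly documented ACIS-style web services that the regional climate centers operate. The endpoint, county FIPS code, and element names are our assumptions for illustration; they are not part of the SCENIC interface the testers used, which handles such requests through point-and-click selections.

```python
# Hypothetical sketch (not SCENIC code): retrieving the data behind
# usability task 1 (snowfall and precipitation for Shasta County, CA
# stations, 15-31 Dec 2013) from an ACIS-style web service.
# The endpoint, FIPS code, and element names are illustrative assumptions.
import requests

params = {
    "county": "06089",       # assumed FIPS code for Shasta County, California
    "sdate": "2013-12-15",
    "edate": "2013-12-31",
    "elems": "snow,pcpn",    # assumed element names: snowfall, precipitation
    "meta": "name,sids",
}
resp = requests.post("https://data.rcc-acis.org/MultiStnData", json=params, timeout=30)
resp.raise_for_status()

for station in resp.json().get("data", []):
    # Each record pairs station metadata with one value per element per day.
    print(station["meta"]["name"], station["data"][:3], "...")
```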

During part two, the team asked end users to complete a SCENIC-adapted System Usability Scale (SUS), a 10-item Likert-type questionnaire used widely for measuring perceived ease of use (Brooke 1996; Bangor et al. 2009). Testers were asked to rate, on a five-point scale, their agreement or disagreement with statements such as "I thought SCENIC was easy to use" and "I think I would need the support of a technical person to use SCENIC." Following standard SUS analysis procedure, the responses were compiled into a single score, then converted into a percentile ranking for comparison against average usability as determined through SUS testing performed across 5,000 sites (see Sauro 2011).
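
Because the phase 1 results hinge on the interpretation of an SUS score (67.5 against a benchmark average of 68), a compact version of the standard scoring arithmetic may be a useful reference. The calculation below follows Brooke (1996); the example responses are invented for illustration and are not study data.

```python
def sus_score(responses):
    """Standard System Usability Scale scoring (Brooke 1996).

    `responses` is a list of ten 1-5 Likert ratings in questionnaire order.
    Odd-numbered items contribute (rating - 1), even-numbered items
    contribute (5 - rating), and the summed contributions are scaled
    by 2.5 to yield a 0-100 score.
    """
    assert len(responses) == 10, "SUS requires exactly ten responses"
    total = sum((r - 1) if i % 2 == 1 else (5 - r)
                for i, r in enumerate(responses, start=1))
    return total * 2.5

# Invented example for a single tester (not actual study data):
print(sus_score([4, 2, 4, 2, 3, 2, 4, 3, 4, 2]))  # -> 70.0
```

Individual scores are then averaged across testers and compared against the published benchmark mean of 68, which corresponds to roughly the 50th percentile (Sauro 2011).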

During part three, the team asked testers to answer a set of questions relating to internal challenges in naming various features of SCENIC (e.g., What would you expect to find if you clicked on a link labeled "climate anomaly maps"?), as well as to a general curiosity about how users understand and search for climate data (e.g., What is the difference between a data "tool" and a data "product"?). As in the first test, the team analyzed the results qualitatively, using them as anecdotal feedback in refining the terminology used within the site (Oakley and Daudert 2016).

Following an initial redesign based on the above, and having readied the interface and released SCENIC to the public, later in 2014 the WRCC conducted a brief follow-up via informal in-person interviews with three environmental scientists who were known end users of SCENIC, covering both the updated usability and the comprehensiveness of the data and tools provided. Interview questions explored ease of use with respect to then-current and potential uses of SCENIC. The results, which indicated a need for modest additional modification in the form of customization, were summarized and prioritized for incorporation.

Phase 2.

More recently, in 2017, in recognition that the user base of a publicly available application had expanded beyond the original beneficiaries to include the private, nonprofit, and other public sectors (see also Swart et al. 2017), the WRCC conducted a more comprehensive review of the content and performance of SCENIC, this time under the clear designation of DE. Additionally, given that end users do not operate in fixed research and decision-making environments wherein information needs are constant or prescribed, a secondary priority was to identify any significant shifts in known contexts of use that might have prompted new needs. To explore these points, two WRCC social scientists, together with the primary developer of SCENIC and with input from two additional WRCC programmers, designed the evaluation and developed the interview questions. The 10 end users interviewed work in university-affiliated research, private industry, the nonprofit sector, government, or university cooperative extension, and were identified through previous correspondence with the WRCC. Specifically, prior to this phase, when end users contacted the WRCC with questions about SCENIC, service climatologists asked whether they would participate in an evaluation if contacted in the future, thereby generating a list of participants. Here too, interviewees were sampled purposively (Bernard 2006). Interviews were conducted and recorded via the conferencing service Zoom so that the interviewer and interviewee could interact with SCENIC over a shared screen.

Interview questions were designed to elicit information to aid in the creation of end-user profiles, that is, descriptions of particular end users grouped by similar characteristics (e.g., professional background, objectives, interests, skills), in order to generate insight into who uses which features of SCENIC and, relatedly, how best to tailor those features to their skills and needs. Interview questions therefore inquired into an end user's professional background; the relevance of climate to their research, management, or other work activities; and their use and critique of SCENIC. They also inquired into research and decision-making contexts and into how and why end users make differential use of the climate data and analysis tools, related, for example, to professional background, project or decision type, and changes in climate, environment, policies, and funding. Interviews were transcribed. Following a grounded theory approach (Bernard 2006), we created summaries of responses for individual questions and used those summaries to identify relevant analytic categories (e.g., comparison of SCENIC to similar applications), then reviewed the data within each category to identify patterned themes (e.g., comprehensiveness of data as a primary factor in application selection) within and across end-user profiles. The anonymized results were shared within the team in the form of a report for prioritization and incorporation of feedback.
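
As a concrete illustration of how coded interview excerpts can be rolled up within and across end-user profiles, the sketch below tallies themes by profile and analytic category. The profiles, categories, and themes shown are invented placeholders, not the study's actual codebook.

```python
from collections import Counter, defaultdict

# Invented, illustrative coded excerpts: (profile, analytic category, theme).
coded_excerpts = [
    ("extension",   "comparison to similar applications", "comprehensiveness of data"),
    ("nonprofit",   "comparison to similar applications", "comprehensiveness of data"),
    ("engineering", "output requirements",                "agency-approved formats"),
    ("extension",   "on-site support",                    "clearer hover text"),
    ("engineering", "on-site support",                    "citation of data sourcing"),
]

# Count each (category, theme) pair separately for every end-user profile.
themes_by_profile = defaultdict(Counter)
for profile, category, theme in coded_excerpts:
    themes_by_profile[profile][(category, theme)] += 1

for profile, counts in themes_by_profile.items():
    print(profile)
    for (category, theme), n in counts.most_common():
        print(f"  [{category}] {theme}: {n}")
```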

RESULTS.

During phase 1, the three-part usability testing revealed a primary issue and corresponding solution, prompting the initial redesign noted above. The issue was a lack of clarity leading to moderate usability, reflected in an SUS score of 67.5, just below the benchmark average of 68 and equating to a percentile rank of roughly 50%. The solution was simply to follow general usability guidelines more closely. Such guidelines may not have been developed for the presentation of climate data in particular, yet the team found end-user feedback generally to align with basic web conventions such as maintaining consistency in layout standards within and across pages, providing thorough but streamlined help texts, reducing cognitive load by hiding options until needed, and labeling clearly with meaningful terms (see Krug 2014; Nielsen 2000; Oakley and Daudert 2016). Follow-up interviews with environmental scientists after the initial redesign then allowed for refinement of a tested foundation, taking the form of customization of the data and tools present. For example, among the refinements was diversification of the means for selecting historical data (e.g., by station or county) to include user-defined polygons, either uploaded or drawn, that allow end users to capture data within the boundaries of a research site or management area.
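
To make the polygon-based selection concrete, the sketch below filters station coordinates against a user-drawn boundary using the shapely library. The station list and polygon vertices are invented, and this is only one way such a selection could be implemented, not SCENIC's actual code.

```python
# Hypothetical sketch of polygon-based station selection (not SCENIC code).
from shapely.geometry import Point, Polygon

# A user-drawn management-area boundary, as (longitude, latitude) vertices.
management_area = Polygon([
    (-119.8, 40.2), (-119.3, 40.2), (-119.3, 39.8), (-119.8, 39.8),
])

# Invented station coordinates keyed by station ID.
stations = {
    "STN_A": (-119.6, 40.0),
    "STN_B": (-119.1, 40.1),
    "STN_C": (-119.5, 39.9),
}

# Keep only the stations that fall inside the drawn boundary.
selected = [
    sid for sid, (lon, lat) in stations.items()
    if management_area.contains(Point(lon, lat))
]
print(selected)  # ['STN_A', 'STN_C']
```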

During phase 2, the creation of end-user profiles enabled two important developments. First, it enabled the team to gauge the need for new content through informed conversation with end users; second, it enabled the tailoring of evaluation and product to those respective user groups, or profiles, as relevant. Regarding the first, for example, conversation about changes in management concerns with a biologist and long-time end user at a conservation nonprofit revealed emergent use of LOCAs, prompting incorporation of the remaining LOCA datasets into SCENIC as a priority for development and, correspondingly, for future evaluation. Regarding the second, conversation with engineers revealed requirements for the presentation of data in specific formats for government approval of privately engineered projects on public lands, extending the need for legitimacy beyond the end user and highlighting for consideration the development of additional output options and the citation of data sourcing to meet that particular subset of needs.

End users also emphasized the value of a responsive "help desk" and its relevance to their preference for and continued use of SCENIC over similar sites. Certainly the absence of a help desk does not preclude continued use of a site by certain end users. It does, however, rank among the reasons interviewees gave for their desire for a "clearinghouse" or "one-stop shop" approach to the provision of climate data and analysis tools under entities capable of providing such services. Unprompted, interviewees overwhelmingly confirmed the tendency toward "portal proliferation" already reported in the literature (Barnard 2011). In this case, however, they did not comment on the documented potential for such proliferation to dilute messaging and to overwhelm and possibly discourage information seekers (Brown and Bachelet 2017; Narayanaswamy 2016, p. 69; Rossing et al. 2014). Rather, they commented on the disruptiveness of relying on multiple sites to access the entirety of the data and tools desired. Most interviewees reported prioritizing their use of sites based, in this order, on the data available, the intuitiveness of operations, and the speed of operations, ultimately favoring those most comprehensive in content. Yet, as interviewees noted, prioritizing the first of these (comprehensiveness of content) poses a potential challenge to the second (intuitiveness of operations), as the more information a site presents, the more easily the paths to the data desired become obscured. This relates to three additional points that interviewees emphasized strongly: the importance of 1) succinct but robust on-site informational support (i.e., tutorials, hovers or pop-ups, clarification of acronyms), 2) minimization of changes made to a site once released, and 3) consistency in terminology and order of operations across sites in the absence of a single clearinghouse-style application that meets all needs.

As examples, regarding the first of these, interviewees likened the experience of selecting stations for retrieval of historical data in SCENIC to what Barsugli et al. term the "practitioner's dilemma," that is, "not knowing how to choose an appropriate dataset, assess its credibility, and use it wisely" (Barsugli et al. 2013, p. 424). Specifically, SCENIC provides background information about each station network via hovers, but these lack the detail end users need to understand the differences among networks and make an informed selection, necessitating additional context either within existing hovers or, for example, in a comparative table. The preevaluation (Fig. 1) and proposed postevaluation (Fig. 2) station finders illustrate this point. Whereas the preevaluation station finder reveals all possible station networks at once with limited detail in hovers, the postevaluation station finder hides them initially and incorporates more detail later so as not to overwhelm.

Regarding the second point, interviewees explained that they tend either to bookmark or to download the datasets they use most frequently, and so become frustrated when links break due to modifications made to the site, effectively requiring them to relearn how to access the data or to contact the WRCC for help, resulting in either case in a loss of time. Finally, regarding the third point, interviewees conveyed frustration with the terminological and operational discrepancies between sites that require continual transitioning; for example, SCENIC requires choosing a tool and then a station to access historical data, whereas the Remote Automated Weather Stations (RAWS) interface, also available through the WRCC, requires the reverse order to access the same data.

DISCUSSION.

The user base of any generalized climate data and analysis application is likely to be broad and diverse, and for those reasons somewhat anonymous (Swart et al. 2017). Methods for identifying members of that base include reliance on existing networks and knowledge of potential end users, or correspondence via a help desk as documented here in phase 2. Sampling from a diverse base can allow for a more comprehensive evaluation, given that no single end user is likely to use all features of a given site and may in fact bypass most of them through bookmarking. This tendency implies additional value in the creation of end-user profiles that enable evaluation and tailoring of site development accordingly. The more site developers know about end users and their respective informational needs and uses, as well as their research or management contexts, the better they may be able to work prescriptively in identifying what additional content might be of benefit. This generally has been true in the case of SCENIC, wherein the WRCC built the initial application based on decades of cumulative in-house experience, yet the subtler improvements needed to ensure usefulness and usability still could not have been identified in the absence of end-user feedback. A diverse team of researchers is helpful, including social scientists trained in qualitative methods of inquiry who can elicit an in-depth understanding of context. Equally important is the inclusion of a climatologist who can supply terminology for labeling in the dual interests of descriptive accuracy and consistency in the use of terms across similar sites, as appropriate.

Acknowledgment among interviewees of portal proliferation, and their stated desire for a clearinghouse-style approach to the provision of climate data and analysis tools, brings into question the continued call for these or similar applications as key project deliverables (Swart et al. 2017). This, in turn, encourages exploration of the different types of climate and climate-related information used within research and management contexts, in the interest of identifying what, in addition to climate data and analysis tools, may have application (e.g., Rich 1997; Wall et al. 2017). Meanwhile, in an effort to stem proliferation, a worthwhile precursor to the development of any application is a thorough inventory of those already in existence (see also Swart et al. 2017).

Conducting usability testing early in application development is important for the presentation of a coherent, user-friendly product, and also for minimizing the need for subsequent iterative changes that may prove disruptive to existing end users. As stated, here usability testing of SCENIC reiterated the value of following general usability guidelines in web design for the development of climate data and analysis tools. Continued attention to the development of such sites under an expanding user base and new or changing needs presents an impetus for repeated usability testing (Oakley and Daudert 2016). However, it also presents the challenge of balancing end-user demands with the desire for minimal modification. Ample literature exists already that may be utilized to inform usability during initial development (see Oakley and Daudert 2016). As an additional aid, here we provide the beginnings of a basic suite of principles or patterns of effectiveness meant to inform the initial development, evaluation, or subsequent modification of web-based climate data and analysis applications (Table 1).

CONCLUSIONS.

The evaluation of web-based climate data and analysis tools remains either underexplored or underreported, yet it holds potential for increasing the likelihood that these tools will be of use to their intended end users (Brown and Bachelet 2017; Oakley and Daudert 2016). Here, we report on one evaluation approach by way of our developmental evaluation of the web-based climate data and analysis application SCENIC. Developmental evaluation (DE) has proven a useful approach to our evaluation of SCENIC given our intention to develop the application continuously in step with a growing user base and changes in end-user needs (see Patton 2011). Reiterating that there are many approaches to evaluation and no single best one (Chen 2014; Glouberman and Zimmerman 2002), we do not intend to convey a general applicability of DE to all web-based climate data and analysis tools, but rather to introduce it as a creative approach to conceptualizing and ultimately framing evaluation within contexts of continual product development (see Patton 2011).

Researchers have noted that end-user engagement can be time and resource intensive (Briley et al. 2015; see Swart et al. 2017). Certainly, the incorporation of evaluation activities into the production of web-based climate data and analysis tools implies additional commitment; however, we consider the benefits to be many and the commitment worthwhile. In the case of SCENIC, we employed DE within a process of engagement wherein we consulted end users, through evaluation, at two specific stages of development: 1) usability testing and 2) qualitative feedback on the need for additional content, or modification of existing content, under an expanded user base. In the absence of this ongoing evaluative process, we would have little insight into the usefulness and usability of SCENIC. For example, we would not have uncovered key semantic (e.g., confusion in labeling) and functional (e.g., diversification of data selection tools) issues inhibiting use, identified the needs of new end users (e.g., additional output options), or kept up with the changing needs of those already known (e.g., use of climate projections).

With these insights we hope to encourage end-user engagement through evaluation, with the aim not only of producing useful and usable applications but also of continuing to build upon the suite of basic principles, or patterns of effectiveness, presented here (Table 1). Such a suite may serve researchers by alerting them to key points, for example, in presentation or design, to address ex ante. Adherence to such principles should not become a substitute for evaluation and so preclude end-user engagement, but it would enable researchers to focus more time and attention on the development and evaluation of application-specific content. Certainly an increase in the evaluation of web-based climate applications will help to ensure their usefulness and usability, while the reporting of evaluation will help to answer compelling questions about whether the various rationales for their development are being fulfilled.

ACKNOWLEDGMENTS

We thank all test participants for generously contributing their time and insights to the development of SCENIC. We also thank the Western Regional Climate Center (WRCC) for its assistance in this project. This material is based upon work supported by the U.S. Geological Survey under Grant G11AC9008 from the Southwest Climate Adaptation Science Center. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the opinions or policies of the U.S. Geological Survey. Mention of trade names or commercial products does not constitute their endorsement by the U.S. Geological Survey.

REFERENCES

Bangor, A., P. Kortum, and J. Miller, 2009: Determining what individual SUS scores mean: Adding an adjective rating scale. J. Usability Stud., 4 (3), 114–123.
Barnard, G., 2011: Seeking a cure for portal proliferation syndrome. Climate and Development Knowledge Network, https://cdkn.org/2011/06/portal-proliferation-syndrome/?loclang=en_gb.
Barsugli, J. J., and Coauthors, 2013: The practitioner's dilemma: How to assess the credibility of downscaled climate projections. Eos, Trans. Amer. Geophys. Union, 94, 424–425, https://doi.org/10.1002/2013EO460005.
Bernard, R. H., 2006: Research Methods in Anthropology: Qualitative and Quantitative Approaches. AltaMira Press, 824 pp.
Bozeman, B., and D. Sarewitz, 2011: Public value mapping and science policy evaluation. Minerva, 49, 1–23, https://doi.org/10.1007/s11024-011-9161-7.
Briley, L., D. Brown, and S. E. Kalafatis, 2015: Overcoming barriers during the co-production of climate information for decision-making. Climate Risk Manage., 9, 41–49, https://doi.org/10.1016/j.crm.2015.04.004.
Brooke, J., 1996: SUS—A quick and dirty usability scale. Usability Evaluation in Industry, P. W. Jordan et al., Eds., CRC Press, 189–194.
Brown, M., and D. Bachelet, 2017: BLM sagebrush managers give feedback on eight climate web applications. Wea. Climate Soc., 9, 39–52, https://doi.org/10.1175/WCAS-D-16-0034.1.
Carden, F., and M. C. Alkin, 2012: Evaluation roots: An international perspective. J. Multidiscip. Eval., 8, 102–118.
Cash, D. W., and J. Buizer, 2005: Knowledge-Action Systems for Seasonal to Interannual Climate Forecasting: Summary of a Workshop. National Academies Press, 44 pp., https://doi.org/10.17226/11204.
Cash, D. W., W. C. Clark, F. Alcock, N. M. Dickson, N. Eckley, D. H. Guston, J. Jäger, and R. B. Mitchell, 2003: Knowledge systems for sustainable development. Proc. Natl. Acad. Sci. USA, 100, 8086–8091, https://doi.org/10.1073/pnas.1231332100.
Cash, D. W., J. C. Borck, and A. G. Patt, 2006: Countering the loading-dock approach to linking science and decision making: Comparative analysis of El Niño/Southern Oscillation (ENSO) forecasting systems. Sci. Technol. Hum. Values, 31, 465–494, https://doi.org/10.1177/0162243906287547.
Chen, H. T., 2014: Practical Program Evaluation: Theory-Driven Evaluation and the Integrated Evaluation Perspective. 2nd ed. Sage, 464 pp.
Dickson, R., and M. Saunders, 2014: Developmental evaluation: Lessons for evaluative practice from the SEARCH Program. Evaluation, 20, 176–194, https://doi.org/10.1177/1356389014527530.
Dilling, L., and M. C. Lemos, 2011: Creating usable science: Opportunities and constraints for climate knowledge use and their implications for science policy. Global Environ. Change, 21, 680–689, https://doi.org/10.1016/j.gloenvcha.2010.11.006.
Fagen, M. C., S. D. Redman, J. Stacks, V. Barrett, B. Thullen, S. Altenor, and B. L. Neiger, 2011: Developmental evaluation: Building innovations in complex environments. Health Promot. Pract., 12, 645–650, https://doi.org/10.1177/1524839911412596.
Faulkner, L., 2003: Beyond the five-user assumption: Benefits of increased sample sizes in usability testing. Behav. Res. Methods Instrum. Comput., 35, 379–383, https://doi.org/10.3758/BF03195514.
Funnell, S. C., and P. J. Rogers, 2011: Purposeful Program Theory: Effective Use of Theories of Change and Logic Models. John Wiley and Sons, 576 pp.
Gamble, J. A. A., 2008: A Developmental Evaluation Primer. J. W. McConnell Family Foundation, 69 pp.
GAO, 2015: Climate information: A national system could help federal, state, local, and private decision makers use climate information. U.S. Government Accountability Office Rep. 16-37, 53 pp., www.gao.gov/assets/680/673823.pdf.
Ghere, G., J. A. King, L. Stevahn, and J. Minnema, 2006: A professional development unit for reflecting on program evaluator competencies. Amer. J. Eval., 27, 108–123, https://doi.org/10.1177/1098214005284974.
Glouberman, S., and B. Zimmerman, 2002: Complicated and complex systems: What would successful reform of Medicare look like? Commission on the Future of Health Care in Canada Discussion Paper 8, 30 pp., www.alnap.org/system/files/content/resource/files/main/complicatedandcomplexsystems-zimmermanreport-medicare-reform.pdf.
Hammill, A., B. Harvey, and D. Echeverria, 2013: Understanding needs, meeting demands: User-oriented analysis of online knowledge broker platforms for climate change and development. International Institute for Sustainable Development Paper, 32 pp., www.iisd.org/library/understanding-needs-meeting-demands-user-oriented-analysis-online-knowledge-broker-platforms.
Hogan, R. L., 2007: The historical development of program evaluation: Exploring past and present. Online J. Workforce Educ. Dev., 2 (4), 5, https://opensiuc.lib.siu.edu/ojwed/vol2/iss4/5/.
Honadle, B. W., M. A. Zapata, C. Auffrey, R. vom Hofe, and J. Looye, 2014: Developmental evaluation and the 'Stronger Economies Together' initiative in the United States. Eval. Program Plann., 43, 64–72, https://doi.org/10.1016/j.evalprogplan.2013.11.004.
Krug, S., 2014: Don't Make Me Think, Revisited: A Common Sense Approach to Web Usability. 3rd ed. New Riders, 214 pp.
Lachner, F., P. Naegelein, R. Kowalski, M. Spann, and A. Butz, 2016: Quantified UX: Towards a common organizational understanding of user experience. Proc. Ninth Nordic Conf. on Human-Computer Interaction, Gothenburg, Sweden, Association for Computing Machinery, 56.
Lam, C. Y., and L. M. Shulha, 2015: Insights on using developmental evaluation for innovating: A case study on the cocreation of an innovative program. Amer. J. Eval., 36, 358–374, https://doi.org/10.1177/1098214014542100.
Langlois, M., N. Blanchet-Cohen, and T. Beer, 2013: The art of the nudge: Five practices for developmental evaluators. Can. J. Program Eval., 27, 39–59.
Lemos, M. C., and B. J. Morehouse, 2005: The co-production of science and policy in integrated climate assessments. Global Environ. Change, 15, 57–68, https://doi.org/10.1016/j.gloenvcha.2004.09.004.
Lemos, M. C., C. J. Kirchhoff, and V. Ramprasad, 2012: Narrowing the climate information usability gap. Nat. Climate Change, 2, 789–794, https://doi.org/10.1038/nclimate1614.
Lourenço, T. C., R. Swart, H. Goosen, and R. Street, 2016: The rise of demand-driven climate services. Nat. Climate Change, 6, 13–14, https://doi.org/10.1038/nclimate2836.
Madaus, G. F., D. Stufflebeam, and M. S. Scriven, 1983: Program evaluation. Evaluation Models: Viewpoints on Educational and Human Services Evaluation, G. F. Madaus, M. Scriven, and D. Stufflebeam, Eds., Springer, 3–22.
McNie, E. C., 2007: Reconciling the supply of scientific information with user demands: An analysis of the problem and review of the literature. Environ. Sci. Policy, 10, 17–38, https://doi.org/10.1016/j.envsci.2006.10.004.
McNie, E. C., 2013: Delivering climate services: Organizational strategies and approaches for producing useful climate-science information. Wea. Climate Soc., 5, 14–26, https://doi.org/10.1175/WCAS-D-11-00034.1.
Meadow, A. M., D. B. Ferguson, Z. Guido, A. Horangic, G. Owen, and T. Wall, 2015: Moving toward the deliberate coproduction of climate science knowledge. Wea. Climate Soc., 7, 179–191, https://doi.org/10.1175/WCAS-D-14-00050.1.
Meyer, R., 2011: The public values failures of climate science in the US. Minerva, 49, 47–70, https://doi.org/10.1007/s11024-011-9164-4.
Morell, J. A., 2010: Evaluation in the Face of Uncertainty: Anticipating Surprise and Responding to the Inevitable. Guilford Press, 303 pp.
Narayanaswamy, L., 2016: Gender, Power and Knowledge for Development. Routledge, 270 pp.
National Research Council, 2007: Evaluating Progress of the U.S. Climate Change Science Program: Methods and Preliminary Results. National Academies Press, 178 pp., https://doi.org/10.17226/11934.
National Research Council, 2010: Informing an Effective Response to Climate Change. National Academies Press, 346 pp., https://doi.org/10.17226/12784.
Nielsen, J., 2000: Designing Web Usability. New Riders, 432 pp.
Oakley, N. S., and B. Daudert, 2016: Establishing best practices to improve usefulness and usability of web interfaces providing atmospheric data. Bull. Amer. Meteor. Soc., 97, 263–274, https://doi.org/10.1175/BAMS-D-14-00121.1.
Overpeck, J. T., G. A. Meehl, S. Bony, and D. R. Easterling, 2011: Climate data challenges in the 21st century. Science, 331, 700–702, https://doi.org/10.1126/science.1197869.
Patton, M. Q., 1994: Developmental evaluation. Eval. Pract., 15, 311–319, https://doi.org/10.1016/0886-1633(94)90026-4.
Patton, M. Q., 1996: A world larger than formative and summative. Eval. Pract., 17, 131–144, https://doi.org/10.1016/S0886-1633(96)90018-5.
Patton, M. Q., 2008: Utilization-Focused Evaluation. 4th ed. Sage, 688 pp.
Patton, M. Q., 2011: Developmental Evaluation: Applying Complexity Concepts to Enhance Innovation and Use. Guilford Press, 375 pp.
Rich, R. F., 1997: Measuring knowledge utilization: Processes and outcomes. Knowl. Policy, 10, 11–24, https://doi.org/10.1007/BF02912504.
Rogers, P. J., 2008: Using programme theory to evaluate complicated and complex aspects of interventions. Evaluation, 14, 29–48, https://doi.org/10.1177/1356389007084674.
Rossing, T., A. Otzelberger, and P. Girot, 2014: Scaling up the use of tools for community-based adaptation. Community-Based Adaptation to Climate Change: Scaling It Up, et al., Eds., Routledge, 103–121.
Sauro, J., 2011: A Practical Guide to the System Usability Scale: Background, Benchmarks & Best Practices. Measuring Usability LLC, 162 pp.
Scriven, M., 1967: The Methodology of Evaluation. Rand McNally, 140 pp.
Scriven, M., 1993: Hard-won lessons in program evaluation. New Dir. Program Eval., 58, 1–107.
Stevahn, L., J. A. King, G. Ghere, and J. Minnema, 2005: Establishing essential competencies for program evaluators. Amer. J. Eval., 26, 43–59, https://doi.org/10.1177/1098214004273180.
Stufflebeam, D., 2001: Evaluation models. New Dir. Eval., 2001 (89), 7–98, https://doi.org/10.1002/ev.3.
Swart, R., K. de Bruin, S. Dhenain, G. Dubois, A. Groot, and E. von der Forst, 2017: Developing climate information portals with users: Promises and pitfalls. Climate Serv., 6, 12–22, https://doi.org/10.1016/j.cliser.2017.06.008.
Vogel, J., E. McNie, and D. Behar, 2016: Co-producing actionable science for water utilities. Climate Serv., 2–3, 30–40, https://doi.org/10.1016/j.cliser.2016.06.003.
Wall, T. U., A. M. Meadow, and A. Horganic, 2017: Developing evaluation indicators to improve the process of coproducing usable climate science. Wea. Climate Soc., 9, 95–107, https://doi.org/10.1175/WCAS-D-16-0008.1.

Footnotes

© 2019 American Meteorological Society. For information regarding reuse of this content and general copyright information, consult the AMS Copyright Policy (www.ametsoc.org/PUBSReuseLicenses).