
Establishing Best Practices to Improve Usefulness and Usability of Web Interfaces Providing Atmospheric Data

Western Regional Climate Center, Desert Research Institute, Reno, Nevada

Abstract

Accessing scientific data and information through an online portal can be a frustrating task, often because such portals were not built with the user’s needs in mind. The concept of making web interfaces easy to use, known as “usability,” has been thoroughly researched in the field of e-commerce but has not been explicitly addressed in the atmospheric and most other sciences. As more observation stations are installed, satellites flown, models run, and field campaigns performed, data are continuously produced. Portals on the Internet have become the favored mechanisms for sharing this information and are ever increasing in number. Portals are often created without being explicitly tested for usability with the target audience, though the expenses of testing are low and the returns high. To remain competitive and relevant in the provision of atmospheric information, it is imperative that developers understand design elements of a successful portal to make their product stand out among others. This work informs the audience of the benefits and basic principles of usability that can be applied to web pages presenting atmospheric information. We will also share some of the best practices and recommendations we have formulated from the results of usability testing performed on a data provision site designed for researchers in the Southwest Climate Science Center and hosted by the Western Regional Climate Center.

CORRESPONDING AUTHOR: Nina Oakley, Western Regional Climate Center, Desert Research Institute, 2215 Raggio Pkwy., Reno, NV 89512, E-mail: nina.oakley@dri.edu


Addressing usability when developing a web portal for data access is relatively inexpensive, increases a site’s use and user satisfaction, and reflects positively on the organization hosting the site.

Atmospheric data and information (hereafter referred to as “information”) are becoming increasingly important to a wide range of users outside of the atmospheric science discipline. These include other scientists (hydrologists, social scientists, ecologists), resource managers, public health officials, farmers, and others (National Research Council 2010; Overpeck et al. 2011). As a result, providers of atmospheric information have a growing obligation to not only provide information (access is usually taxpayer funded), but also make the information easily digestible by the various members of a broadening audience (Brugger and Crimmins 2011; Overpeck et al. 2011; Rood and Edwards 2014). A site developed without the user in mind may prove frustrating or challenging to use (Krug 2005). Assessment of a site’s usability [the extent to which the site can be used to achieve goals with effectiveness, efficiency, and satisfaction (International Standards Organization 1998)] is a cost-effective way to ensure users can fluidly accomplish intended tasks on a site. To summarize usability as it applies to web design today, Dumas and Redish (1999) offer four points: 1) usability means focusing on users (as opposed to developer/designer needs), 2) people use products to be productive, 3) users are busy people trying to accomplish tasks, and 4) users decide whether a product is easy to use. Building and testing a usable site requires employing these principles as well as following general guidelines that have become standard in the field of usability and human–computer interaction (HCI). Much of the literature assessed in this paper is focused on usability in the practical sense; we will leave the theory of HCI to others (e.g., Preece et al. 1994; Dix 2009). This work also does not approach the topics of accessibility and responsive design.

Here, we take a “small shop” approach to web development. In this case, a research scientist, data analyst, or programmer with no formal training in web design must develop a website to provide atmospheric information. This person generally has some support from his or her research group, but does not have a web development team to work with and must try to apply principles of usability with limited resources. Though this situation is not representative of all groups in the atmospheric sciences, it is the group that is likely the most challenged when tasked to build a usable site. We present the results of usability testing performed on a website providing climate information hosted by the Western Regional Climate Center (WRCC). Additionally, we outline general usability guidelines that are applicable to pages providing atmospheric information and we explain how our test participants perceive and search for climate data. Though this test focuses on station-based and gridded climate data for the elements temperature and precipitation, the results of the testing and general guidelines provided are easily applicable to other types of atmospheric data.

WHY IS USABILITY IMPORTANT WHEN PROVIDING ATMOSPHERIC DATA?

People are very goal driven when they access a website. A usable site design will “get out of the way” and allow people to successfully accomplish their goals in a reasonable amount of time (Nielsen 2000a). Krug (2005) defines a “reservoir of goodwill” that users have when entering a site. Each problem or challenge in using the site lowers the reservoir until it is exhausted and the user leaves the site altogether. It is important to note that each user’s goodwill reservoir is unique and situational; some users are by nature more patient than others and have a larger reserve. Some may have a predetermined opinion about an organization that influences the experience they will have on a site. Nielsen (2011) observes that people are likely to leave a site within the first 10–20 seconds if they do not see what they are looking for or become confused. If a site can convince a user the material presented is valuable and persuade the user to stay beyond the 20-s threshold, then the user is likely to remain on the page for a longer period of time. If principles of usability are not addressed, page visitors are likely to find, or at least search for, another site that makes the information they want easier to access (Nielsen 2000a). Additionally, a successful experience on a website makes people likely to return. In economic terms, loyal users tend to spend considerably more money on a site than first-time users (Nielsen 1997). In atmospheric science, demonstration of a loyal website following can indicate to supporting agencies that the site provides information that is useful to stakeholders, which may help to secure future resources. Furthermore, having a usable site can make your organization stand out among others.

Compared with other parts of scientific research and data production, usability testing is relatively cheap and very effective. Nielsen (2000b) suggests that performing usability testing with five users per round of testing will uncover approximately 80% of the problems on a website. The usability tests themselves are typically an hour in length and the necessary equipment can often be located within a research institution, keeping technology costs to a minimum.
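
The “approximately 80%” figure reflects a simple discovery model in which each participant independently uncovers any given problem with some fixed probability. The article does not state this model explicitly, so the sketch below is only an illustration under that common assumption, using a per-user discovery probability of about 0.3, the value typically quoted alongside Nielsen’s guideline.

    # Illustrative sketch (an assumption, not from the article): expected share of
    # usability problems found when each of n users independently uncovers any
    # given problem with probability p during a test session.

    def proportion_found(n_users: int, p: float = 0.31) -> float:
        """Expected proportion of usability problems uncovered by n_users."""
        return 1.0 - (1.0 - p) ** n_users


    for n in (1, 3, 5, 10, 15):
        print(f"{n:2d} users -> ~{proportion_found(n):.0%} of problems")
    # With p = 0.31, five users find roughly 84% of problems, broadly consistent
    # with the ~80% figure cited from Nielsen (2000b).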

Another benefit of usability testing is the opportunity to learn about the culture of the intended data users. By watching the target audience for the website perform usability tests, their rules, habits, behaviors, values, beliefs, and attitudes become observable (Spillers 2009). This information can then be applied to future products generated by a research group or organization.

USABILITY TESTING.

The usability of a site is typically evaluated through a formal process called usability testing (Nielsen 2000b; Krug 2005; U.S. Department of Health and Human Services 2014). During a usability test, participants who have been chosen based on some criteria (e.g., users of climate data) are asked to perform specified tasks with little guidance under controlled conditions while one or more facilitators observe. Tests are often recorded for later viewing and analysis. To obtain the skills necessary to perform usability testing, the authors attended a workshop hosted by usability consultants Nielsen-Norman Group (www.nngroup.com). The workshop covered the basics of creating a usable website, developing a space for usability testing, facilitating the testing to achieve meaningful results, interpreting test results, and incorporating results into site design. Beyond attending a workshop, many texts and online resources provide support on how to conduct usability testing (e.g., Krug 2005; Krug 2009; U.S. Department of Health and Human Services 2014).

SITE TESTED: SCENIC.

To investigate how users interact with weather and climate data, we tested a website under development: Southwest Climate and Environmental Information Collaborative (SCENIC; http://wrcc.dri.edu/csc/scenic/; see Fig. 1). SCENIC is designed to serve scientists working for the Department of the Interior Southwest Climate Science Center (SW-CSC) and other such climate science centers. These scientists typically work in the fields of ecology, hydrology, forestry, or resource management. SCENIC acts as an interface to the Applied Climate Information System (ACIS) database, which contains daily climate data for the United States from many networks. Resources available in SCENIC are focused on the Southwest United States though data are available for locations throughout the nation. SCENIC has a wide variety of data acquisition and analysis tools for both gridded and station-based data.

Fig. 1. Home page of SCENIC, the website assessed in this study.

CREATING A USABILITY LABORATORY.

A formal usability test should take place in a usability laboratory. These laboratories may be extremely complex, such that the test takes place in an isolated room while the design team watches remotely via closed-circuit television. Eye or mouse tracking and screen recording software may be utilized as well. We opted for a simple laboratory: a small conference room for a quiet space, equipped with a computer, a full-size screen, a mouse, and a keyboard. We used Camtasia (by TechSmith; www.techsmith.com/camtasia.html) screen recording software and a microphone to record the screen movements and verbalizations of the study participants. For comfort and ease of use, test subjects were able to work on either a Mac or Windows operating system with the web browser of their choice. Though it is unnatural for a person to work on a computer while being observed, the goal is to make participants as comfortable as possible during the test so they will act as they normally would when using a website and provide realistic feedback on the site’s usability.

During usability testing, a facilitator guides the participant through each of the tasks. The facilitator does not answer questions about the site or assist the participant in completing the tasks; the facilitator serves to prompt participants to verbalize their thought processes as they work through each task. The facilitator takes detailed notes as subjects complete the tasks in each section. The notes as well as the video recordings are later reviewed to assess functions of the site that exhibit or lack usability. Clips created from the video taken during this testing can be viewed online (www.dri.edu/scenic-usability-research).

SELECTING AND RECRUITING TEST PARTICIPANTS.

In general, the usability literature suggests that five users will uncover most of the usability issues in a site (Virzi 1992; Nielsen 2000b; Krug 2005; U.S. Department of Health and Human Services 2014). However, Faulkner (2003) points out that five users do not uncover a majority of the issues in all cases; her work states that 10 users will uncover 82% of usability problems at a minimum and 95% of problems on average. We chose to test five users in each of two rounds to ensure that we uncovered a large majority of usability issues under all of the approaches in the aforementioned literature. The usability literature does not recommend any specific number of iterations of testing, though a new round of testing is suggested after major updates to a site (Nielsen 2000b; Krug 2009). We first performed an experimental round of testing utilizing five graduate students in natural resources, hydrology, and geography from the University of Nevada, Reno (UNR). This allowed us to develop our facilitation methods, work out any recording software issues, and refine our general questions about climate data before utilizing professionals. The results of this preliminary testing are not included here except for reporting on the card-sorting activity, for which 15 participants are recommended to achieve meaningful results (Nielsen 2004; Tullis and Wood 2004; U.S. Department of Health and Human Services 2014).

Usability testing yields the best results when the subjects chosen represent the target user group (Nielsen 2003; Krug 2005; U.S. Department of Health and Human Services 2014). As SCENIC is intended to serve SW-CSC scientists, we sought out people working in resource management and ecology who utilize climate data in their work to participate in the study. Our group of participants came from private, state, and federal agencies, including independent consulting firms, the Bureau of Land Management, the Nevada Department of Wildlife, the Great Basin Landscape Conservation Cooperative, SW-CSC, UNR, and the Desert Research Institute.

All participants were informed that they were being recorded and gave their verbal consent to participate in the study. Where regulations allowed, users were compensated for their time with a gift card as suggested in the literature to improve the quality of the participant’s involvement in the testing (Nielsen 2003; Krug 2005; U.S. Department of Health and Human Services 2014). Providing an incentive for participation helps to ensure users are motivated to perform the tasks.

DESIGNING TEST QUESTIONS.

We used both qualitative and quantitative techniques to assess participants’ ability to use SCENIC in a fluid manner. Each test was composed of three portions: a set of three tasks to complete on the website, a standardized usability test, and a set of questions relating to the general use of climate data that were not specific to the site tested (see Table 1 and Fig. 2).

Table 1. Questions used in the three portions of the SCENIC usability test. In part 3, only questions whose results are discussed in this paper are shown for brevity. Here, NRCC = Northeast Regional Climate Center. NWS = National Weather Service. NRCS = Natural Resources Conservation Service. NIFC = National Interagency Fire Center.

Fig. 2. Examples of figures shown to participants in part 3, questions 3 and 4, in the SCENIC usability test.

We devised three tasks we expected target users to be able to perform on SCENIC, as Nielsen (2000a) recommends designing a site around the top three reasons a user would visit it. The tasks were based on the common types of questions asked of WRCC’s service climatologists by members of the target audience. The tasks are given in Table 1. Each task utilizes different capabilities on the site and is achieved through a different set of steps to provide breadth in covering the site’s usability issues. Assessment of this portion of the test was qualitative, based on the fluidity with which users completed each task and their commentary about the site as they used it.

The System Usability Scale (SUS) is a widely used and reliable tool for measuring the ease of use of a product. It produces valid results with small sample sizes, making it an applicable quantitative tool for this usability evaluation (Brooke 1996; Bangor et al. 2009). The SUS test should be administered immediately after the web-based tasks and before any posttest discussion takes place. SUS is a 10-item questionnaire with five response options presented on a Likert-type scale from strongly agree to strongly disagree. The SUS test is summarized in Table 1. Half of the questions (1, 3, 5, 7, and 9) are phrased such that they describe the site being evaluated in a positive way and half (2, 4, 6, 8, and 10) portray the site negatively. This design prevents biased responses caused by testers choosing an answer without considering each statement (Brooke 2013). An SUS questionnaire is scored by doing the following:

  • Each response is valued between one and five points.

  • For odd-numbered items, subtract 1 from the user response.

  • For even-numbered items, subtract the user response from 5. This scales all values from 0 to 4, with 4 being the most positive response.

  • Add up these adjusted responses for each user and multiply the total by 2.5. This converts the range of possible totals from 0–40 to 0–100.

Although the scores range from 0 to 100, they should not be interpreted as percentages. Instead, an SUS score should be thought of as a percentile ranking based on scores from a large number of studies. The SUS score curve and percentile rankings (Fig. 3) are based on SUS testing performed on over 5,000 websites. An SUS score above 68 is considered above average (Sauro 2011).
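
To make the arithmetic concrete, the scoring steps above can be written as a short function. This is a minimal sketch rather than software used in the study, and the example responses are invented purely to illustrate the calculation.

    # Minimal sketch of the SUS scoring steps described above. Each of the 10
    # responses is an integer from 1 (strongly disagree) to 5 (strongly agree);
    # odd-numbered items are worded positively, even-numbered items negatively.

    def sus_score(responses: list[int]) -> float:
        """Convert 10 raw SUS responses (1-5) to a 0-100 SUS score."""
        if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
            raise ValueError("SUS expects 10 responses, each between 1 and 5")
        adjusted = []
        for item, r in enumerate(responses, start=1):
            # Odd items: subtract 1; even items: subtract from 5. Either way
            # the adjusted value runs 0-4, with 4 the most favorable response.
            adjusted.append(r - 1 if item % 2 == 1 else 5 - r)
        return sum(adjusted) * 2.5  # scale the 0-40 total to 0-100


    # Invented example: a mildly favorable set of responses scores 67.5,
    # just under the 68 benchmark noted by Sauro (2011).
    example = [4, 2, 4, 2, 3, 3, 4, 2, 4, 3]
    print(sus_score(example))  # 67.5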

Fig. 3. Percentile ranks associated with SUS scores and “letter grades” for different areas along the scale following the standard U.S. A–F grading scale. Scores from each round of testing are displayed as vertical lines on the graph. [Figure adapted from Sauro (2011).]

The climate data questions (summarized in Table 1, part 3) asked in this study stemmed from challenges we had internally in naming various items on SCENIC and other sites, as well as a general curiosity about how people perceive and search for climate data. Questions 1–4 address naming conventions for various products generated from climate data. Question 5, the card-sorting activity, assesses how our participants search for climate data by having them order cards, each listing an aspect of climate data, from least to most important (Table 1, part 3, question 6). The last two questions allow users to evaluate SCENIC and provide detailed feedback. In covering the last two questions, we also explained to participants our intended method of answering any of the questions the users struggled with in the first portion of the test. With the exception of the card activity, answers were taken qualitatively and used as anecdotal information rather than concrete research findings, as the sample size of participants (n = 10) was not large enough to produce statistically significant results.

CONDUCTING USABILITY TESTS.

Only one to three testers were assessed each day. The time between tests was used to remove any bugs in the system, in this case referring to errors in code that cause the site to break or perform in a way not anticipated by the developer. This helped to keep the focus of the tests on the design rather than having subjects repeatedly encounter the same bug. One example while testing SCENIC was a Chrome browser issue that inserted a drop-down menu arrow into any form element that had an autofill option. The first participant to test on Chrome treated the field as a drop-down menu rather than using the autofill option and could not move forward on the task. We viewed this as a Chrome browser issue rather than part of SCENIC’s design, so the arrow was removed between testers within the first round of testing. After removing the arrow, subsequent participants easily utilized the autofill option. Major changes to the site design were made after the first round of testing such that new issues might be uncovered in the second round (Krug 2005).

LESSONS FROM USABILITY TESTING.

General usability guidelines from e-commerce, HCI.

In researching usability, we found a variety of general recommendations for usable web design that we sought to incorporate into SCENIC. The general purpose of following these recommendations is to reduce the cognitive load on the user (Krug 2005). These guidelines do not necessarily relate to the provision of atmospheric data in particular, though we feel they are valuable enough to be listed here. Where applicable, examples from our usability testing are given and video clips are available online (www.dri.edu/scenic-usability-research).

  • Adhere to web conventions—Web page standards include a navigation menu along the top of the page, a search bar near the top, and links presented in a recognizable style (Nielsen 2000a; Krug 2005). In an early version of SCENIC, participants clicked on what appeared to be a link but merely displayed additional text on the same page. Because this confused participants, the text was restyled in a different style and color so it would not be mistaken for a link.

  • Be consistent within a set of pages—The same layout should be maintained from page to page with similar text styling and form layout. This will enable the user to quickly learn how to navigate and use a site (U.S. Department of Health and Human Services 2014).

  • Anticipate that text will not be read—Though it is tempting to provide detailed information and directions to the user, Krug (2005) suggests that people tend to “muddle through” a site rather than read instructive texts. Brief titles and concise labels will be read, but anything more than a few words will likely be overlooked.

  • Provide help texts—We found that after participants had muddled through and failed at accomplishing a task, they were receptive to reading help texts. Make the help text easy to find, for example with a question mark or information “i” symbol, and ensure the information provided is clear and concise. Our testing revealed that participants who read help texts found that the text answered their questions and were likely to utilize help texts in later tasks as well.

  • Reduce options when possible—When presenting the user with a form element, hide options until the user indicates they are needed. Otherwise, the user has to scan and consider all options, increasing cognitive load (Krug 2005). An example of this in SCENIC is that the options for an output file name and delimiter are hidden until the user indicates a preference for outputting data to a file rather than to the screen.

  • Make labels clear and meaningful—Buttons and navigation menus should be labeled with meaningful terms (Nielsen 2000a; Krug 2005). SCENIC’s station finder tool displays stations that meet user-specified criteria on a map. The button to show stations meeting the criteria was labeled “submit.” Two participants in the first round of testing were confused when they made their selections and hit “submit” but did not receive data. Changing the submit action button to “show stations” eliminated this issue in later testing. Two participants in the second round repeated aloud, “show stations,” suggesting they were processing the outcome of clicking the button. Several other buttons were also relabeled to clarify what clicking the button provides, such as “get data” for a data listing option rather than “submit.”

How did users rate the site?

The average SUS score in the first round of testing was 63, placing the site at a percentile rank of approximately 35% (Fig. 3). Several changes were implemented after the first round of testing to fix bugs as well as usability issues. Scores from the second round of testing increased to 67.5, which falls just below the average of 68, at a percentile rank of approximately 50% (Fig. 3). This indicates the usability of the site increased from the first to the second round of testing, but there is still much room for improvement. To ascribe an adjective to SCENIC, our participants collectively found the site to be in the “OK” to “good” range (Fig. 4).

Fig. 4. Adjectives describing a site associated with various SUS scores. Mean SUS score ratings and error bars ±1 standard error of the mean. [Source: Bangor et al. 2009.]

Figure 5 shows the overall SUS test scores from each participant for the two rounds of testing. Scores in each round were comparable, with the scores in the second round being slightly higher overall. There are no notable outliers in the dataset that would significantly affect the overall score for each test.

Fig. 5. Normalized SUS scores by participant for the first and second rounds of testing on SCENIC. Note that unique participants are used in each round.

Figure 6 shows the adjusted scores from each question on the SUS assessment. Questions 6, 9, and 10 stood out as showing considerable improvement. Question 6 focuses on consistency between pages. Between rounds 1 and 2 we removed quite a bit of clutter from the pages, adjusted the layout, and made the text labels more instructive. We hypothesize these changes led to the increase in this score. Questions 9 and 10 relate to the participant’s confidence in using the site. This suggests the changes in labeling and improvements in help texts helped increase participants’ confidence in using these pages.

Fig. 6. Scores from rounds 1 and 2 of testing by question. Scores for each question are out of a total of 20 points. Higher scores imply more favorable responses about the site. Questions refer to those in Table 1, part 2.

A different group of participants tested the site in each round, and the site was modified from the first to the second round. It is possible that removal of some usability issues in the first round allowed users to access other challenges in the second round. This, and the characteristics of the individual users in each group, may help to explain why some scores increased and some decreased on each question between rounds.

How people search for data.

The results of the card-ordering activity (summarized in Table 2) revealed that people search for climate information in different ways, though with some consistency. Sixty percent of participants rated “where,” the location of the data, as the first and most important thing they search for when acquiring climate data. In our participant group, 73% ranked the source of the data as least important when accessing climate data. This raises some concern, as many in the climate services community feel, from their deep familiarity with data systems, that data source is very important. One participant commented, “I generally trust that the data I am getting is of quality. I may run [quality control] on it myself anyway, so I am not really concerned about the source, just getting the data.” Between these two extremes, responses were fairly evenly spread across “when,” “what,” and “type,” in that order, with only one or two votes determining the rank. Several participants indicated that their responses might vary depending on the project, so we asked them to focus on a current or recent project. Many climate data provision agencies, such as WRCC, provide data organized by network. Our results indicate that source (network) ranks as least important and location as most important to our study participants. It follows that it would be most useful to our target audience to allow data to be selected first by region and then by time period. These results are incorporated into SCENIC by offering the spatial option first and foremost with “station finder” map tools. In part 1, task 2, participants were asked to find the record high March temperature at Winnemucca Municipal Airport, in Humboldt County, Nevada. The number of Winnemucca airport entries in the station finder table (one entry for each of several network memberships) puzzled the first user. We updated the site such that each unique station name had a single entry with its networks grouped together. Subsequent users who utilized the station finder table were able to quickly locate the Winnemucca airport station.

Table 2. Results of card-sorting activity. Fifteen participants were asked to perform the activity, with one abstaining, for a total n = 14. Two participants assigned two cards equal weight; thus, a single card may be counted in two categories for an individual participant. Values given are percent of total cards, with the number of cards shown in parentheses for each rank and category. The results row gives the final ranking of each card from most important on the left (where) to least important on the right (source). The highest-ranking value in column 3 (when) was already the highest ranking in column 2; thus, the second-highest value in the “what” column is given as the highest ranking for column 3. Table entries in bold indicate the card label that was ranked most frequently in the corresponding column (level of importance to the user).
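
To illustrate how such a table can be assembled, the sketch below tallies card-sort orderings into rank frequencies of the kind reported in Table 2. The participant orderings shown are hypothetical placeholders, not the study’s actual responses.

    # Hypothetical sketch of aggregating card-sort results into rank frequencies
    # (in the spirit of Table 2). The orderings below are made up for illustration;
    # each participant ranks the cards from most important (rank 1) to least (rank 5).
    from collections import Counter

    CARDS = ["where", "when", "what", "type", "source"]

    rankings = {
        "participant_1": ["where", "when", "what", "type", "source"],
        "participant_2": ["where", "what", "when", "type", "source"],
        "participant_3": ["when", "where", "type", "what", "source"],
    }

    # counts[rank] maps each card to how many participants placed it at that rank
    counts = {rank: Counter() for rank in range(1, len(CARDS) + 1)}
    for order in rankings.values():
        for rank, card in enumerate(order, start=1):
            counts[rank][card] += 1

    n = len(rankings)
    for rank in sorted(counts):
        row = ", ".join(f"{card} {100 * counts[rank][card] / n:.0f}%" for card in CARDS)
        print(f"rank {rank}: {row}")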

Challenges in labeling.

Comments from our test participants as well as prior experience in delivering climate services via the web at WRCC show that the labeling of links and items on pages providing weather and climate data is one of the greatest challenges to usability. Questions asked to explore labeling and terminology are given in Table 1, part 3. The terms we explore include modeled data, gridded data, tool, product, time series, anomaly map, raw data, and data analysis.

We struggled with the decision of how to title links to gridded data products (data generated by a model and put on a grid) provided through SCENIC. Possible terms included “gridded data,” “modeled data,” or “gridded/modeled data.” Several participants indicated the term "modeled data" was confusing to them and they were not sure what to expect if they were to click on it. One of the most useful responses was, “modeled data is not a very informative term; gridded gives me more useful information about the data.” In light of these responses, we selected the term “gridded data” to use after the first round of testing rather than “gridded/modeled.” All 10 of the participants were able to complete the gridded data question without confusion as to how to access the data, showing the term "gridded data" is a useful indicator.

“Climate anomaly map” and “time series graph” are two terms commonly used to describe graphics that depict climate. All participants readily agreed to "time series graph" as an adequate term and, with some hesitation, were unanimous in their agreement on the use of "climate anomaly maps" as well. These terms were incorporated into SCENIC where appropriate.

“Tool” and “product” are terms commonly used on sites providing weather and climate information [at the time of this writing, regional climate centers (www.ncdc.noaa.gov/customer-support/partnerships/regional-climate-centers), the National Integrated Drought Information System (www.drought.gov/drought/), and the National Climatic Data Center (www.ncdc.noaa.gov), to name a few]. All participants were in agreement that a tool allows the user to perform some sort of action or analysis using data, while a product is static. In essence, a tool creates a product, though only a few participants drew this conclusion. In spite of the general agreement on these terms, using “data tools” on SCENIC did not yield the desired results, leading us to look for a better phrase to guide people to tools that can be used to analyze and summarize the data.

The question on “raw data” yielded a variety of answers. Several participants viewed raw data as a list of data that they could download from a site to use in analyses. They assumed it had already had a quality control (QC) process applied; “raw” implied it was not an average or summary of any sort. Other participants viewed raw data as what came directly from the sensor (as is the standard terminology in climate services) and that may have numerous errors and other issues requiring clean up. We opted to use the term “data lister” because of the differences in the user responses and to be consistent with the phrasing on other WRCC pages.

The greatest challenge test participants experienced in the three tasks we posed was efficiently completing web-based task 2, finding the highest temperature ever recorded in March at the airport in Winnemucca. All 10 participants first went to the “historic station data lister” and listed maximum temperature for the station’s period of record. After listing the daily data, they realized that was not the right way to answer the question and began to search for other options, eventually arriving at the data analysis tools. In the first round of testing, we intended participants to go to the navigation tab, labeled “station data tools,” where several tools were available that would allow them to answer the question. As this labeling did not prompt participants to click on it, we changed the navigation tab to “data analysis” for the second round of testing. Unfortunately, participants were still not motivated to click on this link to complete the task that required data analysis. We remain challenged to find the best way to prompt people to utilize the variety of analyses we have provided. Interestingly, half of the participants said that when they got to the point of listing the period-of-record maximum temperature data, they would not have continued to look for analysis tools. They would have pulled the data into analysis software (such as MATLAB or Excel) to obtain the maximum March temperature. These participants said they preferred to do things in this manner, as they may need the data later for other applications. They stated that the analysis tools were “neat” and “good for quick answers.” The result of this piece of the test raises two questions: Does our target audience want analysis tools? If so, how do we advertise the tools and let it be known that they are available?

CONCLUSIONS.

Watching target users complete tasks on SCENIC provided us with valuable information on how people in the target audience, researchers with the SW-CSC, use the site and allowed us to fix a number of roadblocks to usability as well as programming bugs. SUS scores rose from 63 in the first round of testing to 67.5 in the second round, indicating some level of improvement to the site. These scores fall in the “average” range for a website (Fig. 4), indicating there is still considerable progress to be made. We found that while usability testing uncovers usability issues on a site, it is not always clear how to modify the site to remedy these problems. We were not able to rectify all usability challenges in the two rounds of testing on SCENIC, though the testing made us aware that these issues exist and allows us to continue working to improve the site.

Performing this testing allowed us to interact with our target audience and ask questions that helped us decide on the naming of certain elements of the site. Though the sample size for these questions (n = 10) is not large enough to be statistically significant, it still provided us with useful insights into how our target audience perceives various terms used frequently in climate data. The card-sorting activity revealed that our participants consistently rank location of data as the most important factor when searching for data and source as the least important. This challenges climatologists with how to provide data in a streamlined manner while still making sure the user is aware of any caveats to the data (which the user may or may not be concerned with).

The challenges and lessons learned presented here are not unique to the climate data presented on SCENIC. Any atmospheric or related data (satellite, air quality, streamflow, etc.) that can be offered over the Internet can benefit from usability testing, though the labeling and terminology challenges will likely be dataset specific. In summation, we suggest the following as best practices for creating web pages that provide climate data as well as other atmospheric data:

  • Usability testing is extremely useful in building this type of site; testing should be done early in the site development process and repeated often.

  • Work should be done with representatives from the target audience rather than office mates or research team members; this yields more meaningful results.

  • The way in which your target audience looks for the atmospheric data provided should be considered and the site should be designed to meet those needs.

  • Choosing labels and names of site elements can be extremely challenging and has a significant effect on the usability of sites providing atmospheric data; terms should be tested early, borrowed from other agencies’ sites for consistency, and aligned with the respective historical terminology where acceptable.

  • Designers should adhere to the general usability guidelines described in the "General usability guidelines from e-commerce, HCI" section of this paper.

DISCUSSION.

We are by no means usability professionals and speak to the readers as fellow atmospheric scientists and programmers attempting to deliver atmospheric data and information to a target audience. From our experience performing usability testing, we highly recommend the process to any group providing atmospheric data online. The knowledge and experience gained from this research will propagate into future work and allow us to build better sites with the end user in mind. There is still much work to be done in the field of creating usable websites for the provision of atmospheric data. Some directions include

  • conducting a larger survey of how people in various audiences look for atmospheric information and how they expect data and tools to be organized;

  • working to achieve greater consistency in the terminology used on websites providing atmospheric information across agencies;

  • conducting research into how to develop effective “help” videos for atmospheric data websites; and

  • sharing results of usability testing in the atmospheric sciences community.

As more data are available online to an increasingly diverse audience, usability becomes more and more essential in the success of a website. We hope that the work presented here will encourage others in the field of atmospheric science to consider the fundamentals of usability when developing sites for accessing and exploring atmospheric data.

As described by the National Research Council (2010), Overpeck et al. (2011), and Rood and Edwards (2014), the future of informatics will be to provide data users with the information necessary to correctly interpret the data applicable to their particular question. Before we can reach this step, it is essential that we can first provide basic data and information via the Internet in a way that is easily utilized by the target audience. Achieving this step will help us move forward to supporting user interpretation of data.

ACKNOWLEDGMENTS

We thank Kelly Redmond, Marc Pitchford, David Herring, and two anonymous reviewers for their helpful feedback and comments. We would also like to thank all of our test participants for making this study possible. This project was supported by competitive award funds furnished by the Desert Research Institute’s Division of Atmospheric Sciences under its Effective Designs to Generate Enhanced Support (EDGES) program.

REFERENCES

  • Bangor, A., P. Kortum, and J. Miller, 2009: Determining what individual SUS scores mean: Adding an adjective rating scale. J. Usability Stud., 4, 114–123.

  • Brooke, J., 1996: SUS: A quick and dirty usability scale. Usability Evaluation in Industry, P. W. Jordan et al., Eds., Taylor and Francis.

  • Brooke, J., 2013: SUS: A retrospective. J. Usability Stud., 8, 29–40.

  • Brugger, J., and M. Crimmins, 2011: Weather, climate, and rural Arizona: Insights and assessment strategies. Technical Input to the U.S. National Climate Assessment, U.S. Global Climate Research Program, Washington, DC, 80 pp.

  • Dix, A., 2009: Human–computer interaction. Encyclopedia of Database Systems, L. Liu and M. T. Özsu, Eds., Springer, 1327–1331.

  • Dumas, J. S., and J. Redish, 1999: A Practical Guide to Usability Testing. Intellect Books, 404 pp.

  • Faulkner, L., 2003: Beyond the five-user assumption: Benefits of increased sample sizes in usability testing. Behav. Res. Methods Instrum. Comput., 35, 379–383, doi:10.3758/BF03195514.

  • International Standards Organization, 1998: Ergonomic requirements for office work with visual display terminals—Part 11: Guidance on usability. ISO 9241, 22 pp. [Available online at www.iso.org/iso/catalogue_detail.htm?csnumber=16883.]

  • Krug, S., 2005: Don’t Make Me Think: A Practical Guide to Web Usability. New Riders Publishing, 195 pp.

  • Krug, S., 2009: Rocket Surgery Made Easy: The Do-It-Yourself Guide to Finding and Fixing Usability Problems. New Riders Publishing, 168 pp.

  • National Research Council, 2010: Informing an Effective Response to Climate Change. The National Academies Press, 348 pp.

  • Nielsen, J., 1997: Loyalty on the Web. Alertbox Newsletter, accessed 1 May 2012. [Available online at www.nngroup.com/articles/loyalty-on-the-web/.]

  • Nielsen, J., 2000a: Designing Web Usability. New Riders Publishing, 419 pp.

  • Nielsen, J., 2000b: Why you only need to test with 5 users. Alertbox Newsletter, accessed 1 May 2012. [Available online at www.nngroup.com/articles/why-you-only-need-to-test-with-5-users/.]

  • Nielsen, J., 2003: Recruiting test participants for usability studies. Alertbox Newsletter, accessed 1 May 2012. [Available online at www.nngroup.com/articles/recruiting-test-participants-for-usability-studies/.]

  • Nielsen, J., 2004: Card sorting: How many users to test. Alertbox Newsletter, accessed 1 May 2012. [Available online at www.nngroup.com/articles/card-sorting-how-many-users-to-test/.]

  • Nielsen, J., 2011: How long do users stay on web pages? Alertbox Newsletter, accessed 1 May 2012. [Available online at www.nngroup.com/articles/how-long-do-users-stay-on-webpages/.]

  • Overpeck, J. T., G. A. Meehl, S. Bony, and D. R. Easterling, 2011: Climate data challenges in the 21st century. Science, 331, 700–702, doi:10.1126/science.1197869.

  • Preece, J., Y. Rogers, H. Sharp, D. Benyon, S. Holland, and T. Carey, 1994: Human Computer Interaction. Addison-Wesley Longman, 773 pp.

  • Rood, R., and P. Edwards, 2014: Climate informatics: Human experts and the end-to-end system. Earthzine, accessed 1 October 2014. [Available online at www.earthzine.org/2014/05/22/climate-informatics-human-experts-and-the-end-to-end-system/.]

  • Sauro, J., 2011: A Practical Guide to the System Usability Scale: Background, Benchmarks and Best Practices. Measuring Usability LLC, 162 pp.

  • Spillers, F., 2009: Usability testing tips. Usability Testing Central, accessed 1 May 2012. [Available online at www.usabilitytestingcentral.com/usability_testing_tips/.]

  • Tullis, T., and L. Wood, 2004: How many users are enough for a card-sorting study. Proc. Usability Professionals’ Association, Minneapolis, Minnesota, UPA, 9 pp. [Available online at http://home.comcast.net/∼tomtullis/publications/UPA2004CardSorting.pdf.]

  • U.S. Department of Health and Human Services, 2014: What and why of usability; user research basics. U.S. Department of Health and Human Services, accessed 10 January 2012. [Available online at www.usability.gov/.]

  • Virzi, R. A., 1992: Refining the test phase of usability evaluation: How many subjects is enough? Hum. Factors, 34, 457–468.
