We assess the representation of multiday temperature and rainfall extremes in southeast Australia in three coupled general circulation models (GCMs) of varying resolution. We evaluate the statistics of the modeled extremes in terms of their frequency, duration, and magnitude compared to observations, and the model representation of the midtropospheric circulation (synoptic and large scale) associated with the extremes. We find that the models capture the statistics of observed heatwaves reasonably well, though some models are “too wet” to adequately capture the observed duration of dry spells but not always wet enough to capture the magnitude of extreme wet events. Despite the inability of the models to simulate all extreme event statistics, the process evaluation indicates that the onset and decay of the observed synoptic structures are well simulated in the models, including for wet and dry extremes. We also show that the large-scale wave train structures associated with the observed extremes are reasonably well simulated by the models, although their broader onset and decay are not always captured. The results presented here provide some context for, and confidence in, the use of the coupled GCMs in climate prediction and projection studies for regional extremes.
General circulation models (GCMs) facilitate our ability to predict the weather and climate and to project the potential impacts of anthropogenic climate change. The use of GCMs for these purposes is predicated upon their adequate simulation of the climate system. As such, studies have assessed how well GCMs represent various aspects of the climate. A particular focus has been on the representation of single to multiday extremes in GCMs given their large socioeconomic impacts (Screen and Simmonds 2014). In Australia, for example, multiday high temperature extremes (i.e., heatwaves) are described as the deadliest natural disaster due to the number of associated deaths (Coates et al. 2014; Gasparrini et al. 2015). Additionally, past floods, resulting from extreme multiday rainfall events, have cost Australia billions of dollars in infrastructure damage and agricultural losses (Johnson et al. 2016). Beyond their multiday impacts, extremes at this time scale also influence seasonal and annual climate anomalies (Zhai et al. 2005; Grose et al. 2012; Davies 2015). For example, Zhai et al. (2005) found extreme daily rainfall events to account for a large proportion of annual rainfall across China. There is thus a clear need for GCMs to adequately simulate these types of events.
On this front, several studies have focused on the fidelity of GCMs in representing multiday rainfall and temperature extreme indices (e.g., Kharin et al. 2007, 2013; Sillmann et al. 2013). In general, these studies have found that GCMs represent temperature extremes better than rainfall extremes, though regional biases exist. For example, Sillmann et al. (2013) assessed the performance of GCMs in phase 5 of the Coupled Model Intercomparison Project (CMIP5) for their ability to simulate extreme climate indices defined by the Expert Team on Climate Change Detection and Indices (ETCCDI). The ETCCDI cover a range of extreme events from single to multiday rainfall and temperature extremes, providing an indication of the frequency, magnitude, and duration of these extremes. On the global and hemispheric scale, Sillmann et al. (2013) found closer agreement between the CMIP5 GCMs and reference reanalyses for temperature indices and less so for rainfall indices. Kharin et al. (2007, 2013) assessed the representation of present-day return values of global temperature and rainfall extremes in a suite of coupled GCMs contributing to CMIP3 and CMIP5 and found that global warm extremes are generally better simulated than cold extremes, while rainfall extremes in the extratropics are better simulated than rainfall extremes in the tropics. Again on a global scale, Lorenz et al. (2014) assessed the ETCCDI in the Australian Community Climate and Earth System Simulator 1.3b model (ACCESS1.3b), finding good representation of percentile-based temperature indices and variable representation of rainfall indices. Both Jiang et al. (2015) and Ou et al. (2013) found that there are regional biases in the representation of extreme rainfall indices over China in CMIP5 models. White et al. (2013) assessed dynamically downscaled climate simulations from six CMIP3 models for their ability to capture a subset of ETCCDI extreme indices for Tasmania, finding high skill in the temperature extremes but some variability in the GCM representation of wet and dry rainfall extreme indices.
Beyond the statistics of extremes, some studies have focused on assessing how well GCMs simulate the atmospheric processes associated with weather and climate extremes. Process evaluation provides insight into why a model is producing the right (or wrong) answers (Sillmann et al. 2017; Maloney et al. 2019), such as why a GCM might or might not be simulating the correct frequency and magnitude of heatwaves. Undertaking process evaluation of a GCM requires an understanding of the relevant processes associated with the extremes of interest in the real world. We have a particular interest in multiday rainfall and temperature extremes in southeast Australia (i.e., Southern Hemisphere extratropics; Fig. 1). Extremes of this type in this region have been associated with quasistationary atmospheric wave trains (e.g., “trains” of alternating pressure and wind anomalies). For example, in the observational record, Tozer et al. (2018) found autumn wet and dry extremes in western and eastern Tasmania (Fig. 1) to be associated with coherent wave train structures in the polar jet waveguide. The events are characterized by the presence of an intense low pressure system (in the case of wet extremes) or block (dry extremes) over Tasmania, which forms part of a broader, hemisphere-wide wave train structure (Tozer et al. 2018).
Several observations-based studies have shown that heatwaves and frosts at different locations across southern Australia are similarly associated with wave train structures in the polar jet (Pezza et al. 2012; Parker et al. 2014a; Risbey et al. 2018, 2019). In particular, Risbey et al. (2018) found summer and spring heatwaves in southeast Australia to be associated with a blocking high pressure system over the region, which forms part of a large-scale wave train. The wave trains typically establish in the Indian Ocean and propagate eastward to the Pacific Ocean sector as the heatwave events progress.
Studies that have assessed the representation of wave trains in GCMs have typically focused on the Northern Hemisphere, with varying results in regard to the GCM ability to simulate these structures. For example, Loikith and Broccoli (2015) assessed historical simulations from a subset of CMIP5 models for their ability to simulate the circulation associated with extreme temperature days over North America, finding the majority of models able to capture the wave trains associated with these events. Teng and Branstator (2017) also show that the large-scale wave trains associated with droughts in California are well represented in the coupled Community Earth System Model (CESM1; Kay et al. 2015) preindustrial control run. Deng et al. (2018) assessed an ensemble member from the Geophysical Fluid Dynamics Laboratory’s (GFDL) global high-resolution atmosphere model (HiRAM) historical simulation and found that it fails to simulate all components of wave trains associated with heatwaves in Eurasia. Beverley et al. (2018) assessed seasonal hindcasts from the European Centre for Medium-Range Weather Forecasts (ECMWF) Integrated Forecasting System model and found that a recurring circumglobal wave train pattern, responsible for modulating weather extremes in the Northern Hemisphere (Branstator 2002; Ding and Wang 2005), was not well represented in the hindcasts. Quinting and Vitart (2019) evaluated the representation of wave trains in the Northern Hemisphere in all GCMs in the Subseasonal to Seasonal Prediction Project Database (Vitart et al. 2017). The database provides hindcast and forecast data from a number of GCMs covering a range of time periods, lead times, and varying number of ensemble members. 
Quinting and Vitart (2019) used a Rossby wave packet tracking method and found that the general characteristics of the wave trains (e.g., frequency, lifetime) are reasonably well represented in all models, though the decay of the waves was poorly resolved in some regions, which they suggest is related to a bias in blocking frequency present in the models.
The representation of individual blocks (i.e., synoptic-scale structures), associated with both regional rainfall and temperature extremes, and which may or may not form part of a broader wave train structure, has also been a focus of process-based GCM fidelity studies. These studies suggest that blocking is currently poorly represented in GCMs, but increased resolution may improve this result (e.g., Scaife et al. 2010; Berckmans et al. 2013; Davini and D’Andrea 2016; Woollings et al. 2018). More specifically, Patterson et al. (2019) assessed several CMIP5 models and found seasonal biases in the simulation of blocking events in the Southern Hemisphere, with too many blocks simulated in winter in some models and too few in summer.
Others have assessed the representation of extratropical cyclones, associated with extreme rainfall, in GCMs. Catto et al. (2010) assessed 100 of the most intense Northern Hemisphere extratropical cyclones occurring in boreal winter within a 50-yr control run of the coupled high-resolution global environment model (HiGEM; Shaffrey et al. 2009). The authors found that the model represents their life cycle and structure well. Conversely, in their assessment of Northern Hemisphere extratropical cyclones in 10 winters in the Goddard Institute for Space Studies (GISS) GCM control run, Bauer and Del Genio (2006) found some deficiencies in the model representation of these structures in that it produced fewer and weaker cyclones than observed.
Our review of literature on the representation of daily to multiday rainfall and temperature extremes in GCMs has revealed the following:
Assessments of these types of extremes in GCMs tend to focus on either the representation of the statistics of the extremes (i.e., extreme indices) or the simulation of the large-scale (i.e., wave trains) and synoptic structures (e.g., blocks, extratropical cyclones) associated with the extremes, not both.
There is limited assessment of the ability of GCMs to simulate the onset and decay of extreme events.
Studies tend to focus on assessing the GCM representation of Northern Hemisphere extremes.
Of the studies focused on extreme indices, we summarize that extreme temperature indices tend to be better represented in GCMs than extreme rainfall indices. Of the process-based studies reviewed, there is some variability in the ability of GCMs to simulate wave train and synoptic structures. This may be driven by the different models, methods, and regions assessed and the types of data used (e.g., control runs vs hindcasts).
We differentiate our study from those discussed above with our Southern Hemisphere and, more specifically, southeast Australian focus. Given that GCMs may have regional biases (Angélil et al. 2016) there is value in undertaking region specific studies. We assess the GCM representation of multiday rainfall and temperature extremes at two locations in southeast Australia: Tasmania and western Victoria. The selection of these locations is motivated by applications in water resources (hydropower generation) and agriculture, respectively, and by the studies of Tozer et al. (2018) and Risbey et al. (2018), which provide an observational basis to this model focused study. We assess the ability of selected GCMs to simulate the frequency, magnitude, and duration of observed heatwaves and wet and dry extremes at the selected locations. We then evaluate the GCM representation of the midtropospheric synoptic and large-scale circulation associated with the identified extreme events. That is, we assess whether land surface extreme events in the models are associated with the same midtropospheric processes as the real world, at least as it is represented by gauged and reanalysis data. Importantly, we also assess the GCM representation of the onset and decay of the extreme events.
a. Observational and reanalysis data
We use reanalysis data from the Japanese 55-year Reanalysis (JRA; Kobayashi et al. 2015), which provides a four-dimensional variational analysis from 1958 to the present at a resolution of 1.25° latitude × 1.25° longitude. Details are presented in Kobayashi et al. (2015). Data from the Australian Water Availability Project (AWAP; Raupach et al. 2006; Jones et al. 2009) represent our observed dataset. AWAP provides interpolated gauged data at a grid resolution of 0.05° latitude × 0.05° longitude.
b. Model data
We use control run output from three coupled GCMs: the Australian Community Climate and Earth System Simulator ESM1.5 (ACCESS-ESM1.5), ACCESS-D, and CanESM2. The ACCESS-ESM1.5 coupled model (Law et al. 2017) is being developed for submission to CMIP6. The ocean, atmosphere, land, and sea ice components of the model are the Modular Ocean Model (MOM4.1) (Griffies 2008), U.K. Met Office Unified Model (UM7.3), Community Atmosphere Biosphere Land Exchange (CABLE), and CICE4.1, respectively. The model also includes terrestrial biogeochemistry from the Carnegie-Ames-Stanford Approach-Carbon, Nitrogen, Phosphorus model (CASA-CNP) and oceanic biogeochemistry via the Whole Ocean Model of Biogeochemistry and Trophic-Dynamics (WOMBAT). Forcings are fixed at the year 1850.
The ACCESS-D climate model is a part of the Climate Analysis Forecast Ensemble (CAFE) system (O’Kane et al. 2019) being developed by the Decadal Climate Forecasting Project at CSIRO. The model is based on the GFDL Climate Model 2.1 (CM2.1) (Delworth et al. 2006) and includes ocean (MOM4.1), atmosphere (AM2), land (LM2), and sea ice (SIS) components. Concentrations of atmospheric aerosols and radiative gases, as well as land cover, are based on 1990 conditions (O’Kane et al. 2019).
Note that for both ACCESS-D and ACCESS-ESM1.5, the ocean grid is from the ACCESS coupled model (Bi et al. 2013; Lorenz et al. 2014; Law et al. 2017), which has extra resolution in the Southern Ocean relative to the nominal MOM resolution.
CanESM2 consists of the coupled ocean–atmosphere model, CanCM4 (CanAM4 and CanOM4) (Merryfield et al. 2013), Canadian Land Surface Scheme (CLASS2.7), CanSIM1 sea ice model, ocean carbon model (CMOC), and terrestrial carbon model (CTEM). Forcings are fixed at the year 1850.
CanESM2 only had 40 years of data available for the relevant variables so we limit our analysis to a 40-yr period for each dataset. The atmospheric components of ACCESS-ESM1.5, ACCESS-D, and CanESM2 have resolutions of 1.25° × 1.88° (latitude × longitude), 2.01° × 2.50°, and 2.79° × 2.81°, respectively. Figure 1 indicates the original model grid sizes over our regions of interest. Note that we use the terms “GCM” and “model” interchangeably.
The comparison of extremes in observations and models needs to be conducted using consistent spatial averaging for both. In this study we have regridded all data to a common grid—the JRA grid (1.25° × 1.25°). We regridded the model data (i.e., from their native grid sizes given above) to the JRA grid using bilinear interpolation. We also compared a flux conservation regridding scheme (Jones 1999) and found little difference in the results. We regridded the AWAP data by spatially averaging all AWAP data to the JRA grid. The AWAP data are used as the observed reference for southeast Australian rainfall and temperature, and JRA as the observed reference for atmospheric circulation. We also include rainfall and temperature data from JRA as a secondary reference for the variables. They are not replacements for the AWAP values, but they serve as a first point of departure in comparing model-generated rainfall and temperature with the observed network. We would not expect the extremes generated in each of the GCMs to be any better than could be achieved in a reanalysis product. We selected a JRA grid box in Tasmania and western Victoria (gray boxes in Fig. 1) and extracted rainfall and maximum temperature data from the AWAP, JRA, and model datasets for these grids.
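As an illustration of the spatial averaging step, the aggregation of a fine grid such as AWAP onto a coarser grid can be sketched as below. This is a minimal sketch under stated assumptions and not the code used in this study: the function name `block_average` is hypothetical, the fine grid is assumed to tile exactly into the coarse cells (1.25°/0.05° gives a factor of 25), the mean is unweighted (no latitude area weighting), and missing values such as ocean points are simply skipped.

```python
import math


def block_average(fine, factor):
    """Average a fine 2-D grid (list of rows) onto a coarser grid.

    Each coarse cell is the unweighted mean of the factor x factor fine
    cells it contains. Missing values (None or NaN) are skipped; a coarse
    cell with no valid fine cells is returned as NaN.
    """
    n_rows, n_cols = len(fine), len(fine[0])
    if n_rows % factor or n_cols % factor:
        raise ValueError("fine grid must tile exactly into coarse cells")
    coarse = []
    for i in range(0, n_rows, factor):
        row = []
        for j in range(0, n_cols, factor):
            vals = [fine[a][b]
                    for a in range(i, i + factor)
                    for b in range(j, j + factor)
                    if fine[a][b] is not None and not math.isnan(fine[a][b])]
            row.append(sum(vals) / len(vals) if vals else float("nan"))
        coarse.append(row)
    return coarse
```

For a real 0.05° field the call would be `block_average(field, 25)`, assuming the field has first been cropped so that its edges align with the coarse grid boxes.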
The thresholds used to define extremes need to be extreme enough to be meaningful for users but also moderate enough to provide an adequate sample of events (Hamilton et al. 2012). We use similar methods to those presented in Risbey et al. (2018) and Tozer et al. (2018) to define heatwaves and wet and dry extremes, respectively, for each location. That is, heatwaves are defined as three or more consecutive days above the 90th-percentile maximum temperature, which is calculated for each calendar day based on a moving 15-day window (Perkins and Alexander 2013; Risbey et al. 2018). Extreme wet events are initially identified by selecting any number of days in a row above the 50th-percentile daily rainfall value for a particular season. The total rainfall amount is recorded for each of these events and from these totals, the 95th-percentile event rainfall amount is determined. Only events with a total rainfall amount above this value are considered wet extremes (Tozer et al. 2018). Note that Tozer et al. (2018) tested two additional methods for defining wet extremes (based on single day events and a persistence threshold). These yielded similar results for the circulation composites of the events.
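The heatwave and wet event definitions above can be sketched in code as follows. This is an illustrative sketch only, not the implementation used here or in the cited studies. All function names are hypothetical. The percentile routine uses simple linear interpolation, the 15-day window wraps within the same year at the year boundaries for brevity, and the daily rainfall percentile is taken over all days in the series supplied (the caller is assumed to pass data for a single season).

```python
def percentile(values, q):
    """q-th percentile (0-100) using linear interpolation between ranks."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def find_runs(flags, min_len=1):
    """Return (start, length) for each run of True lasting >= min_len."""
    runs, start = [], None
    for i, flag in enumerate(list(flags) + [False]):  # sentinel ends last run
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_len:
                runs.append((start, i - start))
            start = None
    return runs


def calendar_day_thresholds(tmax_by_year, q=90, window=15):
    """Percentile per calendar day, pooling a centred window over all years."""
    n_days, half = len(tmax_by_year[0]), window // 2
    thresholds = []
    for d in range(n_days):
        pool = [year[(d + k) % n_days]  # simplification: wrap within a year
                for year in tmax_by_year
                for k in range(-half, half + 1)]
        thresholds.append(percentile(pool, q))
    return thresholds


def heatwaves(tmax_by_year, thresholds, min_len=3):
    """Runs of >= min_len consecutive days above the calendar-day threshold."""
    n_days = len(thresholds)
    series = [t for year in tmax_by_year for t in year]
    hot = [t > thresholds[i % n_days] for i, t in enumerate(series)]
    return find_runs(hot, min_len)


def wet_extremes(rain, day_q=50, event_q=95):
    """Two-stage selection: runs of days above the daily percentile, then
    only those runs whose total exceeds the percentile of all run totals.
    Returns (start, length, total) for each extreme event."""
    day_thr = percentile(rain, day_q)
    runs = find_runs([r > day_thr for r in rain])
    totals = [sum(rain[s:s + n]) for s, n in runs]
    if not totals:
        return []
    event_thr = percentile(totals, event_q)
    return [(s, n, t) for (s, n), t in zip(runs, totals) if t > event_thr]
```

Note that the wet event selection places no lower bound on run length, consistent with the magnitude-only threshold described above.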
Dry spells are identified as consecutive days below the 1st-percentile daily rainfall amount, where the percentile is calculated across a moving 15-day window. We use a shifting window to account for any seasonal variability; however, it is noted that the threshold value is essentially always zero (it may increase to 0.1 mm in winter). The number of consecutive days required to be classed as a dry spell varies for each location in order to provide adequate sample sizes, consistent with Tozer et al. (2018), who used a shorter persistence criterion for the wetter western Tasmania relative to the drier eastern Tasmania. For the locations assessed here, a threshold of 8 days was selected for western Victoria (i.e., at least 8 days in a row of no rainfall). For the Tasmanian grid box, we identified very few cases of consecutive zero rainfall days, which is driven by the wetter climate in this region and also the area averaged data used, noting that Tozer et al. (2018) used gauged data in their analysis. A suitable dry spell threshold that was both adequately extreme and provided a reasonable sample size was therefore not identified for the Tasmanian grid box.
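The dry spell criterion for western Victoria can be sketched as below. This is a minimal sketch rather than the code used in this study. The function name `dry_spells` is hypothetical, and a single fixed threshold stands in for the moving-window 1st percentile (which, as noted above, is essentially always zero). Because the threshold is zero, a day is treated as dry when rainfall is at or below it.

```python
def dry_spells(rain, threshold=0.0, min_len=8):
    """Return (start, length) for each run of dry days lasting >= min_len.

    A day counts as dry when rain <= threshold. The default min_len of 8
    follows the western Victoria criterion described in the text.
    """
    spells, run = [], 0
    for i, r in enumerate(rain):
        if r <= threshold:
            run += 1
        else:
            if run >= min_len:
                spells.append((i - run, run))
            run = 0
    if run >= min_len:  # a spell that extends to the end of the record
        spells.append((len(rain) - run, run))
    return spells
```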
Heatwaves, wet events, and dry spells were identified for each location (excluding Tasmanian dry spells) for the 40-yr analysis period. We choose to present only a subset of results, however. This subset is selected where there are clear applications. For Tasmania, our application is in hydropower generation and hence rainfall extremes are of particular interest and heatwaves are of less concern. We thus present results for wet extreme events in Tasmania. For western Victoria the application is in agriculture, specifically the grains industry. Both heatwaves and rainfall extremes are of interest for this industry. We choose to present results from our analysis into heatwaves and dry spells. While wet events are similarly of relevance for this location and application we find qualitatively similar results to those presented for wet events in Tasmania.
We thus present three case studies: wet events in Tasmania and heatwaves and dry spells in western Victoria. For each case study we do the following:
Compare the distributions of daily rainfall (or temperature, depending on the extreme type) extracted at the relevant grid box for the AWAP, JRA, and GCM datasets. This gives us a first indication of the similarities (or differences) in the model representation of these variables compared to observations, where observations are represented by the AWAP dataset.
Identify and assess the statistics of the wet, dry, and heat extremes (i.e., frequency, duration, magnitude) in the AWAP, JRA, and GCM datasets. A Kolmogorov–Smirnov (K-S) test is employed to assess the statistical similarity between the JRA/GCM and AWAP distributions.
Identify and compare the atmospheric circulation associated with the extremes in AWAP using JRA circulation data, for JRA extremes using JRA circulation data, and for each of the model extremes using the respective model circulation data. The circulation is represented by geopotential height anomalies at 500 hPa (z500). We use composite analysis, which will reveal if there is a common atmospheric structure associated with these extreme events. The start day of each event is designated as “day 0” and the 10 days before and after day 0 are designated as days −10 to 10. Note that we do not show the composites for the JRA extremes with JRA circulation data because they are similar to the ones generated from the AWAP extremes.
Assess the spatial correlation between the model circulation composites relative to the AWAP–JRA-based composite (i.e., our observed combination) across day −5 to day 5 of the events to determine how similar the circulation composites are across the event life cycle. The significance of the correlation values is assessed using a Monte Carlo approach whereby we compare extreme event composites to random composite patterns produced from non–extreme event days. That is, we randomly sample, from a subset of days that does not include the extreme event days, the z500 anomaly field the same number of times as there are events. This process is undertaken 1000 times so that a distribution of mean nonevent z500 anomalies is produced for each grid point across the Southern Hemisphere. The correlation between the random model composite patterns and observed composite patterns is calculated for each sample. From the distribution of spatial correlation values the 95th-percentile correlation value is calculated for each model. The pattern correlations determined for each extreme event day are compared to this 95th-percentile value.
Further explore the temporal sequence and persistence characteristics of the composite atmospheric circulation around the observed and modeled extreme events via Hovmöller plots. We use the Monte Carlo sampling procedure described above (excluding the calculation of spatial correlations) to determine if our composite structures are notable [as in Tozer et al. (2018)].
Compare, using boxplots, how the relevant synoptic structure (e.g., cutoff low) develops across the life cycle of the extreme event in observations and the models. This analysis also provides an indication of the spread of the extreme event composite sample in the observations and models.
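The compositing and Monte Carlo significance steps in the list above can be sketched as follows. This is an illustration under stated assumptions, not the code used in this study. All function names are hypothetical, each z500 anomaly field is represented as a flattened list of grid point values, the pattern correlation is an unweighted Pearson correlation (no latitude area weighting), nonevent days are drawn without replacement within each sample, and the 95th percentile is taken as a simple empirical rank.

```python
import random


def composite(fields):
    """Point-wise mean of equally shaped, flattened anomaly fields."""
    n = len(fields)
    return [sum(vals) / n for vals in zip(*fields)]


def pattern_correlation(a, b):
    """Unweighted Pearson correlation between two flattened fields."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def null_correlation_threshold(nonevent_fields, obs_composite, n_events,
                               n_samples=1000, q=95, seed=0):
    """Empirical q-th percentile of correlations between random nonevent
    composites (each built from n_events fields) and the observed composite."""
    rng = random.Random(seed)
    corrs = []
    for _ in range(n_samples):
        sample = rng.sample(nonevent_fields, n_events)  # without replacement
        corrs.append(pattern_correlation(composite(sample), obs_composite))
    corrs.sort()
    return corrs[min(len(corrs) - 1, int(len(corrs) * q / 100))]
```

A model composite for a given event day would then be judged significant when its `pattern_correlation` with the observed composite exceeds the value returned by `null_correlation_threshold` for that model.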
4. Wet events in Tasmania
a. The statistics of wet events in Tasmania
We begin by comparing the distributions of daily rainfall across each month in the AWAP, JRA, and model datasets for the Tasmanian grid box (Fig. 2a). The observations indicate a wetter and more variable period from April to October relative to reduced rainfall and variability during the November–March period. The observed rainfall seasonal cycle is simulated well by the JRA and model datasets, although there are some notable differences during the wetter months (April–October). During this period, ACCESS-D shows variability similar to AWAP, but has higher median rainfall. The ACCESS-ESM1.5 and CanESM2 models have similar median rainfall to AWAP but far reduced variability. JRA tends to have both lower median rainfall and reduced variability relative to AWAP.
We compare the number, duration, and magnitude of extreme wet events for all datasets in Figs. 2b and 2c. First, all datasets record a similar frequency of wet events over the 40-yr analysis period (bracketed numbers). For wet event duration (Fig. 2b), the ACCESS-ESM1.5 and ACCESS-D distributions are similar to observed (p value >0.05). This is a notable result given that no persistence threshold was used to define extreme wet events (the threshold was based on magnitude only). One outlier appears to be the JRA dataset, which has shorter wet events with most events 4–5 days in length and a maximum event duration of 13 days. For the other datasets, most events range from 3 to 11 days with maximum event durations of 24, 20, 18, and 22 days for the AWAP, ACCESS-ESM1.5, ACCESS-D, and CanESM2 datasets, respectively.
For observed wet event magnitude (Fig. 2c) most events range between 40 and 140 mm, though some exceed 200 mm, noting that summer wet events are likely to have reduced magnitude relative to winter events. The models underestimate the observed range, with modeled wet extremes typically ranging from 40 to 80 mm in magnitude. The distributions for JRA and all models are significantly different to the observed distribution at the 5% level (p value <0.05).
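For readers unfamiliar with the K-S comparison reported for Figs. 2b and 2c, a two-sample version of the test can be sketched as below. This is a minimal sketch and not necessarily the implementation used in this study. The function name `ks_two_sample` is hypothetical, and the p value uses a standard asymptotic approximation with a small-sample correction to the effective sample size. In practice a library routine such as `scipy.stats.ks_2samp` would typically be used instead.

```python
import math


def ks_two_sample(x, y):
    """Two-sample K-S statistic D and an asymptotic two-sided p value."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    i = j = 0
    d = 0.0
    # Walk both sorted samples, tracking the largest ECDF difference.
    while i < n and j < m:
        v = min(xs[i], ys[j])
        while i < n and xs[i] <= v:
            i += 1
        while j < m and ys[j] <= v:
            j += 1
        d = max(d, abs(i / n - j / m))
    en = math.sqrt(n * m / (n + m))  # effective sample size term
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.2:  # the series below is numerically 1 in this range
        return d, 1.0
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                  for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)
```

Applied to, for example, the event magnitudes from a model and from AWAP, a returned p value below 0.05 would correspond to the two distributions being judged significantly different at the 5% level.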
b. The circulation associated with wet events in Tasmania
In Fig. 3 we present the composite circulation for each model and our observed combination (AWAP and JRA) for autumn wet extreme events for a subset of the event days (days −5, −2, 0, 2, and 5). We present autumn results based on the season of focus in Tozer et al. (2018), noting that they found similar observed circulation structures for other seasons. Autumn rainfall in the region serves as an important precursor to the wetter winter season (e.g., catchment wetting). In Fig. 3, negative z500 anomalies are indicative of low pressure systems (troughs, cutoff lows), and positive anomalies indicate high pressure systems (ridges, anticyclones, blocks) in the midtroposphere. Beyond the visual comparison, we present the spatial correlation between each model and observed dataset.
For the observed circulation composite (Fig. 3a) there is limited coherency in the z500 anomalies at day −5, but by day −2 a clear wave train is evident in the Indian Ocean. This wave train becomes circumglobal around day 0, as a low pressure system shifts over southeast Australia, in concert with high rainfall over the region. The low pressure system persists in place, cut off by the downstream block. Both ACCESS-D and CanESM2 simulate the day −2 wave train structure well, as indicated by the significant pattern correlations. Although the ACCESS-ESM1.5 model also simulates a wave train structure at day −2 the pattern correlation with observed is relatively low. This is likely because the two high pressure systems directly upstream and downstream of the cutoff low pressure system sit farther south than in the observed structure and because of the presence of anomalies in the Pacific Ocean opposite in sign to those evident in the observed case.
All models are able to capture the low pressure system that shifts over southeast Australia around day 0 relatively well (along with the associated high rainfall), although there is some variability in its spatial extent relative to the observed pattern. All models appear to maintain some wave train structure from day 0 to day 2 and to at least day 5 for ACCESS-D and CanESM2. The wave train loses structure in the ACCESS-ESM1.5 model by day 5 where the flow has a more annular appearance. The blocking nodes within the wave trains in the model composites show some differences relative to those in the observed composite in intensity, extent, and position. For example, at day 2, the blocking node just upstream of the cutoff low in the ACCESS-ESM1.5 model extends from northern Australia to Antarctica but has limited longitudinal extent, whereas in the ACCESS-D model, this node has broader longitudinal extent but is shifted farther south.
We further explore the representation of the wave train structures associated with autumn wet events via the Hovmöller plots in Fig. 4. The contours are filtered to show z500 anomalies (averaged between latitudes −40° and −50°) that are significant relative to the 5th- and 95th-percentile z500 anomalies determined through the Monte Carlo analysis described in section 3. The observed Hovmöller plot in Fig. 4a highlights the wave train structure associated with extreme wet events in Tasmania. The low pressure system over the region is well represented in the Hovmöller diagrams for all models. The blocks in the model Hovmöller diagrams are not as well defined as those in the observed panel and therefore the model wave trains lack coherent structure. This highlights that although the models clearly simulate a wave train structure in association with wet extremes (as seen in Fig. 3) there is some variability in the latitudinal position of the wave train relative to observations. That is, some of the nodes that form part of the model wave trains, excluding the low pressure system over Tasmania, sit farther south than those in observations and therefore are not well represented in the Hovmöller plots.
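The construction of the Hovmöller fields described above can be sketched as follows. This is an illustrative sketch only. The function names are hypothetical, the latitude band mean is unweighted, and the Monte Carlo bounds are assumed to be supplied as one 5th- and one 95th-percentile value per longitude for the band-averaged anomaly; the exact filtering used for Fig. 4 may differ.

```python
def hovmoller(composites, lats, lat_south=-50.0, lat_north=-40.0):
    """Band-average composite fields to build time-longitude rows.

    composites[lag][lat][lon] holds the composite anomaly for each event
    day; the result rows[lag][lon] is the unweighted mean over latitudes
    within [lat_south, lat_north].
    """
    band = [k for k, lat in enumerate(lats) if lat_south <= lat <= lat_north]
    if not band:
        raise ValueError("no latitudes fall inside the requested band")
    rows = []
    for field in composites:
        n_lon = len(field[0])
        rows.append([sum(field[k][j] for k in band) / len(band)
                     for j in range(n_lon)])
    return rows


def mask_insignificant(rows, low, high):
    """Keep values outside the [low, high] null range at each longitude;
    values inside the range are replaced by None (not plotted)."""
    return [[v if (v < low[j] or v > high[j]) else None
             for j, v in enumerate(row)]
            for row in rows]
```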
In addition to the mean z500 anomalies presented in Figs. 3 and 4, we assess the distribution of z500 anomalies extracted at the Tasmanian grid box for a selection of days about autumn wet extremes (Fig. 5). This analysis shows how the low pressure system develops over the region, intensifies and decays across the wet extreme events (in terms of pressure variability), and also provides an indication of the within-sample spread of the wet events (Tozer et al. 2018) in the observations and models. On random days, which we assume are represented by day −20 and day 20 (i.e., far from the event in time), we would not expect an organized circulation but rather a well-spread distribution of anomalies centered around zero. As day 0 of the extreme event approaches, one would expect the distribution to narrow (i.e., increased signal-to-noise ratio) and shift to either a positive or negative anomaly depending on the type of extreme being assessed. In this case, wet events are associated with a low pressure system over the region so we expect a negative z500 anomaly. This expected pattern is illustrated in the observations in Fig. 5a whereby on a nonevent day (e.g., day −20) the z500 anomalies are well spread and centered around zero. From 4 days before the wet extreme commences (day −4), the distribution narrows but remains centered around zero. The distribution does not shift negative until around day 2. This fits with the composite circulation in Fig. 3a, which shows that the cutoff low does not completely encompass southeast Australia until day 2. The anomalies return to zero mean by day 20 (i.e., a non-extreme event day). Of interest is that there is relatively large spread in the day-2 observed distribution. One reason for this is that cutoff low pressure systems are subsynoptic structures and therefore likely to have more variability in their timing and persistence from event to event, which is indeed shown in Fig. 5a. 
Importantly, the model distributions all fit this observed pattern and in particular, they mimic the observed negative distribution shift and increased spread at day 2.
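The lagged distributions summarized in Fig. 5 can be sketched as follows. This is a minimal sketch under stated assumptions, not the code used to produce the figure. The function name `lagged_summary` and the default lags are hypothetical, `series` is assumed to be the daily z500 anomaly at the grid box, and `event_starts` is assumed to hold the index of day 0 for each event. The median and interquartile range stand in for the full boxplot.

```python
import statistics


def lagged_summary(series, event_starts, lags=(-20, -4, -2, 0, 2, 4, 20)):
    """Summarise the grid-box anomaly across events at each lag.

    Returns {lag: {"median", "iqr", "n"}}. Lags that fall outside the
    record for an event are skipped for that event, and a lag is omitted
    entirely when fewer than two events contribute.
    """
    summary = {}
    for lag in lags:
        vals = [series[s + lag] for s in event_starts
                if 0 <= s + lag < len(series)]
        if len(vals) < 2:
            continue
        q1, med, q3 = statistics.quantiles(vals, n=4, method="inclusive")
        summary[lag] = {"median": med, "iqr": q3 - q1, "n": len(vals)}
    return summary
```

In this summary, a narrowing distribution shows up as a smaller interquartile range, and a shift toward a low pressure anomaly as a negative median.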
5. Dry spells in western Victoria
a. The statistics of dry spells in western Victoria
We focus now on the representation of dry spells in western Victoria in the JRA and model datasets relative to observations. We present the daily rainfall distribution for the grid box in western Victoria for each dataset in Fig. 6a. In the observed distribution the highest daily rainfalls occur over the winter season, with lower rainfall across the remainder of the year. This observed cycle is well simulated in JRA. ACCESS-D captures the observed winter rainfall well but is too dry across the remainder of the year, noting that for months when there is no box in Fig. 6, the 25th- and 75th-percentile rainfall is zero. The ACCESS-ESM1.5 and CanESM2 models similarly capture winter rainfall well but are too wet for the rest of the year.
Figure 6b compares the frequency and duration of dry spells for all datasets, noting that we do not compare the magnitude of the dry spells given that they are classified based on a zero (or very close to zero) rainfall threshold. The discussion around Fig. 6a above gives some indication of the model ability to capture rainfall magnitude in this region. In regard to dry spell frequency, AWAP, JRA, and ACCESS-D record a similar number of dry spells per year. ACCESS-ESM1.5 and CanESM2 record almost a third fewer dry spells per year, which fits with the findings from Fig. 6a that these models are wetter than observations in this region. For duration, the JRA and ACCESS-D distributions more closely follow the observed distribution than do those of ACCESS-ESM1.5 and CanESM2. Results of the K-S test indicate, however, that all model and JRA distributions are significantly different from observations (p value < 0.05).
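As an illustration of how dry spell frequency and duration can be extracted from a daily rainfall series, the sketch below identifies runs of consecutive dry days. The `wet_threshold` and `min_length` values are hypothetical placeholders, since the exact threshold and minimum duration used to classify dry spells are not restated in this section; the function names are likewise illustrative.

```python
def find_dry_spells(daily_rain, wet_threshold=1.0, min_length=5):
    """Return (start_index, length) for each run of consecutive dry days.

    wet_threshold (mm) and min_length (days) are assumed values for illustration.
    """
    spells = []
    run_start = None
    for i, rain in enumerate(daily_rain):
        if rain < wet_threshold:
            if run_start is None:
                run_start = i  # a new dry run begins
        else:
            if run_start is not None and i - run_start >= min_length:
                spells.append((run_start, i - run_start))
            run_start = None
    # close a run that is still open at the end of the record
    if run_start is not None and len(daily_rain) - run_start >= min_length:
        spells.append((run_start, len(daily_rain) - run_start))
    return spells

def spells_per_year(spells, n_days, days_per_year=365.25):
    """Average number of spells per year over a record of n_days."""
    return len(spells) / (n_days / days_per_year)

# Hypothetical 13-day record (mm/day) with a 3-day minimum spell length.
rain = [0, 0, 0, 5, 0, 0, 0, 0, 0, 0, 2, 0, 0]
spells = find_dry_spells(rain, wet_threshold=1.0, min_length=3)
durations = [length for _, length in spells]
```

The list of durations is the sample that would be compared between datasets, and the spell count divided by record length gives a frequency of the kind reported in Fig. 6b.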
b. The circulation associated with dry spells in western Victoria
We now explore the composite circulation for each model and our observed combination for autumn dry spells (Fig. 7). We focus on autumn given that the absence of rainfall in this season in this region impacts crop establishment (Pook et al. 2009, 2014). We note that in locations like western Victoria, where there are many zero-rain days, long dry sequences could result from a range of atmospheric processes. The compositing process, however, will reveal any consistent processes across dry spell events. Figure 7a indeed suggests there is a consistent large-scale atmospheric structure associated with dry spells in western Victoria. At day −5 there is not yet a clear structure in the polar jet waveguide, but by day −2 the pressure anomalies increase in intensity and a wave train structure is well established. At day 0 a high pressure system, which forms part of the wave train, moves over southern Australia. As the event progresses, the wave train moves into the Pacific sector, with the high pressure system persisting over southeast Australia until at least day 5.
We note first the different number of modeled autumn dry spells compared with observations (indicated in Fig. 7): ACCESS-ESM1.5 and CanESM2 record around 40% of the number of observed dry spells, while ACCESS-D has around 20% more dry spells. Despite this, dry spells in the models are associated with wave train structures similar to those identified in observations. That is, all models capture the observed composite wave train structure from at least day −2 to day 5 as indicated by the high spatial correlations with the observed composite. The spatial correlations between the model and observed composites are lower at day −5, which is likely because the observed structure is not well established at day −5 (i.e., the models could not be expected to capture an observed pattern that is not well established). All model composites also capture the blocking high pressure system (i.e., synoptic structure) that moves across southern Australia and persists in the region, albeit with varying intensities.
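The spatial correlation between a model composite and the observed composite can be sketched as a centered pattern correlation. The version below applies cos(latitude) area weights, which is a common convention but an assumption here, since the weighting used for the correlations in Fig. 7 is not stated in this section. The function name and the example fields are hypothetical.

```python
import math

def pattern_correlation(field_a, field_b, lats):
    """Centered, cos(latitude)-weighted correlation between two lat-lon fields.

    field_a, field_b: lists of rows (one row per latitude, one value per longitude).
    lats: latitude in degrees for each row.
    """
    weights, a_vals, b_vals = [], [], []
    for row_a, row_b, lat in zip(field_a, field_b, lats):
        w = math.cos(math.radians(lat))  # assumed area weighting
        for a, b in zip(row_a, row_b):
            weights.append(w)
            a_vals.append(a)
            b_vals.append(b)
    w_sum = sum(weights)
    # weighted means are removed so that the correlation is "centered"
    mean_a = sum(w * a for w, a in zip(weights, a_vals)) / w_sum
    mean_b = sum(w * b for w, b in zip(weights, b_vals)) / w_sum
    cov = sum(w * (a - mean_a) * (b - mean_b)
              for w, a, b in zip(weights, a_vals, b_vals))
    var_a = sum(w * (a - mean_a) ** 2 for w, a in zip(weights, a_vals))
    var_b = sum(w * (b - mean_b) ** 2 for w, b in zip(weights, b_vals))
    return cov / math.sqrt(var_a * var_b)

# Hypothetical z500 anomaly composites (m) on a 2 x 3 grid.
lats = [-40.0, -50.0]
observed = [[-10.0, 20.0, -5.0], [15.0, -30.0, 10.0]]
weaker = [[0.5 * v for v in row] for row in observed]    # same pattern, half amplitude
opposite = [[-v for v in row] for row in observed]       # sign-reversed pattern
r_same = pattern_correlation(observed, weaker, lats)
r_opposite = pattern_correlation(observed, opposite, lats)
```

Because the measure is insensitive to amplitude, a model composite with the correct pattern but weaker anomalies still correlates highly with the observed composite, which is consistent with the text noting high correlations alongside varying intensities. The function does not guard against a spatially constant field, for which the correlation is undefined.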
The Hovmöller plots (Fig. 8) clearly show the persistent block occurring over the region during the dry spells and confirm the ability of the models to capture this structure. The broader wave train structure present in the observed Hovmöller (Fig. 8a) is well represented in ACCESS-D (Fig. 8c). For ACCESS-ESM1.5 and CanESM2, while a wave train structure is apparent in the Hovmöller plots, it lacks significant structure (i.e., limited significant contours).
Figure 9 presents the distribution of anomalies for dry spells in western Victoria and can be interpreted in the same way as Fig. 5, though in this case we expect the distribution to shift positive due to the presence of a high pressure system over the region during these events. For observations (Fig. 9a), during the onset stage of the event, the event anomalies are centered around zero. The bulk of the distribution then shifts positive as the event commences and the high pressure system moves over the region. As the event decays, the distribution shifts back to zero. This is the expected behavior; however, we note that the distribution of anomalies does not narrow leading up to the event as expected. This suggests that there is some variability event to event (i.e., within-sample spread), which could be associated with variability in the intensity and location of the high pressure system between events or variability in the exact start day of the events. For example, given their long duration (Fig. 6b), an individual dry spell may feature one or two zero-rain days at the beginning that are not directly related to large-scale processes (Tozer et al. 2018). Whatever the cause, the models show within-sample spread similar to that of the observations and, importantly, follow the observed positive shift of the distribution, confirming that they are able to capture the observed synoptic structure well.
6. Heatwaves in western Victoria
a. The statistics of heatwaves in western Victoria
Figure 10 presents the daily temperature for the grid box in western Victoria distributed across each month. In observations (Fig. 10a), the temperature is warmer between October and March and has greater variability relative to the cooler April–September period. The JRA and model datasets all capture this seasonal temperature cycle and show comparable seasonal variability to the AWAP dataset throughout the year.
We compare the number, duration, and magnitude of heatwave events for all datasets in Figs. 10b and 10c. In general, all datasets record a similar frequency of heatwaves per year over the 40-yr analysis period (bracketed number). Figure 10b indicates that for all datasets, most heatwaves are 3 days long, which is expected given that 3 days is the persistence threshold used to define heatwaves. The JRA, ACCESS-ESM1.5, and ACCESS-D heatwave duration distributions are all similar to observed as indicated by a K-S test (p value > 0.05). The CanESM2 event duration distribution is significantly different to observed at the 5% level (p value < 0.05), with a tendency to have shorter heatwaves.
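The two-sample K-S comparison of event duration distributions rests on the maximum distance between two empirical cumulative distribution functions. The sketch below computes only that distance (the D statistic) using the standard library; a p value would normally be obtained from a statistics package (for example, `scipy.stats.ks_2samp`), and the treatment of ties in discrete duration data in the original analysis is not known. The function name and sample values are hypothetical.

```python
from bisect import bisect_right

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: max |ECDF_a(x) - ECDF_b(x)|."""
    a_sorted, b_sorted = sorted(sample_a), sorted(sample_b)
    d = 0.0
    # the ECDFs only change at observed values, so checking those is sufficient
    for x in set(a_sorted) | set(b_sorted):
        ecdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        ecdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(ecdf_a - ecdf_b))
    return d

# Hypothetical heatwave durations (days) for an observed and a modeled dataset.
observed_durations = [3, 3, 3, 4]
model_durations = [3, 4, 5, 6]
d_stat = ks_statistic(observed_durations, model_durations)
```

A D value near zero indicates nearly identical duration distributions, while a value near one indicates almost no overlap; the significance of D depends on the two sample sizes.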
Figure 10c presents the histograms of mean daily temperature of heatwave events for each dataset. These figures include heatwaves across the year and so include the seasonal cycle (e.g., mean daily temperature in winter heatwaves will be lower than in summer heatwaves). It is expected, though, that the models would also be able to simulate the seasonal cycle in extreme events. This expectation is met for at least JRA and ACCESS-D, which have similar heatwave magnitude distributions to observations (p value > 0.05). The ACCESS-ESM1.5 and CanESM2 model distributions are significantly different (p value < 0.05) from observed with a tendency for cooler heatwaves.
b. The circulation associated with heatwaves in western Victoria
In Fig. 11 we present the composite circulation for each model and our observed combination for spring heatwaves. We focus on spring heatwaves because of their potential to negatively impact this important crop growth stage (Risbey et al. 2018).
For the observed composite heatwaves (Fig. 11a), there is an organized wave train in the Indian Ocean 5 days before the start of the event. As day 0 approaches, a blocking high shifts into place over southeast Australia, in concert with positive mean temperature anomalies. The block persists until at least day 5 when it reduces in intensity. This circulation pattern follows the results of other observational studies into the atmospheric dynamics associated with heatwaves in southeastern Australia (Pezza et al. 2012; Parker et al. 2014a,b; Risbey et al. 2018). Importantly, all models capture the observed circulation relatively well with significant pattern correlations between the models and observations from at least day −2 to day 2 of the events. The CanESM2 model appears unable to capture the early onset (day −5) and decay (day 5) of the observed circulation, particularly the large block around South America that builds in intensity as the heatwave persists.
The Hovmöller plots in Fig. 12 indicate that all models capture the wave train structure well over the heatwave duration and, in particular, that the individual high and low pressure systems that form the wave trains are robust. The significant contours indicate that the high-amplitude organized wave train structure of z500 anomalies identified around heatwaves in southeast Australia is unlikely to be obtained by chance. Of interest is the reduced intensity and persistence of the blocking node over southeast Australia in ACCESS-D relative to the other datasets. Further review of Fig. 12c indicates that for ACCESS-D, the southeast Australian block does not extend as far south as the corresponding blocking nodes in the other datasets and therefore the maximum intensity and extent of the block do not sit well within the analysis band of the Hovmöller plots (i.e., −40° to −50° latitude). When we shifted the analysis band northward (i.e., −35° to −45° latitude), we found the block to be better represented in the ACCESS-D model (not shown). Ultimately it is clear from Figs. 11 and 12 that the synoptic-scale (i.e., the blocking high over southeast Australia) and large-scale circulation associated with spring heatwaves are generally well captured in the three GCMs assessed here.
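The sensitivity to the analysis band can be illustrated with a minimal sketch of how a Hovmöller field is formed from composite anomalies: for each lag day, the anomalies are averaged over the latitudes inside the band, leaving one value per longitude. The data layout, the function name, and the use of an unweighted latitude mean are assumptions for illustration; only the band limits are taken from the text.

```python
def hovmoller(composites, lats, lat_south=-50.0, lat_north=-40.0):
    """Average composite anomalies over a latitude band for each lag day.

    composites: dict mapping lag day -> field, where field[i_lat][i_lon] is an anomaly.
    lats: latitude in degrees for each row of the field.
    Returns a dict mapping lag day -> list of band-mean anomalies, one per longitude.
    """
    rows = [i for i, lat in enumerate(lats) if lat_south <= lat <= lat_north]
    if not rows:
        raise ValueError("no grid rows fall inside the requested latitude band")
    result = {}
    for lag, field in composites.items():
        n_lon = len(field[0])
        # unweighted mean over the band (an assumption; area weights could be used)
        result[lag] = [sum(field[i][j] for i in rows) / len(rows)
                       for j in range(n_lon)]
    return result

# Hypothetical composite for day 0 on a 5 x 2 grid.
lats = [-55.0, -50.0, -45.0, -40.0, -35.0]
day0 = [[0.0, 0.0], [10.0, 20.0], [20.0, 40.0], [30.0, 60.0], [99.0, 99.0]]
default_band = hovmoller({0: day0}, lats)
shifted_band = hovmoller({0: day0}, lats, lat_south=-45.0, lat_north=-35.0)
```

In this toy example the large anomaly at 35°S is excluded by the default band but included once the band is shifted northward, which mirrors the way a block centered north of the band can appear weaker in the Hovmöller plot than it is in the composite maps.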
We assess the distribution of z500 anomalies extracted at the grid box in western Victoria for days around the spring heatwave events (Fig. 13). In the observations (Fig. 13a), the z500 anomaly distribution narrows from day −20 to day 0 and sits around zero, as one might expect. Four days before the heatwave commences the distribution shifts into positive anomalies. At day 0 the distribution narrows further and shifts completely positive. As the event persists and decays, the distribution widens and shifts back about zero. The distributions of z500 anomalies extracted for each model fit this expected pattern. That is, the variation of the flow through the life cycle of the event is well captured in the models.
7. Discussion and conclusions
Here we have evaluated the representation of extreme multiday wet events, dry spells, and heatwaves, relevant to applications in water resources and agriculture, in two locations in southeast Australia in observations, reanalysis, and three coupled GCM datasets. We had the particular aim to assess how well the models simulated the regional extremes and therefore challenged them to generate the observed statistics of the selected extremes (i.e., frequency, magnitude, and duration) and the associated observed synoptic- and large-scale atmospheric processes, including their life cycle (i.e., onset and decay). In general, we suggest that although not all extreme statistics were well simulated by the GCMs, they captured the associated atmospheric processes relatively well.
In more detail, we found that JRA and the selected models simulated the frequency, magnitude, and duration of heatwaves in western Victoria reasonably well. For rainfall extremes, both JRA and the models in general struggled to simulate the observed statistics of wet events in Tasmania and dry spells in western Victoria and in particular the magnitude of wet events and the frequency and duration of dry spells. We note that these findings are qualitatively similar for heatwaves in Tasmania and wet events in western Victoria (not shown). These results are in line with previous studies that have found extreme temperature indices generally better represented in GCMs than extreme rainfall indices (e.g., Sillmann et al. 2013).
For the process evaluation component of this study, we found that the models capture the observed synoptic structures (i.e., the low pressure system over Tasmania for wet events and the high pressure system over western Victoria for heatwaves and dry spells) very well across the life cycle (onset and decay) of the extreme events. In regard to large-scale processes, in observations heatwaves, wet events, and dry spells in the region occur in association with a coherent wave train pattern in the midtroposphere. Through composite analysis we show that the models are able to simulate a wave train structure in association with these extremes, including the observed persistence and propagation characteristics (e.g., propagation of the structures from the Indian Ocean to Pacific Ocean). This suggests that multiday extremes in the models are associated with similar processes to those in the “real world,” at least as it is represented here. This is despite the fact that the statistics of extremes are not always well simulated in the models (at least in the case of wet and dry extremes). That said, although the models are able to simulate a wave train structure, not all features of the observed wave train are well simulated. For example, for wet events, the broader wave train structures in the models appear to sit farther south than those in observations. We note that the onset and decay (e.g., at day −5 and day 5) of the observed wave train structure are also not always well simulated by the models, but in general the models reasonably simulate the day −2 to day 2 observed structures presented here.
While we did not explicitly assess the influence of model resolution on its ability to capture extremes, it is interesting to note that in our study model resolution did not play an obvious part in a model’s ability or inability to capture the extremes and their associated dynamics. For example, while CanESM2 showed a reduced ability to capture the onset and decay of the circulation associated with heatwaves, it showed an increased ability to capture the circulation associated with wet events compared to the higher-resolution ACCESS-ESM1.5, at least as measured by the composite analysis used here.
The results presented here provide some context for, and confidence in, the use of the coupled GCMs in climate prediction and projection studies for regional extremes, noting that we have not explicitly assessed the predictability of atmospheric processes identified in this study. The study additionally provides an end-to-end process for assessing extremes in GCMs, which can be transferred to other GCMs and regions.
This work was supported by the Decadal Climate Forecast Project and DIGISCAPE Future Science Platform at CSIRO and Hydro Tasmania. CanESM2 data were downloaded from http://climate-modelling.canada.ca/climatemodeldata/cgcm4/CanESM2/piControl/index.shtml.