Climate model ensembles, such as the Coupled Model Intercomparison Project, phase 3 (CMIP3), are used to characterize broadscale ranges of projected regional climate change and their impacts. The 17-member Hadley Centre perturbed physics GCM ensemble [Quantifying Uncertainty in Model Predictions (“QUMP”)] extends this capability by including data enabling dynamical downscaling of these ranges, and similar data are now being made available from the CMIP phase 5 (CMIP5) GCMs. These raise new opportunities to provide and apply high-resolution regional climate projections. This study highlights the importance of employing a well-considered sampling strategy from available ensembles to provide scientifically credible information on regional climate change while minimizing the computational complexity of ensemble downscaling.
A subset of the QUMP ensemble is selected for a downscaling program in Vietnam using the Providing Regional Climates for Impacts Studies (PRECIS) regional climate modeling system. Multiannual mean fields from each GCM are assessed with a focus on the Asian summer monsoon, given its importance to proposed applications of the projections. First, the study examines whether any model should be eliminated because significant deficiencies in its simulation may render its future climate projections unrealistic. No evidence is found to eliminate any of the 17 GCMs on these grounds. Second, the range of their future projections is explored and five models that best represent the full range of future climates are identified. The subset characterizes the range of both global and regional responses, the patterns of rainfall response, the wettest and driest projections for Vietnam, and different projected Asian summer monsoon changes. How these ranges of responses compare with those in the CMIP3 ensemble is also assessed, revealing differences in both the signal and the spread of results in Southeast Asia.
Increasingly large ensembles of GCM simulations are becoming available to those undertaking climate change impacts studies. These include the Coupled Model Intercomparison Project, phase 3 (CMIP3), multimodel ensemble of 23 GCMs (Meehl et al. 2007); perturbed physics ensembles of varying size, such as those developed by the Met Office Hadley Centre (Collins et al. 2006, 2011) and collaborators (Pall et al. 2011); and, most recently, the CMIP phase 5 (CMIP5) ensemble (Taylor et al. 2012). Those engaged in impacts assessments often require climate information at higher spatial resolution than GCMs can offer, and downscaling by either a statistical or dynamical approach is often used to generate more detail in the projections of future climate. A challenge for those involved in downscaling GCM projections for use in impacts assessments lies in the need to incorporate information from multiple GCMs and multiple downscaling approaches to provide a defendable range of climate scenarios. Some large collaborative intercomparison projects, such as the Ensembles-Based Predictions of Climate Changes and Their Impacts (ENSEMBLES; Van der Linden and Mitchell 2009), the North American Regional Climate Change Assessment Program (NARCCAP; Mearns et al. 2012), and the Europe-South America Network for Climate Change Assessment and Impact Studies (CLARIS; Menendez et al. 2010), are exploring the nature and magnitude of the uncertainty across different GCM and regional climate model (RCM) combinations. However, limited resources and difficulties in obtaining GCM data and interfacing them with the downscaling tool restrict most national climate projection programs and smaller-scale impacts projects to the use of just one or two GCM/downscaling combinations.
In this paper we demonstrate an approach for sampling from an ensemble of GCMs for downscaling that allows us to account for the range of uncertainty implied by the full ensemble while minimizing the computing and postprocessing resources required. We present the subselection from a 17-member perturbed physics ensemble to be downscaled with the Providing Regional Climates for Impacts Studies (PRECIS) regional modeling system to generate climate scenarios for Vietnam (VN). PRECIS (Jones et al. 2004) was developed at the Met Office Hadley Centre as a system for generating high-resolution climate change information for any region of the world. It incorporates an RCM with a simple user interface and is provided, with appropriate scientific and technical training and support, to climate centers in developing countries free of charge. PRECIS has been used in many regions to develop climate change scenarios and explore climate vulnerabilities and impacts [see examples from China, Xiong et al. (2009) and Xu et al. (2006); South America, Marengo et al. (2009); South Asia, UIIslam et al. (2009) and Kumar et al. (2006); and the Caribbean, Campbell et al. (2011)]. A recent development in PRECIS allows users to downscale 17 members of a GCM ensemble derived from perturbing parameters in the model physics of the third climate configuration of the Met Office Unified Model (HadCM3) to explore the implication of modeling uncertainty on the future climate response (Collins et al. 2006, 2011). Prior to this, PRECIS was limited to downscaling from a small number of driving GCMs [Hadley Centre Atmospheric Model 3 version P (HadAM3P), ECHAM4, and ECHAM5] for which the 6-hourly instantaneous prognostic variable fields required to drive the regional model have been available.
The availability of these data allows users of model projections to move on from using a sample of GCMs that is determined only by availability, to using carefully selected subsets to run “designed” downscaling experiments for their region.
Perturbed physics ensemble (PPE) projections of future climate change (Murphy et al. 2004) have been used increasingly (e.g., Collins et al. 2006; Piani et al. 2007; Stainforth et al. 2005; Murphy et al. 2009) to explore and quantify the range of outcomes that arise due to differences in the formulation between individual GCM simulations as an alternative to the use of the CMIP3 multimodel ensemble (MME). The perturbed physics approach systematically explores the uncertainties within a single model by identifying important but poorly constrained parameters in the model’s formulation and choosing multiple sets of these from expert-defined ranges of plausible values (Murphy et al. 2004). This allows for greater control over the experimental design of an ensemble than when using an MME, allowing for a systematic sampling of sources of uncertainty (Murphy et al. 2004) that creates a more robust basis for interpreting outcomes in probabilistic terms (e.g., in the U.K. Climate Projections report; Murphy et al. 2009). The ranges of both global and regional responses from large PPEs are wider than those produced through analysis of multimodel runs available from the global modeling community (New et al. 2007). In contrast, the MME approach samples uncertainty across GCMs formulated using different physical process representations so that it accounts for a wider range of structural choices in model formulation. Critically for those involved in generating high-resolution regional projections with RCMs, the 6-hourly prognostic variables required to drive RCMs at their boundaries were not distributed as part of the CMIP3 effort. This means that the opportunity to design an RCM experiment that includes a strategic selection of driving GCMs is currently only available for the PPE. Future plans to make 6-hourly prognostic fields available from every GCM included in CMIP5 (Taylor et al. 2012) will provide the opportunity for users of a range of RCMs to undertake multimodel downscaling experiments. This new capability will provide significant potential for generating high-resolution regional projections that account for GCM uncertainty more fully but also raise a number of scientific questions surrounding the role of experimental design in generating those projections, specifically regarding the subselection of driving GCMs to drive RCMs.
We describe here the approach taken in recommending a subset of the 17 members of the Quantifying Uncertainty in Model Predictions (QUMP) PPE for downscaling over Southeast Asia (SEA) at Vietnam’s Institute of Meteorology, Hydrology and Environment (IMHEN). Noting the remit of IMHEN to develop climate change scenarios for use in assessments of climate change risk, vulnerability, and adaptation, we analyze the QUMP GCM simulations to
eliminate ensemble members that perform particularly poorly in simulating the major features of Southeast Asian climate (specifically, the Asian summer monsoon);
select from those remaining a subset that captures a broad range of responses in temperature, monsoon characteristics and precipitation; and
assess how the range of responses in the QUMP ensemble and the selected subset compares with that in the multimodel CMIP3 ensemble to provide context for the interpretation of the range of QUMP projections.
In the remainder of this paper we describe the QUMP PPE (section 2), assess the realism of the QUMP members’ baseline simulations and the range of future outcomes to select 5–7 representative members (section 3), assess the degree to which the subset also represents the range of outcomes indicated by the CMIP3 ensemble (section 4), and then draw conclusions and discuss the role of model selection methodologies in future regional modeling projects (section 5).
2. The “QUMP” 17-member perturbed-physics ensemble
An approach pioneered in the United Kingdom by the Hadley Centre (Murphy et al. 2004) and climateprediction.net (Stainforth et al. 2005) has been the use of ensembles that systematically explore the implication of known uncertainties in model parameters. Parameters are identified that are known to be both uncertain and important for the model response. Large ensembles of these models are run that sample model parameter values across ranges defined via expert elicitation.
The boundary conditions used to drive the PRECIS simulations discussed within this paper are from a 17-member PPE based on HadCM3 (Murphy et al. 2007; Collins et al. 2011). The model HadCM3 comprises the atmospheric model HadAM3 (Pope et al. 2000) coupled to an ocean model (Gordon et al. 2000). The version of HadCM3 used in this study differs from the CMIP3 build of HadCM3, in that it uses flux adjustments [to correct for biases in sea surface temperature (SST) and salinity] and includes the first indirect aerosol–cloud microphysical effect, interactively. The former minimized the impact of biases in the resulting PRECIS boundary conditions and enabled stable simulations to be run with a wide range of physics perturbations despite the nonzero global radiation balance in which they resulted. The particular parameter combinations that make up the 17-member HadCM3 ensemble were drawn from a much larger ensemble of 300-plus members (Webb et al. 2006; Collins et al. 2011). From this larger ensemble, configurations were identified that both compared well with observational metrics and explicitly sample a broad range of climate responses (Collins et al. 2011). These 17 HadCM3-based models are hereon referred to as the “QUMP” ensemble, the name of the project under which they were developed. The individual ensemble members are referred to as Q0–16, where Q0 has the same parameter values used in the standard HadCM3 coupled model. The parameter values used in each member can be found in Collins et al. (2006).
Globally, and for many regions and variables, the range of climate futures projected by the QUMP PPE is equivalent to or greater than that based on the CMIP3 MME (Collins et al. 2011). In these cases, boundary conditions from the PPE provide an invaluable tool to assess the implications of current climate model uncertainty for climate projections. In some regions, the projections from the PPE sample a smaller range than the current MME assessed in the Intergovernmental Panel on Climate Change Fourth Assessment Report (IPCC AR4); for example, the range of future precipitation change is smaller than in CMIP3 in the Sahel and Amazonia (B. Booth 2011, personal communication). However, using PPE boundary conditions with PRECIS still explores a wider range of projected changes than when using the boundary conditions from the previously available models only. In both cases, any results will need to be evaluated against the background of the range of responses in CMIP3 to provide the broadest context for the downscaled projections. We explore this issue further in our analysis by comparing the range of responses in the two ensembles for the Southeast Asia region.
3. Ensemble selection
In this section we examine criteria for selecting members of the QUMP ensemble to be used for downscaling climate projections over Southeast Asia. Their proposed use is to establish credible baseline climates for the region and a defendable range of projected future climates, that is, capturing a range of responses in temperature, monsoon characteristics, and precipitation consistent with our current understanding of the plausible or likely range of changes. We thus assess the performance of ensemble members in simulating the climate of the region, examine important characteristics of their projected climate changes, and put these into the context of the CMIP3 projections that formed the basis of the last IPCC assessment.
a. Performance in simulating recent climate of Southeast Asia
First, we seek to eliminate from our subset any model that is unable to simulate key processes in this region, where the climate is dominated by the Asian summer monsoon and its variability. The monsoon is the major driver of the timing, magnitude, and distribution of wet-season rainfall and thus rainfall interannual variability and extremes in this region; however, realistic simulation of the circulation patterns, associated rainfall, and interannual variability remains a significant challenge in GCMs (e.g., Martin et al. 2000, 2006; Bollasina and Nigam 2009).
Implicit in eliminating or weighting models within an ensemble to favor those that make the most realistic representation of the baseline climate is an assumption that realistic models will generate more reliable projections of future climate (e.g., Murphy et al. 2004; Lopez et al. 2006; Piani et al. 2005; Giorgi and Mearns 2002, 2003). This raises a number of issues [see Knutti (2010) and Mote et al. (2011) for comprehensive reviews of these complexities], such as the wide range of possible criteria for validation, the influence of gaps or errors in our observational datasets, and the potential for models giving the “right answers for the wrong reasons.” The absence of consensus on an appropriate approach or criteria for weighting or “culling” is a strong argument for an inclusive “one model, one vote” approach to combining projections from multiple models. However, while it may be difficult to identify a clear subset of models that perform better at the global or continental scale, Overland et al. (2011) argue that significant diversity in performance at the regional scale makes the identification, and consequent elimination, of poorer models both more feasible and more defendable.
While our primary concern in this application is to discount projections that might be physically implausible, a possible consequence of eliminating “poor” models is to reduce the range of future climate outcomes for the region in the resulting culled ensemble. Although there are examples where this is the case (e.g., Overland et al. 2011), studies have largely indicated that the range of outcomes regionally is robust to culling (e.g., Knutti et al. 2010; Pierce et al. 2009; Brekke et al. 2008; Mote and Salathe 2010). Whether such a narrowing of outcomes is desirable is debatable [e.g., Dessai et al. (2009) warn that presenting a narrower range of uncertainty might be “falsely precise,” leading decision makers to “bad decisions”]. We argue that a projection based on a model that does not represent key features of the climate in that region cannot provide a credible projection of future climate for that region, and we use this argument as a firm basis for eliminating models on a regional basis.
Knutti et al. (2010) provide some useful guidelines on the implementation of elimination or weighting of models, three of which have particular relevance to this application:
(1) Metrics and criteria for evaluation should be demonstrated to relate to projection.
(2) It may be less controversial to downweight or eliminate models that are clearly unable to mimic important processes than to agree on the best model.
(3) Process understanding must complement “broad brush metrics.”
Based on these recommendations, our approach is to assume in the first instance that projections from all models are equally plausible unless the validation implies that a model has clear inadequacies in simulating a key driving feature (in this case, the Asian summer monsoon) and thus should be eliminated, meeting guideline 2 above. For models in which the timing or geographical positioning of the monsoon and associated rainfall is unrealistic, projected changes in that system will be manifest as changes in an unrealistic position or time of year. The common practice of applying future-minus-baseline anomalies from model projections to observations to generate climate scenarios is highly vulnerable to errors generated in this way, and the potential bias-correction approaches become vastly more complicated (or, arguably, invalid). In this application we argue it is appropriate to eliminate members of the ensemble that do not represent the key features of the onset, timing, and geographical location of the Asian summer monsoon realistically (meeting guidelines 1 and 3 above). Additionally, to provide context for what might be considered a relatively good or poor model representation of these features, we show the same plots for the CMIP3 MME as we do for the QUMP PPE.
We find that the climatology of the horizontal winds at 850 hPa over the region in the QUMP models represents well the key characteristics of the monsoon circulation compared with the 40-yr European Centre for Medium-Range Weather Forecasts Re-Analysis (ERA-40; Uppala et al. 2005), capturing the location of the Somali jet and the turning of the 850-hPa westerly flow northward into the Bay of Bengal (Fig. 1). These key features are not represented with the same consistency in CMIP3 models (Fig. 2): all three Goddard Institute for Space Studies (GISS) models (Model AOM, Model E-H, and Model E-R), the Institute of Numerical Mathematics Coupled Model, version 3.0 (INM-CM3.0), and the Commonwealth Scientific and Industrial Research Organisation Mark, version 3.0 (CSIRO Mk3.0), fail to reproduce the strength of the southeasterly flow of the Somali jet along the coast of the Arabian peninsula, and the GISS and the Institute of Atmospheric Physics (IAP) models fail to maintain the direction and strength of the westerly flow beyond the Indian peninsula. All members of the QUMP PPE do, however, overestimate the strength of the monsoonal flow systematically (Fig. 1), a known systematic error in other Hadley Centre models (e.g., Martin et al. 2000). Tendencies to overestimate the strength of the monsoon circulation are shared with only the CCCMA and MIROC models, while almost all others tend toward underestimation of the monsoonal flow in this region (Fig. 2).
Comparison of QUMP June–September (JJAS) rainfall fields with the Climate Prediction Center Merged Analysis of Precipitation (CMAP) gridded observational dataset (Xie and Arkin 1997) (Fig. 3) indicates that wet-season rainfall in the QUMP PPE members is generally overestimated at the three regional maxima (Bay of Bengal, South China Sea, and western Philippine Sea) by around 20%–60%, but the position of these maxima is realistic in all but Q16, and all models capture the northwest–southeast gradient of wet-season rainfall magnitude over Vietnam. The CMIP3 ensemble indicates a much wider range of errors in simulating wet-season rainfall patterns in southeast Asia (Fig. 4). Many of the models misplace, or do not replicate at all, the three prominent maxima. The generally poorer representation of the geographical distribution of wet-season rainfall in CMIP3 is evident in the spatial correlation values, which range between 0.5 and 0.75 (average 0.65) in the QUMP ensemble compared with 0.07–0.73 (average 0.47) in the CMIP3 ensemble. However, only the GISS models in CMIP3 systematically overestimate the rainfall maxima, as the wettest of the Hadley Centre models do.
The onset and timing of the monsoonal flow and rainfall in the region are examined. The regions SEA, the monsoon region (MN), and VN as defined for the purposes of spatial averaging are shown in Fig. 5. The timing and onset of the monsoon at both 850 and 200 hPa are simulated realistically in the QUMP models (Fig. 6)—it is only the magnitude of the 850-hPa peak in horizontal flow in JJAS that is erroneous. Despite the systematic overestimation of monsoonal flow in all the QUMP members, there is no systematic overestimation of wet-season rainfall. QUMP members reproduce the average annual cycle of rainfall in the monsoon region well. Members Q10 and Q16 are distinct from the other models in simulating less than the observed JJAS rainfall, while others are closer to or slightly above the observed. Over VN, simulated rainfalls are realistic in timing and broadly so in magnitude, although they have a tendency to overestimate September rainfall relative to the other wet-season months.
The CMIP3 models by comparison vary considerably more in their realism in the Asian summer monsoon and associated wet-season rainfall (Fig. 7). While most of the models reproduce the timing and onset of the U wind flows at 850 hPa, the magnitudes vary considerably. Most CMIP3 ensemble members underestimate the wet-season rainfall and represent the annual cycle of rainfall amounts less realistically; the GISS models in particular represent this very poorly. Several overestimate rainfall in the dry months, and even those with more realistic annual cycles mistime the onset and/or the retreat of the wet-season monsoon rainfalls.
The question remains, Do the above-mentioned validation results provide evidence that any members of the QUMP ensemble demonstrate clear inadequacies in the region that substantially undermine their ability to represent future changes in the climate of that region? Clearly the QUMP models demonstrate skill in representing the timing, onset, and spatial features of the Asian summer monsoon, in terms of both the horizontal flow and associated seasonal rainfall in the region. The common weakness affecting all models in the ensemble is a systematic error in the magnitude of the monsoon flow, which is too strong in all QUMP members and appears to result in an overestimation of wet-season rainfall in the wettest regions. These characteristics of the monsoon flow and associated rainfall in baseline simulations from the CMIP3 models differ more substantially, and diversely, from that observed. While most CMIP3 models do not demonstrate the same systematic errors in rainfall amount at the wettest points, there are many that do not simulate the temporal and/or geographical key features as realistically as the Hadley Centre models.
On this basis, we do not find any good reason to eliminate any of the QUMP members from the downscaling experiments in this region. The QUMP models reproduce the main climatological features well and they err similarly to one another, with no model in the ensemble clearly performing “worse” than the others. Comparing performances of the QUMP models with those of the CMIP3 models provides a useful benchmark against which to assess the performance of the QUMP models. The comparison demonstrates that all the QUMP models tend to perform (equally) well in terms of the timing and location of the monsoon features and better than most CMIP3 models. The QUMP models do have biases in the magnitude of these features, though these are less likely to affect the ability of the models to simulate plausible scenarios of future climate than errors affecting the location or timing of key features. For example, the relatively straightforward bias-correction technique of applying climate projection anomalies to observations to create future scenarios will be more robust if the anomalies are derived from changes in features that are spatiotemporally well positioned.
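The sensitivity of the anomaly method to timing errors can be illustrated with a minimal sketch. All values below are hypothetical and chosen only for illustration; the function name `apply_anomaly` and the idealized monthly rainfall series are assumptions and are not taken from the QUMP or CMIP3 data. The sketch adds a future-minus-baseline anomaly to an observed annual cycle for two idealized models: one with a magnitude bias only, and one whose wet season is displaced two months late.

```python
import numpy as np

def apply_anomaly(obs_clim, model_baseline, model_future):
    """Add a model's future-minus-baseline anomaly to an observed climatology."""
    obs_clim = np.asarray(obs_clim, dtype=float)
    anomaly = (np.asarray(model_future, dtype=float)
               - np.asarray(model_baseline, dtype=float))
    return obs_clim + anomaly

# Hypothetical observed monthly rainfall (mm/day), January to December.
obs = np.array([1, 1, 2, 3, 6, 8, 9, 9, 8, 5, 3, 1], dtype=float)

# Idealized model A: correct timing, but 30% too wet throughout.
base_good = obs * 1.3
fut_good = base_good.copy()
fut_good[5:9] += 1.0          # projected increase in June to September

# Idealized model B: wet season displaced two months late.
base_shift = np.roll(obs, 2)
fut_shift = base_shift.copy()
fut_shift[7:11] += 1.0        # same increase, but in the model's own wet season

scenario_good = apply_anomaly(obs, base_good, fut_good)
scenario_shift = apply_anomaly(obs, base_shift, fut_shift)

# Months (0 = January) in which each scenario differs from the observations.
print(np.nonzero(~np.isclose(scenario_good, obs))[0])   # June to September
print(np.nonzero(~np.isclose(scenario_shift, obs))[0])  # August to November
```

In this idealized case the magnitude bias of model A cancels in the anomaly, whereas the timing error of model B places the projected wet-season increase in the wrong months of the observed annual cycle.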
Members of the CMIP3 ensemble that we would recommend eliminating from downscaling experiments or ensemble analyses based on these criteria are the three GISS models, the Flexible Global Ocean–Atmosphere–Land System Model gridpoint, version 1.0 (FGOALS-g1.0), from IAP, and INM-CM3.0. These models all suffer from significant shortcomings in their representation of the monsoon flow and wet-season rainfall.
b. Characterizing projected changes in future climate
Here we examine the future climate changes projected by the 17 QUMP GCMs to identify a subset of the ensemble that broadly represents the range of future outcomes of the full 17-member ensemble. For the purposes of this example for Southeast Asia, there is no specific upper limit on the size of the subset: while the minimum ensemble size that can estimate an upper and lower uncertainty boundary for a given variable is 2, we expect more would be required to encompass a range of magnitudes and characteristics of climate changes in relevant variables. We do not endeavour to interpret the distribution of climate outcomes from the 17 models, or the subset thereof, as an indication of which is most or least probable, but rather to provide a range of outcomes within which future climate change may lie.
When examining the changes in the GCM simulations to guide our selection for downscaling, we assume that the RCM downscaling process does not cause the higher-resolution simulations to diverge significantly from the broadscale climate simulated by the parent GCM and acts only to add high-resolution regional details. This is a necessary condition for the one-way downscaling approach to avoid the situation in which the mesoscale processes represented at the higher resolution of the RCM feed back to the large-scale circulation sufficiently to cause divergence from the driving GCM. Such a divergence would represent a fundamental inconsistency with the driving GCM and could lead to unreliable downscaled future simulations (Jones et al. 1997). Assuming no significant divergence, the range of large-scale outcomes in the downscaled simulations will be similar to that of the driving GCMs, and it is therefore appropriate to base our sampling strategy on the latter.
In our selection of models, we are looking to capture a range of responses on the bases of the following:
Magnitude of response: greatest/least warming and magnitude of change in precipitation regionally
Characteristics of response
average direction of change in precipitation (increases and decreases);
spatial patterns of precipitation response over Southeast Asia; and
response of the monsoon circulation.
Throughout these investigations, we compare the range of QUMP model responses to those from the CMIP3 MME to put the results in the context of the multimodel ensemble spread, and to assess the level of consistency between the range of responses given by the PPE and a multimodel ensemble. We address this issue further in section 3b(3).
1) Magnitude of response
The magnitude of response in the region of interest is examined by looking at the average temperature response over the regions of interest and the root mean of precipitation changes (Fig. 8). By this method we can isolate the magnitude of the change in the region from the characteristics of the changes and avoid cancelling out precipitation increases and decreases.
The members of the QUMP ensemble are named “Q1–Q16” according to their global climate sensitivity (Q1 being lowest sensitivity, Q16 highest). Regional and seasonal temperature responses do not necessarily follow linearly, and in both the SEA and smaller VN region, Q13 displays the highest response in both temperature and precipitation change in JJAS and December–February (DJF). The models Q1–Q5 have generally the lowest responses in both regions, in no particular order.
Over SEA, the magnitude of the precipitation change is approximately linear with the temperature change between models. There is less of a relation for the smaller region of VN, where the precipitation response is more variable due to the smaller averaging area. Q14, Q16, and Q11 tend to be the higher-response models over both the SEA and VN regions. Q10 and Q16 are particularly prominent in the VN plots, as they have high temperature and low precipitation responses.
Over both SEA and VN, the average response is higher in QUMP than in CMIP3. There are, however, some significant outliers in CMIP3, notably HadCM3 (high winter precipitation response), and the Model for Interdisciplinary Research on Climate 3.2, high-resolution version [MIROC3.2(hires); high temperature response], which extend the ranges of magnitudes toward/beyond the range of the QUMP models.
2) Characteristics of rainfall response
The ensemble mean and consensus maps in Fig. 9 indicate where rainfall change is positive and negative. Notably, in JJAS, continental Southeast Asia is dominated by increases in rainfall, while the islands of Southeast Asia are dominated by decreases. Vietnam lies on the margin of these two regions, bisected by the relatively narrow band of “disconsensus,” where the boundaries lie between the various models’ precipitation increases (to the north) and decreases (to the south). In DJF, drying occurs in most of Southeast Asia, south of 20°N, including Vietnam. The consensus in terms of the regions where increases or decreases in rainfall occur is high in the ensemble—notable areas of disconsensus, however, occur in DJF over Western Australia and also in a band extending through the Indian Ocean. We discuss the degree and characteristics of the consensus in this ensemble further in section 4.
A key motivation of identifying a subset is to capture the range and patterns of rainfall change. In the QUMP models’ subset selection, we look at the overall change in rainfall amount over the SEA and VN regions and then at the spatial correlation of rainfall change patterns over the SEA region (Figs. 10 and 11). Figure 10 shows the change in rainfall as an average of the positive and negative changes within the region to give an overall direction of the change signal. This differs from the changes in rainfall shown in Fig. 8, where the modulus of the rainfall change for each grid box is calculated so that the average reflects the magnitude of change rather than the direction. This allows us to separate the “characteristics” of the rainfall change from the “magnitude” to make sure that we are capturing both of these aspects in our subset.
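The distinction between the two regional averages can be sketched as follows. This is a minimal illustration, not the calculation used to produce Figs. 8 and 10: the function name `regional_change_metrics`, the optional `weights` argument (for example, grid-box area weights), and the example field are assumptions introduced here.

```python
import numpy as np

def regional_change_metrics(delta, weights=None):
    """Return (signed mean, mean of the modulus) of a gridded change field.

    delta   : 2D array of rainfall change per grid box (future minus baseline)
    weights : optional 2D array of grid-box weights (assumed, e.g., area);
              equal weighting is used if omitted
    """
    delta = np.asarray(delta, dtype=float)
    if weights is None:
        weights = np.ones_like(delta)
    weights = np.asarray(weights, dtype=float)
    total = np.sum(weights)
    signed = np.sum(weights * delta) / total             # direction of change
    magnitude = np.sum(weights * np.abs(delta)) / total  # size of change
    return signed, magnitude

# Hypothetical region: wetting in the north, equal drying in the south.
delta = np.array([[2.0, 1.0],
                  [-1.0, -2.0]])
signed, magnitude = regional_change_metrics(delta)
print(signed, magnitude)  # 0.0 1.5
```

In this example the signed average is zero because increases and decreases cancel, while the average of the modulus retains the size of the change, which is why the two measures are treated separately.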
Over SEA in JJAS, the increases in rainfall over the continent of Asia dominate the spatial average even though more than half of the region shows a drying signal (Fig. 10). Generally, the temperature and rainfall responses are quite linear, with higher-sensitivity models showing the largest rainfall increases. The signal is not as clear in DJF. Most QUMP models indicate overall drying in this season with higher-sensitivity models showing the largest drying. However, there are two exceptions; models Q10 and Q16 are wetter and so they do not follow the tendency seen in the other members for the seasonal changes to average out to annual decreases in SEA rainfall. Over VN, which is the region of more direct relevance for the application of the downscaled projections, all models dry in DJF and 6 of the 17 dry in JJAS (Fig. 10, bottom row). Overall, QUMP ensemble members tend to have drier future projections than the CMIP3 models.
Figure 11 shows the spatial correlation of the rainfall changes in each QUMP model with the ensemble mean across all models in SEA. This identifies models with spatial patterns of change that can be considered to be “typical” (high correlation with the ensemble mean) or “atypical” (low correlation with the ensemble mean) of the ensemble. Those models with the largest responses will have a larger impact on the ensemble mean and therefore might be expected to have higher correlations. This is the case here, where the highest-sensitivity models characterize the typical patterns of change in JJAS (seen in Q13 and Q15) but less so in DJF, where the lowest-sensitivity model, Q1, also demonstrates a high correlation with the ensemble mean along with other high-sensitivity models, Q9, Q11, and Q14. Many of the models have high correlations, indicating similar spatial patterns of change, compared with the range of correlations for CMIP3 indicated by box plots, and we discuss this further in section 4. The models with the least typical patterns are Q10, Q16, and Q2 in JJAS and Q8 and Q15 in DJF. The patterns of change in JJAS clearly dominate the annual changes, with Q15, Q13, and Q11 as the most typical and Q2, Q10, and Q16 as the most atypical.
The differences between the typical and atypical patterns of precipitation change are illustrated in Fig. 12, using the precipitation change fields for JJAS in Q10 and Q13. The typical band of drying that extends east–west across the region in Q13 is largely absent from the atypical projection from Q10.
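The classification of patterns as typical or atypical by their spatial correlation with the ensemble mean can be sketched as follows. This is a minimal illustration that assumes an unweighted Pearson correlation over all grid boxes; the function name and the three invented fields are hypothetical, and any area weighting or masking used for Fig. 11 is not reproduced.

```python
import numpy as np

def pattern_correlations(changes):
    """Correlate each model's change field with the ensemble-mean field.

    changes : dict mapping model name -> 2D array of rainfall change
    Returns a dict mapping model name -> Pearson r (unweighted).
    """
    names = list(changes)
    # Flatten each field so every grid box is one sample.
    stack = np.stack([np.asarray(changes[n], dtype=float).ravel()
                      for n in names])
    ens_mean = stack.mean(axis=0)
    return {n: float(np.corrcoef(row, ens_mean)[0, 1])
            for n, row in zip(names, stack)}

# Hypothetical fields: M2 has the same pattern as M1 with a larger
# amplitude; M3 has the opposite pattern with a small amplitude.
base = np.array([[1.0, -1.0], [2.0, -2.0]])
demo_fields = {"M1": base, "M2": 2.0 * base, "M3": -0.5 * base}
demo_r = pattern_correlations(demo_fields)
```

Because the large-amplitude members dominate the ensemble mean, the weak, oppositely signed member is the one flagged as atypical, mirroring the caveat noted above about models with the largest responses.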
Finally, we look for differences between the models in the characteristics of their monsoon response (Fig. 13). These differences will, to some extent, account for the differences in rainfall change in JJAS in the region. Increases in wet-season rainfall are consistent with the increases in 850-hPa wind speeds (Fig. 13), which show some linearity with the temperature response. Q13 and Q14 have a strong response in both rainfall and monsoon flow, while Q8, Q3, and Q15 have the weakest overall monsoon response, according to both metrics. Models Q10 and Q16 are exceptions to this approximate linearity; both exhibit the largest rainfall increases in the region but with relatively little overall change in 850-hPa flow. Q10 and Q16 also exhibit the lowest monsoon rainfall amounts in the baseline period (Fig. 6).
3) Model selection
Q0: Is the unperturbed model. We recommend that this model be included in any subset due to its similarity in formulation to the CMIP3 model HadCM3, and that it uses the base parameter set from which the perturbed physics parameter sets were created. Q0, in this case, represents a midrange sensitivity in the SEA and VN regions.
Q3 and Q13: Represent low and high sensitivity in the SEA and VN regions.
Q10: Selected to include the atypical wet-season rainfall responses displayed by both Q10 and Q16 (Figs. 11–13), opposing the typical pattern evident in Q13. Either of these models could represent this response, but Q10 was chosen to capture the wettest projection over Vietnam for the wet season (Fig. 10).
Q11: Together with Q10, these two models span the range of wet-season increases and decreases in rainfall (Fig. 10).
The subset captures the range of responses in magnitude and characteristics of precipitation in both the SEA and VN regions. Selection is steered toward the characteristics of wet-season rainfall (JJAS), as these are more important for the impacts studies in which the users of these data are interested. However, inclusion of additional models Q1 and/or Q2 would help to account for the lower and drier responses in VN dry-season rainfall (DJF) if these were deemed important.
We note that the five-model subset has been selected to encompass the range of responses but not to sample the distribution of sensitivities representatively: the five models are biased toward higher-sensitivity models. This has arisen because it is easier to identify different characteristics of change in simulations with stronger climate change signals. The distribution of sensitivities indicated by the ensemble subset should not therefore be interpreted as an indication of probability or likelihood.
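One simple way to check how much of the full ensemble range a candidate subset spans for a given metric is sketched below. This is an illustrative diagnostic under stated assumptions, not the selection procedure used in this study; the function name, model labels, and metric values are all hypothetical.

```python
def range_coverage(values, subset):
    """Fraction of the full-ensemble range of a metric spanned by a subset.

    values : dict mapping model name -> metric value (e.g., regional
             rainfall change); all values here are hypothetical
    subset : iterable of model names drawn from `values`
    """
    full = list(values.values())
    full_span = max(full) - min(full)
    if full_span == 0:
        # All members agree, so any subset spans the whole range.
        return 1.0
    chosen = [values[name] for name in subset]
    return (max(chosen) - min(chosen)) / full_span

# Hypothetical metric values for a four-member ensemble.
demo_values = {"M1": -0.4, "M2": 0.1, "M3": 0.6, "M4": 0.3}
demo_full = range_coverage(demo_values, ["M1", "M3"])     # extremes kept
demo_partial = range_coverage(demo_values, ["M2", "M4"])  # extremes dropped
```

A coverage of 1 says only that the extremes are retained for that metric; as noted above, it implies nothing about whether the subset samples the distribution of sensitivities representatively.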
4. Comparison between QUMP PPE and CMIP3 MME
In providing PRECIS users with an ensemble of climate projections for their region, our objective is to provide a defendable range of plausible future outcomes that can be used to explore vulnerabilities and inform adaptation and policy decisions. While we cannot test whether the QUMP or QUMP subset ensemble accounts for all physically plausible outcomes, we can test whether it samples all of the outcomes that current modeling science indicates are plausible by comparing our results with those of the CMIP3 ensemble.
It is clear from both the validations of baseline climate (Figs. 1–4, 6, 7) and the analysis of future projections (particularly Fig. 11 but also Figs. 8, 10, 13) from the two ensembles that the CMIP3 models differ more from one another than the QUMP models. This presumably results from the additional structural differences between models in the CMIP3 MME that do not affect the QUMP PPE. While the QUMP ensemble accounts for a similar range of global sensitivities to CMIP3 (Collins et al. 2011), the lesser heterogeneity in characteristics of the projections, particularly in terms of the spatial characteristics (Fig. 11), may mean that the range of projections at the regional level is narrower than in CMIP3, particularly in terms of precipitation.
In section 3b(3), we identified five members of the CMIP3 ensemble that might be eliminated if we applied the exclusion criteria that we have applied to the QUMP ensemble. In an analysis conducted but not shown in this paper, we found that the exclusion of these five models had very little impact on the range of outcomes indicated by the ensemble for the metrics explored here. As the exclusion of these models does not affect the outcomes of this analysis, we show the range of outcomes from the full CMIP3 ensemble from here on.
Comparisons between PPE and MME projections of regional changes in temperature and precipitation in SEA and VN indicate that the range of outcomes of temperature is not substantially different between the two ensembles. The QUMP ensemble range of 2.0°–3.5° in SEA by the 2080s under the Special Report on Emissions Scenarios (SRES) A1B fails only to capture the lowest half a degree of the CMIP3 range (1.6°–3.6°). For VN specifically, the spread is similar, but the distribution of the QUMP ensemble sits 0.75° warmer than that of CMIP3 (Fig. 8).
More significant are the differences between the two ensembles in the ranges of precipitation changes. In Fig. 10 we see that in JJAS, the QUMP ensemble projects smaller increases in wet-season rainfall in both SEA and VN than CMIP3, but there is consistency between the two ensembles in increases in rainfall in the region. For DJF, however, the CMIP3 and QUMP ensembles yield quite different results for SEA: CMIP3 indicates a broad consensus for overall increases in DJF rainfall, whereas QUMP shows a broad consensus toward overall decreases. We have paid most attention to changes in JJAS rainfall in our model selection due to the agricultural reliance on wet-season rainfall and higher risks of flooding due to very heavy rainfall events in this season. While DJF rainfall changes might be considered to be of secondary importance in terms of climate change impacts, these differences still have significance for the region.
The smoothed consensus plots in Fig. 14 indicate some clear differences between the characteristics of the two ensembles, and they highlight regions where the two ensembles reach opposing consensus. First, the QUMP ensemble clearly shows more consistent patterns of change with much smaller regions of disconsensus. There are relatively large areas where QUMP shows consensus but CMIP3 does not. In JJAS, spatial patterns of change are similar, but CMIP3 models show greater disagreement with regard to the extent of the regions of drying, for example, the northward extent of the subtropical drying, which leaves much of Southeast Asia in disconsensus. The region of opposing consensus (red) arises because almost all of the QUMP models indicate this extended region of drying. In DJF, however, the differences between the consensuses of the two ensembles are larger, with a large region of opposing consensus. In this dry season, the regions of drying cannot easily be explained as different magnitudes of the same pattern; rather, they appear to represent a different mode of change.
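The idea of consensus, disconsensus, and opposing consensus between two ensembles can be sketched as below. The agreement threshold of 0.8, the function names, and the demonstration values are assumptions for illustration only; the spatial smoothing applied in Fig. 14 is not reproduced.

```python
import numpy as np

def sign_consensus(changes, threshold=0.8):
    """Classify each grid box by ensemble agreement on the sign of change.

    changes   : array of shape (n_models, ny, nx)
    threshold : assumed fraction of members that must agree on the sign
    Returns an integer map: +1 consensus on increase, -1 consensus on
    decrease, 0 disconsensus.
    """
    changes = np.asarray(changes, dtype=float)
    frac_pos = (changes > 0).mean(axis=0)
    frac_neg = (changes < 0).mean(axis=0)
    out = np.zeros(changes.shape[1:], dtype=int)
    out[frac_pos >= threshold] = 1
    out[frac_neg >= threshold] = -1
    return out

def opposing_consensus(map_a, map_b):
    """True where both ensembles reach consensus but on opposite signs."""
    return (np.asarray(map_a) * np.asarray(map_b)) == -1

# Hypothetical ensembles of five members on a 1 x 3 grid.
ens_a = np.array([[[1.0, -1.0, 1.0]],
                  [[2.0, -2.0, 1.0]],
                  [[1.0, -1.0, 1.0]],
                  [[3.0, -1.0, -1.0]],
                  [[1.0, -2.0, -1.0]]])
ens_b = np.array([[[-1.0, -1.0, 1.0]]] * 5)
demo_a = sign_consensus(ens_a)
demo_b = sign_consensus(ens_b)
demo_opposing = opposing_consensus(demo_a, demo_b)
```

In this invented case only the first grid box shows opposing consensus; the third is in disconsensus in the first ensemble because only three of five members agree on the sign.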
Why might the two ensembles display such different behavior in this region and season? One explanation might be that this mode of change is specific to structural aspects of Hadley Centre model formulation. However, Fig. 15 shows that the standard HadCM3 model, which is a member of the CMIP3 ensemble, follows the “CMIP3 like” pattern of change (the patch of increased rainfall is farther west, affecting much of Indonesia, parts of Malaysia, and the Philippines) rather than the “QUMP like” pattern of change (drying over most of Southeast Asia, with the regions of moistening remaining well out into the Pacific).
The differences between the rainfall projections from the HadCM3 and Q0 models in Fig. 15 are interesting because these two models have identical formulation and parameter settings; the only difference between them is that Q0 (as with the other QUMP members) is flux corrected with respect to SST, while HadCM3 is not. The association between SST and precipitation is evident in baseline simulations from these models and the two ensembles (Fig. 16), where the moister regions clearly map onto the regions of warmer SST. The corrections to SST in the Q0 model are evident when compared with SST in HadCM3. The HadCM3 errors that the flux adjustments address include the cold bias in the equatorial Pacific cold tongue, which is partially corrected, and the warm bias in Southeast Asia. The cold-tongue bias is apparent in several other CMIP3 models (see the CMIP3 ensemble mean in Fig. 16) and is related to the double ITCZ, a systematic error observed in climate models (e.g., Bollasina and Nigam 2009). The much-improved rainfall simulations for the region in Q0 compared with HadCM3 are demonstrated by the lower RMSE (from 3.6 to 1.6) and higher “r” values (from 0.72 to 0.90).
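For reference, the two skill measures quoted above, RMSE and the spatial correlation “r” between simulated and observed fields, can be computed as in the following minimal sketch. It assumes unweighted statistics over all grid boxes and uses invented values; the function name is hypothetical, and the observational dataset, regridding, and any area weighting behind the quoted figures are not reproduced.

```python
import numpy as np

def rmse_and_r(sim, obs):
    """Return (RMSE, Pearson r) between simulated and observed fields.

    sim, obs : arrays of identical shape on a common grid
    """
    sim = np.asarray(sim, dtype=float).ravel()
    obs = np.asarray(obs, dtype=float).ravel()
    # RMSE penalizes differences in amplitude and mean bias.
    rmse = float(np.sqrt(np.mean((sim - obs) ** 2)))
    # r measures agreement in spatial pattern only, ignoring bias.
    r = float(np.corrcoef(sim, obs)[0, 1])
    return rmse, r

# Hypothetical case: a perfect pattern with a uniform wet bias of 1.
demo_obs = np.array([[1.0, 2.0], [3.0, 4.0]])
demo_sim = demo_obs + 1.0
demo_rmse, demo_corr = rmse_and_r(demo_sim, demo_obs)
```

The example shows why both measures are reported: a uniform bias leaves r at 1 while RMSE is nonzero.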
While it is clear that flux-adjusting SST results in more realistic baseline precipitation simulations in the QUMP models compared with HadCM3, it is not clear whether this means that we can consider the future projections to be more realistic or more reliable. Studies such as that of Lin (2007) demonstrate that the dependence of precipitation on SSTs is too strong in climate models (due to an excessive sensitivity of surface air humidity to SST, and thus too strong a control of the atmosphere on evaporation); the air–ocean interactions in observations are significantly weaker. Bollasina and Nigam (2009) note this particularly in the Indian Ocean basin, where the relationship between Indian summer monsoon rainfall and SST shows a distorted representation of ocean–atmosphere interactions. While Bollasina and Nigam (2009) are concerned with summer rainfall rather than winter rainfall, the fact that SST–precipitation links in coupled climate models do not represent those observed implies that flux corrections to SSTs in the QUMP ensemble might not have as large a benefit to the realistic simulation of future rainfall as the above-mentioned analysis of HadCM3 versus Q0 implies.
5. Discussion and conclusions
The objective of this analysis is to select a subset from a GCM ensemble that represents a defendable range of plausible future outcomes for use in informing adaptation and policy decisions. Lateral boundary conditions from these selected models will be used to drive RCMs for the region to provide high-resolution climate scenarios for impact assessment and climate change analyses.
In this study we considered a 17-member perturbed physics GCM ensemble for use in developing high-resolution climate scenarios for Vietnam. Models were selected based on 1) reasonable representation of key processes in the region and 2) sampling the range of future outcomes in terms of both the magnitude and the characteristics of changes. The focus here was on the Asian summer monsoon and ranges of changes in seasonal precipitation and temperature because of the importance of these aspects of the climate of Vietnam for applications of the projections. We also questioned the extent to which the subset, or indeed the full 17-member ensemble, can be considered to be representative of a range of future scenarios, which is consistent with the current understanding, by comparing the QUMP projections with those from CMIP3. Examining fields from the 17 QUMP GCMs, we found the following:
Comparison of baseline simulations from the 17 GCMs with observations indicated that all of the members simulated the key characteristics of the climate of the region realistically. While all members systematically overestimate the magnitude of the Asian summer monsoon and associated rainfall, the position and timing of these features are realistic, and we therefore found no good reason to eliminate any of the ensemble members on grounds of validation. This approach, in which poorly performing models are excluded on the basis of specific key processes (in this case, the Asian summer monsoon), is consistent with recently published guidance on combining results from multiple models (Knutti et al. 2010; Knutti 2010; Overland et al. 2011).
A subset of the 17 members was selected based on capturing ranges of outcomes in terms of magnitude and characteristics of change, bearing in mind the characteristics of change over the whole Southeast Asia region as well as specifically for Vietnam. The model Q13 is selected, as it is the highest overall sensitivity model for the region, and Q3 is among the lowest. When considering changes in rainfall, the QUMP ensemble was found to project overall increases in monsoon intensity and monsoon rainfall under a warming climate. However, while average projections over SEA indicate overall increases in wet-season (JJAS) rainfall, Vietnam sits geographically on a north–south boundary between the regions of increased and decreased rainfall, such that slight differences in the position of this boundary in different members cause disconsensus in the overall direction of rainfall change for Vietnam. Q10 and Q11 were therefore selected to capture the disconsensus in projected rainfall over Vietnam: the model Q10 indicates the largest increases in wet-season rainfall and Q11 the largest decreases in wet-season rainfall. These four models also include examples of the most “typical” spatial patterns of change in Southeast Asia (Q13) and the least typical (Q10) when compared with the ensemble mean. Q0 is also included, as it is the standard or “unperturbed” version of the model. The models Q10 and Q16 were generally found to be distinct from other models in the ensemble, simulating different patterns of precipitation change and different characteristics in the Asian summer monsoon in future projections than the other members of the ensemble.
Comparison with the CMIP3 multimodel ensemble indicates that despite spanning a comparable range of global sensitivities (Collins et al. 2011), and a similar range of regional changes with respect to temperature, the patterns of precipitation change are considerably more similar to one another within the QUMP ensemble than in CMIP3. This causes large regions where the QUMP ensemble indicates consensus in terms of the direction of precipitation change, while CMIP3 shows a wider range of outcomes that span both increases and decreases. Given that the use of the range of outcomes solely from the QUMP PPE gives a narrower range of outcomes than the CMIP3 MME for some regions, we stress the importance of providing contextual information about the range of CMIP3 projections alongside downscaled QUMP projections. While users cannot currently downscale a wide range of GCMs from the CMIP3 ensemble, the need to downscale also from a multimodel ensemble will be addressed by the future availability of CMIP5 boundary data for dynamical downscaling.
A further outcome of the CMIP3–QUMP comparison is that there are regions where the two ensembles indicate opposing consensus in the direction of overall change in DJF precipitation amount. The differences in consensus reached by the two ensembles are most likely to arise due to differences in SSTs, which are flux corrected in the case of QUMP but not CMIP3. In the absence of any evidence that the consensus reached by either ensemble is more reliable than the other, significant interpretative issues with regard to the range of projections from both ensembles arise from this comparison.
As simulations from the new generation of GCMs become available via CMIP5 in the near future, an unprecedented effort to make boundary data from this large pool of models available for use with regional models will mean that regional modelers will face new challenges in experimental design. Methodologies for subselecting from a large pool of available boundary data will be required to ensure that meaningful interpretations of the range of outcomes from those models can be made. The methodology outlined here provides one such framework. Subsets of boundary data that are made based on “ad hoc” criteria (such as ease of access to the data) will cause considerable interpretative difficulties to those who wish to use the resulting projections for policy-relevant applications. We intend for this paper to stimulate discussion of the issues involved in this aspect of RCM experimental design, generating awareness among those engaged in generating regional climate change projections, and prompting further development of methodologies and guidelines.
Based on our experiences of subselecting GCMs for RCM experiments presented here, we make the following recommendations to those developing methodologies of their own:
Most importantly, we highlight the value of investing adequate time and resources to model selection. A well-considered selection of a small number of ensemble members saves considerable resources compared with downscaling a large ensemble; however, if this is conducted in a rushed or ad hoc fashion, then the value of the information in terms of the range of plausible future outcomes and thus their use in policy-relevant applications is limited.
Applying the guidelines of Knutti (2010) and Overland et al. (2011) to this application, we stress the recommendation that models should be excluded on grounds of poor validation for a given region only when they demonstrate a clear inability to simulate key processes realistically. The identification of key processes for a particular region is an important stage in the model selection process; it is these processes by which we can link performance in simulating the baseline climate to the reliability of future projections.
The selection of models should be made to sample the range of outcomes represented by the full ensemble in terms of multiple characteristics. Spanning the range of “characteristics” may include changes at different spatial scales (e.g., both SEA and VN), different seasons, and different variables. The relative importance of the different characteristics will differ in each case depending on the intended use of the projections. This prioritization of variables, seasons, or spatial scales is a subjective component of the selection process, but one which may have important implications for how “useful” the resulting projections will be.
Following the recommendations of Overland et al. (2011), aspects of the selection process will inevitably be subjective due to the wide range of criteria that can be applied. For this reason it is important for criteria to be as simple as possible, and well documented.
Interpretation of the range of outcomes from either a complete or sampled ensemble should be made carefully. The scenarios generated via this methodology can be considered to represent a defendable range of plausible climate outcomes, or a “snapshot” of current scientific understanding, but we cannot assume that they are exhaustive of all plausible outcomes. Comparisons between ensembles, such as the comparison made here between QUMP and CMIP3, can provide useful additional information to guide interpretation. We expect that further analyses comparing the regional characteristics and differences between the CMIP3 and CMIP5 ensembles would provide a useful resource in these types of studies if included in the IPCC Fifth Assessment Report.
Finally, our analysis here has emphasized the assessment of driving GCMs to generate downscaled datasets that are suitable for use in climate impacts studies. Some further assessment of the downscaled projections is recommended to ensure their suitability for these applications, including 1) an assessment of realism of the downscaled simulated climate compared with observations and 2) an assessment of the consistency of the downscaled projections with their driving GCM.
This research on model selection for Vietnam was performed as part of the Met Office contribution to the Capacity Building for Climate Change (CBCC) project (www.cbcc.org.vn), a UNDP-funded initiative in collaboration with Vietnam’s Institute for Meteorology, Hydrology and Environment (IMHEN). Richard Jones and Ben Booth gratefully acknowledge funding from the Joint Department of Energy and Climate Change (DECC) and the Department for Environment, Food and Rural Affairs (Defra) Met Office Hadley Centre Climate Programme.
We acknowledge the modeling groups, the Program for Climate Model Diagnosis and Intercomparison (PCMDI), and the WCRP’s Working Group on Coupled Modelling (WGCM) for their roles in making available the WCRP CMIP3 multimodel dataset. Support of this dataset is provided by the Office of Science, U.S. Department of Energy.