Africa lags the rest of the world in climate model development. This paper explores the potential for region-specific, process-based evaluation to promote progress in modeling and confidence assessments.
In recent decades, remarkable progress has been made in climate modeling (Gates et al. 1990; Gates et al. 1995; McAvaney et al. 2001; Randall et al. 2007), but with limited discernible improvement over Africa (Flato et al. 2013; Rowell 2013; Watterson et al. 2014). This is most frequently highlighted with reference to the Sahel, where many models fail to capture the magnitude of the 1970s–1980s drought (Biasutti 2013; Roehrig et al. 2013; Vellinga et al. 2016). Other African regions also present demanding tests for climate models. Organized convection (Jackson et al. 2009; Marsham et al. 2013; Birch and Parker 2014) and sharp gradients in temperature, soil moisture, and potential vorticity (Cook 1999; Thorncroft and Blackburn 1999) are problematic given large grid spacing. The presence of strong land–atmosphere interactions (Koster et al. 2004; Taylor et al. 2013), large aerosol emissions from arid regions (Engelstaedter et al. 2006; Allen et al. 2013), influences from global ocean basins (Folland 1986; Rowell 2013), and prominent modes of interannual and interdecadal rainfall variability (Giannini et al. 2008) exacerbates the challenge. Furthermore, some of these features and systems remain poorly understood, owing to limited access to readily available observations (Fig. 1) and limited research attention.

Fig. 1. Station data contributing to the Climatic Research Unit (CRU, version 2.23) precipitation (black) and Integrated Global Radiosonde Archive (IGRA) u wind at 1200 UTC (pink). (top) Coverage map for 1979–2013. Black squares indicate the location of 0.5° × 0.5° grid boxes that have at least one contributing station during at least 50% of the months; pink circles indicate stations with at least 10 records per month in at least 50% of the months. (bottom) Time series showing the number of stations contributing in each month (40°S–40°N, 30°W–60°E).
None of the current generation of general circulation models (GCMs) was built in Africa (Watterson et al. 2014), and the relevant processes operating there have not always been the first priority for model development. Now, there are growing efforts to bolster African climate science (Shongwe 2014), to run and evaluate regional and variable-resolution models over Africa (e.g., Endris et al. 2013; Engelbrecht et al. 2009, 2015; Gbobaniyi et al. 2014; Kalognomou et al. 2013), to develop the first global models in African research institutions (Engelbrecht et al. 2016), and to improve models from international modeling centers over Africa (Graham 2014; Senior et al. 2016; R. A. Stratton et al. 2017, unpublished manuscript). There is also a wealth of relevant expertise in African meteorological services and universities, including many scientists focused on observations or weather time scales who have the potential to contribute to climate model development, particularly through evaluation.
Model development most commonly progresses through hypothesis development and sensitivity testing, including running climate models on weather and seasonal time scales (e.g., Rodwell and Palmer 2007) and adding known missing physics. Dedicated field campaigns are also important in data-sparse regions, and for processes that are not well monitored, providing boundary conditions and observations for parameter development (e.g., Redelsperger et al. 2006; Washington et al. 2012; Stevens et al. 2016). Another strategy is top-down model evaluation and intercomparison. Based on the contention that model comparison will lead to improvement, there has been an impressive effort to make data from different modeling groups publicly available through the Coupled Model Intercomparison Project (CMIP; Meehl et al. 2000; Eyring et al. 2016a). As well as informing model development, model intercomparison is also seen as a route toward better understanding model information for decision-making.
Yet, so far the CMIP project has not resulted in improved performance for Africa (Flato et al. 2013; Rowell 2013; Whittleston et al. 2017), and it is still very difficult to draw conclusions about which of the models, if any, might generate more credible projections (e.g., Druyan 2011; Washington et al. 2013). The pace at which new experiments are generated can exceed the resources to analyze and understand them, particularly in Africa, where capacity remains limited. As part of phase 6 of CMIP (CMIP6), there is a drive to advance evaluation through the routine deployment of community-based analysis tools to document and benchmark model behavior (Eyring et al. 2016b). This will initially build on existing repositories of diagnostics (Bodas-Salcedo et al. 2011; Luo et al. 2012; Phillips et al. 2014; Eyring et al. 2015), but, recognizing that there are important processes that require innovation in evaluation tools, the relevant World Climate Research Programme working groups are encouraging experts to develop and contribute additional analysis codes, and are working on developing the infrastructure necessary to incorporate these tools (Eyring et al. 2016b). This represents an excellent opportunity to reconsider how models should best be evaluated, particularly for Africa.
Model evaluation might conceptually be divided into (i) analysis of physical processes and (ii) quantification of performance. On a global scale, important work has been done to investigate model representation of clouds and water vapor (e.g., Jiang et al. 2012; Klein et al. 2013), tropical circulation (e.g., Niznik and Lintner 2013; Oueslati and Bellon 2015), and modes of variability (e.g., Guilyardi et al. 2009; Kim et al. 2009). This process-oriented evaluation is fundamental to inform model development. On a regional scale, particularly for understudied regions in Africa, existing evaluation work is largely restricted to the quantification of models’ similarity to observations. The ability to reproduce the historical climatology is a fundamental “validation” check, and statistics have been developed that can impressively summarize comparisons of large multivariate datasets into single plots (Taylor 2001) and scalars (Watterson 1996). These “skill scores” have important applications, for tracking model development over time (Reichler and Kim 2008) and for comparing or ranking models (Gleckler et al. 2008; Schaller et al. 2011; Watterson et al. 2014). However, while performance evaluations can reveal symptoms of model problems, and comparison with observations has demonstrated some large biases over Africa (e.g., Roehrig et al. 2013), performance metrics are less informative for illuminating causes and potential fixes (Gleckler et al. 2008). Identifying metrics to rank models, or constrain future projections, is also very challenging for regions and processes that are poorly understood (Collins 2017; Knutti et al. 2017) and poorly observed (Fig. 1), and culling ensembles based on existing metrics for Africa fails to reduce the range of uncertainty in precipitation projections (Rowell et al. 2016).
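To illustrate the kind of summary statistics involved, the sketch below (Python with xarray; the file paths, variable layout, domain, and bilinear regridding step are assumptions for illustration, not the procedure of any particular study) computes the area-weighted pattern correlation, normalized standard deviation, and centered RMSE that underpin a Taylor diagram.

```python
# Minimal sketch of the statistics behind a Taylor diagram (Taylor 2001),
# assuming model and reference precipitation climatologies on lat/lon grids.
# File names are hypothetical placeholders.
import numpy as np
import xarray as xr

model = xr.open_dataarray("hadgem3_gc2_pr_clim.nc")   # hypothetical model climatology
obs = xr.open_dataarray("gpcp_pr_clim.nc")            # hypothetical reference climatology
obs = obs.interp_like(model)                          # put both fields on the model grid

# African domain (assumes ascending latitudes, longitudes in -180..180)
dom = dict(lat=slice(-40, 40), lon=slice(-30, 60))
m = model.sel(**dom)
o = obs.sel(**dom)
w = np.cos(np.deg2rad(m.lat))                         # cosine-latitude area weights

# Area-weighted anomalies from the domain means
m_anom = m - m.weighted(w).mean(("lat", "lon"))
o_anom = o - o.weighted(w).mean(("lat", "lon"))

std_m = float(np.sqrt((m_anom ** 2).weighted(w).mean(("lat", "lon"))))
std_o = float(np.sqrt((o_anom ** 2).weighted(w).mean(("lat", "lon"))))
corr = float((m_anom * o_anom).weighted(w).mean(("lat", "lon"))) / (std_m * std_o)

# Centered RMSE follows from the law-of-cosines relation used in the Taylor diagram
crmse = np.sqrt(std_m ** 2 + std_o ** 2 - 2.0 * std_m * std_o * corr)

print(f"pattern correlation      : {corr:.2f}")
print(f"normalized std deviation : {std_m / std_o:.2f}")
print(f"centered RMSE            : {crmse:.2f}")
```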
Here, we argue that the evaluation of climate models over Africa needs to move beyond scalar metrics, validation, and checks on performance toward investigating how models simulate processes on a regional scale. A better understanding of how the models behave is fundamental to help determine how to improve them, and it is also an important way to assess their adequacy for future projection (James et al. 2015; Rowell et al. 2015; Baumberger et al. 2017). Engagement with African experts is key to identify and analyze the processes that matter regionally. In this paper, we draw on expertise from across the continent to explore the potential for progress through process-based evaluation for Central, East, southern, and West Africa, as well as at a pan-African scale. For each region, we review existing model evaluation efforts, identify important processes, and present an example of process-based evaluation.
The analysis is applied to the Met Office Unified Model (MetUM), at the beginning of a four-year effort to improve its ability to simulate African climate [the Improving Model Processes for African Climate (IMPALA) project, part of the Future Climate for Africa program; www.futureclimateafrica.org]. The MetUM is a fitting example, since it is already subject to well-established evaluation procedures, and there is a good baseline understanding of the model’s performance (see the sidebar on “Baseline understanding of model performance over Africa”), yet important gaps exist in the analysis of the processes that matter for Africa. The model has been developed in the United Kingdom, and this paper illustrates how deliberate and explicit inclusion of a team of experts in Africa can advance region-specific evaluation.
BASELINE UNDERSTANDING OF MODEL PERFORMANCE OVER AFRICA
Existing assessment of HadGEM3-GC2 has highlighted some large-scale biases of potential relevance to Africa, providing a useful basis for the region-specific analysis. The Southern Ocean absorbs too much incoming solar radiation and has associated biases in sea surface temperatures (SSTs), winds, and precipitation (Williams et al. 2014). The intertropical convergence zone (ITCZ) is too far south, which may be linked to this Southern Hemisphere albedo error (Haywood et al. 2016), although targeted albedo corrections do not provide a simple fix (Hawcroft et al. 2016). In the Indian Ocean, convection is too strong, which has been connected with the long-standing dry bias in the Indian summer monsoon and may be linked to a similar dry bias in the West African monsoon (WAM).
Figure SB1 displays precipitation biases for HadGEM3-GC2. Similar plots are routinely output as part of a MetUM assessment, and they are typical of many evaluation packages used for other models. Figure SB1 therefore gives an indication of the kind of inferences typically available without further process-based assessment. A dry bias occurs in the Sahel during the WAM, and this region is also too dry in MAM and SON. The bias in SON extends to the west of Central Africa, and almost the entire Congo basin is drier than the Global Precipitation Climatology Project (GPCP) estimate during DJF, although there is large observational uncertainty for this region (Washington et al. 2013). Figure SB1 also shows that southern Africa is too wet for much of the year, as is common among CMIP models (Christensen et al. 2007). There are large biases in the Indian Ocean, and East Africa is too wet during the short rains season (October–December) and too dry during the long rains season (March–May), as has also been found in other CMIP5 models (Yang et al. 2014; Tierney et al. 2015).

Fig. SB1. HadGEM3-GC2 climatological precipitation biases (mm day−1) for Africa relative to GPCP (similar biases for other reference datasets are not shown), for four seasons: DJF, MAM, JJA, and SON.
We hope the examples presented here will provoke discussion about what other processes and diagnostics should be examined, and promote the development of a model evaluation “hub” for Africa. One successful example of this approach is the Working Group on Numerical Experimentation’s (WGNE) Madden–Julian oscillation (MJO) task force, which aims to facilitate improvements in the simulation of the MJO in weather and climate models (Wheeler et al. 2013). By collectively identifying priorities for evaluation, and sharing research insights about model behavior and analysis methods, a model evaluation hub on African climate could fast-track research and move toward identifying and developing diagnostics to be incorporated into the CMIP evaluation toolkit (Eyring et al. 2016b) and routinely applied across models, potentially delivering a step change in our understanding of climate models over Africa.
DATA.
The Met Office Unified Model.
The MetUM is the modeling system developed and used at the Met Office. It is continually updated and is “seamless”: a common trunk of model versions is used across different temporal and spatial scales (Brown et al. 2012). The model version assessed in this paper is the most recently published version of the coupled atmosphere–ocean global climate model: the Hadley Centre Global Environment Model, version 3, Coupled Model 2 (HadGEM3-GC2; Williams et al. 2015), at N216 resolution (approximately 90 km in the tropics). The analysis is based on a simulation run with historical natural and anthropogenic forcings, and unless otherwise stated, data are presented for the 35-yr period 1979–2013. Data from the atmosphere-only simulation, forced with observed sea surface temperatures (SSTs) for 1982–2008, were also analyzed, but the main focus here will be on the coupled run.
Observations.
Unfortunately, there is a dearth of readily accessible observational data for Africa. As a result, many of the commonly used archives have large gaps in space and time (e.g., Washington et al. 2006; Rowell 2013; Washington et al. 2013; Fig. 1), inhibiting both our understanding of historical climate and our evaluation of climate model simulations. Observational uncertainty cannot be eliminated, but it can be partly addressed through careful selection of datasets for specific applications; for example, precipitation might be better evaluated using mid- to late twentieth-century climatological estimates, for which gauge data (e.g., Nicholson 1986) are more readily available. It might also be advisable to compare multiple sources of data, including ground-based and satellite records, as well as proxies; for example, river flow data could be used as a proxy for rainfall (e.g., Todd and Washington 2004). Reanalysis data are another important resource when investigating climate where tropospheric circulation records are lacking or incomplete, although in data-sparse regions the output may be dominated by modeled processes.
In this paper we have prioritized just one reference dataset for each variable in order to obtain consistency across regions, drawing on previous analyses to select datasets deemed to be most reliable (e.g., Parker et al. 2011), and repeating the analysis with additional reference datasets in regions with high observational uncertainty. The datasets are summarized in Table 1.
Table 1. Observational, satellite, and reanalysis datasets used in this paper. HadISST = Hadley Centre Sea Ice and Sea Surface Temperature dataset; other acronyms defined in the text.


APPROACHES TO PROCESS-BASED EVALUATION.
Approaches to process-based evaluation are considered for four African regions—Central, East, southern, and West Africa—moving from the least to the most studied domain. First, though, we take a pan-African approach: perhaps the newest frontier for climate science in Africa. While many studies do consider Africa as a whole, there has been limited consideration of the processes that act across the continent and thus connect its distinct regional climates.
For each domain we review existing work to outline what is already known about climate model performance, identify important processes for evaluation, demonstrate an example of a process-based approach applied to HadGEM3-GC2, and discuss the lessons learned: for this particular model and for methods to evaluate other models.
Pan-African.
Climate models with large grid spacing have difficulty reproducing the exact spatial distribution of variables such as precipitation (Dai 2006; Levy et al. 2012), even if they show some skill in simulating thermodynamic responses (Allen and Ingram 2002; Shepherd 2014) and modes of variability (Guilyardi et al. 2009), and some consistency in large-scale circulation changes (Held and Soden 2006; Vecchi and Soden 2007). Location-based assessment such as the calculation of bias and root-mean-square error is therefore quite a rigid test of model ability. A larger-scale analysis might extract more meaningful information about this type of behavior and thus there is logic in beginning at the pan-African scale.
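To make the contrast concrete, a minimal sketch follows (Python with xarray; the dataset paths and the Sahel box are illustrative placeholders) comparing a gridpoint RMSE, which penalizes rainfall that is displaced within a region, with a regional-mean bias, which is insensitive to such displacement.

```python
# Sketch contrasting a gridpoint-by-gridpoint RMSE with a simple regional-mean
# bias, assuming model and observed seasonal-mean precipitation fields.
# Paths, variable layout, and the Sahel box are hypothetical.
import numpy as np
import xarray as xr

model = xr.open_dataarray("model_pr_jja_clim.nc")                 # hypothetical paths
obs = xr.open_dataarray("obs_pr_jja_clim.nc").interp_like(model)  # common grid

# Illustrative Sahel box (assumes ascending latitudes, longitudes in -180..180)
sahel = dict(lat=slice(10, 20), lon=slice(-20, 40))
m, o = model.sel(**sahel), obs.sel(**sahel)
w = np.cos(np.deg2rad(m.lat))                                     # area weights

# Gridpoint RMSE: penalizes rain that falls in the "wrong" place inside the box
rmse = float(np.sqrt(((m - o) ** 2).weighted(w).mean(("lat", "lon"))))

# Regional-mean bias: insensitive to displacement within the box
bias = float(m.weighted(w).mean(("lat", "lon")) - o.weighted(w).mean(("lat", "lon")))

print(f"gridpoint RMSE     : {rmse:.2f} mm/day")
print(f"regional-mean bias : {bias:+.2f} mm/day")
```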
Relevant pan-African features include the intertropical convergence zone (ITCZ), African easterly jet (AEJ), tropical easterly jet (TEJ), and certain teleconnections. While the ITCZ is not as coherent and uniform as theory might suggest (Nicholson 2009), the meridional migration of tropical convection is nevertheless an underpinning driver of African climate (Waliser and Gautier 1993). The AEJ, often noted as an important influence on West African climate (e.g., Nicholson 2009; Thorncroft et al. 2011), including its role in modulating African easterly waves (AEWs) and mesoscale convective systems during the West African monsoon (WAM; June–September) (Mekonnen et al. 2006; Leroux and Hall 2009), is also important for Central Africa and has a southern component during September–November (SON; Nicholson and Grist 2003; Jackson et al. 2009; Adebiyi and Zuidema 2016). The TEJ, which is well known to play an important role in African climate during boreal summer (Koteswaram 1958; Rowell 2001; Caminade et al. 2006), may also manifest south of the equator in January and February (Nicholson and Grist 2003). Teleconnections to global ocean basins also often affect more than one region (Giannini et al. 2008), the most well documented being the dipole between East and southern Africa during El Niño–Southern Oscillation (ENSO) events (Goddard and Graham 1999; Indeje et al. 2000).
In the case of HadGEM3-GC2, existing analysis has already shown which areas of the continent have too much or too little precipitation (Fig. SB1), but an analysis of pan-African processes might help explain and contextualize these precipitation biases and point toward strategies for improvement. One fundamental question is where and when the model is placing its peak ascent, and whether that ascent induces dry biases poleward of the locus of convection. Here, we analyze the seasonal cycle of vertical velocity, based on omega at 500 hPa (ω500), a well-established measure of the large-scale vertical motion (Bony et al. 2004; Oueslati and Bellon 2015), which has been used to diagnose tropical circulation in previous work (e.g., Schwendike et al. 2014) and can be compared between models and reanalysis.
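A minimal sketch of such a diagnostic is given below, assuming monthly-mean ω500 fields from the model and a reanalysis (file paths are hypothetical). It identifies the calendar month of strongest climatological ascent at each grid point and a model-minus-reanalysis difference for one illustrative month, broadly in the spirit of Fig. 2.

```python
# Sketch of a Fig. 2-style omega-500 diagnostic, assuming monthly-mean omega
# (Pa s-1) at 500 hPa from the model and a reanalysis. Paths are hypothetical.
import xarray as xr

w500_mod = xr.open_dataarray("hadgem3_gc2_omega500_monthly.nc")
w500_era = xr.open_dataarray("erai_omega500_monthly.nc")

# Monthly climatologies over the analysis period, computed on each native grid
clim_mod = w500_mod.sel(time=slice("1979", "2013")).groupby("time.month").mean("time")
clim_era = w500_era.sel(time=slice("1979", "2013")).groupby("time.month").mean("time")

# Interpolate the reanalysis climatology onto the model grid for comparison
clim_era = clim_era.interp(lat=clim_mod.lat, lon=clim_mod.lon)

# Ascent corresponds to negative omega, so the month of maximum uplift is the
# month with the most negative climatological value at each grid point
month_max_uplift_mod = clim_mod.idxmin("month")
month_max_uplift_era = clim_era.idxmin("month")

# Model-minus-reanalysis difference for one illustrative month (August);
# positive differences indicate reduced uplift (or stronger subsidence) in the model
omega_diff_aug = clim_mod.sel(month=8) - clim_era.sel(month=8)
```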
As would be expected, many of the regions that are too wet have more ascent than in the reanalysis, such as across southeast Africa in November. Regions that are too dry are associated with anomalous subsidence, most notably the Congo basin, which has a large ω500 bias relative to the European Centre for Medium-Range Weather Forecasts (ECMWF) interim reanalysis (ERA-I; Fig. 2c) and, to a lesser extent, relative to the National Centers for Environmental Prediction–Department of Energy (NCEP–DOE) second Atmospheric Model Intercomparison Project (AMIP-II) reanalysis [NCEP-2; not shown (Kanamitsu et al. 2002)]. In other regions the ω500 bias does not necessarily map exactly onto the precipitation bias, but it might nevertheless help to explain the results. For example, Fig. 2e reveals that during August, when the model has too little precipitation in the Sahel, there are downward anomalies across most of North Africa, and in parts of the northern Sahel, peak uplift occurs in boreal winter rather than in summer (Fig. 2b).

Fig. 2. Month of maximum uplift (ω at 500 hPa) in (a) ERA-I and (b) HadGEM3-GC2. (c)–(f) Differences in mean ω (Pa s−1) at 500 hPa between the model and ERA-I for selected illustrative months. Negative (positive) values of ω are associated with upward (downward) motion; therefore, red shading indicates reduced uplift in the model compared with the reanalysis and blue shows enhanced uplift.
As well as helping to explain precipitation biases, Fig. 2 also allows for inferences about linkages between regions. The model is broadly able to simulate the migration of tropical convection, with maximum uplift occurring in much of the Sahel zone during the boreal summer months and in southern Africa during the austral summer (Figs. 2a,b). Existing work has already suggested that the ITCZ is shifted too far south in the MetUM (Haywood et al. 2016), and many of the omega biases in Fig. 2 imply that this southward shift is manifest over Africa, with upward anomalies in southern Africa and downward anomalies in the Sahel during boreal summer and in the Congo basin during austral summer. The ω500 plots may also give an indication of how the model represents the zonal overturning circulation. The MetUM (like many other models) generates overly strong convection in the tropical Indian Ocean (Williams et al. 2014; Johnson et al. 2017), which is visible in Figs. 2c,d,f. In January and November, this is located to the east of a strong downward bias over the Congo basin and Atlantic Ocean, suggesting a possible modification of the Walker circulation over Africa. These results point to the benefits of considering these regions together: further work to explore drivers of the migration of tropical convection (e.g., the seasonal cycle of SSTs), together with further analysis of Indian Ocean biases and Walker circulation patterns, could generate inferences about multiple regions and seasons.
It would be useful to investigate how ω500 compares across CMIP models, and whether common precipitation biases (such as the wet bias in southern Africa) are associated with common biases in vertical velocity. The CMIP evaluation toolkit might therefore benefit from a measure of ω500 across Africa similar to Fig. 2. Given the limited previous work on pan-African evaluation, other diagnostics warrant further discussion and investigation, but potentially useful figures might include maps of moisture flux (following Suzuki 2011), latitude-by-month plots of the AEJ and ITCZ (following Nicholson and Grist 2003), and rainfall–SST correlations to assess teleconnections (following Rowell 2013).
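As one example of how such a diagnostic might be scripted, the sketch below (Python with xarray; paths, season, and index box are illustrative assumptions) correlates an area-mean Sahel rainfall index with SST at every grid point, in the general spirit of rainfall–SST teleconnection assessments such as Rowell (2013).

```python
# Sketch of a rainfall-SST teleconnection map, assuming monthly precipitation
# and SST fields from a single model run. Paths, the JJAS season, and the
# Sahel box are illustrative placeholders.
import xarray as xr

pr = xr.open_dataarray("model_pr_monthly.nc")     # hypothetical monthly precipitation
sst = xr.open_dataarray("model_sst_monthly.nc")   # hypothetical monthly SST

# JJAS means for each year (months chosen to span the WAM season)
jjas = [6, 7, 8, 9]
pr_jjas = pr.where(pr["time.month"].isin(jjas), drop=True).groupby("time.year").mean("time")
sst_jjas = sst.where(sst["time.month"].isin(jjas), drop=True).groupby("time.year").mean("time")

# Area-mean Sahel rainfall index (unweighted mean is adequate for this narrow band)
sahel_index = pr_jjas.sel(lat=slice(10, 20), lon=slice(-20, 40)).mean(("lat", "lon"))

# Correlate the index with SST at every grid point over the years; the resulting
# map can then be compared between the model and observation-based datasets
teleconnection_map = xr.corr(sahel_index, sst_jjas, dim="year")
```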
Central Africa.
Central Africa is here defined as western equatorial Africa, extending from the Atlantic coast to the Rift Valley and between 10°S and 10°N. The region is critically understudied, in part because of limited data availability (McCollum et al. 2000; Washington et al. 2013). Several studies have assessed climate model precipitation based on observations and reanalysis (Haensler et al. 2013a,b; Aloysius et al. 2016). However, given the lack of gauge data included in precipitation datasets for this region, particularly for recent decades (Fig. 1), and the large variation in precipitation estimates from satellite and reanalysis datasets (Washington et al. 2013; Creese and Washington 2016), validation of modeled precipitation is challenging. Process-based evaluation is beneficial in this context, because it allows for features and variables that are better observed or understood to become the focus.
As the third largest convective zone worldwide (Webster 1983), the Congo basin is dominated by convective processes, with peak rainfall in the transition seasons of March–May (MAM) and SON, governed by the migration of solar insolation, but with important intraseasonal variability and approximately 70% of precipitation delivered by mesoscale convective systems (Nicholson and Grist 2003; Jackson et al. 2009). Recent studies, mainly using reanalysis datasets, have identified prominent drivers of regional circulation (Suzuki 2011; Pokam et al. 2014; Neupane 2016) and the water cycle (McCollum et al. 2000; Pokam et al. 2012), demonstrating an important interaction with the Atlantic Ocean (Hirst and Hastenrath 1983a; Dezfuli et al. 2015), and possible remote drivers including Indo-Pacific SSTs (Hua et al. 2016). During the main rainy season, SON, low-level westerlies (LLWs) from the eastern equatorial Atlantic play a key role in moisture provision (Pokam et al. 2012).
Here, we analyze low-level moisture transport and circulation during SON. Moist circulation patterns from reanalysis, influenced by observations of large-scale winds, might be expected to be better constrained than spatially heterogeneous variables such as precipitation. Nevertheless, given observational uncertainty in this region, we use two reanalysis datasets: the NCEP–National Center for Atmospheric Research reanalysis (NCEP-1; Kalnay et al. 1996; NOAA 2011) and ERA-I (Table 1).
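The core calculation is straightforward; a minimal sketch follows, assuming SON-mean specific humidity and winds at 850 hPa on a regular latitude–longitude grid (file names are placeholders), with the moisture-flux divergence approximated by centered finite differences in spherical coordinates.

```python
# Sketch of an 850-hPa moisture-flux diagnostic, assuming SON-mean specific
# humidity (kg kg-1) and winds (m s-1) on a regular lat/lon grid. Paths are
# hypothetical; the finite-difference divergence is adequate for a qualitative
# comparison like Fig. 3.
import numpy as np
import xarray as xr

a_earth = 6.371e6   # Earth radius (m)

q = xr.open_dataarray("q850_son_clim.nc")
u = xr.open_dataarray("u850_son_clim.nc")
v = xr.open_dataarray("v850_son_clim.nc")

# Horizontal moisture flux components at 850 hPa
qu = q * u
qv = q * v

# Moisture flux divergence on the sphere:
#   div = [ d(qu)/dlon + d(qv*cos(lat))/dlat ] / (a * cos(lat)), with lon, lat in radians
coslat = np.cos(np.deg2rad(q["lat"]))
deg2rad = np.pi / 180.0
dqu_dlon = qu.differentiate("lon") / deg2rad            # per-degree -> per-radian
dqvcos_dlat = (qv * coslat).differentiate("lat") / deg2rad
moisture_flux_div = (dqu_dlon + dqvcos_dlat) / (a_earth * coslat)
```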
The basic structures of moisture transport are similar between the model and both reanalyses, although there are important distinctions, including between NCEP-1 and ERA-I (Fig. 3a). Differences are still evident if the datasets are replotted at the resolution of NCEP-1 (not shown); in particular, ERA-I shows more intense moisture divergence along the Atlantic coast south of 5°N than NCEP-1. This moisture divergence is even more pronounced in HadGEM3-GC2, implying an overestimation of the LLWs in the model. In HadGEM3-GC2 the strong low-level winds feed a cyclonic flow into the Angola low, potentially strengthening the southeastward transport of moisture. This might help explain the dry bias at the Atlantic coast and the wet bias in eastern and southern Africa (Fig. SB1), associated with a southeastward shift in ω500 relative to the reanalysis (Fig. 2f). Analysis of circulation in other seasons (not shown) suggests that, while the LLWs are a seasonal feature in reanalysis, the model produces them throughout the year, and enhanced northwesterly flow may also help explain the dry bias and downward anomalies in the Congo basin during December–February (DJF; see Fig. SB1 and Fig. 2c).
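For reference, the coarse-graining check mentioned above might be scripted as follows (a sketch only; paths are hypothetical, and bilinear interpolation via xarray is used here, although conservative regridding may be preferable for flux quantities).

```python
# Sketch of the coarse-graining check: interpolate the finer-resolution fields
# onto the coarser NCEP-1 grid before differencing, so that resolution alone
# does not account for the contrasts between datasets. Paths are hypothetical.
import xarray as xr

flux_erai = xr.open_dataarray("erai_qflux850_son.nc")
flux_ncep1 = xr.open_dataarray("ncep1_qflux850_son.nc")
flux_model = xr.open_dataarray("hadgem3_qflux850_son.nc")

flux_erai_coarse = flux_erai.interp_like(flux_ncep1)     # bilinear regrid to NCEP-1
flux_model_coarse = flux_model.interp_like(flux_ncep1)

diff_era_minus_ncep = flux_erai_coarse - flux_ncep1
diff_model_minus_ncep = flux_model_coarse - flux_ncep1
```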

Fig. 3. SON climatologies in ERA-I, NCEP-1, and HadGEM3-GC2 of (a) moisture flux at 850 hPa with contours of moisture flux divergence and (b) divergent and rotational flow at 850 hPa with contours of zonal wind speed.