The Weather Research and Forecasting (WRF) model is tested over South America in different configurations to identify the one that gives the best estimates of observed surface variables.
Systematic, nonsystematic, and total errors are computed for 48-h forecasts initialized with the NCEP Global Data Assimilation System (GDAS). There is no unique model design that best fits all variables over the whole domain, and nonsystematic errors for all configurations differ little from one another; such differences are in most cases smaller than the observed day-to-day variability. An ensemble mean consisting of runs with different parameterizations gives the best skill for the whole domain.
Surface variables are highly sensitive to the choice of land surface model. Surface temperature is well represented by the Noah land surface model, but dewpoint temperature is best estimated by the simplest land surface model considered here, which specifies soil moisture based on climatology. This underlines the need for a better understanding of moist processes at the subgrid scale.
Surface wind errors weaken the low-level jet, reducing heat and moisture advection over southeast South America (SESA), and they are accompanied by negative precipitation biases over SESA and positive biases over the South Atlantic convergence zone (SACZ). This pattern of errors suggests the following feedbacks among wind errors, precipitation, and surface processes: an increase of precipitation over the SACZ produces compensating descent over SESA, leading to more stable stratification and less rain; less rain in turn reduces soil moisture, which further decreases rainfall. This is a clear example of how local errors are related to the regional circulation, and it suggests that improving model performance requires not only better subgrid-scale parameterizations but also a better representation of regional circulations.
The Weather Research and Forecasting (WRF) model offers different options to be chosen according to the region, scale, and application of interest. Previous studies have emphasized the impact of different parameterizations on precipitation estimates, from short-range forecasts of interest in numerical weather prediction to climate simulations, and therefore commonly focus on convection parameterization and/or the representation of model microphysics (e.g., Jankov et al. 2005; Ries and Schlünzen 2009; Seluchi and Chou 2000). Model sensitivity to the representation of surface processes and to their initialization has also been explored in recent studies, which also analyze variables other than rainfall (e.g., Cheng and Steenburgh 2005; Guevard et al. 2006; Aligo et al. 2007; Case et al. 2008). The general conclusion that may be drawn from these studies is that the optimal model configuration and performance are highly dependent on the specific application, including geographical area and time of the year.
The traditional approach of selecting the combination of parameterizations that gives the best representation of atmospheric processes has recently been replaced by techniques that favor the best representation of uncertainty. To this end, ensembles with members from different models or with different parameterizations (e.g., Krishnamurti et al. 1999) have been used, since there appears to be no unique model that consistently gives the best results, owing not only to the chaotic nature of the atmosphere but also to limitations in model design and in the specification of initial conditions. Improvements in model prediction will ultimately depend on identifying members with poor performance and the errors in model design that lead to such lack of forecast skill. This strategy, applied to short-term model forecasts, is also useful for long-term forecasts within the premise of “seamless prediction” advanced by Shukla et al. (2009) and others.
This paper focuses on the sensitivity of short-term forecasts (6–48 h) over South America. The region exhibits strong gradients of surface conditions and extends from the tropics into the midlatitudes, which makes it particularly challenging to identify the best parameterizations over the whole domain. WRF forecasts are evaluated against surface observations that are independent of the model integrations and are available at several times of day, which allows us to study the model representation of the diurnal cycle. The period chosen (December 2002–February 2003) provides a dense observation network over central South America, obtained during the South American Low-Level Jet Experiment (SALLJEX; Vera et al. 2006). Different land surface models, as well as convective and boundary layer parameterizations available for the WRF model, are tested. The aim is not only to identify the parameterizations that lead to better model performance, which is of interest for operational applications, but also to detect sources of forecast deficiencies.
The paper is organized as follows: section 2 describes model experiments, observations used for evaluation of model results, and diagnostic techniques; section 3 discusses both systematic and nonsystematic errors for surface variables; and section 4 further discusses the results and offers conclusions.
2. Model experiments and methodology
a. Model experiments
All runs are conducted with the Advanced Research WRF (ARW-WRF) version 2.0 (Skamarock et al. 2005). (This model is freely available online at http://www.mmm.ucar.edu/wrf/users/ and can be executed on a variety of platforms.) The selected version is used operationally at the Center for Atmospheric and Oceanic Research (CIMA) at the University of Buenos Aires (UBA; Saulo et al. 2008).
The WRF model offers different options for the representation of convective processes, turbulent transports, evolution of surface temperature and soil moisture, and soil–air interaction. Radiative processes and specification of microphysics were left unchanged for all runs in the current investigation.
The following parameterizations were used.
1) Microphysics
Eta–Ferrier (Ferrier et al. 2002): This scheme is formulated for grid scales that cannot explicitly resolve clouds, and it is computationally efficient. Total condensed water is advected between grid points, with proportions specified for different species such as rain, hail, cloud water and ice, snow, and graupel. This scheme was used in all runs.
2) Planetary boundary layer
The Yonsei University (YSU) scheme: a nonlocal scheme that includes countergradient flux terms, which enable realistic development of a well-mixed layer (Hong and Pan 1996).
The Mellor–Yamada–Janjic (MYJ) scheme: a local implementation of the Mellor–Yamada level-2.5 scheme (Janjic 2002).
3) Convection
The Grell–Devenyi (GRELL) scheme: includes convective effects from ensembles generated with different closure assumptions (Grell and Devenyi 2002).
The Kain–Fritsch (KF) scheme: based on a simplified cloud model that also includes shallow convection (Kain 2004).
The Betts–Miller–Janjic (BMJ) scheme: deep convection is treated as in other adjustment schemes, except that it uses a thermodynamic profile that results from mixing the convectively unstable layer. This scheme has been used extensively in weather forecasts at the National Centers for Environmental Prediction (NCEP) and has been improved over the years (Betts 1986; Janjic 1994).
4) Soil models
Noah: a four-layer model that forecasts soil moisture and temperature. It includes a time-varying green vegetation fraction, soil type, and snow cover with up to two vertical layers (Chen and Dudhia 2001).
The Rapid Update Cycle (RUC) land surface model: a six-level soil model that calculates soil fluxes on the basis of time-dependent solutions for temperature and moisture in the soil. It includes the effect of evapotranspiration from vegetation and complex canopies (Smirnova et al. 1997).
The five-layer soil model (5L): a simplified soil model that predicts soil temperature at five levels. Soil moisture remains constant throughout the model integration and is determined from tables with different values according to the season and soil type at each grid point.
The model domain is shown in Fig. 1 and it includes most of South America in a Lambert projection. It is run at 40-km horizontal resolution and 30 vertical levels. Two 48-h forecast cycles, initialized at 0000 and 1200 UTC with the NCEP Global Data Assimilation System (GDAS) operational analysis, are run from 15 December 2002 to 15 February 2003. This period coincides with the SALLJEX intensive observing period and includes special upper-level radiosonde observations up to 4 times a day and an enhanced rainfall network. Boundary conditions are updated every 6 h with values obtained from the National Oceanic and Atmospheric Administration (NOAA) Medium-Range Forecast (MRF) global forecasts run with an approximate horizontal grid of 2.5° and 28 vertical levels.
Numerical experiments were designed to test the sensitivity of the results to different parameterizations and soil conditions, as summarized in Table 1. There is ample local experience with this model at CIMA/UBA, since it is run operationally at this institution (Saulo et al. 2008).
Forecasts were evaluated 8 times a day with SALLJEX data and available surface data from the Global Telecommunication System (GTS) network, which has approximately 310 surface stations over South America (see Fig. 1 for station locations). Surface data were assigned to the closest model grid point. Data from grid points classified as over water, with insufficient observations, and/or with values exceeding 2.5 standard deviations from their mean were discarded. Twenty-four-hour accumulated precipitation from surface stations was obtained from the GTS and the Brazilian Water Agency (kindly provided by D. Allured 2007, personal communication), and enhanced with data from the SALLJEX database (Penalba et al. 2004). These values were box averaged over the model grid for comparison with model results.
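The station-to-grid matching and outlier screening described above can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names are hypothetical, distances are great-circle distances on a sphere of radius 6371 km, and the 2.5-standard-deviation threshold is applied to each station's own series using the population standard deviation. The land/water mask and the minimum-observation check are not shown.

```python
import math
from statistics import mean, pstdev


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance (km) between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def nearest_grid_point(lat, lon, grid_lats, grid_lons):
    """Index of the model grid point closest to a station (hypothetical helper)."""
    return min(
        range(len(grid_lats)),
        key=lambda i: haversine_km(lat, lon, grid_lats[i], grid_lons[i]),
    )


def screen_outliers(values, k=2.5):
    """Keep only values within k (population) standard deviations of the mean."""
    if len(values) < 2:
        return list(values)
    m, s = mean(values), pstdev(values)
    if s == 0:
        return list(values)
    return [v for v in values if abs(v - m) <= k * s]


# A station near Buenos Aires and two candidate grid points.
idx = nearest_grid_point(-34.6, -58.4, [-34.5, -31.0], [-58.5, -64.2])
print(idx)  # 0
```

A brute-force search over all grid points is adequate for a few hundred stations; a spatial index would only matter for much denser networks.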
b. Evaluation methods
Forecast skill for 2-m temperature (T2m), dewpoint temperature (Td), 10-m zonal (U) and meridional (V) wind components, and sea level pressure (SLP) was assessed through several indices. These indices were obtained for each surface station, time of day, and forecast time. For 0000 and 1200 UTC (corresponding to 2100 and 0900 local time, respectively) the forecast times are 0, 12, 24, 36, and 48 h, while for 0600 and 1800 UTC they are 6, 18, 30, and 42 h.
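Measures of skill are

$$\mathrm{Bias} = \frac{1}{N}\sum_{i=1}^{N}\left(p_i - o_i\right), \qquad (1)$$

$$\mathrm{RMSE} = \left[\frac{1}{N}\sum_{i=1}^{N}\left(p_i - o_i\right)^2\right]^{1/2}, \qquad (2a)$$

where p_i and o_i represent forecast and observed values at a given time and station, and the sum is over the N days of the experiments. Bias indicates systematic errors, while total errors are given by the root-mean-squared error [RMSE; see (2a)], and nonsystematic errors are obtained for each grid point by subtracting the systematic component, or bias [RMSEdb; see (2b)]:

$$\mathrm{RMSE}_{\mathrm{db}} = \left\{\frac{1}{N}\sum_{i=1}^{N}\left[\left(p_i - o_i\right) - \mathrm{Bias}\right]^2\right\}^{1/2}. \qquad (2b)$$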
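The RMSE describes forecast errors in the units of the variable, but it does not indicate how well the forecast reproduces the observed variability. This is assessed with the mean-squared error skill score (MSESS), similar to that of Wilks (2006). The MSESS used here is defined in (3), and it compares the mean-squared error (MSE) of the forecast with that of the climatology (MSEref):

$$\mathrm{MSESS} = 1 - \frac{\mathrm{MSE}}{\mathrm{MSE}_{\mathrm{ref}}}, \qquad (3)$$

where MSE is the mean-squared error of the forecast and MSEref is that of the reference forecast; positive values indicate an improvement over the reference.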
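The measures of skill defined above (bias, RMSE, RMSEdb, and MSESS) translate directly into code. The following is a minimal, dependency-free sketch with hypothetical function names, for one station, one time of day, and one forecast lead time: `p` and `o` hold the N daily forecast and observed values, and `ref` holds the reference forecast (for example, the climatological mean) used in the MSESS.

```python
import math


def mse(p, o):
    """Mean-squared difference between forecasts p and observations o."""
    return sum((pi - oi) ** 2 for pi, oi in zip(p, o)) / len(p)


def bias(p, o):
    """Systematic error: mean of forecast minus observation."""
    return sum(pi - oi for pi, oi in zip(p, o)) / len(p)


def rmse(p, o):
    """Total error: root-mean-squared difference."""
    return math.sqrt(mse(p, o))


def rmse_debiased(p, o):
    """Nonsystematic error: RMSE after removing the bias."""
    b = bias(p, o)
    return math.sqrt(sum((pi - oi - b) ** 2 for pi, oi in zip(p, o)) / len(p))


def msess(p, o, ref):
    """Skill score: 1 - MSE(forecast) / MSE(reference forecast)."""
    return 1.0 - mse(p, o) / mse(ref, o)


p, o = [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]
# Total error splits into systematic and nonsystematic parts.
print(rmse(p, o) ** 2, bias(p, o) ** 2 + rmse_debiased(p, o) ** 2)
```

The identity RMSE² = Bias² + RMSEdb² holds exactly (expand the square in the debiased sum), which is why subtracting the bias isolates the nonsystematic part of the error.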
a. Spatial variability of systematic errors
Systematic errors (biases) are presented next for several variables, forecast times, and times of day, averaged over all experiments (i.e., for the ensemble mean). Values at each grid point correspond to the bias average, weighted by the number of observations at each station, within a 1.5° radius of that point. Figure 2 shows the spatial variability of T2m biases for two forecast times (first two columns) and four times of day (rows). The figure shows a dependence of biases on time of day and forecast length, as found in previous studies over other regions (e.g., Cheng and Steenburgh 2005). There is a dominant cold bias (blue shading) over most of the domain, including the northeast region, which reverts to a warm bias (red shading). This is the only systematic error that decreases with integration time. Cold biases over and near the Andes Mountains are likely due to differences between model topography and the actual height of surface stations. Over central-eastern Argentina there is a weak warm bias that is found at all times of day and integration times. The bias drift shows warm biases over the northern boundary and northwest Argentina, with differences between the values at 0600 and 1800 UTC and those at 0000 and 1200 UTC possibly due to the shorter integration time (42 h) in the first case.
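The mapping of station biases onto grid points can be sketched as below. This assumes (our simplification, not stated in the text) that the 1.5° radius is a Euclidean distance measured in latitude and longitude degrees; each station contributes its bias weighted by its number of observations, and grid points with no station in range are left undefined. The function name is hypothetical.

```python
import math


def weighted_bias_at(lat, lon, stations, radius_deg=1.5):
    """Observation-count-weighted mean bias of stations within radius_deg.

    stations: iterable of (lat, lon, bias, n_obs) tuples.
    Returns None when no station lies within the radius.
    """
    num = 0.0
    den = 0
    for s_lat, s_lon, s_bias, n_obs in stations:
        # Simplified distance in degrees (ignores the cos(latitude) factor).
        if math.hypot(s_lat - lat, s_lon - lon) <= radius_deg:
            num += n_obs * s_bias
            den += n_obs
    return num / den if den > 0 else None


stations = [(-30.0, -60.0, 1.0, 10), (-31.0, -60.0, 3.0, 30), (-25.0, -55.0, 5.0, 10)]
print(weighted_bias_at(-30.0, -60.0, stations))  # 2.5
```

Weighting by the number of observations keeps stations with short or gappy records from dominating the local average.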
Figure 3 shows biases for other variables at 1800 UTC only, the time with the largest systematic errors. Dewpoint depressions show a dry bias from the initial time (not shown), which increases with forecast time. Humid biases over Patagonia, southern Brazil, and Paraguay decrease as the forecast progresses, which is also explained by the overall dry drift shown in the third panel of the first row. Zonal wind biases appear to drift toward stronger westerlies over high topography and in regions of maximum westerly winds, spilling over the adjacent ocean. The zonal and meridional wind components and SLP exhibit consistent biases, with positive SLP biases in central-western Argentina and negative values to the east, which produce a cyclonic anomaly in this region. This is conducive to weakened northerlies over western and northern Argentina, clearly shown by the bias drift, and is consistent with an underestimation of the northwestern Argentinean low (NAL) as documented by Saulo et al. (2010). It is also confirmed by the diurnal evolution of biases shown in Fig. 4, which are largest during warm hours. The cyclonic anomaly over southern Brazil may be related to increased latent heat release due to excessive rainfall, as will be shown later.
b. Diurnal cycle of systematic errors
Figure 4 shows the mean daily evolution as a function of integration time for the run cycle that starts at 1200 UTC (0900 local time). Only results for this cycle are shown, since there are no substantial differences from runs started at a different time of day. Values are averages over the two boxes shown in Fig. 1, where regions 1 and 2 correspond to the southern and northern boxes, respectively. Different variables exhibit markedly different mean errors and ensemble spread in both regions. T2m shows a damped diurnal cycle, partly due to a lack of model skill in simulating minimum temperatures, though the cold drift partly corrects this error as the integration progresses. The spread is largest during warm hours and also shows a diurnal cycle. Td exhibits large sensitivity to model configuration, with the ensemble mean close to observations. The figure also shows the dry bias for region 1 discussed in section 3a. Both wind components are too weak, with small differences among ensemble members and no evidence that any one member gives better estimates than the mean. The model tends to produce a larger diurnal cycle for both components, with an increase of the southerly component in region 1 during warm hours and an increase of the northerly component in region 2. The latter is most evident between 0000 and 1200 UTC. The model SLP does not capture the minimum at 2100 UTC in region 1, and it tends to underestimate pressure in region 2.
Next, we discuss the effect of different parameterizations on estimates of surface variables, as listed in Table 1, grouped by PBL scheme (configurations 1 and 7 for YSU and MYJ, respectively), soil model (1, 8, and 9 for Noah, 5L, and RUC, respectively), and convection treatment (7, 2, and 5 for KF, BMJ, and GRELL, respectively). Figure 5 shows results for region 1 only, since region 2 adds no further information in this regard. T2m (top panels, Fig. 5) is more sensitive to surface processes and convective parameterization. YSU and MYJ damp the diurnal cycle, with warmer temperatures during cold hours, as already shown in Fig. 4. The Noah soil model shows the smallest biases and the 5L the largest warm biases, while the RUC experiment exhibits a more erratic diurnal cycle. All tested convective parameterizations produce similar results. Noteworthy is the tendency for the GRELL configuration to produce a cold bias that increases with forecast time.
The largest differences in Td are found among soil models, with the RUC model giving the best performance, though its skill degrades with forecast time. In contrast, the 5L exhibits a strong humid bias that pulls the ensemble mean closer to observations, while all other configurations maintain dry biases starting at about −1°C. Different parameterizations do not seem to explain the systematic errors in wind components and SLP shown in Fig. 4 (not shown).
c. Diurnal cycle of nonsystematic errors
The MSESS is analyzed to determine the potential of different configurations to describe interdiurnal variability independently of systematic errors. The systematic error is removed separately for each time of day, and thus the MSESS does not include errors due to an inadequate representation of the diurnal cycle. This score was computed for all variables, but only those most pertinent to the present discussion are shown here.
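Removing the systematic error separately for each time of day can be sketched as follows. The helper name is hypothetical, and records are assumed to be (UTC hour, forecast, observation) triples for one station and one lead time. After this step, the remaining error reflects day-to-day variability rather than a misrepresented mean diurnal cycle.

```python
from collections import defaultdict


def debias_by_hour(records):
    """Subtract the mean error for each UTC hour from the forecasts.

    records: list of (hour_utc, forecast, observed) tuples.
    Returns (debiased_records, hourly_bias).
    """
    error_sum = defaultdict(float)
    count = defaultdict(int)
    for hour, p, o in records:
        error_sum[hour] += p - o
        count[hour] += 1
    hourly_bias = {h: error_sum[h] / count[h] for h in error_sum}
    debiased = [(h, p - hourly_bias[h], o) for h, p, o in records]
    return debiased, hourly_bias


records = [(0, 12.0, 10.0), (0, 14.0, 12.0), (12, 20.0, 21.0), (12, 22.0, 23.0)]
debiased, hourly_bias = debias_by_hour(records)
print(hourly_bias)  # {0: 2.0, 12: -1.0}
```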
Figure 6 shows the MSESS with persistence as the reference, where persistence is defined as the value of the variable at the same time on the previous day. All configurations show better skill than persistence in region 1 throughout the 48-h forecasts, while in region 2 persistence is comparable to the deterministic forecast, especially during cold hours. The latter is expected, since interdiurnal variability is low in tropical regions and persistence may therefore give good forecasts. The scores are lower for Td, which shows larger sensitivity to the specific configuration, as previously shown. Therefore, some ensemble members may perform worse than persistence. For region 2, the score is mostly negative and persistence is indistinguishable from the ensemble members.
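With persistence as the reference, the MSESS compares the forecast with the observation at the same time on the previous day. A minimal sketch (hypothetical function; inputs are consecutive daily values at a fixed time of day, and the first day is dropped because it has no persistence forecast):

```python
def msess_vs_persistence(forecast, observed):
    """MSESS of a forecast relative to 24-h persistence.

    forecast[i] and observed[i] are values on consecutive days at the same
    time of day. Positive values mean the forecast beats persistence.
    """
    if len(observed) < 2 or len(forecast) != len(observed):
        raise ValueError("need two or more days of matching forecasts and observations")
    days = range(1, len(observed))
    mse_fcst = sum((forecast[i] - observed[i]) ** 2 for i in days) / len(days)
    mse_pers = sum((observed[i - 1] - observed[i]) ** 2 for i in days) / len(days)
    return 1.0 - mse_fcst / mse_pers


score = msess_vs_persistence([10.0, 11.0, 11.0, 14.0], [10.0, 12.0, 11.0, 15.0])
print(round(score, 3))  # 0.905
```

To mirror the procedure in the text, the forecasts passed in would first have their time-of-day bias removed; the sketch itself scores whatever values it receives.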
The 5L model (not shown) performs worst in estimating the interdiurnal variability of temperature, though its systematic errors are not large. Nevertheless, this is the configuration that best explains the interdiurnal Td variability, in spite of its simplicity and its use of a constant soil moisture given by climatology as a function of land use. This presents a dilemma in configuration selection, owing to the opposite effects that this configuration has on T2m and Td.
Ensemble means give the best skill scores in the analysis of both systematic and nonsystematic errors, indicating the advantage of using ensembles of different model configurations to reduce both types of errors. Scores for wind components in region 1 are shown in Fig. 7 as an example of variables with a large proportion of negative values, due in part to a marked climatological diurnal cycle with minimum values at night (not shown). Similar indices for SLP are not shown, but are consistent with those for the wind components.
d. Total error
As shown in sections 3a–c, a single best configuration of parameterizations over South America might not be attainable, since the best estimates for different variables and regions are obtained with different model configurations. Next, the focus is on the RMSE, which includes both systematic and nonsystematic errors. This measure of total error might be more relevant for evaluating model skill and the model's ability to simulate atmospheric physics. Furthermore, the chaotic nature of the atmosphere suggests that analyzing only one type of error (such as biases) is not sufficient to rate model forecasts, since errors in one variable may propagate to others and quickly degrade forecasts.
Values of RMSE are computed for each station and averaged for different forecast times. The best configuration would be the one that gives the lowest RMSE for all surface variables; unfortunately, no configuration met this criterion. Figure 8 summarizes the results for the two boxes previously identified and compares them with the values given by climatology, calculated as the root-mean-squared difference between each observation and the corresponding mean for the period under analysis.
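The climatological reference and the search for a single best configuration can be sketched as follows. The reference is the root-mean-squared difference between each observation and the period mean (that is, the population standard deviation of the observations). The helpers `best_by_variable` and `single_best` are hypothetical: the first picks, for each variable, the configuration with the lowest RMSE, and the second reports a configuration only if it wins every variable.

```python
import math


def climatology_rmse(observed):
    """RMS difference between each observation and the period mean."""
    m = sum(observed) / len(observed)
    return math.sqrt(sum((o - m) ** 2 for o in observed) / len(observed))


def best_by_variable(rmse_table):
    """Map each variable to the configuration with the lowest RMSE.

    rmse_table: {variable: {configuration: rmse}}
    """
    return {var: min(scores, key=scores.get) for var, scores in rmse_table.items()}


def single_best(rmse_table):
    """Return the configuration best for every variable, or None if there is none."""
    winners = set(best_by_variable(rmse_table).values())
    return winners.pop() if len(winners) == 1 else None


# Illustrative numbers only (not results from this study).
table = {"T2m": {"1": 2.0, "2": 1.5}, "U": {"1": 1.8, "2": 2.1}}
print(best_by_variable(table), single_best(table))
```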
Configuration 2 (MYJ, Noah, and BMJ) gives the best results for temperature in both regions. Although this is not the case for Td, this configuration gives acceptable results for this variable too. Nevertheless, wind and SLP are not adequately represented by MYJ, which shows the largest errors for these variables (Fig. 8), and configurations that combine YSU with Noah give the best wind estimates. The combination of Noah, YSU, and BMJ is the best, or close to it, in three of the cases. A model design that includes YSU, Noah, and KF, used in CIMA/UBA forecasts, also gives good estimates of low-level winds over South America. This configuration also gave good estimates of T2m in both regions and of Td in the northern region, while in the southern region it exhibits substantial errors owing to its tendency to generate drier-than-observed conditions.
For all cases considered, the ensemble mean performed as well as the best member of the ensemble. The ensemble mean does not share the limitation of individual members, whose skill varies among variables and regions.
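A short sketch of an equal-weight ensemble mean and its verification (hypothetical helper names; the member series are invented for illustration). Because the RMSE is a norm of the error vector, the triangle inequality guarantees that the RMSE of the ensemble mean never exceeds the average RMSE of the members. It can beat the best member, as in this toy case, but that is not guaranteed in general.

```python
import math


def ensemble_mean(members):
    """Equal-weight mean of member forecasts (list of equal-length series)."""
    return [sum(values) / len(values) for values in zip(*members)]


def rmse(p, o):
    """Root-mean-squared difference between forecasts p and observations o."""
    return math.sqrt(sum((pi - oi) ** 2 for pi, oi in zip(p, o)) / len(p))


observed = [2.0, 4.0, 6.0]
members = [[1.0, 5.0, 6.0], [3.0, 3.0, 7.0], [2.5, 4.5, 5.0]]
mean_fcst = ensemble_mean(members)
member_rmse = [rmse(m, observed) for m in members]
print(rmse(mean_fcst, observed), sum(member_rmse) / len(member_rmse))
```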
It should be noted that in many of the experiments (particularly over region 2) errors are similar to or larger than the daily variability (shown in the last column of each panel in Fig. 8), which limits the usefulness of the forecasts. Differences between ensemble members are not significant and are smaller than the climatological variability.
4. Discussion and conclusions
The performance of several configurations of the ARW-WRF model was evaluated for surface variables during a summer season over South America. The main goal was to select the best design for operational weather prediction and climate downscaling over the region. None of the tested configurations emerged as a clear choice when several variables were considered simultaneously. This is partly because different variables are influenced by different physical processes, and these processes depend on location. Much remains to be learned about the nature of subgrid-scale processes and their parameterization. Surface variables show the largest sensitivity to land surface models, as determined by systematic and nonsystematic errors, and this is more pronounced for temperature and dewpoint than for wind components. Overall, the Noah model gives the best estimates, though the fact that Td is better represented by the simplest land surface model (5L), with a constant soil moisture given by climatology, indicates problems with the initialization of soil moisture and its evolution in land surface models. This critical issue of the initialization of surface variables has been pointed out in previous studies (e.g., Cheng and Steenburgh 2005; Case et al. 2008). The Global Land Data Assimilation System (GLDAS; Rodell et al. 2004) provides analyses of soil moisture and is a step toward solving this problem.
It is also shown that all surface variables exhibit initial errors over the selected domain, with cold and dry biases both for the 0000 and 1200 UTC cycles. None of the tested configurations decreases the biases from the GDAS analyses used as initial conditions.
The diurnal cycle is also damped, owing to higher minimum temperatures (region 1) and lower maximum temperatures (region 2) than observed, which depends not only on the configuration selected but also on the initial surface conditions. The mean vertical profiles at two radiosonde stations (SGO, Santiago del Estero; and SIS, Resistencia; see Fig. 1) were analyzed to identify the effect of PBL modeling on surface temperatures. Soundings were available 4 times a day at these locations during SALLJEX. Figure 9 shows the 30-h forecasts initialized at 1200 UTC for configurations 1 (YSU) and 7 (MYJ), which differ only in their boundary layer treatment, compared with the observed sounding at 1800 UTC. Both parameterizations give colder and moister conditions at the surface, with the largest differences at SGO, but the temperature, humidity, and depth of the boundary layer are best represented by YSU. The underestimation of PBL height by MYJ extends over most of the region, as seen in Fig. 10. The approximate depth of the PBL is obtained as the level at which the vertical gradient of potential temperature exceeds 2°C km⁻¹.
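The PBL-depth diagnostic described above can be sketched as follows. Assumptions not spelled out in the text: the profile is scanned upward from the lowest level, the gradient is a finite difference between adjacent levels, and the returned depth is the base of the first layer whose potential temperature gradient exceeds the threshold (2 K km⁻¹, equivalent to 2°C km⁻¹). The function name is hypothetical.

```python
def pbl_depth(heights_m, theta_k, threshold_k_per_km=2.0):
    """Approximate PBL depth from a potential temperature profile.

    heights_m: heights above ground (m), increasing upward.
    theta_k: potential temperature (K) at those heights.
    Returns the height (m) at the base of the first layer where
    d(theta)/dz exceeds the threshold, or None if no layer does.
    """
    for i in range(len(heights_m) - 1):
        dz_km = (heights_m[i + 1] - heights_m[i]) / 1000.0
        gradient = (theta_k[i + 1] - theta_k[i]) / dz_km
        if gradient > threshold_k_per_km:
            return heights_m[i]
    return None


# A well-mixed layer capped by a stable layer near 1000 m.
heights = [0.0, 500.0, 1000.0, 1500.0, 2000.0]
theta = [303.0, 303.2, 303.4, 304.5, 306.0]
print(pbl_depth(heights, theta))  # 1000.0
```

In this sketch a surface-based inversion would return the lowest level, so the diagnostic is most meaningful for daytime, convectively mixed profiles.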
Meridional wind biases are the second-largest systematic errors and are dynamically consistent with the zonal wind and SLP biases. The V component biases weaken the low-level jet, which carries heat and moisture into southeast South America in summer (Vera et al. 2006, and references therein), fueling convection in this region (Liebmann et al. 2004; Salio et al. 2002).
Next, meridional wind biases are related to systematic precipitation errors. Figure 11 shows the mean ensemble precipitation bias at two forecast times. Precipitation is overestimated over central and southeast Brazil and underestimated over central and northern Argentina, and these biases increase with integration time. The pattern is similar to that of the active phase of the South Atlantic convergence zone (SACZ) described by Nogués-Paegle and Mo (1997), among others. This similarity suggests that forecasts tend toward the active phase of the SACZ independent of initial conditions, reducing the moisture flux over central and northern Argentina and weakening rainfall over southeast South America (SESA). Potential feedbacks include the development of a more stable profile over SESA due to compensating sinking motion outside the SACZ, and reduced soil moisture in the region (less rain → less soil moisture → less rain).
Rainfall estimates vary slightly among the different configurations but in all cases biases increase with integration time. This agrees with previous studies (Blázquez and Nuñez 2009; Pessacg 2008) and suggests that there is a common deficiency in the convective schemes used in this and other investigations. Precipitation biases need to be reduced, particularly during summer, if advances are to be made in short- and long-term prediction. This article offers an additional interpretation for the cause of such biases, based on lack of skill in the representation of regional circulations with decreased northerly winds and the concomitant reduction of heat and moisture advection.
We conclude that the ensemble mean gives the best simultaneous estimates for all variables except precipitation, which requires more careful treatment (e.g., Ruiz et al. 2009). The ensemble mean reduces systematic and nonsystematic errors in the diurnal cycle. Another option, which may increase the value of forecasts over the region of interest, is to weight the ensemble mean according to the skill of each model, as proposed by Silva Dias et al. (2006) and Raftery et al. (2005). A related question is whether an ensemble generated by different configurations produces less spread than an initial-condition perturbation ensemble. In either case, such statistical corrections are limited to locations with reliable observations over long periods.
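As an illustration of skill-dependent weighting, the sketch below weights each member by the inverse of its mean-squared error over a training period. This inverse-MSE rule is a simple stand-in chosen for illustration; it is not the specific method of Silva Dias et al. (2006) nor the Bayesian model averaging of Raftery et al. (2005), and the function name is hypothetical. As noted above, the training MSEs require reliable observations over a long period.

```python
def skill_weighted_mean(member_forecasts, training_mse):
    """Combine member forecasts with weights proportional to 1 / MSE.

    member_forecasts: list of equal-length forecast series, one per member.
    training_mse: each member's mean-squared error over a past training period.
    """
    if any(m <= 0 for m in training_mse):
        raise ValueError("training MSE values must be positive")
    weights = [1.0 / m for m in training_mse]
    total = sum(weights)
    return [
        sum(w * series[t] for w, series in zip(weights, member_forecasts)) / total
        for t in range(len(member_forecasts[0]))
    ]


# Member 1 verified better in training, so it receives three times the weight.
combined = skill_weighted_mean([[10.0, 20.0], [14.0, 24.0]], [1.0, 3.0])
print(combined)
```

With equal training errors the weights are identical and the result reduces to the plain ensemble mean.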
Regionally, the WRF model was found to have better skill over region 1, characterized by larger day-to-day variability, than over region 2, a tropical area with little variability. The present study was carried out for a single warm season, and the generality of the conclusions for other seasons and years remains to be determined. Nevertheless, there is more extensive verification experience with configuration 1, which corresponds to the WRF version used in operational weather prediction at CIMA/UBA. Such verification yields results similar to those described here, such as the damping of the diurnal cycle and negative precipitation biases over northern and eastern Argentina.
This investigation was supported by Projects NA06AR4310048 from NOAA/OGP/CPPA, ANPCyT PICT 2004 25269, UBACyT X204, and CONICET PIP 112-200801-00399. The research leading to these results has received partial funding from the European Community’s Seventh Framework Programme (FP7/2007-2013) under Grant 212492 [from the Europe–South America Network for Climate Change Assessment and Impact Studies in La Plata Basin (CLARIS LPB).]
Corresponding author address: Celeste Saulo, Dpto. de Cs. de la Atmósfera y los Océanos, FCEyN Centro de Investigaciones del Mar y la Atmósfera (CIMA), CONICET/UBA Intendente Guiraldes 2160, Ciudad Universitaria Pabellón II, 2do. Piso (C1428EHA), Ciudad Autónoma de Buenos Aires, Buenos Aires, Argentina. Email: email@example.com