## 1. Introduction

With increasing computing power and the development of higher-resolution models that include more detailed physical processes, several numerical weather prediction (NWP) centers routinely perform forecasts at the mesoscale. Nevertheless, the quality of these forecasts always depends on several factors: 1) initial state uncertainties due to observational errors and to the analysis methods producing the initial conditions (ICs) at these scales (e.g., Miguez-Macho and Paegle 2001; Mullen and Baumhefner 1989); 2) model errors, that is, the difference between the approximations of the numerical model and the real atmospheric processes (e.g., Vannitsem and Toth 2002); and 3) for limited-area models, errors caused by the treatment of lateral boundary conditions (LBCs) (e.g., Nutter et al. 2004). Because the predictability of most mesoscale phenomena is relatively low, these errors can have large impacts even on short-range forecasts. Mesoscale ensemble predictions partly address these issues by introducing the notion of probability to the forecasts.

Studies on mesoscale ensemble weather forecasting using limited-area models (LAMs) have been based on various strategies. The Short-Range Ensemble Forecasting (SREF) system has been developed (Hamill and Colucci 1997; Stensrud et al. 1999; Du and Tracton 2001) to generate forecasts up to a lead time of 3 days with the Eta Model and the Regional Spectral Model (RSM), using ICs produced by the breeding method and different in-house analyses, and using LBCs provided by the National Centers for Environmental Prediction (NCEP) global ensemble prediction system (EPS) based on the breeding vector approach. Since 2003, SREF has been updated and uses three different models (Du et al. 2003).

Some limited-area EPSs based on global singular vector (SV) perturbations for ICs and LBCs obtained from the European Centre for Medium-Range Weather Forecasts (ECMWF) EPS have been developed (Marsigli et al. 2001; Frogner and Iversen 2002; Montani et al. 2003; Chessa et al. 2004). Lately, short-range ensemble forecasting based on the ensemble transform Kalman filter (ETKF; see Bishop et al. 2001) method has been implemented at the Met Office (UKMO). Other initial condition perturbation methods were used, such as employing analyses from different national centers (Grimit and Mass 2002) and random initial condition perturbations (Du et al. 1997; Stensrud et al. 2000). The multiparameterization approach (Du et al. 1997; Stensrud et al. 2000; Bright and Mullen 2002), the multimodel method (Hou et al. 2001; Wandishin et al. 2001; Grimit and Mass 2002; Stensrud and Yussouf 2003; Du et al. 2003), and stochastic parameter perturbations (Bright and Mullen 2002) were used to represent the model uncertainties. These studies suggest that mesoscale EPSs can provide more valuable information for short-range forecasts compared to global EPSs and deterministic forecasts, especially for high-impact weather events such as heavy rainfall rates.

Singular vectors are based on linear theory (Lorenz 1965, but see Barkmeijer 1996 for a nonlinear approach) and are defined as the orthogonal set of perturbations that provide the maximum linear growth between two times with respect to specified norms. Until recently, most studies using SVs as initial perturbations in EPSs were still based on a linearized version of the model without representation of subgrid-scale moist processes. Some studies showed that the inclusion of moist processes in SV calculations can increase the growth of perturbations (Ehrendorfer et al. 1999; Zadra et al. 2004; Coutinho et al. 2004; Hoskins and Coutinho 2005), affect baroclinically unstable modes (Coutinho et al. 2004), and better represent the shifting of energy of the SVs toward smaller scales (Zadra et al. 2004; Coutinho et al. 2004). The usefulness of moist SVs for short-range EPSs was also anticipated by Hoskins and Coutinho (2005) through the investigation of the predictability of several poorly forecast European cyclones.

Singular vector calculations can be localized on a specific area using a local projection operator (Buizza 1994). The resulting SVs are referred to as targeted SVs. Compared with ensemble predictions based on SVs calculated over the Northern Hemisphere, ensemble predictions based on targeted SVs generate larger ensemble spread and better verification scores over the targeted region (Hersbach et al. 2000; Frogner and Iversen 2001). Also, a limited-area EPS based on dry targeted SVs was developed with promising results on precipitation forecasts (Frogner and Iversen 2002).

Therefore, it is of interest to investigate the application of moist targeted SVs on regional ensemble forecasts. Zadra et al. (2004) examined the influence on SVs of different components of simplified physics using the Global Environmental Multiscale (GEM) model (Côté et al. 1998a, b; Yeh et al. 2002). This simplified physics includes stratiform and convective (Kuo type) precipitation processes, which are used to calculate the so-called moist SVs. The GEM model is the operational model at the Canadian Meteorological Centre for global and regional (variable-resolution grid) forecasts. A limited-area version of the GEM model (GEM-LAM) exists and shares the same dynamical core as the global and regional GEM model. Two of the goals of the present study are to validate a regional ensemble prediction system (REPS) using GEM-LAM with targeted SVs as initial and boundary perturbations, and also to investigate the impact of moist SVs on precipitation forecasts.

The model error introduced by neglecting subgrid-scale variability is also considered in this REPS. Parameterization schemes in numerical models are generally confined to simulating the mean effect of subgrid-scale processes on the resolved scales. Ignoring their stochastic effects potentially leads to prediction errors. In this paper, we consider the stochastic perturbation of physical parameters related to convection and condensation using a method inspired by Lin and Neelin (2000) and based on first-order Markov processes.

The paper is organized as follows. The model configurations are described in section 2. The initial and boundary condition perturbation method is explained in section 3. Section 4 presents the perturbation method for relatively unconstrained parameters of subgrid-scale processes. The experiments and verification data are described in section 5. Sections 6 and 7 describe results, and conclusions are drawn in section 8.

## 2. Model description

This REPS is based on the semi-Lagrangian, semi-implicit GEM model and its limited-area version. GEM is a nonhydrostatic gridpoint model and is described in Côté et al. (1998a, b) and Yeh et al. (2002). It includes a terrain-following vertical coordinate based on hydrostatic pressure (Laprise 1992) and the option of using a variable resolution discretization on an Arakawa C grid. GEM offers the possibility of producing global as well as limited-area integrations with the same dynamical framework. For GEM-LAM integrations, boundary conditions are necessary and will be provided by global GEM simulations.

The global GEM integrations are here performed with horizontal resolution of 0.9° on a uniform latitude–longitude grid and with 28 vertical levels from the surface to 10 hPa. The initial analyses are from the Canadian Meteorological Centre (CMC) three-dimensional variational data assimilation (3DVAR) system (Gauthier et al. 1999) at the same horizontal resolution as the global integrations and with 16 pressure levels. The time step of the global model is 15 min, and 48-h forecasts are performed. The GEM-LAM integrations are conducted with a horizontal grid of 194 × 176 points with a resolution of 0.14° (approximately 15 km) on a limited-area grid. The limited-area grid is formed by a subdomain of a uniform grid that is rotated in such a way that the grid equator passes through the limited-area domain. The LAM uses the same 28 vertical levels as the global GEM integrations. The time step of the GEM-LAM integrations is 4 min, and 48-h forecasts are performed. The simulation domain of the GEM-LAM integrations covers eastern Canada and the northeastern United States as shown in Fig. 1.

The physical parameterizations used in the global GEM model are as follows: a radiation scheme including infrared and solar components fully interactive with clouds (Garand 1983; Garand and Mailhot 1990; Yu et al. 1997); a turbulent kinetic energy (TKE)-based boundary layer vertical diffusion scheme (Mailhot and Benoît 1982; Benoît et al. 1989); a force–restore surface scheme (Deardorff 1978); a gravity wave drag parameterization scheme (McFarlane 1987); a low-level blocking parameterization (Zadra et al. 2003); a Kuo-type deep convective scheme (Kuo 1965, 1974); and a large-scale condensation scheme including prognostic variables for cloud water/ice content (Sundqvist et al. 1989; Pudykiewicz et al. 1992). GEM-LAM uses instead a modified TKE-based vertical diffusion scheme for partly cloudy boundary layers, and the Kain–Fritsch deep convection scheme (Kain and Fritsch 1990). The radiation, gravity wave drag, and low-level blocking schemes used in GEM-LAM are the same as in the global GEM model. For this study, the Interactions between Soil, Biosphere, and Atmosphere (ISBA) scheme (Noilhan and Planton 1989; Bélair et al. 2003) and the force–restore surface scheme have been employed in different experiments in GEM-LAM.

## 3. Initial and boundary conditions of the REPS

The perturbations of the initial conditions and lateral boundaries of this REPS are based on the targeted SV method. It consists of finding an orthogonal set of vectors with maximum linear growth during a finite time interval with respect to a specified norm (Lorenz 1965; Buizza and Palmer 1995). The choice of norm is somewhat arbitrary, but for perturbing initial conditions of an ensemble forecast system, a norm based on an analysis-error covariance matrix is certainly appealing (Barkmeijer et al. 1999). However, Buehner and Zadra (2006) found that, in practice, this more theoretically sound norm does not seem to provide any clear advantage over the so-called total energy norm (see below). In particular, they found that the shape of the evolved SVs is almost independent of the choice of norm. For producing initial perturbations in ensemble forecasts based on SVs, the total energy norm appears to be an acceptable choice (Molteni and Palmer 1993; Molteni et al. 1996; Palmer et al. 1998).
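The definition above (maximum linear growth with respect to an initial and a final norm) can be illustrated with a minimal sketch. It assumes a toy 2 × 2 propagator standing in for the tangent-linear model and diagonal norm weights; the names `leading_singular_vector`, `TOY_TLM`, `w_init`, and `w_final` are hypothetical and are not taken from the GEM SV solver. Setting a final weight to zero mimics, very roughly, a targeted norm that ignores part of the state.

```python
import math

def matvec(matrix, vec):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, vec)) for row in matrix]

def leading_singular_vector(tlm, w_init, w_final, n_iter=200):
    """Leading SV of `tlm` for diagonal initial/final norm weights.

    Power iteration on W_init^{-1} L^T W_final L, whose eigenvectors solve
    the generalized problem L^T W_final L x = gamma^2 W_init x.
    """
    adjoint = [list(col) for col in zip(*tlm)]  # transpose of a real matrix
    x = [1.0] * len(tlm[0])
    for _ in range(n_iter):
        evolved = matvec(tlm, x)                          # forward (tangent-linear) step
        weighted = [w * e for w, e in zip(w_final, evolved)]
        back = matvec(adjoint, weighted)                  # backward (adjoint) step
        x = [b / w for b, w in zip(back, w_init)]
        norm = math.sqrt(sum(w * xi * xi for w, xi in zip(w_init, x)))
        x = [xi / norm for xi in x]                       # unit initial norm
    evolved = matvec(tlm, x)
    growth_sq = sum(w * e * e for w, e in zip(w_final, evolved))
    return x, growth_sq

# Toy propagator with shear; purely illustrative, not a real model.
TOY_TLM = [[1.0, 2.0], [0.0, 1.0]]
sv, gamma_sq = leading_singular_vector(TOY_TLM, [1.0, 1.0], [1.0, 1.0])
```

An operational solver would use a Lanczos-type algorithm with the full tangent-linear and adjoint models rather than explicit matrices; the power iteration is used here only because it fits in a few lines.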

The SV solver for the global GEM model includes the following four linearized and simplified parameterizations: vertical diffusion in the boundary layer, subgrid-scale orographic drag including gravity wave drag and low-level blocking, condensation, and deep convection. The last two represent so-called moist processes, and SVs calculated with them are referred to as moist SVs. As discussed earlier, the moist SVs will be used as initial perturbations to investigate their impacts on regional-scale ensemble forecasts, especially for quantitative precipitation forecasts (QPFs).

As mentioned before, the targeted SV perturbation method is applied in this paper. The target area is chosen to be the northeastern North American region, 266°–315° longitude and 30°–62° latitude (referred to as $S'$). The SV $\mathbf{X}_i$ is the solution of a generalized eigenvalue problem:

$$\mathsf{L}^{T}\,\mathsf{W}_{S'}\,\mathsf{L}\,\mathbf{X}_i = \gamma_i^{2}\,\mathsf{W}_{S}\,\mathbf{X}_i ,$$

where $\mathsf{L}$ and $\mathsf{L}^{T}$ are the tangent-linear and adjoint models, respectively; $\mathsf{W}_{S}$ and $\mathsf{W}_{S'}$ are the initial and final weight matrices, respectively, that define the norm; and $\gamma_i^{2}$ is the square of the linear growth rate of the SV $\mathbf{X}_i$. The norm $\mathsf{W}$ is the total energy norm and is defined by

[Display equation for the total energy norm not recoverable from the source.]

where $\alpha$ can be either $S$ or $S'$; $\eta$ is the vertical coordinate, $\mathbf{u}$ is the horizontal wind vector, $T$ is the temperature, and $\pi$ is the surface pressure. Here $c_p = 1004$ J K$^{-1}$ kg$^{-1}$, $R = 287$ J K$^{-1}$ kg$^{-1}$, $T_r = 300$ K, and $p_r = 1000$ hPa. This norm is used for moist and dry SVs.

The first eight normalized SVs with the largest growth rates are interpolated at the same resolution as the global analyses (0.9°), and multiplied by a factor of 0.2. This factor has been chosen empirically to obtain reasonable ensemble spread compared with the ensemble mean error. The interpolated and rescaled SVs are added to, as well as subtracted from, a global analysis constructing 16 perturbed initial conditions. Starting from these 16 ICs, a global GEM ensemble with 16 members provides 16 lateral boundary and initial conditions for 16 GEM-LAM integrations.
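The construction of the 16 perturbed initial conditions from eight rescaled SVs can be sketched as follows. This is a minimal illustration on flat lists of numbers; the function name `perturbed_initial_conditions` and the toy state vectors are hypothetical, and the interpolation step is omitted.

```python
def perturbed_initial_conditions(analysis, singular_vectors, scale=0.2):
    """Return 2 * len(singular_vectors) members: analysis +/- scale * SV."""
    members = []
    for sv in singular_vectors:
        for sign in (1.0, -1.0):
            members.append([a + sign * scale * s for a, s in zip(analysis, sv)])
    return members

# Toy example: a 3-element "analysis" and eight unit-vector "SVs".
toy_analysis = [1.0, 2.0, 3.0]
toy_svs = [[1.0 if i == j % 3 else 0.0 for i in range(3)] for j in range(8)]
toy_members = perturbed_initial_conditions(toy_analysis, toy_svs)
```

Because each SV enters with both signs, the mean of the perturbed members equals the unperturbed analysis.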

The effects of LBC perturbations on mesoscale EPSs have been studied for different configurations (Hou et al. 2001; Frogner and Iversen 2002; Nutter et al. 2004). With a highly simplified model, Nutter et al. (2004) investigated the impact of LBC updating interval on error growth of a limited-area EPS. In the present study, the impact of the LBC updating frequency on the performance of the REPS for realistic simulations is investigated. Six- and three-hour intervals are tested.

## 4. Parameter perturbations with first-order Markov chains

It has been demonstrated that the quality of an EPS is generally improved when some stochasticity is introduced in the treatment of parameterized subgrid-scale phenomena (see Buizza et al. 1999b; Bright and Mullen 2002; Wilks 2005). Although inherently stochastic subgrid-scale physical parameterizations would seem more natural to use in an EPS, the vast majority of parameterizations that have been developed for numerical weather prediction employ a deterministic approach in which aspects of stochasticity are averaged out or simply omitted. To “randomize” deterministic parameterizations, modelers often stochastically vary parameters that are relatively unconstrained (Lin and Neelin 2000; Yang and Arritt 2002; Bright and Mullen 2002). This is the approach followed in the present experiments.

A random function $f(\lambda, \phi, \eta, t)$, correlated in space and time, with a probability density function (PDF) symmetric around the mean, can be defined as

[Display Eqs. (4.1) and (4.2) not recoverable from the source.]

where $\lambda$, $\phi$, $\eta$, and $t$ are longitude, latitude, a vertical coordinate, and time, respectively. The $Y_{lm}$s are spherical harmonics, with $l$ being the total horizontal wavenumber, and $m$ the zonal wavenumber; $k$ is the vertical wavenumber; and $L$ and $K$ are the horizontal and vertical truncations of the random function, respectively. Also, their inverse can be interpreted in terms of spatial decorrelation length scales. Here, $\tau$ is the decorrelation time scale of the spectral coefficients, and $\Delta t$ is the time step of the numerical model. The possibly complex $R_{lmk}$s are uncorrelated random processes with mean zero and variance $\langle |R_{lmk}|^{2} \rangle$ [value not recoverable from the source]. The $R_{lmk}$s are Gaussian processes. The denominator in the square root of Eq. (4.2) is equal to the number of degrees of freedom in the triple sum in Eq. (4.1). Note that $a_{00k}(t) \equiv 0$ for all values of $k$, implying that the global mean of the random function $f(\lambda, \phi, \eta, t)$ is $\mu$. The constant $\sigma$ is the specified global standard deviation of $f$. For real random fields (as will be the case here), the condition $a_{lmk}(t) = (-1)^{m}\, a^{*}_{l\,-m\,-k}(t)$ must apply.
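The time evolution of a single spectral coefficient as a first-order Markov process can be sketched as below. The update rule used here, `a(t + dt) = exp(-dt / tau) * a(t) + std * sqrt(1 - exp(-2 * dt / tau)) * R(t)` with unit-variance Gaussian `R`, is a standard first-order autoregressive form and is an assumption of this sketch, not a transcription of Eq. (4.2). The function name `markov_series` and the parameter values are hypothetical.

```python
import math
import random

def markov_series(n_steps, dt, tau, std, seed=0):
    """Generate a real first-order Markov (AR(1)) series.

    dt  : model time step
    tau : decorrelation time scale (same units as dt)
    std : stationary standard deviation of the series
    """
    rng = random.Random(seed)
    phi = math.exp(-dt / tau)                    # lag-1 autocorrelation
    noise_amp = std * math.sqrt(1.0 - phi * phi)  # keeps the variance stationary
    a = rng.gauss(0.0, std)                      # start from the stationary PDF
    series = []
    for _ in range(n_steps):
        series.append(a)
        a = phi * a + noise_amp * rng.gauss(0.0, 1.0)
    return series
```

With this choice of noise amplitude the variance of the series does not drift with time, and its autocorrelation decays as `exp(-lag / tau)`, which is what gives the spectrum a white plateau at low frequencies and a steep roll-off at high frequencies.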

The upper panel of Fig. 2 depicts the power spectrum of $a_{lmk}$ as a function of frequency. For time scales much larger than $\tau$, $a_{lmk}$ is white noise. For time scales smaller than $\tau$, the spectrum has an approximate −2 slope.

A modeler might need $f(\lambda, \phi, \eta, t)$ to be bounded, say, between $f_{\min}$ and $f_{\max}$. Moreover, the PDF of $f$ is very close to a Gaussian distribution. A modeler might need a different PDF. These potential drawbacks can be fixed in the following way: First, $\sigma$ is chosen to be much smaller than $f_{\max} - f_{\min}$, and the choice $\mu = (f_{\min} + f_{\max})/2$ is made. Second, for the (supposedly rare) cases for which $f(\lambda, \phi, \eta, t) > f_{\max}$ or $f(\lambda, \phi, \eta, t) < f_{\min}$, one redefines $f(\lambda, \phi, \eta, t)$ according to

[Display equation for the redefinition of out-of-range values not recoverable from the source.]

Third, $f(\lambda, \phi, \eta, t)$ is stretched to $F(\lambda, \phi, \eta, t)$ using

$$F = \mu + S(f; \mu)\,(f - \mu),$$

where the stretching function $S(f; \mu)$ is chosen according to the modeler’s needs. In the present experiments,

$$S(f; \mu) = 2 - \frac{1 - \exp\left\{\beta\left[(f - \mu)/(f_{\max} - \mu)\right]^{2}\right\}}{1 - \exp(\beta)},$$

with a constant $\beta$. By studying the behavior of $F$ in the neighborhood of $f_{\max}$ (or $f_{\min}$), one can obtain a constraint on the constant $\beta$. At $f = f_{\max} - \delta f$ (with $\delta f$ positive), the stretched function $F$ is given by

$$F = f_{\max} + \delta F, \qquad \delta F \simeq -\left[1 + \frac{2\beta e^{\beta}}{1 - e^{\beta}}\right]\delta f .$$

For $F$ to lie between $f_{\min}$ and $f_{\max}$, the condition $\delta F < 0$ must apply. This means that $\beta < -1.256\,431$ (approximately).
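The bound on $\beta$ can be checked numerically. The sketch below assumes the stretching takes the form $F = \mu + S(f;\mu)(f - \mu)$ with $S = 2 - \{1 - \exp[\beta x^{2}]\}/\{1 - \exp(\beta)\}$ and $x = (f - \mu)/(f_{\max} - \mu)$; this form is an assumption, chosen because requiring $dF/df \ge 0$ at $f = f_{\max}$ then gives $e^{-\beta} \ge 1 - 2\beta$, whose root reproduces the quoted value. The function names and the test values of $\beta$ are hypothetical.

```python
import math

def stretch(f, f_min, f_max, beta):
    """Stretch f toward the bounds; assumes mu is the midpoint of the range."""
    mu = 0.5 * (f_min + f_max)
    x = (f - mu) / (f_max - mu)
    s = 2.0 - (1.0 - math.exp(beta * x * x)) / (1.0 - math.exp(beta))
    return mu + s * (f - mu)

def critical_beta(tol=1e-12):
    """Solve exp(b) = 1 + 2b for b > 0 by bisection and return -b."""
    lo, hi = 1.0, 2.0  # exp(b) - 1 - 2b changes sign on this interval
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if math.exp(mid) - 1.0 - 2.0 * mid < 0.0:
            lo = mid
        else:
            hi = mid
    return -0.5 * (lo + hi)

def max_stretched(f_min, f_max, beta, n=2001):
    """Largest stretched value over an even grid of f in [f_min, f_max]."""
    step = (f_max - f_min) / (n - 1)
    return max(stretch(f_min + i * step, f_min, f_max, beta) for i in range(n))
```

Under these assumptions, a value of $\beta$ slightly below the critical one keeps $F$ inside the bounds, whereas a value above it (for example −1.0) lets $F$ overshoot $f_{\max}$ just inside the upper bound.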

The lower panel of Fig. 2 shows the impact of the stretching on the PDF of $f$, that is, a broadening of the PDF due to stretching.

To obtain a random field $F$ that obeys a uniform distribution law, it can be shown that the stretching must be performed using the error function, provided that $f$ is a Gaussian random field with mean $\mu$ and variance $\sigma^{2}$:

$$F = \mu + \frac{f_{\max} - f_{\min}}{2}\,\mathrm{erf}\!\left(\frac{f - \mu}{\sqrt{2}\,\sigma}\right).$$
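The mapping from a Gaussian variable to a uniform one via the error function is the probability integral transform, and it can be verified empirically. The sketch below assumes the target distribution is uniform on $[f_{\min}, f_{\max}]$; the function name `gaussian_to_uniform` and the sample values are hypothetical.

```python
import math
import random

def gaussian_to_uniform(f, mu, sigma, f_min, f_max):
    """Map a Gaussian value f ~ N(mu, sigma^2) to a uniform value on [f_min, f_max]."""
    z = (f - mu) / (math.sqrt(2.0) * sigma)
    return 0.5 * (f_min + f_max) + 0.5 * (f_max - f_min) * math.erf(z)

# Empirical check on synthetic Gaussian samples.
rng = random.Random(0)
samples = [gaussian_to_uniform(rng.gauss(0.5, 0.1), 0.5, 0.1, 0.0, 1.0)
           for _ in range(50000)]
sample_mean = sum(samples) / len(samples)
sample_var = sum((s - sample_mean) ** 2 for s in samples) / len(samples)
```

For a uniform distribution on $[0, 1]$ the mean is 1/2 and the variance is 1/12, which the sample statistics should approach.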

Since our concern is mainly on QPF, some uncertain key parameters influencing precipitation processes in GEM-LAM are perturbed. The trigger function in the Kain–Fritsch convective scheme is considered a limiting factor for mesoscale QPF (Kain and Fritsch 1992). The threshold vertical velocity involved in the definition of the trigger function is highly sensitive and always fine-tuned for deterministic forecasts (Kain 2004) and can be stochastically perturbed for ensemble forecasts (Bright and Mullen 2002). The condensation scheme used here is a predictive cloud water scheme developed by Sundqvist et al. (1989) and Pudykiewicz et al. (1992), referred to as the Sundqvist scheme. This scheme is based on several assumptions. An important one is the introduction of a relative humidity threshold to parameterize the subgrid cloud cover. This can affect the condensation processes and the production of stratiform precipitation. Sundqvist et al. (1989) pointed out that the determination of cloud cover is one of the most difficult questions in this predictive cloud water scheme. The threshold humidity used to determine cloud cover is assumed to be constant for deterministic forecasts. Based on these considerations, we chose to stochastically perturb two parameters: the threshold vertical velocity in the trigger function of the Kain–Fritsch scheme and the threshold relative humidity in the Sundqvist scheme.

The stochastically perturbed functions for threshold vertical velocity and threshold relative humidity are referred to as $F_T(\lambda, \phi, t)$ and $F_H(\lambda, \phi, t)$, respectively. Their corresponding parameter settings are given in Table 1. The ranges of the stochastic perturbations of the two parameters are chosen within realistic, empirically tuned values. Each ensemble forecast realization uses a different random seed to initiate the Markov process.

## 5. Validation tools and experimental design

### a. Verification data and verification measures

The regional analyses at a horizontal resolution of approximately 24 km from the 3DVAR regional data assimilation system at CMC (Laroche et al. 1999) are interpolated to the GEM-LAM grid and used to verify dynamical variables. The verification of precipitation forecasts is performed at station points. The observed rain gauge data are those used by the Canadian Precipitation Analysis (CaPA) project (Mahfouf et al. 2007), supplemented by data from the Standard Hydrometeorological Exchange Format (SHEF) database over the northeastern U.S. area. The original data for producing CaPA are 6-h accumulations at approximately 380 stations 4 times daily over the Québec, Canada, region. Adding stations from the SHEF network for the specific area used in this study, we consider 775 stations for the precipitation verification. Figure 1 depicts the location of all the stations and shows that the precipitation observation network is much denser over the northeastern U.S. region. Mean precipitation scores are thus more representative of the denser area. Prior to verification, all precipitation forecasts are interpolated onto the station points using bilinear interpolation.

A set of standard verification measures is employed to assess the results of different ensemble experiments. They are ensemble spread/ensemble mean error, Talagrand diagrams (sometimes called rank histograms; Anderson 1996; Talagrand et al. 1999), Brier skill scores (BSS) and their decomposition (Brier 1950; Murphy 1973; Jolliffe and Stephenson 2003), relative operating characteristics (ROC; Mason 1982; Jolliffe and Stephenson 2003), and potential economic value (Richardson 2000). The detailed definition of these measures is not given here, and their information content will be introduced with the corresponding verification results.

### b. Synoptic situations during the studied periods

Tests were performed for 16 summer days with relatively high precipitation rates, eight cases in July (21–28 July 2003) and eight cases in August (4–11 August 2003).

During 21–28 July, the main weather systems causing the precipitation over northeast North America can be generally attributed to strong low pressure systems with obvious baroclinicity. A strong low pressure system persisted over the eastern part of the North American continent, although this low pressure system experienced different evolution phases during this period. On most days of this period, large-scale precipitation with middle to heavy amounts is found over the simulation domain. Because of the strong baroclinicity of the influencing systems, stratiform precipitation dominates despite the fact that there were some local convective precipitation events.

During 4–11 August, the dominant synoptic systems producing precipitation over our simulation domain were structurally different from those of the July cases, favoring weak synoptic-scale forcing that generates convective activity. From 4 to 7 August, the dominant weather systems over North America were two ridge–trough systems. One was located north and one south of 50°N. During the following days, the northern ridge gradually weakened and evolved into a flat westerly flow. By and large, the simulation domain was controlled by weak synoptic forcing (weak pressure gradient between the northern ridge and the southern trough, or weak southern trough) during the period of August. Localized and most likely convective precipitation with middle to heavy amounts was observed at many stations within the simulation domain during this period.

### c. Description of experiments

Two different sets of initial SV perturbations are considered for the global GEM ensemble piloting the LAMs. The first set is constructed from dry SVs. They are calculated with simplified and linearized dry processes: vertical diffusion and orographic drag schemes. Experiments with dry SVs will have names starting with DSV. The second set is constructed from moist SVs that include vertical diffusion and orographic drag, but also simplified and linearized stratiform condensation and deep convection in the tangent-linear and adjoint models of GEM. Experiments with moist SVs will have names starting with MSV.

As discussed before, parameter perturbations are also considered in this paper. In some experiments, the threshold vertical velocity in the Kain–Fritsch scheme and the threshold relative humidity in the Sundqvist condensation scheme for the LAM runs were perturbed by first-order Markov processes.

The impact of two different surface schemes in the GEM-LAM runs was also tested: the force–restore (Deardorff 1978) and the ISBA schemes (Noilhan and Planton 1989; Bélair et al. 2003) were tested separately.

Moreover, two different updating frequencies of LBCs by the piloting global runs for GEM-LAM were tested: 6 and 3 h, referred to as P6 and P3, respectively, in the remainder of the text.

Table 2 describes the five GEM-LAM ensemble configurations that have been designed. The relative impact of using dry versus moist SVs on ensemble forecasts is seen by comparing DSVF-P6 and MSVF-P6. The comparison of MSVF-P6 and MSVF-P3 provides the sensitivity of the REPS to different updating frequencies of lateral boundary conditions. The effect of replacing force–restore by ISBA is seen by comparing MSVF-P3 and MSVI-P3. Finally, MSVI+TH-P3 investigates the impact of stochastically perturbing two model parameters: the threshold vertical velocity and the threshold relative humidity.

## 6. Verification of dynamical fields

### a. Spread and error

The ensemble standard deviation (spread) averaged over the limited-area inner domain excluding 35 points at each boundary (159 × 141 grid points) and over the 16 cases for all experiments is shown in Fig. 3. The statistics are accumulated over *M* = 358 704 realizations of the EPSs and can therefore be considered stable. The 500-hPa geopotential height (GZ500), 850-hPa temperature (TT850), and 850-hPa wind speed (WS850, without consideration of wind direction) spread are depicted. The difference in spread among the five experiments is in general small. For temperature and wind speed, the larger spread in MSVF-P6 compared with DSVF-P6 can be noticed before 24- or 36-h lead times. Beyond that, the spread of DSVF-P6 tends to be larger than in MSVF-P6. The faster growth in spread in DSVF-P6 after 24 h is clearer for the geopotential height.

One notable feature is that the inclusion of stochastic parameter perturbations in MSVI+TH-P3 produced the largest spread for temperature and wind speed during the whole integration period. Also, the use of ISBA seems to help the system increase its spread (cf. MSVF-P3 and MSVI-P3). In addition, the use of a higher LBC updating frequency causes larger spread (MSVF-P3 versus MSVF-P6). This is more obvious for temperature and wind speed than for the geopotential height.

To evaluate the statistical significance of these differences, the Wilcoxon signed rank test (Wilcoxon 1945) has been applied. Taking into account the possibility that the data over the limited area can be correlated in space, one can consider the domain mean of the spreads at each day to get 16 realizations. Only a few differences are 5%–95% significant: (i) at 12- and 24-h lead times for TT850 and WS850, the dry SV experiment is significantly different from the four moist SV experiments; (ii) at 36- and 48-h lead times for GZ500, the dry SV experiment is significantly different from the first moist SV experiment (MSVF-P6).
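A paired test on 16 daily domain means can be sketched with an exact Wilcoxon signed rank computation. The code below is a minimal pure-Python illustration, assuming a two-sided test whose null distribution is obtained by counting sign assignments; the function name `wilcoxon_signed_rank` is hypothetical, and the exact test settings used for the results above are not specified in the text.

```python
def wilcoxon_signed_rank(x, y):
    """Exact two-sided Wilcoxon signed rank test for paired samples.

    Returns (W_plus, p_value). Zero differences are dropped; tied absolute
    differences receive average ranks (stored doubled to stay integer).
    """
    d = [a - b for a, b in zip(x, y) if a != b]
    n = len(d)
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks2 = [0] * n  # twice the average rank of each difference
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks2[order[k]] = (i + 1) + (j + 1)
        i = j + 1
    w_plus2 = sum(r for r, di in zip(ranks2, d) if di > 0)
    total2 = sum(ranks2)
    # Null distribution: each rank is added to W+ with probability 1/2.
    counts = [0] * (total2 + 1)
    counts[0] = 1
    for r in ranks2:
        for s in range(total2, r - 1, -1):
            counts[s] += counts[s - r]
    stat2 = min(w_plus2, total2 - w_plus2)
    p_value = min(1.0, 2.0 * sum(counts[:stat2 + 1]) / 2 ** n)
    return w_plus2 / 2.0, p_value
```

With 16 pairs the smallest attainable two-sided p value is 2/2^16, reached when one experiment has the larger spread on every day.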

The ensemble mean error [root-mean-square (rms) distance between the corresponding ensemble means and analyses] for the above-mentioned variables is calculated. The results show that the ensemble mean errors of the five experiments are quite similar, so only the ensemble mean errors in MSVI+TH-P3 are shown by the solid lines with circles in Fig. 3. The averaged spread of the REPSs slightly overestimates the GZ500 rms error, while it underestimates the TT850 rms error, and seems to be comparable to the WS850 rms error. This provides a first estimation of a *spread–skill* relationship, but does not give complete probabilistic information, especially in terms of bias and dispersion. Further probabilistic considerations are given in the next subsection.
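The two quantities compared here can be sketched for a single lead time as follows. The sketch assumes that the spread is the square root of the domain-averaged ensemble variance with an unbiased (n − 1) divisor; both choices are assumptions, since the text does not specify them, and the function names are hypothetical.

```python
import math

def ensemble_mean(members):
    """Grid-point mean over members; each member is a flat list of values."""
    n = len(members)
    return [sum(vals) / n for vals in zip(*members)]

def ensemble_spread(members):
    """Square root of the domain-averaged ensemble variance."""
    mean = ensemble_mean(members)
    n = len(members)
    variances = [sum((m[i] - mean[i]) ** 2 for m in members) / (n - 1)
                 for i in range(len(mean))]
    return math.sqrt(sum(variances) / len(variances))

def ensemble_mean_rmse(members, analysis):
    """Root-mean-square distance between the ensemble mean and the analysis."""
    mean = ensemble_mean(members)
    return math.sqrt(sum((f - a) ** 2 for f, a in zip(mean, analysis)) / len(analysis))
```

For a statistically consistent ensemble the two values should be of similar size when averaged over many cases, which is the comparison shown in Fig. 3.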

### b. Talagrand diagrams

Talagrand diagrams (also called rank histograms) are used to evaluate the statistical indistinguishability of the observed values from the predicted ensemble members, that is, to evaluate the ability of an EPS to capture the observed data within the ensemble realizations. They provide information on forecast dispersion and bias, meaning that they are intrinsically different from the direct comparison between spread and root-mean-square error (section 6a). Figure 4 shows the Talagrand diagrams for GZ500, TT850, and WS850 at a forecast lead time of 24 h for all experiments and for 16 cases. Moist SVs, as well as the other incremental additions to the REPS, do little to improve the Talagrand diagrams, even though they were seen to increase the spread for TT850 and WS850. The relative flatness of the curves observed for geopotential height for all experiments shows that this REPS can reasonably capture the corresponding observed field. On the other hand, the U shape observed for the two other variables is characteristic of the underdispersion of the systems. Also, a large cold bias is observed in the diagrams of the 850-hPa temperature, while the systems produce slightly too fast winds at 850 hPa.
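A rank histogram can be accumulated as in the following minimal sketch. It ignores ties between members and the verifying value, which is an assumption of this illustration; the function name `rank_histogram` is hypothetical.

```python
def rank_histogram(ensembles, verifications):
    """Count the rank of each verifying value within its ensemble.

    ensembles     : list of lists, one list of member values per case
    verifications : list of verifying values, one per case
    Returns a list of n_members + 1 counts.
    """
    n_members = len(ensembles[0])
    counts = [0] * (n_members + 1)
    for members, value in zip(ensembles, verifications):
        rank = sum(1 for m in members if m < value)  # members below the verification
        counts[rank] += 1
    return counts
```

A flat histogram indicates a well-dispersed ensemble, a U shape indicates underdispersion, and a histogram piled up at one end indicates a bias.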

### c. Probabilistic forecast skill

The probabilistic skill of GZ500, TT850, and WS850 is evaluated through binary events. The GZ500 and TT850 anomalies greater than 10 m and 1°C, respectively, are calculated. The anomalies are obtained by calculating the difference between the instantaneous field and its related analysis mean, at each grid point. The analysis mean is the average, calculated at each grid point, of the verifying analyses over the 16 days of the test period. The binary event “WS850 greater than 14 m s^{−1}” is used for wind speed verification. We recall that the diagnostics are computed over *M* = 358 704 realizations of the systems, so we consider the statistics stable.

Figure 5 shows the BSS of GZ500, TT850, and WS850 for all experiments and for 16 cases. The BSS is positively oriented; that is, it increases numerically with increasing performance of the system. It is equal to 1 for a perfect deterministic system and to 0 for a climatological system, that is, a system that always predicts the climatological probability of occurrence of the event under consideration. The climatological probabilities associated with the variables GZ500, TT850, and WS850 are $p_c = 46\%$, $p_c = 30\%$, and $p_c = 16\%$, respectively. Negative values of BSS indicate a poorer than climatological performance. In general, it can be noticed that the successive changes in the REPS improve the performance on the upper-air variables. In particular, the introduction of the surface scheme ISBA has a positive impact on the geopotential at 500 hPa (upper panel) after the 24-h forecast range. For the variables at 850 hPa, it appears that stochastic perturbations of physical parameters (MSVI+TH-P3) allow slight improvements over all other experiments, except for the wind speed at the 48-h lead time (lower panel). In contrast, the introduction of the moist SVs or the higher updating frequency by the piloting global model has no clear impact on the BSS for these dynamical variables.

Figure 6 is analogous to Fig. 5, but depicts results related to the area under the relative operating characteristics curves (AROC). The AROC is a measure of the discrimination; that is, it measures the ability of a system to separate the “good” and “wrong” forecasts. It was calculated from the ROC scores using the trapezium rule and 17 probability bins to define hit rates and false alarm rates. Generally, BSS (more precisely, its resolution component) and AROC provide quite similar diagnostics, but they intrinsically measure two different properties of an EPS. The resolution measures distances between conditional probabilities (a posteriori observed frequencies) depending on the predicted objects (here, a priori predicted probabilities), while the discrimination measures distances between conditional probabilities depending on the observed (or verified) objects. The AROC is equal to 1 for a perfect deterministic system and 1/2 for a climatological system. It is generally suggested that an AROC greater than 0.7 characterizes a skillful system (Buizza et al. 1999a). Generally, we observe the same tendencies already seen from the BSS: the introduction of ISBA and the perturbations of the physical processes tend to improve the probabilistic skill of the system for the dynamical variables. But contrary to the BSS, we note here that the higher LBC updating frequency has a positive impact on the wind speed discrimination.
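The AROC computation with the trapezium rule can be sketched as follows. The sketch assumes that the 17 probability bins correspond to the possible fractions 0/16, 1/16, ..., 16/16 of a 16-member ensemble predicting the event, and that a warning is issued when at least `k` members predict it; the function name `roc_area` and its arguments are hypothetical.

```python
def roc_area(member_counts, outcomes, n_members=16):
    """Area under the ROC curve by the trapezium rule.

    member_counts : number of members predicting the event, per case
    outcomes      : 1 if the event was observed, 0 otherwise, per case
    """
    n_event = sum(outcomes)
    n_non_event = len(outcomes) - n_event
    points = []
    for k in range(n_members + 2):  # k = 0 always warns, k = n_members + 1 never warns
        hits = sum(1 for c, o in zip(member_counts, outcomes) if o and c >= k)
        false_alarms = sum(1 for c, o in zip(member_counts, outcomes) if not o and c >= k)
        points.append((false_alarms / n_non_event, hits / n_event))
    points.sort()  # from (0, 0) to (1, 1)
    area = 0.0
    for (f0, h0), (f1, h1) in zip(points, points[1:]):
        area += 0.5 * (h0 + h1) * (f1 - f0)
    return area
```

A forecast that always issues the same probability gives only the points (0, 0) and (1, 1), hence an area of 1/2, while a perfect deterministic forecast passes through (0, 1) and gives an area of 1.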

In general, it appears that, out of the five tested experiments, MSVI+TH-P3 exhibits the best probabilistic forecast skill. Perturbations to deep convection and condensation schemes provide the main part of the improvement on low-level temperature. Also note that higher LBC updating frequency and the surface scheme ISBA have positive impacts on wind speed and on the geopotential, respectively.

## 7. Probabilistic verification of precipitation

The probabilistic verifications for 24-h accumulated precipitation (precipitation accumulated from 12- to 36-h forecast lead times) are computed for the following thresholds: 2.5, 5, 10, 15, 20, 25, 30, 35, 40, 45, and 50 mm. The precipitation climatology is sampled from the verification data over the test period. The climatological occurrence obtained in such a way ranges from 40% at threshold 2.5 mm to 2% at threshold 50 mm. It should be noted that the statistical significance of results may be low because of poor statistical sampling, especially for the large thresholds. Considering the missing observations, the statistics are accumulated over 4085 and 5059 realizations for July and August cases, respectively. To estimate the significance of the improvements (or the degradations) due to the configuration changes in the REPS, we apply bootstrap techniques (Efron and Tibshirani 1993; Candille et al. 2007) to the scores presented in this section. That is, we compute 5%–95% confidence intervals, defined by bootstrap methods, for each score and each difference between two scores. The skill scores, or the skill score differences, are considered significant if the lower and upper bounds of the confidence interval have the same sign. For the clarity of the figures, we will only show some examples of confidence intervals for the Brier skill score comparisons between the initial and final configurations of the system (DSVF-P6 and MSVI+TH-P3).
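The significance test described above can be sketched as follows. This is an illustrative example only: it resamples individual realizations with replacement, whereas the resampling unit actually used (for instance whole days, as may be preferable when realizations are correlated) is not specified here. The function names, the number of bootstrap replicates, and the random seed are assumptions.

```python
import numpy as np

def brier_skill_score(prob, obs):
    """BSS relative to the sample climatology of the given observations."""
    prob = np.asarray(prob, dtype=float)
    obs = np.asarray(obs, dtype=float)
    bs = np.mean((prob - obs) ** 2)
    bs_clim = np.mean((obs.mean() - obs) ** 2)
    # Undefined when the resample contains only events or only non-events.
    return 1.0 - bs / bs_clim if bs_clim > 0 else np.nan

def bootstrap_bss_difference(prob_a, prob_b, obs, n_boot=1000, seed=0):
    """5%-95% bootstrap interval for BSS(b) - BSS(a)."""
    prob_a = np.asarray(prob_a, dtype=float)
    prob_b = np.asarray(prob_b, dtype=float)
    obs = np.asarray(obs, dtype=float)
    rng = np.random.default_rng(seed)
    n = obs.size
    diffs = np.empty(n_boot)
    for k in range(n_boot):
        idx = rng.integers(0, n, size=n)  # same resampled cases for both systems
        diffs[k] = (brier_skill_score(prob_b[idx], obs[idx])
                    - brier_skill_score(prob_a[idx], obs[idx]))
    lower, upper = np.nanpercentile(diffs, [5.0, 95.0])
    return float(lower), float(upper)

def is_significant(lower, upper):
    """Significant when both bounds of the interval have the same sign."""
    return lower * upper > 0.0
```

Using the same resampled indices for both configurations keeps the comparison paired, so that the interval reflects the difference between systems rather than the sampling variability of each score taken separately.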

### a. The BSS and its decomposition

Figure 7 shows the Brier skill scores of 24-h accumulated precipitation for 11 thresholds and for 16 days. The upper panel shows the BSS for each REPS configuration. The moist SVs seem to improve the skill of the REPS, especially for the threshold range 10–40 mm. The lower panel, which represents the confidence intervals associated with the BSS differences between the initial (DSVF-P6) and final (MSVI+TH-P3) configurations of the REPS, confirms that the improvements are statistically significant for the threshold range 10–40 mm. The dry SV configuration is only skillful up to 20 mm, while the moist SV configurations seem to have skill up to more than 40 mm. If we consider the confidence intervals related to each configuration (not shown), it can be shown that the skills of the dry and moist SV configurations are only significant for the threshold ranges 5–10 and 2.5–15 mm, respectively.

Significance tests have been performed for each incremental addition to the REPS for the 16 cases altogether, as well as for the July and August cases separately (not shown). The improvement observed from DSVF-P6 to MSVF-P6 is statistically significant for the threshold range 2.5–40 mm. The higher LBC updating frequency significantly improves the skill of the REPS for the threshold 2.5 mm. But the ISBA surface scheme significantly degrades the skill for the lower thresholds of 2.5 and 5 mm. This negative effect due to ISBA is significantly counterbalanced by the perturbations of the physical processes for the threshold range 2.5–10 mm. As a result, Fig. 7b shows that the improvements from DSVF-P6 to MSVI+TH-P3 are not statistically significant for the threshold range 2.5–5 mm. Concerning the threshold range 45–50 mm, significant differences have not been observed between the five experiments.

Figure 8 shows the performance of the REPS for different synoptic weather systems: strong low pressure systems for July (upper panel) and weak synoptic forcing for August (lower panel). Better skill is obtained in July than in August. For the July cases, the skill of the dry and moist SV configurations significantly reaches the thresholds of 15 and 25 mm, respectively, while for the August cases no configuration has significant skill for any threshold (confidence intervals not shown).

For the July cases (Fig. 8, upper panel), the lowest BSS in the threshold range 15–30 mm is observed in DSVF-P6. In that threshold range, using moist SVs instead of dry SVs and using a higher LBC updating frequency contribute significantly to improving probabilistic precipitation forecasts. ISBA tends to significantly degrade precipitation BSS in the range 2.5–5 mm (Fig. 8, upper panel), as described above. The use of stochastic parameter perturbations counterbalances this negative effect. For thresholds greater than 10 mm, the configurations with the ISBA scheme and the perturbations of physical processes seem to provide the best BSS. But because of the small size of the verification dataset, it cannot be concluded that these improvements are statistically significant.

The BSS for the August cases (Fig. 8, lower panel) is much lower than for the July cases for all experiments. Most of the experiments exhibit low forecast skill for the 5–15-mm threshold range, and no skill for other thresholds. These results indicate that the current REPS based on moist SV perturbations optimized over a 24-h period as well as stochastic perturbations has much more difficulty forecasting precipitation rates caused by convective processes than by stratiform processes. In spite of this fact, all the moist SV configurations perform significantly better than the dry SV one.

As pointed out by Zadra et al. (2004) and Coutinho et al. (2004), there is more perturbation growth at smaller scales in the moist SV experiments than in the dry SV experiment. This was suggested to be a desirable feature for short-range ensemble forecasts by Hoskins and Coutinho (2005). In the present study, the large improvement of using moist SVs over dry SVs seems to confirm this expectation.

The BSS contains information on reliability and resolution and can be decomposed into these two components [Murphy 1973, and see Jolliffe and Stephenson 2003, Eq. (7.9)]. Contrary to the BSS, reliability and resolution are both negatively oriented (smaller values mean better results). The reliability is the statistical consistency between a priori–predicted probabilities and a posteriori–observed frequencies of occurrence of the event under consideration. The resolution is the ability of an EPS to a priori separate classes of predicted probabilities leading to observed frequencies sufficiently different and distinct. These two properties are the main attributes of a probabilistic prediction system.
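The decomposition can be made concrete with a short sketch of the Murphy (1973) partition, in which the Brier score equals reliability minus resolution plus uncertainty. This is an illustration under stated assumptions, not the code used in this study: the function names and the grouping of forecasts into 17 equally spaced probability classes are hypothetical. In this classical form the resolution term is positively oriented. The text treats both components as negatively oriented, which suggests a convention in which the reported resolution is the uncertainty minus Murphy's resolution; that mapping is an assumption here.

```python
import numpy as np

def brier_decomposition(prob, obs, n_classes=17):
    """Murphy (1973) partition: BS = reliability - resolution + uncertainty."""
    prob = np.asarray(prob, dtype=float)
    obs = np.asarray(obs, dtype=float)
    n = obs.size
    clim = obs.mean()
    # Assumption: forecasts are grouped into equally spaced probability classes.
    classes = np.clip(np.rint(prob * (n_classes - 1)).astype(int), 0, n_classes - 1)
    reliability = 0.0
    resolution = 0.0
    for k in np.unique(classes):
        members = classes == k
        n_k = members.sum()
        p_k = prob[members].mean()   # a priori predicted probability of the class
        o_k = obs[members].mean()    # a posteriori observed frequency of the class
        reliability += n_k * (p_k - o_k) ** 2
        resolution += n_k * (o_k - clim) ** 2
    uncertainty = clim * (1.0 - clim)
    return reliability / n, resolution / n, uncertainty

def brier_skill_score(reliability, resolution, uncertainty):
    """BSS against the sample climatology, expressed with the three components."""
    return (resolution - reliability) / uncertainty
```

The partition is exact only when all forecasts within a class share the same probability, as is the case when probabilities are multiples of 1/16; with coarser classes an additional within-class term appears.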

Reliability and resolution for the total 16 days are shown in Figs. 9 and 10, respectively. The reliability shows significant improvements by using moist SVs instead of dry SVs, but shows no significant differences with respect to the other changes (Fig. 9, upper panel). This leads to significant improvements between the initial and final REPS configurations for the threshold range 10–45 mm (Fig. 9, lower panel). These improvements are mainly due to the significant improvements observed in August, while we obtain no differences in July (not shown). The use of moist SVs provides more reliable predictions than the use of dry SVs in convective situations, while the reliability remains the same with the two kinds of SVs for stratiform precipitation. Figure 10 (upper panel) shows that the resolution is increased by using moist SVs in the threshold range 15–35 mm, but this improvement is not statistically significant (lower panel). We have seen in Fig. 8 that the skill for the July cases is much larger than for the August cases. This is mainly due to the resolution component of the BSS (not shown for each month). Thus, worse resolution scores in August compared with July are indicative of the reduced predictability of convective processes.

To summarize, the use of moist SVs generally improves the skill of the REPS, which is essentially due to an improvement of the reliability in convective situations. However, the quality of the BSS for quiescent synoptic conditions (August cases) is mostly equivalent to the skill of the sample climatology. This indicates that the current ensemble approach (initial and boundary condition perturbations, as well as model perturbations) does not address ensemble forecasts of convection in a satisfying manner. This is indeed a very challenging topic.

### b. Reliability diagrams

The reliability diagram (a posteriori observed frequency versus a priori predicted probability) provides a detailed visualization of the reliability. Figure 11 gives the reliability diagrams of precipitation for the thresholds 5, 15, and 30 mm for the August cases. We cannot observe clear differences between the reliability of the five systems. Instead, these diagrams and their corresponding sharpness diagrams mainly highlight the problems due to the small size of the verification sample. The high noise level at higher probabilities (around *p* > 50% and *p* > 20% for the 15- and 30-mm thresholds, respectively) does not allow any clear conclusions about the lack of reliability of the REPSs for these probability ranges. Nevertheless, for the 5-mm threshold, the probabilities are underpredicted for the values less than the climatological frequency and overpredicted for the larger values. This characterizes EPSs that are underdispersive (Atger 2004). More generally, if the global slope of the reliability curves is less than 1, the system is underdispersive. For the 15-mm threshold, the underprediction of the lower probabilities is also noticeable, and finally, for the 30-mm threshold, the lower probabilities seem to be reliable.
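The points of a reliability diagram, the associated sharpness counts, and a global slope can be computed as in the sketch below. It is illustrative only: the function names, the 17 probability classes, and the choice of a count-weighted least-squares slope as the measure of the "global slope" are assumptions, not the procedure used to produce Fig. 11.

```python
import numpy as np

def reliability_curve(prob, obs, n_classes=17):
    """Mean predicted probability, observed frequency, and count per non-empty class."""
    prob = np.asarray(prob, dtype=float)
    obs = np.asarray(obs, dtype=float)
    classes = np.clip(np.rint(prob * (n_classes - 1)).astype(int), 0, n_classes - 1)
    counts = np.bincount(classes, minlength=n_classes)
    used = counts > 0
    mean_prob = np.bincount(classes, weights=prob, minlength=n_classes)[used] / counts[used]
    obs_freq = np.bincount(classes, weights=obs, minlength=n_classes)[used] / counts[used]
    # The counts are the sharpness diagram: how often each probability is issued.
    return mean_prob, obs_freq, counts[used]

def weighted_slope(mean_prob, obs_freq, counts):
    """Count-weighted least-squares slope of observed frequency on predicted probability."""
    w = counts / counts.sum()
    p_bar = np.sum(w * mean_prob)
    o_bar = np.sum(w * obs_freq)
    covariance = np.sum(w * (mean_prob - p_bar) * (obs_freq - o_bar))
    variance = np.sum(w * (mean_prob - p_bar) ** 2)
    return float(covariance / variance)
```

Weighting by the class counts limits the influence of sparsely populated high-probability classes, which are the noisy part of the diagrams when the verification sample is small.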

Talagrand diagrams of accumulated 24-h precipitation were produced (not shown). They indicate the existence of a systematic model wet bias for all the experiments. They also show the too-weak ensemble spread of rainfall-rate forecasts (in agreement with Fig. 11). This general wet bias for both July and August is related to the overestimation of light precipitation. A similar diagnosis was made with the reliability diagram for the 5-mm threshold. Among all experiments, DSVF-P6 has the strongest wet bias and highest missing rate (the sum of the frequencies of the two extreme bins; Hou et al. 2001), whereas MSVI+TH-P3 has the smallest missing rate, although there is not a clear reduction of the wet bias.
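For reference, a Talagrand diagram and the missing rate defined above can be computed along the following lines. This is a minimal sketch with hypothetical function names. It ranks each observation by counting the members strictly below it and does not randomize ties, which matter for precipitation when many members and the observation are all zero; a treatment of ties would be needed in practice.

```python
import numpy as np

def talagrand_frequencies(ensemble, obs):
    """Relative frequency of the observation's rank among the ensemble members.

    ensemble has shape (n_cases, n_members); obs has shape (n_cases,).
    """
    ensemble = np.asarray(ensemble, dtype=float)
    obs = np.asarray(obs, dtype=float)
    n_members = ensemble.shape[1]
    # Rank 0: observation below all members; rank n_members: above all members.
    ranks = (ensemble < obs[:, None]).sum(axis=1)
    counts = np.bincount(ranks, minlength=n_members + 1)
    return counts / counts.sum()

def missing_rate(frequencies):
    """Sum of the frequencies of the two extreme bins."""
    return float(frequencies[0] + frequencies[-1])
```

In such a histogram, a wet bias would appear as an overpopulation of the lowest ranks (observations below most members), while a lack of spread would appear as overpopulated extreme bins and hence a high missing rate.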

### c. Measure of the discrimination and potential economic value

The AROC score measures the statistical discrimination capability of an ensemble forecast system. Its information content is quite similar to the resolution term of the BSS. The AROC has been calculated for the 16 tested days, as well as for the July and August cases separately. We do not show the corresponding graphics because all the results and the conclusions are similar to those obtained with the Brier skill score and its resolution component.

Figure 12 depicts the potential economic values (Richardson 2000) of all experiments against the cost/loss ratio (*C*/*L*) for the 15- and 30-mm thresholds. We have chosen those thresholds because they correspond to the range of significant skill improvements (see Fig. 7, lower panel). The climatologies *p _{c}* associated with these thresholds are 15% (15 mm) and 6% (30 mm). We recall that the highest economic value is obtained for the *C*/*L* ratio equal to the climatological probability of the event under consideration. We first note that for all experiments, the REPS provides a better benefit for stratiform precipitation cases (July, Figs. 7a,b) than for convective situations (August, Figs. 7c,d). This result is consistent with the conclusion drawn from the Brier skill scores (Fig. 8). In the August cases, the largest benefit is around 20% for *C*/*L* ratio ranges of 10%–40% (15 mm, Fig. 7c) and 0%–20% (30 mm, Fig. 7d), while the benefit in the July cases is up to 50% for larger *C*/*L* ranges up to 0%–60%. Moreover, in August, we cannot detect any significant changes between the five configurations of the REPS. On the other hand, in July, we observe significant improvements between the initial and final configurations for *C*/*L* ratio ranges of 30%–70% (15 mm, Fig. 7a) and 10%–30% (30 mm, Fig. 7b). For the 15-mm threshold, the significant improvement is mainly due to the increase in the LBC updating frequency (from every 6 h to every 3 h). Unfortunately, this increased LBC updating frequency leads to a degradation of the economic value for the *C*/*L* ratio range of 0%–20%, which is balanced by the introduction of the perturbation of physical processes. For the 30-mm threshold, the significant improvement comes from the increased LBC updating frequency and from the introduction of the surface scheme ISBA.
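The potential economic value can be illustrated with the standard static cost/loss model of Richardson (2000), in which the mean expense of a user acting on the forecasts is compared with the expenses obtained from climatology and from perfect forecasts. The sketch below is an illustration under that model, not the code used to produce Fig. 12; the function names are hypothetical, expenses are expressed in units of the loss *L*, and the cost/loss ratio must lie strictly between 0 and 1.

```python
def economic_value(hit_rate, false_alarm_rate, base_rate, cost_loss):
    """Potential economic value for one decision threshold (cost/loss model)."""
    s, a = base_rate, cost_loss
    # Climatology: always protect (expense a) or never protect (expense s).
    expense_clim = min(a, s)
    # Perfect forecasts: protect only when the event occurs.
    expense_perfect = a * s
    # Forecasts: cost for hits and false alarms, full loss for misses.
    expense_forecast = (a * hit_rate * s
                        + a * false_alarm_rate * (1.0 - s)
                        + (1.0 - hit_rate) * s)
    return (expense_clim - expense_forecast) / (expense_clim - expense_perfect)

def ensemble_value(hit_false_alarm_pairs, base_rate, cost_loss):
    """Envelope over probability thresholds: each user picks the best threshold."""
    return max(economic_value(h, f, base_rate, cost_loss)
               for h, f in hit_false_alarm_pairs)
```

For a fixed pair of hit rate *H* and false alarm rate *F*, this expression peaks when the cost/loss ratio equals the climatological frequency, where it reduces to *H* − *F*. For example, with *H* = 0.8, *F* = 0.2, and a climatological frequency of 15%, the value at *C*/*L* = 15% is 0.6.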

To summarize, increasing the LBC updating frequency (from every 6 h to every 3 h), combined with the introduction of the surface scheme ISBA, significantly improves the economic value of the REPS for stratiform precipitation situations.

## 8. Summary and discussion

A REPS based on GEM-LAM at 15-km resolution has been developed for short-range forecasts (up to 2 days), with focus on quantifying probabilistic QPF. Targeted SV perturbations (dry and moist) and stochastic parameter perturbations have been used to represent uncertainties of ICs and model physics, respectively. To validate the REPS, 16 summer cases in 2003 (8 in July and 8 in August) with abundant precipitation were chosen. Mid–high rainfall rates in July were dominated by stratiform precipitation, while precipitation in August was mainly controlled by convective processes. Five sets of experiments with different strategies were performed and evaluated by comprehensive verification scores. Results of this study are based on 16 cases with relatively heavy rainfall amounts for the northeast North American region. This impacts the actual numerical values of probabilistic scores that use a “climatology” based on that 16-day period, and not on a long-term climatology. It is more difficult to beat the very short “climatology” of the test period than a long-term climatology. This issue is addressed in Hamill and Juras (2006) and Candille et al. (2007).

In general, the difference in scores for dynamical fields when using moist SVs instead of dry SVs is small. Moist SV perturbation experiments provide a slightly better spread–skill relationship and larger ensemble spread before the first 24 h (optimization time of the SVs) than the dry SV perturbation experiment. However, for integration times beyond the optimization time, the ensemble spread growth of the moist SV experiments becomes smaller than the dry SV experiment. The inclusion of stochastic parameter perturbations and the use of higher LBC updating frequency can also slightly increase the ensemble spread. The forecast skill of moist SV experiments is better than the skill of the dry SV experiment for dynamic variables, but this advantage seems to disappear as the forecast lead time increases (this is consistent with the faster spread growth in DSVF-P6 after 24 h). The inclusion of parameter perturbations in the model physics can slightly enhance the ensemble spread and forecast skill.

The use of moist SVs instead of dry SVs leads to a noticeable skill increase for QPF over a wide range of thresholds and slightly reduces the wet bias of precipitation forecasts for both stratiform and convective precipitation. The QPF skill is clearly dependent on the dominant weather regime. Precipitation caused by strong synoptic low pressure systems can be reasonably well forecasted compared to the precipitation caused by convective situations. The quality of the results is also strongly influenced by the following factors: 1) the surface scheme ISBA seems to produce a positive impact on stratiform precipitation forecasts for thresholds in the range 20–35 mm per 24 h; 2) increasing the LBC updating frequency produces a significant positive impact on QPF for thresholds larger than 10 mm per 24 h; 3) stochastic parameter perturbations tend to significantly enhance the performance of the REPS for the light rainfall rates. Moreover, these stochastic perturbations also seem to enhance QPF performance for heavy rainfall rates, but these results are not statistically significant because of the small sample size.

Poorer results are obtained when precipitation is mainly controlled by weak synoptic forcing favoring convective processes. These results are clearly less sensitive to the different perturbation and piloting strategies that were tested in this study. This shows a clear limitation of this REPS for probabilistic forecasts of convective precipitation. However, the probabilistic measures used here might not necessarily reveal all the information content of the REPS simulations. As discussed in Du et al. (2000) and Buehner and Charron (2007), horizontal shifting of the ensemble members can help improve scores without using additional simulations.

Nevertheless, the very limited prediction skill of precipitation for convective systems by stochastic parameter perturbations and especially SV perturbations (the latter being the dominant perturbation factor in the present REPS) is an important issue. Under quiescent synoptic conditions, initial condition perturbations with SVs, whether moist or dry, do not provide good probabilistic convective precipitation forecasts. The modest improvement when using moist SVs instead of dry SVs could be related to the stronger energy shift to smaller scales with moist SVs (see Coutinho et al. 2004). The use in a regional ensemble forecast system of low-resolution SVs with a 24-h optimization time might not be optimal to sample fast and small-scale processes such as deep convection. In the future, SVs calculated on the finer scale limited area will be tested. This could allow for a better ensemble sampling of fast and small-scale processes.

Stochastic perturbations of sensitive parameters in some model parameterizations seem to provide a useful method to account for part of the model error. However, it is likely that a multiparameterization approach (or the use of inherently stochastic parameterizations) would produce more diverse realizations and could be potentially more effective in probabilistic forecasts than the multiparameter approach. Nevertheless, the relatively simple parameter perturbations of the deep convection and condensation schemes performed here showed that the inclusion of some precipitation-related stochasticity can improve the QPF probabilistic skill.

Lateral boundary condition perturbations were provided by SVs calculated using a global initial norm and a targeted final norm. Thus, a global EPS piloting the regional EPS is needed. Another interesting approach that we are considering is to use so-called boundary singular vectors to provide boundary perturbations to the LAMs without the need of generating an ensemble of global piloting integrations. This would certainly reduce the numerical cost compared with the approach followed in the present study.

In brief, the main results of this work are as follows:

- The use of moist SVs generates larger spread for dynamical fields than dry SVs, but the Talagrand diagrams are almost unaffected.
- For piloting the limited-area model, an LBC updating frequency of 3 h improves results over a 6-h updating frequency.
- The use of moist SVs, the ISBA surface scheme, and stochastic parameter perturbations yields the best probabilistic precipitation forecasts in terms of Brier skill scores and potential economic value.
- Initial and boundary condition perturbations with large-scale SVs optimized over a 24-h time span, as well as stochastic parameter perturbations, generate poor results for probabilistic precipitation forecasts when the synoptic situation is quiescent.

In the future, the availability of a higher-resolution operational global EPS at the CMC will also allow us to test another boundary condition perturbation strategy. An ensemble Kalman filter is used to generate a pool of initial conditions to the Canadian global EPS. These integrations could also serve as an ensemble of lateral boundary conditions for a regional EPS.

## Acknowledgments

We thank Mark Buehner and Ayrton Zadra for fruitful discussions related to the singular vector method, Stéphane Bélair for help related to the ISBA and Kain–Fritsch schemes, and Peter Houtekamer, Ayrton Zadra, Peter Yau, and two anonymous reviewers for comments leading to improvements of the manuscript. Part of this work was funded by the Canadian Foundation for Climate and Atmospheric Sciences (CFCAS).

## REFERENCES

Anderson, J. L., 1996: A method for producing and evaluating probabilistic forecasts from ensemble model integrations. *J. Climate*, **9**, 1518–1530.

Atger, F., 2004: Relative impact of model quality and ensemble deficiencies on the performance of ensemble based probabilistic forecasts evaluated through the Brier score. *Nonlinear Processes Geophys.*, **11**, 399–409.

Barkmeijer, J., 1996: Constructing fast-growing perturbations for the nonlinear regime. *J. Atmos. Sci.*, **53**, 2838–2851.

Barkmeijer, J., R. Buizza, and T. N. Palmer, 1999: 3D-Var Hessian singular vectors and their potential use in the ECMWF ensemble prediction system. *Quart. J. Roy. Meteor. Soc.*, **125**, 2333–2351.

Bélair, S., L-P. Crevier, J. Mailhot, B. Bilodeau, and Y. Delage, 2003: Operational implementation of the ISBA land surface scheme in the Canadian regional weather forecast model. Part I: Warm season results. *J. Hydrometeor.*, **4**, 352–370.

Benoît, R., J. Côté, and J. Mailhot, 1989: Inclusion of a TKE boundary layer parameterization in the Canadian regional finite-element model. *Mon. Wea. Rev.*, **117**, 1726–1750.

Bishop, C. H., B. J. Etherton, and S. J. Majumdar, 2001: Adaptive sampling with the ensemble transform Kalman filter. Part I: Theoretical aspects. *Mon. Wea. Rev.*, **129**, 420–436.

Brier, G. W., 1950: Verification of forecasts expressed in terms of probability. *Mon. Wea. Rev.*, **78**, 1–3.

Bright, D. R., and S. L. Mullen, 2002: Short-range ensemble forecasts of precipitation during the southwest monsoon. *Wea. Forecasting*, **17**, 1080–1100.

Buehner, M., and A. Zadra, 2006: Impact of flow-dependent analysis error covariance norms on extra-tropical singular vectors. *Quart. J. Roy. Meteor. Soc.*, **132**, 625–646.

Buehner, M., and M. Charron, 2007: Spectral and spatial localization of background-error correlations for data assimilation. *Quart. J. Roy. Meteor. Soc.*, **133**, 615–630.

Buizza, R., 1994: Localization of optimal perturbations using a projection operator. *Quart. J. Roy. Meteor. Soc.*, **120**, 1647–1681.

Buizza, R., and T. N. Palmer, 1995: The singular-vector structure of the atmospheric global circulation. *J. Atmos. Sci.*, **52**, 1434–1456.

Buizza, R., A. Hollingsworth, F. Lalaurette, and A. Ghelli, 1999a: Probabilistic predictions of precipitation using the ECMWF ensemble prediction system. *Wea. Forecasting*, **14**, 168–189.

Buizza, R., M. Miller, and T. N. Palmer, 1999b: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. *Quart. J. Roy. Meteor. Soc.*, **125**, 2887–2908.

Candille, G., C. Côté, P. L. Houtekamer, and G. Pellerin, 2007: Verification of an ensemble prediction system against observations. *Mon. Wea. Rev.*, **135**, 2688–2699.

Chessa, P. A., G. Ficca, M. Marrocu, and R. Buizza, 2004: Application of a limited-area short-range ensemble forecast system to a case of heavy rainfall in the Mediterranean region. *Wea. Forecasting*, **19**, 566–581.

Côté, J., S. Gravel, A. Méthot, A. Patoine, M. Roch, and A. Staniforth, 1998a: The operational CMC–MRB Global Environmental Multiscale (GEM) model. Part I: Design considerations and formulation. *Mon. Wea. Rev.*, **126**, 1373–1395.

Côté, J., J-G. Desmarais, S. Gravel, A. Méthot, A. Patoine, M. Roch, and A. Staniforth, 1998b: The operational CMC–MRB Global Environmental Multiscale (GEM) model. Part II: Results. *Mon. Wea. Rev.*, **126**, 1397–1418.

Coutinho, M. M., B. J. Hoskins, and R. Buizza, 2004: The influence of physical processes on extratropical singular vectors. *J. Atmos. Sci.*, **61**, 195–209.

Deardorff, J. W., 1978: Efficient prediction of ground surface temperature and moisture with inclusion of a layer of vegetation. *J. Geophys. Res.*, **83**, 1889–1903.

Du, J., and M. S. Tracton, 2001: Implementation of a real-time short-range ensemble forecasting system at NCEP: An update. Preprints, *Ninth Conf. on Mesoscale Processes*, Ft. Lauderdale, FL, Amer. Meteor. Soc., 355–356.

Du, J., S. L. Mullen, and F. Sanders, 1997: Short-range ensemble forecasting of quantitative precipitation. *Mon. Wea. Rev.*, **125**, 2427–2459.

Du, J., S. L. Mullen, and F. Sanders, 2000: Removal of distortion error from an ensemble forecast. *Mon. Wea. Rev.*, **128**, 3347–3351.

Du, J., G. DiMego, M. S. Tracton, and B. Zhou, 2003: NCEP short-range ensemble forecasting (SREF) system: Multi-IC, multi-model and multi-physics approach. Research Activities in Atmospheric and Oceanic Modelling. J. Côté, Ed., Rep. 33, CAS/JSC Working Group on Numerical Experimentation (WGNE), WMO Tech. Doc. 1161, 5.09–5.10.

Efron, B., and R. Tibshirani, 1993: *An Introduction to the Bootstrap*. Chapman and Hall, 436 pp.

Ehrendorfer, M., R. M. Errico, and K. D. Raeder, 1999: Singular vector perturbation growth in a primitive equation model with moist physics. *J. Atmos. Sci.*, **56**, 1627–1648.

Frogner, I-L., and T. Iversen, 2001: Targeted ensemble prediction for northern Europe and parts of the North Atlantic Ocean. *Tellus*, **53A**, 35–55.

Frogner, I-L., and T. Iversen, 2002: High-resolution limited-area ensemble predictions based on low-resolution targeted singular vectors. *Quart. J. Roy. Meteor. Soc.*, **128**, 1321–1341.

Garand, L., 1983: Some improvements and complements to the infrared emissivity algorithm including a parameterization of the absorption in the continuum region. *J. Atmos. Sci.*, **40**, 230–243.

Garand, L., and J. Mailhot, 1990: The influence of infrared radiation on numerical weather forecasts. Preprints, *Seventh Conf. on Atmospheric Radiation*, San Francisco, CA, Amer. Meteor. Soc., 146–151.

Gauthier, P., C. Charette, L. Fillion, P. Koclas, and S. Laroche, 1999: Implementation of a 3D variational data assimilation system at the Canadian Meteorological Centre. Part I: The global analysis. *Atmos.–Ocean*, **37**, 103–156.

Grimit, E. P., and C. F. Mass, 2002: Initial results of a mesoscale short-range ensemble forecasting system over the Pacific Northwest. *Wea. Forecasting*, **17**, 192–205.

Hamill, T. M., and S. J. Colucci, 1997: Verification of Eta-RSM short-range ensemble forecasts. *Mon. Wea. Rev.*, **125**, 1312–1327.

Hamill, T. M., and J. Juras, 2006: Measuring forecast skill: Is it real skill or is it the varying climatology? *Quart. J. Roy. Meteor. Soc.*, **132**, 2905–2923.

Hersbach, H., R. Mureau, J. D. Opsteegh, and J. Barkmeijer, 2000: A short-range to early-medium-range ensemble prediction system for the European area. *Mon. Wea. Rev.*, **128**, 3501–3519.

Hoskins, B. J., and M. M. Coutinho, 2005: Moist singular vectors and the predictability of some high impact European cyclones. *Quart. J. Roy. Meteor. Soc.*, **131**, 581–601.

Hou, D., E. Kalnay, and K. K. Droegemeier, 2001: Objective verification of the SAMEX ’98 ensemble forecasts. *Mon. Wea. Rev.*, **129**, 73–91.

Jolliffe, I. T., and D. B. Stephenson, 2003: *Forecast Verification: A Practitioner’s Guide in Atmospheric Science*. Wiley and Sons, 240 pp.

Kain, J. S., 2004: The Kain–Fritsch convective parameterization: An update. *J. Appl. Meteor.*, **43**, 170–180.

Kain, J. S., and J. M. Fritsch, 1990: A one-dimensional entraining/detraining plume model and its application in convective parameterization. *J. Atmos. Sci.*, **47**, 2784–2802.

Kain, J. S., and J. M. Fritsch, 1992: The role of the convective “trigger function” in numerical forecasts of mesoscale convective systems. *Meteor. Atmos. Phys.*, **49**, 93–106.

Kuo, H. L., 1965: On formation and intensification of tropical cyclones through latent heat release by cumulus convection. *J. Atmos. Sci.*, **22**, 40–63.

Kuo, H. L., 1974: Further studies on the parameterization of the influence of cumulus convection on large-scale flow. *J. Atmos. Sci.*, **31**, 1231–1240.

Laprise, R., 1992: The Euler equations of motion with hydrostatic pressure as independent variable. *Mon. Wea. Rev.*, **120**, 197–207.

Laroche, S., P. Gauthier, J. St-James, and J. Morneau, 1999: Implementation of a 3D variational data assimilation system at the Canadian Meteorological Centre. Part II: The regional analysis. *Atmos.–Ocean*, **37**, 281–307.

Lin, J. W. B., and J. D. Neelin, 2000: Influence of a stochastic moist convective parameterization on tropical climate variability. *Geophys. Res. Lett.*, **27**, 3691–3694.

Lorenz, E. N., 1965: A study of the predictability of a 28-variables atmospheric model. *Tellus*, **17**, 321–333.

Mahfouf, J-F., B. Brasnett, and S. Gagnon, 2007: A Canadian Precipitation Analysis (CaPA) project: Description and preliminary results. *Atmos.–Ocean*, **45**, 1–17.

Mailhot, J., and R. Benoît, 1982: A finite-element model of the atmospheric boundary layer suitable for use with numerical weather prediction models. *J. Atmos. Sci.*, **39**, 2249–2266.

Marsigli, C., A. Montani, F. Nerozzi, T. Paccagnella, S. Tibaldi, F. Molteni, and R. Buizza, 2001: A strategy for higher-resolution ensemble prediction. Part II: Limited-area experiments in four Alpine flood events. *Quart. J. Roy. Meteor. Soc.*, **127**, 2095–2115.

Mason, I., 1982: A model for assessment of weather forecasts. *Aust. Meteor. Mag.*, **30**, 291–303.

McFarlane, N. A., 1987: The effect of orographically excited gravity wave drag on the general circulation of the lower stratosphere and troposphere. *J. Atmos. Sci.*, **44**, 1775–1800.

Miguez-Macho, G., and J. Paegle, 2001: Sensitivity of North American numerical weather prediction to initial state uncertainty in selected upstream subdomains. *Mon. Wea. Rev.*, **129**, 2005–2022.

Molteni, F., and T. N. Palmer, 1993: Predictability and finite time instability of the northern winter circulation. *Quart. J. Roy. Meteor. Soc.*, **119**, 269–298.

Molteni, F., R. Buizza, T. N. Palmer, and T. Petroliagis, 1996: The ECMWF ensemble prediction system: Methodology and validation. *Quart. J. Roy. Meteor. Soc.*, **122**, 73–119.

Montani, A., C. Marsigli, F. Nerozzi, T. Paccagnella, S. Tibaldi, and R. Buizza, 2003: The Soverato flood in southern Italy: Performance of global and limited-area ensemble forecasts. *Nonlinear Processes Geophys.*, **10**, 261–274.

Mullen, S. L., and D. P. Baumhefner, 1989: The impact of initial condition uncertainty on numerical simulations of large-scale explosive cyclogenesis. *Mon. Wea. Rev.*, **117**, 2800–2821.

Murphy, A. H., 1973: A new vector partition of the probability score. *J. Appl. Meteor.*, **12**, 595–600.

Noilhan, J., and S. Planton, 1989: A simple parameterization of land surface processes for meteorological models. *Mon. Wea. Rev.*, **117**, 536–549.

Nutter, P., D. Stensrud, and M. Xue, 2004: Effects of coarsely resolved and temporally interpolated lateral boundary conditions on the dispersion of limited-area ensemble forecasts. *Mon. Wea. Rev.*, **132**, 2358–2377.

Palmer, T. N., R. Gelaro, J. Barkmeijer, and R. Buizza, 1998: Singular vectors, metrics, and adaptive observations. *J. Atmos. Sci.*, **55**, 633–653.

Pudykiewicz, J., R. Benoît, and J. Mailhot, 1992: Inclusion and verification of a predictive cloud water scheme in a regional weather prediction model. *Mon. Wea. Rev.*, **120**, 612–626.

Richardson, D. S., 2000: Skill and economic value of the ECMWF ensemble prediction system. *Quart. J. Roy. Meteor. Soc.*, **126**, 649–668.

Stensrud, D. J., and N. Yussouf, 2003: Short-range ensemble predictions of 2-m temperature and dewpoint temperature over New England. *Mon. Wea. Rev.*, **131**, 2510–2524.

Stensrud, D. J., H. E. Brooks, J. Du, M. S. Tracton, and E. Rogers, 1999: Using ensembles for short-range forecasting. *Mon. Wea. Rev.*, **127**, 433–446.

Stensrud, D. J., J. W. Bao, and T. T. Warner, 2000: Using initial condition and model physics perturbation in short-range ensemble simulations of mesoscale convective systems. *Mon. Wea. Rev.*, **128**, 2077–2107.

Sundqvist, H., E. Berge, and J. E. Kristjansson, 1989: Condensation and cloud parameterization studies with a mesoscale numerical weather prediction model. *Mon. Wea. Rev.*, **117**, 1641–1657.

Talagrand, O., R. Vautard, and B. Strauss, 1999: Evaluation of probabilistic prediction systems. *Proc. ECMWF Workshop on Predictability*, Reading, United Kingdom, ECMWF, 1–25.

Vannitsem, S., and Z. Toth, 2002: Short-term dynamics of model errors. *J. Atmos. Sci.*, **59**, 2594–2604.

Wandishin, M. S., S. L. Mullen, D. J. Stensrud, and H. E. Brooks, 2001: Evaluation of a short-range multimodel ensemble system. *Mon. Wea. Rev.*, **129**, 729–747.

Wilcoxon, F., 1945: Individual comparisons by ranking methods. *Biometrics*, **1**, 80–83.

Wilks, D. S., 2005: Effects of stochastic parameterization in the Lorenz ’96 system. *Quart. J. Roy. Meteor. Soc.*, **131**, 389–407.

Yang, Z., and R. W. Arritt, 2002: Tests of a perturbed physics ensemble approach for regional climate modeling. *J. Climate*, **15**, 2881–2896.

Yeh, K-S., J. Côté, S. Gravel, A. Méthot, A. Patoine, M. Roch, and A. Staniforth, 2002: The CMC–MRB Global Environmental Multiscale (GEM) model. Part III: Nonhydrostatic formulation. *Mon. Wea. Rev.*, **130**, 339–356.

Yu, W., L. Garand, and A. P. Dastoor, 1997: Evaluation of model clouds and radiation at 100 km scale using GOES data. *Tellus*, **49A**, 246–262.

Zadra, A., M. Roch, S. Laroche, and M. Charron, 2003: The subgrid-scale orographic parameterization of the GEM model.

,*Atmos.–Ocean***41****,**155–170.Zadra, A., M. Buehner, S. Laroche, and J-F. Mahfouf, 2004: Impact of the GEM model simplified physics on extra-tropical singular vectors.

,*Quart. J. Roy. Meteor. Soc.***130****,**2541–2569.

Parameter settings for stochastically perturbed functions.

Main characteristics of the experiments.