1. Why observations of clouds are not included in present-day assimilation systems
Accurate weather forecasts rely to a large extent on obtaining accurate estimates of the instantaneous state of the atmosphere as initial conditions for the forecast model. Those initial conditions are produced using data assimilation, a process that combines observations with short-term forecasts to produce an estimate of the state that is as close as possible to the observations while remaining consistent with the error statistics of the model and the observations.
In the context of data assimilation, the complete description of all prognostic variables at all locations is called the “state vector,” while the (possibly transformed) subspace of this state adjusted during assimilation is called the “control vector.” For models of the earth’s atmosphere the state vector comprises the values of wind velocity, temperature, and water vapor (or related quantities such as vorticity, divergence, and potential temperature) in each model grid cell. In comprehensive models the state vector also includes variables related to clouds, typically the concentration of one or more species of condensed water (liquid, ice, or the combination). Models may also include some representation of the subgrid-scale distribution of cloud condensate, the simplest of which is the proportion of the grid cell occupied by clouds, usually called the “cloud fraction.” In the global models with which we are concerned, the equations linking these variables to the rest of the state vector include the effects of both explicit processes (e.g., condensation via large-scale cooling) and parameterized processes (e.g., detrainment from moist convection), and different cloud schemes are coupled more or less loosely to the rest of the model state.
Although clouds may be part of the state vector, clouds are shunned in most assimilation systems: cloud variables are not part of the control vector, nor are observations of cloud quantities typically included in the observing network. There are both theoretical and practical reasons for this, most of which are discussed in the review paper by Errico et al. (2007). Among the most daunting are the difficulties inherent in observing clouds, including non-unique relationships between observables and cloud variables and the difficulty of obtaining representative measurements in highly variable cloud fields. In addition, the parameterizations that connect cloud variables to other aspects of the model state, especially in global models, contain thresholds and strong nonlinearities. This implies that cloud variables are likely to violate many of the principles underlying operational data assimilation methods, including linear relationships between variables (as expressed by the covariance matrix) and a Gaussian distribution of model background and observational errors.
Progress has been made recently in using “cloud-affected” observations. At the European Centre for Medium-Range Weather Forecasts, for example, experiments have been done in which a diagnostic cloud scheme (Tompkins and Janisková 2004) determines cloud properties and their effect on the predicted satellite infrared radiance observations during the minimization steps of a variational assimilation scheme, leading to reduced errors in temperature and humidity fields (Chevallier et al. 2004). Experiments with other observation types in global models have had a more neutral impact (Benedetti and Janisková 2008). To date, however, clouds have been omitted from the control vector in global data assimilation, so that the model’s clouds in an analysis are potentially inconsistent with the rest of the model’s state.
This inconsistency may not be a problem. Clouds react quickly to environmental conditions. They are produced chiefly by grid-scale (parameterized) physics (Tiedtke 1993; Wilson et al. 2008) and are not subject to large-scale balance constraints (Errico et al. 2007). All these factors imply that clouds in a large-scale model may come along for the ride if the rest of the model state is accurate.
But data assimilation is, after all, designed specifically to wring the maximum amount of information from every observation, and there may be benefits to including the cloud state in assimilation that have yet to be uncovered. In cloud-scale models (i.e., those in which clouds are explicitly resolved) the assimilation of cloud observations can lead to improved estimates of the model state (Vukicevic et al. 2004, 2006), and one can imagine that some of this benefit might be available to global models with parameterized clouds.
Here we investigate the amount by which short-range forecasts produced by global models might be improved by fully including clouds in the data assimilation process, both by assimilating observations of clouds and by including the cloudy part of the model state in the control vector. We examine two global models using two different cloud schemes within a single ensemble data assimilation system. We consider the best-case scenario in which the assimilating model is assumed to be perfect and the cloud variables used in each model’s state vector are observed directly. The resulting impact of cloud observations on the state estimates in the two models is an optimistic estimate of what may be achieved in more realistic situations, with real cloud observations and imperfect models. The next section provides an overview of the two models and the data assimilation system. This is followed by a description of perfect model experiments that assess the improvement in analyses and short-range forecasts stemming from the assimilation. The final section describes the practical relevance of our results.
2. Perfect model experiments with two forecast models and a single data assimilation system
Current data assimilation systems may be divided into two broad categories: those based on variational analyses and those based on ensemble filters (Evensen 1994; Houtekamer and Mitchell 1998). Variational systems are more common in operational centers, but the two families are competitive in performance (Houtekamer and Mitchell 2005; Kalnay et al. 2007; Buehner et al. 2010). One important distinction is that ensemble assimilation systems do not require an adjoint or tangent-linear approximation to the forecast model. This is particularly attractive when assimilating clouds because the thresholds and strong nonlinearities in most cloud schemes make adjoints especially hard to construct. From a practical point of view, too, the ability to use ensemble assimilation schemes without having to build an adjoint also makes it relatively easy to use different forecast models and/or different cloud schemes with little additional effort. We have taken advantage of this simplicity to couple two global models of the atmosphere to a single ensemble data assimilation system, allowing us to examine the potential benefits of assimilating cloud observations with two different cloud schemes using uniform observational networks and data assimilation algorithms.
a. Two global forecast models
One model is the Community Atmosphere Model (CAM) 3.5.06, a descendant of version 3 of the atmospheric component of the National Center for Atmospheric Research (NCAR) Community Climate System Model (Collins et al. 2006). The most significant difference with respect to CAM 3 is the change to a latitude–longitude version of the Lin–Rood finite volume dynamical core (Lin and Rood 1996; Lin 2004). The default version of this core includes both algebraic and Fourier polar filters; we have opted to use only the latter to avoid infrequent, but large, transient oscillations in wind speed at the transition between the filters [see Fig. 8 in Anderson et al. (2009)]. CAM 3.5.06 also includes small changes to the convection and cloud schemes. We run the model with a grid spacing of 1.9° × 2°. The cloud scheme contains separate prognostic equations for cloud liquid and ice water content (Boville et al. 2006); cloud fraction is determined diagnostically.
The second model is an interim version of the Atmospheric Model (AM) developed at the Geophysical Fluid Dynamics Laboratory (GFDL). This is a descendant of the atmospheric component of the atmosphere–ocean model CM2.1 (Delworth et al. 2006) that couples the physics parameterizations developed for the atmospheric model AM2 (GFDL Global Atmospheric Model Development Team 2004) to another variant of the Lin–Rood finite-volume core, this one on a 2° × 2.5° grid. The cloud scheme follows Tiedtke (1993) and includes prognostic equations for cloud fraction as well as cloud liquid and ice water contents.
b. An adaptable data assimilation system
We use AM2 and CAM in conjunction with the Data Assimilation Research Testbed (DART; Anderson et al. 2009). DART is a flexible ensemble data assimilation system that can be coupled to new models relatively easily. The experiments described below use an ensemble adjustment Kalman filter (EAKF; Anderson 2001) and 80-member ensembles. We assimilate observations in 6-h time windows centered on 0000, 0600, 1200, and 1800 UTC. The influence of observations is localized (Gaspari and Cohn 1999) to limit spurious correlations that might arise from sampling with small ensembles, and the prior distribution of model states is inflated by an amount that adapts in space and time (Anderson 2007, 2009). Inflation is normally used to prevent systematic model errors and nonlinearities from causing the ensemble to drift so far from the observations that the observations are effectively rejected; in equilibrated perfect model experiments inflation is needed, in principle, only to account for sampling error due to the finite ensemble size and should remain quite near one.
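The localization taper of Gaspari and Cohn (1999) referred to above is a compactly supported, fifth-order piecewise-rational approximation to a Gaussian that falls exactly to zero at twice its half-width. As an illustrative sketch only (DART's own implementation may differ in detail), it can be written as:

```python
import numpy as np

def gaspari_cohn(z, c):
    """Fifth-order piecewise-rational localization function of
    Gaspari and Cohn (1999): 1 at zero separation, exactly 0
    beyond 2c. `z` is the separation and `c` the half-width."""
    r = np.atleast_1d(np.abs(z)) / c
    f = np.zeros_like(r)
    near = r <= 1.0
    far = (r > 1.0) & (r < 2.0)
    f[near] = (-0.25 * r[near]**5 + 0.5 * r[near]**4
               + 0.625 * r[near]**3 - (5.0 / 3.0) * r[near]**2 + 1.0)
    f[far] = (r[far]**5 / 12.0 - 0.5 * r[far]**4 + 0.625 * r[far]**3
              + (5.0 / 3.0) * r[far]**2 - 5.0 * r[far] + 4.0
              - 2.0 / (3.0 * r[far]))
    return f
```

Multiplying the sample covariance between an observation and each state variable by this taper, evaluated at their separation, confines an observation's influence to its neighborhood and suppresses the spurious long-range correlations expected from an 80-member sample.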
We developed this combination—the DART data assimilation system coupled to two models normally used to make climate projections rather than weather forecasts—specifically to understand the role that cloud schemes play in determining the utility of cloud observations. We do not expect that we have obtained optimum forecasts. The data assimilation system, for example, has not been carefully tuned to work with either model (and, in fact, this is the first time AM2 has been coupled to a data assimilation system at all). Neither of our forecast models is operational and, though variants have been tested using short forecasts from externally imposed initial conditions (see, e.g., Xie et al. 2008; Hannay et al. 2009), the models were developed to make climate projections, and neither has been used routinely as part of a data assimilation/forecasting cycle. As we show below, the benefit of accounting for clouds during data assimilation depends on the details of the model (and, presumably, the data assimilation system, observation networks, etc.). Our results illuminate certain classes of behavior but do not quantitatively predict the benefit in other circumstances.
c. Generating perfect observations
We assess the maximum benefit that might be provided by cloud observations using perfect model experiments (i.e., observing system simulation experiments with interpolated identity observations). These use a single integration of the predictive model as the “truth” from which synthetic observations are obtained. Perfect model experiments eliminate systematic differences between model and observations. They also remove many of the observational problems associated with interpreting cloud observations during data assimilation, especially those related to representativeness and nonlinear forward operators. This maximizes the benefit that cloud observations can provide to each model (subject to further tuning of the model for forecasts and its integration with the data assimilation system).
We consider two sets (networks) of observations. Conventional observations include measurements of wind velocity components u and υ, temperature T, and specific humidity q obtained from radiosondes; wind velocity and temperature from commercial aircraft [i.e., the Aircraft Communications Addressing and Reporting System (ACARS)]; and estimates of wind velocity obtained from satellite cloud tracking. Observation locations, times, and variances come from the metadata describing the observations used in the National Centers for Environmental Prediction (NCEP) reanalysis (Kistler et al. 2001) for July 2007.
Synthetic observations of cloud quantities are made on a regular geodesic grid with a spacing of roughly 2° at the equator, so that there is a maximum of one observation per model grid cell and roughly 1200 observation locations globally. Identity observations are produced for the cloud variables used to describe the model state, that is, cloud liquid and ice specific humidity (ql and qi, respectively) in both models and cloud fraction cf in AM2. Observations of clouds are produced at 300, 500, 700, and 900 hPa from the model state at the center of each time window. The error (standard deviation) in liquid and ice water contents is specified as 10% of the value of the observation, with a minimum of 10−6 kg kg−1, while the error in cloud fraction is 5%, ramping linearly to 1% when cloud fraction is within 10% of its upper or lower bound. These error estimates are arbitrary, though they do reflect the fact that cloud fraction, which relies only on distinguishing clouds from clear air, is easier to observe than any continuous variable.
Synthetic observations are created from a free run of the model. We advance a single copy of the model between observation times 6 h apart using boundary conditions and other forcings from July 2007; the state of this realization is treated as “truth.” For each observation in each network, we use the appropriate forward operator (i.e., linear interpolation) to predict the value of the observation from the model state, then add Gaussian random noise with the specified observation error variance. This produces synthetic observations consistent with both the time-evolving state of the model and the uncertainty associated with each observation. Cloud water and ice contents cannot be negative, so synthesized observations (truth plus observational noise) less than 0 are set to 0, while cloud fraction is also bounded above. These truncations introduce small biases in the synthesized observations that we have ameliorated somewhat by using the variable error estimates described in the previous paragraph.
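The synthesis step and the error models described above can be sketched as follows. The function below is a hypothetical illustration (its name and structure are ours, not DART's), and it assumes the `truth` value has already been interpolated to the observation location:

```python
import numpy as np

def synthesize_obs(truth, kind, rng):
    """Turn a 'true' interpolated model value into one synthetic
    observation, following the error models in the text: condensate
    errors are 10% of the value (of the truth here, for simplicity)
    with a floor of 1e-6 kg/kg; cloud-fraction errors are 5%,
    ramping linearly to 1% within 0.1 of either bound. The noisy
    value is then truncated to the physically allowed range.
    Returns (observation, error standard deviation)."""
    if kind == "condensate":                    # q_l or q_i, kg/kg
        sigma = max(0.10 * truth, 1.0e-6)
        obs = truth + rng.normal(0.0, sigma)
        return max(obs, 0.0), sigma             # nonnegative
    if kind == "cloud_fraction":                # bounded on [0, 1]
        edge = min(truth, 1.0 - truth)          # distance to nearer bound
        sigma = 0.01 + 0.04 * min(edge / 0.1, 1.0)
        obs = truth + rng.normal(0.0, sigma)
        return min(max(obs, 0.0), 1.0), sigma
    raise ValueError(kind)

rng = np.random.default_rng(20070701)  # arbitrary seed
```

The truncation at the end is the source of the small biases mentioned above: noise that would carry cloud water below zero, or cloud fraction outside [0, 1], is clipped rather than redrawn.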
3. The impact of assimilating cloud observations
a. Generating test ensembles
We begin with ensembles that represent each model’s climatological spread during July. We construct these ensembles by extracting the conditions on 1 July for each year of a 20-yr simulation and integrating the ensemble in 5-day increments until we have 80 sets of initial conditions. We spin up the assimilation system from this climatological state by assimilating the synthetic conventional observations described in section 2c for 10 days. (We made the somewhat arbitrary decision to exclude specific humidity observations, but this has very little effect, as shown in section 3c.) The ensembles begin to equilibrate with respect to the observations within a few days (Fig. 1): the RMS difference in all quantities between the “truth” and the ensemble mean prior (6-h forecast) distribution decreases with time and is still decreasing slowly at the end of 10 days. (Full equilibration takes about a month.) Biases (not shown) are small (≈5%–10% of the RMSE at any given time), which reflects the fact that the observations are constructed so that the model has no systematic errors.
Median inflation approaches a value slightly greater than unity as the ensembles move toward equilibration, again because the observations are constructed so as to have no systematic errors. Figure 2 shows the inflation for T; inflation is quantitatively similar for different variables at corresponding times and locations. During the first few assimilation cycles, however, the inflation is substantially greater than one, particularly for AM2. Climatological ensembles are broad by construction; to the extent that the distributions are also non-Gaussian, one expects model background error estimates to be larger than optimal. Under these circumstances observations are weighted too heavily and the ensemble spread is reduced too aggressively. Inflation is then needed in subsequent assimilation cycles because the observations appear to be improbable. This issue is more pronounced for AM2 than for CAM.
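The inflation being diagnosed here is multiplicative: each member's deviation from the ensemble mean is scaled by the square root of the inflation factor, which leaves the mean unchanged and multiplies the ensemble variance by the factor itself. A minimal sketch (the space- and time-adaptive estimation of the factor in Anderson 2007, 2009 is beyond this illustration):

```python
import numpy as np

def inflate(ensemble, lam):
    """Multiplicative covariance inflation: preserves the ensemble
    mean and scales the ensemble variance by `lam` (lam >= 1 widens
    the prior, compensating for sampling error and model drift)."""
    xbar = ensemble.mean()
    return xbar + np.sqrt(lam) * (ensemble - xbar)
```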
The assimilation process accounts for errors in both model and observations when fitting the model ensemble to the observations, so we expect the RMSE and ensemble spread of observed variables to be commensurate with each other in perfect model experiments. It is a little surprising, then, that the ensemble spread in AM2, particularly in T, is markedly smaller than the RMSE (Fig. 1). Inflation could cause this behavior, but it is quite near one for most of the time period (Fig. 2). This difference more likely reflects the fact that the localization half-widths are the same (0.2 rad) in both experiments. The localization scale is chosen to maximize the utility of observations while minimizing sampling errors from finite ensembles, and there is no a priori reason to expect the same value to work equally well in both models. In particular, deficient spread in AM2 is consistent with that model having correlations more localized in space than in CAM because the relationships among variables are more nonlinear and/or higher-dimensional. We have chosen to live with the suboptimal performance of the DART + AM2 system, rather than tuning the localization scales for each model, so that results for the two models can be compared directly.
Ensemble data assimilation determines the covariances between state variables as a function of time and space from the ensemble itself, then adjusts all variables consistent with the observations, the observation errors, and the background errors (as modeled by the ensemble covariance). This means that estimates of all quantities, including those not directly observed, are improved by observations (as long as the covariances between these quantities and the observed variables are nonzero). In particular, errors in specific humidity and cloud water content decline as the ensemble adjusts to the observations over time (Fig. 1), though the RMSE, particularly for cloud quantities, is much larger than the ensemble spread. These improvements suggest that the converse will also hold, that is, that sufficiently accurate observations of cloud parameters may also improve estimates of the overall state of the model.
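For a single observation, this logic (update the observed variable, then regress the increments onto everything else) can be sketched in scalar form following Anderson (2001, 2003). This is an illustration without localization or inflation, not the production EAKF:

```python
import numpy as np

def eakf_update(x_obs, x_unobs, y, r):
    """Scalar ensemble adjustment Kalman filter update in two steps.
    x_obs   : prior ensemble of the observed variable
    x_unobs : prior ensemble of one unobserved state variable
    y, r    : observed value and observation error variance
    Step 1 shifts and contracts x_obs toward the observation;
    step 2 regresses those increments onto x_unobs using the
    prior sample covariance (this is how an observation of T
    can adjust, e.g., cloud water, and vice versa)."""
    xb, vb = x_obs.mean(), x_obs.var(ddof=1)
    vp = 1.0 / (1.0 / vb + 1.0 / r)          # posterior variance
    xp = vp * (xb / vb + y / r)              # posterior mean
    x_obs_post = xp + np.sqrt(vp / vb) * (x_obs - xb)
    dx = x_obs_post - x_obs                  # per-member increments
    beta = np.cov(x_unobs, x_obs, ddof=1)[0, 1] / vb
    return x_obs_post, x_unobs + beta * dx
```

The regression coefficient `beta` embodies the sample covariance discussed above: when it is zero the unobserved variable is untouched, which is why weak or overly localized cloud covariances limit what cloud observations can accomplish.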
b. Correlations between cloud and other control variables
One primary difference between clouds and other parts of the model state has to do with spatial scale: cloud variables in global models are largely determined by grid-scale processes (parameterizations), while the relationships between temperature, pressure, and winds reflect physics on synoptic scales resolved on the model grid. To the extent that cloud processes are localized in space, one can imagine that correlations between cloud variables and temperature, winds, and humidity might also be strongly localized, and hence that observations of clouds would have minimal impact on other aspects of the state during data assimilation. But this fear turns out to be unwarranted. Figure 3 shows the increments, computed at the last time step of the equilibration runs described above, in T and ql due to individual observations of T and ql in CAM. These example observations are made at an arbitrary level within a synoptic storm, and localization is not applied. The increments are proportional to the correlation of the field being incremented with the variable being observed. Temperature is autocorrelated across much larger spatial scales than ql, consistent with the primary control of ql by small-scale processes. Nonetheless, observations of T and ql influence each other across multiple grid points, which further supports the idea that observations of ql have the potential to decrease the overall error in the analysis.
These correlations also suggest that simply including clouds in the control vector might also increase analysis and forecast skill, since observations of T, say, produce better estimates of cloud specific humidities, and hence better estimates of diabatic heating due to cloud processes such as precipitation. We return to this point below.
c. Assimilating observations of clouds
We evaluate that potential by assimilating a range of additional observation types beginning on day 11. The simplest case adds conventional observations of specific humidity q. We also perform assimilations adding, in turn, cloud liquid water specific humidity ql, cloud ice water specific humidity qi (for both models), and cloud fraction cf for AM2 alone.
Root-mean-square errors in 6-h forecasts of all variables continue to decrease slowly with time (Fig. 4) as the runs continue to equilibrate, but adding observations of q alone does not strongly affect the assimilation: the RMSE in all variables is only 1%–2% smaller than in parallel runs (not shown) that neglect q. The ensemble spread in q remains deficient in both models. This is because the observational error specified by NCEP, which ranges from 0.001 to 0.01 kg kg−1, is substantially larger than the RMS error, and assimilating many observations, even those with large error, acts to reduce the spread even if the bias or RMSE is unaffected [see, e.g., Eq. (18) in Anderson (2003)]. For AM2 the RMSE in q is notably larger at 0000 and 1200 UTC than at intervening times (i.e., the trace of RMSE is somewhat jagged in Fig. 4); this is the result of variable observation density (observations of q in our network are an order of magnitude less numerous at 0600 and 1800 UTC than at 0000 and 1200 UTC) and more rapid error growth in the q field in AM2 than in CAM.
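The spread-reducing effect invoked here follows from scalar Kalman algebra: assimilating n independent observations of the same quantity, each with error variance r, leaves a posterior variance of (1/v0 + n/r)^(-1), which becomes small for large n even when r greatly exceeds the prior variance v0. A quick illustration of the arithmetic:

```python
def posterior_variance(v0, r, n):
    """Ensemble variance after n independent observations, each with
    error variance r, are assimilated against prior variance v0."""
    return 1.0 / (1.0 / v0 + n / r)

# Observations 100x noisier than the prior barely matter one at a
# time, but 500 of them still cut the variance to 1/6 of the prior:
one = posterior_variance(1.0, 100.0, 1)     # ~0.99: almost no effect
many = posterior_variance(1.0, 100.0, 500)  # 1/6 of the prior variance
```

When, as here, the observations add little information about the true state, the spread shrinks without a matching reduction in error, and the deficit must be made up by inflation.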
Assimilating observations of cloud variables reduces the RMSE of all state variables in 6-h forecasts in both models, but the benefit is much larger in CAM than in AM2. Figure 5 shows the decrease in RMS error in 6-h forecasts (the prior ensembles), expressed as the ratio of the error in assimilation cycles that make use of cloud observations to those that do not (i.e., the experiments shown in Fig. 4). Assimilating observations of ql using CAM reduces the error by roughly 20% for all variables by the end of the 10-day experiment. The amount of benefit increases with time, suggesting that the information added by cloud observations persists in both time and space. Results using AM2 are more mixed. As with CAM, much of the benefit of cloud observations in AM2 can be realized using a single kind of observation (ql or cf), although observations of qi do reduce the error in qi itself. But improvements in forecasts of q and ql in AM2, though measurable, are much more modest than in CAM, and the amount of inflation in AM2 increases with the density of cloud observations being used (Fig. 6). This suggests that the linear correlations between cloud and other state variables are weaker and/or more localized in AM2 than in CAM, since numerous weakly correlated observations act to erroneously decrease the ensemble spread, requiring inflation to compensate. Continued application of inflation also increases the RMSE by increasing the ensemble spread, thereby giving too much weight to the observations; by the end of the 10-day window this effect is large enough to substantially reduce the benefit brought about by assimilating cloud observations.
Observations of cloud fraction are at least as effective in reducing error in AM2, in which this variable is prognostic, as are observations of cloud water. This is encouraging since cloud fraction, which requires only the ability to distinguish clear and cloudy skies, is substantially easier to observe than cloud water or ice concentrations.
The value of the cloud observations is greatest where other observations are more sparse. Figure 7 shows the zonally averaged benefit of assimilating ql, expressed as the ratio of the RMS error in runs in which ql is assimilated to those in which it is not, averaged over the last five days of the runs shown in Fig. 5. The amount of benefit depends on the variable in question but is uniformly largest in the Southern Hemisphere, especially south of 30°S, where all other kinds of observations are sparse. But the benefit does not derive solely from having any kind of observation in these poorly observed parts of the globe. We have performed (but do not show) experiments in which synthetic GPS radio occultation observations (Anthes et al. 2008) provide about the same number of quasi–uniformly spaced observations as does our cloud network; these bring about half the benefit (decrease in RMSE) as do our cloud observations.
The covariance between cloud and other state variables is more localized in space than, say, temperature and other state variables (see Fig. 3), and the fact that clouds are dominated by parameterized rather than resolved processes (i.e., there are no large-scale, long-time balance requirements for clouds) means that correlations are also more localized in time. This implies that most of the benefit from using cloud observations comes from their impact on other state variables. We have found this to be true: runs with CAM (not shown) in which cloud observations are assimilated but clouds are not included in the control vector perform comparably to the runs shown here. The implication is that including cloud observations in existing data assimilation systems may be beneficial independent of whether the control vector is expanded to include clouds.
4. From perfect models toward the imperfect world
Our results imply that short-term forecast skill might be improved if observations of clouds were fully included in data assimilation systems, meaning that clouds are observed directly and, secondarily, are included as part of the assimilation control vector. The greatest benefit is achieved where other observations are sparse, so, as with any new kind of observation, the utility of the measurements depends on the network to which the new observations are being added. The amount of benefit depends strongly on the cloud scheme used by the model, as we discuss in more detail in the next paragraph. Additional factors will contribute to the challenge of assimilating cloud observations operationally, including model bias, questions of representativeness, and the difficulty of relating available cloud observations to model state. To quote one of the referees of this paper, “the devil is in the details,” and there remains a large distance between our experiments and the assimilation of real observations of clouds into an operational forecast model. Nonetheless, our results allow us to draw several practical conclusions.
Cloud observations are much more effective at reducing errors in CAM than in AM2. Some of this difference may be attributed to the assimilation system not having been tuned carefully to either model, and particularly to our use of a localization scale that is larger than optimal for AM2. In addition, Fig. 3 suggests that covariances between cloud and other variables extend across smaller spatial scales than those among more traditional state variables, even in CAM, so it may be useful to localize the impact of cloud observations more strictly than that of other observations. But some of the difference between CAM and AM2 almost certainly lies in differences in the schemes controlling the evolution of clouds in each model. In particular, AM2’s cloud scheme has an additional degree of freedom—a prognostic equation for cloud fraction in addition to the equations for cloud water and ice. In the climate simulations for which these models are designed, AM2’s cloud field is in somewhat better agreement with observations than CAM’s (Pincus et al. 2008), but the additional degree of freedom appears to weaken the instantaneous correlations between cloud and other variables on which data assimilation relies, making observations of clouds less useful in reducing analysis and forecast errors in AM2 than in CAM. Thus, even in a perfect-model setting, a cloud scheme that produces more accurate forecasts may not be the cloud scheme best suited for use in data assimilation (Tompkins and Janisková 2004). Real observations of clouds will be useful only if 1) covariances between clouds and other variables in nature are strong enough over large enough scales to be resolved by the model and 2) the model is able to faithfully reproduce those covariances. We infer, too, that the utility of observations related to highly parameterized processes depends more strongly on the details of the forecast model used in the assimilation system than does the utility of observations related to resolved dynamics.
We have used interpolated identity observations (i.e., observations that map directly onto state variables), and, for clouds, the distance between what can be observed and how the clouds are represented in a model can be particularly large. Many observation types, including observations of cloud optical thickness or liquid water path, integrate cloud properties vertically within a column. Unfortunately, integral measures have not been particularly effective at improving assimilation performance (Geer et al. 2008). Clouds in nature are variable on scales much smaller than are resolved by even high-resolution forecast models. This is why clouds in global models are so sensitive to the details of the cloud parameterization, but it is also why obtaining representative observations of clouds for practical use is so difficult (Errico et al. 2007).
Cloud fraction is perhaps the exception: this quantity may be observed fairly well using a range of instruments, since segregating clear and cloudy skies is a far simpler observational task than determining the value of a continuous variable like specific humidity, and values are naturally representative of an area mean. Since observations of this quantity improve 6-h forecasts with AM2 at least as much as assimilating observations of cloud liquid, one can imagine that assimilating cloud fraction (in models where this value is prognostic) from the top of the atmosphere down to and including the level of the highest cloud observed (so that masking of low clouds by higher clouds is not an issue) might be a first practical step.
Acknowledgments
We appreciate the patient support of NASA under Grant NNG06GB76G and the very generous computing resources and technical assistance provided by the NASA Advanced Supercomputing Division. Comments from two anonymous referees helped us sharpen the arguments in this paper.
REFERENCES
Anderson, J. L. , 2001: An ensemble adjustment Kalman filter for data assimilation. Mon. Wea. Rev., 129 , 2884–2903.
Anderson, J. L. , 2003: A local least squares framework for ensemble filtering. Mon. Wea. Rev., 131 , 634–642.
Anderson, J. L. , 2007: An adaptive covariance inflation error correction algorithm for ensemble filters. Tellus, 59A , 210–224.
Anderson, J. L. , 2009: Spatially and temporally varying adaptive covariance inflation for ensemble filters. Tellus, 61A , 72–83.
Anderson, J. L. , T. Hoar , K. Raeder , H. Liu , N. Collins , R. Torn , and A. Avellano , 2009: The Data Assimilation Research Testbed: A community facility. Bull. Amer. Meteor. Soc., 90 , 1283–1296.
Anthes, R. A. , and Coauthors , 2008: The COSMIC/FORMOSAT-3 mission: Early results. Bull. Amer. Meteor. Soc., 89 , 313–333.
Benedetti, A. , and M. Janisková , 2008: Assimilation of MODIS cloud optical depths in the ECMWF model. Mon. Wea. Rev., 136 , 1727–1746.
Boville, B. A. , P. J. Rasch , J. J. Hack , and J. R. McCaa , 2006: Representation of clouds and precipitation processes in the Community Atmosphere Model version 3 (CAM3). J. Climate, 19 , 2184–2198.
Buehner, M., P. L. Houtekamer, C. Charette, H. L. Mitchell, and B. He, 2010: Intercomparison of variational data assimilation and the ensemble Kalman filter for global deterministic NWP. Part II: One-month experiments with real observations. Mon. Wea. Rev., 138, 1567–1586.
Chevallier, F., P. Lopez, A. Tompkins, M. Janisková, and E. Moreau, 2004: The capability of 4D-Var systems to assimilate cloud-affected satellite infrared radiances. Quart. J. Roy. Meteor. Soc., 130, 917–932.
Collins, W. D., and Coauthors, 2006: The Community Climate System Model version 3 (CCSM3). J. Climate, 19, 2122–2143.
Delworth, T., and Coauthors, 2006: GFDL’s CM2 global coupled climate models. Part I: Formulation and simulation characteristics. J. Climate, 19, 643–674.
Errico, R. M., P. Bauer, and J.-F. Mahfouf, 2007: Issues regarding the assimilation of cloud and precipitation data. J. Atmos. Sci., 64, 3785–3798.
Evensen, G., 1994: Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. J. Geophys. Res., 99 (C5), 10143–10162.
Gaspari, G., and S. E. Cohn, 1999: Construction of correlation functions in two and three dimensions. Quart. J. Roy. Meteor. Soc., 125, 723–757.
Geer, A. J., P. Bauer, and P. Lopez, 2008: Lessons learnt from the operational 1D+4D-Var assimilation of rain- and cloud-affected SSM/I observations at ECMWF. Quart. J. Roy. Meteor. Soc., 134, 1513–1525.
GFDL Global Atmospheric Model Development Team, 2004: The new GFDL global atmosphere and land model AM2-LM2: Evaluation with prescribed SST simulations. J. Climate, 17, 4641–4673.
Hannay, C., D. L. Williamson, J. J. Hack, J. T. Kiehl, and J. G. Olson, 2009: Evaluation of forecasted southeast Pacific stratocumulus in the NCAR, GFDL, and ECMWF models. J. Climate, 22, 2871–2889.
Houtekamer, P. L., and H. L. Mitchell, 1998: Data assimilation using an ensemble Kalman filter technique. Mon. Wea. Rev., 126, 796–811.
Houtekamer, P. L., and H. L. Mitchell, 2005: Ensemble Kalman filtering. Quart. J. Roy. Meteor. Soc., 131, 3269–3289.
Kalnay, E., H. Li, T. Miyoshi, S.-C. Yang, and J. Ballabrera-Poy, 2007: 4-D-Var or ensemble Kalman filter? Tellus, 59A, 758–773.
Kistler, R., and Coauthors, 2001: The NCEP–NCAR 50-Year Reanalysis: Monthly means CD-ROM and documentation. Bull. Amer. Meteor. Soc., 82, 247–267.
Lin, S.-J., 2004: A “vertically Lagrangian” finite-volume dynamical core for global models. Mon. Wea. Rev., 132, 2293–2307.
Lin, S.-J., and R. B. Rood, 1996: Multidimensional flux-form semi-Lagrangian transport schemes. Mon. Wea. Rev., 124, 2046–2070.
Pincus, R., C. P. Batstone, R. J. P. Hofmann, K. E. Taylor, and P. J. Gleckler, 2008: Evaluating the present-day simulation of clouds, precipitation, and radiation in climate models. J. Geophys. Res., 113, D14209, doi:10.1029/2007JD009334.
Tiedtke, M., 1993: Representation of clouds in large-scale models. Mon. Wea. Rev., 121, 3040–3061.
Tompkins, A. M., and M. Janisková, 2004: A cloud scheme for data assimilation: Description and initial tests. Quart. J. Roy. Meteor. Soc., 130, 2495–2517.
Vukicevic, T., T. Greenwald, M. Zupanski, D. Zupanski, T. Vonder Haar, and A. Jones, 2004: Mesoscale cloud state estimation from visible and infrared satellite radiances. Mon. Wea. Rev., 132, 3066–3077.
Vukicevic, T., M. Sengupta, A. S. Jones, and T. Vonder Haar, 2006: Cloud-resolving satellite data assimilation: Information content of IR window observations and uncertainties in estimation. J. Atmos. Sci., 63, 901–919.
Wilson, D. R., A. C. Bushell, A. M. Kerr-Munslow, J. D. Price, C. J. Morcrette, and A. Bodas-Salcedo, 2008: PC2: A prognostic cloud fraction and condensation scheme. II: Climate model simulations. Quart. J. Roy. Meteor. Soc., 134, 2109–2125.
Xie, S., J. Boyle, S. A. Klein, X. Liu, and S. Ghan, 2008: Simulations of Arctic mixed-phase clouds in forecasts with CAM3 and AM2 for M-PACE. J. Geophys. Res., 113, D04211, doi:10.1029/2007JD009225.
When an observation is perfectly correlated with a state variable, there is no sampling error in the regression used to update the state-variable ensemble given increments for the observed variable. As the expected correlation between an observation and a state variable approaches zero, the signal-to-noise ratio in the computation of the regression also approaches zero. As a result, observations that are only weakly correlated with a state variable are expected to cause large erroneous reductions in that variable’s variance.
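This collapse of the signal-to-noise ratio can be illustrated with a toy Monte Carlo experiment, not tied to any particular assimilation system: draw many small synthetic ensembles from a bivariate normal distribution with a known correlation, and compare the spread of the sample regression coefficients (the noise) with the true coefficient (the signal). With unit variances the true coefficient equals the correlation, so the comparison is direct; the ensemble size and trial count below are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def regression_coefficient_spread(true_corr, ens_size, n_trials=5000):
    """Sampling spread of the ensemble regression coefficient of a state
    variable on an observed variable, for ensembles of size ens_size drawn
    from a bivariate normal with correlation true_corr and unit variances.
    The true coefficient equals true_corr; the spread is pure sampling noise.
    """
    cov = np.array([[1.0, true_corr], [true_corr, 1.0]])
    coeffs = np.empty(n_trials)
    for i in range(n_trials):
        ens = rng.multivariate_normal([0.0, 0.0], cov, size=ens_size)
        obs, state = ens[:, 0], ens[:, 1]
        # Sample regression of the state variable on the observed variable,
        # as used to map observation increments into state increments.
        coeffs[i] = np.cov(state, obs)[0, 1] / np.var(obs, ddof=1)
    return coeffs.std()

# The signal (the true coefficient) shrinks much faster than the sampling
# noise as the correlation weakens, so the signal-to-noise ratio collapses.
for r in (0.9, 0.5, 0.1):
    noise = regression_coefficient_spread(r, ens_size=20)
    print(f"corr={r:.1f}  noise={noise:.3f}  signal/noise={r / noise:.2f}")
```

For a 20-member ensemble the noise changes only modestly between strong and weak correlations, while the signal drops by an order of magnitude, which is why spurious increments (and the resulting erroneous variance reduction) are dominated by weakly correlated observation–state pairs.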