## Abstract

A cloud-nucleating aerosol retrieval method was developed. It allows the estimation of ice-forming nuclei and cloud condensation nuclei (IFN and CCN) for regions in which boundary layer clouds prevail. The method is based on the assumption that the periodical assimilation of observations into a microscale model leads to an improved estimation of the model state vector (that contains the cloud-nucleating aerosol concentrations). The Colorado State University Cloud Resolving Model (CRM) version of the Regional Atmospheric Modeling System (RAMS@CSU) and the maximum likelihood ensemble filter algorithm (MLEF) were used as the forecast model and the assimilation algorithm, respectively. On the one hand, the microphysical modules of this CRM explicitly consider the nucleation of IFN, CCN, and giant CCN. On the other hand, the MLEF provides an important advantage because it is defined to address highly nonlinear problems, employing an iterative minimization of a cost function. This paper explores the possibility of using an assimilation technique with microscale models. These initial series of experiments focused on isolating the model response and showed that data assimilation enhanced its performance in simulating a mixed-phase Arctic boundary layer cloud. Moreover, the coupled model was successful in reproducing the presence of an observed polluted air mass above the inversion.

## 1. Introduction

The ensemble Kalman filter (EnKF; e.g., Evensen 1994; Anderson 2001) is a technique for sequential data assimilation that uses Monte Carlo methods to forecast error statistics and uses forecast covariances between observed and unobserved variables to spread information from the observations both spatially and between variables. This technique generalizes the use of the Kalman filter (Kalman 1960) to nonlinear processes and requires no derivation of tangent linear operator, adjoint equations and no integrations backward in time. A general discussion of aspects of the EnKF algorithm can be found in Evensen (2003). The EnKF represents one of many variations of the original Kalman filter that addressed the general problem of the estimation of a state vector for nonlinear models.

The EnKF type of algorithm rapidly gained popularity because of its ability to handle nonlinear models, its simple conceptual formulation, and its relative ease of implementation. This is especially appealing for the aerosol retrieval scheme under development because it involves complex microphysics. Since Evensen (1994) first applied EnKF in the atmospheric science framework, this versatile method has proven to handle strongly nonlinear dynamics and large state spaces efficiently, and it is now used in realistic applications with a wide variety of atmospheric and oceanic models. Various uses in larger scales of the atmosphere can be found in the literature (Houtekamer and Mitchell 1998, 2001; Anderson 2001; Whitaker and Hamill 2002), and there are also numerous publications on the use of EnKF on oceanic forecasts (Keppenne and Reinecker 2003; Brusdal et al. 2003; Haugen et al. 2002; Haugen and Evensen 2002) as well as on marine biochemical models (Carmillet et al. 2001; Natvik and Evensen 2003) and ecosystem models (Eknes and Evensen 2002; Allen et al. 2003). Zhang et al. (2004), Snyder and Zhang (2003), Tong and Xue (2005), and Xue et al. (2006) more recently investigated the EnKF potential in the convective scale (e.g., assimilating radar observations). Among others, the recently developed maximum likelihood ensemble filter (MLEF) algorithm (Zupanski 2005; Zupanski and Zupanski 2006; Zupanski et al. 2006) provides an important advantage for assimilation retrieval, because it is defined to address highly nonlinear problems by employing an iterative minimization of the cost function. The efficiency of iterative minimization in the MLEF is greatly improved by an advanced Hessian preconditioning, typically reaching the minimum in 1–2 minimization iterations (Zupanski 2005).

Nevertheless, the possibility of using any assimilation technique to improve the performance of microscale models has not yet been explored. With spatial resolutions on the order of tens of meters, large-eddy simulation (LES) models can explicitly resolve the dominant energy-containing eddies of the turbulent flux in a three-dimensional (3D) framework. LES models and high-resolution cloud-resolving models (CRMs; the 2D version) became major tools in the research of boundary layer (BL) clouds among other stratiform clouds because they can resolve the turbulent flows affecting various microphysical mechanisms. This kind of microscale model was used to investigate the formation, maintenance, and dissipation of various types of clouds; to develop parameterizations or turbulent closure models; to evaluate the sensitivity of model results to parameter changes, and to investigate the role of microphysical mechanisms (Moeng 1986; Teixeira and Hogan 2002; Golaz et al. 2002; Wyant et al. 1997). They have also been used as grid-volume average models; however, their domain is more frequently seen as column models capable of generating a realistic statistical perspective of the in-cloud small-scale inhomogeneities affecting the microphysical processes and cloud radiative properties.

In this study, we set out to investigate the problem of data assimilation into CRM/LES models. However, the lack of availability of high-resolution observations capable of capturing such small-scale features clearly creates an important obstacle. Therefore, we based our approach on linking the data assimilation algorithm to the above-mentioned statistical perspective that sees the 2D or 3D domain as a column model.

Several modeling studies investigated the effects of aerosols on radiative properties of stratiform clouds (among them, Harrington et al. 1999; Jiang et al. 2000, 2001; Carrió et al. 2005a, b, 2007). Among other studies, these LES and CRM simulations revealed a pronounced sensitivity of cloud properties (i.e., optical depth, size spectra, and hence satellite radiance, radar reflectivity, etc.) to varying concentrations of cloud-nucleating aerosols and led us to consider exploiting this high degree of sensitivity to estimate those concentrations.

In this paper, we described the basic configuration of a method to retrieve cloud condensation nuclei (CCN), giant CCN, and ice-forming nuclei (IFN) concentrations in regions where BL clouds are prevalent. The main objective of this study was to examine the concept of enhancing the microscale model performance by linking the aforementioned statistical perspective to a data assimilation algorithm for simple observational operators. It was also necessary to examine several important issues such as the response of cloud-nucleating aerosol concentrations to the potential optimization of a model state that is configured to include them. This phase required extensive experimentation with various aspects such as assimilation frequency, ensemble size, necessary number of assimilation cycles, the variables to be included in the state vector, number of iterations, and the forecast model configuration, as well as solving some unique problems associated with data assimilation into a microscale model. To isolate the interaction between the forecast model and the assimilation algorithm, we used a well-documented mixed-phase Arctic BL cloud case that provided both real observations to be assimilated and independent observations to evaluate the model performance (including CCN and IFN concentrations). Results, although preliminary, are highly encouraging because the coupled model successfully reproduced the observed presence of a polluted air mass above the inversion.

A follow-up paper will extend these studies to observations requiring more complex observational operators (e.g., satellite radiances). A priori, the proposed technique not only offers the potential for retrieval of cloud-nucleating aerosols, but also cloud droplet and ice particle concentrations, liquid water and ice water paths, and cloud albedo. Moreover, the microscale model could assimilate a wide variety of observations produced by satellite-based systems.

The CRM model and the MLEF algorithm are described in section 2. The basic configuration of the proposed retrieval method is presented in section 3. Section 4 considers the experimental design and describes the general simulation conditions. Results and a summary and conclusions are presented in sections 5 and 6, respectively.

## 2. The CRM and the assimilation algorithm

### a. The forecast model

The dynamical modeling framework used for this study is the CRM version of the Regional Atmospheric Modeling System at Colorado State University (RAMS@CSU; Pielke et al. 1992; Cotton et al. 2003). This nonhydrostatic model integrates predictive equations for the wind components, the Exner function, the ice–liquid potential temperature, and the total mixing ratio on a vertically stretched Arakawa C grid. The subgrid-scale fluxes are computed following Deardorff (1966), using a prognostic equation for the subgrid-scale turbulent kinetic energy.

The hydrometeor size spectra are assumed to follow gamma distributions and the condensed water substance is categorized into eight hydrometeor species: cloud droplets, drizzle drops, raindrops, pristine ice, snow, aggregates, graupel, and hail. In the two-moment microphysical framework (Meyers et al. 1997; Saleeby and Cotton 2004), these gamma distributions have 2 degrees of freedom. The prognostic microphysical variables include the number concentrations and mixing ratios of all eight hydrometeor species, as well as the IFN, CCN, and giant CCN concentrations. The consideration of a large cloud droplet mode (drizzle drops), in combination with the traditional single mode of cloud droplets, allows a more accurate representation of the bimodal distribution that occurs in the atmosphere. Collection is simulated using stochastic collection equation solutions, facilitated by lookup tables, rather than by continuous accretion approximations. The philosophy of bin representation of collection is extended to calculations of drop sedimentation. The introduction of prognostic cloud droplet concentration through nucleation of CCN limits the amount of cloud water depending on the available CCN. This eliminates the highly limiting constraint of selecting a constant mean diameter or a constant number concentration for cloud droplets. CCN (and giant CCN) are activated through nucleation of cloud droplets based on a Lagrangian model developed by Feingold and Heymsfield (1992). The concentration of CCN nucleated to become cloud droplets is obtained from a lookup table as a function of CCN concentration *N*_{CCN}, vertical velocity, and temperature (Saleeby and Cotton 2004). The lookup table is previously generated (offline) from detailed bin-model parcel calculations assuming that CCN are ammonium sulfate particles and have a bimodal size spectrum based on lognormal basis functions. Several studies [among them, Dusek et al. (2006) and Fitzgerald (1974)] showed that variations in chemistry are secondary to variations in concentrations and size spectra of the aerosol. The procedure is analogous to using the equation

where *N*_{CCN} is the concentration of CCN, *N*_{c1} represents the concentration of cloud droplets activated in the first cloud droplet mode (droplets with diameters less than 40 *μ*m), *S _{w}* is water supersaturation, and

*b*is an empirically determined parameter. Concentration

*N*

_{CCN}is a model forecast variable that can be advected and diffused, and it also has sinks due to nucleation to form cloud droplets and sources as cloud droplets are evaporated.

Pristine ice crystals formed by deposition–condensation freezing on IFN are given by

where *N*_{pris} and Δ*N*_{pris} are the pristine ice number concentration and its change due to nucleation, respectively. Variable *N*_{IFN} is the concentration of available IFN, *F _{M}* is a function of the Meyers et al. (1997) formula that represents the fraction of available IFN activated as a function of ice supersaturation

*S*, and the product

_{i}*N*

_{IFN}

*F*is the maximum concentration that can be activated at the current supersaturation conditions. The

_{M}*F*is maximized (equal to 1) for these simulations at an ice supersaturation of 40%. Ice activation occurs only when this maximum concentration exceeds the pristine ice number concentration. Concentration

_{M}*N*

_{IFN}is a forecast variable that is advected and diffused and has sinks due to ice activation.

In addition, homogeneous freezing of cloud droplets and haze particles is done following DeMott et al. (1994). Supercooled raindrops freeze by collecting ice crystals. Ice multiplication by rime-splinter process is parameterized following Mossop (1978), as described in Cotton et al. (2003).

The two-stream radiative transfer model used for this study (Harrington 1997; Harrington et al. 1999) treats the interaction of three solar and five infrared bands with the model gases and cloud hydrometeors. This scheme solves the radiative transfer equations for three gaseous constituents (water vapor, ozone, and carbon dioxide) and the optical effects of the hydrometeor size spectra. Gaseous absorption is calculated by following the fast exponential sum fitting of transmissions method proposed by Ritter and Geleyn (1992). Lorenz–Mie theory is used to compute the optical properties for water drops, and the theory of Mitchell et al. (1996) is used for nonspherical ice crystals. For each hydrometeor species, the band-averaged values of optical properties are computed for the assumed gamma distribution used in RAMS@CSU following the method of Slingo and Schreckner (1982).

### b. The maximum likelihood ensemble filter

The MLEF algorithm is explained in more detail in Zupanski (2005), Zupanski and Zupanski (2006), and Zupanski et al. (2006). Here we provide only basic explanations about the algorithm. Unlike most (i.e., standard) EnKF data assimilation algorithms, which seek a minimum variance state solution, the MLEF seeks a maximum likelihood solution. The solution for a state vector **x** (also referred to as control variable) of dimension *N*_{state} is obtained by minimizing a cost function *J* defined under standard Gaussian assumption as

In Eq. (3), the vector **y** is an observation vector of dimension equal to the number of observations *N*_{obs}, and *H* is a nonlinear observation operator. Subscript *b* denotes a background (i.e., prior) estimate of **x**, and superscript T denotes a transpose. The *N*_{obs} × *N*_{obs} matrix 𝗥 is a prescribed observation error covariance, and it includes instrumental and representativeness errors (e.g., Cohn 1997). In the experiments presented, 𝗥 is a diagonal, constant-in-time co(variance) matrix that corresponds to observation errors of 5 g m^{−2} and 1 W m^{−2} for vertically integrated water paths and surface fluxes, respectively. The matrix 𝗣* _{f}* of dimension

*N*

_{state}×

*N*

_{state}is the forecast error covariance. Note that we employ the commonly used rank-reduced square root formulation 𝗣

_{f}= 𝗣

^{1/2}

_{f}(𝗣

^{1/2}

_{f})

^{T}, where 𝗣

^{1/2}

_{f}is an

*N*

_{state}×

*N*

_{ens}square root matrix (

*N*

_{ens}being the ensemble size).

A conjugate gradient minimization algorithm (e.g., Luenberger 1984), with the line-search technique as in Navon et al. (1992), is used to minimize Eq. (3). A satisfactory solution to a nonlinear data assimilation problem is typically obtained after 2–3 minimization iterations, owing to the effective Hessian preconditioning, defined in ensemble subspace (e.g., Zupanski 2005). The MLEF algorithm is schematically shown in Fig. 1.

In the applications of this study, the CRM version of RAMS@CSU serves as a nonlinear forecast model *M*. The observation vector **y** includes the liquid and ice water paths (LWP and IWP), and the longwave and shortwave downwelling fluxes at the surface (LWDN and SWDN). The observation operator *H* is defined by the operators for calculating model simulated water paths and the surface fluxes. In the case of LWP and IWP, *H* is simply the vertical integration of the mixing ratios multiplied by the air density for the liquid and ice phase species, respectively, whereas, for the LWDN and SWDN, *H* is trivial because both fluxes are resolved by the two-stream radiative scheme within the forecast model. The control vector **x** includes model state variables as explained in the next section.

## 3. Configuration of the aerosol retrieval method

The proposed aerosol retrieval method is based on the assumption that data assimilation into a microscale model leads to an improved estimation of the model state vector that contains the IFN and CCN concentrations, along with the basic prognostic variables of the dynamic model and the prognostic variables that describe the size spectra of all hydrometeors.

As mentioned in the introduction, assimilating data into a microscale model introduces several unique problems. The fact that observations at such a high spatial resolution are rarely available, along with the complexity and nonlinearity of the microphysical processes interacting with the microscale circulations, makes the optimization of the structure of these eddies an unrealistic goal. Unlike typical applications for mesoscale and synoptic scale, the proposed approach does not focus on the spatial location and intensity of individual flow features. Conversely, it sees the 3D or 2D domain of the forecast (FCST) model as a column model capable of generating a realistic statistical perspective of the in-cloud small-scale inhomogeneities that govern the temporal evolution of the horizontally averaged vertical profiles of the microphysical variables. These vertical profiles are seen by the data assimilation algorithm as a 1D version of the state vector linked to the data assimilation algorithm. As a consequence, the observations are considered to be valid for the entire horizontal extension of the FCST model domain.

After each assimilation cycle, a new ensemble is defined perturbing the new optimal state and therefore each ensemble member needs a (spinup) time to develop stable turbulence statistics that are physically consistent with the new vertical profiles. During these spinup periods preceding the FCST runs, the horizontally averaged model state vector is not allowed to change. A Newtonian relaxation technique (nudging) is used to preserve this 1D version of the state vector while the 2D/3D model develops an eddy distribution consistent with these optimized vertical profiles. In addition, it is necessary to take into account aspects that are not resolvable within the framework of the microscale model. These larger-scale (LS) changes are taken into account only during the FCST runs.

The resulting configuration of the proposed method can be described as follows and a schematic representation is given in Fig. 2.

The

*first*ensemble of CRM simulations is randomly designed.Nudging terms are considered for each ensemble member to take into account LS changes until an assimilation time is reached.

Observational data are assimilated, and an

*optimal*state of the model is computed by the minimization of a cost function. It continues with the following step, unless this is the last assimilation cycle.A new ensemble of CRM simulations is generated by adding to the optimal state perturbations based on error covariance matrices.

Each ensemble member has its spinup period to allow for development of an eddy distribution that is physically consistent with the new (perturbed) optimal state.

During the spinup period, time “freezes” and therefore no nudging terms are applied to take into account LS changes. However, a 1D version of the optimal state is nudged.

When the spinup period ends,

*time*starts evolving and returns to step 2.Aerosol concentrations are retrieved from the last cycle’s optimal model state.

A similar configuration could be used with a moving domain along trajectories predicted by a mesoscale model. For this application that will be analyzed in a follow-up paper, the LS changes are provided by the mesoscale model [as done in Carrió et al. (2007)] and the observations to be assimilated are collected along the predicted trajectory. The numerical experiments presented in the following section used a fixed domain and focused on isolating the response of the microscale model to data assimilation.

## 4. Preliminary experiments

### a. 4 May 1998 Arctic BL cloud case

We chose this previously studied case (Carrió et al. 2005a), because it represented an excellent framework in which to perform these first numerical experiments. First of all, this mixed-phase cloud case allowed examining the response of both IFN and CCN concentrations (prognostic variables) to the assimilation of real observations. This cloud was extensively observed during the First International Satellite Cloud Climatology Project (ISCCP) Regional Experiment (FIRE)/Surface Heat Budget of the Arctic Ocean (SHEBA) field experiment, using multiple remote sensing measurements such as 35-GHz cloud radar, depolarization lidar, and microwave radiometer (Intrieri et al. 2002a, b), high-resolution rawinsonde soundings, and liquid-phase retrievals involving both radar and aircraft data (Zuidema et al. 2005). These series of cloud, atmospheric, and sea ice properties represented a unique framework to isolate the microscale model response to data assimilation. They provided high-quality observations to take into account LS changes, vertically integrated quantities to be assimilated, and independent observations to evaluate the impact of data assimilation on model performance. Last, the presence of a moderately polluted air mass above the inversion makes this Arctic BL cloud case particularly interesting for examining the feasibility of retrieving cloud-nucleating aerosol concentrations. The documented concentrations (obtained using the instantaneous CCN spectrometer of the Desert Research Institute) were approximately 100 and 250 cm^{−3} below and above the inversion, respectively (Yum and Hudson 2001). The IFN vertical profile derived from the CSU continuous-flow diffusion-chamber ice nucleus counter data exhibited relatively large concentrations above the boundary layer with a maximum value of 85.6 per liter (L^{−1}), while below the inversion the vertical average is approximately 3 L^{−1} (Rogers et al. 2001). These IFN and CCN concentrations (above the boundary layer) were in strong contrast with normal values for the pristine Arctic environment (typically 3 L^{−1} and 30 cm^{−3}, respectively).

### b. Simulation conditions

All numerical experiments were performed with an FCST model configuration, initial conditions, and the treatment of LS changes identical to those used in our previous study of this mixed-phase Arctic boundary layer case. A two-dimensional framework was used for the FCST model, and the simulation domain was 5000 m in the horizontal plane 3325 m in the vertical direction, and it was approximately located at 76°N, 165°W. A constant horizontal grid spacing of 50 m and a time step of 2 s were used. The vertical grid was stretched using the relationship Δ* _{z}*(

*k*+ 1) = 1.05Δ

*(*

_{z}*k*), with 30-m spacing at the finest to provide better resolution within the boundary layer. All runs were initialized at 1700 UTC 3 May 1998 (using the corresponding high-resolution SHEBA sounding). The Newtonian relaxation technique (nudging) was used to consider the LS changes that cannot be resolved within this modeling framework. Nudging terms were based on four 4 May 1998 SHEBA soundings (0515, 1115, 1715, and 2315 UTC) and were applied only to horizontal averages to avoid nudging out the finescale features developed in the CRM domain [for a more detailed description see Carrió et al. (2005a, b)]. The lateral boundary conditions were cyclic, and the domain top was a rigid lid. Rayleigh damping was used in the five highest levels of the domain to prevent the reflection of vertically traveling gravity waves off this rigid lid. Noninertial effects were neglected. Surface properties and fluxes were provided by the Los Alamos sea ice model (CICE) implemented into RAMS@CSU.

Simulations covered a period of 30 h, although no assimilation was performed until 0000 UTC 4 May. IFN and CCN concentrations were homogeneously initialized with typical clean concentrations of cloud-nucleating aerosols ([IFN] = 4 L^{−1} and [CCN] = 30 cm^{−3}, both below and above the inversion). Most experiments were performed assimilating temporal series of LWP and IWP. However, real SWDN and LWDN observations were also used for some ensemble runs. We considered different ensemble sizes of 50, 100, and 200 members, and a free run (control). A summary of the numerical experiments that correspond to results presented in this paper is given in Table 1. In assimilation runs, the two-letter prefixes PA, FL, and FP indicate the observation type that is assimilated (water paths, surface fluxes, and both, respectively) and the numerical suffixes indicate the ensemble size.

## 5. Main results

The model state vector was configured to include the variables controlling the size spectra of all hydrometeors (i.e., the number concentrations and the mixing ratios of cloud droplets, drizzle drops, raindrops, pristine ice crystals, snow, and aggregates, graupel, and hail), the CCN and IFN concentrations, the ice–liquid potential temperature, and the total water mixing (the sum of the mixing ratios corresponding to water vapor and all condensed species).

A suite of remote sensor measurements (35-GHz cloud radar, microwave radiometer, depolarization lidar, and surface radiometers) provided observational data to be used as independent observations as well as for assimilation purposes. The ice water content (IWC) vertical profiles were derived from 35-GHz radar data. The same rawinsonde data we used to define LS changes were combined by Zuidema et al. (2005) with the multisensor measurements to characterize the ice- and liquid-phase cloud components from 1 to 10 May 1998. Cloud base and top were established by the depolarization radar, and the top of the liquid cloud was identified by the temperature inversion. Adiabatic ascent calculations were applied to a parcel saturated at cloud base to obtain the liquid water content vertical profile. Liquid particle sizes were estimated by combining liquid water contents, determined from an adiabatic ascent calculation and constrained by the LWPs retrieved from an upwardly pointing microwave radiometer, with the droplet concentrations and distribution widths determined from aircraft data [for more details see Zuidema et al. (2005)].

The assimilation experiments covered a period of 24 h from 0000 UTC 4 May 1998. Figure 3 shows the hourly averaged data of LWP, IWP, LWDN, and SWDN used for the 24 (hourly) assimilation cycles. The cloud was mostly liquid at 0000 UTC and then underwent a gradual glaciation process reaching a minimum LWP at approximately 1200 UTC. LWDN showed a behavior closely linked to that of LWP, with a minimum at the same time. Arctic mixed-phase BL clouds are characterized by a high colloidal instability. Most microphysical schemes tend to overestimate severely the glaciation rates, consistently underestimating the liquid fraction of the simulated clouds.

The total condensate paths (TCPs) and liquid water fractions corresponding to different runs (FL50, PA50, PA100, and the free run) are compared with observed values in Fig. 4. TCP was computed as the horizontally averaged vertical integral of the water contents of all liquid- and ice-phase species with nonnegligible values (cloud droplets, drizzle drops, raindrops, pristine ice, snow, and aggregates). The free run represented fairly well the observed time evolution of TCP, and none of the assimilation experiments significantly enhanced the simulation of TCP. The latter was an expected result, because TCP is basically linked to the thermodynamic state, mainly governed by the LS changes. Moreover, this is desired behavior because the sampling freedom of the assimilation algorithm should focus more on the cloud microstructure and less on aspects that are not entirely resolvable within the FCST model. Conversely, the free run showed poor performance in representing the phase of the cloud, with liquid water fractions below 15% for the entire period of interest. Only the run that assimilated downwelling radiative fluxes (FL50) did not reproduce a drop in the liquid water fraction of a comparable magnitude between 1100 and 1600 UTC.

Figure 5 in analogous to Fig. 4 but compares downwelling radiative fluxes at the surface. The observed LWDN shows an abrupt drop at approximately 13 UTC. Runs FL50, PA50, and PA100 exhibited LWDN minima centered at approximately the same time, while the free run showed a monotonic increase with time. The magnitude of the minimum for FL50 was the closest to the observations. These experiments could not reproduce the observed abrupt drop, but the narrowest minimum corresponded to PA100. It must be noted that PA50 and PA100 only assimilated water paths and that, therefore, LWDN is an independent observation for these runs. Conversely, the simulated SWDN evolution showed negligible differences for these experiments. All runs (not only those compared in this figure) consistently underestimated SWDN by roughly 10 W m^{−2}. This almost constant difference between simulated and observed values may be explained by the presence of an upper-layer cloud that extended up to 5000m and could not be adequately resolved within our domain [see Zuidema et al. (2005)) and Carrió et al. (2005a)) for more details].

In general, better results were obtained for larger ensemble sizes, and assimilating water paths had a stronger impact when jointly comparing the simulated and observed water paths and surface radiative fluxes. As mentioned in section 1, the MLEF has the advantage of employing an iterative minimization of a cost function. Most of the numerical experiments listed in Table 1 have been rerun allowing two and three minimization iterations. In summary, consideration of a higher number of iterations had negligible impact when only paths were assimilated, probably because of the fairly linear relationships between them and the state variables (mixing ratios). Conversely, a higher number of iterations had a positive effect on runs that also assimilated radiative fluxes. Figure 6 shows a nondimensional measure of the “distance” between these observed and simulated quantities for different simulation conditions. The differences between observed and simulated values for each observation type (IWP, LWP, LWDN, and SWDN) were normalized, dividing them by the corresponding mean value for the simulation period [see Eq. (4)]. Increasing the ensemble size enhanced the performance independent of the assimilated observation type. However, increasing the number of minimization iterations only had an impact on runs FP50 and FP100 and showed virtually no effects on runs PA50 and PA100. Here is the normalized difference equation:

In Eq. (4), *d* is the distance, *X*^{obs}_{k} and *X*^{sim}_{k} are the observed and simulated values, respectively, represents the corresponding mean value for the simulation period, and the values of *k* denote the four different observation types.

The previous figures indicate that data assimilation improves the estimation of vertically integrated quantities in both cases, when they are assimilated and independent observations. However, this study has been motivated by the following questions: Can assimilating these vertically integrated quantities into a CRM enhance its performance in simulating the vertical structure of BL clouds? What would be the response of cloud-nucleating aerosol concentrations to the potential optimization of the model state? To address the first of these questions, we compared the temporal evolution of the vertical profiles for IWC and the effective radius of liquid-phase particles *R*_{eff}, simulated and retrieved using surface-based remote sensors.

Figure 7 compares the horizontally averaged IWC vertical profiles simulated for the free run and two runs assimilating both water paths and downwelling radiative fluxes. Experiments FP100 and FP200 significantly improved the performance of the model in simulating the timing, magnitude, and vertical position of the IWC maxima. In particular, the run that performed best in the latter aspect (FP200) involved doubling the ensemble size with a lower assimilation frequency (every 2 h). Note that run FP200 has a coarser temporal resolution and therefore it is more expedient from the computational point of view because it requires a lower number of spinup periods. The free run reproduced neither the observed pattern nor the magnitude of the maxima (note that this panel uses a different color bar). Figure 8 is analogous to Fig. 7 but compares retrieved and simulated vertical profiles of *R*_{eff}. The simulated values of *R*_{eff} were computed as the ratio between the third and second moments of the composite size distribution for liquid particles (binning the gamma distribution of cloud drops, drizzle drops, and raindrops). This parameter, which is of great importance from the radiative perspective, was realistically estimated by all numerical experiments, including the free run. However, note that the latter run underestimated the liquid mass by an order of magnitude, whereas FP100 and FP200 show liquid water content maxima similar to the airborne observed values (∼0.2 g m^{−3}). FP100 reproduced slightly better the timing and the expected drop size decrease as IWC values increase at the expense of the evaporation of liquid particles.

We also examined the response of cloud-nucleating aerosol to the potential optimization of the model state. The simulated IFN concentrations for the free run, FP50, and FP100 are given in Figs. 9a–c, respectively. All of these runs were initialized with a constant vertical profile of 3 L^{−1}; however, the data assimilation simulations rapidly developed significantly higher concentrations above the inversion. The IFN vertical profile derived from the CSU continuous-flow diffusion-chamber ice nucleus counter data at approximately 2200 UTC 4 May is given in Fig. 9d. The concentrations above the boundary layer are much higher than those typically observed (3–4 L^{−1}), with a peak value of 85.6 L^{−1}. Runs FP50 and FP100 both generated IFN concentrations that exceeded 80 L^{−1}. As expected, the control run only exhibits a slight decrease below the inversion because of the nucleation of pristine ice crystals. Although a detailed CCN vertical profile was not documented, the instantaneous CCN spectrometer of the Desert Research Institute indicated a maximum concentration of approximately 250 cm^{−3} above the inversion and lower values (∼100 cm^{−3}) within the boundary layer. Simulated CCN concentrations exhibited a similar behavior (Fig. 10) with higher concentrations above the inversion having values that exceeded 240 cm^{−3}. In summary, data assimilation experiments were successful in reproducing the observed presence of a polluted air mass with simulated IFN and CCN concentrations comparable to those documented. Note also that IFN and CCN concentrations are *not* assimilated and therefore are independent observations (retrieved variables).

## 6. Summary and concluding remarks

We developed an aerosol retrieval method that allows the estimation of IFN and CCN concentrations in regions where BL clouds prevail. This method is based on the assumption that periodical assimilation of observations into a microscale model leads to an improved estimation of the model state vector (that contains the cloud-nucleating aerosol concentrations). For this purpose we adapted the MLEF algorithm to be used with the CRM and LES versions of RAMS@CSU. On the one hand, the microphysical modules of this model explicitly consider the nucleation of IFN and CCN (and giant CCN). On the other hand, the MLEF presents an important advantage (relative to the classical ensemble Kalman filter) because it is defined to address highly nonlinear problems, employing an iterative minimization of a cost function. The resulting configuration presents some singular aspects such as the consideration of an FCST model and a model state vector with different dimensions. In addition, it was necessary to consider periodical spinup periods for ensemble members to develop stable turbulence statistics physically consistent with the (perturbed) new optimal model states. During recent years, the use of this assimilation algorithm among many other variations of the original Kalman filter technique gained popularity with atmospheric, oceanic, and sea ice models. However, the possibility of using any of these assimilation algorithms to improve the performance of high-resolution CRMs and LES models has not yet been explored.

The aerosol retrieval method was tested with real observations using an Arctic cloud BL case extensively documented during the FIRE/SHEBA field experiment. A wide variety of remote sensing measurements provided vertically integrated quantities to be assimilated as well as independent observations to evaluate the model performance. Airborne data for both IFN and CCN were also available for this day (4 May 1998). In addition, the fact that the air mass was found to be polluted in terms of IFN and CCN concentrations above the boundary layer made this case an ideal framework to test the coupled model as an aerosol retrieval method. The microscale model was successful in reproducing the presence of the polluted layer overriding the inversion and retrieving cloud-nucleating aerosol concentrations comparable to those observed. We performed these preliminary tests with a static domain to isolate the response of the microscale model to data assimilation while using high-quality in situ observations and simple observational operators. Results indicated that data assimilation enhanced several aspects of the microscale model performance in simulating Arctic mixed-phase BL clouds. This kind of cloud is particularly difficult to simulate, mainly because of its inherent colloidal instability.

These preliminary tests involved finding the optimal configuration for the aerosol retrieval method and required extensive experimentation with various aspects such as necessary number of assimilation cycles, observations to be assimilated, assimilation frequency, ensemble size, number of minimization iterations, and variables to be included in the state vector, as well as the FCST model configuration. In summary, most experiments required a warm-up period of nine cycles when assimilating observations hourly. Increasing the ensemble size generally gave better results independent of the other configuration parameters, and 100 ensemble members were enough to improve significantly the quality of the simulations. For instance, using 200 ensemble members showed an acceptable performance at a lower computational cost because it used a lower assimilation frequency (and required fewer spinup periods). When assimilating vertically integrated water paths, increasing the number of iterations while minimizing the cost function had a negligible impact. Conversely, subsequent iterations enhanced the model performance for a significantly less linear observational operator, like the one used to assimilate downwelling radiative fluxes. This was particularly visible for the infrared radiation emitted to the surface because of its strong link with cloud liquid water content.

We are now testing this approach with a microscale model domain that moves along trajectories predicted by a mesoscale model. The LS changes are provided by the mesoscale model [as done in Carrió et al. (2007)], and satellite radiances to be assimilated are collected along the predicted trajectory. For that purpose, we chose maritime stratocumulus cases documented during the Rain in Cumulus over the Ocean (RICO) project. These results will be published in a follow-up paper.

We also plan to improve the aerosol retrieval method by employing a more general variant of the MLEF, including non-Gaussian errors (e.g., Fletcher and Zupanski 2006) and model errors (Zupanski and Zupanski 2006). Because many cloud variables typically have non-Gaussian errors, and the model errors for these variables are sometimes large, these generalizations should further improve the method.

## Acknowledgments

Gustavo Carrió and William Cotton were funded by the National Aeronautics and Space Administration under Grant NNG04GD75G. Dusanka Zupanski and Milija Zupanski received support from the U.S. Department of Defense Center for Geosciences cooperative agreement Research Grants DAAD19-02-2-0005 and W911NF-06-2-0015.

## REFERENCES

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**.**

**,**

**,**

**.**

**.**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

## Footnotes

*Corresponding author address:* Gustavo Gabriel Carrió, Atmospheric Science Bldg., Office 225, Colorado State University, Fort Collins, CO 80523. Email: carrio@atmos.colostate.edu