1. Introduction
In recent years, development of hybrid background error covariance (BEC) models has been an area of active research in atmospheric data assimilation (Hamill and Snyder 2000; Etherton and Bishop 2004; Wang et al. 2007). It has been shown in particular that hybrid models tend to be more robust than conventional ensemble-based data assimilation schemes, especially when the model errors are larger than observational ones (Wang et al. 2007, 2008, 2009). This feature is attractive for the regional assimilation problems in oceanography, where information on the background state is often scant and incomplete.
Sequential data assimilation schemes developed so far for regional oceanographic studies can be classified into two categories. The first comprises Kalman filter (KF)-type algorithms with low-rank BEC matrices
A typical oceanographic setting of such kind is a near-coastal survey by autonomous gliders, which have recently become a fast-developing operational technology in oceanography (Rudnick et al. 2004). Gliders are capable of making remotely controllable surveys of limited areas at high spatiotemporal resolution. Such a dense 4D coverage is usually accompanied by a relatively poor knowledge of the background ocean state: near-coastal regions are often affected by poorly known peculiarities of the bottom topography and the associated tidal/inertial motions that cannot be resolved by global OGCMs. Considerable model error covariances also persist at scales comparable with the size of the domain due to inconsistencies in the boundary conditions and/or local atmospheric forcing.
Because of the relative novelty of glider technology, examples of glider assimilation are rare in the literature (Heaney et al. 2007; Shulman et al. 2009). Recently, Dobricic et al. (2010) have shown that three-dimensional variational (3DVar) assimilation of glider data significantly improves the forecast skill of a regional model. Most importantly, glider data were able to capture basin-scale BE correlations, which improved the model’s forecast skill several weeks after termination of glider observations. Dobricic et al. utilized a 3DVar algorithm of the second category, based on stationary Gaussian-shaped BECs in the horizontal combined with EOF decomposition in the vertical (Dobricic and Pinardi 2008), and did not explicitly include adaptive error covariances inferred from model statistics.
In this study we propose a hybrid 3DVar assimilation system specifically targeted at preserving survey-scale correlations that could be resolved by gliders in coastal areas. Similar to the existing atmospheric hybrid models, the “flow dependent” part of the covariance
Another distinctive feature of the BEC model is an explicit separation of the covariance components in
The rest of the paper is organized as follows. We start with the description of the hybrid BEC model (section 2), then briefly review the Navy Coastal Ocean Model (NCOM) forecast model and the experimental design for the Monterey Bay area (section 3). We continue with an examination of the forecast skills of the assimilation system for the twin-data experimental setting and subsequent real-data experiment (section 4). Section 5 concludes the paper.
2. A hybrid 3DVar assimilation scheme
a. The BEC model


























b. Definition of m and α
Accurate determination of the first term in the BEC model in (8) is important because this term is responsible for capturing error correlations on scales comparable with the size of the domain. In oceanographic applications these errors are generated by poorly known open boundary conditions and errors in atmospheric forcing, which tend to have larger scales than those of the internal oceanic variability. In addition,


The relationship in (10) gives an asymptotic (N ≫ 1) approximation to the Bayesian posterior probability for a model with m parameters (linear regression on m eigenfunctions) given N observations (T/S fields sampled by gliders at the analysis times) under the assumption that model-data misfits are normally distributed. Similar but less restrictive criteria for m could also be used (Akaike 1974; Hannan and Quinn 1979).
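As an illustration of this kind of criterion, the sketch below selects the number of modes by minimizing the standard Schwarz (1978) criterion, N ln(RSS/N) + m ln N, for a least squares regression of the misfits on the leading m modes. The exact form of (10) is not reproduced in this text, so the formula, the function names `bic_scores` and `select_num_modes`, and the synthetic data are assumptions made for illustration only.

```python
import numpy as np

def bic_scores(modes, misfit):
    """Schwarz criterion for regressions on the leading m = 0..M columns of `modes`.

    modes  : (N, M) array, candidate modes sampled at the N observation points
    misfit : (N,) array of model-data misfits
    Returns an array of M + 1 scores; smaller is better.
    """
    n_obs, n_modes = modes.shape
    scores = np.empty(n_modes + 1)
    for m in range(n_modes + 1):
        if m == 0:
            resid = misfit  # no modes retained: the whole misfit is residual
        else:
            coef, *_ = np.linalg.lstsq(modes[:, :m], misfit, rcond=None)
            resid = misfit - modes[:, :m] @ coef
        rss = float(resid @ resid)
        # Gaussian log-likelihood term plus the penalty for m fitted parameters
        scores[m] = n_obs * np.log(rss / n_obs) + m * np.log(n_obs)
    return scores

def select_num_modes(modes, misfit):
    """Number of leading modes that minimizes the Schwarz criterion."""
    return int(np.argmin(bic_scores(modes, misfit)))

# Synthetic check (hypothetical data): two of five modes carry signal.
rng = np.random.default_rng(0)
n_obs = 400
modes = rng.standard_normal((n_obs, 5))
misfit = 2.0 * modes[:, 0] - 1.5 * modes[:, 1] + 0.3 * rng.standard_normal(n_obs)
scores = bic_scores(modes, misfit)
m_best = select_num_modes(modes, misfit)
```

Because the penalty grows as ln N per retained mode, a mode is kept only if it reduces the residual variance by more than sampling noise alone would suggest. Replacing m ln N with 2m gives the less restrictive Akaike form.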




Here


In principle, one can generalize the covariance model in
c. Definition of β



d. Numerical implementation
Since gliders directly measure only the temperature and salinity fields, the operator





2D slices of the rows of the numerical approximations of
Citation: Monthly Weather Review 139, 6; 10.1175/2011MWR3510.1


The only type of data used in the present study were temperature and salinity profiles from gliders. Therefore, balance constraints were introduced by applying the linearized equation of state and the geostrophic–hydrostatic relationships directly to the temperature and salinity increments (e.g., Li et al. 2008) obtained from minimization of the cost function (1).
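A minimal sketch of such balance constraints is given below. It is not the scheme of Li et al. (2008): it assumes constant expansion coefficients in the linearized equation of state, an f plane, a regular horizontal grid, and zero velocity increment at the deepest level. All names and constants (`ALPHA_T`, `BETA_S`, `geostrophic_increments`, and so on) are hypothetical. The velocity increments follow from the thermal wind relations obtained by combining the hydrostatic and geostrophic balances.

```python
import numpy as np

G = 9.81          # gravity, m s^-2
RHO0 = 1025.0     # reference density, kg m^-3 (assumed)
ALPHA_T = 2.0e-4  # thermal expansion coefficient, K^-1 (assumed constant)
BETA_S = 7.6e-4   # haline contraction coefficient, psu^-1 (assumed constant)

def density_increment(d_temp, d_salt):
    """Linearized equation of state applied to T/S increments."""
    return RHO0 * (-ALPHA_T * d_temp + BETA_S * d_salt)

def _integrate_up(shear, z):
    """Trapezoidal integral in z from the deepest level (value 0 there)."""
    dz = np.diff(z)[:, None, None]
    layers = 0.5 * (shear[1:] + shear[:-1]) * dz
    bottom = np.zeros_like(shear[:1])
    return np.concatenate([bottom, np.cumsum(layers, axis=0)], axis=0)

def geostrophic_increments(d_temp, d_salt, z, dx, dy, f):
    """Velocity increments balanced with T/S increments.

    d_temp, d_salt : (nz, ny, nx) arrays of increments
    z              : (nz,) level heights in m, increasing upward
    dx, dy         : horizontal grid steps in m
    f              : Coriolis parameter in s^-1
    """
    d_rho = density_increment(d_temp, d_salt)
    drho_dy = np.gradient(d_rho, dy, axis=1)
    drho_dx = np.gradient(d_rho, dx, axis=2)
    # Thermal wind: f du/dz = (g/rho0) drho/dy, f dv/dz = -(g/rho0) drho/dx
    du_dz = (G / (RHO0 * f)) * drho_dy
    dv_dz = -(G / (RHO0 * f)) * drho_dx
    return _integrate_up(du_dz, z), _integrate_up(dv_dz, z)

# Example: temperature increment growing linearly eastward, no salinity change.
z = np.linspace(-200.0, 0.0, 5)
x = np.arange(4) * 2000.0
d_temp = np.broadcast_to(1.0e-5 * x, (5, 3, 4)).copy()
d_salt = np.zeros_like(d_temp)
f0 = 8.7e-5
d_u, d_v = geostrophic_increments(d_temp, d_salt, z, dx=2000.0, dy=2000.0, f=f0)
```

In this example the lighter water lies to the east, so the meridional velocity increment grows linearly from zero at 200 m to G·ALPHA_T·1e-5·200/f0 at the surface, while the zonal increment vanishes.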
3. Experiment design
The BEC model was verified by 3DVar assimilation experiments with the Navy Coastal Ocean Model (NCOM) configured in the Monterey Bay (Fig. 2) for processing of the data acquired during the Autonomous Ocean Sampling Network (AOSN II) experiment (Ramp et al. 2008). The experiment was conducted in the summer of 2003 with the ultimate goal of developing an adaptive sampling technique that combines numerical forecasts with the data flows from controllable observation platforms. Observations were collected by several types of autonomous underwater vehicles (AUVs), including gliders, as well as by high-frequency radars, two moorings, bottom-mounted ADCPs, surface drifters, and CTD casts. In the present study, we focus the analysis on the temperature/salinity data from gliders only: space–time coordinates of the gliders are used to define observation operators
Locations of glider profiles during the experiment (solid dots) and model grid (smaller dots). The bathymetry contours are in m. The domain used for estimation of the distances
a. Numerical model, observations, and validation technique
To simulate oceanic variability during the experiment we used a version of NCOM forced by the Coupled Ocean–Atmosphere Mesoscale Prediction System (COAMPS; Hodur et al. 2002) winds in the time period between 1 August (t = 0) and 27 August (t = 27) of 2003. The model was configured on a curvilinear orthogonal grid (Fig. 2) with horizontal resolution ranging from 1 to 4 km, and a hybrid σ/z vertical coordinate system with 9 σ levels in the upper ocean and 32 z levels below. At the open boundaries, the model was one-way coupled to the global NCOM model (Shulman et al. 2009).
Glider observations during the experiment covered the central part of the model domain (Fig. 2). With a typical dive cycle of about 1 h, a glider would travel approximately 0.5 km between surfacings, which is well below the grid resolution. For that reason we prescribed observational operators


Fig. 3. An example of temperature and velocity fields for the (a) true and (b) first-guess solutions (z = 28 m, t = 1.5 days).
To simulate model errors and assess the impact of assimilation in the twin-data experiments, the “first guess” model solution xfg(t) was generated by integrating the model for 27 days starting from the initial condition specified by xt(t = 8.5) (Fig. 3, right panel).
Experiments with real data were conducted and the results were validated in a similar manner, except that xt was taken as the first guess and
b. Parameters of the hybrid covariance model
In contrast to atmospheric applications, BEC estimation from ensembles is more difficult in regional oceanographic problems because realistic ensembles simulating ocean variability on regional scales are rarely available. In the present study, the first-guess background error statistics were obtained from the ensemble of the differences
In the hybrid assimilation runs (with α ≠ 0) this first-guess ensemble
To increase the robustness in estimating α and β, we utilized the method of Wang et al. (2007) and performed additional time averaging while computing the sample variances in (13) and (15). This averaging was done over the ensemble of 30 states (15 days) preceding the analysis time. In the initial 15 days of the assimilation run, the missing background states were taken from the respective forecasts xfg(t) generated by the first-guess solution. Similar averaging over N = 30 samples following the analysis time was done when estimating σm in (11).
There are two parameters in the definition of
Comparison of the model solutions (Fig. 3) with the grid (Fig. 2) indicates that horizontal correlations are likely to decay over 3–6 grid steps. We checked this hypothesis by twin experiments with α = 0 and computed the forecast skill of the assimilated solutions with various values of
4. Results
a. Twin-data experiments
Figure 4 compares the skill of 3DVar assimilation runs performed with the Gaussian and hybrid BEC models. During the first 8 days of assimilation, the hybrid scheme was unable to detect any statistically reliable modes. Between days 8 and 11 the first mode was detected, accounting for 8% of the forecast error variance on day 8, 14% on day 11 (Fig. 4), and 17% on day 17. On day 12 the second mode was detected, accounting for 4% of the forecast error variance. The contribution of the second mode increased to almost 10% on day 18. Later, the modes appeared to lose their predictive skill, with the contributions dropping to 12% and 7%, respectively, on day 25.
Normalized distances
The 12-h forecast errors measured in terms of the normalized distances
Normalized distances
Assimilation experiments with different noise in observations have shown that the patterns in Figs. 4–5 are robust up to the noise levels of 0.5. At higher noise levels, the approximation (13) becomes less accurate and it is necessary to use the relationship (12) for estimating α. Larger errors in estimating α result in the loss of accuracy in estimating the number of modes m and the magnitude β of the Gaussian part of the covariance. We therefore assume that the proposed algorithm is valid when observation errors are considerably smaller than the background errors. This is not a severe restriction for regional assimilation problems in oceanography where the first-guess/background model solutions are rarely preconditioned by data and often appear to be rather far from reality.
b. Real-data experiments
Figure 6 shows a typical situation we encountered in the experiments with real data in the Monterey Bay: The first-guess model solution does not have much in common with the mooring record at 40 m (left panel). Moreover, the mean profiles in the right panel demonstrate considerable salinity biases above 30 m and in the depth range between 50 and 200 m. The rms variations of salinity measured by gliders and moorings are generally consistent with each other in magnitude (cf. horizontal bars and the width of light shading around the thick profile in the right panel). A noticeable bias between the mean salinity measured by moorings (solid dots) and gliders (thick line) could be attributed to differences in averaging: the glider profile is obtained by averaging over all the glider positions (Fig. 2), whereas the mooring profile (solid dots) is obtained as the mean of only two moorings. Similar biases between the first-guess solution and observations were obtained for the temperature field (not shown).
Fig. 6. (left) Salinity recorded by the offshore mooring in Fig. 2 (black line) and the corresponding salinity of the first-guess NCOM solution (gray line). (right) Profiles of the average salinity measured by gliders (solid bold line), moorings (dashed line), and extracted from the first-guess model solution (solid thin line). Shading and horizontal bars show rms variability.
To estimate observation errors, we compiled the glider T/S records at times when gliders passed within 200 m of either of the moorings and compared these data with the corresponding observations at the moorings. In total, 168 such “pass-by events” were found. The comparison showed that the rms discrepancies in temperature and salinity were fairly stable with depth and varied within 0.26–0.35 after normalization by the rms variability σm(z) recorded at the moorings (horizontal bars in the right panel of Fig. 6). Based on these computations, the observation error standard deviations were estimated as R^{1/2}(z) = 0.3σm(z) and assumed not to vary in the horizontal.
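The bookkeeping behind such an estimate can be sketched as follows, assuming that each pass-by event has already been reduced to a matched pair of glider and mooring values tagged with a depth-level index. The function name `normalized_rms_discrepancy`, the array layout, and the synthetic numbers are hypothetical and serve only to illustrate the normalization by σm(z).

```python
import numpy as np

def normalized_rms_discrepancy(glider, mooring, depth_index, sigma_m):
    """RMS glider-mooring difference per depth level, divided by sigma_m.

    glider, mooring : (n_events,) matched values at pass-by events
    depth_index     : (n_events,) integer level index of each event
    sigma_m         : (n_levels,) rms variability recorded at the moorings
    Levels without any event are returned as NaN.
    """
    n_levels = sigma_m.size
    diff_sq = (glider - mooring) ** 2
    sums = np.bincount(depth_index, weights=diff_sq, minlength=n_levels)
    counts = np.bincount(depth_index, minlength=n_levels)
    rms = np.sqrt(sums / np.maximum(counts, 1))
    ratio = rms / sigma_m
    ratio[counts == 0] = np.nan
    return ratio

# Synthetic pass-by events (hypothetical values, not the AOSN II data).
rng = np.random.default_rng(1)
n_events, n_levels = 168, 4
sigma_m = np.array([0.8, 0.6, 0.4, 0.2])
depth_index = rng.integers(0, n_levels, n_events)
mooring = rng.standard_normal(n_events)
glider = mooring + 0.3 * sigma_m[depth_index] * rng.standard_normal(n_events)
ratio = normalized_rms_discrepancy(glider, mooring, depth_index, sigma_m)

# Depth-dependent, horizontally uniform observation error standard deviation.
obs_error_std = 0.3 * sigma_m
```

If the returned ratios are roughly constant with depth, as reported above, a single factor multiplying σm(z) is a reasonable summary of the observation error.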



Figure 7 demonstrates the differences δq between the salinity forecast skills
Fig. 7. Improvement of the 12-h salinity forecasts at glider observation points (gray line) and at the moorings (black line). Positive values correspond to smaller forecast errors for the hybrid scheme. Vertical bars indicate occasional detection of only one mode and the thick black line shows the percentage of the error variance that the mode explains.
Compared with the time-averaged assimilation skill in twin-data experiments (e.g.,
In principle, the ensemble could be expanded, for example, by the breeding technique, but the problem with the poor quality of the first-guess solution (Fig. 6) may still persist, because the bred vectors would still show unstable modes of the background state that is rather far from reality. In the present study, we used a simple approximation to the error fields by considering an ensemble of differences between a free model run and an assimilation run with the Gaussian covariance model. This ensemble was able to generate just a few members in the 27-day period. One may hope, however, that for longer observation periods the BEC model will gain enough skill to show better performance.
Figure 8 shows the time evolution of the ratio γ = β/α between the weighting parameters in the twin- and real-data experiments. In order of magnitude, γ is consistent with the results of Wang et al. (2008), who kept γ constant in time and found the optimal γ to vary between 1 and 4 in a series of twin-data experiments with the WRF model. In our case, the relative weight β of the Gaussian term in the cost function appeared to be approximately 2 times smaller in the twin-data experiment (thin curve in Fig. 8). This is consistent with a better skill in explaining model-data misfits by the modes retrieved in the twin-data experiment (cf. Figs. 4 and 7). Larger relative values of α on days 10–13 (before the mode rejection) can be explained by the tendency of the algorithm to keep the deteriorating mode “alive.”
Fig. 8. Time variation of the ratio β/α in the twin-data (thin line) and real-data experiments.
We also investigated the impact of the algorithms for definition of m and α on the forecast skill. In the twin-data experiments with fixed m, the 27-day-averaged skill was always worse than that in Fig. 4 for the three tested values of γ = 0.5, 1, and 2. When m was computed through (10) and γ was kept constant at 0.95, the forecast skill was virtually the same as in Fig. 4, but it was somewhat lower for other constant values of γ. Similar results were obtained with real data: keeping m = const degraded the forecast skill, often below the one obtained with the Gaussian BEC model. Several runs with an adjustable m and γ = const were difficult to interpret, as the skill improvements were small, highly variable, and did not show any systematic dependence on the value of γ ∈ [0.5, 2.5].
5. Summary and discussion
In this study we proposed a hybrid BEC model specifically designed for 3DVar analysis of regional circulations supported by glider surveys. The model is supplied by an algorithm for weighting the ensemble-generated error covariance
The proposed BEC model is formulated in terms of the inverse covariances with the restriction of
The hybrid BEC model was validated by numerical experiments with simulated and real data. In the twin-data setting, the hybrid formulation was capable of improving the model’s forecast skill by 15%–20%, which is comparable with the improvement reported by Wang et al. (2008) for a hybrid scheme with the atmospheric WRF model. Results of the experiments with real data showed an improvement of a few percent with sporadic detection of only one mode. We attribute this to the poor quality of the background solution, which was heavily biased and showed considerably lower time variation in the temperature and salinity fields than the observations (left panel in Fig. 6). Thus, finding a better background solution appears to be the first priority in upcoming studies of the algorithm.
Other developments may include elaboration of the structure of the diffusion tensor in
One of the drawbacks of the proposed model is the computational cost of estimating the weight β of the static covariance [(15)]. Our experience shows, however, that the numerator in (15) weakly depends on the structure of
The benefit of the proposed hybrid model may also be diminished for global assimilation problems, where some sort of localization is needed and the impact of the ensemble-generated covariances may be smaller with higher observation density (e.g., Whitaker et al. 2008). Nevertheless, we expect that the proposed approach has prospects for further development in regional data assimilation problems with poorly known background states.
Acknowledgments
This study was supported by the Office of Naval Research (Program Element 0602435N) and NSF Grant 0629400. Helpful discussions with Dr. Shulman are acknowledged.
APPENDIX
Derivation of Equation (12)














REFERENCES
Akaike, H., 1974: A new look at the statistical model identification. IEEE Trans. Automat. Control, 19 (6), 716–723.
Bai, Z., and G. H. Golub, 1997: Bounds for the trace of the inverse and the determinant of symmetric positive definite matrices. Ann. Numer. Math., 4, 29–38.
Brasseur, P., and J. Verron, 2006: The SEEK filter method for data assimilation in oceanography: A synthesis. Ocean Dyn., 56, 650–661.
Dobricic, S., and N. Pinardi, 2008: An oceanographic three-dimensional variational data assimilation scheme. Ocean Modell., 22, 89–105.
Dobricic, S., N. Pinardi, P. Testor, and U. Send, 2010: Impact of data assimilation of glider observations in the Ionian Sea (Eastern Mediterranean). Dyn. Atmos. Oceans, 50, 78–92, doi:10.1016/j.dynatmoce.2010.01.001.
Etherton, B. J., and C. H. Bishop, 2004: Resilience of hybrid ensemble/3DVAR analysis schemes to model error and ensemble covariance error. Mon. Wea. Rev., 132, 1065–1080.
Evensen, G., 2003: The ensemble Kalman filter: Theoretical formulation and practical implementation. Ocean Dyn., 53, 343–357, doi:10.1007/s10236-003-0036-9.
Hamill, T. M., and C. Snyder, 2000: A hybrid ensemble Kalman filter–3D variational analysis scheme. Mon. Wea. Rev., 128, 2905–2919.
Hamill, T. M., J. S. Whitaker, and C. Snyder, 2001: Distance-dependent filtering of background error covariance estimates in an ensemble Kalman filter. Mon. Wea. Rev., 129, 2776–2790.
Hannan, E. J., and B. G. Quinn, 1979: The determination of the order of an autoregression. J. Roy. Stat. Soc., 41B, 190–195.
Heaney, K. D., G. Gawarkiewicz, T. F. Duda, and P. F. J. Lermusiaux, 2007: Non-linear optimization of autonomous undersea vehicle sampling strategies for oceanographic data-assimilation. J. Field Robot., 24 (6), 437–448.
Hodur, R. M., J. Pullen, J. Cummings, X. Hong, J. D. Doyle, P. J. Martin, and M. A. Rennick, 2002: The coupled ocean/atmospheric mesoscale prediction system (COAMPS). Oceanography, 15, 88–98.
Li, Z., Y. Chao, J. C. McWilliams, and K. Ide, 2008: A three-dimensional variational data assimilation scheme for the regional ocean modeling system. J. Atmos. Oceanic Technol., 25, 2074–2090.
Pannekoucke, O., and S. Massart, 2008: Estimation of the local diffusion tensor and normalization for heterogeneous correlation modelling using a diffusion equation. Quart. J. Roy. Meteor. Soc., 134, 1425–1438.
Ramp, S. R., and Coauthors, 2008: Preparing to predict: The second Autonomous Ocean Sampling Network (AOSN-II) experiment in the Monterey Bay. Deep-Sea Res. II, 56, 68–86, doi:10.1016/j.dsr2.2008.08.013.
Rudnick, D. L., R. E. Davis, C. C. Eriksen, D. M. Fratantoni, and M. J. Perry, 2004: Underwater gliders for ocean research. J. Mar. Technol. Soc., 38, 73–84.
Saad, Y., 2003: Iterative Methods for Sparse Linear Systems. 2nd ed. Society for Industrial and Applied Mathematics, 528 pp.
Schwarz, G. E., 1978: Estimating the dimension of a model. Ann. Stat., 6 (2), 461–464.
Shulman, I., and Coauthors, 2009: Impact of glider data assimilation on the Monterey Bay model. Deep-Sea Res. II, 56, 188–198, doi:10.1016/j.dsr2.2008.08.003.
Tippett, M., J. L. Anderson, C. Bishop, T. M. Hamill, and J. S. Whitaker, 2003: Ensemble square root filters. Mon. Wea. Rev., 131, 1487–1490.
Wang, X., T. M. Hamill, J. S. Whitaker, and C. Bishop, 2007: A comparison of hybrid ensemble transform Kalman filter-OI and ensemble square-root filter analysis systems. Mon. Wea. Rev., 135, 1055–1076.
Wang, X., D. M. Barker, C. Snyder, and T. M. Hamill, 2008: A hybrid ETKF-3DVAR data assimilation scheme for the WRF model. Part I: Observing System Simulation Experiment. Mon. Wea. Rev., 136, 5116–5131.
Wang, X., T. M. Hamill, J. S. Whitaker, and C. H. Bishop, 2009: A comparison of the hybrid and EnSRF analysis schemes in the presence of model error due to unresolved scales. Mon. Wea. Rev., 137, 3219–3232.
Weaver, A., and P. Courtier, 2001: Correlation modeling on a sphere using a generalized diffusion equation. Quart. J. Roy. Meteor. Soc., 127, 1815–1846.
Weaver, A., and S. Ricci, 2004: Constructing a background-error correlation model using generalized diffusion operators. Proc. ECMWF Seminar Series on Recent Advances in Atmospheric and Ocean Data Assimilation, Reading, United Kingdom, ECMWF, 327–340.
Whitaker, J. S., T. M. Hamill, X. Wei, Y. Song, and Z. Toth, 2008: Ensemble data assimilation with the NCEP Global Forecast System. Mon. Wea. Rev., 136, 463–482.
Yin, Y., O. Alves, and P. Oke, 2011: An ensemble ocean data assimilation system for seasonal prediction. Mon. Wea. Rev., 139, 786–808.