A new ensemble ocean data assimilation system, developed for the Predictive Ocean Atmosphere Model for Australia (POAMA), is described. The new system is called PEODAS, the POAMA Ensemble Ocean Data Assimilation System. PEODAS is an approximate form of an ensemble Kalman filter system. For a given assimilation cycle, a central forecast is integrated, along with a small ensemble of forecasts that are forced with perturbed surface fluxes. The small ensemble is augmented with multiple small ensembles from previous assimilation cycles, yielding a larger ensemble that consists of perturbed forecasts from the last month. This larger ensemble is used to represent the system’s time-dependent background error covariance. At each assimilation cycle, a central analysis is computed utilizing the ensemble-based covariance. Each of the perturbed ensemble members are nudged toward the central analysis to control the ensemble spread and mean. The ensemble-based covariances generated by PEODAS potentially yield dynamically balanced analysis increments. The time dependence of the ensemble-based covariance yields spatial structures that change for different dynamical regimes, for example during El Niño and La Niña conditions. These differences are explored in terms of the dominant dynamics and the system’s errors. The performance of PEODAS during a 27-yr reanalysis is evaluated through a series of comparisons with assimilated and independent observations. When compared to its predecessor, POAMA version 1, and a simulation with no assimilation of subsurface observations, PEODAS demonstrates a quantitative improvement in skill. PEODAS will form the basis of Australia’s next operational seasonal prediction system.
Seasonal prediction is motivated to support a variety of communities and industrial groups including agriculture (McIntosh et al. 2007), monitoring and management of coral bleaching events (Spillman and Alves 2009), anticipation and management of vector-borne diseases (Thomson et al. 2006), and long-range forecasts of climate indicators (e.g., Zhao and Hendon 2009). Prediction of seasonal mean rainfall and circulation in the atmosphere for different regions of the globe is of great scientific and societal interest (Shukla et al. 2000). Dynamical seasonal prediction, also termed seasonal forecasting, usually involves the initialization and integration of a coupled ocean–atmosphere model and provides forecasts over time ranges of many months. Most seasonal forecast systems involve ensemble prediction to help formulate probabilistic forecasts and to provide users with an indication of forecast uncertainty. An important aspect of a seasonal forecast system is the ocean initialization, where oceanic observations are combined with a model-generated background field, providing ocean initial conditions for a forecast, or an ensemble of forecasts.
Existing ocean data assimilation systems, used in the initialization of operational and preoperational seasonal forecasts, include but are not limited to: those at the Japan Meteorological Agency [Meteorological Research Institute multivariate ocean variational estimation (MOVE); Usui et al. 2006], the European Centre for Medium-Range Weather Forecasts (ECMWF) Ocean Re-Analysis system 3 (ORA-S3; Balmaseda et al. 2008), Meteo-France (PSY2G2 from Mercator Ocean), the Bureau of Meteorology (PEODAS, described in this paper), National Centers for Environmental Prediction (NCEP) Global Ocean Data Assimilation System (GODAS; Behringer 2007), the Met Office Global Seasonal Prediction Model (GloSea3), and National Aeronautic and Space Administration (NASA) global modeling and assimilation office (ODAS-1; Keppenne et al. 2008). For a review of these systems, including a comparison of their methods and new developments in coupled model initialization [e.g., Geophysical Fluid Dynamics Laboratory (GFDL) coupled data assimilation (CDA) system; Zhang et al. 2007], the reader is referred to Balmaseda et al. (2009). Of these systems, some are based on three-dimensional variational (3DVAR) methods, some on univariate or multivariate optimal interpolation (OI), and some are variants of the ensemble Kalman filter (EnKF; Evensen 1994; Houtekamer and Mitchell 1998; Burgers et al. 1998). Some systems compute analyses in a single step, while others assimilate different data types independently over several steps. Other aspects of these systems that differ include the treatment of biases, the data types assimilated, and the initialization methods employed.
The ocean analysis system for the first-generation seasonal prediction system in Australia, called the Predictive Ocean Atmosphere Model for Australia versions 1 and 1.5 (POAMA; Alves et al. 2003; more information available online at http://poama.bom.gov.au/), is based on a univariate OI system (Smith et al. 1991) that assimilates only in situ temperature observations. Such univariate analysis schemes have been shown to provide benefits in reducing uncertainty and improving initial conditions for dynamical seasonal forecasts (Alves et al. 2004). However, because of the lack of appropriate multivariate formulations, this approach has a detrimental effect on the salinity and velocity fields of the model. Troccoli et al. (2002) reported that the univariate assimilation of temperature profiles, without attempting to correct salinity, can induce spurious convection and result in errors in the subsurface temperature and salinity fields. Bell et al. (2004) found that assimilation of temperature data into an ocean model near the equator often results in a dynamically unbalanced state with unrealistically strong vertical motions that corrupt the increments introduced by the data assimilation scheme and lead to an ocean state with a biased density field. To address these problems recent developments have focused on dynamically balanced multivariate analysis schemes for ocean data assimilation. For instance, Troccoli et al. (2002) proposed a method to preserve water mass characteristics, and Burgers et al. (2002) suggested imposing geostrophically balanced increments to horizontal velocities and density. In addition, various bias estimation and treatment techniques, inspired by Dee and Da Silva (1998), have been proposed for ocean data assimilation (Bell et al. 2004; Chepurin et al. 2005; Balmaseda et al. 2007). However, most of these techniques are not truly multivariate, since the analysis is performed by taking dynamical constraints that are applied a posteriori to a statistically generated univariate analysis. These approaches require parameter tuning and do not make optimal use of multivariate information in defining the analysis itself, and therefore, make the assimilation of different data types more complicated.
Ensemble-based data assimilation methods are relatively new and potentially attractive alternatives to four-dimensional variational (4DVAR) methods for operational data assimilation systems. They are relatively simple to code and maintain, since the tangent linear and adjoint models of the dynamics are not required. They also provide an ensemble of states to initialize ensemble forecasts. Ensemble-based background error covariances are state-dependent, inhomogeneous, anisotropic, and multivariate. Also, they facilitate the simultaneous assimilation of different observation types in a single analysis step, and yield potentially dynamically consistent analyses. Based on these considerations, a new ensemble-based ocean analysis system called the POAMA Ensemble Ocean Data Assimilation System (PEODAS) has been developed for operational implementation at the Australian Bureau of Meteorology. PEODAS can be regarded as an inexpensive EnKF because of simplifications used to reduce the system’s computational cost. The purpose of this paper is to document the details of PEODAS, to describe the salient properties of this system, and to document its performance through the assessment of a 27-yr reanalysis. This paper is organized as follows: a brief introduction to data assimilation is presented in section 2, along with a short description of the POAMA-1 assimilation (Alves et al. 2003) and a detailed description of PEODAS. A description of the ocean model that underpins POAMA and PEODAS and the configuration of a 27-yr reanalysis are presented in section 3, followed by an analysis and discussion of the characteristics of PEODAS in section 4. Results from a series of reanalyses are presented in section 5, including a series of model-observation comparisons, and a summary is presented in section 6.
2. Data assimilation
Consider the analysis equations,
where x is the model state vector, y is the vector of observations, 𝗞 is the Kalman gain matrix, and 𝗛 is an operator that maps from model space to observation space—typically 𝗛 is simply linear interpolation. The matrix 𝗣b is the background error covariance and 𝗥 is the observation error covariance. Superscripts a and b denote analysis and background, respectively, and the superscript T denotes a matrix transpose. The background field xb is the model field before assimilation, and the analysis xa is the model field after assimilation. The notation is based, wherever possible, on Ide et al. (1997).
Traditional OI-based schemes (e.g., Smith et al. 1991) typically solve the analysis Eqs. (1) and (2) by estimating the background error covariances in 𝗣b with an analytical function, usually a Gaussian function. In this case, the length scales of the background errors are usually assumed to be fixed in time, often vary in space, and are estimated from statistical properties of observations (e.g., Kanamitsu 1989). In most cases the observation errors are assumed to be uncorrelated, so 𝗥 is diagonal and the diagonal elements represent the assumed observation error variance, including both instrument and representation error (e.g., Oke and Sakov 2008). A critical aspect of any data assimilation system is the ratio of the assumed background to observation error variance. This largely determines the degree to which observations are fitted by an assimilation system.
where m is the ensemble size, ρ is a correlation function, the open circles denote Hadamard product (or Schur product, an element-by-element matrix multiplication; Houtekamer and Mitchell 2001), and the ensemble perturbation matrix 𝗔′ is given by
where is the ith perturbation ensemble member, derived from the ith ensemble member by removing the ensemble mean. Most EnKF-based systems can be classified into one of two flavors, the traditional EnKF with perturbed observations (e.g., Burgers et al. 1998; Houtekamer and Mitchell 2001), or ensemble square root filters (ESRFs; Tippett et al. 2003 and references therein). For a traditional implementation of the EnKF, observations are perturbed according to their assumed error, and a different realization of the observations is assimilated with each ensemble member yielding an analysis that has the theoretical minimum analysis error covariance in a statistical sense. By contrast, ESRFs generally solve the analysis Eqs. (1) and (2) for the ensemble mean, and then transform, or update, the ensemble about that mean so that its analysis error covariance matches the theoretical minimum analysis error covariance. So, the traditional EnKF involves the calculation of m analyses for an m-member ensemble, and the ESRF involves the explicit update of m ensemble members. Both flavors of the EnKF are computationally expensive, and are not always feasible for a practical application. The EnKF system developed here is more like an ESRF than an EnKF with perturbed observations. However, the ensemble transformation is achieved by nudging each ensemble member to the “central analysis.” This is described in detail below in section 2c.
The inclusion of the correlation function in Eqs. (3) and (4) represents localization (e.g., Houtekamer and Mitchell 2001). Localization is a necessary part of any ensemble-based data assimilation system where the model state dimension exceeds the ensemble size. Localization acts to reduce sampling error that arises from the use of a small ensemble, and to increase the rank of the ensemble, so that the system can “fit” the background innovations. One of the negative consequences of localization is the introduction of dynamical imbalance (e.g., Mitchell et al. 2002; Oke et al. 2007). As a general rule, as the localizing length scale is shortened (lengthened) the rank of the ensemble increases (decreases), the analyses can be made to fit observations more (less), and analysis increments become less (more) dynamically consistent. Without localization (i.e., ρ = 1), it follows directly from Eqs. (1) and (4) that
where c is an m-element vector. Written in this form, it is clear that when an ensemble is used to approximate the background error covariance, the increment, represented by Eq. (6), is simply a linear combination of ensemble perturbations. When localization is used, the coefficients c in Eq. (6) vary in space—though these coefficients are generally not computed explicitly. This demonstrates that there is a clear relationship between how an ensemble is constructed and the assumptions made when implementing the data assimilation system.
The application considered in this paper is for seasonal prediction. An important source of background error for the ocean state is due to errors in the surface forcing, and specifically the surface fluxes on intraseasonal time scales. In the assimilation system introduced below, we explicitly perturb the surface fluxes for the ensemble. As a result, the ensemble contains perturbations to the ocean state that reflect uncertainties in the surface fluxes. Ocean model error, such as parameterization of vertical mixing and other subgrid-scale processes, are not explicitly represented and can be significant. As discussed in detail below in section 2c, the construction of the ensemble in this study explicitly represents errors in surface forcing, and the ocean model error is accounted for by introducing ocean perturbations through a simple method: additive inflation.
There are many flavors of ensemble data assimilation systems that evolve, update, and characterize different ensemble members in different ways. The simplest of these use a time-invariant ensemble (e.g., Oke et al. 2005, 2008, 2009); some use a seasonally varying ensemble that is not state dependent (e.g., Brasseur et al. 2005); and others evolve the ensemble with time, resulting in truly state-dependent estimates of the background errors (e.g., Leeuwenburgh 2005; Bertino and Lisaeter 2008). The system developed here conforms to the latter, with state-dependent estimates of the background errors, implicit in the time-evolving ensemble, as described below in section 2c.
POAMA-1 is a first generation seasonal prediction system that is run operationally at the Bureau of Meteorology. The first version of POAMA (POAMA-1) was developed in a joint project involving the Bureau of Meteorology Research Centre and Australian Commonwealth Scientific and Research Organization (CSIRO) Marine Research. POAMA-1 became operational in October 2002, and has produced routine seasonal forecasts since then. A new version, POAMA-1.5, became operational in June 2007, however in this version there were no changes to the ocean data assimilation component.
The data assimilation system for POAMA-1 is a variation of the univariate OI scheme that is described by Smith et al. (1991). The details of POAMA-1 are presented by Alves et al. (2003). An important feature of the POAMA-1 ocean assimilation system is that the weight given to observations has been set to be no greater than that of the background field. This is to avoid large initialization shock in the coupled model, which may deteriorate the forecast skill. This is a common strategy used by the seasonal prediction community (e.g., Balmaseda et al. 2008). Additionally, the model sea surface temperature (SST) field is strongly relaxed to model-gridded SST analysis products, so that the model SST is always close to the “observed” SST. This is often regarded as an important requirement for seasonal forecast initialization (e.g., Balmaseda et al. 2008; Daget et al. 2009). Specifically, the ratio of the model error standard deviation to that of the observations is 0.47 (this factor of 0.47 was established when POAMA-1 was initially developed through a process of tuning). The observation errors are assumed to be correlated in space, with an e-folding decorrelation scale of 150 km. The background error covariances 𝗣b are modeled as Gaussian functions. Within 10° of the equator the covariances are assumed to be anisotropic, with e-folding zonal and meridional length scales of 1500 and 300 km, respectively. At mid and high latitudes (outside of ±25° latitudes) the covariances are assumed to be isotropic, with e-folding length scales of 500 km. Between 10° and 25° from the equator, the length scales are simply linearly interpolated.
POAMA-1 assimilates observations of in situ temperature from the top 500 m of the ocean, producing two-dimensional maps of temperature on model levels. The observations are preinterpolated to model levels before assimilation using linear interpolation. Salinity is not updated by POAMA-1. Based on the increments computed for temperature, geostrophic increments are computed for velocity, using a method that is similar to that described by Burgers et al. (2002).
The new assimilation system that is introduced here, PEODAS, is a variation of an EnKF system. PEODAS includes the routine generation of an ensemble of initial conditions that are used to generate a perturbed forecast ensemble, and a state-dependent estimate of the background error covariance. We regard these features as essential for seasonal prediction. Specifically, the ability to perform ensemble forecasts is important because it readily permits the issuance of probabilistic forecasts (e.g., Gneiting and Raftery 2005). Moreover, we expect the background field errors in a system to be largely attributable to errors in the surface forcing, and therefore, to be highly state dependent, for example, for different phases of the El Niño–Southern Oscillation (ENSO) or the Madden–Julian oscillation (MJO). We therefore expect the greatest uncertainty along the equator to be centered about the time-varying position of the ocean’s seasonal thermocline in response to uncertainty in the intraseasonal and interannual wind stress, and other surface flux components.
Because of limited computational resources, only a small ensemble size is viable, which limits the number of degrees of freedom used to represent forecast and analysis errors. Techniques for background covariance conditioning methods, such as localization have been applied here to solve this problem. However, while localization effectively removes spurious long-range covariances, it does not effectively damp the high-frequency short-scale oscillations that are present in low-rank error covariance estimates. Considering the slow changing nature of the ocean, we therefore designed a method to reduce the high-frequency short-scale oscillations. This method simply involves constructing the ensemble by including all the ensemble perturbations from the previous assimilation cycles within the intraseasonal time span. Experiments show this method does play a role as a filter similar to that of the spatiotemporal filter designed by Keppenne et al. (2008).
PEODAS differs from traditional EnKF systems in that only a single analysis is computed for a central forecast (like the ESRF). This analysis is calculated every 3 days using a modified version of the Bluelink Ocean Data Assimilation System (BODAS; Oke et al. 2008). A schematic of a typical integration of PEODAS is presented in Fig. 1. Briefly, a central model run is integrated, along with 11 perturbed ensemble members. For calculating the background covariances these 11 ensemble perturbations are augmented by the 11 ensemble perturbations from each of the previous nine assimilation cycles. Together, this gives an ensemble of 110 perturbations. At each assimilation cycle, the 110-member perturbation ensemble is used to estimate the background error covariances according to Eq. (3). Using these background error covariance estimates, the model background field, from the central model run, is updated by assimilating observations using the analysis Eqs. (1) and (2). PEODAS computes analyses and analysis increments for temperature, salinity, and velocity. The analysis increments for velocity are based on the ensemble-based covariance only, and do not involve any explicit assumption of geostrophy.
We chose to use a central run as a background rather than using the ensemble mean, and we only explicitly assimilate observations into the central run. This choice also facilitates the implementation of strong relaxation of SST in the central run to a gridded SST analysis and weak relaxation of SST to the perturbed runs. The model SST of the central run is always close to the “observed” SST while the ensemble spread of SST can be tuned by varying the relaxation time scale. This approach also helps to avoid the effect of sample error on the estimate of the ensemble mean due to a small sample size.
After each analysis, the ensemble members are updated by “compressing and nudging” the prior members around the central analysis. This step is the counterpart to the ensemble transformation step of ESRFs, or the assimilation of perturbed observations for the traditional EnKF. The ith ensemble member 𝗔i is compressed and nudged toward the central analysis 𝗔C according to:
where p denotes prior, u denotes updated, α and β are nudging factors that can vary between 0 and 1, and the overbar denotes the ensemble mean. Here, α = 1 − β is selected so that the Eq. (7) can be rewritten as
and the factor β is specified as
where and denote the background error variance and the observation error variance, respectively. By this approach, the updated ensemble mean is shifted toward the central analysis and the ensemble spread is adjusted to being β times its prior value. The factor β here is specified according to an ESRF formula (Anderson 2003) by assuming that the observation errors are uncorrelated and the distribution of observations is even and sufficiently dense so that there is at least one observation in each model grid. Although this approach is suboptimal as it does not take into account the distribution of observations, this simplified ensemble method leads to very significant computational savings. In theory, since comes from the prior ensemble, it varies with space and time and so does the β. However, as discussed in detail later, we use preassigned ratio of and , which leads to a further simplification of the ensemble transformation.
To account for errors in the surface fluxes at intraseasonal time scales, we perturb the surface forcing for each ensemble member. The method for generating these perturbations is an extension of the approach developed by Alves and Robert (2005). Briefly, we characterize the likely errors in surface fluxes (i.e., zonal and meridional surface stress, surface heat flux, shortwave radiation, and freshwater flux) by computing differences between the forcing fields from the NCEP/National Center for Atmospheric Research (NCAR) and 40-yr European Centre for Medium-Range Weather Forecasts (ECMWF) Re-Analysis (ERA-40) reanalysis products (Kalnay et al. 1996; Uppala et al. 2005). Time series of these fields are filtered in time, isolating the variability of 5 to 120 days. Perturbations on these time scales project strongly onto ocean thermocline variations. The first 139 EOFs of a 2-yr record of the normalized difference fields are computed. The EOFs contain the spatial structures of the errors in the surface flux fields. A time series of these EOFs are computed using randomly generated but time-correlated amplitudes for each EOF. A similar approach was also used by Leeuwenburgh (2005) to perturb the ensemble members in a traditional EnKF to represent background error.
Using this approach, we find that the ensemble spread is greatest at the surface and about the position of the thermocline, owing to the perturbations to surface fluxes. By contrast, the ensemble spread below the thermocline is much less. Based on a series of reanalysis experiments, we find that the ensemble spread is too small at depth, where the local forcing perturbations have minimal impact and model error is not taken into account. This results in an underestimation of the forecast error covariance at depth. Several different approaches have been suggested to represent model errors with the deep ocean within the EnKF (e.g., Keppenne et al. 2008). There are three relatively simple methods for accounting for model error: multiplicative inflation (Anderson and Anderson 1999), additive inflation (Houtekamer et al. 2005), and relaxation-to-prior (Zhang et al. 2004). Additive inflation was found to be more effective than the other two methods by Whitaker et al. (2008). Here, we apply an additive inflation to each ensemble member by adding scaled random samples from a set of difference fields computed from a long model run without data assimilation. The difference dataset is constructed by:
where x denotes daily model states (temperature, salinity, and the zonal and meridional current components) on a given date of each month, say the first day of each month and n denotes the month. Such difference fields are of intraseasonal time scales. The reason for this choice is that the intraseasonal differences will emphasize error growth that is consistent with the growth of errors from the perturbed forcing.
Based on our 27-yr (324 months) run, we establish a dataset of 322 difference states. The additive ocean perturbations are computed daily by randomly sampling 11 members from this dataset. After removing the ensemble mean from the sample, we apply a 0.125 scaling and implement a method described by Evensen (2003) to enforce correlation between random daily perturbations on 3-day decorrelation time scale to make the perturbations smoother. This scaling was chosen after several experiments to ensure the ensemble spread is within a reasonable span such that it inflates the variance in the deep ocean but is small compared to the spread due to perturbed surface forcing in the upper ocean. We apply the perturbations gradually, over the whole day, to each ensemble member using the Incremental Analysis Updating (IAU) procedure (Bloom et al. 1996) in order to minimize the initialization shock. We regard this inflation as a simple parameterization for model error—that is not explicitly represented by perturbing the forcing fields alone. Without this inflation, the ensemble variance is too small, so the deeper observations have a very little impact on analyses because the innovation covariance matrix (𝗛𝗣b𝗛T + 𝗥) is often poorly conditioned.
As in the standard EnKF, we not only use the spatial covariance information from the ensemble directly in the assimilation but also use the ensemble spread for the background error variance. However, defining the observation-error variance is a more complicated issue. The total observation error includes measurement errors, representation (or representativeness) errors associated with the unresolved physical processes, and interpolation errors associated with approximating the continuum observation operator (e.g., Daget et al. 2009). While all data assimilation applications include some estimate of the observation error, there is a wide range of approaches used (e.g., Oke and Sakov 2008). A statistical method that involves various assumptions originally proposed by Fu et al. (1993) has been widely used in ocean data assimilation (Fukumori 2000; Menemenlis and Chechelnitsky 2000; Leeuwenburgh 2007; Daget et al. 2009), but some of the assumptions are made purely for practical convenience and are questionable (see, e.g., Menemenlis and Chechelnitsky 2000; Daget et al. 2009). As discussed in detail below, we set the observation error estimates by assuming a constant background/observation error (i.e., square root of error variance) ratio of 0.47. This is the same as used in POAMA-1. Smith et al. (1991) and Alves et al. (2004) used a signal-to-noise ratio of 1.0, when they assimilated observations every 10 days. This signal-to-noise ratio is equivalent to the ratio of background to observation error that we refer to here. Assuming linear error growth and an assimilation cycle of 3 days, we require a signal-to-noise ratio of 0.47 to be equivalent to that of Smith et al. (1991) and Alves et al. (2004). We therefore impose a ratio of 0.47 for the background to observation errors in POAMA-1 and PEODAS. We decided to keep this ratio rather than explicitly specifying the observation errors for the following reasons:
In Wang et al. (2002), Alves et al. (2004), and the POAMA-1 systems, it was found that this achieved a good balance between model and observations that did not lead to significant initialization shock.
It overcomes the problem of ensemble collapse, where the ensemble spread is too small compared with the observation error variance, which may result in the model not being constrained by the observations.
It allows more control over the initial set of integrations, since allowing more degrees of freedom in a system is only advantageous if they can be properly specified (or controlled).
It facilitates a more straightforward comparison with the POAMA-1 system.
The ensemble variance is used to directly perturb the coupled model forecasts, with each ensemble member used as initial conditions for each member of the forecast ensemble.
As described above, we use localization to increase the rank of our ensemble and to reduce sampling error that results from using a small ensemble. We use an anisotropic localizing function that is a variant of the quasi-Gaussian function presented by Gaspari and Cohn (1999). The distance over which the ensemble-based covariances go to zero is 30° cosϕ and 10° cosϕ (ϕ is the grid latitude) in the zonal and meridional directions, respectively, and 50 m vertically. These length scales correspond to e-folding decorrelation length scales of approximately 20° cosϕ, 7° cosϕ, and 35 m, respectively. Here, the multiplication of the cosϕ factor is to make the scale consistent with the latitude-dependent nature of the Rossby deformation radius for a global analysis scheme. These localization scales are greater than the POAMA-1 correlation scales. In practice, the length scales of the background errors are implicit to the ensemble. The localizing length scales only set the upper limit on the background error covariance length scales. Use of longer length scales also yields analysis increments that are more dynamically consistent.
PEODAS has several appealing aspects to it that meet what we regard as essential requirements for a seasonal prediction system. First, it yields an ensemble of initial states for the forecast model that are intended to span the actual uncertainty in the estimate of the initial conditions. If the intention for the ensemble to span the actual uncertainty is realized in reality, such an ensemble is ideal for initializing probabilistic seasonal forecasts. Second, the background error covariances that are used to assimilate observations are state dependent and are generated based on our expectations of the dominant sources of error in our system. Third, PEODAS is computationally affordable, and can be scaled according to the computational resources available. We would prefer to integrate a new ensemble of 110 members for each cycle, but this is prohibitively expensive. As computational resources increase, we can readily reconfigure PEODAS to integrate more ensemble members at each cycle, and retain ensemble members from fewer previous cycles (e.g., we may integrate 30 ensemble members and retain ensemble members from the last four cycles, yielding a 120-member ensemble).
3. Model and reanalysis configuration
a. Ocean model
The ocean model that underpins both POAMA-1 and PEODAS is version 2 of the Australian Community Ocean Model (ACOM2; Schiller et al. 2002), a global configuration of version 2 of the Modular Ocean Model (MOM2; Pacanowski 1995). The model configuration is described in detail by Schiller and Godfrey (2003). Briefly, the model has constant zonal resolution of 2° and enhanced meridional resolution of 0.5° within 8° latitude of the equator that gradually increases to 1.5° toward the poles. There are 25 vertical levels with 7 levels in the top 100 m. This version of the model includes the hybrid mixed layer model described by Chen et al. (1994).
Results from different integrations of ACOM2 have been used to explore intraseasonal variability (Schiller and Godfrey 2003), upper-ocean dynamics (Schiller et al. 1998), and interannual dynamics (Schiller et al. 2000) in the Indian and Pacific oceans. These studies have undertaken extensive comparisons with observations including comparisons of time series of subsurface currents in the central Indian Ocean (Schiller and Godfrey 2003), comparisons between modeled and observed SST in the low-latitude Indian and Pacific oceans (Schiller et al. 2000), and validation of the model’s surface heat fluxes (Schiller et al. 1998) that represents an assessment of both the ocean’s upper-ocean dynamics and the atmospheric boundary layer model. Additionally, Schiller (1999) presented a series of comparisons between modeled and observed subsurface temperatures along frequently repeated expendable bathythermograph (XBT) lines in the Indian Ocean. In general, the model displays reasonable agreement with the available observations on the intraseasonal to interannual time scales.
b. PEODAS reanalysis configuration
To assess the relative performance of POAMA-1 and PEODAS, we perform a 28-year assimilation using PEODAS. The assimilation is integrated for 28 yr, between January 1979 and December 2006. Considering the first year as the period of assimilation spinup, we therefore discarded the analysis of 1979 and used the remaining 27-yr (1980–2006) reanalysis for the evaluation. Assimilation analyses are performed every 3 days using observations in a time window of 3 days, centered on the analysis time. We assimilate observations of in situ temperature and salinity from conductivity temperature depth (CTD), expendable bathythermograph (temperature only), Tropical Atmosphere Ocean (TAO)/Triangle TransOcean Buoy Network (TRITON)/Prediction and Research Moored Array in the Atlantic (PIRATA) mooring, and Argo profiles, sourced from version EN3 of the Enhanced Ocean Data Assimilation and Climate Prediction (ENACT) and Ensemble-based Predictions of Climate Changes and their Impacts (ENSEMBLE) quality-controlled database (Ingleby and Huddleston 2007). Although quality-control flags are provided with the EN3 data, we perform additional checks with the model value, removing observations where the background innovations are greater than five standard deviations from a climatology estimated from a long model run. Within each model grid cell, we select only a single temperature and salinity profile. This selection is based on the observation with the highest quality determined by the quality-control procedure. Superobing of each profile is done in the vertical to ensure that no more than one superobservation falls within each model depth level (see Oke et al. 2008).
The ocean observations are assimilated into an ocean-only integration forced by observed estimates of the surface fluxes. The unperturbed surface fluxes for momentum, heat, and freshwater are derived from ERA-40 (Uppala et al. 2005) for the period 1979–2001; and from NCEP/DOE Atmospheric Model Intercomparison Project (AMIP-II) Reanalysis (Kanamitsu et al. 2002) for the period 2002–06. As the freshwater flux from ERA-40 is known to be inaccurate, PEODAS uses a corrected version based on the approach described in Troccoli and Kallberg (2004). During the model integration, the subsurface temperature and salinity are relaxed to a monthly climatology World Ocean Atlas 2001 (WOA2001; Stephens et al. 2002; Boyer et al. 2002) throughout the water column with an e-folding time scale of 2 years. The relaxation to climatology is stronger for sea surface salinity (SSS; 1-yr time scale) due to large uncertainties in the freshwater flux. The model SST is also strongly relaxed to the analyzed SST maps, from the NCEP/NCAR reanalysis prior to 1982 and daily interpolated values derived from the OIv2 SST weekly product (Reynolds et al. 2002) from 1982 onward, with a 1-day time scale. For the PEODAS perturbed ensembles, the time scale of SST relaxation is increased to 5 day, which maintains an appropriate ensemble spread for SST.
c. Other reanalyses for comparison
Two other reanalyses are used in this study for comparison with PEODAS. First, a control run with no explicit assimilation of in situ observations is performed. This is done exactly the same as PEODAS but no observations are assimilated. This run includes the surface relaxation to SST and SSS, and the subsurface relaxation of temperature and salinity to climatology.
A second dataset is a reanalysis performed for the POAMA-1 system. As well as the analysis systems being different there are several other differences in the setup. The surface forcing used for the model is from the NCEP/NCAR reanalysis. The subsurface relaxation of temperature and salinity to climatology is not used in the POAMA-1 system. The observational dataset used by POAMA-1 is a local dataset assembled from real-time and delayed mode Global Telecommunication System (GTS) data at the Bureau of Meteorology. However, the observations from this dataset were incorporated in the EN3 quality-controlled database. POAMA-1 also incorporates a sophisticated quality control procedure described in Smith et al. (1991).
For POAMA-1 only temperature profiles in the upper 500 m are assimilated. The salinity fields are not updated, but velocity fields are updated using the geostrophic relation similar to Burgers et al. (2002). For PEODAS, the temperature, salinity, and velocity fields are updated explicitly from the ensemble data assimilation system. Observation errors are assumed to be correlated in space for POAMA-1, as documented above, and are assumed to be uncorrelated in space for PEODAS. For both POAMA-1 and PEODAS, the individual magnitudes of the observation and background error variances are not relevant, only their ratio since the observation errors are scaled by the assumed, or modeled, background error variance, so that the ratio of the background to observation error is 0.47 everywhere.
4. Characteristics of the assimilation system
a. Ensemble characteristics
The roles of an ensemble for an ensemble-based data assimilation system are threefold. First, the ensemble variance is intended to estimate the magnitude of the background error variance for all model variables. Second, the ensemble-based covariance fields quantify the system covariance. Third, the ensemble is used as a perturbed set of initial conditions for the coupled model forecasts. The covariance between state elements determines how a background innovation (y − 𝗛xb), from Eq. (1), is interpolated and extrapolated in space and between different variables. It also shows the structure of the perturbations that are applied to the coupled model forecast ensemble. We assess these aspects of the ensemble for a limited range of fields here. The time-averaged ensemble spread (defined by the standard deviation of the ensemble) is presented for surface and subsurface temperature and salinity in Fig. 2. It demonstrates that the magnitude of the estimated background field error for temperature and salinity varies significantly for different parts of the global ocean. The ensemble spread for both SST and SSS identify the regions of high uncertainty in the eastern tropical Pacific and Atlantic oceans. The ensemble spread for SSS is also high in the eastern tropical Indian Ocean and the region of the Pacific warm pool, because of large uncertainties in the freshwater flux in these regions.
The longitude–depth profiles of the ensemble spread for temperature along the equator (Fig. 2b) indicates that the largest temperature errors, according to the ensemble, fall at depths where the thermocline varies most significantly. This ranges from about 50 m in the eastern Pacific and Atlantic basins, and penetrates to about 150-m depth in the center of each basin. By contrast the ensemble suggests that the subsurface temperature error is expected to have a maximum at 100-m depth across the entire tropical Indian Ocean. These characteristics are consistent with known dynamics of these basins. Specifically, the equatorial oceans are known to respond quickly, and with large amplitude, to variations in surface fluxes, particularly to anomalies in zonal wind stress (McCreary 1976). The subsurface maximum in the equatorial Indian Ocean is due to the subsurface influence of equatorial Kelvin waves and their coastal extension along Indonesia, which are also known to be correlated with surface forcing (e.g., Sprintall et al. 2000).
The subsurface structure of the ensemble spread for salinity (Fig. 2d) is dominated by a near-surface maximum in the region of the warm pool in the western tropical Pacific. Maes et al. (2006) highlighted the importance of upper-ocean variability in controlling ocean–atmosphere interactions in the western Pacific. The uncertainty for surface salinity is also high in the eastern tropical Indian Ocean, an area with large variations in precipitation. Other local maxima in the ensemble spread of subsurface salinity are in the eastern equatorial Pacific and Atlantic, and are associated with variations of the ocean’s halocline in response to surface forcing through the same mechanism as that of temperature, discussed above.
An example of the ensemble-based correlation between temperature at a reference location along the equator (at 125°W and 100-m depth) and temperature, salinity, and zonal velocity is presented in Fig. 3. These correlations are used by PEODAS (after applying the localization function) to map background innovations of temperature onto the model state. Where the magnitude of the correlation is higher for different variables, the projection of the background innovation for temperature is strong. Along the equator in this region, there is typically a subsurface salinity maximum centered about 160-m depth. This helps us understand the structure of the ensemble-based correlations with salinity in Fig. 3b. A temperature increase (decrease) at the reference location is likely to correspond to a deepening (shoaling) of the thermocline and halocline. In the absence of anomalous surface fluxes that may modify the water mass properties, this deepening (shoaling) of a salinity profile will result in a salinity decrease (increase) above the salinity maximum, where the vertical salinity gradient is positive (the axis z is increasing downward), and a salinity increase (decrease) below the subsurface salinity maximum, where the vertical salinity gradient is negative. These characteristics are evident in the fields presented in Fig. 3b. The correlation between temperature at the reference location and zonal velocity indicates that a deepening (shoaling) of the thermocline will result in an eastward (westward) anomaly in the zonal currents in the vicinity of the observation and to its east (Fig. 3c), mainly because the temperature and velocity errors are both associated with wind stress errors.
More generally, the example of the ensemble-based correlation between temperature and other ocean variables, shown in Fig. 3, highlights the multivariate, and anisotropic nature of the ensemble-based estimates of the background error covariance generated and used by PEODAS.
Another example of the ensemble-based covariance structures generated by PEODAS for different times is presented in Fig. 4. The examples in Fig. 4 include the ensemble-based correlation between temperature from a reference location, at the surface along the equator, and temperature in the surrounding region. The first example (Fig. 4a) is during an El Niño event when the thermocline along the equator is flat and deep. The second example (Fig. 4b) is during a La Niña event when the thermocline slopes upward toward the east and is shallow. These correlation fields are consistent with the dynamics of the thermocline being heaved (raised or lowered) in response to variations in the wind stress and surface heat flux. For a positive background innovation—warming and/or deepening the surface mixed layer—the ensemble-based projection onto the model state includes a temperature increase within the mixed layer and a temperature decrease below the mixed layer. The examples in Fig. 4 clearly demonstrate the state dependence of the ensemble-based estimates of the background error covariance. The state dependence depicted in Fig. 4 is exactly what we expect from our conceptual understanding of the underlying dynamics of this region.
b. Assimilation increments
The time-mean and root-mean-square (RMS) of the increments from the 27-yr PEODAS reanalysis described in section 3 are shown for maps of temperature, salinity, and zonal velocity in the tropical Pacific, averaged over the top 300-m depth, in Fig. 5, and for a zonal section of subsurface temperature, salinity, and zonal velocity along the equator in Fig. 6.
The first thing to note from the fields in Fig. 5 and Fig. 6 is that the mean increments are small compared to the RMS of the increments. The ratio of bias to total increment is up to 40% for the temperature field in the equatorial thermocline. This means that the increments are not dominated by model bias. The mean increments for temperature and salinity (Fig. 5) show small, but spatially coherent positive–negative bands in the Northern Hemisphere western boundary currents. In the 27-yr reanalysis these mean increments act to shift these warm boundary currents offshore at higher latitudes, thereby shifting their separation toward lower latitudes. That is, the increments “try” to make the modeled western boundary currents separate from the coast sooner while the model apparently “wants” the boundary currents to “hug” the coast for longer. This is a common problem with coarse-resolution ocean models (Dengg et al. 1996; Chassignet and Marshall 2008). The only spatially coherent structure in the mean increments for zonal velocity (Fig. 5c) is a negative increment in the equatorial Pacific Ocean. This indicates that PEODAS is “trying” to strengthen (weaken) the model’s westward (eastward) velocities in this region.
The subsurface fields of the mean increments (Fig. 6) are not particularly coherent in space, with the exception of a cooling in the central Pacific between 50- and 200-m depth. This corresponds to a shoaling of the thermocline in that region. The assimilation acts to tighten the thermocline, so that the temperature gradient is sharper with data assimilation. This type of model bias is a common problem for most coarse-resolution ocean models (Stockdale et al. 1993; Bell et al. 2004). Details of the bias are addressed below in section 4c. The pattern of salinity and zonal velocity increments in this region is consistent with the error correlation structure shown in Fig. 3. The patterns of the multivariate increments demonstrate, in a qualitative sense, how dynamical balance is maintained by the equatorial zonal velocity increments for PEODAS. Specifically, the strong cooling and weak freshening increments in the central Pacific can result in positive density changes that affect changes in hydrostatic pressure, and hence changes to the east–west pressure gradient that can cause a dynamical imbalance and drive a spurious current. The zonal current increments in PEODAS oppose the unbalanced pressure gradient and therefore counteract the spurious current and hence the dynamical balance is maintained to some extent. Further calculations (not shown) reveal that along the equator the magnitude of the zonal current increment exceeds that of a simple geostrophical adjustment, which means that, in the presence of model bias, geostrophic balanced increments, as imposed in POAMA-1, are not sufficient to maintain dynamical balance near the equator.
The RMS of the increments (Fig. 5) indicates that the largest magnitude adjustments for temperature and salinity are in the western boundary currents, again presumably adjusting the positions of the boundary currents. There are also significant increments in the central tropical Pacific basin and near the equator in the Atlantic Ocean. The RMS of increments for temperature in the Indian Ocean show maxima that are located 5°–12° from the equator, corresponding to the typical latitude of seasonal Rossby waves (Masumoto and Meyers 1998; Schouten et al. 2002). The RMS of the increments for zonal velocity is strongest along the equator, where it is up to 1.8 cm s−1.
The RMS of the subsurface increments along the equator (Fig. 6) indicates that the dominant adjustments to temperature are associated with shifts and adjustments to the thermocline. The increments to salinity are greatest near the surface and in the eastern and western Pacific.
Comparison of the distribution of the RMS of temperature and salinity increments along the equator in the Pacific (Figs. 6d,e) and the corresponding fields for the ensemble spread (Figs. 2b,d) are generally similar in structure and magnitude—the increments for salinity are smaller than the time-averaged ensemble spread. This is expected, because the ensemble spread represents the assumed background error. Regions of larger background error require greater adjustments through assimilation. This is clearly seen in the comparison of Fig. 6 and Fig. 2. There are, however, differences in the structure of the spread/increment of the temperature fields in the thermocline. In both the Pacific and the Atlantic, the maximum spread occurs close to the eastern coast (Fig. 2b), while the maximum RMS of the increment occurs more in the ocean interior (140°W in the Pacific and close to the western coast in the Atlantic)—this may be because of the presence of the systematic bias in these locations (Fig. 6a).
The RMS of the increments to zonal velocity along the equatorial Pacific (Fig. 6f) indicate that the assimilation acts to adjust the surface velocities associated with the wind-driven equatorial current, and at depth, presumably modifying the strength and position of the equatorial undercurrent.
c. Systematic error and the degree of dynamical balance
The ensemble-based multivariate covariances generated by an EnKF could theoretically yield fully dynamically balanced increments, providing the ensemble size is large enough. However, as discussed in section 2, approximations introduced in PEODAS, such as the use of reduced-space methods and localization, could compromise the dynamical consistency. Anderson (2003) gives an account of the degree of dynamical consistency of analyses generated using localized covariances. To minimize the dynamical inconsistency, we use relatively long localizing length scales, a strategy recommended by Oke et al. (2007).
Given that an unbalanced data assimilation method can cause spurious vertical circulations and systematic errors that can make model bias even worse, the degree of balance in the analysis increments can therefore be diagnosed by comparing the temperature bias (shown in Figs. 7a–c) and the vertical velocity (shown in Figs. 7d–f) in the control and assimilations.
The control (Fig. 7a) shows substantial deviations of up to 2.5°C at the depth of thermocline, even though the subsurface fields are relaxed toward the WOA2001 climatology. It is warmer than WOA2001 in the eastern Pacific and in the Atlantic Ocean, and colder in the western Pacific. This systematic bias is linked to the overly diffuse nature of the modeled thermocline, probably because of limitations of the ocean model. Such biases are substantially reduced in both POAMA-1 (Fig. 7b) and PEODAS (Fig. 7c) as a result of assimilation. Below the thermocline, however, significant differences, with respect to WOA2001, appear in POAMA-1. POAMA-1 is more than 1.0°C warmer than WOA2001 at around 700 m in the eastern Pacific, and more than 1.5°C warmer than WOA2001 around 800 m in the Atlantic Ocean, similar to Troccoli et al. (2002). These spurious differences are not present in either the control or PEODAS. This demonstrates that the univariate assimilation scheme in POAMA-1 is the likely cause of the dynamical imbalance that results in a bias in temperature. By contrast, the ensemble-based multivariate increments in PEODAS are generally dynamically balanced. The mean vertical velocity field along the equator is large, around 200-m depth, in POAMA-1 (Fig. 7e), compared to that of the control (Fig. 7d). We attribute this difference to the degradation of the zonal velocities in POAMA-1 due to dynamical imbalance. PEODAS shows a similar increase in vertical velocity (Fig. 7f), but with a much smaller magnitude than POAMA-1. The nature of the dynamical balance of each run is further verified by the model–observation comparisons in section 5.
5. Model–data comparisons
a. Comparison with assimilated observations
In this section, we compare observed and modeled temperature and salinity fields for different locations. Maps of the number and spatial distribution of temperature and salinity profiles assimilated by PEODAS and used for the comparisons that follow are shown in Fig. 8. The XBT tracks and the TAO/TRITON/PIRATA moorings in tropical oceans are easily identified in Fig. 8. While the uneven and nonstationary nature of the observing system is highlighted (see Fig. 8 and Figs. 10a,d), it also shows that the observation coverage is now almost global due to the deployment of Argo floats in recent years. Recall that PEODAS assimilates both temperature and salinity profiles (Figs. 8a,b), while POAMA-1 only assimilates temperature (Fig. 8a).
The RMS difference (RMSD) between observed and modeled subsurface temperature and salinity in different regions for the entire 27-year reanalysis period is shown in Fig. 9. Specifically, we show the RMSD between the observed temperature and salinity and the control run [with no data assimilation (O-C)], the PEODAS background field [immediately before assimilation (O-B)] and the PEODAS analysis field [immediately after assimilation (O-A)]. We also show the profile of the time-averaged ensemble spread. The profiles presented in Fig. 9 are produced by interpolating each model product to the observation location in time and space, and then averaged all profiles in different regions. The regions shown include NIÑO-3 (5°S–5°N, 150°–90°W), EQ3 (5°S–5°N, 150°E–170°W), EQIND (5°S–5°N, 40°–120°E), and EQATL (5°S–5°N, 70°W–30°E). We focus largely on the tropical oceans since it is variability in these regions that is associated with the main modes of climate variability on interannual time scales, such as El Niño and the Indian Ocean dipole.
The RMSD profiles in Fig. 9 show that the control run has larger errors than PEODAS for all regions for both temperature and salinity. Similarly, the PEODAS background fields have larger errors than the PEODAS analysis fields. This is exactly as expected.
The difference between the RMSD profiles of the control and the PEODAS product is significant for both temperature and salinity. This difference is as large as 1.0°C and 0.1 psu at some locations. Note that the mean increments for temperature and salinity in the regions considered here (Fig. 5) are much smaller than these RMSDs. We ascribe most of this improvement to a better representation of the variability of the ocean rather than improvements to the mean fields.
The variance of the Eulerian temperature is greatest at the depth of the thermocline, because of the vertical excursion of isotherms in response to external forcing and wave propagation. As a result, the profile of RMSD for temperature shows a maximum about the thermocline depth (Fig. 9). The RMSD of O-A for temperature about the thermocline depth is around 1°C for all regions considered, and the corresponding RMSD of O-B is typically 0.3°C greater than that of the analysis. This indicates that, if the observation error is assumed to be zero, the error of model temperature near the thermocline typically grows by about 0.3°C over 3 days between each analysis.
The variance of the salinity is greatest at the surface, owing to the impact and uncertainty of the surface fluxes, particularly the freshwater flux. The RMSD of O-A for salinity is 0.12–0.2 psu at the surface, and the corresponding RMSD of O-B is about 0.04–0.06 psu greater than that of the analysis (Fig. 9).
In the deeper ocean, below about 300 m, the RMSD for both temperature and salinity decreases (Fig. 9), owing to the reduced variability there. For those depths the ensemble spread is very small. Moreover, at those depths there is also little difference between the quality of the analysis and background fields.
As noted above and in section 2 the ensemble spread is intended to represent the background error. Clearly from Fig. 9, the ensemble spread is much smaller than the RMSD between the background and observations for both temperature and salinity. This reflects two factors:
a consistency with the signal to noise ratio of 0.47 that was enforced; that is, the RMSD between the background and observations is dominated by observation error (presumably mostly representation error) rather than model error; and
the spread may be too low in places because it mainly represents errors associated with the intraseasonal component of forcing errors.
There is a classical relationship between the expectation of the innovation covariance matrix and the background and observation errors [Eq. (11)]. This relationship assumes that the background and observation errors are mutually uncorrelated, and that their covariance matrices are good approximations to the true error covariance matrices:
This equation is often used as a consistency test on the prescribed error statistics in a practical data assimilation system (e.g., Talagrand 1999; Desroziers and Ivanov 2001; Oke et al. 2002; Evensen 2003). Taking into account that the ratio of background error to observation error was enforced in PEODAS, if this statistical consistency is satisfied then the ensemble spread should be roughly 0.4 times the RMSD of O-B. From this point of view, the ensemble spread is quite a bit smaller than that value, except for the surface temperature where this relationship is consistent. This is a common issue in ensemble data assimilation applications involving the assimilation of actual observations into ocean or atmospheric models, since models are imperfect. To maintain the approximate equality of Eq. (11) some researches try to inflate the background error covariance by adjusting the amplitude of the perturbation (e.g., Keppenne et al. 2005). We find that the PEODAS system is almost unbiased for SST, so the relationship of Eq. (11) is satisfied very well for SST. This means that the PEODAS ensemble spread is appropriate for equatorial surface temperature, and we find no evidence indicating that the ensemble spread is underestimated for temperature or salinity due to the existence of the bias.
The structure (or shape) of the vertical profiles of the ensemble spread nicely resembles the vertical profiles of the RMSD fields for each region. The subsurface maximum in the RMSD profiles for temperature is evident in the ensemble spread, with the position of the maximum at about the correct depth. Similarly, the ensemble spread for salinity shows a maximum at the surface that is consistent with the RMSD profiles. However, we note that the ensemble spread for salinity does not capture the subsurface maxima that are apparent in the O-B or O-A statistics. It is found that there are subsurface bias maxima in the depth from the vertical profiles of the mean O-B or O-A fields (not shown) for each region. We note that the ensemble spread generated by PEODAS is used to estimate the unbiased background error. This is also the case for temperature in the Atlantic Ocean at the depth of around 300 m. Such a bias is strongest for the control. By contrast, PEODAS does a good job for eliminating the bias, although it is not removed entirely. This demonstrates that if the model is improved and the underlying tendency to introduce a bias is eliminated, the PEODAS system will perform better.
The PEODAS, POAMA-1, and control are all model-based reanalyses. Here we compare these reanalyses with a model-independent analysis, using the same observations as PEODAS, called the EN3 objective analysis produced by the ENSEMBLES project (see http://hadobs.metoffice.com/en3/). EN3 objective analyses uses optimal interpolation method with the analysis grid of 1.25° × 1.25° horizontally (meridional spacing decreasing to 0.3° near the equator) and vertical 40 levels. The product grid of EN3 is horizontal 1° × 1°, vertical 31 levels. Using subsurface temperature and salinity observational data (EN3 in situ datasets), the EN3 analysis relaxes toward climatology in the absence of data area. Figure 10 shows monthly time series (filtered with 13-month running mean) of area-averaged temperature and salinity, averaged over the top 300 m, in the western tropical Indian and Pacific Ocean (WTIO and WTPO). These two regions are chosen because they are important regions for the Indian Ocean dipole (Saji et al. 1999) and the Pacific warm pool variability. For this analysis the region for the WTIO spans 10°S–10°N, 50°–70°E and the region for the WTPO spans 10°S–10°N, 150°E–170°W.
The comparisons in Fig. 10 show that the control is generally biased warm and salty. The assimilation does a good job of eliminating biases in temperature. The nature of the variability in the WTIO and the WTPO is very different. For example, the variability in the tropical Pacific is dominated by several large-amplitude events in temperature that are associated with ENSO (Fig. 10e). We note that even the control run with no data assimilation captures these events in the Pacific, because of the surface forcing. By contrast, the temperature variability in the Indian Ocean exhibits more small-amplitude variability and only one or two large-amplitude events during the reanalysis. It is interesting that the salinity time series for POAMA-1 shows large differences with EN3 in both regions for most of the reanalysis (Figs. 10c,f). We attribute this difference to a combination of the neglect of salinity assimilation and the dynamically unbalanced assimilation technique used in POAMA-1, as discussed in sections 4b and 4c. For the WTPO, the POAMA-1 salinity drifts rapidly in the first 5–10 years. The control simulation is significantly better than that for POAMA-1. The smallest differences with EN3 objective analysis is for PEODAS, which shows particularly good agreement for WTPO. We note that there is a distinct change in the performance of the system after about 2001, when the Argo program really became established (Figs. 10a,d). After this period, the PEODAS analysis is in very good agreement with the objective analysis. This may indicate that before Argo, there were simply insufficient salinity observations, particularly in the tropical Indian Ocean, to constrain the modeled salinity at all; and only after Argo became established were there enough salinity observations to properly constrain a dynamical model. Alternatively, it may indicate that there were too few salinity observations for the EN3 analysis product to yield a reliable estimate of salinity.
We present monthly mean time series of the thermocline depth for different locations along the equator from TAO observations (McPhaden 1995; data delivery online at http://www.pmel.noaa.gov/tao/) and from different reanalyses in Fig. 11. Here the thermocline is defined as the depth of the 20°C isotherm. Although, PEODAS does not explicitly assimilate observations of the thermocline depth, it does assimilate temperature observations from which the thermocline depth can be directly derived. Figure 11 shows that the variability in the control run is generally in good agreement with the observed estimate at the selected TAO moorings. The correlations between the observed and modeled thermocline depth from the control run is over 0.89. We ascribe this high correlation to the strong dependence of the thermocline depth on surface fluxes; and since for this integration of the model the surface fluxes are derived from atmospheric reanalyses, the surface fluxes are realistic, and so the thermocline depth is realistically reproduced. However, the RMSD between the observed and modeled thermocline depth in the control run remains quite large, ranging from 8.9 to 20.5 m. This is comparable to the magnitude of the variability itself, and should be improved by assimilation. This means that even though the control run reliably represents the timing, or phase, of the thermocline variations, it does not accurately represent the magnitude of variations.
The time series of thermocline depth from the PEODAS analysis and background show a quantitative improvement compared to the control (Fig. 11). The PEODAS analysis tracks the observed thermocline depth almost perfectly, with correlations of 0.97–0.98 and RMSDs of 5–6.9 m. By contrast, the POAMA-1 fields have correlations of 0.89–0.96, indicating that POAMA-1 tracks the phase of the variability and the timing of events almost as well as PEODAS, but it does not reproduce the magnitude of variations as accurately as PEODAS. More specifically, the RMSDs for the POAMA-1 thermocline depth range from 5.9 to 13.9 m, which is significantly more than that of PEODAS, particularly in the central and eastern Pacific, even though the same signal-to-noise ratio was used.
b. Comparisons with independent observations
We now turn to a series of comparisons with unassimilated observations, namely velocity and sea level anomaly observations. Here, we include comparisons with PEODAS, POAMA-1, and the control.
Comparisons with zonal velocity profiles measured by TAO acoustic Doppler current profiler (ADCP) are shown in Fig. 12. This includes profiles of RMSD and correlation for different longitudes along the equator at locations of TAO moorings. Generally, PEODAS shows smaller RMSD and larger correlation compared to the other reanalyses. Of all the reanalyses considered, POAMA-1 tends to return the worst performance and PEODAS is generally the best. We think that this is because PEODAS also assimilates salinity and because the PEODAS-based increments are more dynamically consistent than the POAMA-1 based increments, due to the multivariate nature of PEODAS, using temperature and salinity observations to construct an explicit increment for velocity. Our argument about the dynamical consistency is simply that an ensemble-based system (i.e., PEODAS), as mentioned in section 2, produces assimilation increments that are consistent with the model dynamics. By contrast, this is not necessarily the case for analysis increments derived from a univariate OI scheme, even when geostrophic increments are included (i.e., POAMA-1).
We compare observed and modeled surface velocity in the tropical oceans in Fig. 13. Here, we use the surface zonal velocity estimates from the Ocean Surface Current Analyses—Real time (OSCAR) database (Bonjean and Lagerloef 2002). Briefly, OSCAR velocities are derived from satellite altimeter and scatterometer data from the end of 1992 up to near real time and cover the whole ocean from 60°S to 60°N. Comparisons between OSCAR and data from the World Wide Drifter Buoy Deployment as well as from TAO/TRITON/PIRATA mooring data indicate that the OSCAR methodology performs well at reproducing both the mean and time-varying structures of the surface circulation (data and comparisons online at http://www.oscar.noaa.gov/). The comparisons in Fig. 13 show correlations between zonal velocities from OSCAR and from different reanalyses. Specifically, we compare the observed estimates for zonal velocity from OSCAR, with the modeled surface zonal velocities from PEODAS, POAMA-1, and the control. Of these products, PEODAS has the highest correlations. Note that the regions of greater than 0.8 correlation in the Pacific and Indian oceans are more expansive for PEODAS than the others. It is also interesting to note that along the equator POAMA-1 has relatively low correlations compared with both PEODAS and control. Also throughout the tropical Indian Ocean POAMA-1 has the lowest correlation. As discussed in section 4b, this is due to both the lack salinity data and temperature only assimilation generating dynamical instabilities, and the inadequacy of the geostrophic increments for the surface current used in POAMA-1.
We compare observed and modeled sea level anomaly (SLA) in Fig. 14, using weekly maps of SLA that are produced by Archiving, Validation, and Interpretation of Satellite Oceanographic data (AVISO). AVISO SLA maps are generated by combining along-track SLA observations from all altimetric missions using OI. The details of the OI mapping are described by Ducet et al. (2000). Briefly, the length scales used for the OI range from 100 km in the zonal and meridional directions at 60°N–60°S, to 250 (350) km in the meridional (zonal) direction at the equator. Maps are produced on a global ⅓° Mercator grid with 18.5-km resolution at 60°N–60°S and 37-km resolution at the equator. A low-resolution (1° × 1°) version of weekly merged SLA product is used for comparisons here. The data are processed by linear interpolating the weekly data to daily from which the SLA monthly means are then calculated. Note that our model is a rigid-lid model. We therefore diagnose the model SLA by inverting the modeled surface pressure field and removing a long-term mean.
The comparison in Fig. 14 indicates that PEODAS produces fields that are in better agreement with observed SLA than both POAMA-1 and the control in the tropical Pacific Ocean. Consistent with the assessment of surface velocity in Fig. 13, the SLA comparisons show that PEODAS has a broader region of greater than 0.95 correlation in the tropical Pacific Ocean. It is again interesting that the POAMA-1 correlations are similar to those obtained using the control reanalysis, whereas the PEODAS correlations are superior. This indicates that PEODAS is a significant improvement of both control and POAMA-1.
There are, however, some areas, such as in the tropical Indian, Atlantic, and southeast Pacific oceans where the PEODAS correlations are a bit smaller than the control run. This is likely caused by the spurious trends and signals in salinity and/or temperature due to the nonstationary nature of the observing system in the presence of model bias (e.g., Fig. 10). This side effect of data assimilation is a common problem for most ocean data assimilation systems because of changing observing systems in the presence of model bias. To deal with such a problem, Balmaseda et al. (2007) introduced a unique bias correction method by having an a priori estimate of the bias term. They found that poorly observed regions such as the equatorial Indian Ocean are better represented by using this method. This bias correction scheme is under considerations for improving PEODAS in subsequent versions.
6. Summary and conclusions
In this paper, we provide a comprehensive description of PEODAS, a new Australian ocean data assimilation system for seasonal forecasting. PEODAS is an ensemble system that includes explicit estimates of state-dependent background error covariance; involves dynamically consistent, multivariate model updates; and explicitly involves ensemble prediction.
The primary purpose for developing PEODAS is for the delivery of operational seasonal forecasts for the benefit of the Australian and international communities. Some aspects of the design of PEODAS are based on what we regard as sound hypotheses about the system’s errors. For example, the method for ensemble generation that involve explicit perturbations to the surface forcing fields is based on an expectation that the dominant source of error in the system, particularly in the tropics, is errors in the surface fluxes. Other aspects of the design of PEODAS are more pragmatic and driven by a need for computational efficiency. For example, the compression of the ensemble after each assimilation cycle is somewhat ad hoc. We regard it as a computationally efficient ensemble transformation; and while we recognize it as suboptimal, we consider it to be a practical solution that is appropriate given the limitations in computational resources available. The additive covariance inflation is also regarded as a simple way to represent model error in the ocean interior that is not dependent on surface forcing error. We also recognize that fixing the ratio of observation to background error variance does not fully exploit the benefits of the time-varying ensemble spread, but we have retained this feature of the assimilation to facilitate comparison with POAMA-1 and to avoid overfitting observations.
Based on results from a 27-year reanalysis, we show that PEODAS outperforms POAMA-1, Australia’s current operational ocean analysis system. We compare reanalyses from PEODAS and POAMA-1 to both assimilated and independent observations. These include comparisons with in situ temperature, salinity, and velocity observations, and an objective analysis product, as well as comparisons with satellite-derived observations. On almost all metrics, PEODAS outperforms POAMA-1. We also include comparisons with a control run with no data assimilation. PEODAS outperforms the control run in almost all aspects. The main exception is for sea level in the Indian Ocean, where significant changes to the observing system lead to a nonstationary bias and spurious time variability. It is interesting that while POAMA-1 outperforms the control when assessing temperature based quantities, as expected, quantities that are not assimilated, such as salinity and currents, are comparable or inferior to the control. This shows that the lack of salinity assimilation and the univariate nature of POAMA-1 covariances led to dynamical inconsistencies. This is not the case for PEODAS, where the multivariate structure of the covariances were shown to be dynamically consistent and the reanalysis is superior to the control when compared to independent data.
The comparisons of salinity analyses in equatorial regions indicate that while salinity is reasonably well-constrained in the Pacific Ocean for the entire reanalysis period, it is virtually unconstrained in the Indian Ocean until the Argo program became well established, after 2001. After 2001, when the number of salinity observations increases notably, the reanalyzed salinity fields in the Indian Ocean suddenly become well-constrained, with close agreement with objective analysis, which is based on observational data and climatology.
The goal of PEODAS is to facilitate skilful seasonal forecasts of the ocean and atmosphere over time scales of several months. This paper provides an initial assessment of PEODAS, focused on the assessment of PEODAS-based reanalyses. The next step is to evaluate the performance of PEODAS-based long-range forecasts, and to compare the skill of those forecasts to that of POAMA-1. Compared to its predecessor, PEODAS is a step toward more optimal data assimilation. Several aspects require more tuning and refinement, and this will be done in subsequent versions. Based on the results presented in this paper, we expect that PEODAS will prove to be a valuable tool for the seasonal prediction community, and for the community that it hopes to serve.
The authors gratefully acknowledge F. Tseitkin for technical support; and D. Anderson for constructive and fruitful discussions that led to improvements in the design of PEODAS. Comments from three anonymous reviewers helped to improve the presentation of this paper. This research is funded by the Australian Bureau of Meteorology and Australia’s CSIRO through appropriation funding, and by the U.S. Office of Naval Research Ocean Modeling Program and Western Australia Marine Science Institution. NCEP reanalysis data were provided by NOAA/OAR/ESRL PSD, Boulder, Colorado, from their Web site (http://www.esrl.noaa.gov/psd/).
Corresponding author address: Yonghong Yin, Centre for Australian Weather and Climate Research, Bureau of Meteorology, GPO Box 1289, Melbourne, VIC 3001, Australia. Email: firstname.lastname@example.org