## Abstract

A simple, versatile, computationally efficient ensemble-based method for objectively designing an observation array is described. The method seeks to compute the observation array that minimizes the analysis error variance, according to Kalman filter theory. While most elements of the method have been described elsewhere, this paper attempts to present a simple, yet comprehensive, recipe for array design based on an ensemble of anomalies that represents the background error covariance. The versatility of the method is demonstrated through a series of applications to the tropical Indian Ocean (TIO). The first application uses model-generated fields of high-pass-filtered mixed layer depth to design an array to monitor intraseasonal variability. The second uses gridded observations of sea level anomaly to design an array to monitor intraseasonal-to-interannual variability. For both applications, the objectively designed arrays are compared to an array that will soon be implemented under the auspices of the Climate Variability and Predictability–Global Ocean Observing System (CLIVAR–GOOS) Indian Ocean Panel (CG-IOP). The authors conclude that the CG-IOP array produces results that compare well to the objectively designed arrays for intraseasonal variability, and observations to the east and northeast of the TIO and south of India are most important for resolving intraseasonal variability. The authors also find that observations near 9°S, where seasonal Rossby waves dominate, are important for observing seasonal-to-interannual variability. The described method for objective array design can be applied to a wide range of geophysical applications where time series of gridded modeled or observed fields are available.

## 1. Introduction

The use of models and data assimilation theory to aid in the design of atmospheric and oceanic observing systems has gained momentum over recent decades (e.g., Kållberg 1984; McIntosh 1987; Barth and Wunsch 1990; Kuo et al. 1998; Hackert et al. 1998; Bishop et al. 2001; Hirschi et al. 2003; Schiller et al. 2004; Oke and Schiller 2007b). The applications described in this study are motivated by the proposal of the Climate Variability and Predictability–Global Ocean Observing System (CLIVAR–GOOS) Indian Ocean Panel (CG-IOP; CLIVAR et al. 2006) to implement a surface mooring array in the tropical Indian Ocean (TIO) (CG-IOP array) that is similar to the Tropical Ocean and Global Atmosphere–Tropical Atmosphere Ocean (TOGA–TAO) array in the tropical Pacific Ocean. This initiative has resulted in a series of model-based studies that have attempted to assess the CG-IOP array and identify possible improvements. Results from these studies are varied. For example, Vecchi and Harrison (2007) presented results from a series of observing system simulation experiments (OSSEs) using a high-resolution ocean model and assessed the ability of an integrated observing system, including Argo observations, XBT lines, and the CG-IOP array to monitor intraseasonal and interannual variability. Ballabrera-Poy et al. (2007) used a reduced-order Kalman filter to objectively determine an array for mapping sea surface height and sea surface temperature in the TIO. Oke and Schiller (2007a) use an EOF-based approach to assess the CG-IOP array’s ability to monitor intraseasonal and interannual variability. Vecchi and Harrison (2007) conclude that in conjunction with the integrated observing system, the CG-IOP array should be capable of resolving intraseasonal and interannual variability. Both Ballabrera-Poy et al. (2007) and Oke and Schiller (2007a) argue that the CG-IOP array may oversample the region within a few degrees of the equator. These studies also suggest that key regions for monitoring seasonal-to-interannual variability are south of 8°S, at about 4°–5° from the equator and along the coast of Indonesia. These regions correspond to the locations of the maximum amplitude of seasonal Rossby waves (Masumoto and Meyers 1998; Schouten et al. 2002), equatorial Rossby waves, and strong Indian Ocean dipole events (Murtugudde et al. 2000), respectively.

In this study, we describe a simple, versatile, computationally efficient ensemble-based method for objectively designing an observation array, and consider two applications of the method to the TIO. The first application uses model-generated fields of high-pass-filtered mixed layer depth to design an array to monitor intraseasonal variability. The second application uses gridded observations of sea level anomaly to design an array to monitor intraseasonal-to-interannual variability.

Ensemble-based methods for optimal array design have been used intensively in the last few years (e.g., Bishop et al. 2001). These methods are based on ensemble square root filter theory (e.g., Tippett et al. 2003) and allow one to handle large systems in cases when explicit manipulation of the background error covariance matrix is not possible or not feasible. Most of the studies on the ensemble-based optimal array design consider the problem of *targeted observations,* or adaptive sampling, aimed at improving the model’s forecast at a given time (e.g., Bishop et al. 2001; Langland 2005; Khare and Anderson 2006). The problem of targeted observations implies the use of an ensemble of model states representing the background error covariance at a particular time, which in itself often poses a formidable task. Consequently, the advantages of the ensemble-based optimal array design methods may not have been fully exploited by the wider atmospheric and oceanographic scientific communities.

In this study, we consider a problem of optimal array design based on an ensemble of system states that is formed from output from a long model run or from time series of observation-based gridded fields. Because such an ensemble represents the time-averaged statistics of the system, the described method is most suitable for designing a stationary array that is intended to be deployed over a long period of time. Furthermore, the method does not take advantage of possible time correlations because it treats the ensemble as a nonordered collection of system states that adequately approximates the background error covariance.

The main steps in the objective design of an observation array, as described in this manuscript, are represented schematically in Fig. 1. Suppose an array of *p* observations is to be designed. The method starts with the construction of a representative ensemble. The first optimal observation is then calculated, and the ensemble is updated so that the influence of this observation is reflected (i.e., the background error variance of the ensemble is reduced). The next optimal observation is then calculated, the ensemble is updated again, and so on, until all *p* observations are calculated.

Equation (3.3) of Berliner et al. (1999) provides a means for identifying the optimal observation for the simplest case where the observation is of the same field that is to be analyzed by the assimilation system. In this paper, we express this formula in terms of ensemble data assimilation, so that an ensemble of anomalies can be used instead of the background error covariance matrix. We also extend their formulation to facilitate the design of multisensor observation networks. An example of a multisensor observation network is a mooring at a certain location that measures more than one variable and/or has multiple instruments at different depths. The ensemble update is conducted using the ensemble square root filter, so that the updated, or transformed, ensemble reflects the reduced background error variance, given information from the optimal observation, or observations, identified in step 2 above. The computational cost of the described method is shown to be linear with respect to the system state dimension, which makes it scalable for application to large systems. Further, by augmenting the state vector and/or rescaling its components, one can design an observing array that best constrains an arbitrary function of the state vector. We demonstrate the versatility of the method through a series of applications that are intended to resolve different time scales of variability, and that use ensembles from different sources (i.e., model and observation based).

## 2. Method

Suppose we have a discrete system that is characterized by a state vector **x**^{n}^{×1} and let us assume that the uncertainty of **x** is characterized by a background error covariance matrix 𝗣* ^{b}*. Suppose that after assimilation of a set of observations, the uncertainty of the analyzed system is characterized by the analysis error covariance matrix 𝗣

*. In this case, the optimal set of observations characterized by the observation matrix 𝗛*

^{a}^{opt}may be defined as a set that minimizes some norm of 𝗣

*:*

^{a}where {𝗛} denotes the set of all possible observations.

This approach can be easily implemented using the Kalman filter theory, according to which after assimilating *p* observations characterized by the observation matrix 𝗛^{p}^{×}* ^{n}* and error covariance matrix 𝗥

^{p}^{×}

*the analysis error covariance is given by*

^{p}where 𝗜 is the identity matrix and the superscript T denotes a matrix transpose. We note that the solution given by the Kalman filter is a minimum variance estimation; it does not require the statistics of the model errors and observation errors to be Gaussian; however, the minimum variance estimate coincides with the maximum likelihood estimate if these distributions are Gaussian (e.g., Jazwinski 1970, p. 157). Because the Kalman filter minimizes the trace of the analysis error covariance 𝗣* ^{a}*, we should assume for consistency that it is the trace of the system error covariance that is used as a norm of the uncertainty in the system state, ||𝗣

*|| = trace(𝗣*

^{a}*).*

^{a}where is the observation error covariance associated with the observation matrix 𝗛. Calculation of (3) involves inversion of a *p* × *p* matrix, which is computationally inexpensive for small *p*.

The solution (3) may include multiple observations and multiple sensor elements (e.g., a mooring at a certain location that measures more than one variable and/or has multiple instruments at different depths). In a more simple case when only one measurement of a single element **x*** _{k}* of the state vector is possible at a time, 𝗛 becomes a row vector 𝗛 =

**h**(

*k*) such that

**h**

*(*

_{i}*k*) = {0,

*i*≠

*k*; 1,

*i*=

*k*}, and 𝗥 becomes a scalar,

*R*=

*r*(

*k*). In this case, (3) simplifies to

This solution is equivalent to Eq. (3.3) of Berliner et al. (1999). We note two important features of this solution. First, the location of the optimal observation is given not by the maximum variance **P*** _{kk}* of the observed element

**x**

*, but rather by a certain balance between its variance and its covariance with other elements*

_{k}**P**

*,*

_{ki}*i*≠

*k*. Second, the location of the optimal observation depends on the value of the observation error variance

*r*(

*k*).

Although solution (3) may include multiple independent observations (corresponding to different rows in 𝗛), the number of possible combinations of locations of the array elements in this case increases as a power of the system dimension. In practice that means the set of possible observations 𝗛 becomes so large that cycling through its elements in (3) is too computationally expensive even for systems of moderate size and an array of just two elements. This case of “parallel” optimization should be distinguished from the case of multiple sensor elements, when locations of the individual sensor elements are mutually dependent, and the computational cost remains linear in respect with the system dimension.

After finding the location of the first optimal observation from (3) or (4), and calculating 𝗣* ^{a}* by (2), one may use the analyzed covariance 𝗣

*to find the location of the second optimal observation and so on (Fig. 1). As discussed above, this serial approach is the only feasible option for most applications. The serial assimilation and parallel assimilation of observations with uncorrelated errors result in the same analysis error covariance (Bierman 1977), so that the serial approach can be used to objectively determine an observation array of a given size (Bishop et al. 2001). Similarly, it is common practice in applications of the ensemble Kalman filter for observations to be assimilated either in small batches or one at a time (e.g., Houtekamer and Mitchell 1998; Whitaker and Hamill 2002). For completeness, we include a proof of this equivalence in the appendix. However, we note that this serial approach does not guarantee, nor does it generally yield, the array that corresponds to the global minimum of trace(𝗣*

^{a}*) (F. De Hoog 2007, personal communication). Hereafter, we therefore refer to the arrays deigned using this method as objectively designed arrays, rather than optimal arrays.*

^{a}In practice, the dimension of the state vector ** n** can be very large, making it impossible to explicitly manipulate the

*n*×

*n*covariance matrices. Instead, one may store and manipulate the covariance matrix implicitly via a representative ensemble 𝗔 of the system state anomalies, 𝗔

^{n}^{×}

*= [*

^{m}*δ*

**x**

^{(1)}, . . . ,

*δ*

**x**

^{(m)}], where

*δ*

**x**

^{(i)}=

**x**

^{(i)}−

**x**,

*m*is the ensemble size and the overbar denotes the ensemble average. In this case the error covariance 𝗣 associated with the ensemble 𝗔 is given by

and

respectively, where 𝗔* _{i}* =

**h**(

*i*)𝗔 is the

*i*th row of 𝗔. To maximize the computational effectiveness, one should precalculate the product 𝗔

^{T}𝗔 before cycling over the possible locations, in which case the computational cost of calculating (6) or (7) becomes linear with respect to the system state dimension

*n*and is

*O*(

*nm*

^{2}).

Using this ensemble-based approach, after finding an optimal observation, one needs to update the background ensemble 𝗔* ^{b}* → 𝗔

*in such a way that the analysis covariance calculated by (5) matches that given by (2) (Fig. 1). There are a number of formally equivalent solutions for such an ensemble update that are readily provided by different flavors of the ensemble square root Kalman filter (e.g., Tippett et al. 2003). The best choice among these solutions depends on the dimensions*

^{a}*n*,

*m*, and

*p*of the application. In the context of this paper, when typically

*n*≫

*m*and

*p*= 1 (for a single element array), a computationally effective choice is the solution given by the ensemble transform Kalman filter (Bishop et al. 2001):

in which a relatively expensive operation of calculating the inverse square root of a matrix is performed in the ensemble space of dimension *m*, rather than in the state space of dimension *n*. This calculation itself can be conducted by using an eigenvalue (or a singular value) decomposition of the matrix in square brackets in (9). Because this matrix is symmetric, it can be decomposed as

where 𝗨 is an orthonormal matrix (𝗨𝗨^{T} = 𝗜) and **Λ** is a diagonal matrix. Subsequently, the inverse square root can be calculated as

The computational cost of this step is defined by that of the eigenvalue decomposition during the calculation of the transformation matrix 𝗧, which is *O*(*m*^{3}), and by that of the ensemble transformation 𝗔* ^{b}*𝗧, which is

*O*(

*nm*

^{2}). Because in most applications

*m*≪

*n*, the cost of calculating the ensemble transform 𝗧 is typically much less than that of calculating the optimal observation and performing the ensemble transform itself.

Note that the calculation of a constrained array, as in section 4a, requires multiple ensemble updates for each subset of observations, which may substantially increase the computational cost. For example, assume that we have to find an optimal set of six observations within a two-dimensional gridded field of size 50 × 50, constrained to be located at the same column of the grid (e.g., the same longitude). We may proceed by finding six optimal observations within each column and choosing the set with the minimal trace of the analyzed covariance. In this case, finding the best set will require 6 × 50 = 300 ensemble updates compared to just 6 ensemble updates in the case of independent, or unstructured, observations. It will also require 300 calculations of the optimal location, but each of these calculations will be restricted to cycling through one column (longitude) of the grid and therefore will become about 50 times less expensive than in the nonconstrained case.

Equations (7) and (8) form a theoretical basis for the ensemble-based method of objectively designing observation arrays in which the ensemble is used to factorize the system covariance by means of (5). The two main assumptions for this method are that the error covariance of the system is correctly represented by the ensemble of state vector anomalies, and that observations are assimilated in an optimal way. The statistical properties of the ensemble should depend on the application to which the observing system is being designed. For example, suppose an ensemble Kalman filter system is used for data assimilation. In this case, the background field is the ensemble mean and the ensemble should be formed by the anomalies about the ensemble mean at a given time. Alternatively, suppose the observation array is designed for a monitoring system that will produce gridded fields of the observed quantity, using climatology as the background field (e.g., Smith and Meyers 1996). In this case, the ensemble may be a time series of gridded anomalies generated either from a long model run or from observations. In the remainder of this paper, we restrict our attention to the latter case, where it is appropriate to use a stationary ensemble.

As noted above, the application of a Kalman filter does not necessarily imply that the system is Gaussian. Similarly, there are no restrictions on the statistics of the initial ensemble of anomalies. Its distribution does not have to be Gaussian or even represent the distribution of the model perturbations; also, it does not have to be zero centered. The only requirement to the initial ensemble is to factorize the background error covariance by means of (5).

The described method can be easily generalized to calculate observations targeted for the maximum reduction of variance in an arbitrary quantity **q** (e.g., volume transport, rate of overturning, water mass properties) related to the system state,

This can be achieved by augmenting the state vector to include both **x** and **q**,

and scaling down **x** by setting ɛ so small that the trace of the covariance matrix of the new ensemble will be dominated by the contribution from **q**. Alternatively, instead of scaling, one can replace the product 𝗔^{T}𝗔 in (6) or (7) by 𝗔^{T}_{q}𝗔_{q}, where 𝗔* _{q}* is the ensemble formed only by the anomalies of the elements of interest of the augmented state vector: 𝗔

_{q}= [

*δ*

**q**

^{(1)}, . . . ,

*δ*

**q**

^{(m)}],

*δ*

**q**=

**q**−

**q**.

Equation (8) may be used to calculate the analyzed anomalies for any (not necessarily optimal) set of observations. Together with (5), it provides a basis for an inexpensive comparison of the performance of different observation arrays without performing OSSEs.

In this paper, the performance of objectively derived arrays is compared with the performance of the CG-IOP mooring array for the TIO. The CG-IOP array is structured, with a number of mooring lines at various longitudes across the TIO to facilitate routine servicing. We therefore constrain our objectively designed arrays to have a structure that is similar, although not identical to the CG-IOP array. Namely, we allow 33 locations only to be structured in five meridionally aligned lines of six moorings, one line of three moorings, and to be located within 12° of the equator. To find the “optimal” locations for the first mooring line, we calculate the optimal locations for six sites for all possible lines and choose the line that provides the maximal reduction of trace(𝗣* ^{a}*) and so on.

## 3. Data

We present results for applications to the TIO using model-generated ensembles of high-pass-filtered mixed layer depth to represent intraseasonal variability (representing periods of 3–100 days) and an ensemble of observation-based gridded sea level anomalies (GSLAs) representing intraseasonal-to-interannual variability. The details of the models and observations are outlined below.

### a. Model configurations

The reliability of an objectively designed array from a model-based ensemble depends on the representativeness of the ensemble. If the background error covariance represented in the ensemble is unrealistic, then the objectively designed array will be flawed. One way to assess the robustness of an objectively designed array is to derive multiple arrays from multiple ensembles generated by different models or different model simulations. For this study, we derive and evaluate an objectively designed array using output from three different models, with different resolution, different forcing, and integrated for different time periods. All three models are global configurations of different versions of the Geophysical Fluid Dynamics Laboratory (GFDL) Modular Ocean Model (Pacanowski 1995; Griffies et al. 2004), with the hybrid mixed layer model described by Chen et al. (1994). Specifically, the models used here are the Australian Community Ocean Model (ACOM), versions 2 and 3 (ACOM2 and ACOM3), and the Ocean Forecasting Australian Model (OFAM). The details of these models are summarized below and in Table 1.

#### 1) ACOM2

Details of the configuration of ACOM2 are described by Schiller and Godfrey (2003). ACOM2 is based on version 2 of the Modular Ocean Model (MOM2; Pacanowski 1995), has constant zonal resolution of 2°, enhanced meridional resolution of 1/2° within 8° latitude of the equator that gradually increases to 3/2° toward the poles, and 25 vertical levels. ACOM2 is initially spun up for 20 yr using climatological forcing and is subsequently integrated for 1982–94 using 3-day-averaged wind stress from a blend of National Centers for Environmental Prediction–National Center for Atmospheric Research (NCEP–NCAR) fields (Kalnay et al. 1996) and the Florida State University climatology (Legler et al. 1989). Surface heat and freshwater fluxes are obtained by coupling ACOM2 to the atmospheric boundary layer model (ABLM) described by Kleeman and Power (1995).

#### 2) ACOM3

Details of the configuration of ACOM3 are described by Schiller (2004). ACOM3 is based on version 3 of MOM (MOM3), has a zonal resolution of 1/2°, meridional resolution of 1/3°, and 33 vertical levels. The model is initially spun up for 8 yr using climatological forcing and is subsequently integrated for 1992–2000 using daily *European Remote Sensing Satellite-1/2* (*ERS-1/2*) scatterometer winds, surface heat fluxes (latent, sensible, and longwave) from the ABLM of Kleeman and Power (1995), solar shortwave radiation derived from observed outgoing longwave radiation within 15° of the equator and NCEP elsewhere, and precipitation from Xie and Arkin (1997).

#### 3) OFAM

Details of the configuration of OFAM are described by Oke et al. (2005; 2008). OFAM is based on MOM4 and has zonal resolution of 1/10° between 90°E and 180° that gradually increases to 2° between 60°W and 0°, meridional resolution of 1/10° south of 25°S that gradually increases to 2° north of 45°N, and 47 vertical levels. The model is initially spun up for 3 yr with climatological forcing and is subsequently integrated for 1992–2004 using 6-hourly wind, heat, and freshwater fluxes from the 40-yr European Centre for Medium-Range Weather Forecasts Re-Analysis (ERA-40; Kållberg et al. 2004).

#### 4) Model validation

Results from ACOM2 and ACOM3 have been used to explore intraseasonal variability (Schiller and Godfrey 2003) and upper ocean dynamics (Schiller et al. 1998). These studies have undertaken extensive comparison with observations, including comparisons of time series of subsurface currents in the central Indian Ocean (Schiller and Godfrey 2003) and validation of the model’s surface heat fluxes (Schiller et al. 1998) that represent an assessment of both the ocean’s upper ocean dynamics and the ABLM. Additionally, Schiller (1999) presented a series of comparisons between modeled and observed subsurface temperature along frequently repeated XBT lines in the Indian Ocean. The extensive model validation referred to above was typically very favorable. This demonstrates that these models generally provide a realistic representation of the intraseasonal variability in the TIO.

To date, the validation of OFAM has focused on the Australian region for a 13-yr reanalysis experiment, known as the Bluelink Reanalysis (BRAN), that was based on OFAM (Oke et al. 2005). Oke et al. (2005) showed that BRAN provides a good synoptic representation of the eddy field around Australia, and that the subsurface temperature variability along the equatorial Pacific Ocean is in good agreement with observations from the TOGA–TAO array in both BRAN and a free run of OFAM. More recently, Oke et al. (2008) presented a comprehensive evaluation of the latest BRAN experiment, showing that both BRAN and OFAM compare well to observations.

### b. Remapping

To simplify our analysis, we remap all fields onto the ACOM2 grid. This is achieved by averaging the original fields onto the ACOM2 grid, using a boxcar filter that has a footprint that spans each model grid cell. This remapping has the effect of eliminating the details of the fields that have scales that are not resolvable on the ACOM2 grid. All gridded products therefore become more or less compatible and can be compared directly in the experiments that follow. This remapping also means that the range of possible locations included in the computation of optimal locations, by (7) or (6), are the same for all applications presented here.

One limitation here, for ACOM3, is that the ACOM3 fields are only available on a 1° × 1° grid that has been subsampled (not averaged) from the original grid. As a result, the ACOM3 fields are noisier than they would be if we could remap them by averaging from the original grid.

### c. Observations

Weekly maps of GSLA, on a ⅓° Mercator grid, are used here to represent intraseasonal-to-interannual variability and to demonstrate that the method described in section 2 can be applied as easily to gridded observations, as it can to model-based fields. The GSLA maps used here are produced by Aviso, are freely available using OpenDAP (opendap.aviso. oceanobs.com/thredds/dodsC/), and are generated by combining along-track sea level anomaly observations from all altimetric missions [Ocean Topography Experiment (TOPEX)/Poseidon, *ERS-1/2*, *Jason-1*, and *Envisat*) using optimal interpolation (OI). The details of the OI mapping are described by Ducet et al. (2000) and La Traon et al. (1998). Briefly, the length scales used for the OI range from 100 km in the zonal and meridional directions at 60°N–60°S to 250 (350) km in the meridional (zonal) direction at the equator.

## 4. Results

### a. Model-based array design

We objectively compute an observation array using an ensemble of intraseasonal mixed layer depth (IMLD) from ACOM2, ACOM3, and OFAM and subsample and average fields from ACOM3 and OFAM onto the ACOM2 grid, as described in section 3b. The mixed layer is defined here as the depth over which the potential density decreases by 0.1 kg m^{−3} compared to the surface density and has been high-pass filtered to retain only variability on time scales of less than 100 days. The IMLD has been shown to provide a good representation of intraseasonal variability (e.g., Shinoda and Hendon 1998). We apply the method described in section 2 to objectively design an observation array for the ACOM2, ACOM3, and OFAM ensemble of IMLD and calculate the structured observation array as described in section 2. As mentioned above, to be consistent with the configuration of the CG-IOP array, the mooring locations are restricted to within 12° of the equator, while the system state is defined to include all grid points in the TIO north of 15°S. We use an ensemble of 1463 (12 yr), 976 (8 yr), and 1460 (12 yr) members of three daily IMLD fields from ACOM2, ACOM3, and OFAM, respectively.

The variance of IMLD for all three models is shown in Fig. 2. These variance maps show good qualitative agreement, with the maximum variance in the eastern equatorial Indian Ocean. This region has been identified by observations to be a site where intraseasonal oscillations have their largest amplitude (Webster et al. 2002; Sengupta et al. 2001). The variance in OFAM is about twice as large as in ACOM2 and ACOM3 owing to the finer horizontal resolution, rendering a richer representation of equatorial Kelvin waves and other small-scale phenomena.

Taking into account the likely vertical resolution of the array, and subgrid-scale and daily variability, we assume that the observed IMLD has an error variance of 25 m^{2}, and that observation errors are not correlated in space. The method allows us to not only calculate the observation array objectively, but also to rank the individual sites or mooring lines of the CG-IOP array in terms of their impact on trace(𝗣* ^{a}*). We compute an array for each model and calculate the corresponding final theoretical analysis error variance [diagonal elements of 𝗣

*from (2)], including the CG-IOP array, for the ensembles from all three models. This allows us to cross check the likely performance of each array using statistics from all models. Figure 2 shows maps of the theoretical analysis error variance for each experiment along with the locations and rankings of each array. We have also ranked the locations from the CG-IOP array using each model’s ensemble (Fig. 2, second row). We note that each of the derived arrays have several mooring lines in common. The rankings of the locations all agree that the best mooring line is between 90° and 95°E (locations denoted 1–6 for all four arrays), and that the mooring line south of India is also very important (locations 7–12 for the CG-IOP, ACOM2, and ACOM3, and 13–18 for OFAM). These regions are where the IMLD have the largest variance.*

^{a}The basin-averaged theoretical analysis error variance of IMLD and its percent reduction (given by the observed variance minus the final analysis error variance, divided by the observed variance, multiplied by 100) are presented in Table 2 for each array. This shows that when the ACOM2 ensemble is used to assess the arrays, the ACOM2 array performs best, as we expect, and the CG-IOP array performs worst. Similarly, when the ACOM3 (OFAM) ensemble is used to assess the relative performance of each array, the ACOM3 (OFAM) array performs best, and the CG-IOP array performs worst (Table 2). Again, this is as we expect, indicating that the arrays derived from the models are indeed an improvement on the CG-IOP array, at least in theory. In practice, consideration of the maps of the theoretical analysis error variance in Fig. 2 and the statistics in Table 2 suggest that we should not expect a very significant difference in the ability of the CG-IOP array or any of the objectively designed arrays to monitor variability in IMLD. Indeed, for IMLD, all arrays considered here only reduce the error variance by 19%–32%. This reduction is quite small and is due to the relatively short decorrelation length scales of IMLD. We conclude that the CG-IOP array is probably as good as any array that we can objectively design for resolving IMLD. However, based on our experiments, it is not clear that a mooring array, with only 33 moorings in the TIO, in isolation will resolve intraseasonal variability with the accuracy that may be needed for some applications. This may only be achieved using an integrated observing system, as Vecchi and Harrison (2007) suggest.

In addition to the experiments described above, we have also applied the objective array design method to several different model variables (e.g., upper ocean heat content over the top 400 m). However, we found that the models used here give quite a different representation of these variables. It is not clear whether these differences are due to inadequacies in the models, due to the models simulating different time periods, or that the differences in the model configurations (e.g., resolution) fundamentally change the model’s ability to represent various phenomena (e.g., equatorial waveguide). Regardless of the reasons for the discrepancies, this indicates that we should not base an optimal array design on these particular variables. This highlights one of the limitations of our method; that is, the reliability of the objectively derived array directly depends on the representativeness of the ensemble used. Simply stated, the model-based array design is only appropriate if the model’s representation of the variability is realistic. This suggests that model-based array design should probably always be performed with either a number of different models or with a very well-validated model.

We also found that the upper ocean heat content often has smaller correlation radii compared with the IMLD. We believe that this may be a result of correlation between the temperature in the upper 400 m and the thermocline depth, so that the contribution to the heat content from lowering (or rising) of the temperature is compensated to some degree by the contribution from lowering (or rising) of the thermocline. This possibly indicates that the upper ocean heat content may not be appropriate for characterizing the intraseasonal variability in TIO than IMLD.

We have also performed an experiment with a superensemble, where the anomalies from all three models are combined. We find that the results closely resemble the results from the OFAM experiments (not shown). This is because the variance in OFAM is about twice as large as the variance in ACOM2 and ACOM3. The ensemble members derived from the OFAM ensemble therefore dominate this superensemble.

### b. Observation-based array design

We objectively compute an observation array using a 676-member ensemble of GSLA (i.e., anomalies from a long-term mean) that are derived from altimetric observations as described in section 3c (i.e., 13 yr of weekly Aviso maps). The GSLA are unfiltered and provide a realistic representation of intraseasonal-to-interannual variability. We understand that the CG-IOP mooring array for the TIO is not intended to independently monitor variability on these time scales. Indeed other components of the integrated observing system, such as altimetry and Argo, are more suitable for resolving this variability. However, to demonstrate the versatility of the method, we apply it here to objectively design an observation array based on these maps of GSLA. A map of the variance of GSLA is shown in Fig. 3a. It can be seen that there is strong variability on seasonal to interannual time scales in the boundary currents in the TIO and across the entire basin at around 8°–12°S where seasonal Rossby waves are known to be prevalent (Masumoto and Meyers 1998; Schouten et al. 2002).

In contrast to the previous example, we present both a structured array and an unstructured array, where the array need not conform to some predefined design. We restrict the possible locations of the observations to the TIO (±12°). We assume observations have an error variance of 9 cm^{2} and that the observation errors are uncorrelated in space. The results are shown in Fig. 3. The basin-averaged variance of the GSLA in the TIO is shown in Table 3, along with the basin average of the theoretical analysis error variance and its percent reduction. These statistics show that all three arrays resolve a significant percentage of the observed signal. Table 3 also indicates that the objectively designed arrays outperform the CG-IOP array and that the unstructured array outperforms the structured array, as we expect.

While the objectively designed arrays are arguably an improvement on the CG-IOP array for resolving interannual variability, the increase in the percent variance explained is only 7%–9% over the CG-IOP array. This suggests that, although the CG-IOP array is not designed to resolve interannual variability in isolation from the rest of the global ocean observing system, it is still likely to do quite a good job.

Both the structured and unstructured arrays include many observations in the seasonal Rossby wave band between 6° and 12°S. Interestingly, they both include many observations off Somalia and relatively few observations off Indonesia, while both regions have relatively large variance (Fig. 3a). These results are explained by the correlation maps in Fig. 4, showing the correlations between sea level from a reference location (denoted by the star) and sea level across the entire TIO. Figure 4a shows that sea level offshore of Indonesia is well correlated with sea level along the coast and over a very broad region. The spatial structure of the correlation map shows a dipole structure. This structure is observed in several previous studies (Chambers et al. 1999; Feng et al. 2001; Wijffels and Meyers 2004; Rao and Behera 2005). Also, the footprint of the positively correlated region reflects Rossby–Kelvin wave patterns. This explains why even a single observation offshore of Indonesia can reduce the analysis error variance in an extended region including along the coast.

Figure 4b shows that sea level off Somalia is relatively uncorrelated with sea level across the TIO. The region off Somalia is dominated by mesoscale variability that spawns from the energetic and highly variable boundary currents in this region. While the mesoscale variability in this region is well organized (Schott and McCreary 2001), its variability is apparently somewhat chaotic and is characterized by short decorrelation length scales. This example explains why many observations may be required to reduce the analysis error variance off Somalia.

While we can understand why the method identifies certain regions as optimal locations on dynamical grounds and through analyses like that presented in Fig. 4, the number of observations in the objectively designed arrays in the seasonal Rossby wave band highlights a limitation of the method. This overpopulation of optimal observations near 9°S is partly attributable to the stationary nature of the ensemble used. For example, suppose observations from the array are assimilated into a dynamical model to improve the forecast. We might reasonably expect the model to accurately propagate a seasonal Rossby wave westward, possibly rendering some of the observations in this latitude band redundant. We expect an adaptive sampling application, where an ensemble Kalman filter is used in conjunction with a dynamic model (e.g., an ocean general circulation model) would identify fewer optimal locations in this latitude band.

By computing the reduction in trace(𝗣* ^{a}*) when each observation is “added” to an observation array, we can quantify the relative value of each observation location. An example of this calculation is presented in Fig. 5, showing both the cumulative percentage reduction of the first

*n*observations and the percentage reduction of each individual observation. We present these statistics for the case of the objectively designed unstructured array using GSLA observations, shown in Fig. 3d. In Fig. 5 we show statistics for arrays that use two different criteria in their objective design. The first, denoted min trace(𝗣

*), is the criterion described in section 2. The second, denoted max diag(𝗣*

^{a}*), identifies the “next best” observation (referred to as the optimal observation in Fig. 1) as the location that has the maximum background error variance. We find that the criterion of minimal trace is clearly superior, demonstrating the benefits of including information about the background error covariances, rather than just the background error variances. Indeed, the details of the observation array (not shown) are also quite different for these two criteria. Specifically, the maximum variance criterion yields more observations clustered in areas of large variance and short length scales, namely, in the Somali Current and off Sri Lanka. Interestingly, despite these differences, the overall reduction of trace(𝗣*

^{b}*) is quite similar for both criteria.*

^{a}Figure 5b shows that the criterion of maximum background variance, denoted max diag(𝗣* ^{b}*), is clearly suboptimal. For example, the fifth observation yields a greater reduction in trace(𝗣

*) than any of the first four observations. By contrast, the proposed method [min trace(𝗣*

^{a}*)], while not optimal, is clearly more consistent in this regard, with each additional observation yielding a smaller reduction in the analysis error variance than any of the observations that are already selected.*

^{a}For a given application, the type of analysis presented in Fig. 5 might provide insight into the number of observations required to monitor the variability in a given region. If, for example, the first *n* observations reduce the trace(𝗣* ^{a}*) considerably and all subsequent observations result in only very small further reductions, then it would be clear that only

*n*observations are required. For the example considered here, Fig. 5a shows a quasi-exponential profile in the cumulative percentage reduction of the trace(𝗣

*), indicating that there are diminishing returns for each additional observation, but that there is no obvious cutoff beyond which observations yield insignificant return.*

^{a}## 5. Discussion and conclusions

Through a series of experiments using model-based fields, we objectively design an observation array for monitoring intraseasonal variability in the TIO. To assess the robustness of our model-based arrays, we use fields from three different models. The details of the resulting arrays are similar for all models. Specifically, they all give the highest ranking to the same mooring line in the CG-IOP array at about 90°E. More generally, they all indicate that the key regions to monitor are near the equator to the east, northeast of the TIO, and south of India. Despite the moderately superior performance of the objectively designed arrays, in theory, we conclude that the CG-IOP array is probably as good, in practice, for resolving intraseasonal variability, as any array we can objectively design. This conclusion is a testament to the insight and intuition displayed by the CLIVAR panel in designing the CG-IOP array. We also present results from a series of experiments designed to monitor intraseasonal-to-interannual variability using fields of gridded observations. From these experiments we suggest that the key region to monitor is 6°–12°S, where seasonal Rossby waves are dominant.

To better understand why the method for objective array design “selects” certain regions, we present correlation maps to show the region of influence of different locations. We suggest that this is a very useful tool that could easily be used when an observation array is being designed.

We also demonstrate how our method could be used to estimate the number of observations required to adequately observe the variability in some region. This involves the computation of the relative value of each observation in an observation array that may also prove to be a useful metric for some practical aspects of observation array design.

In these experiments we use a simple method for objectively designing an observation array using ensemble data assimilation theory. The method can be applied to model-generated fields or to time series of gridded observations and allows one to generate either a structured or unstructured observation array. It is flexible and can be used in a range of different geophysical applications for objective array design.

We underline that having an ensemble that correctly represents the background error covariance is crucial for the applicability of the method. Using ensembles formed from output from a long model run or from observation-based gridded fields implies the ergodicity of the system and does not take advantage of possible time correlations. The application for which the method described here is most suitable is where an ensemble optimal interpolation (EnOI) system (e.g., Oke et al. 2002; Evensen 2003; Oke et al. 2005; Oke and Schiller 2007b; Oke et al. 2008) is used to assimilate observations from a fixed observation array (e.g., an array of moorings). In this case, the EnOI system should use the same ensemble of anomalies as that used to objectively calculate the observation array. The system should also use the same background field as is used in constructing the ensemble of anomalies. Depending on the application, this background field may be a constant field (e.g., climatology) or a time-varying field (e.g., seasonal cycle). However, we argue that in many other situations, regardless of the analysis system to be used, the nonstationarity of the system’s statistics may be not overly important and that it may be appropriate to use the time-averaged statistics, represented by an ensemble that is constructed from realizations of the system over a long period of time. In these cases, the described ensemble-based method represents a simple, objective, versatile, and computationally efficient tool to calculate an observation array.

## Acknowledgments

This research is funded by Australia’s CSIRO through appropriation funding and by the U.S. Office of Naval Research Ocean Modeling Program through Grants N000140410345 and N000140711054. The altimeter products were produced by Ssalto/Duacs and distributed by Aviso, with support from CNES.

## REFERENCES

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**.**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**.**

**.**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**.**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

### APPENDIX

#### Equivalence of Serial and Parallel Assimilation

The equivalence of serial and parallel assimilation is demonstrated in a somewhat complicated way by Bierman (1977, p. 72). For completeness here, we present a more straightforward proof of this equivalence.

Suppose that a set 𝒪 of *p* observations characterized by the observation matrix 𝗛^{p}^{×}* ^{n}* and the observation error covariance 𝗥

^{p}^{×}

*can be split in two sets 𝒪*

^{p}_{1}and 𝒪

_{2}containing

*p*

_{1}and

*p*

_{2}observations,

*p*

_{1}+

*p*

_{2}=

*p*, characterized by observation matrices 𝗛

_{1}

^{p1×n}and 𝗛

_{2}

^{p2×n},

and error covariances 𝗥_{1}^{p1×p1} and 𝗥_{1}^{p2×p2}, respectively. Suppose that these two groups of observations have uncorrelated errors, so that 𝗥 is block-diagonal:

For our purpose, it is convenient to use an alternative expression for the analysis error covariance that is equivalent to (2):

For simplicity, we assume that both the background error covariance matrix and observation error covariance matrix are nonsingular. Then, the equivalence of serial and parallel assimilation means that the assimilation of 𝒪 should yield the same covariance as the consecutive assimilation of 𝒪_{1} and 𝒪_{2}:

or

which simplifies to

which is the case due to the block-diagonal structure of 𝗥^{−1}:

## Footnotes

*Corresponding author address:* Peter R. Oke, CSIRO Marine and Atmospheric Research, GPO Box 1538, Hobart, 7001 TAS, Australia. Email: peter.oke@csiro.au