## 1. Introduction

In the field of data assimilation the key to success is knowing the errors in either the data or the computed values as a function of time and spatial location. It is rare that one will know such spatial and time information about the errors in the data, but wavelet analysis can give us such information on the errors of the computed values and hence a mechanism to determine an appropriate weighting between the data and computed values. Furthermore, wavelet analysis can give a very reliable estimate of model variation that often has a correlation with model error. This technique of correlating model error with model variation is used in the method known as optimal interpolation (OI; Ghil 1989).

In order to apply the so-called Kalman filter (Kalman 1960) to meteorological and oceanographic data assimilation, an approximation of the prediction error covariance matrix is necessary; otherwise the filter would require storage of order the square of the model state dimension, *O*(*n*^{2}), and *O*(2*n*) additional integration time steps for the computation of the error evolution. Even with a sophisticated reduction method, initialization of the error covariance matrix remains a difficult task. Initialization of the prediction error is typically done by making an estimate based on, for example, a limited number of empirical orthogonal function (EOF) modes of the model variation (Pham et al. 1998).

In the new approach introduced by Jameson and Waseda (2000), the estimation of the error associated with numerical schemes was diagnosed by wavelet analysis and the detected error was used as an indicator of the prediction error. Therefore, it was not necessary to solve the evolution of the prediction error. Such a method can easily be implemented by extending a simple assimilation scheme often referred to as nudging (Malanotte-Rizzoli and Holland 1986) or OI (Mellor and Ezer 1991).

As an extension of the previous approach (EEWADAi), which used the finest-scale wavelet coefficients that are known to detect numerical errors (Jameson 1998), in this study we used the wavelet coefficients on several scales, which are considered to indicate model variation not necessarily restricted to numerical errors. In section 2, a brief review of wavelet diagnosis is given. Section 3 explains how one can measure variation at different scales and in multiple dimensions using wavelet decomposition. A short description of the assimilation scheme and the results of the twin experiment conducted using a regional ocean general circulation model (OGCM) are presented in section 4, and the conclusions follow.

## 2. Wavelet analysis

Possibly the most instructive way to think of wavelets is in contrast to traditional analysis techniques such as Fourier analysis. With Fourier analysis we analyze discrete or continuous data using basis functions that are global, smooth, and periodic. This analysis yields a set of coefficients, say, *a*_{k}, which gives the amount of energy in the data at frequency *k.* Wavelet analysis, by contrast, analyzes data with basis functions that are local, slightly smooth, not periodic, and that vary with respect to scale and location. Wavelet analysis thereby produces a set of coefficients *b*_{j,k} that give the amount of energy in the data at scale *j* and location *k.* Wavelet analysis can serve as a good complement to Fourier analysis. In fact, data that are efficiently analyzed with Fourier analysis often are not efficiently analyzed with wavelet analysis and the opposite situation also holds.

For our purposes here we will confine our discussion to the so-called orthogonal wavelets and specifically the Daubechies family of wavelets. The orthogonality property leads to a clear indication when data deviate from a low-order polynomial, the importance of which will become clear when we discuss numerical methods.

### a. Defining the Daubechies wavelet

Two functions define a Daubechies wavelet family: *ϕ*(*x*), the scaling function, and *ψ*(*x*), the wavelet. The scaling function is the solution of the dilation equation

*ϕ*(*x*) = Σ_{k} *h*_{k}*ϕ*(2*x* − *k*),  (1)

where the coefficients *h*_{k} define the fundamental properties of the scaling function and will be explained precisely in section 2c. Equation (1) carries the name "dilation equation" since the independent variable *x* appears alone on the left-hand side but is multiplied by 2, or dilated, on the right-hand side. One also requires that the scaling function *ϕ*(*x*) be normalized: ∫^{∞}_{−∞} *ϕ*(*x*) *dx* = 1. The wavelet *ψ*(*x*) is defined in terms of the scaling function,

*ψ*(*x*) = Σ_{k} *g*_{k}*ϕ*(2*x* − *k*);

see section 2c for an explanation of *g*_{k}. A family of basis functions is generated from *ϕ*(*x*) and *ψ*(*x*) by dilating and translating to get the following functions:

*ϕ*^{j}_{k}(*x*) = 2^{−j/2}*ϕ*(2^{−j}*x* − *k*),
*ψ*^{j}_{k}(*x*) = 2^{−j/2}*ψ*(2^{−j}*x* − *k*),

where *j,* *k* ∈ *Z.* The dilation parameter is *j,* and *k* is the translation parameter.

### b. The spaces spanned by wavelets

Let the space spanned by *ϕ*^{j}_{k}(*x*) over the parameter *k,* with *j* fixed, be denoted by 𝘃_{j}, and let the space spanned by *ψ*^{j}_{k}(*x*) be denoted by 𝘄_{j}. The spaces 𝘃_{j} and 𝘄_{j} are related by

𝘃_{j} = 𝘃_{j+1} ⊕ 𝘄_{j+1},

where the notation 𝘃_{0} = 𝘃_{1} ⊕ 𝘄_{1} indicates that the vectors in 𝘃_{1} are orthogonal to the vectors in 𝘄_{1} and the space 𝘃_{0} is simply decomposed into these two component subspaces.

### c. The high- and low-pass filters and orthogonality

Consider *L* coefficients *H* = {*h*_{k}}^{L−1}_{k=0} and define *L* coefficients *G* = {*g*_{k}}^{L−1}_{k=0} by *g*_{k} = (−1)^{k}*h*_{L−k} for *k* = 0, … , *L* − 1. All wavelet properties are specified through the parameters *H* and *G.* If one's data are defined on a continuous domain such as *f*(*x*), where *x* ∈ *R* is a real number, then one uses *ϕ*^{j}_{k}(*x*) and *ψ*^{j}_{k}(*x*) to perform the wavelet analysis. If, on the other hand, one's data are defined on a discrete domain such as *f*(*i*), where *i* ∈ *Z* is an integer, then the data are analyzed, or filtered, with the coefficients *H* and *G.* In either case, the scaling function *ϕ*(*x*) and its defining coefficients *H* detect localized low-frequency information, that is, they are low-pass filters (LPFs), and the wavelet *ψ*(*x*) and its defining coefficients *G* detect localized high-frequency information, that is, they are high-pass filters (HPFs). Specifically, *H* and *G* are chosen so that dilations and translations of the wavelet, *ψ*^{j}_{k}(*x*), form an orthonormal basis of *L*^{2}(*R*) and so that *ψ*(*x*) has *M* vanishing moments, which determine the accuracy. In other words, *ψ*^{j}_{k}(*x*) will satisfy

∫^{∞}_{−∞} *ψ*^{j}_{k}(*x*)*ψ*^{j}_{l}(*x*) *dx* = *δ*_{kl},

where *δ*_{kl} is the Kronecker delta function, and the accuracy is specified by requiring that *ψ*(*x*) = *ψ*^{0}_{0}(*x*) satisfy

∫^{∞}_{−∞} *x*^{m}*ψ*(*x*) *dx* = 0

for *m* = 0, … , *M* − 1. Under the conditions of the previous two equations, for any function *f*(*x*) ∈ *L*^{2}(*R*) there exists a set {*d*_{jk}} such that

*f*(*x*) = Σ_{j}Σ_{k} *d*_{jk}*ψ*^{j}_{k}(*x*), where *d*_{jk} = ∫^{∞}_{−∞} *f*(*x*)*ψ*^{j}_{k}(*x*) *dx.*

### d. Quadrature mirror filters and the Haar wavelet

The two sets of coefficients *H* and *G* are known as quadrature mirror filters. For Daubechies wavelets the number of coefficients in *H* and *G,* or the length of the filters *H* and *G,* denoted by *L,* is related to the number of vanishing moments *M* by 2*M* = *L.* For example, the famous Haar wavelet is found by defining *H* as *h*_{0} = *h*_{1} = 1. For this filter *H,* the solution to the dilation equation, Eq. (1), is the box function: *ϕ*(*x*) = 1 for *x* ∈ [0, 1] and *ϕ*(*x*) = 0 otherwise. The Haar function is very useful as a learning tool, but because of its low order of approximation accuracy and lack of differentiability it is of limited use as a basis set. The coefficients *H* needed to define compactly supported wavelets with a higher degree of regularity can be found in Daubechies (1988). As expected, the regularity increases with the support of the wavelet. The usual notation for a Daubechies wavelet defined by coefficients *H* of length *L* is *D*_{L}.

### e. Setting a largest and smallest scale

In practice one works with a finite range of scales, 𝘃_{0} = 𝘃_{J} ⊕ 𝘄_{J} ⊕ 𝘄_{J−1} ⊕ ⋯ ⊕ 𝘄_{1}, so that the expansion of a function becomes

*f*(*x*) = Σ_{k} *s*^{J}_{k}*ϕ*^{J}_{k}(*x*) + Σ^{J}_{j=1}Σ_{k} *d*^{j}_{k}*ψ*^{j}_{k}(*x*).  (13)

Scale *j* = 0 is arbitrarily chosen as the finest scale required, and scale *J* would be the scale at which a kind of local average, *ϕ*^{J}_{k}(*x*), provides sufficient large-scale information, that is, the first term in Eq. (13) provides the local mean around which the function oscillates.

One must also limit the range of the location parameter, *k.* Assuming periodicity of *f*(*x*) implies periodicity in *k* of all the wavelet coefficients *s*^{j}_{k} and *d*^{j}_{k}. For the nonperiodic case, since *k* is directly related to the location, a limit is imposed on the values of *k* when the location being addressed extends beyond the boundaries of the domain.

### f. Implementation on a computer

On a computer the decomposition is carried out directly on the coefficients *s*^{j}_{k} and *d*^{j}_{k} with the discrete filters, where *h*_{n} refers to the chosen filter while we have *g*_{n} = −(−1)^{n}*h*_{L−n}:

*s*^{j+1}_{k} = Σ_{n} *h*_{n}*s*^{j}_{2k+n},  *d*^{j+1}_{k} = Σ_{n} *g*_{n}*s*^{j}_{2k+n}.

These two filtering operations can be collected into a matrix 𝗽^{j,j+1}_{N} that takes scaling function coefficients at scale *j* to scaling function and wavelet function coefficients at scale *j* + 1, that is, 𝗽^{j,j+1}_{N} projects **s**_{j} onto **s**_{j+1} and **d**_{j+1}:

(**s**_{j+1}, **d**_{j+1})^{T} = 𝗽^{j,j+1}_{N}**s**_{j},

where we by **s**_{j} refer to the vector containing the coefficients at scale *j.* Note that the vectors at scale *j* + 1 are half as long as the vectors at scale *j.*

To be perfectly correct, one would first approximate the scaling function coefficients at the finest scale using the raw data; however, in practice it seems to make very little difference if one simply considers the raw data to be the scaling function coefficients. So, for our purposes here we will simply use the raw data as the scaling function coefficients on the finest scale. The repeated application of the matrix 𝗽^{j,j+1} yields the wavelet coefficients at the various scales, and it is these wavelet coefficients that provide a guide to the errors committed during the numerical calculation.

To illustrate further, let us consider that the raw data are given and are assumed to be the scaling function coefficients on the finest scale, **s**_{0}. One wavelet decomposition yields the scaling function coefficients and wavelet coefficients at scale *j* = 1, **s**_{1}, and **d**_{1}. A second application of the wavelet decomposition matrix will yield the vectors **s**_{2} and **d**_{2}. It is the vectors **d**_{1}, **d**_{2}, … , which yield the critical information on the numerical errors. If, for example, one sees that the values of the **d**_{1} are relatively large in the middle of the vector, then it is clear that within this one-dimensional vector the largest errors will be in the middle of the one-dimensional domain from which this vector was derived. What we care about most are the relative errors being committed, but we also have some interest in the absolute errors, the subject of the next section.
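To make the repeated decomposition concrete, here is a minimal Python sketch (ours) of one periodic filtering step **s**_{j} → (**s**_{j+1}, **d**_{j+1}), applied twice; treating the raw samples as the finest-scale scaling coefficients follows the discussion above, while the function names are our own.

```python
import math

def decompose(s, h):
    """One periodic wavelet decomposition step: s_j -> (s_{j+1}, d_{j+1})."""
    L, N = len(h), len(s)
    g = [((-1) ** k) * h[L - 1 - k] for k in range(L)]  # high-pass from low-pass
    s_next = [sum(h[n] * s[(2 * k + n) % N] for n in range(L))
              for k in range(N // 2)]
    d_next = [sum(g[n] * s[(2 * k + n) % N] for n in range(L))
              for k in range(N // 2)]
    return s_next, d_next

s3 = math.sqrt(3.0)
h = [(1 + s3) / (4 * math.sqrt(2)), (3 + s3) / (4 * math.sqrt(2)),
     (3 - s3) / (4 * math.sqrt(2)), (1 - s3) / (4 * math.sqrt(2))]

# Raw data taken as the finest-scale scaling coefficients s_0:
# a linear ramp with one local disturbance.
f = [float(i) for i in range(16)]
f[8] += 5.0

s1, d1 = decompose(f, h)    # vectors at scale 1 are half as long
s2, d2 = decompose(s1, h)   # scale 2
print(len(f), len(s1), len(s2))  # 16 8 4
# d1 is near zero over the smooth interior (D4 annihilates linear data);
# it is large near the disturbance and at the periodic wrap-around.
```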

## 3. SUgOiWADAi

As we mentioned in the abstract and introduction, SUgOiWADAi works by using the information from many scales of the wavelet analysis, in contrast to EEWADAi, which used the wavelet analysis information only on the finest scale. By increasing the amount of information used, one would naturally expect the result to improve. In particular, SUgOiWADAi provides not only a measure of numerical error, as EEWADAi does, but also an estimate of model variation. In order to understand how this variation is measured and how this new information is coupled with the estimate of numerical error, we need to return to the wavelet subspace notation.

### a. Measuring variation at different scales in one dimension

As we have mentioned, SUgOiWADAi not only can give a direct measure of numerical error but the scheme can also give a very reliable measure of model variation. In this section we will explain how SUgOiWADAi works and compare the variation measure given by SUgOiWADAi to that used in OI schemes.

From wavelet analysis one can find the variation at a variety of scales. We can think of the information in the highest-frequency wavelet box 𝘄_{1} as telling us about the numerical error, that is, it will give us a measure of local deviation from low-order polynomials. It can also be seen as a measure of variation at the smallest scale or similarly the highest frequency. Basically, the information in the box 𝘄_{1} measures variation over roughly two grid points. In order to compare the ability of wavelets to detect variation with the ability of OI schemes to detect variation, we need a measure of variation that covers a slightly larger portion of the domain. Therefore, we would use the information available in the boxes 𝘄_{2}, 𝘄_{3}, and perhaps also 𝘄_{4}, where the box numbering is such that higher numbers represent coarser scales.

#### 1) The Haar (D2) wavelet

For the Haar wavelet the low-pass filter coefficients are given by *h*_{0} and *h*_{1} and the high-pass filter coefficients by *g*_{0} and *g*_{1}. So, if we are given a stream of data from one field, perhaps velocity, pressure, etc., from our calculation, say, *f*_{0}, *f*_{1}, … , *f*_{N}, then the first step is to find the scaling function coefficients on the finest scale, *s*^{0}_{0}, *s*^{0}_{1}, … , *s*^{0}_{N}, in the subspace 𝘃_{0}. Strictly speaking, one should approximate the scaling function coefficients on the finest scale from the raw data via some kind of quadrature formula; however, our goal is to measure only the variation in the data, therefore we can simply assign these scaling function coefficients directly from the raw data: *s*^{0}_{0} = *f*_{0}, *s*^{0}_{1} = *f*_{1}, … , *s*^{0}_{N} = *f*_{N}. Then one obtains the scaling function coefficients at the first scale, *s*^{1}_{0}, *s*^{1}_{1}, … , *s*^{1}_{N/2}, in the subspace 𝘃_{1}, and the wavelet coefficients *d*^{1}_{0}, *d*^{1}_{1}, … , *d*^{1}_{N/2}, in the subspace 𝘄_{1}, by

*s*^{1}_{k} = *h*_{0}*s*^{0}_{2k} + *h*_{1}*s*^{0}_{2k+1},  *d*^{1}_{k} = *g*_{0}*s*^{0}_{2k} + *g*_{1}*s*^{0}_{2k+1}.

The important point to note here is that the wavelet coefficients on this first scale are combined from only two scaling function coefficients on the finest scale; therefore they can *feel* variation over only two numbers. Now, to go from this first scale to the second scale, one decomposes 𝘃_{1} into 𝘃_{2} ⊕ 𝘄_{2}. Each coefficient in 𝘃_{2} and 𝘄_{2} is found from two numbers in the subspace 𝘃_{1}; therefore, each coefficient in 𝘄_{2} is composed from four numbers in the finest-scale subspace 𝘃_{0}. One can thus say that each coefficient in 𝘄_{2} can *feel* or measure variation that occurs over four numbers in the original raw data *f*_{i}. Likewise, each wavelet coefficient in 𝘄_{3} can *feel* variation that occurs over eight numbers in the original data.

It is very important to note that the Haar wavelet is very special in the sense that the wavelet coefficients do not overlap when performing a wavelet decomposition. More will be said on this point in the next section.
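That non-overlapping property can be demonstrated directly: perturbing a single raw sample changes exactly one Haar wavelet coefficient at each scale. A small Python sketch (ours, using the normalized filter *h*_{0} = *h*_{1} = 1/√2 rather than the unnormalized *h*_{0} = *h*_{1} = 1 written above):

```python
def haar_step(s):
    """One Haar decomposition step with the normalized filter h0 = h1 = 1/sqrt(2)."""
    r = 2 ** 0.5
    lo = [(s[2 * k] + s[2 * k + 1]) / r for k in range(len(s) // 2)]
    hi = [(s[2 * k] - s[2 * k + 1]) / r for k in range(len(s) // 2)]
    return lo, hi

def haar_levels(f, levels):
    """Return the wavelet coefficient vectors d_1, d_2, ..., d_levels."""
    details, s = [], list(f)
    for _ in range(levels):
        s, d = haar_step(s)
        details.append(d)
    return details

f = [0.0] * 16
fp = list(f)
fp[5] = 1.0  # perturb a single raw sample

for j, (da, db) in enumerate(zip(haar_levels(f, 3), haar_levels(fp, 3)), start=1):
    changed = [k for k in range(len(da)) if abs(da[k] - db[k]) > 1e-12]
    print(f"scale {j}: coefficients changed at k = {changed}")
# Exactly one coefficient changes per scale: k = 2 at scale 1 (feels f4, f5),
# k = 1 at scale 2 (feels f4..f7), and k = 0 at scale 3 (feels f0..f7).
```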

#### 2) The D4 wavelet

To illustrate the overlapping of regions for the *D*_{4} wavelet, suppose that one wants to project from 8 scaling function coefficients at scale *j* to 4 scaling function coefficients at scale *j* + 1 and 4 wavelet coefficients at scale *j* + 1. In the decomposition matrix for the case of periodic boundary conditions, 𝗽^{j,j+1}_{8}, each row contains the four filter coefficients, shifted by two columns from row to row and wrapping around at the boundary for periodicity; the first four rows apply *H* and produce **s**_{j+1}, and the last four rows apply *G* and produce **d**_{j+1}. Now we can examine the size of the regions *felt* by each coefficient in the various wavelet subspaces 𝘄_{1}, 𝘄_{2}, and 𝘄_{3}. Obviously, as we go from 𝘃_{0} to 𝘃_{1} ⊕ 𝘄_{1}, each wavelet coefficient *d*^{1}_{k} in 𝘄_{1} feels a region of four numbers in the original data.

Next consider 𝘄_{2}. In a straightforward application of the above matrix, we can see that in our first step in going from 𝘃_{0} to 𝘃_{1} we have

*s*^{1}_{k} = *h*_{0}*f*_{2k} + *h*_{1}*f*_{2k+1} + *h*_{2}*f*_{2k+2} + *h*_{3}*f*_{2k+3},

and when we decompose 𝘃_{1} into 𝘃_{2} ⊕ 𝘄_{2} we see that

*d*^{2}_{0} = *g*_{0}*s*^{1}_{0} + *g*_{1}*s*^{1}_{1} + *g*_{2}*s*^{1}_{2} + *g*_{3}*s*^{1}_{3}.

Now, we can see that the region felt by the coefficient *d*^{2}_{0} runs from *f*_{0} to *f*_{9}, that is, a region 10 points wide. Thus, as we go to wavelet and scaling function coefficients at higher and higher levels of decomposition, the coefficients are influenced by more and more of the data in physical space, or the coefficients feel an increasingly larger region of the physical space. So, after one decomposition, each coefficient will feel a region equal in size to the length of the wavelet filter denoted by the parameter *L.* And, as one decomposes on higher and coarser scales, the region felt by each wavelet coefficient grows in proportion to the length of the wavelet filter and the level of decomposition.
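The 10-point region of influence can be verified empirically: perturb one raw sample at a time and record which perturbations the first scale-2 wavelet coefficient responds to. A Python sketch (ours) under the same periodic *D*_{4} setup:

```python
import math

def decompose(s, h):
    """One periodic decomposition step: s_j -> (s_{j+1}, d_{j+1})."""
    L, N = len(h), len(s)
    g = [((-1) ** k) * h[L - 1 - k] for k in range(L)]
    s_next = [sum(h[n] * s[(2 * k + n) % N] for n in range(L))
              for k in range(N // 2)]
    d_next = [sum(g[n] * s[(2 * k + n) % N] for n in range(L))
              for k in range(N // 2)]
    return s_next, d_next

s3 = math.sqrt(3.0)
h = [(1 + s3) / (4 * math.sqrt(2)), (3 + s3) / (4 * math.sqrt(2)),
     (3 - s3) / (4 * math.sqrt(2)), (1 - s3) / (4 * math.sqrt(2))]

def first_scale2_coeff(f):
    s1, _ = decompose(f, h)
    _, d2 = decompose(s1, h)
    return d2[0]

# Perturb one raw sample at a time; the coefficient responds exactly to
# the samples inside its region of influence.
N = 32
felt = []
for i in range(N):
    f = [0.0] * N
    f[i] = 1.0
    if abs(first_scale2_coeff(f)) > 1e-14:
        felt.append(i)
print(felt)  # -> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

The result is the 10-point-wide region f_0 to f_9 discussed above.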

### b. Wavelet detection of model variation

As explained in previous sections of the paper, wavelet analysis breaks up data into local frequency components. That is, at a given physical space location one can obtain an estimate of the various scales of information present in the vicinity of that location. Using the previously defined notation, we have the "frequency" or "scale" boxes 𝘄_{1}, 𝘄_{2}, 𝘄_{3}, … , which contain this local frequency content information. Furthermore, "variation" in a model will appear as a localized oscillation at some scale, and this information will appear as a large wavelet coefficient in one of the frequency boxes. That is, there is a one-to-one correspondence between model variation at a given scale and the wavelet energy at that same scale. In fact, variation is exactly what wavelet analysis detects. One can see these effects in Fig. 1. In the top panel of the figure, one can see the sea surface height (SSH) as one traverses the Kuroshio. In the second panel from the top, one can see the finest-scale wavelet coefficients, which have two peaks at the locations where the numerical truncation error will be the largest. The third and fourth panels show the wavelet coefficients at the larger scales of the second and third decompositions. At these scales one is detecting not numerical error but model variation.

In summary, we note that other techniques such as OI measure only model variation, and this measure of variation is far less precise than the model variation measure that is given by wavelet analysis. In addition, wavelet analysis gives a precise measure of the error committed in the numerical calculation. Both of these estimates, the error estimate and the variation estimate, make wavelet analysis a very powerful and useful tool.

### c. When variation and error are not the same

In a word, where there is *numerical error* there will be variation, but variation does not imply numerical error. Roughly speaking, one can make the argument based on scale information. For example, if one observes wavelet energy in the finest-scale box, 𝘄_{1}, then this energy will indicate that the gridpoint density for the numerical method in a given region of the domain is not sufficient and that numerical errors are committed in this region. Certainly this same 𝘄_{1} box energy indicates local high-frequency information or local variation. On the other hand, observed energy in the coarser-scale box, 𝘄_{3}, will not necessarily indicate an insufficient gridpoint density but, again, it will indicate that variation occurs at scale 3. To be more precise, suppose that in the vicinity of grid point *x*_{k} energy is present in box 𝘄_{3} but not in box 𝘄_{1}; this will indicate variation but not numerical error. On the other hand, if energy is present in box 𝘄_{1}, then this will indicate both numerical error and variation. As above, let us refer to Fig. 1, where we can see that the third and fourth panels from the top show the model variation but not the numerical truncation error. At these scales the model physics are producing changes or variations, but the scale is sufficiently large that the numerical truncation error is very small.

### d. Measuring variation at different scales in higher dimensions

The extension to higher dimensions is accomplished with tensor products, and the notions of regions being *felt,* as discussed above, carry over directly dimension by dimension. The tensor product approach works as follows: recall that in one dimension, one decomposition of the finest-scale subspace yields 𝘃_{0} = 𝘃_{1} ⊕ 𝘄_{1}. In two dimensions the finest-scale subspace is 𝘃_{0} ⊗ 𝘃_{0}, and one decomposition yields

𝘃_{0} ⊗ 𝘃_{0} = (𝘃_{1} ⊕ 𝘄_{1}) ⊗ (𝘃_{1} ⊕ 𝘄_{1}) = (𝘃_{1} ⊗ 𝘃_{1}) ⊕ (𝘃_{1} ⊗ 𝘄_{1}) ⊕ (𝘄_{1} ⊗ 𝘃_{1}) ⊕ (𝘄_{1} ⊗ 𝘄_{1}).

The subspace 𝘃_{1} ⊗ 𝘃_{1} contains the local averages, the subspaces 𝘃_{1} ⊗ 𝘄_{1} and 𝘄_{1} ⊗ 𝘃_{1} contain variation in one coordinate direction combined with an average in the other, and 𝘄_{1} ⊗ 𝘄_{1} contains variation in both directions simultaneously. Decomposing the average subspace again yields the corresponding subspaces 𝘃_{j} ⊗ 𝘄_{j}, 𝘄_{j} ⊗ 𝘃_{j}, and 𝘄_{j} ⊗ 𝘄_{j} at scales *j* = 1, 2, 3, 4, … . Generally, decompositions up to *j* = 4 will be sufficient.
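A minimal Python sketch (ours) of the two-dimensional tensor product decomposition, using the Haar filter for brevity (the paper's computations use *D*_{4}): filtering along rows and then along columns splits a field into the four subspaces, and a field with variation in only one direction puts energy in only one of the mixed subspaces.

```python
def haar2d(field):
    """One 2D Haar decomposition via the tensor product: filter along rows,
    then along columns, producing the four subspaces of the text."""
    r = 2 ** 0.5
    def step(rows):
        lo = [[(row[2 * k] + row[2 * k + 1]) / r for k in range(len(row) // 2)]
              for row in rows]
        hi = [[(row[2 * k] - row[2 * k + 1]) / r for k in range(len(row) // 2)]
              for row in rows]
        return lo, hi
    def t(m):
        return [list(c) for c in zip(*m)]
    lo, hi = step(field)        # low/high pass along rows
    ll, lh = step(t(lo))        # then along columns
    hl, hh = step(t(hi))
    # ll: averages in both directions; lh/hl: variation in one direction
    # with an average in the other; hh: variation in both directions.
    return t(ll), t(lh), t(hl), t(hh)

# A field that varies in only one direction (an edge between columns 2 and 3):
# energy appears in one mixed subspace only; the w1 (x) w1 band stays empty.
field = [[0.0] * 3 + [1.0] * 5 for _ in range(8)]
ll, lh, hl, hh = haar2d(field)
print(round(max(abs(x) for row in hl for x in row), 6),
      round(max(abs(x) for row in hh for x in row), 6))  # -> 1.0 0.0
```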

#### 1) Numerical error and variation in higher dimensions

We note that it is the subspace 𝘄_{1} ⊗ 𝘄_{1} that will primarily detect numerical error. Recall that the reason for this is that this subspace will detect deviation from low-order polynomials over just a few grid points in both the horizontal and vertical directions. In Fig. 2 one can see the four subspaces formed from the two-dimensional wavelet decomposition at the finest scale. The upper-left panel contains the wavelet coefficients at the finest scale in both directions. This panel would indicate the presence of numerical truncation error in both directions simultaneously. The upper-right panel of Fig. 2 indicates the presence of numerical truncation error in the vertical direction, and the lower-left panel indicates the presence of numerical truncation error in the horizontal direction. The lower-right panel is a simultaneous average in both directions; note the change of color scale in this panel. Figure 3 indicates the two-dimensional wavelet coefficients after a second wavelet decomposition and Fig. 4 indicates the two-dimensional wavelet coefficients after a third decomposition. Figures 3 and 4 would be used to give estimates of model variation. Further, a subspace such as 𝘄_{4} ⊗ 𝘄_{4} will detect deviation over a very large number of grid points in both the horizontal and vertical directions. Note that in Figs. 2, 3, and 4 the lower-right panel denotes the values of the scaling function coefficients. These lower-right panels give, if you will, a local average of the data and should look essentially like a smeared version of the original data. In Fig. 5 we show the wavelet coefficients that combine the numerical error with the model variation. As mentioned above, deviation from low-order polynomials over a large number of grid points *does not* imply numerical error but simply gives a measure of variation. 
As one proceeds down the hierarchy of subspaces from 𝘄_{1} ⊗ 𝘄_{1} to 𝘄_{4} ⊗ 𝘄_{4}, one is proceeding from a very localized measure of deviation to a more global measure of deviation. It is only the most localized measure of deviation that can provide a measure of numerical error, whereas the more global measures of deviation can be used to give a measure of variation.

As above, we can refer to Fig. 1 to see an illustration of these ideas. In terms of the tensor product, the explanation is simple since one can consider the information direction by direction. In this case, the explanation is exactly the same as in the one-dimensional case if one considers the explanation direction by direction. Thus, for one to see large wavelet coefficients in the subspace 𝘄_{1} ⊗ 𝘄_{1} then one must see large wavelet coefficients in each direction of the first scale decomposition, which is the second panel of Fig. 1. On the other hand, large wavelet coefficients in the subspace 𝘄_{3} ⊗ 𝘄_{3} would require large wavelet coefficients in each direction of the third level of decomposition corresponding to the fourth panel of Fig. 1.

### e. Error variance estimation using wavelet analysis

The goal of SUgOiWADAi is to obtain not only an estimate of *numerical* variation, as is done in EEWADAi, but also an estimate of model variation, similar to what is done in OI methods. Recall that in EEWADAi wavelet analysis gives us a local measure of deviation from the low-order polynomial structure in the data. Given this measure of deviation, one has an estimate of local numerical error in the model. From these local error estimates, EEWADAi easily provides an estimate of local error variance. For EEWADAi it was important that this variance estimate remain *local* since the ability to estimate errors locally in both space and time was one of the key strong points of EEWADAi. Recall that for EEWADAi we found a local average of the squared errors in space, and we estimated the error variance around, say, grid point (*x*_{K1}, *y*_{K2}, *z*_{K3}) as

*C* Σ^{K1+n}_{k1=K1−n} Σ^{K2+n}_{k2=K2−n} Σ^{K3+n}_{k3=K3−n} *d*(*k*_{1}, *k*_{2}, *k*_{3})^{2},

where *d*(*k*_{1}, *k*_{2}, *k*_{3}) is the wavelet coefficient after only one decomposition at the wavelet spatial index (*k*_{1}, *k*_{2}, *k*_{3}). Here *n* is a small number used to define a small averaging box around the point of interest, and *C* is a constant that is needed to scale the wavelet transform to the problem at hand.

For SUgOiWADAi this estimate is extended across scales: Δ_{j} will be the wavelet-detected variation at scale *j,* and *C*_{j} will be a corresponding scaling constant. We have chosen the coefficients *C*_{j} to correspond to a natural scaling that occurs within the wavelet basis; that is, we choose *C*_{1} = 1, *C*_{2} = 1/2, and *C*_{3} = 1/4. It should be stated that some flexibility exists here, and one might want to alter these coefficients depending on the characteristics of the calculation at hand. For example, altering the order of the numerical scheme can vastly change the magnitude of the coefficients in the finest-scale subspace 𝘄_{1}. In this case, one might choose to increase *C*_{1} to a number scaled according to this numerical order. Again, the user should experiment with these constants to find values that are suitable for the given calculation. The quantity Δ_{j} is defined as

Δ_{j} = *C*_{j} Σ^{K_{j1}+n_{j}}_{k1=K_{j1}−n_{j}} Σ^{K_{j2}+n_{j}}_{k2=K_{j2}−n_{j}} Σ^{K_{j3}+n_{j}}_{k3=K_{j3}−n_{j}} *d*_{j}(*k*_{1}, *k*_{2}, *k*_{3})^{2}.

As above, the parameter *n*_{j} defines a box in three dimensions over which the summation of the wavelet coefficients occurs. If one is working in a lower dimension, such as two dimensions, then one would not sum over the *k*_{3} parameter. The reason that one would have a different *n*_{j} for each scale *j* is that the wavelets at the larger scales, higher values of *j,* cover larger portions of the domain, as one can see from the above discussion of regions of influence. Therefore, one would expect that the values of *n*_{j} will decrease as *j* increases. This will, of course, depend on the size of the region that one wishes to use in finding the estimate of error variance. The point (*K*_{j1}, *K*_{j2}, *K*_{j3}) is a wavelet translation index at scale *j* that corresponds to a point in physical space roughly centered near the area where one needs an estimate of error variance.

## 4. Comparison with optimal interpolation

In a word, optimal interpolation (OI) schemes work by using the model variation as an estimate of model error. In fact, it is easy to find counterexamples illustrating that model variation can be independent of model error, but in practical computations there appears to be a correlation. Therefore, it is common practice to use the model variation as an estimate of model error in OI schemes.

### a. Computational cost

One certain advantage that wavelet analysis will have over OI schemes is that the wavelet analysis can be performed at every time step at a very low computational cost. For example, if one is using the *D*_{4} wavelet then the computational cost will be 4 × *N* to decompose once, which gives us the information for the 𝘄_{1} box, and 4 × *N*/2 to get the information in the 𝘄_{2} box, leading to a total computational work of 4 × *N* + 4 × *N*/2 + 4 × *N*/4 + 4 × *N*/8 to obtain the information in the 𝘄_{4} box. Computationally this is relatively inexpensive. If, however, one considers this to be too expensive, then one can certainly perform the analysis every other time step or even less often. This depends on how quickly the data change with respect to a given time step; usually the solution will not change much qualitatively within just a few time steps.
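The work per level forms a geometric series bounded by 2 × *L* × *N*; a quick tally in Python (the level count and *N* are illustrative):

```python
# Cost of successive decompositions with a length-4 filter: each level
# halves the data, so the total work stays below 2 * L * N.
L, N = 4, 1_000_000
work = sum(L * (N >> j) for j in range(4))  # levels giving w1 .. w4
print(work, 2 * L * N)  # -> 7500000 8000000
```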

### b. Measuring variation and variance

In Fig. 1 we show a comparison between the instantaneous measure of model variation and numerical error given by wavelet analysis and the time-averaged variance that is used in the method of OI as an estimate for model error. Note that the blue curves in plots 2, 3, and 4 are the squares of the wavelet coefficients at wavelet scales 1, 2, and 3. At scales 1 and 2 we can see that the blue curve has two peaks that correspond to the two regions of the top plot that contain the smallest scales, that is, the regions of the top plot where the curve bends the most. At scale 3 the blue curve has one large peak that captures the entire downward-sloping region of the top plot in addition to the two curving regions before and after the downward slope. In other words, at wavelet scale 3 one sees a rather large-scale phenomenon. In the bottom plot the blue curve indicates the time-averaged variance that is used in OI methods as an indicator of model error. However, as one can clearly see from the peaks of this blue curve in the bottom plot, the time-averaging process tends to smear information, thereby producing a less accurate estimate of instantaneous model variation or error.

### c. Wavelet reconstruction at different scales

A function *h*(*x*) might have a wavelet expansion of

*h*(*x*) = Σ_{k} *s*^{J}_{k}*ϕ*^{J}_{k}(*x*) + Σ^{J}_{j=1}Σ_{k} *d*^{j}_{k}*ψ*^{j}_{k}(*x*),

where the *d*s indicate the wavelet coefficients and the *s*s indicate the scaling function coefficients. In Fig. 1 we see that the blue curves in the second, third, and fourth plots show the square of the *d*s, and the red curves in the same three plots show the partial reconstructions at the given scales. In other words, the red curve in the second plot shows

Σ_{k} *d*^{1}_{k}*ψ*^{1}_{k}(*x*),

which would be the projection of *h*(*x*) onto the wavelet basis functions at the finest scale. One will notice that whereas the blue curves are relatively smooth, the red curves are relatively rough. This roughness is due to the nature of the *D*_{4} wavelet that is currently being used.

### d. Numerical setup for the twin experiment

Twin experiments were conducted in order to perform a benchmark test of the three schemes: OI, EEWADAi, and SUGOiWADAi. The numerical setup of the twin experiment is similar to the one described in Jameson and Waseda (2000), so the details are omitted here. This is a standard method of testing the convergence of the assimilated solution to the control run by comparing their squared differences.

The model is a version of a sigma-coordinate primitive equation model (Mitsudera et al. 1997) that covers the main Kuroshio stream along the southern Japan coast, the Oyashio Current in the north, and the Kuroshio Extension region; the approximate domain extends from 20° to 52°N and from 125° to 170°E, configured in a curvilinear coordinate system (206 by 209 by 32 discretization) in which the horizontal axes follow the mean geometry of the Kuroshio stream. The set of dynamical and thermodynamic primitive equations is described in Blumberg and Mellor (1983). During the 6 years of spinup integration, the model was driven by monthly mean climatologies, namely the Hellerman–Rosenstein wind and the Comprehensive Ocean–Atmosphere Data Set (COADS) heat flux, and the Levitus climatology was used for the temperature and salinity initialization as well as for the lateral boundary restoration.

The background error correlation for the OI scheme was modeled as a Gaussian function of the distance between grid points,

*C*_{ij} = exp(−|**r**_{i} − **r**_{j}|^{2}/*s*^{2}_{o}),

where **r**_{i} and **r**_{j} are the positions of the model grid points and the decorrelation scale *s*_{o} is 60–100 km. The diagonal elements of matrix 𝗱 were fixed in time for the OI scheme, initialized by the model variance, and were varied in time and space for EEWADAi and SUGOiWADAi. The elements of the diagonal observation error matrix 𝗿 were fixed in time so the only change in ‖𝗸‖ would occur through variation in 𝗱. The matrix 𝗵 represents the mapping of the satellite data along the track onto the nearest neighboring model grid points. The gain matrix 𝗸 is computed once for OI but is updated at every assimilation time step for EEWADAi and SUGOiWADAi. The correction to the model state **x** is made using the observational data **y** (sampled from the control run) as

**x** = **x**^{−} + 𝗳_{T or S}𝗸(**y** − 𝗵**x**^{−}),

where **x**^{−} is the prior model state, **y** is always the SSH anomaly, and 𝗳_{T or S} is the statistical correlation between the SSH anomaly and the temperature or salinity anomalies.

We have, therefore, tested the impact of varying 𝗱 on the assimilation skill with the twin experiments. Two methods were implemented, varying 𝗱 based on the information obtained through the wavelet decomposition of the model output field. In practice, we have chosen diag(𝗱) as a function of the variation at three scales [see (22)], and for each scale the variation was detected from the subspaces 𝘃_{j} ⊗ 𝘄_{j} and 𝘄_{j} ⊗ 𝘃_{j} instead of from the subspace 𝘄_{j} ⊗ 𝘄_{j} alone. The former approach can capture either horizontal or vertical edges, a less strict requirement for error detection than the latter. The weighting of the three scales was given as 1, 1/2, and 1/4 for SUGOiWADAi and 1, 0, and 0 for EEWADAi, in ascending order in scale.

### e. Results of the twin experiments

In the previous study (Jameson and Waseda 2000) we showed that the new wavelet-based method (EEWADAi) outperforms the traditional OI scheme after 120 days of assimilation; in other words, the two solutions were not noticeably different in quality up to day 120. Together with the previous results for EEWADAi, we present the *L*_{2} difference of two of the flow fields, sea surface height and temperature at a given level, between the control run and the test run at 5-day intervals during the SUGOiWADAi experiment. We can see from the two plots presented (Figs. 6 and 7) that SUGOiWADAi (red circles) reduced the error significantly around days 40–100, when EEWADAi (blue circles) gave a performance similar to OI (green circles). We also see that after 120 days both wavelet-based methods, EEWADAi and SUGOiWADAi, have reached a kind of steady state.

It is clear now that this second-generation, wavelet-based assimilation scheme SUGOiWADAi has improved considerably over the earlier version EEWADAi. One can see from Figs. 6 and 7 that the red circles always show smaller errors than the others (blue and green circles), indicating the robustness of the new scheme. This is probably because the errors in the model domain are heterogeneous, such that some parts of the domain may contain more numerical errors while other parts contain more errors due to model variation. The new method, SUGOiWADAi, automatically detects both kinds of error and weights them appropriately, so that the impact of both can be incorporated into the estimation of the error covariance matrix at each time step. The success of this new scheme is rather important, since most reduced Kalman filtering approaches neglect the numerical errors and the errors associated with localized model variation in order to reduce the computational load. We have shown in this study that such knowledge is essential for the improvement of assimilation schemes.

## 5. Conclusions

In the field of data assimilation, one can summarize the weakness of current methods by saying that there is insufficient knowledge of errors, either on the computational side or in the external source of data. Without knowledge of the errors, one cannot hope to assimilate external data efficiently. Consider the simple example of incorporating real-time sea surface height data into a model run: one must know roughly which is more accurate, the height given by the numerical calculation or the height given by the external data source, say data from a satellite. Generally we will have some knowledge of the satellite errors, but not as a function of space and time. For example, we might know that the satellite data error is some kind of skewed Gaussian with a certain standard deviation, but we will generally have no knowledge of how this error changes with spatial location over the ocean or with time. On the other hand, if one has some knowledge of the errors in the computational scheme, then one will have an estimate of the relative errors between the satellite data and the computed data.

In this manuscript we have introduced a second-generation data assimilation scheme that uses wavelet analysis to build the error covariance for reduced Kalman filtering. In our first-generation scheme, EEWADAi, we used wavelets to detect numerical error. In our current second-generation scheme, SUGOiWADAi, we use wavelet analysis not only to detect numerical error but also to give estimates of local model variation at various scales. This use of information at various scales combines the strong points of EEWADAi with those of the existing and commonly used optimal interpolation schemes. By bringing together the strong features of both approaches, we have created a very robust data assimilation method that outperforms both of the previous approaches. Furthermore, our new approach is computationally very inexpensive.

Our future intentions are to explore further ways to enhance SUGOiWADAi and to broaden its application.

## Acknowledgments

We would like to express our thanks to Dr. Yaremchuk of the IPRC for his valuable comments and encouragement. We also would like to thank Dr. Yoshikawa and Mr. Taguchi for providing us with the Kuroshio regional circulation model for the twin experiment, and Ms. Diane Henderson for her careful editorial review. This research was supported by the Frontier Research System for Global Change. The IPRC is partly sponsored by the Frontier Research System for Global Change.

## REFERENCES

Blumberg, A. F., and G. L. Mellor, 1983: Diagnostic and prognostic numerical circulation studies of the South Atlantic Bight. *J. Geophys. Res.*, **88**, 4579–4592.

Daubechies, I., 1988: Orthonormal basis of compactly supported wavelets. *Commun. Pure Appl. Math.*, **41**, 909–996.

Erlebacher, G., M. Y. Hussaini, and L. Jameson, Eds., 1996: *Wavelets: Theory and Applications*. Oxford University Press, 510 pp.

Ezer, T., and G. L. Mellor, 1994: Continuous assimilation of Geosat altimeter data into a three-dimensional primitive equation Gulf Stream model. *J. Phys. Oceanogr.*, **24**, 832–847.

Ghil, M., 1989: Meteorological data assimilation for oceanographers. Part I: Description and theoretical framework. *Dyn. Atmos. Oceans*, **13**, 171–218.

Jameson, L., 1998: A wavelet-optimized, very high order adaptive grid and order numerical method. *SIAM J. Sci. Comput.*, **19**, 1980–2013.

Jameson, L., and T. Waseda, 2000: Error estimation using wavelet analysis for data assimilation: EEWADAi. *J. Atmos. Oceanic Technol.*, **17**, 1235–1246.

Kalman, R. E., 1960: A new approach to linear filtering and prediction problems. *J. Basic Eng.*, **82D**, 35–45.

Malanotte-Rizzoli, P., and W. R. Holland, 1986: Data constraints applied to models of the ocean general circulation. Part I: The steady case. *J. Phys. Oceanogr.*, **16**, 1665–1687.

Mellor, G., and T. Ezer, 1991: A Gulf Stream model and an altimetry assimilation scheme. *J. Geophys. Res.*, **96** (C5), 8779–8795.

Mitsudera, H., Y. Yoshikawa, B. Taguchi, and H. Nakamura, 1997: High-resolution Kuroshio/Oyashio System model: Preliminary results (in Japanese with English abstract). Japan Marine Science and Technology Center Rep. 36, 147–155.

Pham, D. T., J. Verron, and M. C. Roubaud, 1998: A singular evolutive extended Kalman filter for data assimilation in oceanography. *J. Mar. Syst.*, **16**, 323–340.

^{*} School of Ocean and Earth Science and Technology Contribution Number 5871 and International Pacific Research Center Contribution Number 119.