1. Introduction
Sea surface height anomaly (SSHA) measurements from space provide a global and nearly real-time representation of ocean dynamical features and are the most important observation component of the U.S. Navy’s operational global ocean prediction system. Observations of SSHA are particularly useful for mapping eddies, meandering currents, and fronts that are associated with, for example, planetary waves and geostrophic currents. This information is utilized by assimilation into the
Because NLOM is a layered model with high horizontal but low vertical resolution, it does not independently provide a realistic vertical structure. A more complete depiction is made by using the SSHA fields from NLOM within the Modular Ocean Data Assimilation System (MODAS) to construct subsurface three-dimensional temperature and salinity fields. These global fields, which we call synthetics, are generated through the use of gridded statistical relationships between SSHA, sea surface temperature (SST), and historical in situ profile observations of temperature and salinity (Fox et al. 2002a,b). MODAS synthetics are generated for several experimental cases that differ according to the selection of zero to three satellite altimeter data streams to be assimilated by NLOM. These synthetic fields of temperature and salinity provide the basis for the present study and are compared with independent in situ ocean profile observations to evaluate their relative accuracy.
A main goal of this analysis is to evaluate the relative prediction accuracy resulting from different numbers and configurations of altimeters providing data for NLOM and MODAS. Another equally important goal is to develop validation methodologies for error assessment of ocean prediction systems in general. To accomplish these objectives, we apply two comparison frameworks and two statistical methodologies. Comparisons are made point for point and in binned overlapping 1° radius regions 60 days long. Error analyses are then computed using both traditional Gaussian and also robust nonparametric statistical methods. The metric used for the evaluation is thermocline depth (TD) because the SSHA is most highly correlated with the thermocline (Hurlburt 1986).
During the analysis time period, there were three satellite altimeter platforms used in daily operations. Maintaining that level of satellite coverage is expensive, and thus a clear understanding regarding the prediction impact of multiple satellites is of interest. Recent scientific literature shows the importance of satellite altimeters and their influence on ocean prediction. Use of the Navy’s Coastal Ocean Model (NCOM) to predict SSHA, in assimilative and nonassimilative modes, was evaluated relative to tide gauges by Barron et al. (2004). Other studies assimilate altimeter SSHA and surface drifter data together (e.g., Lin et al. 2007; Fan et al. 2004; Ishikawa et al. 1996). In a study by Smedstad et al. (2003), identical twin numerical experiments simulated the SSHA error versus the number of assimilated satellite altimeters in a ⅙° version of NLOM. The present analysis provides a similar error comparison but with assimilation of real altimeter measurements in
Numerical models can capture the statistical characteristics of the spatial and temporal variability relatively well. Climate studies often require that the underlying mechanisms and general character of the prediction is correct (e.g., Doney et al. 2007), while the specific skill of predicting temperature and salinity at a point is less important. Other validation studies evaluate the long-term temporal variability of models (Kara and Hurlburt 2006). More detailed predictions are often required by the fishing industry, the navy, and others for prediction of the synoptic variability. For example, the positions of fronts are often used by the maritime community to evaluate model skill (e.g., Oey et al. 2005).
An even more detailed method for comparing model results is to compare the model at the exact location of the observation point for point. This type of comparison is a very stringent measure for model evaluation, because the observations are influenced by all the physical processes present in the ocean, some of which may not be adequately represented in the prediction system. For example, turbulent diffusion is parameterized, and internal waves and tides are not included in the prediction systems we evaluate here. Point-for-point comparisons can exaggerate the influence of small displacement errors of mesoscale features. If the predictions represent the mesoscale features relatively well but with slight shifts in the horizontal, relatively large point-for-point errors may occur.
A second type of comparison begins with overlapping data bins with a 1° radius and 60-day time window. The data in these bins are then linearly fit in a least squares sense, providing a statistical representation (superposition data) of thermocline depth within the bin. This methodology is applied to produce both superobservation and superprediction data. The main advantages are an evening of the data weights between high- and low-density observation regions and estimates of time rate of change. Results using this procedure are compared with results from point-for-point comparisons to evaluate the impact of the comparison framework.
Both traditional Gaussian and robust nonparametric statistical methods are used in the present analysis. Ocean observations rarely exhibit Gaussian data distributions that are assumed when using Gaussian statistical methods. To evaluate the impact of non-Gaussian data distributions, nonparametric and Gaussian statistical methods are applied to all comparisons in this analysis.
In section 2, the observations are described, and special attention is paid to the vertical resolution because it has a large influence on the TD computations. The dynamically interpolated SSHA fields from NLOM are described in section 3a, and sections 3b and 3c describe MODAS and the details of the synthetic variants, respectively. The TD metric and superdata and statistical methods are described in section 4. The results are presented in section 5, with a summary of conclusions in section 6.
2. Profile observations
Observed profiles of temperature and profile pairs of temperature and salinity from the World Ocean Database 2005 (WOD05; Boyer et al. 2006), the U.S. Navy’s Master Oceanographic Observation Dataset (MOODS; Teague et al. 1990), and Argo (Roemmich et al. 2004) are combined. Because the data sources for MOODS and WOD05 are mostly the same, much of these data are duplicated, but each set contains unique profiles. The WOD05 has less stringent quality-control (QC) procedures than MOODS. Argo data are contained in the MOODS and WOD05 datasets but not up to the present time. In addition, the Argo profiles in MOODS are from the real-time system and thus lack some of the QC procedures specifically designed for floats performed at the Argo data acquisition centers (DAC). Also, further QC recommendations are periodically available, such as the pressure sensor error that was discovered for certain float types (available online at http://www.argo.ucsd.edu). For these reasons, the latest Argo data from the DACs have been used in place of the same profiles present in WOD05 and MOODS.
The data for this study are temperature profiles from 2001 through 2003 that have the shallowest depth level above 12 m, the deepest depth level below 150 m, and at least temperature values. Most profiles are expendable bathythermographs (XBTs), but many are from conductivity–temperature–depth (CTD) recorders. The total number of profiles for this time period is 225 160, with 73 549 having profile pairs of temperature T and salinity S (Fig. 1a) and 151 611 having T only (Fig. 1b). An additional restriction is that, in the upper 150 m, no profile may have any gaps in depth levels exceeding 25 m. This restriction reduces the numbers to 222 772 total, 72 586 for T and S, and 150 186 for T only.
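As an illustration only, the depth-level selection rules above could be sketched as follows. The function name and arguments are hypothetical, depths are assumed to be in meters and positive downward, and a gap is assumed to count as "in the upper 150 m" when its upper level is shallower than 150 m. The minimum number of temperature values is not given in the text here, so it is not enforced.

```python
import numpy as np

def passes_depth_criteria(depths, top_max=12.0, bottom_min=150.0, max_gap=25.0):
    """Return True if a profile's depth levels (m, positive down) meet the rules."""
    z = np.sort(np.asarray(depths, dtype=float))
    if z.size < 2:
        return False
    # Shallowest level must be above 12 m and deepest level below 150 m.
    if z[0] >= top_max or z[-1] <= bottom_min:
        return False
    # Assumption: only gaps whose upper level lies within the top 150 m count.
    starts_in_upper = z[:-1] < bottom_min
    gaps = np.diff(z)[starts_in_upper]
    return bool(np.all(gaps <= max_gap))
```

For example, a profile sampled every 20 m from 5 to 165 m would pass, whereas one that jumps from 20 to 60 m would be rejected.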
Although the data are quality controlled, errors including profile location, XBT drop rate, and other potential errors may still exist. The effect of location errors and how the QC procedures can miss this type of error are discussed by Kara et al. (2009).
3. Prediction system
a. NLOM
NLOM is a global numerical primitive equation layered formulation model with six dynamic layers and a bulk mixed layer. The operational eddy-resolving
The SSHA assimilation scheme was developed for the efficiency requirements of operational systems and uses an incremental updating technique (Smedstad et al. 2003). First, an optimum interpolation deviation analysis of SSHA is performed using a model forecast as a first guess and the mesoscale covariance functions of Jacobs et al. (2001). The deviation analysis is used to apply weights to all observations according to their space and time locations, in a 3-day window, during which the observations are incrementally updated in the model. The satellite altimeter products used for assimilation are from 1) Jason-1, 2) European Remote Sensing Satellite-2 (ERS-2)/Envisat, and 3) the Geosat Follow-On (GFO). SST is assimilated through a heat flux relaxation scheme that depends on a 3-h e-folding time scale and the mixed layer depth. The surface information is transferred through all dynamic layers via the statistical inference technique of Hurlburt et al. (1990). This technique updates the pressure field in all layers below the surface.
An important aspect of NLOM is its estimate of the bottom pressure anomaly, which is used to split the SSH signal into steric and nonsteric components. The SSHA used in this study is NLOM’s total SSH minus the SSH proportional to the bottom pressure anomaly minus NLOM’s long-term mean steric SSH (Barron et al. 2007). As a result, this SSHA is the steric SSH anomaly that is required for MODAS (see section 3b).
A modified version of the
A set of simulations using NLOM were run for years 2001–03, differing only in the number of altimeter data streams assimilated, from 0 to 3. Each altimeter data stream provides, in one day, approximately 35 000 observations globally. An additional three-altimeter simulation was started from a different initial condition (same day, different climatological year). Both three-altimeter cases produce similar results, except in areas where the variability is nondeterministic.
b. MODAS
MODAS estimates profiles of temperature and salinity from inputs of SST and/or SSHA using gridded polynomials at standard depths with coefficients determined by least squares fit to historical observations (Fox et al. 2002a,b). The SSHA input for the greatest MODAS accuracy is the offset from the steric height anomaly relative to 1000 m in the background climatology. If SST and/or SSHA are not available, the profile estimates revert to the MODAS bimonthly climatology. Both the MODAS background climatology and regression coefficients are defined in a database centered on 15 January, 15 March, 15 May, 15 July, 15 September, and 15 November. Synthetics for a given day are based on coefficients linearly interpolated in time and space to the desired location. Climatological profiles are linearly interpolated in space and time as well.
For this analysis, we provide SSHA and SST to MODAS to produce synthetic profiles of temperature and salinity from which we compute the thermocline depth parameters described in section 4a. To test the impact of assimilating altimeter data, we pair SST from the observed ocean profiles with SSHA from a series of NLOM experiments. In addition, climatology profiles are defined by a zero SSHA and climatological SST.
c. Synthetics
A set of synthetic profiles is derived using MODAS at each of the observed profile locations and times described in section 2. All of the synthetics use the shallowest temperature value from the observed profile as the SST input for MODAS; the climatology profiles use climatological SST and zero SSHA. Eight different SSHA estimates are used to create different sets of synthetics. The descriptions and labels for each are listed in Table 1. For this analysis, we focus on the relative impact that the number of satellite altimeters has on the combined NLOM and MODAS predictions of SSHA and thermocline depth.
4. Methods
a. Metrics
Two TD parameters computed from temperature profile observations and synthetics made during 2001–03 are listed in Table 2. The parameters are based on two isothermal layer depth (ILD) algorithms computed using freely available software described in Lorbacher et al. (2006) and Kara et al. (2000). These methodologies can also be applied to density profiles to compute the mixed layer depth (MLD), but this is not considered here because not all profiles have both temperature and salinity and we are primarily considering thermocline depth.
The ILD algorithms have been well tested, and their characteristics are known. For example, the Lorbacher et al. (2006) ILD selects the first curvature peak of the temperature profile and tends to be relatively shallow. Because the Lorbacher et al. (2006) ILD method is curvature based, the label we use is ILD∇, where the subscript ∇ represents the gradient associated with curvature. The Kara et al. (2000) ILD method selects the depth where a change in temperature relative to the near surface exceeds a threshold. Because the Kara et al. (2000) ILD is based on a change in temperature, the label we use is ILDΔ, where the subscript Δ indicates that it is a threshold methodology. Hereafter, the unadorned ILD will refer to the isothermal layer depth in general. The ILDΔ values tend to be deeper than ILD∇ and are most closely associated with the seasonal ILD (e.g., Helber et al. 2008).
For a small percentage of profiles, each ILD method has certain shortcomings. For example, for profiles where the water column is mixed all the way to the bottom of the profile, ILD∇ returns a value of zero. This is because the algorithm cannot find a curvature peak (Kara et al. 2009). For similar cases, particularly at high latitudes, the threshold value for ILDΔ is too large and returns an ILD that is unrealistically deep or at the bottom of the profile. To eliminate this problem, profiles are discarded if the ILD estimates are at the bottom of the profile and more than 50 m from the ocean bottom (based on the navy’s 2-min bathymetry) because in these cases the observations do not allow for an accurate estimate of the ILD.
The TD parameter is computed by first determining two temperatures: 1) the temperature of the profile at the ILD T0 and 2) the change in temperature between T0 and the temperature 100 m below the ILD, ΔT. The depth below the ILD where the temperature is equal to T0 − ΔT/2 defines TD. Because we have two estimates of the ILD, there are two corresponding estimates of TD, TDΔ and TD∇, which correspond to ILDΔ and ILD∇, respectively.
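The TD definition above can be illustrated with a minimal sketch. This is not the code used in the study: the function name, the linear interpolation between depth levels, and the handling of profiles that are too short are all assumptions made for illustration. The ILD is taken as an input, so either ILDΔ or ILD∇ may be supplied.

```python
import numpy as np

def thermocline_depth(depth, temp, ild, dz=100.0):
    """Depth below the ILD where T = T0 - dT/2 (depth in m, increasing downward)."""
    depth = np.asarray(depth, dtype=float)
    temp = np.asarray(temp, dtype=float)
    # Assumption: the profile must reach at least dz below the ILD.
    if ild < depth[0] or ild + dz > depth[-1]:
        return np.nan
    t0 = float(np.interp(ild, depth, temp))            # temperature at the ILD
    t_below = float(np.interp(ild + dz, depth, temp))  # temperature 100 m below
    delta_t = t0 - t_below
    if delta_t <= 0.0:
        return np.nan  # no cooling below the ILD, so no thermocline to locate
    target = t0 - delta_t / 2.0
    # Search the levels between the ILD and ILD + dz for the first crossing.
    inner = depth[(depth > ild) & (depth < ild + dz)]
    z = np.concatenate(([ild], inner, [ild + dz]))
    t = np.interp(z, depth, temp)
    for k in range(z.size - 1):
        if t[k] >= target >= t[k + 1] and t[k] != t[k + 1]:
            frac = (t[k] - target) / (t[k] - t[k + 1])
            return float(z[k] + frac * (z[k + 1] - z[k]))
    return np.nan
```

For a profile that is isothermal to 50 m and cools uniformly below that depth, this sketch places TD 50 m below the ILD, that is, halfway through the 100 m layer used to define ΔT.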
The TDΔ is shown graphically with labels in Figs. 2a,b. For profiles with sharp thermoclines just below the ILDΔ, the TDΔ is relatively close to the ILDΔ (Fig. 2a). For weak thermoclines, the TDΔ is further from the ILDΔ (Fig. 2b). In this way, the TDΔ characterizes the location of the highest gradients of the upper thermocline from a simple, widely applicable calculation. Because the computation is a bulk estimate of the high temperature gradient depth, it is insensitive to random noise in the profile. Finite difference computation of vertical gradients in observation profiles is inherently noisy and thus avoided in this analysis. The strength of the thermocline (TS) is the gradient of the upper thermocline between the ILD and TD (Table 2; Fig. 2).
b. Superposition
As mentioned in the introduction, observation time and locations are grouped into analysis windows to produce superposition data (“superdata”). For this analysis, the windows are 1° radius circular regions 60 days long computed each month for the entire analysis time period from January 2001 through December 2003. The 1° radius circles are centered on even-numbered latitude and longitude locations. For consistent comparison, the predictions are sampled or interpolated to the time and three-dimensional locations of the observations and are binned in the same manner.
The superdata methods reduce the weight of individual data points in regions of the ocean with relatively high observation density. This is because the subsequent statistical analysis treats every superdata point the same. As a result, regions with a relatively large number of points have the same weight as regions with a small number of points. Both superdata and point-for-point results are discussed in section 5.
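A minimal sketch of the binning and fitting step for a single window follows. It assumes a fit of the form y = a0 + a1(t − tc), with tc the center of the 60-day window, so that a0 is the fitted value at mid-window and a1 is the time rate of change. The function name, the flat-Earth distance in degrees, and the requirement of at least two distinct observation times are illustrative assumptions, not details taken from the analysis.

```python
import numpy as np

def superdatum(lat, lon, day, value, lat0, lon0, day_start,
               radius_deg=1.0, window_days=60.0):
    """Least squares linear fit (a0, a1) of the data in one space-time bin."""
    lat, lon, day, value = (np.asarray(a, dtype=float)
                            for a in (lat, lon, day, value))
    # Assumption: simple planar distance in degrees, with longitude scaled
    # by cos(latitude) and wrapped to [-180, 180).
    dlon = (lon - lon0 + 180.0) % 360.0 - 180.0
    dist = np.hypot(lat - lat0, dlon * np.cos(np.radians(lat0)))
    in_bin = ((dist <= radius_deg) &
              (day >= day_start) & (day < day_start + window_days))
    t = day[in_bin] - (day_start + window_days / 2.0)
    if t.size < 2 or np.ptp(t) == 0.0:
        return None  # not enough information to fit a line
    a1, a0 = np.polyfit(t, value[in_bin], 1)  # slope first, then intercept
    return float(a0), float(a1)
```

Applying the same function to the observations and to the predictions sampled at the observation locations would give the superobservation and superprediction values for that bin.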
c. Statistical error analysis methods
Standard data quality-control procedures are not perfect and allow some erroneous data through. In an attempt to reduce the impact of this noise, we employ robust nonparametric statistical methods, because these methods are less sensitive to the influence of outlier data values (e.g., Rousseeuw and Leroy 1987). Because the nonparametric methods have analogous counterparts in traditional Gaussian statistics, we list those first and explain the robust methods second.
The biweight methods are robust in that they are nonparametric, meaning that the data distribution need not resemble an assumed form. The biweight methodology applies less weight to data points that are statistically considered outliers of the natural distribution of the data. The details of the biweight computations are described in the appendix. To evaluate the impact of the robust error statistics, the results from both Gaussian and nonparametric statistical methods are presented in section 5.
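For readers unfamiliar with biweight estimators, the following sketch shows a commonly used one-step form of the biweight mean and standard deviation, built on the median and the median absolute deviation (MAD). It is an assumption that the appendix uses this form. The censoring constant c = 7.5 is a conventional choice and is likewise assumed, and the function name is hypothetical.

```python
import numpy as np

def biweight(x, c=7.5):
    """One-step biweight mean and standard deviation of a 1D sample."""
    x = np.asarray(x, dtype=float)
    m = np.median(x)
    mad = np.median(np.abs(x - m))
    if mad == 0.0:
        return float(m), 0.0  # degenerate case: more than half the values equal
    u = (x - m) / (c * mad)
    inside = np.abs(u) < 1.0  # points with |u| >= 1 receive zero weight
    w = np.where(inside, (1.0 - u**2) ** 2, 0.0)
    mean = m + np.sum((x - m) * w) / np.sum(w)
    num = np.sqrt(x.size * np.sum(((x - m) ** 2 * (1.0 - u**2) ** 4)[inside]))
    den = np.abs(np.sum(((1.0 - u**2) * (1.0 - 5.0 * u**2))[inside]))
    return float(mean), float(num / den)
```

With a sample such as 9, 10, 10, 11, 10, 9, 11, 10, 1000, the arithmetic mean is pulled to 120 by the single outlier, whereas the biweight mean remains at 10 because the outlier receives zero weight.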
The scaling is important because it facilitates comparison of different datasets with varying variance levels. This is the case in this experiment because TDΔ and TD∇ have different variance levels by virtue of differences in the ILD methodology. The summary diagrams allow consistent comparison of these side by side.
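The scaling can be sketched as below, using the Gaussian forms of the statistics for simplicity. The sketch assumes that the bias is defined as prediction minus observation, so that a shallow predicted TD gives a negative value, and that the population standard deviation of the observations is the scaling factor. The function name is hypothetical.

```python
import numpy as np

def scaled_errors(pred, obs):
    """Mean bias and RMSE, each divided by the observation standard deviation."""
    pred = np.asarray(pred, dtype=float)
    obs = np.asarray(obs, dtype=float)
    diff = pred - obs            # assumption: negative when prediction is shallow
    sigma = np.std(obs)          # assumption: population form (ddof=0)
    mb_scaled = np.mean(diff) / sigma
    rmse_scaled = np.sqrt(np.mean(diff**2)) / sigma
    return float(mb_scaled), float(rmse_scaled)
```

Because both quantities are dimensionless after scaling, values for TDΔ and TD∇ can be placed on the same axes.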
Statistical significance is computed with the use of bootstrap standard error estimates (Efron and Tibshirani 1986). The error bars presented in the figures of section 5 represent a 100 independent draw bootstrap standard error estimate.
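A bootstrap standard error of this kind can be sketched as follows. The function name, the fixed random seed, and the use of simple resampling with replacement over individual data points are assumptions for illustration. The statistic passed in could be a mean, an RMSE, or a robust estimator.

```python
import numpy as np

def bootstrap_se(data, statistic, n_draws=100, seed=0):
    """Standard error of `statistic` from n_draws resamples with replacement."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = data.shape[0]
    estimates = np.array([
        statistic(data[rng.integers(0, n, size=n)])  # one bootstrap resample
        for _ in range(n_draws)
    ])
    return float(estimates.std(ddof=1))
```

For a paired comparison, the rows of `data` could hold prediction and observation together, so that each resample keeps the pairs intact.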
5. Results
a. Global consistency
Inspection of the model results relative to observations reveals a remarkable similarity that bolsters confidence in the U.S. Navy’s ocean prediction system. For example, Figs. 3a,b show the mean SSHA values (superdata fit parameter a0) for the global observations and the A3EGT SSHA (the SSHA used in the A3EGT synthetics described in Table 1), respectively. Each dot represents regions that have enough data to span 30 days of the 60-day bins. The parameter a0 represents the mean of 1° radius circles centered on even latitudes and longitudes for a 60-day period starting on 1 January for the years 2001–03. In this format, comparing the two fields, the spatial variability appears similar and the magnitude is close. Similarly, TDΔ from observations and the A3EGT synthetics are shown in Figs. 4a,b. Comparisons with ILD climatologies (available online at http://www7320.nrlssc.navy.mil/nmld/nmld.html; Kara et al. 2003) are consistent, though the TDΔ is expected to be deeper.
b. SSHA accuracy
To gain an understanding of the relative accuracy of the NLOM SSHA estimates before using them to construct synthetics with MODAS, we compare them to the BEST SSHA estimate (the SSHA used for the BEST synthetics described in Table 1). This estimate is the closest we have to direct in situ observations of SSHA. As mentioned in section 4c, consistent comparisons can be made with summary diagrams and, consequently, all SSHA errors (relative to the BEST SSHA) using both nonparametric and Gaussian statistical methods are presented in Fig. 5. Using the superdata comparison framework (Fig. 5a), we find that the SSHA estimates using two or three altimeters have the smallest bias, whereas the MODAS SSHA estimate has the smallest RMSE. In Fig. 5 and the rest of the summary diagrams, the greater accuracy is indicated by values closer to the top-left corner of the figures. Farther toward the left represents smaller RMSE, and farther toward the top represents smaller bias. Bias is decreasing upward, because in the cases considered here it is always negative. MODAS tends to have a shallow bias in MLD and TD because of the unrealistically smooth vertical gradients of the synthetics.
The Gaussian statistical methods produce similar results to those for the nonparametric methods, with the exception of the A0 case (Fig. 5a), which has larger RMSE. For the traditional Gaussian statistical results, the bootstrap standard error bars are unstable and sensitive to the number of independent draws. For this reason, some of these error bars extend outside the axis limits in Fig. 5. The point-for-point comparison framework has a reduction in bias error with an increase in RMSE error only for Gaussian statistics (Fig. 5b).
Because standard error bars for the one- and two-altimeter cases do not overlap, two altimeters are significantly more accurate than just one. One altimeter is also clearly better than the nonassimilative NLOM case. It is interesting to note that the A1G has smaller errors than the A1J, suggesting the higher horizontal resolution of the GFO orbit may reduce the overall bias. This is consistent with the expected error estimates of Jacobs et al. (2002).
c. TD accuracy
The primary metric for this analysis is the TD, which is computed from the MODAS synthetics. An advantage of this metric is that we are able to compute TD from both the observation and the synthetic profiles, whereas for SSHA there is no direct in situ observation. In this section, summary diagrams for all synthetics for both TD metrics are presented in Figs. 6a–d. In these figures, the lowest error is again closest to the top-left corner of the plot, because there is a shallow TD bias (MB′ < 0) for all synthetics and RMSE′ > 0. The error bars are from a 100 independent draw bootstrap standard error estimate. Figure 6a uses both the superdata and nonparametric methods and therefore should be the most reliable global error estimate. Figure 6b also uses the superdata methods, but with Gaussian statistics. Figures 6c,d are point-for-point comparisons with nonparametric and Gaussian statistics, respectively.
The statistical methods have the largest impact on the results (cf. Fig. 6a with 6b and Fig. 6c with 6d). Note that the axis limits are larger in Figs. 6b,d. Outliers increase the errors when using Gaussian statistical techniques, whereas the nonparametric statistics reduce the weight of points that lie far from the median, as measured by the median absolute deviation (see appendix). The impact of the superdata versus point-for-point comparisons, by contrast, is relatively small (cf. Fig. 6a with 6c; cf. Fig. 6b with 6d). There is, however, less separation between the TDΔ and TD∇ metrics for the point-for-point framework. There is also a slight increase in error levels for the point-for-point framework, suggesting that there are larger errors in regions of high data density.
Notice that the synthetics are more skillful for the threshold method TDΔ, and the gradient method TD∇ has the larger error for all cases. This occurs because MODAS synthetic profiles tend to have overly smooth vertical gradients that result in a shallow bias for ILDΔ and therefore TDΔ. The shallow bias is exacerbated by the curvature-based ILD∇ algorithm when operated on synthetics because of the reduced curvature relative to the observed profiles.
The synthetic with the smallest overall error for both metrics using nonparametric statistics is BEST (Figs. 6a,c), which shows the optimal performance of the MODAS approach by using as input the actual dynamic height deviation from the observed profiles (see Table 1). The smallest bias is for CLIM and TONLY, which have nearly identical errors. The next smallest error is with the three-altimeter synthetics A3EGT and A3EGT2. The largest error, in MB and RMSE, tends to occur with the zero-altimeter synthetic case A0. In terms of RMSE, all synthetics, except BEST and A0, tend to have nearly the same RMSE. The inclusion of altimeter assimilation tends to have the largest impact on the MB.
The results using Gaussian statistics are less clear because the standard error bars are larger and the estimates are clustered closer together (Figs. 6b,d). The general trend is maintained, where fewer altimeters have larger errors, but the separations between values are generally not significant.
The authors have not found a suitable nonparametric estimate for the correlation coefficient. The traditional estimate computed using Eq. (6) for both comparison frameworks is shown in Fig. 7. Although all correlations are high, the differences between the cases are not significant, with the exception of the difference between the A0 and the other altimeter assimilative cases. A major difference in correlation relative to the results shown in Fig. 6 is the low correlation values of CLIM and TONLY. This is consistent because climatologies do not vary on subseasonal time scales; therefore, we would expect lower correlations. Climatologies are designed to represent long space and time scale averages such as those computed for Fig. 6.
The overall results show that a significant error reduction is obtained by including at least one altimeter. The difference between one and three altimeters is significant only in bias. Given the limitations of the time and space scales inherent in the NLOM/MODAS assimilation, we are unable to discern a significant difference in the errors using two or three altimeters. The use of nonparametric statistics tends to be more effective in discriminating differences among the MODAS SSHA-derived predictions.
d. Error variability
To explore the seasonality of the error and fit parameters over the annual cycle, we choose two regions in the northwest Pacific Ocean. The high variability (HV) region bounded by 20°–50°N and 120°E–160°W has high SSHA variability because of the meandering of the Kuroshio Current. The low variability (LV) region bounded by 0°–20°N and 130°E–150°W has relatively low SSHA variability.
For the low variability region, which is represented by the dashed–dotted lines in Fig. 8, the effects of seasonality are relatively weak for a0, a1, and e, which change little throughout the year. In Figs. 8d,e, percent error and correlation coefficient for both HV and LV regions are considered for both the A3EGT synthetics (in black) and CLIM (in gray) cases. In Figs. 8d,e, the gray and black dashed–dotted lines are relatively close together compared to the solid lines. This suggests that the A3EGT and CLIM cases have similar errors in the LV region. In the HV regions, however, the CLIM case has larger magnitude in Fig. 8d and lower correlation on average in Fig. 8e. The average correlation coefficient in the HV region for a0 determined from the A3EGT synthetics is 0.93, higher than 0.83 for climatology. Average correlation is also relatively low in the LV region, 0.84 for both A3EGT and CLIM. In the LV region, the accuracy of the A3EGT synthetics is not significantly different from climatology. The value added by altimetry in the MODAS synthetics is evident only in the region of high SSH variability, where the signal stands out from the background noise.
The global distribution of observed a0 values can be seen in Fig. 9. For the 60-day period starting 1 February, the northwest Pacific Ocean Kuroshio Extension region (the HV region in the solid box of Fig. 9) has relatively deep TDΔ in the annual cycle. In contrast, the 60-day period starting in August has relatively shallow TDΔ in the Kuroshio Extension region (see also Fig. 8). The seasonal cycle is much weaker in the LV region (dashed–dotted box in Fig. 9).
The rate of change of TDΔ is given by the fit parameter a1, as shown in Fig. 10 globally for the 60-day periods starting in March and November. In the Kuroshio Extension (HV) region, a1 indicates that TDΔ is shoaling most rapidly in March and April and deepening in November and December (Fig. 8). This pattern for the annual cycle of TDΔ rate of change applies generally to most regions in the midlatitude Northern Hemisphere.
The annual cycle of the percent error in A3EGT synthetics is linked to the TDΔ rate of change. Wherever the magnitude of a1 is large, the percent error magnitude is large. For the Kuroshio Extension region, the 60-day periods starting in April and December have relatively large percent error (Fig. 8); the global distributions of errors for these months are shown in Fig. 11. The largest errors tend to be where SSHA variability is large: for example, in the eddy shedding regions of the Kuroshio and the Gulf Stream. In regions of the ocean where the SSHA variability is weak, such as the LV region, the errors are relatively small.
6. Summary and conclusions
The impact of the number of satellite altimeters on upper-ocean predictions of thermocline depth (TD) is evaluated using global, data assimilating, layered model SSHA analyses. By varying the number of altimeters from zero to three, the prediction accuracy is determined relative to a global set of in situ profile observations. In addition, methods for evaluating prediction accuracy are presented using two comparison frameworks and two statistical methodologies. Comparisons are made point for point and for binned and fitted superdata regions (1° radius, 60 days). Both traditional Gaussian and nonparametric statistical methodologies are applied.
The general results show that accuracy provided by the satellite altimeter datasets is greater in regions of high SSHA variability, and significant error reduction is achieved with the addition of at least one satellite altimeter dataset. Under the limitations of the analysis methods, additional error reduction when assimilating data from three altimeters versus one is significant only in bias. These conclusions are drawn from both the point-for-point and superdata frameworks when using nonparametric statistical methods. The lack of significant skill improvement between two and three satellite altimeter datasets may be a consequence of the length and time scales associated with the NLOM assimilation and the smoothing associated with gridded climatological coefficients of MODAS. As a result, this system is unable to take full advantage of the added spatial detail provided by multiple altimeters, even though each additional data stream adds approximately 35 000 data points globally per day. The simulated error experiment of Smedstad et al. (2003) also indicates that the marginal reduction in SSHA error decreases for each additional satellite data stream.
As expected, the BEST synthetics have the greatest accuracy but are unavailable in practical applications because the observed profiles are used for their derivation. The BEST SSHA estimate is fully consistent with the observed profile within the context of MODAS synthetics. These upper accuracy limits of MB′ and RMSE′ (bias and rms error scaled by the standard deviation of the observations) are approximately −0.05 and 0.25, respectively, for nonparametric statistics and the superdata methodology. The next greatest MB′ accuracy is found in the results from CLIM and TONLY. This somewhat surprising result is due to climatologies being specifically designed to globally minimize MB′ and RMSE′. The correlation levels of CLIM and TONLY, however, are lower than the altimeter assimilative MODAS cases. The TONLY estimates are nearly the same as CLIM, suggesting that SST alone has little influence on TD.
In terms of RMSE′, all of the synthetics, with the exception of BEST and A0, have nearly the same value. The differences in RMSE′ of these synthetics are not statistically significant. The lowest accuracy in both MB′ and RMSE′ comes from the zero-altimeter synthetic A0. This is because, without SSHA assimilation, NLOM is not constrained at all by SSHA. These lower accuracy limits of MB′ and RMSE′ are approximately −0.2 and 0.45, respectively, for nonparametric statistics and the superdata methodology.
In this paper, the statistical methodologies are found to have a relatively large impact on the results. Evaluation based on traditional statistics that assume Gaussian data distributions results in inflated error and error uncertainty estimates. Using nonparametric statistics circumvents the need for strict and often unfounded assumptions about the data distribution. The two comparison frameworks used in this analysis have a lesser impact. The effect of the superdata methods is to even out the influence of data between low- and high-density observation regions. Because the global error results for the point-for-point framework are slightly larger than for the superdata framework, high-density observation regions tend to occur in areas with larger errors.
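The inflation of Gaussian estimates by outliers can be seen in a small example. The sketch below implements biweight estimates of location and scale in the form described by Lanzante (1996). The tuning constant c = 7.5, the function name, and the sample values are assumptions made for this illustration and are not necessarily the settings used in this analysis.

```python
import math
import statistics


def biweight_mean_std(values, c=7.5):
    """Biweight location and scale estimates (after Lanzante 1996).

    c is the censoring constant (assumed 7.5 here); points farther than
    c median absolute deviations from the median receive zero weight.
    """
    n = len(values)
    m = statistics.median(values)
    mad = statistics.median([abs(x - m) for x in values])
    if mad == 0:
        return m, 0.0  # more than half of the values are identical
    u = [(x - m) / (c * mad) for x in values]
    u = [ui if abs(ui) < 1.0 else 1.0 for ui in u]  # u = 1 gives zero weight
    w = [(1.0 - ui * ui) ** 2 for ui in u]
    mean_bw = m + sum((x - m) * wi for x, wi in zip(values, w)) / sum(w)
    num = n * sum((x - m) ** 2 * (1.0 - ui * ui) ** 4 for x, ui in zip(values, u))
    den = abs(sum((1.0 - ui * ui) * (1.0 - 5.0 * ui * ui) for ui in u))
    if den == 0:
        raise ValueError("biweight scale is undefined for this sample")
    return mean_bw, math.sqrt(num) / den


# Hypothetical sample with a single gross outlier.
sample = [10.0, 11.0, 9.0, 10.0, 12.0, 8.0, 10.0, 100.0]
center, spread = biweight_mean_std(sample)
print(center, spread, statistics.fmean(sample), statistics.stdev(sample))
```

In this sample the single outlier dominates the conventional mean and standard deviation, whereas the biweight estimates remain close to the bulk of the data.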
The errors associated with SSHA (Fig. 5) are generally larger than the errors found for TD (Fig. 6). This is due to representation errors associated with the SSHA comparison, because there are no direct in situ estimates of SSHA. Instead, we use the difference in observed dynamic height from climatology. Climatology, however, is not identical to the mean sea surface as measured from satellites or associated with NLOM. These misrepresentations add to the errors found in this analysis. Because MODAS is designed to provide realistic three-dimensional temperature and salinity structure, the TD estimates have greater skill relative to direct in situ profile observations.
The metric TD∇ has poorer accuracy than TDΔ because ILD∇ introduces a shallow bias in the ILD estimates. The ILD∇ methodology is based on the first curvature peak found in the profile and therefore returns the depth of the purely isothermal layer near the surface. This exacerbates the already shallow bias of MODAS, further degrading the synthetic accuracy. Synthetics and other predictions tend to smooth the sharp gradients found in observations, further reducing the accuracy of both TD∇ and TDΔ. The more unbiased threshold ILD-based estimate TDΔ does not reinforce the shallow bias of MODAS and leads to more accurate TDΔ estimates.
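To make the TD definition concrete, recall that the temperatures at the ILD, TD, and ILD + 100 m are T0, T0 − ΔT/2, and T0 − ΔT, respectively. The sketch below locates TD in a discrete profile for a given ILD. Linear interpolation between samples, the function names, and the profile values are assumptions made for this illustration, and the ILD itself is taken as an input rather than computed.

```python
def interp_temperature(depths, temps, z):
    """Linearly interpolate temperature at depth z (depths strictly increasing)."""
    if not depths[0] <= z <= depths[-1]:
        raise ValueError("depth outside the sampled profile")
    for i in range(1, len(depths)):
        if z <= depths[i]:
            frac = (z - depths[i - 1]) / (depths[i] - depths[i - 1])
            return temps[i - 1] + frac * (temps[i] - temps[i - 1])


def thermocline_depth(depths, temps, ild):
    """Depth below the ILD where temperature equals T0 - dT/2.

    T0 is the temperature at the ILD and dT = T0 - T(ILD + 100 m).
    Returns None if the target temperature is never reached.
    """
    t0 = interp_temperature(depths, temps, ild)
    delta_t = t0 - interp_temperature(depths, temps, ild + 100.0)
    target = t0 - 0.5 * delta_t
    prev_z, prev_t = ild, t0
    for z, t in zip(depths, temps):
        if z <= ild:
            continue
        if prev_t >= target >= t and prev_t != t:
            # Interpolate within the segment that brackets the target.
            return prev_z + (prev_t - target) / (prev_t - t) * (z - prev_z)
        prev_z, prev_t = z, t
    return None


# Hypothetical profile: depth in m, temperature in degrees Celsius.
profile_depths = [0.0, 50.0, 100.0, 150.0, 200.0]
profile_temps = [20.0, 20.0, 16.0, 12.0, 10.0]
td = thermocline_depth(profile_depths, profile_temps, ild=50.0)
print(td)
```

Because TD is measured downward from the ILD, any shallow bias in the ILD estimate shifts the entire T0, ΔT, and TD construction upward.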
Using summary diagrams, the errors scaled by the observation standard deviation are plotted side by side for consistent comparison. These diagrams are repeated for each statistical methodology and each comparison framework. In each rendition, the results are similar, with the exception that Gaussian statistics inflate the error and error uncertainty levels. The nonparametric statistical methods indicate lower error levels and larger separation between, for example, A1J and A3EGT synthetics, where the standard error bars do not overlap (Fig. 6). The traditional Gaussian statistical methods are unable to differentiate these cases because this separation is smaller and the standard error bars overlap.
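The standard error bars referred to above can be estimated without distributional assumptions by resampling. The following minimal sketch computes a bootstrap standard error of a correlation coefficient using 100 draws, the number quoted for the correlation figure. The use of the Pearson correlation, the fixed random seed, the function names, and the input values are assumptions made for this example.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def bootstrap_standard_error(x, y, n_draws=100, seed=0):
    """Standard deviation of the correlation over resampled (x, y) pairs."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(x)
    estimates = []
    while len(estimates) < n_draws:
        idx = [rng.randrange(n) for _ in range(n)]  # sample pairs with replacement
        try:
            estimates.append(pearson_r([x[i] for i in idx], [y[i] for i in idx]))
        except ValueError:
            continue  # degenerate resample, draw again
    mean = sum(estimates) / n_draws
    return math.sqrt(sum((e - mean) ** 2 for e in estimates) / (n_draws - 1))


# Hypothetical paired values (for example, observed and synthetic TD in metres).
obs_td = [float(v) for v in range(20)]
syn_td = [2.0 * v + (-1) ** int(v) for v in obs_td]
r = pearson_r(obs_td, syn_td)
se = bootstrap_standard_error(obs_td, syn_td)
print(r, se)
```

Two cases would then be considered separated when their estimates differ by more than the combined extent of the corresponding error bars.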
The result that the A1G SSHA estimates tend to have better accuracy than the A1J SSHA estimates is consistent with expected error estimates of Jacobs et al. (2002). The distance between ground tracks in the GFO orbit is about 125 km at 35°N with a 17.05-day repeat cycle, whereas the distance between ground tracks in the Jason-1 orbit is about 260 km at 35°N, with a 9.95-day repeat cycle. The characteristics of the mesoscale tend to have small space and long time scales relative to the altimeter orbit spacing and repeat cycle, respectively (Jacobs et al. 2001). This suggests that the shorter orbit spacing should be an advantage for the A1G SSHA estimate, a hypothesis corroborated in the results of this paper.
Comparing predictions to in situ observations is the ultimate test in that the observations are the closest information we have to the true ocean condition. This type of comparison, however, is inherently plagued with representation errors, because the observations generally have more physical processes influencing the measurements on shorter space and time scales than the predictions are able to resolve.
Acknowledgments
This publication is a contribution to the Assessing UUV Sampling Strategies with Observation System Simulation Experiments project supported by the Office of Naval Research under Program Element 602345N. We thank the National Oceanographic Data Center and T. Boyer for assistance with the World Ocean Database 2005 data, K. Grembowicz of the Naval Oceanographic Office for assistance with the U.S. Navy’s Master Oceanographic Observational Data Set, and J. Dastugue for skillful rendering of figures. Finally, we appreciate the comments of two anonymous reviewers and discussion with J. Cummings and G. Jacobs of the Naval Research Laboratory that have led to important improvements to this manuscript.
REFERENCES
Barron, C. N., and Kara A. B., 2006: Satellite-based daily SSTs over the global ocean. Geophys. Res. Lett., 33, L15603. doi:10.1029/2006GL026356.
Barron, C. N., Kara A. B., Hurlburt H. E., Rowley C., and Smedstad L. F., 2004: Sea surface height prediction from the global Navy Coastal Ocean Model during 1998–2001. J. Atmos. Oceanic Technol., 21, 1876–1893.
Barron, C. N., Smedstad L. F., Dastugue J. M., and Smedstad O. M., 2007: Evaluation of ocean models using observed and simulated drifter trajectories: Impact of sea surface height on synthetic profiles for data assimilation. J. Geophys. Res., 112, C07019. doi:10.1029/2006JC003982.
Barron, C. N., Kara A. B., and Jacobs G. A., 2009: Objective estimates of westward Rossby wave and eddy propagation from sea surface height analyses. J. Geophys. Res., 114, C03013. doi:10.1029/2008JC005044.
Boyer, T. P., and Coauthors, 2006: World Ocean Database 2005. NOAA Atlas NESDIS 60, 190 pp.
Doney, S. C., Yeager S., Danabasoglu G., Large W. G., and McWilliams J. C., 2007: Mechanisms governing interannual variability of upper-ocean temperature in a global ocean hindcast simulation. J. Phys. Oceanogr., 37, 1918–1938.
Efron, B., and Tibshirani R., 1986: Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy. Stat. Sci., 1, 54–77.
Fan, S., Oey L-Y., and Hamilton P., 2004: Assimilation of drifter and satellite data in a model of the northeastern Gulf of Mexico. Cont. Shelf Res., 24, 1001–1013.
Fox, D. N., Barron C. N., Carnes M. R., Booda M., Peggion G., and Van Gurley J., 2002a: The Modular Ocean Data Assimilation System. Oceanography, 15, 22–28.
Fox, D. N., Teague W. J., Barron C. N., Carnes M. R., and Lee C. M., 2002b: The Modular Ocean Data Assimilation System (MODAS). J. Atmos. Oceanic Technol., 19, 240–252.
Helber, R. W., Barron C. N., Carnes M. R., and Zingarelli R. A., 2008: Evaluating the sonic layer depth relative to the mixed layer depth. J. Geophys. Res., 113, C07033. doi:10.1029/2007JC004595.
Hoaglin, D. C., Mosteller F., and Tukey J. W., 1983: Understanding Robust and Exploratory Data Analysis. John Wiley and Sons, 447 pp.
Hurlburt, H. E., 1986: Dynamic transfer of simulated altimeter data into subsurface information by a numerical model. J. Geophys. Res., 91, 2372–2400.
Hurlburt, H. E., Fox D. N., and Metzger E. J., 1990: Statistical inference of weakly correlated subthermocline fields from satellite altimeter data. J. Geophys. Res., 95, 11375–11409.
Ishikawa, Y., Awaji T., Akitomo K., and Qiu B., 1996: Successive correction of the mean sea surface height by the simultaneous assimilation of drifting buoy and altimetric data. J. Phys. Oceanogr., 26, 2381–2397.
Jacobs, G. A., Barron C. N., and Rhodes R. C., 2001: Mesoscale characteristics. J. Geophys. Res., 106, 19581–19595.
Jacobs, G. A., Barron C. N., Fox D. N., Whitmer K. R., Klingenberger S., May D., and Blaha J. P., 2002: Operational altimeter sea level products. Oceanography, 15, 13–21.
Jolliff, J. K., Kindle J. C., Shulman I., Penta B., Friedrichs M. A. M., Helber R., and Arnone R. A., 2009: Summary diagrams for coupled hydrodynamic-ecosystem model skill assessment. J. Mar. Syst., 76, 64–82. doi:10.1016/j.jmarsys.2008.05.014.
Kara, A. B., and Hurlburt H. E., 2006: Daily inter-annual simulations of SST and MLD using atmospherically forced OGCMs: Model evaluation in comparison to buoy time series. J. Mar. Syst., 62, 95–119.
Kara, A. B., Rochford P. A., and Hurlburt H. E., 2000: An optimal definition for ocean mixed layer depth. J. Geophys. Res., 105, 16803–16821.
Kara, A. B., Rochford P. A., and Hurlburt H. E., 2003: Mixed layer depth variability over the global ocean. J. Geophys. Res., 108, 3079. doi:10.1029/2000JC000736.
Kara, A. B., Helber R. W., Boyer T. P., and Elsner J. B., 2009: Mixed layer depth in the Aegean, Marmara, Black and Azov Seas: Part I: General Features. J. Mar. Syst., 78 (Suppl.), 169–180. doi:10.1016/j.jmarsys.2009.01.022.
Lanzante, J. R., 1996: Resistant, robust and non-parametric techniques for the analysis of climate data: Theory and examples, including applications to historical radiosonde station data. Int. J. Climatol., 16, 1197–1226.
Lin, X-H., Oey L-Y., and Wang D-P., 2007: Altimetry and drifter data assimilation of loop current and eddies. J. Geophys. Res., 112, C05046. doi:10.1029/2006JC003779.
Lorbacher, K., Dommenget D., Niiler P. P., and Köhl A., 2006: Ocean mixed layer depth: A subsurface proxy of ocean-atmosphere variability. J. Geophys. Res., 111, C07010. doi:10.1029/2003JC002157.
National Oceanic and Atmospheric Administration, 1986: ETOPO5 digital relief of the surface of the earth. National Geophysical Data Center Data Announcement 86-MGG-07, 19 pp.
Oey, L-Y., Ezer T., Forristall G., Cooper C., DiMarco S., and Fan S., 2005: An exercise in forecasting loop current and eddy frontal positions in the Gulf of Mexico. Geophys. Res. Lett., 32, L12611. doi:10.1029/2005GL023253.
Roemmich, D., Riser S., Davis R., and Desaubies Y., 2004: Autonomous profiling floats: Workhorse for broad-scale ocean observations. Mar. Technol. Soc. J., 38, 21–29.
Rousseeuw, P. J., and Leroy A. M., 1987: Robust Regression and Outlier Detection. John Wiley and Sons, 329 pp.
Shriver, J. F., Hurlburt H. E., Smedstad O. M., Wallcraft A. J., and Rhodes R. C., 2007: 1/32° real-time global ocean prediction and value-added over 1/16° resolution. J. Mar. Syst., 65, 3–26.
Smedstad, O. M., Hurlburt H. E., Metzger E. J., Rhodes R. C., Shriver J. F., Wallcraft A. J., and Kara A. B., 2003: An operational eddy resolving 1/16° global ocean nowcast/forecast system. J. Mar. Syst., 40, 341–361.
Teague, W. J., Carron M. J., and Hogan P. J., 1990: A comparison between the Generalized Digital Environmental Model and Levitus climatologies. J. Geophys. Res., 95, 7167–7183.
Wallcraft, A. J., Kara A. B., Hurlburt H. E., and Rochford P. A., 2003: The NRL Layered Global Ocean Model (NLOM) with an embedded mixed layer submodel: Formulation and tuning. J. Atmos. Oceanic Technol., 20, 1601–1615.
Zou, X., and Zeng Z., 2006: A quality control procedure for GPS radio occultation data. J. Geophys. Res., 111, D02112. doi:10.1029/2005JD005846.
APPENDIX
Robust Nonparametric Statistics
The locations of the publicly available profile observations that have (a) temperature and (b) paired values of temperature and salinity.
Citation: Journal of Atmospheric and Oceanic Technology 27, 3; 10.1175/2009JTECHO683.1
The computation of TD for a typical temperature profile with a relatively (a) strong and (b) weak thermocline. The temperatures at the ILD, TD, and ILD + 100 m are T0, T0 − ΔT/2, and T0 − ΔT, respectively. The thermocline gradient is computed two ways, as labeled. The ILD is computed using the threshold method (ILDΔ) and has a value of (a) 49.5 and (b) 48.6 m and the corresponding TDΔ is (a) 56.5 and (b) 68.6 m.
The fit parameter a0 (superdata mean; in m) for the (a) BEST and (b) A3EGT SSHA estimates for the global ocean for a 60-day analysis window starting 1 Jan for the years 2001–03. Values exist only where data span at least 30 of the 60 days.
The fit parameter a0 (superdata mean; in m) for the (a) observed and (b) A3EGT synthetic TDΔ estimate for the global ocean for a 60-day analysis window starting 1 Jan for the years 2001–03.
The (a) superdata and (b) point-for-point summary diagram of biweight statistics
(a) The superdata and nonparametric statistics summary diagram of
The correlation coefficient of observed vs synthetic TDΔ for each of the cases in Table 1 as listed on the x axis. The correlation is computed on superdata (black) and point-for-point (gray) values, and the corresponding vertical lines represent the 100 draw bootstrap standard error estimate.
The median fit parameters (a) a0, (b) a1, and (c) e and the (d) percent error and (e) correlation of a0 from A3EGT synthetics (in black) and CLIM (in gray) by month, relative to the fit parameters determined directly from observations. The solid lines are for the region bounded by 20°–50°N and 120°E–160°W and the dashed–dotted lines are for the region bounded by 0°–20°N and 130°E–150°W in the northwest Pacific Ocean.
The fit parameter a0 (superdata mean; in m) for the observed TDΔ for the global ocean for a 60-day analysis window starting (a) 1 Feb and (b) 1 Aug for the years 2001–03. The rectangles represent the regions for the computations in Fig. 8.
The fit parameter a1 (superdata rate of change; in m day−1) for the observed TDΔ for the global ocean for a 60-day analysis window starting (a) 1 Mar and (b) 1 Nov for the years 2001–03.
The percent error for TDΔ A3EGT fit parameter a0 (superdata mean) for the global ocean for a 60-day analysis window starting (a) 1 Apr and (b) 1 Dec for the years 2001–03.
The SSHA estimates and other details used in the generation of MODAS subsurface synthetics. All cases but CLIM use SST from the observation profile. The labels are used in the figures and text to denote each case.
The upper thermocline metric parameters that are computed from observations and synthetics.