## Abstract

A benchmark of linear predictability of sea surface height (SSH) globally is presented, complementing more complicated studies of SSH predictability. Twenty years of the Estimating the Circulation and Climate of the Ocean (ECCOv4) state estimate (1992–2011) are used, fitting autoregressive moving average [ARMA()] models where the order of the coefficients is chosen by the Akaike information criteria (AIC). Up to 50% of the ocean SSH variability is dominated by the seasonal signal. The variance accounted for by the nonseasonal SSH is particularly distinct in the Southern and Pacific Oceans, containing >95% of the total SSH variance, and the expected prediction error growth takes a few months to reach a threshold of 1 cm. Isolated regions take 12 months or more to cross an accuracy threshold of 1 cm. Including the trend significantly increases the time taken to reach the threshold, particularly in the South Pacific. Annual averaging has expected prediction error growth of a few years to reach a threshold of 1 cm. Including the trend mainly increases the time taken to reach the threshold, but the time series is short and noisy.

## 1. Motivation

The variability and change of future sea surface height (SSH, denoted *η*) is the center of much of the concern about the ongoing global warming. Understanding and predicting these key values, globally and regionally, involves projection and space–time integration of the numerous factors that influence SSH. These factors include the wind field, atmospheric pressure, tides, ice melt, river runoff, heat and freshwater exchange, and the shifting ocean circulation itself (Parker 1992; Church et al. 2013). The diverse physics spans a large range of time scales for oceanic response (e.g., Wunsch 2015).

Compared to the atmosphere, most relevant oceanic time scales are very long, ranging from months to thousands of years. The presence of that long time scale (long memory) and the observed small perturbations in the oceanic state suggest that many of the major components determining future values of *η* can be predicted from a knowledge of the present and past states of the ocean. The expected prediction error (PE) growth of *η* is not well established. Attempting to estimate the PE via ensembles of climate model simulations reveals large ensemble spread [e.g., the Intergovernmental Panel on Climate Change (IPCC)] (Church et al. 2013; Stainforth et al. 2005; Palmer 2012).

The goal here is to assess quantitatively the extent to which *η* variability is predictable using linear methods, describing both the deterministic (seasonal) changes as well as the underlying continuum treated as a wide-sense stationary linear process.^{1} As discussed, for example, by Wunsch (2013), such an approach provides a baseline against which predictions made with considerably more complex methods (nonlinear, nonstationary, extended to spatial structure) can be compared. The general case involves much more complex computations and raises the purely practical issue of whether the linear, univariate, stationary approach is adequate for SSH, and for how long.

An extensive body of literature explores the variability of *η* with varying degrees of complication, ranging from elementary statistics to the application of hierarchies of general circulation models (GCMs). These methods have varying degrees of regional success (Gille 1994; Chowdhury et al. 2007; Melillo et al. 2014) or global application (Rahmstorf et al. 2012; Church et al. 2013). The purely statistical approach is less concerned with capturing the underlying physics, while the GCM approach treats the *η* field as the deterministic integrated sum of ocean and atmospheric physics. This present study uses the simplest statistical approach to present a benchmark for more complex studies.

Treating oceanic change as linear may be counterintuitive. However, the modern observational record shows no major shifts in the large-scale baroclinic structure of the ocean (e.g., Roemmich et al. 2012). Apart from small regions of sea ice or convection, well-understood theory also supports the inference of only perturbation changes over periods from decades to centuries (Hirschi et al. 2013).

Interpretation of statistics from short records is difficult (see, e.g., Wunsch 1999; Percival et al. 2001; Ocaña et al. 2016). The methods that underlie much of what is presented here rely on the assumptions that *η* changes from the superposition of deterministic seasonal components and from a wide-sense stationary stochastic process. Of most relevance to the latter are general red noise processes and the extreme of white noise, which is, by definition, linearly unpredictable. Detection of true nonstationarity is not possible with the short records at hand. Similarly, an infinite number of generalizations to nonlinear representations are possible, but unless the linear assumption can be excluded, it remains an important reference point.

Local and global predictability are in many ways distinct; for example, regional variability in *η* has been attributed to shifts in wind features, tropical modes, and features such as the North Atlantic Oscillation (Yin and Goddard 2013; Roberts et al. 2016). Here, the approach is that of a univariate “black box,” with the underlying mechanisms (e.g., determining the changing global mean of *η*) having been discussed in many published papers (Parker 1992; Piecuch and Ponte 2011; Forget and Ponte 2015; Ocaña et al. 2016). The oceans store large portions of the added heat from global warming, and land ice is retreating, along with other external forcings, but discussion of these specific physical contributions as functions of time and position is postponed.

The methods are detailed in section 2, and the results are presented in section 3, where the seasonal and nonseasonal contributions to the variance of *η* are presented. As defined in this paper, the seasonal component is perfectly predictable, and the nonseasonal portion involves stochastic forecasting. A set of four reestimates is presented: 1) using monthly or annual means of , 2) with apparent linear trends included as part of the background red noise, and 3) and 4) with the trends removed in both cases. Section 4 presents the discussion and conclusion.

## 2. Numerical and ARMA models

Predictability of *η* is studied using the ECCOv4 global bidecadal state estimate, as described by Wunsch and Heimbach (2013), Forget et al. (2015), and others (see also ECCO Consortium 2017a,b). The state estimate is global, with latitudinal 1° resolution with tropical mesh refinement. A least squares with Lagrange multipliers approach is used to obtain the state estimate. The result is an adjusted, yet free-running, version of the MIT general circulation model (MITgcm; Adcroft et al. 2004). In contrast to most “reanalysis” products, the ECCO oceanic state satisfies basic conservation laws for enthalpy, salt, volume, and momentum, while remaining largely within error estimates of a diverse set of global data (Wunsch and Heimbach 2007, 2013; Stammer et al. 2016). Regions without data are filled in a dynamically consistent way, avoiding the use of untested statistical hypotheses (e.g., Reynolds et al. 2013).

At each point of latitude and longitude () of the state estimate, the temporal mean (1992–2011) is removed, and is defined as , where denotes the seasonal and the nonseasonal . Throughout this study, each coordinate is used, and, for simplicity, the spatial indices are dropped hereafter. At this stage, any temporal trend is being included as part of .

The seasonal component here includes its first two harmonics, as illustrated in Fig. 1. Variances of the seasonal and nonseasonal components are additive:

To the extent that , useful prediction is purely deterministic. When the seasonal variability is not dominant, the predictability of the nonseasonal process has to be examined. Deterministic prediction of sinusoidal components is straightforward.

Linear predictability of wide-sense stationary stochastic processes, not distinguishable from Gaussian, is well understood, with a very large literature including standard textbooks (e.g., Box and Jenkins 1970). Here, the formalism is discussed only insofar as it develops the notation to be applied. Most linear methods are based on the autoregressive (AR) process of order *n* [AR(*n*)], the moving average (MA) process of order [MA(*m*)], or the mixed autoregressive moving average (ARMA) of order [ARMA()].

As the textbooks show (Box and Jenkins 1970), these representations are interchangeable, with a choice being mainly one of convenience, efficiency of representation, or a combination of those two. If is a zero-mean wide-sense stationary time series, two of the representations are

A combination gives the general ARMA() model:

where and are regression coefficients; is near-Gaussian white noise with zero mean and variance ; and *t* is any time, past, present, or future, measured in units producing an implied time step . Conversion of one form to another or to the mixed representation is discussed in textbooks (e.g., Box and Jenkins 1970). In the MA form, the white noise increments are determined for past values leading up to the present time *t*. Parameter is known (estimated), but no future values are known. In the AR form, past values are assumed to have been estimated, as is , but again, no future values are available. In the presence of noise, these representations can become unstable, being indistinguishable from apparent nonstationarity. Tests for stability/nonstationarity are based upon the zeros and poles of complex polynomials formed from the various coefficients (Box and Jenkins 1970).

The MA form gives the simplest representation of the growth of prediction error, from Eq. (5), although the final growth rate is the same for all consistent ARMA forms. (All results here are based on the conversion to the MA after determination of the more general ARMA.) Converting the ARMA to the MA form using the Wold representation, the *τ*-ahead PE is

This quadratic error growth depends upon the values of , whose sum can never exceed the time series variance , for which the best prediction would be the time mean.

The performance of the ARMA() is assessed in terms of the PE growth over time. This criterion is expressed as the time it takes the error to grow beyond a given threshold, and good model performance refers to a relatively small PE at a specific time.

In practice, regression coefficients are most often found using one of several versions of least squares in which autocovariances are estimated along the way. The main difficulty is determining the orders for the particular representation. Orders are increased incrementally until a stopping criterion is met (see, e.g., Akaike 1973; Hughes and Williams 2010; Aho et al. 2014). Adding regression parameters improves the fit to the data but risks overfitting. The Akaike information criteria (AIC) is used here—minimizing the expectation of the PE where *k* is the number of parameters:

where is the likelihood:

Parameter is the observed, and is the prediction, so are the prediction residuals. In the estimate, the AIC value is minimized, which determines the smallest appropriate order to represent the time series. As discussed by Priestley (1981) and Yang (2005), the AIC can overestimate the order; see the appendix for more detail.

In the following discussion of the SSH time series, two cases are considered: one where only a time mean has been removed (*η*), and one where a best-fitting linear trend has been subtracted as well (). Separate analyses of , using both monthly and annual mean time series, are considered. Further removing a seasonal cycle leads to time series and , respectively. A significant linear or quadratic trend can, itself, be used to make a prediction. By including trend structures in the stochastic process, the predictability of the time series will be enhanced.

## 3. Results

### a. Seasonal variance

Figure 2a shows the total variance of *η* from monthly means of 1992–2011. Western boundary currents and their extensions are associated with higher variance, particularly in the Northern Hemisphere. The tropics have a large SSH variance, particularly in the Pacific Ocean. The Indian Ocean is dominated by monsoonal effects, particularly in the Arabian Sea warm pool, and complicated interactions of jet dynamics and the Indonesian throughflow seen particularly around 15°S (Schott et al. 2001). Elevated standard deviations occur in the eastern Indian Ocean that are not seen in the Atlantic or Pacific Oceans. The Southern Ocean shows some excess variance, particularly in the Indian and Pacific sectors. Bathymetric features, such as the Pacific–Antarctic Ridge, are associated with higher variance (Ponte and Piecuch 2014).

Figure 2b illustrates the percentage of the total variance included in the component. A striking, but well-known, seasonal hemispheric difference appears across the equator (Pattullo et al. 1955), where much of the *η* variance in the Northern Hemisphere is dominated by the seasonal component. Exceptions include a large zonal band in the Pacific and areas at a similar latitude in the North Atlantic. The Irminger and Labrador Seas also have large areas where the seasonal signal is less dominant, as is also true of the Bering Sea. The South Pacific Ocean has large areas where the seasonal signal is weak, but a signal extends westward off the coast of Peru, likely associated with the upwelling there. A similar feature is seen in the East Australian Current. In the Indian Ocean, fluctuations in the Arabian Sea are almost entirely captured by the seasonal monsoonal component. South of the equator, the eastern half of the Indian Ocean in the Southern Hemisphere is largely dominated by , but less clearly so in the west. In the Southern Ocean, a clear, seasonal dominance appears in the Weddell Gyre region, as well as in the area of the standing meander off the Agulhas coast of South Africa.

### b. Seasonal prediction

The seasonal component is predictable, as defined over the bidecadal time interval covered by ECCOv4, and Fig. 2d illustrates the associated contribution to the prediction of the standard deviation of the seasonal component in cm over 1992–2011. As expected, where the seasonal variance is a large fraction of the total, good predictability is found (e.g., associated with the seasonal component in the Mascarene Basin area, as well as in the Southern Ocean in the Indian and Pacific Ocean sectors).

### c. Nonseasonal variance

Figure 2c shows the percentage of the total variance accounted for by the nonseasonal background process. Here, a visually striking signal appears across the Pacific Ocean along the equator, probably associated with the El Niño–Southern Oscillation (ENSO) climate mode. Further zonal bands appear to the north of the equator. A strong signal occurs in the Bering Strait region. In the Atlantic Ocean, the subtropical regions show active areas, as well as along the paths of the Labrador and southern tip of the East Greenland Currents. In the South Atlantic, the nonseasonal component of variance accounts for a large portion of the variance in the Brazil Current and the Zapiola region of the Argentine Basin, as well as in zonal bands. In the Indian Ocean, the nonseasonal component of variance also accounts for a large portion of the total variance to the west, in the Mascarene Basin region, as well as along the western Indonesian coast associated with the propagation of the throughflow. The Southern Ocean produces a very large signal associated with regions of deep mixed layers and possible mode water formation, as well as in areas where the Antarctic Circumpolar Current (ACC) is directed southward. Predictability associated with this nonseasonal variance is addressed in the remainder of this paper.

### d. Predictability after trend removal

A linear trend is now removed from the SSH values, meaning that a possibly perfectly predictable component is eliminated. The is fit to an ARMA() process of the ECCOv4 state estimate from 1992 to 2011. A four-point smoother is applied to to reduce noise (two points in latitude and two in longitude), equivalent to moving from the tracer (*t* point) to the vorticity point (*f* point) in the Arakawa C grid. Using the smoother tends to make the data adhere more closely to a normal distribution, but it can exaggerate the spatial covariance of isolated outliers. The performance of the different choices is given by the rate of the PE growth over time. Figure 3a shows the order *n* of the ARMA() chosen with the AIC. The order chosen shows how many coefficients are used to optimally represent and the associated prediction error.

The simplest linear theory assumes that the underlying values are Gaussian, or close to it, an assumption tested in the ECCO estimated SSH in Fig. 4a using the Shapiro–Wilk test for normality (Shapiro and Wilk 1965). Large areas associated with features such as the ENSO signal appear to deviate from normality (i.e., *p* values close to 0). This result has implications for the predictability because these departures are important when interpreting the PE.

Figure 5 shows the error growth asymptoting to its upper bound: the full variance of the background residual time series. Large differences as a function of region appear in the PE, as well as in their asymptotic rate of growth. The expected error *e*-folding structure is shown to illustrate the rate of predictability decay, independent of magnitude.

The ARMA() expected PE growth associated with the is illustrated in Fig. 6a. The PE growth is expressed in terms of the time taken (months) before a target (1 cm) is exceeded. Large areas of the ocean show limited performance based on the PE growth over time, but certain (mostly isolated) areas have good performance, exceeding a year. In interpreting these figures, it is prudent to keep in mind the assumption of normality (Fig. 4a).

High predictability is seen in the North Pacific Ocean. Although the physics here are beyond the intended scope of this paper, these areas are associated with the Kuroshio crossing the Pacific and wave–eddy interactions stretching from Hawaii to the coast of California in a banana shape. Areas in the equatorial Pacific also show better performance of PE growth over time but are clearly nonnormal. Features associated with the Pacific–Antarctic Ridge in the Southern Ocean also produce better performance in terms of the ARMA() expected PE growth over time, along with some areas in the Irminger Sea.

### e. Predictability with apparent trends

Tests of predictability are now made with the linear trend left in the time series. Including the trend treats it as an unresolved component of a red noise process. The inclusion of the trend is expected to increase the performance of the ARMA() PE growth over time. One cannot distinguish this variability from a red noise process with existing data (Church et al. 2013; Lyu et al. 2014; Ocaña et al. 2016). To assess the impact of including the trend, analysis of the background process is repeated. Figures 3c and 3d illustrate the ARMA() order chosen by the AIC. This result is similar to Figs. 3a and 3b, but sometimes smaller values of result, as is physically plausible with a trend.

The normality of the stochastic background process with the trend is retested in Fig. 4b, illustrating that most of the ocean remains indistinguishable from having a normal distribution in . Figure 6b shows the prediction performance based on the ARMA(), phrasing the result in terms of the number of months it takes for the PE to cross the accuracy target of 1 cm. Retaining the trend adds predictability, with large areas taking over 12 months before exceeding the 1-cm threshold. Areas where the trend is important are in bands in the subtropics in both the Atlantic and Pacific, as well as large areas off the coast of Greenland, the Drake Passage, the Kuroshio path across the Pacific, the southern Indian Ocean, and a remarkably large region in the South Pacific poleward of 30°S. Generally, Fig. 6b is seen to amplify Fig. 6a, but with notable exceptions. These exceptions could be associated with the spreading of a thermosteric signal, particularly the subtropical South Pacific associated with Pacific decadal oscillation dynamics. The subtropical Atlantic stands out as another region where the trend is key, along with the South Pacific. The mechanisms, particularly in terms of linear dynamics, are not clear. For example, regions associated with bathymetric features like the Pacific–Antarctic Ridge do not stand out as intuitive regions of heat storage.

### f. Predictability with annual averages

Interannual and monthly physics are distinct. Assessing the annually averaged and from ECCOv4 separately, the assumptions of estimating the covariance and a Gaussian distribution are likely inaccurate, owing to the short, 20-value record.

Figure 7 shows the chosen ARMA() order. A four-point smoother is again used for reduced noise in the ECCOv4 for 1992–2011, where the trend is removed and the data are annually averaged. The *n* and *m* of the ARMA() now reflect the annually averaged data rather than the monthly. The *n* of the ARMA() is generally lower for the annually averaged than for monthly . However, large regions show very different patterns than in the monthly data. Examples include the Arabian Sea region and the North Pacific. However, the areas where one order dominates are generally larger than for monthly . Regions that have higher orders for yearly averaged data than for monthly averages are areas such as the Pacific sector of the Southern Ocean.

Figure 8a assesses the extent to which the 20-yr time series can be viewed as coming from a normally distributed population when using annual averages. Most of the ocean passes this test for annually averaged , but with large areas that appear noisy/nonstationary, presumably owing primarily to the presence of noise.

The associated prediction performance based on the ARMA() expected PE growth over time is shown in Fig. 9a. In most areas, the time to reach the criterion of 1-cm expected error is 1 year, but areas exceeding 3 years are seen. These regions are clustered in the Pacific, with a patch in the western equatorial Pacific, a band stretching from the central North Pacific to the coast of the United States (from Hawaii to California), and small, isolated patches elsewhere. A larger patch of good PE performance over time exists in the Pacific sector of the Southern Ocean.

As expected, the role of the linear trend is also important in the annually averaged . Figures 7a–c illustrate the associated orders of the ARMA(). As with the monthly data, the orders are generally lower. Higher orders exist in a band stretching from the central Pacific northward to the United States (from Hawaii to California), as well as a feature associated with the Pacific–Antarctic Ridge.

Figure 8b, using annually averaged , suggests that most of the ocean passes the test of normality, but again with large areas of failure. The associated prediction performance based on the ARMA() expected PE growth over time is shown in Fig. 9b. This result is similar to that in Fig. 9a, but most of the longer-term PE performance over time is found in the South Pacific. A band stretching from the central Pacific northward to the United States still exists, but large areas do not pass the test of a stable or wide-sense stationary ARMA. Patches of longer-term PE performance over time are also seen in the Indian Ocean and Drake Passage, with isolated regions elsewhere.

## 4. Discussion and conclusions

In this paper, linear univariate predictability of SSH *η* is discussed, and benchmarks for more elaborate prediction methods are presented. In general, more complex models and prediction methods (e.g., GCM projections) would need to exceed this PE performance over time to be proven worthwhile. Prediction performance is presented here in terms of the time it takes the expected PE to grow beyond 1 cm. More complex models should necessarily do better, and their use may well be justified, particularly in specific, physically identifiable regions. This approach is supported by work such as Goddard et al. (2015), where certain events in have been attributed to factors such as changes in the Atlantic meridional overturning circulation and the North Atlantic Oscillation. Existing spreads in ensemble studies, such as the CMIP5 models, would suggest that many difficulties remain (Church et al. 2013).

In the ECCOv4 state estimate, the seasonal cycle , seen in Fig. 2b, accounts for more than 80% of the variability over 20+ years in large parts of the Atlantic and Northern Pacific, as well as in the Weddell Gyre area in the Southern Ocean. In these regions, the seasonal component is likely sufficient for estimating the variation in *η* for at least a few decades.

The percentage of 20-yr variance in the stochastic, nonseasonal component , seen in Fig. 2c, is important over large areas of the ocean, particularly in the Southern Hemisphere. Parameter is treated as a weakly stationary univariate random process, with the assumption of being normally distributed. Areas where is particularly important for predictions, as seen in Fig. 6b, lie in the Southern Ocean and in extensive regions of the Pacific, as well as in the Indian Ocean. When a linear trend is removed, as in Fig. 6a, regions where the ARMA prediction still performs well are mainly clustered in the Pacific, particularly in a band extending northeast from Hawaii, as well as in small, isolated areas elsewhere. A patch of higher predictability exists in the Pacific sector of the Southern Ocean. However, the ARMA prediction, as expected, tends to perform less well when the trend is removed.

An important next step is to distinguish the physical mechanisms, whether atmospheric or oceanic, in regions where the ARMA prediction procedure does well and those where it works poorly. Stationary linear prediction methods can be of limited utility for a variety of reasons. These include dominance by unpredictable white noise (e.g., from atmospheric forcing), non-Gaussian forcing functions, strong nonlinearities in ocean physics, and nonstationary behavior from external forcing and lack of equilibrium in the ocean.

Investigating the contribution of specific mechanisms to the predictability structures of *η* is outside the scope of this study, but results presented suggest such analysis is merited. For example, work with linear models of includes that of Hughes and Williams (2010). Using altimetry alone, and without a prediction focus, they fit AR(*n*) models, choosing the orders using the Bayesian information criterion (BIC). With their higher temporal resolution, the AR(*n*) fits are not easily compared to those from the use of the ARMA on monthly and annual average values. They concluded that Rossby wave patterns are important within ±30° of the equator, while advective processes become more influential at higher latitudes, allowing features such as the Pacific–Antarctic Ridge to influence predictability. The ARMA fits to the ECCOv4 data also have distinct features associated with the Pacific–Antarctic Ridge and different behavior within 30° of the equator, suggesting the physics that give rise to these features are coherent across these time scales and lend themselves to linear modeling approaches.

Predictability from annually averaged data, as in Figs. 9a and 9b, proves generally different. With the trend included, a region in the western South Pacific Ocean has striking performance. Paradoxically, an increase in prediction performance is seen in a band extending northeast of Hawaii when the trend is removed. This is likely a stochastic artifact. Given the long time scales controlling oceanic physics, the records remain far too short to infer statistically stable results. In this context, the continuing difficulties, generally experienced in distinguishing the lowest frequencies present between a general red noise process and a true secular trend of multidecadal applicability, remain a major issue. Whether unconstrained models, such as the CMIP5 ones used by Lyu et al. (2014), have true prediction skill remains unknown. Note, too, that the univariate approach used here is readily extended to accommodate multivariate predictive models employing correlated spatial structures of many different types, which may work much more effectively in some areas.

In brief summary, the present study produces a benchmark of univariate linear skill in predicting *η*. Figure 2b illustrates that up to 50% of the ocean *η* variability is accounted for >80% using only the seasonal signal over the 20 years of the ECCO state estimate. The remaining ocean *η* variability has a significant stochastic component, with expected prediction error growth largely taking over 2 months to exceed 1 cm. Figure 6b shows that treating the linear trend as part of the continuum enhances the predictive performance, as expected. In areas in the Southern and Pacific Oceans, the stochastic continuum contains more than 95% of the total variance of *η*, with expected prediction performance of 1 cm exceeding a year in significant portions of these regions. Moving forward, extending in time the global measurements of *η* and understanding the underlying physical processes remains the key to progress in regional sea surface height prediction.

## Acknowledgments

This work was funded by the U.S. National Aeronautics and Space Administration Sea Level Change Team (contract NNX14AJ51G) and through the ECCO Consortium funding via the Jet Propulsion Laboratory.

### APPENDIX

#### Influence of Chosen Information Criteria

The choice of information criteria to determine the ARMA() significantly influences the predictive performance. The AIC [Eq. (2)] was used, as it demonstrated better performance for applications where the “true” model is likely not available (Yang 2005). The BIC has been used in studies such as Hughes and Williams (2010). The BIC is defined as

where *n* is the number of data points, *k* is the number of parameters, and is the likelihood shown in Eq. (2).

As discussed by Priestley (1981) and Yang (2005), the AIC tends to overestimate the true order, and the BIC tends to underestimate it. The AIC results showed better predictive power. The two criteria give different weights to penalizing the number of regression coefficients, with the BIC having a larger penalty term.

Figure A1 illustrates the orders chosen by the BIC for . Note the difference from Fig. 3, where higher orders are chosen using the AIC. Overall, the AIC produces estimates at higher orders than the BIC, but results for large areas are similar (e.g., the North Pacific and equatorial Atlantic). The information criteria, particularly the AIC, tend to pick out differing dynamical regions.

The predictive potential associated with the using the BIC is illustrated in Fig. A2. This shows a very similar pattern to Fig. 6a, and we highlight the differences by showing the difference between the two (AIC − BIC) in Fig. A3a. Here, we see that the AIC largely gives higher predictive performance based on the ARMA() expected prediction error growth. The difference between the AIC and BIC prediction performance is highlighted in Fig. A3b. We see slight differences, with the AIC having higher predictive performance in bands ±30° of the equator, particularly in the Pacific and Indian Oceans, but also in the Atlantic. The BIC shows somewhat higher predictive performance in the higher latitudes, particularly in the Pacific.

For annually averaged , the difference between using the AIC and BIC is smaller. Figure A3c highlights the difference between the AIC and BIC criteria, showing that overall, the AIC has higher performance based on the ARMA() expected prediction error growth. With annually averaged , the AIC and BIC also give similar prediction accuracies, with the differences highlighted in the difference plot shown in Fig. A3d. However, small areas show higher performance based on ARMA() expected prediction error growth with the BIC.

The difference in predictability with the AIC and BIC informing the choice of the order suggests that the AIC has the highest utility. The BIC underestimates the order, and the AIC is found to be more suitable. This is illustrated in detail, looking at the predictability with the apparent trend in , in Fig. A3d and is demonstrated throughout with the higher prediction performance using the AIC. The AIC is better particularly in the equatorial regions. The difference between the performance of the prediction is larger in the monthly data with the trend. For the case that is not detrended, the BIC occasionally does better than the AIC, but no obvious spatial pattern is apparent. The differences are larger for the annually averaged data, as the chosen ARMA() orders would suggest. The higher orders chosen by the AIC are not surprising, as the AIC penalizes adding parameters less strongly than the BIC. Burnham and Anderson (2002) show that the AIC can actually be derived from the BIC using a different prior in the Bayesian framework. They suggest the AIC has advantages over the BIC, first being based on information theory and, second, having a more sensible prior. Our results are in agreement with Burnham and Anderson (2002) and similar work by Yang (2005), suggesting better performance based on the ARMA() expected prediction error growth is achieved using the AIC. The AIC has been seen to have higher performance than the BIC, as is discussed further by Burnham and Anderson (2004) and Aho et al. (2014). The implications of how well the different models capture the different dynamical regimes is not discussed, as this relies on large generalizations of the prediction performance of the fitted ARMA() over vast areas.

## REFERENCES

*Proc. ECMWF Seminar Series on Numerical Methods, Recent Developments in Numerical Methods for Atmosphere and Ocean Modelling*, Reading, United Kingdom, ECMWF, 139–150.

*Proc. Second Int. Symp. on Information Theory*, Budapest, Hungary, Institute of Electrical and Electronics Engineers, 267–281.

*Time Series Analysis: Forecasting and Control.*Holden-Day, 553 pp.

*Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach.*2nd ed. Springer-Verlag, 488 pp., https://doi.org/10.1007/b97636.

*Climate Change 2013: The Physical Science Basis*, T. F. Stocker et al., Eds., Cambridge University Press, 1137–1216.

*Mar. Technol. Soc. J.*,

**25**, 13–24.

*J. Mar. Res.*,

**14**, 88–155.

*Spectral Analysis and Time Series, Volumes 1–2*. Academic Press, 890 pp.

*Biometrika*,

**52**, 591–611, https://doi.org/10.1093/biomet/52.3-4.591.

*Modern Observational Physical Oceanography: Understanding the Global Ocean.*Princeton University Press, 511 pp.

*Ocean Circulation and Climate: A 21 Century Perspective*, G. Siedler et al., Eds., International Geophysics Series, Vol. 103, Academic Press, 553–579.

## Footnotes

© 2018 American Meteorological Society. For information regarding reuse of this content and general copyright information, consult the AMS Copyright Policy (www.ametsoc.org/PUBSReuseLicenses).

^{1}

“Wide sense” stationarity is the terminology of electrical engineering; mathematicians call it “weakly” stationary, and in both cases, only the first two moments (mean and variance) are assumed time independent (Priestley 1981).