## 1. Introduction

Operational weather forecasts are skillful out to 7–10 days (Simmons and Hollingsworth 2002), and operational seasonal forecasts are skillful out to 3–8 months (depending on season and model; Barnston et al. 2012), but there is relatively limited evidence that forecasts are skillful in the intermediate 3–4-week range (Newman et al. 2003; Pegion and Sardeshmukh 2011; Wang et al. 2014). If skillful forecasts in the 3–4-week range existed, they would have significant social and economic value because many management decisions in agriculture, food security, water resources, and disaster risk are made on this time scale. However, most studies that claim predictability in the 3–4-week range identify this skill in the tropics (Li and Robertson 2015), in upper-level quantities like geopotential height fields (Pegion and Sardeshmukh 2011), or in certain global climate indices (Wang et al. 2014), whereas the skill of midlatitude land surface quantities like 2-m temperature or precipitation tends to be negligible (Li and Robertson 2015). Johnson et al. (2013) develop an empirical model for predicting North American 2-m temperature out to 4 weeks based on a linear trend and statistical relations with the Madden–Julian oscillation (MJO) and El Niño–Southern Oscillation (ENSO) and find that this empirical model has skill in certain regions and phases of the MJO. This paper will show that an operational forecast model makes skillful predictions of week-3–4 average temperature and precipitation over the contiguous United States (CONUS).

Predictability of temperature and precipitation depends very much on the spatial and temporal scale under consideration. Beyond weather time scales (e.g., 7–10 days), it is widely accepted that only large-scale spatial structures are predictable. Accordingly, we propose a novel approach to investigating subseasonal predictability using a set of spatial patterns that can be ordered by length scale. We will show that week-3–4 averages of time series corresponding to many of these spatial patterns can be skillfully predicted by a state-of-the-art prediction model. In addition, we find linear combinations of these time series that maximize predictability and show that many of these predictable components can be predicted with skill.

## 2. Data

The computations performed in this study are strongly constrained by the availability of forecasts; hence, it is helpful to discuss data issues first. We analyze retrospective forecasts, called “hindcasts,” from version 2 of the Climate Forecast System (CFSv2; Saha et al. 2014). The CFSv2 is a coupled atmosphere–ocean–land–ice model and is initialized from analysis products for the atmosphere, ocean, land, and sea ice. The hindcasts under investigation were initialized at 0000, 0600, 1200, and 1800 UTC of each day over the 12-yr period from January 1999 to December 2010. Although these hindcasts were integrated out to 45 days, only the 2-week mean over weeks 3–4 was considered. Only one hindcast per initialization time is available, so a lagged-ensemble approach is employed, whereby forecasts initialized at different times but verifying at the same time are averaged. In general, skill increases with the size of the lagged ensemble until it saturates around 4 days (as shown in section 4). Accordingly, we consider hindcasts based on a 4-day lagged ensemble, which contains 16 members, derived from four hindcasts per day. To be clear, the 4-day lagged ensemble is computed from the 16 hindcasts initialized at or before time *t*, all verifying over the same week-3–4 target period.

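For concreteness, the lagged-ensemble averaging can be sketched as follows. This is a minimal illustration with synthetic numbers; the function name, array layout, and the assumption that four starts per day are stored oldest to newest are ours, not the CFSv2 archive format.

```python
import numpy as np

def lagged_ensemble_mean(hindcasts, n_days=4, starts_per_day=4):
    """Average the most recent n_days * starts_per_day hindcasts.

    hindcasts: 1-D array of week-3-4 mean forecasts, one per
    initialization, ordered oldest to newest, all verifying over
    the same 2-week target period.
    """
    n_members = n_days * starts_per_day   # 16 members for a 4-day lag
    return hindcasts[-n_members:].mean()

# Synthetic example: 5 days of initializations at 4 starts per day
rng = np.random.default_rng(0)
fcsts = 1.0 + 0.1 * rng.standard_normal(20)
ens_mean = lagged_ensemble_mean(fcsts)    # mean of the last 16 members
```
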
For verification, the 2-week mean temperature is compared to estimates from the NCEP–NCAR reanalysis (Kistler et al. 2001). Similarly, hindcasts of daily precipitation were verified relative to the Climate Prediction Center (CPC) unified gauge-based analysis (Chen et al. 2008).

Climatologies of daily temperature and precipitation are quite noisy and require significant smoothing. No significant dependence of hindcast climatology on lead time was detected, so the model climatology for each calendar day was estimated by averaging all hindcasts verifying on the same day and over all lead times. In addition, the daily climatology was fit to a second-order polynomial over the 76-day period starting from the first of each month. Various checks and visual comparisons were made to ensure that the estimated climatologies were reasonable.

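The polynomial smoothing of the daily climatology can be sketched as follows. The function name and the handling of the 76-day window are our own simplification of the procedure described above, applied here to a synthetic climatology.

```python
import numpy as np

def smooth_climatology(daily_clim, start_idx, window=76):
    """Fit a second-order polynomial to a 76-day stretch of a noisy
    daily climatology and return the smoothed values."""
    seg = daily_clim[start_idx:start_idx + window]
    days = np.arange(len(seg))
    coeffs = np.polyfit(days, seg, deg=2)   # quadratic least-squares fit
    return np.polyval(coeffs, days)         # smoothed segment

# Synthetic noisy climatology: smooth quadratic signal plus daily noise
rng = np.random.default_rng(1)
days = np.arange(120)
clim = 10 + 0.05 * days - 2e-4 * days**2 + rng.standard_normal(120)
smoothed = smooth_climatology(clim, start_idx=0)
```
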
MJO indices are computed from CFSv2 hindcasts in the manner of Trenary et al. (2017). Specifically, the familiar real-time multivariate MJO indices (RMM1 and RMM2) of Wheeler and Hendon (2004) were derived from an EOF analysis of observations, and then the resulting EOF patterns were projected on model variables. In contrast to the standard approach, a 120-day running mean was not subtracted from the indices; hence, our MJO indices include interannual variability.

## 3. Methods

This section describes our methods for 1) defining an orthogonal set of large-scale patterns, 2) quantifying predictability and skill, and 3) finding patterns that maximize predictability and skill.

### a. Eigenvectors of the Laplacian operator

We project temperature and precipitation fields onto the eigenvectors of the Laplacian operator over CONUS. Laplacian eigenvectors provide a convenient orthogonal basis set that can be ordered by a measure of length scale. Special cases of Laplacian eigenvectors include Fourier series and spherical harmonics, which are used routinely to decompose time series by time scale and spatial structures by length scale, respectively. Eigenvectors of the Laplacian operator over CONUS were obtained using a Green’s function method described in DelSole and Tippett (2015), which should be consulted for details (codes are available upon request). The resulting spatial patterns are orthogonal with respect to an area-weighted inner product and ordered such that the first corresponds to a spatially uniform pattern over the domain (i.e., the largest spatial scale that fits in the domain), and subsequent patterns correspond to dipoles, tripoles, quadrupoles, and so forth of decreasing length scale. These vectors depend only on the geometry of the domain and therefore are data independent, in contrast to empirical orthogonal functions (EOFs). Thus, a single set of spatial patterns is used to analyze different variables and seasons.

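The ordering and orthogonality properties can be illustrated on a simple rectangular grid; the actual computation over the irregular CONUS domain uses the Green’s function method of DelSole and Tippett (2015), which this sketch does not reproduce.

```python
import numpy as np
from scipy.linalg import eigh

def laplacian_eigenvectors(nx, ny):
    """Eigenvectors of the discrete Laplacian on a small rectangular
    grid with no-flux boundaries, ordered from largest to smallest
    spatial scale (ascending eigenvalue)."""
    n = nx * ny
    L = np.zeros((n, n))
    for j in range(ny):
        for i in range(nx):
            k = j * nx + i
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ii, jj = i + di, j + dj
                if 0 <= ii < nx and 0 <= jj < ny:
                    L[k, jj * nx + ii] = 1.0   # neighbor coupling
                    L[k, k] -= 1.0             # diagonal balances neighbors
    vals, vecs = eigh(-L)   # -L is positive semidefinite; ascending order
    return vals, vecs       # column 0 is the constant (largest-scale) pattern

vals, vecs = laplacian_eigenvectors(6, 4)
```

The first eigenvalue is zero and its eigenvector is constant over the domain; subsequent columns are mutually orthogonal patterns of decreasing length scale, mirroring the ordering described above.
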
Laplacian eigenvectors 2–10 over CONUS are shown in Fig. 1. The first eigenvector is not shown because it equals a constant over the whole domain. The second and third eigenvectors measure the east–west and north–south gradients, respectively. The next two eigenvectors correspond to a tripole and quadrupole, and so on. The percent variance of observed 2-week means explained by the first 20 Laplacian eigenvectors is shown in Fig. 2; similar percentages are found in the model (not shown). As expected, the explained variance tends to decrease with decreasing spatial scale.

### b. Measure of predictability

Predictability refers to the degree to which a quantity *in a model* is predictable by that model. As such, predictability is an inherent property of a model that can be measured independently of observations. The standard approach to measuring predictability is to consider an ensemble of predictions initialized at equally likely states of the system. Although the CFSv2 reforecast dataset does not have multiple ensemble members for the same initial condition (i.e., a “burst” ensemble), an ensemble can be approximated by grouping hindcasts initialized 6 h apart that verify on the same day. The resulting ensemble often is called a lagged ensemble (Hoffman and Kalnay 1983). Let $f_{e,t}$ denote the $e$th member of the lagged ensemble verifying at time *t*, where the members differ only in lead time *τ*; time is measured in units of days. If *E* is the ensemble size, then the mean of the lagged ensemble is defined as follows:

$$\bar{f}_t = \frac{1}{E} \sum_{e=1}^{E} f_{e,t},$$

where the sum is over the members verifying at time *t*. Predictability over *T* verification times is measured by the ratio of the variance of the ensemble mean (the signal) to the variance within the ensemble (the noise):

$$F = \frac{\dfrac{1}{T-1}\displaystyle\sum_{t=1}^{T}\left(\bar{f}_t - \langle \bar{f} \rangle\right)^2}{\dfrac{1}{T(E-1)}\displaystyle\sum_{t=1}^{T}\sum_{e=1}^{E}\left(f_{e,t} - \bar{f}_t\right)^2},$$

where $\langle \bar{f} \rangle$ denotes the average of $\bar{f}_t$ over all verification times.

If the noise perturbations are independent and identically distributed Gaussian random variables, then *F* follows an F distribution with $T-1$ and $T(E-1)$ degrees of freedom. However, forecasts initialized at daily intervals are serially correlated, so this distribution is not appropriate. Instead, we use a permutation test based on the fact that, under the null hypothesis of no predictability, the forecasts are exchangeable across years, so any value of *F* obtained from a permutation of (independent) samples is equally likely. Accordingly, we construct a permuted ensemble by drawing forecasts from random years. Importantly, the entire sequence of forecasts within a year is drawn, ensuring that the serial correlation across consecutive days is preserved. This sampling is tantamount to randomly permuting (or “shuffling”) the years assigned to the forecasts. The statistic *F* is computed for the permuted ensemble, and this procedure is repeated many times (i.e., 10 000 times). The rank of the *F* obtained from the unpermuted ensemble is then evaluated relative to the values of *F* for the permuted ensembles. Under the hypothesis of exchangeability, this rank is uniformly distributed. The actual lagged ensemble is said to be predictable if the observed value of *F* exceeds the 95th percentile of the *F* values obtained from permuted samples.

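The test can be sketched as follows with synthetic data. The array shapes and the member-wise year shuffling are our own minimal illustration of the year-block permutation described above, with far fewer permutations than the 10 000 used in the paper.

```python
import numpy as np

def snr_f(ens):
    """ANOVA-style F: variance of the ensemble mean across verification
    times divided by the within-ensemble (noise) variance.
    ens: array of shape (n_years, n_days, n_members)."""
    y, d, e = ens.shape
    flat = ens.reshape(y * d, e)
    ens_mean = flat.mean(axis=1)
    signal = ens_mean.var(ddof=1)
    noise = ((flat - ens_mean[:, None]) ** 2).sum() / (y * d * (e - 1))
    return signal / noise

def permuted_f(ens, n_perm=200, seed=0):
    """Null distribution of F: each member's whole year sequence is drawn
    from a random year, preserving within-year serial correlation while
    destroying any common signal."""
    rng = np.random.default_rng(seed)
    y, d, e = ens.shape
    out = np.empty(n_perm)
    for p in range(n_perm):
        perm = np.empty_like(ens)
        for m in range(e):
            perm[:, :, m] = ens[rng.permutation(y), :, m]
        out[p] = snr_f(perm)
    return out

# Synthetic 12-yr ensemble with a common year-to-year signal
rng = np.random.default_rng(2)
signal = rng.standard_normal((12, 1, 1))          # one anomaly per year
ens = signal + 0.5 * rng.standard_normal((12, 31, 16))
f_obs = snr_f(ens)
f_null = permuted_f(ens)
predictable = f_obs > np.quantile(f_null, 0.95)
```
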
### c. Measure of skill

Skill refers to the degree to which a forecast predicts the observed variable. Two standard measures of skill are mean square error and the correlation coefficient *ρ*. Significance tests for skill based on mean square error have been discussed by DelSole and Tippett (2014), while those based on correlation are standard. Unfortunately, these tests are not appropriate for forecasts initialized at daily intervals because of the serial correlation mentioned above. We again apply a permutation method in which the year labels for the observations are randomly permuted. By selecting the entire sequence of observations within a year, the serial correlation between observations on daily time scales is preserved. After shuffling the year labels for the observations, the correlation coefficient between forecasts and shuffled observations can be computed. This procedure is repeated many times (i.e., 10 000 times) to build up an empirical distribution for the correlation under the null hypothesis of independence. The 95th percentile of the resulting samples then defines the 5% significance threshold value for the correlation coefficient.

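The corresponding threshold computation for correlation skill can be sketched as follows (synthetic forecasts and observations; the function name and shapes are illustrative, and far fewer permutations are used than in the paper).

```python
import numpy as np

def correlation_threshold(fcst, obs, n_perm=500, seed=0):
    """5% significance threshold for correlation skill, from a permutation
    that shuffles whole years of observations and thereby preserves
    day-to-day serial correlation within each year.
    fcst, obs: arrays of shape (n_years, n_days)."""
    rng = np.random.default_rng(seed)
    f = fcst.ravel()
    corrs = np.empty(n_perm)
    for p in range(n_perm):
        shuffled = obs[rng.permutation(obs.shape[0])].ravel()
        corrs[p] = np.corrcoef(f, shuffled)[0, 1]
    return np.quantile(corrs, 0.95)

# Synthetic 12-yr record with genuinely correlated forecasts
rng = np.random.default_rng(3)
obs = rng.standard_normal((12, 31))
fcst = 0.6 * obs + 0.8 * rng.standard_normal((12, 31))
thresh = correlation_threshold(fcst, obs)
skillful = np.corrcoef(fcst.ravel(), obs.ravel())[0, 1] > thresh
```
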
### d. Predictable component analysis

We seek the linear combination of Laplacian time series that maximizes *F* in (2). This procedure is formally equivalent to predictable component analysis (see DelSole and Tippett 2007 for a review). We briefly review this procedure to clarify its application in our particular situation. Let the weights of the linear combination be $w_m$ for the *m*th Laplacian eigenvector, and let $x_{m,t}$ be the amplitude of the *m*th Laplacian eigenvector at time *t*. If the weights are collected into the vector $\mathbf{w}$ and the amplitudes into the vector $\mathbf{x}_t$, then the time series of the linear combination is

$$r_t = \mathbf{w}^{\mathsf{T}} \mathbf{x}_t.$$

Maximizing *F* for this time series leads to the generalized eigenvalue problem

$$\boldsymbol{\Sigma}_S \mathbf{w} = \lambda \boldsymbol{\Sigma}_N \mathbf{w},$$

where $\boldsymbol{\Sigma}_S$ and $\boldsymbol{\Sigma}_N$ are the signal and noise covariance matrices of the Laplacian amplitudes and *λ* is the value of *F* for the linear combination defined by the weights $\mathbf{w}$. The first eigenvector maximizes *F*, the second maximizes *F* subject to being uncorrelated with the first eigenvector (in a sense defined shortly), and so on. Moreover, the eigenvalues give the corresponding maximized *F* values. These solutions define the predictable components, the first of which will be called the “most predictable component.” Each eigenvector can be substituted in (5) to define the time series associated with that component. Because the covariance matrices are symmetric, the resulting time series for different components are uncorrelated. The spatial structure of a predictable component is obtained from regression. The regression coefficient between the predictable component time series in (5) and the *m*th Laplacian amplitude is

$$p_m = \frac{\operatorname{cov}(x_{m,t}, r_t)}{\operatorname{var}(r_t)}.$$

The Laplacian eigenvectors are then summed using the weights specified in the vector $\mathbf{p}$ to obtain the spatial pattern. Note that this regression can be computed for the *m*th Laplacian eigenvector even if that vector was not included in the optimization procedure discussed above (e.g., when *m* exceeds the number of eigenvectors retained).

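The maximization can be sketched with SciPy’s generalized symmetric eigensolver. The synthetic Laplacian amplitudes and the plug-in covariance estimators below are our own illustration, not necessarily the estimators used in the paper.

```python
import numpy as np
from scipy.linalg import eigh

def predictable_components(ens):
    """Solve Sigma_S w = lambda * Sigma_N w for weights maximizing the
    signal-to-noise ratio F. ens: array (n_times, n_members, n_vars)
    holding the Laplacian amplitudes of each ensemble member."""
    t, e, m = ens.shape
    mean = ens.mean(axis=1)                  # ensemble mean, shape (t, m)
    dev = ens - mean[:, None, :]             # within-ensemble deviations
    sig = np.cov(mean, rowvar=False)         # signal covariance
    flat = dev.reshape(t * e, m)
    noi = flat.T @ flat / (t * (e - 1))      # noise covariance
    lam, w = eigh(sig, noi)                  # generalized eig, ascending
    return lam[::-1], w[:, ::-1]             # most predictable first

# Synthetic amplitudes: only variable 0 carries a common signal
rng = np.random.default_rng(4)
t, e, m = 200, 16, 5
common = rng.standard_normal((t, 1, 1)) * np.array([2.0, 0, 0, 0, 0])
ens = common + 0.5 * rng.standard_normal((t, e, m))
lam, w = predictable_components(ens)
```

The leading eigenvalue dominates and the leading weight vector loads on the signal-bearing variable, as expected from the optimization described above.
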
Note that the above procedure yields a complete set of predictable components for each lead time *τ*. This lead time dependence is sensible because predictability is characterized by different patterns at different time scales. An alternative approach is to characterize predictability over all time scales, which can be done by maximizing a measure of predictability integrated over all lead times. This approach is called average predictability time (APT; DelSole and Tippett 2009) analysis. APT analysis is not used here because we want to demonstrate the existence of predictability specifically for the week-3–4 forecasts. Although APT analysis can find predictable components on subseasonal time scales, testing the hypothesis of predictability on subseasonal time scales is not straightforward because the integral includes the short weather lead times that are predictable. By applying predictable component analysis for only one lead time, subseasonal predictability can be tested in isolation from predictability on other time scales.

The sampling distribution of the maximized *F* values (i.e., the eigenvalues) under the null hypothesis of no predictability can be estimated using a permutation technique similar to that described above, in which the year labels assigned to forecasts are randomly permuted. The only extra step is that instead of drawing a single variable, an entire *M*-dimensional vector is drawn, corresponding to the amplitudes of the *M* Laplacian eigenvectors for the relevant forecast. Again, an essential element of the technique is to draw the entire sequence of forecasts within a year for all *M* eigenvectors, which preserves the serial correlation on daily time scales. After generating a mock ensemble forecast dataset comprising *T* time steps and *E* ensemble members, the covariance matrices are computed and the generalized eigenvalue problem in (11) is solved. This process is repeated many times (i.e., 10 000 times) to build up an empirical distribution for the eigenvalues.

## 4. Results

The correlation skill of 4-day lagged ensembles of temperature and precipitation of week-3–4 hindcasts over CONUS during January and July is shown in Fig. 3. Statistically insignificant values at the 5% level (according to the permutation test) are masked out. The figure shows that winter temperature and precipitation and summer temperature are skillfully predicted by the CFSv2 over a third to a half of the area of CONUS. Summer precipitation shows effectively no skill (e.g., the numbers of positive and negative correlations are approximately equal). Although some negative correlations are statistically significant in a local sense, we do not believe them to be field significant.

Our goal is to diagnose the predictability and skill shown in Fig. 3 in terms of large-scale spatial structures. The predictability and skill of individual Laplacian eigenvectors of January temperature as a function of ensemble size is shown in Fig. 4. Qualitatively similar results are obtained for other variables and time periods. Not surprisingly, predictability decreases with ensemble size because each additional member is initialized farther from the target and therefore contains more noise. The signal-to-noise ratio (SNR) decreases by a factor of 2–3 from a 12-h to a 4-day lagged ensemble. In contrast, the skill tends to increase with ensemble size, provided the skill is sufficiently large.

The predictability of week-3–4 CFSv2 hindcasts of temperature and precipitation projected onto individual Laplacian eigenvectors is shown in Fig. 5. Predictability is quantified by the SNR derived from the statistic *F* defined in (2). We use a 4-day lagged ensemble (i.e., *E* = 16).

Although the above results demonstrate week-3–4 predictability, this result does not necessarily imply that the associated hindcasts are skillful (i.e., that the hindcasts can predict observed anomalies). In most cases, mean square error shows no significant skill. Accordingly, we consider skill based on correlation, which is invariant to linear transformations of the forecast and thus does not penalize biases or errors in forecast amplitude. The skill of the hindcasts based on a 4-day lagged ensemble is shown in Fig. 6. The figure shows that many spatial structures of winter temperature and precipitation and summer temperature can be predicted with skill by the CFSv2.

Although no individual Laplacian eigenvector has significant skill for summer precipitation, this result does not necessarily imply that summer precipitation cannot be predicted with skill. In particular, it is possible that some linear combination of eigenvectors can be predicted with skill. To test this possibility, we apply predictable component analysis to find linear combinations of Laplacian eigenvectors that maximize predictability. A critical step in this procedure is selecting the number of eigenvectors. This step is tantamount to a model selection problem and is one of the most challenging problems in statistics (Fukunaga 1990; Hastie et al. 2003; Taylor and Tibshirani 2015). Fortunately, we have found that our results are not sensitive to the precise number of Laplacian eigenvectors retained.

The maximized signal-to-noise ratios for CFSv2 week-3–4 hindcasts are shown in Fig. 7. As above, we use a 4-day lagged ensemble (i.e., *E* = 16).

The regression map between the most predictable component time series and relevant field is shown in Fig. 8. The winter temperature and precipitation patterns are similar to the observed ENSO teleconnection patterns derived from monthly means (Yang and DelSole 2012), suggesting that CFSv2 week-3–4 predictability arises from El Niño/La Niña events. The summer temperature pattern also bears some resemblance to model–ENSO teleconnection patterns (e.g., compare to Fig. 7 of Wang et al. 2012), but the correspondence to the summer precipitation pattern is weak.

The skills of the predictable components are shown in Fig. 9. The figure shows that the most predictable components have skill at weeks 3–4 for winter temperature and precipitation and summer temperature. In contrast, the most predictable component of summer precipitation has no significant skill (it is too small to appear in the figure). About two to three predictable components of winter temperature and precipitation and summer temperature have skill. Confidence intervals for the correlation skills overlap (not shown), indicating that the correlations cannot be distinguished. It follows that the ranking according to skill cannot be determined based on the available data. Thus, the fact that the most predictable component is not the most skillful is not necessarily meaningful.

To gain insight into the nature of the predictability and skill, we show in Fig. 10 time series of the most predictable components. These time series confirm that secular trends are small. In addition, for the components with the most skill, the time series exhibit relatively large jumps between years but relatively small fluctuations within a year. This feature suggests that the predictability comes from predicting the overall mean during the month rather than predicting variations within the month. To test this possibility, the forecasts within a month were decomposed into the sum of two terms, a monthly mean plus an anomaly relative to the monthly mean, and then the correlation skill of each component was computed separately. The result, shown in Fig. 11, shows that skill associated with the monthly mean often dominates. Moreover, skill in predicting the anomalies rarely exceeds 0.35, whereas skill in predicting monthly means frequently exceeds 0.35.

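The decomposition into monthly-mean and within-month skill can be sketched as follows; the synthetic data, shapes, and function name are illustrative assumptions.

```python
import numpy as np

def split_skill(fcst, obs):
    """Correlation skill of the monthly-mean part and of the
    within-month anomaly part, computed separately.
    fcst, obs: arrays of shape (n_months, n_days)."""
    def corr(a, b):
        return np.corrcoef(a.ravel(), b.ravel())[0, 1]
    n_days = fcst.shape[1]
    f_mon = np.repeat(fcst.mean(axis=1, keepdims=True), n_days, axis=1)
    o_mon = np.repeat(obs.mean(axis=1, keepdims=True), n_days, axis=1)
    return corr(f_mon, o_mon), corr(fcst - f_mon, obs - o_mon)

# Synthetic case: forecast and observation share only a monthly signal
rng = np.random.default_rng(5)
monthly = rng.standard_normal((12, 1))
obs = monthly + rng.standard_normal((12, 31))
fcst = monthly + rng.standard_normal((12, 31))
skill_mean, skill_anom = split_skill(fcst, obs)
```

In this construction the monthly-mean part carries essentially all of the correlation while the anomaly part has none, mirroring the behavior described for Fig. 11.
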
Given that predictability appears to be dominated by the monthly mean component, it is reasonable to explore relations with other variables by computing correlations between monthly mean quantities. The simultaneous squared correlation between each predictable component in CFSv2 and the Niño-3.4 index is shown in Fig. 12a. We call this measure *R*^{2} because it corresponds to the coefficient of determination of a regression model for predicting the component based on Niño-3.4. Because the Niño-3.4 index is persistent on weekly time scales, its value in the model is very close to its initial value, which in turn is close to the observed value. Thus, these correlations measure the ENSO teleconnections in the model. We see that the most predictable components of winter temperature and precipitation in CFSv2 are highly correlated with ENSO.

In addition to ENSO, the MJO often is cited as a phenomenon that may give rise to subseasonal predictability (Vitart 2014). To explore this, we compute the coefficient of determination between each predictable component and the RMM1 and RMM2 indices defined in Wheeler and Hendon (2004). These indices were computed from daily CFSv2 fields, then averaged over week-3–4 hindcasts, and then averaged over the month so that a correlation could be computed using only monthly values. Because the ENSO and MJO indices are themselves correlated, we also consider partial measures of association. The coefficient of determination between *Y* and *Z*, after *X* has been removed, is

$$R^2_{Y,Z \mid X} = \frac{\mathrm{SSE}_{Y|X} - \mathrm{SSE}_{Y|X,Z}}{\mathrm{SSE}_{Y|X}},$$

where $\mathrm{SSE}_{Y|X}$ is the sum of squared errors of the regression model for predicting *Y* based on *X*, and $\mathrm{SSE}_{Y|X,Z}$ is the sum of squared errors of the regression model for predicting *Y* based on *X* and *Z*. The constant term is understood to be included in all regression models. The quantity $R^2_{Y,Z \mid X}$ measures the fraction of variance of *Y* explained by *Z* after the linear relation with *X* has been removed from all variables. In the case of ENSO after MJO has been removed (Fig. 12c), only the leading predictable component of winter precipitation shows a significant relation with ENSO. In contrast, the leading component of winter temperature has a significant correlation with ENSO (see Fig. 12a) but not after the MJO has been removed (its *R*^{2} of about 0.4 falls just below the significance threshold; see Fig. 12c). This result does not necessarily mean that the leading component of winter temperature is unrelated to ENSO; rather, the sample size (i.e., 12 yr) may be too small to detect such a relation. In the case of MJO after ENSO has been removed, shown in Fig. 12d, the third and fourth predictable components of winter precipitation show a significant relation with MJO.

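The partial coefficient of determination can be computed with two least-squares fits, as in the following sketch; the synthetic monthly values are our own, and the variable names follow the definition above.

```python
import numpy as np

def partial_r2(y, x, z):
    """Fraction of variance of y explained by z after the linear relation
    with x is accounted for: (SSE_x - SSE_xz) / SSE_x, with a constant
    term included in both regressions."""
    def sse(y, preds):
        X = np.column_stack([np.ones_like(y)] + preds)
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        return ((y - X @ beta) ** 2).sum()
    return (sse(y, [x]) - sse(y, [x, z])) / sse(y, [x])

# Synthetic monthly values: y depends on both x and z
rng = np.random.default_rng(6)
n = 144
x = rng.standard_normal(n)
z = rng.standard_normal(n)
y = 1.0 * x + 0.5 * z + 0.1 * rng.standard_normal(n)
r2 = partial_r2(y, x, z)   # large, since z explains most residual variance
```
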
For completeness, we note a similar analysis was performed using the North Atlantic Oscillation (NAO) index and MJO indices. We find that the correlations with the predictable components are marginally significant but that these correlations become insignificant when MJO has been regressed out (not shown).

## 5. Conclusions

This paper shows that an operational forecast model skillfully predicts week-3–4 average temperature and precipitation over the contiguous United States. This skill can be identified at the gridpoint level (about 1° × 1°) and by projecting data onto an orthogonal set of large-scale CONUS patterns (derived from the eigenvectors of the Laplacian operator). An important aspect of this identification is a permutation significance test that accounts for serial correlation on daily time scales. Skill is detected based on correlation measures but not based on mean square error measures, indicating that an amplitude correction is necessary for skillful forecasts of actual anomalies. Our results differ from those of Li and Robertson (2015) perhaps because we analyzed 2-week (week-3–4) averages, which filter out more unpredictable weather variability than shorter averaging periods.

Winter temperature and precipitation tend to have more predictability than their summer counterparts, with summer precipitation having the weakest predictability of all quantities considered in this paper. In addition, the most predictable components were identified by finding linear combinations of Laplacian eigenvectors that maximize signal-to-noise ratio. The results of this maximization procedure clarify the spatial structure of the predictable variability. The most predictable component during winter effectively represents the model’s ENSO teleconnection pattern. Some predictable components of winter precipitation are associated with MJO activity. The skill of the predictable components is dominated by the skill in predicting the mean value during a month rather than by the skill in predicting anomalies relative to the monthly mean. By explicitly identifying patterns in an operational forecast model that are predictable on subseasonal time scales and demonstrating that these patterns can be predicted with skill in observations, the above results provide a scientific basis for week-3–4 predictions.

## ACKNOWLEDGMENTS

We thank two reviewers and the editor Joseph Barsugli for helpful comments that led to improved clarity in the final paper. This research was supported primarily by the National Oceanic and Atmospheric Administration, under the Climate Test Bed program (NA10OAR4310264) and the MAPP program (NA14OAR4310184). Additional support was provided by the National Science Foundation (AGS-1338427), National Aeronautics and Space Administration (NNX14AM19G), and the National Oceanic and Atmospheric Administration (NA14OAR4310160). The views expressed herein are those of the authors and do not necessarily reflect the views of these agencies.

## REFERENCES

Barnston, A., M. K. Tippett, M. L. L’Heureux, S. Li, and D. G. DeWitt, 2012: Skill of real-time seasonal ENSO model predictions during 2002–11: Is our capability increasing? *Bull. Amer. Meteor. Soc.*, **93** (Suppl.), doi:10.1175/BAMS-D-11-00111.2.

Chen, M., W. Shi, P. Xie, V. B. S. Silva, V. E. Kousky, R. Wayne Higgins, and J. E. Janowiak, 2008: Assessing objective techniques for gauge-based analyses of global daily precipitation. *J. Geophys. Res.*, **113**, D04110, doi:10.1029/2007JD009132.

DelSole, T., and M. K. Tippett, 2007: Predictability: Recent insights from information theory. *Rev. Geophys.*, **45**, RG4002, doi:10.1029/2006RG000202.

DelSole, T., and M. K. Tippett, 2009: Average predictability time: Part II: Seamless diagnosis of predictability on multiple time scales. *J. Atmos. Sci.*, **66**, 1188–1204, doi:10.1175/2008JAS2869.1.

DelSole, T., and M. K. Tippett, 2014: Comparing forecast skill. *Mon. Wea. Rev.*, **142**, 4658–4678, doi:10.1175/MWR-D-14-00045.1.

DelSole, T., and M. K. Tippett, 2015: Laplacian eigenfunctions for climate analysis. *J. Climate*, **28**, 7420–7436, doi:10.1175/JCLI-D-15-0049.1.

Fukunaga, K., 1990: *An Introduction to Statistical Pattern Recognition*. 2nd ed. Academic Press, 591 pp.

Hastie, T., R. Tibshirani, and J. H. Friedman, 2003: *Elements of Statistical Learning*. Corrected ed. Springer, 552 pp.

Hoffman, R. N., and E. Kalnay, 1983: Lagged average forecasting, an alternative to Monte Carlo forecasting. *Tellus*, **35A**, 100–118, doi:10.1111/j.1600-0870.1983.tb00189.x.

Johnson, N. C., D. C. Collins, S. B. Feldstein, M. L. L’Heureux, and E. E. Riddle, 2013: Skillful wintertime North American temperature forecasts out to 4 weeks based on the state of ENSO and the MJO. *Wea. Forecasting*, **29**, 23–38, doi:10.1175/WAF-D-13-00102.1.

Kistler, R., and Coauthors, 2001: The NCEP–NCAR 50-Year Reanalysis: Monthly means CD-ROM and documentation. *Bull. Amer. Meteor. Soc.*, **82**, 247–267, doi:10.1175/1520-0477(2001)082<0247:TNNYRM>2.3.CO;2.

Li, S., and A. W. Robertson, 2015: Evaluation of submonthly precipitation forecast skill from global ensemble prediction systems. *Mon. Wea. Rev.*, **143**, 2871–2889, doi:10.1175/MWR-D-14-00277.1.

Newman, M., P. D. Sardeshmukh, C. R. Winkler, and J. S. Whitaker, 2003: A study of subseasonal predictability. *Mon. Wea. Rev.*, **131**, 1715–1732, doi:10.1175//2558.1.

Pegion, K., and P. D. Sardeshmukh, 2011: Prospects for improving subseasonal predictions. *Mon. Wea. Rev.*, **139**, 3648–3666, doi:10.1175/MWR-D-11-00004.1.

Rowell, D. P., 1998: Assessing potential seasonal predictability with an ensemble of multidecadal GCM simulations. *J. Climate*, **11**, 109–120, doi:10.1175/1520-0442(1998)011<0109:APSPWA>2.0.CO;2.

Saha, S., and Coauthors, 2014: The NCEP Climate Forecast System version 2. *J. Climate*, **27**, 2185–2208, doi:10.1175/JCLI-D-12-00823.1.

Simmons, A. J., and A. Hollingsworth, 2002: Some aspects of the improvement in skill of numerical weather prediction. *Quart. J. Roy. Meteor. Soc.*, **128**, 647–677, doi:10.1256/003590002321042135.

Taylor, J., and R. J. Tibshirani, 2015: Statistical learning and selective inference. *Proc. Natl. Acad. Sci. USA*, **112**, 7629–7634, doi:10.1073/pnas.1507583112.

Trenary, L., T. DelSole, M. K. Tippett, and K. Pegion, 2017: A new method for determining the optimal lagged ensemble. *J. Adv. Model. Earth Syst.*, doi:10.1002/2016MS000838, in press.

Vitart, F., 2014: Evolution of ECMWF sub-seasonal forecast skill scores. *Quart. J. Roy. Meteor. Soc.*, **140**, 1889–1899, doi:10.1002/qj.2256.

Wang, H., A. Kumar, W. Wang, and B. Jha, 2012: U.S. summer precipitation and temperature patterns following the peak phase of El Niño. *J. Climate*, **25**, 7204–7215, doi:10.1175/JCLI-D-11-00660.1.

Wang, W., M.-P. Hung, S. J. Weaver, A. Kumar, and X. Fu, 2014: MJO prediction in the NCEP Climate Forecast System version 2. *Climate Dyn.*, **42**, 2509–2520, doi:10.1007/s00382-013-1806-9.

Wheeler, M. C., and H. Hendon, 2004: An all-season real-time multivariate MJO index: Development of an index for monitoring and prediction. *Mon. Wea. Rev.*, **132**, 1917–1932, doi:10.1175/1520-0493(2004)132<1917:AARMMI>2.0.CO;2.

Yang, X., and T. DelSole, 2012: Systematic comparison of ENSO teleconnection patterns between models and observations. *J. Climate*, **25**, 425–446, doi:10.1175/JCLI-D-11-00175.1.