Recent studies, based largely on analyses of reanalysis datasets, suggest that the Northern Hemisphere winter storm track activity has increased significantly during the second half of the twentieth century. In this study, this increasing trend, in terms of filtered mean sea level pressure (MSLP) variance statistics, is assessed using surface ship observations and a statistical storm track model.
MSLP observations made by ships, archived as part of the reanalysis project conducted by the National Centers for Environmental Prediction–National Center for Atmospheric Research, have been analyzed. Observational errors are estimated by comparing reports of nearly collocated observations. Consistent with previous studies, the observational errors of ship pressure observations are found to be very large during the late 1960s and early 1970s. Without correcting for observational errors, the storm track activity over the Pacific, computed based on ship observations, is found to be decreasing with time, while the upward trend in the Atlantic is much smaller than that found in the reanalysis data. Even after corrections have been made to account for secular changes in observational error statistics, the ship-based trend in the Pacific is still found to be much smaller than that found in the reanalysis, while over the Atlantic, the corrected ship-based trend is consistent with that found in the reanalysis.
The robustness of the results is tested by application of data trimming based on the reanalysis products. Ship observations that are different from the reanalysis by more than a prescribed limit are removed before the statistics are computed. As the prescribed limit is reduced from 30 to 2.5 hPa, the ship-based storm track activity becomes increasingly consistent with that based on the reanalysis. However, even when the smallest limit is used, the trends computed from the ship observations are still smaller than those computed from the reanalysis, strongly suggesting that the trends in the reanalysis are biased high. Nevertheless, the results suggest that decadal-scale variability of the Atlantic storm track activity is not very sensitive to the trimming limit, while results for the Pacific storm track are not as robust.
As an independent corroboration of the ship observation results, a statistical model is used to test whether the storm track trend found in the reanalysis is dynamically consistent with observed mean flow change. Five hundred winters of GCM simulations are used to construct a linear model based on canonical correlation analysis (CCA), using monthly mean distribution of MSLP anomalies as a predictor to hindcast monthly mean MSLP variance. The Atlantic storm track in the CCA model hindcast is well correlated with the storm track in the reanalysis on both interannual and decadal time scales, with the hindcast trend being 82% of that found in the reanalysis. Over the Pacific, the CCA hindcast does not perform as well, and the hindcast trend is only 32% of that found in the reanalysis.
The results of this study suggest that the actual trend in Pacific storm track activity is probably only about 20%–60% of that found in the reanalysis, while over the Atlantic, the actual trend is likely to be about 70%–80% of that found in the reanalysis. Two new basinwide storm track indices, which should contain less bias in the secular trends, have been defined based mainly on ship observations.
The midlatitude storm tracks are locations where baroclinic waves and their associated cyclones/anticyclones are most prevalent. These waves/storms transport large amounts of heat, momentum, and moisture poleward as part of the maintenance of the global circulation, at the same time bringing strong winds and significant weather to observers at the surface. Hence there are two popular ways to define the storm tracks. The first takes advantage of the weather-generating potential of surface cyclones and defines storm tracks based on cyclone statistics (e.g., Petterssen 1956; Whitaker and Horn 1982). Alternatively, one can define the storm tracks based on eddy variance/covariance statistics to directly account for the eddy transport by these waves (e.g., Blackmon 1976; Blackmon et al. 1977). While these two definitions are by no means exactly congruent (Wallace et al. 1988; Paciorek et al. 2002), both pick out the Pacific and Atlantic basins as the most active regions during the Northern Hemisphere (NH) cool season.
A number of recent studies, mostly based on analyses of reanalysis data produced by the National Centers for Environmental Prediction–National Center for Atmospheric Research [NCEP–NCAR 40-Year Reanalysis (referred to herein as NNR; see Kalnay et al. 1996; Kistler et al. 2001)], as well as the 40-yr European Centre for Medium-Range Weather Forecasts (ECMWF) Re-Analysis (ERA-40; see Uppala et al. 2005), have suggested that the NH winter storm track activity has increased significantly over the second half of the twentieth century. For example, studies such as Serreze et al. (1997), Graham and Diaz (2001), Geng and Sugi (2001), Gulev et al. (2001), Wang et al. (2006), and others suggest that intense cyclone activity over the North Pacific and North Atlantic has exhibited significant increasing trends along with decadal-time-scale oscillations during the past 40 yr. Meanwhile, Chang and Fu (2002) and Hu et al. (2004) have shown that the NH winter storm track activity, when measured in terms of eddy variance and covariance statistics, has also increased significantly.
As pointed out by Graham and Diaz (2001), there are other observations that indirectly support an increase in the NH winter storm track activity. Graham and Diaz (2001) compared observed wave data from buoys off the U.S. West Coast to wave hindcast data and suggested that these observations are consistent with the wind increase seen in the NNR. These results are confirmed by more recent wave hindcast studies by Wang and Swail (2001, 2002) and Caires and Sterl (2005a, b). Based on examination of visually observed wave height from ship reports, Gulev and Grigorieva (2004, 2006) found positive trends in significant wave height over both the Pacific and Atlantic over the last 50 yr.
Figure 1 shows storm track activity computed from the NNR, as indicated by the variance of mean sea level pressure (MSLP) filtered using a 24-h difference filter.1 Figure 1a shows the average over the 14 winters [December–January–February (DJF)] of 1957/58–1970/71, while Fig. 1b shows the average over the DJFs of 1985/86–1998/99. Their difference (Fig. 1b minus Fig. 1a) is shown in Fig. 1c. It is clear that, according to the NNR, both the Atlantic and Pacific storm tracks are significantly more active during the second period.
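The 24-h difference filter can be sketched in a few lines: the filtered variance is simply the variance of p(t + 24 h) − p(t), which suppresses low-frequency variability while retaining synoptic time scales. The sketch below is illustrative (the sampling interval and synthetic series are assumptions, not NNR data):

```python
import numpy as np

def filtered_variance(mslp, steps_per_day=4):
    """Variance of 24-h-differenced MSLP at one grid point.

    mslp : 1-D array of 6-hourly MSLP values (hPa).
    The 24-h difference p(t + 24 h) - p(t) damps slow variability and
    retains synoptic (roughly 1.2-6 day) fluctuations.
    """
    diff = mslp[steps_per_day:] - mslp[:-steps_per_day]
    return np.var(diff)

# Synthetic demo: a 4-day synoptic wave riding on a slow seasonal drift;
# the drift contributes almost nothing to the filtered variance.
t = np.arange(0, 90, 0.25)                       # 90 days, 6-hourly, in days
p = 1010.0 + 8.0 * np.sin(2 * np.pi * t / 4) + 0.05 * t
print(filtered_variance(p))
```

Note that a slow oscillation (say, a 30-day period) passes through this filter with its variance reduced by more than an order of magnitude, which is why the statistic isolates storm track activity.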
The year-to-year variations in basin-averaged 24-h filtered MSLP variance are shown in Fig. 2. In Fig. 2a, Pacific storm track activity is indicated by averaging over the area 30°–55°N, 140°E–130°W. The results from the two reanalysis datasets show a high degree of consistency with each other, especially for surface analyses in the NH.2 The data show significant year-to-year variability, as well as a clear upward trend. This trend and the decadal modulations can be seen more clearly by examining the five-winter running means displayed in Fig. 2b.
The year-to-year variability in the Atlantic storm track activity, as indicated by the 24-h filtered MSLP variance averaged over the region 35°–65°N, 60°W–0°, is shown in Fig. 2c. Again the two reanalyses agree very well. The five-winter running means are shown in Fig. 2d. Again there is significant decadal modulation, as well as an overall upward trend. The results shown in Fig. 2 are qualitatively consistent with those computed based on cyclone count statistics in studies such as Graham and Diaz (2001) and Geng and Sugi (2001).
While NH storm track activity displays a clear upward trend in the reanalysis datasets, several recent studies have pointed out that there could be biases in climate trends determined based on reanalysis data. For example, Harnik and Chang (2003) computed 24-h filtered meridional velocity variance at 300 hPa using rawinsonde data and found that the trend seen in the rawinsonde data is smaller than that found in NNR. However, most of the rawinsonde data analyzed by Harnik and Chang are located over land, away from the peaks of the storm tracks. Bengtsson et al. (2004) computed the global total kinetic energy from ERA-40 data and found an increasing trend, but they suggested that the trend is due to huge changes in the global observing system that occurred in 1979 (mainly the assimilation of satellite data after 1979) and that after the effects of such changes are corrected for, no significant change in global kinetic energy is apparent. However, the results of Bengtsson et al. showed that the impact of satellite data is much larger in the Southern Hemisphere (SH) than in the NH; hence, its implications for the NH storm track trend are not entirely clear.
Since storm track activity in the reanalysis datasets is likely to be influenced by changes in the data input, it would be useful to use a single type of observational data to validate the upward trend in storm track activity. Near the peaks of the storm tracks, the main observational dataset that has existed over much of the period covered by the reanalyses is ship observations. Zorita et al. (1992) have examined variability of the standard deviation of MSLP taken from the monthly summary statistics of the International Comprehensive Ocean–Atmosphere Data Set (ICOADS; see Woodruff et al. 1987; Worley et al. 2005). Chang (2005, hereafter C05) examined MSLP observations over the central North Pacific contained in the ICOADS and found that, because of changes in the frequency and quality of observations, there may be time-dependent biases in the climate statistics computed directly from ship observations. C05 made preliminary attempts to correct for the biases and concluded that the secular trends in sextile and unfiltered variance statistics of MSLP computed from the ship observations over the Pacific are likely to be significantly smaller than similar trends derived from the NNR.
As shown in Figs. 1 and 2, apart from the trend, there is substantial interannual and interdecadal variability in storm track activity based on the reanalysis datasets. What could be the cause behind such variability? Previous studies (e.g., Lau 1988; Cai and Mak 1990) have shown that storm track variability is strongly tied to that of the mean flow. A substantial part (but not all) of storm track variability can be linked to prominent modes of atmospheric low-frequency variability, such as the North Atlantic Oscillation (NAO) or ENSO (e.g., Rogers 1997; Chang and Fu 2002). However, in most instances, the direction of causality is not entirely clear, in that there exists a two-way feedback: changes in storm track activity (i.e., eddy transports) can drive low-frequency flow anomalies, while changes in the low-frequency flow can act to organize storm track anomalies (e.g., Branstator 1992, 1995; see also Chang et al. 2002).
Regardless of the direction of causality, one can make use of this “symbiotic” relationship to test whether the storm track increase found in the reanalysis data is dynamically consistent with observed changes in the low-frequency flow. Chang and Fu (2003) constructed a linear storm track model based on canonical correlation analysis (CCA; e.g., Bretherton et al. 1992), using analyzed monthly/seasonal mean flow anomalies to hindcast monthly/seasonal mean storm track anomalies. Based on that model, Chang and Fu (2003) concluded that the upward trend in storm track activity in the Pacific is inconsistent with the trend in the low-frequency flow. However, since the model of Chang and Fu (2003) was constructed based on only 60 months of reanalysis data, the variability predicted by that model appears weak, over both the Pacific and the Atlantic. Compo and Sardeshmukh (2004) constructed a similar model using a much longer time series of GCM data and found that their model performs better than that of Chang and Fu (2003). In this study, we will reexamine Chang and Fu’s results using a CCA model constructed based on 1500 months of GCM data.
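The structure of such a CCA-based hindcast can be illustrated with a short numpy sketch: truncate predictor and predictand fields with an SVD (EOF) decomposition, pair them through the SVD of their whitened cross-covariance, and damp each mode by its canonical correlation before reconstructing the predictand. This is a generic sketch of the technique, not the actual Chang and Fu (2003) or Compo and Sardeshmukh (2004) implementation; the function names, truncation choices, and damping are illustrative assumptions.

```python
import numpy as np

def cca_hindcast_model(X, Y, n_modes=3):
    """Build a CCA-based linear hindcast.

    X : (n_samples, nx) predictor anomalies (e.g., monthly mean MSLP).
    Y : (n_samples, ny) predictand anomalies (e.g., monthly MSLP variance).
    Returns a function mapping new predictor fields to hindcast Y fields.
    """
    Xm, Ym = X.mean(0), Y.mean(0)
    # Whiten each field with its own SVD (assumes full-rank training data).
    Ux, sx, Vx = np.linalg.svd(X - Xm, full_matrices=False)
    Uy, sy, Vy = np.linalg.svd(Y - Ym, full_matrices=False)
    # SVD of the whitened cross-covariance yields the canonical pairs;
    # r holds the canonical correlations.
    A, r, Bt = np.linalg.svd(Ux.T @ Uy, full_matrices=False)
    A, r, B = A[:, :n_modes], r[:n_modes], Bt[:n_modes].T

    def hindcast(Xnew):
        # Project onto canonical predictor patterns, damp each mode by its
        # canonical correlation, then reconstruct the predictand field.
        scores = ((Xnew - Xm) @ Vx.T / sx) @ A
        return Ym + (scores * r) @ B.T @ np.diag(sy) @ Vy

    return hindcast
```

In practice the GCM months would supply the training pairs (X, Y), and the analyzed monthly mean flow would be fed to the returned function to hindcast the storm track.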
The main objective of this paper is to assess the upward trend in storm track activity found in the reanalysis data using two methods: 1) compare with variance statistics derived directly from ship observations and 2) use a statistical storm track model to assess whether the storm track trend is consistent with trends in the low-frequency flow. To indicate storm track activity, the 24-h filtered MSLP variance statistics will be used. While this is only one of many indicators of storm track activity, and certainly does not capture all aspects of storm track variability, it is used here because it can be easily computed directly from observations (see, e.g., Harnik and Chang 2003), while alternative measures such as cyclone count statistics cannot be readily derived from ship observations. In addition, Figs. 1 and 2 suggest that there is substantial basinwide interdecadal variability, so only basin-averaged variance statistics will be examined here. Regional variability and trends will be examined elsewhere.
2. Ship observations
a. Spatial and temporal coverage
The main datasets used in this study are the NNR 2.5° × 2.5° reanalysis grids and ship observations from the binary universal format representation (BUFR) observational dataset archived as part of the NNR project (see Kalnay et al. 1996; Kistler et al. 2001). The NNR observational dataset is used for this study (instead of ICOADS) because it not only contains the observations but also the analysis and first-guess fields interpolated onto the location and time of the observation, as well as quality control (QC) flags stating whether the observation is accepted by the reanalysis. The ICOADS dataset will also be examined for comparison.
In this study, the observations will be binned onto a 5° × 5° latitude–longitude grid. Apart from the observations, ICOADS contains two sets of monthly mean summary statistics [Monthly Summary Trimmed Groups (MSTGs)], computed on 1° × 1° and 2° × 2° latitude–longitude grids, respectively. However, C05 showed that even for the 2° × 2° grid, there are on average less than 20 observations per month over the central North Pacific during the 1950s, early 1960s, and mid to late 1990s. With such a small number of observations, reliable statistics cannot be derived (see, e.g., Kidson and Trenberth 1988). Hence a coarser grid than that used by ICOADS has to be used here.
The amount of ship observations available that have valid MSLP reports is shown in Fig. 3. Shown in Fig. 3a is the percentage of 6-hourly synoptic reporting hours (124 for December and January, and 112 or 116 for February) that have one or more ship observations within a 6-h period centered on the synoptic hour, during the 14 winters (DJF) of 1957/58–1970/71. Over the Atlantic, most grid boxes have observations at least 80% of the time. The situation is not as good in the Pacific, but nearly all grid boxes have observations at least 30% of the time, and most have observations more than 50% of the time. The percentage coverage for the winters of 1985/86–1998/99 is shown in Fig. 3b. The coverage is not as good during the latter period but is still very good over much of the Atlantic, while over the Pacific, the majority of grid boxes have observations at least 40% of the time.
The average coverage for each winter over the central North Pacific (30°–55°N, 150°E–150°W) is shown by the solid line on Fig. 3c, while the coverage for the Atlantic (35°–60°N, 60°W–0°) is shown by the dashed line. Over the Pacific, the coverage is greater than 60% between the mid 1960s and early 1990s but drops to just over 40% in the late 1990s. Over the Atlantic, the coverage is greater than 70%, except for one winter in the 1990s. The average number of observations per grid box per synoptic time is shown in Fig. 3d. Over the Pacific, there is an average of one to two observations during much of the period, while over the Atlantic, the average number of observations is more than two for the entire period, and more than three most of the time.
b. Estimation of the true variance
As discussed in the introduction, the measure of storm track activity used in this study is the 24-h filtered MSLP variance. The variance computed directly from observations reflects not only the true variance but also observational and sampling errors, as follows:

VARO = VART + VARE + VARSA + VARSP,   (1)

where VARO is the variance directly computed from observations, VART is the true variance, VARE is the observational error variance, VARSA the error variance introduced due to sampling (or missing data), and VARSP the error variance due to random distribution of the locations of the ship observations over the 5° × 5° grid boxes. To estimate VART, not only is VARO needed but also estimates of the last three terms on the right-hand side of (1).
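Rearranging (1) gives the correction applied to the observed variance once the three error terms have been estimated. A trivial sketch, with purely illustrative numbers (not values from this study):

```python
def true_variance(var_obs, var_err, var_sa, var_sp):
    """Rearranged Eq. (1): VART = VARO - VARE - VARSA - VARSP (all in hPa**2)."""
    return var_obs - var_err - var_sa - var_sp

# Illustrative numbers only: observed variance of 48 hPa**2 corrected for
# assumed error, sampling, and spatial contributions.
print(true_variance(var_obs=48.0, var_err=6.0, var_sa=3.0, var_sp=2.0))  # 37.0
```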
1) VARE

As discussed in Kent and Berry (2005), there are a number of sources of random observational error, including barometer corrections, rapid changes in temperature, errors in reporting and transmission, and errors in ship position. Kent and Berry (2005) used a technique called the semivariogram method (see Lindau 2003; Kent et al. 1999) to estimate errors in ship observations. Essentially, the observational error is estimated by comparing pairs of observations that are nearly collocated in space and time. In this study, the time separation between the pairs must be less than 3 h. For spatial separation, the distance between the pairs is binned into 10-km bins, out to a maximum separation of 100 km. The mean square difference in MSLP between the observation pairs is first computed separately for each bin, and then a linear regression is performed, with the intercept (at separation 0 km) taken as the estimated observational error squared. Errors estimated based on the NNR dataset are shown in Fig. 4.
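The semivariogram estimate described above reduces to binning the squared pair differences by separation and extrapolating a linear fit back to zero separation. A minimal sketch, assuming pair separations and MSLP differences have already been computed (binning choices follow the text; some semivariogram conventions carry an extra factor of 2, but here the intercept is taken directly as the squared error, as stated above):

```python
import numpy as np

def semivariogram_error(separations_km, pressure_diffs,
                        bin_width=10.0, max_sep=100.0):
    """Estimate random observational error from nearly collocated pairs.

    separations_km : distance between the observations of each pair (km).
    pressure_diffs : MSLP difference of each pair (hPa).
    The mean square difference is binned by separation; a linear fit is
    extrapolated to 0-km separation, where the real field difference
    vanishes, and the intercept is taken as the squared error.
    """
    sep = np.asarray(separations_km, float)
    sq = np.asarray(pressure_diffs, float) ** 2
    edges = np.arange(0.0, max_sep + bin_width, bin_width)
    centers, msd = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (sep >= lo) & (sep < hi)
        if mask.any():
            centers.append(0.5 * (lo + hi))
            msd.append(sq[mask].mean())
    slope, intercept = np.polyfit(centers, msd, 1)
    return np.sqrt(max(intercept, 0.0))
```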
In Fig. 4, it can be seen that over much of the period, the estimated observational error (dashed line) is significantly larger than that assumed by the NNR (1.6 hPa, marked by the horizontal dashed line). The errors are especially large during the late 1960s and early 1970s. At present, it is not clear why the observational errors are so large during that period, except that there are suggestions that this may be related to either data transmission or archival errors during the early days of the Global Telecommunications System (E. C. Kent 2004, personal communication). The results shown in Fig. 4 are consistent with those of Kent and Berry (2005).
Application of the NNR QC reduces the observational error (solid line) only slightly before 1980, with more significant error reduction since 1980 and especially after 1993, probably because more metadata are available about the more recent observations, which leads to the rejection of a larger proportion of questionable observations by NNR.
The ICOADS dataset also contains QC flags in addition to the observations. The observations contained in the two datasets have been compared, and during most months more than 90% of the data contained in the NNR dataset can be matched with an observation in ICOADS. The observational error estimated from the observations that pass both the NNR and ICOADS QC [the ICOADS QC used here is the one applied to generate ICOADS standard statistics (see Woodruff et al. 1987)] is also shown in Fig. 4. Compared with the error estimated using the NNR QC alone, application of the ICOADS QC in fact increases the estimated error during some years. This is most likely because the ICOADS QC identifies more duplicates than the NNR QC does. Nevertheless, Fig. 4 shows that even after application of QC, the observational error remains unacceptably large, especially during the late 1960s and early 1970s.
To examine the observational error in more detail, the distribution of the logarithm (base e) of the number of pairs of observations within 100 km of each other, within the latitude range 30°–60°N, as a function of absolute magnitude of observed pressure difference between the observation pair (Δp), is shown in Fig. 5. Note that only observations that passed NNR QC are retained. The distribution for the winter of 1971/72, when the estimated observation error is large, is shown in Fig. 5a, while that for 1991/92, when the estimated error is much lower, is shown in Fig. 5b. Also shown in each plot are two Gaussian distributions, one with the standard deviation equal to the observed standard deviation (dashed line) and the other being the fit to the observed distribution for small Δp (i.e., a Gaussian fit to the number distribution for Δp ≤ 5 hPa).
Examining Fig. 5, it is clear that the distribution of the number of observation pairs as a function of Δp is not Gaussian. The part of the distribution for Δp ≤ 5 hPa is well fitted by a Gaussian, but for Δp > 10 hPa, both distributions resemble an exponential instead of a Gaussian. The problem is particularly acute for 1971/72, when more than 7% of the pairs of observations have Δp > 10 hPa. For 1991/92, only 1.1% have Δp > 10 hPa. The long exponential tails for the two distributions account for the large standard deviation.
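The Gaussian fit to the core of the distribution (the solid curves in Fig. 5) can be obtained by noting that, for a zero-mean Gaussian, the logarithm of the bin counts is linear in Δp squared. A sketch under that assumption (bin choices are illustrative):

```python
import numpy as np

def gaussian_core_sigma(dp, fit_limit=5.0, bin_width=0.5):
    """Width of a Gaussian fitted to the core (|dp| <= fit_limit hPa) of
    the pair-difference distribution, ignoring any exponential tail.

    For a zero-mean Gaussian, log(count) is linear in dp**2 with slope
    -1/(2*sigma**2), so sigma follows from a linear fit.
    """
    edges = np.arange(0.0, fit_limit + bin_width, bin_width)
    counts, _ = np.histogram(np.abs(dp), bins=edges)
    centers = 0.5 * (edges[:-1] + edges[1:])
    keep = counts > 0
    slope, _ = np.polyfit(centers[keep] ** 2, np.log(counts[keep]), 1)
    return np.sqrt(-1.0 / (2.0 * slope))

# Demo: a Gaussian core contaminated by a 5% exponential tail; the core
# fit is barely affected by the outliers, which is the point of Fig. 5.
rng = np.random.default_rng(0)
good = rng.normal(0.0, 2.0, 19000)
bad = rng.exponential(10.0, 1000) * rng.choice([-1, 1], 1000)
print(gaussian_core_sigma(np.concatenate([good, bad])))
```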
Could the exponential tails be due to actual pressure differences between the observations, since they are not exactly collocated? To assess this possibility, the distribution for 1971/72 is recomputed with the maximum distance reduced to 50 km (and also 25 km). One would expect that with reduced distances between the pairs, the percentage of pairs with large Δp should decrease if the pressure difference is real. However, even with the smaller separation, similar percentages of large pressure differences are obtained.
As discussed above, the NNR observational archive not only contains the observations but also the analysis interpolated to the observation time and location. The distribution of Δp in the analysis for the pairs of observations shown in Fig. 5a is shown in Fig. 6 (crosses). There are some pairs with Δp of up to 25 hPa, and the distribution for number of pairs with Δp > 10 hPa does resemble an exponential tail, but the exponential tail does not extend beyond Δp of 30 hPa. The dashed line shown in Fig. 6 is the convolution of the Gaussian fit shown as the solid line in Fig. 5a, with the distribution of number of pairs as a function of Δp in the analysis (crosses in Fig. 6), and represents the expected distribution of number of pairs as a function of Δp if the real pressure difference is given by that obtained from the analysis, while the observational error is given by the solid curve shown in Fig. 5a. Comparisons of Fig. 6 to Fig. 5a, as well as the similarities in the distribution of pressure differences for ships separated by different distance ranges, discussed in the preceding paragraph, suggest that it is unlikely that the long exponential tails seen in Fig. 5 can be due to real pressure differences experienced by the ships.
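The expected distribution (dashed line in Fig. 6) is a convolution of the real pair-difference distribution, taken from the analysis, with the Gaussian error-difference distribution. A sketch of that convolution on a signed Δp grid, with grid spacing and kernel extent as illustrative assumptions:

```python
import numpy as np

def expected_pair_distribution(analysis_dp_counts, dp_grid, sigma):
    """Convolve the analysis pair-difference counts with a Gaussian
    error-difference distribution of width sigma (hPa).

    analysis_dp_counts : counts of signed analysis dp on dp_grid.
    Returns the expected counts if the real difference is given by the
    analysis and the observation error by the fitted Gaussian.
    """
    step = dp_grid[1] - dp_grid[0]
    # Symmetric, normalized Gaussian kernel on the same grid spacing,
    # truncated at 4 sigma.
    half = int(np.ceil(4.0 * sigma / step))
    kg = np.arange(-half, half + 1) * step
    kernel = np.exp(-kg ** 2 / (2.0 * sigma ** 2))
    kernel /= kernel.sum()
    return np.convolve(analysis_dp_counts, kernel, mode="same")
```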
Some of the errors are probably due to coding or location errors in the archives. As an example, for the weather ship “K,” which is usually located near 16°W, there is one observation on 7 January 1972 showing its position at 46°W, while observations at adjacent times indicate that it was at 16°W. Nevertheless, the observation carries a QC flag indicating that it was not rejected by NNR.3 One possible way of removing some of this type of error is by tracking the positions of ships. However, previous attempts (E. C. Kent 2004, personal communication) have found only minor reductions in the errors after application of ship tracking to remove location errors. Moreover, since many ship reports do not contain the ship’s call sign (more than half of the observations in the NNR archive during the 1960s and early 1970s have the generic call sign “SHIP”), ship tracking is impractical.
2) VARSA and VARSP
Over most of the grid boxes, there are times when there are no observations (see Fig. 3c). Kidson and Trenberth (1988) and, more recently, Gulev et al. (2007) have shown that data gaps will introduce errors in the estimation of variance and covariance statistics. Kidson and Trenberth (1988) also showed that the sampling error becomes increasingly severe as the fraction of missing data increases. Hence, to minimize this error, statistics computed from any grid box will only be defined if there are more than 45 observations in that grid box within a month. The cutoff value of 45 is basically a compromise: using larger values would render variances over most grid boxes in the central Pacific undefined, while smaller values would allow grid boxes with large sampling errors to contaminate the statistics.
The NNR grid value represents the value of the MSLP at the center of a grid box. However, ship observations are not located at the box centers, and the spatial variability in MSLP within a grid box can add additional variance onto the observed variance. The contribution of this spatial variance, together with the sampling error discussed above, can be estimated using the NNR. Taking the NNR gridpoint data as the “truth,” the dataset can be used to generate a “simulated ship report” dataset by interpolating the NNR data to the exact location and time of each available ship report. The variance computed from this “simulated ship report” dataset (called NNR-INTP herein) can then be compared with the variance computed using the full NNR dataset (NNR-FULL), and the difference between the two will give an estimate of the combined effects of VARSA and VARSP. This is similar to the strategy used by Chang (2003) for aircraft observations. In practice, the analysis interpolated to the time and location of the observation is already part of the NNR BUFR archive, so no additional interpolation needs to be performed. Comparison between NNR-INTP and ship observations (corrected for VARE) will show directly the biases between NNR and ship observations.
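The NNR-INTP versus NNR-FULL comparison amounts to computing the filtered variance twice, once from the complete analysis series and once from the same analysis subsampled at the ship observation times and locations, and differencing the two. A minimal sketch, in which gaps and within-box spatial scatter are represented crudely by NaNs and additive noise (both are illustrative assumptions, not the archive's actual structure):

```python
import numpy as np

def filtered_var(series, steps_per_day=4):
    """Variance of 24-h differences of a 6-hourly series, ignoring gaps
    (NaN entries are excluded from the variance)."""
    d = series[steps_per_day:] - series[:-steps_per_day]
    return np.nanvar(d)

def sampling_spatial_error(nnr_full, nnr_intp, steps_per_day=4):
    """Estimate VARSA + VARSP as the excess filtered variance of the
    subsampled analysis (NNR-INTP) over the full analysis (NNR-FULL)."""
    return (filtered_var(nnr_intp, steps_per_day)
            - filtered_var(nnr_full, steps_per_day))

# Demo with a synthetic red-noise pressure series: NNR-INTP is mimicked by
# adding small spatial scatter and deleting 30% of the reports.
rng = np.random.default_rng(0)
full = 1010.0 + 0.5 * np.cumsum(rng.standard_normal(480))
intp = full + 0.3 * rng.standard_normal(480)
intp[rng.random(480) < 0.3] = np.nan
print(sampling_spatial_error(full, intp))
```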
Because VARSP is estimated based on the NNR, the effects of subsynoptic-scale spatial variability, such as mesoscale or frontal variability, will be missing from this estimate. However, it can be argued that much of this subsynoptic spatial variability will in fact be aliased as part of the observational error (VARE), since for the semivariogram method to work well, the fields are assumed to be smooth so that one can extrapolate from finite separation to zero separation. Thus, unresolved spatial features will be aliased by this technique as part of the observational error. Hence underestimation of VARSP should be largely compensated by overestimation of VARE.
c. Data trimming based on NNR
The preceding discussion shows that application of both the NNR and ICOADS QC fails to reduce the large observational errors in the ship observations. One option is to correct the observed variance by subtracting the estimated error variance; this is one of the strategies used in the next section. However, because the errors are so large during the 1960s and early 1970s (see Fig. 4), this procedure substantially alters the trend in the variance. Hence an alternative strategy is desirable for assessing the robustness of the results.
One of the QC criteria for ICOADS standard statistics is data trimming, in which observations that differ from the climatological median for that grid box by over 2.8 times the climatological standard deviation for that grid box will be removed before the computation of the statistics. It is shown in Fig. 4 that this criterion fails to reduce the large observational errors. However, motivated by this criterion, an alternative strategy is developed. It is shown in Fig. 3 that over the entire period, there are on average one or more observations per 5° × 5° grid box per analysis. Hence one would expect that the analysis should not be too far from the truth. For example, NNR assumes a first-guess error for MSLP that varies between 2 and 3 hPa in the midlatitudes. While this may be too optimistic, one can compute the difference between the NNR and ERA-40 and use that as a lower bound for the analysis error. For example, for January 1972, the rms difference in MSLP between the two reanalyses is 1.8 hPa in the Pacific and 1.5 hPa for the Atlantic. For January 1992, the rms difference between the two reanalyses is about 1.4 hPa for both the Pacific and Atlantic. Hence one might expect that it should not be too frequent that the reanalysis MSLP would differ from the “truth” by, say, 10 hPa or more.
The strategy taken here will be as follows. Each observation will be compared with the analysis interpolated to the location and time of the observation, and all observations that differ from the analysis by a prespecified amount will be trimmed from the dataset. The prespecified limit for data rejection will be varied to examine the robustness of the results to this (largely subjectively specified) limit. The limit tested ranges from 30 hPa down to 2.5 hPa.
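The trimming step itself is a simple threshold on the observation-minus-analysis difference. A minimal sketch (the values in the demo are illustrative):

```python
import numpy as np

def trim_observations(obs, analysis_at_obs, limit_hpa=10.0):
    """Remove ship observations differing from the collocated reanalysis
    MSLP by more than limit_hpa (limits tested range from 30 to 2.5 hPa).

    Returns the retained observations and the boolean keep mask.
    """
    obs = np.asarray(obs, float)
    keep = np.abs(obs - np.asarray(analysis_at_obs, float)) <= limit_hpa
    return obs[keep], keep

# Illustrative values: the middle report differs from the analysis by
# almost 40 hPa and is trimmed with a 10-hPa limit.
kept, mask = trim_observations([988.5, 1025.9, 987.1],
                               [988.0, 987.5, 986.8], limit_hpa=10.0)
print(kept)
```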
The effect of the data trimming on the distribution of Δp of nearly collocated (i.e., separated by less than 100 km) observation pairs for 1971/72 is shown in Fig. 7. The dashed curve, which represents the expected distribution if we assume that the actual observation error distribution of nonoutliers can be represented by a Gaussian fit to the distribution for small Δp, is reproduced from Fig. 6. From Fig. 7, a trimming limit of 5 hPa (circles) seems to fit the dashed curve best, while a limit of 3 hPa (triangles) clearly produces too narrow a distribution.
To test the effectiveness of the trimming strategy, high-quality ocean weather station (OWS) data are used. Gulev et al. (2000) have demonstrated the feasibility of using these data for climate studies of synoptic activity. During DJF of 1971/72, 10 OWSs are available over the Atlantic, and their locations are listed in Table 1. The MSLP reports from the OWSs and nearby ships (less than 100 km and 3 h) are compared, and the distribution of the pressure difference between the two reports is tabulated. The distribution for OWS I is shown in Fig. 8a. For this particular OWS, a total of 2456 observation pairs involving the OWS and another ship can be found during that winter. Out of these, 165 pairs (6.7%) have an observed pressure difference larger than 10 hPa, and 105 pairs (4.3%) have a difference larger than 20 hPa.
Assuming that the observation error of OWS reports is small, data trimming based on NNR is first performed only on observations other than those from OWSs. Figure 8b shows the result of the trimming using a limit of 10 hPa. After trimming, the number of observations that differ from OWS reports by large amounts clearly decreased; 2318 observation pairs are left, with 36 pairs (1.6%) having an observed pressure difference larger than 10 hPa, and 16 pairs (0.7%) larger than 20 hPa. Out of the 138 pairs of observations that have been trimmed, 129 pairs (93%) have an observed pressure difference larger than 10 hPa. This suggests that most of the observations removed by the trimming strategy are observations that are questionable, since they differ from nearby high quality OWS observations by more than 10 hPa.
Inspecting Fig. 8b, we see that even after data trimming, there are still observation pairs with pressure differences as large as 30 or 40 hPa. To see what causes such large pressure differences, all observation pairs with a pressure difference of over 20 hPa have been examined. It turns out that all of these appear to be due to errors in the OWS report. Two kinds of errors are found. The first kind is due to errors in the OWS position, similar to the error described near the end of section 2b(1). The other kind appears to be errors in the OWS MSLP reports. As an example, the following sequence of hourly MSLP observations (in hPa) was reported by OWS I between 2200 UTC 7 January and 0200 UTC 8 January 1972: 988.5, 988.0, 1025.9, 987.1, 986.2. Clearly, the observation of 1025.9 hPa at 0000 UTC is questionable. At that time, a nearby ship reported an MSLP of 987.2 hPa, and this makes up one of the pairs with an observed pressure difference larger than 20 hPa. These errors in the OWS reports are probably due to errors in data transmission or archival.
Analyses similar to those shown in Fig. 8 have been conducted for all 10 OWSs. Taken together, 21,879 close pairs of observations are found involving one OWS and another ship. Out of these, 1007 pairs (4.6%) have an observed pressure difference over 10 hPa, with 418 (1.9%) over 20 hPa. Using 10 hPa as the trimming limit, the number of pairs is reduced to 21,158, with 386 (1.8%) having a pressure difference over 10 hPa, and 129 (0.6%) over 20 hPa. Hence, out of the 721 pairs removed, 621 (86%) have a pressure difference over 10 hPa. This shows that in the vicinity of all the OWSs, most of the ship reports that are removed by data trimming using NNR as reference are probably rejected correctly. Of the remaining 129 pairs that have a pressure difference of over 20 hPa, at least 124 are found to be likely due to errors in the OWS report, with the remaining cases possibly due to spatial or temporal variability.
The discussions above show that even OWS reports in the data archives contain errors. Hence, data trimming is applied to all observations, including OWS reports. The result of the trimming using a limit of 10 hPa, for observation pairs involving OWS I, is shown in Fig. 8c. As expected, after the trimming, all pairs with large pressure differences are removed.
The effect of the data trimming on the observational error estimate for all winters is shown in Fig. 9. The solid line, reproduced from Fig. 4, is the error estimate when no data trimming is applied, while the dashed curve is the estimate based on fitting each winter's frequency distribution as a function of Δp (for Δp ≤ 5 hPa) to a Gaussian (solid lines in Fig. 5). It is clear that data trimming with a limit of 5 or 10 hPa difference from NNR reduces the estimated observational error to a value close to that assumed by NNR, and close to the Gaussian fit to the frequency distribution of Δp for small Δp.
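The Gaussian-fit error estimate can be sketched as follows. Assuming the two reports in a collocated pair have independent errors of equal standard deviation σ, the pair difference has variance 2σ², so the fitted pair-difference standard deviation is divided by √2; restricting the fit to |Δp| ≤ 5 hPa excludes the long tail of gross errors. The function name and binning details are illustrative:

```python
import numpy as np

def error_std_from_pairs(dp, fit_limit_hpa=5.0, nbins=21):
    """Estimate the single-report MSLP error std from collocated-pair
    pressure differences dp, via a Gaussian fit to the histogram of
    small differences (log counts are quadratic in dp for a Gaussian)."""
    dp = np.asarray(dp, dtype=float)
    dp = dp[np.abs(dp) <= fit_limit_hpa]
    counts, edges = np.histogram(dp, bins=nbins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    mask = counts > 0
    # log N(dp) = a - dp**2 / (2 * s**2): quadratic least-squares fit
    coef = np.polyfit(centers[mask], np.log(counts[mask]), 2)
    s_pair = np.sqrt(-1.0 / (2.0 * coef[0]))  # std of the pair difference
    return s_pair / np.sqrt(2.0)              # std of a single report
```

For differences drawn from a Gaussian consistent with a 1.6-hPa single-report error (the value assumed by NNR), the estimate recovers approximately 1.6 hPa.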
The fraction of observations trimmed is shown in Fig. 9b. For a limit of 10 hPa, less than 5% of observations are trimmed for most winters, except for the early 1970s, when up to 9% of observations are trimmed. For a limit of 5 hPa, more data are rejected, but, apart from the early 1970s, less than 10% of observations are trimmed.
One may object that, since the data trimming employed here is based on the NNR, the trimmed data are necessarily consistent with the NNR and thus bound to give results close to those computed from it. In fact, this is the main point: since the data have been trimmed to be more or less consistent with NNR, if the trend computed from the trimmed data is still inconsistent with that computed from the NNR, then one can be more confident that the inconsistency between the observations and NNR is probably not due to secular changes in the observational error alone but may be due, at least in part, to biases in the NNR.
3. Assessing storm track trends using ship observations
a. Without data trimming
The 24-h filtered MSLP variances, computed based on ship observations (dashed lines) and NNR (solid lines), are shown in Figs. 10a–d (Pacific) and 10e–h (Atlantic). In Figs. 10a,b and 10e,f, the observed variances (VARO) are not corrected for observational errors (VARE), while those shown in Figs. 10c,d and 10g,h have been so corrected. Observational errors have been estimated separately for the Pacific and the Atlantic, but the difference between the two basins is relatively minor.
As discussed in section 2, variance computed directly from ship observations (VARO) is also contaminated by sampling error (VARSA) and spatial variability (VARSP). To take this into account, while the variances computed from the NNR shown in Figs. 10a,b and 10e,f are based on all NNR data (NNR-FULL), those shown in Figs. 10c,d and 10g,h are based on NNR interpolated to the observation times and locations (NNR-INTP). The NNR-INTP variance thus includes the effects of VARSA and VARSP, so the two curves shown in the bottom four panels (ship observations corrected for observational errors, and NNR-INTP) are directly comparable.
There are instances when more than one observation falls within the same grid box during a 6-h period. While averaging the observations together would reduce the observational error for that grid box at that time, it would also make it difficult to correct for observational errors in the variance computation, since averaging changes the error characteristics of the data. Hence, when more than one observation is present, one observation is randomly picked to compute the variance. To utilize the observations more fully, the variance for each month is computed 10 times, each time randomly picking one observation for any grid box holding more than one, and the 10 variances are averaged to form the statistic for that month.
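The repeated-random-pick averaging can be sketched as below. For clarity the sketch computes a plain variance, whereas the paper computes 24-h filtered variances; the slot structure and names are illustrative:

```python
import numpy as np

def monthly_variance_random_pick(obs_by_slot, n_repeat=10, rng=None):
    """Variance of MSLP values when some (grid box, 6-h) slots hold several
    reports.  obs_by_slot: list of lists, one inner list per slot.  For each
    repeat, one report per slot is drawn at random; the n_repeat variances
    are averaged, preserving single-report error characteristics while
    using more of the data."""
    rng = np.random.default_rng(rng)
    variances = []
    for _ in range(n_repeat):
        picks = [slot[rng.integers(len(slot))] for slot in obs_by_slot]
        variances.append(np.var(picks))
    return float(np.mean(variances))
```

When every slot holds a single report, the result reduces to the ordinary variance of the reports.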
Figure 10a shows the unadjusted statistics for the Pacific. Clearly, the variance computed from the ship observations is significantly higher than that computed from NNR, especially during the earlier part of the record. This is not surprising, since observational errors are estimated to be large during that period. Nevertheless, there is still quite a bit of correspondence between the year-to-year variability of the two time series, with the correlation between the two time series being 0.59 after detrending.4
The lower-frequency component of the interannual variability is shown in Fig. 10b by the 5-yr running means. The NNR time series clearly shows an upward trend (equivalent to +14.5 hPa2 over the 42-yr period; see Table 2), while the time series computed from uncorrected ship observations actually shows a decreasing trend (Table 2). Note that a similar trend computed based on ERA-40 data comes out to be +13.1 hPa2 over the same period.
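The trends quoted as totals "over the 42-yr period" are least-squares linear trends scaled by the record length; a minimal sketch (function name illustrative):

```python
import numpy as np

def trend_over_period(years, values, span=42):
    """Least-squares linear trend of `values` against `years`, expressed
    as the total change over `span` years (e.g. hPa**2 per 42 yr)."""
    slope = np.polyfit(np.asarray(years, dtype=float),
                       np.asarray(values, dtype=float), 1)[0]
    return float(slope * span)
```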
In Fig. 10c, the ship observations have been corrected for observational errors (i.e., VARO − VARE), while the NNR statistics have been computed based on the NNR-INTP dataset. Now the two time series correspond much better, with the correlation increased to 0.80. However, the variance computed based on NNR-INTP tends to be higher than that computed from the ship observations after 1970, while the opposite is true in the 1960s. This can be seen more clearly in the 5-yr running means shown in Fig. 10d. The NNR-INTP time series shows a slightly smaller upward trend (+13.8 hPa2 over 42 yr) than that based on NNR-FULL, while the ship observation time series shows only a small upward trend (+2.8 hPa2 over 42 yr). Hence, while the applied observational error correction does succeed in increasing the trend computed from the ship observations, the corrected trend is still much smaller than that computed based on NNR.
Similar statistics for the Atlantic are shown in Figs. 10e–h. Again, the variance computed from the ship observations, uncorrected for observational error (dashed lines in Figs. 10e–f), is significantly larger than that computed based on NNR-FULL. However, the winter-to-winter variability between the two time series is better correlated than their Pacific counterparts, with a correlation of 0.77. While the decadal-time-scale variabilities agree quite well with each other (Fig. 10f), the trend computed from the ship observations (+2.4 hPa2 over 42 yr; see Table 3) is much smaller than that computed based on NNR-FULL (+17.1 hPa2 over 42 yr). The trend computed based on ERA-40 data is +14.9 hPa2 over 42 yr.
The variances computed from ship observations after correction for observational errors, and from NNR-INTP, are shown in Figs. 10g,h. It is clear that after these adjustments, the two time series show excellent agreement, with a correlation of 0.92. Even the trends agree very well: the trend computed from the NNR-INTP time series is +14.7 hPa2 over 42 yr (again somewhat smaller than that computed based on NNR-FULL), while that from the ship observations corrected for observational errors comes out to be +14.2 hPa2.
b. With data trimming based on NNR
The same procedure as described in the preceding subsection has been applied to ship observations trimmed when they differ from NNR by a value larger than a prescribed limit. Different limits have been prescribed to test the robustness of the results, including 30, 15, 10, 7.5, 5, and 2.5 hPa.
First, the results based on a limit of 30 hPa are shown in Fig. 11. Over the Pacific (Figs. 11a,b), the variance computed using uncorrected ship observations is still significantly higher than that computed based on NNR-FULL, but the winter-to-winter variability now correlates better between the two time series (correlation equals 0.73). However, the trend based on ship observations uncorrected for VARE (but trimmed) is still negative (Table 2).
After the corrections for observational errors have been applied to the ship observations, and the NNR is adjusted for VARSA and VARSP (Figs. 11c,d), the two time series come into much better agreement, with a correlation of 0.89. However, the long-term trend in the variance computed based on ship observations (+5.0 hPa2 over 42 yr) is still only about 36% of that computed based on the NNR (see Table 2).
Over the Atlantic, the ship-based variance, uncorrected for observational error (Figs. 11e,f), is again significantly larger than that computed based on NNR, but the correlation between the two is quite high (r = 0.84). After the ship variance has been corrected for observational errors, and the NNR data interpolated to ship observation location and time (Figs. 11g,h), the two time series come into excellent agreement (r = 0.97). The trends in the adjusted variances are +11.8 and +14.4 hPa2 over 42 yr, respectively, with the ship-based trend being 82% of that based on NNR-INTP. Comparing Fig. 11 to Fig. 10, we can see that even with the trimming limit set as high as 30 hPa (with fewer than 4% of ship observations trimmed for all years), the agreement between the ship- and NNR-based time series has already become much better.
The results, when the trimming limit is set at a value of 5 hPa, are shown in Fig. 12. With the limit set at this rather low value, even the variance based on ship observations but uncorrected for observational errors now correlates well with the NNR-FULL time series (Figs. 12a and 12e), but the ship-based variance is still a bit larger than the NNR-based variance (see also Figs. 12b and 12f). After correction for observational errors, the agreement between the variance computed based on ship observations and NNR-INTP becomes excellent in both basins (Figs. 12c and 12g). The ship-based trends, though, are still smaller than those computed based on the NNR. Over the Pacific, the ship-based trend is about 60% that of the NNR-based trend, while over the Atlantic, the ship-based trend is about 72% that of the NNR-based trend.
The sensitivity of the computed trends to the data-trimming limit can be seen in Tables 2 and 3. Over the Pacific, the ship-based trend, after correction for observational errors, increases monotonically from only 20% of the NNR-INTP trend when no data trimming is performed to 74% of the NNR-INTP trend when the trimming limit is set to 2.5 hPa. Over the Atlantic, when no data trimming is done, the corrected ship-based trend comes out to be 97% of the NNR-INTP trend. With data trimming, the ship-based trend decreases to about 70% of the NNR-INTP trend for trimming limits between 15 and 5 hPa, and then increases again when the trimming limit is set below 5 hPa. It is of interest to note that even when a trimming limit of 2.5 hPa is used, the ship-based trends for both basins are still smaller than those computed based on NNR. This strongly suggests that the trends in the MSLP variance computed from NNR are probably biased high. However, it is clear that the Atlantic trend is much more robust than that over the Pacific.
The sensitivity of the decadal-time-scale variability to the data-trimming limit is highlighted in Fig. 13. For the Atlantic (Fig. 13b), regardless of whether trimming is employed, the decadal variability is very robust: activity is lowest during the late 1960s and early 1970s, with a secondary minimum near 1980, and highest around 1990 and during the mid-1970s. Over the Pacific (Fig. 13a), the results are not as robust. When no trimming is employed (black line), the storm track activity has minima during the early 1970s, near 1980, and during the early 1990s, and maxima during the mid-1960s, mid-1980s, and late 1990s. When the trimming limit is decreased to 10 or 5 hPa (blue and red lines), the storm track activity has its minimum during the early 1960s, with secondary minima during the late 1970s and early 1990s. Overall, the results shown in Fig. 13 and Tables 2 and 3 indicate that we can have much higher confidence in the decadal variability and secular trend of the Atlantic storm track than in those of the Pacific storm track.
The spatial distribution of the difference in 24-h filtered MSLP variance between the winters of 1985/86–1998/99 and 1957/58–1970/71 is shown in Fig. 14. Figure 14c shows the difference as computed based on all NNR data (NNR-FULL). The difference as computed based on ship observations using 10 hPa as the trimming limit is shown in Fig. 14b, while that based on NNR-INTP is shown in Fig. 14a. Based on the NNR, storm track activity generally increased over the central Pacific and northern Atlantic. The pattern computed from the ship observations is largely consistent with that computed based on NNR, except that over the Pacific there are a few grid boxes with larger negative changes and fewer grid boxes with large positive changes. These grid boxes give rise to the smaller positive trend in storm track activity found in the variance computed from ship observations. Figure 14 also shows that some regions, such as the Labrador Sea, are undersampled, which is probably why the trend computed from the NNR-INTP dataset is slightly smaller than that computed based on the NNR-FULL dataset.
4. Assessing storm track trends using a statistical storm track model
Regardless of whether storm track variations excite anomalies in the low-frequency flow (e.g., Branstator 1992) or changes in the low-frequency flow act to organize storm track anomalies (e.g., Branstator 1995), storm track anomalies are found to be strongly tied to anomalies in the mean flow (e.g., Lau 1988; Cai and Mak 1990). Exploiting this “symbiotic relationship,” Chang and Fu (2003) constructed a linear storm track model based on CCA (e.g., Bretherton et al. 1992), using analyzed monthly/seasonal mean flow anomalies to hindcast monthly/seasonal mean storm track anomalies. A similar model, but constructed based on GCM data, was used by Compo and Sardeshmukh (2004). Their results suggest that a CCA model based on GCM data may perform better than one based on analyzed data because of the much larger amount of GCM data available.
Here, a similar model is constructed using 1500 months of GCM simulation. The GCM simulation used is derived from the Atmospheric Models Intercomparison Project (AMIP) runs conducted at the Geophysical Fluid Dynamics Laboratory (GFDL), using their Atmospheric Model Version 2.0 (AM2.0; see GFDL Global Atmospheric Model Development Team 2004). The model is forced by prescribed SST and diurnally varying solar insolation, and outputs are available from 1950 to 2000. An ensemble of 10 runs has been made. Here, NH winter (DJF) data from all 10 runs are analyzed; hence a total of 1500 months of data are available.
The model is run at relatively fine resolution: 2° latitude by 2.5° longitude in the horizontal, with 24 unevenly spaced hybrid sigma-pressure levels in the vertical. The climatological distribution of the model storm tracks (in terms of 24-h difference filtered MSLP variance), as well as its interannual variability, is very close to that found in the NNR in both amplitude and spatial pattern (not shown).
Monthly mean MSLP anomalies are used as the predictor, while anomalies in the monthly mean variance of MSLP, filtered using the 24-h difference filter, are used as the predictand. The CCA is performed with prefiltering (Barnett and Preisendorfer 1987), with only the leading 50 EOFs retained for each field. However, the results are not sensitive to the truncation limit. After the CCA relationship is obtained from the GCM data, the model is then applied to hindcast the observed storm track variations, using seasonal mean MSLP anomalies from NNR as the predictor. Based on this approach, the part of the storm track variability that is dynamically consistent with the observed mean flow anomalies can be obtained.
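The prefiltered CCA can be sketched as follows. This is a generic Barnett–Preisendorfer-style implementation, not the paper's code: EOF truncation of both fields, standardization of the PCs (which whitens them, so CCA reduces to an SVD of their cross-correlation matrix), and regression of the predictand canonical variates on the predictor variates. All names and the mode truncation are illustrative:

```python
import numpy as np

def fit_cca_hindcast(X, Y, n_eof=50, n_modes=10):
    """Fit a prefiltered CCA and return a hindcast function.
    X: (n_samples, n_points) predictor anomalies (e.g. mean MSLP);
    Y: (n_samples, n_points) predictand anomalies (e.g. MSLP variance)."""
    n_eof = min(n_eof, min(X.shape) - 1, min(Y.shape) - 1)

    def eofs(A, k):
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        return U[:, :k] * s[:k], Vt[:k]          # PC series, spatial patterns

    px, vx = eofs(X, n_eof)
    py, vy = eofs(Y, n_eof)
    sx, sy = px.std(0), py.std(0)
    ax, ay = px / sx, py / sy                    # standardized (whitened) PCs
    # CCA on whitened PCs: SVD of their cross-correlation matrix
    L, r, Rt = np.linalg.svd(ax.T @ ay / len(ax), full_matrices=False)
    L, r, Rt = L[:, :n_modes], r[:n_modes], Rt[:n_modes]

    def hindcast(Xnew):
        a = (Xnew @ vx.T) / sx                   # project onto predictor EOFs
        u = a @ L                                # predictor canonical variates
        b = (u * r) @ Rt                         # regressed predictand PCs
        return (b * sy) @ vy                     # back to physical space
    return hindcast
```

Trained on GCM anomaly pairs, the returned function can then be applied to NNR mean-flow anomalies, yielding the part of the storm track variability that is linearly consistent with the mean flow.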
The results of the CCA hindcast are shown in Figs. 15 and 16. Given the analyzed monthly mean MSLP fields, the CCA-model-generated storm track anomalies over the Atlantic are well correlated with the analyzed anomalies, with the two time series correlating at 0.67 (Fig. 15c). The decadal-time-scale variability is also well captured (Fig. 15d). Over the Pacific, the results are not as good, with the two time series correlating at a modest value of 0.51 (Fig. 15a). The decadal variability in the model hindcast is also much weaker than that found in the NNR.
The point-by-point correlation between the CCA hindcast and NNR storm track anomalies is shown in Fig. 16a. Over much of the Atlantic and eastern Pacific, the correlation is over 0.5. However, over the western Pacific, the correlation is quite low. The secular trend of the storm track is indicated by the difference between the 14-winter means of 1985/86–1998/99 and 1957/58–1970/71. The difference in the NNR is shown in Fig. 16c, while that from the CCA hindcast is shown in Fig. 16d. In the NNR, both the Pacific and Atlantic storm tracks show a substantial increase in activity. In the CCA hindcast, the Atlantic storm track clearly shows a substantial increase, while the Pacific storm track shows a much more modest increase, mainly to the southwest of the Aleutians. The CCA hindcast also indicates a substantial decrease in storm track activity over the western Pacific, but over that region the CCA hindcast is not well correlated with NNR (Fig. 16a). Overall, if only the regions where the correlation between the CCA hindcast and the NNR storm track anomalies exceeds 0.5 are considered (i.e., the shaded regions in Fig. 16a), both storm tracks in the CCA hindcast show an increase in activity between the two periods; the increase in the Atlantic is 82% of that in the NNR, while over the Pacific, the increase is only 32% of that found in the NNR.
The small trend in the Pacific storm track found in the CCA hindcast is not unexpected. The difference in the seasonal mean MSLP anomalies between the two 14-yr periods is shown in Fig. 16b. Over the Atlantic, there is a significant increase in the strength of both the Icelandic low and the Azores high, indicating a strong increase in the midlatitude westerlies, which is consistent with a significant increase in storm track activity. However, over the Pacific, while there is an increase in the strength of the Aleutian low, the increase is moderate (a pressure fall of only 3.5 hPa) and is in fact not statistically significant at the 5% level. Note that the CCA model of Chang and Fu (2003), constructed using pairs of analyzed anomalies (from 1979 to 2002) instead of GCM simulations, also failed to hindcast the strong increase in Pacific storm track activity found in NNR. These results suggest that the large increase in Pacific storm track activity found in the NNR is not dynamically consistent with the analyzed mean flow change.
5. Summary and conclusions

Based on reanalysis data, NH winter storm track activity, as indicated by time-filtered MSLP variance, shows a substantial basinwide increase during the second half of the twentieth century, over both the northern Pacific and Atlantic. However, several previous studies have suggested that there could be spurious climate trends in the reanalysis data. In this paper, MSLP variances computed from ship observations are compared to those computed based on NNR in order to validate this trend.
Observed variances computed directly from ship reports are contaminated by observational errors, temporal data gaps, and the irregular spatial distribution of the ship observations. The impacts of temporal sampling and spatial variability can be estimated by comparing variances based on NNR interpolated to the observation locations and times with those based on the full NNR dataset, and these effects are found to be relatively minor. In contrast, the bias due to observational errors is found to be much larger. The observational error is estimated by comparing pressure reports of nearly collocated pairs of observations. It is generally larger than the 1.6 hPa assumed by NNR and is particularly large during the late 1960s and early 1970s. Thus subtracting the observational error variance introduces a correction to the trend that is of the same order of magnitude as the trend found in the NNR. After such a correction, the trend found from the ship observations over the Atlantic is consistent with the trend in the NNR, while that over the Pacific is only 20% of that found in the NNR.
Closer examination of the observational error suggests that there is a long exponential tail of observations with large MSLP errors that are not rejected by the QC of either the NNR or ICOADS. An alternative strategy is developed to remove those observations with large errors. Noting that most areas over the storm track regions have relatively frequent observations, it is hypothesized that the NNR MSLP analysis should not be too different from the actual MSLP most of the time. Hence, observations that are significantly different from the value given by the NNR are trimmed from the dataset before the variance is computed. Different trimming limits, ranging from 30 to 2.5 hPa, are employed to test the robustness of the results.
The data-trimming strategy is tested using high quality OWS data, and the results show that in the vicinity of the OWSs, most of the reports rejected by this strategy have pressure observations that differ significantly from the OWS reports and hence are questionable. After the application of the trimming, the trend in Pacific storm track activity computed based on the ship observations increases monotonically from 36% to 74% of that computed based on NNR as the trimming limit is reduced (Table 2), while the trend over the Atlantic, computed based on ship observations, generally lies between 70% and 80% of that computed based on NNR (Table 3). These results suggest that the trends computed based on the NNR are likely to be biased high.
Making use of the symbiotic relationship that previous studies have found between storm track and mean flow anomalies, a statistical model that relates storm track to mean flow change is constructed to test the dynamical consistency of the NNR storm track trend. Five hundred winters of GCM simulations are used to construct a CCA model, using monthly mean distribution of MSLP anomalies as a predictor to hindcast monthly mean MSLP variance. The Atlantic storm track in the CCA model hindcast is well correlated with the storm track in the NNR for both interannual and decadal time scales, with the hindcast trend being 82% of that found in the NNR. Over the Pacific, the CCA hindcast does not perform as well, and the hindcast trend is only 32% of that found in the NNR. These results, together with the results based on ship observations, suggest that the actual trend in Pacific storm track activity is probably only about 20%–60% of that found in the NNR, while over the Atlantic the actual trend is likely to be about 70%–80% of that found in the NNR. The large bias over the Pacific and much smaller bias over the Atlantic are consistent with the results of Harnik and Chang (2003), Chang and Fu (2003), and C05.
Based on the discussions above, a basinwide storm track activity index computed solely based on the NNR (as well as ERA-40) is expected to contain biases in the trend and thus is not ideal for climate diagnostic purposes. It is proposed that an index based on ship observations should be used instead. The index favored here is the one computed using NNR data trimming, with a moderate trimming limit of 10 hPa. This is basically a compromise between being too constrained by the NNR when smaller trimming limits are used and having to apply too large observational error corrections when larger trimming limits are used. In addition, the trends (Tables 2 and 3) computed based on this trimming limit lie within the ranges deemed plausible in the preceding paragraph. Separate indices for the Pacific and Atlantic are tabulated in Tables 4 and 5, respectively. Users of these indices should note that while the interannual variability of these indices is probably quite robust, the decadal-time-scale variability and long-term trend for the Pacific storm track is quite uncertain.
The results of this study suggest that the trend in NH winter storm track activity, in both the NNR and the ERA-40, is probably biased high. At present, it is not clear what leads to this bias, but it is likely due to the influence of changes in the observing system. Further data-constrained assimilation experiments should be performed to examine this issue.
In this study, the focus has been solely on the basinwide upward trend in storm track activity. With the availability of ship data over much of the basins, and high quality OWS data during some of the years over many locations, especially in the Atlantic, regional variability and trends can be studied, with possible supplement by observations from nearby coastal stations.
Because cyclone count statistics are not readily derivable from ship observations, this study has examined only the trend in MSLP variance statistics. However, since MSLP variance is clearly tied to cyclone intensity, biases in the trend in MSLP variance imply that there are probably biases in the cyclone statistics derived from the reanalysis datasets. Further studies should be conducted to examine how the two types of storm track statistics are quantitatively related.
The author would like to thank the anonymous reviewers for comments that have helped to significantly improve this paper. The NCEP/NCAR reanalysis data and the observational data archive were obtained from NCAR with the assistance of the DSS. ERA-40 data were downloaded from the ECMWF data server. GFDL AMIP simulations data were downloaded from the GFDL data portal. This research is supported by NOAA Grants NA16GP2540 and NA06OAR4310084.
Corresponding author address: Dr. Edmund Chang, ITPA/MSRC, Stony Brook University, Stony Brook, NY 11794-5000. Email: email@example.com
The response function of the 24-h filter (after dividing by 2) is shown in Fig. 2 of Wallace et al. (1988). The peak response for the filter is at a period of 2 days, with half-power points at periods of 1.2 and 6 days. Chang and Fu (2002) examined different variance and covariance statistics computed using this and other bandpass filters and found that they show very similar variability.
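Assuming a gap-free, regularly sampled series (ship reports are in practice irregular), the 24-h difference filter statistic can be sketched as:

```python
import numpy as np

def filtered_variance(p, steps_per_day=4):
    """Variance of 24-h MSLP differences, halved.

    p: regularly sampled MSLP series (default 6-hourly, steps_per_day=4).
    The variance of p(t) - p(t - 24 h) is divided by 2, giving a bandpass
    response that peaks at a 2-day period, with half-power points near
    periods of 1.2 and 6 days."""
    p = np.asarray(p, dtype=float)
    diff = p[steps_per_day:] - p[:-steps_per_day]
    return 0.5 * float(np.var(diff))
```

For a pure 2-day sinusoid of amplitude A, the 24-h difference is exactly out of phase with the original series, and the result is A² (twice the raw variance of A²/2), reflecting the filter's peak response at that period.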
Note that in the NH upper troposphere, the two reanalyses are not as consistent, as the trend in upper-level variance computed from ERA-40 is smaller than that computed based on NNR. Results regarding storm track activity in the NH upper troposphere will be presented elsewhere.
This particular observation also passed ICOADS QC for standard statistics.
As discussed below, the time series show significant differences in the trends; hence, to highlight interannual variations, all correlations are computed based on detrended time series.