Tropical cyclones (TCs) are identified and tracked in six recent reanalysis datasets and compared with those from the IBTrACS best-track archive. Results indicate that nearly every cyclone present in IBTrACS over the period 1979–2012 can be found in all six reanalyses using a tracking and matching approach. However, TC intensities are significantly underrepresented in the reanalyses compared to the observations. Applying a typical objective TC identification scheme, it is found that the largest uncertainties in TC identification occur for the weaker storms; this is exacerbated by uncertainties in the observations for weak storms and lack of consistency in operational procedures. For example, certain types of storms, such as tropical depressions, subtropical cyclones, and monsoon depressions, are not included in the best-track data for all reporting agencies. There are definite improvements in how well TCs are represented in more recent, higher-resolution reanalyses; in particular MERRA-2 is comparable with the NCEP-CFSR and JRA-55 reanalyses, which perform significantly better than the older MERRA reanalysis.
Tropical cyclones (TCs) are one of the most damaging weather-related natural hazards on the planet, causing 42% of the United States catastrophe-insured losses in the period 1992–2011 (King 2013). Individual intense events can result in severe losses. For example, Hurricane Katrina resulted in an estimated death toll of 1833 people and financial losses of over $125 billion (Adeola and Picou 2014). Weaker storms such as tropical depressions can also have an impact in terms of loss of life and disruption in vulnerable societies (ECLAC 2009). It is therefore important to utilize the available data and new analysis techniques to better understand their properties and behavior, with the aim of mitigating their societal, economic, and environmental impacts.
Because of the relatively short observational record of TCs, and problems with sampling within the record, there is considerable uncertainty in the variability of TCs in terms of frequency over climate time scales of the last 100 yr (Landsea 2007; Landsea et al. 2009), resulting in uncertainty in the interannual variability and trend detection. The use of reanalyses to detect TCs provides an opportunity to reduce this uncertainty (Truchelut et al. 2013), by allowing the creation of a larger data sample that, when used in conjunction with the historic observational data, can help to provide more confidence in TC numbers than the observations alone. Reanalyses combine observations with a short forecast from a general circulation model (GCM) to produce gridded datasets, constrained by observations, with regular output intervals, and can act as a bridge between the observations of TCs and simulated tempestology. However, there can be problems in using reanalyses related to the changing observing system, in particular the introduction of spurious trends (Bengtsson et al. 2004a) and the fact that different reanalyses use different GCMs with different parameterizations and different data assimilation methods, all of which can contribute to differences between them. The study of Schenkel and Hart (2012) previously considered the representation of TCs in the Northern Hemisphere in several reanalyses, including several of those used in this study, by manually tracking the best-track TCs in the reanalyses. They found considerable variation between the reanalyses in TC properties such as location, and a consistently large underestimate of intensity (10-m winds and mean sea level pressure) for all the reanalyses. This uncertainty in the representation of TC properties in reanalyses can introduce uncertainty into their automated detection in these data, so that the objective detection criteria are often tailored to the particular reanalysis of interest (Murakami 2014).
Another motivation for a careful study of the properties of TCs as represented by reanalyses is that they are often used as a means of calibrating TC detection and tracking schemes before applying them to climate models (Bengtsson et al. 2007a). This is done by first applying the detection to the reanalyses or operational analyses and adjusting the detection criteria to give similar numbers of TCs to those found in the observations provided by the TC warning centers’ best-track data. This may be problematic if there are large differences between how reanalyses represent TCs in terms of their properties, such as structure and intensities, or if there are biases in the best-track data.
The reanalysis model dynamical core, parameterizations, and resolution all play a critical role in determining the output of extreme events in reanalysis data. These vary widely; in particular, newer generations of reanalyses are produced at higher resolutions and with modern data assimilation systems.
For climate models, the IPCC Fifth Assessment Report (IPCC 2013) stated that there is medium evidence and high agreement that year-to-year count variability of Atlantic hurricanes can be well simulated by modest resolution (100 km or finer) atmospheric GCMs (AGCMs) forced by observed sea surface temperatures (SSTs). Both Strachan et al. (2013) and Roberts et al. (2015) show that 60 km is adequate for simulating interannual variability, although not intensity.
Recent work by Murakami (2014) showed that, when considering five reanalyses (also included in this study), the highest-resolution reanalysis is not always the best in terms of simulating the TC climatology and properties, nor do the higher-resolution reanalyses produce significantly more intense storms than those with lower resolutions, suggesting that the simulation of TCs in the reanalyses is highly dependent on model formulation (Schenkel and Hart 2012) and/or data assimilation strategy. However, if we can understand the uncertainties of TCs in the reanalyses, they may provide a useful means of extending the observations—for example, by extending the identified TC life cycles to include the extratropical transition (Jones et al. 2003) and beyond, which is useful for TC-related extratropical risk analysis and GCM assessment (Haarsma et al. 2013).
The use of reanalysis could also assist in the identification of subtropical and hybrid tropical storms (Roth 2002; Guishard et al. 2009), which are also associated with severe weather, providing a more complete set of tropical storm data for use in GCM assessment than is perhaps currently present in best-track data; the inclusion of these types of storms in the best-track datasets is highly variable between the operational centers.
The main aim of this paper is to quantify the uncertainties in how well TCs are represented in a number of recent reanalyses, and how this affects the objective identification of TCs in reanalyses. This is achieved by exploring the following:
how well reanalyses represent the observed TCs in the best-track data using direct track matching, and
how well an objective identification scheme identifies the best-track TCs in the reanalyses and what might be the cause of differences.
2. Data and methods
Data from six recent reanalyses are used in this study and described below. Also used are best-track data produced by the tropical warning centers as postseason analyses of the TC tracks. These have been combined into the International Best Track Archive for Climate Stewardship (IBTrACS) dataset (Knapp et al. 2010) and are used in this study for verifying the TCs identified in the reanalyses. The IBTrACS-ALL, which includes data from all agencies, is used in this study. The common period of 1979–2012 is used throughout for all datasets, except for one reanalysis where the period is 1980–2012.
Throughout the rest of the paper the following nomenclature is used: the term “tropical cyclone” (TC) is used for warm core storms generally and, where appropriate, the term “tropical storm” (TS) is used for TCs with wind speeds greater than 17 m s−1.
a. Best-track dataset
For full details of the IBTrACS-ALL dataset, see Knapp et al. (2010). The original wind speed data in knots are converted to wind speed in meters per second. The World Meteorological Organization (WMO) standard for reported tropical cyclone wind speed is maximum 10-min sustained winds at 10-m height over a smooth surface; however, this is rarely observed, so some discrepancy between agencies is apparent. Different agencies apply different wind-averaging periods, with the eastern Pacific, North Atlantic [Regional Specialized Meteorological Center (RSMC) Miami], and central Pacific (RSMC Honolulu) using 1-min averaging periods; the north Indian Ocean (RSMC New Delhi) using a 3-min period; and the other agencies using 10-min averaging periods (Schreck et al. 2014). The 10-min wind speeds are converted to 1-min wind speeds using a factor of 1.13, which has traditionally been used (Harper et al. 2010), and the data from RSMC Miami and New Delhi are used in their original form. However, there are uncertainties in the accuracy and fidelity of this conversion, with different conversion factors for at-sea, off-sea, off-land, and in-land parts of the storm suggested (Harper et al. 2010). Other uncertainties also exist in the best-track data, which have been discussed in several studies; a summary of these uncertainties can be found in the appendix of Hodges and Emerton (2015). They include issues relating to location and intensity uncertainties and operational differences between agencies. This is further discussed in the discussion section (section 4).
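The unit and averaging-period conversion described above can be sketched as follows. This is only an illustration under stated assumptions: the agency labels and the function name are hypothetical (IBTrACS uses its own identifiers), and leaving the RSMC Honolulu 1-min winds unscaled is an assumption based on their averaging period rather than something stated explicitly above.

```python
KNOT_TO_MS = 0.514444  # 1 kt = 1852 m / 3600 s

# Hypothetical agency labels, for illustration only.
ONE_MIN_AGENCIES = {"miami", "honolulu"}   # already 1-min sustained winds
THREE_MIN_AGENCIES = {"newdelhi"}          # 3-min winds, used in original form
TEN_TO_ONE_MIN_FACTOR = 1.13               # traditional factor (Harper et al. 2010)


def to_one_min_ms(wind_kt, agency):
    """Convert a best-track wind in knots to m/s on a 1-min basis.

    Winds from 10-min agencies are scaled by 1.13; winds from the
    1-min and 3-min agencies are only converted from knots to m/s.
    """
    wind_ms = wind_kt * KNOT_TO_MS
    if agency in ONE_MIN_AGENCIES or agency in THREE_MIN_AGENCIES:
        return wind_ms
    return wind_ms * TEN_TO_ONE_MIN_FACTOR
```

A single constant factor ignores the exposure-dependent factors (at-sea, off-sea, off-land, in-land) suggested by Harper et al. (2010), so the converted values inherit that uncertainty.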
For the analysis of the identified TCs in different ocean basins the IBTrACS basin boundaries (Knapp et al. 2010) have been used, with TCs assigned to a particular ocean basin, based on where the storm reaches maximum 10-m wind speed intensity.
b. Reanalysis datasets
Meteorological centers around the world produce reanalysis datasets as an ongoing enterprise. The reanalyses are essentially based on frozen operational numerical weather prediction (NWP) systems. New reanalyses are often released following significant improvements in the models and data assimilation schemes. The reanalyses differ in terms of the models and data assimilation methods used to produce them, so differences in their output are to be expected. Six recent global atmospheric reanalysis datasets have been analyzed for TCs in this study and are summarized in Table 1. They include the European Centre for Medium-Range Weather Forecasts (ECMWF) interim reanalysis (ERA-Interim, hereinafter ERAI; Dee et al. 2011); the Japanese 25-year Reanalysis (JRA-25) (Onogi et al. 2007) and 55-year Reanalysis (JRA-55) (Kobayashi et al. 2015); the National Aeronautics and Space Administration (NASA) Modern-Era Retrospective Analysis for Research and Applications (MERRA; Rienecker et al. 2011) and the following version 2 (MERRA-2; Bosilovich et al. 2015; Molod et al. 2015); and the National Centers for Environmental Prediction (NCEP) Climate Forecast System Reanalysis (CFSR; Saha et al. 2010). The NCEP-CFSR is the only coupled atmosphere–ocean–land surface–sea ice reanalysis. NCEP-CFSR, MERRA, and MERRA-2 all use different versions of the 3D variational data assimilation (3D-Var) scheme: the Grid-point Statistical Interpolation (GSI) scheme (Shao et al. 2016). For MERRA and MERRA-2 the Incremental Analysis Update (IAU; Bloom et al. 1996; Rienecker et al. 2011) system is also used. The data period used for all the reanalyses is 1979–2012, except for MERRA-2, which starts in 1980.
A key difference between the Japan Meteorological Agency (JMA) reanalyses and the reanalyses produced by the other agencies is the assimilation of tropical wind retrievals (TWR). Wind profile data over and around tropical cyclone centers are retrieved from historical data and processed and assimilated as if they were dropsonde observations (Hatsushika et al. 2006). With the integration of this additional wind data, the intensity of the storms in the JMA reanalyses is found to be improved (Hatsushika et al. 2006). Another difference between the reanalyses is that the NCEP-CFSR uses a technique to improve the representation of TCs by adjusting the location of the tropical vortex to its observed location before the assimilation of storm circulation observations (Saha et al. 2010). The MERRA-2 reanalysis also uses this method.
All the reanalyses in this study make use of quality control processes and bias correction for the diverse range of observations that are assimilated, such as the variational bias correction of satellite radiances (Dee and Uppala 2009).
c. Tropical cyclone detection method
The analysis of TCs in this study relies on identifying and tracking them. The first step is to track all tropical disturbances, in both hemispheres, before applying two different identification methods to separate the TCs from other tropical systems. This is different from some other schemes where the identification is performed during the tracking and hence only identifies the TC stage of the life cycle. Though not crucial to this study, the approach taken here identifies much more of the life cycle, including the precursor and post-extratropical transition stages (Jones et al. 2003).
For the first step, where all systems in the domain are tracked, the tracking methodology is based on Hodges (1994, 1995, 1999). The domain extends to 60°N in the NH and 60°S in the SH. The tracking method uses the 6-hourly relative vorticity at the levels 850, 700, and 600 hPa, vertically averaged. The data are spectrally filtered using triangular truncation to retain total wavenumbers 6–63. The spectral coefficients are also tapered to further smooth the data using the filter described in Sardeshmukh and Hoskins (1984). The spectral filtering acts to remove the noise associated with the smallest spatial scales in the vorticity, which produces more reliable tracking in data of this type, and to remove the large-scale background, which is also found to be beneficial. The tracking proceeds by identifying the off-grid vorticity maxima, by applying a maximization scheme (Hodges 1995), if they exceed a value of 5 × 10−6 s−1 in each time frame (SH scaled by −1). These are initially linked together using a nearest-neighbor approach and then refined by minimizing a cost function for track smoothness, subject to adaptive constraints on displacement distance and track smoothness (Hodges 1999). The use of the vertically averaged vorticity is different from some previous studies using this tracking algorithm, where the single level of 850-hPa vorticity reduced to T42 resolution was used (Strachan et al. 2013; Roberts et al. 2015; Bell et al. 2013; Bengtsson et al. 2007b; Manganello et al. 2012). The use of the vertically averaged vorticity is found to improve the temporal coherence when a vorticity maximum shifts between levels (Serra et al. 2010; Fine et al. 2016) and results in more of the system's life cycle being detected. A simple vertical average is found to be sufficient, even though the levels are not evenly spaced, since, once spectrally filtered, there is little difference from using the mass weighted vertical average.
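As a minimal sketch of the candidate selection step only, the simple vertical average and the vorticity threshold might be written as below. The function names are hypothetical, the inputs are assumed to be vorticity values that have already been spectrally filtered to wavenumbers 6–63, and the off-grid maximization and track linking of Hodges (1995, 1999) are not reproduced here.

```python
VORT_THRESHOLD = 5e-6  # s^-1, applied to the filtered, vertically averaged vorticity


def vertical_mean_vorticity(vort_850, vort_700, vort_600):
    """Simple (unweighted) average of relative vorticity over the three levels."""
    return (vort_850 + vort_700 + vort_600) / 3.0


def is_candidate_maximum(mean_vort, lat):
    """Check a vorticity maximum against the tracking threshold.

    Cyclonic vorticity is negative in the SH, so the value is
    scaled by -1 there before the comparison.
    """
    signed = mean_vort if lat >= 0.0 else -mean_vort
    return signed > VORT_THRESHOLD
```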
Only tracks that last at least 2 days (eight time steps) are retained for further analysis. While observed TCs can have lifetimes shorter than 2 days, this only covers the period when they are determined to be TCs, whereas the tracking scheme used here aims to identify the precursor and post-TC stages resulting in much longer lifetimes (see Figs. 1c,d) so that using the 2-day threshold is not detrimental to detecting nearly all the observed TCs in the reanalyses, as shown below in the results section (section 3).
Previous methods used to detect TCs in reanalysis or GCM data rely on applying particular criteria, representative of the properties of TCs, such as thresholds on intensity [e.g., mean sea level pressure (MSLP) minima, low-level wind intensities, or vorticity extrema] and a threshold on the warm core structure either determined directly as a temperature anomaly or inferred from the presence of decreasing winds or vorticity between the lower and upper troposphere [e.g., Bengtsson et al. (1995) and related methods]. These are often applied as part of the tracking scheme itself, which is different from the approach used here. The criteria are typically required to be satisfied contiguously for a minimum period of one day, and only over the ocean, which is enforced by imposing the land–sea mask. The criteria based on intensity and structure can be strongly dependent on the model resolution and how processes important to TC development, such as convection, microphysics, and surface drag, are represented in the model. This has resulted in some studies using resolution-dependent identification criteria (Walsh et al. 2007; Manganello et al. 2012) or tuning the identification criteria to maximize the detected TCs, for example in reanalyses compared with observations (Murakami 2014), and some studies have used basin-dependent criteria (Camargo et al. 2005). The study of Horn et al. (2014) has shown that the subjective choice of different identification criteria is the main reason for differences between the numbers of TCs identified by different identification schemes.
In this study a dual approach is taken to isolate the TCs from all the tracked systems. Taking the tracks identified in the first stage, where all systems are tracked, the first approach used to isolate the TCs aims to evaluate which of the observed TCs in the IBTrACS dataset can be found in the reanalyses, without applying any criteria dependent on intensity or structure. This approach makes use of spatiotemporal matching: a track in the reanalyses matches with a track in IBTrACS if the mean separation distance between them, computed over the time period that they overlap, is less than 4° (geodesic) and is the least mean separation distance if more than one track satisfies this criterion, where any amount of temporal overlap is allowed. This will be termed the “direct matching” method. A similar approach has previously been used for extratropical cyclones (Hodges et al. 2003). The relaxed criterion on the temporal overlap is chosen because, in general, the TCs in IBTrACS have much shorter lifetimes compared to the tracks in the reanalyses produced by the tracking scheme. Several diagnostics are produced from the matched tracks, such as the mean separation distance distribution, lifetime distribution, and intensity distribution based on low-level winds, at 10 m and 925 hPa, and MSLP.
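The direct matching rule can be illustrated with the following sketch. It is not the code used in the study: tracks are assumed to be dictionaries mapping a time stamp to a (latitude, longitude) pair in degrees, the function names are hypothetical, and the geodesic separation is computed with the haversine formula on a sphere.

```python
import math


def geodesic_deg(p, q):
    """Great-circle separation in degrees between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def mean_separation(track_a, track_b):
    """Mean separation over the time steps the two tracks share, or None if none."""
    common = track_a.keys() & track_b.keys()
    if not common:
        return None
    return sum(geodesic_deg(track_a[t], track_b[t]) for t in common) / len(common)


def direct_match(obs_track, reanalysis_tracks, max_sep_deg=4.0):
    """Return (track id, mean separation) of the closest reanalysis track.

    A track qualifies if its mean separation from the observed track is
    below max_sep_deg; any amount of temporal overlap is accepted, and the
    smallest mean separation wins. Returns (None, None) if nothing qualifies.
    """
    best_id, best_sep = None, max_sep_deg
    for track_id, track in reanalysis_tracks.items():
        sep = mean_separation(obs_track, track)
        if sep is not None and sep < best_sep:
            best_id, best_sep = track_id, sep
    return best_id, (best_sep if best_id is not None else None)
```

Because a single overlapping time step is enough to compute a mean separation, this rule is deliberately permissive, which suits the much shorter IBTrACS lifetimes noted above.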
The second approach used to isolate the TCs from all the tracked systems is to objectively identify them using a typical set of identification criteria based on intensity and structure; this will be termed the “objective detection” method. The criteria used are similar to those used previously with this tracking algorithm (Bengtsson et al. 2007a,b; Strachan et al. 2013). This requires adding additional fields to the tracks, namely the T63 vorticity at levels 850 and 700–200 hPa to provide intensity and warm core criteria. This is done by recursively searching for a vorticity maximum at the different levels using the maximum at the previous level as a starting point for a steepest ascent maximization applied to the B-spline interpolated field. A search radius of 5° (geodesic) is used centered on the location at the previous level. The same approach is used in the Southern Hemisphere by multiplying fields by −1. Also added are the mean sea level pressure minimum and maximum winds at 10 m and 925 hPa as alternative measures of TC intensity. For MSLP a steepest descent method is used with the B-spline interpolation and a search radius of 5° (geodesic) centered on the tracked vorticity center to find the closest pressure minimum, while for the winds a direct search for the maximum winds within 6° of the tracked center is used. The criteria for identification are the following:
1) the T63 relative vorticity at 850 hPa must attain a threshold of at least 6 × 10−5 s−1;
2) the difference in vorticity between 850 and 200 hPa (at T63 resolution) must be greater than 6 × 10−5 s−1 to provide evidence of a warm core;
3) the T63 vorticity center must exist at each level between 850 and 200 hPa for a coherent vertical structure;
4) criteria 1 to 3 must be jointly attained for a minimum of four consecutive time steps (one day) and only apply over the oceans; and
5) tracks must start within 30°S–30°N.
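The identification criteria listed above can be sketched as a filter over a tracked system. This is an illustration under assumptions: each track is taken to be a list of 6-hourly points stored as dictionaries, the key names are hypothetical, the vorticity values are assumed to be T63 values already scaled by −1 in the SH, and the `coherent` flag is assumed to record whether a vorticity center was found at every level between 850 and 200 hPa.

```python
VORT_850_MIN = 6e-5    # s^-1, intensity threshold
WARM_CORE_MIN = 6e-5   # s^-1, 850-hPa minus 200-hPa vorticity
MIN_CONSECUTIVE = 4    # 6-hourly steps, i.e. one day
MAX_START_LAT = 30.0   # degrees, tracks must start within 30S-30N


def point_passes(pt):
    """Intensity, warm core, and vertical coherence, jointly and over the ocean."""
    return (pt["over_ocean"]
            and pt["vort850"] >= VORT_850_MIN
            and pt["vort850"] - pt["vort200"] > WARM_CORE_MIN
            and pt["coherent"])


def is_tc(track):
    """True if the track starts in the tropics and passes for one day contiguously."""
    if not track or abs(track[0]["lat"]) > MAX_START_LAT:
        return False
    run = 0
    for pt in track:
        run = run + 1 if point_passes(pt) else 0
        if run >= MIN_CONSECUTIVE:
            return True
    return False
```

Applying the filter after tracking, rather than during it, keeps the full life cycle of each system available even when only part of it satisfies the criteria.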
The approach used here means that the tracking and identification is performed at a common resolution for all the reanalyses, making the tracking and identification as resolution independent as possible, although the actual model resolution will still have some impact on the identification.
The TCs identified by the objective detection method are also matched against the observed tracks in IBTrACS, using the same criteria as in the direct matching method, to determine the hit and miss rates of the identification scheme.
The tracking is applied to each full year, January–December, for the Northern Hemisphere (NH) and July to June the following year in the Southern Hemisphere (SH), resulting in 34 years in the NH and 33 in the SH (33 and 32 respectively for MERRA-2).
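The season bookkeeping can be sketched as below. The function names are hypothetical, and labeling an SH season by the calendar year in which it starts is an assumption made here for illustration.

```python
def tc_season(year, month, hemisphere):
    """Season label for a date: NH seasons run January-December,
    SH seasons run July-June and are labeled by their starting year."""
    if hemisphere == "NH":
        return year
    return year if month >= 7 else year - 1


def n_seasons(first_year, last_year, hemisphere):
    """Number of complete seasons available in a data period."""
    n_years = last_year - first_year + 1
    # An SH season straddles two calendar years, so one fewer is complete.
    return n_years if hemisphere == "NH" else n_years - 1
```

For 1979–2012 this gives 34 NH and 33 SH seasons, and for the MERRA-2 period of 1980–2012 it gives 33 and 32, in line with the counts quoted above.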
3. Results
In this section the ability of the different reanalyses to simulate different aspects of TC behavior is assessed and compared to the observed TC activity, as represented by the IBTrACS database described in the best-track dataset subsection.
a. Direct matching results
The numbers of TCs in IBTrACS that match with a storm in the reanalyses for each reanalysis using the direct matching method are summarized in Table 2 for both NH and SH. This shows that ~95% of the TCs in IBTrACS are identified in the reanalyses in the NH and ~92% in the SH. The different reanalyses are remarkably similar in this respect. In general the TCs not found in the reanalyses tend to be the weakest and/or shortest-lived TCs in IBTrACS in both hemispheres. Some of the missing TCs fail to pass the 2-day lifetime threshold imposed on the reanalysis tracks. There is also some evidence that the number of missing TCs in the reanalyses, according to the matching criteria, is reduced in the later period, after 2000: compared to the earlier period, the number of matches increases to ~98% in both NH and SH. This improvement may be associated with the assimilation of improved observations, in particular the availability of surface scatterometer winds from the QuikSCAT satellite data from mid-1999 until the end of 2009 and continuing with similar data from other remote sensing platforms since then.
To see how the TCs identified in the reanalyses by the direct matching method compare with those in IBTrACS several sets of statistics are produced.
Figures 1a and 1b show distributions for the mean separation distance (geodesic distance) between the matched reanalysis tracks and those of IBTrACS, obtained using the direct matching method, in the NH and SH respectively. In the NH (Fig. 1a) the majority of TCs identified in the reanalyses have a mean separation from those in IBTrACS of less than 2° (220 km), with the peak of the distribution for each reanalysis typically at less than 1° (110 km). The smallest mean separation distances occur for JRA-55, with the distribution peak at 0.5° (56 km), and the largest for MERRA, with the distribution peak at 1°, with the other reanalyses somewhere in between. The JRA-55 separation distances are comparable with those from the much higher-resolution (T1279; 16 km) operational analyses of ECMWF (Hodges and Emerton 2015; see the appendix therein), which may be a consequence of the assimilation of the TWR observations in JRA-55. This conjecture is strengthened by the fact that JRA-25, which also assimilates TWR data, is comparable in terms of the mean separation distances to the much higher resolution NCEP-CFSR. It is also apparent that MERRA-2 has improved over MERRA with respect to the separation distances. In general, the mean separation results for the NH (Fig. 1a) are consistent with those found by Schenkel and Hart (2012) for the reanalyses common to both studies. In the SH (Fig. 1b) a rather similar picture is seen, with the reanalyses ranked in the same order, from best to worst, as in the NH. While the separation distances appear slightly larger for some reanalyses in the SH (i.e., ERAI and MERRA), the others are comparable with the results in the NH, highlighting the improvement in the SH in the more recent reanalyses compared with older reanalyses.
Figures 1c and 1d show the lifetime distributions in the NH and SH respectively. In the NH it is apparent that the TCs identified in the reanalyses have much longer lifetimes than the TCs in the observations. This is a consequence of not imposing any TC identification criteria during the tracking. Imposing the TC detection criteria during the tracking would truncate the tracks to the TC stage alone and introduce a dependency of the lifetime on the chosen criteria and how well TCs are represented in the reanalyses in terms of intensity and structure. The extended life cycles include pre-TC stages such as easterly waves and the stage after extratropical transition. Some of the reanalysis TCs can exist for longer than one month, in which time a precursor disturbance can travel across an ocean basin, develop into a TC, and recurve to high latitudes undergoing extratropical transition, whereas none of the observed TC tracks lasts this long. The distributions for the different reanalyses are quite close together, showing that rather similar lifetimes are obtained for all the reanalyses. A similar set of results is obtained in the SH, although the distributions for the reanalyses are a little noisier, due to the smaller number of observed TCs in this hemisphere.
3) Latitude of maximum intensity
The latitude at which the maximum intensity is attained in terms of the 10-m winds is shown for the NH and SH in Figs. 1e and 1f, respectively. In the NH the distributions show that, while most TCs in the reanalyses attain their maximum intensity at similar latitudes to those in the observations, there are some TCs that attain their maximum intensity at much higher latitudes. A possible cause for this behavior is that, because of the longer life cycles that are identified in the reanalyses, some storms only attain their maximum intensity as they recurve to higher latitudes and become larger and better represented at synoptic scales. While this could be addressed by restricting the reanalysis tracks to just the TC stage, this would mean either truncating the tracks where they overlap with the best-track data (Hodges and Emerton 2015) or using the detection criteria based on intensity and structure discussed above to define the TC part of the life cycle. Either of these approaches introduces a degree of subjectivity: the first as it depends on the different operational practices of the operational agencies, and the second because it depends on how well TCs are represented in the different reanalyses. Also, for this part of the study, we want to see what exactly is in the reanalyses in terms of TC life cycle and restricting the life cycles defeats this objective. This is also important for future work, such as studies of extratropical transition and risk associated with TCs and their later life cycle stages in extratropical regions. A similar situation may also occur for the TC stage itself, where the relatively low resolution of the reanalyses means that TCs are not well represented at the small spatial scales of TCs in the tropics, but become better represented as they move to higher latitudes. A similar picture is seen for the SH (Fig. 1f). 
This type of behavior is often seen for TCs identified in relatively low-resolution climate model simulations (Manganello et al. 2012).
Also examined are the maximum intensity distributions of the TCs for three intensity measures: minimum MSLP and maximum 10-m and 925-hPa wind speeds, which are shown in Fig. 2 for both NH and SH TCs. For both MSLP (Figs. 2a,b) and 10-m wind speeds (Figs. 2c,d) in the NH and SH it is clear that all the reanalyses underestimate the intensity of TCs compared to the observations and that the intensities are model dependent. This is not surprising considering the relatively low spatial resolutions of the reanalyses where the assimilation of observations cannot correct for this. Previous studies with dynamical downscaling of individual historical TCs, such as Katrina, have shown that resolutions of approximately 1–5 km with a nonhydrostatic model are necessary to simulate TC inner-core processes correctly in order to enable the right magnitude of wind intensities (Davis et al. 2008) to be simulated. However, some studies using hydrostatic models with parameterized convection at resolutions of ~10 km can certainly produce TCs with depths as large if not larger than observed, though winds can still be too weak (Manganello et al. 2012). Coupling to the ocean has also been found to be important in correctly simulating TC intensity (Kilic and Raible 2013), although only the NCEP-CFSR applies any such coupling.
The results for intensity based on the MSLP (Figs. 2a,b) show that in general the more recent reanalyses, NCEP-CFSR, JRA-55, and MERRA-2, have deeper TCs; this is more evident in the SH, although in both hemispheres few TCs reach minimum pressures below 940 hPa. The more recent reanalyses may be performing slightly better with respect to this intensity measure, possibly due to better use of the available observations and improved models, and not necessarily due to resolution. For 10-m wind speeds (Figs. 2c,d), much larger differences are seen between the different reanalyses, although, as already mentioned, none of them can simulate the strongest intensities seen in the observations. NCEP-CFSR has the most intense TCs in terms of 10-m wind speeds, with some TCs almost attaining intensities of 50 m s−1 (category 3 TS) but with no category 4 or 5 (Saffir–Simpson scale) TSs. The weakest maximum 10-m wind speed intensities are produced by the MERRA reanalysis with no TCs surpassing 30 m s−1, which barely reaches category 1 TS. However, the more recent MERRA-2 reanalysis shows a significant improvement being comparable with JRA-55 in having TCs that can almost attain 10-m wind speeds of 40 m s−1 (category 1 TS), although this is less than those seen for the NCEP-CFSR. The results for the reanalyses’ TC 10-m wind speeds show similar behavior in both hemispheres. The results for both 10-m wind and MSLP maximum intensities are generally consistent with those of Schenkel and Hart (2012) for the NH.
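For reference, the Saffir–Simpson categories quoted above can be assigned from a 1-min sustained 10-m wind speed as in the sketch below. The lower bounds are the commonly quoted values rounded to whole meters per second, which is an approximation of the official definition in knots, and the function name is hypothetical.

```python
# (lower bound in m/s, category), strongest first; rounded values.
SAFFIR_SIMPSON_MS = [(70.0, 5), (58.0, 4), (50.0, 3), (43.0, 2), (33.0, 1)]


def saffir_simpson_category(wind_ms):
    """Return the Saffir-Simpson category (1-5) for a wind speed in m/s,
    or 0 if the wind is below category 1 strength."""
    for lower_bound, category in SAFFIR_SIMPSON_MS:
        if wind_ms >= lower_bound:
            return category
    return 0
```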
One problem with using the 10-m winds from the reanalyses is that they are not a direct model prognostic field but rather are computed as a diagnostic, although not necessarily in the same way for each reanalysis. They are generally computed as an extrapolation from the lowest model level to the surface using profile functions and corrected when over land for terrain roughness to conform to the WMO standard for SYNOP observations (see, e.g., ECMWF 2015). However, for some reanalyses this is not done for the actual analyses: for example, in MERRA, it is performed during the IAU cycle and so does not experience the full analysis increment, and is an average over four model time steps (M. Bosilovich, NASA, 2016, personal communication). To evaluate the uncertainty further, the maximum wind speeds at the 925-hPa pressure level associated with the TCs are also considered (pressure level winds are obtained by interpolation between model levels); the TC 925-hPa winds are shown in Figs. 2e and 2f for the NH and SH, respectively. The downside to using the 925-hPa winds is that there are no available observations with which to compare, although this is not critical here, where we just want to see if the same differences between the reanalyses, as seen for 10-m winds, occur at this level. The results for the wind speed intensity at 925 hPa show a rather different perspective from those at 10 m, with both NCEP-CFSR and MERRA-2 having comparable values in the tail of the distribution with values as high as 60 m s−1. The MERRA reanalysis is now comparable with the other reanalyses of JRA-55, JRA-25, and ERAI.
5) Wind speed–pressure relationship
The wind speed–pressure relationship is often used by the operational centers to estimate winds from pressure measurements and surface pressure from wind measurements, for which various quadratic empirical relationships have been developed based on cyclostrophic balance (Knaff and Zehr 2007). Hence, the wind–pressure relationship of TCs is often considered in studies of TCs in models and reanalyses (Roberts et al. 2015) to compare with the observed relationship, although it should be noted that the observations may themselves be estimated from one of the empirical relationships, which can differ between agencies (Knaff and Zehr 2007).
Figure 3a shows the wind–pressure relationship for the observations and the TCs identified in the different reanalyses using the direct matching method in the NH. The wind–pressure relationship is determined using the 10-m wind speeds and MSLP values, by determining the maximum attained 10-m wind speed and taking the MSLP value at the same time. The results show that all the reanalyses reflect the underestimate of both the 10-m wind speeds and MSLP depths of the TCs, this being most prominent for MERRA. This can be related to the radius of maximum wind (RMW), computed for the reanalyses at the time of maximum 10-m wind intensity, and shown for the NH in Fig. 3c. The RMW is not available for all the agencies that contribute to IBTrACS but we estimate it at the time of maximum wind intensity, based on the simple Rankine model described by Knaff and Zehr (2007). This gives RMW values for the observations predominately below ~100 km (1°) and a peak around ~50 km (0.5°). This is consistent with the findings of Kimball and Mulekar (2004) for North Atlantic TSs who made use of an extended “best track” dataset.
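The pairing used for the wind–pressure relationship (the maximum attained 10-m wind speed, with the MSLP taken at the same time) can be illustrated with a minimal sketch. The function name and the sample track values are hypothetical and are not taken from the study; the sketch only assumes that each track provides time-aligned sequences of 10-m wind speed and MSLP.

```python
def wind_pressure_pair(wind10m, mslp):
    """Return (maximum 10-m wind, MSLP at the same time step) for one track.

    wind10m and mslp are assumed to be time-aligned sequences (m/s, hPa).
    """
    if len(wind10m) != len(mslp) or len(wind10m) == 0:
        raise ValueError("wind10m and mslp must be non-empty and of equal length")
    # Index of the time step with the strongest 10-m wind along the track
    i_max = max(range(len(wind10m)), key=lambda k: wind10m[k])
    return wind10m[i_max], mslp[i_max]


# Hypothetical track values, for illustration only
track_wind = [18.0, 25.5, 31.0, 27.0]
track_mslp = [1002.0, 995.0, 988.0, 985.0]
pair = wind_pressure_pair(track_wind, track_mslp)
```

In this made-up track the lowest MSLP occurs one time step after the peak wind, so the pair differs from one built from the two lifetime extremes taken independently.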
For all the reanalyses the RMWs are seen to be too large (Fig. 3c). Assuming gradient wind balance for the TCs, the fact that RMWs are too large and wind intensities are too low for the reanalyses implies that the pressure difference between the storm centers and the environment is also too low, consistent with the wind speed–pressure relationship in Fig. 3a. The fact that the NCEP-CFSR has the strongest wind intensities and one of the smallest RMWs is also consistent with the result in Fig. 3a that NCEP-CFSR is closest to the observed wind speed–pressure relationship, whereas MERRA, which has the weakest maximum wind speeds and large RMWs, is the worst of the reanalyses in this respect. MERRA-2 shows a significant improvement over MERRA in terms of the wind speed–pressure relationship, which can be understood in terms of the improved maximum wind speeds and lower RMWs. In fact, MERRA-2 has the lowest RMWs, although it is not as strong in intensity (10-m wind speed) as NCEP-CFSR.
The fact that NCEP-CFSR appears to perform the best in terms of the wind speed–pressure relationship may be the result of the vortex relocation scheme used by the NCEP-CFSR assimilation system, which, as pointed out by Schenkel and Hart (2012), will result in improved vortex location, which in turn may lead to improved TC intensities as a result of the TC being in the correct environment. Allied to this, Schenkel and Hart (2012) also pointed out that observations within the TC vicinity are less likely to be rejected by the assimilation scheme, due to smaller differences with the first-guess field. However, the situation is likely more complex than this, as MERRA-2 also uses the vortex relocation method and has the lowest RMWs but is not the most intense in terms of wind speed. JRA-55, on the other hand, with a similar resolution to MERRA-2, has the smallest location errors (Figs. 1a,b) and does not use vortex relocation, but it does assimilate best-track data as synthetic dropsondes (Hatsushika et al. 2006); its intensities and wind speed–pressure relationship are both very similar to those of MERRA-2. Hence, it appears that there are complex trade-offs occurring within the assimilation systems.
b. Objective identification
Following the assessment of how well TCs are represented in the chosen reanalyses it is of interest to see how existing objective TC identification schemes perform in order to try and understand the impacts of the differences between reanalyses on objective TC identification. This is important, as objective schemes are the only way to identify TCs in climate model simulations and they are often contrasted with reanalyses as a means of verification at comparable resolutions.
As Murakami (2014) has shown, detection schemes have to be tuned to particular reanalyses to optimally detect TC–TS frequencies. This is also what tends to happen in operational settings, where detection schemes are often tuned to a particular operational setup, so that applying them to data from a different operational center can give very different numbers of detected TCs from the in-house method [cf. Fig. 22 of Kobayashi et al. (2015)]. Some schemes also adjust identification criteria by ocean basin (Camargo and Zebiak 2002) to account for model biases. However, these are not appealing approaches in the climate model context, where a fixed set of criteria, applied in a common resolution framework, will provide a better comparison between different model simulations or different climate scenarios (Shaevitz et al. 2014).
To assess how one such scheme performs, the objective detection method described in the methodology section, based on the vorticity at multiple levels between 850 and 200 hPa, is applied to the vorticity tracks obtained from the tracking of all vorticity centers.
1) Annual counts
The annual average TC counts are determined for each ocean basin (Fig. 4) and are shown in Fig. 5. In the NH, the annual number is in reasonably good agreement with the observations of IBTrACS apart from MERRA, which has ~30 fewer identified TCs, while the other reanalyses are slightly over or under in number, a result also previously noted by Murakami (2014) using the same criteria. However, in the SH the identification has resulted in a much higher number than in the observations, which occurs for all the ocean basins. The overestimation is particularly large in the South Pacific (SP) region; the South Atlantic (SA) region also has more identified systems than are in the observations. These differences will be discussed further in the discussion section (section 4).
2) Matching against IBTrACS
To further analyze the objectively identified TCs, they are matched against the observed TCs of IBTrACS, using the same matching method as used for the direct matching method, to identify the storms common to both and the false positive and false negative detections. The results of this track matching are shown in Table 2 in terms of the probability of detection (POD) and false alarm rate (FAR). The POD is defined here as the number of matched storms for each reanalysis divided by the total number of storms in the observations, and the FAR as the number of nonmatched storms in each reanalysis divided by the total number of storms in the same reanalysis. Also shown in Table 2, for comparison, is the POD for the direct matching results, before applying the objective criteria, discussed in the “Direct matching results” subsection (section 3a), which shows an almost uniform detection rate of 0.95 across all the reanalyses in both hemispheres, although this is lower in the SH than the NH. The reason why the POD for the SH is lower for the precriteria matching is likely related to differences in the observations that are assimilated in the reanalyses between the two hemispheres, as there is no dependence on structure or intensity for detection for these results.
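The POD and FAR definitions given above can be written out as a short sketch. The function name and the storm counts are hypothetical and do not reproduce Table 2; the sketch also assumes a one-to-one matching, so that the number of matched storms is the same whether counted from the observations or from the reanalysis.

```python
def pod_far(n_observed, n_reanalysis, n_matched):
    """Compute POD and FAR from storm counts.

    POD = matched storms / total storms in the observations
    FAR = nonmatched reanalysis storms / total storms in the reanalysis
    Assumes one-to-one matching (an assumption of this sketch).
    """
    if n_observed <= 0 or n_reanalysis <= 0:
        raise ValueError("storm totals must be positive")
    if n_matched > min(n_observed, n_reanalysis):
        raise ValueError("matched count cannot exceed either total")
    pod = n_matched / n_observed
    far = (n_reanalysis - n_matched) / n_reanalysis
    return pod, far


# Hypothetical counts, for illustration only
pod, far = pod_far(n_observed=100, n_reanalysis=120, n_matched=80)
```

With these made-up counts, 80 of the 100 observed storms are found, and 40 of the 120 identified storms have no observed counterpart. A reanalysis can therefore show a total count close to the observed one while still having a modest POD and a nonzero FAR.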
For the POD based on using the objective detection method the values are much lower, with the best detection for JRA-55 and the worst for MERRA in both hemispheres, although POD is higher in the SH than the NH, possibly due to differences in sample sizes. The FAR (Table 2) shows values ranging from 0.16 for JRA-25 to 0.36 for NCEP-CFSR in the NH. The fact that JRA-25 has the lowest FAR may be related to this reanalysis having the lowest resolution and hence detecting fewer small-scale and possibly weaker storms; this could be investigated using GCMs of varying resolution. In the SH, FAR is much higher, as might be expected from the previous discussion, due mostly to the higher number of TCs detected compared with the observations. From these values of POD and FAR it is apparent that, although similar numbers of TCs are detected in the NH using the objective detection method, they need not be identical to the ones in the observations.
To explore the POD and FAR values in more detail the storms that are in the observations and that match and do not match with those identified in the reanalyses, using both identification methods, preobjective direct matching and postobjective matching, are further analyzed relative to their attained category in the observations according to the Saffir–Simpson scale determined from the 1-min observed winds. Hence, the IBTrACS storms are partitioned into the categories according to the 1-min winds before matching them against the reanalysis tracks, as previously described. Since different agencies use different wind intensity scales, this approach provides a more consistent classification across the different ocean basins. Since some weak storms in IBTrACS have no wind information, they are excluded from this analysis; Murakami (2014) excluded tropical depressions from their study, although it is unclear how this is achieved for the reanalyses, apart from applying the agency wind thresholds.
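The partitioning of observed storms by attained category can be sketched as follows. The wind thresholds are the commonly quoted Saffir–Simpson bounds in knots for 1-min sustained winds; the exact bin edges used in the study are not stated in this section, so they are an assumption here, as are the function names and the sample storms.

```python
import bisect

# Lower bounds (kt, 1-min sustained wind) of TS and CAT1..CAT5.
# Assumed bin edges: the study's exact thresholds are not given in this section.
_LOWER_BOUNDS_KT = [34, 64, 83, 96, 113, 137]
_LABELS = ["TD", "TS", "CAT1", "CAT2", "CAT3", "CAT4", "CAT5"]


def saffir_simpson_category(wind_kt):
    """Map a 1-min wind speed in knots to a category label."""
    return _LABELS[bisect.bisect_right(_LOWER_BOUNDS_KT, wind_kt)]


def attained_category(track_winds_kt):
    """Category of the lifetime maximum wind; None if no wind data exist."""
    if not track_winds_kt:
        return None  # storms without wind information are excluded
    return saffir_simpson_category(max(track_winds_kt))


# Hypothetical storms, for illustration only
storms = {"A": [25, 40, 70, 90], "B": [20, 30], "C": [35, 100, 140], "D": []}
categories = {name: attained_category(w) for name, w in storms.items()}
```

Storm "D" has no wind values and maps to `None`, mirroring the exclusion of storms without wind information described above.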
The results of this analysis by category are shown in Tables 3 and 4 for the NH and SH, respectively. In the NH, Table 3 shows that for the objectively identified TCs it is the weakest categories that have the poorest level of matches between the reanalyses and IBTrACS, in particular for the tropical depressions, although many tropical depressions in IBTrACS are excluded due to lack of wind information. However, for the TS category (between tropical depression and category 1) the best-performing reanalyses at this level, JRA-25 and JRA-55, match with 78.5% of IBTrACS storms, while for the worst-performing (MERRA) only 41.6% of IBTrACS storms match. For the higher TS wind speed categories the percentage of matches with IBTrACS steadily increases with category on progressively smaller sample sizes: 92%, 98%, 99.5%, and 100% for category 1 to category 5 (CAT1–CAT5), respectively, for the best-performing JRA-25 and JRA-55 and considerably worse for MERRA (63.5%, 75%, 83%, 82.5%, and 92%), with NCEP-CFSR and MERRA-2 comparable with JRA-25 and JRA-55. Recalculating the POD for just CAT1–CAT5 TS (Table 5), the best-performing reanalyses, JRA-25 and JRA-55, now have values of 0.95.
In the SH, Table 4 shows that a fairly similar situation occurs as in the NH for the objectively identified TCs, except that it is apparent there are virtually no tropical depressions available to compare with in the observations, either because very few of this category of storms have any wind values or, more likely, that they are not generally included in the best-track datasets in this hemisphere; this is discussed further in section 4. The best degree of matches again occurs for the JRA-25 and JRA-55, ranging from 84% to 89% for the weakest TSs (TS category) to 95% for CAT5.
The POD, for CAT1–CAT5 objectively identified TS only, shown in Table 5, shows that for this intensity range the values are comparable in both hemispheres and comparable with the results in the study of Murakami (2014), who restricted their study to this intensity range, although they used different skill metrics compared to here and in the study here there is no special tuning of the objective detection parameters for each reanalysis, as in Murakami (2014).
For the TCs identified using the direct matching method (preobjective), previously discussed in section 3a, the matching by observation category (not shown) indicates consistently high POD values as reported in section 3a for all categories and reanalyses.
To understand the nature of the TCs, identified by the objective detection method, in the reanalyses that do not match with the IBTrACS TCs, in particular in the SH, those that do not match are binned according to the latitude of their genesis. For the SH this is shown in Fig. 6a. This shows essentially two groups of storms: those with genesis within 0°–20°S and those with genesis occurring south of 20°S. The genesis for all TCs in IBTrACS is almost entirely within 0°–20°S (not shown). Examining these two groups of nonmatching objectively identified TCs separately, a scan of the tropical storm advisories (discussed later) indicates that some of the identified storms in the first group can be found in the advisories but not IBTrACS; this is discussed further in section 4. Figure 6b shows examples of two tracks identified in ERAI that do not match with IBTrACS: the track labeled “Storm 1” occurs in January 2011 and is a storm that possibly occurs in the RSMC Nadi advisories, named 02F, but is not in IBTrACS, probably because it did not develop further into a true TS. Even so, it seems a substantial storm with 10-m winds in ERAI over 20 m s−1 while near Australia. Figure 6c shows the infrared satellite image, which presents an asymmetric structure, unlike a true TS, with this storm more likely to be a hybrid warm core TC. The second storm shown in Fig. 6b originates south of 20°S, where very few IBTrACS storms have their genesis. This particular storm seems to have formed in the vicinity of the South Pacific convergence zone (SPCZ) and travels southeastward with relatively weak 10-m winds in ERAI of ~15 m s−1 through a region of very little habitable land. It has no reference in any tropical storm advisories, yet its structure in the satellite imagery (Fig. 6d) shows some similarities with Storm 1 (Fig. 6c) and it may also be a hybrid TC. As shown by Yanase et al. (2014) (Fig. 1) using the Hart phase space classification of cyclones (Hart 2003), applied to reanalysis data, storms found between 20° and 40°S in the SH summer tend to be hybrid storms. There are also storms in IBTrACS that do not match with an analysis track, but these tend to be the weakest storms below category 1 as shown in Tables 3 and 4. These issues are further discussed in section 4.
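The binning of nonmatching storms by genesis latitude described above can be sketched as follows. The 5° bin width, the function names, and the sample latitudes are hypothetical (the bin width used for Fig. 6a is not stated here); only the 20°S boundary between the two groups comes from the text.

```python
import math
from collections import Counter


def bin_genesis_latitudes(genesis_lats, bin_width=5.0):
    """Count storms per latitude bin, keyed by the bin's lower (southern) edge."""
    counts = Counter()
    for lat in genesis_lats:
        lower_edge = math.floor(lat / bin_width) * bin_width
        counts[lower_edge] += 1
    return dict(sorted(counts.items()))


def split_at_boundary(genesis_lats, boundary=-20.0):
    """Split SH genesis latitudes (negative = south) at the 20S boundary."""
    equatorward = [lat for lat in genesis_lats if lat >= boundary]
    poleward = [lat for lat in genesis_lats if lat < boundary]
    return equatorward, poleward


# Hypothetical genesis latitudes of nonmatching SH storms (degrees north)
lats = [-8.2, -12.5, -17.9, -23.4, -26.0, -31.7]
histogram = bin_genesis_latitudes(lats)
tropical_group, subtropical_group = split_at_boundary(lats)
```

Keeping the two groups separate makes it straightforward to examine each against the advisories or satellite imagery independently.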
4. Discussion
There are several possible reasons for the poorer performance of the objective detection method in the SH compared with the NH in terms of detection relative to the observed TCs in IBTrACS. As shown above, the discrepancy in numbers is closely associated with the weakest storms, tropical depressions and tropical storms (below category 1). The first possibility is that the biases in the best-track data differ between the SH and the NH; the second is that the biases in the representation of TCs in the reanalyses differ between the NH and SH; and the third is that the selection criteria used by the objective detection method to identify TCs in the reanalyses are not selective enough, or are mainly tuned to the NH. These will be addressed in turn.
In terms of possible biases in the IBTrACS observations, it is possible that the SH is observed differently from the NH. The SH is sparsely inhabited in particular regions, such as the SP and SA, so that less emphasis may be placed on detection except for the most intense systems likely to make landfall (Kucas et al. 2014). Related to this is the application of different storm detection procedures in the different warning centers that produce the best-track data (Velden et al. 2006b; Kueh 2012). Storm classification is primarily based on the interpretation of satellite observations using empirical relationships such as the Dvorak scheme (Velden et al. 2006a); there is little aircraft reconnaissance apart from the North Atlantic, with some other limited coverage associated with field campaigns and in specific regions, such as Taiwan (DOTSTAR; Wu et al. 2005). The uncertainties of applying operational detection and classification schemes when storms are relatively weak and show a poor organization (Torn and Snyder 2012) may make deciding between whether a tropical disturbance should be classified as a tropical depression and counted in best track, or is some other tropical storm such as a subtropical or hybrid cyclone, difficult and dependent on subjective forecaster interpretation. Gyakum (2011) states that “there is presently no single set of objective criteria that, if applied operationally, would irrefutably support a forecaster’s analysis of cyclone type (subtropical, hybrid or tropical)” (p. 1.6.23). It is also unclear whether all agencies report weaker storms such as tropical depressions consistently in their best-track analyses, and hence whether they make their way into IBTrACS.
For example, HURDAT, which is produced by the National Hurricane Center (NHC) and forms part of IBTrACS and covers the North Atlantic and northeastern Pacific, includes subtropical cyclones (Landsea and Franklin 2013), whereas the Joint Typhoon Warning Center (JTWC), which covers the western North Pacific, South Pacific, and southern and northern areas of the Indian Ocean, does not routinely include subtropical cyclones (Kucas et al. 2014; Gyakum 2011) unless they undergo tropical transition (TT) (Bentley et al. 2016; McTaggart-Cowan et al. 2013). Even within a single ocean basin where multiple agencies are operational, considerable uncertainties exist between different best-track datasets. For example, Ren et al. (2011) and Barcikowska et al. (2012) highlight significant differences between JTWC and JMA best-track data in the western North Pacific (WNP) in terms of frequency and intensity of TCs, with better agreement for frequencies for category 2 TS and above; this is exactly where the objective detection scheme performs best in both hemispheres.
Therefore, uncertainties in the interpretation of the observations for the weaker tropical storms, and different agency operational procedures, may result in their exclusion from the best-track archive. Several reassessments of best-track data, in particular in the SH, have resulted in the inclusion of some additional storms but also the removal of some others (Diamond et al. 2012), so that actual numbers are not significantly changed.
However, evidence that the SH may be being treated differently for tropical storms in the observations than in the NH, in particular with respect to the weaker subtropical and hybrid storms, can be seen by considering the tropical storm advisories. Information on weak tropical disturbances, together with TCs, is available in text-based reports from the warning agencies, such as the JTWC “significant tropical cyclone advisories.” However, not all this information is included in the best-track postseason analysis and hence IBTrACS. For example, in the South Pacific, IBTrACS reports five storms in the 2011/12 season (July–June) but scanning the advisories (from RSMC Nadi) results in a much larger number of tropical disturbances, ~20.
A more quantitative comparison can be made using the combined advisories from each warning center, for each year, in each hemisphere (July–June in the SH). This information has been collated by Padgett and Young (2016) from 1998 onward for both hemispheres, although some very weak systems are not included. Comparing the numbers in the advisories with those in IBTrACS over the period 1998–2012, which overlaps with our study period, in the NH, IBTrACS has on average 69 storms per year and the advisories 72, hence the advisories have ~4% more storms; in contrast, for the SH, IBTrACS has on average 28 storms per year and the advisories 39, hence the advisories have ~40% more storms. Hence in the NH it appears that a much larger proportion of the storms in the advisories make their way into the best-track data than in the SH. This can partially explain the difference in numbers between IBTrACS and the TCs identified by the objective detection method in the reanalyses in the SH. It was discussed in the “Matching against IBTrACS” subsection [section 3b(2)] that some of the storms identified in the reanalyses appeared to be in the advisories but not IBTrACS.
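The percentages quoted above follow directly from the annual averages given in the text (69 and 72 storms per year in the NH, 28 and 39 in the SH). A minimal sketch of the arithmetic, with a hypothetical function name:

```python
def percent_excess(advisories_per_year, ibtracs_per_year):
    """Percentage by which the advisory count exceeds the IBTrACS count."""
    return 100.0 * (advisories_per_year - ibtracs_per_year) / ibtracs_per_year


# Annual averages for 1998-2012 as quoted in the text
nh_excess = percent_excess(72, 69)  # roughly 4%
sh_excess = percent_excess(39, 28)  # roughly 40%
```

The excess is expressed relative to the IBTrACS count, which is the convention implied by the quoted values of ~4% and ~40%.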
Tropical disturbances and subtropical cyclones occur in all the ocean basins, and it seems that whether or not they contribute to the best-track data may vary between the NH and SH and be dependent on the warning center procedures. The SPCZ and South Atlantic convergence zone (SACZ) are known to be associated with weak tropical depressions and subtropical cyclones in the SH, as well as more intense tropical cyclones in the South Pacific (Vincent et al. 2011). A similar situation occurs in the North Pacific associated with the mei-yu front (Lee et al. 2006). The South Atlantic is not known as a very active TC region, due to relatively cool sea surface temperatures and relatively high vertical wind shear. However, several studies have highlighted this region as susceptible to the formation of subtropical cyclones (Evans and Braun 2012; Gozzo et al. 2014), often in association with the SACZ. This is also seen in simulations produced with high-resolution GCMs, where they are often identified as TCs (Roberts et al. 2015). The study of Gozzo et al. (2014), based on reanalysis data, found on average seven subtropical cyclones per year with genesis between 20° and 30°S, a number that is remarkably similar to the number of systems objectively detected in the reanalyses in this study in the SA region. The majority of the subtropical cyclones identified by Gozzo et al. (2014) do not seem to have made it into the advisories or best-track data, either because they are too weak, even for the advisories, or possibly because in general they are moving away from land and therefore not a threat (Kucas et al. 2014). Another possibility is that SA subtropical cyclones are more asymmetric than those found in the North Atlantic (Evans and Braun 2012) and hence do not satisfy the criteria for inclusion in the TC best tracks. A similar situation may also occur in the South Pacific. 
If these additional uncertainties in the best-track data are considered together with the numbers in the advisory data, then the actual numbers of TCs occurring in the SH may not be too far away from the numbers objectively identified here in the reanalyses. The results from section 3b(2) suggest that some of the differences between numbers in the SH between the objective identification used in this study and IBTrACS may be related to the identification of hybrid or subtropical cyclones by the objective identification scheme.
Other regions where subtropical or hybrid storms may need to be considered are the cool seasons in the eastern North Pacific, where they are called Kona storms (Kodama and Businger 1998). Monsoon depressions may also be confused with weak tropical cyclones in the reanalyses as these also have a warm core aloft structure and occur in the north and south Indian Ocean, the western Pacific, and the Australian region (Hurley and Boos 2015). They represent an additional uncertainty in the best-track archive, as they are occasionally included in the best-track data in the western Pacific via the JTWC (Hurley and Boos 2015); however, as with subtropical cyclones, this is not done consistently for all agencies. These may also contribute to uncertainties in the best-track data in the north and south Indian Ocean and South Pacific.
The second possibility for the differences in the numbers of TCs detected by the objective detection method in the reanalyses and IBTrACS in the NH and SH concerns the quality of the reanalyses in the two hemispheres, which may affect how TCs are represented and hence contribute to the uncertainties in their detection in the reanalyses. The primary observations assimilated in the SH come from satellite observing platforms, which generally provide data with relatively coarse vertical resolutions, whereas in the NH the surface-based observing system provides a more diverse range of observations, including from sondes and aircraft. The use of direct satellite radiance assimilation, variational bias correction, and modern assimilation methods has resulted in much better extraction of the information content in the observations, including for older observations (Rienecker et al. 2012).
Discriminating between weak TSs, subtropical cyclones, and other systems in the reanalyses is a problem in both hemispheres for the objective detection method, but could be more of a problem in the SH if the TCs are not as well simulated and storms, including subtropical or hybrid storms, do not have the correct structure. This could be exacerbated if there are more of the weaker type of storms in the SH associated with the convergence zones as discussed above, which, allied to the difficulty in separating these storms from other systems, may be a factor in the differences between the number of storms in IBTrACS and the number detected by the objective detection method in the reanalyses in the SH. The only way to test this is by using observing system experiments, where the NH observing system is degraded to that of the SH and the data assimilation is rerun. These types of experiments have been performed in the past and have shown the relative importance of the different types of observations used in the reanalyses and how changes to the observing system may affect the reanalysis (Bengtsson et al. 2004b; Whitaker et al. 2009). However, it is very time consuming and expensive to rerun modern data assimilation systems, even if we had access to the same systems used to produce the reanalyses used here. Hence this is beyond the scope of this paper. However, studies using the same detection criteria as used here, applied to relatively high-resolution climate model simulations for the current climate (Gleixner et al. 2013; Strachan et al. 2013; Roberts et al. 2015), have found similar results to those found here for the reanalyses, in that similar TC numbers to observations are found in the NH, albeit with some model-dependent basin by basin biases, and a larger number of TCs than in the observations are identified in the SH. 
This may indicate that the difference in the number of SH storms from the observations is not necessarily related to differences in the quality of the reanalyses in the two hemispheres but rather may depend more on possible biases in the best-track data and possibly the detection criteria used in our objective scheme, discussed next.
The larger bias in the number of TCs identified by the objective detection method in the SH compared with the NH relative to observations may also be related to the detection criteria used here, and whether they are selective enough for the data used, so that more tropical depressions, subtropical cyclones, and hybrid cyclones are identified as TCs, possibly related to the quality of the reanalyses as discussed above. TC detection schemes, applied to model or reanalysis data, are certainly sensitive to the detection criteria and tracking methodology employed (Horn et al. 2014), especially for weaker storms, as shown in this paper, and are most often tuned for the NH. An alternative approach would be to apply more selective criteria to remove subtropical and hybrid cyclones from the detection, based on previous studies focused on studying subtropical cyclones, for example the Hart phase space parameters (Guishard et al. 2009; Evans and Braun 2012; Yanase et al. 2014). Another idea in the literature suggests using TC development pathways (McTaggart-Cowan et al. 2013), whereby tropical cyclogenesis is categorized according to dynamical metrics, although this would necessarily introduce added complexity and possibly more parameters to choose subjectively. It would also remove these types of storms in the NH, so that, while the numbers detected in the SH may compare better with the observations, the numbers may compare less favorably in the NH. However, it might allow a better focus on the different storm types.
It is likely that all three of the issues discussed above can lead to TC detection biases in the reanalyses relative to the best-track data.
No TC tracking and/or identification scheme will be perfect and, although TC identification schemes can be retuned against the observations separately for the NH and SH or for individual ocean basins if necessary (Camargo and Zebiak 2002) to take account of possible deficiencies in the detection and the observational biases, this does not seem like a good idea if TC detection is to be applied to model simulations where methodological consistency is important.
5. Summary and conclusions
The study of TCs in six recent reanalyses has shown that all the reanalyses are capable of representing nearly all the TCs present in the best-track archive of IBTrACS, with a detection rate of ~98% in the period since 2000 and slightly lower before this. However, how well the TCs are represented in the reanalyses, in terms of their properties, is less encouraging, with wind intensities significantly lower than in the observations and pressures too high in value. Although significant amounts of observations are assimilated by the data assimilation systems used in the reanalyses, in particular from satellites, this is unable to correct these deficiencies in the TC properties, due to the still too low model resolution and dependence on parameterized processes used in the reanalyses. Additional methods of assimilating observations in the vicinity of the TCs and vortex relocation can help improve this situation, but not to the extent where intensities get anywhere near those observed at current reanalysis resolutions. However, it is apparent that there have been some improvements in the representation of TCs in the more recent reanalyses of NCEP-CFSR, JRA-55, and MERRA-2; in particular MERRA-2 shows a significant improvement over the older MERRA reanalysis in terms of wind and MSLP intensities. Separation distances between TCs identified in the reanalyses and the observations have also improved for the more recent reanalyses.
The improvements in the intensities and location are most likely due to the increases in model horizontal resolutions and the use of improved data assimilation and bias correction systems, which are capable of extracting more information content from the older observations, as well as resulting in less observation rejection and the introduction of new and better calibrated observing systems in recent years. This progress is likely to continue as new reanalyses are produced with ever higher resolutions, such as the new ECMWF ERA5 reanalysis. Further improvements in data assimilation are also expected as well as the introduction of new and more accurate observing systems, although the downside to this may be the introduction of spurious trends in TC properties.
The other aspect explored in this study is how well objective TC detection schemes are capable of detecting the same TCs that are in the observations using a widely used identification scheme. This is important in order to have confidence in these schemes when applied to climate model simulations and for comparisons made between models or experiment scenarios. This part of the study highlighted the problem of detecting TCs at the low intensity end of the TC intensity range: in particular, tropical depressions and up to category 1 (Saffir–Simpson), with gradual improvements in the detection rate with increasing TS category. This raises several issues: Are the current detection schemes used at operational centers and for climate studies of TCs, which all have a rather similar methodology of user chosen thresholds on intensity and/or structure, selective enough? Are TCs represented well enough in the reanalyses? Are there problems with observational biases in the best-track data for weak storms? The answer to these questions is probably that all three play a role in differences found between the objective identification of TCs in reanalyses and the observed best-track data. It is clear that the intensities, and probably structure, are not well enough simulated in the reanalyses, which will cause problems when trying to discriminate between weak TSs and other tropical systems.
In terms of more selective criteria, other approaches could certainly be introduced, such as the phase space approach, but its success will also depend on how well TCs are represented in the reanalyses, and it requires subjective thresholds on the phase space parameters (Yanase et al. 2014). However, such an approach may be useful in removing the need for artificial boundaries in the TC identification, such as the latitude band for genesis used in this study.
The problem of observational bias is also an important aspect, in particular for the weaker storms, since forecaster interpretation and subjectivity will play a role in whether a particular storm is included in the best-track data, as not all storms fall neatly into particular classifications. Allied to this are the different operational criteria employed by the different RSMCs that contribute data to the best-track archives, such as whether to include tropical depressions or subtropical cyclones. This is likely the primary cause of the differences between the number of TCs identified in the reanalyses and in IBTrACS, in particular in the SH. This makes the observations less than ideal for calibrating TC identification and tracking schemes, or indeed for use in global climatological studies of TC frequencies and variability. It could be concluded that, given the uncertainties in the best-track datasets, they should not be considered climate-quality datasets and should be used with some caution for climate studies of TCs and for validating TCs identified in climate model simulations. Better coordination between the RSMCs would help this situation going forward, although this is not necessarily part of their remit and their operational procedures are tailored to their region of responsibility.
The problem of objectively classifying TCs operationally has been recognized by the Seventh International Workshop on Tropical Cyclones, with a suggestion that “a substantial contribution to the operational TC forecasting community could be made by recommending a universal cyclone classification methodology based on the latest research, operational forecasting capabilities, and real-time data availability” (Gyakum 2011, p. 1.6.23).
A re-evaluation of the observational record over the satellite period using a combination of the satellite data and reanalyses, using consistent identification methods for all basins, could perhaps resolve the observational bias problem over historical periods covered by the satellites and provide a more complete record of tropical storms for use in risk assessment and validating climate models. There has been some discussion that tropical depressions and subtropical cyclones should be included in the best-track data for consistency (McAdie et al. 2009), since, before satellite observations became available, some subtropical systems were probably classified as TCs. Tropical depressions and subtropical cyclones are also associated with severe weather with TS-like properties of strong winds and precipitation (Guishard et al. 2009; Gyakum 2011), so their inclusion can be justified in terms of their impact and for a more complete record of TC activity.
While there are deficiencies in the representation of TCs in the reanalyses, and 10-m winds in particular should be used with caution, the reanalyses can be complementary to the observations and provide value-added information on TCs, such as the pre- and post-TC stages of the life cycle. For example, the tracking method used here identifies these earlier and later life cycle stages, which can then be used to study the early development of TCs and their environment as well as the extratropical transition (Studholme et al. 2015) and how storms behave after this. The extratropical transition and its aftermath are becoming increasingly important for risk analysis at high latitudes following cases such as Hurricanes Sandy and Gonzalo and recent studies such as Haarsma et al. (2013); extratropical transition is also a known contributor to forecast uncertainty in the extratropics (Anwender et al. 2008).
The authors thank the various data archive centers for making the data used in this study available. ERA-Interim data were provided courtesy of ECMWF. MERRA was developed by the Global Modeling and Assimilation Office and supported by the NASA Modeling, Analysis and Prediction Program. Source data files can be acquired from the Goddard Earth Science Data Information Services Center (GES DISC). The NCEP-CFSR dataset used for this study is provided from the Climate Forecast System Reanalysis (CFSR) project carried out by the Environmental Modeling Center (EMC), National Centers for Environmental Prediction (NCEP). The JRA-25 dataset used for this study is provided from the Japanese 25-year Reanalysis (JRA-25), a cooperative research project carried out by the Japan Meteorological Agency (JMA) and the Central Research Institute of Electric Power Industry (CRIEPI). The JRA-55 dataset used for this study is provided from the Japanese 55-year Reanalysis (JRA-55) project carried out by the Japan Meteorological Agency (JMA). Vidale acknowledges funding from the Willis Chair in Climate System Science and Climate Hazards. Cobb acknowledges funding from the “Innovate UK” Knowledge Transfer Partnership (KTP). The authors would also like to thank the two reviewers for their constructive comments leading to an improved paper.
This article is included in the Modern-Era Retrospective analysis for Research and Applications version 2 (MERRA-2) special collection.