1. Introduction
Near-surface winds are important in problems such as air quality, engineering design, and renewable energy. Near-surface winds are influenced by processes on spatial scales from the microscale to planetary scales, not all of which are resolved (or simulated well) by any given physically based prognostic model. For example, atmosphere–ocean general circulation models (AOGCMs) are useful in assessing the response of the large-scale climate system to changes in natural or anthropogenic forcing. However, the resolution of AOGCMs is rarely finer than 1° × 1°, and therefore they are unable to explicitly resolve small-scale processes such as those related to local topography (Schoof 2013). Therefore, it is useful to explore the statistical predictability of surface winds. The effectiveness of statistical prediction depends on the strength of the statistical relationship between small-scale surface winds and large-scale free-tropospheric climate variables. In this context, statistical prediction refers to the relationship of atmospheric fields (specifically surface winds and midtropospheric variables) at the same time but different locations, rather than prediction of future states. One example of the use of such models is statistical downscaling (SD), which is based on the assumption that synoptic-scale weather has a strong influence on local-scale weather (Maraun et al. 2010). For statistical prediction, a transfer function is built to link predictands (e.g., surface winds) to predictors (e.g., free-tropospheric climate variables). A range of statistical and machine learning methods can be used to derive the transfer function, such as linear regression, generalized linear and additive models, and various nonlinear regression models. Relatively few studies have focused on using statistical prediction to model physically important vector variables such as surface winds, considering in particular the directional structure of predictability.
The focus of this study is to investigate the statistical relationship between components of surface winds (predictands) and free-tropospheric climate variables (predictors) in order to assess the linear statistical predictability of near-surface wind vectors. To this end, we use linear regression to derive the transfer function between observed surface wind components and midtropospheric climate variables at a large number of observational stations across the world, and the strength of statistical relationship between predictands and predictors is assessed by the resulting predictability. Predictive anisotropy, which represents the unequal strength of the predictor–predictand relationships for surface wind components projected onto different directions, is of particular interest.
A few previous studies have considered the predictive anisotropy of surface winds. For example, Salameh et al. (2009) applied a generalized additive model as the transfer function to predict surface zonal (u) and meridional (υ) wind components from stations located in valleys of the French Alps, and found that in general only one of u and υ can be predicted well. Other studies have used linear regression based transfer functions to predict surface winds in western and central Canada (van der Kamp et al. 2012; Culver and Monahan 2013) and at buoys located over the ocean (Monahan 2012; Sun and Monahan 2013). These studies found that the predictability of wind components projected onto different compass directions generally exhibits predictive anisotropy. In addition, the best or worst predicted wind component is not always the conventional zonal or meridional component. Knowledge of the predictability of u and υ alone is not sufficient in general to assess the predictive anisotropy and the potential utility of statistical prediction at a station; it is necessary to know the predictability of wind components in all directions from 0° to 180° (as the projection along θ is the negative of that along θ + 180°).
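The projection of a wind vector onto a compass direction can be sketched as follows. This is a minimal illustration, assuming θ is measured clockwise from north (so the unit vector is (sin θ, cos θ) in east and north coordinates); the function name `project_wind` and the 10° spacing of directions are hypothetical choices for illustration.

```python
import numpy as np

def project_wind(u, v, theta_deg):
    """Project (u, v) onto the compass direction theta (clockwise from north).

    theta = 0 returns the meridional component v; theta = 90 returns the
    zonal component u.
    """
    theta = np.deg2rad(theta_deg)
    return np.asarray(u) * np.sin(theta) + np.asarray(v) * np.cos(theta)

# Synthetic wind components, for illustration only.
rng = np.random.default_rng(0)
u = rng.normal(size=100)
v = rng.normal(size=100)

# Directions from 0 to 170 degrees (assumed 10-degree spacing); the range
# 180 to 350 is redundant because the projection only changes sign.
thetas = np.arange(0, 180, 10)
components = np.stack([project_wind(u, v, t) for t in thetas])
```

Because the projection along θ + 180° is the negative of that along θ, any measure based on squared correlation is identical for the two directions, which is why only half of the compass needs to be scanned.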
The present study has two objectives. The first is to characterize the predictor–predictand relationship by applying linear regression based statistical prediction to a large dataset of station-based surface winds. The second is to explore the relationship between statistical predictability of surface wind components and some potential influential factors. We consider three types of factors in this study: 1) topographic complexity, 2) statistical properties of wind component fluctuations, and 3) directions of mean wind vectors. The statistical properties here refer to the magnitude and shape of the probability distributions of wind components, respectively measured by the standard deviation and kurtosis. A fundamental question of this study is whether the characteristics of predictability of surface wind components at a station can be associated with these factors.
This study considers empirical relationships between wind component predictability and potential explanatory factors, rather than the physical mechanisms responsible for these relationships. As surface heterogeneity is a natural candidate cause of predictive anisotropy, we consider the influence of topography. While the mean vector wind cannot directly relate to predictability of components (as the linear regression models are based on fluctuations of anomalies with the means subtracted), common underlying physical mechanisms may determine the orientation of the mean wind and the anisotropy of predictability. The standard deviation of surface wind components is considered based on the hypothesis that the overall variability of surface wind components will contain both predictable “signal” and unpredictable “noise” associated with the relative influences of large-scale and local atmospheric circulations. Kurtosis of surface wind components is considered because linear regression models should be optimal when the predictand and predictors are all Gaussian (Yuval and Hsieh 2002), so non-Gaussianity might reduce linear predictability.
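The two statistical properties named above can be computed as in the following sketch. It assumes the non-excess definition of kurtosis, for which a Gaussian distribution has a value of 3; the function name `std_and_kurtosis` and the synthetic samples are illustrative only.

```python
import numpy as np

def std_and_kurtosis(x):
    """Return the standard deviation and (non-excess) kurtosis of a sample.

    Kurtosis is the fourth central moment divided by the squared variance,
    so a Gaussian sample gives a value close to 3.
    """
    x = np.asarray(x, dtype=float)
    anomalies = x - x.mean()
    variance = np.mean(anomalies**2)
    return np.sqrt(variance), np.mean(anomalies**4) / variance**2

# Synthetic samples: a Gaussian and a heavier-tailed Laplace distribution.
rng = np.random.default_rng(1)
std_gauss, kurt_gauss = std_and_kurtosis(rng.normal(size=200_000))
std_lap, kurt_lap = std_and_kurtosis(rng.laplace(size=200_000))
```

Values of kurtosis above 3 indicate heavier tails than Gaussian, which is the sense of non-Gaussianity considered here.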
Some previous studies have shown that predictive anisotropy is observed in regions characterized by complex terrain, such as mountainous regions (Salameh et al. 2009; van der Kamp et al. 2012). For example, van der Kamp et al. (2012) argue that there is no straightforward relationship between directions of best predicted wind components and the frequency distribution of wind directions in western Canada. The results of these studies are derived from a small number of stations located within a limited geographic region. Our results will be based on a much larger number of stations over a broader geographic range. However, these operational meteorological stations are not uniformly distributed across the land surface and are most densely concentrated in the Northern Hemisphere extratropics.
This paper is organized as follows. Section 2 presents the data and statistical methods used in this study. Section 3 explores the characterization of statistical predictability and the aforementioned factors for all stations considered. Section 4 presents a discussion and introduces a simplified statistical model synthesizing the results of this analysis. Conclusions are given in section 5.
2. Data and methods
In this study, we consider statistical predictions of observed surface wind components at a network of 2109 land stations over the period 1980–2012 (Fig. 1). While station data are available from all continents, they are concentrated in the midlatitudes of the Northern Hemisphere. Three major datasets are used in this study, described below.
Observational data from global weather stations, specifically hourly wind speed (w) and direction (φ; the direction from which the flow comes, measured clockwise from north), from 1 January 1980 to 31 December 2012 were obtained using the WeatherData function of Mathematica 9.0 (Wolfram 2016), which includes a wide range of data sources. Hourly wind speed and direction here represent the average speed and direction observed at 10 m above the ground during the 2-min period ending at the beginning of the hour. For wind sensors exposed at a higher elevation, readings have been corrected by the reporting station (WMO 2013). While this correction will generally influence both wind speed and direction, we do not expect that under most circumstances a change of a few meters will strongly influence the statistical relationship between the wind vectors and the midtropospheric flow. Chief among the data sources are the National Weather Service of the National Oceanic and Atmospheric Administration (NOAA), the United States National Climatic Data Center, and the Citizen Weather Observer Program. Only stations with fewer than 10% missing data for the period under consideration are considered, resulting in a network of 2109 stations. To test how sensitive predictability is to gaps in the data, we chose a few stations with near-complete data records and found that randomly removing 10% of data does not qualitatively affect the linear predictability of daily and monthly averaged wind components (not shown). All stations used in this study have network membership in the National Climatic Data Center (NCDC) [now the National Centers for Environmental Information (NCEI)] of NOAA, and among them, 1779 stations also belong to the climate observation network of the World Meteorological Organization (WMO).
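Since the station records are wind speed w and direction φ, they must be converted to zonal and meridional components before any component-wise analysis. The sketch below uses the standard meteorological convention stated above (φ is the direction the flow comes from, clockwise from north); the function name `wind_components` is hypothetical.

```python
import numpy as np

def wind_components(speed, direction_deg):
    """Convert wind speed and meteorological direction to (u, v).

    direction_deg is the direction the flow comes FROM, measured clockwise
    from north, so the wind vector points the opposite way (hence the
    minus signs).
    """
    phi = np.deg2rad(np.asarray(direction_deg, dtype=float))
    speed = np.asarray(speed, dtype=float)
    u = -speed * np.sin(phi)  # zonal (eastward) component
    v = -speed * np.cos(phi)  # meridional (northward) component
    return u, v

# A northerly wind (from 0 degrees) blows toward the south; a westerly
# wind (from 270 degrees) blows toward the east.
u_north, v_north = wind_components(5.0, 0.0)
u_west, v_west = wind_components(3.0, 270.0)
```

Daily and monthly means would then be taken of u and v rather than of speed and direction, since direction is a circular quantity.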
Free tropospheric meteorological fields: temperature T, geopotential height Z, zonal wind U, and meridional wind V at 500 hPa spanning the entire globe with a grid resolution of 2.5° × 2.5° are obtained from NCEP Reanalysis 2 data provided by the NOAA/OAR/ESRL PSD, from http://www.esrl.noaa.gov/psd/ (Kanamitsu et al. 2002).
1 arc-minute global relief data H from the ETOPO1 Global Relief Model obtained from https://www.ngdc.noaa.gov/mgg/global/global.html (Amante and Eakins 2009).
Fig. 1. The locations of the 2109 land stations used for statistical prediction of surface winds.
Citation: Journal of Climate 30, 16; 10.1175/JCLI-D-16-0507.1
The four free tropospheric meteorological fields from the reanalysis data are used as predictors for the following reasons. Surface winds are related to atmospheric flow in the free troposphere through large-scale dynamical processes with structure throughout the troposphere. Large-scale balances aloft couple the atmospheric mass distribution, thermodynamic structure, and flow, which are related to geopotential height, temperature, and wind components respectively (Culver and Monahan 2013). A range of different reanalysis products exists, but the difference among these reanalyses is generally not large for the large-scale, free-tropospheric flow as shown in a previous study of the predictability of surface winds by Culver and Monahan (2013). In this study, we consider the predictability of both daily and monthly mean surface winds, in both the winter and summer seasons. Subhourly variability is neglected in the computation of daily and monthly averages due to hourly sampling; this limitation cannot be avoided with the meteorological data available. The summer season corresponds to DJF in the Southern Hemisphere and JJA in the Northern Hemisphere, and vice versa for the winter season.
a. Measures of predictability
Previous studies (Culver and Monahan 2013; Monahan 2012; Sun and Monahan 2013) have demonstrated that predictive structures, as represented by the field of correlation coefficients between wind components at a surface station and large-scale climate fields in the free troposphere, are generally spread across a large spatial area with structures that are physically reasonable from the perspective of synoptic and low-frequency atmospheric variability. Furthermore, the locations of the strongest predictors aloft are not generally immediately above the surface station. Based on the approximate size of the region of largest predictability in these previous studies, we choose a 40° × 40° grid box centered at the location of each weather station as the predictor domain. Since the resolution at which tropospheric variables are available from the NCEP II reanalysis is 2.5° × 2.5°, the grid box consists of 256 grid points (i.e., 16 grid points on each side). While the size of the domain is chosen subjectively, it is based on the results of previous studies. Qualitatively similar results were obtained using 20° × 20° boxes (not shown). We fix the domain size rather than optimizing the domain size for each station in order to minimize the potential for overfitting the statistical model for each grid point (i,j) in the predictor domain. Not all field values within the predictor domains will carry meaningful predictive information for surface winds. As another step to avoid model overfitting, all regression models are constructed using cross validation.
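Cross-validated linear regression of a surface wind component on free-tropospheric predictors can be sketched as follows. This is an illustrative outline only, not the procedure of this study: the use of contiguous folds, ordinary least squares on all supplied predictors, and the squared correlation between cross-validated predictions and observations as the skill measure are all assumptions, and the function name `cv_predictability` and the synthetic data are hypothetical.

```python
import numpy as np

def cv_predictability(X, y, n_folds=5):
    """Squared correlation between y and cross-validated linear predictions.

    X: (n_samples, n_predictors) predictor matrix.
    y: (n_samples,) predictand, e.g. one projected wind component.
    Folds are contiguous blocks, so each model is tested on a period that
    was not used to fit it.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = y.size
    y_hat = np.empty(n)
    for test_idx in np.array_split(np.arange(n), n_folds):
        train = np.ones(n, dtype=bool)
        train[test_idx] = False
        # Work with anomalies relative to the training-period means.
        x_mean = X[train].mean(axis=0)
        y_mean = y[train].mean()
        coef, *_ = np.linalg.lstsq(X[train] - x_mean, y[train] - y_mean,
                                   rcond=None)
        y_hat[test_idx] = (X[test_idx] - x_mean) @ coef + y_mean
    return np.corrcoef(y, y_hat)[0, 1] ** 2

# Synthetic example: three predictors, one of which carries no information.
rng = np.random.default_rng(2)
X = rng.normal(size=(2000, 3))
y_signal = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=2000)
y_noise = rng.normal(size=2000)

r2_signal = cv_predictability(X, y_signal)
r2_noise = cv_predictability(X, y_noise)
```

Evaluating skill only on withheld data means that predictors carrying no information do not inflate the estimate, which is the purpose of cross validation here.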
b. Measures of topographic complexity
For our purpose, we seek the relationship between characteristics of predictability and variability of terrain. Different measures of topographic complexity have been proposed, none of which is clearly optimal (Lu 2008). Therefore, we choose a simplified approach, using statistics of local relief to represent topographic complexity.
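One way to compute a directional statistic of local relief is sketched below. It is a hypothetical illustration: the standard deviation of elevation along a diameter is an assumed statistic and is not claimed to be the index defined in this section, the grid layout (rows increasing northward, columns increasing eastward) is assumed, and the convergence of meridians with latitude is ignored.

```python
import numpy as np

def directional_relief_std(elev, row0, col0, theta_deg, radius_cells=12):
    """Standard deviation of elevation along a diameter through a station.

    elev: 2D elevation grid (rows assumed to increase northward, columns
          eastward); (row0, col0) is the grid cell containing the station.
    theta_deg: compass direction of the diameter, clockwise from north.
    radius_cells: half-length of the diameter in grid cells (12 cells is
          0.2 degrees on a 1 arc-minute grid).
    Nearest-neighbor sampling; the station is assumed to lie at least
    radius_cells away from every edge of the grid.
    """
    theta = np.deg2rad(theta_deg)
    steps = np.arange(-radius_cells, radius_cells + 1)
    rows = np.rint(row0 + steps * np.cos(theta)).astype(int)
    cols = np.rint(col0 + steps * np.sin(theta)).astype(int)
    return float(np.std(elev[rows, cols]))

# Synthetic terrain: north-south oriented ridges, so elevation varies only
# in the east-west direction.
cols_grid = np.arange(41)
elev = np.tile(100.0 * np.sin(2 * np.pi * cols_grid / 8), (41, 1))

relief_ns = directional_relief_std(elev, 20, 20, 0)   # along the ridges
relief_ew = directional_relief_std(elev, 20, 20, 90)  # across the ridges
```

For this synthetic terrain the statistic vanishes along the ridges and is largest across them, giving a simple picture of directional variation in topographic complexity.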
Fig. 2. Schematic illustrating the quantification of topographic complexity at a representative station indicated by the white dot in the center of the left panel. The station is located at Kamloops Airport in British Columbia, Canada (50.70°N, 120.44°W). The grayscale indicates the topographic relief (in m). The dotted line represents all elevation points along the diameter for a chosen θ from north within a circle of 0.2° radius, where θ indicates compass directions ranging from 0° to 170°. The right panel shows the corresponding polar plot of directional index of topographic complexity [Eq. (11)].
c. Methods of statistical analysis
Besides the index of topographic complexity, we will consider the standard deviation and kurtosis of the wind component fluctuations [respectively
Second, for each individual station, directional relationships between predictability and chosen factors for all 36 wind components are assessed using either the Spearman rank correlation coefficient
3. Results
This section first displays characteristics of the statistical predictability of surface wind components. An exploratory analysis of relationships between predictability and the three chosen factors is then presented and discussed.
a. Geographic distribution of predictability
In the following analysis, magnitudes of predictability will be characterized by
1) Magnitude of predictability
Maps of the quantities
Fig. 3. Minimum, maximum, and anisotropy of predictability resulting from multiple linear regression based statistical prediction for daily averaged winter and summer observations.
Fig. 4. As in Fig. 3, but for monthly averaged winter and summer observations.
Number of stations with strong predictive anisotropy (i.e.,
Figures 3 and 4 show evidence of some relation of these measures to topographic complexity. For instance, the distribution of predictability across the continent of North America demonstrates that lower predictability and stronger predictive anisotropy are more commonly found in the mountainous regions of the west relative to the rest of the continent. Similarly, the predictability along the west coast of South America (dominated by the Andes) is lower and the anisotropy is stronger than in the rest of South America. However, low or anisotropic predictability can also occur well away from mountainous regions.
The comparison of monthly and daily averaged predictions shows that there are more stations with higher overall monthly predictability than daily predictability. This is observed in both summer and winter results. Specifically,
2) Direction of predictability
The directional characteristics of predictability also vary by region. In some regions, the orientation of predictability shows no coherent spatial structure while in others it is evidently organized. For instance, the directions of
Directions of maximum predictability indicated by
While large-scale organization of the orientation of predictability is not as evident over other continents as in North America, some features related to orientation of predictability can still be identified. For example, directions of
b. Case studies
To illustrate the range of relationships between predictability of surface wind components and potential explanatory factors in different settings, the predictability of wind components is considered for five stations (Table 2 and Fig. 6) representative of different characteristics of predictability, together with their surrounding terrain features and the statistical properties of surface wind components,
Meteorological stations used for case studies. The name refers to the international ID of the stations.
Predictability of wind components
Stations 1, 3, and 5 are located in mountainous terrain. These three stations are characterized by strong predictive anisotropy; in particular, stations 3 and 5 are characterized by relatively low
Although stations 1, 3, and 5 are all expected to be influenced by local wind systems associated with mountainous terrain, there are differences among the characteristics of predictability at the three stations. This suggests that there is no single universal explanation for the observed characteristics of predictability. Terrain features may be one factor, but not the only one. The overall predictability is high and anisotropy is weak at stations 2 and 4 located in relatively flat terrains as indicated by small
By comparing the polar plot of
Predictability
c. Factors related to magnitude of predictability
The probability distributions of three measures of predictability
Probability density functions of
As in Fig. 8, but for distributions of predictability conditioned on the maximum and anisotropy of standard deviation of wind components. The red curve indicates the kernel estimate of probability density of the explanatory factor related to variability of wind components:
As in Fig. 8, but for the distribution of predictability conditioned on the maximum and anisotropy of the kurtosis of wind components. The red curve indicates kernel estimate of probability density of the explanatory factor related to kurtosis of wind components:
Figure 8 shows that lower predictability tends to be associated with topographically complex terrain. In particular,
Figure 9 shows that higher predictability tends to be associated with more variable wind components, and that the association between predictability and variability of wind components is more evident for
Figure 10 shows that higher predictability tends to be associated with wind components characterized by data distributions with lighter tails (or flatter centers) as indicated by smaller values of kurtosis. In particular, highest predictability corresponds to kurtosis less than or near a value of 3. However, the relative frequency of
Seasonal differences of statistical relationships shown in Figs. 8–10 are small. However, the results display a clear difference between daily and monthly time scales of averaging. In particular, the relationships between predictability and statistical properties of wind components are weaker for monthly averaged data than daily averaged data, especially for the relationships between predictability and kurtosis. A simple statistical model presented in section 4b attempts to give a qualitative explanation for the relationships between predictability and statistical properties shown in this section.
d. Factors versus direction of predictability
Directions of predictability may be related to directional variability of topographic complexity
The first three rows show histograms of rank correlation coefficient between directional variation of predictability
4. Discussion
a. Statistical analysis based on observations
The results presented in the previous section show that the relationships between predictability and explanatory factors display differences between daily and monthly averaging time scales. We propose two potential reasons for these differences. First, variability of large-scale tropospheric predictors is predominantly on synoptic time scales, while near-surface winds can also be influenced by mesoscale processes characterized by shorter time scales. Averaging the data over longer time scales will suppress the locally driven variability more than that associated with large-scale processes, thereby strengthening the statistical relationship between predictor and predictand quantities. Second, according to the central limit theorem, the distribution of monthly averaged atmospheric quantities is in general expected to be closer to Gaussian than that of daily averaged quantities. As multivariate Gaussian distributions are characterized by linear relationships between variables, it follows that the nonlinearity of the relationship between two climate datasets is often diminished as the time scale of averaging becomes longer (Yuval and Hsieh 2002). As a result, predictability by a linear regression model is likely to be higher for longer averaging time scales as kurtosis approaches 3. This is also consistent with the observation that the relationship between predictability and kurtosis is weaker for longer averaging time scales, as shown in Figs. 10 and 11: most monthly averaged data are concentrated in the region of small kurtosis values, so there are fewer pairs of large kurtosis and small predictability.
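The central limit theorem argument can be illustrated with synthetic data. The sketch below assumes independent, Laplace-distributed "daily" values and 30-day "months"; both are illustrative choices, and real daily winds are autocorrelated, so the approach to Gaussianity would be slower than in this idealized case.

```python
import numpy as np

def kurtosis(x):
    """Non-excess kurtosis: fourth central moment over squared variance."""
    a = np.asarray(x, dtype=float) - np.mean(x)
    return np.mean(a**4) / np.mean(a**2) ** 2

rng = np.random.default_rng(3)

# 4000 synthetic "months", each made of 30 independent heavy-tailed "days".
daily = rng.laplace(size=(4000, 30))
monthly = daily.mean(axis=1)

k_daily = kurtosis(daily.ravel())   # heavy tailed, well above 3
k_monthly = kurtosis(monthly)       # much closer to the Gaussian value 3
```

For independent samples the excess kurtosis of an n-point mean is the excess kurtosis of the individual values divided by n, so a 30-day mean retains only a small fraction of the daily non-Gaussianity.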
The relationships between predictability and potential explanatory factors indicate either that one causes the other or that they have a common physical cause. For example, characteristics of local wind systems are influenced by local terrain, and local wind systems contribute to statistical properties of surface wind components as well as the direction of mean surface wind components. Anisotropy in standard deviation and kurtosis of wind components can both be created by anisotropy in surface topography. Our study can neither make a general statement about how topographic complexity, magnitude of variability, and shape of data distribution (i.e., degree of non-Gaussianity) of surface wind components are related to each other nor establish cause and effect between these factors and characteristics of predictability. What we have achieved in this study is to identify general patterns related to predictability of surface wind components. Since the patterns shown in the relationships between predictability and statistical properties are stronger than those between predictability and topographic complexity, we will develop a descriptive model aiming at clarifying the relationship between wind predictability and statistical properties (i.e., standard deviation and kurtosis) of wind component fluctuations in an idealized conceptual framework. Note that statistical predictability of wind components is not directly related to the mean vector wind, as the regression analysis considers fluctuations around the mean. It follows that the apparent relationship between the orientation of the mean wind and predictability must arise because the variability characteristics and the mean state share a common physical cause. In the future, a physically based study of relationships between predictability and physical phenomena related to topographic complexity and atmospheric circulations is needed in order to clarify the physical sources of predictive anisotropy.
b. Statistical analysis based on a descriptive model
An ensemble of the quantities given by Eqs. (17)–(22) is obtained by sampling the parameter values randomly. We do not tune the parameters to match data characteristics of individual stations, but some restrictions are applied to the sampling. Specifically, in all samples,
Probability density functions of the maximum, minimum, and anisotropy of statistical predictability conditioned on the maximum and anisotropy of standard deviation and kurtosis for the idealized model Eqs. (17)–(22).
This simple descriptive model makes no assumption regarding the nature of the transfer function used for the statistical prediction (except to the extent that nonlinear dependence is included in
In particular, from Eqs. (21) and (22), we obtain that
5. Conclusions
In this study, we have assessed the linear statistical relationship between large-scale atmospheric flow in the free troposphere and surface wind variability. The strength of such a statistical relationship is important in determining the efficacy of statistical prediction, such as statistical downscaling. Particular attention has been paid to the anisotropy of predictability. We have demonstrated that predictive anisotropy is a common characteristic at surface meteorological stations at a range of locations across the world. In regions away from complex topography, both the magnitude and direction of predictive anisotropy are spatially continuous, indicating possible large-scale organization by either the surface or the flow aloft.
Furthermore, we conducted a preliminary study investigating how different aspects of fluctuating surface winds are related to the predictive anisotropy. The results demonstrate that low predictability is often associated with complex terrain and that the best-predicted wind components generally lie in the direction of largest variability and smallest kurtosis, which generally correspond to the direction of the time-mean wind. The results are effectively characterized by an idealized model in which surface wind variability is partitioned into a large-scale, linearly predictable part and a local-scale, linearly unpredictable part. The broad qualitative agreement of this model with the observed features of surface wind predictability provides evidence of the underlying hypothesis that predictive anisotropy can be characterized by the relative strength of large-scale “signal” and local-scale “noise” of the surface wind components. This study has not provided a physical mechanism for predictive anisotropy from the perspective of atmospheric dynamics related to large-scale and local-scale atmospheric circulations which can influence the strength of signal and noise.
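The signal and noise partition can be illustrated with a toy calculation. The sketch below is not the descriptive model of section 4b: it assumes that the wind vector is the sum of a perfectly linearly predictable signal and an uncorrelated noise term, each described by a 2 × 2 covariance matrix in (east, north) coordinates, so that the predictability in direction θ is the signal variance divided by the total variance. The function name and covariance values are hypothetical.

```python
import numpy as np

def predictability_by_direction(cov_signal, cov_noise, thetas_deg):
    """Fraction of variance explained by the signal in each direction.

    Assumes wind = signal + noise with uncorrelated parts and a signal
    that is perfectly linearly predictable (an idealization).
    """
    th = np.deg2rad(np.asarray(thetas_deg, dtype=float))
    e = np.stack([np.sin(th), np.cos(th)], axis=1)  # unit vectors (east, north)
    var_signal = np.einsum("ki,ij,kj->k", e, cov_signal, e)
    var_noise = np.einsum("ki,ij,kj->k", e, cov_noise, e)
    return var_signal / (var_signal + var_noise)

thetas = np.arange(0, 180, 10)
cov_signal = np.eye(2)              # isotropic large-scale signal
cov_noise = np.diag([4.0, 0.25])    # local noise strongest east-west

r2 = predictability_by_direction(cov_signal, cov_noise, thetas)
r2_weak_noise = predictability_by_direction(cov_signal, 0.1 * cov_noise, thetas)
```

In this toy setting, anisotropic noise alone is enough to produce predictive anisotropy, and reducing the noise amplitude raises predictability in every direction while narrowing the gap between the best and worst predicted components.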
A subsequent study will investigate the extent to which the use of a linear transfer function limits statistical predictability: that is, if the observed predictive anisotropy is a consequence of the use of a simple statistical model. If the underlying relationship between surface wind components and larger-scale predictors in the free atmosphere is more nonlinear in some directions than others, linear transfer functions can result in low predictability of surface wind components in these directions, and linear predictive anisotropy will emerge. The association between directions of relatively poor prediction and high kurtosis suggests the possibility for improvement of prediction by nonlinear models.
On the other hand, predictive anisotropy may be inherent to physical phenomena at the surface and/or flow aloft. While the use of historical data cannot directly address the changes in predictability of surface winds in response to changes in large-scale circulation or weather patterns, our results indicate that (in general) predictability should become larger and more isotropic as a result of any changes that increase the signal-to-noise ratio. We do not present any physical explanation for controls on the signal-to-noise ratio, which should be considered a hypothesis organizing many of the observed aspects of predictability. Determination of the accuracy of this hypothesis requires a more detailed, physically based analysis of the connection between near-surface flow and large-scale as well as local-scale variability in atmospheric circulations. Such an analysis is an important direction of future study.
Acknowledgments
The authors gratefully acknowledge helpful comments and suggestions on this paper from Charles Curry and from four anonymous reviewers. This research was supported by the Discovery Grants program of the Natural Sciences and Engineering Research Council of Canada.
REFERENCES
Amante, C., and B. W. Eakins, 2009: 1 arc-minute global relief model: Procedures, data sources and analysis. NOAA Tech. Memo. NESDIS NGDC-24, 19 pp.
Culver, A. M., and A. H. Monahan, 2013: The statistical predictability of surface winds over western and central Canada. J. Climate, 26, 8305–8322, doi:10.1175/JCLI-D-12-00425.1.
Kanamitsu, M., W. Ebisuzaki, J. Woollen, S. Yang, J. J. Hnilo, M. Fiorino, and G. L. Potter, 2002: NCEP–DOE AMIP-II Reanalysis (R-2). Bull. Amer. Meteor. Soc., 83, 1631–1643, doi:10.1175/BAMS-83-11-1631.
Lu, H., 2008: Modelling terrain complexity. Advances in Digital Terrain Analysis, Q. Zhou, B. Lees, and G. Tang, Eds., Springer, 159–176.
Maraun, D., and Coauthors, 2010: Precipitation downscaling under climate change: Recent developments to bridge the gap between dynamical models and the end user. Rev. Geophys., 48, RG3003, doi:10.1029/2009RG000314.
Monahan, A. H., 2012: Can we see the wind? Statistical downscaling of historical sea surface winds in the subarctic northeast Pacific. J. Climate, 25, 1511–1528, doi:10.1175/2011JCLI4089.1.
Salameh, T., P. Drobinski, M. Vrac, and P. Naveau, 2009: Statistical downscaling of near-surface wind over complex terrain in southern France. Meteor. Atmos. Phys., 103, 253–265, doi:10.1007/s00703-008-0330-7.
Schoof, J., 2013: Statistical downscaling in climatology. Geogr. Compass, 7, 249–265, doi:10.1111/gec3.12036.
Sun, C., and A. Monahan, 2013: Statistical downscaling prediction of sea surface winds over the global ocean. J. Climate, 26, 7938–7956, doi:10.1175/JCLI-D-12-00722.1.
van der Kamp, D., C. L. Curry, and A. H. Monahan, 2012: Statistical downscaling of historical monthly mean winds over a coastal region of complex terrain. II. Predicting wind components. Climate Dyn., 38, 1301–1311, doi:10.1007/s00382-011-1175-1.
WMO, 2013: Guide to the Global Observing System. WMO-No. 488, 172 pp. [Available online at http://library.wmo.int/opac/index.php?lvl=notice_display&id=12516#.VrO-3hG-3Hh.]
Wolfram, 2016: WeatherData source information. Accessed 1 January 2016. [Available online at http://reference.wolfram.com/language/note/WeatherDataSourceInformation.html.]
Yuval, and W. Hsieh, 2002: The impact of time-averaging on the detectability of nonlinear empirical relations. Quart. J. Roy. Meteor. Soc., 128, 1609–1622, doi:10.1002/qj.200212858311.