## Abstract

This study considers characteristics of the statistical predictability of surface wind vectors by linear regression using midtropospheric climate fields as predictors. Specifically, predictive anisotropy, which refers to unequal predictability of wind components projected onto different directions, is considered. The spatial distribution of predictability of surface wind components is determined at 2109 land surface meteorological stations across the globe. The results show that predictive anisotropy is a common feature that is spatially organized in terms of both magnitude and direction. The relationships between predictability and potential influential factors (topographic complexity, mean surface wind vectors, and standard deviation and kurtosis of wind components) are considered. It is found that poor predictability of wind components is generally associated with wind components characterized by relatively weak and non-Gaussian variability. While predictive anisotropy is often found in regions characterized by complex topography, marked predictive anisotropy also occurs away from evident surface heterogeneity. The relationships between predictability, variability, and shape of distribution of surface wind components are described using an idealized statistical model of large-scale and local influences on surface wind.

## 1. Introduction

Near-surface winds are important in problems such as air quality, engineering design, and renewable energy. Near-surface winds are influenced by processes on spatial scales from the microscale to planetary scales, not all of which are resolved (or simulated well) by any given physically based prognostic model. For example, atmosphere–ocean general circulation models (AOGCMs) are useful in assessing the response of the large-scale climate system to changes in natural or anthropogenic forcing. However, the resolution of AOGCMs is rarely finer than 1° × 1°, and therefore they are unable to explicitly resolve small-scale processes such as those related to local topography (Schoof 2013). Therefore, it is useful to explore the statistical predictability of surface winds. The effectiveness of statistical prediction depends on the strength of the statistical relationship between small-scale surface winds and large-scale free-tropospheric climate variables. In this context, statistical prediction refers to the relationship of atmospheric fields (specifically surface winds and midtropospheric variables) at the same time but different locations, rather than prediction of future states. One example of the use of such models is statistical downscaling (SD) based on the assumption that synoptic-scale weather has a strong influence on local-scale weather (Maraun et al. 2010). For statistical prediction, a transfer function is built to link predictands (e.g., surface winds) to predictors (e.g., free-tropospheric climate variables). A range of statistical and machine learning methods can be used to derive the transfer function, such as linear regression, generalized linear and additive models, and various nonlinear regression models. Relatively few studies have focused on using statistical prediction to model physically important vector variables such as surface winds, considering in particular the directional structure of predictability.

The focus of this study is to investigate the statistical relationship between components of surface winds (predictands) and free-tropospheric climate variables (predictors) in order to assess the linear statistical predictability of near-surface wind vectors. To this end, we use linear regression to derive the transfer function between observed surface wind components and midtropospheric climate variables at a large number of observational stations across the world, and the strength of the statistical relationship between predictands and predictors is assessed by the resulting predictability. Predictive anisotropy, which represents the unequal strength of the predictor–predictand relationships for surface wind components projected onto different directions, is of particular interest.

A few previous studies have considered the predictive anisotropy of surface winds. For example, Salameh et al. (2009) applied a generalized additive model as the transfer function to predict surface zonal (*u*) and meridional wind components (*υ*) from stations located in valleys of the French Alps, and found that in general only one of *u* and *υ* can be predicted well. Other studies have used linear regression based transfer functions to predict surface winds in western and central Canada (van der Kamp et al. 2012; Culver and Monahan 2013) and at buoys located over the ocean (Monahan 2012; Sun and Monahan 2013). These studies found that the predictability of wind components projected onto different compass directions generally exhibits predictive anisotropy. In addition, the best or worst predicted wind component is not always the conventional zonal or meridional component. Knowledge of the predictability of *u* and *υ* alone is not sufficient in general to assess the predictive anisotropy and the potential utility of statistical prediction at a station; it is necessary to know the predictability of wind components in all directions from 0° to 180° (as the projection along *θ* is the negative of that along *θ* + 180°). Note that in this discussion of the predictability of wind components, we are referring to projections of wind vectors onto a coordinate axis rather than wind coming from a specified direction. Although it is possible to estimate the conditional predictability of wind based on the direction of flow (e.g., the predictability of northerly and southerly winds separately), such an analysis is of limited practical utility as this conditional predictability assumes the direction of wind is known but not the speed.
Furthermore, while any vector component can be expressed as a linear combination of the zonal and meridional components, the best-fit statistical predictive model for this component will not generally be the corresponding linear combination (with the same weights) of the individual regression models of the components.

The present study has two objectives. The first is to characterize the predictor–predictand relationship by applying linear regression based statistical prediction to a large dataset of station-based surface winds. The second is to explore the relationship between statistical predictability of surface wind components and some potential influential factors. We consider three types of factors in this study: 1) topographic complexity, 2) statistical properties of wind component fluctuations, and 3) directions of mean wind vectors. The statistical properties here refer to the magnitude and shape of the probability distributions of wind components, respectively measured by the standard deviation and kurtosis. A fundamental question of this study is whether the characteristics of predictability of surface wind components at a station can be associated with these factors.

This study considers empirical relationships between wind component predictability and potential explanatory factors, rather than the physical mechanism responsible for these relationships. As surface heterogeneity is a natural candidate cause of predictive anisotropy, it is natural to consider the influence of topography. While the mean vector wind cannot directly relate to predictability of components (as the linear regression models are based on fluctuations of anomalies with the means subtracted), common underlying physical mechanisms may determine the orientation of the mean wind and the anisotropy of predictability. The second factor considered in this study is standard deviation of surface wind components based on the hypothesis that the overall variability of surface wind components will contain both predictable “signal” and unpredictable “noise” associated with the relative influences of large-scale and local atmospheric circulations. Kurtosis of surface wind components is considered because linear regression models should be optimal when the predictand and predictors are all Gaussian (Yuval and Hsieh 2002), so non-Gaussianity might reduce linear predictability.

Some previous studies have shown that predictive anisotropy is observed in regions characterized by complex terrain, such as mountainous regions (Salameh et al. 2009; van der Kamp et al. 2012). For example, van der Kamp et al. (2012) argue that there is no straightforward relationship between directions of best predicted wind components and the frequency distribution of wind directions in western Canada. The results of these studies are derived from a small number of stations located within a limited geographic region. Our results will be based on a much larger number of stations over a broader geographic range. However, these operational meteorological stations are not uniformly distributed across the land surface and are most densely concentrated in the Northern Hemisphere extratropics.

This paper is organized as follows. Section 2 presents the data and statistical methods used in this study. Section 3 explores the characterization of statistical predictability and the aforementioned factors for all stations considered. Section 4 presents a discussion and introduces a simplified statistical model synthesizing the results of this analysis. Conclusions are given in section 5.

## 2. Data and methods

In this study, we consider statistical predictions of observed surface wind components at a network of 2109 land stations over the period 1980–2012 (Fig. 1). While station data are available from all continents, they are concentrated in the midlatitudes of the Northern Hemisphere. Three major datasets are used in this study, described below.

1. Observational data from global weather stations, specifically hourly wind speed (*w*) and direction (*φ*; the direction the flow is coming from, measured clockwise from north), from 1 January 1980 to 31 December 2012, obtained using the WeatherData function of Mathematica 9.0 (Wolfram 2016), which includes a wide range of data sources. Hourly wind speed and direction here represent the average speed and direction observed at 10 m above the ground during the 2-min period ending at the beginning of the hour. For wind sensors exposed at a higher elevation, readings have been corrected by the reporting station (WMO 2013). While this correction will generally influence both wind speed and direction, we do not expect that under most circumstances a change of a few meters will strongly influence the statistical relationship between the wind vectors and the midtropospheric flow. Chief among the data sources are the National Weather Service of the National Oceanic and Atmospheric Administration (NOAA), the United States National Climatic Data Center, and the Citizen Weather Observer Program. Only stations with fewer than 10% missing data for the period under consideration are considered, resulting in a network of 2109 stations. To test how sensitive predictability is to the missing data gaps, we chose a few stations with near-complete data records and found that randomly removing 10% of data does not qualitatively affect the linear predictability of daily and monthly averaged data of wind components (not shown). All stations used in this study have network membership in the National Climatic Data Center (NCDC) [now the National Centers for Environmental Information (NCEI)] of NOAA, and among them, 1779 stations also belong to the climate observation network of the World Meteorological Organization (WMO).
2. Free-tropospheric meteorological fields: temperature *T*, geopotential height *Z*, zonal wind *U*, and meridional wind *V* at 500 hPa, spanning the entire globe with a grid resolution of 2.5° × 2.5°, obtained from NCEP Reanalysis 2 data provided by the NOAA/OAR/ESRL PSD, from http://www.esrl.noaa.gov/psd/ (Kanamitsu et al. 2002).
3. The 1 arc-minute global relief data *H* from the ETOPO1 Global Relief Model, obtained from https://www.ngdc.noaa.gov/mgg/global/global.html (Amante and Eakins 2009).

The four free tropospheric meteorological fields from the reanalysis data are used as predictors for the following reasons. Surface winds are related to atmospheric flow in the free troposphere through large-scale dynamical processes with structure throughout the troposphere. Large-scale balances aloft couple the atmospheric mass distribution, thermodynamic structure, and flow, which are related to geopotential height, temperature, and wind components respectively (Culver and Monahan 2013). A range of different reanalysis products exists, but the difference among these reanalyses is generally not large for the large-scale, free-tropospheric flow as shown in a previous study of the predictability of surface winds by Culver and Monahan (2013). In this study, we consider the predictability of both daily and monthly mean surface winds, in both the winter and summer seasons. Subhourly variability is neglected in the computation of daily and monthly averages due to hourly sampling; this limitation cannot be avoided with the meteorological data available. The summer season corresponds to DJF in the Southern Hemisphere and JJA in the Northern Hemisphere, and vice versa for the winter season.

### a. Measures of predictability

The predictands of this study are surface wind components projected onto compass directions from 0° to 360° at 10° intervals. Zonal and meridional winds are derived from the original hourly wind speed and direction data as follows:

$$u = -w \sin\varphi, \tag{1}$$

$$\upsilon = -w \cos\varphi, \tag{2}$$

where the sign convention is that *u* and *υ* are positive if winds are toward east and north, respectively. The wind component projected onto direction *θ* is then calculated as the projection of the wind vector (*u*, *υ*) onto the unit vector pointing along *θ*, with *θ* varying from 0° to 170°. In total, there are 36 surface wind components at each station. Only 18 of these are distinct, because the component along *θ* + 180° is the negative of the component along *θ*.
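As a minimal sketch of the conversion described above (not the code used in this study), the following Python functions derive the zonal and meridional components from wind speed and meteorological direction and then project the wind vector onto a direction *θ*. The function names are hypothetical, and the projection assumes that *θ* is a compass bearing measured clockwise from north; under a different angle convention the sine and cosine would exchange roles.

```python
import math


def wind_components(speed, direction_deg):
    """Return (u, v) from wind speed and meteorological direction.

    direction_deg is the direction the flow comes FROM, measured
    clockwise from north, so u and v are positive toward east and north.
    """
    phi = math.radians(direction_deg)
    return -speed * math.sin(phi), -speed * math.cos(phi)


def project(u, v, theta_deg):
    """Project the wind vector (u, v) onto direction theta.

    Assumption (not stated in the text): theta is a compass bearing,
    clockwise from north, so the unit vector is (sin theta, cos theta).
    """
    theta = math.radians(theta_deg)
    return u * math.sin(theta) + v * math.cos(theta)


# A 5 m/s westerly (flow from 270 degrees) blows toward the east.
u, v = wind_components(5.0, 270.0)
components = {theta: project(u, v, theta) for theta in range(0, 360, 10)}
```

The dictionary holds 36 components, of which the entry at *θ* + 180° is always the negative of the entry at *θ*, so only 18 carry independent information.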

Previous studies (Culver and Monahan 2013; Monahan 2012; Sun and Monahan 2013) have demonstrated that predictive structures, as represented by the field of correlation coefficients between wind components at a surface station and large-scale climate fields in the free troposphere, are generally spread across a large spatial area with structures that are physically reasonable from the perspective of synoptic and low-frequency atmospheric variability. Furthermore, the locations of the strongest predictors aloft are not generally immediately above the surface station. Based on the approximate size of the region of largest predictability in these previous studies, we choose a 40° × 40° grid box centered at the location of each weather station as the predictor domain. Since the resolution at which tropospheric variables are available from the NCEP II reanalysis is 2.5° × 2.5°, the grid box consists of 256 grid points (i.e., 16 grid points on each side). While the size of the domain is chosen subjectively, it is based on the results of previous studies. Qualitatively similar results were obtained using 20° × 20° boxes (not shown). We fix the domain size rather than optimizing the domain size for each station in order to minimize the potential for overfitting the statistical model for each grid point (*i*,*j*) in the predictor domain. Not all field values within the predictor domains will carry meaningful predictive information for surface winds. As another step to avoid model overfitting, all regression models are constructed using cross validation.

Time series of the four predictor fields (*T*, *Z*, *U*, and *V*) at each grid point are used to predict the surface wind component along each direction *θ* at a station. Before computing the linear regressions, we first remove the seasonal cycles of the predictand as well as of the time series of predictors at each grid point. The seasonal cycles are obtained using a harmonic fit with a fundamental period of 365 days for daily averaged time series (after removing data of 29 February for convenience) and 12 months for monthly averaged time series. The deseasonalized time series are then scaled by their individual standard deviations in order to produce standardized predictors. The transfer function is built by a multiple linear regression model:

$$Y = X\beta + \varepsilon, \tag{5}$$

where *Y* is the deseasonalized predictand (the surface wind component along *θ*), and *X* is the standardized predictor set at each grid point; *β* is the vector of model parameters, and *ε* is the vector of model error. Including a larger number of harmonics in the seasonal cycle has essentially no effect on the resulting regression models (not shown). The influence of nonstationary seasonal variations on predictive skill is further minimized by constructing separate regression models for each of the winter and summer seasons.
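The preprocessing steps above can be illustrated with a short sketch, under stated assumptions. The number of harmonics (two here) is a hypothetical choice, since the text only notes that adding harmonics has little effect, and the coefficients are computed by simple projection, which is valid only for evenly spaced data without gaps that cover a whole number of years. The function names are illustrative.

```python
import math


def harmonic_cycle(series, period, n_harmonics=2):
    """Mean plus the first n_harmonics harmonics of `period`.

    Assumes evenly spaced data with no gaps covering a whole number of
    periods, so each harmonic coefficient is a simple projection.
    """
    n = len(series)
    mean = sum(series) / n
    cycle = [mean] * n
    for k in range(1, n_harmonics + 1):
        w = 2.0 * math.pi * k / period
        a = 2.0 / n * sum(x * math.cos(w * t) for t, x in enumerate(series))
        b = 2.0 / n * sum(x * math.sin(w * t) for t, x in enumerate(series))
        for t in range(n):
            cycle[t] += a * math.cos(w * t) + b * math.sin(w * t)
    return cycle


def deseasonalize(series, period, n_harmonics=2):
    """Subtract the fitted seasonal cycle from the series."""
    cycle = harmonic_cycle(series, period, n_harmonics)
    return [x - c for x, c in zip(series, cycle)]


def standardize(anomalies):
    """Scale anomalies to zero mean and unit (population) standard deviation."""
    n = len(anomalies)
    mean = sum(anomalies) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in anomalies) / n)
    return [(x - mean) / std for x in anomalies]


# Two years of synthetic daily data: annual cycle plus a faster anomaly.
omega = 2.0 * math.pi / 365
series = [10 + 3 * math.cos(omega * t) + 0.5 * math.sin(5 * omega * t)
          for t in range(730)]
anomalies = deseasonalize(series, period=365)
predictor = standardize(anomalies)
```

For records with missing values or for seasonal subsets, the harmonics are no longer orthogonal and the coefficients would need to be estimated by least squares instead.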

The resulting predictability at each grid point is measured by the square of the correlation coefficient between the predicted and observed wind projections, estimated using leave-one-year-out cross validation. Specifically, we use 32 of the 33 years of data (1980–2012) to derive a linear regression model, and then apply this regression model to predict the surface wind components of the remaining year. The process is repeated 33 times to obtain the predicted time series of surface wind components at each grid point, from which the squared correlation between the predicted and observed time series is calculated [Eq. (6)], where the predicted time series is obtained using predictors at the grid point (*i*,*j*) and Eq. (5). A single measure of predictability for each direction *θ* is then computed as the average of the five largest values of this squared correlation within the prediction domain (corresponding to the top 2% of prediction locations in the domain). In computing Eq. (6), only grid points for which the cross-validated correlation is positive are considered. Occasionally, the cross-validated correlation between the predicted and observed wind component can become negative. We interpret these negative correlations between prediction and predictand as resulting from sampling fluctuations and low intrinsic predictability (non-cross-validated values are small in these cases) and exclude them from the analysis.
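A minimal sketch of this cross-validation procedure is given below; it is illustrative only. It assumes that the predictand and predictors are anomalies, so no intercept is fitted, and it solves the normal equations directly, which is adequate for the small number of predictors at one grid point. All function names are hypothetical, and returning `None` when no grid point has a positive correlation is a choice made here, not a detail taken from the text.

```python
import math


def fit_ols(X, y):
    """Solve the normal equations (X^T X) b = X^T y by Gauss-Jordan elimination."""
    p = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(X, y))] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def loyo_correlation(X, y, years):
    """Correlation between y and its leave-one-year-out predictions."""
    pred = [0.0] * len(y)
    for held_out in sorted(set(years)):
        train = [k for k, yr in enumerate(years) if yr != held_out]
        beta = fit_ols([X[k] for k in train], [y[k] for k in train])
        for k, yr in enumerate(years):
            if yr == held_out:
                pred[k] = sum(b * x for b, x in zip(beta, X[k]))
    return pearson(pred, y)


def predictability(correlations, n_top=5):
    """Average the n_top largest squared correlations, skipping r <= 0."""
    r2 = sorted((r * r for r in correlations if r > 0), reverse=True)
    top = r2[:n_top]
    return sum(top) / len(top) if top else None


# Synthetic check: 4 "years" of 12 samples and two predictors.
years = [yr for yr in range(4) for _ in range(12)]
X = [[math.sin(0.7 * k), math.cos(1.3 * k)] for k in range(48)]
y = [2.0 * x1 - x2 for x1, x2 in X]
r = loyo_correlation(X, y, years)
```

In practice `loyo_correlation` would be evaluated once per grid point in the predictor domain, and the resulting correlations passed to `predictability` for each direction *θ*.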

As in previous studies (Culver and Monahan 2013; Sun and Monahan 2013), we represent predictability of wind components by a polar plot showing the predictability of wind components from 0° to 360° at 10° intervals. As a scalar measure of predictive anisotropy, we consider the ratio of the minimum to the maximum predictability over all *θ* [Eq. (8)]. Values of this ratio range between 0 and 1, such that a lower value indicates a stronger degree of anisotropy.
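The anisotropy measure can be sketched in a few lines, assuming it is the ratio of the minimum to the maximum predictability over all directions, as the description of its range and interpretation suggests. The function name and the example values are hypothetical, and the maximum predictability is assumed to be positive.

```python
import math


def anisotropy(predictability_by_direction):
    """Ratio of minimum to maximum predictability over all directions.

    Returns (ratio, theta_min, theta_max); a ratio near 0 means strong
    anisotropy and a ratio of 1 means isotropic predictability.
    """
    get = predictability_by_direction.get
    theta_min = min(predictability_by_direction, key=get)
    theta_max = max(predictability_by_direction, key=get)
    ratio = get(theta_min) / get(theta_max)
    return ratio, theta_min, theta_max


# Hypothetical station whose predictability peaks along 40 degrees.
p = {t: 0.2 + 0.6 * math.cos(math.radians(t - 40)) ** 2
     for t in range(0, 180, 10)}
ratio, t_min, t_max = anisotropy(p)
```

For this synthetic station the best and worst predicted directions are orthogonal, which is common but not guaranteed for observed data.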

### b. Measures of topographic complexity

For our purpose, we seek the relationship between characteristics of predictability and variability of terrain. Different measures of topographic complexity have been proposed, none of which is clearly optimal (Lu 2008). Therefore, we choose a simplified approach, using statistics of local relief to represent topographic complexity.

Two common measures of topographic variability in a specified domain are the difference between the maximum and minimum elevations (the elevation range) and the standard deviation of elevation over the domain (Lu 2008). In this study, a circular region with a radius of 0.2° arc length centered at each weather station is chosen as the domain used to characterize topographic complexity. The small radius of 0.2° arc length is chosen subjectively, based on empirical tests showing that the resultant measure of topographic complexity in our study is not particularly sensitive to domain size for sizes smaller than 0.5°. One example is shown in Fig. 2. The index for measuring topographic complexity at each station is calculated by combining the elevation range and the standard deviation of elevation within the domain. First, we normalize both quantities for all 2109 stations to the range of 0 to 1; specifically, the elevation range at a station is normalized by subtracting the smallest value among all 2109 stations and dividing by the difference between the largest and smallest values among all 2109 stations [Eq. (9)]. The normalized standard deviation is obtained in the same way. The index at an individual station is then computed by averaging the corresponding normalized elevation range and normalized standard deviation.

Note that the index ranges between 0 and 1, with larger values corresponding to more complex topography surrounding a station. Although simple, the index can distinguish mountainous regions from relatively flat regions.
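The construction of the topographic complexity index can be sketched as follows, under stated assumptions. The input format (a mapping from a station identifier to the elevations sampled inside its domain), the function names, and the use of the population standard deviation are all illustrative choices, as the text does not specify them.

```python
import statistics


def min_max_normalize(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def topographic_index(elevations_by_station):
    """Average of normalized elevation range and normalized standard deviation.

    elevations_by_station maps a station id to the list of elevations (m)
    inside that station's domain. Population standard deviation is an
    assumption; the text does not say which estimator is used.
    """
    ids = list(elevations_by_station)
    ranges = [max(elevations_by_station[s]) - min(elevations_by_station[s])
              for s in ids]
    stds = [statistics.pstdev(elevations_by_station[s]) for s in ids]
    n_range = min_max_normalize(ranges)
    n_std = min_max_normalize(stds)
    return {s: 0.5 * (a + b) for s, a, b in zip(ids, n_range, n_std)}


# Hypothetical elevation samples (m) for three stations.
samples = {
    "plain": [100, 101, 102, 101],
    "hills": [200, 260, 320, 260],
    "valley": [500, 900, 1300, 900],
}
index = topographic_index(samples)
```

Because the normalization uses the extremes over all stations, the index of any one station depends on which other stations are included in the network.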

Besides the overall topographic complexity at a station represented by this index, the variability of terrain along each compass direction surrounding a station is also considered in this study. The approach to quantifying directional variability of topography is the same as that used to calculate the station-wide index. For each direction *θ* from 0° to 170° at each station, we first obtain elevation data on the diameter oriented along *θ* within the domain as a function of *r*, where *r* is the distance from the station to a point along *θ* within the domain, and *r* is between 0° and 0.2° (Fig. 2). The difference between the maximum and minimum elevation and the standard deviation of elevation are then computed along the range of *r* considered. The normalized values of these two quantities along each direction *θ*, obtained by equations similar to Eq. (9), are then used to calculate the directional topographic complexity [Eq. (11)].

The polar plot of directional topographic complexity shown in Fig. 2 clearly characterizes the topographic features of this example station. In particular, the orientation of the valley in which the station is situated is the direction of minimum directional topographic complexity, and larger values correspond to across-valley directions along which terrain is more variable.

### c. Methods of statistical analysis

Besides the index of topographic complexity, we will consider the standard deviation and kurtosis of the wind component fluctuations and the orientation of the mean vector wind as potential influential factors. The factors potentially influencing predictability that we consider are not necessarily independent because they may be linked by some common physical origin. Anisotropy in fluctuations of the wind components could conceivably be related to the mean wind; both of these are expected to be influenced by local topography. However, to simplify our objectives in this study, we will not explore the dependency among different factors, but look at the influence of each individual factor. The influence of interaction among these factors, such as mean wind flow and topography, on predictability of surface wind components is an interesting direction of future study.
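For concreteness, the two statistical properties of wind component fluctuations can be computed as in the sketch below. It assumes the non-excess definition of kurtosis (fourth central moment divided by the squared variance), for which a Gaussian distribution has a value of 3, and it uses population moments; the function name is hypothetical.

```python
import math


def std_and_kurtosis(values):
    """Standard deviation and kurtosis (fourth moment / variance squared).

    With this (non-excess) definition a Gaussian has kurtosis 3.
    Population moments are used, which is an assumption.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return math.sqrt(m2), m4 / m2 ** 2


# A flat, light-tailed sample has kurtosis well below the Gaussian value.
sigma, kurt = std_and_kurtosis([-2.0, -1.0, 0.0, 1.0, 2.0])
```

Applying this function to the time series of each projected wind component gives the directional standard deviation and kurtosis for a station.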

We use two approaches to assess the relationship between predictability and potential influential factors in this study. First, relationships between predictability and any given factor are investigated using kernel density estimates of the conditional probability distribution of quantities related to predictability. By definition, the conditional distribution of random variable *Y* given *X* is

$$p(Y \mid X) = \frac{p(X, Y)}{p(X)}, \tag{12}$$

where $p(X, Y)$ is the joint probability density, and $p(X)$ is the probability density of *X*. In this study, *Y* stands for quantities related to predictability and *X* represents quantities related to chosen factors. Distributions of *X* are also shown in these figures, to provide an indication of the number of stations entering the estimate of the conditional distribution.
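One way to estimate such a conditional distribution is sketched below, using Gaussian product kernels for the joint density and a Gaussian kernel for the marginal density. This is an illustrative estimator rather than the one used in this study; the bandwidths `hx` and `hy` are hypothetical tuning parameters, and the synthetic data simply mimic a predictability measure that decreases with a factor.

```python
import math


def gauss(z):
    """Standard normal probability density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def conditional_density(x0, y_grid, xs, ys, hx, hy):
    """Estimate p(y | x = x0) on y_grid as p(x0, y) / p(x0).

    Both densities use Gaussian kernels with bandwidths hx and hy
    (hypothetical tuning parameters, not taken from the text).
    """
    wx = [gauss((x0 - xi) / hx) / hx for xi in xs]
    p_x = sum(wx) / len(xs)
    density = []
    for y in y_grid:
        p_xy = sum(w * gauss((y - yi) / hy) / hy
                   for w, yi in zip(wx, ys)) / len(xs)
        density.append(p_xy / p_x)
    return density


# Synthetic stations: the "predictability" y falls as the "factor" x grows.
xs = [k / 20 for k in range(21)]
ys = [1.0 - x for x in xs]
y_grid = [-1.0 + 0.01 * k for k in range(301)]
density = conditional_density(0.2, y_grid, xs, ys, hx=0.1, hy=0.1)
```

Where few stations have values of the factor near `x0`, the marginal density is small and the conditional estimate rests on very little data, which is why the marginal distribution is worth inspecting alongside it.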

Second, for each individual station, directional relationships between predictability and chosen factors for all 36 wind components are assessed using either the Spearman rank correlation coefficient *ρ* between predictability and a directional factor, or the dot product between two unit vectors related to factors and predictability. The Spearman rank correlation is used to assess the directional relationship between predictability and the directional topographic complexity, the standard deviation, and the kurtosis of wind components, while the dot product is used to assess the directional relationship with the mean surface wind vectors. The quantity *ρ* measures the degree of similarity between the orientation of predictability and the chosen factors of 36 wind components at a station. Positive (negative) values mean that directions of larger (smaller) predictability correspond to larger values of the factor. A larger magnitude of rank correlation coefficient indicates a stronger directional relationship between predictability and the chosen factor of wind components at a station. The case when *ρ* = 0 means that there is no directional relationship between predictability and factors of wind components. Values of the dot product of two unit vectors range between 0 (orthogonal) and 1 (parallel). We do not consider negative values of dot products as the component along *θ* is just the negative of the component along *θ* + 180°.
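Both directional measures are simple to compute, as the following sketch shows. The Spearman coefficient is obtained as the Pearson correlation of ranks, with tied values sharing their average rank. The alignment function takes the absolute value of the dot product so that *θ* and *θ* + 180° are treated alike, and it assumes that *θ* is a compass bearing measured clockwise from north. Function names are hypothetical.

```python
import math


def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = 0.5 * (i + j) + 1.0
        i = j + 1
    return r


def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    ra, rb = ranks(a), ranks(b)
    ma, mb = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    return cov / math.sqrt(va * vb)


def alignment(theta_deg, wind_u, wind_v):
    """|cos| of the angle between direction theta and the mean wind vector.

    Assumption: theta is a compass bearing (clockwise from north).
    The absolute value folds theta and theta + 180 degrees together.
    """
    t = math.radians(theta_deg)
    norm = math.hypot(wind_u, wind_v)
    return abs(math.sin(t) * wind_u + math.cos(t) * wind_v) / norm
```

For example, `spearman(predictability_values, std_values)` evaluated over the directions at one station would be close to 1 if the best predicted directions are also the most variable ones.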

## 3. Results

This section first displays characteristics of the statistical predictability of surface wind components. An exploratory analysis of relationships between predictability and the three chosen factors is then presented and discussed.

### a. Geographic distribution of predictability

In the following analysis, magnitudes of predictability will be characterized by the minimum and maximum predictability over all directions and the corresponding anisotropy defined by Eq. (8). The corresponding directions of maximum predictability, represented by unit vectors, are used to show directional characteristics of predictability.

#### 1) Magnitude of predictability

Maps of the minimum and maximum predictability and the predictive anisotropy for all stations considered are displayed in Figs. 3 and 4 (for daily and monthly averaging time scales, respectively). Inspection of these figures indicates that the characteristics of predictability considered are spatially organized. Table 1 summarizes the number of stations with strong predictive anisotropy for different cases of prediction in each region. We can see that evident predictive anisotropy is a common feature across the globe, with the interesting exception of a band in Eurasia stretching from 10°W to 90°E where only a small fraction of stations show strong predictive anisotropy for all cases of prediction.

Figures 3 and 4 show evidence of some relation of these measures to topographic complexity. For instance, the distribution of predictability across the continent of North America demonstrates that lower predictability and stronger predictive anisotropy are more commonly found in the mountainous regions of the west relative to the rest of the continent. Similarly, the predictability along the west coast of South America (dominated by the Andes) is lower and the anisotropy is stronger than in the rest of South America. However, low or anisotropic predictability can also occur well away from mountainous regions.

The comparison of monthly and daily averaged predictions shows that there are more stations with higher overall monthly predictability than daily predictability. This is observed in both summer and winter results. Specifically, is larger for monthly averaged data than that for daily averaged data at approximately 60% of the stations. Similarly, about 80% of stations have larger for monthly averaged data than for daily averaged data. Features related to prediction using winter and summer data are similar.

#### 2) Direction of predictability

The directional characteristics of predictability also vary by region. In some regions, the orientation of predictability shows no coherent spatial structure while in others it is evidently organized. For instance, the directions of maximum predictability (generally orthogonal to the direction of minimum predictability) at some stations in eastern North America, away from the Western Cordillera (east of 100°W), show evidence of large-scale organization which changes from summer to winter (Fig. 5). In contrast, the orientation of wind predictability in mountainous western North America shows no evidence of large-scale organized spatial structure (away from the western coastlines). The orientations of the time-mean wind vectors are also shown in Fig. 5. While maximum predictability is aligned along the mean wind in some circumstances (such as from central Canada along the Mississippi River to the Gulf of Mexico in summer), in other circumstances these vectors are almost orthogonal (as in the Great Plains in summer) or the relative orientations show no systematic pattern (as in the Western Cordillera). The global-scale relationship between the orientation of predictability and the mean wind will be discussed in section 3d.

While large-scale organization of the orientation of predictability is not as evident over other continents as in North America, some features related to orientation of predictability can still be identified. For example, directions of maximum predictability at stations located in near-coastal areas are often nearly parallel or perpendicular to the coastline, and this feature is more evident for daily averaged summer predictions.

### b. Case studies

To illustrate the range of relationships between predictability of surface wind components and potential explanatory factors in different settings, the predictability of wind components is considered for five stations (Table 2 and Fig. 6) representative of different characteristics of predictability, together with their surrounding terrain features and the statistical properties of surface wind components (standard deviation and kurtosis).

Stations 1, 3, and 5 are located in mountainous terrain. These three stations are characterized by strong predictive anisotropy; in particular, stations 3 and 5 are characterized by relatively low predictability for both seasons and time scales of averaging, whereas predictability at station 1 is relatively high in winter and low in summer. Both stations 1 and 3 are located in valleys. The direction of maximum (minimum) predictability at station 1 is always oriented along the along-valley (cross-valley) direction for both seasons and time scales of averaging. These orientation features are not surprising given the local topography: the valley at station 1 is narrow, and the predictability features suggest that the connection between large-scale flow and surface flow is strongly determined by funneling of winds. However, for station 3, the directional relationships between the orientation of maximum (minimum) predictability and along-valley (cross-valley) directions vary with season; while the direction of maximum predictability is along-valley in winter, this direction is substantially rotated away from the valley axis in summer.

Although stations 1, 3, and 5 are all expected to be influenced by local wind systems associated with mountainous terrain, there are differences among the characteristics of predictability at the three stations. This suggests that there is no single universal explanation for the observed characteristics of predictability. Terrain features may be one factor, but not the only one. The overall predictability is high and anisotropy is weak at stations 2 and 4, which are located in relatively flat terrain as indicated by small values of the topographic complexity index. Characteristics of predictability at stations 2 and 4 also differ from each other; in particular, local wind systems associated with land–water contrast are likely to influence predictability at station 2.

By comparing the polar plot of predictability with polar plots of the standard deviation and kurtosis of wind components in Fig. 7, we can see that in general predictability of wind components is higher when the variability is larger (larger standard deviation) and when the kurtosis is smaller. Moreover, sometimes the directions and are similar, while other times these directions are not well aligned. As discussed above, a clear relationship between the anisotropy of predictability and the orientation of local topographic complexity holds in some cases but not in others. Inspection of Fig. 7 demonstrates that the relationships between predictive characteristics and each explanatory factor are imperfect. For example, the (relatively weak) summer predictability at station 3 is such that the directions of maximum (minimum) predictability are characterized by small (large) standard deviation and large (small) kurtosis, and is close to perpendicular to the direction of . We can see from these examples that the relationships between predictability and explanatory factors are not simple, and we need to consider more stations in a range of geographical settings. To this end, we will now consider these relationships across all stations in order to assess any broad relationships between explanatory factors and predictability in terms of both magnitude and direction.

### c. Factors related to magnitude of predictability

The probability distributions of three measures of predictability (the minimum predictability, the maximum predictability, and the predictive anisotropy), conditioned on measures of the potential factors and calculated according to Eq. (12), are presented in Figs. 8–10.

Figure 8 shows that lower predictability tends to be associated with topographically complex terrain. In particular, decreases steadily with . Given a large value of , we find that both and are small. However, this statement cannot be reversed, especially for the relationship between and . In addition, the relationship between predictive anisotropy and anisotropy of directional topographic complexity is weak. Many stations are in places with similar directional distributions of topography but different predictive anisotropy.

Figure 9 shows that higher predictability tends to be associated with more variable wind components, and that the association between predictability and variability of wind components is more evident for than . The increase of values saturates at intermediate values of ; beyond this point, increasing variability does not improve predictability. Furthermore, Fig. 9 also shows that there is a positive correlation between anisotropy of variability and anisotropy of predictability for both seasons and time scales of averaging, but this pattern is stronger for daily averaged data.

Figure 10 shows that higher predictability tends to be associated with wind components characterized by data distributions with lighter tails (or flatter centers) as indicated by smaller values of kurtosis. In particular, the highest predictability corresponds to kurtosis less than or near a value of 3. However, the relative frequency of shown as the red curve in Fig. 10 indicates that the number of stations with smaller than 3 is small. Out of 2109 stations, there are fewer than 280 stations with for daily averaged data, and fewer than 170 stations with for monthly averaged data. The most common kurtosis values are near 3 (corresponding to the Gaussian value). Predictability generally decreases as kurtosis of components increases away from this value. In this way, Fig. 10 shows that stations whose wind components have non-Gaussian distributions characterized by heavier tails or more peaked centers are more likely to have lower predictability, and this pattern is stronger for daily averaged data. Moreover, while decreases steadily with , tends to decrease rapidly to 0 as approaches 4. Last, tends to increase with , especially for daily averaged data.

Seasonal differences of statistical relationships shown in Figs. 8–10 are small. However, the results display a clear difference between daily and monthly time scales of averaging. In particular, the relationships between predictability and statistical properties of wind components are weaker for monthly averaged data than daily averaged data, especially for the relationships between predictability and kurtosis. A simple statistical model presented in section 4b attempts to give a qualitative explanation for the relationships between predictability and statistical properties shown in this section.

### d. Factors versus direction of predictability

Directions of predictability may be related to directional variability of topographic complexity and statistical properties of surface wind fluctuations, as well as directions of mean surface wind vectors , . Histograms of rank correlation *ρ* between and , , for all stations are considered (Fig. 11). For both monthly and daily time scales in winter and summer, values of are dominated by strong positive directional correlation, concentrated near the value of 1. That is, the most variable wind components are generally the best predicted components. For daily averaged winds, values of are dominated by strong negative directional correlation, concentrated around values near −1. It follows that in general better predicted wind components are characterized by light tails or flatter distribution centers, as indicated by small kurtosis. On the other hand, wind components characterized by heavy tails or peaked centers, as indicated by large kurtosis, tend to be poorly predicted. While this pattern is substantially weakened for monthly averages, the general tendency remains. The relative frequency of tends to decrease away from −1, but the overall directional relationship between predictability and terrain variability is weak. This indicates that while better predictability is somewhat more likely to be found in the directions of less variable terrain, this effect is weak. The directional relationship between predictability and mean wind vectors , is shown by the dot product of unit vectors: and . The histograms of show that most stations tend to have mean wind vectors parallel to directions of . Moreover, histograms of show that directions of mean wind vectors are parallel to directions of for most stations. It is common for maximum predictability and maximum variability to both be aligned along the mean wind.

Of these factors, only the directional relationship between kurtosis and predictability of wind components shows considerable difference between daily and monthly predictions. The directional relationships between predictability and all factors show negligible difference between winter and summer.
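The directional rank correlation described above can be illustrated with a minimal sketch on synthetic data. Everything in it is an assumption made for illustration rather than the configuration used in this study: a single Gaussian large-scale predictor (so that regression skill reduces to the squared Pearson correlation), hypothetical loadings of 2.0 and 0.5 on the two wind components, isotropic Gaussian noise, and 18 projection directions spaced 10° apart.

```python
import math
import random

def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Ranks 1..n (ties are not handled, which is fine for continuous data)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(x, y):
    """Spearman rank correlation as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

rng = random.Random(0)
n = 5000
# Hypothetical synthetic station: one large-scale predictor plus isotropic noise.
large_scale = [rng.gauss(0, 1) for _ in range(n)]
u = [2.0 * s + rng.gauss(0, 1) for s in large_scale]  # assumed strong signal
v = [0.5 * s + rng.gauss(0, 1) for s in large_scale]  # assumed weak signal

thetas = [math.radians(d) for d in range(0, 180, 10)]
r2_by_dir, std_by_dir = [], []
for th in thetas:
    # Wind component projected onto direction th.
    comp = [a * math.cos(th) + b * math.sin(th) for a, b in zip(u, v)]
    mean = sum(comp) / n
    std_by_dir.append(math.sqrt(sum((c - mean) ** 2 for c in comp) / n))
    # With one predictor, linear-regression skill is the squared correlation.
    r2_by_dir.append(pearson(comp, large_scale) ** 2)

rho_dir = spearman(r2_by_dir, std_by_dir)
print(f"directional rank correlation (predictability vs std): {rho_dir:.3f}")
```

In this construction the noise variance is the same in every direction, so the directions with the largest variability are also those with the strongest predictive signal, and the rank correlation is strongly positive.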

## 4. Discussion

### a. Statistical analysis based on observations

The results presented in the previous section show that the relationships between predictability and explanatory factors display differences between daily and monthly averaging time scales. We propose two potential reasons for these differences. First, variability of large-scale tropospheric predictors is predominantly on synoptic time scales, while near-surface winds can also be influenced by mesoscale processes characterized by shorter time scales. Averaging the data over longer time scales will suppress the locally driven variability more than that associated with large-scale processes, thereby strengthening the statistical relationship between predictor and predictand quantities. Second, according to the central limit theorem, the distribution of monthly averaged atmospheric quantities is in general expected to be closer to Gaussian than that of daily averaged quantities. As multivariate Gaussian distributions are characterized by linear relationships between variables, it follows that the nonlinearity of the relationship between two climate datasets is often diminished as the time scale of averaging becomes longer (Yuval and Hsieh 2002). As a result, predictability by a linear regression model is likely to be higher for longer time scales of averaging as kurtosis approaches 3. This is also consistent with the observation that the relationship between predictability and kurtosis is weaker for longer time scales of averaging (Figs. 10 and 11): for monthly averaged data, most values are concentrated in the region of small kurtosis, so there are fewer pairs of large kurtosis and small predictability.
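The central limit theorem argument can be illustrated with a small numerical sketch. It assumes independent, Laplace-distributed "daily" values (kurtosis 6) and 30-day "months"; both are illustrative choices, and real daily winds are serially correlated, so the approach to the Gaussian value would be slower than shown here.

```python
import random

def kurtosis(x):
    """Non-excess sample kurtosis (a Gaussian sample gives about 3)."""
    n = len(x)
    m = sum(x) / n
    m2 = sum((a - m) ** 2 for a in x) / n
    m4 = sum((a - m) ** 4 for a in x) / n
    return m4 / m2 ** 2

rng = random.Random(1)
n_months, days_per_month = 4000, 30  # hypothetical record length

# Laplace-distributed "daily" values, built as the difference of two
# independent exponential variates (theoretical kurtosis 6).
daily = [rng.expovariate(1.0) - rng.expovariate(1.0)
         for _ in range(n_months * days_per_month)]

# Non-overlapping 30-day means play the role of monthly averages.
monthly = [sum(daily[i * days_per_month:(i + 1) * days_per_month]) / days_per_month
           for i in range(n_months)]

daily_kurt = kurtosis(daily)
monthly_kurt = kurtosis(monthly)
print(f"daily kurtosis:   {daily_kurt:.2f}")
print(f"monthly kurtosis: {monthly_kurt:.2f}")
```

For independent values the excess kurtosis of an *n*-point mean is the single-value excess kurtosis divided by *n*, so the monthly means in this sketch are expected to have kurtosis near 3.1.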

The relationships between predictability and potential explanatory factors indicate either that one causes the other or that they have a common physical cause. For example, characteristics of local wind systems are influenced by local terrain, and local wind systems contribute to statistical properties of surface wind components as well as the direction of mean surface wind components. Anisotropy in standard deviation and kurtosis of wind components can both be created by anisotropy in surface topography. Our study can neither make a general statement about how topographic complexity, magnitude of variability, and shape of data distribution (i.e., degree of non-Gaussianity) of surface wind components are related to each other nor establish cause and effect between these factors and characteristics of predictability. What we have achieved in this study is to identify general patterns related to predictability of surface wind components. Since the relationships between predictability and statistical properties are stronger than those between predictability and topographic complexity, we will develop a descriptive model aimed at clarifying the relationship between wind predictability and statistical properties (i.e., standard deviation and kurtosis) of wind component fluctuations in an idealized conceptual framework. Note that statistical predictability of wind components is not directly related to the mean vector wind, as the regression analysis considers fluctuations around the mean. It follows that the apparent relationship between the orientation of the mean wind and predictability must arise because the variability characteristics and the mean state share a common physical cause. In the future, a physically based study of relationships between predictability and physical phenomena related to topographic complexity and atmospheric circulations is needed in order to clarify the physical sources of predictive anisotropy.

### b. Statistical analysis based on a descriptive model

We now present an idealized statistical model used to characterize the relationship between predictability and the statistical properties of surface wind components. Let be the surface wind component in the direction *θ*. We will assume that can be partitioned into two parts:

where and are respectively variables perfectly correlated with and completely uncorrelated with the large-scale flow. We further assume that and are mutually uncorrelated. The variable represents that part of surface winds that is entirely associated with small-scale local processes and/or is nonlinearly related to free tropospheric variability such that there is statistical dependence but zero correlation. By construction, cannot be modeled by linear regression with free tropospheric predictors. For simplicity, we assume that variability of is isotropic, and that the distribution of is Gaussian with a kurtosis of 3. The quantity encodes the anisotropy of the part of that is linearly predictable by free tropospheric variability. The structure of accounts for predictive anisotropy inherent to the character of free tropospheric variability as well as the influence of surface inhomogeneities such as terrain. By construction, measures the predictability of by a linear regression model with free-tropospheric predictors. Based on the above assumptions, the statistical properties of the surface wind components *σ*, kurt, and statistical predictability can be calculated as follows:

where , , , , , and . Since *G* varies with the direction of wind projection *θ*, the quantities *σ*, kurt, and are functions of *θ*. By calculating the first derivative of *σ*, kurt, and with respect to *G*, we know that *σ* and increase with *G*, and kurt decreases with *G* when . For simplicity, we do not consider as wind components with kurtosis larger than 3 are dominant in observations. It follows that

where and we have taken .
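As an illustration of how such a partition leads to the stated dependence on *G*, the following is a minimal sketch under explicitly assumed notation. The symbols $L$ (large-scale part, Gaussian, variance $\sigma_L^2$), $\eta$ (local part, isotropic variance $\sigma_\eta^2$, kurtosis $\kappa_\eta$), and $\rho^2$ (squared correlation with the large-scale flow) are chosen here for illustration, and $L$ and $\eta$ are taken to be independent, which is stronger than uncorrelated but is needed to evaluate the fourth moment. Only *G*, *σ*, kurt, and *θ* follow the surrounding text.

$$
U(\theta) = G(\theta)\,L + \eta ,
$$

$$
\sigma^2 = G^2\sigma_L^2 + \sigma_\eta^2 , \qquad
\rho^2 = \frac{G^2\sigma_L^2}{G^2\sigma_L^2 + \sigma_\eta^2} ,
$$

$$
\mathrm{kurt} = \frac{3G^4\sigma_L^4 + 6G^2\sigma_L^2\sigma_\eta^2 + \kappa_\eta\sigma_\eta^4}{\left(G^2\sigma_L^2 + \sigma_\eta^2\right)^2}
= 3 + (\kappa_\eta - 3)\left(1 - \rho^2\right)^2 .
$$

Under these assumptions, $\sigma$ and $\rho^2$ increase monotonically with $G$ for $G > 0$, while kurt decreases with $G$ whenever $\kappa_\eta > 3$, which matches the qualitative behavior described in the text.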

An ensemble of the quantities given by Eqs. (17)–(22) is obtained by sampling the parameter values randomly. We do not tune the parameters to match data characteristics of individual stations, but some restrictions are applied to the sampling. Specifically, in all samples, and values of are drawn randomly from a uniform distribution between 3 and 10. Also, and are sampled so that most of the time is larger than (i.e., we assume a generally moderate to large “signal-to-noise ratio”). In the present analysis, and are positive numbers drawn from normal distributions with the following parameters: and , , and . As illustrated in Fig. 12, the patterns of the relationships between simulated predictability and statistics of wind components (kurtosis and variability) qualitatively resemble those in observations (Figs. 8–10). Specifically, the results of the simulation show that 1) predictability [both and ] tends to increase given larger variability [i.e., larger ] and a lower level of non-Gaussianity [ approaching 3] of wind components; 2) the relationship between the predictability and variability of wind components is stronger for and than for and ; 3) saturates for larger values of ; and 4) the relationship between predictability and kurtosis of wind components is stronger for and than for and . Moreover, the anisotropy of predictability tends to be stronger given stronger anisotropy of the wind statistics [i.e., and ]. All of these features can also be found in the observational results discussed in section 3c.
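A Monte Carlo ensemble of this kind can be sketched as follows. The sketch assumes a partition of each wind component into a Gaussian large-scale part scaled by a gain `g` plus an independent local part with isotropic variance and kurtosis `kurt_eta`. Apart from the uniform range of 3 to 10 for the local kurtosis, every sampling parameter, name, and helper below (`component_stats`, `positive_normal`, the normal means and standard deviations, the unit large-scale standard deviation) is a hypothetical choice for illustration, not the configuration used to produce Fig. 12.

```python
import random

def component_stats(g, sigma_l, sigma_eta, kurt_eta):
    """Std, kurtosis, and squared correlation of g*L + eta for Gaussian L
    (std sigma_l) and independent eta (std sigma_eta, kurtosis kurt_eta)."""
    signal = (g * sigma_l) ** 2
    noise = sigma_eta ** 2
    total = signal + noise
    sigma = total ** 0.5
    rho2 = signal / total
    kurt = 3.0 + (kurt_eta - 3.0) * (noise / total) ** 2
    return sigma, kurt, rho2

def positive_normal(rng, mean, sd):
    """Draw from a normal distribution, redrawing until the value is positive."""
    while True:
        x = rng.gauss(mean, sd)
        if x > 0:
            return x

rng = random.Random(2)
ensemble = []
for _ in range(2000):
    sigma_l = 1.0                                # assumed fixed in all samples
    kurt_eta = rng.uniform(3.0, 10.0)            # local-part kurtosis
    sigma_eta = positive_normal(rng, 1.0, 0.5)   # hypothetical noise level
    g_a = positive_normal(rng, 2.0, 1.0)         # hypothetical gains, chosen so
    g_b = positive_normal(rng, 2.0, 1.0)         # signal usually exceeds noise
    g_min, g_max = min(g_a, g_b), max(g_a, g_b)
    lo = component_stats(g_min, sigma_l, sigma_eta, kurt_eta)
    hi = component_stats(g_max, sigma_l, sigma_eta, kurt_eta)
    # alpha: ratio of minimum to maximum predictability (1 means isotropic).
    ensemble.append({"min": lo, "max": hi, "alpha": lo[2] / hi[2]})

mean_alpha = sum(e["alpha"] for e in ensemble) / len(ensemble)
print(f"ensemble-mean ratio of min to max predictability: {mean_alpha:.2f}")
```

Each ensemble member stores the tuple (standard deviation, kurtosis, squared correlation) for the directions of weakest and strongest signal, which is enough to plot predictability against variability or kurtosis in the manner of the conditional distributions discussed above.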

This simple descriptive model makes no assumption regarding the nature of the transfer function used for the statistical prediction (except to the extent that nonlinear dependence is included in ), and it shows that much of the association between the linear predictability of surface wind components and their statistical properties can be accounted for by a simple partitioning of surface wind variability into a portion that is linearly related to free-tropospheric variability and a portion that is not. Although this simple descriptive model cannot explain the causes of predictive anisotropy, it provides a simple characterization of it and relates predictive anisotropy to the so-called signal-to-noise ratio corresponding to the relative magnitudes of large-scale and local influences on the wind.

In particular, from Eqs. (21) and (22), we obtain that for , and in the opposite limit . This indicates that if the influence of the unpredictable component becomes larger (i.e., increased noise indicated by large ), the prediction becomes more anisotropic. When , fluctuations in all directions are perfectly correlated with and predictable by the large-scale flow. As the variability in the linearly unpredictable part increases, the predictability of surface winds in the direction of smallest (i.e., wind components with least predictive signal) is reduced more than in the direction of largest (i.e., wind components with strongest predictive signal) and predictive anisotropy increases. Similarly, those directions with smaller are more influenced by the non-Gaussian local variability in the sense of being heavier tailed or more peaked in the distribution of , resulting in the association of larger kurtosis with smaller predictability.
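The two limits can be made explicit with a short worked sketch. It assumes that each component has squared correlation $\rho^2 = G^2\sigma_L^2/(G^2\sigma_L^2 + \sigma_\eta^2)$ with the large-scale flow, where $\sigma_L$ and $\sigma_\eta$ denote the standard deviations of the large-scale and local parts, and it measures anisotropy by the ratio $\alpha$ of minimum to maximum predictability. The symbols $\sigma_L$, $\sigma_\eta$, $\alpha$, $G_{\min}$, and $G_{\max}$ are notation assumed here for illustration.

$$
\alpha \equiv \frac{\rho^2_{\min}}{\rho^2_{\max}}
= \frac{G_{\min}^2}{G_{\max}^2}\;
\frac{G_{\max}^2\sigma_L^2 + \sigma_\eta^2}{G_{\min}^2\sigma_L^2 + \sigma_\eta^2},
\qquad
\alpha \to 1 \ \text{ as } \ \frac{\sigma_\eta}{\sigma_L} \to 0,
\qquad
\alpha \to \frac{G_{\min}^2}{G_{\max}^2} \ \text{ as } \ \frac{\sigma_\eta}{\sigma_L} \to \infty .
$$

For $0 < G_{\min} < G_{\max}$ the second factor decreases monotonically from $G_{\max}^2/G_{\min}^2$ to 1 as $\sigma_\eta^2$ grows, so under these assumptions $\alpha$ falls steadily from 1 (isotropic) toward $G_{\min}^2/G_{\max}^2$ as the unpredictable part strengthens.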

## 5. Conclusions

In this study, we have assessed the linear statistical relationship between large-scale atmospheric flow in the free troposphere and surface wind variability. The strength of this statistical relationship is important in determining the efficacy of statistical prediction, such as statistical downscaling. Particular attention has been paid to the anisotropy of predictability. We have demonstrated that predictive anisotropy is a common characteristic at surface meteorological stations at a range of locations across the world. In regions away from complex topography, both the magnitude and direction of predictive anisotropy are spatially continuous, indicating possible large-scale organization by either the surface or the flow aloft.

Furthermore, we conducted a preliminary study investigating how different aspects of fluctuating surface winds are related to the predictive anisotropy. The results demonstrate that low predictability is often associated with complex terrain and that the best-predicted wind components generally lie in the direction of largest variability and smallest kurtosis, which generally corresponds to the direction of the time-mean wind. The results are effectively characterized by an idealized model in which surface wind variability is partitioned into a large-scale, linearly predictable part and a local-scale, linearly unpredictable part. The broad qualitative agreement of this model with the observed features of surface wind predictability supports the underlying hypothesis that predictive anisotropy can be characterized by the relative strength of large-scale “signal” and local-scale “noise” of the surface wind components. This study has not provided a physical mechanism for predictive anisotropy from the perspective of atmospheric dynamics related to large-scale and local-scale atmospheric circulations, which can influence the strength of signal and noise.

A subsequent study will investigate the extent to which the use of a linear transfer function limits statistical predictability: that is, whether the observed predictive anisotropy is a consequence of the use of a simple statistical model. If the underlying relationship between surface wind components and larger-scale predictors in the free atmosphere is more nonlinear in some directions than others, linear transfer functions can result in low predictability of surface wind components in these directions, and linear predictive anisotropy will emerge. The association between directions of relatively poor prediction and high kurtosis suggests the possibility of improving prediction with nonlinear models.

On the other hand, predictive anisotropy may be inherent to physical phenomena at the surface and/or flow aloft. While the use of historical data cannot directly address the changes in predictability of surface winds in response to changes in large-scale circulation or weather patterns, our results indicate that (in general) predictability should become larger and more isotropic as a result of any changes that increase the signal-to-noise ratio. We do not present any physical explanation for controls on the signal-to-noise ratio, which should be considered a hypothesis organizing many of the observed aspects of predictability. Determination of the accuracy of this hypothesis requires a more detailed, physically based analysis of the connection between near-surface flow and large-scale as well as local-scale variability in atmospheric circulations. Such an analysis is an important direction of future study.

## Acknowledgments

The authors gratefully acknowledge helpful comments and suggestions on this paper from Charles Curry and from four anonymous reviewers. This research was supported by the Discovery Grants program of the Natural Sciences and Engineering Research Council of Canada.

## REFERENCES

*Advances in Digital Terrain Analysis*, Q. Zhou, B. Lees, and G. Tang, Eds., Springer, 159–176.

## Footnotes

© 2017 American Meteorological Society. For information regarding reuse of this content and general copyright information, consult the AMS Copyright Policy (www.ametsoc.org/PUBSReuseLicenses).