## Abstract

Spaceborne Global Navigation Satellite System reflectometry observations of the ocean surface are found to respond to components of roughness forced by local winds and to a longer wave swell that is only partially correlated with the local wind. This dual sensitivity is largest at low wind speeds. If left uncorrected, the error in wind speeds retrieved from the observations is strongly correlated with the significant wave height (SWH) of the ocean. A Bayesian wind speed estimator is developed to correct for the long-wave sensitivity at low wind speeds. The approach requires a characterization of the joint probability of occurrence of wind speed and SWH, which is derived from archival reanalysis sea-state records. The Bayesian estimator is applied to spaceborne data collected by the *Technology Demonstration Satellite-1* (*TechDemoSat-1*) and is found to provide significant improvement in wind speed retrieval at low winds, relative to a conventional retrieval that does not account for SWH. At higher wind speeds, the wind speed and SWH are more highly correlated and there is much less need for the correction.

## 1. Introduction

Spaceborne remote sensing observations have been used to estimate near-surface wind speed over the ocean since the 1970s. The two most mature methods are radar scatterometer measurements of the ocean surface backscatter (Grantham et al. 1977; Jones 1982) and microwave radiometer measurements of ocean surface thermal emission (Njoku et al. 1980; Pandey 1987). In the first case, the backscatter is sensitive to roughening of the surface by the wind (Wentz et al. 1984). In the second case, the emissivity is sensitive to both surface roughness and the fractional coverage of the surface by wind-driven foam (Stogryn 1967; Nordberg et al. 1971). More recently, spaceborne measurements of ocean surface forward scatter by a Global Navigation Satellite System reflectometry (GNSS-R) bistatic radar have also been shown capable of estimating the near-surface ocean wind speed (Gleason et al. 2005; Clarizia et al. 2014; Foti et al. 2015; Li et al. 2016). Spaceborne GNSS-R for retrieval of wind speed and other parameters has been successfully demonstrated following a large number of airborne demonstrations (e.g., Garrison et al. 2002; Germain et al. 2004; Katzberg et al. 2001, 2006; Katzberg and Dunion 2009; Rodriguez-Alvarez et al. 2013), similar to the series of airborne scatterometer experiments that preceded satellite scatterometry.

For radar observations, the ocean surface scattering cross section is mostly determined by its roughness spectrum and, in particular, by the surface slope spectrum for GNSS-R. Smaller capillary waves are largely forced by the local winds, since their shorter wavelengths respond more quickly to the forcing and tend to dissipate rapidly as they propagate. Longer wavelengths (i.e., swell) can take significantly longer to develop and can propagate hundreds to thousands of kilometers and so are forced by a variety of environmental factors, both local and distant, depending on the past history of the wind field and other forcing parameters. The off-nadir measurement geometry used by scatterometers is such that the magnitude of the backscatter cross section is dominated by Bragg resonance with the capillary wave portion of the surface roughness spectrum. In the case of GNSS-R, on the other hand, the forward scattering geometry does not support Bragg resonance and the corresponding scattering cross section will be sensitive to a wider range of the roughness spectrum and to surface waves with a wavelength of 50 cm or greater (Chen-Zhang et al. 2016). As a result, there is greater sensitivity of GNSS-R measurements to ocean swell or other, longer, wavelengths that are not directly forced by the local winds and that are often characterized through the significant wave height (SWH) (Germain et al. 2004; Clarizia et al. 2009; Marchan-Hernandez et al. 2010; Zavorotny et al. 2014). This sensitivity can be problematic for the retrieval of wind speed, since a component of the variance in the measurements may now be explained by other, nonlocal, influences.

Any correction to a GNSS-R wind speed retrieval algorithm for the influence of SWH on the measurements should take into account the fact that the local winds also have an effect on the SWH, and that the correlation between wind speed and SWH is variable (generally weaker at low wind speeds and stronger at high wind speeds). One possible way to account for this variable correlation is by use of a two-dimensional geophysical model function (GMF) to express the quantity representing the GNSS-R scattering as a 2D function of both wind speed and SWH. A robust 2D GMF determination would however require a significantly larger volume of data, which is not yet available with the *Technology Demonstration Satellite-1* [*TechDemoSat-1* (*TDS-1*)].

A Bayesian estimation represents another means of accounting for the variable correlation between wind speed and SWH, and it is more easily applicable in this case, since the joint probability of occurrence of wind speed and SWH can be adequately characterized through the large existing archives of datasets from models. The use of Bayesian approaches for remote sensing retrieval algorithms is well known (e.g., Pierdicca 2002; Stoffelen and Portabella 2006). Here we present the mathematical framework, implementation, and results of a Bayesian wind speed estimator for GNSS-R. The algorithm is applied to data from the *TDS-1* (Jales and Unwin 2015a,b). The paper is organized as follows. Section 2 motivates the need for this algorithm by describing the dependence of the wind speed retrieval error on SWH for a conventional retrieval algorithm. Section 3 presents the mathematical framework of the Bayesian method. Section 4 describes the algorithm’s implementation and its application to *TDS-1* data, and characterizes its improvements relative to the conventional approach. Section 5 summarizes and concludes the paper.

## 2. Dependence of wind speed retrieval error on significant wave height

GNSS-R data are typically provided in the form of delay Doppler maps (DDMs) of scattered GNSS power as a function of time delay and Doppler shift (Germain et al. 2004; Gleason et al. 2005, 2006; Clarizia et al. 2009; Clarizia 2012). DDMs are used to extract observables, and to estimate wind and waves from them. One of the observables most correlated with sea surface parameters is the delay Doppler map average (DDMA), which represents the average scattering cross section across a specific area of the glistening zone (Clarizia et al. 2014; Gleason et al. 2016; Clarizia and Ruf 2016). In the analysis that follows, we use a modified observable referred to as the DDMA-to-noise ratio (DNR). The DNR is calculated from each DDM by

where *N* is the noise floor, and both DDMA and *N* are computed as described in section II.A of Clarizia and Ruf (2016), with the DDMA derived from DDMs of the radar cross section (RCS) and scattering area (Gleason et al. 2016). Normalization of the scattering cross section by the noise floor serves to cancel out most receiver gain fluctuations, which can be significant with *TDS-1* measurements (Foti et al. 2015).

The dependence of the DNR observable on SWH is depicted in Fig. 1 at low wind speeds. Each curve represents a different wind speed interval.

The figure illustrates the fact that after controlling for wind speed, there remain significant variations in the measurements that are explained by SWH. If left uncorrected by a wind speed retrieval algorithm, this will produce errors in the wind speed, with higher SWH values tending to bias the retrievals toward erroneously high wind speed values. The sensitivity to SWH can be seen to decrease with increasing wind speed.

The sensitivity of a wind speed retrieval algorithm to SWH can be characterized by considering the dependence of its error on SWH. Specifically, a population of retrieved winds is compared to ground truth values of wind speed, their difference is regressed against coincident ground truth values of SWH, and the slope of a linear regression is a measure of the dependence. A large slope indicates that SWH has a strong impact on wind retrieval error. A slope of zero indicates that SWH has no impact on the error. This characterization was performed using a conventional retrieval algorithm applied to the *TDS-1* measurements. This algorithm is explained in detail later on in section 3b (and is illustrated in Fig. 4). The approach is the same as the algorithms in Foti et al. (2015) and Clarizia and Ruf (2016). The resulting slope of the linear regression is shown in Fig. 2 versus wind speed. The slope is largest at low wind speeds and tends toward zero at higher values (scatterplots of the populations of wind speed error and SWH, from which the slopes were derived, are shown in Fig. 6).

## 3. Bayesian estimator

Here we illustrate a Bayesian wind retrieval approach that suppresses the dependence of the retrieval error on SWH at low wind speeds. We first outline its mathematical derivation, followed by a description of the a priori statistics used in the algorithm, and finally the determination of two parameters—a scale parameter and a correlation parameter—that are necessary to implement the algorithm.

### a. Derivation of Bayesian estimator

Given the joint random variables *x* and *y*, Bayes’s theorem states that

where *p*(*x | y*) is the conditional probability of *x* given a particular value of *y*; *p*(*y | x*) is the conditional probability of *y* given a particular value of *x*; *p*(*x*) is the probability of *x*, independent of the value of *y*; and *p*(*y*) is the probability of *y*, independent of the value of *x*. For many remote sensing problems, Bayesian estimators treat *x* as the geophysical state to be estimated; *y* as the remote sensing measurement; the expected value of *p*(*y | x*) as the error-free measurement of *y* predicted by a forward model; the random component of *p*(*y | x*) as the measurement noise; and *p*(*x*) as the prior distribution of *x*, independent of the measurement (e.g., due to the climatology of that geophysical variable). In practice, *p*(*y*) is often not known or needed because only the numerator on the right-hand side of Eq. (2) depends on *x*, and the denominator serves only to normalize the right-hand side when viewed as a probability density function (pdf) in *x*.

For GNSS-R remote sensing of ocean surface scattering, a measurement observable *o* is used to estimate the wind speed *u*. In addition, information about the sea state—specifically the SWH—is known from model predictions prior to the wind speed estimate. In this case, Bayes’s theorem can be written as

where *s =* SWH and the proportionality is indicated because the denominator in Eq. (2) is not expressed. The expression on the left-hand side in Eq. (3) can be viewed as a posterior pdf for *u* given a measurement of *o* and prior knowledge of SWH. An estimate of the wind speed can be made in one of two ways: 1) The maximum likelihood (ML) estimate is that value of *u*, which maximizes the posterior pdf of *u*; or 2) the expected value (EV) estimate is the mean value of the posterior pdf of *u*. The ML wind speed estimated from (*s*, *o*) is given by

The EV wind speed estimated from (*s*, *o*) is given by

If the observable and SWH are modeled as joint Gaussian distributed random variables, then the conditional pdf on the right-hand side of Eq. (3) can be separated into three components: one due to *s*, one due to *o*, and one representing the correlation between variations in *s* and *o*. This decomposition has the form

where and are the mean values; and are the standard deviations of *s* and *o*, respectively; and *ρ* is the correlation between their variations. Although this decomposition would be strictly valid only for joint Gaussian variables, we use here an approximation by symbolically writing it as

and by allowing the conditional sea-state-only pdf *p*(*s | u*) not to be necessarily Gaussian. Thus, we estimate *p*(*s | u*) empirically from a large population of joint occurrences of SWH and wind speed, which is illustrated in detail in section 3b. Note that it is, in general, not Gaussian distributed, so the Gaussian form assumed in Eq. (6) is a simplification. The conditional observable-only pdf *p*(*o | u*) can be modeled assuming a known geophysical model function *g*(*u*) that relates the observable to the wind speed [i.e., = *g*(*u*)], together with an error model for the measurement noise. If the error in the observable is assumed to be additive Gaussian noise, then the pdf can be written as

where *σ*_{o} is the standard deviation of the measurement noise.

The coupling term in Eq. (6), which results from the correlation between variations in *s* and *o*, can be rewritten to explicitly emphasize its dependence on wind speed, as

where the mean and standard deviation of *s* given *u* are found by

and

respectively. The prior wind speed pdf *p*(*u*) is known to be well approximated globally by a Rayleigh distribution with a mean value of ~7 m s^{−1}. Using this Rayleigh distribution with the Bayesian estimator is one option, but doing so will tend to push the estimate toward the mean value. Another option is to use the so-called improper prior and assume *p*(*u*) to be a uniform distribution. This allows the estimator to rely more on the measured observable and less on the prior behavior of the wind speed distribution. It is, for example, a more reasonable prior to assume for retrieval in anomalously high wind speed conditions such as hurricanes.

The posterior pdf of *u* can be finally written as

### b. Characterization of conditional statistic p*(*s | u*)*

Implementation of the Bayesian estimator requires:

estimation of

*p*(*s | u*), and the mean and standard deviation of*s*given*u*;determination of

*g*(*u*);estimation of .

The conditional pdf *p*(*s | u*) is estimated using a large historical archive of SWH and wind speed values, output from the WAVEWATCH III (WW3) model (Tolman 2009). The WW3 model outputs used in this study were provided by the French Research Institute for Exploitation of the Sea [Institut Français de Recherche pour l’Exploitation de la Mer (IFREMER)], and are generated for the entire year 2007 using the Climate System Forecast Reanalysis (CSFR) input wind forcing. The dataset is freely available on the IFREMER FTP site (ftp://ftp.ifremer.fr). The population is large enough to allow for an empirical derivation of *p*(*s | u*), using wind speed and SWH steps of 0.1 m s^{−1} and 0.1 m, respectively. Examples of the empirical *p*(*s | u*) are illustrated in Fig. 3 for different wind speeds. The departure of these pdfs from Gaussianity is particularly evident at low wind speeds. At very low winds, the pdfs have a bimodal behavior, suggesting a low SWH component due to the local wind and a higher SWH component due to sustained waves that have reached the area from elsewhere and were not generated by the local wind.

The mean and standard deviation are computed using SWH values obtained from WW3 that are collocated with the *TDS-1* measurements.

Determination of the GMF is carried out using the *TDS-1* spaceborne GNSS-R dataset, version 0.3, acquired over the ocean, in the period ranging from September 2014 to March 2015 (Jales and Unwin 2015a,b). The same dataset has been used for scatterometric purposes in Foti et al. (2015) and for altimetric purposes in Clarizia and Ruf (2016). The dataset is in the form of the DDM of scattered power, accompanied by metadata that contain the time, location, geometry of the acquisition, and the gain of the receiver antenna at the specular point. A detailed description of the dataset is contained in Clarizia and Ruf (2016). A quality filter is applied to the data by selecting only samples acquired with an antenna gain at the specular point that is higher than 12 dB*i*, and with an SNR higher than 1, such that the signal power is at least as strong as the noise power. These thresholds are a trade-off between the need to retain enough samples for the analysis and the desire to discard noisy data.

ASCAT-retrieved winds collocated with the GNSS-R acquisitions are used in this study as ground truth winds (O&SI SAF Project Team 2016). Following the approach outlined in Clarizia and Ruf (2016), the dataset is randomly divided into training and test halves. The training dataset is used to derive a GMF, *g*(*u*), as an analytical function that best fits the observable as a function of ground truth wind speed. A GMF of the form

where *o* is the DNR observable, *u* is the wind speed, and *a*, *b*, and *c* are best-fit model parameters, has been found to provide a satisfactory fit. Figure 4a shows a density plot in logarithmic scale of the DNR data versus collocated wind speed, with the GMF in magenta, while Fig. 4b shows a density plot of true versus retrieved winds. A fairly large number of samples sits in the bottom-left corner of Fig. 4a, characterized by low observable values and low wind speed values. The anomalous behavior of these samples is not due to noise in the data, since a noise filter has been applied; hence, it is probably caused by a roughness higher than that indicated by the collocated wind speed, due to the presence of waves or swell. The retrieved wind for these samples is considerably higher than the true wind, and this is visible in the scatterplot of Fig. 4b.

Estimation of σ_{0} is done with a large population of simulated samples, generated by the Cyclone GNSS (CYGNSS) end-to-end simulator (E2ES) using input winds from the nature run model (O’Brien 2014; Ruf et al. 2016). These are the same samples used to perform the analysis in Clarizia and Ruf (2016). We choose not to use *TDS-1* data for this estimation because the *TDS-1* population size is not sufficient to obtain a robust statistical characterization. The E2ES-derived model is scaled to represent the standard deviation assumed for the *TDS-1* measurements, as follows:

where *σ*_{0} is the standard deviation of the *TDS-1* DNR, *σ*_{e} is the standard deviation of the DDMA derived from nature run data, and it is understood that both of the standard deviations depend on the wind speed *u* and on the gain of the receive antenna *G* at the specular point. The term *α* is a scale parameter accounting for differences between *TDS-1* and E2ES noise levels.

### c. Determination of scale and correlation parameter

The scale parameter *α* and the correlation parameter *ρ* in Eq. (5) are needed in order to implement the Bayesian estimator. We estimate their values by minimizing figures of merit (FoM), which represent improvements we want to achieve by adopting the Bayesian estimator. Specifically, we define an FoM1 for the slopes and an FoM2 for the standard deviation as

where *q*^{i} is the slope of the linear function that best fits the retrieval errors versus the SWH, for the *i*th wind interval. The slope *q*^{i} can be expressed as

where *L*_{i} is the number of samples within the *i*th wind interval, is the *l*th SWH value and is the *l*th retrieval error value, and the summations are performed over all the *L*_{i} samples of the *i*th wind interval.

Term is the standard deviation of the retrieval error for the *i*th wind interval, calculated as

where *m*_{i} is the mean retrieval error for the *i*th wind interval. FoM1 and FoM2 are computed for values of the scale parameter ranging from 0.5 to 10 with a step of 0.5, and for the correlation parameter ranging from −1 to 0 with a step of 0.1. Note that the correlation parameter should be negative, since an increase in the SWH *s* corresponds to a decrease in the DNR observable and vice versa. Figure 5 illustrates the dependence of FoM1 and FoM2 on *α* and *ρ*. A region of minima for both figures of merit can be identified, corresponding to 2 < *α* < 4 and −0.7 < *ρ* < −0.5. We select midpoint values of *α* = 3.5 and *ρ* = −0.6, bearing in mind that other values for *α* and *ρ* within the region of minimal values produce similar results. Note also that the selected value for *ρ* is comparable to the empirical correlation coefficient calculated between the collocated *TDS-1* SWH and the DNR observable, which was found to equal −0.4 for *TDS-1* data with receive gain higher than 12 dB*i* and SNR > 1.

## 4. Results and discussion

Both the maximum likelihood [Eq. (4)] and expected value [Eq. (5)] versions of the Bayesian estimator have been implemented, with *p*(*u | s*, *o*) computed using Eqs. (12)–(14), and *p*(*s | u*), , and derived as described in section 3b. Here we show results based on the EV approach, since it exhibited superior performance compared to the ML approach. The Bayesian EV-retrieved winds are compared with the winds retrieved by inverting the forward model expressed in Eq. (13), referred to as the GMF approach.

Initial results for the EV and GMF approaches at low wind speeds are shown in Fig. 6, which plots the retrieval error versus true wind speed. The performance of the GMF approach is degraded by the sensitivity of the observable to both wind speed and the component of SWH variability that is uncorrelated with wind speed. The dependence of the retrieval error on SWH is significantly reduced in the case of the Bayesian EV approach. Note that while the Bayesian approach is superior for winds below 5 m s^{−1}, a retrieval bias is present at higher winds, which increases with increasing true winds. One possible reason for the bias is that the Bayesian estimator tends to force the results toward the a priori statistics at all wind regimes and while this is desirable at low winds due to the wave contamination, the wave influence at high winds is much smaller and the wind speed is better determined by the GNSS-R observable only.

Figure 6 shows the retrieval error as a function of SWH over different wind intervals, for the GMF (left) and the Bayesian (right) approaches. A reduction in the scattering of the retrieval error and in the presence of outliers is evident in the Bayesian case for winds up to 6 m s^{−1}. For winds above 6 m s^{−1}, the Bayesian algorithm still shows a reduction in retrieval error scatter, but the retrieval bias noted earlier grows larger. Tables 1 and 2 quantitatively summarize the performances through the slope, the standard deviation FoMs, and the retrieval bias. Wind speed intervals above 9 m s^{−1} are not reported, since the population of samples is low and the SWH dependence is small.

The reduction of the retrieval error scattering in Fig. 6 is reflected in the standard deviation FoM, which reduces considerably for the Bayesian case, where the decrease ranges from 18% to 51%. A reduction in the slope FoM is also noticeable, implying a lower dependence of retrieval error on the SWH. Note that the slope FoM is usually negative, consistent with the tendency of the retrieval error to increase for increasing SWH. The bias in the Bayesian approach starts to increase for winds higher than 5 m s^{−1}, and it is always positive, evidence that the winds are underestimated with the Bayesian method. However, the SWH influence on the DNR observable tends to disappear for winds above 5 m s^{−1}, as highlighted in Fig. 1, suggesting that a Bayesian correction is not needed in this wind speed regime.

## 5. Conclusions

We have presented a Bayesian approach to the retrieval of wind speeds from GNSS-R measurements that exploits knowledge of the collocated SWH and uses a priori SWH statistics to compensate for the overestimation of the wind speed at low wind regimes, caused by the presence of long-wave swell. The mathematical framework of the Bayesian SWH correction builds upon 1) the generation of the empirical conditional sea-state-only pdf, using outputs from the WW3 model; 2) the determination of a good enough forward model that describes the observable as a function of wind speed; and 3) the determination of the standard deviation of the measurement noise. The algorithm is applied to spaceborne GNSS-R DDMs acquired onboard *TDS-1*, from which the DDMA-to-noise ratio is calculated and used as observable in the algorithm. The algorithm implementation required the determination of a scale parameter and a correlation parameter, which were optimized through the minimization of two figures of merit. The results show a significant reduction of the standard deviation of the error and of its dependence on SWH for the Bayesian approach, compared to using a conventional inversion of the forward model. These improvements are evident at low wind speeds, which is where the wave contamination is strongest and the retrieval error is higher. A significant retrieval bias is present in the Bayesian approach at high wind speeds, most likely because the prior SWH distribution unduly influences the estimator. However, a Bayesian correction is not needed in that wind speed regime. This suggests that a complete retrieval algorithm should consist of a combination of Bayesian estimator at low wind speeds and standard GMF approach at high wind speeds. The development of such an algorithm and the analysis of its capabilities and possible limitations will be the object of future work. It will be carried out using a much larger and well-calibrated dataset from CYGNSS, and with a much larger population of samples at high wind speeds that should be available from CYGNSS, as opposed to the limited dataset from *TDS-1*. The larger volume of data expected from CYGNSS should also support the implementation of other, alternative, methods to account for the SWH dependence (e.g., the 2D GMF approach mentioned in section 1) and to allow for a comparison of the results from these methods with the Bayesian wind speed estimator.

## Acknowledgments

The work presented was supported in part by NASA Science Mission Directorate Contract NNL13AQ00C.

## REFERENCES

*Remote Sensing of Atmosphere and Ocean from Space: Models, Instruments and Techniques*, F. S. Marzano and G. Visconti, Eds., Advances in Global Change Research, Vol. 13, Springer, 49–64, doi:.

## Footnotes

© 2017 American Meteorological Society. For information regarding reuse of this content and general copyright information, consult the AMS Copyright Policy (www.ametsoc.org/PUBSReuseLicenses).