Forecasters working for Australia’s Bureau of Meteorology (BoM) produce a 7-day forecast in two key steps: first they choose a model guidance dataset to base the forecast on, and then they use graphical software to manually edit these data. Two types of edits are commonly made to the wind fields, aiming to improve how the influences of boundary layer mixing and land–sea-breeze processes are represented in the forecast. In this study the diurnally varying component of the BoM’s official wind forecast is compared with that of station observations and unedited model guidance datasets. Coastal locations across Australia over June, July, and August 2018 are considered, with data aggregated over three spatial scales. The edited forecast produces a lower mean absolute error than model guidance at the coarsest spatial scale (over 50 000 km²), and achieves lower seasonal biases over all spatial scales. However, the edited forecast only reduces errors or biases at particular times and locations, and rarely produces lower errors or biases than all model guidance products simultaneously. To better understand physical reasons for biases in the mean diurnal wind cycles, modified ellipses are fitted to the seasonally averaged diurnal wind temporal hodographs. Biases in the official forecast diurnal cycle vary with location for multiple reasons, including biases in the directions that sea breezes approach coastlines, amplitude biases, and disagreement in the relative contribution of sea-breeze and boundary layer mixing processes to the mean diurnal cycle.
Modern weather forecasts are typically produced by models in conjunction with human forecasters. Operational forecasters working for the Australian Bureau of Meteorology (BoM) undertake two key steps to construct a 7-day forecast.
First, they choose a model guidance dataset on which to base the official forecast. Datasets from both the BoM and international modeling centers are available to Australian forecasters, with the BoM’s Operational Consensus Forecast (OCF) an increasingly common choice. In the second step, the forecaster uses the Graphical Forecast Editor (GFE; NOAA 2020) to manually edit the model guidance data. Such edits aim to incorporate processes that are underresolved at the resolutions of the model guidance products, or to correct for perceived biases of the model guidance being used. Forecasters working for the United States National Weather Service also use the GFE, and follow a similar approach.
Australian forecasters regularly make two types of edits to the surface wind fields. The first involves modifying the surface winds after sunrise at locations where the forecaster believes the model guidance is providing a poor representation of boundary layer mixing processes. Boundary layer mixing occurs as the land surface heats up, producing an unstable boundary layer which transports momentum downward to the surface layer. Before this mixing occurs, winds are typically both weaker and ageostrophically oriented due to surface friction (Lee 2018), and so mixing can affect both the speed and direction of the surface winds. Australian forecasters perform boundary layer mixing edits using a GFE tool which allows them to specify a region over which to apply the edit, a height z and a percentage p, with the tool then calculating a weighted average of the surface winds and winds at z, weighted by p.
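The blending step described above can be sketched in a few lines. This is a minimal illustration of a percentage-weighted average based on the description in the text; the function name `mixing_edit` and the linear weighting are assumptions, not the GFE tool’s actual implementation:

```python
def mixing_edit(u_sfc, v_sfc, u_z, v_z, p):
    """Blend surface wind components with components at height z,
    weighted by a percentage p, as in a boundary layer mixing edit.
    Illustrative only; the operational tool's interface is not public."""
    w = p / 100.0
    u_new = (1.0 - w) * u_sfc + w * u_z
    v_new = (1.0 - w) * v_sfc + w * v_z
    return u_new, v_new

# A 30% weighting toward the stronger winds aloft draws the surface
# wind part way toward the wind at height z.
u, v = mixing_edit(2.0, 0.0, 10.0, 5.0, 30.0)  # (4.4, 1.5)
```

Because the blend acts on both components, it can change both the speed and the direction of the surface wind, consistent with the mixing effects described above.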
The second type of edit involves changing the afternoon and evening surface winds around those coastlines where the forecaster believes the model guidance is resolving the sea breeze poorly. As with boundary layer mixing edits, these edits are performed using a GFE tool: forecasters trace out the relevant coastline graphically and choose a wind speed and a time, with the tool then smoothly blending in winds of the given speed perpendicular to the traced coastline at the given time. In Australia, the official gridded forecast datasets resulting from a forecaster’s choice of model guidance and subsequent edits are then provided to the public through the BoM’s online MetEye data browser (Bureau of Meteorology 2019b), and are also translated into text and icon forecasts algorithmically.
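The sea-breeze edit can be sketched similarly. The exact blending used by the GFE is not described here, so the Gaussian-in-time weighting, the sign convention for the onshore direction, and the function name below are all assumptions made for illustration:

```python
import math

def sea_breeze_edit(u, v, coast_angle_deg, speed, t, t_edit, half_width=2.0):
    """Blend a wind of the given speed, directed perpendicular to a straight
    coastline segment, into the existing (u, v) wind, with a Gaussian weight
    in time centred on the edit hour t_edit. A sketch, not the GFE algorithm."""
    theta = math.radians(coast_angle_deg)   # coastline orientation
    u_sb = -speed * math.sin(theta)         # coast-perpendicular (onshore)
    v_sb = speed * math.cos(theta)          # components; sign is assumed
    w = math.exp(-0.5 * ((t - t_edit) / half_width) ** 2)
    return (1 - w) * u + w * u_sb, (1 - w) * v + w * v_sb
```

At the chosen edit hour the blended wind is entirely the coast-perpendicular sea-breeze wind, relaxing smoothly back to the unedited wind at earlier and later hours.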
Forecasters, and the weather services that employ them, have good reasons for ensuring the diurnally varying component of their wind forecasts is as accurate as possible. In addition to the significant contribution diurnal wind cycles can make to overall wind fields (e.g., Dai and Deser 1999), diurnal wind cycles are important for the ventilation of pollution, with sea breezes transporting clean maritime air inland, where it helps flush polluted air out of the boundary layer (Miller et al. 2003; Physick and Abbs 1992). Furthermore, diurnal wind cycles affect the function of wind turbines (Englberger and Dörnbrack 2018) and the design of wind farms (Abkar et al. 2016), as daily patterns of boundary layer stability affect turbine wake turbulence, and the losses in wind power that result.
To my knowledge, no published work has assessed the diurnal component of human edited wind forecasts, although previous studies have assessed the performance of different operational models at specific locations. Svensson et al. (2011) examined thirty different operational model simulations, including models from most major forecasting centers utilizing the most commonly used boundary layer parameterization schemes, and compared their performance with a large eddy simulation (LES) and observations in Kansas, United States, during October 1999. They found that both the models and the LES failed to capture the roughly 6 kt (1 kt ≈ 0.514 m s−1) jump in wind speeds shortly after sunrise, and underestimated morning low-level turbulence and wind speeds.
Other studies have assessed near-surface wind forecasts, verifying the total wind speeds, not just the diurnal component. Pinson and Hagedorn (2012) studied the 10-m wind speeds from the European Centre for Medium-Range Weather Forecasts (ECMWF) operational model ensemble across western Europe over December, January, and February 2008/09. They found that the worst performing regions were coastal and mountainous areas, and attributed this to the small-scale processes (e.g., sea and mountain breezes) that are underresolved by the ensemble’s coarse 50-km spatial resolution.
The present study has two goals. First, to describe a method for comparing the diurnal wind signals of human edited forecasts to those of unedited model guidance forecasts, in order to assess where and when human choice of model guidance and edits produce a reduction in error or bias. Second, to apply this methodology across Australian coastal locations. The remainder of this paper is organized as follows. Section 2 describes the methodology, and datasets to which it is applied, section 3 provides results, and sections 4 and 5 provide a synthesis and conclusion, respectively.
2. Data and methods
This study compares the human edited official BoM wind forecast and unedited model guidance forecasts with automatic weather station (AWS) data across Australia. The comparison is performed by first isolating the diurnal perturbations of each dataset by subtracting 24-h running means, then comparing these perturbations on an hour-by-hour basis.
a. Datasets

Five datasets are considered in this study (Bureau of Meteorology 2019a): the human edited official BoM wind forecast data that are issued to the public; observational data from AWS across Australia; unedited data from the ECMWF’s high-resolution 10-day forecast model (HRES); unedited data from the operational Australian Community Climate and Earth-System Simulator (ACCESS) regional model; and gridded OCF data, which blends output from multiple operational models. HRES, ACCESS, and OCF are three of the model guidance products commonly used by Australian forecasters for winds. Only the lead-day-1 forecasts of the official forecast, HRES, ACCESS, and OCF are considered, for reasons discussed below.
This study primarily considers the austral winter months of June, July, and August 2018. This short time period was chosen to reduce the effect of changing seasonal and climatic conditions, changing forecasting practice and staff, and of changes to the ACCESS and HRES models and OCF algorithms. Results for December, January, and February 2017/18 are occasionally mentioned to strengthen conclusions or provide a seasonal contrast.
ACCESS is a nested model: in this study only the ACCESS-R component is considered, which covers the Australian region 65.0°S–16.95°N, 65.0°–184.57°E. This model runs at a 0.11° (≈12 km) horizontal grid spacing, with a standard time step of 5 min: occasionally a shorter time step of 2.5 min is used to overcome numerical instabilities (Bureau of Meteorology 2016). HRES runs at an ≈9-km horizontal grid spacing, with a 7.5-min time step (Modigliani and Maass 2017).
Both ACCESS and HRES use parameterization schemes to simulate subgrid-scale boundary layer turbulence, and the resultant mixing. ACCESS uses the schemes of Lock et al. (2000) and Louis (1979) for unstable and stable boundary layers, respectively (Bureau of Meteorology 2010). HRES uses similar schemes that ECMWF develops in-house (ECMWF 2018).
The BoM’s gridded OCF is based on the work of Woodcock and Engel (2005) and Engel and Ebert (2007). OCF first corrects biases in model data, then forms a weighted average of an ensemble of models in a way that minimizes error with recent observations. The methodology was expanded by the BoM in order to produce gridded datasets that could be used by forecasters within the GFE, with 10-m horizontal winds added in June 2012 (Bureau of Meteorology 2005, 2008, 2012). For the time period of this study, the OCF ensemble comprised the ACCESS and HRES datasets described above, and five other model datasets (Bureau of Meteorology 2018).
To form a consensus wind forecast, OCF works with wind speed and direction, as taking averages of u and υ wind components can suppress wind speeds (Glahn and Lowry 1972), and this is viewed as undesirable in an operational context. Speeds are calculated from each ensemble member, bias corrected, then a weighted average calculated, with weights chosen based on the performance of each member over the previous 20 days. Consensus wind direction is chosen as the circular median wind direction from the members (Bureau of Meteorology 2012). Because data from some members are only provided to the BoM at 3-hourly time intervals, temporal interpolation and postprocessing are applied to produce an hourly OCF dataset that forecasters can use in GFE (Bureau of Meteorology 2008). Gridded OCF is an objective alternative to the forecaster’s subjective choice of model guidance. When the overall wind field is assessed at 6-hourly intervals, gridded OCF produces lower errors in both wind speed and direction than all the model guidance products that comprise it (Bureau of Meteorology 2012).
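The two consensus steps can be illustrated with a toy example. The bias correction and 20-day performance weighting are omitted here, and the function below is a sketch of the idea rather than the BoM’s algorithm; the circular median is taken as the member direction minimizing the summed angular distance to the others, which is one common definition:

```python
import numpy as np

def consensus_wind(speeds, weights, directions_deg):
    """Weighted average of (already bias-corrected) member wind speeds,
    plus a circular median of member wind directions. Illustrative only."""
    speed = float(np.average(speeds, weights=weights))
    dirs = np.deg2rad(np.asarray(directions_deg, dtype=float))
    def total_angular_dist(d):
        # Wrap differences into (-pi, pi] before summing their magnitudes
        return np.abs(np.angle(np.exp(1j * (dirs - d)))).sum()
    best = min(dirs, key=total_angular_dist)
    return speed, float(np.rad2deg(best) % 360.0)

# Three members straddling north: the circular median handles the wrap
speed, direction = consensus_wind([10.0, 20.0, 30.0], [1.0, 1.0, 2.0],
                                  [350.0, 10.0, 20.0])  # (22.5, 10.0)
```

Working with speed and direction separately, rather than averaging u and υ, avoids the speed suppression noted above, at the cost of needing circular statistics for direction.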
The Bureau’s official forecast dataset is produced on a state-by-state basis at forecasting centers located in most state capitals. To construct the official forecast dataset, forecasters make a choice of model guidance in the GFE, which then downscales the model data, or in the case of high-resolution mesoscale model guidance, upscales the model data, onto a standard 3-km spatial grid for Victoria and Tasmania, or a 6-km grid for the rest of the country. GFE displays model data at hourly intervals by taking the model guidance output at each hour UTC. An exception is the HRES model data, which is only provided to the BoM at 3-hourly intervals, and is therefore linearly interpolated to hourly intervals by the GFE. Forecasters then make edits to these 3- or 6-km hourly grids to produce the official forecast datasets.
The official forecast and model guidance datasets are therefore compared as they appear in the GFE (i.e., the upscaled or downscaled datasets on the standardized 3- or 6-km hourly grids are compared). This both ensures a consistent comparison between model guidance products of different spatial resolutions, and an assessment of how the official forecast compares to the model guidance products as they actually appear to forecasters in the GFE. This is the standard approach the BoM takes when comparing the performance of the official forecast to unedited model guidance (e.g., Griffiths et al. 2017).
These datasets are compared with observations from Australian automatic weather stations (AWS), which typically record wind speed and direction each minute. After basic quality control, 10-min averages of speed and direction are taken at each station at each hour UTC, usually over the 10 min leading up to each hour. To calculate verification results, each station is matched with the nearest 3- or 6-km grid point in the datasets described above.
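Matching each station to its nearest grid point can be done with a simple nearest-neighbour search; the flat-earth distance with a cos(latitude) scaling below is an adequate approximation at these grid spacings, and the function is illustrative rather than the BoM’s matching code:

```python
import numpy as np

def nearest_grid_index(lat_s, lon_s, grid_lats, grid_lons):
    """Index of the grid point nearest a station, using a flat-earth
    approximation with longitude differences scaled by cos(latitude)."""
    glat = np.asarray(grid_lats, dtype=float)
    glon = np.asarray(grid_lons, dtype=float)
    dlat = glat - lat_s
    dlon = (glon - lon_s) * np.cos(np.deg2rad(lat_s))
    return int(np.argmin(dlat ** 2 + dlon ** 2))
```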
b. Assessing diurnal variability
Forecasters edit model guidance wind data to account for underresolved sea-breeze and boundary layer mixing processes. Instead of attempting to assess each type of edit individually, the overall diurnal signal is examined by subtracting a 24-h centered running mean background wind from each zonal and meridional hourly wind data point, to create wind perturbation datasets. Because records are not kept of which model guidance product was used for the official forecast on a given day, nor of what kinds of edits were performed, the official forecast is compared on a pairwise basis with three unedited model guidance datasets commonly used by Australian forecasters for winds: ACCESS, HRES, and OCF.
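Isolating the diurnal perturbations amounts to subtracting a 24-h centred running mean from each hourly component. A minimal sketch follows; since a 24-point window on hourly data cannot be perfectly centred, the window here spans 12 h before to 11 h after each point, an implementation detail assumed rather than taken from the text:

```python
import numpy as np

def diurnal_perturbations(winds):
    """Subtract a 24-h (approximately centred) running mean from a 1-D
    sequence of hourly wind-component values. Hours without a complete
    window are returned as NaN."""
    u = np.asarray(winds, dtype=float)
    n = len(u)
    pert = np.full(n, np.nan)
    for i in range(12, n - 11):
        pert[i] = u[i] - u[i - 12:i + 12].mean()  # 24 hourly values
    return pert
```

A pure 24-h harmonic passes through this filter unchanged, since its mean over any full period is zero, while slowly varying background winds are removed.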
The first metric considered is the difference of absolute errors (DAE) in the perturbations, with Fig. 1 illustrating how DAE is calculated. To compare errors in the diurnal signals of the official forecast and model guidance, we calculate the Euclidean distances between the official or model guidance perturbation vectors and the corresponding AWS perturbation vectors at each hour UTC, and take their difference, viewing the Euclidean distance as a measure of absolute error.
For example, to assess whether the official forecast perturbations uO or model guidance perturbations uM produce lower absolute errors when compared with the observed AWS perturbations uAWS, we calculate

DAE = ‖uM − uAWS‖ − ‖uO − uAWS‖. (1)
We then calculate statistics from the DAE values on an hourly basis; in particular, we calculate the arithmetic mean of all the 0000 UTC DAE values, denoting such an average by ⟨DAE⟩, and repeat this for each hour of the day. If ⟨DAE⟩ > 0 at a particular hour, then the official forecast perturbations at that hour are, on average, closer to the observed perturbations than model guidance, and vice versa if ⟨DAE⟩ < 0.
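Given perturbation components for the official forecast, model guidance, and AWS observations at one hour, DAE follows directly. A sketch with the sign convention from the text, so that positive values favour the official forecast:

```python
import numpy as np

def dae(u_o, v_o, u_m, v_m, u_obs, v_obs):
    """Difference of absolute errors: the Euclidean error of the model
    guidance perturbation minus that of the official forecast perturbation,
    so positive values mean the official forecast is closer to observations.
    Accepts scalars or same-shaped NumPy arrays."""
    err_m = np.hypot(u_m - u_obs, v_m - v_obs)  # model guidance error
    err_o = np.hypot(u_o - u_obs, v_o - v_obs)  # official forecast error
    return err_m - err_o
```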
Diurnal processes like the sea-breeze and boundary layer mixing depend on the background atmospheric conditions in which they occur. By comparing wind perturbations rather than the overall wind fields we are not claiming these background conditions are irrelevant to these processes. However, when a forecaster makes an edit of a wind forecast to better resolve these processes, they are implicitly assuming that future background conditions will be close enough to the preceding 24-h mean state, or to model predictions of the mean state, to justify making the edit. Thus, it makes sense to compare forecast perturbations to observed perturbations, as long as differences are interpreted as a consequence not only of how the forecaster or model resolves diurnal processes, but of how differences in the background state contribute to differences in the perturbations. To minimize the importance of background state differences, this study focuses exclusively on lead-day-1 forecasts.
Given the large degree of random variability in the AWS, official forecast, and model guidance datasets, care must be taken to avoid prematurely concluding that the official forecast has outperformed model guidance when any difference is due purely to chance. The method for estimating confidence in the hourly mean DAE is based on a method proposed by Griffiths et al. (2017). Time series formed from the DAE values at a particular time, say 0000 UTC, across the 3-month time period, are treated as an independent sample of a random variable E. The sampling distribution of each hourly mean DAE can be modeled by a Student’s t distribution, and from this we calculate the probability that E is positive, denoted Pr(E > 0).
Although temporal autocorrelations of DAE (i.e., correlations between DAE values at a particular hour from one day to the next), are in practice small or nonexistent, they are still accounted for by reducing the “effective” sample size to n(1 − ρ1)/(1 + ρ1), where n is the actual sample size and ρ1 is the lag-1 autocorrelation (Zwiers and von Storch 1995; Wilks 2011). In the language of statistical hypothesis testing, the null hypothesis that E = 0 would be rejected at significance level α if Pr(E > 0) > 1 − (α/2) or Pr(E < 0) > 1 − (α/2). However, in this study we simply state the value of Pr(E > 0), referring to this as a confidence score, and noting Pr(E < 0) = 1 − Pr(E > 0). We say the official forecast outperforms model guidance with “high confidence” if Pr(E > 0) ≥ 95%, or that model guidance outperforms the official forecast with “high confidence” if Pr(E > 0) ≤ 5%, with high confidence implicit whenever it is not explicitly mentioned.
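The confidence score can be sketched as follows, assuming SciPy is available. The Student’s t model and the effective-sample-size reduction follow the text, while the guard ensuring at least two effective samples is an added assumption:

```python
import numpy as np
from scipy import stats

def confidence_score(daily_dae):
    """Pr(E > 0) for a sample of daily DAE values at one hour: model the
    sampling distribution of the mean with a Student's t distribution,
    shrinking the sample size by the lag-1 autocorrelation rho1 via
    n_eff = n(1 - rho1)/(1 + rho1). A sketch of the approach."""
    e = np.asarray(daily_dae, dtype=float)
    n = len(e)
    rho1 = np.corrcoef(e[:-1], e[1:])[0, 1]        # lag-1 autocorrelation
    n_eff = max(2.0, n * (1 - rho1) / (1 + rho1))  # effective sample size
    t_stat = e.mean() / (e.std(ddof=1) / np.sqrt(n_eff))
    return float(stats.t.cdf(t_stat, df=n_eff - 1))
```

Scores near 1 indicate the official forecast outperforms model guidance at that hour with high confidence; scores near 0 indicate the reverse.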
Following the “fuzzy verification” approach outlined by Ebert (2008), forecast and observational perturbation datasets are compared not only at individual stations, but are also averaged over two coarser spatial scales before being compared. The individual stations we consider are the seven capital city airport stations, marked by stars in Fig. 2, as their high operational significance means that they are typically the best maintained. An intermediate spatial scale is formed by averaging perturbation data over the 10 stations closest to each capital city airport station, with some flexibility allowed to ensure stations are roughly parallel to the nearest coastline. These station groups are referred to as the city station groups. The coarsest spatial scale is formed by averaging over all stations within 100 km of the nearest coastline, and grouping these by state. The Western Australian coastline (see Fig. 2) is subdivided into three pieces, and stations along the Gulf of Carpentaria, north Queensland Peninsula, and Tasmanian coastlines are neglected, in order to ensure each station group corresponds to an approximately linear segment of coastline to better resolve the land–sea breeze after spatial averaging (e.g., Vincent and Lane 2016). These eight station groups are referred to as the coastal station groups.
To compare errors in the perturbations over the two coarser spatial scales, we modify the definition of DAE in Eq. (1) so that each perturbation dataset is first spatially averaged over either the city or coastal station groups. Confidence scores are calculated for the city and coastal station groups in the same way as for the individual airport stations, treating the spatially averaged data as a single time series. This provides a conservative way to deal with spatial correlation between the stations in each group (Griffiths et al. 2017).
To compare biases in the diurnal cycles of each dataset, we calculate the difference of biases (DB):

DB = ‖u̅M − u̅AWS‖ − ‖u̅O − u̅AWS‖, (2)
where the overbars denote temporal averages of the perturbations at a particular hour, over June, July, and August 2018. These temporally averaged perturbations can be viewed as the mean diurnal wind cycle over the 3-month study period for each dataset. Biases over the city and coastal station groups are calculated by taking the spatial average before the temporal average. Uncertainty in the DB is estimated through bootstrapping (Efron 1979). This is done by performing resampling with replacement on the underlying perturbation datasets, and calculating the DB 1000 times using these resampled datasets. This provides a distribution of DB values, which analogously to with DAE, we treat as a sample from a random variable B, and use this to estimate Pr(B > 0).
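The bootstrap for DB can be sketched as below; resampling whole days with replacement is the assumption here, and the sign convention, with positive values favouring the official forecast, mirrors DAE:

```python
import numpy as np

def db_bootstrap(u_o, v_o, u_m, v_m, u_obs, v_obs, n_boot=1000, seed=0):
    """Difference of biases at one hour, with bootstrap resampling over
    days; returns the DB estimate and Pr(B > 0). Inputs are per-day
    perturbation components at this hour. A sketch of the method."""
    u_o, v_o, u_m, v_m, u_obs, v_obs = (
        np.asarray(a, dtype=float) for a in (u_o, v_o, u_m, v_m, u_obs, v_obs))
    rng = np.random.default_rng(seed)

    def db(idx):
        # Bias = magnitude of the difference of mean perturbation vectors
        bias_m = np.hypot(u_m[idx].mean() - u_obs[idx].mean(),
                          v_m[idx].mean() - v_obs[idx].mean())
        bias_o = np.hypot(u_o[idx].mean() - u_obs[idx].mean(),
                          v_o[idx].mean() - v_obs[idx].mean())
        return bias_m - bias_o

    n = len(u_obs)
    full = db(np.arange(n))
    samples = np.array([db(rng.integers(0, n, size=n)) for _ in range(n_boot)])
    return float(full), float(np.mean(samples > 0))
```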
Note that on a given day, at a given location, wind perturbations do not necessarily reflect genuinely diurnal processes. There is a large degree of random turbulence in AWS wind observations, and convective cold pools or synoptic fronts can produce rapid changes in background winds that induce large perturbations. However, averaging multiple perturbations at a given hour over many days cancels out much of the variability not associated with diurnal processes. When this is repeated for each hour of the day, the signal that remains reflects the mean diurnal cycle (e.g., Figs. 10 and 11). Similar ideas apply to the DAE metric. Note that spatially averaging perturbations accomplishes a similar thing to temporal averaging, helping to cancel out random variability. These ideas can be explored with synthetic data, and some preliminary work to this end is available online (Short 2020).
Another approach to forecast verification is to assess structural features of the phenomena being forecast rather than errors or biases of point predictions; this approach is particularly important at small spatiotemporal scales (e.g., Mass et al. 2002; Rife and Davis 2005). Gille et al. (2005) obtained summary statistics on the observed structure of mean diurnal wind cycles by using linear regression to calculate the coefficients ui, υi, where i = 0, 1, 2, for the fits:

u(t) = u0 + u1 cos(ωt) + u2 sin(ωt), (3)

υ(t) = υ0 + υ1 cos(ωt) + υ2 sin(ωt), (4)
where ω is the angular frequency of Earth and t is the local solar time in seconds. These fits trace out ellipses in the (x, y) plane, and descriptive metrics like the eccentricity of the ellipse and the angle the semimajor axis makes with lines of latitude, can be calculated directly from the coefficients u1, u2, υ1, and υ2. Gille et al. (2005) applied this fit to scatterometer data, which after temporal averaging resulted in just four zonal and meridional values per location, and as such the fit performed very well.
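For reference, a least-squares fit of this harmonic form, and the resulting ellipse descriptors, can be sketched as below. Time is taken in hours (ω = 2π/24 h⁻¹) rather than seconds, and the semi-axes are recovered from the singular values of the coefficient matrix, since the image of the unit circle under that matrix is the fitted ellipse:

```python
import numpy as np

def fit_diurnal_ellipse(t_hours, u, v):
    """Fit u(t) = u0 + u1 cos(wt) + u2 sin(wt), and likewise for v, by
    least squares, then derive the ellipse's semi-axes and eccentricity
    from the coefficients. A sketch of the approach described for
    Gille et al. (2005), not their code."""
    t = np.asarray(t_hours, dtype=float)
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    w = 2.0 * np.pi / 24.0                       # diurnal angular frequency
    A = np.column_stack([np.ones_like(t), np.cos(w * t), np.sin(w * t)])
    (u0, u1, u2), *_ = np.linalg.lstsq(A, u, rcond=None)
    (v0, v1, v2), *_ = np.linalg.lstsq(A, v, rcond=None)
    M = np.array([[u1, u2], [v1, v2]])
    semi_axes = np.linalg.svd(M, compute_uv=False)  # semimajor, semiminor
    ecc = np.sqrt(1.0 - (semi_axes[1] / semi_axes[0]) ** 2)
    return (u0, u1, u2), (v0, v1, v2), semi_axes, ecc
```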
However, Eqs. (3) and (4) do not provide a good fit for the hourly data considered here, primarily because they assume a 12-h symmetry in the evolution of the diurnal cycle. In practice, asymmetries between daytime heating and nighttime cooling (e.g., Svensson et al. 2011) result in surface wind perturbations accelerating rapidly just after sunrise, but remaining comparatively stagnant at night (e.g., Fig. 11). Thus, we instead fit the equations:

u(t) = u0 + u1 cos[α(t, ψ)] + u2 sin[α(t, ψ)], (5)

υ(t) = υ0 + υ1 cos[α(t, ψ)] + υ2 sin[α(t, ψ)], (6)
to the climatological perturbations, with α the function from [0, 24) × [0, 2π) → [0, 2π) given by
with t the time in units of hours UTC, and ψ providing the time when the wind perturbations vary least with time, noting that the same value of ψ is used for both the zonal and meridional perturbations. For each mean diurnal wind cycle, we solve for the seven parameters u0, u1, u2, υ0, υ1, υ2, and ψ using nonlinear regression.
Importantly, the metrics defined in this section compare just some aspects of the official forecast with model guidance: they do not, for instance, assess whether diurnal variance of the official forecast is more realistic than that of model guidance. Thus, any statements about performance made throughout this paper refer solely to the metrics defined here, and no claim is being made that these are sufficient to completely characterize the accuracy, or value to the user, of how the diurnal wind cycle is represented in competing forecasts. Furthermore, comparing results at different locations is not intended as a “ranking” of forecasting centers in different states because, for instance, station density varies significantly with location so it is hard to define station groups at a given spatial scale in a completely consistent way across locations.
3. Results

In this section, the methods described in section 2 are applied to Australian forecast and station data over the months of June, July, and August 2018. First, mean differences in absolute errors (DAE) and differences in biases (DB) over this time period are assessed. Second, structural indices are compared to elucidate the physical reasons for biases. Unless otherwise noted, times are given in UTC.
a. Absolute errors
Figure 3 provides the mean difference of absolute error values and confidence scores defined in section 2 for the coastal station groups shown in Fig. 2. Results are given for the official forecast versus ACCESS, official forecast versus HRES, and official forecast versus OCF comparisons. The results indicate that for the majority of station groups and hours, the unedited ACCESS, HRES, and OCF datasets outperform the official forecast. The lowest mean DAE values occur at the Northern Territory (NT) station group at 2300 and 0000 UTC for both the official forecast versus ACCESS and official forecast versus HRES comparisons, and at 2200 and 2300 UTC for the official forecast versus OCF comparison. Although the official forecast outperforms at least one of ACCESS, HRES, and OCF at multiple times and station groups, the only group and time where it outperforms all three is 0500 UTC over the South Western Australia (WA) station group.
Figures 4 and 5 provide case studies of the Northern Territory (NT) and South Western Australia (WA) station groups, respectively. Figure 4a provides a time series of DAE for the NT station group at 2300 UTC. The time series shows significant temporal variability, with DAE frequently dropping below −2 kt. Figures 4b and 4c show hodographs of the winds and wind perturbations, respectively, at each hour UTC on 3 July, which provides an interesting example. Note that care must be taken when interpreting perturbations and DAE scores on individual days physically, as discussed in section 2.
Figure 4b shows that the official wind forecast on this day was likely based on edited ACCESS from 0000 to 0600 UTC, then edited HRES from 0700 to 1300 UTC, then unedited ACCESS from 1500 to 2100 UTC. At 2200 and 2300 UTC, the official forecast winds acquire stronger east-southeasterly components than the other datasets. For comparison, Fig. 6a shows the first 10 values from wind soundings at Darwin Airport at 1200 UTC 3 July and 0000 UTC 4 July. In both instances the winds are east-southeasterly, and so the rapidly changing wind perturbations at 2200 UTC in the official forecast may reflect a boundary layer mixing edit that has either been applied too early or strengthened the southeasterly component of the winds too much. Similar issues appear to create the low DAE values on 8 June and 9 and 10 July.
Figure 5a provides a time series of DAE for the South WA station group at 0500 UTC. As with the NT station group there is significant temporal variability, with DAE frequently exceeding 1 kt. Figures 5b and 5c provide hodographs of the winds and wind perturbations, respectively, on 9 June, another interesting example. Both the raw winds and the perturbations appear to show both HRES and ACCESS underpredicting the amplitude of the diurnal wind cycle on this day, with OCF performing better in this regard. Figure 6b shows wind soundings at Perth Airport, the nearest station to provide wind soundings, between 1200 UTC 8 June and 1200 UTC 9 June. The 1200 UTC 8 June sounding shows surface northerlies of around 6 kt, becoming west to northwesterlies of over 20 kt at 2.4 km above the surface. However, the subsequent sounding at 0000 UTC 9 June shows that the winds acquire a strong northerly component of 30 kt in the first 500 m of the atmosphere, with the final sounding indicating a strong northwesterly wind at 725 m persisting until 1200 UTC.
In Fig. 5c, the OCF and official forecast perturbations from 0400 to 0700 UTC show stronger westerly perturbations than either ACCESS or HRES, improving on the perturbation magnitudes of both of those datasets. However, the AWS perturbations are more northerly than those of the official forecast or OCF. Possible explanations for this discrepancy are that the official forecast has been edited based on the 1200 UTC 8 June sounding, with the winds above the surface changing direction in the subsequent 12 h, or that the official forecast has been based on OCF, which underestimates the northerly component of the perturbations.
Figure 7 presents the mean DAE values and confidence scores for the city station groups, for the official forecast versus HRES and official forecast versus OCF comparisons; the official forecast versus ACCESS comparisons (not shown) are similar to those for HRES and have been omitted to save space. Both HRES and OCF outperform the official forecast almost uniformly, with the Darwin city station group the main exception. At Darwin, the official forecast outperforms both HRES and OCF at 0200 UTC, and there is ambiguity at some other times of day. The OCF comparison shows less ambiguity at Darwin, but more at Melbourne and Brisbane. The city station group results for December, January, February 2017/18 (not shown) are similar but slightly more ambiguous, particularly for ACCESS. These results were replicated using alternative city station groups, defined by taking all stations within 100 km × 100 km boxes centered on each capital city airport: the results (not shown) were very similar, with both HRES and OCF almost uniformly outperforming the official forecast.
Figure 8 presents the comparisons for the airport stations. Here the results are noisier than at both the city and coastal spatial scales, but similarities also exist. For instance, the official forecast outperforms both OCF and HRES at 0200 UTC at Darwin Airport, the Darwin city station group, and the NT coastal station group with at least 90% confidence. There are four other instances where the official forecast outperforms HRES with at least 90% confidence, although this could simply be occurring by chance due to repeated testing (Wilks 2011, p. 178). By contrast, the official forecast outperforms OCF over 4-h intervals at both Perth and Brisbane airports.
b. Seasonal biases
Figure 9 provides the difference of biases (DB) and confidence scores defined in section 2, for the coastal station groups, for the official forecast versus ACCESS, official forecast versus HRES, and official forecast versus OCF comparisons. At the NT station group at 0300 UTC, the official forecast outperforms both ACCESS and HRES with confidence ≥ 93%. However, ACCESS, HRES and OCF each outperform the official forecast at 2300 and 0000 UTC, and from 0600 to 1000 UTC, consistent with the results of Fig. 3. Figure 10c shows that these DB results reflect amplitude biases in the official forecast’s mean diurnal cycle.
At the South WA station group from 0100 to 0500 UTC, the official forecast outperforms HRES with confidence scores of at least 88%. Figure 11a shows that HRES underestimates the westerly perturbations at these times, with these perturbations potentially associated with boundary layer mixing processes, as discussed in section 3a. The official forecast, ACCESS and HRES all underestimate the amplitude of the diurnal cycle between 0200 and 1000 UTC, including both the westerly perturbations and the southerly sea-breeze perturbations. OCF better approximates the amplitude of the diurnal cycle between 0200 and 0500 UTC, but shows the greatest underestimation of the southerly perturbations between 0600 and 1000 UTC.
At the South Australia (SA) station group, the official forecast slightly outperforms ACCESS and HRES from 0200 to 0500 and 0900 to 1200 UTC, although confidence scores do not exceed 64% and 90%, respectively. The official forecast also slightly outperforms OCF between 0000 and 0200 UTC, and between 0800 and 0900 UTC, although confidence scores do not exceed 74%. Figure 11b shows that although the official forecast captures the amplitude of the perturbations from 0100 to 0500 UTC almost perfectly, its mean diurnal cycle is out of phase with that of AWS during this period, explaining the only slightly positive DB values.
For comparison, Figs. 12 and 13 present the DB values and confidence scores for the official forecast versus HRES and official forecast versus OCF comparisons, for the city station groups and airport stations, respectively. Some regions exhibit consistent results across all three spatial scales. For example, the official forecast outperforms HRES between 1400 and 1800 UTC, with at least 83% confidence, at Sydney Airport, the Sydney city station group, and the NSW coastal station group.
Other results are markedly different between spatial scales. For instance, the official forecast outperforms OCF for most of the day at Darwin Airport, but the opposite is true at the Darwin city and NT coastal station groups. Figure 10a shows that the mean AWS diurnal cycle is highly asymmetric, with a sharp peak occurring at 0600 UTC. This peak is captured well by HRES and the official forecast, but not by OCF or ACCESS. Figures 10b and 10c show that over the Darwin city and NT coastal station groups, the mean diurnal cycles are much smoother, with the amplitudes of the official forecast diurnal cycles exaggerated relative to AWS and OCF.
c. Ellipse fits
The hodographs in Figs. 10 and 11 are roughly elliptical in shape, suggesting that descriptive quantities can be estimated by fitting Eqs. (5) and (6) to the zonal and meridional mean perturbations, as described in section 2. Figure 14 gives the R² values for the fits of the zonal and meridional perturbations to Eqs. (5) and (6), respectively. The fit performs best at the coastal station group spatial scale, with R² generally above 95%.
Figure 15 provides four descriptive quantities based on the fits of Eqs. (5) and (6) to the mean perturbations: these are maximum perturbation speed, eccentricity of the fitted ellipse, angle the semimajor axis makes with lines of latitude, and the time at which the maximum perturbation speed is achieved.
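As an illustration of how such descriptive quantities can be extracted, the sketch below fits a single diurnal harmonic to 24 hourly mean perturbations, which traces a plain ellipse rather than the modified form of Eqs. (5) and (6), and derives the four quantities. The function name and parametrization are illustrative assumptions, not the study's actual code.

```python
import numpy as np

def fit_diurnal_ellipse(u, v):
    """Least-squares fit of a single diurnal harmonic (a plain ellipse)
    to 24 hourly mean wind perturbations u, v (kt).

    Returns the maximum perturbation speed (kt), the eccentricity, the
    angle the semimajor axis makes with lines of latitude (deg,
    anticlockwise from east), and the hour at which the maximum occurs."""
    hours = np.arange(24)
    omega = 2 * np.pi / 24
    # Design matrix for A cos(wt) + B sin(wt); perturbations are zero-mean
    X = np.column_stack([np.cos(omega * hours), np.sin(omega * hours)])
    (Au, Bu), *_ = np.linalg.lstsq(X, u, rcond=None)
    (Av, Bv), *_ = np.linalg.lstsq(X, v, rcond=None)
    # (u(t), v(t)) = M @ (cos wt, sin wt) traces an ellipse whose
    # semimajor/semiminor axes are the singular values of M
    M = np.array([[Au, Bu], [Av, Bv]])
    U, s, _ = np.linalg.svd(M)
    a, b = s
    ecc = np.sqrt(1 - (b / a) ** 2)
    angle = np.degrees(np.arctan2(U[1, 0], U[0, 0])) % 180
    # Hour of maximum perturbation speed, located on a fine time grid
    t = np.linspace(0, 24, 2400, endpoint=False)
    speed = np.hypot(Au * np.cos(omega * t) + Bu * np.sin(omega * t),
                     Av * np.cos(omega * t) + Bv * np.sin(omega * t))
    t_max = t[np.argmax(speed)]
    return a, ecc, angle, t_max
```

The singular-value route is one standard way to recover ellipse axes from harmonic coefficients; a modified (asymmetric) ellipse would require additional terms in the design matrix.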
Figure 15a shows that OCF has mean diurnal cycle amplitude biases at the airport station scale, with the exception of Hobart. These biases persist, though reduced, at the city station group scale, and are absent at the coastal station group scale, with the exception of Queensland (QLD). Given that OCF represents a blended average of multiple model guidance datasets (Engel and Ebert 2007), and that OCF's gridding process involves additional interpolation steps (Bureau of Meteorology 2008, 2012), this result is perhaps unsurprising: at the individual station scale OCF has undergone more smoothing than ACCESS or HRES, but at the coarser spatial scales this matters less, as all datasets undergo comparable smoothing. Note that this does not mean OCF's overall wind speeds or directions are biased at the individual station scale; only the amplitude of OCF's mean diurnal cycle is, subject to how mean diurnal cycles are treated in this study.
Considering specific locations, Brisbane provides an interesting example, as Fig. 15a shows that at Brisbane Airport the maximum AWS perturbation is at least 1 kt greater than those of the official forecast, ACCESS and HRES, and 3.5 kt greater than that of OCF. Furthermore, Fig. 15c shows that the orientation of the AWS fitted ellipse is at least 20° anticlockwise from that of the other datasets.
Figures 16a and 16b show hodographs of the Brisbane Airport mean perturbations and ellipse fits, respectively. Although the ellipse fits suppress some of the asymmetric details, they capture the amplitudes and orientations of the real mean diurnal cycles well. In this case the results show that the mean AWS sea breeze approaches from the northeast, whereas the official forecast, HRES, ACCESS and OCF sea breezes approach more from the east-northeast. The amplitude of OCF’s mean diurnal cycle is significantly weaker than those of the other datasets.
To check whether these results just represent a direction bias of the Brisbane Airport weather station, Fig. 16c shows the mean diurnal cycle at the nearby Spitfire Channel station (see Fig. 2). While the amplitude biases are slightly smaller at Spitfire Channel than Brisbane Airport, the directional bias is at least as high. A similar directional bias is evident at the nearby Inner Beacon station (not shown), although the bias is smaller than at Spitfire Channel and Brisbane Airport. Similar biases are also evident at these stations in analogous figures for December, January, and February 2017/18 (not shown), with the semimajor axis of the official forecast’s ellipse fit oriented 29° clockwise from AWS’s at Brisbane Airport. Figure 2 shows there are two small islands to the east of Brisbane Airport; the more north-northeasterly orientation of the Brisbane Airport sea breeze suggests these islands may be redirecting winds between the east coast of Brisbane and the west coasts of these islands, and that this local effect is not being captured in the official forecast, ACCESS, HRES or OCF.
The South WA station group provides another interesting example, as Fig. 15 shows the semimajor axes of the ACCESS, OCF and official forecast ellipse fits are oriented at least 48° anticlockwise from those of the AWS and HRES ellipse fits, and the HRES perturbations peak between 1.2 and 4 h after the other datasets. Figure 11a shows that these differences occur because the westerly perturbations, potentially associated with boundary layer mixing, are weaker for HRES than for the other datasets, resulting in HRES’s semimajor axis being oriented more meridionally. Analogously, the southerly perturbations, potentially associated with the sea breeze, are stronger for AWS than the other datasets, with a similar effect on orientation and timing as with HRES. Similar points can be made for the Victorian (VIC) and NT coastal station groups, and at Darwin Airport.
For land–sea-breeze and boundary layer mixing edits to reduce absolute errors in the subsequent day's wind forecast, these edits should reduce the absolute errors in the diurnal component of the wind fields. However, Figs. 3, 7, and 8 indicate that this is generally only possible when absolute error is considered at coarse spatial scales: at individual airport stations results are generally noisy and ambiguous, while over the intermediate city station group scale model guidance outperforms the official forecast almost uniformly.
Taking the effective resolutions of the models considered in this study to be approximately 7Δx (e.g., Skamarock 2004; Abdalla et al. 2013), where Δx is the horizontal grid spacing, the effective resolutions of ACCESS and HRES are ≈84 and ≈63 km, respectively. From resolution considerations alone, one might expect that forecaster edits would be able to reduce errors at the individual airport station scale, and the intermediate city station group scale (see Fig. 2), as motion at these scales is unresolved or only partially resolved by ACCESS and HRES.
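The arithmetic behind these figures is just the 7Δx rule of thumb; a minimal sketch, with the grid spacings implied by the quoted effective resolutions:

```python
def effective_resolution_km(dx_km, factor=7):
    """Effective resolution as a multiple of horizontal grid spacing,
    following the ~7 * dx rule of thumb (Skamarock 2004)."""
    return factor * dx_km

# Grid spacings implied by the effective resolutions quoted in the text
access_eff = effective_resolution_km(12)  # ACCESS: 7 * 12 km = 84 km
hres_eff = effective_resolution_km(9)     # HRES:   7 * 9 km  = 63 km
```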
To further investigate the effect of spatial scale on error, consider first just the zonal components of the AWS and official forecast wind perturbations, denoted by uAWS and uO, respectively. Considering just the values at a particular hour UTC, over the entire June, July, and August time period, the mean square error can be decomposed as

mse(uAWS, uO) = var(uAWS) + var(uO) − 2 cov(uAWS, uO) + (ūAWS − ūO)²,   (8)

where var, cov, and the overbar denote the sample variance, covariance, and mean, respectively. The first three terms together give the variance of uAWS − uO (i.e., the error variance) and the last term is the square of the bias between uAWS and uO. Equation (8) can also be applied to the mean square errors (MSEs) of ACCESS, HRES, and OCF. Note that the MSE is closely related to DAE, and the squared bias components of the MSEs are closely related to DB.
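The decomposition in Eq. (8) is straightforward to verify numerically. The helper below is an illustrative sketch (not from the study); it uses population (ddof = 0) statistics, for which the identity holds exactly.

```python
import numpy as np

def mse_decomposition(x, y):
    """Terms of Eq. (8): mse = var(x) + var(y) - 2 cov(x, y) + bias^2.
    Population (ddof=0) statistics make the identity exact."""
    return {
        "var_x": np.var(x),
        "var_y": np.var(y),
        "minus_2cov": -2.0 * np.cov(x, y, ddof=0)[0, 1],
        "bias_sq": (np.mean(x) - np.mean(y)) ** 2,
        "mse": np.mean((x - y) ** 2),
    }
```

Summing the first four terms recovers the MSE; the first three alone give the error variance.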
Figure 17 shows the terms of Eq. (8) for both the official forecast and OCF, for Brisbane Airport, the Brisbane city station group, and the QLD coastal station group. At all three scales the official forecast varies more than OCF. The official forecast also generally varies more than ACCESS and HRES (not shown), and this is also true for the other stations and station groups considered in this study.
At Brisbane Airport the variance of AWS is significantly larger than that of either the official forecast or OCF. This additional variability is mostly uncorrelated with either dataset. Although the covariance between the official forecast and AWS increases between 2000 and 0800 UTC, the increase is not sufficient to offset the official forecast's additional variance, and the error variances are thus of comparable magnitude for the official forecast and OCF.
The larger AWS variances are unsurprising from representation considerations alone (e.g., Zaron and Egbert 2006), as the official forecast and OCF data represent averages over 6-km spatial grid cells, whereas the AWS data represent point values. As a result, error variance terms are generally much larger than the squared bias terms at this scale. The exception is OCF at 0400 UTC, where the squared bias is ≈6 kt², while the error variance is ≈15 kt². This results in a higher MSE for OCF than for the official forecast around 0400 UTC, consistent with the airport station results of Figs. 8c and 8d.
At the intermediate Brisbane city station group scale, the AWS variances are again larger than those of OCF, but of comparable magnitude to those of the official forecast, with the official forecast's additional variability again mostly uncorrelated with AWS. This results in larger error variance terms for the official forecast, consistent with OCF's almost complete outperformance of the official forecast in Figs. 7c and 7d. However, OCF's squared bias terms remain larger than the official forecast's, resulting in OCF's MSE slightly exceeding the official forecast's at around 0400 UTC. These results are consistent with Figs. 7c and 7d, where the official forecast slightly outperforms OCF at 0400 UTC with a confidence score of 79%.
Over the coarse QLD coastal station group scale, variances in all three datasets are small enough that the error variance terms no longer dominate the squared bias terms as strongly. Although the error variance of the official forecast is still larger than that of OCF, OCF's zonal biases around 0400 UTC are again sufficient to produce larger MSEs around this time. When considered with the analogous plots for the meridional perturbations (not shown), for which OCF's squared bias terms peak slightly later, the results are consistent with Figs. 3c and 3d.
Analogous points can be made for the other locations and datasets considered in this study. At the airport station scale, AWS variance is generally significantly higher than that of the official forecast and model guidance, producing high error variance and likely explaining why the airport station DAE results of Fig. 8 are comparatively noisier than those of the city or coastal station group scales. Interesting exceptions include OCF at Brisbane and Perth airports, where amplitude biases in OCF’s diurnal cycle are sufficient to affect airport station DAE scores.
At the city station group scale, the official forecast is generally outperformed by HRES and OCF in the results of Fig. 7, and in the analogous comparisons with ACCESS (not shown). This occurs because the official forecast is generally more variable than model guidance, and this additional variability is mostly random, in the sense of being uncorrelated with AWS. At the coastal station group scale, random variability in each dataset is reduced, and biases are sufficiently large relative to error variance to affect the results of Fig. 3.
These results suggest that switching model guidance products or performing edits can add more random noise to the diurnal component of the official forecast than can be offset by reductions in bias or improved correlations with AWS. Because the official forecast is built from multiple model guidance datasets, switching between datasets with different means will tend to produce greater variance than any of the component datasets. If the choice of model guidance is made primarily on the basis of which model best captures the more slowly evolving, larger-amplitude synoptic-scale features, then switching model guidance may add random variability to the diurnal component of the official forecast. Furthermore, unless all forecasters follow identical thought processes when making edits, the edits themselves will also add random variability.
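The masking effect of random variability can be illustrated with synthetic data. In the sketch below, all numbers are arbitrary assumptions rather than values fitted to the study's datasets: stations share one diurnal cycle, each receives independent noise, and the forecast carries a small constant bias.

```python
import numpy as np

rng = np.random.default_rng(0)
hours = np.arange(24)
cycle = 3.0 * np.cos(2 * np.pi * hours / 24)   # shared diurnal cycle (kt)
bias, noise_sd = 1.0, 4.0                      # arbitrary illustrative values
n_days, n_stations = 90, 20

# Observations: cycle plus independent noise per station and day;
# forecast: the same cycle offset by a constant bias, with its own noise
obs = cycle + rng.normal(0.0, noise_sd, (n_days, n_stations, 24))
fcst = cycle + bias + rng.normal(0.0, noise_sd, (n_days, n_stations, 24))

# Mean absolute error at a single station vs after averaging over stations
mae_station = np.abs(fcst[:, 0] - obs[:, 0]).mean()
mae_group = np.abs(fcst.mean(axis=1) - obs.mean(axis=1)).mean()
```

In this setup mae_station is dominated by the noise, well above the 1-kt bias, whereas mae_group sits much closer to the bias itself: averaging over stations suppresses the random component, so bias comes to dominate the aggregated error.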
These results could have implications for forecasting practice. Model guidance products are indeed biased in how they resolve diurnal wind cycles (e.g., Fig. 16), and there is therefore scope for forecaster edits to reduce these biases. However, editing model guidance generally fails to reduce error in the forecast diurnal signal, even at scales finer than the effective resolutions of the models, as at these scales diurnal cycles are significantly masked by random variability. Averaging over large areas reduces this random variability, better revealing the diurnal cycle, and so biases have a greater impact on forecast error. However, even at large scales Fig. 3 shows model guidance still outperforms the official forecast more often than not.
Reducing the random variability of the official forecast, or of the model guidance datasets that comprise it, could therefore improve the capacity of these types of edits to reduce error in the diurnal cycle. One way to accomplish this would be to use an ensemble average model guidance product like OCF; another would be to further postprocess model guidance products, for example by averaging multiple time steps around each hour, before including them in the GFE.
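The second option can be sketched as a centered running mean over the time steps surrounding each hour. This assumes sub-hourly model output is available, and the window width here is an arbitrary choice.

```python
import numpy as np

def smooth_wind(series, width=3):
    """Centered running mean over `width` consecutive time steps,
    damping random variability before ingest into an editor such as GFE.
    Endpoints are averaged over the available points only."""
    kernel = np.ones(width) / width
    smoothed = np.convolve(series, kernel, mode="same")
    # Correct the endpoints, where mode="same" implicitly zero-pads
    counts = np.convolve(np.ones_like(series), kernel, mode="same")
    return smoothed / counts
```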
In this study we have presented methods for assessing the diurnal component of wind forecasts, with the intended application being the assessment of the edits Australian forecasters make to model guidance datasets to better resolve land–sea-breeze and boundary layer mixing processes. We considered both errors and seasonal biases at each hour UTC, over three spatial scales, but the methods are immediately generalizable to other spatiotemporal scales. Crucially, the results of this study depend on the metrics and methods chosen, and no claim is being made that these are sufficient to completely describe the overall accuracy, or value to the user, of competing forecasts.
When the methods are applied to Australian forecast data, the results indicate that the official edited forecast only produces lower absolute errors in the diurnal wind signal when wind perturbation data are averaged over the coarse “coastal station group” spatial scale (see Fig. 2) of 500 × 100 km² to 2000 × 100 km². Even at these scales, reductions in error are isolated to particular locations and times of day, and the official forecast rarely has lower mean absolute error than the three model guidance products considered in this study simultaneously.
By contrast, the official forecast can produce lower seasonal biases than model guidance at all three spatial scales, but again, it rarely produces lower biases than the three model guidance products considered here simultaneously. Reduced seasonal biases do not translate into reduced errors at the two smaller spatial scales because the diurnal cycle is mostly masked by the random variability in each dataset. Furthermore, because the official forecast generally exhibits much greater random variability than model guidance, model guidance almost uniformly outperforms the official forecast over the intermediate 50 × 50 km² to 200 × 200 km² city station group spatial scale.
We also compared structural features of the mean diurnal wind cycles of each dataset by fitting modified ellipses to their temporal hodographs, then deriving metrics from these ellipses. This approach revealed structural biases in the official forecast, including directional biases in the approach of the sea breeze at Brisbane Airport, and amplitude biases along the southwest coast of Western Australia.
Future research could extend this study in multiple directions. One approach would be to study how the difference of absolute errors (DAE) metric defined in this study responds to synthetic or idealized model data, so that the influence of random and synoptic variability can be better understood; some preliminary work to this end is available online (Short 2020). Another important question is whether the random variability in the official forecast, or in the model guidance products that comprise it, can be reduced through ensemble forecasting or postprocessing, as reducing random variability would both decrease errors and increase the value of land–sea-breeze and boundary layer mixing edits. OCF, as a consensus product, is one effective way to accomplish this, and future work could assess whether it is possible, or desirable, to adjust OCF's wind algorithm to reduce the amplitude biases identified in OCF's mean diurnal cycle, noting that these biases are subject to how the mean diurnal cycle has been defined in this study. Another goal could be to identify precisely the spatiotemporal scales at which diurnal wind cycles can be separated from random variability, so as to better understand the scales at which land–sea-breeze and boundary layer mixing edits can reduce error in a forecast.
Funding for this study was provided to Ewan Short by the Australian Research Council's Centre of Excellence for Climate Extremes (CE170100023). Datasets and software were generously provided by the Australian Bureau of Meteorology's Evidence Targeted Automation team, with additional code available online (Short 2019). Thanks are due to Michael Foley, Deryn Griffiths, Nicholas Loveday, Ben Price, and Alexei Hider for providing support at the Bureau of Meteorology's Melbourne and Darwin offices; and to Craig Bishop, Todd Lane, and Claire Vincent from the University of Melbourne, and Carly Kovacik from the U.S. National Weather Service, for helpful conversations.