NWP Grid Editing at the Met Office

E. B. Carroll, Operations Centre, Met Office, Exeter, United Kingdom
T. D. Hewson, Operations Centre, Met Office, Exeter, United Kingdom

Abstract

Over the past few years the way in which central forecast guidance is disseminated by forecasters in the Met Office headquarters has been changing, with an increasing reliance on modification of variables output from NWP models. The editing of grids of forecast data at the Met Office Operations Centre in Exeter is described, and two case studies are presented. Results of verification of modified versus raw fields are shown, and the concept of “lead time gain” is introduced as a unifying measure of relative forecast accuracy. At all lead times the net lead time gain outweighs the time spent considering and effecting modifications.

Corresponding author address: Edward B. Carroll, Operations Centre, Met Office, FitzRoy Rd., Exeter EX1 3PB, United Kingdom. Email: edward.carroll@metoffice.gov.uk


1. Overview

Since 1997, grids of numerical weather prediction (NWP) output have been modified by forecasters at the National Meteorological Centre, Bracknell, United Kingdom, and now at the Operations Centre at the Met Office’s new headquarters in Exeter. An application that modifies fields via quasigeostrophic potential vorticity (Carroll 1997) has been in use, allowing forecasters to reposition, intensify, or weaken features such as depressions and fronts. Working on a large-scale domain that corresponds with the standard analysis and forecast guidance products (ASXX, FSXX), forecasters modify, when necessary, output from the global model at a resolution of about 50 km out to 132 h. Synoptic features are manipulated via their associated potential vorticity (PV) along with the boundary temperature; the PV is then inverted to give a new balanced and realistic set of dynamic fields. Wind is retrieved by remapping the ageostrophic component and adding it to the recalculated geostrophic part of the wind, while a simple boundary layer model helps recalculate low-level winds. Humidity and precipitation have the same remapping function applied as the PV, ensuring that these water parameters retain a realistic distribution in relation to the dynamic forcing features. The whole process is quick and ensures that the new fields are realistic, with the PV approach imposing dynamical consistency.
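
The remapping step can be pictured as a single displacement function applied identically to PV, humidity, and precipitation, so that the water variables move with the dynamics. The sketch below illustrates that idea only; it is not the operational code, and the Gaussian-shaped displacement and all names are illustrative assumptions.

```python
# Minimal sketch: apply one horizontal remapping to several fields at once,
# so humidity and precipitation stay co-located with the displaced dynamics.
import numpy as np
from scipy.ndimage import map_coordinates

def remap_fields(fields, di, dj):
    """Shift every field by the spatially varying displacement (di, dj).

    fields : dict of 2D arrays on the same grid
    di, dj : 2D arrays of gridpoint displacements (rows, cols)
    """
    ny, nx = di.shape
    jj, ii = np.meshgrid(np.arange(nx), np.arange(ny))
    # Sample each field at the upstream origin of each grid point.
    coords = [ii - di, jj - dj]
    return {name: map_coordinates(f, coords, order=1, mode="nearest")
            for name, f in fields.items()}

# Example: move a feature centred at grid point (40, 50) three points east,
# with the displacement decaying over a ~10-point radius so the far field
# is left untouched (a hypothetical displacement shape).
ny, nx = 80, 100
jj, ii = np.meshgrid(np.arange(nx), np.arange(ny))
weight = np.exp(-((ii - 40) ** 2 + (jj - 50) ** 2) / (2 * 10.0 ** 2))
moved = remap_fields({"pv": np.random.rand(ny, nx),
                      "rh700": np.random.rand(ny, nx),
                      "precip": np.random.rand(ny, nx)},
                     di=np.zeros((ny, nx)), dj=3.0 * weight)
```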

More recently, since October 2002, a new application has been in use that extends the functionality to allow modification of more parameters on a finer grid, the focus being mainly sensible weather on the mesoscale up to 36 h ahead. The two applications are referred to as field modification I and field modification II, respectively.

To check whether modifications made are on average improving the forecast, and to provide feedback, a wide range of verification statistics, both objective and subjective, have accumulated as field modification has been used operationally. Wherever possible validating data consist of observations, rather than model analyses, which could introduce bias. As regards subjective verification, we tried to make this as “fair” as possible by using a different verifier every day, taken from a pool of about 10 forecasters, none of whom ever issues any forecasts of the types being verified. While it is in the nature of subjective techniques that consistency of judgment and freedom from individual bias cannot be guaranteed, they add useful information that complements that obtained by objective verification. Results are discussed extensively in Hewson (2004); some are reproduced here. Note that instances of “no modifications made” are included in every statistic.

2. Field modification I

a. Practice

Central guidance forecasts are generated at Exeter from two positions: the chief, who determines the official forecast out to about T + 30, and the deputy chief, who fills a similar role for lead times beyond 30 h. Field modification I was developed to allow modification of the primary guidance fields, shown in Table 1. These are used to generate the traditional front and isobar forecast charts; key variables also appear on the Met Office Intranet, including precipitation, 850-hPa wet-bulb potential temperature, 10-m wind, and 700-hPa humidity. These products supplement the long-standing mix of text guidance and fronts and isobars charts.

The two main scenarios in which forecasters modify fields are those in which

  1. there are errors in the model analysis or early forecast fields and

  2. output from the model lies toward one end of the range of solutions from different modeling centers.

Modifications made at one time can be time-linked so that a change, for example to the position of a low center, can be applied with a user-specified time evolution and system-relative position; fields at other times are thereby changed consistently and step changes are avoided.
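
As an illustration of time-linking, the sketch below tapers a single position correction across neighboring output times with a linear ramp. The ramp shape, names, and numbers are illustrative assumptions rather than the operational scheme.

```python
# Minimal sketch: weight an edit made at one lead time across the other
# output times so the modified sequence evolves smoothly, with no step change.
import numpy as np

def time_linked_weights(lead_times, t_edit, ramp_hours):
    """Weight applied to the edit at each lead time: 1 at t_edit,
    falling linearly to 0 over ramp_hours either side."""
    t = np.asarray(lead_times, dtype=float)
    return np.clip(1.0 - np.abs(t - t_edit) / ramp_hours, 0.0, 1.0)

leads = np.arange(0, 133, 12)          # global-model output times (h)
w = time_linked_weights(leads, t_edit=72, ramp_hours=36)
dx_edit = 150.0                        # hypothetical eastward shift (km) at T+72
print(dict(zip(leads.tolist(), np.round(w * dx_edit, 1))))
```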

Situations in which scenario 1 is the main consideration are of more direct relevance to the shorter periods, and only occasionally result in significant changes to the synoptic-scale fields. Scenario 2 is more likely to lead to modifications at longer time scales, as models tend to diverge further from each other. Young and Carroll (2002) have described situations in which fields are changed at longer ranges, including examples.

b. Case study—Depression of 15 October 2002

In the early hours of 14 October 2002, a shallow depression to the west of Spain was predicted by the Met Office’s Global Model to run northeast into the Low Countries with little development, bringing rain to southern parts of the United Kingdom but with only light winds associated. Using a combination of satellite imagery, which showed the type of cloud-head structure normally associated with cyclogenesis, and model output from other centers, field modification I was used to generate quite an intense depression over central southern England. Figure 1 shows the model T + 36 mean sea level pressure (mslp) and 10-m wind compared with the modified versions, along with the verifying analysis and wind observations. Table 2 shows the maximum mean 10-m winds in sea areas adjacent to England and Wales upon which gale and strong wind warnings are based.

Points to note include the following:

  • The modified fields, while accurate in central pressure and better positioned than the raw data, did not locate the low center far enough west.

  • The modified 10-m winds for the English Channel offered good guidance, with winds in excess of 30 kt, as compared to the 5–10-kt values in the raw fields. There were reliable reports of 30–35 kt, with a single ship observation in the sea area near Plymouth of 50 kt; this is treated as less reliable than the fixed observations, though, since in most ship observations wind is estimated from the state of the sea rather than directly measured.

  • While underestimating winds in the western sea areas and overestimating them in the eastern ones, in all sea areas the modified fields were closer to observations than were the raw fields. However, over some of the land areas of NE France and the Low Countries, raw winds were closer to observations.

The modified fields were judged subjectively to be superior since they resulted in timely forecasts of windy conditions across the region that would otherwise not have been given, with a relatively small region of false alarms compared with the area of improvement. It is interesting to note that the root-mean-square (rms) error of the modified pressure field was very similar to that of the unmodified field. This highlights a characteristic of this statistic, which is to penalize small positional differences for systems that are forecast to be intense much more than for systems that are forecast to be weak. It is partly for this reason that subjective verification—which can intrinsically overcome such problems—forms a vital component of the verification scheme. An alternative approach would be the objective verification of wind strength, while work on techniques for identifying separate location and pattern error components (Ebert and McBride 2000) offers the prospect of objective techniques that address the inadequacies of standard statistics such as rms error.

Modifications as large as those illustrated by Fig. 1 occur only rarely at such a short lead time—perhaps two or three times a year—though crucially an intense low is always involved. At longer lead times, substantial modifications are made more often—perhaps 50 times a year. This needs to be borne in mind when interpreting verification statistics presented below, as occasions when no changes were made are always included. The modification impact will always be “watered down” by including these null cases, though this effect will be seen much less at longer leads.

c. Verification

Objective verification covers most of the parameters in Table 1, using both model analysis and observational data [sondes and synoptic observations (SYNOPs)] as “truth,” and is performed twice per day. Figure 2 shows some of the results. Subjective verification is done once per day and focuses on mean sea level pressure and frontal positions, using the forecaster’s analysis charts as truth. Mean sea level pressure in these analyses is derived from an on-screen analysis of all available surface data, quality controlled by the forecaster, and blended with model background fields. In principle the process is similar to the model analysis technique, being essentially an analysis of departures from a first-guess field, added on to that first-guess field, including wind as well as pressure observations if required. However, it differs in a number of ways from the NWP analysis product; human judgment is used as to whether observations should be included in or rejected from the analysis, and it is often exercised using other information such as satellite imagery and an observation’s past record. In addition, there are occasions on which the model will not accommodate observations that are judged to be good, because they differ too much from the short-period forecast, and sometimes the model’s 60-km resolution cannot fully capture some intense smaller-scale cyclones. As such, the human analysis regularly exhibits smaller discrepancies relative to observations than does the model analysis. The area used throughout is effectively NW Europe, centered on the United Kingdom. Some key results from the verification are presented here:

  1. Modifications improve the frontal and surface pressure fields, on average, from T + 48 onward. As Fig. 2b shows, the greatest positive impact is at longer lead times [subjective verification results in Hewson (2004) support this]. This is mainly because model divergence increases with lead time, and because forecasters usually move synoptic features toward a weighted multimodel consensus solution. Impact is neutral at lead times less than T + 48 (though as indicated earlier such fields are changed much less often). Figure 2a shows smoothed T + 36 monthly mean rms error results from the objective scheme, indicating that there are some large month-to-month variations. Occasionally one very large change will dominate the monthly figure. However, it is more typical for the monthly figure to reflect the net impact of, say, 5–15 modifications of smaller magnitude.

  2. The relative humidity field at 700 hPa, being a good large-scale proxy for cloud, fronts, and significant weather, shows a similar signature of forecaster improvements to the surface pressure field (see Fig. 2c), although the magnitude of the improvement is larger. This suggests that the forecasters are more skilled at repositioning bands of significant weather than they are at redefining the pressure field features. This final observation, however, must not be taken as a pointer toward abandoning pressure modifications; dynamical consistency, such as keeping a front within an isobaric trough, will always be an important prerequisite of forecast output. Indeed the PV inversion intrinsic to field modification will successfully move the pressure field along with frontal humidity and precipitation. It is more likely to be the difficulties of correctly specifying the depth and position of low centers (as in the Fig. 1 example) that account for the smaller improvements seen in pressure.

  3. Over time, while there are good and bad months, the positive impact of modifications has generally been increasing, at least for shorter lead times. Trend analysis of plots such as Fig. 2a shows statistically significant improvement (downward) in about 50% of the cases; the other 50%, generally for longer lead times, show no significant change. These results probably reflect the increasing availability and timeliness of output from other relatively accurate numerical models, which is particularly relevant for the shorter-range forecasts that are issued first. The especially useful American Global Forecast System (GFS) model, for example, is run four times per day and its output now arrives in a timely fashion at the Operations Centre through various routes (including the Internet). In 2001, there were only two runs per day and these often reached forecasters too late to be utilized. Increased forecaster experience and software familiarity may also be contributory factors.

  4. Objective verification for 500- and 250-hPa heights against sonde data shows a broadly neutral impact of modifications on rms errors at all lead times. Plots for these variables (not shown) are structurally almost identical to Fig. 2b, albeit with “raw” and “mod” closer together at longer leads. This indicates that the quasigeostrophic assumptions implicit in the PV inversion process intrinsic to field modification do not adversely affect fields remote in height from the lower-level fields being targeted by the forecaster. It should be noted here that forecasting practice in the United Kingdom differs somewhat from that found in North America: at longer lead times U.K. forecasters retain a focus on the lower-tropospheric pattern, whereas in the United States emphasis shifts toward upper levels. It is not unreasonable to expect to see the greatest forecaster impact at the level of focus; so if U.K. forecasters had adopted the North American approach, we would probably have seen a proportionately greater impact at 500 hPa than at 1000 hPa.

3. Field modification II

a. Practice

The mesoscale formulation of the Met Office’s unified model (Cullen 1993; Cullen et al. 1997) is the Met Office’s primary forecasting tool for the shorter ranges. It is run every 6 h with output available from about T + 2 h and 45 min (e.g., 0845 UTC for a 0600 UTC integration) and has a grid length of 12.7 km. Increased reliance on the sensible weather parameters predicted directly from the model rather than inferred from other diagnostics has led to further development of the field modification application and an augmented parameter list (Table 3), offering an expanded range of tools, and allowing a higher-resolution grid on a U.K.-sized domain to be edited. Data are extracted on a 22-km grid for the purposes of modification and, subsequently, compared against raw data on the same grid when assessing verification. Since October 2002 the chief forecaster has generated, as an integral part of the 6-hourly guidance, a set of modified fields based primarily on the mesoscale model. He or she has about 20 min from time of receipt of the output to make modifications and save the modified data, in addition to writing comments, for example, on confidence, with the modified fields available from T + 3 h and 10 min. Despite the increasing reliability of the output, there are still instances in which the model can be seriously wrong, and modifications are often in response to errors in initial conditions, especially when cloud or precipitation have been poorly initialized; this, despite the fact that radar and satellite imagery are assimilated. In addition, a further category of modification is often made to counteract the effect of perceived systematic errors, such as overforecasting of areas of light rain. It has always been the practice of the chief forecaster to comment on the NWP output in terms of its reliability or perceived tendency to over- or underplay a particular aspect of the weather in a given situation. The new guidance allows a more direct approach in which explicit adjustments are made to the data themselves rather than addressed by text commentary.

A wide range of tools is at the disposal of the forecaster, with operations that fall into three main categories:

  1. Dynamical modifications, whereby one operation results in changes to the full range of fields achieved by the PV approach, usually invoked to correct position and intensity errors in synoptic features, relatively rarely for the short period.

  2. Individual changes to other variables, in particular cloud and precipitation, which have simple interparameter linkages (e.g., to ensure that there is realistic cloud where precipitation has been added). Minor changes to precipitation and cloud fields are made quite commonly. Other quantities that can be changed directly and independently include visibility, 1.5-m temperature, and 10-m wind, though this is rarely done.

  3. Merging model output from two different models, or from successive runs of the same model. This type of change can be applied selectively to individual parameters or to all variables, and with varying weights between the two models. For instance, in a situation in which it is considered that one model is better in the early stages, whereas another becomes the favored solution by T + 36, it is possible to achieve a set of fields that merges smoothly from one to the other over a defined period.
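
As a minimal sketch of the category 3 merge (the category 1 and 2 tools are discussed below): two sets of fields on a common grid are combined with a weight that ramps from one model to the other over a chosen window. The linear ramp and all names are illustrative assumptions, not the operational blend.

```python
# Minimal sketch: blend two model solutions with a lead-time-dependent weight,
# so the merged fields hand over smoothly from model A to model B.
import numpy as np

def blend(model_a, model_b, lead_time, t_start, t_end):
    """Return model A before t_start, model B after t_end, and a smoothly
    weighted combination in between (arrays on a common grid)."""
    w = np.clip((lead_time - t_start) / float(t_end - t_start), 0.0, 1.0)
    return (1.0 - w) * model_a + w * model_b

# e.g. favour model A early on, hand over fully to model B by T+36:
field_t24 = blend(np.zeros((80, 100)), np.ones((80, 100)),
                  lead_time=24, t_start=12, t_end=36)
```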

For category 1, the forecaster has control over the vertical and horizontal influence of the change, using a slider bar and mouse pointer to input intensity and positional changes respectively.

With category 2, a comprehensive set of tools allows the forecaster to apply operators (multiplication, addition, subtraction) to selected parameters over a specified region to achieve, for example, a painting in of a rain area, or a thinning out of a cloud region. The variability of a parameter within a selected region can also be increased or decreased by using a standard deviation multiplier, which allows the user to make the chosen field more or less variable by altering the deviation of each gridpoint value from the mean in the chosen area by some factor; values larger than 1 have the effect of making the field less smooth.
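
The standard deviation multiplier might be sketched as below, assuming a simple rectangular region for illustration: each grid point's departure from the regional mean is scaled by a factor k, so k greater than 1 roughens the field and k less than 1 smooths it. Names are hypothetical.

```python
# Minimal sketch: scale the variability of a field within a selected region
# by multiplying departures from the regional mean.
import numpy as np

def scale_variability(field, mask, k):
    """Multiply departures from the regional mean by k inside mask."""
    out = field.copy()
    mean = field[mask].mean()
    out[mask] = mean + k * (field[mask] - mean)
    return out

field = np.random.rand(80, 100)
mask = np.zeros_like(field, dtype=bool)
mask[20:40, 30:60] = True              # the forecaster's selected region
rougher = scale_variability(field, mask, k=1.5)
```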

Alternatively, cloud and rain may be advected using a wind level and decay period selected by the forecaster. Precipitation can be pasted into the model grid from radar data, before being advected by the forecaster with a chosen decay period.
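
The advection tool might be sketched as below: a precipitation field, possibly pasted in from radar, is displaced by a forecaster-chosen steering wind and damped over a decay period. The uniform wind and exponential decay are simplifying assumptions, and the names are hypothetical.

```python
# Minimal sketch: advect a rain field with a chosen steering wind and an
# e-folding decay period.
import numpy as np
from scipy.ndimage import shift

def advect_with_decay(rain, u_gp_per_h, v_gp_per_h, hours, decay_hours):
    """Advect by (u, v) grid points per hour for 'hours', with decay."""
    displaced = shift(rain, (v_gp_per_h * hours, u_gp_per_h * hours),
                      order=1, mode="constant", cval=0.0)
    return displaced * np.exp(-hours / decay_hours)

radar_rain = np.random.rand(80, 100)   # stand-in for a pasted radar field
t_plus_6 = advect_with_decay(radar_rain, u_gp_per_h=0.8, v_gp_per_h=-0.3,
                             hours=6, decay_hours=12)
```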

In addition, if it is considered that the model is underrepresenting orographic enhancement of precipitation, this can be increased. To do this, the system makes use of the horizontal wind field and the slope of the orography to derive a vertical wind component. This is used in combination with the humidity profile to derive an orographic enhancement potential, which can be added in varying proportions to selected regions. Of course, if the model is doing well, these processes will be implicit within its representation of the weather, but it is sometimes noted that orographic enhancement is underdone, especially with the relatively smooth orography of the coarser-resolution global model.
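
A minimal sketch of the enhancement-potential calculation described above, assuming a single low-level humidity field in place of the full profile: terrain-forced vertical motion is estimated as the dot product of the horizontal wind with the orographic slope, and only upslope (positive) values contribute.

```python
# Minimal sketch: orographic enhancement potential from w = u . grad(h),
# weighted by low-level humidity (a placeholder for the humidity profile).
import numpy as np

def orographic_enhancement_potential(u10, v10, orog, rh_low, dx_m):
    """u10, v10 (m/s), orography (m), low-level relative humidity (0-1),
    all 2D arrays on the same grid; dx_m is the grid spacing in metres."""
    dhdy, dhdx = np.gradient(orog, dx_m)     # terrain slopes
    w = u10 * dhdx + v10 * dhdy              # forced vertical wind (m/s)
    return np.maximum(w, 0.0) * rh_low       # upslope only, humidity-weighted

ny, nx = 80, 100
potential = orographic_enhancement_potential(
    u10=np.full((ny, nx), 10.0), v10=np.zeros((ny, nx)),
    orog=np.random.rand(ny, nx) * 500.0, rh_low=np.full((ny, nx), 0.9),
    dx_m=22000.0)
```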

b. Case study—Snow of 30 January 2003

On 30 January 2003, a northerly airstream covered the United Kingdom, with a small low pressure area running southward over the North Sea, close to East Anglia. The mesoscale model was forecasting a mixture of snow and rain showers over the North Sea, with very little inland penetration of snow elsewhere. It generated a band of light precipitation associated with an occluded front, extending from East Anglia down across the London area for the afternoon and evening period, though with negligible amounts of snow within it.

The NWP fields on this day were modified to bring significant amounts of snow inland across England, both in airmass showers coming in across the eastern half of northern England, and in the more organized band of precipitation affecting parts of East Anglia and the southeast. The modified fields were used as guidance by forecasters at the regional Met Office in London to make the first-guess automatic forecast, produced in the morning for the Highways Agency and local authorities, much more pessimistic. Figure 3 shows the 12-h mesoscale model forecast from 0600 UTC, along with the modified equivalent and the weather radar composite for 1800 UTC. Figure 4 shows a photograph taken on the following day at Harpenden, marked × in Fig. 3. Points to note are that

  • airmass convection was allowed in the modified fields to come well inland from Lincolnshire northward, and

  • the band of precipitation associated with the occluded front was intensified and turned to snow, extending down to the south coast.

In this instance, the modification was made partly on the basis of experience of model deficiencies (inadequate inland penetration of wintertime maritime convection) and partly on the basis of the interpretation of the satellite imagery. The system over the North Sea at the time of the forecast issue looked active enough to retain its vigor for some time and to be of sufficient intensity to generate snow rather than rain. So snow distribution was estimated from the satellite imagery at 0900 UTC, pasted into the T + 3 field and advected through the forecast period with a slow decay, the rate of which was chosen by the forecaster, along with the level of the advecting wind.

In the event, a fall of snow paralyzed parts of East Anglia and southeast England, the weather making the top story nationally in both newspapers and the broadcast media as the M11 motorway in particular was badly hit—thousands of motorists were stranded, some for more than 20 h (The Guardian, 1 February 2003). The Met Office came out of the event well, having issued the crucial forecast in good time.

c. Verification

Because field modification II provides higher-resolution graphical guidance, its verification needed to be more focused than that for field modification I. Detailed cloud and precipitation structure, including precipitation type, which should be forecast correctly more often at short ranges, are verified as additional parameters. Validating data for the objective verification comprise surface observations of current weather, low cloud, and hourly rainfall. For the subjective verification, model surface pressure analyses as well as satellite and radar imagery are used. Radar data have been calibrated in real time against gauges, and are depicted as rainfall rates using the same color scale as in the raw and modified fields. In conjunction with surface observations (incorporated to help address any remaining radar deficiencies), this allows the verifier to see whether the rainfall areal coverage is too large or too small, and whether forecast rates are too high or too low. In practice, the verifier uses an automatically generated three-frame Web page animation, showing an extended U.K. area. One frame shows the unmodified forecast, another the modified (in an identical format), and a third the equivalent validating data; by toggling, the verifier can examine and compare directly how each forecast has performed. For the purposes of avoiding bias, this setup is viewed as near optimal. Examples from the three-frame animation can be found in Hewson (2004).

The subjective verification is performed twice daily, for the 0600 and 1800 UTC model forecasts, each time referencing four forecast frames, divided into pairs: T + 6 and 12 and T + 18 and 24. Using the Web data, the verifier responds to questions, summarized below, via a Web form:

  1. For the purposes of giving an appropriate graphical impression of the weather experienced, which forecast was better—modified, unmodified, or neither?

  2. If the answer to question 1 is not “neither,” then what was it, specifically, about the good forecast that made it better? (Several aspects can be indicated, though commonly only one is; all possible responses are shown in Fig. 5.)

  3. From the perspective of hazardous weather, were there serious errors in either the modified or unmodified forecast?

It is clear from the above questions that the subjective verification gave the verifier a priori knowledge of which forecasts were modified and which were unmodified. While it is possible that this introduced a degree of bias whereby the verifiers favored their fellow forecasters’ modifications, the authors’ view is that this effect was minimal. A number of cases were revisited and a few “discrepancies” were uncovered, in both directions; in the interests of integrity, all such cases were left unchallenged.

During the verification period, which lasted about 12 months and contained 666 verified forecasts, responses to question 1 indicated that about 70% of forecasts were either not modified, or were modified in such a way that there was no net improvement or degradation. Considering the remaining 30%, the forecasters were four times more likely to improve the forecast than they were to make it worse. Figure 5 shows this 30%, dividing modification types that had a positive impact (left of the bar) from those that had a negative impact (right), with labeled categories highlighting the responses to question two above. The following conclusions can be drawn:

  1. Forecasters are much more likely to improve the areal cloud coverage than make it worse, and they are more likely to do this by adding cloud. This is fully consistent with objective verification results (shown in Hewson 2004), where part of the forecaster’s success is attributed to reducing a negative bias in the raw fields (which probably occurs most commonly with stratocumulus).

  2. It is difficult to improve the mean sea level pressure field, although this parameter seems to be rarely changed (2%–3% of the time).

  3. Precipitation timing errors are much easier to correct at very short range, though some skill is still apparent at T + 18 and 24.

  4. Forecasters are good at improving the model’s areal coverage of precipitation; at longer lead times this generally involves a reduction (often this is done in cold-air convection, or in light rain and drizzle in warm sectors).

  5. Attempts to change precipitation intensity are generally successful, with a clear bias toward successful enhancement of rates at short lead times (again this is often in cold-air convection, where convective parameterization produces a rate distribution that is usually too peaked at low rates).

Although objective and subjective verification broadly agree, one area of apparent disagreement is in precipitation rates, which conclusions 4 and 5 above suggest were better in the modified forecast. Figure 6 shows (objective) equitable threat scores (see Jolliffe and Stephenson 2003) for two rate thresholds. In neither case is modified better; at the higher threshold (2 mm h−1), performance is about the same, while, at the lower (0.5 mm h−1), from T + 12 onward modified fields score worse. There are many possible reasons for this discrepancy. One is that the primary output of the forecaster is graphical, and the subjective verification is better able to assess the ability of that graphical output to convey a visual impression for users that is consistent with the eventual weather. Another concerns the validating data for objective verification. As instantaneous rates are what is forecast, but not what is recorded, hourly rainfall totals for 2 h spanning the verification time are averaged out to give a nominal validating rate in millimeters per hour. Possible impacts of this discrepancy are complex, though because of the nature of model systematic errors, particularly in marine convection, it may on occasion predispose raw fields to objectively verify better despite the fact that radar-modified fields look better. Nevertheless there is also separate evidence (not shown) of a greater positive bias in the modified field rate forecasts, which will in itself detrimentally impact the equitable threat score by escalating the false alarms, and this clearly does need to be addressed. This undesirable characteristic of the modified fields has been fed back to forecasters for corrective action. They are also being helped by providing new field modification tools that are better tailored to address systematic model errors.
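
For reference, the equitable threat score can be computed from a 2 × 2 contingency table of forecast and observed threshold exceedances, discounting the hits expected by chance (Jolliffe and Stephenson 2003). The sketch below is a generic implementation, with synthetic data standing in for the forecast rates and the nominal validating rates.

```python
# Minimal sketch: equitable threat score for a rainfall rate threshold.
import numpy as np

def equitable_threat_score(forecast, observed, threshold):
    f = np.asarray(forecast) >= threshold
    o = np.asarray(observed) >= threshold
    hits = np.sum(f & o)
    false_alarms = np.sum(f & ~o)
    misses = np.sum(~f & o)
    n = f.size
    # Hits expected by chance, given the forecast and observed frequencies.
    hits_random = (hits + false_alarms) * (hits + misses) / n
    denom = hits + false_alarms + misses - hits_random
    return (hits - hits_random) / denom if denom > 0 else np.nan

# Synthetic stand-ins for forecast and validating rates (mm/h):
ets = equitable_threat_score(np.random.gamma(1.0, 0.5, 4000),
                             np.random.gamma(1.0, 0.5, 4000), threshold=0.5)
```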

It is appreciated that the methodology used here for verifying instantaneous precipitation rates is suboptimal. However, there were few if any alternatives, other than changing the forecast parameter to an accumulation, which cannot be done retrospectively. Consideration was given at the outset to using radar-inferred rates as validation, but this idea was abandoned because such rates typically have a quoted accuracy of only a factor of 2. Moreover, they are not direct measurements of precipitation.

Operationally, hazardous weather is taken to be that which triggers the issuance of warnings and watches, and it is naturally an important focal point of any forecaster’s activities. Question 3 in the subjective verification scheme specifically addressed this. There were 22 occasions of “serious errors from the perspective of hazardous weather” recorded for raw model forecasts, but only 13 for modified, indicating a significant positive contribution of the forecaster in this field (sections 2b and 3b provide good examples). Objective results for snow forecasts (Hewson 2004) concur with this, showing a much greater hit rate (i.e., probability of detection) for snow in modified fields with very little increase in false alarm rate.

4. Quantifying “added value”

Sections 2c and 3c discuss a number of different objective and subjective measures of relative forecast accuracy. It would be very illuminating to be able to reconcile and directly intercompare these measures. Here, a unifying measure of relative forecast accuracy, called the lead time gain, is presented; it is defined as a function of lead time (conceptually, it relates not just to raw and modified forecasts, but to any two forecasting systems).1 Provided the initial accuracy measures satisfy two simple constraints, the lead time gain, referring to the performance of one forecast relative to the other, can be very easily calculated. The constraints are, first, that accuracy (or error) measures be available for at least two separate lead times for both forecasts, and second, that these accuracy measures, when applied to a sufficiently large sample, decrease monotonically with lead time (or equivalently, that error measures increase monotonically with lead time).

Figure 7 shows hypothetical accuracy traces for two forecast systems, as represented by the thick black and thick gray lines, which satisfy the above constraints. It is intrinsically the same type of plot as Figs. 2b, 2c and 6. The lead time gain, ψ, of the gray forecast system (F) over the black forecast system (G), around time T1, is the horizontal separation, in Fig. 7, of the two accuracy traces. This could in principle be expressed as either a or b, but a more complete definition, referring specifically to ψ at time T1, should take the average:
\psi_{T_1} = \frac{a + b}{2}.

This essentially represents, around time T1, the lead-time difference beyond the black forecast at which the gray forecast would likely achieve a similar accuracy. When quoting a lead-time gain, the lead time at which it applies (T1 here) must also be stated. The same formula applies if the accuracy measure is an error measure, which increases with lead time (e.g., Fig. 2b). However, if the curves cross, considerable care is required in computation. Where forecast accuracy curves terminate, a simplified version of the above equation, based on just a or b, can also be usefully applied.

As an example, consider T + 96 (=T1) in Fig. 2c. Visually, one can quickly estimate a and b (using T0 = T + 84 and T2 = T + 108) to be about 6 and 12 h, respectively, which gives, for modified over raw, a lead time gain ψT+96 ≈ 9 h.

To enable further “unification,” one can also compute the “percentage gain,” as given by
\psi_{T_1}(\%) = 100 \times \frac{\psi_{T_1}}{T_1}.
Referring again to T + 96 in Fig. 2c as an example gives ψ(T+96) (%) ≈ 9%.
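
The construction lends itself to a short computation: given two error curves sampled at common lead times, a and b are found by horizontal (inverse) interpolation and averaged, as in the equation for ψ above. The sketch below assumes error measures that increase monotonically with lead time; the numbers are illustrative, not taken from Fig. 2.

```python
# Minimal sketch: lead time gain of forecast F over forecast G at lead t1,
# for error measures that grow with lead time (e.g. the rms curves of Fig. 2b).
import numpy as np

def lead_time_gain(leads, err_f, err_g, t1):
    """Average of a (extra lead at which F reaches G's error at t1) and
    b (how much earlier G already had F's error at t1)."""
    e_g_t1 = np.interp(t1, leads, err_g)       # G's error at T1
    e_f_t1 = np.interp(t1, leads, err_f)       # F's error at T1
    a = np.interp(e_g_t1, err_f, leads) - t1   # lead where F reaches G's error
    b = t1 - np.interp(e_f_t1, err_g, leads)   # lead where G had F's error
    return 0.5 * (a + b)

leads = np.array([84.0, 96.0, 108.0])
raw = np.array([4.0, 4.6, 5.2])                # illustrative rms errors (hPa)
mod = np.array([3.7, 4.2, 4.8])
psi = lead_time_gain(leads, mod, raw, t1=96.0)
print(psi, 100.0 * psi / 96.0)                 # gain (h) and percentage gain
```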

While measures of lead time gain for different parameters are of course informative in their own right, particularly when considering the needs of different customers, the job of the guidance center forecaster is ultimately to provide one feed of data to all these customers. In recognition of this, and by analogy with operational numerical model development,2 a way of combining several measures of lead time gain into “modification indices” (after Hewson 2004) is proposed for the guidance center. These indices express, as a function of lead time, a “weighted lead time gain.” Clearly index composition should be tailored to the guidance center aims, but as the aims for the short and medium ranges differ, a different type of modification index is recommended for each (Table 4). For both types of index, a one-third weight is given to subjective measures, and two-thirds to objective. Six components have been selected for the short-range index (taking in—left to right in Table 4—low cloud cover, light rain, heavier rain, falling snow, overall subjective score, and hazardous weather); and three for the medium range, which focus on more broadscale aspects [mean sea level pressure, relative humidity at 700 hPa (to denote significant weather), and overall subjective score]. For simplicity, each component in each index receives the same weight. Insofar as the two indices represent a weighted lead time gain, they are comparable.

Most of the components in Table 4 should be self-explanatory. This includes “subjective skill score” in the medium range, which can be plotted on a graph like Fig. 7. However, the short-range subjective verification methodology does not lend itself to analysis in this way, and an expedient way of converting responses to questions 1 and 3 in section 3c, respectively, into ψsubj (the “notional subjective verification lead time gain”) and ψhaz (the “notional hazardous weather lead time gain”) had to be introduced.

Thus, ψsubj was defined as
\psi_{\mathrm{subj}} = 12 \times \frac{N_{\mathrm{better}} - N_{\mathrm{worse}}}{N_{\mathrm{total}}},
where Nbetter is the number of times modified forecasts were rated better, Nworse is the number of times they were rated worse, and Ntotal is the total number of cases. This would give a value between −6 h and +6 h whenever half or more of the modified forecasts fell into the neutral category. The index properties seem to be sensible, with likely values in a similar range compared to other objective values in Table 4.
The definition of ψhaz is then
\psi_{\mathrm{haz}} = 12 \times \frac{N_{\mathrm{rawerror}} - N_{\mathrm{moderror}}}{N_{\mathrm{total}}},
where Nrawerror is the number of reports of serious errors in the raw model, Nmoderror is the number of reports of serious errors in modified, and Ntotal is the total number of cases with serious error reports. The value of ψhaz would equal 6 if half as many modified forecasts had serious errors as did raw model forecasts, and would be negative if the raw model had fewer such errors overall.
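
For concreteness, the two conversions can be written directly as code. The counts below are illustrative assumptions only: the better/worse split is inferred loosely from the roughly 4:1 ratio over 666 forecasts quoted in section 3c, and the serious-error case total is hypothetical (only the 22 raw and 13 modified error counts are given in the text).

```python
# Minimal sketch: the two notional subjective lead time gains defined above.
def psi_subj(n_better, n_worse, n_total):
    """Notional subjective verification lead time gain (h)."""
    return 12.0 * (n_better - n_worse) / n_total

def psi_haz(n_raw_error, n_mod_error, n_total_error):
    """Notional hazardous weather lead time gain (h)."""
    return 12.0 * (n_raw_error - n_mod_error) / n_total_error

print(psi_subj(160, 40, 666))   # illustrative split consistent with the 4:1 ratio
print(psi_haz(22, 13, 25))      # 22 and 13 from section 3c; total is assumed
```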

Evidently the factor of 12 in both equations above is somewhat arbitrary. However, it does impose an extreme upper limit of 12 h of lead time gain for each component. This is of the right order given that it would be nonsensical to achieve a lead time gain at T + 12 (the shortest lead in Table 4) that was more than 12 h. Thus, despite there being some arbitrariness in how ψsubj and ψhaz were arrived at, it is still considered that they form an important component of a short-range modification index, and that the conversion methodology has been devised in a fair and consistent fashion.

Numerical entries in Table 4 reflect, quantitatively, comments in sections 2c and 3c. In the short range, lighter rainfall is the only significant negative contributor, while forecasts of low cloud, snow, and hazardous weather all score a significant positive contribution. The net effect is to give a modification index (or weighted lead time gain) in short-range forecasts of just over 2 h. In the medium range, relative humidity fields show the greatest positive contribution. Though the index at T + 48 is small at under 1 h, it increases markedly beyond this.

One can also directly compare modification indices with the average time taken by the forecaster to consider and effect modifications (called modification time). This addresses the pertinent question of whether saving time by issuing unmodified fields might on average provide better forecasts, and in so doing further illustrates the benefits of putting together composite indices whose units are in hours of lead time gain. The two parameters are plotted together in Fig. 8 (solid black and solid gray lines). At all lead times the benefit, as represented by the modification index, exceeds the modification time, indicating that the forecaster is making a significant positive contribution, in a cost-effective manner—especially considering the short amount of time spent on modifications—and is thereby providing justification for continued operational use of field modification. Modification time varies little from run to run; most is spent considering other model and observational data and this is necessary even when no modifications are ultimately made.

Although methods of computation of short- and medium-range indices differ, there is nonetheless something of a minimum in forecaster contribution around T + 24–48 (see percentage gain). The early fall probably reflects the reducing relevance, as lead time increases, of using current trends to improve upon the short-range forecast, while the subsequent rise then reflects the increasing utility and, considering issue times, availability, of other model runs.

5. Concluding remarks

Over the past few years, there has been a shift in the work by the central guidance forecasters at the Met Office, away from the traditional text-based forecasting and toward direct editing of NWP grids. It has always been the case that forecasters evaluate NWP output critically, combining it with experience and knowledge of the behavior both of the real atmosphere and the NWP models, to formulate a forecast. To the extent that modification of NWP output is a more efficient means to this end, it needs no special justification. The tool simply allows a more efficient practice whereby perceived shortfalls of the output are addressed directly rather than in commentary, with the spinoff that verification of the human contribution is made easier. Results indicate a positive impact overall from modifications—one key statistic being that the ratio of improved to degraded short-range forecasts is about 4:1. To exceed this by much would, in the view of the authors, be extremely difficult, especially given the ever-increasing sophistication of numerical models. In turn, this suggests that the degree of modification used currently is probably about right.

Statistical postprocessing methods such as model output statistics (MOS) offer an automated route to improving the raw NWP output. However, these correct systematic errors in the models, while the manual techniques described here address shortfalls that vary with the situation from day to day.

It is fully expected that improvements in NWP will continue and that the need for forecaster intervention will decrease, probably to a minimal level at some future date. However, there has in the past been overoptimism in the projected rate at which automation would reduce the need for forecaster input. Verification remains an important part of the exercise, providing some measure of the human contribution. Verification results will be important in informing decisions about the transition to greater automation in future years, though care is needed in their interpretation, with certain objective statistics not necessarily providing an indication of absolute worth. To address this an index has been proposed, which combines objective and subjective results for different forecast parameters into one measure. It should also be reiterated that the impact of severe weather is of disproportionate importance, and this ideally should be taken into account when assessing the value of modified fields. One area recommended for future work is improving the objective precipitation verification—validating rate data are currently lacking. While the methods employed in this study to address this shortfall were arguably the best available, they still had limitations. Another avenue for future work might be a cost–benefit analysis of the downstream impact of modifications. Benefits arise solely from improved forecasts. Costs arise primarily from degraded forecasts, as the direct cost of applying modifications is minimal (at least compared to the expenditure on model development).

In the short term, the existence of a database of fields that are consistent with forecaster expectations opens the way to greater automation of forecasts and the freeing up of forecasters to spend more time on meteorology and consultancy rather than on the generation of routine graphical and text products. Thus, the situation in which forecasters at different offices correct for similar model shortcomings to generate products for different customers could be replaced by one in which modifications made at the center feed many forecast products—a so-called “change once, use many” approach.

Acknowledgments

The helpful contribution of three anonymous reviewers is gratefully acknowledged.

REFERENCES

Carroll, E. B., 1997: A technique for consistent alteration of NWP output fields. Meteor. Appl., 4, 171–178.

Cullen, M. J. P., 1993: The Unified Forecast/Climate Model. Meteor. Mag., 122, 81–94.

Cullen, M. J. P., T. Davies, M. H. Mawson, J. A. James, S. C. Coulter, and A. Malcolm, 1997: An overview of numerical methods for the next generation UK NWP and climate model. Numerical Methods in Atmospheric and Ocean Modelling: The Andre J. Robert Memorial Volume, C. A. Lin, R. Laprise, and H. Ritchie, Eds., Canadian Meteorological and Oceanographic Society, 425–444.

Ebert, E. E., and J. L. McBride, 2000: Verification of precipitation in weather systems: Determination of systematic errors. J. Hydrol., 239, 179–202.

Hewson, T. D., 2004: The value of field modifications in forecasting. Forecasting Research Tech. Rep. 437, Met Office, 18 pp. [Available online at http://www.metoffice.com/research/nwp/publications/papers/technical_reports/2004/FRTR437/FRTR437.pdf.]

Jolliffe, I. T., and D. B. Stephenson, 2003: Forecast Verification: A Practitioner’s Guide in Atmospheric Science. John Wiley and Sons, 240 pp.

Young, M. V., and E. B. Carroll, 2002: Use of medium-range ensembles at the Met Office. Part 2: Applications for medium-range forecasting. Meteor. Appl., 9, 273–288.

Fig. 1. (a) Raw and (b) modified forecasts of mean sea level pressure and 10-m wind for 1200 UTC 15 Oct 2002, along with mean sea level pressure analysis (4-mb intervals) and (c) plotted wind observations. (d) Shipping areas referred to in Table 2.

Fig. 2. Selected rms errors, for modified (mod) vs raw model forecasts, for a NW European area: (a) average monthly error differences, for mslp at T + 36 validated against surface observations; (b) average errors in mslp, as a function of lead time, validated against surface observations, for an 18-month portion of the period shown in (a); and (c) average 700-hPa errors in relative humidity, as a function of lead time, validated against model analysis, for a 3-yr period. Sample size per point is about 60 in (a), 1000 in (b), and 2000 in (c).

Fig. 3. (a) Raw and (b) modified mean sea level pressure and precipitation 12-h forecasts valid at 1800 UTC (open circles, rain; closed circles, snow; large circles, moderate to heavy; small circles, light). The X marks the location of Harpenden shown in Fig. 4. (c) The 1800 UTC radar-derived precipitation rates, shown varying from water equivalent 0.05 mm h−1 (black) to 4 mm h−1 (light gray).

Fig. 4. Photograph taken in Harpenden 31 Jan 2003. (Photograph courtesy of J. Davies.)

Fig. 5. Summary of short-range forecast errors as derived from responses to question 2 in the subjective verification scheme (section 3c). (left) Errors corrected by the forecaster, and (right) errors introduced by the forecaster. As an example, consider the highest bar, which denotes that for the lead time pairing T + 6 and 12, when the modified forecast was better than raw, the net improvement achieved by the forecaster was on 77 occasions judged to be at least partly attributable to them having reduced the areal coverage of the precipitation. The equivalent bar on the right-hand side of the diagram shows only eight occasions when the forecaster reduced the areal precipitation coverage and made the forecast worse.

Fig. 6. Equitable threat scores vs forecast lead time for two rainfall rate thresholds, (top) 0.5 mm h−1 and (bottom) 2.0 mm h−1, for all U.K. stations reporting hourly rainfall (unmod is raw model; mod is modified). Sample size per point is about 4000.

Fig. 7. The derivation of lead time gain from two accuracy curves (thick black and gray lines).

Fig. 8. Modification index (h), percentage gain, and modification time (h) for (left group) short-range forecasts and (right group) medium-range forecasts, from approximately Jul 2002 to Jan 2004. The methods of computation of the modification index (and thereby percentage gain) differ between the two groups (see text).

Table 1. Parameters modified in field modification I, verifying 12 hourly.

Table 2. Maximum 10-m mean winds in sea areas adjacent to England and Wales at 1200 UTC 15 Oct 2002. Values are in knots (kt) rounded to the nearest 5 according to plotting convention, with equivalents in m s−1 in parentheses.

Table 3. Parameters modified in field modification II, verifying 3 hourly.

Table 4. Modification indices [= weighted lead time gain (h)] and their components for the short and medium range, after Hewson (2004). ETS is equitable threat score, HR is hit rate (i.e., probability of detection), and FAR is false alarm ratio (ψ for snow is calculated as 0.5 times the sum of the lead-time gains for HR and FAR). Both ψsubj and ψhaz are subjective verification components (see text).

1 Such a concept has been used locally to compare model performance in the past, at the European Centre for Medium-Range Weather Forecasts, for example, but to the authors’ knowledge has never been formally defined.

2 Within the Met Office, potential model changes are usually evaluated by their impact on an all-embracing single “NWP index,” which represents “net skill” among a number of forecast components. These components are subjectively selected and weighted a priori, and include mean sea level pressure and 250-hPa wind.
