## Abstract

In Part I of this series on ensemble-based exigent analysis, a Lagrange multiplier minimization technique is used to estimate the exigent damage state (ExDS), the “worst case” with respect to a user-specified damage function and confidence level. Part II estimates the conditions antecedent to the ExDS using ensemble regression (ER), a linear inverse technique that employs an ensemble-estimated mapping matrix to propagate a predictor perturbation state into a predictand perturbation state. By propagating the exigent damage perturbations (ExDPs) from the heating degree days (HDD) and citrus tree case studies of Part I into their respective antecedent forecast state vectors, ER estimates the most probable antecedent perturbations expected to evolve into these ExDPs. Consistent with the physical expectations of a trough that precedes and coincides with the anomalously cold temperatures during the HDD case study, the ER-estimated antecedent 300-hPa geopotential height trough is approximately 59 and 17 m deeper than the ensemble mean at around the time of the ExDP as well as 24 h earlier, respectively. Statistics of the explained variance and from leave-one-out cross-validation runs indicate that the expected errors of these ER-estimated perturbations are smaller for the HDD case study than for the citrus tree case study.

## 1. Introduction

Gombos and Hoffman (2013, hereafter Part I) described ensemble-based exigent analysis, a technique to estimate the “exigent” or worst-case scenario (WCS), the forecast that maximizes the damage for a particular weather event and specified risk or confidence level. For each ensemble member, a damage function estimates the potential damage or cost (e.g., the heating demand) from the weather parameters (e.g., the near-surface air temperature). The potential damage is then weighted by what is at risk (e.g., the gridpoint number of inhabitants) to evaluate the actual damage. The exigent damage state (ExDS) is then calculated from the damage ensemble using a Lagrange multiplier optimization. The exigent scenario may be useful to emergency planners, insurers, and the general public because it is the unique [for multivariate Gaussian (mG) ensembles] forecast that maximizes the event-wide damage for a specified risk or confidence level.

This article combines exigent analysis with another ensemble-based technique called ensemble regression (ER; Gombos and Hansen 2008; Gombos 2009; Gombos et al. 2012) to estimate the most probable atmospheric model states expected to precede and coincide with the ExDSs, using the two case studies in Part I as examples. These *pre-exigent* conditions determined by ER are potentially useful to 1) understand the atmospheric dynamics that may lead to the ExDS (Gombos et al. 2012), 2) preemptively update the ExDS probability with incoming observations in advance of relatively slow data assimilations (Gombos 2009), and 3) identify, for purposes including supplementary forecast guidance and adaptive observing, the antecedent atmospheric features to which the ExDS is most sensitive (Gombos et al. 2012).

Ensemble regression is a multivariate linear inverse technique that uses ensemble model output to make inferences about the linear relationships between vector-valued forecast and/or analysis fields, often gridded “maps” of meteorological variables. ER uses the ensemble members of these fields as training samples to estimate a covariance-based matrix that maps an ensemble perturbation in the predictor field(s) to the most probable ensemble perturbation in the predictand field(s).

The multivariate nature of ER distinguishes it from a number of univariate approaches that use ensemble-based statistics to infer relationships between scalar or vector predictors and scalar predictands, including the ensemble synoptic analysis of Hakim and Torn (2008), the ensemble transform Kalman filter targeting approach of Bishop et al. (2001), and the analysis sensitivity diagnostic of Liu et al. (2009). Whereas univariate techniques use only correlations, ER makes use of the joint distribution between the predictors and predictands to assess sensitivities, thereby ensuring that the predictand perturbation is a statistically feasible member of the predictand distribution. For example, ensemble synoptic analysis might find the statistical relationship between each point in the antecedent geopotential height field and the subsequent surface pressure at the center of a storm, while ER might find the statistical relationship between the *entire* antecedent geopotential height field and the *entire* subsequent surface pressure field.

Previously, using the ensemble covariances of potential vorticity, potential temperature, and geopotential height (*Z*) to approximate an ER operator, Gombos and Hansen (2008) showed that potential vorticity ER and the dynamical piecewise potential vorticity inversion of Davis and Emanuel (1991) yield nearly identical *Z* perturbations. Gombos et al. (2012) used ER to study the sensitivity of Supertyphoon Sepat's (2007) 1000-hPa potential vorticity track to the position and strength of the antecedent 500-hPa *Z* field.

Here, ensemble regression is applied to calculate the pre-exigent conditions for the two case studies of Part I—one on heating demand on 8–9 January 2010 and another on freeze-damage to Florida citrus trees on 11 January 2010. Part I estimated the heating degree days (HDD) 90%-WCS (i.e., the WCS at the 90% confidence level; Fig. 8 in Part I) to yield about 1.26% more heating demand than the ensemble average forecast and the citrus tree 90%-WCS (Fig. 15 in Part I) to damage approximately 14.2 million trees, about 4.3 times more than the ensemble average. For each case study, Part II now uses ER to map the exigent perturbation from Part I backward in time to estimate the corresponding pre-exigent conditions.

This article is organized as follows. Section 2a introduces some notation and sections 2b and 2c briefly review the concepts and mathematics of exigent analysis and of ER, respectively. Section 2d introduces the ER predictand Mahalanobis distance probability *q*_{y}. Section 3 describes two correlation error metrics used to quantify the goodness of the ER. The heating demand and citrus tree case studies are presented in sections 4 and 5, respectively. Section 6 provides a summary and discussion.

## 2. Methods

### a. Notation

Let **x** be a column vector representing an arbitrary model state or series of such states. Let **p** and **y** define transformations of **x**, which may select a subset of **x** and/or calculate diagnostic quantities from **x** at the analysis time and/or at one or more forecast times. In one example that follows, **p**, the ER predictor, will be a vector of forecast gridpoint HDD, and **y**, the ER predictand, will be a vector of forecast gridpoint 300-hPa geopotential heights. In this case, the time of **p** is after the time of **y**, and the ExDS will be used to predict, in a statistical sense, the antecedent geopotential heights associated with the 90%-WCS heating demand.

Given the ensemble members , let define an ensemble-averaged quantity and let and **p**′ define perturbations with respect to for an ensemble member and for an arbitrary model state, respectively. Let be the matrix formed by concatenating the column vectors , let refer to the exigent damage perturbation (ExDP), and let be the covariance of . Let the **y** variables be defined in the same way as the **p** variables. Let refer to the ER estimate. Note that the hat accent refers to an ER-estimated quantity, not to principal components (PCs) as in Part I. Let *t*_{p} and *t*_{y} denote the forecast lead times of the **p** and **y** variables, respectively.

Where subscripts are necessary, the ensemble members are indexed by *n* = 1, … , *N*_{ens}; the elements of **p** are indexed by *i* = 1, … , *I*; and the elements of **y** are indexed by *j* = 1, … , *J*—making an *I* × *N*_{ens} matrix and a *J* × *N*_{ens} matrix. For example, in section 4, the damage ensemble is 2482 × 50. See Table 1 and Table 1 of Part I for a legend of the symbols and abbreviations used in this paper.
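As a quick consistency check on these dimensions, the 2482 rows follow from the 0.25° grid described in section 4a (31.75° to 40°N, 98° to 80°W). The sketch below simply counts inclusive grid points; the helper name `n_points` is illustrative and does not come from the paper.

```python
def n_points(start_deg: float, stop_deg: float, step_deg: float = 0.25) -> int:
    """Number of grid points from start to stop inclusive, at a fixed spacing."""
    return round((stop_deg - start_deg) / step_deg) + 1

# HDD domain from section 4a: 31.75-40N, 98-80W at 0.25 degree spacing.
n_lat = n_points(31.75, 40.0)   # 34 latitudes
n_lon = n_points(80.0, 98.0)    # 73 longitudes
I = n_lat * n_lon               # length of each damage vector p
N_ens = 50                      # ensemble size quoted in the text

print(I, N_ens)  # 2482 50
```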

### b. Exigent analysis

To define the WCS, Part I introduced the Mahalanobis distance quantile (MDQ) and its associated cumulative distribution function (CDF) value, the Mahalanobis distance probability (MDP), denoted by *q*, to describe the unusualness of a perturbation **p**′ with respect to the probability density function (PDF) defined by the ensemble mean and covariance . It is assumed here that this PDF is mG and is a reasonable approximation of the true state-dependent PDF. In ensemble phase space, any **p**′ whose squared statistical (Mahalanobis) distance (Mahalanobis 1936) to the origin (i.e., the mean of the ensemble perturbations) equals is located on the surface of an ellipsoid defined by . In exigent analysis, this ellipsoid is considered a confidence region of the ensemble PDF, an mG generalization of a confidence interval. The size of the ellipsoid is determined by the specified value (here 0.9) of the MDP. The exigent MDQ, referred to as the -MDQ or , is given by the inverse chi-squared cumulative distribution function with *ν* degrees of freedom, such that . A randomly chosen perturbation from the ensemble PDF, at this Mahalanobis distance, is on the -MDQ ellipsoid. Equivalently, the exigent MDP (i.e., ) equals the integral of the PDF over the volume of the -MDQ ellipsoid.

Part I defined the damage functional, *J*_{d} = **w**^{T}**p**, the weighted sum of damage over the *I* grid points. Here, **w** is a vector of weights of length *I* that ascribe a user-determined absolute or relative importance to each grid point. The cumulative probability of the damage functional is referred to as the damage functional probability (DFP) or by the symbol *β*. The DFP is the probability that *J*_{d} is less than some constant *C* and so equals the integral of the damage functional PDF in the half-space bounded by the hyperplane *J*_{d} = *C* in the direction −**w**.

In Part I, the specified MDP, , and the damage ensemble covariance, , together define the exigent damage perturbation as the perturbation at the -MDQ that maximizes . Equivalently, the ExDP is the perturbation on the specified confidence ellipsoid whose area-integrated weighted damage is maximized. Using the method of Lagrange multipliers, Part I showed that the ExDP at the -MDQ is

where and .
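For orientation, the constrained maximization described above has a standard closed form: maximizing **w**^{T}**p**′ subject to a fixed squared Mahalanobis distance gives a perturbation proportional to the covariance applied to **w**. The sketch below is a toy illustration under assumptions, not the implementation of Part I: it uses a small full-rank covariance `C`, a weight vector `w`, and a squared distance `d2`, all with made-up values, whereas Part I works in a truncated PC space and may use different notation.

```python
import numpy as np

def exigent_perturbation(C, w, d2):
    """Maximize w @ p subject to p @ inv(C) @ p == d2 (Lagrange multiplier result)."""
    Cw = C @ w
    return np.sqrt(d2 / (w @ Cw)) * Cw

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
C = A @ A.T + 4.0 * np.eye(4)        # toy symmetric positive definite covariance
w = np.array([1.0, 2.0, 0.5, 1.5])   # toy gridpoint weights
d2 = 9.0                             # toy squared Mahalanobis distance

p_ex = exigent_perturbation(C, w, d2)

# The result lies on the confidence ellipsoid ...
on_ellipsoid = p_ex @ np.linalg.solve(C, p_ex)

# ... and no other sampled point on the ellipsoid gives a larger weighted damage.
Lc = np.linalg.cholesky(C)
z = rng.normal(size=(4, 1000))
z *= np.sqrt(d2) / np.linalg.norm(z, axis=0)   # unit directions scaled to |z|^2 = d2
others = Lc @ z                                 # mapped onto the ellipsoid
best_other = (w @ others).max()
```

The brute-force comparison against `others` is only there to make the optimality visible; the closed form itself needs a single matrix-vector product.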

Figures 1 and 2, respectively, show the ExDPs at the 90% confidence level for the HDD and citrus tree case studies. Note that Figs. 8 and 15 in Part I are the **w**-weighted versions of the ExDPs in Figs. 1 and 2, respectively. In this article, ER is used to estimate contemporaneous and antecedent conditions associated with these ExDPs.

### c. ER

Ensemble regression (Gombos and Hansen 2008; Gombos 2009; Gombos et al. 2012) uses the ensemble members and as training samples to compute a multivariate linear regression operator given by

Right multiplying Eq. (2) by and then rearranging yields

where _{pp} = and _{yp} denotes the cross covariance of and . Equation (3) shows that is a function only of the covariance of the combined predictor and predictand ensemble (Hakim and Torn 2008). The appendix describes extended exigent analysis, an equivalent alternative to using ER to estimate the preexigent perturbation that embeds the ER into the exigent analysis.

Assuming a sufficiently large and calibrated ensemble and a linear relationship between and , given any prescribed predictor perturbation **p**′, is used to approximate

the most probable perturbation (with respect to the ensemble mean at *t*_{y}) in the span of when the perturbation (with respect to the ensemble mean at *t*_{p}) in the span of equals . The quantity is called the effective predictor perturbation and is equal to the projection of **p**′ onto the subspace spanned by (see Gombos and Hansen 2008 for details). In summary, ER uses the predictor and predictand ensemble covariances to approximate a regression operator , which is used to estimate the most probable perturbation when is the portion of the prescribed perturbation **p**′ resolved by (i.e., not in the null space of) .
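The regression and projection steps summarized above can be sketched with a pseudoinverse. This is a minimal illustration under assumptions: the names `P`, `Y`, `L`, `er_operator`, and `effective_predictor` are chosen here, the toy data are exactly linear, and no PC truncation is applied. Multiplying the predictand perturbations by the pseudoinverse of the predictor perturbations is algebraically the same as multiplying the cross covariance by the pseudoinverse of the predictor covariance, because the common normalization factors cancel.

```python
import numpy as np

def er_operator(P, Y, rcond=1e-10):
    """Least-squares operator L with Y ~= L @ P; columns are ensemble perturbations."""
    return Y @ np.linalg.pinv(P, rcond=rcond)

def effective_predictor(P, p, rcond=1e-10):
    """Projection of a prescribed perturbation p onto the column space of P."""
    return P @ (np.linalg.pinv(P, rcond=rcond) @ p)

rng = np.random.default_rng(1)
I, J, N = 6, 5, 4                      # toy sizes with I, J > N, as in realistic cases
P = rng.normal(size=(I, N))
P -= P.mean(axis=1, keepdims=True)     # perturbations about the ensemble mean
M = rng.normal(size=(J, I))            # hypothetical linear map used to build toy data
Y = M @ P                              # predictand perturbations, exactly linear in P

L = er_operator(P, Y)

p_in_span = P @ rng.normal(size=N)     # prescribed perturbation spanned by the ensemble
y_hat = L @ p_in_span                  # ER estimate of the predictand perturbation

p_any = rng.normal(size=I)             # arbitrary prescribed perturbation
p_eff = effective_predictor(P, p_any)  # only this part is "seen" by the operator
```

Because `P` is rank deficient, the explicit `rcond` cutoff keeps the pseudoinverse from amplifying the numerically zero singular value left by removing the ensemble mean.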

In realistic applications, *I* and *J* are typically much greater than *N*_{ens}, making rank deficient and and not strictly invertible. Moreover, strong spatial dependences due to geographic proximity are likely to render ill-conditioned. To alleviate problems related to multi-collinearity, inflated regression parameter variances, and computational tractability, the ER machinery is regularized by projecting and onto their leading *n*_{p} and *n*_{y} PCs, respectively. See Gombos et al. (2012) and Part I for details. Also see the conclusion of Part I for a discussion of how the characteristics of the ensemble may affect the exigent analysis. There it is mentioned that neither the PC filtering approach used here, nor an alternative approach to reduce the impact of small sample size by applying the Gaspari and Cohn (1999) covariance localization, resulted in major changes to the ExDPs.

For the particular ER application where and , respectively, equal full model state vector perturbations and *t*_{p} precedes *t*_{f}, ER is effectively a linear least squares approximation [subject to sampling and other errors discussed in section 3 and Gombos and Hansen (2008)] to the full nonlinear model used to integrate the ensemble. For these applications, is analogous to the transition matrices used in linear inverse modeling (LIM; Penland 1989) and climate applications of the fluctuation dissipation theorem (FDT; Leith 1975) that have been used to predict low-frequency Northern Hemisphere 700-hPa geopotential height anomalies (Penland and Ghil 1993), seasonal and interannual tropical Atlantic sea surface temperatures (Penland and Matrosova 1998), and the excitation of Atlantic storm tracks by tropical heating (Gritsun and Branstator 2007). The ER operator, however, is approximated from state-dependent synoptic-scale ensemble statistics, whereas LIM and FDT operators are typically estimated from stationary decadal-scale climate statistics.

For more general applications, such as the ones in this article, for which one (transformed) subset of the state vector at *t*_{p} (e.g., the HDD forecast ensemble) is used to estimate the most probable perturbation (e.g., the ER-estimated preexigent perturbation) of a different subset of the state vector at *t*_{y} (e.g., the 300-hPa *Z* forecast ensemble), ER is simply ensemble-based multivariate regression with vector-valued predictors and predictands. For these general applications, ER uses least squares to relate the fields of interest and is not intended to approximate the atmospheric model dynamics.
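The PC-truncation regularization mentioned earlier in this subsection can be sketched as follows. This is one plausible arrangement under stated assumptions, not necessarily the procedure of Gombos et al. (2012): the function names, the SVD-based PCs, and the toy data are all illustrative. The regression is carried out between the retained PC scores and then mapped back to the full fields.

```python
import numpy as np

def leading_pcs(X, n):
    """Return (E, A): the leading n EOFs (columns of E) and PC scores A = E.T @ X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    E = U[:, :n]
    return E, E.T @ X

def truncated_er(P, Y, n_p, n_y):
    """ER operator built from the leading n_p predictor and n_y predictand PCs."""
    Ep, Ap = leading_pcs(P, n_p)
    Ey, Ay = leading_pcs(Y, n_y)
    B = Ay @ np.linalg.pinv(Ap)        # small n_y x n_p regression in PC space
    return Ey @ B @ Ep.T               # map back to the full predictor/predictand fields

rng = np.random.default_rng(2)
I, J, N = 40, 30, 10                   # toy sizes with I, J > N
P = rng.normal(size=(I, N))
P -= P.mean(axis=1, keepdims=True)
Y = rng.normal(size=(J, I)) @ P + 0.1 * rng.normal(size=(J, N))
Y -= Y.mean(axis=1, keepdims=True)

L_trunc = truncated_er(P, Y, n_p=3, n_y=4)

# Anything outside the retained predictor EOFs lies in the operator's null space.
Ep, _ = leading_pcs(P, 3)
p = rng.normal(size=I)
p_unresolved = p - Ep @ (Ep.T @ p)
```

Working with a handful of PC scores keeps the regression well conditioned, at the cost of discarding any signal carried by the truncated PCs.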

For the ER applications in this article, in order to estimate atmospheric conditions antecedent to the ExDP, the prescribed perturbation is , the ExDP estimated from Eq. (1). Note that the ExDP is, by construction, always spanned by , so that in this case . When the predictand ensemble is a subset of or the entire atmospheric state vector antecedent to *t*_{p}, the ER-estimated predictand perturbation approximates the most probable (perturbation) conditions at *t*_{y} spanned by . The sum of and approximates the projection of the true preexigent perturbation onto the space spanned by .

### d. Ensemble regression Mahalanobis distance probability, *q*_{y}

As summarized in section 2b, *q* is the value of the CDF of the Mahalanobis distance of an ensemble perturbation with respect to a particular ensemble distribution. Here, and in Part I, this ensemble distribution is that of the damage state vector **p**. A similar quantity is now defined for the ER predictand distribution. Let *q*_{y} denote the value of the CDF of the Mahalanobis distance of an ER predictand perturbation with respect to this distribution. In analogy to Eq. (4) of Part I, *q*_{y} is defined by
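A sketch of the form such a definition would take, assuming (as section 2b describes) that the MDP is the chi-squared CDF evaluated at the squared Mahalanobis distance. The symbols $\hat{\mathbf{y}}'$ for the ER-estimated predictand perturbation, $\mathbf{C}_{yy}$ for the predictand ensemble covariance, and $F_{\chi^2_{\nu_y}}$ for the chi-squared CDF are notation assumed here, and the number of degrees of freedom $\nu_y$ (for instance, the number of retained predictand PCs) is likewise an assumption:

```latex
q_y = F_{\chi^2_{\nu_y}}\!\left( \hat{\mathbf{y}}'^{\mathrm{T}} \, \mathbf{C}_{yy}^{-1} \, \hat{\mathbf{y}}' \right)
```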

## 3. ER error metrics

Throughout this article it is assumed that the ensembles follow mG distributions and that the predictors explain a significant fraction of the total variance of the respective predictands. The validity of the mG assumption is assessed using the Gaussian *Q*–*Q* plot (e.g., Wilks 2006) in Part I. This section outlines two metrics that are used to assess the validity of the covariability assumptions: the composite correlation coefficient (Glahn 1968) and leave-one-out cross-validation (LOOCV; e.g., Wahba and Wendelberger 1980; Michaelson 1987; Wilks 2006; Gombos et al. 2012). The composite correlation coefficient and LOOCV error statistics for the HDD and citrus tree case studies are presented in sections 4b and 5b, respectively.

### a. Composite correlation coefficient

The composite correlation coefficient (Glahn 1968) is used to assess the degree to which the *n*_{p} predictor PCs are linearly related to the *n*_{y} predictand PCs. Here, the composite correlation coefficient can be considered a multivariate extension to the ordinary (Pearson) correlation (e.g., Wilks 2006) for the assessment of the joint statistical linear association between vector-valued predictors and vector-valued predictands. Additionally, its square gives the fraction of the total variance of the *n*_{y} predictand PCs accounted for by the *n*_{p} predictor PCs:

where tr denotes the trace of the indicated argument, **Σ** is a diagonal matrix of the variances of the *n*_{y} predictand PCs, and is the diagonal matrix of multiple correlation coefficients [i.e., the coefficients of determination in multiple regression; e.g., Glahn (1968); DelSole and Tippett (2009)]. Using *j* = 1, 2, … , *n*_{y} to index the predictand PCs, the *j*th entry of the diagonal of is equal to the multiple correlation coefficient between the *j*th predictand PC and the *n*_{p} predictor PCs. Note that, because PCs are defined to be orthogonal, the square of the *j*th entry of the diagonal of is nothing more than the sum of the squares of the ordinary correlation coefficients between the *j*th predictand PC and each of the *n*_{p} predictor PCs (or, equivalently, the reduction of the variance of the *j*th predictand PC by the *n*_{p} predictor PCs) (e.g., Abdi 2007). Note that quantifies the statistical linear relationship between the predictor PCs and predictand PCs, not between the full predictor and predictand fields; however, large values of do suggest a strong statistical association between the full fields as long as the PCs explain a substantial fraction of the variance of their respective full fields.
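The verbal definition above (a variance-weighted average of the squared multiple correlations of the predictand PCs) can be sketched numerically. The function name, the array names `Ap` and `Ay`, and the toy data are assumptions made for illustration. Each predictand PC is regressed on all predictor PCs by least squares, which reproduces the sum of squared ordinary correlations when the predictor PCs are mutually uncorrelated.

```python
import numpy as np

def composite_correlation(Ap, Ay):
    """Composite correlation between predictor PC scores Ap (n_p x N)
    and predictand PC scores Ay (n_y x N)."""
    Ap = Ap - Ap.mean(axis=1, keepdims=True)
    Ay = Ay - Ay.mean(axis=1, keepdims=True)
    fitted = Ay @ np.linalg.pinv(Ap) @ Ap       # least-squares fit of each predictand PC
    var_total = (Ay ** 2).sum(axis=1)           # proportional to the variance of each PC
    var_explained = (fitted ** 2).sum(axis=1)   # R_j^2 times the variance of PC j
    return float(np.sqrt(var_explained.sum() / var_total.sum()))

rng = np.random.default_rng(4)
Ap = rng.normal(size=(3, 50))                   # toy predictor PC scores
Ay_linear = rng.normal(size=(4, 3)) @ Ap        # predictand PCs fully determined by Ap
Ay_noise = rng.normal(size=(4, 50))             # predictand PCs unrelated to Ap

r_linear = composite_correlation(Ap, Ay_linear)
r_noise = composite_correlation(Ap, Ay_noise)
explained_fraction = r_noise ** 2               # fraction of predictand PC variance explained
```

In this sketch a perfectly linear relationship yields a coefficient of one, while unrelated PCs yield a small but nonzero value because of sampling noise in a finite ensemble.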

### b. LOOCV

The LOOCV technique discussed in Gombos (2009) and Gombos et al. (2012) is applied to estimate the expected ER accuracy. The technique estimates the error of ER-estimated perturbations by 1) removing a single predictor ensemble perturbation (and its twin ensemble perturbation, which is equal in magnitude and opposite in direction about the ensemble mean at the analysis time) and the corresponding predictand perturbation (and its twin ensemble perturbation) from the predictor and predictand ensembles, respectively, when calculating the ER operator; 2) applying this operator to the left-out predictor perturbation; and then 3) comparing the ER-estimated predictand perturbation to the actual left-out predictand perturbation. Note that the LOOCV is performed using the leading *n*_{p} and *n*_{y} PCs.

After repeating the LOOCV for each ensemble member, the median of the *N*_{ens} anomaly correlation coefficients (ACCs; e.g., Wilks 2006) between each pair of ER-estimated and actual left-out predictand perturbations is a measure of the ER error. The median is used rather than the mean to reduce the influence of outliers on the expected value. For the particular ER application where the predictor and predictand perturbations equal full state vector perturbations and *t*_{p} precedes *t*_{y}, the median ACC is a measure of how closely the ER operator approximates the full nonlinear model used to integrate the ensemble. For more general applications, LOOCV only approximates the expected ACCs for the specific ER application of interest. For the backcasting LOOCV applications in this article, the median ACC approximates the expected value of the ACC between the ER-estimated preexigent perturbation and the true perturbation antecedent to the ExDP. The variance of the ACCs is a measure of the spread of the expected ACC value.
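The three LOOCV steps can be sketched as a simple loop. This is a simplified illustration under assumptions: it works directly on toy PC-space data, takes the ACC to be the centered pattern correlation, and omits the removal of the twin perturbation described above. The function and variable names are illustrative.

```python
import numpy as np

def acc(a, b):
    """Anomaly correlation coefficient, taken here as the centered pattern correlation."""
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))

def loocv_accs(P, Y):
    """Leave each member out, refit the ER operator, and score the left-out prediction."""
    N = P.shape[1]
    scores = []
    for n in range(N):
        keep = np.arange(N) != n
        L = Y[:, keep] @ np.linalg.pinv(P[:, keep])   # operator fitted without member n
        scores.append(acc(L @ P[:, n], Y[:, n]))      # compare prediction with the truth
    return np.array(scores)

rng = np.random.default_rng(3)
n_p, n_y, N = 3, 8, 20                                # toy PC counts and ensemble size
P = rng.normal(size=(n_p, N))                         # toy predictor PC scores
Y = rng.normal(size=(n_y, n_p)) @ P + 0.05 * rng.normal(size=(n_y, N))

accs = loocv_accs(P, Y)
median_acc = float(np.median(accs))                   # summary statistic used in the text
acc_spread = float(np.var(accs))                      # spread of the cross-validated ACCs
```

The toy predictands are nearly linear in the predictors, so the median ACC here is close to one; real ensembles with weaker covariability would give lower values.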

## 4. HDD case study

This section uses ER to estimate the atmospheric conditions that precede and coincide with the HDD ExDP (Fig. 1). Section 4a describes the HDD data and defines the predictor ensemble, predictor perturbation, and predictand ensembles used for the ER. Section 4b presents the composite correlation coefficient and LOOCV error statistics. Section 4c presents and analyzes the ER-estimated antecedent perturbations, depicted in Figs. 3 and 4, associated with the HDD ExDP.

### a. HDD data and synoptic overview

The overall synoptic scenario during the time period of this HDD case study was marked by a strong upper-level trough that swept eastward through the central and eastern United States. The magenta lines in Fig. 3 contour the ensemble mean forecast 300-hPa *Z* at forecast lead times of *t _{f}* = 0, 12, 24, and 36 h. Northerly cold-air advection associated with the strong trough brought anomalously cold temperatures toward the domain of interest. This set the stage for a potentially extreme bout of cold temperatures, for which the HDD ExDP (measured in HDD units) is depicted in Fig. 1. The remainder of this section uses ER to estimate the evolution of the position and strength of this trough throughout the exigent scenario.

The predictor ensemble used here is the HDD damage ensemble from Part I. That is, the predictor ensemble is the forecast daily average HDD for 8–9 January 2010 computed using the *N*_{ens} = 50 ECMWF ensemble forecast *T* data initialized at 0000 UTC 8 January 2010, retrieved from the THORPEX Interactive Grand Global Ensemble (TIGGE) dataset (Bougeault et al. 2010), and linearly interpolated to 0.25° resolution at points between 31.75° and 40°N and 98° and 80°W. The forecast HDD is computed by applying the HDD damage function [Eq. (9) in Part I] to the average forecast *T* from the four forecast times *t*_{p} centered on 2100 local (central standard) time on 8 January 2010 (i.e., the 18-, 24-, 30-, and 36-h forecasts). The notation in black on the timeline in Fig. 5 depicts the sequence of events for the HDD case study. The HDD damage ensemble is projected onto its leading *n*_{p} = 7 PCs (see appendix C and Fig. 4 in Part I).

Because the goal here is to estimate the antecedent perturbations associated with the HDD ExDP, the predictor perturbation in this case is the HDD ExDP depicted in Fig. 1. Note that Fig. 8 in Part I is the corresponding 90%-WCS damage map, the population-density-weighted version of the ExDP in Fig. 1.

The predictand ensembles use ECMWF ensemble forecasts of 17 fields: 2-m temperature (*T*), and the zonal wind (*u*), meridional wind (*υ*), air temperature (*T*_{a}), and geopotential height (*Z*) at the 1000-, 850-, 500-, and 300-hPa pressure levels. The ensemble forecasts were initialized at 0000 UTC 8 January 2010 and also linearly interpolated to the same grid as the HDD. For each forecast lead time *t*_{y} = 0, 12, 24, and 36 h, a separate ER is performed to estimate the most probable perturbation. For each lead time, the *N*_{ens} = 50 ensemble members of the 17 fields are concatenated into a single predictand ensemble matrix that is projected onto its leading *n*_{y} PCs. The values for *n*_{y} for lead times *t*_{y} = 0, 12, 24, and 36 h are 7, 9, 11, and 11, respectively, based on the respective integer-rounded median of the four metrics described in appendix C of Part I (Maaten 2010). See the scree graphs (e.g., Wilks 2006) depicted in Fig. 6. Also note that temperatures are cool enough throughout the domain for all ensemble members so that *T* and HDD are linearly related.

### b. Assessing the statistical linear relationship of the HDD predictor and predictand ensembles

The quality of the ER-estimated preexigent perturbations for the HDD case is assessed using the two metrics outlined in section 3. The solid line in Fig. 7 depicts the variation of the composite correlation coefficient with lead time. At *t*_{y} = 0 h, the lead time most distant from the lead times that define the HDD damage ensemble (*t*_{p} = 18, 24, 30, and 36 h), the coefficient equals 0.57; it increases to 0.82 at *t*_{y} = 12 h and then remains nearly constant through *t*_{y} = 36 h. Note that the maximum of the coefficient occurs between *t*_{y} = 12 and 36 h because these lead times are the most contemporaneous with those that define the HDD damage ensemble. A coefficient value of 0.82 at *t*_{y} = 12 h implies that 67% (0.82² ≈ 0.67) of the variance of the 300-hPa *Z* predictand ensemble PCs is explained by the HDD damage ensemble PCs and suggests that a linear model may be appropriate to model the relationship between perturbations of these two fields.

Figure 7 also displays a boxplot (e.g., Wilks 2006) at each lead time illustrating the variability of the ACC at that lead time. The ensemble median ACC between the ER-predictand perturbation and left-out ensemble member equals 0.40 at *t*_{y} = 0 h, quickly increases to 0.78 at *t*_{y} = 18 h, and then remains nearly constant through *t*_{y} = 36 h. These ensemble median ACC values approximate the expected ACC values between the ER-estimated antecedent state vector perturbation and the actual antecedent perturbation, implying considerable confidence in the pattern of the ER predictions presented below. Note that the choices for *n*_{p} and *n*_{y} are based on the four metrics described in appendix C of Part I and are not optimized to maximize the ACC; other choices for *n*_{p} and *n*_{y} yield even higher median ACCs (not shown).

### c. ER-estimated perturbations associated with the HDD ExDP

Using the ExDP as the predictor perturbation, ER is employed to investigate the atmospheric conditions that precede and coincide with the HDD ExDP illustrated in Fig. 1. The filled contours in Fig. 3 depict the 300-hPa *Z*-only portion of the ER predictand perturbations from the four separate ERs with respective lead times *t*_{y} = 0, 12, 24, and 36 h. Magenta contours depict the ensemble mean 300-hPa *Z* and black contours illustrate the preexigent 300-hPa trough state, the sum of the ensemble mean and the predictand perturbation. The 300-hPa-only portion is displayed here because it illustrates upper-tropospheric dynamics relevant to the anomalously cold temperatures at *t*_{p}.

In Fig. 3, the ensemble mean 300-hPa *Z* represents a 300-hPa trough that deepens as it progresses eastward between *t*_{y} = 0 h (Fig. 3a) and *t*_{y} = 36 h (Fig. 3d). Compared to the mean trough (magenta), the preexigent 300-hPa trough state (black) is significantly stronger, with a −17-m perturbation at *t*_{y} = 0 h that rapidly deepens to approximately −37 and −59 m at *t*_{y} = 12 and 24 h, respectively. The maximum of the *t*_{y} = 36 h *Z* perturbation is approximately 24 m, making the trough weaker than the ensemble mean at that time.

The preexigent 300-hPa *Z* perturbation (Fig. 3) is consistent with the physical expectations of a trough that precedes and coincides with anomalously cold temperatures in the domain of interest between lead times *t*_{y} = 18 and 36 h. A deeper and stronger trough would strengthen northerly cold-air advection on its western side, ushering anomalously cold arctic air southward into the domain throughout this time window. At around *t*_{y} = 36 h, cold air continues to advect southward on the western side of the exigent trough as an upstream ridge progresses eastward toward the center of the domain. See Gombos et al. (2012) for a more detailed example of ER perturbation patterns and their physical interpretations for the case of a tropical cyclone.

Meanwhile, surface temperatures decrease with lead time and pockets of cold-air anomalies evolve in a manner consistent with the HDD ExDP in Fig. 1. Figure 4 depicts the HDD ExDP from Fig. 1 (magenta lines) overlaid on the preexigent 2-m-temperature-only portion of the ER predictand perturbations from the four separate ERs with respective lead times *t*_{y} = 0, 12, 24, and 36 h (filled contours). In the first two panels, the ER predictions are of small amplitude and, for plotting purposes, have been multiplied by 5 and 2 at *t*_{y} = 0 and 12 h, respectively. At *t*_{y} = 0 h, the preexigent *T* perturbation has a cold anomaly in the northern central portion of the domain. By *t*_{y} = 12 h, this pocket begins to spread zonally, and a separate cold pocket forms to the southeast. At *t*_{y} = 24 h and particularly at *t*_{y} = 36 h, these cold pockets become collocated with the HDD ExDP (magenta lines). The ACCs between these ER predictions and the HDD ExDP are −0.58, −0.42, −0.82, and −0.84 for *t*_{y} = 0, 12, 24, and 36 h, respectively, and the ACC between the sum of the 24- and 36-h ER predictions and the ExDP is −0.94. Considering that the weights, **w**, are population estimates, Fig. 4 depicts the evolution of a cold outbreak that targets the pattern of the HDD ExDP and is consistent with the ensemble statistics from lead time to lead time.

The title of each panel in Figs. 3 and 4 states the value of *q*_{y} [Eq. (5)] for the respective predictand perturbations. Note that in each case. Since the Mahalanobis distance is invariant for linear affine transformations (e.g., DelSole and Tippett 2008), is expected to equal *q*_{y} for “perfect” ER applications for which = 1 and is well fit for out-of-sample perturbations, invertible, and does not contain a null space. However, perfect ER operators are not to be expected because of null spaces associated with rank deficiency and PC truncation and because of the nonlinearity and complexity of atmospheric dynamics that results in < 1. Therefore, , even for well-chosen, quasi-linearly evolving mG ensembles.

## 5. Citrus tree case study

In contrast to the HDD case study, the statistical relationship between the citrus tree damage ensemble and the relevant upstream antecedent state vector predictand ensemble is weak for reasons explained below in section 5b.

### a. Citrus tree data

The predictor ensemble for ER applications in this section is the citrus tree damage ensemble used in Part I. That is, the citrus freeze damage function [Eq. (10) from Part I] is applied to the *N*_{ens} = 50 ECMWF ensemble forecast *T* data initialized at 1200 UTC 10 January 2010 and retrieved from the TIGGE dataset (Bougeault et al. 2010) to estimate the citrus freeze damage ensemble. This damage ensemble is used to approximate the forecast covariance of the fraction of trees damaged at *t*_{p} = 24 h (near dawn local time). Note that a single forecast time *t*_{p} is used to define the citrus damage ensemble, whereas four forecast times are used to define the HDD damage ensemble. The notation in gray on the timeline in Fig. 5 depicts the sequence of events for the citrus tree case study. These ECMWF *T* ensemble data are linearly interpolated to 0.25° resolution at points between 25.75° and 29.5°N and 82.75° and 80°W. The citrus tree damage ensemble is projected onto its leading *n*_{p} = *ν* = 5 PCs (see appendix C and Fig. 4 in Part I).

The predictor perturbation for this section equals the citrus tree ExDP depicted in Fig. 2, the forecast perturbation of the fraction of citrus trees expected to be damaged during the 90%-WCS freeze event at *t _{p}*. Again, note that Fig. 15 in Part I is the corresponding 90%-WCS anomaly damage map, the citrus-tree-density-weighted version of the ExDP in Fig. 2.

The predictand ensembles used to assess the predictor–predictand relationship in the following subsection are the same as those described in section 4a, except that the ensemble data are initialized at 1200 UTC 10 January 2010 and interpolated to 0.25° resolution between 22° and 40°N and 103° and 80°W, which is a somewhat larger area than the area used for the HDD case. Predictand ensembles are projected onto their leading *n*_{y} = 7, 10, 11, 10, and 11 PCs for *t*_{y} = 0, 6, 12, 18, and 24 h, respectively, based on the four metrics (Maaten 2010) described in appendix C of Part I.

### b. Citrus predictor–predictand relationship and expected errors

The values for the citrus tree case study (solid line in Fig. 8) are markedly worse than are those for the HDD case study; = 0.38 at *t _{y}* = 0 h and remains nearly constant through *t _{y}* = 24 h. Correspondingly, the median cross-validated ACCs (dashed line in Fig. 8) are also poor for the citrus tree case study; the ensemble median ACC between the ER-predictand perturbation and left-out ensemble member equals 0.12 at *t _{y}* = 0 h and remains nearly constant through *t _{y}* = 24 h.

The poor expected quality of the estimate of the preexigent conditions implied by the low and LOOCV values is in part attributable to the citrus damage function [Eq. (10) from Part I] being nonlinear. The citrus damage fraction is defined to equal zero over a large portion of the domain for most ensemble members (8969 times out of *I* × *N*_{ens} = 192 × 50 = 9600) where the temperature is above −4.2°C, and to equal one over a small part of the domain for a few ensemble members (14 times out of 9600) where the temperature is below −6.7°C. However, as discussed below, the citrus damage ensemble is nevertheless very strongly linearly related to the contemporaneous *T* ensemble.
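As a rough illustration of this nonlinearity, the sketch below implements a damage fraction that is zero above −4.2°C, one below −6.7°C, and linear in between. Eq. (10) of Part I is not reproduced in this article, so the exact ramp, the function names, and the sample temperatures are assumptions made for the example.

```python
import numpy as np

T_NO_DAMAGE = -4.2    # deg C; damage fraction is zero above this threshold
T_FULL_DAMAGE = -6.7  # deg C; damage fraction is one below this threshold

def citrus_damage_fraction(temperature_c):
    """Piecewise-linear damage fraction (assumed stand-in for Eq. (10) of Part I)."""
    t = np.asarray(temperature_c, dtype=float)
    fraction = (T_NO_DAMAGE - t) / (T_NO_DAMAGE - T_FULL_DAMAGE)
    # Clipping produces the flat zero and one regimes that make the map nonlinear.
    return np.clip(fraction, 0.0, 1.0)

def count_regimes(temperature_c):
    """Count values in the zero-damage, linear, and full-damage regimes."""
    t = np.asarray(temperature_c, dtype=float)
    n_zero = int(np.sum(t > T_NO_DAMAGE))
    n_one = int(np.sum(t < T_FULL_DAMAGE))
    return n_zero, t.size - n_zero - n_one, n_one

sample_t = [0.0, -4.2, -5.45, -6.7, -10.0]   # hypothetical temperatures (deg C)
print(citrus_damage_fraction(sample_t))
print(count_regimes(sample_t))
```

With the counts quoted above, 9600 − 8969 − 14 = 617 grid-point and member pairs fall in the linear regime, so the damage ensemble responds linearly to temperature in only a small part of the sample.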

Another factor contributing to the low and ACC values is the large size of the predictand domain and the small size of the predictor domain; the area covered by the citrus tree damage ensemble is only 2.6% of the area covered by the citrus tree predictand ensemble. Even for cases when the nonlinearity is negligible and the predictors and predictands are indeed physically related, the covariability of the atmospheric state over such a large area is unlikely to be strongly constrained by that of a relatively small region, especially one that is distant in both space and time.
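The leave-one-out cross-validation behind the median ACC values can be sketched as below. This is a hedged illustration only: the mapping is computed with a pseudoinverse, the ACC is taken as an uncentered pattern correlation between perturbation vectors, and the data are synthetic; the function names and all of these choices are assumptions rather than the procedure used to produce Fig. 8.

```python
import numpy as np

def er_mapping(predictor_pert, predictand_pert):
    """Least-squares mapping L with predictand_pert ~= L @ predictor_pert."""
    return predictand_pert @ np.linalg.pinv(predictor_pert)

def pattern_correlation(a, b):
    """Uncentered correlation between two perturbation vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def loocv_median_acc(predictors, predictands):
    """Median ACC over members, each predicted by ER trained without it.

    predictors  : (n_p, n_members) array, e.g. damage PCs.
    predictands : (n_y, n_members) array, e.g. state-vector PCs.
    """
    n_members = predictors.shape[1]
    accs = []
    for k in range(n_members):
        keep = np.arange(n_members) != k
        p_mean = predictors[:, keep].mean(axis=1, keepdims=True)
        y_mean = predictands[:, keep].mean(axis=1, keepdims=True)
        mapping = er_mapping(predictors[:, keep] - p_mean,
                             predictands[:, keep] - y_mean)
        # Predict the left-out member's predictand perturbation.
        y_hat = mapping @ (predictors[:, [k]] - p_mean)
        y_true = predictands[:, [k]] - y_mean
        accs.append(pattern_correlation(y_hat.ravel(), y_true.ravel()))
    return float(np.median(accs))

# Synthetic demonstration: a strongly coupled pair and an unrelated pair.
rng = np.random.default_rng(0)
P = rng.standard_normal((5, 50))
Y_strong = rng.standard_normal((7, 5)) @ P + 0.01 * rng.standard_normal((7, 50))
Y_weak = rng.standard_normal((7, 50))
acc_strong = loocv_median_acc(P, Y_strong)
acc_weak = loocv_median_acc(P, Y_weak)
print(acc_strong, acc_weak)
```

In this toy setting the strongly coupled pair yields a median ACC near one, whereas the unrelated pair yields a value near zero, which is the qualitative contrast drawn between the HDD and citrus tree cases.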

### c. ER-estimated perturbations associated with the citrus tree ExDP

Although the entire upstream state vector for this citrus tree case study cannot be accurately estimated, ER nevertheless can accurately estimate certain portions of this state vector. This section displays the ER-estimated contemporaneous *T* perturbation associated with the citrus tree ExDP and, given the low and ACC values, omits the ER estimate of the antecedent state vector over the much larger upstream domain.

The leading *n _{p}* = 5 PCs of the citrus damage ensemble explain a high fraction (90%) of the variance of the *t _{y}* = 24 h forecast *T* over Florida (i.e., for the same domain and time as the damage ensemble) with = 0.95. This is unsurprising given that the damage ensemble is derived from this *T* ensemble and that the *T* values over a portion of the domain (approximately 6% of the grid points) are in the linear interval between −6.7° and −4.2°C.

The filled contours of Fig. 9 display the most probable contemporaneous *t _{p}* = *t _{y}* = 24 h *T* perturbation associated with the citrus tree ExDP. As is to be expected, given = 0.95, *q _{y}* = 0.88 approximately equals and applying this temperature perturbation (Fig. 9) to the citrus damage equation [Eq. (10) in Part I] does in fact very closely approximate the ExDP in Fig. 2 (not shown).

Perhaps counterintuitively, the minimum of this contemporaneous *T* perturbation (Fig. 9) is located well to the northeast of the maximum citrus tree density (Fig. 12 in Part I). As portrayed by the ensemble variance map of the forecast *T* ensemble (magenta line contours in Fig. 9), this location mismatch is attributable to the significantly greater variability at the location of the temperature minimum than at the location of high citrus tree density. At the location of the variance maximum, temperatures are free to vary significantly without being extreme relative to the ensemble PDF, allowing for anomalously cold, yet plausible, temperatures at that location. On the other hand, the ensemble temperature variations are significantly smaller at the location of the high citrus tree density (Fig. 12 in Part I), so temperatures any lower at that location would correspond to the MDP exceeding 0.9. The positioning of the temperature perturbation in Fig. 9 is a result of having sufficiently cold temperatures at the locations of the citrus trees to maximize damage and strong positive temperature correlations at neighboring locations with high ensemble variability.
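A univariate simplification may help show why the largest temperature perturbation can sit away from the trees. Assume, for illustration only, a single scalar damage predictor *d* with perturbation δ*d* and a temperature *T _{j}* at grid point *j*; these symbols and the reduction to one predictor are assumptions and are not the multivariate ER used above. Linear regression then gives

```latex
\delta T_j
  = \frac{\operatorname{cov}(T_j, d)}{\operatorname{var}(d)}\,\delta d
  = \rho_j \,\frac{\sigma_{T_j}}{\sigma_d}\,\delta d ,
```

where ρ_{j} is the ensemble correlation between *T _{j}* and *d*, and σ denotes an ensemble standard deviation. For two grid points with the same correlation ρ_{j}, the regressed perturbation is therefore larger in magnitude at the point with the larger σ_{T_j}, consistent with the temperature minimum falling in the region of high ensemble variance.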

## 6. Summary and discussion

This is the second part of a two-part series on ensemble-based exigent analysis. In Part I, Gombos and Hoffman (2013) used a Lagrange multiplier technique to derive the equation for the exigent damage perturbation (ExDP) and then estimated the ExDP for two case studies from a cold outbreak in January 2010: one of a HDD 90%-WCS and another of a citrus tree 90%-WCS. This article combines exigent analysis with ensemble regression (ER) to predict the most probable perturbations expected to precede and/or coincide with the ExDPs from Part I. For the HDD case study, the ER results are consistent with physical expectations; the trough that precedes and coincides with the anomalously cold temperatures during the HDD case study associated with the ExDP (i.e., the ER-estimated preexigent 300-hPa geopotential height trough) is approximately 59 and 17 m deeper than the ensemble mean at around the time of the ExDP and 24 h earlier, respectively.

For the HDD case study, leave-one-out cross-validation (LOOCV) statistics suggest that the anomaly correlation coefficient (ACC) between the ER-estimated preexigent perturbation and the true (assuming perfect forecasts) preexigent perturbation varies between 0.40 and 0.78 depending on the lead time. For the citrus tree case study, and LOOCV statistics imply no useful ability to specify the preexigent conditions. This is attributable to the nonlinearity of the damage function and the relatively small size of the predictor ensemble domain causing a weak linear relationship between the damage ensemble and the antecedent state vector. Skillful estimates of the preexigent conditions can generally be expected when using large, mG, calibrated ensembles with strong linear covariance between the damage ensemble and the antecedent predictand ensemble.

Estimating preexigent conditions using ER has many potential applications. For example, the methods described in this article can be used to 1) gain insights into the atmospheric dynamics associated with a particular -WCS using the techniques outlined in Gombos et al. (2012), 2) preemptively forecast (Etherton 2007; Gombos 2009) worst-case scenarios in advance of slow data assimilation systems using supplementary forecast guidance, 3) target locations for adaptive observing in order to reduce the variance of high-impact forecasts (e.g., Ancell and Hakim 2007), and 4) perform damage-minimization weather modification (e.g., Hoffman 2002), where one could potentially use exigent analysis to find the ExDP that *minimizes* a damage functional, at an MDP level that reduces the damage to a reasonable degree dictated by resources and other requirements, and apply ER to find the associated preexigent perturbation that will minimize the damage. These and other possible ER applications have potential uses for forecasting and planning in future extreme weather situations that pose risks to life and property.

## Acknowledgments

The authors gratefully acknowledge funding provided by National Science Foundation Grant 0838196.

### APPENDIX

#### Embedding ER into Exigent Analysis

An equivalent alternative to the two-step algorithm of approximating the preexigent perturbation (by first estimating the ExDP and then using it as the predictor perturbation for ER) is extended exigent analysis, a one-step procedure that embeds ER directly into exigent analysis.

In extended exigent analysis, the predictor and predictand ensembles are concatenated so that is replaced by the (*I* + *J*) × *N*_{ens} matrix . Also, **w** is replaced by the (*I* + *J*) × 1 vector , where **w**_{p} equals the original weighting vector **w** and **w**_{y} is a *J* × 1 vector of zeros. Then, if *q* and *ν* are chosen as in the original exigent analysis, the values of *Q _{p}* and *Q _{w}* are unchanged and the solution of the extended problem, obtained from Eq. (1), may be written as

where _{pp} = is the covariance of , _{yy} is the covariance of , and is the covariance of and .
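The equivalence of the one-step and two-step routes can be checked numerically on synthetic data. The sketch below assumes that the exigent perturbation is a scalar multiple of the covariance matrix applied to the weighting vector, and that ER maps a predictor perturbation through the predictand–predictor covariance times the inverse predictor covariance; Eq. (1) is not reproduced in this section, so these forms, the variable names, the toy dimensions, and the value of `scale` are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n_p, n_y, n_members = 5, 7, 50   # toy sizes, not the paper's I and J

# Synthetic, linearly related predictor and predictand perturbations.
P = rng.standard_normal((n_p, n_members))
Y = rng.standard_normal((n_y, n_p)) @ P + 0.5 * rng.standard_normal((n_y, n_members))
P -= P.mean(axis=1, keepdims=True)
Y -= Y.mean(axis=1, keepdims=True)

w_p = rng.random(n_p)            # hypothetical predictor weights
scale = 1.7                      # stands in for the scalar factor set by q and nu

# Two-step route: exigent perturbation of the predictor, then ER.
S_pp = P @ P.T / (n_members - 1)
S_yp = Y @ P.T / (n_members - 1)
dp_two_step = scale * S_pp @ w_p
dy_two_step = S_yp @ np.linalg.solve(S_pp, dp_two_step)

# One-step route: concatenate the ensembles and pad the weights with zeros.
X = np.vstack([P, Y])
S_xx = X @ X.T / (n_members - 1)
w_ext = np.concatenate([w_p, np.zeros(n_y)])
dx_one_step = scale * S_xx @ w_ext

# The weighted variance, and hence the scalar factor, is unchanged by the padding.
same_norm = np.isclose(w_ext @ S_xx @ w_ext, w_p @ S_pp @ w_p)
same_predictor = np.allclose(dx_one_step[:n_p], dp_two_step)
same_predictand = np.allclose(dx_one_step[n_p:], dy_two_step)
print(same_norm, same_predictor, same_predictand)
```

Because the predictand weights are zero, only the predictor columns of the extended covariance contribute, so the predictand block of the one-step solution reduces to the regression of the predictand on the predictor perturbation.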

## REFERENCES

*Encyclopedia of Measurement and Statistics,* N. Salkind, Ed., Sage Publications, 648–651.

*Synoptic–Dynamic Meteorology and Weather Analysis and Forecasting: A Tribute to Fred Sanders,* L. Bosart and H. Bluestein, Eds., Amer. Meteor. Soc., 147–161.

*Statistical Methods in the Atmospheric Sciences.* 2nd ed. Elsevier, 627 pp.