## Abstract

Remote sensing instruments are heavily used to provide observations for both the operational and research communities. These sensors do not provide direct observations of the desired atmospheric variables, but instead, retrieval algorithms are necessary to convert the indirect observations into the variable of interest. It is critical to be aware of the underlying assumptions made by many retrieval algorithms, including that the retrieval problem is often ill posed and that there are various sources of uncertainty that need to be treated properly. In short, the retrieval challenge is to invert a set of noisy observations to obtain estimates of atmospheric quantities. The problem is often complicated by imperfect forward models, by imperfect prior knowledge, and by the existence of nonunique solutions. Optimal estimation (OE) is a widely used physical retrieval method that combines measurements, prior information, and the corresponding uncertainties based on Bayes’s theorem to find an optimal solution for the atmospheric state. Furthermore, OE also allows the relative contributions of the different sources of error to the uncertainty in the final retrieved atmospheric state to be understood. Here, we provide a novel Python library to illustrate the use of OE for inverse problems in the atmospheric sciences. We introduce two example problems: how to retrieve drop size distribution parameters from radar observations and how to retrieve the temperature profile from ground-based microwave sensors. Using these examples, we discuss common pitfalls, how the various error sources impact the retrieval, and how the quality of the retrieval results can be quantified.

Science relies on observations to develop theories about nature, and to determine if those theories accurately approximate how nature works. A wide variety of in situ and remote sensing techniques are used to characterize, understand, and quantify properties and processes that occur in the atmosphere and the surface. Improving our understanding of these processes, and how they interact with each other and the environment is critically important for advancing numerical weather prediction and climate models.

In response to this recognized need, our field has seen an explosion in the number and diversity of remote sensing instrumentation. We are using advanced active remote sensors such as lidars, radars of various wavelengths, sodars, scintillometers, and global positioning systems. We are using passive remote sensors like infrared spectrometers, microwave radiometers, and imaging radiometers that operate at wavelengths from the visible to the infrared, and beyond. All of these instruments are taking advantage of various physical laws, many embodied in the principles of radiative transfer, to gain new insights into the processes in the atmosphere.

There is a common thread that holds in most of these observations: we are not actually observing what we want to know. These instruments are measuring a change in voltage, the number of photons passing into a detector over a certain period, a Doppler shift in radar or lidar frequency, and the intensity of the backscattered energy. However, we are generally interested in very specific atmospheric variables, e.g., the ice water content at a certain altitude or within a certain volume, temperature and water vapor at distinct heights, or the aerosol and cloud droplet number concentration. Thus, we are left with the problem of extracting very specific information from the remote sensing observations that are typically only partially related to the variable of interest. In atmospheric sciences, we often call this inverse process a retrieval.

Very often, if we have a measurement that has some sensitivity to the atmospheric variable we desire, we can compute the signal that we would observe with our remote sensor using a so-called forward model, i.e., the forward process. In other words, if we knew the atmospheric state, which includes here all variables affecting the observed signal, we could reproduce the measurement. These forward models ideally are based upon first principles, so that we have a fair degree of confidence in their fidelity. However, they are often nonlinear, which makes them difficult if not impossible to invert analytically. Thus, the retrieval problem is essentially the development of an algorithm that is used to invert the forward model (*F*) so that we can derive the atmospheric variable that we desire (**x**; e.g., humidity, temperature, drop size distribution) from the observation that we have made from our remote sensor (**y**; e.g., brightness temperature, radar reflectivity).

Stephens (1994) provided a classic illustration for a retrieval. Suppose that you desire a description of a dragon but you observe only the footprints that the dragon makes in the sand. Now, if you already know the dragon, you can pretty easily describe the tracks it might make in the sand; i.e., you can develop a forward model. But if you observe only the tracks in the sand, it will be much more difficult to describe the dragon in any detail. You will likely be able to tell that it was a dragon and not a deer, but there will be aspects that you will be unable to characterize: the dragon’s color, if it has wings, etc. A retrieval can combine the observations (large footprints) with prior information (most dragons have wings and the ones making large footprints are green) to get the most likely state (it was a green dragon with wings).

The retrieval process is complicated by many factors. First, there is the uniqueness problem. There is no guarantee that there is only a single **x** that maps to the observation **y**; it is quite possible that *F*(**x**) = *F*(**x**) = **y**, where **x** ≠ **x**. This could imply that there is a distribution of states around **x** that will map to **y**, but may also mean that there are several very different states mapping to **y**. Because of this, and because there are often more unknowns than there are measurements, inversion problems are more often than not ill posed. Second, there is no such thing as a perfect instrument implying that **y** has some noise component and can also be represented by a probability distribution. Third, there are still uncertainties within the models: either in the physics themselves (single-scattering properties, for instance) or in the ancillary input datasets that are needed to drive the model (along with the atmospheric variable we desire) to simulate the observation. Thus, we have to invert a set of noisy observations using an imperfect forward model where multiple discrete descriptions of the atmospheric state could reproduce the measurements!

Often, a prior dataset (i.e., a climatology) of the atmospheric observations is used together with a forward model to simulate the observations that would be associated with each **x**. Using a prior constrains ill-posed problems where the most likely state is selected from the possible solutions based on prior information. The accuracy of a retrieval will depend strongly on the prior information the user has before taking the measurement (i.e., the prior dataset with mean **x**_{a} and covariance matrix $Sa$) used in the construction of the retrieval relative to the particular case being retrieved. If the current case is well represented by the prior dataset (i.e., **x**_{truth} is “close” to **x**_{a}), then the retrieval will likely be reasonably accurate, but if the current case is poorly represented in the prior dataset (i.e., **x**_{truth} is “far” from **x**_{a}), then the accuracy of the retrieval will likely be poorer, especially if the forward model is nonlinear. This means that during retrieval development that the user must not only choose an adequate prior (e.g., using a tropical dataset as a prior for tropical measurements) but must also have a basic physical understanding about the variable the user wants to retrieve. However, the benefits of adding information from the prior to the retrieval comes at a price and the retrieval can perform poorly when applied to very unlikely cases such as extreme events.

A wide range of techniques has been developed to perform retrievals (e.g., Twomey 1977; Tibshirani 1996; Rodgers 2000; Tarantola 2005; Stephens and Kummerow 2007; Aster et al. 2018), and many scientific disciplines use inversion theory as part of their normal activities. Retrieval algorithms generally fall into two distinct groups: “statistical” and “physical” retrievals^{1} (Kidder and Vonder Haar 1995). Statistical retrieval algorithms develop empirical relationships between the atmospheric variable **x** and the observation **y** using techniques like linear, polynomial or lasso regression algorithms, empirical orthogonal functions, neural networks, and theoretical relationships between scattering, emission, and transmission with the atmospheric properties of interest (e.g., Nakajima and King 1990). These are essentially statistical inferences. While being computationally efficient, statistical retrievals can struggle when dealing with nonunique or nonlinear problems and including all uncertainty sources for estimating retrieval uncertainties can be challenging. In contrast, physical retrievals use and invert a forward model by an iterative process to converge to a solution considering the prior. By this, physical retrievals can deliver more insight into the nature of the inverse problem. This includes quantifying the information content of the measurement, but also consistency checks whether the measurement can be reproduced with the solution.

Physical retrievals exploit the idea, introduced earlier, that observations, forward models, and the assumptions necessary to drive them have inherent uncertainty and can be represented by probability distributions. These algorithms typically use the Bayes’s theorem:

which states how probable a certain **x** is given **y** [i.e., *P*(**x**|**y**)]. This probability density function is already the solution for the inverse problem, which means that the solution includes an uncertainty estimate. Bayes’s theorem tells us that it can be obtained by multiplying the probability of **y** given **x** [*P*(**y**|**x**); the likelihood] with the a priori probability density of the state *P*(**x**) (e.g., based on a climatology, a model forecast, or a measurement at some distance away in space or time). The probability *P*(**y**|**x**) represents the uncertainty in the measurement process and also in the model that converts the state **x** into observations **y**, while the division by *P*(**y**) is effectively a normalization factor.

The optimal estimation (OE) is a widely used physical retrieval method and the equations (Rodgers 2000) can be derived from Eq. (1) if one assumes all of the probabilities have a Gaussian distribution. OE also implicitly assumes that there is no bias in the measurements and prior dataset; in other words, the average of their uncertainties is zero. In this study, we illustrate the use and typical challenges of OE, which naturally combines the uncertainties in both the observations and the a priori information. A complete literature overview is beyond the scope of this study, but important applications include atmospheric profiling based on microwave or infrared radiometers (Turner and Löhnert 2014), cloud and precipitation properties (Löhnert et al. 2004; Mace et al. 2016) trace gas concentrations retrievals (Crisp et al. 2004; Wunch et al. 2011), and data assimilation for numerical weather prediction models (Bauer et al. 2006; Janisková 2015). Since retrieved data are utilized so heavily in our field and we continually develop more advanced instrumentation and algorithms, it is important that the retrieval uncertainties are accurately quantified and made available with the retrieved information.

To illustrate how OE works, we discuss two simplified retrieval applications: retrieving temperature and humidity profiles from a multichannel ground-based microwave radiometer and retrieving drop size distribution parameters from ground-based cloud radar observations. We use the recently published Python pyOptimalEstimation library^{2} and also provide the code to reproduce all results and figures in supplemental Jupyter notebooks that can be executed in a web browser^{3} without installing any software. We strongly encourage the reader to explore the behavior of the example retrieval applications with the Jupyter notebooks on their own. The library, the examples, and the extensive documentation form a toolset that allows the reader to develop their own retrieval algorithms.

## How OE works

OE is trying to find an optimal solution **x**_{op} given an observation and prior information using an iterative process (Fig. 1) where the prior is described by the mean **x**_{a} and the covariance matrix $Sa$. We start the iteration with a first guess for the atmospheric state **x**_{0}. If available, measurements from another instrument, model results, or the retrieval results at an earlier sample time can be used for **x**_{0}, but in the absence of other information we follow the common approach to use **x**_{a} for **x**_{0}. The forward operator *F* is used to transform **x**_{a} from **x** to **y** space, resulting in *F*(**x**_{a}). In the next step, the difference between *F*(**x**_{a}) and **y**_{obs} is used to make the next guess **x**_{1} considering the distance from **x**_{1} to **x**_{a}. This is done by weighting these differences by the measurement uncertainty covariance matrix $Sy$ and, respectively, the prior state covariance matrix $Sa$. The fact that $Sa$ and $Sy$ are covariance matrices implies that the errors of **x**_{a} and **y**_{obs} can be described with normal distributions. Obtaining **x**_{1} from **x**_{a} and **y**_{obs} requires inverting *F*, which is typically not possible analytically. Therefore, the Jacobian matrix $K$ = *∂***y**/*∂***x** is used to linearize and invert *F*, which results in the requirement that *F* is *moderately linear*. This means that for OE to work, *F* must behave in approximately a linear manner for small perturbations used to obtain $K$. In atmospheric science, most problems are moderately linear due to the propensity of exponentials and power laws used within them and the absence of discontinuities. Because $K$ is only an approximation for *F* around **x**_{0}, **x**_{1} obtained with this process is typically not the solution yet and the process must be repeated until **x** converges to a stable, optimal solution **x**_{op}. Note that $K$ depends on **x** used for estimating the Jacobian, which means that $K$ must be recalculated for every iteration step, which might be computationally expensive depending on the lengths of **x** and **y**.

Mathematically, the iterative process for iteration *i* + 1 can be summarized with

where $Se$ is the *effective* measurement uncertainty. In fact, *F* depends not only on **x**_{i}, but also on the parameter vector **b**, which includes additional input variables for the forward operator that are not retrieved by the algorithm and are considered fixed. For example, radiative transfer models used for microwave radiometers require not only temperature and humidity profiles (which compose typically **x**), but also a pressure profile. Additionally, there are almost always other fixed parameters in the forward model that encompass sources of uncertainty; e.g., for microwave radiometers, there are uncertainties in the line parameters that are used in the radiative transfer model (Cimini et al. 2018). Even though often ignored, the uncertainties of **b** can be also considered with the corresponding covariance matrix $Sb\u2032$ (in **y** space^{4}),which is why $Sy$ is replaced by the more general $Se=Sy+Sb\u2032$ in Eq. (2).

The iteration stops when the convergence criterion^{5}

is fulfilled. Here, $Si$ describes the uncertainty of the retrieved **x**_{i} and can be obtained by combining the uncertainties related to **x** and **y** space:

Thus, the convergence criterion in Eq. (3) is really stating “the retrieval is considered converged when the change in **x** is smaller than the derived uncertainty of **x**.” After convergence is reached at iteration *i*, **x**_{i} and $Si$ are assumed to be the optimal solution **x**_{op} and uncertainty $Sop$. The fact that $Sop$ can be easily obtained is an important advantage of OE and other physical retrieval methods in comparison to statistical retrievals where additional steps are required to estimate the uncertainties (e.g., Cadeddu et al. 2009).

## Sources of retrieval uncertainty

Every retrieval algorithm suffers from the same fundamental sources of uncertainty related to the three covariance matrices $Sy$, $Sa$, and $Sy$. To illustrate this, we designed a simplified retrieval (see supplement A) for atmospheric temperature and humidity profiles based on brightness temperature measurements of a common ground-based microwave radiometer [humidity and temperature profiler (HATPRO); Rose et al. 2005].

Here, the prior temperature and humidity profiles consist of a 16-yr spring average (March–May) of radiosonde observations at the U.S. Department of Energy Atmospheric Radiation Measurement (ARM) program at the North Slope of Alaska (NSA) site in Utqiaġvik (formerly known as Barrow), Alaska (Verlinde et al. 2016). As a forward operator, we use a fast and simple radiative transfer model that does not account for scattering by hydrometeors (Löhnert et al. 2004). We do not use real radiometer observations but obtain a synthetic measurement based on a radiosonde and the forward operator. This has two main advantages: first, systematic forward model errors cancel each other out and—more importantly—we know the true atmospheric state **x**_{truth}, which we can compare to the retrieval’s result **x**_{op}. Therefore, we always recommend using synthetic observations during retrieval development before applying it to real observations. Standard values for the uncertainties in the observed microwave brightness temperatures are used for the diagonal elements of $Sy$; other error sources are neglected in the beginning. We want to stress, however, that the retrieval accuracy can be significantly reduced when using real observations. This is associated to “real world” challenges such as biases in the measurements, the representatives of the prior data, or the quantification of forward model errors. An extensive discussion of OE retrievals based on real observations can be found, e.g., in Ebell et al. (2017) or Martinet et al. (2017).

### Uncertainties related to the prior.

The prior dataset serves as a tremendous constraint for the entire retrieval; in fact, the uncertainty in the retrieval must be smaller than the uncertainty in the prior. Otherwise, the retrieval does not add any information from the observations. However, if we do not have a good measure of the variability and covariance of the true atmospheric state, then the retrieval could easily give a biased result. In some cases, the prior does not serve as a serious constraint because the information content in the observations is so high, such as retrieving liquid water path from a microwave radiometer (Turner et al. 2007). In other cases, it is difficult to derive the $Sa$ matrix from existing observations (e.g., the vertical level-to-level covariance of liquid water content), and thus model simulations (Löhnert et al. 2001) or educated guesses (Maahn et al. 2015) are used to provide this information to serve as a constraint in the retrieval. However, for retrieving temperature and humidity profiles from microwave radiometer (MWR) observations, the prior is critically important. In supplement A, we are trying to obtain 120 atmospheric quantities (60 height levels for each temperature and humidity from 0 to 22,500 m) from only 14 radiometer measurements, which makes the retrieval problem underconstrained. The discrepancy between the lengths of **x** and **y** underlines why a prior is required to get useful information out of the radiometer.^{6}

Here, we modify the MWR retrieval to assess the impact of using a suboptimal $Sa$ that is not consistent with the prior dataset. In the atmosphere, temperature and humidity are correlated to each other and to themselves at various height levels. This makes the nondiagonal elements of $Sa$ crucial (Fig. 2): The retrieval considering nondiagonal elements (reference run) converges to a **x**_{op} with a reasonable agreement to **x**_{true} and a significant reduction of the uncertainties (Fig. 2a). If we assume that there are no level-to-level correlations in the prior (i.e., by setting nondiagonal elements to zero), the humidity solution stays close to the prior, the uncertainties are hardly reduced with respect to the prior, and there is even a nonphysical artifact in the humidity profile (Fig. 2b).

The result **x**_{op} is also biased when a wrong prior dataset is used (Fig. 2c): When using summer radiosonde observations (June–August) to obtain **x**_{a} and $Sa$, **x**_{truth} is mostly outside of the 1-sigma range of $Sa$. The result **x**_{op} differs from **x**_{truth} mostly in the upper atmosphere (200–300 hPa) and close to the surface where **x**_{op} bends to **x**_{a}. But how can the reader detect such a situation for a real-world application when **x**_{truth} is unknown? To identify issues with $Sa$, we strongly advise that the user apply statistical *χ*^{2} tests (see supplement A) to test whether the retrieved measurement **y**_{op} agrees with **y**_{obs} and whether **y**_{op}, **y**_{obs}, and **x**_{op} are in agreement with the prior state considering the assumed uncertainties. These tests fail for the retrieval ignoring nondiagonal elements or using a wrong prior.

### Uncertainties related to the forward operator.

The second source of uncertainty is associated with the forward model. While forward models are usually built on first principle understanding, many approximations still exist either due to limitations in our understanding of the physics or for computational efficiency reasons. The use of a 1D radiative transfer model instead of a 3D radiative transfer model is one example of an approximation that is frequently used, even though the 3D effects are important in most cloud scenes (e.g., Liang and Girolamo 2013). Furthermore, the inputs and assumptions needed by *F* often consist of more than just the elements of **x** that are represented by the parameter vector **b** and its covariance matrix $Sb$. These so-called model parameters need to be known as their uncertainties also contribute to the uncertainty of *F*.

The forward model uncertainties are typically ignored in most current retrieval algorithms (i.e., $Se=Sy$) assuming that the forward model is perfect. However, for some cases, ignoring the uncertainties associated with **b** is the Achilles heel of the retrieval itself, which can result in a drastic underestimation of the error in the retrieved **x** (e.g., Mace et al. 2016). This can be also seen for our MWR retrieval. Because the uncertainties of our forward operator have not been specifically quantified yet, we rely on the results of Cimini et al. (2018), who analyzed the uncertainties in the gas spectroscopy used in our forward microwave radiative transfer model. They found significantly correlated errors for some channels, particularly around the oxygen line. When adding their $Sb\u2032$ (their Fig. 9) to our $Se$, the balance between $Se$ and $Sa$ shifts and more emphasis is put on $Sa$. This leads to larger uncertainties for **x**_{op} so that the optimal to prior uncertainty ratio^{7} is increased by up to 11 percentage points (Fig. 3a).

This can also be seen as a reduction in the information content that is retrieved from the measurements that can be quantified using the degrees of freedom for signal *d*.^{8} These describe the number of independent information pieces that can be obtained from the measurement, given the prior state. For a retrieval where the observations have perfect information content, *d* would be equal to the number of measurements or state variables, whichever is smaller. The reference retrieval can retrieve a total of 4.67*d* (of a maximum value of length of **y** = 14) for the humidity and temperature profiles combined. Also *d* can be separated by state variables (Fig. 3b), which shows that the highest information contents can be obtained for temperature close to the surface and for humidity around 500 hPa. By including the forward model uncertainty $Sb\u2032$ in the retrieval, the total *d* is reduced to 4.01, with the primary loss of information occurring in the middle atmosphere (Fig. 3b).

### Uncertainties related to the measurement.

The perhaps most obvious source of uncertainty in a retrieval is the uncertainty in the observations themselves. **y** is typically multidimensional, i.e., there are observations from multiple channels from a single instrument, or multiple instruments used in the retrieval. There will be random uncorrelated errors for each particular element of **y**, as well as correlated errors between the different elements of **y**. This can be concisely specified in a covariance matrix $Sy$. Similar to $Sa$, making $Sy$ larger (smaller) changes the weights in the retrieval and puts more (less) weight on the prior at the expense of the observations. In supplement A, we detail how quadrupling (i.e., doubling the measurement uncertainty deviation) $Se$ reduces the retrieved *d* by 0.75 and leads to **x**_{op} being closer to **x**_{a}.

For many instruments, it is easier to estimate the uncorrelated error in each particular element of the observation. For example, the basic measurement of microwave radiometers is typically a voltage output by the receiver where the random uncertainty is related to the noise characteristics of the receiver. But errors can be also correlated among measurements, e.g., if multiple measurements are obtained by the same receiver subject to common amplifier elements. Determining this correlated error is typically much harder but can be done (Tobin et al. 2007; Maahn and Löhnert 2017; Turner and Blumberg 2019). In our MWR retrieval, we also neglected nondiagonal elements in $Sy$ initially. To assess the impact, we included the nondiagonal elements to the $Sy$ used in $Se$ (not accounting for $Sb$). Both the random noise and the correlation of the random noise between the HATPRO frequencies have been determined by continuously measuring the brightness temperatures of a blackbody with a known temperature.

Even though the changes in retrieval uncertainty and the element-wise *d* are small in comparison to the reference run that assumed $Sy$ to be diagonal, the resulting total *d* is increased by 0.19. Therefore, accounting for nondiagonal elements is one of the few occasions where considering “additional” error sources leads to an enhancement instead of a decrease of retrieval quality because correlations between the different elements of **y** (i.e., the different channels in the microwave radiometer) can reduce the effective noise of the instrument. Referring to Fig. 1, correlated uncertainties would result in smaller shaded areas around **y**_{obs} compared to uncorrelated uncertainties having the same variances.

## Impact of retrieval assumptions

So far, we have investigated error sources related to the individual components of OE. But what happens when the core assumptions about OE are violated? OE requires that the forward operator be moderately linear^{9} (i.e., linear when perturbing **x**) and that the measurement as well as the prior error can be described by normal (Gaussian) distributions. We have assumed that the long-term averaged radiosonde profile prior dataset represents the un-biased population truth. Thus, the distribution of prior data itself must follow a normal distribution.

We illustrate the importance of assuming normal distributions by introducing a second example (see supplement B for details) using radar reflectivity *Z*_{e} and mean Doppler velocity *V*_{d} of a vertically pointing cloud radar to retrieve a raindrop size distribution (DSD). We parameterize the DSD using a scaling parameter *N*_{w}, the raindrop mass-spectrum mean diameter *D*_{m} and the normalized mass spectrum standard deviation $\sigma m\u2032$ (Williams et al. 2014). These three quantities form the state vector **x** while *Z*_{e} and *V*_{d} are combined into the measurement vector **y**. The a priori information (**x**_{a} and $Sa$) is obtained from the observations from three disdrometers deployed near Huntsville, Alabama, from December 2001 to October 2011 (Williams et al. 2014). Because violating retrieval assumptions will likely impact the robustness of the retrieval, we analyze the retrieval for 100 randomly chosen profiles. Similar to the first example, we use synthetic observations **y**_{obs} derived with a radar simulator (Mech et al. 2020) that are not included in the prior.

One option to deal with nonnormally distributed prior data are to transform the data so that it follows a normal distribution more closely. In supplement B, we use quantile–quantile (QQ) plots to show that the prior **x** follows a normal distribution closer when applying a logarithm to **x**. Note, this is a rather simple solution for making the distributions more normal; however, some retrieval problems require more complex transformations (e.g., Anscombe 1973; Mason et al. 2017). When comparing the retrieval quality for linear and logarithmic **x**, a clear improvement in retrieval quality can be seen for the latter: the percentage of converged retrievals raises from 86% to 100%. This also applies to the number of profiles passing the statistical *χ*^{2} tests (see “Uncertainties related to the prior” section) and a linearity test that tests the assumption that the forward operator is moderately linear. Further, we show with another QQ plot in supplement B that the distribution of “real” retrieval errors of the optimal solution (**x**_{truth} − **x**_{op}) is in better agreement with the estimated $Sop$ when using a logarithmic **x**.

Another way to show the enhanced quality is to look into the distributions of the optimal to prior uncertainty ratio and the individual *d* for all profiles (Fig. 4). The clear reduction in spread when using the logarithm of the state variables is obvious indicating that the retrieval produces more consistent results that do not depend as much on the individual profile. The median values of the optimal to prior uncertainty ratio distribution are lower for the linear state variables indicating that the linear retrieval version is in general underestimating retrieval uncertainties. At the same time, *d* can be greatly overestimated for individual profiles.

## Summary and concluding remarks

Most of the observations of the Earth–atmosphere system are from remote sensors (e.g., satellite observations, weather radar observations) that are used in a wide range of operational and research applications. For all of these sensors, atmospheric variables of interest are never directly observed but must be derived from the observations within an inversion algorithm, which often requires the use of prior constraints. The uncertainties in these retrieved quantities arise from three sources: the prior dataset used to either construct or constrain the retrieval (characterized by the covariance matrix $Sa$), the uncertainties in the forward model assumptions and input ancillary datasets used in the forward model ($Sb$), and the uncertainties in the observations themselves ($Sy$). Within the inversion algorithm, these uncertainties can be propagated to provide uncertainties in the retrieved atmospheric state **x**_{op} given by the covariance matrix $Sop$. In a series of examples, we have utilized the optimal estimation (OE) retrieval method to show the impact of how changes in these uncertainties result in changes in the retrieved state vector **x**_{op} and $Sop$. We also show that non-Gaussian uncertainty distribution can lead to a degradation of the OE performance. Nonnormally distributed state variables should be normalized to avoid negative impacts on retrieval quality and robustness. Using forward operators that are grossly nonlinear (i.e., are not moderately nonlinear) will also lead to a decrease of accuracy.

The supplemental Jupyter Notebooks, which can be run online in a web browser,^{10} provide the complete code for all examples. We strongly encourage the readers to experiment with the examples by themselves. Together with the pyOptimalEstimation Python library,^{11} this gives the readers all information to get started with their own OE retrieval projects.

We contend that more work is needed to accurately characterize these three input covariance matrices. Too often, our community assumes that the measurement covariance matrix $Sy$ is diagonal and that there is no correlation between different measurements within the observational vector **y**. We challenge instrument developers to devise methods to test this assumption, and for scientists that develop retrievals to include the results from these tests in their algorithms. However, it must be stressed that such a detailed error characterization may not lead to a retrieval improvement if other error sources such as measurement biases are not correctly determined and accounted for before the retrieval application.

The prior dataset is critical as a constraint for physical retrievals. Often, however, the data needed to build a well-characterized prior dataset are inadequate or simply unavailable. For example, there are very few observational datasets available that allow us to determine the level-to-level covariance of cloud microphysical properties, which is critical information that is needed for cloud property retrievals, and therefore the community is using other sources such as model simulations.

Also, we believe that the uncertainties and assumptions in the forward model parameters have been neglected for too long. Forward models may be fundamentally incorrect (e.g., applying 1D radiative transfer approaches to situations that are inherently 3D), or may have uncertainties in the model parameters that affect retrieval results. In some applications, the uncertainty in **b** can dominate the uncertainty in the retrieval. Characterizing the uncertainties in *F* may require additional supporting data so that closure studies can be performed. Thus, a well-constructed set of field campaigns may be able to improve the characterization of both $Sa$ and $Sb$ simultaneously.

It is important to recognize that OE is just one tool, albeit a powerful one, that can be used to retrieve atmospheric information from remote sensing observations. If the limitations of OE make it inapplicable to a problem, other physical retrieval approaches such as the computationally expensive Markov chain Monte Carlo method (Posselt and Mace 2014; Posselt et al. 2017) are more appropriate for highly non-Gaussian cases or where the forward model is highly nonlinear.

Last, even the best retrievals can be only as good as the underlying observations stressing the need for enhanced instruments that can constrain retrievals better; this could be achieved using new instrument concepts, improved designs, or smarter sensor synergies.

## Acknowledgments

We thank Christopher Williams for supporting us with the Huntsville dataset (Petersen et al. 2010), Susan Cobb for her assistance in creating Fig. 1, and Clive Rodgers for helping us to track down a coding error. Datasets from the North Slope of Alaska (NSA) site are provided and archived by the U.S. Department of Energy Atmospheric Radiation Measurement (ARM) program (Giangrande and Toto 2016). This work was partially supported by the U.S. Department of Energy’s Atmospheric Systems Research (ASR) program (DE-SC0013306 and DE-SC0014375), NOAA’s Physical Sciences Lab, and the NOAA Atmospheric Science for Renewable Energy Program. A portion of this research was carried out at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration. We thank the three anonymous reviewers for providing valuable comments.

## References

*Amer. Stat.*

*Parameter Estimation and Inverse Problems*. Elsevier, 406 pp.

*Quart. J. Roy. Meteor. Soc.*

*IEEE Trans. Geosci. Remote Sens.*

*Atmos. Chem. Phys.*

*Adv. Space Res.*

*J. Geophys. Res. Atmos.*

*Quart. J. Roy. Meteor. Soc.*

*Satellite Meteorology: An Introduction*. 1st ed. Academic Press, 466 pp.

*J. Geophys. Res. Atmos.*

*J. Atmos. Oceanic Technol.*

*J. Appl. Meteor.*

*J. Appl. Meteor. Climatol.*

*J. Atmos. Oceanic Technol.*

*J. Geophys. Res. Atmos.*

*Atmos. Meas. Tech.*

*Atmos. Chem. Phys.*

*Geosci. Model Dev. Discuss.*

*J. Atmos. Sci.*

*J. Appl. Meteor. Climatol.*

*J. Geophys. Res.*

*J. Appl. Meteor. Climatol.*

*Proc. 17th Python Sci. Conf*., Austin, TX, Enthought, 113–120, https://doi.org/10.25080/Majora-4af1f417-011.

*Inverse Methods for Atmospheric Sounding: Theory and Practice*. World Scientific, 240 pp.

*Atmos. Res.*

*Remote Sensing of the Lower Atmosphere: An Introduction*. Oxford University Press, 562 pp.

*J. Atmos. Sci.*

*Inverse Problem Theory and Methods for Model Parameter Estimation*. Society for Industrial and Applied Mathematics, 349 pp.

*J. Roy. Stat. Soc.*

*J. Appl. Remote Sens.*

*J. Appl. Meteor. Climatol.*

*IEEE J. Sel. Top. Appl. Remote Sens. Earth Obs.*

*IEEE Trans. Geosci. Remote Sens.*

*J. Atmos. Oceanic Technol.*

*J. Atmos. Sci.*

*The Atmospheric Radiation Measurement (ARM) Program: The First 20 Years, Meteor. Monogr*., No. 57, Amer. Meteor. Soc., 8.1–8.13, https://doi.org/10.1175/AMSMONOGRAPHS-D-15-0023.1.

*J. Appl. Meteor. Climatol.*

*Philos. Trans. Roy. Soc.*

## Footnotes

Supplemental material on Python code belonging to all shown retrieval examples available at https://github.com/maahn/pyOptimalEstimation_examples

^{1}

For brevity, we omitted hybrid approaches combining both approaches.

^{2}

v1.1, available at https://github.com/maahn/pyOptimalEstimation

^{3}

Jupyter Notebooks are a web application for creating documents that contain live code, equations, and figures. The supplemental notebooks are available online (https://github.com/maahn/pyOptimalEstimation_examples) using Binder (Project Jupyter et al. 2018).

^{4}

The matrix $Sb\u2032$ in y space can be easily obtained from the covariance of the elements of b (call this $Sb$) and then computing the Jacobian of *F* with respect to **b**, denoted as $Kb$ with $Sb\u2032=KbSbKbT$.

^{5}

There is also a criterion to test for convergence in **y** space instead of **x** space [Rodgers 2000, their Eq. (5.33)], but we found that our sample retrievals obtain convergence faster in x space without any significant difference in the quality of the solution (see supplement A).

^{6}

However, if we were trying to retrieve only 14 atmospheric quantities, the prior would still be critical because passive radiometer measurements are typically correlated.

^{7}

Defined as sqrt[diag($Sop$)/diag($Sa$)].

^{8}

We obtain *d* from the averaging kernel $A=I\u2212SopSa\u22121$ (Turner et al. 2016). The elements of the diagonal of $A$ show the degrees of freedom for a signal that can be obtained for the individual elements of **x**. We define *d* as the trace of $A$. In case diag($SopSa\u22121$) → 0, *d* reaches the maximum value of length of **x**.

^{9}

If the uncertainties in either the observation space or the state space are highly non-Gaussian, then other techniques such as the Markov chain Monte Carlo (MCMC; Posselt et al. 2008) method should be used.