Diagnostic Analysis of Various Observation Impacts in the 3DVAR Assimilation System of Global GRAPES

Observation impact studies have received increasing amounts of research attention. The impacts of observations on numerical weather prediction (NWP) are highly dependent on assimilation algorithm, prediction system, and observation source. Therefore, the major NWP centers worldwide have each developed their own diagnostic techniques to assess observation impacts. However, similar diagnostic techniques have not yet been developed in China. In this study, a diagnostic technique was exploited with the randomized perturbation method in the Global/Regional Assimilation and Prediction System (GRAPES) 3DVAR system, and then applied to evaluate observation impacts for various regions of the world. It was found that a reasonable and stable estimation could be obtained when the number of perturbations was greater than 15. Because of differences in observations in the Northern and Southern Hemispheres, refractivity data from GNSS radio occultation (GNSS-RO), satellite radiance, and atmospheric motion vector data had more impact in the Southern Hemisphere than in the Northern Hemisphere. However, radiosonde data, aircraft, and surface data were more important in the Northern Hemisphere. Low-impact observation points were located in data-rich areas, whereas high-impact observation points were located in data-poor areas. In the equatorial region, the contributions of observations to the analysis were smaller than those in the nonequatorial regions because of the lack of proper mass–wind balance relationship. Radiosondes contributed the largest impact in China and its surrounding regions, with contributions of radiosondes and GNSS-RO data exceeding 60% of the total contributions, except for wind speed below 700 hPa.


Introduction
The objective of atmospheric data assimilation is to produce an accurate representation of the state of the atmosphere based on all existing information (Talagrand 1997). This method can provide an improved initial field for numerical models by combining various observations with an NWP product (the first-guess or background forecast) and their respective error statistics, and thus reduces errors in the initial field. With assimilation methods becoming increasingly sophisticated (Parrish and Derber 1992;Rabier and Courtier 1992;Rabier et al. 2000;Xue et al. 2008;Huang et al. 2009;Zhang et al. 2013;Gao and Stensrud 2014;Xu et al. 2015) and the number of assimilated observations increasing rapidly, the assessment of the contribution of each observation to the analysis has become one of the most challenging topics in data assimilation and NWP (Cardinali and Healy 2014).
At present, there are two major methods for assessing the impacts of observations on the analysis. One method uses a Lanczos-conjugate gradient method to calculate the degree of freedom for signals (DFS;Fisher 2003; Denotes content that is immediately available upon publication as open access. Cardinali et al. 2004;). The other method uses a randomization procedure whereby observations are perturbed, to estimate the error variance reduction caused by observations (Desroziers et al. 2005). Desroziers and Ivanov (2001) and Chapnik et al. (2004) used a randomization procedure to compute a particular matrix, which was closely associated with the DFS. Desroziers et al. (2005) applied the randomization procedure to diagnose the contributions of observations to error reduction in the analysis, and successfully conducted a diagnostic analysis in a 4DVAR system developed in France. Chapnik et al. (2006) found that in a simplified data assimilation system, the randomization procedure can be extended to a slightly nonlinear case. Brousseau et al. (2014) found that the randomization method and DFS had similar results and suggested that the variance reduction could be used as a diagnostic to improve the understanding of how observations are used.
The observation impacts on the model forecast are very important and are measured mainly by observing system experiments (OSEs; Atlas 1997;English et al. 2004;Radnoti et al. 2012) and forecast sensitivity to observations (FSO; Baker and Daley 2000;Liu et al. 2009). The OSEs (Kelly et al. 2007) can measure the impacts of these assimilated observations on all forecast metrics and can estimate the observation impacts on longer range forecasts without linear approximation constraints. The FSO approach is to calculate the forecast error sensitivity to observations by adjoint-based (Langland and Baker 2004; or ensemblebased (Li et al. 2010;Sommer and Weissmann 2014) techniques. The FSO approach only computes the variation in the short-range forecast error caused by the assimilated observations, where the forecast error is represented and measured by a scalar function of the model parameters. The adjoint-based FSO requires development of the adjoint of a forecast model, which is complicated and is also constrained by linear assumptions. Although the ensemble-based FSO does not use the adjoint model, it requires a large number of ensemble members to assess the observation impacts on the forecast. Thus, the FSO method is complementary to the OSEs for estimating observation impacts in a forecasting system (English et al. 2004;Radnoti et al. 2012).
Since an observation impact study can provide valuable information for the improvement of NWP and the observing system, almost all the major NWP centers worldwide have developed their own diagnostic techniques to assess the impacts of observations on NWP (Cardinali and Prates 2011;Cardinali 2013;Auligné et al. 2011;Brousseau et al. 2014;Joo et al. 2013;Cucurull and Anthes 2014). At present, the OSE is the most important method for estimating the observation impacts in the operational Global/Regional Assimilation and Prediction System (GRAPES) global system. This method, however, requires a large amount of computational resources. The observation impacts on NWP are highly dependent on the assimilation algorithm, prediction system, and observation sources. Thus, it is necessary to perform diagnostic analysis for a specific assimilation and prediction system.
Since the observation changes first led to the variations of model initial fields and then caused the variations of model forecast, estimation of the impacts of the observations on the analysis is the basis for measuring the observation impacts on the forecast. Therefore, this paper focuses on the observation impacts on the analysis. The operational GRAPES three-dimensional variational (3DVAR) assimilation system and the randomization method with perturbed observations are described in section 2, which are important components of the diagnostic technique in this paper. The technique is validated in section 3 and applied to evaluate the impacts of various observations on different regions in section 4. Final conclusions and discussions are presented in section 5.

Diagnostic method, assimilation system, and observations
a. Randomization method In this paper, a diagnostic technique is developed, which is based on a randomization method (Desroziers et al. 2005) and the GRAPES 3DVAR system. A detailed introduction to this method with perturbed observations is provided here, including the transformation operator and terms related to error variance reduction (EVR) of state variables.

1) THEORETICAL EQUATIONS
Various observations contribute differently to the reduction of the background error variance. Impacts of various observations on the analysis can be obtained through the randomization method with perturbed observations. Cohn (1997) pointed out that when the background error covariance and observation error covariance are reasonably specified, the analysis error covariance matrix could be expressed as where A and B are the analysis and background error covariance matrices, respectively; H is the tangent linear observation operator; K is the gain matrix; and R is the observation error covariance matrix. KHB or AH T R 21 HB represents the error reduction term after all observations are assimilated. In the GRAPES 3DVAR system, R is assumed to be a diagonal matrix. AH T R 21 HB can be expressed as the sum of error covariance reduction of each individual observation dataset: where H i and R i , respectively, are the linearized observation operator and observation error matrix of the observation subset i. If defining a projection operator p i , by which the observation subset i can be obtained from the complete set of observations, H i and R i can be written as H i 5 p i H and R i 5 p i Rp T i . Define S i as S i 5 p T i p i in which the diagonal element s ii is equal to 1 and the remaining elements are 0, and then we have: Based on Eqs. (1) and (3), it is clear that KHB and KS i HB are the error reductions caused by assimilating all observations and the observation subset i, respectively. According to the matrix trace definition, the EVR can be obtained by the matrix traces of KHB and KS i HB. The EVR of all observations can be written as Similarly, the EVR of the observation subset i can be written as r i 5 Tr(KS i HB).
Here r and r i can only be expressed in theory. Because KHB and KS i HB contain various state variables along the diagonal, which have different units, it is impossible to solve KHB and KS i HB directly in the operational 3DVAR system. Therefore, a transformation operator and an approximate method are applied to obtain the estimates of variance reduction. By using the transform operator L, the EVR for the variable of interest is obtained from that of multiple state variables, and can be written as t 5 Tr(LKHBL T ) . (4) Similarly, the EVR caused by the observation subset i is written as The definition of L is very flexible, that is, it can be a space projection operator for projections to an isobaric surface or projections over a certain horizontal range, or it can be a variable transform operator from state variables to other variables.
In the next section, a randomization method is used to solve Eqs. (4) and (5).
2) SOLUTIONS OBTAINED BY THE RANDOMIZATION METHOD Equation (4) is transformed first in terms of the nature of matrix trace, that is, Tr(PQ) 5 Tr(QP), and R 1/2 is also introduced. We have t 5 Tr(R 21/2 HBL T LKR 1/2 ) .
(6) Girard (1989) suggested that a large matrix trace could be estimated using the Monte Carlo method. That is, for the matrix Q, Tr(Q) ffi h T Qh, where h is a random number vector that satisfies the standard normal distribution. Then, a randomized estimation of t is given by In an operational 3DVAR data assimilation system, the gain matrix K is not explicitly known and computed. It is necessary to transform all those terms that include K. For a linear system or a weakly nonlinear system, after the assimilation of observations y o , the analysis field x a can be written as where x b is the background field, d is the innovation vector, and d 5 y o 2 Kx b . With the assimilation of perturbed observations, the analysis field is expressed as where Similarly, the perturbation of the observation subset i is defined as dy o i 5 p i dy o , and the EVR caused by the observation subset i can be expressed as OCTOBER 2018 Z H A N G E T A L .
The EVR of the variable of interest can be obtained from all these known variables. The randomization method with perturbed observations can be applied to both linear and weakly nonlinear systems (Desroziers et al. 2009).
Finally, if the assimilated observations have positive contributions to the analysis, the EVR will be greater than zero and equal to or smaller than the background error variance, that is, Equation (11) is an important diagnostic formula. Whether or not the EVR caused by assimilation of observations is reasonable can be determined in terms of the ratio of Tr(LKHBL T ) to Tr(LBL T ). The percentage of EVR Tr(LKHBL T ) in total background error variance Tr(LBL T ) can be obtained, which can improve understanding of the diagnostic results.
b. GRAPES global 3DVAR system GRAPES is a global and regional NWP system developed by the China Meteorological Administration (CMA; Chen et al. 2008Chen et al. , 2012Xue et al. 2003Xue et al. , 2008. The data assimilation system and the forecast model are the most important components of GRAPES. Since 1 June 2016, the global GRAPES has been used operationally in the Numerical Weather Prediction Center (NWPC) of the CMA (Liu et al. 2016;J.-C. Wang et al. 2017).
The incremental form of the 3DVAR assimilation scheme is used in the data assimilation system (Xue et al. 2008), and the GRAPES 4DVAR is currently under development Zhang and Liu 2017). The objective function is as follows: This assimilation system employs the same heightbased terrain-following vertical coordinate and atmospheric state variables as those of the GRAPES model. Variables are staggered in the vertical direction following the Charney-Phillips approach (Charney and Phillips 1953;Cullen et al. 1997). An Arakawa C grid staggering (Arakawa and Lamb 1977) is applied in the horizontal direction. In the operational system, there are 60 levels in the vertical direction from the surface to the model top (about 37.7 km) and the horizontal resolution is 0.258. There is a single iteration in the outer loop. The inner loop ends when the norm of the gradient obtained in the minimization process is 10 23 of the original norm or the maximum number of iterations is 150. Because of the limitation of computational resources, 0.58 horizontal resolution was used in this study.
To effectively reduce the tremendous cost of minimization and make convergence easier, a series of transforms (Wang et al. 2012;R.-C. Wang et al. 2015;Gong et al. 2016) were performed, such as physical variable transform, balance constraints, and preconditioning transform. J.-C. Wang et al. (2014) used the National Meteorological Center (NMC) method (Parrish and Derber 1992) to re-estimate the background error covariance of the GRAPES 3DVAR system.
There are the corresponding observation operators for various observations in the GRAPES 3DVAR system. The observation error covariance matrix is assumed to be a diagonal matrix and is mainly obtained by inheriting the previous operational settings Tao et al. 2006Tao et al. , 2008Zhuang et al. 2006) or using the innovation vector method (Xue et al. 2013;J.-C. Wang et al. 2015). There are some quality control schemes for observations in the GRAPES assimilation system, such as the background check, the variational quality control for conventional observations (Hao et al. 2013), and the radiosonde observation blacklisting check ). Quality control for satellite radiance data is performed from five aspects: channel selection (G. Wang et al. 2014a), outlier elimination (G. Wang et al. 2014b;Wang et al. 2016), bias correction (Han 2014), cloud detection (Chen and Li 2011;Zhu et al. 2013), and data thinning (G. Wang et al. 2017).

c. Observations
At present, the operational assimilation system can assimilate upper-air observations of wind, temperature, and humidity by radiosondes (TEMP); surface pressure from land stations and ships (SYNOP); wind data from aircraft measurements (AIREP); atmospheric motion vector data inferred from cloud motions detected by geostationary satellites (AMV); refractivity radio occultation data from the Global Navigation Satellite System (GNSS-RO); and radiance data from Advanced Microwave Sounding Unit-A measurements (AMSU-A) carried by polar-orbiting satellites. In this paper, clearsky AMSU-A radiances are assimilated, including the observations over the ocean of channels 5 and 6, and the global observations of channels 7-10. Many studies have shown that the forecast accuracy of the GRAPES can be improved by assimilating these observations (Zhang et al. 2004;Zhang et al. 2007;Ding et al. 2010;Li et al. 2012;J.-C. Wang et al. 2015;Zhuang et al. 2014;Liu and Xue 2014) As a result of differences in the various measurement methods, the spatial distribution of each observation type has its own characteristics. The observations at 0000 UTC 20 June 2014 were taken as an example to show the horizontal distribution of various observations assimilated by the GRAPES 3DVAR system (Fig. 1) during a 6-h assimilation window. TEMP, SYNOP, and AIREP data are mainly distributed over land, and more observations are available in the Northern Hemisphere than in the Southern Hemisphere and the equatorial region. TEMP and SYNOP data are particularly dense in China and its surrounding area, especially in East China. However, more AMV data are collected over the ocean than over land, and most of the AMV data are concentrated between 608S and 608N. GNSS-RO and AMSU-A data are widely distributed over land and the ocean. Table 1 lists the number of assimilated observations for each observation type shown in Fig. 1. Although the vertical resolutions of TEMP and GNSS-RO data are higher than other observations, there are major differences between the two. TEMP data only cover land and contain three observation types (wind, temperature, and humidity) that can be assimilated, whereas GNSS-RO data are distributed over both land and ocean and are assimilated as the refractivity information. Thus, there are more TEMP data than GNSS-RO data in the Northern Hemisphere, and the opposite is found in the Southern Hemisphere.

Stability and rationality of the diagnostics
The randomization method with perturbed observations can only provide approximate estimations. Comparative experiments need to be conducted for any FIG. 1. Horizontal distributions of all assimilated observations at 0000 UTC 20 Jun 2014. The black rectangle denotes the target verification domain (08-558N, 708-1368E). TEMP, observations from radiosondes; SYNOP, surface pressure from land stations and ships; AIREP, wind data from aircraft measurements; GNSS-RO, refractivity data of radio occultation from the Global Navigation Satellite System (GNSS); AMV, atmospheric motion vector; and AMSU-A, satellite microwave sounder radiances.
terms that may affect the results to ensure the stability and rationality of the estimation. These experiments were performed for 0000 UTC 20 June 2014 over the region (08-558N, 708-1368E) with a focus on wind, temperature, and specific humidity at standard isobaric surfaces.

a. Appropriate number of perturbations
When using the Monte Carlo method to estimate the influence-matrix trace (Desroziers et al. 2005), multiple perturbations are required to obtain stable estimation. It is critical to determine the number of perturbations that can ensure the stability of estimation results but also preserve computational resources. In this section, the impact of the perturbation number on the estimation is discussed and the appropriate number of perturbations is determined based on all observations and different observation types. Figure 2 presents the percentage of the estimate of the error variance reduction obtained with various numbers of perturbations compared with the mean estimate obtained with 50 perturbations for all the assimilated observations (Fig. 2a) and different observation-type subsets (Fig. 2b). Here, the average of 50 different estimates is used as a reference, which can be denoted by t 50 , and the mean estimate of k perturbations (k # 50) can be denoted by t k . The closer the percentage of t k in t 50 is to 100%, the more stable the mean estimate t k , and vice versa. Figure 2a displays the percentage results of all the assimilated observations. When k . 10, the mean estimates of all the observations for wind speed and specific humidity are within 610% of t 50 ; when k . 15, the mean estimates for wind, temperature, and specific humidity stay within 65% of t 50 as k is increased. Figure 2b presents the results for various observation types. When the observations are divided into subsets of different types, the mean estimates for each observation subset cannot converge as fast as those for all observations. For k . 15, the mean estimates for these observation subsets, except AMSU-A, are within 610% of t 50 for the same observation subset, but mainly outside 65% of t 50 as k is continuously increased.  2. Percentage of the estimate of the error variance reduction obtained with various numbers of perturbations compared with the mean estimate obtained with 50 perturbations for (a) all assimilated observations and (b) different observation-type subsets at 0000 UTC 20 Jun 2014. The abscissa is the number of perturbed analyses used to obtain the estimate. The numbers shown in each panel indicate pressure levels with the unit of hPa: W, wind speed; T, temperature; Q, specific humidity; and other abbreviations are as in Fig. 1. b. Rationality and flexibility of the diagnostics When the randomization method with perturbed observations are first introduced (Desroziers et al. 2005), it is suggested that the operator L can be a variable transform operator or a space projection operator. In the study of observation contributions to EVR in the verification domain, the operator L consists of variable transform operator (P) and space projection operator (G) simultaneously. Equation (7) can be written as The observation impact study can be conducted in a more flexible way with more details using Eq. (13). For example, in the observation impact study, the observations and the analysis variables affected can be in the same region or in different regions. Using an ideal model, Desroziers et al. (2005) investigated the impacts of observations inside and outside the verification domain. In the present study, three comparative experiments with assimilation of TEMP data were designed (Exp_in, Exp_out, and Exp_all; Table 2) in the operational GRAPES global 3DVAR assimilation system.
The vertical distributions of EVR for different analysis variables in the three experiments are presented in Fig. 3, and clearly shows that the EVR in Exp_out is much smaller than those in Exp_in and Exp_all, although there are 485 sounding stations outside the verification domain. Results in Exp_in are only slightly lower than those in Exp_all, suggesting that observations in the verification domain make the largest contribution to background EVR in the GRAPES 3DVAR system. This supports the findings of Desroziers et al. (2005). The EVR distributions (not shown) also indicate that the observations in the verification domain have the largest impacts on the analysis.
These results are consistent with the 3DVAR theory and indicate that the diagnostic tools are consistent. The findings also reflect the following characteristics of 3DVAR method: the spatial spreading of information from observation points to analysis grids can only be performed in a finite domain surrounding observation points since the B matrix is usually derived from assumptions of homogeneity and isotropy, and without flow-dependent attributes, and the decorrelation length is limited within a few hundred kilometers.

Impact estimation of observations for various regions
At present, GRAPES has been running operationally in the NWPC (CMA). The impacts of various observations in different regions will be helpful for further improvement of assimilation technique and better application of observations. In the following, daily analyses at 0000 UTC throughout June 2014 are used as examples to analyze the impacts of different observation types for various regions. Four groups of experiments were conducted: Exp_nh (the Northern Hemisphere, 208-908N), Exp_sh (the Southern Hemisphere, 208 to 908S), Exp_eq (the equatorial region, 208S-208N), and Exp_cn (China and its surrounding area, 08-558N, 708-1368E). China and its surrounding area were examined because TEMP and SYNOP data are densely distributed and the terrain is very complicated in this region.

1) IMPACTS OF ALL OBSERVATIONS ASSIMILATED
To investigate the impacts of all observations assimilated in the GRAPES 3DVAR system, Figs. 4a-c display the total EVR and the percentage of [Tr(LKHBL T )] compared with the background error variance Tr(LBL T ) for different variables in the three zones at 0000 UTC during 1-30 June 2014. For the wind speed analysis, the maximum EVR of the three zones was mainly between 200 and 300 hPa, which is basically the same as the locations of their maximum values of Tr(LBL T ). Below 300 hPa, the EVR was similar in the Southern and Northern Hemisphere and considerably larger than that in the equatorial region. Above 300 hPa, however, the EVR in the Northern Hemisphere was larger than that in the Southern Hemisphere. From 300 to 200 hPa, the contribution to wind speed grew faster in the equatorial region than in the Southern Hemisphere and thus total EVR in the equatorial region was larger than that in the Southern Hemisphere above 250 hPa. The percentage results (lines in Fig. 4a) show that total EVR in the Northern Hemisphere reached 25%-50% of its Tr(LBL T ), which was the largest of the three regions. Below 250 hPa, the percentage of the Southern Hemisphere was between the percentages of the other two regions, but above 250 hPa, less than the percentages of the two regions. The results presented in Fig. 4a may be related to the high number of observations of SYNOP, TEMP, AMV, and AIREP in the Northern Hemisphere. In addition, the number of AMV (5999) and AIREP (2598) observations in the equatorial region is larger than that of AMV (4753) and AIREP (2293) (Fig. 4d) in the Southern Hemisphere, and AMV data are mainly distributed at 300-200 hPa.
The percentage changes in temperature (Fig. 4b) varied among zones with the largest values in the Northern Hemisphere, followed by the Southern Hemisphere, and the smallest values in the equatorial region. This result is consistent with the number of mass field observations in the three regions.
The total EVR and Tr(LBL T ) for the specific humidity analysis (Fig. 4c) were mainly below 300 hPa. Therefore, percentage results for specific humidity below 300 hPa were further examined. Figure 4c shows that below 300 hPa, where water vapor is concentrated, the minimum percentages of the EVR for specific humidity were in the equatorial region and the EVR percentages in the Southern Hemisphere were slightly larger than those in the Northern Hemisphere. This is probably because there were more GNSS-RO data assimilated in the Southern Hemisphere (11 108) than in the Northern Hemisphere (9126) or because it was summer in the Northern Hemisphere during the study period and there were more uncertainty in the moist field. However, in these three regions, the maximum percentages for specific humidity were less than 20% and the mean values were also less than those for wind speed and temperature. This was mainly because humidity information assimilated in the operational assimilation system were provided by TEMP and GNSS-RO data and less than wind and temperature information provided by various observations.
The results of the wind and temperatures analyses are in good qualitative agreement with previous studies indicating that reanalysis errors are smaller in the Northern Hemisphere than those in the Southern Hemisphere (Kalnay et al. 1996;Kistler et al. 2001). The GRAPES 3DVAR provided a similar finding as ECMWF 4DVAR (Cardinali et al. 2004) that the average contribution of all observations to the analysis of the Northern Hemisphere was larger than that of the Southern Hemisphere. There is a difference regarding observation impacts in the equatorial between the GRAPES 3DVAR and ECMWF 4DVAR that the average contribution in the equatorial regions in the GRAPES 3DVAR was smaller than that in the Northern and Southern Hemispheres, whereas in ECMWF 4DVAR, the average contribution of all observations in the equatorial regions was larger than that in the Northern Hemisphere. This is closely related with the difference in the assimilation method and B matrix used by the two NWP centers. It has been previously indicated that there are more complicated mass-wind balance relationships in the equatorial region than at higher latitudes (Zagar et al. 2004;R.-C. Wang et al. 2015). Thus, the NMC method does not yield good estimates for the desired background error statistics (Errico et al. 2015), leading to inefficient data assimilation in the equatorial region. Therefore, it is necessary to improve the B matrix from the balancerelated and multivariate aspects, use a more complicated assimilation method, or assimilate more observations related to humidity.

2) IMPACTS OF VARIOUS OBSERVATION TYPES
Different observation types have different contributions to the total EVR. Figure 5 shows the percentage of wind speed EVR compared with the total EVR at various isobaric surfaces caused by different observation types at 0000 UTC during 1-30 June 2014. In the Southern Hemisphere at 150 hPa and below (Fig. 5a), in terms of the average contribution to the wind speed from large to small, observation subsets were as follows: AMV, AMSU-A, GNSS-RO, TEMP, SYNOP, and AIREP. The contribution of SYNOP was mainly below 500 hPa, and the largest contributions of GNSS-RO and AIREP occurred at around 200 hPa. Above 150 hPa, only three observation subsets showed important contributions, in descending order: TEMP, AMSU-A, and GNSS-RO. In the Northern Hemisphere (Fig. 5c), the types of observation subsets that contribute to the wind speed analysis were the same as those in the Southern Hemisphere, but the percentages of contributions were different. The percentages of AMV, GNSS-RO, and AMSU-A decreased, and those of SYNOP, TEMP, and AIREP clearly increased in the Northern Hemisphere as the number of conventional observations increased. TEMP data became the largest contributor to the improvement of wind analysis. In the equatorial region (Fig. 5b), AMV made the largest contribution at 150 hPa and below, accounting for more than 42% of the total EVR, with the remainder from TEMP and AIREP. In contrast, above 150 hPa, TEMP accounted for more than 93% of the total EVR. Figure 5b shows that in the equatorial region, the contributions of SYNOP, GNSS-RO, and AMSU-A data were very small, which was different to the findings for the Northern and Southern Hemispheres because of the different dynamic balance constraints (R.-C. Wang et al. 2015) and because of fewer SYNOP, GNSS-RO, and AMSU-A data in the equatorial region (Fig. 4d). This indicates that data density is one of the major reasons for the small impact. For the temperature EVR analysis, in the Southern Hemisphere (Fig. 6a) where conventional observations are sparse, GNSS-RO data were the largest contributor in most cases. The contribution of GNSS-RO increased with height to more than 60% above 500 hPa. The AMSU-A data had the second largest contribution, with a minimum value still larger than 20%. The SYNOP data were the main contributor at the lowest atmospheric levels. The TEMP data had some contribution at all isobaric surfaces but the maximum value was less than 15%. The Northern Hemisphere was different from the Southern Hemisphere because of the higher density of conventional observations. TEMP data were the largest contributor to temperature EVR in the Northern Hemisphere, whereas the second largest contributor was SYNOP data below 500 hPa or GNSS-RO data above 500 hPa. The contribution of AMSU-A data was about 15% at all isobaric surfaces (Fig. 6c). The impacts of AMV and AIREP on temperature were indirect and small in both hemispheres because only wind data were assimilated from these subsets and their impacts on the temperature analysis can only be realized through the cross correlations of the GRAPES 3DVAR B matrix. In the equatorial region (Fig. 6b), the averages of percentage contributions of the observation subsets decreased in the order TEMP . AMSU-A . SYNOP . GNSS-RO below 300 hPa. In contrast, the GNSS-RO becomes the largest contributor to temperature EVR, followed by AMSU-A and TEMP above 300 hPa.
The specific humidity EVR contributions (figure not shown) in the three regions below 300 hPa showed that the same types of observations that contribute to the specific humidity analysis and the orders of the percentage contributions of these observations is similar, with the largest contribution of GNSS-RO, followed by that from TEMP, AMSU-A, and SYNOP. The percentage values of the same type observations in the three regions, however, were different. For example, below 300 hPa, the contributions of TEMP were less than 10% in the Southern Hemisphere, 10%-20% in the equatorial region, and more than 25% in the Northern Hemisphere.
Here, the contributions of TEMP and AMSU-A to temperature supported some of the following findings of the French operational ARPEGE 4DVAR assimilation system (Desroziers et al. 2005). The contribution of TEMP data, in descending order, was largest in the Northern Hemisphere and smallest in the Southern Hemisphere, whereas the opposite was found for the AMSU-A data. No covariances between humidity and other fields were considered in the B matrix of the GRAPES 3DVAR and ARPEGE 4DVAR systems. In the ARPEGE 4DVAR system, if the assimilated observations did not contain humidity information, these observations had no effect on the humidity analysis, such as SYNOP. However, the observation operator of the GRAPES 3DVAR takes it into account that temperature changes can cause humidity adjustments, so AMSU-A and SYNOP data, which do not contain the humidity observation information, have impacts on humidity analysis. In addition, those observations related to mass fields (such as surface pressure, AMSU-A, and GNSS-RO data), can influence the wind analysis in the extra-equatorial region, but have little effect on the wind analysis in the equatorial region because of lacking proper mass-wind dynamic balance relations in this region. These conclusions indicate that the B matrix plays an important role in variational data assimilation systems.

3) IMPACT OF SINGLE OBSERVATION
The number of observations of the same type is different in the three regions, so it is necessary to estimate the impact of single observation. Figures 7a-c shows the mean EVR for the wind speed analysis of single observation of the same type in the different regions. The mean impact of single observation for wind speed in the Southern Hemisphere was always larger than that in the Northern Hemisphere at the same isobaric surface. In conjunction with Fig. 4d, it can be seen that the average number of the same types of observations in the Northern Hemisphere was larger than that in the Southern Hemisphere, except for GNSS-RO and AMSU-A data. These results indicate that low-impact data points occur in data-rich areas and high-impact data points are in data-sparse regions, which is consistent with the findings of ECMWF (Cardinali 2013). In addition, the EVR change of single observation in the Northern and Southern Hemispheres was significantly different from that in the equatorial region, and the impact of single observation in the equatorial region is smaller than that in the Northern and Southern Hemispheres in most cases below 300 hPa, but considerably larger than that in the Northern Hemisphere above 300 hPa. The conclusion above may be related to the dynamic balance constraint scheme of the B matrix and observation density.
Similar features were found in the contribution of a single observation for the temperature analysis in the three regions (figure not shown). It is noteworthy that although the impact of all AMSU-A data for the temperature (Fig. 6) was evident in the three regions, the impact of single AMSU-A was very small, which is consistent with the conclusions from ECWMF (Cardinali 2013). FIG. 6. As in Fig. 5, but for the temperature analysis.

OCTOBER 2018
Z H A N G E T A L . Figure 8 displays the average EVR of single observation for the specific humidity analysis. Although the contributions to the specific humidity analysis again mainly came from TEMP and GNSS-RO data in the three regions, the differences between the contributions of single TEMP and single GNSS-RO were obvious. In the isobaric surface where water vapor is more concentrated (925-500 hPa), the contribution of single GNSS-RO data was considerably greater than that of TEMP in the equatorial region and the Northern Hemisphere where the density of TEMP data is large. In contrast, the contribution of single GNSS-RO was close to the contribution of single TEMP in the Southern Hemisphere where TEMP data are sparse.

b. Situation in China and its surrounding area
Vertical profiles of average EVR by different observation types in the verification domain at 0000 UTC during 1-30 June 2014 are shown in Fig. 9. It is shown that TEMP makes the largest contribution to wind speed EVR and the maximum reduction occurred at 10 hPa, whereas AIREP and AMV made the second largest contribution and their maxima were at around 200 hPa. The above three observation types can reduce the background wind speed error by half. The contribution of observation subsets to temperature EVR (Fig. 9b) decreased in the order TEMP . GNSS-RO . AMSU-A . SYNOP data. After assimilating all observations, the background temperature error variance could be reduced by more than 60% at 700 hPa and above, of which the TEMP data contributed more than two-thirds. For specific humidity EVR (Fig. 9c), TEMP and GNSS-RO data made the largest contributions, mainly below 300 hPa with the maximum reduction occurring at 700 hPa. However, the total EVR reached less than 20% of the background error variance of specific humidity. Figure 10 presents the average EVR of single observation in each observation subset at various isobaric surfaces. Figure 10a indicates that although the TEMP subset made the largest contribution to wind speed EVR, the contribution of single TEMP observation was less than that of single AMV observation in most cases. This is likely because the average density of TEMP in China is larger than that of AMV, and TEMP observations only cover land, while AMV data are more distributed over the ocean. The largest contributions of single TEMP, AMV, and AIREP data all occurred at around 200-250 hPa, which is because the maximum value of wind speed error variance in the B matrix below 50 hPa was concentrated at 200-250 hPa and the horizontal correlation length scale of the related state variables in B was larger at the higher level. Figure 10b suggests that single GNSS-RO observation had the largest contribution to the temperature analysis above 300 hPa and single TEMP observation had the largest contribution below 300 hPa. In the middle and lower troposphere, although the contribution of the SYNOP subset was not distinct, the contribution of single SYNOP was more important mainly because most of the SYNOP observations were concentrated below 700 hPa. The contribution of single observation to specific humidity EVR was smaller than that to wind speed and temperature EVR. Single GNSS-RO observation had the largest contribution to the specific humidity analysis, followed by single TEMP. Single observation of other subsets made little contribution to the specific humidity analysis.

Conclusions and discussion
Observation impacts on NWP is highly dependent on assimilation algorithm, prediction system, and observation sources. In the present study, the EVR estimation method proposed in Desroziers et al. (2005) was applied in the global GRAPES operational 3DVAR system to evaluate impacts of observations on the analysis results for various regions of the world. The major conclusions are as follows.
In the GRAPES global 3DVAR operational system, 16 perturbed analyses are necessary to obtain a stable estimation of the EVR and also sufficient for estimating the EVR of every observation type subset. To investigate the impacts of all assimilated observations on the analysis, the total EVR was compared with the background error variance in different regions. For the wind FIG. 8. As in Fig. 7, but for the specific humidity analysis.

OCTOBER 2018
Z H A N G E T A L .
speed and temperature analysis, the percentage of total EVR was larger in the Northern Hemisphere than that in the Southern Hemisphere, because there was a greater amount of TEMP, AIREP, and SYNOP data in the Northern Hemisphere. This result is consistent with Kalnay et al. (1996) and Kistler et al. (2001) where it is found that the reanalysis fields of the Northern Hemisphere are more accurate than those of the Southern Hemisphere and similar conclusions have been also obtained by ECMWF (Cardinali et al. 2004). However, for the specific humidity analysis, the total EVR percentage of the Southern Hemisphere is larger than that of the Northern Hemisphere at the isobaric surfaces where water vapor is concentrated, and the percentage of the equatorial region is the smallest in most cases.
The contributions of different observation types to the total EVR were investigated in the global GRAPES operational 3DVAR system. In the Southern Hemisphere, 60%-90% of the EVR for wind speed, temperature, and specific humidity was attributed to space-based GNSS-RO, AMSU-A, and AMV data in most cases. In the equatorial region, more than 50% of the EVR was attributed to space-based observations except for a few isobaric surfaces. In the Northern Hemisphere, however, more than 50% of the EVR originated from TEMP, AIREP, and SYNOP data contributions; the contribution of AMSU-A data was not dominant. The results show that GNSS-RO and TEMP data were the most important observation types in GRAPES, which agrees with the results of OSEs   and other verification methods (Liu et al. 2016). For temperature analysis, the contribution of TEMP data in the Northern Hemisphere was greater than that in the Southern Hemisphere, whereas the opposite was shown for AMSU-A data, which was consistent with the findings under the French operational ARPEGE 4DVAR assimilation system (Desroziers et al. 2005).
The estimation of impact of single observation indicated that low-impact data points occurred in data-rich regions, whereas high-impact data points were located in data-poor regions. Although the AMSU-A data have a big impact on the temperature analysis, the impact of a single AMSU-A observation is very small. The conclusions are consistent with the findings from ECWMF (Cardinali 2013).
In China and its surrounding area, TEMP made the largest contribution to the analysis, followed by GNSS-RO, whose contribution to specific humidity EVR was equivalent to that of TEMP.
All experiments in this paper were conducted under the assumption that the observation error and background error covariance are reasonably close to their true values. Some sensitivity experiments were performed. When the observation error variances of all observations were magnified 3 times and were not optimal, the contribution of every observation subset was smaller than that obtained with the original observation error variances, but the order of contributions of different observation subsets was unchanged (figures not shown). Brousseau et al. (2014) found that when climatological B was used in their 4DVAR assimilation system, the EVR nevertheless provided reasonable information for the analysis. These studies indicated that when B and R are not optimal, the EVR obtained with the randomization method can still provide useful diagnostic information about the assimilation process, but the closer B and R get to the truth, the more accurate the estimated EVR. In this paper, many diagnostic results illustrate the weaknesses of the operational GRAPES system. First, the contributions of all observations in the equatorial region were less than those in the extra-equatorial region. This may be because there are more complicated balance relationships in this region, but these important balances may not be captured by the B matrix in 3DVAR, and it is thus difficult for the observations to have large impacts on the analysis field in the equatorial region. Second, the contribution of satellite data in the GRAPES 3DVAR operational system is very small, compared with the diagnostic results in the other operational centers (Kistler et al. 2001;Kelly and Thépaut 2007;Cardinali 2013), especially in humidity analysis. These results are consistent with the diagnostic results obtained based on other verification methods in GRAPES 3DVAR (Liu et al. 2016). Therefore, future work is needed to improve the representation of background error and observation error covariance matrices, especially in the equatorial region, in order to extract maximum information from the observations. In addition, it is essential to increase the number and type of satellite data under cloud conditions assimilated in the global GRAPES operational system. It must be emphasized that the order in which the observations affect the analysis is not completely consistent with the order in which the observations affect the forecast, which has been pointed out in previous studies (Langland and Baker 2004;Cardinali , 2013. Diagnosing will be performed in the future to evaluate the impacts of observations on forecasts under the global GRAPES system.
In summary, although there are fewer observations assimilated in the global GRAPES operational system compared with other NWP operational centers, the above diagnostic results are very positive for the development of the GRAPES assimilation system.