The authors suggest a three-parameter bounded distribution from the family of mixed gamma distributions for characterizing the probability density distributions of fractional total and low cloud cover over the global oceans. The authors derive both the continuous form of this distribution and its discrete counterpart, which can be directly applied to cloud cover reports. The distribution is applied to the cloud cover characteristics reported by voluntary observing ships (VOS) for the period from 1950 to 2011 after filtering nighttime observations with poor lunar illumination. The suggested distribution demonstrates a high goodness of fit to the data and good skill in capturing probability distributions with different shapes. The authors present seasonal climatologies of the parameters of the derived distribution for the chosen 60-yr period and demonstrate that applying the PDF-based concept to the analysis of cloud cover allows identification of areas where similar mean cloud amounts can be produced by probability distributions with very different shapes. The roles of the different parameters of the distribution in producing the observed cloud conditions in different regions of the World Ocean are discussed. The application of the derived probability distribution allows for accurate estimation of the percentiles of the distribution, which represent the probabilities of specific cloud conditions. These probabilities are presented for both total and low cloud cover, as well as for daytime and nighttime. The authors also discuss the applicability of the suggested distribution for the validation of different cloud cover data products over the globe and the prospects of additional applications.
Accurate estimation of cloud cover over the ocean is critically important for many applications. Clouds largely determine shortwave and longwave radiation fluxes over the ocean, and the cloud fraction is a key parameter of many parameterizations of radiative fluxes (Zillman 1972; Reed 1977; Dobson and Smith 1988; Malevsky et al. 1992; Gulev 1995; Josey et al. 2003; Fairall et al. 2008; Dupont and Haeffelin 2008; Kalisch and Macke 2008, 2012; Hanschmann et al. 2012; Bumke et al. 2014). Clouds play an important role in regulating key climate feedbacks, representing one of the most effectively used parameters for diagnosing the climate sensitivity (Curry et al. 1996; Bony et al. 2004, 2006, 2015; Fasullo and Trenberth 2012; Taylor et al. 2013; Bellomo et al. 2014). Clouds determine the earth’s albedo (Stephens et al. 2015), as well as the occurrence of dimming and brightening periods (Wild et al. 2007; Dallafior et al. 2015). Variability in cloud cover over the oceans is important to enable a better understanding of the mechanisms underlying leading climate modes, such as El Niño, monsoons, ITCZ shifts, midlatitudinal oscillations (i.e., the NAO, PNA pattern, and PDO), and Arctic climate changes, all of which have cloud cover signatures (Lietzke et al. 2001; Bedacht et al. 2007; Previdi and Veron 2007; Kay and Gettelman 2009; Chiacchio and Wild 2010; Chiacchio et al. 2010; Mendoza et al. 2014; Parding et al. 2014; Radley et al. 2014; Schwartz et al. 2014; Chernokulsky et al. 2017). Analysis of cloud cover over the oceans may also help us to better quantify and understand the mechanisms behind regional air–sea interactions in the western boundary currents (Minobe et al. 2008; Hand et al. 2014) and the upwelling regions (e.g., Hernández et al. 2012).
All mentioned processes are associated not only with the changes in the mean cloud amount (the measure that is widely used), but also with changes in the probability distributions of cloud cover. In the midlatitudes, the same monthly mean can originate from very different cloud regimes: it can be a continuous period of broken clouds (40%–60% cloud fraction) or short, alternating periods of clear skies and overcast. In this case, monthly mean (which can be similar in both cases) will tell a little about the changes and underlying mechanisms. Similar examples can also be found in the tropics and other areas.
To derive probability distributions of cloud cover, different data can be used. Historical visual observations made by voluntary observing ships (VOS) are available from the International Comprehensive Ocean–Atmosphere Data Set (ICOADS; Woodruff et al. 2011; Freeman et al. 2017). Global cloud data are also available from different satellite missions assembled in the International Satellite Cloud Climatology Project (ISCCP) dataset (Rossow and Schiffer 1991, 1999; Rossow et al. 1991; Rossow and Dueñas 2004), as well as in other datasets (Ackerman et al. 2008; Frey et al. 2008; Foster and Heidinger 2013; Karlsson et al. 2013; Stubenrauch et al. 2013; Heidinger et al. 2016; Holz et al. 2016). Satellite data are quite accurate and relatively homogenous in terms of sampling, but they are generally not available before 1984. Moreover, these data also include a number of biases and random errors (Evan et al. 2007; Cermak et al. 2010; Norris and Evan 2015). Cloud cover reanalysis products have become more accurate as models have advanced and achieved higher resolutions; however, they still remain inaccurate and require further improvements and validation (Bedacht et al. 2007; Chernokulsky and Mokhov 2012; You et al. 2014; Free et al. 2016; Dolinar et al. 2016; Calbó et al. 2016; Enriquez-Alonso et al. 2016; Liu and Key 2016).
Visual VOS observations provide the longest record, formally going back to the mid-nineteenth century. These data are characterized by spatially and temporally heterogeneous sampling (e.g., Gulev et al. 2007a,b) and are subject to relatively high observational errors that can only be partially corrected for (see, e.g., Warren et al. 1986; Hahn et al. 1995; Hahn and Warren 1999; Norris 1999). When these datasets are applied for different climate purposes, including those mentioned above, their intercomparison starts to become an important issue. A similar problem appears with the analysis of the model experiments (historical or scenario). So far, most comparative assessments, including those cited above, are limited to the mean cloud amounts, sometimes with a consideration of the low cloud amounts. Thus, accurate intercomparisons and diagnostics of model experiments also would benefit from the explicit knowledge of probability distributions of cloud cover. Moreover, probability distributions, when properly justified, can help to quantify and potentially correct the biases in observational datasets. This relates first to sampling uncertainties (Gulev et al. 2007a,b) and observational errors (Hahn et al. 1995; Hahn and Warren 1999; Norris 1999) in VOS observations, but is also relevant to satellite data (Norris and Evan 2015). Finally, the use of probability-based approaches to the analysis of cloud cover is very natural, as the observed cloud fraction always represents a mixture of different weather regimes that are closely associated with cloud regimes (Hahn et al. 2001; Rossow et al. 2005; Gordon et al. 2005; Gordon and Norris 2010; Haynes et al. 2011; Mason et al. 2014; Rémillard and Tselioudis 2015). Remarkably, analysis of the probability density functions (PDFs) of different cloud-associated variables is widely used for building parameterizations of clouds (e.g., Settle and van de Poll 2007; Alexandrov et al. 2010; Kawai and Teixeira 2012).
With all these issues in mind, we considered a justification of a widely applicable probability distribution of cloud cover to be a challenging task. Falls (1974) was likely the first to apply a two-parameter beta distribution to fractional cloud cover. Later, Bean and Somerville (1981) applied the beta distribution to a global, 45-month-long satellite dataset and described the global distribution of the distribution parameters. However, the beta PDF is bounded by 0 and infinity at 0 (clear skies) and/or 1 (overcast conditions). This property does not fit the observed frequency distributions of cloud cover fraction, especially if clear-sky and/or overcast conditions coexist with moderate cloud fractions. Thus, the development of a universal probability distribution that accounts for all possible conditions is challenging and allows for another dimension in the analysis of cloud cover, as with other meteorological variables, such as wind speed (Monahan 2006a,b) and turbulent fluxes (Gulev and Belyaev 2012).
Here, we present a new probability distribution of the fractional cloud cover values reported by VOS. In section 2, we describe the shortcomings of visual cloud observations and demonstrate the importance of the development of cloud cover PDF by showing several examples. In section 3, we present a three-parameter continuous distribution and its discrete analog, which can be applied to visual observations of cloud cover reported in octas. Section 4 presents the estimated parameters of the distribution over the global ocean. Section 5 concludes the paper and presents a discussion of potential applications of the probability distributions in the analysis of fractional cloud cover.
2. Visual cloud cover data over the ocean
a. Data description and preprocessing
We used cloud amounts for the period 1950–2011, estimated visually by VOS officers, that are available as part of the ICOADS (Freeman et al. 2017). The sampling density of cloud characteristics in VOS (Fig. 1a) is similar to that of other meteorological variables (Gulev et al. 2007a,b). The midlatitudes of the North Atlantic and North Pacific, as well as the major shipping routes, are characterized by the sampling densities from 500 to more than 1000 reports per season per 5° grid cell. The sampling density drops considerably south of 40°S, where the seasonal number of observations per season per 5° cell varies from less than 5 to 30; approximately 1.2% of the 5° grid cells are unsampled 20% of the time. The number of reports increased by 4–5 times during the 1950s and 1960s and remained relatively stable from the late 1960s to the late 1980s (Fig. 1b), as for most of the VOS variables (Gulev et al. 2003, 2007a; Freeman et al. 2017). In the 1990s, the number of reports decreased due to the massively widespread use of automated packages on VOS that measure wind speed, pressure, air temperature, and SST. The portion of the reports that also include information on low cloud cover increased from 80% to 90% during 1950–1980 and then dropped to 75%–80% (Fig. 1b). However, for most regions, analysis of the low cloud cover can be performed, and the accuracy matches that of the total cloud cover closely.
In addition to the ICOADS quality control checks (Woodruff et al. 2011; Freeman et al. 2017), we excluded from the analysis all cases in which the other observed parameters suggest unrealistic cloud cover (e.g., when precipitation under clear skies was indicated). Additionally, cases in which the low cloud amount was reported to be higher than the total cloud amount were excluded from the analysis. Visual cloud cover is estimated in fractions of unity ranging from 0 to 1. In practice, VOS report cloud cover in octas (eighths) ranging from 0 to 8, according to the WMO Manual on Codes (WMO 1974). Some national meteorological services have used a coding system based on tenths instead of octas during certain periods. The conversion between the two systems is accomplished using the coefficient 1.25. WMO (1974) also includes the code “9” (“sky obscure, cloud cover cannot be observed”). In parameterizations of shortwave radiation (Lumb 1964; Lind et al. 1984), these conditions are considered to be close to overcast. Here, we also consolidated all reports with octa 9 in cases when precipitation was reported by the ICOADS weather code, with those reporting octa 8. Resetting N = 9 to N = 8 resulted in an increased number of reports with overcast by less than 2%, with major contribution in the subpolar regions and in the Southern Ocean, where, locally, an increase can be higher than 5%. The nighttime observations of cloud cover are less accurate than those made during daylight hours. Hahn et al. (1995) first suggested an algorithm for filtering observations performed under insufficient lunar illumination. It is based upon the moonlight brightness model of Krisciunas and Schaefer (1991) and the lunar phase functions (Hapke 1963, 1971). We adopted this algorithm in our computations, using local time and the associated solar altitude to identify the nighttime hours. Following Hahn et al. (1995), we applied a solar angle of <−9° and the relative lunar illumination of ≤0.11 to filter potentially erroneous reports. This resulted in the elimination of 34.4% and 31.7% of the total number of reports in January–March (JFM) and July–September (JAS), respectively. These numbers are a little bit higher, compared to those of Hahn et al. (1995). However, Hahn et al. (1995) considered somewhat different definitions of the seasons [December–February (DJF) and June–August (JJA)] and also a much shorter period of observations. For instance, the fraction of “night” reports in our dataset is also higher, compared to Hahn et al. (1995).
We estimated the cloud cover statistics and probability distributions for 5° grid cells north of 40°S. While the chosen spatial resolution is coarser than the 1° or 2° grid cells traditionally used for the analysis of VOS data (Josey et al. 1999; Gulev et al. 2007a,b; Berry and Kent 2009; Gulev and Belyaev 2012), it guarantees a higher number of observations available for the analysis (Fig. 1). Note that previous analyses of cloud data from VOS (Warren et al. 1986; Hahn and Warren 1999; Eastman et al. 2011; Eastman and Warren 2014) were performed for 10° grid cells in order to provide a sufficient number of observations for the estimation of the occurrence of different cloud types. Gulev et al. (2013) used 5° cells in the North Atlantic to estimate the variability in surface fluxes over the last century. However, even this coarse resolution is not appropriate south of 40°S. Here, we used a grid with a much coarser resolution (10° × 20° cells; Fig. 1). A physical argument for this choice can be made, given the large spatial correlation of cloud cover and other atmospheric parameters in this area. Analysis of spatial correlation functions for the Southern Ocean (not shown) shows that significant correlation holds on the distances equivalent to 10° in the longitudinal direction and to 15°–20° in the latitudinal direction. Romanou et al. (2006) reported for the Southern Ocean an ~2000-km decorrelation spatial scale for the turbulent heat fluxes whose space–time variability is comparable with that for the cloud cover. Also, Schmidt et al. (2017) applied their comparative assessment of glider wind data with gridded products to the area of about 20°. Moreover, due to the proximity to the pole, grid cells in high latitudes are considerably smaller than those in mid- and low latitudes (for the same size in degrees). Hahn and Warren (1999) used twice as large grid cells of 50° when analyzing the Extended Edited Synoptic Cloud Reports Archive (EECRA). Also, the ISSCP grid has larger cells in high latitudes (Rossow et al. 1991). Nevertheless, our results for this region should be considered with caution.
All of the computations were performed for the individual seasons [JFM, April–June (AMJ), JAS, and October–December (OND)]. The climatologies were then derived by averaging the period of 1950 to 2011. This approach accounts for strong temporal changes in observational density (Fig. 1). Also, estimates for individual years allow for the further analysis of variability and a proper performance of comparative assessments. This approach is also commonly accepted for the development of long-term climatologies of marine variables and fluxes with inhomogeneous sampling (e.g., Josey et al. 1999; Gulev et al. 2003, 2007a,b; Berry and Kent 2009).
b. Cloud cover characteristics from the raw VOS data
Visual cloud cover data have been used to assess the climatology and long-term variability in fractional cloud cover and different cloud types (Warren et al. 1986; Norris 1999; Eastman and Warren 2010, 2014; Eastman et al. 2012). Our climatologies of the fractional total cloud cover for JFM and JAS, shown for reference in Figs. 2a and 2b, are generally consistent with the results of earlier analyses (e.g., Warren et al. 1986; Bedacht et al. 2007). The tropical minima are characterized by seasonal means of 1.5–3 octas, and the maxima of 6.5–7.5 octas are observed in midlatitudinal and subpolar regions. The seasonal increases in total cloud cover from winter to summer in the corresponding hemisphere are approximately 0.5–1 octa, which is also consistent with previous assessments. The smallest seasonal cloud cover (0.5–2.5 octas) is observed over the Mediterranean in summer. Low cloud amounts (Figs. 2c,d) are smaller than the total cloud cover by 0.5–1 octa in the tropics and subtropics and by 1–1.5 octas in the middle and subpolar latitudes. The difference between the amounts of total and low cloud cover is somewhat larger during the summer season of each hemisphere.
Figures 3 and 4 show several examples of the probabilistic analysis of the raw VOS data for individual years. Figure 3 shows histograms built for the two regions in the Labrador Sea and Pacific warm pool for selected years for JFM. In the Labrador Sea, winter cloud regimes are dominated by close-to-overcast conditions, with the persistent occurrence of broken clouds and small cloud amounts during the periods of the active Beaufort Sea high. The year of 1969 (Fig. 3a) was characterized by extremely high sampling because Ocean Weather Station B (OWS B; 56.5°N, 51°W) contributed 224–248 reports per month until 1973. However, after the termination of OWS B in 1973, the sampling in this region dropped to 5–30 reports per month per 5° box. Remarkably, the histogram for the year of 2008 does not effectively represent the frequency distributions of the fractional cloud cover due to an insufficient number of observations, while the histogram for 1969 is quite representative. Similarly, in the Pacific warm pool (Fig. 3b), the histogram for the poorly sampled year of 2009 is not representative, contrasting with the histogram for the well-sampled year of 1997.
The problem identified in Figs. 3a and 3b is clearly reflected in Figs. 4a–c, showing the probability of occurrence of selected cloud cover categories for JAS 2006. In most regions outside of the major shipping routes, the distribution is affected by strong spatial noise, which makes the climatological pattern quite obscure and hardly interpretable. When the seasonal sample size is small, empirical histograms can tell very little about the actual frequency distribution, frequently demonstrating anecdotal values for the occurrence of cloud categories. Importantly, in many regions, even a very limited number of observations is unevenly distributed over time (e.g., month or season). Thus, 10–15 observations per season in a 5° box are frequently provided by the two ship passes of 18 to 30 h each through the box (Gulev et al. 2007a). Thus, even these limited numbers of reports are consolidated into one or two very short time slots, which are not representative for characterizing changes of cloud regimes.
Figure 5 shows examples of empirical histograms of the fractional cloud cover in octas reported by VOS for several large ocean regions with different cloud regimes. These histograms were built using data from the 1980s, characterized by relatively homogenous sampling (Fig. 1), and demonstrate the highly variable character of the frequency distributions of cloud cover. The regions of western boundary currents (the Gulf Stream and the Kuroshio) in JFM reflect an increased occurrence of overcast skies, with a limited occurrence of moderate cloud cover conditions and of clear skies (Fig. 5a). Over the Benguela Current and Agulhas leakage regions in the southeastern Atlantic in austral summer (Fig. 5b), the histograms show an increase in the occurrence of clear-sky conditions. The convex shape of the distribution is typical for the western tropical Atlantic (Fig. 5c) and Hawaiian shallow cumulus region (Fig. 5i). In the Gulf of Aden and in the Northern Arabian Sea (Fig. 5d), the distributions are dominated by a higher occurrence of clear-sky conditions and small amounts of cloud cover. Remarkably, in the regions dominated by the Indian monsoon circulation, the histograms display strong changes from the winter (Fig. 5e) to summer (Fig. 5f) seasons. The eastern ocean subtropics, characterized by stratocumulus cloud regimes (Fig. 5h), also demonstrate a decrease of occurrence from large to small cloud fractions. A similar shape, though with higher frequency of the moderate cloud cover, is found in the western Pacific (Fig. 5g), characterized by tropical convection.
3. Probability distribution of fractional cloud cover over the ocean
The occurrence histograms analyzed above for the different regions imply that the distribution of fractional cloud cover can be modeled by the following family of continuous distributions:
where P(x) is a probability density function. The variable x represents fractional cloud cover distributed on the [0, 1] interval, and α and β are, respectively, the shape and the scale parameters. Additional steering parameter θ controls the concavity and the convexity of the PDF and is equal in (1) to
This parameter can be estimated as
where p(0) and p(1) represent estimates of the probabilities of clear-sky and overcast conditions, respectively, and can be computed from observations. The normalizing constant C can be obtained from the continuous form of the distribution (1) and is given by
where is the incomplete gamma function
We note that (1) implies that the probability of a cloud fraction of 0.5 (50%) does not depend on θ, since the terms (1 − x) and x in (1) are equal for a fraction of 0.5. A detailed derivation of the normalizing constant C(α, β) is given in appendix A. In fact, Eq. (1) represents a mixture of two incomplete gamma-type distributions, in which the argument ascends and subsequently descends over the interval [0, 1]. In this sense, θ can be interpreted as the parameter that controls the mixing of the two incomplete gamma distributions, derived with respect to their ascending and descending arguments. Different mixtures of gamma distributions with applications to the analysis of queues with Markovian arrivals and general service problems have been considered by, for example, Dey et al. (1995) and Wiper et al. (2001). Integration of the PDF (1) implies the following cumulative distribution function (CDF):
The expression for the mean is as follows:
Detailed derivations of the distribution moments are presented in appendix B. Figure 6 shows examples of the PDF (1) for different values of the parameters α, β, and θ. Under fixed β and θ, parameter α controls the concavity and convexity of the distribution. With increasing α, the convexity of the PDF evolves to concavity (Figs. 6a–c). Parameter β controls the skewness of the distribution for both convex and concave forms (Figs. 6d,e). As predicted, the parameter θ controls the change in the PDF shape according to the ratio between the probabilities of clear skies and overcast. Figures 6g–i demonstrate that when changing θ from 0 to 1, the probabilities of high and low cloud amounts change accordingly. Figure 6 clearly shows that the proposed PDF allows for practically all shapes of the observed distributions and gives finite values at x = 0 and x = 1, thus contrasting with the beta distribution, which has been applied to the analysis of cloud cover by Falls (1974), Bean and Somerville (1981), and others.
Since the cloud cover in VOS is given in octas, in order to apply the new distribution to the VOS observations, we must derive a discrete analog of (1). The following equation for the probability mass function (PMF) gives the discrete form of the PDF (1):
where k is an integer (an octa; 0 to 8) instead of the fractional x used in (1). The denominator , with n being the number of discrete bins (octas). Similarly, in cases where tenths (0, 1, … . , 10) are used, the denominator equals 11. For the mean, the discrete analog (8) gives
While the parameter θ of the distributions given in Eqs. (1) and (8) can be easily estimated from the raw data according to (3), estimation of the parameters α and β of the distribution is not easy. For now, we use the descent minimization procedure to estimate α and β. In this procedure, we minimize over all bins ki = 0, …, 8, the value
where υi represents the estimate of p for the same bin ki derived from the data. The sum (10) is further considered to be a function of the unknown α and β and the already-known θ. The minimum of this function is found by sequential descents starting from the initial values α0 and β0, first with respect to α and then with respect to β. This procedure repeats in a loop until the difference between the two consecutive iterations drops below a small value (e.g., 10−5), which implies accurate estimation of α and β. While in practice this procedure converges rapidly for all cases, with the number of paired iterations being less than 3–4, there is presently no proof that this procedure must converge. Note that changing the order of descents, that is, starting iteration first with respect to β and then with respect to α, results in a negligible difference in estimation of the coefficients. Application of the chi-square test does not distinguish the resulting mass functions even at the 99.9% level. Further details of this procedure are given in appendix C.
Figures 7a–c show the mean cloud cover and probabilities of different cloud fractions implied by the distribution (1) as a function of the shape and scale parameters, given a fixed value of the parameter θ. While the diagrams shown in Figs. 7a–c were derived for the continuous form (1), for simplicity of interpretation, we show the means in octas and probabilities for the different octa categories. With an increase in the mean under given θ, the probabilities of both clear-sky (octa 0; Fig. 7a) and overcast conditions (octa 8; Fig. 7c) increase. This increase is associated with increasing α and decreasing β parameters. The maximum probabilities of the fractions 25% (octa 2), 50% (octa 4), and 75% (octa 6) are associated with high values of the scale parameter linked to the shape parameter, increasing toward higher cloud fractions. This example shows that the same mean cloud cover can be produced by very different parameters of the distribution and that the probabilities of different cloud fractions can also vary significantly under the same mean. To illustrate this, we show in Figs. 7d–f several forms of distribution (1) characterized by different values of the parameters but implying the same means. Importantly, the same mean values can be observed under both convex and concave forms of PDFs and can be associated with different occurrences of different cloud categories (e.g., in Figs. 7d,e).
4. Characteristics of the statistical distribution of cloud cover over the ocean
a. Fitting the distribution to the data
Before analyzing the characteristics of the parameters of the proposed probability distribution, we first address the issue of the goodness of fit of this distribution. Approximations of the empirical histograms by the derived three-parameter distribution (8) for several regions (Fig. 5) show that the distribution allows for very effective approximation in practically all cases of empirical distributions. For a discrete analog (8), we applied the chi-square test to quantify the goodness of fit. In 97% of the cases tested, the chi-square test implies a goodness of fit of the distribution (8) that is higher than 90%, and in 92% of the tested cases, the goodness of fit exceeds 95%. Figures 8a and 8b show the percentage of the number of years for which the goodness of fit (chi-square test) exceeds 99%. Over most of the middle and subpolar latitudes, the empirical distributions are accurately approximated by the derived distribution (8), and nearly all years show goodness-of-fit values that exceed 99%. Lower goodness-of-fit values are observed in the tropics of the central Pacific and in the midlatitudes of the eastern Pacific, where the number of years with 99% goodness of fit drops to less than 70% in places. However, even in this area, more than 85% of the years examined yield goodness-of-fit values of 95% or greater (figure not shown). Lower goodness-of-fit values in the Southern Ocean during austral winter (Fig. 8b), compared to austral summer (Fig. 8a), can likely be explained by the very sparse observational density.
Besides the goodness of fit, we also analyzed the applicability of the alternative distributions. For this purpose, we adopted the Neyman–Pearson criterion for testing the hypothesis H1 that an alternative distribution fits better to the data than the analyzed mixed gamma distribution (MGD). The competitive hypothesis H0 is that the data imply mixed gamma distribution. According to the Neyman–Pearson theorem (e.g., Borovkov 1984), testing the hypothesis H0 with respect to H1 is based on the criterion
where palt is the competitive distribution, and pMGD is the suggested MGD. The H0 hypothesis is rejected if the probability
where ε is the p value corresponding to the chosen significance level (e.g., 90% or 95%). Analyzing these statistics at a given significance level, we either accepted or rejected the null hypothesis that the fractional cloud cover is distributed according to MGD. As alternatives, we considered the mixed beta distribution and the mixed power distribution, and the latter was found to be more competitive with MGD, compared to the mixed beta distribution. Mixed power distribution was considered in the following form (for continuous case):
where , γ, u, and υ are the parameters of distribution. Figures 8c and 8d demonstrate the number of years (for JFM and JAS) in which the alternative mixed power distribution is as effective as MGD at the 95% level. The areas where the effectiveness of both distributions is competitive are associated with the Pacific tropics (in both JFM and JAS), as well as with the western equatorial Indian Ocean and the western tropical South Atlantic in JFM. In neither location did an alternative distribution demonstrate a better effectiveness, compared to MGD, for more than 50% of years, and only in very few locations does this hold for 30%–50% of years. Similar analysis performed for the mixed beta distribution shows that our distribution is more effective, according to the Neyman–Pearson criterion, during more than 94% of years everywhere.
b. Probability distribution characteristics: Total cloud cover
Figure 9 shows the climatological values of the parameters of the distribution (8) of the total cloud cover for JFM and JAS. The shape parameter α (Figs. 9a,b) varies from 0.5 to 1.5 in the tropics and equatorial regions, where these values are associated primarily with convex distributions (Fig. 5). High values of α of 3–6 occur in the middle and subpolar latitudes and over the upwelling regions, where PMFs are primarily concave. This reflects the steering role of α on the shape of distribution (convex vs concave) in agreement with Figs. 6a–c. The scale parameter β (Figs. 9c,d) resembles a reversed spatial distribution of the mean total cloud cover, with the smallest values observed in the midlatitudes of the Northern Hemisphere (β < 0.1) and the highest values (β > 2) occurring in the tropics, where the lowest values of total cloud cover are observed (Fig. 2). For the concave cases, smaller β implies the stronger concavity of distribution (Figs. 6e,f), which is the case for the North Atlantic and the North Pacific midlatitudes (Fig. 5). Remarkably, the scale parameter β does not go to very small values in the Southern Ocean (β values span from 0.1 to 1), as would be expected, given the high cloud amounts in that region (Fig. 2). However, our results over the Southern Ocean may be influenced by poor sampling and should be considered with caution. During JFM, a local maximum in the scale parameter (up to 3) occurs in the region of the Somali Current, characterized by the PMF with small concavity (Fig. 5d) and corresponding to the evolution of β shown in Fig. 6e. The steering parameter θ (Figs. 9e,f) varies from 0.02–0.1 in areas with high total cloud cover to values exceeding 0.5 in regions with relatively small cloud cover. Its local maxima of 0.8 are observed in the regions with low mean cloud cover in the Arabian Sea and the Bay of Bengal in JFM and over the northwestern Australian coast and the Mediterranean in JAS. We note here that combining reports showing octa 9 and precipitation with those showing octa 8 has a visible effect only on the α parameter in the subpolar regions and only in JAS, when, locally, α may increase by 15%, potentially changing convex distributions toward concave ones (no figure shown).
The suggested distribution allows for the robust estimation of the probability of a given fractional cloud cover directly from the PMF (8) and for the further association of these probabilities with the parameters of the distribution. Figure 10 shows the probabilities of different cloud conditions in JFM and JAS derived from the distribution (8) after filtering observations with a poor lunar illumination. The probability of clear skies (Figs. 10a,b) over most of the ocean is below 5%. The maximum probability (30%–50%) is observed over the Arabian Sea, the Bay of Bengal, and the eastern ocean tropics, with the absolute maximum of approximately 70% being observed in the eastern Mediterranean. This is consistent with the spatial pattern of the θ parameter (Figs. 9e,f), which also controls seasonal changes (e.g., over the eastern ocean tropics). The highest probability of overcast conditions (octa 8; Figs. 10c,d) over the subpolar regions of both hemispheres exceeds 70% in the summer season of the corresponding hemisphere and amounts to 50% in the winter. The minimum probabilities (less than 10%) are associated with the tropical oceans. Besides the θ parameter, the probability of overcast is also controlled by the β parameter, especially in the case of the concave distributions in the midlatitudes, as smaller values of β imply the increasing concavity and the growing occurrence of overcast (the inverse case is shown in Figs. 6e,f). The probabilities of octa 4 (50% cloud cover; Figs. 10e,f) are smaller than 10% in the subpolar regions and increase slightly in the midlatitudes and tropics, and the highest probabilities (15%–20%) are observed in the western subtropics of both hemispheres. This pattern is closely associated with the changes in the α parameter, controlling the shape of the distribution (Figs. 6a–c). Importantly, the probabilities derived from the proposed MGD are considerably more robust, compared to those derived from the raw data, especially in poorly sampled areas. Figures 4d–f clearly demonstrate this message for the same year of 2006 (JAS), for which the occurrences were derived from the raw VOS data (Figs. 4a–c). The pattern for the individual year becomes less noisy and much more easily interpreted. It also reveals quantitative differences to that reported by raw VOS, especially in poorly sampled regions (e.g., in Figs. 4c,f).
c. Analysis of diurnal differences
Cloud cover holds a pronounced diurnal cycle (e.g., Bergman and Salby 1996; Eastman and Warren 2014), which has serious implications for daily temperatures and radiative forcing (e.g., Betts et al. 2013). Analysis of probability distributions for day and night conditions may provide new insights for understanding diurnal variability in cloud cover. Daily differences in the total cloud cover derived from the data after removing reports with insufficient moonlight (section 2) are shown in Fig. 11a for JAS, when the magnitude of the “day minus night” differences is the largest. Higher daytime than nighttime cloud amounts are found in the western ocean tropics of the Northern Hemisphere and in the equatorial Indian Ocean, where the differences amount to 0.3–0.5 octa. Higher nighttime total cloud cover is observed in the eastern ocean tropics and subtropics, with the maximum magnitudes exceeding 0.5 octa in the eastern equatorial Pacific. The continuous pattern of the positive differences (0.2–0.5 octa) in the Southern Ocean can be attributed to very small amount of data here, especially in JAS. An alternative explanation could be the use of larger (20° × 10°) grid cells in this region. We tested this hypothesis by using averaging over 20° × 10° grid cells for the global ocean (no figure shown); however, this did not change the pattern of positive differences presented in Fig. 11a. The pattern of the differences for JFM (not shown) reflects similar structure following seasonal ITCZ migrations. Associated changes in the α and β parameters (Figs. 11b,c) show that the negative differences are associated with decreasing α, implying stronger convexity of the distributions (Fig. 6). Positive differences are associated with the increase in β, flattening the distributions and changing the peak occurrences toward higher cloud cover.
Figures 12a and 12b show zonally averaged probabilities of different cloud categories computed from the parameters of the proposed distribution. As expected, for JFM and for JAS, zonally averaged probabilities of both clear skies and overcast tend to slightly increase during nighttime. This can be potentially explained by the impact of cloud-top cooling onto stratiform clouds and surface warming onto cumuliform clouds. Further analysis should engage data on cloud types that are not considered here. Occurrences of the low and moderate cloud amounts (octas 2 to 6) are alternatively slightly decreasing from day to night. This is consistent with the change in the α and β parameters (Figs. 11b,c), implying increasing concavity for the concave shapes, and also with changes from convex to concave forms of distribution. This is demonstrated for selected regions in Fig. 13. In the eastern equatorial Pacific in JAS, distribution is evolving from convex to concave form between daytime and nighttime (Fig. 13a). In the area of the Indian monsoon (Fig. 13d), there is a tendency toward decreasing occurrence of small cloud amounts and slight increasing occurrence of moderate-to-high cloud amounts. In the region of the intense convective activity in the western equatorial Pacific in JFM (Fig. 13e), daytime and nighttime distributions are very close to each other. The eastern equatorial Atlantic and western equatorial Indian Oceans, where Fig. 11a identifies quite strong positive day-minus-night differences, are characterized by the changes of PMFs associated with increasing concavity and growing occurrences of low cloud amounts (Figs. 13b,c).
As we pointed out earlier, filtering reports with insufficient moon illumination (section 2) has a minor effect on total cloud cover and on the distribution of the parameters; however, it can seriously impact nighttime observations, thus affecting estimates of the diurnal cycle in cloud cover. Figures 12c and 12d show zonal differences in the occurrences of different cloud categories between subsets of the nighttime reports after filtering observations with poor lunar illumination and the filtered reports. Observations performed under insufficient lunar illumination tend to underestimate high cloud amounts and overestimate low cloud amounts, with the major differences of both signs being up to 5% in the tropical regions. Locally, underestimation of overcast conditions may result in the decrease of probability by 10%–15%. This effect is also clearly seen in all regional diagrams in Fig. 13, with the strongest changes in the occurrence of overcast in the western Pacific and western Indian Oceans. Further analysis of the diurnal cycle using the concept of probability distributions should involve the analysis of specific cloud types that may demonstrate highly variable character of diurnal variations in different regions, as shown by Eastman and Warren (2014).
d. Probability distribution characteristics: Low cloud cover
We also applied the PMF analysis to the low cloud cover using the subset of reports that contain observations of both total and low cloud cover. Analyzing low cloud characteristics and comparing them to those for the total cloud may help to identify specific cloud regimes in different geographical areas (e.g., tropical deep convective regimes). However, when considering the difference between the total and low cloud amount from ship-based observations, one should be careful, as the upper clouds can be obscured by low clouds, especially in regions with large amounts of low-level clouds (Norris 1998). Low cloud cover is also well approximated by the proposed distribution, and comparable or even higher goodness-of-fit values are obtained. Figure 14 compares the zonally averaged parameters of the probability distribution for the total and low cloud amounts. The shape parameter α for total cloud cover is larger than that for low cloud cover in the extratropics of both hemispheres by 20% to 30%. In the tropical regions, the alternating positive and negative differences result in minor differences in α. The scale parameter β displays strong differences in the tropics, where it can be nearly twice as large for low cloud cover, compared to that of total cloud cover. Zonal averaging also identifies significantly stronger differences in the scale parameter in the tropics in JFM, compared to JAS. Parameter θ shows a slight increase between the total and low cloud cover everywhere, with the strongest difference of −0.15 in the western ocean subtropics in JAS.
The resulting changes in the shape of PMFs of the mixed gamma distribution derived from (8) for selected regions are presented in Fig. 15 and a companion Table 1, showing quantitative estimates of the mean cloud amounts and parameters. In all regions, the PMFs of low cloud cover show an increasing probability of small amounts of clouds and decreasing occurrences of large amounts of clouds. In the western boundary current regions (Fig. 15a), this change is associated with decreasing α and slightly increasing θ (Table 1). In the tropical regions, the changes in the PMFs are especially evident, with the strongest differences for the occurrences of small amounts of clouds (increasing probability of low cloud cover) and large amounts of clouds (decreasing probability of low cloud cover). This change is primarily controlled by an increasing parameter β which implies stronger convexity for the convex shapes, and by also increasing parameter θ, resulting in the increase of occurrences of small cloud amounts (Figs. 15c–f, Table 1).
5. Summary and discussion
We have analyzed the characteristics of the probability distribution of total and low cloud cover over the global oceans using VOS visual cloud cover reports for the period extending from 1950 to 2011. We suggested a new three-parameter distribution from the family of mixed gamma distributions, for which we derived both continuous and discrete forms. The shape and scale parameters of this distribution control the concavity and convexity (shape) of the distribution and the mean value of the cloud cover (scale). An additional steering parameter θ quantifies the probabilities of clear sky and complete overcast, thus controlling the mixing of the two incomplete gamma distributions. The application of this distribution to the cloud cover data over the global ocean allowed us to identify the areas where the different parameters control the characteristics of the PMFs describing the fractional cloud cover and where the mean values of cloud cover lie close to one another but may be associated with probability distributions having quite different shapes (e.g., the tropical regions and the midlatitudes of the Northern Hemisphere). The proposed approach, thus, opens another dimension in the analysis of cloud cover data additionally to conventional analyses of the mean cloud fraction and the associated cloud types (e.g., Hahn et al. 1982). Application of this distribution to the analysis of the diurnal cycle demonstrates significant day-to-night changes in probability distributions in the tropics. Analysis of low cloud cover identified regions of significant changes in the PMFs that are due the shape parameter (in the midlatitude regions) and changes in both the scale and steering parameters (in the tropical band).
There are several lines of further development and applications of the suggested concept. Our study opens up the possibility of the advanced intercomparison and validation of cloud cover data over the ocean from satellite missions and reanalyses. The suggested approach may demonstrate the ability of satellite data and reanalyses to capture the characteristics of probability distributions and extend conventional analyses of the mean characteristics of fractional cloud cover. However, for a proper application of our concept for intercomparisons, some issues need to be analyzed and clarified. Reanalyses and satellite measurements present the cloud cover data as percentages or fractions of unity. In this case, a continuous form of the distribution (1) should be used. This choice is unproblematic, as the whole formalism was initially developed for the continuous case. To enable comparison with VOS-based distributions, either the initially discrete visual data should be represented as percentages, or, alternatively, the continuous data could be converted to virtual octas (or tenths). Both conversions result in some minor uncertainties in matching the PDF (or PMF) to the data. However, more importantly, another uncertainty is associated with the scaling of the cloud cover observations. In fact, fractional cloud cover (whether it is measured in octas or as percentages) represents a scale-dependent parameterization of cloud conditions that asymptotically (for a very narrow cone with a vanishing aperture) should be binary (reflecting cloudy and clear-sky states). In this sense, the larger the area that is considered, the smaller the probability of observing exclusively clear-sky or completely overcast conditions over this area becomes. Visual observations are typically attributed to a radius of approximately 10 km to more than 20 km (the distance to the visible horizon). The spatial extent of satellite measurements, and especially of model output data (reanalyses), differs strongly from that of visual observations; it is associated with the pixel size and postprocessing used to produce the satellite data and with the grid cell size used in the models. For example, the ISCCP gridded data are available on an equal-area grid with a resolution of ~280 km (Rossow and Schiffer 1991). Data attributed to different space scales may seriously affect probability distributions of cloud fraction (Chang and Coakley 1993).
The proposed distribution can also be used for the estimation and minimization of sampling uncertainties in the visual cloud cover reports from ICOADS. Sampling errors may exceed 30% in the poorly sampled Southern Ocean and the subpolar North Atlantic (Gulev et al. 2007a). Application of this distribution in the manner suggested in Gulev et al. (2007a) and Gulev and Belyaev (2012) may help to quantify the changes in the shape of the PMF due to sampling, identify the impact of sampling on the different parameters, and potentially minimize the sampling uncertainties. This may have serious implications for the analysis of variability in cloud conditions in poorly sampled areas. While we are not considering here climate variability of cloud cover, we show in Fig. 16 two remarkable examples of this kind for the regions analyzed in Fig. 3. Multiyear time series of the probability of octa category 8 (overcast) in the Labrador Sea clearly mark the period of operation of OWS B, including several years after that, when the observations were maintained by the Coast Guard ships. Starting from 1980 (period of poor sampling), interannual variability in the occurrence of overcast in the Labrador Sea derived from the raw data shows unrealistically strong variations, with probabilities varying from 0 to 1. The time series derived from the theoretical PMF is very close to that from raw VOS data for the period prior to 1980 and then strongly differs from it, showing variability that is likely more reliable. In the Pacific warm pool, variability in the occurrence of N = 4 (Fig. 3b) is also quite strong, with probabilities varying from 0 to 0.4 when derived from the raw VOS data. The time series built from the theoretical PMF shows much weaker variations, demonstrating remarkably close probability values for the year of 1997, characterized by good sampling (Fig. 3).
Thus, the proposed approach also provides a useful framework for the analysis of the observed interannual variability of cloud cover conditions over the ocean and land. Changes in the parameters of cloud cover PMFs may provide a better understanding of the regional patterns of cloud cover variability (Sun and Groisman 2000; Sun et al. 2001; Schwartz et al. 2014; McLean 2014; Sun et al. 2015; Zhu et al. 2016; Chernokulsky et al. 2017). Moreover, the variability in the parameters can be effectively associated with the observed signals in surface radiative fluxes and can largely explain the tendencies identified using satellite measurements of surface radiation (Wild et al. 2005, 2007; Pinker et al. 2005). The suggested approach can be also used in the evaluation of cloud characteristics and associated feedbacks in climate model simulations for the present climate and in scenario runs (Zhang et al. 2005; Jiang et al. 2012; Lauer and Hamilton 2013; Rémillard and Tselioudis 2015; Enriquez-Alonso et al. 2016; Sanchez-Lorenzo et al. 2017).
In the future, it will be of great interest to associate the characteristics of the probability distribution with the cloud cover at different levels and the occurrences of different cloud types. Clouds of different types have strong effects on radiative fluxes at the surface (e.g., Poetzsch-Heffter et al. 1995; Shukla et al. 2009; Hakuba et al. 2017) and effectively characterize the spatial and temporal variability in the atmospheric synoptic conditions (Hahn et al. 2001). By associating the characteristics of the probability distribution with the occurrence of different cloud types, it is possible to characterize different cloud regimes and to further associate them with surface radiation fluxes. However, analysis of cloud types (as well as amounts for different levels of cloudiness) may be influenced by uncertainties associated with the obscuring effects of low clouds on upper-level clouds (e.g., Norris 1998). Of interest would be also the association of the cloud cover PMF properties with the column water content (e.g., Forsythe et al. 2012) and the cloud liquid water (Greenwald et al. 1993). Moreover, as we noted above, analysis of cloud types will allow for identification of the impacts of the diurnal cycle on the properties of the cloud cover PMF (Eastman and Warren 2014).
The visual VOS cloud cover data were obtained as part of the ICOADS Release 3.0 dataset, which was supported by the NOAA Climate Program Office. We greatly appreciate the suggestions and criticism of the editor and three anonymous reviewers on the first version of the manuscript. The FORTRAN-95 codes for estimation of the parameters of both continuous and discrete forms of the distribution can be obtained from the authors by request. This research was supported by the Russian Ministry of Education and Science (Agreement 14.616.21.0075, Project ID RFMEFI61617X0075).
Derivation of the Normalizing Constant for Probability Distribution (1)
For the proposed distribution (1),
the normalizing constant C can be found from the condition
Splitting the integral in the lhs of (A2) into 2 and factoring out (1 − θ) and θ, we obtain
Substituting in (A3) implies , with varying from 1 to 0 if x is varying from 0 to 1. After this substitution, we obtain
Since for any a and b and any function f(x), we arrive at
because the expressions under the integrals are identical, and the sum of coefficients equals 1. Integral (A5) can be expressed through the incomplete gamma function:
Incomplete gamma function by definition is:
Derivation of the Equations for Mean and Variance
By definition, for the mean of the distribution (1) we have
Simple manipulations with the integral on the rhs of (B1) give
This directly implies for the mean,
which is identical to (6). Similarly, for the variance, we arrive at
For the discrete case (8), we can obtain the following equation for the variance:
Equation (B3) implies a corollary:
Iterative Parameter Estimator
Under fixed θ, the estimator for α and β is sought as the unbiased estimator, with a minimum variance derived from the relation
where P(x) is the theoretical distribution,
with , and Fn(x) is the empirical distribution, built from the sample. The minimum (C1) is searched with respect to the iterative procedure:
Then the iteration
is changing sequentially α and β, starting from given initial α0. Since the inequality always holds, this procedure finally determines such values of parameters that provide the minimum variance of
However, this can be only local minimum; there is no evidence that this is either a global or a unique minimum at the interval of the change of x [0, 1]. As we noted in section 2, the change of the order of descents, that is, starting iteration first with respect to β and then with respect to α, results in a negligible difference in estimation of the coefficients. Application of the above parameter estimator is equally valid for both continuous and discrete forms of the distribution.