  • Almonacid, F., P. J. Pérez-Higueras, E. F. Fernández, and L. Hontoria, 2014: A methodology based on dynamic artificial neural network for short-term forecasting of the power output of a PV generator. Energy Convers. Manage., 85, 389–398, doi:10.1016/j.enconman.2014.05.090.

  • Beyer, H. G., C. Costanzo, and D. Heinemann, 1996: Modifications of the Heliosat procedure for irradiance estimates from satellite data. Sol. Energy, 56, 207–212, doi:10.1016/0038-092X(95)00092-6.

  • Bhardwaj, S., V. Sharma, S. Srivastava, O. S. Sastry, B. Bandyopadhyay, S. S. Chandel, and J. R. P. Gupta, 2013: Estimation of solar radiation using a combination of hidden Markov model and generalized fuzzy model. Sol. Energy, 93, 43–54, doi:10.1016/j.solener.2013.03.020.

  • Bilionis, I., E. M. Constantinescu, and M. Anitescu, 2014: Data-driven model for solar irradiation based on satellite observations. Sol. Energy, 110, 22–38, doi:10.1016/j.solener.2014.09.009.

  • Bouzerdoum, M., A. Mellit, and A. Massi Pavan, 2013: A hybrid model (SARIMA–SVM) for short-term power forecasting of a small-scale grid-connected photovoltaic plant. Sol. Energy, 98, 226–235, doi:10.1016/j.solener.2013.10.002.

  • Chow, C. W., N. Urquhart, M. Lave, A. Dominquez, J. Kleissl, J. Shields, and B. Washom, 2011: Intra-hour forecasting with a total sky imager at the UC San Diego solar energy testbed. Sol. Energy, 85, 2881–2893, doi:10.1016/j.solener.2011.08.025.

  • Chu, Y., H. T. C. Pedro, and C. F. M. Coimbra, 2013: Hybrid intra-hour DNI forecasts with sky image processing enhanced by stochastic learning. Sol. Energy, 98, 592–603, doi:10.1016/j.solener.2013.10.020.

  • Chu, Y., H. T. C. Pedro, M. Li, and C. F. M. Coimbra, 2015: Real-time forecasting of solar irradiance ramps with smart image processing. Sol. Energy, 114, 91–104, doi:10.1016/j.solener.2015.01.024.

  • Cros, S., O. Liandrat, N. Sébastien, and N. Schmutz, 2014: Extracting cloud motion vectors from satellite images for solar power forecasting. Proc. IEEE Int. Geoscience and Remote Sensing Symp., Quebec City, QC, Canada, IEEE, 4123–4126, doi:10.1109/IGARSS.2014.6947394.

  • Diagne, M., M. David, P. Lauret, J. Boland, and N. Schmutz, 2013: Review of solar irradiance forecasting methods and a proposition for small-scale insular grids. Renewable Sustainable Energy Rev., 27, 65–76, doi:10.1016/j.rser.2013.06.042.

  • Fernandez, E., F. Almonacid, N. Sarmah, P. Rodrigo, T. K. Mallick, and P. Perez-Higueras, 2014: A model based on artificial neuronal network for the prediction of the maximum power of a low concentration photovoltaic module for building integration. Sol. Energy, 100, 148–158, doi:10.1016/j.solener.2013.11.036.

  • Fu, C.-L., and H.-Y. Cheng, 2013: Predicting solar irradiance with all-sky image features via regression. Sol. Energy, 97, 537–550, doi:10.1016/j.solener.2013.09.016.

  • Hammer, A., D. Heinemann, E. Lorenz, and B. Lückehe, 1999: Short-term forecasting of solar radiation: A statistical approach using satellite data. Sol. Energy, 67, 139–150, doi:10.1016/S0038-092X(00)00038-4.

  • Haupt, S. E., and Coauthors, 2016: The SunCast Solar Power Forecasting System: The results of the public-private-academic partnership to advance solar power forecasting. National Center for Atmospheric Research Tech. Note NCAR/TN-526+STR, 307 pp., doi:10.5065/D6N58JR2.

  • Heidinger, A. K., M. J. Foster, A. Walther, and X. Zhao, 2014: The Pathfinder Atmospheres–Extended AVHRR climate dataset. Bull. Amer. Meteor. Soc., 95, 909–922, doi:10.1175/BAMS-D-12-00246.1.

  • Hinkelman, L. M., and M. Sengupta, 2012: Relating solar resource variability to cloud type. Abstracts, Fall Meeting, San Francisco, CA, Amer. Geophys. Union, abstract A31F-0086. [Available online at http://adsabs.harvard.edu/abs/2012AGUFM.A31F0086H.]

  • Huang, H., J. Xu, Z. Peng, S. Yoo, D. Yu, D. Huang, and H. Qin, 2013: Cloud motion estimation for short term solar irradiance prediction. Proc. IEEE Int. Conf. on Smart Grid Communications, Vancouver, BC, Canada, IEEE, 696–701, doi:10.1109/SmartGridComm.2013.6688040.

  • Inman, R. H., H. T. C. Pedro, and C. F. M. Coimbra, 2013: Solar forecasting methods for renewable energy integration. Prog. Energy Combust. Sci., 39, 535–576, doi:10.1016/j.pecs.2013.06.002.

  • IRENA and CEM, 2014: The socio-economic benefits of solar and wind energy. International Renewable Energy Agency Clean Energy Ministerial Rep., 107 pp. [Available online at http://www.irena.org/DocumentDownloads/Publications/Socioeconomic_benefits_solar_wind.pdf.]

  • Kleissl, J., 2013: Solar Energy Forecast and Resource Assessment. Academic Press, 462 pp.

  • Lippmann, R. P., 1987: An introduction to computing with neural nets. IEEE Acoust. Speech Signal Process. Mag., 4, 4–22, doi:10.1109/MASSP.1987.1165576.

  • Lopez, G., F. J. Batlles, and J. Tovar-Pescador, 2005: Selection of input parameters to model direct solar irradiance by using artificial neural networks. Energy, 30, 1675–1684, doi:10.1016/j.energy.2004.04.035.

  • Lorenz, E., A. Hammer, and D. Heinemann, 2004: Short term forecasting of solar radiation based on satellite data. Proc. ISES Europe Solar Congress, EuroSun 2004, Freiburg, Germany, International Solar Energy Society, 841–848. [Available online at https://www.uni-oldenburg.de/fileadmin/user_upload/physik/ag/ehf/enmet/publications/solar/conference/2004/eurosun/short_term_forecasting_of_solar_radiation_based_on_satellite_data.pdf.]

  • Lorenz, E., J. Kuhnert, and D. Heinemann, 2012: Overview on irradiance and photovoltaic power prediction. Weather Matters for Energy, A. Troccoli, L. Dubus, and S. E. Haupt, Eds., Springer, 429–454.

  • Marquez, R., and C. F. M. Coimbra, 2013: Intra-hour DNI forecasting based on cloud tracking image analysis. Sol. Energy, 91, 327–336, doi:10.1016/j.solener.2012.09.018.

  • Marquez, R., H. T. C. Pedro, and C. F. M. Coimbra, 2013: Hybrid solar forecasting method uses satellite imaging and ground telemetry as inputs to ANNs. Sol. Energy, 92, 176–188, doi:10.1016/j.solener.2013.02.023.

  • Martín, L., L. F. Zarzalejo, J. Polo, A. Navarro, R. Marchante, and M. Cony, 2010: Prediction of global solar irradiance based on time series analysis: Application to solar thermal power plants energy production planning. Sol. Energy, 84, 1772–1781, doi:10.1016/j.solener.2010.07.002.

  • McCandless, T. C., S. E. Haupt, and G. S. Young, 2014: Short term solar radiation forecasts using weather regime-dependent artificial intelligence techniques. 12th Conf. on Artificial Intelligence and its Applications to the Environmental Sciences, Atlanta, GA, Amer. Meteor. Soc., J3.5. [Available online at https://ams.confex.com/ams/94Annual/webprogram/Manuscript/Paper240879/STATCast_AMS_TM_Final.pdf.]

  • McCandless, T. C., S. E. Haupt, and G. S. Young, 2016: A regime-dependent artificial neural network technique for short-range solar irradiance forecasting. Renewable Energy, 89, 351–359, doi:10.1016/j.renene.2015.12.030.

  • Mellit, A., 2008: Artificial intelligence technique for modeling and forecasting of solar radiation data: A review. Int. J. Artif. Intell. Soft Comput., 1, 52–76, doi:10.1504/IJAISC.2008.021264.

  • Mellit, A., A. Massi Pavan, and V. Lughi, 2014: Short-term forecasting of power production in a large-scale photovoltaic plant. Sol. Energy, 105, 401–413, doi:10.1016/j.solener.2014.03.018.

  • Miller, S. D., and Coauthors, 2014: Estimating three-dimensional cloud structure via statistically blended satellite observations. J. Appl. Meteor. Climatol., 53, 437–455, doi:10.1175/JAMC-D-13-070.1.

  • Morf, H., 2014: Sunshine and cloud cover prediction based on Markov processes. Sol. Energy, 110, 615–626, doi:10.1016/j.solener.2014.09.044.

  • Notton, G., C. Paoli, S. Vasileva, M.-L. Nivet, J.-L. Canaletti, and C. Cristofari, 2012: Estimation of hourly global solar irradiation on tilted planes from horizontal one using artificial neural networks. Energy, 39, 166–179, doi:10.1016/j.energy.2012.01.038.

  • Pedro, H. T. C., and C. F. M. Coimbra, 2012: Assessment of forecasting techniques for solar power prediction with no exogenous inputs. Sol. Energy, 86, 2017–2028, doi:10.1016/j.solener.2012.04.004.

  • Quesada-Ruiz, S., Y. Chu, J. Tovar-Pescador, H. T. C. Pedro, and C. F. M. Coimbra, 2014: Cloud-tracking methodology for intra-hour DNI forecasting. Sol. Energy, 102, 267–275, doi:10.1016/j.solener.2014.01.030.

  • Reed, D. R., and R. J. Marks, 1998: Neural Smithing: Supervised Learning in Feedforward Artificial Neural Networks. MIT Press, 346 pp.

  • Rosello, E. G., J. B. G. Perez-Schofield, J. G. Dacosta, and M. Perez-Cota, 2003: Neuro-Lab: A highly reusable software-based environment to teach artificial neural networks. Comput. Appl. Eng. Educ., 11, 93–102, doi:10.1002/cae.10042.

  • Rosenblatt, F., 1958: The perceptron: A probabilistic model for information storage and organization in the brain. Psychol. Rev., 65, 386–408, doi:10.1037/h0042519.

  • Tapakis, R., and A. G. Charalambides, 2013: Equipment and methodologies for cloud detection and classification: A review. Sol. Energy, 95, 392–430, doi:10.1016/j.solener.2012.11.015.

  • Voyant, C., M. Muselli, C. Paoli, and M.-L. Nivet, 2012: Numerical weather prediction (NWP) and hybrid ARMA/ANN to predict global radiation. Energy, 39, 341–355, doi:10.1016/j.energy.2012.01.006.

  • Voyant, C., M. Muselli, C. Paoli, and M.-L. Nivet, 2013: Hybrid methodology for hourly global radiation forecasting in Mediterranean area. Renewable Energy, 53, 1–11, doi:10.1016/j.renene.2012.10.049.

  • Zagouras, A., A. Kazantzidis, E. Nikitidou, and A. A. Argiriou, 2013: Determination of measuring sites for solar irradiance, based on cluster analysis of satellite-derived cloud estimations. Sol. Energy, 97, 1–11, doi:10.1016/j.solener.2013.08.005.

    Overall process design for our regime-dependent prediction technique and the comparison techniques.

    Sensitivity-study results for the optimal number of training epochs of the ANN for the RD-ANN at SMUD sites for the 180-min lead time.

    MAE as a function of lead time for all methods of the satellite-determined cloudy instances for the SMUD site. The method that performs best in the majority of the forecast lead times is the RD-ANN-GKtCC method.

    Percent improvement over the clearness-index-persistence forecasts for all methods on the satellite-determined cloudy instances.

    Results for all methods on the satellite-determined cloudy instances for the BNL forecast site. The method that performs best in the majority of the forecast lead times is the clearness-index-persistence method.

Regime-Dependent Short-Range Solar Irradiance Forecasting

  • 1 National Center for Atmospheric Research, Boulder, Colorado, and Department of Meteorology, The Pennsylvania State University, University Park, Pennsylvania
  • 2 Department of Meteorology, The Pennsylvania State University, University Park, Pennsylvania
  • 3 National Center for Atmospheric Research, Boulder, Colorado, and Department of Meteorology, The Pennsylvania State University, University Park, Pennsylvania
  • 4 University of Washington, Seattle, Washington

Abstract

This paper describes the development and testing of a cloud-regime-dependent short-range solar irradiance forecasting system for predictions of 15-min-average clearness index (global horizontal irradiance). This regime-dependent artificial neural network (RD-ANN) system classifies cloud regimes with a k-means algorithm on the basis of a combination of surface weather observations, irradiance observations, and GOES-East satellite data. The ANNs are then trained on each cloud regime to predict the clearness index. This RD-ANN system improves over the mean absolute error of the baseline clearness-index persistence predictions by 1.0%, 21.0%, 26.4%, and 27.4% at the 15-, 60-, 120-, and 180-min forecast lead times, respectively. In addition, a version of this method configured to predict the irradiance variability predicts irradiance variability more accurately than does a smart persistence technique.

The National Center for Atmospheric Research is sponsored by the National Science Foundation.

Corresponding author address: Tyler McCandless, National Center for Atmospheric Research, 3450 Mitchell Lane, Boulder, CO 80301. E-mail: mccandle@ucar.edu

1. Introduction

Utility companies and independent system operators (ISOs) require accurate short-range forecasting of variable renewable energy sources, such as solar energy, to maintain power-grid load balance (IRENA and CEM 2014). Cloud cover is the most important variable in forecasting short-range solar-energy power generation because clouds cause near-instantaneous changes in power generation as they move over the solar power plant. Forecasts of the change in cloud cover, and thus the amount of solar irradiance reaching the surface of Earth, provide necessary information for utility companies and system operators to maximize solar-energy penetration while maintaining balanced grid operation. Therefore, deterministic forecasting of the solar irradiance reaching the ground is important so that the generation resources required to maintain this balance can be allocated efficiently. In addition, forecasts of variability of the resource aid in strategic allocation of reserves. Our goal in this study is to leverage a statistical classification of cloud regimes to better tune artificial intelligence (AI) prediction algorithms so as to improve the skill of deterministic global horizontal irradiance (GHI) predictions.

The forecast lead time substantially impacts the optimal predictors and forecast method for irradiance prediction. Day-ahead and longer forecasts are necessary in planning conventional and variable power generation, and for these lead times numerical weather prediction (NWP) forecasts are generally used (Lorenz et al. 2012; Kleissl 2013). Intraday irradiance forecasts are used by utility companies and ISOs for load following and planning for dispatch. At these lead times, a combination of methods—empirical models, satellite-based techniques, statistical methods, and NWP models—works best (Bouzerdoum et al. 2013; Voyant et al. 2012, 2013), with the combination producing the lowest forecast error depending on the specific lead time and available predictors. At the shortest time scales of less than 15 min, sky-image data can be used as input to cloud-based advection techniques (Chow et al. 2011; Marquez and Coimbra 2013; Huang et al. 2013; Quesada-Ruiz et al. 2014; Chu et al. 2015); the number of sky imagers deployed is generally limited, however. We focus on forecast lead times from 15 min to 3 h, which is a sufficiently short range for statistical methods to outperform NWP but beyond the range where persistence or sky-imager forecasts are difficult to outperform.

At forecast lead times from 15 min to 3 h, historically satellite-based cloud-advection techniques have been used. These techniques use cloud-motion vectors (CMVs) that are computed from consecutive satellite images and then used to advect the satellite-observed clouds into the future. The use of CMVs for prediction of solar irradiance and solar power was proposed by Beyer et al. (1996), with Hammer et al. (1999) and Lorenz et al. (2004) developing more-advanced advection schemes. A forecasting method that uses a phase correlation between consecutive Meteorological Satellite-9 (Meteosat-9) images has been used to predict 30-min values of cloud index out to 4-h lead time and on average showed 21% improvement in root-mean-square error (RMSE) when compared with cloud-index “persistence” (Cros et al. 2014). Bilionis et al. (2014) use a probabilistic prediction technique with the application of a Gaussian process model after applying a principal component analysis in an attempt to model the evolution of the clearness index from satellite images. To address the errors that arise from assuming steady clouds during advection, Miller et al. (2014) group cloud pixels into cohesive cloud structures and then employ an appropriate steering flow that uses cloud-group properties to forecast their downstream development and shearing characteristics. Their intermediate position in the lead-time spectrum makes satellite-based techniques prime candidates for blending with other forecast techniques.

Statistical methods are well suited to combining multiple predictors in such blended forecast systems. Statistical models of appropriate complexity for the GHI forecast problem maximize the predictive value from the available predictors (e.g., satellite and ground-based observations). Any regression method can be applied to GHI forecasting, but the artificial neural network (ANN) is one of the most powerful, most general, and, therefore, most widely used (Mellit 2008; Martín et al. 2010; Pedro and Coimbra 2012; Notton et al. 2012; Bhardwaj et al. 2013; Bouzerdoum et al. 2013; Diagne et al. 2013; Fu and Cheng 2013; Marquez et al. 2013; Inman et al. 2013; Chu et al. 2013; Fernandez et al. 2014; Almonacid et al. 2014; Quesada-Ruiz et al. 2014; among others). The relevant predictors for estimating direct normal irradiance with a Bayesian ANN method were found by Lopez et al. (2005) to be the clearness index and the relative air mass. Pedro and Coimbra (2012) found that an ANN time series model outperformed persistence, autoregressive integrated moving average (ARIMA), and k-nearest neighbors (kNN) models for 1–2-h solar power predictions. Marquez et al. (2013) used processed satellite images as input into ANNs to predict GHI from 30 to 120 min and found between 5% and 25% reduction in RMSE relative to that of persistence. A challenge with ANNs, however, is the large number of tunable parameters, which is on the order of the number of predictors multiplied by the number of neurons. This requires a large quantity of training data to prevent overfitting and the consequent loss of skill on independent data (i.e., operational use). Another concern with using ANNs in operational forecasting is the lack of physical interpretability that could directly provide the user with information on forecast variability.

We partition the data into subsets on the basis of cloud regimes to forecast variability and to tune the ANN model more accurately for the peculiarities and consequent forecast challenges of each specific cloud regime. This variability in solar irradiance was shown by Hinkelman and Sengupta (2012) to differ among satellite-data-derived cloud types. Regime-based prediction has been used in several different solar irradiance and solar power applications. Tapakis and Charalambides (2013) provide a review of various methods for both supervised and unsupervised cloud classification. The unsupervised techniques classify on the basis of the pixels of an image. The supervised techniques, which are divided into simple, statistical, and artificial subgroups, classify on the basis of available training datasets and the arithmetic complexity of the technique. A one-step stochastic prediction process of cloud cover or clearness index with transition matrices dependent on the relative sunshine amount is presented in McCandless et al. (2014) and Morf (2014). Zagouras et al. (2013) used a k-means clustering algorithm with a stable initialization method to identify regimes on the basis of step changes of the average daily clear-sky index in the San Diego, California, region. Mellit et al. (2014) used a simple approach that is based on the daily total solar irradiance to identify clear, partly cloudy, and cloudy regimes with separate ANN models developed on each regime and showed that, particularly for the cloudy days, the ANN model that was trained on only those days improved on the ANN model that was trained on all days. McCandless et al. (2016) used a k-means algorithm on surface weather and irradiance observations to identify regimes before applying an ANN. 
The separation into cloud regimes allows an AI model to identify repeatable patterns in surface solar irradiance, but there is a lack of research into 1) what are the most important inputs for cloud-regime classification, and 2) what are the most important predictors for an AI method to most efficiently make accurate short-range predictions of solar irradiance.

Rather than burden the ANN with the task of both identifying cloud regimes and responding to them correctly, a separate statistical model can be used to identify regimes before fitting the ANN. This approach allows the ANN to focus on the forecast mission for a specific cloud type. This simplification of each ANN’s mission allows it to be implemented with a simpler configuration (fewer neurons and tunable parameters). Thus, better tuning can be achieved for a given amount of training data. However, the accurate classification of cloud regime is necessary for the ANN to focus on each cloud regime’s peculiarities. To do this, we utilize a combination of inputs that are specific to the goal of identifying cloud regimes in a k-means regime-classification method. Because training data are always limited, this new approach offers the potential for improving the skill of ANNs in solar irradiance prediction.
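The two-stage idea described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions and is not the configuration used in this study: the k-means routine is a plain Lloyd's iteration with random initialization, a one-predictor least-squares line stands in for each regime's ANN, and all function names (`kmeans`, `nearest`, `fit_line`, `train_regime_models`, `predict`) are hypothetical.

```python
import random


def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(point, centroids[i])))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on a list of equal-length tuples; returns centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumed initialization: k random instances
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Recompute each centroid as the mean of its members (keep it if empty).
        updated = [tuple(sum(col) / len(members) for col in zip(*members))
                   if members else centroids[i]
                   for i, members in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def fit_line(xs, ys):
    """Least-squares slope and intercept; a simple stand-in for a per-regime ANN."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return slope, my - slope * mx


def train_regime_models(regime_inputs, predictors, targets, k):
    """Classify instances into k regimes, then fit one model per regime."""
    centroids = kmeans(regime_inputs, k)
    groups = {i: ([], []) for i in range(k)}
    for z, x, y in zip(regime_inputs, predictors, targets):
        xs, ys = groups[nearest(z, centroids)]
        xs.append(x)
        ys.append(y)
    models = {i: fit_line(xs, ys) for i, (xs, ys) in groups.items() if xs}
    return centroids, models


def predict(centroids, models, regime_input, predictor):
    """Route a new instance to its regime's model and evaluate that model."""
    slope, intercept = models[nearest(regime_input, centroids)]
    return slope * predictor + intercept
```

The point of the sketch is the separation of concerns: the classifier sees only the regime inputs, and each regression model sees only the instances assigned to its regime, so each model can stay small.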

Section 2 describes the datasets and the derived predictors. Section 3 provides an overview of the process, and section 4 explains the clearness-index-persistence baseline prediction method and the AI prediction techniques. We illustrate the various regime-dependent ANNs used in this study in section 5. Section 6 presents the results, and section 7 provides discussion and conclusions. Note that the overall project report by Haupt et al. (2016) contains and discusses in more detail the methods, models, and results of both this paper and McCandless et al. (2016).

2. Data

We wish to determine the optimal set of inputs for the k-means algorithm and predictors for the ANN to create the best configuration for the regime-dependent artificial neural network (RD-ANN) forecasting system. To do so, we use data from three types of sources: irradiance observation systems, surface weather observation networks, and satellite observations. We use two irradiance observation systems located in different regions of the United States to test the prediction system in different climates with different training-data sizes.

We use approximately 1 year of data from the Sacramento Municipal Utility District (SMUD) located in the Sacramento Valley of California. As in McCandless et al. (2016), we use data from eight solar power forecast sites that measure irradiance, shown in Fig. 1 as blue triangles. The GHI observations are available for a period of 367 days from 25 January 2014 through 26 January 2015. The temporal resolution of the raw data is 1 min, and averages are computed over 15-min intervals ending at 0, 15, 30, and 45 min after the hour. The 15-min-averaged GHI data are then converted to clearness-index values. The clearness index is the ratio of the GHI observed at the surface to the expected GHI at the top of the atmosphere (TOA), which is computed through a series of geometric calculations for a given location, date, and time. This averaging interval was selected after communication with several utility companies and corresponds to the shortest time range for which a forecast is currently useful for dispatch decision-making in the United States. All instances with missing data or nighttime observations are excluded from the final dataset. This approach is the same as that reported in McCandless et al. (2016).
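As an illustration of the preprocessing just described, the following Python sketch averages 1-min GHI samples into 15-min intervals and converts a mean to a clearness index. It is a sketch under assumptions rather than the code used in this study: the text does not specify the geometric calculations, so we assume a solar constant of 1361 W m^-2, Cooper's declination approximation, a simple eccentricity correction, and mean solar time without the equation of time. We also assume that each 1-min timestamp marks the end of its minute and falls on a whole minute. All function names are hypothetical.

```python
import math
from datetime import datetime, timedelta, timezone

SOLAR_CONSTANT = 1361.0  # W m^-2; assumed value, not stated in the text


def toa_horizontal_irradiance(lat_deg, lon_deg, when_utc):
    """Approximate extraterrestrial irradiance on a horizontal plane (W m^-2)."""
    day = when_utc.timetuple().tm_yday
    # Cooper's approximation for the solar declination (radians).
    decl = math.radians(23.45) * math.sin(2.0 * math.pi * (284 + day) / 365.0)
    # Eccentricity correction for the varying Earth-Sun distance.
    ecc = 1.0 + 0.033 * math.cos(2.0 * math.pi * day / 365.0)
    # Hour angle from mean solar time (equation of time ignored in this sketch).
    solar_hour = (when_utc.hour + when_utc.minute / 60.0
                  + when_utc.second / 3600.0 + lon_deg / 15.0)
    hour_angle = math.radians(15.0 * (solar_hour - 12.0))
    lat = math.radians(lat_deg)
    cos_zenith = (math.sin(lat) * math.sin(decl)
                  + math.cos(lat) * math.cos(decl) * math.cos(hour_angle))
    return max(0.0, SOLAR_CONSTANT * ecc * cos_zenith)


def fifteen_minute_means(samples):
    """Average (timestamp, ghi) pairs into intervals ending at :00, :15, :30, :45."""
    buckets = {}
    for ts, ghi in samples:
        minutes = ts.hour * 60 + ts.minute
        end_minutes = -(-minutes // 15) * 15  # round up to the interval end
        midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
        buckets.setdefault(midnight + timedelta(minutes=end_minutes), []).append(ghi)
    return {end: sum(vals) / len(vals) for end, vals in sorted(buckets.items())}


def clearness_index(ghi_mean, toa):
    """Ratio of surface GHI to TOA irradiance; None when the sun is down."""
    if toa <= 0.0:
        return None  # nighttime instances are excluded, as in the text
    return ghi_mean / toa
```

In this sketch the TOA value for a 15-min mean would be evaluated at the interval midpoint, which is also an assumption; averaging the TOA irradiance over the interval is an equally plausible choice.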

Fig. 1.

Locations of SMUD irradiance observations (blue triangles) and the three nearest METAR surface weather observations (red times signs). Map data: Google; CopyPasteMap.com.

Citation: Journal of Applied Meteorology and Climatology 55, 7; 10.1175/JAMC-D-15-0354.1

Brookhaven National Laboratory (BNL), located on Long Island in New York, is our second irradiance measurement system. We use data from one solar power forecast site that measures irradiance, shown in Fig. 2 as a blue triangle. The dataset includes 1 year of data, from 20 May 2014 to 19 May 2015. All instances with missing data or nighttime observations are excluded from the final dataset.

Fig. 2.

Location of the BNL irradiance observation site (blue triangle) and the three nearest METAR surface weather observations (red times signs). Map data: Google; CopyPasteMap.com.

The two locations of irradiance observations, Long Island and Sacramento, have different climates and therefore have different irradiance-variability characteristics. This allows a test of our method’s robustness in predicting irradiance under different weather conditions and a different number of training instances. For the BNL site on Long Island, the climate is characterized by more variable cloud cover that is due to higher humidity resulting from its close proximity to the Atlantic Ocean. Monthly average precipitation for Long Island is relatively consistent, in contrast to Sacramento, which typically experiences rainy winters and dry summers.

Surface weather observations are not available at the irradiance observation sites; therefore the three nearest aviation routine weather report (METAR) sites are used to characterize the local weather. The three closest METAR sites are shown as red times signs in Fig. 1 for the SMUD region and in Fig. 2 for the BNL region. These observations are recorded at the top of every hour. As in McCandless et al. (2016), we use six weather variables: cloud cover, dewpoint temperature, precipitation occurrence in the last hour (1 = precipitation occurred and 0 = precipitation did not occur), precipitation amount, temperature, and wind speed.

The satellite data used as forecast predictors came from NOAA’s GOES-East Geostationary Operational Environmental Satellite. The GOES data were chosen for this work because they are acquired operationally every 15 min with a nominal nadir footprint of just 1 km in the shortwave band and 4 km in the infrared channels. GOES-East was selected over GOES-West for two reasons. First, the position of GOES-East at 75°W provides views of both the California and New York forecast sites at angles that are less oblique than at the 135°W location of GOES-West. Second, processed GOES imager data were only available from the GOES-East acquisitions at 15 and 45 min after the hour and from GOES-West acquisitions at 0 and 30 min after the hour. Allowing for a latency time of 15 min, the 45-min acquisition provides the most up-to-date information for the reinitialization of our forecast system at the top of every hour.

The GOES-East data consist of both directly measured and retrieved variables provided in level-2 output from the Pathfinder Atmospheres–Extended (PATMOS-x) retrieval suite (Heidinger et al. 2014) run operationally by NOAA’s Cooperative Institute for Meteorological Satellite Studies and, for this project, by the Cooperative Institute for Research in the Atmosphere. The directly measured variables are radiance values at wavelength bands centered on 650 nm (visible) and 3.75 μm (infrared) and brightness temperatures at 3.75 μm and 11.0 μm (water vapor window). The retrieved variables applied in this study were cloud-top temperature, cloud fraction, cloud optical depth, hydrometeor effective radius, and cloud type, where the cloud types included the categories fog, liquid water clouds, supercooled water clouds, opaque ice clouds, cirrus clouds, vertically overlapping clouds, and overshooting clouds. Instantaneous solar zenith angles were also taken from the satellite data files. The data are provided as ungridded 4-km footprints. The values supplied to the forecast system are averages over the nine footprints closest to each of the forecast locations at 45 min after each hour.
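The nine-footprint averaging can be sketched as follows. This is an illustrative Python example under assumptions: the text does not state how distance to a footprint was measured, so we assume great-circle (haversine) distance to the footprint center, and the function names and the `(lat, lon, value)` tuple layout are hypothetical. The sketch applies only to continuous variables; how the categorical cloud-type variable was aggregated is not specified here.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, assumed for the distance calculation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2.0) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2.0) ** 2
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_footprint_mean(site_lat, site_lon, footprints, k=9):
    """Mean value over the k footprints closest to the site.

    footprints is a list of (lat, lon, value) tuples for one variable
    from one satellite acquisition.
    """
    ranked = sorted(footprints,
                    key=lambda f: haversine_km(site_lat, site_lon, f[0], f[1]))
    chosen = ranked[:k]
    if not chosen:
        return None
    return sum(f[2] for f in chosen) / len(chosen)
```

Because the footprints are ungridded, ranking by distance at every acquisition (rather than fixing pixel indices once) keeps the selection correct if footprint locations shift between images.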

In addition to the observed irradiance and weather predictors, it is often useful to derive additional variables to emphasize important physical processes. As stated in our previous work (McCandless et al. 2016), we derive inputs specific to the k-means classification system as well as predictors that are specific to the ANN prediction system. In particular, we leverage our meteorological knowledge to provide the k-means algorithm with inputs to identify cloud regimes and to provide the ANNs with predictors for predicting solar irradiance. From that previous work (McCandless et al. 2016), variables used as inputs for the k-means algorithm include the cloud cover squared averaged over the three nearest METAR sites and the standard deviation of the cloud cover for the three nearest METAR sites so as to weight higher regional cloud-cover values and to quantify the regional solar irradiance variability. Another predictor, dewpoint depression, defined as the difference between the temperature and the dewpoint temperature, quantifies the atmosphere’s nearness to saturation at the surface. This derived predictor, and the cloud-cover-squared predictor, are averaged over the three METAR sites on the basis of a sensitivity study that showed no improvement by including the predictor for each site independently (McCandless et al. 2016). For the SMUD region, we derive two additional predictors by computing the spatial average and standard deviation of the clearness index at the previous 15-min interval over the remaining sites. These predictors are computed so as to quantify the regional distribution of cloud cover as measured by the eight solar irradiance observation sites. These predictors are not computed at BNL because there is not a regional network of sites such as that operated by SMUD and thus there are no additional data from which to compute these predictors.
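The derived variables described above are simple to compute, as the following Python sketch suggests. Several details are our assumptions rather than statements from the text: cloud cover is expressed as a fraction between 0 and 1, "cloud cover squared averaged over the three nearest METAR sites" is read as the mean of the squared values, and the population standard deviation is used. The function names and dictionary keys are hypothetical.

```python
from statistics import mean, pstdev


def derived_metar_predictors(metar_obs):
    """Derived inputs from the three nearest METAR sites.

    metar_obs is a list of dicts with keys 'cloud_cover' (fraction, 0-1),
    'temperature', and 'dewpoint' (both in the same temperature unit).
    """
    cover = [obs["cloud_cover"] for obs in metar_obs]
    depression = [obs["temperature"] - obs["dewpoint"] for obs in metar_obs]
    return {
        # Squaring before averaging gives more weight to high cloud-cover values.
        "cloud_cover_sq_mean": mean([c * c for c in cover]),
        # Spread among sites, used as a proxy for regional variability.
        "cloud_cover_std": pstdev(cover),
        # Nearness of the surface air to saturation.
        "dewpoint_depression_mean": mean(depression),
    }


def regional_kt_stats(kt_prev_by_site, target_site):
    """Mean and standard deviation of the previous-interval clearness index
    over all sites except the one being forecast."""
    others = [kt for site, kt in kt_prev_by_site.items() if site != target_site]
    return mean(others), pstdev(others)
```

For a single-site dataset such as BNL, `regional_kt_stats` has no other sites to draw on, which mirrors the reason these two predictors are omitted there.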

3. Process overview

Our prediction process requires sensitivity studies to determine the best configuration before applying the final prediction models to an independent validation dataset. We predict the clearness index Kt, which is defined in Eq. (1), because it quantifies the amount of irradiance attenuated from the maximum possible irradiance expected at the TOA and thus removes much of the zenith-angle dependence so that the ANN can focus on cloud effects:
$$K_t = \frac{\mathrm{GHI}}{\mathrm{GHI}_{\mathrm{TOA}}}. \tag{1}$$
Therefore, we create separate training datasets, sensitivity-test datasets, and validation datasets—labeled “train,” “sensitivity test,” and “validation” in Table 1—by randomly selecting instances. The validation datasets are used as an independent verification of our final models. For the sensitivity studies, we explore the sensitivity of the mean absolute error (MAE) to the dataset used for tuning the model. Table 1 lists the number of instances in each of the datasets for both SMUD and BNL. The SMUD datasets have substantially more instances because there are eight prediction sites within the SMUD region and there were fewer missing observations than for the BNL datasets.
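A minimal sketch of the clearness-index computation and the random partitioning follows. The function names are hypothetical, and the default split fractions are placeholders chosen for illustration; they are not the proportions listed in Table 1.

```python
import random

def clearness_index(ghi, ghi_toa):
    """Kt: observed GHI divided by top-of-atmosphere GHI (same units)."""
    if ghi_toa <= 0:
        return None  # sun at or below the horizon; Kt is undefined
    return ghi / ghi_toa

def random_split(instances, fractions=(0.6, 0.2, 0.2), seed=0):
    """Randomly partition instances into train, sensitivity-test, and validation sets.

    The default fractions are placeholders, not the proportions in Table 1.
    """
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)
    n_train = round(fractions[0] * len(shuffled))
    n_sens = round(fractions[1] * len(shuffled))
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_sens],
        shuffled[n_train + n_sens:],
    )
```

Fixing the seed makes the partition reproducible, so the sensitivity-test and validation sets stay disjoint from the training set across reruns.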
Table 1.

List of instances in each training, testing, and validation dataset for both BNL and SMUD. The data were randomly split into the different partitions.

We wish to develop a “best practices” method for regime-dependent statistical forecasting of clearness index. To that end, we test multiple regime-dependent prediction methods for solar irradiance prediction given various inputs and predictors; therefore, we use a dataflow diagram (Fig. 3) to describe the relationships among the various techniques. The top tier represents the data sources: irradiance observations, METAR surface weather observations, derived predictors, and satellite data, which are split into two boxes for the measured variables and the derived variables. The GOES-East satellite-derived variables are included only in the instances that are not defined as clear. The second tier illustrates this separation into the satellite-determined clear instances and the satellite-determined cloudy instances. This is the first regime separation in our prediction process. The third tier of Fig. 3 describes the prediction methods for all other instances. From left to right, the first prediction technique is the ANN applied on the clear dataset. The next prediction technique is an ANN without additional regime classification. The final three are the RD-ANNs. The first RD-ANN method is based on regimes that are determined explicitly from the “cloud type” variable in the GOES-East data and is labeled RD-ANN-GCT, where GCT stands for GOES cloud type. The next RD-ANN technique is the k-means cloud-regime classification that includes inputs from all of our data sources; we name this technique RD-ANN-GKtCC because it includes GOES-East data, Kt observations, and cloud cover from the METAR observations. The final prediction technique does not include the satellite measurements and is a direct comparison with previous work (McCandless et al. 2016). This method is named RD-ANN-KtCC because it includes the Kt observations and the cloud cover. 
The fourth-tier elements are the final predictions from all of the prediction techniques, including the baseline technique of clearness-index persistence. The validation-dataset results from these predictions are shown in section 6.

Fig. 3.

Overall process design for our regime-dependent prediction technique and the comparison techniques.

Citation: Journal of Applied Meteorology and Climatology 55, 7; 10.1175/JAMC-D-15-0354.1

4. Prediction methods

a. Baseline: Clearness-index persistence

We use clearness-index persistence as our baseline prediction technique for comparison. Clearness-index persistence is commonly referred to as “smart persistence.” It inherently corrects for changes in solar elevation with time and can be easily converted back to GHI for operations if the clearness-index forecast is multiplied by the TOA GHI (McCandless et al. 2016).

This baseline technique uses the last available observation of the clearness index (i.e., 15-min average) as the prediction for subsequent times. For locations with either generally clear conditions or steady cloud cover, this technique is difficult to improve on. In contrast, when the sky condition is characterized by mixed or variable clouds, the clearness-index-persistence technique performs poorly.
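As a small illustration of this baseline, the sketch below carries the latest clearness index forward to each lead time and converts it back to GHI. The function names are hypothetical; the lead times are those evaluated in this study.

```python
def persistence_kt_forecast(last_kt, lead_times_min=(15, 60, 120, 180)):
    """Smart persistence: the latest 15-min-average Kt is the forecast at every lead time."""
    return {lead: last_kt for lead in lead_times_min}

def kt_to_ghi(kt_forecast, ghi_toa_at_valid_time):
    """Convert a clearness-index forecast back to GHI (same units as the TOA GHI)."""
    return kt_forecast * ghi_toa_at_valid_time
```

Because the TOA GHI is evaluated at the forecast valid time, the converted GHI follows the change in solar elevation even though Kt itself is held constant.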

b. Artificial neural network

The ANN is our choice for a nonlinear AI prediction technique because an ANN can model any functional relationship (which may have potentially complex relationships between the predictors and predictand), with proper tuning of the number of hidden layers and neurons. ANNs attempt to replicate how the human learning process works, and, when given a sufficiently large set of training data, ANNs can model complex—that is, nonlinear—relationships between the predictors and the predictand (Lippmann 1987). The ANN used here is a feed-forward neural network trained by a backpropagation algorithm (Reed and Marks 1998), which is commonly referred to as a multilayer perceptron (Rosenblatt 1958). As in McCandless et al. (2016), the specific neural-network module used in this study is the “newff” model in the Python programming-language library of the “NeuroLab” simulation-software (https://pythonhosted.org/neurolab/; Rosello et al. 2003) trained with a resilient backpropagation algorithm. The ANN used here has three layers: the input layer consisting of the predictors, the hidden layer consisting of tunable neurons, and the output layer, which computes the final prediction. The actual processing is done by the neurons in the hidden layer, each of which is a linear regression postprocessed by a sigmoid function so that all outputs are on a common finite scale. These neuron outputs are then merged by a final linear-regression neuron to yield the ANN’s forecast. Each predictor of the input layer is connected to all neurons within the hidden layer, but the iterative training results in special weights for each neuron that together address the different aspects of the problem.
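The forward pass of the three-layer network described above can be sketched in plain Python. This is an illustration only, not the NeuroLab implementation used in the study: the function names are hypothetical, training is omitted, and the logistic function stands in for whatever sigmoid-shaped transfer function the library applies.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function mapping any real z into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def ann_forward(x, hidden_w, hidden_b, out_w, out_b):
    """One-hidden-layer feed-forward pass.

    x: list of predictors (input layer).
    hidden_w: one weight list per hidden neuron; hidden_b: one bias per neuron.
    out_w, out_b: weights and bias of the final linear output neuron.
    """
    # Each hidden neuron is a linear regression passed through a sigmoid.
    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b)
        for ws, b in zip(hidden_w, hidden_b)
    ]
    # The output neuron linearly combines the hidden-neuron outputs.
    return sum(w * h for w, h in zip(out_w, hidden)) + out_b
```

Every predictor feeds every hidden neuron, so the number of hidden-layer weights grows as the product of the predictor count and the neuron count.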

Varying the number of neurons in the hidden layer changes the complexity of the model. As more neurons are added, more complex nonlinear relationships between the predictors and the predictand can be modeled. This increase in complexity, however, increases the risk of overfitting the training data and decreasing the performance of the model on the independent data. Moreover, as the number of training epochs (i.e., iterations) is increased, an overly complex ANN may begin to tune to the random noise in the training data as well as to the real relationships. Therefore, both the number of neurons of the hidden layer and the number of training epochs determine the ANN’s fit to the training and independent data. The goal of configuring the ANN is to find the best level of complexity (i.e., the number of hidden-layer neurons) and number of training epochs that model the true relationships in the training data and thus yield the lowest error on independent data. The mean square error (MSE) was the score that was minimized in the training of the algorithm. We held the learning rate (0.01) and weight decay (0.5) constant because sensitivity studies (not shown) found these values to be best.

We have a total of 42 predictors for the SMUD sites, which includes data from SMUD irradiance observation sites, METAR weather observation sites, GOES-East satellite data, and several derived predictors. A list of all predictors for the ANN is provided in Table 2. For the BNL locations, the predictors, “Kt nearby mean” and “Kt nearby variability (std dev)” are not available because, unlike for SMUD, the BNL data come from a single location.

Table 2.

List of predictors for the ANN model. The Kt nearby mean and variability are marked with an asterisk because they are only available for the SMUD sites. QPF is quantitative precipitation forecast.

c. Regime-dependent artificial neural network

The ultimate goal of the ANN is to find the true relationship between the predictors and the predictand; therefore, we partition the dataset into cloud-regime subsets to allow the ANN to find the simpler relationships applicable to each cloud regime rather than having to model both these relationships and regime identification with a single complex network. To improve the deterministic forecast, the regime-identification technique must split regimes with different underlying forecast problems, each with different physical, and thus statistical, relationships between predictors and predictand. Therefore, the regime-classification method must capture differences that are directly related to short-term irradiance forecasting, given the available predictors.

The three methods that we use to classify regimes before applying the ANNs to each subset separately are discussed in detail in section 5. Two regime-identification methods (RD-ANN-KtCC and RD-ANN-GKtCC; named after the input data) use a k-means clustering algorithm. The k-means clustering algorithm is explained in detail in McCandless et al. (2016). For the RD-ANN-KtCC method described in section 5a, the inputs to the k-means clustering algorithm are the past irradiance (converted to Kt) observations and cloud-cover observations from the METAR data. This method is tested to determine the predictive skill of an RD-ANN method using only surface observations. For the RD-ANN-GKtCC method described in section 5b, the inputs to the k-means clustering algorithm are the past irradiance (converted to Kt) observations, cloud-cover observations from the METAR data, and variables from the GOES-East data. This method is tested to determine the predictive skill of an RD-ANN method using both surface observations and satellite data. In contrast, the RD-ANN-GCT method, explained in section 5c, does not use the k-means algorithm to classify regimes but rather uses the derived cloud-type variable in the GOES-East data to separate regimes. This test will determine whether off-the-shelf cloud typing can compete with mission-specific cloud-regime typing in solar forecasting.
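The k-means-based regime classification followed by regime-specific models can be sketched as below. This is a generic Lloyd's-algorithm illustration under stated assumptions, not the configuration of McCandless et al. (2016): the function names, the random initialization, the iteration cap, and the handling of empty clusters are all choices made for this example, and any scaling of the inputs is left out.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centroids):
    """Index of the centroid closest to point."""
    return min(range(len(centroids)), key=lambda j: sq_dist(point, centroids[j]))

def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(n_iter):
        labels = [nearest(p, centroids) for p in points]
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append([sum(c) / len(members) for c in zip(*members)])
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels

def predict_by_regime(cluster_inputs, predictors, centroids, regime_models):
    """Route one instance to the model trained for its nearest-centroid regime."""
    regime = nearest(cluster_inputs, centroids)
    return regime_models[regime](predictors)
```

Here `regime_models` is a hypothetical list of callables, one per regime, each standing in for an ANN trained only on the instances assigned to that regime.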

5. Regime-dependent ANN configuration

a. RD-ANN-KtCC

The first regime-dependent method tested uses the original configuration of the regime-dependent ANN of McCandless et al. (2016), referred to as RD-ANN-KtCC. This technique does not include any GOES-East data as either inputs to the k-means regime classification or as predictors for the ANN. Sensitivity studies in McCandless et al. (2016) showed that the best inputs to the k-means clustering algorithm are the following: Kt average in the previous 15 min, nearby Kt in the previous 15 min, standard deviation of the Kt in the previous 15 min among the nearby sites, the most recent change in the Kt (Kt for the previous 15 min minus Kt for the 15 min before that), the slope of the Kt in the past hour, the standard deviation of the Kt over the previous hour, and standard deviation of the cloud cover. Because there are seven inputs into the k-means algorithm, there are therefore seven dimensions in the phase space of the k-means distance computation. These seven inputs provide the k-means algorithm with information that captures the meteorological state as based on surface observations. Sensitivity studies indicate that the number of regimes k that produced the lowest error on the sensitivity-test dataset was also seven. For the BNL site, only a single irradiance observation site was available; therefore, the RD-ANN-KtCC method does not include either the nearby Kt in the previous 15 min or the standard deviation of the Kt in the previous 15 min among the nearby sites.
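The Kt-history inputs listed above (latest value, most recent change, slope, and standard deviation over the past hour) could be computed as in the following sketch. The names are hypothetical, and two details are assumptions because the text does not define them: the slope is taken as a least-squares fit per 15-min step, and the standard deviation uses the population form.

```python
import statistics

def kt_history_features(kt_last_hour):
    """Kt-history inputs from 15-min-average Kt values ordered oldest to newest.

    Expects at least two values; four values cover the previous hour.
    """
    n = len(kt_last_hour)
    t_mean = (n - 1) / 2
    k_mean = statistics.fmean(kt_last_hour)
    # Least-squares slope of Kt against the time-step index (an assumption).
    num = sum((t - t_mean) * (k - k_mean) for t, k in enumerate(kt_last_hour))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return {
        "kt_latest": kt_last_hour[-1],
        # Kt for the previous 15 min minus Kt for the 15 min before that.
        "kt_change": kt_last_hour[-1] - kt_last_hour[-2],
        "kt_slope_per_step": num / den,
        "kt_std_hour": statistics.pstdev(kt_last_hour),
    }
```

Each returned value is one coordinate of the vector that the k-means distance computation operates on.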

b. RD-ANN-GKtCC

The RD-ANN-GKtCC method uses 16 inputs into the k-means clustering algorithm for the SMUD sites, as shown in Table 3. Again, the multisite inputs are unavailable for BNL; thus, the RD-ANN-GKtCC method does not include either the nearby Kt in the previous 15 min or the standard deviation of the Kt in the previous 15 min among the nearby sites. Because there are 16 inputs into the k-means algorithm, there are 16 dimensions in the phase space of the k-means distance computation. These 16 inputs provide the k-means algorithm with information to capture the meteorological state given both surface irradiance and weather observations as well as satellite-based data, with careful consideration given to avoiding collinearity. The inputs include all inputs used in RD-ANN-KtCC as well as additional variables from the GOES-East observations: cloud fraction, cloud-top height, cloud optical depth, hydrometeor radius, reflectance at 650 nm (i.e., wavelength for visible light), reflectance at 3.75 μm (i.e., wavelength for water vapor), temperature at 11 μm (also wavelength for water vapor), and temperature at 3.75 μm.

Table 3.

List of inputs for the k-means algorithm in the RD-ANN-GKtCC configuration. The Kt nearby mean and variability are marked with an asterisk because they are only available for the SMUD sites.

To match the level of complexity of the ANN with the number of training cases and the complexity of relationships within each regime, we perform multiple sensitivity studies to determine the best number of training epochs and the best number of hidden-layer neurons. We examine the MAE of the RD-ANN-GKtCC method on the sensitivity test cases for each lead time. The MAE is calculated as
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|\mathrm{pred}_i - \mathrm{obs}_i\right|, \tag{2}$$
where n is the number of instances i in the testing data and obs and pred indicate the observed and predicted values for each instance. We varied the number of training epochs (100, 250, 500, or 1000) and averaged the error over the regimes. The test was conducted separately for each lead time, with the result for 180 min appearing in Fig. 4. The results indicate that the lowest error on the sensitivity test cases, and thus the best number of training epochs for the ANN, is 500. The same result (not shown) was obtained for the other lead times.
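The MAE used in these sensitivity tests can be computed as in the short sketch below; the function name is hypothetical.

```python
def mean_absolute_error(pred, obs):
    """Mean of |pred_i - obs_i| over n paired instances."""
    if len(pred) != len(obs) or not pred:
        raise ValueError("pred and obs must be non-empty and of equal length")
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(pred)
```

Because Kt is dimensionless, the MAE of a clearness-index forecast is dimensionless as well.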
Fig. 4.

Sensitivity-study results for the optimal number of training epochs of the ANN for the RD-ANN at SMUD sites for the 180-min lead time.

After the sensitivity study determined the number of training epochs, the next step in configuring the RD-ANN-GKtCC model was to determine the best number of neurons and the best number of regimes for each forecast lead time and forecast location. We performed a sensitivity study with 5, 10, 15, and 20 neurons in the hidden layer and k ranging from 2 to 9 for each forecast lead time. The best combinations (in terms of the lowest MAE on the sensitivity-test datasets) are shown in Table 4. For the SMUD sites, the best k is 2 for the two shorter lead times and 3 for the two longer lead times. For the BNL location, the best k is 2 for all forecast lead times. The best number of neurons varies among the different locations and lead times; the results showed relatively minor differences between different numbers of neurons, however, which indicates that the increase in forecast power nearly balances the increase in overfitting for a range of model complexities around the best configuration.
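The search over hidden-layer sizes and regime counts amounts to a small grid search, sketched below. The function name and the `evaluate_mae` callback are hypothetical: the callback is assumed to train the regime-dependent model for one (neurons, k) pair and return its MAE on the sensitivity-test dataset.

```python
from itertools import product

def select_configuration(evaluate_mae,
                         neuron_options=(5, 10, 15, 20),
                         k_options=range(2, 10)):
    """Return (mae, n_neurons, k) with the lowest sensitivity-test MAE.

    evaluate_mae: callable (n_neurons, k) -> MAE on the sensitivity-test set.
    """
    best = None
    for n_neurons, k in product(neuron_options, k_options):
        mae = evaluate_mae(n_neurons, k)
        if best is None or mae < best[0]:
            best = (mae, n_neurons, k)
    return best
```

In this study the selection would be repeated for each forecast lead time and location, since Table 4 reports a separate best combination for each.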

Table 4.

Best number of regimes k and number of neurons in the hidden layer for all forecast lead times at both SMUD and BNL as determined by the lowest error on the sensitivity-test set.

c. RD-ANN-GCT

The third method of regime-dependent prediction uses the cloud-type variable in the GOES-East data to determine regimes; therefore, this technique is named RD-ANN-GCT. An ANN is trained for each cloud type separately. These cloud types and their respective frequency in the datasets are fog (12.4%), liquid water clouds (13.9%), supercooled water clouds (20.4%), opaque ice clouds (11.0%), cirrus clouds (32.8%), and overlapping clouds (9.5%), in addition to the cases identified as clear because of the absence of derived satellite variables. Because the GOES cloud-type variable inherently separates into different regimes, there is no sensitivity study necessary to determine the optimal number of regimes, but a sensitivity study confirmed that the same number of training epochs and neurons as in the configuration for the RD-ANN-GKtCC should be used.

6. Results

a. SMUD

Once the best configurations are determined, the true test of skill is the comparison of the forecast techniques on the independent test datasets. The data are initially split on the basis of whether there are derived data in the GOES-East observations. Derived data are only available when the measured temperature and reflectance data indicate that clouds are present. If an instance is identified as clear on the basis of the GOES-East data, then an ANN trained on only those cases is used to predict the clearness index. Otherwise, the RD-ANN models and an ANN without regime identification are used to predict the clearness index. Clearness-index persistence is used as our baseline technique in both cases. The results for the GOES-East-defined clear cases are shown in Table 5 for all forecast lead times for the SMUD location. They indicate that the ANN improves upon the clearness-index-persistence method at the 60-, 120-, and 180-min forecast lead times. At the 15-min forecast lead time, however, the error is nearly double that of the clearness-index-persistence forecast, and this result is likely a case of overfitting the training data. At this forecast lead time, the magnitude of the irradiance is relatively consistent unless a cloud advects or develops over the observation site. Because these instances are rare when GOES-East data determine the sky to be clear, the ANN likely overfits those uncommon cases and thus hurts the overall performance of the model. We kept the configuration of the ANN consistent throughout the forecast lead times and across the clear and cloudy data subsets; future work will examine how to adjust the parameters of the ANN so that the model performs well on the test dataset for the clear-data subset.

Table 5.

Comparison of MAE for the clearness-index persistence and the ANN-Clear model for all forecast lead times for the SMUD site.

Next, all of the RD-ANN methods were compared with both the ANN without regime identification (ANN-ALL) and the clearness-index persistence for all cases labeled other than clear by the GOES-East data. These MAE results are plotted in Fig. 5 for all forecast lead times. As expected, the forecast error increases as the forecast lead time increases. The only method that generally performs worse than clearness-index persistence is the RD-ANN-GCT method that uses the GOES-East-derived cloud types as the regime-classification method. At the 15-min lead time, the RD-ANN-KtCC, RD-ANN-GKtCC, ANN-ALL, and clearness-index persistence all show similar errors. At the 60-min and longer lead times, the RD-ANN-KtCC, RD-ANN-GKtCC, and ANN-ALL all show improvement over the clearness-index persistence, as shown by the larger MAE of the clearness-index-persistence forecasts. The method that generally performs best is the RD-ANN-GKtCC method, which exploits the GOES-East data in both the k-means clustering and ANN.

Fig. 5.

MAE as a function of lead time for all methods of the satellite-determined cloudy instances for the SMUD site. The method that performs best in the majority of the forecast lead times is the RD-ANN-GKtCC method.

To quantify the improvement in forecast skill with the regime-dependent methods, we compute the percent improvement over our baseline clearness-index-persistence technique. The percent improvement over clearness-index persistence for the forecasts at the SMUD sites is shown in Fig. 6. At the 15-min lead time, all of the methods except RD-ANN-GCT closely mimic clearness-index persistence. At this lead time only the RD-ANN-GKtCC method improves slightly over the clearness-index persistence, by 1%. In contrast, at the 60-, 120-, and 180-min lead times, most of the RD-ANN methods show between 10% and 28% improvement over the clearness-index-persistence method. The RD-ANN-GCT model shows the worst performance except at the 180-min lead time, at which point it begins to improve over the clearness-index persistence. This poor performance is likely due to several factors. One possible reason is that the cloud-type classification separates into six different regimes, which is more than the number of regimes that our sensitivity tests found best in the RD-ANN-GKtCC method. Another reason is that there likely are cases of misclassification by the GOES-East system. In addition, there are cloud regimes with small data-subset sizes, such as the fog, overlapping, and opaque ice cloud regimes that each have only 9.5%–12.5% of the total data, and therefore the ANN is potentially overfitting on those regimes. The ANN did have substantially lower errors on the training data (not shown), which further indicates that the ANN was overfitting the smaller regime subsets. At the 60-, 120-, and 180-min lead times, the RD-ANN-GKtCC method shows 21.0%, 26.4%, and 27.4% improvement over the clearness-index persistence, respectively. The RD-ANN-GKtCC method is best at all lead times except 120 min, for which time the RD-ANN-KtCC produces a slightly better 26.6% improvement over clearness-index persistence. These results demonstrate that the RD-ANN methods are able to improve substantially over clearness-index persistence at 60-, 120-, and 180-min lead times; the cloud-regime classification makes a considerable impact on the overall performance of the models, however.
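The percent improvement over the baseline is not written out as a formula in the text; the sketch below assumes the conventional definition, the reduction in MAE relative to the persistence MAE. The function name is hypothetical.

```python
def percent_improvement(mae_model, mae_persistence):
    """Percent reduction in MAE relative to clearness-index persistence.

    Positive values mean the model beats persistence; negative values mean it is worse.
    """
    if mae_persistence <= 0:
        raise ValueError("persistence MAE must be positive")
    return 100.0 * (mae_persistence - mae_model) / mae_persistence
```

Under this definition, a method that performs worse than persistence yields a negative percent improvement.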

Fig. 6.

Percent improvement over the clearness-index-persistence forecasts for all methods on the satellite-determined cloudy instances.

b. BNL

Although the SMUD dataset provides a substantial amount of data for training, sensitivity testing, and independent verification, it is important to analyze how our complex regime-dependent model performs when trained with a smaller dataset. Doing so quantifies the value of obtaining larger, and thus more expensive, training datasets. In addition to redeveloping the same RD-ANN methods using the BNL dataset, we also trained the RD-ANN-GKtCC model on the SMUD dataset and applied it to the BNL dataset (RD-ANN-SMUD) to determine how a general model trained at one site performs at a different site. The MAE for each method on the BNL test data is shown in Fig. 7 for all forecast lead times. These results indicate that the clearness-index-persistence method has lower error than all ANN methods for BNL. The results also indicate that, similar to the results for the SMUD sites, the RD-ANN-GCT model is the worst-performing model. At the 15- and 60-min lead times, the best regime-dependent model is the method trained at SMUD. This highlights the importance of numerous and applicable training data, especially considering that the geostationary satellite data are distorted in different ways for locations in California versus New York, negatively affecting the forecast performance of a model trained at one location and applied to the other. The amount of data available from BNL to train the models at that site is likely too little given the number of predictors and the model complexity. With 40 predictors provided to the ANN, it may be too complex to avoid overfitting given a training dataset of a maximum (if no regime classification is done) of 309 instances. Future work will examine how to properly down-select to the appropriate number of predictors and model complexity so as to capture the true predictive relationships among the predictors in a limited dataset.

Fig. 7.

Results for all methods on the satellite-determined cloudy instances for the BNL forecast site. The method that performs best in the majority of the forecast lead times is the clearness-index-persistence method.

c. Variability prediction

Although the deterministic forecast skill such as that shown above is of primary interest to utility companies and systems operators, it is also valuable to predict irradiance variability. Variability is important because the utility companies and systems operators need to allocate adequate resources to deal with variations that cannot be deterministically predicted. Here, we compute the irradiance variability as the standard deviation of the clearness index over the following 3 h (i.e., the standard deviation of twelve 15-min-average clearness-index values). We test the variability prediction for SMUD because the deterministic prediction results showed that the dataset has ample data for training and testing. As our baseline forecast, we compute the standard deviation of the 15-min-average clearness-index values over the prior hour. In essence, this clearness-index-persistence forecast predicts that variability will remain the same for the following 3 h. We test this baseline technique versus an ANN trained without regime identification and a new version of the RD-ANN-GKtCC method that uses the same inputs and predictors as the deterministic irradiance forecast method but is now trained to predict the 3-h clearness-index variability. The results for the variability prediction, shown in Table 6, reveal that the lowest MAE comes from the RD-ANN-GKtCC prediction method. The RD-ANN-GKtCC method shows 18.6% improvement over the clearness-index-persistence forecast of the expected irradiance variability. The clearness-index persistence, ANN-ALL, and RD-ANN-GKtCC methods all show substantially lower errors than the average value of the clearness-index variability, which was computed to be 0.092 for the test dataset.
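The variability predictand and its persistence baseline can be sketched as follows, assuming a regularly spaced series of 15-min-average Kt values. The function names and the indexing convention (index i marks the latest available observation) are hypothetical, and the population form of the standard deviation is an assumption.

```python
import statistics

def variability_target(kt_series, i, horizon_steps=12):
    """Std dev of Kt over the 3 h (twelve 15-min values) following index i."""
    future = kt_series[i + 1:i + 1 + horizon_steps]
    if len(future) < horizon_steps:
        return None  # not enough future data to form the predictand
    return statistics.pstdev(future)

def variability_persistence(kt_series, i, lookback_steps=4):
    """Baseline: std dev of Kt over the prior hour (four 15-min values ending at i)."""
    start = i - lookback_steps + 1
    if start < 0:
        return None  # not enough history
    return statistics.pstdev(kt_series[start:i + 1])
```

The baseline uses one hour of history to forecast three hours of variability, so it cannot anticipate a transition from steady to variable conditions.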

Table 6.

List of the MAEs for predicting the clearness-index variability with the clearness-index persistence, ANN-ALL, and RD-ANN-GKtCC methods trained to predict the variability for the SMUD sites.

7. Discussion and conclusions

In this study, we utilize surface weather observations, solar irradiance observations, and GOES-East satellite data as inputs and predictors into regime-dependent techniques that first identify cloud regimes before fitting an ANN to predict clearness index. This approach allows each ANN to focus on the forecast mission for a specific cloud type. We find that a k-means cluster-based ANN method (RD-ANN-GKtCC) improves upon the forecasting performance of not only the baseline clearness-index persistence but also a global ANN for lead times of 60, 120, and 180 min. At the 15-min forecast lead time, all RD-ANN methods mimicked the clearness-index persistence, with the RD-ANN-GKtCC method managing to show a 1% gain in forecasting performance over clearness-index persistence.

The RD-ANN methods not only showed improved performance for deterministic clearness-index predictions but also for predicting clearness-index variability. A new version of the RD-ANN-GKtCC model trained to predict the variability of the clearness index over the next 3 h showed substantial forecast-error reduction relative to using either a variability-persistence method or a global ANN. Thus, the RD-ANN-GKtCC model is able to improve the prediction of the deterministic irradiance and its variability for short-range lead times, given sufficient training data.

Although the RD-ANN methods show substantial performance gain for the Sacramento (SMUD) sites that had a large training dataset, when the RD-ANN methods were trained to predict for a site on Long Island (BNL), with its small training dataset, the complex models did not perform well on the independent test dataset. To improve the forecasting methods at a site with a small amount of training data, the RD-ANN methods likely will need to be tuned with a smaller predictor set and a simpler configuration to allow the method to model the true predictive relationships among the predictors. The true predictive relationships in a small dataset are likely limited; therefore, future work can examine automatic ways of configuring RD-ANN systems depending on the amount of training data and number of available predictors. A simpler configuration with fewer predictors could potentially avoid the problem of overfitting datasets that are too small (e.g., BNL) for using nonlinear models.

Of the three RD-ANN methods tested, that which used a regime classification that is based on the cloud-type-derived variable in the GOES-East data performed the worst. This outcome was likely due to a combination of multiple problems and so yields several ideas for future work. First, the GOES-East algorithm derives cloud types on the basis of only the satellite-measured values. Our ANN models are also provided predictors from surface weather observations and surface irradiance observations. Therefore, the RD-ANN methods that use a combination of the available data are more likely to capture clusters that represent real predictive relationships that the ANN is able to model. The forecast-error dependence on available predictors could be examined in future work by testing the forecasting skill of the RD-ANNs if the regime-classification versions are the same but the ANNs are only provided the GOES-East-measured variables. Last, some of the cloud types are uncommon in the data, resulting in small training-data subsets and thus giving the ANN model a higher likelihood of overfitting the available training data.

Although the complex RD-ANN models have shown impressive forecast improvements for the SMUD sites, the clearness-index-persistence method still performs best when the dataset is too small to effectively train an ANN. Future work will look to quantify the amount of data required for the RD-ANN-GKtCC method to outperform a persistence-based approach. Future work will also examine whether using the GOES-West data could potentially provide additional predictors that would improve the forecasts from the RD-ANN models.

Acknowledgments

This material is based upon work supported by the U.S. Department of Energy under Sunshot Award DE-EE0006016 and by the National Center for Atmospheric Research, which is sponsored by the National Science Foundation. Funding was also provided to LMH by NREL Subcontract AGG-2-22256-01. We gratefully acknowledge all of the collaborators on the SunCast project for insightful discussions and ideas, including Seth Linden, Sheldon Drobot, Jared Lee, Julia Pearson, David John Gagne, and Tara Jensen. This project would not have been possible without the data from the Sacramento Municipal Utility District and Brookhaven National Laboratory and the help from Thomas Brummet at NCAR for the data quality control and processing. Thanks are given to Matt Rogers and Steve Miller for GOES-East data acquisition, discussion, and quality control and for intellectual conversations that led to innovative applications of satellite data in this study.

REFERENCES

  • Almonacid, F., , P. J. Pérez-Higueras, , E. F. Fernández, , and L. Hontoria, 2014: A methodology based on dynamic artificial neural network for short-term forecasting of the power output of a PV generator. Energy Convers. Manage., 85, 389398, doi:10.1016/j.enconman.2014.05.090.

    • Search Google Scholar
    • Export Citation
  • Beyer, H. G., , C. Costanzo, , and D. Heinemann, 1996: Modifications of the Heliosat procedure for irradiance estimates from satellite data. Sol. Energy, 56, 207212, doi:10.1016/0038-092X(95)00092-6.

    • Search Google Scholar
    • Export Citation
  • Bhardwaj, S., , V. Sharma, , S. Srivastava, , O. S. Sastry, , B. Bandyopadhyay, , S. S. Chandel, , and J. R. P. Gupta, 2013: Estimation of solar radiation using a combination of hidden Markov model and generalized fuzzy model. Sol. Energy, 93, 4354, doi:10.1016/j.solener.2013.03.020.

    • Search Google Scholar
    • Export Citation
  • Bilionis, I., , E. M. Constantinescu, , and M. Anitescu, 2014: Data-driven model for solar irradiation based on satellite observations. Sol. Energy, 110, 2238, doi:10.1016/j.solener.2014.09.009.

    • Search Google Scholar
    • Export Citation
  • Bouzerdoum, M., , A. Mellit, , and A. Massi Pavan, 2013: A hybrid model (SARIMA–SVM) for short-term power forecasting of a small-scale grid-connected photovoltaic plant. Sol. Energy, 98, 226235, doi:10.1016/j.solener.2013.10.002.

    • Search Google Scholar
    • Export Citation
  • Chow, C. W., , N. Urquhart, , M. Lave, , A. Dominquez, , J. Kleissl, , J. Shields, , and B. Washom, 2011: Intra-hour forecasting with a total sky imager at the UC San Diego solar energy testbed. Sol. Energy, 85, 28812893, doi:10.1016/j.solener.2011.08.025.

    • Search Google Scholar
    • Export Citation
  • Chu, Y., , H. T. C. Pedro, , and C. F. M. Coimbra, 2013: Hybrid intra-hour DNI forecasts with sky image processing enhanced by stochastic learning. Sol. Energy, 98, 592603, doi:10.1016/j.solener.2013.10.020.

    • Search Google Scholar
    • Export Citation
  • Chu, Y., , H. T. C. Pedro, , M. Li, , and C. F. M. Coimbra, 2015: Real-time forecasting of solar irradiance ramps with smart image processing. Sol. Energy, 114, 91104, doi:10.1016/j.solener.2015.01.024.

    • Search Google Scholar
    • Export Citation
  • Cros, S., , O. Liandrat, , N. Sébastien, , and N. Schmutz, 2014: Extracting cloud motion vectors from satellite images for solar power forecasting. Proc. IEEE Int. Geoscience and Remote Sensing Symp., Quebec City, QC, Canada, IEEE, 4123–4126, doi:10.1109/IGARSS.2014.6947394.

  • Diagne, M., , M. David, , P. Lauret, , J. Boland, , and N. Schmutz, 2013: Review of solar irradiance forecasting methods and a proposition for small-scale insular grids. Renewable Sustainable Energy Rev., 27, 6576, doi:10.1016/j.rser.2013.06.042.

    • Search Google Scholar
    • Export Citation
  • Fernandez, E., , F. Almonacid, , N. Sarmah, , P. Rodrigo, , T. K. Mallick, , and P. Perez-Higueras, 2014: A model based on artificial neuronal network for the prediction of the maximum power of a low concentration photovoltaic module for building integration. Sol. Energy, 100, 148158, doi:10.1016/j.solener.2013.11.036.

    • Search Google Scholar
    • Export Citation
  • Fu, C.-L., , and H.-Y. Cheng, 2013: Predicting solar irradiance with all-sky image features via regression. Sol. Energy, 97, 537550, doi:10.1016/j.solener.2013.09.016.

    • Search Google Scholar
    • Export Citation
  • Hammer, A., , D. Heinemann, , E. Lorenz, , and B. Lückehe, 1999: Short-term forecasting of solar radiation: A statistical approach using satellite data. Sol. Energy, 67, 139150, doi:10.1016/S0038-092X(00)00038-4.

    • Search Google Scholar
    • Export Citation
  • Haupt, S. E., and Coauthors, 2016: The SunCast Solar Power Forecasting System: The results of the public-private-academic partnership to advance solar power forecasting. National Center for Atmospheric Research Tech. Note NCAR/TN-526+STR, 307 pp., doi:10.5065/D6N58JR2.

  • Heidinger, A. K., , M. J. Foster, , A. Walther, , and X. Zhao, 2014: The Pathfinder Atmospheres–Extended AVHRR climate dataset. Bull. Amer. Meteor. Soc., 95, 909922 doi:10.1175/BAMS-D-12-00246.1.

    • Search Google Scholar
    • Export Citation
  • Hinkelman, L. M., , and M. Sengupta, 2012: Relating solar resource variability to cloud type. Abstracts, Fall Meeting, San Francisco, CA, Amer. Geophys. Union, abstract A31F-0086. [Available online at http://adsabs.harvard.edu/abs/2012AGUFM.A31F0086H.]

  • Huang, H., , J. Xu, , Z. Peng, , S. Yoo, , D. Yu, , D. Huang, , and H. Qin, 2013: Cloud motion estimation for short term solar irradiance prediction. Proc. IEEE Int. Conf. on Smart Grid Communications, Vancouver, BC, Canada, IEEE, 696–701, doi:10.1109/SmartGridComm.2013.6688040.

  • Inman, R. H., , H. T. C. Pedro, , and C. F. M. Coimbra, 2013: Solar forecasting methods for renewable energy integration. Prog. Energy Combust. Sci., 39, 535576, doi:10.1016/j.pecs.2013.06.002.

    • Search Google Scholar
    • Export Citation
  • IRENA and CEM, 2014: The socio-economic benefits of solar and wind energy. International Renewable Energy Agency Clean Energy Ministerial Rep., 107 pp. [Available online at http://www.irena.org/DocumentDownloads/Publications/Socioeconomic_benefits_solar_wind.pdf.]

  • Kleissl, J., 2013: Solar Energy Forecast and Resource Assessment. Academic Press, 462 pp.

  • Lippmann, R. P., 1987: An introduction to computing with neural nets. IEEE Acoust. Speech Signal Process. Mag., 4, 422, doi:10.1109/MASSP.1987.1165576.

    • Search Google Scholar
    • Export Citation
  • Lopez, G., , F. J. Batlles, , and J. Tovar-Pescador, 2005: Selection of input parameters to model direct solar irradiance by using artificial neural networks. Energy, 30, 16751684, doi:10.1016/j.energy.2004.04.035.

    • Search Google Scholar
    • Export Citation
  • Lorenz, E., , A. Hammer, , and D. Heinemann, 2004: Short term forecasting of solar radiation based on satellite data. Proc. ISES Europe Solar Congress, EuroSun 2004, Freiburg, Germany, International Solar Energy Society, 841–848. [Available online at https://www.uni-oldenburg.de/fileadmin/user_upload/physik/ag/ehf/enmet/publications/solar/conference/2004/eurosun/short_term_forecasting_of_solar_radiation_based_on_satellite_data.pdf.]

  • Lorenz, E., , J. Kuhnert, , and D. Heinemann, 2012: Overview on irradiance and photovoltaic power prediction. Weather Matters for Energy, A. Troccoli, L. Dubus, and S. E. Haupt, Eds., Springer, 429–454.

  • Marquez, R., , and C. F. M. Coimbra, 2013: Intra-hour DNI forecasting based on cloud tracking image analysis. Sol. Energy, 91, 327336, doi:10.1016/j.solener.2012.09.018.

    • Search Google Scholar
    • Export Citation
  • Marquez, R., , H. T. C. Pedro, , and C. F. M. Coimbra, 2013: Hybrid solar forecasting method uses satellite imaging and ground telemetry as inputs to ANNs. Sol. Energy, 92, 176188, doi:10.1016/j.solener.2013.02.023.

    • Search Google Scholar
    • Export Citation
  • Martín, L., , L. F. Zarzalejo, , J. Polo, , A. Navarro, , R. Marchante, , and M. Cony, 2010: Prediction of global solar irradiance based on time series analysis: Application to solar thermal power plants energy production planning. Sol. Energy, 84, 17721781, doi:10.1016/j.solener.2010.07.002.

    • Search Google Scholar
    • Export Citation
  • McCandless, T. C., , S. E. Haupt, , and G. S. Young, 2014: Short term solar radiation forecasts using weather regime-dependent artificial intelligence techniques. 12th Conf. on Artificial Intelligence and its Applications to the Environmental Sciences, Atlanta, GA, Amer. Meteor. Soc., J3.5. [Available online at https://ams.confex.com/ams/94Annual/webprogram/Manuscript/Paper240879/STATCast_AMS_TM_Final.pdf.]

  • McCandless, T. C., , S. E. Haupt, , and G. S. Young, 2016: A regime-dependent artificial neural network technique for short-range solar irradiance forecasting. Renewable Energy, 89, 351359, doi:10.1016/j.renene.2015.12.030.

    • Search Google Scholar
    • Export Citation
  • Mellit, A., 2008: Artificial intelligence technique for modeling and forecasting of solar radiation data: A review. Int. J. Artif. Intell. Soft Comput., 1, 5276, doi:10.1504/IJAISC.2008.021264.

    • Search Google Scholar
    • Export Citation
  • Mellit, A., , A. Massi Pavan, , and V. Lughi, 2014: Short-term forecasting of power production in a large-scale photovoltaic plant. Sol. Energy, 105, 401413, doi:10.1016/j.solener.2014.03.018.

    • Search Google Scholar
    • Export Citation
  • Miller, S. D., and Coauthors, 2014: Estimating three-dimensional cloud structure via statistically blended satellite observations. J. Appl. Meteor. Climatol., 53, 437455, doi:10.1175/JAMC-D-13-070.1.

    • Search Google Scholar
    • Export Citation
  • Morf, H., 2014: Sunshine and cloud cover prediction based on Markov processes. Sol. Energy, 110, 615626, doi:10.1016/j.solener.2014.09.044.

    • Search Google Scholar
    • Export Citation
  • Notton, G., , C. Paoli, , S. Vasileva, , M.-L. Nivet, , J.-L. Canaletti, , and C. Cristofari, 2012: Estimation of hourly global solar irradiation on tilted planes from horizontal one using artificial neural networks. Energy, 39, 166179, doi:10.1016/j.energy.2012.01.038.

    • Search Google Scholar
    • Export Citation
  • Pedro, H. T. C., , and C. F. M. Coimbra, 2012: Assessment of forecasting techniques for solar power prediction with no exogenous inputs. Sol. Energy, 86, 20172028, doi:10.1016/j.solener.2012.04.004.

    • Search Google Scholar
    • Export Citation
  • Quesada-Ruiz, S., , Y. Chu, , J. Tovar-Pescador, , H. T. C. Pedro, , and C. F. M. Coimbra, 2014: Cloud-tracking methodology for intra-hour DNI forecasting. Sol. Energy, 102, 267275, doi:10.1016/j.solener.2014.01.030.

    • Search Google Scholar
    • Export Citation
  • Reed, D. R., , and R. J. Marks, 1998: Neural Smithing: Supervised Learning in Feedforward Artificial Neural Networks. MIT Press, 346 pp.

  • Rosello, E. G., , J. B. G. Perez-Schofield, , J. G. Dacosta, , and M. Perez-Cota, 2003: Neuro-Lab: A highly reusable software-based environment to teach artificial neural networks. Comput. Appl. Eng. Educ., 11, 93102, doi:10.1002/cae.10042.

    • Search Google Scholar
    • Export Citation
  • Rosenblatt, F., 1958: The perceptron: A probabilistic model for information storage and organization in the brain. Psychol. Rev., 65, 386408, doi:10.1037/h0042519.

    • Search Google Scholar
    • Export Citation
  • Tapakis, R., , and A. G. Charalambides, 2013: Equipment and methodologies for cloud detection and classification: A review. Sol. Energy, 95, 392430, doi:10.1016/j.solener.2012.11.015.

    • Search Google Scholar
    • Export Citation
  • Voyant, C., , M. Muselli, , C. Paoli, , and M.-L. Nivet, 2012: Numerical weather prediction (NWP) and hybrid ARMA/ANN to predict global radiation. Energy, 39, 341355, doi:10.1016/j.energy.2012.01.006.

    • Search Google Scholar
    • Export Citation
  • Voyant, C., , M. Muselli, , C. Paoli, , and M.-L. Nivet, 2013: Hybrid methodology for hourly global radiation forecasting in Mediterranean area. Renewable Energy, 53, 111, doi:10.1016/j.renene.2012.10.049.

    • Search Google Scholar
    • Export Citation
  • Zagouras, A., , A. Kazantzidis, , E. Nikitidou, , and A. A. Argiriou, 2013: Determination of measuring sites for solar irradiance, based on cluster analysis of satellite-derived cloud estimations. Sol. Energy, 97, 111, doi:10.1016/j.solener.2013.08.005.

    • Search Google Scholar
    • Export Citation