There is a large amount of documented weather information all over the world, including Asia (e.g., old diaries, log books, etc.). The ultimate goal of this study is to reconstruct historical weather by deriving total cloud cover (TCC) from historically documented weather records and to assimilate them using a general circulation model and a data assimilation scheme. Two experiments are performed using the Global Spectral Model and an ensemble Kalman filter: 1) a reanalysis data experiment and 2) a ground observation data experiment, for 18 synthesized observation stations in Japan according to the Historical Weather Data Base. By assuming that weather records can be converted into three TCC categories, the synthetic observation data of daily TCC are created from reanalysis data, with a large observation error of 30%, and by classifying ground observation data into the three categories. Compared with the simulation without assimilation of any observation, the results of the reanalysis data experiment show improvements, not only in TCC but also in other meteorological variables (e.g., humidity, precipitation, precipitable water, wind, and pressure). For specific humidity at 2 m above the surface, the monthly averaged root-mean-square error is reduced by 18%–22% downstream of the assimilated region. The results of the ground observation data experiment are not as successful as a result of additional error sources, indicating the bias needs to be handled correctly. By showing improvements with the loosely classified cloud information, the feasibility of the developed model to be applied for historical weather reconstruction is confirmed.
Climate change remains one of the greatest concerns in the earth sciences. Large uncertainties limit the effectiveness of long-term climate forecasts. To improve our understanding of climate mechanisms, observation data over long periods are required. Several data rescue organizations, such as Atmospheric Circulation Reconstructions over the Earth (ACRE), the International Surface Pressure Databank (ISPD), the International Comprehensive Ocean–Atmosphere Dataset (ICOADS), the International Environmental Data Rescue Organization (IEDRO), and Mediterranean Climate Data Rescue (MEDARE), are working on recovering and storing historical climate data. Reliable rescued datasets have been used to create historical reanalysis datasets [e.g., National Centers for Environmental Prediction–Department of Energy (NCEP–DOE) Reanalysis-2, the European Centre for Medium-Range Weather Forecasts (ECMWF) twentieth-century reanalysis (ERA-20C) and 10-member ensemble of coupled climate reanalyses of the twentieth century (CERA-20C), the Climate Forecast System Reanalysis (CFSR), and the Twentieth Century Reanalysis (20CR)] by assimilating the data into climate models (Kanamitsu et al. 2002; Saha et al. 2010; Compo et al. 2011; Poli et al. 2016). However, with the exception of Europe and parts of North America, there are only a few instrumental meteorological datasets available before the nineteenth century. For example, in Japan, instrumental meteorological data have only been available since the 1870s, when the official modern meteorological network in Japan began. In Europe, some instruments were introduced in the seventeenth century, with temperature observations dating back to 1659 in central England (Manley 1974).
Many old diaries record historical daily weather. These records have the potential to reconstruct weather when and where instrumental data are not available. According to the Historical Weather Data Base (HWDB; Yoshimura 2007) in Japan, diaries have been recovered that include historical weather information since the 1660s, with more than 18 daily observation sites existing in Japan during the period covering the 1750s to the 1870s. Even though there are some historic instrumental records in Europe and North America, information from old diaries could still be a significant reference of historical weather in these locations as well (Adamson 2015), because instrumental weather records before the nineteenth century are limited (Brázdil et al. 2005).
Numerous studies have attempted to reconstruct climate from non-instrumental data. One approach is to use natural proxy data such as tree rings, ice cores, coral shells, marine sediments, and speleothems (Guilderson et al. 1994; Siddall et al. 2003; Moberg et al. 2005), which mostly aim to reconstruct climate over long periods. Historical documents have also been used for climate reconstruction. Compared to natural proxies, historical documents have higher temporal resolution and accuracy (up to subdaily) (Pfister et al. 1999). However, the quantitative interpretation of those documents is challenging. Thus, reconstructions based on historical documents are usually done by index methods, which convert qualitative data into intensity indices for temperature or precipitation. The indices are created by counting specific meteorological phenomena (Maejima 1966; Mikami 1996; Zhang et al. 2013) and categorizing descriptive data (Baron 1982; Gimmi et al. 2007; Dobrovolný et al. 2010), and are used to reconstruct temperature or precipitation at an annual or seasonal resolution (Brázdil et al. 2005). Moreover, there are studies that incorporate multiple proxy datasets and historical documents (Bradley and Jones 1993; Mann et al. 1998, 2008; PAGES 2k Consortium 2013). However, because these methods rely on linear statistical relationships, they can reconstruct only a limited number of variables (e.g., only temperature) and have poor spatial coverage outside of regions where proxy data or documents are available.
Lately, data assimilation has been used to reconstruct historical climate. This approach combines dynamical information from climate models and information from proxy data. To assimilate natural proxy data, which is integrated information over long periods of time, data assimilation of time-averaged observations is used such as ensemble Kalman filter (EnKF) approaches (Huntley and Hakim 2010; Bhend et al. 2012; Steiger et al. 2014; Dee et al. 2016), and particle filter approaches (Annan and Hargreaves 2012; Mathiot et al. 2013; Renssen et al. 2015). These methods assimilate annual or seasonal temperatures reconstructed from natural proxy data for hundreds or thousands of years. Therefore, such approaches are also capable of assimilating temperature reconstructed from historical documents, but they cannot directly assimilate the information in historical documents on a daily scale. Hence, both of the methods, statistical or assimilation of time-averaged observations, lose some of the high temporal resolution information contained in the historical documents.
Here, we propose an approach to directly assimilate information from non-instrumental historical documents on a daily scale by converting descriptive weather records into total cloud cover (TCC) or cloudiness. The aim of this study is to investigate the feasibility of reconstructing historical weather using TCC obtained from weather records in old diaries, a general circulation model (GCM), and a data assimilation scheme. To achieve this, the Global Spectral Model (GSM; Kanamitsu et al. 2002), which is a physical climate model, is used to assimilate TCC with a local ensemble transform Kalman filter (LETKF) technique (Hunt et al. 2007). This physics-based approach enables the reconstruction of a variety of atmospheric variables incorporated within the GSM, even outside of regions where data are available. It should be noted that the target temporal resolution of the reconstructed fields in this study is similar to that of modern reanalyses and is much finer than that of conventional non-instrumental data-based reconstruction studies, which typically have temporal resolutions of a year or a season. Thus, we use the term “historical weather reconstruction” in this study to distinguish it from “historical climate reconstruction,” a distinction that contributes to a new perspective on historical climate research. For instance, this approach enables us to improve reanalysis products by assimilating information from historical documents worldwide for, in particular, the early nineteenth century, when limited observations are available.
From the perspective of data assimilation, this study evaluates how atmospheric dynamics could be improved by assimilating TCC information, which follows from the work of Yoshimura et al. (2014), who assimilated water vapor isotope information into the GSM. Yoshimura et al. (2014) created synthetic observation stations for water vapor isotopes and evaluated whether isotopic constraints would have positive impacts on other atmospheric variables. Because cloud cover is related to a number of atmospheric variables in physical processes (e.g., cloud condensation caused by the updraft of a warm humid air mass due to wind convergence in a low pressure area, droplet formation in clouds resulting in rainfall, and solar radiation reflection from clouds to create cooling as well as greenhouse effects), we expect to generate better atmospheric fields with the TCC assimilation.
a. Local ensemble transform Kalman filter
The uncertainties in old diaries should be properly treated when using the data in numerical models. Data assimilation is a widely used mathematical approach that incorporates information from both observation data and dynamical models to obtain optimal physical fields. The technique corrects fields forecasted by dynamical models by taking observational and model errors into account. The uncertainty of an observation depends on how it was obtained; in the present study, it reflects the accuracy of weather descriptions in old diaries. The Kalman filter is one of the classical methods among the variety of existing data assimilation schemes. However, this method cannot be applied directly to nonlinear forecasting models for three reasons: it is a linear method, the error covariance of the model must be known beforehand, and the computational cost of calculating the time evolution of the error covariance is prohibitively high (Tippett et al. 2003).
Evensen (1994) introduced a stochastic approach, which is known as the ensemble Kalman filter, to address the above issues using ensembles to represent forecast errors and estimate error covariance. Houtekamer and Mitchell (1998) successfully performed EnKF experiments within a perfect-model context using a low-resolution atmospheric model. However, such stochastic approaches require perturbed observations to update ensemble members, which can bias the estimates of the analysis error covariance (Whitaker and Hamill 2002). Deterministic methods, such as the ensemble square root filter (EnSRF), do not need the perturbed observations when creating the ensembles (Tippett et al. 2003). The ensemble transform Kalman filter (ETKF; Bishop et al. 2001) is one of the deterministic methods, which finds the transformation matrix such that it can compute the error covariance efficiently, but has the limitation that it cannot apply localization techniques (Hamill 2006). Ott et al. (2004) introduced the local ensemble Kalman filter (LEKF), which assimilates the observations locally, grid point by grid point, in low-dimensional subspaces. This allows for the simultaneous computation of different grid points using parallel processing and, in doing so, reduces the computational time. In this study, we used the LETKF (Hunt et al. 2007), which applies the ETKF locally as in the LEKF.
The LETKF has options to use “localization” and “covariance inflation” techniques. Localization is used to reduce the errors caused by spurious correlations among distant locations as a result of the limited ensemble size (Hamill et al. 2001). In addition to the inherent localization of the LETKF, which is done by separating the domain into local regions, Hunt et al. (2007) suggested using the “smoothed localization” method, which gradually decreases the influence of the observations as the distance from the observation location increases. In this study, we use a Gaussian function for smoothed localization, as suggested by Miyoshi and Yamane (2007). Additionally, covariance inflation is applied to avoid underestimation of the error variance, which is known to be a common problem in ensemble filters (Anderson 2009). We employ adaptive covariance inflation (Miyoshi 2011) to estimate the multiplicative inflation parameters adaptively.
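The two techniques can be illustrated with a minimal sketch. It assumes that smoothed localization is applied by dividing the observation error variance by a Gaussian weight (equivalently, multiplying the inverse error variance by that weight), and it uses a fixed inflation factor rather than the adaptive estimate of Miyoshi (2011). The function names and the 500-km length scale in the usage note are hypothetical and are not taken from this study.

```python
import math


def gaussian_localization_weight(distance_km, length_scale_km):
    """Gaussian weight: 1 at the observation location, decaying with distance."""
    return math.exp(-0.5 * (distance_km / length_scale_km) ** 2)


def localized_obs_error_variance(obs_error_var, distance_km, length_scale_km):
    """Inflate the observation error variance for distant observations.

    Dividing by the weight reduces the influence of an observation
    smoothly as its distance from the analysis grid point increases.
    """
    weight = gaussian_localization_weight(distance_km, length_scale_km)
    if weight == 0.0:  # weight underflowed: the observation has no influence
        return math.inf
    return obs_error_var / weight


def inflate_ensemble(members, rho):
    """Multiplicative inflation: scale deviations from the mean by sqrt(rho),
    so the ensemble variance is multiplied by rho (rho is fixed here)."""
    mean = sum(members) / len(members)
    return [mean + math.sqrt(rho) * (x - mean) for x in members]
```

For example, `localized_obs_error_variance(900.0, 500.0, 500.0)` returns the 30% error variance (900) divided by `exp(-0.5)`, so an observation one length scale away is trusted less than a collocated one.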
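Here, we provide a brief review of data assimilation and specifics of our data assimilation approach for diary-derived TCC. Following the notation in Whitaker and Hamill (2002), the EnKF update equations are

$$\bar{\mathbf{x}}^{a} = \bar{\mathbf{x}}^{b} + \mathbf{K}\left(\mathbf{y}^{o} - \mathbf{H}\bar{\mathbf{x}}^{b}\right),$$

$$\mathbf{P}^{a} = \left(\mathbf{I} - \mathbf{K}\mathbf{H}\right)\mathbf{P}^{b},$$

where $\mathbf{x}^{b}$ is the m-dimensional background or prior model state vector, $\mathbf{x}^{a}$ is the analysis or posterior model state vector, $\mathbf{y}^{o}$ is the p-dimensional vector of observed values (TCC in this study), $\mathbf{H}$ is the operator that converts the model state to the observation space, $\mathbf{P}^{b}$ is the m × m dimensional background error covariance matrix, $\mathbf{P}^{a}$ is the analysis error covariance, $\mathbf{I}$ is the m × m dimensional identity matrix, and the overbar denotes an ensemble mean. In addition, $\mathbf{K}$ denotes the m × p dimensional Kalman gain matrix:

$$\mathbf{K} = \mathbf{P}^{b}\mathbf{H}^{\mathrm{T}}\left(\mathbf{H}\mathbf{P}^{b}\mathbf{H}^{\mathrm{T}} + \mathbf{R}\right)^{-1},$$

where $\mathbf{R}$ is the p × p dimensional observational error covariance matrix. While the background and analysis error covariance matrices, $\mathbf{P}^{b}$ and $\mathbf{P}^{a}$, are defined using the true state $\mathbf{x}^{t}$ in the Kalman filter, they are approximated as the ensemble covariance matrices, $\hat{\mathbf{P}}^{b}$ and $\hat{\mathbf{P}}^{a}$, in the EnKF:

$$\mathbf{P}^{\kappa} \approx \hat{\mathbf{P}}^{\kappa} = \frac{1}{n-1}\sum_{i=1}^{n}\left(\mathbf{x}_{i}^{\kappa} - \bar{\mathbf{x}}^{\kappa}\right)\left(\mathbf{x}_{i}^{\kappa} - \bar{\mathbf{x}}^{\kappa}\right)^{\mathrm{T}},$$

where the superscript κ denotes either b (background) or a (analysis), n is the ensemble size, and the subscript i indexes the ensemble members. The model state vectors $\mathbf{x}^{b}$ and $\mathbf{x}^{a}$ consist of TCC, wind, air temperature, specific humidity, surface air pressure, and precipitation in this study. The prior model state $\mathbf{x}^{b}$ is calculated by applying the GSM to the analysis model state $\mathbf{x}^{a}$ at the previous time step. Although $\mathbf{H}$ can be in general a complex nonlinear operator, here $\mathbf{H}$ only extracts TCC from $\mathbf{x}^{b}$ at observation locations. The most important procedures in this study are how to generate the TCC observation $\mathbf{y}^{o}$ based on the diaries and the determination of the observational error $\mathbf{R}$, which are explained in section 2c.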
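The update described above can be sketched for the simplest case of a single TCC observation (p = 1), in which the Kalman gain reduces to a vector and no matrix inversion is needed. This is an illustrative sketch only: the function name and the toy state are hypothetical, the observation operator simply picks one element of the state vector, and the ensemble members are updated with the serial ensemble square root filter of Whitaker and Hamill (2002) rather than with the LETKF used in this study.

```python
import math


def enkf_update_single_obs(ensemble, obs_index, y_obs, obs_error_var):
    """Update an ensemble with one scalar observation.

    ensemble      : list of n state vectors (each a list of m floats)
    obs_index     : index of the observed element (H extracts this element)
    y_obs         : observed value (e.g., TCC in %)
    obs_error_var : observation error variance R (e.g., 30**2)
    Returns (analysis mean, analysis members).
    """
    n = len(ensemble)
    m = len(ensemble[0])
    mean_b = [sum(member[j] for member in ensemble) / n for j in range(m)]
    perts = [[member[j] - mean_b[j] for j in range(m)] for member in ensemble]

    # P^b H^T: ensemble covariance of each state element with the observed one
    pb_ht = [sum(p[j] * p[obs_index] for p in perts) / (n - 1) for j in range(m)]
    hpbht = pb_ht[obs_index]  # H P^b H^T, the background variance at the site

    # Kalman gain K = P^b H^T (H P^b H^T + R)^-1 (a vector when p = 1)
    gain = [c / (hpbht + obs_error_var) for c in pb_ht]

    # Mean update: x^a = x^b + K (y^o - H x^b)
    innovation = y_obs - mean_b[obs_index]
    mean_a = [mean_b[j] + gain[j] * innovation for j in range(m)]

    # Deterministic perturbation update with a reduced gain (no perturbed
    # observations), so the analysis variance matches (I - KH) P^b
    alpha = 1.0 / (1.0 + math.sqrt(obs_error_var / (hpbht + obs_error_var)))
    members_a = [
        [mean_a[j] + p[j] - alpha * gain[j] * p[obs_index] for j in range(m)]
        for p in perts
    ]
    return mean_a, members_a
```

With a toy three-member ensemble of (TCC, humidity) states `[[40.0, 1.0], [60.0, 3.0], [50.0, 2.0]]` and a "cloudy" observation of 90% with variance 900, the TCC mean moves only part of the way toward the observation because the observation error is large, and the unobserved humidity is adjusted through its ensemble covariance with TCC.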
b. Global Spectral Model (GSM)
Atmospheric numerical models are popular tools for studying the global dynamics of weather. The GSM represents variables on a sphere by a truncated series of spherical harmonics. The spectral representation of variables has several advantages such as the exact calculation of space derivatives, no pole problems, and no instability arising from aliasing in an ideal situation (Orszag 1970). In this study, we use the GSM originally developed as the operational seasonal forecast system by NCEP (Kanamitsu et al. 2002). This model has been known as a seasonal forecast model (SFM) and has been used as the operational forecast model at NCEP until 2004, and as the basis of several model development projects (e.g., Saha et al. 2006; Yoshimura and Kanamitsu 2013).
The model’s physics packages include the longwave radiation scheme of Chou and Suarez (1994), the shortwave radiation scheme of Chou (1992), relaxed Arakawa–Schubert convective parameterization (Moorthi and Suarez 1992), nonlocal vertical diffusion (Hong and Pan 1998), mountain drag (Alpert et al. 1988), shallow convection (Tiedtke 1983), and the Noah land surface scheme (Ek et al. 2003). The typical, well-tested resolution of T62 in the horizontal (about 200 km) and 28 sigma layers in the vertical is adopted in this study. The model integration time step varies between 20 and 30 min depending on the season.
c. Assimilation system
The assimilation system developed in this study incorporates the GSM and the LETKF. The GSM forecasts atmospheric fields with a certain number of ensemble members for each model time step, as shown in Fig. 1. Then, the LETKF corrects the fields based on TCC data from either the reanalysis dataset or ground observation data. The system requires the TCC values, their accuracy (i.e., observational error), and initial conditions. In this study, the ensemble averages of analyzed fields at each output time interval are used for all of the visualizations and analyses.
The question here is how to derive the TCC information from daily weather records. There are many descriptions of daily weather in Japanese diaries, such as clear, fine, slightly cloudy, cloudy, rain, heavy rain, thunderstorm, and snow. Although each diary differs slightly in its descriptions, the terms clear, partially cloudy, and cloudy are typically used in Japanese diaries. Please note that the direct translation of the second term is generally “fair weather,” but we use “partially cloudy” here to be more appropriate for cloud cover descriptions. The Japan Meteorological Agency (JMA) defines these three categories in terms of the fraction of cloud cover observed in the sky: clear is less than 10% cloud cover, partially cloudy is in between 20% and 80%, and cloudy is more than 90%. Thus, our assumption allows only three classifications of TCC, and the three classifications can be converted into TCC as 10% for clear, 50% for partially cloudy, and 90% for cloudy. The standard deviation of the Gaussian-distributed observational error of TCC is determined as 30% for each of the three categories. It should be noted that if TCC is a uniform random variable between 0% and 100%, the observational error is also uniformly distributed in each category. However, we assume that the observational error of the categorized data is Gaussian distributed in order to apply the LETKF. This limitation may be addressed in future studies with non-Gaussian data assimilation techniques. We apply this assumption to our reanalysis data and ground observation data experiments as explained in the next section. The defining characteristic of data assimilation is that both uncertainties in data and a model are quantitatively considered to generate a “best fit” analysis. One aim of our study is to investigate the potential of such data, with rather high uncertainty, to constrain the atmospheric dynamics in the model.
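The conversion described above amounts to a small lookup table. The following sketch is illustrative: the names are hypothetical, and the handling of descriptions outside the three categories (returning `None`) is an assumption, because the text does not specify how such entries are treated.

```python
# TCC (%) assigned to each of the three weather categories
TCC_BY_CATEGORY = {"clear": 10.0, "partially cloudy": 50.0, "cloudy": 90.0}

# Standard deviation (%) of the Gaussian observational error, same for all
OBS_ERROR_STD = 30.0


def diary_to_tcc(description):
    """Convert a diary weather term into (TCC %, error std %).

    Returns None for terms outside the three categories (an assumption).
    """
    key = description.strip().lower()
    if key not in TCC_BY_CATEGORY:
        return None
    return TCC_BY_CATEGORY[key], OBS_ERROR_STD
```

Note that the error standard deviation (30%) is comparable to the spacing between categories (40%), which is why the assimilated data are described as having rather high uncertainty.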
d. Experimental design and datasets
Two types of experiments, using reanalysis data and ground observation data, are conducted in this study as preliminary steps for reconstructing historical weather based on the HWDB (Yoshimura 2007). The database includes entries from a large number of old diaries for all areas of Japan, starting from the 1660s, and is available online (https://tk2-202-10627.vs.sakura.ne.jp/). In this study, 18 observation stations are assumed, as shown in Fig. 2, which correspond to the diaries available during the period from the 1750s to the 1870s. Both experiments are performed for one month from 1 January 2006, with a 6-h output time interval and 20 ensemble members. The number of ensemble members is determined based on the work of Miyoshi and Yamane (2007), who investigated a sufficient ensemble size for the LETKF in global climate studies. To establish the initial conditions, the GSM is run beforehand from 1 January 2005 to 1 January 2007, without any atmospheric nudging [only sea surface temperature (SST) and sea ice data from NCEP–DOE Reanalysis-2 are prescribed]. Then, each of the 20 ensemble members is initialized from a different day between 1 and 20 January 2006, basically following the method of Yoshimura et al. (2014).
The purpose of the reanalysis data experiment is to examine the effectiveness of the TCC assimilation with limited error sources. The data used here are produced by applying a spectral nudging technique to NCEP–DOE Reanalysis-2 data. In this experiment, the reanalysis data are considered as the “truth”; synthetic TCC observations are generated once a day by adding random noise to the “true” values. The random noise is sampled from the normal distribution within a given observational error (i.e., standard deviation of 30%). Thus, in this experiment, the simulation results with TCC assimilation (hereafter Assim) and the results of simulations without the assimilation (CTL) are compared to the reanalysis or truth observations to evaluate the improvement in reproducing weather with poor cloud data; observer biases are neglected.
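The generation of synthetic observations can be sketched as follows. This is a minimal illustration with hypothetical names. The text does not state whether noisy values are bounded to the physical range of 0%–100%, so clipping is offered only as an optional assumption and is off by default.

```python
import random


def make_synthetic_tcc(true_tcc, error_std=30.0, seed=0, clip=False):
    """Add zero-mean Gaussian noise to "true" daily TCC values (%).

    true_tcc  : list of TCC values taken from the reanalysis ("truth")
    error_std : standard deviation of the noise (30% in the text)
    seed      : seed for reproducibility (hypothetical parameter)
    clip      : if True, bound values to [0, 100] (an assumption)
    """
    rng = random.Random(seed)
    noisy = [t + rng.gauss(0.0, error_std) for t in true_tcc]
    if clip:
        noisy = [min(100.0, max(0.0, v)) for v in noisy]
    return noisy
```

Because the noise is drawn independently at each station and day, the synthetic observations carry random error only, which matches the statement that observer biases are neglected in this experiment.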
The ground observation data experiment is more similar to the practical reconstruction from old diaries. TCC data used in this experiment are obtained from 18 JMA stations (Fig. 2) and have been monitored visually by distinct observers. We use daily averaged TCC values that have been observed either 3 hourly or 6 hourly at each station to add artificial irregularity in observation time. Because TCC from JMA is recorded on a 0 to 10 scale, synthetic observation data are made by classifying the data into three categories: values of 0–1.9 are assigned 10% TCC, 2.0–8.0 are assigned 50%, and 8.1–10 are assigned 90%. Then, the categorized TCC data are assimilated at 1400 Japan standard time (JST), assuming that observations are made around noon. As with the reanalysis data experiment, the results of Assim and CTL are compared to the truth (original JMA data before categorization in this case). The assimilated observation data in this experiment contain not only the observer’s bias but also errors caused by an irregular observation time at each station.
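The categorization of JMA cloud amounts can be sketched as a simple threshold function. The function names are hypothetical, and the treatment of daily means that fall between the stated bins (e.g., between 1.9 and 2.0) is an assumption, since the text lists the bins to one decimal place only.

```python
def daily_mean(values):
    """Average the 3-hourly or 6-hourly cloud amounts of one day."""
    return sum(values) / len(values)


def classify_jma_tcc(tenths):
    """Map a JMA cloud amount (0 to 10 scale) to a categorical TCC (%).

    0-1.9 -> 10 (clear), 2.0-8.0 -> 50 (partially cloudy),
    8.1-10 -> 90 (cloudy). Values between bins are assigned by the
    thresholds below (assumption): below 2.0 is clear, above 8.0 is cloudy.
    """
    if not 0.0 <= tenths <= 10.0:
        raise ValueError("cloud amount must be between 0 and 10")
    if tenths < 2.0:
        return 10.0
    if tenths <= 8.0:
        return 50.0
    return 90.0
```

For example, a day with 6-hourly amounts of 0, 2, 4, and 2 has a daily mean of 2.0 and is therefore assimilated as 50% TCC.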
It should be noted that even though the experimental settings are similar to the observing system simulation experiment (OSSE) method used in Yoshimura et al. (2014), the biggest difference is that we use the truth taken from reanalysis data or ground observations. Therefore, there likely will be a systematic difference (bias) between the truth and the GSM. It is also an aim of this study to check the applicability of the data assimilation, with such an inevitable bias.
a. Reanalysis data experiment
First, the assimilation system is evaluated at a single point. Figure 3 shows comparisons of the truth, CTL, and Assim for the temporal variations of six atmospheric variables in a grid including Tokyo. TCC is completely different from the truth without the assimilation, but is greatly improved by the assimilation (Fig. 3a). The results of the CTL reveal little variability in surface air pressure, specific humidity, and wind. In contrast, the variabilities increase in Assim and become closer to the truth or display similar trends, even though the absolute values fluctuate significantly and are sometimes even more distant from the truth than those of CTL (Figs. 3b,d,e,f). While CTL underestimates and Assim overestimates precipitation, Assim is able to capture the events on 14 and 21 January (Fig. 3c).
The estimation of TCC is improved for all of Japan, as shown in Fig. 4. Figures 4a and 4c show the distribution of the 1-month-averaged 6-h root-mean-square error (RMSE) and the correlation coefficient R between the ensemble mean of Assim and the truth, respectively. The reduction in RMSE (Fig. 4b) and the increase in R (Fig. 4d) with respect to CTL are displayed. The estimation is most improved in the central part of Japan, but is also improved in the surrounding ocean. However, there are small variations in the accuracy of the TCC estimation at the northernmost (Nemuro) and southernmost (Kagoshima) stations, indicating that the spatial coverage of observation data is important. The TCC assimilation also affects regions where there are no stations, which is discussed later at the global scale.
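The two skill measures used throughout this section can be sketched for a single grid point as follows. The function names are hypothetical, and expressing the RMSE reduction as a percentage of the CTL value is an assumption about how the difference maps are normalized.

```python
import math


def rmse(estimate, truth):
    """Root-mean-square error between two equally long time series."""
    return math.sqrt(
        sum((e - t) ** 2 for e, t in zip(estimate, truth)) / len(truth)
    )


def correlation(estimate, truth):
    """Pearson correlation coefficient R between two time series."""
    n = len(truth)
    mean_e = sum(estimate) / n
    mean_t = sum(truth) / n
    cov = sum((e - mean_e) * (t - mean_t) for e, t in zip(estimate, truth))
    var_e = sum((e - mean_e) ** 2 for e in estimate)
    var_t = sum((t - mean_t) ** 2 for t in truth)
    return cov / math.sqrt(var_e * var_t)


def rmse_reduction_percent(assim, ctl, truth):
    """Positive values mean Assim is closer to the truth than CTL."""
    rmse_ctl = rmse(ctl, truth)
    return 100.0 * (rmse_ctl - rmse(assim, truth)) / rmse_ctl
```

RMSE penalizes differences in absolute value, whereas R measures only whether the variations are in phase, which is why the two measures can disagree for the same variable.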
A similar analysis for 2-m specific humidity is shown in Fig. 5. The specific humidity estimation is improved over the middle part of Japan and the southeast ocean (Figs. 5b,d). Although Fig. 5b shows that the estimations in other regions (i.e., the northern part of Japan and the westernmost region of the largest island of Japan) are slightly worse than CTL, R is positive for all of Japan (Fig. 5c); the RMSE is smaller in the northern part of Japan (Fig. 5a). The regions with improvements in specific humidity do not correspond to the regions with improvements in TCC. Specifically, the regions with improvements in specific humidity are located downstream of the stations, because northwesterly winds prevail over Japan during winter as a result of the high pressure systems located north of the Eurasian continent. Thus, clouds created by the assimilation will flow toward the southeast and modify humidity in the downstream areas.
Figure 6 shows the results of a similar evaluation for precipitation. The estimation of precipitation is improved in the south and worsened in the north (Figs. 6b,d). The region where precipitation is improved corresponds to the area in which specific humidity is improved. Thus, the improvement in the south may be explained in the same manner as for specific humidity. However, it is difficult to evaluate the assimilation effects on precipitation. Although cloud distribution and rainfall distribution are correlated, cloud type is not specified. Thus, the TCC assimilation has a large effect on precipitation; however, improvements are unstable. Figure 7 shows the results for 2-m air temperature. The spatial patterns of the changes in RMSE (Fig. 7b) and in R (Fig. 7d) are also similar to the results of 2-m specific humidity. This fact indicates that atmospheric dynamics have a larger effect on changes in temperature than the updates from assimilating TCC.
The impacts of the TCC assimilation are evaluated in the Northern Hemisphere. Figure 8 shows the distribution of the reduction in RMSE for four variables by assimilating cloud cover, with circles denoting statistical significance according to a two-tailed t test. Here, the RMSE calculated from each ensemble member is used to conduct the t test. Figure 8 shows grid points that exhibit a local significance level of α = 0.01 (shown in gray circles) and a significance level of αFDR = 0.1 after applying the false discovery rate (FDR) method (Wilks 2016) (shown in black circles). The area where TCC is improved (the RMSE decreases by 2% or more; red colors in Fig. 8a) is 3.4 times larger than the area where it is worsened (the RMSE increases by 2% or more; blue colors in Fig. 8a). In particular, the TCC estimation is improved in southwest Japan. The air pressure distribution is statistically more influenced than any of the other three variables (Fig. 8b). This might be because the pattern of synoptic-scale pressure systems is changed by the assimilation. Although there appear to be differences in the RMSE of precipitation in some parts of the Northern Hemisphere, our test (Wilks 2016) did not judge the results to be statistically significant. Specific humidity is statistically improved over Japan and almost no grid points are statistically worsened (Fig. 8d). Improvements in precipitation and specific humidity show similar trends such as in the south of Japan and around Florida and Cuba (Figs. 8c,d). These results are interesting from the perspective that the TCC assimilation in Japan affects the weather all over the world via teleconnection patterns associated with phenomena such as El Niño–Southern Oscillation and the Pacific–North American teleconnection pattern.
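The FDR criterion can be sketched as follows, assuming that the p values of the local t tests at all N grid points are already available. The function names are hypothetical. The procedure sorts the p values and finds the largest one that satisfies p(i) ≤ (i/N) αFDR; all grid points with p values at or below that threshold are declared significant.

```python
def fdr_threshold(p_values, alpha_fdr=0.1):
    """Largest sorted p value p_(i) with p_(i) <= (i / N) * alpha_fdr.

    Returns 0.0 if no p value satisfies the criterion.
    """
    n = len(p_values)
    threshold = 0.0
    for rank, p in enumerate(sorted(p_values), start=1):
        if p <= alpha_fdr * rank / n:
            threshold = p
    return threshold


def fdr_significant(p_values, alpha_fdr=0.1):
    """Flag each grid point whose p value passes the FDR threshold."""
    threshold = fdr_threshold(p_values, alpha_fdr)
    return [p <= threshold for p in p_values]
```

Because the threshold tightens as the number of tests grows, a field with many locally significant points at α = 0.01 can still contain few or no points that pass the FDR criterion, which is consistent with the precipitation result described above.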
b. Ground observation data experiment
The results of the experiment using JMA cloud data are described here. As a result of limited data availability, only grids including JMA stations are evaluated. Figure 9 shows the improvements of the TCC estimation (in terms of RMSE and R) over Japan. Following the previous experiment, the TCC estimation in this experiment is improved significantly with assimilation. This confirms the hypothesis that information of only three categories of weather states can improve the cloud estimation.
The results for air temperature are shown in Fig. 10. Although the accuracy of the temperature estimation in terms of the R value does not show any significant change, as shown in Fig. 10d, it is improved in terms of RMSE over all of Japan, as shown in Fig. 10b. This is because the R value for temperature is strongly governed by the diurnal cycle, and assimilating cloud amounts once a day has a small effect on the diurnal cycle. Thus, it is shown that the cloud constraints once a day have positive impacts on the RMSE of daily temperature.
Figure 11 shows the means of 1-month-averaged RMSE and R for all stations for daily TCC, as well as the 6-h surface air temperature, 6-h surface air pressure, and 6-h precipitation. As observed from the distribution maps over Japan (Figs. 9 and 10), improvements in the TCC can be seen in terms of both RMSE and R (Fig. 11a), while the RMSE of temperature is improved (Fig. 11b). For air pressure, both RMSE and R are increased (Fig. 11c). Since air pressure values estimated by the GSM and observed in JMA stations have significant differences both for Assim and CTL experiments (figure not shown), the impacts of the TCC constraints on air pressure are difficult to assess in this experiment. Precipitation is negatively affected by the TCC assimilation (Fig. 11d). This result also demonstrates the difficulty in characterizing precipitation solely with cloud amount information.
The negative impacts on surface air pressure and precipitation also suggest that the biases in the model and the observations should not be neglected. The ground observation has different biases than the model. The correlation coefficients of daily TCC between the JMA ground observation and the reanalysis vary from almost 0 (totally independent) to almost 1 (totally correspondent) (Fig. 12). Assimilating TCC at weakly correlated sites may not always improve the other relevant variables, such as air temperature, air pressure, or precipitation; it is possible that instead the assimilation causes a serious dynamical distortion, such that the overall performance of the assimilation might worsen. To fix this problem, both model biases and observational biases should be considered further in future studies. The usage of alternative covariance inflation and localization methods and parameters has the potential to reduce model biases. Some solutions for reducing observational biases from diary data are discussed in the next section.
There are several developments that could be made to further advance this study. First, observation datasets can be increased by adding data from diaries written outside of Japan, such as historical documents found in China (Ge et al. 2005) and in Europe (Bell and Ogilvie 1978; Jones et al. 2001; Jones and Mann 2004; Brázdil et al. 2005) by properly interpreting weather descriptions in terms of TCC in each country. As shown in this study, dense observation networks contribute to better estimates, but the effects of the TCC constraints in remote locations are still not clear. Thus, the impacts of using diary data from locations far from target regions should be assessed before adding the data. In addition, the time taken to assimilate clouds can be improved using information in diaries (e.g., sunny in the morning, cloudy at noon), which was not considered in the ground observation experiment where we used daily average TCC observations. It is also possible to assimilate clouds more than once a day, because weather was recorded subdaily in some diaries.
One advantage of this system is that we can add other variables to the assimilation, such as air temperature and precipitation. The challenge of including air temperature from non-instrumental records is the difficulty in knowing the absolute value of air temperature, because the only information available is subjective (i.e., related to an observer’s sensation). For instance, the descriptions such as cold, freezing, chilly, moderate, mild, warm, or hot were used to describe temperature in eighteenth-century diaries in New England (Baron 1982). To overcome this problem, investigators could assimilate only anomalies by assuming how large the deviations from the average have to be for people to start to feel hot or cold. It is also challenging to assimilate precipitation, although there are a variety of ways to describe rainfall events in Japanese. Precipitation amounts may be categorized by the descriptions in diaries as proposed for TCC, but observation errors and observer biases are likely to be more significant than those of TCC. This is partially the case because there is no specific and consistent upper limit for precipitation, and it is highly nonhomogeneous both in time and space. Please note that the index method is a possible solution when assimilating temperature and precipitation, but it is also not straightforward to apply, since an index does not map directly to an absolute value, and the method was originally designed to examine long-term trends. Another option is to assimilate dry days with the assumption of very small observation error. This would produce more accurate lengths of dry spells, which may improve estimates of other atmospheric variables.
Information obtained not only from diaries but also from natural proxy datasets can be assimilated into this system. Renssen et al. (2015) used a particle filter method to reconstruct the Younger Dryas climate by assimilating surface air temperature and SST obtained from proxy data. Moreover, isotope proxies can be directly assimilated into isotope-incorporated GCMs (Yoshimura 2015). However, it should be noted that paleoclimate reconstruction using data assimilation greatly depends on structural errors in the GCMs (Dee et al. 2016), as was noted in the experiment with JMA observation data.
We have found that many atmospheric variables improved with the TCC assimilation in the experiment with reanalysis data, but more progress is required in the ground observation data experiment. In addition to mitigating the biases of the GSM, a potential solution is to remove the subjective biases of different observers following the index methods. Statistics for the time series of TCC converted from the three weather states (clear, partially cloudy, and cloudy) can be compared between diaries and, within the same diary, between different observers who wrote in different periods.
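One simple form such a comparison could take is sketched below: the relative frequency of each of the three weather states is computed for two records, and the difference is used as a rough indicator of observer bias. The sample records and function names are hypothetical, and a frequency difference is only one of many possible statistics; it is not the bias correction applied in this study.

```python
from collections import Counter

STATES = ("clear", "partially cloudy", "cloudy")


def state_frequencies(records):
    """Relative frequency of each weather state in one diary record."""
    if not records:
        raise ValueError("records must not be empty")
    counts = Counter(records)
    return {state: counts[state] / len(records) for state in STATES}


def frequency_difference(records_a, records_b):
    """Per-state frequency difference (a minus b) between two records."""
    freq_a = state_frequencies(records_a)
    freq_b = state_frequencies(records_b)
    return {state: freq_a[state] - freq_b[state] for state in STATES}


# Hypothetical records from two observers covering the same four days.
diary_a = ["clear", "clear", "cloudy", "partially cloudy"]
diary_b = ["clear", "cloudy", "cloudy", "cloudy"]

print(frequency_difference(diary_a, diary_b))
# {'clear': 0.25, 'partially cloudy': 0.25, 'cloudy': -0.5}
```

A persistent difference of this kind between nearby diaries, or between successive observers in one diary, would suggest a subjective bias rather than a real difference in weather.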
Given that diaries have existed for long periods, long-term simulations should also be conducted. Although the atmospheric variables did not show great improvements in the 6-h time steps in the experiment with ground observation data, we did not examine the long-term effects. Because cloud cover affects radiative forcing, it would be interesting to investigate whether daily cloud constraints from diaries can reconstruct temperature and other variables over long periods. The system developed in this study is capable of investigating this problem.
5. Summary and conclusions
Weather descriptions in written records can be a powerful tool for reconstructing historical climate during the period when instrumental records of meteorological variables were uncommon. A large number of such records have been found in Japan, and previous studies have reconstructed past climates from historical documents by counting specific meteorological phenomena and categorizing descriptive data. However, these studies can obtain only limited atmospheric information, and they cannot reliably reconstruct climate in regions where no historical documents exist.
We have proposed an alternative approach that utilizes historical documents by combining a climate model and a data assimilation technique. We have developed a system to assimilate TCC into the GSM using the LETKF, with the assumption that the three unique weather descriptions in typical Japanese diaries can be converted into TCC measurements. Two experiments using non-diary data were performed in this study to evaluate the impacts of the TCC constraints on the atmospheric fields. The first experiment used NCEP–DOE reanalysis data, which are assumed to have few errors, while the second experiment used JMA ground observation data, which include human-sourced errors. In both cases, 18 observation stations in Japan were assumed based on the number of diaries available during the period from the 1750s to the 1870s.
The reanalysis data experiment produced positive results overall. Assimilating TCC with built-in additional error improved estimations of not only TCC but also other surface atmospheric variables such as air pressure, specific humidity, precipitation, and zonal wind speed. We showed that a dense observation network has a significant impact on TCC, with humidity and precipitation positively affected, especially downstream of the observation sites. However, the results of the ground observation data experiment were not sufficient for the system to be applied in practice. When the results were compared with JMA data, only TCC and air temperature showed improvement. Problems with reconstructing air pressure and precipitation remain.
Several improvements are necessary for this system to be applied to actual historical weather reconstruction efforts. Some potential solutions are to attempt to correct the biases in the model and observations, add information from diaries written outside of Japan, assimilate information at different times for each diary, and add diary information for other variables, such as air temperature and precipitation. It is also possible to assimilate natural proxy data that were not obtained from historical documents, which has been attempted in several studies. By applying these treatments, this system could be a promising tool for reconstructing historical weather from old diaries. Furthermore, this system shows great potential for climate reconstruction over long periods, as long as historical documents are available.
This article includes studies conducted under the Research Institute for Humanity and Nature (RIHN) project and the Program for Risk Information on Climate Change (SOUSEI) and Arctic Challenge for Sustainability (ArCS) of the Ministry of Education, Culture, Sports, Science and Technology in Japan (MEXT); the Core Research for Evolutional Science and Technology (CREST) program of the Japan Science and Technology Agency (JST); and the Japan Society for the Promotion of Science (JSPS) Grants 26289160 and 15K13566. PN was supported by a MEXT scholarship during the study.
Current affiliation: Department of Civil and Environmental Engineering, University of California, Davis, Davis, California.