Quantifying Daytime Heating Biases in Marine Air Temperature Observations from Ships

: Marine air temperatures recorded on ships during the daytime are known to be biased warm on average due to energy storage by the superstructure of the vessels. This makes unadjusted daytime observations unsuitable for many applications including for the monitoring of long-term temperature change over the oceans. In this paper a physics-based approach is used to estimate this heating bias in ship observations from ICOADS. Under this approach, empirically determined coef ﬁ cients represent the energy transfer terms of a heat budget model that quanti ﬁ es the heating bias and is applied as a function of cloud cover and the relative wind speed over individual ships. The coef ﬁ cients for each ship are derived from the anomalous diurnal heating relative to nighttime air temperature. Model coef ﬁ cients, cloud cover, and relative wind speed are then used to estimate the heating bias ship by ship and generate nighttime-equivalent time series. A variety of methodological approaches were tested. Application of this method enables the inclusion of some daytime observations in climate records based on marine air temperatures, allowing an earlier start date and giving an increase in spatial coverage compared to existing records that exclude daytime observations. SIGNIFICANCE STATEMENT: Currently, the longest available record of air temperature over the oceans starts in 1880. We present an approach that enables observations of air temperatures over the oceans to be used in the creation of long-term climate records that are presently excluded. We do this by estimating the biases inherent in daytime temperature reports from ships, and adjust for these biases by implementing a numerical heat-budget model. The adjustment can be applied to the variety of ship types present in observational archives. The resulting adjusted temperatures can be used to create a more spatially complete record over the oceans, that extends further back in time, potentially into the late eighteenth century.


Background and motivation
Marine air temperature (MAT) observations from ships form a long-term climate record used to construct gridded data products as either the principal data source (Berry andKent 2009, 2011;Kent et al. 2013;Cornes et al. 2020;Junod and Christy 2020) or for bias adjustment of sea surface temperature (SST) products (Huang et al. 2017;Kennedy et al. 2019).These gridded products only use MAT observed during nighttime (NMAT) to exclude data affected by solar heating of the instrument and local ship environment during daytime (DMAT).Using only NMAT approximately halves the number of available observations and limits the temporal extent of any MAT-based dataset as early observations were often only recorded during the daytime (Fig. 1a).For example, two recently published NMAT datasets begin in 1880 (CLASSnmat; Cornes et al. 2020) and 1900 (UAHNMAT; Junod and Christy 2020).Extending the MAT record further back in time requires bias adjustment of DMAT, and if this adjustment can be determined accurately the sampling and coverage of MAT will be improved throughout the record.
Global mean surface temperature (GMST) anomaly datasets, combining observations over land, ice, and ocean, have used SST in lieu of MAT for their ocean component (Lenssen et al. 2019;Morice et al. 2021;Huang et al. 2020), including in the sixth Intergovernmental Panel on Climate Change Assessment Report (Gulev et al. 2021).GMST is used instead of global surface air temperature (GSAT) for three main reasons: there are more (all-hours) SST observations than NMAT; quantification of SST measurement bias and uncertainties is more mature than for MAT (Kennedy et al. 2019); and the belief that SST anomalies are more reliable than MAT at large spatial scales (Kent and Kennedy 2021).It was also asserted that large-scale anomalies of SST and MAT display similar variability and trends (Huang et al. 2017), although this is increasingly being questioned (e.g., Cowtan et al. 2015;Richardson et al. 2016;Rubino et al. 2020).Here we demonstrate a method to estimate the daytime heating biases in MAT observations on a ship-by-ship basis that can be applied throughout the observed record.The ultimate goal is to use these adjusted data to create a GSAT record based on air temperature over land, ice, and ocean.This will facilitate comparison of the observed surface temperature record with the output of climate models (Jones 2020), which most straightforwardly provide estimates of GSAT rather than GMST.

Methods
a.The Berry et al. (2004) model Berry et al. (2004, hereafter BKT) developed a model to quantify heating-related biases in MAT, accounting for the energy accumulation and release by the superstructure of ships.The BKT model was developed and tested using temperature values recorded on board the Ocean Weather Ship Cumulus during 1988 and later used to examine exposure-related bias on 17 ships extracted from the VOSClim database (Berry and Kent 2005).In the construction of the NOC Surface Flux and Marine Meteorological Dataset (Berry andKent 2009, 2011) the BKT model was used to adjust the MAT observations obtained from the International Comprehensive Ocean-Atmosphere Dataset (ICOADS; Freeman et al. 2017) for the period 1973-2014.However, in order to simplify the calculations in that analysis, a fixed annual set of coefficients was applied across all ships.Here we develop coefficients ship by ship to give an adjustment for heating bias that reflects the characteristics of a particular ship.
We define several measures of temperature in Eqs. ( 1)-( 3), which are illustrated in Fig. 2: T air is the true air temperature; T ship is the measured air temperature; T nt is the background FIG. 1. Sampling characteristics of MAT observations from ship reports in ICOADS (Freeman et al. 2017).(a) The 1784-2020 percentage of MAT observations recorded annually during daytime (DMAT, black line, left-hand axis), the red dashed line indicates 50% daytime observations.The solid blue line (right-hand axis) is the annual total number of MAT observations, and the dotted blue line is the number of MAT observations associated with a ship track of 12 or more reports, with diurnal sampling, and including only observations with associated cloud and relative wind speed (V) observations.Free text comments indicate the annual average number of MAT observations for select periods (or the total amount for 1784-1853).(b) Stacked plot of the percent of MAT observations with a corresponding cloud and/or V value.The red area indicates reports with both cloud and V; the blue area indicates reports with neither.Reports with either cloud or V, but not both, are indicated in green and yellow, respectively.The dotted line indicates MAT with V but without cloud when the green color overrides.When yellow is visible, lack of cloud information is the major constraint on applying the heating bias model introduced in section 2a; when green is visible, lack of V is the constraint.(c) Stacked plot of the percentage of MAT observations with an associated present weather code (WW; green) and with WW code indicating precipitation (red).The dashed line shows the percentage of extant WW indicating precipitation.nighttime air temperature (see section 2b); DT err is the change in measured temperature due to ship heating; DT diur is an estimate of DT err ; DT BKT is the estimate of the temperature difference from the BKT model; and T adj is the measured air temperature adjusted using the BKT model: T adj 5 T ship 2 DT BKT : (3) The BKT model relates T air and T ship (both measured in Kelvin) to the heat exchange, Eq. ( 4): In this equation m is the mass (kg) and c the specific heat capacity (J kg 21 K 21 ) of the sensor environment (that part of the ship that affects the measurement), t is time (s), Q SW is the shortwave irradiance [Eq.( 5)], and Q Conv and Q Cond are the rates of heat transfer between the ship and the atmosphere through convection and conduction [all in W m 22 , Eq. ( 6)]: Here, a s is the solar absorptivity of the sensor environment, A s is the surface area normal to the direction of the incoming direct solar radiation (m 2 ), R top is the solar radiation at the top of the atmosphere (we use 1368 W m 22 ), u is the solar elevation, a i and b i are cloud-cover-dependent coefficients (index i indicates categories of total cloud cover quantities by oktas; from Dobson and Smith 1988), h m and h o are the convective and conductive heat transfer coefficients (W m 22 K 21 ), and A c is the surface area of the sensor environment (m 2 ).Following Berry and Kent (2005) we exclude the small thermal energy transfer component (Q LW ), shown by BKT to account for a maximum ;3% of the estimated heating bias.Assuming d(T air )/dt ' 0, Eq. ( 4) becomes Substituting the coefficients given in Table 1 into Eq.( 7) gives where V is relative wind speed (m s 21 ) and the empirical coefficients x 1,3,4,5 represent terms of the energy budget model (Table 2).We have redefined the coefficients x 3 and x 5 to incorporate and exclude x 2 (used in the original BKT definition), so cooling depends on (x 3 V x 4 1 x 5 ) and heating on x 1 .Expansion of the sinu terms in Eq. ( 8) and further substitutions (Table 2) gives where f is hour angle in radians.The solution to Eq. ( 9), gives the value of the heating error at any time during daylight hours (for a full description of the solution, see BKT): where a is 2p/12.The integrating factor k int can be determined assuming the sensor environment is in equilibrium at sunrise (dDT diur /dt ; 0) such that Parameter Substitution where f sr is the hour angle at the time of sunrise.At night the heating error is DT err;night 5 (T err;ss ) exp(2h 1 t ss ), where t ss is the time elapsed since sunset.Using Eq. (10) (daytime) or Eq. ( 12) (nighttime) DT BKT at any location and time can be calculated using the coefficients x 1,3,4,5 along with cloud cover and V.
b. Estimation of the temperature error due to ship heating Both the true diurnal variation of T air and the heating error are poorly known.BKT estimated the heating error (DT err ) in two ways: as the MAT anomaly from the local midnight to sunrise mean and as the T ship 2 SST difference.The former is likely to overestimate DT err as it incorporates the true diurnal cycle of T air , while the latter is likely to be an underestimate.
Given the difficulty of making an adjustment that accounts for the real diurnal cycle of T air , a pragmatic approach was taken to estimate DT diur [Eq.( 2)], and hence the adjustment [DT BKT , from Eqs. ( 10) and ( 12)] relative to an estimated background nighttime temperature (T nt ).First, an estimate of the expected diurnal SST anomaly associated with every value of T ship was calculated as a function of cloud cover and wind speed following Morak-Bozzo et al. (2016) and subtracted from T ship .Nighttime values were then calculated using the normal definition of 1 h after sunset to 1 h after sunrise (Bottomley et al. 1990).These nighttime averages were assigned to the time of each sunrise and were linearly interpolated over the 24-h period for each ship (T nt ).This approach allows the construction of climate records from a combination of adjusted all-hours MAT with unadjusted NMAT.

c. Solar parameterization
The BKT model uses cloud-cover-dependent coefficients to estimate solar radiation based on location, date, and time.BKT used coefficients from the Dobson and Smith (1988) okta model, which were derived from a limited geographical region.Using the same okta model as Dobson and Smith (1988), we generate a set of updated coefficients.To do this, we used data from the Surface Solar Radiation dataset-Heliosat version 2.1 (Pfeifroth et al. 2019), which covers most of the Atlantic (658S-658N, 658W-658E).We collocate the 30-min sampling interval of satellite instantaneous incoming solar radiation values with ICOADS cloud observations for the period 1983-2017 and use this information to generate updated okta model coefficients (https://git.noc.ac.uk/glosat_tc/okta_model). The resulting coefficients produce a less peaked solar cycle than the original Dobson and Smith coefficients and reduce the overall RMSE of estimated to satellite incoming solar radiation by ;10% for data not included in the fit.The a i and b i terms [Eq.( 5)] become a i,lat and b i,lat as specific coefficients are available for 108 latitudinal bands.Presently the BKT model implementation requires the solar parameterization to be in the same form as the okta model, precluding the use of, for example, the parameterization of Aleksandrova et al. (2007).
Other than adjusting nine oktas to eight when the ICOADS present weather code indicates precipitation (Aleksandrova et al. 2018), we do not make any adjustments to the ICOADS cloud record.Considering long-term temporal trends, biases likely remain due to heterogeneous recording practices and conversions across the diversity of ICOADS source data.For example, cloud observations pre-1949 (when cloud recording changed from tenths to oktas) may be biased low due to being double adjusted if the original observation was in oktas (Gulev and Aleksandrova 2020).

d. Optimization
The optimization selects values for the x coefficients that minimize the difference between DT diur and DT BKT using Eqs.( 10) and ( 12), and using several different cost functions.Coefficients are derived for selected individual ships, but could also be applied across a group of ships thought to have similar DT diur characteristics.
The solution uses the L-BFGS-B (Byrd et al. 1995) solver in R (an option in the optim function; R Core Team 2019) with lower and upper coefficient limits from Table 1.We minimize six different cost functions to evaluate the BKT model solutions.Each cost function tests different aspects of the goodness of fit and the spread across the cost functions is wider giving more realistic estimates of fit uncertainty: 1) The residual root-mean-square error (RMSE), and RMSE V 5 are designed to greater weight the importance of minimizing the (DT diur 2 DT BKT ) residual through the day and across values of V. 5) RMSE DW 5 (1 2 l)RMSE 1 l(|DW 2 2|), where DW is the Durbin-Watson statistic and l is a scaling factor that we set to 0.3.RMSE DW , is used to down-weight solutions where the residual displays autocorrelation.6) RMSE KS 5 (1 2 l)RMSE 1 l(KS), where KS is the Kolmogorov-Smirnov statistic.This cost function gives greater weight to solutions where the cumulative sums of daytime values of DT diur and DT BKT are small.
An ensemble of these cost functions is used to test different aspects of the structure of the residual (DT diur 2 DT BKT ) to ensure a reasonable fit throughout the day and across all cloud-cover and relative wind speed combinations.Avoiding unphysical starting coefficient combinations improves efficiency and helps to avoid local minima so we use a pool of ;350 precalculated sets of starting coefficients to initialize the fit.For each ship, we randomly select 10 sets of starting coefficients and 5 subsets of 70% of available days.This gives an ensemble of 300 sets of coefficients (10 starting values, 5 data subsets, and 6 cost functions), and any convergence failures are rerun until there are 50 sets of coefficients per cost function.Unless otherwise stated, hereafter the DT BKT value is the ensemble mean taken from 60 realizations of the DT BKT using the 10 best-fit time series from each of the 6 cost functions.

a. Fitting to individual ships
To illustrate the application of the BKT model, we show results from 16 ships covering different time periods, sampling frequencies, and original input sources (Table 3).The data for these ships were obtained from the ICOADS (Freeman et al. 2017) archive: release 3.0.0 up to 2014 and release 3.0.1 thereafter.Quality checking has been applied to the data prior to model fitting (appendix).The reports from these 16 ships contain all of the variables required to fit the adjustment model, and all have reported data over at least 150 days.Collectively, these ships provide a global sample of data between 608S and 608N, with 64% of observations in the tropics (308N-308S), 26% in the Northern Hemisphere, and 10% in the Southern Hemisphere.Longitudinally, there are 31% of observations in the Atlantic Ocean, 29% in the Pacific Ocean, 18% in the Indian Ocean, 17% in the South China Sea and adjoining gulfs/seas, with 3% in the Mediterranean Sea and remainder (2%) of observations from minor ocean basins.
Figure 2 shows the diurnal adjustment for the ship Raphael during October 1884.Figure 2a shows T ship , T nt , and T adj .Figure 2b shows the estimates of DT diur and DT BKT .
Figure 3 shows the mean DT diur , DT BKT , and residuals (DT diur 2 DT BKT ) using the best-fit set of coefficients for each cost function (i.e., six lines) for the ship Mary (Figs. 3a-d), split across local hour of the day, cloud cover, 2 m s 21 intervals of V, and 108 latitude bins.Following BKT we use a target accuracy of 60.28C.Figure 3 shows that across the input parameters of the BKT model (time/position, cloud cover, and V), the heating bias is removed, with bin-mean residuals that are generally within 60.28C, and the bin-mean local-hour average residuals are always within the 60.28C target.However, for this ship the BKT model appears to underadjust for clear skies (0 okta) and a V of 22-24 m s 21 , although these bins are poorly sampled (14 observations for 0 oktas and 45 and 16 observations for the 22 and 24 m s 21 bins, respectively).
Figures 4a-c display the DT diur , DT BKT , and residuals (DT diur 2 DT BKT ) across all 16 ships as a function of the number of hours since sunrise.DT diur can be ,08C, as MAT values close to sunrise will be cooler than the nighttime mean MAT.DT BKT is always above 08C, and this is reflected in the negative residuals for hours 0-1 and $18.Aside from the 28-34, 46, and 50 m s 21 wind speed bins and 608N latitude bin, the bin-mean residuals (Figs.4c-f) are all within 60.28C.The pattern of a relative DT diur 2 DT BKT underadjustment for hours 3-5 and 9-14 (Fig. 4c) appears consistent regardless of whether a single cost function is used or a different sample of ships is selected (not shown).Possible causes are inaccurate estimates of solar radiation [Eq.( 5)] or systematic errors in our estimate of DT diur .
The mean overall DT diur 2 DT BKT residual for each individual ship is always within 60.28C, with 11 out of 16 ships within 60.058C.The largest residual (0.148C) is found for the U.S. Navy 12388 ship.The WWII period is one of the more difficult periods to apply the BKT model correction, due to the limited number of observations from which determine T nt , as well as the occurrence of and a warm bias in nighttime observations over 1942-46 (Cornes et al. 2020).To determine the relative improvement of MAT data after applying the BKT model adjustment, DT diur and DT diur 2 DT BKT should be compared.First, it is clear that the spread of DT diur values (Fig. 4a) is greater than the spread for both DT BKT (Fig. 4b) and DT diur 2 DT BKT (Fig. 4c), as expected.The RMSE reduction (DT diur cf.DT diur 2 DT BKT ) ranges from 15% (U.S. Navy 12388, 1.538-1.358C)to 53% (Kajtum, 2.458-1.138C),with a mean of 28% across all 16 ships.The RMSE reduction significantly correlates (r 5 0.92) with the magnitude of DT diur .3 are included and the DT BKT is taken as the ensemble mean across 60 realizations of the DT BKT (the 10 best-fit realizations from each of the 6 cost functions described in section 2d).The horizontal solid red and dark-red dashed lines indicate zero and 60.28C limits, respectively.The box widths correspond to the square root of the sample size in each bin.
It is not expected that DT BKT will exactly match DT diur .Residuals will include the effects of any model misfit, errors in V, cloud cover, or the parameterization of solar radiation and other nonsystematic differences such as weather effects.The magnitude and variability of the residuals, and the percentage changes, will depend on the relative sizes of the adjustment required and these other factors.
Figure 5 illustrates values of DT BKT under fixed environmental conditions and for selected latitudes for each ship, using the 60 ensemble member BKT model coefficients for each ship.Under these conditions, DT BKT in terms of amplitude and timing is similar for some ships (e.g., the pairing of the USS Merrimac and Kanagawa Maru), and different for others.To adjust the Kajtum using the coefficients generated for the Chosen Maru would leave the Kajtum still retaining a large MAT diurnal cycle, whereas the inverse operation would generate a physically unrealistic diurnal cycle for the Chosen Maru.Uncertainties across the ships are largest around the peak heating hours, and the uncertainty range across different ships will relate to the magnitude of the DT diur and the environmental conditions, which will depend on the region in which the ship was operating.Figure 5 illustrates the importance of obtaining a BKT model solution for individual ships, but also suggests that coefficients can be estimated for groups of similar ships (see sections 3c and 3e).
If a ship contains observations where it was not possible to determine T nt , but there are sufficient T nt observations for that ship to fit the BKT model, then every observation with a corresponding cloud and V can be adjusted since sets of BKT model coefficients can be determined.

b. Estimating missing cloud and V values
Depending on the observation source, MAT will not always be accompanied by cloud and wind observations.Figure 1 shows the proportion of potentially adjustable MAT observations using the BKT model has been decreasing since a sustained peak in the 1980s, likely due to increasing contributions from automatic weather stations in ICOADS in the modern era (Freeman et al. 2017).
As a means to examine the impact of infilling data on the BKT model adjustment (explored in section 3d), we generate the empirical histogram of clouds on a 18 spatial grid at monthly resolution (using ICOADS data from 1961 to 1990).We can then sample cloud cover values from this climatological histogram to generate ensembles of cloud cover estimates for MAT with missing cloud cover, which will vary across a 18 grid and month.Similarly for V, we sample wind speed (ws) values from the Rayleigh distribution, with the scale parameter set as ws/ p √ , and direction sampled from the climatological distribution of the 16-point compass direction.To generate V, we further add a random directional component from the uniform distribution (622.498) to the coarse-resolution wind direction and then use the observed ship speed and value to recalculate a sampled V. c. Bulk application of the BKT model using "stock" coefficient combinations The optimization of model coefficients is computationally intensive and impractical for application to every ship in ICOADS.To avoid this the optimization was applied to over 10 000 ships in ICOADS during the period 1854-2020, generating a collection of "stock" coefficients (without using infilled cloud and V).Many of these coefficient combinations were similar, so we reduced the number of stock coefficients.We first reduced the number of the coefficients by removing duplicate values across the four x coefficients when rounding to two significant figures.We then calculated the hourly BKT adjustment value for a selection of spatial locations, environmental conditions, and days, and removed coefficient combinations resulting in the same hourly rounded (0.18C) values of the DT BKT throughout the day.This results in a stock coefficient dataset of 2500 coefficients, suitable for adjustment of data from widely differing ships.The 2500 different possible DT BKT values can be calculated and the coefficients selected using the same set of cost functions used for optimization.DT BKT values can be determined following the same approach in section 2d allowing efficient adjustment of large datasets.

d. BKT adjustment and uncertainty using "stock" coefficients and climatological infilling
The impact of infilling missing cloud and V values (section 3b) and fitting the model using a pool of stock coefficients in lieu of running the optimization (section 3c) is shown in Fig. 6. Figure 6a shows that the mean uncertainty value (defined as one standard deviation of the 60-member DT diur 2 DT BKT ensemble spread) is at a minimum when using raw observation data and fitting via optimization, with largest uncertainty values during the peak heating hours of 6-12 h after sunrise.The uncertainty increases slightly when using raw observation data and the stock coefficients (black line with crosses), and further increases when infilling V and cloud cover (green and magenta lines).The greatest increase in uncertainty comes from replacing observation data with climatological infilling of both V and cloud.Using either optimized (section 2b) or stock coefficients (section 3c) when infilling both variables makes little difference (both blue lines).The greater increase in uncertainty when infilling cloud only (magenta line) as opposed to V only (green line) is logical in the context of the BKT model [Eq.( 10)] as the okta value scales the incoming solar radiation, and that sets the initial magnitude of DT BKT .This pattern typically holds true when assessing the uncertainty change against bins of cloud cover, V, and latitude, though some bin values  e)-(h) the bin mean of the 60-member standard deviation of the DT diur 2 DT BKT residual.Six different approaches to defining the DT BKT were used: the "normal" approach using raw observational data (black line with circles), using infilled cloud (magenta line), V (green line), both cloud and V (blue line with circles) alongside fitting the DT BKT using "stock" coefficient combinations for raw observational data (black line with crosses) and infilled cloud and V (blue line with crosses).
differ.Climatological infilling of both parameters typically doubles the uncertainty compared to using raw data and optimizing (Fig. 6).The relatively minor increase in uncertainty when using stock coefficients and raw data gives us confidence in the en masse application of the BKT model using this approach.

e. Application to pre-1854 ships
If it is not possible to generate any T nt values for a ship, DT diur cannot be estimated [Eq. (2)] and the BKT model cannot be fit using the methodology we outline in section 2d.A DT BKT value can be determined using stock coefficients, but the chosen sets of coefficients have to be determined via analog, based an expected DT diur profile for the particular ship.
Before ca.1854 there are increasingly fewer NMAT observations (Fig. 1a), and ships that do sample the diurnal cycle are unlikely to have cloud and V observations available (Fig. 1b).
Stock coefficients enable an estimation of the DT BKT to be made without a DT diur target.As the accuracy of the adjustment cannot be directly assessed this way, the quality of the adjustment will be based on the efficacy of the grouping of the ships.For example, we can expect that most ships pre-1854 are wooden-hulled sailing ships, with nonstandard observing practices (i.e., differences between countries and individual ships).It would therefore be desirable to obtain a set of coefficients that have been successfully applied to analog ships during the following years.Good analogs are difficult to derive during the early 1850s as the global shipping fleet transitioned from sail to steam.However, the 1853 Brussels Marine Conference (Maury 1853) led to an increased standardization of measuring practices and hence the metadata in ICOADS/ digitized records could enable selection of ships in the decades following 1854 that are the most appropriate counterparts to the assumptions listed above.
We trial two attempts to adjust the pre-1854 data.First, data from all ships between 1854 and 1870 are used as the pre-1854 analog period.Each stock coefficient combination is given an identification number, and the number of times each set of stock coefficient occurs within the 60-member ensemble for a ship in the 1854-70 period occurs is counted.From this, a break in the most frequently occurring coefficients was identified at n 5 83, which generated an ensemble of 83 different realizations from the stock coefficients that are then applied to the pre-1854 ships.The mean of the DT BKT is determined from these 83 sets of coefficients and the standard deviation of the DT BKT values becomes the uncertainty range.Second, for pre-1854 ships with over 50 DT diur observations, we generate cloud and V values as in section 3b, which enables the BKT model to be fit as in section 3c, resulting in a 59 member ensemble size.
Figure 7a presents a density plot of the ratio between the heating and cooling terms of the BKT model.This allows a broad approximation of the exposure and heating bias of a ship.The 1854-70 ensemble is characterized by most ships' ratio being below ;0.002, which after Fig. 4 in Berry and Kent (2005), is an appropriate range for good ships with a low heating bias.The pre-1854 ensemble distribution is more uniformly spread, indicative of more ships with larger heating biases.This is reflected in Fig. 7b, using the same fixed environmental conditions as Fig. 5; the DT BKT is shown to be larger (and less certain) for the pre-1854 ensemble.The impact on a MAT time series using either ensemble is shown in Fig. 7c, alongside the DT diur .For the HMS Favorite during December 1831, it is clear that the pre-1854 ensemble (Fig. 7c) captures the evolution of the DT diur more appropriately than the 1854-70 ensemble (Fig. 7d) as the DT diur often falls out of the uncertainty range for the latter.
En masse application of the pre-1854 ensemble of coefficients to the pre-1854 data would result in larger values of DT BKT as opposed to using the 1854-70 ensemble.The purpose of the comparison here is not to identify the better overall choice, but to highlight that it is possible to achieve sensible heating bias adjustments to the early data.Rather than a broadscale adjustment, specific BKT model coefficient groupings could be made for different ICOADS decks or source IDs, and as for newly digitized data as they become available.Utilizing this analog approach to the heating bias adjustment is not limited to pre-1854, and could be used throughout the full ICOADS period.

a. General application of the BKT model
In this paper we have extended the method developed by BKT for the correction of diurnal heating biases in ship-based air temperature measurements.From our estimate of this heating bias, we are able to generate MAT time series for individual ships, T adj (Fig. 2), that substantially reduces the DT err , leaving a mean residual within 60.28C (Fig. 4).Results focus on a sample of 16 ships, but the approach is applicable to all ship-based observations in ICOADS and ultimately will be used in the construction of improved estimates of global surface air temperature trends.
Our DT diur estimate [Eq.( 2)], based on the difference between MAT and the underlying NMAT trend, minus the climatological SST cycle from buoys as defined in Morak-Bozzo et al. (2016), is likely an overestimate of the true heating bias.The heating bias is difficult to disentangle from the true diurnal cycle as both depend on the incoming solar radiation.
Application of the BKT model requires observations to be part of a ship-track time series, either through an extant identifier or after application of a tracking methodology (Carella et al. 2017), to enable DT diur to be calculated.This can be ameliorated by improved tracking methods or ensuring ship identify information is preserved in metadata records as they are stored/digitized.
For ships that lack accompanying cloud and V it is possible to estimate the MAT daytime bias (Fig. 6) by infilling these variables.The uncertainty in DT BKT inflates to account for the infilling.
It is possible to achieve a removal of the daytime heating bias for ships without sampling across the diurnal cycle; this is required for temporal extension of the MAT record further back in time than ca.1854.For example, the English East Indian Company ships (ICOADS Deck 248) mostly report a single daily observation at local noon, which makes determining a nighttime value and DT diur estimate impossible.However, in this paper we have demonstrated that if a sufficient number of analog ships can be identified, which are able to be adjusted, the most commonly occurring BKT model coefficients used in the adjustment of these ships can be used to generate an ensemble of DT BKT for these older ships (Fig. 7), enabling a backward temporal extension of the MAT record.
Here, an outline for choosing analog ships was made, but this can be refined in the future as data from newly recovered sources are digitized, and/or metadata tied to existing observations are utilized.

b. Data issues and quality control
Relatively strict quality control procedures have been applied (appendix) to ensure the analysis uses data that accurately portray the measured diurnal cycle.
The diurnal-cycle-based quality control routine (appendix) identified data from a number of ships in the 1880s that passed the climatology-based QC checks but that had a ;12-h offset.Without removal or adjustment these data would adversely affect NMAT datasets.Furthermore we were able to identify ships suspected of making measurements in cabins, by analysis of the peak hour of DT diur (appendix).Overall, this shows that there is still much to be learned about MAT observations and diurnal-cycle-based assessments are likely to remain a useful tool in improving the long-term records (Cornes et al. 2020;Chan and Huybers 2021).A further unresolved issue is whether some reported ICOADS wind and direction values are the true wind and direction, or relative values uncorrected for ship trajectory (Gulev 1999).

c. Precipitation and weather codes
The presence of precipitation invalidates the energy transfer assumptions of the BKT model.When the recording of the present weather (WW) code is systematically high (.95% during the 1960-70s, green color in Fig. 1c), the percent of WW observations indicating precipitation is ;10% (red color in Fig. 1c).As the WW code is not always recorded with every MAT observation, it may not be possible to identify all observations that may have been affected by precipitation.Further work is required to better identify affected observations and to understand the impact of precipitation on the heating bias.

d. Systematic structure in diurnal residuals
The approach outlined here, across the 16 analyzed ships, reduces the mean hourly local time error in all-hours observations (DT err ' DT diur ) to within 60.28C (DT diur 2 DT BKT ).However, a systematic diurnal structure remains in the residuals of the BKT model adjustment (Fig. 4c).Further reduction in these residuals is likely to require an improved analysis method.Examples of possible improvements might be better estimates of incoming solar radiation, potentially including a diffuse term; reinstatement of the original x 6 thermal transfer term (which would add dewpoint temperature as a data requirement in applying the BKT model); or explicitly estimating, or fitting the true diurnal cycle of MAT.
e.The need for more complete data and metadata The value in the recovery and digitization of MAT data, in terms of the marine contribution to extending the global temperature record, cannot be overstated.While much work has been done in extracting historical observations from available archives, e.g., García-Herrera et al. (2005), extra value can be prescribed to MAT observations that cover the full diurnal cycle and have concomitant cloud and wind speed observations, particularly for pre-1854.Cloud cover is an essential climate variable, but the advent of automated measurements has resulted in major drop in ship-based cloud observations in ICOADS since the peak in the 1980s (Kent et al. 2019).
FIG.3.The mean DT diur (solid blue line), DT BKT (dashed lines), and residual (dotted lines) for the ship Mary (Table3) grouped by (a) local hour (every 2 h), (b) cloud cover (one okta intervals), (c) 2 m s 21 intervals of V, and (d) 108 latitude bins.Individual dashed and dotted lines represent the best-fitting DT BKT from each of the six cost functions described in section 2d.The horizontal red lines indicate 08 and 60.28C limits.Each bin contains at least 100 observations, except for 0 and 7 oktas, 18 and .20 m s 21 , and 408 latitude.

FIG. 4 .
FIG. 4. Boxplots displaying the bin mean (solid line), bin mean 61 standard deviation (box limits), and 5th and 95th percentiles (whiskers) for (a) DT diur and (b) DT BKT as grouped by the number of hours after sunrise.(c)-(f) The DT diur 2 DT BKT residual when grouped by (c) the number of hours after sunrise, (d) cloud cover, (e) 2 m s 21 intervals of V, and (f) 108 latitude bins.All 16 ships from Table3are included and the DT BKT is taken as the ensemble mean across 60 realizations of the DT BKT (the 10 best-fit realizations from each of the 6 cost functions described in section 2d).The horizontal solid red and dark-red dashed lines indicate zero and 60.28C limits, respectively.The box widths correspond to the square root of the sample size in each bin.
FIG.5.The mean (solid line), standard deviation range (darker shading), and 5th-95th-percentile range (lighter shading) for DT BKT for the 16 different ships (Table3) under fixed environmental conditions of 15 m s 21 V, four oktas cloud cover, 2208 longitude, Julian day 150, and variable latitudes 258N (red), 508N (green), and 658N (blue).The vertical line is at 1300 local time and the number in the upper left of each panel indicates the peak heating hour at 258N.
FIG 6. (a)-(d)The bin-mean DT diur 2 DT BKT residual and (e)-(h) the bin mean of the 60-member standard deviation of the DT diur 2 DT BKT residual.Six different approaches to defining the DT BKT were used: the "normal" approach using raw observational data (black line with circles), using infilled cloud (magenta line), V (green line), both cloud and V (blue line with circles) alongside fitting the DT BKT using "stock" coefficient combinations for raw observational data (black line with crosses) and infilled cloud and V (blue line with crosses).
FIG. 7. (a)  Density plot of the ratio between the BKT model heating and cooling terms (i.e., x 1 /x 3 V x 4 1 x 5 ) for each cost function as selected by either the 83-member ensemble using the 1854-70 grouping (blue) or the 59-member ensemble using pre-1854 ships (green).(b) The magnitude and uncertainty of DT BKT under the same fixed environmental conditions as in Fig.5at 258N.(c) The DT diur time series (black dashed line) for the ship HMS Favorite during December 1831, with T BKT for the pre-1854 ensemble (solid green line) where shading intensity corresponds to 61-2 standard deviations.The T BKT for the 1854-70 grouping is shown without shading for comparison (solid blue line).

TABLE 1 .
Empirical coefficients; V is relative wind speed (m s 21 ).

TABLE 3 .
Sixteen ships selected from ICOADS to illustrate the results of fitting the BKT model.Deck refers to the original source data collection in ICOADS.Metadata contain information that could be readily obtained via an Internet search of the original call sign or name of the ship.The ship U.S. Navy 12388 samples at 0800, 1200, and 2000 local hour, a common feature of currently available WWII-era ships.