
Using Neural Networks to Correct Historical Climate Observations

  • 1 Department of Mathematics, Imperial College London, London, United Kingdom
  • 2 Met Office Hadley Centre, Exeter, United Kingdom
  • 3 Met Office Informatics Lab, Exeter, United Kingdom

Abstract

Biases in expendable bathythermograph (XBT) instruments have emerged as a leading uncertainty in reconstructions of historical ocean heat content change and therefore climate change. Corrections for these biases depend on the type of XBT used; however, this is unspecified for 52% of the historical XBT profiles in the World Ocean Database. Here, we use profiles of known XBT type to train a neural network that can classify probe type based on three covariates: profile date, maximum recorded depth, and country of origin. Whereas previous studies have shown an average classification skill of 77%, falling below 50% for some periods, our new algorithm maintains an average skill of 90%, with a minimum of 70%. Our study illustrates the potential for successfully applying machine learning approaches in a wide variety of instrument classification problems in order to promote more homogeneous climate data records.

© 2018 American Meteorological Society. For information regarding reuse of this content and general copyright information, consult the AMS Copyright Policy (www.ametsoc.org/PUBSReuseLicenses).

Corresponding author: Niall H. Robinson, niall.robinson@informaticslab.co.uk


1. Introduction

Historical climate records are essential in order to understand how our Earth system works and therefore make useful projections of future climate change. The ocean plays a critical role in climate and is the major recipient of the excess solar energy accumulating in the Earth system as a result of anthropogenic climate change. Estimates of current and past ocean heating rates, based on in situ ocean temperature profile observations, are critical to assessing the rate and trajectory of climate change (von Schuckmann et al. 2016). However, estimates of historical ocean heating rates can vary dramatically across different analyses (e.g., Palmer et al. 2010; Abraham et al. 2013). Two dominant sources of uncertainty arise from (i) the mapping methods used to translate profile observations into spatially complete fields of ocean temperature change and (ii) differences in bias corrections applied to expendable bathythermograph (XBT) instruments (Lyman et al. 2010; Boyer et al. 2016; Palmer 2017). The motivation for this study lies in promoting the refinement of future XBT bias corrections through improved classification of unknown XBT probe types.

XBTs were developed during the 1960s as a cheap and effective means of collecting temperature observations in the upper few hundred meters of the ocean (Abraham et al. 2013). XBTs have been deployed by both merchant and research vessels since 1966, and their introduction immediately brought about a dramatic increase in the sampling of ocean temperature (Lyman and Johnson 2008). As an XBT passively sinks, it uses a thermistor to measure temperature as a function of time; these measurements can then be converted to temperature as a function of ocean depth via a fall-rate equation (Abraham et al. 2013). Over time, a number of different XBTs have been developed with different operating characteristics, such as sinking rate. Although there have been several manufacturers, the vast majority of probes have been produced by the U.S.-based Sippican (renamed Lockheed Martin Sippican Inc. in 2005) and the Japanese Tsurumi-Seiki Co. (TSK), which remains the only Sippican-licensed manufacturer (Palmer et al. 2018).

XBT temperature measurements are subject to time-varying warm biases (e.g., Gouretski and Koltermann 2007), which can confound estimates of historical ocean heat content and steric sea level change (e.g., Domingues et al. 2008). As a result, a variety of different XBT bias correction schemes have been developed over the last decade or so (Cheng et al. 2016; Boyer et al. 2016). One of the challenges for the application of profile bias corrections is that 52% of the profiles are from an XBT of unrecorded model or manufacturer. Since temperature biases are dependent on both probe model and manufacturer (collectively referred to herein as the probe “type”; Kizu et al. 2005, 2011; Cowley et al. 2013), correctly classifying unknown probe types is crucial to accurately correct the climatological record.

This study builds on the work of Palmer et al. (2018), who presented a deterministic algorithm for assigning a probe type to unlabeled XBT profiles based on country of origin, maximum depth recorded, and profile date. Their algorithm was informed by frequentist statistical analysis of known probe types over the period 1966–2015 and by expert knowledge of the relationships between country of origin and XBT manufacturer. The authors demonstrated that their algorithm had an average success rate of 77% in identifying the correct probe type. Here, we explore the use of a machine learning approach (a neural network) as a means of classifying unknown XBT profiles. Our study uses exactly the same dataset and covariates as Palmer et al. (2018); however, we train a separate neural network for each year of data, meaning the algorithm is effectively able to adapt over time.

2. Data

The uncorrected XBT profiles were retrieved from the World Ocean Database (WOD) via the National Oceanic and Atmospheric Administration (NOAA)/National Centers for Environmental Information in August 2017. We consider the 50-yr period from 1966 to 2015 for our analysis. Over this period, there are approximately 2.3 million temperature profiles, of which approximately 1.1 million have a recorded probe type; that is, metadata are present for both the probe manufacturer and model. The labeled data thus account for 48% of the total over the 50-yr period. Post-2000, the proportion of labeled data is close to 100%, compared with less than 30% in the early 1990s. There are some cases where the probe model is known but the manufacturer is not; these make up only a small fraction of the dataset and were discarded from the training dataset, as they complicate the problem while being a corner case that does not add significant skill to any solution. Over this period, there are 27 unique probe types recorded in the dataset.

There are three manufacturers of XBT probes in the WOD. The U.S. manufacturer Sippican accounts for the majority of known XBT probes (95.1% of the data), with TSK (Japan) and Sparton (Canada) accounting for 4.3% and 0.6%, respectively. The distribution of probe types is also dominated by T-4 (Sippican), Deep Blue (Sippican), and T-7 (Sippican), which together account for 85% of the labeled data. As noted by Palmer et al. (2018), it is likely that there are some mislabeled data among the known profiles; however, we similarly assume that these represent only a small percentage of the dataset.

XBT profiles also record several covariates that are available for the full dataset: the maximum recorded depth (calculated with a uniform assumed fall-rate equation), the date of observation, and the country that deployed the probe.
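
As an illustration, the data selection described above might look as follows in pandas; the file name and column names (year, manufacturer, model, date, max_depth, country) are hypothetical stand-ins, not the actual WOD export schema.

```python
import pandas as pd

# A minimal sketch of the selection described above; column names are
# hypothetical, and the actual WOD export format differs.
profiles = pd.read_csv("wod_xbt_profiles.csv")

# Restrict to the 1966-2015 analysis period.
profiles = profiles[profiles["year"].between(1966, 2015)]

# Labeled profiles record both manufacturer and model; the corner case
# where only the model is known is discarded from the training data.
has_mfr = profiles["manufacturer"].notna()
has_model = profiles["model"].notna()
labeled = profiles[has_mfr & has_model].copy()
unlabeled = profiles[~has_mfr & ~has_model].copy()

# The classification target combines manufacturer and model, e.g. "Sippican T-7".
labeled["probe_type"] = labeled["manufacturer"] + " " + labeled["model"]
```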

3. Methodology

The state-of-the-art algorithm [known as iMeta; detailed in Palmer et al. (2018)] consists of a decision tree whose decision thresholds are based on the country of origin, date of deployment, and maximum reporting depth of the XBT. However, these covariates are not unique to a probe type; for instance, a country may purchase multiple probe types, and new models of a probe may coexist with older models, which may not have been deployed immediately. This leads to error, particularly in years when several probe types are in use. A machine learning approach, on the other hand, does not suffer from this restriction as long as there is sufficient input information to support the distinction between the probe types; it is capable of learning how to classify the probes even when there are too many probe types for the decision tree.

We initialize a multilayer neural network (NN) with two hidden layers of 10 nodes each, taking the same covariates as the Palmer et al. (2018) algorithm. A neural network approach was chosen because it deals better than alternatives such as multiple linear regression with nonlinear relationships, such as those expected between the covariates and probe type. For most problems, two hidden layers are sufficient, and 10 nodes per hidden layer is appropriate given the number of input/output nodes (Panchal et al. 2011). It is important to note that the hidden layers, together with their nonlinear activation function, are essential to the versatility of the NN; they are what make the method better than standard linear regression when the relationship between inputs and outputs is highly nonlinear (more details in the appendix). While this hyperparameter selection is a subjective step, the method performs well on the test set, consistently outperforming iMeta. This gives an indication of classification skill, assuming that the unlabeled data follow a similar distribution to the validation data. We tried various other network configurations with more nodes; this did not increase the performance of the NN, and the extra nodes became redundant.

We train and test the NN independently for each year using the profiles for which the XBT type is known, and finally, we apply the NN to profiles for which the probe type is unknown (a sketch of this per-year scheme follows). While a single configuration of iMeta was applied to the entire dataset, we train, test, and apply an NN to the data from each year separately. This approach assumes that, for any given year, the probe characteristics that indicate a certain probe type in the labeled data also hold for the unlabeled data. Note that iMeta makes the same assumption when it uses profile year to make decisions in one layer of its decision tree and then applies the other decision layers in the same manner to both labeled and unlabeled data.
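
A minimal sketch of this per-year scheme, reusing the hypothetical DataFrames from the previous sketch. Here build_classifier() is a placeholder for the network configuration given below, and the covariates are assumed to be already encoded and scaled as described later in this section.

```python
COVARIATES = ["date", "max_depth", "country"]

# Per-year scheme: an independent network is trained on each year's
# labeled profiles and applied to that year's unlabeled profiles.
# build_classifier() is a hypothetical helper (defined in the next sketch)
# returning a fresh, unfitted model.
predicted = {}
for year, group in labeled.groupby("year"):
    model = build_classifier()
    model.fit(group[COVARIATES], group["probe_type"])
    todo = unlabeled[unlabeled["year"] == year]
    if not todo.empty:
        predicted[year] = model.predict(todo[COVARIATES])
```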

To assess the performance of the NN, we use the recall, also known as the hit rate, as in Palmer et al. (2018). The hit rate is defined as the number of true positives divided by the sum of true positives and false negatives; it can also be referred to as the sensitivity and represents the chance of classifying a probe correctly. We present plots of the hit rate for each year, the recall and precision for selected XBT types, and finally, a bar chart of the XBT labels we assign. We apply Adam optimization, as recommended by Karpathy (2017), using the scikit-learn module (Pedregosa et al. 2011). For the hyperparameters of this method, we use the values suggested by Kingma and Ba (2014):

  • Initial learning rate = 0.001.
  • Exponential decay rate for estimates of first moment vector = 0.9.
  • Exponential decay rate for estimates of second moment vector = 0.999.
  • Value for numerical stability = 10^−8.
  • L2 penalty (regularization term) parameter = 0.0001.

We then used rectified linear units (ReLU) as the activation function in all hidden layers, with a softmax output and cross-entropy loss (see the appendix). We use a tolerance of 10^−4, meaning convergence is considered reached if the loss function does not decrease by more than this amount for more than two iterations.
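
These settings map directly onto scikit-learn's MLPClassifier. The sketch below is our reading of the stated configuration, not the authors' released code; note that the multiclass MLPClassifier uses a softmax output with cross-entropy loss by default.

```python
from sklearn.neural_network import MLPClassifier

# Hypothetical helper assembling the configuration described in the text.
def build_classifier():
    return MLPClassifier(
        hidden_layer_sizes=(10, 10),  # two hidden layers of 10 nodes each
        activation="relu",            # ReLU in the hidden layers
        solver="adam",                # Adam stochastic optimization
        learning_rate_init=0.001,     # initial learning rate
        beta_1=0.9,                   # decay rate, first-moment estimates
        beta_2=0.999,                 # decay rate, second-moment estimates
        epsilon=1e-8,                 # numerical stability term
        alpha=0.0001,                 # L2 penalty (regularization) parameter
        tol=1e-4,                     # convergence tolerance on the loss
    )
```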

We used 75% of the labeled data for training and the remaining 25% for validation. The data are input as a matrix so that operations are applied at once to an entire set of values. Input data are normalized by centering on the mean and scaling by the standard deviation (z-score normalization). Categorical data, such as the country of origin, are converted to numerical values; this is common practice in machine learning, and there are several methods for vectorizing categorical input (Potdar et al. 2017). We choose the ordinal encoding detailed by von Eye and Clogg (1996) for its ease of interpretation and understanding (a sketch of this preprocessing follows).
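
A sketch of this preprocessing with standard scikit-learn utilities; the date is assumed to be already numeric, and the specific ordinal scheme of von Eye and Clogg (1996) is not reproduced here.

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OrdinalEncoder, StandardScaler

# Illustrative preprocessing for one year's labeled profiles: encode the
# country ordinally, z-score all covariates, and hold out 25% for validation.
X = labeled[COVARIATES].copy()
y = labeled["probe_type"]

X[["country"]] = OrdinalEncoder().fit_transform(X[["country"]])
X[COVARIATES] = StandardScaler().fit_transform(X)  # center on mean, scale to std dev

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0  # random split; the seed is arbitrary
)

model = build_classifier()
model.fit(X_train, y_train)
```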

The 75/25 split was varied (with a validation dataset between 20% and 35%) with little impact on the results. The profiles assigned to each partition were selected randomly, and repeated random selections did not affect the results. We use Adam as the gradient descent method (Karpathy 2017); it automatically tunes the most important hyperparameter, the learning rate, during stochastic descent. This allows us to maximize the amount of data used for training, which is important in this data-limited case, while keeping the choice of hyperparameters as objective as possible. Adam still uses hyperparameters that must be chosen by the user, such as the initial learning rate and the learning rate decay schedule, but Kingma and Ba (2014) suggest widely applicable values, which are applied here. The model was fit several times with different, randomly chosen training and validation sets, obtaining consistently good results.

We conducted some experiments converting the categorical input into binary (one-hot) vectors, as this can improve the performance of a neural network. We also tried varying the tolerance parameter that governs the stopping criterion of the optimizer, since this parameter is not tuned automatically by the Adam algorithm. To tune the tolerance, we used a standard technique in which a fixed grid of candidate values is evaluated and the optimal value is selected by cross-validation within the training set (sketched below). Neither tuning this parameter nor changing the encoding of the country input yielded a substantial improvement in overall performance. Moreover, binary encoding vastly increases the dimensionality of the input data and reduces the amount of data available for training, which is likely to increase the chance of overfitting. For these reasons, the analysis presented here uses the original parameter values presented above.
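
The cross-validated search over the tolerance could be realized with GridSearchCV, as in this sketch; the candidate grid is illustrative, not the one used in the study.

```python
from sklearn.model_selection import GridSearchCV

# Select the convergence tolerance, the stopping parameter not tuned
# automatically by Adam, by cross-validation within the training set.
search = GridSearchCV(
    build_classifier(),
    param_grid={"tol": [1e-5, 1e-4, 1e-3]},  # illustrative candidate values
    cv=5,                                     # 5-fold cross-validation
)
search.fit(X_train, y_train)
print(search.best_params_)
```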

For further details of the neural network algorithm and its application, see the appendix.

4. Results

In Fig. 1, we compare the NN algorithm to the iMeta algorithm by Palmer et al. (2018) over the period 1966–2015, inclusive.

Fig. 1. (left) The recall/hit rate of the NN and iMeta algorithms from 1966 to 2015. (right) The percentage improvement in recall achieved by NN over iMeta over time (green line; left axis) and the number of probes in the validation set (purple line; right axis).

The NN algorithm shows a consistent improvement in recall over the Palmer et al. (2018) iMeta algorithm for the entire time period. When there are very few probe types, such as during the period between 1966 and 1975, the NN algorithm performs only slightly better than iMeta. By contrast, when there are many probe types, such as during the early 1990s (>17 probe types), the NN substantially outperforms iMeta. This reflects the particular unsuitability of decision tree approaches such as iMeta when there are many possible classes. The comparative performance across the period is shown in Fig. 1 (right); the average relative improvement over the entire time period is 14%.

Figure 2 shows the recall of three different XBT types and their recorded frequencies over time. Breaks in the recall series reflect years when these probes were not recorded as being used. As the number of T-4 (Sippican) probes decreases in the early 1990s, we see a gradual decline in recall; post-2010, the number of recorded T-4 (Sippican) probes is almost zero, and recall declines further. On the other hand, there are relatively few T-5 (Sippican) probes over the entire 50-yr period and their recall varies strongly, probably reflecting their scarcity in the training cohort; the recall for T-5 (Sippican) and other uncommon probes is therefore sensitive to the random initialization of the neural network. Finally, there are enough T-7 (Sippican) probes to identify them reliably over the entire period, as seen from their consistently high recall and its low variability.

Fig. 2. (left) The recall/hit rate of the NN for three probe types over 1966–2015. (right) The total number of each probe in the dataset.

As mentioned above, NN performance decreases for probe types that make up only a very small percentage of the total dataset. However, we did not investigate any special procedure to address this problem, given the small number of such probes relative to the size of the dataset. On average, the NN still correctly classifies a reasonable number of the less common probe types and ultimately still clearly outperforms the iMeta algorithm.

We verified that the NN did not spuriously weight classification toward probes that were more common in the training data. We first define a second measure of NN performance: precision, the number of true positives divided by the sum of true positives and false positives (i.e., the fraction of classifications for a given type that are correct). A high precision means a relatively low number of false positives. In general, we observe that high probe frequency is associated with high precision; for example, after 1997 the Deep Blue (Sippican) probe type is more frequent than T-7 (Sippican) and also has higher precision (Fig. 3). This is counter to what would be expected if there were a systematic classification bias toward the more common probe types.
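
For reference, both metrics can be computed per probe type on the held-out validation set with scikit-learn, as in this minimal sketch.

```python
from sklearn.metrics import precision_score, recall_score

# Per-type recall (hit rate, TP / (TP + FN)) and precision (TP / (TP + FP)),
# following the definitions in the text.
y_pred = model.predict(X_val)
recalls = recall_score(y_val, y_pred, labels=model.classes_, average=None)
precisions = precision_score(
    y_val, y_pred, labels=model.classes_, average=None, zero_division=0
)
for probe, r, p in zip(model.classes_, recalls, precisions):
    print(f"{probe}: recall={r:.2f}, precision={p:.2f}")
```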

Fig. 3. The precision of T-7 (Sippican) and Deep Blue (Sippican) probes.

Finally, we apply the NN algorithm to the set of unlabeled data. In Fig. 4, we can see that T-4 (Sippican) is the most frequently predicted probe type (similar to the iMeta result). However, in the unlabeled data, the NN algorithm assigns a higher percentage of T-7 (Sippican) probes, and a lower percentage of Deep Blue (Sippican) probes, than the proportions found in the labeled data. This may seem suspect, since it contradicts both the distribution of the labeled probes and the results obtained with the iMeta algorithm. However, in the labeled data, the recall of Deep Blue (Sippican) probes is 90%, which means that the NN will correctly assign a Deep Blue (Sippican) probe with a 90% chance. A closer look at the NN fit to the labeled data shows that the number of Deep Blue (Sippican) probes the neural network mislabels as T-7 (Sippican) is small. Furthermore, the recall of T-7 (Sippican) is 60%, meaning there is a 40% chance that T-7 probes will be misclassified, and the proportion of T-7 (Sippican) misclassified as Deep Blue (Sippican) is larger than the converse. For these reasons, we are confident that we are not repeatedly classifying Deep Blue (Sippican) probes as T-7 (Sippican) and hence that our percentages are sound.

Fig. 4. The percentage of the unlabeled data assigned to the major probe types by the iMeta algorithm (green) and the NN algorithm (blue), compared with the percentage of labeled data for the same probe types (red); (S) and (T) refer to Sippican and TSK, respectively.

In the labeled data, there is a transition over time from T-7 probes to Deep Blue probes (Palmer et al. 2018, their Fig. 4c). To show that the NN does indeed capture this transition, we look again at the precision for these probe types. The NN has consistently high precision for both T-7 (Sippican) and Deep Blue (Sippican) probes (Fig. 3). This suggests that, unlike the iMeta approach, in which probes after 1997 are always classified as Deep Blue (Sippican) rather than T-7 (Sippican), the NN continues to correctly assign T-7 (Sippican) probes after 1997.

It is important to note that T-7 (Sippican) and Deep Blue (Sippican) are designed to be essentially the same probe, except that the Deep Blue canister holds a longer length of wire, which allows it to be launched from faster-moving vessels. Therefore, if some T-7 (Sippican) probes are misclassified as Deep Blue (Sippican) or vice versa, the effect on fall-rate corrections will be minimal (Kizu et al. 2011).

5. Conclusions and discussion

This study has demonstrated that applying machine learning to the classification of XBT probe types improves accuracy over the current state-of-the-art method (Palmer et al. 2018). This approach also has the advantage that subjective a priori expertise is not needed. We believe that similar machine learning approaches could be exploited for other environmental datasets. A natural extension of this project is to study independently the data for which the model is known but the manufacturer is unknown. In this case, there are only two possible manufacturers, so an independent neural network could be applied to assign the manufacturer, or a clustering-type algorithm could be considered.

Another interesting option for future study is to use a single NN for the whole WOD dataset. In that case, an informative extra covariate would be the fraction of each labeled probe type among the observations in each year; this would compensate for the loss of the per-year structure by encoding when each probe type was commonly available. We would also expect to need more nodes in the hidden layers.

In this study, we restricted ourselves to the same covariates as the iMeta study to allow a direct comparison. However, future work could explore improvements to the covariates. For example, it is not rare for probes to report measurements significantly deeper than their rated maximum depth, as can be seen by comparing the data with the maximum probe operating depths detailed by Palmer et al. (2018). Improvements may be gained by using the depth of the final valid measurement instead of the maximum reported depth, although this criterion may not be optimal in shallow waters. Further study of NN performance in shallow waters would be of interest.

Finally, the scores of the neural network can be interpreted as probabilities for each probe type. This would be very useful for the development of future XBT bias corrections, as it allows the classification uncertainty to be explored more fully by employing a Monte Carlo ensemble approach (e.g., Cheng et al. 2016).
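
One possible realization of this suggestion is sketched below: treat the network's class probabilities as sampling weights and draw an ensemble of probe-type assignments for each unlabeled profile. Here X_unlabeled denotes a hypothetical array of preprocessed unlabeled covariates.

```python
import numpy as np

# Monte Carlo ensemble of probe-type assignments: sample each profile's
# type from the network's predicted class probabilities.
rng = np.random.default_rng(0)
proba = model.predict_proba(X_unlabeled)  # shape (n_profiles, n_types)

n_members = 100
ensemble = np.stack([
    np.array([rng.choice(model.classes_, p=row) for row in proba])
    for _ in range(n_members)
])  # shape (n_members, n_profiles); each member implies its own bias correction
```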

Acknowledgments

TPL would like to acknowledge the funding and support from Climate-KIC in carrying out this research. Climate-KIC is supported by the European Institute of Innovation and Technology (EIT), a body of the European Union. TPL and FPL are supported by EPSRC and the CDT in the Mathematics of Planet Earth under Grant EP/L016613/1. MDP was supported by the Joint U.K. BEIS/Defra Met Office Hadley Centre Climate Programme (GA01101). The work presented here builds on community efforts of the International Quality Controlled Ocean Database initiative (IQuOD; www.iquod.org), which is sponsored by the Scientific Committee on Oceanic Research and the International Oceanographic Data and Information Exchange program of the Intergovernmental Oceanographic Commission (IOC). Additionally, the authors would sincerely like to thank the three anonymous reviewers of this manuscript for the insightful comments they provided.

APPENDIX

Neural Network Methodology

We will now give a brief introduction to the neural network methodology we apply. For a complete tutorial on neural networks and their implementation, see Karpathy (2017). A classification problem in machine learning consists of an algorithm that takes the covariates of a probe as input and returns a score vector in which every possible probe type receives a score. A loss function is then applied that assigns higher values to worse scores. There are several possible loss functions; we apply the softmax classifier, which uses the cross-entropy loss

$$L_i = -\log\left(\frac{e^{s_{y_i}}}{\sum_j e^{s_j}}\right), \quad (A1)$$

where the $s_j$ are the scores for the different classes and $s_{y_i}$ is the score for the correct class of probe $i$. A regularization term is also introduced in the loss function to avoid overfitting and ambiguities. In our case, we use an $L_2$ regularization function:

$$R(W) = \sum_k \sum_l W_{k,l}^2. \quad (A2)$$

The final formula for the loss comes from taking the mean of $L_i$ over all $N$ probes and adding the regularization term,

$$L = \frac{1}{N}\sum_i L_i + \lambda R(W), \quad (A3)$$

where $\lambda$ is a constant tuned by the user.
We now describe the algorithm that maps the covariates to the scores; this part is responsible for the performance of the classification algorithm. It consists of several fully connected layers of nodes: the input layer, the hidden layers, and the output layer. In a fully connected network, each node of a given layer is connected to every node of the next layer via a function that depends on a weight matrix $W$ of dimension $n \times m$, where $n$ is the number of nodes in the first layer and $m$ is the number of nodes in the next. The composition of these functions connecting each layer to the next gives the map between the covariates and the scores. For example, for a two-hidden-layer neural network, we have

$$s = W_3\, f\big(W_2\, f(W_1 x)\big), \quad (A4)$$

where $x$ denotes the covariates and $s$ the scores. The activation function $f$ is a nonlinear function essential for the performance of the neural network. We use the ReLU activation function, defined as

$$f(x) = \max(0, x). \quad (A5)$$
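
For concreteness, a minimal NumPy sketch of Eqs. (A1)–(A5); bias terms are omitted, as in the equations above, and the loss is written for a single probe (the full loss averages $L_i$ over all probes before adding the penalty).

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)  # Eq. (A5)

def scores(x, W1, W2, W3):
    return W3 @ relu(W2 @ relu(W1 @ x))  # Eq. (A4); biases omitted

def loss(s, y, weights, lam):
    s = s - s.max()  # shift for numerical stability; leaves (A1) unchanged
    L_i = -s[y] + np.log(np.exp(s).sum())     # Eq. (A1), single-probe form
    R = sum((W ** 2).sum() for W in weights)  # Eq. (A2)
    return L_i + lam * R                      # Eq. (A3) for one probe
```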
After these two steps, training the neural network simply consists of minimizing the loss function $L$. To do this, we employ a gradient descent method called Adam (Kingma and Ba 2014). Adam automatically tunes the parameters within the algorithm and uses a feature called momentum to improve on simpler methods of varying the learning rate automatically during minimization. Momentum takes into account not just the gradient at the current point but also the gradients at previous points: if the algorithm has been moving in a certain direction for some time, it increases the effective learning rate in that direction; if it keeps bouncing back and forth but stays trapped in another direction, it decreases it. Adam requires the gradient of the loss with respect to the parameters. In general, it is not possible to calculate the derivative analytically, so a numerical approximation is needed. Because the function from the covariates to the loss is a composition of the functions going from layer to layer, finally composed with the loss function, the chain rule is applied iteratively over all the layers; in machine learning, this is called backpropagation. The implementation relies extensively on scikit-learn (Pedregosa et al. 2011). The code, including the alternative implementation, can be found at https://github.com/ThomasLeahy/NN_XBT.

REFERENCES

  • Abraham, J. P., and Coauthors, 2013: A review of global ocean temperature observations: Implications for ocean heat content estimates and climate change. Rev. Geophys., 51, 450–483, https://doi.org/10.1002/rog.20022.

  • Boyer, T., and Coauthors, 2016: Sensitivity of global upper-ocean heat content estimates to mapping methods, XBT bias corrections, and baseline climatologies. J. Climate, 29, 4817–4842, https://doi.org/10.1175/JCLI-D-15-0801.1.

  • Cheng, L., and Coauthors, 2016: XBT science: Assessment of instrumental biases and errors. Bull. Amer. Meteor. Soc., 97, 924–933, https://doi.org/10.1175/BAMS-D-15-00031.1.

  • Cowley, R., S. Wijffels, L. Cheng, T. Boyer, and S. Kizu, 2013: Biases in expendable bathythermograph data: A new view based on historical side-by-side comparisons. J. Atmos. Oceanic Technol., 30, 1195–1225, https://doi.org/10.1175/JTECH-D-12-00127.1.

  • Domingues, C. M., J. A. Church, N. J. White, P. J. Gleckler, S. E. Wijffels, P. M. Barker, and J. R. Dunn, 2008: Improved estimates of upper-ocean warming and multi-decadal sea-level rise. Nature, 453, 1090–1093, https://doi.org/10.1038/nature07080.

  • Gouretski, V., and K. P. Koltermann, 2007: How much is the ocean really warming? Geophys. Res. Lett., 34, L01610, https://doi.org/10.1029/2006GL027834.

  • Karpathy, A., 2017: Module 1: Neural networks. CS231n: Convolutional neural networks for visual recognition, Stanford University, http://cs231n.github.io/.

  • Kingma, D., and J. Ba, 2014: Adam: A method for stochastic optimization. arXiv.org, 15 pp., https://arxiv.org/abs/1412.6980.

  • Kizu, S., S.-I. Ito, and T. Watanabe, 2005: Inter-manufacturer difference and temperature dependency of the fall-rate of T-5 expendable bathythermograph. J. Oceanogr., 61, 905–912, https://doi.org/10.1007/s10872-006-0008-z.

  • Kizu, S., C. Sukigara, and K. Hanawa, 2011: Comparison of the fall rate and structure of recent T-7 XBT manufactured by Sippican and TSK. Ocean Sci., 7, 231–244, https://doi.org/10.5194/os-7-231-2011.

  • Lyman, J. M., and G. C. Johnson, 2008: Estimating annual global upper-ocean heat content anomalies despite irregular in situ ocean sampling. J. Climate, 21, 5629–5641, https://doi.org/10.1175/2008JCLI2259.1.

  • Lyman, J. M., S. A. Good, V. V. Gouretski, M. Ishii, G. C. Johnson, M. D. Palmer, D. M. Smith, and J. K. Willis, 2010: Robust warming of the global upper ocean. Nature, 465, 334–337, https://doi.org/10.1038/nature09043.

  • Palmer, M. D., 2017: Reconciling estimates of ocean heating and Earth’s radiation budget. Curr. Climate Change Rep., 3, 78–86, https://doi.org/10.1007/s40641-016-0053-7.

  • Palmer, M. D., and Coauthors, 2010: Future observations for monitoring global ocean heat content. Proc. OceanObs’09: Sustained Ocean Observations and Information for Society, Venice, Italy, IOC/UNESCO, 13 pp., https://doi.org/10.5270/OceanObs09.cwp.68.

  • Palmer, M. D., T. Boyer, R. Cowley, S. Kizu, F. Reseghetti, T. Suzuki, and A. Thresher, 2018: An algorithm for classifying unknown expendable bathythermograph (XBT) instruments based on existing metadata. J. Atmos. Oceanic Technol., 35, 429–440, https://doi.org/10.1175/JTECH-D-17-0129.1.

  • Panchal, G., A. Ganatra, Y. Kosta, and D. Panchal, 2011: Behaviour analysis of multilayer perceptrons with multiple hidden neurons and hidden layers. Int. J. Comput. Theory Eng., 3, 332–337, https://doi.org/10.7763/IJCTE.2011.V3.328.

  • Pedregosa, F., and Coauthors, 2011: Scikit-learn: Machine learning in Python. J. Mach. Learn. Res., 12, 2825–2830.

  • Potdar, K., T. S. Pardawala, and C. D. Pai, 2017: A comparative study of categorical variable encoding techniques for neural network classifiers. Int. J. Comput. Appl., 175, 7–9, https://doi.org/10.5120/ijca2017915495.

  • von Eye, A., and C. C. Clogg, 1996: Categorical Variables in Developmental Research: Methods of Analysis. Elsevier, 286 pp.

  • von Schuckmann, K., and Coauthors, 2016: An imperative to monitor Earth’s energy imbalance. Nat. Climate Change, 6, 138–144, https://doi.org/10.1038/nclimate2876.