Abstract

Accurate and real-time sea surface salinity (SSS) prediction is an elemental part of marine environmental monitoring. The intrinsic correlations and patterns in historical SSS data are believed to improve prediction accuracy, but statistical methods have not fully exploited them. In recent years, deep-learning methods have been successfully applied to time series prediction and have achieved excellent results by mining the intrinsic correlation of time series data. In this work, we propose a dual path gated recurrent unit (GRU) network (DPG) to address the challenge of SSS prediction accuracy. Specifically, DPG uses a convolutional neural network (CNN) to extract the local short-term pattern of the time series and a recurrent neural network (RNN) to capture the overall long-term pattern. The CNN module is composed of a 1D CNN without pooling, and the RNN module is composed of two parallel but different GRU layers. Experiments conducted on the South China Sea SSS dataset from the Reanalysis Dataset of the South China Sea (REDOS) show the feasibility and effectiveness of DPG in predicting SSS values. It achieves accuracies of 99.29%, 98.44%, and 96.85% in predicting the coming 1, 5, and 14 days, respectively. In addition, DPG outperforms autoregressive integrated moving averages, support vector regression, and artificial neural networks in both prediction accuracy and stability. To the best of our knowledge, this is the first time that intrinsic data correlation has been applied to predict SSS values.

1. Introduction

Sea surface salinity (SSS) is one of the most critical factors in the study of climate forecasting (Cronin and McPhaden 1999), the global water cycle (Batteen et al. 1995), sea ice observation, marine disaster monitoring (Reul et al. 2012; Hasson et al. 2013), marine ecosystems (Gabarró et al. 2004), and military applications. Accurate and real-time SSS prediction is an elemental part of marine environmental monitoring. Computational methods for predicting SSS values can be divided into two categories: statistical methods and machine-learning methods.

Statistical methods refer to physical oceanic models with fixed functional forms and certain assumptions, in which the values of the parameters involved are computed from empirical data. For example, the multivariate adaptive regression spline model (Urquhart et al. 2012) and the multiple linear regression model (Qing et al. 2013) have been used for SSS prediction. These statistical methods have several merits:

  1. The models are usually simple and easy to interpret.

  2. They are usually easier to solve than machine-learning methods and take less time.

However, because of the nonlinear and stochastic nature of SSS data, statistical methods cannot describe this nature well and result in larger prediction errors than machine-learning methods.

Machine-learning methods can “learn” internal patterns or correlations from series data and have been used for SSS prediction; see, for example, the genetic algorithm (Chen et al. 2017) and the artificial neural network (ANN) (Urquhart et al. 2012). Machine-learning methods have superior expressive ability and can approximate nearly any function to arbitrary precision. However, an ANN cannot learn the correlation of sequence data, so time features must be chosen manually, which may lead to unsatisfactory prediction results. Similarly, the genetic algorithm often faces a nonconvex optimization problem and can easily become trapped in a local minimum. Moreover, the optimization of machine-learning methods may be hard, and overfitting remains a tough problem to solve (Jiang et al. 2018; Siahkoohi et al. 2019).

In recent decades, deep-learning methods, especially the recurrent neural network (RNN), have been widely used for time series data processing and value prediction. RNN introduces the recurrent unit structure and allows internal connections between hidden units, so it is suitable for analyzing and processing time series data (Krizhevsky et al. 2012). However, RNN suffers from the vanishing gradient problem: as the time interval increases, it loses the ability to learn from historical information. Hochreiter presented the long short-term memory (LSTM) network (Hochreiter and Schmidhuber 1997), which solves the vanishing gradient problem by introducing gate control units and has been widely used in the field of time series prediction. However, it usually takes a long time to train an LSTM because of its complex internal structure. To speed up training, Cho et al. (2014) proposed the gated recurrent unit (GRU) network model based on the LSTM model in 2014, which maintains the prediction performance with fewer training parameters than LSTM. In 2017, LSTM was applied to predicting sea surface temperature (SST) values and outperformed the classical support vector regression (SVR) method (Zhang et al. 2017; Ratto et al. 2019). However, no result has yet been reported on using such networks to predict SSS values.

In this work, a GRU-based deep-learning model is proposed to predict SSS values. Considering the small computational cost and fast convergence of GRU, we apply it to SSS prediction and propose the dual path GRU network (DPG), which consists of a CNN module and an RNN module. The CNN module is composed of a 1D CNN without pooling, and the RNN module is composed of two parallel but different GRU layers. This structure is designed to extract both the overall long-term pattern and the local short-term pattern of the time series. Experiments conducted on the SSS data from the Reanalysis Dataset of the South China Sea (REDOS) show the feasibility and effectiveness of DPG in predicting SSS values. It achieves accuracies of 99.29%, 98.44%, and 96.85% in predicting the coming 1, 5, and 14 days, respectively. In addition, DPG outperforms the autoregressive integrated moving average (ARIMA), SVR, and ANN in both prediction accuracy and stability. To the best of our knowledge, this is the first time that intrinsic data correlation is applied to predict SSS values.

The rest of the paper is organized as follows: In section 2, the SSS prediction problem is formulated and the DPG model is introduced. Details of the experimental procedures, including discussion, are provided in section 3. Last, we draw conclusions in section 4.

2. Method

In this section, we describe SSS prediction as a time series forecasting problem and then present the DPG architecture. Last, we introduce the objective function and the optimization strategy.

a. Problem description

We regard SSS prediction as a curve-fitting problem and aim to use historical SSS values to predict future SSS values. More formally, given a series of historical values $Y = \{y_1, y_2, \ldots, y_n\}$, where $n$ is the historical step, we predict a series of future values in a rolling forecasting fashion, as shown in Fig. 1. We start by using the values in $Y_1 = \{y_1, y_2, \ldots, y_n\}$ to predict $y_{n+1}$; then we use $Y_2 = \{y_2, y_3, \ldots, y_{n+1}\}$ to predict $y_{n+2}$, and so on. In this way, we can predict the desired future SSS values starting from the very first sequential window $Y_1$. Note that each predicted value is based on previous prediction information. We are interested here in prediction within 2 weeks, so we set $n = 14$; that is, the SSS values of the past 2 weeks are used to predict the SSS values of the next 2 weeks.
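
As a concrete illustration, a minimal sketch of this rolling-window construction is given below; `make_windows` and the synthetic series are our own illustrative names, not part of the original work, and the NumPy environment matches the setup described in section 3.

```python
import numpy as np

def make_windows(series, n=14, horizon=1):
    """Slice a 1D series into (input window, target) pairs.

    Each input is the previous n values; the target is the value
    `horizon` steps after the window, matching the rolling
    forecasting fashion described above.
    """
    X, y = [], []
    for i in range(len(series) - n - horizon + 1):
        X.append(series[i : i + n])
        y.append(series[i + n + horizon - 1])
    return np.asarray(X), np.asarray(y)

# Example: 100 days of synthetic SSS values, 14-day history, 1-day-ahead target.
sss = np.random.rand(100).astype("float32")
X, y = make_windows(sss, n=14, horizon=1)
print(X.shape, y.shape)  # (86, 14) (86,)
```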

Fig. 1. Forecasting fashion of SSS.

Figure 2 exhibits the topological structure of the proposed DPG network, which is composed of a CNN module and an RNN module; their functions are elaborated in detail in the following two subsections.

Fig. 2. The general topological structure of the DPG.

b. The CNN module

A convolutional neural network (CNN) is a downsampling network that slides convolution filters over the input data. In our DPG network (shown in Fig. 2), we use a 1D CNN without a pooling layer as the first layer, which can extract the local short-term pattern from the long-term time series and reduce the number of parameters. The $j$th filter sweeps through the input vector $V$ and produces

$$h_j = \mathrm{ReLU}(W_j * V + b_j),$$

where the asterisk denotes the convolution operation and $\mathrm{ReLU}(x) = \max(0, x)$. The output $h_j$ is a vector, and the output matrix of the convolutional layer is of size $(n - \mathrm{size}_f + 1) \times \mathrm{numb}_f$, where $n$ denotes the history step, $\mathrm{size}_f$ is the size of the filters, and $\mathrm{numb}_f$ is the number of filters. After the convolution layer, a dropout layer is added, which can effectively reduce overfitting and achieve a regularization effect to some extent (Srivastava et al. 2014; Lu et al. 2018).
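
In Keras (the framework used in section 3), this convolution-plus-dropout front end might be sketched as follows; the dropout rate of 0.2 is an assumed value, since the paper does not report it, while the filter settings anticipate the tuned values from section 3.

```python
from tensorflow import keras
from tensorflow.keras import layers

n, numb_f, size_f = 14, 100, 6  # history step; filter count and size tuned in section 3

inputs = keras.Input(shape=(n, 1))  # univariate SSS window of length n
# 1D convolution with ReLU and no pooling: output length is n - size_f + 1.
conv = layers.Conv1D(filters=numb_f, kernel_size=size_f, activation="relu")(inputs)
drop = layers.Dropout(0.2)(conv)    # dropout rate is an assumed choice
print(drop.shape)                   # (None, 9, 100), since 14 - 6 + 1 = 9
```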

c. The RNN module

1) Structure of GRU

The GRU and LSTM network models have similar data flows in their cells, but GRU does not have a separate storage unit, which makes it more efficient in training. Figure 3 shows the typical structure of a GRU cell. There are two kinds of gates in GRU: the reset gate $r_t$ and the update gate $z_t$, both activated by the logistic sigmoid function. The reset gate $r_t$ determines how dependent the candidate state $\tilde{h}_t$ is on the history state $h_{t-1}$: a smaller reset gate value means more historical information is ignored in the candidate state. The update gate $z_t$ determines how much information from the historical state $h_{t-1}$ is retained in the current state $h_t$ and how much is received from the candidate state $\tilde{h}_t$: a larger update gate value means more candidate state information is received.

Fig. 3. Structure of the GRU.

2) Workflow of dual path GRU

The output of the dropout layer goes into two parallel but different GRU layers at the same time: the general-GRU (GGRU) layer and the improved-GRU (IGRU) layer. That is why we call this structure dual path. The hidden units of the GGRU layer are connected sequentially, and its hidden state at time $t$ is computed as

$$r_t = \sigma(W_r x_t + U_r h_{t-1} + b_r),$$

$$z_t = \sigma(W_z x_t + U_z h_{t-1} + b_z),$$

$$\tilde{h}_t = \tanh[W_c x_t + U_c(r_t \odot h_{t-1}) + b_c], \quad \text{and}$$

$$h_t = z_t \odot \tilde{h}_t + (1 - z_t) \odot h_{t-1},$$

where $W$ and $U$ are weight matrices, $b$ is the bias term, $\odot$ is the element-wise product, $\sigma$ is the sigmoid function, $\tanh$ is the hyperbolic tangent function, and $x_t$ is the input of this layer at time $t$. The output of this layer is the hidden state at each time stamp.
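
For concreteness, a minimal NumPy transcription of these four update equations might look as follows; `gru_step` and its weight arguments are hypothetical names introduced only for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev,
             W_r, U_r, b_r, W_z, U_z, b_z, W_c, U_c, b_c):
    """One general-GRU update, transcribing the four equations above."""
    r_t = sigmoid(W_r @ x_t + U_r @ h_prev + b_r)              # reset gate
    z_t = sigmoid(W_z @ x_t + U_z @ h_prev + b_z)              # update gate
    h_tilde = np.tanh(W_c @ x_t + U_c @ (r_t * h_prev) + b_c)  # candidate state
    return z_t * h_tilde + (1.0 - z_t) * h_prev                # new state h_t
```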

In practice, GRUs usually fail to capture very long-term correlations because of the vanishing gradient problem. To make our model learn the overall long-term pattern of the time series, we borrow the skip-connection structure from the residual network (ResNet; He et al. 2016) and use skip links in our GRU to construct the improved-GRU (IGRU) layer. The skip-connection structure can reflect periodic patterns in the real world. For example, to predict the temperature at hour $t$, a classical trick is to check the records at hour $t$ on previous days, especially 24 h earlier; that is exactly what we want our skip-connection structure to learn from the data. The IGRU layer has a hyperparameter $s$, the distance of the skip connection in hidden units. The hidden state of the IGRU layer at time $t$ is calculated by

$$r_t = \sigma(W_r x_t + U_r h_{t-s} + b_r),$$

$$z_t = \sigma(W_z x_t + U_z h_{t-s} + b_z),$$

$$\tilde{h}_t = \tanh[W_c x_t + U_c(r_t \odot h_{t-s}) + b_c], \quad \text{and}$$

$$h_t = z_t \odot \tilde{h}_t + (1 - z_t) \odot h_{t-s},$$

where the input of this layer is the output of the dropout layer and $s$ is the number of hidden units skipped over in the IGRU layer. In our data experiments, we found that a well-tuned $s$ can significantly increase the accuracy of the model.
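
As a minimal illustration, the IGRU differs from the GGRU only in which previous state it reads: the one from $s$ steps back rather than the immediately preceding one. A hypothetical sequence-level sketch (where `step` would wrap the `gru_step` above with the IGRU's own weights):

```python
def igru_states(xs, s, step, h0):
    """Run a skip-GRU over a sequence: unit t connects to unit t - s."""
    h = [h0] * s                      # s initial states for the first s units
    for t in range(len(xs)):
        # h[t] is the hidden state from s steps back, i.e., h_{t-s}.
        h.append(step(xs[t], h[t]))
    return h[s:]                      # hidden states for times 1..T
```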

At last, we use a fully connected layer to combine the outputs of the GGRU layer and the IGRU layer. The inputs to the fully connected layer are the hidden state of the GGRU layer at time stamp $t$, denoted by $h_t^G$, and the $s$ hidden states of the IGRU layer from time stamp $t - s + 1$ to $t$, denoted by $h_{t-s+1}^I, h_{t-s+2}^I, \ldots, h_t^I$. The output of the fully connected layer is computed as

$$\tilde{y}_t = W^G h_t^G + \sum_{j=0}^{s-1} W_j^I h_{t-j}^I + b,$$

where the $W$ terms are weight matrices, $b$ is the bias term, and $\tilde{y}_t$ is the prediction result of the DPG at time stamp $t$.
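
Putting the pieces together, one plausible Keras realization of the dual path is sketched below. The paper does not publish its implementation, so this follows the recurrent-skip trick popularized by LSTNet-style models: the time axis is interleaved so that steps $s$ apart become adjacent, and a shared GRU then yields the $s$ skip states. All layer choices and the dropout rate are our assumptions.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Tuned values from section 3; the dropout rate is an assumed choice.
n, numb_f, size_f, numb_g, s = 14, 100, 6, 50, 7
conv_len = n - size_f + 1          # length of the conv output (9)
k = conv_len // s                  # steps per skip strand (here 1)

inputs = keras.Input(shape=(n, 1))
conv = layers.Conv1D(numb_f, size_f, activation="relu")(inputs)
drop = layers.Dropout(0.2)(conv)

# Path 1: general GRU (GGRU); keep only the final hidden state h_t^G.
h_g = layers.GRU(numb_g)(drop)

# Path 2: improved GRU (IGRU). Interleave the time axis so that steps
# s apart become adjacent, run a shared GRU over each of the s strands,
# and collect the s final states h^I_{t-s+1}, ..., h^I_t.
def skip_reshape(x):
    x = x[:, conv_len - k * s:, :]            # keep the last k*s steps
    x = tf.reshape(x, (-1, k, s, numb_f))
    x = tf.transpose(x, (0, 2, 1, 3))         # (batch, s, k, channels)
    return tf.reshape(x, (-1, k, numb_f))     # (batch*s, k, channels)

h_i = layers.GRU(numb_g)(layers.Lambda(skip_reshape)(drop))
h_i = layers.Lambda(lambda v: tf.reshape(v, (-1, s * numb_g)))(h_i)

# Fully connected combination of both paths, realizing
# y~_t = W^G h_t^G + sum_j W_j^I h^I_{t-j} + b.
outputs = layers.Dense(1)(layers.Concatenate()([h_g, h_i]))
model = keras.Model(inputs, outputs)
model.summary()
```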

d. Objective function and optimization strategy

In the optimization process, adaptive moment estimation (Adam) is used to optimize the model parameters. Adam is a first-order optimization algorithm that can replace traditional stochastic gradient descent; it adapts the learning rate to the parameters, performing larger updates for infrequent parameters and smaller updates for frequent ones (Kingma and Ba 2015). During training, the algorithm iteratively updates the weights and biases of each neuron so as to minimize the loss function. For the loss function of the model, we use the mean-square error given by the following formula:

 
$$\mathrm{loss} = \frac{1}{n}\sum_{t=1}^{n} (y_t - \tilde{y}_t)^2,$$

where $n$ is the number of samples, $y_t$ is the actual value, and $\tilde{y}_t$ is the output of the model at time stamp $t$.
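
Continuing the sketch above, compiling and training with Adam and the MSE loss might look as follows. `X_train`, `y_train`, `X_val`, and `y_val` are assumed rolling windows as in section 2a, and reading the paper's "50 iterations" as 50 epochs is our assumption.

```python
model.compile(optimizer=keras.optimizers.Adam(), loss="mse")
# Batch size 200 and 50 iterations match the values tuned in section 3;
# [..., None] adds the channel axis expected by the Conv1D input.
model.fit(X_train[..., None], y_train, epochs=50, batch_size=200,
          validation_data=(X_val[..., None], y_val))
```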

3. Data experiments

a. Dataset

We create an SSS dataset covering the South China Sea from the REDOS. The dataset contains daily values from January 1992 to January 2012 (7305 days in total) and covers the South China Sea from 5° to 23°N and from 105° to 123°E on a 0.10° latitude × 0.10° longitude grid (180 × 180 points).

b. Evaluation index

Prediction accuracy (ACC) and the root-mean-square error (RMSE) are used to evaluate the effectiveness of different prediction methods, and the calculation formulas are shown as follows:

 
$$\mathrm{ACC} = 1 - \frac{1}{n}\sum_{i=1}^{n} \frac{|X_a^i - X_p^i|}{X_a^i} \quad \text{and}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (X_a^i - X_p^i)^2},$$

where $X_a^i$ and $X_p^i$ are the actual and predicted values at point $i$, respectively. The RMSE reflects the accuracy of the prediction well and is sensitive to very large or very small errors in a set of results; that is, it shows the ability of the model to control absolute error (Phaisangittisagul 2016). For RMSE a lower value is better, whereas for ACC a higher value is better. In the following experiments, we use the area-average RMSE and area-average ACC, and the best results are highlighted in boldface in the tables.
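
For reference, a direct NumPy transcription of the two formulas might look as follows (the area averaging over grid points is omitted for brevity):

```python
import numpy as np

def acc(actual, predicted):
    """ACC = 1 - mean relative absolute error, per the formula above."""
    return 1.0 - np.mean(np.abs(actual - predicted) / actual)

def rmse(actual, predicted):
    """Root-mean-square error over the n evaluation points."""
    return np.sqrt(np.mean((actual - predicted) ** 2))
```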

c. Results and analysis

Data experiments are run under the Ubuntu 16.04 64-bit operating system, using Keras as the framework with TensorFlow as the backend. Partitioning a dataset by either 6:2:2 or 8:1:1 is common practice. When the amount of data is small, making the training set too large and the validation set too small may lead to serious overfitting, while a small test set cannot effectively evaluate the model. As the amount of data increases, however, a large validation set becomes unnecessary: the validation set is used only to tune the model toward its optimum, and in most cases a small percentage suffices. For instance, in the million-sample ImageNet dataset, 99.8% of the data is used for training and only 0.1% each for validation and testing. Therefore, the South China Sea SSS dataset has been split into training set (80%), validation set (10%), and test set (10%) in chronological order.
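
A minimal sketch of such a chronological split is shown below; `chrono_split` is an illustrative name, and the stand-in series merely mimics one grid point's 7305 daily values.

```python
import numpy as np

def chrono_split(data, train=0.8, val=0.1):
    """Split a time-ordered array into train/val/test without shuffling."""
    i = int(len(data) * train)
    j = int(len(data) * (train + val))
    return data[:i], data[i:j], data[j:]

sss_series = np.arange(7305, dtype="float32")  # stand-in for one grid point's daily SSS
train_set, val_set, test_set = chrono_split(sss_series)
print(len(train_set), len(val_set), len(test_set))  # 5844 730 731
```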

We also compare the results of splitting the dataset into training, validation, and test sets by 8:1:1 and by 6:2:2. There are six hyperparameters in our DPG model: the iteration times, the batch size, the number of filters numbf, the size of filters sizef, the skip length s in the IGRU layer, and the number of nodes numbg in the GRU layers (the same number of nodes is used in both the IGRU and GGRU layers). In the data experiments, we tune these hyperparameters in sequence.

We first determine the iteration times and the batch size, because they set the training scale for the entire network. The iteration times are chosen from {25, 50, 75}, and the batch size is chosen from {100, 150, 200, 250}. Table 1 shows the results on the SSS dataset with different iteration times and batch sizes. The best performance occurs when iteration = 50 and batch = 200.

Table 1. Prediction results (area-average RMSE and ACC) on the South China Sea SSS dataset with different iteration times and batch sizes. Here and in subsequent tables, boldface type indicates the best result.

If the iteration times are too small, training is far from making the parameters converge to their optimal values; if they are too large, obvious overfitting occurs, which reduces the accuracy of the model. Similarly, different batch sizes consider different sample information during training. When the batch size is too small, the model has difficulty converging; when it is too large, the accuracy changes little compared with the optimal value, but the computation and training time increase significantly.

Following the structural order of the model, we next determine the hyperparameters of the convolution layer. The number of filters numbf is chosen from {25, 50, 100}, and the size of filters sizef is chosen from {2, 4, 6, 8}. Table 2 shows the results on the SSS dataset with different numbers and sizes of filters. The best performance occurs when numbf = 100 and sizef = 6, likely because a sufficiently large number of filters can better process the input information while a suitable filter size extracts the most appropriate short-term local pattern from the long-term time series.

Table 2. Prediction results (area-average RMSE and ACC) on the South China Sea SSS dataset with different numbf and sizef.

For the RNN module, the skip length s in the IGRU layer is chosen from {3, 5, 7, 9}, and the number of nodes in the GRU layers numbg is chosen from {25, 50, 75}. Table 3 shows the results on the SSS dataset with different s and numbg. The best performance occurs when numbg = 50 and s = 7. The reason may be similar: there are enough nodes to process the input information, and with s = 7 each skip connection in the IGRU layer spans seven hidden units, which corresponds to one week in real life; that is, our model suggests that the SSS value of the current day is most strongly related to the SSS value one week earlier.

Table 3. Prediction results (area-average RMSE and ACC) on the South China Sea SSS dataset with different numbg and s.

To test the prediction ability of our DPG model, we compare it with the traditional time series prediction model ARIMA; the basic machine-learning models SVR, ANN, and simple RNN; and a recent deep-learning time series model, the temporal convolutional network (TCN). All five baseline methods are tested on our South China Sea SSS dataset, and their performances are shown in Table 4.

Table 4. Prediction results (area-average RMSE and ACC) of different models on the South China Sea SSS dataset (5°–23°N, 105°–123°E). Here the boldface type indicates the best results across methods.

For ARIMA, we set p = 1, q = 1, and d = 1. For SVR, we use the radial basis function (RBF) kernel and perform the experiments with the scikit-learn software; the kernel width for RBF is set as σ = 1.2, chosen by cross validation. For both ANN and simple RNN, a three-tier architecture is selected (input layer, hidden layer, and output layer), MSE is taken as the loss function, and Adam is used for optimization. For TCN, we set the number of filters numbf = 32 and the kernel size sizek = 3.
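
As a hedged sketch of two of these baselines (`train_series`, `X_train`, `y_train`, and `X_test` are assumed arrays from the earlier rolling-window and splitting steps; mapping σ to scikit-learn's gamma via gamma = 1/(2σ²) is our assumption about the parameterization):

```python
from sklearn.svm import SVR
from statsmodels.tsa.arima.model import ARIMA

# ARIMA baseline with (p, d, q) = (1, 1, 1) as stated above;
# `train_series` is an assumed 1D array of SSS values for one grid point.
arima = ARIMA(train_series, order=(1, 1, 1)).fit()
arima_forecast = arima.forecast(steps=5)

# SVR baseline: scikit-learn's RBF kernel is exp(-gamma * ||x - x'||^2),
# so a kernel width sigma = 1.2 corresponds to gamma = 1 / (2 * sigma^2)
# under the common convention (an assumption on our part).
svr = SVR(kernel="rbf", gamma=1.0 / (2 * 1.2 ** 2))
svr.fit(X_train.reshape(len(X_train), -1), y_train)   # flatten windows to features
svr_pred = svr.predict(X_test.reshape(len(X_test), -1))
```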

We make predictions for 1 day, 5 days, and 2 weeks (14 days) into the future to verify the DPG model for short-term, midterm, and long-term prediction, respectively. As Table 4 shows, the DPG model achieves the best prediction performance in every case.

To show the prediction results of the DPG model more intuitively, we visualize the 5-day prediction and the ground truth in Fig. 4. There is a high similarity between the predicted values shown in Fig. 4a and the actual values shown in Fig. 4b. Moreover, we use a grayscale image to show the difference between the predicted and actual values in Fig. 4c. The biggest differences are concentrated southwest of Taiwan, because the SSS in that area changes quite rapidly and is affected by many factors; it is therefore difficult to make accurate predictions from the historical data of these regions.

Fig. 4. Comparison between the predicted output and the ground truth for 5 days on the South China Sea SSS dataset: (a) predicted SSS data for 5 days, (b) ground-truth SSS data for 5 days, and (c) the difference between the ground truth and the predicted output.

These results are obtained by extracting both the overall long-term pattern and the local short-term pattern of the time series: the long-term pattern is learned by the skip-connected IGRU path, while the local short-term pattern is captured by the CNN module together with the GGRU path. Since SSS values change little over the whole year, this slow variation can be learned along both paths: the GRU layers retain long-term patterns while forgetting unimportant features, and the CNN catches features occurring over short time spans.

From this point of view, it is not surprising that the accuracy reaches 99.29%, 98.44%, and 96.85% in predicting the coming 1, 5, and 14 days, respectively. It remains a challenge, however, to predict further ahead, such as 30 days. Moreover, when the season changes, SSS values become unstable over very short time spans and do not follow a regular pattern; methods that learn such very short-term changes from small-scale data still need to be designed.

We have also tried the data splitting ratios 8:1:1 and 6:2:2. As shown in Figs. 5 and 6, the 8:1:1 ratio performs slightly better in both RMSE and accuracy.

Fig. 5. Comparing results of RMSE with ratio 8:1:1 (blue line) and 6:2:2 (red line).

Fig. 6. Comparing results of accuracy with ratio 8:1:1 (blue line) and 6:2:2 (red line).

4. Conclusions

In this paper, we proposed a novel deep-learning model for the task of SSS forecasting and explored its optimal parameters through experiments. By combining the strengths of CNN and RNN, the model learns not only the overall long-term pattern of the time series but also the local short-term pattern. Compared with ARIMA, SVR, TCN, and the other baseline models, DPG achieved state-of-the-art results in time series forecasting on the South China Sea SSS dataset.

For future research, there are several promising directions in extending the work. First of all, we can improve our model to predict salinity at different ocean depths by modifying hyperparameters such as the number of model layers and nodes or by introducing densely connected structures (Huang et al. 2017) into the GRU layer. In other words, the task changes from predicting sea surface salinity to predicting upper-ocean salinity, which will be a prediction of the 3D region that combines the temporal and spatial information. After that, we can use the predicted results to infer other variables related to sea salinity, such as freshwater flux and direction of the ocean current. Meanwhile, we can also make plans for future fishery and aquaculture on the basis of the predicted results.

Acknowledgments

This work was supported by National Key Research and Development Program (2018YFC1406204; 2018YFC1406201), National Natural Science Foundation of China (Grants 61873280, 61672033, 61672248, and 61972416), Taishan Scholarship (tsqn201812029), Major projects of the National Natural Science Foundation of China (Grant 41890851), Natural Science Foundation of Shandong Province (ZR2019MF012), Fundamental Research Funds for the Central Universities (18CX02152A and 19CX05003A-6), the Strategic Priority Research Program of the Chinese Academy of Sciences (Grant XDA19060503), and the Chinese Academy of Sciences (Grant ISEE2018PY05).

REFERENCES

Batteen, M. L., C. A. Collins, C. R. Gunderson, and C. S. Nelson, 1995: The effect of salinity on density in the California Current System. J. Geophys. Res., 100, 8733–8749, https://doi.org/10.1029/95JC00424.

Chen, L., B. Alabbadi, C. H. Tan, T. S. Wang, and K. C. Li, 2017: Predicting sea surface salinity using an improved genetic algorithm combining operation tree method. J. Indian Soc. Remote Sens., 45, 699–707, https://doi.org/10.1007/s12524-016-0637-7.

Cho, K., M. B. Van, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio, 2014: Learning phrase representations using RNN encoder–decoder for statistical machine translation. Conf. on Empirical Methods in Natural Language Processing, Doha, Qatar, Association for Computational Linguistics, 1724–1734, https://doi.org/10.3115/v1/D14-1179.

Cronin, M. F., and M. J. McPhaden, 1999: Diurnal cycle of rainfall and surface salinity in the western Pacific warm pool. Geophys. Res. Lett., 26, 3465–3468, https://doi.org/10.1029/1999GL010504.

Gabarró, C., J. Font, A. Camps, M. Vall-llossera, and A. Julià, 2004: A new empirical model of sea surface microwave emissivity for salinity remote sensing. Geophys. Res. Lett., 31, L01309, https://doi.org/10.1029/2003GL018964.

Hasson, A. E., T. Delcroix, and R. Dussin, 2013: An assessment of the mixed layer salinity budget in the tropical Pacific Ocean. Observations and modelling (1990–2009). Ocean Dyn., 63, 179–194, https://doi.org/10.1007/s10236-013-0596-2.

He, K., X. Zhang, S. Ren, and J. Sun, 2016: Deep residual learning for image recognition. Proc. Int. Conf. on Computer Vision and Pattern Recognition, Las Vegas, NV, Institute of Electrical and Electronics Engineers, 770–778, https://doi.org/10.1109/CVPR.2016.90.

Hochreiter, S., and J. Schmidhuber, 1997: Long short-term memory. Neural Comput., 9, 1735–1780, https://doi.org/10.1162/neco.1997.9.8.1735.

Huang, G., Z. Liu, and K. Q. Weinberger, 2017: Densely connected convolutional networks. Proc. Int. Conf. on Computer Vision and Pattern Recognition, Honolulu, HI, Institute of Electrical and Electronics Engineers, 4700–4708, https://doi.org/10.1109/CVPR.2017.243.

Jiang, G., and Coauthors, 2018: A deep learning algorithm of neural network for the parameterization of typhoon-ocean feedback in typhoon forecast models. Geophys. Res. Lett., 45, 3706–3716, https://doi.org/10.1002/2018GL077004.

Kingma, D. P., and J. Ba, 2015: Adam: A method for stochastic optimization. Third Int. Conf. on Learning Representations, San Diego, CA, International Conference on Learning Representations, 11.

Krizhevsky, A., I. Sutskever, and G. E. Hinton, 2012: ImageNet classification with deep convolutional neural networks. Commun. ACM, 60, 1097–1105, https://doi.org/10.1145/3065386.

Lu, H., and Coauthors, 2018: FDCNet: Filtering deep convolutional network for marine organism classification. Multimedia Tools Appl., 77, 21847–21860, https://doi.org/10.1007/s11042-017-4585-1.

Phaisangittisagul, E., 2016: An analysis of the regularization between L2 and dropout in single hidden layer neural network. Proc. Int. Conf. on Intelligent Systems, Modelling and Simulation, Bangkok, Thailand, Institute of Electrical and Electronics Engineers, 174–179, https://doi.org/10.1109/ISMS.2016.14.

Qing, S., J. Zhang, T. Cui, and Y. Bao, 2013: Retrieval of sea surface salinity with MERIS and MODIS data in the Bohai Sea. Remote Sens. Environ., 136, 117–125, https://doi.org/10.1016/j.rse.2013.04.016.

Ratto, C. R., and Coauthors, 2019: OceanGAN: A deep learning alternative to physics-based ocean rendering. Int. Conf. on Computer Graphics and Interactive Techniques, Los Angeles, CA, Association for Computing Machinery, 89, https://dl.acm.org/doi/abs/10.1145/3306214.3338559.

Reul, N., J. Tenerelli, B. Chapron, D. Vandemark, Y. Quilfen, and Y. Kerr, 2012: SMOS satellite L-band radiometer: A new capability for ocean surface remote sensing in hurricanes. J. Geophys. Res., 117, C02006, https://doi.org/10.1029/2011JC007474.

Siahkoohi, A., and Coauthors, 2019: Deep-learning based ocean bottom seismic wavefield recovery. SEG Tech. Prog. Exp. Abstr., 2232–2237, https://doi.org/10.1190/segam2019-3216632.1.

Srivastava, N., G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, 2014: Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res., 15, 1929–1958.

Urquhart, E. A., B. F. Zaitchik, M. J. Hoffman, S. D. Guikema, and E. F. Geiger, 2012: Remotely sensed estimates of surface salinity in the Chesapeake Bay: A statistical approach. Remote Sens. Environ., 123, 522–531, https://doi.org/10.1016/j.rse.2012.04.008.

Zhang, Q., H. Wang, J. Dong, G. Zhong, and X. Sun, 2017: Prediction of sea surface temperature using long short-term memory. IEEE Geosci. Remote Sens. Lett., 14, 1745–1749, https://doi.org/10.1109/LGRS.2017.2733548.

© 2020 American Meteorological Society. For information regarding reuse of this content and general copyright information, consult the AMS Copyright Policy (www.ametsoc.org/PUBSReuseLicenses).