## 1. Introduction

The emergence of artificial neural network (ANN) technology from a confluence of scientific and nonscientific disciplines has produced a powerful research tool with applications cutting across disciplines. In the geosciences, the ANN technique has enjoyed growing patronage in generating precise information on many problems involving control, prediction, inversion, classification, pattern recognition, and data compression, which were hitherto solved with mathematical and statistical techniques of limited accuracy. Information generated by the ANN procedure is free from the known limitations of preexisting, empirically based modeling tools because the ANN technique does not need mathematical equations to guide its operation. Instead, the ANN extracts information directly from a series of input–output data pairs without any prior assumptions about their nature and interrelations (Mandal et al. 2009; Moghaddam et al. 2010; Spichak et al. 2011). To make reliable predictions on complicated problems, the ANN method requires, in place of mathematical equations and other empirical relations, a large volume of input–output data pairs with which to train the network (Ali Akcayol and Cinar 2005).

Currently, the ANN technique is the most popular artificial learning tool in the geosciences, with applications including automatic seismic wave arrival time picking (Dai and Macbeth 1994; Gentili and Michelini 2006), subsurface temperature estimation and extrapolation (Spichak and Zakharova 2009; Spichak et al. 2007; Spichak et al. 2011), contaminant concentration and distribution (Gemitzi et al. 2009; Al-Mahallawi et al. 2012), resistivity modeling and inversion (Calderón-Macías et al. 2000; Manoj and Nagarajan 2003; Singh et al. 2005; Maiti et al. 2011), reservoir property estimation and characterization (Calderón-Macías et al. 2000; Aminian and Ameri 2005), lithologic boundary discrimination (Maiti and Tiwari 2009, 2010), disaster forecasting (Marzban and Stumpf 1996; Marzban 2000; Böse et al. 2008), weather prediction (Hayati and Mohebi 2007; Hayati and Shirvany 2007; Jin et al. 2008), and many others. The awareness created by the capabilities of the ANNs in solving problems with diverse complexity has attracted immense interest from researchers in both applications and model designs.

Neural networks have the capacity to solve a wide range of problems that were hitherto not satisfactorily solved using traditional mathematical or statistical tools, irrespective of the inherent level of complexity of the problem. Feed-forward, multilayer neural networks (FFMNNs), which use function approximation to solve problems, are outstanding on complex problems because of their structural flexibility, good approximation capabilities, and the large pool of available training algorithms (Negnevitsky 2005; Stathakis 2009; Sharma and Chandra 2010). Network architecture and topology are the major factors that influence the accuracy of results, generalization capacity, and training speed (Stathakis 2009; Sharma and Chandra 2010). Stathakis (2009) has documented the procedures for selecting the optimal network topology needed for tackling any problem. They include the traditional trial and error, heuristic search, exhaustive search, and constructive methods, as well as a synergy between neural networks and genetic algorithms that automatically searches for an optimally performing topology based on a novel fitness function.

The conventional trial and error method is the most popular design approach among ANN users. The procedure involves manual selection of a network topology, usually refined by trial and error. Such networks sometimes lack the capacity to yield optimal results, especially if designed by inexperienced users (Stathakis 2009), and they compulsorily require a large volume of training data. Performance optimization is an important consideration when planning a multilayer perceptron (MLP) network that can optimally solve problems; networks with minimal complexity and optimal performance capability are the most desirable for the ANN user community. Automated design of network architecture has recently been introduced and, given its desirability, is becoming very popular (Maiti et al. 2011). A major problem that continues to plague the manual design approach is that all factors that can influence network performance (network topology; the activation function for each node; the optimization method; and the values of the training parameters such as epoch, learning rate, momentum, initial weight adjustment procedure, and so on) must be properly set before such networks can perform optimally. Otherwise, common problems such as overfitting or underfitting of the data, which can lead to poor generalization, should be expected (Marzban 2000). This is contrary to automated design procedures, in which these factors change dynamically during the training process as the network automatically searches for a more satisfactory solution to the problem at hand (Stathakis 2009; Sharma and Chandra 2010).

Motivated by the recent successful application of the ANN procedure to estimate subsurface temperature from MT data and borehole thermograms reported by Spichak et al. (2011) and some of the references therein, we decided to test the method in the Tattapani geothermal field in central India. Our main objective is not to develop a new method but to adapt an existing constructive back-propagation (CBP) neural network algorithm to estimate subsurface temperatures from a small volume of MT data and borehole thermograms. A single-hidden-layer multilayer perceptron feed-forward neural network (SLMLPNN), which works automatically using the principle of random weight initialization, was implemented in the neural network toolbox available in MATLAB (Demuth and Beale 2002). The constructive neural networks (CNNs) belong to a group of algorithms with adaptive structure. Such algorithms normally start solving a problem from the primitive perceptron model and dynamically increase the node population in steps of one until the maximum number computed from the data structure is attained (Ash 1989; Parekh et al. 2000; Sharma and Chandra 2010). This architectural adaptation stops only when one of the preset stopping conditions (the upper-bound number of nodes calculated from the size of the training dataset, the initialization conditions, or the minimum threshold performance) has been attained. In any case, the best optimized solution found is returned.

## 2. A review of some methods of estimating subsurface temperature

One indirect approach is based on the modified Archie's law, which relates the bulk conductivity (σ_{b}) with the conductivity of the saturating fluid (σ_{w}) and the solid rock matrix according to Equation (1) as

$$\sigma_b = a\,\sigma_w\,\phi^{m} S^{n} + \sigma_s, \quad (1)$$

where *a*, *m*, and *n* are formation-dependent constants; φ is the porosity; *S* is the saturation index; and σ_{s} is the conductivity of the rock matrix, which was assumed to be nonexistent in the original Archie's law. The conductivity of the solid phase of a rock matrix (σ_{s}) has been found to be dependent on its absolute temperature *T*, especially under laboratory conditions, according to the semiconductor equation

$$\sigma_s = \sigma_o \exp\!\left(-\frac{E_i}{kT}\right), \quad (2)$$

where σ_{o} is a constant that represents the conductivity at a theoretically infinite temperature, *E*_{i} is the activation energy, and *k* is Boltzmann's constant (Meju 2002). Thus, by combining these two equations, temperature variations inside the Earth's interior can be estimated if the nature of subsurface materials is known. Although this technique is still widely used, especially in shallow subsurface investigations, the absence of requisite information on the level of heterogeneity that prevails inside the deeper layers of Earth, together with contradictions between results observed under laboratory and field conditions, has seriously degraded researchers' confidence in the use of this relationship for estimating subsurface temperatures at greater depths.
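Taken together, the modified Archie's law and the semiconductor equation above give a simple numerical recipe: subtract the fluid contribution from the bulk conductivity to isolate σ_s, then invert the semiconductor relation for absolute temperature. A minimal sketch in Python (all parameter values are hypothetical placeholders, not values from this study):

```python
import math

# Boltzmann's constant in eV/K, so the activation energy E_i is given in eV
K_BOLTZMANN = 8.617e-5

def matrix_conductivity(sigma_b, sigma_w, a, phi, m, S, n):
    """Isolate the rock-matrix term sigma_s from the modified Archie's law:
    sigma_b = a * sigma_w * phi**m * S**n + sigma_s."""
    return sigma_b - a * sigma_w * phi**m * S**n

def temperature_from_sigma_s(sigma_s, sigma_o, E_i):
    """Invert the semiconductor equation sigma_s = sigma_o * exp(-E_i/(k*T))
    for the absolute temperature T (in kelvin)."""
    return E_i / (K_BOLTZMANN * math.log(sigma_o / sigma_s))

# Round-trip check with hypothetical parameters
T_true = 400.0                      # K
sigma_o, E_i = 1.0e3, 0.7           # S/m and eV (placeholders)
sigma_s = sigma_o * math.exp(-E_i / (K_BOLTZMANN * T_true))
sigma_w, a, phi, m, S, n = 3.0, 1.0, 0.1, 2.0, 1.0, 2.0
sigma_b = a * sigma_w * phi**m * S**n + sigma_s

sigma_s_rec = matrix_conductivity(sigma_b, sigma_w, a, phi, m, S, n)
T_rec = temperature_from_sigma_s(sigma_s_rec, sigma_o, E_i)
```

Note that the subtraction step assumes the fluid term is known independently; in practice this is exactly where the heterogeneity problems mentioned above enter.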

The temperature dependence of rock resistivity has also been exploited through empirical relations of the general form

$$\rho_T = \frac{\rho_o}{1 + \alpha(T - T_o) + \beta(T - T_o)^2}, \quad (3)$$

where *ρ*_{T} and *ρ*_{o} are the resistivities at temperature *T* and some reference temperature *T*_{o}. The empirical constants α and *β* are also defined at temperature *T*_{o}.

Temperature at depth can likewise be estimated from surface heat flow measurements using the steady-state conductive heat flow equation

$$T_Z = T_S + \frac{Q_S}{k}Z - \frac{A_S}{2k}Z^2, \quad (4)$$

where *T*_{Z} is the temperature (°C) at depth *Z* (km), *T*_{S} is the surface temperature (°C), *Q*_{S} is the surface heat flux, *k* is the thermal conductivity of the Earth materials, and *A*_{S} is the rate of radioactive heat production. Insufficient knowledge about the nature of the materials inside Earth's interior and the associated prevailing extent of heterogeneity has not permitted reliable estimates to be made. In some cases, an absolute lack of correlation between results obtained under laboratory and field conditions has degraded researchers' confidence in the workability of these assumption-laden relationships.
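The heat flow relation above is straightforward to evaluate numerically. The sketch below uses the area's reported average heat flux of about 190 mW m^{−2}, but the thermal conductivity and heat production values are illustrative assumptions, not measurements from this study:

```python
def geotherm_temperature(z, T_s, Q_s, k, A_s):
    """Steady-state conductive geotherm with uniform radioactive heat
    production: T(z) = T_s + (Q_s/k)*z - (A_s/(2k))*z**2.
    SI units: z in m, Q_s in W m^-2, k in W m^-1 K^-1, A_s in W m^-3."""
    return T_s + (Q_s / k) * z - (A_s / (2.0 * k)) * z**2

# Illustrative values only: surface temperature 25 degC, the area's average
# heat flux of 190 mW m^-2, an assumed k = 2.5 W m^-1 K^-1, and a
# hypothetical heat production of 2.0e-6 W m^-3.
T_500 = geotherm_temperature(500.0, 25.0, 0.190, 2.5, 2.0e-6)
```

With these placeholder values the predicted temperature at 500 m is about 63°C, well short of the 112°C actually observed at that depth in the borehole records, which illustrates why such assumption-laden relations perform poorly in anomalous geothermal areas.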

Spichak et al. (2007, 2011) and Spichak and Zakharova (2009) reported an entirely new approach to estimating subsurface temperatures from magnetotelluric (MT) data and adjacent borehole (BH) thermograms. Their procedure was based on an ANN trained on the correspondence between the MT data and the BH thermograms. An ANN is a functional imitation of the fascinating and highly successful processing powers of the natural biological neuron. Just like biological neurons, ANNs have the capability to accept multiple inputs from several sources and to integrate and simultaneously process all the inputs using their parallel processing capabilities (Bas and Boyaci 2007; Maiti and Tiwari 2010). ANNs are nonlinear and entail a completely data-driven, modern modeling procedure that does not require any initial model to stand on. The ANN is now an efficient tool for modeling complex and nonlinear geophysical problems that cannot be successfully handled by conventional empirical equations (Maiti et al. 2007, 2011). Estimates made by ANNs are free from the problems associated with using empirical relations, since ANNs do not require any mathematical equation to work with but rather extract information directly from sets of input–output pairs of examples (in the case of supervised learning) (Maiti et al. 2007; Mandal et al. 2009; Maiti and Tiwari 2010; Maiti et al. 2011).

## 3. Physiography of the study area

The Tattapani geothermal field is located in the Surguja district, Chhattisgarh, central India. It is located at the junction where the Satpura and the Mahanadi mobile belts meet the east-northeast–west-southwest trending megatectonic Son–Narmada–Tapti (SONATA) lineament zone. Thermal manifestations in the area are in the form of hot-water springs discharging hot water at temperatures of 50°–98.5°C, with siliceous sinter deposited at the base of the springs (Shanker et al. 1987; Jain et al. 1995). All these manifestations occur in marshy grounds and hydrothermally altered clay zones, covering an area of about 0.1 km^{2} (Shanker et al. 1987).

The study area falls within the southern margin of Tattapani–Ramkola coalfield, at the contact of Archaean rocks with the Lower Gondwana rocks. The thermal activity is controlled by the east-northeast–west-southwest Tattapani fault and other northeast–southwest trending cross faults. The Tattapani fault separates the Archaean rocks exposed in the southern side from the lower Gondwana group exposed toward the northern and northwestern parts of the fault (see Figure 1). The Proterozoic basement rocks comprise gneisses, diorites, biotite schists, actinolite–tremolite schists, kyanite–sillimanite schists, granulites, amphibolite bands of phyllites, quartzites, and graphites in some places while the lower Gondwana group consists of sandstones and shales. Effects of shearing are clearly manifested in the rocks near the main Tattapani fault. Most of the numerous hot springs present in the area are located along the Tattapani fault (Sarolkar and Das 2006).

## 4. MT and borehole temperature data acquisition and analysis

MT data were acquired in the study area over the frequency range of 10^{3} to 10^{−3} Hz between 1998 and 2000 by the Magnetotelluric Division of the National Geophysical Research Institute (NGRI) of India (Harinarayana et al. 2000). The raw field data were filtered and processed to remove noise and stored in electronic data interchange (EDI) formats using the magnetotelluric processing (MAPROS) code. Robust processing techniques were used to generate estimates of apparent resistivity and phase versus frequency from the raw time series data. Static shift and other dimensionality indicator analyses were performed by the MT Division of NGRI; details can be found in Harinarayana et al. (2000) and Veeraswamy and Harinarayana (2006). Despite these data treatments, all the datasets were visually checked for problems associated with static shift by directly checking for consistency and correlating adjacent datasets. The lack of static shift problems in the curves that were finally used in this study (see Figure 2) is in accordance with the findings of Harinarayana et al. (2000). In spite of the seeming lack of static shift problems deduced from this qualitative assessment, the data were still subjected to quantitative dimensionality assessment, and any data observed to be under the influence of 3D structures were summarily dropped. Dimensionality assessment of the data was performed using the rotationally invariant amplitude skewness (Skew_{S}) parameter of Swift (1967), computed from the impedance tensor elements as

$$\mathrm{Skew}_S = \frac{|Z_{xx} + Z_{yy}|}{|Z_{xy} - Z_{yx}|}. \quad (5)$$

According to Swift (1967), the Skew_{S} parameter should ideally be 0 for 1D and 2D structures but, in environments where the subsurface is dominated by 3D structures, Skew_{S} will be greater than zero (Rybin et al. 2008). Results are shown in Figure 3, where it can be seen that for frequencies above 1 Hz the Skew_{S} values are well below 0.1 (the dashed line in Figure 3), which is suggestive of a subsurface dominated by 1D/2D structures. Skew_{S} values observed for frequencies below 1 Hz were generally greater than 0.1, suggesting that 3D structures dominate the subsurface at these depths. Based on the results of earlier MT studies of the area (Harinarayana et al. 2000), regional tectonics, and the observed Skew_{S} values, the shallow subsurface was assumed to be predominantly 1D. The vertical magnetic field (*H*_{z}) records that were observed to be noisy were also dropped.
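The Swift skew is computed directly from the four impedance tensor elements at each frequency; a small sketch (the impedance values below are arbitrary illustrative numbers, not measured data):

```python
def swift_skew(Zxx, Zxy, Zyx, Zyy):
    """Swift (1967) rotationally invariant amplitude skew:
    Skew_S = |Zxx + Zyy| / |Zxy - Zyx|."""
    return abs(Zxx + Zyy) / abs(Zxy - Zyx)

# For an ideal 1D impedance tensor the diagonal elements vanish
# (Zxx = Zyy = 0), so the skew is exactly zero.
skew_1d = swift_skew(0j, 5 - 3j, -5 + 3j, 0j)

# A tensor with significant diagonal elements yields a larger skew,
# flagging possible 3D effects (threshold 0.1 used in this study).
skew_3d = swift_skew(1 + 1j, 5 - 3j, -5 + 3j, 1 - 0.5j)
```

In practice the skew is evaluated per frequency, so a sounding can be accepted at high frequencies (shallow depths) and rejected at low frequencies, exactly as described above.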

Some MT data were randomly selected and some of the data processing procedures already conducted by the MT data processing unit of NGRI were repeated. The repetition was done in order to compare the two sets of results and consequently develop confidence in the other datasets. The data processing steps that we repeated included robust single-station processing to downweight or discard outliers of impedance estimates at each frequency band of measurement (Sutarno and Vozoff 1989; Jupp and Vozoff 1997); transforming the data from the time domain to the frequency domain using the fast Fourier transform algorithm; and computing cross correlations between pairs of mutually orthogonal magnetic (*H*) and electric (*E*) field intensities measured along the *x* and *y* directions (*H*_{x}–*E*_{y} and *H*_{y}–*E*_{x}, respectively). In some cases, spectral smoothing and stacking were performed for impedance estimation. Data quality was generally good at many stations, except within the frequency range of 0.1–5 Hz, where the intensity of the natural electromagnetic signals is usually weak compared to other frequencies in the spectrum (Simpson and Bahr 2005).

One-dimensional modeling of the data was performed using the Geotools software package (Geotools 1997) where the Occam linearized inversion scheme (Constable et al. 1987) and the Marquardt inversion scheme (Marquardt 1963) were implemented. The procedure is simple and fast, and it generated the vertical distribution of subsurface resistivity of an assumed horizontally layered Earth as a function of depth (Harinarayana et al. 2004). The layered model obtained from the Occam inversion along with qualitative study results and geological information of the area were used to assume an initial model for Marquardt inversion. Figure 4 shows the 1D model of subsurface variation of electrical resistivity with depth obtained at all the MT sites.

Ten borehole thermograms recorded within the shallow depth range of 100–500 m in the study area were acquired from the published work of Shanker et al. (1987). Most of the selected temperature recording stations were located in close proximity to the MT stations (Figure 1b). The resistivity and temperature variation with depth curves (Figure 5) were digitized at a 10-m interval. The maximum temperature observed from the temperature records was 112°C at 500-m depth, although higher temperature values have been reported in the same area at shallower depths (Sarolkar and Das 2006). Heat flow measurements have been conducted in the area by researchers such as Shanker et al. (1987) and Jain et al. (1995), and results show that peak heat fluxes are as high as 300 mW m^{−2}. The average heat flux is about 190 ± 50 mW m^{−2}, which is much higher than the global average value of 60 mW m^{−2} (Jain et al. 1995).
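Digitizing the curves at a common 10-m depth interval amounts to linear interpolation onto a regular depth grid; a sketch with invented sample points (not the actual borehole picks):

```python
import numpy as np

def resample_to_grid(depths, values, dz=10.0):
    """Linearly interpolate an irregularly digitized log onto a regular
    depth grid with spacing dz (10 m in this study)."""
    grid = np.arange(depths[0], depths[-1] + dz / 2, dz)
    return grid, np.interp(grid, depths, values)

# Hypothetical borehole temperature picks (depth in m, T in degC)
depths = np.array([100.0, 180.0, 260.0, 380.0, 500.0])
temps = np.array([40.0, 55.0, 68.0, 90.0, 112.0])
grid, t_grid = resample_to_grid(depths, temps)
```

Resampling both the resistivity and temperature curves onto the same grid is what makes the depth-by-depth pairing of the two data types possible in the first place.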

## 5. An overview of the CNN algorithms

The constructive neural network algorithms, unlike conventional fixed-structure networks, have the inbuilt capacity to modify the network structure during training in order to cope with the challenges posed by complex problems. Excellent reviews on the types and advantages of using CNN algorithms in solving different problems, irrespective of complexity, can be found in the works of Kwok and Yeung (1997), Parekh et al. (2000), Islam and Murase (2001), and Sharma and Chandra (2010). The CNNs usually start solving problems with a minimum network topology (minimum number of layers, nodes in the hidden layer, and connections) and dynamically increase the node, layer, and connection population by unity as may be required before a problem is optimally solved. The cascade-correlation algorithm (CCA), dynamic node creation (DNC), and hybrid algorithms are famous members of the CNN family (Islam and Murase 2001; Sharma and Chandra 2010).

The DNC algorithm (Ash 1989) is an efficient member of the CNN family and was introduced to overcome the problems associated with learning using the back-propagation algorithm, including the slow learning rate and rigid topology, the step size problem, the local minima problem, and the moving target problem. The DNC algorithm originally constructs a single-hidden-layer artificial neural network with zero nodes in the hidden layer (the naive model); tests the naive network on the problem; and, in the event of failure, automatically adds one node (or batch of nodes) to the existing network and restarts the tests all over. Once an optimum number of nodes in the hidden layer has been found, the hidden layer is connected to the output layer through the output-side weights. Advantages of using the DNC algorithm include a rapid learning rate, a self-modified network size and topology, and the ability to retain its inbuilt structures even when there is a change in the training dataset. Furthermore, it does not require back propagation of failure signals. The original DNC algorithm has undergone several modifications because it was observed to have difficulty in learning complex problems (Islam and Murase 2001).

The CBP algorithm, for instance, is a modified version of the DNC algorithm introduced by Lehtokangas (1999). CBP is fast becoming a very popular tool because, apart from enjoying all the benefits attributed to the DNC, it has a simple implementation procedure. The CBP also has the advantage of being suitable for use in fixed-size networks, as well as a documented ability to utilize stochastic optimization routines (Sharma and Chandra 2010). The CBP is usually designed with only one hidden layer, and the traditional method of back propagating the error signal is retained, thereby creating a single-layer, feed-forward neural network (SLFFNN). These advantages increase the generalization potential of the SLFFNN with minimal network architecture and thus influenced our choice in the present study.

## 6. Data pairing, scaling, and model parameterization

Testing of the network was performed using two different datasets: 1) temperature variations with depth derived from borehole thermograms and 2) resistivity variations with depth obtained from MT records. The procedure requires the data pairs (subsurface resistivity variations and the corresponding borehole temperatures) to be from the same depth (Spichak and Zakharova 2009; Spichak et al. 2011) and close to each other. Consequently, nearby MT and BH stations were paired as shown in Figure 1b. The distance between the MT–BH pairs was variable but generally less than 2 km. The option of using closely paired MT–BH data was necessitated by the desire to minimize any extraneous effect that a wide MT–BH separation could have introduced into the performance of the network. According to Spichak (2006), when the MT–BH pairs are widely spaced, the network can be compelled to work in interpolation mode. A moderately wide MT–BH separation (~3 km) was deliberately introduced into the pairing procedure so that geographic factors could be incorporated into the input data, consequently forcing the network to function in extrapolation mode. The moderately widely spaced set of MT–BH data pairs was included in order to test the function-approximating powers of the network once an optimum architecture had been found (Whiteson and Stone 2006).
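The pairing of nearby MT and BH stations can be sketched as a nearest-neighbour match with a separation cutoff. The station coordinates below are invented for illustration, and planar coordinates in km are assumed in place of geographic ones:

```python
import math

def pair_stations(mt_sites, bh_sites, max_sep_km=2.0):
    """Pair each borehole with its nearest MT site, keeping only pairs
    whose separation is below max_sep_km. Coordinates are assumed to be
    in km on a local planar grid (a simplification of longitude/latitude)."""
    pairs = []
    for bh_name, bx, by in bh_sites:
        best = min(mt_sites, key=lambda s: math.hypot(s[1] - bx, s[2] - by))
        d = math.hypot(best[1] - bx, best[2] - by)
        if d <= max_sep_km:
            pairs.append((best[0], bh_name, d))
    return pairs

# Invented coordinates; only the station names echo those used in the study
mt = [("MT7", 0.0, 0.0), ("MT11", 5.0, 0.0)]
bh = [("BH10", 1.2, 0.5), ("BH30", 5.8, 0.9), ("BH99", 20.0, 20.0)]
pairs = pair_stations(mt, bh)
```

Relaxing `max_sep_km` to about 3 km is the mechanism by which the moderately wide pairs described above would be admitted.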

All input data *X*_{i} were converted to standard scores *Z*_{s} using the mean *μ* and standard deviation σ of the entire dataset (Larsen and Marx 2000; Carroll and Carroll 2002) according to Equation (6),

$$Z_s = \frac{X_i - \mu}{\sigma}, \quad (6)$$

before they were partitioned into the three mutually exclusive subsets of training, validation, and test sets. The standardization process was performed to ensure that all inputs were scaled down to values that vary between −1 and +1. This procedure is a standard precautionary measure adopted to ensure that the wide variations in the original values of the input dataset do not force the network into saturation while processing the data (Maiti and Tiwari 2010). The "prestd.m" code available in the neural network toolbox of MATLAB was used to achieve this. The training subset was further preprocessed using the "prepca.m" code in order to filter out redundant data in the training dataset. Additionally, the validation and test datasets were preprocessed using the "trastd.m" code.
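The standardization step, and the reuse of the training statistics on the validation and test sets (the role played by MATLAB's prestd.m and its companion routine that applies stored statistics to new data), can be sketched as:

```python
import numpy as np

def prestd(X):
    """Standardize each column to zero mean and unit standard deviation
    (Equation 6), returning the statistics for later reuse."""
    mu, sigma = X.mean(axis=0), X.std(axis=0, ddof=0)
    return (X - mu) / sigma, mu, sigma

def apply_std(X_new, mu, sigma):
    """Apply the training-set mean and standard deviation to new data so
    that validation/test sets are scaled consistently with training."""
    return (X_new - mu) / sigma

rng = np.random.default_rng(0)
X_train = rng.normal(50.0, 10.0, size=(100, 4))   # synthetic inputs
Z_train, mu, sigma = prestd(X_train)
Z_test = apply_std(rng.normal(50.0, 10.0, size=(20, 4)), mu, sigma)
```

Scaling the held-out sets with the *training* statistics, rather than their own, is what keeps the three subsets mutually consistent without leaking information from test to training.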

Partitioning the data into three mutually exclusive blocks was necessary in order to guard against the problem of data overfitting associated with back-propagation algorithms. Overfitting can occur if the network "overlearns" and consequently "memorizes" the training dataset such that it cannot generalize well when confronted with new datasets. Demuth and Beale (2002), Maiti et al. (2007, 2011), Maiti and Tiwari (2010), and many other researchers have described how the early stopping technique can improve the generalization capability of a network by eliminating overfitting problems. With the early stopping method, a threshold learning condition can be set under which the network stops the learning process. The early stopping procedure ensures that the network stops learning once the best-fitting model, beyond which the network cannot achieve any reasonable decrease in the error function, is attained. In practice, this is done by randomizing all input data and partitioning them into three mutually exclusive subsets, as mentioned above. Usually a minimum of 50% of all data in the database must be reserved for training, during which the network builds the architecture needed for solving the problem and learns how to solve the problem at hand (Demuth and Beale 2002; Maiti et al. 2007). The output of the training set is used by the network to optimize the number of neurons in the hidden layers, develop and appropriately adjust the connection weights of the neurons, and consequently boost its predictive potential. Consequently, we set aside 52% (174 samples) [and later 61% (203 samples)] of all data in the database exclusively for training. The remaining data in the database were shared at 65% for validation and 35% for testing.
The validation set was used by the network to fine-tune its topology with a view to sharpening its predictive powers and clearing any gray areas in them (Negnevitsky 2005; Larochelle et al. 2009). It is not directly used for training but is deployed for continuous monitoring of the error during the training process, such that if the error attains a preset minimum condition, further training of the network stops. This cross-validation scheme prevents the network from memorizing the training dataset, leaving it in a state in which it can make acceptable generalizations (Maiti et al. 2007). The testing dataset, which consists of data from the MT7–BH10 and MT11–BH30 MT–BH data pairs, was either fed into the network in full correspondence with a training dataset of 174 samples (52%) or fed into the network one by one. In the latter option, the remaining parts of the testing dataset were added to the training dataset, thereby increasing the share of the training dataset to 61%.
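The partitioning scheme can be sketched as follows; the total sample count is chosen so that 52% corresponds to the 174 training samples mentioned above, and the shuffling seed is arbitrary:

```python
import random

def partition(data, train_frac=0.52, val_frac_of_rest=0.65, seed=42):
    """Randomize the samples and split them into mutually exclusive
    training, validation, and test subsets (52% train; the remainder
    shared 65/35 between validation and test, as in the text)."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_frac * len(shuffled))
    rest = shuffled[n_train:]
    n_val = round(val_frac_of_rest * len(rest))
    return shuffled[:n_train], rest[:n_val], rest[n_val:]

samples = list(range(334))   # hypothetical count: 52% of 334 gives 174
train, val, test = partition(samples)
```

Because the split is done after shuffling, the three subsets are mutually exclusive by construction, which is the property the early stopping scheme depends on.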

The threshold upper performance benchmark was fixed at 10%, and the initialization conditions, consisting of 35 trials for each of the randomly generated numbers (RGNs), were set as variables. The network was set into training mode by using a conditional statement to search for the best-fitting initialization conditions [number of trials (Ntrials) and the preset random number] for a particular number of nodes in the hidden layer. These initialization conditions were used by the network to test the adequacy of a range of node counts (*N*) in the hidden layer (in our data, *N* varies from 0 to 9) for optimal performance. For each initialization condition and number of nodes in the hidden layer, the network performance was rated. In the event of the performance index satisfying the preset threshold performance target, the training process terminated and results were displayed in both tabular and graphical forms. Besides the computed performance rating, visual inspection of the graphically presented results was also useful in qualitatively assessing the performance of the network.
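The search loop described above (hidden-layer sizes 0–9, up to 35 trials per random number, a 10% performance target) can be sketched as follows. For brevity, the stand-in trainer fixes random hidden weights and fits only the output layer by least squares; this is a simplification for illustration, not the CBP training actually used in the study:

```python
import numpy as np

def logsig(x):
    return 1.0 / (1.0 + np.exp(-x))

def fit_and_score(X, y, n_hidden, seed):
    """Stand-in trainer: random hidden weights (log-sigmoid nodes) and a
    least-squares linear output layer; returns the relative error in %."""
    rng = np.random.default_rng(seed)
    if n_hidden == 0:                                # naive (linear) model
        H = np.hstack([X, np.ones((len(X), 1))])
    else:
        W = rng.normal(size=(X.shape[1], n_hidden))
        H = np.hstack([logsig(X @ W), np.ones((len(X), 1))])
    w_out, *_ = np.linalg.lstsq(H, y, rcond=None)
    y_hat = H @ w_out
    return 100.0 * np.mean(np.abs((y - y_hat) / y))

def constructive_search(X, y, max_nodes=9, n_trials=35, target=10.0):
    """Try hidden-layer sizes 0..max_nodes with n_trials random
    initializations each; stop early once the target (%) is met."""
    best = (np.inf, None, None)                      # (error, nodes, seed)
    for n_hidden in range(max_nodes + 1):
        for seed in range(n_trials):
            err = fit_and_score(X, y, n_hidden, seed)
            if err < best[0]:
                best = (err, n_hidden, seed)
            if best[0] <= target:
                return best
    return best

# Synthetic nonlinear problem standing in for the MT-temperature data
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(200, 4))
y = 50.0 + 20.0 * np.tanh(X[:, 0] + 0.5 * X[:, 1]) + 5.0 * X[:, 2] ** 2
err, nodes, seed = constructive_search(X, y)
```

Whatever trainer is substituted for `fit_and_score`, the surrounding loop structure (nodes outer, random restarts inner, early exit on the threshold) is the part that corresponds to the procedure described in the text.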

## 7. Network design and adaptation

The neural network toolbox package in MATLAB (Demuth and Beale 2002) was used in designing a structurally flexible SLFFNN. The network was designed to take in four inputs and automatically search for the best performing network topology to solve the problem for each given set of initialization conditions (Ntrials and random number). A generalized form of the SLFFNN code that was modified to fit our problem can be found online (at http://www.mathworks.com/matlabcentral/newsreader/view_thread/308972). The various steps that the SLFFNN code used in executing the commands have been described by Negnevitsky (2005) and Sharma and Chandra (2010). The original SLFFNN algorithm was modified to automatically search for the best initialization condition in a simple loop pattern, using a preset maximum Ntrials of 35 for each random number, so that the best performing random number and the corresponding Ntrials cycle can be selected at the end of the cycle. Thus, depending on the intended accuracy, the search for a better performing random number candidate can be continued so long as the preset threshold performance target has not been attained. Standard performance rating tools such as relative error (ɛ), adjusted coefficient of determination (R^{2}a), absolute average deviation (AAD), root-mean-square error (RMSE), and postregression analysis were used to assess the performance of each random number so that, at the end of each round of trials, the best (or near best) performing random number could be selected.

## 8. Methodology of subsurface temperature estimation

This study addresses the problem of designing a structurally flexible network that can be used to estimate subsurface temperatures from a small volume of MT–BH data pairs, a problem that the conventional manually designed MLP networks cannot solve satisfactorily. Consequently, we attempted to solve the problem using both the conventional manually designed MLP and the SLFFNN. A static structured MLP network was designed and tested with the data. The input layer of the manually designed network had five inputs followed by two hidden layers, while the output layer had one fixed neuron. The number of neurons in the first hidden layer was fixed at 20, while the number of neurons in the second hidden layer was 15. The learning rate (α) and momentum (*β*) were kept at 0.01 and 0.9, respectively. The MLP network was trained repeatedly with the goal of attaining a threshold performance benchmark of 5%, using ɛ as the performance indicator.

A structurally flexible, single-layer, feed-forward neural network was also designed. The actual architecture of the intended feed-forward, back-propagation MLP was not predefined before training began. However, the structures of the input and output layers were uniquely known, since they depend on the structure of the input and output data pairs. Thus, the input layer was designed with five neurons, which take in the three position coordinates of the paired borehole and MT locations (the averaged longitudes and latitudes of the MT and borehole positions, and depth) together with the resistivity and temperature values at different depths, while the output layer was designed with one neuron. The single neuron in the output layer was responsible for transmitting all results (temperature) from the network to the outside world.

The net input into the *j*th neuron in a hidden layer (*l*), $\text{net}_j^{(l)}$, is computed as

$$\text{net}_j^{(l)} = \sum_i w_{ij}^{(l)} x_i + b_j^{(l)},$$

where $x_i$ is the input into the *i*th node in hidden layer (*l*) as received from another node in the (*l* − 1)th layer, $w_{ij}^{(l)}$ is the connection weight between the *i*th node in layer (*l* − 1) and the *j*th node in layer (*l*), and $b_j^{(l)}$ is the bias of node *j* in hidden layer *l*. Output signals from the *j*th node in the hidden layer (*l*) were transferred to the (*l* + 1)th layer by a nonlinear log–sigmoid transfer function. The choice of this particular function is influenced by its well-established continuous, differentiable, and monotonically increasing smooth step-function properties, which fit it into the group of functions with the desired properties (Hagan et al. 2002). A linear transfer function, on the other hand, was designated to transmit signals from the network to the outside world. The nonlinear log–sigmoid transfer function ensures that all inputs into (or outputs from) the nodes are fully transformed from their original plus and minus infinity variations to their binary equivalents. For instance, if the net input into node (*j*) in the hidden layer (*l*) is $\text{net}_j$, then $\text{net}_j$ is transformed according to Equation (8) as

$$O_j = f_j\left(\text{net}_j\right),$$

where $f_j$ is the log–sigmoid transfer function, mathematically defined as

$$f(x) = \frac{1}{1 + e^{-x}},$$

where *x* is an input parameter and *e* denotes the base of the natural logarithm (Benaouda et al. 1999; Van der Baan and Jutten 2000; Negnevitsky 2005; Spichak et al. 2011). The linear transfer function does not transform the outputs from the output neurons in any way but rather ensures that they are transmitted directly with their original variations preserved.

The network outputs from node *j* ($O_j$) are compared with the known outputs ($D_j$), and the overall network error *E*, summed over all the output nodes and samples, was internally computed using

$$E = \frac{1}{2} \sum_j \left(D_j - O_j\right)^2.$$

The local error *e* from only the *j*th node in hidden layer (*l*) can be calculated according to Benaouda et al. (1999) and Maiti et al. (2007) as

$$e_j = D_j - O_j.$$

In the network training process, the network minimizes *E* by systematically changing the strengths of the connection weights.
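The forward pass and error measure described above can be sketched in NumPy (an illustrative reimplementation, not the authors' MATLAB code; the weights, inputs, and target value below are arbitrary):

```python
import numpy as np

def logsig(x):
    """Log-sigmoid transfer function f(x) = 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, W_hid, b_hid, W_out, b_out):
    """One forward pass: log-sigmoid hidden layer, linear output layer."""
    net_hidden = W_hid @ x + b_hid      # net input into each hidden node
    o_hidden = logsig(net_hidden)       # nonlinear transform to (0, 1)
    return W_out @ o_hidden + b_out     # linear output, left untransformed

def network_error(outputs, targets):
    """Overall squared error E = 0.5 * sum((D_j - O_j)^2)."""
    return 0.5 * np.sum((targets - outputs) ** 2)

# Five inputs (longitude, latitude, depth, resistivity, temperature value)
# feeding a three-node hidden layer and one output neuron (temperature).
rng = np.random.default_rng(0)
x = rng.normal(size=5)
W_hid, b_hid = rng.normal(size=(3, 5)), rng.normal(size=3)
W_out, b_out = rng.normal(size=(1, 3)), rng.normal(size=1)
y = forward(x, W_hid, b_hid, W_out, b_out)
E = network_error(y, np.array([25.0]))  # target temperature, arbitrary
```

With trained weights, the same forward pass would produce the temperature estimate transmitted by the single output neuron.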

## 9. Network performance evaluation

Reputable statistical performance testing indices were adopted to assess and evaluate the performance of the network. These indices include (i) the adjusted coefficient of determination (R^{2}a), (ii) the relative error (ɛ), (iii) the root-mean-square error (RMSE), (iv) the absolute average deviation (AAD), and (v) posttraining regression analysis.

The adjusted coefficient of determination R^{2}a is a statistical parameter that measures the extent of fit between two variables. If the magnitude of R^{2}a is equal to 1, then the two variables are exactly the same (Demuth and Beale 2002; Sin et al. 2006; Maiti et al. 2007; Moghaddam et al. 2010). The R^{2}a values are computed using the expression

$$R_a^2 = 1 - \frac{\sum_{i=1}^{N} \left(T_{\text{known},i} - T_{\text{predicted},i}\right)^2}{\sum_{i=1}^{N} \left(T_{\text{known},i} - \bar{T}_{\text{known}}\right)^2},$$

where $T_{\text{known},i}$ is the known temperature at a given depth observation point *i*, $T_{\text{predicted},i}$ is the estimated temperature at the same observation point *i*, $\bar{T}_{\text{known}}$ is the mean of the known temperatures, and *N* is the total number of observation data points in the dataset (Gemitzi et al. 2009; Moghaddam et al. 2010). For a given random number, R^{2}a and the other performance tests were calculated for all Ntrials made by the network to find a suitable structure for the network architecture. The standard deviation, mean, median, and variance of all R^{2}a values were also computed and plotted against the number of nodes in the hidden layer (see Figure 6, top and middle). This was done to check the robustness and sensitivity of the predictions made by the network to the different datasets. For the network to be judged as performing well, R^{2}a should be as close to unity as possible.
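A minimal sketch of these indices (using the plain coefficient of determination in place of the adjusted form, whose degrees-of-freedom correction is not reproduced here; the temperature values are illustrative, not data from the study):

```python
import numpy as np

def r_squared(known, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((known - predicted) ** 2)
    ss_tot = np.sum((known - known.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def rmse(known, predicted):
    """Root-mean-square error."""
    return np.sqrt(np.mean((known - predicted) ** 2))

def aad(known, predicted):
    """Absolute average deviation."""
    return np.mean(np.abs(known - predicted))

t_known = np.array([40.0, 55.0, 70.0, 85.0, 100.0])  # borehole temperatures (illustrative)
t_pred = np.array([41.0, 54.0, 71.5, 84.0, 99.0])    # network estimates (illustrative)
```

A value of `r_squared` near unity, together with small `rmse` and `aad`, indicates the close fit between known and predicted temperatures that the text describes.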

Posttraining regression analysis was also performed on the testing dataset to assess how the known (actual) outputs and the predicted network outputs are quantitatively related, formally establishing their reliability limit. Demuth and Beale (2002) pointed out that perfectly estimated network outputs will have a one-to-one correspondence with their target outputs: on graphical analysis, the regression line will have a slope and an R^{2}a value of unity and a zero intercept. The three parameters that we generated (the slope, R^{2}a, and the intercept) are shown in Table 1. The results indicate that the slope (*a*) and the adjusted coefficient of determination (R^{2}a) are close to unity in all cases (the lowest R^{2}a value is 0.979), while the *y* intercept (*b*) is close to 0 (the worst-case value is −2.425). In some cases, the limited data volume affects the generalization capability of the network such that these values are slightly higher or lower.
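A hedged sketch of such a posttraining regression check (`np.polyfit` stands in for whatever fitting routine was actually used; the target and output values are illustrative):

```python
import numpy as np

def posttrain_regression(targets, outputs):
    """Fit outputs = a * targets + b by least squares; a near 1 and b near 0
    indicate a near one-to-one correspondence between outputs and targets."""
    a, b = np.polyfit(targets, outputs, deg=1)
    return a, b

targets = np.array([30.0, 50.0, 70.0, 90.0, 110.0])  # known temperatures (illustrative)
outputs = targets * 1.01 - 0.5                       # nearly perfect predictions (illustrative)
slope, intercept = posttrain_regression(targets, outputs)
```

A slope close to unity and an intercept close to zero correspond to the one-to-one relationship described by Demuth and Beale (2002).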

Table 1. Summary of fluctuations in network performance for different volumes of input data.

## 10. Discussion of results

The performance of an SLFFNN MLP has been optimized and adapted for solving the nonlinear problem of subsurface temperature estimation from a limited volume of MT–BH data pairs using the CBP algorithm. The CBP algorithm bequeaths to the network an adaptable structure, which enables it to automatically adjust its topology to meet the challenges of the problem to be solved. The static-structured, manually designed network performs well during training but fails to generalize well when confronted with the test data (see Figure 7). This observation suggests two possible causes: either the volume of the training dataset was not large enough for proper learning, or the number of hidden neurons that was manually fixed in the network caused it to overlearn and consequently fail to generalize when confronted with the testing dataset. The latter possibility was ruled out because the performance of the network was satisfactory during the validation phase. Although many researchers, including Lawrence (1994), Helle et al. (2001), and Stathakis (2009), have pointed out that there is no firm rule to guide users in selecting an optimal number of hidden layers and neurons, it appears that allowing the network to adjust its topology to cope with the challenges of the problem is the better practice in network design.

Solving problems from any dataset containing a limited volume of data has continued to pose serious challenges to the neural network users' community, since neural networks require a large volume of data pairs from which to learn (see Moghaddam et al. 2010). A structurally flexible network can therefore be deployed to solve a wide range of problems, including those involving limited volumes of data, although more tests need to be performed to ascertain its level of reliability. The automated search for the optimal number of neurons in the hidden layer, which starts from zero (the naive model) with a corresponding increase in R^{2}a values toward unity, suggests that the network is structurally flexible.

Figure 6 (bottom) shows graphical plots of estimated and known subsurface temperatures from the SLFFNN. The network seems to have performed well even with the limited volume of data used in training. The gradual buildup in the number of neurons in the hidden layer and the corresponding performance can be seen in the computed values of R^{2}a plotted in Figure 6 (top). The R^{2}a values approach unity when the maximum number of neurons in the hidden layer computed from the size of the training dataset has been loaded in each case. The graph shows variations in the mean, median, maximum, minimum, and standard deviation of R^{2}a values for different numbers of nodes in the hidden layer. From the statistical assessment of network performance, the lowest value of R^{2}a was 0.979, while the corresponding values of AAD, RMSE, and ɛ were 3.746, 1.467, and 4.09, respectively. This shows that the topology of the network is suitable for making acceptable predictions, as can be seen in the good matching patterns between the predicted and known temperatures from the borehole thermograms (see Figure 6, bottom). The good match observed between the observed and estimated temperature readings, even with the low volume of training data, is suggestive of a network with high approximating capacity, of the kind typical of reinforcement learning in neural networks (Sutton and Barto 1998; Ferrari and Stengel 2005; Zainuddin and Pauline 2008). Therefore, an even better performance level from the random weight initialization and degrees-of-freedom procedure used here can be achieved when the volume of data is large enough to properly train the network.

Changes in subsurface structural geology (e.g., faults, fractures) and hydrogeological conditions are known to adversely affect the normal subsurface conductivity distribution pattern (see Spichak et al. 2011). Consequently, estimates of subsurface conditions made in such heterogeneous environments are likely to be fraught with errors and therefore unreliable (Spichak et al. 2011). Function-approximating procedures of the kind implemented by the SLFFNN could serve as a reliable tool for predicting subsurface conditions in such heterogeneous environments. Neural networks with adaptive structures have been reported to possess high function-approximating capability and consequently high performance ratings (Nicoletti et al. 2009; Sharma and Chandra 2010; Qing-Lai et al. 2010). The ability of the SLFFNN to optimally estimate subsurface temperatures in spite of the constraints imposed by performance-limiting factors, such as the small volume of the training dataset, the geological complexity prevalent at the site, and the sometimes wide separation of the MT–BH data pairs, is indicative of a high-performance network. Such a level of nonlinear problem-solving capability, even with a small volume of data, is characteristic of networks with the good topological representation that the adaptive structure of the SLFFNN was able to achieve (Lagoudakis and Parr 2003; Whiteson and Stone 2006). Thus, the network outputs predicted from nonlinear parameters without any established theoretical relationship (see Figure 5) are indicative of a network with a satisfactory number of neurons in the hidden layers as well as a proper choice of activation function (Van der Baan and Jutten 2000; Chen and Yang 2005).

The influence of the volume of training data, which has been reported by Spichak et al. (2011), on the predictive powers of the network was investigated by reducing the volume of the testing dataset by 50% and increasing the quantity of the training dataset to 62%. The results of the repeated trainings for each half and full dataset are shown in Table 1. In all cases, better performance was obtained when the MT–BH data pairs used for training the network exceeded 60% of the total. In the case of our limited dataset, the tabulated results (see Table 1) capture the modest improvement in network performance with increasing volume of training data. This indicates that better predictions can always be obtained from the network if a greater number of MT–BH pairs is used in the training phase. Consequently, the anticipated influence of changes in subsurface structural, hydrogeological, and geological conditions can be drastically reduced, since the network is expected to generate enough neurons to tackle the problem head on. Thus, the training methodology has a significant influence on the accuracy of the predictions made by the network.
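The train/test partitioning experiment described above can be sketched as follows (a deterministic split for illustration only; the actual partitioning of MT–BH pairs may well have been random, and the fractions used here are taken from the percentages quoted in the text):

```python
def split(data, train_fraction):
    """Split MT-BH data pairs into training and testing subsets
    at the given training fraction (deterministic, for illustration)."""
    n_train = int(round(train_fraction * len(data)))
    return data[:n_train], data[n_train:]

pairs = list(range(100))                 # stand-in for 100 MT-BH data pairs
train_a, test_a = split(pairs, 0.52)     # 52% training case
train_b, test_b = split(pairs, 0.61)     # 61% training case
```

Repeating the training for each split and tabulating the performance indices reproduces the kind of comparison summarized in Table 1.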

In the process of searching for optimal initialization conditions, the network tests all available nodes computed from the size of the training data structure using the random weight initialization and degree-of-freedom scheme. The network usually starts the testing process with zero nodes (the naive model) in the hidden layer and runs the test until all the nodes are exhausted. This process may last for some time, although it is considerably shorter than the time usually spent in training manually designed networks by the trial-and-error method (see Table 1 in Moghaddam et al. 2010). For instance, a continuously running set of 35 trials for each random number and node population in the hidden layer usually returns satisfactory predictions within 45 min. Once such conditions are found, the process of training the network is usually fast, with repeatable results. Thus, the approach, unlike what has been reported in the literature (see Moghaddam et al. 2010), is very economical with time, although the overall time tends to depend on the desired level of accuracy. The search for optimal initialization conditions normally starts with the network attempting to solve the problem without any node in the hidden layer, corresponding to Rosenblatt's primitive perceptron (constant or naive) model (Rosenblatt 1962; Negnevitsky 2005). If the residual prediction error is greater than the preset tolerance limit after 35 attempts (Ntrials) for a particular number of nodes in the hidden layer and random number, the network automatically increases the node population by one and restarts the search. The network keeps trying until the upper bound on the number of nodes in the hidden layer, computed from the size of the training data structure, has been exhausted. This upper bound was 6 and 9 when the quantity of training data was 52% and 61%, respectively.

For each set of Ntrials made with a specified number of nodes in the hidden layer, the network returns a local minimum that corresponds to a convergence point with the best value of the adjusted coefficient of determination (R^{2}a). Thus, by fixing an upper acceptable threshold value for R^{2}a (and the other testing parameters) for a fixed number of Ntrials, the network was allowed to search for optimal solutions using different values of RGNs in a loop. For each random number, a local minimum value of R^{2}a and the other test parameters were computed. In the event that none of these local minima satisfies the preset acceptable condition, the best of all the minima can still serve as the best optimization condition. Once an acceptable initialization condition has been found, the prevailing number of nodes, Ntrials, and the random number become the best initialization condition for the problem.
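The search strategy described above can be summarized in a structural Python sketch (a simplification, not the authors' MATLAB implementation; `train_and_score` is a hypothetical stand-in for training the MLP at a given hidden-layer size and returning its R^{2}a):

```python
def constructive_search(train_and_score, max_nodes, n_trials=35, r2_target=0.99):
    """Grow the hidden layer from zero nodes (the naive model) upward,
    making n_trials randomly initialized attempts at each size, and stop
    as soon as a trial meets the R^2_a target; otherwise return the best
    (near-best) solution found over the whole search."""
    best = (-float("inf"), 0, 0)                  # (r2, n_nodes, trial)
    for n_nodes in range(max_nodes + 1):          # 0 = naive/constant model
        for trial in range(n_trials):
            r2 = train_and_score(n_nodes, trial)  # hypothetical trainer
            if r2 > best[0]:
                best = (r2, n_nodes, trial)
            if r2 >= r2_target:
                return best                       # acceptable solution found
    return best                                   # near-best solution

# Toy stand-in scorer: improves with node count but saturates below 1.
toy = lambda n, t: 1.0 - 0.5 / (1 + n + 0.01 * t)
result = constructive_search(toy, max_nodes=6, r2_target=0.95)
```

The upper bound `max_nodes` plays the role of the node limit computed from the size of the training data structure (6 or 9 in the cases reported above).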

## 11. Conclusions

We have demonstrated how an SLFFNN can be used to estimate subsurface temperatures using a small volume of resistivity data and borehole thermograms from the Tattapani geothermal field in central India. The network uses the random weight initialization and adjusted degrees-of-freedom approach to search for an optimally performing initialization condition. It is capable of dynamically changing its topology during training in order to cope with the challenges of the problem. The network performed satisfactorily in spite of the performance-limiting constraints imposed by the limited volume of MT–BH data pairs used in training, the nonlinearity of the problem to be solved, the wide separation of the MT–BH pairs, and the geological and structural complexities prevailing at the site.

The structural adaptability of the SLFFNN is evident in the network's persistent search for an optimal architecture to solve the problem. The network first attempts to solve the problem with no node in the hidden layer, corresponding to Rosenblatt's primitive perceptron (the naive or constant) model; if the predicted result does not satisfy the preset performance conditions, it automatically increases the node population by one and repeats the trials. The trials stop only when either an acceptable solution is found or the upper bound on the number of nodes generated from the dimension of the training dataset is exhausted, in which case the network returns the near-best solution.

## Acknowledgments

The first author is grateful to the Council for Scientific and Industrial Research (CSIR) and the Third World Academy of Sciences (TWAS) for cosponsoring and funding his research at the National Geophysical Research Institute (NGRI), Hyderabad, India, under the CSIR-TWAS postdoctoral fellowship. Thanks are also due to the director of NGRI for his encouragement, understanding, and interest and for kindly granting us permission to publish this work. We benefited immensely from Prof. Greg Heath of MATLAB Central, who generously shared his single-hidden-layer multilayer perceptron neural network code with us. We are also grateful to our colleagues in the Magnetotelluric Division of NGRI for their support, concern, and encouragement. Contributions, suggestions, and critical reviews made by the reviewers that shaped the original manuscript into its present form are also appreciated.

## References

Ali Akcayol, M., and C. Cinar, 2005: Artificial neural network based modeling of heated catalytic converter performance. *Appl. Therm. Eng.,* **25,** 2341–2350, doi:10.1016/j.applthermaleng.2004.12.014.

Al-Mahallawi, K., J. Mania, A. Hani, and I. Shahrour, 2012: Using of neural networks for the prediction of nitrate groundwater contamination in rural and agricultural areas. *Environ. Earth Sci.,* **65,** 917–928, doi:10.1007/s12665-011-1134-5.

Aminian, K., and S. Ameri, 2005: Application of artificial neural networks for reservoir characterization with limited data. *J. Petrol. Sci. Eng.,* **49** (3–4), 212–222, doi:10.1016/j.petrol.2005.05.007.

Archie, G. E., 1942: The electrical resistivity log as an aid in determining some reservoir characteristics. Petroleum technology, American Institute of Mineral and Metal Engineering Tech. Publ. 1422, 8–13.

Ash, T., 1989: Dynamic node creation in back-propagation. *Connect. Sci.,* **1,** 365–375, doi:10.1080/09540098908915647.

Bas, D., and I. H. Boyaci, 2007: Modeling and optimization II: Comparison of estimation capabilities of response surface methodology with artificial neural networks in a biochemical reaction. *J. Food Eng.,* **78,** 846–854, doi:10.1016/j.jfoodeng.2005.11.025.

Benaouda, D., G. Wadge, R. B. Whitmarsh, R. G. Rothwell, and C. MacLeod, 1999: Inferring the lithology of borehole rocks by applying neural network classifiers to downhole logs: An example from the Ocean Drilling Program. *Geophys. J. Int.,* **136,** 477–491, doi:10.1046/j.1365-246X.1999.00746.x.

Böse, M., F. Wenzel, and M. Erdik, 2008: PreSEIS: A neural network-based approach to earthquake early warning for finite faults. *Bull. Seismol. Soc. Amer.,* **98,** 366–382, doi:10.1785/0120070002.

Calderón-Macías, C., M. K. Sen, and P. L. Stoffa, 2000: Artificial neural networks for parameter estimation in geophysics. *Geophys. Prospect.,* **48,** 21–47, doi:10.1046/j.1365-2478.2000.00171.x.

Carroll, S. R., and D. J. Carroll, 2002: *Statistics Made Simple for School Leaders: Data-Driven Decision Making.* Rowman & Littlefield, 146 pp.

Chen, D., and J. Yang, 2005: Robust adaptive neural control applied to a class of nonlinear systems. *Proc. 17th IMACS World Congress,* Paris, France, IMACS, T5-I-01-0911.

Constable, S. C., R. L. Parker, and C. G. Constable, 1987: Occam’s inversion: A practical algorithm for generating smooth models from electromagnetic sounding data. *Geophysics,* **52,** 289–300, doi:10.1190/1.1442303.

Dai, H., and C. MacBeth, 1994: Split shear-wave analysis using an artificial neural network. *First Break,* **12,** 605–613.

Demuth, H., and M. Beale, 2002: Neural network toolbox for use with MATLAB handbook. The MathWorks user’s guide, 154 pp.

Ferrari, S., and R. F. Stengel, 2005: Smooth function approximation using neural networks. *IEEE Trans. Neural Netw.,* **16,** 24–38, doi:10.1109/TNN.2004.836233.

Flóvenz, Ó. G., L. S. Georgsson, and K. Árnason, 1985: Resistivity structure of the upper crust in Iceland. *J. Geophys. Res.,* **90,** 10 136–10 150, doi:10.1029/JB090iB12p10136.

Gemitzi, A., C. Petalas, V. Pisinaras, and V. A. Tsihrintzis, 2009: Spatial prediction of nitrate pollution in groundwaters using neural networks and GIS: An application to South Rhodope aquifer (Thrace, Greece). *Hydrol. Processes,* **23,** 372–383, doi:10.1002/hyp.7143.

Gentili, S., and A. Michelini, 2006: Automatic picking of P and S phases using a neural tree. *J. Seismol.,* **10,** 39–63, doi:10.1007/s10950-006-2296-6.

Geotools, 1997: Users guide. Geotools Rep., 448 pp.

Hagan, T. H., H. B. Demuth, and M. Beale, 2002: *Neural Network Design.* PWS Publishing, 734 pp.

Harinarayana, T., and Coauthors, 2000: Magnetotelluric investigations in Tatapani geothermal region, Surguja district, Madhya Pradesh, India. MNES Project Rep., 127 pp.

Harinarayana, T., K. K. Abdul Azeez, K. Naganjaneyulu, C. Manoj, K. Veeraswamy, D. N. Murthy, and S. P. E. Rao, 2004: Magnetotelluric studies in Puga valley geothermal field, NW Himalaya, Jammu and Kashmir, India. *J. Volcanol. Geotherm. Res.,* **138,** 405–424, doi:10.1016/j.jvolgeores.2004.07.011.

Harinarayana, T., K. K. Abdul Azeez, D. N. Murthy, K. Veeraswamy, S. P. E. Rao, C. Manoj, and K. Naganjaneyulu, 2006: Exploration of geothermal structure in Puga geothermal field, Ladakh Himalayas, India by magnetotelluric studies. *J. Appl. Geophys.,* **58,** 280–295, doi:10.1016/j.jappgeo.2005.05.005.

Hayati, M., and Z. Mohebi, 2007: Temperature forecasting based on neural network approach. *World Appl. Sci. J.,* **2,** 613–620.

Hayati, M., and Y. Shirvany, 2007: Artificial neural network approach for short term local forecasting for Illam region. *Int. J. Electr. Electron. Sci. Eng.,* **1,** 121–125.

Helle, H. B., A. Bhatt, and B. Ursin, 2001: Porosity and permeability prediction from wireline logs using artificial neural networks: A North Sea case study. *Geophys. Prospect.,* **49,** 431–444, doi:10.1046/j.1365-2478.2001.00271.x.

Islam, M. M., and K. Murase, 2001: A new algorithm to design compact two-hidden-layer artificial neural networks. *Neural Netw.,* **14,** 1265–1278, doi:10.1016/S0893-6080(01)00075-2.

Jain, S. C., K. K. K. Nair, and D. B. Yedekar, 1995: Geology of the Son-Narmada-Tapti lineament zone in central India. *Geoscientific Studies of the Son-Narmada-Tapti Lineament Zone,* Geological Survey of India, 1–154.

Jin, L., C. Yao, and X.-Y. Huang, 2008: A nonlinear artificial intelligence ensemble prediction model for typhoon intensity. *Mon. Wea. Rev.,* **136,** 4541–4554, doi:10.1175/2008MWR2269.1.

Jupp, D., and K. Vozoff, 1997: Reply by the authors to F. J. Esparza and E. Gómez‐Treviño. *Geophysics,* **62,** 692, doi:10.1190/1.1487031.

Kwok, T.-Y., and D.-Y. Yeung, 1997: Objective functions for training new hidden units in constructive neural networks. *IEEE Trans. Neural Netw.,* **8,** 1131–1148, doi:10.1109/72.623214.

Lagoudakis, M. G., and R. Parr, 2003: Least-squares policy iteration. *J. Mach. Learn. Res.,* **4,** 1107–1149.

Larochelle, H., Y. Bengio, J. Louradour, and P. Lamblin, 2009: Exploring strategies for training deep neural networks. *J. Mach. Learn. Res.,* **10,** 1–40.

Larsen, R. J., and M. L. Marx, 2001: *An Introduction to Mathematical Statistics and Its Applications.* 3rd ed. Prentice Hall, 790 pp.

Lawrence, J., 1994: *Introduction to Neural Networks: Design, Theory and Applications.* California Scientific Software Press, 348 pp.

Lehtokangas, M., 1999: Modeling with constructive backpropagation. *Neural Netw.,* **12,** 707–716, doi:10.1016/S0893-6080(99)00018-0.

Maiti, S., and R. K. Tiwari, 2009: A hybrid Monte Carlo method based artificial neural networks approach for rock boundaries identification: A case study from the KTB bore hole. *Pure Appl. Geophys.,* **166,** 2059–2090, doi:10.1007/s00024-009-0533-y.

Maiti, S., and R. K. Tiwari, 2010: Automatic discriminations among geophysical signals via Bayesian neural network approach. *Geophysics,* **75,** E67–E78, doi:10.1190/1.3298501.

Maiti, S., R. K. Tiwari, and H. J. Kűmpel, 2007: Neural network modelling and classification of lithofacies using well log data: A case study from KTB borehole site. *Geophys. J. Int.,* **169,** 733–746, doi:10.1111/j.1365-246X.2007.03342.x.

Maiti, S., G. Gupta, V. C. Erram, and R. K. Tiwari, 2011: Inversion of Schlumberger resistivity sounding data from the critically dynamic Koyna region using the hybrid Monte Carlo-based neural network approach. *Nonlinear Processes Geophys.,* **18,** 179–192, doi:10.5194/npg-18-179-2011.

Mandal, S., P. V. Sivaprasad, S. Venugopal, and K. P. N. Murthy, 2009: Artificial neural network modeling to evaluate and predict the deformation behaviour of stainless steel type AISI 304L during hot torsion. *Appl. Soft Comput.,* **9,** 237–244, doi:10.1016/j.asoc.2008.03.016.

Manoj, C., and N. Nagarajan, 2003: The application of artificial neural networks to magnetotelluric time-series analysis. *Geophys. J. Int.,* **153,** 409–423, doi:10.1046/j.1365-246X.2003.01902.x.

Marquardt, D. W., 1963: An algorithm for least-squares estimation of nonlinear parameters. *J. Soc. Ind. Appl. Math.,* **11,** 431–441, doi:10.1137/0111030.

Marzban, C., 2000: A neural network for tornado diagnosis: Managing local minima. *Neural Comput. Appl.,* **9,** 133–141, doi:10.1007/s005210070024.

Marzban, C., and G. Stumpf, 1996: A neural network for tornado prediction based on Doppler radar-derived attributes. *J. Appl. Meteor.,* **35,** 617–626, doi:10.1175/1520-0450(1996)035<0617:ANNFTP>2.0.CO;2.

Meju, M. A., 2002: Geoelectromagnetic exploration for natural resources: Models, case studies and challenges. *Surv. Geophys.,* **23,** 133–205, doi:10.1023/A:1015052419222.

Moghaddam, M. G., F. B. H. Ahmad, M. Basri, and B. M. A. Rahman, 2010: Artificial neural network modelling studies to predict the yield of enzymatic synthesis of betulinic acid ester. *Electron. J. Biotechnol.,* **13,** doi:10.2225/vol13-issue3-fulltext-9.

Negnevitsky, M., 2005: *Artificial Intelligence: A Guide to Intelligent Systems.* 2nd ed. Pearson, 415 pp.

Nicoletti, M. C., J. R. Bertini Jr., D. Elizondo, L. Franco, and J. M. Jerez, 2009: Constructive neural network algorithms for feedforward architectures suitable for classification tasks. *Constructive Neural Networks,* D. Elizondo et al., Eds., Studies in Computational Intelligence, Vol. 258, Springer, 1–23.

Parekh, R., J. Yang, and V. Honavar, 2000: Constructive neural-network learning algorithms for pattern classification. *IEEE Trans. Neural Netw.,* **11,** 436–451, doi:10.1109/72.839013.

Qing-Lai, W., Z. Hua-Guang, L. De-Rong, and Z. Zhao Yan, 2010: An optimal control scheme for a class of discrete-time nonlinear systems with time delays using adaptive dynamic programming. *Acta Autom.,* **46,** 121–129.

Rosenblatt, F., 1962: *Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms.* Spartan, 616 pp.

Rybin, A. K., V. V. Spichak, V. Y. Batalev, E. A. Bataleva, and V. E. Matyukov, 2008: Array magnetotelluric soundings in the active seismic area of northern Tien Shan. *Russ. Geol. Geophys.,* **49,** 337–349, doi:10.1016/j.rgg.2007.09.014.

Sarolkar, P. B., and A. K. Das, 2006: Reservoir studies at Tatapani geothermal field, Surguja district, India. *Proc. 31st Workshop on Geothermal Reservoir Engineering,* Stanford, CA, Stanford University, SGP-TR-179.

Shanker, R., J. L. Thussu, and J. M. Prasad, 1987: Geothermal studies at Tattapani hot spring area, Sarguja district, central India. *Geothermics,* **16,** 61–76, doi:10.1016/0375-6505(87)90079-4.

Sharma, S. K., and P. Chandra, 2010: Constructive neural networks: A review. *Int. J. Eng. Sci. Technol.,* **2,** 7847–7855.

Simpson, F., and K. Bahr, 2005: *Practical Magnetotellurics.* Cambridge University Press, 254 pp.

Sin, H. N., S. Yusof, N. A. H. Shilkh, and R. R. Abdu, 2006: Optimization of enzymatic clarification of sapodilla juice using response surface methodology. *J. Food Eng.,* **73,** 313–319, doi:10.1016/j.jfoodeng.2005.01.031.

Singh, U. K., R. K. Tiwari, and S. B. Singh, 2005: One-dimensional inversion of geo-electrical resistivity sounding data using artificial neural networks—A case study. *Comput. Geosci.,* **31,** 99–108, doi:10.1016/j.cageo.2004.09.014.

Spichak, V. V., 2006: Estimating temperature distributions in geothermal areas using a neuronet approach. *Geothermics,* **35,** 181–197, doi:10.1016/j.geothermics.2006.01.002.

Spichak, V. V., and O. K. Zakharova, 2009: The application of an indirect electromagnetic geothermometer to temperature extrapolation in depth. *Geophys. Prospect.,* **57,** 653–664, doi:10.1111/j.1365-2478.2008.00778.x.

Spichak, V. V., O. K. Zakharova, and A. K. Rybin, 2007: Possibility of realization of contact-free electromagnetic geothermometer. *Dokl. Earth Sci.,* **417,** 1370–1374, doi:10.1134/S1028334X07090176.

Spichak, V. V., O. K. Zakharova, and A. K. Rybin, 2011: Methodology of the indirect temperature estimation basing on magnetotelluric data: Northern Tien Shan case study. *J. Appl. Geophys.,* **73,** 164–173, doi:10.1016/j.jappgeo.2010.12.007.

Stathakis, D., 2009: How many hidden layers and nodes? *Int. J. Remote Sens.,* **30,** 2133–2147, doi:10.1080/01431160802549278.

Sutarno, D., and K. Vozoff, 1989: Robust M-estimation of magnetotelluric impedance tensors. *Explor. Geophys.,* **20,** 383–398, doi:10.1071/EG989383.

Sutton, R. S., and A. G. Barto, 1998: *Reinforcement Learning: An Introduction.* MIT Press, 322 pp.

Swift, C. M., Jr., 1967: A magnetotelluric investigation of electrical conductivity anomaly in the southwestern United States. Ph.D. thesis, Massachusetts Institute of Technology, 226 pp.

Van der Baan, M., and C. Jutten, 2000: Neural networks in geophysical applications. *Geophysics,* **65,** 1032–1047, doi:10.1190/1.1444797.

Veeraswamy, K., and T. Harinarayana, 2006: Electrical signatures due to thermal anomalies along mobile belts reactivated by the trail and outburst of mantle plume: Evidences from the Indian subcontinent. *J. Appl. Geophys.,* **58,** 313–320, doi:10.1016/j.jappgeo.2005.05.007.

Whiteson, S., and P. Stone, 2006: Evolutionary function approximation for reinforcement learning. *J. Mach. Learn. Res.,* **7,** 877–917.

Zainuddin, Z., and O. N. G. Pauline, 2008: Function approximation using artificial neural networks. *WSEAS Trans. Math.,* **6,** 333–338.