The air–sea flux of greenhouse gases [e.g., carbon dioxide (CO2)] is a critical part of the climate system and a major factor in the biogeochemical development of the oceans. More accurate and higher-resolution calculations of these gas fluxes are required if researchers are to fully understand and predict future climate. Satellite Earth observation is able to provide large spatial-scale datasets that can be used to study gas fluxes. However, the large storage requirements of such data can restrict their use by the scientific community. Fortunately, the development of cloud computing can provide a solution. This paper describes an open-source air–sea CO2 flux processing toolbox called the “FluxEngine,” designed for use on a cloud-computing infrastructure. The toolbox allows users to easily generate global and regional air–sea CO2 flux data from model, in situ, and Earth observation data, and its air–sea gas flux calculation is user configurable. Its current installation on the Nephelae Cloud allows users to easily exploit more than 8 TB of climate-quality Earth observation data for the derivation of gas fluxes. The resultant netCDF data output files contain >20 data layers containing the various stages of the flux calculation along with process indicator layers to aid interpretation of the data. This paper describes the toolbox design, verifies the air–sea CO2 flux calculations, demonstrates the use of the tools for studying global and shelf sea air–sea fluxes, and describes future developments.
1. Introduction
The climate of Earth is sensitive to the radiative impact of a number of gases and different types of particles in the atmosphere. The atmospheric concentration of many important gases and particles is sensitive to the air–sea transfer of volatile compounds. These gases can also play a substantial role in the biogeochemistry of the oceans. It is therefore important to quantify contemporary air–sea fluxes of gases and also to provide the understanding necessary to project possible future changes in these fluxes. The air–sea fluxes of gases can in some cases be inferred indirectly, but most flux estimates depend on a calculation using a standard bulk air–sea gas transfer equation (e.g., as defined within Takahashi et al. 2009, hereafter T09). For each gas, this calculation depends upon measurements of the gas concentration in the surface ocean and the lower atmosphere and upon “transfer coefficients” that describe the “rate constants” for transfer across the sea surface. The simplest calculation requires only a single transfer coefficient, the gas transfer velocity.
Greenhouse gases are those that can absorb and emit infrared radiation. Of these gases, CO2 is one of the most studied and systematically observed. Increasing levels of atmospheric CO2, caused by the burning of fossil fuels and biomass, are of growing concern due to their impact on the global climate system. Understanding the pathways, sources, sinks, and impact of CO2 on the earth’s climate system is essential for monitoring climate and predicting future scenarios. The global ocean is thought to annually absorb ~25% of anthropogenic CO2 emissions (Le Quéré et al. 2015, 2014), and it constitutes the only true net sink for anthropogenic CO2 over the last 200 years (Sabine et al. 2004). The North Atlantic sink in particular has been shown to be highly variable (Watson et al. 2009) and the mechanisms driving this variability are not well understood. Therefore, isolating and reducing the uncertainties in the estimates of the oceanic sink of CO2 is a crucial goal of climate science (Le Quéré et al. 2009).
In the last decade there has been an explosion in the availability of large (>1 TB) high-quality, well-characterized, and often multisensor cross-calibrated Earth observation (EO) datasets. For example, the European Space Agency (ESA) GlobWave project (http://globwave.ifremer.fr/) produced a 20-plus-year time series of global coverage multisensor cross-calibrated wave and wind data. Similar efforts in the United States resulted in the National Aeronautics and Space Administration (NASA) project Making Earth System Data Records for Use in Research Environments (MEaSURES), which produced a 13-plus-year time series of global coverage multisensor surface biology datasets (Maritorena et al. 2010). These successes and the classification by the Group on Earth Observations (GEO) of a number of parameters discernible from space as essential climate variables prompted ESA to start their Climate Change Initiative (CCI) projects. There are currently 14 different CCI projects. For those interested in air–sea gas fluxes, arguably the most interesting of these projects is the Sea Surface Temperature CCI project, which has provided a 20-plus-year time series of global coverage multisensor sea surface skin and subskin temperature data (Merchant et al. 2012). Unfortunately, the resources required to download and exploit these large global and decadal datasets can limit their exploitation by the scientific community. The development of cloud technologies and storage provides a solution. Cloud computing can be defined as interconnected computing resources that can be easily scaled up (grown) or down (shrunk) while maintaining their capability or function, rather like a “cloud” in the atmosphere. These systems have a number of key features, such as a high level of redundancy (e.g., servers can be removed or upgraded without users noticing) and their scalable nature (e.g., they use standard hardware and software, allowing low-cost expansion of a cloud).
Through cloud computing it becomes possible for users to easily remotely access and process large volumes of data and then simply download the results to their local desktop computers or laptops.
a. The OceanFlux Greenhouse Gases project
The OceanFlux Greenhouse Gases project was funded by the European Space Agency in 2011 to encourage the use of satellite Earth observation data for studying air–sea gas fluxes. To achieve this, the objectives of the project included the development and validation of novel gas flux Earth observation algorithms (Goddijn-Murphy et al. 2013, 2012) and scientific analyses (Land et al. 2013). Another objective was to provide datasets and processing tools that can be used by the scientific community. Accordingly, the gas flux data processing tools, collectively named the “FluxEngine,” are described in this paper. The FluxEngine allows users to configure the flux parameterization and to select their chosen input data, and then it generates the resulting monthly global flux datasets. A plethora of climate study quality EO, in situ, and model data are available as input to the toolbox and to aid the interpretation of the resultant flux data (see Table 1). The outputs from the system are standard netCDF datasets that can be easily read into a number of third-party scientific software packages. The primary gas of interest for the OceanFlux Greenhouse Gases project was CO2. Therefore, the FluxEngine has been developed to aid the study of the air–sea flux of CO2, although as described later in the paper, the toolbox can also be used to support the study of other gases, such as N2O and dimethyl sulfide (DMS).
b. Air–sea flux calculations
The flux of CO2 between the atmosphere and the ocean (air–sea) is controlled by wind speed; sea state; sea surface temperature; surface processes, including any biological activity; and the difference in CO2 fugacity between the ocean and the atmosphere. The air-to-sea flux of CO2 (F, g m−2 s−1) is calculated using the gas transfer velocity k (m s−1), and the difference in CO2 concentration (g m−3) between the base [CO2AQW] and the top [CO2AQ0] of a thin (~10–250 μm) mass boundary layer at the sea surface:
$$F = k\left([\mathrm{CO_{2AQW}}] - [\mathrm{CO_{2AQ0}}]\right) \tag{1}$$
The concentration of CO2 in seawater is the product of its solubility α (g m−3 μatm−1) and its fugacity fCO2 (μatm). As gas solubility is a function of salinity and temperature, it varies across the aqueous boundary layer. Hence, Eq. (1) now becomes
$$F = k\left(\alpha_W\, f\mathrm{CO}_{2W} - \alpha_S\, f\mathrm{CO}_{2A}\right) \tag{2}$$
where the subscripts denote values in water (W), at the sea–air interface (S), and in air (A).
The CO2 concentration (and thus fugacity) is normally measured a few meters below the sea surface rather than at the surface. Variations in temperature at the sea surface (such as diurnal warming) will affect the fugacity via the carbonate reaction. For simplicity we can substitute partial pressure for fugacity because their values differ by <0.5% over the temperature range of interest (McGillis and Wanninkhof 2006). Equation (2) can therefore be alternatively represented as
$$F = k\left(\alpha_W\, p\mathrm{CO}_{2W} - \alpha_S\, p\mathrm{CO}_{2A}\right) \tag{3}$$
It is also popular for Eqs. (2) and (3) to be collapsed into formulations that ignore the differences between the two solubilities, and just use the waterside solubility αW for both halves of the equation, resulting in
$$F = k\,\alpha_W\left(p\mathrm{CO}_{2W} - p\mathrm{CO}_{2A}\right)$$
and this formulation is often referred to as the “bulk” parameterization. The airside partial pressure of CO2, pCO2A (μatm) can be calculated using the concentration of CO2 in dry air and air pressure using
Term X[CO2] is the molar fraction of CO2 in the dry atmosphere (expressed as the zonal mean in T09 and the FluxEngine), P is the air pressure (mb, expressed as a daily mean in the FluxEngine), and pH2O is the saturation vapor pressure (mb), which is defined in terms of sea surface temperature and salinity (Weiss and Price 1980) by
where SSTk is the sea surface temperature dataset of interest (K) and S is the salinity. For pCO2W the FluxEngine relies upon in situ pCO2W measurements (e.g., data from a buoy or ship) or an in situ–derived climatology of pCO2W data (e.g., T09).
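As an illustration of how the quantities defined in this subsection fit together, the following minimal Python sketch evaluates the airside partial pressure and the two flux formulations (the two-solubility form and the bulk form). It is not the FluxEngine source code. The function names are hypothetical, the saturation vapor pressure coefficients are those commonly quoted from Weiss and Price (1980) and should be checked against that reference, and the division by 1013.25 mb atm−1 (to convert ppm × mb into μatm) is an assumption about unit handling.

```python
import numpy as np

MB_PER_ATM = 1013.25  # one standard atmosphere expressed in millibars


def saturation_vapor_pressure_mb(sst_k, salinity):
    """Saturation vapor pressure over seawater (mb), after Weiss and Price (1980).

    Coefficients as commonly quoted from that paper (assumed here).
    """
    ln_p_atm = (24.4543
                - 67.4509 * (100.0 / sst_k)
                - 4.8489 * np.log(sst_k / 100.0)
                - 0.000544 * salinity)
    return MB_PER_ATM * np.exp(ln_p_atm)


def pco2_air_uatm(x_co2_ppm, pressure_mb, sst_k, salinity):
    """Airside pCO2 (uatm) from dry-air molar fraction (ppm) and air pressure (mb)."""
    ph2o_mb = saturation_vapor_pressure_mb(sst_k, salinity)
    # ppm * (mb / (mb per atm)) = ppm * atm = uatm (assumed unit handling)
    return x_co2_ppm * (pressure_mb - ph2o_mb) / MB_PER_ATM


def flux_two_solubilities(k, alpha_w, alpha_s, pco2_w, pco2_a):
    """Flux using separate waterside and interface solubilities."""
    return k * (alpha_w * pco2_w - alpha_s * pco2_a)


def flux_bulk(k, alpha_w, pco2_w, pco2_a):
    """Bulk parameterization: the waterside solubility is used for both terms."""
    return k * alpha_w * (pco2_w - pco2_a)
```

With this sign convention a negative flux denotes a sink (ocean uptake), and the two formulations coincide whenever the two solubilities are equal.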
2. The development of the FluxEngine
The following sections describe the design of the FluxEngine, the methods used for calculating the air–sea fluxes, the input datasets that are available, and the methods used to verify the implementation.
a. The FluxEngine design, input data, and implementation
Figure 1 shows a diagram of the main parts of the FluxEngine, conceptually showing the input data and the contents of the output files. The input data are on the left of the diagram, with the calculation in the middle and an overview of the contents of each output file shown on the right. The air–sea fluxes are calculated using monthly composite data and the generation of these input data is described below. The output file is a netCDF4 Climate and Forecast (CF) 1.6 compliant file that contains >20 data layers. The data layers within each output file include the different components of the gas flux calculation, statistics of the input datasets (e.g., variance of the wind speed), and process indicator layers to aid interpretation of the fluxes. The process indicator layers include fixed masks (e.g., land, ocean basins, open ocean, and coastal classification), climatological data (e.g., persistent SST fronts), and other modeling or Earth observation datasets useful for interpreting the fluxes (e.g., chlorophyll-a concentrations and model-generated estimates of wave whitecapping). All output from the toolbox consists of monthly global coverage 1° × 1° spatial resolution data (360 × 180 arrays). Therefore, generation of gas fluxes for a complete year (12 months) requires 12 sets of data inputs (one set per month) and the FluxEngine then produces 12 output files. The air–sea flux calculation utility contains internal integrity checks for all of the output data layers. These integrity checks highlight if any data layers contain data outside of a predefined expected range. If a data value for a data layer falls outside its specified valid data range, then a count is added to the corresponding data element position in one of the process indicator data layers.
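The integrity check described above can be sketched as follows. This is a minimal illustration rather than the FluxEngine implementation. The function name, the dictionary-based interface, and the example valid ranges are hypothetical, and missing values are assumed to be stored as NaN.

```python
import numpy as np


def flag_out_of_range(layers, valid_ranges):
    """Count, for each grid cell, how many layers fall outside their valid range.

    layers: dict mapping layer name -> 2D array (NaN marks missing data)
    valid_ranges: dict mapping layer name -> (minimum, maximum)
    """
    shape = next(iter(layers.values())).shape
    failures = np.zeros(shape, dtype=np.int32)
    for name, data in layers.items():
        lower, upper = valid_ranges[name]
        finite = np.isfinite(data)
        bad = np.zeros(shape, dtype=bool)
        # Only test cells that actually hold data; missing cells are not failures.
        bad[finite] = (data[finite] < lower) | (data[finite] > upper)
        failures += bad
    return failures
```

The returned array plays the role of a process indicator layer: a nonzero element marks a grid cell in which at least one layer failed its range check.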
The inputs to the flux calculation (box 2 in Fig. 1) are monthly composite data. Table 1 shows all of the monthly datasets that are currently available for the FluxEngine to use as input. Where required, these monthly netCDF composite data were generated using the original daily data. For those datasets that were generated, each monthly composite file contains the mean (first-order moment), median, standard deviation, and the second-, third-, and fourth-order moments as calculated using one calendar month of data.
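A minimal sketch of how such a monthly composite could be built from a stack of daily fields is given below. It assumes that missing data are stored as NaN and that the second-, third-, and fourth-order moments are central moments (taken about the monthly mean); the text does not state which convention the toolbox uses, and the function name is hypothetical.

```python
import numpy as np


def monthly_composite(daily):
    """Per-cell statistics from daily fields of shape (n_days, n_lat, n_lon)."""
    mean = np.nanmean(daily, axis=0)
    anomalies = daily - mean  # broadcasts the monthly mean over the day axis
    return {
        "mean": mean,
        "median": np.nanmedian(daily, axis=0),
        "stddev": np.nanstd(daily, axis=0),
        # Central moments about the monthly mean (an assumption, see above).
        "moment2": np.nanmean(anomalies ** 2, axis=0),
        "moment3": np.nanmean(anomalies ** 3, axis=0),
        "moment4": np.nanmean(anomalies ** 4, axis=0),
    }
```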
The FluxEngine was developed using license-free software tools and libraries. The flux calculation utility and supporting utilities use Python [version 2.7 (v2.7)], NumPy (v1.7.1), and the SciPy toolbox (v0.13.0). This source code comprises more than 6000 lines of code.
b. The computing platform
The FluxEngine is currently installed on the Centre ERS d'Archivage et de Traitement (CERSAT) Nephelae Cloud. This Linux-based cloud-computing platform provides a petascale storage capacity and distributed processing over more than 600 computing nodes. It was specifically designed by CERSAT for massive archive processing, for applications including data mining and multidecadal Earth observation data reprocessing. As with all cloud computing, it offers simple backup and restoration; high-speed data processing (i.e., the processing nodes are close to the data, so there is no potential for input/output bottlenecks); the ability to tailor the system to a specific job (i.e., there is no reliance on physical hardware as it uses virtual servers); ease of use, as no specific skills are required; and efficient use of resources through dynamic reallocation. Data processing runs submitted by the user are scheduled and run on the cloud-computing nodes using a system developed by CERSAT called “GoGo list.” This is simply a wrapper enabling processing jobs to be executed and monitored on the cloud, and the use of GoGo list is completely invisible to the user. It should be noted, though, that the main reason for using the Nephelae Cloud here is to provide potential users with access to a large and continually growing satellite Earth observation dataset. The FluxEngine can also be installed and used on a desktop or laptop computer with no loss of performance.
c. Configurable options
The main flux calculation within the FluxEngine is user configurable through the use of a plain text ASCII configuration file. Within the configuration a user can choose the input datasets, the flux calculation model [choose between Eqs. (4) and (9)], and the gas transfer velocity parameterization. A range of different gas transfer parameterizations are available (e.g., McGillis et al. 2001; Nightingale et al. 2000; Wanninkhof 1992), including those based on sea state and surface roughness (Fangohr and Woolf 2007; Goddijn-Murphy et al. 2012). Through a generic formulation (see the appendix), it is also possible for users to use their own wind speed–based parameterization. Through the configuration file, the user can choose to inject random noise (normally distributed with specified mean and standard deviation) to any of the main input datasets (SST, U10, Hs, pCO2, or fCO2). In the same way, the user can choose to add a bias offset value to any of the main input datasets. This functionality allows the impact of known input data uncertainties (i.e., root-mean-square error and bias) to be propagated through to the final flux datasets, allowing the impact of these uncertainties to be quantified in terms of the gas fluxes. Example configuration files are included with the open source software.
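The noise and bias options can be illustrated with the short sketch below, which perturbs an input field with normally distributed noise and a constant offset. It is a sketch under stated assumptions rather than the toolbox code. The function and parameter names are hypothetical and do not correspond to FluxEngine configuration keys.

```python
import numpy as np


def perturb(field, noise_mean=0.0, noise_std=0.0, bias=0.0, rng=None):
    """Return a copy of an input field with random noise and a bias offset added.

    NaN (missing) cells remain NaN because NaN + x is NaN.
    """
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.normal(noise_mean, noise_std, size=field.shape)
    return field + noise + bias
```

Repeating the flux calculation with, for example, `perturb(sst, noise_std=rmse, bias=offset)` applied to each input gives an ensemble whose spread indicates how the input uncertainties propagate to the fluxes.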
d. Additional software tools
Three additional tools exist within the FluxEngine toolbox. These enable the calculation of integrated net fluxes, the regridding of the output data, and the use of cruise or buoy in situ data as inputs. The integrated net flux tool can provide gross and net fluxes, mean values for each dataset within the output data, and estimates of the open ocean fluxes and the flux contribution from any missing data (such as that from coastal, shelf, and enclosed seas). The regridding tool converts the 1° × 1° data to the 4° × 5° grid used by the T09 climatological dataset. This allows users to easily compare the FluxEngine output with that of the T09 climatology. The in situ to netCDF conversion tool converts sparse in situ data into a spatially and temporally binned or gridded (1° × 1° grid) format. For example, the tool allows in situ pCO2 data to be used as input to the FluxEngine. Further details on all of the toolbox software components can be found in the appendix of this paper.
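The core of an integrated net flux calculation is an area-weighted sum over the grid. The sketch below shows this step only, assuming a spherical Earth, a regular latitude–longitude grid, and fluxes in g C m−2 day−1. The ice normalization and the estimate of the contribution from missing data described above are deliberately omitted, and all names are hypothetical.

```python
import numpy as np

EARTH_RADIUS_M = 6.371e6  # mean Earth radius (spherical approximation)


def cell_areas_m2(n_lat=180, n_lon=360):
    """Surface area (m^2) of each cell on a regular latitude-longitude grid."""
    lat_edges = np.deg2rad(np.linspace(-90.0, 90.0, n_lat + 1))
    dlon = 2.0 * np.pi / n_lon
    band = EARTH_RADIUS_M ** 2 * dlon * (np.sin(lat_edges[1:]) - np.sin(lat_edges[:-1]))
    return np.repeat(band[:, np.newaxis], n_lon, axis=1)


def integrated_net_flux_pgc(flux_gc_m2_day, days, ocean_fraction=None):
    """Integrate a mean daily flux field (g C m^-2 day^-1) over area and time, in PgC."""
    areas = cell_areas_m2(*flux_gc_m2_day.shape)
    if ocean_fraction is not None:
        areas = areas * ocean_fraction  # e.g. to discount land or ice-covered area
    total_grams = np.nansum(flux_gc_m2_day * areas) * days
    return total_grams * 1e-15  # 1 Pg = 1e15 g
```

An annual value would then be the sum of the twelve monthly results, each computed with the number of days in that month.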
3. Data quality and verification of the calculations
The widely used T09 climatology dataset provides an ideal benchmark for verifying the operation and output of the FluxEngine, as it contains both the air–sea flux estimates and the input values used to calculate these fluxes. Therefore, the T09 air–sea flux data were first used to verify the FluxEngine integrated net flux tool, and then the main T09 input fields were used to verify the combination of the FluxEngine flux calculation and the integrated net flux tool.
To verify the integrated net flux tool, the year 2000 T09 climatology data were linearly interpolated to a 1° × 1° grid and then provided as the input to the tool. The integrated net flux tool used the T09 ice normalization (see the appendix). The resultant net air–sea CO2 flux for the open ocean region was −1.39 GtC yr−1 with an additional −0.17 GtC yr−1 attributed to missing data, large lakes, the Mediterranean Sea, and coastal and shelf seas, giving a total of −1.56 GtC yr−1. The difference between this global open ocean result (as calculated using the linearly interpolated data and the net flux tool) and that stated in the original publication is <1%.
The T09 climatology data of SST, XCO2, U10, pCO2w, air pressure, and percentage ice cover (linearly interpolated to a 1° × 1° grid) were then used as input to the FluxEngine. The main flux utility was configured to use Eq. (5) to calculate the air–sea fluxes. Figure 2a shows the resultant daily mean air–sea flux map. The projection and scale bar have been chosen to allow easy comparison with Fig. 13 in T09. The resultant flux outputs were then compared with the corresponding linearly interpolated T09 fluxes. The following is true for all monthly outputs: the pCO2W data were identical to six decimal places; 2% of the pCO2A data elements differed by more than ±0.01%; 2% of the α data elements differed by more than ±3%; and 2% of the k data elements differed by more than ±2%. These small differences within the calculations collectively result in 5%–12% (dependent upon the month; see Fig. 3) of the air–sea flux F data elements differing by more than ±5%. The differences in these data fields between the output and that of the original linearly interpolated T09 dataset are likely to be a combination of (i) minor rounding differences in the flux calculations (e.g., due to differences in precision between the FluxEngine calculations and those of the original T09 publication) and (ii) interpolation issues at boundaries. Figure 3 shows that the majority of the differences between the two air–sea flux datasets correspond to the sea ice boundary at high latitudes. Since we are comparing a linearly interpolated air–sea flux (originally on a 4° × 5° grid) with the output of a 1° × 1° calculation (where all inputs are also 1° × 1°), differences at these boundaries are expected. Using the integrated net flux utility on the output, the annual net integrated flux for 2000 was −1.33 GtC yr−1 with an additional contribution of −0.16 GtC yr−1 from missing data, large lakes, the Mediterranean Sea, and coastal and shelf seas, giving a total of −1.49 GtC yr−1.
This final result is within 5% of that derived from the original 1° × 1° linearly interpolated T09 data. Figure 2b shows the annual mean air–sea CO2 flux and the monthly net integrated fluxes for the four main oceanic basins.
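The element-wise comparison used in this verification (the percentage of data elements whose relative difference exceeds a threshold) can be sketched as below. This is an assumed form of the comparison, not the code used for the published figures. Cells that are missing in either field, or where the reference is zero, are excluded, and the function name is hypothetical.

```python
import numpy as np


def fraction_exceeding(test, reference, tolerance_percent):
    """Fraction of valid cells whose relative difference exceeds the tolerance (%)."""
    valid = np.isfinite(test) & np.isfinite(reference) & (reference != 0)
    if not valid.any():
        return 0.0
    relative = 100.0 * np.abs(test[valid] - reference[valid]) / np.abs(reference[valid])
    return np.count_nonzero(relative > tolerance_percent) / relative.size
```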
4. Scientific application
This section illustrates how the FluxEngine and resultant data can be used to study global and regional air–sea gas fluxes.
a. Global analyses and the Southern Ocean
A time series global analysis was performed to demonstrate the use of the FluxEngine. The inputs to the FluxEngine were the linearly interpolated (to 1° × 1°) T09 climatology data of X[CO2] and pCO2W (with its associated SST), Earth observation SST (SST skin) and U10 data, NCEP Climate Forecast System Reanalysis (CFSR) air pressure, and SSM/I percentage ice cover (see Table 1 for dataset specifics).
When the FluxEngine is configured to use the T09 pCO2W data and the chosen SST dataset is not from T09 (either due to studying a different year than the original T09 year 2000, or due to the selection of a different SST dataset), the pCO2W data need to be reanalyzed to be consistent with this new SST dataset. Therefore, following previous studies (Fangohr et al. 2008; Kettle and Merchant 2005; Land et al. 2013), the FluxEngine reanalyzes the T09 pCO2W data to correct them to the chosen SST datasets using the relationship provided by T09:
where TC is the original temperature dataset (°C) and the subscript SSTC denotes the sea surface temperature (°C).
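A temperature correction of this kind can be sketched as follows. The exponential form with linear and quadratic temperature terms follows the relationship given in T09, but the default coefficients below (0.0433 and 4.35 × 10−5) are quoted from memory of that paper and are an assumption that should be verified against T09 before use. The function name is hypothetical.

```python
import numpy as np


def correct_pco2_to_sst(pco2_w, t_orig_c, sst_c, a=0.0433, b=4.35e-5):
    """Adjust waterside pCO2 from its original temperature to a chosen SST.

    pco2_w: pCO2 at the original temperature (uatm)
    t_orig_c: original temperature TC (deg C)
    sst_c: chosen sea surface temperature SSTC (deg C)
    a, b: assumed coefficients; check against T09
    """
    return pco2_w * np.exp(a * (sst_c - t_orig_c) - b * (sst_c ** 2 - t_orig_c ** 2))
```

With these assumed coefficients, warming the water from 20° to 21°C raises pCO2W by roughly 4%.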
The FluxEngine was run for years 1995–2009 using Eq. (5) for the flux calculations and the gas transfer velocity of Nightingale et al. (2000). To generate an estimate of the uncertainties in the flux estimate due to the known uncertainties in the U10, pCO2W, and SST input data, the FluxEngine time series processing was repeated with injected random noise enabled. The system assumes a 1.5 μatm yr−1 change in the atmospheric and oceanic partial pressures of CO2 relative to the year 2000 of the original T09 data. The partial pressure difference is therefore imposed to have no interannual variation or trend, so differences between years are due to other factors. Integrated net fluxes were then calculated using the T09 ice normalization. As an example output, Fig. 4 shows the February and August monthly air–sea CO2 flux maps for year 2000 and Fig. 5 shows the resulting year 2000 daily mean air–sea CO2 flux. The integrated net flux for year 2000 was −1.46 PgC yr−1. The annual global integrated net flux across all years falls in the range of −1.63 to −1.00 PgC yr−1 with a mean and standard deviation (interannual variability) of −1.22 ± 0.21 PgC yr−1. These results are consistent with the annual climatological global estimate of T09 and fall within the range of recent estimates (Le Quéré et al. 2014). The time series of air–sea fluxes for the global oceans and the four main oceanic regions (Atlantic, Pacific, Southern, and Indian) can be seen in Fig. 6, and the International Hydrographic Office (IHO) oceanic region definitions are shown in Fig. 5c. The gray-shaded areas in Fig. 6 are the uncertainties in the global air–sea flux estimates based on the known uncertainties of the U10, pCO2W, and SST input data.
The largest variability in the oceanic sink in the time series is during periods of large ENSO variation, 1997–2000 (strong positive ENSO followed by a strong negative ENSO; i.e., El Niño followed by La Niña) and 2007–08 (positive ENSO followed by negative ENSO), and Fig. 6 suggests that a large part of this variability is driven by the air–sea CO2 flux in the Pacific Ocean. The mean and standard deviation (interannual variability) of the air–sea fluxes for the Atlantic, Pacific, Indian, and Southern Oceans were −0.42 ± 0.05, −0.28 ± 0.15, −0.30 ± 0.05, and −0.04 ± 0.02 PgC yr−1, respectively. These values are all comparable to their equivalent values in T09, although it must be noted that the ocean definitions in T09 vary slightly from those of the IHO. Clearly, the Southern Ocean contributes a relatively small amount to the net global flux, and the standard deviation shows that the Southern Ocean flux has a low temporal variability in comparison to the other oceanic basins. It is interesting to note that the T09 definition of the Southern Ocean extends farther north by 10° than the IHO definition and that this additional area encompasses regions of persistent ocean currents (Fig. 5b). Recalculating the net air–sea CO2 flux using the T09 Southern Ocean definition reduces the estimate of the temporal variability further, producing a net air–sea flux of −0.04 ± 0.01 PgC yr−1.
b. European shelf seas
Despite their relatively small area, accounting for just ~5% of the World Ocean’s surface, shelf seas play an important part in the global carbon cycle and in buffering human impacts on marine systems. These regions have a disproportionately large role in primary and new production, remineralization, and the sedimentation of organic matter [Chen et al. (2013) and the references therein]. The high biological activity in these regions can result in considerable drawdown of atmospheric CO2, with the potential for the carbon to be exported to the deep ocean. A recent study (Chen and Borges 2009) has estimated that 29% of the global air-to-sea CO2 flux occurs in shelf seas. The North Sea is considered to be a sink for atmospheric CO2 (e.g., Frigstad et al. 2011; Thomas et al. 2004), but the ability of the entire northwest European shelf to act as a sink of CO2, and the variability of this sink, are areas of active research. Assuming negligible net burial rates of carbon in shelf sediments (de Haas et al. 2002), the net off-shelf carbon export will equal the region’s net air–sea CO2 exchange. Therefore, estimating the net air–sea CO2 exchange in the European shelf seas can help quantify the carbon export from this shelf sea.
The global time series data described in section 4a were used to study the air–sea CO2 fluxes in the European shelf seas. Four different bathymetry-based definitions of the northwestern European shelf seas [<1000, <500, <200, and <200 m plus the Norwegian Trench (NT)] were generated using Python and the General Bathymetric Chart of the Oceans, GEBCO_08 grid, and were used for calculating the net flux. Figure 7 shows the estimated net sink for each year and each region definition. The northwest European shelf sea-integrated net flux across all years and definitions falls in the range of −10.1 to −23.7 TgC yr−1 (Table 2). The limits of this range are set by using the 200-m (−10.1 TgC yr−1) and 1000-m (−23.7 TgC yr−1) bathymetry masks. The air–sea fluxes generated using the 200- and 1000-m masks differ by 13%–14%, assuming the 200-m flux as the reference. Despite the relatively coarse spatial resolution of these data, these estimates are comparable to previous in situ–based studies. A recent review and assimilation of the published literature on shelf sea and coastal air–sea fluxes from in situ data (Chen et al. 2013) estimates the northwest European shelf net air–sea CO2 flux to be −13.88 TgC yr−1 (for an unknown year), which they estimate to be ~4% of the global net flux due to estuaries and shelves. A recent modeling study (Wakelin et al. 2012) estimates the European shelf average long-term net air–sea CO2 flux to be −39.6 TgC yr−1 based on a 16-yr average from a hydrodynamic-ecosystem model. There are differences in the definition of the northwestern European shelf between all of these studies, and the results in Table 2 illustrate that a precise definition is required. One likely reason for the differences between the model estimates and those presented here is the relatively coarse near-surface vertical resolution of the model.
Each model surface grid cell will represent a volume of water that is typically between 0.1 and 2 m deep, dependent upon the underlying bathymetry (Shutler et al. 2011). This means that the model is unlikely to be able to resolve any near-surface temperature gradients. In contrast the FluxEngine-derived fluxes presented here use satellite SST skin measurements that are observations of the temperature within a thin layer (500 μm) at the waterside of the air–sea interface (Donlon et al. 2002). The concentration of CO2 is highly temperature dependent, so the lack of near-surface vertical resolution within the model could have a large impact on the calculated CO2 air–sea gas fluxes.
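The bathymetry-based region definitions used in this subsection lend themselves to a simple masking operation, sketched below. The sketch assumes that depth is given as positive values (m) below sea level with land as zero or negative, which differs from the sign convention of some bathymetry products, and that any additional region such as the Norwegian Trench is supplied as a separate Boolean mask. All names are hypothetical.

```python
import numpy as np


def shelf_mask(depth_m, max_depth_m, extra_mask=None):
    """Boolean mask of ocean cells shallower than max_depth_m.

    depth_m: water depth, positive downward (assumed convention); land <= 0
    extra_mask: optional Boolean mask of cells to add (e.g., a deep trench)
    """
    mask = (depth_m > 0.0) & (depth_m < max_depth_m)
    if extra_mask is not None:
        mask = mask | extra_mask
    return mask


def masked_net_flux(flux_per_area, cell_area, mask):
    """Sum of flux times area over the masked cells, ignoring missing data."""
    return np.nansum(np.where(mask, flux_per_area * cell_area, 0.0))


def percent_difference(value, reference):
    """Difference of value from reference, as a percentage of the reference."""
    return 100.0 * (value - reference) / abs(reference)
```

Computing `masked_net_flux` for each depth threshold and passing the results to `percent_difference`, with the 200-m result as the reference, gives the kind of sensitivity to region definition discussed above.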
c. Underway in situ data
To demonstrate the flexibility of the system, the air–sea fluxes for a research cruise in the central equatorial Atlantic were calculated using the FluxEngine. In situ fCO2 and associated SST data were downloaded from the community Surface Ocean CO2 Atlas (SOCAT) website (v1.5, tropical Atlantic group) and then preprocessed into the format required by the FluxEngine (using the in situ tool; see the appendix). The FluxEngine was then used to calculate the air–sea CO2 fluxes using the in situ fCO2 data and the temporally and spatially corresponding EO data (the EO data sources are the same as used in sections 4a and 4b). Figure 8a shows the resultant gridded fCO2 data for all SOCAT, v1.5, tropical Atlantic in situ fCO2 data, and Fig. 8b shows the gridded in situ fCO2 data from a single cruise (Bakker 2014) in the tropical Atlantic. Figure 8c shows Earth observation–derived mean gas transfer velocity for October 2000, and Fig. 8d shows the resultant air–sea CO2 fluxes from using Fig. 8b as the input fCO2 data. The fluxes can be seen to vary along the cruise track between a source (positive) and a sink (negative) of CO2. The missing gas transfer velocity data (causing a hiatus in the estimates of air–sea fluxes) are due to missing data in the Earth observation sea surface temperature dataset.
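The spatial binning step that turns sparse underway measurements into a gridded field can be sketched as below. This is an assumed approach rather than the in situ tool itself. It averages all observations that fall in the same 1° × 1° cell and omits the temporal (monthly) binning, which would amount to grouping the observations by month before gridding. The function name is hypothetical.

```python
import numpy as np


def bin_to_grid(lat, lon, values, n_lat=180, n_lon=360):
    """Average point observations onto a regular global grid.

    lat in [-90, 90], lon in [-180, 180); returns (cell means, observation counts).
    Cells with no observations are NaN in the mean field.
    """
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    values = np.asarray(values, dtype=float)
    good = np.isfinite(lat) & np.isfinite(lon) & np.isfinite(values)
    rows = np.floor((lat[good] + 90.0) / 180.0 * n_lat).astype(int)
    rows = np.clip(rows, 0, n_lat - 1)  # keeps lat = +90 inside the grid
    cols = np.floor((lon[good] + 180.0) / 360.0 * n_lon).astype(int) % n_lon
    sums = np.zeros((n_lat, n_lon))
    counts = np.zeros((n_lat, n_lon), dtype=int)
    # np.add.at accumulates correctly when several points share a cell.
    np.add.at(sums, (rows, cols), values[good])
    np.add.at(counts, (rows, cols), 1)
    with np.errstate(invalid="ignore", divide="ignore"):
        means = np.where(counts > 0, sums / counts, np.nan)
    return means, counts
```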
The FluxEngine was run on a single cloud node (2.4-GHz Intel Xeon, 3 GB of memory), and the time taken to process a single-year dataset was determined. All input datasets were quality filtered and converted to a suitable netCDF format in advance of this analysis. A 1-yr climatology of fluxes at 1° × 1° spatial resolution took 40 min (total process time) to complete. Disabling the process indicator layer output reduced this time to 25 min. Running a 5-yr time series (with the indicator layers off) on the single cloud node took ~2 h [i.e., (25 min yr−1 × 5 yr)/(60 min h−1) total process time], whereas repeating the same 5-yr time series processing across five nodes using GoGo list took 25 min (as each year was executed on an independent node and so executed simultaneously). Calculating the integrated net fluxes from the 1-yr climatology netCDF files took 10 min (total process time). So, the total end-to-end time for calculating a 1-yr climatology and then calculating the integrated net fluxes using one node (with all indicator layers turned on) was 50 min.
5. Future developments
The FluxEngine has been developed to allow the study of CO2 and the current version uses the T09 climatology pCO2W data for the waterside component of the calculation. To increase the versatility of the system, work is ongoing to extend the toolbox to use the community SOCAT datasets (e.g., Pfeil et al. 2013; Bakker et al. 2014). There are many other climatically important gases, including nitrous oxide (N2O) and methane (CH4). Partial support for these gases exists within the toolbox, as N2O, CH4, and CO2 are all poorly soluble gases, so their gas transfer velocity parameterizations can be considered as interchangeable with that for CO2. Therefore, the toolbox can be used to generate maps of gas transfer velocity to enable air–sea fluxes of N2O and CH4 to be studied. Similarly, k for the gas DMS can be considered to be the direct component (i.e., nonbubble component) of a k CO2 parameterization, and the toolbox already includes two methods for deriving this direct component. The generic nature of the gas calculation and parameterization lends itself to being extended to determine air–sea fluxes for other gases. For each additional gas, a gas transfer velocity parameterization, an in situ dataset or climatology of in-water concentrations, partial pressures, or fugacity, and suitable solubility equations are required. After CO2 the next largest gas climatology (in terms of the number of in situ data points) is the Lana et al. (2011) DMS climatology, and so future extensions of the FluxEngine are likely to include the addition of a DMS capability through exploiting published work (e.g., Goddijn-Murphy et al. 2012; Johnson 2010; Lana et al. 2011). This will involve exploiting (or linking in) the open source code of Johnson (2010), which can be used to calculate gas transfer velocities.
A full capability to calculate air–sea fluxes for N2O, CH4, and other gases is not yet available because the in-water data collections for these gases are still in their infancy; however, efforts have begun to collate such datasets, for example, the Marine Methane and Nitrous Oxide (MEMENTO) database (Bange 2006).
The toolbox currently uses climatological salinity data from the T09 climatology. Recent advancements in satellite Earth observation have seen the launch of two sensors that can measure surface salinity from space. These are NASA’s Aquarius and ESA’s Soil Moisture and Ocean Salinity (SMOS) missions, and future work will enable SMOS salinity data to be used within the toolbox. A web interface is also being developed that will enable users to create configuration files, execute processing, and download the resulting output.
A flexible air–sea CO2 data processing toolbox called the FluxEngine has been developed and presented. The flux calculation itself is user configurable, and the outputs have been extensively evaluated and compared with reference datasets. No specialist knowledge is required to use the toolbox, and it is based on standard software tools and packages that require no licenses. It is currently installed and running on the Nephelae Cloud at the Institut Français de Recherche pour l'Exploitation de la Mer (Ifremer), where >8 TB of climate-quality data can be used as input to the flux calculations. The use of cloud-computing approaches means that the data processing is scalable, and this feature is completely transparent to the user. Here we have used the toolbox to estimate the 15-yr-average net air–sea flux of CO2 for the global oceans (including shelf seas and coastal zones), the four main oceanic basins, and the European shelf seas. We have shown how subtle differences in the definitions of the European shelf seas can cause differences of >10% in the calculated annual net fluxes. Similarly, differences in the Southern Ocean definition can have an impact on the calculated air–sea flux. We therefore urge the scientific community to use a common set of oceanic region definitions, to allow the outputs from differing studies to be easily compared and contrasted. The FluxEngine provides a mechanism for this, and its open-source nature allows the scientific community to freely exploit the toolbox. It is hoped that the FluxEngine will help to improve the transparency and traceability of results from air–sea gas flux studies. The FluxEngine was originally developed for the ESA OceanFlux Greenhouse Gases project, and it is currently being used to produce air–sea gas flux climatologies and to study method and data uncertainties.
Users can access the version of the toolbox installed on the Nephelae Cloud through the OceanFlux Greenhouse Gases project website (http://www.oceanflux-ghg.org); alternatively, the open-source software is available on GitHub (https://github.com/oceanflux-ghg/FluxEngine).
This work was funded by the European Space Agency (ESA) Support to Science Element (STSE) through the OceanFlux Greenhouse Gases project (Contract 4000104762/11/I-AM) and the U.K. NERC Carbon and Nutrient Dynamics and Fluxes over Shelf Systems (CANDYFLOSS) project (Contract NE/K002058/1). The Surface Ocean CO2 Atlas (SOCAT) is an international effort, supported by the International Ocean Carbon Coordination Project (IOCCP), the Surface Ocean Lower Atmosphere Study (SOLAS), and the Integrated Marine Biogeochemistry and Ecosystem Research program (IMBER), to deliver a uniformly quality-controlled surface ocean CO2 database. The many researchers and funding agencies responsible for the collection of data and quality control are thanked for their contributions to SOCAT.
The FluxEngine Tools, Data, and Generic Gas Transfer Parameterization
Table A1 lists the software utilities that are available in the FluxEngine toolbox. Specific details on some of the tools are given in the following sections.
The net flux utility calculates integrated net fluxes (FIN) over a given region from the monthly mean flux and ice cover data as follows. The input netCDF files are assumed to contain monthly mean net flux F, ice cover, gas transfer velocity, in-water CO2 concentration, and interfacial CO2 concentration. A high-spatial-resolution land mask is required, together with a region definition file if nonglobal regions are to be analyzed. The four main oceanic regions (Atlantic, Pacific, Indian, and Southern Oceans), as defined by the IHO, are provided within the monthly netCDF output (within the process indicator data layers), so these can be used as the region definitions for this utility if desired. A high-spatial-resolution land mask is also provided within the toolbox and its use is described below.
The method for the integrated net flux tool is now described. At each pixel or data element, the integrated net flux is initially calculated from F and the pixel’s total area, which is calculated assuming Earth to be an oblate spheroid. Next, we need to account for the effect of ice, and two methods of ice normalization are available. The first is from T09, which specifies that if ice cover within a data element is <10%, then it has a negligible effect on the integrated net flux and so the ice cover value is assumed to be 0%. Data elements with ice cover >90% are set to 90% to account for leads, polynyas, etc. The net flux for each data element where the ice cover is >0% is then reduced linearly by the percentage of ice cover value. The second ice normalization method (Loose et al. 2009) posits that the integrated net flux from partially ice-covered regions is greater than would be expected from proportionality with ice-free fraction. Here, the integrated net flux is proportional to $(\text{ice-free fraction})^{0.4}$. This implies that a pixel with 90% ice cover (the maximum assumed by T09) contributes approximately 40% of the integrated net flux that it would contribute if ice free. This results in a quadrupling of the estimated net flux from regions of fast ice over that assumed by the T09 method. Users are able to choose which method they prefer to use.
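The two ice normalization options can be sketched as weighting factors applied to the ice-free integrated net flux. This is an illustrative sketch only: the function names are hypothetical, ice cover is expressed as a fraction between 0 and 1, and it is an assumption here that the Loose et al. (2009) scaling is applied without the T09 thresholds.

```python
import numpy as np

def ice_weight_t09(ice_fraction):
    """Open-water weight following the T09 rules described in the text.

    Ice cover below 10% is treated as 0%, ice cover above 90% is capped
    at 90%, and the flux is then reduced linearly with ice cover.
    """
    ice = np.asarray(ice_fraction, dtype=float)
    ice = np.where(ice < 0.10, 0.0, ice)
    ice = np.minimum(ice, 0.90)
    return 1.0 - ice

def ice_weight_loose09(ice_fraction):
    """Weight proportional to (ice-free fraction)**0.4.

    Assumption: no thresholds are applied before the power law.
    """
    ice = np.asarray(ice_fraction, dtype=float)
    return (1.0 - ice) ** 0.4

if __name__ == "__main__":
    for ice in (0.05, 0.5, 0.9):
        print(ice, float(ice_weight_t09(ice)), float(ice_weight_loose09(ice)))
```

At 90% ice cover the second weight is roughly four times the first, which is the quadrupling noted above.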
The data are at a relatively coarse spatial resolution of 1° × 1° (which at the equator is ~111 km × ~111 km). Therefore, to calculate the contribution of each data element to the regional or global integrated net flux, we need to know the proportion of ocean (whether ice covered or not) actually contained within each data element. For most elements this will be 1 (open ocean or sea ice) or 0 (land), but it may be intermediate in data elements that cover region boundaries (e.g., either oceanic boundaries or coastal regions). The oceanic (nonland) proportion of the pixel is multiplied by the ice-corrected integrated net flux to give the contribution to the regional or global integrated net flux.
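A minimal sketch of the area calculation follows, assuming WGS84 ellipsoid parameters (the text does not state which spheroid the toolbox uses) and the standard closed-form expression for the area of a latitude–longitude cell on an oblate spheroid. The function names and the example ocean proportion are hypothetical.

```python
import numpy as np

# Assumed ellipsoid: WGS84 semi-major and semi-minor axes (m)
WGS84_A = 6378137.0
WGS84_B = 6356752.314245

def _zone_term(lat_rad, ecc):
    """Latitude-dependent term of the spheroid zone-area formula."""
    s = np.sin(lat_rad)
    return s / (1.0 - ecc**2 * s**2) + np.log((1.0 + ecc * s) / (1.0 - ecc * s)) / (2.0 * ecc)

def cell_area_m2(lat_south_deg, lat_north_deg, dlon_deg, a=WGS84_A, b=WGS84_B):
    """Area (m^2) of a cell bounded by two latitudes and spanning dlon_deg."""
    ecc = np.sqrt(1.0 - (b / a) ** 2)
    dlon = np.radians(dlon_deg)
    north = _zone_term(np.radians(lat_north_deg), ecc)
    south = _zone_term(np.radians(lat_south_deg), ecc)
    return 0.5 * b**2 * dlon * (north - south)

def ocean_area_m2(lat_south_deg, lat_north_deg, dlon_deg, ocean_proportion):
    """Cell area scaled by the oceanic (nonland) proportion of the cell."""
    return cell_area_m2(lat_south_deg, lat_north_deg, dlon_deg) * ocean_proportion

if __name__ == "__main__":
    # Area of a 1 x 1 degree cell touching the equator, in km^2
    print(cell_area_m2(0.0, 1.0, 1.0) / 1.0e6)
    # The same cell if only 30% of it were ocean (illustrative value)
    print(ocean_area_m2(0.0, 1.0, 1.0, 0.3) / 1.0e6)
```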
Where F data are missing (termed the missing integrated net flux), a first-order correction is made in order not to underestimate the regional net flux. As F is calculated from remotely sensed SST and wind speed, areas with significant ice cover, persistent cloud cover, or some coastal regions are likely to have missing F. We sum the ice-corrected ocean area of such data elements contained within the region of interest and use the areas with measured F to calculate a regional average F, which is then multiplied by the missing area to give an estimate of the integrated net flux from the missing regions. This is added to the integrated net flux to give an estimate of the total regional integrated net flux. Note that the flux from individual regions may not exactly sum to the global region flux using this method, since in one case the regional mean flux is used to estimate the missing flux, while in the other case the global mean flux is used.
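The first-order correction for missing data can be sketched as below. This is an illustration under stated assumptions rather than the toolbox implementation: missing F is marked with NaN, the regional average F is taken to be area weighted, and `eff_area` (a hypothetical name) is the ice-corrected ocean area of each data element, set to zero outside the region of interest.

```python
import numpy as np

def regional_net_flux(flux, eff_area):
    """Return (measured, missing, total) integrated net flux for a region.

    flux     : 2D array of F, NaN where F is missing
    eff_area : 2D array of ice-corrected ocean area inside the region
    """
    flux = np.asarray(flux, dtype=float)
    eff_area = np.asarray(eff_area, dtype=float)
    have = np.isfinite(flux)

    # Integrated flux and area over data elements with measured F
    measured = float(np.sum(flux[have] * eff_area[have]))
    measured_area = float(np.sum(eff_area[have]))

    # Area of data elements in the region with no F
    missing_area = float(np.sum(eff_area[~have]))

    # Regional mean F (area weighted, an assumption) fills the gap
    mean_flux = measured / measured_area if measured_area > 0 else 0.0
    missing = mean_flux * missing_area
    return measured, missing, measured + missing

if __name__ == "__main__":
    f = np.array([[2.0, np.nan], [2.0, 2.0]])
    area = np.ones((2, 2))
    print(regional_net_flux(f, area))
```

Because the mean used to fill the gap depends on the region, applying this function to each basin separately and to the globe as a whole will generally give totals that differ slightly, as noted above.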
The net flux tool also provides gross fluxes and the average values for all spatially varying variables within the netCDF files. The gas transfer velocity k, in-water CO2 concentration ([CO2AQW]), and interfacial CO2 concentration ([CO2AQ0]) are used to calculate the upward and downward integrated gross fluxes. The upward gross flux, FUG, is defined as k[CO2AQW] and the downward gross flux, FDG, is defined as k[CO2AQ0]. Missing data are treated in the same way for these calculations as for integrated net flux.
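The gross flux definitions can be illustrated with a short sketch. The function name is hypothetical, unit conversions are omitted, and the statement that the net flux equals the difference between the two gross fluxes assumes the concentration form of the bulk formulation.

```python
import numpy as np

def gross_fluxes(k, conc_water, conc_interface):
    """Upward and downward gross fluxes for each data element.

    k              : gas transfer velocity
    conc_water     : in-water CO2 concentration, [CO2AQW]
    conc_interface : interfacial CO2 concentration, [CO2AQ0]
    All inputs are assumed to be in mutually consistent units.
    """
    k = np.asarray(k, dtype=float)
    upward = k * np.asarray(conc_water, dtype=float)        # FUG
    downward = k * np.asarray(conc_interface, dtype=float)  # FDG
    return upward, downward

if __name__ == "__main__":
    up, down = gross_fluxes(2.0, 3.0, 1.0)
    # Under the assumed bulk form, net flux = upward - downward
    print(float(up), float(down), float(up - down))
```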
When using the net flux utility, each selected region has its own output data file. Data are output for each month of each year for which input data are supplied, and the annual totals are provided for each year. Net flux outputs include FIN (integrated net flux based on calendar days), missing integrated net flux, and integrated net flux assuming a 30.5-day month, along with similar values for FUG and FDG.
The regridding tool calculates the mean for each 5° × 4° grid cell using the corresponding 1° × 1° grid cells in the input data. No correction is made for differences in area (between the 5° × 4° and the 1° × 1° grids) arising from the presence of land, and any missing (masked) 1° × 1° data are not used in the calculation. If all data in the 1° × 1° grid cells are masked (missing values), then the output 5° × 4° grid cell is also masked. The tool can output the regridded fields as netCDF or as a single ASCII comma-separated values (CSV) file. The output netCDF file simply reflects the contents of the input netCDF with each data array (2D dataset) replaced by its 5° × 4° equivalent, while the optional CSV file contains columns corresponding to latitude, longitude, and all 2D datasets found in the input file.
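The block averaging can be sketched with NumPy as follows. The assumptions here are that missing values are represented by NaN rather than a netCDF mask, that arrays are ordered (latitude, longitude), and that the coarse cell spans 4° of latitude by 5° of longitude; the function name is hypothetical.

```python
import numpy as np

def regrid_block_mean(data, lat_block=4, lon_block=5):
    """Average fine grid cells into coarser blocks, ignoring NaN cells.

    A coarse cell is NaN only if every contributing fine cell is NaN.
    No area weighting is applied, mirroring the description in the text.
    """
    data = np.asarray(data, dtype=float)
    nlat, nlon = data.shape
    if nlat % lat_block or nlon % lon_block:
        raise ValueError("grid size must be a multiple of the block size")

    shape = (nlat // lat_block, lat_block, nlon // lon_block, lon_block)
    valid = np.isfinite(data)

    # Sum and count of valid fine cells in each coarse cell
    sums = np.where(valid, data, 0.0).reshape(shape).sum(axis=(1, 3))
    counts = valid.reshape(shape).sum(axis=(1, 3))

    out = np.full(sums.shape, np.nan)
    np.divide(sums, counts, out=out, where=counts > 0)
    return out

if __name__ == "__main__":
    fine = np.ones((180, 360))        # a global 1 x 1 degree field
    fine[0:4, 0:5] = np.nan           # one fully missing coarse cell
    coarse = regrid_block_mean(fine)
    print(coarse.shape, coarse[0, 0], coarse[0, 1])
```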
This utility, which bins CSV data onto a netCDF grid, assumes that the input data are in CSV format, with headings that include “latitude,” “longitude,” and optionally “date” or “date/time” on the first line, and corresponding values on the subsequent lines. Latitude and longitude must be in decimal degrees, while date or date/time must follow the format DD/MM/(YY)YY [hh:mm(:ss)], where parentheses indicate options and DD is the two-digit day number within the month, MM is the two-digit month number (starting at 01), and (YY)YY is the two- or four-digit year. The resulting output data are in netCDF format, with latitude and longitude limits copied from a reference global netCDF file that is passed as one of the inputs. The output netCDF includes a dataset “count” containing the number of observations found in each grid cell. The user can optionally specify a startTime and endTime (in the same range of formats as date), and then the tool will only count observations on or after startTime and/or before endTime. If the end time is only a date with no time specified, it is assumed to be inclusive; that is, the end time is the end of the specified day. Hence, using the options of startTime “01/01/08” and endTime “31/01/08” would only count observations in January 2008. The data are binned into grid cells and no interpolation of the data is performed. The “time” value in the netCDF is set as follows: If any data contain date/time information, then the time is set to the midpoint of the range of valid times (i.e., times corresponding to data that have been counted). Otherwise, if startTime and endTime are both set, time is set to the midpoint of these. In all other cases, time is taken from the reference netCDF.
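The date handling and binning rules can be sketched as follows. This is a simplified illustration and not the toolbox code: it assumes a fixed global 1° × 1° grid instead of reading limits from a reference netCDF file, returns only the “count” array without writing netCDF, and does not set the “time” value. All function and variable names are hypothetical.

```python
import csv
import io
from datetime import datetime, timedelta

import numpy as np

# Four-digit years are tried first so "2008" is not misread as "20"
_FORMATS = ["%d/%m/%Y %H:%M:%S", "%d/%m/%Y %H:%M", "%d/%m/%Y",
            "%d/%m/%y %H:%M:%S", "%d/%m/%y %H:%M", "%d/%m/%y"]

def parse_when(text):
    """Parse DD/MM/(YY)YY [hh:mm(:ss)]; return (datetime, has_time)."""
    text = text.strip()
    for fmt in _FORMATS:
        try:
            return datetime.strptime(text, fmt), "%H" in fmt
        except ValueError:
            continue
    raise ValueError("unrecognised date/time: %r" % text)

def count_observations(csv_text, start=None, end=None):
    """Count observations per 1 x 1 degree cell within [start, end)."""
    t0 = parse_when(start)[0] if start else None
    t1 = None
    if end:
        t1, has_time = parse_when(end)
        if not has_time:
            t1 += timedelta(days=1)   # a date-only end time is inclusive

    counts = np.zeros((180, 360), dtype=int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        when_text = row.get("date/time") or row.get("date")
        if when_text:
            when = parse_when(when_text)[0]
            if (t0 is not None and when < t0) or (t1 is not None and when >= t1):
                continue
        # Bin without interpolation; clip the +90 and +180 edges
        i = min(int(np.floor(float(row["latitude"]) + 90.0)), 179)
        j = min(int(np.floor(float(row["longitude"]) + 180.0)), 359)
        counts[i, j] += 1
    return counts

SAMPLE = """latitude,longitude,date
10.5,20.5,15/01/08
10.2,20.9,31/01/2008 23:30
10.5,20.5,01/02/08
-45.0,-170.0,01/01/08
"""

if __name__ == "__main__":
    january = count_observations(SAMPLE, start="01/01/08", end="31/01/08")
    print(january.sum(), january[100, 200])
```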
b. Input data currently available
Table 1 gives an overview of the available input datasets.
c. Generic gas transfer parameterization
Several published gas transfer velocity parameterizations (e.g., as used in T09) are of the form
The toolbox provides a more generic polynomial expression that enables a large range of different wind speed–based gas transfer velocity parameterizations to be used (Nightingale et al. 2000; Wanninkhof et al. 2009), for example,
This parameterization allows users to exploit their own wind speed–based gas transfer relationship.
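A generic wind speed polynomial of this kind might be sketched as below. The functional form, a cubic polynomial in 10-m wind speed scaled by a Schmidt number term with an exponent of −0.5, is an assumption made for illustration, as are the function and parameter names. The example coefficients are those commonly quoted for T09, Nightingale et al. (2000), and Wanninkhof et al. (2009) and should be checked against the original publications before use.

```python
def gas_transfer_velocity(u10, schmidt, coeffs, schmidt_ref=660.0):
    """Gas transfer velocity from a cubic polynomial in wind speed.

    u10         : wind speed at 10 m (m s-1)
    schmidt     : Schmidt number of the gas at the in situ temperature
    coeffs      : (a0, a1, a2, a3) polynomial coefficients
    schmidt_ref : reference Schmidt number of the parameterization
    Assumed form: k = (Sc / Sc_ref)**-0.5 * (a0 + a1*U + a2*U**2 + a3*U**3)
    """
    a0, a1, a2, a3 = coeffs
    poly = a0 + a1 * u10 + a2 * u10**2 + a3 * u10**3
    return poly * (schmidt / schmidt_ref) ** -0.5

# Commonly quoted coefficients (assumptions; k in cm h-1)
T09_QUADRATIC = (0.0, 0.0, 0.26, 0.0)        # reference Sc = 660
NIGHTINGALE_2000 = (0.0, 0.333, 0.222, 0.0)  # reference Sc = 600
WANNINKHOF_2009 = (3.0, 0.1, 0.064, 0.011)   # reference Sc = 660

if __name__ == "__main__":
    u = 10.0
    print(gas_transfer_velocity(u, 660.0, T09_QUADRATIC))
    print(gas_transfer_velocity(u, 600.0, NIGHTINGALE_2000, schmidt_ref=600.0))
    print(gas_transfer_velocity(u, 660.0, WANNINKHOF_2009))
```

Setting unused coefficients to zero recovers the simpler quadratic form, so a user-defined relationship only needs to supply its own four coefficients and reference Schmidt number.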
This article is licensed under a Creative Commons Attribution 4.0 license.