NCEI Water-Column Sonar Data Archive on AWS
Water-column sonar data, the acoustic backscatter from the near surface to the seafloor, are used to assess physical and biological characteristics of the ocean, including the spatial distribution of plankton, fish, methane seeps, and underwater oil plumes.
In collaboration with NOAA's National Marine Fisheries Service (NMFS) and the University of Colorado Boulder, NOAA's National Centers for Environmental Information (NCEI) established an archive for water-column sonar data. The project ensures the long-term stewardship of well-documented water-column sonar data and enables researchers and the public around the world to discover and access it.
Data providers include NOAA National Marine Fisheries Service (NMFS), NOAA Office of Ocean Exploration and Research (OER), NOAA National Ocean Service (NOS), Rolling Deck to Repository (R2R), U.S. academic and private institutions, and international groups.
This data set comprises the water-column sonar data archived at NCEI, made available in a more readily accessible medium. Data are provided to NCEI in their raw format. Processing routines are being applied to a subset of the archive, focusing specifically on Simrad EK60 single- and multiple-frequency data sets. Ping alignment, noise removal (De Robertis & Higginbottom, 2007; Ryan et al., 2015), and bottom detection algorithms are applied to the raw data, binned into one-hour intervals, using Echoview (Myriax, v.10). The processed data are exported as one CSV file per interval and per frequency.
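The one-hour binning and per-interval export can be pictured with a small Python sketch. This is only an illustration: the real binning happens inside Echoview, and both the clock-hour bin boundaries and the `_to_` naming pattern (inferred from the single example file name given further below) are assumptions. The helper names `bin_pings_by_hour` and `interval_csv_name` are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone


def bin_pings_by_hour(ping_times):
    """Group ping timestamps into one-hour bins keyed by the start of the UTC clock hour.

    ASSUMPTION: bins align to clock hours rather than to the first ping of a file.
    """
    bins = defaultdict(list)
    for t in sorted(ping_times):
        # Truncating to the hour gives the bin key for this ping.
        hour_start = t.replace(minute=0, second=0, microsecond=0)
        bins[hour_start].append(t)
    return dict(bins)


def interval_csv_name(tag, first, last):
    """Build "<tag>-D...-T..._to_<tag>-D...-T....csv" (pattern inferred from one example name)."""
    stamp = "D%Y%m%d-T%H%M%S"
    return f"{tag}-{first.strftime(stamp)}_to_{tag}-{last.strftime(stamp)}.csv"


# Synthetic ping times, not taken from the archive.
pings = [
    datetime(2013, 5, 23, 8, 8, 54, tzinfo=timezone.utc),
    datetime(2013, 5, 23, 8, 56, 43, tzinfo=timezone.utc),
    datetime(2013, 5, 23, 9, 1, 2, tzinfo=timezone.utc),
]
for members in bin_pings_by_hour(pings).values():
    # One CSV name per hourly interval, spanning its first and last ping.
    print(interval_csv_name("SaKe2013", members[0], members[-1]))
```

The first printed name matches the example processed file listed later in this page; whether the archive's end stamp marks the last ping or the start of the last raw file is not stated, so treat that detail as unverified.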
Raw archived data were collected with a variety of vessel-mounted sonars. The most common are Kongsberg's EM 122 (12 kHz) and EM 302 (30 kHz) and Simrad's split-beam echosounders (18-710 kHz), 70-120 kHz systems (which can be split beam), and split-beam broadband echosounders (18-710 kHz). The configuration of each cruise's sonar system (e.g., beam type and angle) is recorded in the file metadata.
File names contain the start time of the file and often include a preceding tag for the cruise. The UTC timestamp follows the convention: the letter D followed by the date as YYYYMMDD, then -T followed by the time as hhmmss (24-hour clock). For example, “SaKe_2013-D20130522-T134850” indicates a file from a 2013 SaKe cruise that starts on May 22, 2013 at 13:48:50 UTC.
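A minimal Python sketch of extracting the start time from a file name, assuming the timestamp is the first run of `D` + eight digits + `-T` + six digits in the name. The function name `parse_file_start_time` is illustrative and not part of any NCEI tool.

```python
import re
from datetime import datetime, timezone

# 'D' + YYYYMMDD, then '-T' + hhmmss (24-hour clock), as in the convention above.
TIMESTAMP_PATTERN = re.compile(r"D(\d{8})-T(\d{6})")


def parse_file_start_time(filename):
    """Return the UTC start time encoded in a water-column sonar file name.

    The first match is used, so for processed names of the form
    "<tag>-D...-T..._to_<tag>-D...-T....csv" this is the start of the interval.
    """
    match = TIMESTAMP_PATTERN.search(filename)
    if match is None:
        raise ValueError(f"No DYYYYMMDD-Thhmmss timestamp in {filename!r}")
    date_part, time_part = match.groups()
    naive = datetime.strptime(date_part + time_part, "%Y%m%d%H%M%S")
    return naive.replace(tzinfo=timezone.utc)  # timestamps are documented as UTC


print(parse_file_start_time("SaKe_2013-D20130522-T134850"))
# 2013-05-22 13:48:50+00:00
```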
Data are categorized as raw or processed.
Processed EK60 data are the output of a Matlab-Echoview (v.10)-Matlab workflow*, collated by frequency (e.g., 18, 38, 70, 120, and 200 kHz). For each cruise, the CSV files include headers that describe the structure of the underlying data.
*Any use of trade names does not imply endorsement by NOAA.
The raw EK60 data are processed with the routine below, which will be available in pyEcholab in 2020. Processed data are not yet available for all raw data, but more will be added over time.
Removal of the top 10 m of data due to bubble interference
If EK60 data contain multiple frequencies, preprocessing with a 3x3 median filter and applying the multi-frequency single-beam imaging index outlined in Wall et al. (2016) with a threshold of -66 dB
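The first two steps of the routine (surface blanking and a 3x3 median filter, followed by a -66 dB threshold) can be sketched with NumPy on synthetic Sv grids. This is an assumption-laden illustration, not the archive's pipeline: the grid layout (pings x depth samples, in dB), the NaN convention for removed data, and especially the combination rule in `multifrequency_mask` (keep a cell only if it passes the threshold at every frequency) are stand-ins chosen for this example and are NOT the published Wall et al. (2016) imaging index. All function names are hypothetical.

```python
import warnings

import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

SURFACE_EXCLUSION_M = 10.0  # top 10 m removed because of bubble interference
THRESHOLD_DB = -66.0        # threshold quoted in the routine above


def remove_surface(sv, depths, exclusion_m=SURFACE_EXCLUSION_M):
    """Return a copy of sv (n_pings x n_samples, dB) with samples shallower than exclusion_m set to NaN."""
    out = np.array(sv, dtype=float)  # np.array copies by default
    out[:, depths < exclusion_m] = np.nan
    return out


def median_filter_3x3(sv):
    """3x3 median filter with edge replication; NaN samples are ignored."""
    windows = sliding_window_view(np.pad(sv, 1, mode="edge"), (3, 3))
    with warnings.catch_warnings():
        # Windows that are entirely NaN (inside the blanked surface layer) stay NaN.
        warnings.simplefilter("ignore", RuntimeWarning)
        return np.nanmedian(windows, axis=(-2, -1))


def multifrequency_mask(sv_by_freq, threshold_db=THRESHOLD_DB):
    """Stand-in rule (NOT the Wall et al. 2016 index): keep a cell only if its
    median-filtered Sv meets the threshold at every frequency."""
    with np.errstate(invalid="ignore"):
        passes = [median_filter_3x3(sv) >= threshold_db for sv in sv_by_freq.values()]
    return np.logical_and.reduce(passes)


# Synthetic example: background near -80 dB with one strong patch at 30-34 m.
rng = np.random.default_rng(0)
depths = np.arange(0.0, 50.0, 1.0)  # 50 samples at 1 m spacing
sv = {freq: remove_surface(rng.normal(-80.0, 5.0, size=(20, depths.size)), depths)
      for freq in (18, 38)}
for grid in sv.values():
    grid[5:10, 30:35] = -50.0  # synthetic scatterer visible at both frequencies
mask = multifrequency_mask(sv)
print(mask[7, 32], mask[0, 5])  # True False
```

The median filter suppresses isolated spikes before thresholding, so a single noisy sample cannot pass the mask on its own; the real index should be taken from Wall et al. (2016) or pyEcholab.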
Data are archived in a publicly accessible Amazon S3 bucket. The folder structure is as follows:
For processed data: cruise → product folder (transducer frequency, bottom, or multi-frequency single-beam imaging index) → file
For raw data: ship → cruise → instrument → file
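The two hierarchies can be expressed as object-key builders. This is a sketch only: the top-level prefixes `data/raw` and `data/processed`, the folder name `18kHz`, and the raw file name used below are assumptions for illustration, so list the bucket to confirm the real names before relying on them.

```python
from pathlib import PurePosixPath

# ASSUMPTION: illustrative top-level prefixes; confirm by listing the bucket.
RAW_PREFIX = "data/raw"
PROCESSED_PREFIX = "data/processed"


def raw_key(ship, cruise, instrument, filename):
    """Object key for raw data: ship -> cruise -> instrument -> file."""
    return str(PurePosixPath(RAW_PREFIX, ship, cruise, instrument, filename))


def processed_key(cruise, product, filename):
    """Object key for processed data: cruise -> product folder -> file.

    product is a frequency, bottom, or imaging-index folder (names assumed).
    """
    return str(PurePosixPath(PROCESSED_PREFIX, cruise, product, filename))


print(processed_key("SH1305", "18kHz",
                    "SaKe2013-D20130523-T080854_to_SaKe2013-D20130523-T085643.csv"))
print(raw_key("Example_Ship", "SH1305", "EK60", "example.raw"))  # hypothetical names
```

`PurePosixPath` always joins with forward slashes, which matches S3 key syntax regardless of the local operating system.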
To download an 18 kHz file from the SH1305 cruise, such as "SaKe2013-D20130523-T080854_to_SaKe2013-D20130523-T085643.csv", you can read it directly from its URL as follows:
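A standard-library sketch of reading a processed CSV straight from its public HTTPS URL. The virtual-hosted S3 URL form is standard, but the object key (`data/processed/SH1305/18kHz/...`) is an assumption, as are the helper names `object_url` and `read_csv_from_url`.

```python
import csv
import io
import urllib.parse
import urllib.request

BUCKET_URL = "https://noaa-wcsd-pds.s3.amazonaws.com"

# ASSUMPTION: illustrative key layout (processed/<cruise>/<frequency>/<file>).
EXAMPLE_KEY = (
    "data/processed/SH1305/18kHz/"
    "SaKe2013-D20130523-T080854_to_SaKe2013-D20130523-T085643.csv"
)


def object_url(key, base_url=BUCKET_URL):
    """Return the public HTTPS URL for an object key (unsafe characters percent-encoded)."""
    return f"{base_url}/{urllib.parse.quote(key)}"


def read_csv_from_url(url, timeout=60):
    """Stream a CSV over HTTPS and return (header_row, data_rows)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # utf-8-sig also accepts a leading byte-order mark if one is present.
        text = io.TextIOWrapper(response, encoding="utf-8-sig", newline="")
        reader = csv.reader(text)
        header = next(reader)
        rows = list(reader)
    return header, rows
```

Usage: `header, rows = read_csv_from_url(object_url(EXAMPLE_KEY))`. If pandas is installed, `pandas.read_csv(object_url(EXAMPLE_KEY))` reads the same URL into a DataFrame.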
Raw and processed data are stored in an Amazon Web Services (AWS) S3 bucket and can be downloaded with a variety of tools.
To download and cache a file while checking for exceptions:
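A minimal standard-library sketch of downloading with a local cache and explicit error handling. The cache directory name `wcsd_cache` and the helper `download_cached` are hypothetical; files are cached by the last path segment of the URL, which assumes file names are unique within the cache.

```python
import shutil
import urllib.error
import urllib.request
from pathlib import Path


def download_cached(url, cache_dir="wcsd_cache", timeout=60):
    """Download url into cache_dir unless a cached copy already exists; return the local Path."""
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    target = cache / url.rstrip("/").rsplit("/", 1)[-1]
    if target.exists():
        return target  # cache hit: no network request is made

    # Download to a ".part" file first so an interrupted transfer never looks cached.
    partial = target.with_name(target.name + ".part")
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response, open(partial, "wb") as out:
            shutil.copyfileobj(response, out)
    except urllib.error.HTTPError as err:
        # Server answered with an error status, e.g. 403/404 for a missing key.
        partial.unlink(missing_ok=True)
        raise RuntimeError(f"HTTP {err.code} while downloading {url}") from err
    except OSError as err:
        # URLError (DNS/connection failures), timeouts, and local write errors.
        partial.unlink(missing_ok=True)
        raise RuntimeError(f"Download of {url} failed: {err}") from err
    partial.replace(target)
    return target
```

Calling `download_cached(url)` twice transfers the file once; the second call returns the cached path immediately. `HTTPError` is caught before the broader `OSError` because it is a subclass of it.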
Several tutorials will help you download the data and begin analysis. They use both raw and processed data.
Updated tutorials that use cloud-native Zarr data from NOAA Open Data Dissemination (NODD) are also available.
De Robertis, A., & Higginbottom, I. (2007). A post-processing technique to estimate the signal-to-noise ratio and remove echosounder background noise. ICES Journal of Marine Science, 64(6): 1282-1291.
Ryan, T.E., Downie, R.A., Kloser, R.J., & Keith, G. (2015). Reducing bias due to noise and attenuation in open-ocean echo integration data. ICES Journal of Marine Science, 72(8): 2482-2493.
Raw data are binary files generated during individual cruises. Users typically open them with dedicated acoustic-processing software and convert the data into a more conventional format.
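To illustrate what "binary" means here, the sketch below walks the datagram framing of a Simrad EK60 .raw file in pure Python. It assumes the commonly documented EK60 layout: a little-endian int32 length, a 4-character datagram type (such as CON0, NME0, or RAW0), an 8-byte Windows NT timestamp counting 100-ns intervals since 1601-01-01 UTC, the datagram body, and the length repeated. It does not decode the bodies, and the helper names are hypothetical; for real processing a maintained reader such as pyEcholab or echopype is the better choice. The example runs on synthetic bytes, not an archived file.

```python
import struct
from datetime import datetime, timedelta, timezone

# Windows NT FILETIME epoch used by the assumed EK60 datagram header.
NT_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)
HEADER_BYTES = 12  # 4-byte type + two uint32 time words


def iter_datagrams(data):
    """Yield (datagram_type, timestamp, body_bytes) for each datagram in raw EK60 bytes."""
    offset = 0
    while offset + 4 <= len(data):
        (length,) = struct.unpack_from("<i", data, offset)
        start = offset + 4
        end = start + length
        if length < HEADER_BYTES or end + 4 > len(data):
            raise ValueError(f"truncated or corrupt datagram at byte {offset}")
        dg_type = data[start:start + 4].decode("ascii")
        low, high = struct.unpack_from("<II", data, start + 4)
        ticks = (high << 32) | low  # 100-ns intervals since 1601-01-01 UTC
        timestamp = NT_EPOCH + timedelta(microseconds=ticks // 10)
        (trailer,) = struct.unpack_from("<i", data, end)
        if trailer != length:
            raise ValueError(f"length trailer mismatch at byte {offset}")
        yield dg_type, timestamp, data[start + HEADER_BYTES:end]
        offset = end + 4


def make_datagram(dg_type, when, body=b""):
    """Build one synthetic datagram in the assumed layout (for trying the reader only)."""
    ticks = ((when - NT_EPOCH) // timedelta(microseconds=1)) * 10
    payload = dg_type.encode("ascii") + struct.pack("<II", ticks & 0xFFFFFFFF, ticks >> 32) + body
    return struct.pack("<i", len(payload)) + payload + struct.pack("<i", len(payload))


when = datetime(2013, 5, 23, 8, 8, 54, tzinfo=timezone.utc)
sample = make_datagram("CON0", when, b"config") + make_datagram("NME0", when, b"$GPGGA")
for dg_type, timestamp, body in iter_datagrams(sample):
    print(dg_type, timestamp.isoformat(), len(body))
```

Checking the trailing length against the leading one is a cheap way to detect a truncated or corrupted file before attempting to decode any datagram body.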
The noise removal applied to processed EK60 data targets impulse noise, attenuated signal, transient noise, and background noise.
If the archived data are used in a publication, please cite every data set used, to document the work and credit the data creators. Each cruise has its own unique citation, and citation information is provided with the archive.
The boto3 library provides an object-oriented, well-documented interface to AWS services, including this data set's S3 bucket. We can configure a boto3 resource to access the bucket "noaa-wcsd-pds" as an anonymous user by using low-level settings from botocore.
Plotting Raw EK60 Data
Frequency Differencing with Raw Data
Reading and Plotting Processed CSV Data
Reading and Plotting Raw Bottom Data
Echofish — Echopype EK60 Cloud Processing
Echofish — Frequency Differencing with L2 EK60 Data
Echofish — Geospatial Indexing
Simmonds, E.J., & MacLennan, D.N. (2005). Blackwell Science, Oxford. 456 pp.
Wall, C.C. (2016). Building an accessible archive for water column sonar data. Eos, 97. Published 15 August 2016.
Wall, C.C., Jech, J.M., & McLean, S.J. (2016). Increasing the accessibility of acoustic data through global access and imagery. ICES Journal of Marine Science, 73(8): 2093-2103.