NCEI Water-Column Sonar Data Archive on AWS

Background

Water column sonar data, the acoustic back-scatter from the near-surface to the seafloor, are used to assess physical and biological characteristics of the ocean including the spatial distribution of plankton, fish, methane seeps, and underwater oil plumes.

In collaboration with NOAA's National Marine Fisheries Service (NMFS) and the University of Colorado Boulder, NOAA’s National Centers for Environmental Information (NCEI) established a national archive for water column sonar data. The project ensures the long-term stewardship of well-documented water column sonar data and enables discovery and access for researchers and the public around the world.

Data providers include NOAA National Marine Fisheries Service (NMFS), NOAA Office of Ocean Exploration and Research (OER), NOAA National Ocean Service (NOS), Rolling Deck to Repository (R2R), U.S. academic and private institutions, and international groups.

This data set comprises the water-column sonar data archived at NCEI, offered in a more readily accessible medium. Data provided to NCEI are in their raw format. Processing routines are being applied to a subset of the archive, focusing on Simrad EK60 single- and multiple-frequency datasets. Ping alignment, noise removal algorithms (De Robertis & Higginbottom, 2007; Ryan et al., 2015), and bottom detection algorithms are applied to the raw data, binned into one-hour intervals, using Echoview (Myriax, v.10). The processed data are exported as one CSV file for each interval and each frequency.

Additional Resources

Data

Raw archived data were collected using a variety of vessel-mounted sonars with Kongsberg's EM 122 (12 kHz) and EM 302 (30 kHz), Simrad's EK60 (18-710 kHz, split beam), ME70 (70-120 kHz, can be split beam), and EK80 (18-710 kHz, split beam and broadband) being the most common. The configuration of each cruise's sonar system (e.g., beam type and angle) can be found in the file metadata.

File names contain the start time for that file, often preceded by a tag for the cruise. The UTC timestamp follows the convention ‘D’YYYYMMDD‘-T’hhmmss, where ‘D’ and ‘-T’ are literal characters. For example, “SaKe_2013-D20130522-T134850” indicates a file from a 2013 SaKe cruise that starts on May 22, 2013 at 13:48:50 (UTC).
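The naming convention above can be decoded mechanically. The following is a minimal sketch, not part of the archive's tooling: the function name `parse_start_time` is hypothetical, and it assumes the cruise tag is everything before the first timestamp in the name.

```python
import re
from datetime import datetime, timezone

# Matches the literal "D", an 8-digit date, the literal "-T", and a 6-digit time.
_STAMP = re.compile(r"D(\d{8})-T(\d{6})")


def parse_start_time(filename):
    """Return (cruise_tag, start_time_utc) for an archive file name.

    cruise_tag is whatever precedes the first timestamp (may be empty).
    parse_start_time is a hypothetical helper, not an archive API.
    """
    match = _STAMP.search(filename)
    if match is None:
        raise ValueError(f"no D<date>-T<time> stamp in {filename!r}")
    start = datetime.strptime(match.group(1) + match.group(2), "%Y%m%d%H%M%S")
    tag = filename[:match.start()].rstrip("-_")
    return tag, start.replace(tzinfo=timezone.utc)


tag, start = parse_start_time("SaKe_2013-D20130522-T134850")
print(tag, start.isoformat())  # SaKe_2013 2013-05-22T13:48:50+00:00
```

For file names that span an interval (two timestamps joined by "_to_"), this sketch returns only the first timestamp.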

Type

Data are categorized as raw or processed.

Raw

Binary files are generated during individual cruises. Users would typically use a tool such as pyEcholab to open the files and process the data into a more conventional format.

Processed

Processed EK60 data are the output of a Matlab-Echoview (v.10)-Matlab workflow*, collated by frequency, e.g., 18, 38, 70, 120, and 200 kHz. Within each cruise, individual folders contain CSV files whose headers describe the structure of the underlying data.

*Any use of trade names does not imply endorsement by NOAA

Data Details

The raw EK60 data are processed with the routine below. This routine will be available in pyEcholab in 2020. Processed data are not yet available for all raw data; more will be added over time as they are created.

  • Match ping times

  • Seafloor detection

  • Noise removal including impulse, attenuation, transient, and background noise

  • Re-sample by pings

  • Removal of top 10 m of data due to bubble interference

  • If EK60 data contain multiple frequencies, preprocess with a 3x3 median convolution filter and apply multi-frequency single-beam imaging index outlined in Wall et al. (2016) using a threshold of -66 dB
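To make the last step more concrete, here is a rough sketch of a 3x3 median filter followed by a -66 dB mask. It is not the Echoview or Wall et al. (2016) implementation: the function names are hypothetical, the edge handling (clipping the window at the borders) is an assumption, and masking samples below the threshold is only one plausible reading of how the threshold is applied. The sample values are made up for illustration.

```python
from statistics import median


def median_filter_3x3(grid):
    """Apply a 3x3 median filter to a list-of-lists grid.

    Assumption: windows are clipped at the grid edges rather than padded.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    out = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            window = [
                grid[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, n_rows))
                for cc in range(max(c - 1, 0), min(c + 2, n_cols))
            ]
            row.append(median(window))
        out.append(row)
    return out


def mask_below(grid, threshold_db=-66.0):
    """Replace samples weaker than threshold_db with None (assumed behavior)."""
    return [[v if v >= threshold_db else None for v in row] for row in grid]


# Illustrative values in dB (rows = depth samples, columns = pings).
sv = [
    [-80.0, -80.0, -80.0],
    [-80.0, -30.0, -80.0],  # isolated spike
    [-60.0, -60.0, -60.0],
    [-60.0, -60.0, -60.0],
]
smoothed = median_filter_3x3(sv)
masked = mask_below(smoothed)
print(smoothed[1][1])  # -80.0, the isolated spike is suppressed
print(masked[3])       # [-60.0, -60.0, -60.0]
```

A production workflow would more likely operate on NumPy arrays; plain lists are used here only to keep the sketch dependency-free.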

Structure

Data are archived in an Amazon S3 bucket that is open to the general public. The folder structure is outlined as follows:

  • For processed data: cruise → transducer frequency/bottom/multi-frequency single-beam imaging index → file

  • For raw data: ship → cruise → instrument → file

S3 Bucket: "ncei-wcsd-archive"
├── data
│ ├── processed
│ │ ├── SH1305
│ │ │ ├── 18kHz
│ │ │ │ ├── SaKe_2013-D20130522-T134850.csv
│ │ │ │ ├── SaKe_2013-D20130522-T140446_to_SaKe2013-D20130522-T145239.csv
│ │ │ │ ├── ...
│ │ │ ├── 38kHz
│ │ │ │ ├── ...
│ │ │ ├── 70kHz
│ │ │ │ ├── ...
│ │ │ ├── 120kHz
│ │ │ │ ├── ...
│ │ │ ├── 200kHz
│ │ │ │ ├── ...
│ │ │ ├── bottom
│ │ │ │ ├── SaKe_2013-D20130522-T134850.csv
│ │ │ │ ├── SaKe_2013-D20130522-T140446_to_SaKe2013-D20130522-T145239.csv
│ │ │ │ ├── ...
│ │ │ ├── multifrequency
│ │ │ │ ├── SaKe_2013-D20130522-T134850.csv
│ │ │ │ ├── SaKe_2013-D20130522-T140446_to_SaKe2013-D20130522-T145239.csv
│ │ │ │ ├── ...
│ │ │ ├── ...
│ │ ├── GU1002
│ │ │ ├── ...
│ │ ├── AL0502
│ │ │ ├── ...
│ │ ├── ...
│ ├── raw
│ │ ├── Bell_M_Shimada
│ │ │ ├── SH1305
│ │ │ │ ├── EK60
│ │ │ │ │ ├── SaKe_2013-D20130623-T063450.raw
│ │ │ │ │ ├── SaKe_2013-D20130623-T064452.raw
│ │ │ │ │ ├── SaKe_2013-D20130623-T064452.bot
│ │ │ │ │ ├── ...
│ │ ├── Gordon_Gunter
│ │ │ ├── GU1002
│ │ │ │ ├── EK60
│ │ │ │ │ ├── ...
│ │ ├── Albatross_IV
│ │ │ ├── AL0502
│ │ │ │ ├── EK60
│ │ │ │ │ ├── ...
│ │ │ │ ├── ...
│ │ ├── ...
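The layout above implies that an object key can be split back into its parts. The sketch below assumes every key follows exactly the two patterns shown in the tree; `ArchiveKey` and `parse_key` are hypothetical names introduced for illustration.

```python
from typing import NamedTuple, Optional


class ArchiveKey(NamedTuple):
    kind: str             # "raw" or "processed"
    ship: Optional[str]   # only present for raw data
    cruise: str
    folder: str           # instrument (raw) or frequency/bottom/multifrequency (processed)
    filename: str


def parse_key(key):
    """Split an object key into its components (hypothetical helper).

    Assumes raw keys look like data/raw/<ship>/<cruise>/<instrument>/<file>
    and processed keys look like data/processed/<cruise>/<folder>/<file>.
    """
    parts = key.split("/")
    if len(parts) == 6 and parts[:2] == ["data", "raw"]:
        _, _, ship, cruise, instrument, filename = parts
        return ArchiveKey("raw", ship, cruise, instrument, filename)
    if len(parts) == 5 and parts[:2] == ["data", "processed"]:
        _, _, cruise, folder, filename = parts
        return ArchiveKey("processed", None, cruise, folder, filename)
    raise ValueError(f"unrecognized key layout: {key!r}")


raw = parse_key("data/raw/Bell_M_Shimada/SH1305/EK60/SaKe_2013-D20130623-T063450.raw")
processed = parse_key("data/processed/SH1305/18kHz/SaKe_2013-D20130522-T134850.csv")
print(raw.ship, raw.cruise, raw.folder)    # Bell_M_Shimada SH1305 EK60
print(processed.cruise, processed.folder)  # SH1305 18kHz
```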

To download an 18 kHz file from the SH1305 cruise, such as "SaKe2013-D20130523-T080854_to_SaKe2013-D20130523-T085643.csv", you can read it directly from the URL as follows:

https://ncei-wcsd-archive.s3-us-west-2.amazonaws.com/data/processed/SH1305/18kHz/SaKe2013-D20130523-T080854_to_SaKe2013-D20130523-T085643.csv
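As a sketch of how such a URL might be assembled and read with only the Python standard library, consider the helpers below. The function names are hypothetical, the base URL is copied from the example above, and the "utf-8-sig" encoding is an assumption about the exported CSV files. `read_header` needs network access and is therefore defined but not called here.

```python
import csv
import io
from urllib.parse import quote
from urllib.request import urlopen

# Base URL taken from the example above.
BASE_URL = "https://ncei-wcsd-archive.s3-us-west-2.amazonaws.com"


def processed_url(cruise, folder, filename):
    """Build the URL of a processed CSV file (hypothetical helper)."""
    parts = ("data", "processed", cruise, folder, filename)
    return BASE_URL + "/" + "/".join(quote(part) for part in parts)


def read_header(url, timeout=30):
    """Fetch only the first CSV row (the header) from url.

    Requires network access. The encoding is an assumption.
    """
    with urlopen(url, timeout=timeout) as response:
        text = io.TextIOWrapper(response, encoding="utf-8-sig", newline="")
        return next(csv.reader(text))


url = processed_url(
    "SH1305",
    "18kHz",
    "SaKe2013-D20130523-T080854_to_SaKe2013-D20130523-T085643.csv",
)
print(url)
```

Calling `read_header(url)` would then return the column names, which is a quick way to inspect a file's structure before downloading it in full.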

Citation

If the archived data are used in a future publication, please cite all used data sets to document and provide credit back to the data creators. Cruises have unique citations. See individual cruises for details. Citation information can be found at the NCEI water-column sonar data archive.

Access

Raw and processed data are stored in the cloud in an Amazon Web Services S3 bucket and can be downloaded with a variety of tools.

The boto3 library provides an object-oriented, well-documented interface to the data set. We can configure a boto3 resource to access the bucket, "ncei-wcsd-archive", as an anonymous user by passing an unsigned configuration from botocore.

import boto3, botocore
from botocore import UNSIGNED
from botocore.client import Config

# Unsigned requests give anonymous access; no AWS credentials are needed.
s3 = boto3.resource(
    's3',
    aws_access_key_id='',
    aws_secret_access_key='',
    config=Config(signature_version=UNSIGNED),
)

BUCKET = 'ncei-wcsd-archive'

To download and cache a file while checking for exceptions:

import os

# The object key is specified as:
# key = 'data/raw/Oscar_Dyson/DY1706/EK60/DY1706_EK60-D20170609-T005736.bot'

# The filename is:
file = 'DY1706_EK60-D20170609-T005736.bot'

try:
    # Skip the download if the file is already cached in the working directory.
    if file not in os.listdir('.'):
        s3.Bucket(BUCKET).download_file('data/raw/Oscar_Dyson/DY1706/EK60/' + file, file)
        print('downloaded:', file)
    else:
        print('already found:', file)
except botocore.exceptions.ClientError as e:
    # A 404 means the key does not exist in the bucket; re-raise anything else.
    if e.response['Error']['Code'] == "404":
        print("The object does not exist.")
    else:
        raise

To list the keys of all objects currently in the bucket (use CTRL+C to stop output):

for obj in s3.Bucket(BUCKET).objects.all():
    print(obj.key)
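Listing the whole bucket produces a very long output, so it is often more practical to restrict the listing to one folder. The sketch below uses boto3's `objects.filter(Prefix=...)` for that, then pairs each .raw file with a .bot file of the same name. The function names are hypothetical, and the pairing rule (same file stem) is an assumption based on the example tree. `list_keys` needs boto3 and network access, so boto3 is imported inside the function and the demonstration runs on keys copied from the tree.

```python
import posixpath
from collections import defaultdict


def list_keys(prefix, bucket_name="ncei-wcsd-archive"):
    """Yield object keys under prefix using anonymous access.

    Requires boto3 and network access; imported lazily so the
    pure helper below can be used without boto3 installed.
    """
    import boto3
    from botocore import UNSIGNED
    from botocore.client import Config

    s3 = boto3.resource("s3", config=Config(signature_version=UNSIGNED))
    for obj in s3.Bucket(bucket_name).objects.filter(Prefix=prefix):
        yield obj.key


def pair_raw_with_bottom(keys):
    """Map each .raw key to the .bot key with the same stem, or None."""
    by_stem = defaultdict(dict)
    for key in keys:
        stem, ext = posixpath.splitext(key)
        if ext in (".raw", ".bot"):
            by_stem[stem][ext] = key
    return {
        files[".raw"]: files.get(".bot")
        for files in by_stem.values()
        if ".raw" in files
    }


# Keys copied from the example tree; with network access one could
# instead pass list_keys(prefix) to pair_raw_with_bottom.
prefix = "data/raw/Bell_M_Shimada/SH1305/EK60/"
example_keys = [
    prefix + "SaKe_2013-D20130623-T063450.raw",
    prefix + "SaKe_2013-D20130623-T064452.raw",
    prefix + "SaKe_2013-D20130623-T064452.bot",
]
for raw_key, bot_key in pair_raw_with_bottom(example_keys).items():
    print(raw_key, "->", bot_key)
```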

Tutorials

There are several tutorials that will help you download the data and begin analysis. They utilize both raw and processed data.

Plotting Raw EK60 Data [EK60 Jupyter Notebook]

Frequency Differencing with Raw Data [Frequency Jupyter Notebook]

Reading and Plotting Processed CSV Data [CSV Jupyter Notebook]

Reading and Plotting Raw Bottom Data [Bottom Jupyter Notebook]

References

  • De Robertis, A., and Higginbottom, I. (2007). A post-processing technique to estimate the signal-to-noise ratio and remove echosounder background noise. ICES Journal of Marine Science, 64(6): 1282-1291.

  • Ryan, T.E., Downie, R.A., Kloser, R.J., and Keith, G. (2015). Reducing bias due to noise and attenuation in open-ocean echo integration data. ICES Journal of Marine Science, 72(8): 2482-2493.

  • Simmonds, E.J., and MacLennan, D.N. (2005). Fisheries Acoustics: Theory and Practice. Blackwell Science, Oxford. 456 pp.

  • Wall, C.C. (2016). Building an accessible archive for water column sonar data. Eos, 97. https://doi.org/10.1029/2016EO057595. Published on 15 August 2016.

  • Wall, C.C., Jech, J.M., and McLean, S.J. (2016). Increasing the accessibility of acoustic data through global access and imagery. ICES Journal of Marine Science, 73(8): 2093-2103. https://doi.org/10.1093/icesjms/fsw014.

Contact [email protected] for support with the data set