NCEI Water-Column Sonar Data Archive on AWS
Water column sonar data, the acoustic back-scatter from the near-surface to the seafloor, are used to assess physical and biological characteristics of the ocean including the spatial distribution of plankton, fish, methane seeps, and underwater oil plumes.
In collaboration with NOAA's National Marine Fisheries Service (NMFS) and the University of Colorado Boulder, NOAA’s National Centers for Environmental Information (NCEI) established a national archive for water column sonar data. This project ensures the long-term stewardship of well-documented water column sonar data and enables researchers and the public around the world to discover and access them.
This data set comprises the water-column sonar data archived at NCEI in a more readily accessible medium. Data provided to NCEI are in their raw format. Processing routines are being applied to a subset of the archive, specifically focusing on Simrad EK60 single- and multiple-frequency datasets. Ping alignment, noise removal algorithms (De Robertis & Higginbottom, 2007; Ryan et al., 2015), and bottom detection algorithms are applied to the raw data, binned into one-hour intervals, using Echoview (Myriax, v.10). The processed data are exported as one CSV file for each interval and each frequency.
Raw archived data were collected using a variety of vessel-mounted sonars with Kongsberg's EM 122 (12 kHz) and EM 302 (30 kHz), Simrad's EK60 (18-710 kHz, split beam), ME70 (70-120 kHz, can be split beam), and EK80 (18-710 kHz, split beam and broadband) being the most common. The configuration of each cruise's sonar system (e.g., beam type and angle) can be found in the file metadata.
File names contain the start time of the file, often preceded by a tag for the cruise. The timestamp is in UTC and follows the convention DYYYYMMDD-Thhmmss, where “D” and “-T” are literal characters that mark the date and the time. For example, “SaKe_2013-D20130522-T134850” indicates a file from a 2013 SaKe cruise that starts on May 22, 2013 at 13:48:50 (UTC).
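The naming convention above can be read programmatically. The following is a minimal sketch, not part of the archive's tooling: the function name `parse_start_time` is hypothetical, and it assumes the first date and time stamp in a name is the start time, as described above.

```python
import re
from datetime import datetime, timezone

# Literal "D", eight date digits, literal "-T", six time digits.
_STAMP = re.compile(r"D(\d{8})-T(\d{6})")


def parse_start_time(file_name):
    """Return the first UTC timestamp embedded in an archive file name.

    Hypothetical helper: it only relies on the D<date>-T<time> pattern
    and ignores the cruise tag and file extension.
    """
    match = _STAMP.search(file_name)
    if match is None:
        raise ValueError(f"no D<date>-T<time> stamp in {file_name!r}")
    date_part, time_part = match.groups()
    stamp = datetime.strptime(date_part + time_part, "%Y%m%d%H%M%S")
    # The archive states that timestamps are in UTC.
    return stamp.replace(tzinfo=timezone.utc)


start = parse_start_time("SaKe_2013-D20130522-T134850")
print(start.isoformat())
```

For processed files whose names span an interval (two stamps joined by "_to_"), this sketch returns the first stamp only.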
Data are categorized as raw or processed.
Raw data are binary files generated during individual cruises. Users would typically use a tool such as pyEcholab to open the files and process the data into a more conventional format.
Processed EK60 data are the output of a Matlab-Echoview (v.10)-Matlab workflow*, collated by frequency, e.g., 18, 38, 70, 120, and 200 kHz. Within each cruise, the individual folders contain CSV files formatted with headers that describe the structure of the underlying data.
*Any use of trade names does not imply endorsement by NOAA.
The raw EK60 data are processed with the routine below. This routine will be available in pyEcholab in 2020. Processed data are not available for all raw data; more will be added over time as they are created.
- Removal of top 10 m of data due to bubble interference
- If EK60 data contain multiple frequencies, preprocess with a 3x3 median convolution filter and apply the multi-frequency single-beam imaging index outlined in Wall et al. (2016) using a threshold of -66 dB
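Two of the steps above, the removal of the top 10 m and the 3x3 median filter, can be illustrated with a short sketch. This is not the archive's Matlab-Echoview workflow. It assumes the data are held as a list of rows (one row per depth sample, one column per ping), that samples at exactly 10 m are kept, and that the filter window is truncated at the grid edges. The function names are hypothetical, and the multi-frequency single-beam imaging index with its -66 dB threshold is not implemented here.

```python
from statistics import median


def drop_surface(depths_m, rows, min_depth_m=10.0):
    """Discard samples shallower than min_depth_m (bubble interference).

    depths_m: depth in metres of each row; rows: one list of values per
    depth sample. Keeping samples at exactly min_depth_m is an assumption.
    """
    kept = [(d, row) for d, row in zip(depths_m, rows) if d >= min_depth_m]
    return [d for d, _ in kept], [row for _, row in kept]


def median_filter_3x3(grid):
    """Replace each cell with the median of its 3x3 neighbourhood.

    At the edges the window is truncated to the cells that exist, which is
    one of several possible edge conventions (an assumption here).
    """
    n_rows = len(grid)
    n_cols = len(grid[0]) if grid else 0
    filtered = []
    for r in range(n_rows):
        filtered_row = []
        for c in range(n_cols):
            window = [
                grid[rr][cc]
                for rr in range(max(0, r - 1), min(n_rows, r + 2))
                for cc in range(max(0, c - 1), min(n_cols, c + 2))
            ]
            filtered_row.append(median(window))
        filtered.append(filtered_row)
    return filtered


# A single isolated spike in an otherwise uniform grid is smoothed away.
depths, values = drop_surface([5.0, 10.0, 15.0, 20.0],
                              [[-40, -40, -40],
                               [-70, -70, -70],
                               [-70, -30, -70],
                               [-70, -70, -70]])
smoothed = median_filter_3x3(values)
```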
Data are archived in an Amazon S3 bucket that is open to the general public. The folder structure is outlined as follows:
- For processed data: cruise → transducer frequency/bottom/multi-frequency single-beam imaging index → file
- For raw data: ship → cruise → instrument → file
S3 Bucket: "noaa-wcsd-pds"
├── data
│ ├── processed
│ │ ├── SH1305
│ │ │ ├── 18kHz
│ │ │ │ ├── SaKe_2013-D20130522-T134850.csv
│ │ │ │ ├── SaKe_2013-D20130522-T140446_to_SaKe2013-D20130522-T145239.csv
│ │ │ │ ├── ...
│ │ │ ├── 38kHz
│ │ │ │ ├── ...
│ │ │ ├── 70kHz
│ │ │ │ ├── ...
│ │ │ ├── 120kHz
│ │ │ │ ├── ...
│ │ │ ├── 200kHz
│ │ │ │ ├── ...
│ │ │ ├── bottom
│ │ │ │ ├── SaKe_2013-D20130522-T134850.csv
│ │ │ │ ├── SaKe_2013-D20130522-T140446_to_SaKe2013-D20130522-T145239.csv
│ │ │ │ ├── ...
│ │ │ ├── multifrequency
│ │ │ │ ├── SaKe_2013-D20130522-T134850.csv
│ │ │ │ ├── SaKe_2013-D20130522-T140446_to_SaKe2013-D20130522-T145239.csv
│ │ │ │ ├── ...
│ │ │ ├── ...
│ │ ├── GU1002
│ │ │ ├── ...
│ │ ├── AL0502
│ │ │ ├── ...
│ │ ├── ...
│ ├── raw
│ │ ├── Bell_M_Shimada
│ │ │ ├── SH1305
│ │ │ │ ├── EK60
│ │ │ │ │ ├── SaKe_2013-D20130623-T063450.raw
│ │ │ │ │ ├── SaKe_2013-D20130623-T064452.raw
│ │ │ │ │ ├── SaKe_2013-D20130623-T064452.bot
│ │ │ │ │ ├── ...
│ │ ├── Gordon_Gunter
│ │ │ ├── GU1002
│ │ │ │ ├── EK60
│ │ │ │ │ ├── ...
│ │ ├── Albatross_IV
│ │ │ ├── AL0502
│ │ │ │ ├── EK60
│ │ │ │ │ ├── ...
│ │ │ │ ├── ...
│ │ ├── ...
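The folder structure above maps directly onto S3 object keys. As a minimal sketch, the two hypothetical helpers below (`raw_key` and `processed_key` are illustrative names, not part of any library) assemble a key from its components, assuming every key starts with the "data" prefix shown in the tree.

```python
def raw_key(ship, cruise, instrument, file_name):
    """Key for a raw file: data/raw/<ship>/<cruise>/<instrument>/<file>."""
    return "/".join(["data", "raw", ship, cruise, instrument, file_name])


def processed_key(cruise, category, file_name):
    """Key for a processed file: data/processed/<cruise>/<category>/<file>.

    category is a frequency folder such as "18kHz", or "bottom" or
    "multifrequency", following the tree above.
    """
    return "/".join(["data", "processed", cruise, category, file_name])


print(raw_key("Bell_M_Shimada", "SH1305", "EK60",
              "SaKe_2013-D20130623-T063450.raw"))
print(processed_key("SH1305", "bottom", "SaKe_2013-D20130522-T134850.csv"))
```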
To download an 18 kHz file from the SH1305 cruise, such as "SaKe2013-D20130523-T080854_to_SaKe2013-D20130523-T085643.csv", you can read it directly from its URL.
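One way to read such a file over HTTPS is sketched below using only the Python standard library. The URL is an assumption: it is assembled from the bucket name and the folder structure documented above, using the common "<bucket>.s3.amazonaws.com/<key>" form, and it presumes the bucket allows anonymous HTTPS reads and that the CSV is UTF-8 encoded. The function names are hypothetical.

```python
import csv
import io
import itertools
import urllib.request
from urllib.parse import quote

BUCKET = "noaa-wcsd-pds"
# Assumed key, built from the documented layout: cruise, frequency, file.
KEY = ("data/processed/SH1305/18kHz/"
       "SaKe2013-D20130523-T080854_to_SaKe2013-D20130523-T085643.csv")


def object_url(bucket, key):
    """Build a virtual-hosted-style HTTPS URL for a public S3 object."""
    return f"https://{bucket}.s3.amazonaws.com/{quote(key)}"


def read_csv_rows(url, max_rows=5):
    """Stream the CSV at url and return its first max_rows rows as lists."""
    with urllib.request.urlopen(url) as response:
        text = io.TextIOWrapper(response, encoding="utf-8", newline="")
        return list(itertools.islice(csv.reader(text), max_rows))


URL = object_url(BUCKET, KEY)
# Network access is required for the call below, so it is left commented out:
# for row in read_csv_rows(URL):
#     print(row)
```

The first row returned should be the header row that describes the structure of the file.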
If the archived data are used in a future publication, please cite every data set used, to document and give credit to the data creators. Each cruise has a unique citation; see individual cruises for details. Citation information can be found at the NCEI water-column sonar data archive.
Raw and processed data are stored in the cloud in an Amazon Web Services S3 bucket and are accessible for download using a variety of tools.
import boto3
import botocore
from botocore import UNSIGNED
from botocore.client import Config

# The bucket is public, so requests are sent unsigned (no credentials needed).
s3 = boto3.resource(
    's3',
    aws_access_key_id='',
    aws_secret_access_key='',
    config=Config(signature_version=UNSIGNED),
)
BUCKET = 'noaa-wcsd-pds'
To download and cache a file while checking for exceptions:
import os
# The object key is specified as:
# key = 'data/raw/Oscar_Dyson/DY1706/EK60/DY1706_EK60-D20170609-T005736.bot'
# The filename is:
file = 'DY1706_EK60-D20170609-T005736.bot'
try:
    if file not in os.listdir('.'):
        s3.Bucket(BUCKET).download_file('data/raw/Oscar_Dyson/DY1706/EK60/' + file, file)
        print('downloaded:', file)
    else:
        print('already found:', file)
except botocore.exceptions.ClientError as e:
    if e.response['Error']['Code'] == "404":
        print("The object does not exist.")
    else:
        raise
There are several tutorials that will help you download the data and begin analysis. They utilize both raw and processed data.
- De Robertis, A., & Higginbottom, I. (2007). A post-processing technique to estimate the signal-to-noise ratio and remove echosounder background noise. ICES Journal of Marine Science, 64(6): 1282-1291.
- Ryan, T.E., Downie, R.A., Kloser, R.J., & Keith, G. (2015). Reducing bias due to noise and attenuation in open-ocean echo integration data. ICES Journal of Marine Science, 72(8): 2482-2493.
- Simmonds, E.J., & MacLennan, D.N. (2005). Fisheries Acoustics: Theory and Practice. Blackwell Science, Oxford. 456 pp.
- Wall, C.C. (2016). Building an accessible archive for water column sonar data. Eos, 97. https://doi.org/10.1029/2016EO057595. Published on 15 August 2016.
- Wall, C.C., Jech, J.M., & McLean, S.J. (2016). Increasing the accessibility of acoustic data through global access and imagery. ICES Journal of Marine Science, 73(8): 2093-2103. https://doi.org/10.1093/icesjms/fsw014.
Contact [email protected] for support with the data set