Preprocessing

Python Boto3

The first step is to download the raw data files. The bucket is public, so the files can be read with anonymous (unsigned) requests:

import boto3
import botocore
from botocore import UNSIGNED
from botocore.client import Config

# Unsigned requests give anonymous access to the public bucket,
# so no access key or secret key needs to be supplied.
s3 = boto3.resource('s3', config=Config(signature_version=UNSIGNED))

BUCKET = 'ncei-wcsd-archive'

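To list the files to download, we filter the bucket's objects by key prefix and then fetch each match (details adapted from this tutorial):

# Keys in this bucket begin with 'data/...', so a test such as
# key.find(...) > 0 never matches (find returns 0 at the start of the key).
# Filtering by prefix also avoids listing every object in the bucket.
PREFIX = 'data/processed/SH1305/18kHz/SaKe'
to_download = [obj.key for obj in s3.Bucket(BUCKET).objects.filter(Prefix=PREFIX)]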

for KEY in to_download:
    try:
        temp_filename = KEY.split('/')[-1]
        s3.Bucket(BUCKET).download_file(KEY, temp_filename)
    except botocore.exceptions.ClientError as e:
        if e.response['Error']['Code'] == "404":
            print("The object does not exist.")
        else:
            raise
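
Because the download loop above can be interrupted partway through a large set of files, it may help to skip files that are already on disk. The following is a minimal sketch, not part of the original tutorial: `pending_downloads` and its `dest_dir` parameter are hypothetical names, and the check only tests that a local file exists, not that it is complete.

```python
import os


def pending_downloads(keys, dest_dir="."):
    """Return (key, local_path) pairs for keys with no local file yet.

    Hypothetical helper: the local filename is the last path component
    of the S3 key, matching the naming used in the download loop.
    """
    pending = []
    for key in keys:
        local_path = os.path.join(dest_dir, key.split("/")[-1])
        # Only an existence check; a truncated file would still be skipped.
        if not os.path.exists(local_path):
            pending.append((key, local_path))
    return pending
```

With the names defined earlier, the loop could then iterate over `pending_downloads(to_download)` and call `s3.Bucket(BUCKET).download_file(key, local_path)` for each pair.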

AWS Command Line Interface

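Alternatively, you can copy an entire directory of files with the AWS command line interface. The following command downloads all of the 18 kHz data from the SH1305 cruise into the current directory. Note that the download is about 60 GB. If you have no AWS credentials configured, add the --no-sign-request option to access the bucket anonymously.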

aws s3 sync s3://ncei-wcsd-archive/data/processed/SH1305/18kHz/ .
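
Before starting a download of that size, it can be useful to total the object sizes under a prefix. The sketch below is an illustration under stated assumptions: `total_size_gb` is a hypothetical helper, it relies on each object exposing a `size` attribute in bytes (as boto3 object summaries do), and it reports decimal gigabytes (10**9 bytes). The `SimpleNamespace` objects are stand-ins so the example runs without network access.

```python
from types import SimpleNamespace


def total_size_gb(objects):
    """Sum the .size attribute (bytes) of each object and convert to GB."""
    return sum(obj.size for obj in objects) / 1e9


# Stand-in objects for illustration only; real input would come from S3.
fake_objects = [
    SimpleNamespace(size=1_500_000_000),
    SimpleNamespace(size=500_000_000),
]
print(f"{total_size_gb(fake_objects):.1f} GB")
```

With the boto3 resource from the previous section, the same function could be applied to `s3.Bucket(BUCKET).objects.filter(Prefix='data/processed/SH1305/18kHz/')`.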
