Preprocessing
Python Boto3
The first step is to get the raw data files. The bucket is public, so it can be read anonymously with unsigned requests:
import boto3
import botocore
from botocore import UNSIGNED
from botocore.client import Config

s3 = boto3.resource(
    's3',
    aws_access_key_id='',
    aws_secret_access_key='',
    config=Config(signature_version=UNSIGNED),
)
BUCKET = 'ncei-wcsd-archive'
To find and download the files we need from the bucket, we adapt the approach from this tutorial:
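PREFIX = 'data/processed/SH1305/18kHz/SaKe'

# Filter by prefix on the server instead of listing the whole bucket.
# (str.find() returns 0 for a match at the start of the key, so the
# earlier "> 0" test would have skipped every matching key.)
to_download = [obj.key for obj in s3.Bucket(BUCKET).objects.filter(Prefix=PREFIX)]

for key in to_download:
    # Save each object under its base name in the current directory.
    temp_filename = key.split('/')[-1]
    try:
        s3.Bucket(BUCKET).download_file(key, temp_filename)
    except botocore.exceptions.ClientError as e:
        if e.response['Error']['Code'] == "404":
            print("The object does not exist.")
        else:
            raise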
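Because the full set of files is large, it can help to make the download loop restartable. The following is a minimal sketch, not part of the original workflow: the helper names needs_download and download_missing are hypothetical, and the sketch assumes that comparing the local file size against the object size reported by S3 is a good enough check that a file is complete. It expects a boto3 Bucket resource such as s3.Bucket(BUCKET).

```python
import os


def needs_download(local_path, remote_size):
    """Return True if the local file is missing or its size differs from the remote object."""
    try:
        return os.path.getsize(local_path) != remote_size
    except OSError:
        # The file does not exist (or cannot be read), so fetch it.
        return True


def download_missing(bucket, prefix, dest_dir="."):
    """Download objects under `prefix` that are not already complete in `dest_dir`.

    `bucket` is assumed to be a boto3 Bucket resource. Returns the keys fetched.
    """
    os.makedirs(dest_dir, exist_ok=True)
    fetched = []
    for obj in bucket.objects.filter(Prefix=prefix):
        if obj.key.endswith('/'):
            # Skip zero-byte "directory" placeholder keys.
            continue
        local_path = os.path.join(dest_dir, obj.key.split('/')[-1])
        if needs_download(local_path, obj.size):
            bucket.download_file(obj.key, local_path)
            fetched.append(obj.key)
    return fetched
```

Calling download_missing(s3.Bucket(BUCKET), 'data/processed/SH1305/18kHz/SaKe') a second time after an interruption should then fetch only the files that are absent or whose size does not match.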
AWS Command Line Interface
Alternatively, you can copy an entire directory of files using the AWS command line interface. The following command downloads all of the 18 kHz data from the SH1305 cruise; note that the download is about 60 GB. If you have no AWS credentials configured, add the --no-sign-request flag so that the request is made anonymously.
aws s3 sync s3://ncei-wcsd-archive/data/processed/SH1305/18kHz/ .
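Before starting a download of this size, it may be worth checking how much data sits under the prefix. The sketch below is illustrative only: total_bytes and prefix_size_gb are hypothetical helper names, and it assumes an unsigned boto3 S3 client is passed in. It sums the Size field over the paginated list_objects_v2 results.

```python
def total_bytes(pages):
    """Sum object sizes across list_objects_v2 response pages."""
    # Pages with no matching objects have no 'Contents' entry.
    return sum(obj['Size'] for page in pages for obj in page.get('Contents', []))


def prefix_size_gb(client, bucket, prefix):
    """Return the total size in GB (10**9 bytes) of all objects under `prefix`."""
    paginator = client.get_paginator('list_objects_v2')
    pages = paginator.paginate(Bucket=bucket, Prefix=prefix)
    return total_bytes(pages) / 1e9


# Example usage (requires network access, so it is left commented out):
# import boto3
# from botocore import UNSIGNED
# from botocore.client import Config
# client = boto3.client('s3', config=Config(signature_version=UNSIGNED))
# print(prefix_size_gb(client, 'ncei-wcsd-archive', 'data/processed/SH1305/18kHz/'))
```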