Goalist Developers Blog

Finding File Paths in Your AWS S3 Database

Hello, this is Kim from Goalist.

Today I would like to explain an automation I made using AWS (Amazon Web Services) S3 and Python. It is easy to access your files when you don't have much data in your S3 bucket.

However, if you have a large amount of data, or if you need to find specific files, you can't spend your time looking through all your folders to find them.

So this automation helps you look up the paths of all the files you need.

Getting started

In order to access your AWS S3 bucket with Python, you first need to set up your credentials.

Once the AWS CLI (Command Line Interface) is installed, you can configure it in your terminal by simply typing:

aws configure

This will ask you to input your AWS Access Key ID, AWS Secret Access Key, default region name, and default output format, like the following:

AWS Access Key ID [None]: 
AWS Secret Access Key [None]: 
Default region name [None]: 
Default output format [None]: 

You can change these values later by editing the credentials and config files in your .aws folder.

Once you have set up your AWS CLI, you can proceed to access your AWS bucket with boto3 like the following.

import boto3

AWS_BUCKET_NAME = 'your bucket name'
# Use the profile you configured with `aws configure`
session = boto3.session.Session(profile_name='your profile name')
s3 = session.client('s3')

Searching for files

Once you have access to S3 through boto3, it is time to list the objects in your bucket.

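def list_objects(key_prefix):
    # Ask S3 for the objects whose keys start with key_prefix
    res = s3.list_objects_v2(Bucket=AWS_BUCKET_NAME, Prefix=key_prefix)
    # 'Contents' is absent when nothing matches, so fall back to an empty list
    return [obj['Key'] for obj in res.get('Contents', [])]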

As shown above, I used the list_objects_v2 method to get the keys of the objects in my bucket that match a given prefix. Keep in mind that one call returns at most 1,000 keys.
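For prefixes that hold more than 1,000 objects, the listing has to be paginated. The following is a minimal sketch of one way to do that with boto3's paginator for list_objects_v2. The function name list_all_objects and its parameters are my own for illustration and are not part of the original automation; the client is passed in as an argument so the function does not depend on a global variable.

```python
def list_all_objects(s3_client, bucket, key_prefix):
    """Return every key under key_prefix, following pagination."""
    # get_paginator wraps list_objects_v2 and keeps requesting pages
    # until the listing is complete.
    paginator = s3_client.get_paginator('list_objects_v2')
    keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=key_prefix):
        # 'Contents' is missing from a page when no object matches
        keys.extend(obj['Key'] for obj in page.get('Contents', []))
    return keys
```

With the client created earlier, this could be called as list_all_objects(s3, AWS_BUCKET_NAME, 'some/prefix/'), where 'some/prefix/' is a placeholder.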

Next, to find the file paths you need, you can write code that acts as a filter. In my case, I needed files whose paths include a certain string or fall between certain dates.
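A filter like this might look roughly as follows. This is only a sketch under assumptions: it supposes that the date is embedded in the key in YYYY-MM-DD form (for example 'logs/2021-03-15/app.csv'), which may not match your own naming scheme, and the name filter_keys and its parameters are hypothetical. When both a string and a date range are given, a key has to satisfy both.

```python
import re
from datetime import date

# Assumption: keys embed a date as YYYY-MM-DD, e.g. 'logs/2021-03-15/app.csv'
DATE_PATTERN = re.compile(r'(\d{4})-(\d{2})-(\d{2})')

def filter_keys(keys, contains=None, start=None, end=None):
    """Keep keys that contain a substring and/or fall within a date range."""
    matches = []
    for key in keys:
        if contains is not None and contains not in key:
            continue
        if start is not None or end is not None:
            found = DATE_PATTERN.search(key)
            if found is None:
                continue  # no date in the key, so it cannot be in range
            try:
                key_date = date(*map(int, found.groups()))
            except ValueError:
                continue  # digits that do not form a real calendar date
            if start is not None and key_date < start:
                continue
            if end is not None and key_date > end:
                continue
        matches.append(key)
    return matches
```

For example, filter_keys(keys, contains='app', start=date(2021, 3, 1), end=date(2021, 3, 31)) keeps only the keys that contain 'app' and carry a date in March 2021.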

Finally, after getting all the files I need, I write their paths to a CSV file so I can check the paths and download the files I need.
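Writing the paths out can be done with Python's built-in csv module. The sketch below is one simple way to do it, not necessarily how the original automation does it; the function name write_paths_to_csv and the 'path' column header are my own choices.

```python
import csv

def write_paths_to_csv(keys, csv_path):
    """Write one S3 key per row under a 'path' header."""
    # newline='' lets the csv module control line endings itself
    with open(csv_path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(['path'])
        for key in keys:
            writer.writerow([key])
```

Once you have picked a path from the CSV file, the boto3 client's download_file method (called with the bucket name, the key, and a local file name) can fetch that object.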

Wrap up

That's all for my automation and for this post. I'll leave links to the boto3 and AWS CLI documentation so you can check out all the cool commands that you can use.

I'm Kim from Goalist and I will see you soon in another post. Happy Coding :)

Links

Boto3 Documentation: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html

AWS CLI Documentation: https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-quickstart.html