In this project, I used Python, SQL, and AWS to design and develop an ETL pipeline that extracts data from the official Census ACS5 Survey API and visualizes it. The project is an exercise in developing data architectures that give easy access to real data, which can provide valuable insights to drive positive change in education outcomes in the US.
Overview
Extract, Transform, Load:
In this phase, I will write an AWS Lambda function that extracts multiple datasets from the API, processes them, and stores them as .csv files on Amazon S3.
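As a rough illustration of this phase, the sketch below shows one way such a Lambda function could be shaped. It is not the project's actual code: the survey year (2021), the variable code, the bucket name, the S3 key layout, and the shape of the `event` payload are all assumptions made for the example, and the "processing" step is reduced to converting the API's JSON rows into CSV text.

```python
import csv
import io
import json
import urllib.parse
import urllib.request

# Assumptions for illustration only: survey year and bucket name.
BASE_URL = "https://api.census.gov/data/2021/acs/acs5"
BUCKET = "example-acs5-bucket"  # hypothetical bucket name


def build_url(variables, geography="state:*"):
    """Build an ACS5 query URL for a list of variable codes."""
    params = {"get": ",".join(["NAME", *variables]), "for": geography}
    # Keep commas, colons, and asterisks readable instead of percent-encoding them.
    return f"{BASE_URL}?{urllib.parse.urlencode(params, safe=',:*')}"


def rows_to_csv(rows):
    """Convert the API's array-of-arrays response (header row first) to CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


def lambda_handler(event, context):
    """Fetch each requested dataset and store it as a .csv object on S3.

    Assumes a hypothetical event such as:
    {"datasets": {"bachelors_degree": ["B15003_022E"]}}
    """
    import boto3  # imported lazily so the helpers above run without AWS libraries

    s3 = boto3.client("s3")
    datasets = event.get("datasets", {})
    for name, variables in datasets.items():
        with urllib.request.urlopen(build_url(variables), timeout=30) as response:
            rows = json.load(response)
        s3.put_object(
            Bucket=BUCKET,
            Key=f"raw/{name}.csv",  # hypothetical key layout
            Body=rows_to_csv(rows).encode("utf-8"),
        )
    return {"written": list(datasets)}
```

Keeping the URL building and CSV conversion in small pure functions makes them easy to check locally before deploying, while the handler itself only wires the network call to the S3 upload.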
Dimensional Modeling:
I will then write SQL queries in Amazon Athena to organize the data and create materialized views for each measure and dimension.
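A minimal sketch of how one such query might be submitted from Python is shown below. The table, view, column, and database names are hypothetical, and the example creates an ordinary Athena view with `CREATE OR REPLACE VIEW` rather than a materialized one, so treat it as a stand-in for whatever the real queries look like.

```python
def build_view_sql(view_name, source_table, measure_column):
    """Build a statement that exposes one measure as its own view.

    All identifiers here are hypothetical placeholders.
    """
    return (
        f"CREATE OR REPLACE VIEW {view_name} AS\n"
        f"SELECT state, name, CAST({measure_column} AS bigint) AS value\n"
        f"FROM {source_table}"
    )


def run_query(sql, database, output_location):
    """Submit a query to Athena and return its execution id."""
    import boto3  # imported lazily so build_view_sql can be used without AWS libraries

    athena = boto3.client("athena")
    response = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]


# Example statement for a hypothetical raw table loaded from the .csv files.
sql = build_view_sql("bachelors_by_state", "acs5_raw", "b15003_022e")
```

Calling `run_query(sql, "acs5_db", "s3://example-athena-results/")` would then submit the statement; both arguments are placeholder values.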
Data Visualization:
Lastly, I will connect Amazon QuickSight to visualize the data.