
Formula 1 Real-time Data Engineering Pipeline
• Python for data collection, PostgreSQL for storage.
• Kafka and Debezium for streaming.
• DuckDB with dbt for analytics.
• Implemented change data capture (CDC) and slowly changing dimension (SCD) patterns to track how records change over time.
Showcasing my work in data engineering, analytics, and machine learning
Designed and implemented a comprehensive data engineering pipeline using modern data stack technologies. Generated synthetic data and integrated it with Snowflake, dbt, Airflow, and AWS services to showcase an automated ELT approach with DevOps practices.
Developed 7 different deep learning models to detect whether drivers are paying attention to the road. Built and deployed a Streamlit web application that allows users to upload images and get immediate feedback on driver safety.
Implemented image classification using two approaches: a custom CNN and a pre-trained MobileNet model. Achieved 97% accuracy with the pre-trained model and deployed a Streamlit web application that classifies uploaded images as either cats or dogs.
Conducted comprehensive exploratory data analysis on the H&M dataset to understand customer purchasing patterns and preferences. Developed insights that could be used for personalized fashion recommendations to enhance customer experience.
This project involves building a data pipeline that processes Formula 1 racing data using modern data engineering tools. While the source data is batch, I implemented a real-time architecture for learning purposes. The system integrates Python for data collection, PostgreSQL for storage, Kafka and Debezium for streaming, and DuckDB with dbt for analytics.
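To make the CDC and SCD idea concrete, here is a minimal sketch of how a Debezium-style change event could be folded into a slowly changing dimension history. It is not the project's actual code: it assumes SCD Type 2 (the project summary does not say which type was used), keeps the history in memory rather than in DuckDB, and uses a hypothetical `driver_id` key and `team` attribute. The `op`, `before`, `after`, and `ts_ms` fields follow the payload of Debezium's change event envelope.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class HistoryRow:
    """One version of a record in an SCD Type 2 history (illustrative only)."""
    key: Any
    attributes: dict
    valid_from_ms: int
    valid_to_ms: Optional[int] = None  # None means this version is still current

    @property
    def is_current(self) -> bool:
        return self.valid_to_ms is None


def apply_change_event(history: list, event: dict, key_field: str = "driver_id") -> None:
    """Apply one change event payload to an in-memory SCD Type 2 history.

    `key_field` is a hypothetical primary key name chosen for this example.
    """
    op = event["op"]  # "c" create, "u" update, "d" delete, "r" snapshot read
    ts_ms = event["ts_ms"]
    # Deletes carry the last known row in "before"; other operations use "after".
    row = event["before"] if op == "d" else event["after"]
    key = row[key_field]

    # Close the currently open version for this key, if there is one.
    for version in history:
        if version.key == key and version.is_current:
            version.valid_to_ms = ts_ms

    # Creates, updates, and snapshot reads open a new current version.
    # A delete only closes the old one.
    if op in ("c", "u", "r"):
        attributes = {k: v for k, v in row.items() if k != key_field}
        history.append(HistoryRow(key, attributes, valid_from_ms=ts_ms))


# Usage with made-up events: a driver is created, then changes team.
history = []
apply_change_event(history, {
    "op": "c", "ts_ms": 1000,
    "before": None,
    "after": {"driver_id": 1, "team": "Team A"},
})
apply_change_event(history, {
    "op": "u", "ts_ms": 2000,
    "before": {"driver_id": 1, "team": "Team A"},
    "after": {"driver_id": 1, "team": "Team B"},
})
for version in history:
    print(version)
```

After the two events, the history holds a closed "Team A" version and an open "Team B" version, which is the property that lets analytics queries reconstruct the state at any point in time. In a dbt-based setup the same effect is more commonly achieved declaratively, for example with dbt snapshots, rather than with hand-written code like this.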
This project focuses on using computer vision and deep learning to determine whether a driver is driving safely and attentively. I implemented and compared 7 different deep learning models to find the most accurate approach for this critical safety application.
This project demonstrates different approaches to image classification by comparing custom Convolutional Neural Networks (CNN) with pre-trained models. The goal was to accurately classify images as either cats or dogs and make this capability accessible through a user-friendly web application.
This project involves a detailed exploratory data analysis (EDA) on the H&M dataset to understand customer preferences and shopping patterns. The insights derived from this analysis can be used to create personalized fashion recommendations, enhancing customer experience and driving sales.
This project showcases my expertise in data engineering by implementing a complete data pipeline using modern technologies. The primary objective was to generate synthetic data and integrate it with various tools to demonstrate an automated ELT approach with DevOps practices.