Gaurang is a Data Engineer with 5+ years of experience in designing and implementing scalable data pipelines and workflows. He has extensive expertise in cloud services, data processing, and machine learning integrations.
Designed and implemented robust end-to-end data pipelines across multiple domains.
Achieved 89% prediction accuracy in a real estate price prediction model through model selection and hyperparameter tuning.
Automated data workflows using Apache Airflow to ensure scalability and reliability.
Implemented custom data observability systems and alerting mechanisms for early detection of data quality issues.
Overview: This project delivered an end-to-end data pipeline that extracts data from Google Analytics and loads it into a cloud-based analytics platform for business insights.
Responsibilities:
Designed and implemented the end-to-end data pipeline using AWS Glue, Databricks, and Apache Airflow for orchestration.
Used Databricks to extract large datasets and integrated it with AWS S3 for cloud storage.
Automated scheduling and execution with Apache Airflow, optimizing resource usage.
Applied PySpark to process and transform large-scale datasets, supporting both real-time and batch workloads.
Implemented a multi-stage transformation process in Snowflake to deliver actionable insights.
Key outcomes:
Designed and implemented a robust end-to-end data pipeline for Google Analytics data.
Ensured data accuracy and reliability through end-to-end data quality checks.
Overview: This project developed and deployed a data pipeline monitoring system for Product 360.
Responsibilities:
Developed and deployed the data pipeline monitoring system using Databricks for processing and Snowflake for warehousing.
Ingested and processed large-scale datasets from Azure Data Lake Storage (ADLS) and Azure Blob Storage using Databricks.
Automated the pipeline with Apache Airflow to ensure scalable and reliable data flow to Snowflake.
Implemented a custom data observability system to track data health, detecting anomalies and changes in patterns.
Key outcomes:
Deployed a data pipeline and monitoring system for Product 360.
Implemented a custom data observability system with anomaly detection.
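As an illustration of the kind of check such an observability system might run, here is a minimal sketch that flags days whose row count deviates sharply from earlier days. The z-score rule, the function name `find_anomalies`, the threshold, and the sample counts are all assumptions made for this example; they are not taken from the Product 360 system itself.

```python
from statistics import mean, stdev


def find_anomalies(daily_row_counts, threshold=3.0, min_history=5):
    """Return indices of days whose row count is far from the prior days' mean.

    Hypothetical helper: compares each day against all earlier days using a
    z-score, and only starts checking once `min_history` days are available.
    """
    anomalies = []
    for i in range(min_history, len(daily_row_counts)):
        history = daily_row_counts[:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # A perfectly flat history: treat any change at all as anomalous.
            if daily_row_counts[i] != mu:
                anomalies.append(i)
            continue
        z_score = abs(daily_row_counts[i] - mu) / sigma
        if z_score > threshold:
            anomalies.append(i)
    return anomalies


# Made-up daily row counts with one sharp drop on day index 6.
counts = [1000, 1020, 990, 1010, 1005, 1000, 120, 1015]
print(find_anomalies(counts))  # → [6]
```

In practice the flagged indices would feed an alerting step; a rolling window or seasonal baseline could replace the simple full-history mean used here.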
Overview: This project built a customer data integration platform that consolidates data from multiple sources (CRM, e-commerce platforms) into a single source of truth.
Responsibilities:
Designed and implemented the platform to consolidate customer data from CRM and e-commerce platforms.
Used AWS Glue to automate the ETL process into AWS S3, ensuring data consistency.
Employed Databricks and PySpark for large-scale processing and complex transformations.
Key outcomes:
Designed and implemented a customer data integration platform for a single source of truth.
Automated ETL processes for customer data using AWS Glue into AWS S3.
Overview: This project delivered a fully automated data extraction and transformation pipeline for healthcare systems.
Responsibilities:
Engineered a fully automated data pipeline using AWS WorkMail, AWS S3, and AWS Lambda.
Integrated AWS Lambda with AWS WorkMail to automate ingestion of Excel files into an S3 bucket.
Triggered Lambda functions for data cleansing and transformation.
Key outcomes:
Engineered a fully automated data extraction and transformation pipeline for healthcare systems.
Designed a fault-tolerant and scalable system capable of handling large data volumes.
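The cleansing step of a pipeline like this could be sketched as follows, assuming the Lambda function is triggered by an S3 event notification once a file lands in the bucket. The function names, bucket name, object key, and sample rows are hypothetical, and the Excel parsing and S3 download are deliberately left out so the sketch stays self-contained.

```python
from urllib.parse import unquote_plus


def s3_objects_from_event(event):
    """Yield (bucket, key) pairs from an S3 event notification payload.

    Object keys arrive URL-encoded in S3 events, so they are decoded here.
    """
    for record in event.get("Records", []):
        s3_info = record["s3"]
        yield s3_info["bucket"]["name"], unquote_plus(s3_info["object"]["key"])


def clean_rows(rows):
    """Normalize header names, trim string values, and drop empty rows.

    `rows` is assumed to be a list of dicts, one per spreadsheet row.
    """
    cleaned = []
    for row in rows:
        normalized = {
            str(key).strip().lower().replace(" ", "_"):
                (value.strip() if isinstance(value, str) else value)
            for key, value in row.items()
        }
        # Keep the row only if at least one cell holds a real value.
        if any(value not in ("", None) for value in normalized.values()):
            cleaned.append(normalized)
    return cleaned


def handler(event, context):
    """Hypothetical Lambda entry point: list the objects that need processing."""
    return [{"bucket": b, "key": k} for b, k in s3_objects_from_event(event)]


# Made-up event and rows for illustration only.
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "example-inbox"},
                "object": {"key": "reports/March+2024.xlsx"}}}
    ]
}
sample_rows = [
    {" Record ID ": " 42 ", "Name": "  Ana "},
    {"Record ID": "", "Name": None},
]
print(handler(sample_event, None))
print(clean_rows(sample_rows))
```

Keeping the event parsing and the row cleansing in separate pure functions makes each one easy to test without AWS access.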
Key outcomes:
Achieved 89% prediction accuracy for the real estate price prediction model.
Developed an optimization pipeline for model selection and hyperparameter tuning.
Enabled stakeholders to easily interpret predicted prices through effective visualizations.
Gaurang
Data Engineer