Nitesh Kumar has developed and optimized ETL pipelines for diverse data sources and destinations, including cloud data warehouses such as Amazon Redshift. He has hands-on experience extracting unstructured data and has built and deployed scheduled and event-triggered data pipelines using AWS Glue and Airflow. His expertise also covers data quality assurance and end-user support, making him well suited to data engineering roles.
Expertise in AWS services for data warehousing and ETL pipeline development.
Proficient in Python and SQL, utilized across multiple data engineering projects.
Developed a full-stack web application for database querying and management.
Used AWS Textract to extract unstructured data from PDFs, modernizing existing data solutions.
Delivered ECAN Customer 360 for Holcim, loading plant data from 6+ source systems into Redshift
Implemented AWS Textract to modernize data extraction from unstructured PDFs in the pharma Data Analytics Platform (DAP)
Built incremental data-loading mechanisms supporting hourly, daily and monthly frequencies
Developed DATALAB, a full-stack React.js web app for CRUD operations on PostgreSQL databases across AWS regions
Overview: This project involves loading plant-related data into a Redshift data warehouse for a prominent building-materials company.
Responsibilities: Performed data ingestion, ETL pipeline development, and view creation as required. Modernized data solutions by implementing AWS Textract to extract unstructured data from PDFs.
Key outcomes:
Developed efficient incremental data pipelines for hourly, daily, and monthly data loads.
Built and deployed scheduled and event-triggered ETL pipelines using AWS Glue.
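Incremental loads at several frequencies are often driven by a high-watermark. The following is a minimal sketch of that pattern, not the Holcim pipeline itself. It assumes each source row carries an `updated_at` timestamp. The row format and the functions `next_window_end` and `incremental_batch` are hypothetical. Each run selects rows changed in the half-open window (watermark, window end] and returns the window end as the next watermark.

```python
from datetime import datetime, timedelta


def next_window_end(watermark: datetime, frequency: str) -> datetime:
    """Return the end of the load window that starts at `watermark` (hypothetical helper)."""
    if frequency == "hourly":
        return watermark + timedelta(hours=1)
    if frequency == "daily":
        return watermark + timedelta(days=1)
    if frequency == "monthly":
        # Advance one calendar month; assumes the watermark sits on day 1 of a month.
        year = watermark.year + watermark.month // 12
        month = watermark.month % 12 + 1
        return watermark.replace(year=year, month=month)
    raise ValueError(f"unsupported frequency: {frequency!r}")


def incremental_batch(rows: list[dict], watermark: datetime, frequency: str):
    """Select rows changed in (watermark, window_end] and return them with the new watermark."""
    window_end = next_window_end(watermark, frequency)
    batch = [r for r in rows if watermark < r["updated_at"] <= window_end]
    return batch, window_end


# Example: an hourly run starting from midnight (illustrative data only).
rows = [
    {"id": 1, "updated_at": datetime(2024, 1, 1, 0, 30)},
    {"id": 2, "updated_at": datetime(2024, 1, 1, 1, 0)},
    {"id": 3, "updated_at": datetime(2024, 1, 1, 1, 30)},
]
batch, new_watermark = incremental_batch(rows, datetime(2024, 1, 1), "hourly")
# batch holds rows 1 and 2; new_watermark is 2024-01-01 01:00
```

Persisting the returned watermark after a successful load lets the next run pick up exactly where the previous one stopped, whichever frequency is configured.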
Overview: This project for a pharmaceutical company involves onboarding medical datasets onto a Data Analytics Platform (DAP).
Responsibilities: Built Airflow pipelines for ETL processes and for platform tasks scheduled as cron jobs. Developed Airflow tasks using shell scripts, Python scripts, and Kubernetes pod operators.
Key outcomes:
Managed end-to-end data ingestion and ETL pipeline development for critical medical datasets.
Ensured data quality through source-to-target data quality (DQ) checks.
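A source-to-target DQ check commonly runs the same aggregate query on both sides and compares the results. The sketch below illustrates that idea under stated assumptions. It is not the checks used on the platform. Two in-memory SQLite databases stand in for the real source and target. The table `records`, the column `amount`, and the function `dq_compare` are hypothetical. It compares a row count and a column sum.

```python
import sqlite3


def dq_compare(src: sqlite3.Connection, tgt: sqlite3.Connection,
               table: str, amount_col: str) -> dict:
    """Run one aggregate query on source and target and flag mismatches.

    `table` and `amount_col` are interpolated into the SQL text, so they
    must be trusted identifiers, never user input.
    """
    query = f"SELECT COUNT(*), COALESCE(SUM({amount_col}), 0) FROM {table}"
    src_count, src_sum = src.execute(query).fetchone()
    tgt_count, tgt_sum = tgt.execute(query).fetchone()
    return {
        "row_count_match": src_count == tgt_count,
        "sum_match": src_sum == tgt_sum,
        "source": (src_count, src_sum),
        "target": (tgt_count, tgt_sum),
    }


# Demo: the target is missing one row, so both checks should fail.
src = sqlite3.connect(":memory:")
tgt = sqlite3.connect(":memory:")
for conn in (src, tgt):
    conn.execute("CREATE TABLE records (id INTEGER, amount REAL)")
src.executemany("INSERT INTO records VALUES (?, ?)", [(1, 2.5), (2, 4.0), (3, 1.5)])
tgt.executemany("INSERT INTO records VALUES (?, ?)", [(1, 2.5), (2, 4.0)])

report = dq_compare(src, tgt, "records", "amount")
```

With real floating-point data, comparing sums within a small tolerance is usually safer than exact equality.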
Overview: Developed a web application that lets users run basic queries on PostgreSQL databases hosted on EC2 instances in different AWS regions.
Responsibilities: Implemented user authentication against a PostgreSQL database, allowing authenticated users to perform CRUD operations.
Key outcomes:
Developed a full-stack web application for secure database querying and schema management.
Built a Serverless Framework application for efficient CRUD operations.
Key outcomes:
Automated AWS S3 bucket and Glacier vault creation with a custom Python CLI tool.
Integrated the CLI command into Airflow pipelines, enabling automated resource provisioning.
Implemented input validation and resource tagging/policy application for robust automation.
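Input validation and tagging in a provisioning CLI can be kept independent of the AWS calls. The sketch below is illustrative only and is not the original tool. It checks a subset of the documented S3 bucket naming rules: 3 to 63 characters, lowercase letters, digits, dots and hyphens, starting and ending with a letter or digit, no adjacent periods, and not an IP address. It also builds a tag set in the `{"TagSet": [{"Key": ..., "Value": ...}]}` shape that boto3's `put_bucket_tagging` accepts as its `Tagging` argument. The function names are hypothetical, and no AWS call is made here.

```python
import re

# Subset of the S3 bucket naming rules (3-63 chars, lowercase letters,
# digits, dots, hyphens; must start and end with a letter or digit).
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")
# Names formatted like an IPv4 address are not allowed.
_IP_RE = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def validate_bucket_name(name: str) -> None:
    """Raise ValueError if `name` breaks one of the checked rules (hypothetical helper)."""
    if not _BUCKET_RE.fullmatch(name):
        raise ValueError(f"invalid bucket name: {name!r}")
    if ".." in name:
        raise ValueError(f"bucket name has adjacent periods: {name!r}")
    if _IP_RE.fullmatch(name):
        raise ValueError(f"bucket name looks like an IP address: {name!r}")


def build_tagging(tags: dict[str, str]) -> dict:
    """Return tags in the TagSet shape used by S3 bucket tagging (hypothetical helper)."""
    for key, value in tags.items():
        # AWS tag keys allow up to 128 characters and values up to 256.
        if not key or len(key) > 128 or len(value) > 256:
            raise ValueError(f"invalid tag: {key!r}={value!r}")
    return {"TagSet": [{"Key": k, "Value": v} for k, v in sorted(tags.items())]}
```

Validating before any provisioning call means a bad name or tag fails fast inside the Airflow task, before a partially created resource is left behind.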
Nitesh Kumar
Data Engineer