Rahul Dastidar · Senior PySpark Data Engineer · 7+ yrs

Senior

Kolkata · 7+ years experience · Remote

Available within 48 hrs

About Rahul

Rahul Dastidar is a seasoned Data Engineering Consultant with over 7 years of experience in cloud and distributed computing. He specializes in PySpark development, data pipelines, and Master Data Management (MDM) with Reltio. Rahul has built scalable pipelines on AWS EMR and Databricks, integrating enterprise data with Reltio's MDM platform. He is adept at API-driven integration and performance optimization of Spark workloads, ensuring enterprise-grade data governance and reliability.

Core expertise

PySpark · language · 10/10
Reltio MDM · other · 10/10
AWS · cloud · 9/10
Apache Spark · language · 9/10
GitHub Actions · devops · 8/10
Azure · cloud · 8/10
REST APIs · other · 8/10

Additional skills (13)

PySpark · Databricks · Apache Spark · Azure Databricks · Hive · Elasticsearch · Reltio MDM · REST APIs · AWS Lambda · AWS Step Functions

Why hire Rahul?

Production deploy authority · Mentored 5+ juniors

Designed and implemented scalable PySpark pipelines on Databricks and AWS EMR.

Achieved up to 35% reduction in execution time and 20% cost savings through optimization.

Automated end-to-end CI/CD for PySpark pipelines using GitHub Actions & Jenkins.

Established data governance frameworks ensuring 99% data integrity.

Built and optimized PySpark pipelines handling 1B+ records with 40% faster processing.

Integrated enterprise datasets into Reltio MDM, enabling a unified Customer 360 view.

Improved data survivorship and match/merge accuracy by 25% through advanced rule configuration.

Project highlights (4)

Project 1 – DevOps & Data Engineering Consultant

Focuses on PySpark development and data pipeline implementation on Databricks.
Integrates enterprise data with Reltio MDM and builds data quality rules.

Designed and implemented PySpark pipelines on Databricks to process and transform multi-terabyte datasets.
Integrated enterprise data sources with Reltio MDM using REST APIs for entity synchronization.
Built data quality rules (deduplication, enrichment, survivorship logic) before ingestion into Reltio.
Optimized Spark jobs with partitioning and caching, reducing execution time by 35%.
Automated workflows with AWS Lambda + Step Functions, orchestrating ingestion pipelines.

PySpark · Databricks · Reltio MDM · REST APIs · Apache Spark · AWS Lambda · AWS Step Functions · SQL

Key outcomes:

Designed and implemented PySpark pipelines for multi-terabyte datasets.
Optimized Spark jobs, reducing execution time by 35%.
Automated ingestion pipelines using AWS Lambda and Step Functions.

Project 2 – Senior Manager – Cloud & Data Engineering

Led migration of a client's customer master data into Reltio MDM.
Developed PySpark ETL workflows for master record cleansing and enrichment.

Led migration of a client's customer master data into Reltio MDM, configuring match and merge rules.
Built PySpark ETL workflows in Azure Databricks for cleansing, enrichment, and deduplication of master records.
Implemented Reltio entity modeling and integrated downstream applications through REST APIs.
Established data governance frameworks ensuring 99% data integrity.

Reltio MDM · PySpark · Azure Databricks · REST APIs · Apache Spark

Key outcomes:

Led migration of customer master data into Reltio MDM.
Built PySpark ETL workflows for master data cleansing.
Ensured 99% data integrity through governance frameworks.

Project 3 – System Architect – AWS & Big Data

Architected big data pipelines with PySpark on AWS EMR for real-time data ingestion.
Developed ETL jobs for integrating customer data with MDM systems (Reltio & Informatica).

Architected big data pipelines with PySpark on AWS EMR, enabling real-time ingestion of enterprise data.
Developed ETL jobs integrating customer data with MDM systems (Reltio & Informatica).
Automated synchronization between Reltio MDM and analytics platforms, ensuring high-quality master data.
Monitored Spark cluster performance and tuned jobs for cost efficiency (20% savings).

PySpark · AWS EMR · Reltio MDM · Informatica · Apache Spark

Key outcomes:

Architected big data pipelines on AWS EMR for real-time data ingestion.
Developed ETL jobs integrating customer data with MDM systems.
Tuned Spark jobs for cost efficiency, achieving 20% savings.

Project 4 – Technical Architect – Cloud & Automation

Designed data workflows using PySpark + Hive to improve reporting accuracy.
Configured ELK-based monitoring for Spark job failures and data pipeline health.

Designed data workflows with PySpark + Hive, improving reporting accuracy.
Configured ELK-based monitoring for Spark job failures and data pipeline health.
Collaborated with data architects on data modeling & cleansing strategies.

PySpark · Hive · Elasticsearch · Apache Spark

Key outcomes:

Designed data workflows with PySpark and Hive to improve reporting accuracy.
Configured ELK-based monitoring for Spark job failures.
Collaborated on data modeling and cleansing strategies.

7+ years of industry experience

HealthTech · 2 projects

• Project: System Architect – AWS & Big Data (PySpark · AWS EMR · Reltio MDM · Informatica +1)
• Project: Technical Architect – Cloud & Automation (PySpark · Hive · Elasticsearch · Apache Spark)

Ready to work with Rahul?

Onboard within 48 hours. No long hiring cycles, no recruiter middleman.

At a Glance

Location: Kolkata
Experience: 7+ years
Work mode: Remote
Direct hire: Possible
Start within: 48 hours
From $1,868/month

Single contract. Billed in USD.

Typically responds within 4 business hours.

5-day replacement guarantee

48-hour onboarding, single invoice

Direct chat — no recruiter middleman

Seniority signals

Owns production deploys · Greenfield architect · System owner · Code reviewer · Mentor / leads juniors

Vetted by Witarist

Technical skills assessed & verified

Background & identity checked

English communication verified

Ready to onboard in 48 hours
