Technical Lead & Archiect - Azure & Databricks, Spark,Kafka,Snowflake,Scala,Pyspark,AWS Cloud,NoSQL
Objectives – Having 18.5 years of experience As a Technical Lead and Architect, I am passionate about leveraging my expertise in Databricks, Data Build Tool, Spark, Confluent Kafka, Data Lake, Lake House and Cloud Solutions to drive innovation and efficiency. With a solid background in data architecture and IT Infrastructure, I am to contribute to the robust and vision of your company by providing ro bust technical solutions that align with strategic business goals. My Goal is to enhance data-driven decision-making processes, optimize big data pipelines and implement secure and scalable cloud architectures that people the organization forward in the ever-evolving technical landscape. Certification & Achievements – 1. Confluent Certified Administrator for Apache Kafka: expiry July 2026. Confluent Certified Administrator for Apache Kafka • Amit Raj • Confluent 2. Databricks Certified Data Engineer Professional: expiry Aug 2026 Databricks Certified Data Engineer Professional • Amit Raj • Databricks Badges 3. Databricks Accredited Lakehouse Fundamentals: expiry June 2025 Academy Accreditation - Databricks Lakehouse Fundamentals • Amit Raj • Databricks Badges 4. Microsoft Certified: Azure Administrator Associate (AZ104): Microsoft certification ID: 1100039942 Expires on: August 3, 2025 Credentials - AmitRaj-8869 | Microsoft Learn 5. Microsoft Certified: Azure Security Engineer Associate (AZ500): Microsoft certification ID: 1100039942 Expires on: August 6, 2025 Credentials - AmitRaj-8869 | Microsoft Learn 6. Recognition certificate from Fidelity for designing global solutions for Data exchange. 7. Got Achievement medal from DIB(Client) with appreciation for design event-based enterprise architecture & contribution – EventHub SUMMARY • Overall total of 18.5+ years of Experience years of Experience in Application Design, Development & Deployment of Hadoop Eco System/Java/J2EE systems with good exposure to Enterprise Architecture. • Relevant Experience 9.2 years in Big Data technologies working with multiple clients and domain knowledge. • Experienced in Cassandra data modelling, cluster setup and data management. • Experienced in working with Spark-SQL, Spark SQL and Spark Structure Streaming, MLib to process and analyse data queries. • Experienced in designing solutions using Spark Streaming and Kafka Streaming for Payment Gateway/point of sales events. • Individual Contribution (Kafka Architect): Delivered UAT and PROD Cluster within the timeline for Kafka cluster using Cloudera 6.x, CSP 2.0. • Implemented a unified data platform to gather data from different sources using Kafka Producers and consumers in Scala and java. • Solid background in Object-Oriented analysis & design, UML and various design patterns. • Worked using Azure cloud(Blob, EventHub), Kubernetes, docker with Spark, scala, Schema Registry, Avro Schema with home security application for Honeywell • Implemented KSQL, KTable and KStream using Confluent Kafka along with Kafka Connect. • Hands-on Data bricks - Databricks Clusters, Data Lakehouse, Delta lake, DBFS, EXPLORE, Analyze, Clean, Transform and Load Data using Databricks. • Experience with Azure: Azure Synapse Analytics, ADLS, ADF, CosmoDB, Azure Function, Stream Analytics, Power BI. • Experience with SQL and NoSQL databases including Mysql, Oracle, Cassandra, and PostgreSQL. BigTable • Experience building and optimizing the ‘big data’ data pipeline. • Experince with Azure Devops, CI/CD pipeline, Kubernetes and docker • Motivated Technical Architect with 5 years of progressive experience. • Having Experience AWS (Ec2,S3) • Having experience with Snowflake to design data lake and load data from multiple sources to the Snowflake database. • Effectively manages assignments and team members. • Dedicated to self-development to provide expectation-exceeding service. Customer-focused, successfully contributing to company profits by improving team efficiency and productivity. • Utilizes excellent organizational skills to enhance efficiency and lead teams to achieve outstanding delivery. SKILLS == =================== Database architecture Database architecture development Data Architecture Big Data ETL Technical solution development Azure data solutions Data insight provision Technical guidance IT Architecture Technical solutions Big data frameworks Technical Skills: Hortonworks2.5, Cloudera5/6, Apache Hadoop2/3 ,Spark2/3,Apache Kafka, Confluent Kafka, Hive 2/3, Impala, Sqoop, OOZie, Zookeeper, Snowflake, Data Build tool (DBT), HBase, Apache Cassandra /DataStax Cassandra, Data Bricks, Azure Cloud, AWS cloud, Talend, Airflow etc. Programming Language Python , Scala & Java Other Tools Kibana, Logstash, ElasticSearch, ELK. ============================= PROJECT UNDERTAKEN: Project: Implementation of Data Warehouse and reporting platform Roles: Databricks Architect & Engineer Teams: 12 members Technical Skills: Azure Cloud, Azure Data Factory (ADF), ADLS, Databricks, Spark3.x, Python, Scala2.15, DB2, Oracle 12g, Azure SQL My Contribution Data Bricks Infrastructure Solution: - Configured Unified Data Access Control using Unity Catalog – E1 & BY System provide a specific set of permissions, like Read Only, or, Write Only to a specific Group of Users on one, or, some of the Delta Tables, or, even at the Row Level, or, Column Level, which can contain Personally Identifiable Information, i.e., PII, of that Delta Tables - Provide Data Governance with centralized place: administer (TAI) the access to the data, and, also audit the access to the data. - Applied Data lineage for E1 & BY tables with look-up tables using Unity Catalog. - Implemented Data sharing protocol to apply secure data sharing downstream using Unity Catalog and - Design Architecture of Unity Catalog which can be linked to multiple Databricks Workspaces- DEV, UAT, PROD environment. - Created Metastore for the Unity Catalog - Apply User Management of the Unity Catalog for the TAI Lakehouse project: Users, Groups, or, the Service Principle, and, the permissions those have - Configure Data Bricks Cluste with spark 3.x for DEV, UAT & PROD for TAI -E1 & BY System. - Design & apply medallion architecture, Setup a Data Lake house with Bronze, Silver and Gold layers of a storage system using Azure Data Lake Gen2. Azure Cloud Infra and Security: - Install self-hosted integration runtime for the DB2 ON DEV, UAT & PROD and Oracle on-prem cluster on the source system. - Install Azure Virtual network managed IR On DEV, UAT & PROD. - Installed Db2 connector on DEV, UAT & PROD. - Created linked service lnk_BY_Azure_SQL, lnk_E1_Azure_SQL, lnk_Db2_E1 - Install and Configure Azure Key-Vault, added all the credential for Azure SQL, ADLS, Databricks, Users, global users, linked service to Azure Key Vault. – DEV, UAT & PROD. - Created 3 nodes for DEV and 5 nodes for PROD cluster to migrate data. - Setup and configure Azure Active Directory to provide team access policy for Databricks cluster, Azure Data Factory, Azure SQL, Azure Data Lake house. - Coordinate with TAI Client and Microsoft support team to resolve throughput issues. As Azure & Databricks Data Engineer: - developed most critical data ingestion pipelines using Azure Data Factory (ADF) for E1 to migrate 12.8TB of 120 tables from Db2 to ADLS RAW as a parquet file. There are many large tables with 2-4 TB of volume data containing 400 to 800 million records. - Initial & Incremental migration pipelines for both the E1 and BY sources with a watermark based on Julian's date & time - Design Audit table (Process log) and Control table (System) to achieve dynamic pipeline and audit information for master and child pipeline. - Design architecture solution to achieve delete for PKSNAPSHOT – E1 & BY. - Build dynamic delete pipeline using ADF (load PKTBL), Databricks PySpark for Daily, Weekly, OnDemand, and Yearly frequency to delete records from target (Analytics layer – gold layer) based on source system delete column and delete table - Build transformation using Datarbicks Spark with Scala for E1 to - Apply a transformation with a lookup table and transform to the Silver layer. - Build transformation to transform on Analytics layer (Gold) using Databricks Spark & scala. - Implemented UPSERT using Spark Structure Streaming with 5 minutes on the Analytics layer - Design pipeline architecture for master pipeline, child pipeline with different activity ID, Pipeline ID, Master pipeline ID with different pipeline Run ID to make sure for smooth transition audit. - Build logic, developed using Pyspark on Datarbicks – applied on DEV, UAT & PROD to check counter – master pipeline IN PROGRESS - or NOT so that pipeline execution should not overlap. - Pass pipeline parameter to insert or update Audit/Control table using Databricks -Pyspark. - Monitor Performance in DEV & PROD, worked with the team to reduce time. - Milestone – to achieve 10-minute SLA for Incremental load on E1 & BY (end-to-end completion time) - Milestone – achieved 1.53.45hrs to load (400millions record with2.3TB) at RAW as parquet file using ADF pipeline - Interact with Azure Devos’s engineer to build a CICD pipeline for DEV, UAT & PROD with - Develop pipeline as POC using Databricks Workflow, compare the cost with Azure Pipeline, and present to the client. My Contribution to Past Project: Project: Data Exchange (Security Framework) Roles: Technical Lead & Architect – Confluent KStream & KSQL Client: Fidelity & Westpac Team: 9 members Technical Skills: AzureDevops, Jdk 19.0, Confluent Kafka, Kstream, KSQL, Azure Databricks, DBFS, Delta lake, Azure Data Factory, ADLSGen2, Confluent Schema Registry, AES Algorithm, Hash Algorithm, Kubernetes Cluster(AKS).