Description:
Data Engineer
Location: Gauteng
Contract duration: 01 January 2026 - 31 December 2028

Our client is seeking a hands-on Data Engineer with strong experience in building scalable data pipelines and analytics solutions on Databricks. The successful candidate will design, implement, and maintain end-to-end data flows, optimize performance, and collaborate with data scientists, analysts, and business stakeholders to turn raw data into trusted insights.

ESSENTIAL SKILLS:
- Expertise with Apache Spark (PySpark), Databricks notebooks, Delta Lake, and SQL
- Strong programming skills in Python for data processing
- Experience with cloud data platforms, particularly Azure and Azure Databricks; familiarity with object storage (ADLS)
- Proficient in building and maintaining ETL/ELT pipelines, data modeling, and performance optimization
- Knowledge of data governance, data quality, and data lineage concepts
- Experience with CI/CD for data pipelines and orchestration tools (GitHub Actions, Databricks Asset Bundles, or Databricks Jobs)
- Strong problem-solving skills, attention to detail, and ability to work in a collaborative, cross-functional team
ADVANTAGEOUS SKILLS:
- Experience with streaming data (Structured Streaming, Kafka, Delta Live Tables)
- Familiarity with materialized views, streaming tables, data catalogs, and metadata management
- Knowledge of data visualization and BI tools (Splunk, Power BI, Grafana)
- Experience with data security frameworks and compliance standards relevant to the industry
- Certifications in Databricks or cloud provider platforms
QUALIFICATIONS/EXPERIENCE:
Bachelor’s or Master’s degree in Computer Science, Data Engineering, Information Systems, or a related field.
3+ years of hands-on data engineering experience.
Key Responsibilities:
- Design, develop, test, and maintain robust data pipelines and ETL/ELT processes on Databricks (Delta Lake, Spark, and Python/Scala/SQL notebooks)
- Architect scalable data models and data vault/dimensional schemas to support reporting, BI, and advanced analytics
- Implement data quality, lineage, and governance practices; monitor data quality metrics and resolve data issues proactively
- Collaborate with Data Platform Engineers to optimize cluster configuration, performance tuning, and cost management in cloud environments (Azure Databricks)
- Build and maintain data ingestion from multiple sources (RDBMS, SaaS apps, files, streaming queues) using modern data engineering patterns (CDC, event-driven pipelines, change streams, Lakeflow Declarative Pipelines)
- Ensure data security and compliance (encryption, access controls) in all data pipelines
- Develop and maintain CI/CD pipelines for data workflows; implement versioning, testing, and automated deployments
Posted: 26 Nov 2025 | Source: careers24.com