Description:
A company that provides a Privacy-Preserving Data Collaboration Platform is seeking a Cloud Data Platform Support Engineer to ensure the stability, reliability, and efficiency of its data platform, primarily hosted on Microsoft Azure (with a future vision to extend to other cloud environments).
Responsibilities:
- Incident & Support: Provide technical support for cloud-based data platforms (data warehouses, pipelines, distributed computing) and swiftly resolve production incidents including performance degradation, stability issues, and data reliability concerns.
- Root Cause Analysis (RCA): Perform and document root cause analysis for long-term prevention.
- Monitoring & Observability: Proactively monitor system health and data pipeline performance using cloud-native tools, developing dashboards, alerts, and reporting frameworks for real-time insight.
- Automation: Build and maintain automation scripts using Python, PowerShell, and Bash to reduce repetitive tasks and enhance operational efficiency.
- Platform Improvement: Suggest and implement improvements to increase platform resilience, reliability, and performance.
- Collaboration: Work closely with Data Engineers, Full Stack Support Engineers, Data Scientists, and client-facing teams for troubleshooting and resolution.
- Knowledge Base: Write, maintain, and share runbooks and troubleshooting guides.
- On-Call: Be available for extended working hours during critical outage events.
Minimum Requirements:
- Education: Bachelor’s degree in Computer Science, Information Systems, Engineering, or a closely related field.
- Certifications: Professional certifications in Microsoft Azure (Data Engineering, Administration, or Solution Architecture).
- Experience: 3+ years of hands-on experience in support engineering, cloud operations, or data engineering within a cloud environment (Microsoft Azure preferred).
- Data Platform: Strong practical experience with cloud-hosted data platforms, including data warehouses, pipeline orchestration services, and distributed compute engines.
- Analytics Platforms: Experience working with modern scalable analytics platforms such as Databricks, Spark, Azure Synapse, or Microsoft Fabric.
- Containerization: Familiarity with container orchestration and virtualization technologies like Kubernetes and Docker.
- Monitoring: Familiarity with cloud-native monitoring and observability tools.
- Automation: Ability to build and maintain automation scripts using languages like Python, PowerShell, and Bash.
- Troubleshooting: Proven ability to investigate and resolve issues using SQL/T-SQL, Python, and Spark workloads.
- Operations: Knowledge of incident management practices (escalation, resolution, and prevention) and experience upholding high standards for reliability in business-critical production systems.
Benefits:
- Competitive salary, commensurate with experience and skills
If you meet the above requirements and want to make a career-changing move, apply today by emailing your CV to itcareers@hireresolve.za.com