Data Engineering Lead
- Lahore, Multan, Karachi, Islamabad
- WFH Flexible
- Information Technology
We are seeking a Data Engineering Lead with 8+ years of hands-on experience and a strong background in real-time and batch data processing, containerization, and cloud-based data orchestration. This role is ideal for someone who is passionate about building robust, scalable, and efficient data pipelines and thrives in agile, collaborative environments.
Key Responsibilities
- Design, build, and maintain real-time data pipelines using streaming frameworks such as Apache Kafka, Apache Flink, and Spark Structured Streaming
- Develop batch processing workflows with Apache Spark (PySpark)
- Orchestrate and schedule data workflows using frameworks such as Apache Airflow and Azure Data Factory
- Containerize applications using Docker, manage deployments with Helm, and run them on Kubernetes
- Implement modern storage solutions using open formats such as Parquet, Delta Lake, and Apache Iceberg
- Build high-performance analytics engines using tools like Trino or Presto
- Collaborate with DevOps to manage infrastructure with Terraform and integrate with CI/CD pipelines via Azure DevOps
- Ensure data quality and consistency using tools like Great Expectations
- Write modular, well-tested, and maintainable Python and SQL code
- Develop an observability layer to monitor and optimize performance across data pipelines
- Participate in agile ceremonies and contribute to sprint planning and reviews
Required Skills & Experience
- Advanced Python programming with a strong focus on modular and testable code
- Strong knowledge of SQL and experience working with large-scale datasets
- Hands-on experience with at least one major cloud platform (Azure preferred)
- Solid experience with real-time data processing (Kafka, Flink, or Spark Streaming)
- Expertise in Apache Spark (PySpark) for batch processing
- Experience implementing lakehouse architectures and working with columnar storage (e.g., ClickHouse)
- Proficient in using Azure Data Factory or Apache Airflow for data orchestration
- Experience building APIs to expose large datasets
- Solid experience with Docker, Kubernetes, and Helm
- Familiarity with data lake open formats such as Parquet, Delta Lake, and Iceberg
- Basic experience with Terraform for infrastructure provisioning
- Practical experience with data quality frameworks (e.g., Great Expectations)
- Comfortable working in agile development teams
- Proven ability in debugging and performance tuning of streaming and batch data jobs
- Experience with AI-driven tools (e.g., text-to-SQL) is a plus
We have an amazing team of 700+ individuals working on highly innovative enterprise projects and products. Our customer base includes Fortune 100 retail and CPG companies, leading store chains, fast-growing fintechs, and multiple Silicon Valley startups.
What makes Confiz stand out is our focus on processes and culture. Confiz is ISO 9001:2015 (QMS), ISO 27001:2022 (ISMS), ISO 20000-1:2018 (ITSM), and ISO 14001:2015 (EMS) certified. We have a vibrant culture of learning through collaboration and making the workplace fun.
People who work with us get to use cutting-edge technologies while contributing to the company's success as well as their own.
To know more about Confiz Limited, visit: https://www.linkedin.com/company/confiz-pakistan/