Engineer IV - DevOps
- Hyderabad, Telangana, India
- Information Technology
Must be able to provide support during overnight PST hours, working from a time zone offset 10-12 hours from PST.
On-call hours: 9 PM to 9 AM PST.
We are seeking a skilled and experienced Senior DevOps Engineer to join our team. In this role, you will drive the development, deployment, and optimization of our infrastructure and applications, ensuring scalability, reliability, and security. You will work across cloud platforms, CI/CD pipelines, and automation tools to support platform upgrades, migrations, and ongoing operational excellence. The role includes participation in an on-call rotation to ensure the reliability and availability of our systems and pipelines.
Key Responsibilities:
- Infrastructure as Code: Design, implement, and maintain infrastructure using Terraform.
- Cloud Platform Expertise: Build and manage scalable, secure, and cost-efficient solutions on AWS and GCP.
- CI/CD Pipelines: Develop, optimize, and maintain robust CI/CD pipelines to streamline software delivery and deployment processes.
- Monitoring and Observability: Implement and maintain monitoring, logging, and alerting solutions using tools like New Relic and Splunk to ensure high system availability and performance.
- Containerization and Orchestration: Manage and deploy applications using Kubernetes, ensuring scalability and reliability of containerized workloads.
- Event Streaming and Messaging: Work with Kafka to enable real-time data streaming and event-driven architectures.
- Data Platforms: Collaborate with teams to support and optimize data platforms, including BigQuery or big data platforms such as Hadoop/EMR/Dataproc.
- Cloud Networking and Security: Work with secure networking solutions and enforce cloud security best practices, ensuring data integrity and compliance.
- Platform Upgrades & Migrations: Lead and execute application upgrades, platform migrations, and infrastructure updates with minimal downtime and impact to business operations.
- Collaboration: Work closely with development, data engineering, and operations teams to deliver scalable and reliable solutions that meet evolving business needs.
- On-Call Support: Participate in the on-call rotation to address incidents, troubleshoot issues, and maintain system reliability.
On-Call Responsibilities:
This role includes participation in an on-call rotation to ensure the reliability and performance of production systems:
- Rotation Schedule: Weekly rotation beginning Tuesday at 9:00 PM PST through Monday at 9:00 AM PST.
- Responsibilities During On-Call:
  - Monitor system health and respond to alerts promptly.
  - Troubleshoot and resolve incidents to minimize downtime.
  - Escalate issues as needed and document resolutions for future reference.
Skills:
- AWS, GCP, CI/CD, Terraform, New Relic, Splunk
- Kubernetes, Kafka, Data Platforms (BigQuery/Snowflake or big data platforms such as Hadoop/EMR/Dataproc)
- Understanding of cloud networking and security principles
- Experience upgrading applications for platform upgrades and migrations.
Required Qualifications:
- 7+ years of experience in DevOps, Site Reliability Engineering (SRE), or a related role.
- Hands-on experience with AWS and GCP cloud platforms.
- Expertise in Terraform for infrastructure automation and management.
- Strong knowledge of CI/CD pipelines and associated tools (e.g., GitHub Actions, GitLab CI/CD).
- Proficiency in monitoring and logging tools such as New Relic and Splunk.
- Experience managing containerized applications and orchestration platforms, particularly Kubernetes.
- Familiarity with Kafka for event-driven architectures and real-time messaging.
- Experience working with data platforms such as BigQuery, Snowflake, or big data solutions like Hadoop/EMR/Dataproc.
- Solid understanding of cloud networking and security principles, including VPCs, firewalls, IAM, and encryption.
- Proven ability to lead and execute platform upgrades and migrations with minimal disruption.
- Excellent troubleshooting and problem-solving skills with a focus on root cause analysis.
- Strong communication and collaboration skills to work effectively across teams.
We have a global team of amazing individuals working on highly innovative enterprise projects and products. Our customer base includes Fortune 100 retail and CPG companies, leading store chains, fast-growing fintechs, and multiple Silicon Valley startups.
What makes Confiz stand out is our focus on processes and culture. Confiz is ISO 9001:2015 (QMS), ISO 27001:2022 (ISMS), ISO 20000-1:2018 (ITSM), and ISO 14001:2015 (EMS) certified. We have a vibrant culture of learning through collaboration and making the workplace fun.
People who work with us use cutting-edge technologies while contributing to the company's success as well as their own.
To know more about Confiz Limited, visit https://www.linkedin.com/company/confiz/