| Role | Cloud Operations Lead |
| Location | Onsite or Hybrid |
| Experience | 5–8 years |
| Education | Bachelor’s degree in Computer Science, IT, or a related field. |
| Number of Positions | 1 |
| Skills Required |
| Technical Skills |
- Expertise with at least one major cloud platform (AWS, Azure, GCP).
- Hands-on experience with monitoring and observability tools (CloudWatch, Datadog, New Relic, Prometheus).
- Strong knowledge of Infrastructure as Code (IaC) tools such as Terraform, CloudFormation, and ARM templates.
- Experience with incident management frameworks and practices (ITIL, SRE principles, on-call rotations with tools such as PagerDuty).
- Understanding of container orchestration (Kubernetes, ECS, AKS, GKE) and CI/CD pipelines.
- Familiarity with cloud security best practices and compliance frameworks.
|
| Soft Skills |
- Proven ability to lead and motivate teams in a fast-paced environment.
- Excellent problem-solving, decision-making, and communication skills.
- Ability to collaborate with both technical and business stakeholders.
|
| Roles & Responsibilities |
| Role Overview |
| The Cloud Operations Lead is responsible for ensuring the stability, availability, security, and performance of all cloud-based systems and services. This role focuses on operational excellence, including monitoring, incident response, change management, and capacity planning. The Cloud Operations Lead coordinates cross-functional teams, optimizes cloud resources for cost efficiency, and drives automation to reduce manual work. |
| Key Responsibilities |
Cloud Operations & Reliability
- Own day-to-day operations of all production, staging, and development cloud environments.
- Ensure high availability of services by maintaining robust monitoring, alerting, and incident response processes.
- Lead root cause analysis (RCA) and post-mortem reviews to drive continuous improvement.
- Implement observability practices (logging, tracing, metrics) to proactively detect and resolve issues.
- Oversee patch management and maintenance activities to keep systems secure and up to date.
Automation & Optimization
- Develop and maintain automation scripts for cloud resource provisioning, scaling, and monitoring.
- Optimize cloud costs through rightsizing, reserved instances, and cost governance policies (FinOps).
- Standardize runbooks and operational playbooks for common tasks and incidents.
Security & Compliance
- Enforce security baselines (IAM, encryption, network segmentation) across all cloud services.
- Collaborate with security teams to implement cloud-native security tools and respond to threats.
- Ensure compliance with applicable regulations and audit requirements (SOC 2, ISO 27001, GDPR, and HIPAA where applicable).
Team Leadership & Collaboration
- Lead, mentor, and develop a team of cloud operations engineers.
- Foster a culture of SRE/DevOps best practices, promoting automation and reliability.
- Work closely with application, DevOps, and networking teams to support business-critical services.
- Act as the escalation point for critical incidents and operational challenges.
Vendor & Stakeholder Management
- Manage relationships with cloud providers (AWS, Azure, GCP) and monitoring tool vendors.
- Provide regular status updates and operational metrics to senior management.
- Collaborate with finance to align on cloud cost forecasts and budget planning.
|
| Other Information |
| Qualifications & Experience |
- Bachelor’s degree in Computer Science, IT, or a related field.
- 5–8 years of experience in cloud operations, site reliability engineering (SRE), or IT infrastructure.
- 2+ years in a leadership role managing operational teams.
|