Senior Cloud Infrastructure Engineer
- Full-time
Company Description
BETSOL is a cloud-first digital transformation and data management company offering products and IT services to enterprises in over 40 countries. The BETSOL team holds several engineering patents and has been recognized with industry awards, and BETSOL maintains a net promoter score that is 2x the industry average.
BETSOL’s open source backup and recovery product line, Zmanda (Zmanda.com), delivers up to 50% savings in total cost of ownership (TCO) and best-in-class performance.
BETSOL Global IT Services (BETSOL.com) builds and supports end-to-end enterprise solutions, reducing time-to-market for its customers.
BETSOL offices are set against the vibrant backdrops of Broomfield, Colorado and Bangalore, India.
We take pride in being an employee-centric organization, offering comprehensive health insurance, competitive salaries, 401K, volunteer programs, and scholarship opportunities. Office amenities include a fitness center, cafe, and recreational facilities.
Learn more at betsol.com
Job Description
Cloud & Infrastructure
- Design, operate, and optimize AWS infrastructure in a hybrid cloud environment.
- Improve performance, reliability, and cost efficiency through proactive optimization and capacity planning.
- Perform lifecycle management, scalability improvements, and infrastructure modernization initiatives.
- Act as a senior escalation point for complex infrastructure issues.
Systems Reliability & Operations
- Participate in on-call rotation and lead incident response efforts.
- Own monitoring and alerting using tools such as CloudWatch and related observability platforms.
- Drive root cause analysis for recurring issues and implement long-term reliability fixes.
- Reduce operational effort through automation and proactive improvements.
Ticket & Service Management
- Monitor, assign, prioritize, and resolve tickets using ITSM tools such as ServiceNow, Jira, or similar platforms.
- Adhere to SLAs, ticket quality standards, documentation requirements, and escalation procedures.
- Perform root cause analysis for recurring issues and collaborate with teams for permanent fixes.
- Ensure accurate time tracking, ticket updates, and resolution notes as per ITIL best practices.
Identity, Access & Corporate Systems
- Administer Active Directory, Entra ID, and Okta, including identity integrations.
- Implement and maintain IAM, RBAC, and access controls aligned with Zero Trust principles.
- Support core enterprise services including Group Policy, DNS, and DHCP.
- Configure user profiles, email accounts, and system policies as required.
Automation & Infrastructure as Code
- Build and maintain infrastructure using Terraform.
- Develop automation using PowerShell, Bash, and Python.
- Integrate infrastructure workflows into CI/CD pipelines (e.g., GitHub Actions).
- Identify and eliminate manual processes through automation.
Network & Connectivity Support
- Troubleshoot LAN, Wi-Fi, VPN, and basic network connectivity issues.
- Configure and support network and local printers.
- Coordinate with network and security teams for escalations related to infrastructure issues.
Security & Compliance
- Support endpoint security, vulnerability management, and CSPM initiatives.
- Integrate logs and security signals into SIEM platforms (e.g., Rapid7).
- Partner with Security on remediation and risk reduction efforts.
- Help reduce audit findings and improve overall security posture.
- Follow IT policies, security standards, and compliance requirements.
Asset, Inventory & Lifecycle Management
- Manage IT assets throughout their lifecycle, including procurement, allocation, tracking, recovery, and disposal.
- Maintain accurate asset records using CMDB and asset management tools such as ServiceNow, Insight, or HPAM.
- Handle IT inventory management, ensuring adequate stock levels for laptops, desktops, accessories, and spares.
Collaboration, Documentation & Leadership
- Collaborate with Security, App & Dev teams, Helpdesk, clients, and vendors.
- Serve as a senior technical escalation point within a small IT team.
- Mentor teammates and contribute to operational best practices.
- Drive reduction of technical debt and infrastructure backlog.
- Create and update technical documentation, SOPs, and knowledge base articles.
- Proactively identify opportunities to improve processes, automation, and service delivery.
Qualifications
- 5+ years of experience in Systems Engineering, Infrastructure, Cloud Operations, or equivalent System Administration roles.
- Bachelor’s degree in Computer Science, Information Technology, or equivalent experience.
Must Have Skills
- AWS production operations (core): Production experience operating AWS core services and controls:
  - EC2 and/or ECS (operational support)
  - RDS basics (backups/reliability, common troubleshooting)
  - S3 operational/security basics
  - CloudWatch logs/metrics/alarms
  - Secrets/config patterns using Secrets Manager and/or SSM Parameter Store
  - Comfortable writing/reviewing IAM policies with least privilege mindset
  - Not required: deep event-driven/serverless architecture ownership (dev team owns this).
- Terraform (multiple repos): Hands-on Terraform in multiple repos:
  - Can explain remote state + locking approach (at least conceptually correct)
  - PR-based change workflow (plan/review/apply controls)
  - Uses modules or can improve/standardize patterns
  - Basic drift awareness (how they detect/respond)
- GitHub Actions for infrastructure (general, practical): Candidate can read/modify Actions workflows used for infrastructure:
  - Understands secrets/environments/runners at a working level
  - Has debugged workflow failures (not just triggered runs)
  - Comfortable supporting Terraform pipeline steps (plan/apply gating)
  - This is Actions for Infrastructure, not necessarily building CI/CD for app teams.
- Ownership + incident leadership + on-call:
  - Willing to be on-call 1 week every 3 weeks (low incident volume)
  - Has led incidents end-to-end (not only participated)
  - Can explain RCA + prevention follow-through
  - Self-starter traits: improves standards, reduces tech debt, doesn’t need handholding
- Extreme Ownership / Self-starter:
  - Ambiguity ownership: picks up unclear problems, defines scope, proposes a plan, and executes without waiting to be told
  - End-to-end closure: drives work through implementation, documentation, handoff, and verification (not just “built it”)
  - Raises standards: creates or improves standards (Terraform patterns, pipeline gates, governance controls, runbooks) and gets adoption
  - Continuous improvement: eliminates recurring issues via automation/RCA, not repeat firefighting
  - Strong judgment + communication: escalates early, communicates clearly, and makes tradeoffs explicit
- Automation: Python/Bash automation experience (familiarity is acceptable)
Nice to have skills
- Okta administration (workforce): Operational Okta admin experience (day-1 capability):
  - User lifecycle basics (join/move/leave ops), deactivation basics
  - Group management and app assignments
  - Troubleshoot common access issues (assignment/group mismatch, MFA loops, login failures)
  - Knows when to escalate deeper SSO/SCIM/policy architecture work
- Azure (minimal): Operational familiarity with:
  - Entra ID Conditional Access (support-level)
  - Comfortable with Entra <-> Okta integration context (support/escalate appropriately)
- Okta depth:
  - Basic sign-on policy edits when needed (simple changes; deeper work can escalate)
  - Built SSO integrations (SAML/OIDC) end-to-end
  - Implemented SCIM provisioning (attribute mapping, deprovisioning, drift handling)
- AWS governance / AFT (even though it’s ~20% of the role):
  - Experience extending AWS AFT (since it’s already implemented)
  - Governance controls experience: Organizations/SCP, CloudTrail org patterns, AWS Config aggregators/rules
  - Cost optimization experience (tagging standards, budgets/alerts, rightsizing/commitments)
  - IAM Identity Center deeper experience
- GitHub Enterprise Cloud governance:
  - Repo/org permission model ownership
  - Security policies / branch protections / CODEOWNERS
  - Experience responding to secret scanning findings (containment/rotation/hardening)
- Automation:
  - PowerShell
  - Python/Bash automation experience beyond basic
- Networking fundamentals (routing, firewalls, VPNs).
- Excellent oral and written communication skills with the ability to interact with global users.
Preferred Qualifications
- Cloud certifications (AWS).
- Security or SRE background.
- MCSE, MCITP, MCTS, or equivalent certifications.
Additional Information
Working hours are 9 PM to 5 AM IST.