Senior Engineer - SRE

  • Full-time
  • Sub Division: Group Technology
  • Division: GCOO

Company Description

Now it’s your time to join the #1 bank in the Middle East and one of the most prestigious financial companies in the region. Shaking up the world of banking requires a lot of smarts and skill. We’re looking for the brightest and best to help us reach our goals and we’ll also help you reach yours. Your success is our success as you grow stronger in your career. Join us and leave a legacy of your own, as a pioneer in both the company and the industry.

Job Description

Bridge the gap between operations and development teams, aiming to speed up delivery while improving reliability and quality.

Qualifications

  • Build a Site Reliability Engineering culture across the organization by sharing best practices, approaches, documentation, and code with other engineering teams
  • Create software that improves the reliability of systems in production, fixing issues and responding to incidents.
  • Apply automation and software to any tasks or parts of the system that would benefit from it or are performed manually
  • Troubleshoot complex, cross-platform issues spanning OS, networking, and database layers in a cloud-based SaaS environment; handle live production incidents; debug application and infrastructure issues; and follow and implement SRE best practices
  • Monitor application performance, take steps to improve overall application performance and stability, and follow through with implementation
  • Ensure SLA/SLO error margins are being adhered to by the teams before releasing new features. Take corrective actions if error margins are out of bounds.
  • Conduct system analysis and configuration management, and develop improvements for system software performance, availability, and reliability
  • Design, write, ship, and motivate the creation of software and systems to increase observability, product reliability and efficiency
  • Work closely with software engineers and testers to ensure the system meets non-functional requirements such as performance, security, and availability
  • Document your system knowledge as you acquire it over time, create runbooks, and ensure critical system information is readily available to those who need it
  • Maintain and monitor the deployment and orchestration of servers, Docker containers, databases, and general backend infrastructure
  • Keep up to date with security developments and proactively identify, diagnose, and solve complex security issues

Additional Information

  • Bachelor’s degree in computer science or other highly technical, scientific discipline
  • Overall experience of 7+ years, including 4+ years’ experience as an SRE/DevOps Engineer
  • Experience working closely with engineering teams to understand their product requirements and how they build, test, and deploy their software applications
  • Demonstrable experience in containerization (Docker) and orchestration (Kubernetes)
  • Demonstrable experience in CI/CD tools such as Bitbucket, Bamboo, Nexus, and Helm
  • Experience with Infrastructure as Code (Terraform, CloudFormation, Ansible)
  • Knowledge and proven hands-on experience in large-scale databases and distributed technologies, such as Kafka and Confluent Platform
  • Basic programming and scripting skills (preferably Golang, Bash, shell, etc.)
  • Ability to provide advice, best practices and recommendations for the operation and deployment of Microsoft Azure
  • Experience in monitoring and analyzing infrastructure performance using standard performance monitoring tools such as Nagios, New Relic, Perfmon, PerfView, ProcDump, and DebugDiag
  • Familiarity with Linux and UNIX systems (e.g. CentOS, Red Hat) and command-line system administration tools such as Bash, Vim, and SSH
  • Hands-on experience in configuration management of server farms (using tools such as Puppet, Chef, Ansible, etc.)
  • Knowledge of network routing, load balancing, and networking protocols, including a basic knowledge of TCP/IP and an understanding of HTTP and DNS
  • Knowledge of SRE & Agile methodologies

Preferred Skills (Good to have)

  • Demonstrated understanding of ITIL methodologies; ITIL v3 or v4 certification
  • Kubernetes CKA or CKAD certification