Systems Engineer III

  • Full-time

Company Description

Tesco Bengaluru

We are a multi-disciplinary team creating a sustainable competitive advantage for Tesco by standardising processes, delivering cost savings, enabling agility, providing cutting-edge technological solutions and empowering our colleagues to do ever more for our customers. With cross-functional expertise in Global Business Services and Retail Technology & Engineering, a wide network of teams and strong governance, we reduce complexity, thereby offering high-quality services for our customers. Tesco Bengaluru, established in 2004 to enable standardisation and build centralised capabilities and competencies, makes the experience better for our millions of customers worldwide and simpler for over 4,40,000 colleagues.

At Tesco Business Solutions, we have a mission to simplify, scale & partner to serve our customers, colleagues and suppliers through a best-in-class intelligent Business Services model. We do this by building a world-class business services model and putting the services model framework right at the heart of everything we do for our worldwide customers. The key objective is to implement and execute the service model consistently across all our functions and markets. The ethos of business services is to free up our colleagues from regular manual operational work. We use cognitive technology to augment our key decision making. We have also built a Continuous Improvement (CI) culture across functions to drive bottom-up business efficiencies by optimising processes. Business services colleagues need to act as business partners with our group stakeholders to build a collaborative partnership, driving continuous improvement across markets and functions to lead the best customer experience by serving our shoppers a little better every day.

At Tesco, inclusion means that Everyone's Welcome. Everyone is treated fairly and with respect; by valuing individuality and uniqueness we create a sense of belonging.

Diversity and inclusion have always been at the heart of Tesco. It is embedded in our values: we treat people how they want to be treated. We always want our colleagues to feel they can be themselves at work and we are committed to helping them be at their best.

Across the Tesco group we are building an inclusive workplace, a place to actively celebrate the cultures, personalities and preferences of our colleagues, who in turn help to build the success of our business and reflect the diversity of the communities we serve.

Job Description

Key Responsibilities as Systems Engineer III
Kafka Administration:
- Install, configure, and maintain Kafka clusters, ensuring high availability and reliability.
- Manage Kafka brokers, topics, partitions, and configurations to optimize performance.
- Implement and manage Kafka security protocols, including SSL/SASL, encryption, and access control lists (ACLs), and manage Kafka quotas.
- Monitor Kafka clusters for performance issues, troubleshoot problems, and implement solutions.
- Perform regular Kafka upgrades, patching, and maintenance to ensure stability and security.
Monitoring & Alerting:
- Set up comprehensive monitoring systems to track Kafka cluster health, performance, and resource usage.
- Implement real-time alerting mechanisms for critical Kafka metrics such as lag, throughput, broker performance, and disk usage.
- Use monitoring tools such as Prometheus, Grafana, Splunk, or Datadog to create dashboards and alerts.
- Continuously refine and improve alert thresholds to minimize false positives and ensure timely issue detection.
Performance Tuning & Optimization:
- Analyse and optimize Kafka performance, including tuning broker configurations, producer/consumer settings, and JVM parameters.
- Conduct capacity planning and ensure Kafka infrastructure can handle growing data volumes.
- Troubleshoot and resolve Kafka performance issues, including slow consumers, high latency, and broker instability.
Apicurio Administration:
- Install, configure, and maintain Apicurio Registry and associated tools in various environments (development, testing, production).
- Manage and maintain API schemas and versions in the Apicurio Registry.
- Implement security protocols for Apicurio, including authentication, authorization, and encryption.
- Monitor the performance and health of the Apicurio Registry, troubleshooting any issues that arise.
- Plan and execute upgrades, patches, and enhancements for the Apicurio environment.
Azure Cloud Management:
- Manage and maintain Azure cloud resources, including virtual machines, storage accounts, virtual networks, databases, and other Azure services.
- Implement and manage security measures, including identity and access management, network security groups, and encryption.
- Monitor and optimize Azure performance, ensuring scalability, reliability, and cost-effectiveness.
- Plan and execute Azure resource provisioning, scaling, and disaster recovery strategies.
Middleware Platform Administration:
- Deep technical knowledge of BizTalk, RTI, SFG, Ab Initio, Kafka, Azure administration, Hadoop and TIBCO (capacity planning, troubleshooting, performance optimisation, deployment and management of applications).
- Very good experience with system automation and deployment tools (Chef, Puppet, Ansible).
- Very good understanding of Unix/Linux commands, shell scripting and PowerShell scripting.
- Deep technical knowledge of networking (VPN, subnets, firewalls, SSH, GTM, LTM, etc.).
- Technical knowledge of modern build tools like Jenkins and Ant.
- Deep technical knowledge of architecting, designing and integrating new solutions in a large-scale enterprise of highly distributed applications (i.e. having an architectural sense for ensuring availability, reliability, maintainability, scalability, etc.).
- Technical knowledge of how virtual machines and physical machines work.
- Deep technical knowledge of hardware configuration and setup, such as racks, disk topology, RAID, etc.
- Sound knowledge of backup and restoration using Veeam or NetWorker.
Capacity Planning & Scaling:
- Perform capacity planning and forecasting to ensure that Kafka/Apicurio clusters can handle growth and increasing data volumes.
- Implement horizontal and vertical scaling strategies to meet business needs.
- Work closely with infrastructure teams to optimize hardware resources for Kafka deployments.
Automation & Scripting:
- Develop and maintain automation scripts for Kafka/Apicurio administration tasks using tools like Ansible, Terraform, or custom scripts.
- Create and maintain monitoring and alerting scripts to ensure the health and performance of Kafka clusters.
- Implement and manage CI/CD pipelines for API deployment and management.
- Automate routine maintenance tasks.
Incident Management & Support:
- Act as the primary point of contact for Kafka/Apicurio/Azure-related incidents, ensuring quick resolution and minimal downtime.
- Collaborate with development, DevOps, and SRE teams to diagnose and resolve Kafka-related issues.
- Participate in on-call rotations to provide 24/7 support for Kafka environments.
Documentation & Training:
- Document configurations, best practices, and troubleshooting procedures in Confluence.
- Provide training and mentorship to junior team members on platform administration.
- Develop and maintain runbooks for incident response.
Collaboration:
- Collaborate with infrastructure development, DevOps, and IT teams to implement and support Kafka-based solutions.
- Participate in on-call rotations and provide support for critical issues outside of regular business hours as needed.
- Engage with stakeholders to understand requirements and design solutions that align with business goals.

Qualifications

We are seeking a skilled and experienced Senior Administrator to join our IT team. The ideal candidate will have in-depth knowledge of and hands-on experience with Apache Kafka, and will be responsible for managing and maintaining the Apicurio Registry and associated API management tools. The candidate should also have sound knowledge of and experience with Azure cloud infrastructure. In addition, experience with middleware technologies such as Ab Initio, TIBCO, SFG, BizTalk, and RTI is highly desirable. This role requires strong analytical skills, a deep understanding of data streaming, and the ability to manage and scale Kafka clusters effectively.

Additional Information

Important Notice: 

On behalf of Tesco Bengaluru, we must caution all job seekers and educational institutions that Tesco Bengaluru does not authorise any third parties to release employment offers or conduct recruitment drives via a third party. Hence, beware of inauthentic and fraudulent job offers or recruitment drives from any individuals or websites purporting to represent Tesco. Further, Tesco Bengaluru does not charge any fee or other emoluments for any reason (including without limitation, visa fees) or seek compensation from educational institutions to participate in recruitment events. 

Accordingly, please check the authenticity of any such offers before acting on them and where acted upon, you do so at your own risk. Tesco Bengaluru shall neither be responsible for honouring or making good the promises made by fraudulent third parties, nor for any monetary or any other loss incurred by the aggrieved individual or educational institution. 

In the event that you come across any fraudulent activities in the name of Tesco Bengaluru, please feel free to report the incident at [email protected]
