Using the full ITIL framework along with internal and external
expertise to identify and resolve complex issues, ensuring the
best application performance and customer experience.
Supporting the development and delivery of platform technology
upgrades.
Taking services live; acting as a quality gate check for the
applications.
Configuring, developing and automating our ticketing system,
monitoring alarms and data feeds to deliver data-driven
monitoring and decision making.
Providing support to 1st and 2nd line technical teams, whilst
interacting with developers and other stakeholders to resolve
complex issues and prevent recurrence.
Using scripting skills to automate regular and repetitive
tasks to streamline our processes and operational experience.
Essential:
Experience deploying, managing and operating medium- to large-scale
production systems.
Experience working as a Site Reliability Engineer
Experience of infrastructure configuration management
automation e.g. Ansible/Terraform/Puppet/Chef
Experience of setting up component logging and configuring
monitoring tools – e.g. Prometheus
Linux administration and automation
Experience of working with database technologies, both RDBMS and
NoSQL
Software debugging and tuning experience.
Desirable:
Computing related degree or equivalent experience
Telephony/SIP application development
Experience in developing scripts, e.g. in Python