Site Reliability Engineer

Location:
176 Montague Road, South Brisbane 4101, Newstead, Queensland
Description
Reporting directly to the Software Manager, you will be dedicated full-time to creating and deploying solutions that improve the reliability of systems in production, fixing issues, and responding to incidents. You will run the production environments by monitoring availability and taking a holistic view of system health. You will build software and systems to manage platform infrastructure and applications, drawing on demonstrated experience as a Site Reliability Engineer in similar environments.
You will have excellent interpersonal and communication skills for coordinating with stakeholders at all levels, and you will be a problem solver with a natural passion for coding, scripting, and innovation.
The key duties and responsibilities of this position include, but are not limited to:
General
● Improve the reliability, quality, and time-to-market of our Product Development Team's digital solutions
● Measure and optimise systems performance, with an eye towards pushing our capabilities forward
● Experience implementing and executing triage processes and issue management and resolution processes
● A skill for debugging systems without a full understanding of their operations, with a focus on resolving the root cause of the issue rather than current symptoms
● Creative approach to problem-solving with the ability to focus on details while maintaining a strategic view
● Analytical, planning, and organisational skills with an ability to manage competing demands
● Plan and perform high-risk maintenance across our environments
● Provide primary operational support and engineering for our digital solutions and their environments
● Gather and analyse metrics from both operating systems and applications to assist in performance tuning and fault finding
● Work collaboratively as part of the product development team to improve services through rigorous testing and release procedures
● Participate in system design consulting, platform management, and capacity planning
● Create sustainable systems and services through automation and uplifts
● Excellent troubleshooting skills. You love diving into the depths of a problem to reach an outcome
● Embed observability into all aspects of the application ecosystem
● Assist teams in identifying and removing manual repetitive tasks from their work
● Ability to stay calm in challenging circumstances and work through problems methodically
● Ability to question decisions and challenge the status quo
● Actively work on planning to increase system reliability and reduce manual interventions
Qualifications/Memberships:
● Relevant tertiary IT qualifications and certification or related discipline
● Any Microsoft Azure Certifications
● Any Kubernetes certifications
Experience:
Preferable
● Bachelor’s degree in computer science or other highly technical, scientific, or related discipline
● Prior experience working as a DevOps Engineer, Systems Administrator or SRE
● Ability to program in one or more high-level languages, such as PowerShell, C#, JavaScript, or SQL
● Previous experience with Microsoft Azure
Desirable
● Experience with Scrum/Agile development methodologies
● Well versed in multi-tenant, event-driven microservice architectures and large-scale compute requirements
● Solid background in operating relational and non-relational databases (SQL and NoSQL)
● Experience in Identity services
● Experience developing solutions that run in containers, preferably on Kubernetes
● Experience with CI/CD pipeline technologies, and tools such as GitHub Actions and Octopus Deploy
● Exposure to one or more container scheduling/orchestration products for distributed storage technologies and dynamic resource management frameworks (for example, Kubernetes, Terraform)
● Experience using monitoring tools such as Application Insights, Azure Monitor and Log Analytics, or Opsgenie
● Knowledge and skills automating and monitoring Azure services
● Familiarity with different disaster recovery strategies and recovery practices
Knowledge & Skills
● A proactive approach to spotting problems, areas for improvement, and performance bottlenecks
● Strong analytical and reasoning skills with an ability to visualise processes and outcomes
● Excellent verbal and written communication skills
● Capable of delivering on multiple competing priorities with little supervision
Personal Requirements
● A passion for solving problems and providing workable solutions
● A desire to learn and continuously improve
● Takes responsibility for their own actions
● Works well with a team on common goals
Principals only. Recruiters, please don't contact this job advertiser.
Please, no phone calls about this job.
Reposting this message elsewhere is not OK.
Telecommuting is OK.
- Category: Part Time Jobs
- ID: 629189