Rainbird Technologies is a rapidly growing intelligent automation scale-up based in London and Norwich. We help organisations automate complex decision-making with our award-winning low-code SaaS platform. To accelerate our growth, we are looking to expand our platform team with a Site Reliability Engineer who is keen to help us take our platform development, release, and operations to the next level.

In this role, you will work alongside a small agile development team who are responsible for all aspects of the design, development, testing, delivery, and maintenance of the Rainbird intelligent automation platform. We currently deliver to production every 2 weeks and are making constant improvements towards full CI/CD; you will identify and implement automations and efficiencies that accelerate and de-risk software delivery. As our organisation grows and demand on our inference engine increases, you will ensure our platform remains available, stable and performant, minimising manual intervention at every opportunity.
In this role, you will:
- Support and continuously improve tooling used by the development team to create efficiencies in the
- Increase application observability and monitoring.
- Monitor latency, traffic, errors and saturation, identify issues and work with the development team to
- Oversee production releases and be responsible for streamlining deployment and rollback processes.
- Be security-conscious, continuously monitoring for security issues and remediating them.
- Be responsible for operations support.
- Own and manage our development, testing and production infrastructure and associated costs.
- Be responsible for Business Continuity and Disaster Recovery.
- Introduce automation across all of the above to streamline and de-risk delivery and operations.
- Own and manage the delivery of privately hosted and on-premises versions of the Rainbird platform.
We are looking for someone who:
- Is keen to work in a small team with big responsibilities.
- Takes pride in the availability, performance and security of production systems.
- Has a strong dislike for manual tasks.
- Enjoys picking up and implementing new tools and frameworks.
- Has the ability to think from a user’s perspective.
- Works well in a team by contributing and listening to ideas before arriving at a technical solution.
- Has a passion for software development and operations and is keen to share ideas and knowledge to improve the team, the platform and the company.
Skills and experience:
- 2 or more years of experience in DevOps or SRE-related roles.
- Experience working with Docker, Kubernetes, Terraform, Helm, AWS, and modern distributed SaaS infrastructure.
- Understanding of standard networking protocols and components such as TCP/IP, HTTP, DNS, ICMP, VLANs, the OSI model, IP subnetting, and load balancing.
- Understanding of good monitoring and alerting practices, using tools like Datadog and CloudWatch.
- Focus on security in the delivery of all levels of a system.
- Knowledge of the internal workings of at least one of: MySQL, Redis.
- Desire to learn and grow your career as a Site Reliability Engineer.
- Previous experience working on the delivery of enterprise SaaS software would be an advantage.
- Proficiency in one or more of: Go, Node.js, Bash.
- We’re a small, successful and close-knit scale-up growing fast within the intelligent automation market.
- We’re used to performing to a high standard and delivering great services to our clients around the world.
- We’re friendly, sociable and enjoy working together.
- We hate standing still and are constantly developing new ideas and launching into new markets.
- We are creating ground-breaking change and transformation: come and join us.