Rainbird Technologies is a rapidly growing intelligent automation scale-up based in London and Norwich. We are undergoing a period of rapid expansion, helping organisations automate complex decision-making through our award-winning low-code SaaS platform. To accelerate our growth, we are looking to expand our platform team with a Site Reliability Engineer who is keen to help us take our platform development, release and operations to the next level.

In this role you will work alongside a small agile development team that is responsible for all aspects of the design, development, testing, delivery and maintenance of the Rainbird intelligent automation platform. The team currently delivers to production every 2 weeks and is making constant improvements towards full CI/CD. You will identify and implement automations and efficiencies that accelerate and de-risk software delivery. As our organisation grows and demand on our inference engine increases, you will ensure our platform remains available, stable and performant, minimising manual intervention at every opportunity.

Your responsibilities will include:
- Support and continuously improve tooling used by the development team to create efficiencies in the
- Increase application observability and monitoring
- Monitor latency, traffic, errors and saturation, identify issues and work with the development team to
- Oversee production releases and be responsible for streamlining deployment and rollback processes.
- Be security conscious with continuous monitoring of security issues and remediation.
- Be responsible for operations support.
- Own and manage our development, testing and production infrastructure and associated costs.
- Be responsible for Business Continuity and Disaster Recovery.
- Introduce automation across all of the above to streamline and de-risk it.
- Own and manage the delivery of privately hosted and on-premise versions of the Rainbird platform.
We are looking for someone who:
- Is keen to work in a small team with big responsibilities.
- Takes pride in availability, performance and security of production systems.
- Has a strong dislike for manual tasks.
- Enjoys picking up and implementing new tools and frameworks.
- Has the ability to think from a user’s perspective.
- Works well in a team by contributing and listening to ideas before arriving at a technical solution.
- Has a passion for software development and operations and is keen to share ideas and knowledge to improve the team, the platform and the company.
Skills and experience:
- 2 or more years of experience in DevOps or SRE-related roles.
- Experience working with Docker, Kubernetes, Terraform, Helm, AWS, and modern distributed SaaS infrastructure.
- Understanding of standard networking protocols and components such as TCP/IP, HTTP, DNS, ICMP, VLANs, the OSI model, IP subnetting and load balancing.
- Understanding of good monitoring and alerting practices, using tools such as Datadog and CloudWatch.
- A focus on security at every level of a system's delivery.
- Knowledge of the internal workings of at least one of: MySQL, Redis.
- A desire to learn and grow your career as a Site Reliability Engineer.
- Previous experience working on the delivery of enterprise SaaS software would be an advantage.
- Proficiency in one or more of: Go, Node.js, Bash.

Why join us:
- We're a small, successful and close-knit scale-up growing fast within the intelligent automation market.
- We’re used to performing to a high standard and delivering great services to our clients around the world.
- We’re friendly, sociable and enjoy working together.
- We hate standing still and are constantly developing new ideas and launching into new markets.
- We're creating ground-breaking change and transformation: come and join us!