I have spent countless hours fine-tuning, debugging, and fortifying systems in the fast-paced world of software engineering, and one thing has become vividly clear: the landscape never stops evolving. Traditional notions of job roles and team structures are continually being challenged and reshaped. Against that backdrop, I am moving from engineering into product management within my site reliability engineering (SRE) team.
This shift may, admittedly, seem unconventional to some. After all, isn’t the SRE team traditionally focused on the nitty-gritty of operational concerns: observability, alerting and incident management, service levels and error budgets, capacity planning, and, of course, system reliability? It is. What I’ve come to realise, however, is that these are not isolated technical issues to be addressed one at a time. Together, they form a cohesive product in their own right.
The key is to see our SRE team not only as maintainers but also as an enablement team. Our customers are the internal users who rely on our systems to do their work efficiently and effectively. By addressing their needs and solving their problems in ways that align with business objectives, we are, in effect, delivering a product.
Let’s unpack this notion a little more. When you think about the fundamental principles that guide product teams, a few key tenets come to mind. First, the product should solve a genuine problem for its users. Second, it should be delivered continuously and iteratively, responding to user feedback and market changes. Finally, it should be built in line with the wider business strategy, so that it contributes to the company’s overall objectives.
Now, let’s translate these principles to the realm of SRE. Our product, in this case, is the suite of services we provide to our internal users, from system observability to incident management and beyond. Each of these services solves a problem our users face. By understanding and empathising with our users’ needs, we can tailor our solutions to their particular challenges and deliver them in an iterative, user-centred way.
Moreover, in line with product thinking, our services should align with our organisation’s broader strategic objectives. This means considering not just the technical aspects of our work but also its business impact. We need to understand how our decisions affect the bottom line, and shape our strategies accordingly.
An essential element of this transformation is a shift towards ‘continuous discovery’. This concept, championed by product thought leaders such as Marty Cagan and Teresa Torres, means maintaining an ongoing dialogue with our users to uncover their needs and pain points. By fostering a culture of continuous learning, we can iteratively improve our services to serve our users better.
In this context, service levels and error budgets become less about stringent adherence to metrics and more about striking a balance between innovation and stability that supports the organisation’s objectives. Capacity planning becomes a strategic endeavour, in which we anticipate user needs and business growth. System reliability, in this framework, is not just about minimising downtime but about providing a consistently excellent user experience.
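To make the idea of an error budget as a balance between innovation and stability a little more concrete, here is a minimal sketch of the usual error-budget arithmetic, assuming a request-based availability SLO measured over a fixed window. The function names, the 99.9% target, the request counts, and the simple “freeze when the budget is spent” rule are all illustrative assumptions, not a description of any particular team’s policy.

```python
def error_budget_remaining(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Return the fraction of the error budget still unspent in the window (0.0 to 1.0).

    Hypothetical helper: assumes a request-based SLO such as 0.999 (99.9% of requests succeed).
    """
    # The budget is the number of failures the SLO tolerates over the window.
    allowed_failures = (1.0 - slo_target) * total_requests
    if allowed_failures <= 0:
        # A 100% target (or no traffic) leaves no budget to spend.
        return 0.0
    # Clamp at zero: an overspent budget is simply exhausted.
    return max(0.0, 1.0 - failed_requests / allowed_failures)


def release_decision(remaining: float, freeze_threshold: float = 0.0) -> str:
    """Illustrative policy: keep shipping while budget remains, otherwise focus on reliability."""
    return "ship features" if remaining > freeze_threshold else "prioritise reliability work"


if __name__ == "__main__":
    # Illustrative numbers only: 1,000,000 requests with 400 failures against a 99.9% SLO
    # leaves roughly 60% of the budget to spend on change.
    remaining = error_budget_remaining(0.999, 1_000_000, 400)
    print(f"{remaining:.0%} of budget left -> {release_decision(remaining)}")
```

The point of the sketch is the framing rather than the code: the budget turns a reliability target into a shared, negotiable resource that product and SRE can spend on change, which is the kind of business trade-off the product mindset asks us to make explicit.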
Adopting a product mindset does not remove the need for a solid foundation in SRE principles. The thinking captured in Google’s SRE book remains as critical as ever. By marrying these technical principles with a product-focused approach, however, we can deliver services that are not only technically robust but also solve real problems for our users in a way that works for the business.
In essence, this is a blending of domains: SRE principles and practices meeting the methods of product management. It is not a diversion from our traditional role but an expansion, an evolution. It demands that we combine technical acuity with user empathy and business acumen.
Is this a challenging proposition? Undoubtedly. But it is also full of potential: a chance to reshape the way we view and carry out our roles as SREs. It’s about writing a new narrative for SRE, one in which we don’t just maintain and manage but also create, innovate, and deliver. It’s about acknowledging that SRE teams are, in fact, product teams, delivering services that solve problems, drive user satisfaction, and contribute to our organisations’ success.
And so, as I step into this new role, I am excited about the opportunities that lie ahead. I look forward to the challenges, the lessons, and the triumphs. It’s a brave new world out there, and we, as SREs, have a pivotal role to play in it. I invite you to join me on this journey as we reimagine what SRE can be.