To embed or not to embed: That is the question.
At least, that’s one of the questions that companies have to answer as they decide how to implement site reliability engineering. They can either embed SREs into existing teams, or they can build a new, separate team.
Both approaches have their pros and cons. The right strategy for your company or team depends, of course, on your needs and priorities.
What is an Embedded SRE?
An embedded SRE works as part of a non-SRE team. They might join development or IT operations teams, for example—although they could also be part of quality assurance teams, security teams or any other unit within the organization that can benefit from their expertise.
Embedding SREs is the opposite of creating a dedicated, general-purpose SRE team. In the latter approach, an organization hires SREs who collaborate with non-SRE teams but aren’t integrated directly into them.
There are several other organizational models that fall somewhere between these two poles. But typically, the two main types of structures are embedded and dedicated teams.
Advantages of Embedded SREs
Embedding SREs into other teams offers several benefits.
Arguably the most important advantage is that embedded SREs ensure the highest level of collaboration between site reliability engineers and other stakeholders. When they work directly alongside other engineers on a continuous basis, it’s a pretty safe bet that their perspective will inform every decision that the other engineers make. Separate SRE teams will still generally collaborate with other teams, but not necessarily with the same level of dedication as embedded SREs.
Embedded SREs are also beneficial for organizations where the site reliability engineering concept remains new, and where stakeholders still need to learn what SREs do and the value they can bring. It may be a bit challenging to get buy-in when you propose creating an entirely separate team. Indeed, developers or IT engineers may even perceive such a team as a threat to their own jobs (even though in reality, of course, SREs complement other teams instead of competing with them). But when you add SREs to existing teams, it’s likely to be easier to get everyone on board with the concept and for other types of engineers to appreciate how SREs make their jobs easier.
Finally, embedded SREs work well for organizations that are on the small side and don’t need an entire SRE team. If you have just several dozen engineers on staff instead of several hundred, hiring a few SREs to embed into your existing teams probably makes more sense than investing in a brand-new team.
Challenges of Embedded SREs
The embedded model has its drawbacks, too.
One major risk is that the SRE’s influence within a larger team may not be strong enough to have a major impact on how that team operates. A single SRE working alongside a dozen developers or IT engineers, for example, may struggle to integrate site reliability engineering techniques and tools into the existing team’s workflow—especially if it’s a team that already has strong opinions about how it should operate.
Along similar lines, a single SRE working within a team using the embedded model could end up being stretched too thin to do their job well. That’s especially true if other team members expect the SRE to “own” functions that should really be a collective team responsibility, like responding to incidents or using chaos engineering to test reliability. SREs may lead these processes, but they shouldn’t be the SRE’s responsibility alone, and the function can start to break down when other team members treat the SRE as the person on whom they can simply “dump” all of their reliability-related tasks.
Finally, the embedded model can be difficult to scale, which is one reason why it doesn’t work as well at large organizations. If you have dozens of different engineering teams, embedding an SRE into each one requires a lot of management effort. It also deprives the various SREs within your organization of the ability to work closely with each other. In that type of scenario, it probably makes more sense to create a dedicated team and have those SREs interface with other teams as needed, rather than trying to distribute site reliability engineers across the business by embedding them directly into other teams.
In general, then, embedded SREs make the most sense for companies that have relatively small engineering organizations and in which the SRE function has not yet received complete buy-in. For large businesses that have numerous engineering teams, it will likely be easier to manage SREs by creating a separate team just for them.
That said, your mileage may vary from the norm, and it may be worth experimenting a bit to determine which model—embedded or not—makes the most sense for your company. Keep in mind that you can also use a hybrid approach wherein you embed SREs into some teams, while still maintaining a dedicated team that leads site reliability initiatives for the organization as a whole.
After all, site reliability engineering is all about thinking creatively and coming up with innovative solutions to challenges—even the challenge of figuring out how to structure the SREs within your organization.