DevOps as originally conceived was more of a philosophy than a set of practices—and it certainly wasn’t intended to be a job title or a role spec. Yet today, DevOps engineers, site reliability engineers, cloud engineers, and platform engineers are all in high demand—with overlapping skillsets and with recruiters peppering role descriptions with liberal sprinklings of loosely related keywords such as “CI/CD pipeline,” “deployment engineering,” “cloud provisioning,” and “Kubernetes.”
When I co-founded Kubiya.ai, my investors pushed me to better define my target market. Was it just DevOps, or also SREs, cloud and platform engineers, and other end users?
In this article, I offer my thoughts, but recognize there’s a great deal of room for interpretation. This is an inflammatory topic for many—so at the risk of provoking a conflagration, let’s proceed!
The Proliferation in DevOps Job Specs
The practice of DevOps evolved in the 2000s to address the need to increase release velocity and reduce product time to market while maintaining system stability. Service-oriented architectures were allowing separate developer teams to work independently on individual services and applications, enabling faster prototyping and iteration than ever before.
The traditional tension between a development team focused on software release and a separate, distinct operations team focused on system stability and security grew. This hindered the pace that many businesses aspired to. Devs didn’t always properly understand operational requirements, while ops weren’t able to head off performance problems before they had arisen.
The DevOps answer was to break down silos and encourage greater collaboration facilitated by tooling, cultural change, and shared metrics. Developers would own what they built—they would be able to deploy, monitor, and resolve issues end to end. Operations would better understand developer needs; get involved earlier in the product lifecycle; and provide the education, tools, and guardrails to facilitate dev self-service.
DevOps as originally conceived was more of a philosophy than a prescriptive set of practices—so much so that there isn’t even common agreement on the number and nature of these practices. Some cite the “four pillars of DevOps,” some the “five pillars,” some the six, seven, eight, or nine. You can take your pick.
Different organizations have implemented DevOps differently (and many not at all). And here, we can anticipate the job spec pickle we’ve found ourselves in. As Patrick Debois, founder of DevOpsDays, noted, “It was good and bad not to have a definition. People… are really struggling with what DevOps is right now. Not writing everything down meant that it evolved in so many directions.”
The one thing that DevOps was not was a role specification. Fast forward to today, and numerous organizations are actively recruiting for “DevOps Engineers.” Worse still, there is very little clarity on what one is—with widely differing skillsets sought from one role to the next. Related and overlapping roles such as “site reliability engineer,” “platform engineer,” and “cloud engineer” are muddying already dim waters.
How did we get here, and what—if any—are the real differences between these roles?
DevOps and DevOps Anti-Types
In my experience, realizing DevOps as it was originally conceived—i.e., optimally balancing specialization with collaboration and sharing—has been challenging for many organizations.
Puppet’s 2021 State of DevOps report found that only 18% of respondents identify themselves as “highly evolved” practitioners of DevOps. And as the team at DevOps Topologies describe, some of these benefit from special circumstances. For example, organizations such as Netflix and Facebook arguably have a single web-based product, which reduces the variation between product streams that can force dev and ops further apart.
Others have imposed strict collaboration conditions and criteria—such as the SRE teams of Google (more on that later!), who also wield the power to reject software that endangers system performance.
Many of those at a lower level of DevOps evolution struggle to fully realize the promise of DevOps, owing to organizational resistance to change, skills shortages, lack of automation, or legacy architectures. A wide range of different DevOps implementation approaches will have been adopted across this group, including some of the DevOps “anti-types” described by DevOps Topologies.
For many, dev and ops will still be siloed. For others, DevOps will be a tooling team sitting within development and working on deployment pipelines, configuration management, and such, but still in isolation from ops. And for others, DevOps will be a simple rebranding of SysAdmin, with DevOps engineers hired into ops teams with expanded skillset expectations, but with no real cultural change taking place.
The rapid adoption of public cloud usage has also fueled belief in the promise of a self-service DevOps approach. But being able to provision and configure infrastructure on-demand is a far cry from enabling devs to deploy and run apps and services end to end. Not all organizations understand this, and so automation for many has stalled at the level of infrastructure automation and configuration management.
With so many different incarnations of DevOps, it’s no wonder there’s no clear definition of a DevOps role spec. For one organization, it might be synonymous only with the narrowest of deployment engineering—perhaps just creating CI/CD pipelines—while at the other end of the spectrum, it might essentially be a rebranding of ops, with additional skills in writing infrastructure as code, deployment automation, and internal tooling. For others, it can be any shade of gray in between—and so here we are with a bewildering range of DevOps job listings.
SRE, Cloud Engineer and Platform Engineer – Teasing Apart the Roles
So depending on the hiring organization, for better or worse, a DevOps Engineer can be anything from entirely deployment focused to a more modern variation of a SysAdmin.
What about the other related roles: SREs, cloud engineers, and platform engineers? Here’s my take on each:
Site Reliability Engineer
The concept of SRE was developed at Google by Ben Treynor, who described it as “what you get when you treat operations as a software problem and you staff it with software engineers.” The idea was to have people who combine operations skills and software development skills to design and run production systems.
The definition of service reliability SLAs is central and ensures that dev teams provide evidence up front that software meets strict operational criteria before being accepted for deployment. SREs strive to make infrastructure systems more scalable and maintainable including—to that end—designing and running standardized CI/CD pipelines and cloud infrastructure platforms for developer use.
As you can see, there’s a strong overlap with how some would define a DevOps engineer. Perhaps one way of thinking about the difference is that whereas DevOps originated with the aim of increasing release velocity, SREs evolved from the objective of building more reliable systems in the context of growing system scale and product complexity. To some extent, the two have met in the middle.
Cloud Engineer
As the functionality of cloud has grown, some organizations have created dedicated roles for cloud engineers. Again, although there are no hard and fast rules, cloud engineers are typically focused on deploying and managing cloud infrastructure, and know how to build environments for cloud-native apps. They’ll be experts in AWS/Azure/Google Cloud Platform. Depending on the degree of overlap with DevOps engineer responsibilities, they may also be fluent in Terraform, Kubernetes, etc.
With the forward march of cloud adoption, cloud engineer roles are subsuming what formerly might have been called an infrastructure engineer, with its original emphasis on both cloud and on-premises infrastructure management.
Platform Engineer
Internal developer platforms (IDPs) have emerged as a more recent solution to cutting the Gordian knot of how to balance developer productivity with system control and stability. Platform engineers design and maintain IDPs that aim to provide developers with self-service capabilities to independently manage the operational aspects of the entire application lifecycle—from CI/CD workflows; to infrastructure provisioning and container orchestration; to monitoring, alerting, and observability.
Many devs simply don’t want to do ops—at least not in the traditional sense. The developer as a creative artist doesn’t want to worry about how infrastructure works; and so, crucially, the platform is conceived of as a product, achieving control by creating a compelling self-serve developer experience rather than by imposing mandated standards and processes.
Getting Comfortable with Dev and Ops Ambiguity
So where does this leave candidates for all these various roles? Probably for now—and at least until there is greater commonality of DevOps implementation approaches—the only realistic answer is to make sure you ask everything you need to during an interview clarifying both the role expectations and the organizational context into which you will be hired.
For recruiters, you may decide for various reasons to cast a wide net, stuffing job postings with trending keywords. But ultimately the details about a candidate’s experience and capabilities must come out in the interview process and conversations with references.
From my perspective here at Kubiya.ai, whether you are a DevOps engineer, platform engineer, cloud engineer, or even an SRE, making sure you are supporting developers with all their operational needs will go a long way in helping them focus on creating the next best thing.
If we were that metaphorical fly on the wall, following an all too common Slack conversation between a software engineer and DevOps engineers, it might go something like this:
Software Engineer: This is gonna take forever. “I need a new environment for my app.”
Two hours later…
DevOps: Why do software engineers think I have telepathy!? “Okay, what instance types do you need?”
An hour later (after consulting with the team…)
Software Engineer: “I need a g3.8xlarge to test out our latest visualization feature.”
DevOps: “Cool, and what AZ do you need it in? Also, which security group should it be associated with?”
Software Engineer: Don’t they know all this operational stuff, like automatically! “Any AZ in us-west-1, and it’s sg-3164z279.”
48 hours later, after a few volleys back and forth over some additional parameters, permissions, and other lovely details, DevOps gets the green light to spin up the environment, steps out to buy some licorice as a reward for all their suffering, and forgets to notify the software engineer that the request has been approved until 5 minutes before taking off for the day.
Sound familiar? If so, there might be a highly unproductive cold war going on deep in the heart of your software engineering and DevOps departments.
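Much of the back-and-forth above comes down to a request that arrives with fields missing. As a rough sketch (the `EnvironmentRequest` class and its field names are hypothetical, invented here for illustration), a structured request could report everything it still needs in one pass instead of one question per message:

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class EnvironmentRequest:
    """Hypothetical request form; the field names are illustrative only."""
    app_name: Optional[str] = None
    instance_type: Optional[str] = None   # e.g. "g3.8xlarge"
    region: Optional[str] = None          # e.g. "us-west-1"
    security_group: Optional[str] = None  # whatever group ID the team uses

    def missing_fields(self) -> list[str]:
        """Return the names of every field that still has no value."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


# The opening message from the conversation: only the app is known.
request = EnvironmentRequest(app_name="my-app")
print(request.missing_fields())
# prints ['instance_type', 'region', 'security_group']
```

Asking for all three missing values at once would not remove the approval wait, but it would likely collapse several of the round trips in the exchange above into a single reply.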
The Heart of the War
What’s at the heart of this war? To understand that, let’s unpack two major issues that emerge from this not-so-smooth but all-too-familiar scenario. First, without a common language and clear communication channels, no two parties can work together even on simple tasks, let alone complex ones. Second, even with a common language, all the excess work, context switching, delays, and the inevitable friction, lead to cold-war-level frustration brewing within your organization.
Adding to these issues are the blurred lines of responsibility that the DevOps model has created for both software engineering and DevOps (aka operations) teams. But the reality is that:
- Software engineers want to code, implement features and run them on infrastructure (so the customers can use them), without a lot of hassle and without getting bogged down in the operational details.
- DevOps want to focus on streamlining and keeping production stable, optimizing infrastructure, improving monitoring and general innovation, without getting sucked into the rabbit hole of end-user (e.g., software engineers’) service and access requests.
When both sides spend multiple cycles on operational bottlenecks—in between throwing invisible daggers of hate at one another—the organization loses software development productivity as well as potential innovation from the DevOps side because nobody is getting what they want.
This massive productivity loss can’t be measured by DORA metrics alone. It goes deep, right to the heart of your organization’s culture. But now, at last, there’s an end in sight to the decades-long cold war between software engineers and DevOps.
Evolution of a DevOps Peace Prize Winner
Nobody thinks the situation we’ve seen here—the radical disconnect between software engineers and DevOps—is okay. No business can work efficiently with this level of wasted time and effort. That’s why, a few years ago, insiders started proclaiming the dawn of self-service DevOps.
When you think of self-service DevOps, it probably calls to mind little robots provisioning all the infrastructure your devs could possibly need. If only that were true.
At the moment, self-service DevOps is still in its adolescent stage with cumbersome internal developer portals, service catalogs, workflow automation tools, and other shiny toys.
But the space is rapidly maturing, thanks to an evolutionary process that’s already well underway.
Yesterday: It Started With Chatbots
Simulated conversation goes all the way back to the dawn of computing. Modern chatbots are definitely smarter but have still failed to live up to their initial promise, which was that chatbots would come to understand any request, easily integrate with DevOps tools, and automate workflows.
In reality, chatbots are somewhat useful but face a number of fundamental problems. Essentially, they rely on simple, predetermined flows and rule-based, canned linear interactions. But we all know that in the real world, questions and requests can be varied and unique… essentially, not something a chatbot is equipped to handle.
Plus, let’s face it—a chatbot that doesn’t do the job is worse than nothing at all: Software engineers try to guess the magic commands that will make the chatbot do their bidding, and if and when they fail (probably), they have to go running to the DevOps team anyway—except now it’s 24, 48, or 72 hours later, and they’re as frustrated as [fill in the blank].
To create an interface that will truly save time on the Engineering and Operations sides, you need more intelligence than a simple chatbot can provide.
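To make the "magic commands" problem concrete, here is a minimal sketch of a rule-based bot. The two commands and their patterns are made up for illustration; the point is only that anything outside the predetermined phrasing falls through.

```python
import re
from typing import Optional

# Hypothetical canned commands: each rule is an exact pattern plus a reply template.
RULES = [
    (re.compile(r"^create env (\S+)$"), "Creating environment {0}"),
    (re.compile(r"^restart service (\S+)$"), "Restarting service {0}"),
]


def rule_based_reply(message: str) -> Optional[str]:
    """Return a canned reply if the message matches a rule, else None."""
    text = message.strip().lower()
    for pattern, template in RULES:
        match = pattern.match(text)
        if match:
            return template.format(*match.groups())
    return None  # no rule matched: the user is back to asking a human


print(rule_based_reply("create env staging"))
# prints Creating environment staging
print(rule_based_reply("I need a new environment for my app"))
# prints None
```

The second request is perfectly clear to a person, yet the bot has no rule for it, which is exactly the guessing game described above.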
Today and Tomorrow: It Continues With Conversational AI
Chatbots are limited because they don’t understand the languages that our (human) developers use. But what if you could use AI to power more sophisticated understanding? To achieve that level of understanding, you need two distinct strategies in place:
- Natural language processing (NLP) uses AI to systematically parse your words and, through extensive training as required by most AI solutions, tries to determine the meaning.
- Then, natural language understanding (NLU) goes one step further, learning to recognize variations in language that reflect the imprecise way that people communicate in the real world, including taking into consideration factors like sentiment, semantics, context, and intent.
Essentially, NLP focuses on building algorithms to recognize and understand natural language, while NLU focuses on the meaning of a sentence. Putting these together, you finally arrive at true conversational AI.
NLP and NLU are just two of a number of essential building blocks that go into conversational AI to provide genuine understanding and intelligence that will ultimately replace chatbots. Let’s look at some of those building blocks:
- Natural language processing engine (using NLP/NLU) to evaluate user input and understand what is being requested
- Integration with an identity provider (IdP) like Okta and other sources, such as a knowledge base or cloud providers, to keep tight control of permissions and security
- Iterative machine learning to identify new data sets and test user behavior predictions to drive continuous improvement of responses
- Dialog management system to retain context of the conversation and allow the conversational AI to respond accordingly
- Context management, or “keeping state,” to track exactly where the conversation left off and the last step that was reached
- Interface to interact with the user, usually via text or speech, ideally through their favorite workflow tool
As an example of context management, in the fictionalized developer/DevOps conversation above, even after an interruption of 24 hours or longer, the AI needs to remember what stage was reached so it can carry on once it receives authorization.
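One way to picture "keeping state" is a small record that is saved when the conversation pauses and reloaded when approval arrives. The sketch below is an assumption-laden illustration: the `ConversationState` class, the stage names, and the JSON string standing in for a real datastore are all hypothetical.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class ConversationState:
    """Hypothetical record of where a request conversation left off."""
    user: str
    stage: str = "collecting_details"  # later "awaiting_approval", then "approved"
    details: dict = field(default_factory=dict)


def save(state: ConversationState) -> str:
    """Serialize the state so it survives a long pause (here, to a JSON string)."""
    return json.dumps(asdict(state))


def load(blob: str) -> ConversationState:
    """Rebuild the state from its serialized form."""
    return ConversationState(**json.loads(blob))


def on_approval(state: ConversationState) -> str:
    """Resume from the saved stage once authorization arrives."""
    if state.stage != "awaiting_approval":
        return f"Nothing to resume for {state.user}; stage is {state.stage}"
    state.stage = "approved"
    return f"Provisioning {state.details['instance_type']} for {state.user}"


state = ConversationState(user="dev", details={"instance_type": "g3.8xlarge"})
state.stage = "awaiting_approval"
blob = save(state)        # stored while waiting for approval
restored = load(blob)     # read back a day or more later
print(on_approval(restored))
# prints Provisioning g3.8xlarge for dev
```

Because the stage travels with the saved record, the assistant can pick up at the approval step without asking the developer to repeat anything.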
Furthermore, the last point—user interface—is critically important. To achieve true conversational intelligence that actually streamlines DevOps, you need an interface that meshes with the way your teams are already working. That way, you won’t add additional stress from context switching, which in itself is a big drain on productivity and focus.
And the Peace Prize Winner Is… Your DevOps Virtual Assistant
Remember the disconnect I mentioned earlier between Dev and DevOps? Well, a virtual assistant can bridge the gap, giving both sides exactly what they want and need. Developers want to code, getting the infrastructure they need without a hassle. DevOps engineers want security and efficiency to avoid over-permissioning and excess cloud costs; they also don’t want to waste time on repetitive, tedious tasks with lots of context switching.
With a virtual assistant in place, here’s how the interaction might go:
- The software engineer uses their preferred work environment: Slack, Microsoft Teams, etc.
- They provide all the details in plain English.
- The virtual assistant then executes their request. For example:
  - Provisions new cloud resources
  - Triggers a complex workflow
  - Provides some hard-to-find data
- Finally, the virtual assistant provides confirmation within the software engineer’s chosen work environment so they can get to work right away.
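The steps above can be sketched as a small dispatcher. This is only an illustration under stated assumptions: the intent is taken as already recognized (the NLP/NLU part is out of scope here), and the `ALLOWED_ACTIONS` table, user names, and action functions are hypothetical stand-ins for an identity provider and real integrations.

```python
from typing import Callable

# Hypothetical permission table; a real assistant would query an identity provider.
ALLOWED_ACTIONS = {
    "alice": {"provision", "get_data"},
    "bob": {"get_data"},
}


def provision(target: str) -> str:
    """Stand-in for provisioning cloud resources."""
    return f"provisioned {target}"


def get_data(target: str) -> str:
    """Stand-in for looking up hard-to-find data."""
    return f"fetched data for {target}"


# Maps an already-recognized intent to the function that carries it out.
ACTIONS: dict[str, Callable[[str], str]] = {
    "provision": provision,
    "get_data": get_data,
}


def handle(user: str, intent: str, target: str) -> str:
    """Check permissions, run the action, and return a confirmation message."""
    if intent not in ACTIONS:
        return f"Sorry {user}, I don't know how to '{intent}'."
    if intent not in ALLOWED_ACTIONS.get(user, set()):
        return f"Sorry {user}, you are not permitted to '{intent}'."
    result = ACTIONS[intent](target)
    return f"Done, {user}: {result}."


print(handle("alice", "provision", "staging"))
# prints Done, alice: provisioned staging.
print(handle("bob", "provision", "staging"))
# prints Sorry bob, you are not permitted to 'provision'.
```

Putting the permission check in front of every action is what lets the assistant serve requests directly while still guarding against the over-permissioning DevOps wants to avoid.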
With conversational AI, both sides stay focused and productive. Software engineers can focus on development, and DevOps won’t have to waste time on context switching or endless, repetitive requests. So they can all leave the building arm in arm at the end of the day, ready to work out those old resentments at the bowling alley.
So let’s all hand conversational AI a Nobel Peace Prize; after decades of conflict, peace has broken out at last. And the big winner? Your organization, living in DevOps harmony happily ever after.
Changing your tech stack can be painful.
So when, in my previous position at AWS as Global DevOps Partnerships Leader, I heard that one of my customers, a NASDAQ Top-10 tech company, was making the complicated switch from a top CI/CD platform to GitHub Actions, I was immediately curious.
Why would they do it? Features? Cost? A secret handshake between executives? Given the complexities, it had to be something major, but nothing added up.
At least until I bumped into a few of the company’s developers at a conference. Over cocktails (which helped with the information transfer), everything suddenly became clear.
What GitHub offered these developers was something the other leading, world-class CI/CD vendors couldn’t: OWNERSHIP. And in Web 3.0, ownership is everything.
Why Web 3.0 Matters to Dev and DevOps
Talking to those developers gave me a moment of clarity. The world of enterprise software had suddenly converged with the misunderstood (and slightly overhyped) world of Web 3.0.
Because that’s what forecasters have promised us will happen under Web 3.0: Users (in this case, the developers) will seek out new incentive systems, changing the way we do business forever.
If you’re new to the terminology, here’s a basic timeline.
- 1990: Web 1.0—The early internet was characterized by few content creators (mostly big companies), static text, and minimal interactivity. Online resources were largely read-only and decentralized.
- 2000: Web 2.0—The mid-period internet offered a wider array of user-created content (communities), extensive interactivity, and open interfaces (APIs). Online activity became more participatory but also more centralized around a few big sites that owned content and offered few monetization opportunities.
- 2020: Web 3.0—The next stage of online development involves content ownership (by individuals). Online activity will become more decentralized, without intermediaries like banks or tech companies, turning individuals into free agents who can better capitalize on their content contributions.
The key difference between Web 2.0 and Web 3.0 is decentralization and ownership.
In Web 2.0, we share content through social sites and search engines owned and controlled by two or three large companies. These companies’ business model centers around data—our data, which they own and manage, use for targeted ads, and more. We are the product.
In Web 3.0, we are the owners of our own content, creations, and personal brand. This business model centers around the individual.
Emerging decentralized technologies that could help us fully reach Web 3.0 include cryptocurrency, NFTs, smart contracts, and edge computing. Many of these use blockchain, a form of distributed ledger technology that authenticates transactions, eliminating the need for a centralized authority (like banks, governments, or other institutions).
Being a Great Developer Is Not Good Enough
Under Web 2.0, developers are rewarded by their employers in one basic way: with a paycheck. There may be other incentives and a pat on the back from time to time, but what happens in Dev and DevOps typically stays in Dev and DevOps.
Now, that equation is changing as Devs flock to contribute to cloud ops communities. They’re still getting a paycheck, but they’ll also be rewarded within those communities. As their reputation grows, they can leverage that to build their personal brand and become DevOps superstars. They might also be rewarded with NFTs or another type of decentralized token to recognize their contribution. Still, the exact mechanisms remain to be seen as these communities grow and standardize.
Big projects like OpenTelemetry (OTel) will expand and thrive under this model, offering developers not only recognition but also compensation as their efforts are adopted.
We’re currently somewhere between Web 2.0 and Web 3.0: Web 3.0 is definitely on its way, but it’s not quite here yet. But even before the full vision of Web 3.0 has been completely realized, it’s easy to see that empowering individual Devs and DevOps team members is quickly becoming the only way to stay in business and attract quality talent.
In today’s development world, it’s not enough to just be good. Developers want more than a steady paycheck: They want to take ownership of their creations and promote themselves while expanding their influence.
Becoming a Dev Superstar
The way software is created today has changed. Today’s developers don’t work in a vacuum; they’re working hand in hand with third-party contributors from all over the globe.
According to dev.to, a community of over 900,000 developers, “The software industry relies on collaboration and networked learning.” Let’s say your app needs a notifications infrastructure. Instead of coding one from scratch, developers draw on others’ contributions from all over the world.
This means that today, developers aren’t just working for their employers—they’re also building their careers and reputations by giving back to the community. And that, in turn, brings us full circle to my revelation, speaking to those developers over cocktails about how they’d forced their employer to switch to GitHub. For me, there were not just one but three key takeaways:
- Devs aren’t in it just for money; they want recognition.
- Devs aren’t just working on projects; they’re building careers.
- Devs don’t just care about their employer; they also care about their personal brand.
Developers want to be recognized and insist on using platforms that provide that recognition.
Additionally, using open forums like GitHub means they can take their work wherever they go (at least any work not covered under an NDA), as it won’t get stuck in a closed system.
Just like YouTubers or TikTok influencers, open forums let developers build their personal brands by creating more and more content, hoping they can monetize their contributions.
Just look at Jan De Dobbeleer. He actually began his career as a watchmaker but has parlayed his love for code tinkering into Microsoft MVP and GitHub star status (with 13,000 stars as of this writing). Jan’s theme tool, oh-my-posh, is a little utility with a lot of fans: 7,700 stars on GitHub.
He’s just one of a number of developers who regularly feature in posts listing hot developers and repos to follow (like The Algorithm and freeCodeCamp, both aimed at coding newbies). And, of course, there are lots and lots of posts out there telling Devs how to become a GitHub star themselves.
Developers and DevOps engineers also look for companies with brand recognition that will offer them a platform to get their name out there and eventually become that influencer. Think Kelsey Hightower from Google.
Which Brings Us Back to the Cocktail Party Where It All Began…
The DevOps team at that NASDAQ top-10 tech company had no problems with the CI/CD platform they were using. And GitHub didn’t offer better features or functionality. But what GitHub did better was empower developers with a platform to show off and grow their influence and personal brand, giving them recognition for their work and talent.
Today, few developers enter the industry hoping only to find a job coding; they want to create tribal knowledge that scales, and they’ll seek out employers and tools to support them through developer empowerment platforms like GitHub, GitLab, PyPI, Bitbucket, SourceForge, and others.
The best DevOps tools and platforms—the ones Devs will choose if they can!—allow them to build reputational (and hopefully soon monetary) equity that lasts throughout their career.
And so these developers voted with their feet. And their employer, recognizing the need for happy employees, dumped a leading CI/CD solution and moved to GitHub.
Some employers may resist this change, hesitant to ungate their code, kicking and screaming at the thought of developers serving anyone other than the company. But in the long run, giving developers what they want costs employers almost nothing and boosts the company’s reputation for hiring and nurturing experts and thought leaders. Superstar Devs work for superstar companies, and vice-versa.
Developers are an asset no company can squander. By recognizing the impact developers and DevOps have on value creation and buying decisions in the software industry, we can all emerge prepared to embrace and thrive with one of the biggest wins of Web 3.0.
For years, the industry has touted self-service as the salvation of DevOps. Why? Well, every department in your organization has its own reasons for putting self-service DevOps at the top of its wish list:
- Ops teams need to ensure high availability, performance, and security
- Dev teams need speed, self-sufficiency, and transparency
- Security teams need guardrails, compliance, and accountability
- And, of course, the C-suite needs ROI, strategy, and insight on top of everything else
These goals often cause tension, especially between Dev and Ops. Verifying security takes time that developers usually don’t have. And once at scale, even the fastest manual processes break down, as at the U.S. Department of Defense, with its workforce of close to 200,000 internal and external developers.
Self-service DevOps platforms promise a virtual self-service kiosk for software delivery, uniting teams and creating a frictionless path to independent developer access and provisioning. According to the State of DevOps report from Puppet, self-service DevOps is a hallmark of “highly evolved” DevOps organizations.
Many self-service solutions have become available, but has anyone figured it out yet? After all, if everyone is happy with their self-service approach, why are they still searching for answers?
Approaches to Self-Service DevOps
True self-service DevOps promises a wide range of benefits: It lets Ops stop rushing to put out fires and shift to big-picture strategy; it makes compliance simpler with automation, safe guardrails, and clear accountability. And there is the promise of tremendous capital efficiency.
It’s no surprise that so many vendors are rushing into the space, offering so many flavors of approaches that Baskin-Robbins would be jealous:
- Internal developer portals
- Shared service platforms
- Workflow automation
- Service catalogs
- API marketplaces
- Cloud competency centers
But let’s face it: in reality, most of these approaches fail. And when self-service isn’t working as it should, and companies fall behind on their service request queue, many DevOps teams default to risky practices that aim to expedite operations but actually expose the company to attack:
- Over-privileged and over-permissioned accounts that give broad access to anyone who gets their foot in the door, as in the SolarWinds breaches of 2020
- Inactive accounts, leaving a door open to cybercriminals, as with the Colonial Pipeline breach in 2021
What you might not realize is that these two risks are so prevalent that IBM Security X-Force penetration testers managed to use them to hack into 99% of client cloud environments. So when we say self-service DevOps is failing, it’s failing big-time.
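Both risks are the kind of thing a periodic audit can surface. The sketch below is illustrative only: the `Account` record, the 90-day idle threshold, and the `"*"` wildcard convention are assumptions, and real data would come from whatever IAM system you use.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Account:
    """Hypothetical account record; real data would come from your IAM system."""
    name: str
    last_used: date
    permissions: list[str]


def audit(accounts: list[Account], today: date, max_idle_days: int = 90) -> dict[str, list[str]]:
    """Flag accounts that have been idle too long or hold wildcard permissions."""
    cutoff = today - timedelta(days=max_idle_days)
    findings: dict[str, list[str]] = {"inactive": [], "over_privileged": []}
    for account in accounts:
        if account.last_used < cutoff:
            findings["inactive"].append(account.name)
        if "*" in account.permissions:  # assumed marker for "can do anything"
            findings["over_privileged"].append(account.name)
    return findings


accounts = [
    Account("ci-bot", date(2024, 1, 10), ["deploy"]),
    Account("old-vpn", date(2023, 1, 5), ["vpn:connect"]),
    Account("admin-temp", date(2024, 1, 12), ["*"]),
]
print(audit(accounts, today=date(2024, 1, 15)))
# prints {'inactive': ['old-vpn'], 'over_privileged': ['admin-temp']}
```

A check like this does not fix a slow request queue, but it makes the shortcuts taken under pressure visible before an attacker finds them.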
So what’s the solution? Why is true self-service DevOps so hard to achieve?
Why They Fail
There are actually a number of reasons why today’s self-service DevOps platforms fail:
- They break end-user workflows, forcing users to change their behavior and learn new processes.
- They force devs to switch context, which kills productivity.
- They need domain expertise to operate, reducing the target audience that can use them.
- They’re heavily reliant on “human in the loop” involvement, often creating more work.
- They demand significant effort to update and maintain workflows and maintain SLAs.
Most importantly, with current self-service DevOps solutions, users are often flying blind. They “don’t know what they don’t know,” for instance, how to locate, search, identify, create, and use resources and workflows.
Imagine choosing a self-serve checkout to avoid the long line at the cashier. But then…
a) You get stuck behind someone frantically trying to scan a wrinkled, expired coupon.
b) You get stuck behind someone who doesn’t understand barcodes and scans all their items twice.
c) You forget that alcohol requires age verification, setting off an alarm bell and flashing lights above the register (speaking from embarrassing experience).
d) All of the above.
In all these scenarios, a human attendant must come running to fix things, defeating the whole point of self-serve. Self-service is only self-service if it’s easy to use even when things go wrong.
With most so-called self-service DevOps solutions, if something goes wrong, you’re back to waiting—drawing the operator back into the loop—just like with the old system.
What True Self-Service DevOps Looks Like
To achieve true self-service DevOps, you need a few elements in place.
Most developers are (gasp!) human beings. So the best interface is no interface at all. Why not let developers express their needs freely in natural language using existing tools such as Slack, Microsoft Teams, or a command line interface?
Developers are frustrated when processes don’t make sense or take more steps than necessary. Let’s adopt workflows that actually handle all the nitty-gritty—so that all developers need to do is ask (in their own words) to be granted access to resources, information, and workflows.
For developers, there’s nothing more frustrating than not knowing where to turn for information you need. True self-service DevOps, powered by AI, will use your existing data to provide a single source of truth with all the answers and insight devs need.
End-User Feedback Loop
An ideal self-service DevOps platform will ask the developer for feedback that is then used, in turn, to improve the platform itself. First, because everybody enjoys having their voices heard, and second, because this reinforces the role of the platform, which is to serve DevOps by improving the end-user experience. Similar feedback loops have been used in apps like Waze, which uses social navigation to continuously improve the quality of its end-user experience. Self-service DevOps is next in line.
Once you’ve accomplished all of this, you can work more efficiently, providing immediate on-demand assistance, so your Dev and Ops teams can work without interruption.
How will you know when you’ve achieved true DevOps self-service? When you forget you ever did things any other way—and wouldn’t go back for a million dollars. When you wake up one day and realize that your platform not only lets you build trust and accountability but also gets all your teams working together…and—(gasp) dare I say it?—having fun.
That will be the last DevOps platform you ever buy.
DevOps is great. But if you’re like a lot of people, you may have noticed your DevOps productivity decay over time.
You’ve witnessed how your DevOps culture created pathways to automation, helped speed release by breaking down silos, empowered developers to take responsibility for QA and established a mindset of continuous improvement—not to mention left-shifting security.
But you may also be wondering why your developers aren’t more productive.
Here is one thought: The competition among dev tools for developers’ attention has become a zero-sum game. Every tool you introduce to make your team more productive comes with a learning curve, and it diverts attention from the tools your team already uses.
If there’s one thing we’ve learned looking back on the last 15 years, it’s that DevOps is a journey, not a destination.
Going forward, creating a sustainable and efficient DevOps climate will take more than shiny new tools. DevOps needs to take a long, hard (metaphorical) look in the mirror to rethink the way we view our tools, prioritizing simplicity above all.
The Chaos Of Context Switching
Peeking inside the mind of a typical DevOps engineer, you may notice chaos, mostly stemming from attempting to context-switch effectively.
Context switching is a concept from computer science, and computers are very good at it. It’s Computer Science 101: When a CPU needs to switch tasks, it saves the state of whatever it’s working on. Then, it works on a different task. Afterward, it retrieves the saved state and resumes the previous task. The overhead is small and nothing is forgotten: the CPU simply picks up where it left off.
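The save, switch, and resume cycle described above can be illustrated with a minimal Python sketch. It is a toy model, not how an operating system actually schedules work: Python generators stand in for tasks, and the task names ("build", "deploy") and the round-robin policy are assumptions chosen purely for illustration.

```python
from collections import deque
from typing import Generator, List


def task(name: str, steps: int, log: List[str]) -> Generator[None, None, None]:
    """A toy task; each `yield` is a point where the scheduler may switch away."""
    for step in range(1, steps + 1):
        log.append(f"{name}:{step}")
        yield  # the task's state (its loop position) is saved in the generator


def run_round_robin(tasks: List[Generator[None, None, None]]) -> None:
    """Run every task one step at a time, switching after each step."""
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)          # restore the saved state, run to the next yield
            ready.append(current)  # not finished yet: back of the queue
        except StopIteration:
            pass                   # task finished: drop it


log: List[str] = []
run_round_robin([task("build", 3, log), task("deploy", 2, log)])
print(log)  # ['build:1', 'deploy:1', 'build:2', 'deploy:2', 'build:3']
```

Each task resumes exactly where it stopped, with no re-reading and no warm-up. That is the part humans cannot replicate.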
But while computers are great at context switching, study after study has proven that humans are not. Or, as the title of one Harvard Business Review article put it, “You can’t multitask, so stop trying.”
In an interview with Fast Company magazine, researcher Gloria Mark, who studies task switching, said that once a person’s work is interrupted, “it takes an average of 23 minutes and 15 seconds to get back to the task.” That’s a problem that both kills productivity and cranks up stress across your organization.
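To put that figure in perspective, here is a back-of-the-envelope sketch. The interruption count is a made-up input, and charging every interruption the full quoted average, with no overlap between interruptions, is a simplifying assumption rather than anything the research claims.

```python
# 23 minutes 15 seconds, the average quoted above, expressed in minutes.
REFOCUS_MINUTES = 23 + 15 / 60


def refocus_cost_minutes(interruptions: int) -> float:
    """Rough estimate of minutes spent getting back on task.

    Assumes (hypothetically) that each interruption costs the full average.
    """
    return interruptions * REFOCUS_MINUTES


print(refocus_cost_minutes(4))  # 93.0
```

Under those assumptions, just four interruptions in a day would cost more than an hour and a half of refocusing time.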
Leading Distractors For Your Team
So what’s distracting your teams? Ironically, the very tools that are supposed to make them productive.
Harvard Business Review also points out that our brains are hard-wired to crave access to information, as much information as possible. Today’s DevOps tools give us plenty of that—possibly even too much.
Take an example: Slack is great for helping teams work more productively. Slack also hurls notifications at users that demand instant engagement, regardless of what they’re doing. As a result, the productivity benefits of some tools are canceled out by the distraction they create, paralleling the alert fatigue seen on security teams.
Think of any IT, security, compliance or production incident. Typically, this pulls all of your cloud ops team members into a “war room” operating mode where they abandon all lower-priority ops work. The ripple effect spreads outward. Developers needing compute resources, salespeople wanting SFDC access, new users waiting to be onboarded and external suppliers requiring system access are all put on hold, bringing the organization to its knees.
Also, every minute your cloud operator is focused on repetitive work like access requests, user onboarding or resource provisioning is time they could be spending on innovation.
The Band-Aid Solutions
Ironically, many organizations try to solve these problems by adding dev tools to keep developers on task. What these have in common is that they force developers to change their workflow, which creates pushback.
For example, service catalog solutions let development teams put toolchains in place with automated provisioning and the promise of self-serve resource access. Internal developer platforms (IDPs) offer another approach, focused more on the operations side of DevOps and providing templates for configurations and permissions that promise to get teams up to speed quickly.
Both approaches are promising since they open the door to automating self-service actions; however, both demand that your teams change the way they work. Not surprisingly, “Let’s add another tool to reduce our tools” is a tough sell.
The Journey Toward More Sustainable DevOps
So how can we make both DevOps and developers more productive without simply throwing more tools at them?
The ultimate goal should be empowering a culture of self-service that minimizes reliance on DevOps and SREs. Eventually, much will be codified, resolving 80% of the backlog automatically while leaving only 20% for operator intervention. The way to get there, however, is to introduce a tool that doesn’t look or act like a tool, functioning in the background and taking a load off your teams rather than handing them more work.
Will operator intervention ever drop to zero? Probably not. A human in the loop (HITL) will still be essential to oversee the system, but automating 80% is already effective. And edging your organization closer to that ideal is key.
To start moving in the right direction, it’s essential to implement a few concepts. Resources and workflows should be easy to search and use for any organizational user. Access controls should be uniform. Ideally, all of this is triggered through natural language, so there’s less of a learning curve. It’s also important to establish clearly defined and easily accessible workflows.
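As a rough illustration of how these concepts could fit together, the sketch below shows a searchable workflow catalog with one uniform access check, and escalation to a human operator when a request can’t be handled automatically. Everything in it is hypothetical: the workflow names, keywords, and roles are invented, and plain keyword matching stands in for real natural-language understanding.

```python
import re
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Workflow:
    name: str
    keywords: FrozenSet[str]       # words that suggest this workflow
    allowed_roles: FrozenSet[str]  # one uniform access rule per workflow


# Hypothetical catalog; a real one would be built from your own workflows.
CATALOG: List[Workflow] = [
    Workflow("provision_compute", frozenset({"compute", "vm", "cluster"}),
             frozenset({"developer", "operator"})),
    Workflow("grant_sfdc_access", frozenset({"sfdc", "salesforce"}),
             frozenset({"sales", "operator"})),
    Workflow("onboard_user", frozenset({"onboard", "onboarding"}),
             frozenset({"operator"})),
]


def find_workflow(request: str) -> Optional[Workflow]:
    """Return the workflow whose keywords best match the request, if any."""
    words = set(re.findall(r"[a-z0-9]+", request.lower()))
    best, best_hits = None, 0
    for workflow in CATALOG:
        hits = len(workflow.keywords & words)
        if hits > best_hits:
            best, best_hits = workflow, hits
    return best


def handle(request: str, role: str) -> str:
    """Run a matching workflow if the role allows it; otherwise escalate."""
    workflow = find_workflow(request)
    if workflow is None:
        return "escalate: no matching workflow"
    if role not in workflow.allowed_roles:
        return f"escalate: {role} may not run {workflow.name}"
    return f"run: {workflow.name}"


print(handle("I need a compute cluster for testing", "developer"))
print(handle("Please onboard the new hire", "developer"))
```

The first request runs without operator involvement. The second is escalated, which keeps a human in the loop for the cases that automation should not decide on its own.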
These measures will help your DevOps team deal with distractions and avoid the chaos of context switching. And as you choose tools going forward, ask yourself: Are your DevOps processes working for you, or are you working for them? If tools are getting in the way, what you need isn’t more tools.
Wherever possible, select tools that act as a force multiplier, not a distraction. Choose tools that fit naturally into your developers’ workflows. (Hint: That’s also a great way to get them excited about new tools.)
Ultimately, we need to work toward a culture of self-service that prioritizes protecting your most sacred resources: DevOps and developers. Any tools that don’t prioritize this are not serving your aims. Eliminating end-user frustration will put you well on the way toward an efficient, sustainable DevOps culture that breaks away from the zero-sum game.