May 20, 2020

Book Review: Ahead in the Cloud - Stephen Orban

AWS has a division called Enterprise Strategy, which is staffed by former CIOs who have each led successful Digital Transformations, including migrations to the Cloud. They travel around the globe helping the largest companies in the world discover and unlock the power of AWS. Because these strategists have lived the journey to cloud and worked with countless customers, executives are always keen to understand what lessons they can learn from them. The author, Stephen, describes his role as:

IN MY ROLE AS HEAD of Enterprise Strategy for AWS, I have a lot to be grateful for. After having led a large-scale business transformation using the cloud as the CIO of Dow Jones, I now have a front-row seat to watch some of the largest companies in the world (News Corp, Capital One, GE) transform their business using the cloud. This seat affords me the opportunity to learn from some of the brightest and most innovative minds in the industry—both from our customers and from within AWS.

This book - Ahead in the Cloud: Best Practices for Navigating the Future of Enterprise IT - is filled with real world stories of the lessons each of the former CIOs learnt during their cloud journeys, and also includes guest posts from other CIOs. While this book is about their journeys which involved AWS, it does not include any specific AWS services or technologies, so it really could apply to any cloud provider - hence this book is for anyone embarking on a Digital Transformation, because the Cloud, like DevOps, is key to the journey.

Where the journey begins

As with most journeys, this one has a starting point: Stephen at Dow Jones, where he finds that the 2008 financial crisis had pushed Dow Jones into cost-cutting measures, and much of IT and product development was outsourced to India. But, if you are trying to remain competitive in an increasingly digital world, you probably need to maintain your ability to quickly and effectively build, enhance, and optimize your digital products based on customer feedback. Dow Jones found that their outsourcing arrangements made this more difficult. His job was to shift the technology group’s focus toward product development in order for the company to stay relevant in an increasingly competitive environment, improve operational excellence, and drive down costs. The 4 key things they did were:

  • in-sourcing talent
  • leveraging open-source
  • bringing in cloud services so they could focus on the business
  • creating a CCoE, which they called DevOps, to codify how they built and executed their cloud strategy across the organization (possibly the best decision they made)

7 best practices

MEANINGFUL TECHNOLOGY CHANGE IN A very large organization is very rarely just about the technology. It’s about people, it’s about leadership, and it’s about creating a culture that encourages safe-to-fail experiments and smart risk taking instead of creating fear among those who are expected to move the organization forward. The book identifies 7 best practices that the author has seen in common across all the successful transformations and cloud adoption journeys. They represent an incomplete, non-exhaustive, and opinionated list of what I think are some of the most important things you ought to be thinking about when leading your organization on any kind of change program. While technology is integral to each of these best practices, much of it comes down to leadership’s ability to lead, motivate, inspire, reorganize, and influence the people they work with. These best practices include:

  1. Provide Executive Support
  2. Educate Staff
  3. Create a culture of experimentation
  4. Pick the right partners
  5. Create a Cloud Center of Excellence / Platform Team
  6. Implement a Hybrid Architecture
  7. Implement a Cloud first strategy

Major Themes

  • I love his initial starting point, which only the cloud and DevOps can give you "I was in the right place at the right time—this time with just enough ambition and naiveté to think that I might be able to help the company address some of the issues I highlight above, and transform the role IT played in the company"
  • “cloud-first.” These organizations have reversed the burden of proof from “Why should we use cloud?” to “Why shouldn’t we use the cloud?” when implementing technology solutions for their business.
  • The importance of a CCoE or Platform Team - to create the best practices for the rest of the Feature teams
  • 2-pizza Feature teams that are wholly accountable for the service they provide to their customers. This includes what technologies are used, the service’s roadmap, and the service’s operations. The Platform team defines the reference architecture and pipelines as the guardrails, which allows the Feature teams to release changes as they like
  • The CIO is supposed to be the Chief Change Management Officer, who merges business and technology by considering each executive and how the cloud will affect them, and by breaking rules to take advantage of the cloud
  • Experimentation - the cloud is the perfect platform for experimentation, simply because of the range of options and ease of provisioning, which allows you to quickly experiment
  • Educate Staff - this is one of the main themes in the book, revolving around "you already have the people you need", as existing staff, with the institutional knowledge, can be effective if trained correctly. He mentions DevOps Days as a key element of sharing knowledge and onboarding new team members
  • The cloud allows you, and requires you, to make new rules and break existing rules that do not make sense anymore. I'd say this is similar to Goldratt's Theory of Constraints (TOC), which says the old system was governed by rules built around the old system's limitations, and to get the full benefit from the new system you need new rules rather than the old ones, as those were based on limitations of the old system.

Cloud Migration Strategies

Referred throughout the book as the 6 Rs, these represent the most common options you have when moving applications to the cloud:

  • Rehosting: lift-n-shift. Even though it's commonly known as "the same mess for less", because an application does not benefit from the cloud just because it's running in the cloud, the book makes the point that it's easier to optimise/rearchitect once it's in the cloud, so it's a worthwhile option to consider
  • Replatforming: lift-tinker-shift. This lies in between Rehosting and Refactoring, as you make a few changes, like swapping a DB for RDS to take advantage of some cloud capabilities, but leave the app mostly untouched
  • Repurchasing: moving off an on-prem product to a SaaS product like Salesforce, etc
  • Refactoring: re-architecting the app completely to take advantage of cloud features, scale and performance
  • Retire: decommission.
  • Retain: do nothing, for now.
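
As a rough illustration of how the six options can be used in practice (my own sketch, not something from the book), the strategies can be modelled as a small enumeration and each application in a portfolio tagged with one of them. The application names and the `portfolio` mapping below are made up purely for the example.

```python
from collections import Counter
from enum import Enum


class Strategy(Enum):
    """The six migration options, with a short reminder of each."""
    REHOST = "lift-n-shift, move as-is"
    REPLATFORM = "lift-tinker-shift, a few targeted changes"
    REPURCHASE = "replace with a SaaS product"
    REFACTOR = "re-architect for the cloud"
    RETIRE = "decommission"
    RETAIN = "do nothing, for now"


# Hypothetical portfolio: application name -> chosen strategy.
portfolio = {
    "intranet-wiki": Strategy.REHOST,
    "billing": Strategy.REPLATFORM,
    "crm": Strategy.REPURCHASE,
    "storefront": Strategy.REFACTOR,
    "legacy-fax-gateway": Strategy.RETIRE,
    "mainframe-ledger": Strategy.RETAIN,
    "reporting": Strategy.REHOST,
}

# Count how many applications fall under each strategy.
summary = Counter(strategy.name for strategy in portfolio.values())
for name, count in summary.most_common():
    print(f"{name:<11} {count}")
```

A tally like this is only a planning aid: it shows at a glance how much of the portfolio is being moved as-is versus reworked, which is the kind of conversation the 6 Rs are meant to start.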

DevOps = run-what-you-build

There are many definitions of DevOps, but key to it is the culture and mindset change, which is most aptly represented by the definition of "run-what-you-build":

IT’S AN ALL-TOO-COMMON SCENARIO: YOU’RE spending time with your family and your phone suddenly steals away your attention. The dreaded air horn alerts you to a SEV1 failure. Your application—one that periodically suffers from a memory leak that operations “fixed” by restarting it—is now exhausting server resources within minutes of coming online. The application is effectively unusable. The operations team isn’t equipped to do much other than restart or rollback, but the last good copy is months old. Who knows what else changed since then? It’s up to you to fix the leak, but you’re miles away from the office and your computer. Incidents like this are far too common in a traditional enterprise IT model, where development and operations sit on opposite sides of a wall. But it doesn’t have to be this way. DevOps is not just for startups. It can be used in the enterprise too. Like automation and customer service, “run what you build” can be an effective tenet for improving enterprise IT delivery using a DevOps model.

This is often where I see traditional IT become anxious. In a traditional IT model, the operations of an application or service are sometimes managed by those who weren’t involved in creating the asset. There were a number of reasons for doing this (e.g., lower-cost resourcing, centralized expertise), but I would argue these reasons are going away. Cloud technologies now handle much of the heavy operations, and much of the operations can be automated with software. Developers are familiar with software, which means there’s less motivation to separate the operations responsibility of any given task. This is where the term DevOps comes from, after all. Since developers will be the ones most intimately familiar with the nuances of the system, they will likely be able to address issues the fastest. And by using automation, it is easy to methodically propagate changes and roll back or address issues before they impact customers. I encourage centralized DevOps teams to do what they can to make development teams increasingly independent, and not be in the critical path for ongoing operations/releases.

Cloud helps you tear down this wall between Dev and Infra teams, because your infrastructure starts to look a lot like software. The API-driven nature allows you to treat your infrastructure as code, which is something developers understand. Now that everyone is much closer to the infrastructure, operations naturally becomes more of a key requirement.
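
To make the "infrastructure as code" idea concrete, here is a minimal Python sketch (my own illustration, not taken from the book). It assumes a hypothetical `Server` record and a hypothetical `plan` function, neither of which belongs to any real cloud provider's API: the desired environment is declared as data, and the code works out what would have to change to get there.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    """Hypothetical resource description; not tied to any provider's API."""
    name: str
    size: str


def plan(desired: list[Server], current: list[Server]) -> dict[str, list[str]]:
    """Compare the declared state with what is running and list the changes."""
    want = {s.name: s for s in desired}
    have = {s.name: s for s in current}
    return {
        "create": sorted(want.keys() - have.keys()),
        "delete": sorted(have.keys() - want.keys()),
        "resize": sorted(
            n for n in want.keys() & have.keys() if want[n].size != have[n].size
        ),
    }


# The desired state lives in version control, like any other code.
desired = [Server("web-1", "large"), Server("web-2", "large"), Server("db-1", "xlarge")]
# In a real setup this would be read from the provider's API; hard-coded here.
current = [Server("web-1", "small"), Server("db-1", "xlarge"), Server("batch-1", "small")]

changes = plan(desired, current)
print(changes)
# {'create': ['web-2'], 'delete': ['batch-1'], 'resize': ['web-1']}
```

Because the desired state is plain code, it can be reviewed, versioned and re-applied like any other change, which is what makes it familiar territory for developers.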

He then links it to a CCoE or Platform team, based on these 3 tenets:

  1. Be customer service-oriented, to prevent your customers from bypassing the platform and implementing Shadow IT. My experience has taught me that when someone finds an easier way to execute a task, they’ll likely take it. If they aren’t getting the service they need from IT, they’ll try to find that service elsewhere. The newsroom might download editing software because IT can’t or won’t deliver its own version fast enough. HR may look for a scheduling tool outside the internal calendaring environment. Marketing may go to a third party to have the brand’s website redone. The industry knows this as “Shadow IT,” which can make it much more difficult to effectively manage and secure a large IT environment. The reality, however, is that Shadow IT exists because internal stakeholders are simply not satisfied or don’t know how to get what they want from IT. A centralized DevOps organization that remains customer-service-centric has a much better chance of avoiding these scenarios. When you’re thinking about your customer from the very beginning, you will be able to empathize with their needs and concerns from the start, and determine how solutions to those needs fit into the overall company. Instead of saying, “You can’t use that to do your job,” ask “What are you trying to accomplish and how can I help you be more effective?” Every time an app team implements a workaround for something the DevOps team can’t deliver, there’s an opportunity for the organization to learn how and why that happened, and decide if they should do things differently moving forward. Is there something they can do to alleviate the need for a workaround in the future? The answer may very well be “no,” and in many cases, it will be okay to accept a workaround, but I encourage organizations to be deliberate about it. This is one of the ways IT can become an enabler rather than a point of friction. It is the kind of collaboration that incentivizes customers to work with you and not around you.
  2. Automate everything
  3. Run what you build

I've included below snippets of my favourite parts of the book:

Organizations move to the cloud for different reasons—some to save money, some to expand globally, some to improve their security posture, and some to improve agility. Through my experiences, I’ve found that companies begin to embrace cloud as a platform across the entire organization once they realize how it helps them devote more of their resources to what matters to the business. Those are the activities that matter most to your customers and your stakeholders. And unless you’re an infrastructure provider, these activities are not related to managing infrastructure.
When I was with Dow Jones, I was very anxious about explaining our plan to adopt DevOps and run-what-you-build to our auditors. This anxiety made us prepared, though the stress was somewhat misplaced. Once we illustrated that our controls were greatly improved because of the new rules we were employing around automation, our auditors became more comfortable with our future direction. By showing them early that we no longer had ownership spread across siloed teams sitting next to one another but communicating through tickets, and that the opportunity for human mistakes was much less, we were able to gain…
Every organization has its own way of determining which projects get technology resources. Unfortunately, some organizations now treat the technology or IT department as a cost center and have pushed ideation too far away from those implementing it.
The public cloud is not just another datacenter, so it shouldn’t be treated like one. We help shift our client’s thinking from “How do I replicate what I do in the datacenter?” to “How do I configure the appropriate infrastructure to enable my developers?” In the end, empowering developers with the tools they need—with the right overarching security and governance controls—helps our clients achieve even their loftiest cloud goals.
It’s hard for a CCoE to succeed if it’s not driven by strong leadership. Whenever I talk with executives about creating their CCoE, I encourage them to make bold moves. That means identifying the people best suited for the team, transferring them from their current roles without backfilling them, and shifting responsibility toward the CCoE so the vacant roles don’t matter. Reporting lines, on the other hand, do matter. It’s fine to put the CCoE in an infrastructure-focused organization, but make sure the leaders of that organization aren’t afraid of what the cloud might mean for them. There’s a good chance that as you grow your cloud capabilities and tip the balance to cloud-based solutions, your CCoE will be the dominant part of your infrastructure team. This requires strong leadership, air cover, and a willingness to continue to move resources into the team as you learn.
Companies embrace cloud technologies for many different reasons. Cloud adopters have benefited from increased agility, lower costs, and a global reach. For many of the CIOs I speak with, it really boils down to their ability to funnel precious resources from the things that don’t bring in business to the things that do. In other words, the undifferentiated heavy lifting associated with managing infrastructure to the activities associated with building the products and services that their brand is known for. Faster product development is often a key motivator for implementing centralized DevOps in the first place.

Regarding multi-cloud

What scares me is when companies fall into the trap of trying to architect a single application to work across multiple different cloud providers. I understand why engineers are attracted to this—it is quite an accomplishment to engineer the glue that is required to make different clouds work together. Unfortunately, this effort eats into the productivity gains that compelled the organization to the cloud in the first place. I always thought of this as taking me back to square one. Instead of managing your own infrastructure, you’re now managing the nuances between several others. Like Myth Two, this also limits the functionality to the lowest common denominator. I also understand that companies may go down this road to keep their vendors honest, and to avoid being locked into a single provider. On one hand, I would debate the risk of one of the big cloud providers going away, and it seems unlikely that the direction of the cloud computing industry is headed toward punitive business tactics. On the other hand, I feel there is a better way to mitigate this concern. Companies that architect their applications using known automation techniques will be able to reliably reproduce their environments. This best practice is what enables them to take advantage of the elastic properties of the cloud, and will decouple the application from the infrastructure. If done well, it becomes less of a burden to move to a different cloud provider if there is a compelling reason to do so. Technology choices are not always easy, and often imperfect. Creating a hybrid architecture doesn’t have to be. From a technology perspective, we viewed multi-cloud as a strategic hindrance. We felt that a multi-cloud approach would limit us to the lowest-common-denominator of features across clouds.

Regarding Hybrid Cloud

While most enterprises I speak with are in the process of migrating some or the entirety of their IT portfolio to the cloud, they’ve also realized the cloud is not an all-or-nothing value proposition. As each enterprise realizes this, they’re able to bridge their on-premises IT assets with the cloud and use that bridge to migrate the gravity of their IT portfolio to the cloud over time.
Procurement, legal, finance, business development, and product functions can all contribute to making cloud-first a reality. The more these departments know how to make cloud technology vendors work for them, and know why the organization is looking to leverage the cloud—to focus more on what matters to their business—the more active a role they can play in driving the organization to make cloud-first decisions. My team and I implemented a cloud-first policy whilst I was the CIO of Dow Jones, and one of the first things we did was to create an escalation path with our finance department to highlight any request for a hardware-related capital expense. Any department that felt it needed to procure hardware instead of leverage our cloud capabilities had to explain why they couldn’t accomplish what they were trying to do in the cloud before their purchase order would be approved. It didn’t take many escalations for everyone to understand how serious our intentions were. Over time, our legal, procurement, and product teams started asking similar questions.
Some organizations have business cases so compelling they don’t feel they need years of experience to be confident cloud-first is for them. A Fortune 100 enterprise I work with believes their developers will be at least 50 percent more productive when fully trained on and working in an AWS environment. This organization has more than 2,000 developers, and, as a result, will benefit from 1,000-plus man-years of additional development time per year as a result of their migration and cloud-first efforts. When it comes to making the business case for cloud, our first recommendation is to not limit the business case to Total Cost of Ownership (TCO). In fact, if at all possible, we recommend removing TCO from the discussion completely.
Don’t assume everyone is on the same page. Spend the time to make sure cloud is a companywide, company-owned initiative. If it’s viewed as a pure cost savings play or simply a technology initiative, you’re setting up for failure.
Significant top-down support with overwhelmingly strong vision and air cover—along with financial backing—are any technologists’ dream come true, but we know that doesn’t always happen. In absence of this, start somewhere, no matter how small.
But, unfortunately, like most technology leaders, there were many times, perhaps too many times, when I contemplated the possibility of a service failure. And when this happened, I found myself becoming slightly more risk-focused, even a bit risk-averse. It’s only natural. But when you let this fear grab hold of you, you tend to reduce your appetite for change. Of course, if we’re totally rational, the flaws in this thinking quickly surface, because the only constant in life is change—and wow, is it getting faster each day! I believe that the Digital Disruption and the Fourth Industrial Revolution we are now experiencing needs to be embraced, and embraced fully, or it will just consume your business. That’s why the choice for all of us, for every technology leader, is fairly simple—do nothing and fade away, or embrace the cloud change to survive and thrive. Ask yourself which choice holds the greater risk.
My last three years there were the most exciting, however. As UK CTO, I had the privilege of leading the people, process, and technology change as Capital One adopted AWS as its predominant platform. It was a career- and life-changing experience for me, although I’ll admit that at the start, when it was all new, it was a little scary, especially when we didn’t have much of a roadmap to guide us (AWS has now built one by partnering with companies like Capital One). We got through it, and, in the end, like so many of my peers, I learned that it’s only by accomplishing the difficult things that we find reward.
Now, back in 2014, I thought we had a good setup after we’d just finished building a converged on-premises infrastructure. Things were a bit faster, and there was some benefit. I thought we had bought some breathing space to grow. But, in fact, we hadn’t. The age-old truisms of running your own data center were still as rampant as ever. We were still dealing with—siloed technology, reliability challenges, forlorn automation dreams, hard-to-scale systems and little appetite to experiment when hardware was so capex-intensive and time-bound to procure.

Building closer relationships with the business

The following quotes are all regarding building closer relationships with the business, by building IT to be self-service, and embedding Feature Teams into the business:

I HAVE LONG SINCE FELT that the role of the CIO and central IT is moving away from command and control, and toward line-of-business enablement. I’m also seeing some organizations, like Amazon, which have taken this one step further in a move toward complete decentralization, where culture and best practice serve as the forcing function that allows teams to operate independently. This trend—trading off consistency for time-to-market—is an important one. I have been engaged in a fascinating discussion with Natty Gur, the CIO of The Friedkin Group, about their cultural transformation. Natty’s background in fighting terrorism has given him some interesting, and from my perspective unique, perspectives on this trend, which he graciously agreed to offer below.
I’m a great believer in the importance of devolution. As a CTO for a major utility, I’ve adopted a technology strategy centered around the devolution of IT out to the business—enabled by the wholesale adoption of cloud services…. utilizing standardization, infrastructure patterns, automation, and orchestration. Whilst this may be at odds with the traditional command-and-control approach of some CTOs, I believe it’s essential to ensure we offer our business the shortest possible path to achieving value through use of technology, and to ensure that the IT function remains relevant in our future organization.
One of the main reasons for public cloud transformation is to be more agile, and as a business to provide faster and more reliable services for your customer. But is that enough? Is it just technological transformation that is needed, or do you need to transform your culture as a business as well? In a previous life, I found myself providing IT services to one of the leading counter-terrorism organizations in the world. Just as I began my work with this organization, they began experiencing a wave of suicide bombers which they couldn’t stop, or even minimize. It took this organization considerable time before it realized the reason for their failure was a change within their enemy; their enemy’s structure had changed from a single, centralized group to thousands of unconnected terror cells, each with the same purpose: Destroy the organization’s country. Once this was understood, they made a unique decision: adopt the same structure and operation of their rivals; break the classical organizational silos down into small hybrid groups, each with the needed expertise from the old structure, in order to reach a clear purpose. These groups were also created to run with complete autonomy and full authorization to do what needs to be done in order to reach the whole group’s purpose. This change proved successful and the organization won the war, and while I was there, I learned a very important lesson.
Silos are IT’s worst enemy. I have seen it while working for IT organizations as well. While terror cells and startups are obviously very different, they do share some fundamental organizational characteristics. For instance, as small groups with limited resources, they tend to push their members to take responsibility for many aspects within the group, and are empowering group members to manage themselves. Another similarity between those two groups is the environment of trust between members. Members can make mistakes or bring crazy ideas with the knowledge that no one is going to penalize them. You may feel that this thinking is too extreme, but empowerment and trust are the two main principles behind the Toyota Production System, which has had such profound influence worldwide.
New ways of working with our business—the ability for the business to deliver IT solutions without the need for engagement or ownership by IT has totally changed the balance of the relationship. Business-led programs are ideal for ensuring continued business sponsorship, engagement and accountability.