It is important to evaluate cloud services before purchasing

Intended Audience: CxOs and other IT decision makers

Coffee with Steve

It was a Friday afternoon in late 2013 when we reached the CEO’s office at ABC corporation (name changed) to share what we had been championing, Cloud Evaluation as a Service, and to learn from their experience, as they had recently started using the cloud to increase the capacity of their data center.

Steve (name changed) is the Group CEO of ABC corporation, which has many verticals. The company had recently signed a big contract to deliver a year-long corporate training program, which required a significantly expanded data center to deliver the program efficiently. Instead of adding new physical capacity to their existing on-premises data center, Steve decided to use public cloud services from a major vendor.

Alex, an IT manager in Steve’s team, had prepared a detailed proposal on which Steve could base his decision about moving to a public cloud vendor. The proposal was built around several important requirements that need to be considered when selecting a public Cloud Service Provider (CSP). Let’s look at some of those requirements.

As many training modules were developed in .NET, they initially thought that one of the major cloud vendors supporting .NET would be the natural home for them. After some research they discovered that cloud services from this vendor were still maturing (remember, this was around December 2013); moreover, this vendor’s cloud support team could not respond convincingly to many of their queries. This prompted Alex to look at the services offered by another major CSP with a huge market share. This second vendor, being one of the leaders in cloud service offerings, had solutions for all the technical requirements of ABC corporation. The overall cost of computing resources also looked reasonable to Alex. Finally, they considered two other vendors, reviewed their capabilities and attempted to match them to their requirements and business priorities before making a final purchasing decision.

Based on the internal review and the proposal from Alex, Steve decided to start their journey into the cloud with one of the biggest CSPs in the business. What Steve didn’t realize was that while Alex had done a reasonable job of comparing CSPs, he had also relied on the old adage that ‘no one gets fired for recommending IBM’s servers’. Alex adapted that adage to the cloud, transforming it into ‘no one will get fired for recommending one of the leaders of Gartner’s Magic Quadrant for public cloud services’. ABC corporation expanded its capability through the cloud; delivery of training modules started for the major new contract, and Alex’s recommendation and efforts were very much appreciated.

Then came a bolt from the blue! The monthly bill for compute resources, which had been estimated at around $1500, was almost touching $2000, about 33% more than expected! This was a major crisis considering upcoming expansions, and Steve started wondering whether he had done the right thing in relying only on Alex’s internal recommendations while selecting the CSP.

Steve and Alex got into a detailed analysis of the billed items to see what had caused the surge and how they could manage these costs better.

Lessons Learnt

  1. Their analysis showed that the unplanned excess in billing was caused largely by their applications exceeding the allotted IOPS (I/O operations per second) thresholds. Their chosen CSP billed its customers for IOPS, even though IOPS is generally not under a customer’s control. Even plain vanilla instances with just the OS installed on them can generate significant IOPS (possibly triggered by OS updates etc.), thereby causing possible billing surges.
  2. Since the in-house IT team did not understand the cloud billing model well, they did not shut down the ‘not-in-use’ instances, incorrectly assuming that ‘not-in-use’ also implies ‘not-to-be-billed’.
  3. Again, due to a lack of experience with cloud storage, they were using the most expensive form of online storage available when most of their needs could have been met by less expensive archival storage.
  4. Steve concluded that he should have used a professional, unbiased evaluation agency to help him select the best CSP for his requirements, one that felt no need to stick to any adage!
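
To make the first two lessons concrete, here is a minimal Python sketch of the kind of bill breakdown Steve and Alex might have worked through. It is not the vendor's billing formula: the instance names, hourly rates, I/O allowance and per-million rate are all hypothetical, chosen only to show how idle-but-running instances and I/O overage inflate a bill.

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 730  # common approximation: 8760 hours / 12 months


@dataclass
class Instance:
    name: str              # hypothetical instance label
    hourly_rate: float     # hypothetical price per running hour, in dollars
    in_use: bool           # whether anyone is actually using the instance
    running: bool = True   # a running instance is billed even if not in use


def compute_cost(instances):
    """Monthly compute cost: every running instance is billed."""
    return sum(i.hourly_rate * HOURS_PER_MONTH for i in instances if i.running)


def idle_cost(instances):
    """Part of the compute cost spent on running but unused instances."""
    return sum(
        i.hourly_rate * HOURS_PER_MONTH
        for i in instances
        if i.running and not i.in_use
    )


def io_overage_cost(io_requests, included_requests, rate_per_million):
    """Charge for I/O requests beyond the included allowance (assumed model)."""
    excess = max(0, io_requests - included_requests)
    return excess / 1_000_000 * rate_per_million


def overrun_percent(estimated, actual):
    """How far the actual bill exceeds the estimate, as a percentage."""
    return (actual - estimated) / estimated * 100


if __name__ == "__main__":
    fleet = [
        Instance("training-app", hourly_rate=0.10, in_use=True),
        Instance("old-test-box", hourly_rate=0.10, in_use=False),
    ]
    print("compute:", compute_cost(fleet))
    print("of which idle:", idle_cost(fleet))
    print("I/O overage:", io_overage_cost(250_000_000, 100_000_000, 0.05))
    print("overrun %:", round(overrun_percent(1500, 2000), 1))
```

Running a breakdown like this against the first month's invoice, rather than against the estimate alone, is what exposes line items such as I/O charges that never appeared in the original proposal.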


An SOS from Adrian

It was a Sunday morning in the summer of 2014. I got an intriguing call from Adrian, the CEO of a successful enterprise. “How can you say that these cloud services are elastic? We have lost two days of data!” Adrian wanted an explanation for something I had no context for.

After calming him down and talking to him for a few minutes, I got the complete picture. Adrian was very upset that business-critical data captured by the thousands of sensors across different geographical regions of his business was not being recorded consistently by his application servers. Consistent recording was absolutely critical for his business analytics service to be useful to his customers.

The main issue was that their instances had exhausted the allotted quota on the Solid State Drive (SSD) storage, so the data received on Friday and Saturday evenings could not be saved. In fact, one particular week, the SSD limit was reached as early as Friday evening and the issue was discovered only on Saturday night, when the automated weekly summary of activities showed several error messages!

On top of that, when they contacted their cloud vendor for additional SSD storage space, they were told to move to a new instance with more SSD space in order to get contiguous space; the migration took extra time, making their downtime longer.

They had made their decision to go with this vendor based on the suggestion of their IT Manager, Samuel, who had recommended this particular vendor ONLY for its very simple and easy-to-understand pricing model. However, by not considering the many other factors that are critical in cloud purchasing decisions, they were running into unexpected issues.

Lessons Learnt

  1. Only one factor, the simplicity of the pricing model, was considered before selecting the cloud vendor, when other important factors like supportability, elasticity and ease of use should also have been considered.
  2. Notifications and messaging were not set up correctly for exception situations.
  3. They had incorrectly assumed that sufficient resources had been provisioned for both the development and production environments, which were deployed within the same cloud configuration.
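
The second lesson, missing notifications, is the easiest to illustrate. The following Python sketch checks how full a volume is and raises a warning before the limit is hit. It is only an assumption about how such a check could look: the 80% threshold and the `notify` hook are illustrative, and a real deployment would more likely rely on the cloud provider's own monitoring and alerting service.

```python
import shutil


def usage_fraction(used_bytes, total_bytes):
    """Fraction of the volume already consumed, between 0.0 and 1.0."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return used_bytes / total_bytes


def check_volume(path, warn_at=0.80, notify=print):
    """Call notify() if the volume holding `path` is at or above warn_at.

    warn_at and notify are illustrative assumptions; in practice notify
    would be wired to email, paging, or the provider's alerting service.
    Returns True when an alert was raised, False otherwise.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    fraction = usage_fraction(usage.used, usage.total)
    if fraction >= warn_at:
        notify(f"WARNING: {path} is {fraction:.0%} full")
        return True
    return False


if __name__ == "__main__":
    # Check the volume that holds the current directory.
    check_volume(".")
```

Scheduled every few minutes, a check like this would have flagged the shrinking SSD quota well before Friday evening, instead of leaving the problem to surface in a weekly summary.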

Summary

While most CxOs understand and follow extensive processes, including evaluation, when it comes to, say, hardware procurement, the required due diligence is often not done for cloud resource procurement, perhaps because of the belief that it would be easy to switch to another cloud if one does not work.

In Steve’s case, his team should have understood every aspect of the vendor’s billing process instead of being surprised after already committing their investment to that vendor. They should have realized that correcting cloud purchasing decisions can be expensive and that it’s better to get a professional evaluation done to minimize risk.

In Adrian’s case, some of their basic assumptions cost them dearly. The amount of storage required is a function of the rate at which data is captured and the number of days that data must be kept in online storage; it can’t be left to assumption. In this case, a cloud consultant, a cloud expert, or even a solution architect from the cloud service provider could have helped scope the requirements better.
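
The storage relationship described above is simple enough to write down. Here is a small Python sketch of it, under stated assumptions: the ingest rate is treated as constant, and the 25% default headroom is an arbitrary illustrative figure, not a recommendation from any provider.

```python
def required_storage_gb(gb_per_day, retention_days, headroom=0.25):
    """Online storage needed to hold retention_days of data, plus headroom.

    headroom is an assumed safety margin (0.25 means 25% extra).
    """
    if gb_per_day < 0 or retention_days < 0 or headroom < 0:
        raise ValueError("inputs must be non-negative")
    return gb_per_day * retention_days * (1 + headroom)


def days_until_full(capacity_gb, used_gb, gb_per_day):
    """Days of ingest left before the volume fills at the current rate."""
    if gb_per_day <= 0:
        return float("inf")  # nothing is being written, so it never fills
    return max(0.0, (capacity_gb - used_gb) / gb_per_day)


if __name__ == "__main__":
    # Hypothetical figures: 10 GB of sensor data per day, kept for 30 days.
    print(required_storage_gb(10, 30))
    # Hypothetical volume: 100 GB capacity, 60 GB used, 20 GB arriving daily.
    print(days_until_full(100, 60, 20))
```

Even a rough calculation like this, done with measured capture rates before purchase, turns the storage quota from an assumption into a number that can be checked against what each vendor offers.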

Corollary

  • Steve decided to migrate to another public cloud service provider to rein in operating costs, and moved out of their first “cloud home” within one year.
  • Adrian separated their deployment environments and enabled important monitoring and alert features from day one on the new deployment platform.


Notes:
These are real case studies. The names of the people and the companies have been altered at their request.