Welcome to the Modern Data Experience

Well fellow Sumos, it is time for another professional change. I have always said that the simplest solution always has the lowest TCO. To that end, I have always admired Pure Storage’s approach to simplifying data consumption.

Storage is an essential component for serving apps, yet most IT pros treat it like a necessary evil. I have met thousands of customers over the past 16 years and not a single one ever said anything like “I can’t wait to get back to my desk and use my storage array!”

Data should be easy to consume. The old infrastructure mafia failed miserably to provide a consumer-like user experience. Furthermore, these stuffy storage overlords want customers to pay extra for the extortion practices that keep them in business.

What I like about Pure is that they took a problem that desperately needed solving (deliver a modern data experience) and made it comically simple.

Since that first clean all-flash message, Pure has evolved into a subscription company, setting the standard for fair and transparent business practices with Evergreen.  Then Pure progressed into a predictive support company with Pure1, and in so doing, set a new bar for customer satisfaction with an NPS of 82. Next, Pure became a cloud services company with Cloud Block Storage and an analytics company with AIRI.

I look forward to taking you with me on this journey to the future. The evolution continues!

Cohesity Support Scores a Perfect 100 NPS

Calling tech support can be a lot like shopping at your only grocery store after it has initiated an aggressive stop-and-frisk policy. You may have come for food but an enthusiastic pat-down is definitely going to happen.

Think about calling your local cable company for help: most people would rather shave their head with a cheese grater. I know I will certainly google-fu my issues before picking up the phone to call a vendor.

No product is impervious to problems. Customers understand this but when things go wrong, they expect snappy expertise from the vendor’s support to come to the rescue. Support is a critical aspect of technology consumption and largely determines an overall experience, yet it rarely gets seriously considered when evaluating new technology.

Typically, tech support does not meet customers’ reasonable expectations. As a result, organizations suffer frustrations or perhaps even a disruption to business processes. That’s what makes Cohesity’s accomplishment so impressive. Cohesity recently reported a perfect 100 Net Promoter Score, or “NPS” for short.

NPS is quickly becoming a notable standard in customer satisfaction survey/rating systems and is not limited to support or even technology. Historically, NPS has been used internally by some larger companies to help them understand how they are perceived by customers and to gauge general satisfaction levels.

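Basically, customers are asked a single question: “How likely is it that you would recommend this company to a friend or colleague?” On a scale of 0-10, customers must give a rating of 9 or 10 to be considered a “promoter.” Ratings of 0-6 count as “detractors,” and the score is the percentage of promoters minus the percentage of detractors.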

The scoring system is not 0-100 as common sense would suggest, but rather -100 to +100. A score of 0 would be neutral, with promoters and detractors canceling each other out, which would not be very good. Most blue-chip tech companies range from the 20s to the 30s. Apple currently has an NPS of 72, which is considered outstanding, whereas Dell scored a 33.
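For the spreadsheet-averse, the arithmetic fits in a few lines of Python. This is only a sketch of the standard NPS formula, assuming the usual 0-10 survey where 9-10 is a promoter and 0-6 is a detractor. The function name and the sample ratings are made up for illustration.

```python
def net_promoter_score(ratings):
    """Return the NPS (-100 to +100) for a list of 0-10 survey ratings."""
    if not ratings:
        raise ValueError("need at least one rating")
    promoters = sum(1 for r in ratings if r >= 9)    # 9s and 10s
    detractors = sum(1 for r in ratings if r <= 6)   # 0 through 6
    # Passives (7s and 8s) count toward the total but not for or against.
    return round(100 * (promoters - detractors) / len(ratings))


# Hypothetical survey results, for illustration only.
print(net_promoter_score([10, 9, 10, 9, 10]))   # every respondent is a promoter
print(net_promoter_score([9, 10, 7, 8, 3]))     # a more typical mix
```

Notice that a single 8 is enough to spoil a perfect score, which is why a 100 requires every respondent to answer 9 or 10.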

At my last gig, we bragged about our stellar NPS of 85, and I’ve heard a few other young tech companies use NPS in their pitch with ratings in the upper 80s. Younger, smaller and more innovative tech companies often have much better product support ratings. This is largely due to:

  • A Fresh Approach

Established tech companies are often handcuffed to older systems with a large customer base, unable to start over and disrupt their legacy customers. They rely on older infrastructure that is built on manual, reactive processes. Starting fresh, companies can build the support processes right into the products with advanced technology.

  • Technological Advancement

Forward-thinking tech companies build the support infrastructure upfront (remote sensors, predictive AI, automation) to be more responsive to problems, often proactively detecting (or even resolving) the majority of support cases. Due to advanced automation and efficiencies, these new companies can often do away with tiered support models that delay resolutions and frustrate customers.

  • Single-Product Focus

Often, younger innovative tech companies have only one product to support, so skill, investments and expertise are centered around a singular area. For example, a customer’s first call into support at Cohesity is picked up by a level-3 engineer (in less than two minutes on average) who has advanced expertise and typically resolves even sticky issues on the initial call.

Cohesity’s support has consistently had a score of 90+, which basically means we do not have unhappy customers. A perfect 100 means that Cohesity customers were not simply satisfied, they were ecstatic. ALL of them. That is ridiculously difficult to do.

Congratulations to all our Site Reliability Engineers (SREs) for reaching this achievement! Statistically, we are unlikely to maintain that forever but, WOW. Cohesity is clearly providing a radically differentiated support experience vs. our contemporaries.

You can read more about Cohesity’s NPS score here.

The Challenge of Ransomware Demands a “Cohesive” Approach

Finding out your data center has been infected by ransomware is sort of like finding out your mother has been dating Mötley Crüe rock drummer Tommy Lee: you know some terrible things have already happened and you’re going to have a mess to clean up. It might seem like something that happens only to other people, but statistically it is likely to happen to you. StorageSumo is here to help.

The problem is so much worse than the average consumer knows. Let’s peel back some layers of this stinky onion and discuss how modern backups with Cohesity can give your IT staff an unfair advantage against cyberthreats like ransomware.

I’m going to address some basics including:

  • What ransomware is
  • What the true cost is to you and your organization
  • How ransomware enters your datacenter
  • How to prevent, detect and recover from a ransomware attack.

Sadly, very little is known about this particularly insidious form of cybercrime. I suspect this is mostly because organizations are highly incentivized to minimize bad publicity, so the majority of incidents go unknown by the general population. Organizations also generally feel like their current backups act as an insurance policy that will effectively recover from a ransomware attack, so “we’re good.” Human nature is to avoid the most unpleasant aspects of life, no matter how likely.

What is Ransomware?

The typical answer goes something like this: Ransomware is an especially sinister strain of malware. Simply put, once your system is infected, ransomware holds your data hostage by encrypting the files, rendering them illegible and unusable until a ransom is paid. While this is probably the answer you were expecting, it is not the accurate answer.

The correct answer? Ransomware is a business.  Ransomware was not designed by anarchists for the purpose of sabotage. Data destruction is not the end goal. The business of ransomware is to be lucrative, which means getting customers to pay the ransom. To achieve this, an effective ransomware attack must make the payment as user-friendly as possible and also eliminate all possibility of recovery without paying. Let’s take a look at the payment instructions from an example of ransomware called “WannaCry:”

Notice the simplicity, very similar to other modern, simple software designs: the local language, the clarity of the instructions and helpful links to find more info. Cybercriminals want you to pay the ransom, and the easier they make it to pay, the better. This also means the ransom is typically affordable. It has to make actual sense to pay up. When the ransom is paid, victims will get most of their data back most of the time but, as you are about to see, the process and pain involved in recovering from ransomware go far beyond the amount of the ransom.

The True Costs of Ransomware

Gather round, it’s sumo story time. The full cost of a ransomware attack is not easy to calculate.  In one particular instance, I had a retail customer that was the victim of phishing (which we will discuss further). As a result, approximately 200 Windows servers were encrypted, including the backup server. This nightmare began when the backups IT thought would save them from cyberthreats actually became a liability.

The lost data meant that business systems were offline and files were illegible. With no access to critical systems, the employees were sent home. They were no longer able to accept new orders or fulfill existing orders and lost all access to the customer records system. This lasted over 5 days, which meant customers were forced to pivot to alternate suppliers.

Even the decision to pay the ransom is a pain point. The FBI encourages victims not to pay the ransom, which would ideally discourage future criminals. Most would agree with this stance. After all, no one wants to reward a cybercriminal that successfully attacked their organization, and there is no guarantee that victims will actually recover their data. This is a noble thought, but remember that ransomware is a business, and the requested ransom typically ranges from a few hundred dollars to several thousand dollars. These amounts are low enough that, in most cases, it actually makes good business sense to pay the ransom. Also, WannaCry is a relatively unsophisticated variant compared to newer strains that are actually starting to threaten to leak customers’ sensitive data.

In our example, the pain was so severe that this organization easily decided to pay the ransom ($17K), which represented only a few hours of lost profits. After an uncomfortable discussion with the company’s leadership and finance team, another unforeseen obstacle was the payment currency. Cybercriminals don’t accept purchase orders and they don’t offer net-30-day payment terms. This organization did not have a corporate Bitcoin wallet. Eventually a third-party consultant was engaged to pay the ransom and recover the data, but this delay cost the company valuable time.

Surprisingly, most of the time customers do get the decryption keys to “unlock” their data once the ransom is paid. Again, it’s a business. If no one ever got their data back, ransomware would not be effective. Unfortunately, this particular customer was emailed a spreadsheet with 200 unmarked decryption keys. Imagine being handed a bag of 200 unmarked door keys. There are 200 doors, each with a matching key, and your job is to match all 200 doors with their keys. As you can imagine, this would take some time, even with a large staff and a shared Google Sheet. Also remember that there’s a time bomb strapped to the data. After the first three days, the price goes up, and after seven days, the data is gone forever. This fun little detail is just another tactic to remind victims to pay, and to do it quickly.

Even when a matching decryption key was found, the IT staff noticed that many of the recovered servers would crash midway through the decryption process. The encrypted files were actually not decrypted “in-place,” but rather copied, which doubled the data capacity consumed. Servers below 50% disk utilization decrypted OK, but many were above 50% and required a game of musical chairs with the underlying storage system in order to accommodate the unexpected increase in disk usage.
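To make the musical chairs concrete, here is a rough sketch of the pre-flight check that would have avoided some of those crashes. It assumes, as in the story above, that decrypting by copy temporarily needs about as much free space as the data already on disk. The server names and sizes are invented for illustration.

```python
def has_room_to_decrypt(used_gb, total_gb):
    """True if a decrypt-by-copy (roughly doubling used space) fits on the disk."""
    return used_gb * 2 < total_gb


def triage(servers):
    """Split (name, used_gb, total_gb) tuples into ready vs. needs-more-disk lists."""
    ready, needs_space = [], []
    for name, used_gb, total_gb in servers:
        if has_room_to_decrypt(used_gb, total_gb):
            ready.append(name)
        else:
            needs_space.append(name)
    return ready, needs_space


# Hypothetical inventory: (server name, used GB, total GB).
inventory = [
    ("web01", 120, 500),
    ("sql01", 400, 500),
    ("file01", 260, 500),
]
ready, needs_space = triage(inventory)
print("decrypt now:", ready)
print("add capacity first:", needs_space)
```

Running a check like this before starting tells you which servers need extra capacity first, rather than finding out halfway through a decryption.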

In the end, approximately 7% of the servers were unable to be recovered because keys were not found, there were disk capacity issues, or the customer simply ran out of time.

When most of the systems were eventually recovered, the company was no longer the supplier for its previous customers. The loss of revenue and productivity was obvious, but the organization couldn’t foresee the lasting loss of credibility.

Employee morale was also noticeably lower. The IT staff had simply lost all credibility with their peers. The workforce had assumed that IT was protecting them from such an incident. A reputation is a fickle thing. Think about giving your money to Bernie Madoff for a new investment. That sounds crazy, but is it any more irrational than trusting your data with a staff that just lost it?

How Ransomware Enters Your Data Center

The entry point can vary but there are two primary sources: 1) user action and 2) system vulnerabilities. Until the Borg entirely lobotomizes all humans, ransomware will remain a source of pain. An infected email attachment or link is the most frequent source, so a good email scanning/filtering system is essential, but real people will still show up with their own infected devices, they will still plug in USB flash drives loaded with malware, and they will still click on infected attachments.

Forrester says approximately 18% of attacks come from phishing. This technique involves social engineering, which tricks users into thinking they should enter/change their password or account info. This is really a challenge of educating your users, but this brings us to an important first point – no filtration system will prevent 100% of incidents because humans are a vulnerability. Cybercriminals can target users with pinpoint precision and leverage a user’s own access and knowledge to infiltrate a network. This strategic, intentional, customized version of phishing even has a sub-category, aptly called spear-phishing. No matter how impenetrable your firewall may seem, there is still a human element involved.

The other source of ransomware is through system vulnerabilities. Most are software-based vulnerabilities such as RDP (remote desktop protocol) that live within a popular operating system, such as Windows. Brute-force attacks have successfully targeted Windows desktops and servers with RDP enabled, allowing a cybercriminal to have full control of a system on your network. Occasionally there are hardware vulnerabilities exposed such as the Intel processor-based Spectre/Meltdown just to keep things spicy. 

Another software vulnerability is SMB (Server Message Block, aka “CIFS”), the Windows file-sharing protocol, which enables ransomware to encrypt file shares as well as spread throughout the network like a virus, encrypting more desktops and servers. This means corporate network file shares are fish in a barrel for ransomware, both because the attack surface is enormous and because it is especially difficult to enforce access controls against an attack.

Preventing Ransomware – Modern Threats Require Modern Backups

Any good ransomware plan needs to start with a strong backup strategy. Firewalls are great but, as the last line of defense, your backups might be the only thing standing between you and the torture chamber described thus far. Unfortunately, most IT organizations rely on backup systems that are just as vulnerable as the rest of the servers. 

Thieves know that backups are an organization’s only chance, so backups are the first thing cybercriminals target. Most backup applications are built on Windows servers, which are vulnerable to the same open liabilities that allow hackers to access any other systems. Backups also typically use network-attached storage (NAS) for repository/backup disk capacity, which, as previously discussed, is a tasty target for criminals to inflict their encryption pain.

These two components of most backup solutions (Windows operating systems and network file shares) are no longer protecting customers, but rather have turned into liabilities that leave organizations exposed to being attacked on the very systems they count on to save them from these types of threats.

Furthermore, these types of older backup solutions are designed to restore a single item. In the case of ransomware, it is likely that EVERYTHING needs to be restored. This would typically take weeks, assuming the backups were not also encrypted.  Also, what defense has been implemented to prevent an immediate repeat incident?

Cohesity Offers a New Approach

In this age of widespread ransomware attacks, organizations need a new type of backup architecture to address this modern threat. A new type of data protection that is designed to address ransomware would:

  • Provide air-gapped immutability from corruption
  • Detect the likelihood of ransomware and alert users
  • Offer large-scale instant recovery from incidents

Designed in the modern era, Cohesity was specifically architected to provide secure protection from modern cyberthreats such as ransomware. Trigger-warning: things are about to get nerdy.

First, Cohesity prevents ransomware. Cohesity is a hyper-converged platform, which means the compute, storage and software are tightly coupled in a “node” architecture that scales out. The fact that storage is entirely integrated means there are some magical automated protections built right in that create immutability.

After Cohesity creates a backup, the file system (SpanFS) immediately creates a “snapshot.” This snapshot copy is kept offline and NEVER exposed back to the network. Even when backup data need to be accessed, Cohesity creates a “clone” of the snapshot and uses that copy rather than the original, just in case a crafty hacker is waiting to attack. The important point to remember is there is always a gold copy kept securely offline.  This process happens completely automatically and illustrates a fundamental advantage of having vertically integrated software and hardware.

Because the software is tightly integrated, Cohesity also has controls in place to block unauthorized access, including multi-factor authentication.

In a worst-case scenario, human error or a particularly devious act of sabotage could result in deleted backups. That is why Cohesity developed DataLock™, a feature that makes data non-deletable, even by super-users, until its predefined retention period expires.
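The idea of a retention lock is easy to picture in code. What follows is a toy model only, not Cohesity’s implementation or API. The class, method and exception names are hypothetical, and the point is simply that the delete path never consults the caller’s privileges.

```python
from datetime import datetime, timedelta


class RetentionLockError(Exception):
    """Raised when someone tries to delete a backup that is still locked."""


class LockedBackup:
    """Toy model of a backup that cannot be deleted before its retention ends."""

    def __init__(self, name, created_at, retention_days):
        self.name = name
        self.locked_until = created_at + timedelta(days=retention_days)
        self.deleted = False

    def delete(self, now, is_superuser=False):
        # is_superuser is deliberately ignored: the lock applies to everyone.
        if now < self.locked_until:
            raise RetentionLockError(
                f"{self.name} is locked until {self.locked_until:%Y-%m-%d}"
            )
        self.deleted = True


# Hypothetical backup with a 30-day retention period.
created = datetime(2024, 1, 1)
backup = LockedBackup("nightly-sql01", created, retention_days=30)
try:
    backup.delete(now=created + timedelta(days=5), is_superuser=True)
except RetentionLockError as err:
    print("refused:", err)
```

Once the retention period has passed, the same call succeeds, so normal expiry and cleanup still work.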

Second, Cohesity has an AI-based detection feature that scans for anomalies, such as change rates, encryption rates and mass deletes. This creates an entropy “score.” If it is determined that a customer’s data has possibly been victimized by ransomware, Cohesity alerts customers so that they can take action immediately before more business systems are affected.
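For the curious, the textbook way to put a number on “this looks encrypted” is Shannon entropy, measured in bits per byte. The sketch below shows that general technique only. It is not Cohesity’s detection algorithm, and the function names and the 7.5 threshold are assumptions chosen for illustration.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte: 0.0 for constant data, up to 8.0 for uniform bytes."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def looks_encrypted(data: bytes, threshold: float = 7.5) -> bool:
    """Flag data whose entropy is close to the 8 bits/byte maximum (assumed threshold)."""
    return shannon_entropy(data) >= threshold


plain = b"order 1001: 12 widgets shipped to customer 42\n" * 50
scrambled = bytes(range(256)) * 16  # stand-in for encrypted-looking data

print(round(shannon_entropy(plain), 2), looks_encrypted(plain))
print(round(shannon_entropy(scrambled), 2), looks_encrypted(scrambled))
```

Compressed files also score high, so entropy on its own produces false alarms. That is one reason a detector would look at it alongside signals like change rates and mass deletes.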

Third, Cohesity can recover from a ransomware incident with a feature aptly named Instant Mass Restore. This is crucial because in the unfortunate event of a ransomware attack, it’s likely most if not all of the organization’s systems were affected, and thus all need to be restored to a point before ransomware encrypted the files. Instant Mass Restore allows your organization to instantly recover all your servers, databases and files to a granular restore point just before ransomware infected your data center.

Conclusion

Cybersecurity has traditionally been network-based. Network security is important, but networks can only be babyproofed so much without severely degrading the user experience or obstructing productivity. A more modern approach can allow for easy protection from ransomware without locking down the user.

What if your organization was hit by ransomware but didn’t have to worry because your IT team could instantly restore EVERYTHING from an immutable copy? Network security will never prevent 100% of threats. It’s time for the backup and network teams to get more cohesive (insert wink emoji-face) with Cohesity.

Chris Colotti (@ccolotti) giving an in-depth tour of the CohesityonWheels Unstoppable Truck Roadshow

What is Secondary Storage?

Why does the universe exist?  Do we have free will?  Coke or Pepsi?  What is secondary storage?

These are a few of the tough questions we at StorageSumo are here to answer for you.  As discussed in my first post (Greener Pastures), @BTNimble and I have recently gone green and we are now working for Cohesity, a provider of secondary storage solutions.  Secondary stuff is awesome!  If you’re like me, you might have a secondary fridge, a secondary car, and even a secondary child.  We all know secondary stuff is great but what about secondary storage?

I suppose the easiest (and probably laziest) definition is, “Everything that is not primary storage.”  Primary storage represents approximately 20% of the overall data center capacity.  At our previous job, Bryan and I provided high-performance primary storage.  We were also peers with similar vendors that offered storage and hyperconverged appliances for customers looking to store primary production data and applications that make up this 20%.

Primary Apps?  Like Candy Crush?

Not exactly.  These applications were typically systems of record.  In every organization I have consulted with, there is invariably a database containing critical information such as customer records, order management, student enrollment and so on.  These systems of record are almost always the most vital assets the organization possesses and are serviced with maximum care.  These types of workloads are very performance-sensitive and if they aren’t snappy, the organization suffers tremendously from lost productivity, possibly even lost availability, and worse: grumpy employees standing over your desk asking if you tried turning it off and on again.

If ANY impedance to data delivery of this type occurs, the organization suffers.  If these records were to somehow vanish, the organization might just as well not exist.  THIS is obviously primary data meant for primary storage, not only because it is essential but also because it has a strict performance SLA (service level agreement).

Backup and Archival

Now we’re getting to some secondary stuff!  For many organizations, the largest amount of storage capacity is dedicated for backup and archival retention of data. This gives IT a time machine to be able to recover from any incident that might impede the organization’s data availability or integrity.  These incidents could include (but are not limited to):

  • Physical hardware failure
  • Data corruption
  • Human error
  • Ransomware attacks
  • Sabotage
  • Lost or misplaced files
  • Site-Disasters (fire/theft/flood/Chernobyl)

If none of these things have ever happened to your organization, you are very lucky and I hate you.  More likely, multiple variations of these incidents have happened to your organization multiple times.  In a best-case scenario, your IT staff was prepared and able to recover quickly.  In either case, we’ll look closer at this dark little corner later in another post but for now, just know that backups require tertiary storage that is “air-gapped” from the primary storage system, and lots of it.

While backups and archives are incredibly important, by their very nature these are a second (or more) copy of the primary storage.  This is by design so that there is physical separation from the primary data, a so-called “air-gap.”  In fact, a good backup strategy will include multiple copies for reasons such as multiple recovery-points (versioning), and also copies offsite to preserve data in the event of a site disaster.

For the purpose of defining secondary storage, remember that backups are not SLA-driven in the same way primary data is.  Sure, it’s vital that the backups are complete, reliable and finished within a certain window of time, but this type of workload is more throughput-dependent than latency-sensitive.  In other words, backup jobs are sort of like moving gravel with dump trucks.  There’s a lot of stuff to move and it is important stuff, but it wouldn’t make sense to use a Lamborghini and no one will complain if it takes a little longer to get there.
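The dump-truck math is worth a quick back-of-the-envelope sketch.  The function name and the numbers below are hypothetical, and it assumes decimal units (1 TB = 1,000 GB) and a throughput that holds steady for the whole job.

```python
def backup_window_hours(data_tb, throughput_gb_per_s):
    """Hours needed to move data_tb terabytes at a sustained throughput (decimal units)."""
    if throughput_gb_per_s <= 0:
        raise ValueError("throughput must be positive")
    seconds = data_tb * 1000 / throughput_gb_per_s
    return seconds / 3600


# Hypothetical nightly job: 36 TB to protect at three different throughputs.
for rate in (0.5, 1.0, 2.0):
    print(f"{rate} GB/s -> {backup_window_hours(36, rate):.1f} hours")
```

Doubling the throughput halves the window, while shaving a millisecond off each individual I/O barely registers, which is exactly why backup targets are sized for bandwidth rather than latency.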

Other Examples

We’ve established that backups are one clear way we make copies of primary storage, an obvious use of secondary storage.  What else is there?  Well, there are actually tons of use cases.  Take testing and development, or “Test/Dev” as the cool kids say.

To paraphrase Murphy’s law, what could go wrong usually does.  This is why organizations would prefer to test changes, such as updates or upgrades, in a “safe-zone.”  IT often keeps a duplicate (or at least similar) environment of their primary data and applications in a test/dev silo.  In many cases, organizations will try to repurpose older gear to save money, but this often still requires doubling infrastructure costs.

But what about my PowerPoint proposal for television in the men’s restroom?

Don’t worry.  That and the rest of your documents are important too.  Interestingly, secondary data also includes users’ data such as documents, spreadsheets, presentation files and even pictures or videos.  This “bulk” data is typically stored on a network-attached storage (NAS) device.  This is so the user files can be efficiently and securely managed centrally and persist even when a desktop is replaced or an employee quits.  This file/NAS storage is also critical to the application’s success and the users depend on this to work, but this data does not have a strict performance requirement.  This type of file/NAS storage is also an excellent example of secondary storage.

Why Cohesity is built for Secondary Storage

Secondary data is actually many, MANY times larger than primary data capacity.  By even conservative estimates, secondary data comprises >80% of the overall data capacity.  Storage in the data center actually maps nicely to an iceberg: 80% or more is actually below the surface.  Most IT leaders will admit that they have 6-8 copies of their data for various reasons.  Primary storage is for apps with strict performance SLAs and secondary storage is for apps without strict SLAs.

Mass Data Fragmentation

Cohesity was founded to solve the problem of mass data fragmentation.  While primary storage vendors have made tremendous progress consolidating that 20% of data center workloads, Cohesity is purpose-built to consolidate the much larger 80% of data that is considered secondary.  This 80% is typically scattered across multiple siloed environments.  By consolidating backups, archival data, file shares, test/dev and analytics into a single web-scale platform, Cohesity customers are able to reduce the number of physical data copies, vendors, support renewals and management interfaces down to one.  This makes managing data radically simpler, and I firmly believe that the simplest solution always has the lowest total cost.

There you have it!  Next post, we’ll tackle another one of life’s deep mysteries but for now we can close the books on this one.  Stay tuned for more storage ramblings as well as details around Cohesity’s DataPlatform®.  Until then, au revoir!