Well, fellow Sumos, it is time for another professional change. I have always said that the simplest solution always has the lowest TCO. To that end, I have always admired Pure Storage’s approach to simplifying data consumption.
Storage is an essential component for serving apps, yet most IT pros treat it like a necessary evil. I have met thousands of customers over the past 16 years and not a single one ever said anything like “I can’t wait to get back to my desk and use my storage array!”
Data should be easy to consume. The old infrastructure mafia failed miserably to provide a consumer-like user experience. Furthermore, these stuffy storage overlords want customers to pay extra for the extortion practices that keep them in business.
What I like about Pure is that they took a problem that desperately needed solving (deliver a modern data experience) and made it comically simple.
Since that first clean all-flash message, Pure has evolved into a subscription company, setting the standard for fair and transparent business practices with Evergreen. Then Pure progressed into a predictive support company with Pure1, and in so doing, set a new bar for customer satisfaction with an NPS of 82. Next, Pure became a cloud services company with Cloud Block Storage and an analytics company with AIRI.
I look forward to taking you with me on this journey to the future. The evolution continues!
Calling tech support can
be a lot like shopping at your only grocery store after it has initiated an
aggressive stop-and-frisk policy. You may have come for food but an
enthusiastic pat-down is definitely going to happen.
Think about calling
your local cable company for help. Most people would rather shave their head
with a cheese grater. I know I will certainly google-fu my issues before
picking up the phone to call a vendor.
No product is
impervious to problems. Customers understand this but when things go wrong,
they expect snappy expertise from the vendor’s support to come to the rescue. Support
is a critical aspect of technology consumption and largely determines an
overall experience, yet it rarely gets seriously considered when evaluating new
Typically, tech support does not meet customers’ reasonable expectations. As a result, organizations suffer frustrations or perhaps even a disruption to business processes. That’s what makes Cohesity’s accomplishment so impressive. Cohesity recently reported a perfect 100 Net Promoter Score, or “NPS” for short.
NPS is quickly
becoming a notable standard in customer satisfaction survey/rating systems and
is not limited to support or even technology. Historically, NPS has been
used internally by some larger companies to help them understand how they are
perceived by customers and to gauge general satisfaction levels.
Customers are asked a single question: “How likely is it that you would recommend this
company to a friend or colleague?” On a scale of 0-10, customers must give a
9 or 10 to be considered a “promoter.”
The scoring system is not 0-100 as common sense would suggest, but rather -100 to +100. A score of 0 would be neutral, meaning promoters and detractors cancel each other out, which would not be very good. Most blue-chip tech companies range from the 20s to the 30s. Apple currently has an NPS of 72, which is considered outstanding, whereas Dell scored a 33.
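For the curious, here is a minimal sketch of how that -100 to +100 number falls out of the survey answers. It assumes the standard NPS convention, which goes a step beyond what I described above: ratings of 0-6 count as “detractors,” 7-8 are “passives” that count for nothing, and the score is the percentage of promoters minus the percentage of detractors. The function name and the sample ratings are made up for illustration.

```python
def net_promoter_score(ratings):
    """Return the NPS (-100 to +100) for a list of 0-10 survey ratings."""
    if not ratings:
        raise ValueError("at least one rating is required")
    # Promoters answered 9 or 10; detractors answered 0 through 6.
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    # Passives (7 or 8) only appear in the denominator.
    return round(100 * (promoters - detractors) / len(ratings))


# Hypothetical survey: 6 promoters, 2 passives, 2 detractors.
sample = [10, 9, 9, 8, 7, 6, 10, 3, 9, 10]
print(net_promoter_score(sample))  # 60% promoters - 20% detractors
```

Notice that a perfect 100 requires every single respondent to answer 9 or 10. One lukewarm 8 is enough to break it.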
At my last gig, we bragged about our stellar NPS of 85, and I’ve heard a few other young tech companies use NPS in their pitch with ratings in the upper 80s. Younger, smaller and more innovative tech companies often have much better product support ratings. This is largely due to:
A Fresh Approach
Established tech companies are often handcuffed to older systems
with a large customer base, unable to start over and disrupt their legacy customers.
They rely on older infrastructure that is built on manual, reactive processes.
Starting fresh, companies can build the support processes right into the
products with advanced technology.
Forward-thinking tech companies build the support infrastructure
upfront (remote sensors, predictive AI, automation), to be more responsive to
problems, often proactively detecting (or even resolving) the majority of
support cases. Due to advanced automation and efficiencies, these new companies
can often do away with tiered support models that delay resolutions and
Often, younger innovative tech companies have only
one product to support, so skill, investments and expertise are centered around
a singular area. For example, a customer’s first call into support at Cohesity
is picked up by a level-3 engineer (in less than two minutes on average) who
will have advanced expertise and typically resolve even sticky issues on the
Cohesity’s support has consistently had a score of 90+, which basically means
we do not have unhappy customers. A perfect 100 means that Cohesity
customers were not simply satisfied, they were ecstatic. ALL of them. That
is ridiculously difficult to do.
Congratulations to all our Site Reliability Engineers (SREs) for reaching this achievement! Statistically, we are unlikely to maintain that forever but, WOW. Cohesity is clearly providing a radically differentiated support experience vs. our contemporaries.
You can read more about Cohesity’s NPS score here.
Finding out your data center has been infected by ransomware is sort of like finding out your mother has been dating Mötley Crüe rock drummer Tommy Lee: you know some terrible things have already happened and you’re going to have a mess to clean up. It might seem like something that happens only to other people but statistically it is likely to happen to you. StorageSumo is here to help.
The problem is so much worse than the average consumer
knows. Let’s peel back some layers of this stinky onion and discuss how modern
backups with Cohesity can give your IT staff an unfair advantage against
cyberthreats like ransomware.
I’m going to address some basics including:
What ransomware is
What the true cost is to you and your
How ransomware enters your datacenter
How to prevent, detect and recover from a
Sadly, very little is known about this particularly
insidious form of cybercrime. I suspect this is mostly because organizations are
highly incentivized to minimize bad publicity, so the majority of incidents go
unnoticed by the general population. Organizations also generally feel like their
current backups act as an insurance policy that will effectively recover from a
ransomware attack, so “we’re good.” Human nature is to avoid the most
unpleasant aspects of life, no matter how likely.
What is Ransomware?
The typical answer goes something like this: Ransomware is
an especially sinister strain of malware. Simply put, once your system is infected,
ransomware holds your data hostage by encrypting the files, rendering them
illegible and unusable until a ransom is paid. While this is probably the
answer you were expecting, it is not the accurate answer.
The correct answer? Ransomware is a business. Ransomware was not designed by anarchists for
the purpose of sabotage. Data destruction is not the end goal. The business of
ransomware is to be lucrative, which means getting customers to pay the ransom.
To achieve this, an effective ransomware attack must work to make the payment
as user-friendly as possible and also eliminate all possibility of recovery
without paying. Let’s take a look at the payment instructions from an example
of ransomware called “WannaCry”:
Notice the simplicity— very similar to other modern simple
software designs. The local language, the clarity of the instructions and helpful
links to find more info. Cybercriminals want you to pay the ransom and the
easier they make it to pay, the better. This also means the ransom is typically
affordable. It has to make actual sense to pay up. When the ransom is paid,
victims will get most of their data back most of the time but, as you are about
to see, the process and pain involved in recovering from ransomware goes far
beyond the amount of the ransom.
The True Costs of Ransomware
Gather round, it’s sumo story time. The full cost of a ransomware
attack is not easy to calculate. In one
particular instance, I had a retail customer that was the victim of phishing
(which we will discuss further). As a result, approximately 200 Windows servers
were encrypted, including the backup server. This nightmare began when the
backups IT thought would save them from cyberthreats actually became a
The lost data typically meant that business systems were offline, and files were illegible. With no access to critical systems, the employees were sent home. They were no longer able to accept new orders or fulfill existing orders and lost all access to the customer records system. This lasted over 5 days, which meant customers were forced to pivot to alternate suppliers.
Even the decision to pay the ransom is a pain point. The FBI encourages victims not to pay the ransom, which would ideally discourage future criminals. Most would agree with this stance. After all, no one wants to reward a cybercriminal who successfully attacked their organization, and there is no guarantee that victims would actually recover their data. This is a noble thought but remember that ransomware is a business, and the requested ransom typically ranges from a few hundred dollars to several thousand dollars. These amounts are low enough that, in most cases, it actually makes good business sense to pay the ransom. Also, WannaCry is a relatively unsophisticated variant compared to newer strains that are actually starting to threaten to leak customers’ sensitive data.
In our example, the pain was so severe that this
organization easily decided to pay the ransom ($17K), which represented only a
few hours of lost profits. After an uncomfortable discussion with the company’s
leadership and finance team, another unforeseen obstacle was the payment
currency. Cybercriminals don’t accept purchase orders and they don’t offer
net-30-day payment terms. This organization did not have a corporate Bitcoin
wallet. Eventually a third-party consultant was engaged to pay the ransom and
recover the data, but this delay cost the company valuable time.
Surprisingly, most of the time customers do get the decryption keys to “unlock” their data once the ransom is paid. Again, it’s a business. If no one ever got their data back, ransomware would not be effective. Unfortunately, this particular customer was emailed a spreadsheet with 200 unmarked decryption keys. Imagine being handed a bag of 200 unmarked door keys. There are 200 doors, each with a matching key, and your job is to match all 200 doors with keys. As you can imagine, this would take some time, even with a large staff and a shared Google Sheet. Also remember that there’s a time bomb strapped to the data. After the first three days, the price goes up, and after seven days, the data is gone forever. This fun little detail is just another tactic to remind victims to pay, and to do it quickly:
Even when a matching decryption key was found, the IT staff
noticed that many of the recovered servers would crash midway through the
decryption process. The encrypted files were actually not decrypted “in-place,”
but rather copied, which doubled the data capacity consumed. Servers that were
<50% disk utilization decrypted OK, but many were >50% and required a
game of musical chairs with the underlying storage system in order to
accommodate the unexpected increased disk usage on many systems.
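The 50% rule of thumb above is simple arithmetic, and it is worth checking before anyone clicks “decrypt.” Here is a minimal sketch, assuming the worst case where every used byte on the volume is encrypted and the decryptor writes a full second copy next to the original. The function names and the sample numbers are mine, not something the customer actually ran.

```python
def can_decrypt_by_copy(capacity_gb, used_gb):
    """True if a full second copy of the used data fits in the free space."""
    free_gb = capacity_gb - used_gb
    return used_gb <= free_gb


def extra_capacity_needed_gb(capacity_gb, used_gb):
    """Additional space required before copy-based decryption can finish."""
    # Decryption temporarily needs 2x the used space on the same volume.
    return max(0, 2 * used_gb - capacity_gb)


# Hypothetical servers as (capacity_gb, used_gb) pairs.
for capacity, used in [(1000, 400), (1000, 600)]:
    print(capacity, used,
          can_decrypt_by_copy(capacity, used),
          extra_capacity_needed_gb(capacity, used))
```

A server at 60% utilization on a 1 TB volume would need roughly 200 GB of extra room found somewhere else, which is exactly the game of musical chairs described above.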
In the end, approximately 7% of the servers were unable to
be recovered because keys were not found, there were disk capacity issues, or
the customer simply ran out of time.
When most of the systems were eventually recovered, the company
was no longer the supplier for its previous customers. The loss to revenues
and productivity was obvious, but the organization couldn’t foresee the lasting
loss of credibility.
Employee morale was also noticeably lower. The IT staff had
simply lost all credibility with their peers. The workforce had assumed that IT
was protecting them from such an incident. A reputation is a fickle thing. Think
about giving your money to Bernie Madoff for a new investment. That sounds
crazy, but is it any more irrational than trusting your data with a staff that just
How Ransomware Enters Your Data Center
The entry point can vary but there are two primary sources:
1) user action and 2) system vulnerabilities. Until the Borg entirely lobotomizes
all humans, ransomware will remain a source of pain. An infected email
attachment or link is the most frequent source, so a good email
scanning/filtering system is essential, but real people will still show up with
their own infected devices, they will still plug in USB flash drives loaded with
malware and still click on infected attachments.
Forrester also says approximately 18% of attacks come from
phishing. This technique involves social engineering, which tricks users into
thinking they should enter/change their password or account info. This is
really a challenge of educating your users, but this brings us to an important
first point – no filtration system will prevent 100% of incidents because humans
are a vulnerability. Cybercriminals can target users with pinpoint
precision and leverage a user’s own access and knowledge to infiltrate a
network. This strategic, intentional, customized version of phishing even has a
sub-category, aptly called spear-phishing. No matter how impenetrable your
firewall may seem, there is still a human element involved.
The other source of ransomware is through system
vulnerabilities. Most are software-based vulnerabilities such as RDP (remote
desktop protocol) that live within a popular operating system, such as Windows.
Brute-force attacks have successfully targeted Windows desktops and servers
with RDP enabled, allowing a cybercriminal to have full control of a system on
your network. Occasionally there are hardware vulnerabilities exposed such as
the Intel processor-based Spectre/Meltdown just to keep things spicy.
Another software vulnerability is SMB (Server
Message Block, aka “CIFS”), the Windows file-sharing protocol, which enables ransomware to encrypt file
shares as well as spread throughout the network like a virus, encrypting more
desktops and servers. This means corporate network file shares are fish in
barrels for ransomware because the attack surface is enormous, but also because
it is especially difficult to enforce access controls against an attack.
Preventing Ransomware – Modern Threats Require Modern Backups
Any good ransomware plan needs to start with a strong backup
strategy. Firewalls are great but, as the last line of defense, your backups
might be the only thing standing between you and the torture chamber described thus
far. Unfortunately, most IT organizations rely on backup systems that are just
as vulnerable as the rest of the servers.
Thieves know that backups are an organization’s only chance,
so backups are the first thing cybercriminals target. Most backup applications are
built on Windows servers, which are vulnerable to the same open liabilities
that allow hackers to access any other systems. Backups also typically use
network-attached storage (NAS) for repository/backup disk capacity, which, as
previously discussed, is a tasty target for criminals to inflict their
These two components of most backup solutions (Windows
operating systems and network file shares) are no longer protecting customers,
but rather have turned into liabilities that leave organizations exposed to being
attacked on the very systems they count on to save them from these types of
Furthermore, these types of older backup solutions are
designed to restore a single item. In the case of ransomware, it is likely that
EVERYTHING needs to be restored. This would typically take weeks, assuming the
backups were not also encrypted. Also,
what defense has been implemented to prevent an immediate repeat
Cohesity Offers a New Approach
In this age of widespread ransomware attacks, organizations
need a new type of backup architecture to address this new form of modern threat.
A new type of data protection that is designed to address ransomware would:
Provide air-gapped immutability from corruption
Detect the likelihood of ransomware and alert users
Offer large-scale instant recovery from
Designed in the modern era, Cohesity was specifically
architected to provide secure protection from modern cyberthreats such as
ransomware. Trigger-warning: things are about to get nerdy.
First, Cohesity prevents ransomware. Cohesity is
a hyper-converged platform, which means the compute, storage and software are
tightly coupled in a “node” architecture that scales out. The fact that storage
is entirely integrated means there are some magical automated protections built
right in that create immutability.
After Cohesity creates a backup, the file system
(SpanFS) immediately creates a “snapshot.” This snapshot copy is kept offline
and NEVER exposed back to the network. Even when backup data need to be
accessed, Cohesity creates a “clone” of the snapshot and uses that copy
rather than the original, just in case a crafty hacker is waiting to attack. The
important point to remember is there is always a gold copy kept securely
offline. This process happens
completely automatically and illustrates a fundamental advantage of having vertically
integrated software and hardware.
Because the software is tightly integrated, Cohesity
has controls in place, such as multi-factor authentication,
to block unauthorized access.
In a worst-case scenario, a human error or particularly
devious sabotage incident could result in deleted backups, whether accidental
or intentional. That is why Cohesity developed DataLock™, a feature that
marks data as non-deletable, even by super-users, until its predefined
retention period expires.
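To make the idea concrete, here is a toy sketch of a retention lock. This is not how DataLock is implemented (I have not seen that code); the class, method and exception names are hypothetical. The point is simply that the delete path checks the clock and nothing else, so no role or privilege can shortcut it.

```python
from datetime import datetime, timedelta, timezone


class RetentionLockError(Exception):
    """Raised when a delete is attempted before the retention period ends."""


class LockedBackup:
    """Toy model of a backup that cannot be deleted until retention expires."""

    def __init__(self, name, created_at, retention_days):
        self.name = name
        self.expires_at = created_at + timedelta(days=retention_days)

    def delete(self, now, is_super_user=False):
        # is_super_user is deliberately ignored: privilege does not
        # shorten the retention period in this model.
        if now < self.expires_at:
            raise RetentionLockError(
                f"{self.name} is locked until {self.expires_at.isoformat()}"
            )
        return True


created = datetime(2020, 1, 1, tzinfo=timezone.utc)
backup = LockedBackup("nightly-sql", created, retention_days=30)

try:
    backup.delete(now=created + timedelta(days=14), is_super_user=True)
except RetentionLockError as err:
    print("refused:", err)

print(backup.delete(now=created + timedelta(days=31)))
```

In a real product the clock itself also has to be trustworthy, otherwise an attacker just moves the date forward. That detail is out of scope for this sketch.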
Second, Cohesity has an AI-based detection feature
that scans for anomalies, such as change rates/encryption rates and mass
deletes. This creates an entropy “score.” If it is determined that a
customer’s data has possibly been victimized by ransomware, Cohesity
alerts customers so that they can take action immediately before more business
systems are affected.
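If “entropy score” sounds like hand-waving, here is a rough sketch of the general idea using plain Shannon entropy over the bytes of a file. To be clear, this is my own illustration and not Cohesity’s algorithm; the function names and the 7.5 threshold are assumptions I picked for the demo. Encrypted data looks like random noise, which pushes the score toward the maximum of 8 bits per byte, while ordinary business data is far more repetitive.

```python
import math
import os
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Bits of entropy per byte: 0.0 (constant) up to 8.0 (uniformly random)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # frequency of each byte value
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_encrypted(data: bytes, threshold: float = 7.5) -> bool:
    """Flag data whose entropy exceeds an (assumed) threshold."""
    return shannon_entropy(data) > threshold


# Repetitive CSV-like text versus random bytes standing in for ciphertext.
text = b"customer,order,total\n" * 500
random_like = os.urandom(len(text))

print(round(shannon_entropy(text), 2), looks_encrypted(text))
print(round(shannon_entropy(random_like), 2), looks_encrypted(random_like))
```

One caveat: compressed files (zip, jpeg, video) also score high, so entropy alone would cry wolf. That is why it makes sense to combine it with the other signals mentioned above, such as sudden jumps in change rate and mass deletes.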
Third, Cohesity can recover from a ransomware
incident with a feature aptly named Instant Mass Restore. This is crucial
because in the unfortunate event of a ransomware attack, it’s likely most if
not all of the organization’s systems were affected, and thus all need to be
restored to a point before ransomware encrypted the files. Instant Mass Restore
allows your organization to instantly recover all your servers, databases and
files to a granular restore point just before ransomware infected your data
Cybersecurity has traditionally been network-based. Network
security is important, but networks can only be babyproofed so much without
severely degrading the user experience or obstructing productivity. A more
modern approach can allow for easy protection from ransomware without locking
down the user.
What if your organization was hit by ransomware but didn’t
have to worry because your IT team could instantly restore EVERYTHING from an
immutable copy? Network security will never prevent 100% of threats. It’s time
for the backup and network teams to get more cohesive (insert wink
emoji-face) with Cohesity.
Why does the universe exist? Do we have free will? Coke or Pepsi? What is secondary storage?
These are a few of the tough questions we at StorageSumo are here to answer for you. As discussed in my first post (Greener Pastures), @BTNimble and I have recently gone green and we are now working for Cohesity, a provider of secondary storage solutions. Secondary stuff is awesome! If you’re like me, you might have a secondary fridge, a secondary car, and even a secondary child. We all know secondary stuff is great but what about secondary storage?
I suppose the easiest (and probably laziest) definition is,
“Everything that is not primary storage.”
Primary storage represents approximately 20% of the overall data center
capacity. At our previous job, Bryan and
I provided high-performance primary storage.
We were also peers with similar vendors that offered storage and
hyperconverged appliances for customers looking to store primary production data
and applications that make up this 20%.
Primary Apps? Like Candy Crush?
Not exactly. These
applications were typically systems of record.
In every organization I have consulted with, there is invariably a database
containing critical information such as customer records, order management, student
enrollment and so on. These systems of
record are almost always the most vital assets the organization possesses and
are serviced with maximum care. These
types of workloads are very performance-sensitive and if they aren’t snappy,
the organization suffers tremendously with lack of productivity, possibly even availability,
and worse, grumpy employees standing over your desk asking if you tried turning
it off and on again.
If ANY impediment to data delivery of this type occurs, the organization suffers. If these records were to somehow vanish, the organization might just as well not exist. THIS is obviously primary data meant for primary storage because it is essential, but also because it has a strict performance SLA (service level agreement).
Backup and Archival
Now we’re getting to some secondary stuff! For many organizations, the largest amount of storage capacity is dedicated to backup and archival retention of data. This gives IT a time machine to be able to recover from any incident that might impede the organization’s data availability or integrity. These incidents could include (but are not limited to):
Physical hardware failure
Lost or misplaced files
If none of these things have ever happened to your organization, you are very lucky and I hate you. More likely, multiple variations of these incidents have happened to your organization multiple times. In a best-case scenario, your IT staff was prepared and able to recover quickly. In either case, we’ll look closer at this dark little corner later in another post but for now, just know that backups require tertiary storage – what’s known as “air-gapped” from the primary storage system – and lots of it.
While backups and archives are incredibly important, by their
very nature these are a second (or more) copy of the primary storage. This is by design so that there is physical
separation from the primary data, a so-called “air-gap.” In fact, a good backup strategy will include
multiple copies for reasons such as multiple recovery-points (versioning), and
also copies offsite to preserve data in the event of a site disaster.
For the purpose of defining secondary storage, remember that backups are not SLA-driven in the same way primary data is. Sure, it’s vital that the backups are complete, reliable and in a certain window of time, but this type of workload is more throughput-dependent than latency-sensitive. In other words, backup jobs are sort of like moving gravel with dump trucks. There’s a lot of stuff to move and it is important stuff, but it wouldn’t make sense to use a Lamborghini and no one will complain if it takes a little longer to get there.
We’ve established that backups are one clear way we make
copies of primary storage, an obvious use of secondary storage. What else is there? Well, there are actually tons of use cases. Take testing and development, or “Test/Dev”
as the cool kids say.
To paraphrase Murphy’s law, what could go wrong usually
does. This is why organizations would
prefer to test changes, such as updates or upgrades, in a “safe-zone.” IT often has a duplicate (or at least similar) environment
of their primary data and applications in a test/dev silo. In many cases, organizations will try to
repurpose older gear to save money but this does often require doubling
But what about my PowerPoint proposal for television in the men’s restroom?
Don’t worry. That and
the rest of your documents are important too.
Interestingly, secondary data also includes users’ data such as
documents, spreadsheets, presentation files and even pictures or videos. This “bulk” data is typically stored on a
network-attached storage (NAS) device.
This is so the user files can be efficiently and securely managed
centrally and persist even when a desktop is replaced or an employee quits. This file/NAS storage is also critical to the
application’s success and the users depend on this to work, but this data does
not have a strict performance requirement.
This type of file/NAS storage is also an excellent example of secondary
Why Cohesity is built for Secondary Storage
Secondary data is actually many, MANY times larger than
primary data capacity. By even
conservative estimates, secondary data comprises >80% of the overall data
capacity. Storage in the data center
actually maps nicely to an iceberg: 80% or more is actually below the surface. Most IT leaders will admit that they have 6-8
copies of their data for various reasons. Primary storage is for apps with
strict performance SLAs and secondary storage is for apps without strict
Cohesity was founded to solve the problem of mass data
fragmentation. While primary storage
vendors have made tremendous progress consolidating that 20% of data center
workloads, Cohesity is purpose-built to consolidate the much larger 80% of data
that is considered secondary. This 80%
is typically scattered across multiple siloed environments. By consolidating backups, archival data, file
shares, test/dev and analytics into a single web-scale platform, Cohesity
customers are able to reduce the number of physical data copies, vendors,
support renewals and management interfaces down to one. This makes managing data radically simpler,
and I firmly believe that the simplest solution always has the lowest total
cost of ownership.
There you have it!
Next post, we’ll tackle another one of life’s deep mysteries but for now
we can close the books on this one. Stay
tuned for more storage ramblings as well as details around Cohesity’s DataPlatform®. Until then, au revoir!