Need an Air Gap? Call a Plumber

Recommendations for a Secure Storage Project

I have spent the last several years of my career attempting to alert my fellow sumos, partners and customers about the growing cybersecurity threat known as ransomware (see previous post here). Most people (certainly in the cybersecurity and information technology space) now understand the threat better, and some have even taken precautions, updating their security appliances, educating their employees and re-evaluating their backup systems.

I have also waxed poetic that an organization’s backup systems could be cause for concern, since criminals often seek out and destroy an org’s backup data first, to eliminate any possibility of recovery and thus increase the likelihood of having to pay the ransom, typically in hard-to-trace Bitcoin.

Recognizing this risk, some organizations have even gone so far as to evaluate the need for a supplemental, redundant backup system that would serve as a sort of “secure enclave”, should the primary data and backup systems also be compromised.

While this is the right idea, I have noticed some troubling trends in strategies for improving cybersecurity posture with a separate, redundant backup system. I have also noticed a general misunderstanding of the terms and nomenclature in this wild west of cybersecurity. This post explains the most important terms to know, what to look for in a secure, tertiary storage environment, and some recommended evaluation criteria.

Talk Like an Expert: The Terms to Know

Air Gap

An Air Gap. Useful for passing a health inspection.

An air gap is a plumbing term, referring to the unobstructed vertical space between the water outlet (like a faucet) and the flood level of a fixture (like a kitchen sink). This provides backflow safety, which protects the water source from contamination. “Air gap” is NOT a useful term regarding secure data access and storage.

Unless the data is stored on removable media (like tape or CD) and kept offline on a shelf, there is no air gap. Furthermore, a true air-gapped computer system also wouldn’t be very useful for recovery, testing or patch management. Any product that is network-attached cannot be “air-gapped”, and even vendors that have adopted this term are still careful to call it an “operational” air gap, which acknowledges that it is not actually an air gap. Remember, if you need an air gap, call a plumber. Typically, when customers refer to an air gap solution, they actually mean a strong set of security features that I will describe below in greater detail. So what terms should we be using?

Immutable


FlashArray Storage Snapshots

Now we’re getting somewhere! Immutable storage means that the data cannot be altered, updated or changed in any way. Storage snapshots are a superb example of immutable data storage. Snapshots create a frozen copy of data that is impervious to change. This typically provides a DVR-like capability to revert to a point in time before ransomware encrypted the data. Please note that network storage appliances (that are not protected with snapshots) are NOT immutable. Conceivably, the files could be opened and changed. More importantly, even immutable storage can still be destroyed and eradicated, which is where WO/RM comes in⇣

Write Once, Read Many (WO/RM)

WO/RM storage describes an indestructible quality that means data cannot be overwritten, deleted or removed by any user, even a mighty administrator. WO/RM storage has been available for decades, dating back to write-once PROM (programmable read-only memory) chips. Tape systems also offered WO/RM varieties, and many of you fondly remember CD-R and DVD-R, which allowed a single write operation, then prevented any further writes to that optical disc. Now that network-attached storage is preferred for backup applications, WO/RM is not a common standard feature on most appliances, but make no mistake: WO/RM is a CRITICAL feature to demand in any highly secure application, such as a secure, hacker-resistant storage environment.
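
If you want to see WO/RM in action today, one widely available example is object storage with compliance-mode object locking. Below is a minimal sketch using Amazon S3 Object Lock via the boto3 SDK; the bucket and key names are hypothetical, and the bucket must have been created with Object Lock enabled:

```python
# WO/RM sketch: Amazon S3 Object Lock in COMPLIANCE mode.
# Assumes a bucket created with Object Lock enabled (which also
# enables versioning). Bucket/key names here are made up.
from datetime import datetime, timedelta, timezone

import boto3

s3 = boto3.client("s3")

# Write once: store the backup with a retention date. In COMPLIANCE
# mode, NO user -- not even the account root or an administrator --
# can delete or overwrite this object version until the date passes.
with open("full-backup.tar.gz", "rb") as body:
    s3.put_object(
        Bucket="secure-backup-vault",
        Key="backups/2020-11-01/full-backup.tar.gz",
        Body=body,
        ObjectLockMode="COMPLIANCE",
        ObjectLockRetainUntilDate=datetime.now(timezone.utc) + timedelta(days=90),
    )

# Read many: reads work normally for the life of the object.
obj = s3.get_object(
    Bucket="secure-backup-vault",
    Key="backups/2020-11-01/full-backup.tar.gz",
)

# Deleting that locked version raises an AccessDenied error until the
# retention period expires. That is the WO/RM guarantee.
```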

Legal Hold

Legal hold is a notification sent from an organization’s legal team to an IT team (and probably relevant employees), instructing them not to delete electronically stored information. Similar to WO/RM, legal hold requires that data be preserved in a tamper-proof and indestructible way. Legal hold differs from WO/RM in that a legal hold typically must be applied retroactively, to data that already exists. To guarantee extended retention, legal hold is applied to an individual’s or organization’s data, often for an indefinite period of time.
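
Continuing the object-lock example from above (again, all names are hypothetical), a quick sketch shows how legal hold differs from a retention date: it is applied after the fact and has no expiry until explicitly released:

```python
# Legal hold sketch: applied retroactively to an EXISTING object in an
# Object Lock-enabled bucket, with no expiration date.
import boto3

s3 = boto3.client("s3")

# Legal says "preserve everything for this matter": flip the hold on.
s3.put_object_legal_hold(
    Bucket="secure-backup-vault",
    Key="backups/2020-08-14/hr-records.tar.gz",
    LegalHold={"Status": "ON"},
)

# The object is now undeletable indefinitely. Only an explicit release
# (typically at the legal team's instruction) lifts the hold:
# s3.put_object_legal_hold(..., LegalHold={"Status": "OFF"})
```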

Multifactor Authentication (MFA)

MFA is an authentication system that requires more than one distinct factor to grant access, typically to a secure management system. MFA can be implemented through an authentication platform (such as Okta or Duo), or with local credentials verified through an SMS text code or a popular authenticator app, such as Google Authenticator. MFA is quite possibly the single most important safeguard against unwanted access to critical applications and security systems.
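
For the curious, the six-digit codes those authenticator apps generate are time-based one-time passwords (TOTP, RFC 6238). A minimal sketch with the pyotp library shows the whole idea:

```python
# MFA sketch: the time-based one-time password (TOTP) behind apps like
# Google Authenticator, using the pyotp library.
import pyotp

# Enrollment: the service generates a shared secret once; the user loads
# it into their authenticator app (usually by scanning a QR code).
secret = pyotp.random_base32()
totp = pyotp.TOTP(secret)

# Login: after the password (something you know), the user types the
# 6-digit code currently shown by the app (something you have).
# Codes rotate every 30 seconds.
code_from_user = totp.now()          # stand-in for what the user types
assert totp.verify(code_from_user)   # the server-side check
```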

Now that we know the game, let’s play. Below are suggested starter evaluation criteria for anyone evaluating a secure storage solution ⇣

Example Evaluation Criteria

Recommendations

We recommend that customers looking for secure storage to protect against ransomware research both modern on-premises and cloud-based storage systems that incorporate a combination of immutable architecture and WO/RM (write once, read many) technology. Together, these indestructible features create a bedrock of data that cannot be compromised by external threats such as ransomware, or even internal threats such as sabotage.

These devices absolutely must also use strong access controls with multifactor authentication (MFA), preferably with separate credentials from the primary domain (such as Microsoft Active Directory). Ideally, select a system that provides local user authentication with multifactor support.

In this ultra-secure application, we want as much risk isolation as possible, which includes separation of hardware and software development cycles. Another recommendation is to consider only products with entirely different hardware and software from what is currently used for primary storage and backup. One of our customers told us that, during a routine service event, their current vendor’s service technician accidentally reformatted the wrong backup storage system, causing data loss and an outage. While this was unintentional, the event prompted the customer to research secure storage for protection against risks like ransomware and sabotage. The customer then evaluated only storage products and vendors that were different from their current provider, to minimize the risk of exposure to their secure environment.

Finally, we strongly recommend evaluating only those systems that provide the high performance needed to meet demanding recovery SLAs. Ransomware typically inflicts maximum pain by encrypting as much data as possible, which means you may need to recover potentially all of your data, quickly. Pre-ransomware-era backup and storage technologies are typically based on slower, low-cost components. We recommend all-flash technologies that can perform at scale and allow for easy testing. A system that can restore all data near-instantaneously can also cover the contingency where even the primary storage is unavailable, running workloads indefinitely until the compromised storage is back online.

Better yet, start to demand these secure storage features in your primary storage and backup systems, and reduce the need to rely on such ultra-secure redundant devices in the first place.

What to Expect When You’re Expecting a Ransomware Attack

I’m noticing a troubling trend, wherein after a ransomware attack is publicly exposed, sales teams descend upon the victims like a flock of thirsty pigeons offering to “help.” As a PSA, it’s time to say the quiet part out loud: AFTER ransomware has encrypted your data, it’s too late. There is NOTHING any vendor can do unless they are selling a time machine. Unless Google calls to lend you the quantum computer it keeps levitating near absolute zero, no vendor is going to help you break AES-256 cryptography. The time to address a ransomware attack is BEFORE it takes place.

Regular sumo readers (I don’t check sumo stats but can only assume there are millions) might recall that I recently wrote a more comprehensive ransomware article here, but there’s no doubt about it: ransomware is on the rise and merits revisiting. Let’s pull up the manhole cover again and see what new rats are lurking in the ransomware sewers this time, and why network security measures are inadequate.

Since I have written in detail before on this topic, this time I’m going to focus on three topics I believe to be largely misunderstood:

  1. What a typical experience looks like when escaping ransomware jail
  2. Why network security won’t help
  3. Why your current backup is useless

It’s been well-publicized that just a few weeks ago, hospitals started reporting a sharp increase in ransomware attacks. This past week, the Wall Street Journal published an article with gruesome details of several attacks on public school systems. While there is no central report to track these attacks, the Journal reports tracking nearly three dozen public school districts that have been attacked since the pandemic began in March, a count that does not include private schools, colleges or universities. School systems in Toledo, Ohio and Athens, Texas have just released more details on their attacks, and the stories are frightening.

These rancid attacks have increased so fast that two US Senators recently asked the US Secretary of Education to come up with a national response to the growing crisis. The FBI’s answer is not to pay, which is a virtuous thought as long as the attack is happening to someone else.

Escaping From Ransomware Jail

I don’t think most people have any idea what it takes to recover encrypted data, even after criminals receive payment and release decryption keys. Imagine you are the IT administrator at a local school system, and you get a call that one of your systems is not responding. You log into the application’s server and see something that looks like this:

Screen-capture of “Maze,” the most common ransomware variant of 2020. Nice, right?

It’s definitely worrisome but you’ve been backing up that server’s data every night. You log into your backup server and notice the backup server’s data is ALSO encrypted.

Now you’re worried. Additional calls start coming in and confirm what you already started to fear: everything is encrypted.

Onward we go to the third stage of grief, bargaining. After a frantic call with your manager, you are instructed to look into what it would take to pay the ransom. You look closer at the scary instructions on the screen.

The friendly instructions say to pay in 7 days or the data decryption keys will be lost forever. Awesome. Ordinarily you could turn around an emergency purchase order in about a week, but something tells you these punks won’t extend net-30-day billing terms. A closer inspection confirms that payment must be made in Bitcoin. Dandy. Unsurprisingly, the school system does not have a Bitcoin wallet.

The “consultant” your school board hired after you reported the attack has now offered to “help,” which means the consultant will pay in Bitcoin and bill the school system. Whew. It’s been painful but surely now you’re out of the woods, right?

You finally get a list of decryption keys to “unlock” the files. Only, oh no! They aren’t even marked! Now you have to manually match these keys with each server and desktop. This extended downtime was unexpected and costly.

When you do get lucky and find a matching key, you notice another fun surprise: the servers are crashing before the data fully decrypts. Another fun fact about ransomware is that the data is not decrypted “in-place.” Rather, the recovered data is copied. Many of your servers don’t have 50% free space, and the disks are filling up.

Adding insult to injury, you were unable to restore several critical systems (approximately 10% of data on average is never recovered). Worst of all, you have no idea how the systems were compromised and now you are understandably worried you might become a repeat customer.

One school IT administrator who recently experienced this first-hand blamed himself. “I felt like a complete and total failure,” he said.

Network Security Won’t Help

Modern threats require modern backups. Any good ransomware plan needs to start with a strong backup strategy. Firewalls are great but, as the last line of defense, your backups might be the only thing standing between you and the brutal story we just witnessed.

Cybersecurity has traditionally been network-based. Network security is important, but no firewall or email filter can prevent 100% of attacks. Between email, BYOD policies and work-from-home realities, the attack surface has grown significantly in just the last few months. Think of your firewall as a goaltender. Even the best hockey goalies only prevent approximately 90% of shots from going in. With ubiquitous computing power and automation, the number of ransomware shots is going to continue to climb exponentially. While a shiny new security appliance might be even better than Patrick Roy at preventing ransomware, all it takes is one that gets through and you’re still looking at the same painful outcome. The school systems in Toledo reported that ransomware most likely entered their network after a faculty member left a web meeting open.
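
To put numbers on the goaltender analogy, here is the quick math (a toy calculation, not a threat model): even at a 90% save rate, the odds that at least one shot gets through climb fast with volume.

```python
# Probability that at least one of n attacks beats a 90%-effective filter.
for n in (1, 5, 10, 25, 50):
    breach = 1 - 0.9 ** n
    print(f"{n:3d} attacks -> {breach:5.1%} chance at least one gets through")
# 10 attacks: ~65%. 50 attacks: ~99.5%. The goalie will get scored on.
```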

This brings me to my point: networks can only be babyproofed so much. Zoom/WebEx/Teams are a way of life now. There is no practical way to prevent ransomware without severely degrading the user experience or obstructing productivity. A more modern approach can allow for easy protection from ransomware without locking down the user.

In order to assess your ransomware readiness, ask your IT staff some tough questions:

  1. Assuming $1,000/system, how much would we likely have to pay if all our servers were encrypted by ransomware?
  2. Do we have a Bitcoin Wallet?
  3. How long would it take to recover ALL our data from a ransomware attack?
  4. What is keeping someone with YOUR internal access from corrupting/destroying the backups?
  5. Can we easily and routinely test recovering ALL our servers from a ransomware attack?

If Vendors and Networks Can’t Help After an Attack, What Can You Do?

The only way to successfully recover from a ransomware attack is by restoring from a safe, immutable backup copy made BEFORE the data was encrypted. This sounds simple enough. Surely all organizations are backing up their data, right? Actually, classic backup architecture is useless against a modern-day ransomware attack.

Remember that ransomware is a business. The purpose of ransomware is not anarchy through data destruction, but rather to be lucrative. Criminals know backups are an organization’s only chance to avoid paying ransom, so they actively search for the backup data in order to encrypt or destroy it. Data protection architectures designed more than a few years ago (say, 2015ish) are based on Windows servers and network shares, which are vulnerable to the same ransomware encryption as any other system.

The other issue with classic backup is recovery times. Ransomware is designed to create urgency by attaching a time bomb to the decryption keys. In the lucky scenario where your backups weren’t also encrypted, your organization still needs to recover quickly and ensure your backups are valid before time runs out.

Reports indicate that cybercriminals typically charge approximately $1,000 USD per encrypted server or desktop system. The more systems the criminals can incapacitate, the more they can charge. This is why ransomware spreads like a virus, and it means your organization will likely need to recover all of its data. Typical backup architectures were designed prior to ransomware, to restore a small amount of data, like a single lost file or a single corrupt database. If it takes several days to restore and verify systems and there is an issue with any of your files, it could be too late to pay the ransom and retrieve your data.

Your organization may have spent a ton of time and money on that backup software and storage but it’s time to ditch it for something better before it’s too late.

Pure Storage FlashRecover: Why Fast Recovery is the Only Way to Escape Ransomware Jail

Thieves know that backups are an organization’s only chance, so backups are the first thing cybercriminals target. In this age of prolific ransomware attacks, organizations need a new type of backup architecture to address this modern threat.

Pure Storage FlashRecover is a simple, scalable ransomware recovery solution designed to instantly restore all your data from a safe, immutable offline copy with minimal interruptions to IT operations.

FlashRecover is powered by Cohesity, which means the backup data is never exposed to the network the way it is in traditional backup architectures. Rather, data is safely kept offline, accessible only via two-factor authentication and guarded by safeguards such as DataLock™, which renders backup objects non-deletable. With FlashRecover, you can be sure even your backup administrators could not delete your backup copies, much less a bad guy.

Now let’s talk about how you get your data back. FlashRecover uses FlashBlade’s all-flash, highly parallelized, scale-out fast file and object platform for the storage layer. This means recoveries are even faster than the blazing backup speeds. Better yet, the entire solution is designed to leverage Instant Mass Restore to instantly recover thousands of virtual machines or even the largest databases, presented right from the backup storage, without waiting for data migrations. The instant recovery workflow is so fast, customers could easily test a full recovery every day if necessary.

Conclusion

What if your organization was hit by ransomware but didn’t have to worry because your IT team could restore everything instantly from an immutable copy? Network security will never prevent 100% of threats. It’s time for IT teams to upgrade old gear to a platform specifically designed to recover from ransomware. Pure Storage would like to help. Give your account team a call today to hear more about FlashRecover.

Welcome to the Modern Data Experience

Well fellow Sumos, it is time for another professional change. I have always said that the simplest solution has the lowest TCO. To that end, I have long admired Pure Storage’s approach to simplifying data consumption.

Storage is an essential component for serving apps, yet most IT pros treat it like a necessary evil. I have met thousands of customers over the past 16 years and not a single one ever said anything like “I can’t wait to get back to my desk and use my storage array!”

Data should be easy to consume. The old infrastructure mafia failed miserably to provide a consumer-like user experience. Furthermore, these stuffy storage overlords want customers to pay extra for the extortion practices that keep them in business.

What I like about Pure is that they took a problem that desperately needed solving (deliver a modern data experience) and made it comically simple.

Since that first clean all-flash message, Pure has evolved into a subscription company, setting the standard for fair and transparent business practices with Evergreen.  Then Pure progressed into a predictive support company with Pure1, and in so doing, set a new bar for customer satisfaction with an NPS of 82. Next, Pure became a cloud services company with Cloud Block Storage and an analytics company with AIRI.

I look forward to taking you with me on this journey to the future. The evolution continues!

Cohesity Support Scores a Perfect 100 NPS

Calling tech support can be a lot like shopping at your only grocery store after it has initiated an aggressive stop-and-frisk policy. You may have come for food but an enthusiastic pat-down is definitely going to happen.

Think about calling your local cable company for help: most people would rather shave their head with a cheese grater. I know I will certainly google-fu my issues before picking up the phone to call a vendor.

No product is impervious to problems. Customers understand this, but when things go wrong, they expect snappy expertise from the vendor’s support team to come to the rescue. Support is a critical aspect of technology consumption and largely determines the overall experience, yet it rarely gets seriously considered when evaluating new technology.

Typically, tech support does not meet customers’ reasonable expectations. As a result, organizations suffer frustrations or perhaps even a disruption to business processes. That’s what makes Cohesity’s accomplishment so impressive. Cohesity recently reported a perfect 100 Net Promoter Score, or “NPS” for short.

NPS is quickly becoming a notable standard in customer satisfaction survey/rating systems and is not limited to support or even technology. Historically, NPS has been used internally by some larger companies to help them understand how they are perceived by customers and to gauge general satisfaction levels.

Basically, customers are asked a single question: “How likely is it that you would recommend this company to a friend or colleague?” On a scale of 0-10, customers must answer a 9 or 10 to be counted as a “promoter.”

The scoring system is not 0-100 as common sense would suggest, but rather -100 to +100. A score of 0 would be neutral, with promoters and detractors netting out to zero, which would not be very good. Most blue-chip tech companies range from the 20s to the 30s. Apple currently has an NPS of 72, which is considered outstanding, whereas Dell scored a 33.
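
The arithmetic behind the score is simple; here is a quick sketch of how the -100 to +100 range falls out:

```python
# NPS = %promoters - %detractors, on the standard 0-10 question.
# 9-10 = promoter, 7-8 = passive (ignored), 0-6 = detractor.
def nps(scores: list[int]) -> float:
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100 * (promoters - detractors) / len(scores)

print(nps([10, 10, 9, 9, 9, 9, 8, 7, 7, 3]))  # 60% - 10% -> 50.0
print(nps([0] * 10))                          # everyone hates you -> -100.0
print(nps([10, 9, 10, 10, 9, 9]))             # perfection -> 100.0
```

A perfect 100 allows zero detractors AND zero passives, which is why it is so rare.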

At my last gig, we bragged about our stellar NPS of 85, and I’ve heard a few other young tech companies use NPS in their pitch with ratings in the upper 80s. Younger, smaller and more innovative tech companies often have much better product support ratings. This is largely due to:

  • A Fresh Approach

Established tech companies are often handcuffed to older systems with a large customer base, unable to start over and disrupt their legacy customers. They rely on older infrastructure that is built on manual, reactive processes. Starting fresh, companies can build the support processes right into the products with advanced technology.

  • Technological Advancement

Forward-thinking tech companies build the support infrastructure upfront (remote sensors, predictive AI, automation) to be more responsive to problems, often proactively detecting (or even resolving) the majority of support cases. Thanks to advanced automation and efficiencies, these new companies can often do away with tiered support models that delay resolutions and frustrate customers.

  • Single-Product Focus

Often, younger innovative tech companies have only one product to support, so skills, investments and expertise are centered on a singular area. For example, a customer’s first call into support at Cohesity is picked up by a level-3 engineer (in less than two minutes on average) who has advanced expertise and typically resolves even sticky issues on the initial call.

Cohesity’s support has consistently scored 90+, which basically means we do not have unhappy customers. A perfect 100 means that Cohesity customers were not simply satisfied, they were ecstatic. ALL of them. That is ridiculously difficult to do.

Congratulations to all our Site Reliability Engineers (SREs) for reaching this achievement! Statistically, we are unlikely to maintain that forever but, WOW. Cohesity is clearly providing a radically differentiated support experience vs. our contemporaries.

You can read more about Cohesity’s NPS score here.

The Challenge of Ransomware Demands a “Cohesive” Approach

Finding out your data center has been infected by ransomware is sort of like finding out your mother has been dating Mötley Crüe rock drummer Tommy Lee: you know some terrible things have already happened and you’re going to have a mess to clean up. It might seem like something that happens only to other people, but statistically it is likely to happen to you. StorageSumo is here to help.

The problem is so much worse than the average consumer knows. Let’s peel back some layers of this stinky onion and discuss how modern backups with Cohesity can give your IT staff an unfair advantage against cyberthreats like ransomware.

I’m going to address some basics including:

  • What ransomware is
  • What the true cost is to you and your organization
  • How ransomware enters your datacenter
  • How to prevent, detect and recover from a ransomware attack.

Sadly, very little is known about this particularly insidious form of cybercrime. I suspect this is mostly because organizations are highly incentivized to minimize bad publicity, so the majority of incidents go unnoticed by the general population. Organizations also generally feel like their current backups act as an insurance policy that will effectively recover from a ransomware attack, so “we’re good.” Human nature is to avoid the most unpleasant aspects of life, no matter how likely.

What is Ransomware?

The typical answer goes something like this: Ransomware is an especially sinister strain of malware. Simply put, once your system is infected, ransomware holds your data hostage by encrypting the files, rendering them illegible and unusable until a ransom is paid. While this is probably the answer you were expecting, it is not the accurate answer.

The correct answer? Ransomware is a business. Ransomware was not designed by anarchists for the purpose of sabotage. Data destruction is not the end goal. The business of ransomware is to be lucrative, which means getting customers to pay the ransom. To achieve this, an effective ransomware attack must make the payment as user-friendly as possible and also eliminate all possibility of recovery without paying. Let’s take a look at the payment instructions from an example of ransomware called “WannaCry”:

Notice the simplicity: the local language, the clarity of the instructions and the helpful links to find more info, all very similar to other modern, simple software designs. Cybercriminals want you to pay the ransom, and the easier they make it to pay, the better. This also means the ransom is typically affordable; it has to make actual sense to pay up. When the ransom is paid, victims will get most of their data back most of the time, but as you are about to see, the process and pain involved in recovering from ransomware go far beyond the amount of the ransom.

The True Costs of Ransomware

Gather round, it’s sumo story time. The full cost of a ransomware attack is not easy to calculate.  In one particular instance, I had a retail customer that was the victim of phishing (which we will discuss further). As a result, approximately 200 Windows servers were encrypted, including the backup server. This nightmare began when the backups IT thought would save them from cyberthreats actually became a liability.

The lost data meant that business systems were offline and files were illegible. With no access to critical systems, the employees were sent home. The company was no longer able to accept new orders or fulfill existing orders and lost all access to the customer records system. This lasted over 5 days, which meant their customers were forced to pivot to alternate suppliers.

The decision even to pay the ransom is a pain point. The FBI encourages victims not to pay ransom, which would ideally discourage future criminals. Most would agree with this stance. After all, no one wants to reward a cybercriminal that successfully attacked their organization, and there is no guarantee that victims would actually recover their data. This is a noble thought, but remember that ransomware is a business, and the requested ransom typically ranges from a few hundred dollars to several thousand dollars. These amounts are low enough that, in most cases, it actually makes good business sense to pay the ransom. Also, WannaCry is a relatively unsophisticated variant compared to newer strains that are actually starting to threaten to leak customers’ sensitive data.

In our example, the pain was so severe that this organization easily decided to pay the ransom ($17K), which represented only a few hours of lost profits. After an uncomfortable discussion with the company’s leadership and finance team, another unforeseen obstacle was the payment currency. Cybercriminals don’t accept purchase orders and they don’t offer net-30-day payment terms. This organization did not have a corporate Bitcoin wallet. Eventually a third-party consultant was engaged to pay the ransom and recover the data, but this delay cost the company valuable time.

Surprisingly, most of the time customers do get the decryption keys to “unlock” their data once the ransom is paid. Again, it’s a business. If no one ever got their data back, ransomware would not be effective. Unfortunately, this particular customer was emailed a spreadsheet with 200 unmarked decryption keys. Imagine being handed a bag of 200 unmarked door keys. There are 200 doors, each with a matching key, and your job is to match all 200 doors with keys. As you can imagine, this would take some time, even with a large staff and a shared Google Sheet. Also remember that there’s a time bomb strapped to the data: after the first three days the price goes up, and after seven days it’s gone forever. This fun little detail is just another tactic to remind victims to pay, and to do it quickly.
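
To appreciate why those unmarked keys hurt so much, here is a toy simulation of the matching problem (using Python’s Fernet cipher as a stand-in for the criminals’ encryption; the real-world version involves reboots, crashes and very nervous people):

```python
# Why 200 unmarked keys hurt: without labels, the only way to match keys
# to servers is trial decryption, one pair at a time.
from cryptography.fernet import Fernet, InvalidToken

# Simulate 200 "servers", each encrypted with its own unlabeled key.
servers = {f"server-{i:03d}": Fernet.generate_key() for i in range(200)}
samples = {name: Fernet(key).encrypt(b"known header")
           for name, key in servers.items()}
unmarked_keys = list(servers.values())   # the emailed spreadsheet of keys

matched, attempts = {}, 0
for name, blob in samples.items():
    for key in unmarked_keys:
        attempts += 1
        try:
            Fernet(key).decrypt(blob)    # does this key fit this door?
            matched[name] = key
            break
        except InvalidToken:
            continue                     # wrong key, try the next one

print(f"matched {len(matched)} servers in {attempts} attempts")
# Worst case is 200 x 200 = 40,000 trials -- and each real "trial" is a
# server recovery attempt, not a microsecond function call.
```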

Even when a matching decryption key was found, the IT staff noticed that many of the recovered servers would crash midway through the decryption process. The encrypted files were not decrypted “in-place,” but rather copied, which doubled the data capacity consumed. Servers under 50% disk utilization decrypted OK, but many were over 50% and required a game of musical chairs with the underlying storage system to accommodate the unexpected increase in disk usage.

In the end, approximately 7% of the servers were unable to be recovered because keys were not found, there were disk capacity issues, or the customer simply ran out of time.

By the time most of the systems were eventually recovered, the company was no longer the supplier for its previous customers. The loss of revenue and productivity was obvious, but the organization hadn’t foreseen the lasting loss of credibility.

Employee morale was also noticeably lower. The IT staff had simply lost all credibility with their peers. The workforce had assumed that IT was protecting them from such an incident. A reputation is a fickle thing. Think about giving your money to Bernie Madoff for a new investment. That sounds crazy, but is it any more irrational than trusting your data with a staff that just lost it?

How Ransomware Enters Your Data Center

The entry point can vary, but there are two primary sources: 1) user action and 2) system vulnerabilities. Until the Borg entirely lobotomizes all humans, ransomware will remain a source of pain. An infected email attachment or link is the most frequent source, so a good email scanning/filtering system is essential, but real people will still show up with their own infected devices, still plug in USB flash drives loaded with malware and still click on infected attachments.

Forrester also says approximately 18% of attacks come from phishing. This technique involves social engineering, which tricks users into thinking they should enter or change their password or account info. This is really a challenge of educating your users, but it brings us to an important first point – no filtration system will prevent 100% of incidents, because humans are a vulnerability. Cybercriminals can target users with pinpoint precision and leverage a user’s own access and knowledge to infiltrate a network. This strategic, intentional, customized version of phishing even has a sub-category, aptly called spear-phishing. No matter how impenetrable your firewall may seem, there is still a human element involved.

The other source of ransomware is through system vulnerabilities. Most are software-based vulnerabilities such as RDP (remote desktop protocol) that live within a popular operating system, such as Windows. Brute-force attacks have successfully targeted Windows desktops and servers with RDP enabled, allowing a cybercriminal to have full control of a system on your network. Occasionally there are hardware vulnerabilities exposed such as the Intel processor-based Spectre/Meltdown just to keep things spicy. 

Another software vulnerability is SMB (Windows server message block, aka “CIFS”), which enables ransomware to encrypt network file shares as well as spread throughout the network like a virus, encrypting more desktops and servers. This means corporate network file shares are fish in a barrel for ransomware, both because the attack surface is enormous and because it is especially difficult to enforce access controls against an attack.

Preventing Ransomware – Modern Threats Require Modern Backups

Any good ransomware plan needs to start with a strong backup strategy. Firewalls are great but, as the last line of defense, your backups might be the only thing standing between you and the torture chamber described thus far. Unfortunately, most IT organizations rely on backup systems that are just as vulnerable as the rest of the servers. 

Thieves know that backups are an organization’s only chance, so backups are the first thing cybercriminals target. Most backup applications are built on Windows servers, which are exposed to the same vulnerabilities that allow hackers to access any other system. Backups also typically use network-attached storage (NAS) for repository/backup disk capacity, which, as previously discussed, is a tasty target for criminals to inflict their encryption pain.

These two components of most backup solutions (Windows operating systems and network file shares) are no longer protecting customers; rather, they have turned into liabilities, exposing organizations to attack on the very systems they count on to save them from these threats.

Furthermore, these types of older backup solutions are designed to restore a single item. In the case of ransomware, it is likely that EVERYTHING needs to be restored. This would typically take weeks, assuming the backups were not also encrypted. Also, what prevent defense is in place to stop an immediate repeat incident?

Cohesity Offers a New Approach

In this age of prolific ransomware attacks, organizations need a new type of backup architecture to address this modern threat. A new type of data protection that is designed to address ransomware would:

  • Provide air-gapped immutability from corruption
  • Detect the likeliness of ransomware and alert users
  • Offer large-scale instant recovery from incidents

Designed in the modern era, Cohesity was specifically architected to provide secure protection from modern cyberthreats such as ransomware. Trigger-warning: things are about to get nerdy.

First, Cohesity prevents ransomware. Cohesity is a hyper-converged platform, which means the compute, storage and software are tightly coupled in a “node” architecture that scales out. The fact that storage is entirely integrated means there are some magical automated protections built right in that create immutability.

After Cohesity creates a backup, the file system (SpanFS) immediately creates a “snapshot.” This snapshot copy is kept offline and NEVER exposed back to the network. Even when backup data needs to be accessed, Cohesity creates a “clone” of the snapshot and uses that copy rather than the original, just in case a crafty hacker is waiting to attack. The important point to remember is that there is always a gold copy kept securely offline. This process happens completely automatically and illustrates a fundamental advantage of having vertically integrated software and hardware.

Because the software is tightly integrated, Cohesity has controls in place to prevent unauthorized access, including multi-factor authentication.

In a worst-case scenario, human error or a particularly devious sabotage incident could result in deleted backups, whether accidental or intentional. That is why Cohesity developed DataLock™, a feature that makes data non-deletable, even by super-users, until the predefined retention policy expires.

Second, Cohesity has an AI-based detection feature that scans for anomalies, such as change rates, encryption rates and mass deletes. This creates an entropy “score.” If it is determined that a customer’s data has possibly been victimized by ransomware, Cohesity alerts the customer so they can take action immediately, before more business systems are affected.
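
Cohesity’s actual scoring is proprietary, but the core signal is easy to demonstrate: encrypted files look like random noise, so their Shannon entropy spikes. A rough sketch of just the underlying math:

```python
# Encrypted data approaches 8 bits/byte of entropy; documents sit far
# lower. A sudden jump between backups is a strong ransomware signal.
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Bits per byte: 0.0 (constant) up to 8.0 (uniformly random)."""
    counts = Counter(data)
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

text = b"Meeting notes: budget review moved to Thursday." * 100
noise = bytes(range(256)) * 100            # stand-in for ciphertext

print(shannon_entropy(text))    # ~4 bits/byte, typical of documents
print(shannon_entropy(noise))   # 8.0 bits/byte, looks encrypted

# An anomaly engine compares today's score (plus change and delete
# rates) against the same object's history and alerts on a sharp spike.
```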

Third, Cohesity can recover from a ransomware incident with a feature aptly named Instant Mass Restore. This is crucial because in the unfortunate event of a ransomware attack, it’s likely most if not all of the organization’s systems were affected, and thus all need to be restored to a point before ransomware encrypted the files. Instant Mass Restore allows your organization to instantly recover all your servers, databases and files to a granular restore point just before ransomware infected your data center.

Conclusion

Cybersecurity has traditionally been network-based. Network security is important, but networks can only be babyproofed so much without severely degrading the user experience or obstructing productivity. A more modern approach can allow for easy protection from ransomware without locking down the user.

What if your organization was hit by ransomware but didn’t have to worry because your IT team could instantly restore EVERYTHING from an immutable copy? Network security will never prevent 100% of threats. It’s time for the backup and network teams to get more cohesive (insert wink emoji-face) with Cohesity.

Chris Colotti (@ccolotti) giving an in-depth tour of the CohesityonWheels Unstoppable Truck Roadshow

Are Snapshots Good Enough for My Backup?

I do not mean to start a panic with this unsolicited advice, but lately I have met a few vendors and even partners that are beginning to advocate combining primary storage and data protection with snapshots, forgoing the requirement for a separate backup solution.  So, are storage snapshots backup?  No!  Definitely not!  But also, maybe sort of.  Let me explain.

As you all may know, my previous gig was at a primary storage vendor, Nimble Storage.  That product offered efficient redirect-on-write (ROW) snapshots and so we frequently pushed the benefits of thin snapshots including no data movement, fast restores and low space overhead.  I always encouraged customers to snapshot and replicate everything including their servers, databases and network file shares.

Storage snaps are an important component to a complete data protection plan, but they are more of a “near-line” backup than a complete backup strategy.

“So if snapshots are so great, why are you telling us not to rely on them?”

Again: I’m a big snapshot fan.  Please keep doing them.  Just don’t rely totally on snapshots.  Why?  Snapshots are completely reliant on the underlying primary storage system they are intricately tied to.  Snapshots are mostly the same data blocks/files as the running primary copies of that data, with some clever pointer tables to create additional restore points.  These snapshots are generally very reliable because the primary storage systems they depend on are reliable, but they are not infallible.
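
Here’s a toy model of what I mean by “clever pointer tables,” and why a snapshot lives and dies with its array (a sketch of the redirect-on-write idea, not any vendor’s actual implementation):

```python
# A volume is a pointer table mapping logical blocks to physical blocks.
# A redirect-on-write snapshot just freezes a copy of that table; new
# writes go to fresh physical blocks, so the snapshot's pointers never move.
physical = {0: b"AAAA", 1: b"BBBB"}   # the actual data blocks on the array
volume   = {0: 0, 1: 1}               # live pointer table
snapshot = dict(volume)               # "snapshot" = copy the pointers

# Ransomware overwrites logical block 1: the write is REDIRECTED to a
# new physical block; the old data behind the snapshot is untouched.
physical[2] = b"ENCRYPTED!"
volume[1] = 2

print([physical[volume[i]] for i in sorted(volume)])      # live: encrypted
print([physical[snapshot[i]] for i in sorted(snapshot)])  # snap: restorable

# The catch: both tables point into the SAME physical pool. Lose the
# underlying array and the snapshot dies along with the primary data.
```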

Preface – Please know this is NOT in any way a judgement on NetApp storage.  NetApp is an excellent, successful company and they make terrific products.  NetApp also pioneered the modern ROW-style snapshots that provide the aforementioned protection benefits and is still the gold standard today.  I would gladly discuss this incident with any NetApp team directly, or any other storage vendor for that matter.

Several years ago, a local city municipality (who shall remain shameless) was considering my storage product versus staying with the incumbent NetApp storage and upgrading to a newer array.  While they liked both offers, it was no surprise when the city decided to stay with NetApp because of familiarity, easier public contract procurement & a previously positive experience.

We parted ways as friends, and I assumed I would not be hearing from the city IT team again.  Sadly, I was wrong.  Approximately 5 months later, we were urgently asked to come back and present our solution again.  During the meeting, we were informed that approximately three months after their new system was deployed, the NetApp array suffered a catastrophic system failure, destroying all primary volumes and snapshots.

“But we replicate to another system so we’re good, right?”

This customer also had a second NetApp system they replicated to.  The corrupted data had been replicated (the jobs even reported successful completion), and the replica was also corrupted beyond recovery.  This happened because even though there was a copy of the data on another device, it was the same format and underlying platform, which propagated the problem rather than creating an air-gapped, separate copy on a different platform.

After weeks of escalated data recovery efforts with the vendor, the customer was finally able to restore most of their data, from a point approximately three months earlier, off the downstream replicated system.  Approximately three months of public records were completely lost.

The city IT manager explained that they were in active litigation with the storage vendor and reseller to get their money back, and if successful, wanted to know if they could still get that deal on our storage.

Again, no primary storage array is impervious to serious problems such as downtime or, worse, data loss.  Any enterprise-grade storage system will include multi-level checksums, redundant hardware and even snapshots to prevent such issues, yet they still happen.  To make matters worse, storage vendors are highly motivated to camouflage or even outright deny their losses to prevent or minimize bad press, which I believe leads to a false sense of security.

To blur the lines further, some hyper-converged infrastructure (HCI) primary providers are now claiming to “build in” backup to the solution.  HCI is becoming more and more popular, but it’s relatively new and many customers are woefully under-educated.  So when a primary HCI vendor comes along and says “you don’t need to do backup anymore, we do that already,” it sounds like a used car salesman explaining how that hood latch is actually supposed to open with a coat hanger.

Typical Technology Buying Experience (Dramatized)

To clarify, hyperconverged infrastructure is a newer way to store data and manage infrastructure, combining servers and storage into one unit that scales out.  To protect against component failure, HCI platforms typically make copies of the data across multiple nodes, which creates redundancy.  So when these HCI vendors create a snapshot, some now call it a backup rather than a snapshot simply because the data is replicated across nodes.  This works well for protection from component failures but does little to protect against platform-level events such as what my customer experienced.

The only way to totally protect data from these sorts of incidents is to create an air-gapped copy of the data.  This means when planning for data protection, organizations should always create a backup that is on an entirely separate storage platform that is not accessible on the network.

I was fortunate never to have a customer case that resulted in such catastrophic data loss, but I would never advocate that customers use only the vertically integrated snapshot and replication features to protect data.

Snapshots AND Backup – Love Will Keep Us Together

Cohesity Backup with Storage Snapshot Integration

Chips & salsa, crocs & socks, Captain & Tennille: some things are great by themselves but are simply unstoppable when combined.  Such is the case for primary storage snapshots and backup.  Organizations’ increasing demands to protect more applications faster drive the need for what only snapshots can provide: near-instant recovery points with no data movement.  The only trouble is that snapshots alone are insufficient to protect workloads from system failures, cybercrimes and site disasters.

Do both!  Snapshots and backup are not mutually exclusive but rather two integral parts of a complete data protection strategy.  Better yet, choose platforms with tight integration between backup and primary storage.  I know what you’re thinking:

“Hey pal, I came to this site looking for Sumo suits- I’m not even sure I like this blog, don’t throw me curve balls like that!”

Storage Sumo Browser History

When Cohesity customers have Pure Storage, Cisco HyperFlex, Isilon or NetApp, Cohesity can manage and offload the snapshots of the primary storage system to the Cohesity cluster.  Cohesity can initiate an instant backup job simply by telling the primary storage system to take a snapshot.  Next, in the background and completely automatically, Cohesity will back up the changed snapshot data rather than the running instance of the object, such as a virtual server.  This means older snapshots can be deleted from the primary system, where they would consume valuable resources, yet remain accessible from the secondary Cohesity system.  It also takes the backup workflow completely out of band from the primary production network, significantly lowering the impact of the backup process on the primary server & storage networks.  Cohesity has integration planned soon with many more popular primary storage systems, so stay tuned for more!
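
For the visually inclined, that workflow boils down to something like the sketch below.  Every function name here is hypothetical shorthand, not a real Cohesity or storage-array API; it just shows the order of operations:

```python
# Conceptual sketch of snapshot-offload backup (all APIs hypothetical).
def snapshot_offload_backup(primary_array, backup_cluster, volume):
    # 1. Instant "backup": ask the PRIMARY array for a snapshot.
    #    No data moves yet, so this returns in seconds.
    snap = primary_array.create_snapshot(volume)

    # 2. In the background, copy only the blocks changed since the last
    #    offloaded snapshot -- reading from the snapshot, out of band
    #    from the running server, not from the live object.
    prev = backup_cluster.last_offloaded_snapshot(volume)
    changed = primary_array.changed_blocks(volume, since=prev, snapshot=snap)
    backup_cluster.ingest(volume, snap, changed)

    # 3. Retire older snapshots on the primary so they stop consuming
    #    space there; those restore points now live on the backup cluster.
    for old in primary_array.snapshots(volume)[:-1]:
        primary_array.delete_snapshot(old)
```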

I will mercifully summarize: if any workloads today are protected only with snapshots & replication, I would recommend augmenting that protection with a true backup.  Now as a reward for reading all the way to the bottom and eating your vegetables, please enjoy the greatest song ever made:

What is Secondary Storage?

Why does the universe exist?  Do we have free will?  Coke or Pepsi?  What is secondary storage?

These are a few of the tough questions we at StorageSumo are here to answer for you.  As discussed in my first post (Greener Pastures), @BTNimble and I have recently gone green and are now working for Cohesity, a provider of secondary storage solutions.  Secondary stuff is awesome!  If you’re like me, you might have a secondary fridge, a secondary car, and even a secondary child.  We all know secondary stuff is great, but what about secondary storage?

I suppose the easiest (and probably laziest) definition is, “Everything that is not primary storage.”  Primary storage represents approximately 20% of the overall data center capacity.  At our previous job, Bryan and I provided high-performance primary storage.  We were also peers with similar vendors that offered storage and hyperconverged appliances for customers looking to store primary production data and applications that make up this 20%.

Primary Apps?  Like Candy Crush?

Not exactly.  These applications are typically systems of record.  In every organization I have consulted with, there is invariably a database containing critical information such as customer records, order management, student enrollment and so on.  These systems of record are almost always the most vital assets the organization possesses and are serviced with maximum care.  These types of workloads are very performance-sensitive, and if they aren’t snappy, the organization suffers tremendously: lost productivity, possibly even lost availability, and worse, grumpy employees standing over your desk asking if you tried turning it off and on again.

If ANY impediment to data delivery of this type occurs, the organization suffers.  If these records were to somehow vanish, the organization might just as well not exist.  THIS is obviously primary data meant for primary storage, because it is essential but also because it has a strict performance SLA (service level agreement).

Backup and Archival

Now we’re getting to some secondary stuff!  For many organizations, the largest amount of storage capacity is dedicated to backup and archival retention of data.  This gives IT a time machine to recover from any incident that might impede the organization’s data availability or integrity.  These incidents could include (but are not limited to):

  • Physical hardware failure
  • Data corruption
  • Human error
  • Ransomware attacks
  • Sabotage
  • Lost or misplaced files
  • Site-Disasters (fire/theft/flood/Chernobyl)

If none of these things have ever happened to your organization, you are very lucky and I hate you.  More likely, multiple variations of these incidents have happened to your organization multiple times.  In a best-case scenario, your IT staff was prepared and able to recover quickly.  In either case, we’ll look closer at this dark little corner later in another post but for now, just know that backups require tertiary storage – what’s known as “air-gapped” from the primary storage system – and lots of it.

While backups and archives are incredibly important, by their very nature these are a second (or more) copy of the primary storage.  This is by design so that there is physical separation from the primary data, a so-called “air-gap.”  In fact, a good backup strategy will include multiple copies for reasons such as multiple recovery-points (versioning), and also copies offsite to preserve data in the event of a site disaster.

For the purpose of defining secondary storage, remember that backups are not SLA-driven in the same way primary data is.  Sure, it’s vital that the backups are complete, reliable and finished within a certain window of time, but this type of workload is more throughput-dependent than latency-sensitive.  In other words, backup jobs are sort of like moving gravel with dump trucks.  There’s a lot of stuff to move and it is important stuff, but it wouldn’t make sense to use a Lamborghini, and no one will complain if it takes a little longer to get there.
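
The dump-truck math is worth doing once (round numbers, purely illustrative):

```python
# Backup/restore time = capacity / aggregate throughput.
def hours_to_move(capacity_tb: float, throughput_gb_per_s: float) -> float:
    return capacity_tb * 1000 / throughput_gb_per_s / 3600

print(hours_to_move(100, 0.5))  # ~55.6 hours at 0.5 GB/s
print(hours_to_move(100, 5.0))  # ~5.6 hours at 5 GB/s
# For backup windows, aggregate throughput is the SLA that matters;
# per-IO latency barely moves the needle.
```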

Other Examples

We’ve established that backups are one clear way we make copies of primary storage, an obvious use of secondary storage.  What else is there?  Well, there are actually tons of use cases.  Take testing and development, or “Test/Dev” as the cool kids say.

To paraphrase Murphy’s law, what could go wrong usually does.  This is why organizations prefer to test changes, such as updates or upgrades, in a “safe zone.”  IT often keeps a duplicate of its primary data and applications in a test/dev silo.  Organizations will often try to repurpose older gear to save money, but test/dev can still come close to doubling infrastructure costs.

But what about my PowerPoint proposal for television in the men’s restroom?

Don’t worry.  That and the rest of your documents are important too.  Interestingly, secondary data also includes users’ data such as documents, spreadsheets, presentation files and even pictures or videos.  This “bulk” data is typically stored on a network-attached storage (NAS) device, so the user files can be efficiently and securely managed centrally and persist even when a desktop is replaced or an employee quits.  This file/NAS storage is critical and users depend on it, but the data does not have a strict performance requirement.  This type of file/NAS storage is another excellent example of secondary storage.

Why Cohesity is built for Secondary Storage

Secondary data is actually many, MANY times larger than primary data capacity.  By even conservative estimates, secondary data comprises >80% of the overall data capacity.  Storage in the data center actually maps nicely to an iceberg: 80% or more is below the surface.  Most IT leaders will admit that they have 6-8 copies of their data for various reasons.  Primary storage is for apps with strict performance SLAs, and secondary storage is for apps without them.

Mass Data Fragmentation

Cohesity was founded to solve the problem of mass data fragmentation.  While primary storage vendors have made tremendous progress consolidating that 20% of data center workloads, Cohesity is purpose-built to consolidate the much larger 80% of data that is considered secondary.  This 80% is typically scattered across multiple siloed environments.  By consolidating backups, archival data, file shares, test/dev and analytics into a single web-scale platform, Cohesity customers are able to reduce the number of physical data copies, vendors, support renewals and management interfaces down to one.  This makes managing data radically simpler, and I firmly believe that the simplest solution always has the lowest total cost.

There you have it!  Next post, we’ll tackle another one of life’s deep mysteries but for now we can close the books on this one.  Stay tuned for more storage ramblings as well as details around Cohesity’s DataPlatform®.  Until then, au revoir!

Greener Pastures

After 5 ½ years at Nimble Storage, I recently made the difficult decision to leave for greener pastures-  As of today, my stellar account executive Bryan and I are Cohesians!  I wish my friends back at HPE | Nimble all the best.  I am grateful to you all for the experience and I look forward to watching as you duel it out in the Coliseum that is the primary storage marketplace – This time watching from the stands.  I’ve got my popcorn ready!

Why Cohesity

I hope it goes without saying that I wanted to work for a company that shares my professional values around winning the right way, customer focus, and something our founder Mohit Aron says: “Stay humble and keep learning.”  That statement itself is pretty humble coming from the lead architect behind the Google File System (GFS), who also founded one of technology’s most successful startups in recent years, Nutanix.

Of course, I also needed to work for a company I believe has luminescent technology, something that can actually help organizations reach their objectives more easily and faster.  I joined Cohesity because I firmly believe that the simplest solution always has the lowest TCO, and managing data has to get radically simpler.

The Problem with Today’s Storage Offerings

The primary storage market has undergone a serious transformation in the past 5 years, thanks largely to NAND flash, which is THE game changer in primary storage today.  The transition is well under way and primary storage providers offer terrific choices for customers looking to upgrade their old primary production storage systems with a flashy new storage array or hyperconverged appliances.

Are newer storage systems faster?  You bet!  More efficient?  Certainly.  Simpler?  Like, completely eliminating islands of backup/file/object/cloud storage?  Eh, not really.  While primary storage today is faster and more efficient than ever before, I noticed that many of my customers’ most insistent demands were not being addressed.  These needs include (but are not restricted to):

  • Comprehensive consolidation of data silos
  • Modernized data protection and ransomware strategy
  • Improved operational visibility (think dark data)
  • App mobility (from site to site, site to cloud or cloud to cloud)
  • Ability to scale up/out
  • Tech refresh & lifecycle management

A truly simple data management system would address most of these needs, but massive data fragmentation has led to dark silos of fractured infrastructure that are vulnerable to threats, immobile, inefficient and impossible to extract any value from.  Until now, no commercially available platform could really address all of these needs.

We are entering into a new era where data growth is exponential, and merely updating new storage media and protocols has done very little to solve these newer fragmentation, mobility and visibility difficulties.  Historically, attempting to address these large-scale customer needs with a single service, vendor or application is like trying to strap up another horse to pull your buggy…  It might go a little faster but it’s more complex and it will never be a car.

Back it Up, Back it Up – Beep, Beep, Beep

Take, for example, data protection.  Let’s say your company puts out an RFP for a complete backup solution.  One particular vendor offers data protection software for backup & DR.  After a nice demo, this starts to sound pretty good, but upon further examination, a complete solution would require a server OS to run the software (like Windows), a general-purpose file system (like NTFS), and a disk appliance sized to your best projections for several years.

Even within just this one area of secondary data, we have an example of fragmented GUIs, multiple vendor relationships, support contracts and so on.  Even worse, at large scale, this type of traditional backup architecture will require multiple proxy servers and disk silos to spread the load, further amplifying the fragmentation for larger enterprise organizations.

While this backup software at first looked promising for addressing a critical data protection need, the solution now looks far too complex and limited.  This type of typical backup solution does nothing to collapse other silos of storage such as file/object, test/dev & analytical workloads, and prolonged exposure will give you… Confusion!

Managing data is way too complex.  As my peer Dimitris says in his excellent blog post here, “storage should be easy to consume.”  I whole-heartedly agree with his thesis statement.  I believe what Cohesity offers is something fundamentally new.

After a few of my most-respected peers moved to Cohesity, I had to rub the magic lamp.  Out popped a blue Robin Williams and poof, I’m here!  Now let me introduce you to our enchanted potion (OK, I’m done with magic jokes).

Cohesity addresses these needs with a platform we suitably named DataPlatform®.  Cohesity DataPlatform® is a scale-out solution that consolidates all your secondary workloads, including backup and recovery, files and objects, test/dev and analytics in a single, cloud-native solution.  I look forward to telling you all about Cohesity in follow-on posts about what DataPlatform® does and how it does it, but for now- just know that it’s really cool.

If you read this far, thank you!  Bryan and I plan to have many more posts and pics as we tour the mid-south spreading green goodness and making all our customers’ data dreams come true, so stay tuned, and as always, #GoGreen!