Are Snapshots Good Enough for My Backup?

I do not mean to start a panic with this unsolicited advice, but lately I have met a few vendors and even partners who are beginning to advocate combining primary storage and data protection together with snapshots, forgoing the requirement for a separate backup solution.  So, are storage snapshots backup?  No!  Definitely not!  But also, maybe sort of.  Let me explain.

As you all may know, my previous gig was at a primary storage vendor, Nimble Storage.  That product offered efficient redirect-on-write (ROW) snapshots, so we frequently pushed the benefits of thin snapshots: no data movement, fast restores, and low space overhead.  I always encouraged customers to snapshot and replicate everything, including their servers, databases, and network file shares.

Storage snaps are an important component of a complete data protection plan, but they are more of a “near-line” backup than a complete backup strategy.

“So if snapshots are so great, why are you telling us not to rely on them?”

Again: I'm a big snapshot fan.  Please keep doing them.  Just don't rely totally on snapshots.  Why?  Because snapshots are completely reliant on the underlying primary storage system they are intricately tied to.  Snapshots are mostly the same data blocks/files as the running primary copies of that data, with some clever pointer tables to create additional restore points.  These snapshots are generally very reliable because the primary storage systems they depend on are reliable, but those systems are not infallible.
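To make that pointer-table idea concrete, here is a toy sketch in Python.  It is purely illustrative (not any vendor's actual on-disk format), but it shows how a redirect-on-write snapshot shares its blocks with the live volume, and why both go down together if the underlying array is damaged:

```python
# Toy redirect-on-write (ROW) volume -- purely illustrative, not any vendor's
# actual implementation.  A snapshot is just a frozen copy of the pointer
# table; the data blocks live in the same shared pool as the live volume.

class ToyRowVolume:
    def __init__(self):
        self.pool = {}        # shared block pool: block_id -> data
        self.block_map = {}   # live volume: logical address -> block_id
        self.snapshots = {}   # snapshot name -> frozen pointer table
        self._next_id = 0

    def write(self, addr, data):
        # Redirect-on-write: new data always lands in a fresh block, so
        # blocks still referenced by snapshots are never overwritten.
        block_id = self._next_id
        self._next_id += 1
        self.pool[block_id] = data
        self.block_map[addr] = block_id

    def snapshot(self, name):
        # Creating a snapshot copies only pointers -- no data movement.
        self.snapshots[name] = dict(self.block_map)

    def read(self, addr, snapshot=None):
        table = self.snapshots[snapshot] if snapshot else self.block_map
        return self.pool[table[addr]]


vol = ToyRowVolume()
vol.write(0, "version 1 of the record")
vol.snapshot("daily")
vol.write(0, "version 2 of the record")
print(vol.read(0))                    # version 2 of the record
print(vol.read(0, snapshot="daily"))  # version 1 of the record

# The catch: live volume and snapshots share the same pool.  If the
# underlying array is corrupted or destroyed, both are lost together.
vol.pool.clear()
```

The snapshot is nearly free to create and restore from, but it has no existence independent of the array it lives on.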

Preface – Please know this is NOT in any way a judgment on NetApp storage.  NetApp is an excellent, successful company and they make terrific products.  NetApp also pioneered the modern ROW-style snapshots that deliver the protection benefits mentioned above, and they are still the gold standard today.  I would gladly discuss this incident with any NetApp team directly, or any other storage vendor for that matter.

Several years ago, a local city municipality (who shall remain shameless) was considering my storage product versus staying with their incumbent NetApp storage and upgrading to a newer array.  While they liked both offers, it was no surprise when the city decided to stay with NetApp because of their familiarity, easier public contract procurement, and previously positive experience.

We parted ways as friends, and I assumed I would not be hearing from the city IT team again.  Sadly, I was wrong.  Approximately 5 months later, we were urgently asked to come back and present our solution again.  During the meeting, we were informed that approximately three months after their new system was deployed, the NetApp array suffered a catastrophic system failure, destroying all primary volumes and snapshots.

“But we replicate to another system so we’re good, right?”

This customer also had a second NetApp system they replicated to.  The corruption had been replicated as well; the replication jobs even reported successful completion, yet the downstream copies were corrupted beyond recovery.  This happened because even though there was a copy of the data on another device, it was the same format and underlying platform, so replication propagated the problem rather than creating an air-gapped copy on a different platform.

After weeks of escalated data recovery efforts with the vendor, the customer was finally able to restore most of their data from a roughly three-month-old recovery point on the downstream replicated system.  Approximately three months of public records were completely lost.

The city IT manager explained that they were in active litigation with the storage vendor and reseller to get their money back, and if successful, wanted to know if they could still get that deal on our storage.

Again, no primary storage array is impervious to serious problems such as downtime or, worse, data loss.  Any enterprise-grade storage system will include multi-level checksums, redundant hardware, and even snapshots to prevent such issues, yet they still happen.  To make matters worse, storage vendors are highly motivated to camouflage or even outright deny their losses to prevent or minimize bad press, which I believe leads to a false sense of security.

To blur the lines even further, some hyper-converged infrastructure (HCI) vendors are now claiming to “build in” backup as part of the primary solution.  HCI is becoming more and more popular, but it's relatively new and many customers are woefully under-educated on it.  So when a primary HCI vendor comes along and says “you don't need to do backup anymore, we do that already,” it sounds like a used car salesman explaining how that hood latch is actually supposed to open with a coat hanger.

Typical Technology Buying Experience (Dramatized)

To clarify, hyperconverged infrastructure is a newer way to store data and manage infrastructure, combining servers and storage into one unit that scales out.  To protect against component failure, HCI platforms typically make copies of the data across multiple nodes, which creates redundancy.  So when these HCI vendors create a snapshot, some are now calling it a backup rather than a snapshot simply because the data will be replicated across nodes.  This works well for protection from component failures but does little to protect against platform-level events such as what my customer experienced.
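Here's a tiny, made-up illustration of why node-level replication is not the same thing as backup (assumed replication factor of 2, toy data):

```python
# Toy illustration: replication factor 2 across HCI nodes.  Losing a node is
# survivable, but a bad write is faithfully copied to every replica.

nodes = {"node1": {}, "node2": {}, "node3": {}}

def write_rf2(key, value, targets=("node1", "node2")):
    # The same value is written to two nodes for redundancy.
    for node in targets:
        nodes[node][key] = value

write_rf2("invoice-1042", "corrupted garbage")  # a platform- or app-level error

del nodes["node1"]                       # lose a node: the data survives...
print(nodes["node2"]["invoice-1042"])    # ...and it is still corrupted garbage
```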

The only way to fully protect data from these sorts of incidents is to create an air-gapped copy of the data.  This means that when planning for data protection, organizations should always create a backup on an entirely separate storage platform that is not directly accessible from the primary environment over the network.

I was fortunate never to have a customer case that resulted in such catastrophic data loss, but I would never advocate that customers rely only on the vertically integrated snapshot and replication features to protect their data.

Snapshots AND Backup – Love Will Keep Us Together

Cohesity Backup with Storage Snapshot Integration

Chips & salsa, crocs & socks, Captain & Tennille: some things are great by themselves but are simply unstoppable when combined.  Such is the case for primary storage snapshots and backup.  Organizations' growing demand to protect more applications, faster, is driving the need for the kind of protection that can only be initiated by snapshots, which make it possible to take near-instant recovery points with no data movement.  The only trouble is that these snapshots alone are insufficient to protect workloads from system failures, cybercrime, and site disasters.

Do both!  Snapshots and backup are not mutually exclusive but rather two integral parts of a complete data protection strategy.  Better yet, choose platforms with tight integration between backup and primary storage.  I know what you're thinking:

“Hey pal, I came to this site looking for Sumo suits- I’m not even sure I like this blog, don’t throw me curve balls like that!”

Storage Sumo Browser History

When Cohesity customers have Pure Storage, Cisco HyperFlex, Isilon, or NetApp, Cohesity can manage and offload the snapshots of the primary storage system to the Cohesity cluster.  Cohesity can initiate an instant backup job simply by telling the primary storage system to take a snapshot.  Next, in the background and completely automatically, Cohesity will back up the changed snapshot data rather than the running instance of the object, such as a virtual server.  This means older snapshots can be deleted from the primary system, where they would otherwise consume valuable resources, while remaining accessible from the secondary Cohesity system.  This process also takes the backup workflow completely out of band from the primary production network, significantly lowering the impact of the backup process on the primary server and storage networks.  Cohesity has integration planned soon with many more popular primary storage systems, so stay tuned for more!
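Conceptually, the workflow looks something like the sketch below.  Every object and method name here is a hypothetical placeholder I made up for illustration, not Cohesity's or any array vendor's actual API; it only shows the sequence of steps.

```python
# Conceptual sketch of a storage-snapshot-integrated backup job.
# All names below are hypothetical placeholders, not a real API.

def snapshot_integrated_backup(array, backup_cluster, volume, keep_on_array=2):
    # 1. Ask the primary array for a near-instant snapshot (no data movement).
    snap = array.create_snapshot(volume)

    # 2. Ask the array which blocks changed since the last protected snapshot,
    #    then copy only those blocks to the backup cluster, out of band from
    #    the production server and storage networks.
    last_snap = backup_cluster.last_protected_snapshot(volume)
    changed = array.changed_blocks(volume, since=last_snap, until=snap)
    backup_cluster.ingest(volume, snap, changed)

    # 3. Prune older snapshots from the primary array; long-term restore
    #    points now live on the separate backup platform instead of consuming
    #    capacity on production storage.
    for old in array.list_snapshots(volume)[:-keep_on_array]:
        array.delete_snapshot(old)
```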

I will mercifully summarize – if there are any workloads today that are only protected with snapshots and replication, I would recommend augmenting that protection with a true backup.  Now, as a reward for reading all the way to the bottom and eating your vegetables, please enjoy the greatest song ever made:

Greener Pastures

After 5 ½ years at Nimble Storage, I recently made the difficult decision to leave for greener pastures.  As of today, my stellar account executive Bryan and I are Cohesians!  I wish my friends back at HPE | Nimble all the best.  I am grateful to you all for the experience, and I look forward to watching as you duel it out in the Coliseum that is the primary storage marketplace – this time from the stands.  I've got my popcorn ready!

Why Cohesity

I hope it goes without saying that I wanted to work for a company that shares my professional values: winning the right way, customer focus, and something our founder, Mohit Aron, says: “Stay humble and keep learning.”  That statement is pretty humble coming from the lead architect behind the Google File System (GFS), who also founded one of technology's most successful startups in recent years, Nutanix.

Of course, I also needed to work for a company whose technology I believe truly shines – something that can actually help organizations reach their objectives more easily and faster.  I joined Cohesity because I firmly believe that the simplest solution always has the lowest TCO, and managing data has to get radically simpler.

The Problem with Today’s Storage Offerings

The primary storage market has undergone a serious transformation in the past 5 years, thanks largely to NAND flash, which is THE game changer in primary storage today.  The transition is well under way and primary storage providers offer terrific choices for customers looking to upgrade their old primary production storage systems with a flashy new storage array or hyperconverged appliances.

Are newer storage systems faster?  You bet!  More efficient?  Certainly.  Simpler?  Like, completely eliminating islands of backup/file/object/cloud storage?  Eh, not really.  While primary storage today is faster and more efficient than ever before, I noticed that many of my customers' most insistent demands were not being addressed.  These needs include (but are not restricted to):

  • Comprehensive consolidation of data silos
  • Modernized data protection and ransomware strategy
  • Improved operational visibility (think dark data)
  • App mobility (site to site, site to cloud, or cloud to cloud)
  • Ability to scale up/out
  • Tech refresh & lifecycle management

A truly simple system for managing data would address most of these needs, but the problem is that massive data fragmentation has led to dark silos of fractured infrastructure that are vulnerable to threats, immobile, inefficient, and nearly impossible to extract any value from.  No commercially available platform has really been able to address all of these needs – at least not until now.

We are entering a new era where data growth is exponential, and merely adopting new storage media and protocols has done very little to solve these newer fragmentation, mobility, and visibility difficulties.  Historically, attempting to address these large-scale customer needs with a single point service, vendor, or application is like strapping another horse to your buggy…  It might go a little faster, but it's more complex and it will never be a car.

Back it Up, Back it Up – Beep, Beep, Beep

Take, for example, data protection.  Let's say your company puts out an RFP for a complete backup solution.  One particular vendor offers data protection software for backup and DR.  After a nice demo, this starts to sound pretty good, but upon further examination, a complete solution would require a server OS to run the software (like Windows), a general-purpose file system (like NTFS), and a disk appliance sized to your best projections for several years.
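For a sense of what “sized to your best projections” actually involves, here's a back-of-napkin calculation.  Every number in it is an assumption I picked for illustration, not guidance from any vendor:

```python
# Back-of-napkin backup appliance sizing.  Every number is an assumption
# chosen for this example, not a recommendation from any vendor.

front_end_tb   = 100      # protected (source) data today
annual_growth  = 0.25     # assumed 25% yearly data growth
years          = 3        # appliance sized on a 3-year projection
daily_change   = 0.03     # assumed 3% daily change rate
retention_days = 30       # assumed 30 daily restore points kept
dedupe_ratio   = 5.0      # assumed reduction from dedupe/compression

projected_tb = front_end_tb * (1 + annual_growth) ** years
logical_tb   = projected_tb * (1 + daily_change * retention_days)
usable_tb    = logical_tb / dedupe_ratio

print(f"Projected front-end data: {projected_tb:.0f} TB")   # ~195 TB
print(f"Logical backup data:      {logical_tb:.0f} TB")     # ~371 TB
print(f"Usable appliance size:    {usable_tb:.0f} TB")      # ~74 TB
```

Guess the growth or change rate wrong and the appliance is either oversized on day one or out of capacity in year two, which is exactly the projection game this RFP forces you to play.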

Even within just this one area of secondary data, we already have fragmentation of GUIs, vendor relationships, support contracts, and so on.  Even worse, at large scale this type of traditional backup architecture will require multiple proxy servers and disk silos to spread the load, further amplifying the fragmentation for larger enterprise organizations.

While at first this backup software looked promising for addressing a critical data protection need, the solution now looks far too complex and limited.  This type of typical backup solution does nothing to collapse other silos of storage such as file/object, test/dev, and analytical workloads, and prolonged exposure will give you… Confusion!

Managing data is way too complex.  As my peer Dimitris says in his excellent blog post here, “storage should be easy to consume.”  I wholeheartedly agree with his thesis statement.  I believe what Cohesity offers is something fundamentally new.

After a few of my most-respected peers moved to Cohesity, I had to rub the magic lamp.  Out popped a blue Robin Williams and Poof, I’m here!  Now let me introduce you to our enchanted potion (ok, I’m done with magic jokes.)

Cohesity addresses these needs with a platform we suitably named DataPlatform®.  Cohesity DataPlatform® is a scale-out solution that consolidates all your secondary workloads, including backup and recovery, files and objects, test/dev, and analytics, in a single, cloud-native solution.  I look forward to telling you all about what DataPlatform® does and how it does it in follow-on posts, but for now, just know that it's really cool.

If you read this far, thank you!  Bryan and I plan to have many more posts and pics as we tour the mid-south spreading green goodness and making all our customers' data dreams come true, so stay tuned, and as always, #GoGreen!

What is Secondary Storage?

Why does the universe exist?  Do we have free will?  Coke or Pepsi?  What is secondary storage?

These are a few of the tough questions we at StorageSumo are here to answer for you.  As discussed in my first post (Greener Pastures), @BTNimble and I have recently gone green and are now working for Cohesity, a provider of secondary storage solutions.  Secondary stuff is awesome!  If you're like me, you might have a secondary fridge, a secondary car, and even a secondary child.  We all know secondary stuff is great, but what about secondary storage?

I suppose the easiest (and probably laziest) definition is, “Everything that is not primary storage.”  Primary storage represents approximately 20% of overall data center capacity.  At our previous job, Bryan and I provided high-performance primary storage.  We were also peers with similar vendors offering storage arrays and hyperconverged appliances for customers looking to store the primary production data and applications that make up this 20%.

Primary Apps?  Like Candy Crush?

Not exactly.  These applications are typically systems of record.  In every organization I have consulted with, there is invariably a database containing critical information such as customer records, order management, student enrollment, and so on.  These systems of record are almost always the most vital assets the organization possesses and are serviced with maximum care.  These workloads are very performance-sensitive, and if they aren't snappy, the organization suffers tremendously: lost productivity, possibly even lost availability, and, worse, grumpy employees standing over your desk asking if you tried turning it off and on again.

If ANY impediment to data delivery of this type occurs, the organization suffers.  If these records were to somehow vanish, the organization might just as well not exist.  THIS is obviously primary data meant for primary storage, because it is essential and it carries a strict performance SLA (service-level agreement).

Backup and Archival

Now we're getting to some secondary stuff!  For many organizations, the largest amount of storage capacity is dedicated to backup and archival retention of data.  This gives IT a time machine for recovering from any incident that might impede the organization's data availability or integrity.  These incidents could include (but are not limited to):

  • Physical hardware failure
  • Data corruption
  • Human error
  • Ransomware attacks
  • Sabotage
  • Lost or misplaced files
  • Site-Disasters (fire/theft/flood/Chernobyl)

If none of these things have ever happened to your organization, you are very lucky and I hate you.  More likely, multiple variations of these incidents have happened to your organization multiple times.  In the best-case scenario, your IT staff was prepared and able to recover quickly.  Either way, we'll look closer at this dark little corner in another post, but for now, just know that backups require separate storage – what's known as “air-gapped” from the primary storage system – and lots of it.

While backups and archives are incredibly important, by their very nature they are a second (or more) copy of the primary storage.  This is by design, so that there is physical separation from the primary data, a so-called “air-gap.”  In fact, a good backup strategy will include multiple copies, both for multiple recovery points (versioning) and for offsite copies that preserve data in the event of a site disaster.

For the purpose of defining secondary storage, remember that backups are not SLA-driven in the same way primary data is.  Sure, it's vital that the backups are complete, reliable, and finished within a certain window of time, but this type of workload is more throughput-dependent than latency-sensitive.  In other words, backup jobs are sort of like moving gravel with dump trucks.  There's a lot of stuff to move, and it is important stuff, but it wouldn't make sense to use a Lamborghini and no one will complain if it takes a little longer to get there.
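Here's the dump-truck math in rough form; the numbers are assumptions picked purely for the example:

```python
# Why backup is a throughput problem, not a latency problem.
# Both numbers are assumptions for illustration.

data_to_move_tb = 50     # assumed nightly backup set
throughput_gbs  = 2.0    # assumed sustained end-to-end gigabytes per second

hours = (data_to_move_tb * 1000) / throughput_gbs / 3600
print(f"Backup window: {hours:.1f} hours")   # ~6.9 hours

# Shaving a millisecond off each I/O barely changes this; doubling sustained
# throughput cuts the window in half.
```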

Other Examples

We’ve established that backups are one clear way we make copies of primary storage, an obvious use of secondary storage.  What else is there?  Well, there are actually tons of use cases.  Take testing and development, or “Test/Dev” as the cool kids say.

To paraphrase Murphy's law, what could go wrong usually does.  This is why organizations prefer to test changes, such as updates or upgrades, in a “safe zone.”  IT often maintains a near-duplicate environment of the primary data and applications in a test/dev silo.  In many cases, organizations will try to repurpose older gear to save money, but this still often means nearly doubling the infrastructure to buy and manage.

But what about my PowerPoint proposal for television in the men’s restroom?

Don't worry.  That and the rest of your documents are important too.  Interestingly, secondary data also includes users' data such as documents, spreadsheets, presentation files, and even pictures or videos.  This “bulk” data is typically stored on a network-attached storage (NAS) device so that user files can be efficiently and securely managed centrally, and so they persist even when a desktop is replaced or an employee quits.  This file/NAS storage is also critical to the organization, and users depend on it to get work done, but the data does not have a strict performance requirement.  File/NAS storage is another excellent example of secondary storage.

Why Cohesity is built for Secondary Storage

Secondary data is actually many, MANY times larger than primary data in capacity.  By even conservative estimates, secondary data comprises more than 80% of overall data capacity.  Storage in the data center actually maps nicely to an iceberg: 80% or more is below the surface.  Most IT leaders will admit that they have 6-8 copies of their data for various reasons.  Primary storage is for apps with strict performance SLAs, and secondary storage is for everything without them.
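A quick sanity check on that iceberg claim, using assumed numbers:

```python
# Quick sanity check on the iceberg ratio.  Both inputs are assumptions.
primary_tb = 200                    # assumed primary (production) capacity
copies     = 6                      # low end of the typical 6-8 copies

secondary_tb = primary_tb * copies  # backups, clones, archives, file shares...
total_tb     = primary_tb + secondary_tb

print(f"Secondary share of total capacity: {secondary_tb / total_tb:.0%}")  # ~86%
```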

Mass Data Fragmentation

Cohesity was founded to solve the problem of mass data fragmentation.  While primary storage vendors have made tremendous progress consolidating that 20% of data center workloads, Cohesity is purpose-built to consolidate the much larger 80% of data that is considered secondary.  This 80% is typically scattered across multiple siloed environments.  By consolidating backups, archival data, file shares, test/dev, and analytics onto a single web-scale platform, Cohesity customers are able to reduce physical data copies while collapsing vendors, support renewals, and management interfaces down to one.  This makes managing data radically simpler, and I firmly believe that the simplest solution always has the lowest total cost.

There you have it!  Next post, we’ll tackle another one of life’s deep mysteries but for now we can close the books on this one.  Stay tuned for more storage ramblings as well as details around Cohesity’s DataPlatform®.  Until then, au revoir!