What is Secondary Storage?

Why does the universe exist?  Do we have free will?  Coke or Pepsi?  What is secondary storage?

These are a few of the tough questions we at StorageSumo are here to answer for you.  As discussed in my first post (Greener Pastures), @BTNimble and I have recently gone green and are now working for Cohesity, a provider of secondary storage solutions. Secondary stuff is awesome!  If you’re like me, you might have a secondary fridge, a secondary car, and even a secondary child.  We all know secondary stuff is great, but what about secondary storage?

I suppose the easiest (and probably laziest) definition is, “Everything that is not primary storage.”  Primary storage represents approximately 20% of the overall data center capacity.  At our previous job, Bryan and I provided high-performance primary storage.  We competed with similar vendors that offered storage arrays and hyperconverged appliances to customers who needed to store the primary production data and applications that make up this 20%.

Primary Apps?  Like Candy Crush?

Not exactly.  These applications are typically systems of record.  In every organization I have consulted with, there is invariably a database containing critical information such as customer records, order management, student enrollment and so on.  These systems of record are almost always the most vital assets the organization possesses and are serviced with maximum care.  These workloads are very performance-sensitive, and if they aren’t snappy, the organization suffers tremendously: lost productivity, possibly even lost availability, and worse, grumpy employees standing over your desk asking if you tried turning it off and on again.

If ANY delay in delivering this kind of data occurs, the organization suffers.  If these records were to somehow vanish, the organization might as well not exist.  THIS is obviously primary data meant for primary storage, not only because it is essential, but also because it has a strict performance SLA (service level agreement).

Backup and Archival

Now we’re getting to some secondary stuff!  For many organizations, the largest share of storage capacity is dedicated to backup and archival retention of data. This gives IT a time machine for recovering from any incident that might impede the organization’s data availability or integrity.  These incidents could include (but are not limited to):

  • Physical hardware failure
  • Data corruption
  • Human error
  • Ransomware attacks
  • Sabotage
  • Lost or misplaced files
  • Site disasters (fire/theft/flood/Chernobyl)

If none of these things has ever happened to your organization, you are very lucky and I hate you.  More likely, multiple variations of these incidents have happened to your organization multiple times.  In a best-case scenario, your IT staff was prepared and able to recover quickly.  Either way, we’ll take a closer look at this dark little corner in another post, but for now, just know that backups require separate storage, “air-gapped” from the primary storage system, and lots of it.

While backups and archives are incredibly important, by their very nature they are a second (or third, or fourth) copy of the primary data.  This is by design, so that there is physical separation from the primary data, a so-called “air-gap.”  In fact, a good backup strategy will include multiple copies, both to provide multiple recovery points (versioning) and to keep copies offsite that preserve data in the event of a site disaster.

For the purpose of defining secondary storage, remember that backups are not SLA-driven in the same way primary data is.  Sure, it’s vital that backups are complete, reliable and finished within a certain window of time, but this type of workload is more throughput-dependent than latency-sensitive.  In other words, backup jobs are sort of like moving gravel with dump trucks.  There’s a lot of stuff to move and it is important stuff, but it wouldn’t make sense to use a Lamborghini, and no one will complain if it takes a little longer to get there.
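To put the dump-truck analogy into numbers: the backup window depends mostly on how much data has to move and on the sustained throughput of the pipe, not on the latency of any single I/O. The sketch below is a simplified model that assumes a full backup streamed at a constant rate, using decimal units. The function name, the 50 TB dataset and the throughput figures are hypothetical; real jobs are also shaped by incremental backups, deduplication and source-side bottlenecks.

```python
def backup_window_hours(dataset_tb: float, throughput_mb_s: float) -> float:
    """Hours needed to stream a full backup of dataset_tb terabytes at a
    sustained throughput_mb_s megabytes per second (decimal units, constant rate)."""
    dataset_bytes = dataset_tb * 1e12
    bytes_per_second = throughput_mb_s * 1e6
    return dataset_bytes / bytes_per_second / 3600  # seconds -> hours


# Hypothetical example: the same 50 TB full backup over pipes of different widths.
for throughput in (500, 1000, 2000):  # MB/s
    print(f"{throughput:>5} MB/s -> {backup_window_hours(50, throughput):.1f} h")
```

Under these assumptions the window drops from roughly 27.8 hours to 13.9 and then 6.9 hours as throughput doubles, while per-I/O latency never enters the calculation. That is why a bigger truck matters more here than a faster car.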

Other Examples

We’ve established that backups are one clear way we make copies of primary storage, an obvious use of secondary storage.  What else is there?  Well, there are actually tons of use cases.  Take testing and development, or “Test/Dev” as the cool kids say.

To paraphrase Murphy’s law, what could go wrong usually does.  This is why organizations prefer to test changes, such as updates or upgrades, in a “safe zone.”  IT often maintains a duplicate of its primary data and applications in a test/dev silo.  In many cases, organizations try to save money by repurposing older gear, but a duplicate environment still often means doubling infrastructure costs.

But what about my PowerPoint proposal for television in the men’s restroom?

Don’t worry.  That and the rest of your documents are important too.  Interestingly, secondary data also includes users’ data such as documents, spreadsheets, presentation files and even pictures or videos.  This “bulk” data is typically stored on a network-attached storage (NAS) device, so that user files can be managed efficiently and securely in one central place and persist even when a desktop is replaced or an employee quits.  This file/NAS storage is critical, and applications and users depend on it to work, but this data does not have a strict performance requirement.  File/NAS storage is therefore another excellent example of secondary storage.

Why Cohesity is built for Secondary Storage

Secondary data capacity is actually many, MANY times larger than primary data capacity.  Even by conservative estimates, secondary data makes up more than 80% of overall data capacity.  Storage in the data center maps nicely to an iceberg: 80% or more is below the surface.  Most IT leaders will admit that they keep 6-8 copies of their data for various reasons. Primary storage is for apps with strict performance SLAs, and secondary storage is for apps without strict SLAs.
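A rough back-of-the-envelope check ties the 80% estimate to the 6-8 copies figure. Assuming, as a big simplification, that one copy lives on primary storage and every other copy is a full-size duplicate with no deduplication or compression, n total copies leave (n - 1)/n of capacity on secondary storage. The function name below is hypothetical and only illustrates that ratio.

```python
def secondary_share(total_copies: int) -> float:
    """Fraction of total capacity used by secondary copies, assuming one
    primary copy plus (total_copies - 1) full-size secondary copies and
    no deduplication or compression."""
    if total_copies < 1:
        raise ValueError("need at least the primary copy")
    return (total_copies - 1) / total_copies


for n in (5, 6, 8):
    print(f"{n} copies -> secondary is {secondary_share(n):.1%} of capacity, "
          f"{n - 1}x the primary footprint")
```

Under these assumptions, 5 copies give exactly 80% (secondary at 4x primary), while 6 to 8 copies land between about 83% and 87.5%, in line with the “more than 80%” estimate. Real ratios vary with how much of each copy is actually stored.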

Mass Data Fragmentation

Cohesity was founded to solve the problem of mass data fragmentation.  While primary storage vendors have made tremendous progress consolidating that 20% of data center workloads, Cohesity is purpose-built to consolidate the much larger 80% of data that is considered secondary.  This 80% is typically scattered across multiple siloed environments.  By consolidating backups, archival data, file shares, test/dev and analytics into a single web-scale platform, Cohesity customers are able to reduce the number of physical data copies and cut the number of vendors, support renewals and management interfaces down to one.  This makes managing data radically simpler, and I firmly believe that the simplest solution always has the lowest total cost.

There you have it!  In the next post, we’ll tackle another of life’s deep mysteries, but for now we can close the book on this one.  Stay tuned for more storage ramblings as well as details on Cohesity’s DataPlatform®.  Until then, au revoir!

Greener Pastures

After 5 ½ years at Nimble Storage, I recently made the difficult decision to leave for greener pastures.  As of today, my stellar account executive Bryan and I are Cohesians!  I wish my friends back at HPE | Nimble all the best.  I am grateful to you all for the experience, and I look forward to watching as you duke it out in the Coliseum that is the primary storage marketplace, this time from the stands.  I’ve got my popcorn ready!

Why Cohesity

I hope it goes without saying that I wanted to work for a company that shares my professional values around winning the right way, customer focus, and something our founder, Mohit Aron, says: “Stay humble and keep learning.”  That statement is pretty humble coming from the lead architect behind the Google File System (GFS), who also co-founded Nutanix, one of technology’s most successful startups in recent years.

Of course, I also needed to work for a company I believe has brilliant technology, something that can actually help organizations reach their objectives more easily and quickly.  I joined Cohesity because I firmly believe that the simplest solution always has the lowest total cost of ownership (TCO), and managing data has to get radically simpler.

The Problem with Today’s Storage Offerings

The primary storage market has undergone a serious transformation in the past 5 years, thanks largely to NAND flash, which is THE game changer in primary storage today.  The transition is well under way, and primary storage providers offer terrific choices for customers looking to replace their old primary production storage systems with a flashy new storage array or hyperconverged appliance.

Are newer storage systems faster?  You bet!  More efficient?  Certainly.  Simpler?  Like, completely eliminating islands of backup/file/object/cloud storage?  Eh, not really.  While primary storage today is faster and more efficient than ever before, I noticed that many of my customers’ most insistent demands were not being addressed.  These needs include (but are not restricted to):

  • Comprehensive consolidation of data silos
  • Modernized data protection and ransomware strategy
  • Improved operational visibility (think dark data)
  • App mobility (site → site, site → cloud or cloud → cloud)
  • Ability to scale up/out
  • Tech refresh & lifecycle management

A truly simple system for managing data storage would address most of these needs, but the problem is that mass data fragmentation has led to dark silos of fractured infrastructure that are vulnerable to threats, immobile, inefficient and impossible to extract any value from.  Until now, no commercially available platform could really address all of these needs.

We are entering a new era where data growth is exponential, and merely adopting new storage media and protocols has done very little to solve the newer problems of fragmentation, mobility and visibility.  Historically, attempting to address these large-scale customer needs with yet another single-purpose service, vendor or application has been like hitching another horse to your buggy.  It might go a little faster, but it’s more complex and it will never be a car.

Back it Up, Back it Up – Beep, Beep, Beep

Take data protection, for example.  Let’s say your company puts out an RFP for a complete backup solution.  One particular vendor offers data protection software for backup & DR.  After a nice demo, this starts to sound pretty good, but on closer examination, a complete solution would require a server OS to run the software (like Windows), a general-purpose file system (like NTFS), and a disk appliance sized to your best capacity projections for the next several years.

Even within this one area of secondary data, the example already shows fragmentation: multiple GUIs, vendor relationships, support contracts and so on.  Even worse, at large scale, this type of traditional backup architecture requires multiple proxy servers and disk silos to spread the load, further amplifying the fragmentation for larger enterprise organizations.

At first, this backup software looked like a promising answer to a critical need for data protection; now it looks far too complex and limited.  A typical backup solution like this does nothing to collapse other storage silos such as file/object, test/dev and analytics workloads, and prolonged exposure will give you… Confusion!

Managing data is way too complex.  As my peer Dimitris says in an excellent blog post, “storage should be easy to consume.”  I wholeheartedly agree with his thesis.  I believe what Cohesity offers is something fundamentally new.

After a few of my most respected peers moved to Cohesity, I had to rub the magic lamp.  Out popped a blue Robin Williams and, poof, I’m here!  Now let me introduce you to our enchanted potion (OK, I’m done with magic jokes).

Cohesity addresses these needs with a platform we suitably named DataPlatform®.  Cohesity DataPlatform® is a scale-out solution that consolidates all your secondary workloads, including backup and recovery, files and objects, test/dev and analytics, in a single, cloud-native solution.  I look forward to explaining what DataPlatform® does and how it does it in follow-on posts, but for now, just know that it’s really cool.

If you read this far, thank you! Bryan and I plan to share many more posts and pics as we tour the Mid-South spreading green goodness and making all our customers’ data dreams come true, so stay tuned, and as always, #GoGreen!