Why does the universe exist? Do we have free will? Coke or Pepsi? What is secondary storage?
These are a few of the tough questions we at StorageSumo are here to answer for you. As discussed in my first post (Greener Pastures), @BTNimble and I have recently gone green and we are now working for Cohesity, a provider of secondary storage solutions. Secondary stuff is awesome! If you’re like me, you might have a secondary fridge, a secondary car, and even a secondary child. We all know secondary stuff is great, but what about secondary storage?
I suppose the easiest (and probably laziest) definition is, “Everything that is not primary storage.” Primary storage represents approximately 20% of the overall data center capacity. At our previous job, Bryan and I provided high-performance primary storage. We were also peers with similar vendors that offered storage and hyperconverged appliances for customers looking to store primary production data and applications that make up this 20%.
Primary Apps? Like Candy Crush?
Not exactly. These applications are typically systems of record. In every organization I have consulted with, there is invariably a database containing critical information such as customer records, order management, student enrollment and so on. These systems of record are almost always the most vital assets the organization possesses and are serviced with maximum care. These workloads are very performance-sensitive, and if they aren’t snappy, the organization suffers tremendously: lost productivity, possibly even lost availability, and worse, grumpy employees standing over your desk asking if you’ve tried turning it off and on again.
If ANY impediment to delivering this data occurs, the organization suffers. If these records were to somehow vanish, the organization might just as well not exist. THIS is obviously primary data meant for primary storage, not only because it is essential but also because it carries a strict performance SLA (service level agreement).
Backup and Archival
Now we’re getting to some secondary stuff! For many organizations, the largest share of storage capacity is dedicated to backup and archival retention of data. This gives IT a time machine to recover from any incident that might impede the organization’s data availability or integrity. These incidents could include (but are not limited to):
- Physical hardware failure
- Data corruption
- Human error
- Ransomware attacks
- Lost or misplaced files
- Site-Disasters (fire/theft/flood/Chernobyl)
If none of these things have ever happened to your organization, you are very lucky and I hate you. More likely, multiple variations of these incidents have happened to your organization multiple times. In the best-case scenario, your IT staff was prepared and able to recover quickly. Either way, we’ll take a closer look at this dark little corner in another post, but for now, just know that backups require secondary storage that is “air-gapped” from the primary storage system – and lots of it.
While backups and archives are incredibly important, by their very nature they are a second (or additional) copy of the primary data. This is by design, so that there is physical separation from the primary data, a so-called “air gap.” In fact, a good backup strategy will include multiple copies: multiple recovery points (versioning), plus offsite copies to preserve data in the event of a site disaster.
For the purpose of defining secondary storage, remember that backups are not SLA-driven in the same way primary data is. Sure, it’s vital that the backups are complete, reliable and finished within a certain window of time, but this type of workload is more throughput-dependent than latency-sensitive. In other words, backup jobs are sort of like moving gravel with dump trucks. There’s a lot of stuff to move and it is important stuff, but it wouldn’t make sense to use a Lamborghini, and no one will complain if it takes a little longer to get there.
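To put the dump-truck analogy in numbers, here is a minimal back-of-the-envelope sketch (all figures are hypothetical) of the sustained throughput a backup window implies:

```python
# Hypothetical example: backup jobs care about sustained throughput,
# not latency. How fast must the "dump trucks" move to back up
# 50 TB inside an 8-hour window?
def required_throughput_gbps(data_tb: float, window_hours: float) -> float:
    """Return the sustained throughput (GB/s) needed to move data_tb
    terabytes within window_hours hours."""
    data_gb = data_tb * 1000          # TB -> GB (decimal units)
    window_s = window_hours * 3600    # hours -> seconds
    return data_gb / window_s

print(round(required_throughput_gbps(50, 8), 2))  # ~1.74 GB/s sustained
```

The takeaway: the number that matters is aggregate GB/s across the window, not the sub-millisecond response time a primary database would demand.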
We’ve established that backups are one clear way we make copies of primary storage, an obvious use of secondary storage. What else is there? Well, there are actually tons of use cases. Take testing and development, or “Test/Dev” as the cool kids say.
To paraphrase Murphy’s law, whatever can go wrong usually does. This is why organizations prefer to test changes, such as updates or upgrades, in a “safe zone.” IT often maintains a duplicate (or similar) environment of its primary data and applications in a test/dev silo. Organizations will often try to repurpose older gear to save money, but even so, test/dev effectively doubles infrastructure costs.
But what about my PowerPoint proposal for television in the men’s restroom?
Don’t worry. That and the rest of your documents are important too. Interestingly, secondary data also includes users’ data such as documents, spreadsheets, presentation files and even pictures or videos. This “bulk” data is typically stored on a network-attached storage (NAS) device, so that user files can be managed centrally, efficiently and securely, and persist even when a desktop is replaced or an employee quits. Users depend on this file/NAS storage to do their work, but it does not have a strict performance requirement. That makes file/NAS storage another excellent example of secondary storage.
Why Cohesity is built for Secondary Storage
Secondary data is actually many, MANY times larger than primary data capacity. By even conservative estimates, secondary data comprises >80% of overall data capacity. Storage in the data center maps nicely to an iceberg: 80% or more is below the surface. Most IT leaders will admit that they have 6-8 copies of their data for various reasons. Primary storage is for apps with strict performance SLAs; secondary storage is for apps without them.
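The 6-8 copies figure is enough to explain the iceberg on its own. A quick sketch of the arithmetic (the 100 TB primary figure is purely hypothetical):

```python
# Illustrative arithmetic (assumed figures): if an organization keeps
# 6-8 total copies of its data, secondary capacity quickly dwarfs primary.
def secondary_capacity_tb(primary_tb: float, total_copies: int) -> float:
    """Capacity consumed by all copies beyond the primary one."""
    return primary_tb * (total_copies - 1)

primary = 100  # TB of primary data (hypothetical)
for copies in (6, 8):
    print(copies, secondary_capacity_tb(primary, copies))
# 6 copies -> 500 TB secondary vs 100 TB primary (~83% of total)
# 8 copies -> 700 TB secondary vs 100 TB primary (~88% of total)
```

Even at the low end of the copy count, secondary data lands comfortably above the 80% mark cited above.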
Cohesity was founded to solve the problem of mass data fragmentation. While primary storage vendors have made tremendous progress consolidating that 20% of data center workloads, Cohesity is purpose-built to consolidate the much larger 80% of data that is considered secondary. This 80% is typically scattered across multiple siloed environments. By consolidating backups, archival data, file shares, test/dev and analytics into a single web-scale platform, Cohesity customers are able to reduce the number of physical data copies, vendors, support renewals and management interfaces down to one. This makes managing data radically simpler, and I firmly believe that the simplest solution always has the lowest total cost.
There you have it! Next post, we’ll tackle another one of life’s deep mysteries but for now we can close the books on this one. Stay tuned for more storage ramblings as well as details around Cohesity’s DataPlatform®. Until then, au revoir!