For years, the storage industry has impressed on IT departments the need for tiering to reduce costs and improve efficiency.
Since the introduction of the first enterprise flash drive (EFD) from EMC in January 2008, however, the continuing need for storage tiering has been open to question. Roughly 3% to 10% of an active data set (depending on the industry and the size of the company) is "hot."
Many companies have compensated for the relatively low performance of magnetic rotating hard disk drives (HDDs) -- compared with non-volatile memory -- by spreading data across an array of HDDs and using only the fastest outer tracks of each drive (a practice known as short-stroking). But this practice is expensive in many ways: not only is capacity wasted, but the extra HDDs also produce heat and consume power.
High-performance HDDs out, SATA drives in
EFDs, or solid state drives (SSDs), can reduce the cost per unit of performance (dollars per IOPS) by as much as 100 times. So introducing a small number of SSDs allows enterprises to recover the capacity lost to short-stroking.
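To make the dollars-per-IOPS argument concrete, here is a back-of-the-envelope sketch. The per-drive IOPS figures, the target workload, and the `drives_needed` helper are all illustrative assumptions (rough 2013-era ballpark numbers, not vendor specifications):

```python
import math

# Assumed, illustrative figures -- not vendor specs.
HDD_IOPS = 180      # one 15K RPM drive under random I/O (assumption)
SSD_IOPS = 20_000   # one enterprise flash drive (assumption)

def drives_needed(target_iops: int, per_drive_iops: int) -> int:
    """How many drives are required to hit a random-I/O target."""
    return math.ceil(target_iops / per_drive_iops)

target = 50_000  # hypothetical hot-workload requirement
print(drives_needed(target, HDD_IOPS))  # 278 short-stroked HDDs
print(drives_needed(target, SSD_IOPS))  # 3 SSDs
```

Even if each SSD costs many times more than an HDD, needing two orders of magnitude fewer spindles is where the cost-per-performance advantage comes from.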
But even more importantly, CIOs should consider abandoning the acquisition of high-performance drives altogether.
This idea is not so radical. Referring back to the earlier comment -- that 3% to 10% of active data is "hot" -- logically, the rest of the data is, at the very least, "not hot." For the most part, this "not hot" data (although not necessarily cold data) is used in a referential fashion by applications. In other words, there are significantly more reads than writes.
Additionally, applications generally read data from these "not hot" data sets in large chunks. A prime example is a reporting application that reads an entire day's transactions from a database to perform end-of-day reporting. Given the read-ahead capabilities of most disk drives today, and the way RAID algorithms generally lay data out on disks, many of these reads are very likely to be sequential.
This pattern -- large sequential reads with minimal writes -- lends itself to low-cost SATA drives, which brings the argument full circle. If a small percentage of "hot" data can be served by a minimal number of SSDs (the medium with the lowest dollars per IOPS), and the rest of the data can be served by SATA drives (the medium with the lowest dollars per gigabyte), it is very reasonable to conclude that, moving forward, an enterprise only ever needs two storage tiers.
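The two-tier split is easy to size once the hot fraction is known. This is a minimal sketch; the `two_tier_sizing` helper, the 100 TB data set, and the 5% hot fraction (the midpoint of the 3% to 10% range cited above) are all illustrative assumptions:

```python
def two_tier_sizing(total_gb: float, hot_fraction: float):
    """Split a data set into an SSD tier (hot data, lowest $/IOPS)
    and a SATA tier (everything else, lowest $/GB)."""
    ssd_gb = total_gb * hot_fraction
    sata_gb = total_gb - ssd_gb
    return ssd_gb, sata_gb

# Hypothetical example: a 100 TB data set with 5% hot data.
ssd_gb, sata_gb = two_tier_sizing(100_000, 0.05)
print(f"SSD tier:  {ssd_gb:,.0f} GB")   # 5,000 GB
print(f"SATA tier: {sata_gb:,.0f} GB")  # 95,000 GB
```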
Focus on service levels, not specific storage tech
Most users today who are considering storage tiering, or who are tiering already, are likely doing so manually. Automated tiering makes better sense. As noted above, only a very small percentage of data is actually suited to EFDs or SSDs. While there are valid reasons for "pinning," or persisting, an entire data set in flash memory (or main RAM), doing so somewhat defeats the point of tiering in the first place. All the leading storage vendors offer some form of automated tiering.
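The core idea behind automated tiering can be sketched in a few lines: count I/Os per extent over a window, promote the hottest extents to the SSD tier, and leave the rest on SATA. The `retier` function and the sample counts are hypothetical; real arrays use far more sophisticated policies (access-count decay, read/write weighting, migration throttling):

```python
from collections import Counter

def retier(io_counts: Counter, ssd_slots: int):
    """Return (ssd_extents, sata_extents): the ssd_slots hottest
    extents go to the SSD tier, everything else stays on SATA."""
    ranked = [extent for extent, _ in io_counts.most_common()]
    ssd = set(ranked[:ssd_slots])
    sata = set(io_counts) - ssd
    return ssd, sata

# Hypothetical window: extent "a" is hot, the rest are mostly idle.
counts = Counter({"a": 900, "b": 40, "c": 30, "d": 5})
ssd, sata = retier(counts, ssd_slots=1)
print(ssd)   # {'a'}
```

The point of automating this decision is that hotness shifts over time; a manual tiering plan is a snapshot, while a policy like the one above re-evaluates every window.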
Ultimately, IT departments need to focus on the service levels they deliver rather than on any specific technology. They also need to recognize that whatever tiering strategy they implement today is very likely to change tomorrow.
Consider the impact of mobile, social, cloud and big data. Will accessing file data remotely skew the tiering algorithm? Will the integration of cloud data (be it social data or big data) have an impact on what gets tiered?
In every case, the KISS (keep it simple, stupid) principle is your best friend. Like most technologies, tiering adds a level of complexity. The ultimate question is whether your organization needs that complexity and what the performance gain will be. If the gain is marginal, the effort may not be worth it.
Benjamin S. Woo is founder and managing director at Neuralytix Inc., a consultancy based in New York, N.Y. He was previously the program vice president at IDC's Worldwide Storage Systems Research. Write to him at email@example.com.
This was first published in June 2013