Storage hierarchies just aren’t what they used to be.
In the good old days, you stored files on hard disk or magnetic tape. And given the huge disparity in price and performance between those two media, it was pretty obvious which of your files went where. But with the advent of flash storage devices, and now storage-class memory (SCM), figuring out where your data should reside at any given moment has become a lot more challenging.
That quandary was the central topic of our interview with Panasas software architect Curtis Anderson at last week's Next I/O Platform event. And as Anderson explained, moving from two to four storage layers is not just a matter of doubling up on the hierarchy. With the diversity introduced by these new storage layers, it's difficult to figure out when to promote your files to the faster tiers or demote them to the slower ones.
“It used to be that an LRU [least recently used] algorithm was the best way to manage storage promotion and demotions,” Anderson explained. “Now the industry is moving toward predictive analytics that attempts to forecast when data is going to get hotter or colder.”
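To make the contrast concrete, here is a minimal sketch of the LRU approach Anderson describes: track file accesses in order, and when the fast tier overflows, demote whichever files have gone longest without being touched. The class name, file names, and capacity measured in file count (rather than bytes) are all simplifications for illustration, not anything from Panasas.

```python
from collections import OrderedDict

class LRUTier:
    """Illustrative sketch of LRU-driven demotion from a fast storage tier.
    Capacity is counted in files here purely to keep the example small."""

    def __init__(self, capacity):
        self.capacity = capacity      # max number of files the fast tier holds
        self.files = OrderedDict()    # file name -> size, oldest access first

    def touch(self, name, size=1):
        # Record an access, making the file most-recently-used.
        if name in self.files:
            self.files.move_to_end(name)
        else:
            self.files[name] = size

    def demotion_candidates(self):
        # Files beyond capacity, least-recently-used first.
        overflow = len(self.files) - self.capacity
        return list(self.files)[:max(overflow, 0)]

tier = LRUTier(capacity=2)
for name in ["a.dat", "b.dat", "c.dat"]:
    tier.touch(name)
tier.touch("a.dat")                   # "a.dat" becomes hottest again
print(tier.demotion_candidates())     # -> ['b.dat']
```

The predictive approach Anderson points to would replace `demotion_candidates` with a forecast of future heat rather than a lookback at past accesses.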
Of course, that assumes you have a specific file management policy in place to drive those analytics. But according to Anderson, coming up with a policy has become too complex for most sites. From his perspective at Panasas, customers often have only a general idea of the storage demands of their workloads ("It's not going fast enough!"), so devising file management rules around storage tiering is all but impossible. In that case, they tend to rely on their storage provider for guidance. But as even Anderson admits, coming up with a single policy for a site running many different types of applications involves a fair amount of guesswork, no matter who does it.
The diversity of these new storage devices also adds to the challenges. For example, a SATA flash device is optimized for capacity, while an NVM-Express device is optimized for performance. That suggests any policies devised around this flash layer need to take these different attributes into account.
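A device-aware policy of the kind described above might, in the simplest case, route a file by matching its profile to what each device class is optimized for. The thresholds and tier names below are made up for illustration; a real policy would be far more nuanced.

```python
# Hypothetical tiering policy sketch: route a file to the device class
# whose strength (performance vs. capacity) matches the file's profile.
# Thresholds are invented for illustration only.

def place_file(size_gb: float, accesses_per_day: int) -> str:
    """Return a tier name for a file based on its heat and size."""
    if accesses_per_day > 100:
        return "nvme"        # performance-optimized NVM-Express flash
    if size_gb > 10:
        return "sata-flash"  # capacity-optimized SATA flash
    return "hdd"             # cold bulk storage

print(place_file(0.5, 500))  # hot small file -> nvme
print(place_file(50, 2))     # large cool file -> sata-flash
```

Even this toy version shows why a single flash "layer" is really two: the same file lands in different places depending on whether the policy weighs performance or capacity.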
Storage-class memory, like Intel’s Optane Persistent Memory, has a different set of challenges. For one thing, SCM is too new for either customers or vendors to have figured out how best to use it. “People are slowly starting to figure out that storage-class memory being on the memory bus is just a different beast than block-oriented storage devices of the past,” said Anderson.
The additional complication here is that Optane Persistent Memory can be configured either as a block storage device or as a main memory extender. In the latter case, the device is no longer part of the storage system at all, so any policy devised around it will have to be aware of its split personality.
Anderson noted that the culture around storage is changing as well, at least in the HPC arena in which he resides. One illustration of that is the changing nature of application developers. In the past, traditional HPC programmers had a good understanding of how their application’s performance could be sped up by a parallel file system, so they wrote their software specifically with that in mind. However, the new breed of application developers, in areas like AI and genomics, have only a vague notion of how their codes impact the file system and storage hardware. They fully expect the low-level I/O libraries and storage system software to work out the details. As a result, said Anderson, “we’re not getting help from the application writer any longer.”
Another cultural change is the disappearance of HPC storage administrators. Since specialized skills are required to manage such demanding environments, these individuals have always been rare and in demand. Anderson estimates that such a person would cost an organization about $200,000 per year (taking into account benefits and other overhead costs). A staff of just five of them would run a cool million. “So you have a $3 million storage subsystem and you’re spending $1 million a year on it,” said Anderson. “The economics are just different than you would expect in an enterprise datacenter, let alone a cloud.”
That suggests that high performance storage systems themselves will have to become self-monitoring and perform some level of self-maintenance. While that might not apply to a storage setup for a cutting-edge supercomputer at a national lab, for everyone else, the expectation will be that these systems will need to be a lot more autonomous. “That’s one of the things I think where HPC storage is going to have to get a lot better at,” said Anderson.