Humans are quickly being outnumbered by Internet-connected devices that are constantly collecting and transmitting data. The term used to describe this is the Internet of Things. Regardless of how you feel about it, the explosion in machine-generated data is changing storage and data protection forever.
These machines -- or things -- perform a range of tasks, from relatively simple functions like capturing images and uploading them to social sharing sites to capturing more complicated sensor data and sending real-time information on an organization's various assets. Thanks to analytics, businesses now want the ability to, say, compare the current condition of their assets with their condition five years ago.
The impact on storage at first seems fairly obvious: There is more data to store. The less obvious part is that machine-generated data comes in two distinct types, creating two entirely different challenges. First, there is large-file data, such as images and videos captured from smartphones and other devices. This data type is typically accessed sequentially. The second data type is very small: for example, log-file data captured from sensors. These files, while small in size, can number in the billions and must be accessed randomly.
It used to be that a datacenter would have only one of these data types: It was either in the business of capturing image-based data or it was not. Now, however, datacenters must deal with both data types, and the two usually require different storage systems -- one designed for large-file sequential I/O and the other for small-file random I/O.
Historically, image-based data has typically been placed on large-capacity NAS systems, but we are seeing a shift to object-based storage, especially at scale. Sensor data, usually stored on high-performance NAS systems, is moving to all-flash arrays, primarily to allow faster analytics.
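To make the split concrete, here is a minimal sketch of how incoming data might be routed to one of the two kinds of storage by file size. The 1 MiB cutoff, the tier labels, and the names IncomingFile and choose_tier are illustrative assumptions, not details of any particular product; a real system would likely also weigh access pattern and retention needs.

```python
from dataclasses import dataclass

# Assumption: 1 MiB is an illustrative cutoff, not a recommended value.
LARGE_FILE_THRESHOLD_BYTES = 1024 * 1024


@dataclass
class IncomingFile:
    """Hypothetical description of a file arriving from a device."""
    name: str
    size_bytes: int


def choose_tier(incoming: IncomingFile) -> str:
    """Return a hypothetical storage tier label based on file size alone."""
    if incoming.size_bytes >= LARGE_FILE_THRESHOLD_BYTES:
        # Large files (images, video): sequential I/O, capacity matters most.
        return "object-store"
    # Small files (sensor logs): random I/O, latency matters most.
    return "all-flash"


if __name__ == "__main__":
    for f in (IncomingFile("site-camera.mp4", 50 * 1024 * 1024),
              IncomingFile("soil-sensor.log", 512)):
        print(f.name, "->", choose_tier(f))
```

Size is only a rough proxy here; the point is that one ingest path can feed two very different back ends.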
Data protection implications
The data generated from the Internet of Things also has a big impact on data protection. Most, if not all, of this data can never be recreated -- an image or soil sample from last year, for example, can never be captured again as it was on the day it was collected. Therefore, data protection is potentially even more critical than it is for more conventional data. The challenges such data brings to storage also impact data protection -- especially when dealing with sensor data, as most backup applications don't handle billions of files well.
The answer may be to not back this data up at all, but instead integrate data protection into the archive process. An ideal approach might involve a tape-integrated NAS solution that has a disk or flash cache in front of a large tape library. As data lands on the disk cache, copies can be made to multiple tape devices. This provides high-performance access for analytics processing, but also excellent protection and long-term retention.
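The flow described above can be sketched as a toy in-memory model: a write lands in the cache and is copied to every tape, and a read is served from the cache when possible and recalled from tape otherwise. The class TapeIntegratedStore and its methods are hypothetical names for illustration; dictionaries stand in for the disk cache and tape devices, and real products handle the tape copies asynchronously and with far more care.

```python
from typing import Dict, List


class TapeIntegratedStore:
    """Toy model of a disk/flash cache in front of several tape copies."""

    def __init__(self, tape_count: int = 2) -> None:
        # Assumption: dicts stand in for the cache and each tape device.
        self.cache: Dict[str, bytes] = {}
        self.tapes: List[Dict[str, bytes]] = [{} for _ in range(tape_count)]

    def write(self, name: str, data: bytes) -> None:
        """Land data on the cache, then copy it to every tape."""
        self.cache[name] = data
        for tape in self.tapes:
            tape[name] = data

    def evict(self, name: str) -> None:
        """Free cache space; the tape copies remain as the protected archive."""
        self.cache.pop(name, None)

    def read(self, name: str) -> bytes:
        """Serve from cache if present, otherwise recall from the first tape that has it."""
        if name in self.cache:
            return self.cache[name]
        for tape in self.tapes:
            if name in tape:
                data = tape[name]
                self.cache[name] = data  # re-stage for later analytics reads
                return data
        raise KeyError(name)


if __name__ == "__main__":
    store = TapeIntegratedStore(tape_count=2)
    store.write("soil-2013.log", b"ph=6.8")
    store.evict("soil-2013.log")
    print(store.read("soil-2013.log"))
```

Because every write already exists on more than one tape, the archive itself doubles as the protected copy, which is why a separate backup pass over billions of files may not be needed.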
One pattern I have consistently observed among IT managers who deal with the Internet of Things is how quickly data transforms from merely interesting to mission-critical and in need of analysis, retention, and protection. The storage systems for these initiatives almost always start out ad hoc and then become a focal point. If you have sensors, or things, that are creating data, keep an eye on that data now. Protect it and be prepared for it to become more important to the organization.