03/05/2014 9:28 AM

Internet Of Things: What About Data Storage?

The Internet of Things presents a new set of data storage and protection challenges. Here's how to tackle those issues.

Humans are quickly being outnumbered by Internet-connected devices that are constantly collecting and transmitting data. The term used to describe this is the Internet of Things. Regardless of how you feel about it, the explosion in machine-generated data is changing storage and data protection forever.

These machines -- or things -- perform a range of tasks, from relatively simple functions like capturing images and uploading them to social sharing sites, to capturing more complicated sensor data and transmitting real-time information on an organization's various assets. Thanks to analytics, businesses now want the ability to, say, compare the current condition of their assets with their condition five years ago.

Storage implications
The impact on storage at first seems fairly obvious: There is more data to store. The less obvious part is that machine-generated data comes in two distinct types, creating two entirely different challenges. First, there is large-file data, such as images and videos captured from smartphones and other devices. This data type is typically accessed sequentially. The second data type consists of very small files -- for example, log data captured from sensors. While each file is tiny, these sensors can create billions of files that must be accessed randomly.
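The split between the two workload profiles above can be sketched as a simple routing decision. This is a minimal illustration, not a real storage policy: the size threshold, file extensions, and tier names are assumptions chosen for the example.

```python
# Sketch: routing machine-generated files to a workload profile.
# The 10 MB cutoff and the extension list are illustrative assumptions.

LARGE_FILE_THRESHOLD = 10 * 1024 * 1024  # 10 MB, hypothetical cutoff

def classify_workload(filename: str, size_bytes: int) -> str:
    """Return the storage profile a file likely belongs to."""
    media_exts = (".jpg", ".png", ".mp4", ".mov")
    if filename.lower().endswith(media_exts) or size_bytes >= LARGE_FILE_THRESHOLD:
        return "large-file-sequential"   # images/video: sequential I/O
    return "small-file-random"           # sensor/log data: random I/O

print(classify_workload("drone_capture.mp4", 250_000_000))  # large-file-sequential
print(classify_workload("sensor_0042.log", 2_048))          # small-file-random
```

In practice this decision is made by the storage system itself, but the point stands: the two profiles end up on different systems tuned for different I/O patterns.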


It used to be that a datacenter would have only one of these data types: They were either in the business of capturing image-based data or they were not. Now, however, datacenters must deal with both data types, and the two usually require different storage systems -- one designed for large-file sequential I/O and the other for small-file random I/O.

Historically, image-based data has typically been placed on large-capacity NAS systems, but we are seeing a shift to object-based storage, especially at scale. Sensor data, usually stored on high-performance NAS systems, is moving to all-flash arrays, primarily to allow faster analytics.

Data protection implications
The data generated from the Internet of Things also has a big impact on data protection. Most, if not all, of this data can never be recreated -- an image and soil sample from last year, for example, will never be the same as they were on the day they were collected. Therefore, data protection is potentially even more critical than it is for more conventional data. The challenges such data brings to storage also impact data protection -- especially when dealing with sensor data, as most backup applications don't handle billions of files well.

The answer may be to not back this data up at all, but instead integrate data protection into the archive process. An ideal approach might involve a tape-integrated NAS solution that has a disk or flash cache in front of a large tape library. As data lands on the disk cache, copies can be made to multiple tape devices. This provides high-performance access for analytics processing, as well as excellent protection and long-term retention.
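The "protect on landing" flow above can be sketched in a few lines. This is purely illustrative: a real tape-integrated NAS performs the fan-out behind the NAS mount, and the directory-based "tape targets" here are stand-ins for actual tape devices.

```python
# Sketch: stage a file on the fast cache tier, then fan copies out to
# multiple tape targets. Paths and the copy mechanism are assumptions;
# real systems do this inside the storage appliance.

import shutil
from pathlib import Path

def land_and_protect(src: Path, cache_dir: Path, tape_dirs: list[Path]) -> Path:
    """Stage a file on the disk/flash cache, then copy it to each tape target."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached = cache_dir / src.name
    shutil.copy2(src, cached)          # fast tier: analytics read from here
    for tape in tape_dirs:             # protection tier: one copy per library
        tape.mkdir(parents=True, exist_ok=True)
        shutil.copy2(cached, tape / src.name)
    return cached
```

The key design point is that protection happens as a side effect of ingest, so there is no separate backup job walking billions of files after the fact.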

One consistent theme I have heard from IT managers who deal with the Internet of Things is how quickly data transforms from merely interesting to mission-critical and in need of analysis, retention, and protection. The storage systems for these initiatives almost always start out ad hoc and then become a focal point. If you have sensors, or things, that are creating data, keep an eye on that data now. Protect it and be prepared for it to become more important to the organization.



Selective Storage

Great point.  Is it possible that as the data science field becomes more defined, we will have a better grasp on which pieces of data are valuable (signal) and which are not (noise)?  I would imagine that as that becomes more defined we can be more selective with regards to which pieces of data we keep and which we discard.

Either way, additional storage options are needed.  Great article, thanks.

Re: Selective Storage

I absolutely agree -- the amount of data expected to be pulled from these devices will no doubt need a strong storage solution with redundancy. But as data stores (and pools) get bigger, is tape really the best option? What about cloud storage with full redundancy? The amount of processing power needed to actually dig through these files would no doubt call for full-on data mining/analysis capability, which may be more efficient when you integrate the storage into the actual processing.