The digital universe--meaning all the digital data stored on any medium or device anywhere in the world--is growing by 7,600 petabytes every day and doubling in size every two years, according to a December 2012 study by IDC. Enterprises are responsible for about 80% of that data right now. By 2020, 40% of it will be generated by the approximately 200 billion refrigerators, closed-circuit cameras, power meters, sewage-pumping-station monitors and doorstops that will be connected to the Internet by then, fighting with 7.6 billion people for bandwidth and storage capacity.
In 2012, 2.8 zettabytes of data were created and stored somewhere, on something, according to the study. By 2020 the digital universe will swell to 40 zettabytes of data--a 50-fold increase between 2010 and 2020. How much data is that? Fifty-seven times the number of grains of sand on all the beaches on earth, according to IDC. (Makes you wonder who they got to count the sand.)
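If you're skeptical of numbers that big, they at least hang together. Here's a rough back-of-the-envelope check (my arithmetic, not IDC's) that the daily rate, the annual total and the doubling period are all telling roughly the same story:

```python
# Back-of-the-envelope check of the IDC figures (my arithmetic, not theirs).

# 7,600 petabytes per day, over a year, in zettabytes (1 ZB = 1,000,000 PB):
pb_per_day = 7_600
zb_per_year = pb_per_day * 365 / 1_000_000
print(round(zb_per_year, 2))   # 2.77 -- close to the study's 2.8 ZB for 2012

# "Doubling every two years" from 2.8 ZB in 2012 gives four doublings by 2020:
zb_2012 = 2.8
doublings = (2020 - 2012) / 2
zb_2020 = zb_2012 * 2 ** doublings
print(round(zb_2020, 1))       # 44.8 -- the same ballpark as IDC's 40 ZB
```

So the headline figures are internally consistent to within rounding; the sand grains you'll have to take on faith.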
IDC, which did the study on EMC's behalf, has a modular, easily digestible version of the report. There's also an executive summary if you prefer an overview. Would it have made this article easier to read if I'd included only one link, or summarized the data more succinctly, with enough interpretation that you could skip the portions least relevant to you and read only the parts that might be useful?
That's not how the digital universe works.
The digital universe just keeps expanding, as more of us copy badly compressed music files, LOLCat pics, old Reddit witticism chains and bits of information that may someday be useful onto our PCs and phones and tablets and refrigerators and thumb drives and (eventually, I imagine) actual thumbs.
Enterprises will do the same thing, at a much higher rate, as BYOD-equipped users copy sensitive, mission-critical bits of data onto their iThings so they can work effectively anywhere, unless there's something good on cable at the hotel or there's no Wi-Fi at the kids' baseball game.
Buried in Bytes
While EMC and other storage vendors might be pleased at the thought of all that data, there are consequences, and they won't wait until 2020 to show up.
For one, as the corpus datalecti grows larger, security becomes more problematic because enterprises will have to put more effort into applying, and enforcing, appropriate security policies on different tiers of data. It also means more operational difficulties for security teams that have to scan and monitor ever-increasing amounts of information flowing within and between corporate boundaries.
For another, as the amount of data increases, it will become more difficult to keep track of what data exists and where it is, let alone make sure it's all up to date, properly backed up, indexed and analyzed. In fact, large chunks of very important data will disappear, according to IDC analyst John Gantz. The data won't actually go away; pointers to it will simply be forgotten, corrupted or changed, leaving the data in place to take up space but stripping away its potential to be useful.
In the digital universe there are no rules of death and decay except those we impose ourselves. So far, even in well-ordered, heavily managed data centers, there is little incentive and almost no real desire to delete anything. It's too hard to predict what might turn out to contribute to a uniquely valuable insight in two or three or five years, when the analytics are good enough to crunch zettabytes.
In one sense, the amount of digital data being created and stored is a marvel—a sort of Stonehenge of information that evokes awe. That said, it may be time to marvel less at the amount of data we're storing and wonder whether it really is a triumph over ignorance to fill every cubbyhole with data of questionable value that we'll never look at again. It may be time to get help for this hoarding problem before we're crushed in the collapse of supposedly valuable information that looks, in retrospect, disturbingly like piles of old newspaper.