NetNews
NEWS / ANALYSIS


A Flawed Random-Number Theory

  June 24, 2002
  By Sean Doherty


Most of us register aliases, like John Doe and Juxta Position, on the Internet in return for information and services. The use of aliases maintains our privacy, reduces spam and thwarts probes looking for a more detailed Web user profile. Now there's a privacy-protection scheme that aims to eliminate the need for aliases. But it's not as comprehensive as it appears.

IBM's Privacy Research Institute recently revealed techniques that aim to preserve individual privacy while giving e-businesses information to generate data models. These techniques scramble or "randomize" private information and reconstruct data distributions at an aggregate level to perform data mining. This means that Web site administrators and merchants can use scrambled data without knowing the underlying private information.

Let's say I enter 45 in a forthcoming Java application that uses the IBM techniques to provide a merchant with age information in return for a music sample. The Java app takes my age and adds or subtracts a random value. The value would differ with each user. Then it sends the new number to the merchant. So, my 45 years may be reduced to 32. This program may also increase my net worth in a single keystroke! I like it already, but what's the value to the merchant?
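The randomizing step described above can be sketched in a few lines. This is only an illustration, not IBM's implementation: the function name `randomize_age`, the choice of uniform noise and the default `spread` of 15 years are all assumptions made for the example.

```python
import random

def randomize_age(true_age, spread=15, rng=None):
    """Return true_age plus a random whole-year offset in [-spread, +spread].

    Assumption: the offset is drawn uniformly. The article does not say
    which distribution the IBM techniques actually use.
    """
    if rng is None:
        rng = random.Random()
    return true_age + rng.randint(-spread, spread)

if __name__ == "__main__":
    # Each call draws a fresh offset, so each user is shifted differently.
    # Only the shifted number would be sent to the merchant.
    print(randomize_age(45))
```

With a spread of 15, an entered age of 45 could come out anywhere from 30 to 60, so a reported 32 is one possible result.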

Although the numbers change, the allowed range of randomization does not. That range is linked to an acceptable range of data at an aggregate level, and to a level of privacy. The merchant might not care about my exact age, but it might like to know I'm between 30 and 50. Large randomizations will increase personal privacy for users but reduce accuracy for merchants. If my age were randomized to 17, that would hardly be valuable to a merchant if it were used in conjunction with the title of the music I requested. Not too many 17-year-olds are into Bob Dylan.
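The trade-off between privacy and accuracy can be made concrete with a small simulation. It is a sketch under stated assumptions, not the IBM reconstruction method: the population of ages is made up, the noise is uniform, and the only aggregate examined is the average. The helper names `randomize`, `mean` and `mean_abs_error` are invented for the example.

```python
import random

def randomize(values, spread, rng):
    """Add an independent uniform offset in [-spread, +spread] to each value."""
    return [v + rng.uniform(-spread, spread) for v in values]

def mean(xs):
    """Arithmetic mean of a non-empty list."""
    return sum(xs) / len(xs)

def mean_abs_error(true_values, noisy_values):
    """Average distance between each true value and its randomized version."""
    return mean([abs(t - n) for t, n in zip(true_values, noisy_values)])

if __name__ == "__main__":
    rng = random.Random(2002)
    # Hypothetical population: 10,000 users aged 18 to 70.
    ages = [rng.randint(18, 70) for _ in range(10_000)]
    for spread in (5, 25):
        noisy = randomize(ages, spread, rng)
        print(f"spread={spread}: true mean={mean(ages):.1f}, "
              f"randomized mean={mean(noisy):.1f}, "
              f"per-user error={mean_abs_error(ages, noisy):.1f}")
```

Because the offsets average out to zero, the randomized average stays close to the true one at either spread, while the error on any single user's age grows with the spread. That is the sense in which a wider range buys more privacy for the individual and less precision for the merchant.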

But how do you randomize gender, race, ethnicity and marital status? This seems like a fine solution for numbers, but numbers alone don't make up our private data. Even if they did, the IBM proposal does nothing to support the notion that individuals own their private information, whether in their purses or in remote databases. Rather, the software simply provides an automated mechanism to mislead merchants while providing them with sufficient information to model data and prepare marketing campaigns. I don't need software to lie about my age.

In the end, e-businesses must acknowledge that individuals own their private information and should have the power to access, modify or remove it from a site if the merchant doesn't handle the information with care. Otherwise, we might have to look to our legislators for an answer. Heaven help us.
--Sean Doherty, sdoherty@nwc.com

