• 06/24/2002
    4:00 AM
  • Network Computing
  • News

A Flawed Random-Number Theory

Most of us register aliases, like John Doe and Juxta Position, on the Internet in return for information and services. The use of aliases maintains our privacy, reduces spam and thwarts probes looking for a more detailed Web user profile. Now there's a privacy-protection scheme that aims to eliminate the need for aliases. But it's not as comprehensive as it appears.

IBM's Privacy Research Institute recently revealed techniques that aim to preserve individual privacy while giving e-businesses information to generate data models. These techniques scramble or "randomize" private information and reconstruct data distributions at an aggregate level to perform data mining. This means that Web site administrators and merchants can use scrambled data without knowing the underlying private information.

Let's say I enter 45 in a forthcoming Java application that uses the IBM techniques to provide a merchant with age information in return for a music sample. The Java app takes my age and adds or subtracts a random value. The value would differ with each user. Then it sends the new number to the merchant. So, my 45 years may be reduced to 32. This program may also increase my net worth in a single keystroke! I like it already, but what's the value to the merchant?
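
To make the mechanics concrete, here is a minimal sketch of that step in Python. It is not IBM's code: the function name randomize_age, the uniform distribution and the 15-year maximum offset are all assumptions chosen for illustration, since the description above says only that a random value is added or subtracted.

```python
import random

def randomize_age(true_age, max_offset=15, rng=random):
    """Return true_age shifted by a random whole number of years.

    Assumption: the offset is drawn uniformly from
    [-max_offset, +max_offset]. The real scheme may use
    a different distribution or range.
    """
    return true_age + rng.randint(-max_offset, max_offset)

# Each user gets a fresh random offset; the seed here only
# makes this demonstration repeatable.
rng = random.Random(2002)
reported_age = randomize_age(45, max_offset=15, rng=rng)
print("Age sent to the merchant:", reported_age)
```

With a 15-year maximum offset, a 45-year-old could be reported as anything from 30 to 60, so the 32 in my example sits comfortably inside the range.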

Although the numbers change, the allowed range of randomization does not. That range is linked to an acceptable range of data at an aggregate level, and to a level of privacy. The merchant might not care about my exact age, but it might like to know I'm between 30 and 50. Large randomizations will increase the personal privacy for users but reduce accuracy for merchants. If my age were randomized to 17, that would hardly be valuable to a merchant if it were used in conjunction with the title of the music I requested. Not too many 17-year-olds are into Bob Dylan.
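
The trade-off can be seen in a small simulation. This is a rough sketch under stated assumptions, not the IBM reconstruction method: the population of 10,000 customers aged 30 to 50 is made up, the noise is assumed uniform, and only the average age is compared rather than a full reconstructed distribution.

```python
import random
import statistics

def simulate(true_ages, max_offset, seed=0):
    """Randomize every age and measure what is lost.

    Returns (error in the average age, average error per person).
    Assumption: uniform whole-year offsets in [-max_offset, +max_offset].
    """
    rng = random.Random(seed)
    reported = [a + rng.randint(-max_offset, max_offset) for a in true_ages]
    aggregate_error = abs(statistics.mean(reported) - statistics.mean(true_ages))
    per_person_error = statistics.mean(
        [abs(r - a) for r, a in zip(reported, true_ages)]
    )
    return aggregate_error, per_person_error

# Hypothetical customer base: 10,000 people aged 30 to 50.
population_rng = random.Random(42)
true_ages = [population_rng.randint(30, 50) for _ in range(10_000)]

for max_offset in (5, 15, 25):
    aggregate_error, per_person_error = simulate(true_ages, max_offset)
    print(
        f"max offset {max_offset:>2}: "
        f"average age off by {aggregate_error:.2f} years, "
        f"typical individual off by {per_person_error:.2f} years"
    )
```

Because the offsets average out to zero, the merchant's estimate of the average age stays close to the truth while each individual's reported age drifts further from reality as the range widens. That is the sense in which a wider range buys privacy at the cost of per-customer accuracy.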

But how do you randomize gender, race, ethnicity and marital status? This seems like a fine solution for numbers, but numbers alone don't make up our private data. Even if they did, the IBM proposal does nothing to support the notion that individuals own their private information, whether in their purses or in remote databases. Rather, the software simply provides an automated mechanism to mislead merchants while providing them with sufficient information to model data and prepare marketing campaigns. I don't need software to lie about my age.
