REVIEW
Business Intelligence with Smarts

  September 30, 2002
  By Lori MacVittie



Data Clean and Normal

 
  In this article
arrow
Introduction
arrow
Information Dissemination
arrow
Products Reviewed
arrow
How We Tested
arrow
Data Clean and Normal
arrow
Report Card

Normalizing data is an important part of database design, and cleaning data is critical for effective analysis. Essentially, normalization is the process of removing redundant data, such as the same attribute value repeated across many tuples (a tuple is a row, that is, a collection of attribute values). This procedure reduces database size and helps ensure data integrity, because each value is stored, and updated, in only one place. Notice that several cells in the non-normalized chart below contain the same value multiple times. To normalize this data, those values are removed and placed in a separate table, commonly referred to as a lookup table, and then referenced by key from the original table.
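The lookup-table step can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used in the review: the function name normalize, the sales rows and the region column are hypothetical, and integer surrogate keys are assumed.

```python
def normalize(rows, column):
    """Move a repeated column's values into a lookup table and
    replace them in each row with an integer key (hypothetical helper)."""
    lookup = {}        # value -> surrogate key, assigned in first-seen order
    normalized = []
    for row in rows:
        value = row[column]
        if value not in lookup:
            lookup[value] = len(lookup) + 1
        new_row = dict(row)            # copy so the input rows are untouched
        del new_row[column]
        new_row[column + "_id"] = lookup[value]
        normalized.append(new_row)
    # Invert to key -> value, the usual shape of a lookup table.
    lookup_table = {key: value for value, key in lookup.items()}
    return normalized, lookup_table

# Hypothetical non-normalized rows: "Northeast" is stored twice.
sales = [
    {"order": 1, "region": "Northeast", "amount": 120},
    {"order": 2, "region": "Midwest", "amount": 75},
    {"order": 3, "region": "Northeast", "amount": 40},
]
orders, regions = normalize(sales, "region")
print(regions)    # {1: 'Northeast', 2: 'Midwest'}
print(orders[2])  # {'order': 3, 'amount': 40, 'region_id': 1}
```

Each region name now appears once, in regions, and every order carries only a small key that points to it.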

When a query is run against the normalized tables, a join must be performed to reassemble the data. Data residing in a data warehouse is often left non-normalized: warehouses store very large amounts of data, and joining several tables across such large data sets degrades query performance. The data used in testing was non-normalized.
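The trade-off can be illustrated with Python's built-in sqlite3 module and an in-memory database. The table and column names (region, sales, sales_flat) are hypothetical. The sketch only shows that the normalized layout needs a JOIN to answer a simple question while the non-normalized layout answers it from one table. It does not measure the warehouse-scale performance difference described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Normalized schema: each region name is stored once, in a lookup table.
cur.execute("CREATE TABLE region (region_id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE sales (order_id INTEGER, region_id INTEGER, amount INTEGER)")
cur.executemany("INSERT INTO region VALUES (?, ?)", [(1, "Northeast"), (2, "Midwest")])
cur.executemany("INSERT INTO sales VALUES (?, ?, ?)", [(1, 1, 120), (2, 2, 75), (3, 1, 40)])

# "Sales by region" against the normalized tables requires a join.
normalized_totals = cur.execute(
    """SELECT r.name, SUM(s.amount)
       FROM sales AS s JOIN region AS r ON s.region_id = r.region_id
       GROUP BY r.name ORDER BY r.name"""
).fetchall()

# Non-normalized (warehouse-style) table: the name is repeated in every row,
# so the same question is a single-table query with no join.
cur.execute("CREATE TABLE sales_flat (order_id INTEGER, region TEXT, amount INTEGER)")
cur.executemany("INSERT INTO sales_flat VALUES (?, ?, ?)",
                [(1, "Northeast", 120), (2, "Midwest", 75), (3, "Northeast", 40)])
flat_totals = cur.execute(
    "SELECT region, SUM(amount) FROM sales_flat GROUP BY region ORDER BY region"
).fetchall()
conn.close()

print(normalized_totals)               # [('Midwest', 75), ('Northeast', 160)]
print(flat_totals == normalized_totals)  # True
```

Both layouts return the same answer. The flat table pays for skipping the join by repeating "Northeast" in every matching row.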

It was also very, very dirty. But clean data is essential to ensuring that the information a business-intelligence tool generates is valid and useful. An enterprise application's native database almost never contains a clean data set; migrations, upgrades and day-to-day use all introduce errors. As an example, imagine a database in which an X in a cell is supposed to represent "yes" and a blank cell represents "no." If "Y" or "N" appears instead of the X or the empty cell (a common occurrence), your data is dirty.
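A cleaning pass for the yes/no example might look like the following Python sketch. It assumes that "X" and "Y" both mean yes and that blank and "N" both mean no. The marker sets and helper names are hypothetical. Values that match neither set are flagged for review rather than guessed, since silently coercing them would just hide a different kind of dirty data.

```python
# Spellings assumed to mean "yes" or "no"; adjust to what the data actually holds.
YES_MARKERS = {"X", "Y", "YES"}
NO_MARKERS = {"", "N", "NO"}

def clean_flag(raw):
    """Map one raw cell to True/False, or None if the value is unrecognized."""
    value = (raw or "").strip().upper()   # treat None and whitespace as blank
    if value in YES_MARKERS:
        return True
    if value in NO_MARKERS:
        return False
    return None  # unknown spelling: leave for a human to review

def clean_column(cells):
    """Clean a whole column, collecting (index, raw value) for rejects."""
    cleaned, rejects = [], []
    for index, raw in enumerate(cells):
        flag = clean_flag(raw)
        if flag is None:
            rejects.append((index, raw))
        cleaned.append(flag)
    return cleaned, rejects

cells = ["X", " ", "Y", "N", None, "x", "maybe"]
cleaned, rejects = clean_column(cells)
print(cleaned)  # [True, False, True, False, False, True, None]
print(rejects)  # [(6, 'maybe')]
```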


