Review
Business Intelligence with Smarts

  September 30, 2002
  By Lori MacVittie



Data Clean and Normal


Normalizing data is an important part of database design, and cleaning data is critical for effective analysis. Essentially, normalization is the process of removing duplicate tuples (a tuple is a collection of attributes). This procedure reduces database size and helps ensure data integrity. Notice that several cells in the non-normalized chart below contain the same value multiple times. To normalize this data, the repeated values are removed and placed in a separate table, commonly referred to as a lookup table, and then referenced from the original table.
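As a minimal sketch of the lookup-table idea, the Python below pulls the repeated values of one column out into a separate table and leaves an ID reference behind. The `normalize` helper, the `orders` rows and the `region` column are hypothetical, invented for illustration; they are not the data used in the review.

```python
def normalize(rows, column):
    """Move the repeated values of `column` into a separate lookup table."""
    lookup = {}        # value -> integer id, assigned in order of first appearance
    normalized = []
    for row in rows:
        value = row[column]
        if value not in lookup:
            lookup[value] = len(lookup) + 1
        # Copy every other attribute, then store a reference instead of the value.
        new_row = {k: v for k, v in row.items() if k != column}
        new_row[column + "_id"] = lookup[value]
        normalized.append(new_row)
    lookup_table = [{"id": i, column: v} for v, i in lookup.items()]
    return normalized, lookup_table


# Hypothetical non-normalized rows: "Midwest" is stored twice.
orders = [
    {"order": 1, "region": "Midwest"},
    {"order": 2, "region": "Midwest"},
    {"order": 3, "region": "Northeast"},
]
orders_norm, regions = normalize(orders, "region")
```

Each region name is now stored once in `regions`, and every row in `orders_norm` carries only a `region_id` pointing at it.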

When a query is run against the second, normalized set of tables, a join must be performed. Data residing in a data warehouse is often left non-normalized because of the amount of data stored and the performance cost of joins across more than one table on very large data sets. The data used in testing was non-normalized.
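To illustrate why normalized tables need a join at query time, here is a small sketch using Python's built-in `sqlite3` module with an in-memory database. The table and column names (`orders`, `regions`, `region_id`) are assumptions made for the example and do not come from the test data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE regions (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        order_no  INTEGER PRIMARY KEY,
        region_id INTEGER REFERENCES regions(id)
    );
    INSERT INTO regions VALUES (1, 'Midwest'), (2, 'Northeast');
    INSERT INTO orders  VALUES (1, 1), (2, 1), (3, 2);
""")

# The region name lives only in the lookup table, so reading it back
# alongside each order requires joining the two tables.
rows = conn.execute("""
    SELECT o.order_no, r.name
    FROM orders AS o
    JOIN regions AS r ON r.id = o.region_id
    ORDER BY o.order_no
""").fetchall()
conn.close()
```

A non-normalized table would return the same rows from a single-table `SELECT`, which is the trade-off warehouses make: more storage in exchange for avoiding the join.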

It was also very, very dirty. But clean data is essential to ensuring that the information a business-intelligence tool generates is valid and useful. An enterprise application's native database almost never contains a clean data set. Changes from migration, upgrades and day-to-day interaction introduce errors. As an example, imagine a database in whose cells an X is supposed to represent "yes" and blank spaces represent "no." If, instead, "Y" or "N" appears in place of X or the empty cell, a common occurrence, your data is dirty.
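The X-or-blank example can be sketched as a small cleaning step. This is one possible approach, not the procedure used in testing: the `clean_flag` helper is hypothetical, and treating lowercase letters, stray whitespace and missing values as cleanable is an assumption added for the example.

```python
def clean_flag(raw):
    """Map a raw cell to the canonical encoding: "X" for yes, "" for no."""
    # Assumption: None, stray whitespace and lowercase letters are fixable noise.
    value = (raw or "").strip().upper()
    if value in ("X", "Y"):
        return "X"
    if value in ("", "N"):
        return ""
    # Anything else is reported rather than silently guessed at.
    raise ValueError(f"unrecognized flag value: {raw!r}")


raw_cells = ["X", "", "Y", "N", " y ", None]
cleaned = [clean_flag(cell) for cell in raw_cells]
```

Raising on unrecognized values is a deliberate choice here: a cell containing something like "maybe" needs a human decision, not a default.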

