EMC Greenplum Offers Free Open Source Tool For Building Database Apps


February 2, 2011

Network Computing

The Greenplum division of storage vendor EMC is offering a free Community License version of its EMC Greenplum Database software, which lets software developers build new applications to deal with the explosion of so-called "big data" that businesses and other enterprises must manage. The community license is based on code from Greenplum's massively parallel processing (MPP) database product and includes the open-source MADlib library of analytic algorithms and Alpine Miner, a data mining modeling tool.

As companies build databases of ever-expanding amounts of data, they need more tools to analyze it and make business decisions based on those findings. Eventually, the databases hit a limit on how much they can scale, says Luke Lonergan, chief technology officer and VP of EMC Data Computing Products Division and co-founder of Greenplum, which EMC acquired in July 2010.

Lonergan gave the example of a company that introduces a new product that quickly becomes popular and suddenly finds itself with 1 million visitors to its site within a month or two. "What does an operation do when they get hit by the scale truck?" Lonergan asks.

Big data applications require "scale-out" technology, he says: as enterprises add more servers and storage hardware to meet demand, they need database analytics software that keeps up with the data. The community license is to be used only for research; a commercial license is required to deploy an application in production or for commercial purposes. Greenplum's commercial- and community-licensed database software is based on the open-source PostgreSQL database software project, to which Greenplum has been a contributor.

The MADlib library offers tools that provide mathematical, statistical and machine learning methods for structured and unstructured data. MAD stands for "magnetic, agile and deep." Alpine Miner is a visual data mining tool from a company that Greenplum incubated internally, Lonergan says. Its chief advantage is that it runs directly in the database engine, rather than requiring a small amount of data to be copied from the database and tested on a separate workstation, which saves several steps in the modeling process.

"What we have assembled with Community Edition are best-of-breed tools that are the right categories here for building big data applications," Lonergan says.

Greenplum's database product is based on a "massively parallel processing architecture," in which data is partitioned into segments on different servers. It is called a "shared nothing" environment because the servers do not share disks. Instead, all communication between servers goes over a network connection. This is in contrast to the "shared disk" or "shared everything" environments used for online transaction processing, such as Oracle or Microsoft SQL Server relational database systems.
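To make the "shared nothing" idea concrete, here is a minimal Python sketch of how rows might be spread across segments and how a query could be answered by combining per-segment results. It is an illustration only, not Greenplum's implementation: the CRC32 hash, the segment count of four, and the names segment_for, distribute, local_sum and parallel_total are all assumptions made for this example.

```python
import zlib

NUM_SEGMENTS = 4  # hypothetical number of segment servers


def segment_for(key, num_segments=NUM_SEGMENTS):
    """Pick a segment for a row by hashing its distribution key.

    CRC32 is a stand-in hash chosen for this sketch; a real MPP
    database would use its own hashing scheme.
    """
    return zlib.crc32(str(key).encode("utf-8")) % num_segments


def distribute(rows, key_field, num_segments=NUM_SEGMENTS):
    """Partition rows into independent per-segment lists.

    Each list plays the role of one server's private storage:
    no segment ever reads another segment's rows.
    """
    segments = [[] for _ in range(num_segments)]
    for row in rows:
        segments[segment_for(row[key_field], num_segments)].append(row)
    return segments


def local_sum(segment_rows, value_field):
    """Work done on one segment, using only that segment's data."""
    return sum(row[value_field] for row in segment_rows)


def parallel_total(segments, value_field):
    """Combine the small per-segment results into the final answer.

    In a real cluster the partial sums would be computed at the same
    time on separate servers and sent over the network.
    """
    partials = [local_sum(seg, value_field) for seg in segments]
    return sum(partials)


# Hypothetical sample data: 100 orders keyed by customer_id.
rows = [{"customer_id": i, "amount": i * 10} for i in range(1, 101)]
segments = distribute(rows, "customer_id")
total = parallel_total(segments, "amount")
```

The point of the design is that only the partial sums travel between servers, not the rows themselves, so adding servers adds both storage and processing capacity for the same query.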

The EMC Greenplum Community License release was announced at a database technology conference being held this week in Santa Clara, Calif.

