The Greenplum division of storage vendor EMC is offering a Free Community License version of its EMC Greenplum Database software, which lets software developers build new applications to deal with the explosion of so-called "big data" that businesses and other enterprises must manage. The community license is based on code from Greenplum's massively parallel processing (MPP) database product, and includes the open-source MADlib library of analytic algorithms and Alpine Miner, a data mining modeling tool.
As companies build databases of ever-expanding amounts of data, they need more tools to analyze it and make business decisions based on those findings. Eventually, the databases hit a limit on how much they can scale, says Luke Lonergan, chief technology officer and VP of EMC Data Computing Products Division and co-founder of Greenplum, which EMC acquired in July 2010.
Lonergan gives the example of a company that introduces a new product that quickly becomes popular, so that the company suddenly has 1 million visitors to its site within a month or two. "What does an operation do when they get hit by the scale truck?" Lonergan asks.
Big data applications require "scale-out" technology, he says, which keeps up with demand as enterprises add more servers and storage hardware, and they need database analytics software that keeps up with the data. The community license may be used only for research; a commercial license is required to deploy an application in production or for commercial purposes. Greenplum's commercial- and community-licensed database software is based on the open-source PostgreSQL database software project, to which Greenplum has been a contributor.
The MADlib library offers tools that provide mathematical, statistical and machine learning methods for structured and unstructured data. MAD stands for "magnetic, agile and deep." Alpine Miner is a visual data mining tool from a company that Greenplum incubated in-house, Lonergan says. Its chief advantage is that it runs directly in the database engine, rather than requiring a small amount of data to be copied from the database and tested on a separate workstation, which saves several steps in the modeling process.