Network Computing is part of the Informa Tech Division of Informa PLC

EMC Sees Big Opportunity In Big Data: Page 2 of 3

File-based storage, on which many big data applications are based, is growing at a much faster rate than block-based storage. IDC predicts that 80 percent of all storage capacity sold will be for file-based data. Network-attached storage (NAS) is often used for file-based data, but scale-up NAS has limitations on a number of dimensions, including scalability and performance. A scale-out NAS architecture overcomes these limitations.

For example, Isilon's scale-out NAS architecture, which runs the company's OneFS operating system, can scale to more than 10 petabytes in a single file system and support up to 50GBytes per second of throughput. However, big data applications may emphasize one dimension of the data over another. Consequently, Isilon sells the S-Series, purpose-built for high-transaction, IOPS-intensive applications such as genome research, while the company's X-Series is targeted at capacity-intensive applications that require highly concurrent, sequential throughput, such as medical imaging.
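To put those two figures side by side, here is a rough back-of-the-envelope sketch of how long one full sequential pass over a file system of that size would take. It assumes decimal units (1 petabyte = 10^15 bytes, 1GByte = 10^9 bytes) and that the peak throughput is sustained for the whole pass, which real workloads may not achieve. The function name is illustrative only.

```python
# Back-of-the-envelope estimate; decimal units are an assumption.
PETABYTE = 10**15
GIGABYTE = 10**9


def full_scan_hours(capacity_bytes, throughput_bytes_per_sec):
    """Hours needed to read every byte once at a constant throughput."""
    seconds = capacity_bytes / throughput_bytes_per_sec
    return seconds / 3600


# The figures quoted above: 10 petabytes at 50GBytes per second.
hours = full_scan_hours(10 * PETABYTE, 50 * GIGABYTE)
print(f"{hours:.1f} hours")  # prints "55.6 hours"
```

Even at peak throughput, a single pass over the whole file system takes more than two days under these assumptions, which is one reason capacity and throughput are treated as separate dimensions when matching hardware to a workload.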

Greenplum focuses on the analytical challenges posed by big data. Its suite of products supports big data sets that are analysis-intensive, ultimately helping end users glean salient insights from their data. This typically requires complex analysis, such as ad hoc, interactive analysis, and not simply the production of structured reports. The speed of analysis is especially important when the analysis must be performed frequently and when its insights feed decision-making.

However, traditional relational database management systems are not optimized for big data analytics. They were designed for the small random reads and writes required by online transaction processing (OLTP) rather than the large sequential reads an analytical SQL query may demand. To meet those different needs, Greenplum developed a massively parallel processing (MPP) system in which performance and scalability are key elements. Like Isilon, Greenplum illustrates the kind of new architecture needed to meet big data application requirements.
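The general idea behind an MPP design can be illustrated with a toy sketch: rows are spread across independent segments, each segment sequentially scans only its own share, and a coordinator merges the partial results. This is not Greenplum's implementation. The function names, the hash-based distribution and the use of threads in place of separate servers are all assumptions made for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def distribute(rows, num_segments):
    """Hash-distribute (key, value) rows across segments (hypothetical helper)."""
    segments = [[] for _ in range(num_segments)]
    for key, value in rows:
        segments[hash(key) % num_segments].append((key, value))
    return segments


def scan_segment(segment_rows):
    """One segment sequentially scans its own rows and returns partial sums."""
    partial = Counter()
    for key, value in segment_rows:
        partial[key] += value
    return partial


def parallel_sum_by_key(rows, num_segments=4):
    """Roughly SELECT key, SUM(value) ... GROUP BY key, run segment by segment."""
    segments = distribute(rows, num_segments)
    # Threads stand in for the independent servers of a real MPP system.
    with ThreadPoolExecutor(max_workers=num_segments) as pool:
        partials = list(pool.map(scan_segment, segments))
    # The coordinator merges the per-segment partial aggregates.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return dict(total)


sales = [("east", 10), ("west", 5), ("east", 7), ("north", 3)]
print(parallel_sum_by_key(sales, num_segments=3))
```

Because each segment touches only its own rows, adding segments shrinks the amount of data any one of them has to scan, which is where the scalability of this kind of architecture comes from.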

Big data applications come in many flavors, but one constant is that they typically consume vast amounts of storage. Scientific and engineering uses of big data, such as in high performance computing (HPC) scenarios, have been around a long time, but now big data is spreading into mainstream information technology, including entertainment media, health care and the Web.