• 01/12/2012
    9:27 AM

GridIron Systems: Mining Big Data 'Gold' in a Flash

Trends in the IT industry sometimes resemble gold rushes as vendors pan for revenue "nuggets." The use of solid state devices (SSDs)--most notably, flash memory--is the central point of one of these, but just as with the real 19th century gold rushes in California and Alaska, not all prospectors (that is, vendors) will be successful. Where the claims are staked can make all the difference in the world, and GridIron Systems is staking one with a focus on accelerating big data analyses.

GridIron recognizes the different requirements of big data workloads and has designed its algorithms to take advantage of those differences. Traditional tiering works best with large, stable data sets, where taking some time to move data from one tier to another is reasonable. Many big data workloads, by contrast, need concurrent bandwidth so that data can be processed more quickly than traditional tiering can manage, and GridIron delivers that bandwidth.

GridIron Systems' solution is the TurboCharger, an SSD-based (primarily flash, but with as little RAM as necessary) appliance that resides in the storage area network (SAN) between servers and storage arrays. GridIron's objective is to provide solid state performance to data in the SAN without requiring IT administrators to change a thing--no software, database, server, storage or process changes of any sort are required. This is important not only because the GridIron TurboCharger can be deployed without administrative burden, but also because it lets IT feel comfortable knowing that the TurboCharger could be removed if necessary (although performance would revert to the pre-TurboCharger state).

GridIron currently offers two models of the TurboCharger appliance--the GT1100 with an SSD capacity of 2.5 Tbytes and the GT1100A with an SSD capacity of 6.5 Tbytes. Each has a bandwidth of 1.6 Gbps and 100K IOPS.

Note that since the GridIron TurboCharger is entirely separate from servers and storage, current arrays can still be used; the TurboCharger is designed to complement existing systems and does not have to hold all the big data simultaneously. This means that the current storage array itself could still have a Tier 0 SSD layer, a Tier 1 FC/SAS layer and a Tier 2 SATA layer. This external SAN-based approach frees server and storage system processing from the complexity and mechanics of SSD operation and management.

So how can simply having additional SSDs as a front end in an appliance support GridIron's claims of speeding up applications two to 10 times and reducing read latency by 10 to 100 times? There are some hardware advantages in GridIron's approach, as the appliance adds concurrent bandwidth, enabling multiple applications to access the same array at the same time without interference. But GridIron's secret sauce lies in proprietary software algorithms and heuristics that enable better caching of performance-enhancing data.

Obviously, GridIron does not make a lot of details available, but basically the TurboCharger appliance allows big data to be read-cached (big data analyses run on data that has been captured, that is, already written to disk). The TurboCharger examines I/O patterns quickly and sees which data is actually used, when and how. This can be done in real time, which is important because tiering in an array often examines patterns that occur over a day or more, and this is simply too slow to be effective for many big data applications.
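
To make the idea of read caching concrete, here is a minimal sketch in Python. It is not GridIron's algorithm, which is proprietary and undisclosed; it assumes a plain least-recently-used (LRU) eviction policy as a stand-in. The names `ReadCache`, `backing` and `capacity` are hypothetical, and a dictionary stands in for the storage array.

```python
from collections import OrderedDict


class ReadCache:
    """Read-only LRU block cache in front of a slower backing store.

    Illustrative only: LRU is an assumed policy, not GridIron's heuristics.
    """

    def __init__(self, backing, capacity):
        self.backing = backing        # dict-like: block id -> data (stands in for the array)
        self.capacity = capacity      # maximum number of cached blocks
        self.blocks = OrderedDict()   # cached blocks, least recently used first
        self.hits = 0
        self.misses = 0

    def read(self, block_id):
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)  # mark as most recently used
            self.hits += 1
            return self.blocks[block_id]
        self.misses += 1
        data = self.backing[block_id]          # fetch from the array
        self.blocks[block_id] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)    # evict the least recently used block
        return data

    def write(self, block_id, data):
        self.backing[block_id] = data          # writes go straight to the array
        self.blocks.pop(block_id, None)        # drop any stale cached copy


array = {i: f"block-{i}" for i in range(10)}
cache = ReadCache(array, capacity=3)
for block_id in [1, 2, 1, 3, 1, 4, 2]:
    cache.read(block_id)
print(cache.hits, cache.misses)
```

Writes bypass the cache and only invalidate a stale copy, which mirrors the read-only caching the article describes. A real appliance would presumably replace the LRU rule with something that reacts to observed I/O patterns.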

In addition, GridIron provides concurrent bandwidth among applications to prevent "thrashing," which occurs because of resource contention (that is, a storage bottleneck) and means that less work gets done as servers spend more time waiting for those resources. The effect of this concurrent bandwidth capability is conceptually equivalent to maintaining a real-time DBA whose sole function is to continuously re-lay out data sets to match server processing demand, at no performance cost, and who is continually responsive to changes in usage or data growth.

Among the benefits of a GridIron deployment is that both CPU and storage utilization are maximized. Moreover, the same storage array can be used for both production and data warehousing applications. Even though GridIron does not cache writes, it can improve write performance indirectly by offloading the read I/O work that the array would otherwise have had to perform, thus allowing the array to process writes more efficiently.
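
The indirect write benefit comes down to simple arithmetic, sketched below in Python. The workload figures and the 75 percent cache hit rate are assumptions chosen for illustration, not GridIron measurements, and `array_load` is a hypothetical helper.

```python
def array_load(read_iops, write_iops, hit_rate):
    """Return (reads, writes) per second that still reach the array.

    hit_rate is the fraction of reads served by the cache in front of the array.
    Writes are not cached, so all of them reach the array.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return read_iops * (1.0 - hit_rate), write_iops


# Assumed workload: 8,000 reads/s and 2,000 writes/s, with 75% of reads cached.
reads_left, writes = array_load(read_iops=8000, write_iops=2000, hit_rate=0.75)
print(reads_left + writes)  # total I/O per second the array still has to serve
```

Under these assumed numbers, the array goes from serving 10,000 I/Os per second to 4,000, and writes rise from 20 percent to half of its remaining work, which is the headroom the article refers to.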
