Data center services for medium to large enterprises typically host several petabytes of data on disk drives. Most of this storage houses data residing in tens to hundreds of databases. This data landscape is both growing and dynamic: new data-centric applications are constantly added at data centers, while regulatory requirements such as SOX prevent old and unused data from being deleted. Further, the data access characteristics of these applications change constantly. Ensuring peak application throughput at data centers depends on addressing this dynamic data management problem in a comprehensive fashion. To accommodate the rapid growth in the number of storage devices across data centers and their associated management overhead, data center administrators increasingly isolate storage management using Storage Area Networks (SANs).
Although SANs allow significant isolation of storage management from server management, the storage management problem itself remains complex. Due to the dynamic nature of modern enterprises, the interaction and use of applications, and even the data associated with a single application, change over time. Dynamic changes in the set of "popular" data result in skewed utilization of network storage devices, both in terms of storage space and I/O bandwidth. In statically allocated storage systems, such skewed utilization eventually degrades application performance, creating the need to buy more storage and thereby increasing overall cost.
To avoid purchasing additional storage when existing storage is under-utilized, data center administrators spend copious amounts of time regularly moving data between storage devices to eliminate hotspots. However, optimal data movement is a complex problem that entails obtaining accurate knowledge of data popularity at the right granularity and choosing from an exponential number of possible target solutions, while ensuring that the volume of data moved is minimal. As a result, manual decision making in large data centers containing several terabytes of data and hundreds or thousands of storage devices is time-consuming, inefficient, and typically results in sub-optimal decisions. Off-the-shelf relational databases account for a large portion of these terabytes of data, and the manual data management tasks of system administrators mostly involve remapping database entities (tables, indexes, logs, etc.) to storage devices.
STORM is a database storage management system that combines low-overhead information gathering of database access and storage usage patterns with efficient analysis to generate accurate and timely hints for the administrator regarding data movement operations. Moving a large amount of data between storage devices requires considerable storage bandwidth and time, and although such movement is typically done in periods of low activity, such as night time, it nevertheless runs the risk of affecting the performance of applications. Moreover, such data movement operations are so critical that they are seldom performed unsupervised, so a longer movement time implies greater administrator cost. A longer time requirement also prompts data center managers to postpone such activities and live with skewed usage for as long as possible. It is therefore critical to minimize the overall data movement in any reconfiguration operation.
STORM addresses the problem of reconfiguration with the primary objective of minimizing total data movement and the secondary objective of balancing the I/O bandwidth utilization of the storage devices in a SAN system, subject to storage device capacity constraints. STORM implements a solution to this exponential-complexity problem using a two-stage greedy heuristic algorithm that provides an acceptable approximate solution. The heuristic tries to move objects of smaller size before choosing to move larger objects (i.e., greedy on size) from storage nodes with higher bandwidth utilization to storage nodes with lower bandwidth utilization (i.e., greedy on I/O bandwidth utilization).
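To make the two greedy criteria concrete, the following Python sketch illustrates one way they could be combined. It is not STORM's actual algorithm: the text above does not spell out the two stages, so the sketch simply picks the most and least utilized devices (greedy on I/O bandwidth utilization) and then moves the smallest object that fits and narrows the gap (greedy on size). The names `DbObject`, `Device`, `rebalance`, the `tolerance` parameter, and the assumption that all devices have identical bandwidth (so load can be expressed as a percentage) are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DbObject:
    name: str
    size_gb: int
    io_load: int  # assumed: percent of one device's I/O bandwidth


@dataclass
class Device:
    name: str
    capacity_gb: int
    objects: list = field(default_factory=list)

    def used_gb(self):
        return sum(o.size_gb for o in self.objects)

    def utilization(self):
        return sum(o.io_load for o in self.objects)


def rebalance(devices, tolerance=10, max_moves=100):
    """Apply greedy moves in place and return them as (object, source, target)."""
    moves = []
    for _ in range(max_moves):
        # Greedy on I/O bandwidth utilization: hottest source, coldest target.
        hot = max(devices, key=Device.utilization)
        cold = min(devices, key=Device.utilization)
        gap = hot.utilization() - cold.utilization()
        if gap <= tolerance:
            break  # utilization is balanced within the tolerance
        free_gb = cold.capacity_gb - cold.used_gb()
        # Keep objects that respect the capacity constraint and narrow the gap.
        candidates = [
            o for o in hot.objects
            if o.size_gb <= free_gb and 0 < o.io_load < gap
        ]
        if not candidates:
            break  # no admissible move between this pair of devices
        # Greedy on size: move the smallest admissible object first.
        obj = min(candidates, key=lambda o: o.size_gb)
        hot.objects.remove(obj)
        cold.objects.append(obj)
        moves.append((obj.name, hot.name, cold.name))
    return moves


if __name__ == "__main__":
    d1 = Device("d1", 100, [DbObject("a", 50, 50),
                            DbObject("b", 10, 30),
                            DbObject("c", 5, 10)])
    d2 = Device("d2", 100, [DbObject("d", 20, 10)])
    print(rebalance([d1, d2]))
    # prints [('c', 'd1', 'd2'), ('b', 'd1', 'd2')]
```

In this toy run, 15 GB is moved and both devices end at 50 percent utilization, whereas moving the single large object `a` would have shifted 50 GB. A real system would need per-device bandwidth figures and would likely consider more than one device pair per step; the sketch ignores both for brevity.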