WORKSHOPS
Managing Mass-Storage Monsters
by Eric Hall
If there's one maxim that holds true, it's "data expands to fill the space available." It seems that no matter how much hard-disk space you have, people will find lots of creative ways to fill it. This unfortunate truism leads to an eternal search for more and better ways of maximizing storage options.
This search doesn't have to end with your continually buying more and larger disks; instead, it can be recognized as a call for better storage-management strategies and procedures. Instead of trying to fight fire with fire, focus on building a comprehensive, multilevel strategy that provides for growth. Finding a way to manage the problem will yield much more satisfaction than constantly fighting the symptoms.
Of course, this is often easier said than done. Although it's easy to pay lip service to the desire for better mass-storage facilities, it's often difficult to find the time and energy required to architect a flexible solution. Even if you are able to piece together an effective strategy, getting management to pay for something so esoteric can be difficult. Then there's the implementation, followed by routine maintenance and management, which can be boring as hell, to say the least. We can't buy these products for you, nor can we help you personally with the day-to-day management tasks, but we can help you design your strategic solution. We'll also offer various tips we've collected over the years.
Hard Drives Are the Root of All Evil
Let's face facts: The more drives you have, the harder your life is. You have to back them all up. You increase your exposure to the downtime that their eventual failure is sure to bring. You also can spend lots of money, even if each drive is cheap on its own.
Instead of buying more drives, perhaps you should be buying fewer of them. We don't simply mean you should consolidate many small devices into a few large ones (though this is often a good idea); rather, you should find a way to minimize the amount of front-line magnetic storage available on your network. If data expands to fill the space available, then the logical corollary is that you can minimize the amount of "necessary" data by reducing the amount of available storage.
This just isn't true, of course, but it does frame our most fundamental position: Data needs to be prioritized before it can be managed effectively, and the best way to prioritize is to eliminate. Do you really need a hard drive on each PC, or could you get by with a better-managed, server-based storage plan? If you manage applications and user data more efficiently, you will gain stability, which means fewer disasters.
There are opposing arguments that say it is better to spread your risk by putting local drives in every system, but this doesn't hold up in the long term. Although it's true that when the server crashes, all the PCs attached to it are knocked out too, you can minimize this risk if the system is architected correctly. Having drives on each system means you will eventually have failures on each system, and together they will cost you much more in labor than a single widespread failure would. In addition, it's easier to prevent a single failure than it is to prevent hundreds of them.
When you design a centralized storage mechanism, it's important to recognize that it will generate several types of disk I/O. At one extreme, there are small files that are loaded frequently, calling for fast random access. At the other, there are large files that are loaded infrequently but demand high throughput when they are.
For the small files, what you want more than anything is fast seek times. The drive heads will need to jump quickly from one part of the disk to another, since several simultaneous requests for small pieces of data are likely to occur. Throughput is largely irrelevant, since these short bursts won't fill a large pipe. Files of this type are generally word-processing documents, spreadsheets and many of the most common applications. Although these files are accessed often, each request is brief, so you can store many of them together on the same disk without much impact on performance, assuming you purchased very fast drives.
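To put rough numbers on the trade-off between seek time and throughput, the following sketch models one disk request as a single positioning step (average seek plus, on average, half a rotation) followed by the data transfer. The drive figures (9 ms seek, 7,200 rpm, 10 MB/s) and the `Drive` and `speedup` names are hypothetical illustrations, not measurements of any product, and the large read is assumed to be one contiguous, sequential scan.

```python
"""Rough disk service-time model: head positioning vs. data transfer.

The drive figures used here are hypothetical, illustrative values only.
"""

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Drive:
    avg_seek_ms: float    # average time to move the heads to a track
    rpm: int              # spindle speed
    transfer_mb_s: float  # sustained transfer rate once positioned

    def request_ms(self, size_kb: float) -> float:
        """Time for one request: a single positioning step, then the transfer."""
        # On average the drive waits half a revolution for the data to arrive.
        rotational_ms = 0.5 * 60_000 / self.rpm
        transfer_ms = size_kb / 1024 / self.transfer_mb_s * 1000
        return self.avg_seek_ms + rotational_ms + transfer_ms


def speedup(base: Drive, tuned: Drive, size_kb: float) -> float:
    """How many times faster `tuned` completes one request than `base`."""
    return base.request_ms(size_kb) / tuned.request_ms(size_kb)


if __name__ == "__main__":
    base = Drive(avg_seek_ms=9.0, rpm=7200, transfer_mb_s=10.0)
    faster_seek = replace(base, avg_seek_ms=4.5)     # halve the seek time
    bigger_pipe = replace(base, transfer_mb_s=20.0)  # double the throughput

    for label, size_kb in (("8 KB document", 8), ("100 MB scan", 100 * 1024)):
        print(f"{label}: faster seek x{speedup(base, faster_seek, size_kb):.2f}, "
              f"bigger pipe x{speedup(base, bigger_pipe, size_kb):.2f}")
```

Under these assumptions, halving the seek time makes an 8 KB read roughly 1.5 times faster, while doubling the transfer rate gains only about 3 percent. For the 100 MB scan the result reverses: the bigger pipe nearly halves the time, and the faster seek makes no measurable difference.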
Disks with fast seek times will satisfy these many requests quickly, allowing for good overall performance.
For large databases and sequential files, however, the opposite is true. These files tend to be large blocks of data that are not opened and closed quickly, but instead are loaded once or twice a day and then searched heavily. If someone needs a report or a query executed, you need maximum throughput, since a quick return of all the data will make the entire operation faster. You don't care about seek time, because multiple random requests are less likely than raw reads of huge chunks of data.
Which approach is suitable for you? Almost certainly, both are. Your best bet is to set up two separate disk systems, each optimized for its specific purpose.
Fault Tolerance
Once you've defined your distribution of media, you'll want to ensure it doesn't go down or, if it does crash, minimize the impact on end users. The most robust approach is to build your disk farms from Redundant Arrays of Inexpensive Disks (RAID) Level 5 arrays and then mirror them using RAID 1. This makes for a virtually indestructible setup (as long as you have mirrored servers and power supplies as well).
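How tough is a mirrored pair of RAID 5 arrays? The following is a minimal sketch, assuming two identical RAID 5 sets of `n` equal-sized disks, one mirroring the other via RAID 1, and counting only disk failures (controllers, servers and power supplies are out of scope). The function names and the five-disk, 4 GB example are hypothetical illustrations. It brute-forces every combination of failed disks to find how many failures are always survivable.

```python
"""Sketch: two RAID 5 sets mirrored with RAID 1.

Assumes two identical RAID 5 sets of n equal-sized disks, one a mirror of
the other. Only disk failures are modeled; controllers, servers and power
supplies are out of scope. All names and figures are illustrative.
"""

from __future__ import annotations

from itertools import combinations


def usable_capacity_gb(n: int, disk_gb: float) -> float:
    # RAID 5 gives up one disk's worth of space to parity, and the mirror
    # copy adds no capacity, so usable space equals one RAID 5 set.
    return (n - 1) * disk_gb


def survives(failed: set[tuple[int, int]]) -> bool:
    """`failed` holds (side, disk) pairs; sides 0 and 1 mirror each other."""
    for side in (0, 1):
        lost_on_side = sum(1 for s, _ in failed if s == side)
        # A RAID 5 set keeps its data with at most one failed member, and
        # RAID 1 keeps the data as long as either side is still intact.
        if lost_on_side <= 1:
            return True
    return False


def worst_case_tolerance(n: int) -> int:
    """Largest k such that every possible set of k failed disks is survivable."""
    disks = [(side, i) for side in (0, 1) for i in range(n)]
    k = 0
    while k < len(disks) and all(
        survives(set(combo)) for combo in combinations(disks, k + 1)
    ):
        k += 1
    return k


if __name__ == "__main__":
    n, disk_gb = 5, 4.0  # hypothetical: five 4 GB disks per RAID 5 set
    print(f"{2 * n} disks, {usable_capacity_gb(n, disk_gb):.0f} GB usable")
    print(f"survives any {worst_case_tolerance(n)} failed disks")
    one_side_gone = {(0, i) for i in range(n)} | {(1, 0)}
    print(f"survives {len(one_side_gone)} failures in the best case:",
          survives(one_side_gone))
```

Under this model, data is lost only when each mirror half has lost at least two disks, so any three disk failures are survivable, and in the best case (one whole half gone plus one disk on the other side) n + 1 failures are. The price is capacity: only n - 1 of the 2n disks' worth of space holds unique data.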
Updated October 8, 1996













