Let's look at the cost to deliver the capacity requirements first. In this first entry I am going to keep the feature set and the math relatively straightforward. I'll get into features that can affect my initial formulas in a future entry.
To get started, say that you have an application that needs 10 TB of net usable capacity. How much actual storage do you need to buy to deliver that 10 TB of storage? It seems fairly obvious that you can't just buy ten 1-TB hard drives, install them in the server, and declare the mission accomplished.
First and foremost, you can't run a volume at close to 100% capacity without seeing major performance degradation. It is rare to see a volume at even 30% of capacity, but assume you are being very efficient and running at roughly two-thirds utilization. That means your 10 TB volume now needs to be 15 TB. Five more drives need to be added for a total of 15.
It is reasonable to assume that this application is going to need some form of data protection, either RAID 5 or RAID 6. RAID 5 gives up one drive's worth of capacity to parity, so we are going to need at least 16 drives instead of 15. Of course, the problem with RAID 5 is the amount of time you are exposed to total data loss while a rebuild happens. As a result, it is becoming increasingly common for applications to store data on RAID 6 volumes. RAID 6 uses double parity, so two drives are sacrificed for data protection. That means you will need at least 17 drives.
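The drive counts so far can be reproduced with a few lines of arithmetic. This is only a sketch under the assumptions used in the text: 1-TB drives, 15 TB of raw capacity for 10 TB of usable data (modeled here as a 1.5x headroom factor), and one or two drives given up to parity. The function name and parameters are made up for illustration.

```python
import math

def drives_needed(usable_tb, headroom_factor, drive_tb, parity_drives):
    """Drives for one RAID group: data drives (rounded up) plus parity drives."""
    raw_tb = usable_tb * headroom_factor        # capacity after leaving free space
    data_drives = math.ceil(raw_tb / drive_tb)  # whole drives only
    return data_drives + parity_drives

# Assumed inputs from the text: 10 TB usable, 1.5x headroom, 1-TB drives.
no_raid = drives_needed(10, 1.5, 1, parity_drives=0)
raid5 = drives_needed(10, 1.5, 1, parity_drives=1)
raid6 = drives_needed(10, 1.5, 1, parity_drives=2)
print(no_raid, raid5, raid6)  # 15 16 17
```

Changing the headroom factor or the drive size shows how quickly the count moves; a lower utilization target means a larger factor and more drives.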
Next, you are going to want to keep a backup copy of this data in case you do have a catastrophic RAID failure or even an application corruption and need to roll back to a prior copy of the data. To be safe, this means a backup copy of the data locally and a backup copy of the data at a remote site.
Let's assume at the remote site you are going to use the storage system's replication capability. First, this means that you didn't buy the cheapest storage system possible; you went at least mid-tier and bought something with replication capabilities. That is an additional expense. It also means you need a like system at the remote site with similar capacity. So now you have 34 drives and two storage systems that are at least mid-range.
For local backup you could use a less expensive array or copy to tape. To keep the math simple, let's assume you went with an inexpensive system instead of using a duplicate of the same array you are using for the primary application data. Using a second array protects you from a total array failure caused by either software or hardware. You could also go with 2-TB drives to keep costs down.
You would still probably want to run RAID 6 in the second system since rebuild times are even slower on 2-TB drives and less powerful storage systems. That is a third system now and another 9 drives. Finally, there is the software needed to make that copy. If your application came with built-in utilities you could use those, but you may have to purchase some software outright.
Even with the most basic 10 TB requirement and the most basic of hardware components, you end up with three systems and 43 drives for just one application. We haven't even discussed whether those 1-TB drives are going to provide the performance that you need. You may need more, smaller, faster drives or even some solid state disk drives to meet the application's performance demands.
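As a quick tally of the totals above, the sketch below simply adds up the per-system drive counts as they are stated in the text (17, 17, and 9) and divides by the 10 TB requirement. The labels and variable names are illustrative only, and cost per drive or per system is deliberately left out.

```python
# Drive counts per system, taken as stated in the text.
systems = [
    ("primary array (RAID 6, 1-TB drives)", 17),
    ("remote replica (RAID 6, 1-TB drives)", 17),
    ("local backup array (RAID 6, 2-TB drives)", 9),
]

usable_tb = 10  # the application's net usable requirement

total_drives = sum(count for _, count in systems)
drives_per_usable_tb = total_drives / usable_tb

print(len(systems), total_drives, drives_per_usable_tb)  # 3 43 4.3
```

Put another way, under these assumptions every usable terabyte the application asks for ends up backed by a little more than four physical drives.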
Finally, there are software features like snapshots, thin provisioning, and deduplication that we need to factor into the calculation as they can affect total capacity consumed. We'll cover performance and efficiency software as well as solutions to the true cost of storage problems in future entries.