I was speaking to a drive manufacturer the other day who again stated that when they get supposedly failed hard drives back from storage system vendors, a high percentage of them (80 percent as I recall) work just fine. All they needed was a power cycle and they came back to life. Of course, the remedy for hard drive failure is the same remedy we use for our desktops when something goes wrong. Reboot.
What's the big deal to you? If you have a drive failure you can simply replace the drive with a new one. It's the manufacturer's problem, right? If you have a spare drive sitting on the shelf ready to replace a failed drive then yes, but many data centers don't have extra drives. If you don't have a spare drive, you have to let your hot spare take over, let the RAID rebuild happen, order a new drive, pack up the old one and send it back. This all takes time, and unless you can get someone else to do it for you, that's time you probably don't have. If the drive could just be rebooted and returned to operation, even if that meant you still had to go through the RAID rebuild, at least you wouldn't have to go through an RMA process to send back a drive that is likely not really bad.
Where this gets interesting is if the drive can be rebooted before forcing a RAID rebuild. For example, let's say one of the drives in your RAID-5 group fails. With XOR parity calculations, you can obviously continue to operate. If the system rebooted the drive before going to the global spare and initiating a RAID rebuild, you could avoid that lengthy rebuild process altogether. This would take some intelligence on the part of the storage system to maintain data availability while the reboot of the drive happens, but some suppliers are working on providing this capability.
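To make the idea concrete, here is a minimal Python sketch of the two pieces involved: serving reads from a RAID-5 stripe with one block missing by XORing the survivors, and a reboot-first policy that only falls back to a rebuild if the power cycle fails. It is an illustration under stated assumptions, not any vendor's implementation. The names `xor_blocks`, `read_stripe`, `handle_drive_fault`, `power_cycle`, `start_rebuild` and `max_attempts` are all hypothetical.

```python
from functools import reduce
from typing import Callable, List, Optional


def xor_blocks(blocks: List[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def read_stripe(stripe: List[Optional[bytes]]) -> List[Optional[bytes]]:
    """Return the full stripe, recomputing at most one missing block (None).

    In RAID-5 the parity block is the XOR of the data blocks, so any single
    missing block equals the XOR of all the surviving blocks.
    """
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) > 1:
        raise ValueError("RAID-5 tolerates only one missing block per stripe")
    repaired = list(stripe)
    if missing:
        survivors = [block for block in stripe if block is not None]
        repaired[missing[0]] = xor_blocks(survivors)
    return repaired


def handle_drive_fault(
    power_cycle: Callable[[], bool],    # hypothetical hook: True if the drive came back
    start_rebuild: Callable[[], None],  # hypothetical hook: fail over to the global spare
    max_attempts: int = 1,              # assumed policy knob, not from the article
) -> str:
    """Try rebooting the drive before committing to a lengthy rebuild."""
    for _ in range(max_attempts):
        if power_cycle():
            return "recovered"
    start_rebuild()
    return "rebuilding"


# Usage: two data blocks plus parity, with the middle drive offline.
d0, d1 = b"\x01\x02", b"\x10\x20"
parity = xor_blocks([d0, d1])
degraded = read_stripe([d0, None, parity])  # reads keep working during the reboot
outcome = handle_drive_fault(power_cycle=lambda: True, start_rebuild=lambda: None)
```

One thing the sketch leaves out: any blocks written while the drive was offline would be stale on that drive when it returns, so a real system would presumably have to track and resynchronize those blocks before trusting it again.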
I'm sure there is some sort of green angle here to make my 'save the planet' friends happy, too. Think of all the carbon we would save by not shipping drives that aren't bad all over the country. There is also the waste involved in manufacturing extra drives that never needed to be made. I'm not sure what each manufacturer does with the failed drives but I'm sure they can't resell them as new.
We all know that IT has to do more with less and there are elaborate presentations from vendors on how their products do that. Taking the simple reboot concept and implementing it into intelligent storage systems could go a long way in increasing productivity and increasing performance.
George Crump is president and founder of Storage Switzerland, an IT analyst firm focused on the storage and virtualization segments. With 25 years of experience designing storage solutions for datacenters across the US, he has seen the birth of such technologies as RAID, NAS, ...