From Ceph and ZFS to OpenStack Swift, companies have a lot of open source options for storage.
Open source tools for storage are beginning to proliferate and some are garnering strong support in the enterprise. These tools range from software that builds storage appliances to backup solutions, cloud storage tools, and compression packages.
Like most open source software, open source storage tools vary in quality and features, and those differences will determine which ones you'll want to use. The packages are also at different levels of maturity, which is reflected in the number of proponents and favorable online reviews each one has.
One thing to remember about open source software is that support is limited, unless paid support from a third party is available. You aren’t on your own, since the community is there to help, but limited support can delay resolution of issues.
The most mature open source storage packages are in the appliance code class. Mature software like Ceph does well in the market, while Gluster and Lustre are positively venerable, as is ZFS. At the other end of the maturity spectrum are cloud storage management tools, which are generally very new and still evolving in a market segment that is a moving target.
Open source has yet to address some storage needs, including virtual SAN creation and monitoring. We also need open source tools for software-defined storage, especially in the orchestration/policy-based management area.
Still, with inexpensive COTS hardware underpinning this open source code, we are about to see a substantial portion of the IT industry migrate to the open source approach. While this isn’t good news for the traditional vendors, I expect we’ll see something like the Linux revolution take place in storage, resulting in a much healthier industry driven by a high rate of innovation.
Today, enterprises have more options than expensive, proprietary storage. Continue on to learn about open source storage tools that businesses should consider.
Storage appliance software
Ceph is the clear leader in open source appliance-building code. It is on the brink of delivering true universal storage, with block-IO and NAS filer gateways to the object storage pool. Recent releases, driven by Red Hat and SanDisk, have improved performance, especially with solid-state drives and flash arrays. Ceph is “software-defined storage-ready,” based on its architecture.
Gluster and Lustre are more traditional scaled-out file systems. Both have large followings, especially in the high-performance computing community, and both are well supported. One major aim of both solutions, however, is to parallelize storage access to boost performance. The advent of all-flash arrays with millions of IOPS more than satisfies this need, so expect some shift to alternatives as a result.
ZFS is a file system popular for its data integrity features, with the added benefit of being supported on most Linux distributions. FreeNAS is a full-featured NAS solution with encryption, replication and snapshots. Finally, we shouldn’t forget those stalwarts of our industry, NFS and Samba, which are rock-solid for simple deployments.
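To make the idea of parallelized storage access concrete, here is a minimal Python sketch of striping: data is split round-robin across several files (standing in for storage nodes) and read back concurrently. It does not reflect how Gluster or Lustre are actually implemented; the names `write_striped`, `read_striped` and `STRIPE_SIZE` are invented for illustration, and the stripe size is deliberately tiny.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List

STRIPE_SIZE = 4  # bytes per stripe unit; unrealistically small, for illustration only


def write_striped(data: bytes, paths: List[str]) -> None:
    """Distribute data round-robin across the files in STRIPE_SIZE chunks."""
    handles = [open(p, "wb") for p in paths]
    try:
        for i in range(0, len(data), STRIPE_SIZE):
            target = (i // STRIPE_SIZE) % len(paths)
            handles[target].write(data[i:i + STRIPE_SIZE])
    finally:
        for h in handles:
            h.close()


def _read_all(path: str) -> bytes:
    with open(path, "rb") as f:
        return f.read()


def read_striped(paths: List[str]) -> bytes:
    """Read every stripe file concurrently, then interleave the chunks in order."""
    with ThreadPoolExecutor(max_workers=len(paths)) as pool:
        parts = list(pool.map(_read_all, paths))
    out = bytearray()
    offset = 0
    while any(offset < len(p) for p in parts):
        for p in parts:
            out += p[offset:offset + STRIPE_SIZE]
        offset += STRIPE_SIZE
    return bytes(out)
```

With spinning disks, spreading reads across many devices like this raises aggregate throughput; a single all-flash array can often deliver comparable IOPS without the extra layer, which is the shift the paragraph above anticipates.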
Cloud storage software
Related to the appliance software is cloud storage software such as OpenStack’s Swift and Cinder. These are broad-scoped solutions that provide object and block-IO storage, respectively. They are classed “7 of 8” in OpenStack’s maturity ratings. Swift is experiencing strong competition from Ceph, which has been well integrated with the rest of OpenStack.
Swift and Cinder are intended for “internal use only” within OpenStack. There are also packages for shared file systems (Manila) and databases (Trove), but these are new and lower down the maturity curve.
Open Source Storage (OSS) has taken a different tack on open source. The company offers a cloud storage service based entirely on open source code, running on COTS hardware. It can be done! OSS also sells turn-key appliances using its open source code base.
Storage management packages
We are just starting to see open source policy-driven storage management tools enter the market. Examples include Libvirt, which allows storage pools to be built and accessed from hypervisors such as Xen, KVM and VMware. Online Hierarchical Storage Manager is a policy-driven tool that optimizes disk usage by moving files from hot to cold storage. This is an important tool as we migrate from hard-disk primary storage to using solid-state drives.
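As a rough illustration of what a policy-driven tiering tool does, the Python sketch below moves files that have not been accessed within a given window from a "hot" directory to a "cold" one. It is not based on the Online Hierarchical Storage Manager code; the function name `demote_cold_files`, its parameters and the idle-time policy are assumptions made for this example, and it relies on the file system recording access times.

```python
import os
import shutil
import time
from typing import List, Optional


def demote_cold_files(hot_dir: str, cold_dir: str, max_idle_seconds: float,
                      now: Optional[float] = None) -> List[str]:
    """Move files idle for longer than max_idle_seconds from hot_dir to cold_dir.

    Returns the sorted names of the files that were moved.
    """
    now = time.time() if now is None else now
    os.makedirs(cold_dir, exist_ok=True)

    # Snapshot the directory first so we do not modify it while iterating.
    with os.scandir(hot_dir) as it:
        candidates = [e for e in it if e.is_file()]

    moved = []
    for entry in candidates:
        idle = now - entry.stat().st_atime  # seconds since last access
        if idle > max_idle_seconds:
            shutil.move(entry.path, os.path.join(cold_dir, entry.name))
            moved.append(entry.name)
    return sorted(moved)
```

A real tool would typically also leave a stub or link behind so applications can still find demoted files, and would promote files back to the hot tier when they are accessed again; both are omitted here.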
Last year, EMC released its ViPR code into the open source community as CoprHD (pronounced copperhead). This code abstracts a variety of storage solutions from multiple vendors to create a unified pool of storage.
Generally, there is a lot of for-profit startup activity in the storage management segment. This should provide some options until the open source community catches up.
Backup software
There are lots of backup solutions in the market, with prices ranging from Lamborghini level to golf-cart level, and there seems to be little correlation between price and features or performance.
Out of this turmoil, some open source packages have emerged that provide an economical alternative. Amanda can back up whole networks, and has fee-paid support from Carbonite. Bacula and BackupPC are enterprise-class solutions, with compression and other advanced features to save space and network load.
Cloud-based storage gateways
FTPbox allows users to set up storage-supporting services like FTP in the cloud. Syncany operates like Dropbox either on a local server or in a public cloud.
Nuage Labs recently announced that it will make its Cloud Gateway available as open source. This is a well-featured tool that is easy to deploy and should enable the building of gateway appliance-type solutions with COTS hardware.
There is, however, substantial commercial activity in this segment of open source storage. Even the large clouds like AWS and Azure are offering gateway storage solutions. This should mean low prices and strong support, which need to be balanced against self-support for DIY solutions.
Data services
Data services include encryption and compression tools. There are a good number of open source products available, but two stand out for usage and features. On the encryption side, the TrueCrypt tool for Linux and Windows is still extremely popular, even though development was discontinued in 2014 (the last full-featured version, 7.1a, released in 2012, is still available for download). VeraCrypt is a more recent and still-active fork of TrueCrypt that addresses some of the known TrueCrypt issues.
For compressing data, 7-Zip is a popular tool that works on Windows and Linux and, by handling a broad set of popular compression methods, can cope with the traffic generated in large operations.
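To give a feel for why support for several compression methods matters, the Python sketch below runs the same data through three codecs from the standard library (zlib's Deflate, bzip2, and the LZMA2-based xz format, all of which are formats 7-Zip can read) and reports the compressed sizes. It does not call 7-Zip itself; the `compare_codecs` helper and the `CODECS` table are invented for this example.

```python
import bz2
import lzma
import zlib
from typing import Dict

# name -> (compress, decompress); all three ship with the Python standard library
CODECS = {
    "deflate (zlib)": (zlib.compress, zlib.decompress),
    "bzip2": (bz2.compress, bz2.decompress),
    "lzma2 (xz)": (lzma.compress, lzma.decompress),
}


def compare_codecs(data: bytes) -> Dict[str, int]:
    """Compress data with each codec, verify the round trip, return sizes in bytes."""
    sizes = {}
    for name, (compress, decompress) in CODECS.items():
        packed = compress(data)
        if decompress(packed) != data:
            raise ValueError("round trip failed for " + name)
        sizes[name] = len(packed)
    return sizes


if __name__ == "__main__":
    sample = b"storage block " * 1000
    for name, size in compare_codecs(sample).items():
        print(f"{name}: {len(sample)} -> {size} bytes")
```

Which method wins depends on the data and on how much CPU time you can spend, so it is worth measuring on a representative sample before settling on one.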
Monitoring tools
Monitoring tools are also available from the open source community. Cacti is a versatile monitor with extensive graphing capability. Nagios has been around a while and is less graphical, but delivers more details from the monitor. A new player to watch is STOR2RRD, with real-time NAS and SAN monitoring.
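Monitors such as Nagios run small check programs that report a status through their exit code (0 for OK, 1 for WARNING, 2 for CRITICAL). The Python sketch below shows what a simple capacity check in that style might look like. The function names and the 80 and 90 percent thresholds are assumptions chosen for illustration, not defaults of any of the tools above.

```python
import shutil
import sys
from typing import Tuple

# Exit codes following the Nagios plugin convention
OK, WARNING, CRITICAL = 0, 1, 2


def check_capacity(used_bytes: int, total_bytes: int,
                   warn_pct: float = 80.0, crit_pct: float = 90.0) -> Tuple[int, str]:
    """Classify usage against the thresholds and return (status code, message)."""
    pct = 100.0 * used_bytes / total_bytes
    if pct >= crit_pct:
        return CRITICAL, f"CRITICAL - {pct:.1f}% used"
    if pct >= warn_pct:
        return WARNING, f"WARNING - {pct:.1f}% used"
    return OK, f"OK - {pct:.1f}% used"


def check_path(path: str = "/") -> Tuple[int, str]:
    """Run the capacity check against the file system that holds path."""
    usage = shutil.disk_usage(path)
    return check_capacity(usage.used, usage.total)


if __name__ == "__main__":
    code, message = check_path(sys.argv[1] if len(sys.argv) > 1 else "/")
    print(message)
    sys.exit(code)
```

Keeping the threshold logic in `check_capacity` separate from the measurement in `check_path` makes the check easy to test with fixed numbers.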