Unstructured data has become an IT buzzword, and organizations are grappling with how to get a handle on email, Office documents and PDFs, among other data-clogging sources. Microsoft’s SharePoint document repository was built to provide content collaboration and version control. But as user adoption increases, so does the amount of data stored in SharePoint, which critics say can wreak havoc on its infrastructure and lead to poor performance, along with the corresponding management headaches.
In the InformationWeek 2011 State of Enterprise Storage survey, respondents indicated they have between 11 TB and 50 TB of unstructured data, primarily from email, says Kurt Marko, an IT industry observer who authored the September 2011 InformationWeek report ECM: Solving the Problem of Unstructured Data. Also telling was that only 20% of respondents said they delete files when the files reach the end of their retention period, according to the report.
In addition to "a general lack of ECM sophistication within many organizations," Marko says, it doesn’t appear many organizations are using "more advanced taxonomies" to deal with content management, likely due to "deficiencies within the software itself." He says one respondent reported that, "to date, we have found significant gaps in taxonomy management capability with SharePoint and other third-party vendors."
Others in the industry echo similar concerns with SharePoint. "The most frustrating thing with SharePoint is it was originally designed as an intranet, so it assumed authentication inside your Active Directory domain," observes David "HT" Kramer, president of Cooperative Computing, a business technology consulting firm in Dallas. However, when users wanted to put SharePoint information on the Internet, the issue became "taking your back office and trying to share it with your front office," he says. Users want to push documents to a SharePoint site, but they have to go through an authentication process before the documents can be worked on, making it more of a hassle. "It’s not clean with non-domain authenticated access, which is a problem when it comes to SharePoint for Internet."
Quest Software, a provider of SharePoint tools, maintains that organizations face five "common storage performance killers in SharePoint": unstructured data taking over SQL Server space, where most SharePoint content is saved; large media files such as videos, images and PowerPoint presentations, which SharePoint is not optimized to house, meaning users can experience browser timeouts, slow Web server performance, and upload and recall failures; old and unused files hogging SQL storage; a failure to build to scale as SharePoint content grows, which leaves the supporting hardware underpowered if growth rates were not accurately forecast; and a failure to use Microsoft’s data externalization features, which could improve storage and related performance.
Kramer disagrees, saying storage is not one of the primary concerns with SharePoint. "The top three issues are not around storage, because, technically, it’s easy to solve ... You just have to make sure it’s configured right so it can be indexed right and retrieved. Performance issues with SharePoint are more related to how you’ve loaded the app down on top of the storage itself."
One of the three main issues he sees is that organizations need to "think of [SharePoint] more as a database system and less of a file system." The higher the performance requirement for storing and retrieving files, the higher performing the storage subsystem has to be, he adds.
Another issue, Kramer says, is that SharePoint doesn’t cross document repositories well. "It can control access to them and the workflow ... but we are less about documents and more about information, and we bundle that in a record-oriented fashion." He says it is "very difficult" to set up content in a records-oriented way in SharePoint.
The third main problem Kramer sees is that SharePoint does not do well with external referencing. "If I’m inside a native blog system like Confluence, and want to link to an external source and keep it healthy and usable, that link is no longer available and you will have broken links in the system. SharePoint doesn’t do that well, and it doesn’t give a clean way to create those reference links in the first place."