EII Crosses Data Boundaries
Enterprise information integration provides a single point of access to a melting pot of data sources, in real time.
September 10, 2004
Why not use data warehouses instead? Although similar to EII products in practice, data warehouses play a vastly different role. Whereas EII works with data in real time, data warehouses are designed for historical and analytical applications. Data warehouses require data replication, which in turn requires a host of applications and processes to support the replication, cleansing and categorization of data as it is pulled from corporate sources and pushed into a warehouse. EII products, in contrast, open a window on raw data while leaving it in its place.
But don't scrap your warehouse yet: Although some EII platforms can replicate data, that is not their primary purpose. And though EII products can do many of the tasks required to create and maintain a data warehouse, they cannot replace a large-scale warehouse because of EII's focus on real-time integration and lack of comprehensive ETL (extract/transform/load) functionality.
Another term commonly heard in the same breath as EII is virtual database. The implication is that database tables--such as orders, customers and inventory--from multiple sources will be magically accessible over a virtual database, represented by an EII platform. In reality, virtual databases are simply containers, like physical databases, that group data constructs, such as tables and views, and provide an interface for application and developer access. Nice, to be sure, but not the be-all and end-all of EII.
Give the People What They Want
A number of benefits are driving EII adoption. Some are business-related, while others are focused solely on IT.
It's a classic struggle: Business users want continuous, immediate access to 360-degree views of customers and other enterprise data. Within NWC Inc.--our Web-based widget manufacturing company and 24/7 business-applications lab (see inc.networkcomputing.com)--we want a consolidated view of widget orders, but we need to pull data from three unique sources, both structured and unstructured, to achieve that.
Figure: Before and After EII
From an IT perspective, the application development, deployment of client-connectivity tools and increase in network traffic are major stumbling blocks when it comes to giving business users the powerful productivity boost that is ubiquitous, real-time data access.
Let's assume that business needs justify a build-it-yourself approach. Although custom development of applications to provide comprehensive data access isn't impossible--it's done all over the world--the build-it-yourself tack is inefficient and brings hefty ongoing administration and maintenance costs. EII platforms reduce the amount of client software necessary to provide connectivity to data sources, slashing the cost of deployment while making upkeep much easier.
In addition, network traffic can be greatly reduced on local segments by the introduction of EII because large data sets are distilled to only the desired information on the server rather than on the client. For example, to achieve the goal of bringing together data from an Oracle RDBMS and a Microsoft SQL Server sans EII, you would need to retrieve data from both databases, pull the results over the network and then join them on the client. And results wouldn't be delivered to the desktop in a single stream; rather, they'd dribble in, in blocks of 10 or 20. So it would take a whole lot of network traffic to get the right data to the right place.
If you had the right EII platform, this information joining would occur on the server, and only the relevant data would traverse the network and land on the client desktop. Full data streams do still travel from the servers on which the data sources reside to the EII platform, but the typically fatter pipes of a data center should easily handle that traffic flow.
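To make the traffic argument concrete, here is a minimal sketch of a server-side join. It is not how any tested product works internally: two in-memory SQLite databases stand in for the Oracle and SQL Server back ends, and the table names, column names and the `federated_orders_for` function are all hypothetical.

```python
import sqlite3

def make_source(ddl, insert_sql, rows):
    """Create an in-memory SQLite database standing in for one back-end source."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    return conn

# Hypothetical stand-ins for the two back ends; the data is made up.
orders_db = make_source(
    "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, widget TEXT)",
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(i, i % 50, f"widget-{i % 7}") for i in range(1000)],
)
customers_db = make_source(
    "CREATE TABLE customers (customer_id INTEGER, name TEXT)",
    "INSERT INTO customers VALUES (?, ?)",
    [(i, f"customer-{i}") for i in range(50)],
)

def federated_orders_for(customer_name):
    """Join both sources on the 'EII server'; return only the rows requested.

    Full result sets still move from each source to this function (the
    data-center hop), but only the joined, filtered rows go to the client.
    """
    customers = customers_db.execute(
        "SELECT customer_id, name FROM customers ORDER BY customer_id"
    ).fetchall()
    orders = orders_db.execute(
        "SELECT order_id, customer_id, widget FROM orders ORDER BY order_id"
    ).fetchall()
    rows_pulled = len(customers) + len(orders)

    wanted_ids = {cid for cid, name in customers if name == customer_name}
    joined = [
        (order_id, customer_name, widget)
        for order_id, cid, widget in orders
        if cid in wanted_ids
    ]
    return joined, rows_pulled

result, rows_pulled_in_data_center = federated_orders_for("customer-7")
rows_sent_to_client = len(result)
print(rows_pulled_in_data_center, "rows moved inside the data center")
print(rows_sent_to_client, "rows sent on to the client")
```

Without the middle tier, the client would have to pull every row from both sources across the local segment and do the join itself. This sketch does no optimization at all; an optimizing platform would also try to shrink the data-center hop by pushing filters down to the sources.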
Notice we said the right EII platform. This is important: The market takes two approaches to EII, but only one path provides this cost-based optimization within a core feature set. Products lacking autonomous optimization generally are XML-focused and more concerned with presentation than optimization of retrieval from back-end sources. That makes XML-focused EII platforms a perfect fit for organizations that have made standardization on XML as an interface a top priority, but a less-than-optimal solution for those organizations for which bandwidth and legacy connectivity are problems.
When we tested seven EII platforms at our NWC Inc. lab in Green Bay, Wis. (see "Don't Fear the Data"), we discovered a land where aggregation and presentation of unlike data sources are a shining reality, though the journey can be fraught with frustration and adversity.
The beauty of the products we tested is that these disparate pools of data aren't restricted to relational data sets. Corporate-class EII suites provide access to hierarchical data, such as XML, and to enterprise-class application data that, while stored in database systems, requires intimate knowledge of schemas to access directly; it's therefore usually accessed over vendor-provided APIs. More advanced EII offerings provide federation of message queues (IBM MQ, TIBCO RV) as well as data sources accessed over non-Web protocols, such as FTP and SMB. Hallelujah, brother.
Once data is federated, EII platforms must provide an access mechanism for developers and third-party applications. Modes of access include ODBC/JDBC, as well as XML over HTTP and SOAP. This particular nugget of EII functionality is a huge differentiator between product sets and should be a primary factor when you're choosing a product: Although EII implementations can help plot a strategic path to fully embracing the XML world, many applications require ODBC or JDBC connectivity. If long-term support is required for these products, tread the path of XML-focused EII offerings carefully.
A secondary but growing purpose of EII is to provide a mechanism for transforming data on the server rather than on the client. The most common format into which data is transformed is XML. But EII products can do much more than just a simple transformation of relational data to XML--many also can do the reverse, providing a mechanism for incorporating sources of hierarchical data into reporting tools and custom-developed applications.
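As a rough illustration of that two-way transformation, the sketch below turns relational rows into XML and flattens the XML back into rows, using only Python's standard library. The element names (`orders`, `order`) and column names are assumptions made for the example, not a format used by any product mentioned here.

```python
import xml.etree.ElementTree as ET

def rows_to_xml(rows, columns, root_tag="orders", row_tag="order"):
    """Serialize relational rows as a simple XML document (one element per column)."""
    root = ET.Element(root_tag)
    for row in rows:
        item = ET.SubElement(root, row_tag)
        for column, value in zip(columns, row):
            ET.SubElement(item, column).text = str(value)
    return ET.tostring(root, encoding="unicode")

def xml_to_rows(xml_text, columns, row_tag="order"):
    """Flatten hierarchical XML back into tuples a reporting tool could consume.

    Values come back as strings; a real platform would also map data types.
    """
    root = ET.fromstring(xml_text)
    return [
        tuple(item.findtext(column) for column in columns)
        for item in root.iter(row_tag)
    ]

columns = ["order_id", "customer"]
xml_text = rows_to_xml([(1, "Acme"), (2, "Globex")], columns)
print(xml_text)
print(xml_to_rows(xml_text, columns))
```

Doing this on the server means each client receives data already in the shape it expects, whether that is XML for a Web service consumer or rows for an ODBC/JDBC-based reporting tool.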
Do the Math
Cost-based query optimization is an ugly relative of relational algebra. Its arcane techniques are well-understood only by the most savvy database administrators. For the query-challenged among us, some EII platforms provide rudimentary database-admin-in-a-box functionality that will help free up your database gurus to do whatever database administrators do.
Although we found caching the least-developed feature set in EII products, even the limited caching capabilities found in most EII suites will provide a performance boost. That's a definite plus for business-intelligence and analysis products that perform high-volume queries on large data sets daily--or hourly.
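The kind of limited caching described here can be pictured as a time-to-live cache keyed on the query text. The following is a minimal sketch under that assumption; the `QueryCache` class and its parameters are hypothetical and do not describe the caching design of any product we tested.

```python
import time

class QueryCache:
    """Cache query results by SQL text for a fixed time-to-live (TTL)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock          # injectable so the cache can be tested
        self._entries = {}          # sql text -> (time stored, result)
        self.hits = 0
        self.misses = 0

    def get(self, sql, run_query):
        """Return a cached result if still fresh; otherwise run the query."""
        now = self.clock()
        entry = self._entries.get(sql)
        if entry is not None and now - entry[0] < self.ttl:
            self.hits += 1
            return entry[1]
        self.misses += 1
        result = run_query(sql)
        self._entries[sql] = (now, result)
        return result
```

A reporting tool that issues the same large query every hour would hit the back-end sources once per TTL window instead of once per request. The trade-off is staleness: the longer the TTL, the further the cached view drifts from the real-time data that EII is supposed to expose.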
From a business perspective, EII lets application developers create holistic, composite applications that can increase business-user productivity. EII also lets Excel power users access several data sources through a single interface, which should lower support costs.
Centralizing Control
A side benefit of EII is its role in regulatory compliance and data security. Sarbanes-Oxley, for example, has fascinating (if you aren't the one tasked with implementing policies) impacts on data access within the organization in certain situations.
A Fortune 500 company we spoke with illustrated the scale of changes some organizations need to make in database access to comply with SOX. Because of the financial data being stored in certain databases, finely controlling access and, more important, properly logging changes to stored data are priorities. Access to this corporation's internal databases is heavily restricted, and hundreds of apps might have to be changed because of hard-coded user names and passwords. Also, auditors must be able to determine who changed what data and when, which means yet more-detailed logs. The man-hours involved in implementing these changes, let alone discovering them, are staggering.
EII is a natural fit as both a tactical and strategic solution to such compliance woes. Because EII platforms typically are deployed in a gateway scenario, they stand between the user and the data like a medieval gargoyle, guarding and protecting the treasures hidden in various vaults within the data-center dungeon.
In addition to basic logging and access control, some EII products can add security mechanisms, such as column- or row-level security that may not be supported directly by the RDBMS vendor.
Presenting multiple data sources over a single programmatic interface is an appealing notion, as evidenced by the explosion of Web services (SOAP) and the SOA (Service-Oriented Architecture) model. EII platforms offer such mechanisms using a variety of protocols and languages, such as SOAP, XML over HTTP, HTML, JDBC and ODBC. Although the methods of accessing these federated data sources are myriad, the benefits of standardizing on a single interface include not only cost savings--for training, maintenance and deployment--but also faster time to market and decreased time to troubleshoot connectivity issues.
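The column- and row-level security and logging mentioned above can be sketched as a filter applied at the gateway before any data reaches the caller. Everything in this example is hypothetical: the roles, the policy table, the column names and the `secure_view` function are illustrative assumptions, not a vendor API.

```python
from datetime import datetime, timezone

# Hypothetical policy: which columns each role may see, and which rows.
POLICY = {
    "sales": {
        "columns": ["order_id", "customer", "region"],
        "row_filter": lambda row: row["region"] == "midwest",
    },
    "finance": {
        "columns": ["order_id", "customer", "region", "amount"],
        "row_filter": lambda row: True,
    },
}

# Simple in-memory audit trail; a real gateway would write to durable storage.
AUDIT_LOG = []

def secure_view(rows, user, role):
    """Apply row- and column-level rules for a role, and log the access."""
    rule = POLICY[role]
    visible = [
        {column: row[column] for column in rule["columns"]}
        for row in rows
        if rule["row_filter"](row)
    ]
    AUDIT_LOG.append({
        "user": user,
        "role": role,
        "rows_returned": len(visible),
        "at": datetime.now(timezone.utc).isoformat(),
    })
    return visible

# Made-up sample data standing in for a federated result set.
ORDERS = [
    {"order_id": 1, "customer": "Acme", "region": "midwest", "amount": 1200},
    {"order_id": 2, "customer": "Globex", "region": "west", "amount": 800},
]

print(secure_view(ORDERS, "pat", "sales"))
print(AUDIT_LOG[-1]["rows_returned"])
```

Because the rules live in one place in front of every source, they apply even to back ends that offer no such controls themselves, and every request leaves an audit entry regardless of which application made it.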
Some quick figures: If it takes a $43,000-a-year tech-support worker (roughly $20 an hour) 20 hours to install, test and certify four different ODBC drivers on one desktop distribution and you have four corporate desktop images, that's 80 hours at a cost of $1,600 just to set up the environment so developers and end users, or the applications they use, have access to those databases.
Implement an EII suite and that scenario changes to five hours for one ODBC connection, 20 hours for all four desktops, for a grand total of $400--a 75 percent savings in deployment costs alone. Granted, that's still small beans, but if you extrapolate the savings across maintenance and troubleshooting, those numbers are going to shoot up fast. There are also client costs for drivers to take into account. This is particularly true of software in the EAI and business-intelligence spaces. These product types are generally priced with a per-adapter charge, meaning a business-intelligence system that requires access to three systems will most assuredly cost more than an implementation requiring access to only one data source.
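The deployment figures above can be reproduced in a few lines. The one assumption is the hourly rate: the article's numbers imply about $20 an hour ($1,600 over 80 hours), which is a rounded version of $43,000 spread over a 2,080-hour work year. The function and constant names are ours.

```python
# Assumed hourly rate, implied by $1,600 / 80 hours in the figures above
# (43,000 / 2,080 working hours is closer to $20.67).
HOURLY_RATE = 20

def deployment_cost(hours_per_image, images, hourly_rate=HOURLY_RATE):
    """Labor cost to set up database connectivity on every desktop image."""
    return hours_per_image * images * hourly_rate

cost_without_eii = deployment_cost(hours_per_image=20, images=4)  # four ODBC drivers
cost_with_eii = deployment_cost(hours_per_image=5, images=4)      # one ODBC connection
savings = 1 - cost_with_eii / cost_without_eii

print(cost_without_eii, cost_with_eii, f"{savings:.0%}")
```

The percentage does not depend on the hourly rate, since it cancels out; only the dollar amounts would shift if the unrounded rate were used.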
Bottom line, EII is still not a no-brainer. First, you need to know thyself, or more specifically, know how many places business information is stored throughout your organization. Unless you've cataloged all your enterprise data sources so you can configure your EII system to recognize their existence, the product isn't going to do much for you.
Furthermore, as our tests showed, product performance can be problematic; you'll need a clustered environment for large volumes of users. And top-of-the-line EII isn't cheap. Our winner cost $140,000 for our modest testing scenario, and hidden fees and add-ons can kill you. Thus, until prices drop, EII tools should be considered purely strategic for all but the very well-heeled; pricing is almost uniformly a per-CPU/per-adapter model, so using EII as a tactical solution will push the cost above and beyond what you can hope for in savings, even over time. Gartner Research concurs, predicting that through 2008, more than 90 percent of virtual heterogeneous data federation will be for read-only composite applications with low transaction volumes and a limited number of data sources.
But we believe the general cost savings and reduction in complexity of deployment make EII apps an intriguing addition to the data center, and we'll be watching this product space closely in the coming year.
Lori MacVittie is a Network Computing senior technology editor working in our Green Bay, Wis., labs. She has been a software developer, a network administrator and a member of the technical architecture team for a global transportation and logistics organization. Write to her at [email protected].
Most enterprises keep order information in one system, customer data in another and inventory numbers in a third. To give users one view of all this data, you'd need a system to connect all three sources and knit the data together.
Yeah, we got that. It's called enterprise information integration, and its promise is tantalizing. The downsides are, predictably, complexity and cost. But considering the resources now spent on federating data--Meta Group estimates that IT organizations devote as much as 60 percent of their budgets to application integration--we think it's worth investigating. We asked Cincom Systems, Composite Software, IBM, Ipedo, MetaMatrix, Snapbridge Software and XAware to standardize access to some disparate data sources we have in our NWC Inc. business-applications lab. It took some time to get data flowing smoothly, but once we did, Composite's CIS rode the crest of the wave to the finish line. IBM's DB2 Information Integrator 8.1 was just behind, next to MetaMatrix's submission. See "Don't Fear the Data" for details.