
Corporate.Net
Web Log Analysis: Finding A Recipe For Success
By Dan Backman with Jeffrey Rubin
Back when TV commercials didn't feature URLs and surfing meant riding the waves, analyzing Web server logs was an afterthought. Your boss was satisfied with a hit count and a list of the top 10 hosts that had surfed through your site. All it took was an afternoon's worth of Perl code and a few background cycles on the SPARC 1+ you relegated to serving HTTP. Today, though, Web log analysis tools are everywhere. But why? Is Web log analysis any different than it was years ago?
Although HTTP server logs have changed only slightly since the original HTTPD was written, the demand for results has exploded. Today's advertising-funded Web sites rely on usage statistics the way TV networks rely on Nielsen rating points. Hit counts, the traditional measure of a Web site's viewership, are all but worthless to advertisers--the right answer to the wrong question. Instead of counting the number of times a page is viewed, hit counts tabulate every inline graphic, Common Gateway Interface (CGI) script and Java class. Advertisers need to know not only how many pages are viewed, but also how many people visit their Web site.
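To make the distinction concrete, here is a minimal sketch of how one log can yield three different numbers. It assumes entries in Common Log Format and assumes that a fixed list of file extensions identifies inline assets; the extension list, the sample entries and the function name are illustrative, not taken from any product in this review. Note that counting unique hosts only approximates counting people, since many users can share one host.

```python
import re

# Common Log Format: host ident user [time] "METHOD path PROTO" status bytes
CLF = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3}) (\S+)$')

# Assumption: requests ending in these extensions are inline assets, not pages.
ASSET_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png", ".class", ".css", ".js")


def summarize(lines):
    """Return hits, page views and unique hosts for an iterable of log lines."""
    hits = 0
    page_views = 0
    hosts = set()
    for line in lines:
        match = CLF.match(line.strip())
        if match is None:
            continue  # skip entries that do not parse
        host, _time, _method, path, _status, _size = match.groups()
        hits += 1  # every logged request is a hit
        hosts.add(host)
        # Ignore any query string before checking the extension.
        if not path.lower().split("?", 1)[0].endswith(ASSET_EXTENSIONS):
            page_views += 1
    return {"hits": hits, "page_views": page_views, "unique_hosts": len(hosts)}


# Made-up sample entries for illustration only.
SAMPLE = [
    'a.example.edu - - [10/Oct/1997:13:55:36 -0400] "GET /index.html HTTP/1.0" 200 2326',
    'a.example.edu - - [10/Oct/1997:13:55:37 -0400] "GET /logo.gif HTTP/1.0" 200 4120',
    'a.example.edu - - [10/Oct/1997:13:55:37 -0400] "GET /Ticker.class HTTP/1.0" 200 9876',
    'b.example.com - - [10/Oct/1997:14:02:11 -0400] "GET /index.html HTTP/1.0" 200 2326',
]

print(summarize(SAMPLE))
```

In this sample, one visitor loading a single page with a graphic and a Java class accounts for three of the four hits, which is why a raw hit count overstates viewership.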
How do you get the most from your log analysis tools? First of all, define your need. We broke down Web server log reporting into two categories: information for Web administrators (who are primarily concerned with the layout of the site and the content development), and information for advertisers (who want concrete information on readership and demographics). Administrators need information such as what pages are most often accessed, what links users are following and where the problems lie. On the other hand, marketers are more interested in demographic information that can sell ads: who is accessing the Web site and where the users are coming from.
We evaluated eight commercial Web server log analysis tools from Aquas, Bien Logic, Cambridge Quality Management (CQM), e.g. Software, Microsoft Corp., net.Genesis Corp., Sane Solutions LLC and WebManage Technologies to see where the chips fell. All products were tested using access logs from two major Web servers on site at Network Computing's lab at Syracuse University, including a nationwide educational resources clearinghouse. We judged each package not only on performance (how long it took to process the log files and generate a complete usage report), but also on its relative ease of use, the presentation of its reports, the intelligence of its processing and the relative value of the information it provides.
Of the contenders, Microsoft's Inters* Market Focus 3.0 delivered the most functionality for a reasonable price--and it operated at an acceptable speed. It goes above and beyond standard log file analysis by tracking visits; performing a path analysis of sessions; clearly reporting entry and exit pages and average time per page; and offering a wealth of demographic reports.
e.g. Software's WebTrends 3.0 offers slightly less functionality (no path analysis or clock analysis), but does generate attractive, informative and easy-to-read reports. Coupled with unparalleled ease of use and very good performance and price, WebTrends offers the best value of the products in this review.
Finally, net.Genesis' net.Analysis Pro 2.2 produced the most in-depth reports, providing extremely useful reports for Web administrators and marketing professionals alike. However, this functionality comes at a cost--both in price and in performance.
The Bad News
All in all, we were rather surprised by the results of our tests. Logic would seem to dictate that if you feed the same file into eight programs whose only purpose is to count the hits and interpret the results, the results should be the same. However, this was not the case. Although we expected the session/visit reports to differ (calculating a visit is a matter of some interpretation), even the counts of raw hits were not the same across all programs.
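A short sketch shows why visit counts are open to interpretation. It assumes a visit ends once a host has been idle longer than a chosen timeout; the timeout values, the sample requests and the function name are our own illustration, and we do not know which rule each reviewed product actually applies.

```python
from datetime import datetime, timedelta


def count_visits(requests, timeout_minutes=30):
    """Count visits, starting a new one when a host is idle past the timeout.

    `requests` is an iterable of (host, datetime) pairs. The inactivity
    timeout is an assumed rule, not a documented product setting.
    """
    timeout = timedelta(minutes=timeout_minutes)
    last_seen = {}
    visits = 0
    # Process requests in chronological order.
    for host, when in sorted(requests, key=lambda r: r[1]):
        previous = last_seen.get(host)
        if previous is None or when - previous > timeout:
            visits += 1
        last_seen[host] = when
    return visits


# Made-up requests for illustration only.
REQUESTS = [
    ("a.example.edu", datetime(1997, 10, 10, 13, 0)),
    ("a.example.edu", datetime(1997, 10, 10, 13, 20)),
    ("a.example.edu", datetime(1997, 10, 10, 13, 45)),
    ("b.example.com", datetime(1997, 10, 10, 13, 5)),
]

print(count_visits(REQUESTS, timeout_minutes=30))  # 2
print(count_visits(REQUESTS, timeout_minutes=15))  # 4
```

The same four requests produce two visits under a 30-minute timeout and four under a 15-minute timeout, so two tools with different defaults would both be "right" while disagreeing.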
Each program filtered out certain entries, resulting in hit counts that ranged from 684,302 (WebTrends) to 701,797 (CQM's Web Tracker). According to Solaris' wc -l command (word count, run in its line-counting mode), the log file had 701,787 total lines. For some reason, Web Tracker seemed to invent 10 lines in the access log. Although the absolute accuracy of these products is questionable, some of them provide reasonably valuable information--particularly for Webmasters who are interested in usage trends, most hits, pages and approximate user click-paths.
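The gap between a line count and a hit count can be sketched as follows. The two filtering rules here (skipping blank or truncated entries, and optionally skipping failed requests) are hypothetical; the vendors' actual filtering rules are not known to us, and the sample log is made up.

```python
def raw_line_count(text):
    """Count newline characters, as `wc -l` does; nothing is filtered."""
    return text.count("\n")


def filtered_hit_count(text, drop_errors=False):
    """Count hits after applying two hypothetical filtering rules."""
    hits = 0
    for line in text.splitlines():
        fields = line.split()
        # Hypothetical rule 1: skip blank or truncated entries, detected
        # here by a missing numeric status code in the next-to-last field.
        if len(fields) < 2 or not fields[-2].isdigit():
            continue
        # Hypothetical rule 2: optionally skip failed requests (status >= 400).
        if drop_errors and int(fields[-2]) >= 400:
            continue
        hits += 1
    return hits


# Made-up log: one good entry, one 404, one truncated entry, one blank line.
LOG = (
    'a.example.edu - - [10/Oct/1997:13:55:36 -0400] "GET /index.html HTTP/1.0" 200 2326\n'
    'a.example.edu - - [10/Oct/1997:13:55:40 -0400] "GET /missing.html HTTP/1.0" 404 210\n'
    'b.example.com - - [10/Oct/1997:14:02:11 -0400] "GET /ind\n'
    '\n'
)

print(raw_line_count(LOG))                        # 4
print(filtered_hit_count(LOG))                    # 2
print(filtered_hit_count(LOG, drop_errors=True))  # 1
```

Filtering of this kind can only lower a count below the number of lines in the file, which is what makes a total above the wc -l figure so puzzling.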