Evaluating Web Server Log Analysis Tools David Strom firstname.lastname@example.org SD’98 2/13/98
Summary • Examine different log files • What you can and can’t learn from your logs • Pros and cons of various tools SD'98 (c) David Strom, Inc.
Different types of log files • Access • Error • Referral • Other
Access logs • Domain name • Date, time • Server command processed and result • URL of visitor • Bytes transmitted
Sample access log data • rm258.fav.usu.edu [31/May/1995:09:03:23 +0600] "GET /NEI.html HTTP/1.0" 302 396 • rm258.fav.usu.edu [31/May/1995:09:03:28 +0600] "GET /xculture/nei/nei.html HTTP/1.0" 200 2114 • rm258.fav.usu.edu [31/May/1995:09:03:30 +0600] "GET /gifs/sedlbutton.gif HTTP/1.0" 200 1336 • 220.127.116.11 [31/May/1995:09:20:32 +0600] "GET /RELs.html HTTP/1.0" 304 0 • Leslie-Francis.tenet.edu [31/May/1995:09:36:06 +0600] "GET / HTTP/1.0" 200 1867 • ls973.ulib.albany.edu [31/May/1995:09:40:52 +0600] "GET /viii1.html HTTP/1.0" 404 244
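To make the fields on the sample slide concrete, here is a minimal Python sketch that splits one such line into host, timestamp, request, status, and byte count. It assumes the abbreviated layout shown above (host, bracketed timestamp, quoted request, status, bytes); the full Common Log Format also carries ident and user fields, which these samples omit. The names `LINE_RE` and `parse_access_line` are made up for illustration.

```python
import re
from datetime import datetime

# Assumed layout, taken from the slide's samples:
#   host [timestamp] "METHOD path protocol" status bytes
LINE_RE = re.compile(
    r'^(?P<host>\S+) \[(?P<when>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)$'
)

def parse_access_line(line):
    """Return a dict of fields, or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    rec = m.groupdict()
    # Timestamps look like 31/May/1995:09:03:23 +0600
    rec["when"] = datetime.strptime(rec["when"], "%d/%b/%Y:%H:%M:%S %z")
    rec["status"] = int(rec["status"])
    # Some servers write "-" when no body was sent; treat that as zero bytes.
    rec["size"] = 0 if rec["size"] == "-" else int(rec["size"])
    return rec

sample = 'rm258.fav.usu.edu [31/May/1995:09:03:23 +0600] "GET /NEI.html HTTP/1.0" 302 396'
record = parse_access_line(sample)
print(record["host"], record["path"], record["status"], record["size"])
```

Lines that do not fit the pattern come back as `None`, so a caller can count and inspect them rather than silently dropping them.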
Errors reported in your logs • Clients that time out (or leave in frustration!) • Scripts that don’t produce any output • Server bugs • User authentication or configuration problems
Sample error log data • [Thu May 30 07:25:32 1996] send timed out for bamberg.sedl.org • [Thu May 30 07:57:41 1996] send timed out for kenya.sedl.org • [Thu May 30 08:23:11 1996] send timed out for ppp092.kyoto-inet.or.jp • [Thu May 30 09:15:52 1996] access to /usr/local/www/htdocs/scimath/compass/vol03 failed for 18.104.22.168, reason: File does not exist • [Thu May 30 09:57:56 1996] send timed out for dd10-048.compuserve.com • [Thu May 30 10:47:25 1996] read timed out for ncia110b.ncia.net
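A rough sketch of how error lines like these could be tallied by kind. It assumes the bracketed-timestamp layout shown on the slide and recognizes only the two message types that appear there; the category labels and the function names `classify` and `summarize_errors` are illustrative, not part of any server's tooling.

```python
import re
from collections import Counter

# Assumed layout from the slide: [timestamp] free-form message
ERR_RE = re.compile(r'^\[(?P<when>[^\]]+)\] (?P<msg>.*)$')

def classify(msg):
    """Bucket a message by the phrases seen in the sample data."""
    if "timed out" in msg:
        return "timeout"
    if "File does not exist" in msg:
        return "missing file"
    return "other"

def summarize_errors(lines):
    """Count error lines per bucket, skipping lines that do not match."""
    counts = Counter()
    for line in lines:
        m = ERR_RE.match(line.strip())
        if m:
            counts[classify(m.group("msg"))] += 1
    return counts

sample_errors = [
    "[Thu May 30 07:25:32 1996] send timed out for bamberg.sedl.org",
    "[Thu May 30 09:15:52 1996] access to /usr/local/www/htdocs/scimath/compass/vol03 "
    "failed for 18.104.22.168, reason: File does not exist",
    "[Thu May 30 10:47:25 1996] read timed out for ncia110b.ncia.net",
]
summary = summarize_errors(sample_errors)
print(dict(summary))
```

Anything that lands in "other" is worth reading by hand; script failures and authentication problems would show up there until a bucket is added for them.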
Referral logs • Who links to your site? • Who downloads your pages?
Sample referral log data • http://www.isisnet.com/ ->/change/welcome.html • http://www.ipl.org/ref/RR/EDU/Research-rr.html ->/welcome.html • http://www.tenet.edu/snp/main.html ->/policy/networks/toc.html • http://www.tenet.edu/new/main.html ->/policy/networks/toc.html • http://guide-p.infoseek.com/NS/Titles?qt=teacher+training ->/resources/SCIMAST/announcement.html • http://www.tenet.edu/new/main.html ->/policy/networks/toc.html • http://www.tenet.edu/new/main.html ->/policy/networks/toc.html • http://www.nwrel.org/national/regional-labs.html ->/welcome.html
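One way to answer "who links to your site?" from entries like these is to count referring hosts. The sketch below assumes each entry has the form `referrer ->target` as on the slide; `parse_referral` and `top_referring_hosts` are hypothetical helper names.

```python
from collections import Counter
from urllib.parse import urlsplit

def parse_referral(line):
    """Split 'referrer ->target' into a (referrer, target) pair, or None."""
    referrer, sep, target = line.partition(" ->")
    if not sep:
        return None
    return referrer.strip(), target.strip()

def top_referring_hosts(lines):
    """Return (host, count) pairs, most frequent referring host first."""
    hosts = Counter()
    for line in lines:
        pair = parse_referral(line)
        if pair:
            hosts[urlsplit(pair[0]).hostname] += 1
    return hosts.most_common()

sample_referrals = [
    "http://www.isisnet.com/ ->/change/welcome.html",
    "http://www.tenet.edu/snp/main.html ->/policy/networks/toc.html",
    "http://www.tenet.edu/new/main.html ->/policy/networks/toc.html",
]
ranking = top_referring_hosts(sample_referrals)
print(ranking)
```

Counting by host rather than by full URL groups the two tenet.edu pages together, which is usually what you want when asking which sites send you traffic.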
Common log format • Output by most standard servers • Needed by most third-party log analyzers • hoohoo.ncsa.uiuc.edu/docs/setup/httpd/Overview.html
Extended/custom log formats • Log whatever you wish in whatever order you wish • Useful if you will read them regularly! • But they won’t work with most analyzers • Now in IIS v4, Netscape v3, others.
What you can learn from your log files • Hits per day • Domain origins • The path people take in and around your web • Problem areas
HITS • (How Idiots Track Success) • Nobody uses this word anymore • Doesn’t really measure individual users, just access • Caching servers and proxies mess up these statistics
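The gap between hits and visitors is easy to see with a small tally. This sketch takes already-parsed (host, day) pairs, with values borrowed from the sample access log, and reports raw hits next to distinct hosts per day. The function name `hits_and_hosts_per_day` is made up, and distinct hosts is still only a proxy for people: one proxy or cache can hide many users behind a single host.

```python
from collections import defaultdict

def hits_and_hosts_per_day(records):
    """Map each day to (total hits, number of distinct hosts)."""
    hits = defaultdict(int)
    hosts = defaultdict(set)
    for host, day in records:
        hits[day] += 1
        hosts[day].add(host)
    return {day: (hits[day], len(hosts[day])) for day in hits}

# (host, day) pairs modeled on the sample access log: one visitor
# fetching a redirect, a page, and an image counts as three hits.
records = [
    ("rm258.fav.usu.edu", "31/May/1995"),
    ("rm258.fav.usu.edu", "31/May/1995"),
    ("rm258.fav.usu.edu", "31/May/1995"),
    ("Leslie-Francis.tenet.edu", "31/May/1995"),
]
daily = hits_and_hosts_per_day(records)
print(daily)
```

Here four hits come from two hosts, which is why a hit count alone says little about audience size.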
Domain origins • Where users are coming from -- sometimes • Just because they are from ibm.net doesn’t mean they work at IBM! • Forgotten accounts, friends and family using the account • Hacked user names • Proxies don’t help here either
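A simple domain-origin report just groups visitors by the last label of the host name. The sketch below does that, and puts bare IP addresses (hosts the server could not resolve) in their own bucket. The bucket labels and the name `origin_bucket` are illustrative, and for the reasons on the slide the result shows where the connection came from, not who was at the keyboard.

```python
import ipaddress
from collections import Counter

def origin_bucket(host):
    """Return the top-level domain of a host name, or a label for bare IPs."""
    try:
        ipaddress.ip_address(host)
        return "(unresolved IP)"
    except ValueError:
        pass  # not an IP address, so treat it as a host name
    if "." not in host:
        return "(unknown)"
    return host.rsplit(".", 1)[-1].lower()

# Hosts taken from the sample access and error logs.
hosts = [
    "rm258.fav.usu.edu",
    "220.127.116.11",
    "ls973.ulib.albany.edu",
    "ppp092.kyoto-inet.or.jp",
    "dd10-048.compuserve.com",
]
origins = Counter(origin_bucket(h) for h in hosts)
print(dict(origins))
```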
The path people take in and around your web • Search engines help sometimes • Which search site was the most popular front door • Who links to you and why • Is there a pattern or a random walk?
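Search engines help because the referring URL often carries the query the visitor typed. This sketch pulls the search terms out of a referrer such as the Infoseek entry in the sample referral log. The parameter name `qt` comes from that sample; the other names in `QUERY_PARAMS` are assumptions about what other search sites might use, and `search_terms` is a hypothetical helper.

```python
from urllib.parse import urlsplit, parse_qs

# "qt" appears in the sample Infoseek referrer; "q" and "query" are
# assumed names that other search sites might use.
QUERY_PARAMS = ("qt", "q", "query")

def search_terms(referrer):
    """Return (search host, query text), or None if no query is found."""
    parts = urlsplit(referrer)
    params = parse_qs(parts.query)
    for name in QUERY_PARAMS:
        if name in params:
            return parts.hostname, params[name][0]
    return None

found = search_terms("http://guide-p.infoseek.com/NS/Titles?qt=teacher+training")
print(found)
```

Referrers without a recognized query parameter, such as an ordinary page that links to you, return `None` and can be counted separately as plain inbound links.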
Problem areas to deal with • Broken links (locally) • Broken outbound links • Time outs (sunspots?)
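Local broken links show up in the access log as 404 responses, so a short script can list the paths people asked for and did not get. This is a sketch that assumes the quoted-request-then-status layout from the sample access log; `REQ_RE` and `broken_paths` are illustrative names. It does not help with broken outbound links, which never reach your server.

```python
import re
from collections import Counter

# Matches the quoted request and the status code that follows it,
# e.g. "GET /viii1.html HTTP/1.0" 404 244
REQ_RE = re.compile(r'"\S+ (?P<path>\S+) [^"]*" (?P<status>\d{3}) ')

def broken_paths(lines):
    """Return (path, count) pairs for requests answered with 404."""
    missing = Counter()
    for line in lines:
        m = REQ_RE.search(line)
        if m and m.group("status") == "404":
            missing[m.group("path")] += 1
    return missing.most_common()

sample_lines = [
    'Leslie-Francis.tenet.edu [31/May/1995:09:36:06 +0600] "GET / HTTP/1.0" 200 1867',
    'ls973.ulib.albany.edu [31/May/1995:09:40:52 +0600] "GET /viii1.html HTTP/1.0" 404 244',
]
broken = broken_paths(sample_lines)
print(broken)
```

Pairing each missing path with the referral log would show which page carries the bad link.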
What you can’t learn from your logs • Who are these people, anyway? • No specific user names • Is it a bot or a real human? • How long did they view a page? • Most people don’t spend much time on your web • Where did they go visit next?
What technologies are available? • Built-in analyzer tools • Sites that capture user info • Secure sites with registration • Build your own from perl • Third-party tools
Built-in tools • WebSite, website.ora.com • IIS with Site Server, www.microsoft.com/iis • Netscape servers, www.netscape.com • Easy to use but limited
WebSite Professional v2 • Win NT, 95 • Best web server for learning about logs, best docs • QuickStats module for instant analysis: • single report but nice set of information • shows today, last two days requests and unique hosts • IP addresses of visitors, average requests/hour
IIS Site Server • NT Server v4 w/SP3 only • Lots of preconfigured reports • Two versions, Express and Full (customized reports) • backoffice.microsoft.com/products/siteserver/express/
Netscape v3 web servers • Various NT, Unix versions • Reports for a few variables but nothing too extensive • Best to use a third-party tool here
Sites that capture user info • WebCounter, www.digits.com -- third-party hit counter • Someone else does the programming and debugging • But beyond your control
Secure sites with registration • You know your users • But many won’t register, or forget their passwords • Requires scripting, database integration, more maintenance
Build your own from perl • Needs some in-house support • Works best with Unix-based webs • Examples: • refstats, members.aol.com/htmlguru/refstats.html • surfreport, bienlogic.com/SurfReport/
Third-party tools • WebTracker, www.CQMInc.com/webtrack • WebTrends, www.webtrends.com • net.Genesis, www.netgen.com • MarketWave, www.marketwave.com • IIS Assistant, www.go-iis.com
Third-party tools (cont’d) • Can make very pretty reports • Customizable • Make sure they support your particular log format • Not that expensive, mostly run on Windows