How Web Analytics Works Under The Hood
by Jay Neuman


This article is an excerpt from: The Complete Internet Marketer: A Practical
Guide To Everything You Need To Know About Marketing Online


Web Analytics is the software used to measure activity on your website.  In this
article, you will learn what Web Analytics is and how to use it.  Web Analytics
comes in two varieties: log file parsing and page tagging.  You will learn how
both work under the hood.


The ABC’s Of Log File Parsing

Log file parsing software dominated the Web Analytics market from 1995
through 2000.  During this period, all of the fundamental website metrics were
defined in terms of information that can be obtained from web server logs.  The
terminology developed at that time focused on the technical aspects of serving
web pages and tracking user sessions.  This technological (versus business)
focus set the tone for Web Analytics jargon, which continues to this day.

Web servers collect information about visits to your website in log files, called
web server logs.  Every time a visitor enters the website, an entry is recorded
in the log file.  Every time a link is clicked, another entry is recorded.  The web
server log provides you with a history of every click that happens on the site.

In its raw form, the log file is a big text file that does not mean much to look
at.  That is where Web Analytics comes in.  The software slices up the log file
into discrete pieces of meaningful information and stores them in a database.
Once the key information is in the database, the software is able to analyze it to
identify patterns in the data and generate reports.  The process of slicing up a
text file into meaningful chunks of information is called parsing.
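As an illustration of parsing, the Python sketch below slices a single log entry into named fields. It assumes the server writes the NCSA/Apache Combined Log Format, which is one common layout but not the only one; the sample entry is made up, and a real parser would have to match whatever format your server is configured to use.

```python
import re
from datetime import datetime

# Assumed layout (Combined Log Format):
# host ident authuser [timestamp] "request" status bytes "referrer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Split one log entry into a dict of fields, or return None if it doesn't match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # Convert text fields into types that a database or report can use.
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry

# A made-up example entry.
sample = ('203.0.113.7 - - [10/Oct/2006:13:55:36 -0700] '
          '"GET /products/index.html HTTP/1.1" 200 2326 '
          '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"')

record = parse_line(sample)
print(record["ip"], record["path"], record["status"], record["referrer"])
# prints: 203.0.113.7 /products/index.html 200 http://www.example.com/start.html
```

Each parsed record is one row that the analytics software could store in its database and later aggregate into reports.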

The chief advantage of log file parsing over page tagging is its ability to report
accurately on website diagnostics.  Diagnostic information, such as failed
page loads, is found in the web server logs.  Page tagging does not have access
to it.

Since it is reading information from your web server logs, log file parsing
software must be installed on-site.


Key Data Collected in the Web Server Log

The first step to understanding how log file parsing works is to look at the
information that is collected by your web server.  Figure 13.1 shows the most
important information collected by your web server.  These are the building
blocks of your reports.  

With these basic pieces of information, the Web Analytics software is able to
calculate a tremendous amount of information about your website visitors and
their activity on the website.  Some of the most important things log file parsing
allows you to accomplish are:

Provide basic site diagnostics, to keep your website functioning properly
Identify the sources and volume of web traffic, to target marketing efforts
Identify usage patterns, to optimize site content
Measure click-thru from marketing efforts, to increase ROI
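The first two goals on that list come down to counting log entries. A minimal sketch, assuming the log has already been parsed into records holding a page path and the HTTP status code (the records here are invented): status codes of 400 and above mark failed requests, such as a missing page (404) or a server error (500).

```python
from collections import Counter

# Hypothetical entries already parsed from a web server log.
entries = [
    {"path": "/index.html", "status": 200},
    {"path": "/products.html", "status": 200},
    {"path": "/old-page.html", "status": 404},
    {"path": "/old-page.html", "status": 404},
    {"path": "/checkout.cgi", "status": 500},
]

def failed_requests(entries):
    """Diagnostics: count failed requests (4xx client errors, 5xx server errors) by path."""
    return Counter(e["path"] for e in entries if e["status"] >= 400)

def requests_per_page(entries):
    """Traffic volume: count successful requests by path."""
    return Counter(e["path"] for e in entries if e["status"] < 400)

print(failed_requests(entries).most_common())
# [('/old-page.html', 2), ('/checkout.cgi', 1)]
```

A report of the most frequent failures points straight at broken links and faulty pages, which is exactly the kind of diagnostic data only the server log holds.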


Problems with Log File Parsing

Log file parsing provides a tremendous amount of information to support your Internet Marketing program.  However, there are some challenges that limit the
effectiveness of this method.  The four biggest issues are as follows.

1.        Dynamically Assigned IP Addresses

The first problem companies had with log file parsing was difficulty identifying
unique users.  The original method for identifying a user on the website was to
use the IP address of the user’s computer.  However, most people are not
connected directly to the Internet.  They connect through an Internet Service
Provider (ISP) such as AOL, so the IP address the web server sees belongs to
AOL, not to the actual user.  Additionally, the ISP may assign a different IP
address to the same user from one click to the next.  This is good for the ISP,
but it makes it very difficult to identify unique visitors on the website.

The solution to this problem was to make user cookies a standard practice for
websites.  When a user enters the website, a cookie is placed on his
computer.  Then, with each subsequent click on the site, the web log records
both the IP address and the cookie.  That allows all of the hits during the visit to
be associated with that specific user.  However, this method still does not work
for people who have disabled cookies on their computers.  For these cases, the
imperfect method of using IP addresses is all that can be done using the log file
information.
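The cookie-first, IP-fallback approach can be sketched in a few lines of Python. The hit records and the `visitor_key` helper are hypothetical; the point is that counting by IP address alone overstates visitors when an ISP hands the same person a new address, while the cookie ties those hits back together.

```python
# Hypothetical parsed hits: the client IP plus the visitor cookie value,
# or None when the browser has cookies disabled.
hits = [
    {"ip": "198.51.100.1", "cookie": "visitor-a"},
    {"ip": "198.51.100.2", "cookie": "visitor-a"},  # same person, new ISP address
    {"ip": "198.51.100.3", "cookie": "visitor-b"},
    {"ip": "198.51.100.9", "cookie": None},         # cookies disabled
]

def visitor_key(hit):
    """Prefer the cookie; fall back to the (unreliable) IP address."""
    if hit["cookie"]:
        return "cookie:" + hit["cookie"]
    return "ip:" + hit["ip"]

unique_by_ip = len({h["ip"] for h in hits})
unique_by_key = len({visitor_key(h) for h in hits})
print(unique_by_ip, unique_by_key)  # 4 3
```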

2.        Page Caching

The second problem encountered with log file parsing would prove to be even
more difficult than the first.  As soon as a company’s website becomes popular,
it starts to experience performance drains on its web servers.  During periods
of high traffic, customers may have to wait a long time for pages to be
displayed, because the server is also processing the requests of many other
customers at the same time.  To improve the performance of their websites,
companies started saving copies of the pages being served in temporary
storage, called a cache.  That way, if the same page is requested again, it is
served from the cache.  This results in tremendous performance gains, which
both improve user experience and save money, so page caching quickly became
an indispensable practice.  However, when a page is served from the cache, the
request never reaches the web server, so no entry is recorded in the log file.
Therefore, it is impossible to record site visits accurately when page caching is
being used.

3.        Outsourcing Web Analytics

The third challenge confronting companies using log file parsing was the desire
to have another company perform their analytics for them.  Web analytics is a
somewhat technical endeavor.  Not all companies are able to dedicate in-house
staff to it.  On the other hand, it is a fairly straightforward process that could
easily be done by an outside vendor.  However, with log file parsing, the
software must directly access the web server logs to work.  That means the
software must be installed in-house on the company’s web servers.  This makes
it difficult to outsource.

4.        Measuring Business Objectives

The fourth problem companies had using log file parsing was that it is difficult to
directly measure whether you are meeting your business objectives online.  The
log file records which pages are being served.  It does not necessarily tell you
what the customer was doing while they were on that page.  For example, you
can measure whether a sale took place on the website by checking to see if a
confirmation page was served.  But it is difficult to tell what they actually bought,
or how much they spent.  That information is not typically recorded in the log file.


The ABC’s Of Page Tagging

The second method of performing Web Analytics, page tagging, became the
method of choice for marketers after 2001.  Companies were still reeling from
the recession that followed the Dot-Com crash.  Many were looking for a pay-as-
you-go outsourcing solution for their Web Analytics.  Businesses were also
learning how to tie website activity more directly to their marketing objectives.  
They wanted a solution that reported marketing results rather than just the
technical activity on the website.

Page tagging allows companies to overcome the challenges experienced with log
file parsing.  With page tagging, you identify all of the actions you want to
measure on the website.  Then you put a small piece of programming code
(usually JavaScript) on every page where those actions occur.  This is called
tagging the page.  When an identified action occurs, the tag sends a message
to the Web Analytics software, which records the action in a database.  As with
log file parsing, analytics is then performed on the information in the database
to report on key site metrics.
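Real page tags are usually JavaScript running in the browser, so the Python sketch below only models the message such a tag might send and how the vendor's collection server could record it. The collector URL (`https://collector.example.com/hit`), the parameter names, and both helper functions are hypothetical, not any vendor's actual format.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

COLLECTOR = "https://collector.example.com/hit"  # hypothetical vendor endpoint

def build_beacon_url(visitor_id, page, action):
    """Browser side (modeled): the URL a tag might request when an action occurs."""
    return COLLECTOR + "?" + urlencode({"vid": visitor_id, "page": page, "action": action})

database = []  # stands in for the vendor's database

def record_beacon(url):
    """Collector side: decode the query string and store the action."""
    params = parse_qs(urlsplit(url).query)
    # parse_qs returns a list of values per key; keep the first one.
    row = {key: values[0] for key, values in params.items()}
    database.append(row)
    return row

url = build_beacon_url("visitor-a", "/products/widget.html", "pageview")
print(url)
# https://collector.example.com/hit?vid=visitor-a&page=%2Fproducts%2Fwidget.html&action=pageview
record_beacon(url)
```

In practice the browser-side half is a script that requests a URL like this one (often for a tiny invisible image), and the vendor's servers do the recording.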

Page tagging is typically offered as an outsourced solution.


Going Beyond Log File Parsing

Page tagging has some significant advantages over log file parsing.  For these
reasons it has become the method of choice for companies that are using Web
Analytics as a strategic tool to measure and increase the profitability of their
Internet Marketing programs.  Page tagging overcomes three of the four major
challenges faced by log file parsing.  Identifying unique users still relies on
cookies being enabled on the user’s computer.

1.        Overcomes Page Caching Limitations

With page tagging, the action is recorded by programming code on the web page
itself.  When the web page loads on the user’s computer, the script file runs and
records the identified actions.  This allows companies to overcome the problem
of caching web pages.  Whether a page is served from the web server or the
cache, it will still be recorded when it is loaded by the user’s browser.

Nevertheless, this method also has its drawbacks.  The data collected by page
tagging depends on the user’s browser running the script file contained in the
page tag.  This will fail for some percentage of users on the website, and those
users will be missing from the reported site metrics.  Users whose browsers do
run the page tag scripts, though, will be recorded accurately.  So, even though
there is missing data in the report, the trends reported should still be reliable,
as long as the share of users whose browsers fail to run the script stays
roughly constant.

2.        Enables Outsourcing

As important as overcoming the caching limitation is the ability to outsource Web
Analytics.  Page tagging sends information over the Internet to the Web
Analytics software.  One of the great things about the Internet is that the
software can be literally anywhere in the world.  That means you can add Web
Analytics to your company’s website without installing any software on your own
servers.  You just need to put the tags on your website and direct the output to your
Web Analytics vendor.  Their software will process the information and provide
all the reports for you.

3.        Measures Business Objectives

Since page tagging records actions occurring while a user is viewing the web
page, and not just the log file entry recorded when the page is requested, this
method is able to capture more information about the user’s visit.  You can
capture information entered into forms on the web page as well as data pulled
from a database into the page view.  Examples of the information you can
record with page tagging include:

Responses submitted in online forms
Items put into the shopping cart
Actions taking place within a Flash content element
Behavior occurring within a page view, such as scrolling down or
accessing an onsite utility
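Because a tag can send whatever values the page knows, such as the product and price on a cart or confirmation page, reports can answer what was bought and for how much, which the log file alone usually cannot. A minimal sketch, assuming hypothetical event records with `event`, `product`, and `price_cents` fields:

```python
from collections import defaultdict

# Hypothetical events sent by page tags; prices are in cents to avoid
# floating-point rounding when adding up money.
events = [
    {"visitor": "a", "event": "add_to_cart", "product": "widget", "price_cents": 1999},
    {"visitor": "a", "event": "purchase", "product": "widget", "price_cents": 1999},
    {"visitor": "b", "event": "add_to_cart", "product": "gadget", "price_cents": 500},
    {"visitor": "c", "event": "purchase", "product": "widget", "price_cents": 1999},
    {"visitor": "c", "event": "purchase", "product": "gadget", "price_cents": 500},
]

def revenue_by_product(events):
    """Total purchase revenue per product, in cents."""
    totals = defaultdict(int)
    for e in events:
        if e["event"] == "purchase":
            totals[e["product"]] += e["price_cents"]
    return dict(totals)

def abandoned_carts(events):
    """Visitors who added something to the cart but never purchased."""
    added = {e["visitor"] for e in events if e["event"] == "add_to_cart"}
    bought = {e["visitor"] for e in events if e["event"] == "purchase"}
    return added - bought

print(revenue_by_product(events))  # {'widget': 3998, 'gadget': 500}
print(abandoned_carts(events))     # {'b'}
```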


Problems with Page Tagging

It would be nice to live in a perfect world of clean data.  Unfortunately, there
are always tradeoffs.  As with log file parsing, page tagging has its
shortcomings.

The biggest shortcoming of log file parsing is caused by the source of
information used to generate reports.  Analytics is limited by what is captured in
the web server logs.  In the same way, the shortcomings of page tagging are
also caused by its source of data.  Page tagging only records information sent
from the user’s browser once a page loads.  There are two significant drawbacks:

1.        Missing Visits

The first drawback to page tagging was already discussed.  It relies on
information captured by a script file running while the page is active on the
user’s computer.  Therefore, it will be missing data from users with browsers
that fail to run the script file.

2.        Unable to Run Site Diagnostics

A second, more significant problem with page tagging is the inability to run
certain site diagnostics.  Page tagging can only report page loads in browsers
that successfully run the script file contained in the tag.  Therefore, it is
unable to record failed requests, such as broken links.  It also cannot provide
the complete picture of site traffic found in the web server logs.

Because of this drawback, it is not uncommon for companies to set up a basic
log file parsing solution to measure site diagnostics, while using page tagging to
measure their business objectives.


Website Traffic Metrics

You now know how the two methods of Web Analytics work.  These methods
both start with basic data coming from a user’s visit on your website.  That data
is then assembled into meaningful information that can be compiled into reports
measuring the success of your website.  The last step in understanding how
Web Analytics works is to look at the basic building blocks of a web traffic
report.  We conclude this chapter with a brief overview of the basic metrics
used to create Web Analytics reports.  In the next chapter, we will take a look at
how these building blocks can be assembled to create your website usage
reports.

1.        Hit

A hit is the very first metric used to measure website activity.  It is also the
simplest metric to calculate.  A hit is simply one entry in the web server log.  In
the very first websites, each web page might be no more than a simple HTML
page with text on it.  A page like this has no images or other files associated
with it, so each web page has only a single entry in the log.  That translates
into one hit for each page viewed on the website.

That quickly changed.  Today, there are very few web pages that contain
nothing except HTML code and text.  As we’ve seen above, you may have
pictures, graphic images, movies or other media on a single web page.  Each
one of these will record a separate entry in the log file.  Therefore, each time a
page is viewed, there will be many “hits” recorded in the log.  For this reason, a
hit is not really a useful metric any longer.

2.        Page View

A page view is one complete web page loaded to a user’s browser.  In the web
server log, a page view consists of the HTML file for the web page plus all the
graphics and other files associated with that page.  A page view is made up of
one or more hits.
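One simple way to turn hits into page views is to count only the requests for the pages themselves. The sketch below assumes page requests can be told apart from embedded files by file extension; the extension list and sample paths are illustrative, and real tools use configurable rules.

```python
import posixpath

# Extensions treated as embedded files rather than pages (an assumption; adjust per site).
EMBEDDED_EXTENSIONS = {".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".swf"}

def is_page_view(path):
    """True if the request looks like a page rather than an embedded file."""
    path = path.split("?", 1)[0]  # ignore any query string
    extension = posixpath.splitext(path)[1].lower()
    return extension not in EMBEDDED_EXTENSIONS

# Hypothetical requests logged while one visitor loaded two pages.
hit_paths = [
    "/index.html", "/images/logo.gif", "/images/banner.jpg", "/style.css",
    "/products.html?id=42", "/images/widget.png",
]

hits = len(hit_paths)
page_views = sum(1 for p in hit_paths if is_page_view(p))
print(hits, page_views)  # 6 2
```

Six hits shrink to two page views, which is why hits overstate activity on image-heavy pages.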

3.        Visit / Session

The words visit and session are used interchangeably.  Both refer to all of the
pages viewed by a single user at one sitting.  The session is identified by finding
all of the hits for a given user that occur within a specified period of time from
each other.  Typically, a half hour is used as the cutoff.  In other words, a
session is calculated by stringing together all of the hits for a given user, where
each hit occurs no more than 30 minutes after the one immediately before it.
The result is a complete session.
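The 30-minute rule translates directly into code. A minimal sketch for a single visitor's hits, assuming they have already been attributed to that visitor (for example by cookie); `sessionize` and the sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # the typical cutoff

def sessionize(hits, timeout=SESSION_TIMEOUT):
    """Group one visitor's (timestamp, path) hits into sessions.

    A new session starts whenever the gap since the previous hit exceeds `timeout`.
    """
    sessions = []
    previous_time = None
    for timestamp, path in sorted(hits):
        if previous_time is None or timestamp - previous_time > timeout:
            sessions.append([])  # gap too long (or first hit): start a new session
        sessions[-1].append((timestamp, path))
        previous_time = timestamp
    return sessions

t0 = datetime(2006, 10, 10, 9, 0)
visitor_hits = [
    (t0, "/index.html"),
    (t0 + timedelta(minutes=10), "/products.html"),
    (t0 + timedelta(minutes=35), "/cart.html"),   # 25 minutes later: same session
    (t0 + timedelta(hours=3), "/index.html"),     # 2h25m later: new session
]

for s in sessionize(visitor_hits):
    print([path for _, path in s])
# ['/index.html', '/products.html', '/cart.html']
# ['/index.html']
```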

4.        Unique Visitor

A unique visitor is a visitor to the website who can be uniquely identified.  That
way, if the same visitor returns multiple times, you can measure his activity over
time.  Unique visitors are typically identified by the user cookie.  As discussed
above, the older method of using the IP address is not a reliable method for
measuring unique visitors.  A single unique visitor may actually be several
people: when a family or multiple employees at a company use the same
computer, they will all have the same user cookie.

5.        Authenticated User

If the user is required to log in to the website at the start of the visit, they
become an authenticated user.

6.        Referring URL

The referring URL is the address of the web page containing the link that sent a
visitor to your website.  If the user types your URL directly into her web browser,
she will have no referring URL.  These visits are sometimes called walk-ins.
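Grouping visits by referring URL is how traffic-source reports are built. A minimal sketch, assuming the Combined Log Format convention of writing `-` when there is no referrer; the `traffic_source` helper, the `own_domain` parameter, and the labels are illustrative.

```python
from urllib.parse import urlsplit

def traffic_source(referrer, own_domain):
    """Classify a hit by its referring URL."""
    if referrer in ("", "-"):
        return "walk-in"   # no referrer, e.g. the URL was typed in directly
    host = urlsplit(referrer).hostname or ""
    if host == own_domain or host.endswith("." + own_domain):
        return "internal"  # a link from another page on your own site
    return host            # an external referring site

print(traffic_source("-", "example.com"))                                     # walk-in
print(traffic_source("http://www.google.com/search?q=widgets", "example.com"))  # www.google.com
print(traffic_source("http://www.example.com/products.html", "example.com"))    # internal
```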

7.        Entry Page

The first page in a visit is called the entry page.

8.        Exit Page

The last page in a visit is called the exit page.
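Once hits have been grouped into sessions, entry and exit pages fall out directly. A sketch assuming sessions are already available as time-ordered lists of page paths (the sample data is made up):

```python
from collections import Counter

# Hypothetical sessions: each is a time-ordered list of page paths.
sessions = [
    ["/index.html", "/products.html", "/cart.html"],
    ["/index.html"],
    ["/products.html", "/contact.html"],
]

# First page of each visit = entry page; last page = exit page.
entry_pages = Counter(session[0] for session in sessions if session)
exit_pages = Counter(session[-1] for session in sessions if session)

print(entry_pages.most_common(1))  # [('/index.html', 2)]
print(exit_pages)
```

A visit with only one page view counts as both the entry page and the exit page.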


Web Analytics solutions come in many varieties.  There are solutions for small
businesses that provide basic reporting at a low cost.  There are also solutions
for large businesses that provide in-depth, customized reporting and analysis at
a much higher cost.  Whatever size business you have, there is a Web Analytics
solution for you.



==========================

Since 1994, Jay Neuman has been helping businesses as varied as Fortune 500
companies, startup Dot-Coms and nonprofit organizations overcome their
Internet Marketing and Database Marketing challenges.  Jay is currently Sole
Proprietor of the KnExT Consulting Group (www.knextconsulting.com).

He can be reached at jay.neuman@knextconsulting.com.