Web Caching Proxy Servers:
What are they?
Why should librarians care?
How are they installed and operated?
by
Michael J. Dargan
21:248 - Information Literacy
September 23, 2002

What does a caching proxy do?

Why should librarians know about caching proxies?

How can librarians find information about caching proxies?

How are web caching proxies installed and operated?

Hardware
Operating Systems
Proxy Applications
Example:  Apache 1.3.x on Windows NT Workstation

What does a caching proxy do?
Caching proxies serve several functions in libraries.  Most importantly, they reduce latency, optimize bandwidth use, and allow traffic to be monitored for control or statistical purposes.  Latency, the time that passes between the request for a page and its appearance in the browser, is reduced because much of the information is read from the proxy, which resides on the local area network, rather than fetched across the Internet.  Modern LANs operate at 100 Mbps, while a T1 line has a capacity of about 1.5 Mbps.  Theoretically, at least, files retrieved from a local source can arrive at a rate roughly 65 times greater than files retrieved from the Internet.  This advantage will vary, of course, with congestion inside and outside the network.  The second advantage, optimized bandwidth, permits more efficient use not only of the local connection but of every other line between the local web browser and the host server.  Why is this so?  Because a properly sized and configured proxy can achieve a cache hit ratio of up to 80%, although a more reasonable goal falls in the 50-60% range.  In a normal caching proxy situation, therefore, not only is latency reduced, but only about half as much traffic occupies the Internet connection.  Yet another useful quality of caching proxies is their ability to log the files requested by each machine using the proxy service.  This allows the proxy administrator to control and measure access to WWW sites.
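The arithmetic behind these figures can be checked with a short sketch.  It assumes the nominal T1 rate of 1.544 Mbps (rounded to 1.5 in the text) and treats the cache hit ratio as the share of traffic, by volume, served from the cache; the function names are illustrative only.

```python
# Rough arithmetic for LAN-versus-T1 speed and cache savings.
# Assumptions: nominal link rates, no congestion or protocol overhead,
# and a hit ratio measured by traffic volume rather than request count.

LAN_MBPS = 100.0   # Fast Ethernet LAN
T1_MBPS = 1.544    # nominal T1 line rate


def speed_ratio(local_mbps, wan_mbps):
    """How many times faster the local link is than the Internet link."""
    return local_mbps / wan_mbps


def transfer_seconds(size_megabytes, link_mbps):
    """Ideal time to move a file: megabytes * 8 bits / megabits per second."""
    return size_megabytes * 8 / link_mbps


def internet_traffic_fraction(hit_ratio):
    """Share of traffic that still has to cross the Internet connection."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return 1.0 - hit_ratio


if __name__ == "__main__":
    print("LAN vs. T1 speed ratio:", round(speed_ratio(LAN_MBPS, T1_MBPS)))
    print("1 MB over the LAN (s):", transfer_seconds(1, LAN_MBPS))
    print("1 MB over a T1 (s):", round(transfer_seconds(1, T1_MBPS), 2))
    print("Traffic left on the T1 at a 50% hit ratio:",
          internet_traffic_fraction(0.5))
```

Under these idealized assumptions the ratio comes out at about 65, and a 50% hit ratio leaves half of the traffic on the Internet connection.  Real results depend on congestion and on how cacheable the requested pages are.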

Why should a librarian care about proxies?
Librarians drive cars without knowing or caring about fuel injectors or gas gauges.  They also watch television without understanding cathode ray tubes.  Aside from knowing that proxies are useful, why should a librarian not leave proxy management to the technicians and concentrate on performing traditional functions?  Librarians should know about caching proxy servers so that they can avoid giving still more control of the library environment to non-professionals who do not share their concerns for patron privacy and service.  By definition, a proxy handles all of the traffic between the LAN and the WWW, and whoever controls the device can monitor and record every file requested by each specific machine.  Librarians would not allow the privacy of patron circulation records to be placed in the hands of others, and allowing non-professional network technicians to become custodians of patron surfing habits is just as problematic.  Furthermore, the proxy manager controls what content is accessible from the LAN; most caching proxies can limit access to selected sites only, or can use a no-access list that transparently blocks particular websites.  Less important reasons for understanding caching proxies include the ability to get the most from limited bandwidth and the opportunity to evaluate traffic logs to determine the extent of use and the most popular sites.  Finally, having a proxy sit between the Internet and the less sophisticated workstations on the LAN (especially Windows 9x machines) adds another layer of security.
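To make the access-control point concrete, here is a minimal sketch of the kind of decision a proxy makes for each request.  It is not the logic of Apache, Squid, or any other product; the function names, the list formats, and the example hosts are all assumptions chosen for illustration.

```python
from urllib.parse import urlsplit


def host_matches(host, pattern):
    """True if host equals the pattern or is a subdomain of it."""
    host = host.lower().rstrip(".")
    pattern = pattern.lower().lstrip(".")
    return host == pattern or host.endswith("." + pattern)


def is_allowed(url, allow_list=None, deny_list=()):
    """Decide whether a requested URL may pass through the proxy.

    deny_list:  hosts that are always blocked (the "no access" list).
    allow_list: if given, only these hosts are reachable at all.
    """
    host = urlsplit(url).hostname or ""
    if any(host_matches(host, p) for p in deny_list):
        return False
    if allow_list is not None:
        return any(host_matches(host, p) for p in allow_list)
    return True


if __name__ == "__main__":
    # Hypothetical hosts, used only to exercise the function.
    print(is_allowed("http://www.example.org/page.html",
                     deny_list=["example.org"]))
    print(is_allowed("http://catalog.example.edu/",
                     allow_list=["example.edu"]))
```

Whoever edits these two lists decides what patrons can reach, which is why control of the proxy is a professional concern and not only a technical one.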

How are proxies installed and operated?
Hardware:  When it comes to hardware, size, power, and speed are always desirable, but after a certain point overkill sets in.  It is fair to say that any Pentium-class processor can probably handle enough traffic to saturate a T1 line, so any contemporary PC probably has enough power to act as a proxy for a T1 connection.  More important than buying the most powerful machine with high-end sound and graphics is acquiring a stable machine that will run reliably with little attention for many years.  Usually, the best place for overkill on a proxy is memory: the more the better, as cache access from RAM is far faster than from a drive.  Finally, keep in mind that if the proxy goes down, the LAN's WWW access goes down with it.

Operating Systems:  Most proxy servers run on UNIX or a UNIX-like system (e.g., Linux or FreeBSD).  However, the Windows operating system can also host a proxy.  Most of the librarians reading this material will probably want to use either Linux or Windows.  For those who have never used Linux, any version of Windows since 95 will do, but for security reasons the Windows machine should operate behind a firewall.

Proxy Software:  Large, complex networks might be wise to go with a turnkey commercial product such as Microsoft ISA Server.  For those who want a few bells and whistles but have limited budgets, shareware products may be sufficient.  For those willing to "roll their own," Apache and Squid, both originally developed for UNIX, have been ported to Windows and scale well as needs and administrator sophistication grow.  In any case, those choosing a solution, whether freeware, shareware, or retail product, would be wise to consult software reviews in trade publications for assessments.

Example proxy installation: Apache 1.3.x on Windows NT Workstation with the Analog analyzer:  There are hundreds of possible combinations of operating systems and caching proxy applications.  For the sophisticated systems librarian, the most common solution is some form of Linux box hosting Squid.  However, while it is possible for a novice to install and successfully administer such a machine, these instructions recognize that some users would prefer to use an existing operating system and the simplest possible proxy.  Therefore, we will look at how to install Apache with the proxy module on a Windows NT 4.0 workstation.  These instructions should also work for any Windows 9x machine.
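A log analyzer such as Analog works from the proxy's access log.  As a rough idea of what such a summary involves, and not a description of how Analog itself works, the sketch below counts requests per site.  It assumes log lines in the Common Log Format with the full URL in the request line, and it deliberately discards the client address so that the tally says nothing about individual workstations.  The function name and the sample lines are invented for illustration.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Common Log Format: host ident user [date] "METHOD URL PROTOCOL" status bytes
# The client host is matched but not captured, so it is never stored.
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]*\] "(\S+) (\S+)[^"]*" (\d{3}) \S+'
)


def popular_sites(lines, top=3):
    """Return the most requested hosts as (host, request_count) pairs."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        host = urlsplit(match.group(2)).hostname
        if host:
            counts[host] += 1
    return counts.most_common(top)


if __name__ == "__main__":
    # Invented sample lines, not taken from a real log.
    sample = [
        '192.0.2.10 - - [23/Sep/2002:10:15:00 -0500] '
        '"GET http://www.example.org/index.html HTTP/1.0" 200 5120',
        '192.0.2.11 - - [23/Sep/2002:10:15:04 -0500] '
        '"GET http://catalog.example.edu/search HTTP/1.0" 200 2048',
        '192.0.2.10 - - [23/Sep/2002:10:15:09 -0500] '
        '"GET http://www.example.org/news.html HTTP/1.0" 304 -',
    ]
    for host, count in popular_sites(sample):
        print(host, count)
```

Summaries of this kind give the traffic statistics a library needs without keeping a per-machine record of what was requested.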
Typical steps for installation:


Web Caching Proxy Resources:

Analog Log Analyzer
Analog Mail List
Apache Bibliography
Apache Bug Report Page
Apache HTTP users list
Apache Week
comp.infosystems.www.servers.unix
comp.infosystems.www.servers.ms-windows
LITA Institute
Microsoft Exchange Server
NT Systems Librarians
Proxy Web Servers in Libraries
Squid
Systems Librarians
Tucows (software source)
web4lib


Please submit reactions, questions, suggestions to Michael J. Dargan
September 23, 2002