What are they? Why should librarians care? How are they installed and operated?
by Michael J. Dargan
21:248 - Information Literacy
September 23, 2002
What does a caching proxy do?
Caching proxies serve several functions in libraries. Most importantly,
they reduce latency, optimize bandwidth use, and allow traffic to be
monitored for control or statistical purposes. Latency, the time
that passes between the request for a page and its appearance in the browser,
is reduced because much of the information is read from the proxy, which
resides on the local area network, rather than from the Internet. Modern
LANs operate at 100 Mbps, while a T1 has a capacity of about 1.5 Mbps.
Theoretically, at least, files retrieved from a local source can arrive
roughly 65 times faster than files retrieved from the Internet. This advantage
will vary, of course, with congestion inside and outside the network.
The second advantage, optimized bandwidth, permits more efficient use
not only of the local connection but of every other line
between the local web browser and the host server. Why is this so?
Because a properly sized and configured proxy can have a cache hit ratio
of up to 80%, although a more reasonable goal falls in the 50-60% range.
In a normal caching proxy situation, therefore, not only is latency reduced
but only about half as much traffic occupies the Internet connection.
Another useful quality of caching proxies is their ability to
log the files requested by each machine using the proxy service. This allows
the proxy administrator to control and measure access to WWW sites.
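
The arithmetic behind these figures can be checked with a few lines of
Python. This is only a rough sketch: the function names are made up for
illustration, the T1 rate is taken as 1.544 Mbps (which the text rounds to 1.5),
and every request is assumed to be the same size, so the hit ratio is treated
as the share of traffic kept off the Internet connection.

```python
# Nominal link speeds in megabits per second.
LAN_MBPS = 100.0   # LAN speed quoted in the text
T1_MBPS = 1.544    # T1 line rate (rounded to 1.5 in the text)


def speed_ratio(lan_mbps: float, wan_mbps: float) -> float:
    """Return how many times faster the LAN is than the Internet link."""
    return lan_mbps / wan_mbps


def wan_traffic_fraction(hit_ratio: float) -> float:
    """Return the fraction of requests that still cross the Internet link.

    Simplifying assumption: all requests are the same size, so a request
    served from the cache saves exactly one request's worth of traffic.
    """
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return 1.0 - hit_ratio


if __name__ == "__main__":
    print(f"LAN vs. T1: about {speed_ratio(LAN_MBPS, T1_MBPS):.0f} times faster")
    for hit_ratio in (0.5, 0.6, 0.8):
        remaining = wan_traffic_fraction(hit_ratio)
        print(f"hit ratio {hit_ratio:.0%}: {remaining:.0%} of requests use the T1")
```

In practice large files and uncacheable pages skew the result, so the real
savings in bytes are usually somewhat lower than the request hit ratio suggests.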
Why should a librarian care about proxies?
Librarians drive cars without knowing or caring about fuel injectors
or gas gauges, and they watch television without understanding
cathode ray tubes. Aside from knowing that proxies are useful, why
shouldn't a librarian leave proxy management to the technicians and
concentrate on performing traditional functions? Librarians should
know about caching proxy servers so that they can avoid giving still more
control of the library environment to non-professionals who do not share
their patron privacy and service concerns. By definition, a proxy handles
all of the traffic between the LAN and the WWW, and whoever controls the
device can monitor and record every file requested by each specific machine.
Librarians would not allow patron circulation records
to be placed in the hands of others, and allowing non-professional network
technicians to become custodians of patron surfing habits is just as problematic.
Furthermore, the proxy manager controls what content is accessible from the
LAN: most caching proxies can limit access to selected sites only,
or can maintain a no-access list that transparently blocks particular
websites. Less important reasons for understanding caching proxies include
the ability to get the most from limited bandwidth and
the opportunity to evaluate traffic logs to determine the extent of use
and the most popular sites. Finally, having a proxy sitting between
the Internet and the less sophisticated workstations on the LAN (especially
Windows 9x machines) adds another layer of security.
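
To make the logging point concrete, here is a minimal Python sketch of the
kind of tally anyone with access to the proxy log can produce. It assumes the
proxy writes its access log in Common Log Format with the full URL in the
request line; the sample lines, addresses, and the `tally` function are invented
for illustration and do not come from a real log.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Common Log Format: client ident user [time] "request" status bytes
LOG_LINE = re.compile(
    r'^(?P<client>\S+) \S+ \S+ \[[^\]]*\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)


def tally(lines):
    """Count proxied requests per destination site and per client machine."""
    sites = Counter()
    clients = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        host = urlsplit(match.group("url")).hostname
        if host is None:
            continue  # not a proxied request (no absolute URL)
        sites[host] += 1
        clients[match.group("client")] += 1
    return sites, clients


# Hypothetical sample entries, using documentation-only addresses and hosts.
SAMPLE = [
    '192.0.2.11 - - [23/Sep/2002:10:15:02 -0500] '
    '"GET http://www.example.org/ HTTP/1.0" 200 5120',
    '192.0.2.12 - - [23/Sep/2002:10:15:09 -0500] '
    '"GET http://www.example.org/catalog.html HTTP/1.0" 200 2048',
    '192.0.2.11 - - [23/Sep/2002:10:16:40 -0500] '
    '"GET http://www.example.com/news.html HTTP/1.0" 304 -',
]

if __name__ == "__main__":
    sites, clients = tally(SAMPLE)
    print("Most requested sites:", sites.most_common())
    print("Requests per machine:", clients.most_common())
```

The per-site counts are the harmless statistics. The per-machine counts, and
the raw lines behind them, are the record of patron activity that raises the
privacy concern described above.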
How are proxies installed and operated?
Hardware: When it comes to hardware, size,
power, and speed are always desirable, but after a certain point
overkill sets in. It is fair to say that any Pentium-class processor
can probably handle enough traffic to saturate a T1 line, so any
contemporary PC probably has enough power to
act as a proxy for a T1 connection. More important than buying the
most powerful machine with high-end sound and graphics is acquiring
a stable machine that will run reliably with little attention for many
years. Usually, the best place for overkill on a proxy is memory: the
more the better, as cache access from RAM is far faster than from a drive.
Finally, keep in mind that if the proxy goes down, the LAN's
WWW access goes down with it.
Operating Systems: Most proxy servers
run on UNIX or a UNIX-like system (e.g., Linux or FreeBSD).
However, the Windows operating system can also be used to host a proxy.
Most of the librarians reading this material will probably want to use
either Linux or Windows as their operating system. For those who
have never used Linux, any version of Windows
since 95 will do, but for security reasons Windows machines should
operate behind a firewall.
Proxy Software: Large, complex networks might
be wise to go with a turnkey product such as Microsoft
Exchange Server. For those who want a few bells and whistles
but have limited budgets, copyrighted
products may be sufficient. For those willing to "roll their own,"
Apache and Squid, both originally developed for UNIX, have been ported to
Windows and scale well as needs
and administrator sophistication grow. Whatever solution is chosen,
whether freeware, shareware, or a retail product, users
would be wise to consult software reviews in trade publications for assessments.
Example proxy installation: Apache 1.3.x on Windows NT Workstation with
the Analog analyzer: There are literally hundreds of possible combinations
of operating systems and caching proxy applications. For the sophisticated
systems librarian, the most common solution is some form of Linux box hosting
Squid. However, while it is possible for a novice to install and
successfully administer such a machine, these instructions recognize that
some users would prefer to use an existing operating system and the simplest
possible proxy. Therefore, we will look at how to install Apache
with the proxy module on a Windows NT 4.0 workstation. These instructions
should also work for any Windows 9x machine.
Typical steps for installation:
| Analog Log Analyzer |
| Analog Mail List |
| Apache Bibliography |
| Apache Bug Report Page |
| Apache HTTP users list |
| Apache Week |
| comp.infosystems.www.servers.unix |
| comp.infosystems.www.servers.ms-windows |
| LITA Institute |
| Microsoft Exchange Server |
| NT Systems Librarians |
| Proxy Web Servers in Libraries |
| Squid |
| Systems Librarians |
| Tucows (software source) |
| web4lib |