SQUID Frequently Asked Questions

© 2004 Team Squid

Frequently Asked Questions (with answers!) about the Squid Internet Object Cache software.


About Squid, this FAQ, and other Squid information resources

What is Squid?

Squid is a high-performance proxy caching server for web clients, supporting FTP, gopher, and HTTP data objects. Unlike traditional caching software, Squid handles all requests in a single, non-blocking, I/O-driven process. Squid keeps meta data and especially hot objects cached in RAM, caches DNS lookups, supports non-blocking DNS lookups, and implements negative caching of failed requests. Squid supports SSL, extensive access controls, and full request logging. By using the lightweight Internet Cache Protocol, Squid caches can be arranged in a hierarchy or mesh for additional bandwidth savings.

Squid consists of a main server program Squid is derived from the ARPA-funded .

What is Internet object caching?

Internet object caching is a way to store requested Internet objects (i.e., data available via the HTTP, FTP, and gopher protocols) on a system closer to the requesting site than to the source. Web browsers can then use the local Squid cache as a proxy HTTP server, reducing access time as well as bandwidth consumption.

Why is it called Squid?

Harris' Lament says, ``All the good ones are taken.''

We needed to distinguish this new version from the Harvest cache software. Squid was the code name for initial development, and it stuck.

What is the latest version of Squid?

Squid is updated often; please see for the most recent versions.

Who is responsible for Squid?

Squid is the result of efforts by numerous individuals from the Internet community. of the National Laboratory for Applied Network Research (funded by the National Science Foundation) leads code development. Please see for a list of our excellent contributors.

Where can I get Squid?

You can download Squid via FTP from or one of the many worldwide .

Many sushi bars also have Squid.

What Operating Systems does Squid support?

The software is designed to operate on any modern Unix system, and is known to work on at least the following platforms: Linux, FreeBSD, NetBSD, OpenBSD, BSDI, Mac OS/X, OSF/Digital Unix/Tru64, IRIX, SunOS/Solaris, NeXTStep, SCO Unix, AIX, and HP-UX.

For more specific information, please see . If you encounter any platform-specific problems, please let us know by registering an entry in our .

Does Squid run on Windows?

Starting from 2.5, official Squid sources will development environment.

A more complete native Windows port is provided by . Guido Serassio maintains the native NT port of Squid and is actively working on having the needed changes integrated into the standard Squid distribution. It is partially based on an earlier NT port by Romeo Anghelache. The Cygwin port reportedly also runs on Windows 9x/ME, but these are not recommended as server operating systems.

What Squid mailing lists are available?

    squid-users@squid-cache.org: general discussions about the Squid cache software. Subscribe via , and also at .
    squid-users-digest: digested (daily) version of the above. Subscribe via
    squid-announce@squid-cache.org: a receive-only list for announcements of new versions. Subscribe via

We also have a few other mailing lists which are not strictly Squid-related.

I can't figure out how to unsubscribe from your mailing list.

All of our mailing lists have ``-subscribe'' and ``-unsubscribe'' addresses that you must use for subscribe and unsubscribe requests. To unsubscribe from the squid-users list, you send a message to

What other Squid-related documentation is available?

    for information on the Squid software to be published by January 2004.
    gives information on our operational mesh of caches.
    (uh, you're reading it).
    Squid documentation in , , , , and another in .
    Yeah, it's extremely incomplete. I assure you this is the most recent version.
    ICPv2 -- Protocol
    ICPv2 -- Application

Does Squid support SSL/HTTPS/TLS?

As of version 2.5, Squid can terminate SSL connections. This is perhaps only useful in a surrogate (HTTP accelerator) configuration. You must run configure with

Squid also supports these encrypted protocols by ``tunnelling'' traffic between clients and servers. In this case, Squid can relay the encrypted bits between a client and a server.

Normally, when your browser comes across an The browser opens an SSL connection directly to the origin server. The browser tunnels the request through Squid with the

The and (expired).

What's the legal status of Squid?

Squid is by the University of California San Diego. Squid uses some .

Squid is .

Squid is licensed under the terms of the .

Is Squid year-2000 compliant?

We think so. Squid uses the Unix time format for all internal time representations. Potential problem areas are in printing and parsing other time representations. We have made the following fixes in to address the year 2000: timestamps use 4-digit years instead of just 2 digits.

Year-2000 fixes were applied to the following Squid versions:

    : Year parsing bug fixed for dates in the "Wed Jun 9 01:29:59 1993 GMT" format (Richard Kettlewell).
    squid-1.1.22: Fixed likely year-2000 bug in ftpget's timestamp parsing (Henrik Nordstrom).
    squid-1.1.20: Misc fixes (Arjan de Vet).
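The date format named in the list above carries a full four-digit year, so it can be parsed without guessing a century. The following is only an illustrative Python sketch of that idea, not Squid's own parser; the helper name parse_asctime_gmt is made up for this example.

```python
import calendar
import time


def parse_asctime_gmt(text):
    """Parse a 'Wed Jun 9 01:29:59 1993 GMT' style date into Unix time."""
    # %Y only accepts a four-digit year, so no century has to be guessed.
    parsed = time.strptime(text, "%a %b %d %H:%M:%S %Y GMT")
    # timegm() interprets the broken-down time as UTC, matching the GMT suffix.
    return calendar.timegm(parsed)


stamp = parse_asctime_gmt("Wed Jun 9 01:29:59 1993 GMT")
print(stamp, time.gmtime(stamp).tm_year)
```

Because the result is an ordinary Unix timestamp, it fits the internal time representation described earlier in this answer.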

Patches: . If you are still running 1.1.X, then you should apply this patch to your source and recompile. . .

Squid-2.2 and earlier versions have a . This is not strictly a Year-2000 bug; it would happen on the first day of any year.

Can I pay someone for Squid support?

Yep. Please see the .

Squid FAQ contributors

The following people have made contributions to this document:

Please send corrections, updates, and comments to: .

About This Document

This document is copyrighted (2000) by Duane Wessels.

This document was written in SGML and converted with the .

The most current version of this document can always be found at in HTML, Plain Text, Postscript and SGML formats.

Want to contribute? Please write in SGML...

It is easier for us if you send us text which is close to "correct" SGML. The SQUID FAQ currently uses the LINUXDOC DTD. It's probably easiest to follow the examples in this file. Here are the basics:

Use the <url> tag for links, instead of HTML <A HREF ...>:

    <url url="http://www.squid-cache.org" name="Squid Home Page">

Use <em> for emphasis, config options, and pathnames:

    <em>/usr/local/squid/etc/squid.conf</em>
    <em/cache_peer/

Here is how you do lists:

    <itemize>
    <item>foo
    <item>bar
    </itemize>

Use <verb>, just like HTML's <PRE> to show unformatted text.

Getting and Compiling Squid

You must download a source archive file of the form squid-x.y.z-src.tar.gz (e.g., squid-1.1.6-src.tar.gz) from , or. . Context diffs are available for upgrading to new versions. These can be applied with the ).

How do I compile Squid?

For

    % tar xzf squid-1.1.21-src.tar.gz
    % cd squid-1.1.21
    % make

For

    % tar xzf squid-2.0.RELEASE-src.tar.gz
    % cd squid-2.0.RELEASE
    % ./configure
    % make

What kind of compiler do I need?

To compile Squid, you will need an ANSI C compiler. Almost all modern Unix systems come with pre-installed compilers which work just fine. The old If you are uncertain about your system's C compiler, The GNU C compiler is available at . In addition to gcc, you may also want or need to install the

What else do I need to compile Squid?

You will need installed on your system.

Do you have pre-compiled binaries available?

The developers do not have the resources to make pre-compiled binaries available. Instead, we invest effort into making the source code very portable. Some people have made binary packages available. Please see our .

The site has pre-compiled packages for SGI IRIX.

Squid binaries for .

Squid binaries for

Gurkan Sengun has some available.

Squid binaries for .

How do I apply a patch or a diff?

You need the

    cd squid-2.5.STABLE3
    mkdir ../squid-2.5.STABLE4
    find . -depth -print | cpio -pdv ../squid-2.5.STABLE4
    cd ../squid-2.5.STABLE4
    patch -p1 < /tmp/squid-2.5.STABLE3-STABLE4.diff

or alternatively

    cp -rl squid-2.5.STABLE3 squid-2.5.STABLE4
    cd squid-2.5.STABLE4
    zcat /tmp/squid-2.5.STABLE3-STABLE4.diff.gz | patch -p1

After the patch has been applied, you must rebuild Squid from the very beginning, i.e.:

    make distclean
    ./configure ...
    make
    make install

If your , for example. The configure script can take numerous options. The most useful is /usr/local/squid/. To change the default, you could do:

    % cd squid-x.y.z
    % ./configure --prefix=/some/other/directory/squid

Type

    % ./configure --help

to see all available options. You will need to specify some of these options to enable or disable certain features. Some options which are used often include:

    --prefix=PREFIX                 install architecture-independent files in PREFIX [/usr/local/squid]
    --enable-dlmalloc[=LIB]         Compile & use the malloc package by Doug Lea
    --enable-gnuregex               Compile GNUregex
    --enable-splaytree              Use SPLAY trees to store ACL lists
    --enable-xmalloc-debug          Do some simple malloc debugging
    --enable-xmalloc-debug-trace    Detailed trace of memory allocations
    --enable-xmalloc-statistics     Show malloc statistics in status page
    --enable-carp                   Enable CARP support
    --enable-async-io               Do ASYNC disk I/O using threads
    --enable-icmp                   Enable ICMP pinging
    --enable-delay-pools            Enable delay pools to limit bandwidth usage
    --enable-mem-gen-trace          Do trace of memory stuff
    --enable-useragent-log          Enable logging of User-Agent header
    --enable-kill-parent-hack       Kill parent on shutdown
    --enable-snmp                   Enable SNMP monitoring
    --enable-cachemgr-hostname[=hostname]
                                    Make cachemgr.cgi default to this host
    --enable-arp-acl                Enable use of ARP ACL lists (ether address)
    --enable-htcp                   Enable HTCP protocol
    --enable-forw-via-db            Enable Forw/Via database
    --enable-cache-digests          Use Cache Digests, see http://www.squid-cache.org/Doc/FAQ/FAQ-16.html
    --enable-err-language=lang      Select language for Error pages (see errors dir)

undefined reference to __inet_ntoa

by and .

Probably you've recently installed bind 8.x. There is a mismatch between the header files and DNS library that Squid has found. There are a couple of things you can try.

First, try adding src/Makefile. If If that doesn't seem to work, edit your arpa/inet.h file and comment out the following:

    #define inet_addr               __inet_addr
    #define inet_aton               __inet_aton
    #define inet_lnaof              __inet_lnaof
    #define inet_makeaddr           __inet_makeaddr
    #define inet_neta               __inet_neta
    #define inet_netof              __inet_netof
    #define inet_network            __inet_network
    #define inet_net_ntop           __inet_net_ntop
    #define inet_net_pton           __inet_net_pton
    #define inet_ntoa               __inet_ntoa
    #define inet_pton               __inet_pton
    #define inet_ntop               __inet_ntop
    #define inet_nsap_addr          __inet_nsap_addr
    #define inet_nsap_ntoa          __inet_nsap_ntoa

How can I get true DNS TTL info into Squid's IP cache?

If you have source for BIND, you can modify it as indicated in the diff below. It causes the global variable _dns_ttl_ to be set with the TTL of the most recent lookup. Then, when you compile Squid, the configure script will look for the _dns_ttl_ symbol in libresolv.a. If found, dnsserver will return the TTL value for every lookup.

This hack was contributed by .

diff -ru bind-4.9.4-orig/res/gethnamaddr.c bind-4.9.4/res/gethnamaddr.c
--- bind-4.9.4-orig/res/gethnamaddr.c   Mon Aug 5 02:31:35 1996
+++ bind-4.9.4/res/gethnamaddr.c        Tue Aug 27 15:33:11 1996
@@ -133,6 +133,7 @@
 } align;
 
 extern int h_errno;
+int _dns_ttl_;
 
 #ifdef DEBUG
 static void
@@ -223,6 +224,7 @@
        host.h_addr_list = h_addr_ptrs;
        haveanswer = 0;
        had_error = 0;
+       _dns_ttl_ = -1;
        while (ancount-- > 0 && cp < eom && !had_error) {
                n = dn_expand(answer->buf, eom, cp, bp, buflen);
                if ((n < 0) || !(*name_ok)(bp)) {
@@ -232,8 +234,11 @@
                cp += n;                        /* name */
                type = _getshort(cp);
                cp += INT16SZ;                  /* type */
-               class = _getshort(cp);
-               cp += INT16SZ + INT32SZ;        /* class, TTL */
+               class = _getshort(cp);
+               cp += INT16SZ;                  /* class */
+               if (qtype == T_A && type == T_A)
+                       _dns_ttl_ = _getlong(cp);
+               cp += INT32SZ;                  /* TTL */
                n = _getshort(cp);
                cp += INT16SZ;                  /* len */
                if (class != C_IN) {

And here is a patch for BIND-8:

*** src/lib/irs/dns_ho.c.orig   Tue May 26 21:55:51 1998
--- src/lib/irs/dns_ho.c        Tue May 26 21:59:57 1998
***************
*** 87,92 ****
--- 87,93 ----
  #endif
  
  extern int h_errno;
+ int _dns_ttl_;
  
  /* Definitions. */
  
***************
*** 395,400 ****
--- 396,402 ----
        pvt->host.h_addr_list = pvt->h_addr_ptrs;
        haveanswer = 0;
        had_error = 0;
+       _dns_ttl_ = -1;
        while (ancount-- > 0 && cp < eom && !had_error) {
                n = dn_expand(ansbuf, eom, cp, bp, buflen);
                if ((n < 0) || !(*name_ok)(bp)) {
***************
*** 404,411 ****
                cp += n;                        /* name */
                type = ns_get16(cp);
                cp += INT16SZ;                  /* type */
!               class = ns_get16(cp);
!               cp += INT16SZ + INT32SZ;        /* class, TTL */
                n = ns_get16(cp);
                cp += INT16SZ;                  /* len */
                if (class != C_IN) {
--- 406,416 ----
                cp += n;                        /* name */
                type = ns_get16(cp);
                cp += INT16SZ;                  /* type */
!               class = _getshort(cp);
!               cp += INT16SZ;                  /* class */
!               if (qtype == T_A && type == T_A)
!                       _dns_ttl_ = _getlong(cp);
!               cp += INT32SZ;                  /* TTL */
                n = ns_get16(cp);
                cp += INT16SZ;                  /* len */
                if (class != C_IN) {

My platform is BSD/OS or BSDI and I can't compile Squid

    cache_cf.c: In function `parseConfigFile':
    cache_cf.c:1353: yacc stack overflow before `token'
    ...

You may need to upgrade your gcc installation to a more recent version. Check your gcc version with

    gcc -v

If it is earlier than 2.7.2, you might consider upgrading.

Problems compiling

The following error occurs on Solaris systems using gcc when the Solaris C compiler is not installed:

    /usr/bin/rm -f libmiscutil.a
    /usr/bin/false r libmiscutil.a rfc1123.o rfc1738.o util.o ...
    make[1]: *** [libmiscutil.a] Error 255
    make[1]: Leaving directory `/tmp/squid-1.1.11/lib'
    make: *** [all] Error 1

Note on the second line the /usr/bin/false. This is supposed to be a path to the

To fix this you either need to:

    Add /usr/ccs/bin to your PATH. This is where the
    Install the . This package includes programs such as

I have problems compiling Squid on Platform Foo.

Please check the on which Squid is known to compile. Your problem might be listed there together with a solution. If it isn't listed there, mail us what you are trying, your Squid version, and the problems you encounter.

I see a lot of warnings while compiling Squid.

Warnings are usually not a big concern, and can be common with software designed to operate on multiple platforms. If you feel like fixing compile-time warnings, please do so and send us the patches.

Building Squid on OS/2

by

In order to compile Squid, you need to have a reasonable facsimile of a Unix system installed. This includes

I made a few modifications to the pristine EMX 0.9d install:

    added defines for
    changed all occurrences of time_t to signed long instead of unsigned long
    hacked ld.exe
        to search for both xxxx.a and libxxxx.a
        to produce the correct filename when using the -Zexe option

You will need to run scripts/convert.configure.to.os2 (in the Squid source distribution) to modify the configure script so that it can search for the various programs.

Next, you need to set a few environment variables (see EMX docs for meaning):

    export EMXOPT="-h256 -c"
    export LDFLAGS="-Zexe -Zbin -s"

Now you are ready to configure squid: ./configure

Compile everything: make

and finally, install: make install

By default, this installs into /usr/local/squid. If you wish to install somewhere else, see the

Now, don't forget to set EMXOPT before running squid each time. I recommend using the -Y and -N options.

Building Squid on Cygwin

In order to compile squid, you need to have Cygwin fully installed.

Unpack the source archive as usual and run configure: ./configure

Compile everything: make

and finally, install: make install

By default, this installs into /usr/local/squid. If you wish to install somewhere else, see the

Now, add a new Cygwin user (see the Cygwin user guide) and map it to SYSTEM, or create a new NT user and a matching Cygwin user; these become the squid runas users.

Read the squid FAQ on permissions if you are using CYGWIN=ntsec.

After run squid -z. If that succeeds, try squid -N -D -d1, squid should start. Check that there are no errors. If everything looks good, try browsing through squid.

Now, configure cygrunsrv to run squid as a service as the chosen usercode. You may need to check permissions here.

Installing and Running Squid

How big of a system do I need to run Squid?

There are no hard-and-fast rules. The most important resource for Squid is physical memory. Your processor does not need to be ultra-fast. Your disk system will be the major bottleneck, so fast disks are important for high-volume caches. Do not use IDE disks if you can help it.

In late 1998, if you are buying a new machine for a cache, I would recommend the following configuration:

    300 MHz Pentium II CPU
    512 MB RAM
    Five 9 GB UW-SCSI disks

Your system disk and logfile disk can probably be IDE without losing any cache performance.

Also, see by Martin Hamilton. This is a very nice page summarizing system configurations people are using for large Squid caches.

How do I install Squid?

After , you can install it with this simple command:

    % make install

If you have enabled the then you will also want to type

    % su
    # make install-pinger

After installing, you will want to edit and customize the /usr/local/squid/etc/squid.conf.

Also, a QUICKSTART guide has been included with the source distribution. Please see the directory where you unpacked the source archive.

What does the The Do you have a Yes, after you

How do I start Squid?

First you need to make your Squid configuration. The Squid configuration can be found in /usr/local/squid/etc/squid.conf and by default includes documentation on all directives.

In the Squid distribution there is a small QUICKSTART guide indicating which directives you need to look closer at and why. At an absolute minimum you need to change the http_access configuration to allow access from your clients.

To verify your configuration file you can use the -k parse option:

    % /usr/local/squid/sbin/squid -k parse

If this outputs any errors, then these are syntax errors or other fatal misconfigurations and need to be corrected before you continue. If it is silent and immediately gives back the command prompt, then your squid.conf is syntactically correct and could be understood by Squid.

After you've finished editing the configuration file, you can start Squid for the first time. The procedure depends a little bit on which version you are using.

First, you must create the swap directories. Do this by running Squid with the -z option:

    % /usr/local/squid/sbin/squid -z

NOTE: If you run Squid as root then you may need to first create /usr/local/squid/var/logs and your cache_dir directories and assign ownership of these to the cache_effective_user configured in your squid.conf.

Once the creation of the cache directories completes, you can start Squid and try it out. Probably the best thing to do is run it from your terminal and watch the debugging output. Use this command:

    % /usr/local/squid/sbin/squid -NCd1

If everything is working okay, you will see the line:

    Ready to serve requests.

If you want to run squid in the background, as a daemon process, just leave off all options:

    % /usr/local/squid/sbin/squid

NOTE: depending on which http_port you select you may need to start squid as root (http_port <1024).

NOTE: In Squid-2.4 and earlier Squid was installed in bin by default, not sbin.

How do I start Squid automatically when the system boots?

Squid-2 has a restart feature built in. This greatly simplifies starting Squid and means that you don't need to use /usr/local/squid/sbin/squid

Squid will automatically background itself and then spawn a child process. In your

    Sep 23 23:55:58 kitty squid[14616]: Squid Parent: child process 14617 started

That means that process ID 14616 is the parent process which monitors the child process (pid 14617). The child process is the one that does all of the work. The parent process just waits for the child process to exit. If the child process exits unexpectedly, the parent will automatically start another child process. In that case,

    Sep 23 23:56:02 kitty squid[14616]: Squid Parent: child process 14617 exited with status 1
    Sep 23 23:56:05 kitty squid[14616]: Squid Parent: child process 14619 started

If there is some problem, and Squid can not start, the parent process will give up after a while. Your

    Sep 23 23:56:12 kitty squid[14616]: Exiting due to repeated, frequent failures

When this happens you should check your

When you look at a process (

    24353  ??  Ss     0:00.00 /usr/local/squid/bin/squid
    24354  ??  R      0:03.39 (squid) (squid)

The first is the parent process, and the child process is the one called ``(squid)''. Note that if you accidentally kill the parent process, the child process will not notice.
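To follow restarts over time, the parent-process messages quoted above can be picked out of a log mechanically. This is a rough Python sketch that only recognises the two message shapes shown in this answer ("started" and "exited with status N"); the function name parse_parent_line is invented for the example, and other message variants are simply not matched.

```python
import re

# Matches only the parent-process messages quoted in the text above.
PARENT_RE = re.compile(
    r"squid\[(?P<parent>\d+)\]: Squid Parent: child process (?P<child>\d+) "
    r"(?P<event>started|exited)(?: with status (?P<status>\d+))?"
)


def parse_parent_line(line):
    """Return (parent_pid, child_pid, event, status), or None if no match."""
    match = PARENT_RE.search(line)
    if match is None:
        return None
    status = match.group("status")
    return (
        int(match.group("parent")),
        int(match.group("child")),
        match.group("event"),
        int(status) if status is not None else None,
    )


sample = ("Sep 23 23:56:02 kitty squid[14616]: Squid Parent: "
          "child process 14617 exited with status 1")
print(parse_parent_line(sample))
```

Counting "exited" events per parent pid over a short window gives an early hint that the parent is about to give up.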

If you want to run Squid from your terminal and prevent it from backgrounding and spawning a child process, use the

    /usr/local/squid/bin/squid -N

From inittab

On systems which have an /etc/inittab file (Digital Unix, Solaris, IRIX, HP-UX, Linux), you can add a line like this:

    sq:3:respawn:/usr/local/squid/sbin/squid.sh < /dev/null >> /tmp/squid.log 2>&1

We recommend using a

    #!/bin/sh
    C=/usr/local/squid
    PATH=/usr/bin:$C/bin
    TZ=PST8PDT
    export PATH TZ

    # User to notify on restarts
    notify="root"

    # Squid command line options
    opts=""

    cd $C
    umask 022
    sleep 10
    while [ -f /var/run/nosquid ]; do
            sleep 1
    done
    /usr/bin/tail -20 $C/logs/cache.log \
            | Mail -s "Squid restart on `hostname` at `date`" $notify
    exec bin/squid -N $opts

From rc.local

On BSD-ish systems, you will need to start Squid from the ``rc'' files, usually /etc/rc.local. For example:

    if [ -f /usr/local/squid/sbin/squid ]; then
            echo -n ' Squid'
            /usr/local/squid/sbin/squid
    fi

From init.d

Squid ships with an init.d type startup script in contrib/squid.rc which works on most init.d type systems. Or you can write your own, using any normal init.d script found on your system as a template, and add the start/stop fragments shown below.

Start:

    /usr/local/squid/sbin/squid

Stop:

    /usr/local/squid/sbin/squid -k shutdown
    n=120
    while /usr/local/squid/sbin/squid -k check && [ $n -gt 0 ]; do
        sleep 1
        echo -n .
        n=`expr $n - 1`
    done

How do I tell if Squid is running?

You can use the

    % squidclient http://www.netscape.com/ > test

There are other command-line HTTP client programs available as well. Two that you may find useful are and .

Another way is to use Squid itself to see if it can signal a running Squid process:

    % squid -k check

And then check the shell's exit status variable.

Also, check the log files, most importantly the These are the command line options for

How do I see how Squid works?

Check the Install and use the .

Can Squid benefit from SMP systems?

Squid is a single process application and can not make use of SMP. If you want to make Squid benefit from a SMP system you will need to run multiple instances of Squid and find a way to distribute your users on the different Squid instances just as if you had multiple Squid boxes.
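The FAQ does not say how users should be spread over several instances. One possible approach, shown here purely as an assumption, is to map each client address to a fixed instance so that a given user always lands on the same cache. The port numbers and the function name instance_for_client are hypothetical.

```python
import zlib

# Hypothetical listening ports, one per Squid instance on the same machine.
INSTANCE_PORTS = [3128, 3129]


def instance_for_client(client_ip, ports=INSTANCE_PORTS):
    """Pick a stable instance port for a client address."""
    # crc32 gives the same value on every run, unlike Python's built-in hash().
    digest = zlib.crc32(client_ip.encode("ascii"))
    return ports[digest % len(ports)]


for ip in ("10.0.0.21", "10.0.0.22", "10.0.0.23"):
    print(ip, "->", instance_for_client(ip))
```

In practice the same effect is often achieved outside of any script, for example by handing different proxy settings to different groups of users.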

Having two CPUs is indeed nice for running other CPU intensive tasks on the same server as the proxy, such as if you have a lot of logs and need to run various statistics collections during peak hours.

The authentication and group helpers barely use any CPU and do not benefit from a dual-CPU configuration.

Is it okay to use separate drives and RAID on Squid?

RAID1 is fine, and so are separate drives.

RAID0 (striping) with Squid only gives you the drawback that if you lose one of the drives the whole stripe set is lost. There is no benefit in performance as Squid already distributes the load on the drives quite nicely.

Squid is the worst-case application for RAID5, whether hardware or software, and will absolutely kill the performance of a RAID5. Once the cache has been filled, Squid uses a lot of small random writes, which is the worst-case workload for RAID5, effectively reducing write speed to only a little more than that of one single drive.

Generally seek time is what you want to optimize for Squid, or more precisely the total amount of seeks/s your system can sustain. Choosing the wrong RAID solution generally decreases the amount of seeks/s your system can sustain significantly.

Configuration issues

How do I join a cache hierarchy?

To place your cache in a hierarchy, use the For example, the following

    #  squid.conf - On the host: childcache.example.com
    #
    #  Format is: hostname  type  http_port  udp_port
    #
    cache_peer parentcache.example.com   parent  3128 3130
    cache_peer childcache2.example.com   sibling 3128 3130
    cache_peer childcache3.example.com   sibling 3128 3130

The

    #  squid.conf - On the host: sv.cache.nlanr.net
    #
    #  Format is: hostname  type  http_port  udp_port
    #
    cache_peer electraglide.geog.unsw.edu.au parent  3128 3130
    cache_peer cache1.nzgate.net.nz          parent  3128 3130
    cache_peer pb.cache.nlanr.net            parent  3128 3130
    cache_peer it.cache.nlanr.net            parent  3128 3130
    cache_peer sd.cache.nlanr.net            parent  3128 3130
    cache_peer uc.cache.nlanr.net            sibling 3128 3130
    cache_peer bo.cache.nlanr.net            sibling 3128 3130

    cache_peer_domain electraglide.geog.unsw.edu.au .au
    cache_peer_domain cache1.nzgate.net.nz          .au .aq .fj .nz
    cache_peer_domain pb.cache.nlanr.net            .uk .de .fr .no .se .it
    cache_peer_domain it.cache.nlanr.net            .uk .de .fr .no .se .it
    cache_peer_domain sd.cache.nlanr.net            .mx .za .mu .zm

The configuration above indicates that the cache will use

How do I join NLANR's cache hierarchy?

We have a simple set of the NLANR cache hierarchy.

Why should I want to join NLANR's cache hierarchy?

The NLANR hierarchy can provide you with an initial source for parent or sibling caches. Joining the NLANR global cache system will frequently improve the performance of your caching service.

How do I register my cache with NLANR's registration service?

Just enable these options in your

    cache_announce 24
    announce_to sd.cache.nlanr.net:3131

How do I find other caches close to me and arrange parent/child/sibling relationships with them?

Visit the NLANR cache to discover other caches near you. Keep in mind that just because a cache is registered in the database

My cache registration is not appearing in the Tracker database.

Your site will not be listed if your cache IP address does not have a DNS PTR record. If we can't map the IP address back to a domain name, it will be listed as ``Unknown.'' The registration messages are sent with UDP. We may not be receiving your announcement message due to firewalls which block UDP, or dropped packets due to congestion.

What is the httpd-accelerator mode?

This entry has been moved to .

How do I configure Squid to work behind a firewall?

Note: The information here is current for version 2.2.

If you are behind a firewall then you can't make direct connections to the outside world, so you You can use the

    acl INSIDE dstdomain .mydomain.com
    always_direct allow INSIDE
    never_direct allow all

You could also specify internal servers by IP address:

    acl INSIDE_IP dst 1.2.3.0/24
    always_direct allow INSIDE_IP
    never_direct allow all

Note, however, that when you use IP addresses, Squid must perform a DNS lookup to convert URL hostnames to an address. Your internal DNS servers may not be able to look up external domains.

If you use

    cache_peer xyz.mydomain.com parent 3128 0 default

How do I configure Squid to forward all requests to another proxy?

Note: The information here is current for version 2.2.

First, you need to give Squid a parent cache. Second, you need to tell Squid it can not connect directly to origin servers. This is done with three configuration file lines:

    cache_peer parentcache.foo.com parent 3128 0 no-query default
    acl all src 0.0.0.0/0.0.0.0
    never_direct allow all

Note, with this configuration, if the parent cache fails or becomes unreachable, then every request will result in an error message.
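The cache_peer lines in this FAQ follow the layout "hostname type http_port udp_port", optionally followed by options such as no-query and default. A small Python sketch can make that layout explicit when auditing a configuration. It is illustrative only, covers just the fields used in this document, and the function name parse_cache_peer is made up here.

```python
def parse_cache_peer(line):
    """Split a cache_peer line into the fields used in this FAQ."""
    words = line.split()
    if len(words) < 5 or words[0] != "cache_peer":
        raise ValueError("not a cache_peer line: %r" % line)
    return {
        "hostname": words[1],
        "type": words[2],             # e.g. parent or sibling
        "http_port": int(words[3]),
        "udp_port": int(words[4]),    # 0 in the example above
        "options": words[5:],         # anything after the two ports
    }


peer = parse_cache_peer(
    "cache_peer parentcache.foo.com parent 3128 0 no-query default"
)
print(peer)
```

Listing the parsed peers is a quick way to confirm that the parent really carries the options you intended.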

In case you want to be able to use direct connections when all the parents go down you should use a different approach:

    cache_peer parentcache.foo.com parent 3128 0 no-query
    prefer_direct off

The default behaviour of Squid in the absence of positive ICP, HTCP, etc replies is to connect to the origin server instead of using parents. The prefer_direct off directive tells Squid to try parents first.

I have The It's very important that there are enough My First, find out if you have enough Another factor which affects the

How can I easily change the default HTTP port?

Before you run the configure script, simply set the

    setenv CACHE_HTTP_PORT 8080
    ./configure
    make
    make install

Is it possible to control how big each With Squid-1.1 it is NOT possible. Each What Most people have a disk partition dedicated to the Squid cache. You don't want to use the entire partition size. You have to leave some extra room. Currently, Squid is not very tolerant of running out of disk space.

Let's say you have a 9GB disk. Remember that disk manufacturers lie about the space available. A so-called 9GB disk usually results in about 8.5GB of raw, usable space. First, put a filesystem on it, and mount it. Then check the ``available space'' with your Next, I suggest taking off another 10% or so for Squid overheads, and a ``safe buffer.'' Squid normally puts its

    cache_dir ... 7000 16 256
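The sizing rule above (take the space the filesystem reports as available, then subtract roughly 10%) is simple arithmetic. The Python sketch below shows one way to apply it. The 8000 MB input is a hypothetical figure rather than one taken from this FAQ, and the helper name and the rounding to a multiple of 100 MB are assumptions made for the example.

```python
def suggested_cache_dir_mb(available_mb, overhead_percent=10, round_to=100):
    """Suggest a cache_dir size in MB, leaving headroom on the partition."""
    # Integer arithmetic avoids floating-point surprises.
    usable = available_mb * (100 - overhead_percent) // 100
    # Round down so the suggestion always errs on the conservative side.
    return (usable // round_to) * round_to


# Hypothetical 'available space' figure in MB, as reported by the filesystem.
print(suggested_cache_dir_mb(8000))
```

Rounding down rather than to the nearest value matches the advice to start out conservative.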

It's better to start out conservative. After the cache becomes full, look at the disk usage. If you think there is plenty of unused space, then increase the If you're getting ``disk full'' write errors, then you definitely need to decrease your cache size.

I'm adding a new With Squid-1.1, yes, you will lose your cache. This is because version 1.1 uses a simplistic algorithm to distribute files between cache directories.

With Squid-2, you will not lose your existing cache. You can add and delete

Squid and Several people on both the . The most elegant way in my opinion is to run an internal Squid caching proxy server which handles client requests, and let this server forward its requests to the http-gw running on the firewall. Cache hits won't need to be handled by the firewall.

In this example Squid runs on the same server as the http-gw, Squid uses 8000 and http-gw uses 8080 (web). The local domain is

Firewall configuration:

Either run http-gw as a daemon from the /etc/rc.d/rc.local (Linux Slackware):

    exec /usr/local/fwtk/http-gw -daemon 8080

or run it from inetd like this:

    web stream tcp nowait.100 root /usr/local/fwtk/http-gw http-gw

I increased the watermark to 100 because a lot of people run into problems with the default value.

Make sure you have at least the following line in /usr/local/etc/netperm-table:

    http-gw: hosts 127.0.0.1

You could add the IP-address of your own workstation to this rule and make sure the http-gw by itself works, like:

    http-gw: hosts 127.0.0.1 10.0.0.1

Squid configuration:

The following settings are important:

    http_port 8000
    icp_port 0

    cache_peer localhost.home.nl parent 8080 0 default
    acl HOME dstdomain .home.nl
    always_direct allow HOME
    never_direct allow all

This tells Squid to use the parent for all domains other than

    872739961.631 1566 10.0.0.21 ERR_CLIENT_ABORT/304 83 GET http://www.squid-cache.org/ - DEFAULT_PARENT/localhost.home.nl -
    872739962.976 1266 10.0.0.21 TCP_CLIENT_REFRESH/304 88 GET http://www.nlanr.net/Images/cache_now.gif - DEFAULT_PARENT/localhost.home.nl -
    872739963.007 1299 10.0.0.21 ERR_CLIENT_ABORT/304 83 GET http://www.squid-cache.org/Icons/squidnow.gif - DEFAULT_PARENT/localhost.home.nl -
    872739963.061 1354 10.0.0.21 TCP_CLIENT_REFRESH/304 83 GET http://www.squid-cache.org/Icons/Squidlogo2.gif - DEFAULT_PARENT/localhost.home.nl
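To check that requests really go through the parent, the whitespace-separated fields of the access.log lines above can be split apart and inspected. The Python sketch below does this for the layout shown in these sample lines; the field names are this example's own labels, and the helper name parse_access_line is hypothetical.

```python
from collections import namedtuple

# Field labels are chosen for this example; the order follows the lines above.
AccessEntry = namedtuple(
    "AccessEntry",
    "timestamp elapsed_ms client result status size method url ident "
    "hierarchy peer",
)


def parse_access_line(line):
    """Split one access.log line of the form shown above into named fields."""
    fields = line.split()
    if len(fields) < 9:
        raise ValueError("unexpected access.log line: %r" % line)
    result, status = fields[3].split("/", 1)
    hierarchy, _, peer = fields[8].partition("/")
    return AccessEntry(
        float(fields[0]), int(fields[1]), fields[2], result, int(status),
        int(fields[4]), fields[5], fields[6], fields[7], hierarchy, peer,
    )


entry = parse_access_line(
    "872739961.631 1566 10.0.0.21 ERR_CLIENT_ABORT/304 83 GET "
    "http://www.squid-cache.org/ - DEFAULT_PARENT/localhost.home.nl -"
)
print(entry.hierarchy, entry.peer)
```

Every entry for an outside domain should show DEFAULT_PARENT together with the http-gw host if the configuration behaves as intended.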

http-gw entries in syslog:

    Aug 28 02:46:00 memo http-gw[2052]: permit host=localhost/127.0.0.1 use of gateway (V2.0beta)
    Aug 28 02:46:00 memo http-gw[2052]: log host=localhost/127.0.0.1 protocol=HTTP cmd=dir dest=www.squid-cache.org path=/
    Aug 28 02:46:01 memo http-gw[2052]: exit host=localhost/127.0.0.1 cmds=1 in=0 out=0 user=unauth duration=1
    Aug 28 02:46:01 memo http-gw[2053]: permit host=localhost/127.0.0.1 use of gateway (V2.0beta)
    Aug 28 02:46:01 memo http-gw[2053]: log host=localhost/127.0.0.1 protocol=HTTP cmd=get dest=www.squid-cache.org path=/Icons/Squidlogo2.gif
    Aug 28 02:46:01 memo http-gw[2054]: permit host=localhost/127.0.0.1 use of gateway (V2.0beta)
    Aug 28 02:46:01 memo http-gw[2054]: log host=localhost/127.0.0.1 protocol=HTTP cmd=get dest=www.squid-cache.org path=/Icons/squidnow.gif
    Aug 28 02:46:01 memo http-gw[2055]: permit host=localhost/127.0.0.1 use of gateway (V2.0beta)
    Aug 28 02:46:01 memo http-gw[2055]: log host=localhost/127.0.0.1 protocol=HTTP cmd=get dest=www.nlanr.net path=/Images/cache_now.gif
    Aug 28 02:46:02 memo http-gw[2055]: exit host=localhost/127.0.0.1 cmds=1 in=0 out=0 user=unauth duration=1
    Aug 28 02:46:03 memo http-gw[2053]: exit host=localhost/127.0.0.1 cmds=1 in=0 out=0 user=unauth duration=2
    Aug 28 02:46:04 memo http-gw[2054]: exit host=localhost/127.0.0.1 cmds=1 in=0 out=0 user=unauth duration=3

To summarize:

Advantages: http-gw allows you to selectively block ActiveX and Java, and its primary design goal is security. The firewall doesn't need to run large applications like Squid. The internal Squid server still gives you the benefit of caching.

Disadvantages: The internal Squid proxy server can't (and shouldn't) work with other parent or neighbor caches. Initial requests are slower because they go through http-gw, which also does reverse lookups. Run a nameserver on the firewall or use an internal nameserver.

What is ``HTTP_X_FORWARDED_FOR''? Why does squid provide it to WWW servers, and how can I stop it?

When a proxy-cache is used, a server does not see the connection coming from the originating client. Many people like to implement access controls based on the client address. To accommodate these people, Squid adds its own request header called "X-Forwarded-For" which looks like this:

    X-Forwarded-For: 128.138.243.150, unknown, 192.52.106.30

Entries are always IP addresses, or the word ``unknown''.

We must note that access controls based on this header are extremely weak and simple to fake. Anyone may hand-enter a request with any IP address whatsoever. This is perhaps the reason why client IP addresses have been omitted from the HTTP/1.1 specification.
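
For illustration only, here is a minimal Python sketch of how a server-side script might split this header into its entries. The function name parse_forwarded_for is hypothetical, and the sketch assumes the comma-separated layout of the example header, with ``unknown'' standing in for an address that was not disclosed. Parsing the header does nothing to make it trustworthy.

```python
import ipaddress


def parse_forwarded_for(value):
    """Split an X-Forwarded-For value into a list of entries.

    Each entry is returned as the address string, or None for "unknown".
    Any other text raises ValueError.
    """
    entries = []
    for item in value.split(","):
        item = item.strip()
        if item.lower() == "unknown":
            entries.append(None)
        else:
            # ip_address() raises ValueError for text that is not an address
            entries.append(str(ipaddress.ip_address(item)))
    return entries


# The example header value quoted in this FAQ
print(parse_forwarded_for("128.138.243.150, unknown, 192.52.106.30"))
```

Since any client can send whatever value it likes, the result should be treated as a hint for logging, not as proof of where a request came from.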

Because of the weakness of this header, support for access controls based on X-Forwarded-For is not yet available in any officially released version of squid. However, unofficial patches are available from the Squid development project and may be integrated into later versions of Squid once a suitable trust model has been developed.

Can Squid anonymize HTTP requests?

Yes, it can; however, the method has changed from earlier versions of squid. As of squid-2.2 a more customisable method has been introduced. Please follow the instructions for the version of squid that you are using. By default, no anonymizing is done.

If you choose to use the anonymizer you might wish to investigate the forwarded_for option to prevent the client address being disclosed. Failure to turn off the forwarded_for option will reduce the effectiveness of the anonymizer. Finally, if you filter the User-Agent header, using the fake_user_agent option can prevent some user problems, as some sites require the User-Agent header.

Squid 2.2

With the introduction of squid 2.2 the anonymizer has become more customisable. It now allows specification of exactly which headers will be allowed to pass. This is further extended in Squid-2.5 to allow headers to be anonymized conditionally.

For details see the documentation of the http_header_access and header_replace directives in squid.conf.default.

Can I make Squid go direct for some sites?

Sure, just use the always_direct access list. For example, if you want Squid to connect directly to hotmail.com servers:

    acl hotmail dstdomain .hotmail.com
    always_direct allow hotmail

Can I make Squid proxy only, without caching anything?

Sure, there are a few things you can do.

You can use the no_cache access list to make Squid never cache any response:

    acl all src 0/0
    no_cache deny all

With Squid-2.4 and later you can use the ``null'' storage module to avoid having a cache directory:

    cache_dir null /tmp

Note: a null cache_dir does not disable caching, but it does save you from creating a cache structure if you have disabled caching with no_cache.

Note: the directory (e.g., /tmp) must exist so that squid can chdir to it, unless you also use the To configure Squid for the ``null'' storage module, specify it on the ./configure --enable-storeio=ufs,null ... Can I prevent users from downloading large files?

You can set the global If the HTTP response coming from the server has a Some responses don't have Note that ``creative'' user-agents will still be able to download really large files through the cache using HTTP/1.1 range requests. Communication between browsers and Squid

Most web browsers available today support proxying and are easily configured to use a Squid server as a proxy. Some browsers support advanced features such as lists of domains or URL patterns that shouldn't be fetched through the proxy, or JavaScript automatic proxy configuration. Netscape manual configuration

Select Here is a of the Netscape Navigator manual proxy configuration screen.

Netscape automatic configuration

Netscape Navigator's proxy configuration can be automated with JavaScript (for Navigator versions 2.0 or higher). Select Here is a of the Netscape Navigator automatic proxy configuration screen. You may also wish to consult Netscape's documentation for the Navigator

Here is a sample auto configuration JavaScript from Oskar Pearson:

    //We (www.is.co.za) run a central cache for our customers that they
    //access through a firewall - thus if they want to connect to their intranet
    //system (or anything in their domain at all) they have to connect
    //directly - hence all the "fiddling" to see if they are trying to connect
    //to their local domain.

    //Replace each occurrence of company.com with your domain name
    //and if you have some kind of intranet system, make sure
    //that you put its name in place of "internal" below.

    //We also assume that your cache is called "cache.company.com", and
    //that it runs on port 8080. Change it down at the bottom.

    //(C) Oskar Pearson and the Internet Solution (http://www.is.co.za)

    function FindProxyForURL(url, host)
    {
        //If they have only specified a hostname, go directly.
        if (isPlainHostName(host))
            return "DIRECT";

        //These connect directly if the machine they are trying to
        //connect to starts with "intranet" - ie http://intranet
        //Connect directly if it is intranet.*
        //If you have another machine that you want them to
        //access directly, replace "internal*" with that
        //machine's name
        if (shExpMatch( host, "intranet*")||
            shExpMatch(host, "internal*"))
            return "DIRECT";

        //Connect directly to our domains (NB for Important News)
        if (dnsDomainIs( host,"company.com")||
            //If you have another domain that you wish to connect to
            //directly, put it in here
            dnsDomainIs(host,"sistercompany.com"))
            return "DIRECT";

        //So the error message "no such host" will appear through the
        //normal Netscape box - less support queries :)
        if (!isResolvable(host))
            return "DIRECT";

        //We only cache http, ftp and gopher
        if (url.substring(0, 5) == "http:" ||
            url.substring(0, 4) == "ftp:"||
            url.substring(0, 7) == "gopher:")
            //Change the ":8080" to the port that your cache
            //runs on, and "cache.company.com" to the machine that
            //you run the cache on
            return "PROXY cache.company.com:8080; DIRECT";

        //We don't cache WAIS
        if (url.substring(0, 5) == "wais:")
            return "DIRECT";
        else
            return "DIRECT";
    }

Lynx and Mosaic configuration

For Mosaic and Lynx, you can set environment variables before starting the application. For example (assuming csh or tcsh):

    % setenv http_proxy http://mycache.example.com:3128/
    % setenv gopher_proxy http://mycache.example.com:3128/
    % setenv ftp_proxy http://mycache.example.com:3128/

For Lynx you can also edit the http_proxy:http://mycache.example.com:3128/ ftp_proxy:http://mycache.example.com:3128/ gopher_proxy:http://mycache.example.com:3128/ Redundant Proxy Auto-Configuration

There's one nasty side-effect to using auto-proxy scripts: if you start the web browser it will try and load the auto-proxy-script.

If your script isn't available either because the web server hosting the script is down or your workstation can't reach the web server (e.g. because you're working off-line with your notebook and just want to read a previously saved HTML-file) you'll get different errors depending on the browser you use.

The Netscape browser will just return an error after a timeout (after that it tries to find the site 'www.proxy.com' if the script you use is called 'proxy.pac').

Microsoft Internet Explorer, on the other hand, won't even start: no window is displayed, and only after about one minute does it show a window asking whether you want to go on with or without proxy configuration.

The point is that your workstations always need to locate the proxy-script. I created some extra redundancy by hosting the script on two web servers (actually Apache web servers on the proxy servers themselves) and adding the following records to my primary nameserver:

    proxy   IN      A       10.0.0.1 ; IP address of proxy1
            IN      A       10.0.0.2 ; IP address of proxy2

The clients just refer to 'http://proxy/proxy.pac'. This script looks like this:

    function FindProxyForURL(url,host)
    {
        // Hostname without domainname or host within our own domain?
        // Try them directly:
        // http://www.domain.com actually lives before the firewall, so
        // make an exception:
        if ((isPlainHostName(host)||dnsDomainIs( host,".domain.com")) &&
            !localHostOrDomainIs(host, "www.domain.com"))
            return "DIRECT";

        // First try proxy1 then proxy2. One server mostly caches '.com'
        // to make sure both servers are not
        // caching the same data in the normal situation. The other
        // server caches the other domains normally.
        // If one of 'm is down the client will try the other server.
        else if (shExpMatch(host, "*.com"))
            return "PROXY proxy1.domain.com:8080; PROXY proxy2.domain.com:8081; DIRECT";
        return "PROXY proxy2.domain.com:8081; PROXY proxy1.domain.com:8080; DIRECT";
    }

I made sure every client domain has the appropriate 'proxy' entry. The clients are automatically configured with two nameservers using DHCP.

Proxy Auto-Configuration with URL Hashing

The contains a lot of good information about hash-based proxy auto-configuration scripts. With these you can distribute the load between a number of caching proxies. Microsoft Internet Explorer configuration

Select Here is a of the Internet Explorer proxy configuration screen.

Microsoft is also starting to support Netscape-style JavaScript automated proxy configuration. As of now, only MSIE version 3.0a for Windows 3.1 and Windows NT 3.51 supports this feature (i.e., as of version 3.01 build 1225 for Windows 95 and NT 4.0, the feature was not included).

If you have a version of MSIE that does have this feature, elect Netmanage Internet Chameleon WebSurfer configuration

Netmanage WebSurfer supports manual proxy configuration and exclusion lists for hosts or domains that should not be fetched via proxy (this information is current as of WebSurfer 5.0). Select Take a look at this if the instructions confused you.

On the same configuration window, you'll find a button to bring up the exclusion list dialog box, which will let you enter some hosts or domains that you don't want fetched via proxy. It should be self-explanatory, but you might look at this just for fun anyway. Opera 2.12 proxy configuration

Select Notes: Opera 2.12 doesn't support gopher on its own, but requires a proxy; therefore Squid's gopher proxying can extend the utility of your Opera immensely. Unfortunately, Opera 2.12 chokes on some HTTP requests, for example . At the moment I think it has something to do with cookies. If you have trouble with a site, try disabling the HTTP proxying by unchecking that protocol in the -- How do I tell Squid to use a specific username for FTP urls?

Insert your username in the host part of the URL, for example:

    ftp://joecool@ftp.foo.org/

Squid should then prompt you for your account password. Alternatively, you can specify both your username and password in the URL itself:

    ftp://joecool:secret@ftp.foo.org/

However, we certainly do not recommend this, as it could be very easy for someone to see or grab your password.

Configuring Browsers for WPAD


You may like to start by reading the that describes WPAD.

After reading the 8 steps below, if you don't understand any of the terms or methods mentioned, you probably shouldn't be doing this. Implementing wpad requires you to understand:

  1. web server installations and modifications.
  2. squid proxy server (or others) installation etc.
  3. Domain Name System maintenance etc.

Please don't bombard the squid list with web server or dns questions. See your system administrator, or do some more research on those topics.

This is not a recommendation for any product or version. As far as I know IE5 is the only browser out now implementing wpad. I think wpad is an excellent feature that will return several hours of life per month. Hopefully, all browser clients will implement it as well. But it will take years for all the older browsers to fade away though.

I have only focused on the domain name method, to the exclusion of the DHCP method. I think the dns method might be easier for most people. I don't currently, and may never, fully understand wpad and IE5, but this method worked for me. But if you'd rather just have a go ...

  1. Create a standard auto config script. The sample provided there is more than adequate to get you going. No doubt all the other load balancing and backup scripts will be fine also.

  2. Store the resultant file in the document root directory of a handy web server as wpad.dat. You should be able to use an HTTP redirect if you want to store the wpad.dat file somewhere else. You can probably even redirect wpad.dat to proxy.pac:

         Redirect /wpad.dat http://racoon.riga.lv/proxy.pac

  3. If you do nothing more, a url like http://www.your.domain.name/wpad.dat should bring up the script text in your browser window.

  4. Insert the following entry into your web server's MIME types configuration:

         application/x-ns-proxy-autoconfig dat

     And then restart your web server, for the new mime type to work.

  5. Assuming Internet Explorer 5, enter the following as the Auto Config Script location:

         http://www.your.domain.name/wpad.dat

     Test that it all works as per your script and network. There's no point continuing until this works ...

  6. Create/install/implement a DNS record so that wpad.your.domain.name resolves to the host above where you have a functioning auto config script running.

  7. You should now be able to use http://wpad.your.domain.name/wpad.dat as the Auto Config Script location in step 5 above.

  8. And finally, go back to the setup screen detailed in 5 above, and choose nothing but the automatic detection option.

One final question might be 'Which domain name does the client (IE5) use for the wpad... lookup?' It uses the hostname from the control panel setting. It starts the search by adding the hostname "WPAD" to the current fully-qualified domain name. For instance, a client in a.b.Microsoft.com would search for a WPAD server at wpad.a.b.microsoft.com. If it could not locate one, it would remove the bottom-most domain and try again; for instance, it would try wpad.b.microsoft.com next. IE 5 would stop searching when it found a WPAD server or reached the third-level domain, wpad.microsoft.com.
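
The search order just described can be sketched in a few lines of Python. This is only an illustration of the rule as stated in this FAQ, not IE 5's actual code; the function name wpad_candidates is hypothetical, and the sketch assumes the search always stops at a three-label name such as wpad.microsoft.com.

```python
def wpad_candidates(client_domain):
    """List the WPAD hostnames a client in client_domain would try, in order."""
    labels = [label for label in client_domain.lower().split(".") if label]
    candidates = []
    # Stop once only the second-level domain is left (wpad.microsoft.com).
    while len(labels) >= 2:
        candidates.append("wpad." + ".".join(labels))
        labels = labels[1:]  # remove the bottom-most domain and try again
    return candidates


for name in wpad_candidates("a.b.Microsoft.com"):
    print(name)
```

Running it for a.b.Microsoft.com prints the three names from the paragraph above, in the same order. This can help when deciding at which level of your domain tree the wpad DNS record is needed.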

Anybody using these steps to install and test, please feel free to make notes, corrections or additions for improvements, and post back to the squid list...

There are probably many more tricks and tips which hopefully will be detailed here in the future. Things like Configuring Browsers for WPAD with DHCP

You can also use DHCP to configure browsers for WPAD. This technique allows you to set any URL as the PAC URL. For ISC DHCPD, enter lines like these in your dhcpd.conf:

    option wpad code 252 = text;
    option wpad "http://www.example.com/proxy.pac";

Replace the hostname with the name or address of your own server.

Ilja Pavkovic notes that the DHCP mode does not work reliably with every version of Internet Explorer. The DNS name method to find wpad.dat is more reliable.

Another user adds that IE 6.01 seems to strip the last character from the URL. By adding a trailing newline, he is able to make it work with both IE 5.0 and 6.0:

    option wpad "http://www.example.com/proxy.pac\n";

IE 5.0x crops trailing slashes from FTP URL's


There was a bug in the 5.0x releases of Internet Explorer in which IE cropped any trailing slash off an FTP URL. The URL showed up correctly in the browser's ``Address:'' field, however squid logs show that the trailing slash was being taken off.

An example of where this impacted squid was a setup where squid would go direct for FTP directory listings but forward requests to a parent for FTP file transfers. This was useful if your upstream proxy was an older version of Squid or another vendor's software which displayed directory listings with broken icons and you wanted your own local version of squid to generate proper FTP directory listings instead. The workaround for this is to add a double slash to any directory listing in which the slash was important, or else upgrade to IE 5.5 (or use Netscape).

IE 6.0 SP1 fails when using authentication

When using authentication with Internet Explorer 6 SP1, you may encounter issues when you first launch Internet Explorer. The problem shows itself when you first authenticate: you will receive a "Page Cannot Be Displayed" error. However, if you click refresh, the page will be correctly displayed.

This only happens immediately after you authenticate.

This is not a Squid error or bug. Microsoft broke the Basic Authentication when they put out IE6 SP1.

There is a knowledgebase article regarding this issue, which contains a link to a downloadable "hot fix." They do warn that this code is not "regression tested" but so far there have not been any reports of this breaking anything else. The problematic file is wininet.dll. Please note that this hotfix is included in the latest security update.

Lloyd Parkes notes that the article references another article, . He says that you must According to Joao Coutinho, this simple solution also corrects the problem: Go to Tools/Internet Go to Options/Advanced UNSELECT "Show friendly HTTP error messages" under Browsing.

Another possible workaround to these problems is to make the ERR_CACHE_ACCESS_DENIED larger than 1460 bytes. This should trigger IE to handle the authentication in a slightly different manner. Squid Log Files

The logs are a valuable source of information about Squid workloads and performance. The logs record not only access information, but also system configuration errors and resource consumption (e.g., memory, disk space). There are several log files maintained by Squid. Some have to be explicitly activated at compile time, others can safely be deactivated at run time.

There are a few basic points common to all log files. The time stamps logged into the log files are usually UTC seconds unless stated otherwise. The initial time stamp usually contains a millisecond extension. If you run your Squid from the The From the area of automatic log file analysis, the The user agent log file is only maintained, if you configured the compile time you pointed the

From the user agent log file you are able to find out about the distribution of browsers of your clients. Using this option in conjunction with a loaded production squid might not be the best of all ideas.

The print format for a store log entry (one line) consists of eleven space-separated columns, compare with the src/store_log.c:

    "%9d.%03d %-7s %02d %08X %4d %9d %9d %9d %s %d/%d %s %s\n"

The timestamp when the line was logged in UTC with a millisecond fraction.

The action the object was submitted to, compare with src/store_log.c.

The cache_dir number this object was stored into, starting at 0 for your first cache_dir line.

The file number for the object storage file. Please note that the path to this file is calculated according to your A file number of The HTTP reply status code.

The value of the HTTP "Date: " reply header.

The value of the HTTP "Last-Modified: " reply header.

The value of the HTTP "Expires: " reply header. The HTTP "Content-Type" major value, or "unknown" if it cannot be determined. This column consists of two slash separated fields: The advertised content length from the HTTP "Content-Length: " reply header. The size actually read.

If the advertised (or expected) length is missing, it will be set to zero. If the advertised length is not zero, but not equal to the real length, the object will be released from the cache. The request method for the object, e.g.

The key to the object, usually the URL.

The timestamp format for the columns to are all expressed in UTC seconds. The actual values are parsed from the HTTP reply headers. An unparsable header is represented by a value of -1, and a missing header is represented by a value of -2.
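
As a rough sketch of how these columns might be read back, the following Python splits one store log line and interprets the -1 and -2 markers. It assumes the twelve whitespace-separated tokens implied by the print format quoted earlier (cache_dir number and file number as separate tokens). The field names, the function names, and the sample line are all made up for illustration and are not taken from Squid.

```python
from datetime import datetime, timezone


def header_time(raw):
    """Turn a logged header timestamp into a UTC datetime or a marker string."""
    seconds = int(raw)
    if seconds == -1:
        return "unparsable"  # the header was present but could not be parsed
    if seconds == -2:
        return "missing"     # the header was absent from the reply
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def parse_store_line(line):
    """Split one store log line into a dict (hypothetical field names)."""
    fields = line.strip().split(None, 11)  # the key is the last field
    if len(fields) != 12:
        raise ValueError("expected 12 fields, got %d" % len(fields))
    (stamp, action, dir_no, file_no, status,
     date, last_modified, expires, ctype, sizes, method, key) = fields
    advertised, real = sizes.split("/")
    return {
        "time": float(stamp),
        "action": action,
        "cache_dir": int(dir_no),
        "file_number": int(file_no, 16),  # logged in hexadecimal (%08X)
        "status": int(status),
        "date": header_time(date),
        "last_modified": header_time(last_modified),
        "expires": header_time(expires),
        "type": ctype,
        "advertised_length": int(advertised),
        "real_length": int(real),
        "method": method,
        "key": key,
    }


# Made-up sample line, for illustration only.
SAMPLE = ("872739961.631 SWAPOUT 00 0000002A  200 872739900 872000000"
          "        -1 text/html 1024/1024 GET http://www.squid-cache.org/")
record = parse_store_line(SAMPLE)
print(record["action"], record["expires"], record["key"])
```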

The column usually contains just the URL of the object. Some objects though will never become public. Thus the key is said to include a unique integer number and the request method in addition to the URL.

This logfile exists for Squid-1.0 only. The format is

    [date] URL peerstatus peerhost

Most log file analysis programs are based on the entries in The common log file format contains other information than the native log file, and less. The native format contains more information for the admin interested in cache evaluation. The is used by numerous HTTP servers. This format consists of the following seven fields:

    remotehost rfc931 authuser [date] "method URL" status bytes

It is parsable by a variety of tools. The common format contains different information than the native log file format. The HTTP version is logged, which is not logged in native log file format. The native format is different for different major versions of Squid. For Squid-1.0 it is:

    time elapsed remotehost code/status/peerstatus bytes method URL

For Squid-1.1, the information from the time elapsed remotehost code/status bytes method URL rfc931 peerstatus/peerhost type

For Squid-2 the columns stay the same, though the content within may change a little.
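
To make the column layout concrete, here is a minimal Python sketch that splits one native-format line into the ten Squid-1.1/Squid-2 columns listed above. The dictionary keys and the function name are hypothetical. The sample line is taken from the access.log excerpt earlier in this FAQ. The sketch assumes the URL contains no whitespace, which is not guaranteed.

```python
from datetime import datetime, timedelta, timezone


def parse_access_line(line):
    """Split one native access.log line into a dict (hypothetical key names)."""
    parts = line.split()
    if len(parts) != 10:
        raise ValueError("expected 10 columns, got %d" % len(parts))
    (stamp, elapsed, remotehost, result, size,
     method, url, ident, hierarchy, ctype) = parts
    # The timestamp is UTC seconds with a millisecond fraction.
    seconds, _, millis = stamp.partition(".")
    when = (datetime.fromtimestamp(int(seconds), tz=timezone.utc)
            + timedelta(milliseconds=int(millis or 0)))
    code, _, status = result.partition("/")
    return {
        "time": when,
        "elapsed_ms": int(elapsed),
        "remotehost": remotehost,
        "code": code,
        "status": int(status),
        "bytes": int(size),
        "method": method,
        "url": url,
        "rfc931": ident,
        "hierarchy": hierarchy,
        "type": ctype,
    }


LINE = ("872739961.631 1566 10.0.0.21 ERR_CLIENT_ABORT/304 83 GET "
        "http://www.squid-cache.org/ - DEFAULT_PARENT/localhost.home.nl -")
entry = parse_access_line(LINE)
print(entry["time"].isoformat(), entry["code"], entry["status"], entry["url"])
```

Splitting the timestamp as text, rather than converting it through a float, keeps the millisecond part exact.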

The native log file format logs more and different information than the common log file format: the request duration, some timeout information, the next upstream server address, and the content type. Tools exist that convert one file format into the other. Bear in mind that even though the log formats share most information, each format contains information which is not part of the other, and this part of the information is lost when converting. In particular, converting back and forth is not possible without loss. We recommend using Squid's native log format because it makes more information available for later analysis. The print format line for native

    "%9d.%03d %6d %s %s/%03d %d %s %s %s %s%s/%s %s"

Therefore, an

A Unix timestamp as UTC seconds with a millisecond resolution. You can convert Unix timestamps into something more human readable using this short perl script:

    #! /usr/bin/perl -p
    s/^\d+\.\d+/localtime $&/e;

The elapsed time considers how many milliseconds the transaction busied the cache. It differs in interpretation between TCP and UDP:

For HTTP/1.0, this is basically the time between For persistent connections, this ought to be the time between scheduling the reply and finishing sending it. For ICP, this is the time between scheduling a reply and actually sending it.

Please note that the entries are logged The IP address of the requesting instance, the client IP address. The Also, the

This column is made up of two entries separated by a slash. This column encodes the transaction result: The cache result of the request contains information on the kind of request, how it was satisfied, or in what way it failed. Please refer to section for valid symbolic result codes.

Several codes from older versions are no longer available, were renamed, or split. Especially the for details on the codes no longer available in Squid-2.

The NOVM versions and Squid-2 also rely on the Unix buffer cache, thus you will see less

The status part contains the HTTP result codes with some Squid specific extensions. Squid uses a subset of the RFC defined error codes for HTTP. Refer to section for details of the status codes recognized by a Squid-2.

The size is the amount of data delivered to the client. Mind that this does not constitute the net object size, as headers are also counted. Also, failed requests may deliver an error page, the size of which is also logged here.

The request method to obtain an object. Please refer to section for available methods. If you turned off

This column contains the URL requested. Please note that the log file may contain whitespace for the URI. The default configuration for

The eighth column may contain the ident lookups for the requesting client. Since ident lookups have performance impact, the default configuration turns

The hierarchy information consists of three items:

Any hierarchy tag may be prefixed with A code that explains how the request was handled, e.g. by forwarding it to a peer, or going straight to the source. Refer to section for details on hierarchy codes and removed hierarchy codes. The IP address or hostname where the request (if a miss) was forwarded. For requests sent to origin servers, this is the origin server's IP address. For requests sent to a neighbor cache, this is the neighbor's hostname. NOTE: older versions of Squid would put the origin server hostname here. The content type of the object as seen in the HTTP reply header. Please note that ICP exchanges usually don't have any content type, and thus are logged ``-''. Also, some weird replies have content types ``:'' or even empty ones.
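
A small Python sketch may help when picking this column apart in a script. It assumes the tag and the host are joined by a slash, as in the DEFAULT_PARENT/localhost.home.nl entries shown earlier, and it treats the 'TIMEOUT_' prefix described under Hierarchy Codes as an optional marker. The function name split_hierarchy is hypothetical.

```python
def split_hierarchy(field):
    """Return (timed_out, code, host) for one hierarchy column value."""
    tag, _, host = field.partition("/")
    timed_out = tag.startswith("TIMEOUT_")
    if timed_out:
        tag = tag[len("TIMEOUT_"):]  # strip the prefix, keep the real code
    return timed_out, tag, host


print(split_hierarchy("DEFAULT_PARENT/localhost.home.nl"))
print(split_hierarchy("TIMEOUT_DEFAULT_PARENT/localhost.home.nl"))
```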

There may be two more columns in the Squid result codes

The The following result codes were taken from a Squid-2, compare with the src/access_log.c: The client issued a "no-cache" pragma, or some analogous cache control command along with the request. Thus, the cache has to refetch the object. The client issued an IMS request for an object which was in the cache and fresh. The object was believed to be in the cache, but could not be accessed. During "-Y" startup, or during frequent failures, a cache in hit only mode will return either UDP_HIT or this code. Neighbours will thus only fetch hits.

The following codes are no longer available in Squid-2: . . used instead. . HTTP status codes

These are taken from and verified for Squid. Squid-2 uses almost all codes except 307 (Temporary Redirect), 416 (Request Range Not Satisfiable), and 417 (Expectation Failed). Extra codes include 0 for a result code being unavailable, and 600 to signal an invalid header, a proxy error. Also, some definitions were added as for (WebDAV). Yes, there are really two entries for status code 424, compare with src/enums.h:

     000 Used mostly with UDP traffic.

     100 Continue
     101 Switching Protocols
    *102 Processing

     200 OK
     201 Created
     202 Accepted
     203 Non-Authoritative Information
     204 No Content
     205 Reset Content
     206 Partial Content
    *207 Multi Status

     300 Multiple Choices
     301 Moved Permanently
     302 Moved Temporarily
     303 See Other
     304 Not Modified
     305 Use Proxy
    [307 Temporary Redirect]

     400 Bad Request
     401 Unauthorized
     402 Payment Required
     403 Forbidden
     404 Not Found
     405 Method Not Allowed
     406 Not Acceptable
     407 Proxy Authentication Required
     408 Request Timeout
     409 Conflict
     410 Gone
     411 Length Required
     412 Precondition Failed
     413 Request Entity Too Large
     414 Request URI Too Large
     415 Unsupported Media Type
    [416 Request Range Not Satisfiable]
    [417 Expectation Failed]
    *424 Locked
    *424 Failed Dependency
    *433 Unprocessable Entity

     500 Internal Server Error
     501 Not Implemented
     502 Bad Gateway
     503 Service Unavailable
     504 Gateway Timeout
     505 HTTP Version Not Supported
    *507 Insufficient Storage

     600 Squid header parsing error

Request methods

Squid recognizes several request methods as defined in . Newer versions of Squid (2.2.STABLE5 and above) also recognize ``HTTP Extensions for Distributed Authoring -- WEBDAV'' extensions.

    method    defined    cachabil.  meaning
    --------- ---------- ---------- -------------------------------------------
    GET       HTTP/0.9   possibly   object retrieval and simple searches.
    HEAD      HTTP/1.0   possibly   metadata retrieval.
    POST      HTTP/1.0   CC or Exp. submit data (to a program).
    PUT       HTTP/1.1   never      upload data (e.g. to a file).
    DELETE    HTTP/1.1   never      remove resource (e.g. file).
    TRACE     HTTP/1.1   never      appl. layer trace of request route.
    OPTIONS   HTTP/1.1   never      request available comm. options.
    CONNECT   HTTP/1.1r3 never      tunnel SSL connection.

    ICP_QUERY Squid      never      used for ICP based exchanges.
    PURGE     Squid      never      remove object from cache.

    PROPFIND  rfc2518    ?          retrieve properties of an object.
    PROPPATCH rfc2518    ?          change properties of an object.
    MKCOL     rfc2518    never      create a new collection.
    COPY      rfc2518    never      create a duplicate of src in dst.
    MOVE      rfc2518    never      atomically move src to dst.
    LOCK      rfc2518    never      lock an object against modifications.
    UNLOCK    rfc2518    never      unlock an object.

Hierarchy Codes

The following hierarchy codes are used with Squid-2: src/peer_select.c:hier_strings[].

Almost any of these may be preceded by 'TIMEOUT_' if the two-second (default) timeout occurs waiting for all ICP replies to arrive from neighbors, see also the

The following hierarchy codes were removed from Squid-2:

    code                  meaning
    --------------------  -------------------------------------------------
    PARENT_UDP_HIT_OBJ    hit objects are no longer available.
    SIBLING_UDP_HIT_OBJ   hit objects are no longer available.
    SSL_PARENT_MISS       SSL can now be handled by squid.
    FIREWALL_IP_DIRECT    No special logging for hosts inside the firewall.
    LOCAL_IP_DIRECT       No special logging for local networks.

cache/log (Squid-1.x)

This file has a rather unfortunate name. It also is often called the % squid -k shutdown This will disrupt service, but at least you will have your swap log back. Alternatively, you can tell squid to rotate its log files. This also causes a clean swap log to be written. % squid -k rotate

For Squid-1.1, there are six fields: swap.state (Squid-2.x)

In Squid-2, the swap log file is now called for information on the contents and format of that file.

If you remove % squid -k rotate Alternatively, you can tell Squid to shutdown and it will rewrite this file before it exits.

If you remove the By default the Which log files can I delete safely?

You should never delete If you accidentally delete

The correct way to maintain your log files is with Squid's ``rotate'' feature. You should rotate your log files at least once per day. The current log files are closed and then renamed with numeric extensions (.0, .1, etc). If you want to, you can write your own scripts to archive or remove the old log files. If not, Squid will only keep up to If you set

To rotate Squid's logs, simply use this command:

    squid -k rotate

For example, use this cron entry to rotate the logs at midnight:

    0 0 * * * /usr/local/squid/bin/squid -k rotate

How can I disable Squid's log files?

To disable cache_access_log /dev/null

To disable cache_store_log none

To disable cache_log /dev/null

To disable cache_access_log none

To disable cache_store_log none

To disable cache_log /dev/null

My log files get very big!

You need to 0 0 * * * /usr/local/squid/bin/squid -k rotate I want to use another tool to maintain the log files.

If you set Managing log files

The preferred log file for analysis is the Depending on the disk space allocated for log file storage, it is recommended to set up a cron job which rotates the log files every 24, 12, or 8 hours. You will need to set your Before transport, the log files can be compressed during off-peak time. On the analysis host, the log files are concatenated into one file, so one file for 24 hours is the yield. Also note that with

The EU project developed some to obey when handling and processing log files:

  - Respect the privacy of your clients when publishing results.

  - Keep logs unavailable unless anonymized. Most countries have laws on privacy protection, and some even on how long you are legally allowed to keep certain kinds of information.

  - Rotate and process log files at least once a day. Even if you don't process the log files, they will grow quite large, see section . If you rely on processing the log files, reserve a large enough partition solely for log files.

  - Keep the size in mind when processing. It might take longer to process log files than to generate them!

  - Limit yourself to the numbers you are interested in. There is data beyond your dreams available in your log file, some quite obvious, others by combination of different views. Here are some examples for figures to watch:

      - The hosts using your cache.
      - The elapsed time for HTTP requests - this is the latency the user sees. Usually, you will want to make a distinction for HITs and MISSes and overall times. Also, medians are preferred over averages.
      - The requests handled per interval (e.g. second, minute or hour).

Why do I get ERR_NO_CLIENTS_BIG_OBJ messages so often?

This message means that the requested object was in ``Delete Behind'' mode and the user aborted the transfer. An object will go into ``Delete Behind'' mode if

- It is larger than
- It is being fetched from a neighbor which has the

What does ERR_LIFETIME_EXP mean?

This means that a timeout occurred while the object was being transferred. Most likely the retrieval of this object was very slow (or it stalled before finishing) and the user aborted the request. However, depending on your settings for

Retrieving ``lost'' files from the cache

I've been asked to retrieve an object which was accidentally destroyed at the source for recovery. So, how do I figure out where the things are so I can copy them out and strip off the headers?

The following method applies only to the Squid-1.1 versions:

Use grep to find the named object (URL) in the file. The first field in this file is an integer Then, find the file

perl fileno-to-pathname.pl [-c squid.conf]

File numbers are read on stdin, and pathnames are printed on stdout.

Can I use

Sort of. You can use cached.

Cached responses are logged with the SWAPOUT tag. Uncached responses are logged with the RELEASE tag.

However, your analysis must also consider that when a cached response is removed from the cache (for example due to cache replacement) it is also logged in

Operational issues

How do I see system level Squid statistics?

The Squid distribution includes a CGI utility called

How can I find the biggest objects in my cache?

sort -r -n -k 5,5 access.log | awk '{print $5, $7}' | head -25

I want to restart Squid with a clean cache

Note: The information here is current for version 2.2 and later.

First of all, you must stop Squid of course. You can use the command:

% squid -k shutdown

The fastest way to restart with an entirely clean cache is to overwrite the

% echo "" > /cache1/swap.state

Repeat that for every

Another way, which takes longer, is to have squid recreate all the

% cd /cache1
% mkdir JUNK
% mv ?? swap.state* JUNK
% rm -rf JUNK &

Repeat this for your other

% squid -z

How can I proxy/cache Real Audio?


Point the RealPlayer at your Squid server's HTTP port (e.g. 3128). Using the Preferences->Transport tab, select

The RealPlayer (and RealPlayer Plus) manual states:

Use HTTP Only: Select this option if you are behind a firewall and cannot receive data through TCP. All data will be streamed through HTTP. Note: You may not be able to receive some content if you select this option.

Again, from the documentation:

RealPlayer 4.0 identifies itself to the firewall when making a request for content to a RealServer. The following string is attached to any URL that the Player requests using HTTP GET:

/SmpDsBhgRl

Thus, to identify an HTTP GET request from the RealPlayer, look for:

http://[^/]+/SmpDsBhgRl

The Player can also be identified by the mime type in a POST to the RealServer. The RealPlayer POST has the following mime type:

"application/x-pncmd"

Note that the first request is a POST, and the second has a '?' in the URL, so standard Squid configurations would treat it as non-cachable. It also looks rather ``magic.''
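The two identification rules quoted from the RealPlayer documentation can be sketched in a few lines of Python. This is only an illustration, not part of Squid: the function name is_realplayer_request is made up for this example, and anchoring the pattern at the start of the URL is an assumption on our part.

```python
import re

# Pattern quoted from the RealPlayer documentation. Anchoring it at the
# start of the URL (the leading ^) is an assumption made for this sketch.
REALPLAYER_GET = re.compile(r"^http://[^/]+/SmpDsBhgRl")

# Mime type the RealPlayer uses in its POST to the RealServer.
REALPLAYER_POST_TYPE = "application/x-pncmd"


def is_realplayer_request(method, url, content_type=None):
    """Return True if a request looks like RealPlayer HTTP streaming.

    Hypothetical helper: it applies only the two rules quoted above.
    """
    if method == "GET":
        return REALPLAYER_GET.match(url) is not None
    if method == "POST" and content_type is not None:
        # Ignore parameters such as "; charset=..." and letter case.
        base_type = content_type.split(";")[0].strip().lower()
        return base_type == REALPLAYER_POST_TYPE
    return False
```

For example, is_realplayer_request("GET", "http://audio.example.com/SmpDsBhgRl1234") matches the GET rule, while a GET for an ordinary page on the same (hypothetical) host does not.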

HTTP is an alternative delivery mechanism introduced with version 3 players, and it allows a reasonable approximation to ``streaming'' data - that is playing it as you receive it.

It isn't available in the general case: only if someone has made the realaudio file available via an HTTP server, or they're using a version 4 server, they've switched it on, and you're using a version 4 client. If someone has made the file available via their HTTP server, then it'll be cachable. Otherwise, it won't be (as far as we can tell.)

The more common RealAudio link connects via their own

Some confusion arises because there is also a configuration option to use an HTTP proxy (such as Squid) with the RealAudio/RealVideo players. This is because the players can fetch the

How can I purge an object from my cache?

Squid does not allow you to purge objects unless it is configured with access controls in

acl PURGE method PURGE
acl localhost src 127.0.0.1
http_access allow PURGE localhost
http_access deny PURGE

The above only allows purge requests which come from the local host and denies all other purge requests.

To purge an object, you can use the

squidclient -m PURGE http://www.miscreant.com/

If the purge was successful, you will see a ``200 OK'' response:

HTTP/1.0 200 OK
Date: Thu, 17 Jul 1997 16:03:32 GMT
Server: Squid/1.1.14

If the object was not found in the cache, you will see a ``404 Not Found'' response:

HTTP/1.0 404 Not Found
Date: Thu, 17 Jul 1997 16:03:22 GMT
Server: Squid/1.1.14

Using ICMP to Measure the Network

As of version 1.1.9, Squid is able to utilize ICMP Round-Trip-Time (RTT) measurements to select the optimal location to forward a cache miss. Previously, cache misses would be forwarded to the parent cache which returned the first ICP reply message. These were logged with FIRST_PARENT_MISS in the access.log file. Now we can select the parent which is closest (RTT-wise) to the origin server.

Supporting ICMP in your Squid cache

It is more important that your parent caches enable the ICMP features. If you are acting as a parent, then you may want to enable ICMP on your cache. Also, if your cache makes RTT measurements, it will fetch objects directly if your cache is closer than any of the parents.
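The selection rule described here (forward to the parent closest to the origin server, or go direct if your own cache is closer than any parent) can be sketched as follows. This is a simplified illustration, not Squid's actual code: the function name, the data layout, and the parent host names in the usage note are hypothetical. The two tags are the ones Squid writes to access.log for these cases.

```python
def choose_forward_target(origin_host, own_rtt, parent_rtts):
    """Pick where to forward a cache miss, based on RTT to the origin.

    own_rtt     -- our own measured RTT in milliseconds, or None
    parent_rtts -- dict mapping parent cache name -> RTT in milliseconds
    Returns a (tag, host) tuple, or None if no measurement is available.
    Hypothetical helper for illustration only.
    """
    best_parent = None
    if parent_rtts:
        # The parent with the smallest RTT to the origin server.
        best_parent = min(parent_rtts, key=parent_rtts.get)

    # Go direct only if we are strictly closer than every parent.
    if own_rtt is not None and (
        best_parent is None or own_rtt < parent_rtts[best_parent]
    ):
        return ("CLOSEST_DIRECT", origin_host)
    if best_parent is not None:
        return ("CLOSEST_PARENT_MISS", best_parent)
    return None
```

With parents {"parent-a.example.net": 42.0, "parent-b.example.net": 185.0} and an own RTT of 82.3 ms, the sketch picks parent-a.example.net. With an own RTT of 10.0 ms it goes direct.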

If you want your Squid cache to measure RTT's to origin servers, Squid must be compiled with the USE_ICMP option. This is easily accomplished by uncommenting "-DUSE_ICMP=1" in src/Makefile and/or src/Makefile.in.

An external program called

% make install
% su
# make install-pinger

There are three configuration file options for tuning the measurement database on your cache. Another option,

Utilizing your parents database

Your parent caches can be asked to include the RTT measurements in their ICP replies. To do this, you must enable

query_icmp on

This causes a flag to be set in your outgoing ICP queries.

If your parent caches return ICMP RTT measurements then the eighth column of your access.log will have lines similar to:

CLOSEST_PARENT_MISS/it.cache.nlanr.net

In this case, it means that

CLOSEST_DIRECT/www.sample.com

Inspecting the database

The measurement database can be viewed from the cachemgr by selecting "Network Probe Database." Hostnames are aggregated into /24 networks. All measurements made are averaged over time. Measurements are made to specific hosts, taken from the URLs of HTTP requests. The recv and sent fields are the number of ICMP packets received and sent. At this time they are only informational.
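To make the aggregation concrete, here is a small Python sketch that groups measured hosts by their /24 network and averages the RTT samples. It is an assumption-laden illustration: the ProbeDatabase class and its methods are hypothetical names, and a plain arithmetic mean is used because the exact averaging method Squid applies is not stated here.

```python
import ipaddress
from collections import defaultdict


def network_of(ip):
    """Return the /24 network address that contains an IPv4 address."""
    net = ipaddress.ip_network(f"{ip}/24", strict=False)
    return str(net.network_address)


class ProbeDatabase:
    """Hypothetical stand-in for the measurement database."""

    def __init__(self):
        # One entry per /24 network.
        self._entries = defaultdict(
            lambda: {"hosts": set(), "rtt_sum": 0.0, "samples": 0}
        )

    def record(self, hostname, ip, rtt_ms):
        """Store one RTT sample under the host's /24 network."""
        entry = self._entries[network_of(ip)]
        entry["hosts"].add(hostname)
        entry["rtt_sum"] += rtt_ms
        entry["samples"] += 1

    def average_rtt(self, ip):
        """Mean RTT (ms) for the network containing ip, or None."""
        entry = self._entries.get(network_of(ip))
        if entry is None:
            return None
        return entry["rtt_sum"] / entry["samples"]

    def hostnames(self, ip):
        """Sorted hostnames seen in the network containing ip."""
        entry = self._entries.get(network_of(ip))
        return sorted(entry["hosts"]) if entry else []
```

Two hosts at 192.0.2.10 and 192.0.2.200 (documentation addresses, used here only as examples) end up in the same entry, 192.0.2.0, and share one averaged RTT.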

A typical database entry looks something like this:

Network          recv/sent    RTT   Hops  Hostnames
192.41.10.0        20/  21   82.3    6.0  www.jisedu.org www.dozo.com
    bo.cache.nlanr.net       42.0    7.0
    uc.cache.nlanr.net       48.0   10.0
    pb.cache.nlanr.net       55.0   10.0
    it.cache.nlanr.net      185.0   13.0

This means we have sent 21 pings to both www.jisedu.org and www.dozo.com. The average RTT is 82.3 milliseconds. The next four lines show the measured values from our parent caches. Since

Why are so few requests logged as TCP_IMS_MISS?

When Squid receives an If the request is not forwarded, Squid replies to the IMS request according to the object in its cache. If the modification times are the same, then Squid returns TCP_IMS_HIT. If the modification times are different, then Squid returns TCP_IMS_MISS. In most cases, the cached object will not have changed, so the result is TCP_IMS_HIT. Squid will only return TCP_IMS_MISS if some other client causes a newer version of the object to be pulled into the cache.

How can I make Squid NOT cache some servers or URLs?