A proxy server is simply a middle-man between client PCs (users)
and the Internet (typically, websites).
The purpose of the proxy server is to maintain
a cache of commonly accessed Web sites in order to increase
access speed and to reduce bandwidth usage.
For example, let's say your office has four PCs,
all of them sharing the same Internet connection.
Every time a user accesses Google,
that page needs to be downloaded to the PC (assuming it's not
already in the browser cache). If all four users try to access Google
at the same time, your Internet connection is busy downloading
the same page four times, once for each user.
Clearly, this is not efficient, slows down the connection for
everybody, and will cost the company more money if the Internet
provider charges by volume.
This is where a proxy comes in handy. It
acts as a middle-man between
these user PCs and the Internet connection.
When a user wants to access a Web page,
his/her browser actually contacts the proxy server rather than the
website directly. The proxy server, in turn, contacts the
website, downloads that page and feeds it back to the user.
However, the proxy server also keeps that page in a disk cache for
a certain amount of time. When another user tries to
access that same site, the proxy server checks its disk
cache, realizes it already has that page in storage, and feeds it
back to the user immediately without having to transfer
it from the Internet.
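The cache lookup described above can be sketched in a few lines of Python. This is only a toy model and not how squid is implemented: the class name, the 300-second lifetime and the fake_website function are assumptions made up for illustration, and the "download" is simulated so the sketch runs offline.

```python
import time
from typing import Callable, Dict, Tuple


class CachingProxy:
    """Toy model of a caching proxy: one shared cache for many clients."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes], max_age: float = 300.0):
        self._fetch = fetch_from_origin      # how to reach the real website
        self._max_age = max_age              # seconds a cached page stays valid
        self._cache: Dict[str, Tuple[float, bytes]] = {}
        self.origin_fetches = 0              # transfers that used the Internet link

    def get(self, url: str) -> bytes:
        now = time.monotonic()
        entry = self._cache.get(url)
        if entry is not None and now - entry[0] < self._max_age:
            return entry[1]                  # cache hit: no Internet traffic
        body = self._fetch(url)              # cache miss: download once
        self.origin_fetches += 1
        self._cache[url] = (now, body)
        return body


def fake_website(url: str) -> bytes:
    """Stand-in for a real download (hypothetical, for illustration only)."""
    return ("<html>page for " + url + "</html>").encode()


# Four users ask for the same page through the same proxy.
proxy = CachingProxy(fake_website)
pages = [proxy.get("http://www.google.com/") for _user in range(4)]
print(proxy.origin_fetches)  # number of downloads that actually hit the Internet
```

All four users receive the same page, but only the first request goes out over the Internet connection; the other three are answered from the cache.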
If your company has 100 PCs instead of 4, you can see how
a proxy server can make a big difference to your network
performance and cost.
This is the gist of it. Configurations
can get a lot more complex, such as one proxy server
getting pages from other proxy servers, or your proxy keeping
track of who accessed what site and when.
The proxy server can also be configured to block certain
sites or impose various restrictions such as which ports
will be allowed. For instance, you can allow Web browsing
through the proxy but disallow FTP transfers, or you
can allow selected PCs on your network access to certain sites and
disallow all others.
All this control, of course, can
be good or bad, depending on your point of view. However, we're not
going to discuss these ethical issues here; my goal here is simply to
get you up and running with a no-frills proxy server using
the free software package squid.
Assuming you've decided to go ahead with this project,
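you have two options:
1. Install the proxy server as an ordinary single-NIC machine on your LAN, alongside the client PCs and behind the same router.
2. Place the proxy server between your LAN and the Internet connection, using two network cards.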
The first option is certainly cheaper, but since your users will
still have direct access to the Internet connection,
they can get around the proxy if they wish.
The second option obviously adds to the cost of the project
because of the extra hardware involved, but it forces users
to use the proxy to gain access to the Internet since they
don't have another way around.
Generally, if you are dealing with a few computers at home or a
small, collaborative office, the first option might be suitable.
On the other hand, if you are dealing with a large company where
you need to control bandwidth costs and might want to gather
usage statistics, then the second option would be preferable.
In this document, we will examine both scenarios. There really
is not much difference between them anyway in terms of implementation.
Once you have decided on your topology, you need to install
squid on the computer that you have selected as the proxy server.
Simply follow the procedure appropriate to your operating system.
For this article, I will assume the server is a Linux system.
For instance, on an Ubuntu server, you would install squid
with this simple command:
sudo apt-get install squid
Once squid has been installed, you need to edit its
central configuration file before it can be of any use to you.
Specifically, squid will not allow any HTTP traffic (Web browsing)
from any of your users by default.
On a Unix-type system, you will need to have superuser powers to
edit the configuration file.
On most systems, you would simply log in as root to
do this work. On some Linux distributions, however, you may need to
use the sudo command to elevate your
privileges. I am assuming you are familiar enough with your
own system to know what to do.
The central configuration file is named squid.conf
and is usually located in /etc/squid.
This file is quite large and includes abundant comments about
every option. Fortunately, only three lines (plus a fourth on some
installations) need to be modified or added to get minimal functionality out of squid.
First, you need to identify your network using an Access Control
List, or "acl." The line will have the following format:
acl name_of_your_choice src your_network_address
The name_of_your_choice in this case is any word that you
may want to use to refer to your network, such as "mylan" or
"family_net". The sample configuration file that comes with squid
uses "localnet" for this purpose but you can create your own.
For our examples, we will use "mylan".
The keyword "src" stands for source address.
So, if your local network is 192.168.0.xxx and uses a subnet
mask of 255.255.255.0 for instance, you would
add this line to the configuration file:
acl mylan src 192.168.0.0/24
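If the "/24" notation is unfamiliar, the following Python sketch shows how it relates to the 255.255.255.0 subnet mask and which client addresses such an acl would cover. It only uses Python's standard ipaddress module to illustrate the address arithmetic; the function name in_mylan and the two test addresses are made up for this example, and none of this is part of squid itself.

```python
import ipaddress

# Network address plus dotted subnet mask, as in the example above.
lan = ipaddress.ip_network("192.168.0.0/255.255.255.0")


def in_mylan(client_ip: str) -> bool:
    """True if the client address falls inside the LAN described by the acl."""
    return ipaddress.ip_address(client_ip) in lan


print(lan.with_prefixlen)          # the same network written in "/24" form
print(in_mylan("192.168.0.57"))    # a PC on the 192.168.0.xxx network
print(in_mylan("192.168.1.57"))    # a PC on a different network
```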
Note that the "acl" instruction can take many other forms
as you will see in the profuse comments in squid.conf. For
instance, using "dst" instead of "src," you can specify
destination addresses instead of source addresses. For our
purposes here, we will limit the scope of our discussion to the
bare necessities to get your users on the Web through the proxy.
To this end, we now need to tell squid that HTTP traffic is
to be allowed for members of acl "mylan." This is done
with the http_access directive in the following format:
http_access allow your_acl_name
In our case, we would create this instruction:
http_access allow mylan
Since access rules are position-dependent, be sure to insert this
line at the correct place in the file. Search for the phrase
"INSERT YOUR OWN RULE(S) HERE" in the configuration file to find
the right place.
Next, you need to create an "icp_access" instruction
with the same format as the http_access line we just created, as follows:
icp_access allow mylan
ICP stands for "Internet Cache Protocol." Squid caches use it to ask
one another whether they already hold a requested object, which helps squid
pick the most appropriate source (its own cache, a cache on a different
proxy server, or the website itself).
The above line allows machines in mylan to send ICP queries to
this proxy.
Search the configuration file for lines that begin with
"icp_access" and insert your own line in that area.
Make sure your new line comes before the line that denies access to all
others (icp_access deny all).
Finally, some installations will require you to specify the name
under which your server will be known. Typically, this is the
machine name you gave to this computer. Simply edit (or create) the
line beginning with "visible_hostname" and specify your machine
name, as in:
visible_hostname your_machine_name
That's it! With the above 4 modifications, squid is now configured
for basic functionality, meaning it will allow clients (users) to
access the Web through the proxy server.
Start the squid daemon using the command appropriate for your
Linux distribution. For instance, on Ubuntu, you would use
service squid start; on
OpenSUSE, you would use rcsquid start.
And since squid will
normally be configured to start up automatically at boot-time,
you can probably just shut down and reboot your computer if you
can't find a more elegant way to start the daemon.
The server may well be ready for action, but the users' PCs must also
be told to use the new proxy server.
On a Windows PC, you would do this through the Control Panel by
navigating to the Networking panel down to Internet Options and LAN
Settings. Unfortunately, since there are so many different
versions of Windows, each with slightly different menus under the
Control Panel, it's impractical to provide detailed directions here.
Fortunately, you can access the same panel through
the Internet Explorer Web browser by clicking on
Tools→Internet Options→Connections→LAN Settings.
In that panel, check the box for "Use a proxy server" and enter the
IP address of your proxy server as well as the port specified in
the squid.conf file. That value will usually be 3128, but look for
the line that starts with "http_port" in squid.conf to be sure.
With this small change done, your Web browser will now access the
Web through the proxy server.
Make this same modification on all
the PCs that need to use the proxy and you're done!
This change affects all other applications on the PC, such as other
Web browsers, FTP clients and telnet communication programs
unless they have been specifically configured not to use a proxy.
For instance, the Firefox browser features a checkbox in the
Advanced→Network configuration screen for "No proxy," so check
this out if you're having difficulty using this program through a proxy.
NOTE: It is possible, under some conditions, to implement a proxy
server without having to manually configure each client PC as we
have just described here. See the "Transparent Proxy" section
below for details.
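Before moving on, it can be handy to confirm that a client can reach the proxy without going through any browser settings. The Python sketch below builds an HTTP client that sends its requests through a proxy, using the standard urllib module. The address 192.168.0.1 is a made-up placeholder for your proxy server, and 3128 is simply the usual http_port value mentioned above; substitute your own values.

```python
import urllib.request

# Hypothetical values: substitute your proxy's real address and http_port.
PROXY_HOST = "192.168.0.1"
PROXY_PORT = 3128


def build_proxy_opener(host: str = PROXY_HOST, port: int = PROXY_PORT):
    """Return (opener, proxy_url); the opener sends HTTP requests via the proxy."""
    proxy_url = "http://%s:%d" % (host, port)
    handler = urllib.request.ProxyHandler({"http": proxy_url})
    return urllib.request.build_opener(handler), proxy_url


opener, proxy_url = build_proxy_opener()
print(proxy_url)
# On a machine that can reach the proxy, a request would look like:
#     opener.open("http://www.example.com/", timeout=10).read()
```

The request itself is left as a comment because it only succeeds on a network where the proxy is actually running. If it fails there, the http_access rules in squid.conf are a good place to start looking.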
In the introduction, I mentioned that it may be desirable to force
users to use the proxy server in order to realize the
performance and cost benefits made possible by this technology, as
well as to gather usage statistics or exercise certain controls
over Internet access.
There are various ways to achieve this, but probably the simplest
method is to segregate the client PCs from the Internet by
physically placing the proxy server between your LAN and the
Internet connection, so that users don't have direct access to the
Internet modem or router. Specifically, the
computer used as the proxy server would have two network interface
cards (NICs), one belonging to your LAN and the other one
connecting directly to the Internet. In fact, this is the typical
configuration for a firewall machine.
Since users don't have a physical connection to the Internet modem
or router in this scenario, they have no choice but to direct their
network queries to the proxy server if they are to access the Internet.
If your organization is already using a Linux machine as its
firewall, then you can simply install squid on that firewall
machine since your network topology is already configured
as shown in the graph on the right.
One common complaint about setting up a proxy server is that every
user PC must be reconfigured to use that proxy. As we described in the
previous section, this is a fairly simple procedure but it can be
an unwelcome chore if your site is using hundreds of PCs,
especially if some of them are mobile laptops that need to operate on
other networks as well.
If you are the IT manager for such a site, you probably wish you
could somehow "trick" all PCs into using the proxy server without
having to reconfigure each one.
There is a fairly simple way to accomplish this by reconfiguring
your router or firewall so that all HTTP connection requests (port
80) are routed to the proxy server on the appropriate port (3128 by
default, unless you changed it).
This way, whenever any of your user PCs is trying to access a
Web page, their request is re-routed to the proxy without having to
change anything on the PC. If this PC happens to be a wireless
laptop, it will continue to operate normally when it leaves your
premises and hooks up to another network since you never configured
it to use a proxy server so it doesn't need to be reconfigured.
Since this strategy involves "intercepting" HTTP requests on the
fly, this type of proxy configuration is sometimes called an
"intercepting proxy" or a "forced proxy." Also, since the method
is transparent to users (i.e. they don't know it's happening), the
configuration is often called a "transparent proxy." This is the
term we will be using from now on in this document.
Before we go any further, let me clarify that a transparent proxy
is a limited solution with a number of technical drawbacks.
For one thing, only the ports you specifically redirect will be
affected. For instance, if you redirect port 80 (HTTP) to your
proxy server, all regular Web traffic will go through the proxy,
but all other types of traffic, such as email, ftp and telnet, will
continue to go directly to your Internet connection, bypassing the proxy.
In practice, that's probably not an issue since those other protocols
would not benefit much from a caching proxy and are not usually the
protocols you want to control and monitor anyway.
Another issue is that HTTPS (secure HTTP) may not work
correctly if you intercept it since, as a secure protocol, it is
designed to defeat man-in-the-middle attacks, and an intercepting
proxy is precisely that.
There have also been other technical problems detected with
transparent proxies, especially when dealing with older browsers.
Without going into these details, let me simply say that while a
transparent proxy is very convenient, you should only consider it
if the alternative (configuring your client PCs manually)
is truly undesirable.
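If you are still interested in setting up a transparent proxy, here
is how it's done:
1. Redirect Web traffic on your network to the proxy server.
2. Tell squid to operate as a transparent proxy.
Let's examine each step in greater detail.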
For this example, we are going to assume you only wish to redirect
Web traffic (port 80) to the proxy server and not bother with the
other protocols. The method you will use to do this depends on
your network topology.
If your proxy server is a single-NIC machine on the same LAN as
your client workstations and under control of the same router (as
shown on the left),
then you will need to reconfigure this router to perform the port
redirection using the method appropriate to that device. In most
cases, you can customize the router settings through its Web-based
configuration interface by simply pointing your browser to it.
In this case, you may want to give your proxy server a static IP
address on your system to make it simpler to tell the router where
to redirect the Web requests from your LAN. Another method would
be to use the unique hardware address of its network card, but
that would be less flexible since your router configuration would have to
be updated if the hardware were ever upgraded or replaced.
Let's assume our LAN uses the 192.168.1.xxx address
block and that we have configured our proxy server with a static
address of 192.168.1.200. We would then access the configuration
menu of the router and tell it to redirect all traffic for port 80
coming from our LAN to 192.168.1.200, port 3128.
Some routers may not have the
ability to redirect one port to a different port on another
machine, so you may have to upgrade your router if that is your situation.
On the other hand, if your topology involves using the Squid server
as your firewall (as shown on the right), then you will need to
modify your firewall setting in Linux to redirect traffic from your
LAN for port 80 to the local machine, port 3128.
On modern Linux
distributions (i.e. kernel 2.6), this can be done with these two
instructions (entered as root):
echo 1 > /proc/sys/net/ipv4/ip_forward
iptables -t nat -A PREROUTING -p tcp --dport 80 -j REDIRECT --to-port 3128
The first line enables packet forwarding on the system; this is
required in order to perform any kind of port redirection.
The second instruction causes all incoming traffic for
port 80 to be redirected to port 3128 on the same system. Note
that if you have changed the default port in your squid.conf file,
you should specify your custom port number here instead.
Note that if your
system is already configured as a router or firewall, you probably
have a shell script somewhere with lines similar to those above and
you should incorporate these instructions in that script so that
they are executed automatically whenever the system is booted.
By the way, if you are not familiar with configuring firewalls on
Linux, these instructions are probably totally cryptic to you.
Unfortunately, that topic is outside the scope of this document,
so you may want to read up on the subject in separate documentation.
This second step is actually trivial to implement on current
versions of squid (starting at version 2.6): all you have to do is
locate the http_port line in the squid configuration file
(squid.conf) and add the keyword "transparent" to it (squid 3.1 and
later spell this keyword "intercept"), like this:
http_port 3128 transparent
Then, stop and restart the squid daemon and you're done.
On most current Linux distributions, you would do this using the following command:
service squid restart
On older versions of squid, there are a number of configuration
lines to add to the squid.conf file to make it operate as a
transparent proxy, but we will not delve into this here since it is
generally simpler to just upgrade your squid software to a current version.
One of the most popular reasons for implementing a squid proxy
server is to block access to certain websites on your LAN. This may be
desirable in order to cut down on unproductive use of corporate resources,
such as employees watching YouTube videos or chatting on Facebook
instead of working. It may also be a safeguard against liability
issues in cases of illegal activities by employees, such as
watching child pornography on the corporate network.
Whatever your reasons, if you need to block access to
certain sites or domains on your LAN,
the procedure is quite simple.
There are different ways to blacklist specific sites or domains
by entering instructions in squid.conf, but I find that
the simplest and most flexible approach is to create a list of
blocked domain names in a separate file and point to it from
squid.conf using these two instructions:
acl your_label url_regex your_filename
http_access deny your_label
The placeholders your_label and your_filename can be any names you wish to use. For
instance, we may want to use the label "banned" to refer to
the access control list (acl) that we are creating, and we may want to
use the name "blocked_sites" for the file that will contain the
banned domain names.
This would result in these two instructions:
acl banned url_regex "/etc/squid/blocked_sites"
http_access deny banned
The keyword "url_regex" stands for "URL regular expressions," which
means the file we specified may include wild-card expressions
to describe the names we wish to blacklist.
So, let's create the file we specified
(/etc/squid/blocked_sites in our example) using a standard text editor.
Let's assume we want to preclude our users from accessing
YouTube and Facebook. To do this, we would add the following lines
to that file:
.youtube.com
.facebook.com
This would match any domain name containing these strings, such as
"www.youtube.com" or "facebook.com/login.php."
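IMPORTANT: With "url_regex," each line of the file is a regular
expression that is matched anywhere in the URL, so a leading dot in an
entry simply stands for any one character, and "youtube.com" on its own
would also block "www.youtube.com." The rule that a leading dot covers all
sub-domains of a domain belongs to a different acl type, "dstdomain,"
where ".youtube.com" matches the domain together with all its sub-domains.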
The following lines would block any domain name with the strings
"porn", "sex" or "gambl" in them:
porn
sex
gambl
This would successfully block sites like "thegambler.com" and
"gambling.com." However, these broad restrictions might also
have unintended effects,
for example blocking access to legitimate sites such as "sussex.com"
which happen to contain one of our forbidden text strings.
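You can get a feel for this kind of over-blocking before touching squid.conf by trying your strings against a few URLs. The Python sketch below does this with the standard re module. It is only an approximation: squid uses its own regular expression library, the function name is_blocked is made up for this example, and the URLs are just test values.

```python
import re

patterns = ["porn", "sex", "gambl"]   # the three strings discussed above


def is_blocked(url: str) -> bool:
    """True if any pattern occurs anywhere in the URL."""
    return any(re.search(p, url) for p in patterns)


for url in (
    "http://thegambler.com/",
    "http://gambling.com/",
    "http://sussex.com/",        # legitimate site caught by "sex"
    "http://www.example.org/",
):
    print(url, is_blocked(url))
```

Running a list of the sites your users legitimately need through a check like this is a cheap way to spot patterns that are too broad.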
Unfortunately, there is no easy way to block all sites dealing with
a particular subject matter, such as pornography or gambling, since
they may be listed under domain names that give no indication as to
their nature. However, there are people on the Internet
who maintain lists of known
"undesirable" sites and you should be able to locate a suitable
list with a small amount of research.
IMPORTANT: The order in which the instructions
are specified in squid.conf matters. For instance, setting a
rule to deny a particular access after another rule that
allows this access to "all" will have no effect, so be careful
to place your instructions logically.
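The effect of rule order is easier to see in a small model. The Python sketch below evaluates a list of rules from top to bottom and stops at the first one that matches, which is the behavior described above. It is a simplified illustration, not squid's actual code: the rule representation, the function names and the fallback used when no rule matches are all assumptions made for this example.

```python
from typing import Callable, List, Tuple

Request = dict
Rule = Tuple[str, Callable[[Request], bool]]   # ("allow" or "deny", test function)


def decide(rules: List[Rule], request: Request) -> str:
    """Return the action of the first rule whose test matches the request."""
    for action, matches in rules:
        if matches(request):
            return action
    return "deny"   # simplification: this sketch denies when nothing matches


def is_anyone(request: Request) -> bool:
    return True                                 # plays the role of "all"


def is_banned(request: Request) -> bool:
    return "youtube.com" in request["url"]      # plays the role of "banned"


request = {"url": "http://www.youtube.com/"}

wrong_order = [("allow", is_anyone), ("deny", is_banned)]
right_order = [("deny", is_banned), ("allow", is_anyone)]

print(decide(wrong_order, request))   # the deny rule is never reached
print(decide(right_order, request))   # the deny rule is checked first
```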
If things don't work as expected on your first try, just keep
fiddling with your settings; it can all be made to work!
Another useful feature of squid is the ability to block access to
certain ports. For instance, you may want to allow email transfers
on your network but not Web surfing.
Again, the method to implement this restriction consists of
creating an acl (access control list) identifying the port
or ports we want to affect, and then creating an
http_access deny rule to preclude this access.
For instance, to disallow Web traffic (port 80) on our LAN,
we would create the following instructions:
acl some_name port 80
http_access deny some_name
For instance, if we choose to label our acl "blocked_port," our
entries in squid.conf would look like this:
acl blocked_port port 80
http_access deny blocked_port
What if we wanted to give Web access to one particular computer on
our network while disallowing all others?
We would create a second acl corresponding to the IP address of the
privileged computer, and then specify that the "deny" rule does
not apply to it by preceding that acl label with an
exclamation mark, which indicates the negative.
For instance, in the following segment, we are creating an acl
named "allowed_pc" corresponding to a given source address (src),
and then we negate it in the http_access deny clause,
thus excluding it from the rule:
acl blocked_port port 80
acl allowed_pc src 192.168.1.123
http_access deny blocked_port !allowed_pc
These instructions tell squid: "Deny access when the request matches
the access list 'blocked_port' and does not match the access list
'allowed_pc'."
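Put another way, every acl named on an http_access line must match for the line to apply, and the exclamation mark inverts the acl that follows it. The Python sketch below spells out that logic for this one rule. The function name is_denied and the test addresses are made up for illustration, and the sketch only models this single line, not the rest of the rule list.

```python
import ipaddress

ALLOWED_PC = ipaddress.ip_address("192.168.1.123")   # acl allowed_pc src 192.168.1.123
BLOCKED_PORT = 80                                     # acl blocked_port port 80


def is_denied(dest_port: int, client_ip: str) -> bool:
    """Model of the single rule: http_access deny blocked_port !allowed_pc"""
    matches_blocked_port = dest_port == BLOCKED_PORT
    matches_allowed_pc = ipaddress.ip_address(client_ip) == ALLOWED_PC
    # Every acl on the line must match; "!" inverts the second one.
    return matches_blocked_port and not matches_allowed_pc


print(is_denied(80, "192.168.1.50"))    # ordinary PC asking for port 80
print(is_denied(80, "192.168.1.123"))   # the privileged PC asking for port 80
print(is_denied(21, "192.168.1.50"))    # another port: this rule does not apply
```

When the function returns False, this particular rule simply does not apply, and squid goes on to whatever rules follow it.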
When Squid denies access to a particular website or port, it
displays a fairly terse and technical error message instead of the
page the user was probably expecting. At first glance, most users
will probably think there is something wrong on the network or with
their browser, so you may wish to display a more helpful page to
indicate that access was actually denied based on corporate policy.
Fortunately, this is fairly easy to do, especially if you are
reasonably familiar with creating simple HTML pages.
The standard error messages displayed by squid are stored in small
HTML pages located in a directory named "errors" and a subdirectory
named after the language used, as in errors/English,
somewhere in the squid installation directories. This exact
location may vary from one platform and software release to
another, so you may have to search for it. Common locations are
/etc/squid/errors and /usr/share/squid/errors. Fortunately, the
names of all these error files begin with "ERR_" (as in
ERR_ACCESS_DENIED), so you can use a
search tool to find these files on your system.
Once you have located the right directory on your system,
you can simply create your own custom-made error page, possibly
called "ERR_CUSTOM" for instance, which might
feature a simpler message such as "Sorry, access to this site is
not allowed on the company network. If you have any questions
about this policy, talk to the hand."
Then, to tell squid to use your new-and-improved error page instead
of the stock version, edit
squid.conf to introduce the instruction "deny_info" using
deny_info your_error_page the_affected_acl
This instruction tells squid to display the page you specify as
your_error_page when access was denied as the result of
the ACL specified as the second parameter.
For instance, if we want to display a custom HTML page
called "ERR_BAD_SITE" when users are denied access
through the "banned" ACL that we created in our earlier example,
we would add the deny_info line at the end of the following block, after that ACL definition:
acl banned url_regex "/etc/squid/blocked_sites"
http_access deny banned
deny_info ERR_BAD_SITE banned
While it's nice to be able to control access to network resources
with such great precision, it's fairly pointless to have this control if
you don't know exactly what should be restricted.
For instance, which sites or domains are being accessed by your
users? Who accesses them most frequently? At what time of the day
are these being accessed? And so on.
Fortunately, squid maintains a thorough access log, typically in
/var/log/squid, aptly named access.log.
The good news is, this log has all the information you might need
to keep an eye on what's happening on your network.
The bad news is, this log gets really big really fast and it's
pretty cryptic to look at.
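To give a sense of what each log line holds, here is a Python sketch that picks a few fields out of the log and counts requests per client. The three sample lines are made up for this example; they follow the field layout of squid's default ("native") log format, but your own log may be configured differently, so treat the field positions as an assumption to check against your file.

```python
from collections import Counter

# Made-up sample lines in the layout of squid's default ("native") log format:
# time elapsed client result/status bytes method URL user hierarchy/peer type
SAMPLE_LOG = """\
1301234567.123     45 192.168.0.12 TCP_MISS/200 5120 GET http://www.example.com/ - DIRECT/192.0.2.10 text/html
1301234570.456      2 192.168.0.15 TCP_HIT/200 5120 GET http://www.example.com/ - NONE/- text/html
1301234580.789     30 192.168.0.12 TCP_DENIED/403 1400 GET http://www.youtube.com/ - NONE/- text/html
"""


def parse_line(line: str) -> dict:
    """Split one log line on whitespace and keep the fields we care about."""
    fields = line.split()
    result, status = fields[3].split("/")
    return {
        "client": fields[2],
        "result": result,
        "status": int(status),
        "bytes": int(fields[4]),
        "url": fields[6],
    }


entries = [parse_line(line) for line in SAMPLE_LOG.splitlines() if line.strip()]
requests_per_client = Counter(entry["client"] for entry in entries)
denied = [entry["url"] for entry in entries if entry["result"] == "TCP_DENIED"]

print(requests_per_client.most_common())   # busiest clients first
print(denied)                              # URLs that the proxy refused
```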
Lucky for us, there is a utility named sarg, the Squid
Analysis Report Generator, that can be used to parse this log,
gather useful statistics from it, and generate a friendly and
convenient Web-browsable report that you can examine with any Web browser.
In its simplest form, you invoke sarg with the name of the
log file you wish to examine, like this:
sarg -l /var/log/squid/access.log
In mere seconds, sarg will have created an HTML report that you can
then examine with your Web browser. By default, the report is
created in /var/lib/sarg under a subdirectory named after
the date range covered by the log file you specified as an
argument. For instance, if the log file covered the period March
17 to April 5, 2011, the HTML report will be found in a directory
named /var/lib/sarg/2011Mar17-2011Apr05. To read it, point your
browser to the file index.html in that directory.
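If you want to work out the report location from a script, the directory name can be rebuilt from the two dates. The Python sketch below follows the naming pattern of the example above. The function name sarg_report_dir is made up, and the pattern itself is an assumption taken from that one example; sarg's configuration can change the date format and the output directory.

```python
from datetime import date


def sarg_report_dir(first: date, last: date, base: str = "/var/lib/sarg") -> str:
    """Build a report directory name in the form base/YYYYMonDD-YYYYMonDD."""
    stamp = "%Y%b%d"   # e.g. 2011Mar17 (month abbreviation in the default C locale)
    return "%s/%s-%s" % (base, first.strftime(stamp), last.strftime(stamp))


print(sarg_report_dir(date(2011, 3, 17), date(2011, 4, 5)))
```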
On that page, you will get a summary of usage as well as clickable
links to list the top sites, sites and users, downloads and denied accesses.
The sarg utility features a number of command-line
options to modify its default
behavior, and can also be configured through its configuration
file, /etc/sarg/sarg.conf. This file specifies the default
location of the squid log, the output directory for the HTML report,
and numerous display options for the final report, including custom
titles and font styles.
We are not going to examine all these options here, but you are
encouraged to check out the documentation and examine the contents
of sarg.conf to help you customize your reports as desired.
Finally, if entering commands from the shell prompt isn't your thing and you
just want to be able to check out the usage log every morning to
see who's doing what, you can simply create a crontab* entry to run
the report once a day automatically.
This way, all you need to do is point your
browser to the latest HTML report when you get to work in the morning.
*Note: crontab is a standard service on Unix-type operating
systems to schedule periodic tasks, such as this daily report. A
description of how to use this service is unfortunately outside the
scope of this tutorial.
We have only scratched the surface of squid's impressive set of
features in this document.
The rich array of configuration options offered by squid should allow
you to implement just about any set of controls and
restrictions you wish to have on your network.
Fortunately, there is a lot of information available on squid,
starting with the generous comments in squid.conf and the
documentation that comes with the package (look in
/usr/share/doc/squid). Of course, the Internet is also full of helpful
blogs and documentation.