
Configuring a Squid Proxy Server

What's a Proxy Server?

A proxy server is simply a middle-man between client PCs (users) and the Internet (typically, websites). The purpose of the proxy server is to maintain a cache of commonly accessed Web sites in order to increase access speed and to reduce bandwidth usage.

A standard LAN

For example, let's say your office has four PCs, all of them sharing the same Internet connection. Every time a user accesses Google, that page needs to be downloaded to the PC (assuming it's not already in the browser cache). If all 4 users try to access Google at the same time, your Internet connection is busy downloading the same page 4 times, once for each user. Clearly, this is inefficient: it slows down the connection for everybody and will cost the company more money if the Internet provider charges by volume.

Network topology for proxy server

This is where a proxy comes in handy. It acts as a middle-man between these user PCs and the Internet connection. When a user wants to access a Web page, his/her browser actually contacts the proxy server rather than the website directly. The proxy server, in turn, contacts the website, downloads that page and feeds it back to the user.

However, the proxy server also keeps that page in a disk cache for a certain amount of time. When another user tries to access that same site, the proxy server checks its disk cache, realizes it already has that page in storage, and feeds it back to the user immediately without having to transfer it from the Internet.

If your company has 100 PCs instead of 4, you can see how a proxy server can make a big difference to your network performance and cost.

This is the gist of it. Configurations can get a lot more complex, such as one proxy server getting pages from other proxy servers, or your proxy keeping track of who accessed what site and when. The proxy server can also be configured to block certain sites or impose various restrictions such as which ports will be allowed. For instance, you can allow Web browsing through the proxy but disallow FTP transfers, or you can allow selected PCs on your network access to certain sites and disallow all others.

All this control, of course, can be good or bad, depending on your point of view. However, we're not going to discuss these ethical issues here; my goal here is simply to get you up and running with a no-frills proxy server using the free software package squid.

Step 1: Decide on Your Network Topology

Assuming you've decided to go ahead with this project, you have two options:

  1. configure one of your existing user PCs to serve as the proxy; or
  2. install a new computer to be used as a dedicated proxy server and placed between your LAN and the Internet connection.

The first option is certainly cheaper, but since your users will still have direct access to the Internet connection, they can get around the proxy if they wish.

The second option obviously adds to the cost of the project because of the extra hardware involved, but it forces users to use the proxy to gain access to the Internet since they don't have another way around.

Generally, if you are dealing with a few computers at home or a small, collaborative office, the first option might be suitable. On the other hand, if you are dealing with a large company where you need to control bandwidth costs and might want to gather usage statistics, then the second option would be preferable.

In this document, we will examine both scenarios. There really is not much difference between them anyway in terms of implementation.

Step 2: Installing Squid

Once you have decided on your topology, you need to install squid on the computer that you have selected as the proxy server. Simply follow the procedure appropriate to your operating system. For this article, I will assume the server is a Linux system.

For instance, on an Ubuntu server, you would install squid with this simple command:

sudo apt-get install squid

Step 3: Configuring Squid

Once squid has been installed, you need to edit its central configuration file before it can be of any use to you. Specifically, squid will not allow any HTTP traffic (Web browsing) from any of your users by default.

The central configuration file is named squid.conf and is usually located in /etc/squid. This file is quite large and includes abundant comments about every option. Fortunately, there are only 3 lines that need to be modified or added to get minimal functionality out of squid.

First, you need to identify your network using an Access Control List, or "acl." The line will have the following format:

acl name_of_your_choice src your_network_address

The name_of_your_choice in this case is any word that you may want to use to refer to your network, such as "mylan" or "family_net". The sample configuration file that comes with squid uses "localnet" for this purpose but you can create your own. For our examples, we will use "mylan". The keyword "src" stands for source address.

So, if your local network is 192.168.0.xxx and uses a subnet mask of 255.255.255.0 for instance, you would add this line to the configuration file:

acl mylan src 192.168.0.0/24
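If you only know your subnet mask in dotted form, the "/24" part can be worked out mechanically. Here is a minimal sketch of that arithmetic; the function name mask_to_prefix is my own invention for illustration, and the sketch assumes a well-formed mask (it does not check that the 1-bits are contiguous):

```shell
# Convert a dotted subnet mask into the prefix length used after the slash.
# Hypothetical helper for illustration; assumes a valid, contiguous mask.
mask_to_prefix() {
    prefix=0
    old_ifs=$IFS
    IFS=.
    set -- $1            # split the mask into its four octets
    IFS=$old_ifs
    for octet in "$@"; do
        case $octet in
            255) prefix=$((prefix + 8)) ;;
            254) prefix=$((prefix + 7)) ;;
            252) prefix=$((prefix + 6)) ;;
            248) prefix=$((prefix + 5)) ;;
            240) prefix=$((prefix + 4)) ;;
            224) prefix=$((prefix + 3)) ;;
            192) prefix=$((prefix + 2)) ;;
            128) prefix=$((prefix + 1)) ;;
            0)   ;;
            *)   echo "invalid netmask octet: $octet" >&2; return 1 ;;
        esac
    done
    echo "$prefix"
}

# Build the acl line for the example network used in this article.
echo "acl mylan src 192.168.0.0/$(mask_to_prefix 255.255.255.0)"
```

Each 255 octet contributes 8 bits, so 255.255.255.0 gives 24 and 255.255.0.0 would give 16.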

Note that the "acl" instruction can take many other forms as you will see in the profuse comments in squid.conf. For instance, using "dst" instead of "src," you can specify destination addresses instead of source addresses. For our purposes here, we will limit the scope of our discussion to the bare necessities to get your users on the Web through the proxy server.

To this end, we now need to tell squid that HTTP traffic is to be allowed for members of acl "mylan." This is done with the http_access directive in the form:

http_access allow your_acl_name

In our case, we would create this instruction:

http_access allow mylan

Since access rules are position-dependent, be sure to insert this line at the correct place in the file. Search for the phrase "INSERT YOUR OWN RULE(S) HERE" in the configuration file to find that place.

Next, you need to create an "icp_access" instruction with the same format as the http_access line we just created, as in:

icp_access allow mylan

ICP stands for "Internet Cache Protocol"; squid caches use it to ask one another whether they already hold a requested object, which helps squid find the most appropriate location for that object (such as its own cache, a cache on a different proxy server, or the website itself).

The above line tells squid to allow members of mylan to use this protocol to fetch pages. Search the configuration file for lines that begin with "icp_access" and insert your own line in that area. Make sure your new line comes before the line that denies access to all others (icp_access deny all).

Finally, some installations will require you to specify the name under which your server will be known. Typically, this is the machine name you gave to this computer. Simply edit (or create) the line beginning with "visible_hostname" and specify your machine name, as in:

visible_hostname hal9000

That's it! With the above modifications (three required lines, plus visible_hostname where your installation needs it), squid is now configured for basic functionality, meaning it will allow clients (users) to access the Web through the proxy server.
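To see the changes side by side, here is a sketch that writes them to a scratch file rather than to the live configuration. The scratch file and the variable name conf_sketch are assumptions made for the illustration; in real life each line goes at the place in squid.conf described above.

```shell
# Scratch copy so this sketch never touches the live /etc/squid/squid.conf.
conf_sketch=$(mktemp)

cat > "$conf_sketch" <<'EOF'
# Identify the local network (example address block from this article).
acl mylan src 192.168.0.0/24

# Allow Web browsing for that network; in the real file this belongs
# at the "INSERT YOUR OWN RULE(S) HERE" marker.
http_access allow mylan

# Allow ICP queries from that network; must come before "icp_access deny all".
icp_access allow mylan

# Only needed on some installations: the name squid reports for itself.
visible_hostname hal9000
EOF

# List the directives that are actually in effect (comments and blanks removed).
grep -Ev '^[[:space:]]*(#|$)' "$conf_sketch"
```

Once the real squid.conf has been edited, running "squid -k parse" as root should report any syntax errors before you start the daemon.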

Start the squid daemon using the command appropriate for your Linux distribution. For instance, on Ubuntu, you would use service squid start; on OpenSUSE, you would use rcsquid start. And since squid will normally be configured to start up automatically at boot-time, you can probably just shut down and reboot your computer if you can't find a more elegant way to start the daemon.

Step 4: Configuring the Client PCs

The server may well be ready for action, but the users' PCs must also be told to use the new proxy server.

On a Windows PC, you would do this through the Control Panel by navigating to the Networking panel down to Internet Options and LAN Settings. Unfortunately, since there are so many different versions of Windows, each with slightly different menus under the Control Panel, it's impractical to provide detailed directions here.

Fortunately, you can access the same panel through the Internet Explorer Web browser by clicking on Tools→Internet Options→Connections→LAN Settings.

In that panel, check the box for "Use a proxy server" and enter the IP address of your proxy server as well as the port specified in the squid.conf file. That value will usually be 3128, but look for the line that starts with "http_port" in squid.conf to be sure.
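If you would rather not scroll through squid.conf to find the port, a small shell function can pull it out for you. This is only a sketch: the function name squid_port is hypothetical, and it assumes the first active http_port line is the one your clients should use.

```shell
# Print the port from the first active http_port line of a squid.conf,
# falling back to 3128 when the file is unreadable or has no such line.
squid_port() {
    conf_file=${1:-/etc/squid/squid.conf}
    port=""
    if [ -r "$conf_file" ]; then
        # Commented-out lines are skipped because their first field is not "http_port".
        port=$(awk '$1 == "http_port" { print $2; exit }' "$conf_file")
    fi
    # The value may be written as "address:port"; keep what follows the last colon.
    port=${port##*:}
    echo "${port:-3128}"
}

squid_port
```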

With this small change done, your Web browser will now access the Web through the proxy server. Make this same modification on all the PCs that need to use the proxy and you're done!

This change affects all other applications on the PC, such as other Web browsers, FTP clients and telnet communication programs unless they have been specifically configured not to use a proxy. For instance, the Firefox browser features a checkbox in the Advanced→Network configuration screen for "No proxy," so check this out if you're having difficulty using this program through a proxy server.
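Before touching every PC, you may want to confirm from a command line that the proxy answers at all. The sketch below assumes the curl utility is installed on the machine you test from; the address 192.168.0.200 and the function names are placeholders of my own, so substitute your proxy's real address and port.

```shell
proxy_host=192.168.0.200   # hypothetical address of the proxy server
proxy_port=3128            # default squid port; check http_port in squid.conf

# Build the proxy URL from a host and a port.
proxy_url() {
    echo "http://$1:$2"
}

# Ask for the response headers of a page through the proxy.
check_proxy() {
    curl --silent --head --max-time 10 \
        --proxy "$(proxy_url "$proxy_host" "$proxy_port")" \
        "${1:-http://example.com/}"
}

# Example call, commented out so nothing is sent until you decide to:
# check_proxy http://example.com/
```

If the proxy is working, the headers that come back will normally include a line mentioning squid, such as a "Via" header.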

NOTE: It is possible, under some conditions, to implement a proxy server without having to manually configure each client PC as we have just described here. See "Transparent or Intercepting Proxy" in the next section for details.

 

 

Controlling Access and Traffic with Squid

Forcing Users Through the Proxy

In the introduction, I mentioned that it may be desirable to force users to use the proxy server in order to realize the performance and cost benefits made possible by this technology, as well as to gather usage statistics or exercise certain controls over Internet access.

Controlling traffic flow with squid

There are various ways to achieve this, but probably the simplest method is to segregate the client PCs from the Internet by physically placing the proxy server between your LAN and the Internet connection, so that users don't have direct access to the Internet modem or router. Specifically, the computer used as the proxy server would have two network interface cards (NICs), one belonging to your LAN and the other one connecting directly to the Internet. In fact, this is the typical configuration for a firewall machine.

Since users don't have a physical connection to the Internet modem or router in this scenario, they have no choice but to direct their network queries to the proxy server if they are to access the Internet.

If your organization is already using a Linux machine as its firewall, then you can simply install squid on that firewall machine since your network topology is already configured as shown in the graph on the right.

Transparent or Intercepting Proxy

One common complaint about setting up a proxy server is that every user PC must be reconfigured to use that proxy. As we described in the previous section, this is a fairly simple procedure but it can be an unwelcome chore if your site is using hundreds of PCs, especially if some of them are mobile laptops that need to operate on other networks as well.

If you are the IT manager for such a site, you probably wish you could somehow "trick" all PCs into using the proxy server without their knowledge.

There is a fairly simple way to accomplish this: reconfigure your router or firewall so that all HTTP connection requests (port 80) are routed to the proxy server on the appropriate port (3128 by default, unless you changed it). This way, whenever one of your user PCs tries to access a Web page, its request is re-routed to the proxy without anything having to change on the PC. If this PC happens to be a wireless laptop, it will continue to operate normally when it leaves your premises and hooks up to another network, since it was never configured to use a proxy server and therefore doesn't need to be reconfigured.

Since this strategy involves "intercepting" HTTP requests on the fly, this type of proxy configuration is sometimes called an "intercepting proxy" or a "forced proxy." Also, since the method is transparent to users (i.e. they don't know it's happening), the configuration is often called a "transparent proxy." This is the term we will be using from now on in this document.

Before we go any further, let me clarify that a transparent proxy is a limited solution with a number of technical drawbacks. For one thing, only the ports you specifically redirect will be affected. For instance, if you redirect port 80 (HTTP) to your proxy server, all regular Web traffic will go through the proxy, but all other types of traffic, such as email, ftp and telnet, will continue to go directly to your Internet connection, bypassing the proxy server.

In practice, that's probably not an issue since those other protocols would not benefit much from a caching proxy and are not usually the protocols you want to control and monitor anyway.

Another issue is that HTTPS (secure HTTP, port 443) may not work correctly if you intercept it since, as a secure protocol, it is designed to defeat man-in-the-middle attacks, and an intercepting proxy is precisely that.

There have also been other technical problems detected with transparent proxies, especially when dealing with older browsers. Without going into these details, let me simply say that while a transparent proxy is very convenient, you should only consider it if the alternative — configuring your client PCs manually — is truly undesirable.

If you are still interested in setting up a transparent proxy, here is how it's done.

  1. Configure your router or firewall to redirect port 80 to the IP address and port of your proxy server.
  2. Edit the squid configuration file to inform it that it should run as a transparent proxy.

Let's examine each step in greater detail.

Step 1: Redirecting Traffic to the Proxy Server

For this example, we are going to assume you only wish to redirect Web traffic (port 80) to the proxy server and not bother with the other protocols. The method you will use to do this depends on your network topology.

Squid server on the LAN

If your proxy server is a single-NIC machine on the same LAN as your client workstations and under control of the same router (as shown on the left), then you will need to reconfigure this router to perform the port redirection using the method appropriate to that device. In most cases, you can customize the router settings through its Web-based configuration interface by simply pointing your browser to it.

In this case, you may want to give your proxy server a static IP address on your system to make it simpler to tell the router where to redirect the Web requests from your LAN. Another method would be to use the unique hardware address of its network card, but that would be less flexible since your router configuration would have to be updated if the hardware were ever upgraded or replaced.

Squid as the firewall

Let's assume our LAN uses the 192.168.1.xxx address block and that we have configured our proxy server with a static address of 192.168.1.200. We would then access the configuration menu of the router and tell it to redirect all traffic for port 80 coming from our LAN to 192.168.1.200, port 3128. Some routers may not have the ability to redirect one port to a different port on another machine, so you may have to upgrade your router if that's your situation.

On the other hand, if your topology involves using the Squid server as your firewall (as shown on the right), then you will need to modify your firewall setting in Linux to redirect traffic from your LAN for port 80 to the local machine, port 3128.

On modern Linux distributions (kernel 2.6 or later), this can be done with these two instructions (entered as root):

echo 1 > /proc/sys/net/ipv4/ip_forward

iptables -t nat -A PREROUTING -p tcp --dport 80 -j REDIRECT --to-port 3128

The first line enables packet forwarding on the system, which a machine acting as the router or firewall for your LAN needs in any case. The second instruction causes all incoming TCP traffic for port 80 to be redirected to port 3128 on the same system. Note that if you have changed the default port in your squid.conf file, you should specify your custom port number here instead.

Note that if your system is already configured as a router or firewall, you probably have a shell script somewhere with lines similar to those above and you should incorporate these instructions in that script so that they are executed automatically whenever the system is booted.
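Here is one way such a script fragment might look. Treat it as a sketch under a few assumptions of mine: the LAN-facing interface is called eth1, the helper names run and setup_redirect are made up, and the fragment defaults to a dry run that only prints the commands so you can review them first. It uses "sysctl -w", which sets the same kernel switch as the echo command above, and restricts the rule to the LAN interface with "-i".

```shell
lan_if=${LAN_IF:-eth1}          # assumption: the NIC facing your LAN
squid_port=${SQUID_PORT:-3128}  # must match http_port in squid.conf

# Print the command in dry-run mode (the default); execute it when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

setup_redirect() {
    # Same effect as: echo 1 > /proc/sys/net/ipv4/ip_forward
    run sysctl -w net.ipv4.ip_forward=1
    # --to-ports is the full spelling of the REDIRECT option.
    run iptables -t nat -A PREROUTING -i "$lan_if" -p tcp --dport 80 \
        -j REDIRECT --to-ports "$squid_port"
}

setup_redirect
```

Once the printed commands look right for your network, run the fragment as root with DRY_RUN=0 to apply them for real.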

By the way, if you are not familiar with configuring firewalls on Linux, these instructions are probably totally cryptic to you. Unfortunately, that topic is outside the scope of this document, so you may want to read up on the subject in separate documentation.

Step 2: Telling the Proxy to go Transparent

This second step is actually trivial to implement on current versions of squid (starting at version 2.6): all you have to do is locate the http_port line in the squid configuration file (squid.conf) and add the keyword "transparent" to it (squid 3.1 and later spell this keyword "intercept"), like this:

http_port 3128 transparent

Then, stop and restart the squid daemon and you're done. On most current Linux distributions, you would do this using the command:

service squid restart

On older versions of squid, there are a number of configuration lines to add to the squid.conf file to make it operate as a transparent proxy, but we will not delve into this here since it is generally simpler to just upgrade your squid software to a current version.

 

 

Blocking Access to Web Sites

One of the most popular reasons for implementing a squid proxy server is to block access to certain websites on your LAN. This may be desirable in order to cut down on unproductive use of corporate resources, such as employees watching YouTube videos or chatting on Facebook instead of working. It may also be a safeguard against liability issues in cases of illegal activities by employees, such as watching child pornography on the corporate network.

Whatever your reasons, if you need to block access to certain sites or domains on your LAN, the procedure is quite simple. There are different ways to blacklist specific sites or domains by entering instructions in squid.conf, but I find that the simplest and most flexible approach is to create a list of blocked domain names in a separate file and point to it from squid.conf using these two instructions:

acl your_label url_regex your_filename
http_access deny your_label

The placeholders your_label and your_filename can be any names you wish to use. For instance, we may want to use the label "banned" to refer to the access control list (acl) that we are creating, and we may want to use the name "blocked_sites" for the file that will contain the banned domain names. This would result in these two instructions:

acl banned url_regex "/etc/squid/blocked_sites"
http_access deny banned

The keyword "url_regex" stands for "URL regular expressions," which means the file we specified may include wild-card expressions to describe the names we wish to blacklist.

So, let's create the file we specified (/etc/squid/blocked_sites in our example) using a standard text editor. Let's assume we want to preclude our users from accessing YouTube and Facebook. To do this, we would add the following lines to that file:

.youtube.com
.facebook.com

This would match any URL containing these strings, such as "www.youtube.com" or "facebook.com/login.php."

IMPORTANT: With "url_regex," each line is a regular expression that is matched against the whole URL, so a dot stands for any single character and "youtube.com" would match "www.youtube.com" just as ".youtube.com" does. The leading-dot convention belongs to a different acl type, "dstdomain": there, ".youtube.com" covers the domain and all of its sub-domains, while "youtube.com" covers only that exact host name.

The following lines would block any URL with the strings "porn", "sex" or "gambl" in it:

porn
sex
gambl

This would successfully block sites like "thegambler.com" and "gambling.com." However, these broad restrictions might also have unintended effects, such as blocking access to legitimate sites such as "sussex.com" which happen to contain one of our forbidden text strings.
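You can get a feel for what a pattern list will catch before handing it to squid. The sketch below uses "grep -E" as a stand-in for squid's matching, which is an approximation on my part rather than a guarantee of identical behavior; the scratch file and the function name verdict are invented for the example.

```shell
# Scratch copy of the pattern list discussed in this section.
patterns=$(mktemp)
cat > "$patterns" <<'EOF'
.youtube.com
.facebook.com
porn
sex
gambl
EOF

# Report whether a URL matches any pattern in the list.
verdict() {
    if printf '%s\n' "$1" | grep -Eq -f "$patterns"; then
        echo "blocked  $1"
    else
        echo "allowed  $1"
    fi
}

verdict http://www.youtube.com/watch
verdict http://facebook.com/login.php
verdict http://www.sussex.com/
verdict http://www.example.org/
```

The first three URLs come back as blocked, including the innocent "sussex.com," and only the last one is allowed. Make sure the pattern file contains no empty lines, since an empty pattern matches everything.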

Unfortunately, there is no easy way to block all sites dealing with a particular subject matter, such as pornography or gambling, since they may be listed under domain names that give no indication as to their nature. However, there are people on the Internet who maintain lists of known "undesirable" sites and you should be able to locate a suitable list with a small amount of research.

IMPORTANT: The order in which the instructions are specified in squid.conf matters. For instance, setting a rule to deny a particular access after another rule that allows this access to "all" will have no effect, so be careful to place your instructions logically. If things don't work as expected on your first try, just keep fiddling with your settings; it can all be made to work!
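To make the ordering issue concrete, here is a toy model of "first matching rule wins." It is a deliberate simplification of my own, not squid's actual code: each rule names a single acl, and the function name decide and the scratch files are hypothetical.

```shell
# decide RULES_FILE ACL... : print the action of the first http_access rule
# whose acl is among the ACLs that the request matches.
decide() {
    rules_file=$1
    shift
    while read -r directive action acl; do
        [ "$directive" = "http_access" ] || continue
        for matched in "$@"; do
            if [ "$acl" = "$matched" ]; then
                echo "$action"
                return 0
            fi
        done
    done < "$rules_file"
    echo "no rule matched"
}

wrong_order=$(mktemp)
printf '%s\n' 'http_access allow mylan' 'http_access deny banned' > "$wrong_order"

right_order=$(mktemp)
printf '%s\n' 'http_access deny banned' 'http_access allow mylan' > "$right_order"

# A request from the LAN for a blacklisted site matches both ACLs.
decide "$wrong_order" mylan banned    # prints: allow
decide "$right_order" mylan banned    # prints: deny
```

In the first file, the "allow" rule is reached first and the "deny" rule never gets a chance, which is exactly the trap described above.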

Blocking Access to Ports

Another useful feature of squid is the ability to block access to certain ports. For instance, you may want to allow email transfers on your network but not Web surfing.

Again, the method to implement this restriction consists of creating an acl (access control list) identifying the port or ports we want to affect, and then creating an http_access deny rule to preclude this access.

For instance, to disallow Web traffic (port 80) on our LAN, we would create the following instructions:

acl some_name port 80
http_access deny some_name

For instance, if we choose to label our acl "blocked_port," our entries in squid.conf would look like this:

acl blocked_port port 80
http_access deny blocked_port

What if we wanted to give Web access to one particular computer on our network while disallowing all others? We would create a second acl corresponding to the IP address of the privileged computer, and then specify that the "deny" rule does not apply to it by preceding that acl label with an exclamation mark, which indicates the negative.

For instance, in the following segment, we are creating an acl named "allowed_pc" corresponding to a given source address (src), and then we negate it in the http_access deny clause, thus excluding it from the rule:

acl blocked_port port 80
acl allowed_pc src 192.168.1.123
http_access deny blocked_port !allowed_pc

These instructions tell squid: "Deny access to the access list specified by 'blocked_port' but not to the access list specified by 'allowed_pc'."

Redirecting Users When Access is Denied

When Squid denies access to a particular website or port, it displays a fairly terse and technical error message instead of the page the user was probably expecting. At first glance, most users will probably think there is something wrong on the network or with their browser, so you may wish to display a more helpful page to indicate that access was actually denied based on corporate policy.

Fortunately, this is fairly easy to do, especially if you are reasonably familiar with creating simple HTML pages.

The standard error messages displayed by squid are stored in small HTML pages located in a directory named "errors" and a subdirectory named after the language used, as in errors/English, somewhere in the squid installation directories. This exact location may vary from one platform and software release to another, so you may have to search for it. Common locations are /etc/squid/errors and /usr/share/squid/errors. Fortunately, the names of all these error files begin with "ERR_" (as in ERR_ACCESS_DENIED), so you can use a search tool to find these files on your system.
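For example, the standard find utility can do the searching. This is a small sketch; the function name find_error_dirs is hypothetical, and the two starting points are simply the common locations mentioned above.

```shell
# Print every directory under the given starting points that contains
# squid's stock ERR_ACCESS_DENIED page.
find_error_dirs() {
    for root in "$@"; do
        find "$root" -type f -name 'ERR_ACCESS_DENIED' 2>/dev/null
    done | sed 's|/ERR_ACCESS_DENIED$||'
}

find_error_dirs /etc/squid /usr/share/squid
```

If nothing is printed, widen the search by passing other starting directories, such as /usr or /var.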

Once you have located the right directory on your system, you can simply create your own custom-made error page, possibly called "ERR_CUSTOM" for instance, which might feature a simpler message such as "Sorry, access to this site is not allowed on the company network. If you have any questions about this policy, talk to the hand."

Then, to tell squid to use your new-and-improved error page instead of the stock version, edit squid.conf to introduce the instruction "deny_info" using this syntax:

deny_info   your_error_page   the_affected_acl

This instruction tells squid to display the page you specify as your_error_page when access was denied as the result of the ACL specified as the second parameter.

For instance, if we want to display a custom HTML page called "ERR_BAD_SITE" when users are denied access through the "banned" ACL that we created in our earlier example, we would add a deny_info line after that ACL definition and its deny rule, as in the last line below:

acl banned url_regex "/etc/squid/blocked_sites"
http_access deny banned
deny_info ERR_BAD_SITE banned
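The page itself can be as plain as you like. The sketch below writes a bare-bones ERR_BAD_SITE file; by default it goes into a scratch directory, which is a precaution of mine, so set ERR_DIR to the errors directory you located on your system when you are ready to install it. The wording of the page is only a suggestion.

```shell
# Scratch directory by default; point ERR_DIR at your real errors directory
# (for instance the one holding ERR_ACCESS_DENIED) to install the page.
err_dir=${ERR_DIR:-$(mktemp -d)}

cat > "$err_dir/ERR_BAD_SITE" <<'EOF'
<html>
<head><title>Access denied</title></head>
<body>
<h1>Access denied</h1>
<p>Sorry, access to this site is not allowed on the company network.</p>
<p>If you have any questions about this policy, please contact the IT department.</p>
</body>
</html>
EOF

ls "$err_dir"
```

After copying the page into place and adding the deny_info line, restart squid so that it picks up the change.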

 

 

Monitoring Access with sarg

While it's nice to be able to control access to network resources with such great precision, it's fairly pointless to have this control if you don't know exactly what should be restricted.

For instance, which sites or domains are being accessed by your users? Who accesses them most frequently? At what time of the day are these being accessed? And so on.

Fortunately, squid maintains a thorough access log, typically in /var/log/squid, aptly named access.log. The good news is, this log has all the information you might need to keep an eye on what's happening on your network. The bad news is, this log gets really big really fast and it's pretty cryptic to look at.
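If you are curious about what is in there, a couple of awk one-liners already answer the simplest questions. The sketch below assumes squid's default ("native") log format, in which the third field is the client address and the seventh is the URL. The three log lines are made-up sample data written to a scratch file, and the function names are my own.

```shell
# Made-up sample entries in squid's native log format (not real traffic).
sample_log=$(mktemp)
cat > "$sample_log" <<'EOF'
1300371600.101    120 192.168.0.11 TCP_MISS/200 5120 GET http://www.example.com/ - DIRECT/192.0.2.10 text/html
1300371605.202     15 192.168.0.12 TCP_HIT/200 5120 GET http://www.example.com/ - NONE/- text/html
1300371610.303     90 192.168.0.11 TCP_MISS/200 2048 GET http://www.example.org/news - DIRECT/192.0.2.20 text/html
EOF

# Requests per client, busiest first (field 3 is the client address).
requests_per_client() {
    awk '{ count[$3]++ } END { for (ip in count) print count[ip], ip }' "$1" | sort -rn
}

# Requests per URL, most popular first (field 7 is the requested URL).
requests_per_url() {
    awk '{ count[$7]++ } END { for (url in count) print count[url], url }' "$1" | sort -rn
}

requests_per_client "$sample_log"
requests_per_url "$sample_log"
```

On a real server, you would pass /var/log/squid/access.log instead of the sample file. This quickly becomes unwieldy for anything beyond simple counts, though.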

Lucky for us, there is a utility named sarg, the Squid Analysis Report Generator, that can be used to parse this log, gather useful statistics from it, and generate a friendly and convenient Web-browsable report that you can examine with any Web browser.

In its simplest form, you invoke sarg with the name of the log file you wish to examine, like this:

sarg /var/log/squid/access.log

In mere seconds, sarg will have created an HTML report that you can then examine with your Web browser. By default, the report is created in /var/lib/sarg under a subdirectory named after the date range covered by the log file you specified as an argument. For instance, if the log file covered the period March 17 to April 5, 2011, the HTML report will be found in a directory named /var/lib/sarg/2011Mar17-2011Apr05. To read it, point your browser to the file index.html in that directory.

On that page, you will get a summary of usage as well as clickable links to list the top sites, sites and users, downloads and denied accesses.

The sarg utility features a number of command-line options to modify its default behavior, and can also be configured through its configuration file, /etc/sarg/sarg.conf. This file specifies the default location of the squid log, the output directory for the HTML report, and numerous display options for the final report, including custom titles and font styles.

We are not going to examine all these options here, but you are encouraged to check out the documentation and examine the contents of sarg.conf to help you customize your reports as desired.

Finally, if entering commands from the shell prompt isn't your thing and you just want to be able to check out the usage log every morning to see who's doing what, you can simply create a crontab entry to run the report once a day automatically. This way, all you need to do is point your browser to the latest HTML report when you get to work in the morning.
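Such an entry might look like the line built below. The 6:00 a.m. schedule is an arbitrary choice of mine, and the entry assumes that sarg can be found on cron's search path; if it cannot, spell out the full path that "command -v sarg" reports.

```shell
# Five schedule fields (minute hour day month weekday), then the command:
# here, every day at 06:00.
cron_line='0 6 * * * sarg /var/log/squid/access.log'

# Show the entry; the quotes keep the shell from expanding the asterisks.
echo "$cron_line"

# To append it to the current user's crontab (commented out on purpose):
# (crontab -l 2>/dev/null; echo "$cron_line") | crontab -
```

You can also paste the line by hand after running "crontab -e" as a user who is allowed to read the squid log.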

Conclusion

We have only scratched the surface of squid's impressive set of features in this document. The rich array of configuration options offered by squid should allow you to implement just about any set of controls and restrictions you wish to have on your network.

Fortunately, there is a lot of information available on squid, starting with the generous comments in squid.conf and the documentation that comes with the package (look in /usr/share/doc/squid). Of course, the Internet is also full of helpful blogs and documentation.

 

 

