Proxy Server

A proxy server is a server that performs actions on behalf of another machine, much like the concept of a proxy in business, where it means one who manages another's affairs or a document that empowers another to transact one's affairs.

There are two main reasons to use a proxy server on a network: security and caching.

Filtering Proxy Server for security

Proxy servers are often used to limit which internal or external machines are permitted to access certain (internal or external) resources. For example, in large enterprises a proxy server is often the only server permitted to access the outside Internet, so only those machines that have access to the proxy have any access to the outside world. Similarly, a proxy may be configured to limit access to certain portions of the Internet using either a black list (which rejects the listed sites and permits everything else) or a white list (which accepts the listed sites and rejects everything else).
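
To make the difference between the two list types concrete, here is a minimal Python sketch. The function names, the sample host names, and the rule that a listed domain also covers its subdomains are illustrative assumptions of mine, not a description of how any particular proxy implements its lists.

```python
from typing import Iterable


def host_matches(host: str, pattern: str) -> bool:
    """Return True if host equals pattern or is a subdomain of it."""
    host = host.lower().rstrip(".")
    pattern = pattern.lower().strip(".")
    return host == pattern or host.endswith("." + pattern)


def is_permitted(host: str, listed_sites: Iterable[str], mode: str) -> bool:
    """Decide whether a request for host may pass.

    mode "black": listed sites are rejected, everything else passes.
    mode "white": listed sites pass, everything else is rejected.
    """
    listed = any(host_matches(host, pattern) for pattern in listed_sites)
    if mode == "black":
        return not listed
    if mode == "white":
        return listed
    raise ValueError("mode must be 'black' or 'white'")


# Hypothetical sample data: one listed domain, two requested hosts.
print(is_permitted("ads.example.net", ["example.net"], "black"))
print(is_permitted("www.example.org", ["example.net"], "white"))
```

Both calls print False: the first host is on the black list, and the second is missing from the white list.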

For my small business as well as family use, I have found an outward-facing caching proxy server to be of great value. For years I ran a black-listing server that blocked advertising and objectionable content, but I found this required so much maintenance that I switched to a white-listing configuration instead. Although large enterprises generally would use the black-list approach, within a small or controlled group (such as an elementary school) a white-listing proxy server is a very workable solution.

You will be amazed at how much advertising disappears using a proxy server in the way I have outlined here. You can eliminate perhaps 30 percent of your bandwidth usage, thereby speeding up web pages while also making them easier to read because the ads will be gone. In a business with dozens or hundreds of users, imagine how much bandwidth and time this would save! Imagine the distractions removed from employees' screens (do you really want them looking at all those ads?). I am so accustomed to the absence of ads my proxy server affords me that I am often completely floored when I see how a web page looks without it. Pages often seem like a completely different site, and I waste much more time navigating around the ads.

Caching Proxy Server for efficiency

Outward-facing caching can be used in conjunction with black-listing or white-listing, or by itself, to reduce bandwidth and increase speed of delivery for web sites. Increased speed means less time spent waiting for web sites to load – something that benefits individuals and businesses.

Inward-facing caching can be used to allow a single web server to serve a larger number of users. If you have a complex web site that is difficult to spread across many servers and the number of users exceeds the capacity of the web server, or you want that server protected behind a tight firewall, you could put any number of proxy servers on the Internet and have your web site's address point to those. They would, in turn, fetch content from the actual web server (which could sit in an internal network zone that is not directly accessible from the Internet).
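
The idea can be shown with a toy model in Python. This is only a sketch under simplifying assumptions: the class and function names are hypothetical, and a real caching proxy also handles expiry, cache-control headers, and eviction, all of which are ignored here.

```python
from typing import Callable, Dict


class CachingProxy:
    """Toy proxy that contacts the origin server only on a cache miss."""

    def __init__(self, fetch_from_origin: Callable[[str], str]) -> None:
        self._fetch = fetch_from_origin
        self._cache: Dict[str, str] = {}
        self.origin_requests = 0  # how often the real web server was contacted

    def get(self, url: str) -> str:
        if url not in self._cache:
            # Cache miss: fetch once from the origin and remember the answer.
            self.origin_requests += 1
            self._cache[url] = self._fetch(url)
        return self._cache[url]


def origin_server(url: str) -> str:
    """Stand-in for the protected web server."""
    return "content of " + url


proxy = CachingProxy(origin_server)
for _ in range(100):
    proxy.get("/index.html")

print(proxy.origin_requests)  # prints 1
```

One hundred user requests reach the origin server only once, which is why a modest web server behind a cache can serve many more users.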

The Best of Both Worlds

Combining a caching proxy server with filtering rules yields a device that will decrease bandwidth usage, increase browsing speed, and reject unwanted content. Sounds like Virtual Utopia, and in many ways it is, although nothing is perfect. The more restrictive the filtering rules are, the more often you will need to add exceptions to the white list, but the benefit, of course, is all the blocked material you will never see.

Squid is a caching proxy server that has very flexible access control lists. By creating a set of rules that are evaluated in a specific sequence, you can control what is and is not permitted through your network. Once you put this proxy server solution in place you will be astonished at the amount of traffic that is blocked while web pages still function normally. Occasionally you will come across a web site that actually relies upon contacting an advertising server in order to function, but most often the sites will work as usual sans the advertising.

Installing Squid

The proxy server software I have been using for over ten years is Squid. I first used it purely as a cache to increase the speed of delivery of web content to subscribers of my ISP and later developed a black-list for it so that I could offer a content-filtered service.

Squid can also be used to cache internal web servers, but I have not used it this way so will not be covering that here.

Server hardware is so inexpensive these days that I would recommend using a machine with at least 4GB of RAM and at least three modest hard disk drives (three 80GB drives would be fine, or use five 500GB drives if you really want to maximise cache efficiency). I don't find squid to be very CPU intensive at all, so the least expensive CPU you can get should be plenty fast. I have been very happy with the AMD low-power dual-core and quad-core processors. A server with a quad-core AMD 600e processor, 8GB of RAM, six WD Green hard disk drives, and a power supply with active power-factor correction consumes under 100 watts of electricity yet has enough power to provide service to hundreds of local (i.e., Ethernet-connected) machines.

Gentoo

emerge -va squid

CentOS / RHEL / Oracle Linux

sudo yum install squid

Arch / Antergos

sudo pacman -Syu squid

From source

  1. Obtain the source code from http://www.squid-cache.org/Download/
  2. As your unprivileged user unpack the source code, e.g.:
    tar -xzf squid-3.4.6.tar.gz
  3. Create an unprivileged user to run squid, e.g.:
    sudo groupadd -g 333 squid
    sudo useradd -u 333 -g 333 squid
  4. Configure and build it, e.g.:
    cd squid-3.4.6
    ./configure --exec-prefix=/usr --sysconfdir=/etc/squid --with-logdir=/var/log/squid --with-pidfile=/run/squid.pid --with-swapdir=/var/cache/squid --with-default-user=squid
    make all
    sudo make install
  5. Modify the configuration file (as root, since make install was run with sudo). You may refer to the guidelines below.
    sudo vim /etc/squid/squid.conf
  6. Create the cache directories:
    sudo squid -z

Configuring Squid for Outward-facing Caching

Once you install squid (using emerge squid in Gentoo or yum install squid in RHEL/CentOS) you will find its configuration files in /etc/squid. The main file, squid.conf, is so nicely commented that you hardly need any other documentation.

If you want to get going quickly, open /etc/squid/squid.conf in your favourite editor and use these tips to guide you in setting the options therein.

acl SSL_ports

You may want to change this parameter to include ports that some services use for SSL. If you use Plesk you will want to add port 8443. If you use cPanel you will want to add ports 2082-2087.

Example:

acl SSL_ports port 443 4433 563 2082-2087 8443

http_port

It is a good idea to use a non-standard port if your server is reachable from the outside world. You may also wish to use authentication, source address limits, and so forth, so that the proxy does not end up being abused by people all over the world. Whatever port you pick must be a valid TCP port, that is, no higher than 65535.

http_port 9876

cache_mem

A dedicated proxy server with gigabytes of RAM can have this value set high, perhaps a quarter of total RAM, whereas a machine that is also performing other tasks probably should have the number set lower. You can experiment to see what works best for you.

My proxy server does a lot of other things so I have this number set to a modest 64 MB, although the current recommended default is “256 MB”. In the past the default was “8 MB”, but ten years ago most servers did not even have a gigabyte of RAM.

cache_mem 64 MB
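
If you want a starting value for a dedicated machine, the "quarter of total RAM" rule of thumb above is easy to compute. The following Python sketch is only an aid for picking a first number to experiment with; the function name and the fraction parameter are my own hypothetical choices, not anything squid provides.

```python
def cache_mem_line(total_ram_mb: int, fraction: float = 0.25) -> str:
    """Build a cache_mem line from total RAM (in MB) and the share to give squid."""
    if total_ram_mb <= 0:
        raise ValueError("total_ram_mb must be positive")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in the range (0, 1]")
    return "cache_mem {} MB".format(int(total_ram_mb * fraction))


# A dedicated proxy with 8GB of RAM, using the one-quarter rule of thumb.
print(cache_mem_line(8192))  # prints: cache_mem 2048 MB
```

For a machine that also performs other tasks, pass a smaller fraction and adjust from there.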

maximum_object_size_in_memory

The default is 512kB but I have mine set at 16MB since I have a small number of users.

maximum_object_size_in_memory 16 MB

cache_dir

Stick to ufs except on large and busy servers.

Squid has a good algorithm for spreading access across multiple drives, so if you have several drives and are using software RAID, as I do, it is best to set aside one partition on each drive for squid and leave those partitions out of mdadm. Just let squid manage all the drives.

For example, I have six 1.9GB partitions and these configuration lines:

cache_dir ufs /var/cache/squid-1 1550 8 32
cache_dir ufs /var/cache/squid-2 1550 8 32
cache_dir ufs /var/cache/squid-3 1550 8 32
cache_dir ufs /var/cache/squid-4 1550 8 32
cache_dir ufs /var/cache/squid-5 1550 8 32
cache_dir ufs /var/cache/squid-6 1550 8 32

I found that when I set the cache to the full size of the disk, starting the download of a huge file could cause squid to crash, since I have the maximum file size set high (500MB) and squid doesn't erase old content until the new content has been retrieved. That is why each cache_dir above is smaller than its partition.
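
Each cache_dir line follows the pattern "cache_dir ufs directory size-in-MB L1 L2", where L1 and L2 are the numbers of first-level and second-level subdirectories squid creates. The Python sketch below regenerates my six lines and shows how much of each partition is left free. The function names are hypothetical, and treating a 1.9GB partition as 1900 MB is a rounding assumption.

```python
from typing import List


def cache_dir_lines(count: int, size_mb: int, l1: int = 8, l2: int = 32,
                    base: str = "/var/cache/squid-") -> List[str]:
    """Return one 'cache_dir ufs' line per partition, numbered from 1."""
    return [
        "cache_dir ufs {}{} {} {} {}".format(base, n, size_mb, l1, l2)
        for n in range(1, count + 1)
    ]


def headroom_mb(partition_mb: int, size_mb: int) -> int:
    """Space left free on a partition after squid fills its cache_dir."""
    if size_mb > partition_mb:
        raise ValueError("cache size exceeds partition size")
    return partition_mb - size_mb


lines = cache_dir_lines(count=6, size_mb=1550)
print("\n".join(lines))

# Assumption: a 1.9GB partition is taken to be 1900 MB.
print(headroom_mb(1900, 1550), "MB free per partition")
```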

cache_swap

There is a low and a high mark for the cache. The defaults are 90% and 95%; however, if your cache is large these numbers don't make sense. In my case, with an 11GB cache, one percent is about 110MB. If you have a 100GB cache then one percent is a gigabyte. So unless your cache is quite small, both numbers should be increased.

Example:

cache_swap_low 98
cache_swap_high 99
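
The arithmetic behind these percentages is worth seeing once. This Python sketch compares the defaults with the example values for an 11GB cache; the function name is hypothetical, and 11GB is rounded to 11000 MB to match the "one percent is about 110MB" figure above.

```python
from typing import Tuple


def watermarks_mb(cache_mb: int, low_pct: int, high_pct: int) -> Tuple[int, int, int]:
    """Return (low mark, high mark, gap between them), all in MB."""
    if not 0 <= low_pct <= high_pct <= 100:
        raise ValueError("need 0 <= low_pct <= high_pct <= 100")
    low = cache_mb * low_pct // 100
    high = cache_mb * high_pct // 100
    return low, high, high - low


CACHE_MB = 11000  # assumption: 11GB rounded to 11000 MB

for label, low_pct, high_pct in (("defaults", 90, 95), ("example", 98, 99)):
    low, high, gap = watermarks_mb(CACHE_MB, low_pct, high_pct)
    unused = CACHE_MB - high
    print("{}: low={} MB, high={} MB, gap={} MB, above high mark={} MB".format(
        label, low, high, gap, unused))
```

With the defaults, 550 MB of this cache sits above the high mark; with 98 and 99 that shrinks to 110 MB.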

cache_store_log

I've never found this log to be of any use, so as the comments recommend, I have it disabled.

Example:

cache_store_log none

debug_options

Someday you will end up using this. When you can't figure out why a certain site is blocked (or not blocked) you will need this option.

Here is the line I always leave in my configuration file so I can remove the hash-mark (aka, pound sign) to activate it whenever I need to:

#debug_options ALL,1 28,3

“28” is for ACLs, so this setting puts everything at level one except for ACLs, which are set at level 3. Then you can view the /var/log/squid/cache.log file to see exactly why a site was permitted or blocked. Once you have it figured out, just put the hash-mark back at the front of this line and issue the squid -k reconfigure command.

Configuring Squid for Content Filtering

In addition to the preceding tips, you can add some additional configuration lines to your squid.conf file to allow it to act as a black-list or white-list filtering proxy. Whichever method you choose, you will have a combination of the two because there will always be sites that are exceptions to any given set of rules.

I called this project Webilant on the ispltd.com web site. If you are looking for details on how to install Webilant yourself, follow the instructions above plus the ones that follow.

Define permitted hosts

Who can use your proxy server? Everyone on your LAN? Everyone on the Internet? No, you don't want everyone on the Internet to use it. Many people have scanners running that will find an open proxy in a hurry, publish it on a list, and before you know it your bandwidth will be soaked up by a thirsty sponge of users intent on questionable or illegal uses.

So the first thing you do is define the permitted users by setting ACLs (access control lists). Let's say your LAN is using 10.1.1.0/24 and 192.168.2.0/24, plus you have family, friends, or associates who will also use your proxy from 12.1.2.3. We will call the permitted users AllowedHosts in the following example:

acl AllowedHosts src 10.1.1.0/24 192.168.2.0/24
acl AllowedHosts src 12.1.2.3

[ black-list and white-list rules will go in here in the next step ]

http_access allow AllowedHosts
http_access deny all
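
The http_access rules are checked in order and the first one that matches decides the outcome, which is why "deny all" goes last. The Python sketch below mimics that behaviour for the two rules above using the standard ipaddress module. It is a simplified model with hypothetical names, not squid's actual implementation, and it uses the networks named in the text (10.1.1.0/24, 192.168.2.0/24, and the single host 12.1.2.3).

```python
import ipaddress

# The source addresses that make up the AllowedHosts ACL.
ALLOWED_HOSTS = [
    ipaddress.ip_network("10.1.1.0/24"),
    ipaddress.ip_network("192.168.2.0/24"),
    ipaddress.ip_network("12.1.2.3"),  # a single address becomes a /32 network
]


def in_allowed_hosts(addr: ipaddress.IPv4Address) -> bool:
    return any(addr in network for network in ALLOWED_HOSTS)


# Rules in configuration order: (action, test). The first match wins.
RULES = [
    ("allow", in_allowed_hosts),   # http_access allow AllowedHosts
    ("deny", lambda addr: True),   # http_access deny all
]


def http_access(client_ip: str) -> str:
    """Return 'allow' or 'deny' for a client address."""
    addr = ipaddress.ip_address(client_ip)
    for action, matches in RULES:
        if matches(addr):
            return action
    raise AssertionError("unreachable: the final rule matches everything")


for ip in ("10.1.1.42", "12.1.2.3", "12.1.2.4"):
    print(ip, http_access(ip))
```

The first two addresses are allowed and the third is denied, because it falls through to the final catch-all rule.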

Black-listing

For an example of black-list configuration and filter files, click here.

White-listing

For an example of white-list configuration and filter files, click here.

Black Lists

Here are the download links for my black lists. You can use as many or as few as you like, just enable or disable the ACL definitions in your squid.conf file appropriately.

Download ACL files from our FTP server