Configuring Your Web Server
The greatest asset of Apache is its flexibility in configuration -- you
will never be limited to the settings the original developers thought you would
want! Everything can be configured to a per-directory level, or even a per-file
level if necessary. All Apache configuration is performed using one file: httpd.conf. There are many
configuration commands, but all follow a similar format.
Apache will work fine with its default settings, as long as the ServerName directive is set, so you
can begin with the original settings and gradually move toward your
requirements, changing a few settings at a time and restarting Apache each time to
see the results (Apache only reads the httpd.conf file when starting). In
fact, the chances are that most of the default settings will never need changing,
and that you will use a few useful commands repeatedly. We will cover all these
essential commands.
An important general rule is to avoid using hostnames (e.g. www.trampolining.net) unless the command
requires them. Hostnames in the configuration file will work most of the time,
so long as you have told Linux where to find a DNS server. However, if you
restrict access by hostname, Apache must perform a DNS lookup
on every client IP address, increasing server load. If you restrict access by
IP address instead, Apache already has the information it needs and
no DNS lookup is forced. Also, if your DNS server is
down when Apache is started, any configuration directives containing hostnames
cannot be resolved, and parts of the server will not be started. The result is
intermittent problems whenever the DNS server is down or contains bad data.
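As a rough illustration of this rule, the container below restricts a directory by IP address, with the hostname-based equivalent left commented out for comparison. The directory path, the .example.com domain and the 192.168.1 address prefix are all placeholders invented for this sketch, not settings from the server described here.

```apache
# Hypothetical directory and addresses, for illustration only.
<Directory /home/www/trampolining.net/private>
    Order deny,allow
    Deny from all
    # By hostname: forces a DNS lookup on each client address.
    # Allow from .example.com
    # By partial IP address: no DNS lookup is needed.
    Allow from 192.168.1
</Directory>
```

With the Allow line written as an address, Apache can make its decision from the connection alone.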
The
effect of a command depends on where it is placed. The first section of httpd.conf contains global environment
directives, which affect the entire server and all virtual hosts
running on it. The next section configures the main, or primary server.
Settings here also provide the default settings for all virtual hosts. The
third and final section configures the virtual hosts themselves.
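The three-part layout might be pictured as the trimmed-down outline below. It is only a sketch of where things go, not a complete file; the directives and values are ones used elsewhere in this chapter, and the comment headings are our own wording.

```apache
### Section 1: Global Environment
# Directives here affect the whole server, including every virtual host.
ServerType standalone
MaxClients 150

### Section 2: Primary Server Configuration
# Settings for the primary server; also the defaults for virtual hosts.
ServerName www.trampolining.net
DocumentRoot "/home/www/trampolining.net"

### Section 3: Virtual Hosts
<VirtualHost 123.123.123.123>
    ServerName www.sport-science.net
    DocumentRoot /home/www/sport-science.net
</VirtualHost>
```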
Within section two or three you might want to apply
settings to a single directory only. This is achieved using a directory
container; the settings are placed within a pair of HTML-like tags, which
define which directory to apply the settings to. (We will meet other containers
when configuring virtual hosts.) Note there is no trailing slash on the
directory.
<Directory /any/directory>
Settings here
</Directory>
Section 1: Global Environment
The first directive that you will come across is this one:
ServerType standalone
ServerType may be either standalone or inetd. For all but the lowest-use
servers, use standalone,
as Apache will then be permanently ready and waiting for requests itself. The inetd option caters for users who wish to start Apache
only when requests are received on a specified port; this introduces start-up delays
which will only be acceptable if the server rarely functions as a web server.
The
following directives tell Apache to maintain a pool of between 5 and 10 spare
server processes, ready for new requests:
MinSpareServers 5
MaxSpareServers 10
Adaptive spawning, implemented in Apache versions 1.3 and upwards, means there should be
no reason to change these except on very high load servers. However, be wary of
over-trusting benchmarking utilities, as these generate a step change in
request volume over a few seconds, which does not occur in reality. The directive
below prevents more than 150 clients connecting simultaneously, to prevent the
server locking up during periods of high usage:
MaxClients 150
If
you find that clients are being refused a connection during periods of high
usage, try increasing this number. If you find the server is locking up or
becoming very slow during periods of high usage, you may consider lowering this
number as a temporary measure to keep the server running until you can provide
higher capacity. More information on optimizing for high loads can be found in Professional Apache by Peter Wainwright,
published by Wrox Press (ISBN 1861003021).
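For a busier server, the process pool settings might be raised together, along the lines of the sketch below. The numbers are illustrative guesses rather than recommendations, since suitable values depend on available memory and traffic. StartServers is not discussed above; it is a standard Apache 1.3 directive that sets how many server processes are launched at startup.

```apache
# Illustrative values only; tune against real traffic, not benchmarks.
StartServers 10
MinSpareServers 10
MaxSpareServers 20
# Raise cautiously: each extra client costs memory.
MaxClients 200
```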
Section 2: Primary Server Configuration
To reduce the chance of malicious damage to your system, we give Apache
processes the minimum possible security privileges on your machine, using the User and Group directives.
Any
CGI scripts run by Apache will inherit these settings. CGI scripts will be
discussed later in the chapter.
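In a typical Apache 1.3 httpd.conf, these privileges are set with the User and Group directives, roughly as sketched here. The account and group names are assumptions: nobody is a common choice, but the name of the unprivileged account varies between distributions, so check which one exists on your system.

```apache
# Assumed names; use whichever unprivileged account your system provides.
User nobody
Group nobody
```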
The
following e-mail address will be suffixed to any error messages Apache sends to
the client. Setting it to your address ensures visitors have a way to inform
you of problems with your site. You may of course feel this is not a good
thing!
ServerAdmin richard@trampolining.net
Obviously
you would put your address in here, not mine.
The ServerName
directive tells Apache the hostname of the primary host. It is essential, and
proves a common cause of headaches if it is set wrongly:
ServerName www.trampolining.net
Suppose a client requests http://www.trampolining.net/news, where news
is a directory. This is not the correct URL for the directory, as the trailing slash is
missing. Apache will redirect the browser to the correct URL, http://www.trampolining.net/news/, using ServerName to construct it. If ServerName is not correctly set, the
redirect URL returned will be invalid (e.g. http://not-set/news/) and the client will receive a 404 'File not found'
error or a DNS resolution failure. If you do not yet have a
valid hostname, you can use the machine IP address here instead.
This
directive specifies where to look for the contents of the web site; this
directory will appear as the root directory of your web site, and is where all
the HTML files will be placed.
DocumentRoot "/home/www/trampolining.net"
The
following container contains directives that apply to the entire system:
<Directory />
Options FollowSymLinks
AllowOverride None
</Directory>
These permissions prevent anyone browsing around private
system files. Because of this default denial of access, we will need to
explicitly allow access to any directories we intend to use using another
directory container. Subdirectories inherit the same Apache permissions as
their parent unless refined by later directory containers. The following tag
specifies the beginning of the main directory container.
<Directory /home/www/trampolining.net>
This
is used to allow access to the main server root directory. Later directory
containers may refine the permissions we have set here for the whole of /home/www/trampolining.net.
The
Options
directive tells Apache what it is allowed to serve to the user.
Options Indexes FollowSymLinks Includes ExecCGI
Indexes allows Apache to produce a listing of most of the files in a directory if
there is no index.html
or other file specified in DirectoryIndex. If Indexes is not included
here, a 403 'forbidden' HTTP response will be returned instead. FollowSymLinks allows Apache to follow
any symbolic links you create to other directories.
Includes and ExecCGI are the remaining options set here, and will
be covered in more detail in the section called Technologies for Effective Sites later in this chapter. Briefly, Includes
tells Apache to allow Server Side Includes (SSI) in this directory to be parsed
(IncludesNoExec is the same except it will not honor SSI exec
commands). In the same way, ExecCGI allows CGI files in this
directory to be executed, which is not the most secure way to provide CGI
support, but is the most convenient.
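Since subdirectories inherit their parent's options, a later container can switch individual options off again. The sketch below assumes a hypothetical downloads subdirectory; the - prefix removes an option from the set already in force rather than replacing the whole set.

```apache
# Hypothetical subdirectory, used only for illustration.
<Directory /home/www/trampolining.net/downloads>
    # Keep the inherited options, minus directory listings and CGI.
    Options -Indexes -ExecCGI
</Directory>
```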
Now take a look at the AllowOverride directive.
If AllowOverride is set to All,
then wherever a file called .htaccess exists in a directory, the httpd.conf settings for that directory
will be overridden by the settings in that file. This allows you to keep httpd.conf to a reasonable length by defining per-directory
settings in individual .htaccess files. However, this is a poor method of
configuration as making changes can become a chore -- any number of .htaccess files may require editing, even for a simple
update. Setting AllowOverride to All will also mean that every time a request is received for a directory,
whether or not an .htaccess file is present, Apache will search for one,
thereby increasing server load.
A better alternative, if you are concerned about the length of httpd.conf, is to group extra settings
in another file (e.g. /usr/local/apache/conf/football-websites.conf), and then force Apache to
read this by placing 'Include /usr/local/apache/conf/football-websites.conf' at the end of httpd.conf. We will use this principle later in this chapter when
we configure ApacheJServ.
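Taken together, the advice above might look something like this in httpd.conf. This is a sketch only: it assumes you want .htaccess files ignored entirely in the main directory, and it reuses the example file name from the text.

```apache
<Directory /home/www/trampolining.net>
    # Ignore .htaccess files, so Apache never has to search for them.
    AllowOverride None
</Directory>

# At the end of httpd.conf: read the grouped settings from a second file.
Include /usr/local/apache/conf/football-websites.conf
```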
This
section defines who is allowed to view your web site. Setting 'Allow from all'
will allow anyone to visit your web site -- there is more about restricting
access in the Logs and Analysis
section.
Order allow,deny
Allow from all
</Directory>
This is the end of the directory container -- we have
finished setting the default permissions!
When a client requests a directory, Apache first checks whether the directory contains one of the files named in the DirectoryIndex directive, which it can serve
instead. So when requesting http://www.trampolining.net,
the actual file returned is http://www.trampolining.net/index.html.
Apache will check for each of the named files in order; in this example we have
chosen to serve index.html in preference to index.htm. If neither index.html nor index.htm is present, either a 403 'forbidden' response or a
directory index will be returned, depending on whether Indexes is set in the Options command as above. If you use
SSI or JSPs, you might want to add index.shtml or index.jsp to this list.
DirectoryIndex index.html index.htm
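If you do use SSI or JSPs, the extended list might read as follows. The order shown is just one possibility; put whichever file type you use most often first.

```apache
# Checked left to right; the first file found is served.
DirectoryIndex index.html index.htm index.shtml index.jsp
```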
This is an example of a Files container; it will apply to
any matching file, anywhere on this host or any of the virtual hosts. It
prevents clients viewing any .htaccess configuration file, which could provide useful
information to a hacker.
AccessFileName .htaccess
<Files .htaccess>
Order allow,deny
Deny from all
</Files>
The next directive of note is:
HostnameLookups Off
When set to On,
Apache will attempt to resolve the IP address of every client before writing to
the logs, so that the logs contain 'machine.isp.net' instead of
'123.123.123.123'. This heavily increases server load, and the load upon your
Internet connection. In the Logs and Analysis
section we will configure Analog to do this much more efficiently.
This
directive tells Apache where to log errors:
ErrorLog /usr/local/apache/logs/error_log
This
log collects most error messages, including CGI errors and errors occurring on
virtual hosts (unless your virtual host has its own ErrorLog command). A useful trick when troubleshooting is to type tail -f /usr/local/apache/logs/error_log, which will show the end of
the log in real time. Then browse around the site, seeing exactly what is
causing errors and when the errors occurred.
The
following directive tells Apache where to log accesses.
CustomLog /usr/local/apache/logs/access_log common
It takes an added parameter that tells it which of the predefined log formats to
use. This may be common (the NCSA standard log), combined
(the most useful log format, which provides nearly all information in one log), referer (which records only referrer details) or agent (which records only user agent details). It
is even possible to create several different logs concurrently using separate CustomLog commands, and much more -- see Professional Apache for more details.
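The format nicknames are themselves defined by LogFormat directives, and several CustomLog lines can then run side by side. The sketch below shows the definitions as they appear in a stock Apache 1.3 httpd.conf; the referer_log and agent_log file names are assumptions chosen for illustration.

```apache
# Format nicknames, as defined in the stock Apache 1.3 httpd.conf.
LogFormat "%h %l %u %t \"%r\" %>s %b" common
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
LogFormat "%{Referer}i -> %U" referer
LogFormat "%{User-agent}i" agent

# Several logs written concurrently, one CustomLog line each.
CustomLog /usr/local/apache/logs/access_log common
CustomLog /usr/local/apache/logs/referer_log referer
CustomLog /usr/local/apache/logs/agent_log agent
```

Using combined for a single log gives much the same information as the three separate logs in one file.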
ErrorDocument commands allow you to
return a pre-selected page in place of standard Apache error pages. This
preserves the corporate image of a site and minimizes any unprofessional
impression which would inevitably be created in this situation. ErrorDocuments can be created for any or all HTTP error
responses, but the important ones to cover are 404 ('File not found') and 500
('Server error', usually caused by script or servlet failure).
ErrorDocument 404 /missing.html
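Covering both of the important responses might look like the sketch below. The page /error.html is a hypothetical file name; like /missing.html, it is a local URL path that Apache resolves in the same way as any other request to the site.

```apache
ErrorDocument 404 /missing.html
# /error.html is an assumed name; create the page before relying on it.
ErrorDocument 500 /error.html
```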
Adding Virtual Hosts
Virtual hosts allow you to
provide another web site from the same server. To the user, the virtual host
looks and feels identical to how it would if it were the primary host. For
example, in addition to providing www.trampolining.net on my server, I might
want to run a completely different web site called www.sport-science.net.
Since HTTP 1.1, virtual hosts can be provided either by each one
listening to a different IP address (the traditional, IP-based approach), or by sharing one IP
address, with the HTTP Host header telling the server which virtual host to serve (the name-based approach).
While the second approach is well supported by Apache, the small number of
browsers still in use which are not compliant with HTTP 1.1 forces us to adopt
the traditional approach.
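For comparison, the second, single-address approach would be configured roughly as sketched below. This is not the method adopted in this chapter; it assumes both sites share the machine's one address, and relies on the NameVirtualHost directive available in Apache 1.3.

```apache
# Name-based alternative: both sites answer on the same IP address.
NameVirtualHost 123.123.123.122

# The first container listed also serves requests that name no known host.
<VirtualHost 123.123.123.122>
    ServerName www.trampolining.net
    DocumentRoot /home/www/trampolining.net
</VirtualHost>

<VirtualHost 123.123.123.122>
    ServerName www.sport-science.net
    DocumentRoot /home/www/sport-science.net
</VirtualHost>
```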
Linux allows up to 256 IP-based virtual hosts per network
card, although the number of file descriptors available will probably limit us
to a mere 200 or so. There are two parts to adding a virtual host. First, you
must set up the network configuration to force Linux to 'listen' to the other
IP addresses. Second, you will also need to configure Apache to listen.
Network Configuration
The primary host listens to the machine's IP address as specified
during Linux setup (or subsequently through Linuxconf, for example). We will
need another IP address for each virtual host, and also need to register each
hostname and IP address with a Domain Name Server.
By default, a network interface only listens to information sent to its own address. After
all, why waste time listening to information meant for other machines? To
provide virtual hosts, we need to listen to requests sent to our virtual hosts'
IP addresses. Luckily, Linux provides direct support for this, known as IP aliasing.
(The ability of an interface to listen to traffic addressed elsewhere is one reason why network transmissions, like e-mail and
telnet, are so easy to intercept.)
Typing ifconfig will produce a listing of network interfaces, along
with technical information about each of them. We are interested in the device eth0, which is the first to be listed (eth0 represents your first network card). If there are
aliases already defined, these will be listed underneath as eth0:0, eth0:1, and so on. If you have multiple network
cards, you may reference them as eth1, eth1:1 and so on.
eth0      Link encap:Ethernet  HWaddr 00:50:04:86:89:61
          inet addr:123.123.123.122  Bcast:123.255.255.95  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:184263 errors:4 dropped:0 overruns:0 frame:5
          TX packets:104402 errors:0 dropped:0 overruns:0 carrier:0
          collisions:762 txqueuelen:100
          Interrupt:11 Base address:0x1000

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:3924  Metric:1
          RX packets:7858 errors:0 dropped:0 overruns:0 frame:0
          TX packets:7858 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
We will set up a virtual host for www.sport-science.net with an IP address of
123.123.123.123. Type: ifconfig eth0:0 123.123.123.123, where 0 is the first available
alias number (set to 0 here since none have been
defined yet) and 123.123.123.123 is the IP address of your
virtual host. Next we need to set up routing: type route add -host 123.123.123.123 dev eth0:0. If all goes well, typing ifconfig should now produce this:
eth0      Link encap:Ethernet  HWaddr 00:50:04:86:89:61
          inet addr:123.123.123.122  Bcast:123.255.255.95  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:184263 errors:4 dropped:0 overruns:0 frame:5
          TX packets:104402 errors:0 dropped:0 overruns:0 carrier:0
          collisions:762 txqueuelen:100
          Interrupt:11 Base address:0x1000

eth0:0    Link encap:Ethernet  HWaddr 00:50:04:86:89:61
          inet addr:123.123.123.123  Bcast:123.123.123.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          Interrupt:11 Base address:0x1000

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:3924  Metric:1
          RX packets:7858 errors:0 dropped:0 overruns:0 frame:0
          TX packets:7858 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
That's it -- our network configuration is ready for
virtual hosting.
Apache Configuration
Apache configuration consists of adding a few lines to httpd.conf and restarting the server:
<VirtualHost 123.123.123.123>
Just
as <Directory> containers contained
directives applying to that directory, <VirtualHost> containers contain all the
directives related to that virtual host. Any directives not included take as
default the settings assigned in the primary server section. A notable
exception is the directive Options +Includes (explained in the later section entitled Server-Side Includes), which must be
placed in the Directory container of each virtual
host where it is used -- it is not inherited.
These
directives are required, and have the same effect as when used before for the
primary host.
DocumentRoot /home/www/sport-science.net
ServerName www.sport-science.net
These
two directives create logs specifically for this virtual host. If no logging
command is given, log messages will be redirected to the primary host logs.
CustomLog /usr/local/apache/logs/sports_science_access_log combined
ErrorLog /usr/local/apache/logs/sports_science_error_log
This
Directory container gives the client permission to access the
DocumentRoot. Without it, a forbidden HTTP response is
returned.
<Directory "/home/www/sport-science.net">
Allow from all
</Directory>
</VirtualHost>
This tag ends the VirtualHost container. Restart the
server, and assuming your DNS entry has had 12-24 hours or so to propagate
across the Internet, your virtual host should be up and working.
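Putting the pieces of this section together, the complete container might read as follows. This is simply the directives introduced above assembled in order, with indentation added for readability; nothing new is introduced.

```apache
<VirtualHost 123.123.123.123>
    DocumentRoot /home/www/sport-science.net
    ServerName www.sport-science.net
    # Logs specific to this virtual host.
    CustomLog /usr/local/apache/logs/sports_science_access_log combined
    ErrorLog /usr/local/apache/logs/sports_science_error_log
    # Grant clients access to the DocumentRoot.
    <Directory "/home/www/sport-science.net">
        Allow from all
    </Directory>
</VirtualHost>
```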
end part one...
©1998 Wrox Press Limited, US and UK.