May 22, 2000

The success of the Internet lies in its ability to provide information
quickly and cheaply to anyone at any time, and to facilitate fast communication
on a global scale. As a result, computers that can connect to the Internet and
draw on an almost unlimited supply of information have been installed in homes
and workplaces. Indeed, every business that wants to succeed well into the
twenty-first century needs to make use of, and contribute to, this technology.
At the center of this communication network are the web servers that supply
information back to the client. However, web servers should supply not just
static information (unchanging text and graphics on a standard HTML page) but
should also be able to respond to the needs of the user. This requires a web
server to interpret the user's request and respond accordingly. Technologies now
exist that enable the web server itself to fetch and process data before sending
it to the client.
Originally, Telnet was used to run programs on remote computers, e-mail to communicate, and FTP or Gopher to transfer large files. Telnet and e-mail are still
used, and FTP has become the preferred way to make files available across the
Internet, but the real revolution in Internet use over the past few years has
been the introduction of a new method of searching and viewing information, the
World Wide Web (WWW).
The WWW was originally developed at the European Laboratory for Particle
Physics (CERN). CERN developed a program to serve information in HTML format
across HTTP (a web server) and another program to receive and view it (a web
browser). These programs were subsequently developed by the National Center for
Supercomputing Applications at the University of Illinois and renamed NCSA HTTPd
and Mosaic respectively. Mosaic went on to become the commercial Netscape
Navigator product, which from version 5 onwards has been decommercialized and is
now open source again. The NCSA server, on the other hand, has been open source
all along. It has been improved by successive patches and renamed Apache. In over
ten years of dedicated development, the server has been extended to implement
numerous technologies, and Apache is considered to be the most stable web server
available when run on its native Unix or Linux operating systems.
Apache's scalability, zero cost and customizability make it the
most popular web server available, running over 4.3 million web sites (over
half of the WWW) as of October 1999 (figures from Netcraft).
The development of FTP servers has not followed the same pattern as
that of web servers. However, the ability to resume broken downloads, by
starting the download process part way through a file, is a major recent
advance. There are numerous FTP servers available for Linux, but the most fully
featured and well-tested server is the one developed at Washington University. WU-FTP provides a complete
FTP server solution, as well as having a large user base which ensures
continued development, and updates, where necessary.
As I said at the beginning of this section, web servers now need to be able
to respond to the user through server-side processing. The CGI (Common Gateway
Interface) allows you to execute scripts written in languages such as Perl on
your server, providing particularly powerful text handling capabilities. More
recently, Java servlets have provided the ability to execute more complex
routines on the server. Either technology allows you to provide different
content and perform different actions depending on the user's actions.
JavaServer Pages (JSP) are the latest addition to the Java family; they do the
same task as Microsoft's Active Server Pages (ASP), processing user requests
using server-side code and returning the information to the user as plain HTML.
Apache has excellent implementations of each of these technologies: the
mod_cgi and mod_perl modules (for fast CGI script execution) and ApacheJServ (a
servlet engine), with JSP support through Jakarta becoming available at the time
of writing. There is another advantage to these technologies: if you decide to
change operating system or web server, you will be able to transfer these
scripts with very little editing. The same cannot be said of ASP!
This chapter will not only demonstrate how to install both the Apache web
server and the WU-FTP server, but will also show you how to configure the
installations to meet your requirements, including providing basic security for
your servers. It will show you how to perform essential administration tasks,
such as analysing log data, suggest options for replacing proprietary
technologies such as ASP, and cover technologies such as ApacheJServ, CGI and
SSI. Finally, you will be given tips on how to get the server working as quickly
as possible following a crash.
Deploying the Apache Web Server
In this section you will learn how to set up Apache on a Linux machine.
You will be shown the system requirements for setting up the server and the
modifications you will need to make to the Linux operating system configuration
files to prepare it for installation. In addition to stepping through the
installation process itself, you will find out how to configure Apache to meet
your own requirements. Then you will learn how to add a virtual host to the
server, review some technologies that add interactivity to your web site, and
see how to produce useful reports on web site usage by examining the contents of
the access log files.
System Requirements
The following list gives the requirements that need to be met when
setting up an effective production web server. For a test web server, it is
possible to install on any machine with as little as 30 MB of free disk space,
any processor speed, and a static IP address if you want to test it online.
Permanent Internet connection -- As an
experienced system administrator, you will probably already know the exact
requirements for your site. If not, a good rule of thumb is to allow a minimum
of 10 kilobits per second per simultaneous user. So if you expect a maximum of
twelve users on your site at any one time (120 kilobits per second in total), a
128K leased line would be the minimum requirement. Obviously the content of your
pages will make a big difference, and experience is the best teacher as far as
choosing a connection is concerned. If your site is purely for intranet use,
this will not concern you.
URL and IP address -- You will need to purchase
at least one domain name and IP address. Entering your primary IP address and
host name when originally installing Linux is the quickest way to gain a
bulletproof basic network configuration; if you didn't set them during the
installation, they can be edited later. You will need a further IP address and
hostname for each additional virtual host you wish to run, although these hostnames can be
subdomains of a single registered domain name (e.g. apache.wrox.co.uk is a subdomain of wrox.co.uk). An FTP
server and a web server may share a host name, but they will use different
ports.
DNS server -- You will need access to a
DNS server to allow Internet users to resolve your domain names. For many
companies, this access will be provided through a corporate account with a
major network provider, in which case adding entries will be as simple as
contacting that provider. Linux is itself capable of running a fully featured
name server, if you will make sufficient use of it to justify the maintenance
effort.
Linux machine -- The ideal requirements here are plenty of memory and a fast,
ultra-wide SCSI hard disk, since these are what will see the most work; IDE
hard disks may prove to be a bottleneck if server demand increases. All of the
software (including the operating system) is unlikely to take more than 1GB, so
a typical 10GB hard disk available today will be more than adequate and allow
plenty of space for growth. 64MB of RAM should prove enough for up to 10,000
hits per day; if you find memory swapping is taking place, it is a simple
matter to add more RAM (and you should do this, because swapping causes serious
performance degradation). If using server-side Java technologies, add at least
another 32MB for the Java Virtual Machine (JVM). Where significant server-side processing
is used, the speed of the processor is also important: the faster the better.
If connected to the Internet, the Linux machine should not be used for any
non-Internet purposes. If you are using the server as part of an intranet, then
the installation of a firewall will be necessary; this way the effect of any
security breach will be minimized.
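The rules of thumb in the list above can be turned into a quick sizing check. The following is only a minimal sketch in Python: the function and constant names are our own, and it reads the per-user allowance as 10 kilobits per second, the reading under which twelve users fit on a 128K line. The 64MB and 32MB figures are the ones quoted above.

```python
# Rule-of-thumb sizing based on the list above. All names are illustrative.
KBIT_PER_USER = 10   # assumed minimum allowance per simultaneous user (kbit/s)
BASE_RAM_MB = 64     # quoted as enough for up to 10,000 hits per day
JVM_RAM_MB = 32      # extra memory suggested when server-side Java is used


def min_bandwidth_kbit(simultaneous_users):
    """Return the minimum line speed in kbit/s for the given peak user count."""
    return simultaneous_users * KBIT_PER_USER


def min_ram_mb(uses_java=False):
    """Return the starting RAM estimate in MB, adding room for a JVM if needed."""
    return BASE_RAM_MB + (JVM_RAM_MB if uses_java else 0)


if __name__ == "__main__":
    print("12 users need at least", min_bandwidth_kbit(12), "kbit/s")
    print("RAM with server-side Java:", min_ram_mb(uses_java=True), "MB")
```

Treat the result as a floor rather than a target: as noted above, the content of your pages makes a big difference.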
Preparing Linux for Installation
The installation should always be performed in a clean environment;
this involves formatting the hard disk and installing Linux from scratch. Make
sure your version of Linux is up to date: bugs in the TCP/IP stack are
occasionally found in the Linux kernel, and an up-to-date distribution will
keep you one step ahead of the hackers. The latest security bulletins are available
from CERT, and it is useful to
subscribe to their security mailing lists. This chapter assumes you are using
Red Hat Linux 6.1, but the steps are very similar for all Linux distributions.
It is crucial that only essential services are running. Linux is capable of
running everything from ping to fully featured name
servers, and by default it will. When not fully configured, these services represent a
security loophole, so it is best to deselect all the obvious communication
services during the Linux installation (anything with ftp, mail or web in the title). Make sure that make (a build utility) and gcc (the C compiler) are selected; we will use them later.
Once the install is completed, edit the /etc/inetd.conf file, or its equivalent, which
contains a listing of all the network services started on the machine. Insert a
# symbol at the start of the line for each service that is not needed. Particular services to disable include finger,
cfinger and portmap, which provide useful information for hackers, and, if you don't require it, telnet.
Telnet allows easy remote management, but if you don't intend to use it,
removing it closes one more possible security loophole. Finally, after a reboot,
we can type netstat -a | more for a listing of the remaining
services:
Proto  Recv-Q  Send-Q  Local Address  Foreign Address  State
tcp    0       0       *:printer      *:*              LISTEN
tcp    0       0       *:linuxconf    *:*              LISTEN
tcp    0       0       *:auth         *:*              LISTEN
tcp    0       0       *:login        *:*              LISTEN
The first 20 or so lines contain the useful listing of services. The lines to look
for have the word LISTEN at the end, signifying that
they are ready to accept connections. As long as you don't see portmap, finger, sendmail or ftpd (unless deliberately installed and configured), we
have a relatively safe environment to continue with.
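The two checks just described, commenting out unwanted inetd.conf entries and scanning the netstat listing for LISTEN lines, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a hardening tool: the function names are our own, the sample inetd.conf lines are made up, and the extra names sunrpc and smtp assume that portmap and sendmail show up in netstat under their usual /etc/services names.

```python
# Service names to watch for. Most come from the text; "sunrpc" and "smtp" are
# the /etc/services names under which portmap and sendmail usually appear in
# netstat output (an assumption about the local services file).
UNWANTED = {"portmap", "finger", "cfinger", "ftp", "sunrpc", "smtp"}

# Illustrative inetd.conf lines, not copied from a real system.
SAMPLE_INETD = """\
ftp     stream  tcp  nowait  root  /usr/sbin/tcpd  in.ftpd -l -a
finger  stream  tcp  nowait  root  /usr/sbin/tcpd  in.fingerd
"""

# The four listening sockets from the listing above.
SAMPLE_NETSTAT = """\
tcp  0  0 *:printer    *:*  LISTEN
tcp  0  0 *:linuxconf  *:*  LISTEN
tcp  0  0 *:auth       *:*  LISTEN
tcp  0  0 *:login      *:*  LISTEN
"""


def disable_services(inetd_conf, services):
    """Comment out every active line whose first field names a listed service."""
    out = []
    for line in inetd_conf.splitlines():
        fields = line.split()
        active = bool(fields) and not line.lstrip().startswith("#")
        if active and fields[0] in services:
            line = "#" + line
        out.append(line)
    return "\n".join(out)


def listening_services(netstat_output):
    """Return the service names of TCP sockets in the LISTEN state."""
    names = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: Proto Recv-Q Send-Q Local Foreign State
        if len(fields) == 6 and fields[0] == "tcp" and fields[5] == "LISTEN":
            names.append(fields[3].rsplit(":", 1)[1])
    return names


if __name__ == "__main__":
    print(disable_services(SAMPLE_INETD, {"finger", "ftp"}))
    still_open = set(listening_services(SAMPLE_NETSTAT)) & UNWANTED
    print("Unwanted services still listening:", sorted(still_open) or "none")
```

On a real machine you would feed in the contents of /etc/inetd.conf and the output of netstat -a instead of the sample strings.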
Installing Apache Web Server
Traditionally, Apache required recompiling every time you wanted to add a new feature, or 'module',
as all modules were compiled into the Apache executable. More recently, the
Apache group have incorporated support for DSO (Dynamic Shared Object) linking, which, like DLLs in
Windows, allows new modules to be added later without recompiling the whole
program. We will install Apache using DSO linking, which will make installation
of ApacheJServ and other modules easier. Static linking is very slightly faster
in operation than DSO linking, and some older distributions of Linux will not
allow DSO. If you have to do a static installation, just leave out the --enable-module=most and --enable-shared=max parameters in step 4.
1. If you are using Red Hat 6.0, you first need to correct an error in
the distribution, by making a C header file available from its correct
location. Type:
# ln -s /usr/include/db1/ndbm.h /usr/include/ndbm.h
2. Download the latest version of Apache (www.apache.org) to /usr/local/src,
then unzip and extract it:
# tar -xzvf apache_x_x_x.tar.gz
3. Enter the created directory:
# cd /usr/local/src/apache_x_x_x
The next three steps configure, build and install Apache.
The first parameter supplied to the configure script specifies the path for
the Apache installation.
4. # ./configure --prefix=/usr/local/apache --enable-module=most --enable-shared=max
5. # make
6. # make install
That's it: Apache is installed!
We will cover the configuration fully later in the chapter, so for now we will
just configure enough to test it:
1. Open /usr/local/apache/conf/httpd.conf for editing.
This contains every configuration command
for the entire server, so it may look quite intimidating. However, don't be put
off, as it is simpler than it looks!
2. Search for the ServerName directive
(about a third of the way through). Replace it with ServerName localhost; we will use a
real network identity later, but using localhost for
now gives us a simple check of our Apache installation. Save and
close httpd.conf.
3. Start the Apache server with the command /usr/local/apache/bin/apachectl start
4. If Netscape, or some other web browser, is
installed, type http://localhost
into the location bar. If not, type lynx localhost at
the command line. Either way, you should be presented with a congratulations
page:

Getting to Know the Web Server
Your working web server installation should be self-contained within /usr/local/apache. If you installed Apache
differently (e.g. using a Red Hat RPM), and find your configuration changes
aren't having any effect, try searching for stray files: type find / -name httpd.conf -print. Some
distributions place the active httpd.conf in other directories.
Inside /usr/local/apache will be a set of further
directories. The ones we will use are:
bin -- contains all the program executables
cgi-bin -- the default location for CGI files
conf -- contains all the Apache configuration files
htdocs -- the default root directory for your web site. A sample index.html file is
already in this directory to produce the 'It Worked' page you saw in the last
section
logs -- contains all the server logs by default. We will deal with
these in more depth later.
We will change the locations of the default web site root directory and CGI
location later when we cover Apache configuration, and develop a consistent
placement of all web site content within the /home directory where it is well
separated from the application. Separating content from application is a useful
technique if your fellow system administrators make the occasional mistake when
updating web site content.
To start Apache, type:
/usr/local/apache/bin/apachectl start
To stop Apache, type:
/usr/local/apache/bin/apachectl stop
The two commands above can be combined. To restart Apache, type:
/usr/local/apache/bin/apachectl restart
The alternative command
/usr/local/apache/bin/apachectl graceful
does the same, but finishes serving any current requests first.
Configuring Your Web Server
The greatest asset of Apache is its flexibility in configuration -- you
will never be limited to the settings the original developers thought you would
want! Everything can be configured to a per-directory level, or even a per-file
level if necessary. All Apache configuration is performed using one file: httpd.conf. There are many
configuration commands, but all follow a similar format.
Apache will work fine with its default settings, as long as the ServerName directive is set, so you
can begin with the original settings and gradually move toward your
requirements, changing a few settings at a time. Apache only reads the httpd.conf
file when starting, so restart it after each change to see the results. In
fact, the chances are most of the default settings will never need changing,
and you will use a few useful commands repeatedly. We will cover all these
essential commands.
An important general rule is to avoid using hostnames (e.g. www.trampolining.net) unless the command
requires them. Hostnames in the configuration file will work most of the time,
so long as you have told Linux where to find a DNS server. However, if you have
filtered any content using hostnames, Apache will need to perform a DNS check
against every client IP address, increasing server load. On the other hand, if
you filter against IP addresses, Apache already has the information in order to
function without forcing an unwanted DNS check. Also, if your DNS server is
down when Apache is started, any configuration directives containing hostnames
won't be parsed, and parts of the server will not be started. This can cause
intermittent problems when the DNS server is down or contains bad data.
The effect of a command depends on where it is placed. The first section of httpd.conf contains global environment
directives, which affect the entire server and all virtual hosts
running on it. The next section configures the main, or primary server.
Settings here also provide the default settings for all virtual hosts. The
third and final section configures the virtual hosts themselves.
Within section two or three you might want to apply
settings to a single directory only. This is achieved using a directory
container; the settings are placed within a pair of HTML-like tags, which
define which directory to apply the settings to. (We will meet other containers
when configuring virtual hosts.) Note there is no trailing slash on the
directory.
<Directory /any/directory>
Settings here
</Directory>
Section 1: Global Environment
The first directive that you will come across is this one:
ServerType standalone
ServerType may be either standalone or inetd. For all but the lowest-use
servers, use standalone, as Apache will then be permanently ready and waiting for requests itself. The inetd option caters for users who wish to start Apache
only when requests are received on a specified port; this introduces start-up delays
which will only be acceptable if the server rarely functions as a web server.
The following directives tell Apache to maintain a pool of between 5 and 10 spare
server processes, ready for new requests:
MinSpareServers 5
MaxSpareServers 10
Adaptive spawning, implemented in Apache versions 1.3 and upwards, means there should be
no reason to change these except on very high load servers. However, be wary of
over-trusting benchmarking utilities, as these generate a step change in
request volume over a few seconds, which does not occur in reality. The directive
below prevents more than 150 clients connecting simultaneously, to prevent the
server locking up during periods of high usage:
MaxClients 150
If you find that clients are being refused a connection during periods of high
usage, try increasing this number. If you find the server is locking up or
becoming very slow during periods of high usage, you may consider lowering this
number as a temporary measure to keep the server running until you can provide
higher capacity. More information on optimizing for high loads can be found in Professional Apache by Peter Wainwright,
published by Wrox Press (ISBN 1861003021).
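To see how the spare-server limits and the 150-client ceiling interact, it can help to model the decision the parent process has to make. The Python sketch below is a deliberate simplification and not Apache's actual algorithm (which, among other things, ramps up gradually): the function name and its return convention are our own, and the defaults are the values discussed above.

```python
def pool_adjustment(busy, idle, min_spare=5, max_spare=10, max_clients=150):
    """Return how many server processes to start (>0) or stop (<0).

    busy and idle are the current process counts. The total number of
    processes is never pushed above max_clients.
    """
    if idle < min_spare:
        wanted = min_spare - idle
        room = max_clients - (busy + idle)
        return max(0, min(wanted, room))
    if idle > max_spare:
        return -(idle - max_spare)
    return 0


if __name__ == "__main__":
    print(pool_adjustment(busy=3, idle=2))     # too few spares: start some
    print(pool_adjustment(busy=0, idle=14))    # too many spares: stop some
    print(pool_adjustment(busy=148, idle=1))   # capped by the client ceiling
```

The third call shows why clients start being refused at the ceiling: once busy plus idle processes reach the limit, no more can be started however many requests arrive.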
Section 2: Primary Server Configuration
To reduce the chance of malicious damage to your system, we give Apache
processes the minimum possible security privileges on your machine by running
them as an unprivileged user and group, set with the User and Group directives.
Any CGI scripts run by Apache will inherit these settings. CGI scripts will be
discussed later in the chapter.
The following e-mail address will be appended to any error messages Apache sends to
the client. Setting it to your address ensures visitors have a way to inform
you of problems with your site. You may of course feel this is not a good
thing!
ServerAdmin richard@trampolining.net
Obviously you would put your address in here, not mine.
The ServerName directive tells Apache the hostname of the primary host. It is essential, and
is a common cause of headaches if it is set wrongly:
ServerName www.trampolining.net
Suppose a client requests http://www.trampolining.net/news, where news
is a directory. Because the trailing slash is missing, the URL does not correctly
name the directory, so Apache will ask the browser to visit the correct URL, http://www.trampolining.net/news/, using ServerName to reconstruct the URL. If ServerName is not correctly set, the
redirect URL returned will be invalid (e.g. http://not-set/news/) and the client will see a 404 'File not found'
error or a DNS resolution failure. If you do not yet have a
valid hostname, you can use the machine's IP address here instead.
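The redirect described above is easy to picture as a small function. This is a minimal Python sketch that assumes the redirect is built purely from ServerName and the requested path, as the text describes; the function name is our own and the port is left out for simplicity.

```python
def directory_redirect(server_name, path):
    """Build the URL sent back when a directory is requested without its
    trailing slash, using the configured ServerName as the host part."""
    return "http://" + server_name + path + "/"


if __name__ == "__main__":
    # Correctly configured ServerName: the browser is sent to a working URL.
    print(directory_redirect("www.trampolining.net", "/news"))
    # Badly configured ServerName: the browser is sent somewhere unusable.
    print(directory_redirect("not-set", "/news"))
```

The point to notice is that the host in the redirect comes from the configuration, not from what the client typed, which is why a wrong ServerName breaks requests that otherwise look fine.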
This directive specifies where to look for the contents of the web site; this
directory will appear as the root directory of your web site, and is where all
the HTML files will be placed. As with directory containers, leave off the trailing slash.
DocumentRoot "/home/www/trampolining.net"
The following container contains directives that apply to the entire system:
<Directory />
Options FollowSymLinks
AllowOverride None
</Directory>
These permissions prevent anyone browsing around private
system files. Because of this default denial of access, we will need to
explicitly allow access to any directories we intend to use using another
directory container. Subdirectories inherit the same Apache permissions as
their parent unless refined by later directory containers. The following tag
specifies the beginning of the main directory container.
<Directory /home/www/trampolining.net>
This is used to allow access to the main server root directory. Later directory
containers may refine the permissions we have set here for the whole of /home/www/trampolining.net.
The Options directive tells Apache what it is allowed to serve to the user.
Options Indexes FollowSymLinks Includes ExecCGI
Indexes allows Apache to produce a listing of most of the files in a directory if
there is no index.html
or other file specified in DirectoryIndex. If Indexes is not included
here, a 'forbidden' HTTP response will be returned instead. FollowSymLinks will allow users to follow
any symbolic links you create to other directories.
Includes and ExecCGI are the remaining options set here, and will
be covered in more detail in the section called Technologies for Effective Sites later in this chapter. Briefly, Includes
tells Apache to allow Server Side Includes (SSI) in this directory to be parsed
(IncludesNoExec is the same except that it will not honor SSI exec
commands). In the same way, ExecCGI allows CGI files in this
directory to be executed; this is not the most secure way to provide CGI
support, but it is the most convenient.
Now take a look at the AllowOverride directive.
If AllowOverride is set to All,
then wherever a file called .htaccess exists in a directory, the httpd.conf settings for that directory
will be overridden by the settings in that file. This allows you to keep httpd.conf to a reasonable length by defining per-directory
settings in individual .htaccess files. However, this is a poor method of
configuration, as making changes can become a chore: any number of .htaccess files may require editing, even for a simple
update. Setting AllowOverride to All also means that every time a request is received for a directory,
Apache will search for an .htaccess file, whether or not one is present,
thereby increasing server load.
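To get a feel for the extra work involved, the Python sketch below lists the .htaccess files that would have to be looked for when serving a single file. It is an illustration only: the function name is our own, and it assumes overrides are enabled from one starting directory downwards, whereas the real lookups depend on the AllowOverride setting in force for each directory.

```python
import posixpath


def htaccess_candidates(file_path, override_root):
    """List the .htaccess paths checked for file_path, from override_root
    down to the directory that holds the file."""
    directory = posixpath.dirname(file_path)
    root = override_root.rstrip("/") or "/"
    inside = directory == root or directory.startswith(root.rstrip("/") + "/")
    if not inside:
        return []  # the file is outside the tree where overrides apply
    candidates = []
    current = directory
    while True:
        candidates.append(posixpath.join(current, ".htaccess"))
        if current == root:
            break
        current = posixpath.dirname(current)
    return list(reversed(candidates))


if __name__ == "__main__":
    for path in htaccess_candidates(
        "/home/www/trampolining.net/news/index.html",
        "/home/www/trampolining.net",
    ):
        print(path)
```

Each extra directory level adds one more file system check per request, and the checks happen even when no .htaccess file exists.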
A better alternative, if you are concerned about the length of httpd.conf, is to group extra settings
in another file (e.g. /usr/local/apache/conf/football-websites.conf), and then force Apache to
read this by placing 'Include /usr/local/apache/conf/football-websites.conf' at the end of httpd.conf. We will use this principle later in this chapter when
we configure ApacheJServ.
This section defines who is allowed to view your web site. Setting 'Allow from all'
will allow anyone to visit your web site; there is more about restricting
access in the Logs and Analysis section.
Order allow,deny
Allow from all
</Directory>
This is the end of the directory container; we have
finished setting the default permissions!
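The way Order, Allow and Deny combine is easier to follow with a small model. The Python sketch below is a simplified illustration, not Apache's implementation: the function names are our own, and it only understands the keyword all and full or partial IP addresses, not hostnames or netmasks.

```python
def matches(client_ip, rules):
    """True if client_ip matches 'all' or begins with a listed partial address."""
    for rule in rules:
        if rule == "all":
            return True
        prefix = rule if rule.endswith(".") else rule + "."
        if (client_ip + ".").startswith(prefix):
            return True
    return False


def is_allowed(client_ip, order, allow=(), deny=()):
    """Decide access for client_ip under the given Order setting."""
    allowed = matches(client_ip, allow)
    denied = matches(client_ip, deny)
    if order == "allow,deny":
        # Default is to deny; Deny rules are evaluated last and win.
        return allowed and not denied
    if order == "deny,allow":
        # Default is to allow; Allow rules are evaluated last and win.
        return allowed or not denied
    raise ValueError("order must be 'allow,deny' or 'deny,allow'")


if __name__ == "__main__":
    # The settings used above: everyone may visit.
    print(is_allowed("123.123.123.123", "allow,deny", allow=["all"]))
    # Hypothetical restriction: only addresses beginning 10.1 may visit.
    print(is_allowed("10.1.2.3", "deny,allow", allow=["10.1"], deny=["all"]))
    print(is_allowed("10.11.2.3", "deny,allow", allow=["10.1"], deny=["all"]))
```

The second and third calls use a made-up address range to show how the same two directives can be turned into a restriction simply by changing the order.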
When a client requests a directory, Apache first checks whether that directory contains one of the files named in the DirectoryIndex directive, which it can serve
instead. So when requesting http://www.trampolining.net,
the actual file returned is http://www.trampolining.net/index.html.
Apache will check for each of the named files in order; in this example we have
chosen to serve index.html in preference to index.htm. If neither index.html nor index.htm is present, either a 403 'forbidden' response or a
directory index will be returned, depending on whether Indexes is set in the Options command as above. If you use
SSI or JSPs, you might want to add index.shtml or index.jsp to this list.
DirectoryIndex index.html index.htm
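The decision Apache makes for a directory request can be summarized in a few lines. The following Python sketch is a model of the behaviour described above rather than Apache's own code: the function name and return values are our own, and 403 is the HTTP status code that carries the 'Forbidden' meaning.

```python
def serve_directory(files, directory_index=("index.html", "index.htm"),
                    indexes=True):
    """Return (status, body) for a request that names a directory.

    files is the set of file names in the directory, directory_index is the
    ordered DirectoryIndex list, and indexes says whether Options includes
    Indexes.
    """
    for name in directory_index:
        if name in files:
            return 200, name             # first matching index file wins
    if indexes:
        return 200, sorted(files)        # generated directory listing
    return 403, "Forbidden"


if __name__ == "__main__":
    print(serve_directory({"index.htm", "index.html", "logo.gif"}))
    print(serve_directory({"a.gif", "b.gif"}))
    print(serve_directory({"a.gif", "b.gif"}, indexes=False))
```

Adding index.shtml or index.jsp to the list, as suggested above, would simply mean passing a longer directory_index tuple.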
This is an example of a Files container; it will apply to
any matching file, anywhere on this host or any of the virtual hosts. It
prevents clients viewing any .htaccess configuration file, which could provide useful
information to a hacker.
AccessFileName .htaccess
<Files .htaccess>
Order allow,deny
Deny from all
</Files>
The next directive of note is:
HostnameLookups Off
When set to On, Apache will attempt to resolve the IP address of every client before writing to
the logs, so that the logs contain 'machine.isp.net' instead of
'123.123.123.123'. This heavily increases server load, and the load upon your
Internet connection. In the Logs and Analysis
section we will configure Analog to do this much more efficiently.
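The reason resolving names afterwards is cheaper is that each distinct address need only be looked up once, and the lookups no longer delay live requests. The Python sketch below illustrates that idea only; it makes no claim about how Analog itself works. The function name is our own, and it assumes each log line begins with the client IP address followed by a space.

```python
import socket


def resolve_hosts(log_lines, lookup=None):
    """Replace the leading IP address on each log line with its hostname,
    looking each distinct address up only once."""
    if lookup is None:
        # Default to a real reverse DNS lookup.
        lookup = lambda ip: socket.gethostbyaddr(ip)[0]
    cache = {}
    resolved = []
    for line in log_lines:
        ip, sep, rest = line.partition(" ")
        if ip not in cache:
            try:
                cache[ip] = lookup(ip)
            except OSError:
                cache[ip] = ip  # keep the address if it cannot be resolved
        resolved.append(cache[ip] + sep + rest)
    return resolved


if __name__ == "__main__":
    sample = [
        '127.0.0.1 - - "GET / HTTP/1.0" 200 1024',
        '127.0.0.1 - - "GET /news/ HTTP/1.0" 200 512',
    ]
    for line in resolve_hosts(sample):
        print(line)
```

The lookup parameter lets you substitute a different resolver, for example one that reads from a saved table of names when the machine analysing the logs has no DNS access.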
This directive tells Apache where to log errors:
ErrorLog /usr/local/apache/logs/error_log
This log collects most error messages, including CGI errors and errors occurring on
virtual hosts (unless your virtual host has its own ErrorLog command). A useful trick when