
Chapter 5: Deploying Web and FTP Servers

May 22, 2000


The success of the Internet lies in its ability to provide information quickly and cheaply to anyone at any time, and to facilitate fast communication on a global scale. As a result, computers that can connect to the Internet and draw from an almost unlimited supply of information have been installed in homes and workplaces. Indeed, every business that wants to succeed well into the twenty-first century needs to make use of and contribute to this technology. At the center of this communication network are the web servers that supply information back to the client. However, web servers should not just supply static information (that is, unchangeable text and graphics on a standard HTML page) but should also be able to respond to the needs of the user. This requires a web server to resolve the request of the user and respond accordingly. Technologies now exist that enable the web server itself to fetch and process data before sending it to the client.

 

Telnet was originally used to run programs on remote computers, e-mail was used to communicate and FTP or Gopher was used to transfer large files. Telnet and e-mail are still used, and FTP has become the preferred way to make files available across the Internet, but the real revolution in Internet use over the past few years has been the introduction of a new method of searching and viewing information, the World Wide Web (WWW).

 

The WWW was originally developed at the European Laboratory for Particle Physics (CERN). CERN developed a program to serve information in HTML format across HTTP -- a web server -- and another program to receive and view it -- a web browser. These programs were subsequently developed by the National Center for Supercomputing Applications at the University of Illinois and renamed NCSA and Mosaic respectively. Mosaic went on to become the commercial Netscape Navigator product, which from version 5 onwards has been decommercialized and is now open source again. The NCSA server, on the other hand, has been open source all along. The server has been improved by successive patches, and has been renamed Apache. In over ten years of dedicated development, Apache has been extended to implement numerous technologies and is considered to be the most stable web server available when run on its native Unix or Linux operating systems.

 

Apache's scalability, zero cost and customizability make it the most popular web server available -- running over 4.3 million web sites, over half of the WWW, as of October 1999 (figures from Netcraft).

 

The development of FTP servers has not followed the same pattern as that of web servers. However, the ability to resume broken downloads, by starting the download process part way through a file, is a major recent advance. There are numerous FTP servers available for Linux, but the most fully featured and well-tested server is the one developed at Washington University. WU-FTP provides a complete FTP server solution, as well as having a large user base which ensures continued development, and updates, where necessary.

 

As mentioned at the beginning of this chapter, web servers now need to be able to respond to the user through server-side processing. The CGI (Common Gateway Interface) allows you to execute scripts written in languages such as Perl on your server, providing particularly powerful text handling capabilities. More recently, Java servlets have provided the ability to execute more complex routines on the server. Either technology allows you to provide different content and perform different actions depending on the user's actions. JavaServer Pages (JSP) are the latest addition to the Java family, and do the same task as Microsoft's Active Server Pages (ASP) -- processing user requests using server-side code and returning the information to the user as plain HTML.

 

Apache has excellent implementations of each of these technologies with the release of the mod_cgi/mod_perl modules (for fast CGI script execution) and ApacheJServ (a servlet engine), with JSP support through Jakarta becoming available at the time of writing. There is another advantage to these technologies: if you decide to change operating system or web server, you will be able to transfer these scripts with very little editing -- the same cannot be said of ASP!

 

This chapter will not only demonstrate how to install both the Apache web server and the WU-FTP server, but will also show you how to configure the installations to meet your requirements, including providing basic security for your servers. It will show you how to perform essential administration tasks, such as analysing log data, suggest options for replacing proprietary technologies such as ASP, and cover technologies such as ApacheJServ, CGI and SSI. Finally, you will be given tips on how to get the server working as quickly as possible following a crash.

Deploying the Apache Web Server

In this section you will learn how to set up Apache on a Linux machine. You will be shown the system requirements for setting up the server and the modifications you will need to make to the Linux operating system configuration files to prepare it for installation. In addition to stepping through the installation process itself, you will find out how to configure Apache to meet your own requirements. Then you will learn how to add a virtual host to the server, review some technologies that add interactivity to your web site, and see how to produce useful reports on web site usage by examining the contents of access log files.

System Requirements

The following list gives the requirements that need to be met when setting up an effective production web server. For a test web server, it is possible to install to any machine with as little as 30 MB of free disk space, any processor speed, and a static IP address if you want to test it online.

Permanent Internet connection -- as an experienced system administrator, you probably already know the exact requirements for your site. If not, a good rule of thumb is to allow a minimum of 10 kilobits per second per simultaneous user. So if you expect a maximum of twelve users on your site at any one time (about 120 kilobits per second), a 128K leased line would be the minimum requirement. Obviously the content of your pages will make a big difference, and experience is the best teacher as far as choosing a connection is concerned. If your site is purely for intranet use, this will not concern you.

URL and IP address -- You will need to purchase at least one domain name and IP address. Entering your primary IP address and host name when originally installing Linux is the quickest way to gain a bulletproof basic network configuration -- if you didn't set them during the installation, they can be edited later. You will need a further IP address and hostname for each additional virtual host you wish to run -- although these hostnames can be subdomains of a single registered domain name (e.g. apache.wrox.co.uk is a subdomain of wrox.co.uk). An FTP server and a web server may share a host name, but they will use different ports.

DNS server -- You will need access to a DNS server to allow Internet users to resolve your domain names. For many companies, this access will be provided through a corporate account with a major network provider -- although Linux is itself capable of running fully featured name servers if you will make sufficient use of it to justify the maintenance effort. Adding entries will be as simple as contacting that provider.

Linux machine -- The ideal requirements here are plenty of memory and a fast, ultra-wide SCSI hard disk, since these are what will see the most work -- IDE hard disks may prove to be a bottleneck if server demand increases. The entire software (including the operating system) is unlikely to take more than 1GB, so a typical 10GB hard disk available today will be more than enough and will allow plenty of space for growth. 64MB of RAM should prove enough for up to 10 000 hits per day; if you find memory swapping is taking place, it is a simple matter to add more RAM (and you should do this because swapping causes serious performance degradation). If using server-side Java technologies, add at least another 32MB for the Java Virtual Machine (JVM). Where significant server-side processing is used, the speed of the processor is also important -- the faster the better. If connected to the Internet, the Linux machine should not be used for any non-Internet purposes. If you are using the server as part of an intranet, then the installation of a firewall will be necessary; this way the effect of any security breach will be minimized.
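
The connection rule of thumb in the list above is simple arithmetic. As a rough sketch, reading the allowance as about 10 kilobits per second per simultaneous user (the reading under which twelve users fit a 128K line), the helper below multiplies it out. The function name and the default rate are assumptions made for illustration, not figures to rely on for a real site.

```shell
#!/bin/sh
# min_bandwidth_kbps USERS [PER_USER_KBPS]
# Prints the minimum line speed, in kilobits per second, for a given
# number of simultaneous users. The 10 kbit/s default is the rule of
# thumb from the text; adjust it to suit your own content.
min_bandwidth_kbps() {
    users=$1
    per_user=${2:-10}
    echo $((users * per_user))
}

# Example: twelve simultaneous users
min_bandwidth_kbps 12
```

For twelve users this prints 120, so a 128K leased line is the smallest common size that covers it.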

Preparing Linux for Installation

The installation should always be performed in a clean environment -- this involves formatting the hard disk and installing Linux from scratch. Make sure your version of Linux is up-to-date -- stack and TCP/IP bugs are occasionally found in the Linux kernel, and an up-to-date distribution will keep you one step ahead of the hackers. The latest security bulletins are available from CERT and it is useful to subscribe to their security mailing lists. This chapter assumes you are using Red Hat Linux 6.1, but the steps are very similar for all Linux distributions.

 

It is crucial that only essential services are running; Linux is capable of running everything from ping to fully featured name servers, and by default, it will. When not fully configured, these represent a security loophole -- so it is best to unselect all the obvious communication services during the Linux installation (anything with ftp, mail or web in the title). Make sure that make (an install utility) and gcc (the C compiler) are selected -- we will use them later.

 

Once the install is completed, edit the /etc/inetd.conf file, or equivalent, which contains a listing of all the network services started on the machine. Insert a # symbol at the start of the line for each service that is not needed; particular services to disable include finger, cfinger and portmap, which provide useful information for hackers, and, if you don't require it, telnet. Telnet allows easy remote management, but if you don't intend to use it, removing it closes one more possible security loophole. Finally, after a reboot, we can type netstat -a | more for a listing of remaining services:

 

tcp        0      0 *:printer               *:*                     LISTEN
tcp        0      0 *:linuxconf             *:*                     LISTEN
tcp        0      0 *:auth                  *:*                     LISTEN
tcp        0      0 *:login                 *:*                     LISTEN

 

The first 20 or so lines contain the useful listing of services -- the lines to look for have the word LISTEN at the end, signifying that they are ready to accept connections. As long as you don't see portmap, finger, sendmail or ftpd (unless deliberately installed and configured), we have a relatively safe environment to continue with.
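
The two manual steps above (commenting out services in inetd.conf, then checking what is still listening) could be scripted roughly as follows. This is only a sketch: both function names are made up for illustration, and the second one assumes that netstat shows service names taken from /etc/services, where portmap, sendmail and ftpd normally appear as sunrpc, smtp and ftp.

```shell
#!/bin/sh
# disable_services FILE SERVICE...
# Prints FILE (an inetd.conf-style file) to standard output with a '#'
# inserted in front of every line whose first field is one of the
# named services. The file itself is not modified.
disable_services() {
    file=$1
    shift
    awk -v list="$*" '
        BEGIN {
            n = split(list, names, " ")
            for (i = 1; i <= n; i++) off[names[i]] = 1
        }
        ($1 in off) { print "#" $0; next }
        { print }
    ' "$file"
}

# risky_listeners
# Reads "netstat -a" output on standard input and prints the local
# address of every LISTEN line that belongs to a service the text
# warns about. The service-name list is an assumption (see above).
risky_listeners() {
    awk '/LISTEN/ && $4 ~ /:(sunrpc|finger|cfinger|smtp|ftp|telnet)$/ { print $4 }'
}
```

A cautious way to use it is to write the result to a new file, for example disable_services /etc/inetd.conf finger cfinger portmap > /tmp/inetd.conf.new, review it, and only then copy it into place. After the reboot, netstat -a | risky_listeners should print nothing.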

Installing Apache Web Server

Traditionally, Apache required recompiling every time you wanted to add a new feature, or 'module', as all modules were compiled into the Apache executable. More recently, the Apache group have incorporated support for DSO linking, which, like DLLs in Windows, allows new modules to be added later without recompiling the whole program. We will install Apache using DSO linking, which will make installation of ApacheJServ and other modules easier. Static linking is very slightly faster in operation than DSO linking, and some older distributions of Linux will not allow DSO. If you have to do a static installation, just leave out the --enable-module=most and --enable-shared=max parameters in step 4.

 

1.       If you are using Red Hat 6.0, you first need to correct an error in the distribution, by making a C header file available from its correct location. Type:

# ln -s /usr/include/db1/ndbm.h /usr/include/ndbm.h

2.    Download the latest version of Apache (www.apache.org) to /usr/local/src, unzip and extract it:

# tar -xzvf apache_x_x_x.tar.gz

3.    Enter the created directory:

# cd /usr/local/src/apache_x_x_x

 

The next three steps configure, build and install Apache. The first parameter supplied to the configure script specifies the path for the Apache installation.

 

4.    # ./configure --prefix=/usr/local/apache --enable-module=most --enable-shared=max

5.    # make

6.    # make install

 

That's it -- Apache is installed! We will cover the configuration fully later in the chapter, so for now we will just configure enough to test it:

 

1.    Open /usr/local/apache/conf/httpd.conf for editing. This contains every configuration command for the entire server, so it may look quite intimidating. However, don't be put off as it is simpler than it looks!

2.    Search for the ServerName directive (about a third of the way through). Replace it with ServerName localhost -- we will use a real network identity later, but using localhost for now gives us a simple check of our Apache installation. Save and close httpd.conf.

3.    Start the apache server with the command /usr/local/apache/bin/apachectl start.

4.    If Netscape, or some other web browser, is installed, type http://localhost into the location bar. If not, type lynx localhost at the command line. Either way, you should be presented with a congratulations page.

 

Getting to Know the Web Server

Your working web server installation should be self-contained within /usr/local/apache. If you installed Apache differently (e.g. using a Red Hat RPM), and find your configuration changes aren't having any effect, try searching for stray files: type find / -name httpd.conf -print. Some proprietary distributions place the active httpd.conf in other folders.

 

Inside /usr/local/apache will be a set of further directories, the ones we will use are:

 

      bin -- contains all the program executables

      cgi-bin -- the default location for CGI files

      conf -- contains all the Apache configuration files

      htdocs -- the default root directory for your web site. A sample index.html file is already in this directory to produce the 'It Worked' page you saw in the last section

      logs -- contains all the server logs by default. We will deal with these in more depth later.

 

We will change the locations of the default web site root directory and CGI location later when we cover Apache configuration, and develop a consistent placement of all web site content within the /home directory where it is well separated from the application. Separating content from application is a useful technique if your fellow system administrators make the occasional mistake when updating web site content.

 

To start Apache, type: /usr/local/apache/bin/apachectl start.

 

To stop Apache, type: /usr/local/apache/bin/apachectl stop.

 

The above two commands can be performed together. To restart Apache, type: /usr/local/apache/bin/apachectl restart. The alternative command /usr/local/apache/bin/apachectl graceful does the same but finishes serving any current requests first.
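
Because a restart with a broken configuration can leave the server down, it can help to check the configuration first. The sketch below wraps the apachectl configtest and graceful commands; the safe_restart name and the APACHECTL variable are assumptions introduced here, with the path defaulting to the install prefix used in this chapter.

```shell
#!/bin/sh
# Path to apachectl; the default matches the install prefix used in
# this chapter. Override APACHECTL if Apache lives somewhere else.
APACHECTL=${APACHECTL:-/usr/local/apache/bin/apachectl}

# safe_restart
# Runs a syntax check on the configuration first and only asks Apache
# for a graceful restart if the check succeeds.
safe_restart() {
    if "$APACHECTL" configtest; then
        "$APACHECTL" graceful
    else
        echo "configuration test failed; no restart was attempted" >&2
        return 1
    fi
}
```

Calling safe_restart after each edit to httpd.conf keeps the running server untouched whenever the syntax check reports a problem.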

Configuring Your Web Server

The greatest asset of Apache is its flexibility in configuration -- you will never be limited to the settings the original developers thought you would want! Everything can be configured to a per-directory level, or even a per-file level if necessary. All Apache configuration is performed using one file: httpd.conf. There are many configuration commands, but all follow a similar format.

 

Apache will work fine with its default settings, as long as the ServerName directive is set, so you can begin with the original settings and gradually move toward your requirements changing a few settings at a time, restarting Apache each time to see the results. Apache only reads the httpd.conf file when starting. In fact, the chances are most of the default settings will never need changing, and that you will use a few useful commands repeatedly. We will cover all these essential commands.

 

An important general rule is to avoid using hostnames (e.g. www.trampolining.net) unless the command requires them. Hostnames in the configuration file will work most of the time, so long as you have told Linux where to find a DNS server. However, if you have filtered any content using hostnames, Apache will need to perform a DNS check against every client IP address, increasing server load. On the other hand, if you filter against IP addresses, Apache already has the information in order to function without forcing an unwanted DNS check. Also, if your DNS server is down when Apache is started, any configuration directives containing hostnames won't be parsed, and parts of the server will not be started. This can cause intermittent problems when the DNS server is down or contains bad data.
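
One way to apply this rule to an existing configuration is to look for access filters that name hosts rather than addresses. The following is a minimal sketch, assuming the filters are written as Allow from or Deny from lines; the hostname_filters name is hypothetical, and the check is deliberately crude (any argument containing a letter, other than all or an env= test, is treated as a hostname).

```shell
#!/bin/sh
# hostname_filters CONF_FILE
# Prints every "Allow from" or "Deny from" line in CONF_FILE that has
# an argument which looks like a hostname rather than an IP address.
hostname_filters() {
    awk '
        tolower($1) ~ /^(allow|deny)$/ && tolower($2) == "from" {
            for (i = 3; i <= NF; i++) {
                if ($i ~ /[A-Za-z]/ && tolower($i) != "all" && $i !~ /^env=/) {
                    print
                    break
                }
            }
        }
    ' "$1"
}
```

Running hostname_filters /usr/local/apache/conf/httpd.conf lists the lines worth rewriting with IP addresses; no output means no hostname-based filters were found.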

 

The effect of a command depends on where it is placed. The first section of httpd.conf contains global environment directives, which affect the entire server and all virtual hosts running on it. The next section configures the main, or primary, server. Settings here also provide the default settings for all virtual hosts. The third and final section configures the virtual hosts themselves.

 

Within section two or three you might want to apply settings to a single directory only. This is achieved using a directory container; the settings are placed within a pair of HTML-like tags, which define which directory to apply the settings to. (We will meet other containers when configuring virtual hosts.) Note there is no trailing slash on the directory.

 

<Directory /any/directory>

  Settings here

</Directory>

Section 1: Global Environment

The first directive that you will come across is this one:

 

ServerType standalone

 

ServerType may be either standalone or inetd. For all but the lowest-use servers, use standalone as Apache will be permanently ready and waiting for any requests itself. The inetd option caters for users who wish to start Apache when requests are received on a specified port; this introduces start-up delays which will only be acceptable if the server rarely functions as a web server.

 

The following directives tell Apache to maintain a pool of between 5 and 10 spare server processes, ready for new requests:

 

MinSpareServers 5

MaxSpareServers 10

 

Adaptive spawning, implemented in Apache versions 1.3 and upwards, means there should be no reason to change these except on very high load servers. However, be wary of over-trusting benchmarking utilities, as these generate a step change in request volume over a few seconds, which does not occur in reality. The directive below prevents more than 150 clients connecting simultaneously, to prevent the server locking during periods of high usage:

 

MaxClients 150

 

If you find that clients are being refused a connection during periods of high usage, try increasing this number. If you find the server is locking up or becoming very slow during periods of high usage, you may consider lowering this number as a temporary measure to keep the server running until you can provide higher capacity. More information on optimizing for high loads can be found in Professional Apache by Peter Wainwright, published by Wrox Press (ISBN 1861003021).

Section 2: Primary Server Configuration

To reduce the chance of malicious damage to your system, we give Apache processes the minimum possible security privileges on your machine:

 

User nobody

Group nobody

 

Any CGI scripts run by Apache will inherit these settings. CGI scripts will be discussed later in the chapter.

 

The following e-mail address will be appended to any error messages Apache sends to the client. Setting it to your address ensures visitors have a way to inform you of problems with your site. You may of course feel this is not a good thing!

 

ServerAdmin richard@trampolining.net

 

Obviously you would put your address in here, not mine.

 

The ServerName directive tells Apache the hostname of the primary host. It is essential, and proves a common cause of headaches if it is set wrongly:

 

ServerName www.trampolining.net

 

Suppose a client requests http://www.trampolining.net/news, where news is a directory. This is not a valid HTTP request as the trailing slash is missing. Apache will ask the browser to visit the correct URL, http://www.trampolining.net/news/ using ServerName to reconstruct the URL. If ServerName is not correctly set, the redirect URL returned will be invalid (e.g. http://not-set/news/) and a 404 'File not found' error, or a DNS resolution failure will be returned. If you do not yet have a valid hostname, you can use the machine IP address here instead.

 

This directive specifies where to look for the contents of the web site; this directory will appear as the root directory of your web site, and is where all the HTML files will be placed.

 

DocumentRoot "/home/www/trampolining.net/"

 

The following container contains directives that apply to the entire system:

 

<Directory />

  Options FollowSymLinks

  AllowOverride None

</Directory>

 

These permissions prevent anyone browsing around private system files. Because of this default denial of access, we will need to explicitly allow access to any directories we intend to use, with another directory container. Subdirectories inherit the same Apache permissions as their parent unless refined by later directory containers. The following tag specifies the beginning of the main directory container.

 

<Directory /home/www/trampolining.net>

 

This is used to allow access to the main server root directory. Later directory containers may refine the permissions we have set here for the whole of /home/www/trampolining.net.

 

The Options directive tells Apache what it is allowed to serve to the user.

 

Options Indexes FollowSymLinks Includes ExecCGI

 

Indexes allows Apache to produce a listing of most of the files in a directory if there is no index.html or other file specified in DirectoryIndex. If Indexes is not included here, a 'forbidden' HTTP response will be returned instead. FollowSymLinks will allow users to follow any symbolic links you create to other directories.

 

Includes and ExecCGI are the remaining options used here, and will be covered in more detail in the section called Technologies for Effective Sites later in this chapter. Briefly, Includes tells Apache to allow Server Side Includes (SSI) in this directory to be parsed (IncludesNoExec is the same except it will not honor SSI exec commands). In the same way, ExecCGI allows CGI files in this directory to be executed, which is not the most secure way to provide CGI support, but is the most convenient.

 

Now take a look at this directive:

 

AllowOverride None

 

If AllowOverride is set to All, then wherever a file called .htaccess exists in a directory, the httpd.conf settings for that directory will be overridden by the settings in that file. This allows you to keep httpd.conf to a reasonable length by defining per-directory settings in individual .htaccess files. However, this is a poor method of configuration as making changes can become a chore -- any number of .htaccess files may require editing, even for a simple update. Setting AllowOverride to All will also mean that every time a request is received for a directory, whether or not an .htaccess file is present, Apache will search for one, thereby increasing server load.

 

A better alternative, if you are concerned about the length of httpd.conf, is to group extra settings in another file (e.g. /usr/local/apache/conf/football-websites.conf), and then force Apache to read this by placing 'Include /usr/local/apache/conf/football-websites.conf' at the end of httpd.conf. We will use this principle later in this chapter when we configure ApacheJServ.
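
If several extra files are added over time, the Include line can be appended by a small script rather than by hand. This is a sketch only; the add_include name is an assumption, and it does nothing more than append the line when it is not already present.

```shell
#!/bin/sh
# add_include MAIN_CONF EXTRA_CONF
# Appends "Include EXTRA_CONF" to MAIN_CONF unless that exact line is
# already present, so running it twice does not duplicate the line.
add_include() {
    main_conf=$1
    extra_conf=$2
    if ! grep -qxF "Include $extra_conf" "$main_conf"; then
        echo "Include $extra_conf" >> "$main_conf"
    fi
}
```

For the example in the text, this would be called as add_include /usr/local/apache/conf/httpd.conf /usr/local/apache/conf/football-websites.conf, followed by a restart so that Apache rereads its configuration.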

 

This section defines who is allowed to view your web site. Setting 'Allow from all' will allow anyone to visit your web site -- there is more about restricting access in the Logs and Analysis section.

 

Order allow,deny

Allow from all

 

This is the end of the directory container -- we have finished setting the default permissions!

 

</Directory>

 

When a client requests a directory, Apache first checks whether the directory contains one of the files named in the DirectoryIndex directive, which it can serve instead. So when requesting http://www.trampolining.net, the actual file returned is http://www.trampolining.net/index.html. Apache will check for each of the named files in order; in this example we have chosen to serve index.html in preference to index.htm. If neither index.html nor index.htm is present, either a 403 'forbidden' response or a directory index will be returned, depending on whether Indexes is set in the Options command as above. If you use SSI or JSPs, you might want to add index.shtml or index.jsp to this list.

 

DirectoryIndex index.html index.htm
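
The lookup order described above can be mimicked from the shell, which is handy for checking which file a directory would serve. The sketch below only imitates the order of the checks; it is not how Apache itself is implemented, and the pick_index name is made up for illustration.

```shell
#!/bin/sh
# pick_index DIR NAME...
# Prints the first NAME that exists as a regular file in DIR, checking
# the names in the order given, as DirectoryIndex does. Returns 1 and
# prints nothing if none of them exists.
pick_index() {
    dir=$1
    shift
    for name in "$@"; do
        if [ -f "$dir/$name" ]; then
            echo "$name"
            return 0
        fi
    done
    return 1
}
```

For example, pick_index /home/www/trampolining.net index.html index.htm prints index.html when both files exist, and index.htm only when index.html is missing.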

 

This is an example of a Files container; it will apply to any matching file, anywhere on this host or any of the virtual hosts. It prevents clients viewing any .htaccess configuration file, which could provide useful information to a hacker.

 

AccessFileName .htaccess

<Files .htaccess>

   Order allow,deny

   Deny from all

</Files>

 

The next directive of note is:

 

HostnameLookups Off

 

When set to On, Apache will attempt to resolve the IP address of every client to a hostname before writing to the logs, so that the logs contain 'machine.isp.net' instead of '123.123.123.123'. This heavily increases server load, and the load upon your Internet connection. In the Logs and Analysis section we will configure Analog to do this much more efficiently.

 

This directive tells Apache where to log errors:

 

ErrorLog /usr/local/apache/logs/error_log

 

This log collects most error messages, including CGI errors and errors occurring on virtual hosts (unless your virtual host has its own ErrorLog command). A useful trick when