Chapter 5: Deploying Web and FTP Servers

May 22, 2000


Technologies for Effective Sites

Undoubtedly the biggest cost in deploying any web site is its design and maintenance. If you have used ASP in the past, you will be aware of the focus on reducing maintenance costs. If you need to edit every page on the server by hand each time you want to make a content or design change, maintenance becomes prohibitively expensive and error-prone. Server-Side Includes (SSI) allow text that is common to every page to be specified in one file. The other pages can then include it, using an SSI command, before the document is sent to the client. Furthermore, if the site is to present more than purely static content, it needs some way of adapting the content it serves to the actions of the client. It needs the intelligence provided by programming languages, such as the ability to perform calculations, handle information and provide feedback to the user.

 

CGI (Common Gateway Interface) allows scripts to run on the server. The main language used is Perl, a scripting language that provides excellent text processing power as well as standard programming tools. These scripts can handle user input, process it, store it and return customized pages. Apache implements standard script support, and can also provide much faster script support using mod_perl.

 

Java servlets and JavaServer Pages offer alternatives to CGI for fast server-side processing. Java servlets are complete programs which run on your server, providing a portable development environment for your applications. Apache implements full servlet support with the help of ApacheJServ, which we will install later on in this section.

 

Java Server Pages are from the same family as servlets, but instead of being separate programs, they are HTML files with Java code inserted in-line, and executed before the document leaves the server, combining incredible programming power with the simplicity of in-line code. Support for JSP is provided by project Jakarta, which we will briefly discuss later on.

Server-Side Includes

The real trampolining.net web site contains over 100 separate HTML pages, and that may be small compared to the web sites you plan to deploy. Nearly every page follows exactly the same format in terms of design and layout, including a copyright statement at the end of every page. At the end of the year, all 100 pages will need the copyright statement updated to read, for example, © 1999, 2000.

 

To attempt to update all these manually would be tedious and error-prone. Instead, Server-Side Includes (SSI) are used to include one HTML file within the others. Any commonly repeated text could be inserted using SSI:

 

<!--#include virtual="stylesheet.shtml" -->   Includes a standard stylesheet

<!--#include virtual="navbar.shtml" -->       Starts the page table and includes a standard navigation bar

 

Main content here   

 

<!--#include virtual="copyright.shtml" -->    Closes table, adds copyright statement

 

The include commands insert the contents of the named files at that point. The named file is relative to the directory of the main file; subdirectories can be accessed (e.g. <!--#include virtual="subdir1/included.html" -->), as can files in parent directories (e.g. <!--#include virtual="../fromparent.html" -->), and the included files can themselves contain SSI commands if they end in .shtml. These inclusions are all performed before the document leaves the server, so the client will only ever see a normal HTML page. At the end of the year, I will only need to change copyright.shtml for all the pages on my site to be updated, a huge saving in maintenance time.
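The substitution the server performs can be pictured with a short sketch. This is not Apache's implementation, only a minimal model of it: the dictionary of files, the function name expand_includes and the nesting limit are all assumptions made for illustration.

```python
import re

# Matches directives of the form <!--#include virtual="name" -->
INCLUDE_RE = re.compile(r'<!--#include\s+virtual="([^"]+)"\s*-->')


def expand_includes(text, files, depth=0):
    """Replace each include directive in text with the named file's content.

    files is a hypothetical dict standing in for the document tree.
    """
    if depth > 10:  # guard against files that include each other
        raise RecursionError("includes nested too deeply")

    def substitute(match):
        name = match.group(1)
        body = files[name]
        # Only included files ending in .shtml are parsed for further directives.
        if name.endswith(".shtml"):
            body = expand_includes(body, files, depth + 1)
        return body

    return INCLUDE_RE.sub(substitute, text)


files = {"copyright.shtml": "<p>&copy; 1999, 2000</p>"}
page = 'Main content here\n<!--#include virtual="copyright.shtml" -->'
print(expand_includes(page, files))
```

Changing the one entry for copyright.shtml changes the output of every page that includes it, which is the whole point of the technique.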

 

SSI includes other useful commands: CGI scripts can be called using the <!--#exec cgi="/cgi-bin/script.pl" --> command, with the output written directly into the page sent to the client. This prevents the client knowing the script even exists, so it is a useful security aid.

 

You can insert text that updates automatically using the <!--#echo var="LAST_MODIFIED" --> command, which inserts one of an extended set of standard variables each time the page is requested, for example to show the date the page was last modified. Lists of the available commands can be found online in the Apache mod_include documentation.

 

Apache provides excellent support for SSI with just a few directives. Because SSI increases server load, it is traditional to give any file containing SSI the suffix .shtml. Setting up Apache to parse only *.shtml means it won't waste time attempting to parse normal HTML (*.html) files. This part of the configuration takes place in the primary server section of httpd.conf, outside of any <Directory> containers. In fact, the directives are already there, about three quarters of the way through the file, and just need uncommenting. These two lines tell Apache what content type .shtml files should be given, and tell the internal SSI handler to parse .shtml files before serving them to the client.

 

AddType text/html .shtml

AddHandler server-parsed .shtml

 

Adding index.shtml to the DirectoryIndex directive allows .shtml files to be served as directory indexes by preference; in other words, if index.shtml exists in the root directory of my server, it will be served to someone requesting http://www.trampolining.net.

 

DirectoryIndex index.shtml index.html index.htm
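The order of names in DirectoryIndex is what gives index.shtml its preference. A minimal sketch of that first-match rule follows; the function name choose_index and the set of file names are hypothetical, and a real server checks the file system instead of a set.

```python
def choose_index(directory_index, files_present):
    """Return the first name in the DirectoryIndex list that exists, else None."""
    for name in directory_index.split():
        if name in files_present:
            return name
    return None


order = "index.shtml index.html index.htm"
# Both files exist, so the one listed first is chosen.
print(choose_index(order, {"index.html", "index.shtml"}))
```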

 

It is then necessary to turn on SSI support in every directory container in which you wish to use it. If SSI is not working on a virtual host, check that the Options +Includes line is present in that virtual host's directory container:

 

<Directory /home/www/trampolining.net>

Options Indexes FollowSymLinks Includes ExecCGI MultiViews

Options +Includes

AllowOverride None

Order allow,deny

Allow from all

</Directory>

 

For more information on Apache SSI look up http://www.apache.org/docs/mod/mod_include.html

Common Gateway Interface

Better known as CGI, this technology is the simplest way to deploy interactive content on your web site. Scripts are freely available to perform everything from form handling to maintaining complete discussion forums. Scripts are usually written in Perl and interpreted as they are used. However, as with any program running on your server, they represent a potential security risk. It is possible to configure Apache to interpret scripts from anywhere on the system, but this means anyone with access to directories containing web pages can create potentially harmful scripts.

 

To minimize this risk, CGI scripts are run from a special directory, usually called cgi-bin, and have file permissions set that allow remote users to execute them but allow write access only to root. The first line of each Perl script must also be changed to give the location of the Perl interpreter on your system; type which perl to find it.
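The shape of a CGI script is the same whatever the language: an interpreter line, a header block, a blank line, then the page. The sketch below is written in Python rather than the Perl the chapter assumes, purely for illustration; the parameter name "name" and the greeting are invented for the example.

```python
#!/usr/bin/env python3
# The first line names the interpreter, just as a Perl script's first line does.
import html
import os
from urllib.parse import parse_qs


def build_response(query_string):
    """Build a complete CGI response: headers, a blank line, then the body."""
    params = parse_qs(query_string)
    name = params.get("name", ["visitor"])[0]
    # Escape user input before echoing it back into the page.
    body = "<html><body><p>Hello, %s!</p></body></html>" % html.escape(name)
    return "Content-Type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    # The web server passes the part of the URL after "?" in QUERY_STRING.
    print(build_response(os.environ.get("QUERY_STRING", "")), end="")
```

Escaping the input matters here for the same reason the directory permissions do: the script runs on your server with whatever a remote user chooses to send it.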

 

The httpd.conf file already contains the necessary directives in the primary server section, so we just need to uncomment them and change any locations if necessary. Note the trailing slashes:

 

ScriptAlias /cgi-bin/ "/home/www/cgi/"

 

The above directive tells Apache to treat any request to /cgi-bin/ as a request for a script, and to look for that script in the server directory /home/www/cgi/. This is inherited by any virtual hosts, unless we define a different ScriptAlias in the corresponding VirtualHost container, so in this example, http://www.trampolining.net/cgi-bin/script.pl and http://www.sport-science.net/cgi-bin/script.pl will each point to /home/www/cgi/script.pl.
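The mapping amounts to swapping one prefix for another, which is why the trailing slashes matter. The sketch below models only that prefix swap, under the assumption that nothing else rewrites the URL; the function name map_script_alias is hypothetical.

```python
def map_script_alias(url_path, alias="/cgi-bin/", target="/home/www/cgi/"):
    """Return the file system path for a URL path under alias, or None."""
    if not url_path.startswith(alias):
        return None
    # Replace the alias prefix with the target directory, character for character.
    return target + url_path[len(alias):]


print(map_script_alias("/cgi-bin/script.pl"))
# Dropping the trailing slash from the target runs the two names together.
print(map_script_alias("/cgi-bin/script.pl", target="/home/www/cgi"))
```

The second call yields /home/www/cgiscript.pl, a path that does not exist, which is the kind of failure a missing slash produces.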

 

Now look at this Directory container:

 

<Directory /home/www/cgi/>

    AllowOverride None

    Options None

    Order allow,deny

    Allow from all

</Directory>

 

This sets the permissions for your CGI directory to the absolute minimum necessary to run scripts. No-one will actually be able to read the scripts as any request will instead run them. These minimum permissions will also make life more difficult for hackers trying to access your scripts.

mod_perl

The mod_perl module allows Perl scripts to be run very fast by a dedicated Perl interpreter within Apache, which does not need starting separately for each request. Perl scripts are reported to run between two and twenty times faster under mod_perl than under mod_cgi, depending on the script itself. However, the increased speed of script processing comes at a price.

 

The mod_perl module is complicated to install and configure, and the actual steps needed depend on the versions of mod_perl and Apache being used; it also has three user modes and thirty configuration options during build. Therefore, detailed installation instructions are beyond the scope of this book, though you can get more help from the INSTALL text file that comes with the mod_perl download or from the Apache web site (www.apache.org). Furthermore, the installation of mod_perl will break your existing Apache configuration. It has to be installed first and Apache reinstalled on top, which means that you will have to customize Apache again from scratch. You have to decide, right from the outset, whether to include mod_perl in your server system, as it is currently very difficult to incorporate it later on.

 

The discussion on mod_perl has been left until now because its benefits only become apparent under conditions of very heavy server usage. For moderate or low usage, CGI is only marginally slower and there is little advantage in having the increased script processing power that mod_perl offers. The mod_perl module is an advanced application that should be considered only if very high server usage is anticipated.

An Example Installation

 Below is a very standard installation procedure for mod_perl. Download the module from www.apache.org and uncompress it to /usr/local. Then carry out the following steps.

The installation steps reproduced below are highly simplified and can only be said to work on most systems. You should look up the Apache documentation for more detailed instructions.

# cd /usr/local/mod_perl

# perl Makefile.PL APACHE_SRC=../apache_version/src \

> DO_HTTPD=1 USE_DSO=1 USE_APACI=1 EVERYTHING=1

# make && make test && make install

# cd ../apache_x.x.x

# make install

 

After the installation is complete, and mod_perl and Apache are working as they should, then Apache will need to be configured for mod_perl. This consists of adding a few directives to httpd.conf. The first tells Apache to look for /home/www/fast-perl/anyscript.cgi given a request for www.trampolining.net/fast-perl/anyscript.cgi.

 

Alias /fast-perl/ /home/www/fast-perl/

 

The next lines tell Apache to allow scripts to be executed in this directory, and to execute them by passing them to mod_perl:

 

<Location /fast-perl>

   AllowOverride None

   SetHandler perl-script

   PerlHandler Apache::PerlRun

   Options ExecCGI

   allow from all

   PerlSendHeader On

</Location>

 

mod_perl is a powerful and configurable module. Much more information on configuration is available from the Apache on-line documentation.

Java Servlets

Java is a programming language developed by Sun Microsystems. It is unique in that once compiled, Java programs will run on any machine, architecture or operating system with the help of a Java Virtual Machine (JVM). The compiled program (a servlet, in the case of server-side code) is not designed to run on any specific machine but instead on a JVM, a piece of software which provides a standard set of commands like that of a chipset. JVMs can be and have been developed for nearly all the important operating systems, so well-written code should work on any platform without recompilation. This cross-platform portability is an important feature of the Java development environment: it ensures that your development resources will not be made obsolete by new hardware, and that your investment will survive a change of platform.

 

Java servlets are called by the browser, but are run on the server with the results being sent to the browser. This eliminates any need to worry about the browser type as no code is sent. It is possible to implement infinitely complex algorithms using Java servlets, but if the servlet is designed to return output as pure HTML, the results will be viewable by even the simplest text based browsers.

 

Servlet support in Apache is performed using ApacheJServ, a fully featured Java servlet runtime container supporting all commands up to JSDK 2.0. While the ApacheJServ modules are not particularly big, the Java Development Kit which is required to compile servlets and provide the JVM, is a huge 45 MB in size (the zipped archive is just over 19MB in size), and using servlets will also cause a step increase in memory requirement of around 32MB due to the JVM. However, the benefits of servlet technology far outweigh the cost of set up, so read on!

 

Servlets are more difficult to configure than CGI, and Java may take some getting used to, as it is a very powerful language with many similarities to C++. However, Java offers the increased security of its in-built security model, which makes it much more difficult for hackers to cause damage by passing harmful system commands to the servlet. Complex tasks like chat-rooms or server-side parts of games are also ideally suited to Java, because you can create servlets which will stay alive right from their initial instantiation. While Perl is ideally suited to text processing applications, Java can be used to develop code of great complexity, with extensions available to make multi-tier distributed applications possible. With the help of MySQL, details of which you will find at www.mysql.org, it is possible to use SQL databases. You will find a more complete discussion of these topics in the Wrox publication Professional Java Server Programming.

 

And so we come to installing ApacheJServ.

 

Running ApacheJServ requires the Java Development Kit 1.2 for glibc 2.1 from http://www.blackdown.org. (Note that older Linux distributions may require the glibc 2.0 version.) The JDK is currently only available as a bzip2 archive, so you will need to install the bzip2 utility as well (http://sourceware.cygnus.com/bzip2/). You will also need the Java Servlet Development Kit (JSDK) version 2.0 from http://java.sun.com/products/Servlet. Download JServ from http://java.apache.org, extract it into the /usr/local/ApacheJServ-1.0 directory and type the following commands:

      

# mkdir /usr/local/apache/src/modules/jserv

# cd /usr/local/ApacheJServ-1.0

# ./configure --prefix=/usr/local/ApacheJServ-1.0 \
> --with-apache-install=/usr/local/apache --with-jsdk=/usr/local/JSDK2.0/lib/jsdk.jar

# make

# make install

 

ApacheJServ should now be installed and configured. Open httpd.conf for editing and add this directive to the very end of the file:

 

Include /usr/local/ApacheJServ-1.0/example/jserv.conf

 

Appending this command forces Apache to read jserv.conf from its installed location. Future versions of ApacheJServ may instead install this file in the same directory as httpd.conf. The jserv.conf file contains all the commands to configure the Apache side of ApacheJServ.

 

Restart Apache, give it a moment or two for JServ to begin accepting requests, and if everything works, visiting http://localhost/example/Hello should produce a success page!

 

 

If this does not work, your version of ApacheJServ may configure /servlet as the test zone, in which case you would have to type http://localhost/servlet/Hello.

Java Server Pages

While Java servlets offer boundless possibilities for powerful server-side processing, for simple applications they can be quite unwieldy. Perhaps you want to insert the time and date at one point on your page, and perform a calculation at another; using JavaScript or a Java Applet prevents older browsers viewing your page correctly. You could use a single servlet to create the whole page. However, the page content itself is now mixed up within Java code, making maintenance difficult, particularly if the programmers and web designers are different groups of people. Alternatively, you could keep the page content in an HTML file which uses Server-Side Includes to call successive CGI scripts to insert the correct text at each point. This way the web designers can maintain the HTML without worrying about the code. However, this simple page now has one HTML file and several CGI scripts associated with it, which again makes maintenance complicated.

 

For simple applications, the ideal solution would be to have the Java code and HTML contained in a single file. It will have the look and 'feel' of HTML, so the web designers can understand it, but would contain additional code which would be run on the server before delivering the page back to the client. Sun's new member of the Java family, JavaServer Pages (JSP), provides this solution. Code can be inserted in line within the HTML, which is executed on the server and the results merged with the HTML in the output. This parallels how Microsoft's ASP works, and JSP is emerging as the open source challenger to ASP in this field.

 

The file which leaves the server is pure HTML, so unlike JavaScript and Java Applets, which have to be run on the client, you can have the interactivity and programming flexibility of Java while ensuring that all existing HTML browsers can display the output. Furthermore, you maintain all the advantages of Java's portability should you later decide to change operating system or web server. There are already many web sites that use JSP instead of ASP.

 

Up until recently, the main open source JSP implementations were GNU Server Pages (GSP) and GNU Java Server Pages (GNUJSP), which are independent development efforts despite their similar names. Both are written as regular Java servlets, and although they are difficult to install and configure, they can be used to create JSPs and develop web sites. Information on GSP and GNUJSP can be found at www.bitmechanic.com and www.klomp.org/gnujsp respectively.

 

However, JSP support in Apache now takes the form of a module called Jakarta, named after the project team which implemented it (or after the largest city on the Indonesian island of Java, which might or might not be a coincidence). At the time of going to press, Jakarta is in final pre-release form, so by the time you read this Jakarta will almost certainly be in production release. The latest version of Jakarta and its installation instructions are available online at http://jakarta.apache.org.

 

Logs and Analysis

To develop a web site effectively, you will need to regularly analyze the web site's log files, which contain data on everyone who accesses the site. From them you can determine the number of requests made, the identities (IP addresses) of the clients and the pattern of hyperlinks followed across the web site. While small-scale information can be gained by manually viewing the log files, this technique is not appropriate for finding large-scale trends. Each request for a page adds 60 bytes or so of data to the log file, and more if images are requested along with the pages, which is usually the case. With, say, 200 daily page requests and their accompanying images, roughly 50-60 kilobytes of data are added to the log each day. Therefore, manual viewing is in reality restricted to small samples of the logs.

 

To automatically analyze the complete logs, we will be using Analog, a small yet powerful program which is configurable, scalable and free. It is currently the most popular log file analysis program on the web (a 25% market share according to a GVU report at http://www.gvu.gatech.edu). It will be configured to produce separate reports for each virtual host, and update them each morning, and the reports will only be read by authorized people.

Manual Logfile Analysis

While manual analysis will not be suitable for viewing overall trends, it allows you to interpret the logs with human intelligence. For example, if you notice lots of visitors are requesting one page then leaving, you may want to investigate ways of encouraging them to stay on your site. Do you provide links to other relevant pages? Are they arriving directly into a frame and being trapped with no links out? Are your pages so large, or your connection so slow, they are giving up waiting and leaving the site?

 

You will have chosen where to place your logs when editing httpd.conf. Simply open one in an editor and concentrate on a small section. Below is an extract from access_log on my machine (with the IP addresses replaced by dummy ones):

 

231.231.231.231 - - [02/Oct/1999:19:47:35 +0000] "GET / HTTP/1.1" 200 9621 "-" "Mozilla/4.0 (compatible; MSIE 4.01; Windows 95)"

231.231.231.231 - - [02/Oct/1999:19:47:41 +0000] "GET /trampnetmini.gif HTTP/1.1" 304 - "http://www.trampolining.net/" "Mozilla/4.0 (compatible; MSIE 4.01; Windows 95)"

231.231.231.231 - - [02/Oct/1999:19:47:58 +0000] "GET /trampnetmini.gif HTTP/1.1" 304 - "http://www.trampolining.net/" "Mozilla/4.0 (compatible; MSIE 4.01; Windows 95)"

231.231.231.231 - - [02/Oct/1999:19:47:58 +0000] "GET /coach.gif HTTP/1.1" 304 - "http://www.trampolining.net/" "Mozilla/4.0 (compatible; MSIE 4.01; Windows 95)"

231.231.231.231 - - [02/Oct/1999:19:47:59 +0000] "GET /news.gif HTTP/1.1" 304 - "http://www.trampolining.net/" "Mozilla/4.0 (compatible; MSIE 4.01; Windows 95)"

231.231.231.231 - - [02/Oct/1999:19:47:59 +0000] "GET /improve.gif HTTP/1.1" 304 - "http://www.trampolining.net/" "Mozilla/4.0 (compatible; MSIE 4.01; Windows 95)"

231.231.231.231 - - [02/Oct/1999:19:47:59 +0000] "GET /merger.gif HTTP/1.1" 304 - "http://www.trampolining.net/" "Mozilla/4.0 (compatible; MSIE 4.01; Windows 95)"

231.231.231.231 - - [02/Oct/1999:19:47:59 +0000] "GET /chat.gif HTTP/1.1" 304 - "http://www.trampolining.net/" "Mozilla/4.0 (compatible; MSIE 4.01; Windows 95)"

132.132.132.132 - - [03/Oct/1999:16:30:45 +0000] "POST /cgi-bin/poll.pl?voted HTTP/1.1" 302 291 "http://www.trampolining.net/" "Mozilla/4.0 (compatible; MSIE 4.01; Windows 95)"

132.132.132.132 - - [03/Oct/1999:16:30:46 +0000] "GET / HTTP/1.1" 200 10137 "http://www.trampolining.net/" "Mozilla/4.0 (compatible; MSIE 4.01; Windows 95)"

132.132.132.132 - - [03/Oct/1999:16:30:47 +0000] "GET /trampnetmini.gif HTTP/1.1" 200 6971 "http://www.trampolining.net/" "Mozilla/4.0 (compatible; MSIE 4.01; Windows 95)"

132.132.132.132 - - [03/Oct/1999:16:30:47 +0000] "GET /improve.gif HTTP/1.1" 200 4727 "http://www.trampolining.net/" "Mozilla/4.0 (compatible; MSIE 4.01; Windows 95)"

132.132.132.132 - - [03/Oct/1999:16:30:49 +0000] "GET /merger.gif HTTP/1.1" 200 4526 "http://www.trampolining.net/" "Mozilla/4.0 (compatible; MSIE 4.01; Windows 95)"

 

The first field in each line is the IP address of the client. By following an IP address through the log, you can find the path an individual visitor took through your site. (Office networks and ISPs such as AOL that employ proxies represent around 25% of web traffic; they can cause a single user to appear to come from multiple IP addresses, or allow users to receive some pages from a cache without the requests appearing in your logs. The technique is accurate for the remaining traffic, and is normally accurate even for access via a proxy server, provided there are not multiple caches. However, there is as yet no way round this growing problem.)

 

The date and time follow, and then the request in double quotes: the method, the requested filename and the version of HTTP. A single slash (/) here represents a directory request, which usually returns index.html. The number immediately following the request is the HTTP status code, which is 200, 302 or 304 in the extract above, and the number after that is the size of the response in bytes (a dash where no body was sent). Unsuccessful requests, i.e. those producing 403 (Access forbidden) or 404 (File not found) codes, also go into the error_log file.

 

The next field is the referrer, which in all but the first of the above log entries is http://www.trampolining.net/ (the first shows "-", meaning that no referrer was sent). The identity of the referrer depends on what file is being logged at the time. In the case of images, the referrer is simply the page that contains the image, but in the case of pages, it is the page the browser was previously viewing, which gives a good idea of where your visitors are coming from. The final pieces of information are the browser and the version of the operating system.
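The field layout just described can be pulled apart mechanically. What follows is a minimal sketch for lines in the combined format shown in the extract; the regular expression, the group names and the function name parse_line are choices made for this example, not part of Apache.

```python
import re

# One named group per field of a combined-format log line.
COMBINED_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields, or None if the line is not in combined format."""
    match = COMBINED_RE.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    # A dash in the size field means no body was sent (e.g. a 304 response).
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry


sample = ('231.231.231.231 - - [02/Oct/1999:19:47:35 +0000] "GET / HTTP/1.1" '
          '200 9621 "-" "Mozilla/4.0 (compatible; MSIE 4.01; Windows 95)"')
entry = parse_line(sample)
print(entry["host"], entry["request"], entry["status"], entry["size"])
```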

 

As you can see, each page can generate many lines of log, so to make a visitor's path easier to follow we can cut out some of the unwanted information. To follow the path of just one client, type:

 

$ grep 231.231.231.231 /usr/local/apache/logs/trampolining_access_log | more

 

This will display only log entries created by the client with IP address 231.231.231.231.

 

There are many log file entries corresponding to images, which are often of little interest. To view only page entries, type:

 

$ grep 'html HTTP' /usr/local/apache/logs/trampolining_access_log | more

 

You can even view page requests from a single client:

 

$ grep 'html HTTP' /usr/local/apache/logs/trampolining_access_log | grep 231.231.231.231
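The same filtering can be done in a script when the patterns become more involved. The sketch below is one possible approach, not a replacement for grep: the function name client_pages and the list of page suffixes are assumptions. Unlike the pattern 'html HTTP', it also counts directory requests such as GET /, which make up the page requests in the extract above.

```python
def client_pages(lines, client_ip, page_suffixes=(".html", ".htm", ".shtml", "/")):
    """Yield (time, path) for each page request made by one client."""
    for line in lines:
        parts = line.split('"')
        if len(parts) < 2:
            continue
        prefix, request = parts[0], parts[1]
        # The client address is the first space-separated field.
        if prefix.split(" ", 1)[0] != client_ip:
            continue
        fields = request.split()
        if len(fields) < 2:
            continue
        path = fields[1].split("?", 1)[0]  # drop any query string
        if path.endswith(page_suffixes):
            timestamp = prefix[prefix.find("[") + 1:prefix.find("]")]
            yield timestamp, path


sample_lines = [
    '231.231.231.231 - - [02/Oct/1999:19:47:35 +0000] "GET / HTTP/1.1" 200 9621 '
    '"-" "Mozilla/4.0 (compatible; MSIE 4.01; Windows 95)"',
    '231.231.231.231 - - [02/Oct/1999:19:47:41 +0000] "GET /trampnetmini.gif HTTP/1.1" 304 - '
    '"http://www.trampolining.net/" "Mozilla/4.0 (compatible; MSIE 4.01; Windows 95)"',
    '132.132.132.132 - - [03/Oct/1999:16:30:46 +0000] "GET / HTTP/1.1" 200 10137 '
    '"http://www.trampolining.net/" "Mozilla/4.0 (compatible; MSIE 4.01; Windows 95)"',
]
for when, path in client_pages(sample_lines, "231.231.231.231"):
    print(when, path)
```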

 

A final technique allows you to watch current requests in real time. This command is:

 

$ tail -f /usr/local/apache/logs/trampolining_access_log

 

You can make this easier to read by removing the image requests and displaying only page requests:

 

$ tail -f /usr/local/apache/logs/trampolining_access_log | grep 'html HTTP'

Automatic Analysis

Automatic analysis is the vehicle by which we will obtain an overview of our system's usage. Installation of Analog is quite simple:

 

•     Download Analog from http://www.statslab.cam.ac.uk/~sret1/analog/ to /usr/local/analog/.

•     Change to the /usr/local/analog directory.

•     Open the analhead.h file for editing and change ANALOGDIR to /usr/local/analog/.

•     Type make.

 

We also need to prepare a directory for the reports and populate it with the necessary images:

 

•     Type mkdir /home/www/trampolining.net/analog.

•     Copy /usr/local/analog/images/* to /home/www/trampolining.net/analog.

 

That's it. Analog is ready for use.

 

Analog is set up using configuration files; the default is analog.cfg which we will edit now, and later on we will create an additional configuration file for each virtual host.

 

LOGFORMAT specifies the format of log used. Analog natively supports the Apache formats COMBINED and COMMON. LOGFILE tells Analog where to look for the access log.

 

LOGFORMAT COMBINED

LOGFILE /usr/local/apache/logs/access_log

 

HOSTNAME specifies the name to put at the top of the report.

 

HOSTNAME "www.trampolining.net"

 

Remember we told Apache not to resolve IP addresses? This little section tells Analog to resolve them, but is much more efficient because addresses are only resolved once, and then written to the cache file specified in DNSFILE. DNSGOODHOURS is the number of hours to trust an entry in the cache file; DNSBADHOURS is the number of hours to wait before attempting to resolve a bad IP address again. DNS WRITE tells Analog to try to resolve unknown IP addresses, then write them to the dnsfile.txt file. The alternative command DNS READ would tell Analog to skip IP addresses which didn't exist in the dnsfile.txt file, thus saving time. On the first run, Analog will complain about dnsfile.txt not existing. Ignore this; Analog will create the file.

 

DNSFILE /usr/local/analog/dnsfile.txt

DNSGOODHOURS 1250

DNSBADHOURS 350

DNS WRITE
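The interplay of these settings can be modelled in a few lines. This is a simplified reading of the description above, not Analog's own code: the cache structure, the function name needs_lookup and the use of whole hours are all assumptions.

```python
GOOD_HOURS = 1250  # DNSGOODHOURS: how long a successful lookup is trusted
BAD_HOURS = 350    # DNSBADHOURS: how long to wait before retrying a failure


def needs_lookup(cache, ip, now_hours, mode="WRITE"):
    """Decide whether ip should be resolved on this run.

    cache maps ip -> (hour_recorded, hostname or None); None marks a failed lookup.
    """
    if mode == "READ":
        return False  # READ mode never resolves; it uses only what is cached
    if ip not in cache:
        return True
    recorded, hostname = cache[ip]
    age = now_hours - recorded
    limit = GOOD_HOURS if hostname is not None else BAD_HOURS
    return age >= limit


cache = {"1.1.1.1": (0, "a.example"), "2.2.2.2": (0, None)}
print(needs_lookup(cache, "2.2.2.2", 400))  # failed 400 hours ago: try again
print(needs_lookup(cache, "1.1.1.1", 400))  # resolved 400 hours ago: still trusted
```

With the values above, a good entry is reused for about 52 days and a failed one is retried after about two weeks.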

 

This directive tells Analog where to create the report.

 

OUTFILE /home/www/trampolining.net/analog/trampolining_net_report.html

 

HOSTEXCLUDE directives tell Analog to ignore accesses from a certain IP address or hostname. This allows you to report what your visitors do, without being influenced by your own visits! In this example I exclude all page accesses from Cambridge University using Cambridge's IP allocation, and exclude all accesses from York University using the resolved hostnames.

 

HOSTEXCLUDE 131.111.*.*

HOSTEXCLUDE *.york.ac.uk

 

If your web site contains pages with other extensions than .htm or .html, for example JSPs or .shtml, you will need to add them here to include them in the page counts, otherwise Analog will assume them to be images.

 

PAGEINCLUDE *.htm,*.html,*.shtml
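The wildcard list works as a simple pattern match against each requested file name. A minimal sketch of that classification follows, using shell-style matching from the Python standard library; it mirrors only the three patterns above and makes no claim about Analog's other defaults.

```python
from fnmatch import fnmatchcase

PAGE_PATTERNS = ["*.htm", "*.html", "*.shtml"]


def is_page(path, patterns=PAGE_PATTERNS):
    """Return True if path matches any of the page patterns."""
    return any(fnmatchcase(path, pattern) for pattern in patterns)


for path in ("/index.shtml", "/coach.gif", "/results.jsp"):
    print(path, is_page(path))
```

The .jsp request is not counted as a page until a *.jsp pattern is added to the list, which is the situation the paragraph above warns about.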

 

Save your completed file as analog.cfg then type ./analog (or ./analog +g/other-config-file.cfg if you have an additional config file). If all goes well, Analog will write its report to the file named in OUTFILE.

 

 

You will need to create a configuration file for each virtual host, and save it with a different filename, e.g. trampolining_net.cfg. Finally we are ready to schedule Analog to run each morning. We do this using a cronjob, a Linux feature that allows tasks to be run at regular times. We will need to create a separate task for each report, and run them at different times to prevent multiple simultaneous Analog processes clashing.

 

Setting up cronjobs requires you to use the vi editor, which is explained in Appendix B. This is what you type at the vi command prompt:

 

55 0 * * * /usr/local/analog/analog

55 1 * * * /usr/local/analog/analog +g/trampolining_net.cfg

 

The cronjob is now set up to run Analog at 0:55 a.m. each morning using the default configuration file, analog.cfg, and again at 1:55 a.m. using the configuration file trampolining_net.cfg.
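Each crontab line is five schedule fields followed by the command. The sketch below simply labels those fields so the two entries can be read at a glance; the function name parse_cron_line is hypothetical, and it does no validation of the values.

```python
FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def parse_cron_line(line):
    """Split a crontab line into its five schedule fields and the command."""
    parts = line.split(None, 5)  # at most five splits, so the command stays whole
    if len(parts) < 6:
        raise ValueError("expected five schedule fields and a command")
    schedule = dict(zip(FIELDS, parts[:5]))
    schedule["command"] = parts[5]
    return schedule


job = parse_cron_line("55 1 * * * /usr/local/analog/analog +g/trampolining_net.cfg")
print(job["hour"], job["minute"], job["command"])
```

An asterisk in the day, month and weekday fields means "every", so both jobs run daily.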

 

Our final task is to protect the reports from unwelcome visitors. To do this, we will create a directory container for the /home/www/trampolining.net/analog/ directory in the primary server section of httpd.conf:

 

<Directory "/home/www/trampolining.net">

  Options Indexes FollowSymLinks

  Options +Includes

  AllowOverride None

  Order allow,deny

  Allow from all

</Directory>

 

<Directory "/home/www/trampolining.net/analog">

  Order allow,deny

  Allow from 123.123.123.123

</Directory>

 

This will deny the contents of www.trampolining.net/analog to anyone except the owner of IP address 123.123.123.123.
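The effect of Order allow,deny can be summarised as: a client must match an Allow line, must not match a Deny line, and is refused by default. The sketch below models just that rule; it handles only exact addresses and the keyword all, whereas Apache also accepts partial addresses and host names, and the function name access_allowed is hypothetical.

```python
def access_allowed(client_ip, allow, deny=()):
    """Model of Order allow,deny for exact addresses and the keyword "all"."""

    def matches(rule):
        return rule == "all" or rule == client_ip

    # Allow rules are checked first; with no match the request is refused.
    if not any(matches(rule) for rule in allow):
        return False
    # A matching Deny rule then overrides the Allow.
    return not any(matches(rule) for rule in deny)


print(access_allowed("123.123.123.123", ["123.123.123.123"]))
print(access_allowed("231.231.231.231", ["123.123.123.123"]))
```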

 

 

Deploying an FTP Server

While FTP servers are less prevalent in the current web browser driven Internet, they are still the primary method of distributing very large files and maintaining large stores of files. This section will demonstrate how to deploy an effective anonymous FTP server which modern web browsers will be able to access directly. As we said right at the beginning of the chapter, we will be installing the server developed by Washington University, WU-FTP, which can be downloaded from http://www.wu-ftpd.org. For more information on WU-FTP, look up http://www.landfield.com/wu-ftpd/.

Installing WU-FTP

To install WU-FTP, you will need to carry out the following procedure:

 

1.    Download WU-FTP and extract to /usr/local

2.    Type ./build CC=gcc lnx. Note that to build the ftpd daemon, you might have to install the byacc utility first, which contains the yacc parser.

3.    Type ./build install

4.    We need to tell Linux to use WU-FTP for FTP requests by editing /etc/inetd.conf. Look for a line beginning ftp, and make sure it is uncommented. Then edit it to look like this:

ftp stream tcp nowait root /usr/local/wu-ftpd ftpd -laio

5.    Type ps -uax | grep inetd, which will produce a listing of system processes with the word inetd in the title. You should get output like this:

root       354  0.0  0.4  1252  528 ?        S    Oct21   0:00 inetd
root     19048  0.0  0.3  1152  440 pts/2    S    19:31   0:00 grep inetd

Only the first of the two lines matters to us (the second is merely the search we just carried out). The listing gives us the process ID (PID) in the second column, which is 354 in the above case.

6.    Restart inetd by typing kill -HUP PID, where PID is the process ID listed from step 5.
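Steps 5 and 6 can be combined by picking the PID out of the listing automatically. The following is a rough sketch, assuming the column layout shown in step 5 (PID in the second column, command starting in the eleventh); the helper name pid_of is hypothetical, and on some systems the command appears with a full path such as /usr/sbin/inetd, in which case the name passed in must match that:

```shell
# pid_of: read `ps -uax`-style lines on stdin and print the PID
# (second column) of every process whose command (column 11) is $1.
# Hypothetical helper; assumes the column layout shown in step 5.
pid_of() {
    awk -v name="$1" '$11 == name { print $2 }'
}

# Sample lines copied from the listing in step 5.
sample='root       354  0.0  0.4  1252  528 ?        S    Oct21   0:00 inetd
root     19048  0.0  0.3  1152  440 pts/2    S    19:31   0:00 grep inetd'

printf '%s\n' "$sample" | pid_of inetd
```

On the sample listing this prints 354 and skips the grep line. On a live system the equivalent would be ps -uax | pid_of inetd, with the result passed to kill -HUP.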

 

The latest download of WU-FTP comes with a configure script. It can be installed, from the wu-ftp-version directory, using the ./configure, make, make install sequence of commands as in the other installations in this chapter.

 

There we have it! The Washington University File Transfer Protocol daemon is installed and ready for action! We can check the installation by typing ftp www.trampolining.net, or whatever your hostname/IP address is. You should be presented with a login screen, and you will be able to log in using a standard Linux user account and password set up on your system.

 

connected to www.trampolining.net.

220 www.trampolining.net FTP server (Version wu-2.6.0(1) Fri Nov 12 11:43:54 GMT 1999) ready.

Name (www.trampolining.net:none):

Configuring WU-FTP

To provide access to the general public we need to allow anonymous access. Before doing this, we need to create a safe directory for anonymous users, which will appear to them as the root of the FTP server. This prevents anonymous users browsing around your machine to obtain private information! We also need to create a user account for anonymous FTP users to use.

Creating an FTP directory

We will create our FTP directory in /home/ and adopt a traditional directory structure:

 

mkdir /home/ftp

mkdir /home/ftp/bin

mkdir /home/ftp/etc

mkdir /home/ftp/pub

 

The first, /home/ftp, will be the root directory of our anonymous FTP server. /home/ftp/bin will contain links to commands we want to allow FTP users to use, in particular ls (to list the contents of a directory) and cd (to change directory). /home/ftp/etc is present to hold a password file if necessary and /home/ftp/pub/ is the public directory which contains the files we are making available.

 

All directories and files within this structure should be owned by root, and none of them should have Group or All write permissions. This will prevent the user editing any of the files: by editing the contents of /home/ftp/bin/, a user could execute any code on your machine. All the directories should have All read and execute permissions, to allow users to enter the directory (execute permission) and read the contents (read permission). Finally, all the files contained should have All and Group read permissions only; this will allow users to download files, but not change or execute them on your server.

 

You may need to create yet another directory, as follows:

 

mkdir /home/ftp/incoming

 

This directory is special in that it is available for users to upload files to. For this reason, it must have Group and All write permissions, but not Group and All read permissions, which prevents users viewing the contents of this directory. While this is the standard way to implement two-way FTP access, it does pose a security risk: users could potentially upload illegal files and use your server to store them. Whether or not to provide this service is a serious policy decision; if you do, be sure to set a umask to prevent uploaded scripts being executed. A slightly more secure system involves removing All write permissions from this directory too, then creating subdirectories with full read, write and execute permissions, which can then be accessed by 'trusted users'. Anyone you have not told the location of these folders should be unable to find them, since /home/ftp/incoming cannot be listed: there are no read permissions for All.

 

To summarize, this is how I suggest that you set the access permissions for your FTP site:

 

drwxr-xr-x root   root  bin/

drwxr-xr-x root   root  etc/

drwx--x--x root   root  incoming/

drwxrwxrwx root   root  incoming/secret

drwxr-xr-x root   root  pub/

-rwxr--r-- root   root  pub/any.file

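The directory modes in the listing above can be applied in one pass. The following is a minimal sketch, not a definitive setup script: the helper name make_ftp_tree is hypothetical, it takes the FTP root as an argument so that it can be tried in a scratch directory first, and it leaves ownership alone (run chown -R root:root on the tree afterwards as root):

```shell
# make_ftp_tree: create the anonymous FTP layout under the directory
# given as $1 and apply the directory modes from the listing above.
# Hypothetical helper; ownership is not changed here.
make_ftp_tree() {
    ftp_root=$1
    mkdir -p "$ftp_root/bin" "$ftp_root/etc" "$ftp_root/pub" \
             "$ftp_root/incoming/secret" || return 1
    chmod 755 "$ftp_root" "$ftp_root/bin" "$ftp_root/etc" "$ftp_root/pub"
    chmod 777 "$ftp_root/incoming/secret"   # writable drop box for trusted users
    chmod 711 "$ftp_root/incoming"          # can be entered, but not listed
}

# Try it in a scratch directory rather than on the live /home/ftp.
scratch=$(mktemp -d)
make_ftp_tree "$scratch/ftp"
ls -ld "$scratch/ftp/incoming" "$scratch/ftp/incoming/secret"
```

Once the result looks right, the same function can be pointed at /home/ftp.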

Configuring Linux for WU-FTP

The most important change is to modify the main Linux /etc/passwd file to ensure the anonymous FTP user is confined to /home/ftp and starts in /home/ftp/pub. Open the file for editing; you should see a listing like this:

 

ftp:x:14:50:FTP User:/home/ftp:

nobody:x:99:99:Nobody:/:

gdm:x:42:42::/home/gdm:/bin/bash

xfs:x:100:233:X Font Server:/etc/X11/fs:/bin/false

username:x:500:500::/home/username:/bin/bash

 

If no FTP user exists, use the adduser command as root to add ftp. The important line begins with ftp, which contains the user settings for FTP User. Note there is no entry after the final colon. This ensures no command shell is made available to the FTP User. To force /home/ftp/ to be treated as the root directory, we edit this line slightly, adding a dot (written as /./) at the point where we want the user to be rooted. The final /pub ensures they are initially placed in that directory:

 

ftp:x:14:50:FTP User:/home/ftp/./pub:

nobody:x:99:99:Nobody:/:

gdm:x:42:42::/home/gdm:/bin/bash

xfs:x:100:233:X Font Server:/etc/X11/fs:/bin/false

username:x:500:500::/home/username:/bin/bash
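To make the role of the /./ marker concrete, the following sketch splits the home field of a passwd entry into the part before the marker (the directory the user is rooted in) and the part after it (the directory they start in). It only illustrates the convention described above; the helper name ftp_home_parts is hypothetical and is not part of WU-FTP:

```shell
# ftp_home_parts: read /etc/passwd-style lines on stdin and, for the
# user named in $1, print the two halves of the home field (field 6)
# on either side of the "/./" marker.
# Hypothetical helper for illustration only.
ftp_home_parts() {
    home=$(awk -F: -v user="$1" '$1 == user { print $6 }')
    case $home in
        */./*) printf 'root=%s start=/%s\n' "${home%%/./*}" "${home#*/./}" ;;
        *)     printf 'no /./ marker in "%s"\n' "$home" ;;
    esac
}

printf '%s\n' 'ftp:x:14:50:FTP User:/home/ftp/./pub:' | ftp_home_parts ftp
```

For the edited ftp line this prints root=/home/ftp start=/pub.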

 

Finally, we need to create a set of configuration files for WU-FTP in /etc. Luckily there is no need to create them by hand, as WU-FTP distributes a default set with the program, which will prove fine for our anonymous server. We will copy these default files to /etc:

 

# cd /usr/local/wu-ftpd

# cp ftpaccess ftpusers ftpconversions ftpgroups ftphosts /etc

 

We can implement an extra security touch. In /home/ftp/ type:

 

# touch .rhosts .forward

# chown root .rhosts .forward

# chmod 400 .rhosts .forward

 

There are some final modifications which are not strictly necessary but make anonymous access that little bit easier. Hard linking /home/ftp/bin/ls to /bin/ls will allow clients to list the directory through FTP. Make sure that the owner is root and that it has owner, group and all execute permissions only. Copying /etc/passwd and /etc/group into /home/ftp/etc/ will allow directory listings to replace the user and group IDs for each file and folder with their corresponding names. However, these files contain far too much sensitive information and need editing. Only groups and users owning files within the FTP directory should be left in, and password information should be left out: there should just be an x after the user name, not a random string of characters. Anonymous access should now be available.
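Trimming the copied password file by hand is error-prone, so a small filter can help. The following is a sketch under stated assumptions: the helper name sanitize_passwd is hypothetical, the list of users to keep is supplied by you, and the output should still be reviewed before it is placed in /home/ftp/etc/:

```shell
# sanitize_passwd: read passwd-format lines on stdin, keep only the
# users named in $1 (comma-separated), and replace the password field
# with "x". Hypothetical helper for illustration only.
sanitize_passwd() {
    awk -F: -v OFS=: -v keep="$1" '
        BEGIN {
            n = split(keep, names, ",")
            for (i = 1; i <= n; i++) wanted[names[i]] = 1
        }
        ($1 in wanted) { $2 = "x"; print }
    '
}

# Made-up sample input; only the root entry is kept.
printf '%s\n' \
    'root:AbC123xyz:0:0:root:/root:/bin/bash' \
    'username:QrS456uvw:500:500::/home/username:/bin/bash' |
    sanitize_passwd root
```

On the sample input this prints root:x:0:0:root:/root:/bin/bash. In practice the input would be /etc/passwd and the output would be redirected to the copy under /home/ftp/etc/.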

Making your Servers Persistent

In the event that your Linux machine crashes or loses power, the priority is to get the machine serving requests again as quickly as possible. This is much easier if the system has been designed to recover from a crash: if the services start themselves on boot up, you save a great deal of time trying to remember what needs to be started!

 

There are two main services that need special attention in order to enable autostart. First, we need to make sure the network is ready for requests. The ifconfig utility must be configured for each virtual host. We can make this automatic by editing /etc/rc.d/rc.local, or the equivalent file for your Linux distribution. At the end of this file we append all the commands we used originally when we set up the virtual hosts:

 

# setting up IP aliases for virtual hosts

echo "setting up IP aliases for virtual hosts"

ifconfig eth0:0 123.123.123.123

route add -host 123.123.123.123
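With more than one virtual host, the same pair of commands is repeated for each address with a new alias number. The following sketch prints the commands rather than running them, so the output can be reviewed and then pasted into rc.local; the helper name print_alias_commands and the second address are hypothetical:

```shell
# print_alias_commands: print (not run) the ifconfig and route commands
# for each address, numbering the aliases <iface>:0, <iface>:1, ...
# Hypothetical helper; $1 is the interface, the rest are IP addresses.
print_alias_commands() {
    iface=$1
    shift
    n=0
    for ip in "$@"; do
        echo "ifconfig $iface:$n $ip"
        echo "route add -host $ip"
        n=$((n + 1))
    done
}

# 123.123.123.124 is a made-up second virtual host address.
print_alias_commands eth0 123.123.123.123 123.123.123.124
```

Printing instead of executing keeps the sketch safe to try as an ordinary user.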

 

The other service we need to start is the Apache web server. Again, we will start this from the boot scripts of the machine. The simplest way is to append the start command to a boot file such as rc.local. Alternatively, you could create a startup script in init.d (called apache) and link to it as S20apache from the rc directory for your runlevel. A sample file follows:

 

#!/bin/bash

#(@) A startup and shutdown script for Apache

 

case "$1" in

       start)

            # Starts Apache Server

            echo -n "Starting Apache Web Server"

            /usr/local/apache/bin/apachectl start

            ;;

       stop)

            # Stops Apache Server

            echo -n "Stopping Apache Web Server"

            /usr/local/apache/bin/apachectl stop

            ;;

       restart)

            # Restarts Apache gracefully

            echo -n "Restarting Apache after serving current web requests"

            /usr/local/apache/bin/apachectl graceful

            ;;

       *)

            # Incorrect parameter

            echo "Usage: $0 start | stop | restart"

            exit 1

esac

exit 0

 

You can create the symbolic link in rc<n>.d, where <n> is your runlevel (usually 3, but you might also want to create one in rc5.d if you use a graphical login). Create the link by entering ln -s /etc/rc.d/init.d/apache /etc/rc.d/rc3.d/S20apache.
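If you want the link in several runlevels, a short loop avoids typing each ln command. The following is a sketch only: the helper name link_runlevels is hypothetical, and the rc.d root is passed in as an argument so that the example can run against a scratch directory instead of /etc/rc.d:

```shell
# link_runlevels: create S<order><name> symlinks to init.d/<name> in the
# rc<n>.d directory of each runlevel given.
# Hypothetical helper; $1 is the rc.d root (e.g. /etc/rc.d), $2 the script
# name, $3 the start order, and the remaining arguments are runlevels.
link_runlevels() {
    rc_root=$1 name=$2 order=$3
    shift 3
    for level in "$@"; do
        ln -s "$rc_root/init.d/$name" "$rc_root/rc$level.d/S$order$name" || return 1
    done
}

# Dry run against a scratch copy of the directory layout.
scratch=$(mktemp -d)
mkdir -p "$scratch/init.d" "$scratch/rc3.d" "$scratch/rc5.d"
touch "$scratch/init.d/apache"
link_runlevels "$scratch" apache 20 3 5
ls -l "$scratch/rc3.d" "$scratch/rc5.d"
```

On a real system the call would be link_runlevels /etc/rc.d apache 20 3 5, run as root.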

Summary

In this chapter you have learned how to install the highly popular Apache web server, configure it to meet your requirements, and set up virtual hosts. You were shown how to install and configure the ApacheJServ servlet engine as well as how to modify your Apache configuration to make use of SSI and CGI. Other newer powerful technologies such as mod_perl and JSP were also briefly discussed. You were also instructed in the setting up and configuration of one of the main open source FTP applications, WU-FTP.

 

In addition to setting up the servers, this chapter also covered an important administrative task, namely the analysis of the server log files, with some discussion of manual analysis using command line tools and automatic analysis using the free Analog tool. Finally, you learned some tips on server persistence: by making minor alterations to system files you can restart Apache on reboot and have it ready to receive requests.

 

For a discussion on the advanced configuration of Apache, and for other information on Apache itself, ApacheJServ and JSP, see Professional Apache.

References

Web

The Apache home page:

http://www.apache.org/

 

Security bulletins for Internet services:

http://www.cert.org

 

Java Servlets Page:

http://java.apache.org

 

WU-FTP's web site:

http://www.wu-ftpd.org

 

More information on WU-FTP:

http://www.landfield.com/wu-ftpd/

 

Analog logfile analyzer site:

http://www.statslab.cam.ac.uk/~sret1/analog/

 

Jakarta Development site:

http://jakarta.apache.org

HOWTOs

Details on how to set up web servers and clients:

WWW-HOWTO

 

How to set up a multi-purpose web server:

Apache SSL PHP/FI frontpage mini-HOWTO

Books

Peter Wainwright, Professional Apache, Wrox Press, ISBN 1861003021

Danny Ayers et al, Professional Java Server Programming, ISBN 1861002777

©1998 Wrox Press Limited, US and UK..

 
