Chapter 5: Deploying Web and FTP Servers (Part Two)

June 5, 2000


Technologies for Effective Sites

Undoubtedly the biggest cost in deploying any web site is its design and maintenance. If you have used ASP in the past, you will be aware of the focus on reducing maintenance costs. If you need to edit every page on the server by hand every time you want to make a content or design change, maintenance becomes prohibitively expensive and error-prone. Server Side Includes (SSI) allow text that is common to every page to be specified in one file. The other pages can then include it, using an SSI command, before the document is sent to the client. Furthermore, if the site is to present more than purely static content, it needs some way of adapting the content it serves to the actions of the client. It needs the intelligence provided by programming languages, such as the ability to perform calculations, handle information and provide feedback to the user.

 

CGI (Common Gateway Interface) allows scripts to run on the server. The main language used is Perl which is a scripting language providing excellent text processing power as well as standard programming tools. These scripts can handle user input and process it, store it and return customized pages. Apache implements standard script support, and can also provide super-fast script support using mod_perl.

 

Alternatives to CGI for providing fast server-side processing come with Java servlets and JavaServer Pages. Java servlets are complete programs which run on your server, providing a complete portable development environment for your applications. Apache implements full servlet support with the help of ApacheJServ, which we will install later on in this section.

 

Java Server Pages are from the same family as servlets, but instead of being separate programs, they are HTML files with Java code inserted in-line, and executed before the document leaves the server, combining incredible programming power with the simplicity of in-line code. Support for JSP is provided by project Jakarta, which we will briefly discuss later on.

Server-Side Includes

The real trampolining.net web site contains over 100 separate HTML pages, and that may be small compared to the web sites you plan to deploy. Nearly every page follows exactly the same format in terms of design and layout, including a copyright statement at the end of every page. At the end of the year, all 100 pages will need the copyright statement updated to read, for example, © 1999, 2000.

 

To attempt to update all these manually would be tedious and error-prone. Instead, Server-Side Includes (SSI) are used to include one HTML file within the others. Any commonly repeated text could be inserted using SSI:

 

<!--#include virtual="stylesheet.shtml" -->   Includes a standard stylesheet

<!--#include virtual="navbar.shtml" -->       Starts the page table and includes a standard navigation bar

 

Main content here   

 

<!--#include virtual="copyright.shtml" -->    Closes table, adds copyright statement

 

The include commands insert the contents of the named files at that point. The named file is relative to the directory of the main file; subdirectories can be accessed (e.g. <!--#include virtual="subdir1/included.html" -->), as can files in parent directories (e.g. <!--#include virtual="../fromparent.html" -->), and the included files can themselves contain SSI commands if they end in .shtml. These inclusions are all performed before the document leaves the server, so the client will only ever see a normal HTML page. At the end of the year, I will only need to change copyright.shtml for all the pages on my site to be updated: a huge saving in maintenance time.
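To make the mechanism concrete, here is a minimal shell sketch that imitates what the server does with an include command. It is not how Apache's mod_include is implemented; the function name expand_includes is hypothetical, and it assumes one include command per line, paths relative to the including file, and no nested includes.

```shell
#!/bin/sh
# expand_includes FILE (hypothetical helper, not part of Apache):
# print FILE, replacing every line that holds an
# <!--#include virtual="..." --> command with the contents of the named file.
# Assumptions: one command per line, the whole line is replaced, paths are
# relative to FILE's directory, and included files are not parsed again.
expand_includes() {
    dir=$(dirname "$1")
    awk -v dir="$dir" '
        match($0, /<!--#include virtual="[^"]+"/) {
            # Pull the path out from between the double quotes.
            path = substr($0, RSTART, RLENGTH)
            sub(/^[^"]*"/, "", path)
            sub(/"$/, "", path)
            file = dir "/" path
            # Copy the included file into the output at this point.
            while ((getline line < file) > 0) print line
            close(file)
            next
        }
        { print }
    ' "$1"
}
```

Running expand_includes on a page that includes copyright.shtml prints the page with the copyright text in place, which is why editing that one file is enough to update every page that includes it.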

 

SSI includes other useful commands: CGI scripts can be called using the <!--#exec cgi="/cgi-bin/script.pl" --> command, with the output written directly into the page sent to the client. The client never sees that the script exists, which is a useful security aid.

 

You can insert text which updates automatically using the <!--#echo var="LAST_MODIFIED" --> command, which inserts one of an extended set of standard variables each time the page is requested, for example to show the date the page was last modified. Listings of the available commands are available online in the Apache mod_include documentation.

 

Apache provides excellent support for SSI with just a few commands. Because SSI increases server load, it is traditional to suffix any file containing SSI with .shtml. Setting up Apache to parse *.shtml means it won't waste time attempting to parse normal HTML (*.html) files. This part of the configuration takes place in the primary server section of httpd.conf, outside of any <Directory> containers. In fact, the directives are already there, about three quarters of the way through the file, and just need uncommenting. These two lines tell Apache what content type .shtml files should be given, and tell the internal SSI handler to parse .shtml files before serving them to the client.

 

AddType text/html .shtml

AddHandler server-parsed .shtml
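A quick way to confirm that both lines really have been uncommented is to search the configuration file for them. The sketch below is only an illustration: the function name ssi_directives_enabled is hypothetical, and the path to httpd.conf is whatever applies on your system.

```shell
#!/bin/sh
# ssi_directives_enabled CONF (hypothetical helper): succeed only if both
# SSI directives appear uncommented in the given configuration file.
ssi_directives_enabled() {
    # A leading "#" makes the line a comment, so it will not match.
    grep -Eq '^[[:space:]]*AddType[[:space:]]+text/html[[:space:]]+\.shtml' "$1" &&
    grep -Eq '^[[:space:]]*AddHandler[[:space:]]+server-parsed[[:space:]]+\.shtml' "$1"
}
```

For example, ssi_directives_enabled /usr/local/apache/conf/httpd.conf (an assumed location) returns a zero exit status only when both directives are active.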

 

Adding index.shtml to the DirectoryIndex directive allows .shtml files to be served as directory indexes by preference; in other words, if index.shtml exists in the root directory of my server, it will be served to someone requesting http://www.trampolining.net.

 

DirectoryIndex index.shtml index.html index.htm

 

It is then necessary to turn on SSI support in every directory container in which you wish to use it. If SSI is not working on a virtual host, check this command is present in that virtual host's directory container:

 

<Directory /home/www/trampolining.net>

Options Indexes FollowSymLinks Includes ExecCGI MultiViews

Options +Includes

AllowOverride None

Order allow,deny

Allow from all

</Directory>

 

For more information on Apache SSI look up http://www.apache.org/docs/mod/mod_include.html

Common Gateway Interface

Better known as CGI, this technology is the simplest way to deploy interactive content on your web site. Scripts are freely available to perform everything from form handling to maintaining complete discussion forums. Scripts are usually written in Perl and interpreted as they are used. However as with any program running on your server, they represent a potential security risk. It is possible to configure Apache to interpret scripts from anywhere on the system, but this means anyone with access to directories containing web pages can create potentially harmful scripts.

 

To minimize this risk, CGI scripts are run from a special directory, usually called cgi-bin, and are given file permissions that allow remote users to execute them but allow write access only to root. The first line of the Perl script must also be changed to give the location of the Perl interpreter on your system; type which perl to find it.
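The two steps just described, pointing the first line at the local Perl interpreter and tightening the permissions, can be sketched as a small shell function. The name prepare_cgi_script is hypothetical, the sketch assumes the script's first line is its #! line, and any interpreter flags on that line (such as -w) are dropped.

```shell
#!/bin/sh
# prepare_cgi_script FILE [PERL_PATH] (hypothetical helper):
# rewrite the first line of FILE to name the Perl interpreter and make the
# script executable by everyone but writable only by its owner.
prepare_cgi_script() {
    # Use the given interpreter path, or ask the shell where perl lives.
    perl_path=${2:-$(command -v perl)}
    [ -n "$perl_path" ] || return 1
    tmp=$(mktemp) || return 1
    # New #! line first, then everything from line 2 onwards unchanged.
    { printf '#!%s\n' "$perl_path"; tail -n +2 "$1"; } > "$tmp" &&
        cat "$tmp" > "$1"
    rm -f "$tmp"
    # rwxr-xr-x: the owner may write; everyone may read and execute.
    chmod 755 "$1"
    # To make root the only writer, also run as root: chown root "$1"
}
```

It would be called as, for example, prepare_cgi_script /home/www/cgi/script.pl, using the CGI directory from this chapter's examples.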

 

The httpd.conf file already contains the necessary directives in the primary server section, so we just need to uncomment them and change any locations if necessary. Note the trailing slashes:

 

ScriptAlias /cgi-bin/ "/home/www/cgi/"

 

The above directive tells Apache to treat any request to /cgi-bin/ as a request for a script, and to look for that script in the server directory /home/www/cgi/. This is inherited by any virtual hosts, unless we define a different ScriptAlias in the corresponding VirtualHost container, so in this example, http://www.trampolining.net/cgi-bin/script.pl and http://www.sport-science.net/cgi-bin/script.pl will each point to /home/www/cgi/script.pl.

 

Now look at this Directory container:

 

<Directory /home/www/cgi/>

    AllowOverride None

    Options None

    Order allow,deny

    Allow from all

</Directory>

 

This sets the permissions for your CGI directory to the absolute minimum necessary to run scripts. No-one will actually be able to read the scripts as any request will instead run them. These minimum permissions will also make life more difficult for hackers trying to access your scripts.

mod_perl

The mod_perl module allows Perl scripts to be run very fast by a dedicated Perl interpreter embedded within Apache, which does not need to be started separately for each request. Perl scripts are reported to run between two and twenty times faster than under mod_cgi, depending on the script itself. However, the increased speed of script processing comes at a price.

 

The mod_perl module is complicated to install and configure, and the actual steps needed depend on the versions of mod_perl and Apache being used; it also has three user modes and thirty configuration options during build. Therefore, detailed installation instructions are beyond the scope of this book, though you can get more help from the INSTALL text file that comes with the mod_perl download or from the Apache web site (www.apache.org). Furthermore, the installation of mod_perl will break your existing Apache configuration. It has to be installed first and Apache reinstalled on top, which means that you will have to customize Apache again from scratch. You have to decide, right from the outset, whether to include mod_perl in your server system, as it is currently very difficult to incorporate it later on.

 

The discussion of mod_perl has been left until now because its benefits only become apparent under very heavy server usage. For moderate or low usage, CGI is only marginally slower and there is little advantage in having the increased script processing power that mod_perl offers. The mod_perl module is an advanced tool that should be considered only if very high server usage is anticipated.

An Example Installation

 Below is a very standard installation procedure for mod_perl. Download the module from www.apache.org and uncompress it to /usr/local. Then carry out the following steps.

The installation steps reproduced below are highly simplified and can only be said to work on most systems. You should look up the Apache documentation for more detailed instructions.

# cd /usr/local/mod_perl

# perl Makefile.PL APACHE_SRC=../apache_version/src \

> DO_HTTPD=1 USE_DSO=1 USE_APACI=1 EVERYTHING=1

# make && make test && make install

# cd ../apache_x.x.x

# make install

 

After the installation is complete, and mod_perl and Apache are working as they should, then Apache will need to be configured for mod_perl. This consists of adding a few directives to httpd.conf. The first tells Apache to look for /home/www/fast-perl/anyscript.cgi given a request for www.trampolining.net/fast-perl/anyscript.cgi.

 

Alias /fast-perl/ /home/www/fast-perl/

 

The next lines tell Apache to allow scripts to be executed in this directory, and to execute them by passing them to mod_perl:

 

<Location /fast-perl>

   AllowOverride None

   SetHandler perl-script

   PerlHandler Apache::PerlRun

   Options ExecCGI

   allow from all

   PerlSendHeader On

</Location>

 

mod_perl is a powerful and configurable module. Much more information on configuration is available from the Apache on-line documentation.

Java Servlets

Java is a programming language developed by Sun Microsystems. It is unusual in that, once compiled, Java programs will run on any machine, architecture or operating system with the help of a Java Virtual Machine (JVM). The compiled program (on the server side, a servlet) is not designed to run on any specific machine but instead on a JVM, a piece of software which provides a standard set of commands like that of a chipset. JVMs have been developed for nearly all the important operating systems, so well-written code should work on any platform without recompilation. This cross-platform portability is an important feature of the Java development environment: your development resources will not be made obsolete by new hardware, and your investment will survive a change of platform.

 

Java servlets are called by the browser, but are run on the server with the results being sent to the browser. This eliminates any need to worry about the browser type as no code is sent. It is possible to implement infinitely complex algorithms using Java servlets, but if the servlet is designed to return output as pure HTML, the results will be viewable by even the simplest text based browsers.

 

Servlet support in Apache is performed using ApacheJServ, a fully featured Java servlet runtime container supporting all commands up to JSDK 2.0. While the ApacheJServ modules are not particularly big, the Java Development Kit which is required to compile servlets and provide the JVM, is a huge 45 MB in size (the zipped archive is just over 19MB in size), and using servlets will also cause a step increase in memory requirement of around 32MB due to the JVM. However, the benefits of servlet technology far outweigh the cost of set up, so read on!

 

Servlets are relatively more difficult to configure than CGI, and Java may take some getting used to; it is a very powerful language with many similarities to C++. However, Java offers the increased security of its in-built security model, which makes it much more difficult for hackers to cause damage by passing harmful system commands to the servlet. Complex tasks like chat-rooms or server-side parts of games are also ideally suited to Java, because you can create servlets which will stay alive right from their initial instantiation. While Perl is ideally suited to text processing applications, Java can be used to develop code of infinite complexity with extensions available to make multi-tier distributed applications possible. With the help of MySQL, details of which you will find at www.mysql.org, it is possible to use SQL databases. You will find a more complete discussion of these topics in the Wrox publication Professional Java Server Programming.

 

And so we come to installing ApacheJServ.

 

To run ApacheJServ requires the Java Development Kit 1.2 for glibc 2.1 from http://www.blackdown.org. (Note that older Linux distributions may require the glibc 2.0 version.) The JDK is currently only available as a bzip2 archive, so you will need to install the bzip2 utility as well (http://sourceware.cygnus.com/bzip2/). You will also need the Java Servlet Development Kit (JSDK) version 2.0 from http://java.sun.com/products/Servlet. Download JServ from http://java.apache.org, extract it into the /usr/local/ApacheJServ-1.0 directory and type the following commands:

      

# mkdir /usr/local/apache/src/modules/jserv

# cd /usr/local/ApacheJServ-1.0

# ./configure --prefix=/usr/local/ApacheJServ-1.0 \
> --with-apache-install=/usr/local/apache \
> --with-jsdk=/usr/local/JSDK2.0/lib/jsdk.jar

# make

# make install

 

ApacheJServ should now be installed and configured. Open httpd.conf for editing and add this directive to the very end of the file:

 

Include /usr/local/ApacheJServ-1.0/example/jserv.conf

 

Appending this command forces Apache to read jserv.conf from its installed location. Future versions of ApacheJServ may instead install this file in the same directory as httpd.conf. The jserv.conf file contains all the commands to configure the Apache side of ApacheJServ.

 

Restart Apache, give JServ a moment or two to begin accepting requests, and if everything works, visiting http://localhost/example/Hello should produce a success page!

 

 

If this does not work, then your version of ApacheJServ configures /servlet as the test zone, which means that you would have to type http://localhost/servlet/Hello.

Java Server Pages

While Java servlets offer boundless possibilities for powerful server-side processing, for simple applications they can be quite unwieldy. Perhaps you want to insert the time and date at one point on your page, and perform a calculation at another; using JavaScript or a Java Applet prevents older browsers viewing your page correctly. You could use a single servlet to create the whole page. However, the page content itself is now mixed up within Java code, making maintenance difficult, particularly if the programmers and web designers are different groups of people. Alternatively, you could keep the page content in an HTML file which uses Server-Side Includes to call successive CGI scripts to insert the correct text at each point. This way the web designers can maintain the HTML without worrying about the code. However, this simple page now has one HTML file and several CGI scripts associated with it, which again makes maintenance complicated.

 

For simple applications, the ideal solution would be to have the Java code and HTML contained in a single file. It will have the look and 'feel' of HTML, so the web designers can understand it, but would contain additional code which would be run on the server before delivering the page back to the client. Sun's new member of the Java family, JavaServer Pages (JSP), provides this solution. Code can be inserted in line within the HTML, which is executed on the server and the results merged with the HTML in the output. This parallels how Microsoft's ASP works, and JSP is emerging as the open source challenger to ASP in this field.

 

The file which leaves the server is pure HTML, so unlike JavaScript and Java Applets, which have to be run on the client, you can have the interactivity and programming flexibility of Java while ensuring that all existing HTML browsers can display the output. Furthermore, you maintain all the advantages of Java's portability should you later decide to change operating system or web server. There are already many web sites that use JSP instead of ASP.

 

Up until recently, the main open source JSP implementations were GNU Server Pages (GSP) and GNU Java Server Pages (GNUJSP), which are independent development efforts despite their similar names. Both are written as regular Java servlets, and although they are difficult to install and configure, they can be used to create JSPs and develop web sites. Information on GSP and GNUJSP can be found at www.bitmechanic.com and www.klomp.org/gnujsp respectively.

 

However, JSP support in Apache now is in the form of a module called Jakarta, named after the project team which implemented it (or the largest city on the Indonesian island of Java which might or might not be a coincidence). At time of going to press, Jakarta is in final pre-release form, so by the time you read this Jakarta will almost certainly be in production release. The latest version of Jakarta and its installation instructions are available online at http://jakarta.apache.org.

 
