June 5, 2000
Got a tough Linux deployment question? Ask the experts! For a limited time, you can put the authors of "Deploying Web and FTP Servers" to the test. Post your question, and if they answer it, you'll receive a free Network Computing collectable.
Technologies for Effective Sites
Undoubtedly the biggest costs in deploying any web site are design and maintenance. If you have used ASP in the past, you will be aware of the focus on reducing maintenance costs. If you have to edit every page on the server by hand each time you make a content or design change, maintenance becomes prohibitively expensive and error-prone. Server-Side Includes (SSI) allow text that is common to every page to be specified in one file. The other pages can then include it with an SSI command before the document is sent to the client. Furthermore, if the site is to present more than purely static content, it needs some way of adapting the content it serves to the actions of the client. It needs the intelligence provided by programming languages, such as the ability to perform calculations, handle information and provide feedback to the user.
CGI (Common Gateway Interface) allows scripts to run on the server. The main language used is Perl, a scripting language that provides excellent text-processing power as well as standard programming tools. These scripts can handle user input, process it, store it and return customized pages. Apache supports standard CGI scripts, and it can also run Perl scripts much faster using mod_perl.
Java servlets and JavaServer Pages offer alternatives to CGI for fast server-side processing. Java servlets are complete programs that run on your server, providing a portable development environment for your applications. Apache implements full servlet support with the help of ApacheJServ, which we will install later in this section.
JavaServer Pages (JSP) come from the same family as servlets. Instead of being separate programs, however, they are HTML files with Java code inserted in-line and executed before the document leaves the server. This combines the power of Java with the simplicity of in-line code. Support for JSP is provided by the Jakarta project, which we will briefly discuss later on.
Server-Side Includes
The real trampolining.net web site contains over 100 separate HTML pages, and that may be small compared to the web sites you plan to deploy. Nearly every page follows exactly the same format in terms of design and layout, including a copyright statement at the end of every page. At the end of the year, all 100 pages will need the copyright statement updated to read, for example, © 1999, 2000.
Updating all these pages manually would be tedious and error-prone. Instead, Server-Side Includes (SSI) are used to include one HTML file within the others. Any commonly repeated text can be inserted using SSI, as in this outline of a page:
Page source                                  Effect
<!--#include virtual="stylesheet.shtml" -->  Includes a standard stylesheet
<!--#include virtual="navbar.shtml" -->      Starts the page table and includes a standard navigation bar
Main content here
<!--#include virtual="copyright.shtml" -->   Closes the table and adds the copyright statement
The include commands insert the contents of the named files at that point. The named file is relative to the directory of the main file. Subdirectories can be accessed (e.g. <!--#include virtual="subdir1/included.html" -->), and so can files in parent directories (e.g. <!--#include virtual="../fromparent.html" -->). The included files can themselves contain SSI commands if they end in .shtml. These inclusions are all performed before the document leaves the server, so the client only ever sees a normal HTML page. At the end of the year, I will only need to change copyright.shtml to update every page on my site, a huge saving in maintenance time.
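To make the mechanism concrete, here is a minimal Python sketch of include expansion. It is an illustration only, not the code of Apache's mod_include. The dictionary `site` stands in for the document tree. Names are resolved relative to the including file's directory, and only included files ending in .shtml are parsed again, as described above. The nesting limit is an arbitrary safeguard chosen for this sketch.

```python
import posixpath
import re

# Matches directives such as <!--#include virtual="navbar.shtml" -->
INCLUDE_RE = re.compile(r'<!--#include\s+virtual="([^"]+)"\s*-->')

def expand_includes(path, files, depth=0):
    """Return the contents of `path` with include directives expanded.

    `files` maps URL paths to file contents and stands in for the
    server's document tree (an assumption made for this sketch).
    """
    if depth > 10:  # arbitrary guard against files that include each other
        raise RecursionError("includes nested too deeply")
    base = posixpath.dirname(path)

    def replace(match):
        # Resolve the name relative to the including file's directory,
        # so "subdir1/x.html" and "../x.html" both work
        target = posixpath.normpath(posixpath.join(base, match.group(1)))
        if not target.endswith(".shtml"):
            return files[target]  # plain files are inserted as-is
        return expand_includes(target, files, depth + 1)  # parsed again

    return INCLUDE_RE.sub(replace, files[path])

site = {
    "/index.shtml": 'Main content\n<!--#include virtual="copyright.shtml" -->',
    "/copyright.shtml": '<p><!--#include virtual="year.html" --></p>',
    "/year.html": "(c) 1999, 2000",
}
print(expand_includes("/index.shtml", site))
```

Because all expansion happens before anything is sent, the result is ordinary text with no trace of the directives. Editing the single year.html entry changes every page that includes it.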
SSI provides other useful commands. For example, CGI scripts can be called using the <!--#exec cgi="/cgi-bin/script.pl" --> command, and the output is written directly into the page sent to the client. Because the client receives only the script's output, it never learns that the script exists, which makes this a useful security aid.
You can insert text that updates automatically using the <!--#echo var="LAST_MODIFIED" --> command, which shows the date the page was last modified. The echo command can insert any of an extended set of standard variables each time the page is requested. DATE_LOCAL, for example, gives the current date. Full listings of the available commands and variables are available online in the Apache mod_include documentation.
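Variable substitution works in a similar way. In the hypothetical Python sketch below, each echo directive is replaced by a value from a dictionary built for the request. LAST_MODIFIED comes from the file's modification time, and DATE_LOCAL comes from the current time. The date format and the "(none)" placeholder for unknown variables are assumptions made for illustration.

```python
import re
import time

# Matches directives such as <!--#echo var="LAST_MODIFIED" -->
ECHO_RE = re.compile(r'<!--#echo\s+var="([A-Z_]+)"\s*-->')

def expand_echo(text, variables):
    """Replace each echo directive with the named variable's value."""
    # "(none)" for unknown variables is a placeholder chosen for this sketch
    return ECHO_RE.sub(lambda m: variables.get(m.group(1), "(none)"), text)

def request_variables(mtime):
    """Build the variables for one request from the page's modification time."""
    fmt = "%A, %d-%b-%Y %H:%M:%S"  # assumed display format
    return {
        "LAST_MODIFIED": time.strftime(fmt, time.localtime(mtime)),
        "DATE_LOCAL": time.strftime(fmt, time.localtime()),
    }

page = 'Last updated: <!--#echo var="LAST_MODIFIED" -->'
print(expand_echo(page, request_variables(time.time())))
```

In a real server, the modification time would come from the requested file, for instance via os.path.getmtime.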
Apache provides excellent support for SSI with just a few directives. Because parsing for SSI increases server load, it is traditional to give any file containing SSI commands the suffix .shtml. If Apache is set up to parse only *.shtml files, it won't waste time attempting to parse normal HTML (*.html) files. This part of the configuration goes in the primary server section of httpd.conf, outside of any <Directory> containers. In fact, the directives are already there, about three quarters of the way through the file, and just need uncommenting. These two lines tell Apache which content type to assign to .shtml files, and they tell the internal SSI handler to parse .shtml files before serving them to the client:
AddType text/html .shtml
AddHandler server-parsed .shtml
Adding index.shtml to the front of the following directive makes Apache prefer .shtml files as directory indexes. In other words, if index.shtml exists in the root directory of my server, it will be served to anyone requesting http://www.trampolining.net.
DirectoryIndex index.shtml index.html index.htm
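The order of names in DirectoryIndex matters. For a request that names a directory, the server tries each file in turn and serves the first one that exists. Below is a small Python model of that lookup. A set of paths stands in for the filesystem, and the function name is hypothetical.

```python
import posixpath

# Preference order, as in the DirectoryIndex line above
DIRECTORY_INDEX = ["index.shtml", "index.html", "index.htm"]

def resolve_index(directory, existing, candidates=DIRECTORY_INDEX):
    """Return the first candidate index file present in `directory`, or None.

    `existing` is a set of paths standing in for the filesystem.
    """
    for name in candidates:
        path = posixpath.join(directory, name)
        if path in existing:
            return path
    # Apache would then list the directory (if Options Indexes is set)
    # or refuse the request
    return None

files = {"/home/www/trampolining.net/index.html",
         "/home/www/trampolining.net/index.shtml"}
print(resolve_index("/home/www/trampolining.net", files))
# prints /home/www/trampolining.net/index.shtml
```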
You must then turn on SSI support in every directory container in which you wish to use it. If SSI is not working on a virtual host, check that the Options +Includes line is present in that virtual host's directory container:
<Directory /home/www/trampolining.net>
    Options Indexes FollowSymLinks Includes ExecCGI MultiViews
    Options +Includes
    AllowOverride None
    Order allow,deny
    Allow from all
</Directory>
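An Options line whose entries all carry + or - adjusts the options already in effect. A line without prefixes replaces them outright. This is why Options +Includes turns on SSI without disturbing anything else. The Python sketch below models that rule for successive Options lines. It is a simplification: it ignores how settings are inherited between containers, and it does not handle lines that mix prefixed and unprefixed entries.

```python
def apply_options(current, directive):
    """Apply one Options line, e.g. "Options +Includes", to a set of options."""
    words = directive.split()[1:]  # drop the word "Options" itself
    if words and all(w[0] in "+-" for w in words):
        # Every entry is prefixed: adjust the options already in effect
        result = set(current)
        for w in words:
            if w[0] == "+":
                result.add(w[1:])
            else:
                result.discard(w[1:])
        return result
    if words == ["None"]:
        return set()  # "Options None" turns everything off
    # No prefixes: the listed options replace the current set
    return set(words)

opts = apply_options(set(), "Options Indexes FollowSymLinks ExecCGI MultiViews")
opts = apply_options(opts, "Options +Includes")
print(sorted(opts))
# prints ['ExecCGI', 'FollowSymLinks', 'Includes', 'Indexes', 'MultiViews']
```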
For more information on Apache SSI, see http://www.apache.org/docs/mod/mod_include.html.
Common Gateway Interface
Better known as CGI, this technology is the simplest way to deploy interactive content on your web site. Scripts are freely available to perform everything from form handling to running complete discussion forums. Scripts are usually written in Perl and interpreted each time they are run. However, as with any program running on your server, they represent a potential security risk. It is possible to configure Apache to run scripts from anywhere on the system, but then anyone with access to directories containing web pages can create potentially harmful scripts.
To minimize this risk, CGI scripts are run from a special directory, usually called cgi-bin. Their file permissions let them be executed on behalf of remote users but allow only root to write to them. The first line of each Perl script must also be changed to give the location of the Perl interpreter on your system; type which perl to find it.
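The permission and first-line rules lend themselves to an automated check. In the sketch below, the function names are hypothetical. It uses Python's os.stat and stat module to confirm that a script is owned by root, executable by others and not writable by group or others. It then compares the script's #! line against the perl found on the PATH, much as typing which perl would. It assumes the #! line names the interpreter directly rather than going through /usr/bin/env.

```python
import os
import shutil
import stat

def mode_ok(mode, uid):
    """True if owned by root, executable by others, and not writable
    by group or others (e.g. mode 755 owned by root)."""
    return (
        uid == 0
        and bool(mode & stat.S_IXOTH)
        and not mode & (stat.S_IWGRP | stat.S_IWOTH)
    )

def interpreter_of(script_text):
    """Return the interpreter path named on a '#!' first line, or None."""
    lines = script_text.splitlines()
    if not lines or not lines[0].startswith("#!"):
        return None
    parts = lines[0][2:].split()
    return parts[0] if parts else None

def check_script(path):
    """Return a list of problems found with one CGI script (empty if none)."""
    problems = []
    st = os.stat(path)
    if not mode_ok(st.st_mode, st.st_uid):
        problems.append("permissions: expected root-owned, not group/other writable")
    with open(path, encoding="utf-8", errors="replace") as f:
        interpreter = interpreter_of(f.read())
    perl = shutil.which("perl")  # similar to typing `which perl`
    if interpreter is None:
        problems.append("no #! line naming the interpreter")
    elif perl and os.path.realpath(interpreter) != os.path.realpath(perl):
        problems.append("#! line names %s but perl is at %s" % (interpreter, perl))
    return problems

print(interpreter_of("#!/usr/bin/perl -w\nprint 'hi';\n"))
# prints /usr/bin/perl
```

Running check_script over every file in the CGI directory after an upgrade is one way to catch scripts whose first line no longer matches the installed interpreter.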
The httpd.conf file already contains the necessary directives in the primary server section, so we just need to uncomment them and change any locations if necessary (note the trailing slashes):
ScriptAlias /cgi-bin/ "/home/www/cgi/"
The above directive tells Apache to treat any request to /cgi-bin/ as a request for a script, and to look for that script in the server directory /home/www/cgi/. This setting is inherited by all virtual hosts unless a different ScriptAlias is defined in the corresponding VirtualHost container. In this example, therefore, http://www.trampolining.net/cgi-bin/script.pl and http://www.sport-science.net/cgi-bin/script.pl both point to /home/www/cgi/script.pl.
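The way ScriptAlias maps requests, including the inheritance by virtual hosts, can be modeled in a few lines of Python. The per-host override dictionary and the /home/www/sport-cgi/ directory below are hypothetical. The sketch also skips the path normalization and security checks that Apache itself performs.

```python
import posixpath

# Server-wide setting from httpd.conf: ScriptAlias /cgi-bin/ "/home/www/cgi/"
DEFAULT_SCRIPT_ALIAS = ("/cgi-bin/", "/home/www/cgi/")

def script_path(host, url_path, vhost_aliases):
    """Return the script file a request would run, or None if it is not a CGI URL.

    `vhost_aliases` maps host names to their own (prefix, directory) pair;
    hosts without an entry inherit the server-wide ScriptAlias.
    """
    prefix, directory = vhost_aliases.get(host, DEFAULT_SCRIPT_ALIAS)
    if not url_path.startswith(prefix):
        return None
    return posixpath.join(directory, url_path[len(prefix):])

for host in ("www.trampolining.net", "www.sport-science.net"):
    print(host, script_path(host, "/cgi-bin/script.pl", {}))
# each line ends with /home/www/cgi/script.pl

# A hypothetical override for one virtual host
overrides = {"www.sport-science.net": ("/cgi-bin/", "/home/www/sport-cgi/")}
print(script_path("www.sport-science.net", "/cgi-bin/script.pl", overrides))
# prints /home/www/sport-cgi/script.pl
```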
Now look at this Directory container:
<Directory /home/www/cgi/>
    AllowOverride None
    Options None
    Order allow,deny
    Allow from all
</Directory>
This sets the permissions for your CGI directory to the absolute minimum necessary to run scripts. No one will be able to read the scripts, because any request for one will run it instead. These minimum permissions also make life more difficult for hackers trying to access your scripts.
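Once the directory is set up, scripts copied into /home/www/cgi/ can be run through /cgi-bin/. For illustration, here is a minimal CGI script. It is written in Python rather than Perl, but the contract with the server is the same. Apache passes request details in environment variables such as QUERY_STRING, and the script prints a Content-Type header and a blank line before the page itself. The `name` parameter and the greeting are made up.

```python
#!/usr/bin/env python3
# The first line must point at the interpreter, just as with a Perl script
import html
import os
import sys
from urllib.parse import parse_qs

def respond(query_string):
    """Build a complete CGI response: header, blank line, then the page."""
    params = parse_qs(query_string)
    name = params.get("name", ["visitor"])[0]  # "name" is a made-up parameter
    # Escape user input so it cannot inject HTML into the page
    body = "<html><body><p>Hello, %s!</p></body></html>\n" % html.escape(name)
    return "Content-Type: text/html\n\n" + body

if __name__ == "__main__":
    # Apache passes the part of the URL after "?" in QUERY_STRING
    sys.stdout.write(respond(os.environ.get("QUERY_STRING", "")))
```

Saved with the permissions described above under a hypothetical name such as hello.py, a request for /cgi-bin/hello.py?name=Ann would return a page greeting Ann.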
mod_perl
The mod_perl module allows Perl scripts to run very fast by using a Perl interpreter embedded in Apache, so the interpreter does not need to be started separately for each request. Perl scripts are reported to run between two and twenty times faster than under mod_cgi, depending on the script itself. However, the increased speed of script processing comes at a price.
mod_perl is a complex module that is difficult to install and configure, and the exact steps depend on the versions of mod_perl and Apache being used. It also has three user modes and thirty build-time configuration options. Detailed installation instructions are therefore beyond the scope of this book, though you can get more help from the INSTALL text file that comes with the mod_perl download or from the Apache web site (www.apache.org). Furthermore, installing mod_perl will break your existing Apache configuration. mod_perl has to be installed first and Apache reinstalled on top of it, which means that you will have to customize Apache again from scratch. You therefore have to decide from the outset whether to include mod_perl in your server system, as it is currently very difficult to add it later on.
Discussion of mod_perl has been left until now because its benefits only become apparent under very heavy server usage. For moderate or low usage, CGI is only marginally slower, and there is little advantage in the extra script-processing power that mod_perl offers. mod_perl is an advanced module that is worth considering only if very heavy server usage is anticipated.
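Much of the speed-up described above comes from not starting and compiling everything afresh for each request. The Python sketch below is only an analogy. It counts how often a script's source is compiled in two cases. In the first, each request starts from scratch, as under CGI. In the second, the compiled code is kept in memory, roughly as mod_perl's Apache::Registry handler does. (The Apache::PerlRun handler keeps the interpreter resident but recompiles the script for each request.) The sketch does not model process start-up, which is the other major saving.

```python
SCRIPT = "result = sum(range(n))"  # stand-in for a real script

class PerRequestRunner:
    """CGI-style: compile the script afresh for every request."""

    def __init__(self, source):
        self.source = source
        self.compilations = 0

    def handle(self, n):
        code = compile(self.source, "<script>", "exec")
        self.compilations += 1
        scope = {"n": n}
        exec(code, scope)
        return scope["result"]

class PersistentRunner(PerRequestRunner):
    """Persistent-interpreter analogy: compile once, reuse afterwards."""

    def __init__(self, source):
        super().__init__(source)
        self._code = None

    def handle(self, n):
        if self._code is None:
            self._code = compile(self.source, "<script>", "exec")
            self.compilations += 1
        scope = {"n": n}
        exec(self._code, scope)
        return scope["result"]

cgi, persistent = PerRequestRunner(SCRIPT), PersistentRunner(SCRIPT)
for n in range(100):
    cgi.handle(n)
    persistent.handle(n)
print(cgi.compilations, persistent.compilations)
# prints 100 1
```

Both runners return identical results. Only the repeated work differs, and that saving matters little until requests arrive in large numbers.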
An Example Installation
Below is a very standard installation procedure for mod_perl. Download the module from www.apache.org, uncompress it to /usr/local, and then carry out the following steps. These steps are highly simplified. They should work on most systems, but not necessarily on all. Consult the Apache documentation for more detailed instructions.
# cd /usr/local/mod_perl
# perl Makefile.PL APACHE_SRC=../apache_x.x.x/src \
> DO_HTTPD=1 USE_DSO=1 USE_APACI=1 EVERYTHING=1
# make && make test && make install
# cd ../apache_x.x.x
# make install
Once the installation is complete and mod_perl and Apache are working as they should, Apache needs to be configured for mod_perl. This consists of adding a few directives to httpd.conf. The first tells Apache to look for /home/www/fast-perl/anyscript.cgi when it receives a request for www.trampolining.net/fast-perl/anyscript.cgi:
Alias /fast-perl/ /home/www/fast-perl/
The next lines tell Apache to allow scripts to be executed in this location, and to execute them by passing them to mod_perl:
<Location /fast-perl>
    AllowOverride None
    SetHandler perl-script
    PerlHandler Apache::PerlRun
    Options ExecCGI
    Allow from all
    PerlSendHeader On
</Location>
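The Alias directive swaps the matched URL prefix for the directory path and appends the rest of the URL unchanged. A trailing slash on one argument but not the other therefore runs the names together. The Python function below is a simplified, hypothetical model of that substitution, not mod_alias itself.

```python
def apply_alias(url_path, url_prefix, directory):
    """Map url_path to a file by swapping url_prefix for directory.

    The rest of the URL is appended unchanged; no slash is added.
    """
    if not url_path.startswith(url_prefix):
        return None  # the alias does not apply to this request
    return directory + url_path[len(url_prefix):]

request = "/fast-perl/anyscript.cgi"
print(apply_alias(request, "/fast-perl/", "/home/www/fast-perl/"))
# prints /home/www/fast-perl/anyscript.cgi
print(apply_alias(request, "/fast-perl/", "/home/www/fast-perl"))
# prints /home/www/fast-perlanyscript.cgi (the slash is lost)
```

The model also shows a related consequence. With a trailing slash on the URL prefix, a request for /fast-perl without the slash is not matched at all.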
mod_perl is a powerful and configurable
module. Much more information on configuration is available from the Apache
on-line documentation.
Java Servlets
Java is a programming language developed by Sun Microsystems. It is distinctive in that, once compiled, Java programs will run on any machine, architecture or operating system with the help of a Java Virtual Machine (JVM). A compiled Java program, such as a servlet, is not designed to run on any specific machine but on a JVM instead. A JVM is a piece of software that provides a standard instruction set, much like that of a processor chip. JVMs have been developed for nearly all the important operating systems, so well-written code should work on any platform without recompilation. This cross-platform portability is an important feature of the Java development environment. It ensures that your development resources will not be made obsolete by new hardware, and your investment will survive a change of platform.
Java servlets are invoked by the browser but run on the server, and only their results are sent to the browser. Because no code is sent, there is no need to worry about the browser type. Servlets can implement algorithms of any complexity, but if a servlet is designed to return pure HTML, the results will be viewable in even the simplest text-based browsers.
Servlet support in Apache is provided by ApacheJServ, a fully featured Java servlet runtime container that supports the servlet API up to JSDK 2.0. The ApacheJServ modules are not particularly big. However, the Java Development Kit, which is required to compile servlets and to provide the JVM, is a huge 45MB in size (the compressed archive is just over 19MB). Using servlets will also increase memory requirements by around 32MB because of the JVM. Even so, the benefits of servlet technology far outweigh the cost of setting it up, so read on!
Servlets are more difficult to configure than CGI scripts, and Java may take some getting used to: it is a very powerful language with many similarities to C++. However, Java offers the increased security of its built-in security model, which makes it much more difficult for hackers to cause damage by passing harmful system commands to a servlet. Complex tasks like chat rooms or the server-side parts of games are also ideally suited to Java. This is because a servlet stays alive from its initial instantiation and can keep information between requests. Perl is ideally suited to text-processing applications, but Java can be used to develop code of great complexity, with extensions available that make multi-tier distributed applications possible. Servlets can also use SQL databases such as MySQL, details of which you will find at www.mysql.org. You will find a more complete discussion of these topics in the Wrox publication Professional Java Server Programming.
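What makes servlets suit chat rooms and games is their lifetime. A servlet container typically creates one servlet instance and reuses it for many requests, possibly from several threads at once, so the servlet can hold state in memory. A CGI script, by contrast, starts afresh for every request. The sketch below is a Python analogy rather than the Java Servlet API, and the ChatRoom class and its methods are hypothetical.

```python
import threading

class ChatRoom:
    """Long-lived handler: one instance serves every request."""

    def __init__(self, history_size=50):
        self.history_size = history_size
        self.messages = []
        # Requests may arrive on several threads at once
        self._lock = threading.Lock()

    def handle_post(self, user, text):
        with self._lock:
            self.messages.append("%s: %s" % (user, text))
            # Keep only the most recent messages
            del self.messages[:-self.history_size]

    def handle_get(self):
        with self._lock:
            return "\n".join(self.messages)

room = ChatRoom(history_size=2)
room.handle_post("ann", "hi")
room.handle_post("bob", "hello")
room.handle_post("ann", "bye")
print(room.handle_get())
```

The messages survive between calls only because the same object handles every request. A CGI version would have to write them to a file or database and read them back each time.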
And so we come to installing ApacheJServ.
ApacheJServ requires the Java Development Kit 1.2 for glibc 2.1, available from http://www.blackdown.org. (Note that older Linux distributions may require the glibc 2.0 version.) The JDK is currently only available as a bzip2 archive, so you will also need to install the bzip2 utility (http://sourceware.cygnus.com/bzip2/). You will also need the Java Servlet Development Kit (JSDK) version 2.0 from http://java.sun.com/products/Servlet. Download ApacheJServ from http://java.apache.org, extract it into the /usr/local/ApacheJServ-1.0 directory, and type the following commands:
# mkdir /usr/local/apache/src/modules/jserv
# cd /usr/local/ApacheJServ-1.0
# ./configure --prefix=/usr/local/ApacheJServ-1.0 \
> --with-apache-install=/usr/local/apache \
> --with-jsdk=/usr/local/JSDK2.0/lib/jsdk.jar
# make
# make install
ApacheJServ should now be installed and configured. Open httpd.conf for editing and add this directive to the very end of the file:
Include /usr/local/ApacheJServ-1.0/example/jserv.conf
Appending this directive makes Apache read jserv.conf from its installed location. Future versions of ApacheJServ may instead install this file in the same directory as httpd.conf. The jserv.conf file contains all the directives needed to configure the Apache side of ApacheJServ.
Restart Apache and give JServ a moment or two to begin accepting requests. If everything works, visiting http://localhost/example/Hello should produce a success page!

If this does not work, your version of ApacheJServ may configure /servlet as the test zone instead. In that case, you would need to visit http://localhost/servlet/Hello.
JavaServer Pages
While Java servlets offer boundless possibilities for powerful server-side processing, they can be quite unwieldy for simple applications. Perhaps you want to insert the time and date at one point on your page and perform a calculation at another. Using JavaScript or a Java applet for this would prevent older browsers from viewing your page correctly. You could use a single servlet to create the whole page, but then the page content is mixed up with Java code. That makes maintenance difficult, particularly if the programmers and web designers are different groups of people. Alternatively, you could keep the page content in an HTML file that uses Server-Side Includes to call successive CGI scripts to insert the correct text at each point. This way the web designers can maintain the HTML without worrying about the code. However, this simple page now has one HTML file and several CGI scripts associated with it, which again makes maintenance complicated.
For simple applications, the ideal solution would be to have the Java code and HTML contained in a single file. It would have the look and feel of HTML, so the web designers could understand it. It would also contain additional code, to be run on the server before the page is delivered to the client. Sun's new member of the Java family, JavaServer Pages (JSP), provides this solution. Code can be inserted in-line within the HTML. It is executed on the server, and the results are merged with the HTML in the output. This parallels how Microsoft's ASP works, and JSP is emerging as the open source challenger to ASP in this field.
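The Python sketch below is a rough analogy of how in-line code is evaluated on the server and merged with the HTML. It replaces each <%= expression %> with its value. The <%= %> delimiters follow JSP's expression syntax, but everything else is invented for illustration. It is also far simpler than a real JSP engine, which translates each page into a servlet. Because it uses eval, it should only ever be given trusted, server-side templates.

```python
import re

# JSP-style expression delimiters: <%= expression %>
EXPR_RE = re.compile(r"<%=(.*?)%>", re.DOTALL)

def render(template, context):
    """Replace each <%= expression %> with its value, evaluated in `context`."""
    def evaluate(match):
        # eval is tolerable here only because templates are trusted files;
        # built-ins are removed so expressions see just the context values
        value = eval(match.group(1).strip(), {"__builtins__": {}}, dict(context))
        return str(value)
    return EXPR_RE.sub(evaluate, template)

page = "<p>Today is <%= date %>; 6 x 7 = <%= 6 * 7 %></p>"
print(render(page, {"date": "5 June 2000"}))
# prints <p>Today is 5 June 2000; 6 x 7 = 42</p>
```

The designer edits ordinary HTML, and only the marked expressions are evaluated, so the file that leaves the server is plain HTML.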
The file that leaves the server is pure HTML. JavaScript and Java applets have to run on the client, but JSP gives you the interactivity and programming flexibility of Java while ensuring that all existing HTML browsers can display the output. Furthermore, you keep all the advantages of Java's portability should you later decide to change operating system or web server. Many web sites already use JSP instead of ASP.
Until recently, the main open source JSP implementations were GNU Server Pages (GSP) and GNU Java Server Pages (GNUJSP), which are independent development efforts despite their similar names. Both are written as regular Java servlets, and although they are difficult to install and configure, they can be used to create JSPs and develop web sites. Information on GSP and GNUJSP can be found at www.bitmechanic.com and www.klomp.org/gnujsp respectively.
However, JSP support in Apache now comes from Jakarta, named after the project team that implemented it (or after the largest city on the Indonesian island of Java, which may or may not be a coincidence). At the time of going to press, Jakarta is in final pre-release form, so by the time you read this it will almost certainly be in production release. The latest version of Jakarta and its installation instructions are available online at http://jakarta.apache.org.