Chapter 5: Deploying Web and FTP Servers May 22, 2000
Technologies for Effective Sites

Undoubtedly
the biggest cost in deploying any web site is the design and maintenance. If
you have used ASP in the past, you will be aware of the focus on reducing
maintenance costs. If every time you want to make a content or design change,
you need to edit every page on the server by hand, maintenance becomes
prohibitively expensive and error-prone.

Server-Side Includes (SSI) allow text that is common to every page to be specified in one file. The other pages can then include it, using an SSI command, before the document is sent to the client.

Furthermore, if the site is to present more than purely static content, it needs some way of adapting the content it serves to the actions of the client. It needs the intelligence provided by programming languages, such as the ability to perform calculations, handle information and provide feedback to the user.

CGI (Common Gateway Interface) allows scripts to run on the server. The main language used is Perl, a scripting language that provides excellent text processing power as well as standard programming tools. These scripts can handle user input, process it, store it and return customized pages. Apache implements standard script support, and can also provide much faster script support using mod_perl.

Alternatives to CGI for fast server-side processing come with Java servlets and JavaServer Pages. Java servlets are complete programs which run on your server, providing a complete, portable development environment for your applications. Apache implements full servlet support with the help of ApacheJServ, which we will install later on in this section.

JavaServer Pages (JSP) are from the same family as servlets, but instead of being separate programs, they are HTML files with Java code inserted in-line and executed before the document leaves the server, combining the programming power of Java with the simplicity of in-line code. Support for JSP is provided by the Jakarta project, which we will briefly discuss later on.

Server-Side Includes
The real trampolining.net web site contains over 100 separate HTML pages, and
that may be small compared to the web sites you plan to deploy. Nearly every
page follows exactly the same format in terms of design and layout, including a
copyright statement at the end of every page. At the end of the year, all 100
pages will need the copyright statement updated to read, for example, © 1999,
2000. To
attempt to update all these manually would be tedious and error-prone. Instead,
Server-Side Includes (SSI) are used to include one HTML file within the others.
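As a rough illustration of what the server does with such a page, the sketch below expands include commands in text held in memory. It is a toy under stated assumptions, not Apache's implementation: the expand_includes function, the files dictionary standing in for the document directory, and the failure message are all hypothetical. Only the <!--#include virtual="..." --> syntax is taken from the SSI commands used in this chapter.

```python
import re

# Matches the SSI include command, e.g. <!--#include virtual="navbar.shtml" -->
INCLUDE = re.compile(r'<!--#include\s+virtual="([^"]+)"\s*-->')

def expand_includes(text, files, depth=0):
    """Replace each include command with the named file's (expanded) contents.

    `files` is a hypothetical in-memory stand-in for the document directory.
    """
    if depth > 10:  # crude guard against files that include each other
        raise RecursionError("includes nested too deeply")

    def substitute(match):
        name = match.group(1)
        if name not in files:
            return "[include failed: %s]" % name  # hypothetical error text
        return expand_includes(files[name], files, depth + 1)

    return INCLUDE.sub(substitute, text)

files = {
    "copyright.shtml": "<p>&copy; 1999, 2000</p>",
    "index.shtml": '<h1>Welcome</h1>\n<!--#include virtual="copyright.shtml" -->',
}
page = expand_includes(files["index.shtml"], files)
print(page)
```

Changing the single copyright.shtml entry changes every page that includes it, which is the maintenance saving described above.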
Any commonly repeated text could be inserted using SSI:

    <!--#include virtual="stylesheet.shtml" -->
    <!--#include virtual="navbar.shtml" -->
    Main content here
    <!--#include virtual="copyright.shtml" -->

The first command includes a standard stylesheet, the second starts the page table and includes a standard navigation bar, and the third closes the table and adds the copyright statement.

The include commands insert the contents of the named files at that point. The named file is relative to the directory of the main file; subdirectories can also be accessed.

SSI includes other useful commands: CGI scripts can be called from within a page, and you can insert text which updates automatically.

Apache provides excellent support for SSI with just a few commands. Because SSI increases server load, it is traditional to suffix any file containing SSI with .shtml. Setting up Apache to parse only *.shtml means it won't waste time attempting to parse normal HTML (*.html) files. This part of the configuration takes place in the primary server section of httpd.conf, outside of any <Directory> containers. In fact, the directives are already there, about three quarters of the way through the file, and just need uncommenting. These two lines tell Apache what content type .shtml files should be allocated, and tell the internal SSI handler to parse .shtml files before serving them to the client.

    AddType text/html .shtml
    AddHandler server-parsed .shtml

Adding index.shtml to the DirectoryIndex directive allows .shtml files to be served as directory indexes by preference; in other words, if index.shtml exists in the root directory of my server, it will be served to someone requesting http://www.trampolining.net.

    DirectoryIndex index.shtml index.html index.htm

It is then necessary to turn on SSI support in every directory container in which you wish to use it. If SSI is not working on a virtual host, check that an Options line enabling Includes is present in that virtual host's directory container:

    <Directory /home/www/trampolining.net>
        Options Indexes FollowSymLinks Includes ExecCGI MultiViews
        Options +Includes
        AllowOverride None
        Order allow,deny
        Allow from all
    </Directory>

For more information on Apache SSI look up http://www.apache.org/docs/mod/mod_include.html

Common Gateway Interface
Better known as CGI, this technology is the simplest way to deploy interactive content on your web site. Scripts are freely available to perform everything from form handling to maintaining complete discussion forums. Scripts are usually written in Perl and interpreted as they are used. However, as with any program running on your server, they represent a potential security risk. It is possible to configure Apache to interpret scripts from anywhere on the system, but this means anyone with access to directories containing web pages can create potentially harmful scripts.

To minimize this risk, CGI scripts are run from a special directory, usually called cgi-bin, and have file permissions set that allow remote users to execute them, but allow write access only to root. The first line of each Perl script must also be changed to give the location of the Perl interpreter on your system; type which perl to find it.

The httpd.conf file already contains the necessary directives in the primary server section, so we just need to uncomment them and change any locations if necessary. Note the trailing slashes:

    ScriptAlias /cgi-bin/ "/home/www/cgi/"

The above directive tells Apache to treat any request to /cgi-bin/ as a request for a script, and to look for that script in the server directory /home/www/cgi/. This is inherited by any virtual hosts, unless we define a different ScriptAlias in the corresponding VirtualHost container, so in this example, http://www.trampolining.net/cgi-bin/script.pl and http://www.sport-science.net/cgi-bin/script.pl will each point to /home/www/cgi/script.pl.

Now look at this Directory container:

    <Directory /home/www/cgi/>
        AllowOverride None
        Options None
        Order allow,deny
        Allow from all
    </Directory>

This sets the permissions for your CGI directory to the absolute minimum necessary to run scripts. No-one will actually be able to read the scripts, as any request will instead run them. These minimum permissions will also make life more difficult for hackers trying to access your scripts.

mod_perl
The mod_perl module allows Perl scripts to be run very fast by a dedicated Perl interpreter within Apache, which does not need starting separately for each request. Perl scripts are reported to run between two and twenty times faster under mod_perl than under mod_cgi, depending on the script itself.

However, the increased speed of script processing comes at a price. mod_perl is a complex module that is complicated to install and configure, and the actual steps needed depend on the versions of mod_perl and Apache being used; it also has three user modes, and thirty configuration options during build. Therefore, detailed installation instructions are beyond the scope of this book, though you can get more help from the INSTALL text file that comes with the mod_perl download or from the Apache web site (www.apache.org).

Furthermore, the installation of mod_perl will break your existing Apache configuration. It has to be installed first and Apache reinstalled on top, which means that you will have to customize Apache again from scratch. You have to decide, right from the outset, whether to include mod_perl in your server system, as it is currently very difficult to incorporate it later on.

The discussion on mod_perl has been left until now because its benefits only become apparent under conditions of very heavy server usage. For moderate or low usage, CGI is only marginally slower and there is little advantage in having the increased script processing power that mod_perl offers. mod_perl is an advanced module that should be considered only if very high server usage is anticipated.

An Example Installation
Below is a very standard installation procedure for mod_perl. Download the module from www.apache.org and uncompress it to /usr/local. Then carry out the following steps. The installation steps reproduced below are highly simplified and can only be said to work on most systems. You should look up the Apache documentation for more detailed instructions.

    # cd /usr/local/mod_perl
    # perl Makefile.PL APACHE_SRC=../apache_version/src \
    > DO_HTTPD=1 USE_DSO=1 USE_APACI=1 EVERYTHING=1
    # make && make test && make install
    # cd ../apache_x.x.x
    # make install

After the installation is complete, and mod_perl and Apache are working as they should, Apache will need to be configured for mod_perl. This consists of adding a few directives to httpd.conf. The first tells Apache to look for /home/www/fast-perl/anyscript.cgi given a request for www.trampolining.net/fast-perl/anyscript.cgi. Note the trailing slashes again:

    Alias /fast-perl/ /home/www/fast-perl/

The next lines tell Apache to allow scripts to be executed in this location, and to execute them by passing them to mod_perl:

    <Location /fast-perl>
        AllowOverride None
        SetHandler perl-script
        PerlHandler Apache::PerlRun
        Options ExecCGI
        allow from all
        PerlSendHeader On
    </Location>

mod_perl is a powerful and configurable module. Much more information on configuration is available from the Apache on-line documentation.

Java Servlets
Java is a programming language developed by Sun Microsystems. It is distinctive in that, once compiled, Java programs will run on any machine, architecture or operating system with the help of a Java Virtual Machine (JVM). The compiled program is not designed to run on any specific machine but instead on a JVM, a piece of software which provides a standard set of commands like that of a chipset. JVMs have been developed for nearly all the important operating systems, so well-written code should work on any platform without recompilation. This cross-platform portability is an important feature of the Java development environment: your development investment will survive a change of hardware or platform.

Java servlets are Java programs which are called by the browser, but run on the server, with the results being sent to the browser. This eliminates any need to worry about the browser type, as no code is sent. It is possible to implement very complex algorithms using Java servlets, but if the servlet is designed to return output as pure HTML, the results will be viewable by even the simplest text-based browsers.

Servlet support in Apache is provided by ApacheJServ, a fully featured Java servlet runtime container supporting all commands up to JSDK 2.0. While the ApacheJServ modules are not particularly big, the Java Development Kit, which is required to compile servlets and provide the JVM, is a huge 45 MB in size (the zipped archive is just over 19 MB), and using servlets will also cause a step increase in memory requirement of around 32 MB due to the JVM. However, the benefits of servlet technology far outweigh the cost of set up, so read on!

Servlets are relatively more difficult to configure than CGI, and Java may take some getting used to; it is a very powerful language with many similarities to C++.
However, Java offers the increased security of its in-built security model
which makes it much more difficult for hackers to cause damage by passing
harmful system commands to the servlet. Complex tasks like chat-rooms or
server-side parts of games are also ideally suited to Java, because you can
create servlets which will stay alive right from their initial instantiation.
While Perl is ideally suited to text processing applications, Java can be used to develop code of great complexity, with extensions available to make multi-tier distributed applications possible. With the help of MySQL, details of which you will find at www.mysql.org, it is possible to use SQL databases. You will find a more complete discussion of these topics in the Wrox publication Professional Java Server Programming.

And so we come to installing ApacheJServ. Running ApacheJServ requires the Java Development Kit 1.2 for glibc 2.1 from http://www.blackdown.org. (Note that older Linux distributions may require the glibc 2.0 version.) The JDK is currently only available as a bzip2 archive, so you will need to install the bzip2 utility as well (http://sourceware.cygnus.com/bzip2/). You will also need the Java Servlet Development Kit (JSDK) version 2.0 from http://java.sun.com/products/Servlet. Download JServ from http://java.apache.org, extract it into the /usr/local/ApacheJServ-1.0 directory and type the following commands:

    # mkdir /usr/local/apache/src/modules/jserv
    # cd /usr/local/ApacheJServ-1.0
    # ./configure --prefix=/usr/local/ApacheJServ-1.0 --with-apache-\
    # make
    # make install

ApacheJServ should now be installed and configured. Open httpd.conf for editing and add this directive to the very end of the file:

    Include /usr/local/ApacheJServ-1.0/example/jserv.conf

Appending this command forces Apache to read jserv.conf from its installed location. Future versions of ApacheJServ may instead install this file in the same directory as httpd.conf. The jserv.conf file contains all the commands to configure the Apache side of ApacheJServ.

Restart Apache, give it a moment or two for JServ to begin accepting requests, and if everything works, visiting http://localhost/example/Hello should produce a success page!
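The check can also be scripted instead of typed into a browser. The following is a minimal sketch, assuming Python 3 is available on the server; the helper names http_status and first_working are hypothetical, and the two URLs are the test locations mentioned in this section (which one applies depends on the installed jserv.conf). The demonstration at the end uses a stand-in fetch function, so it runs without a server.

```python
import urllib.error
import urllib.request

# Candidate test URLs from the text; the right one depends on how the
# installed jserv.conf names its test zone.
CANDIDATES = [
    "http://localhost/example/Hello",
    "http://localhost/servlet/Hello",
]

def http_status(url, timeout=5):
    """Return the HTTP status for url, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code   # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return None       # connection refused, timeout, and so on

def first_working(urls, fetch=http_status):
    """Return the first URL that answers with status 200, else None."""
    for url in urls:
        if fetch(url) == 200:
            return url
    return None

# Offline demonstration with a stand-in fetch function (no server needed).
fake_statuses = {"http://localhost/servlet/Hello": 200}
found = first_working(CANDIDATES, fetch=fake_statuses.get)
print(found)
```

Against a live installation, calling first_working(CANDIDATES) with the default fetch function performs the real requests.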
If this does not work, your version of ApacheJServ probably configures /servlet as the test zone instead, which means that you would have to type http://localhost/servlet/Hello.

Java Server Pages
While Java servlets offer boundless possibilities for powerful
server-side processing, for simple applications they can be quite unwieldy.
Perhaps you want to insert the time and date at one point on your page, and perform a calculation at another; using JavaScript or a Java applet prevents older browsers viewing your page correctly. You could use a single servlet to create the whole page. However, the page content itself is now mixed up within Java code, making maintenance difficult, particularly if the programmers and web designers are different groups of people. Alternatively, you could keep the page content in an HTML file which uses Server-Side Includes to call successive CGI scripts to insert the correct text at each point. This way the web designers can maintain the HTML without worrying about the code. However, this simple page now has one HTML file and several CGI scripts associated with it, which again makes maintenance complicated.

For simple applications, the ideal solution would be to have the Java code and HTML contained in a single file. It would have the look and 'feel' of HTML, so the web designers can understand it, but would contain additional code which would be run on the server before delivering the page back to the client. Sun's new member of the Java family, JavaServer Pages (JSP), provides this solution. Code can be inserted in-line within the HTML; it is executed on the server and the results are merged with the HTML in the output. This parallels how Microsoft's ASP works, and JSP is emerging as the open source challenger to ASP in this field.

The file which leaves the server is pure HTML, so unlike JavaScript and Java applets, which have to be run on the client, you can have the interactivity and programming flexibility of Java while ensuring that all existing HTML browsers can display the output. Furthermore, you maintain all the advantages of Java's portability should you later decide to change operating system or web server.
There are already many web sites that use JSP instead of ASP.

Up until recently, the main open source JSP implementations were GNU Server Pages (GSP) and GNU Java Server Pages (GNUJSP), which are independent development efforts despite their similar names. Both are written as regular Java servlets, and although they are difficult to install and configure, they can be used to create JSPs and develop web sites. Information on GSP and GNUJSP can be found at www.bitmechanic.com and www.klomp.org/gnujsp respectively.

However, JSP support in Apache now comes in the form of a module called Jakarta, named after the project team which implemented it (or after the largest city on the Indonesian island of Java, which might or might not be a coincidence). At the time of going to press, Jakarta is in final pre-release form, so by the time you read this it will almost certainly be in production release. The latest version of Jakarta and its installation instructions are available online at http://jakarta.apache.org.

Logs and Analysis
To
develop a web site effectively, you will need to regularly analyze the web site's log files, which contain data on everyone who accesses the site.
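To make the contents of a log line concrete, here is a minimal sketch that splits entries in Apache's combined log format into named fields and tallies requests per client and per referrer. The regular expression, the field names and the parse_line and summarize helpers are illustrative assumptions rather than part of Apache or Analog; the sample lines are taken from the access_log extract shown later in this chapter.

```python
import re
from collections import Counter

# Combined format: host ident user [time] "request" status bytes "referrer" "agent"
LINE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LINE.match(line)
    return match.groupdict() if match else None

def summarize(lines):
    """Count requests per client and per referrer, skipping unparseable lines."""
    hosts, referrers = Counter(), Counter()
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue
        hosts[entry["host"]] += 1
        referrers[entry["referrer"]] += 1
    return hosts, referrers

sample = [
    '231.231.231.231 - - [02/Oct/1999:19:47:35 +0000] "GET / HTTP/1.1" 200 9621 '
    '"-" "Mozilla/4.0 (compatible; MSIE 4.01; Windows 95)"',
    '231.231.231.231 - - [02/Oct/1999:19:47:58 +0000] "GET /coach.gif HTTP/1.1" 304 - '
    '"http://www.trampolining.net/" "Mozilla/4.0 (compatible; MSIE 4.01; Windows 95)"',
    '132.132.132.132 - - [03/Oct/1999:16:30:46 +0000] "GET / HTTP/1.1" 200 10137 '
    '"http://www.trampolining.net/" "Mozilla/4.0 (compatible; MSIE 4.01; Windows 95)"',
]
hosts, referrers = summarize(sample)
print(hosts.most_common())
print(referrers.most_common())
```

A script like this is only a convenience for small samples; a dedicated analysis program remains the better tool for complete logs.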
From them you can determine the number of requests made, the identities (IP addresses) of the clients and the pattern of hyperlinks that are followed across the web site. While small-scale information can be gained by manually viewing the log files, this technique is not appropriate for finding large-scale trends. Each request creates 60 bytes or so of data that is added to the log file, and a page usually generates several requests because its images are requested along with it. Taking four or five log entries per page, 200 daily page requests add roughly 50-60 kilobytes of data to the log each day.
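The estimate can be checked with a few lines of arithmetic. The 60-byte entry size and the 200 daily page requests are the figures quoted in the text; the number of log entries per page view (the page itself plus its images) is an assumption introduced here to show how the 50-60 kilobyte figure can arise, and daily_log_growth is a hypothetical helper.

```python
BYTES_PER_ENTRY = 60          # rough size of one log entry, as quoted in the text
PAGE_REQUESTS_PER_DAY = 200   # example traffic level from the text

def daily_log_growth(entries_per_page):
    """Bytes added to the log per day for a given number of entries per page view."""
    return BYTES_PER_ENTRY * entries_per_page * PAGE_REQUESTS_PER_DAY

# entries_per_page is an assumed value: 1 means pages only, 4-5 includes images
for entries in (1, 4, 5):
    kilobytes = daily_log_growth(entries) / 1000
    print(f"{entries} entries per page: {kilobytes:.0f} KB per day")
```

With pages alone the log grows by about 12 KB a day; the quoted range corresponds to four or five entries per page view.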
Therefore, manual viewing is in reality restricted to small samples of the logs.

To automatically analyze the complete logs, we will be using Analog, a small yet powerful program which is configurable, scalable and free. It is currently the most popular log file analysis program on the web (a 25% market share according to a GVU report at http://www.gvu.gatech.edu). It will be configured to produce separate reports for each virtual host and update them each morning, and the reports will be readable only by authorized people.

Manual Logfile Analysis
While manual analysis will not be suitable for viewing overall trends,
it allows you to interpret the logs with human intelligence. For example, if
you notice lots of visitors are requesting one page then leaving, you may want
to investigate ways of encouraging them to stay on your site. Do you provide
links to other relevant pages? Are they arriving directly into a frame and
being trapped with no links out? Are your pages so large, or your connection so
slow, they are giving up waiting and leaving the site?

You will have chosen where to place your logs when editing httpd.conf. Simply open one in an editor and concentrate on a small section. Below is an extract from access_log on my machine (with the IP addresses replaced by dummy ones):

    231.231.231.231 - - [02/Oct/1999:19:47:35 +0000] "GET / HTTP/1.1" 200 9621 "-" "Mozilla/4.0 (compatible; MSIE 4.01; Windows 95)"
    231.231.231.231 - - [02/Oct/1999:19:47:41 +0000] "GET /trampnetmini.gif HTTP/1.1" 304 - "http://www.trampolining.net/" "Mozilla/4.0 (compatible; MSIE 4.01; Windows 95)"
    231.231.231.231 - - [02/Oct/1999:19:47:58 +0000] "GET /trampnetmini.gif HTTP/1.1" 304 - "http://www.trampolining.net/" "Mozilla/4.0 (compatible; MSIE 4.01; Windows 95)"
    231.231.231.231 - - [02/Oct/1999:19:47:58 +0000] "GET /coach.gif HTTP/1.1" 304 - "http://www.trampolining.net/" "Mozilla/4.0 (compatible; MSIE 4.01; Windows 95)"
    231.231.231.231 - - [02/Oct/1999:19:47:59 +0000] "GET /news.gif HTTP/1.1" 304 - "http://www.trampolining.net/" "Mozilla/4.0 (compatible; MSIE 4.01; Windows 95)"
    231.231.231.231 - - [02/Oct/1999:19:47:59 +0000] "GET /improve.gif HTTP/1.1" 304 - "http://www.trampolining.net/" "Mozilla/4.0 (compatible; MSIE 4.01; Windows 95)"
    231.231.231.231 - - [02/Oct/1999:19:47:59 +0000] "GET /merger.gif HTTP/1.1" 304 - "http://www.trampolining.net/" "Mozilla/4.0 (compatible; MSIE 4.01; Windows 95)"
    231.231.231.231 - - [02/Oct/1999:19:47:59 +0000] "GET /chat.gif HTTP/1.1" 304 - "http://www.trampolining.net/" "Mozilla/4.0 (compatible; MSIE 4.01; Windows 95)"
    132.132.132.132 - - [03/Oct/1999:16:30:45 +0000] "POST /cgi-bin/poll.pl?voted HTTP/1.1" 302 291 "http://www.trampolining.net/" "Mozilla/4.0 (compatible; MSIE 4.01; Windows 95)"
    132.132.132.132 - - [03/Oct/1999:16:30:46 +0000] "GET / HTTP/1.1" 200 10137 "http://www.trampolining.net/" "Mozilla/4.0 (compatible; MSIE 4.01; Windows 95)"
    132.132.132.132 - - [03/Oct/1999:16:30:47 +0000] "GET /trampnetmini.gif HTTP/1.1" 200 6971 "http://www.trampolining.net/" "Mozilla/4.0 (compatible; MSIE 4.01; Windows 95)"
    132.132.132.132 - - [03/Oct/1999:16:30:47 +0000] "GET /improve.gif HTTP/1.1" 200 4727 "http://www.trampolining.net/" "Mozilla/4.0 (compatible; MSIE 4.01; Windows 95)"
    132.132.132.132 - - [03/Oct/1999:16:30:49 +0000] "GET /merger.gif HTTP/1.1" 200 4526 "http://www.trampolining.net/" "Mozilla/4.0 (compatible; MSIE 4.01; Windows 95)"

The first field in each line is the IP address of the client. By following an IP address through the log, you can find the path an individual visitor took through your site. (Office networks and ISPs such as AOL that employ proxies represent around 25% of web traffic, and can cause a single user to appear to come from multiple IP addresses, or allow users to receive some pages without the requests appearing in your logs. The technique remains accurate the remainder of the time, and is normally accurate even during access via a proxy server, assuming there are not multiple caches. However, there is as yet no way round this growing problem.)

There
follows, after two fields shown here as dashes, the date and time, followed by the request in double quotes: the method, the requested filename and the version of HTTP. A single slash (/) here represents a directory request, which usually returns index.html. The number immediately following the request is the HTTP status code, which is 200, 302 or 304 in the extract above, and after it comes the number of bytes sent (a dash if no content was returned). Unsuccessful requests, i.e. those producing 403 (Access forbidden) or 404 (File not found) codes, are also recorded in the error_log file.

The next field is the referrer, which in nearly all of the above log entries is http://www.trampolining.net/. The identity of the referrer depends on what file is being logged at the time. In the case of images, the referrer is simply the page that contains the image, but in the case of pages, it is the page the browser was previously viewing, which gives a good idea of where your visitors are coming from. The final pieces of information are the browser and the version of the operating system.

As you can see, each page can generate many lines of log, so to make this kind of following easier, we can cut out some of the unwanted information. To follow the path of just one client, type:

    $ grep 231.231.231.231 /usr/local/apache/logs/trampolining_access_log | more

This will display only log entries created by the client with IP address 231.231.231.231.

There are many log file entries corresponding to images, which are often of little interest. To view only page entries, type:

    $ grep 'html HTTP' /usr/local/apache/logs/trampolining_access_log | more

You can even view page requests from a single client:

    $ grep 'html HTTP' /usr/local/apache/logs/trampolining_access_log | grep 231.231.231.231

A final technique allows you to watch current requests in real time. This command is:

    $ tail -f /usr/local/apache/logs/trampolining_access_log

You can make this easier to read by removing the image requests and displaying only page requests:

    $ tail -f /usr/local/apache/logs/trampolining_access_log | grep 'html HTTP'

Automatic Analysis
Analog is the vehicle by which we will obtain an overview of our system's usage. Installation of Analog is quite simple:

- Download Analog from http://www.statslab.cam.ac.uk/~sret1/analog/ to /usr/local/analog/.
- Change to the /usr/local/analog directory.
- Open the analhead.h file for editing and change ANALOGDIR to /usr/local/analog/.
- Type make.

We also need to prepare a directory for the reports and populate it with the necessary images:

- Type mkdir /home/www/trampolining.net/analog.
- Copy /usr/local/analog/images/* to /home/www/trampolining.net/analog.

That's it: Analog is ready for use.

Analog is set up using configuration files; the default is analog.cfg, which we will edit now, and later on we will create an additional configuration file for each virtual host.

LOGFORMAT specifies the format of log used. Analog natively supports the Apache formats COMBINED and COMMON. LOGFILE tells Analog where to look for the access log.

    LOGFORMAT COMBINED
    LOGFILE /usr/local/apache/logs/access_log

HOSTNAME specifies the name to put at the top of the report.

    HOSTNAME "www.trampolining.net"

Remember we told Apache not to resolve IP addresses? This little section tells Analog to resolve them, but is much more efficient because addresses are only resolved once, and then written to the cache file specified in DNSFILE. DNSGOODHOURS is the number of hours to trust an entry in the cache file; DNSBADHOURS is the number of hours to wait before attempting to resolve a bad IP address again. DNS WRITE tells Analog to try to resolve unknown IP addresses, then write them to the dnsfile.txt file. The alternative command DNS READ would tell Analog to skip IP addresses which don't exist in the dnsfile.txt file, thus saving time. On the first run, Analog will complain about dnsfile.txt not existing; ignore this, as Analog will create it.

    DNSFILE /usr/local/analog/dnsfile.txt
    DNSGOODHOURS 1250
    DNSBADHOURS 350
    DNS WRITE

This directive tells Analog where to create the report.

    OUTFILE /home/www/trampolining.net/analog/trampolining_net_report.html

HOSTEXCLUDE directives tell Analog to ignore accesses from a certain IP address or hostname. This allows you to report what your visitors do, without being influenced by your own visits! In this example I exclude all page accesses from Cambridge University using Cambridge's IP allocation, and exclude all accesses from York University using the resolved hostnames.

    HOSTEXCLUDE 131.111.*.*
    HOSTEXCLUDE *.york.ac.uk

If your web site contains pages with extensions other than .htm or .html, for example JSPs or .shtml, you will need to add them here to include them in the page counts, otherwise Analog will assume them to be images.

    PAGEINCLUDE *.htm,*.html,*.shtml

Save your completed file as analog.cfg then type ./analog (or ./analog +g/other-config-file.cfg if you have an additional config file). If all goes well, Analog will write its report to the file named by OUTFILE.

You will need to create a configuration file for each virtual host, and save it with a different filename, e.g. trampolining_net.cfg.

Finally we are ready to schedule Analog to run each morning. We do this using a cronjob, a Linux feature that allows tasks to be run at regular times. We will need to create a separate task for each report, and run them at different times to prevent multiple simultaneous Analog processes clashing. Setting up cronjobs requires you to use the vi editor, which is explained in Appendix B. This is what you type at the vi command prompt:

    55 0 * * * /usr/local/analog/analog
    55 1 * * * /usr/local/analog/analog +g/trampolining_net.cfg

The cronjob is now set up to run Analog at 0:55 a.m. each morning with the default configuration file, analog.cfg, and again at 1:55 a.m. with the configuration file trampolining_net.cfg.

Our final task is to protect the reports from unwelcome visitors. To do this, we will create a directory container for the /home/www/trampolining.net/analog/ directory in the primary server section of httpd.conf:

    <Directory "/home/www/trampolining.net">
        Options Indexes FollowSymLinks
        Options +Includes
        AllowOverride None
        Order allow,deny
        Allow from all
    </Directory>

    <Directory "/home/www/trampolining.net/analog">
        Order allow,deny
        Allow from 123.123.123.123
    </Directory>

This will deny the contents of www.trampolining.net/analog to anyone except the owner of IP address 123.123.123.123.

Deploying an FTP Server
While FTP servers are less prevalent in the current web browser driven Internet, they are still the primary method of distributing very large files and maintaining large stores of files. This section will demonstrate how to deploy an effective anonymous FTP server which modern web browsers will be able to access directly.

As we said right at the beginning of the chapter, we will be installing the server developed by Washington University, WU-FTP, which can be downloaded from http://www.wu-ftpd.org. For more information on WU-FTP, look up http://www.landfield.com/wu-ftpd/.

Installing WU-FTP

To install WU-FTP, you will need to carry out the following procedure:

1. Download WU-FTP and extract to /usr/local.
2. Type ./build CC=gcc lnx. Note that to build the ftpd daemon, you might have to install the byacc utility first, which contains the yacc parser.
3. Type ./build install.
4. We need to tell Linux to use WU-FTP for FTP requests by editing /etc/inetd.conf. Look for a line beginning ftp, and make sure it is uncommented. Then edit it so that it starts the WU-FTP daemon.
5. Type ps -uax | grep inetd, which will produce a listing of system processes with the word inetd in the title.
6. Restart inetd by typing kill -HUP PID, where PID is the process ID listed in step 5.

The latest download of WU-FTP comes with a configure script. It can be installed, from the wu-ftp-version directory, using the ./configure, make, make install sequence of commands as in the other installations in this chapter.

There we have it! The Washington University File Transfer Protocol daemon is installed and ready for action! We can check the installation by typing ftp www.trampolining.net, or whatever your hostname/IP address is. You should be presented with a login screen, and you will be able to log in using a standard Linux user account and password set up on your system.

    connected to www.trampolining.net.
    220 www.trampolining.net FTP server (Version wu-2.6.0(1) Fri Nov 12 11:43:54 GMT 1999) ready.
    Name (www.trampolining.net:none):

Configuring WU-FTP
To provide access to the general public we need to allow anonymous access. Before doing this, we need to create a safe directory for anonymous users, which will appear to them as the root of the FTP server. This prevents anonymous users browsing around your machine to obtain private information! We also need to create an user account for anonymous FTP users to use. Creating
an FTP directory
We will create our FTP directory in /home/ and adopt a traditional
directory structure:

mkdir /home/ftp
mkdir /home/ftp/bin
mkdir /home/ftp/etc
mkdir /home/ftp/pub

The first, /home/ftp, will be the root directory of our anonymous FTP
server. /home/ftp/bin will contain links to commands we want to
allow FTP users to use, in particular ls (to list the contents of a
directory); changing directory is handled by the FTP server itself and needs
no link. /home/ftp/etc is present to hold a password file if necessary and
/home/ftp/pub/ is the public directory which contains the
files we are making available.

All directories and files within this structure should be owned by root, and none
of them should have Group or All write permissions. This prevents users from
editing any of the files: by editing the contents of /home/ftp/bin/, a user
could execute any code on your machine. All the directories should have All
read and execute permissions, to allow users to enter the directory
(execute permission) and read the contents (read permission). Finally, all the
files contained should have All and Group read permissions only; this will allow users to
download files, but not change or execute them on your server.

You may need to create one further directory, as follows:

mkdir /home/ftp/incoming

This directory is special in that it is available for users to upload files to. For
this reason, it must have Group and All write permissions but not
Group and All read permissions, which prevents users from viewing the contents
of this directory. While this is the
standard way to implement two-way FTP access, it does pose a security risk:
users could potentially upload illegal files and use your server to store them.
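
The layout and permissions described so far can be sketched as a short script. This is only an illustration under stated assumptions: FTP_ROOT is a hypothetical variable that defaults to a scratch directory, so the result can be inspected without touching a live system, whereas on a real server it would be /home/ftp and the script would be run as root. The final stat check in the usage note assumes GNU coreutils.

```shell
#!/bin/sh
# Sketch only: builds the anonymous FTP tree with the permissions described
# above. FTP_ROOT is an assumption for illustration; it defaults to a scratch
# directory so the layout can be inspected without touching /home/ftp.
set -eu

FTP_ROOT="${FTP_ROOT:-$(mktemp -d)}"

mkdir -p "$FTP_ROOT/bin" "$FTP_ROOT/etc" "$FTP_ROOT/pub" "$FTP_ROOT/incoming"

# Ordinary directories: everyone may enter and list, only the owner may write.
chmod u=rwx,go=rx "$FTP_ROOT" "$FTP_ROOT/bin" "$FTP_ROOT/etc" "$FTP_ROOT/pub"

# Upload area: group and others may write (and need execute permission to
# create files inside it) but cannot read, so the directory cannot be listed.
chmod u=rwx,go=wx "$FTP_ROOT/incoming"

# A placeholder download: readable by everyone, writable by the owner only.
: > "$FTP_ROOT/pub/any.file"
chmod u=rw,go=r "$FTP_ROOT/pub/any.file"

# Ownership can only be handed to root when the script itself runs as root.
if [ "$(id -u)" -eq 0 ]; then
    chown -R root "$FTP_ROOT"
fi

ls -lR "$FTP_ROOT"
```

Running it with FTP_ROOT unset leaves the tree in a temporary directory; something like stat -c '%a %n' "$FTP_ROOT"/* can then be used to compare the modes against the ones intended before repeating the steps on /home/ftp.
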
Whether to offer an upload area at all is a serious policy decision; if you
do, be sure to set a umask that prevents uploaded scripts
from being executed. A slightly more secure system involves removing the Group and All write permissions from this directory too, then creating
subdirectories with full read, write and execute permissions, which can then be
accessed by 'trusted users'. Anyone who has not been told the location of these
folders should be unable to find them, since /home/ftp/incoming cannot be listed:
there are no read permissions for All.

To summarize, this is how I suggest that you set the access permissions for your
FTP site:

drwxr-xr-x  root  root  bin/
drwxr-xr-x  root  root  etc/
drwx--x--x  root  root  incoming/
drwxrwxrwx  root  root  incoming/secret
drwxr-xr-x  root  root  pub/
-rwxr--r--  root  root  pub/any.file

Configuring Linux for WU-FTP
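
Before changing any account settings, it may be worth confirming that nothing in the FTP tree outside the upload area has picked up Group or All write permission. The sketch below is one way to check, not part of WU-FTP itself: FTP_ROOT and list_writable are illustrative names, the sample tree is built in a scratch directory, and the -perm /022 test assumes GNU find.

```shell
#!/bin/sh
# Sketch only: reports anything under an FTP tree that group or others can
# write to, skipping the upload area where that is intentional.
# FTP_ROOT and list_writable are illustrative names; -perm /022 assumes
# GNU find.
set -eu

FTP_ROOT="${FTP_ROOT:-$(mktemp -d)}"

list_writable() {
    # $1: root of the tree to inspect. The incoming directory is pruned, so
    # neither it nor anything below it is reported.
    find "$1" -path "$1/incoming" -prune -o -perm /022 -print
}

# Build a small sample tree with the permissions suggested for the site.
mkdir -p "$FTP_ROOT/pub" "$FTP_ROOT/incoming/secret"
chmod 755 "$FTP_ROOT" "$FTP_ROOT/pub"
chmod 711 "$FTP_ROOT/incoming"
chmod 777 "$FTP_ROOT/incoming/secret"

OFFENDERS="$(list_writable "$FTP_ROOT")"
if [ -z "$OFFENDERS" ]; then
    echo "no unexpected write permissions under $FTP_ROOT"
else
    echo "writable by group or others:"
    echo "$OFFENDERS"
fi
```

On a real server the same function could be pointed at /home/ftp. Note that symbolic links normally show full permissions and would be reported by this test, so any hits need reading with that in mind.
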
The most important change is to modify the main Linux /etc/passwd file so that the anonymous FTP user is confined to
/home/ftp and starts in /home/ftp/pub. Open the file for editing; you should see a
listing like this:

ftp:x:14:50:FTP User:/home/ftp:
nobody:x:99:99:Nobody:/:
gdm:x:42:42::/home/gdm:/bin/bash
xfs:x:100:233:X Font Server:/etc/X11/fs:/bin/false
username:x:500:500::/home/username:/bin/bash

If no ftp user exists, use the adduser command as root to add one. The important line begins with ftp and contains the user
settings for FTP User. Note that there is no entry after the final colon, so
no command shell is named for the FTP User. Be aware that many systems treat an
empty shell field as /bin/sh, so naming a non-shell such as /bin/false, as the
xfs entry does, is the more explicit way to deny one. To force /home/ftp/ to be treated as the root directory, we edit this line
slightly, inserting a dot (/./) at the point where we want the user to be rooted. The final
/pub ensures they are initially placed in that
directory:

ftp:x:14:50:FTP User:/home/ftp/./pub:
nobody:x:99:99:Nobody:/:
gdm:x:42:42::/home/gdm:/bin/bash
xfs:x:100:233:X Font Server:/etc/X11/fs:/bin/false
username:x:500:500::/home/username:/bin/bash

Finally, we need to create a set of configuration files for WU-FTP in /etc. Luckily there is no need to create them by hand,
as WU-FTP distributes a default set with the program, which will prove fine for
our anonymous server. We will copy these default files to /etc:

# cd /usr/local/wu-ftpd
# cp ftpaccess ftpusers ftpconversions ftpgroups ftphosts /etc

We can implement an extra security touch: empty, root-owned, read-only .rhosts
and .forward files, so that nobody else can create their own. In /home/ftp/ type:

# touch .rhosts .forward
# chown root .rhosts .forward
# chmod 400 .rhosts .forward

There are some final modifications which are not strictly necessary but make
anonymous access that little bit easier. Hard linking /home/ftp/bin/ls to /bin/ls will allow clients to list
directories through FTP. Make sure that the link is owned by root and carries
execute permission only, for owner, group and all. Copying /etc/passwd and /etc/group into /home/ftp/etc/ lets those listings replace
the user and group IDs for each file and folder with their corresponding names.
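
Which names the copies in /home/ftp/etc need to resolve depends on who owns the files in the tree. The sketch below is one way to find out. It assumes GNU find (for the -printf action) and uses a hypothetical FTP_ROOT variable that defaults to a scratch directory with a single sample file, where a real server would use /home/ftp.

```shell
#!/bin/sh
# Sketch only: lists the numeric user and group IDs that own files in an FTP
# tree. FTP_ROOT is an assumption for illustration; it defaults to a scratch
# directory so the script can be tried without a real server.
set -eu

FTP_ROOT="${FTP_ROOT:-$(mktemp -d)}"
mkdir -p "$FTP_ROOT/pub"
: > "$FTP_ROOT/pub/any.file"

# %U and %G are GNU find's numeric user ID and group ID of each entry;
# sort -un keeps one copy of each number.
OWNER_IDS="$(find "$FTP_ROOT" -printf '%U\n' | sort -un)"
GROUP_IDS="$(find "$FTP_ROOT" -printf '%G\n' | sort -un)"

# Left unquoted on purpose so that several IDs print on one line.
echo "user IDs in use: " $OWNER_IDS
echo "group IDs in use:" $GROUP_IDS
```

On a tree owned entirely by root, as recommended earlier, both lists should contain only 0.
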
The system versions of these files contain far too much sensitive information
to be used as they are, however, and need
editing. Only the users and groups that own files within the FTP directory should be
left in, and password information should be left out: there should be just an
x after the user name, not a random string of characters. Anonymous access
should now be available.

Making your Servers Persistent
In the event that your Linux machine crashes or loses power, the
priority is to get the machine serving requests again as quickly as possible. This
is much easier if the system has been designed to recover from a crash:
if the services start themselves on boot up, it can save a great deal of time
trying to remember what needs to be started!

There are two main services that need special attention in order to enable autostart. First, we need to make sure the network is ready
for requests. The ifconfig utility must be run
for each virtual host. We can make this automatic by editing /etc/rc.d/rc.local, or the equivalent file for your Linux
distribution. At the end of this file we append all the commands we used
originally when we set up the virtual hosts:

# setting up IP aliasing for virtual hosts
echo "setting up IP aliasing for virtual hosts"
ifconfig eth0:0 123.123.123.123
route add -host 123.123.123.123

The other service we need to start is the Apache web server. Again, the simplest
way is to append the start command to the machine's boot script.
Alternatively, you could create a startup script in init.d (called apache) and link to it from S20apache in rc2.d. A sample file follows:

#!/bin/bash
#(@) A startup and shutdown script for Apache
case "$1" in
  start)
    # Starts Apache Server
    echo -n "Starting Apache Web Server"
    /usr/local/apache/bin/apachectl start
    ;;
  stop)
    # Stops Apache Server
    echo -n "Stopping Apache Web Server"
    /usr/local/apache/bin/apachectl stop
    ;;
  restart)
    # Restarts Apache gracefully
    echo -n "Restarting Apache after serving current web requests"
    /usr/local/apache/bin/apachectl graceful
    ;;
  *)
    # Incorrect parameter
    echo "Usage: $0 start | stop | restart"
    exit 1
esac
exit 0

The symbolic link belongs in rc<n>.d, where <n> is your runlevel (usually 3,
though you might also want to create one
in rc5.d if you use a graphical login). Create the link by entering:

ln -s /etc/rc.d/init.d/apache /etc/rc.d/rc3.d/S20apache

Summary
In this chapter you have learned how to install the highly popular Apache web
server, configure it to meet your requirements and set up virtual hosts. You
were shown how to install and configure the ApacheJServ servlet engine, as well as how
to modify your Apache configuration to make use of SSI and CGI. Other newer,
powerful technologies such as mod_perl and JSP were also briefly
discussed. You were also shown how to set up and configure WU-FTP, one
of the main open source FTP applications.

In addition to setting up the servers, this chapter covered an important
administrative task, namely the analysis of the server log files, with some
discussion of manual analysis using command line tools and automatic analysis
using the free Analog tool. Finally, you learnt some tips on server persistence:
by making minor alterations to system files you can restart Apache on reboot
and have it ready to receive requests.

For a discussion of the advanced configuration of Apache, and for other
information on Apache itself, ApacheJServ and JSP, see Professional Apache.

References
Web

The Apache home page:
Security bulletins for Internet services:
Java Servlets Page:
WU-FTP's web site:
More information on WU-FTP: http://www.landfield.com/wu-ftpd/
Analog logfile analyzer site: http://www.statslab.cam.ac.uk/~sret1/analog/
Jakarta Development site:

HOWTOs

Details on how to set up web servers and clients: WWW-HOWTO
How to set up a multi-purpose web server: Apache SSL PHP/FI frontpage mini-HOWTO

Books

Peter Wainwright, Professional Apache, Wrox Press, ISBN 1861003021
Danny Ayers et al, Professional Java Server Programming, ISBN 1861002777

©1998 Wrox Press Limited, US and UK.














