Making Web Browsers Talk Back
The World Wide Web, accessed through forms-capable browsers like Netscape and Mosaic, can be used for two-way communication. Here's how to create forms and scripts for collecting information.

By Patrick M. Ryan

The World Wide Web (WWW) is an excellent tool not only for retrieving information from remote sites but also for interacting with those sites in a way similar to transaction processing. Several Web browsers, most notably Netscape and Mosaic, let you enter information into the browser and have that information sent back to a server.

The resources most commonly accessed through the WWW are documents written in the Hypertext Markup Language (HTML). With HTML, portions of a document can be treated as hyperlinks (or references) to other Web resources. These elements, which are often textual and sometimes graphical, appear as highlighted objects when viewed in a browser such as Netscape. If you click the mouse on one of these highlighted objects, or press the key associated with it, the browser goes out to the net and retrieves the resource that the hyperlink points to.

HTML documents may be retrieved from machines that are running the Hypertext Transfer Protocol (HTTP) daemon. This daemon (HTTPD) listens on a certain port (default 80) for requests for documents within a certain domain on the host system. Mosaic and HTTPD are both products of the National Center for Supercomputing Applications (NCSA). (See the June 1994 Open Computing ``Hands-On'' section tutorial article, ``Riding the Internet Wave,'' for setting up Mosaic.)

HTML also allows the reader to enter information into the HTML document and have that information passed back to the server machine's HTTP daemon. These types of HTML documents are called forms. The method for passing the information back and processing that information is called the Common Gateway Interface (CGI). Associated with an HTML form is a CGI program or script.
The CGI specification describes what CGI programs can expect from standard input, what they should send to standard output, what environment variables they can use, and what may appear on the command line.

Nearly all current browsers support forms, including Netscape Navigator and versions of Mosaic later than 2.0. The research for this tutorial used Mosaic for X version 2.4 and httpd 1.1.

There are many things to learn about implementing HTML documents, but some of the most important are how to configure the Web server (the HTTP daemon program), how to write HTML forms, how the server and CGI program interact, and what security measures to keep in mind.

Server Configuration

Implementing a form system using HTML requires you to write
two files: an HTML document and a CGI program to process the
input from the form. Both Listing 1
and Listing 2 demonstrate a simple
product-ordering system used by the prolific and fictitious
Yoyodyne Corp. We assume that their HTTP server resides on

Configuration of the Web server daemon is a straightforward but long process and is a subject worthy of an article all to itself. The URL (Uniform Resource Locator) for NCSA's excellent documentation on HTTPD configuration can be found at the end of this article.

Once you have an operational HTTP daemon on your system,
familiarize yourself with the directory structure of the server.
The server has a directive named ServerRoot that
points to the top of the HTTP daemon's directory tree (often
Look at the file

Form Syntax

Forms are set up in an HTML document using a

As mentioned before, CGI scripts must reside in the
directory pointed to by the ScriptAlias parameter. A
typical value for the alias directory name is
As with CGI scripts, the locations of all resources accessed
through the Web server daemon (the documents served or otherwise)
are restricted. The DocumentRoot directive points
to the top-level directory where these resources may be accessed.
(The default value for DocumentRoot is
However, if the first part of the URL file path has the form

HTML Buttons

An HTML form can make use of three different types of
interface elements or tags: The general form of an The
The The The The

Interpreting Input From the Server

Once a user presses the ``Submit'' button, the browser will
send the data entered by the user back to the Web server in a
compact format. The server will attempt to resolve and validate
the URL parameter of the

CGI programs may be written in any language. Simple ones are often written using just the Bourne or C shell. More complicated ones use C, C++, or Perl. I chose Perl for our sample CGI program because of its string-handling and associative-array capabilities.

The server communicates with the CGI program through standard
input, standard output, environment variables, and the command
line. The input to the form is passed to the program via
standard input. The server sends a single newline-terminated
line of ASCII text to the CGI program. That line consists of
Interpreting the CGI program input is a straightforward
tokenizing task. Typically, the
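
The tokenizing step can be sketched as follows. This is a minimal illustration in Python rather than the Perl routines the article relies on, and it assumes the usual URL-encoded form format: `name=value` pairs joined by `&`, spaces sent as `+`, and other special characters sent as `%XX` escapes. The function name `parse_form_line` and the sample field names are hypothetical.

```python
from urllib.parse import unquote_plus


def parse_form_line(line):
    """Split one URL-encoded line into a dict of field name -> list of values."""
    fields = {}
    for pair in line.rstrip("\r\n").split("&"):
        if not pair:
            continue  # skip empty segments such as a trailing "&"
        name, _, value = pair.partition("=")
        # unquote_plus turns "+" into a space and decodes %XX escapes.
        fields.setdefault(unquote_plus(name), []).append(unquote_plus(value))
    return fields


# Hypothetical submission; these field names are invented for illustration.
sample = "name=Joe+Smith&email=joe%40company.com&item=yoyo&item=string\n"
form = parse_form_line(sample)
```

Values are kept in lists because a form may send the same field name more than once, as `item` does in the sample.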

Once the CGI program has the input from the server in a usable format, the program can do anything it wants with it. Typical uses for HTML forms are bug reports, registration forms, product-ordering forms, and forms used to query a database. Use your imagination.

Responding to the Server

The CGI program sends information back to the server by way of
standard output. The standard output of the program is
interpreted according to the

Writers of HTML forms make use of the CGI program's output in several different ways. Among them are:

Sophisticated CGI programs may create new HTML documents and forms on the fly.

The HTTP daemon defines a set of environment variables for the
CGI program that provide information about the server and about
the client where the browser is running. Consult the CGI
specification for a complete list of these environment variables.
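
As a rough sketch of how a CGI program might read these variables for logging, the Python fragment below looks up `REMOTE_HOST`, `REMOTE_ADDR`, and `REQUEST_METHOD`, which are names defined by the CGI specification. The function name `describe_client` and the sample values are hypothetical, and the stand-in dictionary replaces the real process environment so the example runs outside a Web server.

```python
import os


def describe_client(environ=os.environ):
    """Build a one-line description of the request from CGI variables."""
    # REMOTE_HOST may be unset if the server does not resolve host names,
    # so fall back to the client's address.
    host = environ.get("REMOTE_HOST") or environ.get("REMOTE_ADDR", "unknown")
    method = environ.get("REQUEST_METHOD", "unknown")
    return f"{method} request from {host}"


# Stand-in environment with invented values, used only for illustration.
fake_env = {"REMOTE_ADDR": "192.0.2.10", "REQUEST_METHOD": "POST"}
log_line = describe_client(fake_env)
```

Under a real server the function would be called with no argument, so that it reads the environment the HTTP daemon set up.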
Perl stores environment values in an associative array called
These are especially useful for logging information about who is using your form.

When setting up a new HTML form, it is wise to test the output
from the form before you associate it with a CGI script that does
real work. To that end, you can set your

Security

CGI works by allowing an anonymous remote user to start up a program on an HTTP server's host machine. That simple fact should send chills down the spine of any good system administrator. HTTP has a sophisticated authentication system and is unlikely to be foiled by the novice cracker. However, a more experienced and less noble cracker might exploit vulnerabilities present in an insecure CGI program.

CGI programs often do their job by running additional programs
on the server system. These programs may be fed input that
originated in a submitted HTML form. Although using form input
this way is a normal thing to do, a secure CGI program should
carefully look at any text being passed to another program. If
the CGI program sees any characters that have special meaning to
the command shell, the program should either escape them with
backslashes or generate an error message and terminate. Consult
the manual pages for the Bourne
and C shells for a complete list
of these characters. Some especially dangerous characters are
semicolons, backquotes, and ampersands. Note the
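
One way to perform the check described above is sketched below in Python (the article's own script is in Perl). The set of characters is an illustrative assumption, not a complete list for any particular shell, and the name `find_unsafe_characters` is hypothetical.

```python
# Characters treated as unsafe in this sketch. The set is illustrative only;
# the shell manual pages remain the authoritative list.
UNSAFE = set(";&|`$<>(){}[]*?!~'\"\\\n\r")


def find_unsafe_characters(text):
    """Return the sorted list of unsafe characters that appear in text."""
    return sorted(set(text) & UNSAFE)


good = find_unsafe_characters("joe@company.com")
bad = find_unsafe_characters("/dev/null; rm -rf /home")
```

A program could terminate with an error message whenever the returned list is not empty. A stricter variant would accept only characters from a known-good set instead of hunting for bad ones.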
Say you have an HTML form that returns a user's email address.
In your Perl CGI script you copy this value to a variable named

    # Send confirmation back to the user.
    open(MAIL, "|/bin/mail $address");
    print MAIL "Your order has been processed. Your PO number is $number\n";
    close MAIL;

A legitimate user would enter a value like
``joe@company.com''. The second argument to Perl's
The unscrupulous cracker could use a bogus address and then
follow it with a semicolon and another command. For instance,
let's say this cracker entered ``/dev/null; rm -rf /home''.
Unless the CGI program notices the semicolon embedded in the
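
A complementary defense, sketched here in Python and not taken from the article, is to avoid the command shell altogether and hand the form value to the child program as a single argument. In this sketch the Python interpreter stands in for a real mailer so that the example runs anywhere; `echo_argument` is a hypothetical helper.

```python
import subprocess
import sys


def echo_argument(value):
    """Run a child process with value as one argument, bypassing the shell."""
    # sys.executable stands in for a real mail program in this sketch.
    child_code = "import sys; sys.stdout.write(sys.argv[1])"
    result = subprocess.run(
        [sys.executable, "-c", child_code, value],  # argument list, no shell
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


hostile = "/dev/null; rm -rf /home"
received = echo_argument(hostile)
```

Because no shell parses the string, the semicolon reaches the child as ordinary text and never starts a second command. Input checking still matters, since a real mailer might treat an argument that begins with a hyphen as an option.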
Several languages, including Perl and most command shells,
have an

Concluding Thoughts

HTML forms are a slick way to allow remote users to interact with your system in a clean, controlled way. Considering how sophisticated this interaction can be, HTML forms are easy to set up. All of the really hard work is done by the HTTP daemons and Web browsers.

Acknowledgments

Thanks to Rob McCool (formerly of NCSA) for writing the HTML documentation on HTTP and CGI and to Steven E. Brenner for writing the Perl routines for tokenizing CGI input. The HTML form I used as a baseline was created by Diann Smith (Langley Research Center).

For More Information

Here are some pointers to relevant documentation available through the Web:
Converted to HTML and additional copy/style edit by Walter Zintz












