WORKSHOPS

Adding Interactive Services To Your Web Server

Eric Hall

Look around the World Wide Web, and most of what you'll find today are online equivalents of printed propaganda. There may also be some downloadable files or images, as well as the seemingly requisite overabundance of hypertext links to every imaginable corner of the Internet. But, for the most part, Web servers are about as interactive as their paper-based counterparts.

While getting to this level of highly-functional-yet-somewhat-interoperable online publishing can seem miraculous to many, it is far from where these systems need to be if they are ever to be something more than an outlet for marketing's creativity. In order for the Web to become the next-generation platform for distributed computing, it must go beyond read-only publishing into the world of interactivity. Users should be able to add, edit and delete data in real time, as security and application locking allow.

We'll outline some of the concepts and technologies to help you get there today. We'll also look at some basic examples, and provide pointers to other materials where appropriate and available. By no means can we cover everything. We will attempt to give you enough information so that you can get started in the right direction.

"What kinds of applications can be called from the server, and where can they be found?" These are the two most common questions asked at this point. Before we can answer these questions, we must first look at how the server and applications communicate.

The Common Gateway Interface (CGI)

Most, but not all, Web servers have two basic methods of handling back-end data. They either read a file or communicate with other programs through the Common Gateway Interface (CGI). With CGI, it's up to the application to figure out what to do with the data it is handed, and to return HTML to the server. This is an extremely simplistic overview of the process, and there are many possible exceptions and extensions.

Examples of CGI can be seen on many servers today. Clickable image maps such as those found on Network Computing's home page (http://techweb.cmp.com/nc/current) use CGI. As you can see in the figure "Clickable Image Maps," the client sends an HTTP request containing the URL and XY coordinates to the server, which passes the coordinates to a local application. The application executes an event that has been associated with those coordinates and returns HTML to the server for delivery to the client. In the case of image maps, most of the time this HTML data consists of an HTTP redirect command that instructs the client to request another document from the server.
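The image-map exchange just described can be sketched in a few lines. Python is used here purely for illustration; this is not how any particular server's image-map handler is written. The region table, the target paths and both function names are hypothetical.

```python
# Hypothetical hot regions: (left, top, right, bottom) -> target document.
REGIONS = [
    ((0, 0, 99, 49), "/news.html"),
    ((100, 0, 199, 49), "/reviews.html"),
]
DEFAULT_TARGET = "/index.html"  # hypothetical fallback page


def resolve_click(query):
    """Map an 'x,y' query string to the document tied to that region."""
    try:
        x_text, y_text = query.split(",")
        x, y = int(x_text), int(y_text)
    except ValueError:
        return DEFAULT_TARGET  # malformed coordinates: fall back safely
    for (left, top, right, bottom), target in REGIONS:
        if left <= x <= right and top <= y <= bottom:
            return target
    return DEFAULT_TARGET


def redirect_response(target):
    """Build the script output: a Location header followed by a blank line."""
    return "Location: " + target + "\n\n"
```

A script built this way would print redirect_response(resolve_click(coordinates)) and leave it to the server to turn the Location header into the redirect the client sees.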

Another common use of CGI is for searching through files or databases (Yahoo and WebCrawler are good examples). A user constructs a query that is sent to the server, which then spawns an application that conducts the actual search and returns the "hits" in HTML form.

So, to answer the question about the kinds of applications that can be called by the server: They include any application that supports CGI on the front end and can generate HTML responses. In fact, you can call almost any program you want, as long as it supports command-line operation.

However, the unfortunate truth is that no two Web servers are alike. They have all been developed with specific audiences in mind and "enhanced" to suit those markets. Just because something works with server brand X does not mean that it will work with server brand Y. For example, there are considerable differences between the Netscape server's CGI interface and the O'Reilly and Associates WebSite server's CGI interface, even though they both use Windows NT's CMD.EXE command interpreter. The differences between the various Unix shells, Digital's DCL, DOS' COMMAND.COM, and all other platforms, can also impact what you want to do.

CGI Detailed: GET Vs. POST

One of the most common methods of implementing CGI is through the use of GETs and POSTs. In the "Clickable Image Map" figure, the client submitted a URL with a question mark and XY coordinates appended to the end. This method of CGI call is known as a GET. Not only are GETs the oldest, but they are also the default and easiest to implement. Another form of call, known as POST, has become more popular, and is generally considered to be much more powerful. The differences between them are subtle, but significant.

The GET method is named for what the client does. It asks the server to GET the HTML document at HTTP://URL?Param. The server sees that the URL points to an executable, calls it and passes it the parameters.
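To make the URL?Param split concrete, here is a minimal Python sketch of how a GET-style URL divides into the script to run and the parameters to hand it. The host name and path in the usage note are made up, and split_get_request is a hypothetical helper, not part of any server. On servers that follow the CGI convention, the text after the question mark normally reaches the script in the QUERY_STRING environment variable.

```python
from urllib.parse import unquote_plus, urlsplit


def split_get_request(url):
    """Return (script_path, decoded_parameters) for a GET-style URL."""
    parts = urlsplit(url)
    # Everything after the first '?' is the parameter text; '+' and %XX
    # sequences are the URL's way of carrying spaces and special characters.
    return parts.path, unquote_plus(parts.query)
```

For example, split_get_request("http://server.example/cgi-dos/dirsearch.cmd?command.com") yields the path /cgi-dos/dirsearch.cmd and the parameter command.com.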

In contrast, the POST method is used when the client submits a block of data to the server. The client is not necessarily assuming that anything will be returned to it, although the connection is still open. In most cases, the server's Webmaster has configured things so that the client is given a thank you message, or response data is sent to the client.
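Under the standard CGI convention, which is an assumption here (the WebSite DOS interface discussed later in this article hands POST data over in a temporary file instead), the posted block arrives on the script's standard input and its size is given by the CONTENT_LENGTH environment variable. A rough Python sketch, with read_post_body as a hypothetical helper name:

```python
import io


def read_post_body(environ, stream):
    """Read exactly CONTENT_LENGTH bytes of posted data from a binary stream."""
    try:
        length = int(environ.get("CONTENT_LENGTH", "0"))
    except ValueError:
        length = 0  # a missing or garbled length is treated as "no data"
    if length <= 0:
        return b""
    return stream.read(length)
```

In a real script the call would look something like read_post_body(os.environ, sys.stdin.buffer); io.BytesIO makes it easy to try the function without a server.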

CGI Examples

Let's walk through an example using O'Reilly and Associates' WebSite server, since its use of Windows NT's CMD.EXE makes it easy to write simplistic demonstration scripts. WebSite can also run on Windows 95, so it's easy to use on a personal system for learning purposes. Besides, you can evaluate it free for 60 days. It's available at http://website.ora.com.

Once the WebSite Manager is downloaded, use the administrative tool to locate the directory for DOS executables (C:\WEBSITE\CGI-DOS is the default). Using NotePad or some other text editor, create a file called DIRSEARCH.CMD in that directory, carefully copy the contents of the "Sample File: DIRSEARCH.CMD" figure into it, and save the file. Using a Web client, connect to your server and point to http://<servername>/DOS-CGI/DIRSEARCH.CMD. Assuming that your server is configured to allow your client to access DOS-based CGI scripts, you should get an HTML page stating No Files Specified. Don't worry. This is what we want! Connect again and this time add ?command.com to the end. You should see a standard directory listing of all the files named command.com.

DIRSEARCH.CMD is a simple NT batch file that scans your server's C: drive for any occurrence of a specified file and returns the directory list to the user. If a file is not specified, then an error message is displayed. We'll extend this program later, but first we need to examine it in detail.

Always Test the Data First

The first line of DIRSEARCH.CMD is a parameter test to make sure that a user has supplied a filename. If "%1" is blank (that is, does not exist), then the batch file jumps to the :errlabel label, which returns an error message to the user. Any seasoned programmer will tell you that it's important to test your program inputs before acting on them, and it is even more crucial here for several reasons.

Most important is an open issue regarding system security. There are known weaknesses with many operating systems that allow shell metacharacters to break your script or a called application, thereby exposing your system. It sounds far-fetched, but it happens all the time. By adding keystroke traps to your script, however, you can go a long way toward preventing this from happening. Refer to the CERT archives for more information on this subject. This weakness is not prevalent on all systems, but it is on many of them.

Another good reason to test the parameters is for program control. As we'll show below, you can use this function to minimize your development. Finally, there's a programming reason to test the input: to make sure that the data is valid. If instead of conducting a directory search, you queried a database, you would want to make sure that all of the necessary information had been provided. Testing the data in a script is more efficient than letting the called application fail. You would still have to test for errors, so you might as well do it up front.
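One way to picture the "test first" advice is an allow-list check: rather than hunting for every dangerous metacharacter, accept only the characters you expect. The Python sketch below is an illustration, not a complete defense. The permitted character set, the 64-character cap and the name check_filename are all assumptions chosen for this example.

```python
import re

# Assumption: plain file names only (letters, digits, dot, dash, underscore).
# Shell metacharacters such as & | < > ; and spaces never match.
SAFE_NAME = re.compile(r"[A-Za-z0-9._-]{1,64}")


def check_filename(value):
    """Return the value if it looks like a harmless file name, else None."""
    if value and SAFE_NAME.fullmatch(value):
        return value
    return None
```

A script would branch on the result much as DIRSEARCH.CMD branches on an empty "%1": a None result goes to the error page, anything else goes on to the search.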

Define Your MIME Type

The use of MIME types to coordinate data exchange is now part of the HTTP specification, as of version 1.0. As you can see, both the :goodlabel and :errlabel sections begin with MIME type declarations echoed to STDOUT.

For our purposes, we are simply declaring that the following data is HTML. The Web client interprets this and knows to read and apply whatever HTML formatting tags it sees in the data. If the MIME type had not been declared, then your browser may or may not interpret the HTML tags.

As you may have guessed, you can also send binaries or other nontext data back to the client. For example, if you wanted to return a GIF image, you would define the MIME content-type as Image/GIF. The client would then receive and display the image appropriately. Likewise, you could send application binaries by using the Application/Octet-stream MIME type, and so on. Note the echo statement that follows the MIME declaration. You must send a blank line after you send the content-type: statement.
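The header-then-blank-line rule is easy to get wrong, so here is the same idea as a tiny Python sketch. The function name cgi_response is hypothetical; the output layout mirrors what the two echo lines at the top of each section of DIRSEARCH.CMD produce.

```python
def cgi_response(body, content_type="text/html"):
    """Prefix a body with its Content-type header and the required blank line."""
    # The empty line between header and body is what tells the server
    # (and ultimately the client) that the headers are finished.
    return "Content-type: " + content_type + "\n\n" + body
```

Calling cgi_response("<h1>No files specified</h1>") gives the header line, an empty line, and then the HTML.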

The Rest of It

The remaining code is fairly self-explanatory. There are, however, some things worth noting. Notice that the < and > characters are preceded by ^. This is to prevent CMD.EXE from interpreting them as redirection commands. Also notice that DIR C:\"%1" /s is the actual command that we are calling. The remaining lines of code are used to prepare the environment and command line, as well as generate HTML on its behalf.
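For readers who would rather see the whole flow in a general-purpose language, the following Python sketch does roughly what DIRSEARCH.CMD does: walk a directory tree, collect files with a matching name, and wrap the result in HTML. It is an approximation rather than a port. find_files and render_page are hypothetical names, the output is a bare list of paths rather than a DIR listing, and html.escape plays a different role from the ^ in the batch file: it keeps <, > and & in the data from being read as markup by the browser.

```python
import html
import os


def find_files(root, name):
    """Return full paths under root whose file name matches name (case-insensitive)."""
    wanted = name.lower()
    hits = []
    for folder, _subdirs, files in os.walk(root):
        for entry in files:
            if entry.lower() == wanted:
                hits.append(os.path.join(folder, entry))
    return sorted(hits)


def render_page(name, hits):
    """Build the result page, escaping anything that came from outside the script."""
    safe = html.escape(name)
    listing = "\n".join(html.escape(path) for path in hits)
    lines = [
        "<html><head><title>All occurrences of " + safe + "</title></head>",
        "<body><h1>All occurrences of " + safe + "</h1>",
        "<pre>",
        listing,
        "</pre></body></html>",
    ]
    return "\n".join(lines) + "\n"
```

The two halves are kept separate on purpose: the search can be tested against a scratch directory, and the page builder can be tested with hostile-looking names, without a Web server in the loop.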

Image map servers and dynamic directory builders are two of the more common CGI-aware applications that you'll find. Additionally, there are a variety of third-party tools, such as database requesters and indexed searching agents, available on the Internet. If you wish to write your own, you can apply the same concepts we've shown you to a compiled language.

Adding Support for Forms

In its current form, DIRSEARCH.CMD must be given a parameter on the URL's command line. This is far from intuitive, and if you made your users work like this, you'd have very little traffic on your site. Most users prefer to enter information on a form, and so do administrators, since the level of control increases tremendously.

The easiest way to build a form is through the use of the <ISINDEX> HTML tag. This simple tag has the delightful effect of putting an edit box widget on your page. When a user enters text into the edit box and presses the enter key, the text is appended to the URL (with the question mark), and the page is reloaded. At this point, a valid search string exists, and the script will process accordingly.

For our example, change the line in the figure "DIRSEARCH.CMD" that reads ^<H1^> No Files Specified^</H1^> to Enter a filename to search for: ^<ISINDEX^>. Now point your Web browser to /BIN/CGI-DOS/DIRSEARCH, and you will be greeted with an edit box. Type in COMMAND.COM and press enter, and the same script will now branch through the :goodlabel section, and return a directory listing.

Don't be confused by the name ISINDEX; it's an unfortunate moniker for such a flexible gizmo. This tag does not automatically make your documents searchable; some other tool (such as DIR, in our case) must do the searching if that's what you want. The <ISINDEX> tag simply provides an edit box whose contents are appended to the current URL.
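What the script receives from an <ISINDEX> box is just the typed text, URL-encoded, with spaces between words sent as plus signs. A small Python sketch of decoding it follows; isindex_words is a hypothetical helper, and treating any query that contains an equals sign as "not an ISINDEX query" follows the usual CGI convention rather than anything specific to WebSite.

```python
from urllib.parse import unquote


def isindex_words(query_string):
    """Split an ISINDEX-style query into its decoded search words."""
    if not query_string or "=" in query_string:
        return []  # empty, or a name=value form submission instead
    # Split on '+' first, then decode, so an encoded plus sign (%2B)
    # inside a word is not mistaken for a separator.
    return [unquote(word) for word in query_string.split("+")]
```

With the single-word searches used in this article, the result is a one-item list such as ["command.com"].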

Just as the <ISINDEX> tag is poorly named, so is it poorly treated by most browsers. Many of them prepend the edit box with an inane comment about the document being searchable, which is just plain wrong. In order for us to have an attractive form that doesn't confuse people, we'll have to use "real" forms.

Creating Real HTML Forms

Form creation with HTML is a fairly simple task, and as we've shown, passing data with CGI is also straightforward. When you combine these two components' simplicity and strength, tremendous results can develop.

In order to illustrate this, we need to modify our sample code. Edit DIRSEARCH.CMD once again, and replace the <ISINDEX> tag with the code shown in the figure "ISINDEX Replacement".

Save these changes, and reload our now familiar URL. Now, instead of seeing an <ISINDEX> input field, you should see a real HTML form. When you enter a file name into the edit box and click on the "submit" button, the script will process the directory search and return the list of matching files.

Let's examine this syntax in detail. The line that contains the <Form> HTML tag does what you might guess: It notifies the client that the following text contains form elements, until a closing </Form> tag is encountered.

There are additional keywords that can be used with the <Form> tag, which provide greater amounts of control. One of these is the Action= statement, which tells the browser where to send the form contents. By default, the destination is the current URL, but it could be any CGI application available to the user. For companies that have several distributed Web servers, but only one or two CGI servers, this is a great way to split the load. Also of importance is the Method= keyword, which allows you to specify that the query data is to be submitted using the GET or POST methods as described earlier.

Widget Lingo

Another keyword, the <input> tag, has many options, but it basically applies to almost every type of form widget available. This includes edit boxes, radio buttons, check boxes and command buttons. The only types of widgets not controlled by <input> are list boxes (either drop-down or scrollable) and large, multi-line text boxes, both of which have their own specific tags.

In our example, the first occurrence reads <Input Name="FILENAME">. This syntax will create an edit box with the resource name of Filename. The Name= command associates a name with the control data. After successfully running a query, look at the client's URL command line. Notice that where it used to have ?command.com, it now has ?filename=command.com. One final thing to remember about the Name= option is that it does not apply to command buttons (you'll see why in a moment).
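Once fields are named, the script has to pull a value out of name=value text instead of taking the whole query as-is. As a hedged Python sketch (form_field is a hypothetical helper, and the lookup is case-sensitive, so the name must match what the browser actually sent):

```python
from urllib.parse import parse_qs


def form_field(query_string, name, default=""):
    """Return the first value submitted for a named form field."""
    # parse_qs splits on '&', separates names from values at '=',
    # and decodes '+' and %XX sequences in one step.
    values = parse_qs(query_string).get(name)
    return values[0] if values else default
```

Given the query filename=command.com, form_field(query, "filename") returns command.com; an absent field yields the default.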

Notice that we did not explicitly declare the edit box as such. That's because the default <Input> widget is an edit box. If we chose to do so, we would declare it using the Type=Text attribute. Other valid Type= declarations include Password fields (like edit boxes, only keystrokes are echoed as asterisks), CheckBox, Radio, Submit and Reset.

Additionally, we could preload a string into the edit box by using the Value= statement. This can also be used with password fields, as well as check boxes, radio buttons and command buttons, although its usage varies widely among them.

The default width (in theory, at least) of an edit box is 20 characters, and the height is one. You can define different sizes using the Size= directive. When setting the width, use Size=W. When setting the height, use Size=W,H. Note that setting the width does not restrict input to that number of characters, but instead defines the width of the box itself.

One problem with the GET method is that text is appended to the URL as entered. If you let users enter as much as they want, you run the risk of script or called application failure due to overflowed buffers. The Maxlength= attribute is useful for limiting the maximum number of characters that an edit box will accept, thereby limiting your system's exposure.
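Maxlength= is enforced by the browser, and a client is free to ignore it or to build the URL by hand, so it seems prudent to repeat the check in the script. The Python sketch below shows one way to do that. The limit of 20 and the name require_short are assumptions for illustration.

```python
MAX_FIELD_LENGTH = 20  # assumed to match the Maxlength= value placed on the form


def require_short(value, limit=MAX_FIELD_LENGTH):
    """Return value unchanged, or raise ValueError if it exceeds the limit."""
    if len(value) > limit:
        raise ValueError("field longer than %d characters" % limit)
    return value
```

Rejecting over-long input outright, rather than silently truncating it, keeps the script from searching for something the user never asked for.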

Altogether, our simplistic <Input Name="Filename"> example could (and probably should) read as <Input Type="Text" Name="Filename" Size=20 Maxlength=20>. Note the use of quote marks around the textual values; this is an HTML requirement that needs to be followed for maximum compatibility.

Command Buttons

Command buttons are not nearly as complex as the edit boxes; there are only two possible types. Our example uses the Type="Submit" variety, which builds the parameter list and appends it to the URL specified by the <Form Action=....> tag. The other type of command button is Type="Reset", which clears the form and resets all of the widgets to their original state.

The only optional attribute that can be set for command buttons is Value=. In this case, the Value= attribute defines the text that appears on the button's face. It does not change the button's behavior. For our example, we could change <Input Type="Submit"> to <Input Type="Submit" Value="Search">, and add another button that clears the edit box by using <Input Type="Reset" Value="Clear">.
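Pulling the pieces of this section together, a script written in a general-purpose language might generate the whole form from one function. The Python sketch below is illustrative only: search_form is a hypothetical name, the field name and prompt are borrowed from the article's example, and the attribute values are the ones discussed above.

```python
import html


def search_form(action, button_text="Search", maxlength=20):
    """Return the HTML for a one-field search form with Submit and Reset buttons."""
    return "\n".join([
        '<Form Method="GET" Action="' + html.escape(action, quote=True) + '">',
        "Enter a file name to search for:",
        '<Input Type="Text" Name="Filename" Size="20" Maxlength="'
        + str(int(maxlength)) + '">',
        '<Input Type="Submit" Value="' + html.escape(button_text, quote=True) + '">',
        '<Input Type="Reset" Value="Clear">',
        "</Form>",
    ])
```

Escaping the attribute values means a button label or action path containing a quote mark cannot break out of its attribute.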

And Then Some

You should experiment with the different types of widgets and their optional parameters. Together they provide a fairly comprehensive set of tools for getting user input into your CGI scripts and applications. A good starting place is http://www.ncsa.uiuc.edu/SDG/Software/Mosaic/Docs/fill-out-forms/overview.html; you can also search Yahoo's (http://www.yahoo.com) database for "HTML and Forms."

Anyone familiar with other types of graphical forms-based tools will tell you that these gizmos, while handy, are hardly sufficient. New extensions have been added to the HTML drafts that address some concerns, but not all. Vendors are stepping up with products that address these concerns, however, so you should be able to find something that suits your need.

On the back end, Lotus Development and Netscape have both promised to ship HTML filters for their respective groupware products, allowing you to leverage in-form value testing, mathematical functions and the like. How these extensions will be handled by generic clients is anybody's guess. But it is a step in the right direction.

As for the clients, Netscape Navigator 2.0 and Oracle's PowerBrowser clients both promise to offer event-driven scripting language extensions that will allow Webmasters to embed extended controls into the forms directly. Although these clients will perform the functions as directed, it means more work for the administrators, since you may have to support multiple proprietary extensions. Again, it's a step that will drive further standards-based development.

GET vs. POST Revisited

We have been using the GET method to pass data down to our script. The POST method, which offers more capabilities, deserves some attention as well. To see a quick demonstration, change the <Form...> line in DIRSEARCH.CMD to read <Form Method="POST" Action="/BIN/CGI-DOS/DIRSEARCH">, and reload the form again. This won't change the behavior of your script, but you may receive a security notice from your client.

Remember, we've been using O'Reilly's WebSite. Its CGI interface to CMD.EXE is relatively consistent between POST and GET. However, there is one significant difference. WebSite manager generates a unique temporary file that contains the data provided during the post operation. You can sniff through this output file on a programmatic basis if needed, allowing your script to branch according to content. This is extremely handy if you have to allow users to submit large strings of text or binary objects.
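On servers that follow the standard CGI convention instead of WebSite's temporary-file approach (an assumption for this sketch), a script can serve both methods by checking the REQUEST_METHOD environment variable and taking its data from the matching place. submitted_data below is a hypothetical helper, and the POST body is passed in as bytes so the function can be tried without a server.

```python
def submitted_data(environ, body=b""):
    """Return the raw form data no matter which method delivered it."""
    method = environ.get("REQUEST_METHOD", "GET").upper()
    if method == "POST":
        # POST: the data travelled in the request body, not on the URL.
        return body.decode("latin-1")
    # GET: the data is whatever followed the '?' in the URL.
    return environ.get("QUERY_STRING", "")
```

Either way the caller ends up with the same name=value text, which is why switching the form from GET to POST need not change what the rest of the script does.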

Beyond Simple CGI

GET and POST both offer value, depending on your objectives. However, in this implementation they are also both limited in that they rely on NT's CMD.EXE, a character-based command interpreter, as well as require that scripts and applications support the use of STDIN and STDOUT. This precludes the use of Windows-based applications as CGI-callable services.

A handful of vendors are working together to develop WIN-CGI to circumvent this problem. Because it is implemented as a CGI-aware library, you can extend any application for which you have the source code, recompiling it to read and write CGI instead of the more traditional I/O interfaces. Such efforts aren't new. Many sources are available for CGI libraries for various servers and operating systems. Contact your Web server vendor for leads.

Now that you understand some of the concepts behind CGI programming, you should be able to develop applications that communicate between the Web and any data system. Remember to follow the specifications, and do a lot of testing, and you'll come out ahead.

Eric Hall is an independent networking consultant, currently working in Europe. He can be reached at ehall@nwc.com.


Sample File: DIRSEARCH.CMD

IF "%1"=="" goto errlabel
:goodlabel
echo Content-type: text/html
echo.
echo ^<html^>^<head^>
echo ^<title^>All occurrences of %1^</title^>^</head^>
echo ^<body^>
echo ^<h1^>All occurrences of %1^</h1^>
echo ^<pre^>
dir c:\"%1" /s
echo ^</pre^>
echo ^</body^>
echo ^</html^>
goto exitlabel
:errlabel
echo Content-type: text/html
echo.
echo ^<html^>^<head^>
echo ^<title^>No files specified!^</title^>^</head^>
echo ^<body^>
echo ^<h1^>No files specified^</h1^>
echo ^</body^>
echo ^</html^>
goto exitlabel
:exitlabel


ISINDEX Replacement

echo ^<Form^>
echo Enter A File Name to search for:
echo ^<Input Name="FILENAME"^>
echo ^<Input Type="SUBMIT"^>
echo ^</Form^>



January 15, 1996
