The Interactive Network Design Manual
The Systems Management Dimension
by Bruce Boardman
Grab Your Assets
Asset management software provides possibly the biggest bang for the buck when it comes to distributed systems management. Just about every vendor and analyst points to the money pit that is user support as the leverage for thinning the desktop. But even if all the desktops went to Jenny Craig, the servers would head for KFC, which just shifts the assets around. So, through thick or thin, asset management is a safe investment.
Asset inventories of hardware and software are usually bundled or integrated with software-distribution packages. Generally, the routine for most systems management disciplines, not just asset management, is as follows: the agent software collects information (in this case, hardware and software configuration) on a distributed resource and populates a server-housed database. The collected asset information is updated on a scheduled basis and provides an information store for accounting, problem resolution and software distribution.
Software-distribution applications can make powerful use of the collected asset information. By querying the database for machines whose characteristics match a particular profile, software can be delivered to a group of machines based on need and hardware profile: for example, all machines in the accounts payable department with 8 MB of RAM and a 486 or better processor. A further refinement to this query-based grouping is to run the query at the moment the transfer is created. This increases transfer success by reducing the chance that dated or invalid asset information will include a target machine that no longer meets the query parameters. It also makes grouping easier, because the minimum requirements can be specified once and the software sent to all machines meeting those requirements.
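The query-at-transfer-time idea can be sketched in a few lines of Python. This is only an illustration: the Asset record, its field names and the select_targets function are hypothetical and do not come from any product, and the sketch assumes that "8 MB, 486 or better" is read as a set of minimums.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    """Hypothetical inventory record as an agent might report it."""
    host: str
    department: str
    ram_mb: int
    cpu_class: int  # 386, 486, 586 ... (assumed numeric ranking)


def select_targets(inventory, department, min_ram_mb, min_cpu_class):
    """Return the hosts whose current record meets the profile."""
    return [
        a.host
        for a in inventory
        if a.department == department
        and a.ram_mb >= min_ram_mb
        and a.cpu_class >= min_cpu_class
    ]


inventory = [
    Asset("ap-01", "accounts payable", 8, 486),
    Asset("ap-02", "accounts payable", 4, 486),
    Asset("ap-03", "accounts payable", 16, 386),
    Asset("hr-01", "human resources", 16, 586),
]

# The profile is stated once; the query runs when the transfer is created.
targets_today = select_targets(inventory, "accounts payable", 8, 486)

# Later the agent reports a memory upgrade on ap-02. Re-running the same
# query at the next transfer picks the machine up with no regrouping.
inventory[1].ram_mb = 8
targets_later = select_targets(inventory, "accounts payable", 8, 486)
print(targets_today, targets_later)
```

Because the target list is computed from the profile each time, a machine that drops below the minimums simply falls out of the next transfer.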
The agents have a finite ability to recognize hardware and software. Products differ in the agent and management console operating systems they support, and the detail of gathered configuration information varies not only from product to product, but also between agents and operating systems within the same product. Adding unsupported configurations may be as easy as adding a line in a file, or as clunky as waiting for another release of the software.
Changing files on networked machines is very much a change to the production environment and therefore requires production control procedures. Some products support workflow sign-off of files to support production control. The production control steps include testing the install script, packaging the files and scripts, moving the package to an install server, scheduling the install, running the transfer, auditing the transfer results and setting up reruns for transfers that failed. In addition to delineated audit steps, multilayered security provides package creation and testing access to the technician setting up the transfer, administrative access for approval and monitoring, and execution/monitoring access to the operators controlling the transfer. Some products carry their own security as part of the asset system while others rely on internal authentication and access. Generally, products that work over multiple operating systems will have internal security.
When it comes to distributing files and software to remote locations, coordinating the transfers and controlling network utilization are among the most difficult things to get right. Full-function software-distribution systems conserve network bandwidth by fanning distribution jobs out from a single instance of the job to intermediate servers, which then propagate the job as many times as necessary for the downstream targets. Not only does this mean sending the job (and the associated files) only once from the origination point, conserving bandwidth, it also means scheduling and tracking only one job (even though multiple transfers have to complete before the job is successful).
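A small Python sketch makes the saving concrete. The topology, server names and function names below are hypothetical, and the sketch assumes a single tier of intermediate servers.

```python
# Hypothetical topology: intermediate server -> downstream targets.
topology = {
    "chicago-stage": ["chi-pc1", "chi-pc2", "chi-pc3"],
    "denver-stage": ["den-pc1", "den-pc2"],
}


def direct_sends(topology):
    """Transfers leaving the origin if it sends to every target itself."""
    return sum(len(targets) for targets in topology.values())


def fanned_sends(topology):
    """Transfers leaving the origin when it sends once per intermediate server."""
    return len(topology)


def fan_out(job, topology):
    """List every hop (source, destination, job) that one job turns into."""
    transfers = []
    for server, targets in topology.items():
        transfers.append(("origin", server, job))
        transfers.extend((server, target, job) for target in targets)
    return transfers


def job_succeeded(results):
    """The single tracked job succeeds only if every transfer completed."""
    return all(results.values())


hops = fan_out("payroll-v2", topology)
print(direct_sends(topology), fanned_sends(topology), len(hops))
```

With this topology the origin sends two copies instead of five, while the operator still schedules and tracks one job.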
Further bandwidth savings are achieved when the distribution software limits itself to a set percentage of the available bandwidth. This may mean setting a different percentage for each link traversed, depending on traffic and available bandwidth.
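The effect of a per-link percentage is simple arithmetic, sketched here in Python. The link names, speeds and percentages are made-up examples, and the calculation ignores protocol overhead.

```python
def transfer_seconds(size_bytes, link_bps, share):
    """Estimated time to send size_bytes using only `share` of a link.

    share is the fraction of link capacity the distribution software
    may use, for example 0.25 for 25 percent.
    """
    if not 0 < share <= 1:
        raise ValueError("share must be greater than 0 and at most 1")
    return size_bytes * 8 / (link_bps * share)


# Hypothetical links: (capacity in bits per second, permitted share).
links = {
    "hq-to-regional": (1_544_000, 0.50),   # T1, half the link
    "regional-to-branch": (56_000, 0.25),  # 56 Kbps line, one quarter
}

package_bytes = 1_000_000
estimates = {
    name: transfer_seconds(package_bytes, bps, share)
    for name, (bps, share) in links.items()
}
for name, seconds in estimates.items():
    print(f"{name}: about {seconds:.0f} seconds")
```

A lower percentage on a slow link stretches the transfer considerably, which is one reason to deliver files ahead of the install date.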
Transfer logs list which installs were successful, which ones failed, and why. A companion feature found in more advanced management software not only shows the failing node and the reason for the failure, but also automates a retry where applicable, as in the case of a network failure.
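A minimal sketch of that log-and-retry behavior follows. The failure reasons, node names and the send callback are hypothetical, and the sketch assumes that only a network failure is worth retrying automatically.

```python
RETRYABLE = {"network failure"}  # assumption: only transient failures are retried


def run_with_retry(send, node, max_attempts=3):
    """Call send(node) until it succeeds, fails for good, or attempts run out.

    send returns (ok, reason). Every attempt is written to the log as
    (node, attempt number, outcome).
    """
    log = []
    for attempt in range(1, max_attempts + 1):
        ok, reason = send(node)
        log.append((node, attempt, "success" if ok else reason))
        if ok or reason not in RETRYABLE:
            break
    return log


def flaky_sender():
    """Stand-in for a transfer that fails once on the network, then works."""
    calls = {"count": 0}

    def send(node):
        calls["count"] += 1
        if calls["count"] == 1:
            return False, "network failure"
        return True, None

    return send


network_log = run_with_retry(flaky_sender(), "branch-07")
disk_log = run_with_retry(lambda node: (False, "disk full"), "branch-08")
print(network_log)
print(disk_log)
```

The first node is retried and succeeds on the second attempt; the second is logged once and left for an operator, since retrying will not free up disk space.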
Coordinating the installation of a new version of client software with changes in host systems can be a helpdesk nightmare. To this end, a necessary feature is the ability to schedule the implementation of new software. Files can be dribbled out over days or weeks, but not made available for install until the appropriate day. This not only limits bandwidth impact, it ensures that users are on the right version.
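Separating delivery from availability comes down to a date check at install time, sketched below. The function name and the dates are illustrative assumptions.

```python
from datetime import date


def can_install(delivered_on, activation_date, today):
    """A package is installable once it has arrived and its activation date is reached.

    delivered_on is None while the files are still being dribbled out.
    """
    if delivered_on is None or delivered_on > today:
        return False
    return today >= activation_date


activation = date(1996, 12, 16)   # hypothetical cutover day for the host change
delivered = date(1996, 12, 2)     # files arrived two weeks early

early = can_install(delivered, activation, date(1996, 12, 10))
on_time = can_install(delivered, activation, date(1996, 12, 16))
not_here = can_install(None, activation, date(1996, 12, 16))
print(early, on_time, not_here)
```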
Although setting up the job and files and actually running them may seem like the lion's share of the coordination, it really is not. The biggest challenge comes on the back end: the installation process.
Installation shields are important to limit user interaction (better known as screwups). Three basic types of install script, or shield, are available: programmatic shells, scripts and delta scripts. An example of a programmatic shell is any install wizard that comes with commercially available software. Some allow user input to be limited or defaulted. They are best applied to the install of commercial software and are not usually part of the software-distribution package. Scripting tools, however, are found in software-distribution packages, and offer an easy way to conditionally limit user interaction by prefilling the fields and pushing the buttons on the programmatic install wizards. They generally come with example scripts for installing most popular software packages, so that with some cutting and pasting one can create a custom install suite.
The third type of install approach, the delta script, takes a snapshot of the machine's hardware and software inventory before and after installation, then creates an install executable from the difference. When run, the executable applies that delta to the target machine. This easy approach does have a "gotcha," in that the before snapshots on the target machines need to be relatively similar to the original, or installation may fail. Although delta scripts do allow for some differences, they are best applied when all of the target machines are very close in configuration.
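The snapshot-and-difference idea, including the "gotcha," can be sketched as follows. This is a simplification: a snapshot is reduced to a dictionary of file names and versions, whereas real products also capture hardware and system settings, and every name here is hypothetical.

```python
def make_delta(before, after):
    """Compare two snapshots (item -> version) and return the change set."""
    return {
        "add": {k: v for k, v in after.items() if k not in before},
        "change": {k: v for k, v in after.items() if k in before and before[k] != v},
        "remove": sorted(k for k in before if k not in after),
    }


def apply_delta(target, delta, reference_before):
    """Apply the delta, refusing targets that differ on the items it touches."""
    for item in list(delta["change"]) + delta["remove"]:
        if target.get(item) != reference_before.get(item):
            raise ValueError(f"{item} differs from the reference snapshot")
    result = dict(target)
    result.update(delta["add"])
    result.update(delta["change"])
    for item in delta["remove"]:
        result.pop(item, None)
    return result


before = {"app.ini": "v1", "lib.dll": "1.0", "old.hlp": "1.0"}
after = {"app.ini": "v2", "lib.dll": "1.0", "app.exe": "2.0"}
delta = make_delta(before, after)

# A target that matches the reference where it counts; extra files are tolerated.
similar = {"app.ini": "v1", "lib.dll": "1.0", "old.hlp": "1.0", "notes.txt": "user file"}
installed = apply_delta(similar, delta, before)

# A target whose app.ini is at an unexpected version is rejected.
try:
    apply_delta({"app.ini": "v0", "old.hlp": "1.0"}, delta, before)
    rejected = False
except ValueError:
    rejected = True
print(installed, rejected)
```

Differences outside the delta are tolerated, but a mismatch on any item the delta changes or removes stops the install.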
Once the new software is installed, the process comes full circle and calls for software-audit and software-metering tools. This feature is generally left to point products, but enterprise management suites are beginning to include it as well.
Job Job, or the Work of Job Scheduling
Job scheduling is generally a discipline required by a centralized computing approach, but may also apply to larger distributed groups. Job scheduling controls the running of batch jobs including backups, updates and reorganizations. It is made up of calendaring, dependencies, accounting and output controls.
Job scheduling is not a job control language but rather a scheduler of schedulers, though it can run as executables. The main function is to coordinate jobs across multiple platforms that may span multiple operating systems.
In a distributed environment this discipline provides a focused console for viewing and controlling all jobs running over the distributed processors. It coordinates processing based on dependencies such as file creation/arrival, time, job predecessors and operator input as well as automatic restart and return code processing, such as event forwarding.
The calendar feature of a job control system must be flexible and easy to use. It should be able to handle any number of calendars holding differing work-schedule templates. For example, a given calendar may contain daily, weekly, monthly and year-end schedules that can then be overlaid in order to project and set up the actual run for a given day. Holiday schedules should also be a template that can be applied to and/or shared between calendars.
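Overlaying templates to project a day's run might look like the sketch below. The job names are invented, and the rules (weekly work on Fridays, monthly work on the last day of the month, nothing at all on a holiday) are assumptions made for the example.

```python
from datetime import date, timedelta

# Hypothetical schedule templates held by one calendar.
templates = {
    "daily": ["backup"],
    "weekly": ["reorg"],
    "monthly": ["month-close"],
    "year_end": ["year-close"],
}

# Holiday template; it could be shared by several calendars.
holidays = {date(1996, 12, 25)}


def jobs_for(day, templates, holidays):
    """Overlay the templates and return the jobs projected for `day`."""
    if day in holidays:
        return []
    jobs = list(templates["daily"])
    if day.weekday() == 4:  # Friday
        jobs += templates["weekly"]
    if (day + timedelta(days=1)).month != day.month:  # last day of the month
        jobs += templates["monthly"]
    if (day.month, day.day) == (12, 31):
        jobs += templates["year_end"]
    return jobs


friday_run = jobs_for(date(1996, 12, 20), templates, holidays)
holiday_run = jobs_for(date(1996, 12, 25), templates, holidays)
year_end_run = jobs_for(date(1996, 12, 31), templates, holidays)
print(friday_run, holiday_run, year_end_run)
```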
Conditional job and job-stream processing is also an important part of controlling jobs across distributed processors. Dependencies that rely on literal, variable and operator-supplied conditions need to be available from within jobs and job streams running on multiple systems concurrently. Conditional processing also includes listing and, at run time, checking for available resources, such as tape drives and distributed machines on which to run.
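Predecessor dependencies and a run-time resource check can be sketched with Python's standard graphlib module (Python 3.9 or later). The job names, the resource names and the ready function are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical job stream: job -> set of predecessor jobs.
predecessors = {
    "extract": set(),
    "update": {"extract"},
    "reorg": {"update"},
    "backup": {"reorg"},
}

# One valid run order that honors every predecessor dependency.
run_order = list(TopologicalSorter(predecessors).static_order())

# Resources each job needs before it may start.
required = {"backup": {"tape-drive"}}


def ready(job, finished, available):
    """A job may start when its predecessors are done and its resources are free."""
    return predecessors[job] <= finished and required.get(job, set()) <= available


done = {"extract", "update", "reorg"}
waiting = ready("backup", done, available=set())
cleared = ready("backup", done, available={"tape-drive"})
print(run_order, waiting, cleared)
```

The backup job has all of its predecessors complete but still waits until a tape drive is listed as available.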
Two more advanced features are job-stream simulation and load-balancing.
Job-stream simulation runs the job streams off-line to check dependencies and scheduling. Although no products use performance data to simulate load, the scheduling clock can be sped up to verify an entire day's run in a few minutes. At the very least, the job setup should allow for a user-supplied estimate of the elapsed job time and provide a graphical representation of job-stream-processing progress, with drag-and-drop editing to realign and tweak job-stream performance and coordination. A better approach (not available with any product yet) is to evaluate actual run-time data for a particular calendared schedule and use it to project run-time estimates, since day-to-day elapsed execution time can vary greatly depending on calendar workload.
Pooling distributed computing resources and distributing jobs across those resources is the basic idea of load-balancing. This is, however, the most basic form of load-balancing, as the job will run on whatever resource is available. More sophisticated balancing is achieved when each computing resource is classified according to its power, using variables like memory, processor and speed. Usually, in addition to resource classification, the job will carry a comparable classification indicating the amount of power it requires. The ultimate in sophistication would be to collect run-time statistics for jobs on specific platforms, and then dynamically assign the best platform for a particular job, given the jobs running and those yet to come. This is obviously a much more complex formula and is not, to my knowledge, offered by any product on the market (therefore, hang onto your operators for the foreseeable future).
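The middle level of sophistication, matching a job's power class to a machine's power class, can be sketched as follows. The single numeric power rating and the "smallest machine that fits" rule are assumptions made for the example, not a description of any product.

```python
def pick_machine(job_power, machines, busy):
    """Choose the least powerful idle machine that still meets the job's class.

    machines maps a machine name to its power rating; busy is the set of
    machines currently running a job. Returns None if nothing qualifies.
    """
    candidates = [
        (power, name)
        for name, power in machines.items()
        if name not in busy and power >= job_power
    ]
    return min(candidates)[1] if candidates else None


# Hypothetical pool, rated from memory, processor and speed.
machines = {"small": 1, "medium": 2, "large": 4}

first_choice = pick_machine(2, machines, busy=set())
fallback = pick_machine(2, machines, busy={"medium"})
no_fit = pick_machine(8, machines, busy=set())
print(first_choice, fallback, no_fit)
```

Choosing the smallest machine that fits keeps the larger ones free for heavier jobs. The dynamic, statistics-driven assignment described above would replace the fixed ratings with measured run times.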
Updated December 17, 1996
