

Making Your Server System Scale
By Jay Milne
Scalability is an overused term, but we can define it as the ability to add more users to your system or to process more transactions while retaining the same performance. However, getting your server system to scale is more of a black art than an exact science. It involves the hardware platform, the operating system and the applications--but not as separate entities. Rather, it requires a holistic approach, where each part of the system is tightly integrated with the others.
Our testing--intended to determine which factors can enhance scalability--focused on Windows NT, mostly because our corporate lab partner is migrating to NT and we would, therefore, be able to share in its experiences. We used Microsoft Exchange and SQL Server as our sample applications, partly for the same reason and partly because many of the benchmarking tools we have in our San Mateo, Calif., lab are NT oriented.
Where to Add the Hardware
The hardware component is the most tangible and easily changed, but beware: Simply adding hardware doesn't guarantee better performance or more scalability. Your server is only as fast as its weakest link. For instance, the NetWare 4.1 file system is not symmetric multiprocessing (SMP) system-aware, so adding CPUs won't improve performance. But adding more memory could, since the additional memory will be used to cache data, and the operating system will not have to read files from disk.
We ran BlueCurve's Dynameasure 1.2 performance tool, which executes various scripts from multiple clients against a Microsoft Corp. SQL Server 6.5 database on NT Server 4.0. We used a read-only script and performed the test four times, incrementing the number of CPUs by one each time (our test server was a Compaq Computer Corp. ProLiant 5000 with 166-MHz Pentium Pro processors, each with 512 KB of cache, and 512 MB of memory). As we added CPUs, our transactions per second didn't increase significantly, but our response times did improve. After making one simple modification to our SQL database (increasing the Max Async Cache setting from 8 to 20), our transactions per second increased by 15 percent.
Determining how far your server can scale and which servers scale best isn't easy, but the official TPC-C benchmark numbers offer one yardstick. As of January 21 (see www.tpc.org), the system with the highest throughput was the Digital Equipment Corp. AlphaServer 8400 5/350 4Node cluster with a score of 30,390 tpmC. However, it costs more than $9 million. Intel Corp.-based systems fare better in the price/performance category. The Digital system was a clustered system, and if you want Intel-based clustering, you'll have to wait. Most of today's clustering solutions are for failover only, not scalability. Even Microsoft's WolfPack application programming interface (API) is only for failover. Phase II of WolfPack, scheduled for 1998, should have clustering for scalability.
Intel-based servers scale well to about four CPUs, after which performance per dollar declines. Even Microsoft has admitted that many of the six-way (and more) CPU systems are not cutting the mustard. Still, for servers, the more Level 2 (L2) cache the better. The Pentium Pros come in two models: one with 256 KB of L2 cache and one with 512 KB. We've found that a 166-MHz Pentium Pro with 512 KB of L2 cache will almost always outperform a 200-MHz model with 256 KB of L2 cache, especially in an application server.
To see how well our heavy-duty Compaq ProLiant 5000 would scale, we ran a synthetic benchmark tool from Neal Nelson and Associates. The Business Benchmark runs on the test server and stresses various system components. The biggest performance gain was from going from one CPU to two. Adding another CPU didn't necessarily increase read and write performance to disk. In addition, once the load factor (number of users) was around 17 or 18, the single CPU system degraded faster than the two, three or four CPU systems, which tended to perform the same. This is to be expected, since this type of operation isn't CPU bound. The additional CPUs increased scalability on CPU-intensive applications.
One of the tests in the Business Benchmark Suite is the Simulated Transaction Processing Workload. With one CPU, the system took almost 300 seconds to complete the test, but only 151 seconds with two CPUs. After that, we saw little gain. With three CPUs, the test took 133 seconds; with four CPUs, 128 seconds.
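To put those timings in perspective, the short Python sketch below converts them into speedup (one-CPU time divided by N-CPU time) and per-CPU efficiency (speedup divided by N). It is only an illustration, not part of our test suite: the one-CPU time is entered as 300 seconds, an approximation of "almost 300 seconds," and the function and variable names are our own.

```python
# Elapsed times (seconds) for the Simulated Transaction Processing Workload.
# ASSUMPTION: the one-CPU run is reported only as "almost 300 seconds",
# so 300.0 is an approximation used for illustration.
elapsed_seconds = {1: 300.0, 2: 151.0, 3: 133.0, 4: 128.0}


def scaling_table(times):
    """Return (cpus, seconds, speedup, efficiency) rows, relative to one CPU."""
    baseline = times[1]
    rows = []
    for cpus in sorted(times):
        speedup = baseline / times[cpus]   # how many times faster than one CPU
        efficiency = speedup / cpus        # fraction of ideal linear scaling
        rows.append((cpus, times[cpus], speedup, efficiency))
    return rows


for cpus, secs, speedup, eff in scaling_table(elapsed_seconds):
    print(f"{cpus} CPU(s): {secs:6.1f} s  speedup {speedup:4.2f}x  efficiency {eff:4.0%}")
```

Under that assumed baseline, two CPUs come close to doubling throughput, while four CPUs deliver only about 2.3 times the one-CPU result, so each processor beyond the second contributes much less.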