Upcoming Events

Executive conference

Cloud Connect March 16-18

Comprehensive thought leadership for executives, IT professionals and developers. Topics include: the ROI, cost and economics of on-demand computing; Migration strategies to move from on-premise to cloud-based IT; Vertical cloud specialization, tailoring features and architectures to specific applications, industries, and customer ecosystems

More Events »

Subscribe to Newsletter

  • Keep up with all of the latest news and analysis on the fast-moving IT industry with Network Computing newsletters.
Sign Up
Network Design Manual
 
Beginning SQL Programming:
Q and A


June 25, 2001

Got Questions? We've Got Answers!

Thank you for submitting your questions to Brian Matsik and Kevin Spencer, co-authors of "Beginning SQL Programming."


Brought to you by:




Check It Out!



Questions:
  1. Writing queries
  2. Working in MS SQL Server 2000
  3. Problem with column headings
  4. Question on auto mail feature in SQL 7.0 Server
  5. Records, unique values and derived tables
  6. Problem concerning quotes within quotes
  7. Deleting duplicate rows from a table
  8. Error with SQL on an NT server
  9. Maintaining a database periodically
  10. Viewing portions of a result set


Question 1:

Q: How can I write a query to select, say, the top n companies sorted according to their revenues from, say, a total of m companies?


Kevin Spencer responds:

A: Well, it depends on the database you're working with, of course. I will be using SQL Server for this example. ISO (ANSI) SQL has an operator called "TOP" which is used to select a given number of records. If an ORDER BY clause is used, it selects the "Top X" records in the order of the ORDER BY clause.

Let's say that your table is called "companies" and each table has a foreign key to another table in which the revenues from any number of other companies are stored. We'll call that other table "revenues." The foreign key to the other table would be a unique identifier for each company (in the "companies" table) which I'll call "companyid." In the "revenues" table, there are many company listings, with a "revenue" column, a "companyid" column (foreign key), and a "from_company" column (another foreign key to the first table), which contains the "companyid" of the company from which the first company ("companyid") has collected revenue.

Companies
Revenues
companyid companyid
companyname revenue
etc. from_company

Now, we will need a JOIN query to get the total revenues from the second table, and create an aggregate column in our result set. But, since the aggregate operator is SUM for the second table, which returns a single value from a number of rows, we need to aggregate the value in the first table to make it work. By using a GROUP BY clause, we separate the single SUMmed value in the revenues table by company:

SELECT TOP 10 SUM(revenues.revenue) AS revenue, 
    companies.companyname 
FROM revenues INNER JOIN 
    companies ON 
    revenues.companyid = companies.companyid 
GROUP BY companies.companyname 
ORDER BY SUM(revenues.revenue) DESC 
Now, that's not exactly what you asked for, since you want only the "TOP m" records from the second table, but that's about all you're going to get in a single query. There is a way to do this, however, and that's by creating a view to use for the second table. The view is created using the following query:

SELECT TOP 10 SUM(revenue) AS revenue, 
  Max(companyid) AS companyid 
FROM revenues 
GROUP BY companyid 
ORDER BY SUM(revenue) DESC 
Then we simply replace the table name "revenues" in the first query, and substitute the view name (which I will call "TopTenRevenues" in the query):


SELECT TOP 10 SUM(TopTenRevenues.revenue) AS revenue, 
    companies.companyname 
FROM TopTenRevenues INNER JOIN 
    companies ON 
    TopTenRevenues.companyid = companies.companyid 
GROUP BY companies.companyname 
ORDER BY SUM(TopTenRevenues.revenue) DESC

Now, this could also be done in a single Stored Procedure, by creating a cursor instead of a View, and selecting from the companies table JOINed with the cursor.

Very tricky question!

Question 2:

Q: I have a table that looks like the following. All three cols are ints.

Cust ID QuestionNo ResponseId
1225 1 1
1225 2 4
1225 3 2
1225 4 5

What would be the most efficient way in MS SQL Server 2000 to create the following output?

Cust ID Answer to Q1 Answer to Q2
1225 1 5


Kevin Spencer responds:

A: This is another situation where you can't accomplish the objective with a single query. You have a table with three columns, custId, QuestionNo and ResponseId, and you want to return a table with as many as (in your example) four or more columns, depending on the number of questions. In other words, the only thing the "result set" you've given has in common with the original table (which, for the purpose of this discussion, I will refer to as "responses"), is the "CustId" field. At first glance it looks like it has more in common, but each of the columns you've given is actually a result set. Lets add some more data to the second "table" to illustrate this better:

Cust ID Answer to Q1 Answer to Q2 Answer to Q3 Answer to Q4
1225 1 4 2 5
1226 3 4 2 2
1227 3 2 2 5

As you can see, each "column" in this "table" is a result set, consisting of the result set of the query.


SELECT ResponseId FROM responses WHERE QuestionNo = y 

So, about the only way I can think of to tackle this effectively is to create a view on each QuestionNo, using a variant of the following:

SELECT CustId, ResponseId FROM responses WHERE QuestionNo = 1 (view1) 
SELECT CustId, ResponseId FROM responses WHERE QuestionNo = 2 (view2) 
SELECT CustId, ResponseId FROM responses WHERE QuestionNo = 3 (view3) 
SELECT CustId, ResponseId FROM responses WHERE QuestionNo = 4 (view4)

I will assume for the purpose of this discussion that each customer has only one response to each question. Now you construct a query to join all of these result sets into a single result set. To keep duplicates from occurring, we use aggregate functions to group everything together:


SELECT responses.CustId, 
  Max(view1.ResponseId) AS Q1Answer, 
  Max(view2.ResponseId) AS Q2Answer, 
  Max(view3.ResponseId) AS Q3Answer, 
  Max(view4.ResponseId) AS Q4Answer 
FROM responses INNER JOIN 
  view1 ON 
  responses.CustId = view1.CustId 
INNER JOIN 
  view2 ON 
  responses.CustId = view2.CustId 
INNER JOIN 
   view3 ON 
  responses.CustId = view3.CustId 
INNER JOIN 
  view4 ON 
  responses.CustId = view4.CustId 
GROUP BY responses.custid 

Now, I have one last comment. It seems that, unless your original table has to have the structure you described, it would be more efficient to restructure that first table like the following:

Cust ID Q1 Response Q2 Response Q3 Response Q4 Response
1225 1 4 2 5
1226 2 2 3 1

In your version of the table, the QuestionId is repeated per CustId, and the CustId is repeated per Question. By using this format, the Questions are implied by the column name, resulting in less use of space (only one record per customer).

Question 3:

Q: What causes column headings to be excluded from the result set? This appears to happen when a combination of events occur:

A select statement containing only literals followed by a select statement against a database table that includes compute sum totaling.

If the first select (literals only) is not made, the column headings from the 2nd select appear. But when both are run (within either SQL Query Analyzer or a stored procedure), the column headings are excluded.


Brian Matsik responds:

A: If you write a query where you a computing a value or using an aggregate function then there is no field name. For instance, MAX(OrderID) does not have a field name since this is technically a dreived column. If you are using literals such as SELECT 'First Name' then this is also a derived column.

So, in order to avoid this problem, always name a column that is derived. Use the AS operator to do a proper rename of the item. Some databases such as Access will create a derived column name (MaxofOrderID) for you. SQL Server on the other hand will return an empty column name.

As for the actual content of the message, I would need to see an example of the SQL to be sure that I am answering the correct question.



Question 4:

Q: I am running a 7.0 SQL server. I have been trying for two months to get the auto mail feature to work. I have it working perfectly on my 6.5 server. Does anyone have a suggestion or heard of this issue before?

In Enterprise Manager / Support Services / SQL Mail, click on start SQL mail and this is the first error:

Microsoft SQL-DMO (ODBC SQLState: 42000)
Error 17952: Failed to start SQL Mail session.
Check the error log file in the SQL Server directory for details.

Click OK. I look at the error log and I see:

ods Starting SQL Mail session
ods Failed to start SQL Mail session
ods Error: 17951, Severity: 18, State: 1
ods General MAPI failure
ods Error: 17902, Severity: 18, State: 1

I created an NT account and SQL account called SQLMail. It has administrator/SA privileges in both NT and SQL. I have also tried it with my account (network admin) and I get the same errors. Outlook/Exchange is open on the SQL server box.

The only difference between the two SQL boxes that I am running are the SQL versions. The mail notifications work great in 6.5.

When I go to SQL Server Properties and look at the SQL mail configuration settings, I click on change and get this error:

This MAPI profile was not found on the server. Are you sure this is what you want? Click on yes or no (it does not matter either way), and then I click on test and get this error:

Error 22030: A MAPI error (error number: 87) occurred: MapiLogonEx failed due to MAPI error 87: Invalid parameter.

I get these errors no matter what I use as the login.

There is a SQLmail and administrator mail account on the SQL server using OUTLOOK 2000. (My SQL 6.5 also is running OUTLOOK 2000).

Any info will be greatly appreciated.

Brian Matsik responds:

A: Some things to try:

  1. Make sure that MSSQLServer service and SQLServerAgent are using the same account (a domain account).
  2. Check the following KB article: Q263556
  3. Reboot - really. Sometimes you need to reboot after the configuration.
  4. Log in as user of the domain account when you configure the SQL Mail. The Outlook profile may have been set up as a different user.
One of the biggest problems with SQL Mail is that the user sets a domain account for the mail, but did not log in with that domain account. You need to be sure that there is an Outlook profile for that domain user on the local machine. In other words, setting a domain user account while logged in under a local user profile or another domain account will fail all of the time.

Hope this helps.



Question 5:

Q: Say I have a table with the following fields:

F1 F2 F3
A 10 a
A 12 b
A 16 c
B 3 d
B 4 1
B 5 2
C 1 3
C 7 4
E 8 h
H 2 i
H 7 j

and I want one record returned for each unique value in F1, where the F2 value is largest. Thus the result set would be:

F1 F2 F3
A 16 c
B 5 2
C 7 4
E 8 h
H 7 j

Is there an easy way to do this?


Brian Matsik responds:

A:

            
SELECT
  B.F1,
  B.F2,
  A.F3
FROM
  rmrtech AS A,
  (
    SELECT
      F1,
      MAX(F2) as F2
    FROM
      rmrtech
    GROUP BY F1
  ) AS B
WHERE
  A.F1 = B.F1
  AND A.F2 = B.F2
ORDER BY
  B.F1
  
  
The inner query (a derived table) finds the maximum value for F2 for each value of F1. Once you find this information it is a simple join to find the record where F1 and F2 match the record in the derived table.



Question 6:

Q: I am having a problem concerning quotes within quotes when running OSQL from a stored procedure via the xp_cmdshell SP.

My script I am using in the stored procedure is:

=========================================
.
.
.
Declare @command1 varchar(255),@command varchar(255),@database varchar(60)
select @database="plathk"
.
.
.
-- Valid code which works
        select @command1 = 'osql -Q"BACKUP DATABASE ' + @database + ' TO
' + 
                @database + '_backup with STATS=10,   INIT ' + '" -d' +
@database 
        select @command = 'master..xp_cmdshell ' + "'" + @command1 + "'" 
        print @command
        exec(@command)


-- Invalid code which fails
-- How can I specify a description in quotes in the statements below ?????
        select @command1 = 'osql -Q"BACKUP DATABASE ' + @database + ' TO
' + 
                @database + '_backup with STATS=10,   INIT, DESCRIPTION
= ''Full backup of ' + 
                @database + "'" + ', NAME = ''' + @database + 'Full''' +
'" -d' + @database 
        select @command = 'master..xp_cmdshell ' + "'" + @command1 + "'" 
        print @command
        exec(@command)

=========================================

The output from this script is:

=========================================

Valid backup output:

master..xp_cmdshell 'osql -Q"BACKUP DATABASE plathk TO plathk_backup
with STATS=10,   INIT " -dplathk'
output
                       

-----------------------------------------
Password: 
10 percent backed up.
20 percent backed up.
30 percent backed up.
40 percent backed up.
51 percent backed up.
61 percent backed up.
72 percent backed up.
82 percent backed up.
90 percent backed up.
Processed 4640 pages for database 'plathk', file 'plathkdata' on file 1.
100 percent backed up.
Processed 1 pages for database 'plathk', file 'plathklog' on file 1.
Backup or restore operation successfully processed 4641 pages in 7.541 seconds
(5.040 MB/sec).

(15 row(s) affected)

Invalid backup output:

master..xp_cmdshell 'osql -Q"BACKUP DATABASE plathk TO plathk_backup
with STATS=10,   INIT, DESCRIPTION = 'Full backup of plathk', NAME =
'plathkFull'" -dplathk'
Server: Msg 156, Level 15, State 1, Line 1
Incorrect syntax near the keyword 'Full'.

Please advise how to insert quotes within quotes.




Kevin Spencer responds:

A: It took me a while to figure out exactly what the question was due to the profusion of code, but the answer, it turned out, was simple. The confusion, I think, arose from a misunderstanding of what constitutes "quotes" in Oracle SQL. In order to clear up the issue let me first begin by defining some terms. There are two kinds of "quotes" in a SQL statement. One kind is a "delimiter," which in the case of Oracle SQL (and every other version that I know of), is the single quote ('). In order to keep things as clear as possible, I will refer to these characters by their "long name" ("single quote" and "double quote"). Every string literal in SQL is delimited by a single quote at the beginning and a single quote at the end. Example:

SELECT @variablename = 'some string'

The double quote is a character literal in SQL. It represents itself, NOT any kind of quotation mark. It is best to think of the double quote as a "letter" or "character." Now, here's an example of a double quote inside a string:

SELECT @variablename = 'some string containing a "quoted phrase" inside it'

In the example, the DBMS makes no distinction between the double quotes and the letters in the string.

So the real issue is, how do you insert a string containing single quotes into a string literal? The answer is, you double the single quote, which is also known as "escaping" the single quote. Example:

SELECT @variablename = 'John O''Hara'

While this may LOOK like a double quote, depending on the character set used to print this, it is actually TWO single quotes between the "O" and the "H."

This is a hard and fast rule. Perhaps your confusion arose when you decided to concatenate a bunch of string literals and variable strings together to form a string. But to represent a single quote by itself, you use the same rule. Example:

SELECT @variablename = ''''

Again, this may be confusing to look at so I will describe what you're seeing in case the character set confuses the issue. The string above is FOUR single quotes. The first and last are the string delimiters. The middle two are an escaped single quote character, represented by two single quotes. While it may look confusing to you, it is not at all confusing to Oracle. Oracle knows that the first and last characters in the string are delimiters, so it automatically eliminates them, leaving the doubled single quotes as the string. Therefore, it knows that it is to be translated as a literal single quote.



Question 7:

Q: How do I delete duplicate rows from a table? (It is just a question on SQL for beginners.)


Kevin Spencer responds:

A: This isn't all that simple of a question. I'm sure that many people have struggled with this issue. At first glance it seems simple, but in fact, it requires more than one operation to complete. Here's why:

The SQL DELETE statement deletes records from a table using a WHERE clause. All records matching the WHERE clause are deleted. Now, if you want to delete "duplicate rows," I am assuming that you only want to delete the duplicates, and leave one behind. But, they are all identical, so the WHERE clause will select ALL of them, deleting all of them and leaving NONE behind. Let's take a look at a simple example:


mytable 
----------- 

row1    row2 
------------------- 
This    That    (duplicate of row 2) 
This    That    (duplicate of row 1) 
This    This    (duplicate of row 4) 
This    This    (duplicate of row 3) 
That    This 
That    That

In the example above, we have six rows in the table, and FOUR of them are duplicates (not two). What we want to do is to eliminate the "duplicates," but what we REALLY want to do is to eliminate ALL BUT ONE of the duplicates, ending up with:


mytable 
----------- 

row1    row2 
------------------- 
This    That 
This    This 
That    This 
That    That

Now, we can get this result set by using a "SELECT DISTINCT" query:

 

SELECT DISTINCT row1, row2 FROM mytable 

But, how do we DELETE the duplicate rows? The answer is, you first create a temporary table, run your SELECT DISTINCT query to copy the correct result set into the new temporary table, delete ALL of the rows from the first table, and then copy the contents of the second back into the first.



Question 8:

Q: I have installed SQL on an NT Server and get the following error message:

"The MSSQLServerADHelper service terminated with service specif error 3221225672"

How do I correct this problem?


Kevin Spencer responds:

A: That's a tough one. The MSSQLServerADHelper service is used by SQL Server 2000 to register SQL Server 2000 with Windows 2000 Active Directory. I've never tried to install SQL Server 2000 on an NT server, so I've never encountered this problem.

I did some searching on the MSDN Library and the MS Knowledge Base, but couldn't turn up any answers concerning this specific problem. One issue I DID encounter was that NT Server requires Service Pack 5 to successfully install SQL Server 2000. If you haven't installed Service Pack 5, I would try that first, then re-install SQL Server 2000. If you have Service Pack 5, you might just ignore the error message. NT 4 doesn't have Active Directory (it is a new component of Windows 2000), and the error may signify nothing, since Active Directory is not a part of NT 4, and the error may simply indicate that the service couldn't find Active Directory.


Question 9:

Q: We have an application running which utilizes SQL (currently 6.5). Is there any maintenance that I should be doing on the database to keep it running smoothly? Should I be checking any logs or reports regularly? Is there anything similar to defrag that should be done periodically to the database?


Brian Matsik responds:

A: First, take a look at the books online since there are several sections on maintenance.

Primarily you need to have a disaster recovery plan available and TESTED. I have seen many disaster plans that looked great on paper but did not work.

Make sure that you keep your indexes healthy. A poor index or series of indexes will ruin performance in a database. Use DBCC SHOWCONTIG and DBCC REINDEX to check the current level of fragmentation (the indicator of indexes that need to be rebuilt) and DBCC REINDEX to drop and recreate the table indexes. This may take some time to occur, so make this part of your weekly (or bi-weekly) maintenance schedule. Do this during off-peak hours or you may have users coming after you with torches and pitchforks :)


Question 10:

Q: Using a select statement, how do I choose to view rows X through Y from the result set? I think of this question whenever I use a search engine. I see that I am viewing items 51 to 60 of 2048. Without prenumbering my rows with sequence numbers, how do you manage to view just a 'portion' of the result set with each query?


Brian Matsik responds:

A: One way to do this (and there are several approaches) is to use a combination of a record creation timestamp or an autonumber (SQL Server identity) with a TOP N query. Let's look at an identity since it is a little easier to see.

To get all of the identity columns from the employee table we could do this:


  SELECT
    EmployeeID,
    EmployeeName
  FROM
    Employees
  WHERE
    EmployeeID > 0
    
(assumption: we are sorting by EmployeeID)

Now, let's say that we wanted the first three employees:

  

  SELECT TOP 3
    EmployeeID,
    EmployeeName
  FROM
    Employees
  WHERE
    EmployeeID > 0
    
Now, we get the first three employees. The great thing about this approach is that we do not have to worry about gaps in the autonumber field. If the first three employee IDs were 1, 2, and 7 then that is what we would see.

To make the second request, we just need to know where we left off.

  

  SELECT TOP 3
    EmployeeID,
    EmployeeName
  FROM
    Employees
  WHERE
    EmployeeID > 7
    
Remember that the third employee was ID = 7, so we use a great than (not a greater than or equal >= since that would include this employee).

So, you just need to find a unique key to sort on. Once you have that field, just keep track of where you want to pick up in the clause.

If you are trying to implement forward and reverse paging then you need to keep track of the start and ending ID values. For paging back, you would still use a TOP N query, but this time you do a reverse sort:


  SELECT TOP 3
    EmployeeID,
    EmployeeName
  FROM
    Employees
  WHERE
    EmployeeID < 8
  ORDER BY EmployeeID DESC
  
This is not a perfect solution though. As records get inserted and deleted you may not see the same records forward and back. For a better implementation of paging you would probably want to look at the ADO library. There is built-in support for the paging of recordsets.

There are more advanced techniques that you can use to determine the page area, but that is a chapter in itself. I recommend any of Joe Celko's books (SQL for Smarties, SQL Puzzles and answers) for a more detailed explanation of some of these techniques and neat tips for using advanced SQL.





Best of the Web

Data deduplication: Declawing the clones

Data deduplication is emerging as a critically important new arrow in the storage administrator's quiver to answer hard questions about the increasing problem in storage growth costs.

Quick Read

Compression, Encryption, Deduplication, and Replication: Strange Bedfellows

One of the great ironies of storage technology is the inverse relationship between efficiency and security: Adding performance or reducing storage requirements almost always results in reducing the confidentiality, integrity, or availability of a system.

Quick Read

WAN Optimization Whitelists and Blacklists

Optimization is a fantastic way of saving money and creating really happy customers at the same time, but it doesn't work flawlessly for all applications.

Quick Read

WAN Optimization as a Managed Service: It's Not About the Cost

This insight examines how organizations outsourcing their WAN optimization initiatives to a third-party go about achieving their goals for application performance, reducing operational costs, and streamlining enterprise infrastructure.

Quick Read

  Sponsored Links

Premium Content

Data Centers Gone Wild
February 22, 2010

NWC


Salary

Video