|
|
|
|
Beginning SQL Programming: Q and A
June 25, 2001
Got Questions? We've Got Answers!
Thank you for submitting your questions to Brian Matsik and Kevin Spencer, co-authors of "Beginning SQL Programming."
Brought to you by:

Check It Out!
|
Questions:
-
Writing queries
-
Working in MS SQL Server 2000
-
Problem with column headings
- Question on auto mail feature in SQL 7.0 Server
- Records, unique values and derived tables
- Problem concerning quotes within quotes
- Deleting duplicate rows from a table
- Error with SQL on an NT server
- Maintaining a database periodically
- Viewing portions of a result set
Question 1:
Q: How can I write a query to select, say, the top n companies sorted according to their revenues from, say, a total of m companies?
Kevin Spencer responds:
A: Well, it depends on the database you're working with, of course. I will be using SQL Server for this example. ISO (ANSI) SQL has an
operator called "TOP" which is used to select a given number of
records. If an ORDER BY clause is used, it selects the "Top X" records
in the order of the ORDER BY clause.
Let's say that your table is called "companies" and each table has a
foreign key to another table in which the revenues from any number of
other companies are stored. We'll call that other table "revenues." The
foreign key to the other table would be a unique identifier for each
company (in the "companies" table) which I'll call "companyid." In
the "revenues" table, there are many company listings, with a
"revenue" column, a "companyid" column (foreign key), and a
"from_company" column (another foreign key to the first table), which
contains the "companyid" of the company from which the first company
("companyid") has collected revenue.
|
Companies
|
Revenues
|
| companyid |
companyid |
| companyname |
revenue |
| etc. |
from_company |
Now, we will need a JOIN query to get the total revenues from the
second table, and create an aggregate column in our result set. But,
since the aggregate operator is SUM for the second table, which returns
a single value from a number of rows, we need to aggregate the value in
the first table to make it work. By using a GROUP BY clause, we
separate the single SUMmed value in the revenues table by company:
SELECT TOP 10 SUM(revenues.revenue) AS revenue,
companies.companyname
FROM revenues INNER JOIN
companies ON
revenues.companyid = companies.companyid
GROUP BY companies.companyname
ORDER BY SUM(revenues.revenue) DESC
Now, that's not exactly what you asked for, since you want only the
"TOP m" records from the second table, but that's about all you're
going to get in a single query. There is a way to do this, however, and
that's by creating a view to use for the second table. The view is created
using the following query:
SELECT TOP 10 SUM(revenue) AS revenue,
Max(companyid) AS companyid
FROM revenues
GROUP BY companyid
ORDER BY SUM(revenue) DESC
Then we simply replace the table name "revenues" in the first query,
and substitute the view name (which I will call "TopTenRevenues" in
the query):
SELECT TOP 10 SUM(TopTenRevenues.revenue) AS revenue,
companies.companyname
FROM TopTenRevenues INNER JOIN
companies ON
TopTenRevenues.companyid = companies.companyid
GROUP BY companies.companyname
ORDER BY SUM(TopTenRevenues.revenue) DESC
Now, this could also be done in a single Stored Procedure, by creating
a cursor instead of a View, and selecting from the companies table
JOINed with the cursor.
Very tricky question!
Question 2:
Q: I have a table that looks like the following. All three cols are ints.
| Cust ID |
QuestionNo |
ResponseId
|
| 1225 |
1 |
1 |
| 1225 |
2 |
4 |
| 1225 |
3 |
2 |
| 1225 |
4 |
5 |
What would be the most efficient way in MS SQL Server 2000 to create the
following output?
| Cust ID |
Answer to Q1 |
Answer to Q2
|
| 1225 |
1 |
5 |
Kevin Spencer responds:
A: This is another situation where you can't accomplish the objective with a single query. You have a table with three columns, custId, QuestionNo and ResponseId, and you want to return a table with as many as (in your example) four or more columns, depending on the number of questions. In other words, the only thing the "result set" you've given has in common with the original table (which, for the purpose of this discussion, I will refer to
as "responses"), is the "CustId" field. At first glance it looks like it has more in common, but each of the columns you've given is actually a result set. Lets add some more data to the second "table" to illustrate this better:
| Cust ID |
Answer to Q1 |
Answer to Q2
|
Answer to Q3 |
Answer to Q4 |
| 1225 |
1 |
4 |
2 |
5 |
| 1226 |
3 |
4 |
2 |
2 |
| 1227 |
3 |
2 |
2 |
5 |
As you can see, each "column" in this "table" is a result set, consisting of the result set of the query.
SELECT ResponseId FROM responses WHERE QuestionNo = y
So, about the only way I can think of to tackle this effectively is to create a view on each QuestionNo, using a variant of the following:
SELECT CustId, ResponseId FROM responses WHERE QuestionNo = 1 (view1)
SELECT CustId, ResponseId FROM responses WHERE QuestionNo = 2 (view2)
SELECT CustId, ResponseId FROM responses WHERE QuestionNo = 3 (view3)
SELECT CustId, ResponseId FROM responses WHERE QuestionNo = 4 (view4)
I will assume for the purpose of this discussion that each customer has only one response to each question. Now you construct a query to join all of these result sets into a single result set. To keep duplicates from occurring, we use aggregate functions to group everything together:
SELECT responses.CustId,
Max(view1.ResponseId) AS Q1Answer,
Max(view2.ResponseId) AS Q2Answer,
Max(view3.ResponseId) AS Q3Answer,
Max(view4.ResponseId) AS Q4Answer
FROM responses INNER JOIN
view1 ON
responses.CustId = view1.CustId
INNER JOIN
view2 ON
responses.CustId = view2.CustId
INNER JOIN
view3 ON
responses.CustId = view3.CustId
INNER JOIN
view4 ON
responses.CustId = view4.CustId
GROUP BY responses.custid
Now, I have one last comment. It seems that, unless your original table has to have the structure you described, it would be more efficient to restructure that first table like the following:
| Cust ID |
Q1 Response |
Q2 Response |
Q3 Response |
Q4 Response |
| 1225 |
1 |
4 |
2 |
5 |
| 1226 |
2 |
2 |
3 |
1 |
In your version of the table, the QuestionId is repeated per CustId, and the CustId is repeated per Question. By using this format, the Questions are implied by the column name, resulting in less use of space (only one record per customer).
Question 3:
Q:
What causes column headings to be excluded from the result
set? This appears to happen when a combination of events occur:
A select statement containing only literals followed by a select
statement against a database table that includes compute sum totaling.
If the first select (literals only) is not made, the column headings
from the 2nd select appear. But when both are run (within either SQL
Query Analyzer or a stored procedure), the column headings are excluded.
Brian Matsik responds:
A: If you write a query where you a computing a value or using an aggregate function then there is no field name. For instance, MAX(OrderID) does not have a field name since this is technically a dreived column. If you are using literals such as SELECT 'First Name' then this is also a derived column.
So, in order to avoid this problem, always name a column that is derived.
Use the AS operator to do a proper rename of the item. Some databases such
as Access will create a derived column name (MaxofOrderID) for you. SQL
Server on the other hand will return an empty column name.
As for the actual content of the message, I would need to see an example of
the SQL to be sure that I am answering the correct question.
Question 4:
Q:
I am running a 7.0 SQL server. I have been trying for two
months to get the auto mail feature to work. I have it working perfectly
on my 6.5 server. Does anyone have a suggestion or heard of this issue
before?
In Enterprise Manager / Support Services / SQL Mail, click on start SQL mail and this is the first error:
Microsoft SQL-DMO (ODBC SQLState: 42000)
Error 17952: Failed to start SQL Mail session.
Check the error log file in the SQL Server directory for details.
Click OK. I look at the error log and I see:
| ods |
Starting SQL
Mail session |
| ods |
Failed to start
SQL Mail session |
| ods |
Error: 17951,
Severity: 18, State: 1 |
| ods |
General MAPI
failure |
| ods |
Error: 17902,
Severity: 18, State: 1 |
I created an NT account and SQL account called SQLMail. It has administrator/SA privileges in both NT and SQL. I have also tried it with my account (network admin) and I get the same errors. Outlook/Exchange is open on the SQL server box.
The only difference between the two SQL boxes that I am running are the SQL versions. The mail notifications work great in 6.5.
When I go to SQL Server Properties and look at the SQL mail configuration settings, I click on change and get this error:
This MAPI profile was not found on the server. Are you sure this is what you want? Click on yes or no (it does not matter either way), and then I click on test and get this error:
Error 22030: A MAPI error (error number: 87) occurred: MapiLogonEx failed due to MAPI error 87: Invalid parameter.
I get these errors no matter what I use as the login.
There is a SQLmail and administrator mail account on the SQL server using
OUTLOOK 2000. (My SQL 6.5 also is running OUTLOOK 2000).
Any info will be greatly appreciated.
Brian Matsik responds:
A: Some things to try:
- Make sure that MSSQLServer service and SQLServerAgent are using the same
account (a domain account).
- Check the following KB article: Q263556
- Reboot - really. Sometimes you need to reboot after the configuration.
- Log in as user of the domain account when you configure the SQL Mail.
The Outlook profile may have been set up as a different user.
One of the biggest problems with SQL Mail is that the user sets a domain
account for the mail, but did not log in with that domain account. You need
to be sure that there is an Outlook profile for that domain user on the
local machine. In other words, setting a domain user account while logged
in under a local user profile or another domain account will fail all of the
time.
Hope this helps.
Question 5:
Q: Say I have a table with the following fields:
| F1 |
F2 |
F3 |
| A |
10 |
a |
| A |
12 |
b |
| A |
16 |
c |
| B |
3 |
d |
| B |
4 |
1 |
| B |
5 |
2 |
| C |
1 |
3 |
| C |
7 |
4 |
| E |
8 |
h |
| H |
2 |
i |
| H |
7 |
j |
and I want one record returned for each unique value in F1, where the F2
value is largest. Thus the result set would be:
| F1 |
F2 |
F3 |
| A |
16 |
c |
| B |
5 |
2 |
| C |
7 |
4 |
| E |
8 |
h |
| H |
7 |
j |
Is there an easy way to do this?
Brian Matsik responds:
A:
SELECT
B.F1,
B.F2,
A.F3
FROM
rmrtech AS A,
(
SELECT
F1,
MAX(F2) as F2
FROM
rmrtech
GROUP BY F1
) AS B
WHERE
A.F1 = B.F1
AND A.F2 = B.F2
ORDER BY
B.F1
The inner query (a derived table) finds the maximum value for F2 for each
value of F1. Once you find this information it is a simple join to find the
record where F1 and F2 match the record in the derived table.
Question 6:
Q:
I am having a problem concerning quotes within quotes when
running OSQL from a stored procedure via the xp_cmdshell SP.
My script I am using in the stored procedure is:
=========================================
.
.
.
Declare @command1 varchar(255),@command varchar(255),@database varchar(60)
select @database="plathk"
.
.
.
-- Valid code which works
select @command1 = 'osql -Q"BACKUP DATABASE ' + @database + ' TO
' +
@database + '_backup with STATS=10, INIT ' + '" -d' +
@database
select @command = 'master..xp_cmdshell ' + "'" + @command1 + "'"
print @command
exec(@command)
-- Invalid code which fails
-- How can I specify a description in quotes in the statements below ?????
select @command1 = 'osql -Q"BACKUP DATABASE ' + @database + ' TO
' +
@database + '_backup with STATS=10, INIT, DESCRIPTION
= ''Full backup of ' +
@database + "'" + ', NAME = ''' + @database + 'Full''' +
'" -d' + @database
select @command = 'master..xp_cmdshell ' + "'" + @command1 + "'"
print @command
exec(@command)
=========================================
The output from this script is:
=========================================
Valid backup output:
master..xp_cmdshell 'osql -Q"BACKUP DATABASE plathk TO plathk_backup
with STATS=10, INIT " -dplathk'
output
-----------------------------------------
Password:
10 percent backed up.
20 percent backed up.
30 percent backed up.
40 percent backed up.
51 percent backed up.
61 percent backed up.
72 percent backed up.
82 percent backed up.
90 percent backed up.
Processed 4640 pages for database 'plathk', file 'plathkdata' on file 1.
100 percent backed up.
Processed 1 pages for database 'plathk', file 'plathklog' on file 1.
Backup or restore operation successfully processed 4641 pages in 7.541 seconds
(5.040 MB/sec).
(15 row(s) affected)
Invalid backup output:
master..xp_cmdshell 'osql -Q"BACKUP DATABASE plathk TO plathk_backup
with STATS=10, INIT, DESCRIPTION = 'Full backup of plathk', NAME =
'plathkFull'" -dplathk'
Server: Msg 156, Level 15, State 1, Line 1
Incorrect syntax near the keyword 'Full'.
Please advise how to insert quotes within quotes.
Kevin Spencer responds:
A: It took me a while to figure out exactly what the question was due to the profusion of code, but the answer, it turned out, was simple. The confusion, I think, arose from a misunderstanding of what constitutes "quotes" in Oracle SQL. In order to clear up the issue let me first begin by defining some terms. There are two kinds of "quotes" in a SQL
statement. One kind is a "delimiter," which in the case of Oracle SQL
(and every other version that I know of), is the single quote ('). In order
to keep things as clear as possible, I will refer to these characters by their
"long name" ("single quote" and "double quote"). Every string literal
in SQL is delimited by a single quote at the beginning and a single quote
at the end. Example:
SELECT @variablename = 'some string'
The double quote is a character literal in SQL. It represents itself, NOT
any kind of quotation mark. It is best to think of the double quote as a
"letter" or "character." Now, here's an example of a double quote inside
a string:
SELECT @variablename = 'some string containing a "quoted phrase"
inside it'
In the example, the DBMS makes no distinction between the double
quotes and the letters in the string.
So the real issue is, how do you insert a string containing single quotes
into a string literal? The answer is, you double the single quote, which is
also known as "escaping" the single quote. Example:
SELECT @variablename = 'John O''Hara'
While this may LOOK like a double quote, depending on the
character set used to print this, it is actually TWO single quotes between
the "O" and the "H."
This is a hard and fast rule. Perhaps your confusion arose when you
decided to concatenate a bunch of string literals and variable strings
together to form a string. But to represent a single quote by itself, you use
the same rule. Example:
SELECT @variablename = ''''
Again, this may be confusing to look at so I will describe what you're
seeing in case the character set confuses the issue. The string above is
FOUR single quotes. The first and last are the string delimiters. The
middle two are an escaped single quote character, represented by two single
quotes. While it may look confusing to you, it is not at all confusing to
Oracle. Oracle knows that the first and last characters in the string are
delimiters, so it automatically eliminates them, leaving the doubled
single quotes as the string. Therefore, it knows that it is to be translated as
a literal single quote.
Question 7:
Q:
How do I delete duplicate rows from a table? (It is just a question on SQL for beginners.)
Kevin Spencer responds:
A: This isn't all that simple of a question. I'm sure that many people have struggled with this issue. At first glance it seems simple, but in fact, it requires more than one operation to complete. Here's why:
The SQL DELETE statement deletes records from a table using a
WHERE clause. All records matching the WHERE clause are deleted.
Now, if you want to delete "duplicate rows," I am assuming that you only
want to delete the duplicates, and leave one behind. But, they are all
identical, so the WHERE clause will select ALL of them, deleting all of
them and leaving NONE behind. Let's take a look at a simple example:
mytable
-----------
row1 row2
-------------------
This That (duplicate of row 2)
This That (duplicate of row 1)
This This (duplicate of row 4)
This This (duplicate of row 3)
That This
That That
In the example above, we have six rows in the table, and FOUR of them
are duplicates (not two). What we want to do is to eliminate the
"duplicates," but what we REALLY want to do is to eliminate ALL BUT
ONE of the duplicates, ending up with:
mytable
-----------
row1 row2
-------------------
This That
This This
That This
That That
Now, we can get this result set by using a "SELECT DISTINCT" query:
SELECT DISTINCT row1, row2 FROM mytable
But, how do we DELETE the duplicate rows? The answer is, you first
create a temporary table, run your SELECT DISTINCT query to copy
the correct result set into the new temporary table, delete ALL of the rows
from the first table, and then copy the contents of the second back into the
first.
Question 8:
Q:
I have installed SQL on an NT Server and get the following
error message:
"The MSSQLServerADHelper service terminated with service
specif error 3221225672"
How do I correct this problem?
Kevin Spencer responds:
A: That's a tough one. The MSSQLServerADHelper service is used by SQL Server 2000 to register SQL Server 2000 with Windows 2000 Active
Directory. I've never tried to install SQL Server 2000 on an NT server,
so I've never encountered this problem.
I did some searching on the MSDN Library and the MS Knowledge Base, but couldn't turn up any answers concerning this specific problem. One issue I DID encounter was that NT Server requires Service Pack 5 to successfully install SQL Server 2000. If you haven't installed Service Pack 5, I would try that
first, then re-install SQL Server 2000. If you have Service Pack 5, you
might just ignore the error message. NT 4 doesn't have Active Directory
(it is a new component of Windows 2000), and the error may signify
nothing, since Active Directory is not a part of NT 4, and the error may
simply indicate that the service couldn't find Active Directory.
Question 9:
Q:
We have an application running which utilizes SQL (currently 6.5). Is
there any maintenance that I should be doing on the database to keep it
running smoothly? Should I be checking any logs or reports regularly?
Is there anything similar to defrag that should be done periodically to
the database?
Brian Matsik responds:
A: First, take a look at the books online since there are several sections on maintenance.
Primarily you need to have a disaster recovery plan available and TESTED.
I have seen many disaster plans that looked great on paper but did not work.
Make sure that you keep your indexes healthy. A poor index or series of
indexes will ruin performance in a database. Use DBCC SHOWCONTIG and DBCC
REINDEX to check the current level of fragmentation (the indicator of
indexes that need to be rebuilt) and DBCC REINDEX to drop and recreate the
table indexes. This may take some time to occur, so make this part of your
weekly (or bi-weekly) maintenance schedule. Do this during off-peak hours
or you may have users coming after you with torches and pitchforks :)
Question 10:
Q:
Using a select statement, how do I choose to view rows X through Y from
the result set? I think of this question whenever I use a search engine.
I see that I am viewing items 51 to 60 of 2048. Without prenumbering my
rows with sequence numbers, how do you manage to view just a 'portion'
of the result set with each query?
Brian Matsik responds:
A: One way to do this (and there are several approaches) is to use a combination of a record creation timestamp or an autonumber (SQL Server identity) with a TOP N query. Let's look at an identity since it is a
little easier to see.
To get all of the identity columns from the employee table we could do
this:
SELECT
EmployeeID,
EmployeeName
FROM
Employees
WHERE
EmployeeID > 0
(assumption: we are sorting by EmployeeID)
Now, let's say that we wanted the first three employees:
SELECT TOP 3
EmployeeID,
EmployeeName
FROM
Employees
WHERE
EmployeeID > 0
Now, we get the first three employees. The great thing about this
approach is that we do not have to worry about gaps in the autonumber field.
If the first three employee IDs were 1, 2, and 7 then that is what we would
see.
To make the second request, we just need to know where we left off.
SELECT TOP 3
EmployeeID,
EmployeeName
FROM
Employees
WHERE
EmployeeID > 7
Remember that the third employee was ID = 7, so we use a great than (not a
greater than or equal >= since that would include this employee).
So, you just need to find a unique key to sort on. Once you have that
field, just keep track of where you want to pick up in the clause.
If you are trying to implement forward and reverse paging then you need to
keep track of the start and ending ID values. For paging back, you would
still use a TOP N query, but this time you do a reverse sort:
SELECT TOP 3
EmployeeID,
EmployeeName
FROM
Employees
WHERE
EmployeeID < 8
ORDER BY EmployeeID DESC
This is not a perfect solution though. As records get inserted and
deleted you may not see the same records forward and back. For a better
implementation of paging you would probably want to look at the ADO library.
There is built-in support for the paging of recordsets.
There are more advanced techniques that you can use to determine the page
area, but that is a chapter in itself. I recommend any of Joe Celko's books
(SQL for Smarties, SQL Puzzles and answers) for a more detailed explanation
of some of these techniques and neat tips for using advanced SQL.
|
 |
 |
|
|
|
 |
|