Problem using DISTINCT in case insensitive SQL Server databases
May 30th 2008 06:07
Problem using DISTINCT in case insensitive SQL Server databases
Problem
SQL Server gives you the ability to store mixed case data in your databases, but depending on how you create your databases SQL Server will ignore the case when you issue T-SQL commands. One of the problems you may be faced with is that you want to get a distinct list of values from a table to show the differences in your table, but if your database is setup as case insensitive the DISTINCT clause does not show the differences it all gets grouped together. So based on this what options are there?
Solution
To illustrate this behavior we are going to look at a couple ways this works using a case sensitive database and a case insensitive database.
The first set of queries uses the AdventureWorks database which is configured as case sensitive. To determine the collation for your databases you can run this query:
SELECT name, collation_name
FROM master.sys.databases
We are querying the data from Preson.Contact in the AdventureWorks database. All data is setup as mixed case, so we have no duplicates when we run this query.
SELECT DISTINCT TOP 5 FirstName
FROM Person.Contact
WHERE FirstName LIKE 'A%'
ORDER BY 1
Firstname
A
A. Scott
Aaron
Abby
Adam
If we update one of the record and change the FirstName from "Adam" to "ADAM" we should get two different values when we run the query.
UPDATE Person.Contact
SET FirstName = 'ADAM'
WHERE ContactID = 62
GO
SELECT DISTINCT TOP 5 FirstName
FROM Person.Contact
WHERE FirstName LIKE 'A%'
ORDER BY 1
As you can see we now show both "Adam" and "ADAM" as two different values.
Firstname
A
A. Scott
Aaron
Adam
ADAM
The next thing we are going to do is to create a new table in a case insensitive database and then load all of the data from Person.Contact into this new table.
CREATE TABLE Test.dbo.contact (FirstName nvarchar(50))
GO
INSERT INTO Test.dbo.contact
SELECT FirstName FROM Person.Contact
GO
SELECT DISTINCT TOP 5 FirstName
FROM Test.dbo.contact
WHERE FirstName LIKE 'A%'
ORDER BY 1
GO
When we run the SELECT query you can see that the output combines both "Adam" and "ADAM" since case is ingored.
Firstname
A
A. Scott
Aaron
Abby
Adam
To get around this we can change the query as follows to force the collation to case sensitive on the FirstName column.
SELECT DISTINCT TOP 5 FirstName COLLATE sql_latin1_general_cp1_cs_as
FROM Test.dbo.contact
WHERE FirstName LIKE 'A%'
ORDER BY 1
When this is run we now have the values of "Adam" and "ADAM".
Firstname
A
A. Scott
Aaron
Adam
ADAM
So depending on how your database is setup you may or may not see the differences.
________________________________________
To show you another example here is just a quick way of selecting the case sensitive or case insensitive option.
The first query we run is using case sensitive, so all four rows should show up.
select distinct (item) COLLATE sql_latin1_general_cp1_cs_as
FROM (
select 'abcd' item
union all select 'ABCD'
union all select 'defg'
union all select 'deFg') items
(No column name)
ABCD
abcd
deFg
defg
All that is different in the next query is the name of the collation. When this query is run using case insensitive, we only get two rows.
select distinct (item) COLLATE sql_latin1_general_cp1_ci_ai
FROM (
select 'abcd' item
union all select 'ABCD'
union all select 'defg'
union all select 'deFg') items
(No Column name)
ABCD
deFG
Problem
SQL Server gives you the ability to store mixed case data in your databases, but depending on how you create your databases SQL Server will ignore the case when you issue T-SQL commands. One of the problems you may be faced with is that you want to get a distinct list of values from a table to show the differences in your table, but if your database is setup as case insensitive the DISTINCT clause does not show the differences it all gets grouped together. So based on this what options are there?
Solution
To illustrate this behavior we are going to look at a couple ways this works using a case sensitive database and a case insensitive database.
The first set of queries uses the AdventureWorks database which is configured as case sensitive. To determine the collation for your databases you can run this query:
SELECT name, collation_name
FROM master.sys.databases
We are querying the data from Preson.Contact in the AdventureWorks database. All data is setup as mixed case, so we have no duplicates when we run this query.
SELECT DISTINCT TOP 5 FirstName
FROM Person.Contact
WHERE FirstName LIKE 'A%'
ORDER BY 1
Firstname
A
A. Scott
Aaron
Abby
Adam
If we update one of the record and change the FirstName from "Adam" to "ADAM" we should get two different values when we run the query.
UPDATE Person.Contact
SET FirstName = 'ADAM'
WHERE ContactID = 62
GO
SELECT DISTINCT TOP 5 FirstName
FROM Person.Contact
WHERE FirstName LIKE 'A%'
ORDER BY 1
As you can see we now show both "Adam" and "ADAM" as two different values.
Firstname
A. Scott
Aaron
Adam
ADAM
The next thing we are going to do is to create a new table in a case insensitive database and then load all of the data from Person.Contact into this new table.
CREATE TABLE Test.dbo.contact (FirstName nvarchar(50))
GO
INSERT INTO Test.dbo.contact
SELECT FirstName FROM Person.Contact
GO
SELECT DISTINCT TOP 5 FirstName
FROM Test.dbo.contact
WHERE FirstName LIKE 'A%'
ORDER BY 1
GO
When we run the SELECT query you can see that the output combines both "Adam" and "ADAM" since case is ingored.
Firstname
A
A. Scott
Aaron
Abby
Adam
To get around this we can change the query as follows to force the collation to case sensitive on the FirstName column.
SELECT DISTINCT TOP 5 FirstName COLLATE sql_latin1_general_cp1_cs_as
FROM Test.dbo.contact
WHERE FirstName LIKE 'A%'
ORDER BY 1
When this is run we now have the values of "Adam" and "ADAM".
Firstname
A
A. Scott
Aaron
Adam
ADAM
So depending on how your database is setup you may or may not see the differences.
________________________________________
To show you another example here is just a quick way of selecting the case sensitive or case insensitive option.
The first query we run is using case sensitive, so all four rows should show up.
select distinct (item) COLLATE sql_latin1_general_cp1_cs_as
FROM (
select 'abcd' item
union all select 'ABCD'
union all select 'defg'
union all select 'deFg') items
(No column name)
ABCD
abcd
deFg
defg
All that is different in the next query is the name of the collation. When this query is run using case insensitive, we only get two rows.
select distinct (item) COLLATE sql_latin1_general_cp1_ci_ai
FROM (
select 'abcd' item
union all select 'ABCD'
union all select 'defg'
union all select 'deFg') items
(No Column name)
ABCD
deFG
| 28 |
| Vote |

Add Comments




