SQL Server Expert: March 2009

Friday, March 27, 2009

Hashing out hash tables (temp tables)

Temp tables use (or the lack of use) has a great impact on a lot of different aspects of the database.
Temp tables can cause a very negative impact on performance if used incorrectly and even if you use them in an efficient t manner you will still be taking a performance hit.
By using derived tables (Sub queries) you will hit your data in a more efficient way.
That being said sometimes temp tables can be the best solution for your current situation.
Some benefits of tem tables are
• They are sometimes much easier to create then trying to formulate the same result by using a sub query
• Makes production code cleaner; sub queries can get really messy.
• in data analysis tasks or in situations where the code will never hit a production database they are one of our greatest tools

This being said it is imposable to overlook the performance bottle neck that these little guys can inflict on a database. Let’s take a look at what is involved when SQL encounters a tem temp table
1) Lock tempdb database
2) CREATE the temporary table (write activity)
3) SELECT data & INSERT data (read & write activity)
4) SELECT data from temporary table and permanent table(s) (read activity)
5) DROP TABLE (write activity)
4) Release the locks
This represents a lot of activity under the hood for the database engine and will usually result in poorly performing chunks of code.
This being said I have come across many situations where I have used temp tables in my production code as a last resort. As with everything it depends on the situation.

One example of this is when the situation arises when you need to use a record set generated from one stored procedure in another stored procedure as this is the only way to do this without using a physical table.

A few tips for using temp tables in a efficient manner.
• Only include columns or rows you actually instead on using
• Do not use the select into statement; It places a lock on system objects
• Always delete the temp table yourself... that’s just a waste of resources
• If the temp db is not already on its own disk consider doing this This will help greatly
In closing I would like to say, I personally think that temp tables are a great tool and , if used correctly can be the best situation. As always we must understand the impact of these little guys before we start using them.

Comments, questions
McCannblogs@gmail.com
Or join our LinkedIn group here
http://www.linkedin.com/groups?gid=1856678&trk=hb_side_g

McCann OUT

Wednesday, March 25, 2009

Failing to commit.... transactions that is: another example of poor understanding of the database

I got called in to a client’s office this morning. They were having an emergency. Their online processing system had come to a complete stop. Customers could not complete their orders. The first thing I did was take looks see at activity monitor. What did I see? About 40 – 50 batches being blocked by a single update statement that was wrapped in a transaction. I have seen more and more of this in the last 2 years. I believe the main problem is most people that do not work with databases as their primary job don’t respect them. What I mean by this is they don’t really understand how many of the features work. Indexing is a great example of this. How many .net developers really understand the concept of indexes? Not enough in my view. Yes while it is true that in most places there is trained Database professionals on staff that will handle all the day to day database tasks. However as a IT guy. I make it my business to know as much about the front end application tools as possible so I can better design the database backend. Those that read my arguments on guids as primary keys found here http://sqlserverexpert.blogspot.com/2009/03/guid-as-primary-key-conclusion.html
Know that one of my reasons was the benefits they had for the overall application design process.
I have many programmer buddies that as soon as there is a performance bottleneck right away it’s the databases fault. And this is true of us as well. How many times all of us have went in to a problem thinking “Those dam programmers buggered it up again”. We need to all work together and some of that responsibility belongs at the company level as well. I happen to work for a great company which believes you should know a little bit about everything so that you understand what your piece has to do, what it has to connect to, what kind of data is it going to receive etc. By doing this My Company is able to build extremely flexible systems for our clients; also we tend to bend the conventional rule book a bit. We do this because it is our belief that the system as a whole is being judged not its individual parts. My guid argument has a lot to do with this fact. Yes having a guid as a primary key has some drawbacks and yes it’s not going to be the most optimized database out there, but the system as a whole will be better because of it. Now this is just one example, all of our systems are different. The point I am trying to make is that we as IT people, not as .net coders’, DBAs, BI developers but all of us have a responsibility to our clients to make the best SYSTEM possible

Thoughts, comments, questions
mccannblogs@gmail.com
McCann Out!

Tuesday, March 24, 2009

Guid as Primary Key: The conclusion

Guid as Primary Key: The conclusion
Ok Guys. I have passed around this topic to some groups in LinkedIn, some msdn sites, some other sources and have gotten a lot of feedback. First of all I want to thank everyone that contributed to this topic. Whether you agree with me or not, by throwing out your ideas and experiences, you have helped make our industry better . Now to the results, about 85% of you disagree with me in some way or another. And by doing so you have made me think about my position a lot more and I have made some changes to my thinking as well. I still like using Guids for my primary keys and will continue to do so in most cases. A lot of you pointed out some very interesting flaws and sanarios where guids should not be used. Some of you even went as far as to say that guids have no place in our database. Let us take a deeper look at everything.. First of all if you have not read the first post, do so now because you wil get an understanding where I was coming from then, and where I am going now.
The first post is here http://sqlserverexpert.blogspot.com/2009/03/primary-keys-1-of-2.html
If you read the first post you know about the criteria I use for selecting primary keys. A lot of you disagreed with rule 4: Whatever method we choose it should be implemented though out the whole database; Your right. After talking to many people I know on this one, I agree. Primary keys should be selected on a table by table basis. So why don’t people like guids for primary keys. I got 4 or 5 basic answers
• Storage
• Performance
• Complicated
• Goes against data modeling or normalization rules
• It is just wrong

Let me address the “It’s just wrong” comments first.
If you guys are not going to put the effort in to proving me wrong by giving me reasons why it is not a good idea, then I can’t really defend myself. I know its ugly to look at a guid, and it has no meaning to the actual record. That being said identity columns are ugly to and they have no meaning to the record either, so that is not really a valid argument

Storage:
Ok, yes it is true a guid is a 16 byte (128 bit) integer and yes it will take up more space than a int or big int column. I will admit to this fault. However how cheap is storage these days? One of my clients has a database going on 3 TB now and we use guids for primary keys almost everywhere and storage has never been an issue. Now yes in some cases it is, and you need to scope that out, if storage is a issue then guids might not be the best choice.

Performance:
I have to say, a lot of people don’t give SQL Server enough credit in this department.
Yes if you use a guid you are going to run into some issues and yes you are going to have to take extra care in planning your clustered indexes and your covering indexes. The performance bottle neck is not as great as you would think. Let’s look at what most people think the major performance issues are. There are 2 things people have pointed out

• Page splits
• Index fragmentation
Ok so yes, if you leave the fill factor at the default value, you will definitely encounter more page splits and yes your indexes will have a tendency to become fragmented very quickly.
However if you manage this correctly and you rebuild your indexes often enough then it is really not a big issue. And now that we have the online option when building indexes it makes this more manageable.
• It’s complicated
This is my favourite one. Can someone tell me how a guid is more complicated then managing identity columns? I mean the whole seeding issue and you end up with rows that go (1,2,3,34) I don’t really think I need to go further do I.
• Goes against data modeling and normalization
I have learned a long time ago that you need to be flexible with the rules of any system, or solution. I am not a big fan of people normalizing their database and following every best practice and such to achieve the perfect database. The fact is in most systems, if you played by the rules you would end up with a very low performing database, it would be correct, it would follow the rules but it would not perform at a high level.
I have given my feedback on all the negative comments about guids as primary keys,
Now let me add some more reasons why I use them.

First my situation in case you don’t know or am wondering. I work for a IT consulting firm in Canada. One of our clients has a database of magazine purchasing data from store to store. Basically walmart sold this one issue of people x amount of times. The system includes many servers running many databases all with different purposes but using the same set of data.
This system is where I first saw guids as primary keys. For this system there is really no other choice which brings me to what I think is the biggest strong point about guids. They are unique. The chances of generating the same guid are almost impossible which makes them perfect when you need to work with more than one database in a system.

The other thing I really like is the fact that you can never really realistically run out of guids. I have heard horror stories about int columns about running out of values.

Now let me bring my ace out of my sleeve, let me let you in on a secret why I think guids are the future. Let’s forget for a second that we are just in the database world. Let us think of our whole system. Most databases have application front ends, so let’s talk about the guys responsible for creating these front ends. The big advantage of guids is that you can generate them on the front end and immediately have control over that instance. If we used identity we would have to create the object, send it to the database and get the identity value back from the database. This is one of the hidden gems when we are talking about guids.

Before we end, let us go though some of the interesting alternatives I got from some of you.
The one that I thought about the most was using timestamp as a primary key.
Timestamps do generate in sequential order and would reduce the amount of page splits and index fragmentation however there is one problem

Unfortunately you really can't use TIMESTAMP for any type of comparison. Or really for much of anything. To quote from SQL Server Books Online:

The SQL Server timestamp data type has nothing to do with times or dates. SQL Server timestamps are binary numbers that indicate the relative sequence in which data modifications took place in a database. The timestamp data type was originally implemented to support the SQL Server recovery algorithms.

It further states Never use timestamp columns in keys, especially primary keys, because the timestamp value changes every time the row is modified.

Then we have the good old composite key( more than one column as the primary key)

Talk about performance issues. Try joining 2 tables with a multi column index That is some performance issues.

In closing,
I will in fact change my stance a bit. Before I was saying use guids everywhere. I stand corrected, but use them in most cases, especially if you are planning to scale out the system in the future. The performance cost is not enough to outweigh the obvious advantages. I know some people are still not going to agree with this, and that is fine. Its just one guys opinion

McCann out

Friday, March 20, 2009

BottleNecks

SQL is spiking!!!! What do I do?
In my 5 years as a DBA I often encounter this very standard set of issues
SQL is spiking at 100%, my queries are slow, and their server is not responsive. My queries time out and the list goes on. The truth is there is no easy answer to this problem. There is no magic option to be set or no magic command that will solve this. The best thing to do is run a trace on the database and see what is going on. Most likely you will come across queries that are causing unnecessary disk io operations or page splits, you might find some that are using table scanning all the time (this means you do not have indexes set up correctly).
Here are a few possible reasons for processing spiking
• Poorly designed queries
• Bad table design
• Database files not set up for optimal performance
• No or incorrect indexes on tables
• Indexes are on tables but they are fragmented and need to be rebuilt
• Server does not have enough processing power or ram to handle the tasks required of it

By running a trace you can identify all of these situations and even inexperienced users should be able to determine most issues. Its fixing them that is the hard thing.
So let us go over these issues one by one and give some advice on each one.

• Poorly designed queries
This is the first thing you should check for because it does not require any hardware upgrades and also is the most common issue. Remember that just because queries return the correct result does not make them right. Query tuning is one of the hardest things to master. Thank god we have some great tools to aid us like profile and DTA (database engine tuning advisor).
In my carrier I have seen some really bad queries most of the time due to the developer not understanding how the storage engine works and how to take advantage of the query plan. I would strongly suggest reading “inside sql server 2005 the storage engine” By Microsoft press. This will empower you with the knowledge you need to write high performance queries. In later blogs I will dive in to this topic further and try and explain the best practices of writing queries. I am also always willing to help anyone out (If I can ) on this topic so just drop me a line (see my email at the bottom) .
• Bad table design
This is by far the hardest to fix and often imposable, no matter how good your queries are if the table (s) you are hitting are not set up properly you will have performance issues. You can think of it like this. If SQL has to work harder to get the information it needs then it’s going to take longer for it to return the dataset. Again in later blogs I will dive further in to this topic as it can be placed in a book itself.
• Database files not set up for optimal performance
This is a very common issue and is reacted to the actual storage media used to store the data files for the database. Database files should be stored on a raid configuration at the least. And most enterprise level solutions have multiple database files per database stored on different raid drives. This is the most expensive issue to fix and most of the tomes clients are unwilling to spend the money to resolve this issue.
Also people tend to throw new hardware at problems before determining or fixing other issues related to bottlenecks. In most cases even if you have the greatest hardware, if your tables, indexes, or queries are not doing their job it will not solve the problem.
• No or incorrect indexes on tables
SQL server implements a very powerful indexing system that can be the difference for a query running 2 minutes or 2 hours. Indexing a table gives huge read benefits to queries. They should however be used with caution. The more indexes you place on a table the longer it will take to update and insert records in to the table. I often hear arguments like “why can’t we index the whole table” . The reason is you would have a huge bottleneck on every update and insert statement you though at a table. So you need to think about the impact on every column you index. Things like “how often is this table going to be updated”. Again this is a huge topic that really needs a full understanding of when and what to index. The DTA is a good place to start and will usually pick the right indexes. Again feel free to drop me a line with any specific questions,

• Indexes are on tables but they are fragmented and need to be rebuilt
This is actually the easiest to fix. The problem is once an index is created it can become fragmented over time due to updates and inserts. I won’t go in to much detail about this issue right here. I am planning to do a whole series on indexes in the next couple of weeks and will dive in to the different types of fragmentation and how to rebuild the index (and when to rebuild and when to recreate) again if you need help just drop me a line.

• Server does not have enough processing power or ram to handle the tasks required of it
Sometimes it is simply a matter of asking your server to do too much. SQL Server Needs the hardware to be able to perform its tasks. That’s as simple as I can put it.
You could do everything right but if the server does not have the cycles to execute tasks in a high performing way then there’s really nothing that can be done other then upgrading the box . I have a client who has 2 servers both doing the exact same thing (one is production and one is history) . The production server has 4 CPUS 30 gigs of ram while the other server has one cpu and 8 gigs of ram.
If you ran the same query on both servers the production server would finish in let’s say 5 minutes while the history one would take over an hour.

I hope this at least can get you in the right mind set for tackling your bottlenecks
As always feel free to drop me a line for more advise or questions. I will answer all emails send at mccannblogs@gmail.com
McCann Out

Wednesday, March 18, 2009

Primary keys (1 of 2)

There is a lot of debate going around the industry today regarding Primary keys and how to select the column to use. I have heard three arguments on this subject
• Guid
• Identity
• A column that represents a unique property of the entity in question

In this 2 part blog I will weigh in on my personal preference as well as discuss all three solutions
Today in part one we will discuss the goals we wish to achieve for the primary key and we will discuss the first option. Then tomorrow we will discuss the other 2 options and come to our conclusion

Let me just say now my personal preference is Guids
I have worked and seen the impact on all 3 solutions and feel this is the best one. Is it perfect? No but it comes the closest.
First let’s examine the criteria we would like in a perfect word for our primary key to have.
1. It has to be able to uniquely identify a row
2. Since in most cases the key will be clustered it should be what I call a “light weight” data type
3. It would be great if we could attached some meaning to the column we choose
4. Whatever method we choose it should be implemented though out the whole database

OK I know a lot of people are not going to agree with goal 4.
Let me explain. In my view all tables in a system should follow the same basic design for no other reason than ease of use and understanding by all developers and administrators on staff. What goes for one table should work for all other tables.
Are there exceptions? Yes but they better be good ones I want to know why and have a chance to make our implantation work.

So let’s examine the three choices

Off our three choices the one that might have a chance of obtaining all our primary goals is

Choosing a column represents a unique property of the entity in question

So obviously this method is the only one that will satisfy goal 3. The problem lies with the other 3 goals. And the time costs to everyone evolved by implementing this method

Let’s look at goal 1 again. We have to be able to identify a row with this column.
That means for every table we create we need to be able to identify a row by one column (or a combination of columns) . Has anyone actually implemented this on a whole database??? I think not it is next to imposable to do this for every table in your system
That means it will most likely violate rule 4 and quite possibly rule 2.
I have seen this happen many times people start banging their heads together trying to find that one column that will uniquely identify this entity. All of a sudden columns like email address and some weird long string representing a product number., Or my favourite some combination of 3 characters from one column and 2 characters from this column to make a unique value. ....... do I even have to comment . and yes OldBalls if you are reading this I am talking to you. Then there is the time its going to take to go through every table and design this solution the frustrations that will happen when you realize that you can’t find any column to use for x table and resort to mixing methods and then what do you have? One cluster **** of a database that’s what
In closing this is the worst method to choose and should be avoided like ketchup on a white suit.

comments, questions? drop me a line at
mccannblogs@gmail.com

McCann OUT!!!!!!!!

SQL Server Expert

Friday, March 27, 2009

Hashing out hash tables (temp tables)

Wednesday, March 25, 2009

Failing to commit.... transactions that is: another example of poor understanding of the database

Tuesday, March 24, 2009

Guid as Primary Key: The conclusion

Friday, March 20, 2009

BottleNecks

Wednesday, March 18, 2009

Primary keys (1 of 2)

Favorite Blogs

Guild Blogs

Recommended Links

Total Pageviews

Share On

Recommended Blogs

Followers

Blog Archive

About Me