Tuesday, March 24, 2009

Guid as Primary Key: The conclusion

Guid as Primary Key: The conclusion
Ok Guys. I have passed around this topic to some groups in LinkedIn, some msdn sites, some other sources and have gotten a lot of feedback. First of all I want to thank everyone that contributed to this topic. Whether you agree with me or not, by throwing out your ideas and experiences, you have helped make our industry better . Now to the results, about 85% of you disagree with me in some way or another. And by doing so you have made me think about my position a lot more and I have made some changes to my thinking as well. I still like using Guids for my primary keys and will continue to do so in most cases. A lot of you pointed out some very interesting flaws and sanarios where guids should not be used. Some of you even went as far as to say that guids have no place in our database. Let us take a deeper look at everything.. First of all if you have not read the first post, do so now because you wil get an understanding where I was coming from then, and where I am going now.
The first post is here http://sqlserverexpert.blogspot.com/2009/03/primary-keys-1-of-2.html
If you read the first post you know about the criteria I use for selecting primary keys. A lot of you disagreed with rule 4: Whatever method we choose it should be implemented though out the whole database; Your right. After talking to many people I know on this one, I agree. Primary keys should be selected on a table by table basis. So why don’t people like guids for primary keys. I got 4 or 5 basic answers
• Storage
• Performance
• Complicated
• Goes against data modeling or normalization rules
• It is just wrong

Let me address the “It’s just wrong” comments first.
If you guys are not going to put the effort in to proving me wrong by giving me reasons why it is not a good idea, then I can’t really defend myself. I know its ugly to look at a guid, and it has no meaning to the actual record. That being said identity columns are ugly to and they have no meaning to the record either, so that is not really a valid argument

Ok, yes it is true a guid is a 16 byte (128 bit) integer and yes it will take up more space than a int or big int column. I will admit to this fault. However how cheap is storage these days? One of my clients has a database going on 3 TB now and we use guids for primary keys almost everywhere and storage has never been an issue. Now yes in some cases it is, and you need to scope that out, if storage is a issue then guids might not be the best choice.

I have to say, a lot of people don’t give SQL Server enough credit in this department.
Yes if you use a guid you are going to run into some issues and yes you are going to have to take extra care in planning your clustered indexes and your covering indexes. The performance bottle neck is not as great as you would think. Let’s look at what most people think the major performance issues are. There are 2 things people have pointed out

• Page splits
• Index fragmentation
Ok so yes, if you leave the fill factor at the default value, you will definitely encounter more page splits and yes your indexes will have a tendency to become fragmented very quickly.
However if you manage this correctly and you rebuild your indexes often enough then it is really not a big issue. And now that we have the online option when building indexes it makes this more manageable.
• It’s complicated
This is my favourite one. Can someone tell me how a guid is more complicated then managing identity columns? I mean the whole seeding issue and you end up with rows that go (1,2,3,34) I don’t really think I need to go further do I.
• Goes against data modeling and normalization
I have learned a long time ago that you need to be flexible with the rules of any system, or solution. I am not a big fan of people normalizing their database and following every best practice and such to achieve the perfect database. The fact is in most systems, if you played by the rules you would end up with a very low performing database, it would be correct, it would follow the rules but it would not perform at a high level.
I have given my feedback on all the negative comments about guids as primary keys,
Now let me add some more reasons why I use them.

First my situation in case you don’t know or am wondering. I work for a IT consulting firm in Canada. One of our clients has a database of magazine purchasing data from store to store. Basically walmart sold this one issue of people x amount of times. The system includes many servers running many databases all with different purposes but using the same set of data.
This system is where I first saw guids as primary keys. For this system there is really no other choice which brings me to what I think is the biggest strong point about guids. They are unique. The chances of generating the same guid are almost impossible which makes them perfect when you need to work with more than one database in a system.

The other thing I really like is the fact that you can never really realistically run out of guids. I have heard horror stories about int columns about running out of values.

Now let me bring my ace out of my sleeve, let me let you in on a secret why I think guids are the future. Let’s forget for a second that we are just in the database world. Let us think of our whole system. Most databases have application front ends, so let’s talk about the guys responsible for creating these front ends. The big advantage of guids is that you can generate them on the front end and immediately have control over that instance. If we used identity we would have to create the object, send it to the database and get the identity value back from the database. This is one of the hidden gems when we are talking about guids.

Before we end, let us go though some of the interesting alternatives I got from some of you.
The one that I thought about the most was using timestamp as a primary key.
Timestamps do generate in sequential order and would reduce the amount of page splits and index fragmentation however there is one problem

Unfortunately you really can't use TIMESTAMP for any type of comparison. Or really for much of anything. To quote from SQL Server Books Online:

The SQL Server timestamp data type has nothing to do with times or dates. SQL Server timestamps are binary numbers that indicate the relative sequence in which data modifications took place in a database. The timestamp data type was originally implemented to support the SQL Server recovery algorithms.

It further states Never use timestamp columns in keys, especially primary keys, because the timestamp value changes every time the row is modified.

Then we have the good old composite key( more than one column as the primary key)

Talk about performance issues. Try joining 2 tables with a multi column index That is some performance issues.

In closing,
I will in fact change my stance a bit. Before I was saying use guids everywhere. I stand corrected, but use them in most cases, especially if you are planning to scale out the system in the future. The performance cost is not enough to outweigh the obvious advantages. I know some people are still not going to agree with this, and that is fine. Its just one guys opinion

McCann out

1 comment:

  1. GUIDS do have thier place. In my opinion they work well for system integration. I prefer using int or bigint. It's easier to remember recordid of 1234 vs 2BDF30AB-63FD-45FD-A8F0-76EDCDFAEADE. As an alternative, you could I suppose have a human readable alternative key (username for example) but have the primary key a guid.