How to select the primary key data type?
When a new database is about to be designed, have to think about what data type to use for the primary keys. There are several things that should be taken into consideration: the average database size expected, the possibility of growing in the future, replication or other data synchronization needs, etc. Here I will try to give you a simple way for primary key type selection depending on the database you have to design.
When I design database I'm trying to follow several simple rules:
- Add column named ID in each table and set it as primary key;
- Do not use logical primary keys as physical primary keys: Client table have column ID as PK and Code as logical PK, but ID column will be used in relations;
- Use stored procedure for data fetching and manipulation - makes database changes easier, no recompilation of existing applications needed;
- Avoid auto increment columns - you never know what the value for the PK will be until the row is stored in the database - means no control;
Generally I always have to choose between 16-byte uniqueidentifier and 4-byte int. The INTs are smaller, easy to remember, meaningfully to the user and easy to make them sequent. The uniqueidentifiers are simply unique and have unlimited values.
The primary keys of type int are smaller, which means smaller indexes and better performance. It is also easy to keep the values in a sequence - either by using auto increment (not recommended) or any other way to get the next value. This allows to make clustered index on the column too without performance impact.
The disadvantage of INTs are that you need some logic for obtaining new value. If auto increment is used, then inserting master-detail data might be more complicated, because the primary key value is unknown until the record is stored. This problem can be solved by using stored procedures and logical keys to link master-details.
To add new record when the primary key is unique identifier is easy. But here the size is 4 times bigger! Also you should never make clustered index on it because inserting new records are not sequential and causes the index to be reorganized each time.
The unique identifiers are only useful when you need to synchronize data between databases or you have application that might work off line. If INTs are used, you can have problems with doubled IDs because another user might use the same values for the new records - in this case you need to implement a logic to renumber the PK and all related records and to update the clients with the new values.
What I think is best is to combine the advantages of the INTS with the uniqueness of the guids. Make the primary key of type int, and add additional column of type uniqueidentifier only if you need data synchronization. This is the way the replication works - using rowguid column.