Call us Today! 201-728-8809 |
|
Contact
|
Download
|
About
|
Barcode Tracking Products
Barcode File Tracker
Barcode Medical Chart Tracker
Traverse Barcode Asset Tracking Software
Traverse Barcode Inventory Tracking Software
Barcode Point of Care
Barcode Point of Care Hospital / Nursing Homes
Barcode Point of Care Schools
VICS Bill of Lading
At a Glance
Technical Info
Success Stories
SQL Database Health System
SQL Database Health Monitor
SQL Database Data Cleaner
SQL Backup and Copy and FTP
Other Software
Full Disk? Find Largest Directory
Dynamic DNS Utility
Services
Consulting Services
Analysis and Design
Custom Software
Troubleshooting
Expert Witness
Scope Creep Mitigation
Training
Support
GUIDs GUIDs everywhere, but how is my data unique?
know your data my friend....
After having seen many systems that use GUIDs as the primary key for every table, it is time to say enough is enough!
August 2013. Volume 2 - Issue 1
By Tom Romeo
Revised November 2013.
GUIDs GUIDs everywhere, but how is my data unique?
Featured on SQL Server Central
After having seen many systems that use GUIDs as the primary key for every table, it is time to say enough is enough!
First and foremost I must state most emphatically that there is nothing wrong with using GUIDs. Like everything else in life, moderation and discretion are important.
Here are some advantages for using GUIDs as primary keys:
In theory it is unique across every table, every database, every server
Allows easy merging of records from different databases. Of course every database merge is followed by a database scrub as the potential for duplicate data is introduced. (Actually this is not an advantage after all).
Example, a customer or patient record exists in two different databases each having a different GUID as the unique ID, when merged; there will be two rows with the same data.
A scalar function with a where clause on anything but the GUID could fail if it returns more than one row
You can generate GUIDs from any source rather than requiring a round trip to the database, though it is probably best to not generate GUIDs using jQuery at the browser level of your application. The GUID should be generated at the data tier or in the database itself.
Many replication scenarios require GUID columns
So why would one ever not use a GUID as the primary key?
If the data contains a natural unique identifier then it makes no sense to add an artificial unique identifier.
If the primary key is clustered then the GUID will force the internal trees to be out of balance with respect to the natural order of the data.
If the primary key is clustered and there are frequent inserts then frequent page splits could hinder performance.
1
Similarly if the primary key is clustered and the fill factor of the primary key is adjusted to compensate for frequent inserts, then the primary key may require excessive disk space and may become fragmented more frequently.
1
If the unique identifier is to be rendered as a barcode, then a GUID may be inefficient or force the application to use a two dimensional barcode symbology. This also applies when using pre-printed barcode labels or RFID tags such that the barcode or RFID is unique. There is no need to assign an artificial unique identifier for something that is already unique.
The human readable aspect of the unique identifier is significantly more difficult. For example on an asset tag.
If the GUID requires more space than the actual data then there may be a better way to make the data unique. For example as used in a look-up table for departments or status codes.
It is significantly harder to visually see errors in the data when GUIDs are present.
Essentially, GUIDs have their place. Unfortunately they are ubiquitous mostly due to laziness. It's ok to make mistakes out of ignorance as long as you fix them when you learn the error of your ways, but making mistakes due to laziness is inexcusable. Since many programmers and developers maintain the credo of "I'm lazy therefore I am", it is often too late to fix problems on this level when they manifest.
The concept of using record identifiers or ID(s) dates back to the time when storage was very expensive and the id consisted of a one, two, three, or four byte numeric code that may also have been stored as packed decimal. Many old flat file or ISAM file systems used this method. The whole Y2K effort was all about converting systems from a two digit year to a four digit year.
In the database world, be it hierarchical, networked, relational, or other, using an ID to represent record relationships was and still is very efficient. Some systems actually benefited from forcing the ID column to be a binary multiple as two, four, eight, 16, or 32 bytes.
Given a table that is used to store the status codes {'New', 'Used', 'Retired', 'Unavailable', 'Out of Service'}, One could construct a simple table like the following:
CREATE TABLE
[dbo].[tblStatusCodes](
[StatusCodeID] [smallint]
NOT NULL,
[StatusCodeName] [nvarchar](48)
NOT NULL,
CONSTRAINT
[PK_tblStatusCodes]
PRIMARY KEY CLUSTERED
(
[StatusCodeID]
ASC
)
Such that the data would look like this:
StatusCodeID
StatusCodeName
1
New
2
Used
3
Retired
4
Unavailable
5
Out of Service
Advantages:
The status code name can change in one place and be referenced
The status code ID requires less space than the actual status code name
Referential integrity is simple using foreign keys
Using a GUID rather than a small integer in this case seems ridiculous as the GUID requires twice as much space as the longest status code name.
Given the sample table:
CREATE TABLE
[dbo].[tblTraverseServices](
[Server] [nvarchar](256)
NOT NULL,
[Service] [nvarchar](64)
NOT NULL,
[Description] [nvarchar](128)
NOT NULL,
[ConnectionData] [nvarchar](256)
NOT NULL,
[Port] [bigint]
NOT NULL,
[Enabled] [bit]
NOT NULL,
[LastPingTime] [datetime]
NOT NULL,
[Mod_User] [nvarchar](32)
NOT NULL,
[Mod_Time] [datetime]
NOT NULL,
CONSTRAINT
[PK_tblTraverseServices]
PRIMARY KEY CLUSTERED
(
[Server]
ASC,
[Service]
ASC
)
The primary key is the combination of the server name and service name. There is no GUID to uniquely identify the server because server names must be unique. Allowing 256 bytes gives ample room to store the Fully Qualified Domain Name or (FQDN). As of this writing the max length for a FQDN is 255 bytes.
While the Server and ConnectionData fields may seem redundant, the ConnectionData field may contain an IPV4 or IPV6 address whereas the Server field must contain the server name as the IP address could change, the server name is less likely to change. The data might look like this (omitting the Mod_User and Mod_Time):
Server
Service
Description
ConnectionData
Port
Enabled
LastPingTime
PCSVSERVER060
TraverseArchiveService
PCS Traverse Archive Service
PCSVSERVER060
65101
1
2013-08-02 10:09:25.493
PCSVSERVER060
TraverseDBMaintenanceService
PCS Traverse Database Service
192.168.1.35
65102
0
1990-01-01
PCSVSERVER060
TraverseInterfaceService
PCS Traverse Interface Service
PCSVSERVER060
65103
1
2013-08-02 10:10:21.330
PCSVSERVER060
TraverseReportService
PCS Traverse Report Service
PCSVSERVER060
65104
1
2013-08-02 10:07:16.767
Most importantly, by not having a GUID as the primary key, it is not possible to duplicate data. If a GUID were to be used as the primary key, the potential to add multiple rows with Server = "PCSVSERVER060" and Service = "TraverseArchiveService" exits, as an example.
Other places where introducing a GUID might be a bad idea are:
Importing data from other systems like SAP. SAP has likely already uniquely identified whatever is being exported from SAP. There is no need to tag it with a new GUID.
Importing data from scheduling providers like Microsoft Exchange or IBM Domino. If there is a need to store meetings in a database, you can be assured that all such systems uniquely identify their appointments.
If the belief that using a single field to uniquely identify it improves join performance. (we can dedicate an entire article to debunk this myth)
So where would we use a GUID? There are many good ways to use GUIDs such as a transaction id?
In this table the transaction id is a GUID, and while it is not the primary key, it is part of the primary key as a way to group all of the items that were affected within a single transaction
CREATE TABLE
[dbo].[tblAudit](
[TransactionID] [varchar](36)
NOT NULL,
[TransactionDate] [datetime]
NOT NULL,
[AuditType] [nvarchar](16)
NOT NULL,
[AuditSubType] [nvarchar](32)
NULL,
[TableName] [varchar](50)
NOT NULL,
[FieldName] [varchar](50)
NOT NULL,
[OldValue] [nvarchar](250)
NULL,
[NewValue] [nvarchar](250)
NULL,
[TransStatus] [nchar](8)
NULL,
[Mod_User] [nvarchar](32)
NOT NULL,
[Mod_Time] [datetime]
NOT NULL,
CONSTRAINT
[PK_tblAudit]
PRIMARY KEY CLUSTERED
(
[TransactionID]
ASC,
[AuditType]
ASC,
[TableName]
ASC,
[FieldName]
ASC
)
Our Traverse product uses such a table to track all inserts, updates, and deletes so users can easily run reports on all changes. We didn't need to use a GUID for this field, but it was a natural fit because the system could be distributed having multiple points of origin. Using a GUID was an easy guarantee.
Many systems use audit tables in many ways. Many times such tables do not require primary keys at all as the data may never be retrieved by that key. While a GUID may lend itself to that purpose, it may not be necessary. However, to defend the GUID imagine moving several million rows from one database to another when a network outage interrupts the process.
If each audit row is uniquely identified, then one could resume the process. Otherwise one would need to start over. Of course a GUID primary key it would present itself as a way to guarantee that the same row has not been written more than once as long as the GUID is not generated by an insert trigger..
SQL 2012 introduced a sequence object that may offer advantages over GUIDs in some cases.
1
Insert intensive applications can be rendered inoperable when using GUIDs as the clustered primary key or clustered index. Years ago I witnessed such a system, and at the time the only way to sustain the insertion of several million rows per hour was to drop the clustered primary key. It wasn't until later that the nature of what was happening became clear and why GUIDs and clustered indexes or clustered primary keys are diametrically opposed when great volume is concerned. SQL Server's default fill factor of zero was an unrecognized enemy, as the random nature of the GUID caused frequent page splits. Page splits are resource intensive as they move data to make room for new records. Hence the name clustered index as the records are physically stored in order. Lowering the fill factor could possibly decrease the frequency of page splits to a desirable level, but the cost of larger indexes may also cause problems, and, there is a higher risk of index fragmentation which would call for more frequent re-index operations. Even 15,000 RPM disks could not keep up with the activity until we removed the clustered primary key. While storage has become more affordable, high capacity and high performance storage still comes at a premium. Sequence numbers are far more efficient than GUIDS for this purpose, so we welcome the sequence object in SQL 2012. To spell out the irony of the whole exercise, the GUID actually did nothing to guarantee the uniqueness of each row but rather that each row was guaranteed a spot in the table regardless of whether the process was sending duplicate data.
I'll leave with this parting thought, before creating your next database, know your data my friend....
Click here to see other articles in the series.
Products
VICS Bill of Lading
QuikVixBOL ™ Overview
QuikVixBOL ™ Technical Details
QuikVixBOL ™ Success Stories
Barcode tracking solutions
← Traverse → ™ Overview
Bar-Code File Tracking System
Bar-Code Medical Chart Tracking System
Prevent Medication Errors with barcode technology
Barcode Point of Care for Schools
Barcode POC Overview
Miscellaneous Utilities
What Is My IP
DB Health Assistant
Services
IT Consulting New York City Metro Area
Business Analysis
Custom Software Solutions
Scope Creep Rescue
IT Expert Witness Services
News Letter
Pragmatics - PCS eNewsLetter Articles
Database Performance Series - Proper SQL Transactions
Database Performance Series - Collecting Empirical Data
Database Performance Series - SQL Execution Plan Analysis
Database Performance Series - SQL Server Troubleshooting
Using GUIDs in the database
What is all this CRUD?
I thought my database was collation agnostic.
Free Stuff From PCS
Where Did My Disk Go Utility to find lost Disk space
Print a photo ID for your child
Math Hound - Help your children master basic arithmetic
True Cost Mortgage Calculator
True Cost Shows how much things actually cost
SimpleQ a Simple SQL Database Query Tool
About PCS
Our About Page
Our Contact Page
Terms of Use
Our Privacy Policy
Full Site Map
Corporate Headquarters
Practical Compliance Solutions
One Orient Way - Suite F156
Rutherford
,
NJ
07070
Phone:
201-728-8809
40.827731 N
74.101029 W
Connect With Us
Share
PCS Home
›
News and Articles
›
Database Performance Series - Proper SQL Transactions
© 2002 - 2025 - Copyright Practical Compliance Solutions - All rights reserved