Database Troubleshooting and Tuning Series

Please give the database the respect it deserves

Making Better CRUD

October 2013. Volume 2 - Issue 2
By Tom Romeo

So what is all this CRUD?

In the ubiquitous world of acronyms, CRUD is just another one that makes me cringe. Growing up, crud was not a word that described anything desirable. Why do we allow a word with such negative connotations anywhere near our database?

From Webster's online dictionary

crud
noun
: a dirty or greasy substance
: something unpleasant or worthless
used in a sentence: Did you see the crud in that guy's teeth?


If we agree that [C]reate, [R]ead, [U]pdate, [D]elete are the standard CRUD operations, then a rose by any other name is still a rose, or is it?

It really depends on how we view our database and how we interact with our database. A database is more than just a storage place for data. It can be an integral part of an application; a finely tuned, highly responsive resource.

Sometimes the CRUD aspect of an application is elegant and efficient, and sometimes it is just plain cruddy.

Is my CRUD cruddy?

Over the years, I have witnessed cruddy CRUD, and would like to take this opportunity to point it out by example. Most of the examples can be categorized as "exception driven logic".

Exceptions are expensive and can degrade an application's performance. Exceptions happen and should be handled properly. To drive an application by exception is counterproductive and should be avoided.

Exception driven logic can be likened to navigating a maze while blindfolded and traveling at full throttle; not terribly efficient and possibly quite painful.

Test for existence by update

Years ago, back in the VB3 days, I witnessed an application that determined a record's existence by attempting to update it. The logic looked something like this:

Given the table tblDepartments with the following definition and DepartmentID as the Primary Key:

CREATE TABLE [dbo].[tblDepartments](
    [DepartmentID] [int] NOT NULL,
    [Description] [nvarchar](64) NOT NULL,
    [Purpose] [nvarchar](32) NULL,
    CONSTRAINT [PK_tblDepartments] PRIMARY KEY CLUSTERED
    (
        [DepartmentID] ASC
    )
)


Update tblDepartments
Set Description = 'New Description'
Where DepartmentID = 2914


If rows updated = 0 then perform insert.

While this particular example did not throw an actual exception, it could still be considered exception driven logic. For SQL Server, this method also depends on the rows-affected count being reported back to the client, an added requirement because that reporting may be suppressed (for example, by SET NOCOUNT).
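
A minimal sketch of the alternative, using the table and values from the example above: ask the database whether the row exists, then issue the appropriate statement, so the flow of the logic does not depend on the rows-affected count.

-- Sketch only: test for existence, then update or insert accordingly.
IF EXISTS (SELECT 1 FROM tblDepartments WHERE DepartmentID = 2914)
    UPDATE tblDepartments
    SET Description = 'New Description'
    WHERE DepartmentID = 2914
ELSE
    INSERT INTO tblDepartments (DepartmentID, Description)
    VALUES (2914, 'New Description')

In a busy system the existence check and the statement that follows it should run in the same transaction, but the decision is still made by asking the database a question rather than by reacting to the outcome of a statement.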

More recently I've witnessed logic that is driven by the number of rows updated rather than the number of rows that meet a condition. You might ask how that is different and how can it break. Glad you asked…..

Imagine that there is a service that communicates with a list of devices on a network and must abide by the following rules:

  1. The service cannot communicate with a device that is already registered to the same service running elsewhere.
  2. The service can only communicate with a maximum number of devices. The maximum is configurable.
  3. When the service starts it must build its list of devices.
    • The Service may receive a request to add more devices if it has not reached its maximum.
  4. When the service stops it must render its list available so that the devices are free to be picked up by the same service running someplace else, or the same service when it re-starts.


If the update statement used is:

Update top (n) tblDevices Set ServerID = @thisServer Where ServerID is null


Then the update statement has registered the devices to this server, and the number of rows updated equals the number of devices registered to that server (true or false?). The answer is: usually, but not always, because many things can happen to cause the row count to differ from the actual number of devices registered.

For example, if the server crashes, it is possible that rule #4 has not occurred and devices from the previous run are still registered to it.

The statement,

select count(*) from tblDevices Where ServerID = @thisServer

will always return the true number of devices registered to the server. If the ServerID field is properly indexed, it will also happen very quickly.

In this case, laziness trumped resiliency. Given that the registration of devices only happens when the service starts up or when it receives a request to add more, issuing two calls to SQL Server is not going to affect the performance of the application, but relying on the row count alone does put resiliency at risk.
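
A minimal sketch of the more resilient pattern, assuming @MaxDevices holds the configurable maximum from rule #2: register the devices, then ask the database how many are actually registered instead of trusting the rows-affected count.

-- Register up to the configured maximum number of unclaimed devices.
UPDATE TOP (@MaxDevices) tblDevices
SET ServerID = @thisServer
WHERE ServerID IS NULL

-- Then ask for the truth: how many devices are registered to this server right now?
SELECT COUNT(*) AS RegisteredDevices
FROM tblDevices
WHERE ServerID = @thisServer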

When diagnosing performance, a SQL statement that requires thirty seconds to run but only runs once per hour may be less offensive than a SQL statement that requires 250 milliseconds to run but runs five times per second. Context is very important.

Test insert ability by foreign key

Given the tables tblMyHeader and tblMyBody where there can be a many to one relationship between the Header and the Body, the following action should never be part of an application's critical path:

Insert into tblMyBody. If the insert fails, insert into tblMyHeader and then retry the insert into tblMyBody.

While this could happen to a user that is poking around the database, it should never be part of an actual application.

If an application is reporting foreign key exceptions, then there must be some flaw in the logic of the application.
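
A minimal sketch, with illustrative column names and parameters, of keeping the foreign key as a safety net rather than a decision point: make sure the header row exists before inserting the body row.

-- Ensure the parent (header) row exists first; the column names here are hypothetical.
IF NOT EXISTS (SELECT 1 FROM tblMyHeader WHERE HeaderID = @HeaderID)
    INSERT INTO tblMyHeader (HeaderID, Description)
    VALUES (@HeaderID, @HeaderDescription)

-- Now the body insert cannot fail for lack of a header row.
INSERT INTO tblMyBody (HeaderID, LineNumber, Detail)
VALUES (@HeaderID, @LineNumber, @Detail)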

Test insert ability by constraint

Tables are composed of data elements. All data elements are important or they would not exist as part of the table. Some data elements are part of the primary key, some are indexed. Certain fields cannot be NULL whereas others can.

Insert statements should be constructed using the list of fields so that no matter what the order of fields in the table, the proper data always lands in the correct field. The order of fields in the same table could differ from database to database over time, due to upgrades for example.
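
A minimal sketch using the tblDepartments table from earlier (the values are illustrative): the column list, not the table's physical column order, decides where each value lands, and the nullable Purpose field can simply be omitted.

INSERT INTO tblDepartments (DepartmentID, Description)
VALUES (2915, 'Shipping')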

Adding new fields to a table that already has data as part of an upgrade path should be considerate of the existing data. If a new field is to be added and is legitimately not NULL-able, then the new field must be added using one of the following paths (sketched below):

  • Add the new field as NOT NULL along with a default constraint
Or
  • Add the new field as NULL
  • Update the rows to the correct default value
  • (Optional) Alter the field, adding the default value as a constraint
  • Alter the field to be NOT NULL
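
Here is a minimal sketch of both paths in T-SQL, using a hypothetical IsActive bit field added to tblDepartments; a real upgrade script would run one path or the other, not both.

-- Path 1: add the field as NOT NULL together with a default constraint,
-- so existing rows are populated with the default value in a single step.
ALTER TABLE tblDepartments
ADD IsActive bit NOT NULL CONSTRAINT DF_tblDepartments_IsActive DEFAULT (1)

-- Path 2: add the field as NULL, backfill it, optionally add the default constraint,
-- and only then tighten the field to NOT NULL.
ALTER TABLE tblDepartments ADD IsActive bit NULL

UPDATE tblDepartments SET IsActive = 1 WHERE IsActive IS NULL

ALTER TABLE tblDepartments
ADD CONSTRAINT DF_tblDepartments_IsActive DEFAULT (1) FOR IsActive

ALTER TABLE tblDepartments ALTER COLUMN IsActive bit NOT NULL
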
Here is the all-important question: if the field is set to NOT NULL, where does the responsibility lie for setting the correct value? It is safe to say that a default value that is not NULL is no more valuable to the application than NULL, because neither is likely to be the truth until the application sets the true value.

One could argue that a bit field in SQL Server used as a Boolean should never be NULL because a Boolean can only have two values, true or false. However, the opposing argument could be that if a Boolean value has not been set, then it is neither true nor false. Further, requiring a Boolean to have a default value of true or false forces the application to set the value to a state that may not reflect the actual "real"¹ state of the value.

¹ I wanted to use the word "true" here, but that would have confused things. The point is that the default value may be a false negative or a false positive until the application can assign the true value. If there are heated debates over this subject, then someone has way too much time on his hands and/or is a power monger. This may be an indicator that it is time to start looking for a new job.

Test insert ability by primary key

If there exists a table such that there are (n) number of fields and every field is a member of the primary key, then the DBA is going to ask a lot of questions, but let's say for the sake of argument that it is a table put in place to bridge some logic from the way an application used to work to the new way.

Indexing properly could help achieve fast lookups, but what purpose could all fields of a table belonging to the primary key serve other than eliminating the possibility for duplicate entries? This could imply that insert statements from the application are not designed to check for a row's existence before attempting to insert.

If an application is throwing Primary Key Violation exceptions, then there is a problem in the application. All too frequently we have seen applications that attempt inserts without regard for the row's existence. Exceptions are more expensive than the extra SQL call and certainly more expensive than wrapping the Insert statement in an "If not exists" statement.

While on this subject, I have seen applications remove entire sets of rows and then reinsert them instead of taking the time to update the rows that have changed, delete the rows that have been removed, and insert the new rows. For example, consider an HTML page that contains two lists and the ability to move items from one list to the other and vice versa: the underlying SQL is coded to delete everything about the items on both lists and then perform inserts to recreate the state of the data as reflected on the HTML page. This is the easy way, but it is also much more costly.
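
As an illustrative alternative (not taken from the application described above), a single MERGE statement, available in SQL Server 2008 and later, can update the changed rows, insert the new ones, and delete the removed ones in one pass. The table and column names here are hypothetical, and in practice the statement would be scoped to just the lists on that page.

-- #PageState holds the items exactly as they appear on the page after the user's edits.
MERGE tblListItems AS target
USING #PageState AS source
    ON target.ItemID = source.ItemID
WHEN MATCHED AND target.ListID <> source.ListID
    THEN UPDATE SET target.ListID = source.ListID                        -- item moved to the other list
WHEN NOT MATCHED BY TARGET
    THEN INSERT (ItemID, ListID) VALUES (source.ItemID, source.ListID)   -- new item
WHEN NOT MATCHED BY SOURCE
    THEN DELETE;                                                         -- item removed from both lists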

Everything described in this heading is all about laziness. It is quite offensive.

Test delete ability by foreign key

Relying on foreign keys to prevent deletions should be a validation, not a matter of practice. That is to say, if a row cannot be deleted because the deletion would break referential integrity, it is likely that the application already knows this, and the foreign key is the fail-safe guaranteeing referential integrity.

If there is a button on the screen labeled "Delete" then it is likely that there is enough information about the item to be deleted such that the application knows the item cannot be deleted. Relying on a SQL exception is the lazy way out.

Using cascading deletes on foreign keys

Cascading deletes are a great invention when used wisely; they were a cool new feature added in SQL Server 2000. However, given that the default timeout for a SQL statement is 30 seconds, a cascading delete could throw a SQL timeout exception if millions of rows are affected by the delete statement. It is important to acknowledge the difference between a trivial delete and a complex, time-consuming delete. Applications must be able to handle both and use appropriate methods to do so.

For example, when faced with a delete operation that will affect millions of rows, the application could knowingly extend the SQL command timeout, or it could delegate the delete job to some other service so as not to hinder the user's ability to use the application. Most users won't tolerate a system that makes them wait; of course, the amount of time a user is willing to wait is relative.
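
Another option, offered here only as a sketch with hypothetical table and parameter names, is to break a huge delete into batches on the child table first, so that no single statement runs long enough to trip the client's command timeout.

-- Delete the child rows in manageable chunks, then remove the parent row.
DECLARE @rows int = 1
WHILE @rows > 0
BEGIN
    DELETE TOP (10000) FROM tblOrderDetails WHERE OrderHeaderID = @OrderHeaderID
    SET @rows = @@ROWCOUNT
END

DELETE FROM tblOrderHeaders WHERE OrderHeaderID = @OrderHeaderID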

If proper care is given to the "User Experience" the system will be more widely accepted.

CRUD and OLTP

Going back to our previous article in our Database Performance Series, we discussed proper construction of SQL transactions and used an example of an actual single user WEB application. The developers of the application did everything right. They separated services to perform given roles, created a data tier, separated the business logic, etc.

Unfortunately they didn't understand how to use the data tier properly and created transactions that required multiple round trips from the WEB Client through the Data Tier, to the Database. This resulted in long exclusive locks on tables that were central to the application.

For example, if the user attempted to add a new complex object from a screen on the WEB Client, upon clicking the save button, the WEB client performed the following:

  • Create a new transaction
    • Make a request of the Data Tier to create a new SQL Transaction in the database
  • Select some data from important table A
    • Make a request of the Data Tier to perform a SQL select statement from very important Table A
  • Select some important data from Table B
    • Make a request of the Data Tier to perform a SQL select statement from very important Table B
  • Update very important Table C
    • Make a request of the Data Tier to perform a SQL Update statement on very important Table C
  • Insert rows into very important tables A, B, and D
    • Make a request of the Data Tier to perform a SQL insert statement in very important tables A, B, and D
  • Commit Transaction
    • Make a request of the Data Tier to commit the SQL transaction
  • On Error Rollback

Even if the CRUD in each step is as efficient as possible, the transaction as a whole is very inefficient, such that two users attempting to perform the same action from two different WEB Clients can cause one to roll back. Since all of the tables are critical to the application, other aspects of the system are affected by the inefficient transaction.

A better approach would have been to gather all the necessary data and execute the entire transaction in a single call to the Data Tier.
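
A minimal sketch of what that single call might look like, with hypothetical procedure, table, and parameter names: the reads, the update, the inserts, and the commit or rollback all happen server-side in one round trip, so locks are held only for the duration of that one call.

-- Hypothetical procedure: the entire unit of work in one round trip.
CREATE PROCEDURE dbo.uspSaveComplexObject
    @CategoryID int,
    @Name nvarchar(64)
AS
BEGIN
    SET NOCOUNT ON
    BEGIN TRY
        BEGIN TRANSACTION

        -- The reads that used to be separate round trips through the Data Tier.
        DECLARE @NextSequence int, @GroupID int
        SELECT @NextSequence = ISNULL(MAX(Sequence), 0) + 1 FROM TableA
        SELECT @GroupID = GroupID FROM TableB WHERE CategoryID = @CategoryID

        -- The update and the inserts, all inside one short-lived transaction.
        UPDATE TableC SET LastObjectName = @Name WHERE CategoryID = @CategoryID

        INSERT INTO TableA (Sequence, Name) VALUES (@NextSequence, @Name)
        INSERT INTO TableB (CategoryID, GroupID, Name) VALUES (@CategoryID, @GroupID, @Name)
        INSERT INTO TableD (CategoryID, Name) VALUES (@CategoryID, @Name)

        COMMIT TRANSACTION
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
        THROW  -- re-raise the error to the single caller (SQL Server 2012 and later)
    END CATCH
END

The WEB Client then makes one request of the Data Tier, and the Data Tier makes one call to the database, instead of a series of round trips with exclusive locks held in between.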

This is not the first or only place I've seen this mistake. It is a very common mistake among developers who view database interaction as a series of isolated CRUD actions. Also, though there are many references to SQL Server in this article, the same approach would adversely affect an Oracle Database.

I recall an application that I encountered many years ago that utilized Wireless RF terminals that were mounted to Fork Trucks. This was before 802.11B and Wi-Fi was publicly available. If a terminal lost its RF connection and had initiated a transaction, tables would remain locked until the connection timed out. I remedied the problem by re-writing every database transaction to be a single discrete transaction. While at the same company, I helped mitigate the same mistake in a JAVA application that ran over Citrix Terminals to an Oracle Database.

Mistakes such as this one can impair scalability even though the efforts that led to it were an attempt to promote scalability. Scaling to accommodate large numbers of concurrent users is, always has been, and perhaps always will be one of the most challenging aspects of rolling out robust applications.

Here is a more detailed discussion on this subject: http://msdn.microsoft.com/en-us/library/ms978509.aspx

How does your application measure up?

Let's make sure that our CRUD is not cruddy. The mantra remains: laziness will always come back to bite.

Data is the most important aspect of any business or social application.

The use of data as it relates to CRUD can make or break the effectiveness of any application.

How can we help you with your applications?

Click here to see other articles in the series.
