I’m lucky enough to have worked with some excellent data engineers in my time as an analytics professional and I’ve learnt so much from working with each and every one of them.

It dawned on me the other day though that, despite reading through numerous stored procedures and building complicated SQL scripts, I still take so much of what is written on the page for granted. If you’d have asked me “What does ANSI_NULLS mean?”, for instance, I don’t think I would have had a very good answer to that, short of checking the official T-SQL documentation.

That bugs me, because as a general rule, I do like to critically understand things before I use them!

To this end, here’s a short and sweet collection of T-SQL ‘boilerplate’ statements that you might have come across before (particularly w.r.t stored procedures) but - perhaps like me - were too afraid to ask about.

Managing comparisons

Suppose I have a table, Financials, with columns InterestRateYesterday and InterestRateToday.

I want to find instances where InterestRateYesterday = InterestRateToday - how can I do this?

I could write a query to tell me this information like so,

SELECT 
  *
FROM Financials 
WHERE InterestRateYesterday = InterestRateToday

And that would, on the surface, give me what I want: records in which the interest rate is unchanged on a day-to-day basis.

But imagine that we have a few rows for which InterestRateYesterday is NULL and InterestRateToday is NULL. By default, T-SQL will count these instances as ‘True’ which is.. a bit weird right?! It’s not strictly true that the interest rates were ‘the same’ - we only know that the data is ‘missing’ which means we can’t really say either way and the answer is ‘unknown’. More generally, NULL probably shouldn’t even be treated as a ‘real’ piece of data; it represents the absence of data and, as such, ‘equality’ to NULL isn’t really a sensible question to ask.

This conclusion is also the conclusion of the official ANSI standard which states that comparisons like this should not evaluate to ‘True’.

To resolve this kind of comparison conundrum you can simply write the following statement into your SQL script,

SET ANSI_NULLS ON

With this setting enabled, NULL comparisons will, in general, evaluate to UNKNOWN. This helps to prevent erroneous comparisons and deal with sources of uncertainty in the underlying schema effectively.

You can find the full set of scenarios on the official T-SQL documentation hub.

Managing transactions

Erland Sommarskog summarises the importance of atomicity (the ‘A’ in ‘ACID’) in database management systems perfectly in his excellent series on error handling in SQL,

In a database system, we often want updates to be atomic.

For instance, say that the task is to transfer money from one account to another. To this end, we need to update two rows in the CashHoldings table and add two rows to the Transactions table. It’s absolutely impermissible that an error or an interruption would result in money being deposited into the receiving account without it being withdrawn from the other. For this reason, in a database application, error handling is also about transaction handling.

In this example, we need to wrap the operation in BEGIN TRANSACTION and COMMIT TRANSACTION, but not only that: in case of an error, we must make sure that the transaction is rolled back.

The key component of this statement is the final one: “in case of an error, we must make sure that the transaction is rolled back”.

By default, if you wrap a series of INSERT statements on a given table inside a transaction (say), and one of those inserts fails, then only that specific insert will be rolled back - the other inserts performed will be persisted unless they are explicitly ‘rolled back’ with a ROLLBACK statement. This is not normally desired behaviour: inserts performed on one table are often thought of as one ‘unit’ and ought to operate as an ‘all-or-nothing’ operation.

If you’re a bank, you don’t really want some of your customer transactions to be registered: it’s an ‘all-or-nothing’ procedure and, in the case of ‘nothing’, you really want the error to be raised / appropriately handled.

You can typically solve this problem by wrapping statements inside a TRY-CATCH block with the appropriate transaction handling,

BEGIN TRY

  BEGIN TRANSACTION
  
  -- < Logic >
  
  COMMIT TRANSACTION

END TRY
BEGIN CATCH

  ROLLBACK TRANSACTION
  
  -- < Logic >

END CATCH

And this works in most cases but, annoyingly, if the client calling the database server experiences a ‘timeout’ (e.g. the user ‘cancels’ the remote operation or the program calling the code does not respond), the transaction established is actually left open which can cause a lot of other problems down the line. A very insightful discussion of this point can be found here.

As a real fix to all of this, there is a useful statement that you can incorporate into your scripts / procedures - in most cases alongside TRY-CATCH - to safeguard data integrity (regardless of how the error originated, timeout or not),

SET XACT_ABORT ON

In practice, both SET XACT_ABORT ON and TRY-CATCH are typically used in tandem to cover all bases.

The TRY-CATCH construct provides actual error handling i.e. you can actually write custom logic inside the CATCH block and handle errors how you wish to whereas SET XACT_ABORT ON ensures sufficient atomicity on the database when an explicit transaction is documented.

Rules around both of these constructs are summarised best in this particular answer on the DBA StackExchange,

TL;DR;
You only need to use BEGIN CATCH and conditional ROLLBACK; if handling the error.
SET XACT_ABORT must always be ON if you have an explicit transaction, irrespective of whether there is a CATCH, to ensure that rollback happens correctly.

Managing messages

When you modify rows in SQL Server, you’ll sometimes see messages like (1 row(s) affected) in SQL Server Management Studio console.

If you want to turn this feature off, a useful statement is,

SET NOCOUNT ON

This will suppress messages like the above example. Doesn’t do anything else of note but is a common feature of stored procedures in the wild that is worth being aware of!

Summary

A good example of how you can use all of these constructs in tandem might look like this,

SET ANSI_NULLS ON
GO

CREATE PROCEDURE sp_xyz
(
    @param1 NVARCHAR(50),
    ...
)
AS
BEGIN

    SET XACT_ABORT ON
    SET NOCOUNT ON

    ...
       
END
GO

The specification of ANSI_NULLS at the top level is important because ANSI_NULLS cannot be configured at runtime (only at ‘create / modify’ time) - procedures only remember the setting at creation time. You can read more about this quirk here!