I’m lucky enough to have worked with some excellent data engineers in my time as an analytics professional and I’ve learnt so much from working with each and every one of them.
It dawned on me the other day though that, despite reading through numerous stored procedures and building complicated SQL scripts, I still take so much of what is written on the page for granted. If you’d have asked me “What does ANSI_NULLS
mean?”, for instance, I don’t think I would have had a very good answer to that, short of checking the official T-SQL documentation.
That bugs me, because as a general rule, I do like to critically understand things before I use them!
To this end, here’s a short and sweet collection of T-SQL ‘boilerplate’ statements that you might have come across before (particularly w.r.t stored procedures) but - perhaps like me - were too afraid to ask about.
Managing comparisons
Suppose I have a table, Financials
, with columns InterestRateYesterday
and InterestRateToday
.
I want to find instances where InterestRateYesterday = InterestRateToday
- how can I do this?
I could write a query to tell me this information like so,
SELECT
*
FROM Financials
WHERE InterestRateYesterday = InterestRateToday
And that would, on the surface, give me what I want: records in which the interest rate is unchanged on a day-to-day basis.
But imagine that we have a few rows for which InterestRateYesterday
is NULL
and InterestRateToday
is NULL
. By default, T-SQL will count these instances as ‘True’ which is.. a bit weird right?! It’s not strictly true that the interest rates were ‘the same’ - we only know that the data is ‘missing’ which means we can’t really say either way and the answer is ‘unknown’. More generally, NULL
probably shouldn’t even be treated as a ‘real’ piece of data; it represents the absence of data and, as such, ‘equality’ to NULL
isn’t really a sensible question to ask.
This conclusion is also the conclusion of the official ANSI standard which states that comparisons like this should not evaluate to ‘True’.
To resolve this kind of comparison conundrum you can simply write the following statement into your SQL script,
SET ANSI_NULLS ON
With this setting enabled, NULL
comparisons will, in general, evaluate to UNKNOWN
. This helps to prevent erroneous comparisons and deal with sources of uncertainty in the underlying schema effectively.
You can find the full set of scenarios on the official T-SQL documentation hub.
Managing transactions
Erland Sommarskog summarises the importance of atomicity (the ‘A’ in ‘ACID’) in database management systems perfectly in his excellent series on error handling in SQL,
In a database system, we often want updates to be atomic.
For instance, say that the task is to transfer money from one account to another. To this end, we need to update two rows in the CashHoldings table and add two rows to the Transactions table. It’s absolutely impermissible that an error or an interruption would result in money being deposited into the receiving account without it being withdrawn from the other. For this reason, in a database application, error handling is also about transaction handling.
In this example, we need to wrap the operation in
BEGIN TRANSACTION
andCOMMIT TRANSACTION
, but not only that: in case of an error, we must make sure that the transaction is rolled back.
The key component of this statement is the final one: “in case of an error, we must make sure that the transaction is rolled back”.
By default, if you wrap a series of INSERT
statements on a given table inside a transaction (say), and one of those inserts fails, then only that specific insert will be rolled back - the other inserts performed will be persisted unless they are explicitly ‘rolled back’ with a ROLLBACK
statement. This is not normally desired behaviour: inserts performed on one table are often thought of as one ‘unit’ and ought to operate as an ‘all-or-nothing’ operation.
If you’re a bank, you don’t really want some of your customer transactions to be registered: it’s an ‘all-or-nothing’ procedure and, in the case of ‘nothing’, you really want the error to be raised / appropriately handled.
You can typically solve this problem by wrapping statements inside a TRY-CATCH
block with the appropriate transaction handling,
BEGIN TRY
BEGIN TRANSACTION
-- < Logic >
COMMIT TRANSACTION
END TRY
BEGIN CATCH
ROLLBACK TRANSACTION
-- < Logic >
END CATCH
And this works in most cases but, annoyingly, if the client calling the database server experiences a ‘timeout’ (e.g. the user ‘cancels’ the remote operation or the program calling the code does not respond), the transaction established is actually left open which can cause a lot of other problems down the line. A very insightful discussion of this point can be found here.
As a real fix to all of this, there is a useful statement that you can incorporate into your scripts / procedures - in most cases alongside TRY-CATCH
- to safeguard data integrity (regardless of how the error originated, timeout or not),
SET XACT_ABORT ON
In practice, both SET XACT_ABORT ON
and TRY-CATCH
are typically used in tandem to cover all bases.
The TRY-CATCH
construct provides actual error handling i.e. you can actually write custom logic inside the CATCH
block and handle errors how you wish to whereas SET XACT_ABORT ON
ensures sufficient atomicity on the database when an explicit transaction is documented.
Rules around both of these constructs are summarised best in this particular answer on the DBA StackExchange,
TL;DR;
You only need to useBEGIN CATCH
and conditionalROLLBACK;
if handling the error.
SET XACT_ABORT
must always beON
if you have an explicit transaction, irrespective of whether there is aCATCH
, to ensure that rollback happens correctly.
Managing messages
When you modify rows in SQL Server, you’ll sometimes see messages like (1 row(s) affected) in SQL Server Management Studio console.
If you want to turn this feature off, a useful statement is,
SET NOCOUNT ON
This will suppress messages like the above example. Doesn’t do anything else of note but is a common feature of stored procedures in the wild that is worth being aware of!
Summary
A good example of how you can use all of these constructs in tandem might look like this,
SET ANSI_NULLS ON
GO
CREATE PROCEDURE sp_xyz
(
@param1 NVARCHAR(50),
...
)
AS
BEGIN
SET XACT_ABORT ON
SET NOCOUNT ON
...
END
GO
The specification of ANSI_NULLS
at the top level is important because ANSI_NULLS
cannot be configured at runtime (only at ‘create / modify’ time) - procedures only remember the setting at creation time. You can read more about this quirk here!