Recently, I had the fortune (or misfortune depending on how you look at it) of rewatching Christopher Nolan’s 2020 sci-fi heist.. thriller? film starring John David Washington and Robert Pattinson1 amongst others.
If you haven’t watched Tenet yet, I’m not sure I’d recommend it. It’s one of those films that seems to get more confusing the more you watch it. Any payoff gained from finally ‘understanding’ something is usually small and most revelatory moments are structural rather than thematic - think “Christ, I guess that makes a bit of sense now (fistful of anger)?” rather than “Oh damn! It was him all along!”.
Anyway, in Tenet a central concept is that of inversion: the ability to ‘invert’ yourself or other objects so that they travel in the opposite direction of time. From the object’s perspective, everything goes backwards and from the perspective of an external subject, the object appears to travel backwards.
Confused yet? Unhelpfully there are (many) diagrams online that try to explain this concept but which also end up making everything 100x more confusing,
Sidenote: I don’t think inversion can be explained well with a static picture like this - you need to animate it to make the concept more understandable.
Anyway, I’m not here to talk about Tenet per se, but what this convoluted, almost paradoxical concept reminds me of is badly written SQL.
SQL itself is a bit weird
To start off with, the basic structure of a SQL query is itself slightly cyclical,
SELECT
x,
y
FROM table t
WHERE t.x < 1
AND t.y > 10
Whilst the query starts with SELECT
this is not the order in which the statements are executed. In fact, we start with FROM
(where are we sourcing our data from?), followed by the WHERE
clause (what filtering conditions do we impose on our source?) and only then do we ‘cycle back up’ to the SELECT
clause.
It’s a weird concept to get used to at first and it often gets more complicated. Julia Evans talks about this in her excellent book on learning SQL. Here is how she conceptualises a SELECT
query (folks, you really should invest in her book by the way),
Death by subqueries
Back to Tenet: every time a character inverts the timeline in Tenet, a new ‘copy’ of the original person is made and now coexists with its past self at the same time.
The more frequently a character inverts, the more confusing it gets because you - the observer - now have to try to juggle more than one timeline (and all the ensuing possibilities) in your head at the same time.
My analogy for this in SQL is nested subqueries because, fundamentally, they encourage the observer to keep everything inside one’s head, concurrently, which is not practical. Here’s a simple example of a subquery that goes one level deep,
SELECT
contract_id,
total_claims
FROM
(SELECT
contract_id,
SUM(claims) AS total_claims
FROM contract_claims
GROUP BY contract_id) claims
WHERE total_claims > 10000
The business problem here is to extract all contracts with claims that breach a threshold of $10,000.
This query isn’t necessarily ideal but it’s a feasible approach. Where things get really narly is when subqueries become nested.
Suppose we didn’t have ready access to the contract_claims
table. Then,
SELECT
contract_id,
total_claim_size
FROM
(SELECT
contract_id,
SUM(claim_size) AS total_claim_size
FROM (SELECT
contract_id,
claim_size
FROM contract
JOIN claims ON contract.contract_id = claims.contract_id) contract_claims
GROUP BY contract_id) claims
WHERE total_claim_size > 10000
I’m already struggling to read this. The simple fact of the matter is that it’s not ‘natural’ to have to read this query from the ‘inside-out’ (which is precisely what must be done in this case).
Possible ‘Solution’ #1
A more natural approach would be to convert this expression into a series of CTEs like so,
WITH contract_claims AS (
SELECT
contract_id,
claim_size
FROM contract
JOIN claims ON contract.contract_id = claims.contract_id
),
claim_sizes AS (
SELECT
contract_id,
SUM(claim_size) AS total_claim_size
FROM contract_claims
GROUP BY contract_id
)
SELECT
contract_id,
total_claim_size
FROM claim_sizes
WHERE total_claim_size > 10000
This approach is easier to read because - as a human being - you can now read this with some semblance of ‘order’!
- First we get the claims for each contract
- Second we aggregate the claims amount for each contract
- Finally we filter out those contracts with total claims of less than $10,000
Possible ‘Solution’ #2
Another alternative solution is to encapsulate each ‘subquery’ into a temporary table. This has an advantage over pure CTEs in efficiency terms because CTEs - if repetitively scattered across a script - will be repeated in most underlying execution plans. Most SQL engines however can optimise temporary tables much easier.
In the above then, something like the following would be better (not quite sure what dialect of SQL I’m trying to write here; that’s not the point though!),
DROP TABLE IF EXISTS #contract_claims
DROP TABLE IF EXISTS #claim_sizes
CREATE TABLE #contract_claims (
contract_id INT,
claim_size FLOAT
)
CREATE TABLE #claim_sizes (
contract_id INT,
total_claim_size FLOAT
)
INSERT INTO #contract_claims (
contract_id,
claim_size
)
SELECT
contract_id,
claim_size
FROM contract
JOIN claims ON contract.contract_id = claims.contract_id
INSERT INTO #claim_sizes (
contract_id,
total_claim_size
)
SELECT
contract_id,
SUM(claim_size) AS total_claim_size
FROM contract_claims
GROUP BY contract_id
SELECT
contract_id,
total_claim_size
FROM #claim_sizes
WHERE total_claim_size > 10000
Personal take
My personal take: avoid ‘nesting’ like the above whenever you can. It creates too heavy a cognitive load for anyone else reading your code.
The same thing applies to nested if
statements in procedural code. As a general rule, if you’re going more than one level deep, you’re probably doing something wrong2.
Remember, you’re trying to write good code; not the next incomprehensible iteration of Tenet!