Mastering SQLAlchemy: Part I - The Core
Mastering the Pythonic ORM
Author: Johnny Breen
Published: July 8, 2023
Interacting with SQL databases from Python is tough when using a bare-bones DBAPI driver such as pyodbc or sqlite3.
Wouldn’t it be great if there were a system that could abstract away all of the nitty gritty (e.g. connection ports and all that) and provide you with a standardised way of interacting with a database?
Well, the answer to that question is SQLAlchemy!
Preliminaries
Before we start, I really should mention that all I am essentially doing in this blog post is condensing what is already documented - to a very high standard - on the official SQLAlchemy website.
The documentation will provide more details than I will, but I’m hoping that this blog post can perhaps abstract away some of the more ‘esoteric’ points raised in the official documentation and provide you with a more holistic view of how SQLAlchemy works conceptually and how you can deploy this package in your own projects.
What’s wrong with the classic approach?
The point of SQLAlchemy is to make it easier for you, as a user, to interact with databases.
Without SQLAlchemy, you need to connect to databases the old-school way, which typically involves many calls to intermediate objects.
With the sqlite3 API, you have to construct calls like this,
import sqlite3

# Open a connection
con = sqlite3.connect(':memory:')

# Create a cursor
cur = con.cursor()

# Execute a SQL query
cur.execute('SELECT 123 AS Testing')

# Fetch results by way of a cursor object
res = cur.fetchone()

# Do something with the result e.g. 'print()' in this silly example
print(res)
(123,)
This is arguably not the right level of abstraction for the end-user, particularly if the end-user is a typical data analyst - for example, why does the user need to interact with an abstraction like a cursor as part of their workflow?
Part of this is a result of the DBAPI specification but, still, there is something slightly off about this setup.
Getting started
Install sqlalchemy if you haven’t already (e.g. pip install sqlalchemy).
To start, we need to actually ‘connect’ to a database instance. For illustrative purposes I’m going to execute everything in this tutorial ‘in memory’ (rather than on the hard disk itself or an external data source).
To facilitate connecting to a database, you need to first create an ‘engine’ object,
An engine is like a database manager or a central ‘hub’ of database connections. You can think of it a bit like a taxi rank: you go there to get a taxi (a connection) to your intended destination (a database).
An important point to note here is that we haven’t yet opened any connections to our database - in our analogy, all of the taxis are currently idle (this design is ‘lazy’). It is only once we demand an interaction that a connection is actually opened.
Under the hood, the engine is maintaining a ‘pool’ of database connections so that if you request an artifact from a certain connection at time X and then you perform a different request to the same connection at time X + 1, the engine can ‘reuse’ the existing database connection for this request. The reason for this design is that you can somewhat mitigate the inefficient process of establishing a database connection if the connection is already cached.
The main parameters you might want to configure manually for the underlying pool are pool_size (i.e. how many connections are kept in the pool), max_overflow, pool_recycle and pool_timeout (i.e. when should connections be disconnected and flushed from the pool?), though I think the defaults are reasonable.
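To make this concrete, here is a sketch of how those parameters are passed to create_engine. The file path and all of the numbers below are placeholders for illustration, not recommendations, and I am requesting a queueing pool class explicitly so the example behaves the same way across SQLAlchemy versions:

```python
from sqlalchemy import create_engine
from sqlalchemy.pool import QueuePool

# All numbers below are illustrative placeholders
engine = create_engine(
    'sqlite:///example.db',  # placeholder file-based database
    poolclass=QueuePool,
    pool_size=5,        # connections kept open in the pool
    max_overflow=10,    # extra connections allowed beyond pool_size
    pool_recycle=3600,  # recycle connections older than an hour
    pool_timeout=30,    # seconds to wait for a free connection
)
```

Note that the engine is lazy, so nothing actually connects to the database here; the settings only take effect once connections start being requested.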
There are three components to setting up an Engine object:
1. The dialect, which describes the particular flavour of SQL that will be used for this engine
2. The DBAPI driver; SQLAlchemy has to use this third-party library in order to make calls to the database (remember, SQLAlchemy is one step removed from database interactions; it is a delegator or an orchestrator, so it needs to delegate the work of interacting with the database elsewhere)
3. The location of the database. In our case, we are just going to build the database in memory
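Putting those three components together, the engine for this tutorial can be created as follows (a minimal sketch: the sqlite dialect, the built-in pysqlite driver, and an in-memory database; echo=True makes the engine emit the INFO log lines you will see throughout the rest of this post):

```python
from sqlalchemy import create_engine

# Format: '<dialect>+<db_api>://<location>'
# Here: sqlite dialect + pysqlite (i.e. sqlite3) driver + in-memory DB;
# echo=True logs every SQL statement the engine emits
engine = create_engine('sqlite+pysqlite:///:memory:', echo=True)
```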
If you are building a more sophisticated database (e.g. Postgres), you might want to consider creating the connection URL with SQLAlchemy’s URL module instead,
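For example, a Postgres-style URL might be assembled like this (every credential below is a placeholder; a nice property of URL.create is that the rendered string hides the password by default):

```python
from sqlalchemy.engine import URL

# All values here are placeholders for illustration
url = URL.create(
    drivername='postgresql+psycopg2',
    username='user',
    password='secret',
    host='localhost',
    port=5432,
    database='mydb',
)

# Renders with the password masked as '***' by default
print(url.render_as_string())
```

You would then pass url straight to create_engine(url), which requires the relevant driver (here, psycopg2) to be installed.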
with engine.connect() as conn:
    conn.execute(text('CREATE TABLE Testing (x INT, y INT)'))
    conn.execute(text('INSERT INTO Testing VALUES (12, 34)'))
    res = conn.execute(text('SELECT * FROM Testing'))
    print(res.all())
2023-07-08 16:21:59,922 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-07-08 16:21:59,923 INFO sqlalchemy.engine.Engine CREATE TABLE Testing (x INT, y INT)
2023-07-08 16:21:59,924 INFO sqlalchemy.engine.Engine [generated in 0.00165s] ()
2023-07-08 16:21:59,925 INFO sqlalchemy.engine.Engine INSERT INTO Testing VALUES (12, 34)
2023-07-08 16:21:59,925 INFO sqlalchemy.engine.Engine [generated in 0.00079s] ()
2023-07-08 16:21:59,926 INFO sqlalchemy.engine.Engine SELECT * FROM Testing
2023-07-08 16:21:59,927 INFO sqlalchemy.engine.Engine [generated in 0.00083s] ()
[(12, 34)]
2023-07-08 16:21:59,928 INFO sqlalchemy.engine.Engine ROLLBACK
It is recommended that you wrap the connection method, .connect(), inside a context manager. This allows you to leverage the connection resource for the duration of the with clause, but no longer (on ‘exit’ of the with clause, the connection is returned to the pool rather than being disconnected).
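Conceptually, the context manager is shorthand for a try/finally pattern along these lines (a sketch; the key point being that .close() returns the connection to the engine’s pool rather than necessarily severing it):

```python
from sqlalchemy import create_engine, text

engine = create_engine('sqlite+pysqlite:///:memory:')

conn = engine.connect()
try:
    res = conn.execute(text('SELECT 123'))
    print(res.all())
finally:
    # Returns the connection to the engine's pool; the underlying
    # database connection is not necessarily torn down here
    conn.close()
```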
When you run this statement, you will notice a ROLLBACK statement in the console log. If you are not familiar with the ways of SQL, this might at first seem a little bemusing.
However, it has a fairly standard meaning: ROLLBACK means that any changes made to the database within the context of the connection resource are not persisted; the state of the database is rolled back once the connection is released. We can see this in action: if we try to run SELECT * FROM Testing outside of the scope of the above statement, we’ll notice that all of the values we previously inserted have vanished,
with engine.connect() as conn:
    res = conn.execute(text('SELECT * FROM Testing'))
    print(res.all())
2023-07-08 16:21:59,936 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-07-08 16:21:59,936 INFO sqlalchemy.engine.Engine SELECT * FROM Testing
2023-07-08 16:21:59,937 INFO sqlalchemy.engine.Engine [cached since 0.01059s ago] ()
[]
2023-07-08 16:21:59,938 INFO sqlalchemy.engine.Engine ROLLBACK
Obviously this behaviour is undesirable when you actually want to make some persistent changes to the database. No problem, we can do that by calling the .commit() method.
Let’s try this again, shall we? First, we will insert some values into the database table Testing, only this time we will call .commit(),
with engine.connect() as conn:
    conn.execute(text('INSERT INTO Testing VALUES (12, 34)'))
    conn.commit()
2023-07-08 16:21:59,946 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-07-08 16:21:59,947 INFO sqlalchemy.engine.Engine INSERT INTO Testing VALUES (12, 34)
2023-07-08 16:21:59,948 INFO sqlalchemy.engine.Engine [cached since 0.02366s ago] ()
2023-07-08 16:21:59,949 INFO sqlalchemy.engine.Engine COMMIT
Now if we select values from the same table within a different connection context, we should observe that the values have been persisted in the database,
with engine.connect() as conn:
    res = conn.execute(text('SELECT * FROM Testing'))
    print(res.all())
2023-07-08 16:21:59,963 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-07-08 16:21:59,964 INFO sqlalchemy.engine.Engine SELECT * FROM Testing
2023-07-08 16:21:59,966 INFO sqlalchemy.engine.Engine [cached since 0.03948s ago] ()
[(12, 34)]
2023-07-08 16:21:59,967 INFO sqlalchemy.engine.Engine ROLLBACK
Note that, alternatively, you can avoid the .commit() method by beginning a transaction explicitly with .begin(),
with engine.begin() as conn:
    conn.execute(text('INSERT INTO Testing VALUES (56, 78)'))
2023-07-08 16:21:59,977 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-07-08 16:21:59,979 INFO sqlalchemy.engine.Engine INSERT INTO Testing VALUES (56, 78)
2023-07-08 16:21:59,979 INFO sqlalchemy.engine.Engine [generated in 0.00074s] ()
2023-07-08 16:21:59,980 INFO sqlalchemy.engine.Engine COMMIT
Reading results
You’ll notice that in order to read results I have been invoking the .all() method on the connection results: this parses the query result as a list of tuples.
However, there are other ways to access results. The result object is iterable so you can run a for loop on it if you want to,
with engine.connect() as conn:
    res = conn.execute(text('SELECT * FROM Testing'))
    for row in res:
        print(f'(x: {row.x}, y: {row.y})')
2023-07-08 16:21:59,990 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-07-08 16:21:59,991 INFO sqlalchemy.engine.Engine SELECT * FROM Testing
2023-07-08 16:21:59,992 INFO sqlalchemy.engine.Engine [cached since 0.06612s ago] ()
(x: 12, y: 34)
(x: 56, y: 78)
2023-07-08 16:21:59,993 INFO sqlalchemy.engine.Engine ROLLBACK
You can find further information here on the various ways in which you may want to read the results of a query.
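As a taster, here is a sketch of a few common access patterns, run against a fresh copy of the same Testing table: .mappings() gives dict-like rows keyed by column name, .first() returns a single row (or None for an empty result), and rows also unpack like tuples:

```python
from sqlalchemy import create_engine, text

engine = create_engine('sqlite+pysqlite:///:memory:')

with engine.begin() as conn:
    conn.execute(text('CREATE TABLE Testing (x INT, y INT)'))
    conn.execute(text('INSERT INTO Testing VALUES (12, 34)'))

with engine.connect() as conn:
    # Dict-like access, keyed by column name
    rows = conn.execute(text('SELECT * FROM Testing')).mappings().all()
    print(rows[0]['x'])  # 12

    # Just the first row (or None if the result is empty)
    first = conn.execute(text('SELECT * FROM Testing')).first()
    print(first.y)  # 34

    # Rows unpack like plain tuples
    for x, y in conn.execute(text('SELECT * FROM Testing')):
        print(x, y)
```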
Parametrised queries
We can actually insert values based on a Pythonic data structure rather than hard-coding them into the raw SQL string,
with engine.connect() as conn:
    conn.execute(
        text('INSERT INTO Testing (x,y) VALUES (:x, :y)'),
        [{'x': 100, 'y': 200}, {'x': 300, 'y': 400}]
    )
    conn.commit()
2023-07-08 16:22:00,002 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-07-08 16:22:00,003 INFO sqlalchemy.engine.Engine INSERT INTO Testing (x,y) VALUES (?, ?)
2023-07-08 16:22:00,004 INFO sqlalchemy.engine.Engine [generated in 0.00204s] [(100, 200), (300, 400)]
2023-07-08 16:22:00,005 INFO sqlalchemy.engine.Engine COMMIT
Notice that I have issued more than one set of parameters to be inserted into the Testing table.
SQLAlchemy understands implicitly that it needs to execute the INSERT ... statement multiple times in this context (once for each parameter set).
This can come with performance limitations on larger parameter sets so it is not necessarily advisable to follow this approach on larger datasets (though there are ways of dealing with it, which depend on your dialect, see this Stackoverflow thread for a little more coverage on this issue).
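Bound parameters work just as well on the read side. Here is a small self-contained sketch (the :threshold parameter name is arbitrary, and the table setup mirrors the inserts above):

```python
from sqlalchemy import create_engine, text

engine = create_engine('sqlite+pysqlite:///:memory:')

with engine.begin() as conn:
    conn.execute(text('CREATE TABLE Testing (x INT, y INT)'))
    conn.execute(
        text('INSERT INTO Testing (x, y) VALUES (:x, :y)'),
        [{'x': 100, 'y': 200}, {'x': 300, 'y': 400}],
    )

with engine.connect() as conn:
    # The dict supplies a value for the ':threshold' placeholder
    res = conn.execute(
        text('SELECT * FROM Testing WHERE x > :threshold'),
        {'threshold': 150},
    )
    print(res.all())  # [(300, 400)]
```

Letting SQLAlchemy bind the parameters (rather than interpolating them into the SQL string yourself) also protects you against SQL injection.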
Next up
That’s it for part I. In the next part of the series, we’ll move onto ORMs in SQLAlchemy and how you can use them to map Python objects to database tables. We’ll also discuss the merits and demerits of this approach because, whilst it is cool, it does come with its own set of drawbacks that people need to be aware of! ;)