Mastering SQLAlchemy: Part I - The Core

Mastering the Pythonic ORM
Author

Johnny Breen

Published

July 8, 2023

Interacting with SQL databases from Python is tough when using the bare-bones DBAPI model of pyodbc, sqlite3 or similar drivers.

Wouldn’t it be great if there were a system that could abstract away all of the nitty-gritty (e.g. connection ports and all that) and provide you with a standardised way of interacting with databases?

Well, the answer to that question is SQLAlchemy!

Preliminaries

Before we start, I really should mention that all I am essentially doing in this blog post is condensing what is already documented - to a very high standard - on the official SQLAlchemy website.

The documentation provides more detail than I will, but I’m hoping that this blog post can abstract away some of the more ‘esoteric’ points raised there and give you a more holistic view of how SQLAlchemy works conceptually and how you can deploy this package in your own projects.

What’s wrong with the classic approach?

The point of SQLAlchemy is to make it easier for you, as a user, to interact with databases.

Without SQLAlchemy, you need to connect to databases the old-school way which typically involves many calls to different intermediate objects.

With the sqlite3 API, you have to construct calls like this,

import sqlite3

# Open a connection
con = sqlite3.connect(':memory:')

# Create a cursor
cur = con.cursor()

# Execute a SQL query 
cur.execute('SELECT 123 AS Testing')

# Fetch results by way of a cursor object
res = cur.fetchone()

# Do something with the result e.g. 'print()' in this silly example
print(res)
(123,)

This is arguably not the right level of abstraction for the end-user, particularly if the end-user is a typical data analyst - for example, why does the user need to interact with an abstraction like a cursor as part of their workflow?

Part of this is a result of the DBAPI specification but, still, there is something slightly off about this setup.

Getting started

Install sqlalchemy if you haven’t already (a plain pip install sqlalchemy will do).

To start, we need to actually ‘connect’ to a database instance. For illustrative purposes I’m going to execute everything in this tutorial ‘in memory’ (rather than on the hard disk itself or an external data source).

To facilitate connecting to a database, you need to first create an ‘engine’ object,

from sqlalchemy import create_engine

dialect = 'sqlite'
db_api = 'pysqlite'
location = ':memory:'

conn_url = f'{dialect}+{db_api}:///{location}'

engine = create_engine(url=conn_url, echo=True)

print(engine)
Engine(sqlite+pysqlite:///:memory:)

An engine is like a database manager or a central ‘hub’ of database connections. You can think of it a bit like a taxi rank: you go there to get a taxi (a connection) to your intended destination (a database).

An important point to note here is that we haven’t yet opened any connections to our database - in our analogy, all of the taxis are currently idle (this design is ‘lazy’). It is only once we demand an interaction that a connection is actually opened.

Under the hood, the engine maintains a ‘pool’ of database connections: if you issue a request over a connection at time X and then perform a different request at time X + 1, the engine can ‘reuse’ the existing database connection rather than opening a fresh one. The reason for this design is that establishing a database connection is relatively expensive, so reusing a cached connection mitigates that cost.

The main parameters you might want to configure manually for the underlying pool are pool_size (i.e. how many connections to keep in the pool), max_overflow, pool_recycle and pool_timeout (i.e. when should connections be disconnected and flushed from the pool?), though I think the defaults are reasonable. A rough sketch of what this looks like is below.
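This is purely illustrative, assuming a Postgres-style URL with placeholder credentials (an in-memory SQLite engine like the one used in this post typically uses a simpler pool class, so these options matter less there),

from sqlalchemy import create_engine

# Illustrative values only - tune these to your workload
engine = create_engine(
    'postgresql+psycopg2://db_user:db_password@pg_host:5432/mydb',
    pool_size=5,        # number of connections to keep open in the pool
    max_overflow=10,    # extra connections allowed beyond pool_size under load
    pool_recycle=1800,  # recycle (reconnect) connections older than 30 minutes
    pool_timeout=30,    # seconds to wait for a free connection before raising
)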

There are three components to setting up an Engine object:

  • The dialect just describes which particular dialect of SQL will be used for this engine
  • The db_api specification; SQLAlchemy has to use this third-party library in order to make calls to the database (remember, SQLAlchemy is one step removed from database interactions; it is a delegator or an orchestrator so it needs to delegate the work of interacting with the database elsewhere)
  • The location of the database. In our case, we are just going to build the database in memory

If you are building a more sophisticated database (e.g. Postgres), you might want to consider creating the connection URL with SQLAlchemy’s URL module instead,

from sqlalchemy import URL

conn_url = URL.create(
    drivername="postgresql+psycopg2",
    username="db_user",
    password="db_password",
    host="pg_host",
    port=5432,
    database="mydb",
)
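
The resulting URL object can then be passed straight to create_engine, exactly as the plain string was earlier (the credentials above are, of course, placeholders),

from sqlalchemy import create_engine

# The URL object can be used anywhere a plain connection string is accepted
engine = create_engine(url=conn_url, echo=True)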

Executing queries (Core)

Basic setup

Let’s run a mock example query using the engine as our base (just like before),

from sqlalchemy import create_engine, text

dialect = 'sqlite'
db_api = 'pysqlite'
location = ':memory:'

conn_url = f'{dialect}+{db_api}:///{location}'

engine = create_engine(url=conn_url, echo=True)

Let’s execute a query,

with engine.connect() as conn:
    conn.execute(text('CREATE TABLE Testing (x INT, y INT)'))
    conn.execute(text('INSERT INTO Testing VALUES (12, 34)'))
    res = conn.execute(text('SELECT * FROM Testing'))
    print(res.all())
2023-07-08 16:21:59,922 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-07-08 16:21:59,923 INFO sqlalchemy.engine.Engine CREATE TABLE Testing (x INT, y INT)
2023-07-08 16:21:59,924 INFO sqlalchemy.engine.Engine [generated in 0.00165s] ()
2023-07-08 16:21:59,925 INFO sqlalchemy.engine.Engine INSERT INTO Testing VALUES (12, 34)
2023-07-08 16:21:59,925 INFO sqlalchemy.engine.Engine [generated in 0.00079s] ()
2023-07-08 16:21:59,926 INFO sqlalchemy.engine.Engine SELECT * FROM Testing
2023-07-08 16:21:59,927 INFO sqlalchemy.engine.Engine [generated in 0.00083s] ()
[(12, 34)]
2023-07-08 16:21:59,928 INFO sqlalchemy.engine.Engine ROLLBACK

It is recommended that you wrap the connection method, .connect(), inside a context manager. This allows you to use the connection resource for the duration of the with block, but no longer (on ‘exit’ of the with block, the connection is returned to the pool rather than being closed).

When you run this statement, you will notice in the console log a ROLLBACK statement. If you are not familiar with the ways of SQL, this might at first seem a little bit bemusing.

However, it has a fairly standard meaning: ROLLBACK means that any changes made to the database within the scope of the connection are not persisted, and the state of the database is rolled back once execution finishes. We can see this in action: if we try to run SELECT * FROM Testing outside of the scope of the above statement, we’ll notice that all of the values that we previously inserted have vanished,

with engine.connect() as conn:
    res = conn.execute(text('SELECT * FROM Testing'))
    print(res.all())
2023-07-08 16:21:59,936 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-07-08 16:21:59,936 INFO sqlalchemy.engine.Engine SELECT * FROM Testing
2023-07-08 16:21:59,937 INFO sqlalchemy.engine.Engine [cached since 0.01059s ago] ()
[]
2023-07-08 16:21:59,938 INFO sqlalchemy.engine.Engine ROLLBACK

This is the default behaviour of SQLAlchemy, but you can modify it yourself with some extra toggles; one such toggle is sketched below.
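For instance (a minimal sketch, and not something the rest of this post relies on), you can ask the driver for ‘AUTOCOMMIT’ behaviour via the isolation_level setting, in which case each statement is committed as soon as it executes,

from sqlalchemy import create_engine, text

# A fresh ':memory:' engine starts with a brand-new, empty database
autocommit_engine = create_engine('sqlite+pysqlite:///:memory:', isolation_level='AUTOCOMMIT')

with autocommit_engine.connect() as conn:
    conn.execute(text('CREATE TABLE Demo (x INT)'))
    conn.execute(text('INSERT INTO Demo VALUES (1)'))  # persisted without .commit()

with autocommit_engine.connect() as conn:
    print(conn.execute(text('SELECT * FROM Demo')).all())  # the insert survived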

Obviously the default rollback behaviour is undesirable when you actually want to make some persistent changes to the database. No problem, we can do that by calling the .commit() method.

Let’s try this again, shall we? First, we will insert some values into the database table Testing, only this time we will call .commit(),

with engine.connect() as conn:
    conn.execute(text('INSERT INTO Testing VALUES (12, 34)'))
    conn.commit()
2023-07-08 16:21:59,946 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-07-08 16:21:59,947 INFO sqlalchemy.engine.Engine INSERT INTO Testing VALUES (12, 34)
2023-07-08 16:21:59,948 INFO sqlalchemy.engine.Engine [cached since 0.02366s ago] ()
2023-07-08 16:21:59,949 INFO sqlalchemy.engine.Engine COMMIT

Now if we select values from the same table within a different connection context, we should observe that the values have been persisted in the database,

with engine.connect() as conn:
    res = conn.execute(text('SELECT * FROM Testing'))
    print(res.all())
2023-07-08 16:21:59,963 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-07-08 16:21:59,964 INFO sqlalchemy.engine.Engine SELECT * FROM Testing
2023-07-08 16:21:59,966 INFO sqlalchemy.engine.Engine [cached since 0.03948s ago] ()
[(12, 34)]
2023-07-08 16:21:59,967 INFO sqlalchemy.engine.Engine ROLLBACK

Note that, alternatively, you can avoid the .commit() method by beginning a transaction explicitly with .begin(),

with engine.begin() as conn:
    conn.execute(text('INSERT INTO Testing VALUES (56, 78)'))
2023-07-08 16:21:59,977 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-07-08 16:21:59,979 INFO sqlalchemy.engine.Engine INSERT INTO Testing VALUES (56, 78)
2023-07-08 16:21:59,979 INFO sqlalchemy.engine.Engine [generated in 0.00074s] ()
2023-07-08 16:21:59,980 INFO sqlalchemy.engine.Engine COMMIT

Reading results

You’ll notice that in order to read results I have been invoking the .all() method on the query result: this returns the result as a list of tuple-like Row objects.

However, there are other ways to access results. The result object is iterable so you can run a for loop on it if you want to,

with engine.connect() as conn:
    res = conn.execute(text('SELECT * FROM Testing'))
    for row in res:
        print(f'(x: {row.x}, y: {row.y})')
2023-07-08 16:21:59,990 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-07-08 16:21:59,991 INFO sqlalchemy.engine.Engine SELECT * FROM Testing
2023-07-08 16:21:59,992 INFO sqlalchemy.engine.Engine [cached since 0.06612s ago] ()
(x: 12, y: 34)
(x: 56, y: 78)
2023-07-08 16:21:59,993 INFO sqlalchemy.engine.Engine ROLLBACK

You can find further information in the official SQLAlchemy documentation on the various ways in which you may want to read the results of a query; a few of the more common patterns are sketched below.
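As a rough sketch (continuing with the engine and Testing table from above; note that a result object can only be consumed once, hence the separate queries),

with engine.connect() as conn:
    # Positional tuple-unpacking of each row
    for x, y in conn.execute(text('SELECT x, y FROM Testing')):
        print(x, y)

    # Dictionary-style access via .mappings()
    rows = conn.execute(text('SELECT x, y FROM Testing')).mappings().all()
    print(rows[0]['x'])

    # Just the first row (or None if the result is empty)
    print(conn.execute(text('SELECT x, y FROM Testing')).first())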

Parametrised queries

We can actually insert values based on a Pythonic data structure rather than hard-coding them into the raw SQL string,

with engine.connect() as conn:
    conn.execute(
        text('INSERT INTO Testing (x,y) VALUES (:x, :y)'),
        [{'x': 100, 'y': 200}, {'x': 300, 'y': 400}]
    )
    conn.commit()
2023-07-08 16:22:00,002 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2023-07-08 16:22:00,003 INFO sqlalchemy.engine.Engine INSERT INTO Testing (x,y) VALUES (?, ?)
2023-07-08 16:22:00,004 INFO sqlalchemy.engine.Engine [generated in 0.00204s] [(100, 200), (300, 400)]
2023-07-08 16:22:00,005 INFO sqlalchemy.engine.Engine COMMIT

Notice that I have issued more than one set of parameters to be inserted into the Testing table?

SQLAlchemy understands implicitly that it needs to execute the INSERT ... statement multiple times in this context (once for each parameter set).

This can come with performance limitations on larger parameter sets, so it is not necessarily advisable to follow this approach on larger datasets (though there are ways of dealing with it, which depend on your dialect; see this Stackoverflow thread for a little more coverage on this issue).
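Bound parameters also work for reads, not just inserts. As a quick sketch (again reusing the engine and Testing table from above), you can pass a single dictionary and refer to its keys with the same :name syntax,

with engine.connect() as conn:
    res = conn.execute(
        text('SELECT * FROM Testing WHERE x > :min_x'),
        {'min_x': 50}  # bound to :min_x in the statement above
    )
    print(res.all())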

Next up

That’s it for part I. In the next part of the series, we’ll move onto ORMs in SQLAlchemy and how you can use them to map Python objects to database tables. We’ll also discuss the merits and demerits of this approach because, whilst it is cool, it does come with its own set of drawbacks that people need to be aware of! ;)