Category
5 min read

How Metis Optimized Queries Executed by SQLAlchemy - A Case Study

Let’s see how Metis can prevent, monitor, and troubleshoot our databases. In this part, we’re going to play with SQLAlchemy, a feature-rich ORM for Python. We are going to see actual insights provided by Metis and how they improve the query performance.
Published on
October 2, 2023
Share this post
Contributors
Adam Furmanek
Dev Rel
See how Metis can make your database 3x faster and 50% cheaper!

SQLAlchemy is an ORM for Python. It can connect to PostgreSQL and other SQL engines. It supports transactions, relations, eager loading, lazy loading, functions, and other common operations.

We are going to use SQLAlchemy to execute the same queries as in Part 1 of this series.

Data model

The first thing we need to do is to model the database. I’m using the following table definitions:

@dataclass
class TitleBasic(db.Model):
    __tablename__ = 'title_basics'
    __table_args__ = { "schema": "imdb" }
    tconst: str
    titletype: str
    primarytitle: str
    isadult: bool
    startyear: int
    endyear: int
    runtimeminutes: int
    genres: str


    tconst = db.Column(db.Text, primary_key = True)
    titletype = db.Column(db.Text)
    primarytitle = db.Column(db.Text)
    isadult = db.Column(db.Boolean)
    startyear = db.Column(db.Integer)
    endyear = db.Column(db.Integer)
    runtimeminutes = db.Column(db.Integer)
    genres = db.Column(db.Text)


@dataclass
class TitleCrew(db.Model):
    __tablename__ = 'title_crew'
    __table_args__ = { "schema": "imdb" }
    tconst: str
    directors: str
    writers: str


    tconst = db.Column(db.Text, primary_key = True)
    directors = db.Column(db.Text)
    writers = db.Column(db.Text)


@dataclass
class TitlePrincipal(db.Model):
    __tablename__ = 'title_principals'
    __table_args__ = { "schema": "imdb" }
    tconst: str
    ordering: int
    nconst: str
    category: str
    job: str
    characters: str


    tconst = db.Column(db.Text, db.ForeignKey("TitleBasic.tconst"), primary_key = True)
    ordering = db.Column(db.Integer, primary_key = True)
    nconst = db.Column(db.Text)
    category = db.Column(db.Text)
    job = db.Column(db.Text)
    characters = db.Column(db.Text)


@dataclass
class NameBasic(db.Model):
    __tablename__ = 'name_basics'
    __table_args__ = { "schema": "imdb" }
    nconst: str
    primaryname: str
    birthyear: int
    deathyear: int
    primaryprofession: str
    knownfortitles: str


    nconst = db.Column(db.Text, primary_key = True)
    primaryname = db.Column(db.Text)
    birthyear = db.Column(db.Integer)
    deathyear = db.Column(db.Integer)
    primaryprofession = db.Column(db.Text)
    knownfortitles = db.Column(db.Text)

We can see that most of the tables are configured to represent SQL tables directly with no modifications.

For a given actor, find their latest movies

Let’s start with our first query. We can implement the application code for getting the latest movies for an actor:

return db.session.execute(
	select(TitleBasic)
	.join(TitlePrincipal, TitleBasic.tconst == TitlePrincipal.tconst)
	.where(TitlePrincipal.nconst == nconst)
	.order_by(TitleBasic.startyear.desc())
	.limit(10)
).scalars().all()

This is the SQL code that has been generated:

SELECT
  imdb.title_basics.tconst,
  imdb.title_basics.titletype,
  imdb.title_basics.primarytitle,
  imdb.title_basics.isadult,
  imdb.title_basics.startyear,
  imdb.title_basics.endyear,
  imdb.title_basics.runtimeminutes,
  imdb.title_basics.genres
FROM
  imdb.title_basics
  JOIN imdb.title_principals ON imdb.title_basics.tconst = imdb.title_principals.tconst
WHERE
  imdb.title_principals.nconst = 'nm1588970'
ORDER BY
  imdb.title_basics.startyear DESC
LIMIT
  10 /*controller='titlesForAnActor',db_driver='psycopg2',db_framework='sqlalchemy%3A0.35b0',framework='flask%3A2.2.2',route='/titlesForAnActor',traceparent='00-3ec0aae37dbde0930b61f755039323ca-598bb0d6e50466cb-01'*/

We can see the query just joined two tables and filtered the rows. The query executes in nearly 70 seconds and reads millions of rows. That’s a lot. This is what Metis shows for the analysis:

This is the estimated execution plan. Let’s see exactly what was planned:

Notice that the database engine expected to read only a couple hundred rows. The reality is much worse than that.

We can try improving the query by adding the index:

CREATE INDEX IF NOT EXISTS title_principals_nconst_idx ON imdb.title_principals(nconst) INCLUDE (tconst);

Now we get the following:

This is great. We can see that we managed to reduce the execution time to milliseconds thanks to Metis.

For a given actor, find their ten most highly rated films

Let’s now find the best movies for an actor. This is the code we can use:

return db.session.execute(
	select(TitleBasic)
	.join(TitlePrincipal, TitleBasic.tconst == TitlePrincipal.tconst)
	.join(TitleRating, TitleRating.tconst == TitleBasic.tconst)
	.where(TitlePrincipal.nconst == nconst)
	.order_by(TitleRating.averagerating.desc())
	.limit(10)
).scalars().all()

The code generates the following query:

SELECT
  imdb.title_basics.tconst,
  imdb.title_basics.titletype,
  imdb.title_basics.primarytitle,
  imdb.title_basics.isadult,
  imdb.title_basics.startyear,
  imdb.title_basics.endyear,
  imdb.title_basics.runtimeminutes,
  imdb.title_basics.genres
FROM
  imdb.title_basics
  JOIN imdb.title_principals ON imdb.title_basics.tconst = imdb.title_principals.tconst
  JOIN imdb.title_ratings ON imdb.title_ratings.tconst = imdb.title_basics.tconst
WHERE
  imdb.title_principals.nconst = 'nm1588970'
ORDER BY
  imdb.title_ratings.averagerating DESC
LIMIT
  10 /*controller='highestRatedMoviesForAnActor',db_driver='psycopg2',db_framework='sqlalchemy%3A0.35b0',framework='flask%3A2.2.2',route='/highestRatedMoviesForAnActor',traceparent='00-dda69cef096d50060e3e6d072786ec1c-799ca53e24965773-01'*/

The query runs in 3 seconds, and these are the insights:

We can see that it expects to read 211 rows in total. That’s not true, unfortunately. Let’s see the plan visualization:

We can see the statistics are misleading. The engine expects to read 1 row from title_basics and title_ratings. Let’s now add the index:

CREATE INDEX IF NOT EXISTS title_principals_nconst_idx ON imdb.title_principals(nconst) INCLUDE (tconst);

And now the query runs immediately:

We can also check whether the index has been used:

As we can see, the estimated plan included the index.

Find the ten top-rated films with some number of votes

Let’s get ten top-rated movies with a minimal number of votes. Here is the application code for getting top movies:

return db.session.execute(
	select(TitleBasic)
	.join(TitleRating, TitleRating.tconst == TitleBasic.tconst)
	.where(TitleRating.numvotes >= numvotes)
	.order_by(TitleRating.averagerating.desc())
).scalars().all()

The code generates the following query:

SELECT
  imdb.title_basics.tconst,
  imdb.title_basics.titletype,
  imdb.title_basics.primarytitle,
  imdb.title_basics.isadult,
  imdb.title_basics.startyear,
  imdb.title_basics.endyear,
  imdb.title_basics.runtimeminutes,
  imdb.title_basics.genres
FROM
  imdb.title_basics
  JOIN imdb.title_ratings ON imdb.title_ratings.tconst = imdb.title_basics.tconst
WHERE
  imdb.title_ratings.numvotes >= '10000'
ORDER BY
  imdb.title_ratings.averagerating DESC /*controller='highestRatedMovies',db_driver='psycopg2',db_framework='sqlalchemy%3A0.35b0',framework='flask%3A2.2.2',route='/highestRatedMovies',traceparent='00-d84bd6dc38f6d4429a8368e054fb1037-9d57a33a0fcb207e-01'*/

We get the following insights:

Similarly to our first part, we can see table scans instead of indexes. Let’s add the index that Metis suggests:

CREATE INDEX IDX_imdb_title_ratings_7a4c4d1e1b2 ON imdb.title_ratings (numvotes);

This is what we get now:

We can see the index was used. Let’s see the insights:

We see that we return a very big result size. Apart from that, all is good.

Summary

That’s it for now. In the next part, we are going to see more queries with SQLAlchemy. They will be much more complex and sophisticated, however, Metis will give us enough help to optimize them. Stay tuned!

This is some text inside of a div block. This is some text inside of a div block. This is some text inside of a div block. This is some text inside of a div block. This is some text inside of a div block.

Never worry about your
database again!

Start using Metis and get your database guardrails set up in minutes