Category
5 min read

How to Optimize SQL Queries With ChatGPT and Metis

Machine Learning is the thing! Let’s see how we can optimize queries with ChatGPT and other online solutions, and how Metis can help even better.
Published on
June 20, 2023
Share this post
Contributors
Adam Furmanek
Dev Rel
Metis Team
See how Metis can make your database 3x faster and 50% cheaper!

We played with IMDb data in Playing with IMDb data. We saw how to improve queries using Metis, and how to configure our database to get best possible performance. I highly recommend reading that article before moving on.

Let’s see if ChatGPT can do the same. I’m using ChatGPT May 24 version available for free at https://chat.openai.com/ 

We are going to take each query and ask ChatGPT how to improve it. We’ll then measure the performance before and after the suggestions to see if it’s worth doing that.

For a given actor, find their latest movies

For each case I provide a link to the discussion with ChatGPT. Here is the first one: https://chat.openai.com/share/36cdd1ac-6520-4a63-8c50-0a514f0e82dc 

We start with the following prompt for ChatGPT:

SQL Queries optimization with ChatGPT prompt discussion

Let’s try it now. I’m restarting my database and running the initial query a few times to get buffers filled and best result which is 4.8 seconds. Let’s now take improved query as specified by ChatGP:

SELECT TB.tconst
FROM name_basics AS NB
LEFT JOIN title_principals AS TP ON TP.nconst = NB.nconst
LEFT JOIN title_basics AS TB ON TB.tconst = TP.tconst
WHERE NB.nconst = 'nm1588970' -- Apply filtering before joining
ORDER BY TB.startyear DESC
LIMIT 10

You can see that I’m taking tconst only. Unfortunately, the execution time is the same (4.8 seconds). ChatGPT didn’t improve the query in this case. Obviously, I could add an index and ChatGPT even mentioned columns nconst and tconst. However, it also claimed that I should index startyear which I use in filtering, but that is not true at all. I order based on the column. Anyway, let’s ask how to configure indexes:

SQL Queries optimization with ChatGPT prompt discussion 2

Let’s create these indexes and see the timings. It took nearly 4 minutes to create indexes. Running the query afterwards took 0.1 second. Clearly an improvement. Execution plan shown by Metis indicates that we used two indexes: idx_title_principals_nconst and idx_title_basics_tconst:

SQL Queries optimization execution plan table

However, if we configure the index indicated by Metis:

CREATE INDEX IF NOT EXISTS title_principals_nconst_idx ON imdb.title_principals(nconst) INCLUDE (tconst);

We get the following plan:

SQL Queries optimization execution plan table with indicated index

We get the same query performance, but with fewer indexes.

For a given actor, find their ten most highly rated films

https://chat.openai.com/share/0f6e892a-c5d7-4202-b5db-afcf279f2584

Let’s take another query and ask ChatGPT:

SQL Queries optimization with ChatGPT prompt discussion on speeding up an SQL query

ChatGPT didn’t provide a query. Let’s ask for it:

 SQL Queries optimization with ChatGPT prompt discussing alternative query

Let’s try it out. The original query was running in 4.9 seconds. I tried the one suggested by ChatGPT

SELECT TB.tconst, TR.tconst -- Replace column1 and column2 with the actual column names needed
FROM name_basics AS NB
JOIN title_principals AS TP ON TP.nconst = NB.nconst
JOIN title_basics AS TB ON TB.tconst = TP.tconst
JOIN title_ratings AS TR on TR.tconst = TP.tconst
WHERE NB.nconst = 'nm1588970'
ORDER BY TR.averagerating DESC, TR.numvotes DESC
LIMIT 10;

This query has the same execution time. However, it changes the output! It uses INNER JOIN instead of a LEFT JOIN which results in returning just one row instead of two.

Based on the change above, it seems risky to use ChatGPT to optimize the query.

Find the ten top rated films with some number of votes

https://chat.openai.com/share/439c131f-2fc9-4741-a7d1-4aa31e082aec

SQL Queries optimization with ChatGPT prompt discussing making faster SQL query

We can see that ChatGPT suggested a query that is exactly the same as the one I provided. You can verify with execution plans that there is no difference in the plan. Let’s see what indexes id would propose:

SQL Queries optimization with ChatGPT prompt discussing index configuration

We can see this is the same index as the one Metis suggested.

Given two people, list what movies they appeared in together

https://chat.openai.com/share/b95a8045-32f2-4ecb-9da1-c16b39aea4e5 

Let’s carry on with the following:

SQL Queries optimization with ChatGPT prompt discussing making faster SQL query

Nothing big this time. Let’s ask for indexes:

SQL Queries optimization with ChatGPT prompt discussing index configuration

We can see this is exactly the same index that Metis suggests.

List all of the cast and crew in a given movie

https://chat.openai.com/share/040609fc-4c0a-4a90-a913-08f5ecad65c8 

Let’s do something fancier now.

SQL Queries optimization with ChatGPT prompt discussing making faster SQL query

Nothing specific yet. We can see ChatGPT suggests getting rid of OR conditions and using UNION instead. Let’s see if ChatGPT can suggest a better query.

SQL Queries optimization with ChatGPT prompt discussing UNION query alternative

Notice that ChatGPT didn’t realize that we don’t need the title_basics table. When it comes to the performance, my query works in around 86 seconds, ChatGPT’s one works in 84. Not much better. Let’s see if we can hint ChatGPT to ignore the table:

SQL Queries optimization with ChatGPT prompt discussion on title_basics table

Cool, ChatGPT figured out what we mean. However, the new query doesn’t work faster. It’s the same performance. Let’s see if UNION can do better:

SQL Queries optimization with ChatGPT prompt discussion on using UNION condition to optimize a query

Looks promising. First, the query has no errors, it runs correctly. Second, it’s much faster than the previous one. It runs in 24 seconds vs nearly 90 before.

However, the new query returns one additional row full of nulls. It works faster, but changes the behavior and may break the application. Not to mention that we can do better. The query I rewrote manually runs in less than 0.1 seconds and returns the correct dataset. Therefore, relying on ChatGPT in this case gave significant speedups, but we can do much better.

Find the most prolific actor in a given period

https://chat.openai.com/share/65216370-747c-4cde-b14e-df92f93d53c5

Let’s carry on:

SQL Queries optimization with ChatGPT prompt discussing making a faster SQL query

Let’s see how the new query performs. The original one runs in 21 seconds. The new one has the same performance. Removing aggregates didn’t help.

Again, the optimized query I wrote by hand in the other post was faster. It executes in 14 seconds.

Find most prolific actors in a given genre

https://chat.openai.com/share/03644158-50ee-467d-b841-08e34e1bb9fa

Let’s carry one with genres now:

SQL Queries optimization with ChatGPT prompt discussing making a faster genres query

Let’s ask how the optimized query would look like:

SQL Queries optimization with ChatGPT prompt discussing an optimized query

Let’s see how it worked. Original query runs in 42 seconds, the one proposed by ChatGPT runs in 27 seconds. That looks like a big win! However, the query provided by ChatGPT completely missed the point. One can’t replace LIKE operator with IN just like that. ChatGPT’s query returns completely different (and incorrect) results.

Let’s see if ChatGPT can realize that:

SQL Queries optimization with ChatGPT prompt discussing LIKE instead of IN query

The second query provided by ChatGPT works in 52 seconds and returns the correct dataset. Therefore, it’s even worse than it used to be. The optimized query I wrote by hand runs in 34 seconds. Once again, I did better than ChatGPT.

Finding most common coworkers

https://chat.openai.com/share/990f3964-0c90-48f6-8cfb-24d6c34c03aa

Let’s now do something crazy.

SQL Queries optimization with ChatGPT prompt discussing a common coworkers query

Let’s ask for a query specifically:

SQL Queries optimization with ChatGPT prompt discussing an optimal query

My query runs in 9 seconds. The one provided by ChatGPT executes in 6 seconds, but it returns incorrect results. Again, let’s see if ChatGPT can fix that:

SQL Queries optimization with ChatGPT prompt discussing missing LIKE in a query

Query runs in 8 seconds, but still returns the wrong data. ChatGPT couldn’t make it better.

Summary

Database tuning is hard. ChatGPT can help with some simple queries and indexes, but can’t be trusted blindly. Queries may return wrong data, or just provide only partial performance improvement. I'm surprised it generated queries that just work and don't have any syntactic errors. However, that's not enough.

To improve query performance we need good tooling. Metis can give us all the insights we need to reason about queries and optimize them as needed.

This is some text inside of a div block. This is some text inside of a div block. This is some text inside of a div block. This is some text inside of a div block. This is some text inside of a div block.

Never worry about your
database again!

Start using Metis and get your database guardrails set up in minutes