Sequelize is an ORM for Node.js and TypeScript. It can connect to PostgreSQL and other SQL engines. It supports transactions, relations, eager loading, lazy loading, functions, and other common operations. Previous part covered some of the queries from Part 1 of this series. This time we are going to implement more sophisticated scenarios to see how Sequelize can deal with them. Let’s go.
Given two people, list what movies they appeared in together
Let’s find common movies for two people. That’s the application code doing that:
The code generates the following query:
Metis indicates a missing index:
The query takes 5 seconds to finish. We can now improve the query performance by adding an index for title_principals. This is the code to do that:
Notice that we had to configure two associations in the model to make this work. This is very misleading and should be avoided if possible. What if we don’t like the second association? We can implement joins in the application code, like this:
We first load titles for the first person, then load titles for the second person. Finally, we process the lists together, and we send another query that would use both of them. This is the interaction with the database:
These two queries are rather straightforward. We could then intersect the datasets in the application code, or we can ask the database to do so. No matter what we do, we end up with a query similar to this one:
This scenario is now slower. Processing results in application takes 9 seconds to complete, so it’s slower than plain SQL. We can’t tell that it’s always the case, as sometimes it’s faster to process data in the application, especially when we don’t extract many records from the SQL. Your mileage may vary, so always measure your solution.
List all of the cast and crew in a given movie
Let’s now find all the people involved in the movie. Here is the first solution. We can just send the raw query:
This works, but is inefficient as we identified in the previous part. We can do the query with union to speed things up:
They take 100 seconds and 25 seconds respectively. That’s not very fast. Let’s try implementing the same in the application code:
We first take the crew from the title_principals table. Next, we extract the crew from the title_crew. Finally, we split directors and writers, and we calculate the intersection of both datasets to get the names.
This generates the following trace in Metis:
It’s all green! We can see that we have three queries sent, but they are blazingly fast because of indexes. This time it was much better to do processing in the application instead on the SQL level.
As mentioned in the previous section, there is no silver bullet. You need to measure your code and see where the performance benefits are.
Find the most prolific actor in a given period
Let’s now find the person that starred in the most movies in a given period. Let’s do that using application code:
We first find titles in a given period. Next, we take the crew for each of them. Finally, we need to group the identifiers and get the top one, and get the details. This is a bit lengthy and is rather slow because we get all the movies in a given period. That’s a lot of data:
Metis indicates nearly 70 000 rows. That’s a lot to process. What’s more, we then take all those identifiers and send in another query which is nearly 100 000 characters long.
Can we do better? Yes, we can send a raw query:
Metis shows the following:
There are two tables that we can configure indexes on: title_principals and title_basics. Once we do that, we get the best performance.
Find most prolific actors in a given genre
Let’s now find actors that did the most movies in a given genre. This is a very similar scenario to the previous one. Let’s start with the application code again:
This is nearly the same as before, only this time we query for genre. Metis shows the following:
This is even worse than the previous scenario. We now extract around 400 000 rows. While it takes 2.5 seconds to execute the query, sending the data over the wire takes much longer.
To fix that, we can use the raw query:
This gives the following:
Finally, we can configure indexes and make this query much faster.
Finding most common coworkers
Let’s now try to find people that work together the most. We can do that with a raw query:
Metis shows the following:
We can see multiple table scans and millions of rows read. We can optimize that by adding indexes. Let’s see if we can make it faster just by running the code in application:
First, we want to take that data from the title_principals table. We start with a known nconst, find all movies for a given person, and then for all the movies we find other people.
Next, we take similar data from title_crew. We then parse the data, aggregate it, and find top teammates. Finally, we get the data from the name_basics table.
We see that Metis identifies indexes that may improve the scenario. Even without these indexes, the query running in application finishes in 6 seconds vs 19 seconds with raw query. That’s a good starting point. Once again, we need to add improvements and measure them to see how things work in practice.
We examined various real scenarios with Sequelize to see how we can improve the performance. In previous parts of this series we relied on the database only to make it faster. This time we can do some processing with ORM and the application to change the performance characteristics. Metis shows good insights into all what we do, so we can track the optimizations and performance benefits as we proceed with the implementation changes. We can do the same with our actual applications even if we have very few rows in the database.