Object-Relational Mapping (ORM) proves invaluable for developers by streamlining the interaction between relational databases and their application code. By abstracting the database interactions into high-level programming constructs, ORM systems like Sequelize alleviate the need for developers to write complex and error-prone SQL queries. This simplification accelerates development cycles and reduces the potential for SQL injection vulnerabilities. Moreover, ORM fosters platform independence, allowing developers to seamlessly switch between different database management systems without rewriting extensive code.
Sequelize offers developers a seamless and efficient means of interacting with databases.It’s written in TypeScript, provides an easy-to-use API for running the queries. While its API is constructed much more like a query builder, it can still help developers to write queries with full type-safety. It supports multiple database engines, gives the ability to escape the queries written by hand, handles parameters, and integrates well with the rest of the ecosystem.
I’m going to implement the application on Amazon Linux 2, but you should be able to run things on any operating system.
Let’s start by downloading the initial code from GitHub. Fork the repository and checkout tag Step_1:
The server is configured in app.js. We configure logger, CORS, load controllers, and handle errors.
The second one exposes some more methods to get actors, titles by name, crew, and other interesting stuff.
Both controllers are backed by services that don’t return any data yet. You can see them here and here. We also define a set of tests that verify if the application works end to end. The test suite iterates over all the endpoints, calls them one by one, and checks if it received HTTP code [.code]200[.code] indicating success:
Let’s now see the data source.
We are going to use the IMDb dataset. You can download the files and transform them into a database, but the easiest way is to take a docker container that does that already. Clone this repository and run the following script:
This should run a database locally and expose the port 5432 to the localhost.
Once you start it and connect to the database, you can see that some columns are not normalized. For instance title_crew has this row:
When joining the data, we’ll need to split these values by comma and do some more processing.
There are two important identifiers in the IMDb dataset: tconst and nconst. tconst is used to indicate a title. nconst is for a name (actor, director, writer, etc.). We’re going to use these identifiers to join tables efficiently.
The dataset has millions of rows so we can use it to analyze the performance easily. Let’s carry on.
Starting the Application
We have multiple scripts in the root directory of the package that let you run things locally. The application can be run both with your local Node installation or inside a Docker container, so you don’t need to install anything locally.
Let’s start the application locally. build-and-run.sh will compile the application locally and expose it on the default port 3000:
Once you run the script, you can verify that it works properly with:
This should return an empty array:
You can also run test.sh that will run all endpoint tests locally with mocha:
You can also start the application in docker with start-service.sh or start-service.ps1. You can test it with curl just like before. You can remove the docker afterwards with remove-container.sh and remove-container.ps1.
At this point our application works. Let’s now integrate it with Sequelize.
Integration with Sequelize
We’re now going to install Sequelize. You can run all the commands locally (assuming you have Node configured), or you can just checkout the Step_2 tag and see how it works (especially if you don’t have Node and want to run things in docker only).
Let’s install Sequelize as described in its Getting Started guide:
These commands add the packages to the package.json file. We can now use the Sequelize library to access the database.
Let’s start by defining entities. Let’s create src/names/entities/name_basic.entity.js with the following content:
We import Sequelize, and then create a class NameBasic that extends the Model. We define fields like nconst or primaryname, and we define the model name as NameBasic. We can carry on defining other entities the same way.
Next, we need to initialize Sequelize when loading the module. Let’s create src/models/index.js:
We create a new Sequelize instance that connects to the database with the connection string obtained from the environment variables. The environment variables are loaded from the .env file and the connection string points to the local database.
Next, we load entities, define relations between tables, and finally set exports so we can use them from outside of this module.
We can now start using Sequelize. Let’s modify title_ratings.service.js and extract some best movies:
You can see that we import the TitleRating class, and then use the findAll method to filter movies with 10 stars. You can now start the application and use curl to verify that the endpoint returns data now.
You can also rerun tests to see that all works, and that the endpoint for getting best titles is sending an SQL query:
Writing Queries with Sequelize
We can now add more queries to do some more sophisticated logic. Just like before, you can modify the code manually, or checkout the tag Step_3. All the changes are in this commit.
Go to src/titles/titles.service.js and add the following imports at the top of the file:
Let’s now implement the method to get all the titles by their name:
You can see that the structure of the query is the same. We use the findAll method, but this time we need to configure the filter a little bit differently. We want to use the LIKE operator that will do the “contains” operation.
Let’s now implement a method to get all the titles for a given actor:
We take the table title_basic and then join it with title_principal using the tconst column. Next, we filter based on the nconst identifier, and finally sort the result by the startyear to get the latest movies first. We return ten of them.
Similarly, we can get the highest rated movies for an actor:
We join title_basic with title_rating, do the filtering, and finally we order the result based on the star rating of the movie.
Sequelize is very powerful, however, sometimes it’s easier to implement the SQL query directly. We can do it like this to get the crew of a movie:
As mentioned before, some columns in IMDb are stored as CSV values. In order to find a specific component of the CSV, we need to split it by commas and do some pattern matching. Later we’re going to see how to do the same in a different way.
Let’s now run all the tests and see that they send actual queries and get more data:
Introducing Database Monitoring and Observability
Our application works, but we have no idea what it does and what queries it generates. We don’t know whether it’s going to be fast enough when deployed to production. We need to see behind the scenes to make sure that everything works well.
We’re going to integrate the application with Metis to build Database Monitoring. This way we’ll be able to see the performance issues and optimize our queries before going to production.
Before moving on, go to Metis and create a project over there. You’ll need an API key to integrate the application.
You can see all the changes in the tag Step_4. We’re going to implement the instrumentation as it’s described in the documentation. Let’s start by installing the dependencies. Metis uses OpenTelemetry to extract details of the REST calls and the SQL queries. No confidential data is sent to the platform because Metis captures the execution plans only.
Once this is done, we need to add the tracing configuration. Add the file src/tracer.js:
We extract the Metis API key, add a processor for OpenTelemtry’s spans, and register the instrumentation for Pg (SQL driver) and HTTP (network calls). Finally, we need to add the environment variables to the .env file:
METIS_API_KEY is the key of your project. METIS_SERVICE_NAME and METIS_SERVICE_VERSION can be whatever you like. These two values are used just to indicate what application sent the trace. You can use it to do versioning, differentiate between developer and production versions, or to integrate that with your CI/CD pipeline.
All the code is ready. We can now enable the instrumentation so our application uses it. Let’s call startMetisInstrumentation in the main entrypoint and in the tests. This way we can capture the performance insights when we use our application locally or when we run the automated test suite. Let’s now start the application and use curl to test the endpoint:
This should return multiple titles. Let’s now go to your Metis project, go to “Recent Activity” tab, and you should see something like this:
You can see Metis captured the call to the titles/ratings/best endpoint. We can see that the endpoint returned HTTP code 200. Let’s now click on this execution and we should get to this screen:
You can see the spans at the top. There are multiple of them as Sequelize configures the database initially before running the first query. The last span indicates our actual SQL query that has been executed. You can click on the SQL tab to see the query text:
You can see the query that Sequelize sent! This is the same query we saw in the logs. However, this time we can analyze it automatically. Go back to the Insights tab and see the following:
Metis shows us that the query reads over a million of rows by scanning the whole table. This is a critical problem because it won’t scale well when we deploy to production. Let’s fix that.
Using Database Monitoring to Improve the SQL Query Performance
Let’s add an index. The code is in the tag Step_5. First, let’s create a migration that we will execute when starting the application. Let’s add src/models/migrations/001_create_title_ratings_indexed.sql:
This will create an index on the table to make sure our query runs faster. Let’s now implement a little bit of code to run this migration. Add this method to src/models/index.js:
Let’s now restart the application and see this:
Great, the migration has been executed. We can now run curl again and go to Metis project to see this:
You can see there is another call, the one we just did. Let’s open it up and see that the query is now fast enough:
You can see that the query reads about seven thousand rows now. This is a great performance improvement.
Improving Other Queries
We have all the pieces in place now. Let’s see how to use the new Database Monitoring and Observability with Sequelize to improve the performance. There are multiple improvements we can do here, so I’ll just show some of them. See the tag Step_6 to see more and read our article How Metis Optimized Queries Executed by Sequelize to get even more ideas.
Let’s revisit the query returning movies for an actor. Run it with curl:
Go to Metis and see that the call has been captured:
There are critical issues. Let’s see them:
We can see the query read 70 million rows. That’s a lot! Let’s see the query text:
This is definitely not something we expected. Let’s try making it faster and introduce an index:
You can add another migration, rerun the application and see that it helps a bit, but is definitely not enough.
Let’s try to understand why Sequelize generated such a weird query. We joined two tables, ordered the rows based on the startyear, and then returned ten rows. However, Sequelize doesn’t know if the join can create duplicates. In that case, Sequelize needs to go with subqueries to calculate the proper ordering. We can fix that by using the duplicating flag:
We can now restart the application and see that this is the query we now get:
Once we add the index and the proper flag for handling duplicates, we get the following performance:
You can see that now it is good enough to be deployed to production.
In this post we saw how to build a Database Monitoring with Sequelize. We saw how to analyze the SQL query performance, how to monitor the database, and how to optimize the application with actual IMDb dataset. This way you can supercharge all your applications and queries before going to production. And if that’s not enough, read our article about Improving Database Performance to handle database issues like a pro.
What is the purpose of using Sequelize ORM with the IMDB dataset?
How can I import and preprocess the IMDB data using Sequelize?
What are the key features of Sequelize for IMDB dataset analysis?
Sequelize lets us configure relations and simplify the way we write queries. We can automatically verify types, existence of columns, or whether the query is going to execute properly at all.