dbx

Indexing for more efficient queries

By dbx on 2024-09-19

 

TL;DR

This tutorial explains how to create and manage indexes to optimize query performance in SQL. Learn how to:
  • Create indexes on frequently queried columns to speed up searches.
  • Use EXPLAIN to analyze query performance.
  • Choose the right columns for indexing, with examples using the persons, houses, and street tables.

1. What is an Index?

An index is a special lookup table that the database uses to speed up data retrieval. Instead of scanning the entire table, the database can quickly locate the rows based on the indexed column.
Indexes work similarly to an index in a book, helping you quickly find the information you need without scanning every page.

2. Creating an Index

To create an index on a column, you use the CREATE INDEX statement. Let’s say you frequently query the persons table by lastname. You can create an index on the lastname column to make searches faster:
CREATE INDEX idx_lastname ON persons(lastname);
This creates an index called idx_lastname on the lastname column in the persons table. The next time you search for a person by lastname, the query will be faster because the database can use the index to quickly locate the rows.

3. Indexing Multiple Columns

Sometimes, you may query data based on multiple columns. In that case, you can create a multi-column index. For example, if you often search for people by both firstname and lastname:
CREATE INDEX idx_firstname_lastname ON persons(firstname, lastname);
This index improves performance for queries that filter based on both firstname and lastname together.

4. Choosing the Right Columns to Index

Not all columns should be indexed. Indexes improve query performance, but they also have overhead: they take up disk space and slow down INSERT, UPDATE, and DELETE operations because the index needs to be updated. Here are some general guidelines for choosing columns to index:
  • Frequently Queried Columns: Index columns that are used often in WHERE clauses, such as lastname in the persons table.
  • Join Columns: Index columns that are used to join tables. For example, the house_id column is used to join the persons and houses tables, so it would be a good candidate for indexing:
    • CREATE INDEX idx_house_id ON persons(house_id);
  • Foreign Keys: Index foreign key columns to speed up joins and lookups. Since house_id in both the persons and street tables references the id column in the houses table, it’s efficient to index these foreign key columns.
    • CREATE INDEX idx_street_house_id ON street(house_id); CREATE INDEX idx_persons_house_id ON persons(house_id);

5. Using EXPLAIN to Analyze Query Performance

You can use the EXPLAIN statement to see how your query will be executed by the database and whether it is using an index. This is helpful for identifying whether a query will benefit from indexing.

Example: Analyze a Query

Let’s analyze a query that retrieves people living in a house on Elm Street:
EXPLAIN SELECT p.firstname, p.lastname FROM persons p JOIN houses h ON p.house_id = h.id JOIN street s ON s.house_id = h.id WHERE s.name = 'Elm Street';
The output of EXPLAIN will show you whether indexes are being used for the joins or the WHERE clause. If the query isn't using an index, it may scan the entire table (a sequential scan), which is slower for large datasets.

6. Dropping an Index

If an index is no longer needed or if it’s causing performance issues during data modifications, you can drop it using the DROP INDEX statement:
DROP INDEX idx_lastname;
This removes the idx_lastname index from the persons table.

7. Practical Example: Improving Query Performance in Your Schema

Let’s say you have the following query that retrieves all people living in Los Angeles:
SELECT firstname, lastname FROM persons WHERE city = 'Los Angeles';
You can create an index on the city column in the persons table to improve the performance of this query:
CREATE INDEX idx_persons_city ON persons(city);
Now, whenever you search for persons in a specific city, the query will be faster because it uses the index on city.

8. When Not to Use Indexes

Indexes are useful, but there are situations where indexing may not be beneficial:
  • Small tables: For very small tables, indexes might not improve performance since scanning the entire table is quick.
  • Columns with few unique values: Columns with a small number of distinct values (e.g., a sex column with only M and F values) are not good candidates for indexing, as the database still has to scan many rows.