TL;DR
This tutorial explains how to create and manage indexes to optimize query performance in SQL. Learn how to:
- Create indexes on frequently queried columns to speed up searches.
- Use
EXPLAIN
to analyze query performance.
- Choose the right columns for indexing, with examples using the
persons
,houses
, andstreet
tables.
1. What is an Index?
An index is a special lookup table that the database uses to speed up data retrieval. Instead of scanning the entire table, the database can quickly locate the rows based on the indexed column.
Indexes work similarly to an index in a book, helping you quickly find the information you need without scanning every page.
2. Creating an Index
To create an index on a column, you use the
CREATE INDEX
statement. Let’s say you frequently query the persons
table by lastname
. You can create an index on the lastname
column to make searches faster:CREATE INDEX idx_lastname ON persons(lastname);
This creates an index called
idx_lastname
on the lastname
column in the persons
table. The next time you search for a person by lastname
, the query will be faster because the database can use the index to quickly locate the rows.3. Indexing Multiple Columns
Sometimes, you may query data based on multiple columns. In that case, you can create a multi-column index. For example, if you often search for people by both
firstname
and lastname
:CREATE INDEX idx_firstname_lastname ON persons(firstname, lastname);
This index improves performance for queries that filter based on both
firstname
and lastname
together.4. Choosing the Right Columns to Index
Not all columns should be indexed. Indexes improve query performance, but they also have overhead: they take up disk space and slow down
INSERT
, UPDATE
, and DELETE
operations because the index needs to be updated. Here are some general guidelines for choosing columns to index:- Frequently Queried Columns: Index columns that are used often in
WHERE
clauses, such aslastname
in thepersons
table.
- Join Columns: Index columns that are used to join tables. For example, the
house_id
column is used to join thepersons
andhouses
tables, so it would be a good candidate for indexing:
CREATE INDEX idx_house_id ON persons(house_id);
- Foreign Keys: Index foreign key columns to speed up joins and lookups. Since
house_id
in both thepersons
andstreet
tables references theid
column in thehouses
table, it’s efficient to index these foreign key columns.
CREATE INDEX idx_street_house_id ON street(house_id); CREATE INDEX idx_persons_house_id ON persons(house_id);
5. Using EXPLAIN
to Analyze Query Performance
You can use the
EXPLAIN
statement to see how your query will be executed by the database and whether it is using an index. This is helpful for identifying whether a query will benefit from indexing.Example: Analyze a Query
Let’s analyze a query that retrieves people living in a house on
Elm Street
:EXPLAIN SELECT p.firstname, p.lastname FROM persons p JOIN houses h ON p.house_id = h.id JOIN street s ON s.house_id = h.id WHERE s.name = 'Elm Street';
The output of
EXPLAIN
will show you whether indexes are being used for the joins or the WHERE
clause. If the query isn't using an index, it may scan the entire table (a sequential scan), which is slower for large datasets.6. Dropping an Index
If an index is no longer needed or if it’s causing performance issues during data modifications, you can drop it using the
DROP INDEX
statement:DROP INDEX idx_lastname;
This removes the
idx_lastname
index from the persons
table.7. Practical Example: Improving Query Performance in Your Schema
Let’s say you have the following query that retrieves all people living in
Los Angeles
:SELECT firstname, lastname FROM persons WHERE city = 'Los Angeles';
You can create an index on the
city
column in the persons
table to improve the performance of this query:CREATE INDEX idx_persons_city ON persons(city);
Now, whenever you search for persons in a specific city, the query will be faster because it uses the index on
city
.8. When Not to Use Indexes
Indexes are useful, but there are situations where indexing may not be beneficial:
- Small tables: For very small tables, indexes might not improve performance since scanning the entire table is quick.
- Columns with few unique values: Columns with a small number of distinct values (e.g., a
sex
column with onlyM
andF
values) are not good candidates for indexing, as the database still has to scan many rows.