Database indexing

Database Indexing allows us to cut down the number of rows/records that need to be examined when a select query with a where clause is executed.

When a select query with a where clause is executed, every single row is checked to see if the query match the condition. This effectively means that the entire table will have to be scanned (known as the full table scan).

Without Indexing

An index is a data structure that stores the values for a certain specific column of a table and helps us avoid a full table scan. If the table is not indexed, the query will likely not recognize that the row order is optimized, so the query should search the rows linearly. That is, the query must search every row to find a row that matches the criteria. As you can imagine, this can take a long time. Examining every row is not very efficient.

This will take more and more time as the size of the table grows. With the sophistication of data, what can eventually happen is that a table with 1 billion rows is joined with another table with 1 billion rows. The query should spend twice as much time searching twice as many rows.

How can we imporve

If the database tables are in order, we can make search more efficiently without looking up all the records. For example, we are finding a user whose age is 18. If the age column is sorted , we can skip the records of age starting from 18.

In reality, the database table does not sort itself every time the query condition changes to optimize query performance. This is unrealistic. In reality, the index creates the data structure in the database. The data structure type is very likely to be a B-tree. There are many advantages of B-trees, but the main advantage of our purpose is that they are sortable and makes the search more efficiently.

How it works

Database indexes will also store pointers which are simply reference information for the location of the additional information in memory. So to be summarized, it creates a structure with the key of indexed column and the value of pointers which reference the actual database record. When a conditional query runs, it looks up the sorted index structure according to the data structure (B-Tree in this case), get the pointers and then go for the actual records.

Index cost

There is always something you need to give back if you want something. Index also requires additional memory. The more indexes you add, the more memory will be used. Don’t forget to index only the columns that are frequently queried. Otherwise, indexing each column consumes a huge amount of memory.

Secondly, the write operation will be a bit slower as it needs to create a index each time you add an entry to the table. A similar situation applies to updating or deleting data.

I think this will give you the abstraction knowledge of what database indexing looks like.

By Yuuma




tel. 06-6454-8833(平日 10:00~17:00)