Hypothetical Indexes in PostgreSQL: Enhancing Performance Without the Overhead

Introduction

Indexes are essential for speeding up query execution in PostgreSQL, but determining whether an index will be utilized in the execution plan can be challenging without actually creating it. However, on large tables, index creation can be time-consuming and resource-intensive. This is where virtual, or hypothetical, indexes come into play—they allow you to assess the potential impact of an index without the overhead of actually building it.

A virtual or hypothetical index is a simulated index that allows database administrators to predict query performance without physically creating the index. By analyzing the estimated query plan using this virtual index, they can assess the potential benefits before committing to a real index. This approach optimizes database performance by preventing unnecessary index creation, saving both time and resources.

HypoPG is a PostgreSQL extension that adds support for hypothetical or virtual indexes. Once the extension is created, you can start using it immediately. Hypothetical indexes created within one session are isolated and will not affect other sessions, ensuring they remain non-intrusive to the overall database performance.

HypoPG supports the following access methods for hypothetical indexes:

- - btree
  - brin
  - hash (requires PostgreSQL 10 or later)
  - bloom (requires the bloom extension)

However GIST/GIN Access methods are not supported

Let’s walk through an example of how to use HypoPG to evaluate the potential impact of an index without actually creating it.

Install the HypoPG Extension

First, you need to install the HypoPG extension in your PostgreSQL database:

 yum install hypopg_15 -bash-4.2$ psql psql (15.8) Type "help" for help. postgres=# create extension hypopg; CREATE EXTENSION

yum install hypopg_15

-bash-4.2$ psql

psql (15.8)

Type "help" for help.

postgres=# create extension hypopg;

CREATE EXTENSION

Once installed, HypoPG allows you to create hypothetical indexes and analyze query performance as if the indexes physically exist.

Create a hypothetical index

Create a test table with random data

 postgres=# create table test(id int, name varchar); CREATE TABLE postgres=# insert into test select generate_series(1,1000000),md5(generate_series(1,1000000)::text); INSERT 0 1000000

postgres=# create table test(id int, name varchar);

CREATE TABLE

postgres=# insert into test select generate_series(1,1000000),md5(generate_series(1,1000000)::text);

INSERT 0 1000000

Now below is the execution plan of the simple query without any index on the id column

 postgres=# explain select * from test where id = 1; QUERY PLAN ------------------------------------------------------------------------ Gather (cost=1000.00..14542.43 rows=1 width=37) Workers Planned: 2 -> Parallel Seq Scan on test (cost=0.00..13542.33 rows=1 width=37) Filter: (id = 1) (4 rows)

postgres=# explain select * from test where id = 1;

QUERY PLAN

------------------------------------------------------------------------

Gather (cost=1000.00..14542.43 rows=1 width=37)

Workers Planned: 2

-> Parallel Seq Scan on test (cost=0.00..13542.33 rows=1 width=37)

Filter: (id = 1)

(4 rows)

“Due to the lack of an index on the table, a full table scan is currently being performed, which can be inefficient. Adding a simple B-tree index could significantly improve query performance. To assess this potential optimization without making any changes to the actual database schema, we will use HypoPG. The hypopg_create_index() function enables us to create hypothetical indexes for performance comparison, allowing us to evaluate the impact of the index without altering the database.”

 postgres=# SELECT * FROM hypopg_create_index('CREATE INDEX ON test (id)') ; indexrelid | indexname ------------+---------------------- 14264 | <14264>btree_test_id (1 row)

postgres=# SELECT * FROM hypopg_create_index('CREATE INDEX ON test (id)') ;

indexrelid | indexname

------------+----------------------

14264 | <14264>btree_test_id

(1 row)

The function returns two columns:

the object identifier of the hypothetical index
the generated hypothetical index name

We can run the EXPLAIN again to see if PostgreSQL would use this index

 postgres=# explain select * from test where id = 1; QUERY PLAN ------------------------------------------------------------------------------------ Index Scan using "<14264>btree_test_id" on test (cost=0.05..8.07 rows=1 width=37) Index Cond: (id = 1) (2 rows)

postgres=# explain select * from test where id = 1;

QUERY PLAN

------------------------------------------------------------------------------------

Index Scan using "<14264>btree_test_id" on test (cost=0.05..8.07 rows=1 width=37)

Index Cond: (id = 1)

(2 rows)

As you can see the index is being used in the estimated execution plan of the query, so now users can create this index to reduce the execution time of the query.

Hypopg Functions

Below are some of the functions provided by the hypoPG which are useful for working with hypoPG.

hypopg_list_indexes: view that lists all hypothetical indexes that have been created

 postgres=# select * from hypopg_list_indexes ; indexrelid | index_name | schema_name | table_name | am_name ------------+----------------------+-------------+------------+--------- 14264 | <14264>btree_test_id | public | test | btree (1 row)

postgres=# select * from hypopg_list_indexes ;

indexrelid | index_name | schema_name | table_name | am_name

------------+----------------------+-------------+------------+---------

14264 | <14264>btree_test_id | public | test | btree

(1 row)

hypopg_get_indexdef(oid): function that lists the CREATE INDEX statement that would recreate a stored hypothetical index.

 postgres=# SELECT index_name, hypopg_get_indexdef(indexrelid) FROM hypopg_list_indexes; index_name | hypopg_get_indexdef ----------------------+---------------------------------------------- <14264>btree_test_id | CREATE INDEX ON public.test USING btree (id) (1 row)

postgres=# SELECT index_name, hypopg_get_indexdef(indexrelid) FROM hypopg_list_indexes;

index_name | hypopg_get_indexdef

----------------------+----------------------------------------------

<14264>btree_test_id | CREATE INDEX ON public.test USING btree (id)

(1 row)

hypopg_relation_size(oid): This function allows the users to know the estimated size of the index once it is created.

 postgres=# SELECT index_name, pg_size_pretty(hypopg_relation_size(indexrelid)) FROM hypopg_list_indexes; index_name | pg_size_pretty ----------------------+---------------- <14264>btree_test_id | 25 MB (1 row) postgres=# create index idx_id on test(id); CREATE INDEX postgres=# di+ idx_id List of relations Schema | Name | Type | Owner | Table | Persistence | Access method | Size | Description --------+--------+-------+----------+-------+-------------+---------------+-------+------------- public | idx_id | index | postgres | test | permanent | btree | 21 MB | (1 row)

postgres=# SELECT index_name, pg_size_pretty(hypopg_relation_size(indexrelid)) FROM hypopg_list_indexes;

index_name | pg_size_pretty

----------------------+----------------

<14264>btree_test_id | 25 MB

(1 row)

postgres=# create index idx_id on test(id);

CREATE INDEX

postgres=# di+ idx_id

List of relations

--------+--------+-------+----------+-------+-------------+---------------+-------+-------------

(1 row)

The index size is near to the estimated size of the index.

hypopg_drop_index(oid): function that removes the given hypothetical index
hypopg_reset(): function that removes all hypothetical indexes

Hypothetically hide existing indexes

hypopg_hide_index(oid): function that allows you to hide an index in the EXPLAIN output by using its OID. It returns true if the index was successfully hidden, and false otherwise. This will help to identify the impact of deleting one of the indexes from two indexes which have similar columns.

 postgres=# EXPLAIN SELECT * FROM test WHERE id = 1; QUERY PLAN ------------------------------------------------------------------------------------ Index Scan using "<14264>btree_test_id" on test (cost=0.05..8.07 rows=1 width=37) Index Cond: (id = 1) (2 rows) postgres=# select oid from pg_class where relname='idx_id'; oid ------- 16535 (1 row) postgres=# select hypopg_hide_index(16535); hypopg_hide_index ------------------- t (1 row) postgres=# EXPLAIN SELECT * FROM test WHERE id = 1; QUERY PLAN ------------------------------------------------------------------------ Gather (cost=1000.00..14542.43 rows=1 width=37) Workers Planned: 2 -> Parallel Seq Scan on test (cost=0.00..13542.33 rows=1 width=37) Filter: (id = 1) (4 rows)

postgres=# EXPLAIN SELECT * FROM test WHERE id = 1;

QUERY PLAN

------------------------------------------------------------------------------------

Index Scan using "<14264>btree_test_id" on test (cost=0.05..8.07 rows=1 width=37)

Index Cond: (id = 1)

(2 rows)

postgres=# select oid from pg_class where relname='idx_id';

oid

-------

16535

(1 row)

postgres=# select hypopg_hide_index(16535);

hypopg_hide_index

-------------------

(1 row)

postgres=# EXPLAIN SELECT * FROM test WHERE id = 1;

QUERY PLAN

------------------------------------------------------------------------

Gather (cost=1000.00..14542.43 rows=1 width=37)

Workers Planned: 2

-> Parallel Seq Scan on test (cost=0.00..13542.33 rows=1 width=37)

Filter: (id = 1)

(4 rows)

hypopg_unhide_index(oid): function that restore a previously hidden index in the EXPLAIN output by using its OID. It returns true if the index was successfully restored, and false otherwise.
hypopg_unhide_all_index(): function that restore all hidden indexes and returns void.
hypopg_hidden_indexes(): function that returns a list of OIDs for all hidden indexes.
hypopg_hidden_indexes: view that returns a formatted list of all hidden indexes.

Thanks for Reading