Efficiently Storing and Querying JSON Data in PostgreSQL

JSON Data, short for JavaScript Object Notation, is an open standard format that structures data in key-value pairs and arrays. Widely used in web services, JSON serves as a versatile format for data exchange, document storage, and managing unstructured data.

PostgreSQL provides two JSON data types:

JSON: Stores data in raw JSON format, where the data is stored as a plain text string and only validates basic JSON syntax upon insertion.
JSONB: Stores data in a binary format after parsing and validating the JSON structure. This parsed, binary format allows for efficient indexing, faster lookups, and optimized storage, making it ideal for operations that require frequent access or querying of JSON data.

Why JSON data type ?

The JSON data type in PostgreSQL is highly beneficial when dealing with data that doesn’t fit neatly into a fixed schema. It is ideal in cases where:

Input Structure is Unpredictable: JSON is well-suited for capturing data with varying fields and values, as might occur with user-generated content, customer forms, or API responses.

Sparse or Optional Fields: When certain fields are only occasionally used, JSON allows you to store them without rigid schema constraints, saving space and reducing schema complexity.

Mixed Data Types: JSON can handle complex data structures, such as arrays or nested objects, making it easier to work with semi-structured data that might not align with traditional table structures.

Rapid Development and Iteration: JSON’s schema-less format makes it ideal for projects requiring frequent changes, as you can modify fields without altering the database schema.

How to store the Json data

Create the table with jsonb data type and store the data using the regular insert query. Here is an example which generate the random data of 1Million rows. I have used the function “jsonb_build_object” which create the json object from raw data.

jsonb_build_object is a PostgreSQL function that creates a JSONB object from a set of key-value pairs, where each key and value can be specified as separate arguments. This function is particularly useful for constructing JSONB objects dynamically in SQL queries or functions.

Syntax: jsonb_build_object(key1, value1, key2, value2, …)

 postgres=# SELECT jsonb_build_object('name', 'Alice', 'age', 30, 'city', 'New York'); jsonb_build_object -------------------------------------------------- {"age": 30, "city": "New York", "name": "Alice"} (1 row)

postgres=# SELECT jsonb_build_object('name', 'Alice', 'age', 30, 'city', 'New York');

jsonb_build_object

--------------------------------------------------

{"age": 30, "city": "New York", "name": "Alice"}

(1 row)

 postgres=# CREATE TABLE sample_data ( postgres(# id SERIAL PRIMARY KEY, postgres(# data JSONB postgres(# ); CREATE TABLE postgres=# postgres=# INSERT INTO sample_data (data) postgres-# SELECT jsonb_build_object( postgres(# 'name', 'User ' || trunc(random() * 100)::int, postgres(# 'age', trunc(random() * 50 + 20)::int, postgres(# 'address', jsonb_build_object( postgres(# 'street', 'Street ' || trunc(random() * 100)::int, postgres(# 'city', CASE WHEN random() < 0.5 THEN 'City A' ELSE 'City B' END, postgres(# 'zip', trunc(random() * 90000 + 10000)::int postgres(# ), postgres(# 'is_active', random() < 0.5 postgres(# ) postgres-# FROM generate_series(1, 1000000); INSERT 0 1000000 postgres=#

postgres=# CREATE TABLE sample_data (

postgres(# id SERIAL PRIMARY KEY,

postgres(# data JSONB

postgres(# );

CREATE TABLE

postgres=#

postgres=# INSERT INTO sample_data (data)

postgres-# SELECT jsonb_build_object(

postgres(# 'name', 'User ' || trunc(random() * 100)::int,

postgres(# 'age', trunc(random() * 50 + 20)::int,

postgres(# 'address', jsonb_build_object(

postgres(# 'street', 'Street ' || trunc(random() * 100)::int,

postgres(# 'city', CASE WHEN random() < 0.5 THEN 'City A' ELSE 'City B' END,

postgres(# 'zip', trunc(random() * 90000 + 10000)::int

postgres(# ),

postgres(# 'is_active', random() < 0.5

postgres(# )

postgres-# FROM generate_series(1, 1000000);

INSERT 0 1000000

postgres=#

How to query the json data

JSONB supports various operators which are helpful to query the json data.

Existence Operators ( ?, ?&, ?|)

Existence Operator (?) is used to verify if a particular field exists at the top place or not. Here is an example where to find the employees who has email field.

 postgres=# INSERT INTO employee (data) postgres-# VALUES ('{ postgres'# "id": 1, postgres'# "name": "Venkat", postgres'# "age": 30, postgres'# "address": { postgres'# "street": "Stree 1", postgres'# "city": "Hyderabad", postgres'# "zip": "500049" postgres'# }, postgres'# "created_at": "2024-11-09T12:34:56Z" postgres'# }'::jsonb); INSERT 0 1 postgres=# postgres=# postgres=# select * from employee where data ? 'email'; data -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- -------- {"id": 1, "age": 30, "name": "Anurag Shukla", "email": "anuragshukla@bynatree.com", "address": {"zip": "500049", "city": "Hyderabad", "street": "Street 1"}, "created_at": "2024-11-09T12:3 4:56Z"} (1 row)

postgres=# INSERT INTO employee (data)

postgres-# VALUES ('{

postgres'# "id": 1,

postgres'# "name": "Venkat",

postgres'# "age": 30,

postgres'# "address": {

postgres'# "street": "Stree 1",

postgres'# "city": "Hyderabad",

postgres'# "zip": "500049"

postgres'# },

postgres'# "created_at": "2024-11-09T12:34:56Z"

postgres'# }'::jsonb);

INSERT 0 1

postgres=#

postgres=# select * from employee where data ? 'email';

data

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--------

{"id": 1, "age": 30, "name": "Anurag Shukla", "email": "anuragshukla@bynatree.com", "address": {"zip": "500049", "city": "Hyderabad", "street": "Street 1"}, "created_at": "2024-11-09T12:3

4:56Z"}

(1 row)

?& is a variant of ? Which is used to find if all fields in the given array are present at the root level in the json string.

 postgres=# select * from employee where data ?& array['email','name']; data -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- -------- {"id": 1, "age": 30, "name": "Anurag Shukla", "email": "anuragshukla@bynatree.com", "address": {"zip": "500049", "city": "Hyderabad", "street": "Street 1"}, "created_at": "2024-11-09T12:3 4:56Z"} (1 row)

postgres=# select * from employee where data ?& array['email','name'];

data

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--------

{"id": 1, "age": 30, "name": "Anurag Shukla", "email": "anuragshukla@bynatree.com", "address": {"zip": "500049", "city": "Hyderabad", "street": "Street 1"}, "created_at": "2024-11-09T12:3

4:56Z"}

(1 row)

This is another variant of ? Which is used to find the rows which has at least one of the field in the array at the root level in json string.

 postgres=# select * from employee where data ?| array['email','name']; data -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- -------- {"id": 1, "age": 30, "name": "Anurag Shukla", "email": "anuragshukla@bynatree.com", "address": {"zip": "500049", "city": "Hyderabad", "street": "Street 1"}, "created_at": "2024-11-09T12:3 4:56Z"} {"id": 2, "age": 27, "name": "Venkat", "address": {"zip": "500049", "city": "Hyderabad", "street": "Stree 1"}, "created_at": "2024-11-10T12:34:56Z"} (2 rows)

postgres=# select * from employee where data ?| array['email','name'];

data

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--------

{"id": 1, "age": 30, "name": "Anurag Shukla", "email": "anuragshukla@bynatree.com", "address": {"zip": "500049", "city": "Hyderabad", "street": "Street 1"}, "created_at": "2024-11-09T12:3

4:56Z"}

{"id": 2, "age": 27, "name": "Venkat", "address": {"zip": "500049", "city": "Hyderabad", "street": "Stree 1"}, "created_at": "2024-11-10T12:34:56Z"}

(2 rows)

Extract Operators (-> & ->> , #> & #>>)

-> operator will extract the fields and give the output in jsonb format.

 postgres=# select data -> 'id' as ID, data -> 'name' as NAME from employee ; id | name ----+----------------- 1 | "Anurag Shukla" (1 row) postgres=# select pg_typeof(data -> 'id') as ID, pg_typeof(data -> 'name') as NAME from employee ; id | name -------+------- jsonb | jsonb (1 row)

postgres=# select data -> 'id' as ID, data -> 'name' as NAME from employee ;

id | name

----+-----------------

1 | "Anurag Shukla"

(1 row)

postgres=# select pg_typeof(data -> 'id') as ID, pg_typeof(data -> 'name') as NAME from employee ;

id | name

-------+-------

jsonb | jsonb

(1 row)

->> operator will extract the data in string format

 postgres=# select data ->> 'id' as ID, data ->> 'name' as NAME from employee ; id | name ----+--------------- 1 | Anurag Shukla (1 row) postgres=# select pg_typeof(data ->> 'id') as ID, pg_typeof(data ->> 'name') as NAME from employee ; id | name ------+------ text | text (1 row)

postgres=# select data ->> 'id' as ID, data ->> 'name' as NAME from employee ;

id | name

----+---------------

1 | Anurag Shukla

(1 row)

postgres=# select pg_typeof(data ->> 'id') as ID, pg_typeof(data ->> 'name') as NAME from employee ;

id | name

------+------

text | text

(1 row)

This operators (#>, #>>) are to get the data at the given path. The “#>” will get output in jsonb format and the “#>>” will get the data in text format.

 postgres=# select data -> 'id' as ID , data -> 'name' as NAME , data #> '{address,city}' as CITY from employee; id | name | city ----+-----------------+------------- 1 | "Anurag Shukla" | "Hyderabad" 2 | "Venkat" | "Hyderabad" (2 rows) postgres=# select data ->> 'id' as ID , data ->> 'name' as NAME , data #>> '{address,city}' as CITY from employee; id | name | city ----+---------------+----------- 1 | Anurag Shukla | Hyderabad 2 | Venkat | Hyderabad (2 rows)

postgres=# select data -> 'id' as ID , data -> 'name' as NAME , data #> '{address,city}' as CITY from employee;

id | name | city

----+-----------------+-------------

1 | "Anurag Shukla" | "Hyderabad"

2 | "Venkat" | "Hyderabad"

(2 rows)

postgres=# select data ->> 'id' as ID , data ->> 'name' as NAME , data #>> '{address,city}' as CITY from employee;

id | name | city

----+---------------+-----------

1 | Anurag Shukla | Hyderabad

2 | Venkat | Hyderabad

(2 rows)

Contain operators (@> , <@)

This is used to verify if a json string is present in another json string. Here is an example where verifying if the {“id”: 1} is present in the data column.

 postgres=# select * from employee where '{"id": 1}'::jsonb <@ data; data -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- -------- {"id": 1, "age": 30, "name": "Anurag Shukla", "email": "anuragshukla@bynatree.com", "address": {"zip": "500049", "city": "Hyderabad", "street": "Street 1"}, "created_at": "2024-11-09T12:3 4:56Z"} (1 row) postgres=# select * from employee where data @> '{"id": 1}'::jsonb; data -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- -------- {"id": 1, "age": 30, "name": "Anurag Shukla", "email": "anuragshukla@bynatree.com", "address": {"zip": "500049", "city": "Hyderabad", "street": "Street 1"}, "created_at": "2024-11-09T12:3 4:56Z"} (1 row)

postgres=# select * from employee where '{"id": 1}'::jsonb <@ data;

data

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--------

{"id": 1, "age": 30, "name": "Anurag Shukla", "email": "anuragshukla@bynatree.com", "address": {"zip": "500049", "city": "Hyderabad", "street": "Street 1"}, "created_at": "2024-11-09T12:3

4:56Z"}

(1 row)

postgres=# select * from employee where data @> '{"id": 1}'::jsonb;

data

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--------

{"id": 1, "age": 30, "name": "Anurag Shukla", "email": "anuragshukla@bynatree.com", "address": {"zip": "500049", "city": "Hyderabad", "street": "Street 1"}, "created_at": "2024-11-09T12:3

4:56Z"}

(1 row)

jsonb_path_query

This is a new feature added in PostgreSQL 12 version and it makes PostgreSQL is more adheres to the SQL standard 2016 and provides below functions which makes the data processing ease.

jsonb_path_exists
jsonb_path_match
jsonb_path_query_array
jsonb_path_query_first

Indexing JSON columns

B-TREE

This can handle only equality operator

 postgres=# explain select * from employee where data='{"id": 1, "age": 30, "name": "Anurag Shukla", "email": "anuragshukla@bynatree.com", "address": {"zip": "500049", "city": "Hyderabad", "street": "Street 1"}, "created_at": "2024-11-09T12:34:56Z"}'; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ---------------------------------------------- Bitmap Heap Scan on employee (cost=84.83..229.56 rows=39 width=32) Recheck Cond: (data = '{"id": 1, "age": 30, "name": "Anurag Shukla", "email": "anuragshukla@bynatree.com", "address": {"zip": "500049", "city": "Hyderabad", "street": "Street 1"}, "crea ted_at": "2024-11-09T12:34:56Z"}'::jsonb) -> Bitmap Index Scan on data_btree_idx (cost=0.00..84.83 rows=39 width=0) Index Cond: (data = '{"id": 1, "age": 30, "name": "Anurag Shukla", "email": "anuragshukla@bynatree.com", "address": {"zip": "500049", "city": "Hyderabad", "street": "Street 1"}, " created_at": "2024-11-09T12:34:56Z"}'::jsonb) (4 rows)

postgres=# explain select * from employee where data='{"id": 1, "age": 30, "name": "Anurag Shukla", "email": "anuragshukla@bynatree.com", "address": {"zip": "500049", "city": "Hyderabad", "street": "Street 1"}, "created_at": "2024-11-09T12:34:56Z"}';

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

----------------------------------------------

Bitmap Heap Scan on employee (cost=84.83..229.56 rows=39 width=32)

Recheck Cond: (data = '{"id": 1, "age": 30, "name": "Anurag Shukla", "email": "anuragshukla@bynatree.com", "address": {"zip": "500049", "city": "Hyderabad", "street": "Street 1"}, "crea

ted_at": "2024-11-09T12:34:56Z"}'::jsonb)

-> Bitmap Index Scan on data_btree_idx (cost=0.00..84.83 rows=39 width=0)

Index Cond: (data = '{"id": 1, "age": 30, "name": "Anurag Shukla", "email": "anuragshukla@bynatree.com", "address": {"zip": "500049", "city": "Hyderabad", "street": "Street 1"}, "

created_at": "2024-11-09T12:34:56Z"}'::jsonb)

(4 rows)

If you try to use other operators it will not use the index.

 postgres=# explain select * from employee where data ->> 'id' = '1'; QUERY PLAN ------------------------------------------------------------------------------ Gather (cost=1000.00..5740.70 rows=655 width=205) Workers Planned: 2 -> Parallel Seq Scan on employee (cost=0.00..4675.20 rows=273 width=205) Filter: ((data ->> 'id'::text) = '1'::text) (4 rows)

postgres=# explain select * from employee where data ->> 'id' = '1';

QUERY PLAN

------------------------------------------------------------------------------

Gather (cost=1000.00..5740.70 rows=655 width=205)

Workers Planned: 2

-> Parallel Seq Scan on employee (cost=0.00..4675.20 rows=273 width=205)

Filter: ((data ->> 'id'::text) = '1'::text)

(4 rows)

But B-Tree indexes can be created on specific fields of a json column using functional index.

 postgres=# create index id_idx on employee ((data->>'id')); CREATE INDEX

1 2	postgres=# create index id_idx on employee ((data->>'id')); CREATE INDEX

HASH Index

HASH index can be used to index json column which is similar to B-Tree index. This also works only for equality operator.

For both B-Tree and HASH indexes like operator will not work.

GIN Index

GIN index supports below operator classes and each operator class supports various operations which can be indexed. Jsonb_ops, jsonb_path_ops are the most widely used operator classes.

Jsonb_path_ops covers the containment operator (@).

 postgres=# create index idx_jsonb_path_ops on employee using GIN (data jsonb_path_ops); CREATE INDEX postgres=# explain select * from employee where '{"id": 1}'::jsonb <@ data; QUERY PLAN --------------------------------------------------------------------------------------- Bitmap Heap Scan on employee (cost=445.17..5114.91 rows=65099 width=205) Recheck Cond: ('{"id": 1}'::jsonb <@ data) -> Bitmap Index Scan on idx_jsonb_path_ops (cost=0.00..428.89 rows=65099 width=0) Index Cond: (data @> '{"id": 1}'::jsonb) (4 rows)

postgres=# create index idx_jsonb_path_ops on employee using GIN (data jsonb_path_ops);

CREATE INDEX

postgres=# explain select * from employee where '{"id": 1}'::jsonb <@ data;

QUERY PLAN

---------------------------------------------------------------------------------------

Bitmap Heap Scan on employee (cost=445.17..5114.91 rows=65099 width=205)

Recheck Cond: ('{"id": 1}'::jsonb <@ data)

-> Bitmap Index Scan on idx_jsonb_path_ops (cost=0.00..428.89 rows=65099 width=0)

Index Cond: (data @> '{"id": 1}'::jsonb)

(4 rows)

Jsonb_ops supports containment (@) and existence operator (?).

 postgres=# create index idx_jsonb_ops on employee using GIN (data jsonb_ops); CREATE INDEX postgres=# explain select * from employee where data ? 'email'; QUERY PLAN ---------------------------------------------------------------------------------- Bitmap Heap Scan on employee (cost=445.17..5114.91 rows=65099 width=205) Recheck Cond: (data ? 'email'::text) -> Bitmap Index Scan on idx_jsonb_ops (cost=0.00..428.90 rows=65099 width=0) Index Cond: (data ? 'email'::text) (4 rows) postgres=# explain select * from employee where data ?& array['email','name']; QUERY PLAN ---------------------------------------------------------------------------------- Bitmap Heap Scan on employee (cost=471.26..5143.51 rows=65300 width=205) Recheck Cond: (data ?& '{email,name}'::text[]) -> Bitmap Index Scan on idx_jsonb_ops (cost=0.00..454.93 rows=65300 width=0) Index Cond: (data ?& '{email,name}'::text[]) (4 rows)

postgres=# create index idx_jsonb_ops on employee using GIN (data jsonb_ops);

CREATE INDEX

postgres=# explain select * from employee where data ? 'email';

QUERY PLAN

----------------------------------------------------------------------------------

Bitmap Heap Scan on employee (cost=445.17..5114.91 rows=65099 width=205)

Recheck Cond: (data ? 'email'::text)

-> Bitmap Index Scan on idx_jsonb_ops (cost=0.00..428.90 rows=65099 width=0)

Index Cond: (data ? 'email'::text)

(4 rows)

postgres=# explain select * from employee where data ?& array['email','name'];

QUERY PLAN

----------------------------------------------------------------------------------

Bitmap Heap Scan on employee (cost=471.26..5143.51 rows=65300 width=205)

Recheck Cond: (data ?& '{email,name}'::text[])

-> Bitmap Index Scan on idx_jsonb_ops (cost=0.00..454.93 rows=65300 width=0)

Index Cond: (data ?& '{email,name}'::text[])

(4 rows)

Limitations of Json data type

While JSON data types (particularly JSONB) in PostgreSQL offer flexibility, there are several limitations to consider:

Lack of Detailed Statistics: PostgreSQL cannot capture statistics for individual fields within JSONB columns, which can lead the query planner to make suboptimal decisions. As a result, queries on less frequent fields may be slower than expected due to inaccurate cost estimates.
Increased Storage Usage: JSONB doesn’t deduplicate keys, so if the same key appears repeatedly across JSONB documents, each instance consumes storage. This can lead to higher storage costs, especially in large datasets with many repeated keys.
Indexing Overheads: While JSONB supports indexing on specific fields, adding many indexes to JSONB fields can increase maintenance overhead. Index updates can become costly in terms of storage and performance, particularly for frequently updated data.
Performance Impacts on Large JSON Data: Although JSONB is designed for better performance, querying or updating large JSON documents can be slower than equivalent operations on traditional columns due to the need to parse and process the entire document.

Why JSON data type ?

How to store the Json data

How to query the json data

Existence Operators ( ?, ?&, ?|)

Extract Operators (-> & ->> , #> & #>>)

Contain operators (@> , <@)

jsonb_path_query

Indexing JSON columns

B-TREE

HASH Index

GIN Index

Limitations of Json data type

Author: BYNATREE DATA SOLUTION

Leave a Reply Cancel reply

Efficiently Storing and Querying JSON Data in PostgreSQL

Why JSON data type ?

How to store the Json data

How to query the json data

Existence Operators ( ?, ?&, ?|)

Extract Operators (-> & ->> , #> & #>>)

Contain operators (@> , <@)

jsonb_path_query

Indexing JSON columns

B-TREE

HASH Index

GIN Index

Limitations of Json data type

Author: BYNATREE DATA SOLUTION

Related posts

Leave a Reply Cancel reply