Thanks to the input from @FrankHeikens, I found the random_page_cost parameter. Lowering that value (to tell Postgres that accessing random parts of the disk is not that expensive) indeed makes Postgres use the index. In my example on my system, with 10M rows, the value needs to be 1.15 or lower.
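For reference, you can check the value currently in effect before changing it (the PostgreSQL default is 4):

SHOW random_page_cost;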
ALTER TABLESPACE pg_default SET (random_page_cost=1.1);
EXPLAIN ANALYZE SELECT COUNT(*) FROM auction_jsonb_indexed WHERE item->>'author' = 'Author 1';
Aggregate  (cost=10906.93..10906.94 rows=1 width=8) (actual time=4.152..4.154 rows=1 loops=1)
  ->  Bitmap Heap Scan on auction_jsonb_indexed  (cost=87.60..10882.00 rows=9970 width=0) (actual time=2.803..3.858 rows=10000 loops=1)
        Recheck Cond: ((item ->> 'author'::text) = 'Author 1'::text)
        Heap Blocks: exact=10000
        ->  Bitmap Index Scan on jsonb_author  (cost=0.00..85.11 rows=9970 width=0) (actual time=1.465..1.466 rows=10000 loops=1)
              Index Cond: ((item ->> 'author'::text) = 'Author 1'::text)
Planning Time: 0.076 ms
Execution Time: 4.271 ms
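ALTER TABLESPACE only affects tables in that tablespace. For experimenting, the same parameter can also be set for the current session only, or as the cluster-wide default:

-- current session only, handy for testing:
SET random_page_cost = 1.1;

-- cluster-wide default (persisted, takes effect after a config reload):
ALTER SYSTEM SET random_page_cost = 1.1;
SELECT pg_reload_conf();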
(Btw: note that the number of distinct values in the indexed field also changes the plan. If I create the table with cnt % 10 instead, a slightly different plan is used. It still benefits from the index, but is considerably slower.)
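For context, here is a minimal sketch of the kind of test setup the numbers above refer to; the actual table definition is in the question, so the names and the modulus here are assumptions:

-- hypothetical reconstruction of the test table (1000 distinct authors,
-- so 'Author 1' matches 10,000 of the 10M rows, as in the plan above):
CREATE TABLE auction_jsonb_indexed (id serial PRIMARY KEY, item jsonb);
INSERT INTO auction_jsonb_indexed (item)
SELECT jsonb_build_object('author', 'Author ' || (cnt % 1000))
FROM generate_series(1, 10000000) AS cnt;
CREATE INDEX jsonb_author ON auction_jsonb_indexed ((item ->> 'author'));

With cnt % 10 there are only 10 distinct authors, so each value matches about a million rows; the bitmap scan then has to visit far more heap pages, which is why it is considerably slower even though the index is still used.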