Tuning pgvector Query Accuracy
Improve pgvector search accuracy by tuning HNSW ef_search and IVFFlat probes — with a tutorial measuring recall vs. speed tradeoffs.
This post was written by an engineer at QueryPlane. QueryPlane is an app builder for your database: bring your own Postgres database and you can create interactive applications to share with other developers, coworkers, or even your customers.
pgvector’s HNSW and IVFFlat indexes are approximate nearest neighbor (ANN) algorithms. They trade some accuracy for speed—out of the box, you might be missing some of the most relevant results. The good news: both indexes expose parameters that let you tune this tradeoff at query time without rebuilding anything.
In this post, we’ll cover:
- Understanding recall - What it means and why it matters
- HNSW ef_search parameter - Tuning candidate list size during search
- IVFFlat probes parameter - Controlling how many clusters to search
- Tutorial - Measuring recall vs speed with different parameter values
Understanding recall
Recall measures what percentage of true nearest neighbors your search actually finds. If the 10 closest vectors to your query are A, B, C, D, E, F, G, H, I, J, and your search returns A, B, C, D, E, F, G, X, Y, Z, your recall@10 is 70%.
- recall@10 = 100%: Perfect accuracy, found all true neighbors
- recall@10 = 95%: Missed half a result on average
- recall@10 = 80%: Missing 2 results out of 10
For semantic search, 95%+ recall is usually sufficient. For applications like deduplication or exact matching, you might need higher.
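The recall definition above can be sketched in a few lines of Python. The function name `recall_at_k` and the letter IDs are illustrative only; they reuse the A to J example from this section.

```python
def recall_at_k(exact_ids, approx_ids, k=10):
    """Fraction of the true top-k neighbors that the approximate search returned."""
    true_top_k = set(exact_ids[:k])
    if not true_top_k:
        raise ValueError("exact_ids must contain at least one id")
    found = true_top_k.intersection(approx_ids[:k])
    return len(found) / len(true_top_k)


# The example from the text: 7 of the 10 true neighbors were returned.
exact = list("ABCDEFGHIJ")
approx = list("ABCDEFGXYZ")
print(recall_at_k(exact, approx))  # 0.7
```

Order within the top k does not matter here: recall only asks whether each true neighbor appears somewhere in the returned set.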
HNSW: The ef_search parameter
HNSW (Hierarchical Navigable Small World) maintains a multi-layer graph. During search, it traverses the graph starting from the top layer, keeping track of the best candidates found. The ef_search parameter controls how many candidates to track.
Default value: 40
-- Check current value
SHOW hnsw.ef_search;
-- Increase for better recall (at the cost of speed)
SET hnsw.ef_search = 100;
-- Decrease for faster queries (at the cost of recall)
SET hnsw.ef_search = 20;
The relationship between ef_search and recall shows diminishing returns: doubling ef_search doesn’t double your recall, while query time keeps growing roughly in proportion to ef_search.
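Important constraint: ef_search also caps how many rows an HNSW index scan can return, so keep it at least as large as the number of results you’re requesting. If you set ef_search = 40 but ask for LIMIT 50, you’ll get at most 40 rows, and fewer if a WHERE clause filters out some of the candidates. (pgvector 0.8.0 and later can lift this cap with iterative index scans, controlled by hnsw.iterative_scan.)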
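To make the role of the candidate list concrete, here is a toy single-layer best-first search in Python. It is a simplified sketch of the idea behind HNSW's per-layer search, not pgvector's implementation; the graph, the node names, and the precomputed distances to the query are all made up for illustration.

```python
import heapq


def search_layer(graph, dist, entry, ef):
    """Best-first graph search that keeps at most `ef` results, closest first."""
    visited = {entry}
    candidates = [(dist[entry], entry)]  # min-heap: closest unexplored node on top
    results = [(-dist[entry], entry)]    # max-heap via negation: worst kept result on top
    while candidates:
        d, node = heapq.heappop(candidates)
        if d > -results[0][0]:
            break  # the nearest candidate is already worse than the worst kept result
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            if len(results) < ef or dist[nb] < -results[0][0]:
                heapq.heappush(candidates, (dist[nb], nb))
                heapq.heappush(results, (-dist[nb], nb))
                if len(results) > ef:
                    heapq.heappop(results)  # drop the current worst result
    return [name for _, name in sorted((-d, name) for d, name in results)]


# Hypothetical graph: T is the true nearest neighbor, but it is only
# reachable through B, which looks worse than A from the entry point E.
graph = {"E": ["A", "B"], "A": ["E", "C"], "C": ["A"], "B": ["E", "T"], "T": ["B"]}
dist = {"E": 10, "A": 6, "C": 5, "B": 7, "T": 1}  # distance of each node to the query

print(search_layer(graph, dist, "E", ef=1))  # ['C']
print(search_layer(graph, dist, "E", ef=3))  # ['T', 'C', 'A']
```

With ef = 1 the search commits to the locally best path and stops at C; with ef = 3 it keeps B around long enough to discover T. The function also never returns more than ef results, which is the same cap described above.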
IVFFlat: The probes parameter
IVFFlat (Inverted File with Flat storage, meaning the vectors are stored uncompressed) divides vectors into clusters (called “lists”). During search, it checks only the clusters whose centroids are closest to your query vector. The probes parameter controls how many clusters to search.
Default value: 1
-- Check current value
SHOW ivfflat.probes;
-- Increase for better recall
SET ivfflat.probes = 10;
With probes = 1, IVFFlat only searches the single closest cluster. If your query vector sits near a cluster boundary, the true nearest neighbors might be in an adjacent cluster that never gets searched.
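The boundary effect can be reproduced with a toy inverted-file search in Python. This is a sketch under simplifying assumptions (two hand-picked centroids, four 2D points, Euclidean distance), not how pgvector builds or scans its lists; the function name `ivf_search` is hypothetical.

```python
import math


def ivf_search(centroids, points, query, probes, k=1):
    """Search only the `probes` lists whose centroids are closest to the query."""
    # Build the lists: each point goes to its nearest centroid.
    lists = {i: [] for i in range(len(centroids))}
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        lists[nearest].append(p)
    # Rank lists by centroid distance to the query and scan only the first `probes`.
    order = sorted(range(len(centroids)), key=lambda i: math.dist(query, centroids[i]))
    scanned = [p for i in order[:probes] for p in lists[i]]
    return sorted(scanned, key=lambda p: math.dist(query, p))[:k]


centroids = [(0.0, 0.0), (10.0, 0.0)]
points = [(4.9, 0.0), (5.4, 0.0), (1.0, 0.0), (9.0, 0.0)]
query = (5.05, 0.0)  # sits just on the right-hand side of the cluster boundary

print(ivf_search(centroids, points, query, probes=1))  # [(5.4, 0.0)]
print(ivf_search(centroids, points, query, probes=2))  # [(4.9, 0.0)]
```

The true nearest neighbor (4.9, 0.0) belongs to the left-hand list, so it is only found once both lists are probed.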
Rule of thumb: Start with probes = sqrt(lists) where lists is the number of clusters in your index. For an index with 100 lists, try probes = 10.
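As a small sketch, the rule of thumb can be written as a helper. The name `starting_probes` is made up for this example, and the result is only a starting point to refine by measuring recall.

```python
import math


def starting_probes(lists):
    """Rule-of-thumb starting value for ivfflat.probes: sqrt(lists), at least 1."""
    return max(1, round(math.sqrt(lists)))


print(starting_probes(100))  # 10
```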
Tutorial: Measuring recall vs speed
This tutorial demonstrates how changing these parameters affects both recall and query speed.
Prerequisites
- Docker installed and running
- A terminal
Step 1: Start PostgreSQL with pgvector
docker run -d \
--name postgres-pgvector-tuning \
-e POSTGRES_PASSWORD=postgres \
-p 5432:5432 \
pgvector/pgvector:pg16
Step 2: Connect and set up test data
docker exec -it postgres-pgvector-tuning psql -U postgres
CREATE EXTENSION IF NOT EXISTS vector;
-- Create a table with 50,000 random 128-dimensional vectors
CREATE TABLE items (
id BIGSERIAL PRIMARY KEY,
embedding vector(128)
);
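INSERT INTO items (embedding)
SELECT
('[' || array_to_string(ARRAY(
SELECT (random())::float4
FROM generate_series(1, 128)
-- Referencing the outer row makes this subquery correlated, so it is
-- re-evaluated for every row. Without it, Postgres computes the array
-- once and all 50,000 rows get the same vector.
WHERE g.i IS NOT NULL
), ',') || ']')::vector(128)
FROM generate_series(1, 50000) AS g(i);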
Step 3: Create both index types
-- HNSW index
CREATE INDEX items_hnsw_idx ON items
USING hnsw (embedding vector_cosine_ops);
-- IVFFlat index with 100 lists
CREATE INDEX items_ivfflat_idx ON items
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
Step 4: Get ground truth (exact results)
First, pick a query vector and find the true 10 nearest neighbors using exact search:
-- Disable index usage to get exact results
SET enable_indexscan = off;
SET enable_bitmapscan = off;
-- Store exact results for comparison
CREATE TEMP TABLE exact_results AS
SELECT id, embedding <=> (SELECT embedding FROM items WHERE id = 1) AS distance
FROM items
WHERE id != 1
ORDER BY distance
LIMIT 10;
-- Re-enable indexes
SET enable_indexscan = on;
SET enable_bitmapscan = on;
SELECT * FROM exact_results;
Step 5: Test HNSW with different ef_search values
-- Force use of HNSW index
DROP INDEX items_ivfflat_idx;
-- Test with default ef_search = 40
SET hnsw.ef_search = 40;
EXPLAIN ANALYZE
SELECT id
FROM items
WHERE id != 1
ORDER BY embedding <=> (SELECT embedding FROM items WHERE id = 1)
LIMIT 10;
-- Check how many exact results we found
SELECT COUNT(*) AS recall_count
FROM (
SELECT id FROM items WHERE id != 1
ORDER BY embedding <=> (SELECT embedding FROM items WHERE id = 1)
LIMIT 10
) approximate
WHERE id IN (SELECT id FROM exact_results);
Now try different values:
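-- Lower ef_search (faster, less accurate)
-- Note: the index returns at most ef_search candidates and the id != 1 filter
-- is applied afterwards, so this query can return fewer than 10 rows,
-- which also caps recall_count.
SET hnsw.ef_search = 10;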
EXPLAIN ANALYZE
SELECT id FROM items WHERE id != 1
ORDER BY embedding <=> (SELECT embedding FROM items WHERE id = 1)
LIMIT 10;
-- Check recall
SELECT COUNT(*) AS recall_count
FROM (
SELECT id FROM items WHERE id != 1
ORDER BY embedding <=> (SELECT embedding FROM items WHERE id = 1)
LIMIT 10
) approximate
WHERE id IN (SELECT id FROM exact_results);
-- Higher ef_search (slower, more accurate)
SET hnsw.ef_search = 200;
EXPLAIN ANALYZE
SELECT id FROM items WHERE id != 1
ORDER BY embedding <=> (SELECT embedding FROM items WHERE id = 1)
LIMIT 10;
-- Check recall
SELECT COUNT(*) AS recall_count
FROM (
SELECT id FROM items WHERE id != 1
ORDER BY embedding <=> (SELECT embedding FROM items WHERE id = 1)
LIMIT 10
) approximate
WHERE id IN (SELECT id FROM exact_results);
Step 6: Test IVFFlat with different probes values
-- Recreate IVFFlat index
CREATE INDEX items_ivfflat_idx ON items
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
-- Drop HNSW to force IVFFlat usage
DROP INDEX items_hnsw_idx;
-- Test with default probes = 1
SET ivfflat.probes = 1;
EXPLAIN ANALYZE
SELECT id FROM items WHERE id != 1
ORDER BY embedding <=> (SELECT embedding FROM items WHERE id = 1)
LIMIT 10;
-- Check recall
SELECT COUNT(*) AS recall_count
FROM (
SELECT id FROM items WHERE id != 1
ORDER BY embedding <=> (SELECT embedding FROM items WHERE id = 1)
LIMIT 10
) approximate
WHERE id IN (SELECT id FROM exact_results);
-- Higher probes
SET ivfflat.probes = 10;
EXPLAIN ANALYZE
SELECT id FROM items WHERE id != 1
ORDER BY embedding <=> (SELECT embedding FROM items WHERE id = 1)
LIMIT 10;
-- Check recall
SELECT COUNT(*) AS recall_count
FROM (
SELECT id FROM items WHERE id != 1
ORDER BY embedding <=> (SELECT embedding FROM items WHERE id = 1)
LIMIT 10
) approximate
WHERE id IN (SELECT id FROM exact_results);
Expected results
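You should see query times increase as you raise the accuracy parameters (the steps above don’t run probes = 50; repeat the Step 6 queries with SET ivfflat.probes = 50 to check that row):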
| Index | Parameter | Relative Speed |
|---|---|---|
| HNSW | ef_search=10 | Fastest |
| HNSW | ef_search=40 (default) | Moderate |
| HNSW | ef_search=200 | Slower |
| IVFFlat | probes=1 (default) | Fastest |
| IVFFlat | probes=10 | Moderate |
| IVFFlat | probes=50 | Slower |
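To fill in a table like this with your own numbers, one option is to paste each EXPLAIN ANALYZE output into a small script that pulls out the reported execution time and pairs it with the recall_count from the matching check query. The sketch below assumes the usual "Execution Time: ... ms" line in the output; the helper names are hypothetical and the sample text is a placeholder, not a measurement.

```python
import re


def execution_time_ms(explain_output):
    """Return the 'Execution Time' reported by EXPLAIN ANALYZE, in milliseconds."""
    match = re.search(r"Execution Time:\s*([0-9.]+)\s*ms", explain_output)
    if match is None:
        raise ValueError("no 'Execution Time' line found")
    return float(match.group(1))


def summarize(label, explain_output, recall_count, k=10):
    """Combine one timing and one recall_count into a single result row."""
    return {
        "setting": label,
        "time_ms": execution_time_ms(explain_output),
        "recall": recall_count / k,
    }


# Placeholder text, not a real measurement: paste your own psql output here.
sample = """Limit  (...)
Planning Time: 0.100 ms
Execution Time: 1.000 ms"""

print(summarize("hnsw.ef_search=40", sample, recall_count=9))
```

A single run is noisy, so it helps to repeat each query a few times and compare medians rather than individual timings.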
Note on recall with random data: The tutorial uses random vectors for simplicity. Uniformly random high-dimensional vectors have no cluster structure and their distances to any query fall in a narrow band, so the recall numbers you measure here won’t be representative of a real workload. With real embeddings (which have structure and clusters), you’ll see clearer recall improvements as you increase ef_search or probes.
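The narrow spread of distances in random data can be checked without a database. The sketch below draws uniformly random 128-dimensional vectors, as the tutorial does, and summarizes their cosine distances to one query vector. The sample size of 500 and the fixed seed are arbitrary choices for the example.

```python
import math
import random
import statistics


def cosine_distance(a, b):
    """1 minus cosine similarity, the quantity pgvector's <=> operator computes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


random.seed(0)  # fixed seed so repeated runs give the same numbers
dim, n = 128, 500
query = [random.random() for _ in range(dim)]
distances = [
    cosine_distance(query, [random.random() for _ in range(dim)])
    for _ in range(n)
]

mean = statistics.mean(distances)
spread = statistics.stdev(distances) / mean  # relative spread around the mean
print(f"mean={mean:.3f} min={min(distances):.3f} "
      f"max={max(distances):.3f} stdev/mean={spread:.3f}")
```

A small stdev/mean ratio means the nearest and farthest points are not far apart in relative terms, which is why rankings on this data behave differently from rankings on clustered embeddings.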
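As a rough guide with real embeddings (actual figures depend on your data, dimensionality, and index build settings, so measure on your own workload):
- HNSW: often 95%+ recall at default settings, 99%+ with ef_search=200
- IVFFlat: often 60-80% recall with probes=1, 95%+ with probes=10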
Setting parameters for your session
You can set these parameters at different scopes:
-- Session level (resets on disconnect)
SET hnsw.ef_search = 100;
-- Transaction level (only takes effect inside BEGIN ... COMMIT; resets after commit/rollback)
SET LOCAL hnsw.ef_search = 100;
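-- For a single query using a function
-- The SET clause on the function applies only while the function runs and is
-- restored on exit. A SET LOCAL inside the body would instead persist until
-- the end of the surrounding transaction.
CREATE OR REPLACE FUNCTION search_high_recall(query_embedding vector(128))
RETURNS TABLE(id bigint, distance float)
LANGUAGE plpgsql
SET hnsw.ef_search = 200
AS $$
BEGIN
RETURN QUERY
SELECT items.id, items.embedding <=> query_embedding AS distance
FROM items
ORDER BY items.embedding <=> query_embedding
LIMIT 10;
END;
$$;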
Wrapping up
pgvector’s approximate indexes trade accuracy for speed, but you control the tradeoff:
- HNSW: Increase ef_search for better recall (default 40, try 100-200 for critical searches)
- IVFFlat: Increase probes relative to your list count (start with sqrt(lists))
- Set parameters at session or transaction level to tune per-query
For many semantic search applications, HNSW’s default is a reasonable starting point, while IVFFlat’s default of probes = 1 usually needs raising. Tune these parameters further when you need higher recall for specific use cases like deduplication or exact matching.
Cleanup
docker stop postgres-pgvector-tuning
docker rm postgres-pgvector-tuning