Tuning pgvector Query Accuracy

Improve pgvector search accuracy by tuning HNSW ef_search and IVFFlat probes — with a tutorial measuring recall vs. speed tradeoffs.

This post was written by an engineer at QueryPlane. QueryPlane is an app builder for your database: bring your own postgres db and you can create interactive applications to share with other developers, coworkers or even your customers.

pgvector’s HNSW and IVFFlat indexes are approximate nearest neighbor (ANN) algorithms. They trade some accuracy for speed—out of the box, you might be missing some of the most relevant results. The good news: both indexes expose parameters that let you tune this tradeoff at query time without rebuilding anything.

In this post, we’ll cover:

Understanding recall - What it means and why it matters
HNSW ef_search parameter - Tuning candidate list size during search
IVFFlat probes parameter - Controlling how many clusters to search
Tutorial - Measuring recall vs speed with different parameter values

Understanding recall

Recall measures what percentage of true nearest neighbors your search actually finds. If the 10 closest vectors to your query are A, B, C, D, E, F, G, H, I, J, and your search returns A, B, C, D, E, F, G, X, Y, Z, your recall@10 is 70%.

recall@10 = 100%: Perfect accuracy, all 10 true neighbors found
recall@10 = 95%: On average, half a result missed per query (one miss every two queries)
recall@10 = 80%: 2 of the 10 true neighbors missed

For semantic search, 95%+ recall is usually sufficient. For applications like deduplication or exact matching, you might need higher.
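
To make the arithmetic concrete, here is a minimal SQL sketch of the recall@10 calculation for the example above. The integer IDs are hypothetical stand-ins for the letters (1 to 10 for A through J, and 24 to 26 for X, Y, Z). Nothing here requires pgvector.

```sql
-- exact: the 10 true nearest neighbors (A..J as 1..10)
-- approximate: what the search returned (A..G plus X, Y, Z as 24..26)
WITH exact(id) AS (
  VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10)
),
approximate(id) AS (
  VALUES (1), (2), (3), (4), (5), (6), (7), (24), (25), (26)
)
SELECT count(*)::float / (SELECT count(*) FROM exact) AS recall_at_10
FROM approximate
WHERE id IN (SELECT id FROM exact);
-- returns 0.7, i.e. 70%
```

The tutorial below uses the same idea, comparing an approximate result set against a stored exact one.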

HNSW: The ef_search parameter

HNSW (Hierarchical Navigable Small World) maintains a multi-layer graph. During search, it traverses the graph starting from the top layer, keeping track of the best candidates found. The ef_search parameter controls how many candidates to track.

Default value: 40

-- Check current value
SHOW hnsw.ef_search;

-- Increase for better recall (at the cost of speed)
SET hnsw.ef_search = 100;

-- Decrease for faster queries (at the cost of recall)
SET hnsw.ef_search = 20;

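Raising ef_search gives diminishing returns: each doubling improves recall by less than the previous one, while query time keeps growing because more candidates are examined. Doubling ef_search doesn’t double your recall, but it can come close to doubling query time.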

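Important constraint: ef_search caps the number of rows an HNSW index scan can return, so set it at least as large as the number of results you’re requesting. If you set ef_search = 40 but ask for LIMIT 50, you’ll get at most 40 rows. (pgvector 0.8.0 and later can keep scanning past this cap via the hnsw.iterative_scan setting, which is off by default.)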
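
As a sketch of how to work within this limit, raise ef_search before a larger query and reset it afterwards. This assumes the items table and HNSW index created in the tutorial later in this post. The value 100 is only an illustration.

```sql
-- Asking for 50 rows, so give the search at least 50 candidates to work with
SET hnsw.ef_search = 100;

SELECT id
FROM items
ORDER BY embedding <=> (SELECT embedding FROM items WHERE id = 1)
LIMIT 50;

-- Return to the default for the rest of the session
RESET hnsw.ef_search;
```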

IVFFlat: The probes parameter

IVFFlat (an inverted file index that stores vectors “flat”, meaning uncompressed) divides vectors into clusters (called “lists”). During search, it checks only the clusters closest to your query vector. The probes parameter controls how many clusters to search.

Default value: 1

-- Check current value
SHOW ivfflat.probes;

-- Increase for better recall
SET ivfflat.probes = 10;

With probes = 1, IVFFlat only searches the single closest cluster. If your query vector sits near a cluster boundary, the true nearest neighbors might be in an adjacent cluster that never gets searched.

Rule of thumb: Start with probes = sqrt(lists) where lists is the number of clusters in your index. For an index with 100 lists, try probes = 10.
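
If you don't remember how many lists an index was built with, one way to look it up is through the system catalogs. The sketch below assumes an index named items_ivfflat_idx (the name used in the tutorial later in this post) that was created with an explicit lists option. If lists was left at its default, the option is not stored and the query returns no rows.

```sql
-- Read the stored "lists" option and derive a starting value for probes
SELECT c.relname                        AS index_name,
       opt.value::int                   AS lists,
       ceil(sqrt(opt.value::int))::int  AS suggested_probes
FROM pg_class AS c
CROSS JOIN LATERAL pg_options_to_table(c.reloptions) AS opt(name, value)
WHERE c.relname = 'items_ivfflat_idx'
  AND opt.name = 'lists';
```

Treat suggested_probes as a starting point and adjust it after measuring recall on your own queries.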

Tutorial: Measuring recall vs speed

This tutorial demonstrates how changing these parameters affects both recall and query speed.

Prerequisites

Docker installed and running
A terminal

Step 1: Start PostgreSQL with pgvector

docker run -d \
  --name postgres-pgvector-tuning \
  -e POSTGRES_PASSWORD=postgres \
  -p 5432:5432 \
  pgvector/pgvector:pg16

Step 2: Connect and set up test data

docker exec -it postgres-pgvector-tuning psql -U postgres

CREATE EXTENSION IF NOT EXISTS vector;

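-- Create a table with 50,000 random 128-dimensional vectors
CREATE TABLE items (
  id BIGSERIAL PRIMARY KEY,
  embedding vector(128)
);

-- Generate one random value per (row, dimension) pair, then aggregate per row.
-- An uncorrelated ARRAY(SELECT random() ...) subquery would be evaluated only
-- once, which would give every row the same vector.
INSERT INTO items (embedding)
SELECT array_agg(random()::float4)::vector(128)
FROM generate_series(1, 50000) AS row_id,
     generate_series(1, 128) AS dim
GROUP BY row_id;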
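
Before building indexes, it can be worth confirming that the generated data looks as intended. This is a quick sanity check, assuming the items table above. The cast to text is only there to make the distinct count independent of which comparison operators are available.

```sql
SELECT count(*)                        AS total_rows,
       count(DISTINCT embedding::text) AS distinct_vectors,
       min(vector_dims(embedding))     AS dims
FROM items;
```

distinct_vectors should equal total_rows. A value of 1 would mean the random values were generated once and reused for every row, which makes any nearest-neighbor comparison meaningless.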

Step 3: Create both index types

-- HNSW index
CREATE INDEX items_hnsw_idx ON items
USING hnsw (embedding vector_cosine_ops);

-- IVFFlat index with 100 lists
CREATE INDEX items_ivfflat_idx ON items
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);

Step 4: Get ground truth (exact results)

First, pick a query vector and find its 10 true nearest neighbors using exact search:

-- Disable index usage to get exact results
SET enable_indexscan = off;
SET enable_bitmapscan = off;

-- Store exact results for comparison
CREATE TEMP TABLE exact_results AS
SELECT id, embedding <=> (SELECT embedding FROM items WHERE id = 1) AS distance
FROM items
WHERE id != 1
ORDER BY distance
LIMIT 10;

-- Re-enable indexes
SET enable_indexscan = on;
SET enable_bitmapscan = on;

SELECT * FROM exact_results;

Step 5: Test HNSW with different ef_search values

-- Force use of HNSW index
DROP INDEX items_ivfflat_idx;

-- Test with default ef_search = 40
SET hnsw.ef_search = 40;
EXPLAIN ANALYZE
SELECT id
FROM items
WHERE id != 1
ORDER BY embedding <=> (SELECT embedding FROM items WHERE id = 1)
LIMIT 10;

-- Check how many exact results we found
SELECT COUNT(*) AS recall_count
FROM (
  SELECT id FROM items WHERE id != 1
  ORDER BY embedding <=> (SELECT embedding FROM items WHERE id = 1)
  LIMIT 10
) approximate
WHERE id IN (SELECT id FROM exact_results);

Now try different values:

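-- Lower ef_search (faster, less accurate)
-- The index yields at most ef_search candidates before the id != 1 filter
-- is applied, so this query can return fewer than 10 rows.
SET hnsw.ef_search = 10;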
EXPLAIN ANALYZE
SELECT id FROM items WHERE id != 1
ORDER BY embedding <=> (SELECT embedding FROM items WHERE id = 1)
LIMIT 10;

-- Check recall
SELECT COUNT(*) AS recall_count
FROM (
  SELECT id FROM items WHERE id != 1
  ORDER BY embedding <=> (SELECT embedding FROM items WHERE id = 1)
  LIMIT 10
) approximate
WHERE id IN (SELECT id FROM exact_results);

-- Higher ef_search (slower, more accurate)
SET hnsw.ef_search = 200;
EXPLAIN ANALYZE
SELECT id FROM items WHERE id != 1
ORDER BY embedding <=> (SELECT embedding FROM items WHERE id = 1)
LIMIT 10;

-- Check recall
SELECT COUNT(*) AS recall_count
FROM (
  SELECT id FROM items WHERE id != 1
  ORDER BY embedding <=> (SELECT embedding FROM items WHERE id = 1)
  LIMIT 10
) approximate
WHERE id IN (SELECT id FROM exact_results);
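
The repeated set-then-check pattern above can also be run in one pass. The following is a sketch, assuming the items table, the HNSW index, and the exact_results temp table from the previous steps. The list of ef_search values is arbitrary.

```sql
DO $$
DECLARE
  ef   int;
  hits int;
BEGIN
  FOREACH ef IN ARRAY ARRAY[10, 40, 100, 200] LOOP
    -- Third argument true = local to the current transaction, like SET LOCAL
    PERFORM set_config('hnsw.ef_search', ef::text, true);

    -- Same recall check as above: how many exact neighbors were found
    SELECT count(*) INTO hits
    FROM (
      SELECT id FROM items WHERE id != 1
      ORDER BY embedding <=> (SELECT embedding FROM items WHERE id = 1)
      LIMIT 10
    ) approximate
    WHERE id IN (SELECT id FROM exact_results);

    RAISE NOTICE 'ef_search=%  recall@10=%', ef, round(hits / 10.0, 2);
  END LOOP;
END;
$$;
```

This reports recall only. Keep using EXPLAIN ANALYZE as shown above to confirm the index is actually used and to read timings.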

Step 6: Test IVFFlat with different probes values

-- Recreate IVFFlat index
CREATE INDEX items_ivfflat_idx ON items
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);

-- Drop HNSW to force IVFFlat usage
DROP INDEX items_hnsw_idx;

-- Test with default probes = 1
SET ivfflat.probes = 1;
EXPLAIN ANALYZE
SELECT id FROM items WHERE id != 1
ORDER BY embedding <=> (SELECT embedding FROM items WHERE id = 1)
LIMIT 10;

-- Check recall
SELECT COUNT(*) AS recall_count
FROM (
  SELECT id FROM items WHERE id != 1
  ORDER BY embedding <=> (SELECT embedding FROM items WHERE id = 1)
  LIMIT 10
) approximate
WHERE id IN (SELECT id FROM exact_results);

-- Higher probes
SET ivfflat.probes = 10;
EXPLAIN ANALYZE
SELECT id FROM items WHERE id != 1
ORDER BY embedding <=> (SELECT embedding FROM items WHERE id = 1)
LIMIT 10;

-- Check recall
SELECT COUNT(*) AS recall_count
FROM (
  SELECT id FROM items WHERE id != 1
  ORDER BY embedding <=> (SELECT embedding FROM items WHERE id = 1)
  LIMIT 10
) approximate
WHERE id IN (SELECT id FROM exact_results);
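
To compare several probes values side by side, including a higher one such as 50, a loop like the following may help. It is a sketch under the same assumptions as the step above (items table, IVFFlat index, exact_results temp table). The elapsed time is a single wall-clock measurement that also includes planning on the first iteration, so treat it as indicative only.

```sql
DO $$
DECLARE
  p       int;
  hits    int;
  started timestamptz;
BEGIN
  FOREACH p IN ARRAY ARRAY[1, 10, 50] LOOP
    -- Third argument true = local to the current transaction, like SET LOCAL
    PERFORM set_config('ivfflat.probes', p::text, true);

    started := clock_timestamp();

    SELECT count(*) INTO hits
    FROM (
      SELECT id FROM items WHERE id != 1
      ORDER BY embedding <=> (SELECT embedding FROM items WHERE id = 1)
      LIMIT 10
    ) approximate
    WHERE id IN (SELECT id FROM exact_results);

    RAISE NOTICE 'probes=%  hits=%/10  elapsed=%',
      p, hits, clock_timestamp() - started;
  END LOOP;
END;
$$;
```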

Expected results

You should see query times increase as you raise the accuracy parameters (for the probes=50 row, rerun the Step 6 queries after SET ivfflat.probes = 50):

Index	Parameter	Relative Speed
HNSW	ef_search=10	Fastest
HNSW	ef_search=40 (default)	Moderate
HNSW	ef_search=200	Slower
IVFFlat	probes=1 (default)	Fastest
IVFFlat	probes=10	Moderate
IVFFlat	probes=50	Slower

Note on recall with random data: The tutorial uses random vectors for simplicity. With uniformly random vectors, all points are roughly equidistant, so recall measurements won’t be meaningful. With real embeddings (which have structure and clusters), you’ll see clear recall improvements as you increase ef_search or probes.

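As a rough guide for production workloads with real embeddings (actual figures depend on your data, dimensionality, and index build settings, so measure recall on your own dataset):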

HNSW: 95%+ recall at default settings, 99%+ with ef_search=200
IVFFlat: 60-80% recall with probes=1, 95%+ with probes=10

Setting parameters for your session

You can set these parameters at different scopes:

-- Session level (resets on disconnect)
SET hnsw.ef_search = 100;

-- Transaction level (only takes effect inside BEGIN ... COMMIT; resets after commit/rollback)
SET LOCAL hnsw.ef_search = 100;

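-- For a single query using a function.
-- The function-level SET clause applies the value only while the function
-- runs and restores the previous value on exit. A SET LOCAL inside the body
-- would instead last until the end of the calling transaction.
CREATE OR REPLACE FUNCTION search_high_recall(query_embedding vector(128))
RETURNS TABLE(id bigint, distance float) AS $$
BEGIN
  RETURN QUERY
  SELECT items.id, items.embedding <=> query_embedding AS distance
  FROM items
  ORDER BY items.embedding <=> query_embedding
  LIMIT 10;
END;
$$ LANGUAGE plpgsql
SET hnsw.ef_search = 200;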
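
A usage sketch for the function above, assuming the items table from the tutorial still exists with an HNSW index on it. Because the function does not exclude the query row, the row with id = 1 is expected to appear as its own closest match.

```sql
-- Use an existing row's embedding as the query vector
SELECT *
FROM search_high_recall((SELECT embedding FROM items WHERE id = 1));

-- Inspect the session-level value after the call
SHOW hnsw.ef_search;
```

Checking the setting after the call is a simple way to confirm that the higher value did not leak into the rest of your session.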

Wrapping up

pgvector’s approximate indexes trade accuracy for speed, but you control the tradeoff:

HNSW: Increase ef_search for better recall (default 40, try 100-200 for critical searches)
IVFFlat: Increase probes relative to your list count (start with sqrt(lists))
Set parameters at session or transaction level to tune per-query

For many semantic search applications, HNSW’s defaults provide good results, while IVFFlat’s default of probes = 1 is more likely to need raising. Tune these parameters further when you need higher recall for specific use cases like deduplication or exact matching.

Cleanup

docker stop postgres-pgvector-tuning
docker rm postgres-pgvector-tuning