Scaling Vector Search with pgvectorscale

Scale pgvector beyond millions of rows with pgvectorscale's DiskANN index — 28x lower latency and 16x higher throughput vs. Pinecone.

This post was written by an engineer at QueryPlane, an app builder for your database: bring your own PostgreSQL database and create interactive applications to share with other developers, coworkers, or even your customers.

pgvector’s built-in indexes (HNSW and IVFFlat) work well for datasets up to a few million vectors. Beyond that, query latencies start creeping up and index builds take hours. pgvectorscale is an extension from Timescale that addresses this with a new index type based on Microsoft’s DiskANN algorithm.

On Timescale’s benchmark of 50 million Cohere embeddings, PostgreSQL with pgvector + pgvectorscale achieves 28x lower p95 latency and 16x higher throughput compared to Pinecone’s storage-optimized index at 99% recall.

In this post, we’ll cover:

How DiskANN differs from HNSW - Why disk-based indexing scales better
When to use pgvectorscale - Dataset size thresholds and tradeoffs
Additional features - SBQ compression, streaming inserts, filtered search
Tutorial - Setting up pgvectorscale with Docker and running vector searches

How DiskANN differs from HNSW

HNSW (Hierarchical Navigable Small World) builds a multi-layer graph structure where each layer contains progressively fewer nodes. Search starts at the sparse top layer and works down to the dense bottom layer. The graph has to fit in memory for searches to stay fast, which becomes expensive at scale.
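To get a feel for why memory becomes the bottleneck, here is a back-of-the-envelope estimate in plain SQL. It assumes 768-dimensional float4 embeddings (an assumption for illustration; the dimension depends on your embedding model) and counts only the raw vector data, before any graph links or page overhead:

```sql
-- Rough size of the raw vector data for 50 million embeddings.
-- Assumption: 768 dimensions at 4 bytes (float4) each. Graph links and
-- page overhead are not included, so a real index is larger than this.
SELECT pg_size_pretty(50000000::bigint * 768 * 4) AS raw_vector_size;
```

That works out to roughly 143 GB of vector data alone (50,000,000 × 768 × 4 bytes), which is a lot of RAM to dedicate to a single index.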

DiskANN takes a different approach. It builds a single-layer graph optimized for SSD storage, using a technique called “graph pruning” to limit the number of edges per node. The algorithm was developed by Microsoft Research and can handle billion-scale datasets on commodity hardware.

pgvectorscale implements a streaming variant called StreamingDiskANN that supports real-time inserts and updates. The original DiskANN algorithm required rebuilding the entire index to incorporate new data.

When to use pgvectorscale

pgvectorscale makes sense when:

Your dataset exceeds 5-10 million vectors
Index memory consumption is a concern
You need consistent query latencies at scale
You’re already using TimescaleDB or Timescale Cloud

For smaller datasets, pgvector’s HNSW index is simpler and performs well. pgvectorscale adds operational complexity and a larger Docker image (~4GB vs ~500MB for pgvector alone).

Additional features

Beyond the DiskANN index, pgvectorscale includes:

Statistical Binary Quantization (SBQ): Compresses vectors for reduced storage and faster search
Streaming inserts: Add vectors without rebuilding the index
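Label-based filtered search: Applies label filters during the index traversal itself (based on Microsoft's Filtered DiskANN work) instead of discarding non-matching rows after the vector search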
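
As a rough sketch of what label-based filtering looks like in practice, the example below follows the pattern from the pgvectorscale README as I understand it: labels live in a smallint[] column, that column is added to the diskann index, and queries filter with the array-overlap operator (&&). The labeled_documents table and the label values are hypothetical, and this assumes a pgvectorscale release recent enough to include label filtering, so check the README for your version.

```sql
-- The extension must be enabled first (CASCADE also installs pgvector).
CREATE EXTENSION IF NOT EXISTS vectorscale CASCADE;

-- Hypothetical table: each row carries a vector plus an array of label ids.
CREATE TABLE labeled_documents (
  id BIGSERIAL PRIMARY KEY,
  embedding vector(384),
  labels SMALLINT[]
);

-- Include the labels column in the index so the filter can be applied
-- while the graph is being searched.
CREATE INDEX labeled_documents_embedding_idx ON labeled_documents
USING diskann (embedding vector_cosine_ops, labels);

-- Nearest neighbors among rows tagged with label 1 OR label 3.
-- The explicit cast keeps both sides of && the same array type.
SELECT id
FROM labeled_documents
WHERE labels && ARRAY[1, 3]::smallint[]
ORDER BY embedding <=> (SELECT embedding FROM labeled_documents WHERE id = 1)
LIMIT 10;
```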

Tutorial: Setting up pgvectorscale

This tutorial walks through spinning up PostgreSQL with pgvectorscale and running vector search queries.

Prerequisites

Docker installed and running
A terminal

Step 1: Start PostgreSQL with pgvectorscale

The easiest way to run pgvectorscale is using Timescale’s HA Docker image, which includes both pgvector and pgvectorscale pre-installed:

docker run -d \
  --name postgres-vectorscale \
  -e POSTGRES_PASSWORD=postgres \
  -p 5432:5432 \
  timescale/timescaledb-ha:pg16

At roughly 4GB, this image is much larger than the standard pgvector image, but it includes everything you need.

Step 2: Connect and enable extensions

Connect to the database:

docker exec -it postgres-vectorscale psql -U postgres

Enable the extensions:

-- pgvectorscale depends on pgvector, CASCADE installs it automatically
CREATE EXTENSION IF NOT EXISTS vectorscale CASCADE;

Verify the extensions are installed:

SELECT extname, extversion FROM pg_extension
WHERE extname IN ('vector', 'vectorscale');

Step 3: Create a table with vectors

CREATE TABLE documents (
  id BIGSERIAL PRIMARY KEY,
  title TEXT NOT NULL,
  embedding vector(384)  -- Using 384 dimensions for this example
);

Step 4: Insert sample data

For this tutorial, we’ll insert some random vectors. In production, these would be embeddings from your embedding model:

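-- Insert 10,000 random vectors for testing
INSERT INTO documents (title, embedding)
SELECT
  'Document ' || i,
  ('[' || array_to_string(ARRAY(
    SELECT (random() * 2 - 1)::float4
    FROM generate_series(1, 384)
    -- Referencing the outer row (i) makes PostgreSQL re-run this subquery
    -- for every document; without it, all rows get the same vector
    WHERE i IS NOT NULL
  ), ',') || ']')::vector(384)
FROM generate_series(1, 10000) i;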

Step 5: Create a StreamingDiskANN index

CREATE INDEX documents_embedding_idx ON documents
USING diskann (embedding vector_cosine_ops);

The diskann index type is provided by pgvectorscale. It supports the following pgvector operator classes:

vector_cosine_ops for cosine distance (<=>)
vector_l2_ops for Euclidean distance (<->)
vector_ip_ops for inner product (<#>)
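
The index can also be tuned at build time through a WITH clause. The sketch below rebuilds the tutorial index with two of the build parameters listed in the pgvectorscale README, num_neighbors and search_list_size. The values shown are what I understand to be the defaults, so treat them as a starting point and confirm them against the README for your version:

```sql
-- Rebuild the tutorial index with explicit build-time parameters.
DROP INDEX IF EXISTS documents_embedding_idx;

CREATE INDEX documents_embedding_idx ON documents
USING diskann (embedding vector_cosine_ops)
WITH (
  num_neighbors = 50,      -- maximum number of graph edges per node
  search_list_size = 100   -- candidate list size used while building
);
```

Larger values generally trade slower builds for better accuracy, so it is worth measuring on your own data before changing them.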

Step 6: Run a similarity search

-- Create a random query vector
WITH query AS (
  SELECT ('[' || array_to_string(ARRAY(
    SELECT (random() * 2 - 1)::float4
    FROM generate_series(1, 384)
  ), ',') || ']')::vector(384) AS embedding
)
SELECT
  d.id,
  d.title,
  1 - (d.embedding <=> q.embedding) AS similarity
FROM documents d, query q
ORDER BY d.embedding <=> q.embedding
LIMIT 5;
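
If results are not accurate enough, pgvectorscale exposes query-time settings that trade speed for recall. The following is a minimal sketch using the two settings described in the pgvectorscale README, diskann.query_rescore and diskann.query_search_list_size. The specific values are arbitrary examples, not recommendations:

```sql
-- Rescore more candidates against the full-precision vectors.
-- Usually better recall, at the cost of slower queries.
SET diskann.query_rescore = 400;

-- Alternatively, scope a setting to one transaction with SET LOCAL
-- so it does not leak into the rest of the session.
BEGIN;
SET LOCAL diskann.query_search_list_size = 200;
SELECT id, title
FROM documents
ORDER BY embedding <=> (SELECT embedding FROM documents WHERE id = 1)
LIMIT 5;
COMMIT;
```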

Step 7: Verify the index is being used

EXPLAIN ANALYZE
SELECT id, title
FROM documents
ORDER BY embedding <=> (SELECT embedding FROM documents WHERE id = 1)
LIMIT 5;

Look for Index Scan using documents_embedding_idx in the output.
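
Once the index is in use, it can be worth checking how close its approximate results are to an exact search. One way to sketch this, using only standard PostgreSQL planner settings, is to run the same query with index scans discouraged and compare the two result sets. The temporary table names exact_top10 and approx_top10 are made up for this example, and the planner is free to choose a sequential scan for the second query as well, so confirm with EXPLAIN that the index was actually used:

```sql
-- Exact top 10: discourage index and bitmap scans for this transaction
-- only, so the query falls back to scanning and sorting every row.
BEGIN;
SET LOCAL enable_indexscan = off;
SET LOCAL enable_bitmapscan = off;
CREATE TEMP TABLE exact_top10 AS
SELECT id
FROM documents
ORDER BY embedding <=> (SELECT embedding FROM documents WHERE id = 1)
LIMIT 10;
COMMIT;

-- Approximate top 10: the same query with normal planner settings.
CREATE TEMP TABLE approx_top10 AS
SELECT id
FROM documents
ORDER BY embedding <=> (SELECT embedding FROM documents WHERE id = 1)
LIMIT 10;

-- Fraction of the exact neighbors that the index also returned.
SELECT count(*) / 10.0 AS recall_at_10
FROM exact_top10
JOIN approx_top10 USING (id);
```

A single query vector gives a noisy number, so average over several query rows before drawing conclusions.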

Comparing with HNSW

To see the difference, create an HNSW index on the same data:

CREATE INDEX documents_hnsw_idx ON documents
USING hnsw (embedding vector_cosine_ops);

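Run the same query and compare execution times in EXPLAIN ANALYZE. With both indexes in place, the planner picks whichever it estimates to be cheaper, so check which index name appears in the output. To time each index in isolation, drop the other one first (for example, DROP INDEX documents_embedding_idx;) and rerun the query. For 10,000 vectors, HNSW will likely be faster. The advantage of DiskANN becomes apparent at larger scales (millions of vectors) where memory constraints matter.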
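
Execution time is only part of the picture, since the argument for DiskANN is largely about memory and storage. As a small sketch, the standard pg_stat_user_indexes view can show how large each index on the table is and how often the planner has used it:

```sql
-- Size and usage count of every index on the documents table.
SELECT
  indexrelname AS index_name,
  pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
  idx_scan AS scans_so_far
FROM pg_stat_user_indexes
WHERE relname = 'documents'
ORDER BY indexrelname;
```

The scan counters come from PostgreSQL's statistics system and can lag slightly behind the most recent queries.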

Wrapping up

pgvectorscale extends pgvector with DiskANN-based indexing for large-scale vector search. The key takeaways:

Use pgvectorscale when your dataset exceeds 5-10 million vectors or memory is constrained
The StreamingDiskANN index supports real-time inserts without full rebuilds
For smaller datasets, pgvector’s built-in HNSW index remains simpler and often faster

Cleanup

docker stop postgres-vectorscale
docker rm postgres-vectorscale