Using pgvector halfvec in Python
Use pgvector's halfvec type in Python with psycopg2, psycopg3, and SQLAlchemy — save 50% storage with minimal accuracy loss.
This post was written by an engineer at QueryPlane. QueryPlane is an app builder for your database: bring your own postgres db and you can create interactive applications to share with other developers, coworkers or even your customers. If you’re interested in trying it out, get started here.
pgvector’s halfvec type stores vectors in 16-bit floats instead of 32-bit, cutting storage and index size in half with negligible impact on search quality. The Python ecosystem supports halfvec through the pgvector Python package created by Andrew Kane, which works with psycopg2, psycopg3, and SQLAlchemy. This guide covers how to use halfvec from Python with each of these drivers.
In this post, we’ll cover:
- Storage savings: Real benchmarks comparing vector vs halfvec
- psycopg2 integration: Using HalfVector with the classic driver
- psycopg3 integration: Using HalfVector with the modern driver
- SQLAlchemy integration: Defining halfvec columns in ORM models
- Precision considerations: What 16-bit floats mean for your data
Why halfvec saves you half your storage
A vector(1536) column (the dimension OpenAI’s text-embedding-ada-002 produces) takes ~6KB per row. With halfvec(1536), that drops to ~3KB. In our test, this translated to roughly 50% savings on both table and index storage. Here’s what we measured with 10,000 1536-dimensional vectors:
| | vector | halfvec | Savings |
|---|---|---|---|
| Table size | 80 MB | 40 MB | 50% |
| HNSW index size | 78 MB | 39 MB | 50% |
| Index build time | 9.1s | 5.5s | 40% |
The savings scale linearly with row count. Extrapolating from the table above, 10 million vectors would save ~40GB of table storage and a similar amount of index storage, which also means less RAM is needed to keep the index cached. For a deeper look at when and why to use halfvec, see our Half-Precision Vectors with pgvector post.
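The per-row figures above follow from pgvector's documented storage formulas: a vector takes 4 × dimensions + 8 bytes, and a halfvec takes 2 × dimensions + 8 bytes. Here is a minimal sketch of that arithmetic. The function names are hypothetical, and the numbers cover only the vector payload, not row headers, page overhead, or TOAST.

```python
def vector_bytes(dimensions: int) -> int:
    """Payload size of a pgvector `vector` value: 4-byte floats plus an 8-byte header."""
    return 4 * dimensions + 8


def halfvec_bytes(dimensions: int) -> int:
    """Payload size of a pgvector `halfvec` value: 2-byte floats plus an 8-byte header."""
    return 2 * dimensions + 8


dims = 1536
full = vector_bytes(dims)    # 6152 bytes, i.e. ~6KB
half = halfvec_bytes(dims)   # 3080 bytes, i.e. ~3KB
savings = 1 - half / full

print(f"vector({dims}):  {full} bytes")
print(f"halfvec({dims}): {half} bytes")
print(f"payload savings: {savings:.1%}")  # 49.9%
```

The fixed 8-byte header is why the payload saving is just under 50% rather than exactly half. How that maps to on-disk table size also depends on how many rows fit on each page.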
Setup
Install the pgvector Python package alongside your preferred PostgreSQL driver:
# With psycopg2
pip install pgvector psycopg2-binary
# With psycopg3
pip install pgvector "psycopg[binary]"
# With SQLAlchemy
pip install pgvector sqlalchemy psycopg2-binary
Then create a halfvec column in PostgreSQL:
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE items (
    id BIGSERIAL PRIMARY KEY,
    embedding halfvec(1536)
);
CREATE INDEX ON items
    USING hnsw (embedding halfvec_cosine_ops);
Note the operator class: halfvec_cosine_ops instead of vector_cosine_ops. This is a common mistake: using a vector operator class on a halfvec column makes the CREATE INDEX statement fail. The halfvec operator classes for the three most common distance metrics:
| Distance | Operator Class |
|---|---|
| Cosine | halfvec_cosine_ops |
| L2/Euclidean | halfvec_l2_ops |
| Inner product | halfvec_ip_ops |
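To keep the operator class and the query operator in sync, one option is to keep the mapping in a single place. The sketch below is a hypothetical helper, not part of the pgvector package. The operator classes come from the table above, and the operators (<=>, <->, <#>) are pgvector's standard distance operators. Table and column names are interpolated directly, so it assumes they come from trusted code rather than user input.

```python
# Metric name -> (halfvec operator class for the index, operator for queries).
HALFVEC_OPS = {
    "cosine": ("halfvec_cosine_ops", "<=>"),
    "l2": ("halfvec_l2_ops", "<->"),
    "inner_product": ("halfvec_ip_ops", "<#>"),
}


def hnsw_index_sql(table: str, column: str, metric: str) -> str:
    """Build a CREATE INDEX statement for a halfvec column (hypothetical helper)."""
    if metric not in HALFVEC_OPS:
        raise ValueError(f"unknown metric: {metric!r}")
    opclass, _operator = HALFVEC_OPS[metric]
    return f"CREATE INDEX ON {table} USING hnsw ({column} {opclass});"


print(hnsw_index_sql("items", "embedding", "cosine"))
# CREATE INDEX ON items USING hnsw (embedding halfvec_cosine_ops);
```

Note that <#> returns the negative inner product, so ordering by it ascending puts the largest inner products first.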
Using halfvec with psycopg2
The pgvector package provides a HalfVector class and a register_vector function that teaches psycopg2 how to serialize and deserialize halfvec values.
import psycopg2
from pgvector.psycopg2 import register_vector, HalfVector
conn = psycopg2.connect("postgresql://localhost/mydb")
register_vector(conn)
cur = conn.cursor()
# Insert using HalfVector
# (3 values for brevity; the length must match the column's declared
# dimension, which is 1536 for the items table created above)
embedding = HalfVector([0.1, 0.2, 0.3])
cur.execute(
    "INSERT INTO items (embedding) VALUES (%s)",
    (embedding,)
)
# Insert from a plain Python list (cast in SQL)
embedding_list = [0.1, 0.2, 0.3]
cur.execute(
    "INSERT INTO items (embedding) VALUES (%s::halfvec)",
    (str(embedding_list).replace(' ', ''),)
)
conn.commit()
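The str(...).replace(...) call in the second insert builds pgvector's text format, a bracketed comma-separated list such as [0.1,0.2,0.3]. If you use that pattern in several places, a small formatter makes the intent explicit. This is a sketch with a hypothetical helper name, not something the pgvector package provides.

```python
def to_halfvec_literal(values) -> str:
    """Format a sequence of numbers as pgvector text input, e.g. '[0.1,0.2,0.3]'."""
    return "[" + ",".join(str(float(v)) for v in values) + "]"


literal = to_halfvec_literal([0.1, 0.2, 0.3])
print(literal)  # [0.1,0.2,0.3]

# The result is passed as an ordinary string parameter and cast in SQL:
#   cur.execute("INSERT INTO items (embedding) VALUES (%s::halfvec)", (literal,))
```

Going through float() means the helper accepts any iterable of numbers, including ints and numpy scalars.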
When you read halfvec values back, they come as HalfVector objects:
cur.execute("SELECT id, embedding FROM items ORDER BY id LIMIT 3")
for row in cur.fetchall():
    print(row[0], row[1], type(row[1]))
# 1 HalfVector([0.0999755859375, 0.199951171875, 0.300048828125]) <class 'pgvector.halfvec.HalfVector'>
Notice the slight precision loss—0.1 becomes 0.0999755859375. This is expected behavior with 16-bit floats and doesn’t meaningfully affect similarity search results.
Searching with psycopg2
query = HalfVector([0.1, 0.2, 0.3])
cur.execute("""
    SELECT id, embedding <=> %s AS distance
    FROM items
    ORDER BY embedding <=> %s
    LIMIT 10
""", (query, query))
results = cur.fetchall()
for r in results:
    print(f"id={r[0]}, distance={r[1]}")
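The <=> operator in the query above is pgvector's cosine distance, defined as 1 minus the cosine similarity. Smaller values mean closer matches: 0 for vectors pointing the same way, 1 for orthogonal vectors, 2 for opposite vectors. The pure-Python sketch below reproduces that definition for interpreting the distance column. It is an illustration, not how pgvector computes it internally.

```python
import math


def cosine_distance(a, b) -> float:
    """Cosine distance: 1 - (a . b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


query = [0.1, 0.2, 0.3]
same_direction = cosine_distance(query, [0.2, 0.4, 0.6])      # approximately 0
opposite = cosine_distance(query, [-0.1, -0.2, -0.3])         # approximately 2
orthogonal = cosine_distance([1.0, 0.0], [0.0, 1.0])          # 1.0

print(same_direction, opposite, orthogonal)
```

Because the distance depends only on direction, scaling a vector does not change its ranking in the results.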
Using halfvec with psycopg3 (psycopg)
The psycopg3 integration works the same way, with the import path changed to pgvector.psycopg:
import psycopg
from pgvector.psycopg import register_vector, HalfVector
conn = psycopg.connect("postgresql://localhost/mydb")
register_vector(conn)
cur = conn.cursor()
# Insert
embedding = HalfVector([0.1, 0.2, 0.3])
cur.execute(
    "INSERT INTO items (embedding) VALUES (%s)",
    (embedding,)
)
conn.commit()
# Search
query = HalfVector([0.1, 0.2, 0.3])
cur.execute("""
    SELECT id, embedding <=> %s AS distance
    FROM items
    ORDER BY embedding <=> %s
    LIMIT 10
""", (query, query))
for row in cur.fetchall():
    print(f"id={row[0]}, distance={row[1]}")
Using halfvec with SQLAlchemy
For SQLAlchemy, use the HALFVEC column type from pgvector.sqlalchemy:
from sqlalchemy import create_engine, Column, Integer
from sqlalchemy.orm import declarative_base, Session
from pgvector.sqlalchemy import HALFVEC
Base = declarative_base()
class Item(Base):
    __tablename__ = 'items'
    id = Column(Integer, primary_key=True, autoincrement=True)
    embedding = Column(HALFVEC(1536))
engine = create_engine("postgresql+psycopg2://localhost/mydb")
Base.metadata.create_all(engine)
Inserting and querying works with plain Python lists—SQLAlchemy handles the conversion:
with Session(engine) as session:
    # Insert with a plain list
    # ("..." stands for the remaining values; the column expects 1536 in total)
    item = Item(embedding=[0.1, 0.2, 0.3, ...])
    session.add(item)
    session.commit()
    # Search using cosine distance
    query_vec = [0.1, 0.2, 0.3, ...]
    results = session.query(
        Item.id,
        Item.embedding.cosine_distance(query_vec).label('distance')
    ).order_by('distance').limit(10).all()
    for r in results:
        print(f"id={r.id}, distance={r.distance}")
Other distance methods available on the column: l2_distance(), max_inner_product().
Working with numpy
Most embedding models return numpy arrays. Converting to HalfVector is straightforward:
import numpy as np
from pgvector.psycopg2 import HalfVector
# From a float32 embedding (typical model output)
embedding_f32 = np.array([0.1, 0.2, 0.3], dtype=np.float32)
hv = HalfVector(embedding_f32.tolist())
# From a float16 embedding (if your model supports it)
embedding_f16 = np.array([0.1, 0.2, 0.3], dtype=np.float16)
hv = HalfVector(embedding_f16.tolist())
The conversion from float32 to float16 happens on the PostgreSQL side when you insert into a halfvec column—you don’t need to pre-convert in Python. But if you’re storing vectors in your application before inserting, using np.float16 arrays saves memory on the application side too.
Precision considerations
16-bit floats (IEEE 754 half-precision) have about 3 decimal digits of precision. Their largest finite value is 65,504 and their smallest positive normal value is ~6×10⁻⁵. For normalized embeddings, which most models produce, this precision is more than sufficient.
Where you might notice the difference:
# float32: 0.1
# float16: 0.0999755859375
# Difference: 0.0000244140625
This precision loss is negligible for cosine similarity. Benchmarks from Neon show less than 1% recall difference between vector and halfvec across standard datasets.
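To see the effect on a similarity score rather than on a single component, you can round a vector to half precision and compare it with the original. The sketch below uses only the standard library: the struct module's "e" format is IEEE 754 half precision. The helper names are hypothetical, and a three-component toy vector is no substitute for a recall benchmark on real embeddings.

```python
import math
import struct


def to_float16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value, returned as a Python float."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def cosine_distance(a, b) -> float:
    """Cosine distance: 1 - (a . b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


original = [0.1, 0.2, 0.3]
rounded = [to_float16(x) for x in original]
drift = cosine_distance(original, rounded)

print(rounded)  # [0.0999755859375, 0.199951171875, 0.300048828125]
print(drift)    # tiny: well below 1e-6 for this vector
```

The rounded values match what PostgreSQL returned in the psycopg2 example earlier, and the cosine distance between the original and rounded vectors is orders of magnitude smaller than the per-component error.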
Wrapping up
The halfvec type is a straightforward way to cut your pgvector storage in half:
- Use HalfVector from the pgvector Python package for psycopg2 and psycopg3
- Use HALFVEC(dim) column type for SQLAlchemy models
- Use halfvec_cosine_ops (not vector_cosine_ops) when creating indexes
- Plain Python lists and numpy arrays convert automatically, with no special handling needed
For new projects, halfvec is a good default. The 50% storage savings compound into faster queries, smaller indexes, and lower infrastructure costs with no meaningful impact on search quality.