Agent Vector Store Integration Specification: Pinecone, Weaviate, and pgvector
Production AI agents require a vector store that handles embeddings, metadata filtering, hybrid search, and per-tenant isolation. This specification compares Pinecone, Weaviate, and pgvector across indexing (HNSW vs IVF), consistency, and cost so teams can pick a backend matched to their RAG workload.
TL;DR
Use pgvector when your data already lives in Postgres and you need transactional consistency. Choose Weaviate when hybrid search (BM25 + vector) and modular embedding pipelines matter. Pick Pinecone when you want a managed serverless backend with minimal ops and predictable scaling. All three support HNSW indexing; only Weaviate and pgvector expose tuning parameters.
Why this specification exists
Agent retrieval quality depends almost entirely on the vector store and how it is wired into the agent loop. A poorly chosen backend caps recall, leaks tenant data across namespaces, or creates query latency that breaks streaming UX. This specification standardizes the integration contract so a team can swap backends without rewriting agent code.
Scope and assumptions
- The agent uses an embedding model with stable output dimensions (e.g., 1,536 or 3,072).
- Documents are pre-chunked and enriched with metadata before write.
- The agent expects k-NN retrieval with optional metadata filters.
- Hybrid search (sparse + dense) is desirable but optional.
- Multi-tenant isolation is required at the namespace or row level.
Backend comparison
| Capability | Pinecone | Weaviate | pgvector | Qdrant | Milvus |
|---|---|---|---|---|---|
| Hosting | Serverless or pod | Self-host or cloud | Postgres extension | Self-host or cloud | Self-host or cloud |
| Index types | Proprietary HNSW | HNSW, flat | HNSW, IVFFlat | HNSW | HNSW, IVF, DiskANN |
| HNSW tuning | Not exposed (Pinecone docs) | efConstruction, maxConnections, ef | m, ef_construction, ef_search | m, ef_construct, ef | Full tuning |
| Metadata filter | Yes | Yes | Yes (SQL WHERE) | Yes | Yes |
| Hybrid search | Sparse-dense vectors | Native BM25 + vector | Manual (tsvector + cosine) | Native | Native |
| Multi-tenancy | Namespaces | Tenants | Schema or tenant_id column | Collections | Partitions |
| ACID guarantees | No | No | Yes (Postgres) | No | No |
Reported p50 latency on a 1M-vector, 1,536-dim workload sits in the 3-12 ms range across these systems with HNSW configured for accuracy, with pgvector and Qdrant at the fast end and Pinecone Serverless in the 10-12 ms range (Vecstore benchmark, 2026). Treat these numbers as directional; your network, embedding dimensionality, and filter selectivity dominate real workloads.
Index selection: HNSW vs IVF
HNSW (Hierarchical Navigable Small World) builds a multi-layer proximity graph. It delivers sub-10 ms p50 on millions of vectors with 95%+ recall when tuned, and it accepts incremental inserts without a full rebuild (pgvector docs). The cost is RAM: HNSW typically consumes 2-5x the memory of IVFFlat and has slower build times.
IVF (Inverted File Index) clusters vectors into Voronoi cells and probes only the nearest cells at query time. Builds are faster and memory is cheaper, but recall is sensitive to nprobe and to data-distribution drift after inserts. Milvus recommends IVF when the corpus exceeds 100M vectors and the memory budget is tight (Milvus blog, 2026).
Default rule for agent workloads. Use HNSW unless you cannot fit the graph in RAM. Tune ef_search per query class: low for autocomplete-style retrieval, high for grounded RAG.
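As a concrete example, a minimal pgvector sketch: build the HNSW index once, then vary ef_search per query class. It assumes a chunks table with a vector(1536) embedding column, the psycopg 3 client, and the embed() helper from the query contract later in this spec; names and the DSN are illustrative.

```python
import psycopg

conn = psycopg.connect("postgresql://localhost/agentdb", autocommit=True)

# One-time build. m and ef_construction trade build time and RAM for recall;
# the values below are pgvector's documented defaults.
conn.execute(
    "CREATE INDEX IF NOT EXISTS chunks_embedding_hnsw ON chunks "
    "USING hnsw (embedding vector_cosine_ops) WITH (m = 16, ef_construction = 64)"
)

def search(query_vec: list[float], top_k: int, ef_search: int):
    # SET cannot take bind parameters, so use set_config(); false = session scope.
    conn.execute("SELECT set_config('hnsw.ef_search', %s, false)", (str(ef_search),))
    # pgvector accepts the '[x, y, ...]' text format, so str(list) casts cleanly.
    qv = str(query_vec)
    return conn.execute(
        "SELECT id, embedding <=> %s::vector AS cosine_distance FROM chunks "
        "ORDER BY embedding <=> %s::vector LIMIT %s",
        (qv, qv, top_k),
    ).fetchall()

quick = search(embed("reset API key"), top_k=5, ef_search=40)       # autocomplete-style
grounded = search(embed("reset API key"), top_k=12, ef_search=200)  # grounded RAG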
Namespace and multi-tenant patterns
- Namespace per tenant (Pinecone, Qdrant collections). Strongest isolation; cross-tenant analytics require fan-out.
- Shared index with tenant_id filter (Weaviate tenants, pgvector column). Lower ops; rely on filter correctness for isolation.
- Shared index with row-level security (pgvector + Postgres RLS). Strongest correctness guarantee for regulated data; small filter cost.
Agents must always pass a tenant identifier into retrieval and reject any response that contains documents outside that scope. Treat the tenant ID as a security boundary, not a hint.
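For the RLS pattern, a minimal sketch reusing the psycopg connection from the index sketch above and assuming a tenant_id column on chunks; policy and setting names are illustrative:

```python
# One-time setup, run as the table owner.
conn.execute("ALTER TABLE chunks ENABLE ROW LEVEL SECURITY")
conn.execute(
    "CREATE POLICY tenant_isolation ON chunks "
    "USING (tenant_id = current_setting('app.tenant_id'))"
)

def query_as_tenant(conn, tenant_id, query_vec, top_k=12):
    # Run queries as a non-owner role: Postgres bypasses RLS for the table
    # owner unless you also FORCE ROW LEVEL SECURITY.
    with conn.transaction():
        # is_local=true scopes the tenant setting to this transaction only.
        conn.execute("SELECT set_config('app.tenant_id', %s, true)", (tenant_id,))
        return conn.execute(
            "SELECT id FROM chunks ORDER BY embedding <=> %s::vector LIMIT %s",
            (str(query_vec), top_k),
        ).fetchall()
```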
Hybrid search (sparse + dense)
Dense vectors capture semantics; sparse retrievers like BM25 capture rare tokens (product codes, error strings). Hybrid search combines both via reciprocal rank fusion (RRF) or weighted scoring.
- Weaviate: native hybrid query with alpha parameter to balance vector vs BM25.
- Pinecone: sparse-dense vectors require generating a sparse representation (e.g., SPLADE) at index time.
- pgvector: combine tsvector full-text search with cosine distance and rank-fuse in SQL.
For agent retrieval, hybrid search typically improves recall on out-of-distribution queries containing identifiers, codes, or proper nouns.
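Weaviate and Qdrant fuse natively; on pgvector or Pinecone, one portable approach is client-side reciprocal rank fusion. A minimal sketch, assuming two id lists already in rank order and the conventional damping constant k=60:

```python
def rrf_fuse(dense_ids, sparse_ids, k=60, top_k=12):
    """Reciprocal rank fusion: score(d) = sum over lists of 1 / (k + rank(d))."""
    scores = {}
    for ranking in (dense_ids, sparse_ids):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ids seen by only one retriever still rank.
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Fuse dense k-NN hits with BM25 hits.
fused = rrf_fuse(["a", "b", "c"], ["b", "d"], top_k=3)  # -> ["b", "a", "d"]
```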
Write-time vs read-time consistency
- Write-time consistency. Agent writes (e.g., new memory) must be queryable before the next turn. pgvector delivers this via Postgres transactions. Pinecone Serverless and Weaviate are eventually consistent on write; expect propagation lag from milliseconds to seconds.
- Read-time consistency. Replica lag affects retrieval freshness in distributed deployments. Pin reads to the primary for memory writes that must be visible immediately.
If your agent depends on "read your own writes" semantics within a single conversation, prefer pgvector or document a write-then-poll pattern in the agent loop.
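A sketch of that write-then-poll pattern, against the hypothetical store client used in the reference contracts later in this spec (upsert and fetch are illustrative names, not a specific vendor SDK):

```python
import time

def upsert_and_confirm(store, tenant_id, record, timeout_s=2.0, poll_s=0.1):
    """Write a record, then poll until it is readable or the deadline passes."""
    store.upsert(tenant_id=tenant_id, records=[record])
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        # Fetch-by-id is cheaper than a vector query for a visibility check.
        if store.fetch(tenant_id=tenant_id, ids=[record["id"]]):
            return True
        time.sleep(poll_s)
    # Not yet visible: keep the content pinned in working context (see FAQ).
    return False
```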
Refresh cadence
| Source | Recommended cadence | Trigger |
|---|---|---|
| Static reference docs | Weekly batch | Doc change webhook |
| Product catalog | Hourly | CDC stream |
| User memory | Real-time | Per turn |
| Public web | Daily | Scheduled crawl |
Rebuild full indexes only when the embedding model changes; otherwise upsert in place, as in the sketch below.
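A sketch of an in-place upsert against the same hypothetical store client: the deterministic id makes refreshes idempotent, and the namespace label (illustrative here) encodes model + dimension so a model upgrade lands in a fresh namespace rather than mixing vectors.

```python
from datetime import datetime, timezone

def refresh_chunk(store, tenant_id, doc_id, chunk_index, text,
                  model_ns="embed-v2-1536d"):  # hypothetical namespace label
    store.upsert(
        tenant_id=tenant_id,
        namespace=model_ns,  # namespace by model + dimension (see pitfalls)
        records=[{
            # Deterministic id: re-running a refresh overwrites, never duplicates.
            "id": f"{doc_id}#chunk_{chunk_index}",
            "vector": embed(text),
            "metadata": {
                "tenant_id": tenant_id,
                "chunk_index": chunk_index,
                "updated_at": datetime.now(timezone.utc).isoformat(),
            },
        }],
    )
```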
Cost modeling
Agent retrieval costs decompose into storage, query, and embedding regeneration.
- Storage: vector size = dim × 4 bytes (float32) or dim × 1 byte (int8 quantized). HNSW adds 2-5x for graph edges.
- Query: managed services charge per read unit or per query; self-hosted is dominated by RAM and CPU.
- Embedding regeneration: triggered by model upgrades or chunking changes.
For a 10M-vector, 1,536-dim corpus, public benchmarks show monthly costs ranging roughly from low hundreds (pgvector on managed Postgres) to several hundreds (managed serverless), depending on QPS and retention (Vecstore benchmark, 2026). Treat any vendor pricing page as the source of truth.
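A back-of-envelope check of the storage term for that 10M-vector, 1,536-dim example, applying the 2-5x HNSW footprint range from the bullets above:

```python
vectors, dim = 10_000_000, 1536
gib = 1024 ** 3

raw_f32 = vectors * dim * 4   # float32 payload
raw_i8 = vectors * dim * 1    # int8-quantized payload

print(f"float32 raw:  {raw_f32 / gib:.1f} GiB")                        # ~57.2 GiB
print(f"int8 raw:     {raw_i8 / gib:.1f} GiB")                         # ~14.3 GiB
print(f"float32 HNSW: {2 * raw_f32 / gib:.0f}-{5 * raw_f32 / gib:.0f} GiB")  # 114-286 GiB
```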
Reference write contract
```json
{
  "id": "doc_123#chunk_4",
  "vector": [0.12, -0.04, 0.91],
  "sparse_vector": {"indices": [42, 1019], "values": [0.7, 0.3]},
  "metadata": {
    "tenant_id": "acme",
    "source": "https://example.com/article",
    "chunk_index": 4,
    "updated_at": "2026-05-03T10:00:00Z",
    "acl": ["role:reader"]
  }
}
```
Reference query contract
```python
results = store.query(
    tenant_id="acme",
    vector=embed(query_text),
    sparse_vector=sparse_encode(query_text),
    top_k=12,
    filter={"updated_at": {"$gte": "2026-01-01"}},
    hybrid_alpha=0.6,
)
```
Common pitfalls
- Mixing embedding models in one index. Always namespace by model + dimension.
- Forgetting to re-embed after a chunking change.
- Using a top_k larger than the agent's context budget can ingest.
- Treating namespace filters as security without server-side enforcement.
- Skipping hybrid search and then complaining the agent cannot find SKUs.
FAQ
Q: Which vector store should a small team start with?
Start with pgvector if you already run Postgres. You get HNSW indexing, transactional writes, and SQL filters without adding a new system. Move to a dedicated store when memory pressure or QPS exceeds what your Postgres can support.
Q: When does HNSW stop being the right choice?
When the graph no longer fits in RAM or when ingest dominates queries. At hundreds of millions of vectors, IVF or DiskANN can be cheaper if you accept a small recall hit.
Q: How should agents handle a stale read after a memory write?
Treat the write as a future read. Either confirm the write before the next agent turn, or include the just-written content in the working context until the index propagates.
Q: Is hybrid search worth the complexity?
For agent workloads with codes, identifiers, or domain jargon, yes. Pure dense retrieval often misses exact-match tokens. Start with Weaviate or Qdrant native hybrid; build it manually only on pgvector.
Q: How do I avoid leaking documents across tenants?
Pass tenant_id into every read and write. Enforce filters server-side via row-level security (pgvector) or per-tenant indexes (Pinecone, Qdrant). Never rely on application code alone.