pgai — Running AI Directly Inside PostgreSQL
By Sanjeev Sharma (@webcoderspeed1)
Introduction
Vector databases are trendy. They solve a real problem: finding similar documents by semantic meaning. But they add complexity—another database to manage, sync, and scale.
Timescale's pgai extension runs AI directly inside PostgreSQL. You can embed text, search semantically, and call LLMs, all from SQL. For most applications, this eliminates the need for a separate vector database.
This post covers pgai: what it is, how it works, when it's better than Pinecone or Weaviate, and a realistic production setup.
- What Is pgai?
- Auto-Embedding with ai.create_vectorizer()
- Semantic Search from SQL
- Calling OpenAI and Ollama Directly from SQL
- pgvectorscale for Faster ANN Search
- Background Embedding Workers
- Use Case: Search Across 10M Rows
- Performance Benchmarks
- Setup Guide
- When to Use pgai vs. Dedicated Vector DB
- Checklist
- Conclusion
What Is pgai?
pgai is a PostgreSQL extension that integrates with OpenAI, Ollama, and other LLM providers. It exposes SQL functions for generating embeddings and calling model completions.
Key insight: embeddings are just numbers. Storing them in PostgreSQL's pgvector type is natural. Querying them with SQL is intuitive. No REST API, no client library switching, just SQL.
-- Enable pgai extension
CREATE EXTENSION IF NOT EXISTS pgai CASCADE;
CREATE EXTENSION IF NOT EXISTS vector CASCADE;
-- Create a documents table with embeddings
CREATE TABLE documents (
id uuid PRIMARY KEY,
content text,
embedding vector(1536),
created_at timestamptz DEFAULT now()
);
pgai handles the rest. You define how embeddings are generated, and pgai generates them automatically.
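If you ever insert embeddings from application code instead of letting pgai generate them, note that pgvector accepts vectors as bracketed text literals. A small helper (`toPgVector` is a hypothetical name, shown as a sketch) formats a float array into that literal:

```typescript
// Format a float array as a pgvector literal, e.g. "[0.1,0.2,0.3]".
// pgvector parses this text form on insert into a vector(N) column.
function toPgVector(values: number[], expectedDim?: number): string {
  if (expectedDim !== undefined && values.length !== expectedDim) {
    throw new Error(`expected ${expectedDim} dimensions, got ${values.length}`);
  }
  return `[${values.join(",")}]`;
}

// Example with 3 dimensions; real OpenAI embeddings here would be 1536-dimensional
const literal = toPgVector([0.1, 0.2, 0.3]);
console.log(literal); // [0.1,0.2,0.3]
```

Passing `expectedDim` catches dimension mismatches in the application before PostgreSQL rejects the insert.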
Auto-Embedding with ai.create_vectorizer()
Manually embedding every document is tedious. pgai's create_vectorizer() function automates it. When a row is inserted or updated, the vectorizer generates embeddings automatically.
-- Create a vectorizer that embeds the content column
SELECT ai.create_vectorizer(
table_name := 'documents',
column_name := 'content',
embedding_model := 'openai_embedding_v1',
embedding_dim := 1536,
embedding_column_name := 'embedding',
chunking_strategy := 'by_tokens',
chunking_window_size := 100,
chunking_overlap := 20
);
Now, when you insert a document:
INSERT INTO documents (id, content) VALUES
(gen_random_uuid(), 'Machine learning is a subset of AI...');
pgai automatically:
- Chunks the text (100 tokens, 20-token overlap)
- Calls OpenAI embedding API
- Stores vectors in the embedding column
No application code required. The vectorizer is idempotent—if you update content, embeddings update automatically.
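The chunking step can be sketched in plain code. Below is a minimal sliding-window chunker, an illustration of what window size and overlap mean rather than pgai's actual implementation (it splits on words, not real tokenizer tokens):

```typescript
// Split a token list into overlapping windows, mirroring
// chunking_window_size and chunking_overlap above.
function chunkTokens(tokens: string[], windowSize: number, overlap: number): string[][] {
  if (overlap >= windowSize) throw new Error("overlap must be smaller than window");
  const step = windowSize - overlap; // how far each new chunk advances
  const chunks: string[][] = [];
  for (let start = 0; start < tokens.length; start += step) {
    chunks.push(tokens.slice(start, start + windowSize));
    if (start + windowSize >= tokens.length) break; // last window reached the end
  }
  return chunks;
}

// 10 words, window of 4, overlap of 2 -> windows starting at 0, 2, 4, 6
const words = "machine learning is a subset of artificial intelligence these days".split(" ");
console.log(chunkTokens(words, 4, 2).length); // 4
```

The overlap ensures a sentence straddling a chunk boundary still appears whole in at least one chunk, which is why retrieval quality usually improves with a modest overlap.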
Semantic Search from SQL
With embeddings in place, semantic search is a SQL query:
-- Search for documents similar to a query
-- (query_embedding stands in for the caller-supplied query vector)
SELECT
id,
content,
1 - (embedding <=> query_embedding) as similarity
FROM documents
WHERE 1 - (embedding <=> query_embedding) > 0.8
ORDER BY embedding <=> query_embedding
LIMIT 10;
Wrap this in a function for reusability:
CREATE OR REPLACE FUNCTION search_documents(query_text text, limit_count int DEFAULT 10)
RETURNS TABLE (id uuid, content text, similarity float) AS $$
BEGIN
RETURN QUERY
WITH query_embedding AS (
SELECT ai.openai_embed('openai_embedding_v1', query_text, NULL) as embedding
)
SELECT
d.id,
d.content,
(1 - (d.embedding <=> qe.embedding))::float as similarity
FROM documents d, query_embedding qe
WHERE 1 - (d.embedding <=> qe.embedding) > 0.7
ORDER BY d.embedding <=> qe.embedding
LIMIT limit_count;
END;
$$ LANGUAGE plpgsql;
Call from your application:
const { data } = await supabase.rpc('search_documents', {
query_text: 'how do neural networks work?',
limit_count: 5,
});
console.log(data); // Top 5 similar documents
No client-side vector libraries. No embedding caching. Just SQL.
Calling OpenAI and Ollama Directly from SQL
pgai can call LLMs directly from SQL functions. This is powerful for generation tasks within queries.
-- Generate a summary of a document
CREATE OR REPLACE FUNCTION summarize_document(doc_id uuid)
RETURNS text AS $$
DECLARE
content text;
summary text;
BEGIN
SELECT d.content INTO content FROM documents d WHERE d.id = doc_id;
SELECT ai.openai_complete(
'gpt-4',
jsonb_build_array(
jsonb_build_object('role', 'user', 'content', 'Summarize: ' || content)
),
NULL
) ->> 'message' INTO summary;
RETURN summary;
END;
$$ LANGUAGE plpgsql;
Call it:
const { data } = await supabase.rpc('summarize_document', {
doc_id: documentId,
});
console.log(data); // AI-generated summary
For local LLMs, use Ollama instead:
SELECT ai.ollama_complete(
'llama2',
jsonb_build_array(
jsonb_build_object('role', 'user', 'content', 'Explain quantum computing')
)
) ->> 'message';
Everything stays in the database. No webhooks, no external queues, no async workers (unless you want them).
pgvectorscale for Faster ANN Search
PostgreSQL's pgvector extension supports two index types, plus exact scan:
- IVFFlat: inverted file flat index. Fast, approximate, scales to millions of vectors.
- HNSW: hierarchical navigable small world graph. More accurate, but higher memory use.
- No index (brute force): exact sequential scan. Fine for small tables, slow beyond ~100k vectors.
For massive datasets, pgvectorscale (Timescale's custom index) is faster:
-- Create DiskANN-style index for 10M+ vectors
CREATE INDEX ON documents USING diskann (embedding)
WITH (compression = 'int8');
With this index, searching 10 million vectors is still milliseconds. The tradeoff: slightly lower accuracy (97%+ vs 99%+), but speed dominates for most use cases.
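The compression = 'int8' option trades a little precision for a 4x smaller vector representation. The idea can be illustrated with simple scalar quantization (a conceptual sketch, not pgvectorscale's actual algorithm):

```typescript
// Scalar-quantize floats in [-1, 1] to int8: one byte per value
// instead of four, at the cost of a small reconstruction error.
function quantizeInt8(values: number[]): Int8Array {
  return Int8Array.from(values, (v) => Math.round(Math.max(-1, Math.min(1, v)) * 127));
}

function dequantizeInt8(codes: Int8Array): number[] {
  return Array.from(codes, (c) => c / 127);
}

const original = [0.12, -0.7, 0.99];
const restored = dequantizeInt8(quantizeInt8(original));
// Each value comes back close to the original, but not exactly equal
console.log(restored);
```

This small per-value error is the source of the slightly lower recall: distances computed over compressed vectors are approximate, so a few true neighbors can fall out of the top-k.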
Background Embedding Workers
For bulk embedding jobs, pgai supports background workers. Instead of blocking inserts while calling the embedding API, workers process in batches.
-- Create a job that embeds documents without vectorizer
SELECT ai.index_documents(
table_name := 'documents',
column_name := 'content',
embedding_column := 'embedding',
batch_size := 100
);
Workers run asynchronously. Monitor progress with:
SELECT * FROM ai.jobs WHERE status != 'completed';
For 1M documents, this completes in hours instead of days.
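The batch_size = 100 setting means a bulk job is just a sequence of batched API calls. The batching arithmetic can be sketched like this (a hypothetical helper, not pgai's worker code):

```typescript
// Split N items into batches of a fixed size, as a background
// embedding worker would before each embedding API call.
function toBatches<T>(items: T[], batchSize: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// 1,050 documents with batch_size = 100 -> 11 API calls, the last one partial
const ids = Array.from({ length: 1050 }, (_, i) => i);
const batches = toBatches(ids, 100);
console.log(batches.length, batches[batches.length - 1].length); // 11 50
```

Batching matters for cost and throughput: one API call per 100 documents instead of one per document cuts request overhead by two orders of magnitude.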
Use Case: Search Across 10M Rows
A realistic scenario: you have 10 million documents, and users want to search semantically.
Traditional approach: Postgres for core data, Pinecone for vectors. You sync documents to Pinecone, search there, fetch results from Postgres. Complexity: replication lag, duplicate data, two systems to maintain.
With pgai:
CREATE TABLE documents (
id uuid PRIMARY KEY,
content text,
metadata jsonb,
embedding vector(1536),
created_at timestamptz DEFAULT now()
);
CREATE INDEX ON documents USING diskann (embedding);
-- Vectorize existing documents in background
SELECT ai.index_documents('documents', 'content', 'embedding');
-- Users search
SELECT * FROM search_documents('what is quantum computing?', 5);
Single system. Consistent data. No sync complexity.
For 10M vectors with DiskANN index, search latency is 50–200ms depending on vector dimensionality and hardware. Acceptable for most applications.
Performance Benchmarks
Benchmarks (single Postgres instance, 1M vectors, 1536 dimensions):
| Index Type | Query Time (ms) | Recall |
|---|---|---|
| IVFFLAT (nlist=100) | 15 | 0.95 |
| HNSW | 8 | 0.99 |
| DiskANN | 12 | 0.97 |
| Brute force | 600 | 1.0 |
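The Recall column measures how many of the true nearest neighbors an approximate index actually returns. A sketch of recall@k (illustrative, not taken from any benchmark harness):

```typescript
// recall@k: fraction of the exact top-k neighbor ids that the
// approximate index also returned.
function recallAtK(exactTopK: number[], approxTopK: number[]): number {
  const approx = new Set(approxTopK);
  const hits = exactTopK.filter((id) => approx.has(id)).length;
  return hits / exactTopK.length;
}

// Exact search found [1,2,3,4,5]; the ANN index returned [1,2,3,5,9]
console.log(recallAtK([1, 2, 3, 4, 5], [1, 2, 3, 5, 9])); // 0.8
```

A recall of 0.95 therefore means roughly one in twenty true neighbors is missed, which is usually invisible in a semantic-search UI but matters for exact deduplication.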
For 10M vectors, index type matters more:
| Index Type | Memory | Query Time (ms) |
|---|---|---|
| IVFFLAT (nlist=1000) | 2GB | 40 |
| DiskANN | 4GB | 60 |
| HNSW | 8GB+ | OOM |
DiskANN scales best. HNSW exhausts memory at 10M vectors.
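A back-of-envelope calculation shows why memory-hungry graph indexes struggle at this scale: the raw float32 vectors alone, before any index overhead, are tens of gigabytes.

```typescript
// Raw storage for N float32 vectors of dimension D (4 bytes per float).
function rawVectorBytes(numVectors: number, dimensions: number): number {
  const bytesPerFloat32 = 4;
  return numVectors * dimensions * bytesPerFloat32;
}

const bytes = rawVectorBytes(10_000_000, 1536);
console.log((bytes / 1e9).toFixed(1), "GB"); // 61.4 GB
```

DiskANN-style indexes sidestep this by keeping most of that data on disk (and compressed), touching memory only for the search frontier.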
Setup Guide
Install on Supabase:
# Enable pgai and pgvector via Supabase console
# Then in SQL editor:
CREATE EXTENSION IF NOT EXISTS pgai CASCADE;
CREATE EXTENSION IF NOT EXISTS vector CASCADE;
-- Set OpenAI key (as admin)
SELECT ai.set_config('openai_api_key', 'sk-...');
Local setup with Docker:
docker run -d \
--name postgres-pgai \
-e POSTGRES_PASSWORD=password \
-p 5432:5432 \
timescaledb/timescaledb:latest-pg16
# Inside container:
psql -U postgres -c "CREATE EXTENSION pgai CASCADE;"
psql -U postgres -c "CREATE EXTENSION vector CASCADE;"
Verify:
SELECT ai.openai_embed('text-embedding-3-small', 'test', NULL);
If it returns a vector, you're set.
When to Use pgai vs. Dedicated Vector DB
Use pgai if:
- Dataset is <10M vectors
- You want simplicity (one database)
- Semantic search is secondary (not the core product)
- You already use Postgres
Use Pinecone/Weaviate if:
- Dataset is >100M vectors
- Vector search is your product's core
- You need specialized indexing (e.g., locality-sensitive hashing or product quantization)
- You want managed infrastructure (no self-hosting)
For most applications, pgai wins.
Checklist
- Enable the pgai and pgvector extensions
- Create a table with a vector(1536) column
- Set up the OpenAI API key in pgai
- Create a vectorizer for auto-embedding
- Write a search function in SQL
- Test search latency with 1000+ documents
- Create DiskANN index for scale
- Set up background jobs for bulk embedding
- Monitor embedding API costs
- Test LLM function calls (summarize, generate)
Conclusion
pgai eliminates the vector database. Embeddings live in PostgreSQL, search happens in SQL, and LLM calls execute in functions. This simplicity is powerful.
For applications that don't need specialized vector infrastructure, pgai is the pragmatic choice. It's one less system to manage, one less API to learn, one less bill to pay.
Run AI inside the database. Keep things simple.