pgai — Running AI Directly Inside PostgreSQL
By Sanjeev Sharma (@webcoderspeed1)
Introduction
Vector databases are trendy. They solve a real problem: finding similar documents by semantic meaning. But they add complexity—another database to manage, sync, and scale.
Timescale's pgai extension runs AI directly inside PostgreSQL. You can embed text, search semantically, and call LLMs, all from SQL. For most applications, this eliminates the need for a separate vector database.
This post covers pgai: what it is, how it works, when it's better than Pinecone or Weaviate, and a realistic production setup.
- What Is pgai?
- Auto-Embedding with ai.create_vectorizer()
- Semantic Search from SQL
- Calling OpenAI and Ollama Directly from SQL
- pgvectorscale for Faster ANN Search
- Background Embedding Workers
- Use Case: Search Across 10M Rows
- Performance Benchmarks
- Setup Guide
- When to Use pgai vs. Dedicated Vector DB
- Checklist
- Conclusion
What Is pgai?
pgai is a PostgreSQL extension that integrates with OpenAI, Ollama, and other LLM providers. It exposes SQL functions for generating embeddings and calling model completions.
Key insight: embeddings are just numbers. Storing them in PostgreSQL's pgvector type is natural. Querying them with SQL is intuitive. No REST API, no client library switching, just SQL.
-- Enable pgai extension
CREATE EXTENSION IF NOT EXISTS pgai CASCADE;
CREATE EXTENSION IF NOT EXISTS vector CASCADE;
-- Create a documents table with embeddings
CREATE TABLE documents (
id uuid PRIMARY KEY,
content text,
embedding vector(1536),
created_at timestamptz DEFAULT now()
);
pgai handles the rest. You define how embeddings are generated, and pgai generates them automatically.
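If you ever insert embeddings from application code instead of letting pgai generate them, note that pgvector accepts vectors as bracketed text literals. A small helper (`toPgVector` is a hypothetical name, shown as a sketch) formats a float array into that literal:

```typescript
// Format a float array as a pgvector literal, e.g. "[0.1,0.2,0.3]".
// pgvector parses this text form on insert into a vector(N) column.
function toPgVector(values: number[], expectedDim?: number): string {
  if (expectedDim !== undefined && values.length !== expectedDim) {
    throw new Error(`expected ${expectedDim} dimensions, got ${values.length}`);
  }
  return `[${values.join(",")}]`;
}

// Example with 3 dimensions; real OpenAI embeddings here would be 1536-dimensional
const literal = toPgVector([0.1, 0.2, 0.3]);
console.log(literal); // [0.1,0.2,0.3]
```

Passing `expectedDim` catches dimension mismatches in the application before PostgreSQL rejects the insert.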
Auto-Embedding with ai.create_vectorizer()
Manually embedding every document is tedious. pgai's create_vectorizer() function automates it. When a row is inserted or updated, the vectorizer generates embeddings automatically.
-- Create a vectorizer that embeds the content column
SELECT ai.create_vectorizer(
table_name := 'documents',
column_name := 'content',
embedding_model := 'openai_embedding_v1',
embedding_dim := 1536,
embedding_column_name := 'embedding',
chunking_strategy := 'by_tokens',
chunking_window_size := 100,
chunking_overlap := 20
);
Now, when you insert a document:
INSERT INTO documents (id, content) VALUES
(gen_random_uuid(), 'Machine learning is a subset of AI...');
pgai automatically:
- Chunks the text (100 tokens, 20-token overlap)
- Calls OpenAI embedding API
- Stores vectors in the embedding column
No application code required. The vectorizer is idempotent—if you update content, embeddings update automatically.
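The chunking step can be sketched in plain code. Below is a minimal sliding-window chunker, an illustration of what window size and overlap mean rather than pgai's actual implementation (it splits on words, not real tokenizer tokens):

```typescript
// Split a token list into overlapping windows, mirroring
// chunking_window_size and chunking_overlap above.
function chunkTokens(tokens: string[], windowSize: number, overlap: number): string[][] {
  if (overlap >= windowSize) throw new Error("overlap must be smaller than window");
  const step = windowSize - overlap; // how far each new chunk advances
  const chunks: string[][] = [];
  for (let start = 0; start < tokens.length; start += step) {
    chunks.push(tokens.slice(start, start + windowSize));
    if (start + windowSize >= tokens.length) break; // last window reached the end
  }
  return chunks;
}

// 10 words, window of 4, overlap of 2 -> windows starting at 0, 2, 4, 6
const words = "machine learning is a subset of artificial intelligence these days".split(" ");
console.log(chunkTokens(words, 4, 2).length); // 4
```

The overlap ensures a sentence straddling a chunk boundary still appears whole in at least one chunk, which is why retrieval quality usually improves with a modest overlap.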
Semantic Search from SQL
With embeddings in place, semantic search is a SQL query:
-- Search for documents similar to a query
-- (query_embedding stands in for the caller-supplied query vector)
SELECT
id,
content,
1 - (embedding <=> query_embedding) as similarity
FROM documents
WHERE 1 - (embedding <=> query_embedding) > 0.8
ORDER BY embedding <=> query_embedding
LIMIT 10;
Wrap this in a function for reusability:
CREATE OR REPLACE FUNCTION search_documents(query_text text, limit_count int DEFAULT 10)
RETURNS TABLE (id uuid, content text, similarity float) AS $$
BEGIN
RETURN QUERY
WITH query_embedding AS (
SELECT ai.openai_embed('openai_embedding_v1', query_text, NULL) as embedding
)
SELECT
d.id,
d.content,
(1 - (d.embedding <=> qe.embedding))::float as similarity
FROM documents d, query_embedding qe
WHERE 1 - (d.embedding <=> qe.embedding) > 0.7
ORDER BY d.embedding <=> qe.embedding
LIMIT limit_count;
END;
$$ LANGUAGE plpgsql;
Call from your application:
const { data } = await supabase.rpc('search_documents', {
query_text: 'how do neural networks work?',
limit_count: 5,
});
console.log(data); // Top 5 similar documents
No client-side vector libraries. No embedding caching. Just SQL.
Calling OpenAI and Ollama Directly from SQL
pgai can call LLMs directly from SQL functions. This is powerful for generation tasks within queries.
-- Generate a summary of a document
CREATE OR REPLACE FUNCTION summarize_document(doc_id uuid)
RETURNS text AS $$
DECLARE
content text;
summary text;
BEGIN
SELECT d.content INTO content FROM documents d WHERE d.id = doc_id;
SELECT ai.openai_complete(
'gpt-4',
jsonb_build_array(
jsonb_build_object('role', 'user', 'content', 'Summarize: ' || content)
),
NULL
) ->> 'message' INTO summary;
RETURN summary;
END;
$$ LANGUAGE plpgsql;
Call it:
const { data } = await supabase.rpc('summarize_document', {
doc_id: documentId,
});
console.log(data); // AI-generated summary
For local LLMs, use Ollama instead:
SELECT ai.ollama_complete(
'llama2',
jsonb_build_array(
jsonb_build_object('role', 'user', 'content', 'Explain quantum computing')
)
) ->> 'message';
Everything stays in the database. No webhooks, no external queues, no async workers (unless you want them).
pgvectorscale for Faster ANN Search
PostgreSQL's pgvector extension supports two index types, plus exact scan:
- IVFFlat: inverted file flat index. Fast, approximate, scales to millions of vectors.
- HNSW: hierarchical navigable small world graph. More accurate, but higher memory use.
- No index (brute force): exact sequential scan. Fine for small tables, slow beyond ~100k vectors.
For massive datasets, pgvectorscale (Timescale's custom index) is faster:
-- Create DiskANN-style index for 10M+ vectors
CREATE INDEX ON documents USING diskann (embedding)
WITH (compression = 'int8');
With this index, searching 10 million vectors is still milliseconds. The tradeoff: slightly lower accuracy (97%+ vs 99%+), but speed dominates for most use cases.
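The compression = 'int8' option trades a little precision for a 4x smaller vector representation. The idea can be illustrated with simple scalar quantization (a conceptual sketch, not pgvectorscale's actual algorithm):

```typescript
// Scalar-quantize floats in [-1, 1] to int8: one byte per value
// instead of four, at the cost of a small reconstruction error.
function quantizeInt8(values: number[]): Int8Array {
  return Int8Array.from(values, (v) => Math.round(Math.max(-1, Math.min(1, v)) * 127));
}

function dequantizeInt8(codes: Int8Array): number[] {
  return Array.from(codes, (c) => c / 127);
}

const original = [0.12, -0.7, 0.99];
const restored = dequantizeInt8(quantizeInt8(original));
// Each value comes back close to the original, but not exactly equal
console.log(restored);
```

This small per-value error is the source of the slightly lower recall: distances computed over compressed vectors are approximate, so a few true neighbors can fall out of the top-k.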
Background Embedding Workers
For bulk embedding jobs, pgai supports background workers. Instead of blocking inserts while calling the embedding API, workers process in batches.
-- Create a job that embeds documents without vectorizer
SELECT ai.index_documents(
table_name := 'documents',
column_name := 'content',
embedding_column := 'embedding',
batch_size := 100
);
Workers run asynchronously. Monitor progress with:
SELECT * FROM ai.jobs WHERE status != 'completed';
For 1M documents, this completes in hours instead of days.
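The batch_size = 100 setting means a bulk job is just a sequence of batched API calls. The batching arithmetic can be sketched like this (a hypothetical helper, not pgai's worker code):

```typescript
// Split N items into batches of a fixed size, as a background
// embedding worker would before each embedding API call.
function toBatches<T>(items: T[], batchSize: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// 1,050 documents with batch_size = 100 -> 11 API calls, the last one partial
const ids = Array.from({ length: 1050 }, (_, i) => i);
const batches = toBatches(ids, 100);
console.log(batches.length, batches[batches.length - 1].length); // 11 50
```

Batching matters for cost and throughput: one API call per 100 documents instead of one per document cuts request overhead by two orders of magnitude.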
Use Case: Search Across 10M Rows
A realistic scenario: you have 10 million documents, and users want to search semantically.
Traditional approach: Postgres for core data, Pinecone for vectors. You sync documents to Pinecone, search there, fetch results from Postgres. Complexity: replication lag, duplicate data, two systems to maintain.
With pgai:
CREATE TABLE documents (
id uuid PRIMARY KEY,
content text,
metadata jsonb,
embedding vector(1536),
created_at timestamptz DEFAULT now()
);
CREATE INDEX ON documents USING diskann (embedding);
-- Vectorize existing documents in background
SELECT ai.index_documents('documents', 'content', 'embedding');
-- Users search
SELECT * FROM search_documents('what is quantum computing?', 5);
Single system. Consistent data. No sync complexity.
For 10M vectors with DiskANN index, search latency is 50–200ms depending on vector dimensionality and hardware. Acceptable for most applications.
Performance Benchmarks
Benchmarks (single Postgres instance, 1M vectors, 1536 dimensions):
| Index Type | Query Time (ms) | Recall |
|---|---|---|
| IVFFLAT (nlist=100) | 15 | 0.95 |
| HNSW | 8 | 0.99 |
| DiskANN | 12 | 0.97 |
| Brute force | 600 | 1.0 |
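The Recall column measures how many of the true nearest neighbors an approximate index actually returns. A sketch of recall@k (illustrative, not taken from any benchmark harness):

```typescript
// recall@k: fraction of the exact top-k neighbor ids that the
// approximate index also returned.
function recallAtK(exactTopK: number[], approxTopK: number[]): number {
  const approx = new Set(approxTopK);
  const hits = exactTopK.filter((id) => approx.has(id)).length;
  return hits / exactTopK.length;
}

// Exact search found [1,2,3,4,5]; the ANN index returned [1,2,3,5,9]
console.log(recallAtK([1, 2, 3, 4, 5], [1, 2, 3, 5, 9])); // 0.8
```

A recall of 0.95 therefore means roughly one in twenty true neighbors is missed, which is usually invisible in a semantic-search UI but matters for exact deduplication.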
For 10M vectors, index type matters more:
| Index Type | Memory | Query Time (ms) |
|---|---|---|
| IVFFLAT (nlist=1000) | 2GB | 40 |
| DiskANN | 4GB | 60 |
| HNSW | 8GB+ | OOM |
DiskANN scales best. HNSW exhausts memory at 10M vectors.
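A back-of-envelope calculation shows why memory-hungry graph indexes struggle at this scale: the raw float32 vectors alone, before any index overhead, are tens of gigabytes.

```typescript
// Raw storage for N float32 vectors of dimension D (4 bytes per float).
function rawVectorBytes(numVectors: number, dimensions: number): number {
  const bytesPerFloat32 = 4;
  return numVectors * dimensions * bytesPerFloat32;
}

const bytes = rawVectorBytes(10_000_000, 1536);
console.log((bytes / 1e9).toFixed(1), "GB"); // 61.4 GB
```

DiskANN-style indexes sidestep this by keeping most of that data on disk (and compressed), touching memory only for the search frontier.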
Setup Guide
Install on Supabase:
# Enable pgai and pgvector via Supabase console
# Then in SQL editor:
CREATE EXTENSION IF NOT EXISTS pgai CASCADE;
CREATE EXTENSION IF NOT EXISTS vector CASCADE;
-- Set OpenAI key (as admin)
SELECT ai.set_config('openai_api_key', 'sk-...');
Local setup with Docker:
docker run -d \
--name postgres-pgai \
-e POSTGRES_PASSWORD=password \
-p 5432:5432 \
timescaledb/timescaledb:latest-pg16
# Inside container:
psql -U postgres -c "CREATE EXTENSION pgai CASCADE;"
psql -U postgres -c "CREATE EXTENSION vector CASCADE;"
Verify:
SELECT ai.openai_embed('text-embedding-3-small', 'test', NULL);
If it returns a vector, you're set.
When to Use pgai vs. Dedicated Vector DB
Use pgai if:
- Dataset is <10M vectors
- You want simplicity (one database)
- Semantic search is secondary (not the core product)
- You already use Postgres
Use Pinecone/Weaviate if:
- Dataset is >100M vectors
- Vector search is your product's core
- You need specialized indexing (e.g., locality-sensitive hashing or product quantization)
- You want managed infrastructure (no self-hosting)
For most applications, pgai wins.
Checklist
- Enable the pgai and pgvector extensions
- Create a table with a vector(1536) column
- Set up the OpenAI API key in pgai
- Create a vectorizer for auto-embedding
- Write a search function in SQL
- Test search latency with 1000+ documents
- Create DiskANN index for scale
- Set up background jobs for bulk embedding
- Monitor embedding API costs
- Test LLM function calls (summarize, generate)
Conclusion
pgai eliminates the vector database. Embeddings live in PostgreSQL, search happens in SQL, and LLM calls execute in functions. This simplicity is powerful.
For applications that don't need specialized vector infrastructure, pgai is the pragmatic choice. It's one less system to manage, one less API to learn, one less bill to pay.
Run AI inside the database. Keep things simple.