# AI Agent Memory — Short-Term Context, Long-Term Storage, and Episodic Recall

Author: Sanjeev Sharma (@webcoderspeed1)

## Introduction
Agents without memory are forgetful and inefficient. They repeat analysis, forget user preferences, and don't learn from past interactions. Effective memory systems layer multiple types: short-term context for immediate reasoning, episodic memory for past interactions, semantic memory for facts, and external stores for scale. This post explores production memory architectures for agents.
- In-Context Memory: Conversation History Management
- External Memory with Vector Stores
- Episodic Memory: Interaction Summaries
- Semantic Memory: Facts About User and Domain
- Memory Write Strategy
- Memory Retrieval Strategy
- Memory Compression
- Memory Privacy and Deletion
- Checklist
- Conclusion
## In-Context Memory: Conversation History Management

The simplest memory is conversation history kept in the context window. This works until it doesn't: context windows fill up.
```typescript
interface Message {
  role: 'user' | 'assistant' | 'system';
  content: string;
  timestamp: number;
  tokens?: number;
}

interface ConversationContext {
  sessionId: string;
  messages: Message[];
  totalTokens: number;
  maxContextTokens: number;
}

class ContextWindowManager {
  private maxContextTokens: number = 8000; // Leave room for the response

  async manageContext(context: ConversationContext): Promise<Message[]> {
    // Count tokens in the current messages
    const totalTokens = await this.estimateTokens(context.messages);
    if (totalTokens < this.maxContextTokens) {
      return context.messages; // All messages fit
    }
    // Context overflow: prune older messages
    return this.pruneContext(context.messages);
  }

  private async pruneContext(messages: Message[]): Promise<Message[]> {
    // Strategy 1: keep system messages + recent messages
    if (messages.length <= 10) {
      return messages;
    }
    // Keep system messages plus the last 10 non-system messages
    // (filtering avoids duplicating a system message that is also recent)
    const system = messages.filter((m) => m.role === 'system');
    const recent = messages.filter((m) => m.role !== 'system').slice(-10);
    return [...system, ...recent];
  }

  private async estimateTokens(messages: Message[]): Promise<number> {
    // Rough estimate: 1 token per 4 characters
    return messages.reduce((sum, msg) => sum + Math.ceil(msg.content.length / 4), 0);
  }
}

// BETTER: Summarize old messages instead of dropping them
class SummarizingContextManager {
  async manageContext(messages: Message[]): Promise<Message[]> {
    if (messages.length <= 20) {
      return messages;
    }
    const toSummarize = messages.slice(0, -10);
    const toKeep = messages.slice(-10);
    // Summarize the conversation before the last 10 messages
    const summary = await this.summarizeConversation(toSummarize);
    const summaryMessage: Message = {
      role: 'system',
      content: `[Earlier conversation summary]\n${summary}`,
      timestamp: Math.min(...toSummarize.map((m) => m.timestamp)),
    };
    return [summaryMessage, ...toKeep];
  }

  private async summarizeConversation(messages: Message[]): Promise<string> {
    const conversation = messages.map((m) => `${m.role}: ${m.content}`).join('\n');
    const prompt = `Summarize this conversation in 2-3 sentences, focusing on key decisions and facts discovered:
${conversation}`;
    // Call your LLM with `prompt` here; stubbed for brevity
    return 'Summary of conversation...';
  }
}

// BEST: Multi-level memory management
interface MultiLevelMemory {
  inContextMessages: Message[]; // Last N messages, in the context window
  episodicMemory: Message[]; // Summaries of past sessions
  semanticMemory: Map<string, string>; // Facts about the user/domain
}

class IntelligentContextManager {
  async buildContext(
    sessionId: string,
    currentQuery: string,
    memory: MultiLevelMemory,
  ): Promise<Message[]> {
    const contextMessages: Message[] = [];

    // Start with semantic facts about the user/domain
    if (memory.semanticMemory.size > 0) {
      const facts = Array.from(memory.semanticMemory.entries())
        .map(([key, value]) => `${key}: ${value}`)
        .join('\n');
      contextMessages.push({
        role: 'system',
        content: `Context about this user:\n${facts}`,
        timestamp: Date.now(),
      });
    }

    // Add summaries of past sessions (episodic memory)
    if (memory.episodicMemory.length > 0) {
      contextMessages.push({
        role: 'system',
        content: `Past interactions:\n${memory.episodicMemory.map((m) => m.content).join('\n')}`,
        timestamp: Date.now(),
      });
    }

    // Add the current conversation history (in-context)
    contextMessages.push(...memory.inContextMessages);

    // Add the current query
    contextMessages.push({
      role: 'user',
      content: currentQuery,
      timestamp: Date.now(),
    });
    return contextMessages;
  }
}
```
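To make the budget idea concrete, here is a minimal sketch that drops the oldest non-system messages until the rough 4-characters-per-token estimate fits. `Msg` and `fitToBudget` are illustrative names, not part of the classes above, and the heuristic is not a real tokenizer:

```typescript
interface Msg {
  role: 'user' | 'assistant' | 'system';
  content: string;
}

function estimateTokens(messages: Msg[]): number {
  // Same rough heuristic as above: ~1 token per 4 characters
  return messages.reduce((sum, m) => sum + Math.ceil(m.content.length / 4), 0);
}

function fitToBudget(messages: Msg[], maxTokens: number): Msg[] {
  const system = messages.filter((m) => m.role === 'system');
  let rest = messages.filter((m) => m.role !== 'system');
  // Drop from the front (oldest first) until the estimate fits the budget
  while (rest.length > 1 && estimateTokens([...system, ...rest]) > maxTokens) {
    rest = rest.slice(1);
  }
  return [...system, ...rest];
}
```

For real systems, replace the character heuristic with the model's actual tokenizer, since prompts in other languages or with code can be off by 2x or more.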
In-context memory is fast but expensive. Use it for immediate context, offload older content to external storage.
## External Memory with Vector Stores

Vector stores enable semantic search over large amounts of past information without fitting it all in context.
```typescript
interface MemoryEntry {
  id: string;
  content: string;
  embedding: number[];
  metadata: {
    type: 'interaction' | 'fact' | 'note';
    timestamp: number;
    sessionId?: string;
    source?: string;
    compressed?: boolean; // Set once the entry has been summarized (see Memory Compression)
  };
}

class VectorMemoryStore {
  private entries: Map<string, MemoryEntry> = new Map();

  async storeMemory(content: string, type: string, sessionId?: string): Promise<string> {
    const id = `mem-${Date.now()}-${Math.random()}`;
    // Generate an embedding using the OpenAI embeddings API (or a local model)
    const embedding = await this.generateEmbedding(content);
    const entry: MemoryEntry = {
      id,
      content,
      embedding,
      metadata: {
        type: type as 'interaction' | 'fact' | 'note',
        timestamp: Date.now(),
        sessionId,
      },
    };
    this.entries.set(id, entry);
    return id;
  }

  async retrieveRelevantMemories(query: string, topK: number = 5): Promise<MemoryEntry[]> {
    // Embed the query, then rank stored memories by cosine similarity
    const queryEmbedding = await this.generateEmbedding(query);
    const similarities = Array.from(this.entries.values()).map((entry) => ({
      entry,
      similarity: this.cosineSimilarity(queryEmbedding, entry.embedding),
    }));
    return similarities
      .sort((a, b) => b.similarity - a.similarity)
      .slice(0, topK)
      .map((s) => s.entry);
  }

  private cosineSimilarity(a: number[], b: number[]): number {
    let dotProduct = 0;
    let normA = 0;
    let normB = 0;
    for (let i = 0; i < a.length; i++) {
      dotProduct += a[i] * b[i];
      normA += a[i] * a[i];
      normB += b[i] * b[i];
    }
    return dotProduct / (Math.sqrt(normA) * Math.sqrt(normB));
  }

  private async generateEmbedding(text: string): Promise<number[]> {
    const response = await fetch('https://api.openai.com/v1/embeddings', {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${process.env.OPENAI_API_KEY}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        model: 'text-embedding-3-small',
        input: text,
      }),
    });
    if (!response.ok) {
      throw new Error(`Embedding request failed: ${response.status}`);
    }
    const data = (await response.json()) as any;
    return data.data[0].embedding;
  }
}
```

In production, use a managed vector database such as Pinecone, Weaviate, or Qdrant:

```typescript
import { Pinecone } from '@pinecone-database/pinecone';

class ProductionVectorStore {
  private client = new Pinecone({
    apiKey: process.env.PINECONE_API_KEY!,
  });

  async storeMemory(
    id: string,
    content: string,
    embedding: number[],
    metadata: Record<string, unknown>,
  ): Promise<void> {
    const index = this.client.index('agent-memory');
    await index.upsert([
      {
        id,
        values: embedding,
        metadata,
      },
    ]);
  }

  async retrieveRelevantMemories(
    embedding: number[],
    topK: number = 5,
  ): Promise<Array<{ id: string; metadata: Record<string, unknown> }>> {
    const index = this.client.index('agent-memory');
    const results = await index.query({
      vector: embedding,
      topK,
      includeMetadata: true,
    });
    return results.matches.map((m: any) => ({
      id: m.id,
      metadata: m.metadata,
    }));
  }
}
```
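The ranking step can be seen in isolation with hand-made two-dimensional vectors. This is a toy sketch: real embeddings have hundreds of dimensions and come from an embedding model, and the `memories` entries below are made up for illustration:

```typescript
// Cosine similarity over toy 2-d "embeddings"
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

const memories = [
  { id: 'm1', embedding: [1, 0] },     // e.g. "billing question"
  { id: 'm2', embedding: [0.9, 0.1] }, // a near-duplicate of m1
  { id: 'm3', embedding: [0, 1] },     // e.g. "deployment issue"
];

// A query pointing almost exactly at m1's direction
const query = [1, 0.05];
const ranked = memories
  .map((m) => ({ id: m.id, score: cosine(query, m.embedding) }))
  .sort((a, b) => b.score - a.score);
// Memories closest in direction rank first; orthogonal ones rank last
```

Because cosine similarity measures direction rather than magnitude, it is robust to embeddings of different norms, which is why it is the default metric in most vector stores.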
Vector stores scale to millions of memories and enable semantic recall without fitting everything in context.
## Episodic Memory: Interaction Summaries

Episodic memory stores summaries of past interactions. When should the agent recall them?
```typescript
interface Episode {
  id: string;
  sessionId: string;
  startTime: number;
  endTime: number;
  topic: string;
  summary: string;
  keyDecisions: string[];
  outcome: string;
}

class EpisodicMemoryManager {
  async storeSession(messages: Message[], sessionId: string): Promise<Episode> {
    // Extract key information from the session
    const topic = await this.extractTopic(messages);
    const summary = await this.summarizeSession(messages);
    const keyDecisions = await this.extractDecisions(messages);
    const outcome = messages[messages.length - 1].content;

    const episode: Episode = {
      id: `ep-${Date.now()}`,
      sessionId,
      startTime: messages[0].timestamp,
      endTime: messages[messages.length - 1].timestamp,
      topic,
      summary,
      keyDecisions,
      outcome,
    };

    // Store in the vector database for semantic search
    const embedding = await this.generateEmbedding(summary);
    await this.vectorStore.storeMemory(episode.id, summary, embedding, {
      type: 'episode',
      topic,
      sessionId,
      timestamp: episode.endTime,
    });
    return episode;
  }

  async recallRelevantSessions(query: string, topK: number = 3): Promise<Episode[]> {
    // Find semantically similar past episodes
    const embedding = await this.generateEmbedding(query);
    const matches = await this.vectorStore.query(embedding, topK);
    return matches
      .filter((m: any) => m.metadata.type === 'episode')
      .map((m: any) => this.reconstructEpisode(m.metadata));
  }

  private async extractTopic(messages: Message[]): Promise<string> {
    const firstFew = messages.slice(0, 3).map((m) => m.content).join('\n');
    const prompt = `What is the main topic of this conversation?
${firstFew}
Respond with a 2-3 word topic, like "expense-approval" or "database-migration".`;
    return this.llmCall(prompt);
  }

  private async summarizeSession(messages: Message[]): Promise<string> {
    const conversation = messages.map((m) => `${m.role}: ${m.content}`).join('\n');
    const prompt = `Summarize this session in 3-4 sentences. Focus on what the user wanted, what tools were used, and what the outcome was.
${conversation}`;
    return this.llmCall(prompt);
  }

  private async extractDecisions(messages: Message[]): Promise<string[]> {
    const conversation = messages.map((m) => m.content).join('\n');
    const prompt = `Extract 2-3 key decisions or actions from this conversation:
${conversation}
Format as bullet points.`;
    const response = await this.llmCall(prompt);
    return response.split('\n').filter((line) => line.startsWith('-'));
  }

  // The members below are stubs so the example compiles; wire them to your
  // embedding API, vector store, and LLM client in a real system.
  private async generateEmbedding(text: string): Promise<number[]> {
    return [];
  }

  private vectorStore = {
    storeMemory: async (
      id: string,
      content: string,
      embedding: number[],
      metadata: Record<string, unknown>,
    ) => {},
    query: async (embedding: number[], topK: number): Promise<any[]> => [],
  };

  private async llmCall(prompt: string): Promise<string> {
    return '';
  }

  private reconstructEpisode(metadata: Record<string, unknown>): Episode {
    // Rebuild an Episode from stored metadata (stubbed)
    return {
      id: '',
      sessionId: '',
      startTime: 0,
      endTime: 0,
      topic: '',
      summary: '',
      keyDecisions: [],
      outcome: '',
    };
  }
}
```
Episodic memory captures what happened in past conversations without keeping full message history.
## Semantic Memory: Facts About User and Domain

Semantic memory is structured knowledge: facts about the user, domain, preferences, and constraints.
```typescript
interface SemanticFact {
  key: string; // e.g., "user.preferred_language"
  value: string;
  confidence: number; // 0-1: how sure are we?
  source: string; // Where did we learn this?
  timestamp: number; // When did we learn it?
}

class SemanticMemory {
  private facts: Map<string, SemanticFact> = new Map();

  async recordFact(key: string, value: string, source: string, confidence: number = 0.8): Promise<void> {
    const existing = this.facts.get(key);
    if (existing && existing.confidence >= confidence) {
      return; // Keep the more confident fact
    }
    this.facts.set(key, {
      key,
      value,
      confidence,
      source,
      timestamp: Date.now(),
    });
  }

  async getFact(key: string): Promise<string | null> {
    return this.facts.get(key)?.value ?? null;
  }

  async getAllFacts(): Promise<Record<string, string>> {
    const result: Record<string, string> = {};
    for (const [key, fact] of this.facts.entries()) {
      result[key] = fact.value;
    }
    return result;
  }

  async extractAndStoreFacts(messages: Message[]): Promise<void> {
    const conversation = messages.map((m) => m.content).join('\n');
    const prompt = `Extract facts about the user from this conversation.
Return JSON: { "facts": [{"key": "user.name", "value": "John", "confidence": 0.9}] }
${conversation}`;
    const response = await this.llmCall(prompt);
    try {
      const extracted = JSON.parse(response);
      for (const fact of extracted.facts) {
        await this.recordFact(fact.key, fact.value, 'conversation', fact.confidence);
      }
    } catch {
      // Model returned malformed JSON; skip this extraction pass
    }
  }

  async updateUserPreferences(userInputs: Record<string, string>): Promise<void> {
    // Explicit user preferences override extracted facts
    for (const [key, value] of Object.entries(userInputs)) {
      await this.recordFact(`user.${key}`, value, 'explicit', 1.0);
    }
  }

  private async llmCall(prompt: string): Promise<string> {
    return '{"facts": []}'; // Stub: call your LLM here
  }
}

// Example usage: personalize agent behavior based on stored facts
class PersonalizedAgent {
  private semanticMemory = new SemanticMemory();

  async runWithPersonalization(query: string, userId: string): Promise<string> {
    const facts = await this.semanticMemory.getAllFacts();
    const systemPrompt = `You are an AI assistant. Here's what you know about the user:
${Object.entries(facts)
  .map(([key, value]) => `- ${key}: ${value}`)
  .join('\n')}
Use this information to personalize your response. Remember their preferences and constraints.`;
    return this.llmCall(systemPrompt, query);
  }

  private async llmCall(system: string, query: string): Promise<string> {
    return ''; // Stub: call your LLM here
  }
}
```
Semantic facts are the difference between generic and personalized agents. Store what you learn about users and domains.
## Memory Write Strategy

Not everything should be stored. What's worth remembering?
```typescript
class MemoryWriteStrategy {
  async shouldStore(
    content: string,
    type: 'user_input' | 'assistant_response' | 'tool_result',
  ): Promise<boolean> {
    // Only store user inputs and important tool results
    if (type === 'assistant_response') {
      return false; // Don't store assistant outputs
    }
    // Check whether the content is actually informative
    const importance = await this.scoreImportance(content);
    return importance > 0.6;
  }

  private async scoreImportance(content: string): Promise<number> {
    // Heuristics:
    // - Contains numbers/dates (likely important)
    // - Contains domain-specific terms
    // - States a preference ("I prefer...", "I want...")
    // - States a constraint ("budget is...", "deadline is...")
    let score = 0;
    if (/\d+/.test(content)) score += 0.2; // Has numbers
    if (/(prefer|want|need|require)/i.test(content)) score += 0.3; // Preference
    if (/(budget|deadline|constraint)/i.test(content)) score += 0.3; // Constraint
    return Math.min(1, score);
  }
}
```

A better approach is an explicit storage policy per event type:

```typescript
interface StoragePolicy {
  [type: string]: {
    store: boolean;
    category: 'episodic' | 'semantic' | 'ignore';
    ttl?: number; // Time to live in seconds
  };
}

const memoryPolicy: StoragePolicy = {
  user_preference: {
    store: true,
    category: 'semantic',
  },
  temporary_note: {
    store: true,
    category: 'episodic',
    ttl: 86400, // 24 hours
  },
  system_error: {
    store: true,
    category: 'episodic',
    ttl: 3600, // 1 hour
  },
  chat_turn: {
    store: false,
    category: 'ignore',
  },
};
```
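A policy table still needs a dispatcher that consults it on every write. Here is a minimal sketch; `routeMemory` and the event types are illustrative, mirroring the `memoryPolicy` shape above:

```typescript
type Category = 'episodic' | 'semantic' | 'ignore';

interface PolicyEntry {
  store: boolean;
  category: Category;
  ttl?: number; // seconds
}

const policy: Record<string, PolicyEntry> = {
  user_preference: { store: true, category: 'semantic' },
  temporary_note: { store: true, category: 'episodic', ttl: 86400 },
  chat_turn: { store: false, category: 'ignore' },
};

// Decide where an event goes; unknown event types default to not storing
function routeMemory(type: string): { category: Category; expiresAt?: number } | null {
  const entry = policy[type];
  if (!entry || !entry.store) return null;
  return {
    category: entry.category,
    expiresAt: entry.ttl ? Date.now() + entry.ttl * 1000 : undefined,
  };
}
```

Defaulting unknown types to "don't store" is the safer failure mode: a forgotten policy entry costs you one memory, while the opposite default silently accumulates noise.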
Be selective about what you store. Focus on facts, preferences, and important context, not every message.
## Memory Retrieval Strategy

When should the agent recall past memories?
```typescript
class MemoryRetrievalStrategy {
  async shouldRetrieveMemories(query: string): Promise<boolean> {
    // Always retrieve when the user references the past
    const triggerWords = ['remember', 'previously', 'last time', 'before'];
    if (triggerWords.some((word) => query.toLowerCase().includes(word))) {
      return true;
    }
    // For new sessions, don't retrieve unless explicitly asked
    return false;
  }

  async retrieveContextualMemories(
    query: string,
    userId: string,
  ): Promise<{
    episodic: Episode[];
    semantic: Record<string, string>;
  }> {
    // Always get semantic facts (preferences, constraints)
    const semantic = await this.getSemanticFacts(userId);
    // Get episodic memories only when relevant
    const episodic = await this.getRelevantEpisodes(query, userId);
    return { episodic, semantic };
  }

  private async getSemanticFacts(userId: string): Promise<Record<string, string>> {
    return {}; // Stub: query the fact store for this user
  }

  private async getRelevantEpisodes(query: string, userId: string): Promise<Episode[]> {
    return []; // Stub: vector search over past interactions
  }
}
```
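Once retrieved, the two layers have to be folded into the prompt. A minimal sketch of assembling a system preamble (the function name and exact formatting are assumptions, not a fixed API):

```typescript
// Build a system preamble from retrieved memory layers: semantic facts are
// always included; episodic summaries only appear when retrieval found any.
function buildMemoryPreamble(
  semantic: Record<string, string>,
  episodicSummaries: string[],
): string {
  const parts: string[] = [];
  const facts = Object.entries(semantic).map(([k, v]) => `- ${k}: ${v}`);
  if (facts.length > 0) {
    parts.push(`Known facts:\n${facts.join('\n')}`);
  }
  if (episodicSummaries.length > 0) {
    parts.push(`Relevant past sessions:\n${episodicSummaries.map((s) => `- ${s}`).join('\n')}`);
  }
  return parts.join('\n\n');
}
```

Keeping the preamble builder pure makes it easy to unit test and to cap its size before it competes with the conversation for context tokens.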
Retrieve memories contextually: always semantic facts, episodic memories when relevant.
## Memory Compression

Memory stores grow unbounded. Compress old memories to retain information in less space.
```typescript
class MemoryCompressor {
  // In production this would be your vector store; a Map keeps the sketch simple
  private store = new Map<string, MemoryEntry>();

  async compressOldMemories(cutoffDays: number = 30): Promise<void> {
    const cutoff = Date.now() - cutoffDays * 86400 * 1000;
    const oldEntries = Array.from(this.store.entries())
      .filter(([, entry]) => entry.metadata.timestamp < cutoff)
      .slice(0, 100); // Process in batches

    for (const [id, entry] of oldEntries) {
      // Compress by summarizing, then replace the old entry in place
      const compressed = await this.compress(entry.content);
      this.store.set(id, {
        ...entry,
        content: compressed,
        metadata: {
          ...entry.metadata,
          compressed: true,
        },
      });
    }
  }

  private async compress(content: string): Promise<string> {
    if (content.length < 200) {
      return content; // Too short to be worth compressing
    }
    const prompt = `Compress this to 1-2 sentences, retaining only essential facts:
${content}`;
    return this.llmCall(prompt);
  }

  private async llmCall(prompt: string): Promise<string> {
    return ''; // Stub: call your LLM here
  }
}
```
Compress old episodic memories to long-term facts, reducing storage costs.
## Memory Privacy and Deletion

Users should control what's remembered about them and be able to delete their memories.
```typescript
class MemoryPrivacy {
  // In production this would be your vector store; a Map keeps the sketch simple
  private store = new Map<string, MemoryEntry>();

  async deleteUserMemories(userId: string): Promise<void> {
    // Delete every entry associated with this user
    const userEntries = await this.findUserEntries(userId);
    for (const entry of userEntries) {
      this.store.delete(entry.id);
    }
    console.log(`Deleted ${userEntries.length} memories for user ${userId}`);
  }

  async deleteMemoriesOlderThan(days: number): Promise<void> {
    const cutoff = Date.now() - days * 86400 * 1000;
    const oldEntries = await this.findEntriesBefore(cutoff);
    for (const entry of oldEntries) {
      this.store.delete(entry.id);
    }
  }

  async anonymizeMemories(userId: string): Promise<void> {
    // Remove PII while keeping factual content
    const userEntries = await this.findUserEntries(userId);
    for (const entry of userEntries) {
      const anonymized = await this.removePII(entry.content);
      this.store.set(entry.id, {
        ...entry,
        content: anonymized,
      });
    }
  }

  private async removePII(content: string): Promise<string> {
    // Regex scrubbing is a baseline only; the name pattern below is naive
    // and production systems should use a dedicated PII-detection service
    let cleaned = content;
    cleaned = cleaned.replace(/[\w.-]+@[\w.-]+\.\w+/g, '[EMAIL]');
    cleaned = cleaned.replace(/\d{3}-\d{3}-\d{4}/g, '[PHONE]');
    cleaned = cleaned.replace(/\b[A-Z][a-z]+ [A-Z][a-z]+\b/g, '[NAME]');
    return cleaned;
  }

  private async findUserEntries(userId: string): Promise<MemoryEntry[]> {
    return []; // Stub: filter the store by userId in metadata
  }

  private async findEntriesBefore(timestamp: number): Promise<MemoryEntry[]> {
    return []; // Stub: filter the store by timestamp
  }
}
```
Respect user privacy: provide deletion, anonymization, and explicit consent for memory storage.
## Checklist
- Short-term: Use conversation history in context window, summarize when full
- Long-term: Vector store with semantic search
- Episodic: Summarize past sessions, store as searchable memories
- Semantic: Extract and store facts about users and domain
- Compression: Periodically compress old memories to summaries
- Privacy: Support deletion, anonymization, and consent
## Conclusion
Memory systems turn agents from stateless chatbots into assistants that learn. Layer in-context history for immediate context, vector stores for semantic search, episodic summaries of past interactions, and semantic facts about users and domains. Compress old memories and always respect privacy. Good memory systems are what separate agents that feel intelligent from those that just answer the next question.