# AI Agent Memory — Short-Term Context, Long-Term Storage, and Episodic Recall

Author: Sanjeev Sharma (@webcoderspeed1)

## Introduction
Agents without memory are forgetful and inefficient. They repeat analysis, forget user preferences, and don't learn from past interactions. Effective memory systems layer multiple types: short-term context for immediate reasoning, episodic memory for past interactions, semantic memory for facts, and external stores for scale. This post explores production memory architectures for agents.
- In-Context Memory: Conversation History Management
- External Memory with Vector Stores
- Episodic Memory: Interaction Summaries
- Semantic Memory: Facts About User and Domain
- Memory Write Strategy
- Memory Retrieval Strategy
- Memory Compression
- Memory Privacy and Deletion
- Checklist
- Conclusion
## In-Context Memory: Conversation History Management

The simplest memory is conversation history kept in the context window. This works until it doesn't: context windows fill up.
```typescript
interface Message {
  role: 'user' | 'assistant' | 'system';
  content: string;
  timestamp: number;
  tokens?: number;
}

interface ConversationContext {
  sessionId: string;
  messages: Message[];
  totalTokens: number;
  maxContextTokens: number;
}

class ContextWindowManager {
  private maxContextTokens: number = 8000; // Leave room for the response

  async manageContext(context: ConversationContext): Promise<Message[]> {
    // Count tokens in the current messages
    const totalTokens = await this.estimateTokens(context.messages);
    if (totalTokens < this.maxContextTokens) {
      return context.messages; // All messages fit
    }
    // Context overflow: prune older messages
    return this.pruneContext(context.messages);
  }

  private async pruneContext(messages: Message[]): Promise<Message[]> {
    // Strategy 1: keep system messages + recent messages
    if (messages.length <= 10) {
      return messages;
    }
    // Keep system messages plus the last 10 non-system messages
    // (filtering avoids duplicating a system message that is also recent)
    const system = messages.filter((m) => m.role === 'system');
    const recent = messages.filter((m) => m.role !== 'system').slice(-10);
    return [...system, ...recent];
  }

  private async estimateTokens(messages: Message[]): Promise<number> {
    // Rough estimate: 1 token per 4 characters
    return messages.reduce((sum, msg) => sum + Math.ceil(msg.content.length / 4), 0);
  }
}

// BETTER: Summarize old messages instead of dropping them
class SummarizingContextManager {
  async manageContext(messages: Message[]): Promise<Message[]> {
    if (messages.length <= 20) {
      return messages;
    }
    const toSummarize = messages.slice(0, -10);
    const toKeep = messages.slice(-10);
    // Summarize the conversation before the last 10 messages
    const summary = await this.summarizeConversation(toSummarize);
    const summaryMessage: Message = {
      role: 'system',
      content: `[Earlier conversation summary]\n${summary}`,
      timestamp: Math.min(...toSummarize.map((m) => m.timestamp)),
    };
    return [summaryMessage, ...toKeep];
  }

  private async summarizeConversation(messages: Message[]): Promise<string> {
    const conversation = messages.map((m) => `${m.role}: ${m.content}`).join('\n');
    const prompt = `Summarize this conversation in 2-3 sentences, focusing on key decisions and facts discovered:
${conversation}`;
    // Call your LLM with `prompt` here; stubbed for brevity
    return 'Summary of conversation...';
  }
}

// BEST: Multi-level memory management
interface MultiLevelMemory {
  inContextMessages: Message[]; // Last N messages, in the context window
  episodicMemory: Message[]; // Summaries of past sessions
  semanticMemory: Map<string, string>; // Facts about the user/domain
}

class IntelligentContextManager {
  async buildContext(
    sessionId: string,
    currentQuery: string,
    memory: MultiLevelMemory,
  ): Promise<Message[]> {
    const contextMessages: Message[] = [];

    // Start with semantic facts about the user/domain
    if (memory.semanticMemory.size > 0) {
      const facts = Array.from(memory.semanticMemory.entries())
        .map(([key, value]) => `${key}: ${value}`)
        .join('\n');
      contextMessages.push({
        role: 'system',
        content: `Context about this user:\n${facts}`,
        timestamp: Date.now(),
      });
    }

    // Add summaries of past sessions (episodic memory)
    if (memory.episodicMemory.length > 0) {
      contextMessages.push({
        role: 'system',
        content: `Past interactions:\n${memory.episodicMemory.map((m) => m.content).join('\n')}`,
        timestamp: Date.now(),
      });
    }

    // Add the current conversation history (in-context)
    contextMessages.push(...memory.inContextMessages);

    // Add the current query
    contextMessages.push({
      role: 'user',
      content: currentQuery,
      timestamp: Date.now(),
    });
    return contextMessages;
  }
}
```
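To make the budget idea concrete, here is a minimal sketch that drops the oldest non-system messages until the rough 4-characters-per-token estimate fits. `Msg` and `fitToBudget` are illustrative names, not part of the classes above, and the heuristic is not a real tokenizer:

```typescript
interface Msg {
  role: 'user' | 'assistant' | 'system';
  content: string;
}

function estimateTokens(messages: Msg[]): number {
  // Same rough heuristic as above: ~1 token per 4 characters
  return messages.reduce((sum, m) => sum + Math.ceil(m.content.length / 4), 0);
}

function fitToBudget(messages: Msg[], maxTokens: number): Msg[] {
  const system = messages.filter((m) => m.role === 'system');
  let rest = messages.filter((m) => m.role !== 'system');
  // Drop from the front (oldest first) until the estimate fits the budget
  while (rest.length > 1 && estimateTokens([...system, ...rest]) > maxTokens) {
    rest = rest.slice(1);
  }
  return [...system, ...rest];
}
```

For real systems, replace the character heuristic with the model's actual tokenizer, since prompts in other languages or with code can be off by 2x or more.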
In-context memory is fast but expensive. Use it for immediate context, offload older content to external storage.
## External Memory with Vector Stores

Vector stores enable semantic search over large amounts of past information without fitting it all in context.
```typescript
interface MemoryEntry {
  id: string;
  content: string;
  embedding: number[];
  metadata: {
    type: 'interaction' | 'fact' | 'note';
    timestamp: number;
    sessionId?: string;
    source?: string;
    compressed?: boolean; // Set once the entry has been summarized (see Memory Compression)
  };
}

class VectorMemoryStore {
  private entries: Map<string, MemoryEntry> = new Map();

  async storeMemory(content: string, type: string, sessionId?: string): Promise<string> {
    const id = `mem-${Date.now()}-${Math.random()}`;
    // Generate an embedding using the OpenAI embeddings API (or a local model)
    const embedding = await this.generateEmbedding(content);
    const entry: MemoryEntry = {
      id,
      content,
      embedding,
      metadata: {
        type: type as 'interaction' | 'fact' | 'note',
        timestamp: Date.now(),
        sessionId,
      },
    };
    this.entries.set(id, entry);
    return id;
  }

  async retrieveRelevantMemories(query: string, topK: number = 5): Promise<MemoryEntry[]> {
    // Embed the query, then rank stored memories by cosine similarity
    const queryEmbedding = await this.generateEmbedding(query);
    const similarities = Array.from(this.entries.values()).map((entry) => ({
      entry,
      similarity: this.cosineSimilarity(queryEmbedding, entry.embedding),
    }));
    return similarities
      .sort((a, b) => b.similarity - a.similarity)
      .slice(0, topK)
      .map((s) => s.entry);
  }

  private cosineSimilarity(a: number[], b: number[]): number {
    let dotProduct = 0;
    let normA = 0;
    let normB = 0;
    for (let i = 0; i < a.length; i++) {
      dotProduct += a[i] * b[i];
      normA += a[i] * a[i];
      normB += b[i] * b[i];
    }
    return dotProduct / (Math.sqrt(normA) * Math.sqrt(normB));
  }

  private async generateEmbedding(text: string): Promise<number[]> {
    const response = await fetch('https://api.openai.com/v1/embeddings', {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${process.env.OPENAI_API_KEY}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        model: 'text-embedding-3-small',
        input: text,
      }),
    });
    if (!response.ok) {
      throw new Error(`Embedding request failed: ${response.status}`);
    }
    const data = (await response.json()) as any;
    return data.data[0].embedding;
  }
}
```

In production, use a managed vector database such as Pinecone, Weaviate, or Qdrant:

```typescript
import { Pinecone } from '@pinecone-database/pinecone';

class ProductionVectorStore {
  private client = new Pinecone({
    apiKey: process.env.PINECONE_API_KEY!,
  });

  async storeMemory(
    id: string,
    content: string,
    embedding: number[],
    metadata: Record<string, unknown>,
  ): Promise<void> {
    const index = this.client.index('agent-memory');
    await index.upsert([
      {
        id,
        values: embedding,
        metadata,
      },
    ]);
  }

  async retrieveRelevantMemories(
    embedding: number[],
    topK: number = 5,
  ): Promise<Array<{ id: string; metadata: Record<string, unknown> }>> {
    const index = this.client.index('agent-memory');
    const results = await index.query({
      vector: embedding,
      topK,
      includeMetadata: true,
    });
    return results.matches.map((m: any) => ({
      id: m.id,
      metadata: m.metadata,
    }));
  }
}
```
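The ranking step can be seen in isolation with hand-made two-dimensional vectors. This is a toy sketch: real embeddings have hundreds of dimensions and come from an embedding model, and the `memories` entries below are made up for illustration:

```typescript
// Cosine similarity over toy 2-d "embeddings"
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

const memories = [
  { id: 'm1', embedding: [1, 0] },     // e.g. "billing question"
  { id: 'm2', embedding: [0.9, 0.1] }, // a near-duplicate of m1
  { id: 'm3', embedding: [0, 1] },     // e.g. "deployment issue"
];

// A query pointing almost exactly at m1's direction
const query = [1, 0.05];
const ranked = memories
  .map((m) => ({ id: m.id, score: cosine(query, m.embedding) }))
  .sort((a, b) => b.score - a.score);
// Memories closest in direction rank first; orthogonal ones rank last
```

Because cosine similarity measures direction rather than magnitude, it is robust to embeddings of different norms, which is why it is the default metric in most vector stores.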
Vector stores scale to millions of memories and enable semantic recall without fitting everything in context.
## Episodic Memory: Interaction Summaries

Episodic memory stores summaries of past interactions. When should the agent recall them?
```typescript
interface Episode {
  id: string;
  sessionId: string;
  startTime: number;
  endTime: number;
  topic: string;
  summary: string;
  keyDecisions: string[];
  outcome: string;
}

class EpisodicMemoryManager {
  async storeSession(messages: Message[], sessionId: string): Promise<Episode> {
    // Extract key information from the session
    const topic = await this.extractTopic(messages);
    const summary = await this.summarizeSession(messages);
    const keyDecisions = await this.extractDecisions(messages);
    const outcome = messages[messages.length - 1].content;

    const episode: Episode = {
      id: `ep-${Date.now()}`,
      sessionId,
      startTime: messages[0].timestamp,
      endTime: messages[messages.length - 1].timestamp,
      topic,
      summary,
      keyDecisions,
      outcome,
    };

    // Store in the vector database for semantic search
    const embedding = await this.generateEmbedding(summary);
    await this.vectorStore.storeMemory(episode.id, summary, embedding, {
      type: 'episode',
      topic,
      sessionId,
      timestamp: episode.endTime,
    });
    return episode;
  }

  async recallRelevantSessions(query: string, topK: number = 3): Promise<Episode[]> {
    // Find semantically similar past episodes
    const embedding = await this.generateEmbedding(query);
    const matches = await this.vectorStore.query(embedding, topK);
    return matches
      .filter((m: any) => m.metadata.type === 'episode')
      .map((m: any) => this.reconstructEpisode(m.metadata));
  }

  private async extractTopic(messages: Message[]): Promise<string> {
    const firstFew = messages.slice(0, 3).map((m) => m.content).join('\n');
    const prompt = `What is the main topic of this conversation?
${firstFew}
Respond with a 2-3 word topic, like "expense-approval" or "database-migration".`;
    return this.llmCall(prompt);
  }

  private async summarizeSession(messages: Message[]): Promise<string> {
    const conversation = messages.map((m) => `${m.role}: ${m.content}`).join('\n');
    const prompt = `Summarize this session in 3-4 sentences. Focus on what the user wanted, what tools were used, and what the outcome was.
${conversation}`;
    return this.llmCall(prompt);
  }

  private async extractDecisions(messages: Message[]): Promise<string[]> {
    const conversation = messages.map((m) => m.content).join('\n');
    const prompt = `Extract 2-3 key decisions or actions from this conversation:
${conversation}
Format as bullet points.`;
    const response = await this.llmCall(prompt);
    return response.split('\n').filter((line) => line.startsWith('-'));
  }

  // The members below are stubs so the example compiles; wire them to your
  // embedding API, vector store, and LLM client in a real system.
  private async generateEmbedding(text: string): Promise<number[]> {
    return [];
  }

  private vectorStore = {
    storeMemory: async (
      id: string,
      content: string,
      embedding: number[],
      metadata: Record<string, unknown>,
    ) => {},
    query: async (embedding: number[], topK: number): Promise<any[]> => [],
  };

  private async llmCall(prompt: string): Promise<string> {
    return '';
  }

  private reconstructEpisode(metadata: Record<string, unknown>): Episode {
    // Rebuild an Episode from stored metadata (stubbed)
    return {
      id: '',
      sessionId: '',
      startTime: 0,
      endTime: 0,
      topic: '',
      summary: '',
      keyDecisions: [],
      outcome: '',
    };
  }
}
```
Episodic memory captures what happened in past conversations without keeping full message history.
## Semantic Memory: Facts About User and Domain

Semantic memory is structured knowledge: facts about the user, domain, preferences, and constraints.
```typescript
interface SemanticFact {
  key: string; // e.g., "user.preferred_language"
  value: string;
  confidence: number; // 0-1: how sure are we?
  source: string; // Where did we learn this?
  timestamp: number; // When did we learn it?
}

class SemanticMemory {
  private facts: Map<string, SemanticFact> = new Map();

  async recordFact(key: string, value: string, source: string, confidence: number = 0.8): Promise<void> {
    const existing = this.facts.get(key);
    if (existing && existing.confidence >= confidence) {
      return; // Keep the more confident fact
    }
    this.facts.set(key, {
      key,
      value,
      confidence,
      source,
      timestamp: Date.now(),
    });
  }

  async getFact(key: string): Promise<string | null> {
    return this.facts.get(key)?.value ?? null;
  }

  async getAllFacts(): Promise<Record<string, string>> {
    const result: Record<string, string> = {};
    for (const [key, fact] of this.facts.entries()) {
      result[key] = fact.value;
    }
    return result;
  }

  async extractAndStoreFacts(messages: Message[]): Promise<void> {
    const conversation = messages.map((m) => m.content).join('\n');
    const prompt = `Extract facts about the user from this conversation.
Return JSON: { "facts": [{"key": "user.name", "value": "John", "confidence": 0.9}] }
${conversation}`;
    const response = await this.llmCall(prompt);
    try {
      const extracted = JSON.parse(response);
      for (const fact of extracted.facts) {
        await this.recordFact(fact.key, fact.value, 'conversation', fact.confidence);
      }
    } catch {
      // Model returned malformed JSON; skip this extraction pass
    }
  }

  async updateUserPreferences(userInputs: Record<string, string>): Promise<void> {
    // Explicit user preferences override extracted facts
    for (const [key, value] of Object.entries(userInputs)) {
      await this.recordFact(`user.${key}`, value, 'explicit', 1.0);
    }
  }

  private async llmCall(prompt: string): Promise<string> {
    return '{"facts": []}'; // Stub: call your LLM here
  }
}

// Example usage: personalize agent behavior based on stored facts
class PersonalizedAgent {
  private semanticMemory = new SemanticMemory();

  async runWithPersonalization(query: string, userId: string): Promise<string> {
    const facts = await this.semanticMemory.getAllFacts();
    const systemPrompt = `You are an AI assistant. Here's what you know about the user:
${Object.entries(facts)
  .map(([key, value]) => `- ${key}: ${value}`)
  .join('\n')}
Use this information to personalize your response. Remember their preferences and constraints.`;
    return this.llmCall(systemPrompt, query);
  }

  private async llmCall(system: string, query: string): Promise<string> {
    return ''; // Stub: call your LLM here
  }
}
```
Semantic facts are the difference between generic and personalized agents. Store what you learn about users and domains.
## Memory Write Strategy

Not everything should be stored. What's worth remembering?
```typescript
class MemoryWriteStrategy {
  async shouldStore(
    content: string,
    type: 'user_input' | 'assistant_response' | 'tool_result',
  ): Promise<boolean> {
    // Only store user inputs and important tool results
    if (type === 'assistant_response') {
      return false; // Don't store assistant outputs
    }
    // Check whether the content is actually informative
    const importance = await this.scoreImportance(content);
    return importance > 0.6;
  }

  private async scoreImportance(content: string): Promise<number> {
    // Heuristics:
    // - Contains numbers/dates (likely important)
    // - Contains domain-specific terms
    // - States a preference ("I prefer...", "I want...")
    // - States a constraint ("budget is...", "deadline is...")
    let score = 0;
    if (/\d+/.test(content)) score += 0.2; // Has numbers
    if (/(prefer|want|need|require)/i.test(content)) score += 0.3; // Preference
    if (/(budget|deadline|constraint)/i.test(content)) score += 0.3; // Constraint
    return Math.min(1, score);
  }
}
```

A better approach is an explicit storage policy per event type:

```typescript
interface StoragePolicy {
  [type: string]: {
    store: boolean;
    category: 'episodic' | 'semantic' | 'ignore';
    ttl?: number; // Time to live in seconds
  };
}

const memoryPolicy: StoragePolicy = {
  user_preference: {
    store: true,
    category: 'semantic',
  },
  temporary_note: {
    store: true,
    category: 'episodic',
    ttl: 86400, // 24 hours
  },
  system_error: {
    store: true,
    category: 'episodic',
    ttl: 3600, // 1 hour
  },
  chat_turn: {
    store: false,
    category: 'ignore',
  },
};
```
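A policy table still needs a dispatcher that consults it on every write. Here is a minimal sketch; `routeMemory` and the event types are illustrative, mirroring the `memoryPolicy` shape above:

```typescript
type Category = 'episodic' | 'semantic' | 'ignore';

interface PolicyEntry {
  store: boolean;
  category: Category;
  ttl?: number; // seconds
}

const policy: Record<string, PolicyEntry> = {
  user_preference: { store: true, category: 'semantic' },
  temporary_note: { store: true, category: 'episodic', ttl: 86400 },
  chat_turn: { store: false, category: 'ignore' },
};

// Decide where an event goes; unknown event types default to not storing
function routeMemory(type: string): { category: Category; expiresAt?: number } | null {
  const entry = policy[type];
  if (!entry || !entry.store) return null;
  return {
    category: entry.category,
    expiresAt: entry.ttl ? Date.now() + entry.ttl * 1000 : undefined,
  };
}
```

Defaulting unknown types to "don't store" is the safer failure mode: a forgotten policy entry costs you one memory, while the opposite default silently accumulates noise.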
Be selective about what you store. Focus on facts, preferences, and important context, not every message.
## Memory Retrieval Strategy

When should the agent recall past memories?
```typescript
class MemoryRetrievalStrategy {
  async shouldRetrieveMemories(query: string): Promise<boolean> {
    // Always retrieve when the user references the past
    const triggerWords = ['remember', 'previously', 'last time', 'before'];
    if (triggerWords.some((word) => query.toLowerCase().includes(word))) {
      return true;
    }
    // For new sessions, don't retrieve unless explicitly asked
    return false;
  }

  async retrieveContextualMemories(
    query: string,
    userId: string,
  ): Promise<{
    episodic: Episode[];
    semantic: Record<string, string>;
  }> {
    // Always get semantic facts (preferences, constraints)
    const semantic = await this.getSemanticFacts(userId);
    // Get episodic memories only when relevant
    const episodic = await this.getRelevantEpisodes(query, userId);
    return { episodic, semantic };
  }

  private async getSemanticFacts(userId: string): Promise<Record<string, string>> {
    return {}; // Stub: query the fact store for this user
  }

  private async getRelevantEpisodes(query: string, userId: string): Promise<Episode[]> {
    return []; // Stub: vector search over past interactions
  }
}
```
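Once retrieved, the two layers have to be folded into the prompt. A minimal sketch of assembling a system preamble (the function name and exact formatting are assumptions, not a fixed API):

```typescript
// Build a system preamble from retrieved memory layers: semantic facts are
// always included; episodic summaries only appear when retrieval found any.
function buildMemoryPreamble(
  semantic: Record<string, string>,
  episodicSummaries: string[],
): string {
  const parts: string[] = [];
  const facts = Object.entries(semantic).map(([k, v]) => `- ${k}: ${v}`);
  if (facts.length > 0) {
    parts.push(`Known facts:\n${facts.join('\n')}`);
  }
  if (episodicSummaries.length > 0) {
    parts.push(`Relevant past sessions:\n${episodicSummaries.map((s) => `- ${s}`).join('\n')}`);
  }
  return parts.join('\n\n');
}
```

Keeping the preamble builder pure makes it easy to unit test and to cap its size before it competes with the conversation for context tokens.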
Retrieve memories contextually: always semantic facts, episodic memories when relevant.
## Memory Compression

Memory stores grow unbounded. Compress old memories to retain information in less space.
```typescript
class MemoryCompressor {
  // In production this would be your vector store; a Map keeps the sketch simple
  private store = new Map<string, MemoryEntry>();

  async compressOldMemories(cutoffDays: number = 30): Promise<void> {
    const cutoff = Date.now() - cutoffDays * 86400 * 1000;
    const oldEntries = Array.from(this.store.entries())
      .filter(([, entry]) => entry.metadata.timestamp < cutoff)
      .slice(0, 100); // Process in batches

    for (const [id, entry] of oldEntries) {
      // Compress by summarizing, then replace the old entry in place
      const compressed = await this.compress(entry.content);
      this.store.set(id, {
        ...entry,
        content: compressed,
        metadata: {
          ...entry.metadata,
          compressed: true,
        },
      });
    }
  }

  private async compress(content: string): Promise<string> {
    if (content.length < 200) {
      return content; // Too short to be worth compressing
    }
    const prompt = `Compress this to 1-2 sentences, retaining only essential facts:
${content}`;
    return this.llmCall(prompt);
  }

  private async llmCall(prompt: string): Promise<string> {
    return ''; // Stub: call your LLM here
  }
}
```
Compress old episodic memories to long-term facts, reducing storage costs.
## Memory Privacy and Deletion

Users should control what's remembered about them and be able to delete their memories.
```typescript
class MemoryPrivacy {
  // In production this would be your vector store; a Map keeps the sketch simple
  private store = new Map<string, MemoryEntry>();

  async deleteUserMemories(userId: string): Promise<void> {
    // Delete every entry associated with this user
    const userEntries = await this.findUserEntries(userId);
    for (const entry of userEntries) {
      this.store.delete(entry.id);
    }
    console.log(`Deleted ${userEntries.length} memories for user ${userId}`);
  }

  async deleteMemoriesOlderThan(days: number): Promise<void> {
    const cutoff = Date.now() - days * 86400 * 1000;
    const oldEntries = await this.findEntriesBefore(cutoff);
    for (const entry of oldEntries) {
      this.store.delete(entry.id);
    }
  }

  async anonymizeMemories(userId: string): Promise<void> {
    // Remove PII while keeping factual content
    const userEntries = await this.findUserEntries(userId);
    for (const entry of userEntries) {
      const anonymized = await this.removePII(entry.content);
      this.store.set(entry.id, {
        ...entry,
        content: anonymized,
      });
    }
  }

  private async removePII(content: string): Promise<string> {
    // Regex scrubbing is a baseline only; the name pattern below is naive
    // and production systems should use a dedicated PII-detection service
    let cleaned = content;
    cleaned = cleaned.replace(/[\w.-]+@[\w.-]+\.\w+/g, '[EMAIL]');
    cleaned = cleaned.replace(/\d{3}-\d{3}-\d{4}/g, '[PHONE]');
    cleaned = cleaned.replace(/\b[A-Z][a-z]+ [A-Z][a-z]+\b/g, '[NAME]');
    return cleaned;
  }

  private async findUserEntries(userId: string): Promise<MemoryEntry[]> {
    return []; // Stub: filter the store by userId in metadata
  }

  private async findEntriesBefore(timestamp: number): Promise<MemoryEntry[]> {
    return []; // Stub: filter the store by timestamp
  }
}
```
Respect user privacy: provide deletion, anonymization, and explicit consent for memory storage.
## Checklist
- Short-term: Use conversation history in context window, summarize when full
- Long-term: Vector store with semantic search
- Episodic: Summarize past sessions, store as searchable memories
- Semantic: Extract and store facts about users and domain
- Compression: Periodically compress old memories to summaries
- Privacy: Support deletion, anonymization, and consent
## Conclusion
Memory systems turn agents from stateless chatbots into assistants that learn. Layer in-context history for immediate context, vector stores for semantic search, episodic summaries of past interactions, and semantic facts about users and domains. Compress old memories and always respect privacy. Good memory systems are what separate agents that feel intelligent from those that just answer the next question.