Continual Learning for AI Systems — Keeping Models Fresh Without Catastrophic Forgetting
Introduction
A model trained in 2024 is stale by 2026. New events, discoveries, and user feedback all call for updates, but fine-tuning on new data often causes catastrophic forgetting: the model loses capabilities it learned during pre-training and earlier fine-tuning. This guide covers continual learning strategies for keeping models fresh without breaking existing capabilities.
- The Knowledge Cutoff Problem
- Fine-Tuning on New Data
- Elastic Weight Consolidation (EWC)
- Experience Replay
- Adapter-Based Continual Learning
- RAG as Alternative to Continual Learning
- Scheduled Re-Training Pipelines
- Drift Detection Triggers
- Checklist
- Conclusion
The Knowledge Cutoff Problem
LLMs have fixed knowledge cutoffs. A model trained through March 2025 doesn't know about events after that date. Users ask about recent news, products, and events that didn't exist during training.
There are three main approaches:
- Fine-tune on new data: Train on recent documents, but risks forgetting old knowledge
- Retrieval-Augmented Generation (RAG): Keep external knowledge base current, don't update model
- Hybrid: Update model parameters on critical knowledge, use RAG for everything else
interface KnowledgeManagementStrategy {
approach: 'finetune' | 'rag' | 'hybrid';
knowledgeCutoffDate: Date;
updateFrequency: 'weekly' | 'monthly' | 'quarterly';
costPerUpdate: number;
latency: number;
}
class KnowledgeManager {
private strategy: KnowledgeManagementStrategy;
private externalKB: Map<string, string> = new Map();
constructor(strategy: KnowledgeManagementStrategy) {
this.strategy = strategy;
}
async answerQuery(query: string, model: LLMModel): Promise<string> {
if (this.strategy.approach === 'rag') {
// Always retrieve context
const context = await this.retrieveContext(query);
return model.generate(`Context: ${context}\n\nQuery: ${query}`);
}
if (this.strategy.approach === 'finetune') {
// Rely on fine-tuned knowledge
return model.generate(query);
}
if (this.strategy.approach === 'hybrid') {
// Retrieve for recent/specialized knowledge
const isRecentTopic = await this.classifyAsRecent(query);
if (isRecentTopic) {
const context = await this.retrieveContext(query);
return model.generate(`Context: ${context}\n\nQuery: ${query}`);
}
return model.generate(query);
}
return '';
}
private async retrieveContext(query: string): Promise<string> {
// Search external knowledge base
return '';
}
private async classifyAsRecent(query: string): Promise<boolean> {
// Use NER to detect recent entities/dates
return false;
}
}
Fine-Tuning on New Data
The simplest approach is to collect new examples and fine-tune on them directly. The risk is catastrophic forgetting.
interface FinetuneBatch {
examples: TrainingExample[];
domain: string;
priority: 'critical' | 'normal';
}
async function finetuneOnNewData(
baseModel: LLMModel,
newData: FinetuneBatch[],
validationSet: TrainingExample[]
): Promise<LLMModel> {
const model = baseModel.clone();
for (const batch of newData) {
// Train on new data
for (const example of batch.examples) {
const loss = await model.computeLoss(example.instruction, example.response);
await model.backward(loss, /* learningRate */ 5e-6);
}
}
// Validate that we haven't forgotten old knowledge
const validationLoss = await evaluateOnValidationSet(model, validationSet);
console.log(`Validation loss after fine-tuning: ${validationLoss}`);
if (validationLoss > 0.5) {
console.warn('Significant forgetting detected!');
}
return model;
}
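The fixed 0.5 threshold above is one option; a relative check is often more robust. A minimal sketch (the 5% tolerance and the loss values are illustrative assumptions): compare held-out old-task loss before and after fine-tuning and flag regressions.

```typescript
// Relative forgetting: fractional increase in old-task validation loss
// after fine-tuning on new data.
function relativeForgetting(lossBefore: number, lossAfter: number): number {
  return (lossAfter - lossBefore) / lossBefore;
}

// Flag a run when the old-task loss regresses by more than `tolerance`
// (5% here, a hypothetical default to tune per deployment).
function forgettingDetected(
  lossBefore: number,
  lossAfter: number,
  tolerance: number = 0.05
): boolean {
  return relativeForgetting(lossBefore, lossAfter) > tolerance;
}
```

Unlike an absolute cutoff, this stays meaningful whether the baseline loss is 0.3 or 3.0.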
Elastic Weight Consolidation (EWC)
EWC penalizes changes to parameters that were important for the original training. Parameters that matter for the original task carry high Fisher information, so changing them is expensive.
class ElasticWeightConsolidation {
private fisherMatrix: Map<string, number[][]> = new Map();
private lambda: number = 0.4; // Weight of EWC penalty
async computeFisherInformation(
model: LLMModel,
calibrationSet: TrainingExample[]
): Promise<void> {
// Fisher information matrix: how much does loss change with parameter changes?
// F = E[(∇ log p(y|x))²]
for (const example of calibrationSet) {
const grad = await model.computeGradient(example.instruction);
for (const [paramName, gradValues] of Object.entries(grad)) {
if (!this.fisherMatrix.has(paramName)) {
this.fisherMatrix.set(paramName, []);
}
const fisher = this.fisherMatrix.get(paramName)!;
fisher.push(gradValues.map((g) => g * g));
}
}
// Average Fisher information
for (const [paramName, fisher] of this.fisherMatrix.entries()) {
const avg = fisher[0].map((_, i) =>
fisher.reduce((sum, row) => sum + row[i], 0) / fisher.length
);
this.fisherMatrix.set(paramName, [avg]);
}
}
async finetuneWithEWC(
model: LLMModel,
newData: TrainingExample[],
learningRate: number = 5e-6
): Promise<LLMModel> {
const originalParams = model.getParameters();
for (const example of newData) {
const loss = await model.computeLoss(example.instruction, example.response);
// EWC penalty: penalize deviation from original parameters
let ewcPenalty = 0;
const currentParams = model.getParameters();
for (const [paramName, fisher] of this.fisherMatrix.entries()) {
const originalParam = originalParams[paramName];
const currentParam = currentParams[paramName];
// Penalty proportional to: Fisher Information * (change)², element-wise
for (let i = 0; i < currentParam.length; i++) {
const diff = currentParam[i] - originalParam[i];
ewcPenalty += fisher[0][i] * diff * diff;
}
}
const totalLoss = loss + this.lambda * ewcPenalty;
await model.backward(totalLoss, learningRate);
}
return model;
}
}
async function elasticWeightConsolidation(
baseModel: LLMModel,
calibrationSet: TrainingExample[],
newData: TrainingExample[]
): Promise<LLMModel> {
const ewc = new ElasticWeightConsolidation();
// Step 1: Compute Fisher Information on original task
await ewc.computeFisherInformation(baseModel, calibrationSet);
// Step 2: Fine-tune with EWC penalty
return ewc.finetuneWithEWC(baseModel, newData);
}
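To build intuition for the penalty term, here is a tiny standalone calculation with made-up numbers (not from any real model): both parameters receive the same +0.1 shift, but the one with 100x the Fisher information dominates the penalty.

```typescript
// Toy EWC penalty: lambda * sum_i F_i * (theta_i - thetaStar_i)^2.
// Fisher values and parameters are illustrative, not from a trained model.
function ewcPenalty(
  fisher: number[],   // per-parameter Fisher information
  original: number[], // parameters after the original task (theta*)
  current: number[],  // parameters during fine-tuning (theta)
  lambda: number
): number {
  let penalty = 0;
  for (let i = 0; i < fisher.length; i++) {
    const diff = current[i] - original[i];
    penalty += fisher[i] * diff * diff;
  }
  return lambda * penalty;
}

// Same +0.1 shift on both parameters, but parameter 0 has 100x the
// Fisher information, so it contributes 100x the penalty.
const penalty = ewcPenalty([10.0, 0.1], [1.0, 1.0], [1.1, 1.1], 0.4);
console.log(penalty); // ~0.0404, almost entirely from parameter 0
```

The asymmetry is the point: gradient descent is free to move low-Fisher parameters toward the new task while high-Fisher parameters stay pinned near their original values.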
Experience Replay
Keep a buffer of old examples. When fine-tuning on new data, also replay old examples to prevent forgetting:
class ExperienceReplayBuffer {
private buffer: TrainingExample[] = [];
private maxSize: number = 1000;
addExamples(examples: TrainingExample[]): void {
for (const example of examples) {
this.buffer.push(example);
if (this.buffer.length > this.maxSize) {
// Remove oldest example
this.buffer.shift();
}
}
}
sampleBatch(batchSize: number): TrainingExample[] {
const batch: TrainingExample[] = [];
if (this.buffer.length === 0) return batch;
for (let i = 0; i < batchSize; i++) {
// Sample with replacement
const idx = Math.floor(Math.random() * this.buffer.length);
batch.push(this.buffer[idx]);
}
return batch;
}
}
async function trainWithExperienceReplay(
model: LLMModel,
newData: TrainingExample[],
replayBuffer: ExperienceReplayBuffer,
replayRatio: number = 0.2 // Replayed examples added per batch, as a fraction of new-batch size
): Promise<LLMModel> {
const totalBatches = Math.ceil(newData.length / 32);
for (let batch = 0; batch < totalBatches; batch++) {
const newBatch = newData.slice(batch * 32, (batch + 1) * 32);
// Mix in replayed experiences
const replaySize = Math.floor(newBatch.length * replayRatio);
const replayBatch = replayBuffer.sampleBatch(replaySize);
const mixedBatch = [...newBatch, ...replayBatch];
for (const example of mixedBatch) {
const loss = await model.computeLoss(example.instruction, example.response);
await model.backward(loss, /* learningRate */ 5e-6);
}
}
// Add new examples to buffer for future replays
replayBuffer.addExamples(newData);
return model;
}
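One caveat: the FIFO eviction above biases the buffer toward recent data, so the oldest knowledge is the first to disappear. A common alternative is reservoir sampling, which keeps a uniform random sample over every example ever seen. A minimal sketch:

```typescript
// Reservoir sampling (algorithm R): after n insertions, every example ever
// added has an equal maxSize/n chance of still being in the buffer.
class ReservoirReplayBuffer<T> {
  private buffer: T[] = [];
  private seen = 0;

  constructor(private maxSize: number) {}

  add(example: T): void {
    this.seen++;
    if (this.buffer.length < this.maxSize) {
      // Fill phase: keep everything until the buffer is full
      this.buffer.push(example);
    } else {
      // Replace a random slot with probability maxSize / seen
      const idx = Math.floor(Math.random() * this.seen);
      if (idx < this.maxSize) {
        this.buffer[idx] = example;
      }
    }
  }

  size(): number {
    return this.buffer.length;
  }
}
```

This trades recency for coverage, which is usually what replay is for: protecting old knowledge, not the latest batch.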
Adapter-Based Continual Learning
Rather than updating all model weights, train small adapter layers. The original weights stay frozen while adapters learn new tasks:
interface Adapter {
name: string;
domain: string;
parameters: Map<string, number[]>;
trainableParams: number;
}
class AdapterModule {
private baseModel: LLMModel;
private adapters: Map<string, Adapter> = new Map();
async addAdapterForDomain(
domain: string,
trainingData: TrainingExample[]
): Promise<void> {
const adapter: Adapter = {
name: `adapter-${domain}`,
domain,
parameters: new Map(),
trainableParams: 1000 // Small adapter
};
// Initialize small adapter networks
// In production: use LoRA (Low-Rank Adaptation)
// output = original_output + adapter_layer(x)
// Train adapter
for (const example of trainingData) {
// Forward pass through base model + adapter
const baseOutput = await this.baseModel.generate(example.instruction);
// Compute adapter loss (only adapter weights change)
const loss = await this.computeAdapterLoss(
example.instruction,
baseOutput,
example.response
);
await this.backpropagateAdapter(adapter, loss);
}
this.adapters.set(domain, adapter);
}
async generateWithAdapter(
instruction: string,
domain: string
): Promise<string> {
const adapter = this.adapters.get(domain);
if (!adapter) {
return this.baseModel.generate(instruction);
}
// Base model output + adapter refinement
const baseOutput = await this.baseModel.generate(instruction);
const refinedOutput = await this.applyAdapter(
instruction,
baseOutput,
adapter
);
return refinedOutput;
}
private async computeAdapterLoss(
instruction: string,
baseOutput: string,
expected: string
): Promise<number> {
// Compute loss only on adapter contribution
return 0.5;
}
private async backpropagateAdapter(adapter: Adapter, loss: number): Promise<void> {
// Update adapter parameters, not base model
}
private async applyAdapter(
instruction: string,
baseOutput: string,
adapter: Adapter
): Promise<string> {
// Apply domain-specific adaptation
return baseOutput;
}
}
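The comments above mention LoRA, and its core mechanism fits in a few lines: freeze the base weight matrix W and learn a low-rank update, so the layer computes y = Wx + (alpha/r)·B(Ax), where A and B are small trainable matrices of rank r. A toy dense-matrix sketch (dimensions and values are illustrative; in a real transformer this is applied to the attention weight matrices):

```typescript
// LoRA forward pass: y = W x + (alpha / r) * B (A x).
// W is frozen (dOut x dIn); A (r x dIn) and B (dOut x r) are the only
// trainable matrices, with r << min(dOut, dIn).
function matVec(m: number[][], v: number[]): number[] {
  return m.map((row) => row.reduce((sum, w, i) => sum + w * v[i], 0));
}

function loraForward(
  W: number[][], // frozen base weights
  A: number[][], // low-rank down-projection (trainable)
  B: number[][], // low-rank up-projection (trainable)
  x: number[],
  alpha: number  // scaling hyperparameter
): number[] {
  const r = A.length;
  const base = matVec(W, x);
  const update = matVec(B, matVec(A, x));
  return base.map((y, i) => y + (alpha / r) * update[i]);
}

// With B initialized to zero (the usual LoRA init), the adapter starts as
// a no-op: the output equals the frozen base layer's output.
```

Because only A and B are trained, each domain's adapter is a few megabytes instead of a full model copy, and swapping domains means swapping adapters.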
RAG as Alternative to Continual Learning
Rather than updating model parameters, maintain an external knowledge base that is always current:
class RAGSystem {
private vectorDB: VectorDatabase;
private retriever: Retriever;
async indexNewDocuments(documents: Document[]): Promise<void> {
for (const doc of documents) {
const embedding = await this.vectorDB.embed(doc.text);
await this.vectorDB.index(doc.id, embedding, doc.text);
}
}
async answer(query: string, model: LLMModel): Promise<string> {
// Retrieve relevant context
const context = await this.retriever.retrieve(query, /* topK */ 5);
// Augment prompt with context
const augmentedPrompt = `
You are a helpful assistant. Use the provided context to answer the question.
Context:
${context.map((c) => c.text).join('\n\n')}
Question: ${query}
Answer:`;
return model.generate(augmentedPrompt);
}
}
interface Document {
id: string;
text: string;
metadata: {
source: string;
date: Date;
};
}
interface VectorDatabase {
embed(text: string): Promise<number[]>;
index(id: string, embedding: number[], text: string): Promise<void>;
}
interface Retriever {
retrieve(query: string, topK: number): Promise<Document[]>;
}
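The Retriever interface is left abstract above. For completeness, here is a minimal in-memory implementation that ranks documents by cosine similarity; the caller supplies query and document embeddings, standing in for a real embedding model (a sketch, not a production vector database):

```typescript
interface ScoredDoc {
  id: string;
  text: string;
  score: number;
}

// Cosine similarity between two equal-length vectors; returns 0 for a
// zero vector instead of dividing by zero.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  const denom = Math.sqrt(na) * Math.sqrt(nb);
  return denom === 0 ? 0 : dot / denom;
}

class InMemoryRetriever {
  private docs: { id: string; text: string; embedding: number[] }[] = [];

  index(id: string, embedding: number[], text: string): void {
    this.docs.push({ id, text, embedding });
  }

  // Score every document against the query embedding, highest first
  retrieve(queryEmbedding: number[], topK: number): ScoredDoc[] {
    return this.docs
      .map((d) => ({ id: d.id, text: d.text, score: cosine(queryEmbedding, d.embedding) }))
      .sort((a, b) => b.score - a.score)
      .slice(0, topK);
  }
}
```

A linear scan like this is fine up to tens of thousands of documents; beyond that, an approximate nearest-neighbor index is the usual next step.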
Scheduled Re-Training Pipelines
Automate model updates on a schedule:
interface RetrainingSchedule {
frequency: 'weekly' | 'monthly' | 'quarterly';
trainingDataSource: string;
evaluationMetrics: string[];
minImprovement: number; // Only deploy if metrics improve by >this
autoRollback: boolean;
}
class ScheduledRetrainingPipeline {
async executePipeline(schedule: RetrainingSchedule): Promise<void> {
console.log(`Starting scheduled retraining (${schedule.frequency})`);
// Step 1: Collect new training data
const newData = await this.collectNewData(schedule.trainingDataSource);
// Step 2: Train new model
const newModel = await this.trainModel(newData);
// Step 3: Evaluate
const metrics = await this.evaluateModel(newModel, schedule.evaluationMetrics);
// Step 4: Compare with current model
const currentMetrics = await this.getCurrentMetrics();
const improvement = this.computeImprovement(metrics, currentMetrics);
if (improvement > schedule.minImprovement) {
// Deploy new model
await this.deployModel(newModel);
console.log(`Deployed new model with ${(improvement * 100).toFixed(2)}% improvement`);
} else {
console.log(`New model did not meet improvement threshold (${(improvement * 100).toFixed(2)}%)`);
}
}
private async collectNewData(source: string): Promise<TrainingExample[]> {
// Collect logs, user feedback, labeled corrections
return [];
}
private async trainModel(data: TrainingExample[]): Promise<LLMModel> {
// Fine-tune or train from scratch
return new LLMModel();
}
private async evaluateModel(
model: LLMModel,
metrics: string[]
): Promise<Record<string, number>> {
return {};
}
private async getCurrentMetrics(): Promise<Record<string, number>> {
return {};
}
private computeImprovement(
newMetrics: Record<string, number>,
oldMetrics: Record<string, number>
): number {
// Average improvement across metrics
let totalImprovement = 0;
for (const key in newMetrics) {
totalImprovement += (newMetrics[key] - oldMetrics[key]) / oldMetrics[key];
}
return totalImprovement / Object.keys(newMetrics).length;
}
private async deployModel(model: LLMModel): Promise<void> {
// Canary deploy or shadow test before full rollout
}
}
Drift Detection Triggers
Monitor for data distribution shift and automatically trigger retraining:
class DriftDetector {
private recentQueries: string[] = [];
private windowSize: number = 1000;
async detectDrift(newQueries: string[]): Promise<boolean> {
// Update window
this.recentQueries.push(...newQueries);
if (this.recentQueries.length > this.windowSize) {
this.recentQueries = this.recentQueries.slice(-this.windowSize);
}
// Check for distribution shift
const historicalDistribution = await this.getHistoricalDistribution();
const currentDistribution = await this.analyzeQueries(this.recentQueries);
// Use KL divergence or Wasserstein distance
const divergence = this.computeKLDivergence(
historicalDistribution,
currentDistribution
);
const threshold = 0.1;
if (divergence > threshold) {
console.log(`Drift detected! KL divergence: ${divergence}`);
return true;
}
return false;
}
private async getHistoricalDistribution(): Promise<Distribution> {
return {};
}
private async analyzeQueries(queries: string[]): Promise<Distribution> {
// Analyze query topics, entities, lengths, etc.
return {};
}
private computeKLDivergence(
dist1: Distribution,
dist2: Distribution
): number {
// KL(P||Q) = Σ P(x) * log(P(x) / Q(x))
return 0.05;
}
}
type Distribution = Record<string, number>;
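The computeKLDivergence stub can be filled in concretely. Treating a Distribution as a map from category (topic, entity type, query-length bucket) to probability mass, KL(P||Q) = Σ P(x)·log(P(x)/Q(x)); additive smoothing keeps categories present in only one window from producing infinities (the epsilon value here is an assumption to tune):

```typescript
type Dist = Record<string, number>;

// KL(P || Q) = sum_x P(x) * ln(P(x) / Q(x)), with additive smoothing and
// re-normalization so both inputs remain valid probability distributions.
function klDivergence(p: Dist, q: Dist, epsilon: number = 1e-9): number {
  const keys = new Set([...Object.keys(p), ...Object.keys(q)]);
  let pz = 0, qz = 0;
  for (const k of keys) {
    pz += (p[k] ?? 0) + epsilon;
    qz += (q[k] ?? 0) + epsilon;
  }
  let kl = 0;
  for (const k of keys) {
    const pk = ((p[k] ?? 0) + epsilon) / pz;
    const qk = ((q[k] ?? 0) + epsilon) / qz;
    kl += pk * Math.log(pk / qk);
  }
  return kl;
}
```

KL is asymmetric: klDivergence(current, historical) weights the divergence by what users are asking *now*, which is usually the right orientation for a drift trigger.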
Checklist
- Choose knowledge management strategy: fine-tune, RAG, or hybrid
- Use EWC or experience replay if fine-tuning to prevent catastrophic forgetting
- Consider adapters for domain-specific learning without modifying base model
- Implement RAG for frequently-updated knowledge (news, products, current events)
- Set up scheduled retraining pipelines with automatic evaluation
- Monitor for data distribution drift and trigger retraining when detected
- Track model performance on old tasks while adding new capabilities
- Implement rollback triggers if new model degrades existing performance
- Build experience replay buffer to maintain old knowledge
- Version training data alongside model versions
Conclusion
Models grow stale as the world changes. Fine-tuning on new data is simple but causes forgetting. EWC and experience replay mitigate forgetting by protecting important parameters and replaying old examples. Adapters enable domain-specific learning without modifying base weights. RAG keeps knowledge current without model updates. Scheduled retraining with drift detection automates freshness. Together, these techniques enable continual learning without breaking production performance.