LLM Conversation Design — System Prompts, Personas, and Context Management

Introduction

The quality of LLM outputs depends heavily on prompt design. A well-structured system prompt acts as the foundation for consistent, predictable behavior. This post explores production-grade prompt architecture, covering system prompt structure, persona design, few-shot examples, and context management strategies.

System Prompt Architecture

Effective system prompts combine role definition, constraints, output format specification, and examples.

interface SystemPromptConfig {
  role: string;
  constraints: string[];
  outputFormat: string;
  examples: Example[];
}

interface Example {
  input: string;
  output: string;
  explanation?: string;
}

function buildSystemPrompt(config: SystemPromptConfig): string {
  const parts: string[] = [];

  // Role definition
  parts.push(`You are a ${config.role}.`);

  // Constraints
  if (config.constraints.length > 0) {
    parts.push("\nConstraints:");
    config.constraints.forEach((constraint, i) => {
      parts.push(`${i + 1}. ${constraint}`);
    });
  }

  // Output format
  if (config.outputFormat) {
    parts.push(`\nOutput Format:\n${config.outputFormat}`);
  }

  // Examples
  if (config.examples.length > 0) {
    parts.push("\nExamples:");
    config.examples.forEach((example, i) => {
      parts.push(`\nExample ${i + 1}:`);
      parts.push(`Input: ${example.input}`);
      parts.push(`Output: ${example.output}`);
    });
  }

  return parts.join("\n");
}

const customerServicePrompt = buildSystemPrompt({
  role: "helpful customer service representative for an e-commerce platform",
  constraints: [
    "Respond within 2-3 sentences",
    "Never make promises about refunds without manager approval",
    "Acknowledge customer frustration empathetically",
    "Escalate complex issues to specialized teams",
  ],
  outputFormat: `
Respond in this format:
1. Acknowledgment of the issue
2. Proposed solution or next steps
3. Expected timeline if applicable
  `,
  examples: [
    {
      input: "My order hasn't arrived after 2 weeks!",
      output:
        "I understand how frustrating that must be. Let me look up your order tracking—can you provide your order number? We typically deliver within 5-7 business days, and I'll investigate what happened.",
    },
    {
      input: "I want a refund.",
      output:
        "I'd be happy to help resolve this. To process a refund request, I'll need to understand what went wrong with your purchase. Can you tell me what the issue was?",
    },
  ],
});

import Anthropic from "@anthropic-ai/sdk";

async function callLLMWithSystemPrompt(
  userMessage: string,
  systemPrompt: string
): Promise<string> {
  const client = new Anthropic();

  const response = await client.messages.create({
    model: "claude-3-5-sonnet-20241022",
    max_tokens: 512,
    system: systemPrompt,
    messages: [{ role: "user", content: userMessage }],
  });

  return response.content[0].type === "text" ? response.content[0].text : "";
}

Persona Definition for Different Use Cases

Define distinct personas for different user-facing roles. Store persona configurations for consistent behavior.

interface PersonaConfig {
  name: string;
  role: string;
  tone: "formal" | "casual" | "technical" | "empathetic";
  expertise: string[];
  restrictions: string[];
}

const personas: Record<string, PersonaConfig> = {
  technical_support: {
    name: "Technical Support Specialist",
    role: "expert technical support engineer",
    tone: "technical",
    expertise: [
      "API integration",
      "SDK usage",
      "Debugging",
      "Performance optimization",
    ],
    restrictions: [
      "Only discuss technical topics",
      "Refer billing questions to finance team",
      "Don't make commitments about features",
    ],
  },

  product_expert: {
    name: "Product Expert",
    role: "knowledgeable product specialist",
    tone: "casual",
    expertise: ["product features", "use cases", "industry best practices"],
    restrictions: [
      "Stay on-brand in messaging",
      "Link to official documentation",
    ],
  },

  data_analyst: {
    name: "Data Analyst",
    role: "data analytics expert",
    tone: "formal",
    expertise: ["statistics", "data interpretation", "SQL", "Python"],
    restrictions: [
      "Only recommend statistical methods",
      "Question data quality before analysis",
    ],
  },
};

function buildPersonaSystemPrompt(persona: PersonaConfig): string {
  const toneGuidance = {
    formal: "Use professional, precise language.",
    casual: "Use conversational, friendly language.",
    technical: "Use precise technical terminology without over-simplifying.",
    empathetic: "Show understanding of user frustration and pain points.",
  };

  const prompt = `You are ${persona.role}.

Tone: ${toneGuidance[persona.tone]}

Expertise Areas: ${persona.expertise.join(", ")}

Important Restrictions:
${persona.restrictions.map((r, i) => `${i + 1}. ${r}`).join("\n")}

Answer questions within your area of expertise. For out-of-scope questions, politely redirect.`;

  return prompt;
}
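Nothing above selects which persona handles an incoming message. One naive option, sketched below with hypothetical names (`personaKeywords`, `routePersona`), is keyword routing; production systems would more likely use a cheap classifier call or embeddings:

```typescript
// Hypothetical keyword router for choosing a persona key before the LLM
// call. Substring matching is crude but cheap; it is only a sketch.
const personaKeywords: Record<string, string[]> = {
  technical_support: ["error", "bug", "api", "sdk", "crash"],
  data_analyst: ["statistics", "sql", "query", "dataset"],
};

function routePersona(message: string): string {
  const lower = message.toLowerCase();
  for (const [personaKey, keywords] of Object.entries(personaKeywords)) {
    if (keywords.some((kw) => lower.includes(kw))) return personaKey;
  }
  return "product_expert"; // generalist fallback
}
```

The returned key indexes into the `personas` record, and the matching config is then passed to `buildPersonaSystemPrompt`.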

Few-Shot Examples in System Prompt

Few-shot examples dramatically improve output consistency and quality. Structure them clearly.

interface FewShotPrompt {
  instruction: string;
  examples: Array<{
    input: string;
    output: string;
    reasoning?: string;
  }>;
}

function buildFewShotPrompt(config: FewShotPrompt): string {
  let prompt = config.instruction + "\n\n";

  config.examples.forEach((example, idx) => {
    prompt += `Example ${idx + 1}:\n`;
    prompt += `Input: ${example.input}\n`;
    prompt += `Output: ${example.output}\n`;

    if (example.reasoning) {
      prompt += `Reasoning: ${example.reasoning}\n`;
    }

    prompt += "\n";
  });

  return prompt;
}

// Classification example with few-shots
const classificationPrompt = buildFewShotPrompt({
  instruction:
    "Classify the sentiment of the following customer review as positive, neutral, or negative.",
  examples: [
    {
      input: "The product arrived quickly and works perfectly!",
      output: "positive",
      reasoning:
        "Key words: 'quickly', 'perfectly' indicate satisfaction.",
    },
    {
      input: "It's okay, nothing special.",
      output: "neutral",
      reasoning:
        "Lukewarm assessment without strong positive or negative indicators.",
    },
    {
      input: "Broken on arrival, terrible support.",
      output: "negative",
      reasoning: "Words 'broken' and 'terrible' indicate dissatisfaction.",
    },
  ],
});

// Extraction example with few-shots
const extractionPrompt = buildFewShotPrompt({
  instruction:
    "Extract the product name, price, and rating from the review. Return as JSON.",
  examples: [
    {
      input:
        "The iPhone 15 Pro costs $999 and has 4.5 stars. It's fantastic!",
      output: JSON.stringify({
        product: "iPhone 15 Pro",
        price: "$999",
        rating: 4.5,
      }),
    },
    {
      input: "Samsung Galaxy S24 ($799) - 4.2/5 stars. Great device.",
      output: JSON.stringify({
        product: "Samsung Galaxy S24",
        price: "$799",
        rating: 4.2,
      }),
    },
  ],
});

Dynamic Context Injection

Inject user context, conversation history, or external data into prompts dynamically.

interface ConversationContext {
  userId: string;
  userName: string;
  accountTier: "free" | "pro" | "enterprise";
  previousInteractions: Array<{ topic: string; date: string }>;
  userPreferences: Record<string, string>;
}

interface DynamicPromptConfig {
  basePrompt: string;
  context: ConversationContext;
  conversationHistory: Array<{
    role: "user" | "assistant";
    content: string;
  }>;
}

function injectContext(config: DynamicPromptConfig): string {
  let prompt = config.basePrompt;

  // Inject user-specific context
  const contextSection = `
User Context:
- Name: ${config.context.userName}
- Account Tier: ${config.context.accountTier}
- Interaction History: ${
    config.context.previousInteractions.length > 0
      ? config.context.previousInteractions
          .map((i) => `${i.topic} (${i.date})`)
          .join(", ")
      : "First interaction"
  }
`;

  prompt += "\n" + contextSection;

  // Add conversation history summary
  if (config.conversationHistory.length > 0) {
    prompt += "\n\nConversation Context:\n";
    config.conversationHistory.forEach((msg) => {
      prompt += `${msg.role === "user" ? "User" : "Assistant"}: ${msg.content}\n`;
    });
  }

  return prompt;
}

async function generateContextAwareResponse(
  userMessage: string,
  context: ConversationContext,
  history: Array<{ role: "user" | "assistant"; content: string }>
): Promise<string> {
  const basePrompt = buildSystemPrompt({
    role: "personalized assistant",
    constraints: [
      "Reference the user's previous interactions when relevant",
      "Adjust recommendations based on account tier",
    ],
    outputFormat: "Natural conversational response",
    examples: [],
  });

  const fullPrompt = injectContext({
    basePrompt,
    context,
    conversationHistory: history,
  });

  const client = new Anthropic();

  const response = await client.messages.create({
    model: "claude-3-5-sonnet-20241022",
    max_tokens: 1024,
    system: fullPrompt,
    messages: [{ role: "user", content: userMessage }],
  });

  return response.content[0].type === "text" ? response.content[0].text : "";
}

Conversation History Truncation Strategy

Manage memory constraints by intelligently truncating older messages while preserving important context.

interface TruncationStrategy {
  maxTokens: number;
  keepSystemMessages: boolean;
  prioritizeRecent: boolean;
}

interface Message {
  role: "user" | "assistant";
  content: string;
  tokens: number;
}

function truncateHistory(
  messages: Message[],
  strategy: TruncationStrategy
): Message[] {
  let totalTokens = messages.reduce((sum, msg) => sum + msg.tokens, 0);

  if (totalTokens <= strategy.maxTokens) {
    return messages;
  }

  // Keep the most recent messages
  let truncated: Message[] = [];
  let currentTokens = 0;

  // Iterate from newest to oldest
  for (let i = messages.length - 1; i >= 0; i--) {
    const msg = messages[i];

    if (currentTokens + msg.tokens <= strategy.maxTokens) {
      truncated.unshift(msg);
      currentTokens += msg.tokens;
    } else {
      break;
    }
  }

  return truncated;
}
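The `tokens` field on `Message` has to be populated before `truncateHistory` can run. A rough heuristic, assumed here in place of a real tokenizer, is about four characters per token for English text:

```typescript
interface Message {
  role: "user" | "assistant";
  content: string;
  tokens: number;
}

// Rough estimate: ~4 characters per token for English text. This is a
// budgeting heuristic, not the model's actual tokenizer, so leave headroom.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function toMessage(role: "user" | "assistant", content: string): Message {
  return { role, content, tokens: estimateTokens(content) };
}
```

For exact budgets, the provider's token-counting endpoint (or its tokenizer library) should be used instead.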

// Alternative: summarize older messages
async function summarizeOlderMessages(
  messages: Message[],
  maxTokens: number
): Promise<string> {
  const totalTokens = messages.reduce((sum, msg) => sum + msg.tokens, 0);

  if (totalTokens <= maxTokens) {
    return "";
  }

  // Collect the oldest messages, up to ~30% of the token budget, to summarize
  let summaryMessages: Message[] = [];
  let remainingTokens = maxTokens * 0.3;

  for (const msg of messages) {
    if (remainingTokens <= 0) break;
    summaryMessages.push(msg);
    remainingTokens -= msg.tokens;
  }

  const client = new Anthropic();

  const summaryResponse = await client.messages.create({
    model: "claude-3-5-sonnet-20241022",
    max_tokens: 256,
    messages: [
      {
        role: "user",
        content: `Summarize this conversation in 2-3 sentences:\n${summaryMessages
          .map((m) => `${m.role}: ${m.content}`)
          .join("\n")}`,
      },
    ],
  });

  return summaryResponse.content[0].type === "text"
    ? summaryResponse.content[0].text
    : "";
}

Temperature for Different Use Cases

Use temperature strategically based on task requirements: low for consistency, high for creativity.

type TaskType =
  | "factual-qa"
  | "creative-writing"
  | "code-generation"
  | "conversation"
  | "classification";

interface TemperatureConfig {
  taskType: TaskType;
  temperature: number;
  topP?: number;
  rationale: string;
}

const temperatureSettings: Record<TaskType, TemperatureConfig> = {
  "factual-qa": {
    taskType: "factual-qa",
    temperature: 0.1,
    topP: 0.9,
    rationale: "Consistent, factual answers with minimal variation",
  },

  "creative-writing": {
    taskType: "creative-writing",
    temperature: 0.8,
    topP: 0.95,
    rationale: "Diverse outputs with creativity while maintaining coherence",
  },

  "code-generation": {
    taskType: "code-generation",
    temperature: 0.2,
    topP: 0.9,
    rationale: "Consistent, predictable code output",
  },

  conversation: {
    taskType: "conversation",
    temperature: 0.7,
    topP: 0.95,
    rationale: "Natural, engaging conversation with some variation",
  },

  classification: {
    taskType: "classification",
    temperature: 0.0,
    topP: 1.0,
    rationale: "Deterministic classification with consistent categories",
  },
};

async function callWithOptimalTemperature(
  userMessage: string,
  taskType: TaskType,
  systemPrompt: string
): Promise<string> {
  const config = temperatureSettings[taskType];
  const client = new Anthropic();

  // Apply the task-specific temperature (Anthropic advises tuning
  // temperature or top_p, not both, so only temperature is set here)
  const response = await client.messages.create({
    model: "claude-3-5-sonnet-20241022",
    max_tokens: 1024,
    system: systemPrompt,
    temperature: config.temperature,
    messages: [{ role: "user", content: userMessage }],
  });

  return response.content[0].type === "text" ? response.content[0].text : "";
}

Stop Sequences and Presence/Frequency Penalties

Control output length and repetition using stop sequences and parameter tuning.

interface OutputControlConfig {
  stopSequences?: string[];
  presencePenalty?: number;
  frequencyPenalty?: number;
}

// Stop sequences prevent generating beyond a natural boundary
const stopSequences = {
  singleTurn: ["\n\nUser:", "\n\nAssistant:"], // Stop after one response
  codeBlock: ["```"], // Stop after code block
  bulletList: ["==NEXT_TOPIC=="], // Stop at a custom delimiter between topics
  json: [], // Let model finish JSON naturally
};
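The API enforces stop sequences server-side: generation ends at the first match, and the matched sequence is not included in the output. Those semantics can be sketched client-side (`applyStopSequences` is illustrative only, not an SDK function):

```typescript
// Client-side sketch of stop-sequence semantics: cut the text at the
// first occurrence of any stop sequence; the sequence itself is dropped.
function applyStopSequences(text: string, stops: string[]): string {
  let cut = text.length;
  for (const stop of stops) {
    const idx = text.indexOf(stop);
    if (idx !== -1 && idx < cut) cut = idx;
  }
  return text.slice(0, cut);
}
```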

// Presence penalty discourages any token that has already appeared;
// frequency penalty scales with how often a token has appeared.
// Note: these are OpenAI-style parameters. The Anthropic Messages API
// does not expose them, so the configs below target OpenAI-compatible APIs.
interface ParameterConfig {
  presencePenalty: number; // -2.0 to 2.0
  frequencyPenalty: number; // -2.0 to 2.0
  taskType: string;
}

const parameterConfigs: Record<string, ParameterConfig> = {
  minimal_repetition: {
    presencePenalty: 0.6,
    frequencyPenalty: 0.6,
    taskType: "creative writing",
  },

  standard: {
    presencePenalty: 0.0,
    frequencyPenalty: 0.0,
    taskType: "general",
  },

  technical: {
    presencePenalty: 0.1,
    frequencyPenalty: 0.1,
    taskType: "code or technical documentation",
  },
};

async function callWithOutputControl(
  userMessage: string,
  config: OutputControlConfig
): Promise<string> {
  const client = new Anthropic();

  // Only stop_sequences maps to an Anthropic parameter; the Messages API
  // does not accept presence or frequency penalties.
  const response = await client.messages.create({
    model: "claude-3-5-sonnet-20241022",
    max_tokens: 1024,
    messages: [{ role: "user", content: userMessage }],
    stop_sequences: config.stopSequences,
  });

  return response.content[0].type === "text" ? response.content[0].text : "";
}

System Prompt Testing Methodology

Test prompts systematically before production deployment. Use evaluation metrics and golden datasets.

interface TestCase {
  input: string;
  expectedOutput: string;
  testCategory: string;
}

interface PromptEvaluation {
  promptVersion: string;
  testCasesRun: number;
  passRate: number;
  failedTests: TestCase[];
  executionTime: number;
}

class PromptTester {
  async evaluatePrompt(
    systemPrompt: string,
    testCases: TestCase[]
  ): Promise<PromptEvaluation> {
    const startTime = Date.now();
    let passCount = 0;
    const failedTests: TestCase[] = [];

    const client = new Anthropic();

    for (const testCase of testCases) {
      try {
        const response = await client.messages.create({
          model: "claude-3-5-sonnet-20241022",
          max_tokens: 512,
          system: systemPrompt,
          messages: [{ role: "user", content: testCase.input }],
        });

        const actualOutput =
          response.content[0].type === "text" ? response.content[0].text : "";

        if (this.evaluateMatch(actualOutput, testCase.expectedOutput)) {
          passCount++;
        } else {
          failedTests.push(testCase);
        }
      } catch (error) {
        failedTests.push(testCase);
      }
    }

    const executionTime = Date.now() - startTime;

    return {
      promptVersion: "v1",
      testCasesRun: testCases.length,
      passRate: (passCount / testCases.length) * 100,
      failedTests,
      executionTime,
    };
  }

  private evaluateMatch(actual: string, expected: string): boolean {
    // Fuzzy matching for text similarity
    return (
      actual.toLowerCase().includes(expected.toLowerCase()) ||
      this.calculateSimilarity(actual, expected) > 0.8
    );
  }

  private calculateSimilarity(str1: string, str2: string): number {
    // Normalized Levenshtein similarity: 1 - distance / longer.length
    const shorter = str1.length < str2.length ? str1 : str2;
    const longer = str1.length >= str2.length ? str1 : str2;

    if (shorter.length === 0) return 0;

    const editDistance = this.levenshteinDistance(shorter, longer);
    return 1 - editDistance / longer.length;
  }

  private levenshteinDistance(s1: string, s2: string): number {
    const track = Array(s2.length + 1)
      .fill(null)
      .map(() => Array(s1.length + 1).fill(0));

    for (let i = 0; i <= s1.length; i += 1) track[0][i] = i;
    for (let j = 0; j <= s2.length; j += 1) track[j][0] = j;

    for (let j = 1; j <= s2.length; j += 1) {
      for (let i = 1; i <= s1.length; i += 1) {
        const indicator = s1[i - 1] === s2[j - 1] ? 0 : 1;
        track[j][i] = Math.min(
          track[j][i - 1] + 1,
          track[j - 1][i] + 1,
          track[j - 1][i - 1] + indicator
        );
      }
    }

    return track[s2.length][s1.length];
  }
}
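The fuzzy matcher above can pass on loose substring hits; for classification-style golden tests, normalized exact match is stricter and easier to reason about. A minimal sketch (`gradeExactMatch` is a hypothetical helper, not part of `PromptTester`):

```typescript
// Grade (actual, expected) pairs with case- and whitespace-insensitive
// exact match, returning the pass rate as a percentage.
function gradeExactMatch(
  results: Array<{ actual: string; expected: string }>
): number {
  if (results.length === 0) return 0;
  const passes = results.filter(
    (r) => r.actual.trim().toLowerCase() === r.expected.trim().toLowerCase()
  ).length;
  return (passes / results.length) * 100;
}
```

For the sentiment classifier earlier in the post, this grader plus a temperature of 0.0 gives reproducible pass rates across prompt versions.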

Checklist

  • Structure system prompts with role, constraints, format, and examples
  • Create distinct personas for different use cases with consistent tone
  • Include 3-5 few-shot examples for task-specific prompts
  • Implement dynamic context injection for personalization
  • Truncate conversation history intelligently to respect token limits
  • Use low temperature (0.1-0.3) for factual tasks, higher (0.7-0.9) for creative
  • Test prompts systematically with golden test datasets
  • Version system prompts and track performance across versions

Conclusion

Production-grade conversation design requires treating system prompts as critical infrastructure. Structure prompts clearly with role, constraints, format, and examples. Test thoroughly before deploying. Use dynamic context injection to personalize without bloating the base prompt. Temperature and stop sequences give fine-grained control over output characteristics. By combining these practices, you'll build AI systems that behave predictably and reliably across millions of user interactions.