Chain-of-Thought Prompting — Complete Guide

Sanjeev Sharma
4 min read


Introduction

Chain-of-Thought (CoT) prompting asks LLMs to show their reasoning step-by-step. This simple technique dramatically improves performance on complex tasks. This guide covers implementation and variations.

Basic Chain-of-Thought

Without CoT

Q: A restaurant has 6 tables. Each table has 4 chairs. 3 chairs are broken. How many working chairs?
A: 24

(Wrong: the model multiplied 6 * 4 but forgot to subtract the broken chairs. Correct: 6 * 4 - 3 = 21.)

With CoT

Q: A restaurant has 6 tables. Each table has 4 chairs. 3 chairs are broken. How many working chairs?

Let me work through this step by step:
1. Total chairs = 6 tables * 4 chairs per table = 24 chairs
2. Broken chairs = 3
3. Working chairs = 24 - 3 = 21 chairs

A: 21 chairs

Implementation in Code

from openai import OpenAI

client = OpenAI()

def answer_with_cot(question: str) -> dict:
    """Get answer with chain-of-thought reasoning."""
    prompt = f"""Think through this step-by-step:

Question: {question}

Step-by-step reasoning:"""

    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You are a helpful assistant. Show all reasoning."},
            {"role": "user", "content": prompt}
        ],
        temperature=0.7
    )

    reasoning = response.choices[0].message.content

    # Extract final answer
    summary_prompt = f"""Based on this reasoning:
{reasoning}

Provide only the final answer in one sentence."""

    summary_response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": summary_prompt}],
        temperature=0
    )

    return {
        "reasoning": reasoning,
        "answer": summary_response.choices[0].message.content
    }

# Usage
result = answer_with_cot("If x + 5 = 12, what is 2x?")
print(result)

Variants of Chain-of-Thought

Explicit CoT Format

Think step by step:
1. First, [action]
2. Then, [action]
3. Finally, [action]
Answer: [conclusion]

Numbered Steps

Please solve this problem:
Problem: [description]

Solution:
Step 1: [reasoning]
Step 2: [reasoning]
Step 3: [reasoning]

Final answer: [answer]

Detailed Explanation Format

Explain your reasoning:
- Why: [rationale]
- How: [method]
- Therefore: [conclusion]

Answer: [answer]
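Any of these templates can be filled programmatically. A minimal sketch of the numbered-steps format; the helper name build_cot_prompt is illustrative, not a library function:

```python
def build_cot_prompt(problem: str, num_steps: int = 3) -> str:
    """Fill the numbered-steps CoT template for a given problem."""
    steps = "\n".join(f"Step {i}: [reasoning]" for i in range(1, num_steps + 1))
    return (
        "Please solve this problem:\n"
        f"Problem: {problem}\n"
        "\n"
        "Solution:\n"
        f"{steps}\n"
        "\n"
        "Final answer: [answer]"
    )

prompt = build_cot_prompt("If x + 5 = 12, what is 2x?")
print(prompt)
```

Building the template in code keeps the step count and answer format consistent across every request.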

Few-Shot Chain-of-Thought

Provide examples with reasoning:

prompt = """Solve math problems step-by-step.

Example 1:
Q: 3 * 4 + 2 = ?
Let me think: 3 * 4 = 12, then 12 + 2 = 14
A: 14

Example 2:
Q: (10 - 5) * 2 = ?
Let me think: 10 - 5 = 5, then 5 * 2 = 10
A: 10

Now solve:
Q: 7 * 3 - 5 = ?"""
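The few-shot block above can also be assembled from a list of worked examples. A sketch, assuming a hypothetical helper build_few_shot_cot that takes (question, reasoning, answer) triples:

```python
def build_few_shot_cot(examples: list[tuple[str, str, str]], question: str) -> str:
    """Assemble (question, reasoning, answer) triples into a few-shot CoT prompt."""
    parts = ["Solve math problems step-by-step.", ""]
    for i, (q, reasoning, a) in enumerate(examples, start=1):
        parts += [f"Example {i}:", f"Q: {q}", f"Let me think: {reasoning}", f"A: {a}", ""]
    parts += ["Now solve:", f"Q: {question}"]
    return "\n".join(parts)

examples = [
    ("3 * 4 + 2 = ?", "3 * 4 = 12, then 12 + 2 = 14", "14"),
    ("(10 - 5) * 2 = ?", "10 - 5 = 5, then 5 * 2 = 10", "10"),
]
prompt = build_few_shot_cot(examples, "7 * 3 - 5 = ?")
```

This makes it easy to swap demonstrations in and out when testing which examples help most.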

Self-Consistency

Sample several reasoning paths at nonzero temperature, then take the majority answer:

from collections import Counter

def self_consistent_reasoning(question: str, num_attempts: int = 3) -> str:
    """Get answer using self-consistency."""
    answers = []

    for _ in range(num_attempts):
        response = client.chat.completions.create(
            model="gpt-4",
            messages=[
                {"role": "system", "content": "Explain your reasoning step-by-step"},
                {"role": "user", "content": question}
            ],
            temperature=0.7  # Variety
        )

        # Extract final answer (last line)
        content = response.choices[0].message.content
        lines = content.strip().split('\n')
        final_line = lines[-1]
        answers.append(final_line)

    # Return most common answer
    answer_counts = Counter(answers)
    most_common = answer_counts.most_common(1)[0][0]

    return most_common

# Usage
answer = self_consistent_reasoning("What is 25 * 4?")
print(f"Answer: {answer}")
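Voting on the raw last line is fragile, since two traces can agree on the number but phrase the line differently. Normalizing each trace to its final number before voting makes agreement more likely; a sketch with illustrative helper names:

```python
import re
from collections import Counter

def extract_numeric_answer(text: str) -> str:
    """Pull the last number from a reasoning trace; fall back to the last line."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", text)
    return numbers[-1] if numbers else text.strip().splitlines()[-1]

def majority_vote(traces: list[str]) -> str:
    """Normalize each trace to an answer, then return the most common one."""
    answers = [extract_numeric_answer(t) for t in traces]
    return Counter(answers).most_common(1)[0][0]

traces = [
    "25 * 4 = 100.\nAnswer: 100",
    "25 * 4 is 25 + 25 + 25 + 25 = 100",
    "I'll estimate: roughly 90",
]
print(majority_vote(traces))  # → 100
```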

Tree-of-Thought

Explore multiple reasoning branches:

def tree_of_thought_reasoning(question: str) -> dict:
    """Explore multiple reasoning paths."""
    # Generate multiple interpretations
    prompt = f"""Generate 3 different approaches to solve this:
{question}

For each approach:
1. Explain the logic
2. Show calculations
3. State the answer"""

    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.8
    )

    return {"approaches": response.choices[0].message.content}
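Once the approaches come back as one block of text, a cheap way to choose a final answer is to vote across the branches. A sketch, assuming each approach ends with a line like "Answer: ..." (the parsing helper is illustrative):

```python
import re
from collections import Counter

def pick_consensus_answer(approaches_text: str) -> str:
    """Find every 'Answer: ...' line and return the most common value."""
    answers = [a.strip() for a in re.findall(r"(?im)^answer\s*:\s*(.+)$", approaches_text)]
    if not answers:
        raise ValueError("no 'Answer:' lines found")
    return Counter(answers).most_common(1)[0][0]

sample = """Approach 1: factor first.
Answer: 42
Approach 2: expand directly.
Answer: 42
Approach 3: estimate.
Answer: 40"""
print(pick_consensus_answer(sample))  # → 42
```

A fuller tree-of-thought implementation would also score and prune branches with a second model call, but consensus on final answers is a reasonable first step.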

Language Models with Built-in CoT

Some models naturally do CoT:

# Claude tends to show reasoning naturally
from anthropic import Anthropic

client = Anthropic()

response = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Solve: 17 * 8 + 23 = ?"}
    ]
)

# Claude will likely show step-by-step reasoning
print(response.content[0].text)

Metrics: CoT Improvement

def compare_cot_performance(question: str):
    """Compare CoT vs non-CoT."""
    # Without CoT
    response1 = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": question}]
    )
    without_cot = response1.choices[0].message.content

    # With CoT
    response2 = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{
            "role": "user",
            "content": f"{question}\nLet me think step by step:"
        }]
    )
    with_cot = response2.choices[0].message.content

    return {
        "without_cot": without_cot,
        "with_cot": with_cot
    }
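To turn the comparison into an actual metric, run both variants over a small labeled set and score them. A minimal sketch; treating "gold answer appears as a substring" as correct is a simplifying assumption, and exact_match_accuracy is an illustrative name:

```python
def exact_match_accuracy(outputs: list[str], gold: list[str]) -> float:
    """Fraction of outputs that contain the gold answer as a substring."""
    hits = sum(1 for out, g in zip(outputs, gold) if g in out)
    return hits / len(gold)

outputs = ["The answer is 21 chairs.", "2x = 14", "I think it is 99."]
gold = ["21", "14", "100"]
print(exact_match_accuracy(outputs, gold))  # 2 of 3 outputs contain the gold answer
```

Score the without_cot and with_cot outputs on the same questions to estimate the improvement.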

Best Practices

  1. Be explicit: "Think step by step"
  2. Use structured format: Number steps, use bullets
  3. Specify output format: How should the final answer look?
  4. Combine with other techniques: Few-shot + CoT is powerful
  5. Verify reasoning: Check logic in each step
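Practices 3 and 5 can be partly enforced mechanically, for example by rejecting any response that does not end with a final-answer line. A tiny sketch (the predicate name is illustrative):

```python
def has_final_answer(text: str) -> bool:
    """True if the last non-empty line looks like a final answer."""
    lines = [ln.strip().lower() for ln in text.splitlines() if ln.strip()]
    return bool(lines) and lines[-1].startswith(("a:", "answer:", "final answer"))
```

Responses that fail the check can simply be re-sampled.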

Conclusion

Chain-of-Thought prompting is one of the most effective techniques for improving LLM reasoning: it is simple to implement and delivers strong gains, especially on math, logic, and multi-step analysis.

FAQ

Q: Does CoT work for all tasks? A: Most effective for reasoning-heavy tasks (math, logic, analysis). Less useful for simple retrieval.

Q: How much does CoT improve accuracy? A: Typically 5-30% improvement depending on task complexity.

Q: Can I use CoT with local models? A: Yes, works with any LLM including Ollama and Transformers.


Written by Sanjeev Sharma

Full Stack Engineer · E-mopro