PII Handling in LLM Applications — Detection, Redaction, and Compliance
- Author: Sanjeev Sharma (@webcoderspeed1)
Introduction
Your user asks the chatbot a question containing their email, phone number, and Social Security number. You send it straight to the LLM API, and that PII now sits in a third-party provider's request logs, outside your control.
Or worse: your company is subject to GDPR or HIPAA, and sending unredacted PII to third-party LLM APIs becomes a legal violation.
This post covers detecting PII, redacting it, pseudonymizing sensitive data, and compliance strategies.
- PII Detection with Presidio
- Redaction Before Sending to LLM
- Pseudonymization: Tokenize and Restore
- LLM Provider Data Retention Policies
- Audit Logging of PII Access
- User Consent and Data Residency
- Conclusion
PII Detection with Presidio
Microsoft's Presidio is the standard open-source toolkit for detecting PII; in a Node.js stack you would typically run its analyzer as a separate service. As a lightweight first layer (or fallback), regex recognizers covering the common patterns (SSNs, credit cards, emails, etc.) work the same way:
interface PIIEntity {
  type: string; // "EMAIL_ADDRESS", "CREDIT_CARD", "US_SSN", etc.
  text: string;
  score: number; // 0-1 confidence
  startIndex: number;
  endIndex: number;
}

class PIIDetector {
  async detectPII(text: string): Promise<PIIEntity[]> {
    const patterns: Record<string, RegExp> = {
      EMAIL_ADDRESS: /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g,
      US_SSN: /\b\d{3}-\d{2}-\d{4}\b/g,
      CREDIT_CARD: /\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b/g,
      US_PHONE: /\b(?:\+?1[-.\s]?)?\(?([0-9]{3})\)?[-.\s]?([0-9]{3})[-.\s]?([0-9]{4})\b/g,
      PASSPORT: /\b[A-Z]{1,2}\d{6,9}\b/g, // deliberately broad; expect false positives
      IP_ADDRESS: /\b(?:\d{1,3}\.){3}\d{1,3}\b/g,
    };

    const entities: PIIEntity[] = [];
    for (const [type, pattern] of Object.entries(patterns)) {
      let match: RegExpExecArray | null;
      while ((match = pattern.exec(text)) !== null) {
        entities.push({
          type,
          text: match[0],
          score: 0.95, // regex matches are treated as high confidence
          startIndex: match.index,
          endIndex: match.index + match[0].length,
        });
      }
    }

    // Sort by position so downstream redaction can walk matches in order
    return entities.sort((a, b) => a.startIndex - b.startIndex);
  }
}

export { PIIDetector, PIIEntity };
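A quick standalone sanity check of two of the recognizers above (same patterns, applied directly, with a made-up sample string):

```typescript
// Standalone sketch: run the email and SSN recognizers on sample text.
const emailRe = /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g;
const ssnRe = /\b\d{3}-\d{2}-\d{4}\b/g;

const sample = "Reach jane.doe@example.com, SSN 123-45-6789, thanks.";
const emails = sample.match(emailRe) ?? [];
const ssns = sample.match(ssnRe) ?? [];
```

Word boundaries (`\b`) keep trailing punctuation like the commas out of the matches.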
Redaction Before Sending to LLM
Replace PII with placeholders before sending to the LLM:
interface RedactionResult {
  redactedText: string;
  replacements: Map<string, string>; // placeholder -> original value
  piiFound: boolean;
  entityCount: number;
}

class PIIRedactor {
  async redactText(text: string): Promise<RedactionResult> {
    const detector = new PIIDetector();
    const entities = await detector.detectPII(text);

    if (entities.length === 0) {
      return {
        redactedText: text,
        replacements: new Map(),
        piiFound: false,
        entityCount: 0,
      };
    }

    let redactedText = text;
    const replacements = new Map<string, string>();

    // Process entities in reverse order so earlier indices stay valid.
    // Note: this assumes non-overlapping entities; overlapping matches
    // should be deduplicated first (e.g., keep the longest match).
    for (let i = entities.length - 1; i >= 0; i--) {
      const entity = entities[i];
      const placeholder = `[${entity.type}_${i}]`;
      replacements.set(placeholder, entity.text);
      redactedText =
        redactedText.slice(0, entity.startIndex) +
        placeholder +
        redactedText.slice(entity.endIndex);
    }

    return {
      redactedText,
      replacements,
      piiFound: true,
      entityCount: entities.length,
    };
  }

  async restoreFromRedaction(
    text: string,
    replacements: Map<string, string>
  ): Promise<string> {
    let restored = text;
    for (const [placeholder, original] of replacements) {
      // split/join replaces every occurrence; String.replace with a string
      // argument would only replace the first one
      restored = restored.split(placeholder).join(original);
    }
    return restored;
  }
}

export { PIIRedactor, RedactionResult };
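To see the whole flow end to end, here is a minimal self-contained round trip: redact, send the sanitized text to the LLM, then restore placeholders in the response. The LLM call is stubbed (`llmOutput` is a hypothetical reply), and only the email recognizer is inlined:

```typescript
// Round-trip sketch: redact -> call LLM with sanitized text -> restore.
const replacements = new Map<string, string>();
let i = 0;

// Replace each email with a unique placeholder, remembering the original.
const redacted = "Email alice@example.com about order 42".replace(
  /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g,
  (m) => {
    const ph = `[EMAIL_ADDRESS_${i++}]`;
    replacements.set(ph, m);
    return ph;
  }
);

// `redacted` is what goes to the LLM; suppose the reply echoes a placeholder:
const llmOutput = "Sure, I'll email [EMAIL_ADDRESS_0] today.";

let restored = llmOutput;
for (const [ph, original] of replacements) {
  restored = restored.split(ph).join(original); // replaces every occurrence
}
```

The provider only ever sees `[EMAIL_ADDRESS_0]`; the real address is swapped back in after the response arrives.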
Pseudonymization: Tokenize and Restore
For applications needing data linkage (e.g., user sessions), pseudonymize PII instead of full redaction:
import { randomBytes } from "crypto";

interface PseudonymizationMap {
  originalValue: string;
  pseudonym: string;
  expiresAt: Date;
}

class PIIPseudonymizer {
  private pseudonymMap = new Map<string, PseudonymizationMap>();
  private reverseMap = new Map<string, string>();

  async pseudonymizeText(text: string): Promise<{
    pseudonymizedText: string;
    mappings: PseudonymizationMap[];
  }> {
    const detector = new PIIDetector();
    const entities = await detector.detectPII(text);

    let pseudonymizedText = text;
    const mappings: PseudonymizationMap[] = [];

    // Process in reverse order so earlier indices stay valid
    for (let i = entities.length - 1; i >= 0; i--) {
      const entity = entities[i];
      let pseudonym = this.reverseMap.get(entity.text);

      if (!pseudonym) {
        // New value: mint a pseudonym from a CSPRNG (Math.random is guessable)
        pseudonym = `PSE_${entity.type}_${randomBytes(4).toString("hex").toUpperCase()}`;
        const mapping: PseudonymizationMap = {
          originalValue: entity.text,
          pseudonym,
          expiresAt: new Date(Date.now() + 30 * 24 * 60 * 60 * 1000), // 30 days
        };
        this.pseudonymMap.set(pseudonym, mapping);
        this.reverseMap.set(entity.text, pseudonym);
        mappings.push(mapping);
      }

      pseudonymizedText =
        pseudonymizedText.slice(0, entity.startIndex) +
        pseudonym +
        pseudonymizedText.slice(entity.endIndex);
    }

    return { pseudonymizedText, mappings };
  }

  async depseudonymizeText(text: string): Promise<string> {
    let depseudonymized = text;
    for (const [pseudonym, mapping] of this.pseudonymMap) {
      // Pseudonyms contain only [A-Z0-9_], so split/join is simpler and avoids
      // building a RegExp from stored content
      depseudonymized = depseudonymized.split(pseudonym).join(mapping.originalValue);
    }
    return depseudonymized;
  }

  cleanupExpiredMappings(): void {
    const now = new Date();
    for (const [pseudonym, mapping] of this.pseudonymMap) {
      if (mapping.expiresAt < now) {
        this.reverseMap.delete(mapping.originalValue);
        this.pseudonymMap.delete(pseudonym);
      }
    }
  }
}

export { PIIPseudonymizer, PseudonymizationMap };
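The property that makes pseudonymization useful for linkage is stability: the same raw value must always map to the same token within a session, while distinct values get distinct tokens. A standalone sketch of just that piece, using Node's `crypto` (the `pseudonymFor` helper is illustrative, not part of the class above):

```typescript
import { randomBytes } from "crypto";

// Stable pseudonyms: repeat lookups of the same value reuse the same token,
// so the LLM can correlate mentions without ever seeing the raw PII.
const reverse = new Map<string, string>();

function pseudonymFor(type: string, value: string): string {
  let p = reverse.get(value);
  if (!p) {
    p = `PSE_${type}_${randomBytes(4).toString("hex").toUpperCase()}`;
    reverse.set(value, p);
  }
  return p;
}

const a = pseudonymFor("EMAIL_ADDRESS", "bob@example.com");
const b = pseudonymFor("EMAIL_ADDRESS", "bob@example.com"); // same token as a
const c = pseudonymFor("EMAIL_ADDRESS", "eve@example.com"); // different token
```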
LLM Provider Data Retention Policies
Know what each provider does with your data. The values below are illustrative only; retention windows and training-use terms change, so verify them against each provider's current documentation and your own data processing agreements before relying on them:
interface LLMProviderPolicy {
  provider: string;
  logsRetention: string; // e.g., "30 days"
  trainingDataUse: boolean;
  defaultBehavior: "retained" | "not_retained" | "configurable";
  businessAgreementRequired: boolean;
  gdprCompliant: boolean;
}

class LLMProviderPolicyRegistry {
  // Example values only — confirm against each provider's current terms
  private policies: Record<string, LLMProviderPolicy> = {
    anthropic: {
      provider: "Anthropic",
      logsRetention: "30 days",
      trainingDataUse: false,
      defaultBehavior: "not_retained",
      businessAgreementRequired: false,
      gdprCompliant: true,
    },
    openai: {
      provider: "OpenAI",
      logsRetention: "30 days",
      trainingDataUse: false, // when opted out
      defaultBehavior: "retained",
      businessAgreementRequired: true,
      gdprCompliant: false, // data processing agreement required
    },
    cohere: {
      provider: "Cohere",
      logsRetention: "1 day",
      trainingDataUse: false,
      defaultBehavior: "not_retained",
      businessAgreementRequired: false,
      gdprCompliant: true,
    },
  };

  getPolicy(provider: string): LLMProviderPolicy {
    const policy = this.policies[provider.toLowerCase()];
    if (!policy) {
      throw new Error(`Unknown provider: ${provider}`);
    }
    return policy;
  }

  isSafeForGDPR(provider: string): boolean {
    const policy = this.getPolicy(provider);
    return policy.gdprCompliant && policy.defaultBehavior === "not_retained";
  }

  isSafeForHIPAA(provider: string): boolean {
    // HIPAA requires a Business Associate Agreement (BAA) and encryption
    const policy = this.getPolicy(provider);
    return policy.businessAgreementRequired && policy.logsRetention !== "indefinite";
  }
}

export { LLMProviderPolicyRegistry, LLMProviderPolicy };
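The registry's real job is to act as a gate before any API call. A condensed standalone sketch of that gate — again, the policy entries here are placeholder values, not verified provider terms:

```typescript
// Gate sketch: refuse to send data unless the provider's recorded policy passes.
type Behavior = "retained" | "not_retained" | "configurable";
interface Policy { defaultBehavior: Behavior; gdprCompliant: boolean; }

const policies: Record<string, Policy> = {
  // Placeholder values — check each provider's current data-use terms
  anthropic: { defaultBehavior: "not_retained", gdprCompliant: true },
  openai: { defaultBehavior: "retained", gdprCompliant: false },
};

function safeForGDPR(provider: string): boolean {
  const p = policies[provider];
  // Unknown providers fail closed: no policy record means no data leaves
  return !!p && p.gdprCompliant && p.defaultBehavior === "not_retained";
}
```

Failing closed on unknown providers is the important design choice: a missing entry should block the request, not default to "allowed".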
Audit Logging of PII Access
Maintain compliance records of when and where PII was accessed:
interface PIIAuditLog {
  timestamp: Date;
  userId: string;
  action: "detected" | "redacted" | "pseudonymized" | "depseudonymized";
  piiType: string;
  dataClassification: "public" | "internal" | "confidential" | "restricted";
  purpose: string;
  retention: number; // days
}

class PIIAuditLogger {
  async logPIIAccess(log: PIIAuditLog): Promise<void> {
    // Store in an immutable audit log (e.g., DynamoDB with point-in-time recovery)
    console.log(
      `[AUDIT] ${log.timestamp.toISOString()} - ${log.action} ${log.piiType} for user ${log.userId}`
    );

    // Encrypt and persist
    await this.persistAuditLog(log);

    // For regulated industries, also send to an external audit system
    if (log.dataClassification === "restricted") {
      await this.sendToExternalAuditSystem(log);
    }
  }

  async getAuditTrail(
    userId: string,
    startDate: Date,
    endDate: Date
  ): Promise<PIIAuditLog[]> {
    // Query audit logs for compliance reports (stubbed)
    return [];
  }

  async generateComplianceReport(
    dataClassification: string
  ): Promise<{ logCount: number; coverage: number }> {
    // Report for auditors: what PII was accessed, when, and by whom (stubbed)
    return { logCount: 0, coverage: 0 };
  }

  private async persistAuditLog(log: PIIAuditLog): Promise<void> {
    // Store with encryption and redundancy
  }

  private async sendToExternalAuditSystem(log: PIIAuditLog): Promise<void> {
    // For HIPAA / SOC 2 compliance
  }
}

export { PIIAuditLogger, PIIAuditLog };
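In practice the logger hangs off the redaction path: every entity you redact produces one audit record. A minimal standalone sketch of that wiring, with an in-memory array standing in for the real append-only store:

```typescript
// Wiring sketch: one audit record per redacted entity.
// `sink` is a stand-in for an append-only audit store.
interface AuditRecord {
  action: string;
  piiType: string;
  userId: string;
  at: Date;
}

const sink: AuditRecord[] = [];

function auditRedaction(userId: string, piiTypes: string[]): void {
  for (const t of piiTypes) {
    sink.push({ action: "redacted", piiType: t, userId, at: new Date() });
  }
}

// e.g., after redactText found an email and an SSN:
auditRedaction("user-1", ["EMAIL_ADDRESS", "US_SSN"]);
```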
User Consent and Data Residency
Respect user consent and data residency requirements:
interface UserPrivacyConsent {
  userId: string;
  allowedLLMProcessing: boolean;
  allowedDataResidency: string[]; // e.g., ["US", "EU"]
  consentDate: Date;
  expiresAt: Date;
}

class UserConsentManager {
  async canSendToLLM(
    userId: string,
    llmProvider: string,
    dataResidency: string
  ): Promise<boolean> {
    const consent = await this.getConsent(userId);

    if (!consent || !consent.allowedLLMProcessing) {
      return false;
    }
    if (!consent.allowedDataResidency.includes(dataResidency)) {
      return false;
    }
    if (consent.expiresAt < new Date()) {
      return false;
    }
    return true;
  }

  async requestConsent(userId: string): Promise<void> {
    // Show a consent prompt to the user; store the result with a timestamp
    console.log(`Requesting AI processing consent for user ${userId}`);
  }

  async deleteUserData(userId: string): Promise<void> {
    // GDPR right to erasure: delete from the pseudonymization map, audit logs, etc.
    console.log(`Deleting all data for user ${userId}`);
  }

  private async getConsent(userId: string): Promise<UserPrivacyConsent | null> {
    // Query the consent store (stubbed)
    return null;
  }
}

export { UserConsentManager, UserPrivacyConsent };
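The consent checks condense to a single pure predicate, which is the easiest shape to unit-test. A standalone sketch with sample data (dates and regions are made up):

```typescript
// Pure predicate version of the canSendToLLM checks: every condition must
// pass, and a missing consent record means "no".
interface Consent {
  allowedLLMProcessing: boolean;
  allowedDataResidency: string[];
  expiresAt: Date;
}

function canSend(consent: Consent | null, residency: string, now: Date): boolean {
  return (
    consent !== null &&
    consent.allowedLLMProcessing &&
    consent.allowedDataResidency.includes(residency) &&
    consent.expiresAt > now
  );
}

const now = new Date("2025-06-01");
const consent: Consent = {
  allowedLLMProcessing: true,
  allowedDataResidency: ["EU"],
  expiresAt: new Date("2026-01-01"),
};

const allowed = canSend(consent, "EU", now);      // all checks pass
const wrongRegion = canSend(consent, "US", now);  // residency not permitted
const noConsent = canSend(null, "EU", now);       // no record: fail closed
```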
Conclusion
PII handling in LLM applications requires multiple layers: detection (Presidio), redaction before sending to APIs, pseudonymization for user linkage, understanding provider policies, audit logging, and consent management.
Build privacy into your system from day one. The cost of scrubbing PII after a breach is far higher than preventing it upfront. Your compliance team and users will thank you.