PII Handling in LLM Applications — Detection, Redaction, and Compliance
- Author: Sanjeev Sharma (@webcoderspeed1)
Introduction
Your user asks the chatbot a question containing their email, phone number, and Social Security number. You send it straight to the LLM API, and that PII now sits in a third-party provider's request logs, outside your control.
Or worse: your company is subject to GDPR or HIPAA, and sending unredacted PII to third-party LLM APIs becomes a legal violation.
This post covers detecting PII, redacting it, pseudonymizing sensitive data, and compliance strategies.
- PII Detection with Presidio
- Redaction Before Sending to LLM
- Pseudonymization: Tokenize and Restore
- LLM Provider Data Retention Policies
- Audit Logging of PII Access
- User Consent and Data Residency
- Conclusion
PII Detection with Presidio
Microsoft's Presidio is the standard open-source toolkit for detecting PII; in a Node.js stack you would typically run its analyzer as a separate service. As a lightweight first layer (or fallback), regex recognizers covering the common patterns (SSNs, credit cards, emails, etc.) work the same way:
interface PIIEntity {
  type: string; // "EMAIL_ADDRESS", "CREDIT_CARD", "US_SSN", etc.
  text: string;
  score: number; // 0-1 confidence
  startIndex: number;
  endIndex: number;
}

class PIIDetector {
  async detectPII(text: string): Promise<PIIEntity[]> {
    const patterns: Record<string, RegExp> = {
      EMAIL_ADDRESS: /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g,
      US_SSN: /\b\d{3}-\d{2}-\d{4}\b/g,
      CREDIT_CARD: /\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b/g,
      US_PHONE: /\b(?:\+?1[-.\s]?)?\(?([0-9]{3})\)?[-.\s]?([0-9]{3})[-.\s]?([0-9]{4})\b/g,
      PASSPORT: /\b[A-Z]{1,2}\d{6,9}\b/g, // deliberately broad; expect false positives
      IP_ADDRESS: /\b(?:\d{1,3}\.){3}\d{1,3}\b/g,
    };

    const entities: PIIEntity[] = [];
    for (const [type, pattern] of Object.entries(patterns)) {
      let match: RegExpExecArray | null;
      while ((match = pattern.exec(text)) !== null) {
        entities.push({
          type,
          text: match[0],
          score: 0.95, // regex matches are treated as high confidence
          startIndex: match.index,
          endIndex: match.index + match[0].length,
        });
      }
    }

    // Sort by position so downstream redaction can walk matches in order
    return entities.sort((a, b) => a.startIndex - b.startIndex);
  }
}

export { PIIDetector, PIIEntity };
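A quick standalone sanity check of two of the recognizers above (same patterns, applied directly, with a made-up sample string):

```typescript
// Standalone sketch: run the email and SSN recognizers on sample text.
const emailRe = /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g;
const ssnRe = /\b\d{3}-\d{2}-\d{4}\b/g;

const sample = "Reach jane.doe@example.com, SSN 123-45-6789, thanks.";
const emails = sample.match(emailRe) ?? [];
const ssns = sample.match(ssnRe) ?? [];
```

Word boundaries (`\b`) keep trailing punctuation like the commas out of the matches.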
Redaction Before Sending to LLM
Replace PII with placeholders before sending to the LLM:
interface RedactionResult {
  redactedText: string;
  replacements: Map<string, string>; // placeholder -> original value
  piiFound: boolean;
  entityCount: number;
}

class PIIRedactor {
  async redactText(text: string): Promise<RedactionResult> {
    const detector = new PIIDetector();
    const entities = await detector.detectPII(text);

    if (entities.length === 0) {
      return {
        redactedText: text,
        replacements: new Map(),
        piiFound: false,
        entityCount: 0,
      };
    }

    let redactedText = text;
    const replacements = new Map<string, string>();

    // Process entities in reverse order so earlier indices stay valid.
    // Note: this assumes non-overlapping entities; overlapping matches
    // should be deduplicated first (e.g., keep the longest match).
    for (let i = entities.length - 1; i >= 0; i--) {
      const entity = entities[i];
      const placeholder = `[${entity.type}_${i}]`;
      replacements.set(placeholder, entity.text);
      redactedText =
        redactedText.slice(0, entity.startIndex) +
        placeholder +
        redactedText.slice(entity.endIndex);
    }

    return {
      redactedText,
      replacements,
      piiFound: true,
      entityCount: entities.length,
    };
  }

  async restoreFromRedaction(
    text: string,
    replacements: Map<string, string>
  ): Promise<string> {
    let restored = text;
    for (const [placeholder, original] of replacements) {
      // split/join replaces every occurrence; String.replace with a string
      // argument would only replace the first one
      restored = restored.split(placeholder).join(original);
    }
    return restored;
  }
}

export { PIIRedactor, RedactionResult };
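To see the whole flow end to end, here is a minimal self-contained round trip: redact, send the sanitized text to the LLM, then restore placeholders in the response. The LLM call is stubbed (`llmOutput` is a hypothetical reply), and only the email recognizer is inlined:

```typescript
// Round-trip sketch: redact -> call LLM with sanitized text -> restore.
const replacements = new Map<string, string>();
let i = 0;

// Replace each email with a unique placeholder, remembering the original.
const redacted = "Email alice@example.com about order 42".replace(
  /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g,
  (m) => {
    const ph = `[EMAIL_ADDRESS_${i++}]`;
    replacements.set(ph, m);
    return ph;
  }
);

// `redacted` is what goes to the LLM; suppose the reply echoes a placeholder:
const llmOutput = "Sure, I'll email [EMAIL_ADDRESS_0] today.";

let restored = llmOutput;
for (const [ph, original] of replacements) {
  restored = restored.split(ph).join(original); // replaces every occurrence
}
```

The provider only ever sees `[EMAIL_ADDRESS_0]`; the real address is swapped back in after the response arrives.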
Pseudonymization: Tokenize and Restore
For applications needing data linkage (e.g., user sessions), pseudonymize PII instead of full redaction:
import { randomBytes } from "crypto";

interface PseudonymizationMap {
  originalValue: string;
  pseudonym: string;
  expiresAt: Date;
}

class PIIPseudonymizer {
  private pseudonymMap = new Map<string, PseudonymizationMap>();
  private reverseMap = new Map<string, string>();

  async pseudonymizeText(text: string): Promise<{
    pseudonymizedText: string;
    mappings: PseudonymizationMap[];
  }> {
    const detector = new PIIDetector();
    const entities = await detector.detectPII(text);

    let pseudonymizedText = text;
    const mappings: PseudonymizationMap[] = [];

    // Process in reverse order so earlier indices stay valid
    for (let i = entities.length - 1; i >= 0; i--) {
      const entity = entities[i];
      let pseudonym = this.reverseMap.get(entity.text);

      if (!pseudonym) {
        // New value: mint a pseudonym from a CSPRNG (Math.random is guessable)
        pseudonym = `PSE_${entity.type}_${randomBytes(4).toString("hex").toUpperCase()}`;
        const mapping: PseudonymizationMap = {
          originalValue: entity.text,
          pseudonym,
          expiresAt: new Date(Date.now() + 30 * 24 * 60 * 60 * 1000), // 30 days
        };
        this.pseudonymMap.set(pseudonym, mapping);
        this.reverseMap.set(entity.text, pseudonym);
        mappings.push(mapping);
      }

      pseudonymizedText =
        pseudonymizedText.slice(0, entity.startIndex) +
        pseudonym +
        pseudonymizedText.slice(entity.endIndex);
    }

    return { pseudonymizedText, mappings };
  }

  async depseudonymizeText(text: string): Promise<string> {
    let depseudonymized = text;
    for (const [pseudonym, mapping] of this.pseudonymMap) {
      // Pseudonyms contain only [A-Z0-9_], so split/join is simpler and avoids
      // building a RegExp from stored content
      depseudonymized = depseudonymized.split(pseudonym).join(mapping.originalValue);
    }
    return depseudonymized;
  }

  cleanupExpiredMappings(): void {
    const now = new Date();
    for (const [pseudonym, mapping] of this.pseudonymMap) {
      if (mapping.expiresAt < now) {
        this.reverseMap.delete(mapping.originalValue);
        this.pseudonymMap.delete(pseudonym);
      }
    }
  }
}

export { PIIPseudonymizer, PseudonymizationMap };
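The property that makes pseudonymization useful for linkage is stability: the same raw value must always map to the same token within a session, while distinct values get distinct tokens. A standalone sketch of just that piece, using Node's `crypto` (the `pseudonymFor` helper is illustrative, not part of the class above):

```typescript
import { randomBytes } from "crypto";

// Stable pseudonyms: repeat lookups of the same value reuse the same token,
// so the LLM can correlate mentions without ever seeing the raw PII.
const reverse = new Map<string, string>();

function pseudonymFor(type: string, value: string): string {
  let p = reverse.get(value);
  if (!p) {
    p = `PSE_${type}_${randomBytes(4).toString("hex").toUpperCase()}`;
    reverse.set(value, p);
  }
  return p;
}

const a = pseudonymFor("EMAIL_ADDRESS", "bob@example.com");
const b = pseudonymFor("EMAIL_ADDRESS", "bob@example.com"); // same token as a
const c = pseudonymFor("EMAIL_ADDRESS", "eve@example.com"); // different token
```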
LLM Provider Data Retention Policies
Know what each provider does with your data. The values below are illustrative only; retention windows and training-use terms change, so verify them against each provider's current documentation and your own data processing agreements before relying on them:
interface LLMProviderPolicy {
  provider: string;
  logsRetention: string; // e.g., "30 days"
  trainingDataUse: boolean;
  defaultBehavior: "retained" | "not_retained" | "configurable";
  businessAgreementRequired: boolean;
  gdprCompliant: boolean;
}

class LLMProviderPolicyRegistry {
  // Example values only — confirm against each provider's current terms
  private policies: Record<string, LLMProviderPolicy> = {
    anthropic: {
      provider: "Anthropic",
      logsRetention: "30 days",
      trainingDataUse: false,
      defaultBehavior: "not_retained",
      businessAgreementRequired: false,
      gdprCompliant: true,
    },
    openai: {
      provider: "OpenAI",
      logsRetention: "30 days",
      trainingDataUse: false, // when opted out
      defaultBehavior: "retained",
      businessAgreementRequired: true,
      gdprCompliant: false, // data processing agreement required
    },
    cohere: {
      provider: "Cohere",
      logsRetention: "1 day",
      trainingDataUse: false,
      defaultBehavior: "not_retained",
      businessAgreementRequired: false,
      gdprCompliant: true,
    },
  };

  getPolicy(provider: string): LLMProviderPolicy {
    const policy = this.policies[provider.toLowerCase()];
    if (!policy) {
      throw new Error(`Unknown provider: ${provider}`);
    }
    return policy;
  }

  isSafeForGDPR(provider: string): boolean {
    const policy = this.getPolicy(provider);
    return policy.gdprCompliant && policy.defaultBehavior === "not_retained";
  }

  isSafeForHIPAA(provider: string): boolean {
    // HIPAA requires a Business Associate Agreement (BAA) and encryption
    const policy = this.getPolicy(provider);
    return policy.businessAgreementRequired && policy.logsRetention !== "indefinite";
  }
}

export { LLMProviderPolicyRegistry, LLMProviderPolicy };
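The registry's real job is to act as a gate before any API call. A condensed standalone sketch of that gate — again, the policy entries here are placeholder values, not verified provider terms:

```typescript
// Gate sketch: refuse to send data unless the provider's recorded policy passes.
type Behavior = "retained" | "not_retained" | "configurable";
interface Policy { defaultBehavior: Behavior; gdprCompliant: boolean; }

const policies: Record<string, Policy> = {
  // Placeholder values — check each provider's current data-use terms
  anthropic: { defaultBehavior: "not_retained", gdprCompliant: true },
  openai: { defaultBehavior: "retained", gdprCompliant: false },
};

function safeForGDPR(provider: string): boolean {
  const p = policies[provider];
  // Unknown providers fail closed: no policy record means no data leaves
  return !!p && p.gdprCompliant && p.defaultBehavior === "not_retained";
}
```

Failing closed on unknown providers is the important design choice: a missing entry should block the request, not default to "allowed".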
Audit Logging of PII Access
Maintain compliance records of when and where PII was accessed:
interface PIIAuditLog {
  timestamp: Date;
  userId: string;
  action: "detected" | "redacted" | "pseudonymized" | "depseudonymized";
  piiType: string;
  dataClassification: "public" | "internal" | "confidential" | "restricted";
  purpose: string;
  retention: number; // days
}

class PIIAuditLogger {
  async logPIIAccess(log: PIIAuditLog): Promise<void> {
    // Store in an immutable audit log (e.g., DynamoDB with point-in-time recovery)
    console.log(
      `[AUDIT] ${log.timestamp.toISOString()} - ${log.action} ${log.piiType} for user ${log.userId}`
    );

    // Encrypt and persist
    await this.persistAuditLog(log);

    // For regulated industries, also send to an external audit system
    if (log.dataClassification === "restricted") {
      await this.sendToExternalAuditSystem(log);
    }
  }

  async getAuditTrail(
    userId: string,
    startDate: Date,
    endDate: Date
  ): Promise<PIIAuditLog[]> {
    // Query audit logs for compliance reports (stubbed)
    return [];
  }

  async generateComplianceReport(
    dataClassification: string
  ): Promise<{ logCount: number; coverage: number }> {
    // Report for auditors: what PII was accessed, when, and by whom (stubbed)
    return { logCount: 0, coverage: 0 };
  }

  private async persistAuditLog(log: PIIAuditLog): Promise<void> {
    // Store with encryption and redundancy
  }

  private async sendToExternalAuditSystem(log: PIIAuditLog): Promise<void> {
    // For HIPAA / SOC 2 compliance
  }
}

export { PIIAuditLogger, PIIAuditLog };
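In practice the logger hangs off the redaction path: every entity you redact produces one audit record. A minimal standalone sketch of that wiring, with an in-memory array standing in for the real append-only store:

```typescript
// Wiring sketch: one audit record per redacted entity.
// `sink` is a stand-in for an append-only audit store.
interface AuditRecord {
  action: string;
  piiType: string;
  userId: string;
  at: Date;
}

const sink: AuditRecord[] = [];

function auditRedaction(userId: string, piiTypes: string[]): void {
  for (const t of piiTypes) {
    sink.push({ action: "redacted", piiType: t, userId, at: new Date() });
  }
}

// e.g., after redactText found an email and an SSN:
auditRedaction("user-1", ["EMAIL_ADDRESS", "US_SSN"]);
```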
User Consent and Data Residency
Respect user consent and data residency requirements:
interface UserPrivacyConsent {
  userId: string;
  allowedLLMProcessing: boolean;
  allowedDataResidency: string[]; // e.g., ["US", "EU"]
  consentDate: Date;
  expiresAt: Date;
}

class UserConsentManager {
  async canSendToLLM(
    userId: string,
    llmProvider: string,
    dataResidency: string
  ): Promise<boolean> {
    const consent = await this.getConsent(userId);

    if (!consent || !consent.allowedLLMProcessing) {
      return false;
    }
    if (!consent.allowedDataResidency.includes(dataResidency)) {
      return false;
    }
    if (consent.expiresAt < new Date()) {
      return false;
    }
    return true;
  }

  async requestConsent(userId: string): Promise<void> {
    // Show a consent prompt to the user; store the result with a timestamp
    console.log(`Requesting AI processing consent for user ${userId}`);
  }

  async deleteUserData(userId: string): Promise<void> {
    // GDPR right to erasure: delete from the pseudonymization map, audit logs, etc.
    console.log(`Deleting all data for user ${userId}`);
  }

  private async getConsent(userId: string): Promise<UserPrivacyConsent | null> {
    // Query the consent store (stubbed)
    return null;
  }
}

export { UserConsentManager, UserPrivacyConsent };
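The consent checks condense to a single pure predicate, which is the easiest shape to unit-test. A standalone sketch with sample data (dates and regions are made up):

```typescript
// Pure predicate version of the canSendToLLM checks: every condition must
// pass, and a missing consent record means "no".
interface Consent {
  allowedLLMProcessing: boolean;
  allowedDataResidency: string[];
  expiresAt: Date;
}

function canSend(consent: Consent | null, residency: string, now: Date): boolean {
  return (
    consent !== null &&
    consent.allowedLLMProcessing &&
    consent.allowedDataResidency.includes(residency) &&
    consent.expiresAt > now
  );
}

const now = new Date("2025-06-01");
const consent: Consent = {
  allowedLLMProcessing: true,
  allowedDataResidency: ["EU"],
  expiresAt: new Date("2026-01-01"),
};

const allowed = canSend(consent, "EU", now);      // all checks pass
const wrongRegion = canSend(consent, "US", now);  // residency not permitted
const noConsent = canSend(null, "EU", now);       // no record: fail closed
```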
Conclusion
PII handling in LLM applications requires multiple layers: detection (Presidio), redaction before sending to APIs, pseudonymization for user linkage, understanding provider policies, audit logging, and consent management.
Build privacy into your system from day one. The cost of scrubbing PII after a breach is far higher than preventing it upfront. Your compliance team and users will thank you.