How to redact PII before sending a prompt to an LLM
A walkthrough of the four-step pattern (detect, redact, prompt, rehydrate) that keeps personal data out of OpenAI, Anthropic, and Gemini calls without breaking the user experience.
Picture the bug report nobody wants to file. A support agent pastes a customer's chat history into a summarizer prompt. The chat contains the customer's email, the last four of a card, and an address. Your app sends the raw text to OpenAI, gets back a tidy summary, ships it to the agent, and the round-trip works. The summary is good. The customer is happy. The audit team is going to have questions.
Yes, you set the no-train flag on the API key. Yes, the provider says they retain inputs for 30 days for abuse review and then drop them. None of that is wrong. It's also not the answer your security reviewer wants. The cleaner answer is that the prompt never contained the personal data in the first place. That's what this post is about: the four-step pattern that gets you there, what each step actually does, and the places it gets awkward.
The four-step pattern to redact PII before an LLM call
The shape of the pattern is the same whether you're writing it from scratch or using a library. Four steps, in order, every request:
- Detect. Find every span of text in the prompt that looks like personal data: emails, phone numbers, names, addresses, payment instruments, national IDs, IP addresses, the long tail.
- Redact. Replace each span with a stable placeholder, something like [EMAIL_1] or [PERSON_2], and keep a private mapping from placeholder back to the original value.
- Prompt. Send the redacted text to the model. The placeholders survive the model's reasoning intact most of the time, because they look like opaque identifiers and the model treats them as such.
- Rehydrate. When the response comes back, walk it for placeholders and substitute the original values back in before the user sees it.
The user sees a response that quotes their actual email back at them. The model never saw it. The mapping never left your trust boundary, or if it did, it was sealed with a key the model provider doesn't hold. That's the whole pattern.
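To make the four steps concrete, here is a minimal from-scratch sketch that handles only emails. The names redact, rehydrate, and fake_model are ours for illustration (fake_model stands in for the real provider call), and the email regex is deliberately simple; treat it as a toy under those assumptions, not as any library's implementation.

```python
from __future__ import annotations

import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PLACEHOLDER_RE = re.compile(r"\[[A-Z_]+_\d+\]")


def redact(text: str) -> tuple[str, dict[str, str]]:
    """Steps 1 and 2: find emails, swap in placeholders, keep the mapping."""
    by_value: dict[str, str] = {}  # original value -> placeholder
    mapping: dict[str, str] = {}   # placeholder -> original value (stays private)

    def swap(match: re.Match) -> str:
        value = match.group(0)
        if value not in by_value:
            token = f"[EMAIL_{len(by_value) + 1}]"
            by_value[value] = token
            mapping[token] = value
        return by_value[value]

    return EMAIL_RE.sub(swap, text), mapping


def fake_model(prompt: str) -> str:
    """Step 3 stand-in: a real provider call would go here."""
    return "Draft sent. I addressed it to " + prompt.rsplit(" ", 1)[-1]


def rehydrate(text: str, mapping: dict[str, str]) -> str:
    """Step 4: substitute originals back; unknown placeholders are left alone."""
    return PLACEHOLDER_RE.sub(lambda m: mapping.get(m.group(0), m.group(0)), text)


redacted, mapping = redact("Write a reply to ada@lovelace.dev")
final = rehydrate(fake_model(redacted), mapping)
print(redacted)  # Write a reply to [EMAIL_1]
print(final)     # Draft sent. I addressed it to ada@lovelace.dev
```

The only thing the stand-in model receives is the redacted string; the mapping stays in your process the whole time.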
What detection actually looks like
The word “detection” hides a lot of disagreement. Three flavors show up in practice, and a serious system uses all three.
Regex is fast, deterministic, and dumb. It's perfect for things with a fixed shape: an email is a token with an at-sign and a dot, an IPv4 is four dotted octets, a credit card is a 13-to-19 digit run with optional separators. Regex alone is also where most teams ship a bug, because \d{16} matches an order number as eagerly as a real card.
Checksum validators are the cheap fix. Cards run Luhn. IBANs run mod-97. US SSNs have area-and-group rules that exclude obvious junk. IPv4 addresses have range rules: each octet has to fall between 0 and 255. Stack a validator behind every regex that has one and the false-positive rate drops by an order of magnitude. A 16-digit run that fails Luhn isn't a card; it's an order ID, and the redactor leaves it alone.
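Here is roughly what a regex with a Luhn check behind it looks like. The pattern and the helper names (CARD_RE, luhn_ok, find_cards) are our own illustration, and the regex is looser than a production detector would be.

```python
import re

# 13 to 19 digits, with optional single space or hyphen separators.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_ok(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, mod 10."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_cards(text: str) -> list:
    """Return only the regex hits that also pass the Luhn check."""
    hits = []
    for match in CARD_RE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group(0))
        if luhn_ok(digits):
            hits.append(match.group(0))
    return hits


text = "Card 4242 4242 4242 4242 on order 1234567890123456."
cards = find_cards(text)
print(cards)  # ['4242 4242 4242 4242']
```

Both digit runs match the regex. Only the first survives the checksum, so the order number passes through untouched.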
Machine learning covers the entities that don't have a shape. A person's name is just a sequence of capitalized tokens that could be anything. An address is a fuzzy bag of lines, ZIP codes, and street suffixes. For these you need a model that learned context. We default to a small distilled DeBERTa-style classifier (Piiranha) because it runs in tens of milliseconds on CPU. A larger model would be more accurate and too slow to put on the request path.
The reason a hybrid approach wins is mostly economics. Pure regex misses the names. Pure ML costs latency and money on every email and phone number it didn't need a model for. Run the structural detectors first, validate them, then ask the ML layer about what's left. If you want a longer take on the ML side, we wrote up where Presidio fits in this stack and the tradeoffs of running it yourself.
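The ordering can be sketched like this. Everything here is hypothetical scaffolding: Span, STRUCTURAL, detect, and especially toy_name_detector, which is a hard-coded stand-in for a real NER model and nothing like Piiranha. The point is only the control flow: structural hits are validated, masked out, and the ML layer sees the remainder with offsets preserved.

```python
from __future__ import annotations

import re
from typing import Callable, NamedTuple


class Span(NamedTuple):
    start: int
    end: int
    label: str


STRUCTURAL = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "IPV4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def valid(label: str, value: str) -> bool:
    """Validator layer: only IPv4 has a rule in this sketch."""
    if label == "IPV4":
        return all(int(part) <= 255 for part in value.split("."))
    return True


def toy_name_detector(text: str) -> list[Span]:
    """Placeholder for the ML layer: flags a hard-coded list of names."""
    return [Span(m.start(), m.end(), "PERSON")
            for m in re.finditer(r"\b(?:Ada|Grace)\b", text)]


def detect(text: str, ml_detector: Callable[[str], list[Span]]) -> list[Span]:
    spans = [Span(m.start(), m.end(), label)
             for label, rx in STRUCTURAL.items()
             for m in rx.finditer(text)
             if valid(label, m.group(0))]
    # Blank out what the structural layer claimed so the ML layer only
    # looks at what's left; same-length masking keeps offsets aligned.
    masked = list(text)
    for s in spans:
        masked[s.start:s.end] = " " * (s.end - s.start)
    spans += ml_detector("".join(masked))
    return sorted(spans)


text = "Ada (ada@lovelace.dev) logged in from 10.0.0.7, build 999.1.2.3"
spans = detect(text, toy_name_detector)
for s in spans:
    print(s.label, text[s.start:s.end])
```

The build number looks like an IP address to the regex, fails the octet range check, and never reaches the output.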
PII redaction for OpenAI in five lines
Here's the whole round-trip with the peyeeye Python SDK in front of an OpenAI call. The same shape works for Anthropic and Gemini; only the chat client changes.
```python
# pip install peyeeye openai
from peyeeye import shield
from openai import OpenAI

prompt = "Summarize: Ada (ada@lovelace.dev) called about card 4242 4242 4242 4242."
red = shield.redact(prompt)  # red.text -> [PERSON_1] ([EMAIL_1]) ... [CARD_1]
reply = OpenAI().chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": red.text}],
)
final = shield.rehydrate(reply.choices[0].message.content, red.session)
```
A few things to notice. The placeholders are deterministic across calls in the same session, so if you ask a follow-up question about Ada, she's still [PERSON_1] on the second turn. The session handle in red.session is either a server-side id (a ses_ token) or a sealed AEAD blob (a skey_ token, the mapping encrypted to a key only your backend holds). Which one you use is a tradeoff we cover in stateless versus stateful sessions.
If you're running a multi-step chain or an agent rather than a single call, the pattern is identical; you just apply it at every model boundary. We have a worked example in PII redaction for LangChain that shows where the redact and rehydrate steps slot into LCEL.
Deterministic tokens and why they matter
One thing to get right: when the same value appears more than once in a prompt, or the same conversation makes more than one call, the placeholder needs to stay stable. If turn one redacts Ada to [PERSON_1] and turn two redacts her to [PERSON_3], the model sees two different people and the conversation breaks. Worse, your rehydrator might map a response token back to the wrong human.
That's what deterministic tokens solve. Within a session, value-to-placeholder is a one-to-one mapping. The first time we see ada@lovelace.dev we mint [EMAIL_1]. Every subsequent occurrence, across turns, gets the same token. Counters seed from the existing mapping so a fresh call doesn't collide with placeholders the model wrote into its own response on a previous turn. That last detail is the kind of thing you'll discover the hard way if you build it yourself.
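If you do build it yourself, the core is a small two-way map with per-type counters. The sketch below is our own illustration (TokenMap and its attributes are hypothetical names, not the SDK's internals), and it assumes every stored placeholder follows the [LABEL_N] shape.

```python
from __future__ import annotations

import re
from collections import defaultdict

TOKEN_RE = re.compile(r"\[([A-Z_]+)_(\d+)\]")


class TokenMap:
    """Hypothetical per-session store: one placeholder per (label, value)."""

    def __init__(self, existing: dict[str, str] | None = None):
        self.by_token = dict(existing or {})  # placeholder -> original value
        self.by_value = {}                    # (label, value) -> placeholder
        self.counters = defaultdict(int)      # label -> highest number used
        # Seed from the mapping carried over from earlier turns so new
        # placeholders never reuse a number that is already taken.
        for token, value in self.by_token.items():
            label, n = TOKEN_RE.fullmatch(token).groups()
            self.by_value[(label, value)] = token
            self.counters[label] = max(self.counters[label], int(n))

    def token_for(self, label: str, value: str) -> str:
        key = (label, value)
        if key not in self.by_value:
            self.counters[label] += 1
            token = f"[{label}_{self.counters[label]}]"
            self.by_value[key] = token
            self.by_token[token] = value
        return self.by_value[key]


turn_one = TokenMap()
first = turn_one.token_for("EMAIL", "ada@lovelace.dev")

# Turn two starts from the mapping that turn one produced.
turn_two = TokenMap(turn_one.by_token)
again = turn_two.token_for("EMAIL", "ada@lovelace.dev")
fresh = turn_two.token_for("EMAIL", "grace@hopper.dev")
print(first, again, fresh)  # [EMAIL_1] [EMAIL_1] [EMAIL_2]
```

Without the seeding loop, turn two would start counting at 1 again and hand [EMAIL_1] to a different address.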
The streaming gotcha
Streaming responses are the most common place this pattern gets awkward. The model starts emitting tokens as Server-Sent Events, you want to forward each chunk to the browser as it arrives, and now your rehydration step has to either run on every chunk or sit at the end and rewrite the buffer.
We currently pass the chunks through unchanged. The placeholders are inert text from the browser's perspective; if a UI renders [EMAIL_1] for a frame and then the rehydrated final text replaces it, the user sees a brief flash of placeholder. For most chat UIs that's acceptable, because the streaming buffer is replaced wholesale at the end of the response. For a few it isn't, and we'll be honest: chunk-level rehydration is on the roadmap and not in the box today. If you need it now, you can buffer the stream server-side, rehydrate, and forward; you lose the streaming UX but get correct text at every frame.
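The buffering fallback is short enough to sketch. The helper names are ours, and the chunk list stands in for whatever your streaming client yields. The example also shows why rehydrating each chunk on its own isn't reliable: a placeholder can be split across a chunk boundary.

```python
import re
from typing import Dict, Iterable

PLACEHOLDER_RE = re.compile(r"\[[A-Z_]+_\d+\]")


def rehydrate(text: str, mapping: Dict[str, str]) -> str:
    """Swap known placeholders back to their original values."""
    return PLACEHOLDER_RE.sub(lambda m: mapping.get(m.group(0), m.group(0)), text)


def buffered_rehydrate(chunks: Iterable[str], mapping: Dict[str, str]) -> str:
    """Collect the whole stream server-side, then rehydrate once."""
    return rehydrate("".join(chunks), mapping)


mapping = {"[EMAIL_1]": "ada@lovelace.dev"}
chunks = ["Reply to [EMA", "IL_1] by ", "Friday."]

# Per-chunk rehydration never sees the full placeholder, so it does nothing.
naive = "".join(rehydrate(chunk, mapping) for chunk in chunks)
buffered = buffered_rehydrate(chunks, mapping)
print(naive)     # Reply to [EMAIL_1] by Friday.
print(buffered)  # Reply to ada@lovelace.dev by Friday.
```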
What to do when you can't redact
There's a class of prompts that don't fit cleanly into this pattern. A 120-page contract that you want the model to summarize is one of them. Every paragraph might mention a name; once you tokenize them all, the prompt is now a soup of [PERSON_N] markers and the model loses the thread of who did what. The redaction is correct. The output is worse.
The honest answer is that you have a tradeoff to make, and pretending otherwise is how you ship a feature that nobody trusts. Three options, none perfect:
- Selective redaction. Redact only the entity types your compliance posture actually requires. A contract summarizer probably doesn't need to hide people's names from a model the company already has a DPA with; it does need to hide bank account numbers and tax IDs. Configure the entity set per use case.
- Provider isolation. Use a model deployment with a written zero-retention guarantee for the high-context cases, and reserve full redaction for the low-context ones. Azure OpenAI with abuse monitoring disabled, an Anthropic zero-retention contract, or an in-VPC model are the usual answers.
- Two-pass with summary. Run a first pass on the redacted prompt to extract structure (categories, dates, amounts), then a second pass on a smaller, less sensitive prompt that uses the structured output. Often the second pass is small enough that full redaction works.
None of these are clever. They're just the choices that exist. Pick the one that matches your contracts and your appetite, and write down which you picked so the next person on the team doesn't have to reverse-engineer the decision.
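For the selective option, writing the decision down can be as simple as a per-use-case entity table that lives in code. The use-case names, entity labels, and select_spans helper below are all hypothetical; the spans are (start, end, label) tuples from whatever detector you run.

```python
# Which entity types get redacted for each use case. Keeping this in one
# reviewed place is the "write down which you picked" part.
ENTITY_SETS = {
    "support_summarizer": {"EMAIL", "PERSON", "CARD", "ADDRESS"},
    "contract_summarizer": {"BANK_ACCOUNT", "TAX_ID"},
}


def select_spans(spans, use_case):
    """Keep only the detected spans this use case is configured to redact."""
    allowed = ENTITY_SETS[use_case]
    return [span for span in spans if span[2] in allowed]


detected = [(0, 11, "PERSON"), (40, 62, "BANK_ACCOUNT")]
to_redact = select_spans(detected, "contract_summarizer")
print(to_redact)  # [(40, 62, 'BANK_ACCOUNT')]
```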
Putting it together
The four-step pattern (detect, redact, prompt, rehydrate) is the thing the security reviewer wants to see on the architecture diagram. Hybrid detection (regex plus checksums plus a small ML layer) is what makes the detect step accurate enough to be useful at request latency. Deterministic tokens within a session keep multi-turn chats coherent. Sealed sessions or zero-retention server sessions cover the storage question. Streaming and very long contexts are where the pattern gets honest about tradeoffs, and you should plan for those rather than be surprised by them.
That's the shape. The implementation is two HTTP calls and a token map. We documented the wire format in five paragraphs of our docs, and the Python, TypeScript, and Go SDKs are thin wrappers over them. If you'd rather start by poking at the API directly, the free tier is enough to get a feel for it.
Try the four-step pattern. Free tier, no credit card, 90 seconds from a fresh terminal to your first redacted prompt going out to OpenAI.
Get an API key →