Presidio is a great library. It's also a service you have to run.
A practical look at where Microsoft Presidio fits, where it stops, and what you end up building around it the moment you wire it into an LLM pipeline.
If you've spent more than a weekend on PII redaction, you've met Presidio. It's the most-recommended answer on Stack Overflow, the default project README example, and the safe choice in compliance review. We use parts of its detection model internally, and none of what follows is a knock on Presidio as a library.
But every team we've watched ship Presidio into an LLM pipeline ends up writing the same three pieces of glue: a service wrapper, a token-to-value store, and a small fleet of custom recognizers for the entities Presidio doesn't cover by default. By the time you're done, the “use Presidio” ticket has turned into a service that someone on your team owns forever.
peyeeye is what you get if you assume nobody wants to own that service. This post is a quick tour of where the line falls. If you're earlier in the process and just trying to decide whether to redact PII before sending it to an LLM at all, start there and come back.
What Presidio actually gives you as a PII redaction tool
Out of the box, Presidio ships an Analyzer with around fourteen recognizers (US-centric: phone, SSN, credit card, email, IP, person, location, and a handful of others), an Anonymizer that does regex-style replace/mask/hash on the spans the analyzer found, and an ImageRedactor for OCR. There's a spaCy backend by default and a pluggable transformers / Stanza option behind it.
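For orientation, the canonical Presidio round trip looks like this (the entity tag in the output comes from the default replace operator; treat the exact result as illustrative):

```python
from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

text = "Email me at ada@lovelace.dev"

# Analyzer: runs the recognizers (spaCy NER plus pattern recognizers by default).
analyzer = AnalyzerEngine()
results = analyzer.analyze(text=text, language="en")

# Anonymizer: applies replace/mask/hash operators to the detected spans.
anonymizer = AnonymizerEngine()
redacted = anonymizer.anonymize(text=text, analyzer_results=results)

print(redacted.text)  # e.g. "Email me at <EMAIL_ADDRESS>"
```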
That covers detection. It does not cover the rest of the round-trip in an LLM app:
- Rehydration. Presidio gives you redaction (text → text). Putting the real values back into the model's response is yours to build, which means a token map, a store, an expiry policy, and a thread-safe lookup (a sketch of that glue follows this list).
- Sessions. If two redact calls in the same chat turn need consistent tokens (so [PERSON_1] means the same person on call 2 as on call 1), you're managing that state.
- Non-US entities. The default recognizers are mostly US. IBANs, UK NINO, German Steuer-ID, French INSEE, Spanish DNI, Indian Aadhaar, Brazilian CPF: those are all write-your-own.
- Structural validation. Some recognizers ship with checksums (the credit card recognizer does Luhn). Many don't. If you don't want 123-45-6789 matching every nine-digit number in a stack trace, you're adding the validator.
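To make that glue concrete, here's a minimal sketch of what teams end up writing around the Analyzer: a per-session token map with stable numbering, plus rehydration. Everything except the analyze() call is ours for illustration, and it deliberately skips expiry, thread safety, and persistence:

```python
from presidio_analyzer import AnalyzerEngine

analyzer = AnalyzerEngine()

class Session:
    """Illustrative per-conversation token map: value -> token and back."""
    def __init__(self):
        self.tokens: dict[str, str] = {}  # "ada@lovelace.dev" -> "[EMAIL_ADDRESS_1]"
        self.values: dict[str, str] = {}  # "[EMAIL_ADDRESS_1]" -> "ada@lovelace.dev"
        self.counts: dict[str, int] = {}  # per-entity-type counters

    def token_for(self, value: str, entity_type: str) -> str:
        if value not in self.tokens:
            n = self.counts.get(entity_type, 0) + 1
            self.counts[entity_type] = n
            token = f"[{entity_type}_{n}]"
            self.tokens[value] = token
            self.values[token] = value
        return self.tokens[value]  # same value -> same token, across calls

def redact(text: str, session: Session) -> str:
    results = analyzer.analyze(text=text, language="en")
    # Replace right-to-left so earlier spans' offsets stay valid.
    for r in sorted(results, key=lambda r: r.start, reverse=True):
        token = session.token_for(text[r.start:r.end], r.entity_type)
        text = text[:r.start] + token + text[r.end:]
    return text

def rehydrate(text: str, session: Session) -> str:
    # Put the real values back into the model's reply.
    for token, value in session.values.items():
        text = text.replace(token, value)
    return text
```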
None of these are unsolvable. They're just code, and code has a total cost. The argument for self-hosting Presidio mostly comes down to whether that cost is worth not depending on a vendor. Sometimes it is. Often it isn't.
The hidden line item in “just run Presidio” is the service itself. Presidio is a Python library, not a hosted product. To call it from anything that isn't Python, you wrap it in FastAPI or gRPC, add a health check, configure the spaCy or transformers model to load on boot, give it enough memory not to thrash, scale it horizontally for traffic, instrument it for the SOC, and make sure model files live somewhere your build pipeline can fetch them reliably. None of that is exotic, but it adds up to a service your platform team has to babysit.
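If you haven't built one of these, the skeleton is short, but it's still a deployment. A minimal sketch of the wrapper (the endpoint names and payload shape here are ours, not a standard):

```python
from fastapi import FastAPI
from pydantic import BaseModel
from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine

# Engines are built at import time so the spaCy model loads on boot,
# not on the first request. Budget memory for it.
analyzer = AnalyzerEngine()
anonymizer = AnonymizerEngine()

app = FastAPI()

class RedactRequest(BaseModel):
    text: str

@app.get("/health")
def health():
    return {"ok": True}

@app.post("/redact")
def redact(req: RedactRequest):
    results = analyzer.analyze(text=req.text, language="en")
    redacted = anonymizer.anonymize(text=req.text, analyzer_results=results)
    return {"text": redacted.text}
```

That's before the horizontal scaling, the SOC instrumentation, and the model-artifact caching the paragraph above lists.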
PII redaction for LLMs: the five-line version
Here's the same job in peyeeye, end-to-end, including rehydration:
```python
# pip install peyeeye
from peyeeye import shield

red = shield.redact("Email me at ada@lovelace.dev")
answer = openai.chat(red.text)  # model only sees [EMAIL_1]
final = shield.rehydrate(answer, red.session)
```
The model never sees a real email. The user sees the rehydrated reply. There is no service to operate, no recognizer to maintain, and no token store to scale. If you're curious about the wire format, it's two endpoints (/v1/redact and /v1/rehydrate) and they're documented in five paragraphs.
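If you'd rather see it on the wire, the shape is roughly this. The endpoint paths are real; the field names, header, and base URL below are placeholders, not the documented schema:

```python
import requests

BASE = "https://api.peyeeye.example"          # placeholder: substitute the real base URL
HEADERS = {"Authorization": "Bearer <API_KEY>"}

# Call 1: redact. Field names are illustrative.
red = requests.post(f"{BASE}/v1/redact", headers=HEADERS,
                    json={"text": "Email me at ada@lovelace.dev"}).json()

# ... send red["text"] to the model, get a reply back ...

# Call 2: rehydrate the model's reply against the same session.
final = requests.post(f"{BASE}/v1/rehydrate", headers=HEADERS,
                      json={"text": "<model reply>", "session": red["session"]}).json()
```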
The detection layer underneath is hybrid. We start with regex plus structural validators (Luhn for cards, mod-97 for IBANs, the SSN exclusion list, IPv4 and IPv6 sanity checks) so a bare nine-digit number in a stack trace doesn't come back tagged as an SSN. For free-text fields where regex hits a wall, you can opt in to an ML backend (Presidio with the Piiranha DeBERTa model, or a Transformers token classifier) by flipping PEYEEYE_ML_BACKEND. The default is regex because most LLM payloads are short, structured, and don't need a model to find the email address.
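As a concrete example of what the structural layer buys you, here's the Luhn check in full. Roughly nine out of ten random digit strings fail it, which is exactly the false-positive class a naive sixteen-digit regex lets through:

```python
def luhn_valid(candidate: str) -> bool:
    """Luhn mod-10 checksum, the structural check behind card detection."""
    digits = [int(c) for c in candidate if c.isdigit()]
    if len(digits) < 12:            # too short to be a card number anyway
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:              # double every second digit from the right
            d = d * 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

assert luhn_valid("4242 4242 4242 4242")      # a well-known test card number
assert not luhn_valid("4242 4242 4242 4243")  # one digit off fails the checksum
```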
Presidio alternative: the honest comparison
Take this with the usual caveat: both projects are extensible and any cell in this table can be made to read differently with enough work. What it's describing is what you get on day one.

| On day one | Presidio, self-hosted | peyeeye |
| --- | --- | --- |
| Detection | ~14 mostly US recognizers, spaCy by default | regex + structural validators; opt-in ML backend |
| Rehydration | yours to build | built in (/v1/rehydrate) |
| Session-stable tokens | yours to build | built in (ses_… ids) |
| Non-US entities | write your own recognizers | in the default entity set |
| Operations | a Python service you wrap, scale, and monitor | hosted HTTP API; Docker Compose for self-host |
| SDKs | Python | Python and TypeScript |
When you should still self-host Presidio
We're not going to argue that self-hosting is always the wrong call. There are real reasons it's the right one:
- You can't send a payload to a third party at all, even briefly.
- You're running on an air-gapped network, or in a region we don't serve.
- You already operate a Python service stack with idle capacity and the ops cost is effectively zero on the margin.
- You want bit-level control over the recognizer set, scoring, and confidence thresholds.
If any of those apply, Presidio is genuinely the right tool. The version of peyeeye you can self-host via Docker Compose exists for the same reason: some payloads can't leave the building.
Rehydration, sessions, and GDPR LLM workflows
The team most likely to regret self-hosting Presidio is the one shipping an LLM feature on a deadline. Detection is maybe a third of what you actually need. The rest is the part that lands on your team forever: the token store, the session model, the non-US recognizers, the on-call rotation, the eventual rewrite when someone asks for “but stateless this time”.
Rehydration is where most home-grown setups quietly fail. You ship the redaction step, the model returns [EMAIL_1], and now somebody has to put the real address back without leaking the mapping into a log line, a trace, or a cache. peyeeye does it as a second HTTP call, with the mapping either pinned to a session id or sealed into an opaque skey_… blob the model never sees.
Sessions are the other piece people underestimate. A real chat turn is rarely one redact call. It's a system prompt, a user message, a tool call, a tool response, and a model reply, all of which need to use the same token for the same person. Without a session, [PERSON_1] on call one might be Ada and [PERSON_1] on call two might be Grace, and your assistant starts replying to the wrong human. peyeeye handles this by keying the reverse index on a ses_… id and updating it in place across calls, so token numbering stays stable for as long as you keep the session open.
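In SDK terms, the multi-turn shape looks roughly like this, assuming (as the session model implies) that redact can accept an existing session; treat the exact keyword as illustrative:

```python
from peyeeye import shield

# Turn 1: a fresh session is created and Ada becomes [PERSON_1].
red1 = shield.redact("Ada asked about her invoice")
reply1 = openai.chat(red1.text)

# Turn 2: reuse the session so Ada is still [PERSON_1], not reassigned.
red2 = shield.redact("Follow up with Ada tomorrow", session=red1.session)
reply2 = openai.chat(red2.text)

# Rehydrate the final reply against the same session.
final = shield.rehydrate(reply2, red1.session)
```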
The compliance side matters too. If you're working through GDPR for LLM workloads, the question auditors care about is whether personal data ever sat in a third-party model's context window or training pool. Stateless mode gives that question a clean answer: the keys live with you, the tokens go to the model, and nothing is retained on our side by default. Presidio can be made to do the same thing, but you'll be writing the retention story yourself.
That's the gap we try to fill. One HTTP call in, one HTTP call out, deterministic tokens, sealed sessions when you want zero retention, and the entity set you'd otherwise have spent a sprint writing.
A pragmatic Presidio alternative, not a replacement
We don't think Presidio should disappear, and we're not pretending peyeeye covers every case it does. Presidio's OCR pipeline, its image redactor, and the fine-grained control it gives you over recognizer scoring are all reasons a team might stay on it. If your problem is “I have a hundred thousand scanned PDFs and need to batch-redact them on a box that never touches the internet”, peyeeye isn't the answer.
Where we think we earn our keep is the LLM-shaped problem: short payloads, real-time latency budgets, deterministic tokens across a multi-turn session, and rehydration that you didn't have to write. If that's the shape of your problem, the five lines above are the whole integration. If it isn't, Presidio is a solid library and we'll happily hand you the parts of our detection stack we built on top of it.
A few practical notes on picking between them. If you already have a Python service stack, in-house ML ops, and you're comfortable owning a recognizer fleet, the Presidio path is well-trodden and the community is large. If you're a TypeScript shop, or you ship from a serverless runtime, or you just want one less moving piece in the LLM critical path, an HTTP API with first-party Python and TypeScript SDKs is usually the lower-friction choice. Both can produce a compliant pipeline; they just land the work on different teams.
One last thing worth saying out loud: nothing about PII redaction is ever finished. New fields appear in your data, regulators publish new guidance, and the model providers change their retention defaults every couple of quarters. Whichever tool you pick, plan for the entity catalog to grow, the validators to get stricter, and the rehydration contract to need a versioning story. We do that work in the open: every detector ships with sample inputs, every validator has a test, and the eval harness scores the whole stack against public datasets so changes are visible. Whether you build on Presidio or on us, that's the bar to hold.
Try it. Free tier, no credit card, 90 seconds from a fresh terminal to your first redacted prompt.
Get an API key →