peyeeye blog · 2026-05-17 · 7 min read

GLiNER for PII: where it shines, and where it falls short

GLiNER is a clever zero-shot NER model. We use it. But generic NER misses structural PII like cards, IBANs, and tax IDs, and it doesn't handle the rest of the LLM round-trip.


When GLiNER hit Hacker News, the demo did something genuinely novel: type any entity label in plain English and the model would tag spans for it. No fine-tuning, no labeled data, no schema migration. A lot of PII redaction posts now point at it as the obvious answer. Why didn't we just wrap GLiNER in an HTTP server and ship it?

The honest version is that we did, sort of. The optional ML backend in peyeeye uses a model from the same family of compact transformer NER models. GLiNER is a good piece of work and we're not here to argue otherwise. What this post is about is the gap between “a model that finds spans” and “a service that keeps PII out of an LLM round-trip,” because that gap is bigger than it looks.

If you're still figuring out whether you need redaction at all, the four-step pattern for redacting PII before an LLM call is the better starting point. Come back here once you've decided you do.

What GLiNER is genuinely good at

GLiNER's pitch is real. You hand it a list of label strings (“person”, “medication”, “internal project name”, whatever) and it returns character spans. No retraining. The base model is small enough to run on CPU at interactive latency, the weights are open, and the API is three lines of Python. For a product that needs to tag novel, fuzzy, language-shaped entities, this is the right shape of tool.

A few places where it's actively the right call:

For PII specifically, GLiNER will reliably catch names, locations, and organizations, and it will catch many emails and phone numbers that a regex misses because they're phrased oddly (“reach me at ada at lovelace dot dev”). That's real value and we're not going to pretend otherwise.

Where PII detection with GLiNER hits the wall: structural validation

Now the part where generic NER quietly stops being enough. A lot of the PII you actually care about isn't a name. It's a number with structure: a sixteen-digit card, an IBAN, a US SSN, an IPv4 address, a national tax ID. A neural NER model will tell you a string “looks like” a credit card. It can't tell you if the digits pass a Luhn check.

That distinction matters more than it sounds. Here is the kind of string that shows up in logs and stack traces:

# request_id from a load balancer log
req_id = "1234567890123456"

A pure NER model trained on the label “credit card” will redact that. It's sixteen digits in a context that mentions request and id, and the model has learned that sixteen-digit strings are often cards. The redaction step will replace it with [CARD_1], the LLM will receive a useless placeholder where a request id used to be, and the model's answer will be worse for it. Rehydration won't save you because the value wasn't a card to begin with.

False positives in PII redaction are a different shape of problem from false positives in search or tagging. Every false positive is a real value the LLM no longer sees, and the LLM's answer quality is bounded by what it sees. So “mostly correct” precision on structural entities isn't enough. You want the redaction step to refuse to fire unless the structure checks out.
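To make "refuse to fire unless the structure checks out" concrete, here is a minimal sketch of a Luhn gate. The function name `luhn_ok` and the 12-to-19-digit length bound are our assumptions for illustration, not part of GLiNER or any library.

```python
def luhn_ok(candidate: str) -> bool:
    """Return True if the ASCII digits in `candidate` pass the Luhn checksum."""
    digits = [int(ch) for ch in candidate if "0" <= ch <= "9"]
    # Assumption: treat only 12 to 19 digit strings as possible card numbers.
    if not 12 <= len(digits) <= 19:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as adding the two digits of the product
        total += d
    return total % 10 == 0


print(luhn_ok("4242 4242 4242 4242"))  # True: a widely used test card number
print(luhn_ok("1234567890123456"))     # False: the request id from the log
```

The request id sums to 64 under Luhn, so it fails the check and stays in the prompt, which is the behavior you want.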

The same trap shows up with IBANs and tax IDs. A GLiNER prompt with the label “IBAN” will tag any string that resembles a country prefix followed by digits. Plenty of internal account numbers and reference codes look like that. The mod-97 check (move the first four characters to the end, replace each letter with a number from 10 to 35, treat the result as a base-10 integer, divide by 97, expect remainder 1) is what separates a real IBAN from a string that resembles one. It's a few lines of Python and it changes precision by an order of magnitude on noisy input. SSNs have a similar story: the area number can't be 000, 666, or 900 to 999, the group can't be 00, the serial can't be 0000. A model has no built-in way to enforce that, and adding it as a post-filter is exactly the kind of glue you end up writing if you start from a generic NER backbone.
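As a rough sketch of those two checks, assuming the detector hands over a candidate string: `iban_ok` and `ssn_ok` are hypothetical helper names, and the IBAN check here validates only the general shape and the checksum, not per-country lengths.

```python
import re


def iban_ok(candidate: str) -> bool:
    """Mod-97 check: rearrange, map letters to numbers, expect remainder 1."""
    iban = re.sub(r"\s+", "", candidate).upper()
    # Two letters, two check digits, then 11 to 30 alphanumerics (15 to 34 total).
    if not re.fullmatch(r"[A-Z]{2}[0-9]{2}[A-Z0-9]{11,30}", iban):
        return False
    rearranged = iban[4:] + iban[:4]
    # int(ch, 36) maps 0-9 to 0-9 and A-Z to 10-35.
    number = int("".join(str(int(ch, 36)) for ch in rearranged))
    return number % 97 == 1


SSN_RE = re.compile(r"([0-9]{3})-([0-9]{2})-([0-9]{4})")


def ssn_ok(candidate: str) -> bool:
    """Range checks only: this says 'could be an SSN', not 'was issued'."""
    m = SSN_RE.fullmatch(candidate.strip())
    if not m:
        return False
    area, group, serial = m.groups()
    if area in ("000", "666") or area >= "900":
        return False
    return group != "00" and serial != "0000"


print(iban_ok("GB82 WEST 1234 5698 7654 32"))  # True: the standard example IBAN
print(iban_ok("GB00 WEST 1234 5698 7654 32"))  # False: wrong check digits
print(ssn_ok("666-12-3456"))                   # False: area 666 is never issued
```

Neither function proves the value belongs to a real account or person. They only rule out strings that cannot be one, which is the part a span model has no way to do.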

You can teach a transformer to learn checksums in principle. In practice no PII NER model we've seen does, because the training distribution is dominated by names and addresses where the “structure” is linguistic rather than arithmetic. The validators are cheaper and more reliable as a separate layer.

That's where structural PII validation earns its keep. Luhn for cards. Mod-97 for IBANs. Range checks for SSN area numbers. Octet bounds and reserved-block exclusion for IPv4. These are five-line functions, but a generic NER model has no notion of them and there's no clean way to bolt them on after the fact, because by the time the model has fired, the span is already a candidate. You need the validators in the loop.
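The IPv4 case can lean on Python's standard `ipaddress` module. This is a sketch under one assumption: that "reserved-block exclusion" means skipping private, loopback, link-local, multicast, reserved, and unspecified ranges. Which blocks to exclude is a policy choice, and `ipv4_worth_redacting` is a name made up for this example.

```python
import ipaddress


def ipv4_worth_redacting(candidate: str) -> bool:
    """True only for well-formed IPv4 addresses outside special-purpose ranges."""
    try:
        # Rejects out-of-range octets ("256.1.1.1") and malformed input ("1.2.3").
        addr = ipaddress.IPv4Address(candidate.strip())
    except ValueError:
        return False
    # Assumption: these ranges rarely identify a person, so leave them visible.
    return not (
        addr.is_private
        or addr.is_loopback
        or addr.is_link_local
        or addr.is_multicast
        or addr.is_reserved
        or addr.is_unspecified
    )


print(ipv4_worth_redacting("8.8.8.8"))    # True
print(ipv4_worth_redacting("10.0.0.1"))   # False: private range
print(ipv4_worth_redacting("256.1.1.1"))  # False: octet out of bounds
```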

Latency math when a GLiNER alternative sits inline

The other thing the demos elide: even a small NER model adds real latency. A 200M-parameter encoder running on a free-tier CPU instance lands in the 50 to 100 ms range per call once you account for tokenization, model load, and span decoding. On a warm GPU you can get it to 10 to 20 ms, but most teams shipping LLM features aren't provisioning GPUs for the redaction step.

Inline before every LLM call, that's a budget. Not a fatal one, but enough that you start caring about caching, warm pools, and cold-start handling. A regex pass with checksum validators on top runs in under a millisecond on the same input. The pragmatic answer is to do regex plus validators first, fall back to the ML model only when the fast path didn't find anything, and then merge the spans. That's the layout peyeeye uses, and it's mostly invisible from the API.
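The layout described above might look roughly like this. It is a sketch of the general pattern, not peyeeye's actual code: `Span`, `fast_path`, `merge`, `detect`, and the `ml_detect` callable are all hypothetical names, the regexes are deliberately simple, and `fake_ml` is a stand-in for a real NER model.

```python
import re
from typing import Callable, List, NamedTuple


class Span(NamedTuple):
    start: int
    end: int
    label: str


CARD_RE = re.compile(r"\b(?:[0-9][ -]?){15}[0-9]\b")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def luhn_ok(candidate: str) -> bool:
    digits = [int(ch) for ch in candidate if "0" <= ch <= "9"]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d = d * 2 - 9 if d * 2 > 9 else d * 2
        total += d
    return bool(digits) and total % 10 == 0


def fast_path(text: str) -> List[Span]:
    """Regex candidates, kept only when the validator agrees."""
    spans = [Span(m.start(), m.end(), "CARD")
             for m in CARD_RE.finditer(text) if luhn_ok(m.group())]
    spans += [Span(m.start(), m.end(), "EMAIL") for m in EMAIL_RE.finditer(text)]
    return spans


def merge(spans: List[Span]) -> List[Span]:
    """Sort by position and fold overlapping spans into the earlier one."""
    merged: List[Span] = []
    for s in sorted(spans):
        if merged and s.start < merged[-1].end:
            last = merged[-1]
            merged[-1] = Span(last.start, max(last.end, s.end), last.label)
        else:
            merged.append(s)
    return merged


def detect(text: str, ml_detect: Callable[[str], List[Span]]) -> List[Span]:
    spans = fast_path(text)
    if not spans:
        # Slow path. Structural labels from the model still face the validator.
        spans = [s for s in ml_detect(text)
                 if s.label != "CARD" or luhn_ok(text[s.start:s.end])]
    return merge(spans)


def fake_ml(text: str) -> List[Span]:
    """Stand-in for a real model: tags any 16-digit run as a card."""
    return [Span(m.start(), m.end(), "CARD") for m in re.finditer(r"[0-9]{16}", text)]


print(detect("Charge 4242 4242 4242 4242 for ada@lovelace.dev", fake_ml))
print(detect("req_id = 1234567890123456", fake_ml))  # []
```

In the second call the stand-in model does tag the request id, but the Luhn gate drops it before it becomes a redaction. Keeping the validator inside `detect` rather than after it is the "in the loop" part.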

The other half: rehydration, sessions, and what GLiNER doesn't do

Even if GLiNER nailed every span, you'd still be writing the rest of the pipeline yourself. The model returns offsets. Replacing them with stable placeholders is your code. Keeping [PERSON_1] stable across two calls in the same chat is your code. Putting the real values back into the model's response at egress is your code. The token-to-value store, its expiry policy, its retention story, its thread-safety, its leak-resistance under logging: all yours.

Rehydration in particular is the step that quietly fails in most home-grown setups. The model returns text containing [EMAIL_1], you have to put the original address back without leaking the mapping into a trace, a cache, or a log line. We have a longer post on the GDPR and HIPAA implications of where that mapping lives; the short version is that auditors will ask, and “in the process memory of a Python service we wrote ourselves” is a harder answer than it sounds.
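For a sense of what "your code" means here, below is a deliberately naive sketch of a placeholder session. `RedactionSession` and its methods are hypothetical names. The mapping lives in a plain in-memory dict, with no expiry, locking, or persistence, which is exactly the setup the paragraph above warns about. It assumes non-overlapping `(start, end, label)` spans from some detector.

```python
import re
from typing import Dict, List, Tuple


class RedactionSession:
    """Keeps placeholders stable across calls. Never log the internal maps."""

    def __init__(self) -> None:
        self._token_for: Dict[Tuple[str, str], str] = {}  # (label, value) -> token
        self._value_for: Dict[str, str] = {}              # token -> value
        self._counters: Dict[str, int] = {}               # label -> last index used

    def _token(self, label: str, value: str) -> str:
        key = (label, value)
        if key not in self._token_for:
            n = self._counters.get(label, 0) + 1
            self._counters[label] = n
            token = f"[{label}_{n}]"
            self._token_for[key] = token
            self._value_for[token] = value
        return self._token_for[key]

    def redact(self, text: str, spans: List[Tuple[int, int, str]]) -> str:
        ordered = sorted(spans)
        # Assign tokens left to right so numbering follows reading order.
        tokens = [self._token(label, text[s:e]) for s, e, label in ordered]
        # Substitute right to left so earlier offsets stay valid.
        for (s, e, _), tok in zip(reversed(ordered), reversed(tokens)):
            text = text[:s] + tok + text[e:]
        return text

    def rehydrate(self, text: str) -> str:
        # Unknown placeholders are left as-is rather than guessed at.
        return re.sub(r"\[[A-Z]+_[0-9]+\]",
                      lambda m: self._value_for.get(m.group(), m.group()), text)


session = RedactionSession()
print(session.redact("Email ada@lovelace.dev today", [(6, 22, "EMAIL")]))
# Email [EMAIL_1] today
print(session.rehydrate("I will write to [EMAIL_1] now."))
# I will write to ada@lovelace.dev now.
```

Everything this sketch leaves out (retention, thread-safety, keeping the maps out of traces) is the part that takes the real effort.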

How peyeeye uses something like GLiNER under the hood

For the record: we run a transformer NER backend ourselves. The current default is Piiranha (a DeBERTa-v3 token classifier trained on PII labels), but the architectural slot is the same one GLiNER would fit into. What we layer on top is the part that makes it usable for redaction:

Net effect: higher precision than ML alone (because the validators catch the looks-like-a-card request id), broader coverage than regex alone (because the model catches the oddly phrased phone number), and rehydration as a single HTTP call instead of a service you operate.

PII redaction with rehydration: the five-line version

For comparison, here's the same end-to-end job through peyeeye. The model never sees a real email; the user sees the rehydrated reply.

# pip install peyeeye
from peyeeye import shield

red = shield.redact("Charge card 4242 4242 4242 4242 for ada@lovelace.dev")
# openai.chat(...) stands in for whatever LLM client call you already make
answer = openai.chat(red.text)  # the model sees [CARD_1] and [EMAIL_1]
final = shield.rehydrate(answer, red.session)

The card passes Luhn so it gets redacted; a sixteen-digit request id in the same prompt would not. That's the layering working as intended. There's no model to warm, no token store to scale, and the rehydration step is one call. If you'd rather see the wire format directly, it's two endpoints documented in the peyeeye docs.

When you should still pick GLiNER over a hosted PII service

The honest answer isn't never. There are real reasons GLiNER is the right tool:

If none of those apply, what you probably want is detection plus structural validation plus rehydration as a single thing you call, not three separate things you build. That is the slot peyeeye is trying to fill, and the reason we run a transformer inside the backend rather than offering the service as a replacement for one.

Try the layered version. Free tier, no credit card, structural validators and rehydration included. About 90 seconds from a fresh terminal to your first redacted prompt.
