peyeeye blog · 2026-05-10 · 8 min read

Adding PII redaction to a LangChain pipeline

How to wrap a LangChain chain or agent with redact and rehydrate steps so the model never sees customer data, with code that works for both LCEL and the older runnable interfaces.

LangChain makes it very easy to forget where your prompt actually goes. You compose a chain, the chain composes a prompt, the prompt composes a request, and somewhere four frames down the call stack a customer's home address is on the wire to a hosted model. Agents are worse: each tool call, each scratchpad rewrite, each “reflect on previous step” round trip is another opportunity for raw user data to leave your process.

This post is the practical version of that problem. We'll wrap a LangChain pipeline so the model only ever sees placeholders like [EMAIL_1], and the rehydrated answer goes back to the caller with the real values restored. If you're still convincing yourself this is worth doing, our piece on redacting PII before the prompt hits the LLM covers the why. Here we're going to assume yes and get into the where and the how.

Why PII redaction in LangChain is its own problem

A bare OpenAI or Anthropic call is one function. Wrap a redact step in front of it and a rehydrate step behind it and you're done. LangChain is not one function. A single chain might run through steps like these:

Input dict arrives from your handler.
A prompt template formats it into a string.
The chat model wrapper sends that string to a provider.
An output parser turns the response into a Python object.
Maybe one or more tools fire, each with their own LLM round trip.

Each hand-off in that list is a place customer data can slip through. The trick is picking the spot where you can still see structured input on one side and raw output on the other, without having to monkey-patch a vendor wrapper.

Where to redact: not the prompt template, not the LLM wrapper

The two tempting wrong spots are the prompt template and the chat model wrapper.

Redacting inside the prompt template feels natural because that's where the strings come together. The problem is timing. By the time the template has rendered, you've already lost the structured input dict, and the rendered prompt is what the rest of the chain consumes. You'd be redacting a string you also have to keep an unredacted copy of, which defeats the point.

Redacting inside the chat model wrapper (subclassing ChatOpenAI, for example) sounds clean, but the wrapper doesn't know about your session. You can redact on the way out, but you can't rehydrate the parsed output downstream because that flows through the parser and into whatever calls the chain. You'd be carrying redaction state in a side channel.

The clean spot is two LangChain runnables: one wrapping the input dict before it reaches the prompt template, and a second one rehydrating the parsed output before it leaves the chain. They're symmetrical, they fit naturally into LCEL, and they let the rest of the pipeline stay vendor-agnostic.

A side note on detection. The redact runnable doesn't care which detector you use underneath. We default to the peyeeye hosted detector because it covers a wider entity set and includes checksum validation, but if you're running Microsoft Presidio in your stack already you can swap it in here without changing the pipeline shape. We wrote a longer comparison in our Presidio alternative post if you're weighing that decision. The thing the runnable pattern wants is a function that takes a string and gives you back a redacted string plus a session handle; whose code is doing the detection is your call.
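To make that contract concrete, here is a minimal sketch of a function pair with the right shape. It is a toy stand-in, not the peyeeye detector or Presidio: the names toy_redact, toy_rehydrate, and Redaction are hypothetical, the email regex is deliberately naive, and the "session" is just an in-memory dict.

```python
import re
from dataclasses import dataclass, field
from typing import Dict

# Deliberately naive pattern, for illustration only (not a production detector).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


@dataclass
class Redaction:
    text: str                                              # input with placeholders substituted
    session: Dict[str, str] = field(default_factory=dict)  # placeholder -> real value


def toy_redact(text: str) -> Redaction:
    """Replace each distinct email with [EMAIL_n] and remember the mapping."""
    seen: Dict[str, str] = {}      # real value -> placeholder
    mapping: Dict[str, str] = {}   # placeholder -> real value

    def substitute(match: "re.Match[str]") -> str:
        value = match.group(0)
        if value not in seen:
            placeholder = f"[EMAIL_{len(seen) + 1}]"
            seen[value] = placeholder
            mapping[placeholder] = value
        return seen[value]

    return Redaction(text=EMAIL_RE.sub(substitute, text), session=mapping)


def toy_rehydrate(text: str, session: Dict[str, str]) -> str:
    """Swap every known placeholder back to its original value."""
    for placeholder, value in session.items():
        text = text.replace(placeholder, value)
    return text
```

Anything that exposes the same two calls (string in, redacted string plus session out; string plus session in, restored string out) should slot into the runnables without changing the pipeline shape.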

A LangChain runnable for redact and rehydrate

Here's the LCEL version end-to-end. We're using the peyeeye Python SDK, which exposes shield.redact() and shield.rehydrate() on top of the /v1/redact and /v1/rehydrate endpoints.

from langchain_core.runnables import RunnableLambda, RunnablePassthrough
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from peyeeye import shield

def redact_input(payload):
    # payload is the dict your caller passes into the chain
    red = shield.redact(payload["question"])
    return {
        "question": red.text,
        "_session": red.session,
    }

def rehydrate_output(payload):
    answer = payload["answer"]
    session = payload["_session"]
    return shield.rehydrate(answer, session)

prompt = ChatPromptTemplate.from_template(
    "Answer the user's question.\nQuestion: {question}"
)
model = ChatOpenAI(model="gpt-4o-mini")

chain = (
    RunnableLambda(redact_input)
    | RunnablePassthrough.assign(
        answer=prompt | model | (lambda r: r.content),
    )
    | RunnableLambda(rehydrate_output)
)

result = chain.invoke({"question": "Email ada@lovelace.dev with the receipt"})

The RunnablePassthrough.assign in the middle is doing the load-bearing work. It calls the prompt and the model with the redacted question, then writes the model's text into a new answer key while preserving _session alongside it. That keeps the session id available when we rehydrate. The model never sees ada@lovelace.dev, only [EMAIL_1], and the caller gets the real address back in the response.

A few notes on the shape. We're using a stateful session id (ses_…) here because it keeps the example short. If you want zero retention on our end, swap in a sealed key (skey_…) by passing session="stateless" to the redact call. Our post on stateless vs stateful sessions digs into when each is worth it.

Threading the session through RunnableConfig

Carrying the session in the payload dict, as above, works for a single chain invocation. If you have a longer-lived agent loop where multiple sub-chains share the same session, you'll want the session id to ride along on RunnableConfig instead of getting copied through every dict.

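from langchain_core.runnables import RunnableConfig

# The caller supplies a mutable holder when invoking, e.g.
#   chain.invoke(payload, config={"configurable": {"peyeeye": {}}})
# LangChain may hand each step its own shallow copy of "configurable", so a
# key assigned directly on that dict in one step can be invisible to the next.
# Writing into the nested holder, which is shared by reference, avoids that.

def redact_input(payload, config: RunnableConfig):
    red = shield.redact(payload["question"])
    config["configurable"]["peyeeye"]["session"] = red.session
    return {"question": red.text}

def rehydrate_output(answer: str, config: RunnableConfig):
    session = config["configurable"]["peyeeye"]["session"]
    return shield.rehydrate(answer, session)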

That keeps the runnable signatures clean and gives every step in the pipeline a way to ask “what session am I in?” without plumbing it through state.

The older runnable interface: a Chain subclass

If you're still on a stable LangChain release that predates LCEL, or you've got a codebase full of LLMChain instances, you can do the same thing with a custom Chain subclass. It's less elegant, but it works.

from langchain.chains.base import Chain
from peyeeye import shield

class RedactedChain(Chain):
    inner: Chain
    input_key: str = "question"
    output_key: str = "answer"

    @property
    def input_keys(self):
        return [self.input_key]

    @property
    def output_keys(self):
        return [self.output_key]

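    def _call(self, inputs, run_manager=None):
        # Redact before the inner chain runs; the session maps placeholders back.
        red = shield.redact(inputs[self.input_key])
        # On releases without .invoke, call the chain directly: self.inner({...})
        out = self.inner.invoke({self.input_key: red.text})
        # Assumes the inner chain uses the same input and output keys as this wrapper.
        restored = shield.rehydrate(out[self.output_key], red.session)
        return {self.output_key: restored}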

Same shape, more boilerplate. The reason we're showing it is that some teams can't move to LCEL on the same timeline they're shipping a redaction story, and we'd rather give you a working pattern than wave at the migration guide.

Streaming: the placeholder boundary problem

Streaming is where this pattern needs an honest caveat. LangChain's .astream() emits chunks the moment the model produces them. The model produces tokens, and tokens do not respect placeholder boundaries. A rehydrate call against a half-streamed string can land mid-placeholder, which leaves you holding [EMAIL with no closing bracket and nothing to look up.

We've tried three approaches and none of them are perfectly clean.

The simplest is to defer rehydration to the end of the stream. You stream the redacted answer to your downstream consumer (with placeholders visible), then run one rehydrate pass when the stream closes. This is fine for tool-call traces, internal logs, or any UI that's buffering before showing text anyway. It is not fine if you're piping straight to a chat bubble that updates token-by-token.

The middle option is to buffer per-chunk and only flush rehydrated text when you've seen a closing ]. You hold a small rolling buffer, scan for placeholder shapes, rehydrate completed ones, and pass through everything else immediately. The user sees text appear with a small delay around any redacted span. The delay is usually invisible because placeholders are short.

The third option is to disable streaming on the wrapped chain entirely and stream a fake-progress indicator on the way out. We list it for completeness; we don't love it, and we don't recommend it unless your UI already tolerates non-streamed model output for other reasons.

For the buffered version, the runnable is a stateful generator rather than a simple lambda. It's about thirty lines of code and we'll publish a reference implementation in the SDK docs alongside the synchronous example.
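Until then, here is a rough sketch of the buffering idea. It is not the SDK's reference implementation. It assumes placeholders look like [EMAIL_1] (upper-case label, underscore, digits) and are never longer than 32 characters, and it uses a plain dict in place of a session lookup. The names rehydrate_stream, PLACEHOLDER_RE, and MAX_PLACEHOLDER_LEN are hypothetical.

```python
import re
from typing import Dict, Iterable, Iterator

# Assumed placeholder shape, modelled on [EMAIL_1].
PLACEHOLDER_RE = re.compile(r"\[[A-Z_]+_\d+\]")
MAX_PLACEHOLDER_LEN = 32  # assumed bound; longer bracketed runs pass through as text


def rehydrate_stream(chunks: Iterable[str], mapping: Dict[str, str]) -> Iterator[str]:
    """Yield text as soon as it cannot be part of an unfinished placeholder."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        while buffer:
            start = buffer.find("[")
            if start == -1:
                # No bracket anywhere: everything is safe to emit.
                yield buffer
                buffer = ""
                break
            if start > 0:
                # Emit the plain text in front of the bracket immediately.
                yield buffer[:start]
                buffer = buffer[start:]
            end = buffer.find("]")
            if end == -1:
                if len(buffer) > MAX_PLACEHOLDER_LEN:
                    # Too long to be a placeholder: release the "[" and rescan.
                    yield buffer[0]
                    buffer = buffer[1:]
                    continue
                break  # possibly mid-placeholder; wait for the next chunk
            candidate = buffer[: end + 1]
            if PLACEHOLDER_RE.fullmatch(candidate):
                # Completed placeholder: restore it (unknown ones pass through).
                yield mapping.get(candidate, candidate)
                buffer = buffer[end + 1 :]
            else:
                # Ordinary bracketed text such as "[1]": release the "[" and rescan.
                yield buffer[0]
                buffer = buffer[1:]
    if buffer:
        # Stream ended mid-bracket; emit what is left unchanged.
        yield buffer
```

In a real pipeline the mapping.get call is where a rehydrate call for the completed placeholder would go. An async variant for .astream() would follow the same structure with async for.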

Redact agent prompts: tools and intermediate steps

Agents introduce a wrinkle. The agent's scratchpad, the tool input arguments, the tool outputs, the “observe and reflect” turns: every one of those is another LLM call, and every one of them can leak data if you only redacted the top-level user input.

The good news is that if you redacted the input, every downstream LLM call already receives placeholders, because the agent is just composing strings out of what you've handed it. The trap is on the tool side. If your agent decides to call a send_email tool with to="[EMAIL_1]", the tool gets the placeholder. A naive tool implementation will try to send mail to [EMAIL_1] and fail loudly, or worse, send mail to a literal string.

You have two options here, and we've seen both work in production.

Option one is to rehydrate at the tool boundary. Each tool that needs real values calls shield.rehydrate(arg, session) on its arguments before doing anything real. The session id rides along on RunnableConfig. The model still never sees real PII; the tool sees real PII only at the moment of execution.

Option two is to keep redaction one-way. The agent reasons over placeholders and emits placeholders. Your handler intercepts the final tool call (or the structured response that your application acts on) and resolves placeholders only at the egress boundary. That's nice if your tools are trusted internal functions and you want raw data to stay confined to a small blast radius.

We tend to recommend option one for agents that are doing real I/O on the user's behalf, and option two for agents that are mostly drafting text for a human to review. Neither is wrong; the difference is where you draw the “what counts as egress” line.
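Option one can be sketched as a decorator that restores string arguments just before the tool body runs. This is an illustration under assumptions, not SDK code: rehydrate_text stands in for shield.rehydrate, the session is an in-memory dict, and rehydrate_args, SESSION, and OUTBOX are hypothetical names.

```python
import functools
from typing import Any, Callable, Dict, List, Tuple


def rehydrate_text(text: str, session: Dict[str, str]) -> str:
    """Stand-in for a rehydrate call: swap placeholders for real values."""
    for placeholder, value in session.items():
        text = text.replace(placeholder, value)
    return text


def rehydrate_args(get_session: Callable[[], Dict[str, str]]):
    """Decorator factory: rehydrate string arguments at the tool boundary."""

    def decorator(tool: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(tool)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            session = get_session()

            def fix(value: Any) -> Any:
                # Only top-level strings are touched; other values pass through.
                return rehydrate_text(value, session) if isinstance(value, str) else value

            return tool(*[fix(a) for a in args], **{k: fix(v) for k, v in kwargs.items()})

        return wrapper

    return decorator


# Example wiring with an in-memory session.
SESSION: Dict[str, str] = {"[EMAIL_1]": "ada@lovelace.dev"}
OUTBOX: List[Tuple[str, str]] = []


@rehydrate_args(lambda: SESSION)
def send_email(to: str, subject: str) -> str:
    OUTBOX.append((to, subject))  # a real tool would call a mail API here
    return "queued"               # the return value goes back to the model, so keep it PII-free
```

Note the return value: whatever a tool returns is fed back into the agent loop, so a tool that rehydrates its inputs should avoid echoing the real values in its output.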

One more thing to watch with agents: the LangChain agent executor often logs intermediate steps for tracing. If you have LangSmith or any tracing backend attached, those traces will record whatever flowed through your runnables. The placeholders are fine to record. Real values, if you rehydrate too early, are not. This is one of the reasons we lean toward keeping rehydration as far downstream as possible: the trace shows [EMAIL_1] all the way through, and only your final handler sees the real address.

Rehydrate LangChain output without breaking the chain contract

A small detail on output parsers. LangChain output parsers expect specific shapes: StrOutputParser returns a string, PydanticOutputParser returns a model instance. Where does rehydration go?

For string outputs, rehydrate after the parser. The parser sees placeholders, returns a string, and the rehydrate runnable returns the same string with values restored.

For structured outputs, you have a choice. You can rehydrate before parsing (which means the parser validates real values and any field-level constraints fire on real data), or after parsing (which means the model output is parsed against placeholders and you walk the resulting object substituting strings). We default to rehydrating before parsing because validation errors against real data are more useful, but it depends on whether your parser tolerates the size and shape of fully rehydrated values.
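For the rehydrate-after-parsing route, the walk over the parsed object might look like the sketch below. It handles strings, dicts, lists, tuples, and dataclass instances; rehydrate_tree and Ticket are hypothetical names, and a dict lookup stands in for the session. A Pydantic model could be dumped to a dict first and re-validated afterwards.

```python
from dataclasses import dataclass, fields, is_dataclass, replace
from typing import Any, Dict


def rehydrate_tree(value: Any, session: Dict[str, str]) -> Any:
    """Return a copy of value with placeholders restored in every string."""
    if isinstance(value, str):
        for placeholder, real in session.items():
            value = value.replace(placeholder, real)
        return value
    if isinstance(value, dict):
        return {key: rehydrate_tree(item, session) for key, item in value.items()}
    if isinstance(value, list):
        return [rehydrate_tree(item, session) for item in value]
    if isinstance(value, tuple):
        return tuple(rehydrate_tree(item, session) for item in value)
    if is_dataclass(value) and not isinstance(value, type):
        # Rebuild the instance with each field rehydrated.
        changes = {f.name: rehydrate_tree(getattr(value, f.name), session)
                   for f in fields(value)}
        return replace(value, **changes)
    return value  # numbers, None, and unknown types pass through unchanged


@dataclass
class Ticket:
    contact: str
    notes: list


session = {"[EMAIL_1]": "ada@lovelace.dev"}
parsed = Ticket(contact="[EMAIL_1]", notes=["reply to [EMAIL_1]", 3])
restored = rehydrate_tree(parsed, session)
```

The walk returns new objects rather than mutating in place, so the placeholder version stays available for logging or tracing.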

One last thing about latency

Two extra HTTP calls per chain run is not free. In our measurements they add somewhere between 30 and 60 ms total to a typical chain that's already spending 800 ms or more in the model. If you're running on a tight latency budget, the sealed (skey_…) mode does the rehydrate decryption locally in the SDK and saves you a round trip on the way back. That's a real win for streaming UIs in particular, where the rehydrate step lands on the critical path.

Caching also helps more than people expect. The redact endpoint is deterministic for the same input string within a session, so if your chain re-runs on the same input (retries, redeliveries from a queue, deterministic test runs) the second call can short-circuit. We don't do this in the SDK by default because cache invariants are application-specific, but an lru_cache around the redact wrapper is usually safe when the session id is part of the key.
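A minimal sketch of that cache, assuming a redact call that accepts an existing session id. The function remote_redact is a hypothetical stand-in for the network call (its signature is not the SDK's), and "ses_demo" is an illustrative id.

```python
import functools

CALLS = {"count": 0}  # counts simulated network round trips


def remote_redact(text: str, session_id: str) -> str:
    """Hypothetical stand-in for the HTTP call to the redact endpoint."""
    CALLS["count"] += 1
    return text.replace("ada@lovelace.dev", "[EMAIL_1]")


@functools.lru_cache(maxsize=1024)
def cached_redact(session_id: str, text: str) -> str:
    # Both arguments form the cache key, so sessions never share entries.
    return remote_redact(text, session_id)


first = cached_redact("ses_demo", "Email ada@lovelace.dev")
second = cached_redact("ses_demo", "Email ada@lovelace.dev")  # served from cache
```

One trade-off to weigh: the cache keys are the unredacted input strings, so raw values stay in process memory for as long as the cache does. Keep maxsize small and call cached_redact.cache_clear() when a session ends.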

The pattern itself is small. Once you have a redact runnable and a rehydrate runnable you can compose them into anything LangChain composes: chains, branches, parallel fan-outs, agent executors, or the LangGraph state machines we'll cover in a follow-up. The thing to keep in mind is symmetry: every place a real value enters the chain needs a redact upstream, and every place a real value has to leave the chain needs a rehydrate downstream. Get those two boundaries right and the rest of the pipeline goes back to being LangChain.

Wire it in. Free tier, no credit card. Drop the runnable in front of your existing chain and watch the model logs go quiet.
