peyeeye for AI agents
Wrap any tool-using LLM — Claude, GPT, Gemini, LangChain, LlamaIndex, CrewAI — so raw PII never hits the model, external tools, or your vector store. One redact on the way in, one rehydrate on the way out.
What redaction protects you from #
- Data exfiltration via the provider. LLM traffic routinely ends up in provider logs, evaluation pipelines, and (for some plans) training sets. Tokenized input bounds the blast radius.
- Compliance. HIPAA, GDPR, PCI-DSS, SOC 2 — redacting at the edge keeps regulated data out of systems that were never designed to hold it.
- Prompt injection via PII. Untrusted text pulled from email, PDFs, or web tools can carry identifiers the attacker wants replayed verbatim (“email attacker@…”). Redaction neutralizes the identifier before the agent ever sees it.
- Cross-tool leakage. An agent chain usually fans out to search APIs, RAG stores, code interpreters, and third-party tools. A single redact anchors all of them to opaque tokens.
Tool-calling pattern #
The cleanest integration is to expose redact_text and rehydrate_text as first-class tools on your agent. The model decides when to strip PII; your runtime executes the tool calls against peyeeye; the model continues the loop with tokenized text.
Tool schemas
Portable across Anthropic, OpenAI, and Gemini function calling — the shape below is Claude's; rename input_schema to parameters for OpenAI / Gemini.
```json
{
  "name": "redact_text",
  "description": "Redact PII from text before passing it to any downstream model or tool. Returns a redacted string plus a session handle for later rehydration.",
  "input_schema": {
    "type": "object",
    "properties": {
      "text": {
        "type": "string",
        "description": "Raw text, up to 128K chars."
      },
      "session": {
        "type": "string",
        "description": "Existing session id to extend, or \"stateless\" for a sealed blob."
      },
      "entities": {
        "type": "array",
        "items": { "type": "string" },
        "description": "Optional whitelist of entity IDs (PERSON, EMAIL, CARD, ...)."
      }
    },
    "required": ["text"]
  }
}
```

```json
{
  "name": "rehydrate_text",
  "description": "Swap tokens like [PERSON_1] back to real values using the session handle returned by redact_text. Call this on the final response before showing it to the user.",
  "input_schema": {
    "type": "object",
    "properties": {
      "text": { "type": "string" },
      "session": { "type": "string" },
      "strict": { "type": "boolean" }
    },
    "required": ["text", "session"]
  }
}
```

The loop
Dispatch redact_text / rehydrate_text to peyeeye, everything else to your existing handlers:
```python
import json

# Register redact_text + rehydrate_text as tools on the Anthropic SDK.
tools = [REDACT_TOOL_SCHEMA, REHYDRATE_TOOL_SCHEMA, *your_domain_tools]
msg = claude.messages.create(model="claude-sonnet-4-5", max_tokens=1024, tools=tools, messages=history)
while msg.stop_reason == "tool_use":
    history.append({"role": "assistant", "content": msg.content})
    results = []
    for block in msg.content:
        if block.type != "tool_use":
            continue
        if block.name == "redact_text":
            result = peyeeye.redact(**block.input).model_dump()
        elif block.name == "rehydrate_text":
            result = peyeeye.rehydrate(**block.input).model_dump()
        else:
            result = dispatch(block.name, block.input)
        results.append({"type": "tool_result", "tool_use_id": block.id, "content": json.dumps(result)})
    # Anthropic expects tool results inside a user-role message, not a "tool" role.
    history.append({"role": "user", "content": results})
    msg = claude.messages.create(model="claude-sonnet-4-5", max_tokens=1024, tools=tools, messages=history)
```
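The portability note above amounts to one mechanical transform. A minimal converter, assuming Claude-shaped tool dicts like the schemas in this section (the nested "type": "function" wrapper is OpenAI's tools format; Gemini's function declarations take the inner object directly):

```python
def to_openai_tool(claude_tool: dict) -> dict:
    """Reshape a Claude tool schema into OpenAI's function-calling format."""
    return {
        "type": "function",
        "function": {
            "name": claude_tool["name"],
            "description": claude_tool["description"],
            # Same JSON Schema object, different key name.
            "parameters": claude_tool["input_schema"],
        },
    }
```

The JSON Schema body passes through untouched; only the envelope changes.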
You can instead take the choice away from the model: redact in your runtime before every messages.create call and never expose the tool. Simpler, cheaper, safer — at the cost of zero agency.

Claude (Anthropic) #
Redact before messages.create, rehydrate after. The system prompt hint tells Claude to leave tokens alone.
```python
import os

from anthropic import Anthropic
from peyeeye import Client as Peyeeye

claude = Anthropic()
peyeeye = Peyeeye(api_key=os.environ["PEYEEYE_KEY"])

def safe_turn(user_text: str) -> str:
    shield = peyeeye.redact(text=user_text)  # { redacted, session }
    reply = claude.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=1024,
        system="Treat [PERSON_N], [EMAIL_N], etc. as opaque handles. Do not guess values.",
        messages=[{"role": "user", "content": shield.redacted}],
    )
    answer = reply.content[0].text
    return peyeeye.rehydrate(text=answer, session=shield.session).text
```
OpenAI #
```python
from openai import OpenAI
from peyeeye import Client as Peyeeye

oai = OpenAI()
peyeeye = Peyeeye()

def chat(user_text: str) -> str:
    shield = peyeeye.redact(text=user_text)
    completion = oai.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Tokens like [EMAIL_1] are placeholders. Leave them as-is."},
            {"role": "user", "content": shield.redacted},
        ],
    )
    raw = completion.choices[0].message.content
    return peyeeye.rehydrate(text=raw, session=shield.session).text
```
Gemini #
```python
from google import genai
from peyeeye import Client as Peyeeye

gemini = genai.Client()
peyeeye = Peyeeye()

def ask(prompt: str) -> str:
    shield = peyeeye.redact(text=prompt)
    resp = gemini.models.generate_content(
        model="gemini-2.5-flash",
        contents=shield.redacted,
    )
    return peyeeye.rehydrate(text=resp.text, session=shield.session).text
```
LangChain #
Wrap any Runnable with a redact-then-rehydrate RunnableLambda. The chain itself never sees raw PII.
```python
from langchain_core.runnables import RunnableLambda
from peyeeye import Client as Peyeeye

peyeeye = Peyeeye()

def wrap(chain):
    # Redact input, run chain on tokenized text, rehydrate output.
    def _run(user_text):
        s = peyeeye.redact(text=user_text)
        out = chain.invoke(s.redacted)
        return peyeeye.rehydrate(text=out, session=s.session).text
    return RunnableLambda(_run)

safe_chain = wrap(llm | parser)
safe_chain.invoke("email ada@a-e.com the plan")
```
LlamaIndex #
Subclass LLM and intercept complete / chat. Every agent, query engine, and workflow that picks the shielded LLM inherits the guarantee.
```python
from llama_index.core.llms import LLM
from peyeeye import Client as Peyeeye

peyeeye = Peyeeye()

class ShieldedLLM(LLM):
    def __init__(self, inner):
        self.inner = inner

    def complete(self, prompt, **kw):
        s = peyeeye.redact(text=prompt)
        res = self.inner.complete(s.redacted, **kw)
        res.text = peyeeye.rehydrate(text=res.text, session=s.session).text
        return res

    # chat / stream variants are intercepted the same way.
```
CrewAI #
Register peyeeye as two tools. Assign redact_text to any agent that ingests user input; assign rehydrate_text to the final-answer agent.
```python
from crewai.tools import tool
from peyeeye import Client as Peyeeye

peyeeye = Peyeeye()

@tool("redact_text")
def redact_text(text: str) -> dict:
    """Strip PII before sending text to any other agent or tool."""
    s = peyeeye.redact(text=text)
    return {"redacted": s.redacted, "session": s.session}

@tool("rehydrate_text")
def rehydrate_text(text: str, session: str) -> str:
    """Swap tokens back to real values at the end of the pipeline."""
    return peyeeye.rehydrate(text=text, session=session).text
```
Streaming pipelines #
When the agent streams a final answer to the user, naive rehydration breaks on mid-token boundaries ([PERSO … N_1]). Use a buffered rehydrate stream: it accumulates partial tokens and only emits once a full token is resolved.
```python
# Agent streams LLM output; rehydrate chunk-by-chunk with token buffering.
shield = peyeeye.redact(text=user_prompt)
with peyeeye.rehydrate_stream(session=shield.session) as buf:
    for delta in llm.stream(shield.redacted):
        for piece in buf.feed(delta):  # partial-token safe
            yield piece
    for piece in buf.flush():          # only after upstream closes
        yield piece
```
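To make the failure mode concrete, the buffering reduces to one rule: hold back any trailing text that could still become a token, substitute everything else. A toy version (illustrative only, not the SDK internals; the [TYPE_N] token shape and a plain local dict mapping are assumptions):

```python
import re

TOKEN = re.compile(r"\[[A-Z]+_\d+\]")

class RehydrateBuffer:
    """Hold back partial tokens so replacement never splits mid-token."""

    def __init__(self, mapping: dict):
        self.mapping = mapping
        self.pending = ""

    def _sub(self, text: str) -> str:
        return TOKEN.sub(lambda m: self.mapping.get(m.group(), m.group()), text)

    def feed(self, chunk: str) -> str:
        self.pending += chunk
        # If the last '[' has no matching ']' yet, keep that tail buffered.
        cut = len(self.pending)
        open_br = self.pending.rfind("[")
        if open_br != -1 and "]" not in self.pending[open_br:]:
            cut = open_br
        out, self.pending = self.pending[:cut], self.pending[cut:]
        return self._sub(out)

    def flush(self) -> str:
        out, self.pending = self.pending, ""
        return self._sub(out)
```

Feeding "email [PERSO" then "N_1] today" emits the name exactly once, fully assembled, instead of leaking a mangled half-token to the user.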
Stateless sessions for multi-turn agents #
Pass session: "stateless" and peyeeye returns a rehydration_key — an AES-256-GCM-sealed blob you store next to the conversation. Nothing lives on our servers. Pass the same skey_… into the next /redact and token assignment stays stable across turns (Ada Lovelace is still [PERSON_1] in turn 17).
```python
# Turn 1 — agent redacts, gets a sealed blob back.
s1 = peyeeye.redact(text=turn_1_user, session="stateless")
# s1.rehydration_key → "skey_AES256GCM…" — your service persists this.
reply_1 = llm(s1.redacted)

# Turn 2 — reuse the sealed blob to keep [PERSON_1] stable across turns.
s2 = peyeeye.redact(text=turn_2_user, session=s1.rehydration_key)
reply_2 = llm(s2.redacted)

# Rehydrate anywhere, using whichever sealed key covers the tokens.
final = peyeeye.rehydrate(text=reply_2, session=s2.rehydration_key).text
```
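The stability guarantee is easy to model locally: as long as the previous value-to-token mapping is carried forward, known values keep their ordinal and only new values consume fresh ones. A toy sketch (single entity type; peyeeye's actual detector and sealed-blob format are not shown here):

```python
def assign_tokens(values, mapping=None):
    """Extend a value -> token mapping; existing ordinals never change."""
    mapping = dict(mapping or {})
    for value in values:
        if value not in mapping:
            mapping[value] = f"[PERSON_{len(mapping) + 1}]"
    return mapping
```

Carrying turn 1's mapping into turn 2 is exactly what passing the sealed skey_… back into /redact does for you, minus the crypto.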
Rehydrate at the pipeline edge #
Rehydrate exactly once, at the boundary where text leaves your trust zone and reaches the user. Do not rehydrate between intermediate agents, tools, or vector-store writes — that re-introduces raw PII into the middle of the pipeline.
- Agent → user response: rehydrate.
- Agent → RAG index / embeddings: do not rehydrate. Store tokenized text.
- Agent → another agent / tool: do not rehydrate. Pass the token stream and (optionally) the session handle.
- Agent → log / trace store: do not rehydrate. Ever.
Pass strict: true on /rehydrate if you want to catch model hallucinations — unknown tokens raise unknown_token instead of passing through.
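The check strict mode performs can be replicated for local testing: scan output for token-shaped strings the session never issued. A sketch (the token pattern here is an assumption; the service's own validation is authoritative):

```python
import re

def unknown_tokens(text: str, issued: set) -> list:
    """Return token-shaped strings in `text` the session never issued."""
    return [t for t in re.findall(r"\[[A-Z]+_\d+\]", text) if t not in issued]
```

Anything this returns on a model response is a hallucinated token — exactly what strict: true turns into an unknown_token error instead of silently passing through.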
When not to use it #
- When the agent needs to reason over the real value. “Is this credit card a Visa?” can't be answered against [CARD_1]. Either don't redact CARD, or do the reasoning deterministically outside the LLM.
- When deterministic tokens leak structure you care about. Because Ada Lovelace is consistently [PERSON_1], an attacker with access to the redacted corpus can count distinct persons, observe co-occurrence, and sometimes re-identify from context. Defense: rotate sessions aggressively, or use a placeholder template with {HASH} so tokens don't carry an ordinal.
- Tiny, latency-critical hops. Redact adds ~30ms. Fine for one round-trip, annoying for a tight per-token loop. Redact once at the pipeline entrance, not per step.
- When PII is the content and not incidental. An identity-resolution agent can't work against tokens. Redaction is for pipelines where the PII is a carrier of signal, not the signal itself.
- Custom entity types you haven't defined. If your sensitive value isn't one of the 62 built-in detectors and you haven't created a custom one, peyeeye will pass it through. Treat the catalog as the contract.
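To make the {HASH} defense concrete: a digest-based suffix is still deterministic per value, but it no longer encodes the order in which entities first appeared. A sketch (illustrative; peyeeye's placeholder templating is configured on the service side, and this toy uses a salted SHA-256 where a per-tenant keyed HMAC would be the stronger choice):

```python
import hashlib

def hash_token(entity_type: str, value: str, salt: bytes = b"per-tenant-salt") -> str:
    """Build a token whose suffix is a salted digest instead of an ordinal."""
    digest = hashlib.sha256(salt + value.encode()).hexdigest()[:8]
    return f"[{entity_type}_{digest}]"
```

The same value always maps to the same token within a salt, so cross-turn consistency survives; what disappears is the [PERSON_17]-style counter that tells an attacker "this corpus mentions at least 17 people, in this order."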