peyeeye for AI agents

Wrap any tool-using LLM — Claude, GPT, Gemini, LangChain, LlamaIndex, CrewAI — so raw PII never hits the model, external tools, or your vector store. One redact on the way in, one rehydrate on the way out.

Already read the API reference? This page is the agent-specific playbook: tool schemas, framework snippets, streaming, stateless sessions, and the rough edges to watch for.

What redaction protects you from #

  • Data exfiltration via the provider. LLM traffic routinely ends up in provider logs, evaluation pipelines, and (for some plans) training sets. Tokenized input bounds the blast radius.
  • Compliance. HIPAA, GDPR, PCI-DSS, SOC 2 — redacting at the edge keeps regulated data out of systems that were never designed to hold it.
  • Prompt injection via PII. Untrusted text pulled from email, PDFs, or web tools can carry identifiers the attacker wants replayed verbatim (“email attacker@…”). Redaction neutralizes the identifier before the agent ever sees it.
  • Cross-tool leakage. An agent chain usually fans out to search APIs, RAG stores, code interpreters, and third-party tools. A single redact anchors all of them to opaque tokens.

Tool-calling pattern #

The cleanest integration is to expose redact_text and rehydrate_text as first-class tools on your agent. The model decides when to strip PII; your runtime executes the tool calls against peyeeye; the model continues the loop with tokenized text.

Tool schemas

Portable across Anthropic, OpenAI, and Gemini function calling — the shape below is Claude's; for OpenAI and Gemini, rename input_schema to parameters.

{
  "name": "redact_text",
  "description": "Redact PII from text before passing it to any downstream model or tool. Returns a redacted string plus a session handle for later rehydration.",
  "input_schema": {
    "type": "object",
    "properties": {
      "text": {
        "type": "string",
        "description": "Raw text, up to 128K chars."
      },
      "session": {
        "type": "string",
        "description": "Existing session id to extend, or \"stateless\" for a sealed blob."
      },
      "entities": {
        "type": "array",
        "items": { "type": "string" },
        "description": "Optional whitelist of entity IDs (PERSON, EMAIL, CARD, ...)."
      }
    },
    "required": ["text"]
  }
}
{
  "name": "rehydrate_text",
  "description": "Swap tokens like [PERSON_1] back to real values using the session handle returned by redact_text. Call this on the final response before showing it to the user.",
  "input_schema": {
    "type": "object",
    "properties": {
      "text":    { "type": "string" },
      "session": { "type": "string" },
      "strict":  { "type": "boolean" }
    },
    "required": ["text", "session"]
  }
}
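In the Anthropic loop below, these schemas are just Python dicts. A literal translation (descriptions trimmed for brevity):

```python
REDACT_TOOL_SCHEMA = {
    "name": "redact_text",
    "description": "Redact PII from text before passing it downstream.",
    "input_schema": {
        "type": "object",
        "properties": {
            "text": {"type": "string"},
            "session": {"type": "string"},
            "entities": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["text"],
    },
}

REHYDRATE_TOOL_SCHEMA = {
    "name": "rehydrate_text",
    "description": "Swap tokens back to real values using the session handle.",
    "input_schema": {
        "type": "object",
        "properties": {
            "text": {"type": "string"},
            "session": {"type": "string"},
            "strict": {"type": "boolean"},
        },
        "required": ["text", "session"],
    },
}
```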

The loop

Dispatch redact_text / rehydrate_text to peyeeye, everything else to your existing handlers:

# Register redact_text + rehydrate_text as tools on the Anthropic SDK.
import json

tools = [REDACT_TOOL_SCHEMA, REHYDRATE_TOOL_SCHEMA, *your_domain_tools]

msg = claude.messages.create(model="claude-sonnet-4-5", max_tokens=1024, tools=tools, messages=history)

while msg.stop_reason == "tool_use":
    # Echo the assistant turn (including its tool_use blocks) back into history.
    history.append({"role": "assistant", "content": msg.content})

    tool_results = []
    for block in msg.content:
        if block.type != "tool_use":
            continue

        if block.name == "redact_text":
            result = peyeeye.redact(**block.input).model_dump()
        elif block.name == "rehydrate_text":
            result = peyeeye.rehydrate(**block.input).model_dump()
        else:
            result = dispatch(block.name, block.input)

        # Anthropic expects results as tool_result blocks inside a user message.
        tool_results.append({"type": "tool_result", "tool_use_id": block.id, "content": json.dumps(result)})

    history.append({"role": "user", "content": tool_results})
    msg = claude.messages.create(model="claude-sonnet-4-5", max_tokens=1024, tools=tools, messages=history)
If you don't want the model making redaction decisions, redact before the first messages.create call and never expose the tool. Simpler, cheaper, safer — at the cost of the model having no say in what gets redacted.

Claude (Anthropic) #

Redact before messages.create, rehydrate after. The system prompt hint tells Claude to leave tokens alone.

import os

from anthropic import Anthropic
from peyeeye import Client as Peyeeye

claude  = Anthropic()
peyeeye = Peyeeye(api_key=os.environ["PEYEEYE_KEY"])

def safe_turn(user_text: str) -> str:
    shield = peyeeye.redact(text=user_text)            # { redacted, session }

    reply = claude.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=1024,
        system="Treat [PERSON_N], [EMAIL_N], etc. as opaque handles. Do not guess values.",
        messages=[{"role": "user", "content": shield.redacted}],
    )
    answer = reply.content[0].text
    return peyeeye.rehydrate(text=answer, session=shield.session).text

OpenAI #

from openai import OpenAI
from peyeeye import Client as Peyeeye

oai     = OpenAI()
peyeeye = Peyeeye()

def chat(user_text: str) -> str:
    shield = peyeeye.redact(text=user_text)

    completion = oai.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Tokens like [EMAIL_1] are placeholders. Leave them as-is."},
            {"role": "user",   "content": shield.redacted},
        ],
    )
    raw = completion.choices[0].message.content
    return peyeeye.rehydrate(text=raw, session=shield.session).text

Gemini #

from google import genai
from peyeeye import Client as Peyeeye

gemini  = genai.Client()
peyeeye = Peyeeye()

def ask(prompt: str) -> str:
    shield = peyeeye.redact(text=prompt)
    resp   = gemini.models.generate_content(
        model="gemini-2.5-flash",
        contents=shield.redacted,
    )
    return peyeeye.rehydrate(text=resp.text, session=shield.session).text

LangChain #

Wrap any Runnable with a redact-then-rehydrate RunnableLambda. The chain itself never sees raw PII.

from langchain_core.runnables import RunnableLambda
from peyeeye import Client as Peyeeye

peyeeye = Peyeeye()

def wrap(chain):
    # Redact input, run chain on tokenized text, rehydrate output.
    def _run(user_text):
        s = peyeeye.redact(text=user_text)
        out = chain.invoke(s.redacted)
        return peyeeye.rehydrate(text=out, session=s.session).text
    return RunnableLambda(_run)

safe_chain = wrap(llm | parser)
safe_chain.invoke("email ada@a-e.com the plan")

LlamaIndex #

Subclass LLM and intercept complete / chat. Every agent, query engine, and workflow that uses the shielded LLM inherits the guarantee.

from llama_index.core.llms import LLM
from peyeeye import Client as Peyeeye

peyeeye = Peyeeye()

class ShieldedLLM(LLM):
    inner: LLM  # LLM is a pydantic model — declare the wrapped model as a field

    def complete(self, prompt, **kw):
        s   = peyeeye.redact(text=prompt)
        res = self.inner.complete(s.redacted, **kw)
        res.text = peyeeye.rehydrate(text=res.text, session=s.session).text
        return res

    # Forward chat / stream_complete / metadata the same way for full coverage.

CrewAI #

Register peyeeye as two tools. Assign redact_text to any agent that ingests user input; assign rehydrate_text to the final-answer agent.

from crewai.tools import tool
from peyeeye import Client as Peyeeye

peyeeye = Peyeeye()

@tool("redact_text")
def redact_text(text: str) -> dict:
    """Strip PII before sending text to any other agent or tool."""
    s = peyeeye.redact(text=text)
    return {"redacted": s.redacted, "session": s.session}

@tool("rehydrate_text")
def rehydrate_text(text: str, session: str) -> str:
    """Swap tokens back to real values at the end of the pipeline."""
    return peyeeye.rehydrate(text=text, session=session).text

Streaming pipelines #

When the agent streams a final answer to the user, naive rehydration breaks on mid-token boundaries ([PERSON_1]). Use a buffered rehydrate stream: it accumulates partial tokens and only emits once a full token is resolved.

# Agent streams LLM output; rehydrate chunk-by-chunk with token buffering.
shield = peyeeye.redact(text=user_prompt)

with peyeeye.rehydrate_stream(session=shield.session) as buf:
    for delta in llm.stream(shield.redacted):
        for piece in buf.feed(delta):          # partial-token safe
            yield piece
    for piece in buf.flush():                # only after upstream closes
        yield piece
Flush after the upstream closes, never during. A mid-stream flush can emit a half-token that looks like a real bracket expression to downstream regex.
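Under the hood, the buffer only has to do one thing: hold back any suffix that could be the start of a token. A self-contained sketch of that logic — RehydrateBuffer and its mapping dict are illustrative stand-ins, not the SDK's actual internals:

```python
import re

TOKEN   = re.compile(r"\[[A-Z]+_\d+\]")   # a complete token, e.g. [PERSON_1]
PARTIAL = re.compile(r"\[[A-Z\d_]*$")     # an unclosed token at the buffer's tail

class RehydrateBuffer:
    def __init__(self, mapping):
        self.mapping = mapping  # token -> real value (session table stand-in)
        self.buf = ""

    def feed(self, delta: str) -> str:
        self.buf += delta
        # Hold back anything that could still grow into a full token.
        m = PARTIAL.search(self.buf)
        safe, self.buf = (self.buf[: m.start()], self.buf[m.start():]) if m else (self.buf, "")
        return TOKEN.sub(lambda t: self.mapping.get(t.group(), t.group()), safe)

    def flush(self) -> str:
        # Only safe once upstream has closed: nothing can complete a token now.
        out, self.buf = self.buf, ""
        return TOKEN.sub(lambda t: self.mapping.get(t.group(), t.group()), out)
```

feed emits only prefixes that can never become part of a token, which is exactly why a mid-stream flush is dangerous: it would release a half-token like "[PER" into the output.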

Stateless sessions for multi-turn agents #

Pass session: "stateless" and peyeeye returns a rehydration_key — an AES-256-GCM-sealed blob you store next to the conversation. Nothing lives on our servers. Pass the same skey_… into the next /redact and token assignment stays stable across turns (Ada Lovelace is still [PERSON_1] in turn 17).

# Turn 1 — agent redacts, gets a sealed blob back.
s1 = peyeeye.redact(text=turn_1_user, session="stateless")
# s1.rehydration_key → "skey_AES256GCM…"  — your service persists this.

reply_1 = llm(s1.redacted)

# Turn 2 — reuse the sealed blob to keep [PERSON_1] stable across turns.
s2 = peyeeye.redact(text=turn_2_user, session=s1.rehydration_key)

reply_2 = llm(s2.redacted)

# Rehydrate anywhere, using whichever sealed key covers the tokens.
final = peyeeye.rehydrate(text=reply_2, session=s2.rehydration_key).text
The sealed blob grows with the distinct-entity count. Rotate at conversation boundaries — don't let it grow unbounded.

Rehydrate at the pipeline edge #

Rehydrate exactly once, at the boundary where text leaves your trust zone and reaches the user. Do not rehydrate between intermediate agents, tools, or vector-store writes — that re-introduces raw PII into the middle of the pipeline.

  • Agent → user response: rehydrate.
  • Agent → RAG index / embeddings: do not rehydrate. Store tokenized text.
  • Agent → another agent / tool: do not rehydrate. Pass the token stream and (optionally) the session handle.
  • Agent → log / trace store: do not rehydrate. Ever.

Pass strict: true on /rehydrate if you want to catch model hallucinations — unknown tokens raise unknown_token instead of passing through.
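For intuition, strict mode behaves like this hypothetical pure-Python equivalent of what the flag does server-side — rehydrate_strict is an illustration, not the SDK call:

```python
import re

TOKEN = re.compile(r"\[[A-Z]+_\d+\]")

def rehydrate_strict(text: str, mapping: dict) -> str:
    # Unknown tokens (model hallucinations) raise instead of passing through.
    def swap(m):
        try:
            return mapping[m.group()]
        except KeyError:
            raise ValueError(f"unknown_token: {m.group()}") from None
    return TOKEN.sub(swap, text)
```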

When not to use it #

  • When the agent needs to reason over the real value. “Is this credit card a Visa?” can't be answered against [CARD_1]. Either don't redact CARD, or do the reasoning deterministically outside the LLM.
  • When deterministic tokens leak structure you care about. Because Ada Lovelace is consistently [PERSON_1], an attacker with access to the redacted corpus can count distinct persons, observe co-occurrence, and sometimes re-identify from context. Defense: rotate sessions aggressively, or use a placeholder template with {HASH} so tokens don't carry an ordinal.
  • Tiny, latency-critical hops. Redact adds ~30ms. Fine for one round-trip, annoying for a tight per-token loop. Redact once at the pipeline entrance, not per step.
  • When PII is the content and not incidental. An identity-resolution agent can't work against tokens. Redaction is for pipelines where the PII is a carrier of signal, not the signal itself.
  • Custom entity types you haven't defined. If your sensitive value isn't one of the 62 built-in detectors and you haven't created a custom one, peyeeye will pass it through. Treat the catalog as the contract.
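For the structure-leak concern above, a {HASH} placeholder template drops the ordinal while staying stable within a session. A sketch of the idea — hash_placeholder is a hypothetical helper, not part of the peyeeye SDK:

```python
import hashlib
import hmac

def hash_placeholder(entity_type: str, value: str, session_key: bytes) -> str:
    # Keyed hash: same key + value -> same token within a session,
    # but tokens carry no first-seen ordinal like _1, _2, ...
    digest = hmac.new(session_key, value.encode(), hashlib.sha256).hexdigest()[:8]
    return f"[{entity_type}_{digest}]"
```

Distinct entities still map to distinct tokens (co-occurrence remains observable), so session rotation is still the stronger defense; the hash template only removes the ordering signal.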