The peyeeye.ai API
Redact PII on the way into your LLM prompts and rehydrate it on the way out. One round-trip, deterministic tokens, zero data retention by default.
How it fits into your stack #
peyeeye.ai is not an LLM provider. It's a thin, stateless shield that sits between your application and whatever model you're using — Claude, GPT, Gemini, a fine-tune, your own checkpoint. Two HTTP endpoints do the whole dance:
1. POST /v1/redact — your raw prompt in, tokenized text out.
2. You prompt the LLM with the tokenized text.
3. POST /v1/rehydrate — the model's reply in, original values back in.
Everything peyeeye.ai does is synchronous, idempotent, and observable. There is no queue, no background worker, no magic. If your LLM call times out, ours already finished.
Guarantees #
- Zero retention by default. Redacted text and source values are held in memory for the session's TTL (default 15m) and then discarded. Set session: "stateless" to skip server-side storage entirely.
- Deterministic tokens within a session. Ada Lovelace is always [PERSON_1] inside one session, and never leaks across sessions.
- At-rest encryption (AES-256-GCM) and TLS 1.3 in transit.
- Per-org isolation. Custom detectors, policies, and API keys are scoped to your organization — cross-tenant leakage is impossible by construction.
Quickstart #
One redact + one rehydrate, from zero to working code. Grab your API key from the dashboard first.
1 · Install
```
# Node / TypeScript
npm install peyeeye
```

2 · Set your key

```
export PEYEEYE_KEY="pk_live_51H…"
```
3 · Round-trip a prompt
The SDK wraps redact+rehydrate into a single shield() helper. This is the recommended pattern — you can still call the raw endpoints if you need to.
```ts
import { Peyeeye } from "peyeeye";
import Anthropic from "@anthropic-ai/sdk";

const peyeeye = new Peyeeye({ apiKey: process.env.PEYEEYE_KEY });
const claude = new Anthropic();

const shield = await peyeeye.shield();
const safe = await shield.redact("Hi, I'm Ada, ada@a-e.com");

const reply = await claude.messages.create({
  model: "claude-sonnet-*",
  max_tokens: 256,
  messages: [{ role: "user", content: safe }],
});

console.log(await shield.rehydrate(reply.content[0].text));
// "Hi Ada, thanks — we've emailed ada@a-e.com."
```

The session handle is opaque — it's how rehydrate matches tokens back to real values. Pass it verbatim. Don't persist it longer than the redacted text lives.

Authentication #
All requests use bearer-token auth. Keys are prefixed pk_live_, scoped to one organization, and don't expire — rotate them yourself in the dashboard.
```
Authorization: Bearer pk_live_51H…
Content-Type: application/json
Idempotency-Key: req_a1b2c3d4   # optional, recommended
```
Idempotency #
Pass an Idempotency-Key header to safely retry. We cache the full response keyed on the tuple (api_key, idempotency_key). Mismatched bodies raise idempotency_conflict.
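Sketched in Python (send stands in for whatever HTTP call you make; the one thing that matters is that every retry reuses the same key, so the server replays the cached response instead of redacting twice):

```python
import uuid

def post_with_retry(send, body, max_attempts=3):
    # Generate the key once, outside the loop: every attempt presents the
    # same Idempotency-Key, so a retry after a dropped connection is safe.
    headers = {"Idempotency-Key": f"req_{uuid.uuid4().hex[:8]}"}
    last_exc = None
    for _ in range(max_attempts):
        try:
            return send(body, headers)
        except ConnectionError as exc:  # transient failure: retry, same key
            last_exc = exc
    raise last_exc
```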
POST /v1/redact #
Detect PII in a block of text, replace each span with a deterministic token, and return a session handle you can later rehydrate.
Body parameters
- text — up to 128K characters. Arrays accepted — each element is redacted in the same session.
- locale — enables locale-specific detectors (fr-FR SIRET, en-GB NHS number). default: "auto"
- session — pass "stateless" to skip server-side storage — the response will include a rehydration_key blob you must present to /rehydrate.
- token format — "[{TYPE}_{N}]" (default), "<{TYPE}>", or a custom format with {TYPE}, {N}, {HASH} variables.

Example
```
POST /v1/redact
Authorization: Bearer pk_live_…
Content-Type: application/json

{
  "text": "Hi, I'm Ada Lovelace.\nEmail: ada@analytic-engines.com\nCard: 4242 4242 4242 4242",
  "locale": "en-US",
  "policy": "default"
}
```
```
{
  "redacted": "Hi, I'm [PERSON_1].\nEmail: [EMAIL_1]\nCard: [CARD_1]",
  "session": "ses_7fA2kLw9MxPq",
  "entities": [
    { "token": "[PERSON_1]", "type": "PERSON", "span": [8, 20],  "confidence": 0.98 },
    { "token": "[EMAIL_1]",  "type": "EMAIL",  "span": [29, 53], "confidence": 1.00 },
    { "token": "[CARD_1]",   "type": "CARD",   "span": [60, 79], "confidence": 0.99 }
  ],
  "latency_ms": 38,
  "expires_at": "2026-05-01T14:27:03Z"
}
```

POST /v1/rehydrate #
Substitute tokens in a string with the original values held in a session. Unknown tokens pass through verbatim — we don't fail the call if the LLM made one up.
Body parameters
- session — the session handle returned by /redact, or the rehydration_key blob if you used stateless mode.
- strict — when true, any unknown tokens raise unknown_token instead of passing through. Useful for catching model hallucinations. default: false

Response
```
{
  "text": "Hi Ada, thanks — we've emailed ada@analytic-engines.com.",
  "replaced": 2,
  "unknown": [],
  "latency_ms": 11
}
```

More endpoints #
Everything else the dashboard uses is available over the same bearer-token API.
- Session inspection — locale, policy, chars processed, entity count, expires_at, and whether it's already expired.
- Entity listing — built-ins carry id, category, sample, locales; customs add kind, pattern, enabled.
- Custom detector creation — id, kind: "regex" | "fewshot", pattern, examples, confidence_floor. Plan-gated: Free allows 1, Build 3, Pro 10, Scale unlimited. Over-cap returns 403 forbidden.
- Detector updates — change pattern, toggle enabled, or tune confidence_floor without a full replace.
- Pattern suggestions — feed a suggested pattern into a POST /v1/entities call to adopt one.

Errors & retries #
All errors return a JSON body with code, message, and request_id. Transient errors (429, 5xx) are safe to retry with exponential backoff — the SDKs do this for you.
- unknown_token — strict: true mode hit a token that wasn't in the session. Often means the LLM hallucinated a placeholder.
- Expired or unknown session — re-run /redact.
- Text over 128K characters — split the text and redact each chunk into the same session.
- rate_limited — honor the Retry-After header. SDKs back off automatically.

Rate limits #
Per-key limits, measured as requests-per-second with a burst bucket of 2× sustained RPS. Response headers report your remaining budget:
```
X-RateLimit-Limit: 500
X-RateLimit-Remaining: 487
Retry-After: 0.42   # seconds, only on 429
```

- Free — 2 rps sustained, 5 rps burst
- Build — 200 rps sustained, 400 rps burst
- Pro — 1000 rps sustained, 2000 rps burst
- Scale — 3000 rps sustained, 6000 rps burst
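The SDKs handle this for you, but if you're calling the API raw, a minimal backoff helper might look like this (a sketch; backoff_delay is not part of any SDK):

```python
def backoff_delay(headers, attempt, base=0.5, cap=30.0):
    # Prefer the server's Retry-After (seconds, sent on 429) when present;
    # otherwise fall back to capped exponential backoff.
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        return float(retry_after)
    return min(cap, base * (2 ** attempt))
```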
Sessions & tokens #
A session is the bridge that lets peyeeye.ai swap tokens back to real values later. Two modes:
Stateful (default)
We hold the mapping for 15m after the last touch, then discard it. Simple, low-latency, but requires server-side storage on our end — if that's a non-starter for you, use stateless mode instead. DELETE /v1/sessions/:id to drop the mapping early.
Stateless
Pass session: "stateless". The response includes an opaque rehydration_key (prefixed skey_) — an AES-256-GCM-sealed blob of the token→value mapping. Store it yourself. Send it back to /rehydrate as the session value. We never persist anything.
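Wire-level, a stateless round trip looks roughly like this (a sketch with abbreviated values and elided response fields; the load-bearing parts are session: "stateless" on the way in and the skey_ blob reused as the session value on the way out):

```
POST /v1/redact
{ "text": "Hi, I'm Ada", "session": "stateless" }

→ { "redacted": "Hi, I'm [PERSON_1]", "rehydration_key": "skey_…", ... }

POST /v1/rehydrate
{ "text": "Welcome back, [PERSON_1]!", "session": "skey_…" }

→ { "text": "Welcome back, Ada!", ... }
```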
Entity catalog #
62 built-in entity types (regex + checksum validated, supplemented by ML NER), grouped below. Every ID is usable in entities: [...] or as a policy rule.
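As a sketch, restricting a redact call to only the card and email detectors (IDs as used in the response examples above) looks like:

```
POST /v1/redact
{
  "text": "Card: 4242 4242 4242 4242",
  "entities": ["CARD", "EMAIL"]
}
```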
Custom detectors #
Define your own detector with a regex, or drop in a handful of example strings and let peyeeye induce the pattern (LLM-backed when enabled, heuristic fallback otherwise):
```
{
  "id": "ORDER_ID",
  "kind": "regex",
  "pattern": "#A-\\d{6,}",
  "examples": ["#A-884217", "#A-007431"],
  "confidence_floor": 0.9
}
```

If pattern is omitted, peyeeye induces one from examples at create time. Test-drive patterns against sample text before you save them with POST /v1/entities/test.
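You can also sanity-check a regex locally before saving it, here against the two example strings from the payload above (note the JSON "#A-\\d{6,}" is the regex #A-\d{6,} once unescaped):

```python
import re

# The ORDER_ID pattern from the detector payload.
pattern = re.compile(r"#A-\d{6,}")

for sample in ["#A-884217", "#A-007431"]:
    assert pattern.fullmatch(sample), sample
```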
Streaming #
When you're piping an LLM's token stream back to a user, naive rehydration breaks on mid-token boundaries. The streaming API buffers partial tokens until they complete, then emits cleanly. Build plan and higher.
Post a list of chunks; get back Server-Sent Events in three flavours — session fires once with the new session id, redacted fires per chunk, done closes the stream.
```
# POST /v1/redact/stream
{ "chunks": ["Hi, I'm Ada", " — card 4242 4242 4242 4242"] }

event: session
data: {"session":"ses_7fA2kLw9MxPq"}

event: redacted
data: {"text":"Hi, I'm [PERSON_1]","entities":1}

event: redacted
data: {"text":" — card [CARD_1]","entities":1}

event: done
data: {"chars":37}
```

Both SDKs wrap this with partial-token buffering so you can interleave upstream LLM chunks with rehydration safely. Open a shield once, redact the user prompt, then pipe each streamed LLM chunk through rehydrateChunk:
```ts
import { Peyeeye } from "peyeeye";
import Anthropic from "@anthropic-ai/sdk";

const peyeeye = new Peyeeye({ apiKey: process.env.PEYEEYE_KEY! });
const claude = new Anthropic();

const shield = await peyeeye.shield();
const safe = await shield.redact(userInput);

const upstream = await claude.messages.stream({
  model: "claude-sonnet-*",
  messages: [{ role: "user", content: safe }],
});

for await (const chunk of upstream) {
  if (chunk.type !== "content_block_delta") continue;
  const out = await shield.rehydrateChunk(chunk.delta.text); // partial-token safe
  process.stdout.write(out);
}
process.stdout.write(await shield.flush()); // emit any buffered remainder
```
```python
import os
import sys

from anthropic import Anthropic
from peyeeye import Peyeeye

peyeeye = Peyeeye(api_key=os.environ["PEYEEYE_KEY"])
claude = Anthropic()

with peyeeye.shield() as shield:
    safe = shield.redact(user_input)
    with claude.messages.stream(
        model="claude-sonnet-*",
        max_tokens=512,
        messages=[{"role": "user", "content": safe}],
    ) as upstream:
        for text in upstream.text_stream:
            sys.stdout.write(shield.rehydrate_chunk(text))  # partial-token safe
            sys.stdout.flush()
    sys.stdout.write(shield.flush())  # emit any buffered remainder
```
If you want the raw SSE — for example from a runtime without the SDK on it — post directly to /v1/redact/stream and consume the stream of session / redacted / done events:
```ts
import { Peyeeye } from "peyeeye";

const peyeeye = new Peyeeye({ apiKey: process.env.PEYEEYE_KEY! });

let sessionId: string | undefined;
for await (const ev of peyeeye.redactStream({
  chunks: ["Hi, I'm Ada", " — card 4242 4242 4242 4242"],
})) {
  if (ev.event === "session") sessionId = ev.data.session;
  if (ev.event === "redacted") process.stdout.write(ev.data.text);
}
```
```python
from peyeeye import Peyeeye

peyeeye = Peyeeye(api_key="pk_live_...")

for ev in peyeeye.redact_stream([
    "Hi, I'm Ada",
    " — card 4242 4242 4242 4242",
]):
    if ev.event == "session":
        session_id = ev.data["session"]
    elif ev.event == "redacted":
        print(ev.data["text"])
```
SDKs #
First-party libraries, open-source under MIT. Full parity with the HTTP API — redact, rehydrate, streaming with partial-token buffering, stateless sealed sessions, custom detectors, session management. Current stable release: v1.0.0.
TypeScript / Node
Node 18+, Bun, Deno, Cloudflare Workers, Vercel Edge. Zero runtime dependencies — uses the platform fetch. Dual ESM + CJS build with typed .d.ts / .d.cts.
Python
Python 3.9+. Single runtime dependency (httpx). Fully type-hinted with py.typed. Shield context manager handles session lifecycle automatically.
TypeScript / Node
Install:
```
# npm, pnpm, yarn, or bun — pick your poison
npm install peyeeye
pnpm add peyeeye
bun add peyeeye
```

Quickstart — end-to-end redact → LLM → rehydrate:
```ts
import { Peyeeye } from "peyeeye";
import Anthropic from "@anthropic-ai/sdk";

const peyeeye = new Peyeeye({ apiKey: process.env.PEYEEYE_KEY! });
const claude = new Anthropic();

const shield = await peyeeye.shield();
const safe = await shield.redact("Hi, I'm Ada, ada@a-e.com");

const reply = await claude.messages.create({
  model: "claude-sonnet-*",
  max_tokens: 256,
  messages: [{ role: "user", content: safe }],
});

console.log(await shield.rehydrate(reply.content[0].text));
// "Hi Ada, thanks — we've emailed ada@a-e.com."
```
shield() opens a session on the first redact() call, keeps reusing it across subsequent calls, and swaps tokens back on rehydrate(). The same real value always yields the same token within a shield; tokens never leak across shields.
Client configuration:
```ts
new Peyeeye({
  apiKey: "pk_live_…",
  baseUrl: "https://api.peyeeye.ai",      // optional
  maxRetries: 3,                          // 429 + 5xx back off exponentially
  timeoutMs: 30_000,                      // per-request timeout
  defaultHeaders: { "X-App": "my-app" },
  fetch: globalThis.fetch,                // override on Cloudflare Workers
});
```
Low-level calls (when you don't want the shield helper):
```ts
const r = await peyeeye.redact("Card: 4242 4242 4242 4242");
// r.redacted → "Card: [CARD_1]"
// r.session  → "ses_…"
// r.entities → [{ token: "[CARD_1]", type: "CARD", span: [6, 25], confidence: 0.99 }]

const back = await peyeeye.rehydrate("Confirmation for [CARD_1].", r.session);
// back.text → "Confirmation for 4242 4242 4242 4242."
```
Full surface: README — shield, stateless sealed mode, SSE streaming, custom detectors, session management, retry / rate-limit headers, typed errors.
Python
Install:
```
# pip, poetry, pdm, uv — works with any installer
pip install peyeeye
poetry add peyeeye
uv pip install peyeeye
```

Quickstart — end-to-end redact → LLM → rehydrate:
```python
import os

from anthropic import Anthropic
from peyeeye import Peyeeye

peyeeye = Peyeeye(api_key=os.environ["PEYEEYE_KEY"])
claude = Anthropic()

with peyeeye.shield() as shield:
    safe = shield.redact("Hi, I'm Ada, ada@a-e.com")
    reply = claude.messages.create(
        model="claude-sonnet-*",
        max_tokens=256,
        messages=[{"role": "user", "content": safe}],
    )
    print(shield.rehydrate(reply.content[0].text))
```
Inside the with block the shield pins a single session: the same real value always maps to the same token, and the session is cleaned up on exit (stateful mode).
Client configuration:
```python
from peyeeye import Peyeeye

peyeeye = Peyeeye(
    api_key="pk_live_...",
    base_url="https://api.peyeeye.ai",     # optional
    timeout=30.0,                          # per-request timeout (seconds)
    max_retries=3,                         # 429 + 5xx back off exponentially
    default_headers={"X-App": "my-app"},
)
```
Low-level calls (skip the shield helper):
```python
r = peyeeye.redact("Card: 4242 4242 4242 4242")
# r.redacted → "Card: [CARD_1]"
# r.session  → "ses_…"
# r.entities → [DetectedEntity(token="[CARD_1]", type="CARD", span=(6, 25), confidence=0.99)]

back = peyeeye.rehydrate("Confirmation for [CARD_1].", session=r.session)
# back.text → "Confirmation for 4242 4242 4242 4242."
```
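Span offsets are zero-based and end-exclusive indices into the original text, so you can slice a detected entity back out yourself; for the card example above:

```python
text = "Card: 4242 4242 4242 4242"
span = (6, 25)  # as reported for [CARD_1] above

# text[start:end] recovers exactly the detected value.
assert text[span[0]:span[1]] == "4242 4242 4242 4242"
```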
Stateless sealed mode — the server never persists the mapping; the sealed skey_… blob carries everything the rehydrate step needs:
```python
with peyeeye.shield(stateless=True) as shield:
    safe = shield.redact("Ada, 4242 4242 4242 4242")
    # shield.rehydration_key → "skey_AES-GCM-sealed..."
    # Shipped to a client, used later, no server-side state.
    print(shield.rehydrate("Hi [PERSON_1], your [CARD_1] is active."))
```
Typed errors from the API:
```python
from peyeeye import PeyeeyeError

try:
    peyeeye.redact(text)
except PeyeeyeError as e:
    # e.status, e.code, e.message, e.request_id
    if e.code == "rate_limited":
        retry(e.retry_after)
    elif e.code == "forbidden":
        upgrade_plan()
    else:
        raise
```
Full surface: README — shield, stateless sealed mode, SSE streaming via redact_stream(), custom detectors, session management, retry / rate-limit headers, typed errors.