Yavio

Security

PII stripping, API key scoping, widget JWT auth, and rate limiting

This page covers how Yavio protects sensitive data and prevents abuse: PII stripping, API key scoping, widget JWT authentication, rate limiting, and event size limits.

PII Stripping

The PII stripping engine runs on every event before it reaches ClickHouse. It is ON by default and cannot be disabled in Community or Cloud Free tiers.

Defense in Depth

PII stripping uses a two-layer strategy:

  1. SDK layer — a lightweight, non-configurable best-effort pass strips common PII patterns before events leave the developer's process
  2. Ingestion API layer — a full configurable scrub runs as the authoritative safety net before writing to ClickHouse

This ensures PII is caught regardless of SDK version and provides redundancy if either layer has a bug.

What Gets Redacted

| Pattern | Detection Method | Replacement | Layer |
| --- | --- | --- | --- |
| Email addresses | RFC 5322 regex | [EMAIL_REDACTED] | SDK + Ingest |
| Credit card numbers | Luhn algorithm + format check | [CC_REDACTED] | SDK + Ingest |
| SSN / Tax IDs | NNN-NN-NNNN patterns | [SSN_REDACTED] | SDK + Ingest |
| Phone numbers | International format detection (E.164 + common formats) | [PHONE_REDACTED] | SDK + Ingest |
| Physical addresses | Heuristic: number + street name patterns | [ADDRESS_REDACTED] | Ingest only |

Both layers scan all string fields in the event, including nested JSON in metadata. They target identity data — business values like prices, cities, dates, and quantities are preserved.
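The recursive scan described above can be sketched as follows. This is a minimal illustration, not Yavio's actual engine: the regular expressions are simplified stand-ins (the email pattern is far looser than RFC 5322), phone and address detection are omitted, and the names `scrub`, `scrub_string`, and `luhn_ok` are hypothetical.

```python
import re

# Simplified illustrative patterns; assumptions, not the platform's real rules.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_ok(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def _redact_card(match):
    # Format check matched; only redact if the checksum also passes.
    digits = re.sub(r"\D", "", match.group())
    return "[CC_REDACTED]" if luhn_ok(digits) else match.group()


def scrub_string(text: str) -> str:
    text = EMAIL_RE.sub("[EMAIL_REDACTED]", text)
    text = SSN_RE.sub("[SSN_REDACTED]", text)
    return CARD_RE.sub(_redact_card, text)


def scrub(value):
    """Walk nested dicts and lists, scrubbing every string value."""
    if isinstance(value, str):
        return scrub_string(value)
    if isinstance(value, dict):
        return {key: scrub(item) for key, item in value.items()}
    if isinstance(value, list):
        return [scrub(item) for item in value]
    return value  # numbers, booleans, None pass through unchanged
```

Because only string values are rewritten and the card pattern is gated by the Luhn check, numeric business values such as prices and quantities pass through untouched in this sketch.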

API Key Authentication

Server-side SDKs authenticate with a project API key (yav_... prefix). Each project gets a unique key, managed from the dashboard.

  • API keys are sent as Authorization: Bearer <key> headers
  • The ingestion API validates keys against PostgreSQL with an in-memory LRU cache (60s TTL, 10,000 entries)
  • Revoked keys are rejected within 60 seconds of revocation, once any cached entry for the key expires
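To illustrate why revocation takes effect within the cache TTL, here is a minimal sketch of a TTL-bounded LRU cache in front of a key lookup. The class `TTLLRUCache`, the function `validate_key`, and the `lookup_in_db` callback are hypothetical names; the real ingestion API's cache is not shown in this page.

```python
import time
from collections import OrderedDict


class TTLLRUCache:
    """LRU cache whose entries also expire after a fixed TTL."""

    def __init__(self, max_entries=10_000, ttl_seconds=60.0, clock=time.monotonic):
        self._max = max_entries
        self._ttl = ttl_seconds
        self._clock = clock
        self._items = OrderedDict()  # key -> (expires_at, value)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]  # expired: force a fresh lookup
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key, value):
        self._items[key] = (self._clock() + self._ttl, value)
        self._items.move_to_end(key)
        while len(self._items) > self._max:
            self._items.popitem(last=False)  # evict least recently used


def validate_key(api_key, cache, lookup_in_db):
    """Return the project ID for a key, or None if unknown or revoked."""
    cached = cache.get(api_key)
    if cached is not None:
        return cached
    project_id = lookup_in_db(api_key)  # hypothetical database lookup
    if project_id is not None:
        cache.put(api_key, project_id)
    return project_id
```

In this sketch a revoked key keeps validating only until its cached entry expires, which bounds the revocation delay by the TTL.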

Key Scoping

Each API key is scoped to a single project within a workspace. A key can only write events to its own project — it cannot read data, access other projects, or perform administrative actions.

Widget JWT Authentication

Widgets never receive the project API key. Instead, the server-side proxy mints a short-lived JWT for each widget interaction:

| Property | Value |
| --- | --- |
| Algorithm | HMAC-SHA256 |
| Expiry | 15 minutes |
| Scope | Single trace ID |
| Permissions | Write-only event ingestion |

JWT Claims

| Claim | Description |
| --- | --- |
| pid | Project ID |
| wid | Workspace ID |
| tid | Trace ID (events must match this trace) |
| sid | Session ID (shared with the server) |

Even if a widget JWT is extracted from the browser iframe, it is useless after 15 minutes and cannot be used to send events for other traces.
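As a rough sketch of how such a token could be minted and checked, the following uses only the Python standard library to build an HS256 JWT carrying the four claims above. Encoding the 15-minute expiry as the standard `exp` claim is an assumption, as are the function names `mint_widget_jwt` and `verify_widget_jwt`; the server-side proxy's real implementation may differ.

```python
import base64
import hashlib
import hmac
import json
import time

TTL_SECONDS = 15 * 60  # 15-minute expiry from the table above


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(secret: bytes, signing_input: str) -> bytes:
    return hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()


def mint_widget_jwt(secret, pid, wid, tid, sid, now=None):
    """Build a signed token scoped to one trace. `exp` is an assumed claim."""
    issued = int(time.time()) if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {"pid": pid, "wid": wid, "tid": tid, "sid": sid,
              "exp": issued + TTL_SECONDS}
    parts = [_b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
             for part in (header, claims)]
    signing_input = ".".join(parts)
    return signing_input + "." + _b64url(_sign(secret, signing_input))


def verify_widget_jwt(secret, token, now=None):
    """Return the claims if the signature is valid and unexpired, else None."""
    current = int(time.time()) if now is None else now
    try:
        header_b64, claims_b64, sig_b64 = token.split(".")
        expected = _sign(secret, header_b64 + "." + claims_b64)
        # Constant-time comparison; the algorithm is fixed to HMAC-SHA256
        # rather than read from the token header.
        if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
            return None
        claims = json.loads(_b64url_decode(claims_b64))
    except ValueError:
        return None  # malformed token
    if current >= claims.get("exp", 0):
        return None  # expired
    return claims
```

The signing secret stays on the server, so a widget holding the token can present it but cannot mint a new one for a different trace.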

Trace Validation

Every event in a widget JWT batch must have a traceId matching the JWT's tid claim. The ingestion API rejects the entire batch if any event has a mismatched trace ID.
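The all-or-nothing check can be sketched in a few lines. The function name `validate_widget_batch` and the representation of events as dictionaries are assumptions made for illustration.

```python
def validate_widget_batch(claims, events):
    """Accept the batch only if every event's traceId equals the JWT's tid.

    A single mismatched (or missing) traceId rejects the whole batch.
    """
    tid = claims["tid"]
    return all(event.get("traceId") == tid for event in events)
```

Rejecting the whole batch, rather than filtering out the offending events, means a token holder cannot probe which foreign trace IDs would be accepted.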

Rate Limiting

| Scope | Limit | Purpose |
| --- | --- | --- |
| Per API key | 1,000 events/second | Prevents abuse from leaked keys |
| Per API key burst | 5,000 events | Allows short spikes (e.g., batch imports) |
| Per IP (unauthenticated) | 10 requests/second | Protects against brute-force key scanning |

Rate limit state is stored in memory. Exceeded limits return 429 Too Many Requests with a Retry-After header.
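A sustained rate combined with a burst allowance is commonly implemented as a token bucket. The page does not say which algorithm Yavio uses, so the sketch below is an assumption: a hypothetical `TokenBucket` that refills at 1,000 tokens per second up to a capacity of 5,000 and reports how long a rejected caller should wait, which could feed a Retry-After header.

```python
import time


class TokenBucket:
    """In-memory token bucket: `rate` tokens/second, up to `burst` tokens."""

    def __init__(self, rate=1000.0, burst=5000.0, clock=time.monotonic):
        self.rate = rate
        self.capacity = burst
        self.tokens = burst  # start full so an initial spike is allowed
        self.clock = clock
        self.updated = clock()

    def try_consume(self, n=1):
        """Return 0.0 if n events are allowed, else seconds to wait."""
        now = self.clock()
        # Refill for the elapsed time, never exceeding the burst capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if n <= self.tokens:
            self.tokens -= n
            return 0.0
        return (n - self.tokens) / self.rate
```

A caller would keep one bucket per API key (and a separate, smaller one per IP for unauthenticated requests) and respond with 429 whenever `try_consume` returns a non-zero wait.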

Reverse Proxy Requirement

IP-based rate limiting relies on the X-Forwarded-For header to identify clients, since the application sits behind a reverse proxy and cannot see the original socket address directly.

Your reverse proxy must overwrite (not append to) the X-Forwarded-For header with the client's real IP. If the proxy merely appends, a client can send a forged header value to get a fresh rate-limit bucket on every request.

Example Nginx configuration:

proxy_set_header X-Forwarded-For $remote_addr;

Example Caddy configuration (inside the reverse_proxy block):

header_up X-Forwarded-For {remote_host}

Without this, IP-based rate limits can be bypassed trivially.
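The difference between appending and overwriting can be illustrated with a small simulation. It assumes the application keys its rate-limit buckets on the first address in X-Forwarded-For; that detail is not stated on this page, and both function names are hypothetical.

```python
def forwarded_header(incoming_xff, remote_addr, mode):
    """Header value a proxy would pass upstream in the given mode."""
    if mode == "append" and incoming_xff:
        # The client-supplied value survives at the front of the list.
        return f"{incoming_xff}, {remote_addr}"
    # Overwrite mode (or no incoming header): only the socket address.
    return remote_addr


def rate_limit_key(xff):
    """Assumed behavior: bucket on the first address in the header."""
    return xff.split(",")[0].strip()


# A client at 203.0.113.9 sends a forged header on each request.
forged = "198.51.100.77"
appended = forwarded_header(forged, "203.0.113.9", "append")
overwritten = forwarded_header(forged, "203.0.113.9", "overwrite")
```

With an appending proxy the bucket key is whatever the client invented; with an overwriting proxy it is always the address the proxy actually saw.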

Event Field Size Limits

Individual event fields are validated against maximum size limits to prevent storage exhaustion:

| Field | Max Size | Behavior |
| --- | --- | --- |
| metadata (JSON) | 10 KB | Truncated |
| user_traits (JSON) | 5 KB | Truncated |
| error_message | 2 KB | Truncated |
| event_name | 256 chars | Event rejected |
| user_id | 256 chars | Event rejected |
| Total event size | 50 KB | Event rejected |
| Batch size | 500 KB | Batch rejected (413) |

Truncation is preferred over rejection for non-critical fields so that an oversized field does not cause the whole event to be dropped.

Pipeline Guarantees

| Guarantee | How |
| --- | --- |
| At-least-once delivery | SDK retries + ClickHouse deduplication |
| Ordering | Millisecond timestamps + ClickHouse sort order |
| No data loss on restart | Retry with exponential backoff |
| No data loss on shutdown | Synchronous final flush from SDK |
| Backpressure | 503 response when buffer exceeds 100K events |
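The retry behavior behind these guarantees can be sketched as exponential backoff with jitter. The function `send_with_retry`, its default delays, and the choice to retry only on 429 and 5xx responses are assumptions for illustration; the SDK's actual retry policy is not documented on this page.

```python
import random
import time


def send_with_retry(send, batch, max_attempts=5, base_delay=0.5,
                    max_delay=30.0, sleep=time.sleep):
    """Call send(batch) until it succeeds or attempts run out.

    `send` is a hypothetical callable returning an HTTP status code.
    Returns True on a 2xx response, False otherwise.
    """
    for attempt in range(max_attempts):
        status = send(batch)
        if 200 <= status < 300:
            return True
        if status != 429 and status < 500:
            return False  # non-retryable client error, e.g. 413
        if attempt < max_attempts - 1:
            # Full jitter: random wait up to the doubled, capped delay.
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))
    return False
```

Because a retried batch may have been partly written before the failure was reported, delivery is at-least-once, and duplicates are left to the ClickHouse deduplication named in the table.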
