Redaction

Redaction replaces secret-bearing substrings in content with [REDACTED:type] markers before the content reaches the underlying store. The original substring never lands in a stored memory or in the document body, so future recalls surface only the redacted version. Hindsight offers four layers of redaction, each catching things the others miss.

Tier summary

Basic ships Layer 1 only (the 44-pattern sensitive_data rule, action redact). On Basic, the block action is silently downgraded to redact. Cloud Enterprise adds Layers 2 to 4 (the 176-pattern provider catalog, base64 expansion, and the LLM screen) plus the security_events audit trail, which records captured-secret fingerprints and the name of the submitting Hindsight API key.

The Four Layers

Pick the layers that match your bank's threat model. The table below maps each layer to what it catches and the tier it ships in.

Layer	Catches	Tier
Sensitive data (44 patterns)	Common credential formats, PII	Basic
Provider pattern coverage (176 additional, 220 total)	SaaS provider tokens	Cloud Enterprise
Base64 expansion	Secrets hidden in Authorization headers or base64 blobs	Cloud Enterprise
LLM screen	Conversational secrets in plain prose	Cloud Enterprise

🛡️ Sensitive Data (Basic)

Basic

This is the only layer included in Basic. The other three layers (provider catalog, base64 expansion, LLM screen) require Cloud Enterprise.

What it catches. AWS access keys, GitHub personal access tokens, Stripe keys, JWTs, PEM private keys, credit card numbers, and US Social Security numbers. The 44-pattern OWASP-aligned default set covers the credential formats most commonly leaked into agent transcripts.

Attack countered. A developer pastes a token into chat while debugging. The agent stores the conversation. A later recall surfaces the token to another user, another session, or to anyone with read access on the bank. Without redaction, the bank quietly accumulates credentials, and every recall call widens the exposure.

How to enable. In the Console, navigate to Bank Settings, click the Memory Defense card, and add a rule with detector sensitive_data and action redact. Save the policy. New retain calls are screened immediately; existing memories are not rescanned.

What you see. Stored memories and document bodies contain [REDACTED:type] markers in place of the original substring. On Cloud Enterprise, one row is written to security_events per match, recording the detector name, the matched type, and a hash of the original content for forensic review. Basic does the redaction but does not record security_events rows; the audit trail is an Enterprise capability.
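As a rough illustration of the marker substitution described above, the sketch below redacts two credential shapes with regular expressions. The two patterns and the labels used inside the markers are assumptions chosen for the example; the actual regexes and type names in the sensitive_data rule are not shown in this document.

```python
import re

# Illustrative patterns only; the shipped sensitive_data rule has 44 patterns
# whose exact regexes and labels are not documented here.
PATTERNS = {
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "ssn_us": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(text: str) -> tuple[str, list[str]]:
    """Replace each match with a [REDACTED:type] marker and list the matched types."""
    hits: list[str] = []
    for label, pattern in PATTERNS.items():
        def _mark(match: re.Match, label: str = label) -> str:
            hits.append(label)
            return f"[REDACTED:{label}]"
        text = pattern.sub(_mark, text)
    return text, hits


clean, hits = redact("key AKIAIOSFODNN7EXAMPLE belongs to 078-05-1120")
print(clean)  # key [REDACTED:aws_access_key] belongs to [REDACTED:ssn_us]
print(hits)   # ['aws_access_key', 'ssn_us']
```

Only the redacted string would be handed to storage in this model; the list of hits stands in for what an audit row might record.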

The full Basic catalog (44 patterns). Everything Basic catches, by category:

AI and LLM providers (12)

Anthropic API Key
OpenAI Project Key
OpenAI Admin Key
OpenAI API Key
Google API Key
Google OAuth Token
xAI Key
Groq Key
HuggingFace Token
Replicate Token
Perplexity Key
Databricks Token

Cloud providers (3)

AWS Access Key
AWS Session Token
DigitalOcean Token

Source control and CI (9)

GitHub Fine-Grained PAT
GitHub Personal Access Token
GitHub App Token
GitHub User Token
GitHub Refresh Token
GitHub OAuth Token
GitLab PAT
npm Token
PyPI Token

Payment processors (4)

Stripe Secret Key
Stripe Restricted Key
Square Token
Braintree Token

Communications (8)

Slack Token
Slack Webhook
Twilio API Key
Twilio Account SID
SendGrid Key
Mailgun Key
Discord Bot Token
Telegram Bot Token

Commerce (1)

Shopify Token

Database connection strings (3)

PostgreSQL URL
MySQL URL
MongoDB URL

Private keys and tokens (2)

PEM Private Key
JWT

Personally identifiable information (2)

Credit Card
US Social Security Number

🎯 Provider Pattern Coverage (Cloud Enterprise)

Cloud Enterprise

This layer adds 176 provider-specific patterns on top of the Basic 44, for a total of 220. Basic ships only the 44-pattern sensitive_data rule.

What it catches. The SaaS provider tokens that basic redaction misses. Coverage includes Slack, Discord, Twilio, SendGrid, GitLab, Mailgun, Cloudflare, Datadog, Notion, Linear, Figma, HuggingFace, Postman, Vercel, Azure DevOps, and GCP OAuth tokens, on top of the 44 patterns in the OWASP-aligned set.

Attack countered. An agent integrated with one of these providers emits its own token in tool output. Error responses, debug logs, and copy-paste-able curl commands are the usual culprits. The basic regex set does not know the shape of a Slack bot token or a Datadog API key, so the value slips through and gets stored verbatim. With provider coverage on, the captured secret is preserved in security_events as a redacted-identifiable fingerprint (e.g. ghp_AAAA...BBBB) along with the Hindsight API key name that submitted the retain, so SIEM tooling can identify exactly which key in your inventory leaked and which agent or service produced it.
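A redacted-identifiable fingerprint of the kind shown above (ghp_AAAA...BBBB) could be produced along these lines. This is a sketch, not Hindsight's implementation: the function name and the choice of eight leading and four trailing characters are assumptions inferred from the example.

```python
def fingerprint(secret: str, head: int = 8, tail: int = 4) -> str:
    """Keep a short prefix and suffix so a key is identifiable but not usable.

    The head/tail lengths are assumed from the ghp_AAAA...BBBB example.
    """
    if len(secret) <= head + tail:
        # Too short to truncate without exposing the whole value.
        return "*" * len(secret)
    return f"{secret[:head]}...{secret[-tail:]}"


print(fingerprint("ghp_AAAA" + "x" * 24 + "BBBB"))  # ghp_AAAA...BBBB
```

Keeping the provider prefix in the fingerprint is what lets a SIEM query match it against a credential inventory without ever holding the plaintext.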

How to enable. Add a rule with detector detect_secrets and action redact.

What you see. The detector name in security_events is the specific provider (GitHub Token, Slack Token, Stripe Live Key, and so on), so triage can route incidents to the right service owner without manually decoding the pattern.
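Routing on the detector name might look like the sketch below. The owner mapping, the fallback queue, and the dictionary shape of the event are all hypothetical; only the detector field and the event_metadata.hits[].preview path come from this page.

```python
# Hypothetical mapping from detector name to the team that owns the service.
OWNERS = {
    "GitHub Token": "platform-team",
    "Slack Token": "collab-team",
    "Stripe Live Key": "payments-team",
}


def route(event: dict) -> dict:
    """Pick an owner from the detector name and collect the hit previews."""
    hits = event.get("event_metadata", {}).get("hits", [])
    return {
        "owner": OWNERS.get(event["detector"], "security-triage"),
        "previews": [hit["preview"] for hit in hits],
    }


event = {
    "detector": "GitHub Token",
    "event_metadata": {"hits": [{"preview": "ghp_AAAA...BBBB"}]},
}
print(route(event))  # {'owner': 'platform-team', 'previews': ['ghp_AAAA...BBBB']}
```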

📦 Base64 Expansion (Cloud Enterprise)

Cloud Enterprise

This layer is part of Cloud Enterprise. Basic does not decode base64 content before scanning.

What it catches. Secrets hidden inside base64 blobs. Two real-world cases dominate. First, Authorization: Basic <b64(user:pass)> HTTP headers that show up in tool output or copy-pasted curl commands. Second, base64-wrapped JSON payloads from external tools that embed credentials inside structured fields.

Attack countered. An attacker, or a careless integration, base64-wraps a credential so the regex layers pass the blob through unscrubbed. You store the raw blob. Anyone with recall access decodes it. The credential leaks. With base64 expansion on, the encoded blob is decoded in-memory, scanned for secrets by the other detectors, and any matching blob is replaced in-place with a redaction marker. The captured secret is recorded as a fingerprint in security_events so SIEM tooling can match it against your credential inventory.
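The decode, scan, and replace-in-place flow can be sketched as follows. The candidate regex, the single user:password check standing in for "the other detectors", and the marker label are assumptions for illustration; the real layer's thresholds and detectors are not described here.

```python
import base64
import binascii
import re

# Hypothetical stand-in for the other detectors: flags a user:password pair.
BASIC_AUTH_PAIR = re.compile(r"^[^\s:]+:[^\s:]+$")
# Runs of 16 or more base64 characters, with optional padding.
B64_CANDIDATE = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def expand_and_redact(text: str) -> str:
    """Decode base64-looking blobs in memory and redact those hiding a secret."""
    def _check(match: re.Match) -> str:
        blob = match.group(0)
        try:
            decoded = base64.b64decode(blob, validate=True).decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            return blob  # not valid base64 text, leave untouched
        if BASIC_AUTH_PAIR.match(decoded):
            return "[REDACTED:basic_auth]"
        return blob  # decodes cleanly but holds no secret

    return B64_CANDIDATE.sub(_check, text)


header = "Authorization: Basic " + base64.b64encode(b"alice:correct-horse").decode()
print(expand_and_redact(header))  # Authorization: Basic [REDACTED:basic_auth]
```

Note that the whole encoded blob is replaced, not just the decoded secret, which matches the in-place replacement described above.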

How to enable. Add a rule with detector base64_decode and action redact.

🤖 LLM Screen (Cloud Enterprise)

Cloud Enterprise

This layer is part of Cloud Enterprise. Basic does not run an LLM screen over content.

What it catches. Secrets that show up in conversational prose, where structural patterns fail. Examples: "the password is hunter2", "use access code SUMMER2025", "the API key is abc-xyz-123, don't share". None of these has a fixed shape that a regex can lock onto.

Attack countered. Users pasting credentials inside natural language instructions or chat messages, where every regex layer misses the secret because nothing in its shape looks like a token. The risk is high for chat-driven internal agents and customer-support copilots, where users casually drop credentials into conversational turns.

How it works. Behind the scenes, the LLM is asked to identify embedded credentials in the payload and return a structured list of hits. Decisions are cached on a content hash, so identical content costs zero LLM tokens on repeat. This makes repeat workloads (document re-ingestion, replayed conversations, idempotent retries) effectively free after the first screen.
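The caching behavior can be pictured with a small wrapper keyed on a content hash. This is a sketch under assumptions: SHA-256 as the hash, an in-process dictionary as the cache, and a stub function in place of the model call. None of these details are confirmed by this page.

```python
import hashlib
from typing import Callable

Screen = Callable[[str], list[str]]


class CachedScreen:
    """Memoize screen decisions on a SHA-256 hash of the content."""

    def __init__(self, screen: Screen) -> None:
        self._screen = screen
        self._cache: dict[str, list[str]] = {}
        self.llm_calls = 0  # how many times the underlying screen actually ran

    def __call__(self, content: str) -> list[str]:
        key = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if key not in self._cache:
            self.llm_calls += 1
            self._cache[key] = self._screen(content)
        return self._cache[key]


def fake_llm_screen(content: str) -> list[str]:
    """Stub for the model call: flags the word that follows 'password is'."""
    marker = "password is "
    if marker in content:
        return content.split(marker, 1)[1].split()[:1]
    return []


screen = CachedScreen(fake_llm_screen)
screen("the password is hunter2")
screen("the password is hunter2")  # served from cache
print(screen.llm_calls)  # 1
```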

How to enable. Add a rule with detector llm_screen and action redact.

Cost note

LLM screen adds real latency, typically 500ms to 2s per unique payload. It is recommended only for high-sensitivity banks where the conversational secret risk is real. For most banks, the structural layers catch enough of the credential population that LLM screen is overkill.

Complete pattern catalog

Cloud Enterprise

The full 220-pattern catalog below ships in Cloud Enterprise via the detect_secrets rule. Basic ships only the 44-pattern sensitive_data rule (a subset of the patterns below, covering the common credential formats: AWS, GitHub PAT, Stripe, JWT, PEM private keys, credit card, US SSN, and other widely-leaked shapes).

Catalog at a glance

220 provider patterns in the Cloud Enterprise detect_secrets rule
3 pattern sources: detect-secrets 1.5.0 (25), GitLeaks (171), Hindsight-native (24)
8 categories for the GitLeaks tier alone
Continuously expanded: the floor is locked at 200 by an automated test

Each provider name in the list below is what appears in the detector field of the security_events row when a match is recorded, so you can use this catalog to write SIEM alert rules.

detect-secrets 1.5.0 plugins (25)

Provider plugins shipped by the detect-secrets library. The high-entropy plugins (Base64HighEntropyString, HexHighEntropyString) are intentionally excluded because they produce excessive false positives on common short strings.

Artifactory
AWS Key
Azure Storage Key
Basic Auth
Cloudant
Discord Bot Token
GitHub Token
GitLab Token
IBM Cloud IAM
IBM COS HMAC
IP Public
JWT Token
Keyword
Mailchimp
NPM
OpenAI
Private Key
PyPI Token
SendGrid
Slack
SoftLayer
Square OAuth
Stripe
Telegram Bot Token
Twilio Key

GitLeaks rules (171)

Rules vendored from the MIT-licensed GitLeaks project. The loader intentionally skips 50 rules that collide with detect-secrets coverage (AWS, GitHub, GitLab, Slack, Stripe, SendGrid, Mailchimp, Twilio, Discord, Telegram, OpenAI, NPM, PyPI, JWT, Artifactory, Private Key, IBM Cloud IAM, IBM COS, SoftLayer, Square, Cloudant, and Azure Storage) so the same secret is not double-reported under two different format labels.
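The collision skip can be thought of as a filter applied while loading the vendored rules. The sketch below is hypothetical: it assumes rule IDs begin with a provider prefix and matches on that prefix, which may differ from how the actual loader decides that two rules collide.

```python
# Providers already covered by detect-secrets plugins (subset, for illustration).
COVERED_PREFIXES = ("aws", "github", "gitlab", "slack", "stripe")


def load_gitleaks_rules(rule_ids: list[str]) -> list[str]:
    """Keep only rules whose provider prefix is not already covered upstream."""
    return [
        rule_id
        for rule_id in rule_ids
        if rule_id.split("-", 1)[0] not in COVERED_PREFIXES
    ]


vendored = ["aws-access-token", "github-pat", "cloudflare-api-key", "linear-api-key"]
print(load_gitleaks_rules(vendored))  # ['cloudflare-api-key', 'linear-api-key']
```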

AI / ML providers (7)

Anthropic API key
Anthropic Admin API key
Cohere API token
Hugging Face access token
Hugging Face organization API token
Perplexity API key
Private AI API token

Cloud and infrastructure (32)

Alibaba access key ID
Alibaba secret key
Cloudflare API key
Cloudflare global API key
Cloudflare origin CA key
DigitalOcean access token
DigitalOcean PAT
DigitalOcean refresh token
Doppler API token
Fastly API token
Fly.io access token
GCP API key
Harness API key
HashiCorp Terraform API token
HashiCorp Terraform password
Heroku API key
Heroku API key v2
Kubernetes secret YAML
Netlify access token
OpenShift user token
PlanetScale API token
PlanetScale OAuth token
PlanetScale password
Pulumi API token
Scalingo API token
SettleMint application access token
SettleMint personal access token
SettleMint service access token
Vault batch token
Vault service token
Yandex access token
Yandex API key

Source control and CI (13)

Atlassian API token
Bitbucket client ID
Bitbucket client secret
Clojars API token
Codecov access token
DroneCI access token
JFrog API key
JFrog identity token
NuGet config password
ReadMe API token
RubyGems API token
Sourcegraph access token
Travis CI access token

Payments and finance (21)

Bittrex access key
Bittrex secret key
Coinbase access token
Duffel API token
Finicity API token
Finicity client secret
Finnhub access token
Flutterwave encryption key
Flutterwave public key
Flutterwave secret key
GoCardless API token
Kraken access token
KuCoin access token
KuCoin secret key
Plaid API token
Plaid client ID
Plaid secret key
Shopify access token
Shopify custom access token
Shopify private app access token
Shopify shared secret

Communications and email (17)

Beamer API token
EasyPost API token
EasyPost test API token
Gitter access token
Lob API key
Lob publishable API key
Mailgun private API token
Mailgun public key
Mailgun signing key
Mattermost access token
MessageBird API token
MessageBird client ID
Microsoft Teams webhook
Sendbird access ID
Sendbird access token
Sendinblue API token
Shippo API token

Analytics and observability (28)

Airtable API key
Airtable personal access token
Datadog access token
Defined Networking API token
Dynatrace API token
Grafana API key
Grafana Cloud API token
Grafana service account token
Infracost API token
LaunchDarkly access token
Mapbox API token
MaxMind license key
New Relic browser API token
New Relic insert key
New Relic user API ID
New Relic user API key
Octopus Deploy API key
Postman API token
Prefect API token
Sentry access token
Sentry org token
Sentry user token
Sidekiq secret
Sidekiq sensitive URL
Snyk API token
Sonar API token
Sumo Logic access ID
Sumo Logic access token

Identity and auth (28)

1Password secret key
1Password service account token
Adobe client ID
Adobe client secret
Age secret key
Asana client ID
Asana client secret
Authress service client access key
Azure AD client secret
curl auth header
curl auth user
Etsy access token
Facebook access token
Facebook page access token
Facebook secret
Flickr access token
Frame.io API token
Freshbooks access token
Intra42 client secret
LinkedIn client ID
LinkedIn client secret
Okta access token
Twitch API token
Twitter access secret
Twitter access token
Twitter API key
Twitter API secret
Twitter bearer token

Other (25)

Adafruit API key
Algolia API key
Cisco Meraki API key
ClickHouse Cloud API secret key
Confluent access token
Confluent secret key
Contentful delivery API token
Databricks API token
Dropbox API token
Dropbox long-lived API token
Dropbox short-lived API token
Freemius secret key
Generic API key
HubSpot API key
Intercom API key
Linear API key
Linear client secret
Looker client ID
Looker client secret
Notion API token
NYTimes access token
RapidAPI access token
Typeform API token
Zendesk secret key

Hindsight-native patterns (24)

Patterns that target modern AI providers, database connection URLs with embedded credentials, and PII formats that the upstream catalogs do not cover reliably. Each detector name is emitted as hindsight:<label>.

hindsight:anthropic_key
hindsight:openai_project_key
hindsight:openai_admin_key
hindsight:google_api_key
hindsight:google_oauth_token
hindsight:xai_key
hindsight:groq_key
hindsight:huggingface_token
hindsight:replicate_token
hindsight:perplexity_key
hindsight:databricks_token
hindsight:digitalocean_token
hindsight:github_fg_pat
hindsight:shopify_token
hindsight:stripe_restricted
hindsight:mailgun_key
hindsight:slack_webhook
hindsight:db_url_postgres
hindsight:db_url_mysql
hindsight:db_url_mongodb
hindsight:jwt
hindsight:private_key_pem
hindsight:credit_card
hindsight:ssn_us

Auto-enforced floor

The catalog size is checked by test_total_pattern_count_meets_enterprise_bar in the cloud test suite. The catalog currently holds 220 patterns; if it ever drops below 200, CI fails. Patterns are added as new SaaS providers ship recognizable credential shapes.
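A floor check of this kind could look like the sketch below. The body of the real test is not shown on this page, so the function name, constant, and data structure here are stand-ins; the per-source counts are the ones listed in the catalog summary.

```python
ENTERPRISE_FLOOR = 200

# Pattern counts per source, as stated in the catalog summary.
PATTERN_SOURCES = {"detect-secrets": 25, "gitleaks": 171, "hindsight-native": 24}


def total_pattern_count(sources: dict[str, int]) -> int:
    """Sum the pattern counts across all sources."""
    return sum(sources.values())


def test_catalog_floor_sketch() -> None:
    # Fails the build if the catalog shrinks below the floor.
    assert total_pattern_count(PATTERN_SOURCES) >= ENTERPRISE_FLOOR


test_catalog_floor_sketch()
print(total_pattern_count(PATTERN_SOURCES))  # 220
```

Asserting against the floor instead of the exact count lets the catalog grow without touching the test.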

Recommended Layering

Most banks should enable the basic sensitive_data rule plus provider coverage: rules for sensitive_data and detect_secrets, both with action redact. That combination catches the structural credential leaks (AWS, GitHub, Stripe, JWT, and the 176 provider-specific formats added on Enterprise) at near-zero latency cost. It is the right baseline for nearly any production Cloud Enterprise deployment.

Add base64_decode if your agents process opaque tool payloads, copy-pasted HTTP requests, or curl-style content. Add llm_screen only if conversational secrets are a real threat (chat-driven internal agents, customer-support copilots, banks that ingest free-form user messages at scale). Both layers ship in Cloud Enterprise and stack cleanly on top of the structural layers.

Basic deployments

Basic deployments cannot enable detect_secrets, base64_decode, or llm_screen because those rules are not implemented in the Basic extension. Basic also silently downgrades the block action to redact. If you need any of those capabilities, run Cloud Enterprise.
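The tier differences can be modeled as a small normalization step. This is a hypothetical model, not the product's policy engine: the rule dictionaries, the tier names, and the choice to report unavailable detectors in a separate list are assumptions made for the example.

```python
ENTERPRISE_ONLY = {"detect_secrets", "base64_decode", "llm_screen"}


def effective_rules(rules: list[dict], tier: str) -> tuple[list[dict], list[str]]:
    """Return the rules a tier would apply, plus the detectors it cannot run."""
    if tier == "enterprise":
        return [dict(rule) for rule in rules], []
    applied: list[dict] = []
    unavailable: list[str] = []
    for rule in rules:
        if rule["detector"] in ENTERPRISE_ONLY:
            unavailable.append(rule["detector"])
            continue
        # Basic silently downgrades the block action to redact.
        action = "redact" if rule["action"] == "block" else rule["action"]
        applied.append({"detector": rule["detector"], "action": action})
    return applied, unavailable


policy = [
    {"detector": "sensitive_data", "action": "block"},
    {"detector": "llm_screen", "action": "redact"},
]
print(effective_rules(policy, "basic"))
# ([{'detector': 'sensitive_data', 'action': 'redact'}], ['llm_screen'])
```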

What Survives, What Does Not

Redaction never alters the agent's downstream behavior beyond removing the secret substring. Fact extraction runs on the redacted content, so memory units reflect the redacted version, and entity resolution sees the redacted text. Recall surfaces redacted content to callers. The agent never sees the raw credential after the screen runs.

On Cloud Enterprise, a redacted-identifiable fingerprint of each captured secret (e.g. ghp_AAAA...BBBB) is recorded in security_events.event_metadata.hits[].preview. The plaintext is never persisted, so the audit trail does not become a second credential warehouse. The Hindsight API key name that submitted the retain is recorded on the same row so SIEM tooling can attribute the leak to a specific agent or service. Basic does not maintain a security_events audit trail at all; raw values are redacted in-place and discarded.

Document append still works correctly. Redacted output is byte-deterministic on identical input, so re-screening the same chunk produces the same hash, and the de-duplication paths that compare content hashes continue to function. Idempotent retain calls remain idempotent after redaction is enabled.
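The determinism claim is easy to check on a toy redactor: identical input yields identical redacted bytes, so the content hash is stable, and screening already-redacted text changes nothing. The single GitHub token pattern, the marker label, and SHA-256 as the content hash are assumptions for this sketch.

```python
import hashlib
import re

# Illustrative pattern: a classic GitHub personal access token.
GITHUB_PAT = re.compile(r"\bghp_[A-Za-z0-9]{36}\b")


def redact(chunk: str) -> str:
    """Replace each token with a fixed marker; no randomness, no timestamps."""
    return GITHUB_PAT.sub("[REDACTED:github_pat]", chunk)


def content_hash(chunk: str) -> str:
    """Hash the redacted form, as a de-duplication path might."""
    return hashlib.sha256(redact(chunk).encode("utf-8")).hexdigest()


chunk = "deploy with ghp_" + "a" * 36
print(content_hash(chunk) == content_hash(chunk))  # True
print(redact(redact(chunk)) == redact(chunk))      # True
```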

Where to Go Next

Detectors covers non-redaction enforcement: blocking and protected tag namespaces.
Policies walks through writing, validating, and rolling out a bank policy.
