
Redaction

Redaction replaces secret-bearing substrings in content with [REDACTED:type] markers before the content reaches the underlying store. The original substring never lands in memory or in the document body. Future recalls surface the redacted version. Hindsight offers four layers of redaction, each one catching things the others miss.

Tier summary

Basic ships Layer 1 only (the 44-pattern sensitive_data rule, action redact). The block action is silently downgraded to redact. Cloud Enterprise adds Layers 2 to 4 (the 176-pattern provider catalog, base64 expansion, the LLM screen) plus the security_events audit trail (with captured-secret fingerprints and the submitting Hindsight API key name).

The Four Layers

Pick the layers that match your bank's threat model. The table below maps each layer to what it catches and the tier it ships in.

| Layer | Catches | Tier |
| --- | --- | --- |
| Sensitive data (44 patterns) | Common credential formats, PII | Basic |
| Provider pattern coverage (176 additional, 220 total) | SaaS provider tokens | Cloud Enterprise |
| Base64 expansion | Secrets hidden in Authorization headers or base64 blobs | Cloud Enterprise |
| LLM screen | Conversational secrets in plain prose | Cloud Enterprise |

🛡️ Sensitive Data (Basic)

Basic

This is the only layer included in Basic. The other three layers (provider catalog, base64 expansion, LLM screen) require Cloud Enterprise.

What it catches. AWS access keys, GitHub personal access tokens, Stripe keys, JWTs, PEM private keys, credit card numbers, and US Social Security numbers. The 44-pattern OWASP-aligned default set covers the credential formats most commonly leaked into agent transcripts.

Attack countered. A developer pastes a token into chat while debugging. The agent stores the conversation. A later recall surfaces the token to another user, another session, or to anyone with read access on the bank. Without redaction, the bank quietly accumulates credentials, and every recall call widens the exposure.

How to enable. In the Console, navigate to Bank Settings, click the Memory Defense card, and add a rule with detector sensitive_data and action redact. Save the policy. New retain calls are screened immediately; existing memories are unaffected.

What you see. Stored memories and document bodies contain [REDACTED:type] markers in place of the original substring. On Cloud Enterprise, one row is written to security_events per match, recording the detector name, the matched type, and a hash of the original content for forensic review. Basic does the redaction but does not record security_events rows; the audit trail is an Enterprise capability.
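A minimal sketch of the Layer 1 idea: each pattern is a regular expression, and every match is swapped for a [REDACTED:type] marker before anything is stored. The three patterns below are well-known public shapes (AWS access key IDs, classic GitHub personal access tokens, US SSNs) chosen for illustration. They are not Hindsight's actual sensitive_data regexes, and the type labels inside the markers are hypothetical.

```python
import re

# Illustrative subset of public credential shapes; NOT the real
# sensitive_data pattern set. Marker labels are hypothetical.
PATTERNS = {
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(text: str) -> tuple[str, list[str]]:
    """Replace every match with [REDACTED:type]; return new text and matched types."""
    hits: list[str] = []
    for label, pattern in PATTERNS.items():
        # subn returns (new_string, number_of_substitutions)
        text, count = pattern.subn(f"[REDACTED:{label}]", text)
        hits.extend([label] * count)
    return text, hits


if __name__ == "__main__":
    clean, found = redact("key AKIAIOSFODNN7EXAMPLE, ssn 123-45-6789")
    print(clean)  # key [REDACTED:aws_access_key], ssn [REDACTED:us_ssn]
    print(found)  # ['aws_access_key', 'us_ssn']
```

Only the marker and the matched type survive; the original substring is absent from the returned text.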

The full Basic catalog (44 patterns). Everything Basic catches, by category:

AI and LLM providers (12)

  • Anthropic API Key
  • OpenAI Project Key
  • OpenAI Admin Key
  • OpenAI API Key
  • Google API Key
  • Google OAuth Token
  • xAI Key
  • Groq Key
  • HuggingFace Token
  • Replicate Token
  • Perplexity Key
  • Databricks Token

Cloud providers (3)

  • AWS Access Key
  • AWS Session Token
  • DigitalOcean Token

Source control and CI (9)

  • GitHub Fine-Grained PAT
  • GitHub Personal Access Token
  • GitHub App Token
  • GitHub User Token
  • GitHub Refresh Token
  • GitHub OAuth Token
  • GitLab PAT
  • npm Token
  • PyPI Token

Payment processors (4)

  • Stripe Secret Key
  • Stripe Restricted Key
  • Square Token
  • Braintree Token

Communications (8)

  • Slack Token
  • Slack Webhook
  • Twilio API Key
  • Twilio Account SID
  • SendGrid Key
  • Mailgun Key
  • Discord Bot Token
  • Telegram Bot Token

Commerce (1)

  • Shopify Token

Database connection strings (3)

  • PostgreSQL URL
  • MySQL URL
  • MongoDB URL

Private keys and tokens (2)

  • PEM Private Key
  • JWT

Personally identifiable information (2)

  • Credit Card
  • US Social Security Number

🎯 Provider Pattern Coverage (Cloud Enterprise)

Cloud Enterprise

This layer adds 176 provider-specific patterns on top of the Basic 44, for a total of 220. Basic ships only the 44-pattern sensitive_data rule.

What it catches. SaaS provider token formats beyond those in the Basic set. Coverage includes Slack, Discord, Twilio, SendGrid, GitLab, Mailgun, Cloudflare, Datadog, Notion, Linear, Figma, HuggingFace, Postman, Vercel, Azure DevOps, and GCP OAuth tokens, on top of the 44 patterns in the Basic set.

Attack countered. An agent integrated with one of these providers emits its own token in tool output; error responses, debug logs, and copy-pasted curl commands are the usual culprits. The Basic regex set does not know the shape of a Slack bot token or a Datadog API key, so the value slips through and is stored verbatim. With provider coverage on, the token is redacted and the captured secret is recorded in security_events as a partially masked fingerprint (e.g. ghp_AAAA...BBBB), together with the name of the Hindsight API key that submitted the retain, so SIEM tooling can identify exactly which key in your inventory leaked and which agent or service produced it.

How to enable. Add a rule with detector detect_secrets and action redact.

What you see. The detector name in security_events is the specific provider (GitHub Token, Slack Token, Stripe Live Key, and so on), so triage can route incidents to the right service owner without manually decoding the pattern.

📦 Base64 Expansion (Cloud Enterprise)

Cloud Enterprise

This layer is part of Cloud Enterprise. Basic does not decode base64 content before scanning.

What it catches. Secrets hidden inside base64 blobs. Two real-world cases dominate. First, Authorization: Basic <b64(user:pass)> HTTP headers that show up in tool output or copy-pasted curl commands. Second, base64-wrapped JSON payloads from external tools that embed credentials inside structured fields.

Attack countered. An attacker, or a careless integration, base64-wraps a credential so the regex layers pass the blob through unscrubbed. You store the raw blob. Anyone with recall access decodes it. The credential leaks. With base64 expansion on, the encoded blob is decoded in-memory, scanned for secrets by the other detectors, and any matching blob is replaced in-place with a redaction marker. The captured secret is recorded as a fingerprint in security_events so SIEM tooling can match it against your credential inventory.
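The decode-then-scan flow can be sketched as follows. This is an illustration under stated assumptions, not the Hindsight implementation: the 16-character minimum for a candidate blob, the two stand-in inner detectors (an AWS key shape and a user:pass pair as carried by Authorization: Basic headers), and the marker label are all hypothetical.

```python
import base64
import binascii
import re

# Candidate blobs: 16+ base64-alphabet characters with optional padding.
# The 16-character floor is an assumption to skip ordinary short words.
B64_RUN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")

# Stand-ins for "the other detectors"; a real deployment would run its
# full pattern set over the decoded text.
INNER_DETECTORS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),  # AWS access key ID shape
    re.compile(r"^[^:\s]+:\S+$"),     # user:pass, as in Authorization: Basic
]

MARKER = "[REDACTED:base64_secret]"  # hypothetical marker label


def expand_and_redact(text: str) -> str:
    """Decode each base64-looking run in memory and replace the whole run
    with a marker if the decoded text trips any inner detector."""

    def screen(match: re.Match) -> str:
        blob = match.group(0)
        try:
            decoded = base64.b64decode(blob, validate=True).decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            return blob  # not valid base64, or not text: leave untouched
        if any(d.search(decoded) for d in INNER_DETECTORS):
            return MARKER
        return blob  # decoded cleanly but held no secret

    # re.sub calls screen() once per candidate run; decoded text never leaves it.
    return B64_RUN.sub(screen, text)


if __name__ == "__main__":
    header = "Authorization: Basic " + base64.b64encode(b"admin:hunter2").decode()
    print(expand_and_redact(header))  # Authorization: Basic [REDACTED:base64_secret]
```

The whole blob is replaced rather than just the decoded secret, because re-encoding a partially redacted payload would produce a different, misleading blob.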

How to enable. Add a rule with detector base64_decode and action redact.

🤖 LLM Screen (Cloud Enterprise)

Cloud Enterprise

This layer is part of Cloud Enterprise. Basic does not run an LLM screen over content.

What it catches. Secrets that appear in conversational prose, where structural patterns fail: "the password is hunter2", "use access code SUMMER2025", "the API key is abc-xyz-123, don't share". None of these has a fixed shape that a regex can lock onto.

Attack countered. Users pasting credentials inside natural language instructions or chat messages, where every regex layer misses the secret because nothing in its shape looks like a token. The risk is high for chat-driven internal agents and customer-support copilots, where users casually drop credentials into conversational turns.

How it works. Behind the scenes, the LLM is asked to identify embedded credentials in the payload and return a structured list of hits. Decisions are cached on a content hash, so identical content costs zero LLM tokens on repeat. This makes repeat workloads (document re-ingestion, replayed conversations, idempotent retries) effectively free after the first screen.
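The content-hash cache can be sketched like this. The model call is injected as a plain function (ask_llm, hypothetical) that returns a list of hits shaped as {"type": ..., "value": ...}; that hit shape, the SHA-256 key, and the in-process dict cache are assumptions made for illustration.

```python
import hashlib
from typing import Callable


class CachedLLMScreen:
    """Cache screening decisions by content hash (hypothetical sketch)."""

    def __init__(self, ask_llm: Callable[[str], list[dict]]):
        self._ask_llm = ask_llm  # stand-in for the real model call
        self._cache: dict[str, list[dict]] = {}
        self.llm_calls = 0  # counts how often the model was actually asked

    def screen(self, content: str) -> list[dict]:
        key = hashlib.sha256(content.encode("utf-8")).hexdigest()
        if key not in self._cache:
            self.llm_calls += 1
            self._cache[key] = self._ask_llm(content)
        return self._cache[key]  # identical content: no model call


def apply_hits(content: str, hits: list[dict]) -> str:
    """Swap each reported secret value for a [REDACTED:type] marker."""
    for hit in hits:
        content = content.replace(hit["value"], f"[REDACTED:{hit['type']}]")
    return content


if __name__ == "__main__":
    def fake_llm(text: str) -> list[dict]:
        # Pretend model: flags one known password.
        return [{"type": "password", "value": "hunter2"}] if "hunter2" in text else []

    screener = CachedLLMScreen(fake_llm)
    msg = "the password is hunter2"
    print(apply_hits(msg, screener.screen(msg)))  # the password is [REDACTED:password]
    screener.screen(msg)  # served from cache
    print(screener.llm_calls)  # 1
```

Keying on the hash rather than the raw text also keeps plaintext secrets out of the cache index.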

How to enable. Add a rule with detector llm_screen and action redact.

Cost note

LLM screen adds real latency, typically 500 ms to 2 s per unique payload. It is recommended only for high-sensitivity banks where the risk of conversational secrets is real. For most banks, the structural layers catch enough of the credential population that LLM screen is overkill.

Complete pattern catalog

Cloud Enterprise

The full 220-pattern catalog below ships in Cloud Enterprise via the detect_secrets rule. Basic ships only the 44-pattern sensitive_data rule (a subset of the patterns below, covering the common credential formats: AWS, GitHub PAT, Stripe, JWT, PEM private keys, credit card, US SSN, and other widely-leaked shapes).

Catalog at a glance
  • 220 provider patterns in the Cloud Enterprise detect_secrets rule
  • 3 pattern sources: detect-secrets 1.5.0 (25), GitLeaks (171), Hindsight-native (24)
  • 8 categories for the GitLeaks tier alone
  • Continuously expanded: the floor is locked at 200 by an automated test

Each provider name in the list below is what appears in the detector field of the security_events row when a match is recorded, so you can use this catalog to write SIEM alert rules.
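As an example of such an alert rule, the sketch below pages on a watchlist of detector names plus every hindsight: pattern, and attributes each alert to the submitting key. Only the detector field is named in these docs; the api_key_name key and the flat row shape are hypothetical stand-ins for however your SIEM ingests security_events.

```python
# Detector names to page on; they must match the catalog labels exactly.
WATCHLIST = {"AWS Key", "GitHub Token", "Stripe", "Private Key"}


def is_alert(row: dict) -> bool:
    detector = row.get("detector", "")
    # Every Hindsight-native pattern carries the "hindsight:" prefix.
    return detector in WATCHLIST or detector.startswith("hindsight:")


def alerts(rows: list[dict]) -> list[tuple[str, str]]:
    """(detector, submitting key name) for each row that should page someone.

    api_key_name is a hypothetical column name used for illustration.
    """
    return [
        (row["detector"], row.get("api_key_name", "unknown"))
        for row in rows
        if is_alert(row)
    ]


if __name__ == "__main__":
    sample = [
        {"detector": "AWS Key", "api_key_name": "ci-agent"},
        {"detector": "IP Public", "api_key_name": "crawler"},
    ]
    print(alerts(sample))  # [('AWS Key', 'ci-agent')]
```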

detect-secrets 1.5.0 plugins (25)

Provider plugins shipped by the detect-secrets library. The high-entropy plugins (Base64HighEntropyString, HexHighEntropyString) are intentionally excluded because they produce excessive false positives on common short strings.

  • Artifactory
  • AWS Key
  • Azure Storage Key
  • Basic Auth
  • Cloudant
  • Discord Bot Token
  • GitHub Token
  • GitLab Token
  • IBM Cloud IAM
  • IBM COS HMAC
  • IP Public
  • JWT Token
  • Keyword
  • Mailchimp
  • NPM
  • OpenAI
  • Private Key
  • PyPI Token
  • SendGrid
  • Slack
  • SoftLayer
  • Square OAuth
  • Stripe
  • Telegram Bot Token
  • Twilio Key

GitLeaks rules (171)

Rules vendored from the MIT-licensed GitLeaks project. The loader intentionally skips 50 rules that collide with detect-secrets coverage (AWS, GitHub, GitLab, Slack, Stripe, SendGrid, Mailchimp, Twilio, Discord, Telegram, OpenAI, NPM, PyPI, JWT, Artifactory, Private Key, IBM Cloud IAM, IBM COS, SoftLayer, Square, Cloudant, and Azure Storage) so the same secret is not double-reported under two different format labels.
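The collision skip can be pictured as a filter over rule IDs. This sketch assumes GitLeaks-style hyphenated IDs (such as aws-access-token) and reduces the collision list above to hyphen-token prefixes; the actual loader's matching logic is not documented here. Matching whole tokens rather than raw string prefixes keeps rules such as Azure AD client secret and Private AI API token, which appear in the lists below, even though Azure Storage and Private Key collide.

```python
# Providers detect-secrets already covers, as hyphen-token prefixes of
# GitLeaks-style rule IDs (an assumed ID format).
COLLISIONS = [
    ("aws",), ("github",), ("gitlab",), ("slack",), ("stripe",),
    ("sendgrid",), ("mailchimp",), ("twilio",), ("discord",), ("telegram",),
    ("openai",), ("npm",), ("pypi",), ("jwt",), ("artifactory",),
    ("private", "key"), ("ibm",), ("softlayer",), ("square",), ("cloudant",),
    ("azure", "storage"),
]


def collides(rule_id: str) -> bool:
    """True if the rule ID starts with a colliding provider's tokens."""
    tokens = tuple(rule_id.lower().split("-"))
    # Whole-token comparison: ("azure", "storage") does not match
    # azure-ad-client-secret, and ("private", "key") does not match privateai-*.
    return any(tokens[: len(prefix)] == prefix for prefix in COLLISIONS)


def load_rules(rule_ids: list[str]) -> tuple[list[str], list[str]]:
    """Split rule IDs into (kept, skipped)."""
    kept = [r for r in rule_ids if not collides(r)]
    skipped = [r for r in rule_ids if collides(r)]
    return kept, skipped
```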

AI / ML providers (7)

  • Anthropic API key
  • Anthropic Admin API key
  • Cohere API token
  • Hugging Face access token
  • Hugging Face organization API token
  • Perplexity API key
  • Private AI API token

Cloud and infrastructure (32)

  • Alibaba access key ID
  • Alibaba secret key
  • Cloudflare API key
  • Cloudflare global API key
  • Cloudflare origin CA key
  • DigitalOcean access token
  • DigitalOcean PAT
  • DigitalOcean refresh token
  • Doppler API token
  • Fastly API token
  • Fly.io access token
  • GCP API key
  • Harness API key
  • HashiCorp Terraform API token
  • HashiCorp Terraform password
  • Heroku API key
  • Heroku API key v2
  • Kubernetes secret YAML
  • Netlify access token
  • OpenShift user token
  • PlanetScale API token
  • PlanetScale OAuth token
  • PlanetScale password
  • Pulumi API token
  • Scalingo API token
  • SettleMint application access token
  • SettleMint personal access token
  • SettleMint service access token
  • Vault batch token
  • Vault service token
  • Yandex access token
  • Yandex API key

Source control and CI (13)

  • Atlassian API token
  • Bitbucket client ID
  • Bitbucket client secret
  • Clojars API token
  • Codecov access token
  • DroneCI access token
  • JFrog API key
  • JFrog identity token
  • NuGet config password
  • ReadMe API token
  • RubyGems API token
  • Sourcegraph access token
  • Travis CI access token

Payments and finance (21)

  • Bittrex access key
  • Bittrex secret key
  • Coinbase access token
  • Duffel API token
  • Finicity API token
  • Finicity client secret
  • Finnhub access token
  • Flutterwave encryption key
  • Flutterwave public key
  • Flutterwave secret key
  • GoCardless API token
  • Kraken access token
  • KuCoin access token
  • KuCoin secret key
  • Plaid API token
  • Plaid client ID
  • Plaid secret key
  • Shopify access token
  • Shopify custom access token
  • Shopify private app access token
  • Shopify shared secret

Communications and email (17)

  • Beamer API token
  • EasyPost API token
  • EasyPost test API token
  • Gitter access token
  • Lob API key
  • Lob publishable API key
  • Mailgun private API token
  • Mailgun public key
  • Mailgun signing key
  • Mattermost access token
  • MessageBird API token
  • MessageBird client ID
  • Microsoft Teams webhook
  • Sendbird access ID
  • Sendbird access token
  • Sendinblue API token
  • Shippo API token

Analytics and observability (28)

  • Airtable API key
  • Airtable personal access token
  • Datadog access token
  • Defined Networking API token
  • Dynatrace API token
  • Grafana API key
  • Grafana Cloud API token
  • Grafana service account token
  • Infracost API token
  • LaunchDarkly access token
  • Mapbox API token
  • MaxMind license key
  • New Relic browser API token
  • New Relic insert key
  • New Relic user API ID
  • New Relic user API key
  • Octopus Deploy API key
  • Postman API token
  • Prefect API token
  • Sentry access token
  • Sentry org token
  • Sentry user token
  • Sidekiq secret
  • Sidekiq sensitive URL
  • Snyk API token
  • Sonar API token
  • Sumo Logic access ID
  • Sumo Logic access token

Identity and auth (28)

  • 1Password secret key
  • 1Password service account token
  • Adobe client ID
  • Adobe client secret
  • Age secret key
  • Asana client ID
  • Asana client secret
  • Authress service client access key
  • Azure AD client secret
  • curl auth header
  • curl auth user
  • Etsy access token
  • Facebook access token
  • Facebook page access token
  • Facebook secret
  • Flickr access token
  • Frame.io API token
  • Freshbooks access token
  • Intra42 client secret
  • LinkedIn client ID
  • LinkedIn client secret
  • Okta access token
  • Twitch API token
  • Twitter access secret
  • Twitter access token
  • Twitter API key
  • Twitter API secret
  • Twitter bearer token

Other (25)

  • Adafruit API key
  • Algolia API key
  • Cisco Meraki API key
  • ClickHouse Cloud API secret key
  • Confluent access token
  • Confluent secret key
  • Contentful delivery API token
  • Databricks API token
  • Dropbox API token
  • Dropbox long-lived API token
  • Dropbox short-lived API token
  • Freemius secret key
  • Generic API key
  • HubSpot API key
  • Intercom API key
  • Linear API key
  • Linear client secret
  • Looker client ID
  • Looker client secret
  • Notion API token
  • NYTimes access token
  • RapidAPI access token
  • Typeform API token
  • Zendesk secret key

Hindsight-native patterns (24)

Patterns that target modern AI providers, database connection URLs with embedded credentials, and PII formats that the upstream catalogs do not cover reliably. Each detector name is emitted as hindsight:<label>.

  • hindsight:anthropic_key
  • hindsight:openai_project_key
  • hindsight:openai_admin_key
  • hindsight:google_api_key
  • hindsight:google_oauth_token
  • hindsight:xai_key
  • hindsight:groq_key
  • hindsight:huggingface_token
  • hindsight:replicate_token
  • hindsight:perplexity_key
  • hindsight:databricks_token
  • hindsight:digitalocean_token
  • hindsight:github_fg_pat
  • hindsight:shopify_token
  • hindsight:stripe_restricted
  • hindsight:mailgun_key
  • hindsight:slack_webhook
  • hindsight:db_url_postgres
  • hindsight:db_url_mysql
  • hindsight:db_url_mongodb
  • hindsight:jwt
  • hindsight:private_key_pem
  • hindsight:credit_card
  • hindsight:ssn_us

Auto-enforced floor

The pattern count is checked by test_total_pattern_count_meets_enterprise_bar in the cloud test suite: if the catalog ever drops below 200 patterns, CI fails. Patterns are added as new SaaS providers ship recognizable credential shapes.
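The arithmetic behind the floor is simple enough to sketch as a pytest-style check. This is not the actual test_total_pattern_count_meets_enterprise_bar; the per-source counts are copied from the catalog summary, whereas a real test would count the loaded registry. The check is a floor rather than an exact match, so adding patterns never breaks it.

```python
# Per-source counts copied from the catalog summary. A real test would
# count the patterns in the loaded registry instead of hard-coding them.
SOURCE_COUNTS = {
    "detect-secrets 1.5.0": 25,
    "GitLeaks": 171,
    "Hindsight-native": 24,
}
ENTERPRISE_FLOOR = 200


def total_patterns(counts: dict[str, int]) -> int:
    return sum(counts.values())


def test_pattern_floor_sketch() -> None:
    total = total_patterns(SOURCE_COUNTS)  # 25 + 171 + 24 = 220
    assert total >= ENTERPRISE_FLOOR, f"catalog fell to {total}, below the floor"
```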

Most banks should enable the sensitive_data rule plus provider coverage: rules for sensitive_data and detect_secrets, both with action redact. That combination catches structural credential leaks (AWS, GitHub, Stripe, JWT, and the 176 provider-specific formats added on Enterprise) at near-zero latency cost. It is the right baseline for nearly any production Cloud Enterprise deployment.

Add base64_decode if your agents process opaque tool payloads, copy-pasted HTTP requests, or curl-style content. Add llm_screen only if conversational secrets are a real threat (chat-driven internal agents, customer-support copilots, banks that ingest free-form user messages at scale). Both layers ship in Cloud Enterprise and stack cleanly on top of the structural layers.
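The guidance above reduces to a small decision function. The Rule shape (a detector plus an action) mirrors the detector/action pairs named on this page, but it is a hypothetical model, not Hindsight's policy schema; the Policies page describes the real format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule model: a detector name plus an action."""
    detector: str
    action: str = "redact"


def recommended_rules(opaque_payloads: bool, conversational_risk: bool) -> list[Rule]:
    # Structural baseline for Cloud Enterprise: near-zero latency cost.
    rules = [Rule("sensitive_data"), Rule("detect_secrets")]
    if opaque_payloads:
        # Tool payloads, copy-pasted HTTP requests, curl-style content.
        rules.append(Rule("base64_decode"))
    if conversational_risk:
        # Chat agents and support copilots; adds LLM latency per unique payload.
        rules.append(Rule("llm_screen"))
    return rules


if __name__ == "__main__":
    for rule in recommended_rules(opaque_payloads=True, conversational_risk=False):
        print(rule.detector, rule.action)
    # sensitive_data redact
    # detect_secrets redact
    # base64_decode redact
```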

Basic deployments

Basic deployments cannot enable detect_secrets, base64_decode, or llm_screen because those rules are not implemented in the Basic extension. Basic also silently downgrades the block action to redact. If you need any of those capabilities, run Cloud Enterprise.
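A sketch of how a Basic deployment's effective policy differs from the one you wrote. It encodes the two documented behaviors (only sensitive_data is implemented; block becomes redact) over hypothetical dict-shaped rules. Whether Basic rejects or silently ignores unsupported rules is not stated here, so the sketch simply reports them.

```python
# Detectors implemented in the Basic extension, per the tier summary.
BASIC_DETECTORS = {"sensitive_data"}


def effective_basic_policy(rules: list[dict]) -> tuple[list[dict], list[str]]:
    """Return (rules Basic can apply, detectors Basic lacks)."""
    effective: list[dict] = []
    unsupported: list[str] = []
    for rule in rules:
        if rule["detector"] not in BASIC_DETECTORS:
            unsupported.append(rule["detector"])
            continue
        # Basic silently downgrades block to redact.
        action = "redact" if rule["action"] == "block" else rule["action"]
        effective.append({**rule, "action": action})
    return effective, unsupported


if __name__ == "__main__":
    print(effective_basic_policy([
        {"detector": "sensitive_data", "action": "block"},
        {"detector": "llm_screen", "action": "redact"},
    ]))
    # ([{'detector': 'sensitive_data', 'action': 'redact'}], ['llm_screen'])
```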

What Survives, What Does Not

Redaction never alters the agent's downstream behavior beyond removing the secret substring. Fact extraction runs on the redacted content, so memory units reflect the redacted version, and entity resolution sees the redacted text. Recall surfaces redacted content to callers. The agent never sees the raw credential after the screen runs.

On Cloud Enterprise, a partially masked fingerprint of each captured secret (e.g. ghp_AAAA...BBBB) is recorded in security_events.event_metadata.hits[].preview. The plaintext is never persisted, so the audit trail does not become a secondary credential store. The name of the Hindsight API key that submitted the retain is recorded on the same row, so SIEM tooling can attribute the leak to a specific agent or service. Basic does not maintain a security_events audit trail at all; raw values are redacted in place and discarded.
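The fingerprint format can be sketched as keeping a short head and tail of the secret. The 8-character head and 4-character tail are inferred from the ghp_AAAA...BBBB example and are assumptions; the real preview rules may differ, particularly for short secrets, which this sketch masks entirely.

```python
def fingerprint(secret: str, head: int = 8, tail: int = 4) -> str:
    """Keep a short head and tail so the secret stays identifiable against
    an inventory without being stored. head/tail lengths are assumptions."""
    if len(secret) <= head + tail:
        # Too short to show both ends without revealing most of it.
        return "***"
    return f"{secret[:head]}...{secret[-tail:]}"


if __name__ == "__main__":
    token = "ghp_AAAA" + "x" * 28 + "BBBB"  # fake 40-char PAT-shaped string
    print(fingerprint(token))  # ghp_AAAA...BBBB
```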

Document append still works correctly. Redacted output is byte-deterministic on identical input, so re-screening the same chunk produces the same hash, and the de-duplication paths that compare content hashes continue to function. Idempotent retain calls remain idempotent after redaction is enabled.
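Why determinism matters for de-duplication, as a toy sketch: because redaction is a pure function of its input, retaining the same chunk twice yields the same redacted text and therefore the same hash. The DocumentStore class, its retain method, and the single-pattern redactor are hypothetical stand-ins, not Hindsight's API.

```python
import hashlib
import re

# Stand-in for the real detector stack; any pure function of the input works.
AWS_KEY = re.compile(r"AKIA[0-9A-Z]{16}")


def redact(text: str) -> str:
    return AWS_KEY.sub("[REDACTED:aws_access_key]", text)


def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class DocumentStore:
    """Toy store that skips chunks whose redacted hash it has already seen."""

    def __init__(self) -> None:
        self.chunks: dict[str, str] = {}

    def retain(self, chunk: str) -> bool:
        clean = redact(chunk)  # screen first; only redacted text is hashed
        key = content_hash(clean)
        if key in self.chunks:
            return False  # duplicate: idempotent no-op
        self.chunks[key] = clean
        return True


if __name__ == "__main__":
    store = DocumentStore()
    print(store.retain("creds: AKIAIOSFODNN7EXAMPLE"))  # True
    print(store.retain("creds: AKIAIOSFODNN7EXAMPLE"))  # False
```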

Where to Go Next

  • Detectors covers non-redaction enforcement: blocking and protected tag namespaces.
  • Policies walks through writing, validating, and rolling out a bank policy.