The redaction system uses pattern-based detection to identify sensitive information in documents. Each redaction category relies on a combination of keywords, formats, and structural signals to determine whether content should be redacted.
💡 Tip
For the most precise extraction of data points, we recommend using Dragon AI. It’s designed to understand context and structure beyond pattern-based detection, making it better suited for complex or ambiguous data.
Personally Identifiable Information (PII)
Email Address
What the system looks for
A standard email structure containing:
A username
An @ symbol
A valid domain and extension
Will match
Won’t match
user@example (missing domain extension)
user.example.com (missing @ symbol)
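The structure described above (username, @ symbol, domain, extension) can be approximated with a regular expression. This is a minimal sketch, not the system's actual pattern: the names EMAIL_RE and looks_like_email are hypothetical, and user@example.com is only an illustrative input.

```python
import re

# Hypothetical pattern: username, "@", one or more domain labels,
# and a final extension of at least two letters.
EMAIL_RE = re.compile(
    r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}"
)

def looks_like_email(text: str) -> bool:
    """Return True when the whole string has the email structure."""
    return EMAIL_RE.fullmatch(text) is not None
```

Under this sketch, user@example is rejected because nothing follows the domain, and user.example.com is rejected because it has no @ symbol.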
Phone Number
What the system looks for
Recognized phone number formats that include separators and optional country or area codes.
Will match
123-456-7890
(555) 123-4567
+1 415-555-2671
Won’t match
1234567890 (no separators)
123-456 (too short)
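One way to picture the separator requirement is a pattern that accepts a 3-3-4 digit grouping only when separators are present. The sketch below is an assumption about what "recognized formats" might mean, limited to North American style numbers; PHONE_RE and looks_like_phone are hypothetical names.

```python
import re

# Hypothetical pattern: optional "+<country code>" prefix, then a
# 3-3-4 digit grouping in which every group is followed by a separator.
PHONE_RE = re.compile(
    r"(?:\+[0-9]{1,3}[ .-])?"             # optional country code, e.g. "+1 "
    r"(?:\([0-9]{3}\) ?|[0-9]{3}[ .-])"   # area code: "(555) " or "123-"
    r"[0-9]{3}[ .-][0-9]{4}"              # local number: "123-4567"
)

def looks_like_phone(text: str) -> bool:
    """Return True when the whole string is a separated phone number."""
    return PHONE_RE.fullmatch(text) is not None
```

Because each digit group must be followed by a separator, a bare run such as 1234567890 does not match.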
Address
What the system looks for
Street addresses with clear structural indicators.
Required signals
A street-type keyword (e.g. Street, Road, Avenue, Lane) or a unit prefix (e.g. Apt, Suite, Flat)
A valid postal code (US ZIP or UK postcode)
Will match
123 Main Street, New York, NY 10001
Flat 3, 78 Victoria Road, Edinburgh, EH1 2JW
PO Box 123, London, AA1 1AA
Won’t match
123 Main (missing street type and postal code)
Order #12345 (numbers without address context)
US Social Security Number (SSN)
What the system looks for
A valid SSN format with correct digit groupings and known validation rules.
Will match
123-45-6789
Won’t match
123456789 (missing dashes)
123-45-67890 (incorrect number of digits)
000-00-0000 (invalid values)
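The format and validation checks above can be sketched as a single pattern. The article does not list the exact validation rules, so the ones encoded here are assumptions based on commonly cited SSN constraints (area not 000, 666, or 900-999; group not 00; serial not 0000). SSN_RE and looks_like_ssn are hypothetical names.

```python
import re

# Assumed rules: area is not 000, 666, or 900-999; group is not 00;
# serial is not 0000. Dashes are required between the three groups.
SSN_RE = re.compile(
    r"(?!000|666|9[0-9]{2})[0-9]{3}-(?!00)[0-9]{2}-(?!0000)[0-9]{4}"
)

def looks_like_ssn(text: str) -> bool:
    """Return True when the whole string is a plausible SSN."""
    return SSN_RE.fullmatch(text) is not None
```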
Age
What the system looks for
Age expressions that combine numbers with time units or age-specific language.
Will match
Age: 25 years
30 years old
6 months
Won’t match
25 (number without age context)
Year 25 (not age-related)
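The idea of requiring age context around a number can be illustrated with a small pattern. This is a sketch under assumptions: the list of time units and the "Age:" label handling are guesses, and AGE_RE and looks_like_age are hypothetical names.

```python
import re

# Hypothetical pattern: either an "Age" label followed by a number,
# or a number followed by a time unit and an optional "old".
AGE_RE = re.compile(
    r"age[:\s]+[0-9]{1,3}(?:\s*years?)?"
    r"|[0-9]{1,3}\s+(?:years?|months?|weeks?|days?)(?:\s+old)?",
    re.IGNORECASE,
)

def looks_like_age(text: str) -> bool:
    """Return True when the whole string is an age expression."""
    return AGE_RE.fullmatch(text) is not None
```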
Gender
What the system looks for
Explicit gender terms or gender labels with context.
Will match
male, female, non-binary, gender-fluid, transgender
Gender: M
Gender: F
Won’t match
M or F on their own (no gender context)
UK National Insurance Number
What the system looks for
A valid UK National Insurance number format with correct prefixes and suffixes.
Will match
AB 12 34 56 C
Won’t match
Invalid prefixes or formats that do not meet NI standards
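As an illustration of prefix and suffix checking, the sketch below encodes the commonly published National Insurance number rules: certain letters are not used in the prefix, a handful of prefix pairs are excluded, and the suffix is A to D. Whether the redaction system applies exactly these rules is an assumption; NINO_RE and looks_like_nino are hypothetical names.

```python
import re

# Assumed rules: first letter not D, F, I, Q, U, V; second letter not
# D, F, I, O, Q, U, V; prefixes BG, GB, KN, NK, NT, TN, ZZ excluded;
# six digits (optionally spaced in pairs); suffix A, B, C, or D.
NINO_RE = re.compile(
    r"(?!BG|GB|KN|NK|NT|TN|ZZ)"
    r"[A-CEGHJ-PR-TW-Z][A-CEGHJ-NPR-TW-Z] ?"
    r"(?:[0-9]{2} ?){3}[A-D]"
)

def looks_like_nino(text: str) -> bool:
    """Return True when the whole string is a plausible NI number."""
    return NINO_RE.fullmatch(text) is not None
```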
US ZIP Code
What the system looks for
A ZIP code paired with a US state abbreviation.
Will match
NY 10001
CA 90210-1234
Won’t match
10001 (missing state)
12345 (no state context)
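The pairing of a state abbreviation with a ZIP code can be sketched as a pattern plus a lookup set. This is an illustrative assumption about how the pairing might be checked; US_STATES, ZIP_WITH_STATE_RE, and find_zip_codes are hypothetical names, and the set covers the 50 states plus DC only.

```python
import re

# Two-letter abbreviations for the 50 states plus DC.
US_STATES = set(
    "AL AK AZ AR CA CO CT DE FL GA HI ID IL IN IA KS KY LA ME MD MA MI MN "
    "MS MO MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC SD TN TX UT VT VA "
    "WA WV WI WY DC".split()
)

# Two capital letters, a space, then a 5-digit ZIP or ZIP+4.
ZIP_WITH_STATE_RE = re.compile(r"\b([A-Z]{2}) ([0-9]{5}(?:-[0-9]{4})?)\b")

def find_zip_codes(text: str) -> list[str]:
    """Return every 'STATE ZIP' pair whose state abbreviation is known."""
    return [
        m.group(0)
        for m in ZIP_WITH_STATE_RE.finditer(text)
        if m.group(1) in US_STATES
    ]
```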
Financial and Business Information
Credit Card Number
What the system looks for
Valid credit card number formats, 15 to 16 digits long.
Will match
4532-0151-1283-0366 (Visa)
6011-0009-9013-9424 (Discover)
Won’t match
1234567812345678 (fails validation)
Random numeric sequences with invalid lengths or formatting
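The article does not say which validation is applied. The Luhn checksum is the usual check for card numbers, so the sketch below assumes it; luhn_valid is a hypothetical name, and the 15 to 16 digit limit is taken from the description above.

```python
def luhn_valid(number: str) -> bool:
    """Return True when a 15- or 16-digit card number passes the Luhn check."""
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) not in (15, 16):
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit and
    # subtract 9 when the doubled value exceeds 9.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0
```

Under this assumption, both example numbers listed under "Will match" pass the checksum, while 1234567812345678 does not.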
Account Number
What the system looks for
Standalone numeric sequences between 8 and 17 digits.
Will match
123456789012345
Won’t match
Numbers embedded in URLs
Numbers with prefixes like ACCT-
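"Standalone" can be read as "the whole token is digits". The sketch below takes that reading as an assumption: it splits text on whitespace, trims surrounding punctuation, and keeps tokens made only of 8 to 17 digits. find_account_numbers is a hypothetical name.

```python
import re

def find_account_numbers(text: str) -> list[str]:
    """Return whitespace-separated tokens that are 8 to 17 digits long."""
    hits = []
    for token in text.split():
        token = token.strip(".,;:()")  # drop surrounding punctuation
        if re.fullmatch(r"[0-9]{8,17}", token):
            hits.append(token)
    return hits
```

Because a prefixed value or a URL is a single token that contains non-digit characters, digits inside it are never reported.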
Amount
What the system looks for
Monetary values with a currency symbol or currency code.
Will match
$1,500.00
EUR 25.50
-£100.00
Won’t match
1500 (no currency context)
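The currency requirement can be sketched as a pattern that insists on a symbol or a code before the number. The symbols and codes listed here are an illustrative subset chosen for the example, not the system's supported list; AMOUNT_RE and is_amount are hypothetical names.

```python
import re

# Hypothetical pattern: optional minus sign, then a currency symbol or
# a three-letter code followed by a space, then the number itself.
AMOUNT_RE = re.compile(
    r"-?(?:[$£€] ?|(?:USD|EUR|GBP) )[0-9][0-9,]*(?:\.[0-9]+)?"
)

def is_amount(text: str) -> bool:
    """Return True when the whole string is a monetary amount."""
    return AMOUNT_RE.fullmatch(text) is not None
```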
IBAN
What the system looks for
International Bank Account Numbers with valid country codes and lengths.
Will match
GB82 WEST 1234 5698 7654 32
Won’t match
Incorrect lengths or invalid formats
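IBAN validity is conventionally checked with a country-specific length and the mod-97 checksum. The sketch below assumes the system does something similar; IBAN_LENGTHS is a deliberately small, illustrative subset of countries, and iban_valid is a hypothetical name.

```python
import re

# Illustrative subset of country code -> total IBAN length.
IBAN_LENGTHS = {"GB": 22, "DE": 22, "FR": 27}

def iban_valid(iban: str) -> bool:
    """Check the country length and the mod-97 checksum of an IBAN."""
    compact = iban.replace(" ", "").upper()
    if not re.fullmatch(r"[A-Z]{2}[0-9]{2}[A-Z0-9]+", compact):
        return False
    if IBAN_LENGTHS.get(compact[:2]) != len(compact):
        return False
    # Move the first four characters to the end, then map letters to
    # numbers (A=10 ... Z=35) and test the remainder modulo 97.
    rearranged = compact[4:] + compact[:4]
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1
```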
SWIFT / BIC Code
What the system looks for
Valid 8- or 11-character bank identifier codes.
Will match
BOFAUS3N
DEUTDEFF500
Won’t match
Invalid formats or unsupported country codes
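The 8- or 11-character layout can be sketched as a pattern: four letters for the bank, two for the country, two characters for the location, and an optional three-character branch code. This is a format-only sketch; the list of supported country codes is not given in this article, so no country check is attempted. BIC_RE and looks_like_bic are hypothetical names.

```python
import re

# Bank (4 letters), country (2 letters), location (2 alphanumerics),
# optional branch (3 alphanumerics).
BIC_RE = re.compile(r"[A-Z]{4}[A-Z]{2}[A-Z0-9]{2}(?:[A-Z0-9]{3})?")

def looks_like_bic(text: str) -> bool:
    """Return True when the whole string has the 8- or 11-character layout."""
    return BIC_RE.fullmatch(text) is not None
```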
Other Sensitive Data
Date
What the system looks for
Commonly used date formats.
Will match
12/25/2023
25-Dec-2023
January 15, 2024
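The three example formats can be recognized by trying a list of format strings in turn. This sketch assumes month-first ordering for the slash format, as the first example suggests, and relies on English month names in the default locale; DATE_FORMATS and parse_date are hypothetical names, and the real system may accept more formats.

```python
from datetime import date, datetime
from typing import Optional

# One format string per example above (an assumed, incomplete list).
DATE_FORMATS = ("%m/%d/%Y", "%d-%b-%Y", "%B %d, %Y")

def parse_date(text: str) -> Optional[date]:
    """Return the parsed date, or None when no known format fits."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None
```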
Time
What the system looks for
Time values with optional AM/PM or timezone indicators.
Will match
14:30:00
2:30 PM
14:30 UTC
URLs
What the system looks for
Fully qualified URLs that include a protocol.
Will match
https://example.com
https://api.example.com/path
Won’t match
www.example.com (missing protocol)
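The protocol requirement can be illustrated with Python's standard URL parser: a string without a scheme yields no host component. Restricting the scheme to http and https is an assumption, and is_fully_qualified_url is a hypothetical name.

```python
from urllib.parse import urlsplit

def is_fully_qualified_url(text: str) -> bool:
    """Return True when the string has an http(s) scheme and a host."""
    parts = urlsplit(text)
    # Without "scheme://", urlsplit puts everything in the path and
    # leaves both scheme and netloc empty.
    return parts.scheme in ("http", "https") and bool(parts.netloc)
```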
IP Addresses
What the system looks for
Valid IPv4 or IPv6 address formats.
Will match
192.168.1.1
2001:0db8:85a3:0000:0000:8a2e:0370:7334
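Address validity for both IPv4 and IPv6 can be checked with Python's standard ipaddress module, shown here as a sketch of the idea rather than the system's own logic. is_ip_address is a hypothetical name.

```python
import ipaddress

def is_ip_address(text: str) -> bool:
    """Return True when the string parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(text)
    except ValueError:
        return False
    return True
```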
Six- and Eight-Digit Numbers
What the system looks for
Exact numeric sequences of six or eight digits.
Will match
123456
12345678
Won’t match
Numbers embedded in URLs
Numbers with invalid separators
