Sensitive Data Discovery

Sensitive Data Discovery scans your managed devices for files containing personally identifiable information (PII), payment card data (PCI), protected health information (PHI), credentials, and financial records. The system uses pattern-based detection to identify sensitive content, assigns risk scores and confidence levels to each finding, and provides remediation workflows to encrypt, quarantine, or securely delete the offending files. A centralized dashboard tracks open findings, risk distribution, and remediation progress across your fleet.

Scans can be triggered manually for specific devices or scheduled through scan policies. Each scan runs on the agent side, and results are reported back to the API where findings are stored, deduplicated across scans, and made available for reporting and remediation.

Key Concepts

Data Classification Types

| Type | Description |
|------|-------------|
| pii | Personally identifiable information: names, addresses, phone numbers, email addresses, government IDs (SSN, passport numbers) |
| pci | Payment card industry data: credit/debit card numbers, CVVs, cardholder data |
| phi | Protected health information: medical records, insurance IDs, health conditions, prescription data |
| credential | Stored credentials: API keys, passwords, tokens, private keys, connection strings |
| financial | Financial records: bank account numbers, routing numbers, tax documents, financial statements |
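To make "pattern-based detection" concrete, here is a minimal sketch of how a pci-style pattern could work in principle: a regular expression proposes candidates and a Luhn checksum filters out random digit runs. The regular expression, the function names, and the use of a Luhn check are illustrative assumptions; the product's actual patterns are not documented here.

```python
import re

# Hypothetical pattern: 13 to 19 digits, optionally separated by spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:  # double every second digit, counting from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def count_card_matches(text: str) -> int:
    """Count candidate card numbers in text that also pass the Luhn check."""
    count = 0
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            count += 1
    return count


# The first number is a well-known test card; the second fails the checksum.
sample = "card: 4111 1111 1111 1111, order id: 1234 5678 9012 3456"
print(count_card_matches(sample))
```

A count like this corresponds loosely to the matchCount field on a finding; how the product derives confidence from it is not specified.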

Risk Levels

| Risk | Description |
|------|-------------|
| low | Finding has limited exposure potential: low match count, non-sensitive file location, or low confidence |
| medium | Finding warrants review: moderate match count or sensitive file type |
| high | Finding requires attention: multiple matches in an accessible location or high-confidence detection |
| critical | Finding demands immediate action: credentials in plaintext, unprotected PCI data, or high match count with high confidence |

Finding Statuses

| Status | Description |
|--------|-------------|
| open | Active finding that has not been addressed |
| remediated | Finding has been remediated (encrypted, quarantined, deleted, or manually marked) |
| accepted | Risk has been explicitly accepted by an administrator |
| false_positive | Finding has been determined to be a false positive |

Remediation Actions

| Action | Destructive | Description |
|--------|-------------|-------------|
| encrypt | Yes | Encrypt the file in place using the configured encryption key |
| quarantine | Yes | Move the file to a quarantine directory on the device |
| secure_delete | Yes | Permanently and securely delete the file |
| accept_risk | No | Acknowledge the finding and accept the risk |
| false_positive | No | Mark the finding as a false positive |
| mark_remediated | No | Manually mark the finding as remediated |

Scan Policies

Scan policies define the detection configuration, file scope, and schedule for sensitive data scans. Each policy is scoped to an organization and specifies which data types to detect, which paths to scan, and how often to run.

Creating a Policy

POST /sensitive-data/policies
Content-Type: application/json
Authorization: Bearer <token>

{
  "orgId": "uuid",
  "name": "Weekly PII & Credential Scan",
  "scope": {
    "includePaths": ["/Users", "/home", "C:\\Users"],
    "excludePaths": ["/Users/*/Library", "C:\\Windows"],
    "fileTypes": [".txt", ".csv", ".xlsx", ".json", ".env", ".conf"],
    "maxFileSizeBytes": 104857600,
    "workers": 4,
    "timeoutSeconds": 600
  },
  "detectionClasses": ["pii", "credential"],
  "schedule": {
    "enabled": true,
    "type": "interval",
    "intervalMinutes": 10080,
    "timezone": "America/Chicago"
  },
  "isActive": true
}

Policy Fields

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| orgId | UUID | No | Organization ID. Auto-resolved for org-scoped tokens |
| name | string | Yes | Policy name (max 200 chars) |
| scope | object | No | Scan scope configuration (see below) |
| detectionClasses | string[] | Yes | Data types to detect: pii, pci, phi, credential, financial (1 to 5 values) |
| schedule | object | No | Scan schedule configuration (see below) |
| isActive | boolean | No | Whether the policy is active (default true) |

Scope Configuration

| Field | Type | Description |
|-------|------|-------------|
| includePaths | string[] | Paths to scan (max 256, each up to 2,048 chars) |
| excludePaths | string[] | Paths to exclude from scanning (max 256) |
| fileTypes | string[] | File extensions to scan (max 128, each up to 32 chars) |
| maxFileSizeBytes | integer | Maximum file size to scan (1 KB to 1 GB) |
| workers | integer | Concurrent scan workers (1-32) |
| timeoutSeconds | integer | Scan timeout in seconds (5-1,800) |
| suppressPaths | string[] | Paths to suppress in findings (max 256) |
| suppressPatternIds | string[] | Pattern IDs to suppress (max 200) |
| suppressFilePathRegex | string[] | Regex patterns for file paths to suppress (max 80) |
| ruleToggles | object | Per-rule enable/disable overrides (key: rule ID, value: boolean) |

Schedule Configuration

| Field | Type | Description |
|-------|------|-------------|
| enabled | boolean | Whether the schedule is active (default true) |
| type | string | Schedule type: manual, interval, or cron |
| intervalMinutes | integer | Scan interval in minutes (5 to 10,080; 10,080 minutes is one week) |
| cron | string | Cron expression (when type is cron) |
| timezone | string | Timezone for scheduled scans |
| deviceIds | UUID[] | Specific devices to scan (max 1,000). If omitted, scans all devices in the org |
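The limits in the tables above can be checked on the client before a policy is submitted. The following is a minimal sketch, not the API's own validation: the function name is hypothetical, 1 KB and 1 GB are assumed to mean 1,024 and 1,024³ bytes, and treating intervalMinutes as required for interval schedules is also an assumption.

```python
VALID_CLASSES = {"pii", "pci", "phi", "credential", "financial"}


def validate_policy(policy: dict) -> list[str]:
    """Return a list of problems; an empty list means the payload looks valid."""
    errors = []

    name = policy.get("name")
    if not isinstance(name, str) or not name or len(name) > 200:
        errors.append("name is required and must be at most 200 characters")

    classes = policy.get("detectionClasses") or []
    if not 1 <= len(classes) <= 5 or not set(classes) <= VALID_CLASSES:
        errors.append("detectionClasses must hold 1 to 5 known data types")

    scope = policy.get("scope", {})
    # (field, minimum, maximum) from the Scope Configuration table.
    # Binary units are assumed for the file size bounds.
    for field, low, high in [
        ("maxFileSizeBytes", 1024, 1024 ** 3),
        ("workers", 1, 32),
        ("timeoutSeconds", 5, 1800),
    ]:
        if field in scope and not low <= scope[field] <= high:
            errors.append(f"scope.{field} must be between {low} and {high}")

    schedule = policy.get("schedule", {})
    if schedule.get("type") == "interval":
        minutes = schedule.get("intervalMinutes")
        if not isinstance(minutes, int) or not 5 <= minutes <= 10080:
            errors.append("schedule.intervalMinutes must be between 5 and 10080")
    if schedule.get("type") == "cron" and not schedule.get("cron"):
        errors.append("schedule.cron is required when type is cron")

    return errors
```

Running this against the example request in Creating a Policy should return an empty list, since every value there falls inside the documented ranges.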

Updating a Policy

PUT /sensitive-data/policies/:id
Content-Type: application/json

{
  "name": "Updated Scan Policy",
  "detectionClasses": ["pii", "credential", "pci"],
  "isActive": true
}

Deleting a Policy

DELETE /sensitive-data/policies/:id

Running Scans

Triggering a Manual Scan

POST /sensitive-data/scan
Content-Type: application/json
Authorization: Bearer <token>

{
  "deviceIds": ["device-uuid-1", "device-uuid-2"],
  "policyId": "policy-uuid",
  "detectionClasses": ["pii", "credential"],
  "scope": {
    "includePaths": ["/home"],
    "fileTypes": [".txt", ".csv", ".env"],
    "maxFileSizeBytes": 52428800
  },
  "idempotencyKey": "scan-2026-02-15-batch-1"
}

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| deviceIds | UUID[] | Yes | Devices to scan (1-200) |
| policyId | UUID | No | Use an existing policy’s scope and detection classes |
| scope | object | No | Override the scan scope (takes precedence over policy scope) |
| detectionClasses | string[] | No | Override detection classes (takes precedence over policy) |
| idempotencyKey | string | No | Client-provided idempotency key (8-128 chars). Also accepted via Idempotency-Key header |

The API creates one scan record per device and enqueues each scan through BullMQ. Decommissioned devices are excluded. The response includes the created scans, the count of successfully queued jobs, and any skipped device IDs.
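Because a single request accepts at most 200 device IDs, scanning a larger fleet means splitting the list into batches. The sketch below builds one request body per batch; the function name and the idempotency key format (a label plus a short digest of the batch) are hypothetical choices, not something the API prescribes.

```python
import hashlib


def build_scan_requests(device_ids, policy_id, run_label, batch_size=200):
    """Split device IDs into request bodies of at most batch_size devices each."""
    if not 1 <= batch_size <= 200:
        raise ValueError("batch_size must be between 1 and 200")
    requests = []
    for start in range(0, len(device_ids), batch_size):
        batch = device_ids[start:start + batch_size]
        # Hypothetical key format: label plus a short digest of the batch contents,
        # so retrying the same batch reuses the same key.
        digest = hashlib.sha256(",".join(batch).encode("utf-8")).hexdigest()[:16]
        key = f"{run_label}-{digest}"
        if not 8 <= len(key) <= 128:
            raise ValueError("idempotency key must be 8 to 128 characters")
        requests.append(
            {"deviceIds": batch, "policyId": policy_id, "idempotencyKey": key}
        )
    return requests
```

Each returned dictionary can be sent as the JSON body of POST /sensitive-data/scan with whatever HTTP client you already use.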

Scan Statuses

| Status | Description |
|--------|-------------|
| queued | Scan created and enqueued for the agent |
| running | Agent is actively scanning the device |
| completed | Scan finished; results include a findings summary |
| failed | Scan encountered an error |

Listing Recent Scans

GET /sensitive-data/scans?limit=50

Returns recent scans ordered by creation time, with device hostname, policy reference, status, and timing information.

Getting Scan Details

GET /sensitive-data/scans/:id

Returns full scan details, including a findings summary with counts by risk level and status. If the scan record already stores pre-computed summary counters in its summary JSONB field, those are returned directly; otherwise, findings are aggregated on the fly.
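The fallback described above can be sketched as follows. The byRisk and byStatus key names and the function name are assumptions made for illustration; the actual shape of the stored summary is not documented here.

```python
from collections import Counter


def findings_summary(scan: dict, findings: list[dict]) -> dict:
    """Return stored counters when present, otherwise aggregate the findings."""
    stored = scan.get("summary")
    # Hypothetical key names for the pre-computed counters.
    if stored and "byRisk" in stored and "byStatus" in stored:
        return stored
    return {
        "byRisk": dict(Counter(f["risk"] for f in findings)),
        "byStatus": dict(Counter(f["status"] for f in findings)),
    }
```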

Findings and Reports

Querying Findings

GET /sensitive-data/report?status=open&risk=critical&dataType=credential&page=1&limit=50

Report Query Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| status | string | Filter by status: open, remediated, accepted, false_positive |
| risk | string | Filter by risk level: low, medium, high, critical |
| dataType | string | Filter by data type: pii, pci, phi, credential, financial |
| deviceId | UUID | Filter findings for a specific device |
| scanId | UUID | Filter findings from a specific scan |
| page | integer | Page number for pagination |
| limit | integer | Results per page (default 200) |
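A small helper can assemble the query string and reject filter values the table does not list. This is a sketch only: the function name and the base URL in the usage line are placeholders, not part of the API.

```python
from urllib.parse import urlencode

# Allowed values per filter, copied from the Report Query Parameters table.
ALLOWED_VALUES = {
    "status": {"open", "remediated", "accepted", "false_positive"},
    "risk": {"low", "medium", "high", "critical"},
    "dataType": {"pii", "pci", "phi", "credential", "financial"},
}
FREE_FORM = {"deviceId", "scanId"}


def report_url(base_url: str, page: int = 1, limit: int = 200, **filters) -> str:
    """Build a /sensitive-data/report URL from validated filters."""
    for name, value in filters.items():
        if name in ALLOWED_VALUES:
            if value not in ALLOWED_VALUES[name]:
                raise ValueError(f"unsupported {name}: {value}")
        elif name not in FREE_FORM:
            raise ValueError(f"unknown filter: {name}")
    params = {**filters, "page": page, "limit": limit}
    return f"{base_url}/sensitive-data/report?{urlencode(params)}"


# Placeholder host; substitute your own API base URL.
print(report_url("https://api.example.com", status="open", risk="critical", limit=50))
```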

Finding Response Fields

| Field | Type | Description |
|-------|------|-------------|
| id | UUID | Unique finding identifier |
| orgId | UUID | Organization ID |
| deviceId | UUID | Device where the file was found |
| deviceName | string | Hostname of the device |
| scanId | UUID | Scan that discovered the finding |
| filePath | string | Full path to the file containing sensitive data |
| dataType | string | Classification type: pii, pci, phi, credential, financial |
| patternId | string | Identifier of the detection pattern that matched |
| matchCount | integer | Number of matches found in the file |
| risk | string | Risk level: low, medium, high, critical |
| confidence | float | Detection confidence score (0.0 to 1.0) |
| fileOwner | string | File owner on the device |
| fileModifiedAt | ISO 8601 | When the file was last modified |
| firstSeenAt | ISO 8601 | When this finding was first detected |
| lastSeenAt | ISO 8601 | When this finding was last confirmed |
| occurrenceCount | integer | Number of scans that have found this file |
| status | string | Current status: open, remediated, accepted, false_positive |
| remediationAction | string | Action taken (if any) |
| remediatedAt | ISO 8601 | When remediation occurred |
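The firstSeenAt, lastSeenAt, and occurrenceCount fields reflect deduplication across scans. One way such an upsert could behave is sketched below. The deduplication key (device, file path, pattern) and the function name are assumptions; the key the API actually uses is not documented here.

```python
def merge_finding(store: dict, finding: dict, seen_at: str) -> dict:
    """Insert a new finding or refresh an existing one with the same key."""
    # Assumed deduplication key; the real key may differ.
    key = (finding["deviceId"], finding["filePath"], finding["patternId"])
    existing = store.get(key)
    if existing is None:
        existing = {
            **finding,
            "firstSeenAt": seen_at,
            "lastSeenAt": seen_at,
            "occurrenceCount": 1,
            "status": "open",
        }
        store[key] = existing
    else:
        # Same file and pattern seen again: refresh the counters, keep firstSeenAt.
        existing["lastSeenAt"] = seen_at
        existing["occurrenceCount"] += 1
        existing["matchCount"] = finding["matchCount"]
    return existing
```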

Dashboard

The dashboard endpoint aggregates all findings data into a single response for the sensitive data overview:

GET /sensitive-data/dashboard

Dashboard Response

{
  "data": {
    "totals": {
      "findings": 1250,
      "open": 842,
      "criticalOpen": 23,
      "remediated24h": 45,
      "averageOpenAgeHours": 168.5
    },
    "byDataType": {
      "pii": 520,
      "credential": 380,
      "pci": 200,
      "phi": 100,
      "financial": 50
    },
    "byRisk": {
      "low": 600,
      "medium": 350,
      "high": 200,
      "critical": 100
    }
  }
}

| Field | Description |
|-------|-------------|
| totals.findings | Total number of findings across all statuses |
| totals.open | Number of findings in open status |
| totals.criticalOpen | Number of critical-risk findings that are still open |
| totals.remediated24h | Number of findings remediated in the last 24 hours |
| totals.averageOpenAgeHours | Average age of open findings in hours |
| byDataType | Finding count broken down by data classification |
| byRisk | Finding count broken down by risk level |
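To show how the totals relate to individual findings, here is a sketch that derives them from a list of finding records. It assumes timestamps are already parsed into timezone-aware datetime objects and that the age of an open finding is measured from lastSeenAt, as the Troubleshooting section describes; the function name and the rounding to one decimal are illustrative choices.

```python
from datetime import datetime, timedelta, timezone


def dashboard_totals(findings: list[dict], now: datetime) -> dict:
    """Compute the totals block from finding records."""
    open_findings = [f for f in findings if f["status"] == "open"]
    cutoff = now - timedelta(hours=24)
    remediated_24h = sum(
        1
        for f in findings
        if f["status"] == "remediated"
        and f.get("remediatedAt") is not None
        and f["remediatedAt"] >= cutoff
    )
    # Age in hours for each open finding, measured from lastSeenAt.
    ages = [(now - f["lastSeenAt"]).total_seconds() / 3600 for f in open_findings]
    return {
        "findings": len(findings),
        "open": len(open_findings),
        "criticalOpen": sum(1 for f in open_findings if f["risk"] == "critical"),
        "remediated24h": remediated_24h,
        "averageOpenAgeHours": round(sum(ages) / len(ages), 1) if ages else 0.0,
    }
```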

Remediation

Remediating Findings

POST /sensitive-data/remediate
Content-Type: application/json
Authorization: Bearer <token>

{
  "findingIds": ["finding-uuid-1", "finding-uuid-2"],
  "action": "quarantine",
  "confirm": true,
  "quarantineDir": "/var/lib/breeze/quarantine/sensitive",
  "dryRun": false
}

Remediation Request Fields

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| findingIds | UUID[] | Yes | Findings to remediate (1-250) |
| action | string | Yes | encrypt, quarantine, secure_delete, accept_risk, false_positive, mark_remediated |
| confirm | boolean | Conditional | Required for destructive actions (encrypt, quarantine, secure_delete) |
| dryRun | boolean | No | Preview which findings would be affected without making changes (default false) |
| secondApprovalToken | string | Conditional | Required for secure_delete when second approval is enabled |
| encryptionKeyRef | string | No | Reference to the encryption key (for encrypt action) |
| encryptionKeyVersion | string | No | Version of the encryption key |
| quarantineDir | string | No | Custom quarantine directory path on the device |

How Remediation Works

Non-destructive actions (accept_risk, false_positive, mark_remediated) update the finding status directly in the database. No command is sent to the agent.
Destructive actions (encrypt, quarantine, secure_delete) queue a command to the target device’s agent via the command queue. Each finding becomes a separate command targeting the specific file path.
Dry run mode returns the list of eligible findings and their file paths without making any changes, allowing you to preview the impact before committing.
Second approval can be required for secure_delete operations by setting the SENSITIVE_DATA_REQUIRE_SECOND_APPROVAL environment variable. When enabled, a valid secondApprovalToken must be provided.
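The rules above can be mirrored in a client-side pre-check so that obviously invalid requests never reach the API. This is a sketch under stated assumptions: the function names are hypothetical, and whether confirm is still required when dryRun is true is not specified, so the sketch requires it in both cases.

```python
DESTRUCTIVE = {"encrypt", "quarantine", "secure_delete"}
NON_DESTRUCTIVE = {"accept_risk", "false_positive", "mark_remediated"}


def check_remediation_request(body: dict, second_approval_required: bool = False) -> list[str]:
    """Return a list of problems; an empty list means the request looks valid."""
    errors = []
    ids = body.get("findingIds") or []
    if not 1 <= len(ids) <= 250:
        errors.append("findingIds must contain 1 to 250 IDs")

    action = body.get("action")
    if action not in DESTRUCTIVE | NON_DESTRUCTIVE:
        errors.append(f"unknown action: {action}")
        return errors

    if action in DESTRUCTIVE and body.get("confirm") is not True:
        errors.append("confirm must be true for destructive actions")
    if (
        action == "secure_delete"
        and second_approval_required
        and not body.get("secondApprovalToken")
    ):
        errors.append("secondApprovalToken is required for secure_delete")
    return errors


def sends_agent_command(action: str) -> bool:
    """Destructive actions queue an agent command; the rest only update status."""
    return action in DESTRUCTIVE
```

Pass second_approval_required=True when your deployment has SENSITIVE_DATA_REQUIRE_SECOND_APPROVAL enabled.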

Remediation Response

For destructive actions, the response includes queued commands and any failures:

{
  "data": {
    "queued": [
      { "findingId": "uuid", "commandId": "uuid" }
    ],
    "failed": [
      { "findingId": "uuid", "error": "Device is offline" }
    ],
    "updated": 5
  }
}
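Entries in the failed array were not delivered to an agent, so a client may want to resubmit just those findings once the device is reachable again. The sketch below builds such a follow-up request. Treating a fresh POST to /sensitive-data/remediate as the way to re-queue is an assumption, and the function name is hypothetical.

```python
from typing import Optional


def build_retry_request(original: dict, response: dict) -> Optional[dict]:
    """Return a request body covering only the findings that failed to queue."""
    failed_ids = [item["findingId"] for item in response["data"].get("failed", [])]
    if not failed_ids:
        return None  # everything was queued; nothing to retry
    # Reuse the original action and options, narrowed to the failed findings.
    return {**original, "findingIds": failed_ids}
```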

API Reference

Scans

| Method | Path | Description |
|--------|------|-------------|
| POST | /sensitive-data/scan | Trigger a manual scan on one or more devices |
| GET | /sensitive-data/scans | List recent scans with status and summary |
| GET | /sensitive-data/scans/:id | Get scan details with findings breakdown |

Findings and Reports

| Method | Path | Description |
|--------|------|-------------|
| GET | /sensitive-data/report | Query findings with filtering and pagination |
| GET | /sensitive-data/dashboard | Aggregated dashboard with totals, risk, and data type distribution |

Remediation

| Method | Path | Description |
|--------|------|-------------|
| POST | /sensitive-data/remediate | Remediate findings (destructive or non-destructive) |

Policies

| Method | Path | Description |
|--------|------|-------------|
| GET | /sensitive-data/policies | List all scan policies for the organization |
| POST | /sensitive-data/policies | Create a new scan policy |
| PUT | /sensitive-data/policies/:id | Update an existing policy |
| DELETE | /sensitive-data/policies/:id | Delete a policy |

Troubleshooting

Scan stuck in queued status. The scan was created but the agent has not started processing it. Verify that the target device is online and the agent is connected. Check that BullMQ workers are running and processing the sensitive data scan queue. If the scan enqueue failed, the creation response includes an enqueueFailures count greater than zero.

No findings returned after a completed scan. The scan completed but did not detect any sensitive data matching the configured detection classes and scope. Verify the detectionClasses include the types you expect to find. Check the scope.includePaths to ensure the correct directories are being scanned. Review scope.excludePaths and suppressPaths to make sure the target files are not being excluded. Also check scope.maxFileSizeBytes — files larger than the limit are skipped.

Duplicate scan created despite idempotency key. Idempotency checks match on both the key and the request fingerprint (a SHA-256 hash of device IDs, policy, scope, and detection classes). If any of these values differ between requests, the fingerprint will not match and a new scan will be created. Idempotency protection also only applies to scans created within the last 24 hours.
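To see why a small difference between two requests produces a new scan, consider this sketch of a request fingerprint. It hashes the same four inputs with SHA-256, but the canonical form (sorted lists, sorted keys, compact JSON) and the function name are assumptions; the server's actual serialization may differ, so these digests will not necessarily match the server's.

```python
import hashlib
import json


def request_fingerprint(device_ids, policy_id, scope, detection_classes) -> str:
    """Hash the request inputs into a hex SHA-256 digest."""
    # Assumed canonical form: order-insensitive lists and sorted object keys.
    canonical = json.dumps(
        {
            "deviceIds": sorted(device_ids),
            "policyId": policy_id,
            "scope": scope,
            "detectionClasses": sorted(detection_classes),
        },
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = request_fingerprint(["d1", "d2"], "p1", {"includePaths": ["/home"]}, ["pii"])
b = request_fingerprint(["d1", "d2"], "p1", {"includePaths": ["/srv"]}, ["pii"])
print(a == b)  # False: the scope differs, so the fingerprints differ
```

Reusing the same idempotency key with a changed scope, device list, policy, or detection class list therefore behaves like a brand-new request.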

Destructive remediation rejected with confirm=true error. Destructive actions (encrypt, quarantine, secure_delete) require confirm: true in the request body. If the secure_delete action is rejected despite confirmation, check whether the SENSITIVE_DATA_REQUIRE_SECOND_APPROVAL environment variable is enabled — if so, a valid secondApprovalToken must also be provided.

Remediation command failed to queue for a device. When a destructive remediation command fails to queue, the finding ID appears in the failed array of the response with an error message. Common causes include the device being offline or the command queue being unavailable. The finding’s remediationAction and remediationMetadata are still updated to reflect the attempted action, but the agent will not receive the command until it is re-queued.

Dashboard shows stale totals. The dashboard computes totals by scanning all findings in real time, so a large finding count can slow the response. The averageOpenAgeHours value is calculated from each open finding’s lastSeenAt timestamp. If scans are not running regularly, lastSeenAt is not refreshed and the age values may appear inflated.