
Playbooks

Playbooks are structured, multi-step remediation workflows that run against a single device. Each playbook defines an ordered sequence of steps — diagnose, act, wait, verify, rollback — that execute tools on the target device and evaluate the results. Playbooks codify your standard operating procedures so that incident response is consistent, auditable, and repeatable regardless of who triggers it.

Breeze ships with built-in playbooks for common scenarios (disk cleanup, service restart, memory pressure relief). Organizations can also create custom playbooks scoped to their own environment. Playbooks can be triggered manually by administrators, automatically by the AI assistant in response to alerts, or programmatically through the API.

Every execution is recorded with per-step timing, tool input/output, and pass/fail status. If a step fails, that step’s onFailure policy determines whether execution stops, continues, or triggers a rollback sequence.


Playbooks can be triggered in several ways from the Breeze dashboard.

The most common way to run a playbook is through the AI assistant during an investigation. When the AI identifies a remediation opportunity — for example, a device with high disk usage — it can suggest and execute the appropriate playbook. The AI handles variable substitution and monitors each step as it executes.

  1. Navigate to the device. Open the device detail page for the target device.

  2. Open Playbook History. Scroll to the Playbook History section on the device page. This shows all past playbook executions for the device.

  3. Trigger a playbook. The AI assistant or an alert-triggered automation initiates playbook execution on the device. You can also use the API to execute a specific playbook against the device (see API Reference).

When executing a playbook, you supply runtime variables that are substituted into step inputs. For example, the Service Restart playbook requires a serviceName variable to know which service to restart. The AI assistant fills these in automatically based on context; when triggering via API, pass them in the variables field.
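The substitution described above can be sketched in a few lines of Python. This is a minimal illustration, not Breeze's implementation: the `substitute` helper and the exact placeholder grammar (word characters only, no inner whitespace) are assumptions. Unknown placeholders are left untouched here so that a misspelled or wrongly cased variable name is easy to spot.

```python
import re
from typing import Any

# Matches {{name}} placeholders. The precise syntax Breeze accepts is an assumption.
_PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def substitute(value: Any, variables: dict[str, Any]) -> Any:
    """Recursively replace {{name}} placeholders in strings, dicts, and lists."""
    if isinstance(value, str):
        whole = _PLACEHOLDER.fullmatch(value)
        if whole and whole.group(1) in variables:
            # A string that is exactly one placeholder keeps the variable's type.
            return variables[whole.group(1)]
        # Otherwise substitute inline; unknown names are left as-is.
        return _PLACEHOLDER.sub(
            lambda m: str(variables.get(m.group(1), m.group(0))), value
        )
    if isinstance(value, dict):
        return {k: substitute(v, variables) for k, v in value.items()}
    if isinstance(value, list):
        return [substitute(v, variables) for v in value]
    return value


# Hypothetical step input for illustration only.
tool_input = {"serviceName": "{{serviceName}}", "deviceId": "{{deviceId}}"}
resolved = substitute(tool_input, {"serviceName": "nginx"})
```

In this sketch `deviceId` stays as `{{deviceId}}` because no matching key was supplied, and `{{ServiceName}}` would not match a `serviceName` key because names are compared case-sensitively.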


Every playbook execution is recorded and visible from the device’s Playbook History section.

Each execution row shows:

  • Playbook name and category — with a color-coded category badge (disk, service, memory, patch, security).
  • Status — Pending, Running, Waiting, Completed, Failed, Rolled Back, or Cancelled, each with a distinct icon and color.
  • Trigger source — who or what initiated the run (AI, manual, alert).
  • Duration — total execution time from start to completion.

Click an execution to expand it and view:

  • Step-by-step results — each step is listed with its index number, name, tool used, duration, and status (Done, Failed, Skipped, Pending, Running). Failed steps highlight in red for quick identification.
  • Error details — if the execution failed, the error message is displayed in a prominent banner.
  • Rollback indicator — if a rollback was triggered, a notice appears confirming it was executed.

Use the Refresh button to update the list while executions are still in progress.


Each step in a playbook has a type that determines its role in the workflow:

| Type | Purpose |
|------|---------|
| diagnose | Collect baseline data before remediation. Runs a tool and captures current state for later comparison. |
| act | Perform a remediation action. Runs a tool that modifies the device (restart a service, delete files, etc.). |
| wait | Pause execution for a specified number of seconds. Allows the system to stabilize before verification. |
| verify | Check that the remediation achieved the desired result. Evaluates a condition against tool output. |
| rollback | Undo changes if verification fails. Only executed when the failure policy is set to rollback. |

| Execution status | Meaning |
|------------------|---------|
| pending | Execution record created but no steps have started |
| running | At least one step is actively executing |
| waiting | Execution is paused on a wait step |
| completed | All steps finished successfully and verification passed |
| failed | A step failed and the failure policy stopped execution |
| rolled_back | A step failed and the rollback sequence was executed |
| cancelled | Execution was cancelled by a user or system before completion |

| Step status | Meaning |
|-------------|---------|
| pending | Step has not started yet |
| running | Step is currently executing |
| completed | Step finished successfully |
| failed | Step encountered an error |
| skipped | Step was bypassed (e.g., remaining steps after a failure with stop policy) |

Each step can define an onFailure behavior that controls what happens when it fails:

| Policy | Behavior |
|--------|----------|
| stop | Abort the playbook immediately. Remaining steps are marked skipped. This is the default. |
| continue | Log the failure and proceed to the next step. |
| rollback | Execute any rollback-type steps in reverse order, then mark the execution as rolled_back. |
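The three policies can be illustrated with a small step runner. This is a sketch under stated assumptions rather than the Breeze engine: `run_steps` and the `run_tool` callback are hypothetical, rollback steps that never run are reported as skipped, and a run that continued past a failure is still reported as completed (the documentation does not specify either behavior).

```python
from typing import Callable


def run_steps(steps: list[dict], run_tool: Callable[[dict], bool]) -> dict:
    """Run steps in order and apply each failing step's onFailure policy.

    run_tool is a hypothetical callback that executes one step and
    returns True on success.
    """
    main = [s for s in steps if s["type"] != "rollback"]
    rollbacks = [s for s in steps if s["type"] == "rollback"]
    # Assumption: rollback steps that are never executed count as skipped.
    results = {s["name"]: "skipped" for s in rollbacks}
    results.update({s["name"]: "pending" for s in main})

    for i, step in enumerate(main):
        if run_tool(step):
            results[step["name"]] = "completed"
            continue
        results[step["name"]] = "failed"
        policy = step.get("onFailure", "stop")  # stop is the documented default
        if policy == "continue":
            continue
        # stop and rollback both abandon the remaining regular steps.
        for later in main[i + 1:]:
            results[later["name"]] = "skipped"
        if policy == "rollback":
            # Rollback-type steps run in reverse order.
            for rb in reversed(rollbacks):
                results[rb["name"]] = "completed" if run_tool(rb) else "failed"
            return {"status": "rolled_back", "steps": results}
        return {"status": "failed", "steps": results}
    return {"status": "completed", "steps": results}


calls = []


def fake_tool(step: dict) -> bool:
    """Stand-in tool runner: every step succeeds except 'restart'."""
    calls.append(step["name"])
    return step["name"] != "restart"


outcome = run_steps(
    [
        {"type": "diagnose", "name": "check"},
        {"type": "act", "name": "restart", "onFailure": "rollback"},
        {"type": "verify", "name": "verify"},
        {"type": "rollback", "name": "undo-a"},
        {"type": "rollback", "name": "undo-b"},
    ],
    fake_tool,
)
```

Here the failing `restart` step skips `verify`, runs `undo-b` before `undo-a`, and leaves the execution in `rolled_back`.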


Breeze includes three built-in playbooks that are available to all organizations out of the box. They are updated automatically with each release.

Category: disk
Required permissions: devices:read, devices:execute

A five-step workflow that frees disk space safely:

  1. Capture baseline disk usage — Runs the analyze_disk_usage tool to collect current disk utilization and identify cleanup candidates.

  2. Preview safe cleanup candidates — Runs disk_cleanup in preview mode against safe categories: temporary files, browser cache, package cache, and trash. No files are deleted in this step.

  3. Execute cleanup — Runs disk_cleanup in execute mode, deleting the files identified in the preview step.

  4. Wait for filesystem metrics — Pauses for 30 seconds to allow disk metrics to refresh.

  5. Verify disk usage improved — Runs analyze_disk_usage again and checks that disk_usage_percent is below 90%. If verification fails, execution stops.

Category: service
Required permissions: devices:read, devices:execute
Trigger conditions: Can be linked to service_down alerts (auto-execute is disabled by default)

  1. Check current service status — Uses manage_services to read the service state before remediation.

  2. Restart target service — Restarts the service specified by the {{serviceName}} variable.

  3. Wait for service startup — Pauses for 10 seconds to allow the service to initialize.

  4. Verify service health — Checks that service_status equals running. If the service is not running after restart, execution stops.

Category: memory
Required permissions: devices:read, devices:execute

  1. Capture baseline memory metrics — Runs analyze_metrics to check RAM utilization over the last hour.

  2. Restart memory-heavy service — Restarts the service specified by {{serviceName}} to release held memory.

  3. Wait for memory stabilization — Pauses for 300 seconds (5 minutes) to allow memory metrics to settle.

  4. Verify memory improved — Checks that ram_usage_percent is below 85%. If memory usage is still elevated, execution stops.


Organizations can create custom playbooks scoped to their own environment. Custom playbooks have an orgId set to the creating organization and are only visible to users with access to that organization.

Custom playbooks are created through the API. Each playbook definition includes:

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| name | string | Yes | Human-readable playbook name (max 255 characters) |
| description | text | Yes | Detailed description of what the playbook does |
| steps | PlaybookStep[] | Yes | Ordered array of step definitions |
| triggerConditions | object | No | Conditions under which the playbook can be auto-triggered |
| category | string | No | Grouping category: disk, service, memory, patch, or security |
| requiredPermissions | string[] | No | Permissions the executing user must have (defaults to []) |
| isActive | boolean | No | Whether the playbook is available for execution (defaults to true) |

Trigger conditions control when a playbook can be automatically activated:

| Field | Type | Description |
|-------|------|-------------|
| alertTypes | string[] | Alert types that can trigger this playbook (e.g., ["service_down", "high_cpu"]) |
| deviceTags | string[] | Only trigger for devices with these tags |
| autoExecute | boolean | If true, execute automatically when conditions match. If false, require manual confirmation. |
| minSeverity | string | Minimum alert severity to trigger: low, medium, high, or critical |


Each step in a playbook is defined as a JSON object with the following fields:

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| type | string | Yes | Step type: diagnose, act, wait, verify, or rollback |
| name | string | Yes | Short name displayed in the execution log |
| description | string | Yes | Detailed explanation of what this step does |
| tool | string | No | Name of the tool to execute (not required for wait steps) |
| toolInput | object | No | Key-value pairs passed to the tool. Supports {{variable}} template placeholders. |
| waitSeconds | number | No | Number of seconds to pause (only used by wait steps) |
| verifyCondition | object | No | Condition to evaluate after tool execution (used by verify steps) |
| onFailure | string | No | Failure behavior for this step: stop, continue, or rollback |
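Before submitting a custom playbook, it can help to check each step against the field table locally. The sketch below is a hypothetical client-side helper, not part of the Breeze API. The per-type rules it enforces (wait steps need `waitSeconds`, other steps need a `tool`, verify steps need a `verifyCondition`) are inferred from the field descriptions and may be stricter or looser than the server's own validation.

```python
STEP_TYPES = {"diagnose", "act", "wait", "verify", "rollback"}
FAILURE_POLICIES = {"stop", "continue", "rollback"}


def validate_step(step: dict) -> list[str]:
    """Return a list of problems found in one step definition (empty if none)."""
    errors = []
    for field in ("type", "name", "description"):
        if not step.get(field):
            errors.append(f"missing required field: {field}")

    step_type = step.get("type")
    if step_type is not None and step_type not in STEP_TYPES:
        errors.append(f"unknown step type: {step_type}")

    if step_type == "wait":
        # Assumption: a wait step is only meaningful with a numeric waitSeconds.
        if not isinstance(step.get("waitSeconds"), (int, float)):
            errors.append("wait step needs a numeric waitSeconds")
    elif step_type in STEP_TYPES and not step.get("tool"):
        # Assumption: every non-wait step must name a tool.
        errors.append(f"{step_type} step needs a tool")

    if step_type == "verify" and "verifyCondition" not in step:
        errors.append("verify step needs a verifyCondition")

    if step.get("onFailure", "stop") not in FAILURE_POLICIES:
        errors.append(f"unknown onFailure policy: {step['onFailure']}")
    return errors


wait_step = {
    "type": "wait",
    "name": "Wait for service startup",
    "description": "Pause so the service can initialize",
    "waitSeconds": 10,
}
problems = validate_step(wait_step)  # empty list for this step
```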

Verify steps evaluate a condition against the tool output to determine success:

| Field | Type | Description |
|-------|------|-------------|
| metric | string | The metric name to evaluate from the tool’s output |
| operator | string | Comparison operator: lt (less than), gt (greater than), eq (equal), ne (not equal), contains |
| value | any | The expected value to compare against |

Example: Verify disk usage is below 90%:

{
  "type": "verify",
  "name": "Verify disk usage",
  "description": "Confirm disk usage dropped below threshold",
  "tool": "analyze_disk_usage",
  "toolInput": { "deviceId": "{{deviceId}}", "refresh": true },
  "verifyCondition": {
    "metric": "disk_usage_percent",
    "operator": "lt",
    "value": 90
  },
  "onFailure": "stop"
}
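A condition like the one above can be evaluated with a small operator table. The sketch assumes the tool output has already been parsed into a dictionary of metric names to values. How Breeze extracts metrics from raw tool output is not described here, so `evaluate` is a hypothetical helper.

```python
import operator
from typing import Any

# One comparison function per documented operator.
OPERATORS = {
    "lt": operator.lt,
    "gt": operator.gt,
    "eq": operator.eq,
    "ne": operator.ne,
    "contains": lambda actual, expected: expected in actual,
}


def evaluate(condition: dict, metrics: dict[str, Any]) -> bool:
    """Return True when the named metric satisfies the verifyCondition."""
    metric = condition["metric"]
    if metric not in metrics:
        # Assumption: a missing metric counts as a failed verification.
        return False
    compare = OPERATORS[condition["operator"]]
    return bool(compare(metrics[metric], condition["value"]))


disk_check = {"metric": "disk_usage_percent", "operator": "lt", "value": 90}
passed = evaluate(disk_check, {"disk_usage_percent": 72.5})
still_full = evaluate(disk_check, {"disk_usage_percent": 94.2})
```

With these inputs `passed` is true and `still_full` is false, which would send the step down its onFailure path.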

Playbooks can be activated through several mechanisms:

Any user with the required permissions can execute a playbook on a specific device through the API:

POST /playbooks/:playbookId/execute
Content-Type: application/json
Authorization: Bearer <token>

{
  "deviceId": "device-uuid",
  "variables": {
    "serviceName": "nginx"
  }
}

The variables object provides runtime values for {{variable}} placeholders in step toolInput fields. The context field can pass additional metadata such as an alertId or conversationId for traceability.
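As an illustration, the request above can be assembled with Python's standard library. The base URL is a placeholder and `build_execute_request` is a hypothetical helper; only the path, headers, and body fields come from this page. The example builds the request without sending it.

```python
import json
import urllib.request

# Placeholder: replace with the API base URL of your Breeze instance.
BASE_URL = "https://breeze.example.com/api"


def build_execute_request(playbook_id, device_id, token, variables=None, context=None):
    """Build (but do not send) a POST /playbooks/:playbookId/execute request."""
    body = {"deviceId": device_id}
    if variables:
        body["variables"] = variables
    if context:
        body["context"] = context
    return urllib.request.Request(
        url=f"{BASE_URL}/playbooks/{playbook_id}/execute",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


req = build_execute_request(
    "playbook-uuid",
    "device-uuid",
    "TOKEN",
    variables={"serviceName": "nginx"},
    context={"alertId": "alert-uuid"},
)
# To send it: urllib.request.urlopen(req), once BASE_URL points at a real instance.
```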

The Breeze AI assistant can execute playbooks as part of an automated incident response conversation. When the AI identifies a matching playbook for a detected issue, it calls the execute endpoint with triggeredBy: "ai" and includes the conversationId in the execution context. The AI then monitors execution progress and updates step results as each step completes.

Playbooks with triggerConditions.alertTypes configured can respond to matching alerts. When an alert fires and its type matches a playbook’s trigger conditions:

  • If autoExecute is true, the playbook runs immediately on the affected device.
  • If autoExecute is false, the playbook is suggested to the operator but requires manual confirmation.

Additional filtering applies: deviceTags restricts matching to devices with specific tags, and minSeverity sets the minimum alert severity that qualifies.
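The matching rules can be summarized as a single decision function. This is a sketch, not the Breeze matcher: the alert shape (`type`, `severity`) is hypothetical, and it assumes a device must carry all listed `deviceTags`, which the documentation does not state explicitly.

```python
SEVERITY_ORDER = ["low", "medium", "high", "critical"]


def match_trigger(conditions: dict, alert: dict, device_tags: set[str]) -> str:
    """Return "execute", "suggest", or "no_match" for one alert on one device."""
    if alert["type"] not in conditions.get("alertTypes", []):
        return "no_match"

    required_tags = conditions.get("deviceTags")
    # Assumption: every listed tag must be present on the device.
    if required_tags and not set(required_tags) <= device_tags:
        return "no_match"

    min_severity = conditions.get("minSeverity")
    if min_severity and (
        SEVERITY_ORDER.index(alert["severity"]) < SEVERITY_ORDER.index(min_severity)
    ):
        return "no_match"

    # autoExecute decides between running immediately and asking the operator.
    return "execute" if conditions.get("autoExecute", False) else "suggest"


conditions = {
    "alertTypes": ["service_down"],
    "deviceTags": ["production"],
    "autoExecute": False,
    "minSeverity": "high",
}
decision = match_trigger(
    conditions,
    {"type": "service_down", "severity": "critical"},
    {"production", "web"},
)
```

With `autoExecute` set to false, a qualifying alert yields `suggest`, so the playbook is offered to the operator instead of running on its own.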


Every playbook run creates an execution record. Execution records include:

| Field | Description |
|-------|-------------|
| id | Unique execution UUID |
| orgId | Organization the execution belongs to |
| deviceId | Target device UUID |
| playbookId | Playbook definition UUID |
| status | Current execution status |
| currentStepIndex | Index of the step currently executing (0-based) |
| steps | Array of per-step results with timing, tool output, and status |
| context | Execution context including alertId, conversationId, and variables |
| errorMessage | Error description if the execution failed |
| rollbackExecuted | Whether a rollback sequence was triggered |
| triggeredBy | How the execution was initiated (e.g., ai, manual, alert) |
| triggeredByUserId | UUID of the user who triggered the execution (if applicable) |
| startedAt | Timestamp when the first step began |
| completedAt | Timestamp when the execution finished |

List all executions with optional filters:

GET /playbooks/executions?deviceId=&playbookId=&status=&limit=50

Get full detail for a single execution, including the playbook definition and device information:

GET /playbooks/executions/:executionId

The response includes the complete steps array with per-step results:

{
  "execution": {
    "id": "exec-uuid",
    "status": "completed",
    "currentStepIndex": 4,
    "steps": [
      {
        "stepIndex": 0,
        "stepName": "Capture baseline disk usage",
        "status": "completed",
        "toolUsed": "analyze_disk_usage",
        "toolInput": { "deviceId": "device-uuid", "refresh": true },
        "toolOutput": "Disk usage: 94.2%...",
        "startedAt": "2026-02-23T10:00:00.000Z",
        "completedAt": "2026-02-23T10:00:03.500Z",
        "durationMs": 3500
      }
    ]
  },
  "playbook": { "id": "...", "name": "Disk Cleanup", "category": "disk" },
  "device": { "id": "...", "hostname": "web-server-01" }
}

As each step completes, the execution record is updated via PATCH:

PATCH /playbooks/executions/:executionId
Content-Type: application/json
Authorization: Bearer <token>

{
  "status": "running",
  "currentStepIndex": 2,
  "steps": [
    {
      "stepIndex": 0,
      "stepName": "Capture baseline",
      "status": "completed",
      "toolUsed": "analyze_disk_usage",
      "durationMs": 3500
    }
  ]
}

Execution status transitions are strictly validated. The allowed transitions are:

| From | Allowed transitions |
|------|---------------------|
| pending | running, cancelled |
| running | waiting, completed, failed, rolled_back, cancelled |
| waiting | running, completed, failed, rolled_back, cancelled |
| completed | (terminal; no transitions) |
| failed | (terminal; no transitions) |
| rolled_back | (terminal; no transitions) |
| cancelled | (terminal; no transitions) |
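A client that drives executions may want to mirror this table locally so it can avoid sending a PATCH the server will reject. The lookup below copies the table as-is; `can_transition` is a hypothetical helper name.

```python
# Allowed execution status transitions, copied from the table above.
ALLOWED_TRANSITIONS = {
    "pending": {"running", "cancelled"},
    "running": {"waiting", "completed", "failed", "rolled_back", "cancelled"},
    "waiting": {"running", "completed", "failed", "rolled_back", "cancelled"},
    "completed": set(),
    "failed": set(),
    "rolled_back": set(),
    "cancelled": set(),
}


def can_transition(current: str, new: str) -> bool:
    """Return True if moving from current to new status is allowed."""
    return new in ALLOWED_TRANSITIONS.get(current, set())


def is_terminal(status: str) -> bool:
    """Terminal statuses have no outgoing transitions."""
    return status in ALLOWED_TRANSITIONS and not ALLOWED_TRANSITIONS[status]


ok = can_transition("pending", "running")
rejected = can_transition("completed", "running")
```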


All playbook endpoints require authentication and scope-based authorization. Organization-scope users see their own organization’s playbooks plus all built-in playbooks. Partner-scope users see playbooks across their managed organizations. System-scope users see everything.

| Method | Path | Description |
|--------|------|-------------|
| GET | /playbooks | List active playbooks (?category=disk\|service\|memory\|patch\|security\|all) |
| GET | /playbooks/:id | Get a single playbook definition by ID |
| POST | /playbooks/:id/execute | Execute a playbook on a device |
| GET | /playbooks/executions | List execution history (?deviceId=&playbookId=&status=&limit=) |
| GET | /playbooks/executions/:id | Get full execution detail with playbook and device info |
| PATCH | /playbooks/executions/:id | Update execution progress (status, steps, context) |

Request body for POST /playbooks/:id/execute:

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| deviceId | UUID | Yes | Target device to run the playbook against |
| variables | object | No | Runtime values for {{variable}} placeholders in step toolInput |
| context | object | No | Additional context (alertId, conversationId, userInput) |

Response: Returns the created execution record, the playbook definition (including steps), and the target device.

Error responses:

| Status | Reason |
|--------|--------|
| 404 | Playbook not found, not active, or access denied |
| 403 | User lacks required permissions defined by the playbook |
| 404 | Device not found or access denied |
| 403 | Playbook and device belong to different organizations |
| 409 | Referenced resource was deleted concurrently |

Request body for PATCH /playbooks/executions/:id (all fields optional, at least one required):

| Field | Type | Description |
|-------|------|-------------|
| status | string | New execution status (must be a valid transition) |
| currentStepIndex | number | Index of the currently executing step |
| steps | array | Updated step results array |
| context | object | Updated execution context |
| errorMessage | string or null | Error message (set on failure, clear on recovery) |
| rollbackExecuted | boolean | Whether rollback was triggered |
| startedAt | string or null | ISO 8601 timestamp |
| completedAt | string or null | ISO 8601 timestamp |


Playbooks integrate with the Incident Response system. When a playbook is executed in the context of an active incident, the execution record includes the incidentId in its context, creating a direct link between the remediation workflow and the incident timeline.

The AI assistant can trigger playbooks during incident response conversations. For example, when investigating a compromised device, the AI might execute the Service Restart playbook to restart a suspicious service, then record the result as evidence on the incident.

Create custom playbooks with category: "security" for incident-specific workflows:

  • Endpoint isolation — disable network interfaces, block USB, kill suspicious processes
  • Evidence collection — gather logs, running processes, network connections, and screenshots
  • Service recovery — restart affected services after containment, verify health

Security playbooks can be linked to alert trigger conditions with alertTypes: ["security_threat"] so they execute automatically (or with confirmation) when a security alert fires.


Playbook not appearing in the list. Verify the playbook’s isActive field is true. The GET /playbooks endpoint only returns active playbooks. Also confirm the authenticated user has access to the playbook’s organization, or that the playbook is a built-in (built-in playbooks are visible to all organizations).

Execution fails with “Missing required permissions”. The playbook definition specifies requiredPermissions that the executing user must have. Check the missingPermissions array in the 403 response to identify which permissions are needed. Built-in playbooks require devices:read and devices:execute.

Execution stuck in pending status. The execute endpoint creates the execution record but does not run steps automatically. The AI assistant or the calling system is responsible for driving execution by invoking each step’s tool and updating the execution via PATCH. If no external caller advances the execution, it remains in pending.

PATCH returns 409 Conflict. The execution was modified by another process between your read and write. Re-fetch the execution via GET /playbooks/executions/:id, merge your changes with the current state, and retry the PATCH.
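The re-fetch, merge, and retry cycle might look like the sketch below. The transport is injected so the example runs without a server: `fetch`, `patch`, `make_changes`, and the `Conflict` exception are all hypothetical stand-ins for your own HTTP client code, with `Conflict` raised whenever the API answers 409.

```python
from typing import Callable


class Conflict(Exception):
    """Raised by the hypothetical patch callback when the API answers 409."""


def patch_with_retry(
    fetch: Callable[[], dict],
    make_changes: Callable[[dict], dict],
    patch: Callable[[dict], dict],
    max_attempts: int = 3,
) -> dict:
    """Re-fetch the execution, rebuild the PATCH body, and retry on conflict."""
    for attempt in range(1, max_attempts + 1):
        current = fetch()  # always start from the latest server state
        try:
            return patch(make_changes(current))
        except Conflict:
            if attempt == max_attempts:
                raise


# In-memory stand-in for the server, used only to exercise the retry loop.
state = {"status": "running", "currentStepIndex": 1}
attempts = []


def fake_fetch() -> dict:
    return dict(state)


def fake_patch(body: dict) -> dict:
    attempts.append(body)
    if len(attempts) == 1:
        state["currentStepIndex"] = 2  # another writer got in first
        raise Conflict()
    state.update(body)
    return dict(state)


result = patch_with_retry(
    fake_fetch,
    lambda current: {"currentStepIndex": current["currentStepIndex"] + 1},
    fake_patch,
)
```

Building the PATCH body from the freshly fetched state on every attempt is what keeps the second request from overwriting the other writer's update.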

Invalid status transition error. Execution status changes are validated against an allowed-transitions table. Terminal statuses (completed, failed, rolled_back, cancelled) cannot transition to any other status. Check the current execution status before attempting a PATCH.

Verify step fails but remediation worked. The verification condition may be too strict, or the wait step before verification may not allow enough time for metrics to settle. Increase waitSeconds on the preceding wait step, or adjust the verifyCondition threshold. For disk cleanup, 30 seconds is usually sufficient; for memory relief, 300 seconds (5 minutes) is recommended.

Template variables not substituted. Ensure the variable names in toolInput match the keys passed in the variables field of the execute request. Template placeholders use the format {{variableName}} and are case-sensitive.