
IncidentPulse Documentation


01 What is IncidentPulse?

Operational visibility from detection to public communication

IncidentPulse is an internal and public incident management platform that helps teams log outages, coordinate responders, and communicate status transparently. It replaces spreadsheets and siloed tools with a single system of record for incidents, teams, and customer-facing updates.

The problem it solves

Fragmented tooling slows response and frustrates customers. IncidentPulse centralizes detection, response, and communication so every stakeholder sees the same truth.

  • Log incidents from monitoring tools, engineers, and customer reports without duplicating work.
  • Coordinate responders with guided runbooks, assignments, and automated stakeholder updates.
  • Publish a public status page that reads from the same dataset, keeping customers aligned.

Who should use it

Designed for Admins, Operators, and Public stakeholders who need accountability and transparency.

Admin

Platform owners orchestrating policy, integrations, and governance.

Operator

On-call responders executing playbooks and communicating progress.

Public

Customers, partners, or executives monitoring health and uptime.

Incident flow at a glance

Each incident moves through a connected lifecycle so no context is lost between teams.

Detect

Alerts and monitors create an incident with an automatically assigned severity.

Coordinate

Operators assign roles, run playbooks, and capture updates.

Communicate

Stakeholders receive timeline posts, email, and status page updates.

Learn

Admins review analytics, publish postmortems, and refine SLAs.

02 System Roles

Role clarity accelerates response

Map responsibilities to clear permissions so the right people can act quickly without risking governance.

Admin

Configures the platform, enforces governance, and provides leadership with holistic visibility.

  • Define severity policies, SLAs, and incident templates.
  • Manage integrations (PagerDuty, Slack, email, webhooks) and authentication providers.
  • Brand and publish the public status site with service catalogs and regional overrides.

Operator

Frontline responders who run playbooks, collaborate in the timeline, and keep stakeholders informed.

  • Create and triage incidents with suggested severity and impacted services.
  • Assign tasks, capture timeline updates, and coordinate with external teams.
  • Request maintenance windows or policy adjustments from Admins when escalation is required.

Public

Stakeholders, customers, or partners who consume the read-only status page and historical reports.

  • Subscribe to relevant services for proactive notifications via email or RSS.
  • Review current incident impact, remediation steps, and estimated resolution times.
  • Access uptime history and post-incident summaries for contractual transparency.

Capability                  Admin   Operator   Public
Create / triage incidents   Full    Full       No
Edit SLAs & playbooks       Full    Suggest    No
Publish status updates      Full    Full       Read
Manage team members         Full    No         No
View analytics & MTTR       Full    Full       Summary
Configure integrations      Full    Request    No
Brand public status site    Full    Request    No
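
The capability matrix can be expressed as a lookup table in application code. The sketch below is illustrative only: the capability keys (such as "incidents.triage") and the helper names are hypothetical, and only the access levels themselves come from the matrix.

```typescript
type Role = "admin" | "operator" | "public";
type Access = "full" | "suggest" | "request" | "read" | "summary" | "none";

// Hypothetical capability keys; the product's real identifiers may differ.
type Capability =
  | "incidents.triage"
  | "slas.edit"
  | "status.publish"
  | "team.manage"
  | "analytics.view"
  | "integrations.configure"
  | "statusSite.brand";

// One row per capability, mirroring the matrix ("No" is modelled as "none").
const MATRIX: Record<Capability, Record<Role, Access>> = {
  "incidents.triage": { admin: "full", operator: "full", public: "none" },
  "slas.edit": { admin: "full", operator: "suggest", public: "none" },
  "status.publish": { admin: "full", operator: "full", public: "read" },
  "team.manage": { admin: "full", operator: "none", public: "none" },
  "analytics.view": { admin: "full", operator: "full", public: "summary" },
  "integrations.configure": { admin: "full", operator: "request", public: "none" },
  "statusSite.brand": { admin: "full", operator: "request", public: "none" },
};

// Returns the access level a role has for a capability.
function accessFor(role: Role, capability: Capability): Access {
  return MATRIX[capability][role];
}

// True only when the role can perform the action without Admin involvement.
function canActDirectly(role: Role, capability: Capability): boolean {
  return accessFor(role, capability) === "full";
}
```

Keeping the matrix in one typed record means a new capability cannot be added without stating the access level for every role.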

03 Core Features

Built for high-performing response teams

Each capability pairs a polished UI with deep operational workflows so you can launch quickly and scale with confidence.

Incident Management

Unify alert intake, manual reports, and automation triggers into a single prioritized queue.

  • Triage with severity templates, service impact tagging, and follow-up tasks.
  • Link incidents to postmortems and problem records for continuous improvement.
  • Enforced closure workflow requires root-cause analysis and resolution summaries before marking incidents resolved.

Status Updates

Keep stakeholders aligned with templated updates across email, Slack, and public channels.

  • Compose updates once and distribute across every channel with scheduling support.
  • Auto-expire outdated posts and highlight customer-facing impact visually.
  • Announce scheduled maintenance separately from incidents so customers know what to expect.

Service Catalog

Model each customer-facing surface—website, APIs, data stores—and monitor them independently.

  • Admins define services with friendly names and descriptions; operators tag incidents to impacted services.
  • Public status pages render per-service health states so customers instantly know what is degraded.

Team Roles & Assignments

Clarify ownership by assigning incident commanders, communications leads, and subject experts.

  • Role handoffs with acknowledgements prevent silent drops during shift changes.
  • Escalation policies auto-notify backup owners to maintain coverage.

SLA & Metrics Tracking

Monitor MTTR, detection time, and customer impact with ready-to-share analytics dashboards.

  • Compare incident volume by service, severity, or root cause category.
  • Export metrics to BI tools or share secure links with leadership.
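
As a rough illustration of the MTTR figure these dashboards report, the sketch below averages time to resolution across resolved incidents. It assumes MTTR means mean time to resolution and reuses the opened_at and resolved_at field names from the API examples; the function name is hypothetical.

```typescript
interface ResolvedIncident {
  opened_at: string;   // ISO 8601 timestamp
  resolved_at: string; // ISO 8601 timestamp
}

// Mean time to resolution in minutes; null when there is nothing to average.
function mttrMinutes(incidents: ResolvedIncident[]): number | null {
  if (incidents.length === 0) return null;
  const totalMs = incidents.reduce(
    (sum, incident) =>
      sum + (Date.parse(incident.resolved_at) - Date.parse(incident.opened_at)),
    0,
  );
  return totalMs / incidents.length / 60000;
}
```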

Evidence Attachments

Give responders a single place to upload screenshots, HAR files, or log bundles alongside each timeline update.

  • Supports up to five files per update (10 MB each) stored inside the secure uploads directory defined by UPLOAD_DIR.
  • Incident creation form also accepts optional evidence so triage starts with screenshots, HAR files, or log snippets already attached.
  • Serve attachments through the backend /uploads gateway so reviewers can download or share artifacts without leaving the app.
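
A client can check the documented limits before uploading. This is a minimal sketch, not the backend's actual validation: the function and type names are hypothetical, and it assumes "10 MB" means 10 × 1024 × 1024 bytes.

```typescript
const MAX_FILES_PER_UPDATE = 5;
const MAX_FILE_BYTES = 10 * 1024 * 1024; // assumption: 10 MB is treated as 10 MiB

interface PendingFile {
  name: string;
  size: number; // bytes
}

// Returns a list of human-readable problems; an empty list means the batch is acceptable.
function validateAttachments(files: PendingFile[]): string[] {
  const errors: string[] = [];
  if (files.length > MAX_FILES_PER_UPDATE) {
    errors.push(`Too many files: ${files.length} (limit ${MAX_FILES_PER_UPDATE})`);
  }
  for (const file of files) {
    if (file.size > MAX_FILE_BYTES) {
      errors.push(`${file.name} exceeds the 10 MB limit`);
    }
  }
  return errors;
}
```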

Public Status Page

Deliver a trustworthy, branded status experience that reflects the latest internal truth.

  • Custom domains, regional scoping, and historical uptime badges build credibility.
  • Allow visitors to filter by service and subscribe for real-time alerts.
  • Show scheduled maintenance windows in advance without triggering outage states.

System Settings

Govern identity providers, integrations, webhooks, and audit trails from a single control center.

  • Role-based access controls and SCIM keep user lifecycle in sync.
  • Audit log export satisfies compliance and security review requirements.
  • Live System Audit page surfaces logins, user provisioning, and incident lifecycle changes with filters and pagination.

04 Architecture

Modern, resilient architecture

IncidentPulse separates marketing, operator, and public surfaces while sharing a secure core API and database.

Frontend

Next.js marketing site and authenticated dashboard deployed on Vercel with React Server Components for performance.

API Gateway

Node/Express service hosted on Render exposes REST endpoints, handles auth, and orchestrates incident workflows.

Database

PostgreSQL managed by Prisma migrations stores incidents, updates, user roles, and SLA policies with auditing.

Public Status

Static, cache-friendly status site hydrated via incremental revalidation so customers always see accurate data.

Diagram: Frontend (Next.js + React), API (Node / Express), Database (PostgreSQL + Prisma), Public Status Page. Connector labels: REST / React Query, Prisma ORM, ISR + Cache.

05 Setup Guide

Launch in minutes for demos or production

A streamlined developer experience keeps onboarding fast whether you are evaluating locally or deploying to the cloud.

Prerequisites

Have the following in place before you start.

  • Node.js 20+ and npm 10+ installed locally.
  • Vercel account for hosting the Next.js frontend.
  • Render (or any Node-compatible platform) for the backend API.
  • PostgreSQL database (Render, Neon, Supabase, or self-managed).

Run locally

Follow these steps to stay productive.

  • git clone https://github.com/your-org/incident-pulse.git && cd incident-pulse
  • cp frontend/.env.local.example frontend/.env.local, then update the API base URLs in that file.
  • (cd frontend && npm install)
  • (cd backend && npm install && npx prisma migrate dev), with DATABASE_URL pointing at your PostgreSQL database.
  • Set UPLOAD_DIR in backend/.env (defaults to uploads inside the backend folder) and ensure that directory exists so attachment uploads can be written.
  • Run both apps: run npm run dev in the frontend directory and again in the backend directory, each in its own terminal.
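
The UPLOAD_DIR step can also be handled at backend startup. The sketch below is one possible approach, not the project's actual code: the helper names are hypothetical, and it assumes a relative UPLOAD_DIR is resolved against the backend folder.

```typescript
import { mkdirSync } from "node:fs";
import { resolve } from "node:path";

// Resolves UPLOAD_DIR against the backend root, falling back to "uploads"
// when the variable is unset or blank (the documented default).
function resolveUploadDir(
  env: Record<string, string | undefined>,
  backendRoot: string,
): string {
  const configured = env.UPLOAD_DIR?.trim();
  return resolve(backendRoot, configured ? configured : "uploads");
}

// Creates the directory (and any missing parents) so uploads can be written.
function ensureUploadDir(dir: string): string {
  mkdirSync(dir, { recursive: true });
  return dir;
}

// Typical startup usage, run from the backend folder:
//   ensureUploadDir(resolveUploadDir(process.env, process.cwd()));
```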

Deploy


  • Push the repository to GitHub or GitLab with protected main branches.
  • Connect Vercel to the repo, selecting the frontend directory, and configure environment variables (NEXT_PUBLIC_API_URL, AUTH_SECRET).
  • Provision the backend on Render with build command npm install && npm run build and start command npm run start.
  • Set DATABASE_URL, UPLOAD_DIR, and auth secrets on Render, then run npx prisma migrate deploy.
  • If Render needs persistent evidence storage, point UPLOAD_DIR to an attached disk or cloud storage mount.
  • Configure custom domains for the public status page and marketing site.

06 Webhook Automation

Automate incident intake from observability tools

Connect monitoring platforms, scheduled jobs, or custom scripts to IncidentPulse. Alerts create or update incidents automatically, while recovery events close them and notify the assigned operator.

Endpoints & authentication

  • POST /webhooks/incidents – create, dedupe, or escalate incidents.
  • POST /webhooks/incidents/recovery – resolve the matching incident once service is healthy.
  • Sign requests with X-Signature (HMAC-SHA256 using WEBHOOK_HMAC_SECRET). Trusted internal tools can fall back to X-Webhook-Token.
  • Support for X-Idempotency-Key, a rate limit of 60 requests per minute per token, and a ±10 minute skew window on occurredAt keeps integrations reliable.
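
For integrations written in Node, the X-Signature value can be computed with the built-in crypto module. This is a sketch under two assumptions that the GitHub Actions example in this section also follows: the signature is the lowercase hex HMAC-SHA256 of the exact raw request body, and the secret string is used as the key as stored (not hex-decoded). The function names are hypothetical.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Computes the hex HMAC-SHA256 of the raw body, keyed with the shared secret.
function signBody(rawBody: string, secret: string): string {
  return createHmac("sha256", secret).update(rawBody, "utf8").digest("hex");
}

// Receiver-side check. Compares in constant time to avoid leaking
// how many leading characters of a guessed signature were correct.
function verifySignature(rawBody: string, signature: string, secret: string): boolean {
  const expected = Buffer.from(signBody(rawBody, secret), "hex");
  const received = Buffer.from(signature, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return received.length === expected.length && timingSafeEqual(expected, received);
}
```

Sign the same string that is sent on the wire: re-serializing the JSON after signing can change whitespace or key order and invalidate the signature.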

Environment variables

  • WEBHOOK_HMAC_SECRET – required hex string used when calculating HMAC signatures. Generate it once (for example, openssl rand -hex 32), store it in Render → Environment, and distribute it to downstream tools via your secrets manager. It is not displayed in the dashboard.
  • WEBHOOK_SHARED_TOKEN – optional shared token, sent in the X-Webhook-Token header, for services that cannot sign requests.
  • WEBHOOK_SYSTEM_USER_ID – optional UUID of the automation account that should author webhook incidents.
  • Track adoption via GET /metrics/webhook (requires admin authentication).

Treat the HMAC secret like any other credential: rotate it from Render if compromised and share it with integrators through a secure channel (password vault, secret manager). Open the dashboard’s Webhooks & Integrations tab to copy the alert and recovery endpoints, grab ready-to-run cURL/Postman snippets, and wire up Slack, Discord, Teams, or Telegram notifications. The panel reiterates which secrets live in Render but intentionally never exposes the raw values to prevent leakage.

Alert webhook request
POST /webhooks/incidents
Content-Type: application/json
X-Signature: <hex-hmac-from-WEBHOOK_HMAC_SECRET>

{
  "service": "checkout-api",
  "environment": "production",
  "eventType": "error_spike",
  "message": "500 errors exceeded 5% in the last 2 minutes",
  "severity": "high",
  "occurredAt": "2024-11-05T22:55:00Z",
  "fingerprint": "checkout-api|production|error_spike",
  "meta": {
    "errorCount": 324,
    "threshold": "5%"
  }
}
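
The occurredAt value in this example is a fixed 2024 timestamp, so a payload copied verbatim would fall outside the ±10 minute skew window. The sketch below stamps the payload at send time and shows the kind of check a receiver might apply. The type and function names are hypothetical, which fields are optional is an assumption, and treating the boundary as inclusive is also an assumption.

```typescript
// Field names follow the alert example; optionality of "meta" is assumed.
interface AlertPayload {
  service: string;
  environment: string;
  eventType: string;
  message: string;
  severity: string;
  occurredAt: string; // ISO 8601 timestamp
  fingerprint: string;
  meta?: Record<string, unknown>;
}

const MAX_SKEW_MS = 10 * 60 * 1000; // the documented ±10 minute window

// True when occurredAt is parseable and within the skew window around "now".
function isWithinSkew(occurredAt: string, now: Date = new Date()): boolean {
  const eventMs = Date.parse(occurredAt);
  if (Number.isNaN(eventMs)) return false;
  return Math.abs(now.getTime() - eventMs) <= MAX_SKEW_MS;
}

// Builds a payload whose occurredAt is the current time in UTC.
function buildAlert(
  fields: Omit<AlertPayload, "occurredAt">,
  now: Date = new Date(),
): AlertPayload {
  return { ...fields, occurredAt: now.toISOString() };
}
```
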
Recovery webhook request
POST /webhooks/incidents/recovery
Content-Type: application/json
X-Signature: <hex-hmac-from-WEBHOOK_HMAC_SECRET>

{
  "fingerprint": "checkout-api|production|error_spike",
  "occurredAt": "2024-11-05T23:05:00Z",
  "meta": {
    "note": "Service restored automatically"
  }
}
GitHub Actions failure hook
- name: Notify IncidentPulse when workflow fails
  if: failure()
  env:
    INCIDENT_URL: ${{ secrets.INCIDENTPULSE_ALERT_URL }}
    INCIDENT_SECRET: ${{ secrets.INCIDENTPULSE_HMAC }}
  run: |
    payload=$(cat <<'JSON'
    {
      "service": "ci-pipeline",
      "environment": "${{ github.ref_name }}",
      "eventType": "workflow_failure",
      "message": "Workflow ${{ github.workflow }} failed on ${{ github.sha }}",
      "severity": "high",
      "occurredAt": "${{ github.event.head_commit.timestamp }}",
      "fingerprint": "ci|${{ github.repository }}|${{ github.workflow }}"
    }
    JSON
    )
    # Sign the exact bytes that are sent; printf avoids echo -n portability issues.
    signature=$(printf '%s' "$payload" | openssl dgst -sha256 -hmac "$INCIDENT_SECRET" | sed 's/^.* //')
    curl -sSf -X POST "$INCIDENT_URL" \
      -H "Content-Type: application/json" \
      -H "X-Signature: $signature" \
      --data "$payload"
UptimeRobot payload template
{
  "service": "edge-api",
  "environment": "production",
  "eventType": "uptimerobot_{{ALERT_TYPE}}",
  "message": "Monitor {{MONITOR_NAME}} reported {{ALERT_TYPE}}",
  "severity": "high",
  "fingerprint": "uptimerobot|{{MONITOR_ID}}",
  "occurredAt": "{{ALERT_TIME_ISO}}",
  "meta": {
    "monitorUrl": "{{URL}}",
    "friendlyStatus": "{{ALERT_TYPE}}"
  }
}

Platform behaviour

  • Alerts that share a fingerprint are deduplicated for ten minutes while the incident is open; a repeat alert with a higher severity escalates the incident automatically.
  • Recovery payloads resolve the open incident, set resolvedAt, notify the assigned operator, and post to the incident timeline.
  • Admins are notified on creation, repeat alerts append timeline updates, and all interactions are included in audit logs.
  • Configure Slack, Discord, Microsoft Teams, or Telegram notifications from the dashboard's Webhooks tab once your HMAC secret is in place.
  • Need a refresher? Visit /docs#webhooks for Postman scripts, cURL examples, and troubleshooting tips.
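
The dedupe, escalation, and recovery rules can be modelled in a few lines. This in-memory sketch is only an illustration of the described behaviour, not the platform's implementation. It makes several assumptions: the four-level severity scale (only "high" appears in the examples), the window being measured from the most recent alert, and a new incident being created once the window has lapsed.

```typescript
// Assumed severity scale; only "high" appears in the documented examples.
type Severity = "low" | "medium" | "high" | "critical";
const RANK: Record<Severity, number> = { low: 0, medium: 1, high: 2, critical: 3 };

const DEDUPE_WINDOW_MS = 10 * 60 * 1000;

interface OpenIncident {
  severity: Severity;
  lastAlertAt: number; // epoch milliseconds of the latest alert
  timeline: string[];
}

type AlertOutcome = "created" | "deduplicated" | "escalated";

class AlertIntake {
  private readonly open = new Map<string, OpenIncident>();

  handleAlert(fingerprint: string, severity: Severity, atMs: number): AlertOutcome {
    const incident = this.open.get(fingerprint);
    if (incident && atMs - incident.lastAlertAt <= DEDUPE_WINDOW_MS) {
      // Repeat alert: append to the timeline instead of opening a duplicate.
      incident.lastAlertAt = atMs;
      incident.timeline.push(`repeat alert (${severity})`);
      if (RANK[severity] > RANK[incident.severity]) {
        incident.severity = severity;
        return "escalated";
      }
      return "deduplicated";
    }
    // No open incident inside the window: start a new one (assumed behaviour).
    this.open.set(fingerprint, { severity, lastAlertAt: atMs, timeline: ["created"] });
    return "created";
  }

  // Returns true when a matching open incident was resolved.
  handleRecovery(fingerprint: string): boolean {
    return this.open.delete(fingerprint);
  }
}
```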

07 API Reference

Developer-friendly REST API

Pair the API with React Query on the frontend or integrate with other systems for automation and reporting.

POST /auth/login - Authenticate with email and password to receive a session token.
Request
POST /auth/login
Content-Type: application/json

{
  "email": "oncall@example.com",
  "password": "super-secret"
}
Response
200 OK
Content-Type: application/json

{
  "token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
  "user": {
    "id": "usr_123",
    "name": "Riley SRE",
    "role": "admin",
    "team": "Site Reliability"
  }
}

GET /incidents - Return paginated incidents with optional filters for status, severity, owner, and impacted service.
Request
GET /incidents?status=active&severity=high
Accept: application/json
Response
200 OK
Content-Type: application/json

{
  "data": [
    {
      "id": "inc_481",
      "title": "Checkout latency spike",
      "severity": "high",
      "state": "investigating",
      "owner": "Alex Rivera",
      "opened_at": "2025-11-02T16:21:00.000Z"
    }
  ],
  "meta": {
    "page": 1,
    "pageSize": 25,
    "total": 74
  }
}
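
A typed client helper for this endpoint might look like the sketch below. The response types mirror the example above. Only the status and severity query parameters appear in the request example; owner, service, page, and pageSize are assumed parameter names, and the function names are hypothetical.

```typescript
type IncidentFilters = {
  status?: string;
  severity?: string;
  owner?: string;    // assumed parameter name
  service?: string;  // assumed parameter name
  page?: number;     // assumed parameter name
  pageSize?: number; // assumed parameter name
};

interface IncidentSummary {
  id: string;
  title: string;
  severity: string;
  state: string;
  owner: string;
  opened_at: string;
}

interface IncidentPage {
  data: IncidentSummary[];
  meta: { page: number; pageSize: number; total: number };
}

// Builds the request URL, skipping filters that were not provided.
// Note: the leading slash replaces any path already present on baseUrl.
function incidentsUrl(baseUrl: string, filters: IncidentFilters = {}): string {
  const url = new URL("/incidents", baseUrl);
  for (const [key, value] of Object.entries(filters)) {
    if (value !== undefined) url.searchParams.set(key, String(value));
  }
  return url.toString();
}

// Number of pages implied by the pagination metadata.
function totalPages(meta: IncidentPage["meta"]): number {
  return Math.ceil(meta.total / meta.pageSize);
}
```

With the metadata in the example response (74 incidents, 25 per page), totalPages reports three pages.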

PATCH /incidents/:id - Update incident state, assignments, or post timeline entries. Requires operator or admin privileges.
Request
PATCH /incidents/inc_481
Content-Type: application/json

{
  "state": "mitigated",
  "assignee": "Jordan Lee",
  "timeline": [
    {
      "type": "update",
      "body": "Applied hotfix to rollback release 2025.11.02"
    }
  ]
}
Response
200 OK
Content-Type: application/json

{
  "id": "inc_481",
  "state": "mitigated",
  "resolved_at": "2025-11-02T17:05:00.000Z",
  "timeline": [
    {
      "at": "2025-11-02T16:45:00.000Z",
      "body": "Applied hotfix to rollback release 2025.11.02",
      "author": "Jordan Lee"
    }
  ]
}

POST /incidents/:id/attachments - Upload evidence files (screenshots, logs, HAR files) before posting an update. Limit five files per update, 10 MB each.
Request
POST /incidents/inc_481/attachments
Content-Type: multipart/form-data

file: outage-dashboard.png
Response
201 Created
Content-Type: application/json

{
  "error": false,
  "data": {
    "id": "att_901",
    "filename": "outage-dashboard.png",
    "mimeType": "image/png",
    "size": 418204,
    "url": "/uploads/incidents/inc_481/outage-dashboard.png"
  }
}

DELETE /incidents/:incidentId/attachments/:attachmentId - Remove a staged attachment before it is bound to a timeline update. Only admins or the original uploader can delete it.
Request
DELETE /incidents/inc_481/attachments/att_901
Response
204 No Content

POST /maintenance - Create a scheduled maintenance window (admin only). Planned downtime is announced separately from incidents.
Request
POST /maintenance
Content-Type: application/json

{
  "title": "Database maintenance",
  "description": "Upgrading storage tier. Read-only for 15 minutes.",
  "startsAt": "2025-11-10T01:00:00Z",
  "endsAt": "2025-11-10T01:15:00Z",
  "appliesToAll": false,
  "serviceId": "svc_db_primary"
}
Response
201 Created
Content-Type: application/json

{
  "error": false,
  "data": {
    "id": "mnt_901",
    "status": "scheduled",
    "startsAt": "2025-11-10T01:00:00.000Z",
    "endsAt": "2025-11-10T01:15:00.000Z",
    "appliesToAll": false,
    "service": {
      "id": "svc_db_primary",
      "name": "Primary Database"
    }
  }
}

GET /maintenance?window=upcoming - List upcoming and active maintenance windows. Use window=past for history or filter by serviceId.
Request
GET /maintenance?window=upcoming
Accept: application/json
Response
200 OK
Content-Type: application/json

{
  "data": [
    {
      "id": "mnt_901",
      "title": "Database maintenance",
      "status": "scheduled",
      "startsAt": "2025-11-10T01:00:00.000Z",
      "endsAt": "2025-11-10T01:15:00.000Z",
      "appliesToAll": false,
      "service": {
        "id": "svc_db_primary",
        "name": "Primary Database"
      }
    }
  ]
}
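
The window filter semantics described for this endpoint (upcoming includes active windows, past is history) can be sketched on the client side as follows. This is an assumption about how the phases relate to startsAt and endsAt, not the server's implementation, and the names are hypothetical.

```typescript
type WindowPhase = "upcoming" | "active" | "past";

interface MaintenanceWindow {
  id: string;
  startsAt: string; // ISO 8601 timestamp
  endsAt: string;   // ISO 8601 timestamp
}

// Classifies a window relative to "now"; the end time is treated as exclusive.
function phaseOf(window: MaintenanceWindow, now: Date): WindowPhase {
  const t = now.getTime();
  if (t < Date.parse(window.startsAt)) return "upcoming";
  if (t < Date.parse(window.endsAt)) return "active";
  return "past";
}

// window=upcoming is documented to include both upcoming and active windows.
function filterWindows(
  windows: MaintenanceWindow[],
  filter: "upcoming" | "past",
  now: Date,
): MaintenanceWindow[] {
  return windows.filter((w) =>
    filter === "past" ? phaseOf(w, now) === "past" : phaseOf(w, now) !== "past",
  );
}
```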

08 FAQ & Troubleshooting

Answers to common questions

Guide operators and admins through frequent blockers to keep the platform running smoothly.

Why can't I see every incident?

Operators only see incidents tied to their services or assignments. Ask an Admin to extend your service scope or grant temporary all-incident access when needed.

Why is my status page blank?

The public page only shows incidents marked as customer-impacting or scheduled maintenance. Confirm incidents have published updates and that cache invalidation has completed.

Where are evidence attachments stored?

The backend streams files into the directory referenced by UPLOAD_DIR (uploads/ by default, relative to the backend app) and serves them via the /uploads prefix. Point UPLOAD_DIR at persistent storage so artifacts survive restarts.

How do I enable Slack, Discord, Teams, or Telegram notifications?

Admins can connect Slack, Discord, Microsoft Teams, or Telegram under System Settings → Integrations. Paste the webhook URL or bot credentials, choose the destination channel, and IncidentPulse will publish incident creation, assignment, and resolution events automatically.

Can I restore historical incidents for analytics?

Yes. Import CSV data or use the bulk API to seed legacy incidents. Run "npm run seed" in the backend to load sample data for demos.

09 Credits & License

Open collaboration encouraged

Developed by the IncidentPulse team for demonstration and portfolio scenarios. Contributions, bug reports, and feature requests are welcome.

License: MIT License