How to Write a Vulnerability Remediation SLA That Works

Peter Chofield

Most vulnerability remediation SLAs fail for one simple reason: they read like policy documents and behave like fiction. The deadlines look strict, the ownership model looks tidy, and the exception rules look clean. Then the first serious backlog wave arrives, a KEV-listed vulnerability lands on an internet-facing system, and nobody can tell whether the SLA is supposed to drive action or merely document disappointment after the fact.

A good remediation SLA does something more useful. It tells security, infrastructure, cloud, and application teams exactly how fast different classes of vulnerabilities need to move, who owns the decision at each stage, when exceptions are allowed, and what happens when urgency is driven by confirmed exploitation rather than by severity alone. That is the operational bridge between analysis and action.

This guide explains how to write a vulnerability remediation SLA that people will actually follow. It is designed to work with the logic already covered in “Top 10 Signs a CVE Needs Emergency Patching,” “KEV vs CVSS vs EPSS: Which Signal Should Drive Patch Priority?”, “How to Build a KEV-Driven Patch Workflow Without Burning Out Your Team,” and “5 KEV Lessons That Show How Patch Prioritization Fails.”

Start by defining what the SLA actually covers

The first failure point is scope. If the SLA tries to cover every vulnerability in exactly the same way, it becomes too rigid to reflect reality and too vague to drive action. A workable remediation SLA should explicitly state which assets, teams, and vulnerability classes it applies to.

At minimum, it should define whether the SLA covers servers, endpoints, cloud workloads, SaaS integrations, network devices, identity systems, backup platforms, and third-party managed environments. It should also say whether emergency remediation for KEV-listed or otherwise actively exploited CVEs is handled inside the same SLA or under a separate accelerated process.

What to write: “This SLA applies to all production systems and security-relevant supporting infrastructure owned or operated by the organization, including on-premises, cloud, and managed environments, unless explicitly exempted in writing.”

Separate severity from urgency

Many bad SLAs collapse technical severity and remediation urgency into a single table. That is a mistake. Severity describes potential impact. Urgency reflects how quickly the organization needs to act based on exploitation evidence, exposure, and asset value.

This is where the distinctions covered in “KEV vs CVSS vs EPSS: Which Signal Should Drive Patch Priority?” matter. A high CVSS score does not always justify the fastest deadline. A KEV-listed flaw on an internet-facing identity or edge system often does, even if internal scoring debates are still happening.

What to write: “Remediation timelines are driven by exploitation status, exposure, and asset criticality, with CVSS used as technical context rather than as the sole determinant of due date.”

Use a small number of remediation tiers

If the SLA contains too many categories, no one will remember them. If it has too few, teams will use exceptions to recreate the missing nuance. A practical model usually works best with three or four tiers.

One example:

  • Tier 1 – Emergency: KEV-listed or otherwise confirmed exploited vulnerabilities affecting internet-facing, identity, administrative, or high-value systems. Target remediation: same day to 72 hours depending on exposure.
  • Tier 2 – Accelerated: High-likelihood vulnerabilities affecting exposed or business-critical systems, including high-EPSS issues or reliable public exploit availability. Target remediation: 7 days.
  • Tier 3 – Standard: High-severity or meaningful-risk vulnerabilities without confirmed exploitation and without immediate exposure pressure. Target remediation: 30 days.
  • Tier 4 – Routine: Moderate and lower-risk vulnerabilities or items with validated compensating controls. Target remediation: 60 to 90 days, depending on environment.

What to write: define the tier names, exact due windows, and the conditions that place a finding into each one.
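
The example tiers above can be sketched as a small classification function. This is a minimal illustration rather than a reference implementation: the `Finding` fields, the `EPSS_HIGH` cutoff of 0.5, and the choice to place exploited findings on less critical assets in Tier 2 are all assumptions made for the example, not rules from this guide.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    EMERGENCY = 1
    ACCELERATED = 2
    STANDARD = 3
    ROUTINE = 4


@dataclass(frozen=True)
class Finding:
    """Hypothetical finding record; field names are illustrative."""
    cve_id: str
    cvss: float                 # technical severity, 0.0 to 10.0
    epss: float                 # estimated exploitation probability, 0.0 to 1.0
    known_exploited: bool       # KEV-listed or otherwise confirmed exploited
    public_exploit: bool        # reliable public exploit is available
    exposed_or_critical: bool   # internet-facing, identity, admin, or high-value
    compensating_controls: bool = False  # validated compensating controls exist


EPSS_HIGH = 0.5  # assumed cutoff for "high EPSS"; tune to your own data
CVSS_HIGH = 7.0  # lower bound of the CVSS "High" qualitative rating


def assign_tier(f: Finding) -> Tier:
    if f.known_exploited:
        # Assumption: exploited findings on less critical assets still
        # move fast, but one tier below Emergency.
        return Tier.EMERGENCY if f.exposed_or_critical else Tier.ACCELERATED
    if f.exposed_or_critical and (f.epss >= EPSS_HIGH or f.public_exploit):
        return Tier.ACCELERATED
    if f.cvss >= CVSS_HIGH and not f.compensating_controls:
        return Tier.STANDARD
    return Tier.ROUTINE


# A medium-severity CVE can still be an emergency when it is being
# exploited on an exposed system; a critical score alone is not.
exploited = Finding("EXAMPLE-1", cvss=6.5, epss=0.1, known_exploited=True,
                    public_exploit=False, exposed_or_critical=True)
critical_score = Finding("EXAMPLE-2", cvss=9.8, epss=0.02, known_exploited=False,
                         public_exploit=False, exposed_or_critical=False)
print(assign_tier(exploited).name, assign_tier(critical_score).name)
```

Note that CVSS only appears in the last decision, which mirrors the idea of using it as technical context rather than as the sole driver of the due date.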

Make exposure and asset criticality explicit

An SLA that ignores asset context will fail in production. The same CVE can require same-day action on one system and routine remediation on another. That is not inconsistency. That is risk management.

The SLA should explicitly elevate deadlines for public-facing systems, identity providers, remote access services, administrative consoles, backup systems, email security infrastructure, virtualization layers, and other assets whose compromise creates disproportionate downstream risk. The reasoning is consistent with the operational logic in “Top 10 Signs a CVE Needs Emergency Patching” and the case-study failures described in “5 KEV Lessons That Show How Patch Prioritization Fails.”

What to write: “Public exposure and asset criticality may shorten remediation deadlines irrespective of baseline severity score.”
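
One way to make that rule mechanical is to give each tier a shortest and a longest window and let asset context choose between them. The sketch below reuses the example windows from this guide; treating "same day" as 24 hours, mapping the Tier 4 "depending on environment" range to exposure, and the asset class names are assumptions for illustration only.

```python
from datetime import datetime, timedelta, timezone

# (shortest, longest) remediation window per tier, from the example tiers.
# Assumption: "same day" is modeled as 24 hours.
TIER_WINDOWS = {
    1: (timedelta(hours=24), timedelta(hours=72)),
    2: (timedelta(days=7), timedelta(days=7)),
    3: (timedelta(days=30), timedelta(days=30)),
    4: (timedelta(days=60), timedelta(days=90)),
}

# Hypothetical asset class labels for systems whose compromise creates
# disproportionate downstream risk.
ELEVATED_ASSET_CLASSES = frozenset({
    "identity_provider", "remote_access", "admin_console",
    "backup", "email_security", "virtualization",
})


def due_date(tier: int, detected_at: datetime,
             asset_class: str, internet_facing: bool) -> datetime:
    """Return the remediation deadline, using the tier's shortest
    window when the asset is exposed or in an elevated class."""
    shortest, longest = TIER_WINDOWS[tier]
    elevated = internet_facing or asset_class in ELEVATED_ASSET_CLASSES
    return detected_at + (shortest if elevated else longest)


found = datetime(2024, 3, 4, 9, 0, tzinfo=timezone.utc)
# Same tier, different assets, different deadlines.
print(due_date(1, found, "web_server", internet_facing=True))
print(due_date(1, found, "file_server", internet_facing=False))
```

In the example tiers only Tier 1 and Tier 4 publish a range, so those are the only deadlines that asset context tightens here. An organization that wants exposure to shorten Tier 2 or Tier 3 as well would give those tiers a range too.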

Assign ownership by role, not by wishful thinking

An SLA without named accountability usually turns into a blame map after deadlines slip. Security may identify the issue, but security does not always own the affected technology. The SLA should define who is responsible for validation, remediation, approval, exception review, and closure.

A simple ownership model works well:

  • Security team: identifies vulnerability, assigns tier, validates urgency signals, and tracks due date.
  • System owner or service owner: confirms applicability, executes remediation, and reports status.
  • Risk or governance owner: approves time-bound exceptions and monitors overdue items.
  • Executive sponsor: resolves cross-team disputes when operational resistance blocks urgent remediation.

What to write: define accountable roles in the SLA itself, not in a separate tribal process.

Write exceptions into the policy before you need them

Every real remediation program encounters difficult cases: fragile legacy systems, unavailable vendor patches, change freezes, dependency risks, or unsupported platforms. If the SLA has no exception mechanism, teams will build one informally and hide it in email, ticket comments, or delayed meetings.

A good SLA makes exceptions hard enough to discourage abuse but simple enough to use when they are genuinely necessary. Each exception should record the vulnerability, affected asset, reason for delay, compensating controls, approving role, and review date.

What to write: “All exceptions must be documented, approved by the designated risk owner, time-bound, and revalidated on a fixed review schedule.”
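
The fields listed above translate directly into a record that can be validated before an exception is accepted. The following is a rough sketch: the class name, the 90-day cap in `MAX_EXCEPTION_DAYS`, and the requirement of at least one compensating control are assumptions, not requirements stated in this guide.

```python
from dataclasses import dataclass
from datetime import date, timedelta

MAX_EXCEPTION_DAYS = 90  # assumed cap on time between approval and review


@dataclass(frozen=True)
class RemediationException:
    """Hypothetical exception record with the fields the SLA should capture."""
    vulnerability: str
    asset: str
    reason: str
    compensating_controls: tuple[str, ...]
    approving_role: str
    approved_on: date
    review_date: date

    def problems(self) -> list[str]:
        """Return the reasons this exception is not acceptable (empty if none)."""
        issues = []
        for name in ("vulnerability", "asset", "reason", "approving_role"):
            if not getattr(self, name).strip():
                issues.append(f"{name} is required")
        if not self.compensating_controls:
            # Assumption: an exception without any compensating control is rejected.
            issues.append("at least one compensating control is required")
        if self.review_date <= self.approved_on:
            issues.append("review_date must be after approved_on")
        elif self.review_date - self.approved_on > timedelta(days=MAX_EXCEPTION_DAYS):
            issues.append(f"review_date must be within {MAX_EXCEPTION_DAYS} days of approval")
        return issues

    def needs_review(self, today: date) -> bool:
        """True once the review date has arrived, so the exception cannot quietly persist."""
        return today >= self.review_date


exc = RemediationException(
    vulnerability="EXAMPLE-1",
    asset="legacy-erp-01",
    reason="vendor patch unavailable",
    compensating_controls=("network segmentation",),
    approving_role="risk_owner",
    approved_on=date(2024, 1, 10),
    review_date=date(2024, 2, 9),
)
print(exc.problems(), exc.needs_review(date(2024, 2, 9)))
```

Keeping the checks in one place makes it harder for an exception to live only in email or ticket comments, because an incomplete record fails validation instead of being waved through.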

Measure compliance in a way that reflects real risk

The point of an SLA is not merely to produce compliance charts. It is to make the organization more predictable under pressure. That means the reporting model should track whether dangerous exposure is leaving the environment on time, not just whether tickets have been opened.

Useful measures include percentage of vulnerabilities remediated within SLA by tier, number of overdue Tier 1 and Tier 2 items, median time to validate exposure, median time to assign ownership, open exception count, and number of assets carrying repeated overdue findings.

What to write: “SLA performance will be reported monthly by remediation tier, business unit, asset class, and exception status, with separate visibility into actively exploited exposure.”
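
Two of these measures, the within-SLA rate by tier and the count of overdue Tier 1 and Tier 2 items, can be computed from a ticket export with very little code. The sketch below assumes a hypothetical `Ticket` record with a tier, a due time, and an optional close time; open tickets that are not yet due are left out of the rate so they count as neither a pass nor a fail.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Ticket:
    """Hypothetical remediation ticket; field names are illustrative."""
    tier: int
    due_at: datetime
    closed_at: Optional[datetime] = None  # None while the ticket is open


def within_sla_rate_by_tier(tickets, now):
    """Percent of tickets per tier closed on or before their due time.

    Only tickets with a known outcome are counted: closed tickets, and
    open tickets that are already past due (which count as misses).
    """
    met = defaultdict(int)
    total = defaultdict(int)
    for t in tickets:
        if t.closed_at is None and now <= t.due_at:
            continue  # still open and inside its window
        total[t.tier] += 1
        if t.closed_at is not None and t.closed_at <= t.due_at:
            met[t.tier] += 1
    return {tier: 100.0 * met[tier] / total[tier] for tier in sorted(total)}


def overdue_urgent(tickets, now):
    """Number of open Tier 1 and Tier 2 tickets that are past due."""
    return sum(
        1 for t in tickets
        if t.tier in (1, 2) and t.closed_at is None and now > t.due_at
    )


now = datetime(2024, 3, 15)
tickets = [
    Ticket(1, due_at=datetime(2024, 3, 4), closed_at=datetime(2024, 3, 3)),   # on time
    Ticket(1, due_at=datetime(2024, 3, 4), closed_at=datetime(2024, 3, 6)),   # late
    Ticket(1, due_at=datetime(2024, 3, 10)),                                  # open, overdue
    Ticket(2, due_at=datetime(2024, 3, 20)),                                  # open, not due
    Ticket(3, due_at=datetime(2024, 4, 1), closed_at=datetime(2024, 3, 12)),  # on time
]
print(within_sla_rate_by_tier(tickets, now), overdue_urgent(tickets, now))
```

Counting overdue open tickets as misses is a deliberate choice: a report that only scores closed tickets looks healthier the longer dangerous items stay open.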

Review the SLA on a schedule, not only after failure

Many organizations update remediation policy only after an incident or audit finding. That is too late. A workable SLA should be reviewed on a fixed cadence so timelines, asset classes, tooling assumptions, and exception rules can evolve with the environment.

At minimum, review it quarterly, and again after any major exploitation event, tooling change, merger, cloud migration, or change in the organization’s threat profile. The KEV-driven operational model described in “How to Build a KEV-Driven Patch Workflow Without Burning Out Your Team” becomes much more effective when the SLA is reviewed before it drifts out of sync with reality.

What to write: “This SLA will be reviewed at least quarterly and after material incidents or changes to critical infrastructure, threat exposure, or vulnerability management processes.”

A simple sample SLA language block

The following model is intentionally plain:

Tier 1 vulnerabilities, including KEV-listed or otherwise confirmed exploited CVEs affecting internet-facing or high-value systems, must be remediated or effectively mitigated within 72 hours unless a documented exception is approved. Tier 2 vulnerabilities must be remediated within 7 calendar days. Tier 3 vulnerabilities must be remediated within 30 calendar days. Tier 4 vulnerabilities must be remediated within 90 calendar days. Security assigns remediation tier. System owners execute remediation. Risk owners approve time-bound exceptions. Overdue items are escalated according to business impact and exploitation status.

This language should be customized, but the structure is what matters: clear scope, clear tiers, clear owners, clear exceptions, clear escalation.

Final takeaway

A vulnerability remediation SLA works when it reflects how incidents actually happen. That means separating severity from urgency, shortening deadlines when exploitation and exposure align, naming accountable owners, documenting exceptions, and measuring what gets fixed on time. Teams do not follow SLAs because they are beautifully written. They follow them when the language matches operational reality and the deadlines are tough enough to matter without being detached from how IT actually works.
