Root Cause Analysis Template for Engineering and Compliance Teams (Free)

Frank Sikora March 30, 2026 14 min read

A root cause analysis template gives teams a consistent structure for investigating failures — not just documenting what broke, but tracing it to why it happened and what needs to change to prevent recurrence. Without a formal template, RCA reports range from a paragraph of meeting notes to a 30-page narrative with no consistent field structure, which makes pattern analysis across incidents impossible. This guide provides a complete root cause analysis template built for regulated industry compliance workflows and SRE/DevOps incident postmortems. See also: regulatory compliance documentation and lessons learned template.

What Is a Root Cause Analysis Template?

A root cause analysis template is a structured document format for recording the investigation of a failure or nonconformance — capturing the problem statement, timeline, contributing factors, identified root cause, corrective actions, and the evidence trail connecting each finding to a specific preventive measure.

RCA templates are distinct from incident logs and lessons learned documents, though they are often confused. An incident log records what happened and when. A lessons learned document captures general takeaways for future projects. An RCA is an investigative artifact: it answers why the failure occurred and what systemic change will prevent it from recurring. In regulated industries, that distinction matters because auditors assessing against FDA, AS9100, or ISO 9001 requirements expect to see a documented investigation chain, not just a list of observations.

A well-formed root cause analysis does three things: it identifies the immediate cause (what triggered the failure), the contributing causes (conditions that allowed it to occur), and the root cause (the systemic deficiency that, if corrected, prevents recurrence). Templates that capture all three levels produce actionable corrective actions. Templates that stop at the immediate cause generate fixes that address symptoms rather than systems.
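As a minimal sketch of why the three levels are worth keeping as separate fields, the Python below records them on one object and counts root cause categories across several analyses. The class name, field names, and the `root_cause_category` taxonomy are illustrative assumptions, not part of any standard or of the template in this guide.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class CauseAnalysis:
    """Three cause levels kept as separate fields (names are illustrative)."""
    immediate_cause: str
    contributing_causes: list[str] = field(default_factory=list)
    root_cause: str = ""
    root_cause_category: str = ""  # hypothetical taxonomy, e.g. "missing control"

    def is_complete(self) -> bool:
        # An analysis that stops at the immediate cause is not complete.
        return bool(self.immediate_cause.strip() and self.root_cause.strip())


def root_cause_trends(analyses: list[CauseAnalysis]) -> Counter:
    """Count completed analyses per root cause category."""
    return Counter(a.root_cause_category for a in analyses if a.is_complete())
```

Because every record carries the same fields, a count like this can be run across incidents, which is not possible when each report is free-form narrative.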

When Do You Need a Root Cause Analysis Template?

Several regulatory frameworks and operational contexts require or strongly benefit from formal RCA documentation.

RCA Methods: Which Template Format to Use

Different failure contexts call for different analytical methods. The RCA template format you choose should match the complexity of the problem and the expectations of your regulatory or operational context.

| Method | Best For | Depth | Compliance Fit |
| --- | --- | --- | --- |
| **5 Whys** | Simple to moderately complex failures; process deviations | Moderate | ISO 9001, FDA CAPA (lighter investigations) |
| **Fishbone / Ishikawa** | Complex failures with multiple contributing cause categories | High | AS9100, ISO 9001, manufacturing NCRs |
| **Fault Tree Analysis (FTA)** | Safety-critical systems; events with multiple failure paths | Very High | Aerospace, defense, medical device risk management |
| **8D (Eight Disciplines)** | Customer complaints; supplier defects requiring formal response | High | Automotive (IATF 16949), manufacturing, defense supply chain |
| **Incident Postmortem** | Software service outages; security incidents | Moderate–High | SOC 2, ISO 27001, SRE operations |

For most regulated industry contexts — FDA CAPA, ISO 9001, AS9100 — the 5 Whys or Fishbone method combined with a structured template is sufficient. For safety-critical aerospace and defense programs, Fault Tree Analysis is often required by the program contract. For customer-facing supplier quality responses, 8D is the expected format. The template below is structured to support both 5 Whys and Fishbone analysis within the same document.

Root Cause Analysis Template

The following template covers the sections typically expected in an RCA record under FDA 21 CFR Part 820, ISO 9001 Clause 10.2, and AS9100 Rev D. Copy it and adapt it to your organization’s document control system.

ROOT CAUSE ANALYSIS REPORT

Document Number: [RCA-YYYY-###]
Revision: [1.0]
Date Initiated: [YYYY-MM-DD]
Date Completed: [YYYY-MM-DD]
Status: [Open / In Progress / Closed / Verified Effective]

─────────────────────────────────────────────────────────────
SECTION 1: PROBLEM IDENTIFICATION
─────────────────────────────────────────────────────────────

1.1 Problem Statement
[Describe the nonconformance, failure, or incident in one to three
sentences. Be specific: what happened, where, when, and under what
conditions. Avoid conclusions about cause at this stage.]

1.2 Problem Detection
- Date/Time Detected: [YYYY-MM-DD HH:MM]
- Detected By: [Name / Role / System]
- Detection Method: [Inspection / Test / Customer Complaint / Automated Alert / Audit]

1.3 Scope and Containment
- Products/Services Affected: [List affected items, versions, or lots]
- Customer/End-User Impact: [None / Internal Only / External — describe]
- Containment Actions Taken: [Immediate steps taken to stop further impact]
- Containment Date: [YYYY-MM-DD]

1.4 Evidence References
- [List supporting documents, logs, test records, photos, data exports]

─────────────────────────────────────────────────────────────
SECTION 2: INVESTIGATION TIMELINE
─────────────────────────────────────────────────────────────

[Chronological sequence of events leading to the failure. Include
contributing changes, decisions, or system states.]

Date/Time | Event | Evidence
[YYYY-MM-DD HH:MM] | [Event description] | [Reference]
[YYYY-MM-DD HH:MM] | [Event description] | [Reference]
[YYYY-MM-DD HH:MM] | [FAILURE OCCURRED] | [Reference]

─────────────────────────────────────────────────────────────
SECTION 3: ROOT CAUSE ANALYSIS
─────────────────────────────────────────────────────────────

3.1 Analysis Method Used
[ ] 5 Whys   [ ] Fishbone (Ishikawa)   [ ] Fault Tree   [ ] 8D   [ ] Other: ______

3.2 Immediate Cause
[The direct trigger of the failure — the last event in the causal chain.
This is what you would fix to resolve this specific occurrence.]

3.3 Contributing Causes
[Conditions that enabled the failure to occur. May include process gaps,
training deficiencies, tool limitations, or environmental factors.]
- Contributing Cause 1:
- Contributing Cause 2:
- Contributing Cause 3:

3.4 Root Cause
[The systemic deficiency that, if corrected, prevents recurrence of this
class of failure. Trace back from the immediate cause using your chosen
method. The root cause is usually a gap in a system, process, or control —
not a human error.]

Root Cause Statement: [One to two sentences. Clearly link the systemic
deficiency to the failure that occurred.]

3.5 5 Whys Analysis (if applicable)
Why 1: [Why did the failure occur?] → [Answer]
Why 2: [Why did [Answer to Why 1] occur?] → [Answer]
Why 3: [Why did [Answer to Why 2] occur?] → [Answer]
Why 4: [Why did [Answer to Why 3] occur?] → [Answer]
Why 5: [Why did [Answer to Why 4] occur?] → [Root Cause]

─────────────────────────────────────────────────────────────
SECTION 4: CORRECTIVE AND PREVENTIVE ACTIONS (CAPA)
─────────────────────────────────────────────────────────────

4.1 Corrective Actions (address this specific occurrence)

Action | Owner | Due Date | Status | CAPA Reference
[Action description] | [Name/Role] | [YYYY-MM-DD] | [Open/Closed] | [CAPA-###]

4.2 Preventive Actions (address recurrence prevention)

Action | Owner | Due Date | Status | Document Reference
[Action description] | [Name/Role] | [YYYY-MM-DD] | [Open/Closed] | [SOP-### / Rev X]

4.3 Affected Documentation
[List any SOPs, work instructions, configuration plans, or design documents
that must be updated as a result of this RCA. Include document numbers and
revision levels.]
- [Document Number]: [Document Title] — [Required Change]

─────────────────────────────────────────────────────────────
SECTION 5: EFFECTIVENESS VERIFICATION
─────────────────────────────────────────────────────────────

5.1 Verification Method
[How will you confirm that the corrective action eliminated the root cause?
Describe the metric, inspection, test, or audit that will verify effectiveness.]

5.2 Verification Due Date: [YYYY-MM-DD]
5.3 Verification Completed By: [Name / Role]
5.4 Verification Result: [Effective / Not Effective — describe findings]

─────────────────────────────────────────────────────────────
SECTION 6: APPROVALS AND REVISION HISTORY
─────────────────────────────────────────────────────────────

Prepared By: [Name] | [Role] | [Date]
Reviewed By: [Name] | [Role] | [Date]
Approved By: [Name] | [Role] | [Date]

Rev | Date | Author | Description of Change
1.0 | [YYYY-MM-DD] | [Name] | Initial release
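If the Section 2 timeline is kept in the pipe-separated layout shown in the template, a short script can flag rows that lack an evidence reference or fall out of chronological order. The following is a sketch under that assumption; the function names are hypothetical, and it does not handle event descriptions that themselves contain a `|` character.

```python
from datetime import datetime


def parse_timeline(rows: list[str]) -> list[tuple[datetime, str, str]]:
    """Parse 'YYYY-MM-DD HH:MM | Event | Evidence' rows from Section 2."""
    events = []
    for row in rows:
        parts = [part.strip() for part in row.split("|")]
        if len(parts) != 3:
            raise ValueError(f"expected 3 pipe-separated fields: {row!r}")
        when, event, evidence = parts
        events.append((datetime.strptime(when, "%Y-%m-%d %H:%M"), event, evidence))
    return events


def timeline_problems(events: list[tuple[datetime, str, str]]) -> list[str]:
    """Return findings as text; an empty list means no issues were found."""
    problems = []
    for index, (when, event, evidence) in enumerate(events):
        if not evidence:
            problems.append(f"no evidence reference for: {event}")
        # Compare each row with the one before it to catch ordering errors.
        if index > 0 and when < events[index - 1][0]:
            problems.append(f"out of chronological order: {event}")
    return problems
```

A check like this does not judge whether the evidence is adequate; it only confirms that each row points at something a reviewer can look up.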

How to Conduct a Root Cause Analysis: Step-by-Step

Following the template structure is necessary but not sufficient. The quality of an RCA depends on how the investigation is conducted, not just how the document is formatted. These steps apply whether you are investigating a manufacturing nonconformance, a software outage, or a quality system audit finding.

  1. Define the problem precisely before starting the investigation. A vague problem statement produces a vague root cause. “Production deployment failed” is a log entry. “The v4.2.1 deployment to the payments service at 14:32 UTC on March 28 was followed by a 23-minute outage affecting all transactions above $10,000, during which the database connection pool was exhausted” is a problem statement. It records what was observed without yet asserting why it happened. Specificity constrains the investigation and makes the final corrective action verifiable.

  2. Contain the impact before investigating. Containment actions are not root cause fixes — they are emergency measures to limit ongoing harm. Document what containment was applied, when, and by whom. This distinguishes the immediate response from the systemic correction and prevents the two from being conflated in the CAPA record.

  3. Build the timeline with evidence. Reconstruct the sequence of events using logs, records, observations, and data — not recollections. Timelines built from memory collapse under audit. Every event in the timeline should reference a specific piece of evidence: a log timestamp, a signed record, a test result, a change ticket.

  4. Apply the analysis method consistently. Choose a method — 5 Whys, Fishbone, Fault Tree — and apply it to its conclusion. The most common failure in RCA documentation is stopping at the contributing cause and labeling it the root cause. Continue asking “why” until you reach a systemic gap: a missing control, an undefined process, an untrained role, a design assumption that proved false.

  5. Distinguish corrective from preventive actions. A corrective action addresses this specific occurrence. A preventive action addresses the class of failure. Both are required. “Rolled back the deployment” is a corrective action. “Added a pre-deployment database connection pool health check to the CI/CD pipeline” is a preventive action. In FDA CAPA documentation, conflating the two is a common Form 483 observation.

  6. Identify all affected documentation. RCA findings frequently require updates to SOPs, work instructions, configuration plans, training materials, or design documents. Identify these at the time of the RCA and open change requests against each affected document. The configuration management plan should define the process for linking RCA findings to configuration change requests.

  7. Plan and track effectiveness verification. The corrective action is not closed until it has been verified as effective. Define a specific, measurable verification method — not “monitor for recurrence” — and assign an owner and due date. Unverified CAPAs are a common ISO 9001 audit finding.
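Step 4 above, carrying the chosen method through to its conclusion, can be sketched for the 5 Whys case as a chain of question and answer pairs whose last answer is the candidate root cause. The `Why` class, the function name, and the `min_depth` guard are illustrative assumptions. No standard mandates a fixed number of levels, and the code cannot judge whether the final answer is truly systemic; that remains a reviewer's call.

```python
from dataclasses import dataclass


@dataclass
class Why:
    question: str
    answer: str


def five_whys_root_cause(chain: list[Why], min_depth: int = 3) -> str:
    """Return the last answer in the chain as the candidate root cause.

    min_depth is an illustrative guard against stopping too early,
    not a rule taken from any standard.
    """
    if len(chain) < min_depth:
        raise ValueError(
            f"chain has {len(chain)} levels; expected at least {min_depth}"
        )
    for level, why in enumerate(chain, start=1):
        # Every level must be answered before the chain can be concluded.
        if not why.answer.strip():
            raise ValueError(f"Why {level} has no answer")
    return chain[-1].answer
```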

RCA in Regulated Industries: Closing the Loop with CAPAs and SOPs

The most significant gap between RCA best practices and what auditors actually find is the absence of a closed loop between the RCA finding and the organization’s controlled document set. Teams complete the investigation, identify the root cause, assign corrective actions, and then fail to update the SOPs, work instructions, or configuration documents that govern the process where the failure occurred.

Under FDA 21 CFR 820.100, the CAPA procedure must include processes to verify or validate the corrective and preventive action to ensure that such action does not adversely affect the finished device, and to implement and record changes in methods and procedures needed to correct and prevent identified quality problems. That last clause, “changes in methods and procedures,” is where most organizations have audit exposure. The RCA identifies the root cause. The CAPA assigns the fix. But the SOP or work instruction that governs the process in question never gets updated, or gets updated informally without document control.

Closing the loop requires three explicit steps in every RCA: First, identify every procedure or document that governs the process where the root cause exists. Second, open a formal document change request against each affected document, referencing the RCA number. Third, link the CAPA effectiveness verification to the release of the updated document. Until the updated standard operating procedure or work instruction is approved and in use, the corrective action is not complete — regardless of what the CAPA status field says.

For AS9100 and ISO 9001 contexts, the same principle applies to configuration management. If the root cause involves a component specification, a software configuration parameter, or a design baseline, the corrective action must include a configuration change request processed through the organization’s configuration management system. An RCA that fixes a process without updating the governing document creates a version conflict between what the document says and what the team actually does.
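The closed-loop rule described in this section can be expressed as a simple completeness check: a corrective action counts as complete only when every linked document change request references the RCA and the updated revision has been released. The sketch below assumes each action has at least one governing document; the class names, field names, and the example identifiers are hypothetical and would map onto whatever your document control system actually stores.

```python
from dataclasses import dataclass, field


@dataclass
class DocumentChangeRequest:
    document_number: str     # e.g. "SOP-014" (hypothetical identifier)
    rca_number: str          # the RCA this change request references
    released: bool = False   # True once the revision is approved and in use


@dataclass
class CorrectiveAction:
    description: str
    rca_number: str
    change_requests: list[DocumentChangeRequest] = field(default_factory=list)

    def open_items(self) -> list[str]:
        """List the reasons this action cannot yet be treated as complete."""
        # Simplifying assumption: every action has a governing document.
        if not self.change_requests:
            return ["no document change request is linked to this action"]
        items = []
        for cr in self.change_requests:
            if cr.rca_number != self.rca_number:
                items.append(
                    f"{cr.document_number}: change request does not "
                    f"reference {self.rca_number}"
                )
            if not cr.released:
                items.append(f"{cr.document_number}: updated revision not yet released")
        return items

    def is_complete(self) -> bool:
        return not self.open_items()
```

Returning the list of open items, rather than a bare true or false, gives the CAPA owner something to act on when closure is blocked.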

Incident Postmortem Template for SRE and DevOps Teams

SRE and DevOps teams conducting blameless postmortems use the same analytical structure as regulated industry RCAs, with lighter formality requirements and a stronger emphasis on system-level rather than individual-level analysis. The following template is optimized for software incident investigations.

INCIDENT POSTMORTEM

Incident ID: [INC-YYYY-###]
Severity: [SEV1 / SEV2 / SEV3]
Date of Incident: [YYYY-MM-DD]
Postmortem Date: [YYYY-MM-DD]
Incident Commander: [Name]
Participants: [Names and roles of postmortem contributors]

─────────────────────────────────────────────────────────────
INCIDENT SUMMARY
─────────────────────────────────────────────────────────────

Duration: [HH:MM — from first alert to full resolution]
Impact: [Number of users affected / services degraded / error rate / SLO breach]
Detection: [How was the incident detected? Alert, customer report, synthetic monitor?]

─────────────────────────────────────────────────────────────
TIMELINE
─────────────────────────────────────────────────────────────

[All times UTC]
HH:MM — [Event]
HH:MM — [Alert fired / detection]
HH:MM — [Incident declared / responders paged]
HH:MM — [Investigation steps]
HH:MM — [Mitigation applied]
HH:MM — [Service restored]
HH:MM — [Incident closed]

─────────────────────────────────────────────────────────────
ROOT CAUSE ANALYSIS
─────────────────────────────────────────────────────────────

Immediate Cause:
[What directly triggered the incident?]

Contributing Factors:
- [Factor 1: technical condition, configuration, or design choice that enabled the failure]
- [Factor 2]
- [Factor 3]

Root Cause:
[The systemic gap — in monitoring, deployment process, configuration management,
runbook coverage, or system design — that allowed this class of incident to occur.
Not a person. Not a typo. The system that failed to prevent it.]

─────────────────────────────────────────────────────────────
ACTION ITEMS
─────────────────────────────────────────────────────────────

Action | Owner | Priority | Due Date | Ticket
[Action description] | [Name] | [P1/P2/P3] | [YYYY-MM-DD] | [TICKET-###]

─────────────────────────────────────────────────────────────
LESSONS LEARNED
─────────────────────────────────────────────────────────────

What went well:
- [Detection, response, communication, tooling that worked]

What could have gone better:
- [Gaps in process, tooling, or response that contributed to duration or impact]

Where we got lucky:
- [Near-misses or conditions that limited scope of impact but could have been worse]

The key distinction between an SRE postmortem and a regulated industry RCA is the “where we got lucky” section. Blameless postmortem culture treats near-misses as equivalent to failures for investigation purposes — a system that almost failed reveals the same systemic gaps as one that did. Regulated industries can adopt this section as a risk-based analysis input.
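The Duration field in the postmortem template is easy to get wrong by hand, so it can be derived from the timeline instead. The following sketch assumes both timestamps use the template's `HH:MM` UTC format and that the incident lasted less than 24 hours; the function name is hypothetical.

```python
from datetime import datetime, timedelta


def incident_duration(first_alert: str, resolved: str) -> str:
    """Return the elapsed time as HH:MM between two 'HH:MM' UTC timestamps.

    Assumes the incident lasted under 24 hours. A resolution time earlier
    than the alert time is treated as falling on the following day.
    """
    start = datetime.strptime(first_alert, "%H:%M")
    end = datetime.strptime(resolved, "%H:%M")
    if end < start:
        end += timedelta(days=1)
    minutes = int((end - start).total_seconds() // 60)
    return f"{minutes // 60:02d}:{minutes % 60:02d}"
```

For incidents that may run longer than a day, full dates would need to be recorded in the timeline rather than times alone.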

Common Mistakes in Root Cause Analysis Documentation

Across hundreds of reviewed RCA records, the same documentation failures appear repeatedly, in regulated industry audits and DevOps postmortems alike.

How TechWrite Streamlines the RCA-to-CAPA Documentation Workflow

The most time-consuming part of compliant RCA documentation is not the investigation itself — it is the downstream document work: identifying which SOPs reference the affected process, opening change requests against each one, updating them, and maintaining an audit trail that connects the RCA finding to the final approved document revision.

TechWrite AI connects the RCA record directly to your controlled document set. When an RCA identifies a process gap, TechWrite searches your document library for every procedure, work instruction, and configuration document that governs the affected process — and surfaces the specific sections that need to change. You can generate a draft corrective revision, review it in context, and publish an updated document with a complete audit trail linking it back to the originating RCA. The result is a closed loop between finding and fix, with traceable documentation that satisfies FDA 21 CFR Part 820, ISO 9001 Clause 10.2, and AS9100 Rev D audit requirements.

Once your RCA is complete, pair it with a lessons learned document to capture broader takeaways for future projects, and review your technical documentation templates to ensure all affected document types are covered in your document control system.
