Sprint planning

Retrospectives that change behavior

Formats that work (Mad/Sad/Glad, Start/Stop/Continue, Sailboat, 4Ls, Lean Coffee), formats that don't, and the action-item discipline that turns retros into actual change.

May 16, 2026 · 9 min read

Most retrospectives are theatre. The team gathers, complains about the same issues for 30 minutes, votes on three action items, and then never tracks whether those action items happened. Six months later, the same complaints surface again.

A retrospective is supposed to change behaviour. If yours don't, you don't have retros. You have a recurring grievance forum.

This article is about closing that gap: formats that work, formats that don't, the action-item discipline, and what AI can usefully do (and what it shouldn't).

What a retro is for

Two jobs.

Surface patterns from the sprint that the team can act on: not every story-level issue, but the issues with momentum across the sprint.
Decide on 1-3 specific behavioural changes that will be visible in the next sprint. Vague "communicate better" doesn't count. "Standup gets a 5-min hard timer, enforced by the scrum master" counts.

If your retro produces neither, it's wasted time.

Formats that work

Mad / Sad / Glad

The simplest format. Three columns. What made you mad? Sad? Glad?

Works because: low cognitive overhead, emotional access (good signal in software teams that under-discuss emotional load), trivial to facilitate.

Doesn't work when: the team always writes the same things in each column. At that point, switch formats.

Start / Stop / Continue

Three columns again. What should the team start doing? Stop doing? Continue doing?

Works because: each item maps directly to an action. "Stop scheduling planning the day after a release" is an immediately actionable change.

Doesn't work when: the team is gun-shy about saying "stop." Some teams accumulate "continue" items as a polite way to avoid hard conversations.

Sailboat

Drawing on a whiteboard (or its remote equivalent): a sailboat with wind, anchors, rocks, and an island.

Wind = what's pushing us forward
Anchors = what's holding us back
Rocks = risks ahead
Island = the goal

Works because: visual + metaphorical, surfaces risks and goals (which the other formats don't), good when the team is bored of the column-based formats.

Doesn't work when: the team includes people who hate metaphors. They'll stay quiet rather than engage.

The 4Ls (Liked / Learned / Lacked / Longed for)

Four columns this time. Adds "longed for": what's missing from the team's environment.

Works because: surfaces structural / environmental issues the team can escalate. Useful when the issues sit at leadership level rather than within the team.

Lean Coffee

No fixed columns. Team members brainstorm topics on stickies, vote on which to discuss, and work through them in vote order with a per-topic timer (5-7 min), moving on when time's up. Topics that didn't make the cut roll over to the next retro.

Works because: democratic agenda, time-boxed, prevents one topic from eating the whole hour.

Doesn't work when: the facilitator doesn't enforce the timer. Topics drag, and the team covers half the ground.
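The Lean Coffee mechanics (rank by votes, fixed time-box, roll the rest over) fit in a few lines. This is a minimal sketch, assuming votes are already tallied and the meeting length and per-topic time-box are fixed up front; `Topic`, `plan_agenda`, and the sample topics are hypothetical names for illustration, not part of any tool.

```python
from dataclasses import dataclass


@dataclass
class Topic:
    title: str
    votes: int  # dot votes from the brainstorm


def plan_agenda(
    topics: list[Topic], meeting_minutes: int, minutes_per_topic: int = 7
) -> tuple[list[Topic], list[Topic]]:
    """Split topics into this retro's agenda and the roll-over list.

    Topics are taken in descending vote order. Each gets one fixed
    time-box, so meeting_minutes // minutes_per_topic topics fit.
    """
    if minutes_per_topic <= 0:
        raise ValueError("minutes_per_topic must be positive")
    # sorted() is stable even with reverse=True, so tied topics keep brainstorm order.
    ranked = sorted(topics, key=lambda t: t.votes, reverse=True)
    slots = max(meeting_minutes, 0) // minutes_per_topic
    return ranked[:slots], ranked[slots:]


topics = [
    Topic("Flaky CI on main", 5),
    Topic("Planning day after release", 3),
    Topic("On-call handover", 4),
    Topic("Too many meetings", 2),
]
agenda, rolled_over = plan_agenda(topics, meeting_minutes=20)
print([t.title for t in agenda])       # ['Flaky CI on main', 'On-call handover']
print([t.title for t in rolled_over])  # ['Planning day after release', 'Too many meetings']
```

The fixed time-box is the point: the agenda length is decided by the clock, not by whoever talks longest, which is exactly the failure mode the timer exists to prevent.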

Formats that don't work

No format. "Let's just talk about how the sprint went." Devolves into senior voices dominating and the same three issues coming up.

The Same Format Every Sprint. Even good formats get stale. Mad/Sad/Glad for 18 months produces fatigue. Rotate every quarter.

The Read-the-Burndown-Out-Loud retro. Some teams open with 15 minutes of metrics review. By the time the team is supposed to discuss, energy is dead. Metrics are inputs to the retro; they're not the retro.

The Confession Booth. The team only surfaces individual mistakes. Useful zero times. Retrospectives are about systems, not individuals. If someone made a specific error, the conversation is a 1:1, not a retro.

The action-item discipline

This is where most retros fail. Lots of energy in the meeting → 3 action items → none of them happen → next retro produces 3 more.

Three rules:

1. Each action item has an owner. Not "the team." A specific person who's accountable for the change happening.

2. Each action item has a definition of done. "Improve communication" is not a done-able thing. "Add a #blockers channel and require eng leads to post a daily blocker update by 10am" is.

3. Each action item gets reviewed at the start of the next retro. Did it happen? If yes, did it work? If no, why not? This is the discipline that turns retrospectives into behaviour change.
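The three rules map naturally onto a small data model. This is a minimal sketch, assuming the team keeps action items somewhere structured enough to check; `ActionItem`, `validate_actions`, `review_agenda`, the owner name, and the set of "collective" owner strings are all hypothetical. The status values (done, in-progress, abandoned) are the ones named in this article's FAQ.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class Status(Enum):
    DONE = "done"
    IN_PROGRESS = "in-progress"
    ABANDONED = "abandoned"


@dataclass
class ActionItem:
    description: str
    owner: str               # rule 1: one named person, not "the team"
    definition_of_done: str  # rule 2: an observable outcome
    due: date
    status: Status = Status.IN_PROGRESS  # rule 3: updated at the next retro


MAX_ACTIONS = 3
# Hypothetical owner values that really mean "nobody".
COLLECTIVE_OWNERS = {"", "team", "the team", "everyone", "all"}


def validate_actions(actions: list[ActionItem]) -> list[str]:
    """Return rule violations; an empty list means the set is acceptable."""
    problems = []
    if len(actions) > MAX_ACTIONS:
        problems.append(f"{len(actions)} actions; keep it to {MAX_ACTIONS} or fewer")
    for a in actions:
        if a.owner.strip().lower() in COLLECTIVE_OWNERS:
            problems.append(f"'{a.description}': needs a single named owner")
        if not a.definition_of_done.strip():
            problems.append(f"'{a.description}': needs a definition of done")
    return problems


def review_agenda(previous: list[ActionItem]) -> list[str]:
    """Opening agenda for the next retro: every prior action, named explicitly."""
    return [
        f"{a.description} (owner: {a.owner}, due {a.due.isoformat()}): {a.status.value}"
        for a in previous
    ]


actions = [
    ActionItem(
        "Add a #blockers channel",
        "Priya",  # hypothetical owner
        "Channel exists and eng leads post a daily blocker update by 10am",
        date(2026, 5, 30),
    ),
    ActionItem("Improve communication", "the team", "", date(2026, 5, 30)),
]
for problem in validate_actions(actions):
    print(problem)
# 'Improve communication': needs a single named owner
# 'Improve communication': needs a definition of done
```

Running `review_agenda` on last retro's list gives the first agenda item of the next one, so no action can quietly disappear.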

The role of AI

AI is genuinely useful in retros on the data side. Specifically:

Pattern recognition. For example: across 6 sprints, the team's stories that touched the auth module took 1.7x their estimated time. The team didn't notice; the AI did.

Sentiment trends. Standup comments and PR review comments over the sprint can be summarised: the team's tone shifted negative around day 7. What happened then? The model finds the inflection point.

Action item tracking. Did last retro's action items happen? The AI can match the action ("Add a #blockers channel") against actual events (channel created? eng leads posting?) and surface the answer.

Pattern correlation. Sprints where the team hit their goal had different characteristics than sprints where they didn't. The model can surface those characteristics, usually 2-3 actionable patterns.
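The arithmetic behind a finding like the auth-module example above is plain aggregation once story data can be exported. This is a minimal sketch, assuming each completed story carries a module tag plus estimated and actual hours; `Story`, `overrun_by_module`, the 1.3 threshold, and the sample data are hypothetical, and this is not a description of what any particular AI tool does internally.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Story:
    module: str  # hypothetical tag: the area of code the story touched
    estimated_hours: float
    actual_hours: float


def overrun_by_module(stories: list[Story], threshold: float = 1.3) -> dict[str, float]:
    """Return {module: actual/estimated ratio} for modules above threshold.

    Uses the ratio of summed hours, so large stories weigh more than
    small ones. Worst offenders come first.
    """
    estimated: dict[str, float] = defaultdict(float)
    actual: dict[str, float] = defaultdict(float)
    for s in stories:
        estimated[s.module] += s.estimated_hours
        actual[s.module] += s.actual_hours
    ratios = {m: actual[m] / estimated[m] for m in estimated if estimated[m] > 0}
    ranked = sorted(ratios.items(), key=lambda kv: kv[1], reverse=True)
    return {m: round(r, 2) for m, r in ranked if r > threshold}


# Made-up stories pooled from several sprints.
stories = [
    Story("auth", 8, 14),
    Story("auth", 5, 8),
    Story("billing", 10, 11),
    Story("auth", 4, 7),
    Story("search", 6, 5),
]
print(overrun_by_module(stories))  # {'auth': 1.71}
```

Feed it the stories from whichever sprints you want to compare. Summing before dividing means one large overrun counts for more than several small misses; averaging per-story ratios is the alternative if every story should count equally.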

What AI should NOT do: lead the retrospective. The conversations of a retro depend on the team's emotional and political dynamics. The AI is a data input, not a facilitator.

The Plan module surfaces patterns across sprints: which stories took longer than estimated, where capacity drift happened, what changed when the team hit (or missed) its goal.


Frequently asked questions

What retrospective format should we use?

Match the format to the sprint. Sailboat (vision/anchors/winds/rocks) for sprints where the goal mattered. 4Ls (Liked/Learned/Lacked/Longed-for) after an incident or rough sprint. Lean Coffee for healthy teams that can self-direct. Don't default to Mad/Sad/Glad every time. Pattern fatigue kills participation.

How many action items should a retro produce?

No more than three actions per retro. More guarantees none get done. Each action has one named owner and a date (next sprint, end-of-month). "The team will..." with no owner is dead on arrival. The discipline of tracking those 3 actions to completion in the next retro is what makes retros change behaviour.

How do we make sure retro actions actually happen?

First agenda item of the next retro: status of each prior action. Done, in-progress, abandoned. Name each explicitly. Without this reopen-and-status step, action items become aspirational; with it, they become commitments the team feels accountable for.

Should retros include data or just feelings?

Both. Spend the first 5 minutes on facts (velocity, carry-over, defects shipped, on-call interruptions, goal vs delivered). Data anchors the conversation in reality and surfaces what humans would otherwise miss. Then move to discussion: feelings about the data, root-cause analysis, action items.