A team can ship every week and still go nowhere. Features land, dashboards stay flat. PRs merge, users do not notice. The roadmap fills with checkmarks and the business does not improve. This is not a hypothetical. It is the default state of most engineering organizations.

The reason is simple: it is easier to measure output than outcomes. Output is what you produced. Outcomes are what changed because of it. Output is fully under your control. Outcomes are not.

The output trap

Engineering culture rewards visible activity. Lines of code. Story points. Features shipped. Tickets closed. These are easy to count, easy to celebrate, and easy to put in a quarterly review. They feel like progress.

But they are not progress. They are motion. A treadmill runs the same way whether you are training or wasting time. The number of features shipped tells you nothing about whether anyone benefited. The number of PRs merged tells you nothing about whether the product got better.

Worse, optimizing for output actively distorts behavior. If output is the metric, people produce more output — bigger PRs, more features, more code — regardless of whether any of it matters. The dashboard fills up. The product does not improve.

Output is comfortable

The reason output dominates engineering reporting is not that it is useful. It is that it is comfortable. Output is fully under the team’s control. You can decide to ship five features this quarter; you cannot decide to move conversion by 3%.

Reporting output lets the team claim success without ever finding out whether the work mattered. That comfort is exactly what makes it dangerous. The metric that protects you from bad news also protects you from learning.

Output is what hides failure

A team that ships ten features a quarter looks productive. A team that ships three features a quarter, each of which moves the numbers, looks slower. The second team is doing better work. The first team is only doing more work.

When output is what gets reported, output is what gets optimized. Teams ship features whose value nobody bothered to measure. The roadmap is full. The business is stalling.

Output reports lie kindly

A quarterly report that lists features shipped sounds good. It is also nearly content-free. “We shipped a new dashboard, a notifications system, and v2 of the search.” Did anyone use the dashboard? Did notifications drive engagement? Did v2 of the search outperform v1?

The output report does not say. The outcome report does. That is why the output report is more popular.

What outcomes are

An outcome is a change in the world. A user can do something they could not do before. A metric moved. A problem disappeared. A frustration ended. Conversion went up. Latency went down. A team that took two weeks to deploy now deploys in two hours.

Outcomes are harder to measure because they require you to ask: did this work? That question is uncomfortable. The honest answer is sometimes “no”. Output never gives you that answer, which is why so many teams hide behind it.

Outcomes are about change

An outcome is the delta between before and after. Not “signup rate is 12%” but “signup rate went from 9% to 12% after the redesign”. The number alone is a state. The change is the outcome.

If you cannot describe the before, you cannot describe the outcome.
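The signup example can be made concrete with a few lines of arithmetic. This is a minimal sketch: the function name `describe_change` and its return keys are invented for illustration, and only the 9% and 12% figures come from the example above.

```python
def describe_change(before: float, after: float) -> dict:
    """Describe an outcome as a change between two rates (given as fractions)."""
    if before <= 0:
        # Without a usable "before", the relative change cannot be stated.
        raise ValueError("a positive 'before' value is required")
    absolute = after - before      # difference between the two rates
    relative = absolute / before   # change relative to the starting point
    return {
        "absolute_points": round(absolute * 100, 2),  # percentage points
        "relative_pct": round(relative * 100, 1),     # percent of the baseline
    }


# Signup rate went from 9% to 12% after the redesign.
change = describe_change(before=0.09, after=0.12)
print(change)
```

Reporting both numbers avoids ambiguity: the same redesign is a 3-point change and roughly a one-third improvement over the baseline, and neither can be stated without the "before".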

Outcomes are about people

Most outcomes are changes in human behavior. A user did something they were not doing before. A teammate completed work faster. A customer renewed instead of churning. The metric is just the proxy — the real outcome is the change in what people are doing or experiencing.

When designing a feature, start with the behavior change you want. Work backward to the metric that would measure it. Work backward again to the feature that would produce it.

Outcomes are partly out of your control

This is the hard part. You can ship the feature. You cannot make users adopt it. You can write the code. You cannot make the metric move.

This is not a reason to avoid outcome thinking. It is the reason outcome thinking matters. Output measures what you did. Outcome measures what mattered. The work you cannot fully control is the work that determines whether your shipping was worth anything.

Define success before you ship

The discipline of outcome-thinking starts before the work. You commit, in advance, to what success looks like — and you commit honestly enough that you might fail to meet it.

State the change you want

Bad:

“Build a new onboarding flow.”

Better:

“Build a new onboarding flow. We expect activation rate to increase from 35% to at least 45%. If after a month it is below 40%, we will redesign or revert.”

The first describes the work. The second describes the outcome the work is supposed to produce. Only the second can be honestly evaluated.

Pick the right number

The number you pick to measure success has to actually measure success. “Number of users who saw the new onboarding flow” is not it — every user will see it eventually. “Activation rate among users who go through the new flow” is closer.

If the number you can measure does not capture the change you want, the change you want is not yet well-defined. Refine it until the measurement matches the intent.
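The difference between the two numbers is easy to see in code. The sketch below assumes a hypothetical per-user record with two boolean fields, `went_through_new_flow` and `activated`; a real analytics schema will look different.

```python
def exposure_count(users: list[dict]) -> int:
    """Count users who saw the new flow. This grows no matter how well the flow works."""
    return sum(1 for u in users if u["went_through_new_flow"])


def activation_rate(users: list[dict]) -> float:
    """Share of users who went through the new flow and then activated."""
    exposed = [u for u in users if u["went_through_new_flow"]]
    if not exposed:
        raise ValueError("no users have gone through the new flow yet")
    activated = sum(1 for u in exposed if u["activated"])
    return activated / len(exposed)


# Made-up sample data, purely for illustration.
sample_users = [
    {"went_through_new_flow": True, "activated": True},
    {"went_through_new_flow": True, "activated": False},
    {"went_through_new_flow": True, "activated": True},
    {"went_through_new_flow": True, "activated": False},
    {"went_through_new_flow": False, "activated": True},
]

seen = exposure_count(sample_users)
rate = activation_rate(sample_users)
print(f"saw the flow: {seen}, activation rate among them: {rate:.0%}")
```

The first number only goes up as traffic arrives. The second can go down, which is what makes it a candidate for measuring success.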

Set a threshold

“We want the metric to go up” is not a threshold. “We want the metric to go from 35% to at least 45%” is.

The threshold should be a number you would be disappointed not to reach. If any number above the current state counts as success, you have given yourself permission to claim a win regardless of what happens.

Set a timeline

Attribution decays with time. A feature that takes six months to show an effect is harder to attribute than one that shows an effect in a week, because more has changed in between. Pick the date on which you will evaluate, and pick it before you ship.

If the metric has not moved by that date, the feature did not deliver. That is information. Use it.
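One way to make the commitment hard to renegotiate later is to write it down as data before shipping. The sketch below encodes the onboarding example (35% baseline, 45% target, 40% floor, evaluated after a month). The class name, field names, verdict strings, and the ship date are all assumptions made for illustration, and the "missed target, above floor" case is a judgment call the example above does not spell out.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)  # frozen: the criterion cannot be edited after the fact
class SuccessCriterion:
    metric: str
    baseline: float           # where the metric is today
    target: float             # the number you would be disappointed not to reach
    floor: float              # below this, redesign or revert
    shipped_on: date
    evaluate_after_days: int

    def review_date(self) -> date:
        return self.shipped_on + timedelta(days=self.evaluate_after_days)

    def verdict(self, observed: float, today: date) -> str:
        if today < self.review_date():
            return "too early to evaluate"
        if observed >= self.target:
            return "met target"
        if observed < self.floor:
            return "redesign or revert"
        return "missed target, above floor"


onboarding = SuccessCriterion(
    metric="activation rate",
    baseline=0.35,
    target=0.45,
    floor=0.40,
    shipped_on=date(2024, 1, 1),   # hypothetical ship date
    evaluate_after_days=30,
)

print(onboarding.verdict(observed=0.38, today=date(2024, 2, 1)))
```

Freezing the dataclass is a small design choice with a point: once the feature has shipped, the threshold should not be adjustable to fit whatever the data turned out to be.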

Ask “did this work?”

After every shipped feature, ask the question. Honestly. Look at the data. Compare to the threshold you set.

Build the habit

Most teams do not ask this question because the habit was never built. The retrospective covers process. The all-hands celebrates shipping. The quarterly review lists output. Nowhere does anyone systematically ask whether the things that shipped did what they were supposed to do.

Build the ritual. Once a sprint, or once a month, run through what shipped recently and what the data shows. Five minutes of honest review is more valuable than an hour of process discussion.

Be honest about “no”

When the answer is “no, it didn’t work”, do not soften it. Do not redefine success around the actual outcome. Do not claim qualitative wins that the data does not support.

“We shipped X. The metric did not move. We do not yet know why.” That is a useful sentence. It is also an uncomfortable one. The discomfort is the point — it is what drives the next investigation.

Resist the partial-credit reflex

When a feature does not move the headline metric, there is a strong urge to find some adjacent metric that did move. “Conversion didn’t change, but engagement among the users who saw the feature was higher.”

Sometimes that is genuinely useful information. Often it is rationalization. Be honest about which is which. Engagement among self-selected users is usually not the win it appears to be.

Kill what does not move

The hardest part of outcome thinking is acting on the results. A feature that took a quarter to ship and did not move the metric should be killed — or fundamentally rethought.

Sunk cost is a lie

The effort that went into shipping the feature is gone. It cannot be recovered. Whether you keep the feature or remove it does not change that.

What changes is the future cost of maintaining a feature that did not work. Every week the failed feature stays, someone has to keep it working, document it, support it, integrate around it. The cost compounds.

Sometimes the answer is to remove

If a feature did not work and you cannot see why a different version would, remove it. Genuinely remove it. Not behind a flag. Not in a deprecated state. Gone.

Code that does not deliver outcomes still incurs maintenance. Remove it and recover the budget for something that might.

Sometimes the answer is to iterate

If a feature partly worked and you can identify why, iterate. A second version, informed by what you learned, often produces the outcome the first did not.

Outcome-thinking is compatible with iteration. It is incompatible with shipping and forgetting.

Outcome thinking changes the work

Once a team starts working backward from outcomes, the rest of the work changes shape.

Smaller features, validated

If you have to prove each feature moves a number, you will ship smaller features. You will instrument them before shipping. You will design them so the impact can actually be measured.

This is good. Small, validated features outperform big, unvalidated ones over time.

Roadmaps look different

An outcome-oriented roadmap does not list features. It lists outcomes — with the features that are intended to produce them noted as the current hypothesis.

“Increase activation rate to 45% — hypothesis: redesigned onboarding flow.” If the redesigned flow does not produce the activation increase, the outcome stays on the roadmap and the hypothesis changes. The work continues against the goal, not against the feature.
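As a data structure, the idea is that the outcome is the stable key and the feature is a replaceable field. A minimal sketch, with `RoadmapItem` and its fields invented for illustration, and "shorter signup form" as a made-up second hypothesis:

```python
from dataclasses import dataclass, field


@dataclass
class RoadmapItem:
    outcome: str      # the change the team is committed to
    hypothesis: str   # the feature currently expected to produce it
    rejected: list[str] = field(default_factory=list)  # hypotheses tried and dropped

    def replace_hypothesis(self, new_hypothesis: str) -> None:
        """Record the failed hypothesis and swap in a new one; the outcome stays put."""
        self.rejected.append(self.hypothesis)
        self.hypothesis = new_hypothesis


item = RoadmapItem(
    outcome="Increase activation rate to 45%",
    hypothesis="redesigned onboarding flow",
)

# If the redesign does not produce the increase, only the hypothesis changes.
item.replace_hypothesis("shorter signup form")  # hypothetical next attempt
print(item)
```

Keeping the rejected hypotheses on the item preserves what was learned, so the next attempt starts from the evidence and not from scratch.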

Some features get killed before they ship

When you ask “what outcome will this produce, and how will we measure it”, some features cannot answer. They were on the roadmap because someone wanted them, not because they were expected to move anything.

Those features should not ship. They will absorb engineering capacity, produce output, and move nothing. Cut them now.

Stop when done, not when planned

If the goal is the outcome, the project is done when the outcome is achieved. Not when the plan is finished. Sometimes you reach the number with less work than expected. Stop. Move on to the next outcome.

The opposite — shipping everything in the plan even after the outcome has been achieved — is one of the most common ways teams waste capacity.

The flip

The discipline is to flip the default question.

Instead of “what are we shipping this quarter”, ask “what should be different at the end of this quarter”.

Instead of “did we hit the deadline”, ask “did the thing we built do what we hoped”.

Instead of “how many features did we ship”, ask “what changed for users because we shipped them”.

This is a harder way to work. It means committing to changes you might not achieve. It means measuring honestly, including when the answer is unflattering. It means killing features that ship but do not move the number, even when the team is proud of them.

But it is the only way to know if engineering is actually working. Output is what you do. Outcome is what changes. Aim for the second.