How to Use AI Without Lying to Yourself

Out Loud

You hear it in every all-hands. Team shipped a thing in two days that used to take ten . Eight days saved . The slide is green . People nod .

Now, do the math nobody is doing.

Two days was generation . That is the model. That is what came out of the model's gut , after it spent a few seconds pretending to think. The work is not done. The work is done when a human has read what came out, knows what is in there, knows where it is going to break, and is willing to be the person who fixes it at 3 AM . That reading takes time . For the thing the model produced in two days , the reading is something like four.

Two plus four is six. Ten minus six is four. AI saved your team four days , not eight. The other four days , the ones you thought you saved, are not in your bank. They are unread code sitting in the repo, accruing interest, waiting.

Four days is still real money. AI gave you four days. Take them . But only if you actually spend the other four on the reading. If you don't , you did not save eight days. You saved two and shipped four days of unread code into production. The bill comes later. Bigger. Not on any slide.

Nobody is doing this math because the slide closes better with the bigger number . Saying out loud that AI saved you four days instead of eight feels like betraying the tool. The tool does not care. The meeting cares . The meeting decides what number lives.

The renegotiation is putting the real number on the slide.

The contract is one sentence. It is not subtle.

Work is not done when AI generates it. Work is done when a human has read it .

The rest of this chapter is how a team makes that real .

***

State the new done.

The Move 1: Bring the sentence to a meeting. Not a special meeting . Whatever meeting your team is already in this week. Retro , planning , 1:1 , doesn't matter. Say it. "I want us to agree on what done means . I think done is when a human has read the code, knows what is in there, and is willing to be the person who fixes it at 3 AM . Not when the model finished writing it. Not when the PR got approved . When someone on the team can explain it in their own words."

You will get reactions . Some will say  "we already do that."  Watch them. They probably do not . Some will say " this will slow us down. " Maybe. The alternative is shipping unread code into a codebase you have to maintain for the next five years. Pick . Some will agree, nod and change nothing. Some will think you are being precious about a problem they have not noticed yet.

Doesn't matter. You are not running a vote. You are putting the sentence in the room. The team can discuss it, push back on it, modify it, ignore it. What it cannot do is pretend nobody said it . That is the move .

The sentence is the spine . Every other move in this chapter depends on the team having agreed , formally or grudgingly , on what done means. Without it, the other moves sound like one person's preference. With it , they become how the team works.

If your team cannot get to the sentence at all , you have learned something . Probably that someone with authority over the team is allergic to the conversation , in which case the AI problem is not the biggest problem on your team . Useful information either way .

***

Every estimate is two numbers.

Every engineer on your team is already doing this math in their head. The PM asks how long this will take. The engineer thinks: a day to generate , three days to read it carefully. The engineer says : a day. Because that's what got asked for. Because saying " a day to generate, three days to read it" sounds like padding to anyone who hasn't done it.

It is not padding . It is the actual estimate . The team needs to say it out loud, in the room where estimates happen.

The Move 2: every estimate has two numbers.

Generation and review. Both go on the ticket. Both go in the planning doc . When the manager reports up, both go up. The word shipped gets reserved for reviewed. Generation gets its own word , whatever you want to call it, because it is its own thing, and pretending otherwise is what is currently making your timelines lie.

First time you give a two-number estimate, a stakeholder is going to push back. Why is review longer than generation? That question is the whole conversation the industry is not having. Answer it honestly. Because the model writes plausible-looking code, and plausible-looking code takes longer to verify than obviously-broken code, and we are not shipping plausible-looking code into production as if it were real . Real means read.

If your team cannot adopt this, look at the ticket template. The template is doing the lying. Most templates were designed when there was one number to put in. Two numbers needs two fields. If your tooling has one field for " estimate," add a second one. Call it whatever . Verification . Read time . The part where we actually check it. The field is the move.

This sounds small. It is. It is also the move that builds vocabulary the rest of the team uses without thinking about it for the next two years . The team that has two columns on the planning board will think in two numbers. The team that has one will keep getting surprised.

***

Read or refuse.

You approved a PR yesterday . Did you read it?

Be honest. You scrolled. The diff was 1500 lines. The author had been waiting all morning . You clicked

approve.

"Keep your PRs small" is advice your industry has been giving since the 2000s. It always assumed one thing that no longer holds: that the author was the one bearing the cost of the PR being too big. Writing 1500 lines was painful. The author felt the pain. PR size was naturally limited by the author's appetite for typing.

That friction is gone. The model produces 1500 lines as easily as 50. The author prompted once, got back the wall of code, and moved on. They did not feel the size of what they submitted. The reviewer feels it. The reviewer is now the only thing in the system that experiences PR size as a cost.

So the old advice is the same . The reviewer's job is not .

The reviewer used to be a checker. The reviewer is now a constraint . The reviewer is the only place left where the system slows down enough for a person to look. PR size on your team will not be limited by the authors writing them , because the authors are not writing them , the model is. PR size will be limited by what the reviewer is willing to read. Read whatever comes in , and PRs grow until the team is shipping unreadable diffs . Refuse , and PRS stay small enough that review can happen.

The Move 3: when the PR is too big to read carefully, you send it back. This is too big . Break it up. I will review the pieces . The author will think you are being difficult. The author has not had to think about PR size in a while. The reminder is part of the job now.

There is a second wrinkle . AI-generated code is harder to review per line than human code.

Wrong-but-plausible takes longer to spot than wrong-and-broken . The 500-line PR that was reviewable in 2022 is not reviewable now, because the bugs are subtler and the structure is sometimes the model's idea of structure rather than the author's.

The threshold for "too big" shrank , even if nobody adjusted the rulebook.

You will be unpopular for a sprint . The author will sigh . The PM will ask why velocity dropped . Let them . A sprint later, PRs are smaller, reviews are real, bugs go down. Velocity, by the honest measure, goes up.

This is not the old advice. The old advice told the author to be disciplined. The new move asks the reviewer to be the discipline. It is a different job . It is the job now .

***

Walk me through it.

You submit a PR. The reviewer comes back with a question. Walk me through this function. Why does it handle the error this way? What happens if the input is null?

You cannot answer .

You generated this . You did not read it carefully before submitting. You skimmed , ran the tests , watched them pass, hit submit.

This is the new author problem. The reviewer is doing their job . You are not doing yours.

Pre-AI, understanding your own code was automatic. You wrote it. The understanding came along for the ride. You did not have to manufacture it. Post-AI, understanding is no longer automatic , because writing is no longer the thing you did. The model wrote it. You prompted , accepted , submitted. None of those steps required you to understand what was in there.

If you do not understand what you submitted, you are not the author. You are the routing slip.

The Move 4: before you submit the PR, you write a short note on it. Three or four sentences. I asked the model to X. It produced Y. I read it . These parts are clear. This part I am less sure about. The thing most likely to break is Z. The note goes on the PR before review starts.

The note does three things. It forces you to read your own PR before submitting it , because writing the note is harder than not writing it. It tells the reviewer where to focus , which makes review faster , not slower . It creates a record that you took ownership of what was generated , which is the difference between authoring and forwarding.

You will resist this move. It feels like extra work. It is. It is also the work that was always supposed to happen, the work that used to happen automatically because writing code without understanding it was hard.

The model removed the friction that was doing this for you . You have to put it back deliberately , or the team is shipping unwitnessed code , which is what the contract from the cold open was supposed to prevent.

Adopt this and Move 3 together and something rare happens . The reviewer arrives at the PR already knowing where to focus, because the author told them. Review gets faster even as the bar gets higher. Work shrinks. Quality goes up . The team has stopped paying twice.

Hire for reading.

The candidates who breeze through your interview are the ones the model will replace first.

Every round on your loop tests the old job. Take-home: build a thing. Live coding : write a function. System design: architect a service . The candidate who aces all of this is the one fastest at producing code under pressure. Producing code under pressure is the part of the job the model already does, the part converging to a commodity, the part where five years from now nobody is paying a salary.

The skill that survives is the one your interview does not test. Reading code carefully. Catching the bug nobody else saw. Refusing the PR that is too big. Defending the code you submit. Articulating what you are not sure about. The model does none of this. The team needs all of it. The interview tests none of it.

The Move 5: redesign one round to test reading.

Bring a piece of code. Three hundred lines. AI-generated , with two or three subtle bugs that pass tests. Hand it to the candidate. This is going to production tomorrow . Walk me through it . Find what is wrong . Tell me what would break first .

Then watch . How they read . What they notice . Whether they push back on the size . Whether they refuse to bless something they cannot verify. Whether they spot the bug or the bug spots them .

The candidate who reads carefully , asks the right questions, articulates uncertainty , and refuses to approve what they cannot defend is the senior you actually need. The candidate who reads it once , says it looks fine , and tries to move on is the one who will keep your queue moving and ship the next bug. Your current interview cannot tell these two candidates apart. The reading round can .

This repairs the junior pipeline from Chapter 5. Juniors do less writing now. They can do more reading earlier than they used to. The skill seniors developed over three years of bug-hunting can be taught directly and tested for. The pipeline is not broken because juniors cannot write. It is broken because nobody is teaching the reading . Hire for it. Teach it. The bottom of the ladder comes back.

Hire for the part of the job the model is not doing . Everything else is converging to free. The skill you screen for should be the only skill left with a moat.

***

Translate the math upward.

You are in a leadership review . The VP is looking at the slide. Your team shipped 40% more this quarter than last quarter. The VP nods. The VP says good things. You know what the 40% is.

The 40% is generation. The 40% is the model writing code at a speed your team would have laughed at two years ago . The 40% does not include the review that did not happen, the PR your second senior refused last Tuesday for being too big, or the bug from week three that took three people to find .

The number on the slide is real and false at the same time . Real because it ships . False because it does not describe what the team did.

This is the manager's specific job now: translation . The team's actual work has become invisible to leadership . Velocity is up , but most of the new velocity is generation , and the verification that turns generation into shipped work is happening off-screen , in evenings and weekends , in the time of your most senior people, who you cannot afford to lose. Leadership cannot see any of this. They see the slide. The slide is lying for you.

If you do not translate , the lying slide becomes the truth on the next planning cycle . Leadership budgets for the lying slide. Headcount gets allocated on the lying slide . The team's compensation depends on the lying slide. The actual work , the review and the reading and the refusing, gets squeezed harder, because there is no budget for it, because nobody upstream knows it exists.

The Move 6: bring three numbers to every leadership conversation. Generation. Review. Net.

" We generated 40% more than last quarter. Our review hours doubled . Net velocity is up 15%."

That sentence is short. It is the entire move. Leadership will react in one of three ways . They will accept it as obviously the right framing, in which case you have just rewired how your team gets measured , which is most of your job. They will push back ( " why is review so expensive?"), in which case you arrive at the conversation the company needs to have , before a production outage forces it on you. They will dismiss it ( " just give me the topline " ), in which case you have learned what kind of company you work for , which is also useful information.

You will be tempted to soften the framing . To say "with quality improvements factored in" or "adjusted for verification overhead." Do not. The honest numbers are the move. Smooth language is the failure mode.

This is the new manager skill . It is not in any training program. It will be, eventually. The managers who figured it out early will be running the training program. The managers who did not will be looking for jobs .

If you are reading this and you are not a manager , this move is not optional for you either . The senior IC who can translate the math upward, to their own manager, is the IC who becomes legible to the layer above . Without it , you are paid for velocity , which is the metric your team is currently winning by lying.

***

Back to the math from the cold open . Pre-AI , the work took ten days . The model now writes the same work in two. Reading it carefully takes four more. Total: six days. The real savings is four, not eight, and only if the team actually does the reading.

If the team does not do the reading , the four days were imaginary. The team ships unread code, the bill arrives in production, and the team that thought it saved eight days pays for ten.

Every move in this chapter is one team's way of refusing to pretend .

State the new done. Two-number estimates. Read or refuse . Walk me through it . Hire for reading . Translate the math upward.

None of these are hard alone. The hard part is that every one requires someone on the team to say something nobody is saying. To send a PR back to its author. To write a note about a function you cannot defend. To tell a VP that the velocity number is two numbers and one of them was hidden.

The team that does this stops paying twice. The team that does not pays once in the work, again in the rework, and again in a quarter that has not arrived yet, when token budgets show up and someone in finance asks what was spent on what .

A team that cannot talk about the costs   is a team that pays them twice.

The most expensive bug in your codebase is not in your codebase. It is the same one Chapter 1 named: shame , scaled up . The shame that kept one engineer quiet at their desk is the silence that now runs your team. The unread PR , the missing review note , the slide with the dishonest topline , every one of them sits on top of a thing somebody was too uncomfortable to say.

Out loud is the only fix . This chapter is six ways to start.

***