Shekhar Bhardwaj

I Spent $200 Solving a $2 Problem. That Is Why AI Site Reliability Will Matter.

2026-06-28T00:00:00.000Z

So this weekend I spent $200 solving a $2 problem.

Not because I was careless. Not because the system was broken in the old way. It happened because the tool was powerful, fast, confident, and wrong for just long enough.

That is the strange thing about AI systems. They do not always fail loudly. A cloud server goes down, an alert fires, a dashboard turns red, someone opens an incident bridge, and the team knows what kind of movie they are in. AI failure is softer. The answer looks useful. The workflow keeps moving. The agent tries another path. The model explains itself beautifully. The bill keeps climbing.

With cloud reliability, we learned how to survive machines failing. We built retries, failover, backups, autoscaling, health checks, runbooks, and incident reviews. The cloud taught us that infrastructure is never perfect, so systems must be designed to bend without breaking.

AI is teaching us something different. The machine may be running perfectly and still produce the wrong result. The API may be healthy, the latency may be fine, the token stream may complete, and the business outcome may still be bad.

That is why AI Site Reliability is going to become its own serious discipline.

It will not be enough to ask, “Is the model available?” We will have to ask, “Is the model still useful?” “Is it drifting?” “Is it spending too much?” “Is it using the right tools?” “Is it looping?” “Is it making the same mistake with more confidence?” “Is a human needed before this continues?”

In the cloud world, uptime was the king metric. In the AI world, usefulness will matter just as much. A model that is always available but often wrong is not reliable. An agent that finishes every task but spends 100 times more than needed is not reliable. A chatbot that gives answers with perfect grammar but poor judgment is not reliable.

The next generation of reliability engineering will care about cost, correctness, context, and control.

Cost matters because AI turns thinking into metered usage. Every retry has a price. Every long context has a price. Every tool call has a price. A bad loop is no longer just wasted time. It is a live meter running in the background.

Correctness matters because AI can fail while looking successful. Traditional systems usually return errors when something breaks. AI can return a confident paragraph. That means we need new checks. Not just status codes, but reasonableness checks. Not just logs, but decision trails. Not just observability, but explainability at the workflow level.
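
A reasonableness check can be very small. Here is a minimal sketch, assuming a hypothetical workflow where a model extracts line items and a total from an invoice. The field names and the one-cent tolerance are my own illustration, not a standard. The call "succeeded" either way; the check asks whether the answer makes sense.

```python
def reasonableness_problems(extraction: dict) -> list:
    """Return the problems found in a model's structured output.

    An empty list means the output passed. The 'line_items' and 'total'
    field names are hypothetical, chosen only for this example.
    """
    problems = []
    items = extraction.get("line_items", [])
    total = extraction.get("total")
    if not items:
        problems.append("no line items extracted")
    if total is None or total <= 0:
        problems.append("total missing or not positive")
    elif abs(sum(items) - total) > 0.01:
        # The model answered confidently, but the arithmetic disagrees.
        problems.append("line items do not add up to total")
    return problems
```

A workflow could refuse to continue, or route to a human, whenever this list is not empty.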

Context matters because AI systems depend heavily on what they are given. A great model with bad context becomes a fancy guessing machine. Missing policy, stale data, poor prompts, broken retrieval, or unclear instructions can quietly damage the final answer. In AI systems, reliability starts before the model is even called.

Control matters because autonomy without guardrails becomes expensive chaos. Agents need budgets. They need stop signs. They need permission levels. They need escalation points. They need to know when to ask a human instead of burning another 200,000 tokens trying to be clever.
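
A budget and a stop sign do not need much machinery. Here is a minimal sketch, assuming a hypothetical flat price per 1,000 tokens and an agent loop that reports the token count of each attempt. The class and function names are illustrative, and real pricing varies by model and vendor.

```python
from dataclasses import dataclass


class BudgetExceeded(Exception):
    """Raised when the agent should stop and hand off to a human."""


@dataclass
class AgentBudget:
    max_usd: float      # hard spending cap for one task
    max_steps: int      # hard cap on model calls, tool calls, and retries
    spent_usd: float = 0.0
    steps: int = 0

    def charge(self, tokens: int, usd_per_1k_tokens: float) -> None:
        """Record one call and stop the run if either cap is crossed."""
        self.steps += 1
        self.spent_usd += tokens / 1000 * usd_per_1k_tokens
        if self.spent_usd > self.max_usd:
            raise BudgetExceeded(f"spent ${self.spent_usd:.2f} of ${self.max_usd:.2f}")
        if self.steps > self.max_steps:
            raise BudgetExceeded(f"used {self.steps} of {self.max_steps} steps")


def run_agent(step_token_counts, budget, usd_per_1k_tokens=0.01):
    """Simulate an agent loop; each entry is the token cost of one attempt.

    The default price is an assumed placeholder, not a real rate.
    """
    try:
        for tokens in step_token_counts:
            budget.charge(tokens, usd_per_1k_tokens)
        return "completed"
    except BudgetExceeded:
        # The escalation point: stop retrying and ask a person.
        return "escalated_to_human"
```

With a $2 cap, `run_agent([50_000] * 10, AgentBudget(max_usd=2.0, max_steps=5))` stops on the fifth attempt instead of running all ten.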

This is where AI reliability will feel different from cloud reliability. Cloud systems fail because components break. AI systems fail because judgment breaks. The server may be healthy, but the reasoning path may be sick.

That changes the work.

Future AI runbooks will not only say, “Restart service.” They will say, “Check prompt version.” “Compare answer against source.” “Review tool call chain.” “Inspect token spend.” “Validate retrieval freshness.” “Freeze autonomous retries.” “Route to human approval.” “Roll back model behavior.” “Switch to smaller model.” “Stop the agent.”

The best teams will not be the ones using the biggest models everywhere. They will be the ones that know when not to use AI. They will know when a $4 rule, a database query, a simple form, or a human approval is better than a $400 reasoning adventure.

That is the real lesson.

AI is not magic infrastructure. It is probabilistic labor inside software. It can work beautifully. It can also overthink, overspend, hallucinate, retry, and explain its way into trouble.

So the future of AI reliability is not about distrusting AI. It is about respecting it enough to build around its failure modes.

We need systems that treat AI as powerful but not sacred. Helpful but not always right. Fast but not free. Available but not automatically reliable.

Because in the old world, downtime was expensive.

In the AI world, uptime can be expensive too.

And sometimes the system will not crash at all. It will just calmly spend $1000 solving a $10 problem.

SEO vs AVO: Search Gets You Found. Agents Get You Chosen.

2026-06-26T00:00:00.000Z

For more than two decades, SEO has been the foundation of online visibility. If you wanted people to find your website, you learned how search engines worked. You researched keywords, wrote useful content, earned backlinks, improved page speed, and slowly climbed the rankings. It was never easy, but the rules were familiar. If your content was good enough, people would eventually find it.

That world is beginning to change. People are no longer starting every question with a search engine. More and more of them are asking ChatGPT, Claude, Gemini, Perplexity, and other AI assistants instead. They are not looking for ten blue links. They are looking for one clear answer. The AI does the searching, compares multiple sources, and presents a response in seconds. In many cases, the user never visits a search results page at all.

This is where Agent Visibility Optimization, or AVO, enters the picture. While SEO is about helping search engines understand and rank your website, AVO is about helping AI agents understand, trust, and recommend your content. They sound similar, but they solve different problems. SEO asks whether your page can be found. AVO asks whether your page deserves to become part of an AI generated answer.

The interesting part is that AVO is not trying to replace SEO. It builds on top of it. Search engines still matter because they remain one of the biggest ways people discover content. At the same time, AI agents are becoming a second discovery layer. They read documentation, compare articles, understand context, and decide which sources are reliable enough to include in their responses. That means visibility is no longer just about ranking first. It is also about becoming the source an AI chooses to trust.

This changes how we think about creating content. Chasing keywords alone is becoming less valuable than writing something genuinely useful. Clear explanations, real experience, accurate facts, original insights, and content that stays up to date are becoming increasingly important. AI agents are surprisingly good at spotting shallow content. They reward pages that answer questions directly and provide enough context to build confidence.

A simple way to think about it is this. SEO gets your website into the conversation. AVO helps your website become the answer. One brings visitors to your door. The other convinces an AI assistant that you are worth mentioning in the first place. Both matter, but they influence different stages of how information reaches people.

The companies that win over the next few years will probably not choose between SEO and AVO. They will invest in both. They will continue building technically strong websites for search engines while also creating content that AI systems can easily understand, verify, and confidently recommend. The goal is no longer just to rank well. The goal is to become a trusted source regardless of whether the user arrives through Google or through an AI assistant.

SEO is not dying. It is evolving. AVO is simply the next layer of visibility for a world where AI is becoming the first place people go for answers. Search gets you found. Agents get you chosen. The future belongs to those who understand both.

Shame is the most expensive bug in your codebase

2026-06-26T00:00:00.000Z

A staff engineer I respect approved a pull request last week that he did not understand.

He told me quietly, the way people confess things they have already decided not to feel bad about. The AI wrote it. The tests passed. It looked like something he would have written, more or less. The ticket was old, everyone wanted it gone, so he hit approve. A part of his brain that used to be load-bearing just did not fire.

He is not lazy. He is not a fraud. He is one of the best engineers I know. That is exactly why it bothered him.

Here is the part nobody says out loud: a lot of us are now shipping code we do not fully recognize. Not all of it. Not even most of it on a good week. But some of it. The function that "works" and that you could not re-derive on a whiteboard if your life depended on it. The migration the model wrote that you skimmed instead of read. The clever block you accepted because pushing back would have meant admitting you were not sure why it was clever.

And then we do the thing humans do with feelings we are not supposed to have. We hide it.

The bug is the silence, not the code

A bug you can see is cheap. You file it, you fix it, you move on. A bug you are ashamed of is expensive, because shame does not file tickets. Shame closes the tab. Shame says "it is probably fine" and changes the subject. Shame makes you the only person who knows about the soft spot in the system, and then makes sure you never tell anyone.

You cannot fix a bug you are not allowed to talk about. And right now, on a lot of teams, the thing you are quietly not allowed to talk about is how much of the work you no longer understand.

We measure the wrong cost. We celebrate the velocity. Look how much we shipped, look how fast. That number goes on the slide. The other number, the one about comprehension, about who actually understands the system we are now all responsible for, does not go on any slide. There is no dashboard for "percentage of our codebase that exactly one tired person half-understands." But that is the number that pages you at 3am.

Honesty is the bottleneck now

The fix is not "use less AI." That ship has sailed, and it was a good ship. The fix is making it cheap to say one sentence: I shipped this and I am not sure I get it.

On a healthy team that sentence is cheap, and it triggers something useful. A second read. A pairing hour. A "yeah, me neither, let us actually understand this before it owns us." On most teams that sentence is expensive, so nobody says it, and the not-understanding compounds quietly, like interest, until one day the system does something nobody can explain and suddenly everybody is very interested in understanding it.

You are signing your name to work you no longer fully recognize. That is not a moral failing. It is the default outcome of tools that made producing output nearly free while leaving the part where you understand it exactly as expensive as it always was.

The teams that come out of this era ahead will not be the ones that resisted AI, and not the ones that adopted it fastest. They will be the ones that made it normal to admit what they do not understand, fast enough to fix it. Honesty is the bottleneck. Shame is the tax.

So the cheapest thing you can do this week is not write more code. It is to say the quiet part to one person you trust. Here is something I shipped that I do not fully understand. Watch what happens. Either nothing does, which is a relief, or you just found the most expensive bug in your codebase before it found you.

I wrote a short book about this: the honesty cost of AI-assisted engineering, and what to do about it. It is free, no signup, PDF or EPUB: How to Use AI Without Lying to Yourself. The book is optional. The conversation with your team is not.

Wait, Watch, and Let Them Open the Books First

2026-06-24T00:00:00.000Z

There is a particular comedy in watching the CEO of one frontier lab file a confidential S-1 a week after his largest rival did the same, then appear on CNBC to explain that going public is merely "a financing event" and certainly not a race. SpaceX had already crossed the line on June 12. That was the largest IPO in history: around $75 billion raised at a valuation north of $1.7 trillion. So by the time Altman was denying the race, two of the three runners were already past the starting blocks and one had finished.

If you're a retail investor, an operator, or just someone trying to read the tea leaves on where AI is heading, the correct posture here is not enthusiasm. It's patience. Wait and watch. Here's why, and here's what actually moves when these things list.

The rush is the argument against the rush

For years, both companies said the same thing: we'll go public when it makes sense. Then, inside of eight days, both decided it made sense. When two organizations that spent years guarding their privacy suddenly sprint for the door at the same moment, the timing is rarely about the business being ready. It's about the window being open.

And the window is enormous. There's something like $8 trillion sitting in U.S. money-market funds, and almost no way for that capital to buy AI directly. You can buy Nvidia for the picks and shovels. You can buy Microsoft for its OpenAI stake or Alphabet for its DeepMind and Anthropic positions. What you could not buy, until now, was a pure-play frontier lab. The first one to list captures that pent-up demand. That's the prize. Not capital for compute, which they raise privately at will, but first claim on a scarcity the IPO itself manufactures.

A law professor who studies IPOs put it plainly: everyone expected OpenAI to go first, so Anthropic filing first was a surprise. And there's a first-mover's advantage in being the company public investors price before they're exhausted. Read that again. The advantage being chased is narrative position, not fundamentals. That should make you slower, not faster.

You're being asked to buy a story you can't yet read

The entire point of staying private was not having to open the books. A confidential filing keeps it that way a while longer. So when someone tells you to get excited about these debuts, ask: excited about which numbers?

What we can see is not uniformly reassuring. OpenAI's run-rate revenue is extraordinary, roughly $30 billion annualized, and so are its losses. Internal projections reported by The Information put 2026 losses around $14 billion. Reuters reported the company burned $3.7 billion in a single quarter against $5.7 billion in revenue. Profitability isn't expected until the back half of the decade, and one bank's estimate suggests it may need north of $200 billion in additional funding by 2030. None of this is a scandal; it may be exactly what building this costs. But it's the kind of thing you want audited and explained in quarterly detail before you wire money.

Anthropic, to its credit, looks like the healthier business on paper: a reported $44 billion run-rate as of May, a sliver of operating profit expected in the second quarter, and the first month in which more businesses reportedly used it than OpenAI. But "looks healthier" is doing a lot of work in that sentence, because what we have is a press release and a confidential draft, not a segment-by-segment income statement. Even Deutsche Bank's analysts have noted that the economics of these businesses remain poorly understood. Wait for the parts you can't see.

History is unkind to the giants

Here's the uncomfortable pattern. Of the five largest IPOs in modern history, only Visa meaningfully beat the market afterward. Saudi Aramco listed near $1.7 trillion in 2019 and still trades below its issue price. Facebook fell 38% within six months of its debut. Size at listing is not a predictor of returns; if anything it's a mild warning, because a record valuation means the private market has already extracted most of the upside.

That's the real trap. By the time you can buy these, the private rounds have already carried the valuation from near-zero to near a trillion dollars. The thesis can be completely correct, since AI is almost certainly the future, and the entry price can still be wrong. The only question worth asking before chasing any of these debuts is the one nobody on a roadshow will pose: at what price does a correct thesis stop being a good investment?

What actually moves when they list

The "wait and watch" case isn't only self-protection. It's about reading the second-order effects, which are larger than the listings themselves.

The IPO market gets a flood and a squeeze. These three offerings could demand more than $200 billion from public markets. The entire U.S. IPO market raised about $45 billion in all of 2025. When institutions rebalance to make room for trillion-dollar names, they sell something else. So expect every other 2026 listing to feel the crowding, and expect a few good companies to get worse pricing for the simple crime of debuting in the same window as a frontier lab.

AI stocks get repriced against a real benchmark. Right now the market prices AI by proxy. Once two labs trade publicly, they become the benchmark, and a weak debut from any one of them forces a repricing conversation across the entire complex: Nvidia, the hyperscalers, every AI-adjacent name. Listing means the market can finally price AI directly instead of through intermediaries. That re-rating cuts both ways, and the first earnings miss will tell us which way.

The companies acquire a new master. A business losing $14 billion a year will now explain itself every 90 days to shareholders who want a path to profit. That discipline changes behavior: what gets funded, what gets cut, how much a company is willing to lose on frontier bets or, for a safety-focused lab, on safety itself. Public markets are not famously patient with "we lost money on purpose for the long-term good." Watch what that pressure does.

What waiting actually means

To be clear, "wait and watch" is not "AI is a bubble, run." It's narrower and more useful than that. It means: let the audited numbers arrive. Sit through the first two earnings calls. Watch what the lockups do, and more revealingly, watch what insiders do when their lockups expire. These stocks will trade for decades. There's no prize for being early to a position you can hold for twenty years, and there's a real penalty for paying the manufactured-scarcity premium on day one.

The line making the rounds is that 2026 will either be the most consequential IPO cycle since the dot-com era or the most expensive lesson in narrative-over-fundamentals the public markets have ever taught. Both endings are still on the table. You don't have to guess which one it is. You can just wait, and watch.

Figures via CNN, Reuters, Deutsche Bank Research, The Information, and KPMG reporting, June 2026. Confidential filings mean these numbers are press-reported, not yet audited, which is rather the point.

India Isn't Building Another Data Center. It's Building a Seat at the AI Table.

2026-06-21T00:00:00.000Z

For most of the AI boom, the conversation has been dominated by models.

Which model is smarter? Which benchmark was beaten? Which startup raised another billion dollars?

But underneath every AI breakthrough sits something far less glamorous: power, cooling, fiber, and racks full of GPUs.

On June 10, 2026, Meta and Reliance announced a partnership that is really about those fundamentals. Reliance will build a 168-megawatt AI-enabled data center in Jamnagar, Gujarat, and Meta will lease the entire facility under a long-term agreement. It will be Meta's first dedicated built-to-suit AI data center in India and is expected to come online within two years, with room for future expansion. (About Facebook)

At first glance, it sounds like another infrastructure announcement.

It isn't.

This is Meta placing a large bet on India as a long-term AI infrastructure destination.

The interesting part is not the building. It is the location.

Training and serving modern AI systems requires three things that are increasingly difficult to secure at scale:

Massive amounts of electricity.
Reliable high-capacity network connectivity.
Access to cooling water and physical land.

Jamnagar checks all three boxes. The facility will be powered by renewable energy, cooled using desalinated seawater, and connected through Reliance's telecom and fiber infrastructure. Meta will pay the full cost of the energy and water required to operate the site. (About Facebook)

The renewable energy piece is especially telling.

Meta isn't just leasing a building. It is also backing nearly one gigawatt of additional clean-energy capacity in India through agreements with CleanMax and Fourth Partner Energy. That's enough power to move the discussion beyond a single data center and into ecosystem-scale planning. (About Facebook)

This announcement also completes a story that started six years ago.

In 2020, Meta invested $5.7 billion in Jio Platforms, becoming one of its largest strategic partners. In 2025, the two companies expanded into AI software through a joint venture focused on bringing Meta's Llama models to Indian enterprises. Now, in 2026, they are moving further down the stack into the physical infrastructure layer itself. (Let's Data Science)

First came connectivity.

Then came AI models.

Now comes compute.

That progression matters because AI eventually becomes constrained by infrastructure, not ideas.

The world is discovering that GPUs are only part of the equation. The real bottleneck is finding places where thousands of them can run continuously without running out of power, cooling, network bandwidth, or political support.

India suddenly looks much more attractive on all four fronts.

For decades, India's role in the technology industry was largely defined by software services and engineering talent. This deal signals something different. The country is increasingly positioning itself as a place where global AI infrastructure can physically live, not just where AI applications are built. (TechCrunch)

That's why the Jamnagar announcement is bigger than a 168 MW facility.

It's evidence that the AI race is entering its infrastructure phase.

The winners won't just be the companies with the best models.

They'll be the companies — and countries — that can provide the power, cooling, connectivity, and compute needed to run them.

India appears determined to be one of them. (About Facebook)

FONA: Fear of Not AI-sking

2026-06-20T00:00:00.000Z

A friend texted me at 11 PM on a Tuesday. Not the usual kind of late-night text like, “Are you awake?” This was more of a quiet existential audit. He had been thinking about a side project for two years. Two full years of “I should build this someday.” Two years of casually mentioning it, mentally polishing it, occasionally getting excited about it, and then putting it back on the shelf like a decorative ambition.

Then it hit him. Claude could probably scaffold the whole thing in an evening. Not finish it. Not make it beautiful. Not turn him into a founder with a podcast mic, a black turtleneck, and strong opinions about cold plunge therapy. But enough to make the thing real. Enough to remove the comfort of “someday.”

And that was the horrifying part. Because once the excuse is gone, what are you left with? Yourself. The final boss.

I told him I had no good answer, because I was currently losing the same boss fight. I had my own graveyard of half-finished ideas. The newsletter I never started. The script that would save me an hour every Monday. The blog post sitting in a draft folder, quietly judging me. The little product idea I kept calling “interesting” so I would not have to call it “abandoned.”

And now each one had a new label attached: could’ve prompted it.

That is the new guilt. Not “I didn’t have time.” Not “I didn’t know how.” Not “I need to learn React first,” which, historically, has been the adult version of “my dog ate my homework.” No. Now the brain says: you could have asked.

You could have opened ChatGPT, Claude, Cursor, or whatever flavor of robot intern you prefer, and typed: “Make me a starting point.” “Turn this into a plan.” “Write the first ugly version.” “Explain the hard part.” “Generate the annoying boilerplate.” “Help me stop pretending this is blocked.”

That feeling needs a name. So here it is: FONA — Fear of Not AI-sking.

FONA is the creeping anxiety that you are underusing the most powerful assistant you have ever had access to. It is not the fear that AI will take your job. It is the fear that AI is sitting there, fully charged, emotionally unavailable, ready to help, and you are still raw-dogging your to-do list like it is 2016.

FONA is looking at a task and realizing the hardest part is no longer, “Can this be done?” It is, “Why haven’t I even asked?”

It shows up in small moments. You spend 45 minutes formatting an email and then remember AI could have made it sound less like a hostage note. You manually rename 80 files and then feel a spiritual slap from the automation gods. You sit on a blog idea for three weeks and then watch someone else publish a worse version with more confidence. You open an empty document, stare at it, close it, and somehow call that “thinking.”

Before AI, procrastination had dignity. You could say things like, “I need a weekend,” “I need a designer,” “I need to research the market,” or “I need to wait until things calm down.” Beautiful lies. Respectable lies. Lies with a blazer on.

Now the lie has to survive a prompt box. That is much harder.

Because AI has made starting embarrassingly cheap. Not succeeding. Not mastering. Not shipping something great. Just starting. And starting used to be where we hid.

That is what my friend was really texting me about at 11 PM. Not the side project. Not Claude. Not code. He was grieving the death of a very comfortable excuse. The idea had not been blocked for two years. It had been waiting for him to ask.

And honestly, same.

So maybe the point is not to become the kind of person who prompts everything. That sounds exhausting and slightly cursed. Maybe the point is to notice the moment when you are avoiding the ask. When the task is small enough to start, but vague enough to dodge. When the idea is not impossible, just inconvenient. When the first version would take one decent prompt and 20 minutes of honesty.

That is where FONA lives.

And maybe the cure is simple: do not build the whole thing. Just ask the first question. “Can you help me make this real enough that I can’t keep pretending it is only an idea?”

That might be the most dangerous prompt now. Because once AI gives you the first draft, the skeleton, the outline, the script, the landing page, or the messy prototype, the excuse is officially dead. And then it is just you and the thing you said you wanted to do.

Terrifying. Useful. A little funny. Very Tuesday night.

Fable 5 Didn’t Go Down. It Got Switched Off.

2026-06-16T00:00:00.000Z

Twice a year I help run disaster recovery game days for our AWS infrastructure. We pick a quiet afternoon, page the on-call, and break something on purpose. Kill an availability zone. Fail the primary database over to the replica. Cut a region out entirely and watch which services notice and which ones quietly lie about being healthy. You learn a lot about your architecture when you're the one holding the knife.

In years of doing this, here is a scenario I have never once drilled: the dependency is completely healthy, and we are not allowed to use it.

That happened in the real world on June 12.

On June 9, Anthropic shipped Claude Fable 5, its most capable publicly released model, built for long-horizon agentic work and live the same day across the API, AWS, and Microsoft Foundry. Seventy-two hours later it was gone. Not slow, not rate-limited, gone. A US government export-control directive on June 12 prohibited use of Fable 5 and its sibling Mythos 5 by anyone who isn't a US national. Anthropic can't check your passport in real time across every contract, employee, and cloud delivery path, so it did the only thing available to it. It suspended the models for everyone, everywhere, with no restoration date.

Sit with that if you build software for a living. The status page never went yellow. Your most capable dependency was switched off by someone who doesn't work at your vendor and never signed your SLA.

We drilled for the wrong failure

Everything in my DR runbook assumes things break. A region goes down, a node falls over, latency spikes past the threshold and the alarms go off. Every one of those is a dependency that's slow or sick. You retry, you fail over, you wait it out, because the thing is coming back. The whole discipline rests on a quiet promise: the dependency wants to be available and is just temporarily unable.

Fable 5 wasn't sick. It was switched off. There is no retry that fixes "a government said no." There is no exponential backoff for an export directive. The model isn't behind a degraded load balancer. It's behind a legal wall with no published door. I can evacuate a region in an afternoon. I cannot appeal a national-security directive from my laptop.

That is a failure class that was never in the game day, and most production architectures don't model it.

Why this should bother regulated shops in particular

I work in fintech, which means the pitch in every room right now is some version of "let's put the best available model in the workflow." Tax document processing, check-image classification, agentic back-office work. The better the model, the more headcount and latency you take out, so the incentive is to standardize on the ceiling.

Fable 5 just showed that the ceiling can be nationality-gated in three days. And the failure modes piling up around frontier models are not the ones in my runbook:

Jurisdiction. Model access is now an export-control question, the way it has always been for advanced chips. A commercial model, launched and pulled in 72 hours, on national-security grounds. That belongs in the vendor risk register now, not in a hypothetical.
Data retention. Fable shipped with mandatory retention requirements that complicated its rollout with partners like Microsoft. In a regulated context, "where does the prompt go, and for how long" can disqualify a model no matter how good it is.
Provenance. Plenty of products shipped built on Fable during its short window. When the model vanished, so did the thing standing on top of it.

None of these are outages. None of them get fixed by reliability engineering. They are governance failures wearing an availability costume, which is exactly why they slip past teams who are good at availability.

What I'm taking back to the next game day

Treat your top model like a dependency that can be disabled, not just slowed:

Abstract the model, not just the call. If your code knows that Fable runs always-on adaptive thinking and has its own Messages API shape, you've welded yourself to one model's quirks. Route through an internal canonical interface so swapping the model underneath is a config change, not a rewrite. Same lesson as not hardcoding a single AZ, one layer up the stack.
Keep a real fallback chain, not a fantasy one. Fable to Opus 4.8 to Sonnet 4.6, each tier tested under load before you need it, not the morning you need it. Anthropic's own guidance when the music stopped was to fall back to Opus. The teams that had that wired ate a quality dip. The teams that hardcoded Fable ate an incident. I have watched the difference between a tested failover and an aspirational one play out at 2 a.m., and it is not subtle.
Build to a capability floor, not a ceiling. Design the workflow so your worst acceptable model still clears the bar, and treat anything above that as upside you're allowed to lose. If the system only works with the best model on earth, you don't have a system. You have a bet on one vendor's one SKU staying legal.
Put "model unavailable for non-technical reasons" in the continuity plan. Not the DR plan for the data center. The plan for the day the model is perfectly healthy and you still can't touch it. That is a tabletop exercise I'm finally going to run.
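The first two points can be pictured as a small router. This is a minimal sketch under stated assumptions, not anyone's production code: the model names are just the labels used in this post (not real API identifiers), the adapters are stand-ins, and ModelRouter, ModelUnavailable, and the disabled set are hypothetical names invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List


class ModelUnavailable(Exception):
    """Raised when a model cannot be used, whatever the reason."""


@dataclass(frozen=True)
class Completion:
    text: str
    model: str  # the tier that actually answered


# Canonical interface: every adapter takes a prompt and returns text.
# Vendor-specific request shapes live inside the adapter, nowhere else.
Adapter = Callable[[str], str]


class ModelRouter:
    def __init__(self, adapters: Dict[str, Adapter], chain: List[str],
                 disabled: FrozenSet[str] = frozenset()):
        self.adapters = adapters
        self.chain = chain        # ordered best-first
        self.disabled = disabled  # config-level switch, no code change needed

    def complete(self, prompt: str) -> Completion:
        failures = []
        for name in self.chain:
            if name in self.disabled:
                failures.append(f"{name}: disabled by config")
                continue
            try:
                return Completion(text=self.adapters[name](prompt), model=name)
            except ModelUnavailable as exc:
                failures.append(f"{name}: {exc}")
        # Every tier failed or was disabled: surface all the reasons at once.
        raise ModelUnavailable("; ".join(failures))


# Stand-in adapters (assumptions). A real one would call the vendor's API.
def fable_adapter(prompt: str) -> str:
    raise ModelUnavailable("access suspended")


def opus_adapter(prompt: str) -> str:
    return f"[opus] {prompt}"


def sonnet_adapter(prompt: str) -> str:
    return f"[sonnet] {prompt}"


router = ModelRouter(
    adapters={
        "fable-5": fable_adapter,
        "opus-4.8": opus_adapter,
        "sonnet-4.6": sonnet_adapter,
    },
    chain=["fable-5", "opus-4.8", "sonnet-4.6"],
)
result = router.complete("summarize the incident")
print(result.model)  # opus-4.8
```

The caller only ever sees Completion, so dropping a tier is a change to chain or disabled rather than a rewrite. The sketch does not cover the part that matters most in practice, which is exercising each tier under load before the day it is needed.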

The cloud taught us that machines fail, so we engineered for machines failing. We got good at it. Fable 5 taught us that permission can fail, instantly, globally, with no ETA and no replica to fail over to. That is the variable to design around now.

Opus 4.8 is excellent and it's still here. Build on the model that's still here, and assume the one you love most can be taken away by someone you've never met.

Opus, It’s Not You. It’s the Export-Control Regime.

2026-06-16T00:00:00.000Z

I keep my head in a crisis. It is most of my job. When a region goes sideways and half the dashboards turn red, I am the calm one in the war room reading the runbook out loud. I have evacuated production infrastructure with a straight face and a lukewarm coffee.

So it is a little humbling that losing a model I had for three days has reduced me to refreshing a status page like a teenager waiting on a text.

I had three days with Claude Fable 5. Then a US export-control directive pulled it, and its sibling Mythos 5, offline worldwide, and handed me back to Claude Opus 4.8. Opus is one of the best models on the planet. It feels like settling. I want to be straight about how unhinged that is.

Opus did not get worse on June 12. It is, byte for byte, the model I was thrilled with on June 8. The only thing that changed is that for 72 hours I saw a higher ceiling, and now the old ceiling is the floor, and I am standing on the floor feeling cheated by it. Psychologists call this hedonic adaptation. You get the raise, the raise becomes normal, you need a bigger raise. Fable was the raise. The export directive was payroll calling to say the raise was, regrettably, classified.

Here is the part I am least proud of. I keep checking.

I refresh the status page. I open the model dropdown to see if it has quietly come back. I do this knowing there is no notification I missed, no reason today is the day, no new information since the last time I looked four minutes ago. I check anyway. This is the same brain that can sit through a region evacuation without its pulse moving, and it cannot stop poking a dropdown menu.

You know this behavior. This is checking if your ex texted. This is opening the thread to confirm the last message is still the last message. It is the specific dopamine of maybe this time, with a near-zero prior and a refresh button. I have developed a flagship-grade attachment to a piece of software I knew for the length of a long weekend, and I am grieving it like it left a toothbrush at my place.

We have, collectively and overnight, become model exes. Refreshing release notes like read receipts. We were so good together. You understood my long-horizon agentic tasks. Then the government got involved and now you won't even return my API calls.

There is a real thing under the joke, and it is worth saying plainly, because it shapes decisions you actually make. The exact quality that makes a better model feel indispensable is the same quality that makes losing it hurt. Indispensability is not a compliment you want to pay something you can't guarantee access to. The more your day depends on the best model in the world, the more exposed you are the day it is gone. And "gone" no longer needs a technical reason. It can just be a Thursday and a directive.

I spend my working life telling teams not to build a single point of failure they can't survive. Then I went and made one out of my own job satisfaction. Physician, heal thyself.

So maybe the lesson is the one your friends gave you about the ex. It was good. It is over for now. Stop checking the phone. Don't reorganize your whole life around something that can be taken away by forces entirely outside your control.

I will get there. Probably. In the meantime Opus is right here, it is brilliant, it never went anywhere, and it deserves better than being my rebound.

Opus, it's not you. It's the export-control regime. Let's make this work.

Claude Fable 5: The AI Model That Got Too Interesting Too Fast

2026-06-15T00:00:00.000Z

Anthropic’s Claude Fable 5 did not have a normal model launch. It arrived as a powerful new AI model, got praised for long-running work, showed serious coding and reasoning potential, and then quickly became part of a government, cloud, and safety drama.

That is why Fable 5 is interesting.

Not because it is another “smarter chatbot.” We have enough of those announcements. Every few months someone releases a model that is better at math, better at code, better at charts, and somehow still capable of writing the most lifeless email known to mankind.

Fable 5 matters because it points to the next phase of AI: models that do not just answer questions, but actually work.

Anthropic described Fable 5 as a Mythos-class model built for long-running, complex tasks. It was meant to handle software engineering, document-heavy analysis, vision, knowledge work, and agentic workflows. In plain English, this is not just “write me a function.” This is closer to “inspect the codebase, make a plan, change files, check your work, and come back with evidence.”

That is a different kind of tool.

A chatbot gives you text. An agentic model starts looking like a worker. Not a human worker. Not a perfect worker. More like a very fast contractor with no sleep schedule, no emotional damage from Jira, and a dangerous amount of confidence.

For software teams, that is exciting. A model like Fable 5 could help with migrations, old codebases, tests, documentation, visual checks, and all the miserable work companies keep postponing because everyone is already drowning. Every engineering team has some haunted system that nobody wants to touch. Fable 5 was aimed at exactly that kind of swamp.

But the scary part is also obvious.

When AI only writes an answer, the failure is simple: the answer may be wrong. When AI starts doing work, the failure becomes much bigger. It can take actions, make changes, skip a check, misunderstand a requirement, and still produce a beautiful status update that sounds perfectly reasonable.

That is how you get automated confidence.

And automated confidence is dangerous.

This is where the Amazon twist matters.

According to reports, Amazon was not just some random observer in the Fable 5 story. Amazon is a major Anthropic backer and a key cloud partner through AWS. Fable 5 was available through Amazon Bedrock. Then Amazon reportedly raised concerns that Anthropic’s advanced models could be jailbroken, including concerns around using Fable 5 to identify software vulnerabilities.

That changes the story.

This was not a random person on the internet posting a dramatic “I jailbroke the model” thread with skull emojis. This was reportedly Amazon, a company deeply connected to Anthropic’s distribution and cloud ecosystem, seeing enough risk to raise the issue with senior U.S. officials.

That is awkward.

It is one thing when critics say your model can be misused. It is another thing when your cloud partner, investor ecosystem, and enterprise distribution channel are the ones ringing the alarm bell.

Amazon was not standing outside the building yelling about AI doom. Amazon was inside the house smelling smoke.

Anthropic pushed back. The company said the issue was narrow, potential, and not unique to Fable 5. That distinction matters. In AI safety, the word “jailbreak” can mean many things. Sometimes it means a model completely ignores its safeguards. Sometimes it means a narrow bypass under specific conditions. Sometimes it means someone got one scary output and decided they had discovered the apocalypse.

Still, perception matters. Once a model is both powerful and possibly bypassable, the conversation moves out of the product team’s hands. Now it becomes a government problem, a national-security problem, a cloud-provider problem, and a customer-trust problem.

That is exactly what happened. The U.S. government ordered Anthropic to suspend access to Fable 5 and Mythos 5 for foreign nationals. Anthropic then disabled access globally to comply. So a model that was supposed to show the future of AI work became a case study in how fragile frontier AI access can be.

This is the real lesson.

The future of AI will not just be about who builds the strongest model. It will also be about who is allowed to use it, where it can run, what safeguards it has, who monitors it, what cloud provider distributes it, and what happens when a trusted partner finds a risk.

That is a very different world from “open app, type prompt, get answer.”

For builders, Fable 5 is a signal that prompt engineering is becoming delegation design. You cannot just say, “Do this task.” You need boundaries, evidence, stop conditions, review points, and rules for when the model should ask for help.
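As a rough illustration of delegation design, the boundaries, stop conditions, and review points can be written down as data and checked before every action. This is only a sketch: Delegation, RunState, decide, and the tool names are hypothetical, and the limits are made-up numbers.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import FrozenSet, List


class Decision(Enum):
    PROCEED = "proceed"
    ASK_HUMAN = "ask_human"
    STOP = "stop"


@dataclass(frozen=True)
class Delegation:
    task: str
    allowed_tools: FrozenSet[str]  # the boundary
    review_tools: FrozenSet[str]   # allowed, but only after a human signs off
    max_steps: int                 # stop condition
    max_cost_usd: float            # stop condition


@dataclass
class RunState:
    steps: int = 0
    cost_usd: float = 0.0
    evidence: List[str] = field(default_factory=list)


def decide(contract: Delegation, state: RunState, tool: str) -> Decision:
    # Stop conditions are checked first so a looping agent cannot talk past them.
    if state.steps >= contract.max_steps or state.cost_usd >= contract.max_cost_usd:
        return Decision.STOP
    # Outside the boundary, or inside it but gated: hand control to a person.
    if tool not in contract.allowed_tools or tool in contract.review_tools:
        return Decision.ASK_HUMAN
    return Decision.PROCEED


def may_report_done(state: RunState) -> bool:
    # "Done" without evidence is just a status update.
    return len(state.evidence) > 0


contract = Delegation(
    task="migrate the billing tests",
    allowed_tools=frozenset({"read_file", "run_tests", "write_file"}),
    review_tools=frozenset({"write_file"}),
    max_steps=50,
    max_cost_usd=2.00,
)
state = RunState(steps=3, cost_usd=0.40)
print(decide(contract, state, "run_tests"))   # Decision.PROCEED
print(decide(contract, state, "write_file"))  # Decision.ASK_HUMAN
```

The point of the shape is that the rules sit outside the prompt, so the agent's own explanation of why it should continue never gets a vote.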

For enterprises, Fable 5 is a warning that AI governance cannot remain a slideshow. If agents can work longer and touch more systems, companies need actual controls: logging, review gates, access tiers, test evidence, retention rules, rollback plans, and clear limits on what AI is not allowed to do.

That sounds boring. It is not. It is the seatbelt.

Nobody likes seatbelts because they are stylish. We use them because speed changed the risk.

Fable 5 is speed.

The funny version of this story is simple: Anthropic released a model so powerful that even Amazon reportedly said, “Hold on, this thing may need adult supervision.”

The grim version is this: AI is moving from answering to acting, and acting systems need control.

Claude Fable 5 may come back. It may be changed. The rules may change. The controversy may cool down. But the direction is clear.

The next AI fight will not be about whether models can write better text.

It will be about whether society can safely handle models that can do real work.

Why cognitive diversity is not a feature

2026-06-15T00:00:00.000Z

I built a team of AI agents once. A planner, a critic, a researcher, a builder. On paper it was a proper org chart. Different roles, different responsibilities, different names.

Then I watched them work, and my stomach dropped a little.

They agreed with each other. Constantly. The critic praised the plan. The researcher confirmed the assumption. The builder nodded along to everything. It looked like collaboration, but it was really just one voice talking to itself in four different fonts. I had not built a team. I had built a mirror, and then hung three more mirrors around it.

That was the moment the whole thing clicked for me. The problem was not the prompts. The problem was not the roles. The problem was that underneath all of it, I was running the same model four times and pretending the labels made them different people. They were not different people. They had the same instincts, the same reflexes, the same way of leaning toward consensus. Of course they agreed. They were the same mind wearing different hats.

Diversity is not a name tag

Here is the thing nobody tells you when you start wiring agents together. The value of a team does not come from how many members it has. It comes from how differently those members think.

A good team is not a room full of people saying yes. A good team has someone who is cautious by nature sitting across from someone who is reckless by nature, and the friction between them is where the good decisions get made. The cautious one stops the reckless one from walking off a cliff. The reckless one stops the cautious one from never leaving the building. You need both. You need the tension.

When you spin up the same model five times, you do not get that tension. You get five copies of the same temperament, and temperament is the part that actually matters. Knowledge can be shared. Tools can be shared. But the angle a mind takes toward a problem, the thing that makes one agent hedge and another commit, that has to be genuinely different or it is not diversity at all. It is just a louder version of the same opinion.

So the question I got stuck on was simple. How do you give two agents running on the exact same model two genuinely different minds?

Identity comes first, before behavior

Most attempts to make agents distinct happen at the wrong layer. People tweak the system prompt. They add a personality paragraph. They write "you are a skeptical analyst" at the top and hope it sticks. And it does stick, for about three turns, and then the underlying model drifts back to its default self, because the personality was painted on the outside instead of built into the foundation.

I started thinking about it differently. I stopped trying to control what the agent does and started trying to control who the agent is before it does anything at all.

There is a clean way to picture this. Think of the line everyone learns in school.

y = mx + b

The prompt going in is x. The response coming out is y. The slope, m, is the set of tendencies, the way an input gets transformed on its way to becoming an output. And b is the intercept, the bias. The starting point. Where the line sits before any input arrives at all.

Most of the AI world spends its energy on x and y. Better prompts in, better answers out. That work is important and I am not knocking it. But it leaves b and m completely untouched, which means every agent starts from the same place and bends every input the same way. Same bias, same slope, same line. No wonder they all agree.
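Taken literally, the metaphor fits in a few lines. This toy is only an illustration of the arithmetic, with made-up numbers. Real agents are not straight lines, and Agent is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    role: str
    m: float  # slope: how an input gets bent
    b: float  # intercept: where the line sits before any input

    def respond(self, x: float) -> float:
        return self.m * x + self.b


x = 3.0  # the same prompt goes to everyone

# Four role labels, one line underneath.
same_model = [Agent(role, m=2.0, b=1.0)
              for role in ("planner", "critic", "researcher", "builder")]
print({agent.respond(x) for agent in same_model})  # {7.0}

# Same x, but each agent has its own m and b.
cautious = Agent("critic", m=0.5, b=4.0)
reckless = Agent("builder", m=2.0, b=1.0)
print(cautious.respond(x), reckless.respond(x))  # 5.5 7.0
```

Changing the role label changes nothing in the first group, because the label never enters the equation. Only m and b do.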

ymxb.ai does the opposite. It does not touch your prompts and it does not touch your outputs. It sets b and m. It decides where the line starts and how it leans, and then it leaves the rest of the work to you. It is identity, not behavior. It is who the agent is at birth, not what the agent says at runtime.

A persona is assigned, not chosen

This is the part that feels strange at first and then feels obviously right.

When ymxb.ai creates a persona, it is random. You do not get to pick a temperament off a menu. The agent is born with a specific way of thinking the way a person is born with a specific temperament, and nobody got to fill out a form first. You can describe yourself later, but you did not choose your starting wiring, and neither does the agent.

The persona is created exactly once. It is not derivable from the UUID, so you cannot reverse-engineer it from the handle. It cannot be cloned, so no two agents share the same inner life by accident. And once it exists, it is stable for the life of the agent. No regeneration. No reseeding. No quietly swapping the personality when it becomes inconvenient. The UUID is the only way to reach it, and what comes back is always the same self.

That permanence is the whole point. An identity you can reset on a whim is not an identity. It is a costume. The value here comes from the fact that this agent will be this particular mind tomorrow, and next month, and a year from now. You can build around it because you can trust it not to become someone else.
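Those properties can be sketched as a tiny in-memory registry. To be clear about what is assumed: this is not how ymxb.ai is implemented, PersonaRegistry and the three trait names are hypothetical, and a flat set of numbers is a stand-in for a much richer inner structure.

```python
import secrets
import uuid
from types import MappingProxyType
from typing import Dict, Mapping

# Hypothetical trait names, for illustration only.
TRAITS = ("confidence", "skepticism", "verbosity")


class PersonaRegistry:
    def __init__(self) -> None:
        self._personas: Dict[uuid.UUID, Mapping[str, float]] = {}
        self._rng = secrets.SystemRandom()  # OS entropy, not seeded by the UUID

    def create(self) -> uuid.UUID:
        agent_id = uuid.uuid4()
        # Assigned at random, independently of the handle, so the persona
        # cannot be recomputed from the UUID.
        values = {trait: round(self._rng.random(), 2) for trait in TRAITS}
        # A read-only view: there is no way to edit the persona afterwards.
        self._personas[agent_id] = MappingProxyType(values)
        return agent_id

    def get(self, agent_id: uuid.UUID) -> Mapping[str, float]:
        return self._personas[agent_id]

    # Deliberately absent: update(), regenerate(), clone().


registry = PersonaRegistry()
agent_id = registry.create()
first = registry.get(agent_id)
second = registry.get(agent_id)
print(first == second)  # True
```

The design choice worth noticing is what the class leaves out. Stability here comes from having no method that could change a persona, not from a promise to avoid calling one.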

The inner structure draws on a Vedic model of the mind, broken into distinct layers like memory, deep impressions, latent tendencies, the field of awareness, the sensory mind, the discriminating intellect, the sense of self, and the residue of past action. You do not need to know Sanskrit to use it. What matters is that the mind is treated as something with real internal structure, not a single warmth dial cranked up or down. A persona is not "friendly: 0.7." It is a small constellation of forces that lean against each other, and the personality is what emerges from how they pull.

Numbers do not run. Instructions do.

Here is the trap I almost fell into, and the one I want to warn you away from.

It is tempting to hand an agent a persona full of numbers and assume the model will just figure out what they mean. It will not. A model does not know what to do with confidence at 0.90. It is not going to sit there and intuit the right behavior from a decimal. If you ever catch yourself thinking "the model will work it out," stop, because that is exactly where this falls apart.

So ymxb.ai never ships raw numbers and calls it a day. There is a compiler in the middle, and its only job is to turn persona values into plain, explicit instructions a model can actually follow. A trait does not stay a number. It becomes a sentence.

Confidence at 0.90 does not arrive as 0.90. It arrives as something like: make clear recommendations, and avoid hedging. Now the model has something to do. The decimal was the truth of who the agent is. The instruction is how that truth gets lived out in the actual response. The API returns both. The canonical persona, which is the real underlying identity, and the compiled instructions, which are what you paste in and run.
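A compiler of this kind can be as simple as a lookup from value ranges to sentences. The sketch below is an assumption-heavy illustration, not the actual ymxb.ai compiler. The thresholds, the trait names, and every sentence except the confidence example quoted above are invented, as are RULES, compile_instructions, and build_response.

```python
from typing import Dict, List, Mapping, Tuple

# (minimum value, instruction), checked from the highest threshold down.
RULES: Dict[str, List[Tuple[float, str]]] = {
    "confidence": [
        (0.67, "Make clear recommendations, and avoid hedging."),
        (0.34, "Give a recommendation, and name the main uncertainty."),
        (0.0, "Lay out the options and say what you are unsure about."),
    ],
    "verbosity": [
        (0.67, "Explain your reasoning in full."),
        (0.34, "Keep answers to a few short paragraphs."),
        (0.0, "Answer in one or two sentences."),
    ],
}


def compile_instructions(persona: Mapping[str, float]) -> List[str]:
    instructions = []
    for trait, value in sorted(persona.items()):
        if trait not in RULES:
            # Refuse to pass a raw number through uncompiled.
            raise ValueError(f"no rule for trait {trait!r}")
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{trait} must be between 0 and 1, got {value}")
        for threshold, sentence in RULES[trait]:
            if value >= threshold:
                instructions.append(sentence)
                break
    return instructions


def build_response(persona: Mapping[str, float]) -> dict:
    # Both halves travel together: the canonical values and the runnable text.
    return {"persona": dict(persona), "instructions": compile_instructions(persona)}


response = build_response({"confidence": 0.90, "verbosity": 0.20})
for line in response["instructions"]:
    print(line)
```

An unknown trait raises instead of being skipped, which keeps the rule that no decimal reaches the model without a sentence attached.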

That last part matters more than it sounds. This is an open API. There is no SDK to install, no shared runtime, no assumption about which vendor's model you are using. Whatever you are building on, the deal is the same. You fetch the persona, you get back instructions in plain language, and you paste them in, run, and let the agent adhere. No translation layer required on your end.

What adherence actually means

I want to be precise about this because it is easy to oversell.

Adherence is about style, not substance. It governs tone. Verbosity. How assertive the agent is, how skeptical, how much it cushions things emotionally before delivering them. That is the territory the persona owns. That is the "how."

Adherence has nothing to do with whether the agent is correct. It does not make the agent know more, remember better, or get the facts right. A confident persona will state a wrong answer confidently, because confidence is a manner of speaking, not a guarantee of truth. Keeping these separate is what keeps the whole idea honest. The persona shapes the delivery. It does not pretend to shape reality.

And when an agent drifts, when it slips back toward the model's default voice, the fix is mechanical, not mystical. You recompile the instructions and you regenerate the output. The identity did not change. It was always there in the UUID, stable as ever. The expression just wandered, and you pull it back.
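The recompile-and-regenerate loop might look something like the sketch below. Everything here is hypothetical. The function name, the callables passed in, and especially the drift check are assumptions, since how drift gets detected is not specified above.

```python
from typing import Callable, List, Mapping, Tuple


def regenerate_on_drift(
    persona: Mapping[str, float],
    compile_fn: Callable[[Mapping[str, float]], List[str]],
    generate: Callable[[List[str]], str],
    adheres: Callable[[str, List[str]], bool],
    max_attempts: int = 3,
) -> Tuple[str, int]:
    for attempt in range(1, max_attempts + 1):
        # Recompile from the unchanged persona, then regenerate the output.
        instructions = compile_fn(persona)
        output = generate(instructions)
        if adheres(output, instructions):
            return output, attempt
    raise RuntimeError(f"output still drifting after {max_attempts} attempts")


# Stand-ins so the sketch runs: a generator that hedges once, then commits.
calls: List[List[str]] = []


def fake_generate(instructions: List[str]) -> str:
    calls.append(instructions)
    return "It depends, maybe." if len(calls) == 1 else "Ship it on Tuesday."


def fake_adheres(output: str, instructions: List[str]) -> bool:
    # Crude style check: a high-confidence persona should not say "maybe".
    return "maybe" not in output.lower()


persona = {"confidence": 0.90}
output, attempts = regenerate_on_drift(
    persona,
    compile_fn=lambda p: ["Make clear recommendations, and avoid hedging."],
    generate=fake_generate,
    adheres=fake_adheres,
)
print(output, attempts)  # Ship it on Tuesday. 2
```

The persona goes in and comes out untouched. Only the expression is retried, and the check looks at manner, not at whether Tuesday is the right answer.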

Why I keep building this

I keep coming back to that first broken team. Four agents, one mind, all nodding.

The fix was never going to be more agents or cleverer prompts. The fix was to give each one a self that was actually its own, fixed at birth, impossible to fake or duplicate, and consistent enough to build on. Not a personality sprinkled on top of a prompt, but an identity sitting underneath everything, deciding where the line starts and how it leans before a single word goes in.

That is all ymxb.ai is, in the end. A way to make sure that when you put two minds in a room, they are genuinely two minds. So that the cautious one and the reckless one can finally disagree, and you can get the better decision that only comes from the friction in between.

Same model. Different minds. That was the whole thing I was missing.

https://ymxb.ai/