Welcome to Vanishing Gradients!
This issue covers the hard parts of building and evaluating real AI systems:
A new podcast with Phillip Carter on what breaks when you ship LLMs to production — and why his team built an MCP server to handle it
A guest post on five patterns that beat agents (most of the time)
A live session with Zach Mueller (Hugging Face) on scaling training from Colab to clusters
A conversation with Dr. Fei-Fei Li on human-centered AI and what comes after the LLM
Two paths for July: a deep dive on evaluation from Hamel Husain & Shreya Shankar, or a full-stack LLM systems course with me and Stefan Krawczyk
Plus: our updated agent resource guide and the MLOps World | GenAI Summit call for speakers
Quick links below to what’s coming up, what just dropped, and how to plug in:
📺 Live Online Events
📩 Can’t make it? Register anyway and we’ll send the recordings.
→ June 30 — Workshop: From Images to Agents with Ravin Kumar (DeepMind)
→ July 1 — Lightning Lesson: GenAI’s 4 Pillars with John Berryman (AI Consultant, ex-GitHub)
→ July 2 — Human-Seeded Evals (Live Podcast) with Samuel Colvin (Pydantic, Logfire)
→ July 3 — Making LLM Agents Observable & Debuggable with Vincent Koc (Comet, ex-Microsoft, Qantas)
→ July 3 — Scaling AI: From Colab to Clusters with Zach Mueller (Hugging Face)
🎙 Podcasts & Recordings
→ Why We Built an MCP Server—and What Broke First – Phillip Carter (Salesforce, ex-Honeycomb)
→ A Field Guide to Rapidly Improving AI Products – Hamel Husain
→ High-Stakes AI Systems and the Cost of Getting It Wrong – Sudarshan Seshadri (Alto Pharmacy)
📍 In-Person Events
→ July 7–11 — SciPy 2025 (Tacoma, WA)
🎓 Courses
→ June 29 — Extended scholarship deadline for the July cohort of Building LLM Applications
→ July 21–Aug 15 — Evaluation Systems with Hamel Husain and Shreya Shankar (use this link for $800 off)
→ Sept 1–Oct 3 — From Scratch to Scale: Distributed Training with Zach Mueller (includes $300 off for Vanishing Gradients readers)
🎧 Why We Built an MCP Server—and What Broke First
“Anywhere there’s an API that serves a purpose, there’s a use case for MCP. And the total economic value of that is enormous.”
Phillip Carter (Salesforce, ex-Honeycomb) joined me to break down what it really takes to ship LLM features to production. In early 2023, he helped launch one of the first real LLM-powered SaaS features—evaluated via spreadsheets and error analysis. More recently, he and his team built one of the earliest production-ready MCP servers.
We cover:
→ How to align model-as-judge behaviour without fancy infra
→ Why spreadsheets beat most eval tooling
→ What breaks when you go from demo to real users
→ Where MCP works—and where it falls short
→ Why observability matters far beyond debugging
→ The glue code that actually ships working LLM features
🎧 Listen on Spotify, Apple, or watch on YouTube
📄 Show notes and all episodes: Vanishing Gradients Podcast
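The episode stays high-level, but if you're curious what "any API can become an MCP tool" looks like in code, here's a minimal sketch using the official Python MCP SDK (FastMCP). The weather endpoint and its response fields are made-up placeholders, not anything from Phillip's server.

```python
# Minimal MCP server exposing an existing HTTP API as a tool.
# Assumes `pip install mcp httpx`; the endpoint below is hypothetical.
import httpx
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("weather-tools")

@mcp.tool()
def current_weather(city: str) -> str:
    """Fetch current weather for a city from an (illustrative) internal API."""
    resp = httpx.get("https://internal.example.com/weather", params={"city": city})
    resp.raise_for_status()
    data = resp.json()
    return f"{city}: {data['temp_c']}°C, {data['conditions']}"

if __name__ == "__main__":
    mcp.run()  # stdio transport by default; any MCP client can now call current_weather
```

The tool is just a typed Python function over an API you already have; the hard part, as the episode covers, is everything around it: evals, observability, and real users.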
🛑 Stop Building Agents — Now on Decoding ML
A few of you already read Stop Building AI Agents when it first ran on High Growth Engineer. It’s now been republished on Decoding ML — and the conversation has picked up again.
If you missed it the first time, or want a refresher, this post breaks down why so many LLM agent systems fail in practice — and what to build instead.
I share:
→ Five workflow patterns that outperform agents in most cases (a sketch of the general idea follows below)
→ The (messy) story of building a three-agent CrewAI system that looked great on paper but fell apart in practice
→ Where agents actually do make sense — usually with a sharp human in the loop
→ How to debug brittle agent behaviors like tool misuse, memory drift, and unclear delegation
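To make that concrete, here's a rough sketch of the kind of plain workflow I mean: a fixed two-step prompt chain with explicit control flow, rather than an agent deciding what to do next. It's illustrative only (not lifted from the post), and it assumes the `openai` client with an API key set; the model name is a placeholder.

```python
# A deterministic two-step prompt chain: fixed prompts, fixed control flow,
# so failures are easy to localize. No tools, no planning loop, no agent.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def llm(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def handle_ticket(ticket: str) -> dict:
    # Step 1: pull out structured facts. Step 2: draft a reply from those facts.
    facts = llm(f"List the key facts in this support ticket as bullet points:\n{ticket}")
    reply = llm(f"Using only these facts, draft a short, polite reply:\n{facts}")
    return {"facts": facts, "reply": reply}

print(handle_ticket("My invoice #123 was charged twice last month."))
```

If a step misbehaves, you know exactly which prompt to inspect; that traceability is a big part of why these patterns are easier to live with than agents.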
📖 Read the full post: Stop Building AI Agents
⚙️ Scaling AI: From Colab to Clusters — A Practitioner’s Guide
📅 July 3 · 11:00 AM PDT with Zach Mueller (Hugging Face)
Training big models used to be reserved for OpenAI or DeepMind. Not anymore.
Zach Mueller, Technical Lead for Accelerate at Hugging Face, joins me for a practical session on what scaling actually looks like in 2025—for solo devs and small teams, not just billion-dollar labs.
We’ll dig into:
→ When (and why) scale is actually worth it
→ How distributed training works under the hood
→ How to avoid wasted compute and bloated runtimes
→ Strategies for serving models that don’t fit on a single GPU
→ Why these skills now matter even for inference workflows
Whether you’re fine-tuning a model at work or tinkering with open weights at home, this session will help you navigate the messy middle between “just use Colab” and “rent 128 H100s.”
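To give you a taste before the session: with Accelerate (the library Zach leads), the same training loop runs on a single Colab GPU or across a cluster, depending only on how you launch it. This is a minimal, toy-data sketch of my own, not material from the session.

```python
# Toy training loop with Hugging Face Accelerate. Run it as-is on one GPU,
# or launch the identical script across many with `accelerate launch train.py`.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

accelerator = Accelerator()

model = nn.Linear(32, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
dataset = TensorDataset(torch.randn(1024, 32), torch.randint(0, 2, (1024,)))
loader = DataLoader(dataset, batch_size=64, shuffle=True)

# prepare() handles device placement and, when launched distributed,
# wraps the model in DDP and shards the dataloader across processes.
model, optimizer, loader = accelerator.prepare(model, optimizer, loader)

loss_fn = nn.CrossEntropyLoss()
for epoch in range(3):
    for x, y in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        accelerator.backward(loss)  # replaces loss.backward()
        optimizer.step()
    accelerator.print(f"epoch {epoch}: loss {loss.item():.3f}")
```

The session goes well beyond this (what's actually happening under the hood, and serving models that don't fit on one GPU), but that prepare/backward swap is a good mental model to start from.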
🎓 Bonus: Zach is teaching a full 4-week course on distributed training this September. He’s offering $300 off for Vanishing Gradients readers. Sign up here to redeem the discount.
📏 July Is for Evals — Two Courses, Two Paths
They tell you 2025 is the year of AI agents — and that’s true in many ways.
But it’s also becoming the year of evaluation. The tooling is evolving fast, but the real question is: what’s working, what’s not, and how do we measure it?
I recently took Hamel Husain and Shreya Shankar's course on evaluation, and it fundamentally changed how I think about building real AI systems. It reshaped how I consult, how I teach, and how I design systems that actually ship.
Their next cohort runs in July, and they’ve kindly given my network $800 off the list price:
🔗 Join their course with this link for $800 off
If you're looking for an end-to-end view of the AI software lifecycle — testing, logging, evaluation, agents, orchestration, and getting LLMs into prod — I’m co-teaching another July course with Stefan Krawczyk.
It starts in 10 days:
🔗 Join our course on building LLM applications
Two courses. Both in July. Pick what you want to go deep on — or do both if you’re feeling ambitious. 🧠
🧭 What Comes After the LLM? With Dr. Fei-Fei Li
A few weeks ago on High Signal, I spoke with Dr. Fei-Fei Li about what it really means to build human-centered AI — and where the field might be headed next.
After that conversation, Duncan Gilchrist and I decided to write down some of our takeaways. We’ve now published them in a short piece over at O’Reilly Radar.
The article digs into deeper shifts that go beyond model quality and prompt tuning:
→ Why intelligence isn’t just prediction anymore
→ How interface design is becoming core to AI development
→ What human-centered AI really requires — beyond alignment
→ Where responsibility for AI behavior should (and shouldn’t) live in the stack
📖 Read the article here: What Comes After the LLM?
📣 Call for Speakers: MLOps World | GenAI Summit 2025
The public Call for Speakers is now open for the 5th annual MLOps World | GenAI Summit!
This is one of my favourite communities and conferences to be part of, and I’m proud to be on the steering committee again this year. David Scharbach and the team always put together an incredible program focused on real-world ML, GenAI, and agent systems in production.
If you’re working on infrastructure, observability, deployment, or agents — we’d love to hear from you.
✅ Virtual and in-person talks welcome
✅ Over 1,000 AI leaders and builders expected
✅ Real-world case studies and practitioner lessons encouraged
Some of the tracks this year:
→ AI Infrastructure Strategy & Platform Engineering
→ ML Deployments on Prem
→ LLM Observability
→ ML Version Control
→ ML Lifecycle Security
→ AI Agents in Production
→ Augmenting Agentic Workforces
→ And many more...
🛠️ Best Free Resources for Building AI Agents
Whether you’re building your first agent system or trying to move beyond prompt spaghetti, this resource list can help.
Originally compiled for our LLM application course, it’s since grown into a broader resource for anyone working with AI agents. It covers agent design, planning, tool use, failure modes, and when agents might not even be the right choice.
🚀 What’s inside:
→ Best practices for designing agents
→ When to use tools, memory, and workflows
→ Common pitfalls (and how to avoid them)
📚 Featured resources include:
Building Effective Agents by Erik Schluntz & Barry Zhang (Anthropic)
Emphasizes composable patterns over agentic complexity: when to use workflows vs. agents, how to add tools and memory, and common failure modes.
Agents by Chip Huyen
Focuses on autonomy, planning, tool use, and how to improve agent reasoning.
Agents by Julia Wiesinger, Patrick Marlow, Vladimir Vuskovic (Google)
Structured breakdown of agent capabilities in production systems: logic, tool use, and evaluation.
Introducing smolagents by Aymeric Roucher, Merve Noyan, Thomas Wolf (Hugging Face)
A minimalist, code-based approach to building multi-step agentic workflows.
Beyond Prompt and Pray by Hugo Bowne-Anderson & Alan Nichol
On why prompting isn’t enough, and how evaluation and integration play a role in real-world systems.
Stop Building AI Agents by Hugo Bowne-Anderson
Five alternative patterns to agents — with code examples and decision guides for when to use each.
🎁 No paywalls, no subscriptions — just great resources to explore.
Salesforce / Agentforce, LLMs + friends
Yesterday, my pal Stefan Krawczyk took me up the Salesforce Tower in SF, where he’s working on the future of AI agent infrastructure 🤖
The last time I saw Stefan in person was late last year in Austin, where we taught a 4-hour workshop on LLMs for software engineers.
Since then, we’ve turned that workshop into a 4-week Maven course and taught 150+ builders from Netflix, Meta, the US Air Force, the UN, Amazon, and more.
I'm excited to teach another cohort of our course on Maven starting July 8. Stefan may even share learnings from all the agentic patterns and infrastructure he's encountering with Agentforce!
To celebrate working with Stefan again, we’re offering 10% off. Use this link (valid until EOD ET June 30).
Want to Support Vanishing Gradients?
If you’ve been enjoying Vanishing Gradients and want to support my work, here are a few ways to do so:
🧑🏫 Join (or share) my AI course – I’m excited to be teaching Building LLM Applications for Data Scientists and Software Engineers again—this time with sessions scheduled for both Europe and the US. If you or your team are working with LLMs and want to get hands-on, I’d love to have you. And if you know someone who might benefit, sharing it really helps.
📣 Spread the word – If you find this newsletter valuable, share it with a friend, colleague, or your team. More thoughtful readers = better conversations.
📅 Stay in the loop – Subscribe to the Vanishing Gradients calendar on lu.ma to get notified about livestreams, workshops, and events.
▶️ Subscribe to the YouTube channel – Get full episodes, livestreams, and AI deep dives. Subscribe here.
💡 Work with me – I help teams navigate AI, data, and ML strategy. If your company needs guidance, feel free to reach out by hitting reply.
Thanks for reading Vanishing Gradients!
If you’re enjoying it, consider sharing it, dropping a comment, or giving it a like—it helps more people find it.
Until next time ✌️
Hugo