AI as a Civilizational Technology

Evaluating Agents

May 22, 2025

Welcome to Vanishing Gradients!

This issue is about what it actually takes to build with AI—debugging agents, validating outputs, navigating uncertainty, and avoiding the trap of systems that only look good on paper. Whether you’re testing LLM apps, deploying open-weight models, or trying to make metrics mean something, there’s a lot in here for you.

Quick links below to what’s coming up in my data/AI life, what just dropped, and how to plug in:

📅 Live Online Events
→ [May 27] Why Data and AI Still Break — Akshay Agrawal (Founder, marimo; ex-Google Brain, Netflix, Stanford)

→ [May 27] Building Production-Grade AI Systems — Aman Gupta (MasterClass)

→ [May 28] Build GenAI Systems Fast: AI Studio, Gemini, and Gemma — Ravin Kumar (DeepMind)

→ [June 2] Why Tool Calling Breaks — Alan Nichol (Co-founder, Rasa)

→ [June 15] Evaluating AI Agents (Live Workshop) — Ravin Kumar (DeepMind)

📍 In-Person Events
→ [June 16] Berlin Meetup: Build with AI — Ines Montani (Explosion), Hugo Bowne-Anderson

→ [June 17] Berlin Meetup: Agents & Evals — Alan Nichol (Rasa), Hugo Bowne-Anderson

→ [June 24–25] VentureBeat Transform — Workshop + Panels (San Francisco)

→ [July 7–11] SciPy 2025 — Tutorial + Talk (Tacoma, WA)

🎙 Podcasts & Clips
→ [Fei-Fei Li] AI as a Civilizational Technology

→ [Eoin O’Mahony] When Non-Determinism Is a Superpower

🛠 Hands-On
→ [Gemma Livestream Recap] Building AI Agents Locally — Ravin Kumar

→ [Lightning Lesson Replay] LLMs and Low-Hanging Fruit — Nathan Danielsen (Carvana)

→ [June 15 Workshop] Evaluating AI Agents: From Demos to Dependability

💡 Get Involved
→ Sign up for the upcoming Building LLM Applications for Data Scientists and Software Engineers course (timed for EU + US)

→ Subscribe to our lu.ma calendar for livestreams and workshops

→ Subscribe on YouTube for workshops, podcast livestreams, and more!

→ I’ll be in London, Paris, Berlin, and a few other European cities—hit reply if you want to co-host a meetup or bring me in to speak with your team

📖 Reading time: 12–15 minutes

AI as a Civilizational Technology

🧠 Fei-Fei Li on AI, Institutions, and Shared Prosperity

AI is a civilizational technology… It touches on geopolitics. It touches on productivity and shared prosperity. These are bigger societal problems that have to do with human-centered AI.

Fei-Fei Li opens our most recent episode of High Signal with the stakes: not just how AI works, but what it’s doing to labor, government, and global systems. She’s best known for creating ImageNet, and now leads Stanford’s Human-Centered AI Institute. She’s also a former VP at Google, where she served as Chief Scientist of AI/ML at Google Cloud, and is currently a co-founder of World Labs.

In this episode, Fei-Fei, Duncan, and I talk about:
→ What it means to build AI infrastructure for society—not just for scale

→ Why spatial intelligence could shift the foundation of AI

→ The role of public education, civic trust, and long-term thinking

🎧 Listen to the clip below and check out the full episode here

Two Upcoming Lightning Lessons + One Past — 30 Minutes, Tactical, and Free

⚡ Build GenAI Systems Fast: AI Studio, Gemini, and Gemma

Wed, May 28 with Ravin Kumar (DeepMind)

Google’s AI Studio is one of my favourite tools to explore and play around with the Google/DeepMind ecosystem of AI tools, including Gemini and Gemma.

In this lightning lesson, Ravin (who works directly on Gemini, Gemma, and AI Studio) and I will share how these tools are actually designed to be used—so you can move from playing with prompts to building full systems that ship.

You’ll learn how to:
→ Prototype full GenAI systems using AI Studio

→ Build real GenAI features: retrieval, agents, and tools

→ Run locally with Gemma or scale with Gemini

→ Start building today with reusable code templates

🎟 Register here

⚡ Why Tool Calling Breaks AI Systems—and What to Do Instead

Mon, June 2 with Alan Nichol (Co-founder and CTO, Rasa)

Tool calling is one of the worst defaults in AI system design today.

Alan has spent over a decade helping Fortune 500s build real conversational AI. In this lightning lesson, we’ll break down why most tool use patterns fall apart—and how to fix them with Process Calling: a structured way to make agents reliable, inspectable, and explainable.

You’ll learn how to:
→ Design stateful, multi-turn business logic

→ Move beyond brittle prompt chaining

→ Build agents that handle branching, memory, and follow-up

→ Reduce flakiness and accelerate iteration with modular flows
🎟 Register here

⚡ LLMs and Low-Hanging Fruit: Finding GenAI Value Fast

With Nathan Danielsen (Builder of Great Products and Engineering Teams, Carvana)

Nathan and I recently gave a lightning lesson on the low-hanging fruit in your organization that is already ripe for GenAI use cases. In it, you’ll learn:

→ How to build GenAI apps with your existing internal data

→ A framework for finding GenAI wins in your org

→ How to go from idea to prototype fast—no new stack needed
You can check it out here.

Building AI Agents with Gemma 3🤖

Livestream with Ravin Kumar: LLM Agents, Tool Calling, and Real-World Debugging

Ravin Kumar (DeepMind) and I ran a 2+ hour workshop on building AI agents locally with Gemma 3. Over 200 people joined live—and we pushed things further by building an MCP server/client setup from scratch.

This wasn’t a polished demo. The code broke. The tools failed. But that made it real—and way more useful. We debugged in the open and showed what it actually looks like to build with open weights, live.
Some of what we covered:
→ Building a local LLM app and exploring Gemma models

→ Logging, observability, and debugging with real tools

→ Tool calling vs. agents—and why the distinction matters

→ MCP architecture, client/server setup, and prompt iteration

🧵 Watch the full workshop + access the repo + join the Discord

Evaluating AI Agents: From Demos to Dependability

🧪 Upcoming Workshop — June 15 with Ravin Kumar

I’m very excited for our 3rd live, online, and free workshop with Ravin Kumar (DeepMind). Most AI agent demos look impressive until they break in practice. In this live, hands-on session, we’ll focus on what it takes to make agents dependable.

You’ll learn how to:
→ Trace tool use and model reasoning

→ Simulate real interactions and edge cases

→ Define what success actually means

→ Catch silent failures and iterate effectively

We’ll build a lightweight agent that can:
→ Query a SQL database

→ Run Python-based data analysis

→ Generate basic visualizations

And you’ll evaluate whether it:
→ Chose the right tool

→ Executed the right logic

→ Explained the result correctly

All running locally using Gemma 3 models and Ollama—no frameworks, no cloud dependencies.
This is the third session in a series focused on building real systems:
1️⃣ Local LLM apps + evaluation harnesses

2️⃣ Agents with tool use + dynamic behavior

3️⃣ Now: making those agents reliable and testable
🎥 Register for free for the June 15 livestream
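
To make the three checks listed above concrete (right tool, right logic, right explanation), here is a minimal sketch of what a per-run evaluation could look like. It is not the workshop's code: `AgentTrace`, `EvalCase`, and `evaluate` are hypothetical names, the example question and numbers are made up, and the substring checks are a deliberately crude stand-in for whatever scoring you actually choose.

```python
from dataclasses import dataclass, field


@dataclass
class AgentTrace:
    """One recorded agent run. All field names are hypothetical."""
    question: str
    tool_called: str    # e.g. "sql", "python", "plot"
    tool_input: str     # the query or code the agent produced
    final_answer: str   # the explanation shown to the user


@dataclass
class EvalCase:
    """What we expect for one question (also hypothetical)."""
    question: str
    expected_tool: str
    required_in_input: list[str] = field(default_factory=list)
    required_in_answer: list[str] = field(default_factory=list)


def evaluate(trace: AgentTrace, case: EvalCase) -> dict[str, bool]:
    """Score one trace on the three checks, using case-insensitive substring matches."""
    tool_input = trace.tool_input.lower()
    answer = trace.final_answer.lower()
    return {
        "right_tool": trace.tool_called == case.expected_tool,
        "right_logic": all(s.lower() in tool_input for s in case.required_in_input),
        "right_explanation": all(s.lower() in answer for s in case.required_in_answer),
    }


# Made-up example data, for illustration only.
case = EvalCase(
    question="How many orders were placed in 2024?",
    expected_tool="sql",
    required_in_input=["count", "orders"],
    required_in_answer=["1,204"],
)
trace = AgentTrace(
    question=case.question,
    tool_called="sql",
    tool_input="SELECT COUNT(*) FROM orders WHERE year = 2024",
    final_answer="There were 1,204 orders in 2024.",
)
print(evaluate(trace, case))
```

Keeping the three checks separate matters: a run can pick the right tool and still explain the result wrongly, and a single pass/fail score would hide that kind of silent failure.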

🎙 When Non-Determinism Is a Superpower

Most teams try to make LLMs behave like deterministic APIs. But what if unpredictability is the point?

In this clip from High Signal, Eoin O’Mahony (ex-Uber) shares a moment that reshaped how I think about agent behavior:

If you gave it to 10 analysts and had them spend two weeks on each of it... do you expect them all to come up with the same answer?

We talk about why determinism isn’t always the goal—especially in agentic systems doing exploratory analysis, decision support, or reasoning under uncertainty.
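
One way to work with that idea rather than against it is to run the same question several times and look at the spread of answers, much like comparing the 10 analysts. The sketch below is an assumption-laden illustration, not anything from the episode: `answer_spread` and `make_toy_agent` are hypothetical names, and the toy agent is a random stand-in for a real LLM call.

```python
import random
from collections import Counter
from typing import Callable


def answer_spread(agent: Callable[[str], str], question: str, runs: int = 10) -> tuple[Counter, float]:
    """Ask the same question `runs` times; return answer counts and the share of the top answer."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    answers = Counter(agent(question) for _ in range(runs))
    top_count = answers.most_common(1)[0][1]
    return answers, top_count / runs


def make_toy_agent(seed: int) -> Callable[[str], str]:
    """Hypothetical stand-in for a non-deterministic agent (no model is called)."""
    rng = random.Random(seed)

    def toy_agent(question: str) -> str:
        # Pretend the agent attributes a metric change to one of a few causes.
        return rng.choice(["pricing", "pricing", "seasonality", "supply"])

    return toy_agent


counts, agreement = answer_spread(make_toy_agent(seed=0), "Why did conversions drop?", runs=10)
print(counts, agreement)
```

Low agreement is not automatically a bug here. For exploratory analysis it can be a prompt to read the minority answers, which is the point of the analyst comparison.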

The full episode touches on:
→ Analyst agents and diverse perspectives

→ When randomness reveals real signal

→ Operationalizing ML in complex systems

→ Why early experiments need impact, not just p-values

→ How network effects can flip your metrics

🎧 Watch the clip above and listen to the full episode here.

Live Online Events

📺 Why Data and AI Still Break at Scale (and What to Do About It)

Tue, May 27 with Akshay Agrawal (Founder of marimo; ex-Google Brain, Netflix, Stanford)
Why does so much AI work fall apart when it leaves the laptop? We’ll talk about reproducibility, research–prod gaps, and the hidden cost of bad tooling.
🎟 Register here

📺 Building Production-Grade AI Systems at MasterClass

Tue, May 27 with Aman Gupta (MasterClass)
A behind-the-scenes look at how MasterClass built LLM systems for real products without relying on off-the-shelf APIs. We’ll cover infra, post-training, evaluation, and the realities of shipping AI in production.
🎟 Register here

📺 Building Reliable Agents with Open-Weight Models

Sun, June 15 with Ravin Kumar (DeepMind)
This live session will focus on debugging, evaluation, and making agents that actually work—built entirely with open-weight models and local tools.
🎟 Register here

In-Person Events📍

Build with AI — Berlin Meetup

Monday, June 16 · 6:00–9:00 PM GMT+2

🎤 Hosted by Explosion AI, Native Instruments, and Vanishing Gradients

🍕 Food & drinks provided
Two short talks, lightning demos, and time to connect with the Berlin AI/dev community.
Talks:
🧾 Conquering PDFs: Document Understanding Beyond Plain Text

Ines Montani — spaCy / Explosion

From messy formats to structured data using spaCy, Docling, OCR, and layout models.

🧪 Evaluation-Driven Development & Synthetic Data Flywheels

Hugo Bowne-Anderson — Vanishing Gradients

How to catch failures before users do—via synthetic data, eval harnesses, and feedback loops.
🎟 Register here.

📍 Agents & Evals — Berlin Meetup

Tuesday, June 17 · 6:00–9:00 PM GMT+2

🎤 Hosted by Vanishing Gradients and Rasa

🍻 Snacks and conversation included
Two short talks and an open-floor session on building LLM systems that actually work.
Talks:
🛠 Why Tool Calling Breaks Your AI Agents—and What to Do Instead

Alan Nichol — Co-founder & CTO, Rasa

How Process Calling helps agents handle memory, branching, and control.

🧪 Escaping POC Purgatory with Evaluation-Driven Development

Hugo Bowne-Anderson — Vanishing Gradients

Lessons from teams that escaped demo limbo and built testable, durable systems.
🎟 Register here.

📍 VentureBeat Transform — San Francisco, June 24–25

I’ll be at VentureBeat Transform this year—giving a workshop on building AI agents and hosting a few panels. Details still to come, but if you’ll be in SF, would love to see you there.

📍 SciPy 2025 — Tutorial + Talk · July 7–11 · Tacoma, WA

I’ll be presenting a tutorial and a talk at SciPy 2025, and I hope to see you there!
🛠 Tutorial: Building LLM-Powered Applications for Data Scientists and Software Engineers (July 7–8)

🧪 Talk: Escaping Proof-of-Concept Purgatory: Building Robust LLM-Powered Applications (July 9–11)

I’ll also be in London, Paris, Berlin, and a couple of other European cities over the next month so if you’d like to do a meetup, have me give a talk at your company, or just chat with your team about what you’re building, hit reply. I’d love to hear what you’re working on.

Want to Support Vanishing Gradients?

If you’ve been enjoying Vanishing Gradients and want to support my work, here are a few ways to do so:
🧑‍🏫 Join (or share) my AI course – I’m excited to be teaching Building LLM Applications for Data Scientists and Software Engineers again—this time with sessions scheduled for both Europe and the US. If you or your team are working with LLMs and want to get hands-on, I’d love to have you. And if you know someone who might benefit, sharing it really helps.
📣 Spread the word – If you find this newsletter valuable, share it with a friend, colleague, or your team. More thoughtful readers = better conversations.
📅 Stay in the loop – Subscribe to the Vanishing Gradients calendar on lu.ma to get notified about livestreams, workshops, and events.
▶️ Subscribe to the YouTube channel – Get full episodes, livestreams, and AI deep dives. Subscribe here.
💡 Work with me – I help teams navigate AI, data, and ML strategy. If your company needs guidance, feel free to reach out by hitting reply.

Thanks for reading Vanishing Gradients!

If you’re enjoying it, consider sharing it, dropping a comment, or giving it a like—it helps more people find it.

Until next time ✌️

Hugo
