Building Reliable AI: Prompt Engineering, Fine-Tuned Models, and Efficient Workflows
Reliable Agentic Bots with Llama 8B: Efficiency at Scale 🤖
Welcome back to Vanishing Gradients! I started this newsletter to track what’s happening in data science, ML, and AI—and to share what I’ve been working on. This is still an experiment, so I’d love to hear what you’d like to see more (or less!) of as we keep this going.
This week, I’m diving into prompt engineering with LLMs, discussing open-source AI with Hailey Schoelkopf, exploring agentic bots built with Llama 8B, and sharing insights from my chat with Eric Ma on how Pixi is transforming data science workflows. Let’s get into it! 🤖
Prompt Engineering, Tigers, and the Future of AI Research 🐅
In my recent live recording with Sander Schulhoff (Learn Prompting), Denis Peskoff (Princeton), and Philip Resnik (UMD), we dove deep into the nuances of prompt engineering, the security challenges in Generative AI, and the future of AI research.
A particularly intriguing part of the conversation explored how working with large language models (LLMs) is less like traditional programming and more like training an animal. The idea came up that it’s often akin to training a dog 🐕—with patience and iteration—but at times can feel more like handling a tiger 🐅, especially when addressing adversarial techniques. You can watch the full livestream here:
🎧 As it was an action-packed 2 hour chat, we’re releasing the podcast in two parts. In Part 1 of this two-part release, we explore the rise of prompt engineering, its crucial role in AI accessibility, and the evolution of NLP, from early rule-based systems to modern prompt-based techniques like those captured in the Prompt Report. You can check out the first part here or on your favourite podcast app.
In the clip below, we discuss:
🤖 LLMs ≠ Traditional programming: It’s more about guiding behavior than hard-coding responses.
🐶 Prompting as animal training: Fine-tuning LLM outputs can feel like working with a well-meaning but unpredictable dog, or occasionally, a tiger!
⚠️ Adversarial techniques: These add complexity, making prompt engineering even more challenging and strategic.
The Future of Open-Source AI with Hailey Schoelkopf (Eleuther AI)🌟
In a recent Outerbounds fireside chat, I sat down with Hailey Schoelkopf from EleutherAI to discuss the future of open-source AI and the evolving research landscape. Hailey is deeply involved in AI model evaluation and plays a key role in maintaining the LM Evaluation Harness, which is widely used across the research community.
We touched on a wide range of topics:
💡 EleutherAI’s Origin Story: How a grassroots movement became a leading nonprofit research lab advancing open-source AI.
🔒 Challenges for Nonprofits: The advantages and hurdles of being open-source in a competitive space.
🧪 AI Evaluation: Why evaluating AI models is just as important as building them, with tools like the LM Evaluation Harness.
🛠️ Customization and Local Models: The importance of using local models for privacy, control, and flexibility.
🚨 Red Teaming and AI Safety: The critical role of stress-testing models for ensuring AI safety.
🌍 Future AI Infrastructure: What’s on the horizon for open-source infrastructure like Pythia and other transparent research projects.
🎙️ Check out the full conversation where we explore these topics in depth and discuss Hailey's vision for open-source AI in the next 6-12 months:
Hailey’s so cool we also made a little teaser about everything she does and is interested in (see the clip below):
👩💻 Who she is:
Hailey works at EleutherAI, a nonprofit research lab focused on making large-scale AI, like LLMs, more open and accessible.
🛠️ What she does:
Hailey spends a lot of time maintaining the LM Evaluation Harness, a tool for evaluating AI models. She loves seeing how the open-source community and researchers build on her work to push the field forward. 💡
🌍 What she's excited about for the future:
- New ways of interacting with AI beyond just chat, including multimodal models and tools like molmo and o1.
- Defining model limitations and strengths, and expanding how we use these technologies.
- The rapid evolution of AI and the potential for breakthroughs in the next 6-12 months! ⚡
Reliable Agentic Bots with Llama 8B: Efficiency at Scale 🤖
In collaboration with Rasa, I co-authored a blog post on building reliable agentic bots using Llama 8B. This blog explores how smaller models, like Llama 8B, can be fine-tuned to match the performance of larger models (like GPT-4) in conversational AI tasks, while dramatically cutting costs and reducing latency.
Some key takeaways:
💸 Cost Efficiency: Smaller models are not only more cost-effective but also provide greater control over AI infrastructure when deployed on Hugging Face or self-hosted environments.
🛡️ Avoiding Model Deprecation: Unlike providers like OpenAI, which deprecate models, self-hosted models ensure you’re in full control of versioning and updates, avoiding costly disruptions.
🔐 Privacy and Security: Self-hosting means sensitive data stays in-house, critical for industries with strict privacy requirements.
⚡ Low Latency for Real-Time Applications: Fine-tuning smaller models reduces response times, making them ideal for voice assistants.
🚀 Fine-Tuning and Deployment: The CALM paradigm streamlines fine-tuning, making it easier to scale without sacrificing performance.
If you prefer watching to reading, you can see my collaborator and co-author Daksh and myself live code a walkthrough below!
Accelerating Science with Eric Ma and Pixi 🚀
In a recent interview, I caught up with my old pal Eric Ma from Moderna, and we discussed how Pixi has become a game-changer for his data science workflows, ensuring reproducibility and drastically speeding up development.
Here’s a short AI-generated highlight reel 😛:
Key highlights from our conversation:
🔧 Reproducibility made simple: Pixi’s lock files ensure consistent environments across platforms, eliminating hours of debugging.
⚡ Speed and efficiency: With Pixi, setup times are drastically reduced, allowing for rapid iterations in data science and machine learning workflows.
🌐 Cross-environment compatibility: Whether working on a GPU tower or running locally, Pixi ensures smooth operations with a single command.
🧑🔬 Who should try it?: Pixi is a game-changer for data scientists and tool builders looking to streamline workflows and boost productivity.
🎥 Watch the full interview to hear more about how Pixi is transforming the way we work:
And/or check out Eric’s full demo of Pixi in action:
A huge shoutout to Wolf Vollprecht and the prefix.dev team for their incredible work on Pixi and beyond! 👏
Upcoming Workshops and Conferences 🚀
I’m excited to share some updates on upcoming conferences where I’ll be teaching workshops on Generative AI and multimodal apps!
🗽 PyData NYC (Nov 6-8, New York):I’ll be leading a workshop on Building Your First Multimodal Generative AI App. This is a hands-on session where we’ll dive into the practical steps for creating AI apps that can handle multiple forms of input and output. Sign up here to join me and explore the exciting world of multimodal AI! Also check out the repo if you’d like a sneak peek of what we’ll cover (but know there’ll be much more! 🚀 )
🤖 MLOps World and Generative AI World Conference (Nov 7-8, Austin): In Austin, I’ll be teaching a workshop on Generative AI for Software Engineers. If you're a developer or engineer looking to integrate AI into your software workflows, this workshop is for you. I’m also super honoured to be on the conference steering commitee 🤗 Plus, you can get 15% off with my discount code: Use this code to register and join me at the summit!
💡 PyData Global CFP: The Call for Proposals (CFP) for PyData Global is still open for the next week! If you’re interested in speaking or getting involved, don’t miss the chance. Submit your proposal or join the community in whatever way you can!
I’d love to see you at any (or all!) of these events, so definitely check them out, use the discount code, and sign up to PyData NYC!
I’ll be announcing more livestreams, events, and podcasts soon, so subscribe to the Vanishing Gradients lu.ma calendar to stay up to date. Also subscribe to our YouTube channel, where we livestream, if that’s your thing!
That’s it for now. Please let me know what you’d like to hear more of, what you’d like to hear less of, and any other ways I can make this newsletter more relevant for you,
Hugo