15 Privacy Questions Every AI Builder is Asking
Katharine Jarmul on system prompts, agent harnesses, and why observability isn't enough
Katharine Jarmul, Privacy in ML/AI Expert & Author of Practical Data Privacy, recently joined me from Berlin to discuss what technical privacy actually looks like when you’re shipping LLMs, agents, and multimodal systems into the real world, why system prompts (and your entire agent harness!) should be considered public by default, and why “privacy observability” is as critical as data observability for anyone building with LLMs today.
Many questions came up that were similar to the questions Katharine hears from builders so we thought to share the answers more broadly with the community. Feel free to read it in one go and/or use it as a resource to come back to. And let us know any other questions you have about privacy in the comments!
Each question below starts with a short answer and then a more practical explanation, linking to relevant resources and time-stamps in the podcast.
The top 15 questions were:
Should we be concerned about information and our agent harness being exfiltrated or becoming public?
I’d like to build an AI system that’s safe and compliant. How do I get started?
What are the biggest privacy concerns builders should be aware of?
How do you mature your AI privacy engineering to address diverse international privacy regulations?
What are the top 3 things builders can do to incorporate more privacy into their AI systems?
I also highly recommend checking out Katharine’s Probably Private YouTube channel and newsletter for the highest signal builder-focused resources on Privacy and Security in AI Systems.
Also join us on May 8 for our first episode of Show Us Your (Agent) Skills, where Thomas Wiecki (PyMC Labs) and I are on a mission to find out what experts such as Wes McKinney (creator of pandas), Jeremiah Lowin (Prefect), and Hilary Mason (HiddenDoor) are building with AI agents and how they’re doing it. Register to join live or get the recording afterwards. And check out the trailer here (volume up):
Let’s now dive in!
1. What is privacy?
Because privacy is subjective and contextual, it is best understood through three overlapping lenses: legal, social / cultural, and technical definitions.
The subjectivity of privacy. Privacy does not have a single, universal definition. If you ask different people what privacy means, you will get vastly different answers depending on their background, their location, and their profession. Katharine breaks these perspectives down into three primary categories that inform how systems are ultimately built.
Legal definitions. This is the regulatory layer. It encompasses frameworks like GDPR in Europe, HIPAA for healthcare in the US, various California privacy laws (like CCPA), and newer privacy regulations in countries like China and India. These legal definitions often vary wildly from region to region because they are informed by norms and politics from the nation-states they govern.
Social and cultural definitions. The legal landscape is deeply informed by how societies relate to privacy. Your understanding of what should be kept private is shaped by how and where you grew up. It varies by language, by city, by subgroup, and even by income level. These cultural expectations dictate what a specific population accepts as normal versus what feels like an intrusion. This can obviously vary even significantly within a country based on experience and perceived danger or risks of relaying additional information.
Technical definitions. This is the realm where builders operate. Technical privacy takes the more abstract definitions within law and the nuanced expectations of culture and attempts to properly translate them into systems that computers can execute.
“How do we take these definitions that are maybe cultural or legal or some overlap of both, and how do we actually define them in some sort of either mathematical, statistical or technical way so that we can actually implement them in systems.” — Katharine Jarmul [00:00:05]
In practice
Don’t rely on a single definition. When discussing privacy with stakeholders, clarify whether you are talking about legal compliance, user expectations, or technical architecture.
Expect regional variation. Because laws are downstream of culture, launching a global product means your technical privacy controls will likely need to vary region by region.
Bottom line. Privacy is not a static binary, but a combination of cultural expectations and legal requirements that engineers must translate into mathematical and architectural realities.
See also:
2. What is technical privacy?
Technical privacy is the discipline of privacy engineering—using math, statistics, and system design to avoid the common patterns that lead to legal violations or cause users to feel violated.
Avoiding the “icky” feeling. At its core, technical privacy is about user experience and trust. Katharine points out that most internet users have experienced a moment where they accidentally overshared or felt a system wasn’t transparent, leaving them with an “icky” or violated feeling. Privacy engineering aims to design software patterns that prevent those mistakes from happening in the first place.
Proactive engineering over reactive policy. Technical privacy goes beyond just writing terms of service. It involves actively engineering testing, red teaming, and system guardrails directly into the software lifecycle. This proactive approach helps organizations avoid both rigid legal problems and the softer, but equally damaging, loss of user trust.
“The field of privacy engineering, which I would call myself part of, is essentially how can we build engineering systems so that we try to avoid common design patterns that lead to those mistakes? How do we actually engineer testing and red teaming and these types of things into systems so that we can avoid both legal problems, but also these icky feelings?” — Katharine Jarmul [00:09:05]
Math and measurement. On the deepest level, technical privacy involves rigorous statistical and mathematical concepts. In the context of machine learning, there are ways to technically measure the privacy of a deep learning process or a resulting model. This allows engineers to reason about individual privacy mathematically, proving how much or how little information a system might leak.
In practice
Measure privacy quantitatively. Move beyond policy checklists and look into statistical ways of measuring privacy risk in your models and data flows.
Test for the “ick” factor. Incorporate privacy violations into your red teaming and system evaluations, testing not just for hard data leaks but for creepy or non-transparent user experiences.
Bottom line. Technical privacy is the concrete engineering work—from architectural design patterns to statistical measurement—that makes theoretical privacy rules functional in software.
See also:
3. What is privacy in AI?
Privacy in AI involves managing the information asymmetry between users and model providers, and deciding what information can be shared or hidden under what circumstances.
The information-inference tradeoff. AI, machine learning, and deep learning are fundamentally information-based systems. The larger these systems get, the more direct the link between information and privacy becomes. If you feed an AI all of your personal data, it might make a highly accurate inference about you—but it also might make a highly biased one. Privacy in AI is about giving users the transparency and agency to choose how the model makes inferences about them.
The asymmetry of information. There is a massive imbalance between what tech giants know about users and what users know about tech giants. Large organizations harvest incredible amounts of personal data to build their AI products, yet remain deeply opaque themselves. AI privacy pushes back against this asymmetry by demanding transparency about what aspects of a user’s profile, region, or device are being used to alter algorithmic outputs, such as pricing or recommendations.
“I can give you all of my information, you might make a better inference based on that information, but you also might make a more biased inference about me... I should have the right and the transparency to tell you how I want you to see me to some degree.” — Katharine Jarmul [00:09:52]
The risk of memorization and multimodal data. Because modern AI ingests murky data of unknown origins, there are unique risks around “artifacts of memorization.” Models can inadvertently regurgitate exact training data. Furthermore, multimodal systems introduce entirely new privacy threats. Katharine and Hugo highlight the case of OpenAI’s voice model sounding remarkably like Scarlett Johansson, and how visual language models (VLMs) can now re-identify patients using retina shapes across different medical facilities.
In practice
Design for transparency. Build interfaces that show users exactly what data is being used to generate an AI’s response or decision.
Give users control over their profile. Allow users to dictate what pieces of their information are visible to the model and what should be hidden, acknowledging the tension between convenience and privacy.
Beware of memorization. Understand that large deep learning models memorize training data; do not assume that PII scraped from the web is safely obfuscated inside model weights.
Bottom line. Privacy in AI is about rebalancing the power dynamic between the model and the user by enforcing transparency, controlling data flows, and defending against the unique memorization risks of deep learning.
See also:
4. Should we be concerned about information and our agent harness being exfiltrated or becoming public?
Yes. You must operate under the assumption that your entire agent harness (system prompts, retrieved context, and attached memories) can and will be reverse-engineered by users.
System prompts are not private. Builders often put sensitive business logic, API keys, or proprietary rules into system prompts, assuming they are hidden from the end user. This is a critical mistake. There are vast repositories of leaked system prompts online. Exfiltration attacks are incredibly common, and users consistently find ways to trick models into revealing their core instructions.
“System prompts are not private. It’s been proven many times... anything that you write in your system prompt, you should be comfortable writing on your public website.” — Katharine Jarmul [00:42:22]
The rest of your agent harness is equally vulnerable. The risk goes far beyond the system prompt. If you are feeding a large vector database, contextual memories, or RAG (Retrieval-Augmented Generation) inputs into an LLM, that data is at risk. Models use information to produce information. By interacting with the system and systematically altering inputs, an attacker can observe the differences in the outputs and deduce what the model knows, what it doesn’t know, and what is hiding in its memory.
“If we can test the system and we can change the inputs, we can start to observe differences in outputs... we can start to produce even more advanced attacks where I can really start to tell: This is what the model was trained on. These are the things the model knows.” — Katharine Jarmul [00:45:07]
The threat of exact extraction. Researchers like Nicholas Carlini have repeatedly proven that language models can be forced to regurgitate their training data. Carlini’s work has demonstrated LLMs successfully re-identifying hundreds of pieces of text to their correct authors. This threat also extends to multimodal models, where VLMs have been used to extract and re-identify sensitive medical images.
In practice
Assume public visibility. Never put anything in a system prompt, a connected database, or a RAG context window that you would not publish openly on your corporate website.
Limit contextual payloads. Only retrieve and feed the absolute minimum data required to answer a user’s prompt. Do not blindly pass entire customer profiles into the context window.
Red team your context. Actively try to trick your own system into revealing its hidden instructions and connected data sources before you ship.
Bottom line. If an AI system relies on hidden contextual data to generate answers, attackers will eventually use those answers to reverse-engineer the hidden data.
See also:
5. I’d like to build an AI system that’s safe and compliant. How do I get started?
Start by mapping your data flows and establishing basic privacy observability before you attempt advanced cryptographic or architectural solutions.
Map the data first. If you do not know where your sensitive data is stored, where it comes from, what it is being used for, and where it goes, you cannot secure it. The first step to building a compliant AI system is establishing “privacy observability.” Just as you build platform or data observability, you must map the flows of personal data across your multi-regional or federated setups to ensure legal sign-off.
Start small with evaluations. You do not need best-in-class, mathematically perfect privacy from day one. Start with simple evaluations (evals) in your CI/CD pipeline. For example, if you have a chatbot taking fast-food orders, users will inevitably type in their names, addresses, and phone numbers. Build a simple test that ensures your system can successfully identify and remove or encrypt that PII from the logs before the traces are stored for internal analysis.
“A simple test could be, can we take something like this text? Can we remove this information or encrypt parts of this information so that we can only decrypt it should we need it? ... That could be how small your evals start.” — Katharine Jarmul [00:48:30]
Involve the team through red teaming. If you want to build a culture of compliance, host a hack day. Do not isolate the responsibility to an overworked security team or a single lawyer. Give builders the goal of attacking their own system. Let them find vulnerabilities, break the prompt guardrails, and exfiltrate data. When the whole team discovers what is broken, it becomes much easier to prioritize the critical fixes in the next sprint.
Grow privacy champions. In smaller companies without dedicated privacy teams, compliance often becomes a hot potato where “everybody is responsible,” meaning nobody actually does it. To counter this, identify engineers who are interested in the topic and give them the training, responsibility, and career growth to become “privacy champions.”
In practice
Audit your traces. Look at your existing application logs and traces. Finding data you wish you hadn’t seen (like user PII in chat logs) is the best starting point for what needs fixing.
Implement privacy observability. Slap privacy tracking onto your existing data observability stack so you know exactly what is flowing to third-party LLM vendors.
Host an internal hackathon. Buy some pizza and challenge your engineering team to break the privacy of your AI products in a safe environment.
Bottom line. Safe and compliant AI does not start with advanced math; it starts with clear data governance, simple automated PII redaction, and a team culture that actively tests for vulnerabilities.
See also:
6. What are the biggest privacy concerns builders should be aware of?
The primary concerns are murky data origins, a lack of purpose limitation, and failing to handle sensitive user data with transparent care.
Purpose limitation. A core principle of data protection (particularly in GDPR) is purpose limitation: you must only use data for the specific reason it was collected. In AI, this is frequently violated. Data collected for hospital analytics or basic lab sharing is often swept up by data scientists to train deep learning models. If data is not properly marked with its original intent, builders inherit massive legal and ethical risks simply by using the datasets they stumble across internally.
“Were we training a deep learning model to recognize tumors? Were we sharing just blood lab results with another partner lab? What were we gonna do? And then are we properly marking that data when we save it... so that people like me... don’t just come across and say, oh, here’s some data. I guess I can use it.” — Katharine Jarmul [00:15:41]
Murky data origins and memorization. When training large deep learning systems, builders often ingest massive datasets without knowing where all the records originated. This creates murky and unknown data quality and consent. Combined with the phenomenon of model memorization—where models spit out exact training data—this murky provenance can lead to severe privacy breaches, such as regurgitating a specific person’s private information or copyrighted material.
Handling sensitive interfaces with care. As AI moves into domains like healthcare, builders must navigate how algorithms interface with highly sensitive scenarios. This includes not just the privacy of the data going into the system, but how the outputs are presented. For example, failing to communicate algorithmic uncertainty to a doctor or a nurse can lead to poor decision-making based on opaque, probabilistic outputs.
In practice
Tag data with its purpose. Ensure your data governance pipelines explicitly tag datasets with the consent and purpose for which they were gathered.
Audit your training data. Do not let data science teams use internal data simply because they have access to it; verify the provenance first.
Design for uncertainty. When building high-stakes AI (like in medicine), ensure the UI clearly communicates the model’s confidence levels and limitations to the end-user.
Bottom line. Builders must treat data not as an infinite, free resource, but as a restricted asset governed by strict rules about why it was collected and how it can be legally utilized.
See also:
7. What types of privacy controls can people implement?
Privacy controls operate on a spectrum, ranging from simple text redaction and architectural routing to advanced algorithmic guardrails and encrypted computation.
Basic controls: Sanitization and Pseudonymization. The first layer of defense involves stripping sensitive information out of inputs and databases. This includes input sanitization (searching a user’s prompt for PII and removing it before it hits the model) and pseudonymization (replacing identifying data with masks, hashes, or encrypted values that can only be decrypted by authorized personnel).
Architectural controls: Privacy Routing. You can use infrastructure design to protect data. “Privacy routing” involves dynamically deciding which model handles which request based on data sensitivity. A standard query might be sent to a third-party AI vendor (like OpenAI or Anthropic), while a query containing corporate secrets or patient data is automatically routed to a locally hosted, open-weights model to ensure the data never leaves your infrastructure.
Guardrails: Three layers of defense. Katharine outlines a taxonomy of three distinct types of guardrails:
External deterministic: Fast, software-based input/output filters using hash trees or regex to block known bad inputs or copyrighted text.
External algorithmic: A secondary, usually smaller, classification model (like Llama Guard) that sits outside the main LLM to judge if a prompt is safe or if an output has leaked sensitive data.
Alignment training: Tuning the core model itself via RLHF or fine-tuning so its baseline behavior refuses dangerous or privacy-violating requests.
“These are things like input filters... done by essentially having some sort of hash tree structure that matches known copyright training data... And then just blocking those things and returning a different response.” — Katharine Jarmul [00:27:05]
Advanced controls: Anonymization and Cryptography. Anonymizing data is notoriously difficult. Katharine notes that removing a face from Google Street View doesn’t anonymize the photo if the person’s exact location and outfit are visible. To achieve true anonymization, engineers use techniques like differential privacy. For multi-organizational collaboration, techniques like federated learning allow models to be trained locally without moving the underlying data, often combined with encrypted computation so that the model updates are kept secret.
In practice
Implement dynamic routing. Set up an API gateway that scans prompts for sensitive data and routes risky queries to local models instead of third-party APIs.
Layer your guardrails. Don’t rely solely on model alignment. Use fast, deterministic software filters first, back them up with an algorithmic classifier, and let the LLM handle the rest.
Minimize data. Always ask: what is the absolute minimum amount of information required to complete this task? Drop everything else.
Bottom line. Privacy controls are not an “on/off” switch, but a layered defense strategy that combines data scrubbing, smart infrastructure routing, and multi-tiered model guardrails.
See also:
Privacy Observability and Routing for LLM Prompts (Lightning Lesson)
Learn more advanced techniques in Katharine’s book Practical Data Privacy
Guardrails explainers: Algorithmic-based and software-based
8. What are guardrails and how should I use them?
Guardrails are interventions that guide or restrict AI inputs and outputs. These can be deterministic software filters, external classifier models, and internal alignment training.
The three categories of guardrails. The term “guardrail” is frequently overloaded. Katharine categorizes them into three distinct architectural layers to disambiguate what is actually happening under the hood. The first layer is external and deterministic. These are fast, hard-coded software filters that sit outside the model, relying on memory structures like hash trees to catch known issues—such as blocking outputs that perfectly match copyrighted training data.
External algorithmic guardrails. The second layer involves using entirely separate models to evaluate the inputs or outputs of your primary AI system. These are essentially classification algorithms acting as referees. Instead of checking against a static list, they dynamically analyze the semantic intent.
“And that model is deciding is it safe to answer this prompt or that model is deciding, did we accidentally leak the prompt? Or that model is deciding, did the output of the model say something dangerous?” — Katharine Jarmul [00:27:56]
Internal alignment training. The final layer resides within the primary model itself. This encompasses the reinforcement learning (RLHF), fine-tuning, and policy-based tuning performed before the model is deployed. This training dictates how the model inherently predicts and outputs answers, ensuring it will decline to respond to dangerous inputs or provides safer answers without needing an external filter to intervene.
Balancing latency, utility, and safety. Implementing guardrails isn’t a zero-cost decision. Every guardrail you add introduces a new step in your architecture. Builders must weigh the benefits of security and privacy against the potential degradation of the user experience. You need business buy-in because you are actively choosing to alter the product’s performance profile.
“...guardrails are gonna introduce one more bump in the line, which may or may not be perceived latency or perceived filter, censorship of the model.... So you have to have that business buy-in and you have to know how do you want the system to work.” — Katharine Jarmul [00:30:53]
In practice
Start with external deterministic filters: If you have fast microservices or API gateways, use them to filter known harmful inputs or PII before they ever reach the model.
Utilize open-weight algorithmic models: If you are building your own guardrails, look at Meta’s Llama Guard. It provides a strong, continuously updated foundation for evaluating prompt and response safety.
Leverage vendor defaults: If you lack the infrastructure to build custom filters, utilize the algorithmic and deterministic guardrails provided natively in vendor consoles (like AWS or OpenAI).
Bottom line. Choose your guardrails based on your tolerance for latency and the specific risks of your use case, layering fast deterministic filters with smarter algorithmic models where necessary.
See also:
Guardrails explainers: Algorithmic-based and software-based
9. What are the most common tools used in data privacy, and what would you suggest people use to get started?
The privacy tooling ecosystem is continuously evolving, offering basic controls like built-in database masking and open-source text redactors to advanced guardrail classifiers and encryption libraries.
Start with your existing database. You likely already have access to powerful privacy tools without installing anything new. Many modern databases offer built-in features for dynamic data masking, on-the-fly hashing, and pseudonymization. This allows an engineer to run a query that automatically redacts parts of a phone number or ID before it is ever exposed to the application layer.
Open-source redaction libraries. For unstructured text and images, Microsoft Presidio is a highly recommended open-source library. It utilizes NLP frameworks like spaCy to detect and redact entities (like names, addresses, and credit cards) from text on the fly. Presidio also includes OCR (Optical Character Recognition) capabilities to detect and block sensitive information found within images, such as photos of driver’s licenses or passports.
Algorithmic guardrail models. If you want to evaluate whether prompts or outputs violate privacy or safety policies, look into open-weights classification models. Katharine specifically recommends Meta’s Llama Guard suite. These models are continuously updated based on real-world security threats and can act as an algorithmic filter between the user and your main LLM.
“If you’re just getting started with guardrails, I really recommend taking a look at Llama Guard... is definitely informed by their own security practices and the own dangers that they see on their platforms and is a great open weight model resource to at least to get started.” — Katharine Jarmul [00:28:18]
Advanced cryptographic tools. For teams with mature data products, the tooling extends into complex mathematics. This includes libraries for differential privacy, which inject statistical noise into datasets so individual records cannot be identified. At the cutting edge, there is homomorphic encryption—which is already running quietly on modern iOS devices—allowing systems to literally perform mathematical operations on data while it remains fully encrypted.
In practice
Check your database docs. Before building a custom PII scrubber, check if your data warehouse already supports dynamic row-level security or data masking.
Deploy Presidio in your pipeline. Add Microsoft Presidio as a microservice step before your LLM call to strip PII from user prompts automatically.
Experiment with Llama Guard. Download a small algorithmic guardrail model and run your application traces through it to see what privacy violations you are currently missing.
Bottom line. You do not have to invent privacy from scratch; leverage your database’s native features, deploy proven open-source redaction libraries, and experiment with off-the-shelf guardrail models.
See also:
10. How do I know if my AI privacy engineering is working?
You know it is working when you have functioning reporting patterns, active observability that captures and traces real usage, and a culture that routinely uncovers and prioritizes vulnerabilities.
The illusion of zero incidents. One of the biggest fallacies in engineering is assuming that a quiet system is a secure system. If no one is flagging privacy concerns or data leaks, it does not mean your AI system is flawless—it almost universally means your detection and reporting mechanisms are nonfunctional. Success in privacy engineering looks like actively discovering problems, not burying them.
“...like we say in security, if you’ve never had a privacy incident reported or you never had a security incident reported, it doesn’t mean it didn’t happen. It just means your reporting is broken.” — Katharine Jarmul [00:38:13]
Traces are your early warning system. You cannot manage what you cannot see. As you build out evaluations and monitor your AI product to improve the user experience, you will naturally start collecting traces of user inputs and model outputs. You know your privacy awareness is maturing when you look at those traces and realize you are capturing sensitive data that shouldn’t be sitting exposed in your logs.
Distributed responsibility fails. When companies say “everyone is responsible for privacy,” it usually results in no one actually owning the risk or the implementation. To make privacy engineering work, incentives and ownership must be clear. This is why specialized roles or dedicated programs are highly effective in ensuring privacy isn’t just an afterthought.
“...a lot of organizations like to say everybody’s responsible. What happens when everybody’s responsible? [...] there may or may not be legal implications of being the responsible party.” — Katharine Jarmul [00:35:56]
The Privacy Champion model. At ThoughtWorks, Katharine helped scale a “Privacy Championship” program across thousands of engineers. By identifying individuals who were naturally interested in security and privacy, giving them specialized training, and embedding them back into higher risk engagement teams, the organization created a distributed but accountable network. This builds organizational maturity and ensures real-world privacy problems are surfaced and solved at the team level.

In practice
Audit your traces: Actively monitor data flowing through your AI system. If you find yourself thinking “I wish I hadn’t seen that data,” you have found your starting point for privacy engineering.
Establish clear ownership: Do not rely on collective responsibility. Designate specific individuals or create a “Privacy Champion” program to ensure someone actively advocates for data protection on every AI project.
Talk to your DPO: If you are a smaller company launching in regions like Europe, utilize your Data Protection Officer (DPO) to review your trace data and establish basic, legally compliant strategies before scaling.
Bottom line. Your privacy engineering is working when your team is actively finding, reporting, and fixing data vulnerabilities, rather than assuming silence means safety.
See also:
11. Can we evaluate, test, or observe privacy in an AI system?
Yes, though the industry is still defining best practices, you can build privacy evaluations by starting with simple input-output tests, implementing privacy observability, and testing algorithmic guardrails.
The wild west of AI privacy testing. Unlike traditional software testing, privacy in generative AI and complex multimodal systems does not have a standardized playbook yet. Determining exactly what information might leak through a complex data flow—especially when documents, images, and user prompts are intertwined—is an evolving discipline.
“The good news is there’s no best practices yet... how do we build evals for privacy? How do we build testing for privacy and how do we build appropriate auditing and observability, especially in what we talk about these more complicated flows... we’re still really defining best practices for that as a field.” — Katharine Jarmul [00:46:40]
Starting small with data minimization evals. You don’t need a massive, complex framework to begin. Katharine uses the example of a fast-food ordering bot (like the “Chipotle bot”). If a user inputs their name, address, and phone number, a foundational privacy evaluation would simply test whether your pipeline can detect, encrypt, or remove that PII before the log is saved or used for future model evaluations or performance testing.
“Now that might be information that you need to hold for a short period of time, and then you probably wanna delete that and you certainly wanna delete it if you’re gonna start using it for your own testing or infrastructure... A simple test could be, can we take something like this text? Can we remove this information or encrypt parts of this information so that we can only decrypt it should we need it?” — Katharine Jarmul [00:48:30]

Evaluating algorithmic guardrails. If you introduce external algorithmic guardrails to protect privacy, you evaluate them just as you would any other machine learning classifier. You feed the system known edge cases, test the precision and recall of its privacy flags, and integrate those tests directly into your CI/CD pipeline so you have a baseline of safety before launching to users.
Moving toward Privacy Observability. As systems grow, standard data observability isn’t enough. You need “privacy observability”—the ability to sample data flows and explicitly monitor for privacy or security violations. By capturing the errors and leaks you observe in production, you can iteratively build more sophisticated, product-specific evaluations based on actual user behavior.
In practice
Build CI/CD tests for PII: Create automated tests that inject dummy PII into your system prompts to verify that your redaction or encryption logic successfully catches and masks the data.
Sample your flows: Implement observability tools that allow you to randomly sample inputs and outputs specifically to audit for unexpected personal data leakage.
Iterate on production errors: Use the privacy failures you discover in real-world traces as the direct baseline for your next batch of automated evaluations.
Bottom line. Privacy evaluation in AI is an emerging field, but you can build a robust foundation by starting with simple PII redaction tests and continuously evolving them based on real production traces.
See also:
Mastering LLM Application Testing (Lightning Lesson) with Hugo Bowne-Anderson & Stefan Krawczyk
Evals Skills for AI Agents by Hamel Husain
Interested in following the latest developments? Subscribe to the Probably Private Newsletter
12. How do you mature your AI privacy engineering to address diverse international privacy regulations?
Maturation involves mapping data flows, adapting technical controls to regional legal requirements, and advancing from basic pseudonymization to advanced privacy techniques over time.
Understand your data origins and flows. Before you can write a single line of privacy-preserving code, you must achieve basic data governance. You need to know where sensitive data comes from, where it is stored, and exactly what purpose it was collected for. In regions governed by GDPR, “purpose limitation” is a strict legal requirement, meaning data collected for hospital analytics cannot arbitrarily be thrown into a deep learning model to recognize tumors.
Adapt controls to regional jurisdictions. As you scale globally, a one-size-fits-all approach to privacy controls breaks down. What qualifies as sufficient protection in one region might fail in another. Building privacy observability helps you deploy to multi-region architectures by clarifying what interventions are needed where.
“...what you’ll find out is that the idea of controls or interventions for privacy will differ. Region to region. And so you’re gonna have situations where basic controls work. So this is true for HIPAA Safe Harbor, which basically means things like pseudonymization...” — Katharine Jarmul [00:19:01]
Progressing from basic to advanced controls. Maturity happens in stages. You start with basic data minimization: input sanitization, redaction, and pseudonymization (masking or hashing parts of an ID or phone number). As your system complexity grows, you move toward much harder technical and mathematical challenges, such as anonymization. However, anonymization is highly context-dependent; removing a face from Google Street View doesn’t anonymize the subject if you know where they live and what clothes they wear.
Privacy routing and cryptographic computation. At the highest levels of maturity, organizations use advanced architecture to protect data. This could include “privacy routing”—dynamically deciding whether a user’s prompt can safely be sent to a third-party LLM or if it must be routed to a local, self-hosted model to protect corporate secrets. It also includes today’s leading advanced privacy technologies, like doing computation directly on encrypted data.
“If you have an iOS that came out within the past two years, you also have homomorphic encryption running on your iOS. What does that mean? It means we can do math on encrypted data.” — Katharine Jarmul [00:53:18]
In practice
Tag data with its purpose: Ensure your data pipelines document data governance information, such as why data was collected so machine learning engineers don’t accidentally violate purpose limitations by using it for unauthorized training.
Implement Pseudonymization for quick wins: Use open-source libraries like Microsoft Presidio (which leverages spaCy) to identify and redact most of the sensitive entities in text on the fly or leverage internal pseudonymization tools from data sources directly.
Build local-first fallbacks: If you handle highly sensitive or confidential data, consider serving local or internal models rather than third-party APIs.
Bottom line. Maturing your privacy engineering means moving from merely tracking where your data lives to implementing dynamic, region-specific technical controls and advanced privacy technologies..
See also:
13. What is red teaming and how important is it?
Red teaming is the practice of attacking your own systems to uncover hidden vulnerabilities, serving as both a crucial security diagnostic and a powerful tool for building organizational maturity.
Thinking like an attacker. You cannot defend an AI system solely by hoping your users act benevolently. Red teaming forces your organization to adopt an adversarial mindset. By intentionally trying to break your own guardrails, exfiltrate data, or force the model into unintended outputs, you discover edge cases that standard evaluations and testing miss.
“Your goal in red teaming is to attack yourself and it can sound a bit weird because why would you try to attack yourself, but it’s a really good exercise both in practicing thinking like an attacker... but also in, growing, again, that maturity at the organization...” — Katharine Jarmul [00:39:00]
Democratizing the attack. Red teaming shouldn’t be isolated to a siloed, overburdened security team. Katharine highly recommends organizing internal “hack days.” When you involve developers, data scientists, and product managers in the process of attacking the system, you generate wildly creative attack vectors. More importantly, when the broader team discovers a critical vulnerability themselves, they are heavily incentivized to prioritize the fix in the very next sprint.
“...let it not just be your security team that’s probably overloaded, that already has enough on their plate, and also that has no responsibility for deciding priority of tickets... if everybody’s involved in a hack day the team can say, Hey, guess what? We broke this thing. It’s critical. Let’s get it into the next sprint.” — Katharine Jarmul [00:40:12]
The reality of prompt and context leakage. A primary target for AI red teaming is exfiltration. You must operate under the assumption that your system prompts and injected context can be made entirely or partially public. Attackers have routinely proven they can extract system prompts (as seen with “Pliny the Prompter”). Furthermore, because AI models use information to generate information, clever attackers can reverse-engineer the underlying context, vector databases, or even multimodal training data—such as re-identifying medical patients from retina shapes using Visual Language Models.
“...anything that you write in your system prompt, you should be comfortable writing on your public website.” — Katharine Jarmul [00:42:43]
In practice
Host a pizza-and-falafel hack day: Gather your engineering and product teams for a dedicated session to creatively attack your AI features. Make it collaborative and fun.
Assume total prompt visibility: Never hardcode API keys, proprietary business logic, or sensitive customer instructions into your system prompt. It will be leaked.
Test for data regurgitation: Actively prompt your system to see if it will reveal the specific documents or context chunks fed into it via RAG (Retrieval-Augmented Generation).
Bottom line. Red teaming transforms abstract privacy concerns into concrete, prioritized engineering tasks by exposing exactly how your system will inevitably be abused.
See also:
14. What’s the role of federated learning today? Is it really a solution for keeping data private at the customer’s side and fulfilling legal requirements?
Federated learning keeps raw data localized, but model updates can still leak information. You can implement additional layers like differential privacy and encrypted computation to offer even better privacy controls.
The promise of localized data. Federated learning is highly appealing for heavily regulated industries like healthcare. Imagine hospitals in Hamburg and Berlin that want to collaborate to build a more generalizable model, but legally cannot merge their patient databases. Federated learning solves the initial hurdle: the raw data never leaves the hospital’s local servers.
“The training happens locally for me. The training also happens locally for you. And then what we exchange is actually the updates for the model.” — Katharine Jarmul [00:57:29]
The vulnerability of gradients. The critical caveat is that the model updates (the gradients) exchanged between the local servers and the central aggregator are not inherently private. If an organization acts maliciously, or if the aggregator is compromised, an attacker can analyze these specific mathematical updates to reverse-engineer and infer sensitive details about the original Hamburg dataset.
Hiding in the crowd with Differential Privacy. To make federated learning genuinely secure, practitioners add layers on top of it. The first layer is often Differential Privacy (DP), which adds statistical noise to the updates. This provides individual or group privacy protections by ensuring that any specific “sticky outlier points” are masked. The goal is to let individual data points “hide amongst friends,” making it improbable to tell if a specific patient was part of the training set.
Layering encrypted computation. For enhanced privacy and security, updates can be sent and processed using encrypted computation. Katharine’s team at Cape Privacy pioneered applying this to federated learning. The local models send their updates entirely encrypted. The central server adds the differential privacy noise and aggregates the updates while they are still encrypted, and they are only decrypted once the final, combined model update is complete.
“That means all the updates were sent encrypted to a few different aggregation points. Those were then added with some noise and decrypted only after they were all added together... somebody would have to break the encryption... [and] we all see the result together... it tells us far less about Hamburg contributions versus Berlin.” — Katharine Jarmul [00:58:30]
In practice
Do not rely on federated learning alone: If you are building a multi-organization federated pipeline, recognize that gradient updates can leak data. You must evaluate the trust level between your node partners.
Apply Differential Privacy: Inject calibrated noise into your model updates before transmitting them to the central aggregator to prevent reverse-engineering of edge cases.
Explore Secure Aggregation: If operating in ultra-secure environments, look into computing model aggregations in encrypted space so the central server never sees the raw gradient updates.
Bottom line. Federated learning is a powerful architectural start for privacy, but it requires the mathematical reinforcements of differential privacy and encryption to protect against sophisticated data extraction.
See also:
15. What are the top 3 things builders can do to incorporate more privacy into their AI systems?
Understand the intimacy of conversational AI, treat privacy as a grayscale spectrum rather than a binary switch, and embrace the deep technical challenge of the field.
Acknowledge the intimacy of the interface. Builders must recognize that modern AI interfaces—chat, voice, and video—elicit a level of user vulnerability that standard software never did. People naturally anthropomorphize these systems, interacting with them as confidants rather than databases. As a result, engineers are unknowingly capturing incredibly sensitive personal context.
“...we’re getting closer and more personal to lots of people... we’re kind of entrusted with people’s, in a lot of ways, deepest, darkest secrets. [They] might be lying in our database or through our engineering systems.” — Katharine Jarmul [01:00:51]
Privacy is a grayscale, not a toggle. A massive misconception in engineering is that privacy is a binary state: you either remove everything and are perfectly safe, or you leave it open and are fully vulnerable. Real-world privacy exists on a vast spectrum. You do not have to architect the most complex privacy engineering system on day one. Every incremental step you take to minimize data collection or mask an identifier provides tangible protection for the user.
“...what we really have is a gray scale... any step towards offering more privacy for the same utility and performance or for, a very small trade off in those things is a step [towards] protection that you’re offering...” — Katharine Jarmul [01:02:13]
The field desperately needs builders. Technical privacy is not just a compliance hurdle; it is one of the most fascinating, interdisciplinary, and technically challenging domains in modern computer science. It requires balancing deep mathematics, systems engineering, product design, and societal ethics. It is highly rewarding work that directly protects people.
“I think we need at least twice as many privacy engineers as we need builders right now to just help keep up with the pace.” — Katharine Jarmul [01:03:57]
In practice
Design for vulnerability: Assume that your users will eventually paste highly sensitive, personal, or corporate information into your AI text boxes. Build your retention and logging policies around that assumption.
Take incremental steps: Don’t let the complexity of advanced privacy technologies stop you from acting. Start with something simple today, like implementing shorter data retention windows or masking logs.
Upskill in technical privacy: Read foundational literature (like Katharine’s Practical Data Privacy), study how advanced privacy works, and bring those concepts back to your product team.
Bottom line. Building private AI systems requires empathy for the user’s vulnerability, a willingness to make incremental technical improvements, and the courage to tackle some of the hardest mathematical problems in software.
See also:
Don’t forget to check out Katharine’s Probably Private YouTube channel and newsletter for the highest signal builder-focused resources on Privacy and Security in AI Systems.
How You Can Support Vanishing Gradients
Vanishing Gradients is a podcast, workshop series, blog, and newsletter focused on what you can build with AI right now. Over 70 episodes with expert practitioners from Google DeepMind, Netflix, Stanford, and elsewhere. Hundreds of hours of free, hands-on workshops. All independent, all free.
If you want to help keep it going:
Share this with a builder who’d find it useful














Super insightful
The "system prompt = public website" framing deserves a corollary for the entire context layer, not just system prompts. Anything you cache, anything you log for evals, anything you persist for retrieval is also potentially extractable — and the cache mechanics most teams don't think about make this worse than people realize.
Anthropic's prompt cache requires a 4,096-token minimum block on Haiku 4.5 — same as Opus, not the 1,024 that older docs imply. Below the threshold, the cache silently fails. Above it, the cached content is structured and persistent in ways that change the threat model. Teams that think they're not caching are; teams that are caching may be caching things they wouldn't have put in a system prompt voluntarily.
Most privacy threat modeling still treats the model as the dangerous component. The more interesting failure surface is the scaffolding around it — what you cache, what you log, what your retrieval store actually retains. That's the part the threat model is least mature for, and the part that's growing fastest as agent stacks get more complicated.