Shadow AI Exposed (2/2): Building a Governance Program That Actually Works

Most organizations that block ChatGPT at the firewall still have a shadow AI problem six months later. The URL is different. The data still leaves.

Part one covered why that happens: the behavior is driven by a productivity gap, not by negligence. Blocking one tool without closing the gap shifts the problem. It doesn’t solve it. This part covers what closing the gap looks like.

Key Takeaways

Technical controls address what’s visible. The personal account problem is structural and won’t be solved by URL filtering or DLP policy alone

An approved AI catalog doesn’t just give employees a sanctioned option. It makes unauthorized use a deliberate choice rather than an oversight

DLP policy built for email and file sharing doesn’t map cleanly onto AI interactions. The patterns are different

Governance erodes without feedback loops. The catalog you build today is outdated before the year is out

Where do you start?

Most teams I work with start with controls: they deploy Entra Internet Access, configure Purview DLP, and set up session policies in Defender for Cloud Apps. The companion article covers that technical layer. It matters, and I recommend building it. But controls deployed before you know what you’re controlling are built on assumptions.

Start with inventory. What AI tools are employees actually using, across which teams, for which tasks?

The Defender for Cloud Apps discovery report is one input. Entra Internet Access traffic signals are another. Neither is complete. In practice, I find that asking employees directly surfaces more than the logs do. A short survey, framed as a capability assessment rather than an audit, gets honest answers. People will tell you which tools they use if they’re not afraid the answer will get those tools taken away.

The goal is a map: which tools are in use, for what, and whether those use cases involve data that creates exposure. That map is what the governance program is built on.
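To make the idea of a map concrete, here is a minimal sketch of merging sightings from several sources into one inventory. The `Observation` schema, the source names, and the set of “sensitive” labels are hypothetical; in practice the rows would come from a discovery export and survey responses. The point it illustrates is that logs tell you which tool and which team, while the survey adds the task and the data involved, which is what decides exposure.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Observation:
    """One sighting of an AI tool from any source (hypothetical schema)."""
    tool: str
    team: str
    source: str           # e.g. "discovery", "network", "survey"
    task: str = ""        # usually only surveys capture the task
    data_class: str = ""  # classification of the data involved, if known

def build_inventory(observations):
    """Merge observations into {tool: {teams, tasks, data_classes, sources}}."""
    inventory = defaultdict(lambda: {"teams": set(), "tasks": set(),
                                     "data_classes": set(), "sources": set()})
    for obs in observations:
        # Normalize names so "ChatGPT" and "chatgpt" land in one entry.
        entry = inventory[obs.tool.strip().lower()]
        entry["teams"].add(obs.team)
        entry["sources"].add(obs.source)
        if obs.task:
            entry["tasks"].add(obs.task)
        if obs.data_class:
            entry["data_classes"].add(obs.data_class)
    return dict(inventory)

def exposure_candidates(inventory,
                        sensitive=frozenset({"Confidential", "Highly Confidential"})):
    """Tools whose reported use involves data in a sensitive class."""
    return sorted(tool for tool, e in inventory.items()
                  if e["data_classes"] & sensitive)
```

A tool seen only in traffic logs shows up in the inventory but never as an exposure candidate, because nothing yet says what data goes into it. That gap is exactly what the survey fills.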

Before you can define data classification limits in the catalog, you need a classification foundation. That means sensitivity labels in Microsoft Purview Information Protection, sensitive information types, or both. Without any classification infrastructure, “no confidential data in tool X” is a guideline your DLP policy can’t enforce. Employees can’t apply limits to data they haven’t been taught to recognize.

Do you have a policy?

The catalog is a list. An AI Use Policy is what makes it binding. No regulation or standard mandates a policy by that name, but GDPR, NIS2, and ISO 27001 all expect documented evidence of how you govern information systems, and AI tools are information systems. Without a written policy that defines approved use, restricted data, and prohibited tools, you can’t hold employees accountable for violations, and an auditor has nothing to assess. The policy doesn’t need to be long. It needs to define what data can go into which tools, what happens when an employee finds something new, and what the consequence of unauthorized use is. The catalog operationalizes the policy. Without the policy, the catalog is advisory.

What goes in an approved AI catalog?

An approved AI catalog is not a list of blocked tools with a few exceptions. It’s a set of tools the organization has evaluated and taken contractual responsibility for, paired with use case guidance that tells employees what they can do with each one.

The distinction matters because a catalog without use case guidance doesn’t change behavior. Employees who aren’t told what data is appropriate for a given tool make that decision themselves. Most of them make reasonable choices most of the time. Some don’t. That’s the gap.

For most Microsoft 365 environments, Microsoft 365 Copilot is the obvious starting point. It processes data in-tenant, within your existing compliance boundary. The core workloads don’t transmit data externally. Two exceptions worth knowing: web search grounding sends queries to Bing, and external agents or plugins can route data outside the tenant. With both disabled, the governance story is straightforward.

But Copilot doesn’t cover everything. Employees use AI tools for code assistance, image generation, document translation, and tasks Copilot handles less well. A realistic catalog includes alternatives for those cases. If the sanctioned option is worse than the unsanctioned one for most of what employees actually do, the catalog adds bureaucracy alongside the shadow AI use. It doesn’t replace it.

What I include in a minimal viable catalog:

Approved tools with confirmed enterprise terms or data processing agreements
Approved use cases per tool, including explicit data classification limits
A clear path for adding new tools, with a defined review timeline
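One way to picture a catalog entry that carries its own data classification limit is the sketch below. The label order mirrors common Purview default label names but is an assumption; substitute your own taxonomy. The tool name “Translator X” and the limits chosen for each entry are hypothetical illustrations, not recommendations.

```python
from dataclasses import dataclass, field

# Hypothetical label taxonomy, ordered from least to most sensitive.
LABEL_ORDER = ["Public", "General", "Confidential", "Highly Confidential"]

@dataclass
class CatalogEntry:
    tool: str
    has_enterprise_terms: bool   # confirmed enterprise terms or DPA
    max_label: str               # most sensitive label allowed in prompts
    approved_use_cases: set[str] = field(default_factory=set)

    def allows(self, use_case: str, label: str) -> bool:
        """True only if terms exist, the use case is approved, and the data is within the limit."""
        if not self.has_enterprise_terms:
            return False
        if use_case not in self.approved_use_cases:
            return False
        # list.index raises ValueError for an unknown label, which is deliberately loud.
        return LABEL_ORDER.index(label) <= LABEL_ORDER.index(self.max_label)

copilot = CatalogEntry("Microsoft 365 Copilot", True, "Confidential",
                       {"summarize documents", "draft email"})
translator = CatalogEntry("Translator X", True, "General",
                          {"translate marketing copy"})
```

The design choice here is that a tool is never approved in the abstract: every answer depends on both the use case and the data’s label, which is what turns “use tool X” into guidance an employee can actually follow.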

That last item matters more than it sounds. Employees find new AI tools faster than any catalog can keep up. If the review process takes four months, employees treat it as a no and find another route. A 30-day evaluation window with a named owner changes that dynamic.
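A 30-day window only changes behavior if overdue requests are visible and attached to a person. The sketch below tracks that, assuming a simple in-memory list of requests; the `ToolRequest` fields and the grouping by owner are hypothetical, and a real program would more likely live in a ticketing system.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

REVIEW_WINDOW = timedelta(days=30)   # the 30-day evaluation window

@dataclass
class ToolRequest:
    tool: str
    requested_by: str
    owner: str                       # named reviewer accountable for the decision
    submitted: date
    decision: Optional[str] = None   # "approved", "rejected", or None while open

    @property
    def due(self) -> date:
        return self.submitted + REVIEW_WINDOW

    def is_overdue(self, today: date) -> bool:
        return self.decision is None and today > self.due

def overdue_by_owner(requests, today):
    """Group open, overdue requests by owner so escalation has a name attached."""
    result = {}
    for r in requests:
        if r.is_overdue(today):
            result.setdefault(r.owner, []).append(r.tool)
    return result
```

Grouping by owner rather than by tool is the point: an overdue list with no name on it is the four-month review process in a different form.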

Not all enterprise agreements provide the same protection. In the DPA, check for: whether data is used to train models, which sub-processors handle your data, data retention periods, data residency commitments, and whether the processor is contractually obligated to support deletion requests. The NSW case turned on the absence of a DPA entirely. A DPA that exists but lacks model training restrictions or sub-processor transparency is a different problem, not a solved one.
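The DPA checks above can be written down as a checklist so that every vendor review asks the same questions. This is a minimal sketch; the dictionary keys are hypothetical names for the five items in the paragraph, and the input is assumed to be a reviewer’s yes/no answers, not anything parsed from the contract itself.

```python
# The five DPA checks from the text; keys are hypothetical field names.
DPA_CHECKS = {
    "no_training_on_customer_data": "Data is not used to train models",
    "subprocessors_listed": "Sub-processors are disclosed",
    "retention_defined": "Retention periods are defined",
    "residency_committed": "Data residency is committed",
    "deletion_supported": "Processor must support deletion requests",
}

def review_dpa(dpa):
    """Return the checklist items a DPA fails.

    `dpa` is a dict of reviewer answers, or None when no DPA exists.
    A missing DPA fails every item; an unanswered item counts as a failure.
    """
    if dpa is None:
        return list(DPA_CHECKS.values())
    return [desc for key, desc in DPA_CHECKS.items() if not dpa.get(key, False)]
```

Treating an unanswered item as a failure mirrors the distinction in the text: a DPA that exists but is silent on model training is not a solved problem.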

The personal account problem

On managed devices, Conditional Access and session policies cover this. How to configure that is in the companion article.

On personal devices, Mobile Application Management (MAM) is the right layer. It wraps corporate apps (Outlook, Teams, Edge for Business) in a managed container without requiring full device enrollment. Data inside that container can’t be copied out to unmanaged apps. A ChatGPT tab in an unmanaged browser sits outside the container, so the paste never lands.
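The container rule can be modeled as a small decision function. The option names below are modeled on Intune’s “Restrict cut, copy, and paste between other apps” app protection setting, but the semantics are simplified to the outbound direction only; treat this as an illustration of the idea, and check the Intune documentation for exact behavior.

```python
from enum import Enum

class ClipboardRestriction(Enum):
    # Names modeled on Intune's "Restrict cut, copy, and paste between
    # other apps" setting; semantics simplified for illustration.
    BLOCKED = "Blocked"
    POLICY_MANAGED = "Policy managed apps"
    POLICY_MANAGED_PASTE_IN = "Policy managed apps with paste in"
    ANY_APP = "Any app"

def copy_out_allowed(setting: ClipboardRestriction, destination_managed: bool) -> bool:
    """Can data copied inside a managed app be pasted into the destination app?"""
    if setting is ClipboardRestriction.BLOCKED:
        return False   # no cut/copy/paste to any other app
    if setting is ClipboardRestriction.ANY_APP:
        return True    # the container no longer protects outbound data
    # Both "policy managed" options only let org data flow to other managed
    # apps; "with paste in" relaxes the inbound direction, not the outbound one.
    return destination_managed
```

In this model, a ChatGPT tab in an unmanaged browser is simply `destination_managed=False`, so under either “policy managed” option the paste is refused.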

On Windows, Edge for Business with a managed browser profile creates the same isolation. Corporate sessions run inside the managed profile, without enrolling the device.

What MAM doesn’t cover is data employees type from memory or rephrase in their own words. For that, the governance layer matters: clear use case limits in the AI catalog and a clear explanation to employees of what the container does and doesn’t protect against.

What does DLP look like when it’s built for AI?

Standard Purview DLP is designed for file movement. It catches sensitive patterns in email attachments, OneDrive documents, and Teams uploads. The patterns are good. The architecture doesn’t translate cleanly to AI interactions.

AI interactions are conversational. Employees paste content into a chat window rather than attaching files. The destination is a single endpoint, not a distribution list. And the submitted data tends to be complete: not a fragment of a contract but the full text, because that’s what the tool needs to work.

Purview added AI app categories for exactly this. Policies scoped to AI destinations, with sensitive information types applied to prompt submissions, work differently from file-based DLP. The technical configuration is in the Purview article. What’s worth noting from a governance perspective: source code needs to be treated as a sensitive category explicitly. The default sensitive information type coverage doesn’t include it.
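What “treat source code as a sensitive category” might mean in pattern terms can be prototyped before you build it as a custom sensitive information type. The sketch below is not a Purview built-in or API; the regexes and the `min_hits` threshold are illustrative assumptions, and Python is used only as a convenient way to test them against sample prompts.

```python
import re

# Illustrative line-level signals of source code (assumptions, not a
# Purview built-in). re.match anchors each pattern at the start of a line.
CODE_PATTERNS = [
    re.compile(r"\s*(def|class|import|from)\s+\w+"),            # Python
    re.compile(r"\s*(public|private|protected)\s+\w+.*[({]"),   # Java / C#
    re.compile(r"\s*(function|const|let|var)\s+\w+"),           # JavaScript
    re.compile(r"\s*#include\s*[<\"]"),                         # C / C++
    re.compile(r".*;\s*$"),                                     # statement terminators
]

def looks_like_source_code(prompt: str, min_hits: int = 3) -> bool:
    """Flag a prompt when at least `min_hits` lines match a code pattern."""
    hits = sum(
        1 for line in prompt.splitlines()
        if any(p.match(line) for p in CODE_PATTERNS)
    )
    return hits >= min_hits
```

A single line like “from the client meeting” would match one pattern, which is why the threshold counts several lines. `min_hits` is the kind of value the audit period is meant to calibrate.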

One pattern that works well in practice is staging enforcement. Start in audit mode to understand what’s being submitted and to which tools. Run it for 30 days. Use what you find to calibrate the policy before switching to block or override modes. Deploying block mode on day one generates false positives, and the exceptions process becomes the de facto policy as employees learn to work around it.
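The staging logic can be expressed as a promotion rule: stay in audit until the 30 days have passed, then enforce only if reviewers found few false positives. The `RuleAudit` record, the 5% threshold, and the “tune” outcome are hypothetical; “enforce” stands for whichever block or override mode you choose.

```python
from dataclasses import dataclass
from datetime import date, timedelta

AUDIT_PERIOD = timedelta(days=30)
MAX_FALSE_POSITIVE_RATE = 0.05   # hypothetical calibration threshold

@dataclass
class RuleAudit:
    rule: str
    audit_started: date
    matches: int           # prompts the rule flagged while in audit mode
    false_positives: int   # flagged prompts reviewers judged harmless

def next_mode(audit: RuleAudit, today: date) -> str:
    """Recommend 'audit', 'tune', or 'enforce' for a rule under staged rollout."""
    if today - audit.audit_started < AUDIT_PERIOD or audit.matches == 0:
        return "audit"   # not enough time or evidence yet
    fp_rate = audit.false_positives / audit.matches
    return "enforce" if fp_rate <= MAX_FALSE_POSITIVE_RATE else "tune"
```

The “tune” outcome is the important one: a noisy rule goes back for calibration instead of straight into block mode, where its false positives would turn the exceptions process into the real policy.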

What makes this sustainable?

Controls deployed without a maintenance cycle erode. The catalog you build today is outdated before the year is out. New tools appear constantly: Harmonic’s data shows the average organization encounters 23 previously unknown AI tools per quarter, even in environments with active controls. The program has to account for that rate.

What I recommend as a minimum cadence:

Monthly: review new AI tools appearing in traffic logs, flag anything with significant usage for catalog evaluation
Quarterly: update the approved catalog, review which sanctioned tools employees are actually using versus which they’re routing around
Annually: re-evaluate the overall program, check whether governance policy still reflects what the tools do and what the data risk has become
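The cadence above, and the load it implies at the discovery rate cited earlier, can be sketched as a simple calendar. Scheduling every review on the first of the month, and aligning the quarterly and annual reviews with January, are assumptions for illustration. At 23 new tools per quarter, each monthly review should expect to triage roughly eight new tools, and the year brings about 92.

```python
from datetime import date

NEW_TOOLS_PER_QUARTER = 23   # Harmonic figure cited in the text

def review_calendar(year: int):
    """Yield (date, review_types) for one year; first-of-month scheduling is an assumption."""
    for month in range(1, 13):
        kinds = ["monthly"]
        if month in (1, 4, 7, 10):
            kinds.append("quarterly")
        if month == 1:
            kinds.append("annual")
        yield date(year, month, 1), kinds

# Expected triage load at that discovery rate.
tools_per_month = NEW_TOOLS_PER_QUARTER / 3   # about 7.7
tools_per_year = NEW_TOOLS_PER_QUARTER * 4    # 92
```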

The NSW Reconstruction Authority case from part one illustrates what happens without this. Seven months elapsed between the incident and disclosure because nobody was actively monitoring. A contractor submitted data through a consumer account with no enterprise terms, and there was no mechanism to detect it. The monitoring layer exists to close that window. A governance program without a detection cycle is a policy document, not a control.

The maintenance cycle covers what you find proactively. Incident response covers what you find after something has already happened. Under GDPR Article 33, the 72-hour window for notifying your supervisory authority starts when you become aware of the breach, meaning you have a reasonable degree of certainty that personal data has been compromised, not when the investigation is complete. Notification is required unless the breach is unlikely to result in a risk to individuals’ rights and freedoms. Whatever that assessment concludes, you need a defined process: who makes the determination, who notifies, who contacts affected individuals. Without one, you will miss the window.
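Because the clock starts at awareness, the first thing an incident process should record is that timestamp. The sketch below computes the notification deadline from it. The function names are hypothetical, and the sketch simply counts 72 wall-clock hours, weekends included; it does not model the Article 33 allowance for late notification accompanied by reasons for the delay.

```python
from datetime import datetime, timedelta, timezone

NOTIFICATION_WINDOW = timedelta(hours=72)   # GDPR Article 33(1)

def notification_deadline(became_aware: datetime) -> datetime:
    """Latest time to notify the supervisory authority.

    `became_aware` is when you reached reasonable certainty that personal
    data was compromised, not when the investigation finished.
    """
    if became_aware.tzinfo is None:
        # Naive timestamps make deadlines ambiguous across offices and DST.
        raise ValueError("use a timezone-aware timestamp")
    return became_aware + NOTIFICATION_WINDOW

def hours_remaining(became_aware: datetime, now: datetime) -> float:
    """Hours left before the deadline (negative once it has passed)."""
    return (notification_deadline(became_aware) - now).total_seconds() / 3600
```

Requiring a timezone-aware timestamp is a deliberate choice: when the person who determines awareness and the person who notifies sit in different time zones, an ambiguous clock is an easy way to lose hours.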

Frequently Asked Questions

How many tools should be in an approved AI catalog?

There’s no minimum or maximum. A catalog with two tools that employees actually use is more effective than one with twenty that nobody trusts. Start with the tools identified during the inventory phase that have legitimate enterprise terms available. Add based on what employees actually need, not what looks complete on paper.

Do you need Microsoft 365 Copilot to run a shadow AI governance program?

No. Copilot is a useful anchor for the catalog because it covers a wide range of use cases within the Microsoft compliance boundary, but it’s optional. A governance program can work with any combination of approved tools, as long as those tools have appropriate data processing terms and the approved use cases are clearly defined.

What do you do about AI tools employees use on personal devices?

MAM is the technical answer. It wraps corporate apps in a managed container without requiring full device enrollment. Data inside that container can’t be copied to unmanaged apps. On Windows, Edge for Business with a managed browser profile creates the same isolation. What MAM doesn’t cover is data employees type from memory, which is where use case limits in the AI catalog and clear communication to employees fill the gap.

What happens when a new AI tool appears that isn’t in the catalog?

Ideally: an employee submits it through a defined review process with a 30-day SLA, and gets an answer before they’ve already made a decision. In practice, most organizations don’t have that process yet, and employees make individual judgments. The catalog needs a submission path before you can expect employees to use it. Without one, the catalog is a list of what’s approved, not a program for managing what isn’t.

Shadow AI governance isn’t a security program. It’s a response to a gap between what employees need and what the organization provides. The controls matter, but they’re only sustainable when the gap is actually closing. That means sanctioned tools that work, corporate data contained at the app layer so it can’t reach unauthorized AI, and a maintenance cycle that keeps the program current.

The organizations that get this right aren’t the ones with the best DLP policies. They’re the ones that made it easier to use the approved tool than to find a workaround.