
Follow the data: lineage as a prerequisite for secure and effective AI 

Vebjorn Kragebakk

Summary

Every answer an AI gives has a history. It was based on training data, assembled from files, emails, and pages that someone created, classified, and gave permission to read.

When that history is visible, AI is something you can trust and audit. When it is not, you are guessing. This post looks at data lineage as the groundwork for secure and effective AI, and at what Microsoft Purview and Microsoft 365 Copilot make possible today.

What lineage means here

Lineage is the record of where data came from and where it went. For AI it runs in two directions. Upstream sits the question of which sources fed a given answer. Downstream sits the question of where that answer travelled afterwards, and what protection it carried with it. Most teams think about lineage only for analytics pipelines. Generative AI turns it into a daily concern, because a single prompt can pull from dozens of locations and produce content that lands in a chat, a document, or another agent within seconds. 
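To make the two directions concrete, here is a minimal sketch of what a lineage record for a single AI answer could look like. The class name, field names, and example paths are hypothetical illustrations, not a Purview or Copilot schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LineageRecord:
    """Hypothetical record for one AI answer (illustrative, not a Purview schema)."""
    answer_id: str
    upstream: List[str] = field(default_factory=list)    # sources that fed the answer
    downstream: List[str] = field(default_factory=list)  # places the answer travelled to

    def is_traceable(self) -> bool:
        # An answer with no recorded source cannot be audited upstream.
        return len(self.upstream) > 0


# Example values are made up for illustration.
record = LineageRecord(
    answer_id="answer-001",
    upstream=["mail/q3-forecast", "sites/finance/budget.xlsx"],
    downstream=["chat/team-finance"],
)

# The answer is later pasted into a document: the downstream trail grows.
record.downstream.append("docs/board-summary.docx")
```

The point of the sketch is only that both lists belong to the same record: an answer whose upstream list is empty is exactly the kind of gap lineage work tries to close.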

Figure 1. A prompt is grounded in sources the user can already access; the answer returns with citations that trace back to where it came from.

What Copilot actually does with your data

Microsoft 365 Copilot does not answer from training data alone. It grounds each prompt in your tenant through Microsoft Graph, drawing on the emails, files, chats, meetings, and sites the user can already see. The Semantic Index respects the identity-based access boundary, so grounding only reaches content the user is authorized to open, and the response comes back with citations to the sources used to ground it (Microsoft, 2026a). That prompt-and-response pair is then retained as the user’s activity history. The lineage, in other words, is already being captured. The real question is whether you can see it and govern it. An organization without identity-based access boundaries risks exposing its entire data estate to whatever AI tools are connected to it, but that is a story for another day.
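The idea of grounding within an access boundary can be sketched in a few lines. This is a simplified illustration under stated assumptions, not how Microsoft Graph or the Semantic Index is implemented: the function name, the access-control dictionary, and the document names are all hypothetical.

```python
from typing import Dict, List, Set


def ground(user: str, candidates: List[str], acl: Dict[str, Set[str]]) -> List[str]:
    """Keep only candidate sources the user is already allowed to open.

    The returned list doubles as the citation list for the answer.
    Sources missing from the ACL are treated as inaccessible (an assumption).
    """
    return [doc for doc in candidates if user in acl.get(doc, set())]


# Hypothetical access-control data for illustration only.
ACL = {
    "budget.xlsx": {"alice"},
    "handbook.docx": {"alice", "bob"},
}
CANDIDATES = ["budget.xlsx", "handbook.docx", "unlisted.pdf"]

citations_for_bob = ground("bob", CANDIDATES, ACL)
citations_for_alice = ground("alice", CANDIDATES, ACL)
```

Here the same prompt would be grounded in different sources for different users, and the filtered list is what makes the answer traceable afterwards.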

Labels that travel

Permissions decide what Copilot can reach. Sensitivity labels decide what happens to it next. When a label applies encryption, Copilot returns the content only if the user holds both the VIEW and the EXTRACT usage right, and that protection is honored even when the file sits outside Microsoft 365 but is open in an Office app (Microsoft, 2026b). A labelled, encrypted document cannot be summarized by someone who lacks permission to read it. Content with no label has no protection to pass on. This is the practical heart of downstream lineage: the label is the thing that travels with information as it moves. 
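The usage-right rule described above can be written as a small check. This is a sketch of the logic as stated in the text, not Purview's implementation; the function and constant names are hypothetical, and it assumes that content without encryption is governed by permissions alone.

```python
from typing import Set

# Per the text: both rights are needed when a label applies encryption.
REQUIRED_RIGHTS = {"VIEW", "EXTRACT"}


def may_return_content(label_encrypts: bool, user_rights: Set[str]) -> bool:
    """Decide whether encrypted, labelled content may be returned to a user."""
    if not label_encrypts:
        # Assumption: without encryption, ordinary permissions decide access.
        return True
    # Subset check: every required right must be present.
    return REQUIRED_RIGHTS <= user_rights


reader_only = may_return_content(True, {"VIEW"})
full_rights = may_return_content(True, {"VIEW", "EXTRACT"})
```

A user who can view a file but lacks the extract right is refused, which is why a well-designed label taxonomy matters as much as the permissions underneath it.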

Figure 2. A sensitivity label travels with content from source to stored interaction. Unlabelled content has nothing to inherit.

One map for every agent

Visibility has to cover more than Copilot. Purview builds a central map of the data estate through automated scanning, metadata extraction, and lineage mapping in its Data Map and Unified Catalog (EPC Group, 2026). On top of that, Data Security Posture Management for AI (DSPM for AI) gives one place to discover AI usage, see which prompts touch sensitive data, assess risk, and apply one-click policies. The scope is deliberately wide: first-party Copilots and Copilot Studio, enterprise apps such as ChatGPT Enterprise and Anthropic Claude (Enterprise), and consumer tools picked up through browser activity (Microsoft, 2026b). One control plane, every agent. 

Where the trail breaks

The hard part is not the governed path. It is everything running beside it. Staff paste customer data into a chatbot in a browser tab, and nothing records that it happened. Purview narrows the gap by bringing AI interactions into the unified audit log for investigation and eDiscovery, applying data loss prevention to prompts and responses in real time, and setting retention on AI interactions (Microsoft, 2026b). Shadow AI is the part of the map left blank. Lineage is the discipline of filling it in. 
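As a rough illustration of what checking a prompt for sensitive data means, the sketch below flags prompts that appear to contain a card number or an email address. It is a toy example with hypothetical names and deliberately simple patterns; real data loss prevention relies on far richer classifiers, and nothing here reflects how Purview detects sensitive information.

```python
import re
from typing import List

# Deliberately simple, illustrative patterns (assumptions, not production rules).
PATTERNS = {
    "card_number": re.compile(r"\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b"),
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
}


def check_prompt(prompt: str) -> List[str]:
    """Return the names of the sensitive-data patterns found in a prompt."""
    return [name for name, pattern in PATTERNS.items() if pattern.search(prompt)]


clean = check_prompt("Summarise the Q3 plan")
flagged = check_prompt("Refund card 4111 1111 1111 1111 for jane@example.com")
```

A governed tool can run a check like this and log the result; a chatbot in an unmanaged browser tab does neither, which is the gap the paragraph above describes.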

Figure 3. Governed tools keep a connected provenance trail; ungoverned shadow AI leaves a gap that lineage work exists to close.

Lineage and the rules that are coming

This is not only good practice. When the bulk of the EU AI Act applies in August 2026, providers and deployers of higher-risk systems will need to show data governance, documentation, and logging. Purview’s lineage supports the data-governance expectations of Article 10, its catalog supports the documentation expected under Article 11, and its audit logging supports Article 12, with Compliance Manager carrying a dedicated EU AI Act assessment template (EPC Group, 2026). 

Effective, not only secure

Trustworthy lineage does more than reduce risk. It makes AI worth using. When a marketing team at Microsoft moved onto the Unified Catalog, the payoff was that people could trust they were working from current, authorised information rather than stale or unauthorised copies (Microsoft, 2026c). Citations a reviewer can follow, labels a system can enforce, a usage map a security team can read. That is the line between an AI pilot people quietly distrust and one they actually adopt. 

How Infotechtion can help

Infotechtion helps organizations build the lineage foundation before scaling AI. We map your data estate, configure Microsoft Purview and DSPM for AI for your Copilot and agent scenarios, design a sensitivity-label taxonomy that travels with your content, and surface shadow AI before it turns into a liability. We also prepare your reporting and documentation for the AI Act and align your teams on a shared process for moving AI from pilot to production. If you want AI your people can trust and your auditors can follow, we can help you put the groundwork in place. Reach us at contact@infotechtion.com. 

Sources

EPC Group. (2026). Microsoft Purview data governance guide 2026. https://www.epcgroup.net/blog/microsoft-purview-data-governance-guide

Microsoft. (2026a). Data, privacy, and security for Microsoft 365 Copilot. Microsoft Learn. https://learn.microsoft.com/en-us/microsoft-365/copilot/microsoft-365-copilot-privacy 

Microsoft. (2026b). Microsoft Purview data security and compliance protections for Microsoft 365 Copilot and other generative AI apps. Microsoft Learn. https://learn.microsoft.com/en-us/purview/ai-microsoft-purview 

Microsoft. (2026c). Powering data governance at Microsoft with Purview Unified Catalog. Inside Track Blog. https://www.microsoft.com/insidetrack/blog/powering-data-governance-at-microsoft-with-purview-unified-catalog/ 

Microsoft. (n.d.). What information does Copilot use to answer my prompt? Microsoft Support. Retrieved June 9, 2026, from https://support.microsoft.com/en-us/topic/what-information-does-copilot-use-to-answer-my-prompt-934f537d-ff7d-4059-9fec-a751e4651307 

© 2025 Infotechtion. All rights reserved
