Making RAG Respect Permissions

Most enterprise RAG failures are not model failures. They are retrieval failures. The assistant answers a question with a chunk that is semantically relevant but not actually authorized for the user who asked it. From the outside, this looks like a normal answer. Under the hood, it is an access-control bug.

The fix is not to ask the model to "be careful." The fix is to make authorization part of retrieval itself. If unauthorized chunks never become retrieval candidates, they never become prompt context, and the model never gets an opportunity to leak them.

Why naive RAG breaks in enterprise environments

A naive pipeline treats the corpus as one shared search space. It chunks documents, stores vectors, retrieves top-k by similarity, and sends results into prompt assembly. That pattern works for public content and fails for internal systems with differentiated access controls.

Enterprise data is usually governed by NTFS ACLs, inherited folder permissions, explicit deny rules, and group-based identity in Active Directory (AD) or an IAM system. A search path that does not account for those controls is incomplete. The issue is not whether the answer looks good. The issue is whether every chunk in the retrieval candidate set was authorized for the user who asked.

The core architecture change

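ACL-aware RAG adds a policy envelope to each chunk during indexing, then enforces that envelope at query time inside the vector search request. The indexing path and query path must share the same principal normalization logic. Otherwise a group stored in one form at indexing time may fail to match the same group presented in a different form at query time (different casing, or SID versus name), and the filter silently returns the wrong set of chunks.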
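One way to keep the two paths aligned is to route every principal through a single function that both the indexer and the query service import. The sketch below is illustrative only: the names `normalize_principal` and `normalize_principals`, and the canonical form chosen here (case-folded `DOMAIN\name`), are assumptions rather than something this architecture prescribes. A real deployment might canonicalize to SIDs instead.

```python
def normalize_principal(raw: str) -> str:
    """Return one canonical spelling for a DOMAIN\\name principal.

    Assumption: principals arrive as DOMAIN\\name and compare
    case-insensitively, so the canonical form is fully case-folded.
    """
    cleaned = raw.strip()
    domain, sep, name = cleaned.partition("\\")
    if not sep or not domain.strip() or not name.strip():
        raise ValueError(f"expected DOMAIN\\name, got {raw!r}")
    return f"{domain.strip().casefold()}\\{name.strip().casefold()}"


def normalize_principals(raw_principals) -> list[str]:
    """Normalize, de-duplicate, and sort a collection of principals."""
    return sorted({normalize_principal(p) for p in raw_principals})
```

Because the indexer and the query service would call the same function, a group written as `CORP\Finance_Read` in a security descriptor and `corp\finance_read` in a token resolve to the same stored value.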

At indexing time, the system reads document content and security descriptors, normalizes principal identifiers, and attaches allow and deny metadata to every chunk generated from that document. At query time, the system resolves effective user identity, including nested memberships, then applies allow and deny filters directly in retrieval.
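The two halves described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: `ChunkPayload`, `attach_envelope`, and `effective_groups` are hypothetical names, group identifiers are assumed to be normalized already, and the `member_of` mapping stands in for whatever a directory lookup would return.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkPayload:
    document_id: str
    text: str
    allowed_groups: tuple[str, ...]
    denied_groups: tuple[str, ...]
    acl_version: str


def attach_envelope(document_id, chunk_texts, allowed, denied, acl_version):
    """Indexing side: copy the document's allow/deny lists onto every chunk."""
    allowed_t, denied_t = tuple(sorted(allowed)), tuple(sorted(denied))
    return [
        ChunkPayload(document_id, text, allowed_t, denied_t, acl_version)
        for text in chunk_texts
    ]


def effective_groups(direct_groups, member_of):
    """Query side: expand direct memberships to include every ancestor group.

    member_of maps a group to the groups it is itself a member of.
    The visited set keeps the walk safe if the directory data has a cycle.
    """
    seen = set()
    stack = list(direct_groups)
    while stack:
        group = stack.pop()
        if group in seen:
            continue
        seen.add(group)
        stack.extend(member_of.get(group, ()))
    return seen
```

The set returned by `effective_groups` is what the allow and deny filters should be evaluated against. Filtering on direct memberships alone would miss access granted, or denied, through a parent group.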

Metadata model and retrieval predicate

A minimal payload shape can look like this:

{
  "document_id": "doc-01a7",
  "path": "\\\\corp-fs01\\Shared\\Confidential\\Quarterly_Plan.docx",
  "allowed_groups": ["CORP\\Finance_Read", "CORP\\Leadership_Read"],
  "denied_groups": ["CORP\\External_Contractors"],
  "acl_version": "2026-03-20T00:00:00Z"
}

The associated retrieval filter must be part of the vector search call, not an application cleanup step:

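from qdrant_client.models import FieldCondition, Filter, MatchAny  # assumes the Qdrant Python client

acl_filter = Filter(
    # Keep only chunks that grant access to at least one of the user's effective groups.
    must=[FieldCondition(key="allowed_groups", match=MatchAny(any=list(user_groups)))],
    # Exclude any chunk that explicitly denies one of the user's effective groups.
    must_not=[FieldCondition(key="denied_groups", match=MatchAny(any=list(user_groups)))],
)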

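This is the security boundary. If your system fetches unrestricted top-k first and filters later, unauthorized chunks have already been loaded into application memory, where a logging call, a cache, or a code path that skips the filter can expose them. Post-filtering can also leave the user with fewer than k results, because unauthorized chunks consumed slots in the top-k.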
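The difference between the two orderings is easy to see on a toy in-memory index. The sketch below uses only the standard library and made-up vectors; `search_filtered` and `search_then_filter` are hypothetical names, and the brute-force cosine ranking is a stand-in for a real vector store, not a description of how any particular engine works.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def is_authorized(payload, user_groups):
    allowed = bool(user_groups & set(payload["allowed_groups"]))
    denied = bool(user_groups & set(payload["denied_groups"]))
    return allowed and not denied


def search_filtered(index, query, user_groups, k):
    """Authorization narrows the candidate set before ranking."""
    candidates = [c for c in index if is_authorized(c["payload"], user_groups)]
    candidates.sort(key=lambda c: cosine(c["vector"], query), reverse=True)
    return candidates[:k]


def search_then_filter(index, query, user_groups, k):
    """Anti-pattern: rank everything, take top-k, then clean up."""
    ranked = sorted(index, key=lambda c: cosine(c["vector"], query), reverse=True)
    return [c for c in ranked[:k] if is_authorized(c["payload"], user_groups)]


INDEX = [
    {"id": "A", "vector": [1.0, 0.0],
     "payload": {"allowed_groups": ["CORP\\Leadership_Read"], "denied_groups": []}},
    {"id": "B", "vector": [0.9, 0.1],
     "payload": {"allowed_groups": ["CORP\\Finance_Read"], "denied_groups": []}},
    {"id": "C", "vector": [0.5, 0.5],
     "payload": {"allowed_groups": ["CORP\\Finance_Read"], "denied_groups": []}},
]

user = {"CORP\\Finance_Read"}
print([c["id"] for c in search_filtered(INDEX, [1.0, 0.0], user, k=2)])     # ['B', 'C']
print([c["id"] for c in search_then_filter(INDEX, [1.0, 0.0], user, k=2)])  # ['B']
```

In the second call, chunk A was the best match, was fetched into the application, and was then discarded, so the user received one result instead of two and the unauthorized chunk still crossed the retrieval boundary.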

Design tradeoffs in index strategy

The right storage strategy depends on scale and policy requirements. In most organizations, one shared index with strong ACL filtering gives the best operational balance.

Strategy	Isolation	Operational Cost	Typical fit
Per-user indexes	Very high	Very high	Narrow workloads with strict isolation mandates
Shared index with ACL filter	High	Moderate	Most internal assistants and enterprise search workloads
Partitioned indexes plus ACL filter	Medium to high	Moderate	Large orgs with clear domain boundaries

Per-user indexes are easy to reason about from an isolation standpoint, but they multiply storage, indexing, and reprocessing costs quickly. Shared indexes are usually practical as long as policy enforcement is robust and auditable.

The operational reality

Most incidents in ACL-aware systems come from drift rather than initial code defects. Directory memberships change, file permissions shift, documents move, and stale vectors remain unless invalidation is reliable. If ACL metadata refresh is slow, retrieval behavior lags behind actual policy: a document whose access was revoked at the source stays retrievable under its old permissions until its chunks are refreshed.
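A simple drift check compares the ACL version stored with each chunk against the version currently reported by the source. The following is a sketch under assumptions: `find_stale_chunks` is a hypothetical helper, the `acl_version` field is treated as an opaque string compared for equality, and a document missing from the source snapshot is assumed to have been moved or deleted.

```python
def find_stale_chunks(indexed_chunks, current_acl_versions):
    """Return chunk ids that should be re-indexed or deleted.

    indexed_chunks: iterable of dicts with chunk_id, document_id, acl_version.
    current_acl_versions: document_id -> ACL version currently at the source;
    a missing key means the document was moved or deleted.
    """
    reindex, delete = [], []
    for chunk in indexed_chunks:
        current = current_acl_versions.get(chunk["document_id"])
        if current is None:
            delete.append(chunk["chunk_id"])
        elif current != chunk["acl_version"]:
            reindex.append(chunk["chunk_id"])
    return {"reindex": reindex, "delete": delete}
```

How often a check like this runs sets the upper bound on how long retrieval can disagree with source policy, so the interval is a security parameter rather than a tuning detail.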

The practical answer is disciplined operations: change-aware re-indexing, source move and delete invalidation, nested-group resolution tests, and retrieval decision logging that can explain why a chunk was included. Those controls are less glamorous than prompt engineering, but they are what keeps the system trustworthy.
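Retrieval decision logging can be as small as one structured record per candidate chunk. The sketch below is an assumption about what such a record might contain, not a prescribed schema: `explain_decision` and its field names are hypothetical, and the payload keys mirror the metadata shape shown earlier.

```python
import json
from datetime import datetime, timezone


def explain_decision(chunk_id, payload, user_id, user_groups, decided_at=None):
    """Build an audit record stating which groups caused inclusion or exclusion."""
    matched_allow = sorted(user_groups & set(payload["allowed_groups"]))
    matched_deny = sorted(user_groups & set(payload["denied_groups"]))
    return {
        "chunk_id": chunk_id,
        "user_id": user_id,
        "included": bool(matched_allow) and not matched_deny,
        "matched_allow": matched_allow,
        "matched_deny": matched_deny,
        "acl_version": payload.get("acl_version"),
        "decided_at": decided_at or datetime.now(timezone.utc).isoformat(),
    }


record = explain_decision(
    "doc-01a7#0",
    {
        "allowed_groups": ["CORP\\Finance_Read", "CORP\\Leadership_Read"],
        "denied_groups": ["CORP\\External_Contractors"],
        "acl_version": "2026-03-20T00:00:00Z",
    },
    user_id="jdoe",  # hypothetical user
    user_groups={"CORP\\Finance_Read"},
)
print(json.dumps(record))
```

Recording `acl_version` alongside the matched groups lets an auditor tell whether a questionable inclusion came from a wrong policy or from a stale copy of a correct one.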

Final note

ACL-aware RAG is fundamentally a systems problem. When permissions are captured at indexing time, resolved correctly at query time, and enforced inside retrieval, the model layer becomes much easier to trust. When any of those pieces are missing, relevance quality can hide authorization defects until they surface in production.