How to Verify What an AI System Tells You

The most common concern practitioners raise about AI is accuracy. What happens when it is wrong? Can it be trusted?

These are the right questions. The answer a well-designed system gives is not "trust me." The answer is: here is the source. Verify it yourself.

How general-purpose AI systems fail

Large language models generate text by predicting probable continuations. When a user poses a question, the model produces a response that is statistically consistent with its training data. The result is fluent, confident prose whether or not the underlying claims are accurate.

This behavior, called confabulation, is not a defect that future releases will eliminate. It is a characteristic of how these systems work. A model trained on legal briefs will produce text that reads like a legal brief. A model trained on pharmaceutical SOPs will produce text that reads like an SOP. The confidence of the output is independent of its accuracy.

For casual use, confabulation is a nuisance. For professional use, it is a liability. An attorney who relies on a cited case that does not exist, a physician who acts on a fabricated summary of a patient chart, or a quality manager who relies on a fabricated deviation record faces a problem that cannot be blamed on the software. The professional, not the tool, answers for the outcome.

What citation-grounded retrieval changes

A different architecture produces different behavior.

In a document-grounded system, the generation step is constrained. When a user submits a query, the system does not treat its training data as the source of facts. It retrieves relevant passages from the practice's own documents, then uses those passages as the evidence from which an answer is composed.

Every claim in the response can be traced back to a specific passage in a specific document. The system presents those citations alongside the answer. The user can follow any citation to the source material and read the original passage in context.
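
The retrieve-then-cite flow can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the Passage record, the function names, the sample document identifiers, and above all the scoring, which is a toy word-overlap count standing in for whatever relevance model a real system uses. The point is only the shape of the output: each returned statement carries the document and location it came from.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    doc_id: str    # hypothetical document identifier
    location: str  # hypothetical locator, such as a page or section
    text: str


def tokens(text: str) -> set[str]:
    """Lowercased word set; a toy stand-in for a real relevance model."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, passages: list[Passage], top_k: int = 2) -> list[Passage]:
    """Return up to top_k passages that share at least one word with the query."""
    q = tokens(query)
    scored = [(len(q & tokens(p.text)), p) for p in passages]
    scored = [(score, p) for score, p in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in scored[:top_k]]


def answer_with_citations(query: str, passages: list[Passage]) -> list[dict]:
    """Pair every returned passage with the source it was taken from."""
    return [
        {"quote": p.text, "source": f"{p.doc_id}, {p.location}"}
        for p in retrieve(query, passages)
    ]


# Illustrative, made-up documents.
library = [
    Passage("SOP-014", "section 3.2",
            "Deviations must be reported to quality assurance within 24 hours."),
    Passage("SOP-020", "section 1.1",
            "Equipment calibration is performed quarterly."),
]

for item in answer_with_citations("When must deviations be reported?", library):
    print(item["source"], "->", item["quote"])
```

In this sketch the answer is the retrieved passage itself. A production system would typically compose prose from the passages, but the citation would still point to the same document and location.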

That traceability is not a feature added for comfort. It is the mechanism that makes the system usable by professionals with accountability obligations.

What the refusal behavior looks like

A well-configured system declines to answer when it cannot find adequate grounding in the available documents. It says so plainly rather than fabricating a response from training data.
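
One way to picture the refusal rule is as a threshold on how well the best passage supports the query. The sketch below is hedged accordingly: the support measure (share of query words found in the passage), the MIN_OVERLAP value, the function names, and the sample citation label are all assumptions chosen for illustration, not a description of any particular product.

```python
import re

REFUSAL = "I do not find adequate support for that in your documents."
MIN_OVERLAP = 0.5  # assumed threshold; a real system would tune this


def words(text):
    """Lowercased word set used by the toy support measure."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def support_score(query, passage):
    """Fraction of query words that also appear in the passage (toy measure)."""
    q = words(query)
    return len(q & words(passage)) / len(q) if q else 0.0


def answer_or_decline(query, passages):
    """passages maps a hypothetical citation label to passage text."""
    best = max(
        passages.items(),
        key=lambda item: support_score(query, item[1]),
        default=None,
    )
    if best is None or support_score(query, best[1]) < MIN_OVERLAP:
        # No passage supports the query well enough: decline, cite nothing.
        return {"answered": False, "message": REFUSAL, "citations": []}
    label, text = best
    return {"answered": True, "message": text, "citations": [label]}


# Illustrative, made-up document.
docs = {
    "SOP-014, section 3.2":
        "Deviations must be reported to quality assurance within 24 hours.",
}

print(answer_or_decline("When must deviations be reported?", docs))
print(answer_or_decline("What is the retention period for batch records?", docs))
```

The second query shares no words with the only available passage, so the sketch returns the refusal message with an empty citation list instead of an answer.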

This behavior is counterintuitive to users accustomed to systems that always produce an answer. A system that says "I do not find adequate support for that in your documents" may feel less capable than one that returns confident prose. For professional use, the opposite is true.

The refusal is information. It tells the professional that this question requires a different source, a human expert, or additional research. That is a useful output. A fabricated answer that sounds correct is not.

The judgment question

A document-grounded system reframes what kind of tool this is in a professional setting. It is a retrieval and summarization instrument, not an authority.

The professional reads the answer. The professional follows the citation to the source document. The professional decides whether the source actually supports what the system reported, and whether that conclusion applies to the matter at hand. The presence of the system in that sequence does not change who is responsible for the decision, whether that person is an attorney, a physician, an accountant, or a compliance officer in a regulated manufacturing environment.
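
Part of following a citation can be assisted mechanically. The sketch below checks only whether a quoted passage appears verbatim in the source text, ignoring case and line breaks. The function names are hypothetical, and the check is a first pass at most: it cannot tell whether the passage, read in context, supports the conclusion drawn from it. That judgment stays with the professional.

```python
import re


def normalize(text):
    """Collapse whitespace and lowercase so line breaks do not cause false mismatches."""
    return re.sub(r"\s+", " ", text).strip().lower()


def quote_appears_in_source(quote, source_text):
    """True only if the non-empty quote occurs verbatim in the source text."""
    return bool(normalize(quote)) and normalize(quote) in normalize(source_text)


# Illustrative, made-up source text with a line break inside the quoted span.
source = "Deviations must be reported\nto quality assurance within 24 hours."

print(quote_appears_in_source("reported to quality assurance", source))
print(quote_appears_in_source("within 48 hours", source))
```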

What changes is the speed at which relevant material surfaces. A question that previously required a manual search through 200 case files returns cited passages in seconds. The professional still reads the source. The professional still exercises judgment. The search step is faster and more thorough.

Logs and the audit record

A system operating on professional documents should maintain a record of what was asked and what was returned. This is not primarily about accountability, though accountability matters. It is about the ability to reconstruct how a piece of information entered the picture.

If a professional later needs to trace how a conclusion was reached, the log is the record: what was queried, which documents were retrieved, what the system returned. That record carries the same professional value as a note in a file or a notation in a chart.
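
A log of this kind can be as simple as an append-only file with one record per query. The sketch below assumes a JSON Lines file and a hypothetical record layout (timestamp, user, query, retrieved document identifiers, response). The field names, file name, and sample values are illustrative only. A real deployment would also need access controls and retention rules, which are outside this sketch.

```python
import json
from datetime import datetime, timezone


def make_log_entry(user, query, retrieved, response):
    """Build one record: what was queried, which documents were retrieved, what was returned."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "query": query,
        "retrieved": list(retrieved),
        "response": response,
    }


def append_entry(path, entry):
    """Append the entry as one JSON line; earlier lines are never rewritten."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry, ensure_ascii=False) + "\n")


def read_entries(path):
    """Load every record back in the order it was written."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


if __name__ == "__main__":
    # Illustrative, made-up values.
    entry = make_log_entry(
        user="jdoe",
        query="When must deviations be reported?",
        retrieved=["SOP-014, section 3.2"],
        response="Deviations must be reported to quality assurance within 24 hours.",
    )
    append_entry("audit_log.jsonl", entry)
    for record in read_entries("audit_log.jsonl"):
        print(record["timestamp"], record["query"], record["retrieved"])
```

Appending rather than overwriting keeps the record in the order events occurred, which is what makes it possible to reconstruct later how a piece of information entered the picture.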

The standard

A private AI system handling professional documents should produce a citation with every answer, decline when it cannot find adequate grounding in the available documents, maintain a log of what it does, and leave the professional in the position of deciding what to do with the information it returns.

Those are not ambitious requirements. They are the minimum standard for a tool that handles professional documents.

If a system cannot meet that standard, it should not be used for professional work. If it can, the professional's task is not to trust it. The task is to verify it. The system exists to make that verification fast.