Why We Publish Our Real Scores — Even When They Are Not Perfect
TraceGov's production TRACE scores are 60-67%. We publish them because in regulated AI, trust starts with telling the truth about your own system.
Our production TRACE scores are 60-67%. Not 95%. Not "industry-leading." Sixty to sixty-seven percent.
We publish them anyway.
The Problem with AI Compliance Claims
Most AI governance vendors claim high accuracy. Few publish production numbers. Fewer still explain the gap between laboratory benchmarks and real-world performance.
This matters because the EU AI Act — specifically Article 13 — requires transparency about AI system capabilities and limitations. Not just for end users. For deployers, regulators, and affected persons.
If a vendor claims 95% compliance confidence but can't show production evidence, that claim is untestable. And untestable claims fail audits.
What 60-67% Actually Means
TRACE scores every AI response across five dimensions:
- T — Transparency: Does the response disclose its reasoning and limitations?
- R — Reasoning: Is the logic structured and evidence-based?
- A — Auditability: Can an auditor independently verify this response?
- C — Compliance: Does it align with applicable regulatory frameworks?
- E — Explainability: Can an affected person understand the decision?
A score of 60-67% means the system demonstrates moderate governance maturity. It scores well on reasoning and auditability — because the underlying knowledge graph and retrieval pipeline produce structured, source-referenced responses. It scores lower on transparency disclosure and framework-specific compliance language.
That's honest. That's measurable. And that's improvable.
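As a concrete illustration, the five-dimension structure above can be expressed as a deterministic aggregate. This is a hypothetical sketch, not TraceGov's actual formula: the dataclass, the equal weighting, and the example values are all assumptions chosen to match the score profile described in the article.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TraceDimensions:
    """Hypothetical container for the five TRACE dimensions, each in [0, 1]."""
    transparency: float    # T: does the response disclose reasoning and limitations?
    reasoning: float       # R: is the logic structured and evidence-based?
    auditability: float    # A: can an auditor independently verify it?
    compliance: float      # C: does it align with applicable frameworks?
    explainability: float  # E: can an affected person understand the decision?

def trace_score(d: TraceDimensions) -> float:
    """Equal-weight mean of the five dimensions, as a percentage.

    Deterministic: the same inputs always produce the same score.
    (Equal weighting is an assumption for illustration.)
    """
    dims = (d.transparency, d.reasoning, d.auditability,
            d.compliance, d.explainability)
    if not all(0.0 <= v <= 1.0 for v in dims):
        raise ValueError("dimension scores must be in [0, 1]")
    return round(100 * sum(dims) / len(dims), 1)

# Illustrative profile matching the article: strong R and A,
# weaker transparency disclosure and compliance language.
score = trace_score(TraceDimensions(
    transparency=0.55, reasoning=0.75, auditability=0.72,
    compliance=0.52, explainability=0.61,
))
```

With these assumed inputs the aggregate lands in the low 60s, which is the kind of profile a 60-67% production range describes.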
Why Honest Numbers Build Trust
Three reasons publishing real scores works better than inflated claims:
1. Regulators respect transparency over perfection.
Article 13(1) of the EU AI Act requires deployers to understand AI system capabilities "including the level of accuracy, robustness, and cybersecurity." A deployer who says "our system scores 63% on transparency, and here is the evidence" is in a stronger regulatory position than one who says "our vendor assured us it's compliant."
2. Gap attribution tells you WHERE to improve.
A single score is a grade. Five dimension scores with gap attribution — showing exactly why each dimension isn't 100% — is a diagnostic tool. TraceGov tells you whether the gap comes from Source Coverage (SCG), Prior Knowledge Confidence (PKC), Depth Limitation (DLT), or Assertion Density (ADG).
This is the difference between "you got a B-" and "you lost 12 points on source diversity and 8 points on assertion density — here are the specific chunks that would close the gap."
3. Production numbers create a baseline for improvement.
Tracking 60% → 65% → 72% over three months tells a Board something meaningful: governance is improving. Claiming 95% from day one tells them nothing — because there's nowhere to go.
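The gap attribution described in reason 2 can be sketched in a few lines: the distance to 100% is decomposed into named drivers, and the decomposition is checked for completeness. The driver labels follow the article; the `attribute_gap` helper and the point values are hypothetical illustrations, not real production data.

```python
def attribute_gap(score: float, drivers: dict[str, float]) -> dict[str, float]:
    """Verify that the named gap drivers fully account for the
    distance between the score and 100%. Returns the drivers if so."""
    gap = round(100.0 - score, 1)
    explained = round(sum(drivers.values()), 1)
    if explained != gap:
        raise ValueError(f"drivers explain {explained} points, but gap is {gap}")
    return drivers

# Hypothetical breakdown of a 63% score into the four driver
# categories named in the article.
report = attribute_gap(63.0, {
    "SCG (Source Coverage)": 12.0,            # too few distinct source chunks
    "PKC (Prior Knowledge Confidence)": 9.0,  # reliance on unverified priors
    "DLT (Depth Limitation)": 8.0,            # retrieval depth cut short
    "ADG (Assertion Density)": 8.0,           # claims outpacing evidence
})
```

The point is the structure, not the numbers: a decomposed gap tells you which lever to pull, where a single grade does not.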
The Cost of Compliance Theater
Compliance theater means investing in the appearance of compliance without building actual governance capability. The risk is not abstract:
- EUR 7.1 billion in cumulative GDPR fines demonstrate that EU regulators enforce meaningfully.
- The EU AI Act introduces fines up to EUR 35 million or 7% of global turnover (Article 99).
- Enforcement begins August 2, 2026. That is 150 days from today.
A governance tool that claims 95% and can't prove it is worse than no tool at all — because it creates false confidence. The organisation stops looking for gaps. The regulator finds them anyway.
How TRACE Scoring Works
TRACE uses formula-based deterministic scoring, not machine learning classification. This is a deliberate design choice:
- Article 13 transparency: Every score is reproducible. Given the same inputs, you get the same score. No black-box model weights.
- Article 14 human oversight: A compliance officer can inspect any dimension, understand the formula, and override if needed.
- Auditability: Every score comes with the parameters that produced it — source count, quality ratios, claim estimates, coverage percentages.
The formula is transparent. The evidence is hash-verified. The audit trail is immutable.
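The three properties above (reproducible formula, inspectable parameters, hash-verified evidence) can be sketched together. This is a minimal illustration under stated assumptions: the parameter names and the two-term formula are made up, and SHA-256 over canonical JSON is one common way to make scoring inputs verifiable, not necessarily TraceGov's mechanism.

```python
import hashlib
import json

def score_from_parameters(params: dict[str, int]) -> float:
    """A purely deterministic, inspectable formula: same parameters,
    same score, no model weights. (Formula and weights are assumptions.)"""
    coverage = min(params["chunks_used"] / params["chunks_needed"], 1.0)
    quality = params["high_quality_sources"] / max(params["source_count"], 1)
    return round(100 * (0.5 * coverage + 0.5 * quality), 1)

def evidence_hash(params: dict[str, int]) -> str:
    """Canonical SHA-256 hash of the scoring inputs, so an auditor can
    check that a published score matches the recorded evidence."""
    canonical = json.dumps(params, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()

# Illustrative parameters of the kind the article mentions:
# source counts, quality ratios, coverage.
params = {"chunks_used": 8, "chunks_needed": 16,
          "source_count": 5, "high_quality_sources": 4}
score = score_from_parameters(params)
digest = evidence_hash(params)

# Re-running with identical inputs reproduces both values exactly,
# which is what makes the score auditable rather than a black box.
assert score == score_from_parameters(params)
assert digest == evidence_hash(params)
```

A compliance officer can read the formula, recompute the score by hand, and confirm the evidence hash, which is the oversight posture Articles 13 and 14 ask for.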
What We Are Doing About 60-67%
Publishing honest numbers is not the same as accepting them. Here is our improvement roadmap:
- Expand governance framework coverage. More frameworks in the Governance Library means more compliance-specific evidence per response.
- Improve source diversity. The primary gap driver is Source Coverage — responses often draw from 6-8 chunks when 15-20 would improve coverage ratios.
- Lean further on adaptive retrieval. TAMR+ multi-phase retrieval dynamically adjusts the number of evidence chunks based on query complexity, and production data shows this consistently improves Transparency and Auditability scores.
Our target: 75%+ average TRACE score by Q3 2026. We will publish the updated numbers when we get there.
For Deployers Evaluating Governance Tools
Three questions to ask any AI governance vendor:
1. "What are your production scores?" If they can't answer with specific numbers from real deployments — not benchmarks, not demos — the tool hasn't been tested under production conditions.
2. "Can I see the formula?" If the scoring is a black box, it fails Article 13 transparency. You are responsible for understanding how the governance tool works, not just that it produces a number.
3. "What happens when the score is low?" A tool that only tells you "compliant" or "not compliant" is a checkbox. A tool that tells you why the score is 63% and which specific evidence would raise it is a governance system.
TraceGov answers all three. Starting with the honest number: 60-67%.
TraceGov scores every AI response across five governance dimensions. Start with the Explorer tier — free, EU-hosted, no credit card required. Start Tracing — Free →