"Intended Purpose" vs. "Effect" Under the EU AI Act
You documented the purpose. But the regulation often asks about the effect.
It’s 6pm on a Thursday and you should go home.
But you don’t. You’re looking at your presentation for tomorrow’s board meeting and — for the first time in months — you’re actually proud. Not the exhausted kind of proud where you survived something. The real kind. The kind where you built something right.
Your company is deploying its first high-risk AI system. Candidate screening — CV analysis, applicant ranking, shortlisting for the hiring managers. It’s Annex III, Point 4 under the EU AI Act. You knew from day one it was high-risk. And you did everything the regulation asks.
Risk management system — built, iterative, documented. Human oversight — two senior recruiters trained, with override authority. Technical documentation reviewed. Data governance assessed. Fundamental rights impact assessment — done. The provider’s instructions for use — read, annotated, cross-referenced against your deployment context. AI literacy training — rolled out to every hiring manager who touches the system.
You did this. You and the team. Six months of work. And tomorrow you get to stand in front of the board and say: we’re ready. We’re compliant. This is what good looks like.
You should go home. But the presentation is tomorrow and you want to be sharp — so you pull up the AI Act one more time. Not to build anything. Just to flip through, mark a few notes for potential board questions. A confidence pass.
You’re skimming. Recitals, mostly — the interpretive context you might need if someone asks a “but what does that actually mean” question. And then your eyes land on Recital 29.
You’ve read it before. You must have. But this time — maybe because you’re not building anything, just reading — a sentence catches you in a way it didn’t before.
“It is not necessary for the provider or the deployer to have the intention to cause significant harm, provided that such harm results from the manipulative or exploitative AI-enabled practices.”
You stop scrolling.
It is not necessary to have the intention.
You look at your presentation. Slide 4 is titled “Compliance Architecture”. Every bullet describes what the system is for. Its intended purpose. Its documented design. The governance built around the use case as the provider defined it.
You know what the system is supposed to do. You documented it thoroughly. But now the question: how do you know its real-world effect matches that purpose? How are you measuring what actually happens to candidates once the system processes them? How would you catch the unexpected: the drift, the bias that emerges only in your specific context, the effect on people that nobody designed for but that shows up anyway after three months of real data, real applicants, real hiring managers learning which outputs to trust?
Your governance covers the intended purpose. But the regulation — in the provisions that carry actual consequences — asks about effect. And you have nothing that tracks it. No metric. No monitoring. No evidence that what the system does to people is what your documentation says it should do.
You’re not going home at 6pm.
The Two Questions the AI Act Asks
The EU AI Act is built on a concept called “intended purpose”. Article 3(12) defines it — the use for which an AI system is intended by the provider, including the specific context and conditions of use, as specified in the instructions for use, promotional or sales materials and statements, and technical documentation.
Intended purpose is the foundation of everything. Risk classification flows from it. Documentation is structured around it. Testing is scoped to it. The entire compliance architecture of the AI Act assumes a world where a provider says what the system is for, a deployer uses it accordingly, and obligations attach based on that stated purpose.
This is the comfortable part of compliance. You control it. The provider defines it. You document around it. It’s the legal perimeter you draw yourself.
But the AI Act has a second mode. One that shows up and says something different:
We don’t care what you intended. Show us what the system does.
The first mode gives you governance. The second mode creates liability.
Where the Act Says: Effect Governs
The shift from purpose to effect isn’t buried in one obscure recital. It runs through the entire regulation — from the prohibitions to the risk management system to the human oversight requirements to the incident reporting obligations. Here are the provisions that matter most.
The prohibited practices: “with the objective, or the effect of”
Article 5(1)(a) — the prohibition on subliminal and manipulative techniques — uses language that you don’t want to skim past too quickly:
The prohibition covers AI systems deploying manipulative or deceptive techniques “with the objective, or the effect of materially distorting the behavior of a person or a group of persons by appreciably impairing their ability to make an informed decision, thereby causing or being reasonably likely to cause that person, another person or group of persons significant harm.”
That “or the effect of” is doing critical work. But note what follows it — the harm threshold. The distortion of behavior must cause or be reasonably likely to cause significant harm. Both elements matter: the effect-based trigger (you don’t need to intend the manipulation) and the cumulative condition (the resulting harm must be significant). If your AI system produces both — distorting behavior that causes significant harm — you’re in violation regardless of what you built it for.
Article 5(1)(b) uses the same construction — exploitation of vulnerabilities due to age, disability, or social/economic situation. Same language: “with the objective or the effect of materially distorting the behavior”.
And then Recital 29 removes any remaining ambiguity:
“It is not necessary for the provider or the deployer to have the intention to cause significant harm, provided that such harm results from the manipulative or exploitative AI-enabled practices.”
Read that carefully. Intent is explicitly irrelevant. Your governance documents, your stated purpose, your carefully crafted instructions for use, none of it matters if the system’s actual effect crosses the line.
A recruitment AI that wasn’t designed to exploit anyone but in practice pushes candidates toward accepting unfavorable contract terms by presenting information in a way that impairs informed decision-making? That’s caught. Not because you intended it. Because of what it does.
The practices where purpose is entirely irrelevant
Some Article 5 prohibitions don’t engage with purpose at all.
Article 5(1)(f) prohibits AI systems that infer emotions in workplaces and education institutions, except where the use is intended for medical or safety reasons. It doesn’t matter what the system is “intended” for — wellbeing monitoring, engagement measurement, productivity tracking. The practice itself is prohibited. Purpose cannot save you. (Unless your use falls within the narrow medical/safety carve-out — stress detection as part of occupational health monitoring prescribed by a physician, for example. That exception exists. It’s narrow. And if you’re relying on it, you’d better be able to prove it.)
Article 5(1)(e) prohibits untargeted scraping of facial images from the internet or CCTV to build recognition databases. It doesn’t matter whether you scrape those images to build a security product, an art project, or an academic dataset. The act of scraping is the violation. Purpose is irrelevant.
Article 5(1)(c) — social scoring — requires two things. First, the AI system must classify or evaluate people based on social behavior or personal characteristics. Second, that classification must lead to detrimental treatment in contexts unrelated to those in which the data was generated, or treatment that is disproportionate to the social behavior. Both prongs must be satisfied — the scoring and the resulting harm. But notice: the prohibition is triggered by what the score leads to, not by what the system is labelled. A loyalty programme that cross-references social media activity to deny services in unrelated areas could trigger this. What matters is the combination of the classification and its downstream effect on people. Not what the system was called.
Effect masquerading as purpose
This one is tricky.
Article 6(3) offers an escape from high-risk classification for Annex III systems — but the statutory language is broader than you might initially think. The exemption applies where the system does not pose a significant risk of harm to health, safety, or fundamental rights, including by not materially influencing the outcome of decision making. “Materially influences outcomes” is one factor, but it sits within a wider assessment of significant risk.
In practice, though, the outcome-influence test is where most deployers will live or die. Does the system materially influence outcomes, in practice?
A system described as “decision-support” that generates scores which hiring managers follow 94% of the time? That system materially influences outcomes. The documentation can call it advisory all day long. The effect says otherwise. And if a market surveillance authority pulls your data and sees that pattern, your Article 6(3) exemption collapses.
The test isn’t what the system is supposed to do. It’s what actually happens to decisions when the system is in the room.
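If you want a number for that, one rough proxy is the follow rate: how often the final human decision matches what the system recommended. The sketch below is an illustration, not a legal test. The `Decision` record, its field names, and the sample data are all assumptions made up for this example; a real deployment would read from its own decision logs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    """One screening decision (hypothetical record layout)."""
    candidate_id: str
    system_recommends: bool   # True if the system recommended shortlisting
    human_decides: bool       # True if the hiring manager shortlisted


def follow_rate(decisions: list[Decision]) -> float:
    """Share of decisions where the human outcome equals the recommendation."""
    if not decisions:
        raise ValueError("no decisions to evaluate")
    followed = sum(d.system_recommends == d.human_decides for d in decisions)
    return followed / len(decisions)


# Invented sample: the manager departs from the system once in five decisions.
sample = [
    Decision("c1", True, True),
    Decision("c2", False, False),
    Decision("c3", True, True),
    Decision("c4", False, True),   # shortlisted despite a negative recommendation
    Decision("c5", False, False),
]
rate = follow_rate(sample)
print(f"follow rate: {rate:.0%}")
```

A follow rate near 100% doesn’t prove material influence on its own, since the humans may simply agree with the system. But it is the kind of pattern you want to have examined before an authority does.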
You used it differently, now you own it
Under Article 25, if a deployer uses an AI system for a purpose the provider didn’t intend — and that new use makes it high-risk — the deployer becomes the provider. Full provider obligations. Conformity assessment. Technical documentation.
This is the EU AI Act acknowledging that actual use diverges from intended purpose — and assigning legal consequences when it does.
A company buys a general analytics tool — not classified as high-risk — and deploys it to rank job candidates. The provider’s intended purpose was “workforce analytics and reporting.” The deployer’s actual use is “recruitment and selection of natural persons.” That’s Annex III, Point 4. The deployer just became the provider of a high-risk AI system — with no documentation, no conformity assessment, and no risk management system.
Effect trumps purpose. And the liability follows.
Human oversight watches for effect, not purpose
Human oversight under Article 14 of the EU AI Act isn’t “check that the system is working as documented”. It’s “detect anomalies, dysfunctions, and unexpected performance”. The overseers must monitor the system’s operation — including what it’s doing that it wasn’t supposed to do.
The AI Act requires overseers to remain aware of “automation bias”, the tendency to over-rely on AI outputs. Why? Because the effect of automation bias is that human oversight becomes meaningless. The person clicks “approve” without independently assessing the output. The system is making the decisions in practice, even if the process chart says otherwise.
Human oversight is an effect-monitoring function. It exists to catch the difference between what the system should do and what it does.
When it goes wrong, intent vanishes
Serious incident reporting doesn’t ask why something happened. It asks what happened.
Under Article 73, a serious incident — death, serious health harm, disruption to critical infrastructure, fundamental rights violations — must be reported based on a causal link between the AI system and the harm. Not based on intent. Not based on whether the harm fell within the system’s intended purpose.
If your recruitment AI causes systematic discrimination that rises to the level of a fundamental rights violation — that’s a reportable serious incident. It doesn’t matter that the system was intended to be neutral. It doesn’t matter that your documentation says “non-discriminatory”. The effect triggered the obligation.
The Pattern Continues
Those were the provisions that hit hardest. But the pattern runs deeper than many people realize. Across the Act, effect-based language appears in:
Risk management (Article 9) — providers must assess risks not just under intended purpose, but under “reasonably foreseeable misuse”. You must anticipate effects you didn’t design for.
Post-market monitoring (Article 72) — an ongoing obligation to collect and analyze data on the system’s real-world performance throughout its lifetime. This is pure effect tracking — what is the system doing now, not what was it designed to do.
Fundamental rights impact assessment (Article 27) — deployers must assess “the impact on fundamental rights that the use of such system may produce.” Forward-looking effect prediction. Not backward-looking purpose description.
Deployer monitoring (Article 26(5)) — deployers must “monitor the operation” of the system. Not check the documentation. Monitor what it’s doing.
Transparency (Article 50) — obligations triggered by what the system does (generates synthetic content, interacts with humans) regardless of why it’s deployed.
GPAI systemic risk (Article 51) — classification based on “actual or reasonably foreseeable negative effects” on public health, safety, or fundamental rights. Entirely detached from any downstream deployer’s intended purpose. The model’s capabilities determine its risk, not its use case.
Input data relevance (Article 26(4)) — deployers must ensure input data is representative for the system’s intended purpose. But if your real-world data differs from the provider’s assumptions — different demographics, different distributions — the effect will differ from the documented performance. You’re responsible for that gap.
More than a dozen separate provisions where the Act either explicitly or functionally shifts from purpose to effect. The entire enforcement architecture — prohibitions, incident reporting, post-market monitoring, fundamental rights — runs on effect. Purpose built the compliance file. Effect determines liability.
Comfortable vs. Uncomfortable Compliance
One problem has been bothering me lately: many companies know how to document intended purpose. Almost none know how to prove effect.
For intended purpose, you have a playbook:
Read the provider’s instructions for use. Document your use case. Write the risk assessment. Assign human oversight. File the FRIA. Train your staff. Build the governance file. Check the boxes.
For effect, there is no playbook. There’s barely a market. And the question deployers can’t answer is brutally simple:
What is your AI system actually doing to the people it affects — and how do you know?
Not what the documentation says. Not what the provider claims. What’s actually happening. Right now. In your specific context. With your specific data. To your specific population.
What Deployers Should Do About “Effect”
The EU AI Act doesn’t define “evidence” as such. But evidence is what several of its provisions demand in practice, and the question is how to gather it.
It might look something like this:
Layer 1: Before you deploy — baseline the system against your reality
Before the system goes live, test it against your context. Not the provider’s test data. Yours.
Does it perform as claimed on your applicant pool? Does it produce different outcomes for different demographics? What happens at the edges — unusual CVs, non-traditional career paths, gaps in employment history? Does the provider’s stated accuracy hold when you feed it data that looks like what you’ll actually feed it?
This is acceptance testing. It’s not in Article 26 by name. But it’s the only way to answer the question regulators will ask:
Did you have reason to believe this system would produce the effects it produced?
If you deploy without testing against your own context — and the system produces discriminatory effects — the defense “but the provider said it was accurate” won’t survive scrutiny.
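A minimal sketch of one such baseline check: shortlisting rates per group, and each group’s rate relative to the best-treated group. Everything here is an assumption for illustration. The group labels and counts are invented, and the 0.8 cut-off is the “four-fifths” rule of thumb from US employment practice, not a threshold set by the AI Act.

```python
from collections import Counter

# Rule-of-thumb cut-off (assumption; the AI Act does not set this number).
REVIEW_THRESHOLD = 0.8


def selection_rates(records: list[tuple[str, bool]]) -> dict[str, float]:
    """Shortlisting rate per group from (group, shortlisted) pairs."""
    totals = Counter()
    selected = Counter()
    for group, shortlisted in records:
        totals[group] += 1
        if shortlisted:
            selected[group] += 1
    return {g: selected[g] / totals[g] for g in totals}


def impact_ratios(rates: dict[str, float]) -> dict[str, float]:
    """Each group's rate divided by the highest group's rate."""
    best = max(rates.values())
    if best == 0:
        raise ValueError("nobody was shortlisted; ratios are undefined")
    return {g: r / best for g, r in rates.items()}


# Invented acceptance-test results: 10 applicants per group.
records = (
    [("group_a", True)] * 5 + [("group_a", False)] * 5
    + [("group_b", True)] * 3 + [("group_b", False)] * 7
)
rates = selection_rates(records)
ratios = impact_ratios(rates)
needs_review = [g for g, r in ratios.items() if r < REVIEW_THRESHOLD]
print(rates, ratios, needs_review)
```

With ten applicants per group the ratio is statistically fragile; on a real applicant pool you would pair it with a significance test and far larger samples before drawing conclusions.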
Layer 2: During operation — monitor what the system does, not what it should do
Article 26(5) says deployers must monitor the operation of the system. Here’s what that means if you take it seriously:
Track outputs. Not just “the system is running”. What is it outputting? Which candidates get shortlisted? Which get rejected? At what rates? Log this. Keep it for at least six months; that’s the minimum under Article 26(6). Longer is better.
Track outcomes. Where possible, follow the chain. Of the candidates the system shortlisted — who got hired? Who succeeded? Who didn’t? If the system’s recommendations correlate poorly with actual job performance — that’s a performance problem. If they correlate with protected characteristics — that’s a fundamental rights problem.
Track overrides. When human overseers disagree with the system, document it. Why did they override? Was it a one-off or a pattern? High override rates in one direction may signal systematic bias. Low override rates may signal automation bias — the humans aren’t actually overseeing, they’re rubber-stamping.
Track drift. Compare current performance against your deployment baseline. Are outputs shifting? Are certain groups being affected more over time? Data drift, model drift, population drift — they all create gaps between what the system was tested on and what it’s processing now.
Track complaints. When candidates challenge decisions — when they say “that doesn’t seem right” — log it. Not just the individual case. The patterns. If complaints cluster around specific demographics or specific types of decisions — that’s signal.
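As one example of turning these checks into numbers, here is a rough sketch for the override check. The log layout, the reviewer names, and both thresholds are hypothetical placeholders; the Act sets no override-rate bands, so you would pick and justify your own.

```python
from collections import defaultdict

# Placeholder review bands (assumptions, not regulatory values).
LOW_OVERRIDE = 0.02    # below this, ask whether oversight is rubber-stamping
HIGH_OVERRIDE = 0.30   # above this, ask whether the system is systematically off


def override_rates(log: list[tuple[str, bool]]) -> dict[str, float]:
    """Override rate per reviewer from (reviewer, overrode_system) entries."""
    counts = defaultdict(lambda: [0, 0])   # reviewer -> [overrides, total]
    for reviewer, overrode in log:
        counts[reviewer][0] += int(overrode)
        counts[reviewer][1] += 1
    return {r: o / n for r, (o, n) in counts.items()}


def review_flag(rate: float) -> str:
    """Map an override rate to a prompt for human follow-up."""
    if rate < LOW_OVERRIDE:
        return "possible automation bias"
    if rate > HIGH_OVERRIDE:
        return "possible systematic disagreement"
    return "within band"


# Invented log: 100 decisions per recruiter with 1, 40 and 10 overrides.
log = (
    [("recruiter_1", i < 1) for i in range(100)]
    + [("recruiter_2", i < 40) for i in range(100)]
    + [("recruiter_3", i < 10) for i in range(100)]
)
flags = {r: review_flag(rate) for r, rate in override_rates(log).items()}
print(flags)
```

The flags are prompts for a conversation, not findings. A recruiter with a low override rate may simply be working with a system that performs well on their roles, which is exactly what the documented reasons for each override should help you tell apart.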
Layer 3: Periodic review — is it still what you think it is?
Monthly or quarterly — depending on volume and risk — step back and assess:
Has the system’s deployment context changed? Are you using it for decisions you didn’t originally scope? Have the hiring managers started relying on it for things the provider didn’t intend?
Are your input data distributions stable? Or has your applicant pool shifted — new geographies, new demographics, new career profiles the system wasn’t trained on?
Does the provider’s stated performance still match what you observe? If accuracy was 92% at deployment and it’s 78% now — that’s a problem no governance document will catch.
Re-test. Compare. Update your risk assessment based on what you’ve observed — not what you predicted.
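For the input-distribution question, one common drift measure (my choice here, not something the Act names) is the Population Stability Index, which compares the share of applicants in each bin at baseline and today. The bins, the counts, and the rule-of-thumb bands mentioned in the comments are assumptions for illustration.

```python
import math


def psi(baseline: list[int], current: list[int], eps: float = 1e-6) -> float:
    """Population Stability Index between two binned count distributions.

    Both lists hold counts for the same bins in the same order.
    """
    if len(baseline) != len(current):
        raise ValueError("both distributions need the same bins")
    b_total, c_total = sum(baseline), sum(current)
    if b_total == 0 or c_total == 0:
        raise ValueError("both distributions need at least one observation")
    score = 0.0
    for b, c in zip(baseline, current):
        p = max(b / b_total, eps)   # baseline share, floored to avoid log(0)
        q = max(c / c_total, eps)   # current share, floored to avoid log(0)
        score += (q - p) * math.log(q / p)
    return score


# Invented counts per experience band: junior, mid-level, senior.
at_deployment = [200, 500, 300]
this_quarter = [400, 400, 200]

drift = psi(at_deployment, this_quarter)
# Common rule of thumb (assumption): < 0.1 stable, 0.1 to 0.25 watch, > 0.25 act.
print(f"PSI: {drift:.3f}")
```

PSI only tells you that the applicant pool moved, not whether the system’s outcomes moved with it. That is why the re-test against your baseline still matters.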
Layer 4: When something goes wrong — react within the deadlines
Define — before it happens — what constitutes a serious incident in your deployment context. A systematic pattern of discriminatory outcomes affecting fundamental rights? That’s reportable under Article 73. A single incorrect screening decision? Probably not — unless it causes serious individual harm.
Build the detection mechanism. Build the escalation path. Know who reports, to whom, within what timeline (15 days from awareness — shorter for widespread harm or death).
And cooperate with your provider. Article 72 creates a feedback loop — the provider is supposed to be collecting post-market monitoring data from deployers. If your system is producing unexpected effects, the provider needs to know. Not just because the AI Act says so — because they may have data from other deployers showing the same pattern.
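To make the timeline concrete, here is a small sketch that turns an awareness date into a latest reporting date. The 15-day limit comes from the text above; the 10-day and 2-day values are my reading of Article 73 and should be checked against the current legal text before you rely on them. The Act also expects a report immediately once the causal link is established, so treat these dates as an outer limit, not a target. The enum and function names are hypothetical.

```python
from datetime import date, timedelta
from enum import Enum


class IncidentKind(Enum):
    GENERAL = "general"          # any other serious incident
    DEATH = "death"              # death of a person
    WIDESPREAD = "widespread"    # widespread infringement or critical infrastructure disruption


# Outer limits in days from awareness (assumed reading of Article 73; verify before use).
MAX_DAYS = {
    IncidentKind.GENERAL: 15,
    IncidentKind.DEATH: 10,
    IncidentKind.WIDESPREAD: 2,
}


def latest_report_date(aware_on: date, kind: IncidentKind) -> date:
    """Last calendar day on which the report can still be filed."""
    return aware_on + timedelta(days=MAX_DAYS[kind])


aware = date(2028, 3, 1)  # hypothetical awareness date
for kind in IncidentKind:
    print(kind.value, latest_report_date(aware, kind).isoformat())
```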
Do You Need a Third Party?
The EU AI Act doesn’t explicitly require deployers to hire external testers. But practically — for most deployers of high-risk systems — the answer is: probably yes. At some point.
Not because the law mandates it. Because most deployers lack three things:
Technical capability. Testing an AI system for bias, fairness, and real-world performance isn’t something you do with a spreadsheet. It requires statistical expertise, access to disaggregated outcome data, and tools for measuring disparate impact across protected groups. Most HR departments don’t have this.
Independence. Self-assessing whether your own system discriminates has obvious limitations. A market surveillance authority will give more weight to independent verification — the same way financial regulators give more weight to external audits.
Access. You’re a deployer. You don’t have access to the system’s internals — the training data, the model weights, the feature importance rankings. You can only test inputs and outputs. A third party engaged by the provider — or one with contractual access — can go deeper.
You might need external help when:
The system affects fundamental rights — hiring, credit, insurance, criminal justice
You’re seeing patterns you can’t explain internally
Your deployment context differs significantly from the provider’s assumptions
You want defensible evidence — not just for a regulator, but for a court
You don’t need it (yet) when:
The system is in a lower-risk category
You have internal data science capability to run bias analyses
The provider offers robust, verifiable performance data specific to your context
But here’s the thing: “I didn’t know” isn’t a defense under Recital 29. The system’s effect is your problem whether or not you measured it. The question isn’t whether to build the evidence. It’s whether you build it proactively — or a regulator builds it for you, after someone files a complaint.
The point is not to outsource accountability; the deployer still owns the system. The point is to create a defensible evidence record that someone independent of the build team can inspect, challenge, and explain.
What to Demand from Your Provider
Before spending on third-party testing, exhaust what you’re entitled to.
Article 72 requires providers to collect post-market monitoring data on the system’s real-world performance. Article 13 requires instructions for use that include performance metrics, known limitations, and conditions of use. Article 9 requires risk assessment covering foreseeable misuse.
Ask your provider:
What performance data have you collected from other deployers? What do the aggregated results show?
What known limitations exist for specific demographic groups or data distributions?
Have you conducted Article 60 testing in real-world conditions? Can you share the results?
What incidents have been reported by other deployers?
What populations and contexts were used for testing and validation?
What monitoring tools do you provide — or what data can you share to support our monitoring obligation?
If the provider’s answer to these questions is vague — “the system performs well” without disaggregated data, “no known issues” without evidence of looking — that’s a red flag. Not just about the system. About whether you can meet your own deployer obligations with what they’re giving you.
The Timeline — What’s Live and What Moved
This matters for the “effect” question more than you might think.
Already enforceable (since 2 February 2025):
All Article 5 prohibited practices, including every effect-based prohibition discussed above
AI literacy (Article 4)
The effect-based provisions with the highest stakes — the outright bans — are live. Right now. If your system is producing prohibited effects today, you are already in violation.
Deferred (provisional Digital Omnibus agreement, 7 May 2026):
Annex III high-risk obligations (Article 26 deployer duties, Article 27 FRIA): deferred to 2 December 2027
Annex I product-embedded high-risk obligations: deferred to 2 August 2028
The monitoring obligations, the log retention, the formal evidence requirements — those got more runway. But “more runway” isn’t “irrelevant”. The obligation is clear. The deadline moved. The underlying requirement didn’t.
What Does It Mean for the Board Meeting?
It’s past 8pm now. The presentation is still on your screen. You can see slide 4, “Compliance Architecture”. It’s still good. The work is still real.
But you’re not looking at the presentation anymore. You’re looking at a blank document.
You know something now that you didn’t know two hours ago. The governance you built covers the intended purpose — and covers it well. But it doesn’t answer the question a regulator will ask if something goes wrong. Or the question a candidate will ask if they suspect the system treated them unfairly. That question:
Can you prove the system’s real-world effect on people matches what your documents say it’s supposed to do? And if it doesn’t — how would you even know?
You start typing. Not another policy. A monitoring plan.
What are we tracking? Shortlist rates by demographic — to catch disparate impact before someone else catches it for us. Override rates by recruiter — to know whether human oversight is real or rubber-stamping. Outcome correlation — does the system’s ranking actually predict job performance, or is it pattern-matching against historical biases the provider trained on? Complaint patterns. Drift from baseline.
How often are we reviewing? Monthly for the metrics. Quarterly for the full assessment.
Who reviews? Not the hiring managers who use the system daily — they’re too close. Someone with distance. Maybe external, if the stakes are high enough.
What triggers escalation? A disparity ratio above what threshold? Complaints from how many candidates in the same category? A drift of what magnitude before someone stops the system and asks why?
You write it down. One page. Then two. It’s rougher than the governance file. Less polished. Harder to present to a board. Because it doesn’t describe what the system is designed to do — it tracks whether reality matches the design.
Tomorrow, you’ll give the presentation. You’ll tell the board the compliance architecture is solid — because it is. But you’ll add a slide. Slide 12, maybe. “What we still need to build.” The monitoring. The evidence. The proof that the system’s effect on real people is what we say it is — not just today, but next month, and the month after.
The governance covers the purpose. The plan you’re writing now — at 8pm on a Thursday, because you happened to read one sentence in a recital — covers the effect.
You needed both all along. Now you know.





