Chapter 3
The Principle of Separation — Why AI Must Not Decide the Law
Last updated: 2026-04-10 · Open for review
“Who decides? Who decides who decides?”
Shoshana Zuboff, The Age of Surveillance Capitalism, PublicAffairs, New York 2019
The Irreplaceable Role of Human Judgment and Democratic Legitimacy
There is a question that every serious proposal for the automation of governance must answer, not as a technical question and not as a question of capability, but as a constitutional one: what is it that human beings must do, not because machines cannot do it, but because democracy requires that humans do it?
This is not a question about the current limitations of artificial intelligence. It is not an argument from technological immaturity, to be revisited when the models improve. It is an argument from constitutional principle, from the nature of democratic legitimacy and the irreducible requirements of the rule of law, that holds regardless of how capable AI systems become.
The argument begins with a fact about law that is easy to state and often underappreciated: law is not a description of the world. It is a prescription for it. When a legislature enacts a statute, it does not record a fact about how things are. It makes a choice about how things ought to be: about which interests should prevail when they conflict, which behaviours should be encouraged or prohibited, which distributions of burden and benefit are just. These choices are political in the deepest sense. They reflect values, they embody compromises, they express the collective will of a political community as mediated through its representative institutions.
The legitimacy of those choices derives entirely from the process through which they are made. A norm is law not because it is wise, or optimal, or technically well-specified, but because it was enacted by a body with the authority to bind the community, an authority that ultimately derives from the democratic mandate of the governed. Remove that mandate, and the rule is not law. It is merely coercion wearing law’s clothes.[1]
This has a direct and non-negotiable consequence for the governance of AI: no AI system can be the source of a legal norm. It can support the application of one. It can prepare the factual basis for one. It can reason about one. But it cannot create one, authorise one, or substitute its judgment for one, because it has no democratic mandate, no political accountability, and no standing in the constitutional order that gives law its binding force. An AI system that produces a legal outcome on the basis of its own reasoning, without a verifiable connection to a norm enacted by a legitimate authority, is not administering the law. It is displacing it.
This is not a theoretical concern. It is already happening, in the form of systems that apply machine learning models to administrative decisions without a formally specified connection between the model’s outputs and the statutory norms those outputs are supposed to implement. The outputs may be correct. The process is constitutionally illegitimate, not because AI is untrustworthy, but because trustworthiness is not the standard. Democratic accountability is the standard. And a system whose decision pathway cannot be traced to the enacted will of a democratic legislature does not meet it, however accurate its predictions.
The irreplaceable role of human judgment in governance is not, therefore, primarily about the quality of human reasoning relative to machine reasoning. In many narrow domains, machine reasoning is already more consistent, more accurate, and more equitable than human reasoning. The irreplaceable role of human judgment is about something prior to quality: it is about legitimacy. The human official who makes an administrative decision does not do so merely as an individual. They do so as a representative of a constitutional order, exercising authority delegated through a chain of democratic accountability that runs from the legislature through the executive to the individual act of governance. That chain cannot be delegated to a machine, because machines cannot be democratically accountable. They can be technically reliable. They can be auditable. They can be accurate. But they cannot be answerable to the people in the way that democratic governance requires.
The Constitutional Line: Where It Runs and How It Can Be Drawn
The preceding section establishes what AI must not do. This section examines how that prohibition translates into architectural requirements.
The most important architectural distinction in the field of AI-enabled governance is the distinction between probabilistic and deterministic reasoning. It is a technical distinction with direct constitutional significance, and understanding it precisely is essential to understanding both the promise and the limits of every approach to Law as Code, including the one proposed in this paper.
A probabilistic system produces outputs that are likely to be correct for a given input, based on patterns learned from historical data. Its outputs are, in a rigorous statistical sense, informed estimates. Given the same input, a probabilistic system may, under certain conditions, produce different outputs. Its relationship to correctness is one of confidence levels and error rates, not of logical necessity. This is not a defect of current AI systems that future systems will overcome. It is a feature of the probabilistic approach to reasoning under uncertainty, an approach that is extraordinarily powerful precisely because it can handle the complexity, ambiguity, and incompleteness of real-world data.
A deterministic system produces the same output for the same input, every time, without exception. Its relationship to correctness is logical rather than statistical: given a precisely specified set of rules and a precisely specified set of inputs, the output is not probable. It is necessary. This is not a virtue of simplicity (deterministic systems can be enormously complex). It is a virtue of rule-governance: the output is the product of the rules, fully and without residue.[2]
The constitutional significance of this distinction is direct. The rule of law requires that identical facts, evaluated under the same norms, produce the same legal outcome, regardless of who processes the case, which system implements the norm, or when the decision is made. This is the operational definition of equality before the law. And it is a requirement that only a deterministic system can guarantee in the strict sense.
A probabilistic system, however accurate on average, cannot guarantee that two citizens with identical facts will receive identical outcomes. It can guarantee that they will probably receive similar outcomes. But “probably similar” is not a constitutional standard. It is a statistical one.
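The distinction can be made concrete in a few lines of code. The following sketch is purely illustrative: the eligibility rule, the thresholds, and the scoring coefficients are all invented, and the probabilistic function is a stand-in for a learned model, not a real one. What it shows is the structural difference: the deterministic rule's output follows necessarily from the rule, while the probabilistic scorer only ever yields a confidence.

```python
# Illustrative sketch (hypothetical rule, thresholds, and coefficients,
# not drawn from any real statute or model).

def deterministic_eligibility(income: int, household_size: int) -> bool:
    """Deterministic: the output is entailed by the rule, every time."""
    threshold = 12_000 + 4_000 * (household_size - 1)  # hypothetical figures
    return income <= threshold

def probabilistic_eligibility(income: int, household_size: int) -> float:
    """Probabilistic: a confidence estimate, not a legal conclusion.
    (Stand-in for a learned model; the coefficients are invented.)"""
    score = 0.9 - 0.00003 * income + 0.05 * household_size
    return min(max(score, 0.0), 1.0)

# Identical facts, identical outcome: guaranteed only by the deterministic rule.
assert deterministic_eligibility(15_000, 2) == deterministic_eligibility(15_000, 2)
print(deterministic_eligibility(15_000, 2))            # True (15_000 <= 16_000)
print(round(probabilistic_eligibility(15_000, 2), 2))  # a confidence: 0.55
```

The probabilistic function answers "how likely is eligibility?"; the deterministic function answers "is this case eligible under the rule?". Only the second question has the form the rule of law requires.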
Humans Are Not Deterministic Either. Humans Are Complicated
At this point, an objection must be addressed that is both obvious and important: human decision-makers are not deterministic. Two officials in the same agency, applying the same provision to the same facts, regularly arrive at different outcomes. The legal system has developed mechanisms for managing this variability (appeal procedures, administrative guidelines, supervisory review, the self-binding effect of consistent practice under Article 3 of the German Basic Law). No one has ever demanded that human administration be perfectly deterministic. The demand has been that it be lawful, reasoned, and subject to review.
If that is the standard for humans, why should it not also be the standard for machines?
This is the strongest argument for what might be called the Legal Agent approach: AI systems equipped with legal reasoning capabilities sufficient to apply human-readable law directly, the way a trained legal professional would, without the intermediation of a formalised Decision Tree. Such systems would read the statute, identify the applicable provisions, subsume the facts, and produce a decision, just as a human official does. The approach is conceptually attractive. It eliminates the translation losses inherent in any formalisation. It scales naturally with legislative change. And it handles open texture and teleological reasoning as normal working conditions rather than as system boundaries requiring special treatment.
Given the current state of AI technology, the Legal Agent approach on its own cannot satisfy the following three constitutional requirements simultaneously.
First, verifiability. It is not yet possible to verify that a language model’s stated reasoning corresponds to its actual processing path. When a human official says “I applied § 40 VwVfG and weighed the following considerations,” we can generally take this as an account of what actually happened in their mind. When a language model produces the same statement, it may be an accurate description of its processing, or it may be a plausible post-hoc narrative that bears no reliable relationship to the computational pathway that produced the result. Until this distinction can be resolved technically, the right to reasons (as constitutional law understands it) cannot be guaranteed by a Legal Agent alone.
Second, consistency at scale. A human official who misinterprets a provision produces a wrong decision. The error affects one case, or a handful. It is correctable through appeal. An AI agent that carries a systematic interpretive error produces that error across thousands of cases per hour, at a speed that outpaces any review mechanism. A published Decision Tree makes the interpretive choice visible and correctable at the source. A Legal Agent’s interpretive tendency is embedded in its model weights and accessible to no one.
Third, auditability. A court reviewing a Decision Tree can examine it node by node, verify each derivation against the statutory text, and identify precisely where an error occurred. A court reviewing a Legal Agent’s determination can examine the output and the generated explanation, but cannot verify whether the explanation corresponds to the actual reasoning. The audit trail of a Decision Tree is structural. The audit trail of a Legal Agent is narrative.
These are technological limitations, not conceptual ones. If future AI systems achieve verifiable reasoning (reasoning whose pathway can be independently confirmed, not just narrated after the fact), the constitutional case for the Decision Tree as a necessary intermediary would weaken. Our draft OLRF proposal is designed with that possibility in mind.
Three Models of AI Participation in Legal Governance
The preceding analysis leads to a conclusion that is more nuanced than either “AI must never touch legal reasoning” or “AI can do whatever a human legal professional can do.” It leads to a spectrum of models, each appropriate to different types of legal decisions, different levels of AI capability, and different constitutional requirements. We now describe three models drawn from this spectrum.
Model A: Deterministic Evaluation
The Decision Tree contains the complete normative logic. AI systems are strictly confined to fact-finding: reading documents, extracting data, validating inputs. The deterministic evaluation engine applies the norm. The AI has no normative function. This model provides the strongest guarantees of consistency, auditability, and equal treatment. It is appropriate for bound decisions with clearly defined conditions and deterministic outcomes (tax calculations, standard benefit determinations, fee assessments). It corresponds to the class of decisions that § 35a of the German VwVfG permits for full automation.[3]
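A Model A evaluation can be sketched as a pure walk over a tree whose nodes each cite their statutory anchor. Everything below is illustrative: the node layout, field names, and the two-condition sample provision are invented, not part of any specified OLRF format. The point is structural: the same facts always produce the same path, the same outcome, and a node-by-node audit trail.

```python
# Minimal Model A sketch (hypothetical data structures and provision).
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Node:
    anchor: str                       # statutory provision this node derives from
    test: Callable[[dict], bool]      # condition over established facts
    if_true: Optional["Node"] = None
    if_false: Optional["Node"] = None
    outcome: Optional[str] = None     # set only on leaf nodes

def evaluate(node: Node, facts: dict, trail: list) -> tuple:
    """Deterministic evaluation: same facts, same path, same outcome.
    Returns the outcome plus the node-by-node audit trail."""
    if node.outcome is not None:
        return node.outcome, trail
    result = node.test(facts)
    trail.append((node.anchor, result))
    return evaluate(node.if_true if result else node.if_false, facts, trail)

leaf = lambda o: Node(anchor="", test=lambda f: True, outcome=o)

# Hypothetical two-condition benefit rule.
tree = Node(
    anchor="§ 1(1) (hypothetical)",
    test=lambda f: f["income"] <= 16_000,
    if_true=Node(
        anchor="§ 1(2) (hypothetical)",
        test=lambda f: f["resident"],
        if_true=leaf("grant"),
        if_false=leaf("deny"),
    ),
    if_false=leaf("deny"),
)

outcome, trail = evaluate(tree, {"income": 15_000, "resident": True}, [])
print(outcome)  # grant
print(trail)    # [('§ 1(1) (hypothetical)', True), ('§ 1(2) (hypothetical)', True)]
```

The audit trail is structural, in the sense used above: a reviewing court can check each recorded node against the statutory text without trusting any narrative about how the result was reached.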
Model B: Guided Evaluation
The Decision Tree defines the normative structure (conditions, consequences, exceptions, discretion points), but a qualified Legal Agent performs the subsumption in individual cases, using the statutory text as primary source and the Decision Tree as a normative framework for orientation and validation. The agent’s result is checked against the tree’s structure. Deviations are documented. This model preserves substantial auditability while allowing the agent to handle open texture, unstructured fact patterns, and context-dependent interpretation. It is appropriate for decisions involving structured discretion, assessment margins with established case law, and complex norms with many indeterminate legal concepts. It covers the large majority of real administrative work.
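The validation step that distinguishes Model B can be sketched as follows. All names and structures here are invented for illustration: the agent's determination (its outcome plus the provisions it claims to have applied) is compared against the published tree's anchor structure, and any deviation is documented rather than silently accepted.

```python
# Illustrative Model B validation sketch (hypothetical field names and anchors).

def validate_against_tree(agent_outcome: str,
                          agent_anchors: list,
                          tree_outcome: str,
                          tree_anchors: list) -> dict:
    """Compare the agent's determination with the deterministic evaluation.
    Deviations do not overwrite the result; they are recorded for review."""
    deviations = []
    if agent_outcome != tree_outcome:
        deviations.append(f"outcome: agent={agent_outcome!r}, tree={tree_outcome!r}")
    missing = [a for a in tree_anchors if a not in agent_anchors]
    if missing:
        deviations.append(f"anchors not cited by agent: {missing}")
    return {"validated": not deviations, "deviations": deviations}

# Hypothetical case: the agent reaches the same outcome but skips one anchor.
report = validate_against_tree(
    agent_outcome="grant",
    agent_anchors=["§ 1(1) (hypothetical)"],
    tree_outcome="grant",
    tree_anchors=["§ 1(1) (hypothetical)", "§ 1(2) (hypothetical)"],
)
print(report["validated"])   # False
print(report["deviations"])  # the skipped anchor is documented, not discarded
```

The design choice matters: validation here is a documentation mechanism, not an override. The tree constrains and records the agent's reasoning without pretending to replace it.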
Model C: Autonomous Legal Reasoning
A highly qualified Legal Agent applies the statutory text directly. The Decision Tree is used retrospectively, as an audit and transparency instrument against which the agent’s determination is documented and checked for consistency. This model offers the fullest use of AI reasoning capability and eliminates translation losses entirely. It is not, however, currently deployable for binding legal decisions, because the three constitutional requirements identified above (verifiability, consistency at scale, auditability) cannot yet be satisfied. Model C is appropriate today for preparatory analyses, legal opinions, compliance assessments, and scenarios in which the result is subsequently validated by a human decision-maker or through Model A or B. It will become constitutionally viable for binding decisions when verifiable AI reasoning is technically achievable.
These three models are not competing alternatives. They are points on a spectrum. A single administrative process may use all three simultaneously: Model A for the income threshold calculation, Model B for the assessment of “special hardship,” and Model C for the preparatory research on relevant case law that the official reviews at a Discretion Point. What determines the appropriate model is not a general policy preference but the legal character of the specific decision element: whether it is bound, discretionary, or evaluative, and what level of verifiability the constitutional order requires for that type of decision.
What Is Constant Across All Three Models
The three models differ in how AI participates in legal governance. They do not differ in what they require from that participation. Five constitutional requirements apply across the entire spectrum:
1. Normative Traceability
Every automated determination, regardless of the model, must be traceable to the specific legislative text from which it derives. In Model A, this traceability is provided by the Decision Tree’s sub-normative linkage. In Model B, it is provided by the validation of the agent’s result against the tree’s anchor structure. In Model C, it is provided by the retrospective documentation of which statutory provisions the agent relied upon.
2. Public Verifiability
The normative basis of every determination must be publicly accessible and independently verifiable. No model permits the normative logic to be proprietary, commercially confidential, or accessible only to the operator.
3. Democratic Accountability
The authority to specify how a norm is applied must derive from, and be accountable to, the democratic process that enacted the norm. This applies equally to the authority that publishes a Decision Tree (Models A and B) and to the authority that certifies a Legal Agent’s competence (Models B and C).
4. The Right to Reasons
Every person affected by an automated determination is entitled to an explanation, in legally meaningful terms, of why the determination was made. The explanation must be specific to their case, not a generic description of how the system works.
5. Judicial Reviewability
Every automated decision must be subject to judicial review on terms that allow the reviewing court to assess whether the norm was correctly applied, whether the facts were correctly established, and whether discretion (where applicable) was lawfully exercised. The quality of the material available for judicial review differs across models (it is richest in Model A, most limited in Model C), and this difference is itself a factor in determining which model is constitutionally appropriate for a given type of decision.
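The five requirements above can be read as a specification for what every determination record must carry, whichever model produced it. The sketch below is one possible shape for such a record; every field name and sample value is invented for illustration, not taken from any OLRF specification.

```python
# Illustrative determination record (all field names and values hypothetical):
# statutory anchors (requirement 1), a reference to the published tree
# (requirement 2), the issuing authority (requirement 3), case-specific
# reasons (requirement 4), and the audit trail (requirement 5).
from dataclasses import dataclass, field

@dataclass
class Determination:
    model: str                       # "A", "B", or "C"
    outcome: str
    statutory_anchors: list          # requirement 1: normative traceability
    published_tree_ref: str          # requirement 2: public verifiability
    issuing_authority: str           # requirement 3: democratic accountability
    case_reasons: str                # requirement 4: specific to this case
    audit_trail: list = field(default_factory=list)  # requirement 5

d = Determination(
    model="A",
    outcome="grant",
    statutory_anchors=["§ 1(1) (hypothetical)"],
    published_tree_ref="registry:benefit-x/v3 (hypothetical URI)",
    issuing_authority="Example Agency",
    case_reasons="Income of 15,000 is below the 16,000 threshold for a "
                 "two-person household (hypothetical figures).",
    audit_trail=[("§ 1(1) (hypothetical)", True)],
)
print(d.outcome)  # grant
```

Note that the record's reasons are case-specific prose, not a generic system description, mirroring the right-to-reasons requirement as stated above.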
The Role AI Can and Should Play
The separation principle does not diminish the role of AI in governance. It defines it. And the role it defines is not marginal. It is transformative.
Across all three models, AI systems contribute capabilities that human administration cannot match. They process unstructured documents in multiple languages. They extract relevant facts from complex evidence. They query registers across jurisdictions. They identify patterns across millions of cases that suggest systematic errors. They generate explanations in plain language accessible to citizens without legal training. They monitor regulatory changes and flag cases requiring review under new norms. They assemble, validate, and organise the factual records on which legal determinations depend.
The bottleneck in administrative justice is not, typically, the application of norms. The bottleneck is everything that precedes and follows that application: the gathering, validation, and organisation of facts; the identification of applicable norms; the detection of relevant exceptions and edge cases; the production of comprehensible communications to citizens. These are precisely the tasks where AI systems add the most value.[4]
In Model B and (where constitutionally viable) Model C, AI goes further: it participates in the normative evaluation itself, applying legal reasoning to established facts within a framework that makes that reasoning verifiable, auditable, and challengeable. This is not a concession to convenience. It is a recognition that the complexity of modern legal systems exceeds what purely deterministic formalisation can capture, and that qualified AI reasoning, properly constrained and institutionally embedded, can bring capabilities to legal governance that neither pure automation nor pure human administration can achieve alone.
What remains non-negotiable is the framework within which this reasoning occurs. The normative basis must be public. The reasoning must be traceable. The result must be auditable. And the democratic authority of the legislature must remain the ultimate source of every norm that the system applies. These requirements are not relaxed as the model moves from A to C. They are satisfied by different mechanisms, but they are satisfied in every case.
The Alternative and Its Consequences
The alternative to the OLRF framework proposed in this paper is not a world without AI in governance. That world is already gone. The alternative is a world in which AI systems make consequential decisions about people’s lives on the basis of probabilistic reasoning over privately implemented, unverifiable approximations of legal norms, without a published normative specification, without a structured audit trail, and without any mechanism through which a citizen, a court, or a legislature can verify that the system’s outputs correspond to the law as democratically enacted.
That alternative is not a technically convenient simplification of governance. It is the structural erosion of the rule of law, conducted gradually, incrementally, and almost entirely without public awareness of what is being lost.
The principle of separation, as articulated in this chapter and operationalised through the three models described in this paper, is proposed as the constitutional foundation for a different path. It is demanding. It requires institutional commitment, technical investment, and sustained political will. But it is achievable. And the cost of not achieving it is measured not in euros or efficiency metrics, but in the democratic legitimacy of the state itself.
Footnotes

[1] Böckenförde, Staat, Gesellschaft, Freiheit (1976)

[2] Sandra Wachter et al., “Why a Right to Explanation of Automated Decision-Making Does Not Exist in the General Data Protection Regulation”, International Data Privacy Law 7(2), 2017

[3] Braun Binder, N., “Vollständig automatisierter Erlass eines Verwaltungsaktes und Bekanntgabe über Behördenportale”, DÖV 2016, pp. 891 ff.; Braun Binder, N., “Weg frei für vollautomatisierte Verwaltungsverfahren in Deutschland”, Jusletter IT, 22 September 2016. Braun Binder’s analysis of the 2017 amendments to the AO, VwVfG, and SGB X provides the doctrinal foundation for the boundary that Model A operationalises architecturally: automation is constitutionally permissible only for bound decisions whose outcome follows deterministically from the application of legal conditions to established facts. The OLRF’s three-model framework generalises this boundary: Model A implements the § 35a zone (deterministic evaluation, no normative AI function), while Models B and C extend beyond it and therefore require the additional constitutional safeguards (validation, audit, certification) that the OLRF specifies.

[4] Coglianese & Lehr, “Regulating by Robot”, Georgetown Law Journal 105, 2017