Chapter 16
From Specification to Implementation
“The problem of ensuring that AI systems do what we want them to do --- and not something else --- cannot be solved by good intentions. It requires getting the design right.” --- Stuart Russell, Human Compatible: Artificial Intelligence and the Problem of Control, Viking 2019
The Implementation Challenge: Why Architecture Alone Is Insufficient
The structure of this paper is unusual. It combines constitutional analysis with institutional design and technical architecture in a way that academic convention would normally separate into distinct publications. That combination is deliberate. It reflects the conviction that the digital provision of law is not a technical problem with constitutional side-effects, nor a constitutional problem with technical implications, but a single problem that must be addressed at both levels simultaneously if it is to be addressed at all.
But no architecture, however rigorous, changes public administration simply by being described. The pathologies identified at the outset of this paper (fragmentation, inconsistency, opacity, and the escalating cost of maintaining legal logic across disconnected systems) remain the practical condition of government until an alternative is implemented, adopted, and embedded at sufficient scale. The question for this chapter is therefore no longer whether the OLRF is conceptually sound. It is how a conceptually sound architecture can become operational public infrastructure.[1]
That transition cannot be managed through a single leap from specification to universal deployment. The institutional diversity of public administration is too great, the installed base of legacy systems too entrenched, and the governance demands of machine-executable law too significant. For that reason, the OLRF adopts a deliberately graduated implementation strategy. It does not require every authority to arrive immediately at the most advanced form of adoption. It creates a structured path by which jurisdictions can begin with normative transparency, move toward operational integration, and only then extend into AI-assisted evaluation and cross-jurisdictional execution. This is not a compromise with ambition. It is the condition of making ambition durable.
Conformance Classes: A Graduated Path to Adoption
The OLRF specification defines three conformance classes, representing three levels of implementation completeness. Each is meaningful in its own right. Each builds on the previous. Each constitutes a genuine improvement on the pre-OLRF baseline even without advancement to the next level. Their purpose is not merely technical classification. They are the mechanism by which adoption becomes politically and institutionally tractable. An all-or-nothing model would leave most authorities where they are today, waiting for perfect readiness and therefore never beginning. A graduated model allows improvement to start immediately, and to deepen as institutional capacity develops.[2]
Class A: Normative Transparency
Class A is the foundational level of conformance. It requires that a Decision Tree be published in the Registry with valid sub-normative linkage, a Coverage Map (including model assignments and agent certification requirements), and a cryptographic signature from the responsible authority. At this level, the executable representation of the norm need not yet be tied to a live evaluation engine or to AI-assisted fact preparation. What matters is that the operative logic of the norm becomes public, attributable, and legally arguable. In practical terms, Class A is the point at which the hidden implementation of law ceases to be merely private software and becomes a public legal artefact.
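To make these requirements concrete, the following sketch models a Class A Registry publication as a plain data structure. It is illustrative only: the field names (decision_tree_uri, derivation_note, and so on) are hypothetical, and the authoritative schema is the one defined by the OLRF specification itself.

```python
# Illustrative sketch of a Class A Registry publication record.
# All field names are hypothetical; the authoritative schema is
# defined by the OLRF specification, not by this example.
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class SubNormativeLink:
    """Anchors one Decision Tree node to its statutory source."""
    node_id: str          # node within the Decision Tree
    statute_ref: str      # citation of the statutory provision
    derivation_note: str  # brief account of the normative derivation

@dataclass(frozen=True)
class CoverageMapEntry:
    """One classification entry, including model assignment."""
    provision_ref: str
    classification: str       # one of the six classifications
    model: str                # "A", "B", or "C"
    agent_certification: str  # required certification, if any

@dataclass(frozen=True)
class ClassAPublication:
    """What an authority signs and publishes at Class A."""
    authority: str
    norm_id: str
    decision_tree_uri: str
    links: tuple[SubNormativeLink, ...]
    coverage_map: tuple[CoverageMapEntry, ...]
    published: date
    signature: bytes  # cryptographic signature by the responsible authority
```

Nothing in this record presupposes a live evaluation engine. It is a publication artefact, which is precisely what makes Class A inexpensive to reach.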
Its significance is greater than its modest technical demands might suggest. Class A already addresses the most pervasive failure of current automated governance, which is not always incorrect automation, but invisible automation. Once the Decision Tree is publicly registered, courts can inspect the executable specification, legislatures can examine the Coverage Map, and civil society can contest omissions, anchors, or classifications with precision rather than conjecture. An authority that reaches Class A has not yet rebuilt its operational stack. But it has already crossed the most important constitutional threshold: it has made the normative basis of automated decision-making publicly reviewable.
Class A is also the level at which the OLRF’s relationship to existing projects becomes most immediately productive. An authority that currently operates an OpenFisca-based benefit calculation, a RegelSpraak-based tax specification, or any other rule engine can achieve Class A without replacing its existing system. It publishes a conformant Decision Tree in the Registry alongside whatever software currently applies the norm (the Connector Pattern described in Chapter 17). The existing system continues to operate. What changes is that its normative basis is now publicly documented, sub-normatively anchored, and subject to democratic oversight. Class A is therefore the lowest-cost entry point for any jurisdiction, including those with substantial installed bases of existing automation.[3]
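How little Class A demands of an installed system can be shown in a few lines. The sketch below is a minimal rendering of the Connector Pattern idea under stated assumptions: the in-memory registry and the stand-in calculation function are hypothetical and do not reflect the OpenFisca or RegelSpraak APIs.

```python
# Connector Pattern sketch: the legacy engine keeps deciding cases,
# while Class A adds a published normative artefact beside it.
# All names here are hypothetical stand-ins.

class InMemoryRegistry:
    """Toy stand-in for the OLRF Registry (a real Registry is a
    public, signed, versioned service)."""
    def __init__(self) -> None:
        self._entries: dict[str, dict] = {}

    def put(self, norm_id: str, publication: dict) -> None:
        self._entries[norm_id] = publication

def legacy_benefit_calculation(case: dict) -> float:
    """Stand-in for an existing OpenFisca- or RegelSpraak-based system."""
    return max(0.0, 1200.0 - 0.3 * case["monthly_income"])

registry = InMemoryRegistry()
registry.put("benefit/housing-allowance", {
    "decision_tree_uri": "registry://example/housing-allowance/1.0",
    "signature": b"...",  # signed by the responsible authority
})

# Operational processing is untouched:
award = legacy_benefit_calculation({"monthly_income": 2000.0})
```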
Class B: Operational Integration
Class B adds live operational use. At this level, the Registry-published Decision Tree is not merely a public specification. It becomes the norm actually used by a conformant evaluation engine. Citizen cases are processed through the OLRF interface, the evaluation is performed by the engine, and every determination produces a signed Decision Package. Full interface functionality, audit logging, and a legally adequate explanation capability are part of this level. Facts may still be assembled through conventional administrative workflows rather than AI-assisted processes, but the legal determination itself must already occur through the OLRF architecture.
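What “every determination produces a signed Decision Package” might look like can be sketched as follows. The field names are hypothetical, and the symmetric HMAC stands in for the asymmetric signature a production system would use; the point is that the record captures the reasoning path actually taken and is integrity-protected.

```python
# Sketch of a signed Decision Package. Field names are hypothetical;
# HMAC is used for brevity where a real system would apply the
# authority's asymmetric signature.
import hashlib
import hmac
import json
from datetime import datetime, timezone

def build_decision_package(norm_id: str, tree_version: str, facts: dict,
                           path: list[str], outcome: str,
                           signing_key: bytes) -> dict:
    body = {
        "norm_id": norm_id,
        "decision_tree_version": tree_version,
        "facts": facts,            # the facts as established for this case
        "evaluation_path": path,   # the node ids actually traversed
        "outcome": outcome,
        "issued": datetime.now(timezone.utc).isoformat(),
    }
    payload = json.dumps(body, sort_keys=True).encode()
    body["signature"] = hmac.new(signing_key, payload, hashlib.sha256).hexdigest()
    return body

pkg = build_decision_package(
    "benefit/housing-allowance", "1.4.2",
    facts={"monthly_income": 2000.0, "household_size": 3},
    path=["root", "income_test", "household_adjustment", "grant"],
    outcome="granted",
    signing_key=b"demo-key-not-for-production",
)
```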
The step from Class A to Class B is not incremental in effect, even if it is incremental in implementation. It is at Class B that the OLRF’s accountability claims become operational reality. The duty to give reasons is no longer satisfied, if at all, by retrospective reconstruction. It is fulfilled by design through the Decision Package. Judicial review is no longer aimed at a black-box outcome supported only by general system descriptions. It is anchored in a complete and signed record of the actual reasoning path. And systemic oversight gains a stable evidentiary object in the audit trail and the population of Decision Packages generated over time.
Under the three-model framework, Class B encompasses both Model A (deterministic evaluation) and the beginning of Model B (guided evaluation). An authority that operates Class B with Model A uses the evaluation engine for deterministic processing. An authority that operates Class B with Model B additionally integrates certified Legal Agents (Chapter 10) that perform subsumption, with the evaluation engine validating their output against the Decision Tree. Class B with Model B requires that the agents hold valid domain-specific certifications, that the validation framework be published in the Registry, and that every deviation be classified and documented in the Decision Package.[4]
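The validation step that distinguishes Class B with Model B can be reduced to a simple guard: the engine re-derives the determination from the Decision Tree, compares it with the agent’s subsumption, and classifies any deviation. The categories below are illustrative and do not reproduce the specification’s deviation taxonomy.

```python
# Sketch of Model B validation. The deviation categories are
# illustrative, not the taxonomy defined by the OLRF specification.
from enum import Enum

class Deviation(Enum):
    NONE = "none"            # agent and engine agree
    CORRECTED = "corrected"  # engine result prevails; deviation logged
    ESCALATED = "escalated"  # referred to a human case officer

def validate_agent_subsumption(engine_result: str, agent_result: str,
                               agent_certified: bool) -> Deviation:
    if not agent_certified:
        return Deviation.ESCALATED  # certification is a hard gate
    if engine_result == agent_result:
        return Deviation.NONE
    return Deviation.CORRECTED      # deterministic validation prevails

# Every non-NONE outcome must be classified and documented
# in the Decision Package.
assert validate_agent_subsumption("granted", "granted", True) is Deviation.NONE
```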
Class C: AI-Integrated Execution
Class C is the most advanced implementation level. Here, the OLRF’s full three-model architecture is operational: deterministic evaluation (Model A), guided evaluation with validated agent subsumption (Model B), and autonomous legal reasoning with retrospective audit (Model C). Class C additionally integrates multi-agent coordination for complex administrative processes, cross-jurisdictional norm evaluation through the federated Registry network, and the complete agent certification lifecycle including monitoring, recertification, and revocation.
Precisely because it is the most ambitious level, Class C must not be treated as the starting point. It presupposes the institutional discipline learned through Class A and Class B. It also presupposes the security controls, audit architecture, certification infrastructure, and coordination arrangements required for AI-assisted public administration at scale. The virtue of the conformance model is that it does not ask authorities to improvise these capacities all at once. It allows them to accumulate them in the right sequence, so that by the time Class C is reached, the legal, technical, and organisational foundations on which it depends already exist.
Class C is also the level at which the sovereignty question (Chapter 14) becomes most acute. An authority operating at Class C with Model C has permitted autonomous AI agents to reason from its law. The certification system (Chapter 10) ensures that those agents are qualified. The retrospective audit ensures that their reasoning is assessed. But the strategic question remains: who provides the foundation models on which those agents are built, and what sovereignty implications follow from that dependency? Class C adoption at scale requires that this question be addressed at the institutional level, not merely at the technical level.[5]
The Implementation Sequence: Four Phases
The roadmap from the current pre-OLRF baseline to widespread Class C implementation is organised in four phases. Each has specific objectives, deliverables, and success criteria. The phases are designed to be pursued in parallel across jurisdictions, with the outputs of each phase creating the conditions for the next, while ensuring that authorities at different stages of institutional readiness can enter the roadmap at the appropriate point without waiting for all participants to synchronise.
Phase 1: Foundation (Years 1 and 2) --- Specification, Tooling, and Pioneer Implementations
The foundational phase has three parallel workstreams. The first is the completion and stabilisation of the OLRF specification to version 1.0 through a structured multi-stakeholder review process (the standardisation strategy is described in Chapter 18). The version 1.0 process is not a technical exercise alone. It is a governance exercise: the process through which the communities that will rely on the specification (public authorities, courts, civil society organisations, and commercial implementers) establish their stake in its quality and their confidence in its durability.
The second workstream is the development of the foundational tooling required for Class A adoption: a Registry reference implementation, a Decision Tree authoring tool accessible to legal professionals without specialist technical training, a validation service that checks Decision Trees for structural correctness and sub-normative linkage completeness, and a Coverage Map generator that guides responsible authorities through the six classifications (including model assignment and agent certification requirement). These tools are not merely technical infrastructure. They are the institutional prerequisites for Class A adoption at scale, ensuring that the barrier to publishing a compliant Decision Tree is a question of legal analysis rather than technical capability.[6]
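The two checks the validation service performs reduce to graph invariants: every referenced node must exist, and every decision node must carry a sub-normative anchor. The node shape below is hypothetical; the real service operates on the OLRF Decision Tree format.

```python
# Sketch of the Class A validation checks: structural correctness
# (no dangling references) and sub-normative linkage completeness.
# The node shape is hypothetical.

def validate_tree(nodes: dict[str, dict]) -> list[str]:
    errors: list[str] = []
    for node_id, node in nodes.items():
        for child in node.get("children", []):
            if child not in nodes:
                errors.append(f"{node_id}: dangling reference to {child}")
        if node.get("kind") == "decision" and not node.get("statute_ref"):
            errors.append(f"{node_id}: missing sub-normative anchor")
    return errors

tree = {
    "root": {"kind": "decision", "statute_ref": "§ 12(1)",
             "children": ["grant", "deny"]},
    "grant": {"kind": "outcome"},
    "deny": {"kind": "outcome"},
}
assert validate_tree(tree) == []  # a conformant tree yields no errors
```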
The third workstream is the identification and support of pioneer implementations: jurisdictions or authorities willing to implement Class A for specific norms, contributing real-world validation of the specification’s fitness for purpose and generating the evidence base that subsequent adopters require. Pioneer implementations should be selected to represent the full range of norm types (social welfare, tax, licensing, enforcement) and the full range of jurisdictional scale, from municipal authorities to national ministries to EU-level agencies. The learning from pioneer implementations feeds directly into the specification refinement process, ensuring that version 1.0 reflects operational experience rather than purely architectural reasoning.
The foundational phase also initiates the ecosystem connection strategy described in Chapter 17: establishing the Connector, Validator, and Certifier patterns through which existing Law-as-Code projects and agentic administration projects can dock into the OLRF infrastructure without abandoning their existing systems.
Phase 2: Consolidation (Years 2 and 3) --- Class B Adoption and Ecosystem Development
The consolidation phase builds on the stable version 1.0 specification and the pioneer implementations to drive Class B adoption across a significant population of automated governance systems. Its primary objective is the establishment of a functioning ecosystem of conformant evaluation engines (open-source reference implementations and commercial implementations) that authorities can deploy to move from Class A publication to Class B operational integration without building evaluation infrastructure from scratch.
This phase requires particular attention to the institutional arrangements for Decision Tree governance: the processes through which responsible authorities manage the lifecycle of their Decision Trees (authoring, review, pre-publication scrutiny, publication, monitoring, amendment, and revocation). These arrangements do not exist in most jurisdictions today, because the normative specifications of automated governance systems are not currently managed as formal legal publications. Creating them requires both institutional design work (defining roles, responsibilities, review processes, and publication procedures) and the development of governance tooling that makes the management of Decision Tree lifecycles tractable for authorities that do not have large technical teams.[7]
The consolidation phase also establishes the agent certification infrastructure (Chapter 10). The first certifying bodies are designated, the first test suites for Model B certification are published, and the first agents are certified for specific normative domains. The certification infrastructure is a critical-path item for Phase 3, because Class C adoption with Model B or C requires a functioning certification system.
Phase 3: Scaling (Years 3 through 5) --- Class C Integration and Network Effects
The scaling phase targets Class C adoption across a critical mass of high-volume automated governance systems: the tax administration, social welfare, and regulatory licensing systems that together account for the largest share of automated citizen-state interactions. Class C adoption at this scale requires the mature AI integration infrastructure that Class B adoption will have developed, the cross-jurisdictional coordination framework that Phase 2 will have established, and the commercial ecosystem of certified agents, evaluation engines, and workflow orchestration tools that the open standard model will have enabled.
Network effects are the central dynamic of the scaling phase. As the population of Class C implementations grows, the value of each additional implementation increases: more Decision Trees in the Registry mean more norms available for AI-assisted processing; more certified agents become available for deployment across normative domains; more cross-jurisdictional workflows become possible as the federated Registry network grows; and the evidentiary base for judicial review, legislative oversight, and civil society scrutiny of automated governance becomes correspondingly richer.
The scaling phase also sees the first systematic assessment of the OLRF’s impact on the pathologies identified in Chapter 1: fragmentation, inconsistency, opacity, and escalating costs. The composite audit trail and the Registry’s population of Coverage Maps provide the evidence base for this assessment.[8]
Phase 4: Normalisation (Year 5 and beyond) --- OLRF as Default Infrastructure
The normalisation phase is not a discrete project with a defined end date. It is the condition toward which the preceding phases are directed: the establishment of OLRF conformance as the default standard for automated governance systems across jurisdictions, so that a new automated governance system that does not publish a Registry-compliant Decision Tree becomes the exception requiring justification rather than the norm requiring no explanation.
Normalisation requires regulatory embedding: the incorporation of OLRF conformance requirements into the procurement rules, digital governance standards, and administrative procedure laws of adopting jurisdictions. Several existing regulatory instruments provide the vehicles for this embedding: the Interoperable Europe Act’s requirements for public sector digital interoperability, the EU AI Act’s technical documentation and human oversight requirements for high-risk AI systems, and national administrative procedure law reforms triggered by the growing deployment of automated administrative decision-making.[9]
Institutional Prerequisites: What Must Be in Place Before Class A
The roadmap assumes institutional prerequisites that are not currently in place in most jurisdictions and that require deliberate action to establish. Three prerequisites are critical-path items.
Legal authority for machine-executable publication is the foundational prerequisite. A responsible authority that publishes a Decision Tree in the Registry is performing a formal act with legal consequences. It is publishing, under its signature, an authoritative specification of how it will apply a statutory norm to individual cases. In most jurisdictions, the existing legal framework for delegated legislation and administrative rulemaking does not explicitly recognise machine-executable publication as a form of such specification, and does not provide a clear legal basis for the authority to publish Decision Trees with binding effect. Establishing this legal authority (whether through legislative amendment, through formal administrative guidance that interprets existing rulemaking authority as extending to machine-executable publication, or through the development of new administrative procedure provisions specifically addressing automated norm application) is the first and most fundamental prerequisite for OLRF adoption.[10]
Institutional capacity for Decision Tree authoring and governance is the second prerequisite. Publishing a conformant Decision Tree requires a combination of legal expertise (to perform the sub-normative analysis that produces the linkage structure) and technical expertise (to express the result of that analysis in the OLRF format with the precision the specification requires). Neither expertise alone is sufficient. The institutional model that best addresses this combination is a dedicated digital legislation unit: a team whose function is the translation of legislative intent into machine-executable form, equipped with both legal and technical expertise. The challenge of building this capacity is not hypothetical. Breidenbach’s two-decade programme at the Centre for Legislation and Digitalisation at the Europa-Universität Viadrina, developing tools for norm visualisation and automated norm implementation, represents the most sustained attempt in the German-speaking world to address exactly this institutional gap. The experience of that programme is directly relevant: the barrier to machine-executable norm specification is not primarily technical. It is the absence of tools that make the legal analysis of sub-normative structure tractable for legal professionals who are not software engineers.[11]
Governance processes for the Decision Tree lifecycle are the third prerequisite. A Decision Tree is not a static artefact. It must be maintained over time: amended when the statutory norm it implements is changed by the legislature, updated when a court ruling identifies an error in its normative derivation, revised when operational experience reveals gaps or miscalibrations, and revoked when the norm it implements is repealed. Managing this lifecycle requires formal governance processes that determine who has authority to amend a published Decision Tree, what review and pre-publication scrutiny a proposed amendment must undergo, how the transition between versions is managed to avoid gaps in coverage, and how citizens, courts, and oversight bodies are notified of changes.
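One way to make that lifecycle enforceable is to treat it as a guarded state machine, so that an amendment cannot bypass pre-publication scrutiny and a published version cannot silently vanish. The states and transitions below are illustrative, not the governance process the specification mandates.

```python
# Illustrative Decision Tree lifecycle as a guarded state machine.
# States and transitions are hypothetical, not the OLRF process.
from enum import Enum

class State(Enum):
    DRAFT = "draft"
    IN_REVIEW = "in_review"
    PUBLISHED = "published"
    SUPERSEDED = "superseded"  # replaced by a later published version
    REVOKED = "revoked"        # the underlying norm was repealed

ALLOWED = {
    State.DRAFT: {State.IN_REVIEW},
    State.IN_REVIEW: {State.DRAFT, State.PUBLISHED},  # scrutiny may send it back
    State.PUBLISHED: {State.SUPERSEDED, State.REVOKED},
    State.SUPERSEDED: set(),
    State.REVOKED: set(),
}

def transition(current: State, target: State) -> State:
    """Refuse any transition the governance process does not allow.
    To avoid coverage gaps, a successor version should reach PUBLISHED
    before its predecessor is marked SUPERSEDED."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target
```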
The Measurement Framework: How Success Is Defined
A roadmap without a measurement framework is a schedule without accountability. The OLRF’s implementation progress must be assessed against concrete, independently verifiable criteria: not merely counted (how many Decision Trees are in the Registry?) but evaluated for quality (are they correctly derived from the statutory text? are their Coverage Maps complete? are their Decision Packages legally adequate?).
The measurement framework operates at three levels. At the adoption level, it tracks the population of Class A, B, and C implementations across jurisdictions, measured as a proportion of total automated determination volume. At the quality level, it assesses the normative correctness of published Decision Trees through structured review processes (pre-publication scrutiny supplemented by post-publication audit sampling). At the impact level, it measures the constitutional outcomes that the OLRF is designed to produce: the rate of successful judicial challenges to automated determinations, the frequency of citizens reporting adequate understanding of decisions affecting them, the consistency of outcomes across comparable cases in different jurisdictions, and the speed with which legislative amendments are reflected in updated Decision Trees.[12]
Five specific indicators provide the operational picture. The conformance certification rate: the proportion of deployed implementations that hold valid conformance certificates at the appropriate class. The vendor diversity index: the number of distinct commercial vendors providing certified implementations in each market segment. The pioneer implementation coverage rate: the proportion of high-volume automated governance systems in adopting jurisdictions that have achieved at least Class A conformance. The judicial utilisation rate: the proportion of legal challenges to automated administrative decisions in which the Decision Package and Registry are referenced in court proceedings. The amendment response time: the average time between a legislative amendment and the corresponding update to affected Decision Trees in the Registry.
These measures, tracked systematically across adopting jurisdictions and published in the ecosystem’s annual governance report, provide the evidence base for the claim that the OLRF delivers on its constitutional promise. They are also the feedback mechanism through which gaps in the specification, the tooling, or the conformance framework are identified and addressed.
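How mechanical these indicators become once the underlying records exist is worth illustrating. The sketch below computes two of the five from hypothetical data; the record shapes are assumptions, not a reporting format the framework prescribes.

```python
# Sketch of two of the five indicators, computed from hypothetical records.
from datetime import date
from statistics import mean

def amendment_response_days(amendments: list[tuple[date, date]]) -> float:
    """Average days between legislative amendment and Registry update."""
    return mean((updated - enacted).days for enacted, updated in amendments)

def judicial_utilisation_rate(challenges: list[dict]) -> float:
    """Share of challenges in which the Decision Package was cited in court."""
    cited = sum(1 for c in challenges if c["decision_package_cited"])
    return cited / len(challenges)

print(amendment_response_days([(date(2027, 1, 1), date(2027, 2, 15))]))  # 45.0
print(judicial_utilisation_rate([{"decision_package_cited": True},
                                 {"decision_package_cited": False}]))    # 0.5
```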
The distance between the current governance environment and the OLRF’s normalisation phase is substantial. It will not be covered quickly, uniformly, or without significant institutional friction. But it is a distance that can be covered, because the architecture is sound, the constitutional case is compelling, the regulatory environment is becoming increasingly demanding of exactly the properties the OLRF provides, and the cost of remaining in the pre-OLRF baseline is rising with every expansion of automated governance into domains where its accountability deficits produce consequences that democratic institutions cannot indefinitely accept.
Footnotes
1. The observation that architectural quality is necessary but insufficient for institutional adoption is well documented in the standards literature. Hanseth, O. and Lyytinen, K., “Design Theory for Dynamic Complexity in Information Infrastructures: The Case of Building Internet”, Journal of Information Technology, Vol. 25, No. 1, 2010, pp. 1 ff., demonstrate that the adoption of information infrastructure depends not on the elegance of the design but on the alignment between the design’s entry points and the installed base of existing practice. The OLRF’s conformance class model is designed to address this alignment directly: Class A requires no replacement of existing systems, only the publication of a normative artefact alongside them.
2. The graduated adoption model draws on the maturity model tradition in software engineering and public administration. See: Andersen, K. V. and Henriksen, H. Z., “E-Government Maturity Models: Extension of the Layne and Lee Model”, Government Information Quarterly, Vol. 23, No. 2, 2006, pp. 236 ff. The OLRF’s conformance classes differ from conventional maturity models in a constitutionally significant respect: each class is not merely a stage on the way to the next, but a constitutionally meaningful achievement in its own right. Class A alone (normative transparency) already satisfies the promulgation requirement that the pre-OLRF baseline fails to meet.
3. The Connector Pattern is described in Chapter 17. For the specific compatibility between the OLRF Decision Tree format and the OpenFisca parameter structure: OpenFisca, “Country Package Documentation”, https://openfisca.org/doc/; for RegelSpraak: Corsius, M. et al., “RegelSpraak: a CNL for Executable Tax Rules Specification”, CNL 2020/21; for the general principle of format-neutral interoperability: W3C, “Data on the Web Best Practices”, 2017, Recommendation 14 (use standard data formats that enable interoperability without requiring format lock-in).
4. The integration of the three-model framework into the conformance class structure means that Class B is not a single level but a range: from Class B with Model A only (deterministic evaluation, no agent involvement) to Class B with Model B (agent subsumption with validation). This range is intentional. It allows authorities to adopt Class B at the level of AI integration that matches their institutional readiness and their constitutional comfort, without being forced into a binary choice between no AI involvement and full agent integration.
5. The sovereignty implications of Class C with Model C are analysed in Chapter 14. For the specific challenge of foundation model dependency: Stanford HAI, “Artificial Intelligence Index Report 2025”, Chapter 1.
6. The Decision Tree authoring tool is the implementation component that most directly determines whether Class A adoption is achievable at scale. See: Breidenbach, S., Was Gesetze sein könnten: Mit Methode zum guten Gesetz, C. H. Beck 2025, Kap. 3 (visualisation of decision logic as prerequisite for legal-professional accessibility).
7. The institutional arrangements for Decision Tree governance are analogous to the lifecycle management processes for delegated legislation. In the German system: Bundesministerium der Justiz, Handbuch der Rechtsförmlichkeit, 3. Aufl., 2008 (procedural requirements for drafting, reviewing, publishing, and amending delegated instruments). The OLRF’s governance processes must achieve the same institutional functions for machine-executable specifications.
8. The measurement of the OLRF’s impact on the pathologies identified in Chapter 1 follows the tradition of evidence-based policy evaluation. See: OECD, “Measuring Regulatory Performance”, OECD Reviews of Regulatory Reform, 2012; for the specific challenge of measuring the quality of automated governance: Yeung, K., “Algorithmic Regulation: A Critical Interrogation”, Regulation and Governance, Vol. 12, No. 4, 2018, pp. 505 ff.
9. Regulation (EU) 2024/903 (Interoperable Europe Act), establishing requirements for digital interoperability of public sector systems across Member States; Regulation (EU) 2024/1689 (AI Act), Arts. 9, 13, 14, 15 (requirements for high-risk AI systems that the OLRF’s architecture is designed to satisfy by construction); for national administrative procedure law reform: §35a VwVfG (Germany); Prell, L., “§35a VwVfG und der ‘vollständig automatisierte Erlass eines Verwaltungsaktes’”, NVwZ 2018, S. 1255 ff.
10. The legal authority question is not merely academic. In the German system, §35a VwVfG authorises fully automated administrative acts only where the applicable norm provides neither discretion nor assessment margins. The OLRF’s Models B and C operate beyond this boundary. A legislative basis for machine-executable publication under Models B and C does not currently exist in German administrative procedure law and would need to be created through amendment of the VwVfG or through sector-specific legislation. See: Berger, A., “Der automatisierte Verwaltungsakt”, NVwZ 2018, S. 1260 ff.
11. Breidenbach, S., Was Gesetze sein könnten: Mit Methode zum guten Gesetz, C. H. Beck, München 2025; Breidenbach, S. and Rath, C., “Digitalisierung der Gesetzgebung und automatisierten Normumsetzung”, ZGDigital/BMJ, ongoing since 2010; Rulemapping Group, www.rulemapping.com (SPRIND-funded since 2024). For the broader institutional challenge: Nationaler Normenkontrollrat, “Monitor Digitale Verwaltung”, 2024, identifying the absence of digital legislation capacity as a structural bottleneck in German administrative modernisation.
12. The measurement framework’s three levels (adoption, quality, impact) follow the evaluation methodology proposed in: OECD, “Measuring the Digital Transformation: A Roadmap for the Future”, OECD Publishing 2019; for the specific challenge of measuring constitutional outcomes of automated governance: Binns, R., “Algorithmic Accountability and Public Reason”, Philosophy and Technology, Vol. 31, 2018, pp. 543 ff.