Document under review: _posts/2020-08-30-cxx-reg-access.md Date: 2026-02-24

A. C++ MMIO Register Access Libraries

Template-based approaches

  • Kormanyos, Real-Time C++: The book that inspired this post. Uses static template parameters for register address, value, and access type. Compile-time constant evaluation eliminates runtime overhead. The blog post’s critique — that Kormanyos’s approach is verbose and error-prone — is the motivating design decision.
    • https://github.com/ckormanyos/real-time-cpp
  • Kvasir: Template metaprogramming library for Cortex-M register access. Distinguishes itself with compile-time register field validation and atomic bit manipulation. Uses apply() to batch multiple field writes into a single register access, reducing read-modify-write cycles. Generates from SVD via its own tooling.
    • https://github.com/kvasir-io/Kvasir
  • cppreg (Sendyne): C++11 register access library. Defines register “packs” (contiguous MMIO regions) with type-safe field access. Enforces access policies (read-only, write-only, read-write) at compile time. Targets Cortex-M but platform-independent. No code generation — register definitions are written manually.
    • https://github.com/sendyne/cppreg
    • https://sendyne.com/cppreg/
  • AllThingsEmbedded: C++17 approach very similar to this blog post. Uses if constexpr for compile-time optimization of field access paths. Blog series walks through the design rationale. Demonstrates the same single-field vs multi-field optimization pattern.
    • https://allthingsembedded.com/post/2019-01-03-arm-cortex-m0-register-access/
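
The template-parameter style described for Real-Time C++ can be sketched roughly as below. This is a minimal illustration, not the book's actual API: the names reg_access, MockBus and MmioBus are hypothetical, and the bus policy parameter is an assumption added here so the sketch can run on a host PC without touching real device addresses.

```cpp
#include <array>
#include <cstdint>

// Hypothetical host-side bus: a small array stands in for device memory.
struct MockBus {
    static std::array<std::uint32_t, 16>& mem() {
        static std::array<std::uint32_t, 16> m{};
        return m;
    }
    static std::uint32_t read(std::uintptr_t addr) { return mem()[addr / 4u]; }
    static void write(std::uintptr_t addr, std::uint32_t v) { mem()[addr / 4u] = v; }
};

// Hypothetical target-side bus: plain volatile loads and stores.
struct MmioBus {
    static std::uint32_t read(std::uintptr_t addr) {
        return *reinterpret_cast<volatile std::uint32_t*>(addr);
    }
    static void write(std::uintptr_t addr, std::uint32_t v) {
        *reinterpret_cast<volatile std::uint32_t*>(addr) = v;
    }
};

// Address and value are template parameters, so every call site spells them
// out as compile-time constants.
template <typename Bus, std::uintptr_t Addr, std::uint32_t Value = 0u>
struct reg_access {
    static void reg_set() { Bus::write(Addr, Value); }
    static void reg_or() { Bus::write(Addr, Bus::read(Addr) | Value); }
    static std::uint32_t reg_get() { return Bus::read(Addr); }
};
```

On a target, a call would look like reg_access<MmioBus, 0x40028000u, 0x1u>::reg_set(). Repeating the address and value at every call site is the verbosity the critique above refers to.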

HAL/framework approaches

  • modm (Modular Object-oriented Development for Microcontrollers): Full platform HAL generated from SVD and vendor data for STM32 and SAM families, among others. Register access is one layer of a larger framework including GPIO, DMA, UART, SPI, etc. Code generation runs through lbuild, which renders Jinja2 templates itself instead of exposing Jinja as a standalone step.
    • https://modm.io/
    • https://github.com/modm-io/modm
  • Genode OS MMIO framework: Mmio::Register_set base class with declarative bitfield definitions using struct Register : Mmio::Register<offset, width>. Used throughout the platform drivers of the Genode OS Framework. Different design point: part of an OS framework rather than a standalone library.
    • https://genode.org/documentation/developer-resources/index
  • Embedded Template Library (ETL): General-purpose C++ library for embedded systems. Includes some register access utilities but focuses more broadly on containers, algorithms, and utilities for resource-constrained environments.
    • https://www.etlcpp.com/

Comparison table

| Library | Standard | Code Gen | SVD Input | Compile-Time Safety | Access Optimization | Scope |
|---|---|---|---|---|---|---|
| Kormanyos | C++11/14 | No | No | Address/value as template params | Compile-time branching | Register access only |
| Kvasir | C++14 | Yes (SVD) | Yes | Field validation, type checking | Batched writes via apply() | Register access + atomic ops |
| cppreg | C++11 | No | No | Access policy enforcement | Single write optimization | Register packs |
| modm | C++20 | Yes (SVD+) | Yes | Full type safety | Platform-specific | Full HAL |
| AllThingsEmbedded | C++17 | No | No | if constexpr branching | Same pattern as blog post | Register access only |
| Blog post | C++17 | Yes (SVD+Jinja) | Yes | if constexpr branching | Single/multi-field optimization | Register access only |

The blog post’s combination of SVD-based Jinja code generation with C++17 if constexpr optimization is uncommon among the libraries surveyed here. Most of them either require manual register definitions (Kormanyos, cppreg, AllThingsEmbedded) or are part of a larger framework (modm, Genode).
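
The "access policy enforcement" entry in the comparison can be illustrated with a small sketch. The names read_only, write_only, read_write and Reg are hypothetical and are not taken from cppreg; the only point shown is that a missing operation becomes a compile error instead of a run-time check.

```cpp
#include <cstdint>

// Each policy exposes only the operations the register permits.
struct read_only {
    static std::uint32_t read(const volatile std::uint32_t& r) { return r; }
};
struct write_only {
    static void write(volatile std::uint32_t& r, std::uint32_t v) { r = v; }
};
struct read_write : read_only, write_only {};

// Thin wrapper around a register reference. Member functions of a class
// template are only instantiated when called, so Reg<read_only>::write()
// fails to compile at the call site because read_only has no write().
template <typename Policy>
class Reg {
public:
    explicit Reg(volatile std::uint32_t& storage) : reg_(storage) {}
    std::uint32_t read() const { return Policy::read(reg_); }
    void write(std::uint32_t v) { Policy::write(reg_, v); }

private:
    volatile std::uint32_t& reg_;
};
```

With this shape, Reg<read_only> status(hw_status); status.write(1); is rejected by the compiler, while status.read() compiles normally (hw_status being some volatile register object).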

B. SVD-Based Code Generation Landscape

C++ generators

  • modm/lbuild: Arguably the most mature C++ SVD-based generator. Processes SVD files plus vendor-specific device data to produce a complete HAL. Generation is driven by a Python-based build system (lbuild) and its module descriptions, not by standalone generic templates.
    • https://github.com/modm-io/modm
  • Kvasir tooling: Generates Kvasir-compatible register definitions from SVD. Tightly coupled to Kvasir’s type system.
    • https://github.com/kvasir-io/Kvasir
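  • svdtools: Tool for modifying/patching SVD files before feeding them to generators (originally written in Python, now also available as a Rust implementation). Addresses the common problem of vendor SVD files containing errors or omissions. It is maintained in the Rust ecosystem, but the patched SVD it produces can be fed to any generator, including C++ ones.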
    • https://github.com/rust-embedded/svdtools

Rust generators (the dominant ecosystem)

  • svd2rust: The de facto standard for SVD-to-code generation. Produces Peripheral Access Crate (PAC) definitions with ownership semantics and closure-based field writes. 1000+ generated PAC crates published on crates.io covering most ARM and RISC-V vendors.
    • https://github.com/rust-embedded/svd2rust
  • chiptool (Embassy): Alternative Rust SVD generator focused on async-first embedded. Generates for the Embassy HAL framework.
    • https://github.com/embassy-rs/chiptool

The blog post’s Jinja approach

The blog post’s use of generic Jinja2 templates for SVD code generation is architecturally distinctive:

  • Flexibility: Jinja templates can produce any output format — C, C++, Rust, documentation, test harnesses. The template is the product, not the generator.
  • Transparency: Templates are readable and modifiable without understanding a code generator’s internals.
  • Trade-off: Less sophisticated than purpose-built generators. No SVD patching, no cross-peripheral deduplication, no vendor-specific workarounds.

Generators such as svd2rust and chiptool embed the output format in procedural code, and modm ties its templates to the lbuild module system. The standalone template-driven approach is closer to how ARM’s own CMSIS tools work.

C. Zero-Cost Abstraction Evidence

The blog post’s proof

The disassembly listing showing template code compiling to movs, ldr, orr.w, str — identical to hand-written C — is the standard proof for zero-cost MMIO abstractions. This is the strongest argument for the C++ approach over C macros or bitfield structs.

Industry consensus

  • Kormanyos provides similar disassembly comparisons in Real-Time C++
  • cppreg documentation includes code size comparisons
  • Kvasir claims zero overhead with benchmarks
  • Rust PACs (svd2rust output) make the same claim with equivalent evidence

Caveats not discussed in the post

  • Debug builds (-O0): Zero-cost abstractions are only zero-cost with optimization enabled. At -O0, template instantiation produces significantly larger code with actual function calls. This is a known pain point — debugging optimized code is harder, but debug builds lose the “zero cost” property.
  • Link-time optimization (LTO): Some optimizations (cross-translation-unit inlining) require LTO. The post’s single-file example doesn’t encounter this, but real projects with registers accessed across files may.
  • Compiler differences: GCC, Clang, and ARM Compiler (armclang) can produce different output for the same template code. The post shows one compiler’s output.
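
One partial mitigation for the -O0 caveat can be sketched as follows: anything stored in a constexpr variable or checked by static_assert is evaluated by the compiler at every optimization level, so only the accessor call itself is left for a debug build to pay for. The names field_mask, kModeMask and set_mode are hypothetical, as is the 2-bit field at offset 4.

```cpp
#include <cstdint>

// Mask for a field of `width` bits starting at bit `offset` (width in 1..32).
constexpr std::uint32_t field_mask(unsigned offset, unsigned width) {
    return (0xFFFFFFFFu >> (32u - width)) << offset;
}

// A constexpr variable must be initialized by a constant expression, so
// kModeMask is a compile-time constant at every -O level.
constexpr std::uint32_t kModeMask = field_mask(4u, 2u);
static_assert(kModeMask == 0x30u, "mask is computed by the compiler");

// At -O0 this stays a real function call, but the mask is already a constant.
inline void set_mode(volatile std::uint32_t& reg, std::uint32_t mode) {
    const std::uint32_t v = reg;                           // one volatile read
    reg = (v & ~kModeMask) | ((mode << 4u) & kModeMask);   // one volatile write
}
```

This does not restore the zero-cost property in debug builds; it only limits what is left to the optimizer.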

D. C++20/23 Improvements Since the Post

The post targets C++17. Several newer language features are relevant:

| Feature | Standard | Relevance |
|---|---|---|
| consteval | C++20 | Guarantees compile-time evaluation; stronger than constexpr for address/offset calculations |
| Concepts | C++20 | Could replace SFINAE for constraining register/field types (e.g. template<RegisterType R>) |
| std::bit_cast | C++20 | Type-safe reinterpretation; potential replacement for reinterpret_cast in some patterns |
| Volatile compound deprecation | C++20 | volatile compound assignment (\|=, &=) deprecated; directly affects read-modify-write patterns |
| constexpr virtual | C++20 | Enables polymorphic register interfaces at compile time |
| static operator() | C++23 | Could simplify functor-based register access patterns |

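The volatile deprecation in C++20 is particularly relevant: the post’s read-modify-write pattern (reg_value = *reinterpret_cast<volatile r_datatype_t*>(...)) performs a separate read and a separate write through volatile pointers, which remains valid. Naive patterns like *reg |= mask were deprecated in C++20; C++23 later reversed the deprecation for the bitwise compound assignments (|=, &=, ^=), so the warning mainly affects code compiled as C++20.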
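
The explicit form that stays clear of the C++20 deprecation can be sketched as below. The helper names set_bits and clear_bits are hypothetical; the point is only that the read and the write are written as two separate volatile accesses instead of one compound assignment.

```cpp
#include <cstdint>

// Equivalent of `reg |= mask`, spelled as one volatile read and one volatile write.
inline void set_bits(volatile std::uint32_t& reg, std::uint32_t mask) {
    const std::uint32_t v = reg;
    reg = v | mask;
}

// Equivalent of `reg &= ~mask`, again with the two accesses made explicit.
inline void clear_bits(volatile std::uint32_t& reg, std::uint32_t mask) {
    const std::uint32_t v = reg;
    reg = v & ~mask;
}
```

Neither form is atomic: an interrupt between the read and the write can still corrupt the update, exactly as with the compound operator.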

E. The Rust Comparison

svd2rust / PAC architecture

Rust’s svd2rust has become the benchmark for SVD-based register access. The architecture is parallel to the blog post’s:

| Concept | Blog post (C++) | svd2rust (Rust) |
|---|---|---|
| Device definition | FPGAIO_dev<BASE_ADDR> | Peripherals::take().FPGAIO |
| Register access | FPGAIO_i.LED.write(val) | p.LED.write(\|w\| w.bits(val)) |
| Field access | FPGAIO_i.LED.LED0.write(1) | p.LED.write(\|w\| w.led0().set_bit()) |
| Base address | Template parameter | Singleton with take() |
| Code generation | Jinja2 templates from SVD | Procedural Rust from SVD |
| Safety model | Compile-time optimization | Ownership + closure-based writes |

Key differences

  • Ownership: Rust’s Peripherals::take() returns Option<Peripherals> — only one caller gets the peripheral set. This prevents aliased register access at the type level. The C++ approach has no equivalent protection.
  • Closure-based writes: reg.write(|w| w.field1().bits(x).field2().set_bit()) batches multiple field writes into a single register write by construction. The C++ approach requires the developer to use the register-level write() for batching.
  • Enumerated values: svd2rust generates enum types for fields with defined values in SVD, providing exhaustive match checking. The blog post’s approach passes raw integers.
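
Batching several field updates into one register access, which the closure-based Rust API gives by construction, can be approximated in C++17 with a variadic helper. This is a sketch under assumptions: Field and write_fields are hypothetical names and do not come from the blog post, Kvasir, or svd2rust.

```cpp
#include <cstdint>

// A field description (bit offset and width) carrying the value to write.
template <unsigned Offset, unsigned Width>
struct Field {
    static_assert(Width >= 1u && Offset + Width <= 32u, "field must fit in 32 bits");
    static constexpr std::uint32_t mask = (0xFFFFFFFFu >> (32u - Width)) << Offset;
    std::uint32_t value;
    constexpr std::uint32_t bits() const { return (value << Offset) & mask; }
};

// Update any number of fields with a single read and a single write.
template <typename... Fields>
void write_fields(volatile std::uint32_t& reg, Fields... fields) {
    constexpr std::uint32_t clear = (Fields::mask | ... | 0u);  // folded at compile time
    const std::uint32_t set = (fields.bits() | ... | 0u);
    const std::uint32_t v = reg;   // one volatile read
    reg = (v & ~clear) | set;      // one volatile write
}
```

With using LED0 = Field<0, 1>; and using MODE = Field<4, 2>;, the call write_fields(reg, LED0{1}, MODE{2}) touches the register once for both fields, where two separate field-level writes would perform two read-modify-write cycles.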

Why this matters

The Rust embedded ecosystem has grown significantly since the blog post was written (2020). svd2rust PACs exist for most ARM and RISC-V vendors. In discussions of MMIO register access, the Rust approach is now the primary point of comparison. The blog post doesn’t mention Rust, which was reasonable in 2020 but is a notable absence by current standards.

F. Alternative C/C++ Approaches

Bitfield structs

The traditional C approach — and what ARM CMSIS headers partially use:

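#include <stdint.h>

/* One bit per LED; the remaining 30 bits are unnamed padding. */
typedef struct {
    uint32_t LED0 : 1;   /* LED 0 on/off */
    uint32_t LED1 : 1;   /* LED 1 on/off */
    uint32_t      : 30;  /* padding up to 32 bits */
} LED_t;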

Pros: Simple, familiar, good IDE support. Cons: Bitfield layout is implementation-defined (not portable), no control over access width, no read-modify-write optimization.

C11 _Generic

Type-generic macros can provide some of the same dispatch:

#define reg_write(reg, val) _Generic((reg), \
    volatile uint32_t*: write32, \
    volatile uint16_t*: write16)(reg, val)

Limited compared to C++ templates — no compile-time field optimization, no type-safe field access.
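
For contrast, the C++ counterpart of this dispatch can be sketched with a single function template. The names identity and reg_write are hypothetical; the width check happens at compile time and an unsupported register type is rejected with a readable message.

```cpp
#include <cstdint>
#include <type_traits>

// Helper that keeps `value` out of template argument deduction, so T is taken
// from the register pointer alone.
template <typename T>
struct identity { using type = T; };

// One template covers every supported width.
template <typename T>
void reg_write(volatile T* reg, typename identity<T>::type value) {
    static_assert(std::is_unsigned<T>::value, "registers are unsigned integers");
    static_assert(sizeof(T) == 2 || sizeof(T) == 4,
                  "only 16- and 32-bit registers are handled in this sketch");
    *reg = value;  // single volatile store of the register's own width
}
```

Calling reg_write on a volatile std::uint16_t* emits a 16-bit store, and passing a pointer to an unsupported type such as a signed integer fails to compile.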

Volatile struct overlay (CMSIS pattern)

#define FPGAIO ((FPGAIO_Type *)0x40028000UL)
FPGAIO->LED = 0x01;

The most common pattern in production embedded C. Simple, debugger-friendly, well-understood. The blog post acknowledges this (“ARM do with CMSIS headers”) and positions the C++ approach as enabling “more optimizations, protect some register accesses, and even use custom instructions.”
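
When the overlay pattern is carried into C++, the struct layout can be pinned down with static_assert so that a mistyped reserved word is caught at compile time. The register map below is a hypothetical layout chosen for illustration (the real FPGAIO_Type is not given in this text), and fpgaio_at is an assumed helper name.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical register block: LED at 0x00, a reserved word, BUTTON at 0x08.
struct FPGAIO_Type {
    volatile std::uint32_t LED;        // offset 0x00
    std::uint32_t          RESERVED0;  // offset 0x04
    volatile std::uint32_t BUTTON;     // offset 0x08
};

// Compile-time checks that the overlay matches the intended offsets.
static_assert(offsetof(FPGAIO_Type, LED) == 0x00, "LED offset");
static_assert(offsetof(FPGAIO_Type, BUTTON) == 0x08, "BUTTON offset");
static_assert(sizeof(FPGAIO_Type) == 0x0C, "block size");

// Overlay the struct on a base address, as the CMSIS macro does.
inline FPGAIO_Type& fpgaio_at(std::uintptr_t base) {
    return *reinterpret_cast<FPGAIO_Type*>(base);
}
```

On a target the call would be fpgaio_at(0x40028000UL).LED = 0x01;. On a host, the same function can be pointed at an ordinary FPGAIO_Type object for testing.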

G. Unique Differentiators

The blog post has several genuine differentiators worth preserving:

  1. Three-tier generated architecture (param/regs/dev): Separates constants, typed register classes, and device composition. Cleaner than monolithic generation.

  2. Template base address with honest tradeoff: The advantage (no stored pointer, pure load/store) and disadvantage (template propagation) are stated directly. Most MMIO library authors avoid discussing the template-propagation cost.

  3. Jinja-based generation: More accessible and modifiable than procedural generators. A developer can read the template and understand the output without learning a generator framework.

  4. if constexpr optimization paths: The three-way branch (single bit, single field, read-modify-write) is clearly motivated and well-documented in code.

  5. Disassembly proof: Showing the actual compiler output is the gold standard for zero-cost claims.
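
A three-way if constexpr branch of the kind listed in item 4 might look like the sketch below. This is an assumed reading of the pattern and not the post's generated code: the name field, the template parameters, and the exact conditions for each branch are hypothetical.

```cpp
#include <cstdint>

template <unsigned Offset, unsigned Width, unsigned RegWidth = 32u>
struct field {
    static_assert(Width >= 1u && Offset + Width <= RegWidth && RegWidth <= 32u,
                  "field must fit inside the register");
    static constexpr std::uint32_t mask = (0xFFFFFFFFu >> (32u - Width)) << Offset;

    static void write(volatile std::uint32_t& reg, std::uint32_t value) {
        if constexpr (Width == RegWidth) {
            // The field is the whole register: plain store, no read needed.
            reg = value;
        } else if constexpr (Width == 1u) {
            // Single bit: set or clear it without shifting the value.
            const std::uint32_t v = reg;
            reg = value ? (v | mask) : (v & ~mask);
        } else {
            // General case: read, clear the field, insert the new value, write.
            const std::uint32_t v = reg;
            reg = (v & ~mask) | ((value << Offset) & mask);
        }
    }
};
```

Only the selected branch is instantiated for a given field, so each call site should compile down to the instruction sequence of that branch alone once optimization is enabled.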

Summary

The blog post occupies a specific position: a C++17 template-based MMIO register access library with SVD code generation via Jinja templates. Its closest peers are AllThingsEmbedded (same if constexpr pattern, no code gen), cppreg (similar type safety, no SVD input), and Kvasir (SVD input, more complex type system). The Jinja-based generation approach is distinctive — most generators embed format in code rather than templates. The zero-cost abstraction proof via disassembly is solid, though the -O0 caveat and LTO considerations are unmentioned. The Rust svd2rust ecosystem has become the dominant comparison point for SVD-based register access since 2020, offering ownership semantics and closure-based writes that the C++ approach lacks. C++20/23 bring relevant improvements (concepts, consteval, volatile deprecation) that could inform a future update but don’t diminish the post’s C++17 contribution.