Extensions

Extensions let you transform the corpus of extracted symbols before any generator runs. A typical use case is rewriting metadata across many symbols at once: backfilling briefs from a naming convention, tagging symbols by group, or marking generated code as "see below" in the output.

Extensions are user-supplied scripts written in JavaScript or Lua. They run between extraction (turning C++ source into a corpus of symbols) and rendering (turning the corpus into output files), so any change they make is visible to every generator.

A worked example: brief from naming convention

Many codebases follow a convention like "any function whose name starts with is_ is a predicate returning true if its name (in plain English) holds." That convention encodes information the documentation could repeat verbatim, so writing the brief on every declaration is busy-work.

An extension walks the corpus, picks every function whose name starts with is_, and writes a brief synthesised from the rest of the identifier:

function transform_corpus(corpus)
    for _, sym in ipairs(corpus.symbols) do
        if sym.kind == "function"
           and sym.name:sub(1, 3) == "is_" then
            local subject = sym.name:sub(4):gsub("_", " ")
            mrdocs.set(sym._id, "doc", {
                brief = {
                    children = {
                        { kind = "text",
                          literal = "Returns true if "
                              .. subject .. "." }
                    }
                }
            })
        end
    end
end

Same shape, JavaScript:

function transform_corpus(corpus) {
    for (var i = 0; i < corpus.symbols.length; ++i) {
        var sym = corpus.symbols[i];
        if (sym.kind === "function" &&
            sym.name.startsWith("is_")) {
            var subject = sym.name.slice(3).replace(/_/g, " ");
            mrdocs.set(sym._id, "doc", {
                brief: {
                    children: [
                        { kind: "text",
                          literal: "Returns true if " + subject + "." }
                    ]
                }
            });
        }
    }
}

After this runs, every is_foo_bar function shows up in the generated docs with the brief "Returns true if foo bar." — no edits to the C++ source. The same shape generalises:

  • deprecation-by-suffix (every …​_legacy function gains a deprecated block);

  • "see-below" marking for symbols matching a glob;

  • tagging symbols by group by stamping a shared extraction value (a dedicated tags field is planned but not available yet);

  • rewriting return types to advertise a concept rather than the concrete coroutine type they actually return (see the lua-set-return-type golden fixture for a working example).
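As an illustration of the "see-below" item, here is a minimal sketch that marks every symbol whose name matches a glob. The glob detail_*, the globToRegExp helper, and the stand-in mrdocs object are assumptions made so the snippet runs outside Mr.Docs; only the mrdocs.set(id, "extraction", "see-below") call reflects the API described on this page.

```javascript
// Stand-in for the host-provided `mrdocs` global so the sketch runs on its own.
// In a real extension, delete this stub: Mr.Docs pre-registers `mrdocs`.
var calls = [];
var mrdocs = {
    set: function (id, field, value) {
        calls.push([id, field, value]);
    }
};

// Hypothetical helper: turn a simple glob ("*" and "?" only) into an
// anchored regular expression.
function globToRegExp(glob) {
    var escaped = glob.replace(/[.+^${}()|[\]\\]/g, "\\$&");
    return new RegExp(
        "^" + escaped.replace(/\*/g, ".*").replace(/\?/g, ".") + "$");
}

// Hypothetical pattern; pick whatever convention your codebase uses.
var SEE_BELOW = globToRegExp("detail_*");

function transform_corpus(corpus) {
    for (var i = 0; i < corpus.symbols.length; ++i) {
        var sym = corpus.symbols[i];
        if (SEE_BELOW.test(sym.name)) {
            mrdocs.set(sym._id, "extraction", "see-below");
        }
    }
}

// Hand-built corpus standing in for the one Mr.Docs would pass.
transform_corpus({
    symbols: [
        { _id: "01", kind: "function", name: "detail_helper" },
        { _id: "02", kind: "function", name: "parse" }
    ]
});
```

Only the first symbol matches the glob, so the stub records a single extraction write.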

File layout

Extension scripts live under <addons>/extensions/, with the .lua or .js extension. See Addons for where addon roots come from (addons, addons-supplemental) and how scripts across multiple roots are aggregated — scripts run in alphabetical order by full path, with the two languages interleaved.
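To make the ordering rule concrete, the sketch below sorts a few script paths with a plain string comparison. The paths are hypothetical, and the exact collation Mr.Docs applies (case handling, path normalisation) is an assumption here. The point is only that the whole path is compared, so the addon root outranks any numeric prefix on the file name, and .lua and .js files interleave.

```javascript
// Hypothetical script paths from two addon roots.
var scripts = [
    "/project/addons/extensions/20-hide.lua",
    "/opt/mrdocs/addons/extensions/10-briefs.js",
    "/project/addons/extensions/10-rename.js",
    "/opt/mrdocs/addons/extensions/30-tags.lua"
];

// Array.prototype.sort without a comparator compares strings by code unit,
// which is used here as a stand-in for "alphabetical by full path".
var order = scripts.slice().sort();

for (var i = 0; i < order.length; ++i) {
    console.log((i + 1) + ". " + order[i]);
}
```

Under this assumption, both scripts under /opt/mrdocs run before either script under /project, regardless of their numeric prefixes.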

The transform_corpus hook

A script extends Mr.Docs by exposing a function named transform_corpus(corpus). Mr.Docs calls it once with a flat read-only view of the corpus. The script inspects symbols and calls mutation functions on the pre-registered mrdocs object to apply changes.

A script that does not define transform_corpus is silently ignored, so an extension file can be empty during development without breaking the build.

// <addons>/extensions/rename.js
function transform_corpus(corpus) {
    for (var i = 0; i < corpus.symbols.length; ++i) {
        var sym = corpus.symbols[i];
        if (sym.kind === "function") {
            mrdocs.set(sym._id, "name", "renamed_" + sym.name);
        }
    }
}

-- <addons>/extensions/rename.lua
function transform_corpus(corpus)
    for _, sym in ipairs(corpus.symbols) do
        if sym.kind == "function" then
            mrdocs.set(sym._id, "name", "renamed_" .. sym.name)
        end
    end
end

The corpus argument

The corpus argument has a single field today:

  • corpus.symbols — an array containing every symbol Mr.Docs extracted.

That is the entire shape of corpus. There is no corpus.namespaces, no corpus.config, no corpus.lookup; scripts that need such queries walk corpus.symbols and filter.

Each entry in corpus.symbols is the same lazy DOM view that Mr.Docs’s built-in Handlebars generators see, with every described member of the underlying symbol type. The fields you’ll reach for most often are _id (the flat base16 string you pass back to mrdocs.set to identify the symbol to act on), kind, and name; deeper navigation (id, doc, loc, params, bases, …​) works exactly as in templates — in particular symbol.id is the recursive Symbol object, identical to what templates see, so symbol.id.name reads the same way in either context. For the full set, see the Handlebars/templates documentation.

In JavaScript, iterate with corpus.symbols.length and corpus.symbols[i], or with for (var s of corpus.symbols).

In Lua, corpus.symbols behaves like a regular Lua sequence: it is 1-indexed, #corpus.symbols is its length, and ipairs and pairs work as expected.
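Because corpus.symbols is the only entry point, queries are ordinary loops. The sketch below shows two small helpers a script might define for itself. The names filterSymbols and indexByName are hypothetical (they are not part of Mr.Docs), and the sample corpus is hand-built, with "record" used as an illustrative kind value.

```javascript
// Return every symbol for which predicate(sym) is truthy.
function filterSymbols(corpus, predicate) {
    var out = [];
    for (var i = 0; i < corpus.symbols.length; ++i) {
        var sym = corpus.symbols[i];
        if (predicate(sym)) {
            out.push(sym);
        }
    }
    return out;
}

// Build a name -> [symbols] index. Several symbols can share an
// unqualified name, so each entry is an array.
function indexByName(corpus) {
    var index = Object.create(null);
    for (var i = 0; i < corpus.symbols.length; ++i) {
        var sym = corpus.symbols[i];
        if (!index[sym.name]) {
            index[sym.name] = [];
        }
        index[sym.name].push(sym);
    }
    return index;
}

// Hand-built stand-in for the corpus Mr.Docs would pass in.
var sampleCorpus = {
    symbols: [
        { _id: "01", kind: "function", name: "parse" },
        { _id: "02", kind: "record", name: "parser" },
        { _id: "03", kind: "function", name: "parse" }
    ]
};

var functions = filterSymbols(sampleCorpus, function (s) {
    return s.kind === "function";
});
var byName = indexByName(sampleCorpus);
```

Building the index once at the top of transform_corpus keeps later lookups cheap when a script needs to resolve many names.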

The mrdocs API

Mutations go through the pre-registered mrdocs global.

mrdocs.set(symbol_id, field, value)

Assign one allowlisted field of a symbol. The function dispatches through reflection, but the set of fields scripts may write is intentionally curated: extensions cannot, for example, change a symbol’s kind, re-parent it, or rewrite its structural collections, because doing so would break invariants the rest of the corpus relies on.

  • symbol_id: the _id string read from a symbol in corpus.symbols.

  • field: one of the allowlisted names below (camelCase, matching the read view).

  • value: the new value. Supported value types are strings, booleans, enumerator names (kebab-case strings), null (to clear an optional field), arrays (the array replaces the existing vector<T> field wholesale — there is no in-place edit of individual elements), objects (assigned key-wise to a described struct field), and objects with a kind selector for a polymorphic base (the kind picks the concrete derived class registered through MRDOCS_DESCRIBE_KINDS, and remaining keys are forwarded to that class).

The currently allowlisted fields are listed below. This table is generated at build time from src/lib/Extensions/AllowedFields.json, which also feeds the runtime allowlist consumed by mrdocs.set, so the table and the gate cannot drift.

| Field | Type | Description |
|---|---|---|
| name | string | The unqualified symbol name. |
| extraction | enum | Extraction mode: one of regular, see-below, implementation-defined, dependency. |
| isCopyFromInherited | bool | Whether the symbol was generated by base-member inheritance. |
| loc | struct | Source location information. |
| doc | optional struct | The full doc-comment tree. Pass null to clear, or a partial object to overwrite individual fields (brief, returns, params, …). Brief text is rewritten by passing { brief: { children: [{ kind: "text", literal: "…" }] } }; literal is the DOM key for text inlines. See the Handlebars reference for the rest of the shape. |
| returnType | polymorphic Type | A function’s return type. The kind selector picks a concrete Type variant; remaining keys are forwarded to that variant. TypeKind is the one polymorphic base whose kind values come from toString(TypeKind) (e.g., lvalue-reference) rather than from the kebab-case of the enumerator (e.g., l-value-reference as seen in the XML writer). |
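The following sketch shows one write per value shape, using only fields and values named in the table above. The stand-in mrdocs object and the demoWrites function are assumptions added so the snippet runs outside Mr.Docs. loc and returnType are left out because their sub-fields are not listed on this page.

```javascript
// Stand-in for the host-provided `mrdocs` global; remove in a real extension.
var calls = [];
var mrdocs = {
    set: function (id, field, value) {
        calls.push({ id: id, field: field, value: value });
    }
};

// Hypothetical helper that issues one write per supported value shape.
function demoWrites(id) {
    // String field.
    mrdocs.set(id, "name", "display_name");

    // Enum field: the value is the kebab-case enumerator name.
    mrdocs.set(id, "extraction", "implementation-defined");

    // Boolean field.
    mrdocs.set(id, "isCopyFromInherited", false);

    // Optional field: null clears it.
    mrdocs.set(id, "doc", null);

    // Struct field: a partial object overwrites only the keys it names.
    mrdocs.set(id, "doc", {
        brief: {
            children: [
                { kind: "text", literal: "Short summary." }
            ]
        }
    });
}

demoWrites("01");
```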

The setter validates its arguments and raises an error on misuse: unknown symbol id, field not on the allowlist, type mismatch (for example, a non-string passed to name), an enumerator name that does not exist on the field’s enum, or a kind tag that does not name a derived class registered for the polymorphic base. An uncaught error inside an extension aborts the build with the script’s path and the error message.
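If a script would rather skip a bad write than abort the build, it can catch the error itself. The sketch below assumes the error surfaces in JavaScript as a thrown exception that try/catch can intercept. The trySet helper is hypothetical, and the stand-in mrdocs object only imitates the allowlist check described above.

```javascript
// Stand-in for the host-provided `mrdocs` global. It imitates one documented
// failure mode (field not on the allowlist); remove it in a real extension.
var ALLOWED = ["name", "extraction", "isCopyFromInherited",
               "loc", "doc", "returnType"];
var mrdocs = {
    set: function (id, field, value) {
        if (ALLOWED.indexOf(field) < 0) {
            throw new Error("field is not user-settable: " + field);
        }
    }
};

// Hypothetical wrapper: attempt the write, record the failure instead of
// letting it propagate, and report whether the write went through.
var failures = [];
function trySet(id, field, value) {
    try {
        mrdocs.set(id, field, value);
        return true;
    } catch (e) {
        failures.push(field + ": " + e.message);
        return false;
    }
}

var renamed = trySet("01", "name", "parse_text");   // allowlisted field
var rekinded = trySet("01", "kind", "variable");    // rejected by the gate
```

Swallowing errors hides mistakes, so a wrapper like this is best reserved for writes that are genuinely optional; letting the error propagate keeps the default fail-fast behaviour.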

The allowlist grows as concrete use cases come up. The type machinery covers strings, booleans, described enums, Optional<T>, vector<T>, described structs, and Polymorphic<T> for any base whose hierarchy was registered with MRDOCS_DESCRIBE_KINDS.

Lifecycle

Extensions run between corpus finalization and the first generator invocation. The order is:

  1. Mr.Docs walks the source files and extracts a corpus of symbols.

  2. Built-in finalizers post-process the corpus (for example, sorting members and resolving inheritance).

  3. Extensions run, in alphabetical order by full path.

  4. The selected generator renders the (possibly mutated) corpus.

Because step 3 happens before step 4, a mutation made by an extension is visible to every output format, not just one.

Invariants and operations

The extension surface today is deliberately narrow. This section spells out what that means: which corpus properties the rest of Mr.Docs depends on, what scripts can do today without violating them, and where the boundary will move next.

Corpus invariants

The corpus passed to extensions has already been finalized. The finalizers and the generators that read the corpus afterwards rely on a handful of structural properties:

  • Every SymbolID is unique and identifies exactly one symbol. Lookups (cross-references, @ref, derived-class lists, …​) resolve through these IDs and must find a live symbol on the other end.

  • A symbol’s kind is fixed (a FunctionSymbol stays a function, a RecordSymbol stays a record). Generators dispatch on kind to pick the right partial; finalizers (overload merging, base-class inheritance, …​) assume their kind-typed inputs.

  • Parent/child links are consistent in both directions. If symbol X lists Y as a member of one of its tranches, Y.Parent must point at X; if Y.Parent is X, then X must list Y.

  • Structural collections are coherent with the records they describe. The RecordInterface of a class lists the same members the class actually has; the Specializations and DeductionGuides back-pointers populated by finalizers reference real symbols of the right kind.

Breaking any of these is what the docs mean by "corpus invariant violation". A generator hitting an inconsistent corpus does not necessarily crash; it might silently render the wrong cross-link or omit a member. The intent of the extension surface is to make those classes of bug unreachable from a script.

Within-symbol guarantees (what’s safe today)

mrdocs.set is the entire write surface, and it guards every invariant above:

  • Symbol kind never changes. kind is not in the allowlist; attempting to write it fails with "field is not user-settable".

  • Symbol identity never changes. id is not in the allowlist; the symbol you address by _id is always the one you mutate.

  • Parent/child links and structural collections never change. Parent, the RecordInterface, base and derived lists, member tranches, Specializations, DeductionGuides, and similar fields are all off the allowlist by design.

  • Kind never escapes its hierarchy on polymorphic writes. The Polymorphic<T> write path requires a kind: selector that names a derived class registered for the polymorphic base; passing an unrelated kind is a clean error, not an unsafe cast.

  • Off-shape writes are rejected. Setting a string field with a boolean, an enum with an unknown enumerator name, or a struct field with an unknown sub-field all return errors before any mutation happens.

The fields that are writable today are leaf, presentation-layer properties: a symbol’s name, extraction, doc-comment tree, source location, return type, and the "inherited from base" flag. None of them can break a finalizer invariant.

Useful patterns today

Even with that narrow surface, the things scripts can do today cover a lot of practical ground:

  • Backfill documentation by convention. Rewrite doc.brief (or doc.params, doc.returns, …​) for every symbol matching a pattern — see the "worked example" above.

  • Rename for presentation. Change a symbol’s display name without renaming it in the source. Useful for cleaning up internal-prefixed identifiers in the rendered docs.

  • Hide symbols from output. Set extraction to dependency or implementation-defined to drop a symbol from the regular output (or push it into a "see-below" section, depending on the generator).

  • Tag symbols by group. Stamp a uniform extraction value on a set of symbols matching some predicate, then let the generator’s partials use that tag to lay them out.

  • Rewrite return types to expose a concept rather than the concrete coroutine/transport type. The lua-set-return-type fixture shows this end to end.
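Two of these patterns, renaming for presentation and hiding symbols, can be sketched in a few lines. The impl_ and detail_ prefixes are hypothetical conventions, and the stand-in mrdocs object exists only so the snippet runs outside Mr.Docs.

```javascript
// Stand-in for the host-provided `mrdocs` global; remove in a real extension.
var calls = [];
var mrdocs = {
    set: function (id, field, value) {
        calls.push([id, field, value]);
    }
};

// Hypothetical naming conventions.
var HIDDEN_PREFIX = "detail_";
var INTERNAL_PREFIX = "impl_";

function transform_corpus(corpus) {
    for (var i = 0; i < corpus.symbols.length; ++i) {
        var sym = corpus.symbols[i];
        if (sym.name.indexOf(HIDDEN_PREFIX) === 0) {
            // Hide: a non-regular extraction mode keeps the symbol in the
            // corpus but suppresses it from the regular output.
            mrdocs.set(sym._id, "extraction", "implementation-defined");
        } else if (sym.name.indexOf(INTERNAL_PREFIX) === 0) {
            // Rename for presentation: drop the internal prefix.
            mrdocs.set(sym._id, "name",
                       sym.name.slice(INTERNAL_PREFIX.length));
        }
    }
}

// Hand-built corpus standing in for the one Mr.Docs would pass.
transform_corpus({
    symbols: [
        { _id: "01", kind: "function", name: "detail_buffer" },
        { _id: "02", kind: "function", name: "impl_parse" },
        { _id: "03", kind: "function", name: "format" }
    ]
});
```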

What scripts cannot do today (and why):

  • Add symbols. New symbols would need a fresh SymbolID, a real parent, a place in the parent’s tranches, and possibly cross-links back from other symbols. Each of those is invariant-bearing and there is no syntax for declaring them from a script yet.

  • Remove symbols. Outright removal would leave dangling cross-references (every base list, derived list, Specializations, @ref and so on that targets the removed symbol). The closest thing today is making extraction non-regular, which keeps the symbol in the corpus but suppresses it from the rendered output.

  • Merge symbols. Re-ID, re-parent, re-link — not yet expressible.

  • Move symbols across parents. Same as merge: it would require rewriting structural collections on both sides.

  • Change kinds. A symbol cannot be turned from a function into a variable; the rest of the pipeline assumes the kind it had at finalization.

Per-operation outlook

The table below records the current status and the rough plan for each structural operation. None of the "Not yet" or "Not supported" operations are part of this release.

| Operation | Status today | Why not yet / how it might land |
|---|---|---|
| Mutate writable field | Supported via mrdocs.set | This is the entire current surface. |
| Add a custom data tag | Not yet | A separate "tags" bag on each symbol (written by scripts, never read by the C++ core) is the lowest-risk extension since it carries no invariant. Templates would read it through the same DOM they already use. This is the next planned addition. |
| Hide a symbol | Supported via extraction = non-regular | Works today; no new mechanism needed. |
| Remove a symbol | Not supported | Requires sweeping every cross-reference to it (base lists, derived lists, Specializations, Overloads, …). The corpus would have to be re-finalized post-script to re-validate; not yet wired. |
| Add a symbol | Not supported | Needs an ID-allocation API, a parent slot, and a kind-checked builder. The custom-data-bag pattern covers many of the cases where users currently ask for "add a symbol" without the structural risk. |
| Merge symbols | Not supported | Strict superset of add+remove. Would require re-ID, re-link, and re-finalization. |
| Change a symbol’s kind | Not supported | The kind is encoded in the C++ type of the symbol; scripts cannot reach across kinds. The standard answer is "create a new symbol of the right kind", which is the unsupported case above. |
| Create a new C++-side data type | Not supported by extensions | For example, niebloid support would today be added on the C++ side. The extension layer is not a metaprogramming layer over the corpus shape; it is a curated mutation surface over an already-extracted shape. |

The project currently sits at the strict end of the trade-off: scripts make no structural changes. The other end, "anything goes, re-validate after", needs a post-extension finalization phase that re-establishes every invariant, and that machinery does not exist yet. The custom-data-bag addition is the natural next step because it loosens the surface without weakening any invariant.

Enabling and disabling extensions

There is no per-script enable/disable flag in the configuration today. Any .lua or .js file found under an addon’s extensions/ directory runs unconditionally on every Mr.Docs invocation that includes that addon root.

If you want a script to stop running, the practical options today are:

  • Move the file out of the extensions/ directory.

  • Rename it so it no longer matches .lua / .js.

  • Use a different addons-supplemental list in the configuration for the build where you don’t want the script to run.

A finer-grained enable/disable mechanism (a config key listing which extensions to load, or a script-side opt-out) is on the same roadmap as the registration-based extension API; see issue #1210.

Stability

The script surface and the C++ types it reflects evolve together. The contract scripts can rely on:

  • New C++ fields appear in the script DOM automatically once they are described with MRDOCS_DESCRIBE_STRUCT. Scripts that only read them gain visibility on the next Mr.Docs release without any script-side change.

  • New writable fields require an allowlist addition. The mrdocs.set allowlist is curated by Mr.Docs maintainers; a C++ field becoming writable is a deliberate decision, not an automatic consequence of being described. The list grows as concrete use cases come up.

  • New mrdocs.* functions are additive. A future mrdocs.transformFoo(…) does not break scripts that don’t call it.

  • Breakage only on allowlisted rename or removal. Renaming a C++ field that is in the allowlist (e.g., returnType to result) would break scripts that write to it. The allowlist gives Mr.Docs a precise list of fields to keep stable or alias when refactoring.

In other words: read access tracks the C++ types automatically; write access only changes by deliberate maintainer decision.

Design rationale

The shape of this extension API is intentional. Three choices stand out and have alternatives worth naming.

One gated setter, not many helpers

mrdocs.set is a single reflection-driven function dispatched by field name and gated by an allowlist. Four shapes were considered:

  1. (A) Domain-specific helpers — mrdocs.rename, mrdocs.deprecate, one function per intent. Stable and readable, but every helper is hand-written, has to be kept in sync with the C++ types, and the surface grows linearly with use cases.

  2. (B) Stable script-side schema independent of C++ names. A parallel schema where script-facing field names are decoupled from C++ identifiers. Scripts don’t break when C++ members are renamed; the cost is an extra schema to author and maintain alongside the C++ types, plus a translation layer.

  3. (C) Direct DOM manipulation. Mutable corpus, scripts assign fields directly via sym.field = value. The most "natural" shape for script authors, but there is no allowlist gate: nothing stops a script from breaking corpus invariants by changing a symbol’s kind, re-parenting it, or mutating structural collections.

  4. (D) Reflection-driven generic setter, which is what this ships. One function (mrdocs.set(id, field, value)); dispatch by field name; an allowlist controls reachable fields; reflective sub-dispatch handles nested and polymorphic values. The surface stays in sync with the C++ types for free, since the contract is "the normalized C++ member name is the script API name."

(D) was picked because it auto-tracks C++ changes (read side stays in sync without code), keeps the write surface narrow (allowlist), and doesn't require a parallel schema (cost of (B)) or hand-written domain helpers (cost of (A)). The trade-off is real: renaming an allowlisted C++ member would break scripts. Aliases can mitigate that when the time comes (see also the Stability section above).
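The idea behind an allowlist-gated setter can be modelled in a few lines of JavaScript. This is a toy model for intuition only, not the C++ implementation: the ALLOWLIST table, the makeGatedSetter function, and the in-memory symbol store are all hypothetical, and only three of the writable fields are modelled.

```javascript
var EXTRACTION_MODES = ["regular", "see-below",
                        "implementation-defined", "dependency"];

// Toy allowlist: field name -> validator for the value.
var ALLOWLIST = {
    name: function (v) { return typeof v === "string"; },
    isCopyFromInherited: function (v) { return typeof v === "boolean"; },
    extraction: function (v) { return EXTRACTION_MODES.indexOf(v) >= 0; }
};

// Build a setter over an id -> symbol store. Every check runs before the
// assignment, so a rejected write leaves the symbol untouched.
function makeGatedSetter(symbolsById) {
    var has = Object.prototype.hasOwnProperty;
    return function set(id, field, value) {
        if (!has.call(symbolsById, id)) {
            throw new Error("unknown symbol id: " + id);
        }
        if (!has.call(ALLOWLIST, field)) {
            throw new Error("field is not user-settable: " + field);
        }
        if (!ALLOWLIST[field](value)) {
            throw new Error("invalid value for field: " + field);
        }
        symbolsById[id][field] = value;
    };
}

// Usage: one accepted write, one rejected write.
var store = {
    "01": { kind: "function", name: "parse", extraction: "regular" }
};
var set = makeGatedSetter(store);

set("01", "name", "parse_text");

var rejected = null;
try {
    set("01", "kind", "variable");
} catch (e) {
    rejected = e.message;
}
```

Adding a writable field in this model means adding one entry to the table, which mirrors why the write surface can stay narrow without a helper function per use case.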

Reserved function name, not registration

A script announces itself by defining a function named transform_corpus. Mr.Docs calls it on every loaded extension. The classic alternative is registration: the script body calls into Mr.Docs to attach handlers (Darktable’s Lua plugins, GDB’s Python API, LLVM passes, …​).

The reserved-name pattern is the simplest thing that works for the current use cases. Its limitation is that an extension can only do one thing: there is no obvious place to attach helpers, secondary callbacks, or per-symbol hooks without inventing additional reserved names.
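Within that limitation, one script can still bundle several independent steps by dispatching from its single transform_corpus to local functions. The sketch below is one possible arrangement: the pass names, the PASSES array, and the detail_ prefix are hypothetical, and the stand-in mrdocs object exists only so the snippet runs outside Mr.Docs. The passes are written so that none depends on another's writes, since this page does not say whether the read view reflects mutations made earlier in the same call.

```javascript
// Stand-in for the host-provided `mrdocs` global; remove in a real extension.
var calls = [];
var mrdocs = {
    set: function (id, field, value) {
        calls.push([id, field, value]);
    }
};

// Pass 1: backfill briefs for is_* predicate functions.
function backfillPredicateBrief(sym) {
    if (sym.kind !== "function" || sym.name.indexOf("is_") !== 0) {
        return;
    }
    var subject = sym.name.slice(3).replace(/_/g, " ");
    mrdocs.set(sym._id, "doc", {
        brief: {
            children: [
                { kind: "text",
                  literal: "Returns true if " + subject + "." }
            ]
        }
    });
}

// Pass 2: mark detail_* symbols as see-below.
function markDetailSeeBelow(sym) {
    if (sym.name.indexOf("detail_") === 0) {
        mrdocs.set(sym._id, "extraction", "see-below");
    }
}

var PASSES = [backfillPredicateBrief, markDetailSeeBelow];

// The single reserved entry point runs every pass over every symbol.
function transform_corpus(corpus) {
    for (var i = 0; i < corpus.symbols.length; ++i) {
        for (var p = 0; p < PASSES.length; ++p) {
            PASSES[p](corpus.symbols[i]);
        }
    }
}

// Hand-built corpus standing in for the one Mr.Docs would pass.
transform_corpus({
    symbols: [
        { _id: "01", kind: "function", name: "is_empty" },
        { _id: "02", kind: "function", name: "detail_is_ready" }
    ]
});
```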

We’re shipping the reserved-name pattern deliberately, and tracking the larger design question — when (and how) to climb to a registration-based API — in issue #1210.

Asymmetry with the C++ side

The read view scripts see (corpus.symbols[i].name, symbol.doc.brief, …​) is the same DOM the Handlebars generators already render from. It is not custom for scripts. Only the write surface (mrdocs.set) is new, and it is deliberately narrow: dynamically typed, sandboxed, and routed through a single gated function rather than allowing direct assignment.

The asymmetry is by design: C++ wants strong types and refactor safety; scripts want a small, well-defined surface that survives refactors of the underlying types.