Design Notes
AST Traversal
During the AST traversal stage, the complete AST (generated by the clang frontend)
is walked beginning with the root TranslationUnitDecl node. It is during this
stage that USRs (Unified Symbol Resolution strings) are generated and hashed with
SHA1 to form the 160-bit SymbolID for an entity. With the exception of built-in
types, all entities referenced in the corpus are traversed and assigned a SymbolID,
including those from the standard library. This is necessary to generate the
full interface for user-defined types.
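To make the hashing step concrete, the following is a minimal, self-contained sketch of turning a symbol string into a 160-bit identifier. It is not the MrDocs implementation: the SymbolID alias, the sha1 and toHex helpers, and the hand-written SHA-1 routine are all assumptions made for illustration, and a real tool would more likely hash the USR with an existing library routine.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the 160-bit SymbolID (20 bytes).
using SymbolID = std::array<std::uint8_t, 20>;

static std::uint32_t rotl(std::uint32_t x, int n) {
    return (x << n) | (x >> (32 - n));
}

// Plain SHA-1 over a byte string; `usr` would be the USR of an entity.
SymbolID sha1(const std::string& usr) {
    std::uint32_t h[5] = {0x67452301u, 0xEFCDAB89u, 0x98BADCFEu,
                          0x10325476u, 0xC3D2E1F0u};
    // Padding: 0x80, zeros up to 56 mod 64, then the bit length (big-endian).
    std::vector<std::uint8_t> msg(usr.begin(), usr.end());
    const std::uint64_t bits = static_cast<std::uint64_t>(usr.size()) * 8;
    msg.push_back(0x80);
    while (msg.size() % 64 != 56) msg.push_back(0);
    for (int i = 7; i >= 0; --i)
        msg.push_back(static_cast<std::uint8_t>(bits >> (8 * i)));
    assert(msg.size() % 64 == 0);

    for (std::size_t off = 0; off < msg.size(); off += 64) {
        std::uint32_t w[80];
        for (int i = 0; i < 16; ++i)
            w[i] = (std::uint32_t(msg[off + 4 * i]) << 24) |
                   (std::uint32_t(msg[off + 4 * i + 1]) << 16) |
                   (std::uint32_t(msg[off + 4 * i + 2]) << 8) |
                    std::uint32_t(msg[off + 4 * i + 3]);
        for (int i = 16; i < 80; ++i)
            w[i] = rotl(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1);

        std::uint32_t a = h[0], b = h[1], c = h[2], d = h[3], e = h[4];
        for (int i = 0; i < 80; ++i) {
            std::uint32_t f, k;
            if (i < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999u; }
            else if (i < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1u; }
            else if (i < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDCu; }
            else             { f = b ^ c ^ d;                   k = 0xCA62C1D6u; }
            const std::uint32_t t = rotl(a, 5) + f + e + k + w[i];
            e = d; d = c; c = rotl(b, 30); b = a; a = t;
        }
        h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;
    }

    // Serialize the five 32-bit words big-endian into 20 bytes.
    SymbolID id{};
    for (int i = 0; i < 5; ++i)
        for (int j = 0; j < 4; ++j)
            id[4 * i + j] = static_cast<std::uint8_t>(h[i] >> (24 - 8 * j));
    return id;
}

// Lowercase hexadecimal rendering, handy for logs and file names.
std::string toHex(const SymbolID& id) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (std::uint8_t byte : id) {
        out.push_back(digits[byte >> 4]);
        out.push_back(digits[byte & 0x0F]);
    }
    return out;
}
```

Because the identifier depends only on the symbol string, two translation units that see the same entity compute the same SymbolID independently, with no coordination between them.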
Bitcode
AST traversal is performed in parallel on a per-translation-unit basis.
To maximize the size of the code base MrDocs is capable of processing, Info
types generated during traversal are serialized to a compressed bitcode representation.
Once AST traversal is complete for all translation units, the bitcode is deserialized
back into Info types, and then merged to form the corpus. The merging step is
necessary because there may be multiple identical definitions of the same entity
(e.g. for class types, templates, inline functions, etc.), as well as functions
declared in one translation unit and defined in another.
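The merge step can be sketched as follows. This is a simplified illustration under stated assumptions, not the MrDocs data model: FunctionInfo, Location, merge, and mergeAll are hypothetical names, SymbolID is reduced to a string, and serialization to and from bitcode is left out entirely.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the 160-bit hash.
using SymbolID = std::string;

struct Location {
    std::string file;
    unsigned line = 0;
    bool operator==(const Location& other) const {
        return file == other.file && line == other.line;
    }
};

// Hypothetical record for one function as seen by one translation unit.
struct FunctionInfo {
    SymbolID id;
    std::string name;
    std::optional<Location> definition;  // at most one survives merging
    std::vector<Location> declarations;  // every distinct declaration seen
};

// Fold `other` into `into`; both must describe the same entity.
void merge(FunctionInfo& into, FunctionInfo&& other) {
    assert(into.id == other.id);
    if (!into.definition)
        into.definition = std::move(other.definition);
    for (auto& loc : other.declarations) {
        bool seen = false;
        for (const auto& have : into.declarations)
            if (have == loc) { seen = true; break; }
        if (!seen)
            into.declarations.push_back(std::move(loc));
    }
}

// Combine the results of all translation units into one map keyed by SymbolID.
std::map<SymbolID, FunctionInfo> mergeAll(std::vector<FunctionInfo> perUnit) {
    std::map<SymbolID, FunctionInfo> corpus;
    for (auto& info : perUnit) {
        auto it = corpus.find(info.id);
        if (it == corpus.end()) {
            SymbolID key = info.id;
            corpus.emplace(std::move(key), std::move(info));
        } else {
            merge(it->second, std::move(info));
        }
    }
    return corpus;
}
```

In this sketch, a function declared in a header and defined in one source file ends up as a single entry: the declaration seen by every translation unit is stored once, and the definition comes from whichever unit provided it.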
The Corpus
After AST traversal and Info merging, the result is stored as a map of Info objects
indexed by their respective SymbolIDs. Documentation generators may traverse
this structure by calling Corpus::traverse with a Corpus::Visitor-derived
visitor and the SymbolID of the entity to visit (e.g. the global namespace).
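A generator built on this pattern might look like the sketch below. Only the names Corpus, Corpus::Visitor, Corpus::traverse, Info, and SymbolID come from the text above; the member functions, their signatures, the members field, and the NameCollector visitor are assumptions for illustration and may not match the real interface.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the 160-bit hash.
using SymbolID = std::string;

// Hypothetical, heavily reduced Info: a name plus the IDs of its members.
struct Info {
    SymbolID id;
    std::string name;
    std::vector<SymbolID> members;
};

class Corpus {
public:
    struct Visitor {
        virtual ~Visitor() = default;
        // Return false to skip the members of `info` (assumed convention).
        virtual bool visit(const Info& info) = 0;
    };

    void insert(Info info) {
        assert(!info.id.empty());
        SymbolID key = info.id;
        infos_.insert_or_assign(std::move(key), std::move(info));
    }

    // Visit the entity `id`, then its members, depth-first.
    void traverse(Visitor& visitor, const SymbolID& id) const {
        auto it = infos_.find(id);
        if (it == infos_.end())
            return;  // unknown IDs are ignored in this sketch
        if (!visitor.visit(it->second))
            return;
        for (const auto& member : it->second.members)
            traverse(visitor, member);
    }

private:
    std::map<SymbolID, Info> infos_;  // the map of Info indexed by SymbolID
};

// Example visitor: records the name of every entity it reaches.
struct NameCollector : Corpus::Visitor {
    std::vector<std::string> names;
    bool visit(const Info& info) override {
        names.push_back(info.name);
        return true;
    }
};
```

Starting the traversal at the global namespace reaches every entity that is transitively a member of it, while passing a different SymbolID restricts the walk to that subtree.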
Namespaces
Namespaces do not have a source location. This is because a namespace can be reopened in any number of places, so no single location describes it. We probably don’t want to store any javadocs for namespaces either.
Paths
The AST visitor and metadata all use forward slashes to represent file pathnames, even on Windows. This is so the generated reference documentation does not vary based on the platform.
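The normalization described above can be as small as the sketch below. The function name toForwardSlashes is hypothetical, and the sketch assumes every backslash in the input is a Windows directory separator, which would not hold for a POSIX file name that legitimately contains one.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Rewrite Windows-style separators so paths compare and print the same
// way on every platform.
std::string toForwardSlashes(std::string path) {
    std::replace(path.begin(), path.end(), '\\', '/');
    assert(path.find('\\') == std::string::npos);
    return path;
}
```

Applying the conversion once, at the point where a path enters the metadata, keeps the rest of the pipeline free of platform checks.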
Exceptions
Errors thrown by the program should always have type Exception. Objects
of this type are capable of transporting an Error object. This is important
for the scripting to work; exceptions are used to propagate errors from
library code to scripts and back to the invoking code. For exceptional cases,
these thrown exceptions should be uncaught. The tool installs an uncaught exception
handler that prints a stack trace and exits the process immediately.
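The arrangement above might be sketched like this. Only the type names Exception and Error come from the text; their members, the requirePositive example function, and the handler functions are assumptions for illustration. The handler here prints just the error message, since standard C++ offers no portable stack trace facility before C++23.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <exception>
#include <string>
#include <utility>

// Hypothetical Error: a value type carrying a failure description.
class Error {
public:
    explicit Error(std::string message) : message_(std::move(message)) {
        assert(!message_.empty());
    }
    const std::string& message() const noexcept { return message_; }

private:
    std::string message_;
};

// Hypothetical Exception: the one type that is thrown; it transports an Error.
class Exception : public std::exception {
public:
    explicit Exception(Error error) : error_(std::move(error)) {}
    const Error& error() const noexcept { return error_; }
    const char* what() const noexcept override {
        return error_.message().c_str();
    }

private:
    Error error_;
};

// Library-side code reports failure by throwing Exception.
int requirePositive(int value) {
    if (value <= 0)
        throw Exception(Error("value must be positive"));
    return value;
}

// Terminate handler: report the in-flight exception, then exit immediately.
[[noreturn]] void onUncaught() noexcept {
    if (auto pending = std::current_exception()) {
        try {
            std::rethrow_exception(pending);
        } catch (const Exception& e) {
            std::fprintf(stderr, "fatal: %s\n", e.what());
        } catch (...) {
            std::fputs("fatal: unknown exception\n", stderr);
        }
    }
    std::_Exit(EXIT_FAILURE);
}

// Call once at startup so that uncaught exceptions reach onUncaught.
void installUncaughtHandler() {
    std::set_terminate(onUncaught);
}
```

Code that can recover catches Exception and reads the transported Error; code that cannot simply lets the exception escape, and the installed handler ends the process.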