Annex A — Compile output
markspec compile --output <dir> <paths...> writes the compiled traceability
graph to a directory of static files — the /api/ directory. This output is
the archival, published artifact: what CI produces, what downstream projects
federate against, and what auditors and rendering pipelines consume.
The compile output is designed to be served as static files (e.g., on GitHub Pages or GitLab Pages) and consumed by downstream tools without requiring a running MarkSpec process.
Generating the compile output
# Compile all Markdown and source files in docs/ and src/
markspec compile --output api/ docs/**/*.md src/**/*.rs
# Force streaming form (NDJSON) regardless of entry count
markspec compile --output api/ --split-threshold 0 docs/**/*.md
# Add a SQLite mirror for analytics consumers
markspec compile --output api/ --with-sqlite docs/**/*.md
Directory layout
<output-dir>/
├── manifest.json     always present
├── compiled.json     small projects (< 1 000 entries by default)
│     or
├── entries.ndjson    large projects (≥ split-threshold)
├── entries.idx       index for O(1) entry lookup by display ID
└── edges.ndjson      trace edges (forward + generated inverses)
The threshold between the two forms is controlled by --split-threshold
(default: 1 000 entries). Both forms contain the same data — consumers should
check manifest.entries.format to determine which is present.
Manifest schema
manifest.json is always small enough to parse in full. It is the entry point
for all consumers — read it first, then follow its pointers to the entry and
edge data.
{
  "markspecSchemaVersion": 1,
  "generator": {
    "name": "markspec",
    "version": "0.5.0"
  },
  "project": {
    "name": "my-project",
    "version": "1.2.0"
  },
  "counts": {
    "entries": 1234,
    "edges": 456
  },
  "entries": {
    "format": "ndjson",
    "file": "entries.ndjson"
  },
  "edges": {
    "format": "ndjson",
    "file": "edges.ndjson"
  },
  "sqliteMirror": null,
  "federation": [],
  "reserved": {}
}
For small projects, entries.format is "inline" and entries.file is
"compiled.json".
| Field | Type | Notes |
|---|---|---|
| markspecSchemaVersion | integer | Schema version; currently 1 |
| generator.name | string | Always "markspec" |
| generator.version | string | MarkSpec release version (informational only) |
| project.name | string | From project.yaml |
| project.version | string | From project.yaml |
| counts.entries | integer | Total number of entries |
| counts.edges | integer | Total number of edges (including generated inverses) |
| entries.format | "ndjson" \| "inline" | Which form is present |
| entries.file | string | Relative path to the entry data |
| edges.format | "ndjson" \| "inline" | Which form is present |
| edges.file | string | Relative path to the edge data |
| sqliteMirror | string \| null | Relative path to mirror.db, or null |
| federation | array | Upstream registries (see Federation section) |
| reserved | object | Reserved for future use; consumers must ignore |
Entry record
Each entry record appears as one JSON object — either as a line in
entries.ndjson (streaming form) or as a value in the entries map in
compiled.json (inline form).
{
  "displayId": "SRS_BRK_0107",
  "id": "01HGW2Q8MNP3RSTVWXYZABCDEF",
  "shape": "Authored",
  "type": "requirement",
  "title": "Sensor debouncing",
  "body": "The sensor driver shall debounce raw inputs to eliminate noise.",
  "rawAttributes": [
    { "key": "Id", "value": "01HGW2Q8MNP3RSTVWXYZABCDEF" },
    { "key": "Type", "value": "requirement" },
    { "key": "Derived-from", "value": "SYS_BRK_0042" },
    { "key": "Labels", "value": "ASIL-B" }
  ],
  "location": { "file": "docs/requirements.md", "line": 42, "column": 1 },
  "properties": {
    "file.path": "docs/requirements.md",
    "file.mtime": "2026-05-19T07:00:00Z",
    "git.sha": "a88ba34",
    "git.author": "Alice <alice@example.com>"
  }
}
| Field | Type | Notes |
|---|---|---|
| displayId | string | Human-readable ID, e.g. SRS_BRK_0107 |
| id | string \| null | ULID or URI; null if no Id: trailer |
| shape | "Authored" \| "Reference" | Determined by Id: format |
| type | string \| null | Resolved type name; null if unresolved |
| title | string | Entry title text |
| body | string | Entry body text (trimmed) |
| rawAttributes | {key, value}[] | All trailer attributes in source order |
| location | {file, line, column} | Source file path, 1-based line and column |
| properties | object | Observed facts (see Properties namespaces) |
Properties namespaces
The properties object is partitioned by namespace prefix. Only namespaces that
are available are populated — a property whose source data is absent is omitted
entirely rather than set to null.
| Namespace | Fields | Notes |
|---|---|---|
| file.* | file.path, file.mtime, file.size | Always included |
| git.* | git.sha, git.author, git.committer | Included when git history is available |
| source.* | source.language, source.function | Included for in-source entries (doc comments) |
| sync.* | (various) | Never included: privacy boundary (see Privacy rules) |
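Because property keys are flat strings with a namespace prefix, a consumer may find it convenient to regroup them before use. The following is a minimal sketch, assuming keys always have the form namespace.field as in the table above; the function name group_properties is hypothetical and not part of MarkSpec.

```python
def group_properties(properties):
    """Regroup flat 'namespace.field' keys into {namespace: {field: value}}."""
    grouped = {}
    for key, value in properties.items():
        # Split on the first dot only, so the field part is kept whole.
        namespace, _, field = key.partition(".")
        grouped.setdefault(namespace, {})[field] = value
    return grouped
```

Absent namespaces are simply missing from the result, so lookups such as grouped.get("git", {}) avoid a KeyError when git history was not available at compile time.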
Edge record
Each edge record appears as one JSON object in edges.ndjson (streaming form)
or in the edges array in compiled.json (inline form).
{ "from": "SRS_BRK_0107", "to": "SYS_BRK_0042", "kind": "satisfies", "generated": false }
{ "from": "SYS_BRK_0042", "to": "SRS_BRK_0107", "kind": "satisfied-by", "generated": true }
| Field | Type | Notes |
|---|---|---|
| from | string | Source display ID |
| to | string | Target display ID |
| kind | string | Relation name in lowercase-with-hyphens |
| generated | boolean | true for inverse edges written by MarkSpec |
The kind field uses lowercase-with-hyphens form (satisfies, not
Satisfies). This matches the display ID convention used in entries.idx.
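As a rough illustration of consuming edge records, the sketch below groups authored edges by their source display ID and skips the generated inverses. It assumes the input is an iterable of NDJSON lines (for example, an open edges.ndjson file); the function name forward_edges_by_source is hypothetical.

```python
import json
from collections import defaultdict


def forward_edges_by_source(ndjson_lines):
    """Group authored (non-generated) edges by their 'from' display ID."""
    by_source = defaultdict(list)
    for line in ndjson_lines:
        if not line.strip():
            continue  # tolerate blank lines
        edge = json.loads(line)
        if edge["generated"]:
            continue  # inverse edge written by MarkSpec, not authored
        by_source[edge["from"]].append((edge["kind"], edge["to"]))
    return dict(by_source)
```

Filtering on generated avoids counting each relation twice when both the forward edge and its inverse are present.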
entries.idx
entries.idx is a JSON object mapping display ID to byte offset in
entries.ndjson. This allows O(1) random access to any entry without reading
the full NDJSON file:
{
  "SRS_BRK_0107": 0,
  "SRS_BRK_0108": 1847,
  "SYS_BRK_0042": 3694
}
A consumer looking up SRS_BRK_0107 reads the offset (0), seeks to that
position in entries.ndjson, reads one line, and parses the JSON object.
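The lookup described above can be sketched as follows. This is an illustration rather than MarkSpec's own implementation; the function name lookup_entry is hypothetical, and the code assumes the offsets in entries.idx are byte offsets into a UTF-8 encoded entries.ndjson.

```python
import json
from pathlib import Path


def lookup_entry(output_dir, display_id):
    """Return the entry for display_id, or None if it is not in the index."""
    root = Path(output_dir)
    index = json.loads((root / "entries.idx").read_text(encoding="utf-8"))
    offset = index.get(display_id)
    if offset is None:
        return None
    # Open in binary mode so seek() counts bytes, not decoded characters.
    with (root / "entries.ndjson").open("rb") as fh:
        fh.seek(offset)
        line = fh.readline()
    return json.loads(line)
```

Opening the file in binary mode matters once titles or bodies contain non-ASCII text, since character positions and byte offsets then diverge.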
Small-project form (inline)
For projects below the split-threshold, all data is in compiled.json:
{
  "entries": {
    "SRS_BRK_0107": {
      "displayId": "SRS_BRK_0107",
      "id": "01HGW2Q8MNP3RSTVWXYZABCDEF",
      "shape": "Authored",
      "type": "requirement",
      "title": "Sensor debouncing",
      "body": "The sensor driver shall debounce raw inputs to eliminate noise.",
      "rawAttributes": [ ... ],
      "location": { "file": "docs/requirements.md", "line": 42, "column": 1 },
      "properties": { ... }
    }
  },
  "edges": [
    { "from": "SRS_BRK_0107", "to": "SYS_BRK_0042", "kind": "satisfies", "generated": false }
  ]
}
The entries field is a map keyed by display ID. The edges field is a flat
array. Both forms carry identical data — the split is a performance optimization
for large projects, not a semantic distinction.
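Since both forms carry the same data, a consumer can hide the difference behind a single loader that reads manifest.json first and then follows entries.format and entries.file. The sketch below is one way to do that; the function name load_entries is hypothetical, and it loads everything into memory, which may not suit very large projects.

```python
import json
from pathlib import Path


def load_entries(output_dir):
    """Return {displayId: entry} regardless of which output form is present."""
    root = Path(output_dir)
    manifest = json.loads((root / "manifest.json").read_text(encoding="utf-8"))
    fmt = manifest["entries"]["format"]
    data_path = root / manifest["entries"]["file"]

    if fmt == "inline":
        # Inline form: the data file holds an "entries" map keyed by display ID.
        compiled = json.loads(data_path.read_text(encoding="utf-8"))
        return dict(compiled["entries"])

    if fmt == "ndjson":
        # Streaming form: one entry object per line.
        entries = {}
        with data_path.open(encoding="utf-8") as fh:
            for line in fh:
                if line.strip():
                    entry = json.loads(line)
                    entries[entry["displayId"]] = entry
        return entries

    raise ValueError(f"unknown entries.format: {fmt!r}")
```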
Privacy rules
The following rules govern what is and is not serialized in the compile output. The output is designed to be published world-readable; these rules exist to prevent sensitive data from leaking into the artifact.
- sync.* properties are never serialized. They may contain external-system tokens, user IDs, session timestamps, or workspace paths that should not appear in a published artifact.
- git.contributors is opt-in: it requires an explicit --with-contributors flag. By default, only git.sha, git.author, and git.committer are included, and only when git history is available.
- file.path records the path as written in the compile command, which may be relative or absolute depending on the invocation. CI pipelines should use project-relative paths for reproducible output.
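A publishing pipeline may want to verify the first rule independently before uploading the artifact. The following is a small sketch of such a check over already-parsed entry records; the function name find_sync_properties is hypothetical, and the check is an illustration, not a MarkSpec feature.

```python
def find_sync_properties(entries):
    """Return (displayId, key) pairs for any serialized sync.* property."""
    leaks = []
    for entry in entries:
        # "properties" may be missing or empty; treat both as no properties.
        for key in entry.get("properties", {}):
            if key.startswith("sync."):
                leaks.append((entry["displayId"], key))
    return leaks
```

An empty result is the expected outcome; a CI job could fail the build when the list is non-empty.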
Schema versioning
markspecSchemaVersion is a monotonically increasing integer. The current
version is 1.
Compatibility rules:
- Consumers must reject output with a markspecSchemaVersion higher than they support.
- Consumers must ignore unknown keys within any object. Schema evolution is additive-only: new fields are added, and existing fields are never removed or renamed within a major version.
- Consumers must use markspecSchemaVersion, not generator.version, for compatibility checks. The generator version is informational.
When a breaking change is needed, markspecSchemaVersion is incremented and a
migration guide is published.
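The compatibility rules can be sketched as a guard that a consumer runs right after parsing manifest.json. The names SUPPORTED_SCHEMA_VERSION, UnsupportedSchemaError, and check_schema_version are hypothetical; only the rules themselves come from this annex.

```python
SUPPORTED_SCHEMA_VERSION = 1  # highest schema version this consumer understands


class UnsupportedSchemaError(Exception):
    """Raised when the compile output is newer than the consumer supports."""


def check_schema_version(manifest, supported=SUPPORTED_SCHEMA_VERSION):
    """Return the schema version, or raise if the output must be rejected."""
    version = manifest.get("markspecSchemaVersion")
    if isinstance(version, bool) or not isinstance(version, int):
        raise UnsupportedSchemaError(f"invalid markspecSchemaVersion: {version!r}")
    if version > supported:
        raise UnsupportedSchemaError(
            f"schema version {version} is newer than supported version {supported}"
        )
    # generator.version is deliberately not consulted, and unknown keys
    # elsewhere in the manifest are ignored rather than treated as errors.
    return version
```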
Federation
manifest.federation lists upstream registries that this project federates
against. Downstream projects can resolve display IDs that refer to entries in an
upstream project’s compile output.
{
  "federation": [
    {
      "id": "upstream-safety",
      "url": "https://ci.example.com/safety-project/api/",
      "markspecSchemaVersion": 1
    }
  ]
}
Resolution works as follows:
- A display ID is not found in the local entries.idx.
- MarkSpec walks the federation list in order.
- For each federated entry, it fetches <url>/manifest.json to confirm the schema version is compatible.
- It then fetches <url>/entries.idx and looks up the display ID.
- If found, it fetches the specific byte range from <url>/entries.ndjson using an HTTP Range request.
Federation is read-only and acyclic — the protocol is just static file fetches. There is no federation server. A federated project cannot modify the local compile output.
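The resolution steps can be sketched as below. This is a consumer-side illustration under several assumptions, not MarkSpec's implementation: the names http_fetch, resolve_federated, and WINDOW are hypothetical; an incompatible upstream is skipped rather than treated as fatal; and because entries.idx stores only a start offset, the sketch reads fixed-size windows until it sees the newline that ends the record (each record is assumed to be newline-terminated).

```python
import json
import urllib.request

WINDOW = 64 * 1024  # bytes per Range request; an arbitrary choice


def http_fetch(url, start=None, length=None):
    """Fetch a whole file, or `length` bytes starting at byte `start`."""
    request = urllib.request.Request(url)
    if start is not None:
        request.add_header("Range", f"bytes={start}-{start + length - 1}")
    with urllib.request.urlopen(request) as response:
        if start is not None and response.status != 206:
            raise RuntimeError(f"server ignored Range request for {url}")
        return response.read()


def resolve_federated(display_id, federation, fetch=http_fetch, supported=1):
    """Walk the federation list in order; return the first matching entry."""
    for upstream in federation:
        base = upstream["url"].rstrip("/") + "/"
        manifest = json.loads(fetch(base + "manifest.json"))
        if manifest["markspecSchemaVersion"] > supported:
            continue  # assumption: skip incompatible upstreams
        index = json.loads(fetch(base + "entries.idx"))
        offset = index.get(display_id)
        if offset is None:
            continue  # not in this upstream; try the next one
        buf = b""
        pos = offset
        while b"\n" not in buf:
            chunk = fetch(base + "entries.ndjson", start=pos, length=WINDOW)
            buf += chunk
            pos += len(chunk)
            if len(chunk) < WINDOW:
                break  # short read: end of file reached
        return json.loads(buf.split(b"\n", 1)[0])
    return None
```

Passing the transport as the fetch parameter keeps the walk itself independent of HTTP, so it can be exercised against local files or an in-memory stub.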
SQLite mirror
markspec compile --output <dir> --with-sqlite produces an additional
mirror.db file alongside the NDJSON files. This is the same data in SQLite
form, for analytics consumers (coverage dashboards, traceability explorers,
etc.) that prefer SQL queries over NDJSON.
manifest.sqliteMirror points to the mirror file when present:
{
  "sqliteMirror": "mirror.db"
}
The SQLite mirror is never used as the LSP’s working index — the LSP maintains its own in-memory index rebuilt from source files. The mirror is an output artifact only.
Table schema (abbreviated):
CREATE TABLE entries (
    display_id TEXT PRIMARY KEY,
    id         TEXT,
    shape      TEXT NOT NULL,
    type       TEXT,
    title      TEXT NOT NULL,
    body       TEXT NOT NULL,
    file       TEXT NOT NULL,
    line       INTEGER NOT NULL
);
CREATE TABLE edges (
    from_id   TEXT NOT NULL,
    to_id     TEXT NOT NULL,
    kind      TEXT NOT NULL,
    generated INTEGER NOT NULL  -- 0 or 1
);
CREATE TABLE properties (
    display_id TEXT NOT NULL,
    key        TEXT NOT NULL,
    value      TEXT NOT NULL
);
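As an example of the kind of query an analytics consumer might run against the mirror, the sketch below lists entries that have no authored outgoing edge. It assumes that edges.from_id holds display IDs, matching the from field of the edge records; the names UNTRACED_SQL and untraced_entries are hypothetical.

```python
import sqlite3

# Entries with no authored (generated = 0) outgoing edge.
UNTRACED_SQL = """
SELECT e.display_id, e.title
FROM entries AS e
LEFT JOIN edges AS g
  ON g.from_id = e.display_id AND g.generated = 0
WHERE g.from_id IS NULL
ORDER BY e.display_id
"""


def untraced_entries(conn):
    """Return (display_id, title) rows for entries lacking authored edges."""
    return conn.execute(UNTRACED_SQL).fetchall()
```

A caller would open the mirror with sqlite3.connect() on the path given by manifest.sqliteMirror and pass the connection to untraced_entries.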