Run Book · Part 2 — The Pipeline

Overview

What this part delivers, and why

Assess told us where the client starts. The Pipeline is where we actually build. Six stages take content from freeform pages to a structured, semantic system that people and machines can use — and the order matters: you cannot skip ahead to agents and delivery before the model underneath them exists. Every stage carries its proven framework and the AI layer that now sits on top of it; the run book below is how a delivery team executes each one without reinventing the approach.

The six stages at a glance

1 · Plan — agent-assisted ideation, content-gap audits and governed briefs. Requires the model to exist first.
2 · Model — content modelling, taxonomy, metadata, schema, the knowledge graph. The highest-leverage stage — most of the value lives here.
3 · Structure / Author — modular, componentised content and semantic HTML; AI drafting and auto-tagging on top.
4 · Govern — what governance touches in the build. Kept light here; the detail lives in the Governance run book (Part 4).
5 · Deliver — headless / composable / MACH, API-first, and agents as a delivery target via MCP.
6 · Optimize — the measurement loop. Kept light here; the detail lives in the Measurement run book.

Everything downstream is paid for upstream. Spend the budget on the model, and Plan, Structure and Deliver almost build themselves; starve it, and you pay for the shortcut at every stage that follows.

the line that frames the whole pipeline · worth repeating at kickoff

~3× factual accuracy from knowledge-graph grounding vs a baseline LLM · data.world benchmark

Up to 30–50% translation / localisation cost cut by structured single-sourcing · reuse across channels

6 stages turn freeform pages into an AI-ready content system · Plan → Optimize

The pipeline, mapped

The pipeline as a system

One build, mapped three ways: the six stages as connected nodes with the gates that sit between them, a step-by-step walk through each stage, and a People-vs-Process swimlane further down. For every stage the run book sets out what it takes in, what it ships, who owns it, and the bar it has to clear.

Plan

Month 1 · early Build

Timebox · ~2–3 weeks · Lead: Content strategist + Consultant · Format: planning workshop + async

Objective

Decide what content earns its place and why — the editorial plan, the gaps, the briefs — and set up the AI-assisted planning loop. Critically, this stage depends on Stage 2: agents can't usefully audit gaps or plan against a taxonomy that doesn't exist yet, so in practice Plan and the first pass of Model run hand-in-hand.

Planning workshop agenda (half day)

Recap the Assess findings & the chosen entry point (15m)
Audience, journeys and the editorial priorities for the period (45m)
Content-gap audit — run the agent across the estate, review at scale (45m)
Brief generation — turn priorities into structured briefs (40m)
Agree the prompt library standard — RACE: Role, Action, Context, Expectations (15m)

Inputs → outputs

Inputs

Assess findings & entry point
Existing editorial strategy (if any)
Draft taxonomy from Stage 2's first pass

Outputs

Editorial plan for the period
Prioritised content-gap list
Structured content briefs
Governed prompt library (v1)

◆ From the field

The first thing a client wants to do with AI is generate content. The first thing you should make them do is govern the prompts. A prompt that lives in someone's chat history is a one-off; the same prompt version-controlled and shared is an asset the whole team compounds on. Make that switch in week one or you'll be untangling it in month four.

Governed prompt — RACE skeleton (copy & version it)

# RACE prompt · content-gap audit · v1 · owner: strategist
ROLE:        You are a content strategist auditing a structured estate.
ACTION:      Compare the editorial plan against the live taxonomy and
             list every topic/audience/funnel-stage with no covering asset.
CONTEXT:     Taxonomy = {topic_tree}. Editorial priorities = {priorities}.
             Content types and their metadata = {model_spec}.
EXPECTATIONS: Return a prioritised table: gap, why it matters,
             suggested content type, owner. Cite the taxonomy term.
             Flag any gap that needs a new content type — do not invent one.
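One way to keep a prompt like this out of chat history is to hold it as a small, versioned object that refuses to render until every placeholder is filled. The sketch below is illustrative only: the `GovernedPrompt` class, its field names and the sample values are assumptions, and the template text is an abbreviated copy of the skeleton above.

```python
from dataclasses import dataclass
from string import Formatter


@dataclass(frozen=True)
class GovernedPrompt:
    """A prompt treated as a versioned asset (hypothetical structure)."""
    name: str
    version: str
    owner: str
    template: str

    def placeholders(self) -> set:
        # Collect every {placeholder} named in the template.
        return {name for _, name, _, _ in Formatter().parse(self.template) if name}

    def render(self, **values: str) -> str:
        # Refuse to render a half-filled prompt.
        missing = self.placeholders() - set(values)
        if missing:
            raise ValueError(f"missing placeholders: {sorted(missing)}")
        return self.template.format(**values)


GAP_AUDIT = GovernedPrompt(
    name="content-gap audit",
    version="v1",
    owner="strategist",
    template=(
        "ROLE: You are a content strategist auditing a structured estate.\n"
        "ACTION: Compare the editorial plan against the live taxonomy.\n"
        "CONTEXT: Taxonomy = {topic_tree}. Editorial priorities = {priorities}. "
        "Content types and their metadata = {model_spec}.\n"
        "EXPECTATIONS: Return a prioritised table and cite the taxonomy term."
    ),
)

# Sample values are made up for illustration.
prompt = GAP_AUDIT.render(
    topic_tree="AI > Retrieval",
    priorities="launch FAQ coverage",
    model_spec="Article, FAQ",
)
```

Stored in version control next to the model spec, an object like this gives the team one place to review and bump a prompt, which is the switch the field note asks for in week one.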

Model

Months 1–3 · the core of Build

Timebox · ~6–10 weeks · Lead: Modular content architect · With: taxonomist, content engineer · Format: modelling workshops + iteration

Objective

This is the highest-leverage stage in the entire playbook. Define the content model, taxonomy, metadata and schema — and, where it earns its place, a knowledge graph / semantic layer. This is where content stops being pages and becomes structured, machine-readable data. Get this right and every stage after it gets easier; get it wrong and you'll feel the drag for years. We work through Cleve Gibbon's three passes: Conceptual (types + high-level relationships) → Design (attributes, refined relationships) → Implementation (CMS-level detail).

Conceptual modelling workshop (half day)

Inventory the content the strategy actually needs (not what exists today) (30m)
Draft the content types and their relationships — entities, not strings (75m)
Pressure-test: does each proposed type earn its keep, or is it a variant of another? (45m)
Sketch the taxonomy spine & the metadata that every type carries (30m)

Design & implementation activities

Content model spec — every type, its fields, field types, required/optional, relationships
Taxonomy & metadata schema — the controlled vocabularies, applied consistently across the estate
Schema / semantic markup — schema.org on key types so engines and AI understand meaning
Knowledge graph / semantic layer — where the client's domain is relationship-rich, model it as a graph (SKOS / RDF / OWL); turn "strings into things"
Map the model into the chosen repository (headless CMS / CCMS) at implementation level
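To make the schema.org activity concrete, here is a minimal sketch that turns FAQ entries from the content model into JSON-LD. `FAQPage`, `Question`, `Answer`, `mainEntity` and `acceptedAnswer` are schema.org vocabulary; the `faq_jsonld` helper, the `question` / `answer` keys and the sample entry are assumptions for illustration.

```python
import json


def faq_jsonld(entries):
    """Build a schema.org FAQPage from structured question/answer fields."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": entry["question"],
                "acceptedAnswer": {"@type": "Answer", "text": entry["answer"]},
            }
            for entry in entries
        ],
    }


# Hypothetical FAQ entry, shaped like a structured FAQ content type.
entries = [
    {
        "question": "What is a content model?",
        "answer": "A definition of content types, their fields and relationships.",
    }
]

markup = json.dumps(faq_jsonld(entries), indent=2)
```

Because the markup is generated from fields rather than scraped from a body, it only works once the type has a real `question` and `answer` field, which is the point of modelling first.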

Inputs → outputs

Inputs

Editorial plan & content types needed
Audit findings (ROT, current structure)
Repository / CMS decision

Outputs

Content model spec (the key artifact)
Taxonomy & metadata schema
Schema/semantic markup plan
Knowledge graph / semantic layer (where warranted)

◆ From the field

The single most common failure here isn't under-modelling, it's over-modelling. A team gets excited and ends up with forty content types nobody can hold in their head, half of which differ by a single field. Our rule: if two types share more than ~80% of their fields, they're one type with a variant flag. One client landed on fourteen content types — twelve earn their keep, and we keep threatening to kill the other two and never do, because someone always finds an edge case. Fourteen you can govern. Forty you cannot.

▲ Watch out

The opposite trap is just as real: under-modelling, where everything is a "page" or an "article" with a freeform body. That's the blob problem in a new outfit — it ships fast and quietly defeats the entire point of the engagement, because AI retrieval can't get a clean chunk out of a freeform field. If the client pushes to "just ship something simple and structure it later," that later never comes. Structure it now.

Why this stage pays off

Knowledge-graph grounding has been benchmarked at roughly 3× the factual accuracy of a baseline LLM (the data.world benchmark moved accuracy from 16% to 54%). It costs more to build than plain retrieval, so reserve the graph for the relationship-rich parts of the domain — but where it fits, this is the work that makes the AI layer actually trustworthy. This is also the clearest professional-services wedge: the modular content architect who owns ontologies and metadata is the specialist the market now needs.

“The more structure you have, the less hallucination you will have.”

Rahel Bailie · Content Integrity Model · the case for spending most of the budget here

Structure / Author

Months 2–4 · overlaps Model

Timebox · ~4–6 weeks · Lead: Content engineer + content team · Format: authoring patterns + migration

Objective

Turn the model into how content is actually authored: modular, componentised content, real semantic HTML, single-sourcing and reuse — so one piece of content can be assembled across many channels. Then layer AI on top for first-draft generation, variant generation and autonomous tagging. The principle that keeps this honest: write for machines and you get better content for humans — explicit, unambiguous, well-chunked content serves both.

Activities

Define content type components & the authoring patterns for each (the spec below)
Establish semantic HTML standards — each <h2> is an extractable answer unit, real lists and tables, no decorative markup
Set up single-sourcing & reuse (conref-style) so there's one source of truth, not copy-paste
Migrate / re-author a representative slice into the new structure (prove the pattern before scaling)
Wire in the AI layer — draft & variant generation, autonomous metadata enrichment and auto-tagging — with human review
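A quick way to check the semantic HTML standard on a migrated slice is to see whether each `<h2>` section really comes out as a clean answer unit. The sketch below uses Python's standard `html.parser`; the `AnswerUnitParser` class, the `extract_answer_units` helper and the sample markup are assumptions, and a real check would also need to handle nested headings and tables.

```python
from html.parser import HTMLParser


class AnswerUnitParser(HTMLParser):
    """Split a page into one unit per <h2>: the heading plus the text under it."""

    def __init__(self):
        super().__init__()
        self.units = []
        self._current = None
        self._in_h2 = False

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._current = {"heading": "", "text": ""}
            self.units.append(self._current)
            self._in_h2 = True

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_h2 = False
        elif self._current is not None:
            # Keep words from adjacent elements from running together.
            self._current["text"] += " "

    def handle_data(self, data):
        if self._current is None:
            return  # content before the first <h2> is not an answer unit
        key = "heading" if self._in_h2 else "text"
        self._current[key] += data


def extract_answer_units(html):
    parser = AnswerUnitParser()
    parser.feed(html)
    parser.close()
    # Normalise whitespace so each unit is a clean chunk.
    return [{k: " ".join(v.split()) for k, v in unit.items()} for unit in parser.units]


SAMPLE_HTML = (
    "<h1>Guide</h1><p>Intro.</p>"
    "<h2>What is a content model?</h2>"
    "<p>A content model defines types, fields and relationships.</p>"
    "<ul><li>Types</li><li>Fields</li></ul>"
    "<h2>Why structure content?</h2>"
    "<p>Structured content gives retrieval a clean chunk.</p>"
)

units = extract_answer_units(SAMPLE_HTML)
```

If a section's text comes back empty or a heading reads as decoration rather than a question or topic, that page has not met the standard yet.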

Inputs → outputs

Inputs

Content model spec & taxonomy
Structured briefs from Stage 1
Existing content to migrate

Outputs

Modular component spec / content type definitions
Semantic authoring standards
A migrated, structured content slice
AI drafting + auto-tagging workflow

◆ From the field

Authors don't resist structure because they're difficult — they resist it because the first structured-authoring interface they're handed feels like filling in a tax return. Spend real effort on the authoring experience: sensible field order, helper text, sane defaults. A model that's technically perfect and miserable to write into gets quietly worked around, and then you're back to blobs.

Govern

Threaded through Build

Timebox · light here · see Part 4 · Lead: Consultant + client owner

Objective

Governance is its own run book: see the Governance run book (Part 4) for the full layered control stack, oversight modes, disclosure and provenance. Here we only note what governance touches in the build, so the pipeline doesn't ship something that governance later has to unpick.

What governance touches in the pipeline

The governed prompt library started in Stage 1 — version-controlled, reusable, not scattered
Machine-readable brand & editorial rules — so AI conforms by default, set up alongside the model in Stage 2
Risk-tiering & oversight modes wired into the CMS in Stage 3 — Agent-assisted → Human-in-the-loop → Human-on-the-loop → Human-out-of-the-loop
Provenance & disclosure plumbing (C2PA, metadata) on the delivery layer in Stage 5
Least-privilege agent access & audit logs before anything in Stage 5 runs autonomously
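The "oversight modes mapped to content types" output can be as simple as a lookup the CMS workflow consults before publishing. The sketch below is a guess at the shape, not the Governance run book's definition: the mode names come from the list above, but the type-to-mode mapping, the strictest-by-default fallback and the rule that the first two modes need sign-off before publish are all assumptions.

```python
from enum import IntEnum


class Oversight(IntEnum):
    """Oversight modes, ordered from most to least human control."""
    AGENT_ASSISTED = 1
    HUMAN_IN_THE_LOOP = 2
    HUMAN_ON_THE_LOOP = 3
    HUMAN_OUT_OF_THE_LOOP = 4


# Hypothetical risk tiering: which mode each content type runs under.
OVERSIGHT_BY_TYPE = {
    "Product": Oversight.HUMAN_IN_THE_LOOP,
    "Article": Oversight.HUMAN_IN_THE_LOOP,
    "FAQ": Oversight.HUMAN_ON_THE_LOOP,
}


def mode_for(content_type: str) -> Oversight:
    # Unmapped types fall back to the strictest mode (assumption).
    return OVERSIGHT_BY_TYPE.get(content_type, Oversight.AGENT_ASSISTED)


def needs_approval_before_publish(content_type: str) -> bool:
    # Assumption: the two most controlled modes gate publication on a human.
    return mode_for(content_type) <= Oversight.HUMAN_IN_THE_LOOP
```

Keeping the mapping as data means a change in risk appetite is a one-line, reviewable edit rather than a workflow rebuild.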

Inputs → outputs

Inputs

Governance run book (Part 4) outputs
The model, components & delivery layer

Outputs

Governance hooks built into the system
Oversight modes mapped to content types

Deliver

Months 4–6 · Build into Scale

Timebox · ~4–6 weeks · Lead: Content engineer + devs · Format: architecture + integration

Objective

Make the content addressable as data — delivered to any channel through APIs, on a headless / composable / MACH architecture — and stand up the newest delivery target: agents. The contract for agents is MCP (Model Context Protocol): content lives behind a server, the agent reads the schema, retrieves and acts through a standard interface. Reuse compounds here — structured single-sourcing alone can cut translation/localisation cost by 30–50% across channels.

Activities

Confirm the headless / composable architecture & the API surface for each channel
Expose content as structured data via API — the model from Stage 2 becomes the contract
Stand up an MCP server so external agents get governed, scoped, structured access
Bake governance into the access layer — least-privilege agent identities, gateways, server vetting (MCP is a new attack surface)
Verify omnichannel assembly — one source, many channels, no re-keying

Inputs → outputs

Inputs

Structured content & component library
Repository / headless CMS
Governance access rules (Stage 4)

Outputs

Delivery / API & channel spec
Live API-first delivery
MCP server for agent access

▲ Watch out

The seductive mistake of 2026: building the agent and the MCP server before the model exists. A client will ask to "just put an MCP server in front of the current CMS." Don't. If the content underneath is unstructured, all you've done is give an agent a fast, governed pipe to your mess — it will retrieve blobs and hallucinate confidently. The MCP server is only as good as the model behind it. Stage 2 first, every time.

MCP tool surface over the model (illustrative — copy to adapt)

// content lives behind a server; the agent reads the schema and acts
tool search_content(query, type?, taxonomy?) // scoped retrieval over the model
tool get_entity(id)                       // returns structured fields, not a blob
tool list_types()                        // the content model = the contract

guardrails:
  identity: least-privilege, scoped per agent
  access:   published-status only
  audit:    every call logged · server vetted
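The tool surface and guardrails above can be prototyped in plain Python before any MCP server exists. This sketch deliberately does not use an MCP SDK; `Entity`, `AgentIdentity`, `ContentTools` and the sample data are hypothetical, and the parameters are renamed `content_type` and `entity_id` to avoid shadowing Python built-ins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    id: str
    type: str
    status: str          # e.g. "published" or "draft"
    fields: dict
    taxonomy: tuple = ()


@dataclass(frozen=True)
class AgentIdentity:
    name: str
    allowed_types: frozenset  # least-privilege scope for this agent


class ContentTools:
    def __init__(self, entities, audit_log):
        self._entities = {e.id: e for e in entities}
        self.audit_log = audit_log

    def _visible(self, agent, entity):
        # Guardrails: published-status only, and only types in the agent's scope.
        return entity.status == "published" and entity.type in agent.allowed_types

    def _audit(self, agent, tool, **args):
        # Every call is logged before anything is returned.
        self.audit_log.append({"agent": agent.name, "tool": tool, "args": args})

    def list_types(self, agent):
        self._audit(agent, "list_types")
        return sorted({e.type for e in self._entities.values() if self._visible(agent, e)})

    def get_entity(self, agent, entity_id):
        self._audit(agent, "get_entity", entity_id=entity_id)
        entity = self._entities.get(entity_id)
        if entity is None or not self._visible(agent, entity):
            return None
        # Structured fields, not a blob.
        return {"id": entity.id, "type": entity.type, **entity.fields}

    def search_content(self, agent, query, content_type=None, taxonomy=None):
        self._audit(agent, "search_content", query=query,
                    content_type=content_type, taxonomy=taxonomy)
        needle = query.lower()
        hits = []
        for entity in self._entities.values():
            if not self._visible(agent, entity):
                continue
            if content_type and entity.type != content_type:
                continue
            if taxonomy and taxonomy not in entity.taxonomy:
                continue
            if any(needle in str(value).lower() for value in entity.fields.values()):
                hits.append(entity.id)
        return hits


audit_log = []
tools = ContentTools(
    [
        Entity("faq-1", "FAQ", "published", {"question": "How do refunds work?"}, ("billing",)),
        Entity("faq-2", "FAQ", "draft", {"question": "Refund policy changes"}, ("billing",)),
        Entity("prod-1", "Product", "published", {"name": "Refund insurance"}),
    ],
    audit_log,
)
agent = AgentIdentity("support-bot", frozenset({"FAQ"}))
hits = tools.search_content(agent, "refund", content_type="FAQ")
```

The draft FAQ and the out-of-scope product never reach the agent, and the audit log records the call either way; a real server would add authentication and server vetting on top.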

Optimize

Phase 3 · Scale, ongoing

Timebox · light here · see Measurement run book · Lead: Consultant + client

Objective

Close the loop: measure whether the engineered system is actually performing, and feed the gaps back into Model and Structure. Optimize is its own run book — see the Measurement run book for the full scorecard, sampling method and cadence. Here we only flag the two signature AI-era outputs and point to where they're detailed.

The signature outputs (detailed in the Measurement run book)

AI Share of Voice — your brand's citations ÷ total category citations, sampled 30+ times per prompt across the engines that matter, because AI citations swing month to month. Where AI doesn't yet know you becomes the next thing to model.
The "Can You Tell?" test — the honest quality bar: put engineered content head-to-head with human-written and see if a panel can spot the difference. Near a 50% guess rate, it's cleared the bar; if they can tell, the giveaways feed straight back to Structure and Model.
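Both outputs reduce to simple ratios, sketched here under stated assumptions. The 30-sample floor and the citations-over-total formula come from the text above; the function names, the sample data and the ±10-point tolerance used for "near 50%" are placeholders, and the Measurement run book's own thresholds take precedence.

```python
def share_of_voice(samples, brand, min_samples=30):
    """samples: one list of cited brands per sampled AI answer for a prompt."""
    if len(samples) < min_samples:
        raise ValueError(f"need at least {min_samples} samples, got {len(samples)}")
    total = sum(len(cited) for cited in samples)
    ours = sum(cited.count(brand) for cited in samples)
    # Brand citations divided by total category citations.
    return ours / total if total else 0.0


def can_you_tell(correct_guesses, total_guesses, tolerance=0.10):
    """Return the panel's guess rate and whether it sits near chance (50%)."""
    rate = correct_guesses / total_guesses
    return rate, abs(rate - 0.5) <= tolerance


# Made-up data: "Acme" is cited in half of 30 sampled answers, "Rival" in all.
samples = [["Acme", "Rival"]] * 15 + [["Rival"]] * 15
sov = share_of_voice(samples, "Acme")

rate, cleared = can_you_tell(correct_guesses=27, total_guesses=50)
```

In this made-up run the brand holds a third of category citations, and a panel that guesses right 27 times out of 50 is close enough to a coin flip to clear the bar under the assumed tolerance.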

Inputs → outputs

Inputs

Live, delivered content system
Analytics + AI citation sampling

Outputs

Measurement loop feeding back to Stages 2–3
(Full scorecard → Measurement run book)

Roles & effort

RACI & effort summary

Who does what across the six stages. R Responsible · A Accountable · C Consulted · I Informed. The model stage is where the architect leads and most of the value concentrates.

Stage	Sponsor	Content lead	Modular architect	Devs / Martech	Consultant
1 · Plan	I	R	C	I	A
2 · Model	I	C	R	C	A
3 · Structure / Author	I	C	A	R	C
4 · Govern	A	C	C	C	R
5 · Deliver	I	I	C	R	A
6 · Optimize	C	R	C	I	A

Month	Focus	Stages in flight
Month 1	Plan & conceptual modelling — editorial plan, gaps, briefs, first content types	1 · 2
Month 2	Model design — full content model spec, taxonomy & metadata schema; begin structuring	2 · 3
Month 3	Model implementation & knowledge graph; component spec & semantic authoring standards	2 · 3 · 4
Month 4	Migrate a structured slice; wire AI drafting/tagging; begin delivery architecture	3 · 4 · 5
Month 5	API-first delivery live; stand up MCP server with governed access	4 · 5
Month 6	Omnichannel verification; hand into Scale & the measurement loop	5 · 6

The same six stages, split by People vs Process

A second read on the pipeline: the People lane is the human craft each stage demands; the Process lane is the system / artifact it produces.

Lane	1 Plan	2 Model	3 Structure	4 Govern	5 Deliver	6 Optimize
People	Strategist runs the workshop	Architect leads modelling	Engineer + authors	Consultant + owner	Devs + martech	Consultant + client
Process	Editorial plan + governed prompts	Model spec + taxonomy + graph	Components + semantic HTML	Hooks + oversight modes	API-first + MCP server	Feedback loop → Stages 2–3

Templates & worksheets

The artifacts you use and leave behind

Three core templates are spelled out below — the content model spec, the taxonomy & metadata schema, and the delivery / API & channel spec. The full set produced in this part is indexed at the end.

Template 1 · Content model spec (Stage 2)

One row per content type — the blueprint

Content type	Core fields (type)	Relationships	Metadata / taxonomy	Reuse
Article	title (text), summary (text), body (rich/modular), author (ref), hero (ref:image)	→ author, → topic, → related articles	topic, audience, funnel stage, date	body chunks single-sourced
Product	name (text), spec (structured), price (number), description (modular)	→ category, → related products, → docs	category, use case, region	spec reused across channels
FAQ / Q&A	question (text), answer (rich), entity (ref)	→ product, → topic, → article	topic, intent, last-reviewed	answer = an AI answer unit
Author / person	name (text), bio (text), credentials (list)	→ articles, → topics of expertise	expertise area	referenced, never copied

Rule of thumb: if two proposed types share >80% of their fields, collapse them into one type with a variant flag. Capture required vs optional per field. Aim for the smallest set of types that covers the strategy — the team has to hold them in their head.
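The rule of thumb is easy to run as a check over a draft model. The sketch below measures overlap as shared fields divided by all fields across the two types (Jaccard similarity), which is one reasonable reading of "share >80% of their fields" rather than a definition from this run book; the helper names and the "News item" type are hypothetical.

```python
from itertools import combinations


def field_overlap(a: set, b: set) -> float:
    """Shared fields as a fraction of all fields across both types."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def merge_candidates(types: dict, threshold: float = 0.8) -> list:
    """Pairs of types whose overlap exceeds the threshold."""
    flagged = []
    for (name_a, fields_a), (name_b, fields_b) in combinations(types.items(), 2):
        score = field_overlap(fields_a, fields_b)
        if score > threshold:
            flagged.append((name_a, name_b, round(score, 2)))
    return flagged


# Draft model: "News item" is a made-up type that differs by a single field.
DRAFT_TYPES = {
    "Article": {"title", "summary", "body", "author", "hero"},
    "News item": {"title", "summary", "body", "author", "hero", "embargo_date"},
    "FAQ": {"question", "answer", "entity"},
}

candidates = merge_candidates(DRAFT_TYPES)
```

Here Article and News item share five of six fields, so the pair is flagged as one type with a variant flag; the flag is a prompt for a modelling conversation, not an automatic merge.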

Template 2 · Taxonomy & metadata schema (Stage 2)

Controlled vocabularies — entities, not strings

Topic taxonomy — the controlled subject tree (SKOS-style: broader/narrower/related), one term per concept, with synonyms mapped.
Audience / persona — the fixed list of who content is for; no free-text variants.
Funnel / journey stage — awareness → consideration → decision → retention (or the client's equivalent).
Content format — the rendered shape (guide, FAQ, comparison, case study…), distinct from content type.
Lifecycle metadata — owner, created, last-reviewed, review cadence, status, provenance (human/AI-assisted + approver).
Entity references — products, people, locations as linked entities (RDF/OWL where a graph is warranted), so AI gets relationships not just keywords.

Every content type in Template 1 must declare which of these it carries, and which are required. Apply consistently across the whole estate — inconsistent tagging is the gap that defeats retrieval.
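A controlled vocabulary with mapped synonyms can be sketched as a small lookup that resolves any label to its one preferred term and rejects free text. The shape loosely mirrors SKOS (preferred label, alternative labels, broader concept), but the `Concept` and `Taxonomy` classes and the sample terms are assumptions for illustration, not an RDF implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Concept:
    pref_label: str                 # the one preferred term per concept
    broader: Optional[str] = None   # preferred label of the parent concept
    alt_labels: tuple = ()          # synonyms mapped to this concept


class Taxonomy:
    def __init__(self, concepts):
        self._concepts = {c.pref_label: c for c in concepts}
        self._lookup = {}
        for concept in concepts:
            for label in (concept.pref_label, *concept.alt_labels):
                self._lookup[label.lower()] = concept.pref_label

    def resolve(self, term: str) -> str:
        """Map any known label to its preferred term; reject free text."""
        key = term.strip().lower()
        if key not in self._lookup:
            raise KeyError(f"{term!r} is not in the controlled vocabulary")
        return self._lookup[key]

    def ancestors(self, term: str) -> list:
        """Walk the broader chain from a term up to the root."""
        chain = []
        parent = self._concepts[self.resolve(term)].broader
        while parent is not None:
            chain.append(parent)
            parent = self._concepts[parent].broader
        return chain


# Made-up topic tree for illustration.
topics = Taxonomy([
    Concept("Artificial intelligence", alt_labels=("AI",)),
    Concept("Large language models", broader="Artificial intelligence",
            alt_labels=("LLM", "LLMs")),
    Concept("Retrieval-augmented generation", broader="Large language models",
            alt_labels=("RAG",)),
])

tag = topics.resolve("llms")
path = topics.ancestors("RAG")
```

Running every tag through `resolve` at authoring time is one way to enforce "no free-text variants", since anything outside the vocabulary fails loudly instead of creating a near-duplicate term.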

Template 3 · Delivery / API & channel spec (Stage 5)

How content reaches every channel — and every agent

Channel / target	Delivery method	What it pulls	Governance
Website / DXP	Headless API (REST/GraphQL)	Assembled components by type + taxonomy	Published-status only
Mobile / app	Same API, different assembly	Same source, channel-shaped variant	Published-status only
AI agents	MCP server	Schema + scoped retrieval over the model	Least-privilege, scoped identity, audit log
Localisation	API + translation memory	Single-source content, reused across locales (30–50% cost saving)	Locale approver per market

One source of truth, many delivery shapes. The MCP row is the 2026 addition: agents are a delivery target like any channel — but with their own access-control and audit requirements.
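"One source, many delivery shapes" can be illustrated with a single structured entity and a per-channel list of the fields each channel assembles. This is a sketch under assumptions: the `CHANNEL_SHAPES` mapping, the `assemble` helper and the sample product are hypothetical, and a real delivery layer would do this through the CMS API rather than in-process.

```python
# One structured source entity, shaped like a Product content type.
SOURCE = {
    "name": "Trail Jacket",
    "spec": {"weight_g": 310, "waterproof": True},
    "price": 149.0,
    "description": ["Packs into its own pocket.", "Seam-sealed hood."],
}

# Hypothetical channel shapes: which fields each channel pulls.
CHANNEL_SHAPES = {
    "web": ("name", "spec", "price", "description"),
    "app": ("name", "spec", "price"),
    "agent": ("name", "spec"),
}


def assemble(entity: dict, channel: str) -> dict:
    """Select the channel's fields from the single source; nothing is re-keyed."""
    return {field: entity[field] for field in CHANNEL_SHAPES[channel]}


web_view = assemble(SOURCE, "web")
app_view = assemble(SOURCE, "app")
```

Both views reference the same `spec` value, so a correction made once in the source shows up in every channel on the next assembly.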

Full template index for this part

Editorial plan — what to make this period, and why

Content-gap list — prioritised gaps from the agent audit

Content brief — the structured brief per piece

Governed prompt library — RACE-structured, version-controlled prompts

Content model spec — types, fields, relationships (above)

Taxonomy & metadata schema — controlled vocabularies (above)

Schema / markup plan — schema.org on key types

Knowledge graph map — entities & relationships (where warranted)

Modular component spec — content type definitions for authoring

Semantic authoring standards — semantic HTML & reuse rules

Delivery / API & channel spec — channels + MCP (above)

Governance hooks checklist — what the build wires in (→ Part 4)

Done criteria

Entry & exit gates

The quality bar that says this part is genuinely ready to start, and genuinely finished. The exit gate is deliberately strict — a half-built model is worse than none.

The path to done — a vertical recap

A third view of the pipeline: stages stacked as a journey, with the leverage marker on the stage that earns the budget.

Plan

Month 1 · ~2–3 weeks · strategist + consultant

Agent-assisted ideation, content-gap audits and governed briefs. Runs hand-in-hand with Model's first pass.

Gate → editorial plan + prompt library

Model

Months 1–3 · ~6–10 weeks · modular architect

Content model, taxonomy, metadata, schema and the knowledge graph. Pages become structured, machine-readable data.

★ Highest leverage — spend the budget here

Gate → model spec complete

Structure / Author

Months 2–4 · ~4–6 weeks · content engineer

Modular, componentised content and semantic HTML; AI drafting and auto-tagging on top, with human review.

Gate → slice migrated, pattern proven

Govern

Threaded through Build · consultant + owner

What governance touches in the build: oversight modes, provenance, the prompt library. Full detail in Part 4.

Gate → hooks wired in

Deliver

Months 4–6 · ~4–6 weeks · engineer + devs

Headless / composable / MACH, API-first delivery, and agents as a delivery target via an MCP server.

Gate → content addressable as data

Optimize

Phase 3 · Scale · ongoing · consultant + client

The measurement loop — AI Share of Voice and the "Can You Tell?" test feed gaps back into Model and Structure.

Gate → loop handed to Measurement

Before you start (entry)

Assess complete — entry point & roadmap phase agreed
Phase 2 scope & budget signed off; team named (architect, taxonomist, devs)
Repository / CMS direction chosen (or scoped to decide in Stage 2)
Editorial strategy clear enough to model against

Before you finish (exit)

Content model spec + taxonomy & metadata schema complete and implemented
Schema/semantic markup in place; knowledge graph where warranted
A representative slice authored modularly & migrated — pattern proven
Content addressable as data via API; MCP server with governed access where in scope
Governance hooks wired in (oversight modes, provenance, prompt library)
Measurement loop handed to the Optimize / Measurement run book

algomarketing

Run Book · Part 2 · The Pipeline (v0.1) · the core build of the engagement: Plan → Model → Structure → Govern → Deliver → Optimize. Governance detail in Part 4; measurement detail in the Measurement run book.