What this part delivers, and why
Assess told us where the client starts. The Pipeline is where we actually build. Six stages take content from freeform pages to a structured, semantic system that people and machines can use — and the order matters: you cannot skip ahead to agents and delivery before the model underneath them exists. Every stage carries its proven framework and the AI layer that now sits on top of it; the run book below is how a delivery team executes each one without reinventing the approach.
- 1 · Plan — agent-assisted ideation, content-gap audits and governed briefs. Requires the model to exist first.
- 2 · Model — content modelling, taxonomy, metadata, schema, the knowledge graph. The highest-leverage stage — most of the value lives here.
- 3 · Structure / Author — modular, componentised content and semantic HTML; AI drafting and auto-tagging on top.
- 4 · Govern — what governance touches the build. Kept light here; the detail lives in the Governance run book (Part 4).
- 5 · Deliver — headless / composable / MACH, API-first, and agents as a delivery target via MCP.
- 6 · Optimize — the measurement loop. Kept light here; the detail lives in the Measurement run book.
The pipeline as a system
One build, mapped three ways. The flagship map shows the six stages as connected nodes with the gates that sit between them; for each stage or gate it records what it takes in, what it ships, who owns it, and the bar it has to clear. Then walk it step by step, or read it as a People-vs-Process swimlane below.
Plan
Decide what content earns its place and why — the editorial plan, the gaps, the briefs — and set up the AI-assisted planning loop. Critically, this stage depends on Stage 2: agents can't usefully audit gaps or plan against a taxonomy that doesn't exist yet, so in practice Plan and the first pass of Model run hand-in-hand.
- Recap the Assess findings & the chosen entry point (15m)
- Audience, journeys and the editorial priorities for the period (45m)
- Content-gap audit — run the agent across the estate, review at scale (45m)
- Brief generation — turn priorities into structured briefs (40m)
- Agree the prompt library standard — RACE: Role, Action, Context, Expectations (15m)
- Assess findings & entry point
- Existing editorial strategy (if any)
- Draft taxonomy from Stage 2's first pass
- Editorial plan for the period
- Prioritised content-gap list
- Structured content briefs
- Governed prompt library (v1)
The first thing a client wants to do with AI is generate content. The first thing you should make them do is govern the prompts. A prompt that lives in someone's chat history is a one-off; the same prompt version-controlled and shared is an asset the whole team compounds on. Make that switch in week one or you'll be untangling it in month four.
# RACE prompt · content-gap audit · v1 · owner: strategist
ROLE: You are a content strategist auditing a structured estate.
ACTION: Compare the editorial plan against the live taxonomy and list every topic/audience/funnel-stage with no covering asset.
CONTEXT: Taxonomy = {topic_tree}. Editorial priorities = {priorities}. Content types and their metadata = {model_spec}.
EXPECTATIONS: Return a prioritised table: gap, why it matters, suggested content type, owner. Cite the taxonomy term. Flag any gap that needs a new content type; do not invent one.
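One way to keep a prompt like this governed rather than scattered is to store it as a versioned record and refuse to render it with missing context. The sketch below is illustrative only: the `GovernedPrompt` class and its field names are assumptions, not part of the playbook; the template text and its three placeholders come from the prompt above.

```python
from dataclasses import dataclass
from string import Formatter


@dataclass(frozen=True)
class GovernedPrompt:
    """One versioned entry in the prompt library (hypothetical structure)."""
    name: str
    version: str
    owner: str
    template: str

    def placeholders(self) -> set[str]:
        # Collect the {named} slots the template expects.
        return {name for _, name, _, _ in Formatter().parse(self.template) if name}

    def render(self, **values: str) -> str:
        # Fail loudly instead of sending a half-filled prompt to the model.
        missing = self.placeholders() - set(values)
        if missing:
            raise ValueError(f"missing context for: {sorted(missing)}")
        return self.template.format(**values)


GAP_AUDIT_V1 = GovernedPrompt(
    name="content-gap audit",
    version="v1",
    owner="strategist",
    template=(
        "ROLE: You are a content strategist auditing a structured estate.\n"
        "ACTION: Compare the editorial plan against the live taxonomy and list "
        "every topic/audience/funnel-stage with no covering asset.\n"
        "CONTEXT: Taxonomy = {topic_tree}. Editorial priorities = {priorities}. "
        "Content types and their metadata = {model_spec}.\n"
        "EXPECTATIONS: Return a prioritised table: gap, why it matters, "
        "suggested content type, owner. Cite the taxonomy term. Flag any gap "
        "that needs a new content type; do not invent one."
    ),
)
```

Because the record is frozen and carries its own version and owner, a change to the wording becomes a new entry (v2) that can be reviewed like any other change, which is the point of moving prompts out of chat history.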
Model
This is the highest-leverage stage in the entire playbook. Define the content model, taxonomy, metadata and schema — and, where it earns its place, a knowledge graph / semantic layer. This is where content stops being pages and becomes structured, machine-readable data. Get this right and every stage after it gets easier; get it wrong and you'll feel the drag for years. We work through Cleve Gibbon's three passes: Conceptual (types + high-level relationships) → Design (attributes, refined relationships) → Implementation (CMS-level detail).
- Inventory the content the strategy actually needs (not what exists today) (30m)
- Draft the content types and their relationships — entities, not strings (75m)
- Pressure-test: does each proposed type earn its keep, or is it a variant of another? (45m)
- Sketch the taxonomy spine & the metadata that every type carries (30m)
- Content model spec — every type, its fields, types-of-field, required/optional, relationships
- Taxonomy & metadata schema — the controlled vocabularies, applied consistently across the estate
- Schema / semantic markup — schema.org on key types so engines and AI understand meaning
- Knowledge graph / semantic layer — where the client's domain is relationship-rich, model it as a graph (SKOS / RDF / OWL); turn "strings into things"
- Map the model into the chosen repository (headless CMS / CCMS) at implementation level
- Editorial plan & content types needed
- Audit findings (ROT, current structure)
- Repository / CMS decision
- Content model spec (the key artifact)
- Taxonomy & metadata schema
- Schema/semantic markup plan
- Knowledge graph / semantic layer (where warranted)
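As an illustration of the schema / semantic markup deliverable, here is a minimal sketch that emits schema.org `FAQPage` JSON-LD from structured question/answer pairs. The `faq_jsonld` helper and the sample pair are assumptions for illustration; the `FAQPage`, `Question`, `Answer`, `mainEntity` and `acceptedAnswer` names are schema.org vocabulary.

```python
import json


def faq_jsonld(pairs: list[tuple[str, str]]) -> str:
    """Serialise (question, answer) pairs as schema.org FAQPage JSON-LD."""
    doc = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }
    return json.dumps(doc, indent=2)


# Hypothetical content, standing in for entries of the FAQ / Q&A type.
markup = faq_jsonld([("What is MCP?", "A protocol for governed agent access.")])
```

The markup can only be generated this mechanically because the question and answer already exist as separate fields; a freeform body would need parsing first.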
The single most common failure here isn't under-modelling, it's over-modelling. A team gets excited and ends up with forty content types nobody can hold in their head, half of which differ by a single field. Our rule: if two types share more than ~80% of their fields, they're one type with a variant flag. One client landed on fourteen content types — twelve earn their keep, and we keep threatening to kill the other two and never do, because someone always finds an edge case. Fourteen you can govern. Forty you cannot.
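The 80% rule can be checked mechanically during the pressure-test. The sketch below is one possible reading of it, not the playbook's definition: it measures overlap as shared fields divided by the combined field set (Jaccard similarity), and the "News item" type and `embargo_date` field are hypothetical.

```python
from itertools import combinations


def field_overlap(a: set[str], b: set[str]) -> float:
    """Shared fields as a fraction of the combined field set (assumed metric)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def merge_candidates(types: dict[str, set[str]], threshold: float = 0.8):
    """Pairs of types whose overlap exceeds the threshold."""
    return [
        (x, y, round(field_overlap(types[x], types[y]), 2))
        for x, y in combinations(types, 2)
        if field_overlap(types[x], types[y]) > threshold
    ]


TYPES = {
    "Article": {"title", "summary", "body", "author", "hero"},
    "News item": {"title", "summary", "body", "author", "hero", "embargo_date"},
    "Product": {"name", "spec", "price", "description"},
}

print(merge_candidates(TYPES))  # [('Article', 'News item', 0.83)]
```

Here the two types differ by a single field, so they surface as one type with a variant flag; the final call still belongs to the architect, since a pair can share field names and serve different purposes.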
The opposite trap is just as real: under-modelling, where everything is a "page" or an "article" with a freeform body. That's the blob problem in a new outfit — it ships fast and quietly defeats the entire point of the engagement, because AI retrieval can't get a clean chunk out of a freeform field. If the client pushes to "just ship something simple and structure it later," that later never comes. Structure it now.
Knowledge-graph grounding has been benchmarked at roughly 3× the factual accuracy of a baseline LLM (the data.world benchmark moved accuracy from 16% to 54%). It costs more to build than plain retrieval, so reserve the graph for the relationship-rich parts of the domain — but where it fits, this is the work that makes the AI layer actually trustworthy. This is also the clearest professional-services wedge: the modular content architect who owns ontologies and metadata is the specialist the market now needs.
Structure / Author
Turn the model into how content is actually authored: modular, componentised content, real semantic HTML, single-sourcing and reuse — so one piece of content can be assembled across many channels. Then layer AI on top for first-draft generation, variant generation and autonomous tagging. The principle that keeps this honest: write for machines and you get better content for humans — explicit, unambiguous, well-chunked content serves both.
- Define content type components & the authoring patterns for each (the spec below)
- Establish semantic HTML standards: each <h2> is an extractable answer unit, real lists and tables, no decorative markup
- Set up single-sourcing & reuse (conref-style) so there's one source of truth, not copy-paste
- Migrate / re-author a representative slice into the new structure (prove the pattern before scaling)
- Wire in the AI layer — draft & variant generation, autonomous metadata enrichment and auto-tagging — with human review
- Content model spec & taxonomy
- Structured briefs from Stage 1
- Existing content to migrate
- Modular component spec / content type definitions
- Semantic authoring standards
- A migrated, structured content slice
- AI drafting + auto-tagging workflow
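To make the "each <h2> is an extractable answer unit" standard concrete, the sketch below splits a page into one unit per `<h2>` using Python's standard `html.parser`. It is a simplified illustration: the class and function names are assumptions, content before the first `<h2>` is ignored, and inline tags are treated as word breaks.

```python
from html.parser import HTMLParser


class AnswerUnitSplitter(HTMLParser):
    """Collects the text under each <h2> as one retrievable unit."""

    def __init__(self) -> None:
        super().__init__()
        self.units: list[dict[str, str]] = []
        self._in_h2 = False

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._in_h2 = True
            self.units.append({"heading": "", "text": ""})

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_h2 = False

    def handle_data(self, data):
        if not self.units:
            return  # nothing before the first <h2> belongs to a unit
        key = "heading" if self._in_h2 else "text"
        self.units[-1][key] += data + " "


def split_answer_units(html: str) -> list[dict[str, str]]:
    parser = AnswerUnitSplitter()
    parser.feed(html)
    parser.close()
    # Collapse whitespace so each unit is a clean chunk.
    return [{k: " ".join(v.split()) for k, v in u.items()} for u in parser.units]


page = """
<h2>What is MCP?</h2>
<p>A protocol for agent access.</p>
<ul><li>Scoped</li><li>Audited</li></ul>
<h2>Who owns the model?</h2>
<p>The architect.</p>
"""
units = split_answer_units(page)
```

If a heading is decorative rather than a real question or topic, the unit it produces is useless for retrieval, which is why the standard has to be enforced at authoring time.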
Authors don't resist structure because they're difficult — they resist it because the first structured-authoring interface they're handed feels like filling in a tax return. Spend real effort on the authoring experience: sensible field order, helper text, sane defaults. A model that's technically perfect and miserable to write into gets quietly worked around, and then you're back to blobs.
Govern
Governance is its own run book — see the Governance run book · Part 4 for the full layered control stack, oversight modes, disclosure and provenance. Here we only note what governance touches the build, so the pipeline doesn't ship something that governance later has to unpick.
- The governed prompt library started in Stage 1 — version-controlled, reusable, not scattered
- Machine-readable brand & editorial rules — so AI conforms by default, set up alongside the model in Stage 2
- Risk-tiering & oversight modes wired into the CMS in Stage 3 — Agent-assisted → Human-in-the-loop → Human-on-the-loop → Human-out-of-the-loop
- Provenance & disclosure plumbing (C2PA, metadata) on the delivery layer in Stage 5
- Least-privilege agent access & audit logs before anything in Stage 5 runs autonomously
- Governance run book (Part 4) outputs
- The model, components & delivery layer
- Governance hooks built into the system
- Oversight modes mapped to content types
Deliver
Make the content addressable as data — delivered to any channel through APIs, on a headless / composable / MACH architecture — and stand up the newest delivery target: agents. The contract for agents is MCP (Model Context Protocol): content lives behind a server, the agent reads the schema, retrieves and acts through a standard interface. Reuse compounds here — structured single-sourcing alone can cut translation/localisation cost by 30–50% across channels.
- Confirm the headless / composable architecture & the API surface for each channel
- Expose content as structured data via API — the model from Stage 2 becomes the contract
- Stand up an MCP server so external agents get governed, scoped, structured access
- Bake governance into the access layer — least-privilege agent identities, gateways, server vetting (MCP is a new attack surface)
- Verify omnichannel assembly — one source, many channels, no re-keying
- Structured content & component library
- Repository / headless CMS
- Governance access rules (Stage 4)
- Delivery / API & channel spec
- Live API-first delivery
- MCP server for agent access
The seductive mistake of 2026: building the agent and the MCP server before the model exists. A client will ask to "just put an MCP server in front of the current CMS." Don't. If the content underneath is unstructured, all you've done is give an agent a fast, governed pipe to your mess — it will retrieve blobs and hallucinate confidently. The MCP server is only as good as the model behind it. Stage 2 first, every time.
// content lives behind a server; the agent reads the schema and acts
tool search_content(query, type?, taxonomy?)  // scoped retrieval over the model
tool get_entity(id)                           // returns structured fields, not a blob
tool list_types()                             // the content model = the contract
guardrails:
  identity: least-privilege, scoped per agent
  access: published-status only
  audit: every call logged · server vetted
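The tool surface and guardrails above can be prototyped before any server is stood up. The sketch below is an in-memory stand-in, not an MCP SDK implementation: the `Entity` and `ContentTools` classes, the sample data and the `support-agent` identity are all hypothetical. It shows the three tools enforcing a per-agent scope, published-status-only access and an audit entry on every call.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Entity:
    id: str
    type: str
    status: str                      # "published" or "draft"
    fields: dict[str, str]
    taxonomy: set[str] = field(default_factory=set)


class ContentTools:
    """In-memory stand-in for the three tools; not the MCP SDK."""

    def __init__(self, entities: list[Entity], scopes: dict[str, set[str]]):
        self._entities = entities
        self._scopes = scopes        # agent id -> content types it may read
        self.audit_log: list[tuple[str, str]] = []

    def _visible(self, agent: str) -> list[Entity]:
        # Guardrails: least-privilege scope per agent, published content only.
        allowed = self._scopes.get(agent, set())
        return [e for e in self._entities
                if e.status == "published" and e.type in allowed]

    def list_types(self, agent: str) -> list[str]:
        self.audit_log.append((agent, "list_types"))
        return sorted({e.type for e in self._visible(agent)})

    def get_entity(self, agent: str, entity_id: str) -> Optional[dict[str, str]]:
        self.audit_log.append((agent, "get_entity"))
        for e in self._visible(agent):
            if e.id == entity_id:
                return {"id": e.id, "type": e.type, **e.fields}
        return None

    def search_content(self, agent: str, query: str,
                       type: Optional[str] = None,
                       taxonomy: Optional[str] = None) -> list[str]:
        self.audit_log.append((agent, "search_content"))
        q = query.lower()
        return [
            e.id for e in self._visible(agent)
            if (type is None or e.type == type)
            and (taxonomy is None or taxonomy in e.taxonomy)
            and any(q in value.lower() for value in e.fields.values())
        ]


store = ContentTools(
    entities=[
        Entity("faq-1", "FAQ", "published",
               {"question": "What is MCP?", "answer": "A protocol for agent access."},
               {"delivery"}),
        Entity("faq-2", "FAQ", "draft",
               {"question": "What is MACH?", "answer": "An architecture style."}),
        Entity("prod-1", "Product", "published", {"name": "Widget"}),
    ],
    scopes={"support-agent": {"FAQ"}},
)
```

With this data, `store.search_content("support-agent", "what is")` returns only `faq-1`: the draft FAQ and the out-of-scope Product never reach the agent. A real deployment would put the same checks behind an MCP server and a vetted gateway.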
Optimize
Close the loop: measure whether the engineered system is actually performing, and feed the gaps back into Model and Structure. Optimize is its own run book — see the Measurement run book for the full scorecard, sampling method and cadence. Here we only flag the two signature AI-era outputs and point to where they're detailed.
- AI Share of Voice — your brand's citations ÷ total category citations, sampled 30+ times per prompt across the engines that matter, because AI citations swing month to month. Where AI doesn't yet know you becomes the next thing to model.
- The "Can You Tell?" test — the honest quality bar: put engineered content head-to-head with human-written and see if a panel can spot the difference. Near a 50% guess rate, it's cleared the bar; if they can tell, the giveaways feed straight back to Structure and Model.
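The AI Share of Voice ratio is simple to compute once sampling is in place. The sketch below assumes each sampled response has already been reduced to a list of cited brands; the function name, the input shape and the brand names are illustrative, and the 30-sample floor mirrors the "30+ times per prompt" guidance.

```python
MIN_SAMPLES = 30  # the "30+ times per prompt" floor from the guidance above


def ai_share_of_voice(samples: list[list[str]], brand: str) -> float:
    """Brand citations divided by total category citations across samples.

    `samples` holds one list of cited brands per sampled AI response.
    """
    if len(samples) < MIN_SAMPLES:
        raise ValueError(f"need at least {MIN_SAMPLES} samples, got {len(samples)}")
    total = sum(len(cited) for cited in samples)
    if total == 0:
        return 0.0
    ours = sum(cited.count(brand) for cited in samples)
    return ours / total


# Hypothetical run: 30 samples of one prompt, 60 citations in total.
samples = [["Acme", "Globex"]] * 10 + [["Globex", "Initech"]] * 20
share = ai_share_of_voice(samples, "Acme")  # 10 of 60 citations
```

Tracking the ratio per prompt and per engine, rather than one blended number, keeps visible which topics the engines do not yet associate with the brand.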
- Live, delivered content system
- Analytics + AI citation sampling
- Measurement loop feeding back to Stages 2–3
- (Full scorecard → Measurement run book)
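For the "Can You Tell?" test, "near a 50% guess rate" can be made checkable with an exact binomial test against chance. This is an assumption about how to score the panel, not the Measurement run book's method; the function names and the 0.05 threshold are illustrative.

```python
from math import comb


def two_sided_p(correct: int, total: int) -> float:
    """Exact two-sided binomial p-value against a 50% guess rate."""
    low = sum(comb(total, i) for i in range(0, correct + 1))
    high = sum(comb(total, i) for i in range(correct, total + 1))
    return min(1.0, 2 * min(low, high) / 2 ** total)


def panel_can_tell(correct: int, total: int, alpha: float = 0.05) -> bool:
    """True when the panel's hit rate differs from chance at level alpha."""
    return two_sided_p(correct, total) < alpha


print(panel_can_tell(50, 100))  # False: indistinguishable from guessing
print(panel_can_tell(70, 100))  # True: the panel can spot the difference
```

A small panel will rarely reach significance either way, so a "cannot tell" result from a handful of judgments is weak evidence; the number of judgments matters as much as the hit rate.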
RACI & effort summary
Who does what across the six stages. R Responsible · A Accountable · C Consulted · I Informed. The model stage is where the architect leads and most of the value concentrates.
| Stage | Sponsor | Content lead | Modular architect | Devs / Martech | Consultant |
|---|---|---|---|---|---|
| 1 · Plan | I | R | C | I | A |
| 2 · Model | I | C | R | C | A |
| 3 · Structure / Author | I | C | A | R | C |
| 4 · Govern | A | C | C | C | R |
| 5 · Deliver | I | I | C | R | A |
| 6 · Optimize | C | R | C | I | A |
| Month | Focus | Stages in flight |
|---|---|---|
| Month 1 | Plan & conceptual modelling — editorial plan, gaps, briefs, first content types | 1 · 2 |
| Month 2 | Model design — full content model spec, taxonomy & metadata schema; begin structuring | 2 · 3 |
| Month 3 | Model implementation & knowledge graph; component spec & semantic authoring standards | 2 · 3 · 4 |
| Month 4 | Migrate a structured slice; wire AI drafting/tagging; begin delivery architecture | 3 · 4 · 5 |
| Month 5 | API-first delivery live; stand up MCP server with governed access | 4 · 5 |
| Month 6 | Omnichannel verification; hand into Scale & the measurement loop | 5 · 6 |
A second read on the pipeline: the People lane is the human craft each stage demands; the Process lane is the system / artifact it produces.
The artifacts you use and leave behind
Three core templates are spelled out below — the content model spec, the taxonomy & metadata schema, and the delivery / API & channel spec. The full set produced in this part is indexed at the end.
One row per content type — the blueprint
| Content type | Core fields (type) | Relationships | Metadata / taxonomy | Reuse |
|---|---|---|---|---|
| Article | title (text), summary (text), body (rich/modular), author (ref), hero (ref:image) | → author, → topic, → related articles | topic, audience, funnel stage, date | body chunks single-sourced |
| Product | name (text), spec (structured), price (number), description (modular) | → category, → related products, → docs | category, use case, region | spec reused across channels |
| FAQ / Q&A | question (text), answer (rich), entity (ref) | → product, → topic, → article | topic, intent, last-reviewed | answer = an AI answer unit |
| Author / person | name (text), bio (text), credentials (list) | → articles, → topics of expertise | expertise area | referenced, never copied |
Rule of thumb: if two proposed types share >80% of their fields, collapse them into one type with a variant flag. Capture required vs optional per field. Aim for the smallest set of types that covers the strategy — the team has to hold them in their head.
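A row of the spec becomes enforceable once it is captured as data. The sketch below encodes the FAQ / Q&A row with required and optional fields and checks a draft entry against it. The class names are hypothetical, and treating `entity` as optional is an assumption made for illustration; the field names and kinds come from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    name: str
    kind: str            # e.g. "text", "rich", "ref"
    required: bool = True


@dataclass(frozen=True)
class ContentType:
    name: str
    fields: tuple[FieldSpec, ...]

    def missing_required(self, entry: dict[str, object]) -> list[str]:
        # Required fields that are absent or empty in the entry.
        return [f.name for f in self.fields
                if f.required and not entry.get(f.name)]

    def unknown_fields(self, entry: dict[str, object]) -> list[str]:
        # Fields the model does not define: a sign of freeform drift.
        known = {f.name for f in self.fields}
        return sorted(set(entry) - known)


FAQ = ContentType(
    name="FAQ / Q&A",
    fields=(
        FieldSpec("question", "text"),
        FieldSpec("answer", "rich"),
        FieldSpec("entity", "ref", required=False),  # optionality is assumed
    ),
)

draft = {"question": "What is MCP?", "notes": "freeform scratch text"}
print(FAQ.missing_required(draft))  # ['answer']
print(FAQ.unknown_fields(draft))    # ['notes']
```

The same two checks can run at authoring time and at import time during migration, so gaps show up before content reaches the delivery layer.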
Controlled vocabularies — entities, not strings
- Topic taxonomy — the controlled subject tree (SKOS-style: broader/narrower/related), one term per concept, with synonyms mapped.
- Audience / persona — the fixed list of who content is for; no free-text variants.
- Funnel / journey stage — awareness → consideration → decision → retention (or the client's equivalent).
- Content format — the rendered shape (guide, FAQ, comparison, case study…), distinct from content type.
- Lifecycle metadata — owner, created, last-reviewed, review cadence, status, provenance (human/AI-assisted + approver).
- Entity references — products, people, locations as linked entities (RDF/OWL where a graph is warranted), so AI gets relationships not just keywords.
Every content type in the content model spec (Template 1) must declare which of these vocabularies it carries, and which are required. Apply them consistently across the whole estate: inconsistent tagging is the gap that defeats retrieval.
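A minimal sketch of "one term per concept, with synonyms mapped" follows. The attribute names mirror the SKOS properties `prefLabel`, `altLabel` and `broader`; the two concepts, their labels and the helper functions are hypothetical, and a production taxonomy would live in a taxonomy tool or graph store rather than a Python list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Concept:
    id: str
    pref_label: str                      # skos:prefLabel
    alt_labels: tuple[str, ...] = ()     # skos:altLabel (synonyms)
    broader: Optional[str] = None        # skos:broader (parent concept id)


CONCEPTS = [
    Concept("content-platforms", "Content platforms"),
    Concept("headless-cms", "Headless CMS", ("headless", "API-first CMS"),
            broader="content-platforms"),
]
BY_ID = {c.id: c for c in CONCEPTS}
LABEL_INDEX = {
    label.lower(): c.id
    for c in CONCEPTS
    for label in (c.pref_label, *c.alt_labels)
}


def resolve(label: str) -> Optional[str]:
    """Map any known label or synonym to its single concept id."""
    return LABEL_INDEX.get(label.strip().lower())


def broader_chain(concept_id: str) -> list[str]:
    """Walk skos:broader links from a concept up to the root."""
    chain = []
    parent = BY_ID[concept_id].broader
    while parent is not None:
        chain.append(parent)
        parent = BY_ID[parent].broader
    return chain
```

Tagging with `resolve("API-first CMS")` and `resolve("headless")` yields the same concept id, so retrieval sees one entity instead of two strings, and `broader_chain` lets a query on the parent topic pick up content tagged with the narrower term.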
How content reaches every channel — and every agent
| Channel / target | Delivery method | What it pulls | Governance |
|---|---|---|---|
| Website / DXP | Headless API (REST/GraphQL) | Assembled components by type + taxonomy | Published-status only |
| Mobile / app | Same API, different assembly | Same source, channel-shaped variant | Published-status only |
| AI agents | MCP server | Schema + scoped retrieval over the model | Least-privilege, scoped identity, audit log |
| Localisation | API + translation memory | Single-source content, reused (30–50% cost saving) | Locale approver per market |
One source of truth, many delivery shapes. The MCP row is the 2026 addition: agents are a delivery target like any channel — but with their own access-control and audit requirements.
Entry & exit gates
The quality bar that says this part is genuinely ready to start, and genuinely finished. The exit gate is deliberately strict — a half-built model is worse than none.
A third view of the pipeline: stages stacked as a journey, with the leverage marker on the stage that earns the budget.
Entry gate
- Assess complete: entry point & roadmap phase agreed
- Phase 2 scope & budget signed off; team named (architect, taxonomist, devs)
- Repository / CMS direction chosen (or scoped to decide in Stage 2)
- Editorial strategy clear enough to model against
Exit gate
- Content model spec + taxonomy & metadata schema complete and implemented
- Schema/semantic markup in place; knowledge graph where warranted
- A representative slice authored modularly & migrated — pattern proven
- Content addressable as data via API; MCP server with governed access where in scope
- Governance hooks wired in (oversight modes, provenance, prompt library)
- Measurement loop handed to the Optimize / Measurement run book