What this part delivers, and why
Most content teams measure only output — how much they published. This run book sets up something better: a scorecard that measures the whole system — the foundations that make content work (leading indicators we can move this quarter) and the payoff they produce (lagging indicators the business cares about). We choose the right few metrics, capture a baseline, wire up the tooling, and hand over a live dashboard and a cadence that survives after we leave.
The five steps
- 1 · Choose the metric set — pick the right few across five categories; don't measure everything.
- 2 · Baseline the current state — capture starting numbers so improvement is provable.
- 3 · Instrument the tooling — analytics, automated quality scoring, AI Share-of-Voice sampling, and the "Can You Tell?" test.
- 4 · Build the dashboard & cadence — what's reviewed weekly / monthly / quarterly, and who sees it.
- 5 · Tie to ROI & the business case — connect content to pipeline, not just clicks.
The five metric categories
- Efficiency & throughput (leading) — time-to-publish, production velocity, reuse rate, revision cycles, cost per asset.
- Quality & trust (leading) — automated quality score, accessibility, factual accuracy, brand-voice conformance, the "Can You Tell?" pass rate.
- Audience & business outcomes (lagging) — engagement, conversion, pipeline influenced, organic + AI-referred traffic, content ROI.
- AI visibility (lagging) — AI Share of Voice (entity & citation), AI referral traffic, citations per engine, presence on cited sources.
- Capability & maturity (leading) — the Part 1 maturity score re-run quarterly, % of content AI-ready, governance compliance, content-debt trend.
A worked example of the dashboard this run book stands up — leading indicators (teal) feeding lagging outcomes (purple). Numbers are illustrative.
Leading indicators are the foundations you can move this quarter; lagging indicators are the business payoff that follows. The scorecard tracks both so you can act before the lagging number is already late.
Every number on the scorecard traces back to an activity the team controls. Read top-down: what we do produces what we ship, which moves what the business cares about.
Choose the metric set
Pick the right handful of metrics across the five categories — enough to measure the system, few enough that the client will actually maintain them. Tie each chosen metric to a decision someone makes, so nothing is collected for its own sake.
Activities
- Recap the five categories and why leading + lagging together (15m)
- For each category, shortlist 2–4 candidate metrics against client goals (50m)
- Pressure-test each: can we source it? does it drive a decision? (25m)
- Cut the list — agree the minimal viable scorecard (20m)
- Assign an owner and a target to each survivor (10m)
Inputs
- Client goals & the Part 1 findings
- The five-category metric menu
- What the board already asks about
Outputs
- Agreed metric set (the minimal scorecard)
- Owner + target per metric
The single most useful thing you do in this part is kill vanity metrics. "Pageviews" and "assets published" feel like progress and steer nothing. Ask of every candidate: who changes what they do when this number moves? If nobody, strike it — a five-metric scorecard people read beats a thirty-metric one nobody opens.
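The test in the paragraph above can be sketched as a simple filter. This is a minimal illustration, not a prescribed tool: the `Candidate` fields, the `pressure_test` helper and the example decisions are hypothetical names chosen for the sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    name: str
    source: Optional[str]    # where the number comes from; None if it cannot be sourced
    decision: Optional[str]  # who changes what when it moves; None if nobody does


def pressure_test(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only metrics that can be sourced and that drive a decision."""
    return [c for c in candidates if c.source and c.decision]


# Hypothetical shortlist; the decisions are illustrative, not client findings.
shortlist = [
    Candidate("Time-to-publish", "CMS / workflow tool", "Content ops rebalances the review queue"),
    Candidate("Pageviews", "Analytics", None),  # vanity metric: nobody acts on it
    Candidate("AI Share of Voice", "SoV tool", "Content lead picks the next topics to engineer"),
]

survivors = pressure_test(shortlist)
print([c.name for c in survivors])
```

Recording the decision as free text keeps the question "who changes what?" visible next to every metric that survives the cut.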
Baseline the current state
Capture a starting number for every chosen metric — and write down exactly how it was measured — so that any improvement later is provable rather than asserted. No baseline, no proof.
Activities
- Pull a current value for each metric from its agreed source
- Record the measurement method beside each — so the next reading is comparable
- Take the first AI Share-of-Voice reading as a baseline (sampled, not single-shot)
- Run a first "Can You Tell?" panel to set the quality starting point
- Note any metric you can't baseline yet — that gap is a Step 3 task
Inputs
- Agreed metric set (Step 1)
- Analytics & CMS access
Outputs
- Baseline snapshot (value + method per metric)
- List of metrics not yet measurable
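One way to hold the two outputs of this step is a single record per metric that keeps the value and its measurement method together. The sketch below is an assumption about layout only: `BaselineEntry`, `split_baseline`, the dates and the values are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class BaselineEntry:
    metric: str
    value: Optional[float]  # None means the metric cannot be baselined yet
    method: str             # exactly how the value was (or will be) measured
    source: str
    measured_on: date


def split_baseline(entries: List[BaselineEntry]) -> Tuple[List[BaselineEntry], List[str]]:
    """Separate the baseline snapshot from the list of not-yet-measurable metrics."""
    snapshot = [e for e in entries if e.value is not None]
    gaps = [e.metric for e in entries if e.value is None]
    return snapshot, gaps


# Illustrative values only.
entries = [
    BaselineEntry("Time-to-publish", 14.0, "Median days, idea approved to live, last 90 days",
                  "CMS / workflow tool", date(2025, 1, 6)),
    BaselineEntry("AI Share of Voice", None, "30+ samples per prompt per engine, averaged",
                  "SoV tool", date(2025, 1, 6)),
]

snapshot, gaps = split_baseline(entries)
print(gaps)
```

The `gaps` list is the hand-off to Step 3: every name in it needs instrumenting before it can appear on the scorecard.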
Instrument the tooling
Make every chosen metric collectible on a repeatable schedule without manual heroics — analytics tagged, automated quality scoring running, the AI Share-of-Voice sampling set up, and the "Can You Tell?" test ready to re-run.
Activities
- Analytics — confirm tagging captures engagement, conversion and AI-referred traffic; segment AI referrers
- Automated quality scoring — wire clarity / consistency / tone / compliance checks into the pipeline so quality is scored, not eyeballed
- AI Share-of-Voice sampling — set up the prompt set, the 30+ samples per prompt across ChatGPT, Google AI, Perplexity and Copilot, and capture entity vs citation results
- "Can You Tell?" test — stand up the swipe panel, the snippet pool, and the scoring sheet so it re-runs each cycle
- Document the run schedule and owner for each instrument
Inputs
- Metric set + baseline gaps
- Analytics, CMS & SoV tooling access
Outputs
- Instrumented, repeatable data sources
- SoV sampling protocol live
- "Can You Tell?" panel ready to re-run
Never treat a single AI Share-of-Voice reading as signal. AI answers vary run to run — the research puts month-to-month swing around 40–60% — so one query tells you nothing. Sample each prompt 30+ times across the engines and average it before anyone draws a conclusion, or you'll report noise as a trend.
Whatever can't be collected on a schedule won't survive past the engagement. If a metric needs someone to hand-pull a spreadsheet every Friday, it dies the first busy week. Automate it or cut it — a slightly cruder number that arrives every cycle beats a perfect one that stops.
Build the dashboard & cadence
Turn the instrumented metrics into one live scorecard, and define the review rhythm — what's looked at how often, and who sees it. The cadence is what makes the dashboard a habit rather than a one-off chart.
Activities
- Weekly — efficiency & quality leading indicators, for the content team
- Monthly — quality trend, business outcomes and AI visibility, for the content lead + sponsor
- Quarterly — re-score maturity (the Part 1 instrument), review the full scorecard and targets, for the exec sponsor
- Confirm who owns each review and where the dashboard lives
Inputs
- Instrumented data sources (Step 3)
- Baseline values (Step 2)
Outputs
- Live scorecard / dashboard
- Reporting cadence + named owners
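The review rhythm above can be written down as a small configuration so that it is checkable rather than remembered. This is a sketch under assumptions: the dictionary layout and the helpers `reviews_for` and `unreviewed` are hypothetical, while the audiences and categories come from the cadence list.

```python
# Category names as used in the five-category metric menu.
EFFICIENCY = "Efficiency & throughput"
QUALITY = "Quality & trust"
OUTCOMES = "Audience & business outcomes"
AI_VISIBILITY = "AI visibility"
MATURITY = "Capability & maturity"
ALL_CATEGORIES = [EFFICIENCY, QUALITY, OUTCOMES, AI_VISIBILITY, MATURITY]

# Hypothetical layout: one entry per review, with its audience and scope.
CADENCE = {
    "weekly": {"audience": "content team", "categories": [EFFICIENCY, QUALITY]},
    "monthly": {"audience": "content lead + sponsor",
                "categories": [QUALITY, OUTCOMES, AI_VISIBILITY]},
    "quarterly": {"audience": "exec sponsor", "categories": ALL_CATEGORIES},
}


def reviews_for(category):
    """Return the cadences at which a category is reviewed."""
    return [name for name, view in CADENCE.items() if category in view["categories"]]


def unreviewed():
    """Categories that no cadence covers; this should stay empty."""
    covered = {c for view in CADENCE.values() for c in view["categories"]}
    return [c for c in ALL_CATEGORIES if c not in covered]


print(reviews_for(AI_VISIBILITY))
print(unreviewed())
```

A check like `unreviewed()` is a cheap guard against a category dropping out of every review when the cadence is edited later.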
A worked example of the trend view in the live scorecard: each metric is plotted over eight cycles against its target. Numbers are illustrative.
Tie to ROI & the business case
Connect the lagging outcomes to pipeline and revenue, not just clicks — so the scorecard answers the question the board actually asks. This is where measurement becomes a business case the sponsor can defend.
Activities
- Map the chain from content → engagement → conversion → pipeline influenced → revenue
- Attach efficiency gains — reuse and localisation savings cut cost per asset (structured reuse alone can cut translation cost ~30–50%)
- Express ROI as value ÷ cost, with honest assumptions stated
- Set the north star: move up one maturity level a year while AI Share of Voice climbs against named competitors
- Package it into the readout for the sponsor
Inputs
- Live scorecard + baseline
- Pipeline / revenue data from the client
Outputs
- Content-to-pipeline ROI model
- Business-case readout
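A minimal sketch of the content-to-pipeline ROI model, following the value ÷ cost form used in this step. The function names, the win rate, the attribution share and every figure are assumptions made for illustration; the real model should state its own.

```python
def attributed_revenue(pipeline_influenced: float, win_rate: float,
                       attribution_share: float) -> float:
    """Revenue credited to content: influenced pipeline x win rate x share credited to content."""
    return pipeline_influenced * win_rate * attribution_share


def content_roi(value: float, cost: float) -> float:
    """ROI expressed as value / cost."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return value / cost


# Illustrative figures only; state the real assumptions in the readout.
revenue = attributed_revenue(pipeline_influenced=1_200_000, win_rate=0.25,
                             attribution_share=0.20)
efficiency_savings = 15_000  # e.g. reuse and localisation savings
content_cost = 50_000

roi = content_roi(revenue + efficiency_savings, content_cost)
print(f"ROI: {roi:.2f}x")
```

Some finance teams define ROI as (value − cost) ÷ cost instead of value ÷ cost, so the readout should say which form it uses.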
The "Can You Tell?" pass bar is ~50% — content the panel can't reliably tell apart from human writing has cleared the bar. But treat it as one signal, not proof of quality on its own. A piece can read as human and still be wrong, off-brand or useless; pair it with the factual-accuracy and brand-voice scores before you call anything good.
RACI & effort summary
Who does what across the part. R Responsible · A Accountable · C Consulted · I Informed.
| Activity | Sponsor | Content lead | Analytics / Martech | Lead consultant | Analyst |
|---|---|---|---|---|---|
| Choose the metric set | C | C | C | R | C |
| Baseline current state | I | C | C | A | R |
| Instrument tooling | I | I | R | A | R |
| Dashboard & cadence | C | C | C | R | R |
| Tie to ROI & business case | A | C | I | R | C |
Effort by week, in consultant days (about 9.5 in total).
| Week | Focus | Consultant days |
|---|---|---|
| Week 1 | Choose metric set, start baselining | ~2.5 |
| Week 2 | Finish baseline, instrument tooling | ~3 |
| Week 3 | Dashboard & cadence, start ROI model | ~2.5 |
| Week 4 | Business case, readout, handoff | ~1.5 |
The artifacts you use and leave behind
Four core templates are spelled out below; the full set produced in this part is indexed at the end.
One row per metric — so it stays measurable
| Metric | Formula / definition | Source | Cadence | Owner | Target |
|---|---|---|---|---|---|
| Time-to-publish | Days from idea approved to live | CMS / workflow tool | Weekly | Content ops | ↓ |
| Content reuse rate | Reused components ÷ total components × 100 | CMS | Monthly | Content lead | ↑ |
| Automated quality score | Clarity + consistency + tone + compliance, scored | Quality tool | Weekly | Analyst | 80+ |
| "Can You Tell?" rate | % panel guesses correct (≈50% = pass) | Swipe panel | Quarterly | Content lead | ≈50% |
| Pipeline influenced | Revenue of deals content touched | CRM / analytics | Monthly | Sponsor | ↑ |
| AI Share of Voice | Brand citations ÷ total category citations × 100 | SoV tool | Monthly | Analyst | ↑ vs comp. |
| Maturity level | Part 1 scorecard average, 1–5 | Quarterly re-score | Quarterly | Consultant | +1 / yr |
Keep the formula and source explicit — it's the only way a later reading is comparable to the baseline. One owner per row, always.
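As a worked instance of one row, the content reuse rate formula from the table can be written so the definition cannot drift between readings. The function name and the component counts are hypothetical.

```python
def reuse_rate(reused_components: int, total_components: int) -> float:
    """Content reuse rate: reused components / total components x 100."""
    if total_components <= 0:
        raise ValueError("total_components must be positive")
    return 100 * reused_components / total_components


# Illustrative counts: 120 reused components out of 400 in the CMS.
print(reuse_rate(120, 400))  # 30.0
```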
Turning a noisy signal into a defensible number
- Formula — your brand's citations ÷ total citations in your category × 100.
- Prompts — a fixed set of buyer-intent prompts for your category; version them so the set stays constant run to run.
- Engines — sample across ChatGPT, Google AI, Perplexity and Copilot (4–5 engines); report per engine and blended.
- Sample size — 30+ samples per prompt per engine, then average. AI answers swing 40–60% month to month, so one reading is noise.
- Entity SoV — is yours the brand the AI names as the answer?
- Citation SoV — are you cited as a source it links to? Track both, benchmarked against named competitors.
- What to do with gaps — topics/entities where AI doesn't know you feed straight back into the Model & Structure stages as the next content to engineer. That's how the loop closes.
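The protocol above can be sketched for a single prompt as follows. Treat it as an illustration under assumptions: the function names are hypothetical, the blended figure is an unweighted mean across engines (the protocol does not specify a weighting), and the citation counts are synthetic.

```python
from statistics import mean


def share_of_voice(brand_citations: int, total_citations: int) -> float:
    """Brand citations / total category citations x 100 for one sampled answer."""
    if total_citations == 0:
        return 0.0
    return 100 * brand_citations / total_citations


def sov_by_engine(samples, min_samples=30):
    """Average SoV per engine for one prompt, plus an unweighted blend.

    samples maps engine name -> list of (brand_citations, total_citations),
    one tuple per sampled answer.
    """
    per_engine = {}
    for engine, runs in samples.items():
        if len(runs) < min_samples:
            raise ValueError(f"{engine}: {len(runs)} samples, need {min_samples}+")
        per_engine[engine] = mean([share_of_voice(b, t) for b, t in runs])
    blended = mean(list(per_engine.values()))
    return per_engine, blended


# Synthetic counts for one prompt, 30 samples per engine.
samples = {
    "ChatGPT": [(1, 5), (3, 5)] * 15,  # answers alternate between 20% and 60%
    "Perplexity": [(1, 4)] * 30,       # steady 25%
}

per_engine, blended = sov_by_engine(samples)
print(per_engine, blended)
```

The synthetic ChatGPT runs alternate between 20% and 60%, which is why a single reading would mislead while the 30-sample average does not. Running the same function separately on entity mentions and on source citations gives the two SoV figures.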
The blind human-vs-machine quality bar
- Panel — internal reviewers or, better, a target-audience panel; the closer to the real reader, the more honest the result.
- Sample — interleave engineered/AI-assisted snippets with genuinely human-written ones; one snippet at a time, swipe "AI" or "Human".
- Pass bar — ≈50% correct-guess rate, i.e. indistinguishable from human. Well above that, the panel can tell; the content still reads as machine-made.
- What to do with results — where the panel reliably spots the machine, that's a precise signal that feeds back into the Structure and Model stages.
- One signal only — passing means it reads human, not that it's accurate, on-brand or useful. Always pair with the factual-accuracy and brand-voice scores.
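Scoring the panel can be sketched like this. The rate is the share of correct guesses, as in the metric definition; the 10-point tolerance around 50% is a hypothetical threshold chosen for the sketch, not a figure from this run book, and a real panel would want enough guesses for the rate to be stable.

```python
def can_you_tell_rate(guesses) -> float:
    """Percentage of panel guesses that were correct.

    guesses is a list of (guess, truth) pairs, each value "AI" or "Human".
    """
    if not guesses:
        raise ValueError("no guesses recorded")
    correct = sum(1 for guess, truth in guesses if guess == truth)
    return 100 * correct / len(guesses)


def passes(rate: float, tolerance: float = 10.0) -> bool:
    """Pass when the panel does no better than about chance (50%).

    tolerance is an assumed margin, in percentage points.
    """
    return rate <= 50.0 + tolerance


# Synthetic panel: 20 guesses, 11 of them correct.
guesses = [("AI", "AI")] * 11 + [("AI", "Human")] * 9
rate = can_you_tell_rate(guesses)
print(rate, passes(rate))
```

A pass here only says the content reads as human; it still has to be paired with the factual-accuracy and brand-voice scores.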
What the live scorecard must show
- The five categories grouped — leading (teal) above lagging (purple)
- Each metric: current value, baseline, target, and trend arrow
- AI Share of Voice broken into entity vs citation, vs named competitors
- "Can You Tell?" rate shown alongside accuracy & brand-voice, never alone
- Maturity level (1–5) with the quarterly re-score date
- A view per audience — team (weekly), lead+sponsor (monthly), exec (quarterly)
- Owner and last-refreshed date visible on every panel
- The north-star line: maturity ↑ one level / year while SoV climbs
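A sketch of what one dashboard panel might carry, based on the checklist above. The `Panel` fields, the helper names, the text layout and the sample values are assumptions; the point is that the trend arrow and the "improving" judgement are separate, because for a metric such as time-to-publish a falling number is good.

```python
from dataclasses import dataclass


@dataclass
class Panel:
    metric: str
    baseline: float
    previous: float       # value at the last review cycle
    current: float
    target: float
    higher_is_better: bool
    owner: str
    last_refreshed: str   # ISO date string


def trend_arrow(p: Panel) -> str:
    """Direction of travel since the previous cycle."""
    if p.current > p.previous:
        return "↑"
    if p.current < p.previous:
        return "↓"
    return "→"


def improving(p: Panel) -> bool:
    """True when the metric has moved the right way since the baseline."""
    delta = p.current - p.baseline
    return delta > 0 if p.higher_is_better else delta < 0


def render(p: Panel) -> str:
    return (f"{p.metric}: {p.current:g} {trend_arrow(p)} "
            f"(baseline {p.baseline:g}, target {p.target:g}) | "
            f"{p.owner}, refreshed {p.last_refreshed}")


# Illustrative values only.
panel = Panel("Time-to-publish (days)", 14, 11, 10, 7, False, "Content ops", "2025-03-03")
print(render(panel), improving(panel))
```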
Entry & exit gates
The quality bar that says this part is genuinely ready to start, and genuinely finished.
Entry gate (ready to start)
- Analytics & martech owner and content lead engaged
- Access granted to analytics, CMS and any SoV tooling
- Part 1 findings and client goals available to anchor metric choice
Exit gate (finished)
- Minimal metric set agreed, each with owner + target
- Baseline captured with method documented per metric
- Tooling instrumented — analytics, quality scoring, SoV sampling, "Can You Tell?"
- Live scorecard + weekly/monthly/quarterly cadence handed over
- Content-to-pipeline ROI model and business case delivered