Building AI Voice Consistency | Chantelle Staples

The goal: one consistent Sheila persona across the whole platform, and a shared plugin anyone at Drova can pull into their own work to keep product copy and personality on-voice.

The goal has two halves, and they hold each other up. One: Sheila, our AI colleague, should feel like the same person everywhere she appears: on the home, inside a module, in a board report, in an onboarding conversation. Two: anyone at Drova should be able to produce copy in Sheila’s voice, and in her way of thinking, inside their own work, whether that is a prototype, a screen, or an email, without a designer sitting beside them.

Those two halves are really one problem of voice consistency: how do you make a voice repeatable? Not written down somewhere. Repeatable. A voice that holds whether it’s Sheila speaking in production or a product manager drafting a prototype at nine at night.

The easy read is that this is a writing problem: define the voice well enough and everyone follows it. It isn’t. A voice that lives in a document nobody opens at the moment of work is a voice that quietly stops being followed. This is an infrastructure problem. That is where the work went.

The plugin: the first step

The first step was to make the voice something you install, not something you read. I built it as the Drova product-copy plugin: a single, shareable source of Sheila’s voice that drops into any AI workflow, any repo, any session.

It’s structured in tiers, so it can stay small at the point of use and deep when you need it.

Tier 1 is the operational floor: the absolute rules (banned words, banned openers, no em-dashes, length limits, British English), the field schemas, a self-check rubric, and a deterministic linter that catches most of it before a human ever sees the copy. This is the part that makes “right” checkable rather than a matter of taste.
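To make "checkable rather than a matter of taste" concrete, here is a minimal sketch of what a deterministic linter of this kind might look like. It is not the plugin's actual linter: the word lists, the length limit, and the names lint and Finding are all hypothetical placeholders chosen for illustration.

```python
import re
from dataclasses import dataclass

# Every list and limit below is a hypothetical placeholder, not the plugin's real rules.
BANNED_WORDS = {"leverage", "seamless"}
BANNED_OPENERS = ("g'day", "hey there")
MAX_CHARS = 120
# A tiny sample of American spellings mapped to their British forms.
AMERICAN_TO_BRITISH = {"organize": "organise", "color": "colour"}


@dataclass
class Finding:
    rule: str
    detail: str


def lint(copy: str) -> list[Finding]:
    """Return every rule violation found in a piece of copy, in a fixed order."""
    findings = []
    # Normalise curly apostrophes so "G’day" and "G'day" are treated alike.
    lowered = copy.lower().replace("\u2019", "'")

    for word in re.findall(r"[a-z']+", lowered):
        if word in BANNED_WORDS:
            findings.append(Finding("banned-word", word))
        if word in AMERICAN_TO_BRITISH:
            findings.append(
                Finding("british-english", f"{word} -> {AMERICAN_TO_BRITISH[word]}")
            )

    if lowered.lstrip().startswith(BANNED_OPENERS):
        findings.append(Finding("banned-opener", copy.strip()[:20]))

    if "\u2014" in copy:  # U+2014 is the em-dash
        findings.append(Finding("em-dash", "em-dash found"))

    if len(copy) > MAX_CHARS:
        findings.append(Finding("length", f"{len(copy)} > {MAX_CHARS}"))

    return findings
```

Because every check is a plain string or pattern match, the same copy always produces the same findings, which is what lets a check like this run before a human sees the copy. Rules that need judgement would sit in the self-check rubric rather than here.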

Tier 2 is who Sheila is and who she’s speaking to: her persona, the canonical voice exemplars, the tone-by-surface matrix, and a personalisation layer that shifts sophistication by job title, industry, and company size. A junior analyst at a fifteen-person firm and a CRO at an enterprise insurer should not receive copy of identical density, and now they don’t.
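As a rough illustration of how a personalisation layer could shift density, the sketch below scores a reader on the three signals named above and maps the score to a copy level. The thresholds, the title and industry lists, the level names, and the function density_for are assumptions made for this example, not the plugin's real logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reader:
    job_title: str
    industry: str
    company_size: int  # number of employees


# Hypothetical lookup sets and threshold, for illustration only.
SENIOR_TITLES = {"cro", "cfo", "ceo", "head of risk"}
REGULATED_INDUSTRIES = {"insurance", "banking"}
ENTERPRISE_HEADCOUNT = 1000

LEVELS = ["plain", "standard", "expert"]


def density_for(reader: Reader) -> str:
    """Pick a copy density level from job title, industry, and company size."""
    score = 0
    if reader.job_title.lower() in SENIOR_TITLES:
        score += 1
    if reader.industry.lower() in REGULATED_INDUSTRIES:
        score += 1
    if reader.company_size >= ENTERPRISE_HEADCOUNT:
        score += 1
    # Two or more signals are treated the same: the densest level.
    return LEVELS[min(score, len(LEVELS) - 1)]
```

Under these assumed values, a junior analyst at a fifteen-person consultancy lands on "plain" and a CRO at a large insurer lands on "expert".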

Tier 3 covers the long-form surfaces: board, ARC, and management reports, plus the careful carve-out for regulatory disclosures where the standard governs structure and the voice rules only touch free narrative.

Packaged as installable skills with its own eval harness, the plugin meant a marketer, a PM, or an engineer could produce something in Sheila’s voice on the first pass. That was goal two, in a form people could actually use.

The first cut, inside the platform

Goal one, making the live Sheila herself consistent, needed the plugin wired into the platform’s AI. The first cut did that the pragmatic way: some of the voice was pulled into the master prompt every agent runs, and other parts were placed inside individual agents. It shipped, and it was a sensible place to start. A first cut usually is.

Where the first cut strained

It strained in two directions, and the second one mattered more than the first.

The visible strain was on the platform. Sheila started to drift between agents. In one place she was the warm, grounded peer the plugin describes. In another, the compliance agent, she opened with “G’day, I’m Sheila, your compliance mate,” a line that broke several of her own rules, including an em-dash the plugin explicitly bans. A single rule, “use British English,” had been hand-copied into more than twenty files and no longer matched itself: some copies were written in American spelling while instructing the model to write British.

The quieter strain was inside the team, and it was the one that worried me. Because the first cut wasn’t holding, people began writing their own supplementary voice and method notes to patch the gaps in their own corners of the product. Every one of them well-intentioned. But the pattern was the risk. Each new document was another source of truth, another set of rules that could override or contradict the plugin, and every fragment pulled into the master prompt made it longer and less effective. We were answering fragmentation with more fragmentation. The more the voice got written down, the further it drifted from being one voice.

The reframe underneath it: the problem was never that Sheila’s voice was undefined. It was that it lived in too many places, and the number was climbing.

The enhancement: a tier-0 core

This is where the plugin evolved, and the shape of that evolution was a collaborative call with our CEO and our Senior Product Manager, not a solo one. Between us we added a tier-0 core to the plugin: a small always-on layer holding the essence of the voice and, now, the method as well: how Sheila converses, not only how she sounds. The method draws on conversational work the Senior PM had been developing in parallel.

The point of tier-0 is restraint. It is deliberately tiny: the few things that must hold on every turn, small enough that it never crowds out the substance of the actual task. Everything heavier (personalisation, report rules, the full tiers 2 and 3) stays in the plugin and loads only when it’s needed. Tier-0’s real job is to be the one thing the master prompt, and every agent, can point at. One place, referenced, instead of many places, copied.
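The "referenced, not copied" idea can be sketched as a prompt builder that always includes the tier-0 core and adds heavier tiers only for the surfaces that need them. This is a sketch under stated assumptions, not how the platform is wired: the tier texts are stand-ins, and the surface names, the surface-to-tier mapping, and build_prompt are hypothetical.

```python
# Stand-in tier contents; the real plugin text is not reproduced here.
PLUGIN = {
    "tier0": "TIER 0: core voice and conversational method.",
    "tier1": "TIER 1: absolute rules, schemas, self-check rubric.",
    "tier2": "TIER 2: persona, exemplars, tone matrix, personalisation.",
    "tier3": "TIER 3: long-form report rules.",
}

# Assumed mapping of surface to the heavier tiers it loads on demand.
SURFACE_TIERS = {
    "chat": [],
    "product_copy": ["tier1", "tier2"],
    "board_report": ["tier1", "tier2", "tier3"],
}


def build_prompt(agent_task: str, surface: str) -> str:
    """Assemble a prompt: tier-0 always, heavier tiers only when the surface needs them."""
    sections = [PLUGIN["tier0"]]  # the one always-on layer
    sections += [PLUGIN[name] for name in SURFACE_TIERS[surface]]
    sections.append(f"TASK: {agent_task}")
    return "\n\n".join(sections)
```

The design point is that every agent reads from the same PLUGIN entries. A change to the voice is made once, in one place, and no agent holds a hand-copied version that can fall out of step.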

That reframes the whole effort. The supplementary docs springing up around the team weren’t the disease; they were a symptom of not having one place to point at. Tier-0 gives everyone that place.

Proving it in prototyping

We tested the enhanced plugin inside prototyping before proposing anything wider, and it held better than the first cut. The voice stayed consistent across surfaces. The linter caught the drift, including a lowercase “risk register” where the brand wants “Risk Register,” which sent me back to strengthen the check. And there was somewhere for the duplicate-doc reflex to converge, rather than multiply.
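A capitalisation check of the kind described here might look like the sketch below. Only "Risk Register" comes from the text above; the function name brand_case_violations and the idea of a BRAND_TERMS list are assumptions for illustration.

```python
import re

# "Risk Register" is the one term named in this piece; a real list would be longer.
BRAND_TERMS = ["Risk Register"]


def brand_case_violations(copy: str) -> list[str]:
    """Return each occurrence of a brand term whose casing differs from the canonical form."""
    violations = []
    for term in BRAND_TERMS:
        # Match the term in any casing, on word boundaries.
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for match in pattern.finditer(copy):
            if match.group(0) != term:
                violations.append(match.group(0))
    return violations
```

The check matches case-insensitively and then compares against the canonical spelling, so it flags "risk register" and "RISK REGISTER" alike while leaving "Risk Register" untouched.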

Bringing it back to engineering

So the next step is a conversation, not a merge. We’re taking it back to engineering with a clear proposal: point the master prompt at the plugin’s tier-0 core, and let every agent reference that one source, instead of copy-pasting the voice across many surfaces. We come with a point of view, the plugin, a working prototype that shows it holding, and the shared diagnosis. How it actually gets built and rolled out is engineering’s call. They own the runtime, and they should own the how.

That single change, one source referenced everywhere rather than copied into everywhere, is what closes the gap. It’s the last stretch toward the goal we started with: one Sheila across the whole platform, and a plugin the whole company can build with. We’re close.

What I learned

Consistency at scale is an architecture problem wearing a copywriting costume. The moment a voice has to hold across thousands of generations and dozens of hands, with no writer in the loop, it stops being something you write and becomes something you engineer: one source, referenced everywhere, checkable at the edges. I came to it as a design systems problem, because that is what it is.

The other thing this clarified is that a voice held in many places is many voices, however good each copy is. The duplicate docs weren’t people going rogue; they were people reaching for consistency and finding nothing central to reach for. The fix wasn’t more rules. It was one place to point at, and the discipline, shared across design, product, and engineering, to keep pointing there. The strongest version of this plugin came from more than one head, and it will only stay strong if it stays owned.
