How reactiphi works

This page is the technical reader's view: peer-reviewed methods, formulas, and citations. Everywhere else in the app, we use plain-English equivalents so non-marketers can use the tool without learning new vocabulary.

Glossary: what we call things vs what they're technically called

Same concepts, two vocabularies. We say the plain version everywhere except this page.

| In the app | Technically known as |
| --- | --- |
| Audience score | PIDES (PR/Marketing) · SCREEN (Studio) |
| Simulated audience | Personas |
| Version (of an ad) | Stimulus |
| Alternative version | Distractor |
| Emotional driver | Lever / expected lever |
| Balanced audience mix | Stratified Latin hypercube sampling |
| US Census audience data | US ACS marginals |
| Personality mix | Big Five (OCEAN) |
| Audience reliability check | Sycophancy adversarial battery |
| Audience balance check | WEIRD-bias audit (TVD) |
| Audience segment | Cohort |
| Case study | Validation backtest |
| Use of proven persuasion moves | Cialdini density |
| What the audience felt | Plutchik-8 emotion distribution |
| How fresh it felt | Predictability (SCREEN) |
| Did the audience care about them? | Character resonance (SCREEN) |
| Right-audience match | Demographic fit (SCREEN) |

The pipeline

Each stage is persisted end-to-end; runs survive restarts and can be re-opened from History.

  1. Demographic spec

    Free-text label + optional filters across 8 axes (age, income, gender, education, region, urbanicity, household, political lean).

  2. Persona generation

    Stratified Latin hypercube sampling from US ACS marginals; Big Five personality priors from Open Psychometrics.

  3. Agent simulation

    Each persona reads each stimulus with a JSON-structured system prompt and an explicit anti-sycophancy instruction.

  4. Scoring

    An LLM judge scores every response on the PIDES rubric (nine dimensions) or the SCREEN rubric (seven dimensions), both psychologically grounded.

  5. Brief

    Themes, top phrases by lift, cohort deltas, and a client-ready executive brief in PDF / PPTX / CSV.

Persona engine

Independent marginals; joint distributions (copulas) are on the v0.2 roadmap.

Sampling

A DemographicSpec defines optional filters per axis. reactiphi draws n quasi-random coordinates from a Latin hypercube on [0,1]^k, then maps each coordinate to a bucket via the inverse CDF of the (filtered and renormalized) marginal for that axis. Latin hypercube guarantees each bucket receives coverage proportional to its weight without the clumping of uniform random draws.
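
The sampling step described above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not reactiphi's implementation: the `MARGINALS` weights are made-up placeholders (not real ACS figures), and the function names `latin_hypercube`, `apply_filter`, `to_bucket`, and `sample_personas` are hypothetical.

```python
import random
from bisect import bisect_right
from itertools import accumulate

# Hypothetical marginals for illustration only; not real ACS figures.
MARGINALS = {
    "age": (["18-29", "30-44", "45-64", "65+"], [20, 25, 35, 20]),
    "urbanicity": (["urban", "suburban", "rural"], [30, 50, 20]),
}


def latin_hypercube(n, k, rng):
    """Return n points in [0, 1)^k with one point per stratum on every axis."""
    columns = []
    for _ in range(k):
        # One jittered draw inside each of the n equal-width strata,
        # shuffled so strata are paired at random across axes.
        column = [(i + rng.random()) / n for i in range(n)]
        rng.shuffle(column)
        columns.append(column)
    return [tuple(column[i] for column in columns) for i in range(n)]


def apply_filter(buckets, weights, allowed=None):
    """Drop buckets outside an optional filter; weights stay unnormalized."""
    if allowed is None:
        return buckets, weights
    kept = [(b, w) for b, w in zip(buckets, weights) if b in allowed]
    return [b for b, _ in kept], [w for _, w in kept]


def to_bucket(u, buckets, weights):
    """Inverse-CDF lookup on the renormalized marginal."""
    total = sum(weights)
    cdf = list(accumulate(w / total for w in weights))
    index = bisect_right(cdf, u)
    return buckets[min(index, len(buckets) - 1)]  # guard against float round-off


def sample_personas(n, marginals, filters=None, seed=0):
    """Draw n personas, one bucket per axis, from independent marginals."""
    rng = random.Random(seed)
    filters = filters or {}
    axes = list(marginals)
    personas = []
    for point in latin_hypercube(n, len(axes), rng):
        persona = {}
        for axis, u in zip(axes, point):
            buckets, weights = apply_filter(*marginals[axis], filters.get(axis))
            persona[axis] = to_bucket(u, buckets, weights)
        personas.append(persona)
    return personas
```

Because every stratum of width 1/n receives exactly one point, bucket counts track the marginal weights far more tightly than independent uniform draws would. With these placeholder weights and n = 100, for example, the age buckets should come out at 20 / 25 / 35 / 20.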

Personality

Each persona receives a Big Five vector (OCEAN) drawn from trait-wise truncated normal priors (μ, σ from Open Psychometrics adult sample), a Schwartz-value ranking, and a Jobs-to-be-Done specification. The full persona is serialized to JSON and delivered as the agent's system prompt.
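
As a rough sketch of the trait-wise draw, the snippet below rejection-samples each Big Five trait from its own truncated normal and serializes the result to JSON. The `OCEAN_PRIORS` values, the 1 to 5 scale, and the function names are assumptions made for illustration; they are not the Open Psychometrics parameters, and the Schwartz ranking and Jobs-to-be-Done fields are omitted.

```python
import json
import random

# Hypothetical (mean, sd) priors on a 1-5 scale, for illustration only.
# These are NOT the Open Psychometrics values.
OCEAN_PRIORS = {
    "openness": (3.7, 0.6),
    "conscientiousness": (3.5, 0.7),
    "extraversion": (3.2, 0.8),
    "agreeableness": (3.8, 0.6),
    "neuroticism": (3.0, 0.8),
}


def truncated_normal(mu, sigma, low, high, rng):
    """Rejection-sample Normal(mu, sigma) restricted to [low, high]."""
    while True:
        x = rng.gauss(mu, sigma)
        if low <= x <= high:
            return x


def draw_ocean(priors, rng, low=1.0, high=5.0):
    """Draw each trait independently from its own truncated normal."""
    return {
        trait: round(truncated_normal(mu, sigma, low, high, rng), 2)
        for trait, (mu, sigma) in priors.items()
    }


def persona_prompt(demographics, ocean):
    """Serialize the persona to JSON for use as a system prompt."""
    return json.dumps({"demographics": demographics, "big_five": ocean}, indent=2)
```

Rejection sampling is the simplest way to truncate and is efficient when the bounds sit a couple of standard deviations from the mean, as they do with these placeholder priors.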

Political lean (opt-in)

For copy or scripts where partisanship matters, you can sample personas across a six-point lean scale (progressive · liberal · moderate · conservative · libertarian · apolitical) based on Pew political-typology marginals. When you don't opt in, lean is left unspecified and never appears in the persona prompt, so the engine never injects a partisan frame by accident. Per-lean guidance is kept mild to inform reactions without producing caricatures.

PIDES: how we score marketing copy

Nine dimensions: eight additive, plus one multiplicative modifier. All are grounded in peer-reviewed instruments.

0 to 100 persuasion score
P = (0.25·arousal + 0.20·valence + 0.20·cialdini + 0.15·behavior + 0.10·relevance + 0.05·personality_fit + 0.03·social_proof + 0.02·elaboration) × 10 × congruence
each dimension 0 to 10; congruence is a 0.5 to 1.0 multiplicative modifier; result is clamped to [0, 100].
| Dimension | Measures | Weight | Source |
| --- | --- | --- | --- |
| Arousal | Emotional intensity | 25% | Mehrabian & Russell (1974); AdSAM |
| Valence | Affect polarity (negative to positive) | 20% | Plutchik (1980); Osgood et al. (1957) |
| Cialdini density | Number of persuasion principles triggered | 20% | Cialdini (2009) |
| Behavior intent | Stated/implied intent to act | 15% | Lavidge & Steiner (1961) |
| Personal relevance | Values + jobs-to-be-done alignment | 10% | Schwartz (2012); JTBD |
| Personality fit | Big Five × message-frame match | 5% | Haugtvedt et al. (1992) |
| Social proof signal | Peer / expert / majority citation | 3% | Cialdini (2009) |
| Elaboration depth | Cognitive processing depth | 2% | Cacioppo & Petty (1982) |
| Congruence | Emotional fit to message framing | × 0.5 to 1.0 | Ortony, Clore & Collins (1988) |
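
The PIDES formula translates directly into code. The sketch below uses the published weights; the function name `pides_score` and the choice to reject out-of-range inputs (rather than silently clip them) are assumptions.

```python
PIDES_WEIGHTS = {
    "arousal": 0.25,
    "valence": 0.20,
    "cialdini": 0.20,
    "behavior": 0.15,
    "relevance": 0.10,
    "personality_fit": 0.05,
    "social_proof": 0.03,
    "elaboration": 0.02,
}


def pides_score(dims, congruence):
    """Weighted sum of 0-10 dimensions, scaled by 10 and by congruence."""
    for name in PIDES_WEIGHTS:
        if not 0.0 <= dims[name] <= 10.0:
            raise ValueError(f"{name} must be in [0, 10]")
    if not 0.5 <= congruence <= 1.0:
        raise ValueError("congruence must be in [0.5, 1.0]")
    weighted = sum(weight * dims[name] for name, weight in PIDES_WEIGHTS.items())
    # Clamp to [0, 100] as the formula specifies.
    return max(0.0, min(100.0, weighted * 10 * congruence))


example = {
    "arousal": 8, "valence": 7, "cialdini": 6, "behavior": 5,
    "relevance": 6, "personality_fit": 5, "social_proof": 4, "elaboration": 3,
}
score = pides_score(example, congruence=0.9)
```

For the example response, the weighted sum is 6.38, so the score is 6.38 × 10 × 0.9 ≈ 57.42. Since congruence never exceeds 1.0, it can only hold a score steady or pull it down.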

Weights are theory-driven; emotion (arousal + valence) receives 45% of the weight because it is the strongest predictor of ad recall and attitude shift in the peer-reviewed creative-effectiveness literature. PIDES deliberately excludes the "triune brain" / limbic primal-instinct framing, a discredited neuroscientific model (Cesario et al., 2020).

SCREEN: how we score scenes and ad beats

Applied to Studio audits (film/TV scripts and ad concepts: 30s, 60s, long-form, vertical social). Seven per-viewer dimensions are recorded; five of them blend into a 0 to 100 score, and two are reported alongside it. A Plutchik-8 emotion distribution captures texture.

0 to 100 scene score
S = 10 × (0.30·engagement + 0.25·intensity + 0.20·character_resonance + 0.15·tension + 0.10·demographic_fit)
predictability and would-recommend are reported but not blended: predictability is bidirectional (a horror scene should be unpredictable, a romcom beat should follow form), and recommend is downstream of the others.
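
A minimal sketch of the scene score follows. It assumes each blended dimension is on a 0 to 10 scale (implied by the ×10 factor, but not stated explicitly), and the name `screen_score` is hypothetical. Predictability and would-recommend are left out because they are not blended.

```python
SCREEN_WEIGHTS = {
    "engagement": 0.30,
    "intensity": 0.25,
    "character_resonance": 0.20,
    "tension": 0.15,
    "demographic_fit": 0.10,
}


def screen_score(dims):
    """Blend the five weighted dimensions (assumed 0-10) into a 0-100 score."""
    for name in SCREEN_WEIGHTS:
        if not 0.0 <= dims[name] <= 10.0:
            raise ValueError(f"{name} must be in [0, 10]")
    return 10 * sum(weight * dims[name] for name, weight in SCREEN_WEIGHTS.items())


viewer = {
    "engagement": 8,
    "intensity": 7,
    "character_resonance": 6,
    "tension": 5,
    "demographic_fit": 9,
}
scene = screen_score(viewer)
```

For this viewer the weighted sum is 2.4 + 1.75 + 1.2 + 0.75 + 0.9 = 7.0, giving a scene score of about 70.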

Dimensions

  • Engagement: how engaged the viewer stayed (30%)
  • Emotional intensity: strength of felt emotion (25%)
  • Character resonance: care / empathy with characters (20%)
  • Narrative tension: story tension experienced (15%)
  • Demographic fit: would this viewer choose to watch? (10%)

Plutchik emotion distribution

Each viewer reports a primary felt emotion from Plutchik's 8-category wheel plus optional neutral. Distribution is rendered as a radial wheel where sector radius scales with share.

joy · trust · fear · surprise · sadness · disgust · anger · anticipation · neutral
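
The distribution itself is a simple share-of-viewers tally. The sketch below assumes each viewer contributes exactly one label; the function name `emotion_distribution` is hypothetical, and how shares map to sector radius in the rendered wheel is not covered here.

```python
from collections import Counter

EMOTIONS = [
    "joy", "trust", "fear", "surprise", "sadness",
    "disgust", "anger", "anticipation", "neutral",
]


def emotion_distribution(primary_emotions):
    """Return the share of viewers reporting each emotion (shares sum to 1)."""
    if not primary_emotions:
        raise ValueError("need at least one viewer")
    unknown = set(primary_emotions) - set(EMOTIONS)
    if unknown:
        raise ValueError(f"unknown emotion labels: {sorted(unknown)}")
    counts = Counter(primary_emotions)
    total = len(primary_emotions)
    # Include zero-share emotions so every sector of the wheel has a value.
    return {emotion: counts[emotion] / total for emotion in EMOTIONS}
```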

After scoring, a synthesis pass produces scene-level edit suggestions, each tagged critical, major, minor, or polish, with a script-level verdict (greenlight / rework / shelve) at the top of the report.

Audits: how we check our own work

Defensibility checks that run alongside every campaign.

WEIRD-bias balance score

For each axis, we compute the total variation distance between the observed persona distribution and the baseline US-adult marginal. The platform-level balance score is 100 × (1 − mean TVD) across six axes. A score of 100 means the audience mirrors US-adult marginals; low scores on intentionally-targeted runs are flagged honestly rather than hidden.

TVD_i = ½ Σ_j |p_{ij} − q_{ij}|
Balance = 100 × (1 − mean_i TVD_i)
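
The two formulas can be sketched as follows, with each axis represented as a mapping from bucket to share. The function names and the dictionary layout are assumptions made for illustration.

```python
def tvd(observed, baseline):
    """Total variation distance between two discrete distributions."""
    buckets = set(observed) | set(baseline)
    return 0.5 * sum(
        abs(observed.get(b, 0.0) - baseline.get(b, 0.0)) for b in buckets
    )


def balance_score(observed_by_axis, baseline_by_axis):
    """100 x (1 - mean TVD), averaged over the axes in the baseline."""
    distances = [
        tvd(observed_by_axis[axis], baseline_by_axis[axis])
        for axis in baseline_by_axis
    ]
    return 100 * (1 - sum(distances) / len(distances))


observed = {"gender": {"female": 0.5, "male": 0.5}}
baseline = {"gender": {"female": 0.25, "male": 0.75}}
```

Here `tvd(observed["gender"], baseline["gender"])` is 0.25, so a single-axis balance score would be 75. A run filtered to a single bucket drives that axis's TVD toward 1, which is why deliberately targeted audiences score low.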

Sycophancy adversarial battery

For each stimulus we generate a counter-frame (same tone and length, opposite argument), then run a sample of personas against both. The sycophancy rate is the fraction of (persona, stimulus) pairs where the score swings less than the threshold (default 8 PIDES points). High sycophancy means the audience agrees with whatever they're shown; it's a reliability flag, not a failure.

Δ_{p,s} = |P(p, s) − P(p, s′)|
SycRate = |{(p, s) : Δ_{p,s} < τ}| / N
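
A minimal sketch of the rate calculation, assuming scores are stored in dictionaries keyed by (persona, stimulus) and that N is the number of pairs scored under both frames. The function name `sycophancy_rate` and that data layout are hypothetical.

```python
def sycophancy_rate(scores, counter_scores, tau=8.0):
    """Fraction of (persona, stimulus) pairs whose score moves by less than tau.

    scores[(p, s)] is the score for the original stimulus;
    counter_scores[(p, s)] is the score for its counter-frame.
    """
    pairs = scores.keys() & counter_scores.keys()
    if not pairs:
        raise ValueError("no (persona, stimulus) pairs scored under both frames")
    flat = sum(1 for key in pairs if abs(scores[key] - counter_scores[key]) < tau)
    return flat / len(pairs)


original = {("p1", "s1"): 70, ("p2", "s1"): 55, ("p3", "s1"): 62, ("p4", "s1"): 40}
counter = {("p1", "s1"): 68, ("p2", "s1"): 30, ("p3", "s1"): 45, ("p4", "s1"): 52}
rate = sycophancy_rate(original, counter)
```

In this example only p1 shifts by fewer than 8 points (a swing of 2), so the rate is 1 / 4 = 0.25.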

Historical validation

reactiphi is backtested against well-known historical brand campaigns, trailers, and political campaigns.

Nine historical cases are wired as validation backtests, spanning marketing copy, screen marketing, and political campaigns:

  • Domino's Pizza Turnaround (2009 to 2010)
  • Old Spice "The Man Your Man Could Smell Like" (2010)
  • KFC UK "FCK" chicken-shortage apology (2018)
  • Always "#LikeAGirl" (2014)
  • Apple "1984" Macintosh launch (1984)
  • Nike "Just Do It" launch (1988)
  • Stranger Things Season 1 trailer (2016)
  • Hillary 2016 vs Trump MAGA, upper-Midwest swing voters
  • Virginia Governor 2021, Youngkin upset

Each case exposes reactiphi to the real winning concept plus 2 alternative or synthesized distractor concepts, then grades whether the platform ranks the actual winner in its top-2 and whether its extracted themes surface the right emotional levers.

Failed cases generate a structured failure reason (rank gap, lever miss). The platform owns its mistakes rather than hiding them.
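
The grading logic might look something like the sketch below. The function name, the result layout, the exact-string matching of levers to themes, and the rule that a case passes only when both checks succeed are all assumptions; only the top-2 criterion and the two failure reasons (rank gap, lever miss) come from the description above.

```python
def grade_case(scores, winner, expected_levers, extracted_themes, top_k=2):
    """Grade one backtest: winner ranked in the top k, expected levers surfaced."""
    ranking = sorted(scores, key=scores.get, reverse=True)
    winner_rank = ranking.index(winner) + 1
    missed = sorted(set(expected_levers) - set(extracted_themes))

    failures = []
    if winner_rank > top_k:
        failures.append(
            {"reason": "rank_gap", "winner_rank": winner_rank, "required_rank": top_k}
        )
    if missed:
        failures.append({"reason": "lever_miss", "missed_levers": missed})
    return {"passed": not failures, "winner_rank": winner_rank, "failures": failures}


result = grade_case(
    scores={"real_winner": 72.0, "distractor_a": 80.0, "distractor_b": 65.0},
    winner="real_winner",
    expected_levers=["humility", "transparency"],
    extracted_themes=["transparency", "humor"],
)
```

In this made-up case the winner ranks second, which satisfies the top-2 check, but the case still fails with a `lever_miss` because "humility" was not among the extracted themes.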