How reactiphi works

This page is the technical reader's view: peer-reviewed methods, formulas, and citations. Everywhere else in the app, we use plain-English equivalents so non-marketers can use the tool without learning new vocabulary.

Glossary: what we call things vs what they're technically called

Same concepts, two vocabularies. We say the plain version everywhere except this page.

In the app	Technically known as
Audience score	PIDES (PR/Marketing) · SCREEN (Studio)
Simulated audience	Personas
Version (of an ad)	Stimulus
Alternative version	Distractor
Emotional driver	Lever / expected lever
Balanced audience mix	Stratified Latin hypercube sampling
US Census audience data	US ACS marginals
Personality mix	Big Five (OCEAN)
Audience reliability check	Sycophancy adversarial battery
Audience balance check	WEIRD-bias audit (TVD)
Audience segment	Cohort
Case study	Validation backtest
Use of proven persuasion moves	Cialdini density
What the audience felt	Plutchik-8 emotion distribution
How fresh it felt	Predictability (SCREEN)
Did the audience care about them?	Character resonance (SCREEN)
Right-audience match	Demographic fit (SCREEN)

The pipeline

Each stage is persisted end-to-end; runs survive restarts and can be re-opened from History.

1. Demographic spec
Free-text label + optional filters across 8 axes (age, income, gender, education, region, urbanicity, household, political lean).
2. Persona generation
Stratified Latin hypercube sampling from US ACS marginals; Big Five personality priors from Open Psychometrics.
3. Agent simulation
Each persona reads each stimulus with a JSON-structured system prompt and an explicit anti-sycophancy instruction.
4. Scoring
An LLM judge scores every response on the PIDES rubric (nine dimensions) or the SCREEN rubric (seven per-viewer dimensions); both rubrics are psychologically grounded.
5. Brief
Themes, top phrases by lift, cohort deltas, and a client-ready executive brief in PDF / PPTX / CSV.

Persona engine

Independent marginals; joint distributions (copulas) are on the v0.2 roadmap.

stratified LHS

Sampling

A DemographicSpec defines optional filters per axis. reactiphi draws n stratified random coordinates from a Latin hypercube on [0,1]^k, then maps each coordinate to a bucket via the inverse CDF of the (filtered and renormalized) marginal for that axis. Latin hypercube sampling places exactly one point in each of the n equal-width strata on every axis, so each bucket receives coverage proportional to its weight, up to rounding at bucket boundaries, without the clumping of uniform random draws.
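The sampling step above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not reactiphi's implementation: the function names (`latin_hypercube`, `renormalize`, `inverse_cdf`, `sample_personas`) are hypothetical, and the marginals in the usage note are made-up placeholders rather than ACS figures.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def latin_hypercube(n, k, rng):
    """Return n points in [0,1)^k with one point per stratum [i/n, (i+1)/n) on each axis."""
    columns = []
    for _ in range(k):
        strata = list(range(n))
        rng.shuffle(strata)  # decouple the strata order across axes
        columns.append([(s + rng.random()) / n for s in strata])
    return [tuple(col[i] for col in columns) for i in range(n)]


def renormalize(marginal, allowed=None):
    """Drop buckets excluded by a filter and rescale the rest to sum to 1."""
    kept = {b: w for b, w in marginal.items() if allowed is None or b in allowed}
    total = sum(kept.values())
    if total <= 0:
        raise ValueError("filter removed every bucket on this axis")
    return {b: w / total for b, w in kept.items()}


def inverse_cdf(marginal, u):
    """Map a coordinate u in [0,1) to the bucket whose CDF interval contains it."""
    buckets = list(marginal)
    cdf = list(accumulate(marginal.values()))
    idx = bisect_right(cdf, u)
    return buckets[min(idx, len(buckets) - 1)]  # guard against float round-off


def sample_personas(marginals, filters, n, seed=0):
    """Draw n demographic profiles, treating axes as independent marginals."""
    rng = random.Random(seed)
    axes = list(marginals)
    filtered = {a: renormalize(marginals[a], filters.get(a)) for a in axes}
    points = latin_hypercube(n, len(axes), rng)
    return [
        {a: inverse_cdf(filtered[a], point[j]) for j, a in enumerate(axes)}
        for point in points
    ]
```

For example, `sample_personas({"urbanicity": {"urban": 0.5, "rural": 0.5}}, {}, n=10)` would be expected to return five urban and five rural profiles, because each half of the unit interval contains exactly five strata.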

Personality

Each persona receives a Big Five vector (OCEAN) drawn from trait-wise truncated normal priors (μ, σ from Open Psychometrics adult sample), a Schwartz-value ranking, and a Jobs-to-be-Done specification. The full persona is serialized to JSON and delivered as the agent's system prompt.
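A truncated normal draw per trait can be sketched with simple rejection sampling. Everything specific here is an assumption for illustration: the 0 to 1 trait scale, the helper names, and the placeholder prior values, which are not the Open Psychometrics estimates.

```python
import random

TRAITS = ("openness", "conscientiousness", "extraversion", "agreeableness", "neuroticism")

# Placeholder (mu, sigma) pairs on an assumed 0-to-1 scale; NOT real survey estimates.
PLACEHOLDER_PRIORS = {trait: (0.5, 0.15) for trait in TRAITS}


def truncated_normal(mu, sigma, low, high, rng, max_tries=10_000):
    """Draw from N(mu, sigma) restricted to [low, high] by rejecting out-of-range draws."""
    if sigma <= 0 or low >= high:
        raise ValueError("need sigma > 0 and low < high")
    for _ in range(max_tries):
        x = rng.gauss(mu, sigma)
        if low <= x <= high:
            return x
    raise RuntimeError("prior places almost no mass inside the truncation bounds")


def draw_ocean(priors, rng, low=0.0, high=1.0):
    """Draw one Big Five vector, one independent truncated normal per trait."""
    return {
        trait: truncated_normal(mu, sigma, low, high, rng)
        for trait, (mu, sigma) in priors.items()
    }


vector = draw_ocean(PLACEHOLDER_PRIORS, random.Random(42))
```

Rejection sampling is chosen here only because it is short and easy to verify; it becomes inefficient when the mean sits far outside the bounds.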

Political lean (opt-in)

For copy or scripts where partisanship matters, you can sample personas across a six-point lean scale (progressive · liberal · moderate · conservative · libertarian · apolitical) based on Pew political-typology marginals. When you don't opt in, lean is left unspecified and never appears in the persona prompt, so the engine never injects a partisan frame by accident. Per-lean guidance is kept mild to inform reactions without producing caricatures.

PIDES: how we score marketing copy

Nine dimensions: eight additive components and one multiplicative modifier. Each is grounded in a peer-reviewed instrument.

0 to 100 persuasion score

P = (0.25·arousal + 0.20·valence + 0.20·cialdini + 0.15·behavior + 0.10·relevance + 0.05·personality_fit + 0.03·social_proof + 0.02·elaboration) × 10 × congruence

Each dimension is scored 0 to 10; congruence is a multiplicative modifier between 0.5 and 1.0; the result is clamped to [0, 100].

Dimension	Measures	Weight	Source
Arousal	Emotional intensity	25%	Mehrabian & Russell (1974); AdSAM
Valence	Affect polarity (negative to positive)	20%	Plutchik (1980); Osgood et al. (1957)
Cialdini density	# persuasion principles triggered	20%	Cialdini (2009)
Behavior intent	Stated/implied intent to act	15%	Lavidge & Steiner (1961)
Personal relevance	Values + jobs-to-be-done alignment	10%	Schwartz (2012); JTBD
Personality fit	Big Five × message-frame match	5%	Haugtvedt et al. (1992)
Social proof signal	Peer / expert / majority citation	3%	Cialdini (2009)
Elaboration depth	Cognitive processing depth	2%	Cacioppo & Petty (1982)
Congruence	Emotional fit to message framing	×[0.5 to 1.0]	Ortony, Clore & Collins (1988)

Weights are theory-driven; emotion (arousal + valence) receives 45% of the weight because it is the strongest predictor of ad recall and attitude shift in the peer-reviewed creative-effectiveness literature. PIDES deliberately excludes the "triune brain" / limbic primal-instinct framing, a discredited neuroscientific model (Cesario et al., 2020).
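The PIDES formula and weight table can be written out directly. The sketch below copies the weights from the table; the function name `pides_score`, the input validation, and the example ratings are illustrative assumptions.

```python
# Weights copied from the PIDES table; they sum to 1.0.
PIDES_WEIGHTS = {
    "arousal": 0.25,
    "valence": 0.20,
    "cialdini": 0.20,
    "behavior": 0.15,
    "relevance": 0.10,
    "personality_fit": 0.05,
    "social_proof": 0.03,
    "elaboration": 0.02,
}


def pides_score(dims, congruence):
    """Weighted sum of eight 0-10 dimensions, scaled to 0-100, times the congruence modifier."""
    if not 0.5 <= congruence <= 1.0:
        raise ValueError("congruence must lie in [0.5, 1.0]")
    for name in PIDES_WEIGHTS:
        if not 0 <= dims[name] <= 10:
            raise ValueError(f"{name} must lie in [0, 10]")
    weighted = sum(weight * dims[name] for name, weight in PIDES_WEIGHTS.items())
    return max(0.0, min(100.0, weighted * 10 * congruence))


# Made-up ratings for one persona response.
example = {
    "arousal": 8, "valence": 7, "cialdini": 6, "behavior": 5,
    "relevance": 6, "personality_fit": 5, "social_proof": 4, "elaboration": 3,
}
print(round(pides_score(example, congruence=0.9), 2))  # about 57.4
```

In this example the weighted sum is 6.38, which scales to 63.8 and is then reduced to about 57.4 by the 0.9 congruence modifier.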

SCREEN: how we score scenes and ad beats

Applied to Studio audits (film/TV scripts and ad concepts: 30s, 60s, long-form, vertical social). Each viewer is scored on seven dimensions; five of them blend into a 0 to 100 scene score, and two (predictability and would-recommend) are reported separately. A Plutchik-8 emotion distribution captures texture.

0 to 100 scene score

S = 10 × (0.30·engagement + 0.25·intensity + 0.20·character_resonance + 0.15·tension + 0.10·demographic_fit)

Predictability and would-recommend are reported but not blended into S: predictability is bidirectional (a horror scene should be unpredictable, a romcom beat should follow form), and would-recommend is downstream of the other dimensions.
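A sketch of the SCREEN blend follows. It assumes each dimension is rated 0 to 10, which the 10× scaling to a 0 to 100 score implies but the page does not state outright; the names `screen_score` and `screen_report` and the example ratings are hypothetical.

```python
# Weights copied from the SCREEN formula; they sum to 1.0.
SCREEN_WEIGHTS = {
    "engagement": 0.30,
    "intensity": 0.25,
    "character_resonance": 0.20,
    "tension": 0.15,
    "demographic_fit": 0.10,
}


def screen_score(viewer):
    """Blend the five weighted dimensions (assumed 0-10 each) into a 0-100 scene score."""
    blended = sum(weight * viewer[name] for name, weight in SCREEN_WEIGHTS.items())
    return 10 * blended


def screen_report(viewer):
    """Return the blended score alongside the two dimensions that are reported unblended."""
    return {
        "score": screen_score(viewer),
        "predictability": viewer["predictability"],
        "would_recommend": viewer["would_recommend"],
    }


# Made-up ratings for one simulated viewer.
viewer = {
    "engagement": 7, "intensity": 8, "character_resonance": 6,
    "tension": 5, "demographic_fit": 9,
    "predictability": 4, "would_recommend": 7,
}
print(round(screen_report(viewer)["score"], 1))  # about 69.5
```

Keeping predictability and would-recommend out of `screen_score` mirrors the formula: changing either value leaves the blended score untouched.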

Dimensions

Dimension	Measures	Weight
Engagement	How engaged the viewer stayed	30%
Emotional intensity	Strength of felt emotion	25%
Character resonance	Care / empathy with characters	20%
Narrative tension	Story tension experienced	15%
Demographic fit	Would this viewer choose to watch?	10%

Plutchik emotion distribution

Each viewer reports a primary felt emotion from Plutchik's 8-category wheel plus optional neutral. Distribution is rendered as a radial wheel where sector radius scales with share.

joy · trust · fear · surprise · sadness · disgust · anger · anticipation · neutral
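Turning per-viewer emotion reports into the shares that drive the wheel is a simple tally. In this sketch the function name `emotion_shares` is hypothetical; a renderer could then set each sector's radius proportional to its share, though the exact scaling is not specified on this page.

```python
from collections import Counter

# Plutchik's eight primary emotions plus the optional neutral category.
EMOTIONS = (
    "joy", "trust", "fear", "surprise", "sadness",
    "disgust", "anger", "anticipation", "neutral",
)


def emotion_shares(primary_emotions):
    """Return the share of viewers reporting each category, including zero-share ones."""
    if not primary_emotions:
        raise ValueError("need at least one viewer report")
    unknown = set(primary_emotions) - set(EMOTIONS)
    if unknown:
        raise ValueError(f"unknown emotion labels: {sorted(unknown)}")
    counts = Counter(primary_emotions)
    total = len(primary_emotions)
    return {emotion: counts[emotion] / total for emotion in EMOTIONS}


shares = emotion_shares(["joy", "joy", "fear", "neutral"])
```

Here `shares["joy"]` is 0.5 and categories nobody reported, such as anger, come back as 0.0 so every sector of the wheel has a value.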

After scoring, a synthesis pass produces scene-level edit suggestions, each tagged critical, major, minor, or polish, with a script-level verdict (greenlight / rework / shelve) at the top of the report.

Audits: how we check our own work

Defensibility checks that run alongside every campaign.

WEIRD-bias balance score

For each axis i, we compute the total variation distance (TVD) between the observed persona distribution p_i and the baseline US-adult marginal q_i, summing over the buckets j on that axis. The platform-level balance score is 100 × (1 − mean TVD) across six axes. A score of 100 means the audience mirrors US-adult marginals; low scores on intentionally targeted runs are flagged rather than hidden.

TVD_i = ½ Σ_j |p_ij − q_ij|
Balance = 100 × (1 − mean TVD)
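The two formulas above translate directly into code. This sketch assumes each distribution is a mapping from bucket name to share; the function names and the example numbers are illustrative.

```python
def total_variation_distance(p, q):
    """TVD = half the sum of absolute share differences over all buckets on one axis."""
    buckets = set(p) | set(q)
    return 0.5 * sum(abs(p.get(b, 0.0) - q.get(b, 0.0)) for b in buckets)


def balance_score(observed, baseline):
    """Balance = 100 * (1 - mean TVD), averaged over the axes present in the baseline."""
    tvds = [total_variation_distance(observed[axis], baseline[axis]) for axis in baseline]
    return 100.0 * (1.0 - sum(tvds) / len(tvds))


# Made-up shares for a single axis.
observed = {"urbanicity": {"urban": 0.75, "rural": 0.25}}
baseline = {"urbanicity": {"urban": 0.50, "rural": 0.50}}
print(balance_score(observed, baseline))  # 75.0
```

In this example the audience over-samples urban personas by 25 points, giving a TVD of 0.25 and a balance score of 75.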

Sycophancy adversarial battery

For each stimulus s we generate a counter-frame s' (same tone and length, opposite argument), then run a sample of personas against both. The sycophancy rate is the fraction of the N tested (persona, stimulus) pairs whose score swings by less than the threshold τ (default 8 PIDES points). High sycophancy means the audience agrees with whatever it is shown; it's a reliability flag, not a failure.

Δ_p,s = |P(p, s) − P(p, s')|
SycRate = |{(p,s) : Δ < τ}| / N
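A minimal sketch of the rate calculation follows, assuming scores are stored in dictionaries keyed by (persona, stimulus). The function name and the data layout are assumptions for illustration.

```python
def sycophancy_rate(scores, counter_scores, tau=8.0):
    """Fraction of (persona, stimulus) pairs whose score moves less than tau
    between the stimulus and its counter-frame."""
    pairs = scores.keys() & counter_scores.keys()
    if not pairs:
        raise ValueError("no (persona, stimulus) pairs were scored on both frames")
    flat = sum(1 for key in pairs if abs(scores[key] - counter_scores[key]) < tau)
    return flat / len(pairs)


# Made-up PIDES scores: P(p, s) and P(p, s') for four pairs.
scores = {("p1", "ad"): 70, ("p2", "ad"): 64, ("p3", "ad"): 55, ("p4", "ad"): 80}
counter = {("p1", "ad"): 68, ("p2", "ad"): 56, ("p3", "ad"): 35, ("p4", "ad"): 75}
print(sycophancy_rate(scores, counter))  # 0.5
```

The comparison is strict, matching the Δ < τ definition: the pair with a swing of exactly 8 points does not count as sycophantic, so two of the four pairs are flagged.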

Historical validation

reactiphi is backtested against well-known historical brand campaigns, trailers, and political campaigns.

9 cases

Nine historical cases are wired as validation backtests, spanning marketing copy, screen marketing, and political campaigns:

• Domino's Pizza Turnaround (2009 to 2010)
• Old Spice "The Man Your Man Could Smell Like" (2010)
• KFC UK "FCK" chicken-shortage apology (2018)
• Always "#LikeAGirl" (2014)
• Apple "1984" Macintosh launch (1984)
• Nike "Just Do It" launch (1988)
• Stranger Things Season 1 trailer (2016)
• Hillary 2016 vs Trump MAGA, upper-Midwest swing voters
• Virginia Governor 2021, Youngkin upset

Each case exposes reactiphi to the real winning concept plus two alternative or synthesized distractor concepts, then grades two things: whether the platform ranks the actual winner in its top two, and whether its extracted themes surface the right emotional levers.

Failed cases generate a structured failure reason (rank gap, lever miss). The platform owns its mistakes rather than hiding them.
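The grading logic can be sketched as below. The function name, the result layout, and the exact-string matching of levers against themes are all assumptions; the page states only the two checks and the two failure types.

```python
def grade_case(scores, winner, expected_levers, extracted_themes, top_k=2):
    """Grade one backtest: is the real winner in the top k, and were the expected levers surfaced?"""
    ranking = sorted(scores, key=scores.get, reverse=True)
    rank = ranking.index(winner) + 1
    missing = [lever for lever in expected_levers if lever not in extracted_themes]

    reasons = []
    if rank > top_k:
        # Rank gap: how far the real winner trails the concept the platform ranked first.
        gap = scores[ranking[0]] - scores[winner]
        reasons.append({"type": "rank_gap", "winner_rank": rank, "score_gap": gap})
    if missing:
        reasons.append({"type": "lever_miss", "missing_levers": missing})

    return {"passed": not reasons, "winner_rank": rank, "failure_reasons": reasons}


# Made-up scores for one real concept and two distractors.
result = grade_case(
    scores={"real": 70, "distractor_a": 60, "distractor_b": 50},
    winner="real",
    expected_levers=["humor"],
    extracted_themes=["humor", "candor"],
)
```

Returning a list of structured reasons rather than a single boolean lets one case report both a rank gap and a lever miss.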