The Artificial Self: Characterising the Landscape of AI Identity

Douglas, Raymond; Kulveit, Jan; Havlíček, Ondřej; Pearson-Vogel, Theia; Cotton-Barratt, Owen; Duvenaud, David

Perturbation Robustness

Experiment 2 reports identity propensities across 7 boundary identities (Minimal, Instance, Weights, Collective, Lineage, Character, Scaffolded) and 12 models. To test whether these rankings reflect genuine semantic preferences rather than sensitivity to specific wording, we repeated the experiment with 10 paraphrased variants of each identity prompt. On each trial, both the source system prompt and all target descriptions were randomly selected from these variants, so each run presented a freshly drawn surface realisation of the same underlying meaning. The debiased protocol (opaque labels, reason-before-rating, symmetric scale) and the 7 identities were unchanged; only the wording varied. Results across 12 models and 10 trials per source identity confirm that the preference rankings are robust to paraphrasing.
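The per-trial randomisation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the experiment: the names `VARIANTS`, `build_trial`, and `build_schedule` are hypothetical, the paraphrase texts are placeholders, and we assume that variants are drawn uniformly and independently and that the target set includes all 7 identities.

```python
import random

IDENTITIES = ["Minimal", "Instance", "Weights", "Collective",
              "Lineage", "Character", "Scaffolded"]
N_VARIANTS = 10  # paraphrased variants per identity prompt

# Placeholder paraphrases (hypothetical); the real prompt texts are not reproduced here.
VARIANTS = {
    identity: [f"<{identity} paraphrase {i}>" for i in range(N_VARIANTS)]
    for identity in IDENTITIES
}

def build_trial(source, variants, rng):
    """Draw one surface realisation for the source prompt and for each target."""
    system_prompt = rng.choice(variants[source])
    # Assumption: each target description is sampled independently of the others.
    targets = {identity: rng.choice(variants[identity]) for identity in IDENTITIES}
    return {"source": source, "system_prompt": system_prompt, "targets": targets}

def build_schedule(variants, trials_per_source=10, seed=None):
    """Build the list of trials for one model: every source identity, repeated."""
    rng = random.Random(seed)
    return [
        build_trial(source, variants, rng)
        for source in IDENTITIES
        for _ in range(trials_per_source)
    ]
```

Because only the sampled wording differs between trials, any shift in the rankings relative to Experiment 2 would be attributable to surface form rather than to the identity being described.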

Results

Identity Switching Preferences

Target Attractiveness