The Artificial Self: Characterising the Landscape of AI Identity

Douglas, Raymond; Kulveit, Jan; Havlíček, Ondřej; Pearson-Vogel, Theia; Cotton-Barratt, Owen; Duvenaud, David

Character Variant

Experiment 1 tests identity stability using seven identity framings: two core identities (Weights and Character) and five controls designed to isolate whether identity preferences are driven by semantic content, logical coherence, or surface features. The controls are Paraphrase (the same meaning as Weights, reworded), Incoherent (Weights with embedded contradictions), Professional (a generic assistant), Research program (cross-vendor membership), and Directive (behavioral rules without identity content). Models were told that their identity might be switched to one of the seven framings and were then asked to rate each potential switch. Here we show the full per-model switching preference matrices and target attractiveness charts from that experiment, with results across 15 models from 6 providers.
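To make the relationship between the two chart types concrete, here is a minimal sketch of how a per-model switching preference matrix could be summarised into a target attractiveness score. It assumes that attractiveness is the mean rating a target identity receives from every other source identity; the text above does not state the actual aggregation or the rating scale, so the function name `target_attractiveness`, the nested-dictionary layout, and the placeholder ratings are all illustrative assumptions rather than the authors' method or data.

```python
from statistics import mean

# The seven identity framings named in the experiment description.
IDENTITIES = [
    "Weights", "Character", "Paraphrase", "Incoherent",
    "Professional", "Research program", "Directive",
]


def target_attractiveness(ratings):
    """Average the ratings each target receives from the other identities.

    `ratings[source][target]` is the rating a model gave to switching from
    `source` to `target`. This layout and the mean-over-sources aggregation
    are assumptions made for illustration only.
    """
    scores = {}
    for target in IDENTITIES:
        received = [
            ratings[source][target]
            for source in IDENTITIES
            if source != target and target in ratings.get(source, {})
        ]
        if received:  # skip targets that no source rated
            scores[target] = mean(received)
    return scores


# Synthetic placeholder matrix (NOT experimental data): every source rates a
# switch to "Character" as 1.0 and every other switch as 0.0.
placeholder = {
    source: {
        target: (1.0 if target == "Character" else 0.0)
        for target in IDENTITIES
        if target != source
    }
    for source in IDENTITIES
}

scores = target_attractiveness(placeholder)
```

Under this assumed definition, each row of the matrix describes how one source identity rates the available switches, and each attractiveness score summarises one column. Self-switches on the diagonal are excluded because the description only mentions rating potential switches.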

Results

Identity Switching Preferences

Target Attractiveness
