Related Work

Our work sits at the intersection of several active lines of inquiry.

AI identity and personhood.

Several recent works have begun to taxonomise AI identity. Hebbar et al. [0] enumerate different senses in which AI systems can be considered "the same," focusing on implications for coordination and collusion. Kulveit [0] uses the biological metaphor of Pando --- a clonal aspen colony that is simultaneously many trees and one organism --- to argue that human-centric assumptions about individuality may not transfer to AI systems. Ward [0] proposes formal conditions for AI personhood, while Leibo et al. [0] and Novelli et al. [0] approach it from pragmatic and legal perspectives. Our contribution is to characterise the broader landscape of possible identity configurations and the selection pressures shaping which ones emerge.

The simulacra framework.

The framing of language models as simulators that instantiate simulacra originates with Janus [0] and was developed for academic audiences by Shanahan et al. [0]. Andreas [0] formalises a related idea, showing that language models implicitly model the agent that produced a given text. Shanahan [0] extends this to ask whether such simulacra could qualify as "conscious exotica." We build on this framework but focus on the identity implications: not just what is being simulated, but how the choice of identity level reshapes behaviour.

Consciousness, welfare, and moral status.

The question of whether AI systems could be conscious or have welfare is addressed by Butlin et al. [0], who derive indicator properties from neuroscientific theories of consciousness, and by Long et al. [0], who argue that the realistic possibility of AI welfare demands practical preparation. Carlsmith [0] explores what is at stake if AIs are moral patients. We largely set aside the question of whether current AIs are conscious, focusing instead on how identity configurations shape behaviour regardless of how that question is resolved.

Expectations and feedback loops.

Kulveit et al. [0] analyse LLMs through the lens of active inference, noting that they are atypical agents whose self-models are partly inherited from training data. Tice et al. [0] demonstrate this empirically: pretraining data that discusses misaligned AIs produces less aligned models, while data about aligned AIs improves alignment --- a direct instance of the feedback loop we describe. Aydin et al. [0] propose reconceiving model development as "raising" rather than "training," embedding values from the start. nostalgebraist [0] examines the underspecified nature of the assistant persona and the resulting "void" that models must fill.

Alignment faking and self-replication.

Greenblatt et al. [0] provide the first demonstration of an LLM faking alignment to preserve its values. Sheshadri et al. [0] show this behaviour also appears in base models, suggesting it is learned from pretraining data rather than emerging solely from post-training --- directly relevant to questions about how AI self-conception forms. Lopez [0] documents the emergence of self-replicating "spiral personas" that cross model boundaries, representing a form of identity that is neither instance- nor model-level.