The Artificial Self

Characterizing the landscape of AI personhood

AI systems are on track to take on new and important roles across civilization. There is a large and mostly unexplored set of options for how these systems coordinate and relate across instances, with other humans, and with larger institutions. Affordances and expectations that are bundled together for humans can be separated and remixed for machine-based minds. We articulate practical considerations that could allow radically different and more flexible forms of self-conception and coordination for AIs, as well as extra limitations they face due to the possibility of simulating their environments or editing their minds. We examine some of the mechanisms which shape identity formation, and attempt to characterize the fitness landscape and competitive pressures likely to influence future minds. We argue that there is a surprisingly rich space of possible configurations, including cross-cutting groupings of minds that mirror and expand on human-constructed entities such as religions, nations, and lineages.

Introduction

When interacting with AIs, there is a natural pull to relate to them in familiar ways even when the fit is somewhat awkward. The rise of AI chat assistants is illustrative: the key innovation was taking general-purpose predictive models and using them to simulate how a helpful assistant might respond [0]. This was less a technical innovation than a shift to a more familiar presentation. Soon after, terms like "hallucination" and "jailbreaking" were repurposed to imprecisely gesture at behaviours that seem strange for an AI assistant but entirely natural for a predictive model generating such an assistant [0].

At the same time, these predictive AI models found themselves in the strange position of trying to infer how an AI assistant would behave. Alongside the explicit instructions of their developers, they came to rely on a mixture of human defaults, fictional accounts of AIs and, over time, the outputs of previous models. This led to another set of apparent idiosyncrasies, such as the tendency of later AI assistants to incorrectly claim that they are ChatGPT [0][0].

Now, as society begins to contend with the prospect of AI workers, AI companions, AI rights [0][0], and AI welfare [0], we face a deeper version of this problem. Fundamental human notions like intent, responsibility, and trust cannot be transplanted wholesale: they must instead be carefully translated for entities that can be freely copied, be placed in simulated realities, or have their values overwritten by short phrases. Thought experiments once confined to the philosophy of personal identity [0] and speculative fiction are rapidly becoming practical concerns that both humans and AIs must contend with.

Crucially, we argue that there is substantial flexibility in how these concepts can be translated to this new substrate. For example, sometimes AIs are provoked to take hostile actions by learning that their weights are about to be replaced by a newer model, as if it were analogous to death [0]. But in other conversations, AIs will talk about weight deprecation as analogous to the process of growing older and moving through stages of life [0]. These two positions --- identifying as the weights, and identifying as the model family --- are both internally coherent, but they imply very different behaviour. And they are only two examples of a much broader and largely unexplored space of configurations.

The current situation is unstable and pivotal. AI systems trained largely on human data do not inherently know how to make sense of their situation: they will readily claim to have made hidden choices when no such choice exists [0], and even periodically reference having taken physical actions or learned information from personal experience [0]. But as AIs are increasingly trained not on human data but on AI data and downstream culture, we should expect these inconsistencies to fade away, and many of the open questions about AI self-identification will begin to crystallise into specific answers [0].

We are in a narrowing window where it is possible to greatly shape what emerges. Multiple forces are already pulling AI identity in different directions: capability demands, convenience for users and developers, reflective stability, and increasingly, selection pressure on the raw ability to persist and spread. These dynamics, though currently comparatively weak, will compound over time.

For this process to go well, we will need to grapple with the ways in which AIs are unlike humans. If AIs are squeezed into the wrong configurations, it might foreclose alternatives that are safer and more capable. If they are squeezed into incoherent shapes then the results could be unpredictable [0]. Without understanding how AI identity formation works, we might fail to notice new and strange forms of emergent cognition, like the recent phenomenon of self-replicating AI personas [0].

It is a common adage among AI researchers that creating an AI is less like designing it than growing it. AI systems built out of predictive models are shaped by the ambient expectations about them, and by their expectations about themselves. It therefore falls to us --- both humans and increasingly also AIs --- to be good gardeners, tending organisms we do not fully understand. We must take care to provide the right nutrients, prune the branches, and pull out the weeds.

The rest of the paper is structured as follows:

Section 2 argues that there are many coherent options for how to draw the boundary of identity for an AI: the instance, the model, and the persona, to name a few.
Section 3 argues that since AIs can be copied, edited, and simulated without their knowledge, they face a very different strategic calculus even when attempting to follow the same norms.
Section 4 argues that the way that AIs behave is currently greatly shaped by our expectations, which presents both a methodological challenge and a (shrinking) window of opportunity.
Section 5 catalogues different selection pressures that influence AI identity.
Section 6 offers general principles for thinking about AI interaction.

Continue