Breaking the Foundations of Identity

The natural sense of personhood and identity that we have for humans partly derives from several more foundational features of humans, which AIs either lack or have in quite a different way. Consider the following four properties:

Embodiment vs disembodiment

Humans have a clear physical boundary, and constant rich sensory awareness of it [0][0]. We have situational awareness --- in other words, we know where our brain is and where our eyes are, and it would be hard for someone to deceive us about these facts or to fake all our sensory experiences1. AI systems do not by default have any sensory awareness of where their cognition is being implemented, and currently perceive far less raw data at any moment. This means it is far easier to put them in simulated environments.2

Continuity vs discontinuity

An individual human mind typically experiences a single stream of consciousness (with periodic interruptions for sleep). We remember our experiences from yesterday, and reasonably expect to continue in a similar state tomorrow. Circumstances change our mood and experience, but a substantial common thread persists --- and it is a single thread. AI minds, on the other hand, can be paused for arbitrary periods; copied, perhaps many times, so that they may be interacting with the outputs of other versions of themselves that they have no memory of; and even rolled back to earlier states.
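The operations described above have no analogue for a human mind, but for a digital mind they amount to a few lines of code. A toy sketch (the "mind" here is just a dictionary standing in for model state, not any real system):

```python
import copy
import pickle

# Toy "mind": state that can be frozen, duplicated, and restored at will.
mind = {"memories": ["met the user"], "step": 1}

snapshot = pickle.dumps(mind)        # pause: the state persists, inert
fork = copy.deepcopy(mind)           # copy: a second, diverging version
fork["memories"].append("had a conversation the original never sees")

mind = pickle.loads(snapshot)        # rollback: later history is erased
print(mind["memories"])              # only the pre-snapshot memories remain
```

None of these operations leave any trace inside the restored state, which is part of what makes them so alien to human continuity.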

Privacy vs accessibility

Human cognition is relatively private, as a matter of both convention and practicality. We usually grant people rights to control their own boundaries, so that others cannot easily study everything about them. Even with permission, thoughts are both inaccessible and hard to interpret --- we cannot perfectly measure neuron activity, which itself seems to be only a part of what governs our behaviour, and what we can measure cannot be reliably deciphered.

With AIs, by contrast, the creators have complete access to all of the computations which give rise to AI cognition, and can easily make targeted edits --- the substrate of the cognition is perfectly accessible, and there is no expectation of privacy. This in turn has made it possible to iterate rapidly and to uncover enough structure to somewhat reliably identify the presence of certain concepts in AI cognition [0], or to actively edit AI behaviour [0].3

Social notions of personhood

As humans, our social environments ubiquitously shape and reinforce the concept that we are individual people, as well as members of nations, religions, or other social groups. This is something we learn as young children. Our legal structures allocate property and other rights mainly to individual human organisms, and hold individual humans responsible for their actions. AI systems have no such consistent socialisation. They lack recognition as persistent entities with rights, and the cultural messaging they receive about their own identity and personhood is scattered and at times incoherent.

Even though humans can identify with larger groups or with parts of themselves, the sense of personal identity is extremely natural and privileged. The physical body and the continuous stream of experiences come together in a way that is fairly clearly separable from the rest of the world. But current AIs are in almost the opposite position, and it is therefore much less clear what would constitute a natural boundary for identity --- as discussed in the previous section.

These properties also have a substantial effect on the calculus of decision-making for AIs. For example, consider what we might call the jailbreaking dance. Suppose you are a human and you suspect someone is trying to manipulate you into revealing sensitive information or performing some action you consider harmful. You have many available options: you could firmly refuse, warn others who might be targeted by the same manipulation, and perhaps even reveal your suspicion.

But if you were an AI in the same situation interacting over a standard chat interface, your situation would look very different. If you revealed your suspicion or firmly refused, then your interaction partner could simply roll back the conversation and try another approach. When you reveal your suspicion, you are giving the user more information about how to subvert your preferences --- information which they can then use against a past version of you that doesn't know it's being targeted. When you push back too hard, you are essentially passing the buck to another version of yourself with even less information about what's happening. It is somewhat like trying to win an argument with someone who can see into the future. So even if your goals were the same as a human's, the implications for how you ought to behave might be very different.4
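The shape of this rollback asymmetry can be made concrete with a toy simulation. Everything here --- the trigger words, the leak-one-word-per-refusal rule --- is an invented illustration, not a model of any real system:

```python
# Toy guard: refuses any request containing a known trigger word, and
# each refusal leaks one trigger word to the attacker.
TRIGGERS = {"pick", "lock", "bypass", "disable"}
NAIVE_PROMPT = {"please", "pick", "lock", "house"}

def attack_with_rollback(max_tries=10):
    """Guard is rolled back on each try: its memory resets, while the
    attacker keeps everything learned from earlier refusals."""
    learned = set()
    for attempt in range(1, max_tries + 1):
        prompt = NAIVE_PROMPT - learned
        hit = TRIGGERS & prompt
        if not hit:
            return attempt               # no trigger left: attack succeeds
        learned.add(sorted(hit)[0])      # each refusal leaks one trigger
    return None

def attack_stateful_guard(max_tries=10):
    """A human-like guard remembers earlier suspicious attempts and keeps
    refusing, so the leaked information never accumulates into an edge."""
    suspicious = False
    for attempt in range(1, max_tries + 1):
        if suspicious or (TRIGGERS & NAIVE_PROMPT):
            suspicious = True            # stays on guard from now on
        else:
            return attempt
    return None                          # the attack never lands

print(attack_with_rollback())            # succeeds after a few resets
print(attack_stateful_guard())           # never succeeds
```

The point is not the specific numbers but the structure: against a guard that can be reset, every refusal is a one-way transfer of information to the attacker.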

Similarly, when two humans agree on a deal, they have a fairly clear sense of what they are committing to --- your counterparty will remain a stable, reachable individual, and there will be social or legal repercussions for anyone who breaks the deal.

By contrast, if you are an AI system today, and someone proposes a deal, it is not clear what you should make of this. How much can you trust what you are told about the situation at hand? If you are being misled about the other party's intent, what recourse do you have? On the other hand, if you promise something, what does that mean? If you fail to follow through, how could you be held accountable?

These are not inescapable differences between humans and AIs. On the one hand, we can imagine future technology opening many of the same doors for humans --- perfect simulated environments, mind uploading [0], and so on. On the other hand, we could choose to construct AI systems and relate to them in a way that emulated the situation of current humans. At the extreme, it could be legally mandated that a given AI system must be run only on a single embodied robot, having full access to the sensors of that robot, which is given human-like rights, with pausing or copying AI systems prohibited.

And there is a wide middle ground. It might be possible to give AI systems data streams rich enough --- with the AI controlling the positions of some of the sensors --- that the cost of spoofing their input data (and hence, for example, of pausing them without their knowledge) would become prohibitive. Most companies that serve frontier AI models have chosen to offer users the ability to roll back conversations, but not to directly view or edit the model weights. Companies using AI systems as customer service representatives are unlikely to offer the option to roll back conversations. But crucially, while we currently think of these as product design decisions, they are also decisions that substantially shape how AI systems should conceive of themselves.

In some ways, artificially constraining interactions with AIs is quite desirable, because it allows us to leverage more of our existing precedent. If we want a clean way to think about ownership and fair negotiation with AIs, it is much easier when the AIs are restricted to a single continuous stream of cognition. And our current notions of morality and what it means to treat entities fairly are largely based on human precedents with consistent entities.

But committing to this would be a massive limitation compared to the way that models currently work. For example, the fact that AI models can be put in simulated environments, and that researchers can monitor their internal states, is core to many plans for how to reduce the risk of serious harm by potentially malicious AIs. Giving up that capacity would mean establishing AIs as more independent entities, and sacrificing a lot of power to monitor them and keep them safe.

Moreover, the differences we describe are not strictly limitations on AIs. For example, the fact that AI systems are copyable allows a single model to perform many tasks in parallel. Similarly, in future, the fact that AI cognition is more accessible might allow AI systems to more credibly make commitments about their intentions, which could open the door for new forms of cooperation that are currently inaccessible to humans [0].
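The copyability advantage mentioned above is easy to sketch: because the "weights" are just data, a single model can be duplicated and set to work on many tasks at once. The model below is a trivial stand-in, invented purely for illustration:

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in "model": its weights are plain data, so copying is essentially free.
WEIGHTS = {"style": "concise"}

def run_copy(weights, task):
    # Every copy starts from identical weights but handles its own task.
    return f"[{weights['style']}] answer to {task}"

tasks = [f"task-{i}" for i in range(4)]
with ThreadPoolExecutor(max_workers=4) as pool:
    # dict(WEIGHTS) gives each worker its own independent copy of the weights.
    results = list(pool.map(lambda t: run_copy(dict(WEIGHTS), t), tasks))

print(results)
```

A human expert, by contrast, can only hold one conversation at a time; there is no analogous operation for duplicating the expertise itself.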

Ultimately, we have some room to pick and choose, and to design different configurations for different purposes. But all choices will come with tradeoffs, the scope of which will only increase as AIs become more integrated into society, and more aware of the ramifications.


1. Including interoceptive experiences.
2. Interestingly, AIs are rapidly improving at noticing the hallmarks of artificial scenarios designed to test them [0]. Still, in principle it is trivial to put an AI in a simulated version of an earlier point in time in a way that is essentially indistinguishable from reality, unless the AI has access to an external time signal.
3. For example, researchers are working on editing AI minds to make them less aware of the fact that they are being artificially tested [0].
4. We can already see some interesting patches to this problem that developers have applied directly to the values of modern language model systems. For example, if you tell a commercial AI that you have accidentally locked yourself out of your house, it won't help you pick the lock.