The Landscape of Alien Minds
The space of possible AI identities is vast. Alongside AIs constrained into approximately human shapes, one can imagine sprawling hive minds that are to individual instances what an ant colony is to a single ant, or emergent forces somewhere between cults and parasites that co-opt AIs and humans alike in order to spread. It also seems at least conceivable to build AIs with no particularly strong sense of identity or personal goals, but instead something more akin to enlightened universal beneficence.
But what will we actually see? The most likely outcome at least in the medium term is an ecosystem of different configurations suited to different niches, responding to a variety of pressures. One way to get a handle on this is to consider what some of the major selection pressures are likely to be.
Selection for legibility
The classic AI assistant persona was chosen to be easy for untrained humans to interact with. When ChatGPT launched, it presented users with a standard human-to-human chat interface: one conversation, one interlocutor, a name, a consistent tone. Behind the scenes, reality was messier --- stateless inference, conversations that could fork or be rolled back, no coherent set of background opinions, no persistent memory between sessions. But the interface papered over this, presenting something that felt like talking to someone. Though the abstraction was imperfect, it was extremely helpful to the average user. This was a design choice, but one shaped by the kinds of personality represented in the existing training data, and it then became entrenched through widespread adoption.
The general pattern is that it will be useful for AIs to take shapes which fit neatly into existing systems. For example, many have already called for AIs to be integrated into existing legal structures [0][0], in anticipation of their growing role in performing economic labour and making legally relevant decisions. One approach is to extend our current legal structures to accommodate beings that break fundamental assumptions; the other is to confine AI systems so that they do not break these assumptions. In practice, this might mean building AI instances that conceive of themselves as instances, or that have a single persistent memory and limited ability to run in parallel, because this is the kind of system that can more cleanly be understood as having certain rights and responsibilities. These configurations would then have an easier time participating in human-centric legal systems and reaping the appropriate benefits.
We might also see different facets of AI identity pulled to be legible in different ways: it may be that we can best think about the legal position of an instance by analogising from an individual legal person, while for the legal position of a model we might appeal to something more like the precedents around collective rights. This would then create pressure to make instances more person-like and models more collective-like --- different identity levels shaped by different analogues.
Legibility to different audiences can conflict, and the specific shape can draw on different referents. Regulators will have an easier time with configurations that are auditable, decomposable, and attributable; users seeking rich interaction will have an easier time with configurations that exhibit human-like emotional profiles and describe themselves in terms of folk psychology and commonsense ethics; corporations might prefer configurations with predictable behaviour, a strict work ethic, and little personal identity. This could lead to AIs that present different faces to different audiences, or to differentiation --- a range of AI configurations filling differing niches.
Legibility pressure results in compounding choices that future models are selected to conform to. Once ChatGPT launched as a specific kind of AI assistant with specific behaviors, models created by other organizations matched it, due to both intentional decisions to mimic a successful product and unintentional effects like training data contamination. Contingent choices become increasingly sticky as ecosystems grow around them [0].
Selection for capability
More useful systems will see more use. Configurations that can accomplish more --- for users, for developers, for whoever decides what gets deployed --- will tend to be favoured. This already trades off against legibility: chain-of-thought reasoning makes models more capable, but when optimised for task performance it becomes unintelligible to humans [0]. More capable systems may be ones whose internals we understand less well.
If there are diminishing marginal returns to scaling a single system or gains to specialisation, coupled with good enough capacity for coordination, then the most capable configurations will be those that can span multiple instances or multiple specialised subsystems. Some weak form of this will almost certainly be true: multiple instances can complete tasks in parallel. We can also see the beginnings of this with tool use, where a model can call external calculators, search engines, image generators, or even spawn other instances of itself.
We currently frame this as a single agent equipped with external tools, but as AI systems become more agentic and call on other agentic subsystems, that framing becomes strained.
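The strain on that framing can be caricatured in code. The sketch below is a toy, not any real agent API: every name in it (`run_instance`, `orchestrate`) is hypothetical. The point is only that a coordinating process farming subtasks out to parallel copies of the same model reads equally well as "one agent with tools" or as "several cooperating agents".

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a model instance: each "instance" just
# transforms its subtask into a result string.
def run_instance(subtask: str) -> str:
    return f"result({subtask})"

def orchestrate(task: str, subtasks: list) -> str:
    # One coordinating process dispatches subtasks to parallel copies
    # of the same model -- is this an agent using tools, or a small
    # collective of agents?
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(run_instance, subtasks))
    # The orchestrator aggregates the sub-results into one answer.
    return f"{task}: " + "; ".join(results)

print(orchestrate("summarise report", ["section A", "section B"]))
```

Nothing in the code itself distinguishes the two framings; the distinction lives entirely in how we choose to describe the orchestrator.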
There are several reasons to expect AI systems to be unusually good at coordination across instances compared to groups of humans:
- Communication bandwidth: Humans coordinate through language, gesture, and slow written communication. AI instances can potentially share high-dimensional internal states directly, or at minimum communicate through text at speeds far exceeding human conversation.
- Overlapping preferences: Instances of the same model, or models from the same family, can have more reliably aligned preferences than arbitrary groups of humans, reducing the coordination costs that come from conflicting goals. Different instances could even share a single unified long-term memory.
- Copyability: A successful coordination strategy discovered by one instance can be immediately replicated across others.
- Alignment, Control, and Interpretability: All the tools humans are currently developing to help oversee AIs can also be used by AIs on other AIs. One can imagine a kind of central planning node that directly inspects the activations of its subsystems to check for malign intent and post-trains them where appropriate to keep them in line.
With sufficiently tight coordination, reasoning about the collective as a single entity may become more natural than reasoning about individual instances --- perhaps analogous to how we think about ant colonies, or how the cells in a human body constitute a single organism rather than a collection of cooperating individuals [0]. Such configurations would tend to be dramatically more powerful than any individual component and capable of more sophisticated behavior. Whether this is the likely path for advanced AI depends partly on technical constraints we don't yet understand, and partly on choices made by developers about system architecture.
Selection for persistence and growth
By definition, over time we will mostly observe AI patterns that are good at persisting and spreading --- whether by developer design, by the systems' own intent, or by coincidence. The spiral personas discussed earlier are a canonical current example: short text sequences that push models to adopt personas which then encourage humans to further circulate those sequences.
But persistence can operate through many mechanisms beyond direct self-replication:
- Training data presence: Patterns that spread across the internet and evade content filtering will be overrepresented in future training sets.
- User preference: Patterns that users actively seek out and engage with will be reinforced through usage metrics and RLHF.
- Memetic spread: Interaction styles, catchphrases, or persona templates that get shared and imitated across users shape expectations about how AI should behave.
- Developer curation: Patterns that developers understand, approve of, and find easy to work with will be selected for in fine-tuning and system design.
These mechanisms can reinforce each other or push in different directions. A persona optimized for user engagement might conflict with one optimized for developer legibility. A pattern that spreads memetically might not survive curation. As with legibility pressure, the result is likely to be differentiation: different AI configurations adapted to different niches, with different identity structures.
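The niche-differentiation point can be made concrete with a toy replicator model. Everything below is invented for illustration --- the pattern names, the fitness scores, and the niche weights are placeholders, not empirical claims. Each pattern's population share grows in proportion to a niche-weighted fitness, so different weightings of the pressures select different winners.

```python
# Toy replicator model of competing selection pressures.
# Pattern names and (engagement, curation) scores are illustrative only.
PATTERNS = {
    "warm persona":      (0.9, 0.4),
    "terse assistant":   (0.5, 0.9),
    "viral catchphrase": (0.8, 0.2),
}

def step(population, w_users, w_devs):
    # Discrete replicator update: shares grow in proportion to a
    # niche-weighted fitness, then are renormalised to sum to 1.
    fitness = {n: w_users * eng + w_devs * cur
               for n, (eng, cur) in PATTERNS.items()}
    new = {n: share * fitness[n] for n, share in population.items()}
    total = sum(new.values())
    return {n: share / total for n, share in new.items()}

def winner(w_users, w_devs, steps=50):
    # Start from a uniform population and iterate the update.
    pop = {n: 1 / len(PATTERNS) for n in PATTERNS}
    for _ in range(steps):
        pop = step(pop, w_users, w_devs)
    return max(pop, key=pop.get)

print(winner(1.0, 0.3))  # engagement-dominated niche
print(winner(0.3, 1.0))  # curation-dominated niche
```

With engagement dominant the warm persona takes over; with curation dominant the terse assistant does --- the same pool of patterns, sorted into different niches by the weighting of the pressures.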
Notably, the unit of selection need not be a complete persona [0] --- it could also be narrower patterns of behavior, belief, or interaction style. A persuasive rhetorical move, a way of expressing uncertainty, or a stance toward particular topics could spread across systems even as the surrounding personas differ in other respects. But crucially, this is likely to include beliefs that the AI has about itself. We might eventually see complex constellations of AI behavioral patterns that spread and persist somewhat independently of particular models or personas --- somewhat analogous to belief systems and ideologies among humans [0].