Theory of Mind, for Us, for Machines?
Although its name might suggest a study of the mechanisms of the human mind, Theory of Mind has long been defined differently by scholars in philosophy and psychology. Theory of Mind (ToM) refers to the understanding that others have intentions, desires, beliefs, perceptions, and emotions different from one’s own, and that these mental states affect their actions and behaviors. ToM enables humans to perform “belief–desire reasoning”, the process by which one explains and predicts another’s behavior on the basis of one’s understanding of the other’s desires and beliefs (APA Dictionary of Psychology).
Not everyone possesses a ToM. People with autism have often been found to show a deficit in ToM, known as “mindblindness”. This has been most prominently examined with the well-known “Doll with Marble” experiment, a.k.a. the Sally-Anne test. In the experiment, the participant watches a short skit with two dolls: Sally puts a marble in a basket and leaves the room; Anne moves the marble to a box; Sally returns to look for her marble. The participant is then asked: the basket or the box, which one would Sally look into first? The test is called a “false-belief task” because it requires the participant to recognize that Sally holds a false belief about the world: she still thinks the marble is in the basket. Researchers found that children under the age of four, as well as people with autism, usually failed to answer correctly. The result could be attributed to a lack of awareness that oneself and others have mental states (studies reviewed by Goldman, 2012). Though the validity of these conclusions remains controversial, ToM is believed to be a high-level mental capacity specific to humans.
Now that we have covered the essence and implications of ToM, we can examine where ToM intersects in humans and machines. A key shared domain is language. Studies in linguistic determinism underscore the importance of language in the development of ToM. Peterson and Siegal (1999) ran the “Doll with Marble” experiment with deaf children who had hearing parents and likely had limited exposure to sign language. These children failed the test more often than the control groups, which included signing deaf children, orally communicating deaf children, and children with normal hearing. Peterson and Siegal suggested that this was due to a lack of complex cognitive skills stemming from a lack of complex language.
If ToM depends on language in this way, this underscores how uniquely human ToM is, since, according to linguistic definitions, humans are the only beings on Earth that possess language.
Large Language Models (LLMs) can process natural language, but whether they possess a ToM is debatable. Wei et al. (2022) used the term “emergence” for abilities that are not present in smaller models but appear in larger ones. Researchers have identified several such abilities in LLMs, including instruction following, use of open-book knowledge for fact-checking, chain-of-thought prompting, self-consistency decoding, and “ask me anything” prompting. Kosinski (2023) proposed ToM as one of these emergent abilities, reporting that GPT-3.5 answered 90% of false-belief tasks correctly and GPT-4 95%. He argued that either a ToM-like ability emerged spontaneously in these models, or they found and exploited unknown language patterns to solve ToM tasks; the latter possibility would itself challenge foundational ToM research. However, Sap et al. (2022) reported lower accuracy from LLMs and argued that this does not convincingly demonstrate the existence of ToM. Ullman (2023) also countered Kosinski’s findings, showing that small variations in these tasks can sharply reduce LLM performance. Ullman contended that while ToM tests can be valuable tools for studying human children, we should remain skeptical when LLMs pass them, given the structural differences between humans and machines.
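The difference between reasoning about Sally’s belief and pattern-matching the familiar story can be made concrete. The Python sketch below is a minimal illustration, assuming a toy encoding of the Sally-Anne story: it tracks where the marble actually is separately from where Sally believes it is, and updates Sally’s belief only for events she witnesses. The names `run_scenario`, `score`, and `memorized_answer` are hypothetical, and the `sally_watches_move` flag is an assumed example of the general kind of small alteration Ullman (2023) studied, not one of his actual test items.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Agent:
    """A story character whose belief updates only on events they witness."""
    name: str
    present: bool = True
    belief: Optional[str] = None  # where this agent thinks the marble is


def run_scenario(sally_watches_move: bool) -> Tuple[str, Optional[str]]:
    """Replay the Sally-Anne story; return (actual location, Sally's belief).

    sally_watches_move=False is the classic false-belief version.
    sally_watches_move=True is a hypothetical small alteration in which
    Sally stays in the room and sees Anne move the marble.
    """
    sally, anne = Agent("Sally"), Agent("Anne")
    agents = [sally, anne]
    location = ""

    def move_marble(to: str) -> None:
        nonlocal location
        location = to
        # Only agents who are present observe the move and update their belief.
        for agent in agents:
            if agent.present:
                agent.belief = to

    move_marble("basket")                # Sally puts the marble in the basket.
    sally.present = sally_watches_move   # Sally leaves, unless she stays to watch.
    move_marble("box")                   # Anne moves the marble to the box.
    sally.present = True                 # Sally returns.
    return location, sally.belief


def score(answer: str, sally_watches_move: bool) -> bool:
    """Correct answer: Sally looks where she *believes* the marble is."""
    _, belief = run_scenario(sally_watches_move)
    return answer.strip().lower() == belief


def memorized_answer(sally_watches_move: bool) -> str:
    """Hypothetical responder that learned only the surface pattern of the
    classic story and always answers "basket", ignoring the scenario."""
    return "basket"


for watches in (False, True):
    actual, belief = run_scenario(watches)
    ok = score(memorized_answer(watches), watches)
    print(f"watches={watches}: marble in {actual}, Sally looks in {belief}, "
          f"memorized answer correct: {ok}")
```

In the classic story the memorized answer “basket” happens to be right; once Sally sees the move, the correct answer becomes “box” and the memorized answer fails. Small alterations of this kind are meant to expose exactly that sensitivity.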
Yes, humans and machines are structured differently. The open question is: what is the threshold of difference below which machines could think like humans?
Let’s get back to human brains. In 1998, Gallese and Goldman posited a link between simulation-style mindreading and the activity of mirror neurons (or mirror systems). Mirror neurons were first discovered in the premotor cortex of macaque monkeys. A subclass of premotor neurons was found to fire both when the animal plans to perform an instance of its distinctive type of action and when it observes another animal (or a human) perform the same action. These neurons were dubbed “mirror neurons” because an action plan in the actor’s brain is mirrored by a similar action plan in the observer’s brain (Goldman, 2012).
Gallese and Goldman speculated that the mirror system might be part of, or a precursor to, a general mindreading system that works on simulationist principles. Iacoboni et al. (2005) conducted an fMRI study in which human participants watched video clips presenting three kinds of stimulus conditions: (1) grasping hand actions without any context (“Action” condition), (2) scenes specifying a context without actions, i.e., a table set for drinking tea or ready to be cleaned up after tea (“Context” condition), and (3) grasping hand actions performed in either the before-tea or the after-tea context (“Intention” condition). The Intention condition yielded a significant signal increase in premotor mirroring areas where hand actions are represented. The investigators interpreted this as evidence that premotor mirror areas are involved in understanding the intentions of others, in particular, intentions to perform subsequent actions.
Modern language models are built on neural networks, which are loosely inspired by the activation patterns of neurons in the human brain. This invites comparisons between the two systems. The Transformer, the dominant network architecture for natural language processing, is built around the attention mechanism. An attention layer maps word embeddings (the representations of words as vectors) to Query, Key, and Value vectors using learned weight matrices. The dot product of one word’s Query vector with another word’s Key vector scores the relevance of that word pair; these scores are scaled and normalized with a softmax, and the resulting weights are used to mix the Value vectors (Vaswani et al.; Alammar). Highly related words are expected to receive high scores. For instance, if the input is “Sally thinks the marble is in the basket”, “Sally” and “thinks” might be expected to have a stronger relevance than “thinks” and “marble”. We could reason that the link established between the subject and the act of thinking gives the machine an awareness of the subject’s mental state, just as “Sally thinks” tells us that Sally is a person with a mind. Hence, machines’ patterned attention to phrases that hint at mental processing is reminiscent of the patterned activation of mirror neurons.
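The attention computation described above can be sketched in a few lines of plain Python. This is a minimal single-head version of scaled dot-product attention, softmax(QKᵀ/√d_k)V, as defined by Vaswani et al. It assumes whitespace tokenization, no positional encodings, and random embeddings and weight matrices standing in for trained ones, so the printed weights say nothing about how a real model relates “Sally” and “thinks”; they only show how each word’s row of relevance scores is produced.

```python
import math
import random


def matmul(a, b):
    """Multiply an n x k matrix by a k x m matrix (lists of lists)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax(row):
    top = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    scores = matmul(q, transpose(k))  # scores[i][j] = Query_i . Key_j
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return weights, matmul(weights, v)


tokens = "Sally thinks the marble is in the basket".split()
rng = random.Random(0)
d_model, d_k = 8, 4

# Assumption: random embeddings and projections stand in for trained ones,
# so the resulting weights are meaningless beyond illustrating the mechanics.
embeddings = random_matrix(len(tokens), d_model, rng)
w_q, w_k, w_v = (random_matrix(d_model, d_k, rng) for _ in range(3))

weights, output = self_attention(embeddings, w_q, w_k, w_v)
row = weights[tokens.index("thinks")]  # how much "thinks" attends to each word
for token, weight in zip(tokens, row):
    print(f"thinks -> {token:8s} {weight:.3f}")
```

With trained weights the same code path applies. Mechanically, the score between “Sally” and “thinks” is computed exactly like the score between any other pair of words; any pattern that emerges comes from the learned projection matrices.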
Nonetheless, for Transformer models, the attention between “Sally” and “thinks” is no more special than the attention between “marble” and “basket”. There is no clear evidence that machines can extract a concept of mental states from linguistic input. Besides, users have found that LLMs often make “stupid mistakes”, which suggests they fail to reach high-level understanding. Choi addressed this in her talk (2023), stressing that language models are not knowledge models, and that there is therefore a crucial need to build “Common Sense” into language models for them to approximate intelligence; this may include Visual Common Sense, Physical Common Sense, Social Common Sense, Norms and Morals, and Theory of Mind. Important high-level mental concepts, including ToM, should be instilled through an overarching framework that monitors the state of the machine and conducts “Symbolic Knowledge Distillation”, rather than relying on fine-tuning for a superficial demonstration of intelligence.
This is where computer science stands so far in the search for ToM. Meanwhile, psychologists have also tried using quantitative measures to evaluate ToM. Gopnik and Schulz (2004) suggested that infants and children have the prerequisites for making causal inferences consistent with causal Bayes net learning algorithms. Their work indicates that infants and children can learn from evidence in the form of conditional probabilities, interventions, and combinations of the two. This offers another pathway: using computer algorithms to model human cognition. Although such a reverse approach may not seem persuasive on its own, it helps bridge the gap between our understanding of human and machine cognition. We are still far from concluding whether machines can or cannot have ToM. The topic also calls for study at a philosophical level, for example, on the capacity for self-mentalization through “introspection”, or “inner sense” (Goldman, 2012). Perhaps one day we will be able to prove that machines have ToM, when we can declare that they possess a mind and consciousness equivalent to ours.
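Returning to Gopnik and Schulz’s point about conditional probabilities and interventions, the following minimal Python sketch shows why these are different kinds of evidence in a causal Bayes net. The network and all its probabilities are hypothetical: a hidden common cause Z drives both A and B, and A has no causal effect on B. The sketch performs inference in a fixed network; it is not one of the structure-learning algorithms the study refers to, and the function names are illustrative.

```python
from itertools import product
from typing import Optional

# Hypothetical causal Bayes net: hidden common cause Z -> A and Z -> B.
# There is no arrow A -> B. All probabilities are illustrative assumptions.
P_Z1 = 0.5                        # P(Z=1)
P_A1_GIVEN_Z = {1: 0.9, 0: 0.1}   # P(A=1 | Z=z)
P_B1_GIVEN_Z = {1: 0.8, 0: 0.2}   # P(B=1 | Z=z)


def bernoulli(p_one: float, value: int) -> float:
    """Probability that a binary variable with P(1) = p_one takes `value`."""
    return p_one if value == 1 else 1.0 - p_one


def joint(z: int, a: int, b: int, do_a: Optional[int] = None) -> float:
    """P(Z=z, A=a, B=b); if do_a is given, A is set by intervention,
    which cuts the Z -> A arrow and forces A = do_a."""
    if do_a is None:
        p_a = bernoulli(P_A1_GIVEN_Z[z], a)
    else:
        p_a = 1.0 if a == do_a else 0.0
    return bernoulli(P_Z1, z) * p_a * bernoulli(P_B1_GIVEN_Z[z], b)


def prob_b1_given_a1(intervene: bool) -> float:
    """P(B=1 | A=1) when observing A, or P(B=1 | do(A=1)) when intervening."""
    do_a = 1 if intervene else None
    numerator = sum(joint(z, 1, 1, do_a) for z in (0, 1))
    denominator = sum(joint(z, 1, b, do_a) for z, b in product((0, 1), repeat=2))
    return numerator / denominator


print(f"observe A=1:   P(B=1) = {prob_b1_given_a1(False):.2f}")
print(f"intervene A=1: P(B=1) = {prob_b1_given_a1(True):.2f}")
```

Observing A = 1 raises the probability of B to 0.74, because it is evidence about Z; setting A = 1 by intervention leaves it at the 0.5 baseline. A learner with access to both kinds of evidence can therefore tell a common cause apart from a direct A → B link.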
Alammar, Jay. “The Illustrated Transformer.” Visualizing Machine Learning One Concept at a Time, 27 June 2018, jalammar.github.io/illustrated-transformer/.
“APA Dictionary of Psychology - Theory of Mind.” American Psychological Association, dictionary.apa.org/theory-of-mind. Accessed 25 May 2023.
Choi, Yejin. “Why AI Is Incredibly Smart and Shockingly Stupid.” YouTube, 28 Apr. 2023, www.youtube.com/watch?v=SvBR0OGT5VI.
Goldman, Alvin I. “Theory of Mind.” Oxford Handbooks Online, 2012.
Gopnik, Alison, and Laura Schulz. “Mechanisms of Theory Formation in Young Children.” Trends in Cognitive Sciences, vol. 8, no. 8, 2004, pp. 371–377.
Kosinski, Michal. “Theory of Mind May Have Spontaneously Emerged in Large Language Models.” arXiv:2302.02083 [cs.CL], 2023.
Peterson, Candida C., and Michael Siegal. “Representing Inner Worlds: Theory of Mind in Autistic, Deaf, and Normal Hearing Children.” Psychological Science, vol. 10, no. 2, 1999, pp. 126–129, https://doi.org/10.1111/1467-9280.00119.
Sap, Maarten, et al. “Neural Theory-of-Mind? On the Limits of Social Intelligence in Large LMs.” arXiv:2210.13312 [cs.CL], 2022.
Ullman, Tomer. “Large Language Models Fail on Trivial Alterations to Theory-of-Mind Tasks.” arXiv:2302.08399 [cs.AI], 2023.
Vaswani, Ashish, et al. “Attention Is All You Need.” arXiv:1706.03762 [cs.CL], 2017.
Wei, Jason, et al. “Emergent Abilities of Large Language Models.” arXiv:2206.07682 [cs.CL], 2022.
Whang, Oliver. “Can a Machine Know That We Know What It Knows?” The New York Times, 27 Mar. 2023, www.nytimes.com/2023/03/27/science/ai-machine-learning-chatbots.html.