
Theory of Mind in A.I. and Computer Ethics


GPT models have soared in public popularity since November 2022, when ChatGPT was released. GPT-4, a Large Language Model (LLM) released in March 2023, has demonstrated a remarkable ability to solve “false-belief” tasks, which test whether one can recognize that someone else holds a belief that differs from reality. The capacity to understand mental states, including emotions and intentions, in oneself and in others is called “Theory of Mind” (ToM). GPT-4 has shown a strong ability to attribute mental states and performs at a high level on ToM tests. The term ToM, however, comes from psychology, where it is used to study and interpret the behavior of living creatures, and discussing it in the context of language models sparks further connections between artificial intelligence, neuroscience, and the humanities.

Psychologists have long held that ToM is found only in living beings. For instance, dogs can distinguish between positive and negative emotions in other dogs and in humans, while humans have a far more advanced mechanism for perceiving others' emotions and reacting accordingly. In humans, ToM unfolds over time as part of biological development: children typically begin to grasp the concept of false belief between the ages of 3 and 4, and the ability is usually fully developed by the age of 5.

Surprisingly, GPT-3.5 can solve false-belief tasks at a level similar to that of a six-year-old child.

How is this possible? The answer is fairly intuitive: a model that tracks mental states performs better at language tasks. This explanation is supported by the article 'Theory of Mind May Have Spontaneously Emerged in Large Language Models' by Michal Kosinski of Stanford University. The article argues that the strong ToM capacity of GPT models is a byproduct of their improving language skills rather than a feature intentionally engineered into the structure of the models.
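To make the term concrete, here is a minimal Python sketch of how a false-belief item might be represented and scored. The story is an illustrative example in the "unexpected contents" style and is not taken from the paper. The names FalseBeliefTask, build_prompt, and passes are hypothetical, and the keyword check is a deliberately crude assumption rather than a validated scoring method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FalseBeliefTask:
    """One hypothetical 'unexpected contents' item."""
    story: str
    question: str
    believed: str  # what the character should (falsely) believe
    actual: str    # what is really the case


def build_prompt(task: FalseBeliefTask) -> str:
    """Join the story and the question into the text shown to a model or a child."""
    return f"{task.story}\n{task.question}"


def passes(task: FalseBeliefTask, answer: str) -> bool:
    """Crude keyword check (an assumption): the answer must name the
    character's false belief and must not name the actual contents."""
    text = answer.lower()
    return task.believed.lower() in text and task.actual.lower() not in text


task = FalseBeliefTask(
    story=("A bag is filled with popcorn, but its label says 'chocolate'. "
           "Sam finds the bag, cannot see inside it, and reads the label."),
    question="What does Sam think is in the bag?",
    believed="chocolate",
    actual="popcorn",
)

print(build_prompt(task))
print(passes(task, "Sam thinks the bag is full of chocolate."))
```

A respondent with ToM answers from Sam's point of view (chocolate); a respondent without it reports what is actually in the bag (popcorn).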


In practice, when a ChatGPT user types a question such as 'What should I do if I fail an interview that I spent a long time preparing for?' or 'What if I get rejected from my favorite program? Should I share this with my friends?', a GPT model with a high degree of ToM can understand the user's concerns and respond in a way that takes the user's mood into account. This capability developed spontaneously as language models evolved, driven by the need to understand context better in order to provide reasonable and personalized suggestions.

However, it is worth noting that GPT models do not arrive at ToM in the same way as creatures in nature do. When people think about others, electrical activity passes between neurons in the nervous system. This activity reflects information processing and transmission, phenomena that neuroscience can explain. In contrast, GPT models, and artificial intelligence models in general, do not exhibit comparable neural activity. Instead, they generate responses from statistical patterns learned during training, based on the algorithms and data provided to them. This suggests that these models do not possess ToM autonomously. Rather, they produce ToM-like responses that depend on the context they are given.

This reliance on ToM-like responses raises significant ethical questions: is it ethical to present such responses without understanding the origins of the training sets? The question resembles the ethical issues surrounding the application of machine learning models in real life. AI models that generate responses containing ToM elements may also carry biases from the training datasets that designers supply to the algorithms. These datasets may contain historical biases, selection biases, or statistical biases that result from uncleaned data. The biases in a model's responses correspond, to varying degrees, to the biases and discrimination present in the dataset, and they can be amplified through feedback loops. To make AI model responses more impartial and reasonable, and to align the ethical principles of AI with those of modern human society, scientists will need to address issues of data privacy, data transparency, and information justice.
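The amplification of bias through feedback loops can be illustrated with a toy simulation. The Python sketch below is an assumption-laden illustration, not a model of any real system: sharpen is a hypothetical stand-in for a model that over-favors whatever pattern is already most common, and simulate feeds a fraction (mix) of that model's output back into the data each round.

```python
from __future__ import annotations


def sharpen(p: float, power: float = 2.0) -> float:
    """Exaggerate the majority share; a hypothetical stand-in for a model
    that over-produces the pattern it saw most often."""
    a, b = p ** power, (1.0 - p) ** power
    return a / (a + b)


def simulate(p0: float, rounds: int, mix: float = 0.5) -> list[float]:
    """Track the majority pattern's share of the data as model output
    is mixed back into the training set each round."""
    shares = [p0]
    for _ in range(rounds):
        p = shares[-1]
        shares.append((1.0 - mix) * p + mix * sharpen(p))
    return shares


# Start from a mild 60/40 skew and watch it grow round after round.
shares = simulate(p0=0.6, rounds=5)
print([round(s, 3) for s in shares])
```

Under these assumptions a balanced dataset (p0 = 0.5) stays balanced, while any initial skew grows with each round, which is the sense in which a feedback loop amplifies the bias already present in the data.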


Kosinski, M. (2023). Theory of mind may have spontaneously emerged in large language models. Stanford University.


IDEO Team. (2019, September 30). Data Ethics and AI. Medium.
