What AI’s sensory vacuum tells us about thinking on the ‘path to meaning’
By Christopher Summerfield | Published: 2025-03-11 14:30:00 | Source: The Future – Big Think

It is safe to assume that today’s text-based LLMs, which have never seen the real world and therefore have no iconic representations on which to base their learning, are aphantasic – they never think in images. But does this mean that they cannot think at all, or that their words are entirely devoid of meaning?
To answer this question, let us consider the story of a human being who grew up with very limited opportunities to sense the physical world. Helen Keller was one of the most prominent figures of the twentieth century. Born into a respectable Alabama family, she suffered a bout of meningitis at just nineteen months old and tragically lost both her sight and her hearing. She spent the next few years struggling to understand the world through her remaining senses, for example recognizing family members by the vibration of their footsteps. When she was six, her mother hired a young, visually impaired teacher to try to teach her to communicate by tracing letters on her hand. In her autobiography, Keller movingly recounts the Damascene moment when she realized that the movements W-A-T-E-R being traced on her palm symbolized the wonderful cool something flowing over her hand. As she joyfully put it: “That living word awakened my soul, gave it light, hope, joy, set it free!”
Keller’s story gives us a fascinating insight into what it means to grow up in a sensory world devoid of visual and auditory references. At first glance, her tale seems to confirm the idea that words take on meaning only when they are connected to physical experiences. The word water finally acquired meaning when she realized that it referred to an experience in the physical world – the cold sensation of liquid running over her hand. It is as if Keller is narrating the precise moment when she first became able to match symbols with real-world entities, allowing their meaning to flow. By contrast, LLMs (which unfortunately have no hands and cannot feel cold water on their fingers) remain shut out of this path to meaning.
However, there is a catch. Defining words as meaningful only when they refer to concrete objects or events (such as a real monkey riding a real bicycle) would radically strip large swaths of language of their meaning. We understand many words that do not refer to physical things and cannot be seen, heard, or experienced directly: words such as square root, prattle, and gamma radiation. We can all think perfectly well about things that do not exist (and therefore have no referent), such as a peach the size of a planet or a tyrannical whale ruling the Indian Ocean. Helen Keller herself grasped the meaning of countless concrete concepts that she could neither see nor hear, such as clouds, birdsong, and the color red. So words do not become meaningful just because they refer to things that can be seen, heard, touched, tasted, or smelled. They also acquire meaning from the way they are associated with other words. Indeed, the claim that “meaning” and “understanding” arise only when words are grounded in physical sensations implies that speech produced by people with diminished sensory experience is somehow less meaningful, or that they themselves are less able to “understand” the words they speak. These claims are plainly false. Helen Keller, who never regained her sight or hearing but became a revered scholar, writer, political activist, and campaigner for disability rights, owed much of her wisdom to the meaning that language conveys through its intrinsic patterns of association – the way words relate to other words.
Meaning, then, can be acquired along two different paths. There is the high road of linguistic data, on which we learn that spider goes with web. Then there is the low road of perceptual data, on which we see an eight-legged creature at the center of a geometric web, glistening in the morning dew. Most people have the luxury of traveling both roads, and can thus learn to associate words with words, things with things, words with things, and things with words.
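As an aside for readers who want to see the high road made concrete: here is a minimal sketch in the spirit of distributional semantics, where a word is represented by the company it keeps. The four-sentence corpus is invented for illustration, and real LLMs learn far richer statistics than raw co-occurrence counts, but even this toy version places spider closer to web than to nest from word patterns alone, without ever seeing a spider.

from collections import Counter
from itertools import combinations
import math

# Invented toy corpus; no perceptual data, only words.
corpus = [
    "spider spins web in garden",
    "spider waits on web",
    "bird sings in garden",
    "bird builds nest in tree",
]

# Count how often each pair of words appears in the same sentence.
cooc = Counter()
for sentence in corpus:
    words = set(sentence.split())
    for a, b in combinations(sorted(words), 2):
        cooc[(a, b)] += 1
        cooc[(b, a)] += 1

vocab = sorted({w for s in corpus for w in s.split()})

def vector(word):
    # A word's "meaning" here is just its co-occurrence profile with every other word.
    return [cooc[(word, other)] for other in vocab]

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(x * x for x in v))
    return dot / norm if norm else 0.0

print(cosine(vector("spider"), vector("web")))   # ~0.56: similar, from text alone
print(cosine(vector("spider"), vector("nest")))  # ~0.17: dissimilar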
By contrast, LLMs trained exclusively as chatbots can travel only the high road: they can learn about the world only from linguistic data. This means that any thinking they do will inevitably be very different from ours. We can use mental representations formed by our direct experience of objects, space, or music to think about the world, rather than having to rely solely on propositions in natural language. This is why, for humans, thinking and language are not tightly coupled. As recent research suggests, our “formal” linguistic competence (the ability to construct correct sentences) does not constrain our “functional” linguistic competence (the ability to use language to reason or display common sense). Clear evidence of this dissociation comes from patients who have suffered damage to the parts of the brain responsible for language production. If you are unlucky enough to have a stroke that affects the left side of your brain, you may develop a condition called aphasia. Patients with aphasia typically struggle with speech, have trouble finding the right words (anomia), or have problems forming grammatical sentences (agrammatism). Yet these deficits in sentence formation do not necessarily go hand in hand with difficulties in thinking: people with aphasia often have remarkably intact reasoning and creative powers.
Most current publicly available LLMs are primarily chatbots – they take text as input and produce text as output (although advanced models such as GPT-4 and Gemini can now produce images as well as words, and text-to-video models will soon become widely available). They are trained on natural language, as well as on some formal languages and large amounts of computer code. Their abilities in reasoning, mathematics, and syntax are thus based entirely on their internal representations of these symbolic systems, whether Korean or C++.
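To make “text in, text out” concrete, here is a minimal sketch of the training signal behind such models: predict the next symbol from the ones that came before. The training sentence is invented, and a simple counting table stands in for the neural network, but the point survives the simplification: the model’s entire world is a stream of symbols.

from collections import Counter, defaultdict

# Invented training text; a real model would see trillions of words.
text = "the spider spins the web and the spider waits on the web".split()

# For each word, count which words follow it (a bigram model).
follows = defaultdict(Counter)
for prev, nxt in zip(text, text[1:]):
    follows[prev][nxt] += 1

def predict(word):
    # Return the word most often seen after `word` in the training text.
    return follows[word].most_common(1)[0][0]

print(predict("the"))     # "spider" (ties broken by first appearance)
print(predict("spider"))  # "spins"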
In contrast, humans enjoy real-world experiences that are not limited to words. This means that we can use other kinds of mental representation to think, such as the pleasing pattern of notes that strikes the ear when listening to a string quartet, the geometric projection of an algebraic expression, or the strategic spatial arrangement of pieces on a chessboard when planning a checkmate. This is why, when the language system is damaged, our ability to think is at least partially spared – we can fall back on these alternative substrates for thought. This is another way in which LLM cognition differs strikingly from human cognition.
The next generation of multimodal LLMs has arrived – models that receive images and video, as well as language, as input. As LLMs evolve beyond chatbots, their opportunities to recognize relational patterns in the physical world – from images and videos – will improve dramatically, and as they do, their way of thinking will come one step closer to ours.