The Future of ChatGPT: We Must Separate Language from Thought

Posted by Peter Rudin on 10. March 2023 in Essay

ChatGPT at work. Credit: www.thehindu.com

Introduction

‘Deep Learning’ techniques have given rise to generative AI algorithms that can be prompted to make predictions, create images or write text. The field’s best-known example is OpenAI’s ChatGPT, which scans massive amounts of text to learn patterns between words and then responds to text prompts from users. However, first experiences suggest that language and thought are two distinct entities, a distinction that needs to be considered when we discuss the future potential of ChatGPT and transformer-based generative AI language models.
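To illustrate, in a very simplified way, what ‘learning patterns between words’ means, the toy Python sketch below counts which word tends to follow which in a tiny corpus and uses those counts to predict a continuation. This is purely illustrative and not how ChatGPT is implemented; transformer-based models learn far richer patterns with billions of neural-network parameters, but the underlying idea of predicting the next word from observed word-to-word statistics is the same.

```python
from collections import Counter, defaultdict

# Toy illustration only (not ChatGPT's actual implementation): learn which
# word tends to follow which word in a tiny corpus, then use those counts
# to predict a continuation of a one-word prompt.
corpus = (
    "language models learn patterns between words . "
    "language models predict the next word . "
    "models predict words from patterns ."
).split()

# Count how often each word follows each other word (a bigram table).
bigrams = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    bigrams[current_word][next_word] += 1

def predict_next(prompt_word: str) -> str:
    """Return the word that most often followed prompt_word in the corpus."""
    followers = bigrams.get(prompt_word)
    return followers.most_common(1)[0][0] if followers else "<unknown>"

print(predict_next("language"))  # -> "models"
print(predict_next("models"))    # -> "predict"
```

Real LLMs replace this bigram table with a deep neural network trained on billions of documents, which is what allows them to respond fluently to arbitrary prompts.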

The Problem with Large Language Models (LLMs)

In an article published on February 20, 2023, by bdtechtalks.com, Ben Dickson states that we need a different framework to discuss the future of LLMs. Researchers at the University of Texas and the Massachusetts Institute of Technology (MIT) argue that to understand the power and limits of LLMs, we must separate ‘formal’ from ‘functional’ linguistic competence. The researchers explored two common fallacies related to language and thought. The first one claims that an LLM that is good at language is also good at thinking. The scientists describe this as the ‘good at language – good at thought’ fallacy. It suggests that large language models represent a first step towards ‘Artificial General Intelligence (AGI)’, which is considered the ‘Holy Grail’ of AI. The second fallacy, called ‘bad at thought – bad at language’, suggests that if a language model cannot fully capture the richness and sophistication of human thought, it cannot represent a model of human language. These language models have poor common-sense reasoning abilities and lack consistent world knowledge. “These two fallacies really stem from the same misconception equating language and thought,” a researcher at MIT says. One reason for making this mistake is that in real life we do not have access to another person’s thoughts. If we want to know how good someone is as a ‘thinker’, the best way to find out is to ask them a question and listen to their response.

The researchers argue that evidence from cognitive science and neuroscience shows that language and thought in humans are highly dissociable. MRI scans show that the brain’s language network is very active when people listen to someone, read, or generate sentences. When they perform arithmetic, carry out logical reasoning or write computer programs, the activity of this language network is rather low. Hence, the neural machinery dedicated to processing language is separate from the machinery responsible for reasoning.

What Comes First, Language or Thought?

The relationship between language and thought has been investigated by a broad spectrum of scientists, including linguists, philosophers, cognitive scientists, psychologists and anthropologists. Language is a symbolic tool that we use to communicate our thoughts. It mirrors thinking to communicate our experience of cognition. As a result, one can conclude that the language we speak not only facilitates the communication of thought but also shapes and diversifies thinking. Language acquisition refers to the process by which we learn to understand and use words and grammar to communicate. The way we acquire language is affected by many factors. We know that learning a language is not just about learning words. We must learn how to correctly connect the words, what they mean in each context, and how to order the words in such a way that other people will be able to understand. To capture these complex relationships between language and thinking, the following definition may be used:

First, the existence of language as a cognitive process affects the system of thinking; second, thinking comes before language is learned; third, the language spoken may affect the system of thinking.

One way to understand whether thought or cognitive processes exist before language is to study infants’ relationship to language. Infants can categorize objects and actions before they can talk. Starting at the age of one, they understand the cause-and-effect relationship between events and movements. According to researchers at NYU, infants outperform current artificial-intelligence algorithms at common-sense reasoning, as today’s neural-network models fail to capture infants’ knowledge. These research results highlight fundamental differences between computation and human cognition, pointing to shortcomings in current machine-learning applications.

With ChatGPT and Search towards AGI?

OpenAI, the creator of ChatGPT, claims that its programs are approaching general intelligence that would put the machines on par with human intelligence. But just making models better at word prediction will not be enough to reach this ambitious goal. Different training methods are required to spur further advances in AI, for instance adding social reasoning to word prediction. ChatGPT has already taken a step in that direction: besides reading massive amounts of text, human feedback on its responses is part of its transformer-based knowledge system. “Language is more than just syntax,” says Gary Marcus, a cognitive scientist and prominent AI researcher. “It is also about semantics and the struggle to comprehend how a sentence derives meaning from the structure of its parts. That a large language model is good at language is overstated,” Marcus says. There is still a healthy discussion as to how programs such as ChatGPT ‘understand’ the world by simply being fed data from books and Wikipedia entries. Meaning is negotiated in our interactions and discussions, not only with other people but also with the world. We reach this level of understanding through the engagement of language. If that is correct, building a truly intelligent machine would require a different way of combining language and thought – not just layering different algorithms but designing a program that might learn language and navigate social relationships at the same time. Humans make good use of language because they combine thought with meaning. A computer that masters the rules of language requires intelligence. But before we can use our organic brains to better understand silicon ones, we will need new ideas to complement the significance of language. Humans record and transmit skills and knowledge through books, articles and LLMs. They write with an implicit assumption of the reader’s general knowledge while, at the same time, a vast amount of information remains unsaid. If AI lacks this basis of common-sense knowledge, it will not achieve AGI.
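The role of human feedback mentioned above can be pictured with a small, hypothetical sketch: a person compares two candidate answers to the same prompt, and the preferred answer is recorded as a training signal for later fine-tuning. The function and field names below are illustrative assumptions, not OpenAI’s actual pipeline.

```python
import json

# Hypothetical sketch of collecting human feedback (not OpenAI's actual
# training pipeline): a person compares two candidate answers to the same
# prompt, and the preferred one is stored as a fine-tuning signal.
feedback_log = []

def record_preference(prompt: str, answer_a: str, answer_b: str, preferred: str) -> None:
    """Store one human judgement comparing two candidate answers."""
    assert preferred in ("a", "b")
    feedback_log.append({
        "prompt": prompt,
        "chosen": answer_a if preferred == "a" else answer_b,
        "rejected": answer_b if preferred == "a" else answer_a,
    })

record_preference(
    prompt="Is a stop sign with a small sticker on it still a stop sign?",
    answer_a="Yes, the sticker does not change the sign's meaning.",
    answer_b="No, any marking invalidates the sign.",
    preferred="a",
)

# The (chosen, rejected) pairs would later be used to nudge the model
# towards answers that humans rated higher.
print(json.dumps(feedback_log, indent=2))
```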

Kosmos-1: Microsoft’s New AI-Model to achieve AGI

According to an article written by Liam Tung and published by ZDNET in March 2023, Microsoft’s newly announced Kosmos-1 software is a multimodal large language model (MLLM) that can respond not only to language prompts but also to visual cues. This feature can be used for an array of tasks, including image captioning, visual question-answering and more. OpenAI’s ChatGPT has helped popularize the concept of LLMs with its GPT (Generative Pre-trained Transformer) model and the possibility of transforming a text prompt or input into an output. However, Microsoft’s AI researchers argue that multimodal perception and ‘grounding’ in the real world are needed to move beyond ChatGPT-like capabilities to achieve AGI. To understand the importance and complexity of AGI, it is worthwhile looking at some of its design parameters. Whereas deep learning has enabled major advances in computer vision, today’s AI systems are far from developing human-like sensory-perception capabilities. Due to poor colour cognition, to list just one example, self-driving cars have been fooled by small pieces of black tape or stickers on a red stop sign. For any human, the redness of the stop sign is still completely evident, but the deep-learning-based system gets fooled into thinking the stop sign is something else. Multimodal input greatly expands the application spectrum of language models, for example in multimodal machine learning, document intelligence and robotics. Microsoft’s goal is to align multimodal perception with LLMs, so that these new MLLM applications are capable not just of seeing and talking but also of understanding the semantics and causal structure of web pages.
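How a multimodal prompt differs from a text-only one can be sketched as follows. The data structures and file names are hypothetical and do not reflect Kosmos-1’s actual interface; the point is only that image and text elements are interleaved into a single input sequence, which is what enables tasks such as image captioning and visual question-answering.

```python
from dataclasses import dataclass
from typing import List, Union

# Hypothetical sketch (not Kosmos-1's actual API): a multimodal prompt
# interleaves image references with text so that one model can handle
# image captioning, visual question-answering and similar tasks.

@dataclass
class ImageInput:
    path: str  # local path or URL of the image (illustrative)

@dataclass
class TextInput:
    text: str

MultimodalPrompt = List[Union[ImageInput, TextInput]]

# Visual question-answering: the image and the question form one prompt.
vqa_prompt: MultimodalPrompt = [
    ImageInput(path="stop_sign.jpg"),
    TextInput(text="Question: What colour is the sign? Answer:"),
]

# Image captioning: the image followed by an instruction to describe it.
caption_prompt: MultimodalPrompt = [
    ImageInput(path="stop_sign.jpg"),
    TextInput(text="Describe this image in one sentence."),
]

def render(prompt: MultimodalPrompt) -> str:
    """Flatten a multimodal prompt into the interleaved sequence an MLLM would embed."""
    parts = [
        f"<image:{item.path}>" if isinstance(item, ImageInput) else item.text
        for item in prompt
    ]
    return " ".join(parts)

print(render(vqa_prompt))
print(render(caption_prompt))
```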

Conclusion

To achieve human-level intelligence, machines need to be ‘grounded’ in reality. Today’s systems cannot reason because the data available does not describe causal relationships. However, as AI research advances exponentially, within a few years we might have systems that map and correlate human dialogue across the entire internet. As a result, knowledge will be advanced to new frontiers and the ‘Holy Grail’ of AGI might finally be reached. Content, which is so crucial in human causal cognition, is a product of culture which has evolved over thousands of years. We will be challenged to adapt to a significant change in AI technology, and only time will tell how successfully and sustainably we master this process.
