Transformers to Improve Memory, a Paradigm Shift in AI?

Posted by Peter Rudin on 23. September 2022 in Essay

Sharpen Your Memory 


Our memory is always involved when we try to distinguish between the mental and the physical world. We are good at remembering landmarks and settings and, if we give our memories a location for reference, remembering becomes easier. To remember long speeches, ancient Greek and Roman orators imagined wandering through ‘memory palaces’ full of reminders. In the past two decades, research has shown that at least two of our faculties – memory and navigation – have a physical basis in the brain. As neuroscience continues to reveal more insights into the functionality of the human brain, new theories are proposed to model and improve the application of Artificial Intelligence (AI).

Information Processing Theories

According to the article ‘What is Information Processing Theory? Stages, Models & Limitations’, Information Processing Theory is the traditional approach in studies of cognitive development that aim to explain how information is encoded into our memory. The theory is based on the idea that humans do not merely respond to stimuli from the environment. Instead, humans process the information they receive. The theory explains how information is captured, stored and retrieved, and describes how we receive input from the environment through senses such as touch, smell or vision. The input is stored in our memory and retrieved when needed to perform a given task. Consequently, the theory addresses behaviour as well as cognition. Although models based on the Information Processing Theory vary slightly, they typically consist of three key elements:

Information storage: The different places where information is stored in our mind, such as sensory memory, short-term memory, long-term memory, semantic memory, episodic memory and more.

Cognitive processes: The various processes which transfer information among the different memory stores. These processes include perception, coding, recording and retrieval.

Executive cognition: The awareness of the individual as to how information is processed within him- or herself. It also pertains to knowing one’s own strengths and weaknesses in performing cognitive tasks.

There have been several attempts to develop models of information processing. One of the most popular is the multi-store model defined by the psychologists Atkinson and Shiffrin in 1968. The model explains the three subsections of human memory and how they work together:

Sensory Memory holds the information that the mind perceives through various senses such as visual, olfactory or auditory information. These sense organs receive a barrage of stimuli all the time. However, most are ignored and forgotten to prevent the mind from getting overwhelmed. When sensory information gets the attention of the mind, it is transferred to short-term memory.

Short-Term Memory, also called working memory, stores information for about 30 seconds. Cognitive abilities affect how individuals process information in the working memory. Additionally, attention and focus on the most important information play an important role in encoding and transferring it into long-term memory.

Long-Term Memory with its unlimited amount of space can store information for a long time to be retrieved later as needed. Various methods are used to store information in long-term memory such as repetition or relating information to meaningful experiences.
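The flow between the three stores can be illustrated with a small toy sketch in Python. It is not a cognitive simulation and does not come from Atkinson and Shiffrin: the class name `MultiStoreMemory`, its methods and the hard 30-second cutoff are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field

# Duration quoted in the text; treating it as a hard cutoff is a simplification.
SHORT_TERM_SECONDS = 30.0


@dataclass
class MultiStoreMemory:
    """Hypothetical toy model of the sensory / short-term / long-term stores."""
    sensory: list = field(default_factory=list)
    short_term: dict = field(default_factory=dict)  # item -> time it entered
    long_term: set = field(default_factory=set)

    def sense(self, stimuli):
        """All incoming stimuli land briefly in sensory memory."""
        self.sensory = list(stimuli)

    def attend(self, item, now):
        """Only the attended stimulus moves on; everything else is dropped."""
        if item in self.sensory:
            self.short_term[item] = now
        self.sensory = []

    def rehearse(self, item, now):
        """Repetition encodes a still-active item into long-term memory."""
        self._decay(now)
        if item in self.short_term:
            self.long_term.add(item)
            return True
        return False

    def recall(self, item, now):
        """An item can be recalled from either the short- or long-term store."""
        self._decay(now)
        return item in self.short_term or item in self.long_term

    def _decay(self, now):
        """Forget short-term items older than the cutoff."""
        self.short_term = {
            i: t for i, t in self.short_term.items()
            if now - t <= SHORT_TERM_SECONDS
        }


memory = MultiStoreMemory()
memory.sense(["phone number", "traffic noise", "smell of coffee"])
memory.attend("phone number", now=0.0)      # attention: sensory -> short-term
memory.rehearse("phone number", now=10.0)   # repetition: short-term -> long-term
print(memory.recall("phone number", now=120.0))  # True: kept in long-term memory
print(memory.recall("traffic noise", now=1.0))   # False: never attended to
```

The sketch makes one point of the model concrete: information that is neither attended to nor rehearsed never reaches long-term memory.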

Limitations of the Information Processing Theory

The Information Processing theory is based on the analogy that our brain works like a computer.

This computational model has been the most prominent metaphor in neuroscience for decades. It implies that computers are very closely aligned to the functionality of the human brain. Any information-processing system consists of five main components: input, output, storage, processing and program. By viewing the brain as a computer that passively responds to inputs and processes data, we forget that the brain is an active organ, part of a body that is interacting with the world, and which has an evolutionary past that has shaped its structure and function. The brain is not simply passively absorbing stimuli and representing them through a neural code, but rather is actively searching through alternative possibilities to test various options. The brain does not represent information – it constructs it. New research findings indicate that the computer analogy is limiting our ability to look ahead. According to the essay ‘Our Brain is not a Computer, Perhaps a Transducer?’ on SINGULARITY 2030, a so-called ‘Transducer-Model’ might present a better foundation for explaining brain functionality. Going a step further, so-called ‘Transformer Models’, originally developed for Natural Language Processing (NLP) and built from deep Artificial Neural Networks (ANNs) with billions of parameters, are expanding their application potential well beyond NLP.

The Transformer Model

A transformer model is a neural network that learns context and meaning by tracking relationships in sequential data, such as the words in this sentence. Transformer models apply an evolving set of mathematical techniques, called attention or self-attention, to detect how the elements of a series, such as words and sentences, depend on each other. First described in a 2017 paper from Google, transformers are among the newest and most powerful classes of models invented to date.

In a paper published in August 2021, Stanford researchers describe transformers as “foundation models” because they see them driving a paradigm shift in AI. The “sheer scale and scope of foundation models over the last few years have stretched our imagination of what is possible,” they wrote. A foundation model is any model that is trained on broad data (generally using self-supervision) and can be adapted, for example by fine-tuning, to a wide range of downstream tasks. From a technological point of view, foundation models are not new – they are based on deep neural networks and self-supervised learning, both of which have existed for decades. The significance of foundation models can be summarized by two words: emergence and homogenization. Emergence means that the behavior of a system is implicitly induced rather than explicitly constructed – it is both the source of scientific excitement and anxiety about unanticipated consequences. Homogenization indicates the consolidation of methodologies for building machine learning systems across a wide range of applications – it provides strong leverage towards many tasks but also creates single points of failure.
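To make the attention mechanism more tangible, here is a minimal sketch of scaled dot-product self-attention in Python with NumPy. It is a simplified, single-head illustration under stated assumptions: the random weight matrices `W_q`, `W_k` and `W_v` stand in for parameters that a real transformer would learn, and the sizes (four inputs with eight features each) are arbitrary.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention.

    X has shape (sequence length, features). Every row attends to every row.
    """
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # pairwise relevance of all inputs
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ V, weights


# Random stand-ins for an embedded sequence and for learned parameters.
rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
X = rng.normal(size=(seq_len, d_model))
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))

output, weights = self_attention(X, W_q, W_k, W_v)
print(weights.shape)  # (4, 4): one weight for every pair of inputs
print(output.shape)   # (4, 8): one context-aware vector per input
```

The `weights` matrix is the interesting part: row i states how strongly input i draws on each of the other inputs when its new representation is computed.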

Transformer Model to Mimic the Brain

Transformers use a mechanism called self-attention, in which every input – a word, a pixel, a number in a sequence – is always connected to every other input. In contrast, standard ANNs connect inputs only to certain other inputs. Although transformers were designed for language tasks, they have since excelled at other tasks such as classifying images. New research in neuroscience suggests that transformers can also be used to describe brain functionality.

Understanding how the brain organizes and accesses spatial information is a huge challenge. The process involves recalling an entire network of memories and stored spatial data from tens of billions of neurons, each connected to thousands of others. Neuroscientists have identified key elements such as grid cells, which are neurons that map locations. For years, neuroscientists have worked with many types of neural networks – the engines that power most deep learning applications – to model the firing of neurons in the brain. However, researchers now find that the hippocampus – a structure of the brain critical to memory – is basically a special kind of neural net, similar to a transformer. Their new model tracks spatial information in a way that parallels the inner workings of the brain. Studies by Stanford scientists and others hint that transformers can greatly improve the ability of neural network models to mimic the sorts of computations carried out by grid cells and other parts of the brain. Such models could increase our understanding of how artificial neural networks work and how computations are carried out in the brain.

“We are not trying to re-create the brain,” says David Ha, a computer scientist at Google Brain who also works on transformer models. “But can we create a mechanism that can do what the brain does?” Ha recently designed a neural network model that could intentionally send large amounts of data through a transformer in a random, unordered way, mimicking how the human body transmits sensory observations to the brain. His transformer, similar to our brains, could successfully handle a disordered flow of information. Today’s ANNs are hard-wired to accept a particular input. But in real life, data sets often change quickly, and most AI systems do not have any way to adjust. Despite these signs of progress, David Ha and researchers at Stanford see transformers as just a step toward an accurate model of the brain – not the end of the quest.
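Why attention copes with a disordered flow of input can be shown with a short Python sketch. This is not the model Ha built. It is a hypothetical, stripped-down example in which a single fixed `query` vector summarizes a set of observations, with random matrices `W_k` and `W_v` standing in for learned parameters. Because the summary is a weighted sum over all inputs, shuffling the inputs does not change it.

```python
import numpy as np


def attention_pool(observations, query, W_k, W_v):
    """Summarize a set of observations with one fixed query vector."""
    K = observations @ W_k                     # one key per observation
    V = observations @ W_v                     # one value per observation
    scores = K @ query / np.sqrt(query.shape[0])
    scores = scores - scores.max()             # for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ V                         # weighted sum: order plays no role


# Random stand-ins for sensory observations and for learned parameters.
rng = np.random.default_rng(1)
n_inputs, d_in, d_k = 6, 5, 4
observations = rng.normal(size=(n_inputs, d_in))
query = rng.normal(size=d_k)
W_k = rng.normal(size=(d_in, d_k))
W_v = rng.normal(size=(d_in, d_k))

# Present the same observations in a random order.
shuffled = observations[rng.permutation(n_inputs)]

original_summary = attention_pool(observations, query, W_k, W_v)
shuffled_summary = attention_pool(shuffled, query, W_k, W_v)
print(np.allclose(original_summary, shuffled_summary))  # True
```

A network whose first layer expects each input at a fixed position would produce a different result for the shuffled data, which is the hard-wiring the text refers to.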


The arguments for advancing AI from a computer model to a transformer model are intriguing. Improving the accuracy of memory with a neural foundation model implemented in an intelligent machine might indeed signal a paradigm shift. According to researchers at Stanford University’s Institute for Human-Centered Artificial Intelligence (HAI), the complexity involved in executing this shift is significant. More multidisciplinary research efforts are required. However, reaching the goal of moving humanity from a machine-centered to a human-centered AI is well worth the effort.
