Sarah, an artificial human for customer support / Picture Credit: Soul Machines
Introduction
Emotional AI involves a broad range of technologies aimed at automating the measurement of opinions, feelings, and behaviors. It relies on natural language processing (NLP) and natural language understanding (NLU), as well as modern psychology, to extract relevant information about human opinions and feelings. Increasingly, it also involves face- and voice-recognition technologies to analyse the speaker's tone, facial expressions and mood. As conversational, human-like avatars enter the knowledge-transfer market, a new era of communication technology is about to emerge.
According to Wikipedia, emotion is any conscious experience characterized by intense mental activity and a certain degree of pleasure or displeasure. Scientific discourse has drifted to other meanings and there is no consensus on a definition. Emotion is often intertwined with mood, temperament, personality, disposition, and motivation. In some theories, cognition is an important aspect of emotion. Those acting primarily on the emotions they are feeling may seem as if they are not thinking, but mental processes are still essential, particularly in the interpretation of events. For example, the realization that we are in a dangerous situation and the subsequent arousal of our body's nervous system (rapid heartbeat and breathing, sweating, muscle tension) is integral to the experience of feeling afraid.
Observing Emotions with Intelligent Machines
Emotions can be observed and interpreted by intelligent machines. Several devices and systems are available to measure and analyse emotional expressions and reactions of humans in a conversational setting:
- Intelligent cameras to analyse facial expressions and bodily movements
- Intelligent voice recorders to analyse vocal expressions
- Intelligent body sensors to recognize deviations from a predefined norm, such as elevated heart rate, body temperature, blood sugar (diabetes) or blood pressure
The catalogue of means to observe emotions continues to grow. For example, researchers from the Center for Cognitive and Brain Sciences at Ohio State University have recently published a paper titled 'Facial color is an efficient mechanism to visually transmit emotion'. The surface of the face is innervated with a large network of blood vessels. Blood-flow variations in these vessels yield visible colour changes on the face. The researchers have investigated the hypothesis that these visible facial colours facilitate the successful transmission and visual interpretation of emotions even in the absence of facial muscle activation. Consider, for instance, the redness that is sometimes associated with anger or the paleness associated with fear. The research findings support a revised model of the production and perception of facial expressions of emotions in which facial colour is an additional mechanism to visually transmit and decode emotion.
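To make the idea concrete, here is a minimal sketch (not the researchers' actual method) of how facial colour shifts relative to a neutral baseline could be turned into a feature vector for an emotion classifier; the grid-based regions and the 48-dimensional layout are illustrative assumptions.

```python
import numpy as np

def color_shift_features(face_rgb: np.ndarray, baseline_rgb: np.ndarray) -> np.ndarray:
    """Compute per-region colour shifts between an aligned face image and a
    neutral baseline of the same person.

    Both inputs are HxWx3 RGB arrays. The face is split into a coarse 4x4 grid
    (a real system would use facial landmarks to define regions such as cheeks
    and forehead); for each region we record the mean shift in each channel.
    """
    h, w, _ = face_rgb.shape
    rows, cols = 4, 4
    feats = []
    for r in range(rows):
        for c in range(cols):
            region = (slice(r * h // rows, (r + 1) * h // rows),
                      slice(c * w // cols, (c + 1) * w // cols))
            shift = face_rgb[region].mean(axis=(0, 1)) - baseline_rgb[region].mean(axis=(0, 1))
            feats.append(shift)
    return np.concatenate(feats)  # 4 * 4 * 3 = 48-dimensional feature vector

# A simple classifier (e.g. logistic regression) trained on such features could
# then predict categories such as "anger" or "fear" from colour changes alone.
```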
The 'intelligence' of emotion-sensing is provided by deep-learning algorithms that use neural networks trained on large data sets. As the quality of the training data improves, so does the quality of the interpretation of the observed emotional reactions, potentially up to the point where the intelligent machine knows more about an individual's emotional state than the individual does.
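As a rough illustration of what such an algorithm looks like, the sketch below defines a small convolutional network for classifying facial expressions; the input size, class count (e.g. the seven classes of the public FER2013 dataset) and layer sizes are assumptions for the example, not a description of any vendor's actual model.

```python
import torch
import torch.nn as nn

class EmotionCNN(nn.Module):
    """Small convolutional network mapping a 48x48 greyscale face crop
    to scores over basic emotion categories."""

    def __init__(self, num_classes: int = 7):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 48 -> 24
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 24 -> 12
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2), # 12 -> 6
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128 * 6 * 6, 256), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(256, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

# Training follows the usual supervised recipe: minimise cross-entropy on a
# labelled dataset of facial expressions, then apply softmax at inference time
# to obtain a probability per emotion for each video frame.
model = EmotionCNN()
logits = model(torch.randn(1, 1, 48, 48))   # one dummy face crop
probs = torch.softmax(logits, dim=1)        # e.g. P(happy), P(angry), ...
```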
Human and Economic Considerations
Augmenting human intellectual capacity with avatars can provide value, especially in conversational knowledge-creation scenarios. Given the pace of avatar research, we can assume that within the next five years one will no longer be able to distinguish a real person from an avatar in a conversation carried out via a screen. We have become used to 2-D screen communication, as the quality and resolution of video has improved significantly over the last decade and real-time bidirectional connectivity is no longer a limiting factor.
Trust is one of the most fundamental requirements for sincere and effective communication. Fake news and manipulated videos are factors that inhibit the acceptance of an avatar as a bidirectional communication partner. In response, governments as well as software and service providers are defining legal and ethical ground rules to guard against such misuse.
Human knowledge capacity is limited. It is impossible for one individual to keep up with the continuous stream of new knowledge produced by the research and product-design community. To overcome this bottleneck, the conversational avatar provides a 'human-like' interface to cloud-based knowledge platforms covering a specific area of interest. Consequently, economic considerations will set the pace of future avatar applications, as avatars can be accessed 24/7 and provide knowledge far beyond an individual's capacity to memorize it.
The Avatar as Carrier of Empathy and Personality
Trust and empathy are the key quality factors in conversational avatar communications. While trust is associated with the credibility and data security of the avatar's service provider, empathy relates to the communication skills of the avatar itself. Empathy improves communication at an emotional level, and it grows over time as communication with the avatar augments the user's knowledge space and self-respect. Compared to currently implemented voice-activated bots like Siri or Alexa, conversational avatars are far more efficient at transferring knowledge. By analysing reactions and learning in real time, they not only recognize emotional expressions but respond appropriately and interactively. Facial expressions are a rich and subtle way to convey meaning. From birth, we are genetically programmed to respond to faces. The natural interactions in the first months and years of life are a fundamental element of learning and lay the foundation for successful social and emotional functioning throughout life.
An avatar, in its role as an artificial human, has a distinct personality, possibly copied virtually from a real-life individual. The personality of the avatar must convey sympathy and competence and match the role a human would have in the 'real' world. If, for example, the avatar acts as a virtual customer-support agent, a range of emotional responses, expressions, and behaviors must be implemented that are consistent with the role and with the core values of the organization that the avatar represents.
State-of-the-Art: The BabyX Project
Developed over several years by an engineering research team at the Laboratory for Animate Technologies at the University of Auckland's Bioengineering Institute, BabyX is a working intelligent, emotionally responsive avatar. Mark Sagar and his team have since launched Soul Machines, a spin-off from the university, to develop the BabyX technology further for commercial applications. Over the past 18 months, avatars have been launched for customer support in financial services (Royal Bank of Scotland, Daimler Financial Services) and for software product support (Autodesk). Their latest avatar, called 'Will', teaches children about sustainable energy. In recognition of these efforts and accomplishments, Soul Machines has been selected as a '2018 Tech Pioneer' by the World Economic Forum.
BabyX was created as an autonomously animated psycho-biological model of a virtual infant. This posed two problems: first, the representation of the baby (does she look and emote correctly?) and second, whether she can interact. Having solved these problems, BabyX is able to recognize an individual via camera and voice input for facial tracking and voice analysis. To answer a query, she has a bio-based 'brain' that reacts to the individual, displaying her emotional response. One can show her a picture of a sheep and she will smile and say 'sheep'. The BabyX framework models the current best thinking on how the human brain works. For example, expressions are generated by neural patterns in both the subcortical and cortical regions of the human brain. Evidence suggests that certain basic emotional expressions, like laughing or crying, do not have to be learned. In contrast, voluntary facial movements, such as those involved in speech and culture-specific expressions, are learned through experience and predominantly rely on cortical motor control. BabyX's biologically inspired nervous system consists of an interconnected set of neural system and subsystem models which generate face-muscle-activation-based animation from a continuously evaluated neural network, including fine details of visually important elements such as the mouth, eyes, eyelashes, and eyelid geometry. Another major effort has gone into her ability to speak. BabyX babbles with a synthesized voice sampled from phonemes produced by a real child. The research team has been implementing techniques so that BabyX can assimilate an acoustic mapping from any arbitrary voice to construct new words using her own voice.
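The core animation idea (an internal neural state continuously re-evaluated and mapped to facial muscle activations that drive the rendered face) can be sketched as follows. This is a deliberately simplified, hypothetical illustration: the muscle names are real anatomical terms, but the state vector, weights and mapping are invented for the example and do not reflect the actual BabyX models.

```python
import numpy as np

# Hypothetical illustration of a face-muscle-activation loop: an internal
# state (e.g. arousal, valence, attention) is mapped each frame to activation
# levels of facial muscle groups, which then drive the face rig.

MUSCLES = ["frontalis", "orbicularis_oculi", "zygomaticus_major",
           "corrugator", "depressor_anguli_oris"]

def muscle_activations(state: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Map an internal state vector to per-muscle activations in [0, 1]."""
    return 1.0 / (1.0 + np.exp(-(weights @ state)))  # sigmoid squashing

rng = np.random.default_rng(0)
weights = rng.normal(size=(len(MUSCLES), 3))   # untrained, illustrative only
state = np.array([0.8, 0.6, 0.2])              # e.g. arousal, valence, attention
for name, act in zip(MUSCLES, muscle_activations(state, weights)):
    print(f"{name:25s} {act:.2f}")             # activation driving the rig this frame
```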
From BabyX to Expert-Avatar Services
To provide expert-avatar services for customer support, Soul Machines has developed an API to connect its 'human behavioral frontend' with IBM's Watson cloud-based knowledge-management software. The integration between the Soul Machines platform and IBM Watson is relatively straightforward to set up. The Soul Machines frontend sends audio of the customer's voice to Watson, which converts it into text, searches a corpus of knowledge for relevant answers to the customer's question, ranks the results and returns the top-ranked answer to the frontend. Meanwhile, the frontend has analysed the audio-visual input for emotional cues in the customer's tone of voice and facial micro-expressions. It then converts the answer provided by Watson into modulated, emotionally inflected speech for the avatar to deliver, matched with appropriately generated facial expressions.
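A minimal sketch of that round trip is shown below. All function names are placeholder stubs written for this article; they are not the actual Soul Machines or IBM Watson APIs, and the canned return values simply stand in for the real speech, search and emotion-analysis steps.

```python
def speech_to_text(audio: bytes) -> str:
    """Stub: in the real pipeline, Watson converts the customer's audio to text."""
    return "How do I reset my online banking password?"

def query_knowledge_base(question: str) -> str:
    """Stub: Watson searches its corpus, ranks candidate answers, returns the best one."""
    return "You can reset it from the login page via 'Forgotten password'."

def detect_emotion(audio: bytes, video_frame: bytes) -> str:
    """Stub: the frontend analyses tone of voice and facial micro-expressions."""
    return "frustrated"

def handle_customer_turn(audio: bytes, video_frame: bytes) -> dict:
    emotion = detect_emotion(audio, video_frame)  # emotional cues from the customer
    question = speech_to_text(audio)              # speech-to-text step
    answer = query_knowledge_base(question)       # ranked-answer step
    # The frontend would now render `answer` as emotionally inflected speech,
    # with the avatar's facial expressions modulated by `emotion`.
    return {"answer": answer, "customer_emotion": emotion}

print(handle_customer_turn(b"", b""))
```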
Conclusion
When the internet emerged about 25 years ago, it quickly became the leading communication standard across the globe. The continuous increase in bandwidth, computational processing and storage capacity, coupled with fundamentally new applications, has changed our daily lives with unforeseen intensity. The changes experienced in this short time frame exceed what humanity experienced in the previous 250 years. The conversational avatar, acting as an artificial human, defines a new scenario of intelligent machine communication, possibly as far-reaching as the launch of the internet 25 years ago. Expanding at an exponential rate, this new form of man-machine interaction is likely to cause serious disruptions at both the individual and the socio-economic level. Are we really prepared for this?