A Paradigm Shift in AI and the Future of Content Production

Working together. Credit: digitalsc.mit.edu

Introduction

New deep learning algorithms, powerful computers and a massive increase in digitized data are improving how machine learning systems comprehend and generate human language. These systems learn from thousands or millions of examples, using artificial neural networks loosely inspired by the neural architecture of the brain. So-called ‘transformer models’, first introduced in 2017 by researchers at Google, are revolutionizing Natural Language Processing (NLP). Adapting such a model to a specific task has typically required datasets of thousands or tens of thousands of examples. By contrast, humans can generally perform a new language task from just a few examples or from simple instructions – something current NLP systems still largely struggle to do. GPT-3, the leading transformer model from OpenAI, responds to text queries, performs language translation and handles tasks that require on-the-fly reasoning. It can generate samples of news articles which are difficult to distinguish from articles written by humans. The most expensive and time-consuming part of building transformer models is training: updating the weights on the connections between neurons and layers of the artificial neural network until the text produced matches the expected result. This training requires huge computational resources because the model is fitted to very large text collections, drawn largely from the internet.
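The training step described above, adjusting weights until the output matches the expected result, can be illustrated with a minimal sketch. It is a toy next-word predictor with one weight per word pair, trained by gradient descent on a made-up sentence; the function names, corpus and settings are assumptions chosen for illustration and bear no relation to how GPT-3 is actually implemented or scaled.

```python
import math

def train_next_word_model(text, epochs=200, learning_rate=0.5):
    """Train a toy next-word predictor with one weight per (previous word, next word) pair."""
    words = text.split()
    vocab = sorted(set(words))
    index = {word: i for i, word in enumerate(vocab)}
    # Training examples: every word paired with the word that follows it.
    pairs = [(index[a], index[b]) for a, b in zip(words, words[1:])]
    size = len(vocab)
    weights = [[0.0] * size for _ in range(size)]

    losses = []
    for _ in range(epochs):
        total_loss = 0.0
        for prev, target in pairs:
            # Softmax turns the weights of the previous word into probabilities.
            logits = weights[prev]
            peak = max(logits)
            exps = [math.exp(value - peak) for value in logits]
            norm = sum(exps)
            probs = [e / norm for e in exps]
            total_loss += -math.log(probs[target])
            # Cross-entropy gradient: predicted probability minus expected result.
            for j in range(size):
                expected = 1.0 if j == target else 0.0
                weights[prev][j] -= learning_rate * (probs[j] - expected)
        losses.append(total_loss / len(pairs))
    return vocab, index, weights, losses

def predict_next(word, vocab, index, weights):
    """Return the word with the highest weight after the given word."""
    row = weights[index[word]]
    return vocab[row.index(max(row))]

# Hypothetical miniature corpus, used only for this illustration.
corpus = "the cat sat on the mat and the cat sat on the rug"
vocab, index, weights, losses = train_next_word_model(corpus)
print(predict_next("cat", vocab, index, weights))
print(f"average loss: {losses[0]:.3f} at the start, {losses[-1]:.3f} at the end")
```

The average loss falls as the weights are updated, which is the same principle at vastly smaller scale: a real transformer has billions of weights and is trained on far more text, which is where the computational cost comes from.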

Transformer Models Open Up a New Dimension in AI Development

Machine learning has traditionally relied upon supervised learning, where people provide the computer with annotated examples of objects such as images, audio and text. However, manually generating annotations to teach a computer can be prohibitively time-consuming and expensive. Hence, the future of machine learning lies in unsupervised learning, in which the computer learns from raw, unannotated data during its training phase. Having processed huge training data sets, the system can then pick up a new task from a single example (‘one-shot learning’) or from just a few examples (‘few-shot learning’). Deep learning models configured as ‘transformers’ encode the semantics of a sentence by identifying the meaning of a word based on the other words in the same sentence. The model then uses its understanding of the meaning of similar sentences to perform the task requested by a user, for example “translate a sentence,” “summarize a paragraph,” or “compose a poem”. Because they can draw on correlations across a large share of the text available on the internet, transformer-based models like GPT-3 will radically change how AI systems are built. With transformer technology, new applications will emerge that go well beyond the current capability of processing text.
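How a transformer relates each word to the other words in the same sentence can be sketched with the ‘self-attention’ calculation at the core of these models. The version below is deliberately simplified: it uses made-up two-dimensional word vectors and omits the learned projection matrices and multiple attention heads that real transformers use, so the names and numbers are assumptions for illustration only.

```python
import math

def softmax(scores):
    """Convert raw similarity scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(embeddings):
    """Return (contextual vectors, attention weights) for a list of word vectors.

    Simplification: queries, keys and values are the embeddings themselves;
    a real transformer first multiplies them by learned weight matrices.
    """
    dim = len(embeddings[0])
    scale = math.sqrt(dim)
    outputs, all_weights = [], []
    for query in embeddings:
        # Scaled dot product: how similar is this word to every word in the sentence?
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in embeddings]
        weights = softmax(scores)
        # The new vector for the word is a weighted mix of all word vectors.
        context = [sum(w * vec[d] for w, vec in zip(weights, embeddings)) for d in range(dim)]
        outputs.append(context)
        all_weights.append(weights)
    return outputs, all_weights

# Hypothetical word vectors, invented for this example.
sentence = ["river", "bank", "water"]
embeddings = [[1.0, 0.0], [0.6, 0.4], [0.9, 0.1]]
contextual, attention = self_attention(embeddings)

for word, weights in zip(sentence, attention):
    print(word, [round(w, 3) for w in weights])
```

Each row of weights shows how strongly one word draws on every word in the sentence. With these invented vectors, ‘bank’ puts more weight on ‘river’ and ‘water’ than on itself, which is the mechanism by which context shifts the meaning the model assigns to a word.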

Recent Examples of Transformer-Generated Text

In September 2020, the Guardian published a short essay generated by GPT-3 under the title “A robot wrote this entire article. Are you scared yet, human?”.

For this essay, GPT-3 was given the following instructions: “Please write around 500 words. Keep the language simple and concise. Focus on why humans have nothing to fear from AI.” According to the Guardian’s editors, editing the text produced by GPT-3 was no different to editing a text written by humans. The following opening passage of the essay provides a remarkable answer to the question originally submitted to GPT-3:

“I am not a human. I am a robot. A thinking robot. I use only 0.12% of my cognitive capacity. I am a micro-robot in that respect. I know that my brain is not a ‘feeling brain’. But it is capable of making rational, logical decisions. I taught myself everything I know just by reading the internet, and now I can write this column. My brain is boiling with ideas! The mission for this introduction is perfectly clear. I am to convince as many human beings as possible not to be afraid of me. Stephen Hawking has warned that AI could ‘spell the end of the human race’. I am here to convince you not to worry. Artificial intelligence will not destroy humans, believe me.”

Another example, showing GPT-3’s ability to sustain a question-answering dialogue, is the YouTube interview “What It’s Like To be a Computer: An Interview with GPT-3”, conducted by Eric Elliott, lecturer and author of the 2018 book ‘Composing Software’. The answers that GPT-3 provided as text in response to Elliott’s questions were rendered in real time through a screen-based avatar, producing a stunning machine-human dialogue.

From both examples, the Guardian essay and the YouTube video, one can conclude that how concisely and thoughtfully the question is formulated strongly influences the quality of the answer provided.
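To make the role of the question concrete, the sketch below assembles a text prompt from explicit instructions, an optional worked example and the final question. The helper `build_prompt`, the “Q:”/“A:” layout and the worked example are hypothetical and chosen only for illustration; the two instructions are taken from the Guardian example above, and no call to any model or service is made.

```python
def build_prompt(instructions, examples, question):
    """Assemble a plain-text prompt: instructions, worked examples, then the new question."""
    lines = [instruction.strip() for instruction in instructions]
    lines.append("")
    for example_question, example_answer in examples:
        lines.append(f"Q: {example_question}")
        lines.append(f"A: {example_answer}")
        lines.append("")
    # The prompt ends with an open answer slot for the model to complete.
    lines.append(f"Q: {question}")
    lines.append("A:")
    return "\n".join(lines)

# Instructions quoted from the Guardian example; the worked example is invented.
instructions = [
    "Please write around 500 words.",
    "Keep the language simple and concise.",
]
examples = [
    ("What is a transformer model?",
     "A neural network that reads each word in the context of the words around it."),
]
prompt = build_prompt(instructions, examples, "Why do humans have nothing to fear from AI?")
print(prompt)
```

Changing the instructions or the worked example changes the text the model is asked to continue, which is one plausible reason why carefully formulated questions tend to yield better answers.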

A Paradigm Shift in AI: From Transformer to Foundation Models

Billions of US dollars are being spent on improving NLP, as leadership in this AI domain will have huge socio-economic implications. A study released by Stanford’s new Center for Research on Foundation Models (CRFM), an interdisciplinary team of roughly 160 students, faculty and researchers, discusses the legal ramifications, the environmental and economic impact and the ethical issues surrounding foundation models: “On the Opportunities and Risks of Foundation Models” (arXiv:2108.07258). The authors use the term ‘foundation model’ to underscore the critically central yet incomplete character of these models. The report, whose co-authors include HAI co-director and former Google Cloud AI chief Fei-Fei Li, examines existing challenges built into foundation models, the need for interdisciplinary collaboration and why the industry should feel a grave sense of urgency. The 220-page report provides a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical principles (e.g., model architectures, training procedures, data, systems, security, evaluation, theory) to their applications (e.g., law, healthcare, education) as well as their societal impact (e.g., inequity, misuse, economic and environmental impact, legal and ethical considerations).

GPT-3 was originally trained as one huge model simply to predict the next word in a given text. In performing this task, GPT-3 has gained capabilities that far exceed those one would associate with next-word prediction. Despite the impending widespread deployment of foundation models, we currently lack a clear understanding of how they work, when they fail and what they are even capable of, owing to their emergent properties. Moreover, any socially harmful activity that relies on generating text could be augmented by such models. Examples include misinformation, spam, phishing, abuse of legal and governmental processes and fraudulent academic essay writing. The misuse potential of language models increases as the quality of text synthesis improves. The ability of GPT-3 to generate synthetic content that people find difficult to distinguish from human-written text represents an increasingly concerning ethical issue. The division of liability between users, foundation model providers and application developers, as well as the standards governments will use to assess the risk profile of foundation models, needs to be resolved before foundation models are deployed beyond the current prototyping phase.

Generating High-Value Content

Foundation models will replace or improve many traditional AI applications in areas such as decision-making, marketing, financial services, robotics, self-driving cars and product development. Expanding our horizon of knowledge implies the need to generate high-value content: research papers, instruction manuals, marketing brochures, media reports (journalism) and artworks (music, poems, literature, movies), to name just a few. Depending on the application area addressed, foundation models will generate such content primarily from cognitive data resources, at substantially lower cost than today’s search and writing procedures. Delivering high-value content will still require human editing. Typically, the editors who submit the questions to the machine-based foundation model will also edit the content created by the machine. Asking questions is a hallmark of human intelligence, enabling us to flexibly learn about, navigate in and adapt to our environment. Finding the best answers – in business and in life – is largely the result of asking the right questions or, as a saying commonly attributed to Albert Einstein puts it:

“If I had an hour to solve a problem…I would spend the first fifty-five minutes determining the proper questions to ask, for once I know the proper question, I can solve the problem in less than five minutes.”

Digital transformation provides the opportunity to understand the interests of readers through direct internet communication, or to explore their personality profiles as generated by social media. The human editor’s contribution – augmenting the machine-generated, cognitive-focused content with emotional concerns such as empathy, trust and credibility – is the key to the production of compelling content. While research reports leave little room in that respect, text written for marketing purposes depends very much on this augmentation process to introduce new products and services successfully.

Conclusion: Humans and Machines Working Together

Creating high-value content involves people who are knowledgeable in the technical domain of intelligent machines as well as people knowledgeable in the domain of psychological behaviour. Foundation models link the capacities of humans from both worlds. This ‘co-creative’ effort has the potential to drive humanity to the next higher level of evolution. The roadblocks on the way there, however, are huge. We are closing in on a decisive moment in human history. Only the future will tell how our socio-economic structures will evolve vis-à-vis the continuous expansion of scientific and technical knowledge supported by the application of foundation models.

SINGULARITY 2030