ChatGPT and Large Language Models: Syntax and Semantics


A New Frontier for Finance?

The banking and finance sectors have been among the early adopters of artificial intelligence (AI) and machine learning (ML) technology. These innovations have given us the ability to develop alternative, challenger models and improve existing models and analytics quickly and efficiently across a diverse range of functional areas, from credit and market risk management, know your customer (KYC), anti-money laundering (AML), and fraud detection to portfolio management, portfolio construction, and beyond.

ML has automated much of the model-development process while compressing and streamlining the model-development cycle. Moreover, ML-driven models have performed as well as, if not better than, their traditional counterparts.

Today, ChatGPT and large language models (LLMs) more generally represent the next evolution in AI/ML technology. And that comes with a number of implications.

The finance sector’s interest in LLMs is no surprise given their vast power and broad applicability. ChatGPT can seemingly “comprehend” human language and provide coherent responses to queries on just about any topic.

Its use cases are practically limitless. A risk analyst or bank loan officer can have it assess a borrower’s risk score and make a recommendation on a loan application. A senior risk manager or executive can use it to summarize a bank’s current capital and liquidity positions to address investor or regulatory concerns. A researcher or quant developer can direct it to write Python code that estimates the parameters of a model using a given optimization function. A compliance or legal officer may have it review a law, regulation, or contract to determine whether it is applicable.

But there are real limitations and hazards associated with LLMs. Early enthusiasm and rapid adoption notwithstanding, experts have sounded various alarms. Apple, Amazon, Accenture, JPMorgan Chase, and Deutsche Bank, among other companies, have banned ChatGPT in the workplace, and some local school districts have forbidden its use in the classroom, citing the attendant risks and potential for abuse. But before we can figure out how to address such concerns, we need to understand how these technologies work.

ChatGPT and LLMs: How Do They Work?

To be sure, the precise technical details of the ChatGPT neural network and its training are beyond the scope of this article and, indeed, my own comprehension. Nevertheless, certain things are clear: LLMs do not understand words or sentences in the way that we humans do. For us humans, words fit together in two distinct ways.

Syntax

On one level, we examine a series of words for its syntax, attempting to understand it based on the rules of construction applicable to a particular language. After all, language is more than jumbles of words. There are definite, unambiguous grammatical rules about how words fit together to convey their meaning.

LLMs can guess the syntactic structure of a language by the regularities and patterns they recognize from all the text in their training data. It is akin to a native English speaker who may never have studied formal English in school but who knows what kinds of words are likely to follow in a series given the context and their own past experiences, even if their grasp of grammar may be far from perfect. LLMs are similar. Since they lack an algorithmic understanding of the syntactic rules, they may miss some formally correct grammatical cases, but they will have no problems communicating.
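The idea of knowing which words are likely to follow can be illustrated with a much cruder device than an LLM: a table of word-pair counts. What follows is only a minimal sketch under stated assumptions. The tiny corpus is invented for illustration, and a real LLM learns far richer patterns than adjacent-word frequencies.

```python
from collections import Counter, defaultdict

# Toy corpus, invented for illustration only.
corpus = (
    "the bank approved the loan . "
    "the bank reviewed the loan . "
    "the analyst reviewed the model ."
).split()

# Count how often each word is followed by each other word.
follow_counts = defaultdict(Counter)
for prev_word, next_word in zip(corpus, corpus[1:]):
    follow_counts[prev_word][next_word] += 1


def next_word_probs(word):
    """Relative frequency of each word observed directly after `word`."""
    counts = follow_counts[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


# Words that followed "the" in the toy corpus, with their relative frequencies.
probs = next_word_probs("the")
for candidate, p in sorted(probs.items(), key=lambda item: -item[1]):
    print(f"{candidate:10s}{p:.2f}")
```

No grammatical rule is written anywhere in this sketch, yet it "knows" that a noun tends to follow "the" simply because that is what the text contained.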


Semantics

“An evil fish orbits electronic games joyfully.”

Syntax provides one layer of constraint on language, but semantics provides an even more complex, deeper constraint. Not only do words have to fit together according to the rules of syntax, but they also have to make sense. And to make sense, they must communicate meaning. The sentence above is grammatically and syntactically sound, but if we process the words as they are defined, it is gibberish.

Semantics assumes a model of the world where logic, natural laws, and human perceptions and empirical observations play a significant role. Humans have an almost innate knowledge of this model, so innate that we just call it “common sense,” and apply it unconsciously in our everyday speech. Could GPT-3, with its 175 billion parameters and 60 billion to 80 billion neurons, as compared with the human brain’s roughly 100 billion neurons and 100 trillion synaptic connections, have implicitly discovered the “Model of Language” or somehow deciphered the laws of semantics by which humans create meaningful sentences? Not quite.

ChatGPT is a giant statistical engine trained on human text. There is no formal generalized semantic logic or computational framework driving it. Therefore, ChatGPT cannot always make sense. It simply produces what “sounds right” based on what text “sounded like” in its training data. It pulls out coherent threads of text from the statistical conventional wisdom accumulated in its neural net.

Key to ChatGPT: Embedding and Attention

ChatGPT is a neural network; it processes numbers, not words. It breaks text into “tokens” (words or fragments of words, about 50,000 in total), converts each token into a vector of numbers, and embeds these vectors in a “meaning space” in which related words cluster together. What follows is a simple visualization of embedding in three dimensions.

[Figure: Three-Dimensional ChatGPT Meaning Space]

Of course, words have many different contextual meanings and associations. In GPT-3, each point in the three dimensions above is in fact a vector in the 12,288 dimensions required to capture all the complex nuances of words and their relationships with one another.
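To make tokens and embeddings more concrete, here is a minimal sketch under stated assumptions. The six-entry vocabulary, the greedy longest-match splitting rule, and the three-dimensional vectors are all invented for illustration. A real model learns both its token vocabulary and its embedding values from data and works in thousands of dimensions.

```python
import math

# Hypothetical toy vocabulary: every token (a word or word fragment) has an integer ID.
VOCAB = {"loan": 0, "credit": 1, "fish": 2, "manage": 3, "ment": 4, "<unk>": 5}

# Invented three-dimensional embeddings, one row per token ID.
# In a real model these values are learned, not chosen by hand.
EMBEDDINGS = [
    [0.9, 0.1, 0.0],  # loan
    [0.8, 0.2, 0.1],  # credit
    [0.0, 0.1, 0.9],  # fish
    [0.3, 0.8, 0.1],  # manage
    [0.2, 0.7, 0.2],  # ment
    [0.1, 0.1, 0.1],  # <unk>
]


def tokenize(text):
    """Split each word into known tokens, trying the longest piece first."""
    ids = []
    for word in text.lower().split():
        start = 0
        while start < len(word):
            for end in range(len(word), start, -1):
                piece = word[start:end]
                if piece in VOCAB:
                    ids.append(VOCAB[piece])
                    start = end
                    break
            else:
                # No known piece starts here: mark the rest of the word as unknown.
                ids.append(VOCAB["<unk>"])
                break
    return ids


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1 means 'pointing the same way'."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


token_ids = tokenize("loan management")        # text -> token IDs
vectors = [EMBEDDINGS[i] for i in token_ids]   # token IDs -> embedding vectors

loan, credit, fish = (EMBEDDINGS[VOCAB[w]] for w in ("loan", "credit", "fish"))
sim_loan_credit = cosine_similarity(loan, credit)
sim_loan_fish = cosine_similarity(loan, fish)
print(token_ids, round(sim_loan_credit, 3), round(sim_loan_fish, 3))
```

In this toy meaning space, "loan" sits much closer to "credit" than to "fish," which is the clustering of related words that the embedding is meant to capture.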

Besides the embedding vectors, the attention heads are also critical features of ChatGPT. If the embedding vector gives meaning to a word, the attention heads allow ChatGPT to string words together and continue the text in a reasonable way. Each attention head examines the blocks of sequences of embedded vectors written so far. For each block, it reweights or “transforms” the vectors into a new vector that is then passed through a fully connected neural net layer. It does this continuously through the entire sequence of text as new text is added.

The attention head transformation is a way of looking back at the sequence of words thus far. It repackages the preceding string of text so that ChatGPT can anticipate what new text might be added. It is a way for ChatGPT to know, for instance, that a verb or adjective appearing later in a sequence relates to a noun from a few words back.
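The "looking back and reweighting" can be sketched in a few lines. This is a deliberately stripped-down illustration, not ChatGPT's actual implementation. It assumes a single attention head, uses the embedded vectors directly as queries, keys, and values (a real model first passes them through learned projection matrices), and runs on three invented vectors.

```python
import math


def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_self_attention(vectors):
    """One attention head that lets each position look back at itself and earlier positions.

    Simplifying assumption: no learned projections, so queries, keys, and
    values are the input vectors themselves.
    """
    dim = len(vectors[0])
    outputs, all_weights = [], []
    for position, query in enumerate(vectors):
        past = vectors[: position + 1]  # never look ahead
        # Score how relevant each earlier vector is to the current one.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in past
        ]
        weights = softmax(scores)
        # The new vector is a weighted blend of everything seen so far.
        blended = [
            sum(w * vec[d] for w, vec in zip(weights, past))
            for d in range(dim)
        ]
        outputs.append(blended)
        all_weights.append(weights)
    return outputs, all_weights


# Three invented embedded vectors standing in for a three-token text.
sequence = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
outputs, weights = causal_self_attention(sequence)
for position, row in enumerate(weights):
    print(position, [round(w, 2) for w in row])
```

Each row of weights shows how much the token at that position draws on each token before it. The first token can only attend to itself, while later tokens spread their attention across the text written so far.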

The best thing about ChatGPT is its ability to _________

Most Probable Next Word    Probability
learn                      4.5%
predict                    3.5%
make                       3.2%
understand                 3.1%
do                         2.9%

Source: “What Is ChatGPT Doing . . . and Why Does It Work?” Stephen Wolfram, Stephen Wolfram Writings

Once the original collection of embedded vectors has gone through the attention blocks, ChatGPT picks up the last of the collection of transformations and decodes it to produce a list of probabilities of what token should come next. Once a token is chosen in the sequence of texts, the entire process repeats.
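The decode-choose-repeat cycle can be sketched as follows. This is a toy illustration under stated assumptions: `toy_scores` is a hypothetical stand-in for the whole neural network, the candidate list is limited to five words, and the next token is chosen by sampling in proportion to its probability, which is one common choice rather than a description of ChatGPT's exact procedure.

```python
import math
import random

# A tiny candidate list standing in for the full token vocabulary.
CANDIDATES = ["learn", "predict", "make", "understand", "do"]


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def toy_scores(tokens):
    """Hypothetical stand-in for the network: one raw score per candidate.

    It lowers the score of words already in the text so the toy output
    does not simply repeat itself.
    """
    return [1.0 - tokens.count(word) for word in CANDIDATES]


def generate(prompt_tokens, steps, rng):
    """Repeatedly score, convert to probabilities, pick a token, and append it."""
    tokens = list(prompt_tokens)
    for _ in range(steps):
        probabilities = softmax(toy_scores(tokens))
        next_token = rng.choices(CANDIDATES, weights=probabilities, k=1)[0]
        tokens.append(next_token)  # the chosen token becomes part of the input
    return tokens


rng = random.Random(0)  # fixed seed so the sketch is repeatable
result = generate(["ability", "to"], steps=3, rng=rng)
print(" ".join(result))
```

The essential point is the loop: every newly chosen token is appended to the text, and the whole text is then fed back in to produce the next list of probabilities.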

So, ChatGPT has discovered some semblance of structure in human language, albeit in a statistical way. Is it algorithmically replicating systematic human language? Not at all. Still, the results are astounding and remarkably human-like, and make one wonder if it is possible to algorithmically replicate the systematic structure of human language.

In the next installment of this series, we will explore the potential limitations and risks of ChatGPT and other LLMs and how they may be mitigated.


