What Is ChatGPT Doing … and Why Does It Work?
by Stephen Wolfram — writings.stephenwolfram.com
Rating: 7/10
The first thing to explain is that what ChatGPT is always fundamentally trying to do is to produce a "reasonable continuation" of whatever text it's got so far, where by "reasonable" we mean "what one might expect someone to write after seeing what people have written on billions of webpages, etc."
And the remarkable thing is that when ChatGPT does something like write an essay, what it's essentially doing is just asking over and over again "given the text so far, what should the next word be?"... and each time adding a word.
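Here is a minimal sketch of that word-at-a-time loop in Python. It is not ChatGPT's actual code: a toy lookup table stands in for the neural net, purely to make the "given the text so far, what should the next word be?" process concrete.

```python
import random

def next_word_probs(words_so_far):
    # Toy stand-in for the language model: a fixed table keyed on the last word,
    # returning a probability for each candidate next word.
    table = {
        "the": {"cat": 0.5, "dog": 0.3, "best": 0.2},
        "cat": {"sat": 0.6, "ran": 0.4},
        "dog": {"ran": 0.7, "sat": 0.3},
        "sat": {"quietly.": 1.0},
        "ran": {"away.": 1.0},
        "best": {"essay.": 1.0},
    }
    return table.get(words_so_far[-1], {"...": 1.0})

def continue_text(prompt, max_words=10):
    words = prompt.split()
    for _ in range(max_words):
        probs = next_word_probs(words)                     # "given the text so far..."
        choices, weights = zip(*probs.items())
        words.append(random.choices(choices, weights)[0])  # "...what should the next word be?"
        if words[-1].endswith("."):
            break
    return " ".join(words)

print(continue_text("the"))   # e.g. "the cat sat quietly."
```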
For some reason (that maybe one day we'll have a scientific-style understanding of), if we always pick the highest-ranked word, we'll typically get a very "flat" essay that never seems to "show any creativity" (and even sometimes repeats word for word). But if sometimes (at random) we pick lower-ranked words, we get a "more interesting" essay.
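One common way to realize that "sometimes pick a lower-ranked word" idea is a "temperature" parameter, as in GPT-style samplers; whether ChatGPT uses exactly this scheme is an assumption here, but the sketch shows how the knob trades flatness for variety:

```python
import random

def sample_with_temperature(word_probs, temperature=0.8):
    """Re-weight next-word probabilities and sample; temperature 0 means greedy."""
    words = list(word_probs)
    if temperature == 0:
        return max(words, key=word_probs.get)   # always the single highest-ranked word
    # Sharpen or flatten the distribution, then renormalize and sample.
    weights = [word_probs[w] ** (1.0 / temperature) for w in words]
    total = sum(weights)
    return random.choices(words, [w / total for w in weights])[0]

probs = {"the": 0.45, "a": 0.30, "an": 0.15, "this": 0.10}
print(sample_with_temperature(probs, temperature=0))    # deterministic, "flat"
print(sample_with_temperature(probs, temperature=0.8))  # occasionally a lower-ranked word
```

At temperature 0 the output never varies; modestly higher values let lower-ranked words through often enough to keep the text from going flat.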
It is worth understanding that there's never a "model-less model". Any model you use has some particular underlying structure... then a certain set of "knobs you can turn" (i.e. parameters you can set) to fit your data.
The big idea is to make a model that lets us estimate the probabilities with which sequences should occur, even though we've never explicitly seen those sequences in the corpus of text we've looked at.
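As a toy illustration of assigning probabilities to word sequences never seen verbatim, here is a bigram model with add-one smoothing, a deliberately crude stand-in for the vastly richer model the essay goes on to describe:

```python
from collections import Counter

corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = set(corpus)

bigrams = Counter(zip(corpus, corpus[1:]))   # counts of adjacent word pairs
unigrams = Counter(corpus)                   # counts of single words

def bigram_prob(prev, word):
    """P(word | prev) with add-one smoothing, so unseen pairs still get some probability."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + len(vocab))

print(bigram_prob("the", "cat"))   # pair seen in the corpus: relatively high
print(bigram_prob("the", "sat"))   # pair never seen: smaller, but not zero
```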
Can we "mathematically prove" that they work? Well, no. Because to do that we'd have to have a mathematical theory of what we humans are doing.
Whatever input it's given, the neural net will generate an answer, and in a way reasonably consistent with how humans might. As I've said above, that's not a fact we can "derive from first principles". It's just something that's empirically been found to be true, at least in certain domains. But it's a key reason why neural nets are useful: that they somehow capture a "human-like" way of doing things.
The point is that the trained network "generalizes" from the particular examples it's shown. Just as we've seen above, it isn't simply that the network recognizes the particular pixel pattern of an example cat image it was shown; rather it's that the neural net somehow manages to distinguish images on the basis of what we consider to be some kind of "general catness".
In other words, somewhat counterintuitively, it can be easier to solve more complicated problems with neural nets than simpler ones. And the rough reason for this seems to be that when one has a lot of "weight variables" one has a high-dimensional space with "lots of different directions" that can lead one to the minimum, whereas with fewer variables it's easier to end up getting stuck in a local minimum ("mountain lake") from which there's no "direction to get out".
What was found is that (at least for "human-like tasks") it's usually better just to try to train the neural net on the "end-to-end problem", letting it "discover" the necessary intermediate features, encodings, etc. for itself.
There was also the idea that one should introduce complicated individual components into the neural net, to let it in effect "explicitly implement particular algorithmic ideas". But once again, this has mostly turned out not to be worthwhile; instead, it's better just to deal with very simple components and let them "organize themselves" (albeit usually in ways we can't understand) to achieve (presumably) the equivalent of those algorithmic ideas.
The capabilities of something like ChatGPT seem so impressive that one might imagine that if one could just "keep going" and train larger and larger neural networks, then they'd eventually be able to "do everything". And if one's concerned with things that are readily accessible to immediate human thinking, it's quite possible that this is the case. But the lesson of the past several hundred years of science is that there are things that can be figured out by formal processes, but aren't readily accessible to immediate human thinking.
The kinds of things that we normally do with our brains are presumably specifically chosen to avoid computational irreducibility.
There's just a fundamental tension between learnability and computational irreducibility. Learning involves in effect compressing data by leveraging regularities. But computational irreducibility implies that ultimately there's a limit to what regularities there may be.
There's an ultimate tradeoff between capability and trainability: the more you want a system to make "true use" of its computational capabilities, the more it's going to show computational irreducibility, and the less it's going to be trainable. And the more it's fundamentally trainable, the less it's going to be able to do sophisticated computation.
Computationally irreducible processes are still computationally irreducible, and are still fundamentally hard for computers, even if computers can readily compute their individual steps. And instead what we should conclude is that tasks, like writing essays, that we humans could do, but we didn't think computers could do, are actually in some sense computationally easier than we thought.
The reason a neural net can be successful in writing an essay is because writing an essay turns out to be a "computationally shallower" problem than we thought.
One can think of an embedding as a way to try to represent the "essence" of something by an array of numbers, with the property that "nearby things" are represented by nearby numbers.
Rather than directly trying to characterize "what image is near what other image", we instead consider a well-defined task (in this case digit recognition) for which we can get explicit training data... then use the fact that in doing this task the neural net implicitly has to make what amount to "nearness decisions".
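Here is a small sketch of what "nearness decisions" look like once you have embedding vectors in hand. The numbers below are made up purely for illustration; in practice they would be read off an intermediate layer of a network trained on a task like digit recognition, as described above.

```python
import numpy as np

# Hypothetical 3-dimensional embeddings (real ones have hundreds of dimensions).
embeddings = {
    "cat":   np.array([0.9, 0.1, 0.30]),
    "dog":   np.array([0.8, 0.2, 0.35]),
    "table": np.array([0.1, 0.9, 0.70]),
}

def cosine_similarity(a, b):
    # "Nearby things are represented by nearby numbers": vectors pointing the same way.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def nearest(word):
    others = [w for w in embeddings if w != word]
    return max(others, key=lambda w: cosine_similarity(embeddings[word], embeddings[w]))

print(nearest("cat"))   # "dog": its vector points in a similar direction
```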
In effect, we're "opening up the brain of ChatGPT" (or at least GPT-2) and discovering, yes, it's complicated in there, and we don't understand it, even though in the end it's producing recognizable human language.
It has to be emphasized again that (at least so far as we know) there's no "ultimate theoretical reason" why anything like this should work. And in fact, as I'll discuss, I think we have to view this as a potentially surprising scientific discovery: that somehow in a neural net like ChatGPT's it's possible to capture the essence of what human brains manage to do in generating language.
In some ways it's perhaps surprising (though empirically observed also in smaller analogs of ChatGPT) that the "size of the network" that seems to work well is so comparable to the "size of the training data". After all, it's certainly not that somehow "inside ChatGPT" all that text from the web and books and so on is "directly stored". Because what's actually inside ChatGPT are a bunch of numbers, with a bit less than 10 digits of precision, that are some kind of distributed encoding of the aggregate structure of all that text.
It seems on average to basically take only a bit less than one neural net weight to carry the "information content" of a word of training data.
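Some rough back-of-the-envelope arithmetic behind that ratio, using the publicly reported GPT-3 figures (roughly 175 billion weights and a few hundred billion words of training text) as an assumption, since the excerpt itself gives no exact numbers:

```python
weights = 175e9          # reported GPT-3 parameter count (assumed here)
training_words = 300e9   # "a few hundred billion" words, as an assumed round number

print(weights / training_words)   # roughly 0.6 weights per word of training data
```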
It's interesting how little "poking" the "originally trained" network seems to need to get it to usefully go in particular directions. One might have thought that to have the network behave as if it's "learned something new" one would have to go in and run a training algorithm, adjusting weights, and so on. But that's not the case.
It could be that "everything you might tell it is already in there somewhere"... and you're just leading it to the right spot. But that doesn't seem plausible. Instead, what seems more likely is that, yes, the elements are already in there, but the specifics are defined by something like a "trajectory between those elements" and that's what you're introducing when you tell it something.
At some level it still seems difficult to believe that all the richness of language and the things it can talk about can be encapsulated in such a finite system. Part of what's going on is no doubt a reflection of the ubiquitous phenomenon (that first became evident in the example of rule 30) that computational processes can in effect greatly amplify the apparent complexity of systems even when their underlying rules are simple.
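Rule 30 itself fits in a single line of code, which is part of what makes the complexity of its behavior so striking; a minimal sketch:

```python
def rule30_step(cells):
    # Each new cell depends only on its left, center, and right neighbors:
    # new = left XOR (center OR right), which is Wolfram's rule 30.
    n = len(cells)
    return [cells[(i - 1) % n] ^ (cells[i] | cells[(i + 1) % n]) for i in range(n)]

width, steps = 63, 30
row = [0] * width
row[width // 2] = 1            # start from a single black cell
for _ in range(steps):
    print("".join("#" if c else " " for c in row))
    row = rule30_step(row)     # simple rule, visibly complex output
```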
The basic answer, I think, is that language is at a fundamental level somehow simpler than it seems. And this means that ChatGPT, even with its ultimately straightforward neural net structure, is successfully able to "capture the essence" of human language and the thinking behind it.
The success of ChatGPT is, I think, giving us evidence of a fundamental and important piece of science: it's suggesting that we can expect there to be major new "laws of language"... and effectively "laws of thought"... out there to discover. In ChatGPT, built as it is as a neural net, those laws are at best implicit. But if we could somehow make the laws explicit, there's the potential to do the kinds of things ChatGPT does in vastly more direct, efficient... and transparent ways.
ChatGPT doesn't have any explicit "knowledge" of such rules. But somehow in its training it implicitly "discovers" them... and then seems to be good at following them.
Cases that a human "can solve at a glance" the neural net can solve too. But cases that require doing something "more algorithmic" (e.g. explicitly counting parentheses to see if they're balanced) the neural net tends to somehow be "too computationally shallow" to reliably do.
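The "explicitly counting parentheses" check is trivial as an explicit algorithm; it just needs a running count that can grow without bound, which seems to be exactly the kind of step-by-step bookkeeping a "computationally shallow" network has trouble reproducing reliably:

```python
def parens_balanced(text):
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:       # a ")" with no matching "(" before it
                return False
    return depth == 0

print(parens_balanced("((a)(b))"))   # True
print(parens_balanced("((a)(b)"))    # False
```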
Is there a general way to tell if a sentence is meaningful? There's no traditional overall theory for that. But it's something that one can think of ChatGPT as having implicitly "developed a theory for" after being trained with billions of (presumably meaningful) sentences from the web, etc.
We can view the great strength of ChatGPT as being something a bit similar: because it too has in a sense "drilled through" to the point where it can "put language together in a semantically meaningful way" without concern for different possible turns of phrase.
At some level it's a great example of the fundamental scientific fact that large numbers of simple computational elements can do remarkable and unexpected things. But it also provides perhaps the best impetus we've had in two thousand years to understand better just what the fundamental character and principles might be of that central feature of the human condition that is human language and the processes of thinking behind it.