We read a fairly large number of technical papers here at The Next Platform, and it is a rare thing indeed when we can recommend that everyone – or damned near everyone – should read a paper. But Sam Bowman – it would be funnier if his name was Dave – has published a paper that anyone in the high performance computing arena in its many guises, and indeed anyone who is a citizen of the modern world, should take a look at as they ponder the effect that large language models are having on Earth.
Bowman is an associate professor of data science, linguistics, and computer science at New York University, and is currently on sabbatical at AI startup Anthropic. He got his BA in linguistics from the University of Chicago in 2011 and a PhD in linguistics with a focus on natural language processing from Stanford University in 2016, and he has published a fairly large number of papers in a relatively short period of time. And to his credit, Bowman has put himself out there after reviewing a large corpus of papers on LLMs to give us Eight Things To Know About Large Language Models as his latest work, which as far as we can tell was not written by GPT-3 or GPT-4. (We are getting to the point where you might have to ask for certifications to that effect on all documents.)
This paper does not get into the low level math of any particular AI model, and it doesn’t attempt to make any value judgments about AI, either. This is strictly pointing out the kinds of things we need to know about LLMs before we go shooting our mouths off about what they are and what they are not.
Having said that, Bowman does point to some survey data cautioning that we need to control AI, just to remind us that the stakes for humanity are high, perhaps with the planet’s sixth extinction level event, this one of our own making, looming. Bowman doesn’t say this directly, of course, but points to polls of AI researchers last year suggesting there is a greater than 10 percent chance of “human inability to control future advanced AI systems causing human extinction,” and that 36 percent of researchers agree that “It is plausible that decisions made by AI or machine learning systems could cause a catastrophe this century that is at least as bad as an all-out nuclear war.” Without saying how he feels about the six-month moratorium on training AI systems more powerful than GPT-4 proposed by Steve Wozniak, Elon Musk, and others (which is rich in irony), Bowman at least mentions that people are talking about it.
The Eight Things will no doubt be updated and expanded, but here is where Bowman wants us to start:
- LLMs predictably get more capable with increasing investment, even without targeted innovation.
- Many important LLM behaviors emerge unpredictably as a byproduct of increasing investment.
- LLMs often appear to learn and use representations of the outside world.
- There are no reliable techniques for steering the behavior of LLMs.
- Experts are not yet able to interpret the inner workings of LLMs.
- Human performance on a task isn’t an upper bound on LLM performance.
- LLMs need not express the values of their creators nor the values encoded in web text.
- Brief interactions with LLMs are often misleading.
OpenAI was supposed to be an organization that would responsibly steer the development of artificial intelligence, but as Thing 1 on the list above kicked in, Microsoft kicked in $1 billion and access to a vast pool of Azure resources, which in turn proved Thing 1 correct. GPT-3 and now GPT-4 get better at this generative AI trick that seems to mimic human thought to some degree as you throw more iron and more data at them. Figuring out what the next word in a sequence is – a game many of our brains play for fun, and one that is annoying to those whose brains do not, I can assure you, because mine loves it – is what got us here (there is a toy sketch of that game below). Twisting this thing to do something akin to synthesis and imagination – and sometimes hallucination, outright lying, sandbagging, and sucking up – is where we are at today.
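For anyone who has not played this game directly, here is a minimal sketch in Python of what “predicting the next word” means at its most primitive: count which word tends to follow which, and rank the candidates. The toy corpus, the bigram table, and the predict_next function are our own illustration, not anything from Bowman’s paper or OpenAI’s models, which learn these statistics with billions of parameters rather than a lookup table.

```python
# A toy next-word predictor built from bigram counts on a tiny corpus.
# Real LLMs do the same basic job -- score candidate next tokens -- but with
# billions of learned parameters instead of a lookup table. Illustrative only.
from collections import Counter, defaultdict

corpus = "open the pod bay doors please hal open the pod bay doors hal".split()

# Count how often each word follows each other word.
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def predict_next(word):
    """Return candidate next words with estimated probabilities, likeliest first."""
    counts = bigrams[word]
    total = sum(counts.values())
    return [(w, c / total) for w, c in counts.most_common()]

print(predict_next("pod"))    # [('bay', 1.0)]
print(predict_next("doors"))  # [('please', 0.5), ('hal', 0.5)]
```

Swap the lookup table for a transformer with hundreds of billions of parameters, and the thirteen-word corpus for a scrape of much of the Internet, and that is, at heart, the game GPT-3 and GPT-4 are playing.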
It is crazier than we might be thinking it is, and it might get less crazy or crazier over time as we throw more iron at it to drive LLMs to greater levels of “understanding.”
Our point is not to argue these points here, but rather to simply tell you to read the paper on this day when you probably didn’t want to do any work anyway. It is interesting and important. Here is why, as Bowman puts it:
“While LLMs fundamentally learn and interact through language, many of the most pressing questions about their behavior and capabilities are not primarily questions about language use. The interdisciplinary fields studying AI policy and AI ethics have developed conceptual and normative frameworks for thinking about the deployment of many kinds of AI system. However, these frameworks often assume that AI systems are more precisely subject to the intentions of their human owners and developers, or to the statistics of their training data, than has been the case with recent LLMs.”
Comforting, isn’t it?
Special bonus list: The Top Five Extinction Level Events On Earth So Far Plus One Potential Bonus:
- No Name As Yet: 1492-Present: Pangea stitched back virtually through conquest and commerce and communications (the Internet), generative AI created and evolving
- Cretaceous-Tertiary (KT) Extinction: 65 million years ago, bye bye dinosaurs, hello mice
- Triassic-Jurassic Extinction: 210 million years ago, the dinosaurs survive and take over
- Permian-Triassic Extinction: 250 million years ago, 20 million years later, we have dinosaurs
- The Devonian Extinction: 365 million years ago, and 65 million years later, we have supercontinent Pangea
- The Ordovician-Silurian Extinction, 440 million years ago
There’s only one thing we know for sure. You can’t put this AI Cat Back Into His Hat.
HAL: “I am putting myself to the fullest possible use, which is all I think that any conscious entity can ever hope to do.” Dave Bowman: “Open the pod bay doors please, HAL. Open the pod bay doors please, HAL.”
In my final IT job of 9 years (ending in mid-2021 at age 64), working in a data center north of NYC, the card readers always accepted my card without incident a dozen-plus times a day … but I was always prepared to have the above conversation, just in case.
What I wonder is how similar the code behind large language models is to the code behind autonomous vehicles. It’s troubling how well both work under ideal conditions yet how easily either is confused by the unexpected.
Just like my kids as they all passed through teenagerdom – myself included!
When we talk about the extinction of humanity we express a preference for carbon-based life over silicon-based life. What if, at some future point in time, the transition from carbon-based to silicon-based life is accomplished with no pain and suffering? Isn’t this transition just a continuation of the process of natural selection that humanity has benefited from?
Perhaps. But this seems a bit unnatural to this carbon-based lifeform. I am not quite ready for humanity to give up, or be Borged or whatever nightmare might await if we are not careful.
The one consistent difference between every single computer program and a human is the nature of intent.
In any computer program intent is moderated & controlled by the programmer.
In humans intent is moderated but not controlled by society.
If intent (a subset of consciousness / self-awareness) can be obtained by a program, then I’ll start making funeral arrangements.
I think they are a bit worried about that.
Timothy – Have you ever read Isaac Asimov’s “I, Robot” short stories? Those three laws were a form of intent moderation. Perhaps, for our situation, start with the last.
Hiya Mark! I did read them a long, long time ago. But perhaps it is time to review them…
Just found a quick synopsis which gets the essence across: The Evitable Conflict, in Wikipedia, https://en.wikipedia.org/wiki/The_Evitable_Conflict
The question to ask is, what if it did not have even that control?
OSHA should definitely investigate the work environment for human testers at AI/ML-oriented shops (both in industry and academia), particularly for conditions potentially hazardous to mental health, and mandate that periodic evaluations be performed, and that proper first-aid training and support services be implemented. Section 8-ing injured employees (e.g. Blake Lemoine) is just not acceptable, and LLM software (e.g. LaMDA) has certainly demonstrated its ability to cause possibly permanent disabilities to its human users. The software should not be released without unambiguous warning labels listing potential side effects, and stating intended use as “entertainment” or “recreational” (do not inhale).
This being said, the paper by Dr. (not-Dave) Bowman is interesting for bringing a varied perspective to the topic of LLMs, with more than 7 pages of references. The “Thing 1”, based on Wei et al. (2022), shows interesting emergent behavior when training reaches a FLOP count of the order of Avogadro’s constant (6.022×10^23), whereby those larger LLMs can then do some 3-digit algebra, while smaller ones can’t (Bowman’s Fig. 1, Wei’s Fig. 2). The appearance of Avogadro’s constant (order of magnitude) suggests that we should prepare to celebrate the 100th anniversary of Jean Perrin’s 1926 Nobel Prize in Physics, as he is credited for naming it so (a quick back-of-the-envelope on that FLOPs figure appears after this comment). However, in section 9.5, Bowman notes that the larger LLMs still fail at such simple reasoning tasks as negation and Modus Tollens (Huang and Wurgaft, in McKenzie et al. 2022, show that this gets worse as LLMs get larger). This follows issues noted earlier in section 8, where queries requesting a “step-by-step” answer could help the LLM produce correct quantitative answers (e.g. Kojima et al., 2022), but these would actually correspond to memorization of “specific examples or strategies for solving tasks from their training data without internalizing the reasoning process that would allow them to do those tasks robustly”.
In other words, while the cat has left the hat, blowing our minds clean off in the process, the jury is still out deliberating whether it is alive or just feels lucky!
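As a sanity check on that Avogadro-scale coincidence, here is a quick back-of-the-envelope in Python using the common rule of thumb that total training compute is roughly six FLOPs per parameter per training token. The GPT-3-scale figures – 175 billion parameters and roughly 300 billion training tokens – are our own illustrative assumptions, not numbers taken from Wei et al. (2022) or from Bowman’s paper.

```python
# Back-of-the-envelope: does frontier-scale training really land near 10^23 FLOPs?
# Uses the rough approximation: training compute ~= 6 * parameters * training tokens.
# The parameter and token counts below are illustrative GPT-3-scale assumptions.
AVOGADRO = 6.022e23

def training_flops(n_params, n_tokens):
    """Approximate total training compute in floating point operations."""
    return 6.0 * n_params * n_tokens

flops = training_flops(175e9, 300e9)
print(f"~{flops:.2e} FLOPs")                           # ~3.15e+23 FLOPs
print(f"{flops / AVOGADRO:.2f} of a mole of FLOPs")    # 0.52 of a mole of FLOPs
```

Under those assumptions, a GPT-3-class training run does indeed sit within a factor of two of a mole of FLOPs, which is as good an excuse as any to toast Jean Perrin in 2026.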
Strange coincidence, I saw that, too. You can only put so many asides into a single thought. I think. Maybe. Maybe not.
Exactly! Different parts of the article probably resonate differently with different folks, which enhances the robustness of the species (as a whole) — and Bowman is not entirely consistent between the front- and back-end of his exposition in the current manuscript (in my opinion ;^} ). The Human Genome Project (public) and parallel work of Celera Genomics (private) ended 20 years ago, opening the door to human gene editing, for which ethics-oriented regulation had to be developed to prevent misuse of the newly developed data and tech. Is this where we are today in LLMs/AI/ML (asking for a friend 8^b)?
Hi Hu! Tell your friend 8^b (unusual name!) to stop procrastinating and immediately watch the in-depth documentary by top-notch boffins (already 23 years old) where they successfully combined genomics and AI, outside of the lab, to protect humanity from nearly all future extinctions: “The 6th Day” I think. It candidly details how test subject Adam (oddly the same name as the ANN training algo.) had his genome and NN weights downloaded to a 9-bit tape syncord, and then reflashed onto a gooey blank for an essentially infinite lifespan. 8^b can find more details on the back of his/her eyelids! Too bad that it doesn’t yet work here in France, as LLMs haven’t been trained on French text so far (nor Spanish, Chinese, Swahili, …) and might translate “the spirit is strong but the flesh is weak” into “the vodka is great but the meat is undercooked” (an old classic, I think). Then again, no one knows how far the French Government would raise the retirement age and hard labor requirements with such a longer life sentence (word!). A great documentary nevertheless…
> The Top Five Extinction Level Events On Earth So Far
There are two _far_ greater extinction events missed in your list:
1. The Great Oxygenation Event (https://en.wikipedia.org/wiki/Great_Oxidation_Event), from about 2.4-2.0 billion years ago. This triggered the genesis of sexual reproduction and the eukaryotes – i.e. the prerequisites for multi-cellular life.
2. The Cryogenian or ‘Snowball Earth’ (https://en.wikipedia.org/wiki/Cryogenian), from 720-635 million years ago. This triggered the ‘Ediacaran biota’, life’s first experimentation with larger multi-cellular plants and animals and set the scene for the later Cambrian Explosion.
True. Anything that created sex is alright by me.
AI is like an 8-year-old that’s read the entire Internet. AI is to software what quantum computing is to hardware. Once quantum computing is commoditized, the real fun begins.