OpenAI’s GPT-3 algorithm now produces billions of words per day

When OpenAI launched its massive GPT-3 natural language algorithm last summer, jaws dropped. Coders and developers with special access to an early API quickly discovered new (and unexpected) things GPT-3 could do with nothing but a prompt. It wrote passable poetry, produced decent code, completed simple sums, and, with some light editing, even wrote news articles.

All of this, it seems, was just the beginning. In a recent blog update, OpenAI said that tens of thousands of developers are now building applications on the GPT-3 platform.

More than 300 applications (and counting) use GPT-3, and the algorithm is generating 4.5 billion words a day for them.

Obviously, that's a lot of words. But to get a sense of just how many, let's try some basic math.

The next torrent of algorithmic content

Each month, users publish about 70 million posts on WordPress, which is easily the web's dominant content management system.

Assuming an average post runs about 800 words – which is speculation on my part, but neither unusually long nor short – people are publishing roughly 56 billion words a month, or 1.8 billion words a day, on WordPress.

If our word-count assumption is in the right ballpark, then GPT-3 is producing more than twice the daily word count of WordPress posts. Even if you put the average closer to 2,000 words per post (which seems high to me), the two are roughly on par.
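The comparison above can be checked with a quick back-of-envelope calculation. The post counts and GPT-3's daily output are figures from the article; the average words-per-post values are, as noted, guesses:

```python
# Back-of-envelope comparison of GPT-3's daily output vs. WordPress.
# POSTS_PER_MONTH and GPT3_WORDS_PER_DAY come from the article;
# the words-per-post averages are assumptions, not measured values.
POSTS_PER_MONTH = 70_000_000
GPT3_WORDS_PER_DAY = 4_500_000_000
DAYS_PER_MONTH = 30

def wordpress_words_per_day(avg_words_per_post: int) -> float:
    """Estimated daily WordPress word output for a given average post length."""
    return POSTS_PER_MONTH * avg_words_per_post / DAYS_PER_MONTH

for avg in (800, 2000):
    wp = wordpress_words_per_day(avg)
    ratio = GPT3_WORDS_PER_DAY / wp
    print(f"{avg}-word posts: {wp / 1e9:.2f}B words/day "
          f"(GPT-3 produces {ratio:.1f}x that)")
```

At 800 words per post, GPT-3's output is about 2.4 times WordPress's; at 2,000 words per post, the two are nearly even, which matches the rough equivalence claimed above.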

Now, not every word GPT-3 produces is a word worth reading, and it isn't necessarily producing blog posts (more on the applications below). But either way, just nine months in, GPT-3's debut appears to herald an imminent torrent of algorithmic content.

GPT-3 is driving a variety of applications

So, how exactly are all these words being used? As the initial explosion of activity suggested, developers are building a wide variety of applications around GPT-3.

Viable, for example, surfaces themes in customer feedback – surveys, reviews, and help desk tickets, say – and provides short summaries for companies aiming to improve their services. Fable Studio is bringing virtual characters in interactive stories to life with GPT-3-generated dialogue. And Algolia uses GPT-3 to power an advanced search tool.

Instead of writing code, developers use "prompt programming," providing GPT-3 with a few examples of the kind of output they're hoping to generate. More advanced users can fine-tune things by giving the algorithm sample data sets or even human feedback.
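A minimal sketch of the idea: a few input-output examples are laid out as plain text, and the model is expected to continue the pattern. The translation task, the formatting, and the `build_few_shot_prompt` helper below are all illustrative assumptions; the actual API call to the model is omitted, since its parameters vary by provider and version:

```python
# Sketch of "prompt programming": rather than coding the task, you show
# the model a few (input, output) examples and let it infer the pattern.
# The English-to-French task and prompt layout here are hypothetical.

def build_few_shot_prompt(examples: list[tuple[str, str]], query: str) -> str:
    """Assemble a few-shot prompt from example pairs plus a new query."""
    blocks = [f"English: {src}\nFrench: {tgt}" for src, tgt in examples]
    # The final block leaves "French:" blank for the model to complete.
    blocks.append(f"English: {query}\nFrench:")
    return "\n\n".join(blocks)

prompt = build_few_shot_prompt(
    [("Hello", "Bonjour"), ("Thank you", "Merci")],
    "Good night",
)
print(prompt)
```

The resulting text, not code, is what gets sent to the model, which is why people with no programming background can get useful behavior out of it.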

In this sense, GPT-3 (and similar algorithms) may accelerate the adoption of machine learning in natural language processing (NLP). Whereas the learning curve for working with machine learning algorithms was once steep, OpenAI says many in the GPT-3 developer community have no background in AI or programming at all.

“It’s almost this new interface for working with computers,” Greg Brockman, chief technology officer and cofounder of OpenAI, told Nature in an article earlier this month.

A walled garden for AI

OpenAI licensed GPT-3 to Microsoft – which invested a billion dollars in OpenAI in exchange for such partnerships – but has not released the code publicly.

The company argues that monetizing its machine learning products helps fund its larger mission. In addition, it says that restricting access through an API lets it control how the technology is used.

One concern, for example, is that advanced natural language algorithms like GPT-3 could supercharge online misinformation. Another is that large-scale algorithms carry built-in bias, and it takes great care and attention to limit their effects.

At the height of the initial frenzy, OpenAI CEO Sam Altman even tweeted, “The GPT-3 hype is way too much. It’s impressive (thanks for the nice compliments!) but it still has serious weaknesses and sometimes makes very silly mistakes.”

Deep learning algorithms lack common sense and contextual awareness. So, of course, given the right prompt, GPT-3 readily parroted the online ugliness that was part of its training data.

To address these problems, OpenAI vets developers and applications before granting access to GPT-3. It has also created guidelines for developers, is working on tools to identify and mitigate bias, and requires that processes and people be in place to monitor apps for bad behavior.

Whether these safeguards will be enough as access to GPT-3 scales remains to be seen.

Researchers would love to give algorithms a degree of common sense, an understanding of cause and effect, and moral judgment. “What we have today is essentially a mouth without a brain,” Yejin Choi, a computer scientist at the University of Washington and the Allen Institute for AI, told Nature.

As long as those qualities remain out of reach, GPT-3’s researchers and handlers will have to work hard to ensure the benefits outweigh the risks.

Alt-AI: open source alternatives to GPT-3

Not everyone agrees with the walled garden approach.

Eleuther, a project aiming to become an open source competitor to GPT-3, released its latest GPT-Neo models last week. The project uses OpenAI’s GPT-3 papers as a starting point for its algorithms and is training them on distributed computing resources donated by the cloud computing company CoreWeave and by Google.

The team also assembled a meticulously curated training data set called the Pile. Eleuther co-founder Connor Leahy told Wired the project “went to great lengths over months to curate this data set, make sure it was both well filtered and diverse, and document its shortcomings and biases.”

GPT-Neo’s performance may not yet match GPT-3’s, but it is on par with GPT-3’s least advanced version, according to Wired. Meanwhile, other open source projects are also underway.

“There is great enthusiasm right now for open source NLP and for producing useful models outside of the big tech companies,” said Alexander Rush, a computer science professor at Cornell University. “There is something like an NLP space race going on.”

The risks of open source remain: once the code is out there, there is no taking it back and no controlling how it is used.

But Rush argues that developing algorithms in the open lets researchers outside the big companies study them, warts and all, and work on solving their problems.

The New Command Line

Open source or not, GPT-3 will not have the field to itself for long. Google Brain, for example, recently announced its own enormous natural language model, weighing in at 1.6 trillion parameters.

In a recent TechCrunch article, Oren Etzioni, CEO of the Allen Institute for AI, and venture capitalist Matt McIlwain wrote that they expect GPT-3, and the other large-scale natural language algorithms following it, to bring greater accessibility and lower costs.

And, notably, they see “prompt programming” as a significant shift.

Text, Etzioni and McIlwain wrote, could increasingly become the new command line: a kind of universal translator that lets the “code-less” take advantage of machine learning and bring new ideas to life. “We believe this will empower a whole new generation of creators, with trillions of parameters at their fingertips, in an entirely low-code/no-code way.”

The machines, it seems, are about to get an awful lot chattier. And we have plenty of work to do to make sure the conversation is meaningful.

Image credit: Emil Widlund / Unsplash
