A California-based law firm, Clarkson, is launching a class-action lawsuit against OpenAI, the artificial intelligence company that created the popular chatbot ChatGPT. The lawsuit alleges that OpenAI massively violated the copyrights and privacy of countless people when it used data scraped from the internet to train its tech.

Novel Legal Theory

The lawsuit seeks to test out a novel legal theory — that OpenAI violated the rights of millions of internet users when it used their social media comments, blog posts, Wikipedia articles and family recipes. Clarkson, the law firm behind the suit, has previously brought large-scale class-action lawsuits on issues ranging from data breaches to false advertising.

Representing Real People

The firm wants to represent “real people whose information was stolen and commercially misappropriated to create this very powerful technology,” said Ryan Clarkson, the firm’s managing partner. The case was filed in federal court in the Northern District of California on Wednesday morning.

Unresolved Question

The lawsuit goes to the heart of a major unresolved question hanging over the surge in “generative” AI tools such as chatbots and image generators. The technology works by ingesting billions of words from the open internet and learning to build inferences between them. After consuming enough data, the resulting “large language models” can predict what to say in response to a prompt, giving them the ability to write poetry, have complex conversations and pass professional exams. But the humans who wrote those billions of words never signed off on having a company such as OpenAI use them for its own profit.

Guardrails on AI Algorithms

“All of that information is being taken at scale when it was never intended to be utilized by a large language model,” Clarkson said. He said he hopes to get a court to institute some guardrails on how AI algorithms are trained and how people are compensated when their data is used. The firm already has a group of plaintiffs and is actively looking for more.

Legality of Using Data from Public Internet

The legality of using data pulled from the public internet to train tools that could prove highly lucrative to their developers is still unclear. Some AI developers have argued that the use of data from the internet should be considered “fair use,” a concept in copyright law that creates an exception if the material is changed in a “transformative” way.

Growing List of Legal Challenges

The suit also adds to the growing list of legal challenges to the companies building and hoping to profit from AI tech. A class-action lawsuit was filed in November against OpenAI and Microsoft for how the companies used computer code in the Microsoft-owned online coding platform GitHub to train AI tools. In February, Getty Images sued Stability AI, a smaller AI start-up, alleging it illegally used its photos to train its image-generating bot. And this month OpenAI was sued for defamation by a radio host in Georgia who said ChatGPT produced text that wrongfully accused him of fraud.

OpenAI and the Use of Data from the Open Internet

OpenAI isn’t the only company using troves of data scraped from the open internet to train its AI models. Google, Facebook, Microsoft and a growing number of other companies are all doing the same thing. But Clarkson decided to go after OpenAI because of its role in spurring its bigger rivals to push out their own AI after it captured the public’s imagination with ChatGPT last year.

Igniting the AI Arms Race

“They’re the company that ignited this AI arms race,” said Ryan Clarkson, the firm’s managing partner. “They’re the natural first target.”

Data Used by OpenAI

OpenAI doesn’t share what kind of data went into its latest model, GPT-4, but previous versions of the tech have been shown to have digested Wikipedia pages, news articles and social media comments. Chatbots from Google and other companies have used similar data sets.

Regulators and Transparency

Regulators are discussing enacting new laws that require more transparency from companies about what data went into their AI. It’s also possible that a court case could prompt a judge to force a company such as OpenAI to turn over information on what data it used, said Katherine Gardner, an intellectual-property lawyer.

Companies Trying to Stop AI Firms from Scraping Data

Some companies have tried to stop AI firms from scraping their data. In April, music distributor Universal Music Group asked Apple and Spotify to block scrapers, according to the Financial Times. Social media site Reddit is shutting off access to its data stream, citing how Big Tech companies have for years scraped the comments and conversations on its site. Twitter owner Elon Musk threatened to sue Microsoft, accusing it of using data it had obtained from Twitter to train its AI. Musk is building his own AI company.

Allegations Against OpenAI

The new class-action lawsuit against OpenAI goes further in its allegations, arguing that the company does not make it clear enough to people who sign up for its tools that the data they enter may be used to train new products that the company will make money from, such as its Plugins tool. It also alleges OpenAI doesn’t do enough to make sure children under 13 aren’t using its tools, something that other tech companies including Facebook and YouTube have been accused of over the years.

By Shamiso Miracle

Shamiso Miracle completed her degree in journalism and media studies at the University of Zimbabwe before honing her skills at Savanna News. She then went on to work at iHarare News, becoming a voice for everyday SA citizens who wanted to share their stories. When she's not writing news that entertains and inspires, Shamiso is an avid reader and a wellness bunny.
