OpenAI sued for training ChatGPT with “stolen” personal data

Legal trouble is on the horizon for OpenAI.

Oliver Thansan
01 July 2023 Saturday 10:30

A California law firm has filed a class action lawsuit against OpenAI for allegedly "stealing" personal data to train ChatGPT.

The Clarkson Law Firm has filed a complaint in the U.S. District Court for the Northern District of California alleging that ChatGPT and Dall-E “use stolen private information, including personally identifiable information, from hundreds of millions of Internet users, including children of all ages, without their informed consent or knowledge” to train the company's large language models.

According to the text of the lawsuit, the company scraped 300 billion words from the Internet, including books, articles, websites and social media posts containing personal information, “in secret and without registering as a data broker, as required by applicable law”.

The complaint cites several specific examples, such as location data and data linked to personal images from Snapchat, financial information from Stripe, music tastes and preferences from Spotify, and private conversations from Slack and Microsoft Teams.

The plaintiffs intend to go to trial and are seeking damages that could exceed $3 billion.

OpenAI has already been the subject of controversy several times over how and what data it collects to train and further develop ChatGPT. Until recently, users had no explicit way to prevent OpenAI from using their conversations and personal information to feed the model.

In fact, ChatGPT was initially banned in Italy under the European General Data Protection Regulation for inadequately protecting user data, especially that of minors.

The lawsuit also takes aim at what it describes as OpenAI's opaque privacy policies for existing users, but it focuses largely on data scraped from websites that was never explicitly intended to be shared with ChatGPT.

The 15 counts in the suit include invasion of privacy, negligence in failing to protect personal data, and theft through the illegal acquisition of massive amounts of personal data to train the company's models.

The argument is that even though personal information may be publicly visible on social networks, blogs and articles, using that data outside the platform it was intended for can constitute a violation of privacy.

In Europe there is a legal distinction between data that is publicly available and data that is free to use, but in the United States the question is still the subject of much legal debate. This lawsuit could end up shedding light on the issue.