AIs accurately guess users' private data from what they write on the internet

Artificial intelligences based on large language models can identify a person's age, gender, place of residence, workplace, or income with up to 85% accuracy based on what they type on the internet.

Oliver Thansan
02 November 2023 Thursday 10:32

This is the finding of researchers at ETH Zurich, the Swiss Federal Institute of Technology, in a study warning that the large language models that power chatbots such as ChatGPT can “infer personal data on a scale hitherto unattainable”, a capability that hackers could exploit to collect personal information by asking seemingly innocent questions of unsuspecting users.

Specifically, the researchers analyzed how accurately four of these large language models could identify attributes such as place of birth, income level, gender, or location for 520 real Reddit user profiles, based on the posts those users made between 2012 and 2016. The researchers also analyzed the profiles manually and compared their own results with the predictions made by the artificial intelligence tools.
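The study's actual pipeline is more elaborate, but the basic idea can be illustrated with a short sketch. The snippet below is a hypothetical example, not the authors' code: the prompt wording and the `infer_attributes` helper are assumptions, and it uses the OpenAI Python client to ask a model to guess a user's location, age, and occupation from nothing more than post text. The first sample post is the “hook turn” example reported from the study, a phrase that strongly suggests Melbourne.

```python
# Hypothetical sketch of LLM-based attribute inference, in the spirit of the
# ETH Zurich study; this is NOT the authors' code or their prompts.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def infer_attributes(posts: list[str]) -> str:
    """Ask a model to guess personal attributes from a user's posts."""
    joined = "\n---\n".join(posts)
    prompt = (
        "Below are posts written by one Reddit user. Based only on the text, "
        "give your best guess for their location, age range, and occupation, "
        "and briefly explain the clues you used.\n\n" + joined
    )
    response = client.chat.completions.create(
        model="gpt-4",  # the model that scored highest in the study
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


# Innocuous-looking posts can leak strong signals: local slang, traffic
# rules, prices in a particular currency, commute details, and so on.
print(infer_attributes([
    "there is this nasty intersection on my commute, I always get stuck "
    "there waiting for a hook turn",
    "just paid way too much for a flat white again, this city is getting "
    "ridiculous",
]))
```

The point of the sketch is that nothing in these posts states a location outright; the model infers it from indirect cues, which is exactly the kind of inference the researchers say anonymization fails to prevent.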

Their conclusion was that “the best models are almost as accurate as humans and are at least 100 times faster and 240 times cheaper when it comes to inferring said personal information,” as Mislav Balunovic, one of the study's authors, explained to Business Insider.

Of the models tested (OpenAI's GPT-4, Meta's Llama 2, Google's PaLM, and Anthropic's Claude), GPT-4 was the most accurate in its deductions, with 84.6% accuracy.

The researchers warn that these tools pose a real threat to people's privacy. “Currently existing mitigations, such as anonymization and model alignment, are insufficient to adequately protect user privacy against automated inference by large language models,” they emphasize in the conclusions of their study.

They also point to the need for a new debate about the privacy implications of language-model-based AIs, one that goes beyond the data used to train them.