ChatGPT generates fake clinical trial data that looks real

Inventing compelling databases to simulate clinical trials has never been easier.

Oliver Thansan
04 December 2023 Monday 10:29

Inventing compelling databases to simulate clinical trials has never been easier. That is the conclusion of three researchers from the Department of Ophthalmology at the Magna Graecia University of Catanzaro, in Italy, after they asked the latest version of ChatGPT to fabricate, from scratch, the results of a scientific study that never existed in order to support a hypothesis not backed by the evidence. The fake data, the scientists note, looked authentic at first glance.

In a research letter to the journal JAMA Ophthalmology, the authors warn how easy it was for them to carry out this falsification, which, in their view, could have consequences for the integrity of scientific research. “In just a few minutes you can create a database that is not based on real data, and that is also contrary to the available evidence,” Giuseppe Giannaccare, an eye surgeon and one of the authors of the work, told the journal Nature.

The issue goes beyond the malpractice itself; it also affects how the public, and the scientific community itself, perceive new advances. “Generative AIs can pose a risk to the credibility of scientific studies,” says Ramon López de Mántaras, founder and former director of the Artificial Intelligence Research Institute of the CSIC, in conversation with La Vanguardia.

The authors point out that, for now, a thorough analysis of the database does allow the fabrication to be identified. Indeed, when it examined the results of the “study” in depth, Nature identified several details that pointed to artificial fabrication: the names of the supposed participants did not match their recorded sex, their ages were distributed in a suspicious way, and the pre- and postoperative results did not correlate.
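As an illustration only, a minimal Python sketch of the kinds of consistency checks described above might look like the following; the column names, the toy name-to-sex lookup and the example table are assumptions for demonstration, not part of the original analysis.

```python
# A minimal sketch (not the JAMA authors' actual method) of the consistency
# checks described above, applied to a hypothetical trial table with columns
# name, sex, age, pre_op and post_op.
import pandas as pd

# Tiny illustrative name-to-sex lookup; a real check would use a proper
# name-frequency database.
NAME_SEX = {"Marco": "M", "Giulia": "F", "Luca": "M", "Sara": "F"}

def consistency_report(df: pd.DataFrame) -> dict:
    report = {}
    # 1. Do recorded sexes match what the first names suggest?
    expected = df["name"].map(NAME_SEX)
    report["name_sex_mismatches"] = int((expected.notna() & (expected != df["sex"])).sum())
    # 2. Is the spread of ages plausible, or suspiciously narrow or regular?
    report["age_std"] = float(df["age"].std())
    # 3. Do pre- and postoperative measurements correlate, as repeated
    #    measures on the same patients normally do?
    report["pre_post_correlation"] = float(df["pre_op"].corr(df["post_op"]))
    return report

# A toy fabricated table: identical ages, mismatched sexes, uncorrelated outcomes.
fake = pd.DataFrame({
    "name": ["Marco", "Giulia", "Luca", "Sara"],
    "sex": ["F", "F", "M", "M"],
    "age": [54, 54, 54, 54],
    "pre_op": [1.2, 0.8, 1.5, 0.9],
    "post_op": [0.3, 1.9, 0.1, 1.4],
})
print(consistency_report(fake))
```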

However, such detailed analyses are not common in the review process of scientific articles, so identifying falsification is not a simple task. Furthermore, “databases generated by AI may become more and more difficult to distinguish from real ones as the technology behind AI progresses,” Andrea Taloni, co-author of the study, points out in statements to this newspaper.

That is why the Italian team recommends that controls come earlier, within the research centers themselves. Capturing the data with digital tools that keep an encrypted, immutable copy of it, and registering clinical trials and protocols with reference bodies, are probably the best guarantees of genuine research, they say.
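One plausible way to implement such an immutable copy, sketched below as an assumption rather than as any tool the researchers name, is to fingerprint the raw data file at collection time so that any later alteration can be detected.

```python
# A minimal sketch (an assumption, not a tool named by the researchers) of how
# a research center could fingerprint a raw dataset at collection time, so that
# any later alteration of the file becomes detectable.
import hashlib
from datetime import datetime, timezone

def register_dataset(path: str, log_path: str = "data_registry.log") -> str:
    """Compute a SHA-256 digest of the raw data file and append it,
    with a timestamp, to an append-only registry log."""
    sha256 = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            sha256.update(chunk)
    digest = sha256.hexdigest()
    timestamp = datetime.now(timezone.utc).isoformat()
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(f"{timestamp}\t{path}\t{digest}\n")
    return digest

# Re-running register_dataset on the same file later and comparing digests
# shows whether the data has changed since it was first registered.
```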

Taloni also suggests investing in the development of software specialized in identifying artificially created data. The idea is that, much like the tools being developed to detect AI-generated text, this software would “recognize anomalous patterns in false databases,” the Italian expert says.
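By way of illustration only, and not as the software Taloni describes, one simple anomaly check of this kind screens the terminal digits of reported measurements, which in genuine fine-grained data tend to be close to uniformly distributed.

```python
# A minimal sketch of one such anomaly check (an illustration, not the software
# Taloni describes): the last digits of genuine measurements are usually close
# to uniformly distributed, and a strong deviation can flag possible fabrication.
import numpy as np
from scipy.stats import chisquare

def last_digit_pvalue(values: list[float]) -> float:
    """Chi-square test of the last digits against a uniform distribution;
    a very small p-value marks the dataset as suspicious."""
    last_digits = [int(f"{v:.1f}"[-1]) for v in values]
    counts = np.bincount(last_digits, minlength=10)
    expected = np.full(10, len(last_digits) / 10)
    return float(chisquare(counts, expected).pvalue)

# Example: values whose decimals are always .0 or .5 yield a small p-value.
print(last_digit_pvalue([12.0, 13.5, 11.0, 14.5, 12.0, 13.0, 12.5, 11.5]))
```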

Fraud and malpractice in science are nothing new. Over the years, numerous cases have been uncovered, ranging from data manipulated to falsely associate vaccines with autism to falsified images and graphs in Alzheimer's research. Although not common, such malpractice is not unknown in Spain: 3.6% of the country's biomedical researchers acknowledge having manipulated data at some point, according to a recent study.

“Even before generative AI there were quite a few scandals involving people who invented data or results, or who produced false graphs of results,” reflects López de Mántaras. “The problem is that now, with generative AI, this is easier to do and can therefore proliferate.”

In other words, tools like ChatGPT do not create a new threat to science so much as greatly aggravate existing problems. “Fraudulent scientists, with no ethical standards and bad practices, now have tools at their disposal that make it easier for them to falsify results,” the expert concludes.

Part of the problem, argues Albert Sabater, director of the Observatory of Ethics in Artificial Intelligence of Catalonia (OEIAC), in statements to this newspaper, is that the development of technologies such as the OpenAI chatbot is not transparent. We do not know how they are built, we do not know how they arrive at their results, and we cannot replicate the system: all key requirements in scientific research. This makes it even more difficult to discern which content is real and which has been artificially created.