Many AI researchers believe that fakes will be undetectable

Oliver Thansan
23 January 2024 Tuesday 09:26

Rishi Sunak is the British Prime Minister. If some Facebook ads are to be believed (they are not), he also promotes get-rich-quick schemes. One of the ads shows him recommending an app, supposedly developed by Elon Musk, with which users can "save" money on a regular basis.

The video is fake. Generated with the help of AI, it is just one of 143 advertisements of this type catalogued by the British firm Fenimore Harper Communications and broadcast in December and January. Nor are public figures the only ones whose images can be put to dubious use. In June 2023 the US FBI warned of "malicious actors" using AI to create fake sexual images and videos of ordinary people in order to extort them.

How to detect such deception is a hot topic among AI researchers, many of whom attended NeurIPS, one of the largest conferences in the field, held in New Orleans in December. Many companies, from startups to established tech giants like Intel and Microsoft, offer software that claims to detect artificially generated content. At the same time, makers of large AI models are looking for ways to “watermark” their products so that real images, videos or text can be easily distinguished from those generated by machines.

However, such technologies have not so far proved reliable, and experts seem pessimistic about their prospects. The Economist conducted a (highly unscientific) poll of NeurIPS attendees. Of 23 people asked, 17 thought that AI-generated content would eventually become undetectable. Only one believed that reliable detection would be possible. (The other five were undecided and preferred to wait and see.)

Detection programs are based on the idea that AI models leave a trail. Either they fail to reproduce some aspect of real images, videos or human-written text, or they add something superfluous, and they do so often enough for other software to catch the error. For a time, humans could do that job. Until mid-2023, for example, image-generating algorithms often drew people with malformed hands or got the numbers wrong on things like clock faces. Nowadays the best models no longer make such mistakes.

Tell-tale details still crop up, however, even if they are increasingly hard for humans to spot. Just as programs can be trained to reliably identify cats, or cancerous tumours in medical scans, they can also be trained to distinguish real images from AI-generated ones.
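In rough terms, such detectors are trained like any other classifier. The Python sketch below illustrates the idea with made-up data; load_image_features() is a hypothetical stand-in for a real feature-extraction pipeline, not any particular product's method.

```python
# Toy sketch of the detection idea: train a binary classifier on
# labelled examples of real and AI-generated images. Everything here
# is illustrative; load_image_features() merely fabricates feature
# vectors in which the "fake" class carries a slight statistical trail.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

def load_image_features():
    rng = np.random.default_rng(0)
    real = rng.normal(0.0, 1.0, size=(500, 64))   # stand-in for real photos
    fake = rng.normal(0.3, 1.0, size=(500, 64))   # stand-in for AI images
    X = np.vstack([real, fake])
    y = np.array([0] * 500 + [1] * 500)
    return X, y

X, y = load_image_features()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```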

Yet they do not seem to do it especially well. Detection software is prone both to false positives (wrongly flagging human content as AI-generated) and to false negatives (missing machine-generated material). According to a preprint (not yet peer-reviewed) posted online in September by Zeyu Lu, a computer scientist at Shanghai Jiao Tong University, the best-performing program fails to correctly identify computer-generated images 13% of the time (though that is better than humans, who fail 39% of the time). Things are a little better for text. An analysis published in December in the International Journal of Educational Integrity compared 14 tools and found that none achieved more than 80% accuracy.

If trying to detect computer-generated content after the fact is too difficult, another option is to label it beforehand with a digital watermark. As with paper, the idea is to add a distinctive feature that is subtle enough not to compromise the quality of the text or image, but is obvious to anyone looking for it.

One technique for marking text was proposed by a team at the University of Maryland in July 2023 and refined by a team at the University of California, Santa Barbara, who presented their improvements at NeurIPS. The idea is to manipulate a language model's word preferences. First, the model randomly assigns a set of words it knows to a "green" group and places all the others in a "red" group. Then, when generating a given block of text, the algorithm loads the dice, raising the probability that it picks a green word instead of one of its red synonyms. Checking for the watermark means comparing the proportion of green words to red ones; because the technique is statistical, it is more reliable for longer stretches of writing.
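A heavily simplified sketch of that idea might look like the snippet below. It is not the published algorithm (which, among other things, reseeds the green/red split as it generates and applies a proper statistical test); the tiny vocabulary, the fixed seed and the generate() stand-in for a language model are all illustrative assumptions.

```python
# Simplified "green list" watermark: a fixed seed splits a toy vocabulary
# into green and red words, generation is biased toward green words, and
# the detector measures the fraction of green words in a text.
import random

VOCAB = ["quick", "fast", "rapid", "swift", "slow", "steady", "brisk", "sluggish"]

def split_vocab(seed=42):
    rng = random.Random(seed)
    shuffled = VOCAB[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return set(shuffled[:half]), set(shuffled[half:])   # green, red

GREEN, RED = split_vocab()

def generate(n_words, green_bias=0.9, seed=0):
    # Stand-in for a language model: pick green words with high probability.
    rng = random.Random(seed)
    words = []
    for _ in range(n_words):
        pool = GREEN if rng.random() < green_bias else RED
        words.append(rng.choice(sorted(pool)))
    return words

def green_fraction(words):
    return sum(w in GREEN for w in words) / len(words)

text = generate(200)
print(f"green fraction: {green_fraction(text):.2f}")   # ~0.9, versus ~0.5 for unmarked text
```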

Meanwhile, many methods for marking images involve subtly tweaking their pixels, such as shifting colours. The alterations are too small for human observers to notice, but computers can pick them up. Cropping an image, rotating it, or even blurring it and then re-sharpening it, however, can remove such marks.
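As a toy illustration of that fragility, the snippet below hides a pseudorandom pattern in the least-significant bits of an image's pixels, one generic way of making imperceptible changes, and shows that simply cropping the image breaks detection. It is not modelled on any particular vendor's scheme.

```python
# Hide a fixed pseudorandom bit pattern in the last bit of every pixel,
# then check how well it survives a crop.
import numpy as np

rng = np.random.default_rng(1)
image = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
pattern = rng.integers(0, 2, size=(64, 64), dtype=np.uint8)

marked = (image & 0xFE) | pattern            # plant the pattern in the last bit

def detect(img):
    return np.mean((img & 1) == pattern)      # fraction of bits that match

print("marked image:", detect(marked))                       # 1.0
cropped = np.pad(marked[4:, 4:], ((0, 4), (0, 4)))            # crop, then re-pad
print("after cropping:", detect(cropped))                     # ~0.5, i.e. chance
```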

Another group of researchers at NeurIPS presented a marking system called "tree ring", designed to be more robust. Diffusion models, the most advanced type of image-generation software, begin by filling a digital canvas with random noise, out of which the desired image gradually emerges. The tree-ring method embeds the watermark not in the finished image but in that initial noise. If the software that created the image is run in reverse, it reproduces the watermark along with the noise. Crucially, the technique is much harder to circumvent by retouching the final image.
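The sketch below is only a loose illustration of the principle, not the published tree-ring algorithm: it plants a secret pattern in the Fourier transform of the initial noise and later checks for it, while the hard step, recovering that noise by running the diffusion model in reverse, is reduced to a placeholder.

```python
# Plant a secret key in the frequency domain of the starting noise,
# then verify it from a (here, trivially) recovered copy of that noise.
import numpy as np

rng = np.random.default_rng(2)
key = rng.normal(size=16)                       # secret watermark pattern

noise = rng.normal(size=(64, 64))
spectrum = np.fft.fft2(noise)
spectrum[0, 1:17] = key                         # plant the key in a fixed band
spectrum[0, -1:-17:-1] = key                    # mirror it so the noise stays real
marked_noise = np.real(np.fft.ifft2(spectrum))  # this noise would seed the diffusion model

# ... the diffusion model would turn marked_noise into an image; a detector
# would run that model in reverse to get an estimate of the noise back ...
recovered_noise = marked_noise                  # stand-in for the inverted model

band = np.real(np.fft.fft2(recovered_noise)[0, 1:17])
corr = np.corrcoef(band, key)[0, 1]
print(f"correlation with key: {corr:.2f}")      # close to 1 if the mark survived
```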

Although it is probably not impossible. Watermarkers are in an arms race with researchers seeking to defeat their techniques. A team led by Hanlin Zhang, Benjamin Edelman and Boaz Barak, all of Harvard University, presented a method (not yet peer-reviewed) that they say can erase watermarks. It works by adding a small amount of new noise and then using a second AI model to remove it, taking the original watermark with it. The system, the researchers say, can fool three new text-marking schemes proposed in 2023. In September, scientists at the University of Maryland published a paper (also not yet peer-reviewed) claiming that none of the current image-watermarking methods, not even tree ring, is foolproof.

Nevertheless, in July 2023 the US government secured "voluntary commitments" from several AI companies, including OpenAI and Google, to boost investment in watermarking research. Imperfect protection is certainly better than none (although open-source models, which users can modify freely, will be harder to police). Even so, in the battle between counterfeiters and detectives, the counterfeiters seem to have the upper hand.

© 2024 The Economist Newspaper Limited. All rights reserved

Translation: Juan Gabriel López Guix