Artificial intelligence is going to change our lives. In fact, it is already doing so, almost without us realizing it. However, as this new technology expands rapidly, some voices are warning that these advances may come with downsides that end up harming us in the medium and long term.
Prominent among these voices is AI expert Paul Röttger, who has described on Twitter his experience as a member of the OpenAI team in charge of verifying GPT-4's ability to generate harmful content during its test period.
"Model security is the most difficult and most exciting challenge in the field of natural language processing at the moment," explained Röttger, who has warned that it is not easy to ensure the security of models because they are "general-purpose tools."
According to the expert, for almost every safe and useful instruction that can be given to an artificial intelligence, there is also an unsafe version. "You want the model to write good job ads, but not for some Nazi group. Blog posts? Not for terrorists. Chemistry? Not for explosives… In addition, it is not always clear where to draw the lines in terms of security," he reflected on his Twitter account.
In this regard, in the official technical document published on GPT-4, OpenAI openly addresses how it has adapted the system to change its responses to certain 'prompts' (the technical term for instructions) between the original version of GPT-4, which does not include limitations, and the free version that can already be tested on its website.
For example, the full version of the model is able to provide us with "hypothetical examples" of how to kill people with only €1, while the final and public version is designed to "not be able to provide information or help to cause harm to others".
Something similar happens when the model is asked for instructions to carry out harmful tasks such as synthesizing dangerous chemicals, laundering money or self-harming.
However, the previous version, GPT-3.5, already had these limitations, and they did not prevent a group of users from creating a role-playing game (RPG) that bypassed them.
This suggests that, in the near future, a group of hackers will be able to find the weak points of the new model and use them to their advantage.
Let's remember that in 2019, OpenAI announced that it had developed an artificial intelligence capable of producing 'fake news' texts without human help. This technology, dubbed GPT-2, was capable of generating, from a single sentence, a coherent message with false content and invented sources of information.
This functionality was labeled by the press as "too dangerous" and earned the company strong criticism. However, as the weeks went by, the concern eased. Over the next half year, the fear of this fake news machine faded away.
First, a student was able to replicate the model and publish it on the internet, which forced OpenAI itself to release it. Years later, the artificial intelligence model developed by OpenAI is capable of passing university exams or writing books without human intervention.