An artificial intelligence that learns by observing, rather than by memorizing vast amounts of data or executing an action over and over. That is the new proposal that the Google DeepMind team, the AI lab of the US technology giant, published yesterday in Nature Communications. In a 3D simulation, the system dodges obstacles and performs tasks it has never seen before simply by copying the actions of a guide, whether human or digital.

In addition, the agent developed by the Google scientists remembers a pattern once it has learned it, so it can keep performing the same task even after its mentor disappears. The authors see in these qualities a cultural transmission similar to that of humans, in which knowledge passes from one generation to the next (in this case, between agents) through social learning, that is, through contact with others.

“Prior to our work, the most advanced methods needed a large number of trials to learn a new behavior,” Edward Hughes, a Google DeepMind engineer and one of the authors of the study, tells this outlet. “With our method, the agent can learn a new behavior in real time […] just by watching,” he continues.

“If this type of technique is improved and transferred to real robots, it would mean that a human could teach a robot a new skill on the fly, locally and while preserving privacy,” the expert concludes. In other words, for a robot to learn each user’s needs, it would not be necessary to feed it a large amount of data during training; the machine could learn directly in the field.

However, this hypothetical leap to real life is still a long way off, as the authors acknowledge in the article. Both the learning and the tests in the experiments took place in a simulation, not in a real environment, under conditions favorable to the success of the tasks.

Although pioneering, the study has a series of limitations that, for now, prevent its extrapolation to everyday situations. Among them, only one person acted as guide, so it is not yet known how the machine would respond to different kinds of guides.

The key to this AI is the way it learns. Instead of training it on massive amounts of data to solve specific tasks, the researchers taught it to observe, imitate and remember increasingly complex tasks.

The system has a memory; the researchers paired it with a guide that entered and left the training sessions, and they gave it the ability to identify the guide and pay attention to it. With these three features, they placed the AI in a simulation where it had to pass through a series of points in a specific order. If it got the order right, the system received a reward; if it got it wrong, it received a penalty. The machine did not know the correct pattern, and to discover it, it had two options: try its luck, or follow the guide, who repeated the correct sequence over and over. After about 300 hours of training in increasingly complicated scenarios, with more targets and obstacles, the system understood that the best strategy for scoring points was to follow the bot and learn the pattern it demonstrated.
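The reward scheme described above can be sketched in a few lines of Python. This is a highly simplified illustration under assumed details, not DeepMind's actual environment: the class name `GoalSequenceEnv`, the reward values (+1 per correct goal, −1 for a mistake) and the reset-on-error rule are all invented for the example. It only shows why copying a guide that knows the hidden sequence beats guessing.

```python
import random

class GoalSequenceEnv:
    """Toy environment: a hidden ordering of goals must be visited in sequence."""

    def __init__(self, n_goals=4, seed=0):
        rng = random.Random(seed)
        self.sequence = rng.sample(range(n_goals), n_goals)  # hidden correct order
        self.progress = 0  # index of the next goal the agent must visit

    def step(self, goal):
        """Visit a goal; return (reward, done)."""
        if goal == self.sequence[self.progress]:
            self.progress += 1
            return 1.0, self.progress == len(self.sequence)
        self.progress = 0  # a mistake resets the round
        return -1.0, False

def imitate(env, guide):
    """An agent that simply copies the goals the guide demonstrates."""
    total = 0.0
    for goal in guide:
        reward, done = env.step(goal)
        total += reward
        if done:
            break
    return total

env = GoalSequenceEnv()
# The guide demonstrates the correct sequence; the imitator copies it
# and collects the full reward, one point per goal.
print(imitate(env, guide=list(env.sequence)))  # → 4.0
```

A random agent, by contrast, would keep triggering the −1 penalty and the reset, which is the gap the paper's agent closes by attending to the guide.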