An artificial intelligence that learns by observing, rather than by memorizing vast amounts of data or repeating an action over and over through trial and error. That is the proposal that Google DeepMind, the American technology giant's AI company, published this Tuesday in Nature Communications. In a 3D simulation, the system manages to avoid obstacles and carry out tasks it has never seen before, simply by copying the actions of a guide, whether human or digital.
Moreover, the agent developed by Google's scientists remembers the pattern once it has learned it, so it can keep performing the same task even after its mentor disappears. The authors see in these capabilities a form of cultural transmission similar to that of humans, in which knowledge is passed from one generation to the next, in this case between agents, through social learning, that is, through contact with others.
“Before our work, the most advanced methods needed a large number of trials to learn a new behavior,” Edward Hughes, a Google DeepMind engineer and one of the authors of the study, tells this newspaper. “With our method, the agent can learn a new behavior in real time […] just by watching,” he continues.
“If this type of technique is scaled up and transferred to real robots, it would mean that a human could teach a robot a new skill on the fly, locally and while preserving privacy,” the expert concludes. In other words, for the robot to learn each user's needs, it would not have to be fed large amounts of personal data during training; the machine could learn directly on the spot.
However, this hypothetical leap to real life is still a long way off, as the authors acknowledge in the article. Both the learning and the tests in the experiments took place in a simulation, not in a real environment, under conditions favorable to the success of the tasks. Although pioneering, the study has a series of limitations that, for now, rule out applying the results to everyday situations. Among others, only one person acted as a guide, so it is not known how the machine would respond to different demonstrators.
The key to the agent developed by DeepMind is the way it learns. Instead of training it with massive amounts of data to solve specific tasks, the engineering team has taught it to observe, imitate and remember increasingly complex tasks.
Scientists endowed the system with memory, paired it with a guide both in and out of training sessions, and gave it the ability to identify that guide and pay attention to it. With these three ingredients, they placed the AI in a simulation in which it had to pass through a series of points in a specific order. If it got the order right, the system received a reward; if it got it wrong, it received a penalty.
The machine did not know the correct pattern, and to identify it it had two options: either try its luck, or follow the guide, when one was present in the simulation, who repeated the correct sequence over and over again. After about 300 hours of training in increasingly complicated scenarios, with a greater number of goals and obstacles, the system understood that the best strategy to score points quickly was to follow the bot and become familiar with the pattern it proposed.
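As a rough sketch of this kind of reward scheme (the class, goal names and reward values below are illustrative assumptions, not the environment used in the paper), the scoring logic can be captured in a few lines of Python:

```python
# Hypothetical sketch of a goal-sequence reward scheme of the kind described
# above; names and numbers are illustrative, not taken from the DeepMind paper.
from typing import List

class GoalSequenceTask:
    def __init__(self, correct_order: List[str], reward: float = 1.0, penalty: float = -1.0):
        self.correct_order = correct_order   # hidden sequence the agent must discover
        self.progress = 0                    # index of the next goal to visit
        self.reward = reward
        self.penalty = penalty

    def visit(self, goal: str) -> float:
        """Return the score obtained by stepping on a goal."""
        if goal == self.correct_order[self.progress]:
            self.progress = (self.progress + 1) % len(self.correct_order)
            return self.reward    # correct next goal in the sequence
        return self.penalty       # wrong goal: negative feedback

# Example: an agent that simply copies a demonstrator's visits scores positively.
task = GoalSequenceTask(["A", "C", "B"])
demo_visits = ["A", "C", "B", "A"]               # order observed from the guide
print(sum(task.visit(g) for g in demo_visits))   # 4.0
```

Under a scheme like this, copying a guide that already knows the hidden order turns a costly trial-and-error search into a string of rewarded visits, which is the incentive the agent eventually picks up on.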
In this way, faced with a completely unknown scenario, the AI is capable of immediately seeking out the guide, imitating it in real time and remembering its movement pattern so it can replicate it when the reference disappears from the map.
The system presented this Tuesday uses “a relatively small amount of data and training time” compared to other proposals, explains Hughes. “We believe it is possible to generate much more robust imitations by training [the agent] in a much wider range of different environments,” he reflects.
The expert also points out that a change in the architecture of the neural network could help the system better recognize the most complicated trajectories and refine its memory. In fact, the agent presented today shows a curious peculiarity: although it is able to remember the path the guide followed once the guide disappears, the more times it repeats that path alone, the more mistakes it makes.
This occurs because the system bases new knowledge on what happened in the last iteration, so that a small oversight in one of the imitations can quickly become an error that compromises the success of the task. By changing the architecture, engineers believe they can improve the reliability of the memory and therefore obtain better results in the long term.
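A back-of-the-envelope way to see why such drift compounds (the slip probability below is purely illustrative, not a figure from the study): if each solo repetition is copied from the previous one and each copy has a small chance of introducing a mistake, the odds that the trajectory is still intact shrink geometrically with the number of repetitions.

```python
# Illustrative only: with an assumed per-repetition slip probability p, the
# chance the remembered trajectory is still intact after n solo repetitions
# falls as (1 - p) ** n, because each copy builds on the previous one.
p = 0.05
for n in (1, 5, 10, 20):
    print(n, round((1 - p) ** n, 3))
```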
The authors have not only shown that their system exhibits cultural transmission; they have also been able to identify which of the approximately two million neurons that make up the artificial network are responsible for this milestone.
“Sometimes deep learning methods are criticized for a lack of interpretability, that is, it is difficult to reason about where the tasks they carry out take place,” says the DeepMind engineer. “We have seen that we can correlate activations of individual neurons in the agent’s ‘brain’ with specific events occurring in the world” of the simulation, he continues.
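As a toy illustration of what such an analysis might look like (the data below is synthetic and the setup is an assumption, not the method used in the paper), one can correlate a single neuron's activation trace with a binary event signal logged from the environment:

```python
# Toy example: correlate one neuron's activity with a binary event from the
# simulation (e.g. "guide is visible"). All data here is synthetic.
import numpy as np

rng = np.random.default_rng(0)
timesteps = 1000
event = rng.random(timesteps) < 0.1                               # event occurs ~10% of steps
activation = 0.8 * event + 0.2 * rng.standard_normal(timesteps)   # toy neuron trace

# Pearson correlation between the neuron's activity and the event indicator
corr = np.corrcoef(activation, event.astype(float))[0, 1]
print(f"correlation with event: {corr:.2f}")
```

A neuron whose activity lines up strongly with a particular event is a candidate for carrying that piece of the agent's behavior, which is the kind of link the researchers describe.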
The finding not only provides details of how the system solves tasks, Hughes says, but also opens up the possibility of using similar agents to better understand human neuroscience. These “are interesting directions of work for the future,” he concludes, but “here we are simply pointing out that interpretability can, and does, appear.”