They create the first AI that learns language by simulating how the human brain does it

An artificial intelligence has been able to relate images and words after being trained with recordings of what a baby sees and hears on a daily basis.

Oliver Thansan
31 January 2024 Wednesday 21:22

The system, presented this Thursday in the journal Science, shows that the ingredients an AI needs to begin learning language are minimal, and it opens the door to developing more efficient models that learn in a way closer to how humans do.

Other AIs had managed to relate objects to words before, but none had done so with so little training data, nor by learning the way a child does. The recordings used to train the model come from a camera and voice recorder mounted on a helmet worn by Sam, an Australian child, for two-hour sessions each week from 6 to 25 months of age. The boy lived with his family and two cats in a semi-rural area 30 kilometers from Adelaide.

Researchers have similar recordings from two other children, Alice, from California, and Asa, from New York, although they did not use them to train this first AI that simulates the brain's language learning. The recordings are part of a database developed by the Massachusetts Institute of Technology (MIT) in 2021 to analyze how children learn.

The advance is essential to understanding why “humans are much more efficient with data when learning language, compared to the best current AI systems,” explains Wai Keen Vong, a researcher at New York University (NYU) and one of the authors of the article, in an email to this newspaper.

“There is a large data gap between the way AI systems and children learn language,” corroborates Brenden Lake, the NYU researcher who led the study. “Bridging this gap is essential if we want to build machines that can learn and think like people, and if we are to extend the training of language models beyond the walls of big tech,” he concludes.

More human-like learning means more efficient learning, in terms of the resources needed for training. The resulting model is ultimately cheaper to build, a key objective of academic research, which seeks to ensure that scientists at public institutions can access these tools even with budgets far smaller than those of large corporations.

To bridge the data gap that Lake describes, the machine only needs to grasp that when a word and an image appear at the same time, they are related, while if they appear separated in time, they are not. This information is enough for neural networks to begin learning language “when combined with the type of input a child receives,” Vong details.
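This co-occurrence principle can be sketched as a contrastive objective: the embedding of a video frame and that of a word heard at the same moment are pulled together, while frames and words from different moments are pushed apart. Below is a minimal, illustrative sketch with toy vectors; the variable names, dimensions and noise model are assumptions for the example, not the authors' actual code.

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize(x):
    # Project embeddings onto the unit sphere so dot products are cosine similarities.
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Toy batch: 4 video frames and the 4 words heard at the same moments, each
# already mapped to a 16-dimensional embedding (stand-ins for real encoders).
frames = normalize(rng.normal(size=(4, 16)))
# A co-occurring word's embedding lies near its frame's embedding (toy assumption).
words = normalize(frames + 0.1 * rng.normal(size=(4, 16)))

def contrastive_loss(img_emb, word_emb, temperature=0.1):
    # Similarity of every frame with every word: entry (i, j) compares
    # frame i with word j.
    logits = img_emb @ word_emb.T / temperature
    # Positives sit on the diagonal: frame i co-occurred with word i.
    # A softmax cross-entropy pulls each frame toward its own word
    # and away from words heard at other moments.
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs = exp / exp.sum(axis=1, keepdims=True)
    return -np.log(np.diag(probs)).mean()

loss = contrastive_loss(frames, words)
```

Pairing each frame with a word from a different moment (for example, by shuffling the words) yields a higher loss, which is exactly the signal that lets the network learn which words go with which scenes.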

Although this new approach is somewhat closer to human learning, both in how the data is acquired and in how it is processed, it is still far from describing exactly how children acquire their first words. The aim of the study was for the system to learn with the minimum possible capabilities, the authors acknowledge; the extent to which other cognitive abilities contribute to language development remains to be analyzed in the future.

“Both the model and the data are still quite limited compared to the experiences and abilities of children,” who by age two already command about 300 words, Lake points out. The limitations are obvious, given that the AI cannot interact with its environment: it has no senses, learns passively, lacks social skills and has no goals or needs. Many factors that influence learning lie beyond what the model can capture.

Furthermore, the AI learns from transcribed words, not spoken language, which on the one hand makes its task easier (reading words is simpler than hearing them, both technically and logistically) and on the other handicaps it, since details such as intonation are lost. The research team is currently working on speech recognition technology to address this limitation, in the hope that it will “provide more details about early language acquisition,” Vong concludes.

The year and a half of weekly recordings of the baby yielded 61 hours of data (about 600,000 video frames and 37,500 utterances) with which the NYU scientists trained their artificial intelligence, having it watch them over and over again. This repetition of the same data also differs from the richness of a child's daily life, in which learning happens in long, varied episodes rather than short, repeated fragments.

To evaluate the system's performance, the team designed a test in which the AI was given a word and had to identify the object it referred to among four options, all of them seen during the training phase. The model answered correctly in six out of ten cases, an accuracy only five percentage points below that of another model, CLIP, which was trained with 400 million image-word pairs from the Internet, about 10,000 times more data than the model presented today.
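The four-alternative test described above can be illustrated with a small simulation: for each cue word, the model compares the word's embedding against the true object and three distractors and "answers" with the most similar one. The embeddings, sizes and noise level below are invented for the example; only the trial structure follows the article.

```python
import numpy as np

rng = np.random.default_rng(1)

def normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Toy stand-ins: embeddings for 10 trained words and their matching object images.
words = normalize(rng.normal(size=(10, 16)))
# The learned word-object alignment is imperfect, hence the added noise.
objects = normalize(words + 0.3 * rng.normal(size=(10, 16)))

def four_way_trial(word_idx):
    # One trial: the cue word, its true object, and three distractor objects.
    distractors = rng.choice(
        [i for i in range(10) if i != word_idx], size=3, replace=False
    )
    candidates = np.concatenate(([word_idx], distractors))
    sims = objects[candidates] @ words[word_idx]
    # The model answers with the candidate image most similar to the cue word.
    return candidates[np.argmax(sims)] == word_idx

accuracy = np.mean([four_way_trial(i) for i in range(10)])
```

Chance performance on a four-alternative trial is 25%, which is why the article compares the model's scores against that baseline.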

In a second test, the researchers assessed whether the model could generalize, that is, recognize the objects it had been trained on, but against a white background and with slightly different shapes. Here the system correctly identified 37% of the 67 words tested, somewhat above a random result (25%), showing that, albeit with difficulty, the AI does have a certain capacity for abstraction.

The team hopes to improve the results by expanding the amount of data used to train the model and by adding cognitive and sensory abilities to the AI. To that end, they have additional footage of the child used in training, as well as recordings of the two other babies who took part in the same experiment.