Where did that sound come from?
Neuroscientists have developed a computer model that can answer that
question as well as the human brain
Date:
January 27, 2022
Source:
Massachusetts Institute of Technology
Summary:
Neuroscientists developed a computer model that can localize
sounds. The model, which consists of several convolutional neural
networks, not only performs the task as well as humans do, it also
struggles in the same ways that humans do when the task is made
more difficult by adding echoes or multiple sounds.
FULL STORY ==========================================================================
The human brain is finely tuned not only to recognize particular sounds,
but also to determine which direction they came from. By comparing
differences in sounds that reach the right and left ear, the brain
can estimate the location of a barking dog, wailing fire engine, or
approaching car.
MIT neuroscientists have now developed a computer model that can
also perform that complex task. The model, which consists of several convolutional neural networks, not only performs the task as well as
humans do, it also struggles in the same ways that humans do.
"We now have a model that can actually localize sounds in the real
world," says Josh McDermott, an associate professor of brain and
cognitive sciences and a member of MIT's McGovern Institute for Brain
Research. "And when we treated the model like a human experimental
participant and simulated this large set of experiments that people had
tested humans on in the past, what we found over and over again is that the
model recapitulates the results that you see in humans." Findings from
the new study also suggest that humans' ability to perceive location is
adapted to the specific challenges of our environment, says McDermott,
who is also a member of MIT's Center for Brains, Minds, and Machines.
McDermott is the senior author of the paper, which appears today in
Nature Human Behaviour. The paper's lead author is MIT graduate student
Andrew Francl.
Modeling localization
When we hear a sound such as a train whistle,
the sound waves reach our right and left ears at slightly different
times and intensities, depending on what direction the sound is coming
from. Parts of the midbrain are specialized to compare these slight
differences to help estimate what direction the sound came from, a task
also known as localization.
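The timing cue described above can be illustrated with a short sketch. It is not the researchers' model: it simply delays one channel of a noise burst by a known number of samples and recovers that delay by cross-correlation, a standard signal-processing approach. The sample rate, the delay, and the function name estimate_itd_samples are assumptions made for illustration.

```python
import math
import random


def estimate_itd_samples(left, right, max_lag):
    """Return the lag (in samples) that maximizes the cross-correlation.

    A positive result means the right-ear signal lags the left-ear signal,
    i.e. the sound reached the left ear first.
    """
    best_lag, best_score = 0, -math.inf
    n = len(left)
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += left[i] * right[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


SAMPLE_RATE = 44_100  # Hz (assumed for this example)
TRUE_DELAY = 13       # samples by which the right ear lags (assumed)

rng = random.Random(0)
source = [rng.uniform(-1.0, 1.0) for _ in range(2000)]  # noise burst
left = source
right = [0.0] * TRUE_DELAY + source[:-TRUE_DELAY]       # delayed copy

lag = estimate_itd_samples(left, right, max_lag=40)
itd_ms = 1000.0 * lag / SAMPLE_RATE
print(f"estimated lag: {lag} samples ({itd_ms:.3f} ms)")
```

A positive lag here means the source is closer to the left ear. Mapping the lag to an angle would require a head model, which this sketch leaves out.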
This task becomes markedly more difficult under real-world conditions --
where the environment produces echoes and many sounds are heard at once.
Scientists have long sought to build computer models that can perform the
same kind of calculations that the brain uses to localize sounds. These
models sometimes work well in idealized settings with no background noise,
but never in real-world environments, with their noises and echoes.
To develop a more sophisticated model of localization, the MIT team turned
to convolutional neural networks. This kind of computer modeling has been
used extensively to model the human visual system, and more recently,
McDermott and other scientists have begun applying it to audition as well.
Convolutional neural networks can be designed with many different architectures, so to help them find the ones that would work best for localization, the MIT team used a supercomputer that allowed them to
train and test about 1,500 different models. That search identified
10 that seemed the best-suited for localization, which the researchers
further trained and used for all of their subsequent studies.
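The search procedure described above (evaluate many candidate architectures, keep the best few) can be sketched as a random search. Everything specific in this example is hypothetical: the article does not describe the search space or the scoring method, so the hyperparameter names and ranges are invented, and placeholder_score returns a random number where a real search would train and validate each network.

```python
import random

rng = random.Random(0)


def sample_architecture():
    """Draw one candidate from a made-up search space (illustrative only)."""
    return {
        "num_conv_layers": rng.randint(3, 8),
        "kernel_width": rng.choice([3, 5, 9, 17]),
        "pool_every": rng.choice([1, 2, 3]),
    }


def placeholder_score(arch):
    """Stand-in for 'train the network and measure validation accuracy'.

    Returns a random number so that the example runs instantly.
    """
    return rng.random()


NUM_CANDIDATES = 1500  # "about 1,500 different models" in the article
NUM_KEPT = 10          # the 10 models carried forward

candidates = [sample_architecture() for _ in range(NUM_CANDIDATES)]
scored = [(placeholder_score(arch), arch) for arch in candidates]
scored.sort(key=lambda pair: pair[0], reverse=True)  # best score first
top_ten = [arch for _, arch in scored[:NUM_KEPT]]

print(len(top_ten), "architectures kept out of", len(candidates))
```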
To train the models, the researchers created a virtual world in which
they could control the size of the room and the reflection properties of
the walls of the room. All of the sounds fed to the models originated
from somewhere in one of these virtual rooms. The set of more than 400
training sounds included human voices, animal sounds, machine sounds
such as car engines, and natural sounds such as thunder.
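To give a sense of what controlling room size and wall reflectivity means in practice, here is a minimal sketch that adds a single echo to a sound. It is far simpler than a full room simulation and is not the researchers' virtual world: the extra path length stands in for room size, reflection_gain stands in for how reflective the wall is, and both values are assumptions chosen for the example.

```python
SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at room temperature


def add_single_echo(dry, sample_rate, extra_path_m, reflection_gain):
    """Mix one delayed, attenuated copy of `dry` into the signal.

    extra_path_m is how much farther the reflected sound travels than the
    direct sound; a larger room means a longer path and a later echo.
    Returns the mixed signal and the echo delay in samples.
    """
    delay = round(sample_rate * extra_path_m / SPEED_OF_SOUND)
    wet = list(dry) + [0.0] * delay
    for i, x in enumerate(dry):
        wet[i + delay] += reflection_gain * x
    return wet, delay


click = [1.0, 0.0, 0.0, 0.0]  # a single impulse as the test sound
wet, delay = add_single_echo(
    click, sample_rate=16_000, extra_path_m=3.43, reflection_gain=0.5
)
print("echo arrives", delay, "samples after the direct sound")
```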
The researchers also ensured the model started with the same information provided by human ears. The outer ear, or pinna, has many folds that
reflect sound, altering the frequencies that enter the ear, and these reflections vary depending on where the sound comes from. The researchers simulated this effect by running each sound through a specialized
mathematical function before it went into the computer model.
"This allows us to give the model the same kind of information that a
person would have," Francl says.
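The article does not name the mathematical function used to mimic the outer ear. One common way to model direction-dependent filtering of this kind is to convolve the sound with a separate impulse response for each ear; that choice is an assumption here, and the two short impulse responses below are made-up numbers, not measured data.

```python
def convolve(signal, impulse_response):
    """Plain discrete convolution (full length), written out for clarity."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[i + k] += s * h
    return out


# Hypothetical impulse responses for one source direction (illustrative):
# the right-ear response starts later and is weaker, as it might be for a
# source located to the listener's left.
IMPULSE_LEFT = [1.0, 0.3, 0.1]
IMPULSE_RIGHT = [0.0, 0.0, 0.6, 0.2]


def spatialize(mono):
    """Turn a mono sound into a (left_ear, right_ear) pair."""
    return convolve(mono, IMPULSE_LEFT), convolve(mono, IMPULSE_RIGHT)


click = [1.0, 0.0, 0.0, 0.0]
left_ear, right_ear = spatialize(click)
print("left ear: ", left_ear)
print("right ear:", right_ear)
```

Feeding in a single click returns each ear's impulse response (padded with zeros), which makes it easy to check that the filtering behaves as intended.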
After training the models, the researchers tested them in a real-world environment. They placed a mannequin with microphones in its ears in
an actual room and played sounds from different directions, then fed
those recordings into the models. The models performed very similarly
to humans when asked to localize these sounds.
"Although the model was trained in a virtual world, when we evaluated it,
it could localize sounds in the real world," Francl says.
Similar patterns
The researchers then subjected the models to a series of
tests that scientists have used in the past to study humans' localization abilities.
In addition to analyzing the difference in arrival time at the right
and left ears, the human brain also bases its location judgments on
differences in the intensity of sound that reaches each ear. Previous
studies have shown that the success of both of these strategies varies depending on the frequency of the incoming sound. In the new study, the
MIT team found that the models showed this same pattern of sensitivity
to frequency.
"The model seems to use timing and level differences between the two ears
in the same way that people do, in a way that's frequency-dependent,"
McDermott says.
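The level cue mentioned above is usually expressed as a ratio in decibels. The sketch below computes that ratio for a pure tone that is half as strong at one ear as at the other. The tone frequency, sample rate, and the factor of one half are assumptions picked for the example, and the function names are hypothetical.

```python
import math


def rms(signal):
    """Root-mean-square amplitude of a signal."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def ild_db(left, right):
    """Interaural level difference in dB; positive means louder on the left."""
    return 20.0 * math.log10(rms(left) / rms(right))


def tone(freq_hz, gain, sample_rate=16_000, duration_s=0.1):
    """Generate a sine tone with the given peak amplitude."""
    n = int(sample_rate * duration_s)
    return [
        gain * math.sin(2.0 * math.pi * freq_hz * t / sample_rate)
        for t in range(n)
    ]


left = tone(4000, gain=1.0)
right = tone(4000, gain=0.5)  # assumed attenuation at the far ear

level = ild_db(left, right)
print(f"level difference: {level:.2f} dB")
```

Halving the amplitude corresponds to a difference of about 6 dB, since 20 times the base-10 logarithm of 2 is roughly 6.02.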
The researchers also showed that when they made localization tasks more difficult, by adding multiple sound sources played at the same time,
the computer models' performance declined in a way that closely mimicked
human failure patterns under the same circumstances.
"As you add more and more sources, you get a specific pattern of decline
in humans' ability to accurately judge the number of sources present,
and their ability to localize those sources," Francl says. "Humans seem
to be limited to localizing about three sources at once, and when we ran
the same test on the model, we saw a really similar pattern of behavior."
Because the researchers used a virtual world to train their models,
they were also able to explore what happens when their model learned
to localize in different types of unnatural conditions. The researchers
trained one set of models in a virtual world with no echoes, and another
in a world where there was never more than one sound heard at a time. In
a third, the models were only exposed to sounds with narrow frequency
ranges, instead of naturally occurring sounds.
When the models trained in these unnatural worlds were evaluated on
the same battery of behavioral tests, the models deviated from human
behavior, and the ways in which they failed varied depending on the type
of environment they had been trained in. These results support the idea
that the localization abilities of the human brain are adapted to the environments in which humans evolved, the researchers say.
The researchers are now applying this type of modeling to other aspects
of audition, such as pitch perception and speech recognition, and believe
it could also be used to understand other cognitive phenomena, such as the limits on what a person can pay attention to or remember, McDermott says.
The research was funded by the National Science Foundation and the
National Institute on Deafness and Other Communication Disorders.
==========================================================================
Story Source: Materials provided by Massachusetts Institute of Technology.
Original written by Anne Trafton. Note: Content may be edited for style
and length.
==========================================================================
Journal Reference:
1. Andrew Francl, Josh H. McDermott. Deep neural network models of sound
localization reveal how perception is adapted to real-world environments.
Nature Human Behaviour, 2022; 6 (1): 111 DOI: 10.1038/s41562-021-01244-z
==========================================================================
Link to news story:
https://www.sciencedaily.com/releases/2022/01/220127114318.htm