How a new noise-reduction algorithm suppresses babble in cochlear implants and hearing aids
People who wear cochlear implants and hearing aids are often faced with the challenge of distinguishing the words of the speaker they are trying to listen to from “babble” – the combined noise of other speakers in the background.
However, a research team at New York University (NYU) has created a solution – an algorithm that separates the speaker’s voice from the noise of background talkers.
This innovative noise reduction technology, called SEDA (for Speech Enhancement using Decomposition Approach), was created by Roozbeh Soleymani, an electrical engineering doctoral student at NYU Tandon School of Engineering, along with Ivan Selesnick, PhD, a professor in the NYU Tandon Department of Electrical and Computer Engineering, and David Landsberger, PhD, an assistant professor in the NYU Langone Department of Otolaryngology.
The Problem with Babble
A majority of algorithms for acoustic noise suppression focus on removing steady background noise – such as the sound of a car or a fan – which is very different from speech and therefore fairly easy to remove. Babble, however, is much more of a challenge to suppress because it resembles the foreground voice signal an individual is trying to hear.
When two things are very different, they are easier to separate – like separating salt from pepper, Soleymani said. Salt and sugar, on the other hand, are difficult to separate because they look alike and both dissolve in water. Likewise, babble and speech are very difficult to separate because they’re inherently so similar.
“Babble consists of people talking, just like the talker you want to listen to,” explained Landsberger. “This makes the properties of the talker and the properties of the background very similar and therefore very hard to separate.” Babble is also very unpredictable, Landsberger added. “The voice and timing jump all over the place, so it’s very hard to predict what you need to remove.”
An Algorithmic Solution
The traditional method to analyze a speech signal separates the signal according to its frequency. “The issue with separating signals according to frequencies is that for babble, the background and the primary speaker you want to listen to overlap in frequency and time,” explained Selesnick, whose National Science Foundation-funded research in 2010 was the starting point for Soleymani’s work. This overlap means that most traditional methods can’t distinguish between babble and speech.
“If you want to separate the speech of the main talker from the background talkers, you need to manipulate them into some space into which they’re separable,” said Landsberger. The solution was to work in the Q Factor domain, which describes how sustained a waveform’s oscillations are, rather than in the frequency domain.
While frequency tells the number of cycles of a waveform per second, the Q Factor tells how many oscillations a particular waveform contains. Though two waveforms may have the same frequency, one waveform may consist of more oscillations than the other waveform.
The waveform with more oscillations has a high Q Factor (a longer lifetime), and the one with a lower number of oscillations has a low Q Factor (a short lifetime). “This is still just a small number of milliseconds,” Selesnick noted. “Everything is happening quickly, but some features of the waveform of the speech signal are comprised of a more sustained sequence of oscillations than others.”
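The distinction between frequency and Q Factor can be illustrated with a short sketch. It is not SEDA code: the 16 kHz sample rate, the 1 kHz tone, the Hann window shape, and the function names are all assumptions chosen for illustration. It builds two bursts with the same frequency, one with 2 oscillations and one with 16, and shows that both last only a few milliseconds.

```python
import math

SAMPLE_RATE_HZ = 16000  # assumed sample rate, not stated in the article


def windowed_tone(freq_hz, n_cycles, sample_rate=SAMPLE_RATE_HZ):
    """Return a sine burst at freq_hz that lasts n_cycles oscillations.

    A Hann window fades the burst in and out so it has a finite lifetime.
    """
    n_samples = int(round(n_cycles * sample_rate / freq_hz))
    return [
        0.5 * (1 - math.cos(2 * math.pi * i / (n_samples - 1)))
        * math.sin(2 * math.pi * freq_hz * i / sample_rate)
        for i in range(n_samples)
    ]


def duration_ms(samples, sample_rate=SAMPLE_RATE_HZ):
    """Length of a burst in milliseconds."""
    return 1000.0 * len(samples) / sample_rate


# Same frequency (1 kHz), different number of oscillations.
low_q = windowed_tone(1000, n_cycles=2)    # short-lived, low Q Factor
high_q = windowed_tone(1000, n_cycles=16)  # sustained, high Q Factor

print(duration_ms(low_q), "ms")   # 2.0 ms
print(duration_ms(high_q), "ms")  # 16.0 ms
```

A frequency analysis would place both bursts at 1 kHz, which is why frequency alone cannot tell them apart. The number of oscillations is the property that differs.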
Therefore, SEDA differs from traditional methods because it decomposes a speech signal into waveforms that differ in the number of oscillations that each waveform contains. Additionally, in traditional noise reduction methods, the speech signal can be de-noised either aggressively or gently. If the signal is de-noised aggressively, the noise is removed but the resulting signal is distorted. If the signal is de-noised gently, the distortion might be prevented but the noise remains.
“In traditional methods, there is always a tradeoff between how aggressively the signal is de-noised,” said Soleymani. “We have solved this problem in SEDA by implementing both aggressive and gentle de-noising to different components of the signal and used that to an advantage.”
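The idea of treating components differently can be sketched as follows. The article does not say which de-noising operation SEDA uses or which component receives which treatment, so this sketch assumes soft thresholding, a common shrinkage technique, and uses hypothetical names (`component_a`, `component_b`) and hypothetical threshold values.

```python
def soft_threshold(coeffs, threshold):
    """Shrink each coefficient toward zero by `threshold`.

    Coefficients whose magnitude is at or below the threshold become zero.
    A larger threshold removes more noise but also distorts more.
    """
    out = []
    for c in coeffs:
        magnitude = abs(c) - threshold
        if magnitude <= 0:
            out.append(0.0)
        else:
            out.append(magnitude if c > 0 else -magnitude)
    return out


def denoise_components(component_a, component_b, gentle=0.25, aggressive=0.5):
    """Apply gentle de-noising to one component and aggressive to the other.

    Which component gets which treatment is an assumption for illustration.
    """
    return soft_threshold(component_a, gentle), soft_threshold(component_b, aggressive)


# Hypothetical coefficients for two components of a decomposed signal.
a = [1.0, -0.75, 0.125]
b = [1.0, -0.75, 0.125]
cleaned_a, cleaned_b = denoise_components(a, b)
print(cleaned_a)  # [0.75, -0.5, 0.0]
print(cleaned_b)  # [0.5, -0.25, 0.0]
```

With a single threshold for the whole signal, one value has to serve every component. Separate thresholds let each component be cleaned as hard as it can tolerate.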
A Challenging Process
This process presented several challenges besides the difficult nature of babble. Another major challenge, said Soleymani, was performing the de-noising in real-time. “De-noising the signal in real-time means that the moment the signal is received and de-noised, it needs to be sent into the cochlear implant device,” he explained.
When SEDA processes the speech signal, it introduces latency, a time delay between the sound arriving and the cleaned signal being delivered. The goal is to decrease the delay to a level that is not noticeable for cochlear implant users: less than about 10 to 20 milliseconds. With that purpose in mind, the research team is currently working to enhance SEDA and create a new version that can perform the de-noising in real time with a very short latency.
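To give a sense of what a 10 to 20 millisecond budget means, the sketch below computes the delay from buffering alone. It assumes frame-based processing at a 16 kHz sample rate. Neither assumption comes from the article, the function names are hypothetical, and the time spent on computation is ignored.

```python
def buffering_latency_ms(frame_samples, sample_rate_hz):
    """Delay from waiting for a full frame of audio before processing it."""
    return 1000.0 * frame_samples / sample_rate_hz


def max_frame_samples(latency_budget_ms, sample_rate_hz):
    """Largest frame, in samples, that fits within a latency budget."""
    return int(latency_budget_ms * sample_rate_hz // 1000)


SAMPLE_RATE_HZ = 16000  # assumed, not stated in the article

print(buffering_latency_ms(160, SAMPLE_RATE_HZ))  # 10.0
print(max_frame_samples(10, SAMPLE_RATE_HZ))      # 160
print(max_frame_samples(20, SAMPLE_RATE_HZ))      # 320
```

Under these assumptions, the algorithm would have to work with frames of at most a few hundred samples, which limits how much of the signal it can examine at once.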
The entirety of this challenging process was only possible due to the combined efforts of the team, Soleymani said. “Everyone had a very crucial role,” he stressed. Selesnick’s transform was used to convert the signal from the frequency domain to the Q Factor domain, while Landsberger provided expertise regarding cochlear implants and created a bridge between signal-processing and hearing science.
Improved Sound Quality and Comprehension
One of the greatest impacts of the new algorithm is the improvement in speech understanding for cochlear implant users that has been shown in listening tests in the lab. Currently, in a quiet environment, a cochlear implant user might understand 80-100% of a conversation. However, when the same person is in a crowded environment, intelligibility might drop to around 20-50%. “If we can create a robust improvement, let’s say even 20% of intelligibility in a crowded environment, it will make a huge difference for them,” Soleymani said.
A formal evaluation of the algorithm in cochlear implant users demonstrated a great improvement in both sound quality and comprehension. Depending on the signal-to-noise ratio (how loud the speech was relative to the background noise), users understood from 10% to around 50% more words, which is a dramatic improvement, Landsberger said.
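Signal-to-noise ratio is conventionally reported in decibels as ten times the base-10 logarithm of the ratio of speech power to noise power. The sketch below uses that standard definition. The sample values are made up for illustration and are not data from the evaluation.

```python
import math


def power(samples):
    """Mean squared amplitude of a signal."""
    return sum(s * s for s in samples) / len(samples)


def snr_db(speech, noise):
    """Signal-to-noise ratio in decibels: 10 * log10(P_speech / P_noise)."""
    return 10.0 * math.log10(power(speech) / power(noise))


# Made-up signals: the babble has one tenth the amplitude of the speech,
# which is one hundredth of the power, or roughly 20 dB.
speech = [1.0, -1.0] * 4
babble = [0.1, -0.1] * 4
print(round(snr_db(speech, babble), 6))  # 20.0

# Equal power gives 0 dB, a much harder listening condition.
print(snr_db(speech, speech))  # 0.0
```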
During the evaluations, all of the cochlear implant users already had noise reduction algorithms in place from the manufacturer. “So the improvements we’re seeing are above and beyond what’s already implemented,” said Landsberger. “This is the benefit in addition to the noise reduction that’s commercially available.”
The potential uses for SEDA extend far beyond cochlear implants and hearing aids. “This sort of technology could be revolutionized to be able to be used in public successfully,” said Landsberger. He mentioned that the algorithm has the potential for a variety of applications – essentially anywhere a noisy background makes it difficult to understand one particular speaker.
For example, though cellphones use noise reduction techniques, they aren’t truly effective in extremely noisy environments, like a restaurant or train station, he said. The cellphone’s microphone picks up both the talker and all of the background talkers, muddling the signal and making the talker difficult to understand.
However, if a phone is fitted with the new algorithm, the algorithm will suppress the background and allow the person on the remote side to understand better, even if they don’t have a phone with the algorithm, and vice versa. SEDA also has the potential to be useful for voice recognition systems, which typically do not work well in loud environments.
A U.S. patent application has been submitted for SEDA, though researchers are already working on modifying and improving the original technology. “Most of what we’re doing at the moment is just making the software faster, better, and making it more implementable in real life devices,” Soleymani explained.
Currently, the algorithm is running on a computer in the lab. To have it built into a cochlear implant, the implant manufacturers would need to program the algorithm into the implant processor.
In the meantime, a front-end device (like a small cellphone) can be plugged into the implant, bypassing the implant microphone and creating a pathway to clean up the signal using the new algorithm. The eventual goal, Landsberger said, is to work with the implant companies to incorporate the new algorithm.
New York University Tandon School of Engineering. NYU Tandon Doctoral Student’s Cochlear Implant Technology Banishes Ambient Babble. http://engineering.nyu.edu/press-releases/2016/05/03/nyu-tandon-doctoral-students-cochlear-implant-technology-banishes-babble