Microsoft's new speech recognition system achieves human parity in audible words

Computers can do some amazing things lately, with parallel processing, machine intelligence, and more powerful hardware enabling extraordinary advancements on what seems like a daily basis. Microsoft is in the thick of things when it comes to artificial intelligence, and machine learning is at the center of it all. On Tuesday, the company announced another significant breakthrough.

The most natural way for humans to interact with computers is by speaking with them, and Microsoft has created technology that can understand spoken language as well as humans, according to the Microsoft blog. Reaching human parity in speech recognition is a historic achievement and Microsoft achieved this milestone more quickly than it expected. “Even five years ago, I wouldn’t have thought we could have achieved this. I just wouldn’t have thought it would be possible,” said Harry Shum, executive vice president in charge of Microsoft’s Intelligence and Research Group.

According to a paper published on Monday, Microsoft’s research team has created a speech-recognition system that achieves a word error rate (WER) of only 5.9 percent, down from the 6.3 percent reported just a month ago. Human beings who transcribe the same conversation used in the test also achieve a WER of around 5.9 percent, meaning that for the first time, a computer performs as well as humans on the industry-standard Switchboard task.
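For readers unfamiliar with the metric, WER is conventionally the number of word substitutions, deletions, and insertions needed to turn the system's transcript into the reference transcript, divided by the number of words in the reference. The sketch below illustrates that standard definition only. The function name and the sample sentences are made up for illustration, and this is not the scoring tool used for the Switchboard evaluation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative helper (hypothetical name), not Microsoft's scoring code.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One wrong word out of six reference words gives a WER of 1/6.
wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
print(f"{wer:.1%}")
```

By this measure, a 5.9 percent WER means roughly six errors for every 100 words in the reference transcript.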

Speech-recognition research began in the early 1970s at the Defense Advanced Research Projects Agency (DARPA). The computer industry took up the challenge and has been working ever since toward the goal of a human-like ability to understand what is being said. Now that this milestone has been reached, we can expect digital assistants and other tools to dramatically improve their ability to interact with us in a more natural fashion. “This will make Cortana more powerful, making a truly intelligent assistant possible,” Shum said.

Microsoft’s new speech-recognition system does not achieve perfection in recognizing spoken conversation, but then again, neither do we. To overcome the usual mistakes in recognizing language, the system uses neural language models that can make the same inferences humans make when correcting for misheard words.
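To give a rough sense of how a language model can correct for misheard words, the toy sketch below picks between two similar-sounding candidate transcripts by asking which word sequence is more probable. It is only an assumption-laden illustration: it swaps in a simple count-based bigram model with add-one smoothing in place of a neural language model, and the tiny corpus, candidate sentences, and function names are all invented for the example.

```python
import math
from collections import Counter


def train_bigram_counts(sentences):
    """Count single words and adjacent word pairs in a small text corpus."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split()  # "<s>" marks the sentence start
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams


def log_prob(sentence, unigrams, bigrams):
    """Log-probability of a sentence under an add-one-smoothed bigram model."""
    vocab_size = len(unigrams)
    words = ["<s>"] + sentence.split()
    total = 0.0
    for prev, word in zip(words, words[1:]):
        total += math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))
    return total


def pick_transcript(candidates, unigrams, bigrams):
    """Return the candidate transcript the language model finds most likely."""
    return max(candidates, key=lambda c: log_prob(c, unigrams, bigrams))


# Invented training text and candidates, purely for illustration.
corpus = [
    "i want to recognize speech",
    "we recognize speech every day",
    "they want to go to the beach",
]
unigrams, bigrams = train_bigram_counts(corpus)
candidates = ["i want to wreck a nice beach", "i want to recognize speech"]
best = pick_transcript(candidates, unigrams, bigrams)
print(best)
```

The two candidates sound alike, but only one is made of word pairs the model has seen before, so it scores higher. A neural language model plays the same role while drawing on far more context than adjacent word pairs.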

The team used a few existing tools to reach the speech-recognition milestone. For example, it used the Computational Network Toolkit, an open source Microsoft system for applying deep learning to computing tasks, which allowed deep-learning algorithms to run faster on specialized graphics processing units (GPUs) working in parallel. Technologies used for other tasks, such as image processing, were also leveraged.

The researchers are not resting on their laurels, however. Work remains to make the speech-recognition technology perform well in more real-world settings, where background noise and context can make recognizing conversational speech a much more difficult task. As Geoffrey Zweig, manager of Microsoft’s Speech & Dialog research group, put it, “The next frontier is to move from recognition to understanding.”

Mark Coppock
Former Computing Writer
Mark Coppock is a Freelance Writer at Digital Trends covering primarily laptop and other computing technologies. He has…