Microsoft recently reached a new milestone in recognising conversational speech, achieving a 5.1% word error rate (WER) on the Switchboard benchmark and beating its previous record of 5.9%.
According to a Microsoft blog post: “Switchboard is a corpus of recorded telephone conversations that the speech research community has used for more than 20 years to benchmark speech recognition systems.”
The speech recognition systems are tasked with transcribing conversations about topics such as sport or politics, and are built on neural networks and other artificial intelligence technologies.
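WER is conventionally defined as the number of word substitutions (S), deletions (D) and insertions (I) needed to turn a system's transcript into the reference transcript, divided by the number of words in the reference (N): WER = (S + D + I) / N. The snippet below is a minimal sketch of that standard edit-distance definition. It is not Microsoft's scoring tool, and official benchmark scoring also applies text normalisation that this sketch omits. The function names and the example sentences are illustrative only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length.

    Computed as a word-level Levenshtein distance (illustrative sketch only).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the minimum edits turning ref[:i-1] into hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of an extra word
                prev[j - 1] + cost,  # substitution (or exact match)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


def relative_reduction(old_wer: float, new_wer: float) -> float:
    """Fractional improvement from old_wer to new_wer."""
    return (old_wer - new_wer) / old_wer


if __name__ == "__main__":
    # One substitution ("won" -> "one") and one deletion ("last"): 2 errors / 7 words
    print(word_error_rate("the team won the match last night",
                          "the team one the match night"))
    # Relative improvement from 5.9% to 5.1% WER
    print(relative_reduction(5.9, 5.1))
```

On this definition, the drop from 5.9% to 5.1% WER works out to a relative reduction of roughly 13.6%, or about one fewer error for every seven the previous system made.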
To improve acoustic modelling, the research team added a convolutional neural network combined with a bidirectional long short-term memory (LSTM) network.
The post added: “Moreover, we strengthened the recogniser’s language model by using the entire history of a dialog session to predict what is likely to come next, effectively allowing the model to adapt to the topic and local context of a conversation.”
According to TechRepublic, additional tools such as the Microsoft Cognitive Toolkit 2.1 and Azure GPUs helped the team work faster and explore different model architectures.
Despite the lower WER, Microsoft noted in the post that speech recognition still faces many challenges.
Written by Leah Alger