Amazon’s Alexa is becoming more responsive, knowledgeable, and contextually aware. In a blog post ahead of an invited talk at the NeurIPS 2018 conference in Montreal, Ruhi Sarikaya, director of applied science at Alexa AI, detailed the progress Amazon has made in conversational artificial intelligence (AI) over the course of the year, along with a few of the recent improvements it has rolled out to Alexa-enabled smart speakers, televisions, set-top boxes, and other devices.
“There has been remarkable progress in conversational AI systems this decade, thanks in large part to the power of cloud computing, the abundance of the data required to train AI systems, and improvements in foundational AI algorithms,” Sarikaya wrote. “Substantial advances in machine learning technologies have enabled this, allowing systems like Alexa to act on customer requests by translating speech to text, and then translating that text into actions.”
Currently, Alexa relies on a number of contextual clues to resolve ambiguity, including historical activity, preferences, memory, third-party skill ratings and usage, session context, and physical context (i.e., the Alexa-enabled device’s location). To improve its precision further, Amazon this week launched a self-learning system that “detects the defects in Alexa’s understanding and automatically recovers from these errors” without the need for human intervention by “[taking] advantage of customers’ implicit or explicit contextual signals.”
Sarikaya said that during the beta earlier this year, the self-learning system autonomously learned to associate the command “Play ‘Good for What’” with “Play ‘Nice for What’,” correcting a user’s misspoken request for a Drake song.
“This [AI] is currently applying corrections to a large number of music-related utterances each day, helping decrease customer interaction friction for the most popular use of Alexa-compatible devices,” Sarikaya said. “We’ll be looking to expand the use of this self-learning capability in the months ahead.”
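The blog post does not describe how the self-learning system works internally, so the following is only a toy sketch of the general idea: count how often a failed request is followed by a successful rephrase, and start rewriting the failed form once the evidence is strong enough. The class name `RewriteLearner`, the `min_count` threshold, and the notion of a "failed" versus "successful" utterance are all assumptions made for illustration, not Amazon's implementation.

```python
from collections import Counter, defaultdict


class RewriteLearner:
    """Hypothetical learner that maps misheard or misspoken requests
    to the rephrase that customers most often follow up with."""

    def __init__(self, min_count=3):
        # Assumed threshold: how many times a rephrase must be seen
        # before it is trusted as a correction.
        self.min_count = min_count
        self._pairs = defaultdict(Counter)

    def observe(self, failed_utterance, successful_rephrase):
        """Record an implicit signal: a failed request that was
        followed by a rephrase the customer accepted."""
        self._pairs[failed_utterance][successful_rephrase] += 1

    def rewrite(self, utterance):
        """Return the learned correction, or the input unchanged."""
        candidates = self._pairs.get(utterance)
        if not candidates:
            return utterance
        best, count = candidates.most_common(1)[0]
        return best if count >= self.min_count else utterance


learner = RewriteLearner(min_count=2)
learner.observe("play good for what", "play nice for what")
learner.observe("play good for what", "play nice for what")
print(learner.rewrite("play good for what"))
print(learner.rewrite("play hotline bling"))
```

A production system would presumably weigh far richer signals than raw counts, such as whether playback was interrupted or how long the customer listened, but the sketch shows why no human labeling is needed for this kind of correction.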
Alexa’s advancements aren’t limited to speech comprehension. This fall, Amazon introduced an AI model that performs name-free skill interaction, allowing users to find and launch skills in the Alexa Skills Store without having to remember their exact titles or names. As Sarikaya explained, it enables customers to issue a command like, “Alexa, get me a car,” instead of having to specify a particular ride-sharing service, like Uber or Lyft.
The model made its debut in the U.S. earlier this year, and it recently expanded to the U.K., Canada, Australia, India, Germany, and Japan.
“[When] customers in Germany say ‘Alexa, welche stationen kennst du?’ (‘Alexa, what stations do you know?’) Alexa will reply ‘Der Skill Radio Brocken kann dir dabei helfen. Möchtest du ihn aktivieren?’ (‘The skill Radio Brocken can help. Do you want to enable it?’)” Sarikaya wrote.
On the conversational front, Alexa is now better able to track references through several rounds of conversation, a problem known as slot carryover. And with Follow-Up Mode, which is powered by AI that can distinguish follow-up requests from the noise of background conversations or audio, Alexa can converse more naturally, letting users issue commands without having to repeat the wake word “Alexa.”
“For example, if a customer says ‘What’s the weather in Seattle?’ and after Alexa’s response says ‘How about Boston?’, Alexa infers that the customer is asking about the weather in Boston,” Sarikaya wrote. “If, after Alexa’s response about the weather in Boston, the customer asks, ‘Any good restaurants there?’, Alexa infers that the customer is asking about restaurants in Boston.”
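The weather and restaurant exchange above can be mimicked with a very small rule-based sketch. It is not how Alexa decides what to carry over (the article does not say, and a real system would have to judge which slots are still relevant); it simply keeps the previous intent and slots and lets the current turn override them. The names `DialogueContext`, `resolve_turn`, `GetWeather`, and `FindRestaurants` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class DialogueContext:
    """Hypothetical per-session state: last intent plus slots seen so far."""
    intent: Optional[str] = None
    slots: Dict[str, str] = field(default_factory=dict)


def resolve_turn(
    context: DialogueContext,
    intent: Optional[str],
    slots: Dict[str, str],
) -> Tuple[Optional[str], Dict[str, str]]:
    """Merge the current turn with carried-over context.

    intent: intent parsed from this turn, or None if the turn is
            elliptical (e.g. "How about Boston?").
    slots:  slots explicitly mentioned in this turn.
    """
    resolved_intent = intent or context.intent
    merged = dict(context.slots)  # start from carried-over slots
    merged.update(slots)          # this turn's values take priority
    context.intent = resolved_intent
    context.slots = merged
    return resolved_intent, dict(merged)


ctx = DialogueContext()
# "What's the weather in Seattle?"
print(resolve_turn(ctx, "GetWeather", {"city": "Seattle"}))
# "How about Boston?" -> no intent stated, city replaced
print(resolve_turn(ctx, None, {"city": "Boston"}))
# "Any good restaurants there?" -> new intent, city carried over
print(resolve_turn(ctx, "FindRestaurants", {}))
```

Carrying every slot forward unconditionally is the naive choice here; deciding when a slot should be dropped is what makes the real problem hard.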
Both of those enhancements hit U.S. shores earlier this year, and they’ve since expanded to customers in Canada, the U.K., Australia, New Zealand, India, and Germany.
They follow the rollout of a dialogue-driven music playlist feature that allows users to find new playlists through voice, and a more personalized Amazon Music recommendation system informed by listening habits, followed artists, favorite genres, and other factors. Amazon this week also announced Alexa Answers, a feature that lets customers submit answers to uncommon questions that may then be distributed to millions of Alexa users around the world.
“[We’re] on a multiyear journey to fundamentally change human-computer interaction,” Sarikaya said. “It’s still Day 1, and not unlike the early days of the internet, when some suggested that the metaphor of a market best described the technology’s future. Nearly a quarter-century later, a market segment is forming around Alexa, and it’s clear that for that market segment to thrive, we must expand our use of contextual signals to reduce ambiguity and friction and increase customer satisfaction.”