In the last issue of Embodied AI, we argued in favor of transforming audio-based virtual assistants, such as Alexa, into AI-powered avatars for easier skill discovery and more humanlike interactivity. In short, start by equipping Alexa and Siri with eyes on a screen.
We were therefore delighted to find that both Boris Katz, a principal researcher at MIT who helped invent virtual assistants, and Rohit Prasad, head scientist of Alexa, share similar views on what virtual assistants currently lack, namely common sense and situational awareness, and on the important role eyes will play for them.
“Incredible progress…incredibly stupid”
That is quite harsh, but it is how Katz described Alexa, Siri, and other virtual assistants in his interview with Technology Review’s Will Knight: a conflicted feeling of pride and embarrassment. On the one hand, Katz is proud of the progress on, and adoption of, virtual assistants. On the other hand, he thinks these programs are “incredibly stupid”.
To be fair, Alexa and its peers are not stupid: they are a feat of software engineering with tremendous room for improvement. But Katz’s candid remarks yield three important takeaways. First, Katz is dubious that training models on huge amounts of data will solve language understanding. Second, language understanding should not be isolated from other modalities, such as visual, tactile, and other sensory inputs. Third, common sense and intuitive physics are essential for virtual assistants.
Alexa Needs Eyes
Alexa can quickly access an encyclopedia-like knowledge base to respond to simple commands, but that hack can only go so far. In Prasad’s opinion, “[the] only way to make smart assistants really smart is to give it eyes and let it explore the world.”
Recent news suggests that Amazon has already created versions of Alexa with a camera and is betting on home robotics for a “mobile Alexa”. This is exciting news. However, the adjacent possible, our favorite framework, suggests that home robotics may take many more years to deliver concrete value to users.