
Alexa isn't "incredibly stupid". She just lacks situational awareness.

Dear Readers,
Welcome to issue #3 of Embodied AI, your bi-weekly synthesis of the latest news, technology, and trends behind AI avatars, virtual beings, and digital humans.
Subscribe here and make sure to forward Embodied AI to your friends and colleagues. Cheers!

Alexa: An amazing virtual assistant that lacks common sense and vision
(Credit: Technology Review/Amazon)
In the last issue of Embodied AI, we argued in favor of transforming audio-based virtual assistants, such as Alexa, into AI-powered avatars for ease of skill discovery and more humanlike interactivity. In short, start by equipping Alexa and Siri with eyes on a screen.
We were therefore delighted to find that both Boris Katz, a principal researcher at MIT who helped invent virtual assistants, and Rohit Prasad, head scientist of Alexa, share similar views on the current limitations of virtual assistants: the lack of common sense and situational awareness, and the important role vision could play.
“Incredible progress…incredibly stupid”
That is quite harsh, but it is how Katz describes Alexa, Siri, and other virtual assistants in his interview with Technology Review’s Will Knight: a conflicted feeling of pride and embarrassment. On the one hand, Katz is proud of the progress and adoption of virtual assistants. On the other, he thinks these programs are “incredibly stupid”.
To be fair, Alexa and its peers are not stupid: they are a feat of software engineering with tremendous potential for improvement. But Katz’s candid remarks yield three important takeaways. First, he is dubious that training models on huge amounts of data alone will solve language understanding. Second, language understanding should not be isolated from other modalities, such as visual, tactile, and other sensory inputs. Third, common sense and intuitive physics are essential for virtual assistants.
Alexa Needs Eyes
Prasad tackled a pointed question at EmTech Digital: “Alexa, why aren’t you smarter?” Given that users have little patience for dumb virtual assistants, Alexa’s popularity demonstrates how good software hacks have become in the absence of true machine intelligence.
But while Alexa can quickly access an encyclopedia-like knowledge base to respond to simple commands, such hacks can only go so far. In Prasad’s words, “[the] only way to make smart assistants really smart is to give it eyes and let it explore the world.”
Recent news suggests that Amazon has already created versions of Alexa with a camera and is betting on home robotics for a “mobile Alexa”. This is really exciting news. However, the adjacent possible, our favorite framework, suggests that robotics may take many more years before adding concrete value to users.
How to Smarten Up AI Assistants
So how do we make AI assistants smarter? Here are our suggestions at TwentyBN: deep learning and common-sense AI.
It’s all about computation, baby
In a recent blog post titled “The Bitter Lesson”, renowned AI scientist Rich Sutton reflects on advances in speech recognition, computer vision, chess, and Go, observing the same pattern again and again: AI researchers tended to start off with methods that leveraged human knowledge, but what triumphed in the end were “brute force” methods that leverage computation.
Sutton offers two takeaways from the bitter lesson. First, general-purpose methods that continue to scale with increased computation, such as search and learning, are the most powerful and effective approach in AI. Second, we should stop trying to find simple ways to model the contents of minds, as their complexity is endless. After all, “we want AI agents that can discover like we can, not which contain what we have discovered”.
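Sutton’s point can be made concrete with a toy illustration (ours, not his): below, a brute-force game-tree search over a simple take-away game rediscovers the optimal strategy with zero domain knowledge baked in, purely by spending computation.

```python
from functools import lru_cache

# Nim variant: players alternate taking 1-3 stones; whoever takes the
# last stone wins. No game theory is encoded here -- optimal play
# emerges from exhaustive search alone.
@lru_cache(maxsize=None)
def wins(stones: int) -> bool:
    """True if the player to move can force a win from this position."""
    return any(not wins(stones - take)
               for take in (1, 2, 3) if take <= stones)

# The search rediscovers the classic result: multiples of 4 are
# losing positions for the player to move.
losing = [n for n in range(1, 13) if not wins(n)]
print(losing)  # → [4, 8, 12]
```

Handcrafted heuristics for this game exist, but the search needs none of them; given more computation, the same code handles larger positions unchanged, which is the bitter lesson in miniature.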
We agree with Katz that virtual assistants must become smarter. But instead of modeling artificial intelligence on human intelligence, we share Sutton’s view that deep learning, leveraging the massive computational power now easily available, is the right way to make AI assistants smarter and product-ready.
Common Sense for AI
Illustrating the difficulty of true language understanding for virtual assistants, Katz cites a Winograd schema: “This book would not fit in the red box because it is too small.” Humans have no trouble understanding that “it” refers to the box. But what is intuitive to us can often elude even the “smartest” AI.
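To see why such sentences are hard, consider a naive, purely surface-level resolver (a sketch of ours, not Katz’s). The two variants of a Winograd schema differ in only one word, so any heuristic that ignores meaning must give the same answer to both, and therefore gets at least one wrong:

```python
# A Winograd schema pair: flipping "small" to "big" flips the referent
# of "it", even though the surface form is otherwise identical.
SCHEMA = [
    ("The book would not fit in the box because it is too small.", "box"),
    ("The book would not fit in the box because it is too big.", "book"),
]

def nearest_noun(sentence, candidates=("book", "box")):
    """Naive coreference: pick the candidate noun nearest before 'it'."""
    pronoun_pos = sentence.index(" it ")
    return max(candidates, key=lambda n: sentence.rfind(n, 0, pronoun_pos))

guesses = [nearest_noun(s) for s, _ in SCHEMA]
print(guesses)  # → ['box', 'box'] -- same answer for both, so one is wrong
```

Resolving the pronoun correctly requires knowing what “fit”, “small”, and “big” mean physically, which is exactly the kind of grounded common sense discussed below.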
Roland Memisevic, CEO at TwentyBN, has long argued that true language understanding must be grounded in vision. This is the reason why TwentyBN continues to collect millions of videos for our datasets, such as Something-Something, to teach AI this physical common sense.
Data sample from TwentyBN's Something-Something: putting [a book] into [a box]
As it turns out, AI systems trained on TwentyBN’s video datasets have learned a lot. MIT’s CSAIL, leveraging our Something-Something and Jester data, has trained AI that can track how objects change over time. Take a look at the visual explanations for action recognition illustrated by our AI researcher, Raghav Goyal:
Uncovering [Something]
Closing [Something]
Pushing [Something] so that it slightly moves
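Labels like “Pushing [Something] so that it slightly moves” cannot be predicted from any single frame: the same frames in reverse order describe a different action. A minimal sketch (our toy illustration, not TwentyBN’s actual model) shows how the temporal order carries the signal:

```python
import numpy as np

def motion_direction(clip):
    """Classify horizontal motion from a (T, H, W) clip by comparing the
    centroid of bright pixels in the first and last frames. Any single
    frame is ambiguous; only the temporal order reveals the direction."""
    first = np.nonzero(clip[0] > 0.5)[1].mean()   # column centroid, frame 0
    last = np.nonzero(clip[-1] > 0.5)[1].mean()   # column centroid, frame T-1
    return "right" if last > first else "left"

# Synthetic clip: a bright square sliding right across the frame.
T, H, W = 8, 16, 16
clip = np.zeros((T, H, W))
for t in range(T):
    clip[t, 6:10, t:t + 4] = 1.0

print(motion_direction(clip))        # → right
print(motion_direction(clip[::-1]))  # → left: same frames, reversed order
```

Real action-recognition models learn far richer temporal features than this centroid trick, but the asymmetry under frame reversal is the core reason video, not still images, is needed to teach physical common sense.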
  • Fortnite has reached 250 million users. But Epic Games, the maker of Fortnite, is also a juggernaut in the virtual human space. (Wired)
  • AI avatars are entering healthcare. A recent experiment shows that people “far more readily would tell an avatar their deepest secret.” (Vox)
  • A group of linguists, technologists, and sound designers created Q, a genderless voice, to end gender bias. Some think Q takes on societal responsibility for diversity and inclusivity. (TNW)
  • Clippy, the renowned Microsoft avatar, was briefly resurrected but then brutally killed again by the corporation’s brand police. (The Verge)
  • Three pioneers of deep learning, Geoffrey Hinton, Yoshua Bengio, and Yann LeCun, have been awarded the Turing Award. (The New York Times)
  • Will AI destroy more jobs than it creates in the coming decade? Two experts, Carl Benedikt Frey and Robert D. Atkinson, are split on this question. (The Wall Street Journal)
  • A group of AI researchers, including Yoshua Bengio, signed a letter calling on Amazon to stop selling its biased facial-recognition technology to law enforcement agencies. (The New York Times)
Adjacent Tech
  • Boston Dynamics is acquiring Kinema Systems, a California-based startup, to equip its robots with a better brain. Kinema Systems develops computer vision and ML systems for warehousing robots. (Technology Review)
  • Scaling back from an ambitious vision of humanoid robots to a more modest focus on simpler machines, Google reboots its robotics program. (The New York Times)
  • Unlike the good ol’ days when developers had less trouble creating hits like Angry Birds or Pokemon Go for iPhones and Android phones, they are having a harder time creating killer apps for Alexa, a four-year-old platform that already has 80,000 apps. (Bloomberg)
  • Intel invests $13 million in Untether AI, an inference chip startup that promises to transfer data 1,000 times faster. (Technology Review)
Thank you for reading!
If you enjoyed this issue of Embodied AI, please forward it to your friends and colleagues! You can reach out to us at and follow us on Twitter.
Written by Nahua, edited by Roland, David, and Moritz
Embodied AI - The AI Avatar Newsletter

Embodied AI is the definitive virtual beings newsletter. Sign up for the monthly digest of the latest news, technology, and trends behind AI avatars, virtual beings, and digital humans.

Written with love by Twenty Billion Neurons, an AI startup based in Berlin and Toronto.

In order to unsubscribe, click here.
If you were forwarded this newsletter and you like it, you can subscribe here.
Powered by Revue