"Do you want eyes with that?": Why AI assistants should become AI avatars






"Do you want eyes with that?": Why AI assistants should become AI avatars
Dear Readers,
Welcome to Issue ✌️ of Embodied AI, your bi-weekly insights on the latest news, technology, and trends behind AI avatars, virtual beings, and digital humans.
Technology Review shared Bill Gates' top 10 technological breakthroughs of 2019, which include smooth-talking AI assistants. Google Duplex, for instance, can now help you book tables in 43 states.
But have you ever wondered why Alexa, Google Assistant, or Siri don’t come with a pair of eyes? This week we invite you to imagine a world where your virtual assistants are morphed into AI avatars with eyes 👀 and see how they can serve you better.
👉 Sign up here to subscribe to our newsletter!

🎤 “Oh (why) can’t you see?”
One reason AI assistants don't see is that building a voice AI product is itself a daunting and resource-intensive task. Amazon currently has over 10,000 employees working on Alexa and Echo devices, and Facebook has flat-out failed to build one, shipping its Portal with a built-in Alexa. 🤦‍♀️
Another reason has to do with long-standing concerns over data privacy and surveillance. After all, it'd be creepy to have a camera monitoring every breath you take, every move you make 🎶 which isn't as romantic as the song makes it seem.
These two factors contribute to people's low expectations of virtual assistants, which might be why many of us are comfortable with a simple command-query interaction with Alexa. But as technologists and innovators, aren't we supposed to be thinking a little bigger? How about building AI capable of a two-way, humanlike interaction?
Virtual assistants with eyes? (Video credit: Amazon Alexa)
👀 Do you want eyes with that?
The creepiness of a watchful AI assistant is largely a design issue, solvable with measures like comprehensive legal regulation (such as GDPR), more efficient edge computing (what happens at the edge stays at the edge), and anonymized, secure AI training mechanisms (such as OpenMined). The benefits of a seeing AI assistant, on the other hand, are crucial for humanlike interaction and an enriched user experience.
Eyes are the window to a digital being’s soul
Humans have an evolutionary need for eye contact. Christian Jarrett recently wrote about the power of gaze for the BBC. Here are three highlights:
  • Gazing eyes immediately hold our attention on another person and make us more conscious of their mind and perspective.
  • We tend to perceive people who make more eye contact to be more intelligent, conscientious, and sincere, at least in Western cultures.
  • We rate strangers with whom we’ve made eye contact as more similar to us in terms of personality.
Jarrett concludes that eye contact is perhaps the closest we will come to "touching souls". If we strive to build natural, trust-creating interactions between AI and humans, then embodying AI with eyes is essential.
So yes, we want fries *ahem* eyes with that!
TwentyBN's Millie pays attention to users by gazing
Skill discovery is easier when AI assistants can see
Besides soul-touching, AI assistants that can see can help people discover their skills. In a recent blog post on Alexa, a16z's Benedict Evans notes that survey data shows people use virtual assistants mostly for audio activities like music, podcasts, weather forecasts, and kitchen timers, plus trivial questions and smart light control. But shouldn't virtual assistants be able to do more than that?
They should (and could), but Evans also points out a paradox: the seemingly flexible and free-form audio-only interface is highly limited in functionality. Would you listen to Alexa list all of her 70,000 skills so that you can really know how to get the most out of your virtual assistant? 😅
Now, imagine an interface, or even an operating system, that comes with computer vision. Why recite a list of skills via audio when an assistant, paired with a screen, can see and intuitively understand what a person needs and proactively offer its services? AI assistants with advanced action-understanding capabilities can interact with humans proactively while making their lives more seamless and productive.
🤖 AI assistants envisioned as AI avatars
We are not the only ones who are imagining AI assistants embodied in human form. In a Wired article discussing the so-called Mirrorworld, Kevin Kelly visualizes the next big tech platform:
“In the mirrorworld, virtual bots will become embodied. Agents like Siri and Alexa will take on 3D forms that can see and be seen. They will be able not just to hear our voices but also see our gestures and pick up on our microexpressions.”
They will, essentially, become AI avatars. Powered by voice AI and computer vision, avatars will become the primary agents humans engage with across all interfaces. Our devices will turn on with a simple gaze. By understanding our actions, such as repeatedly scratching our skin, they will know to increase the humidity in the room. By reading our mood changes, they will know which of our favorite songs to play on Spotify.
With eyes, our virtual assistants no longer serve as our virtual slaves but instead transform into intelligent avatars that engage with us in a “soulful” manner. They can be emotionally connected to us through gazing and effectively serve us through seeing, while opening a new chapter of human-machine interaction.
🗞️ Latest Avatar News
  • Facebook wants to let people build lifelike virtual avatars of themselves to create a social community that can overcome the challenges of the physical distance between people. They call the project Codec Avatars. (Wired)
  • Despite the popularity of virtual influencers like Lil Miquela, a recent survey raises questions about whether these influencers wield the same influence as their human counterparts. (AdWeek)
  • Instead of using a generic emoji, you can now use DeepMotion’s digital avatars to capture your real-time natural body movement. The technology is available on Samsung Galaxy S10 smartphones. (VentureBeat)
  • Kawaii doesn't just mean cute in Japanese; it's also the name of a new AI-powered companion created by Vinclu Inc., which aims to forge an emotional relationship between humans and digital assistants. (The Japan Times)
🔖 Bookmarked
  • Wired’s Kevin Kelly envisions a near-future where everything on earth will have its own digital twin. He calls it the mirrorworld. (Wired)
  • Talent and publicity are the two levers Demis Hassabis uses to steer DeepMind's AGI agenda and secure its independence from Google. Five years after the acquisition, Hal Hodson asks who is actually in charge of DeepMind. (1843 Magazine)
  • Futurists warn: "The globots are coming! This time is different!" Historian Jill Lepore, in an engaging long-form essay, answers the doomsayers with a more nuanced analysis, interwoven with a tale of two fears: that of artificial intelligence and that of remote intelligence, a.k.a. immigrants. (The New Yorker)
👋 Thank you for reading!
We hope you enjoyed this issue of Embodied AI! Please email us at hello@embodiedai.co for comments and feedback. 
Don’t forget to follow us on 👉 Twitter and LinkedIn!
Embodied AI - The AI Avatar Newsletter

Embodied AI is the definitive virtual beings newsletter. Sign up for the bi-weekly digest of the latest news, technology, and trends behind AI avatars, virtual beings, and digital humans.

Written with love by Twenty Billion Neurons, an AI startup based in Berlin and Toronto.

In order to unsubscribe, click here.
If you were forwarded this newsletter and you like it, you can subscribe here.
Powered by Revue