Speaking Our Languages: A Behind-the-Scenes Look at Live Translation on AI Glasses

Posted by Lisa Brown Jaloza

From the Babel fish of The Hitchhiker’s Guide to the Galaxy to Star Trek’s universal translator, the ability to break down language barriers is a staple of science fiction. And it reflects a basic human need: to be understood by others.

So it was no surprise when the tech world was wowed by Mark Zuckerberg’s successful onstage demo of live translation at Connect 2024. Since then, we’ve expanded live translation availability across all our AI glasses, with support for English, French, German, Italian, Portuguese, and Spanish on Ray-Ban Meta, Oakley Meta Vanguard, and Oakley Meta HSTN, and support for English, French, Italian, and Spanish on Meta Ray-Ban Display. And today, we’re sharing the story behind this transformative technology, and the people who made it possible.

From Prototype to Product

In true Silicon Valley fashion, the path from prototype to product was a winding one. In fact, live translation was originally conceived as a demo feature for our then-unannounced Meta Ray-Ban Display glasses. But the teams working on the project quickly realized that Ray-Ban Meta, which was already on the market, was the perfect testbed.

“Thanks to Ray-Ban Meta’s five-microphone array, beamforming could be used to distinguish between the person wearing the glasses and their conversation partner, which in turn helps ensure the accuracy of the translations,” explains Product Manager Nish Gupta. “And rather than relying on a display to show the translated text, we could leverage the glasses’ speakers to play the translation back in near-real time.”
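
For the curious, here’s a rough sense of what beamforming involves. The delay-and-sum sketch below is purely illustrative and isn’t Meta’s actual audio pipeline; the linear array geometry, mic spacing, and sample rate are all assumed values chosen for the example.

```python
# Illustrative delay-and-sum beamformer. NOT Meta's DSP pipeline:
# the linear array layout, mic spacing, and sample rate are assumptions.
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s
SAMPLE_RATE = 16_000    # Hz (assumed)
MIC_SPACING = 0.02      # meters between adjacent mics (assumed)
NUM_MICS = 5            # matches the five-microphone array in the post

def delay_and_sum(signals: np.ndarray, steer_angle_deg: float) -> np.ndarray:
    """Steer the array toward a direction by time-aligning each channel
    and averaging. signals has shape (NUM_MICS, n_samples)."""
    angle = np.deg2rad(steer_angle_deg)
    out = np.zeros(signals.shape[1])
    for m in range(NUM_MICS):
        # Arrival-time offset of mic m relative to the array center for a
        # sound source in the steered direction.
        offset = (m - (NUM_MICS - 1) / 2) * MIC_SPACING
        delay = int(round(offset * np.sin(angle) / SPEED_OF_SOUND * SAMPLE_RATE))
        out += np.roll(signals[m], -delay)  # advance the channel to align it
    return out / NUM_MICS

# Steering one beam at the wearer's mouth and another at the conversation
# partner yields two enhanced signals the speech models can keep separate.
```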

Of course, that resulted in a fairly involved process. Say you have two people: a Spanish speaker wearing Ray-Ban Meta glasses and a French speaker without glasses. When the French speaker talks, that audio is first transcribed into text. The text is then translated from French into Spanish. Then a text-to-speech model converts the Spanish text into audio, which plays through the speakers of the Spanish speaker’s glasses. All of this happens in near-real time, and every step is processed entirely on the glasses.
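
To make that flow concrete, here’s a minimal sketch of the cascade in Python. The three stages are stubbed out with hypothetical functions, since Meta hasn’t published its on-device model APIs, but the data flow follows the description above.

```python
# Minimal sketch of the cascaded pipeline described above. The three model
# calls are hypothetical stubs, not Meta's on-device APIs; only the data
# flow (speech -> text -> translated text -> speech) follows the post.

def speech_to_text(audio: bytes, lang: str) -> str:
    return "bonjour"  # stub standing in for the on-device ASR model

def translate_text(text: str, src: str, dst: str) -> str:
    return {"bonjour": "hola"}.get(text, text)  # stub translation model

def text_to_speech(text: str, lang: str) -> bytes:
    return text.encode()  # stub standing in for the on-device TTS model

def live_translate(partner_audio: bytes, src: str = "fr", dst: str = "es") -> bytes:
    """French partner speaks; the Spanish-speaking wearer hears a translation."""
    text = speech_to_text(partner_audio, lang=src)       # 1. transcribe
    translated = translate_text(text, src=src, dst=dst)  # 2. translate
    return text_to_speech(translated, lang=dst)          # 3. synthesize

# The returned audio plays through the wearer's open-ear speakers; every
# stage runs locally, so the loop works with no network connection.
```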

Human-Centric Design

“We put people at the center of our design process,” explains Director of Product Management Ashish Garg. “The team thought about a lot of edge-case scenarios, including travel. And when people travel, what do they lack? Good internet connectivity. So we thought, ‘What if they can download it ahead of time? Can they use it in airplane mode?’ We really thought about the end-to-end user journey.”

“This is a very complicated feature by design,” adds Product Manager Emerson Qin. “It’s not processed on a server, so fitting a very powerful, useful model onto glasses that can run without internet access is already a difficult exercise in itself, and it creates many other difficulties downstream. Because everything happens on device, we don’t have as much information or logging as we would like in order to improve the feature. During development, that made it much harder to know where we were and whether the quality was really meeting the bar. And there’s no better way to address that than testing it non-stop.”

Overcoming Obstacles + Limiting Latency

As the project grew, so did the challenges. The team had to rethink everything — from how the glasses would interact with users to how to make the experience seamless for both the wearer and their conversation partner. The models had to be optimized to fit within the glasses’ memory and avoid overheating. And the team had to drive down latency from 5+ seconds to just 2.7 seconds — a roughly 46% improvement — to help conversations feel more fluid and natural.

“This was only made feasible by the team constantly pushing the technical boundaries,” notes Qin of these latency improvements. “The most notable innovation here is enabling the model to understand, translate, and generate speech audio, all in a streaming fashion — all done within the interval of a few words without having to wait for a complete phrase or sentence.”
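
Here’s a toy illustration of that streaming idea in Python. The real system streams at the audio level inside the models themselves; the word-level chunking and the chunk size below are simplifying assumptions made for the example.

```python
# Toy illustration of streaming translation: emit partial translations
# every few words instead of waiting for a complete sentence. Word-level
# chunks and CHUNK_WORDS are assumptions; the real models stream audio.
from typing import Callable, Iterable, Iterator

CHUNK_WORDS = 3  # assumed trade-off between latency and context

def streaming_translate(
    words: Iterable[str], translate: Callable[[str], str]
) -> Iterator[str]:
    buffer: list[str] = []
    for word in words:
        buffer.append(word)
        if len(buffer) >= CHUNK_WORDS:
            yield translate(" ".join(buffer))  # partial result, low latency
            buffer.clear()
    if buffer:
        yield translate(" ".join(buffer))  # flush the tail of the utterance

# Demo with str.upper standing in for a real translation model:
# list(streaming_translate("je vais au marché ce matin".split(), str.upper))
# -> ['JE VAIS AU', 'MARCHÉ CE MATIN']
```

Each yielded chunk can be handed to text-to-speech while the speaker is still talking, which is how the end-to-end latency drops below a full-sentence wait.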

And still, we’re just getting started.

“Remember that the feature is still in development,” notes Software Engineering Manager Fei Wang. “There’s still noticeable latency, and the accuracy isn’t perfect. We launched now so we can improve the product over time. It’ll get faster and more accurate, and we’ll be able to add additional languages.”

That said, it’s important to keep in mind that each new language requires bespoke model training and evaluation — and that’s specific to the device’s form factor. As Qin explains, “In order for us to ship a new language, everything has to be redone, per device, so it’s difficult for us to scale. We still have a long way to go to add a lot more languages. Everything’s bespoke, so please bear with us.”

Early Impact

Even during early testing and its initial rollout through our Early Access program, live translation has made a difference in people’s lives. People are using the feature to connect with family, navigate new places, and break down barriers at work and in their communities. And live translation is already seeing strong engagement on par with other popular AI use cases.

Visiting an art museum in another country? Use live translation so you can better understand commentary from your local docent. Getting to know your future in-laws? Have deep conversations without having to rely on a third-party translator. Attending an international conference? Now you can meet and converse with new colleagues from across the globe.

While it’s still early days, we’re thrilled by the response we’ve seen thus far. Unlike earbuds, which can make you feel a bit cut off from your environment (or even signal to other people that you’re not available for conversation), the open-ear speakers on our AI glasses feel more natural and keep you fully present in the moment and with the people around you. As an added bonus, our live translation feature includes a near-real-time transcription of the conversation in both languages right in the Meta AI app, so you can show your phone to your conversation partner and help them follow along, too. Whichever form factor you choose, as more languages are added across devices, we’re helping the world feel a little more connected, one conversation at a time.

A Team With a Shared Dream

It’s gratifying for the team to see that tangible impact. But perhaps the greatest accomplishment lies in the dynamics of the team itself.

“I’ve been working at this company for over 10 years, and this is the best team I’ve ever worked on,” says Research Scientist Baiyang Liu. “Because it’s not just about the tech — it’s about the people who believe in it and can make it happen. People worked day and night to get there. Dedicated people solved these problems together because they want to make it work.”

“The way the team coalesced was miraculous,” agrees Product Designer Amy Pu. “Everyone truly believed in this feature and had the same vision. A lot of people on the team have first languages other than English, so we could immediately understand the benefit. Travel is ranked highly as a use case, but nobody’s traveling all the time, so I think multilingual families are where this technology really shines. Think of how meaningful it would be to look at your grandma’s face and understand what she’s saying — even when you don’t speak the same language. Our goal is to create a world where people can understand any language anytime, anywhere. That would be truly empowering.”

Whether you’re traveling abroad, collaborating with international colleagues, or seeking stronger multigenerational connections within your family, the benefits of near-instant translation are clear. An IRL universal translator would help us better navigate the world, communicate with our loved ones, and broaden our horizons. And live translation on AI glasses is an important step in that direction.

“We’ve loved seeing such strong reception and hearing how this technology is making a difference in people’s lives,” says Garg. “From nuanced social settings with friends and family to high-stakes meetings with colleagues, we’re seeing people use this feature in a wide variety of ways and places. And we’re still hard at work to expand the list of available languages as quickly as possible to better scale and meet the needs of more people across the globe.”