Building the Future with Michael Abrash and Andrew “Boz” Bosworth
Following the Meta Connect 2023 keynote, Reality Labs Research Chief Scientist Michael Abrash joined Meta CTO and Head of Reality Labs Andrew “Boz” Bosworth for a virtual fireside chat. Missed the livestream? Watch the full video on demand and check out our recap below.
People Driving Progress
The two discussed the myth of technological inevitability: the mistaken belief that technological progress is bound to happen on its own, rather than recognizing that it’s always situated within a specific historical and social context. But as Abrash rightly pointed out, “People make it happen—specific people, specific choices get made in specific paths, and we are making that happen. And the people who are coming to Connect, that’s the community that’s making it real.”
From Xerox PARC...
For the last half century or so, as Abrash explained, we’ve been living in the world that Xerox PARC created: 2D surfaces with bitmapped graphics, a keyboard, a pointing device, WYSIWYG word processing, Ethernet, object-oriented programming, and more.
“The real vision,” said Abrash, “is a world in which we can mix real and virtual in any way we want to serve our needs and to meet our goals. And that is the thing that we’re creating—where we can drive our perceptions and allow us to act in the way we do in the real world.”
...to the Next Paradigm Shift
Just as the work done at Xerox PARC changed the way the world interacts with computers, the work the AR/VR community is doing now could well usher in another paradigm shift. But breakthrough technologies are still necessary to bring about true step changes in human-computer interaction.
“We all grew up in a world of Moore’s Law,” said Abrash. “We always knew there was gonna be more compute next year. And the platforms have been consistent since Xerox PARC in the sense of, again, 2D surface, pointing device, keyboard. So that has kind of gotten us used to a world in which, really, everything’s a software problem and the platform underneath it only changes in incremental ways. But what we’re talking about here is a change all the way from the bottom, where the hardware, the software, the applications—all will evolve over time into much more powerful forms.”
While the input modalities of modern computing have largely remained the same since the Alto, Bosworth pointed out that “these were things that were highly contentious in the 1950s and 1960s: What were going to be the methodologies that people were going to use to get information to the computer? We’ve since kind of zoomed in on one. Sure, we’ve replaced the mouse with direct touch in the case of touchscreens, but otherwise we’ve been incredibly consistent for a long time. And those modalities just don’t work in augmented and virtual reality. So that’s another area that’s such a key piece that’s so different from the previous generation of technology.”
In fact, Abrash pointed out that the question of input—and the interface of the future—may be the key factor distinguishing our current computing paradigm from the next computing platform.
“You want the right thing to happen when you want it to happen,” Abrash explained. “And that really comes down to a combination of being able to sense the world around you, of understanding your context, of having AI that can make sense of that to help you, and then of having this ultra low-friction input that lets you act easily, intuitively, any time, any place. All those pieces need to come together.”
On Input & AI Interfaces
We’ve spoken publicly about electromyography (EMG) several times now, and we think it will be critical to how we interact with the virtual world in the future. But the question of input is also, as Bosworth noted, an AI question: the neuromuscular signals detected at the wrist via EMG need to be decoded, and that decoding requires artificial intelligence.
“You need to have enough of a general model that the entire population can get started, and then be able to personalize that model,” Bosworth stressed. “We’re talking about something called coevolution, an idea that was foundational to the very earliest user interfaces designed at Stanford Research Institute. But it was so hard to do back then that basically almost all of the evolution had to happen on the side of the consumer—the machine couldn’t really help. And with today’s AI, we can actually adapt these models to each individual in the same way that we adapt people’s News Feeds to their personal preferences.”
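To make “general model plus personalization” concrete, here’s a minimal sketch in Python with PyTorch: a population-level EMG gesture decoder whose backbone stays frozen while a small output head is fine-tuned on one user’s short calibration session. Everything here is an illustrative assumption (the architecture, 16 channels, 64-sample windows, eight gestures), not Meta’s actual pipeline.

```python
import torch
import torch.nn as nn

class EMGDecoder(nn.Module):
    """Toy decoder: maps a window of multi-channel wrist EMG to gesture logits."""
    def __init__(self, channels=16, gestures=8):
        super().__init__()
        self.backbone = nn.Sequential(       # population-level feature extractor
            nn.Conv1d(channels, 32, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
            nn.Flatten(),
        )
        self.head = nn.Linear(32, gestures)  # small per-user classification head

    def forward(self, x):                    # x: (batch, channels, window)
        return self.head(self.backbone(x))

def personalize(model, windows, labels, epochs=10, lr=1e-3):
    """Fine-tune only the head on a user's calibration data,
    keeping the population-level backbone frozen."""
    for p in model.backbone.parameters():
        p.requires_grad = False
    opt = torch.optim.Adam(model.head.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss_fn(model(windows), labels).backward()
        opt.step()
    return model

# Start from a (hypothetically pretrained) general model, then adapt it
# with a few seconds of one user's labeled EMG windows.
general = EMGDecoder()
calib_x = torch.randn(32, 16, 64)        # stand-in calibration windows
calib_y = torch.randint(0, 8, (32,))     # stand-in gesture labels
personal = personalize(general, calib_x, calib_y)
```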
Think about when you click on a particular icon on a computer screen. As Abrash was quick to point out, there’s a huge amount of context that goes into that seemingly simple operation. “What application are you running? What icon is it? The system funnels down your awareness into the one place where you want to make the choice, so you can do it with one single one-bit action,” he explained. “In the physical world, that’s much harder. The physical world is much more complex. But what you can imagine is that the contextual AI that we’re talking about actually does that scoping for you. You can use EMG to simply pick the thing that you want as opposed to having to sort through all the possibilities, all the way from the bottom up: recognizing the nerve signals, customizing for each person, and then putting it all together to help you meet your goals.”
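One way to picture that scoping is as a two-stage pipeline: contextual AI collapses the space of possible actions down to a few candidates, and a single one-bit EMG “click” confirms the top-ranked one. The toy Python sketch below uses invented contexts and actions purely to show the shape of that flow; it is not a real system.

```python
from dataclasses import dataclass

@dataclass
class Context:
    app: str           # what you're doing
    gaze_target: str   # what you're looking at
    location: str      # where you are

def scope_actions(ctx: Context) -> list[str]:
    """Hypothetical contextual AI: narrow all possible actions
    down to the few that make sense right now, best first."""
    if ctx.app == "music" and ctx.location == "gym":
        return ["skip_track", "pause"]
    if ctx.gaze_target == "incoming_call":
        return ["answer", "decline"]
    return ["open_launcher"]

def act(ctx: Context, emg_click: bool) -> str | None:
    """A single one-bit EMG signal confirms the top-ranked action;
    no click means no action is taken."""
    candidates = scope_actions(ctx)
    return candidates[0] if emg_click else None

print(act(Context("music", "watch_face", "gym"), emg_click=True))   # skip_track
print(act(Context("music", "watch_face", "gym"), emg_click=False))  # None
```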
That customization is where things get really interesting. Imagine a keyboard that effectively moves around under your fingertips, zeroing in on what you meant to type rather than forcing you to hit physical keys exactly where they sit. That would be a significant breakthrough: Instead of conforming to a physical keyboard designed to accommodate a broad range of people, we can leverage computer vision and machine learning to infer the intent behind your individual finger motions. The result is a truly individualized keyboard, one that eliminates the intermediate step of fitting the mold of someone else’s design and instead asks simply: What is your intent?
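As a toy illustration of that kind of adaptation, the Python/NumPy sketch below decodes each touch to the nearest key, then nudges the intended key’s center toward where this particular user actually lands. The three-key layout, learning rate, and coordinates are all made up for the example.

```python
import numpy as np

# Toy sketch: rather than fixed key positions, learn a per-user offset so
# each "key" drifts toward where this user's finger actually lands.
KEY_CENTERS = {
    "a": np.array([0.0, 0.0]),
    "s": np.array([1.0, 0.0]),
    "d": np.array([2.0, 0.0]),
}

class AdaptiveKeyboard:
    def __init__(self, lr: float = 0.5):
        self.offsets = {k: np.zeros(2) for k in KEY_CENTERS}
        self.lr = lr  # how aggressively keys chase the user's touches

    def decode(self, touch: np.ndarray) -> str:
        """Return the key whose adapted center is nearest the touch point."""
        return min(
            KEY_CENTERS,
            key=lambda k: np.linalg.norm(touch - (KEY_CENTERS[k] + self.offsets[k])),
        )

    def learn(self, touch: np.ndarray, intended: str) -> None:
        """After the user confirms or corrects, pull that key toward the touch."""
        err = touch - (KEY_CENTERS[intended] + self.offsets[intended])
        self.offsets[intended] += self.lr * err

kb = AdaptiveKeyboard()
print(kb.decode(np.array([0.55, 0.0])))  # 's': the touch is nearer the stock 's'
kb.learn(np.array([0.5, 0.0]), "a")      # but this user meant 'a', landing far right
print(kb.decode(np.array([0.55, 0.0])))  # 'a': the keyboard has adapted
```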
We’re working on sensors that perform well in a variety of conditions with low power consumption, and EMG may solve the problem of input. But even a year ago, the question of how the system would understand your context was very much an open research problem. Now, the advent of large language models (LLMs) and the possibility of making them multimodal open the door to a future interface that can act proactively on your behalf, anticipating your needs and scoping down your choices to make your life easier.
The Future of Social Presence
Another key research area that Abrash and Bosworth touched on was Codec Avatars—our hyper-realistic real-time digital representations of people that we believe will play an important role in the metaverse.
“Codec Avatars consist of two parts: the encoder, which takes the data from the sensors and encodes your current state, and the decoder, which is on the receiving end, re-expanding that data into your avatar,” Abrash explained. “Codec Avatars are remarkably true-to-life. I will say that I was shocked the first time I saw a really fully functional Codec Avatar. It’s not just like a better avatar—it’s leaped ahead to the point where you feel like you are legitimately with that person. And when I think about what is most key about the metaverse, the most interesting thing in the world is other people.”
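The encoder/decoder split Abrash describes can be pictured as a small pipeline: compress per-frame sensor data into a compact state code, send that lightweight payload over the network, and re-expand it on the receiving end. The PyTorch sketch below only shows the shape of that flow; the dimensions are made up, and the real decoder drives a photorealistic renderer rather than producing a stand-in vector.

```python
import torch
import torch.nn as nn

class AvatarEncoder(nn.Module):
    """Compresses headset sensor features into a small latent 'state' code."""
    def __init__(self, sensor_dim=512, code_dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(sensor_dim, 128), nn.ReLU(),
                                 nn.Linear(128, code_dim))

    def forward(self, sensors):
        return self.net(sensors)

class AvatarDecoder(nn.Module):
    """On the receiving end, expands the code back into avatar parameters."""
    def __init__(self, code_dim=64, avatar_dim=2048):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(code_dim, 256), nn.ReLU(),
                                 nn.Linear(256, avatar_dim))

    def forward(self, code):
        return self.net(code)

encoder, decoder = AvatarEncoder(), AvatarDecoder()
sensors = torch.randn(1, 512)    # stand-in per-frame sensor features
code = encoder(sensors)          # compact payload sent over the network
avatar = decoder(code)           # re-expanded on the far side
print(code.shape, avatar.shape)  # torch.Size([1, 64]) torch.Size([1, 2048])
```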
For years, we’ve known that the real magic of VR is presence—the sense that you’re actually there in a virtual environment, sharing the same physical space as digital content so that it all feels real. Codec Avatars offer a tantalizing glimpse at the future of social presence—the sense that you’re physically in the same space as another person (or people), no matter where in the world they happen to be.
“I think this may be one of the most important aspects of the metaverse really blossoming into its full potential,” Abrash noted, “which is simply the ability to put people in the same space with other people in a way that feels fully real, fully meaningful.”
The Good Old Days
“I’ve always been excited about all the things that we’re doing because they all have to happen, right?” Abrash said in response to the question of what inspires him to continue in his work day to day. “I’m excited about the fact that the whole platform is emerging as the next generation that carries us for the next 50 years. And we’re putting all those pieces in place—not just in research, but also in product.”
Though, if pressed to choose, Abrash acknowledged it’s the personalized, contextualized, ultra low-friction AI interface that he finds most exciting.
“The way that humans interact with the digital world has only changed once ever, and that really was Doug Engelbart, Xerox PARC, and the Mac,” he explained. “Since then, we’ve been living in that world. And as we move into this world of mixing the physical and virtual freely, we need a new way of interacting—and I feel that that has to be this contextualized AI approach. Getting that to happen is the thing that I find most exciting. It’s a once-in-a-lifetime opportunity to really change the way that everybody lives.”
To close, Bosworth took an oft-repeated rallying cry from Abrash’s own playbook: These are the good old days.
“When you’re in the middle of these struggles to invent new technology, they feel impossibly challenging. You suffer your defeats, visibly sometimes and painfully every time,” Bosworth paraphrased. “And at some point, we will look back on this time—not just our company, but as an industry—as the good old days: the days in which the system was forged that became a platform for the future. So I share your enthusiasm about all this work, and I’m so glad to continue to be doing it with you all these years later. Thank you, Michael.”