Celebrating 10 Years of AI Innovation + the Future of AR/VR

This year marks the 10th anniversary of Meta’s Fundamental AI Research (FAIR) team. It’s an exciting milestone—and one that FAIR itself celebrated by introducing new models, datasets, and updates spanning audio generation, translation, and multimodal perception. And it’s a good reminder that, while AI may be the hot topic du jour, it’s also been part of the company’s DNA for years.

“It was clear from Facebook’s very early days that AI was going to be one of the most important technologies for us as a company—perhaps even the most important technology,” reflects Meta CTO & Head of Reality Labs Andrew “Boz” Bosworth, who also happens to have been the company’s very first AI hire.

“I was able to design and build our first heuristic-based systems for News Feed and then our machine learning systems with Coefficient,” Bosworth recalls. “But of course, my AI knowledge was quickly out of date. You have to remember that when I was teaching Mark, neural networks were thought to be a dead end. We taught it kind of as a once-great technology whose limitations had been exposed. Of course, a few years later, by the time I started to work on ads, the revolution of neural networks was already mature. I had the great pleasure there to oversee teams working on our first implementations of sparse neural networks and PyTorch.”

There was tremendous excitement across the tech industry in the early days of AI, as shown by a veritable race to build out cutting-edge AI teams, but Mark Zuckerberg decided very early on to put a fundamental AI research lab at the heart of the company’s AI efforts.

“Beginning in 2013, FAIR set a new standard for an AI industry research lab,” notes Bosworth. “We prioritized working out in the open, collaborating with the entire research community, and we publish and open source the majority of our work, which accelerates progress for everyone.”

Within a year, FAIR had begun publishing its work, and 2017 saw the open sourcing of PyTorch, which quickly became a common framework used to build cutting-edge AI in both research and production. And AI has gone on to impact Meta’s business and most important strategic priorities, from Feed ranking and content recommendations to the delivery of relevant advertising, image and sticker generation, and AIs you can interact with, including Meta AI.

“As exciting as this work is, it’s still in its infancy,” Bosworth says. “It will play a major role not just in the products that we have today, but also in products that were previously not possible, including of course those in the space of wearables and augmented reality. Our vision in those spaces actually depends on AI that’s capable of truly understanding the world around us and anticipating our needs. We believe this kind of contextualized AI will be the bedrock of the first truly new computing platform since the PC itself.”

“I’ve spent most of the last decade leading the research effort to create a new kind of computing platform built on AR/VR, while the rest of Reality Labs has worked to make that platform a reality,” adds Chief Scientist Michael Abrash. “This has been one of Meta’s two big long-term bets on the technologies of the future, the other of course being AI. As we celebrate FAIR’s 10th anniversary, it is tremendously exciting to me to see how these two long-term bets are coming together in a way that would have seemed like science fiction on the day I started.”

It was 1957 when J.C.R. Licklider first had a vision of human-computer symbiosis, in which computers partner with humans to do work humans aren’t good at, freeing us up to be more creative. That vision ultimately culminated in a critical mass of talent at Xerox PARC and the creation of the Alto in 1973, which was in turn followed by the Mac in 1984.

“That revolution in human-oriented computing has become so all-encompassing that I don’t even have to ask—I know for certain that every one of you does your work on a direct descendant of the Alto and has a tiny version with you right now,” Abrash says. “We live in the world Licklider made. As powerful as that model of human-computer interaction is, though, it’s nonetheless sharply limited relative to the ways humans are capable of absorbing information and acting.”

While humans receive information from the 3D environment around us through our six senses, the digital world is too often only accessible via 2D screens that are woefully undersized.

“Today’s 2D model barely scratches the surface of what we’re capable of perceiving and doing,” Abrash explains. “In contrast, AR glasses and VR headsets can drive your senses in ways that approximate reality. This has the potential to enable humans to truly be with one another regardless of distance. Taken to the limit, it could someday make it possible for humans to have any experience they’re capable of having. That by itself would change the world—but there’s more.”

With contextual AI—a never-tiring, always-available proactive assistant—AR glasses and VR headsets could help you accomplish your goals, augmenting your perception, memory, and cognition to make your life almost magically easier and more productive.

“This has never been possible before because no device that can see your life from your perspective has ever existed before,” notes Abrash. “I believe this may ultimately be the most important aspect of the AR/VR revolution. Just as the graphical user interface (GUI) is how we interact with the digital world today, contextual AI will be the human-computer interface of the future, and it will be far more transformational than the GUI because it goes directly to the heart of helping us live our lives in the ways we want to.”

And that shift is starting to happen right now: after a decade of research, the pieces are coming together. You’ll get a glimpse of the future when we bring multimodal AI to our Ray-Ban Meta smart glasses next year, and another in the Ego-Exo4D foundational dataset for research on video and multimodal perception. But that’s just the beginning. The full contextual AI system of the future needs all sorts of technologies that simply don’t exist today.

“In my mind, at the top of the diagram of the contextual AI effort, there was always a box that said, ‘And then a miracle occurs,’” Abrash says. “And then in the last couple of years, the miracle occurred. Large language models (LLMs) showed up, with the potential to handle the multimodal inference needed to understand users’ goals and help them achieve them based on context and history. The key is that LLMs have the potential to do inference across visual, audio, voice, eye tracking, hand tracking, EMG, and other contextual input, your history, and a wide range of world knowledge, and then act to help you achieve your goals, looping you in as needed to guide or disambiguate. LLMs need to be taken to a different level in order to realize that potential, and FAIR is the ideal team to do that. Taken as a whole, the convergence of FAIR’s AI research with Reality Labs’ AR/VR research brings together all the elements needed to create the contextual AI interface that will fully enable Meta’s vision for the future.”

For more on the 10-year anniversary of FAIR, check out the latest episode of Boz to the Future.