Contextual AI & the Future of Human Connection

As he took the stage to help close out the day 2 keynote, Reality Labs Chief Scientist Michael Abrash revisited some predictions he made at Connect in 2016, reflecting on the progress we’ve made in virtual and augmented reality to date.

“The nine years that have passed since then provide fresh confirmation of Hofstadter’s Law,” he said. “‘It always takes longer than you expect, even when you take into account Hofstadter’s Law.’”

Though they’ve yet to ship, we’ve demonstrated key breakthroughs in haptics, photorealistic avatars, field of view, and more. And both Meta Quest and our Orion prototype are blending digital content with our view of the physical world. “The future I laid out nine years ago is arriving,” said Abrash, “just later than expected.”

What wasn’t expected — outside of the imaginations of a few visionaries — is the emergence of contextual AI, which understands the physical world and our place within it.

Unlike today’s large language models (LLMs) and generative AI, which learn about the world solely from what’s represented by the internet, contextual AI can see what we see, hear what we hear, and understand our context as it unfolds in real time.

“This is truly something new under the sun,” Abrash explained, “and it will forever alter the way we interact with the digital world, partnering us with computers and with each other to massively amplify our potential in a truly personalized, human-oriented way.”

In order to understand the Second Great Wave of Human-Oriented Computing, which Abrash first posited would take the form of AR and VR back in 2018, it’s helpful to understand the First Great Wave, which marked a paradigm shift in the way humans interact with digital content and led us to today’s world of interconnected information devices.

The First Great Wave of Human-Oriented Computing: A Timeline

  • 1957 // J. C. R. “Lick” Licklider encounters the experimental TX-2 at MIT Lincoln Laboratory

“The hope is that, in not too many years, human brains and computing machines will be coupled together very tightly, and that the resulting partnership will think as no human brain has ever thought and process data in a way not approached by the information-handling machines we know today.”
— J. C. R. Licklider

  • 1960s // Licklider funds a number of researchers at ARPA, including Douglas Engelbart
  • 1968 // Engelbart presents the Mother of All Demos, demonstrating the ancestor of today’s graphical user interface (GUI)

“If in your office, you, as an intellectual worker, were supplied with a computer display backed up by a computer that was alive for you all day, and was instantly responsive to every action you have—how much value could you derive from that?”
— Douglas Engelbart

  • 1970s // Researchers at Xerox PARC develop the first true personal computer, the Alto
  • 1979 // Apple engineers visit Xerox PARC and see a demo of the Alto, which inspires the Lisa and Macintosh systems

“And that led to the Mac and Windows and everything that followed,” noted Abrash. “Today we live in a world that Lick and Engelbart made, to the extent that every one of you has a direct descendant of Lick’s vision, running a direct descendant of Engelbart’s interface, in your pocket or bag — or in your hand, if I’m not being interesting enough.”

Changing Seas

“The First Great Wave is the sea we swim in,” said Abrash. “And yet, as sweeping as that revolution has been, it’s unfinished.”

Abrash was quick to note that our experience of the world is determined by the information that flows through our senses and the results we perceive from our actions. And it doesn’t matter where that information comes from or where those actions take place: Digital or virtual information and actions are just as valid as their physical counterparts.

“The revolution Lick set in motion has created a vast virtual world that we interact with constantly, but only in a very limited way, through apps running on 2D surfaces, directed by pointing, clicking, tapping, and typing,” Abrash explained. “Of course, that’s proven to be tremendously useful, but it engages only a tiny fraction of the available bandwidth of our hands and senses, so it can only deliver a small subset of what we’re capable of experiencing and doing.”

As a result, our computing devices demand constant attention, pulling us away from the moment and the people right in front of us.

The Second Great Wave of Human-Oriented Computing promises to achieve Licklider’s vision by “driving all our senses at maximum fidelity, while letting our hands move naturally in virtual spaces and actually feel the objects we’re interacting with,” said Abrash. “Virtual experiences that are completely indistinguishable from physical reality are a long way off, but we can already drive vision and audio in convincing ways. And as that technology evolves, it will increasingly enable us to work and play in far richer ways, while enabling truly personal human connection regardless of distance.”

Of course, the human experience extends beyond our relationship to the outside world — which, while important, isn’t what makes us us. Rather, it’s the internal process of forming models of the world that sets humans apart.

“Technology has vastly expanded the range and power of our interactions with the external world,” noted Abrash. “But it has barely touched our internal experience. Even though I spend every day with my phone, it has no understanding of my context, my goals, or my decision process, so it can’t help me with my internal experience. But if it could, it would revolutionize my life in many ways, ranging from reminders to hearing assistance to context-dependent message and call handling to vastly improved memory, and much, much more.”

So what might that look like in practice? Take conversation focus, which we announced yesterday. If you turn on conversation focus manually, it can help you hear a conversation in a noisy environment. With contextual AI that knows your preferences, conversation focus could turn on automatically whenever you find yourself trying to hold a conversation in a noisy environment. When another friend joined the conversation, the AI would know to amplify their voice as well, regardless of where you were looking at any given point. And all of this would happen without you even being consciously aware of it.

“The first scenario is a truly valuable feature, and I can’t wait to have it,” acknowledged Abrash. “But thanks to AI that understands my needs and goals, the second scenario is the full and proper augmentation of my audio perception.”
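As a rough illustration of the difference between those two scenarios, here is a minimal sketch of what a context-driven version might look like. The signal names, threshold, and preference flag are hypothetical stand-ins, not a description of how any shipping feature works.

```python
from dataclasses import dataclass

@dataclass
class Context:
    """Hypothetical real-time signals a pair of AI glasses might expose."""
    ambient_noise_db: float     # estimated background noise level
    active_speakers: list[str]  # people currently in conversation with the wearer
    prefers_auto_focus: bool    # a stated or learned user preference

def update_conversation_focus(ctx: Context) -> dict:
    """The second scenario: focus switches on and adapts by itself.

    In the first (manual) scenario the wearer flips this switch themselves;
    here the decision follows from context and preference alone.
    """
    noisy = ctx.ambient_noise_db > 70.0  # hypothetical threshold
    talking = len(ctx.active_speakers) > 0
    enable = noisy and talking and ctx.prefers_auto_focus
    return {
        "conversation_focus": enable,
        # Amplify every participant, not just whoever the wearer is looking at.
        "amplified_voices": list(ctx.active_speakers) if enable else [],
    }
```

The specifics don’t matter; the point, as Abrash framed it, is that the decision is driven by context and preference rather than by a manual toggle.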

Now imagine that your AI can understand the three-dimensional world around you. It can help you find your keys. It can help with calorie counting while you’re on the go. It can proactively surface relevant notes during a meeting. It can let you recall experiences from your past with near-perfect clarity. And you’d never again get pulled out of the moment to figure out whether an incoming call is a robocall or an important message from your mom.

“There are literally thousands of possibilities, some large, some small, but they all add up to helping you do what you want to do with your life in a far deeper way than anything that’s gone before,” said Abrash. “This is AI that helps you focus when you need to focus, remember what you need to remember, connect more meaningfully with people, and enhance your experience in the world rather than taking you out of it.”

Thanks to the advent of large language models and wearable devices that can let an AI see what you see, we’ve opened up the potential for technology that amplifies our intentions rather than fragmenting our attention. Think of contextual AI as your partner to help you take action and process information — physical or digital.

“Eleven years ago, I would have called all this pie-in-the-sky science fiction,” said Abrash. “But the combination of Ray-Ban Metas, LLMs, and a decade of research has made contextual AI the obvious future of human-oriented computing — the heart of the Second Great Wave.”

And to tell us more, he brought out someone who’s been thinking about this longer than just about anyone else: Research Science VP Richard Newcombe.

The Limits & Potential of Human Networks

To start, Newcombe also took us back to 1968 — but not to the Mother of All Demos. Rather, he pointed to the Apollo 8 mission and astronaut William Anders, who captured the first full-color photograph of Earth ever taken by a human.

“This perspective of our world was incredibly novel and powerful,” Newcombe explained. “It was unlike anything people had seen before. A world most people knew to be fractured by national borders and political divisions suddenly appeared united and singular — unique and precious amongst the vastness of space. ... It helped kickstart the modern environmental movement and quite literally changed the way people understood the world we all share.”

The rise of personal computing coincided with the dawn of a new era of shared global thinking. But, Newcombe noted, there are limits — specifically Dunbar’s number, which posits that humans can only maintain roughly 150 stable relationships.

On the other hand, if the theory of six degrees of separation holds true, any two people on Earth can be connected through just five intermediaries.

“The math is staggering — network reach grows exponentially,” Newcombe said. “Just 80 generations connects us to everyone who ever lived, every thought ever conceived. Our potential proximity to vast human knowledge and thinking power is astonishing — Dunbar’s number and the six degrees of separation represent our limits and our potential.”
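To make the exponential-reach point concrete, here is the idealized back-of-the-envelope arithmetic, using Dunbar’s number as the branching factor and ignoring the heavy overlap between real friend circles (an illustration, not a figure from the talk):

```python
DUNBAR = 150  # stable relationships per person

# Idealized reach after k hops, ignoring overlap between friend circles.
for hops in range(1, 6):
    print(hops, f"{DUNBAR ** hops:,}")
# 1 150
# 2 22,500
# 3 3,375,000
# 4 506,250,000
# 5 75,937,500,000  -- more than every person alive
```

Even allowing for enormous overlap in practice, the idealized reach crosses the world’s population within about five hops, which is the intuition behind six degrees of separation.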

The First Great Wave of Human-Oriented Computing helped collapse distance, putting our friends, family, and colleagues just a tap or click away. And the internet and search gave us near-immediate access to a vast repository of human knowledge. Yet there’s still untapped potential.

“Think about it like this,” said Newcombe. “The American mathematician Claude Shannon, famous for developing modern information theory in the 1950s, calculated there are roughly seven to 10 bits of information per English word. At normal speaking speed — 150 words per minute — that’s roughly 10 bits per second per person. Think about it: Every law, every letter, every breakthrough in human history — indeed, everything I am saying to you right now — you understand by way of a communication channel narrower than a 1990s dial-up modem.”

By contrast, Newcombe added that our conscious experience of reality, informed by our senses, operates at roughly 1 billion bits per second. To put that in perspective, he likened it to today’s WiFi receiving and processing 4K video streams. But when you try to share that experience with others, you’re back to dial-up speeds. Our networks, then, serve to aggregate human cognition.
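Taking the keynote’s figures at face value, the gap is easy to put in numbers; the modem rate below is a commonly cited late-1990s speed, not a figure from the talk:

```python
speech_bps  = 10             # spoken language, per the Shannon-based estimate above
dialup_bps  = 56_000         # a late-1990s 56k modem
sensory_bps = 1_000_000_000  # conscious sensory experience, per the talk

print(f"dial-up vs. speech: {dialup_bps / speech_bps:,.0f}x")      # ~5,600x
print(f"experience vs. speech: {sensory_bps / speech_bps:,.0f}x")  # ~100,000,000x
```

Roughly eight orders of magnitude separate what we experience from what we can articulate — that is the bottleneck Newcombe is pointing at.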

“We’ve invented tools to extend our individual capabilities — books for external memory, mathematics for structured reasoning, computers for information processing, and the internet to defy distance,” said Newcombe. “But it’s only through our human networks that we truly transcend the bottleneck of what a single mind can read, hold, reason about, and articulate. These networks are how we cause action and change at scale.”

The Advent of Contextual AI

As we stand at the dawn of the AI era, we’re faced with a technology that can sift through, aggregate, and make predictions about vastly more information than a single mind could handle. Rather than focusing solely on AI’s ability to automate mundane and repetitive tasks, Newcombe argued that the real impact will be how the technology fundamentally changes the dynamic of human networks.

“Imagine AI enabling a fluid network of humans connecting across all of humanity, as capable as any of our best organizations today, so that individuals gain access to the power of entire companies,” Newcombe said. “Imagine such an organization becoming available to each of us — unlocking our reach to billions of minds. We’re at the start of a leap that gives each individual mind access to a federated system of input, aggregation, processing, synthesis, and output — something that, until now, only networks of humans could achieve through the filter of time.”

However, today’s LLMs are limited. Despite being trained on internet-scale amounts of data, they lack context about and understanding of the physical world and our lived experience within it.

“Contextual AI bridges the gap between our embodied reality and our symbolic cognitive world bound by communication and enables AI to understand the reality we are in, together,” noted Newcombe. “This is made possible by AI glasses that see what you see and hear what you hear. These form a new generation of wearable computers that understand what you’re doing and how you work with others to get things done.”

And because your AI glasses can see what you see and hear what you hear, they have the potential to create a personalized knowledge base, enabling your AI to tailor your interactions and access to what’s important. Over time, this could help us understand not just isolated events, but also causal patterns.
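One way to picture such a personalized knowledge base is as an append-only log of perceptual events that can later be queried; the structure below is a purely hypothetical sketch, not a description of any Meta system:

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

@dataclass
class PerceptualEvent:
    """One hypothetical entry: something the glasses saw or heard, with its context."""
    timestamp: datetime
    location: str        # e.g. "kitchen", "office"
    entities: list[str]  # objects and people recognized in view
    transcript: str = "" # anything that was heard

@dataclass
class PersonalKnowledgeBase:
    events: list[PerceptualEvent] = field(default_factory=list)

    def record(self, event: PerceptualEvent) -> None:
        self.events.append(event)

    def last_seen(self, entity: str) -> Optional[PerceptualEvent]:
        """Answer 'where did I last see my keys?'-style queries from the log."""
        for event in reversed(self.events):
            if entity in event.entities:
                return event
        return None
```

Causal patterns, in this picture, emerge from querying across many such events over time rather than from any single observation.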

“The potential here is for AI to understand how our connected lives weave together to form our shared reality,” said Newcombe. “Once it can do that, AI can start to operate directly on our always-on context to understand, predict, and make opportunities for us to work together in the most effective forms. And this will enable AI to understand what is important to us and how we can best communicate — with AI and each other.”

A Truly Human Interface

For more than a decade, Reality Labs has been pushing forward the state of the art in virtual and augmented reality. Codec Avatars will let us defy distance, enabling social teleportation so we can feel truly present with anyone, anywhere, as easily as making a video call. We introduced the world to the potential of AI glasses with Ray-Ban Meta. And yesterday we announced Meta Ray-Ban Display, giving AI glasses a way to share visual information with the wearer, along with silent, ultra-low-friction input via the new Meta Neural Band.

“Project Aria, first introduced in 2020 and now in its second generation, began our development of contextualized AI: research glasses combining sensing and mobile computation to furnish AI with much better context,” Newcombe explained. “Our future products will begin to unlock the value made possible by these signals, including understanding what we’re looking at and what we’re trying to get done. These highly personalized context signals will enable highly personalized AI and all the features that could entail: superhuman memory and recall that helps us achieve goals, empathize, learn, and grow. Over time, as AI glasses begin to work with significant aspects of our context across our physical and digital lives, superintelligence will play a significant role in everything we do — be it in our physical or digital realities, bringing an understanding of real-life context anywhere we are.”

The technologies we’re building could unlock the potential for everyone to work together to create the reality we want, grounded in shared understanding.

“As these technologies converge with true AR — immersive displays meet contextual AI — we approach something profound: an interface that doesn’t separate us from reality but enhances our experience of it and our connection to each other,” said Newcombe. “There couldn’t be anything more human than this.”