Genie 3: A new frontier for world models

What is Genie 3?

Google DeepMind has revealed Genie 3, its latest foundation world model that can be used to train general-purpose AI agents, a capability that the AI lab says makes for a crucial stepping stone on the path to “artificial general intelligence,” or human-like intelligence.

Genie 3’s capabilities include:

The following are recordings of real-time interactions with Genie 3.

Modelling physical properties of the world

Experience natural phenomena like water and lighting, and complex environmental interactions.

Simulating the natural world

Generate vibrant ecosystems, from animal behaviors to intricate plant life.

Modelling animation and fiction

Tap into imagination, creating fantastical scenarios and expressive animated characters.

Exploring locations and historical settings

Transcend geographical and temporal boundaries to explore places and past eras.

Pushing the frontier of real-time capabilities

Achieving a high degree of controllability and real-time interactivity in Genie 3 required significant technical breakthroughs. During the auto-regressive generation of each frame, the model has to take into account the previously generated trajectory, which grows over time. For example, if the user revisits a location after a minute, the model has to refer back to the relevant information from a minute earlier. To achieve real-time interactivity, this computation must happen multiple times per second, in response to new user inputs as they arrive.
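To make the shape of this loop concrete, here is a minimal toy sketch in Python. It is not Genie 3's implementation, which the source does not describe. The function `toy_next_frame`, the action names, and the tuple used as a "frame" are all hypothetical stand-ins. The only point is that each new frame is computed from the latest user action plus the whole trajectory so far, so a mark made early on is still visible when the user returns to the same spot.

```python
from typing import List, Optional, Tuple

# A "frame" here is just (position, is_this_spot_painted): a toy stand-in for an image.
Frame = Tuple[int, bool]
# One trajectory step: the user action and the frame generated in response.
Step = Tuple[str, Optional[Frame]]


def toy_next_frame(trajectory: List[Step], action: str) -> Frame:
    """Hypothetical stand-in for a world model.

    It re-reads the entire trajectory (which grows by one step per frame)
    together with the new action, and returns the next frame.
    """
    position = 0
    painted = set()
    for past_action, _ in trajectory + [(action, None)]:
        if past_action == "left":
            position -= 1
        elif past_action == "right":
            position += 1
        elif past_action == "paint":
            painted.add(position)
    return (position, position in painted)


def rollout(actions: List[str]) -> List[Step]:
    """Auto-regressive loop: each generated frame is appended to the history."""
    trajectory: List[Step] = []
    for action in actions:
        frame = toy_next_frame(trajectory, action)
        trajectory.append((action, frame))
    return trajectory


if __name__ == "__main__":
    # Paint at the start, walk away, then come back to the same spot.
    for action, frame in rollout(["paint", "right", "right", "left", "left"]):
        print(action, frame)
```

In this toy, the cost of producing a frame grows with the length of the trajectory, since the whole history is replayed every time. That is the tension the paragraph above describes: the history keeps growing, yet each frame still has to be produced within a fraction of a second.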

Environmental consistency over a long horizon

For AI-generated worlds to be immersive, they have to stay physically consistent over long horizons. However, generating an environment auto-regressively is generally a harder technical problem than generating an entire video, since inaccuracies tend to accumulate over time. Despite this challenge, Genie 3 environments remain largely consistent for several minutes, with visual memory extending as far back as one minute.
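The accumulation effect can be illustrated with a deliberately simple numerical sketch. Everything in it is an assumption made for illustration: the one-dimensional "world", the functions `true_step` and `model_step`, and the constant per-step bias of 0.01. It says nothing about how Genie 3 works or how large its errors are. It only shows that a model feeding on its own output drifts further with every step, while the same model restarted from the true state each time stays within its one-step error.

```python
def true_step(x: float) -> float:
    """Ground-truth dynamics of the toy world: the state advances by 1."""
    return x + 1.0


def model_step(x: float, bias: float = 0.01) -> float:
    """Hypothetical imperfect model: right dynamics plus a small constant bias."""
    return x + 1.0 + bias


def autoregressive_error(steps: int, bias: float = 0.01) -> float:
    """Final error when the model is fed its own previous output."""
    truth = 0.0
    pred = 0.0
    for _ in range(steps):
        truth = true_step(truth)
        pred = model_step(pred, bias)  # input is the model's own last prediction
    return abs(pred - truth)


def one_step_error(steps: int, bias: float = 0.01) -> float:
    """Worst error when every prediction restarts from the true state."""
    truth = 0.0
    worst = 0.0
    for _ in range(steps):
        pred = model_step(truth, bias)  # input is the true state
        truth = true_step(truth)
        worst = max(worst, abs(pred - truth))
    return worst


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, autoregressive_error(n), one_step_error(n))
```

With a constant bias, the auto-regressive error grows linearly with the number of steps, while the restarted error stays flat. Real models have neither a constant bias nor a one-dimensional state, so this is only a picture of the mechanism, not an estimate of its size.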