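World models predict what happens next in an environment: given the current state and an action, they estimate the next state and, often, the reward. They learn an approximation of the underlying physics, rules, and constraints from observed sequences of states, actions, and rewards.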
Core Components
- Vision/Encoder (V): Maps high-dimensional observations (images, proprioception, text) into compact latent states
- Memory/Dynamics (M): Predicts future latent states and task variables (e.g., reward, termination) given current latent state and action
- Controller/Policy (C): Selects actions, using the world model to evaluate candidate actions and plan ahead
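
To make the division of labor concrete, here is a minimal sketch of how the three components might fit together. Everything in it is a toy assumption for illustration: the names `encode`, `dynamics`, `controller`, and `run_episode` are hypothetical, the "encoder" is just a mean, and the "dynamics" is a hand-written rule standing in for a learned model.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Prediction:
    next_latent: float  # predicted next latent state
    reward: float       # predicted reward
    done: bool          # predicted termination flag


def encode(observation: Sequence[float]) -> float:
    """V: compress a high-dimensional observation into a compact latent.

    Toy stand-in (assumption): the mean of the observation vector.
    """
    return sum(observation) / len(observation)


def dynamics(latent: float, action: float) -> Prediction:
    """M: predict next latent, reward, and termination from (latent, action).

    Toy stand-in for a learned model (assumption): the latent moves by the
    action, and reward is the negative distance to the origin.
    """
    next_latent = latent + action
    return Prediction(next_latent, -abs(next_latent), abs(next_latent) < 0.05)


def controller(latent: float, actions: Sequence[float]) -> float:
    """C: query M for each candidate action and keep the best predicted reward."""
    return max(actions, key=lambda a: dynamics(latent, a).reward)


def run_episode(
    observation: Sequence[float],
    actions: Sequence[float] = (-0.1, 0.0, 0.1),
    max_steps: int = 50,
) -> List[float]:
    """Encode once, then roll forward entirely inside the model."""
    latent = encode(observation)
    trajectory = [latent]
    for _ in range(max_steps):
        action = controller(latent, actions)
        prediction = dynamics(latent, action)
        latent = prediction.next_latent
        trajectory.append(latent)
        if prediction.done:
            break
    return trajectory


if __name__ == "__main__":
    print(run_episode([0.4, 0.6, 0.5]))
```

Note that this loop never touches a real environment after the first observation. An agent acting in the real world would typically execute the chosen action, receive a new observation, and re-encode it at every step.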
Key Advantages
- Sample efficiency: Learn from imagined rollouts inside the model, reducing the number of real environment interactions needed
- Planning: Evaluate multiple futures before acting
- Transfer: Reuse learned dynamics for new tasks in the same environment family
- Safety: Stress-test risky scenarios in simulation before acting in the real environment
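
The planning and imagination ideas above can be sketched with random-shooting planning, one simple way (among many) to "evaluate multiple futures before acting". The sketch assumes a hypothetical one-dimensional model, `model_step`, and a made-up target position, `GOAL`. In practice the dynamics would be learned rather than hand-written.

```python
import random
from typing import List, Sequence, Tuple

GOAL = 2.0  # hypothetical target position, for illustration only


def model_step(state: float, action: float) -> Tuple[float, float]:
    """Stand-in for a learned dynamics model: returns (next_state, reward)."""
    next_state = state + action
    return next_state, -abs(next_state - GOAL)


def imagined_return(state: float, plan: Sequence[float], discount: float = 0.95) -> float:
    """Roll a candidate action sequence through the model and sum discounted rewards."""
    total = 0.0
    for t, action in enumerate(plan):
        state, reward = model_step(state, action)
        total += (discount ** t) * reward
    return total


def plan_random_shooting(
    state: float, horizon: int = 5, n_candidates: int = 200, seed: int = 0
) -> List[float]:
    """Sample random action sequences and return the one with the best imagined return."""
    rng = random.Random(seed)
    candidates = [
        [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        for _ in range(n_candidates)
    ]
    return max(candidates, key=lambda plan: imagined_return(state, plan))


if __name__ == "__main__":
    best_plan = plan_random_shooting(state=0.0)
    print("first action to execute:", best_plan[0])
    print("imagined return:", imagined_return(0.0, best_plan))
```

No real environment step is taken while the candidates are scored, which is where the sample-efficiency and safety benefits come from. A common design choice is to execute only the first action of the best plan and then replan from the next observed state.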