The data layer for world models is becoming a business category.
Across video world models, simulation, robot planners, autonomous driving, and digital twins, the recurring bottleneck is not compute alone but rights-cleared multimodal data tied to state, action, physics, scene context, and human judgment.
What the literature says about training data.
The concrete examples below show that world modeling is not one data problem. It is a stack of modality-specific data problems that need curation, annotation, validation, and reuse rights.
Sora, Genie, Cosmos, V-JEPA, and GAIA-1 all point to video as a core substrate, but video alone does not cleanly expose action, state, object properties, or legal training rights.
Simulators need geometry, materials, object states, articulated assets, collision meshes, and physics labels. This is where generic web data runs thin.
Robot policies need demonstrations, trajectories, instructions, failures, and environment variation. These are precisely the assets Humanbased can source and verify.
Evidence catalog: data actually used across world-model projects.
The catalog is organized by function. Each entry names publicly described training or benchmark data, how that data is used, and what Humanbased could supply or validate. Because some frontier labs publish only partial data recipes, the catalog separates documented inputs from Humanbased's inferred opportunity.
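One way to keep the split between documented inputs and inferred opportunity explicit is to record it in every entry. The sketch below shows this under stated assumptions: the `CatalogEntry` fields, the `Basis` enum, and the two filter helpers are hypothetical illustrations, not a description of how the actual catalog is built.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Basis(Enum):
    # Hypothetical labels for how well a data claim is sourced.
    DOCUMENTED = "documented"  # stated in a paper, blog post, or dataset card
    INFERRED = "inferred"      # inferred by Humanbased; not a published fact


@dataclass(frozen=True)
class CatalogEntry:
    # Hypothetical field names chosen for illustration.
    model_or_dataset: str
    function: str                # e.g. "video", "simulation", "robotics"
    data_used: str               # publicly described training or benchmark data
    data_basis: Basis            # documented vs. inferred
    how_used: str
    humanbased_opportunity: str  # always an inference, never a claim about the lab


def by_function(entries: List[CatalogEntry], function: str) -> List[CatalogEntry]:
    """Return only the entries that serve the given function."""
    return [e for e in entries if e.function == function]


def documented_only(entries: List[CatalogEntry]) -> List[CatalogEntry]:
    """Keep entries whose data claims come from published sources."""
    return [e for e in entries if e.data_basis is Basis.DOCUMENTED]
```

Keeping the basis as a required field means no entry can be added without someone deciding whether its data claim is published or inferred.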
Business strategy: convert the data gap into pilots.
The strongest opening move is not to sell "world models." It is to sell fixed-scope data products tied to a specific model-training or evaluation pain point.
Humanbased should become the human-verified reality layer for physical AI: source the data, label the state, validate the physics, and prove the rights.
Offer a 4-6 week household or warehouse manipulation pilot that delivers 1,000 interaction episodes with action/state/affordance labels, a validation report, and a dataset card.
Offer sim-to-real validation: compare synthetic scenes, robot rollouts, or driving scenarios against human and expert realism judgments.
Offer 3D asset QA, scene capture, material/scale/collision metadata, and human preference scoring for generated interactive worlds.
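A sim-to-real validation report can start from simple aggregation of realism judgments. The sketch below assumes each judgment is a `(scene_id, rater_type, score)` tuple, where `rater_type` is either `"crowd"` or `"expert"` and scores sit on a 1-5 realism scale; the `min_score` and `max_gap` thresholds are illustrative defaults, not calibrated values.

```python
from collections import defaultdict
from statistics import mean

# Assumed rater categories: general human raters and domain experts.
RATER_TYPES = ("crowd", "expert")


def realism_report(ratings, min_score=3.5, max_gap=1.0):
    """Summarize realism ratings per scene and flag scenes for review.

    ratings: iterable of (scene_id, rater_type, score) tuples. rater_type must
    be "crowd" or "expert"; score is on an assumed 1-5 realism scale.
    min_score and max_gap are illustrative thresholds, not calibrated values.
    """
    by_scene = defaultdict(lambda: {t: [] for t in RATER_TYPES})
    for scene_id, rater_type, score in ratings:
        if rater_type not in RATER_TYPES:
            raise ValueError(f"unknown rater_type: {rater_type!r}")
        by_scene[scene_id][rater_type].append(score)

    report = {}
    for scene_id, groups in by_scene.items():
        crowd = mean(groups["crowd"]) if groups["crowd"] else None
        expert = mean(groups["expert"]) if groups["expert"] else None
        flags = []
        if expert is None:
            flags.append("no expert rating")
        elif expert < min_score:
            flags.append("expert realism below threshold")
        # Large crowd/expert gaps suggest a flaw that only one group notices.
        if crowd is not None and expert is not None and abs(crowd - expert) > max_gap:
            flags.append("crowd and expert disagree")
        report[scene_id] = {"crowd_mean": crowd, "expert_mean": expert, "flags": flags}
    return report
```

Keeping crowd and expert means separate, rather than pooling them, turns disagreement between the two groups into a reviewable signal instead of averaging it away.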
Product packaging.
A practical set of features Humanbased can bring to market without waiting for a full foundation model partnership.
Guided capture and annotation for physical interactions: household, warehouse, retail, healthcare support, human navigation, and task failures.
Reusable schemas for object state, action phase, affordance, material, collision, force/contact, and success/failure outcomes.
Human and expert comparison of generated or simulated predictions against real-world outcomes and safety expectations.
Verified household contributors, device-gated 3D scanners, robot teleoperators, warehouse workers, and domain experts.
Reusable data products such as kitchen manipulation, indoor navigation, 3D rooms, common-object affordances, and synthetic-data QA.
Consent scope, collection protocol, contributor reputation, acceptance history, reuse rights, and auditable dataset cards.
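The reusable schemas and provenance fields listed above could be combined into one episode record. This is a minimal sketch under stated assumptions: every class name, field name, action-phase label, and the 0.0-1.0 reputation scale is hypothetical, and `validate` encodes only a few example rules (time ordering, a failure mode for failed episodes, force only during contact, consent covering the intended use, and QA acceptance).

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical schema for illustration; not an existing Humanbased format.


@dataclass
class ObjectLabel:
    object_id: str
    state: str               # e.g. "closed", "open"
    affordances: List[str]   # e.g. ["graspable", "openable"]
    material: str            # e.g. "wood"
    collision_mesh: bool     # whether a collision mesh accompanies the object


@dataclass
class StepLabel:
    t: float                                 # seconds since episode start
    action_phase: str                        # e.g. "reach", "grasp", "pull"
    in_contact: bool
    contact_force_n: Optional[float] = None  # newtons, if a force sensor exists


@dataclass
class Provenance:
    contributor_id: str
    consent_scope: List[str]       # uses the contributor agreed to
    collection_protocol: str       # protocol identifier or version
    reuse_rights: str              # license or contract reference
    contributor_reputation: float  # 0.0-1.0 (hypothetical scale)
    accepted: bool                 # passed human QA review


@dataclass
class Episode:
    episode_id: str
    task: str
    objects: List[ObjectLabel]
    steps: List[StepLabel]
    success: bool
    failure_mode: Optional[str]
    provenance: Provenance


def validate(ep: Episode, intended_use: str) -> List[str]:
    """Return a list of problems; an empty list means the episode passes."""
    problems = []
    times = [s.t for s in ep.steps]
    if times != sorted(times):
        problems.append("steps out of time order")
    if not ep.success and not ep.failure_mode:
        problems.append("failed episode missing failure_mode")
    for s in ep.steps:
        if s.contact_force_n is not None and not s.in_contact:
            problems.append(f"force recorded without contact at t={s.t}")
    if intended_use not in ep.provenance.consent_scope:
        problems.append(f"consent does not cover {intended_use!r}")
    if not ep.provenance.accepted:
        problems.append("not accepted in QA review")
    return problems


def dataset_card(episodes: List[Episode], intended_use: str) -> dict:
    """Summarize which episodes are usable for intended_use and why others are not."""
    usable = [ep for ep in episodes if not validate(ep, intended_use)]
    excluded = Counter(p for ep in episodes for p in validate(ep, intended_use))
    return {
        "intended_use": intended_use,
        "episodes_total": len(episodes),
        "episodes_usable": len(usable),
        "success_rate": sum(ep.success for ep in usable) / len(usable) if usable else None,
        "protocols": sorted({ep.provenance.collection_protocol for ep in usable}),
        "exclusion_reasons": dict(excluded),
    }
```

Because `dataset_card` is computed per `intended_use`, the same pool of episodes can yield different usable subsets for training and for evaluation, which is one way to make reuse rights auditable rather than implied.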
Primary references.
These are the sources behind the example catalog and the claims on this page.
- Fei-Fei Li, A Functional Taxonomy of World Models
- OpenAI, Video generation models as world simulators
- NVIDIA, Cosmos world foundation models
- Google DeepMind, Genie
- Google DeepMind, Genie 2
- Wayve, GAIA-1
- Meta, V-JEPA 2
- GameNGen, Diffusion Models Are Real-Time Game Engines
- Ha and Schmidhuber, World Models
- DreamerV3, Mastering Diverse Domains through World Models
- Open X-Embodiment
- DROID dataset
- RT-1 Robotics Transformer
- OpenVLA
- Octo generalist robot policy
- BridgeData V2
- RoboNet
- AgiBot World
- BEHAVIOR-1K
- Habitat 3.0
- AI2-THOR
- ProcTHOR
- CARLA
- Waymax
- Waymo Open Dataset
- nuScenes
- nuPlan
- Objaverse
- Objaverse-XL
- OmniObject3D
- ScanNet
- Habitat-Matterport 3D Dataset
- AMASS
- Motion-X