Task Modes¶
This page is the reference for the four ANYmal-D task modes registered by the upstream RWM codebase. It identifies each task, names the configuration parameters that distinguish it, and links to the validation and implementation pages for procedural and architectural detail.
The four tasks are registered in upstream/robotic_world_model/source/mbrl/mbrl/tasks/manager_based/locomotion/velocity/config/anymal_d/__init__.py.
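Isaac Lab tasks are conventionally registered through Gymnasium's registry with an `entry_point` and per-task config kwargs. The following is a simplified, self-contained sketch of that pattern; the task IDs are the real ones from this page, but the plain-dict registry and the `register` helper stand in for Gymnasium's machinery and the config values shown are illustrative, not the upstream kwargs.

```python
# Simplified sketch of the registration pattern in the upstream
# __init__.py. A plain dict stands in for gymnasium's registry;
# the task IDs are real, the config values are illustrative.
TASK_REGISTRY: dict[str, dict] = {}

def register(task_id: str, entry_point: str, **kwargs) -> None:
    """Record a task ID with its environment entry point and config."""
    TASK_REGISTRY[task_id] = {"entry_point": entry_point, **kwargs}

register(
    "Template-Isaac-Velocity-Flat-Anymal-D-Init-v0",
    entry_point="isaaclab.envs:ManagerBasedRLEnv",
)
register(
    "Template-Isaac-Velocity-Flat-Anymal-D-Pretrain-v0",
    entry_point="isaaclab.envs:ManagerBasedRLEnv",
    runner="MBPOOnPolicyRunner",  # model-based runner (see Pretrain-v0 below)
)
```

A launcher script then resolves a task ID to its entry point and config at startup, which is why the four modes can share one environment family while differing only in registered kwargs.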
Summary¶
| Task | Role | Status |
|---|---|---|
| Template-Isaac-Velocity-Flat-Anymal-D-Init-v0 | Baseline policy training, no world model. | Validated |
| Template-Isaac-Velocity-Flat-Anymal-D-Pretrain-v0 | Train the dynamics model from real rollouts. Imagination disabled. | Validated at reduced scale |
| Template-Isaac-Velocity-Flat-Anymal-D-Finetune-v0 | Optimize the policy through imagined rollouts. Requires a pretrained dynamics checkpoint. | Structurally understood |
| Template-Isaac-Velocity-Flat-Anymal-D-Visualize-v0 | Qualitative inspection of the trained world model and policy. GUI-bound. | Structurally understood |
Init-v0¶
The baseline ANYmal-D training task. It uses the generic Isaac Lab ManagerBasedRLEnv and the inherited AnymalDFlatPPORunnerCfg, with no model-based components.
Distinguishing configuration:
- entry_point = isaaclab.envs:ManagerBasedRLEnv
- runner = standard PPO (OnPolicyRunner)
- world-model groups = not active
- imagination = not applicable
This task is the smoke test for the local environment. It is not part of the RWM training pipeline; it serves as the baseline that confirms Isaac Lab, the task registry, the policy network, and the PPO loop all work end-to-end before the world-model-specific paths are exercised.
Procedural detail and the canonical command are in Baseline Execution.
Pretrain-v0¶
The world-model pretraining stage. The dynamics model is trained from real-environment rollouts; the policy is updated by standard PPO on the same rollouts. Imagination is disabled in this stage.
Distinguishing configuration:
- entry_point = isaaclab.envs:ManagerBasedRLEnv
- runner = MBPOOnPolicyRunner
- world-model groups = system_state, system_action, system_contact, system_termination
- system_dynamics architecture = GRU, history_horizon = 32, ensemble_size = 1
- imagination = disabled (num_imagination_envs = 0)
The task differs from Init-v0 in three ways: the model-based runner replaces the standard PPO runner, four world-model observation groups are activated, and a SystemDynamicsEnsemble is constructed and trained from a replay buffer of real transitions. Imagination is intentionally off so the dynamics model is trained without bootstrapping from its own predictions.
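The three differences can be made concrete as a config diff. This is a hedged sketch: the field names and the `StageCfg` dataclass are ours for illustration, not the upstream config classes, though the values mirror the distinguishing configuration listed above.

```python
from dataclasses import dataclass

# Illustrative stand-in for the fields that separate Pretrain-v0
# from Init-v0; names are ours, values mirror the documented config.
@dataclass
class StageCfg:
    runner: str
    wm_obs_groups: tuple[str, ...] = ()   # world-model observation groups
    num_imagination_envs: int = 0         # 0 means imagination disabled

init_cfg = StageCfg(runner="OnPolicyRunner")
pretrain_cfg = StageCfg(
    runner="MBPOOnPolicyRunner",
    wm_obs_groups=(
        "system_state", "system_action",
        "system_contact", "system_termination",
    ),
    num_imagination_envs=0,  # dynamics model learns from real rollouts only
)
```

Keeping `num_imagination_envs = 0` here is the point of the stage: the replay buffer the dynamics ensemble trains on contains only real transitions, so the model never bootstraps from its own predictions.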
Procedural detail, observation group shapes, and the upstream fix required to run this stage are documented in World-Model Pretraining Check.
Finetune-v0¶
The imagination-based policy optimization stage. The pretrained dynamics model is loaded and used as a learned simulator. PPO updates run against trajectories generated by the world model rather than against real-environment rollouts.
Distinguishing configuration:
- entry_point = ANYmalDManagerBasedMBRLEnv (custom)
- runner = MBPOOnPolicyRunner
- load_system_dynamics = True
- resume = True
- system_dynamics_warmup_iterations = 500
- imagination = enabled at lab scale
The task differs from Pretrain-v0 in two ways: the environment is the custom ANYmalDManagerBasedMBRLEnv, which can step imagined transitions through the dynamics model, and imagination is enabled with default settings of 8192 imagination environments and 24 steps per environment. The custom environment reconstructs reward terms from predicted state and contact signals, so the learned dynamics model effectively acts as a neural simulator inside the PPO loop.
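At those default settings, the size of the imagined batch each PPO update consumes follows directly from the two numbers above:

```python
# Imagined transitions generated per PPO iteration at the defaults
# stated above (8192 imagination environments, 24 steps each).
num_imagination_envs = 8192
steps_per_env = 24
transitions_per_update = num_imagination_envs * steps_per_env
print(transitions_per_update)  # 196608 imagined transitions per update
```

This is why the stage is described as "lab scale": every update pushes nearly 200k model-predicted transitions through the dynamics ensemble, a workload sized for the lab workstation rather than a laptop.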
This task has not been executed. The reasons (missing dynamics checkpoint, missing policy resume directory, lab-scale imagination settings) and the path forward are documented in Checkpoint and Finetune Status.
Visualize-v0¶
The qualitative inspection task. It loads a pretrained world model and a trained policy and renders behavior through the Isaac Sim GUI for visual inspection.
Distinguishing configuration:
- entry_point = ANYmalDManagerBasedVisualizeEnv (custom)
- runner = visualization runner
- load_system_dynamics = True
- mode = GUI rendering, not training
This task has not been used. Two prerequisites apply: a trained dynamics checkpoint (the same checkpoint missing for Finetune-v0), and a machine with display capability. Per the workload split, the lab workstation is headless and cannot run GUI-bound tasks, so Visualize-v0 is local-only by capability and will be exercised on the laptop once a trained world model is available.
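The display prerequisite can be expressed as a simple environment check before launching the GUI-bound task. This is a sketch of the gate, not upstream code; the function name is ours, and the upstream launcher may handle headless detection differently.

```python
import os

# Sketch: gate GUI-bound tasks (Visualize-v0) on display availability.
# On a headless workstation neither variable is set, so this returns False.
def display_available(env=os.environ) -> bool:
    """True when an X11 or Wayland display is reachable."""
    return bool(env.get("DISPLAY") or env.get("WAYLAND_DISPLAY"))
```

A launcher would call this once at startup and refuse to start Visualize-v0 when it returns False, rather than letting Isaac Sim fail later during renderer initialization.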
Pipeline interpretation¶
The four tasks are not interchangeable. They form a sequence:
1. Init-v0 confirms the baseline RL stack works.
2. Pretrain-v0 produces a trained dynamics model.
3. Finetune-v0 consumes that dynamics model and produces a policy.
4. Visualize-v0 inspects the result.
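The ordering amounts to a prerequisite relation between stages and the artifacts earlier stages produce. A minimal sketch, with artifact names of our own choosing (the display requirement for Visualize-v0 is a machine capability and is left out here):

```python
# Sketch of the stage ordering as a prerequisite check. Artifact
# names are illustrative; each stage needs what earlier stages produced.
PIPELINE = [
    ("Init-v0", set()),
    ("Pretrain-v0", set()),
    ("Finetune-v0", {"dynamics_checkpoint"}),
    ("Visualize-v0", {"dynamics_checkpoint", "trained_policy"}),
]

def runnable_stages(available: set[str]) -> list[str]:
    """Stages whose prerequisites are covered by the available artifacts."""
    return [name for name, needs in PIPELINE if needs <= available]
```

With no artifacts on disk, only Init-v0 and Pretrain-v0 can run, which matches the current status table: Finetune-v0 and Visualize-v0 stay blocked until Pretrain-v0 has produced a dynamics checkpoint.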
A reader who only sees Init-v0 running has not seen the RWM method execute. The method begins at Pretrain-v0.