Reproduction Status¶
This page is the per-claim ledger for the project. It separates execution claims (whether the code has been run end-to-end on local or lab hardware) from mapping claims (whether the paper's content has been identified in the code). The two axes are independent and use the convention vocabulary from the landing page.
For the procedural and architectural detail behind each row, follow the cross-references.
Execution claims¶
What has been executed, on which hardware, with what outcome.
| Claim | Status | Evidence | Required to advance |
|---|---|---|---|
| Local Isaac Lab and RWM stack runs headlessly. | Validated | Baseline task reaches PPO learning iterations (manifests/baseline_state_*.txt). | None for local validation. |
| Baseline ANYmal-D task Init-v0 executes locally. | Validated | See Baseline Execution. | None. |
| RWM pretraining pipeline executes end-to-end at reduced scale on local hardware and writes checkpoints. | Validated at reduced scale | Reduced-scale Pretrain-v0 run produced model_0.pt, model_1.pt. See World-Model Pretraining Check. | Lab-scale pretraining for full validation. |
| Default RWM pretraining configuration runs on local hardware. | Not verified | Default config exceeds 8 GB VRAM during system-dynamics loss. | Lab workstation with sufficient VRAM. |
| RWM pretraining at the paper's full scale (\(M=32\), \(N=8\), batch 1024, 2500 iterations) on lab hardware. | Not verified | Lab access not yet exercised. | Lab workstation execution. |
| Trained dynamics model has useful predictive accuracy on held-out trajectories. | Not verified | Reduced-scale run was not configured for predictive performance. | Full pretrain plus held-out forecast evaluation. |
| Imagination-based finetuning (Finetune-v0) runs end-to-end. | Not verified | Upstream config requires a pretrained dynamics checkpoint that does not exist locally. See Checkpoint and Finetune Status. | Generate or obtain a valid dynamics checkpoint and a project-specific finetune config. |
| MBPO-PPO produces a policy whose performance matches the paper's reported 0.90 ± 0.04 velocity-tracking reward. | Not verified | No finetuning has been run. | Finetune execution and benchmark evaluation. |
| RWM-U + MOPO-PPO configuration (ensemble_size > 1, nonzero uncertainty penalty with the correct sign) runs end-to-end. | Not verified | Path not yet exercised. The hooks exist in code but have not been activated. | Activate config, validate at reduced scale, then full scale. |
| Zero-shot deployment of trained policy on hardware. | Not verified | No trained policy has been deployed. | Trained MBPO-PPO policy plus hardware availability. |
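The VRAM overflow noted in the table (default config exceeding 8 GB during the system-dynamics loss) is consistent with activation memory that grows with every dimension of the autoregressive unroll. The following is a back-of-envelope sketch only, not a measurement: the hidden size and per-value byte count are assumed placeholders, and real usage depends on the model's layers and optimizer state. Only the scaling behavior, linear in \(M\), \(N\), and batch size, is the point.

```python
def rough_activation_cost(M, N, batch, hidden=256, bytes_per_value=4):
    """Rough proxy for activations retained across the M x N autoregressive
    unroll at a given batch size. `hidden` and `bytes_per_value` are
    assumptions, not the real RWM dimensions; only the linear scaling in
    each of M, N, and batch is meaningful here."""
    return M * N * batch * hidden * bytes_per_value

# Paper-scale unroll settings from the table: M=32, N=8, batch 1024.
paper_scale = rough_activation_cost(32, 8, 1024)
half_batch = rough_activation_cost(32, 8, 512)
print(paper_scale, half_batch)  # halving the batch halves the proxy
```

Under this proxy, reducing any one of the three unroll dimensions buys a proportional reduction in retained activations, which is why a reduced-scale Pretrain-v0 run fits locally while the default config does not.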
Mapping claims¶
What has been identified in the code as a counterpart to a paper claim.
| Paper claim | Status | Cross-reference |
|---|---|---|
| RWM world model architecture (GRU base, MLP heads). | Mapped | Implementation Analysis §4. |
| Dual-autoregressive training (inner + outer autoregression). | Mapped | Implementation Analysis §4 to §5. |
| Multi-step prediction loss (Eq. 2). | Partially mapped | Synthesis §1; discrepancy on \(L_o\) form. |
| MBPO-PPO algorithm (Algorithm 1). | Mapped | Implementation Analysis §7. |
| Imagined action selection (Eq. 3). | Mapped | Implementation Analysis §9. |
| Autoregressive evaluation under noise injection (paper Fig. 3b). | Mapped | evaluate_system_dynamics(...) rolls out 100 trajectories of length 400 with noise scales [0.1, 0.2, 0.4, 0.5, 0.8], matching the noise levels in the paper figure. See Implementation Analysis §5.10. |
| Reward function (paper Section A.1.2). | Partially mapped | Synthesis §2; 11 reward terms active in code (10 inherited from upstream Isaac Lab plus stand_still added by project), with three weights overridden by the project. The paper-to-code term-by-term correspondence remains to be verified. |
| Policy observation versus system-state distinction. | Mapped | Local Pretrain-v0 run shows separate policy (48-dim) and system_state (45-dim) observation groups; the dynamics model predicts system_state, while the policy observation is reconstructed during imagination from predicted state, command, and previous action. See Implementation Analysis §4.5. |
| State mean predicted directly (paper does not specify) versus residual prediction (code). | Discrepancy noted | Synthesis §4.1. |
| State loss as Gaussian NLL (architecturally implied) versus sampled MSE (active code). | Discrepancy noted | Synthesis §4.2. |
| Imagination autoregressive horizon: 100 steps per iteration (paper) versus 24-step rollout (code). | Discrepancy in scale | Synthesis §4.3. |
| Uncertainty handling (RWM-U paper, not RWM paper) hooks present in code but switched off. | Discrepancy noted | Synthesis §4.4. |
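The autoregressive-evaluation row above can be sketched as follows. This is a minimal stand-in, not the upstream `evaluate_system_dynamics(...)`: the toy dynamics, the exact point where noise is injected, and the mean-absolute-error metric are all assumptions. Only the rollout structure (the model's own prediction fed back as input) and the constants (100 trajectories, horizon 400, the five noise scales) come from the ledger.

```python
import random

NOISE_SCALES = [0.1, 0.2, 0.4, 0.5, 0.8]  # noise levels matching paper Fig. 3b
NUM_TRAJECTORIES = 100                    # rollouts per noise level
HORIZON = 400                             # rollout length

def rollout_error(step_fn, true_traj, noise_scale, rng):
    """Autoregressive rollout: the model's own prediction is fed back as the
    next input, with Gaussian noise of the given scale injected at each step
    (where exactly the noise enters is an assumption here). Returns mean
    absolute error against the ground-truth trajectory."""
    state = list(true_traj[0])
    total = 0.0
    for t in range(1, len(true_traj)):
        noisy = [x + rng.gauss(0.0, noise_scale) for x in state]
        state = step_fn(noisy)
        total += sum(abs(p - q) for p, q in zip(state, true_traj[t])) / len(state)
    return total / (len(true_traj) - 1)

# Toy stand-in dynamics (not the RWM model): mild decay toward zero.
def decay(s):
    return [0.99 * x for x in s]

truth = [[1.0]]
for _ in range(1, HORIZON):
    truth.append(decay(truth[-1]))

for scale in NOISE_SCALES:
    rng = random.Random(0)
    mean_err = sum(
        rollout_error(decay, truth, scale, rng) for _ in range(NUM_TRAJECTORIES)
    ) / NUM_TRAJECTORIES
    print(f"noise {scale}: mean MAE over {NUM_TRAJECTORIES} rollouts = {mean_err:.4f}")
```

The diagnostic value of this setup is that error compounds through the feedback loop: a model that only looks good one step ahead degrades visibly as the noise scale rises, which is what the paper's figure probes.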
Notes¶
The execution claims marked "Not verified" are not failures; they are blocked on lab access, on checkpoint generation that itself requires lab access, or on hardware deployment that comes later in the project. The mapping claims marked "Discrepancy noted" are findings to surface to a supervisor as part of a faithful reproduction effort, not defects in the codebase.
The single execution claim that does qualify as an active blocker is the dynamics checkpoint dependency for Finetune-v0, which is documented in Checkpoint and Finetune Status along with the path forward.
The RWM-U row in the execution table is currently a structural placeholder. The corresponding paper analysis and implementation analysis will be added under a parallel uncertainty-aware/ folder once that work begins.
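For when the RWM-U row is activated, the penalty it refers to (MOPO-style: nonzero coefficient, correct sign) has roughly the following shape. This is a hedged sketch of the general MOPO idea, not the upstream hooks: the function name and the disagreement proxy (per-dimension population standard deviation across ensemble members, averaged) are assumptions.

```python
import statistics

def penalized_reward(reward, ensemble_predictions, penalty_coef):
    """MOPO-style pessimism: subtract (note the sign) a multiple of ensemble
    disagreement from the imagined reward, so the policy avoids regions where
    the ensemble_size > 1 members of the world model disagree. Names and the
    disagreement proxy are illustrative assumptions, not the upstream API."""
    dims = len(ensemble_predictions[0])
    disagreement = sum(
        statistics.pstdev(member[d] for member in ensemble_predictions)
        for d in range(dims)
    ) / dims
    return reward - penalty_coef * disagreement

# Identical predictions -> zero disagreement -> reward unchanged.
print(penalized_reward(1.0, [[0.0, 0.0], [0.0, 0.0]], 0.5))
# Disagreeing predictions -> reward is reduced.
print(penalized_reward(1.0, [[0.0], [2.0]], 0.5))
```

A sign error here (adding instead of subtracting the disagreement term) would reward the policy for seeking uncertain states, which is presumably why the ledger row calls out "with the correct sign" as part of the claim.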