Reproduction Status¶
This page is the per-claim ledger for the project. It separates execution claims (whether the code has been run end-to-end on local or lab hardware) from mapping claims (whether the paper's content has been identified in the code). The two axes are independent and use the convention vocabulary from the landing page.
For the procedural and architectural detail behind each row, follow the cross-references.
Execution claims¶
What has been executed, on which hardware, with what outcome.
| Claim | Status | Evidence | Required to advance |
|---|---|---|---|
| Local Isaac Lab and RWM stack runs headlessly. | Validated | Baseline task reaches PPO learning iterations (manifests/baseline_state_*.txt). | None for local validation. |
| Baseline ANYmal-D task Init-v0 executes locally. | Validated | See Baseline Execution. | None. |
| RWM pretraining pipeline executes end-to-end at reduced scale on local hardware and writes checkpoints. | Validated at reduced scale | Reduced-scale Pretrain-v0 run produced model_0.pt, model_1.pt. See World-Model Pretraining Check. | Lab-scale pretraining for full validation. |
| Default RWM pretraining configuration runs on local hardware. | Not verified | Default config exceeds 8 GB VRAM during system-dynamics loss. | Lab workstation with sufficient VRAM. |
| RWM pretraining at the paper's full scale (\(M=32\), \(N=8\), batch 1024, 2500 iterations) on lab hardware. | Not verified | Lab access not yet exercised. | Lab workstation execution. |
| Trained dynamics model has useful predictive accuracy on held-out trajectories. | Not verified | Reduced-scale run was not configured for predictive performance. | Full pretrain plus held-out forecast evaluation. |
| Imagination-based finetuning (Finetune-v0) runs end-to-end. | Not verified | Upstream config requires a pretrained dynamics checkpoint that does not exist locally. See Checkpoint and Finetune Status. | Generate or obtain a valid dynamics checkpoint and a project-specific finetune config. |
| MBPO-PPO produces a policy whose performance matches the paper's reported 0.90 ± 0.04 velocity-tracking reward. | Not verified | No finetuning has been run. | Finetune execution and benchmark evaluation. |
| RWM-U + MOPO-PPO configuration (ensemble_size > 1, nonzero uncertainty penalty with the correct sign) runs end-to-end. | Not verified | Path not yet exercised. The hooks exist in code but have not been activated. | Activate config, validate at reduced scale, then full scale. |
| Zero-shot deployment of trained policy on hardware. | Not verified | No trained policy has been deployed. | Trained MBPO-PPO policy plus hardware availability. |
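The VRAM overflow noted in the table (default config exceeding 8 GB during the system-dynamics loss) is consistent with activation memory that grows with every dimension of the autoregressive unroll. The following is a back-of-envelope sketch only, not a measurement: the hidden size and per-value byte count are assumed placeholders, and real usage depends on the model's layers and optimizer state. Only the scaling behavior, linear in \(M\), \(N\), and batch size, is the point.

```python
def rough_activation_cost(M, N, batch, hidden=256, bytes_per_value=4):
    """Rough proxy for activations retained across the M x N autoregressive
    unroll at a given batch size. `hidden` and `bytes_per_value` are
    assumptions, not the real RWM dimensions; only the linear scaling in
    each of M, N, and batch is meaningful here."""
    return M * N * batch * hidden * bytes_per_value

# Paper-scale unroll settings from the table: M=32, N=8, batch 1024.
paper_scale = rough_activation_cost(32, 8, 1024)
half_batch = rough_activation_cost(32, 8, 512)
print(paper_scale, half_batch)  # halving the batch halves the proxy
```

Under this proxy, reducing any one of the three unroll dimensions buys a proportional reduction in retained activations, which is why a reduced-scale Pretrain-v0 run fits locally while the default config does not.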
Mapping claims¶
What has been identified in the code as a counterpart to a paper claim.
| Paper claim | Status | Cross-reference |
|---|---|---|
| RWM world model architecture (GRU base, MLP heads). | Mapped | Implementation Analysis §4. |
| Dual-autoregressive training (inner + outer autoregression). | Mapped | Implementation Analysis §4 to §5. |
| Multi-step prediction loss (Eq. 2). | Partially mapped | Synthesis §1; discrepancy on \(L_o\) form. |
| MBPO-PPO algorithm (Algorithm 1). | Mapped | Implementation Analysis §7. |
| Imagined action selection (Eq. 3). | Mapped | Implementation Analysis §9. |
| Autoregressive evaluation under noise injection (paper Fig. 3b). | Mapped | evaluate_system_dynamics(...) rolls out 100 trajectories of length 400 with noise scales [0.1, 0.2, 0.4, 0.5, 0.8], matching the noise levels in the paper figure. See Implementation Analysis §5.10. |
| Reward function (paper Section A.1.2). | Partially mapped | Synthesis §2; 11 reward terms active in code (10 inherited from upstream Isaac Lab plus stand_still added by project), with three weights overridden by the project. The paper-to-code term-by-term correspondence remains to be verified. |
| Policy observation versus system-state distinction. | Mapped | Local Pretrain-v0 run shows separate policy (48-dim) and system_state (45-dim) observation groups; the dynamics model predicts system_state, while the policy observation is reconstructed during imagination from predicted state, command, and previous action. See Implementation Analysis §4.5. |
| State mean predicted directly (paper does not specify) versus residual prediction (code). | Discrepancy noted | Synthesis §4.1. |
| State loss as Gaussian NLL (architecturally implied) versus sampled MSE (active code). | Discrepancy noted | Synthesis §4.2. |
| Imagination autoregressive horizon: 100 steps per iteration (paper) versus 24-step rollout (code). | Discrepancy in scale | Synthesis §4.3. |
| Uncertainty handling (RWM-U paper, not RWM paper) hooks present in code but switched off. | Discrepancy noted | Synthesis §4.4. |
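The autoregressive-evaluation row above can be sketched as follows. This is a minimal stand-in, not the upstream `evaluate_system_dynamics(...)`: the toy dynamics, the exact point where noise is injected, and the mean-absolute-error metric are all assumptions. Only the rollout structure (the model's own prediction fed back as input) and the constants (100 trajectories, horizon 400, the five noise scales) come from the ledger.

```python
import random

NOISE_SCALES = [0.1, 0.2, 0.4, 0.5, 0.8]  # noise levels matching paper Fig. 3b
NUM_TRAJECTORIES = 100                    # rollouts per noise level
HORIZON = 400                             # rollout length

def rollout_error(step_fn, true_traj, noise_scale, rng):
    """Autoregressive rollout: the model's own prediction is fed back as the
    next input, with Gaussian noise of the given scale injected at each step
    (where exactly the noise enters is an assumption here). Returns mean
    absolute error against the ground-truth trajectory."""
    state = list(true_traj[0])
    total = 0.0
    for t in range(1, len(true_traj)):
        noisy = [x + rng.gauss(0.0, noise_scale) for x in state]
        state = step_fn(noisy)
        total += sum(abs(p - q) for p, q in zip(state, true_traj[t])) / len(state)
    return total / (len(true_traj) - 1)

# Toy stand-in dynamics (not the RWM model): mild decay toward zero.
def decay(s):
    return [0.99 * x for x in s]

truth = [[1.0]]
for _ in range(1, HORIZON):
    truth.append(decay(truth[-1]))

for scale in NOISE_SCALES:
    rng = random.Random(0)
    mean_err = sum(
        rollout_error(decay, truth, scale, rng) for _ in range(NUM_TRAJECTORIES)
    ) / NUM_TRAJECTORIES
    print(f"noise {scale}: mean MAE over {NUM_TRAJECTORIES} rollouts = {mean_err:.4f}")
```

The diagnostic value of this setup is that error compounds through the feedback loop: a model that only looks good one step ahead degrades visibly as the noise scale rises, which is what the paper's figure probes.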
Notes¶
The execution claims marked "Not verified" are not failures; they are blocked on lab access, on checkpoint generation that itself requires lab access, or on hardware deployment that comes later in the project. The mapping claims marked "Discrepancy noted" are findings to surface to a supervisor as part of a faithful reproduction effort, not defects in the codebase.
The single execution claim that does qualify as an active blocker is the dynamics checkpoint dependency for Finetune-v0, which is documented in Checkpoint and Finetune Status along with the path forward.
The RWM-U row in the execution table is currently a structural placeholder. The corresponding paper analysis and implementation analysis will be added under a parallel uncertainty-aware/ folder once that work begins.
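For when the RWM-U row is activated, the penalty it refers to (MOPO-style: nonzero coefficient, correct sign) has roughly the following shape. This is a hedged sketch of the general MOPO idea, not the upstream hooks: the function name and the disagreement proxy (per-dimension population standard deviation across ensemble members, averaged) are assumptions.

```python
import statistics

def penalized_reward(reward, ensemble_predictions, penalty_coef):
    """MOPO-style pessimism: subtract (note the sign) a multiple of ensemble
    disagreement from the imagined reward, so the policy avoids regions where
    the ensemble_size > 1 members of the world model disagree. Names and the
    disagreement proxy are illustrative assumptions, not the upstream API."""
    dims = len(ensemble_predictions[0])
    disagreement = sum(
        statistics.pstdev(member[d] for member in ensemble_predictions)
        for d in range(dims)
    ) / dims
    return reward - penalty_coef * disagreement

# Identical predictions -> zero disagreement -> reward unchanged.
print(penalized_reward(1.0, [[0.0, 0.0], [0.0, 0.0]], 0.5))
# Disagreeing predictions -> reward is reduced.
print(penalized_reward(1.0, [[0.0], [2.0]], 0.5))
```

A sign error here (adding instead of subtracting the disagreement term) would reward the policy for seeking uncertain states, which is presumably why the ledger row calls out "with the correct sign" as part of the claim.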