Deep Stochastic State-Space Modeling for Occlusion-Robust Robotic Manipulation
Reliable robotic manipulation under visual occlusion requires a unified treatment of uncertainty, dynamics, and partial observability. Existing world models either rely on deterministic latent transitions, which become overconfident when observations are missing, or treat stochasticity heuristically without stability guarantees. This paper introduces a Deep Stochastic State-Space Model (Deep SSSM) that provides a probabilistic foundation for occlusion-robust prediction and control. The proposed framework unifies representation learning, uncertainty propagation, and stability within a single variational objective derived from the evidence lower bound. We further establish theoretical properties, showing that the learned latent dynamics achieve sublinear error growth \(\mathcal{O}(\sqrt{k})\) over a prediction horizon of \(k\) steps while maintaining calibrated uncertainty at long horizons. This formulation not only bridges latent stochastic modeling with control-theoretic stability but also lays the groundwork for future predictive and Bayesian model predictive control methods. Empirical validation on occlusion-rich manipulation tasks confirms that Deep~SSSM yields robust long-horizon prediction and improved uncertainty calibration compared to deterministic world models.