REST3D research resource · physically stable 3D scene reconstruction

REST3D: Reconstructing Physically Stable 3D Scenes from a Single Image

REST3D turns a single casual RGB image into a visually consistent and physically stable interactive 3D scene. REST3D focuses on simulation-ready digital assets, scene-tree reasoning, physics-constrained refinement, Isaac Gym simulation, and VR human-object interaction.

Authors: Xiaoxuan Ma · Jiashun Wang · Nicolás Ugrinovic · Yehonathan Litman · Kris Kitani
Carnegie Mellon University
arXiv:2605.30338 · 2026
Code status: coming soon

1
single RGB image

REST3D starts from one casual image, not a dense scan or multi-view capture.

3
pipeline stages

Scene-tree construction, scene initialization/canonicalization, and physics-constrained optimization.

93%+
reported stable rates

REST3D reports stable rates of 95.8% on Replica, 93.6% on ScanNet++, and 95.5% on the Custom scene set.

VR
real interaction demo

REST3D demonstrates hand-based human-object interaction with Meta Quest Pro and Isaac Gym.

What is REST3D?

REST3D is single-image 3D reconstruction built for physical stability, not just visual plausibility.

REST3D means REconstructing physically STable 3D scenes. The central idea is brutal and simple: a 3D scene that looks correct is not enough if objects float, intersect, explode under gravity, or collapse in a simulator. REST3D uses physical scene understanding and physics-constrained optimization so the reconstructed scene can behave like a usable digital asset.

Audience

REST3D searchers want proof, demos, code status, metrics, and a fast explanation of why physical stability matters.

Researchers

They look for the REST3D paper, method diagram, baselines, datasets, metrics, ablations, limitations, BibTeX, and reproducibility notes.

3D / VR / game creators

They care about whether REST3D can convert a casual image into simulation-ready assets, stable layouts, VR demos, and interactive scenes.

Robotics and embodied AI builders

They care about gravity, support relations, collision rate, stable rate, real-to-sim, Isaac Gym, object contact, and reliable manipulation scenes.

Important disambiguation

REST3D is a CMU computer-vision research framework, not a sleep supplement, mobile game, or generic 3D tool.

Searches for "REST3D" pull up several unrelated products and projects. To help visitors and search engines, we explicitly list what REST3D is NOT:

Not a sleep supplement

Anabolix Nutrition sells a product called REST3D ($69.95 sleep aid). This site has no affiliation with Anabolix Nutrition. REST3D refers to REconstructing Physically STable 3D scenes — an arXiv computer-vision paper by CMU authors.

Not a Unity game

Unity Play hosts a game named REST3D. This is unrelated to the REST3D research paper. REST3D.org focuses on single-image 3D scene reconstruction for simulation-ready assets.

Not ryanj/rest3d (GitHub)

A different GitHub repository called rest3d by ryanj provides a 3D client/server (MIT license), whose README references www.rest3d.org. This is a separate project. The official REST3D code repository is ShirleyMaxx/REST3D for the CMU paper on physically stable 3D scene reconstruction.

Not rest3d.wordpress.com

A WordPress blog ("3D for the REST of us") occupies rest3d.wordpress.com. REST3D.org is an independent research resource for the REST3D paper specifically.

This disambiguation helps search engines treat REST3D.org and the other uses of the name (the sleep supplement, the Unity game, the other GitHub repository, and the WordPress blog) as distinct entities with different search intents.

Abstract

REST3D abstract: Reconstructing Physically Stable 3D Scenes from a Single Image

Reconstructing physically stable 3D scenes from a single RGB image enables casual images to be converted into simulation-ready digital assets for applications such as immersive interaction and content creation. However, existing single-image reconstruction methods fall short in capturing the physical structure of a scene. As a result, they often produce geometrically plausible but physically inconsistent results, including object floating and penetration, which lead to unstable behavior in physics simulations. Image-conditioned scene generation methods improve physical plausibility but often rely on strong scene priors, yielding plausible yet inaccurate object arrangements that fail to match the input image. We propose REST3D, a single-image reconstruction framework that can REconstruct physically STable 3D scenes by integrating physical scene understanding with physics-constrained refinement. We first introduce an agentic physical scene understanding technique that constructs a scene-tree representation capturing object physical states and inter-object relationships from a gravity-support perspective, providing a structural prior for reconstruction. Leveraging this structure, we initialize the scene using image-to-3D models, followed by scene-tree-guided alignment and physics-constrained optimization to resolve physical violations while preserving visual consistency with the input image. Experiments show that our method significantly reduces physical errors and improves simulation stability on both synthetic and real-world datasets while maintaining strong reconstruction quality. We further demonstrate the reconstructed scenes in VR-based human-object interaction, showing their potential for immersive applications.

TL;DR

From a single casual image to a visually consistent and physically stable interactive 3D scene.

This is the REST3D promise in one sentence: the scene should not merely look plausible; it should settle, support objects, avoid severe interpenetration, and survive physics simulation.
The problem

Single-image 3D reconstruction often fails when gravity asks the obvious question: should this object stand, fall, or explode?

Object floating

Visually plausible reconstructions can place objects above their support surfaces. REST3D targets support consistency.

Object penetration

Objects may overlap in 3D. Under physics, that collision can cause explosive separation and unstable behavior.

Plausible but inaccurate generation

Image-conditioned scene generation can produce a physically plausible scene that does not match the input image. REST3D is designed to preserve visual consistency.

Pipeline

REST3D combines scene-tree construction, scene-tree-guided alignment, and physics-constrained optimization.

Scene-Tree Construction

REST3D infers a hierarchical scene tree that captures objects, physical states, and inter-object spatial support relationships from a gravity-support perspective.

Scene Initialization and Canonicalization

REST3D initializes object meshes using image-to-3D models, then uses the scene tree to correct global orientation and enforce coarse support constraints.

Physics-Constrained Optimization

REST3D refines object poses through simulation-based optimization to reduce floating, penetration, drift, and instability while preserving the input image layout.

Figure: REST3D pipeline. Input RGB image → Stage 1, Scene-Tree Construction (agentic physical scene understanding: VLM object detection, SAM segmentation, and verification) → Stage 2, Initialization & Canonicalization (image-to-3D models plus scene-tree alignment) → Stage 3, Physics-Constrained Optimization (Isaac Gym simulation plus CEM optimization) → stable scene.
Scene tree

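REST3D scene-tree construction models gravity-support relationships (on, hanging, attached to) between objects and their supports, which can be other objects or the ground, wall, ceiling, and ground-wall surfaces.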

A REST3D scene tree is not a decorative hierarchy. It is the structural prior that says which object supports which object: table on ground, plant on table, poster attached to wall, radiator supported by ground-wall. This is the hidden skeleton that lets REST3D keep visual reconstruction and physical behavior aligned.
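The support structure described above can be pictured as a small tree data structure. This is a minimal Python sketch, assuming a simple parent/child node layout; `SceneNode`, `add`, and `support_chain` are hypothetical names and not part of the REST3D code, which has not been released. The support types follow the labels used on this page (on, hanging, attached-to).

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Support types as labeled on this page; the example below uses "on" and "attached-to".
SUPPORT_TYPES = {"on", "hanging", "attached-to"}


@dataclass
class SceneNode:
    """One node of a gravity-support scene tree (hypothetical layout)."""

    name: str
    # Back-reference to the supporting node; excluded from repr/eq to avoid recursion.
    parent: SceneNode | None = field(default=None, repr=False, compare=False)
    support_type: str | None = None  # None only for the root node
    children: list[SceneNode] = field(default_factory=list)

    def add(self, name: str, support_type: str) -> SceneNode:
        """Attach a child that is supported by this node."""
        if support_type not in SUPPORT_TYPES:
            raise ValueError(f"unknown support type: {support_type!r}")
        child = SceneNode(name, parent=self, support_type=support_type)
        self.children.append(child)
        return child

    def support_chain(self) -> list[str]:
        """Names from this node up to the root, following support parents."""
        chain: list[str] = []
        node: SceneNode | None = self
        while node is not None:
            chain.append(node.name)
            node = node.parent
        return chain


# Example tree from the text: table on ground, plant on table, poster attached to wall.
room = SceneNode("room")
ground = room.add("ground", "on")
wall = room.add("wall", "attached-to")
table = ground.add("table", "on")
plant = table.add("plant", "on")
poster = wall.add("poster", "attached-to")

print(plant.support_chain())  # ['plant', 'table', 'ground', 'room']
```

Walking up `support_chain` lists every support an object depends on: the plant rests on the table, which in turn rests on the ground.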

Figure: REST3D scene tree, showing hierarchical support relations from a gravity-support perspective. Example from an input image: Room (root node) with Ground Surface (support type: on) and Wall Surface (support type: attached-to); Table (parent: ground, on); Radiator (parent: ground-wall, on); Poster (parent: wall, attached-to). Each object node has a support parent (another object or a surface: ground, wall, ceiling, or ground-wall) and a support type (on, hanging, or attached-to).
Physical scene understanding

REST3D uses agentic physical scene understanding to identify objects, segment instances, and reason about spatial support.

Open-vocabulary object list analysis

REST3D asks a vision-language model to identify distinct objects with descriptive attributes, not just coarse labels.

Agentic instance segmentation

REST3D uses a segmentation agent and verifier loop to refine prompts and masks for each object instance.

Spatial relationship reasoning

REST3D infers support parents and support types from a gravity-aware perspective.
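Because the support relations come from a vision-language model, a verifier step can reject malformed output before it becomes a scene tree. The sketch below assumes the model returns one record per object with `object`, `parent`, and `support_type` fields; that format and the `check_relations` function are hypothetical, not REST3D's actual interface. It flags unknown support types, objects with several parents, parents that do not exist, and support cycles.

```python
from __future__ import annotations

SURFACES = {"ground", "wall", "ceiling", "ground-wall"}
SUPPORT_TYPES = {"on", "hanging", "attached-to"}


def check_relations(relations: list[dict[str, str]]) -> list[str]:
    """Return problems found in VLM-inferred support relations (empty list = consistent).

    Each record is assumed to look like
    {"object": "plant", "parent": "table", "support_type": "on"}.
    """
    problems: list[str] = []
    parent_of: dict[str, str] = {}
    for rel in relations:
        obj, parent, kind = rel["object"], rel["parent"], rel["support_type"]
        if kind not in SUPPORT_TYPES:
            problems.append(f"{obj}: unknown support type {kind!r}")
        if obj in parent_of:
            problems.append(f"{obj}: more than one support parent")
        parent_of[obj] = parent

    # Parents must be structural surfaces or other listed objects.
    for obj, parent in parent_of.items():
        if parent not in SURFACES and parent not in parent_of:
            problems.append(f"{obj}: unknown parent {parent!r}")

    # Following parents must reach a surface; revisiting a node means a cycle
    # (this also catches an object listed as its own parent).
    for start in parent_of:
        seen = {start}
        node = parent_of[start]
        while node in parent_of:
            if node in seen:
                problems.append(f"{start}: support cycle")
                break
            seen.add(node)
            node = parent_of[node]
    return problems


good = [
    {"object": "table", "parent": "ground", "support_type": "on"},
    {"object": "plant", "parent": "table", "support_type": "on"},
    {"object": "poster", "parent": "wall", "support_type": "attached-to"},
]
bad = [
    {"object": "cup", "parent": "shelf", "support_type": "on"},
    {"object": "lamp", "parent": "lamp", "support_type": "floating"},
]
print(check_relations(good))  # []
print(check_relations(bad))
```

In an agentic loop, a non-empty problem list could be sent back to the model as feedback for another attempt.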

Initialization & canonicalization

REST3D initializes the 3D scene with image-to-3D models, then canonicalizes the layout so physics has a fighting chance.

REST3D starts with raw image-to-3D output, then uses the scene tree to correct coarse orientation, enforce support, and produce a structured initial scene. Canonicalization alone is not enough; it improves stability but still needs the full REST3D physics-constrained optimization stage.
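One simple way to picture "enforce coarse support" is to drop each object onto the top of its support parent along the gravity axis, visiting parents before children. This is a minimal sketch assuming z-up, axis-aligned vertical extents in metres; `Box` and `snap_on_relations` are hypothetical, only the `on` relation is handled, and the global orientation correction step is not shown.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Vertical extent of an object's bounding box (z-up, metres; assumed convention)."""

    z_min: float
    z_max: float

    def shift(self, dz: float) -> None:
        self.z_min += dz
        self.z_max += dz


def snap_on_relations(boxes, relations, ground_z=0.0):
    """Rest each 'on' child on top of its parent.

    `relations` is an ordered list of (child, parent, support_type) with parents
    listed before their children, so each child is snapped against its parent's
    already-corrected position.
    """
    for child, parent, kind in relations:
        if kind != "on":
            continue  # attached-to / hanging need wall or ceiling geometry; not handled here
        support_top = ground_z if parent == "ground" else boxes[parent].z_max
        box = boxes[child]
        box.shift(support_top - box.z_min)
    return boxes


# Raw layout: the table floats 5 cm above the ground and the plant's base sits inside the tabletop.
boxes = {"table": Box(0.05, 0.80), "plant": Box(0.70, 1.00)}
snap_on_relations(boxes, [("table", "ground", "on"), ("plant", "table", "on")])
print(boxes["table"], boxes["plant"])
```

After snapping, the table spans 0.0 to 0.75 m and the plant starts at the new tabletop height; a physics stage is still needed because bounding boxes ignore actual contact geometry.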

Physics optimization

REST3D physics-constrained optimization resolves physical violations while preserving visual consistency.

Local group optimization

REST3D decomposes complex scenes according to the scene tree and optimizes smaller support groups so crowded scenes can converge more reliably.
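The page does not spell out how the scene is split. One plausible decomposition, assumed here rather than taken from the paper, makes each object that rests directly on a structural surface the root of a group containing everything it transitively supports. `support_groups` is a hypothetical helper and expects an already validated, acyclic tree.

```python
from collections import defaultdict

SURFACES = {"ground", "wall", "ceiling", "ground-wall"}


def support_groups(parent_of: dict) -> dict:
    """Group objects under the object that rests directly on a structural surface.

    `parent_of` maps each object to its support parent (an object or a surface name).
    The support relations must be acyclic, or the walk below will not terminate.
    """
    children = defaultdict(list)
    for obj, parent in parent_of.items():
        children[parent].append(obj)

    groups = {}
    for root, parent in parent_of.items():
        if parent not in SURFACES:
            continue
        members, stack = [], [root]
        while stack:  # depth-first walk over everything this root supports
            node = stack.pop()
            members.append(node)
            stack.extend(children[node])
        groups[root] = members
    return groups


scene = {
    "table": "ground", "plant": "table", "book": "table", "lamp": "book",
    "sofa": "ground", "poster": "wall",
}
print(support_groups(scene))
# {'table': ['table', 'book', 'lamp', 'plant'], 'sofa': ['sofa'], 'poster': ['poster']}
```

Each group can then be optimized on its own, so a crowded tabletop does not have to converge jointly with unrelated furniture elsewhere in the room.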

Global group optimization

REST3D then refines the whole scene to reduce collision, drift, velocity, and instability under simulated gravity.
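The pipeline figure names CEM (the cross-entropy method) as the Stage 3 optimizer. The sketch below shows plain CEM over one object's position (x, y, z) under stated assumptions: REST3D reportedly scores candidates with Isaac Gym simulation, whereas `toy_cost` here is a hypothetical quadratic that penalizes height error against a support surface (a stand-in for floating and penetration) plus distance from an image-implied horizontal position (a stand-in for visual consistency). All constants and names are illustrative.

```python
import random
import statistics


def cem(cost, mean, std, iters=50, samples=100, elites=10, min_std=1e-4, seed=0):
    """Cross-entropy method with a diagonal Gaussian sampling distribution."""
    rng = random.Random(seed)
    mean, std = list(mean), list(std)
    for _ in range(iters):
        population = [[rng.gauss(m, s) for m, s in zip(mean, std)] for _ in range(samples)]
        population.sort(key=cost)  # lowest cost first
        elite = population[:elites]
        columns = list(zip(*elite))  # one tuple per pose dimension
        mean = [statistics.fmean(c) for c in columns]
        std = [max(statistics.pstdev(c), min_std) for c in columns]
    return mean


# Hypothetical toy problem standing in for an Isaac Gym rollout.
SUPPORT_TOP = 0.75        # table-top height in metres (assumed)
IMAGE_XY = (1.20, -0.40)  # horizontal position implied by the image (assumed)


def toy_cost(pose):
    x, y, z = pose
    physics = 2.0 * (z - SUPPORT_TOP) ** 2                    # floating / penetration stand-in
    image = (x - IMAGE_XY[0]) ** 2 + (y - IMAGE_XY[1]) ** 2   # visual-consistency stand-in
    return physics + image


best = cem(toy_cost, mean=[1.25, -0.35, 0.90], std=[0.3, 0.3, 0.3])
print([round(v, 3) for v in best])
```

CEM needs only cost evaluations, not gradients, which is why it pairs naturally with a black-box physics simulator.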

Figure: Before REST3D optimization (left), objects float and collide, causing explosive, unstable behavior. After REST3D optimization (right), the physical violations are resolved and the scene is stable and simulation-ready.
Simulation-ready assets

REST3D targets simulation-ready digital assets for immersive interaction, content creation, gaming, and embodied AI.

The core promise of REST3D is practical conversion: one casual image becomes a 3D scene with object meshes and a world-frame layout that can be imported into a physics simulator. Producing simulation-ready digital assets, rather than only plausible-looking geometry, is what separates REST3D from ordinary image-to-3D reconstruction.

Interactive simulation

REST3D Interactive 3D Physics Simulation in Isaac Gym

Explore the physics simulation of reconstructed scenes in Isaac Gym. On the project page, users can rotate the view by dragging, zoom by scrolling, press Play or Reset, adjust the playback speed, and compare methods side by side in sync.

Controls included

▶ Play · ↻ Reset · Speed · Run Simulation (click or press Space)

Methods included

Input Image · Ours · DigitalCousins · Gen3DSR · SceneGen · SAM3D.

Scenes: Painting · Simpson Room · ScanNet++ 1 · ScanNet++ 2 · Room 1 · Room 4 · Room 5 · WorldLab · Replica
Interactive 3D physics simulation in Isaac Gym — click any scene to view its simulation video. Use Play, Reset, Speed controls on the official project page.
Baseline failures

REST3D highlights why baseline methods can explosively separate when gravity is applied.

Due to object interpenetration in baseline methods, applying gravity in a physics simulator can cause objects to explosively separate and become unstable. REST3D is built around the opposite expectation: reconstructed scenes should quickly settle into stable states.

Results

REST3D results show high-resolution physics simulation of reconstructed scenes in Isaac Gym.

Objects are placed sequentially for clarity and then simulated jointly. REST3D reconstructed scenes are simulation-ready and quickly settle into stable states.

Scenes: Simpson Room · Room 0 · Room 1 · Room 2 · Replica · ScanNet++
Figure: REST3D reconstructed scenes in Isaac Gym — objects settle stably under gravity. Click to view full video on the official project page.
VR interaction

REST3D reconstructs an immersive and physically grounded 3D scene for VR hand-based interaction.

REST3D includes an interactive VR system that reconstructs an immersive, physically grounded 3D scene from a single image, enabling users to naturally interact with stable virtual objects through hand-based interactions. The demo was recorded with Meta Quest Pro and played back at 3× speed.

In the paper, hand motions are tracked and mapped to a dexterous robotic hand in Isaac Gym, with the simulation rendered back to a VR headset.
Figure: REST3D VR interaction pipeline. Meta Quest Pro hand tracking → motion mapping to a dexterous robotic hand → Isaac Gym physics simulation at 30 FPS → VR rendering, yielding stable human-object interaction. Real-time hand tracking drives the robotic hand in Isaac Gym, and the simulation is rendered back to the headset.
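The motion-mapping step turns tracked human hand poses into joint targets for the simulated robot hand. The page does not describe the mapping, so this is a hypothetical per-joint sketch: angles are scaled, joints the robot lacks are dropped, and results are clamped to the robot's joint limits. `JointLimit`, `retarget`, and the joint names are assumptions, not REST3D's interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JointLimit:
    lower: float  # radians
    upper: float  # radians


def retarget(human_angles, limits, scale=1.0):
    """Map tracked human joint angles (radians) to robot-hand joint targets.

    Only joints listed in `limits` are produced; human joints the robot lacks are
    ignored, and robot joints missing from the tracking data default to 0.0.
    Every target is then clamped to its joint's limits.
    """
    targets = {}
    for joint, limit in limits.items():
        angle = scale * human_angles.get(joint, 0.0)
        targets[joint] = min(max(angle, limit.lower), limit.upper)
    return targets


# Hypothetical joint names and limits for a dexterous hand.
limits = {"index_pip": JointLimit(0.0, 1.6), "thumb_mcp": JointLimit(-0.5, 0.9)}
tracked = {"index_pip": 1.9, "thumb_mcp": -0.2, "wrist_roll": 0.3}
print(retarget(tracked, limits))  # {'index_pip': 1.6, 'thumb_mcp': -0.2}
```

At the 30 FPS shown in the figure, each tracking-to-render iteration has a budget of about 33 ms, so the mapping itself should stay cheap.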
VR demos: hand-based human-object interaction recorded with Meta Quest Pro (played back at 3× speed).
SOTA comparison

REST3D compares against DigitalCousins, Gen3DSR, SceneGen, and SAM3D.

The REST3D comparison focuses on physics simulation of reconstructed scenes in Isaac Gym. Existing methods struggle to balance reconstruction fidelity and physical stability, while REST3D produces stable, simulation-ready scenes that settle with only minor adjustments.

Scenes: Painting · Simpson Room · Room 1 · Room 2 · Room 3 · Room 4 · Room 5 · Room 6 · WorldLab 1 · WorldLab 2 · Replica · ScanNet++ 1 · ScanNet++ 2 · ScanNet++ 3
SOTA comparison on Simpson Room — click each thumbnail to view the physics simulation video.
Figure: REST3D vs. SOTA methods from a single RGB input image. DigitalCousins: retrieval mismatch. Gen3DSR: unstable under physics simulation. SceneGen: plausible but inaccurate. SAM3D: object-level only. REST3D (Ours): input-faithful and physically stable. REST3D outperforms the baselines on physical stability while preserving visual consistency with the input image.
Physical metrics

REST3D metric snapshot: low collision, high stability, low drift.

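| Dataset | Method | Failure rate ↓ | Collision rate ↓ | Stable rate ↑ | Position drift ↓ | Peak linear velocity ↓ | Peak angular velocity ↓ |
|---|---|---|---|---|---|---|---|
| Replica | REST3D (Ours) | 0.0% | 0.0% | 95.8% | 0.094 m | 0.152 m/s | 0.557 rad/s |
| ScanNet++ | REST3D (Ours) | 0.0% | 5.9% | 93.6% | 0.080 m | 0.159 m/s | 1.039 rad/s |
| Custom | REST3D (Ours) | 0.0% | 1.2% | 95.5% | 0.017 m | 0.140 m/s | 0.468 rad/s |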
Figure: Physical stability metrics by dataset. Stable rate: 95.8% (Replica), 93.6% (ScanNet++), and 95.5% (Custom), averaging 95.0%; failure rate 0.0% on all three datasets; collision rate at most 5.9% (ScanNet++). REST3D reaches stable rates above 93% on every dataset with no failures and low collision rates.
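The 95.0% average matches an unweighted mean of the three per-dataset stable rates, assuming each dataset counts equally rather than each scene:

```latex
\bar{s} = \frac{95.8\% + 93.6\% + 95.5\%}{3} = \frac{284.9\%}{3} \approx 94.97\% \approx 95.0\%
```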
Datasets

REST3D is evaluated on synthetic Replica, real-world ScanNet++, and a challenging Custom set.

Replica

A synthetic dataset with ground-truth scene meshes, used for physical metrics and geometric metrics.

ScanNet++

A real-world dataset covering scenes such as meeting rooms, classrooms, and offices.

Custom casual images

A harder set including bedrooms, living rooms, and cartoon-style scenes to test REST3D robustness.

Evaluation

REST3D reports physical plausibility and geometric reconstruction quality.

Physical metrics

Failure rate, collision rate, stable rate, position drift, peak linear velocity, and peak angular velocity.
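The page lists the physical metrics but not their exact definitions or thresholds. The sketch below computes plausible per-object versions from a simulated trajectory: position drift as the distance between the first and last positions, and peak linear and angular velocity as the largest speed magnitudes. The stability rule (drift below `drift_tol` and near-zero final speed) and both tolerance values are hypothetical, as are `Trajectory`, `object_metrics`, and `stable_rate`.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Trajectory:
    """Per-step state of one object during a gravity rollout (assumed layout)."""

    positions: list[tuple[float, float, float]]           # metres
    linear_velocities: list[tuple[float, float, float]]   # m/s
    angular_velocities: list[tuple[float, float, float]]  # rad/s


def object_metrics(traj: Trajectory, drift_tol: float = 0.05, speed_tol: float = 0.01) -> dict:
    drift = math.dist(traj.positions[0], traj.positions[-1])
    peak_linear = max(math.hypot(*v) for v in traj.linear_velocities)
    peak_angular = max(math.hypot(*w) for w in traj.angular_velocities)
    # Hypothetical stability rule: little drift and (nearly) at rest at the end.
    final_speed = math.hypot(*traj.linear_velocities[-1])
    stable = drift < drift_tol and final_speed < speed_tol
    return {"drift": drift, "peak_linear": peak_linear,
            "peak_angular": peak_angular, "stable": stable}


def stable_rate(trajs: list[Trajectory], **tolerances: float) -> float:
    """Fraction of objects judged stable under the hypothetical rule above."""
    flags = [object_metrics(t, **tolerances)["stable"] for t in trajs]
    return sum(flags) / len(flags)


settles = Trajectory(
    positions=[(0.0, 0.0, 1.00), (0.0, 0.0, 0.99), (0.0, 0.0, 0.98)],
    linear_velocities=[(0.0, 0.0, -0.10), (0.0, 0.0, -0.05), (0.0, 0.0, 0.0)],
    angular_velocities=[(0.0, 0.0, 0.0), (0.3, 0.4, 0.0), (0.0, 0.0, 0.0)],
)
topples = Trajectory(
    positions=[(0.0, 0.0, 1.00), (0.2, 0.0, 0.70), (0.4, 0.0, 0.30)],
    linear_velocities=[(0.0, 0.0, 0.0), (1.0, 0.0, -1.5), (0.8, 0.0, -1.2)],
    angular_velocities=[(0.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 1.5, 0.0)],
)
print(object_metrics(settles))
print(stable_rate([settles, topples]))  # 0.5
```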

Geometric metrics

Chamfer Distance, F-Score at a distance threshold, and B-IoU are used when ground-truth meshes exist.
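For reference, the sketch below shows common textbook forms of Chamfer Distance and F-Score on point clouds sampled from meshes. Conventions vary (squared vs. unsquared distances, summing vs. averaging the two directions, the value of the threshold `tau`), and the page does not say which variant REST3D uses; the functions are hypothetical helpers, and B-IoU is not shown.

```python
import math


def _nearest(p, cloud):
    """Distance from point p to its nearest neighbour in cloud (brute force)."""
    return min(math.dist(p, q) for q in cloud)


def chamfer_distance(pred, gt):
    """Symmetric Chamfer distance: mean nearest-neighbour distance, summed over both directions.

    This uses unsquared distances; some papers square them or average the two terms.
    """
    to_gt = sum(_nearest(p, gt) for p in pred) / len(pred)
    to_pred = sum(_nearest(g, pred) for g in gt) / len(gt)
    return to_gt + to_pred


def f_score(pred, gt, tau):
    """Harmonic mean of precision and recall at distance threshold tau."""
    precision = sum(_nearest(p, gt) < tau for p in pred) / len(pred)
    recall = sum(_nearest(g, pred) < tau for g in gt) / len(gt)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.1), (5.0, 0.0, 0.0)]  # last point is a stray outlier
print(chamfer_distance(pred, gt))
print(f_score(pred, gt, tau=0.2))
```

The outlier inflates the Chamfer Distance heavily but only lowers precision in the F-Score, which is why the two metrics are usually reported together.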

Alignment

Replica and ScanNet++ reconstructions are aligned to ground truth with ICP before geometric evaluation.
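Point-to-point ICP alternates between matching each reconstructed point to its nearest ground-truth point and solving for the best rigid transform (the Kabsch algorithm via SVD). This is a minimal NumPy sketch with brute-force matching; the exact ICP variant REST3D uses, including whether scale is also estimated, is not stated on this page, and `kabsch` and `icp` are hypothetical helpers.

```python
import numpy as np


def kabsch(src: np.ndarray, dst: np.ndarray) -> tuple:
    """Least-squares rotation r and translation t with r @ src_i + t ≈ dst_i."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    h = (src - mu_s).T @ (dst - mu_d)
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # flip the last axis if SVD yields a reflection
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return r, mu_d - r @ mu_s


def icp(src: np.ndarray, dst: np.ndarray, iters: int = 30, tol: float = 1e-9) -> np.ndarray:
    """Point-to-point ICP; returns a copy of src rigidly aligned to dst."""
    cur = src.copy()
    prev_err = np.inf
    for _ in range(iters):
        # Squared distances between every current point and every target point.
        d2 = ((cur[:, None, :] - dst[None, :, :]) ** 2).sum(axis=-1)
        idx = d2.argmin(axis=1)  # nearest target for each source point
        r, t = kabsch(cur, dst[idx])
        cur = cur @ r.T + t
        err = np.sqrt(d2[np.arange(len(cur)), idx]).mean()
        if abs(prev_err - err) < tol:
            break
        prev_err = err
    return cur


# Toy check: a 3x3x3 grid, rotated 5 degrees about z and shifted, is aligned back.
grid = np.array([[x, y, z] for x in range(3) for y in range(3) for z in range(3)], dtype=float)
theta = np.deg2rad(5.0)
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta), np.cos(theta), 0.0],
                  [0.0, 0.0, 1.0]])
moved = grid @ rot_z.T + np.array([0.05, -0.03, 0.02])
aligned = icp(moved, grid)
print(np.abs(aligned - grid).max())
```

Brute-force matching costs O(N·M) time and memory; a k-d tree such as `scipy.spatial.cKDTree` is the usual replacement for full-size scans.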

vs DigitalCousins

REST3D differs from DigitalCousins by emphasizing input-faithful reconstruction plus physics stability.

DigitalCousins-style approaches can improve physical plausibility by retrieving and assembling 3D assets, but retrieval can be constrained by the asset database and may yield mismatched objects. REST3D instead uses image-to-3D priors and physics-constrained refinement to preserve visual consistency while reducing physical errors.

vs Gen3DSR

REST3D targets physical stability beyond divide-and-conquer scene reconstruction.

Gen3DSR is a strong single-view 3D scene reconstruction baseline. REST3D compares to Gen3DSR and focuses on the failure mode that matters in simulation: a scene can be reconstructed but still physically unstable under gravity.

vs SceneGen

REST3D prioritizes physical consistency with the observed image, while scene generation can trade accuracy for plausibility.

SceneGen-style methods synthesize multiple 3D assets and positions from a single scene image. REST3D argues that generation priors can be physically plausible yet inaccurate relative to the input. REST3D is framed as reconstruction: match the image and obey physics.

vs SAM3D

REST3D pushes beyond object-level reconstruction toward scene-level physical validity.

SAM3D can recover high-fidelity individual objects, but scene-level reconstruction also needs global orientation, wall attachment, support, collision handling, and stable contacts. REST3D explicitly focuses on those scene-level physical constraints.

Use cases

REST3D use cases cluster around interactive 3D, VR, game content, robotics, and real-to-sim.

Content creation and gaming

REST3D can become a reference point for converting casual images into stable, editable, simulation-ready scenes for immersive production.

Tools

REST3D.org keeps every high-intent resource one click away.

Paper tools

arXiv, abstract, citation, BibTeX, author links, project page, publication status, and release notes.

Demo tools

Interactive 3D, Play, Reset, Speed, synchronized method comparison, high-resolution videos, and VR demos.

Reproducibility tools

GitHub repository, code status, datasets, baselines, metrics, implementation details, limitations, and future work.

Keyword map

REST3D keyword clusters for headings, internal anchors, meta tags, and long-tail search coverage.

Core keyword

REST3D, REST3D.org, REST3D paper, REST3D arXiv, REST3D code, REST3D GitHub, REST3D demo, REST3D citation.

Long-tail keywords

REST3D reconstructing physically stable 3D scenes from a single image; REST3D single RGB image to simulation-ready 3D assets; REST3D physics-constrained optimization; REST3D scene-tree construction.

Surrounding keywords

single image 3D reconstruction, physically plausible 3D scene, Isaac Gym 3D simulation, VR human-object interaction, DigitalCousins, Gen3DSR, SceneGen, SAM3D, object penetration, object floating.

Visual system

REST3D.org uses a Material Design 3 token-driven palette with M3 baseline colors, WCAG AAA contrast, and surface container hierarchy for spatial depth.

The REST3D audience expects a technical research interface, not a lifestyle landing page. This design uses Google Material Design 3 tokens (--md-sys-color-*) for all colors, M3 surface container hierarchy for depth, primary for interactive elements, tertiary for AI cues, and error/success/warning for semantic status. The page includes light/dark theme toggle, forced-colors support, and high-contrast text conforming to WCAG AAA standards for vision accessibility.

Primary

#D0BCFF (dark) / #6750A4 (light) for REST3D interactive highlights, links, and CTAs.

Surface hierarchy

M3 container-lowest, low, container, high, highest — luminance stepping for depth.

Tertiary

#EFB8C8 (dark) / #7D5260 (light) for AI and generation accents.

Semantic status

Success #81C784, Warning #FFD54F, Error #F2B8B5 — WCAG AAA compliant.

Limitations

REST3D is strong, but not magic: VLM robustness and deformable objects remain future-work territory.

REST3D relies on the robustness of vision-language models for physical scene understanding and may fail in challenging cases. The current REST3D paper focuses on rigid objects and does not explicitly model deformable or non-rigid objects, leaving those cases for future work.

Code & reproducibility

The REST3D code repository exists, but its public README currently says "Code coming soon."

The implementation has not been released yet, so this site does not describe it as available. Check the GitHub repository for the current code-release status, and star or watch it to follow updates.
Citation

Cite REST3D: Reconstructing Physically Stable 3D Scenes from a Single Image

@article{ma2026rest3d,
  title     = {REST3D: Reconstructing Physically Stable 3D Scenes from a Single Image},
  author    = {Ma, Xiaoxuan and Wang, Jiashun and Ugrinovic, Nicol\'{a}s and Litman, Yehonathan and Kitani, Kris},
  journal   = {arXiv preprint arXiv:2605.30338},
  year      = {2026}
}
Acknowledgement

REST3D acknowledgement

The authors would like to thank Yuxuan Kuang, Yufei Wang, and Maxwell Jones for their insightful discussions.

FAQ

REST3D FAQ for searchers, researchers, creators, and builders.

What is REST3D?

REST3D is a single-image reconstruction framework that reconstructs physically stable 3D scenes by integrating physical scene understanding with physics-constrained refinement.

What does REST3D stand for?

REST3D expands as REconstructing physically STable 3D scenes.

What is the main REST3D difference from ordinary image-to-3D?

REST3D focuses on scene-level physical plausibility: support relations, collision reduction, stability under gravity, and simulation-ready behavior.

Is REST3D code available?

The REST3D GitHub repository exists, but the current public README says "Code coming soon." This site links to the repository without claiming a released implementation.

What are the REST3D baselines?

REST3D compares against DigitalCousins, Gen3DSR, SceneGen, and SAM3D.

What are the REST3D datasets?

REST3D evaluates on Replica, ScanNet++, and a Custom set with casual images including bedrooms, living rooms, and cartoon-style scenes.

What are the REST3D applications?

REST3D is relevant to immersive interaction, content creation, gaming, simulation-ready assets, robotics, embodied AI, real-to-sim, and VR human-object interaction.

Glossary

REST3D glossary: terms users search after they understand the headline.

Scene tree

A support-relation structure that represents which objects are on, hanging from, or attached to other objects or surfaces.

Physics-constrained optimization

Simulation-based refinement that moves object poses toward stable, low-collision configurations while preserving the input layout.

Stable rate

A physical plausibility metric: the percentage of reconstructions that settle into stable states under physics simulation.

Sources

REST3D primary links