PlaneFormers: From Sparse View Planes to 3D Reconstruction

Samir Agarwala
Linyi Jin
Chris Rockwell
David F. Fouhey

University of Michigan

ECCV 2022


Given a sparse set of images, our method detects planes and predicts cameras. A Plane Transformer then produces plane correspondences and refined cameras, from which the scene can be reconstructed in 3D.

We present an approach for planar surface reconstruction of a scene from images with limited overlap. This reconstruction task is challenging because it requires jointly reasoning about single-image 3D reconstruction, correspondence between images, and the relative camera pose between images. Past work has proposed optimization-based approaches. We introduce a simpler approach, the PlaneFormer, that uses a transformer applied to 3D-aware plane tokens to perform 3D reasoning. Our experiments show that our approach is substantially more effective than prior work, and that several 3D-specific design decisions are crucial for its success.
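To illustrate why plane correspondence and relative camera pose have to be reasoned about together, the sketch below expresses a plane detected in one view in the coordinate frame of another view and compares it with a plane detected there. This is standard rigid-body geometry rather than the PlaneFormer itself; the function names, the plane parameterization `normal . x = offset`, and the pose convention `x_A = R @ x_B + t` are assumptions chosen for illustration.

```python
import numpy as np

def transform_plane(normal, offset, R, t):
    """Map the plane {x : normal . x = offset} from frame B to frame A.

    Assumed convention (hypothetical, for illustration): points transform
    as x_A = R @ x_B + t, with R a 3x3 rotation and t a 3-vector.
    """
    normal_a = R @ normal
    offset_a = offset + normal_a @ t
    return normal_a, offset_a

def plane_discrepancy(n1, d1, n2, d2):
    """Return (angle between unit normals in radians, absolute offset gap)."""
    cos_angle = np.clip(n1 @ n2, -1.0, 1.0)
    return float(np.arccos(cos_angle)), float(abs(d1 - d2))

# A plane seen in view B (unit normal along y, offset 1.5).
normal_b = np.array([0.0, 1.0, 0.0])
offset_b = 1.5

# Hypothetical relative pose: 90 degree rotation about the y axis plus a
# small vertical translation.
R = np.array([[0.0, 0.0, 1.0],
              [0.0, 1.0, 0.0],
              [-1.0, 0.0, 0.0]])
t = np.array([0.0, 0.2, 0.0])

normal_a, offset_a = transform_plane(normal_b, offset_b, R, t)

# A plane detected independently in view A; a small discrepancy suggests
# the two detections are the same physical surface.
angle, gap = plane_discrepancy(normal_a, offset_a,
                               np.array([0.0, 1.0, 0.0]), 1.7)
```

If the camera pose is wrong, planes that are physically the same surface will disagree after the transform, and if the correspondence is wrong, no pose will make them agree. That coupling is the reason the two problems are hard to solve separately.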




This work was supported by the DARPA Machine Common Sense Program. We would like to thank Richard Higgins and members of the Fouhey lab for helpful discussions and feedback.
