Capturing high resolution stereoscopic panorama images with a single camera

Written by Paul Bourke
November 2024

Introduction

The photographic capture of so-called "Omnidirectional Stereoscopic Panoramas" (ODSP) was proposed by various authors at least 25 years ago. A working camera rig was perfected by Seitz with their dual head Roundshot film camera. The author has subsequently built various dual digital camera versions. These rigs all share two features: they comprise two digital cameras along with a motor for smooth rotation.

Seitz dual Roundshot

GH5/6 rig

Ximea camera housing

More than 20 years ago Peleg and Ben-Ezra proposed solutions based upon a single camera. It seems counter-intuitive that one could generate a left and right image using just one camera. To understand how, consider the standard ODSP dual camera geometry below. Each camera rotates about a central axis. Many photographs are taken, each at a small angular increment, or more commonly video is recorded. Note that this is the same geometry used when ODSP pairs of 3D scenes are generated digitally using computer graphics rendering methods.

From each frame of the video, a narrow slit is extracted and all these slits are stacked next to each other to form a left and right eye panorama. Normally a slight blending would be applied between each slit. A key factor in the quality of the final image is the width of the slits: if they are too wide then dog-leg effects are visible along straight edges in the scene, and the effect is worse for close objects. The slit width is determined by the frame rate relative to the rotation speed. Increasing the frame rate reduces the slit width, whereas increasing the rotation speed (faster rotations) increases it. Narrow slits are in general desirable, as is a fast rotation time in order to reduce negative effects due to moving objects. High frame rates and fast rotations both require short exposure times, in the latter case in order to reduce image degradation due to motion blur.
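As a rough illustration of how frame rate and rotation speed set the slit width, the following Python sketch estimates the width in pixels. It assumes an ideal pinhole lens, a constant rotation speed and a slit near the image center. The function name and the example numbers are hypothetical and are not taken from the rigs described here.

```python
import math

def slit_width_pixels(focal_length_px, rotation_time_s, frame_rate_fps):
    """Approximate slit width in pixels covered by one video frame.

    Assumptions (not from the article): pinhole lens, constant rotation
    speed, slit close to the image center. focal_length_px is the focal
    length expressed in pixels.
    """
    frames_per_rotation = rotation_time_s * frame_rate_fps
    angle_per_frame = 2.0 * math.pi / frames_per_rotation  # radians
    # Near the image center one radian spans roughly focal_length_px pixels.
    return focal_length_px * angle_per_frame

# Hypothetical numbers: 6000 pixel focal length, 60 second rotation, 30fps.
base = slit_width_pixels(6000.0, 60.0, 30.0)
doubled_frame_rate = slit_width_pixels(6000.0, 60.0, 60.0)  # half of base
doubled_speed = slit_width_pixels(6000.0, 30.0, 30.0)       # twice base
```

Under these assumptions the slit width scales inversely with the number of frames captured per rotation, which is why a higher frame rate narrows the slit and a faster rotation widens it.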

Not illustrated below is that in practice more than 360 degrees is captured, and the two resulting images are wrapped, cropped and aligned in post production to set the desired zero parallax distance.

Before introducing the single camera geometry, it should be noted that the two cameras don't actually need to be opposite each other and parallel. If they are arranged "toe-in" as shown below, the two panoramas will be identical to the parallel case except that they will be horizontally shifted with respect to each other by an angle proportional to the toe-in angle. Of course that shift can be readily corrected digitally during the wrap/crop/alignment stage.

Single camera

For the single camera arrangement, the nodal point of the camera lens is offset from the rotation axis and the camera is pointing perpendicular to the rotation circle, rather than tangential as per the dual camera arrangement. Instead of extracting a central slit from each video frame, a slit is extracted from a position to the left of the image center and another slit is extracted the same distance to the right of the image center. These slits are arranged adjacent to each other as before: the slits from the left side form the right eye panorama and the slits from the right side form the left eye panorama. This is illustrated below. The rays through the slit on the left appear (subject to a small scale difference) to originate from a virtual camera representing the right eye. The rays through the slit on the right appear to originate from a virtual camera representing the left eye.
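The slit extraction described above can be sketched in Python with NumPy. This is a minimal illustration rather than the processing pipeline actually used: the function name and parameters are hypothetical, no blending between slits is applied, and the frames are assumed to be ordered so that successive slits are placed left to right (the order would be reversed for the opposite rotation direction).

```python
import numpy as np

def build_stereo_panoramas(frames, offset_px, slit_width_px):
    """Stack off-center slits from each frame into two panoramas.

    frames: iterable of arrays shaped (height, width) or
            (height, width, channels), in capture order.
    offset_px: distance of each slit center from the image center column.
    slit_width_px: width of each extracted slit in pixels.
    Returns (left_eye, right_eye). No blending is applied in this sketch.
    """
    left_slits, right_slits = [], []
    half = slit_width_px // 2
    for frame in frames:
        center = frame.shape[1] // 2
        # Slit to the left of center contributes to the right eye panorama.
        a = center - offset_px - half
        right_slits.append(frame[:, a:a + slit_width_px])
        # Slit to the right of center contributes to the left eye panorama.
        b = center + offset_px - half
        left_slits.append(frame[:, b:b + slit_width_px])
    left_eye = np.concatenate(left_slits, axis=1)
    right_eye = np.concatenate(right_slits, axis=1)
    return left_eye, right_eye
```

Because both panoramas are built from the same frames, changing `offset_px` and re-running the extraction is all that is needed to try a different effective eye separation.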

There are some advantages to the dual camera approach, but achieving a human eye separation (6.5cm say) is difficult due to the size of the bodies of good quality cameras. One of the elegant aspects of the single camera approach is that the effective eye separation can be chosen in post production, simply by how far to the left and right of the image center the slits are taken.

The test rig is shown below: a Canon R5 mounted in portrait mode on a Nodal Ninja Mecha motorised mount. The camera is operated in 8K mode, so the frames extracted from the movie are 8192 pixels high and 4320 pixels wide. The nodal (zero parallax) point of the lens is approximately 12cm from the rotation axis; extracting slits at horizontal pixel positions 540 and 3780 results in an interocular of approximately 6cm. The choice of lens determines the vertical FOV of the resulting panoramas; here a 15mm Laowa lens and a Sigma 28mm lens were used, giving vertical FOVs of 100 degrees and 65 degrees respectively.
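The relationship between slit position and effective interocular can be estimated with a short Python sketch. It assumes an ideal pinhole lens: a ray through a slit at angle theta from the optical axis passes the rotation axis at a distance of R sin(theta), where R is the offset of the nodal point, so the two virtual cameras are 2 R sin(theta) apart. The function name is hypothetical, and the pixel pitch used in the example (the 8192 pixel side of the frame spanning a 36mm sensor) is an assumption rather than a figure from the text.

```python
import math

def effective_interocular_cm(radius_cm, slit_offset_px,
                             focal_length_mm, pixel_pitch_mm):
    """Estimate the eye separation for symmetric off-center slits.

    Assumes a pinhole lens (no distortion). slit_offset_px is the distance
    of each slit from the image center column.
    """
    theta = math.atan(slit_offset_px * pixel_pitch_mm / focal_length_mm)
    return 2.0 * radius_cm * math.sin(theta)

# Values from the text: 12cm radius, slits at 540 and 3780 in a 4320 wide frame.
offset_px = (3780 - 540) // 2      # 1620 pixels either side of center
pixel_pitch_mm = 36.0 / 8192.0     # ASSUMPTION: 8192 pixels span 36mm
with_28mm = effective_interocular_cm(12.0, offset_px, 28.0, pixel_pitch_mm)
with_15mm = effective_interocular_cm(12.0, offset_px, 15.0, pixel_pitch_mm)
```

Under these assumptions the 28mm lens gives a value a little under 6cm, while the wider 15mm lens gives a larger separation for the same slit positions, so the slit offset would need to be chosen per lens.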

An example is given below after the pairs have been wrapped, cropped and aligned for the intended cylindrical display (reduced resolution).

Discussion

There are merits and disadvantages of this approach versus a dual camera rig.

Achieving the same vertical resolution using a dual camera rig with other factors equal is problematic due to the (consumer) camera units and/or lenses being wider than 6.5cm. This single camera approach can readily achieve human eye separation.
There are advantages in only having one camera involved: cost, reliability, simplicity. In addition, there is no colour adjustment arising due to differences between the two sensors, actual or accidental.
Managing a finite slit width is the same irrespective of the approach. In the test examples the slit width was typically 20 pixels, although that could be reduced if slower rotation times were used. But slower rotation times introduce a greater likelihood of movement in the scene. The maximum frame rate of the R5 in 8K mode is only 30fps; for the same rotation speed the slit width could be halved by using a more recent camera capable of 60fps, or quartered at 120fps.
A disadvantage of the single camera setup is that less scene movement is tolerated compared to the dual camera rig. This arises because there is a greater angular separation (and hence time separation) between when one slit passes a scene object and when the other slit passes the same scene object.
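The time separation mentioned in the last point can be estimated as follows. This is a simplified Python sketch that assumes a distant object, a constant rotation speed and slits placed symmetrically at a given angle either side of the optical axis. The function name and the example numbers are hypothetical.

```python
def slit_time_separation_s(slit_angle_deg, rotation_time_s):
    """Time between the two slits viewing the same distant scene direction.

    Assumes a distant object and constant rotation speed. The slits sit at
    +/- slit_angle_deg from the optical axis, so the camera must rotate by
    twice that angle between the two views.
    """
    angular_separation_deg = 2.0 * slit_angle_deg
    return rotation_time_s * angular_separation_deg / 360.0

# Hypothetical example: slits 15 degrees off axis, 60 second rotation.
separation = slit_time_separation_s(15.0, 60.0)
```

In this simplified model the separation grows with both the slit angle and the rotation time, so a wider effective eye separation or a slower rotation leaves more time for the scene to change between the two views.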