Computer Vision - Lecture 2.2 (Image Formation: Geometric Image Formation)

Geometric Image Formation and Camera Models

Introduction to Geometric Image Formation

  • The unit discusses the geometric image formation process and basic camera models, focusing on how 3D points relate to 2D image planes.
  • The principle of projection is illustrated using the human eye, where light passes through a lens and aperture to reach photoreceptors in the retina.

Historical Context: Camera Obscura

  • The camera obscura is introduced as an early example of this projection principle, where light enters through a small hole in a dark room, creating an upside-down image on the wall.
  • Due to the small aperture, very little light enters, requiring adaptation for visibility inside such rooms; this technique has been utilized by artists historically.

Basic Pinhole Camera Model

  • A simple pinhole camera is described as a box with a small hole that allows light to pass through in straight lines, projecting an inverted image onto the sensor behind it.
  • Light rays travel linearly from points in space through the pinhole (focal point), resulting in an upside-down projection on the image plane.

Mathematical vs. Physical Models

  • In mathematical models used for cameras, the image plane is typically placed in front of the focal point rather than behind it.
  • Both physical and mathematical models yield equivalent images; however, mathematical models avoid upside-down projections for easier interpretation.

Projection Models Overview

  • Two primary projection models are discussed: orthographic and perspective projection. Orthographic projection assumes parallel light rays, while perspective projection involves rays converging at a focal point.
  • Perspective projection changes apparent object size based on distance from the camera; orthographic projection maintains consistent object size regardless of distance.

Applications of Projection Models

  • Most modern cameras use perspective projection but can approximate orthographic projection with telephoto lenses, which minimize perspective effects.
  • Telecentric lenses realize orthographic projection by ensuring that real-world distances correspond directly to distances measured after projection.

Understanding Projections in 3D Graphics

Perspective Projection and Vanishing Points

  • The cube is projected using a perspective projection, where the vanishing point is nearby rather than at infinity. This causes parallel edges of the 3D cube to intersect at this vanishing point.
  • Increasing the focal length moves the vanishing point outside the image domain, transitioning towards a weak perspective setting.
  • Further increasing the focal length while moving the camera infinitely far away leads to an orthographic projection, in which parallel lines in 3D remain parallel in the image.

Mathematical Description of Projections

  • The mathematical model for orthographic projection is introduced, focusing on the x and z coordinates while omitting y for clarity.
  • The camera center serves as the origin for defining 3D coordinates; both image and camera coordinate systems share axes (x and y).

Light Rays and Coordinate Systems

  • Light rays travel parallel to the principal axis; thus, the x and y coordinates remain unchanged during projection.
  • In orthographic projection, only x and y are retained from a 3D point while the z coordinate is dropped.

Linear Algebra Representation

  • Orthographic projection can be expressed in linear algebra by dropping the z component of a 3D point via a simple matrix transformation.
  • An equivalent representation in homogeneous coordinates uses augmented vectors; after projection, the distance to the point (its depth) cannot be recovered.
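Expressed concretely, the matrix form described above can be sketched as follows (a minimal illustration using NumPy; the point values are made up):

```python
import numpy as np

# Orthographic projection: drop the z component of a 3D point.
# In homogeneous coordinates this is a 3x4 matrix acting on an
# augmented 4-vector (x, y, z, 1).
P_ortho = np.array([
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
])

X = np.array([2.0, 3.0, 7.0, 1.0])  # 3D point in homogeneous coordinates
x = P_ortho @ X                      # -> [2., 3., 1.]: z is discarded
```

Because the last row produces a constant 1 regardless of z, the depth of the point is irretrievably lost after projection.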

Practical Application: Scaled Orthography

  • In practice, images are measured in pixels rather than physical units. Thus, a scaling factor is applied to convert metric points into pixel measurements.
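Scaled orthography can be sketched by multiplying the retained coordinates with a scale factor (the value of `s` below, in pixels per meter, is purely illustrative):

```python
import numpy as np

s = 100.0  # hypothetical scale: 100 pixels per meter

# Scaled orthographic projection matrix in homogeneous coordinates.
P_scaled = np.array([
    [s, 0, 0, 0],
    [0, s, 0, 0],
    [0, 0, 0, 1],
])

X = np.array([0.5, 0.2, 3.0, 1.0])  # metric 3D point (homogeneous)
x = P_scaled @ X                     # -> [50., 20., 1.]: pixel coordinates
```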

Camera Projection and Pinhole Model

Understanding the Camera Projection Process

  • Light rays converge at a focal point, specifically the camera center, illustrating how a 3D point in camera coordinates projects to pixel coordinates on an image plane.
  • All light rays must pass through the camera center, establishing a relationship between pixel coordinates (x_s) and 3D points in camera coordinates (x_c), with the principal axis orthogonal to the image plane.
  • The principal axis is defined as perpendicular to the image plane, which helps clarify spatial relationships in projection.
  • The mathematical projection of a 3D point involves known x-coordinates in both camera and screen coordinates, alongside the focal length (f), which is crucial for determining object size on the image plane.
  • Changing focal length alters object size on the image plane; this relationship can be visualized through equal triangles formed by corresponding points in 3D space and their projections.

Mathematical Formulation of Projection

  • By applying the principle of similar triangles, we derive x_s = f · x_c / z_c (and analogously y_s = f · y_c / z_c), which is the pinhole projection formula for both x and y coordinates.
  • This projection can also be expressed using matrix multiplication in homogeneous coordinates, simplifying calculations while maintaining accuracy across dimensions.
  • In perspective projection, dividing by the z component maps 3D points onto the image plane; expressed in homogeneous coordinates, this mapping becomes linear.
  • A camera (projection) matrix performs this transformation in homogeneous form: it has the focal lengths along its diagonal and zeros elsewhere.
  • Normalizing the resulting vector by its last component yields the pixel values x_s = f · x_c / z_c and y_s = f · y_c / z_c.
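The steps above can be sketched as follows (the focal length and point values are illustrative, not from the lecture):

```python
import numpy as np

f = 500.0  # hypothetical focal length in pixels

# Pinhole projection matrix: focal lengths on the diagonal, zeros elsewhere.
P = np.array([
    [f, 0, 0, 0],
    [0, f, 0, 0],
    [0, 0, 1, 0],
])

Xc = np.array([0.2, -0.1, 2.0, 1.0])  # 3D point in camera coordinates
xh = P @ Xc                            # homogeneous image point [f*x, f*y, z]
xs = xh[:2] / xh[2]                    # divide by z: [f*x/z, f*y/z]
# xs == [50., -25.]
```

The division by the last component is exactly the perspective division: doubling the depth z halves the projected size, as described above.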

Practical Considerations in Image Coordinates

  • The conversion from homogeneous to inhomogeneous coordinates occurs post-projection by dividing by the third element of the vector; this step is essential for accurate representation on screens.
  • Focal length (f), typically measured in pixels, ensures compatibility between metric 3D points and pixel-based screen representations during projections.
  • When defining f in pixels rather than meters, it simplifies conversions between different measurement systems while ensuring consistent output formats for images captured by cameras.

Enhancements to Basic Projection Models

  • An important enhancement is the principal point: without it, image coordinates are centered on the principal axis, producing negative pixel values that are impractical for digital image storage.
  • Shifting the coordinate origin to a convenient location such as the top-left corner makes storing images digitally straightforward.

Camera Projection and Intrinsic Parameters

Understanding Image Coordinates and Principal Points

  • The image coordinate system is defined by adding the principal point coordinates (c_x, c_y) to pixel coordinates, ensuring only positive values are considered.

Perspective Projection Model

  • The complete perspective projection model incorporates focal length adjustments in both x and y directions, along with translations by the principal point.
  • A skew factor is introduced to account for sensors not mounted perpendicular to the optical axis due to manufacturing inaccuracies.

Intrinsic Matrix and Calibration

  • In practice, the focal lengths are often set equal (f_x = f_y) and the skew is typically negligible; estimating the principal point, however, remains important.
  • The intrinsic matrix (K), a 3x3 submatrix of the projection matrix, collects all intrinsic parameters: the focal lengths, the skew, and the principal point.
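A minimal sketch of the intrinsic matrix with the parameters named above (all values are illustrative):

```python
import numpy as np

fx, fy = 500.0, 500.0   # focal lengths in pixels (often fx == fy)
skew = 0.0              # skew, typically negligible
cx, cy = 320.0, 240.0   # principal point (e.g. the image center)

# Intrinsic matrix K
K = np.array([
    [fx, skew, cx],
    [0.0, fy,  cy],
    [0.0, 0.0, 1.0],
])

Xc = np.array([0.2, -0.1, 2.0])  # 3D point in camera coordinates
xh = K @ Xc                      # homogeneous image point
xs = xh[:2] / xh[2]              # -> [370., 215.]: pixel coordinates
```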

Extrinsic Parameters and Transformations

  • Extrinsic parameters define camera pose relative to a world coordinate system; transformations can be performed using homogeneous coordinates.
  • By combining extrinsic and intrinsic matrices into one transformation matrix, efficiency in calculations can be improved.

Projection Matrix Computation

  • The screen coordinate is obtained by multiplying the camera coordinates with the intrinsic matrix; chained with the extrinsic transformation, this maps world coordinates directly onto the screen.
  • A full rank 4x4 projection matrix can also be utilized for more complex transformations involving depth normalization.
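Composing intrinsics and extrinsics into a single 3x4 projection matrix can be sketched as follows (the pose and intrinsic values are made up for illustration):

```python
import numpy as np

# Intrinsics (hypothetical values)
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])

# Extrinsics: rotation R and translation t mapping world -> camera coordinates
R = np.eye(3)                  # identity rotation for simplicity
t = np.array([0.0, 0.0, 2.0])  # camera shifted 2 m along the principal axis

# Combined 3x4 projection matrix P = K [R | t]
P = K @ np.hstack([R, t[:, None]])

Xw = np.array([0.2, -0.1, 0.0, 1.0])  # world point (homogeneous)
xh = P @ Xw
xs = xh[:2] / xh[2]                    # -> [370., 215.]: pixel coordinates
```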

Depth Normalization and Inverse Mapping

  • After projection with the full-rank 4x4 matrix, the resulting 4D homogeneous vector is normalized with respect to its third coordinate (the depth) to yield the image point.
  • Knowing inverse depth enables inversion of projections back to 3D space; depth information is essential for accurate reconstruction.
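Assuming the depth of a pixel is known, inverting the pinhole projection can be sketched as follows (the helper `unproject` and all values are hypothetical):

```python
import numpy as np

K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])

def unproject(u, v, z, K):
    """Back-project pixel (u, v) with known depth z to camera coordinates."""
    x_h = np.array([u, v, 1.0])          # pixel in homogeneous coordinates
    return z * (np.linalg.inv(K) @ x_h)  # scale the viewing ray by the depth

Xc = unproject(370.0, 215.0, 2.0, K)     # -> [0.2, -0.1, 2.0]
```

Without the depth z, only the viewing ray through the pixel can be recovered, which is why depth information is essential for reconstruction.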

Lens Distortion Considerations

Understanding Camera Distortion

Importance of Light Collection in Lenses

  • With a small pinhole, very little light reaches the sensor plane, which makes a lens that effectively collects light essential.
  • Lens systems can introduce geometric distortion, violating the linear projection model where straight lines appear curved in images.

Types of Distortion

  • Two primary types of distortions are identified: radial distortion and tangential distortion.
  • These distortions can be modeled for most camera and lens configurations, allowing for correction.

Modeling Distortion

  • A common formula is used to normalize camera coordinates before applying polynomials that model both radial and tangential distortions.
  • The transformation from the original point to the distorted point is non-linear, resulting in a new point x'.
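As a sketch, one common polynomial distortion model (the Brown–Conrady form; the lecture's exact formula may differ, and the coefficients below are made up) looks like:

```python
def distort(x, y, k1, k2, p1, p2):
    """Apply radial (k1, k2) and tangential (p1, p2) distortion to a
    normalized camera coordinate (x, y) = (X/Z, Y/Z)."""
    r2 = x * x + y * y                       # squared radius from center
    radial = 1.0 + k1 * r2 + k2 * r2 * r2    # radial distortion polynomial
    x_d = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    y_d = y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
    return x_d, y_d

# Hypothetical coefficients; real values come from camera calibration.
xd, yd = distort(0.1, -0.05, k1=-0.2, k2=0.05, p1=0.001, p2=-0.001)
```

With all coefficients set to zero the mapping reduces to the identity, recovering the undistorted pinhole model.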

Correcting Distortion

  • The process of undistorting images is possible because the transformations are typically monotonic, allowing pre-computation of undistorted images.
  • After undistorting, the simple pinhole model can be applied directly to these corrected images.

Complex Lens Models

  • More complex lenses, such as wide-angle lenses, require advanced distortion models beyond simple polynomial equations.
Video description

Lecture: Computer Vision (Prof. Andreas Geiger, University of Tübingen) Course Website with Slides, Lecture Notes, Problems and Solutions: https://uni-tuebingen.de/fakultaeten/mathematisch-naturwissenschaftliche-fakultaet/fachbereiche/informatik/lehrstuehle/autonomous-vision/lectures/computer-vision/