Representing a camera mathematically can be a bit tricky, especially if you want to represent many aspects of the camera. In this post, I will begin a discussion of the linear pinhole camera model. This is the first in a series of posts on camera representation; at the end of this series, we will have completely walked through the derivation of a linear system that describes how a point in the 3D world projects to a point on the 2D image plane.

## Homogeneous Coordinates

Before we can go any further, we need to discuss *homogeneous coordinates*, which are basically a linear algebra trick to simplify the writing of our equations. To convert a point from ordinary coordinates to homogeneous coordinates, an extra dimension is added to it. This extra coordinate is simply a scalar multiple ($w$ here), so an (originally 3D) world point $(x, y, z)$ would be $(wx, wy, wz, w)$ in homogeneous coordinates. Similarly, an (originally 2D) image point $(u, v)$ will then be $(w'u, w'v, w')$ in homogeneous coordinates.

It is important to note that in homogeneous coordinates, the value of the scalar multiple ($w$ and $w'$ above) does not matter, since it can simply be divided out of the point. It just cannot be zero for an ordinary (finite) point; a last coordinate of zero instead represents a point at infinity. For example,

$$
(2, 4, 6, 2) = 2\,(1, 2, 3, 1) \sim (1, 2, 3, 1),
$$

so both homogeneous vectors represent the same world point $(1, 2, 3)$.
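The conversion rules above can be sketched in a few lines of plain Python. The helper names `to_homogeneous`, `from_homogeneous`, and `same_point` are hypothetical, chosen for illustration only. The helpers work for any dimension, so the same code handles 3D world points and 2D image points.

```python
import math


def to_homogeneous(point, w=1.0):
    """Append a nonzero scale w: (x, y, z) -> (w*x, w*y, w*z, w)."""
    if w == 0:
        raise ValueError("w must be nonzero for a finite point")
    return tuple(w * c for c in point) + (w,)


def from_homogeneous(hpoint):
    """Divide out the last coordinate: (w*x, w*y, w*z, w) -> (x, y, z)."""
    *coords, w = hpoint
    if w == 0:
        raise ValueError("last coordinate is zero: this is a point at infinity")
    return tuple(c / w for c in coords)


def same_point(a, b, tol=1e-9):
    """Two homogeneous vectors are equivalent if they dehomogenize to the same point."""
    pa, pb = from_homogeneous(a), from_homogeneous(b)
    return len(pa) == len(pb) and all(
        math.isclose(x, y, abs_tol=tol) for x, y in zip(pa, pb)
    )
```

For instance, `same_point((2, 4, 6, 2), (1, 2, 3, 1))` is `True`, matching the example in the text. The tolerance-based comparison is a design choice: after division, floating-point coordinates rarely compare exactly equal.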

## The Simplest Pinhole Camera

Using homogeneous coordinates, we will now build a mathematical description of a camera: a set of linear equations that map a world point $\mathbf{X}$ to an image point $\mathbf{x}$. Since the homogeneous world point is 4-dimensional and the homogeneous image point is 3-dimensional, the overall transformation can be described by a $3 \times 4$ *projection matrix* $P$. The projection from the world point to its corresponding image point can then be written as $\mathbf{x} = P\mathbf{X}$.
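As a minimal sketch of $\mathbf{x} = P\mathbf{X}$ in plain Python (no external libraries assumed), the hypothetical helper `project` below multiplies a $3 \times 4$ matrix, stored as a list of rows, by a homogeneous 4-vector. It does not assume any particular camera; the matrix in the usage example is an arbitrary placeholder.

```python
def project(P, X):
    """Return the homogeneous image point x = P X as a 3-tuple.

    P is a 3x4 projection matrix given as a list of three rows;
    X is a homogeneous world point given as four numbers.
    """
    if len(P) != 3 or any(len(row) != 4 for row in P):
        raise ValueError("P must be 3x4")
    if len(X) != 4:
        raise ValueError("X must have 4 homogeneous coordinates")
    # Each image coordinate is the dot product of one row of P with X.
    return tuple(sum(p * c for p, c in zip(row, X)) for row in P)
```

With `P = [[1, 2, 0, 1], [0, 1, 3, 0], [0, 0, 1, 1]]`, `project(P, (1, 1, 1, 1))` gives `(4, 4, 2)`, and `project(P, (2, 2, 2, 2))` gives `(8, 8, 4)`. Because the map is linear, scaling $\mathbf{X}$ by $w$ scales $\mathbf{x}$ by the same $w$, so the dehomogenized image point does not change.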

Let's now dig in and look at an example camera:

In the above diagram of a simple pinhole camera, we have a number of key items listed:

- *x*, *y*, and *z* are the 3D world axes. *z* is the *principal axis*, which is simply the axis through the camera center perpendicular to the image plane. Think of this as the direction that the camera is pointing. Choosing the *z* axis as the principal axis is just a long-standing convention in computer vision; all of the equations we derive can be re-derived to use a different axis as the principal axis if you are so inclined.
- $\mathbf{C}$ is the world coordinate of the camera center. This is a 4-dimensional homogeneous point.
- $\mathbf{p}$ is the *principal point*, which is the point where the principal axis meets the image plane. This is a 3-dimensional homogeneous point because it is on the 2D image plane, not in the 3D world.
- *f* is the focal length, which is just the scalar distance from the camera center to the image plane.
- $\mathbf{X}$ is the world point being imaged. This is a 4-dimensional homogeneous point.
- Finally, $\mathbf{x}$ is the point on the image plane that the world point projects to. This is a 3-dimensional homogeneous point.
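The geometry in this list leads directly to the projection equations. As a short sketch, assume the camera center is at the world origin, the principal axis is $z$, and the image plane is the plane $z = f$ (placed in front of the camera center, a common convention that avoids an upside-down image). Measuring image coordinates $(u, v)$ from the principal point, similar triangles give

$$
\frac{u}{f} = \frac{x}{z}, \qquad \frac{v}{f} = \frac{y}{z}
\quad\Longrightarrow\quad
(x, y, z) \mapsto \left( \frac{fx}{z},\; \frac{fy}{z} \right).
$$

The division by $z$ is not linear in $(x, y, z)$, which is exactly where homogeneous coordinates help: the division can be deferred to the final step of dividing out the scalar multiple.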

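In the simplest case of the projection matrix, the camera center is at the origin of the world coordinate system, which means $\mathbf{C} = (0, 0, 0, 1)^T$, and the principal point sits at the origin of the image plane. Therefore, our very simple transformation of the world point to the image point ($\mathbf{x} = P\mathbf{X}$) can be fully written out as

$$
\begin{bmatrix} w'u \\ w'v \\ w' \end{bmatrix}
=
\begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix},
$$

where

$$
P = \begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
$$

and, after dividing out the scalar multiple $w' = z$, the image coordinates are

$$
u = \frac{fx}{z}, \qquad v = \frac{fy}{z}.
$$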
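The simplest camera can be sketched end to end in plain Python: build $P$, apply it to the homogeneous world point with $w = 1$, then divide out $w' = z$. The function names are hypothetical, and the code assumes, as in this simplest case, that the camera center is at the world origin and the principal point is at the image origin.

```python
def simple_projection_matrix(f):
    """3x4 P for a camera at the world origin looking down +z.

    Assumes the principal point is at the image origin (simplest case).
    """
    return [
        [f, 0.0, 0.0, 0.0],
        [0.0, f, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
    ]


def project_point(f, world_point):
    """Map a 3D world point (x, y, z) to 2D image coordinates (u, v)."""
    x, y, z = world_point
    if z == 0:
        raise ValueError("point lies in the plane z = 0 and has no finite image")
    X = (x, y, z, 1.0)  # homogeneous world point with scale w = 1
    P = simple_projection_matrix(f)
    # Homogeneous image point (w'u, w'v, w') = P X.
    wu, wv, w = (sum(p * c for p, c in zip(row, X)) for row in P)
    return (wu / w, wv / w)  # divide out w', which equals z here
```

With `f = 2.0`, both `(1.0, 2.0, 4.0)` and `(2.0, 4.0, 8.0)` project to `(0.5, 1.0)`: every point on the same ray through the camera center lands on the same image point, and a point on the principal axis, such as `(0.0, 0.0, 10.0)`, lands on the principal point `(0.0, 0.0)`.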

That is where we will stop for today. Come back for the next post in this series, which will explore moving the camera center to a different point in the world and moving the principal point to a different point in the image plane.

Edit 8/16/2013: You can find Part 2 of this series here, and Part 3 of this series here.
