May 102013
 

Representing a camera mathematically can be a bit tricky, especially if you want to represent many aspects of the camera.  In this post, I will begin a discussion of the linear pinhole camera model.  This is the first in a series of posts on camera representation;  at the end of this series, we will have completely walked through the derivation of a linear system that describes how a point in the 3D world projects to a point on the 2D image plane.

Homogenous Coordinates

Before we can go any further, we need to discuss homogenous coordinates, which is basically a linear algebra trick to simplify the writing of our equations.  To convert a normal coordinate system to a homogenous coordinate system, an extra dimension must be added to every point in the system.  This extra coordinate is simply a scalar multiple (s_w here), so an (originally 3D) world point would be \mathbf{X}_w= (s_w x_w, s_w y_w, s_w z_w, s_w)^T= s_w ( x_w, y_w, z_w, 1)^T in homogenous coordinates.  Similarly, an (originally 2D) image point will then be \mathbf{X}_i= (s_i x_i, s_i y_i, s_i)^T= s_i (x_i, y_i, 1)^T in homogenous coordinates.

It is important to note that in homogenous coordinates, the value of the scalar multiple (s_w and s_i above) does not matter, since it can simply be divided out of the point.  It just cannot be zero in most circumstances.  For example,

\frac{1}{s_i}\mathbf{X}_i= \frac{1}{s_i} \left[ \begin{array}{c}s_i x_i \\ s_i y_i \\ s_i \end{array}\right]= \left[ \begin{array}{c} x_i \\ y_i \\ 1 \end{array}\right].

The Simplest Pinhole Camera

Using homogenous coordinates, we will now build a mathematical description of a camera.  The mathematical description of the camera is a set of linear equations that translate a world point \mathbf{X}_w into an image point \mathbf{X}_i.  Since the homogenous world point is 4 dimensional and the homogenous image point is 3 dimensional, the overall transformation can be described by the 3 \times 4 projection matrix \mathbf{P}.  The projection from the world point to its corresponding image point can then be written as \mathbf{X}_i= \mathbf{P} \mathbf{X}_w.

Let's now dig in and look at an example camera:

Pinhole camera diagram

Pinhole camera diagram.

In the above diagram of a simple pinhole camera, we have a number of key items listed:

  • x, y, and z are the 3D world axes.
  • z is the principal axis, which is simply the axis perpendicular to the image plane.  Think of this as the direction that the camera is pointing.  The z axis is often chosen as the principal axis because most vision scientists have historically chosen it to be the principal axis.  All of the equations we derive can be re-derived to use a different axis as the principal axis if you are so inclined.
  • \mathbf{C}_w is the world coordinate of the camera center.  This is a 4 dimensional homogenous point.
  • \mathbf{C}_i is the principal point, which is the point where the principal axis meets the image plane.  This is a 3 dimensional homogenous point because it is on the 2D image plane, not in the 3D world.
  • f is the the focal length, which is just is the scalar distance from the camera center to the image plane.
  • \mathbf{X}_w is the world point being imaged.  This is a 4 dimensional homogenous point.
  • Finally, \mathbf{X}_i is the point on the image plane that the world point projects to.  This is a 3 dimensional homogenous point.

In the simplest case of the projection matrix, the camera center is at the origin of the world coordinate system, which means \mathbf{C}_w= (0, 0, 0, 1)^T.  Therefore, our very simple transformation of the world coordinate to the image coordinate (\mathbf{X}_i= \mathbf{P} \mathbf{X}_w) can be fully written out as

\mathbf{X}_i= \left[ \begin{array}{c} s_i x_i\\ s_i y_i\\ s_i \end{array}\right]= \left[ \begin{array}{cccc} f & 0 & 0 & 0\\ 0 & f & 0 & 0\\ 0 & 0 & 1 & 0\end{array} \right] \left[ \begin{array}{c} x_w\\ y_w\\ z_w\\ 1 \end{array}\right]

where

\mathbf{P}= \left[ \begin{array}{cccc} f & 0 & 0 & 0\\ 0 & f & 0 & 0\\ 0 & 0 & 1 & 0\end{array} \right]

and

\mathbf{X}_w= \left[ \begin{array}{c} x_w\\ y_w\\ z_w\\ 1 \end{array}\right].

That is where we will stop for today.  Come back for the next post in this series, which will explore moving the camera center to a different point in the world and moving the principal point to a different point in the image plane.

Edit 8/16/2013:  You can find Part 2 of this series here, and Part 3 of this series here.