# Camera Representation Part 1: Homogeneous Coordinate Systems and the Simplest Camera Imaginable

Representing a camera mathematically can be a bit tricky, especially if you want to represent many aspects of the camera.  In this post, I will begin a discussion of the linear pinhole camera model.  This is the first in a series of posts on camera representation;  at the end of this series, we will have completely walked through the derivation of a linear system that describes how a point in the 3D world projects to a point on the 2D image plane.

## Homogeneous Coordinates

Before we can go any further, we need to discuss homogeneous coordinates, which are basically a linear algebra trick to simplify the writing of our equations.  To convert a normal coordinate system to a homogeneous coordinate system, an extra dimension must be added to every point in the system.  This extra coordinate is simply a scale factor ($s_w$ here) that also multiplies every other coordinate, so an (originally 3D) world point would be $\mathbf{X}_w= (s_w x_w, s_w y_w, s_w z_w, s_w)^T= s_w ( x_w, y_w, z_w, 1)^T$ in homogeneous coordinates.  Similarly, an (originally 2D) image point will then be $\mathbf{X}_i= (s_i x_i, s_i y_i, s_i)^T= s_i (x_i, y_i, 1)^T$ in homogeneous coordinates.

It is important to note that in homogeneous coordinates, the value of the scale factor ($s_w$ and $s_i$ above) does not matter, since it can simply be divided out of the point.  The one restriction is that it cannot be zero for an ordinary finite point (a zero scale represents a point at infinity).  For example,

$$\frac{1}{s_i}\mathbf{X}_i= \frac{1}{s_i} \left[ \begin{array}{c}s_i x_i \\ s_i y_i \\ s_i \end{array}\right]= \left[ \begin{array}{c} x_i \\ y_i \\ 1 \end{array}\right].$$
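The conversion in both directions can be sketched in a few lines of Python.  This is only a minimal illustration: the function names `to_homogeneous` and `from_homogeneous` are my own and do not come from any library, and points are stored as plain lists.

```python
def to_homogeneous(point, scale=1.0):
    """Append a scale coordinate: (x, y) -> (s*x, s*y, s)."""
    if scale == 0:
        raise ValueError("scale must be nonzero for a finite point")
    return [scale * c for c in point] + [scale]


def from_homogeneous(point):
    """Divide out the last coordinate, then drop it."""
    scale = point[-1]
    if scale == 0:
        raise ValueError("scale is zero; no finite point corresponds to it")
    return [c / scale for c in point[:-1]]


image_point = [3.0, 4.0]
a = to_homogeneous(image_point)             # scale 1 -> [3.0, 4.0, 1.0]
b = to_homogeneous(image_point, scale=2.0)  # scale 2 -> [6.0, 8.0, 2.0]

# Both homogeneous vectors describe the same 2D point.
print(from_homogeneous(a), from_homogeneous(b))
```

Both calls to `from_homogeneous` return the original point, which is the sense in which the scale "does not matter".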

## The Simplest Pinhole Camera

Using homogeneous coordinates, we will now build a mathematical description of a camera.  This description is a set of linear equations that maps a world point $\mathbf{X}_w$ to an image point $\mathbf{X}_i$.  Since the homogeneous world point is 4-dimensional and the homogeneous image point is 3-dimensional, the overall transformation can be described by a $3 \times 4$ projection matrix $\mathbf{P}$.  The projection from the world point to its corresponding image point can then be written as $\mathbf{X}_i= \mathbf{P} \mathbf{X}_w$.
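As a rough sketch of what $\mathbf{X}_i= \mathbf{P} \mathbf{X}_w$ means in code, the snippet below multiplies a $3 \times 4$ matrix by a 4-element vector.  The entries of `P_example` are arbitrary placeholder numbers chosen only to show the shapes involved; they are not a meaningful camera, and the helper name `mat_vec` is my own.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (a list of rows) by a vector (a list)."""
    if any(len(row) != len(vector) for row in matrix):
        raise ValueError("each row must have as many entries as the vector")
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


# Arbitrary 3x4 placeholder matrix (an assumption for illustration only).
P_example = [
    [1.0, 0.0, 0.5, 2.0],
    [0.0, 1.0, 0.5, 1.0],
    [0.0, 0.0, 1.0, 4.0],
]

X_w = [1.0, 2.0, 4.0, 1.0]         # homogeneous world point, scale 1
X_w_scaled = [2.0 * c for c in X_w]  # the same world point, scale 2

X_i = mat_vec(P_example, X_w)
X_i_scaled = mat_vec(P_example, X_w_scaled)

# Dividing out the last coordinate gives the same image point either way.
print([c / X_i[2] for c in X_i[:2]])
print([c / X_i_scaled[2] for c in X_i_scaled[:2]])
```

Because the transformation is linear, scaling the homogeneous world point only scales the homogeneous image point, so the 2D image point is unchanged.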

Let’s now dig in and look at an example camera:

[Figure: Pinhole camera diagram.]

In the above diagram of a simple pinhole camera, we have a number of key items listed:

• x, y, and z are the 3D world axes.
• z is the principal axis, which is simply the axis perpendicular to the image plane.  Think of this as the direction that the camera is pointing.  Choosing z as the principal axis is a long-standing convention in computer vision rather than a requirement.  All of the equations we derive can be re-derived to use a different axis as the principal axis if you are so inclined.
• $\mathbf{C}_w$ is the world coordinate of the camera center.  This is a 4-dimensional homogeneous point.
• $\mathbf{C}_i$ is the principal point, which is the point where the principal axis meets the image plane.  This is a 3-dimensional homogeneous point because it is on the 2D image plane, not in the 3D world.
• f is the focal length, which is just the scalar distance from the camera center to the image plane.
• $\mathbf{X}_w$ is the world point being imaged.  This is a 4-dimensional homogeneous point.
• Finally, $\mathbf{X}_i$ is the point on the image plane that the world point projects to.  This is a 3-dimensional homogeneous point.

In the simplest case of the projection matrix, the camera center is at the origin of the world coordinate system, which means $\mathbf{C}_w= (0, 0, 0, 1)^T$.  Therefore, our very simple transformation of the world coordinate to the image coordinate ($\mathbf{X}_i= \mathbf{P} \mathbf{X}_w$) can be fully written out as

$$\mathbf{X}_i= \left[ \begin{array}{c} s_i x_i\\ s_i y_i\\ s_i \end{array}\right]= \left[ \begin{array}{cccc} f & 0 & 0 & 0\\ 0 & f & 0 & 0\\ 0 & 0 & 1 & 0\end{array} \right] \left[ \begin{array}{c} x_w\\ y_w\\ z_w\\ 1 \end{array}\right]$$

where

$$\mathbf{P}= \left[ \begin{array}{cccc} f & 0 & 0 & 0\\ 0 & f & 0 & 0\\ 0 & 0 & 1 & 0\end{array} \right]$$

and

$$\mathbf{X}_w= \left[ \begin{array}{c} x_w\\ y_w\\ z_w\\ 1 \end{array}\right].$$
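Multiplying this out row by row gives $s_i x_i = f x_w$, $s_i y_i = f y_w$, and $s_i = z_w$, so after dividing out the scale the image point is $x_i = f x_w / z_w$ and $y_i = f y_w / z_w$.  The short Python sketch below carries out that computation.  The function names and the sample numbers (a focal length of 2 and a world point at $(1, 2, 4)$) are my own choices for illustration.

```python
def simple_projection_matrix(f):
    """The 3x4 matrix P for a camera centered at the world origin."""
    return [
        [f,   0.0, 0.0, 0.0],
        [0.0, f,   0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
    ]


def project(P, world_point):
    """Project a 3D world point (x_w, y_w, z_w) to a 2D image point."""
    X_w = list(world_point) + [1.0]  # homogeneous world point, scale 1
    X_i = [sum(p * x for p, x in zip(row, X_w)) for row in P]
    s_i = X_i[2]                     # equals z_w for this P
    if s_i == 0:
        raise ValueError("point has z_w = 0, so it has no finite projection")
    return [X_i[0] / s_i, X_i[1] / s_i]


P = simple_projection_matrix(2.0)
print(project(P, [1.0, 2.0, 4.0]))  # [0.5, 1.0]
```

With $f = 2$ and $z_w = 4$, each of $x_w$ and $y_w$ is scaled by $f / z_w = 0.5$, which matches the printed result.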

That is where we will stop for today.  Come back for the next post in this series, which will explore moving the camera center to a different point in the world and moving the principal point to a different point in the image plane.

Edit 8/16/2013:  You can find Part 2 of this series here, and Part 3 of this series here.
