Representing a camera mathematically can be a bit tricky, especially if you want to represent many aspects of the camera. In this post, I will begin a discussion of the linear pinhole camera model. This is the first in a series of posts on camera representation; at the end of this series, we will have completely walked through the derivation of a linear system that describes how a point in the 3D world projects to a point on the 2D image plane.

## Homogeneous Coordinates

Before we can go any further, we need to discuss *homogeneous coordinates*, which are basically a linear algebra trick to simplify the writing of our equations. To convert a normal coordinate system to a homogeneous coordinate system, an extra dimension must be added to every point in the system. This extra coordinate is simply a scalar multiple ($s_w$ here), so an (originally 3D) world point would be $\mathbf{X}_w= (s_w x_w, s_w y_w, s_w z_w, s_w)^T= s_w ( x_w, y_w, z_w, 1)^T$ in homogeneous coordinates. Similarly, an (originally 2D) image point will then be $\mathbf{X}_i= (s_i x_i, s_i y_i, s_i)^T= s_i (x_i, y_i, 1)^T$ in homogeneous coordinates.

It is important to note that in homogeneous coordinates, the value of the scalar multiple ($s_w$ and $s_i$ above) does not matter, since it can simply be divided out of the point. In most circumstances it just cannot be zero, because dividing by it is how we recover the ordinary coordinates. For example,

$$\frac{1}{s_i}\mathbf{X}_i= \frac{1}{s_i} \left[ \begin{array}{c}s_i x_i \\ s_i y_i \\ s_i \end{array}\right]= \left[ \begin{array}{c} x_i \\ y_i \\ 1 \end{array}\right].$$
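To make the "divide out the scale" step concrete, here is a minimal Python sketch. The helper names `to_homogeneous` and `from_homogeneous` are my own, made up for illustration, and points are stored as plain lists.

```python
def to_homogeneous(point, scale=1.0):
    """Append a scale coordinate: (x, y) -> (s*x, s*y, s)."""
    if scale == 0:
        raise ValueError("scale must be nonzero for a finite point")
    return [scale * c for c in point] + [scale]


def from_homogeneous(h):
    """Divide every coordinate by the last one, then drop the last one."""
    s = h[-1]
    if s == 0:
        raise ValueError("last coordinate is zero; cannot divide it out")
    return [c / s for c in h[:-1]]


# The same 2D image point (2, 3) written with two different scales.
a = to_homogeneous([2.0, 3.0], scale=1.0)  # [2.0, 3.0, 1.0]
b = to_homogeneous([2.0, 3.0], scale=4.0)  # [8.0, 12.0, 4.0]

# Both recover the same ordinary point.
print(from_homogeneous(a))
print(from_homogeneous(b))
```

The same two helpers work unchanged for 3D world points, since they only ever look at the last coordinate.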

## The Simplest Pinhole Camera

Using homogeneous coordinates, we will now build a mathematical description of a camera. The mathematical description of the camera is a set of linear equations that translate a world point $\mathbf{X}_w$ into an image point $\mathbf{X}_i$. Since the homogeneous world point is 4-dimensional and the homogeneous image point is 3-dimensional, the overall transformation can be described by the $3 \times 4$ *projection matrix* $\mathbf{P}$. The projection from the world point to its corresponding image point can then be written as $\mathbf{X}_i= \mathbf{P} \mathbf{X}_w$.
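As a rough sketch of what $\mathbf{X}_i= \mathbf{P} \mathbf{X}_w$ means in code, the snippet below multiplies a $3 \times 4$ matrix by a 4-vector. The entries of `P` here are placeholders I picked only to show the shapes, not a real camera, and `mat_vec` is a hypothetical helper. It also shows why the scale of $\mathbf{X}_w$ does not matter: the product is linear, so scaling the input just scales the output, and the scale divides out again.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    if any(len(row) != len(vector) for row in matrix):
        raise ValueError("each row must have as many entries as the vector")
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


# Placeholder 3x4 matrix, chosen only to illustrate the shapes involved.
P = [
    [2.0, 0.0, 1.0, 0.0],
    [0.0, 2.0, 1.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]

X_w = [1.0, 2.0, 4.0, 1.0]           # world point (1, 2, 4) with s_w = 1
X_w_scaled = [3.0 * c for c in X_w]  # the same world point with s_w = 3

X_i = mat_vec(P, X_w)                # 3 entries: a homogeneous image point
X_i_scaled = mat_vec(P, X_w_scaled)  # 3 times X_i

# Dividing by the last coordinate gives the same 2D image point both times.
print([c / X_i[-1] for c in X_i[:-1]])
print([c / X_i_scaled[-1] for c in X_i_scaled[:-1]])
```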

Let’s now dig in and look at an example camera:

In the above diagram of a simple pinhole camera, we have a number of key items listed:

- *x*, *y*, and *z* are the 3D world axes.
- *z* is the *principal axis*, which is simply the axis perpendicular to the image plane. Think of this as the direction that the camera is pointing. The *z* axis is often chosen as the principal axis by convention: it is what most vision scientists have historically used. All of the equations we derive can be re-derived to use a different axis as the principal axis if you are so inclined.
- $\mathbf{C}_w$ is the world coordinate of the camera center. This is a 4-dimensional homogeneous point.
- $\mathbf{C}_i$ is the *principal point*, which is the point where the principal axis meets the image plane. This is a 3-dimensional homogeneous point because it is on the 2D image plane, not in the 3D world.
- *f* is the focal length, which is just the scalar distance from the camera center to the image plane.
- $\mathbf{X}_w$ is the world point being imaged. This is a 4-dimensional homogeneous point.
- Finally, $\mathbf{X}_i$ is the point on the image plane that the world point projects to. This is a 3-dimensional homogeneous point.

In the simplest case of the projection matrix, the camera center is at the origin of the world coordinate system, which means $\mathbf{C}_w= (0, 0, 0, 1)^T$. Therefore, our very simple transformation of the world coordinate to the image coordinate ($\mathbf{X}_i= \mathbf{P} \mathbf{X}_w$) can be fully written out as

$$\mathbf{X}_i= \left[ \begin{array}{c} s_i x_i\\ s_i y_i\\ s_i \end{array}\right]= \left[ \begin{array}{cccc} f & 0 & 0 & 0\\ 0 & f & 0 & 0\\ 0 & 0 & 1 & 0\end{array} \right] \left[ \begin{array}{c} x_w\\ y_w\\ z_w\\ 1 \end{array}\right]$$

where

$$\mathbf{P}= \left[ \begin{array}{cccc} f & 0 & 0 & 0\\ 0 & f & 0 & 0\\ 0 & 0 & 1 & 0\end{array} \right]$$

and

$$\mathbf{X}_w= \left[ \begin{array}{c} x_w\\ y_w\\ z_w\\ 1 \end{array}\right].$$
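Multiplying this out row by row gives $s_i x_i = f x_w$, $s_i y_i = f y_w$, and $s_i = z_w$, so after dividing out the scale we get $x_i = f x_w / z_w$ and $y_i = f y_w / z_w$. Here is a small Python sketch of that calculation. The function names `pinhole_matrix` and `project` are hypothetical, and the numbers are arbitrary example values.

```python
def pinhole_matrix(f):
    """The 3x4 projection matrix P for a camera centered at the world origin."""
    return [
        [f,   0.0, 0.0, 0.0],
        [0.0, f,   0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
    ]


def project(f, world_point):
    """Project an ordinary 3D point (x_w, y_w, z_w) to an ordinary 2D image point."""
    x_w, y_w, z_w = world_point
    X_w = [x_w, y_w, z_w, 1.0]  # homogeneous world point with s_w = 1
    P = pinhole_matrix(f)
    # X_i = P X_w, written as one dot product per row of P.
    X_i = [sum(p * x for p, x in zip(row, X_w)) for row in P]
    s_i = X_i[2]  # equals z_w for this P
    if s_i == 0:
        raise ValueError("point has z_w = 0, so the scale cannot be divided out")
    return [X_i[0] / s_i, X_i[1] / s_i]


# Focal length 2, world point (1, 2, 4): x_i = 2*1/4 and y_i = 2*2/4.
print(project(2.0, (1.0, 2.0, 4.0)))
```

Notice that doubling $z_w$ halves both image coordinates, which is the familiar effect of objects looking smaller as they move farther from the camera.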

That is where we will stop for today. Come back for the next post in this series, which will explore moving the camera center to a different point in the world and moving the principal point to a different point in the image plane.

Edit 8/16/2013: You can find Part 2 of this series here, and Part 3 of this series here.
