Aug 16 2013

This post is the third, and final, in a series of posts on mathematical camera representation.  The following are links to the earlier two entries in this series:

  1. Camera Representation Part 1: Homogenous Coordinate Systems and the Simplest Camera Imaginable
  2. Camera Representation Part 2: Moving the Camera and the Image

This post builds upon the model developed in the previous two posts by adding two final concepts:  the ability to handle non-square pixels in an image and the ability to handle skewed images.

For the rest of this discussion, the form of the solution for finding the projection matrix will remain the same as in Part 2.  That is, the 3 \times 4 projection matrix \mathbf{P} can be found by incorporating the 3 \times 3 camera rotation matrix \mathbf{R}, the 3-vector \mathbf{t}, and the 3 \times 3 upper-triangular intrinsic camera parameter matrix \mathbf{K} as

\mathbf{P}= \mathbf{K} \left[ \mathbf{R} | \mathbf{t} \right].
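As a rough illustration of this formula, the sketch below assembles \mathbf{P} with NumPy and projects one world point.  The helper names (projection_matrix, project) and all numeric values are assumptions made up for this example; they do not come from the earlier posts.

```python
import numpy as np

def projection_matrix(K, R, t):
    """Assemble the 3x4 projection matrix P = K [R | t]."""
    Rt = np.hstack([R, t.reshape(3, 1)])  # 3x4 extrinsic block [R | t]
    return K @ Rt

def project(P, X):
    """Project a 3-D world point X to inhomogeneous pixel coordinates."""
    x = P @ np.append(X, 1.0)  # homogeneous image point (3-vector)
    return x[:2] / x[2]        # divide out the homogeneous coordinate

# Illustrative values only (assumed, not taken from the post).
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
R = np.eye(3)                   # camera axes aligned with the world axes
t = np.array([0.0, 0.0, 5.0])   # translation 3-vector

P = projection_matrix(K, R, t)
uv = project(P, np.array([1.0, 0.5, 5.0]))
print(uv)
```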

The intrinsic camera parameter matrix \mathbf{K} defined in Part 2 will be updated to take into account non-square pixels and skew.  It makes sense that \mathbf{K} is where these changes take place, since pixel dimensions and image skew are intrinsic to the camera and do not depend on the camera's extrinsic position and orientation in the world.

Non-Square Pixels

The pixels of a digital camera are not necessarily square.  When the pixels are rectangular, the camera model must scale the image by different amounts along the x- and y-axes.  We now update the intrinsic camera parameter matrix \mathbf{K} to be:

\mathbf{K}= \left[ \begin{array}{ccc} \alpha_x & 0 & x_0\\ 0 & \alpha_y & y_0\\ 0 & 0 & 1 \end{array} \right].

Here, \alpha_x= f m_x and \alpha_y= f m_y, where f is the focal length, m_x is the number of pixels per unit distance in x, and m_y is the number of pixels per unit distance in y.  The principal point (x_0, y_0) is now measured in pixels.
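To make the units concrete, here is a small sketch that builds this \mathbf{K} from a focal length and pixel densities.  The function name intrinsic_matrix and the numbers (a 4 mm focal length, 200 and 250 pixels per mm) are hypothetical values chosen only for illustration.

```python
import numpy as np

def intrinsic_matrix(f, m_x, m_y, x0, y0):
    """Build K for non-square pixels (no skew).

    f        : focal length in a physical unit (e.g. mm)
    m_x, m_y : pixels per that same unit along x and y
    x0, y0   : principal point in pixels
    """
    alpha_x = f * m_x  # focal length expressed in x-pixels
    alpha_y = f * m_y  # focal length expressed in y-pixels
    return np.array([[alpha_x, 0.0,     x0],
                     [0.0,     alpha_y, y0],
                     [0.0,     0.0,     1.0]])

# Assumed example: f = 4 mm, 200 px/mm in x, 250 px/mm in y.
K = intrinsic_matrix(f=4.0, m_x=200.0, m_y=250.0, x0=320.0, y0=240.0)
print(K)
```

With these assumed numbers, \alpha_x = 800 and \alpha_y = 1000, so the same physical offset spans more pixels in y than in x.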


Skew

The final parameter we will add to our model is the skew parameter s.  The skew parameter models the angle between the x- and y-axes of the pixel grid in the image plane.  In most cases, the axes are perpendicular and s=0.  If the x- and y-axes are not perpendicular, then s \neq 0.

Incorporating the skew parameter into the intrinsic camera parameter matrix, we get

\mathbf{K}= \left[ \begin{array}{ccc} \alpha_x & s & x_0\\ 0 & \alpha_y & y_0\\ 0 & 0 & 1 \end{array} \right].
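One way to see what s does is to project the same point with and without skew.  The sketch below is only an illustration with made-up values; the helper names and the point, given directly in camera coordinates, are assumptions.

```python
import numpy as np

def intrinsic_with_skew(alpha_x, alpha_y, s, x0, y0):
    """Build the full upper-triangular intrinsic matrix K."""
    return np.array([[alpha_x, s,       x0],
                     [0.0,     alpha_y, y0],
                     [0.0,     0.0,     1.0]])

def pixel_from_camera_point(K, Xc):
    """Map a point in camera coordinates (X, Y, Z) to pixel coordinates."""
    x = K @ Xc
    return x[:2] / x[2]

Xc = np.array([1.0, 0.5, 10.0])  # assumed point in front of the camera

K_no_skew = intrinsic_with_skew(800.0, 1000.0, 0.0, 320.0, 240.0)
K_skew    = intrinsic_with_skew(800.0, 1000.0, 2.0, 320.0, 240.0)

uv_no_skew = pixel_from_camera_point(K_no_skew, Xc)
uv_skew    = pixel_from_camera_point(K_skew, Xc)

# Skew shifts only the x pixel coordinate, by s * Y / Z.
shift = uv_skew - uv_no_skew
print(uv_no_skew, uv_skew, shift)
```

Because s sits in the first row of \mathbf{K}, it leaves the y pixel coordinate unchanged and shifts the x pixel coordinate in proportion to Y/Z.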

Final Note on Degrees of Freedom

The camera projection matrix \mathbf{P} is a homogeneous transform, which means that two projection matrices are equivalent if the only difference between them is a non-zero scaling coefficient.  That is, \mathbf{P}_1 \sim \mathbf{P}_2 if \mathbf{P}_2= c \mathbf{P}_1 for some non-zero constant c.  Practically, this means that a projection matrix has 11 degrees of freedom despite having 12 entries.
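The scale equivalence is easy to check numerically: multiplying \mathbf{P} by any non-zero constant scales all three homogeneous image coordinates, and the scale cancels in the final division.  The matrix, the point, and the constant below are arbitrary values assumed for this sketch.

```python
import numpy as np

def project(P, X_h):
    """Project a homogeneous world point (4-vector) to pixel coordinates."""
    x = P @ X_h
    return x[:2] / x[2]  # any common scale factor cancels here

# An assumed example projection matrix and world point.
P = np.array([[800.0,   0.0, 320.0, 1600.0],
              [  0.0, 800.0, 240.0, 1200.0],
              [  0.0,   0.0,   1.0,    5.0]])
X_h = np.array([1.0, 0.5, 5.0, 1.0])

c = -3.7  # any non-zero constant
uv_original = project(P, X_h)
uv_scaled = project(c * P, X_h)
print(uv_original, uv_scaled)
```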

Going into a bit more depth, we can expand out our projection matrix as

\mathbf{P}= \mathbf{K} \left[ \mathbf{R} | \mathbf{t} \right]= \left[ \begin{array}{ccc} \alpha_x & s & x_0\\ 0 & \alpha_y & y_0\\ 0 & 0 & 1 \end{array} \right] \left[ \begin{array}{cccc} r_{11} & r_{12} & r_{13} & t_x\\ r_{21} & r_{22} & r_{23} & t_y\\r_{31} & r_{32} & r_{33} & t_z \end{array} \right].

We can now count our degrees of freedom:

  • \mathbf{K} has 5 degrees of freedom.  It has 6 non-zero elements, but the bottom-right element is fixed to 1 (any overall scale is absorbed by the homogeneous scaling of \mathbf{P}), leaving five independent parameters: \alpha_x, \alpha_y, s, x_0, and y_0.
  • \mathbf{R} is a rotation matrix, so although it has 9 elements it has only 3 degrees of freedom (roll, pitch, and yaw).
  • \mathbf{t} has 3 degrees of freedom since it defines a translation in 3-dimensional space which links the camera position with the world origin.

Thus, by simple addition (5 + 3 + 3), the camera projection matrix \mathbf{P} has 11 degrees of freedom.
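The count above can be mirrored in code by building \mathbf{P} from exactly 11 numbers.  This is a sketch under stated assumptions: the roll-pitch-yaw convention used here (\mathbf{R} = \mathbf{R}_z \mathbf{R}_y \mathbf{R}_x) is just one common choice, and the function names and parameter values are invented for the example.

```python
import numpy as np

def rotation_from_rpy(roll, pitch, yaw):
    """Rotation matrix from 3 angles, using the assumed order Rz @ Ry @ Rx."""
    cr, sr = np.cos(roll), np.sin(roll)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    Rx = np.array([[1.0, 0.0, 0.0], [0.0, cr, -sr], [0.0, sr, cr]])
    Ry = np.array([[cp, 0.0, sp], [0.0, 1.0, 0.0], [-sp, 0.0, cp]])
    Rz = np.array([[cy, -sy, 0.0], [sy, cy, 0.0], [0.0, 0.0, 1.0]])
    return Rz @ Ry @ Rx

def projection_from_params(params):
    """Build P = K [R | t] from 11 parameters: 5 intrinsic + 3 + 3."""
    alpha_x, alpha_y, s, x0, y0, roll, pitch, yaw, tx, ty, tz = params
    K = np.array([[alpha_x, s,       x0],
                  [0.0,     alpha_y, y0],
                  [0.0,     0.0,     1.0]])
    R = rotation_from_rpy(roll, pitch, yaw)
    t = np.array([[tx], [ty], [tz]])
    return K @ np.hstack([R, t])

# Assumed example values: 5 intrinsics, 3 angles (radians), 3 translations.
params = [800.0, 1000.0, 2.0, 320.0, 240.0,
          0.1, -0.2, 0.3,
          0.5, -0.5, 5.0]
P = projection_from_params(params)
R = rotation_from_rpy(0.1, -0.2, 0.3)
print(P)
```

Note that fixing the bottom-right element of \mathbf{K} to 1 is what pins down the overall scale in this parameterization: the last row of \mathbf{P} is then exactly the last row of \left[ \mathbf{R} | \mathbf{t} \right].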

And with that, we are finished with our discussion of the mathematical camera model.  I hope that you have found this useful!