May 17, 2013

This post is the second in a series on representing cameras mathematically.  If you have not read Part 1 yet, or need a quick refresher, please read it here.

Intrinsic vs. Extrinsic Camera Properties

To move the camera in the world and to move the image on the image plane, we must distinguish between properties that are intrinsic to the camera and those that are extrinsic to it.  Extrinsic properties describe the camera's position and orientation in the world, while intrinsic properties describe things like the location of the image plane origin and image scaling.

To separate out the intrinsic from the extrinsic parameters, we define the camera calibration matrix \mathbf{K} which describes the camera's intrinsic parameters.  The camera calibration matrix for the simple pinhole camera described in Part 1 is

\mathbf{K}= \left[ \begin{array}{ccc} f & 0 & 0\\ 0 & f & 0\\ 0 & 0 & 1\end{array} \right].

This camera calibration matrix only takes into account the focal length f.  But, we now have a description of the intrinsic parameters that is separate from the camera's position in the world.  Let's now change the camera's position.

Setting the Camera Location

pinhole camera diagram

The above diagram was introduced back in Part 1, but the projection matrix \mathbf{P} was then calculated assuming that the camera center \mathbf{C}_w was at the origin and the camera points along the z-axis.  We will now generalize and assume that \mathbf{C}_w can be any location in the world, and that the camera can be rotated arbitrarily.

The rotation of the camera is described by a 3 \times 3 rotation matrix \mathbf{R}.  Rotation matrices are a common way to mathematically describe an object's roll, pitch, and yaw in 3-dimensional space.  They are used whenever a linear model of 3D orientation is needed; vision, robotics, and graphics are example sub-fields of computer science that use rotation matrices regularly.

To apply the rotation matrix \mathbf{R} and the camera position \mathbf{C}_w, we must define a transformation that translates and rotates the camera in terms of the world frame.  That is, we need the rotation and translation of the camera from the origin of the world frame to its position and orientation in the world.  The rotation is straightforward, as it is described by the rotation matrix \mathbf{R}.  The translation is a bit trickier: to find the translation to use in the projection matrix \mathbf{P}, we need to "correct" for the rotation.  Thus, the translation is described as

\mathbf{t}= -\mathbf{RC}_w,

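where \mathbf{t} is the resulting 3-dimensional vector.  Note that \mathbf{C}_w is used here as an ordinary 3-dimensional position (the homogeneous point from Part 1 without its final coordinate), since \mathbf{R} is 3 \times 3.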
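To see where the minus sign and the rotation in \mathbf{t} come from, here is a short derivation sketch.  It assumes (this is not stated explicitly above) that \mathbf{R} maps world-frame directions into the camera frame.  The tilde notation for ordinary, non-homogeneous 3-vectors and the name \mathbf{X}_{c} for the point expressed in the camera frame are introduced only for this illustration.

```latex
\mathbf{X}_{c}
= \mathbf{R} \left( \tilde{\mathbf{X}}_w - \tilde{\mathbf{C}}_w \right)
= \mathbf{R} \tilde{\mathbf{X}}_w - \mathbf{R} \tilde{\mathbf{C}}_w
= \left[ \mathbf{R} | \mathbf{t} \right]
  \left[ \begin{array}{c} \tilde{\mathbf{X}}_w \\ 1 \end{array} \right],
\qquad
\mathbf{t} = -\mathbf{R} \tilde{\mathbf{C}}_w.
```

In words: the world point is first expressed relative to the camera center and then rotated, so the rotation also gets applied to the camera center.  That is the "correction" for the rotation.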

Given all of this, we can solve for the projection matrix using the following equation:

\mathbf{P}= \mathbf{K} \left[ \mathbf{R} | \mathbf{t} \right].
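As a rough illustration of how these pieces fit together, here is a minimal sketch in plain Python.  The helper names (mat_mul, mat_vec, projection_matrix) and all of the numbers are made up for this example, and the rotation is a 90 degree turn about the z-axis chosen only because its entries are exact.

```python
def mat_mul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def mat_vec(m, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def projection_matrix(K, R, C_w):
    """Build P = K [R | t] with t = -R C_w (C_w given as a plain 3-vector)."""
    t = [-c for c in mat_vec(R, C_w)]
    R_t = [row + [t_i] for row, t_i in zip(R, t)]  # 3x4 matrix [R | t]
    return mat_mul(K, R_t)


f = 2.0  # hypothetical focal length
K = [[f, 0.0, 0.0],
     [0.0, f, 0.0],
     [0.0, 0.0, 1.0]]
R = [[0.0, -1.0, 0.0],   # 90 degree rotation about the z-axis
     [1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0]]
C_w = [1.0, 0.0, -4.0]   # hypothetical camera center

P = projection_matrix(K, R, C_w)

X_w = [1.0, 2.0, 0.0, 1.0]          # homogeneous world point
X_i = mat_vec(P, X_w)               # homogeneous image point
x_i, y_i = X_i[0] / X_i[2], X_i[1] / X_i[2]
print(x_i, y_i)

# The camera center itself has no image: it projects to the zero vector.
center_image = mat_vec(P, C_w + [1.0])
print(center_image)
```

Checking that the camera center maps to the zero vector is a handy sanity test for the sign of \mathbf{t}.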

Setting the Image Location

Now that we can move the camera around to any arbitrary location and orientation in the world, we will focus on moving the principal point of the image to an arbitrary point in the image plane.  The principal point is the point in the 2D image plane that corresponds to point \mathbf{C}_i in the diagram above.  Moving it matters because, so far, the principal point is the origin, point (0, 0), of the image.  Most digital image formats put the origin in a corner of the image, but without moving the principal point, the origin will be in the center of the image.  This must be changed!



Image plane diagram. Shows the location of the principal point and associated axes in the camera image plane (C_{cam}) and the x,y axes of the actual image.

To shift the image origin away from the principal point, we need to add the x_0 offset for the x-axis and the y_0 offset for the y-axis, so that the principal point ends up at (x_0, y_0) in image coordinates.  This is a fairly straightforward modification of the camera calibration matrix \mathbf{K} above.  Once we make this change, we get:

\mathbf{K}= \left[ \begin{array}{ccc} f & 0 & x_0\\ 0 & f & y_0\\ 0 & 0 & 1 \end{array} \right].

This addition scales the image plane coordinates by f and then adds the offset.  To illustrate this with an example, let's solve for \mathbf{K X}_{cam} where \mathbf{X}_{cam} is a 3-dimensional homogeneous vector containing a point in the camera's image plane:

\mathbf{K X}_{cam}= \left[ \begin{array}{ccc} f & 0 & x_0\\ 0 & f & y_0\\ 0 & 0 & 1 \end{array} \right] \left[ \begin{array}{c} x_{cam}\\ y_{cam}\\ 1 \end{array} \right]= \left[ \begin{array}{c} fx_{cam}+x_0\\ fy_{cam}+y_0\\ 1 \end{array} \right].
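The same multiplication can be tried numerically.  The following is only a sketch: the function name apply_calibration and the values of f, x_0, and y_0 are invented for illustration.

```python
def apply_calibration(f, x0, y0, point_cam):
    """Multiply K = [[f, 0, x0], [0, f, y0], [0, 0, 1]] by a homogeneous point."""
    K = [[f, 0.0, x0],
         [0.0, f, y0],
         [0.0, 0.0, 1.0]]
    return [sum(k * p for k, p in zip(row, point_cam)) for row in K]


# Hypothetical values: focal length and principal point offsets.
f, x0, y0 = 500.0, 320.0, 240.0

# The principal point (0, 0) in the camera image plane lands at (x0, y0).
principal_point = apply_calibration(f, x0, y0, [0.0, 0.0, 1.0])

# Any other point is scaled by f and then shifted by the same offset.
shifted_point = apply_calibration(f, x0, y0, [0.25, -0.5, 1.0])

print(principal_point)
print(shifted_point)
```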

Images with Origin in the Upper-Left-Hand Corner

One final thought to consider:  many digital image formats put the origin of the image in the upper left-hand corner of the image, with the y-axis pointed down.  If you are dealing with images like that, you will need to correct your camera calibration matrix as follows:

\mathbf{K}'= \left[ \begin{array}{ccc}1 & 0 & 0\\ 0 & -1 & 0\\ 0 & 0 & 1 \end{array} \right] \mathbf{K}.

This correction flips the y-axis so that it points down and lines up with the y-axis of the image.
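Multiplying this out with the \mathbf{K} that includes the principal point offsets may make the effect easier to see.  This is only the product above written out, not an additional correction:

```latex
\mathbf{K}'
= \left[ \begin{array}{ccc} 1 & 0 & 0\\ 0 & -1 & 0\\ 0 & 0 & 1 \end{array} \right]
  \left[ \begin{array}{ccc} f & 0 & x_0\\ 0 & f & y_0\\ 0 & 0 & 1 \end{array} \right]
= \left[ \begin{array}{ccc} f & 0 & x_0\\ 0 & -f & -y_0\\ 0 & 0 & 1 \end{array} \right].
```

Note that the entire second row is negated, so y_0 changes sign along with f.  Depending on how y_0 was measured for your images, its value may need to be chosen with that in mind.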

And that is where we will leave off for today.  Come back next time for Part 3 of this series where we will add in more intrinsic camera parameters to think about.

Edit 8/16/2013:  You can find Part 3 of this series here.

May 10, 2013

Representing a camera mathematically can be a bit tricky, especially if you want to represent many aspects of the camera.  In this post, I will begin a discussion of the linear pinhole camera model.  This is the first in a series of posts on camera representation;  at the end of this series, we will have completely walked through the derivation of a linear system that describes how a point in the 3D world projects to a point on the 2D image plane.

Homogeneous Coordinates

Before we can go any further, we need to discuss homogeneous coordinates, which are basically a linear algebra trick to simplify the writing of our equations.  To convert a normal coordinate system to a homogeneous coordinate system, an extra dimension must be added to every point in the system.  This extra coordinate is simply a scalar multiple (s_w here) that also scales the other coordinates, so an (originally 3D) world point would be \mathbf{X}_w= (s_w x_w, s_w y_w, s_w z_w, s_w)^T= s_w ( x_w, y_w, z_w, 1)^T in homogeneous coordinates.  Similarly, an (originally 2D) image point will then be \mathbf{X}_i= (s_i x_i, s_i y_i, s_i)^T= s_i (x_i, y_i, 1)^T in homogeneous coordinates.

It is important to note that in homogeneous coordinates, the value of the scalar multiple (s_w and s_i above) does not matter, since it can simply be divided out of the point.  It just cannot be zero for an ordinary (finite) point.  For example,

\frac{1}{s_i}\mathbf{X}_i= \frac{1}{s_i} \left[ \begin{array}{c}s_i x_i \\ s_i y_i \\ s_i \end{array}\right]= \left[ \begin{array}{c} x_i \\ y_i \\ 1 \end{array}\right].
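To make the "divide out the scalar" step concrete, here is a small sketch in plain Python.  The function names to_homogeneous and from_homogeneous are hypothetical helpers written for this example, not part of any library.

```python
def to_homogeneous(point, scale=1.0):
    """Scale every coordinate by `scale` and append `scale` as the extra entry."""
    if scale == 0:
        raise ValueError("scale must be non-zero for a finite point")
    return [scale * c for c in point] + [scale]


def from_homogeneous(h_point):
    """Divide out the last entry and drop it."""
    s = h_point[-1]
    if s == 0:
        raise ValueError("last entry is zero, so there is nothing to divide by")
    return [c / s for c in h_point[:-1]]


image_point = [3.0, 4.0]

h1 = to_homogeneous(image_point)              # scalar multiple of 1
h2 = to_homogeneous(image_point, scale=2.0)   # scalar multiple of 2

# Both homogeneous vectors describe the same 2D image point.
print(h1, from_homogeneous(h1))
print(h2, from_homogeneous(h2))
```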

The Simplest Pinhole Camera

Using homogeneous coordinates, we will now build a mathematical description of a camera.  The mathematical description of the camera is a set of linear equations that map a world point \mathbf{X}_w to an image point \mathbf{X}_i.  Since the homogeneous world point is 4-dimensional and the homogeneous image point is 3-dimensional, the overall transformation can be described by the 3 \times 4 projection matrix \mathbf{P}.  The projection from the world point to its corresponding image point can then be written as \mathbf{X}_i= \mathbf{P} \mathbf{X}_w.

Let's now dig in and look at an example camera:


Pinhole camera diagram.

In the above diagram of a simple pinhole camera, we have a number of key items listed:

  • x, y, and z are the 3D world axes.
  • z is the principal axis, which is simply the axis perpendicular to the image plane.  Think of this as the direction that the camera is pointing.  Choosing z as the principal axis is a long-standing convention among vision scientists rather than a requirement.  All of the equations we derive can be re-derived to use a different axis as the principal axis if you are so inclined.
  • \mathbf{C}_w is the world coordinate of the camera center.  This is a 4-dimensional homogeneous point.
  • \mathbf{C}_i is the principal point, which is the point where the principal axis meets the image plane.  This is a 3-dimensional homogeneous point because it is on the 2D image plane, not in the 3D world.
  • f is the focal length, which is just the scalar distance from the camera center to the image plane.
  • \mathbf{X}_w is the world point being imaged.  This is a 4-dimensional homogeneous point.
  • Finally, \mathbf{X}_i is the point on the image plane that the world point projects to.  This is a 3-dimensional homogeneous point.

In the simplest case of the projection matrix, the camera center is at the origin of the world coordinate system, which means \mathbf{C}_w= (0, 0, 0, 1)^T.  Therefore, our very simple transformation of the world coordinate to the image coordinate (\mathbf{X}_i= \mathbf{P} \mathbf{X}_w) can be fully written out as

\mathbf{X}_i= \left[ \begin{array}{c} s_i x_i\\ s_i y_i\\ s_i \end{array}\right]= \left[ \begin{array}{cccc} f & 0 & 0 & 0\\ 0 & f & 0 & 0\\ 0 & 0 & 1 & 0\end{array} \right] \left[ \begin{array}{c} x_w\\ y_w\\ z_w\\ 1 \end{array}\right],

where

\mathbf{P}= \left[ \begin{array}{cccc} f & 0 & 0 & 0\\ 0 & f & 0 & 0\\ 0 & 0 & 1 & 0\end{array} \right]

and

\mathbf{X}_w= \left[ \begin{array}{c} x_w\\ y_w\\ z_w\\ 1 \end{array}\right].
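Here is a small numerical sketch of this projection in plain Python.  The helper names (mat_vec, simple_projection_matrix) and the values are made up for illustration.  Because the last row of \mathbf{P} copies z_w into s_i, dividing out s_i gives x_i = f x_w / z_w and y_i = f y_w / z_w.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def simple_projection_matrix(f):
    """The 3x4 projection matrix for a camera at the origin looking along z."""
    return [[f, 0.0, 0.0, 0.0],
            [0.0, f, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0]]


f = 2.0                        # hypothetical focal length
X_w = [1.0, 2.0, 4.0, 1.0]     # homogeneous world point (x_w, y_w, z_w, 1)

X_i = mat_vec(simple_projection_matrix(f), X_w)   # (s_i x_i, s_i y_i, s_i)
s_i = X_i[2]                                      # equals z_w for this P
x_i, y_i = X_i[0] / s_i, X_i[1] / s_i

print(X_i)
print(x_i, y_i)
```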

That is where we will stop for today.  Come back for the next post in this series, which will explore moving the camera center to a different point in the world and moving the principal point to a different point in the image plane.

Edit 8/16/2013:  You can find Part 2 of this series here, and Part 3 of this series here.

May 2, 2013

In an attempt to generate additional traffic for my Birdseye College Price Comparison site, I have added a set of PDF files to the site.  There is one PDF file for each state and the District of Columbia, which contains a list of all of the four-year colleges in the state along with my system's estimated four-year price for the average student starting college this year.  These lists are ordered from lowest price to highest price.  My personalized cost estimates are still going to be superior because they will provide a price range closer to what an individual student will actually pay for their education, but I think these PDF files can provide a useful starting point for students or their parents.

My hope is that these side-by-side average (or net) four-year college price comparisons per state are something that my potential users are interested in.  If you want to take a look at it, here is the link: Side-by-Side Four-Year Average Net College Prices by State.

If this is useful for you, or you have any ideas on what might make it more useful, let me know!  I'm always interested in improvement.