May 17, 2013
 

This post is the second in a series of posts on representing cameras mathematically.  If you have not read Part 1 yet, or need a quick refresher, please read it here.

Intrinsic vs. Extrinsic Camera Properties

To move the camera in the world and to move the image on the image plane, we must distinguish between properties that are intrinsic to the camera and those that are extrinsic to it.  Extrinsic properties describe the camera's position in the world, while intrinsic properties describe things like the location of the image plane origin and image scaling.

To separate out the intrinsic from the extrinsic parameters, we define the camera calibration matrix \mathbf{K} which describes the camera's intrinsic parameters.  The camera calibration matrix for the simple pinhole camera described in Part 1 is

\mathbf{K}= \left[ \begin{array}{ccc} f & 0 & 0\\ 0 & f & 0\\ 0 & 0 & 1\end{array} \right].

This camera calibration matrix only takes into account the focal length f.  But we now have a description of the intrinsic parameters that is separate from the camera's position in the world.  Let's now change the camera's position.

Setting the Camera Location

pinhole camera diagram

The above diagram was introduced back in Part 1, but the projection matrix \mathbf{P} was then calculated assuming that the camera center \mathbf{C}_w was at the origin and the camera points along the z-axis.  We will now generalize and assume that \mathbf{C}_w can be any location in the world, and that the camera can be rotated arbitrarily.

The rotation of the camera is described by a 3 \times 3 rotation matrix \mathbf{R}.  Rotation matrices are a common way to mathematically describe an object's roll, pitch, and yaw in 3-dimensional space.  Rotation matrices appear whenever orientation in 3D must be modeled linearly; vision, robotics, and graphics are example sub-fields of computer science that use them regularly.
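To make the roll/pitch/yaw description concrete, here is a minimal Python sketch that builds \mathbf{R} from three angles. It assumes a Z-Y-X convention (yaw about z, then pitch about y, then roll about x, composed as R_z R_y R_x); other conventions are equally common and give different matrices for the same angles. The helper names are hypothetical.

```python
import math

def rot_x(a):
    """Rotation by angle a (radians) about the x-axis."""
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0],
            [0.0, c, -s],
            [0.0, s, c]]

def rot_y(a):
    """Rotation by angle a (radians) about the y-axis."""
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s],
            [0.0, 1.0, 0.0],
            [-s, 0.0, c]]

def rot_z(a):
    """Rotation by angle a (radians) about the z-axis."""
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def rotation_from_ypr(yaw, pitch, roll):
    """Assumed Z-Y-X convention: R = Rz(yaw) Ry(pitch) Rx(roll)."""
    return matmul(rot_z(yaw), matmul(rot_y(pitch), rot_x(roll)))

R = rotation_from_ypr(math.radians(30), math.radians(10), math.radians(-5))
# Any rotation matrix is orthonormal, so R R^T should be (numerically) the identity.
RRt = matmul(R, transpose(R))
```

Checking R R^T = I (and det R = 1) is a quick sanity test that a matrix really is a rotation before using it in the projection matrix.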

To apply the rotation matrix \mathbf{R} and the camera position \mathbf{C}_w, we must define a transformation that translates and rotates the camera in terms of the world frame.  That is, we need the rotation and translation that take the camera from the origin of the world frame to its position and orientation in the world.  The rotation is straightforward, as it is described by the rotation matrix \mathbf{R}.  However, the translation is a bit trickier: because the rotation is applied to world points before the translation is added, the translation used in the projection matrix \mathbf{P} must be expressed in the rotated frame, so we need to "correct" for the rotation.  Thus, the translation is described as

\mathbf{t}= -\mathbf{RC}_w,

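where \mathbf{t} is the resulting 3-dimensional vector and \mathbf{C}_w is taken here as an ordinary (inhomogeneous) 3D point.  Applying the rotation and then this translation to a world point (x_w, y_w, z_w)^T gives \mathbf{R}(x_w, y_w, z_w)^T + \mathbf{t}= \mathbf{R}\left((x_w, y_w, z_w)^T - \mathbf{C}_w\right): the point is first expressed relative to the camera center and then rotated into the camera's frame.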

Given all of this, we can solve for the projection matrix using the following equation:

\mathbf{P}= \mathbf{K} \left[ \mathbf{R} | \mathbf{t} \right].
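A minimal Python sketch of \mathbf{P}= \mathbf{K} \left[ \mathbf{R} | \mathbf{t} \right] using plain lists (no libraries). It assumes \mathbf{R} maps world-frame directions into the camera frame and that the camera center is passed as an ordinary 3-vector; the function names and numeric values are hypothetical. With \mathbf{R}= \mathbf{I} and the camera at the origin, it reproduces the Part 1 projection matrix.

```python
def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def projection_matrix(f, R, C):
    """P = K [R | t] with t = -R C; C is the inhomogeneous camera center."""
    K = [[f, 0.0, 0.0],
         [0.0, f, 0.0],
         [0.0, 0.0, 1.0]]
    t = [-sum(R[i][k] * C[k] for k in range(3)) for i in range(3)]
    Rt = [R[i] + [t[i]] for i in range(3)]  # the 3x4 matrix [R | t]
    return matmul(K, Rt)

def project(P, Xw):
    """Project an inhomogeneous 3D world point to an inhomogeneous 2D image point."""
    Xh = [[Xw[0]], [Xw[1]], [Xw[2]], [1.0]]  # homogeneous column vector
    x = matmul(P, Xh)
    s = x[2][0]                              # homogeneous scale to divide out
    return (x[0][0] / s, x[1][0] / s)

I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# R = I and camera at the origin: P should equal the Part 1 matrix with f = 2.
P0 = projection_matrix(2.0, I3, [0.0, 0.0, 0.0])

# Move the camera back 5 units along z (no rotation) and image a world point.
P1 = projection_matrix(2.0, I3, [0.0, 0.0, -5.0])
u, v = project(P1, [1.0, 2.0, 5.0])
```

Moving the camera 5 units back along z puts the world point (1, 2, 5) 10 units in front of it, so with f = 2 it lands at (0.2, 0.4).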

Setting the Image Location

Now that we can move the camera to any location and orientation in the world, we will focus on moving the principal point of the image to an arbitrary point in the image plane.  The principal point is the point in the 2D image plane that corresponds to point \mathbf{C}_i in the diagram above.  It is important to move it because, in the model so far, the principal point is the origin, point (0, 0), of the image.  Most digital image formats put the origin in a corner of the image, but without moving the principal point, the origin will be at the center of the image.  This must be changed!

 

Image plane diagram

Image plane diagram. Shows the location of the principal point and associated axes in the camera image plane (C_{cam}) and the x,y axes of the actual image.

To move the principal point to the image origin, we need to add the y_0 offset for the y-axis and the x_0 offset for the x-axis.  This is a fairly straightforward modification of the camera calibration matrix \mathbf{K} above.  Once we make this change, we get:

\mathbf{K}= \left[ \begin{array}{ccc} f & 0 & x_0\\ 0 & f & y_0\\ 0 & 0 & 1 \end{array} \right].

This addition simply shifts the image locations in the image plane by (x_0, y_0).  To illustrate this with an example, let's solve for \mathbf{K X}_{cam} where \mathbf{X}_{cam} is a 3D homogeneous vector containing a point in the camera's image plane:

\mathbf{K X}_{cam}= \left[ \begin{array}{ccc} f & 0 & x_0\\ 0 & f & y_0\\ 0 & 0 & 1 \end{array} \right] \left[ \begin{array}{c} x_{cam}\\ y_{cam}\\ 1 \end{array} \right]= \left[ \begin{array}{c} fx_{cam}+x_0\\ fy_{cam}+y_0\\ 1 \end{array} \right].
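A short Python sketch of the computation above, under assumed values (f = 800 and a principal point at (320, 240), as for the center of a hypothetical 640 x 480 image). The principal point itself maps to (x_0, y_0), and every other point is shifted by the same offset; the function names are hypothetical.

```python
def calibration_matrix(f, x0, y0):
    """K with focal length f and principal point offset (x0, y0)."""
    return [[f, 0.0, x0],
            [0.0, f, y0],
            [0.0, 0.0, 1.0]]

def apply_K(K, x_cam, y_cam):
    """Compute K (x_cam, y_cam, 1)^T and return the inhomogeneous image point."""
    v = [x_cam, y_cam, 1.0]
    out = [sum(K[i][k] * v[k] for k in range(3)) for i in range(3)]
    return (out[0] / out[2], out[1] / out[2])

K = calibration_matrix(800.0, 320.0, 240.0)
principal = apply_K(K, 0.0, 0.0)     # lands on (x0, y0)
pixel = apply_K(K, 0.1, -0.05)       # f * (x_cam, y_cam) plus the offset
```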

Images with Origin in the Upper-Left-Hand Corner

One final thought to consider:  many digital image formats put the origin of the image in the upper left-hand corner of the image, with the y-axis pointed down.  If you are dealing with images like that, you will need to correct your camera calibration matrix as follows:

\mathbf{K}'= \left[ \begin{array}{ccc}1 & 0 & 0\\ 0 & -1 & 0\\ 0 & 0 & 1 \end{array} \right] \mathbf{K}.

This correction will flip the y-axis so that it will line up correctly with the image plane.
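A minimal sketch of this correction: left-multiplying by the diagonal matrix diag(1, -1, 1) negates the second row of \mathbf{K}, so a camera-frame point (x_{cam}, y_{cam}, 1) now lands at y = -(f y_{cam} + y_0). The numeric \mathbf{K} and the function name are hypothetical examples.

```python
def flip_y(K):
    """Left-multiply K by diag(1, -1, 1), which negates K's second row."""
    F = [[1.0, 0.0, 0.0],
         [0.0, -1.0, 0.0],
         [0.0, 0.0, 1.0]]
    return [[sum(F[i][k] * K[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

K = [[800.0, 0.0, 320.0],
     [0.0, 800.0, 240.0],
     [0.0, 0.0, 1.0]]
K_prime = flip_y(K)
```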

And that is where we will leave off for today.  Come back next time for Part 3 of this series where we will add in more intrinsic camera parameters to think about.

Edit 8/16/2013:  You can find Part 3 of this series here.

May 10, 2013
 

Representing a camera mathematically can be a bit tricky, especially if you want to represent many aspects of the camera.  In this post, I will begin a discussion of the linear pinhole camera model.  This is the first in a series of posts on camera representation;  at the end of this series, we will have completely walked through the derivation of a linear system that describes how a point in the 3D world projects to a point on the 2D image plane.

Homogeneous Coordinates

Before we can go any further, we need to discuss homogeneous coordinates, which are basically a linear algebra trick to simplify the writing of our equations.  To convert a normal coordinate system to a homogeneous coordinate system, an extra dimension must be added to every point in the system.  This extra coordinate is simply a scale factor (s_w here), so an (originally 3D) world point would be \mathbf{X}_w= (s_w x_w, s_w y_w, s_w z_w, s_w)^T= s_w ( x_w, y_w, z_w, 1)^T in homogeneous coordinates.  Similarly, an (originally 2D) image point will then be \mathbf{X}_i= (s_i x_i, s_i y_i, s_i)^T= s_i (x_i, y_i, 1)^T in homogeneous coordinates.

It is important to note that in homogeneous coordinates, the value of the scale factor (s_w and s_i above) does not matter, since it can simply be divided out of the point.  It just cannot be zero for an ordinary point; a zero last coordinate represents a point at infinity.  For example,

\frac{1}{s_i}\mathbf{X}_i= \frac{1}{s_i} \left[ \begin{array}{c}s_i x_i \\ s_i y_i \\ s_i \end{array}\right]= \left[ \begin{array}{c} x_i \\ y_i \\ 1 \end{array}\right].
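The conversion in both directions can be sketched in a few lines of Python (the function names are hypothetical). Scaling by any non-zero s gives a different homogeneous vector that represents the same point.

```python
def to_homogeneous(p, s=1.0):
    """(x, y, ...) -> (s*x, s*y, ..., s) for a non-zero scale s."""
    if s == 0:
        raise ValueError("scale must be non-zero for a finite point")
    return tuple(s * c for c in p) + (s,)

def from_homogeneous(h):
    """(s*x, s*y, ..., s) -> (x, y, ...) by dividing out the last coordinate."""
    s = h[-1]
    if s == 0:
        raise ValueError("last coordinate is zero (point at infinity)")
    return tuple(c / s for c in h[:-1])

Xi = to_homogeneous((3.0, 4.0), s=2.0)         # (6.0, 8.0, 2.0)
Xi_other = to_homogeneous((3.0, 4.0), s=-0.5)  # same image point, different scale
```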

The Simplest Pinhole Camera

Using homogeneous coordinates, we will now build a mathematical description of a camera.  The mathematical description of the camera is a set of linear equations that map a world point \mathbf{X}_w to an image point \mathbf{X}_i.  Since the homogeneous world point is 4-dimensional and the homogeneous image point is 3-dimensional, the overall transformation can be described by the 3 \times 4 projection matrix \mathbf{P}.  The projection from the world point to its corresponding image point can then be written as \mathbf{X}_i= \mathbf{P} \mathbf{X}_w.

Let's now dig in and look at an example camera:

Pinhole camera diagram

Pinhole camera diagram.

In the above diagram of a simple pinhole camera, we have a number of key items listed:

  • x, y, and z are the 3D world axes.
  • z is the principal axis, which is simply the axis perpendicular to the image plane.  Think of this as the direction that the camera is pointing.  Choosing the z axis as the principal axis is a convention that vision researchers have historically followed.  All of the equations we derive can be re-derived to use a different axis as the principal axis if you are so inclined.
  • \mathbf{C}_w is the world coordinate of the camera center.  This is a 4-dimensional homogeneous point.
  • \mathbf{C}_i is the principal point, which is the point where the principal axis meets the image plane.  This is a 3-dimensional homogeneous point because it is on the 2D image plane, not in the 3D world.
  • f is the focal length, which is just the scalar distance from the camera center to the image plane.
  • \mathbf{X}_w is the world point being imaged.  This is a 4-dimensional homogeneous point.
  • Finally, \mathbf{X}_i is the point on the image plane that the world point projects to.  This is a 3-dimensional homogeneous point.

In the simplest case of the projection matrix, the camera center is at the origin of the world coordinate system, which means \mathbf{C}_w= (0, 0, 0, 1)^T.  Therefore, our very simple transformation of the world coordinate to the image coordinate (\mathbf{X}_i= \mathbf{P} \mathbf{X}_w) can be fully written out as

\mathbf{X}_i= \left[ \begin{array}{c} s_i x_i\\ s_i y_i\\ s_i \end{array}\right]= \left[ \begin{array}{cccc} f & 0 & 0 & 0\\ 0 & f & 0 & 0\\ 0 & 0 & 1 & 0\end{array} \right] \left[ \begin{array}{c} x_w\\ y_w\\ z_w\\ 1 \end{array}\right]

where

\mathbf{P}= \left[ \begin{array}{cccc} f & 0 & 0 & 0\\ 0 & f & 0 & 0\\ 0 & 0 & 1 & 0\end{array} \right]

and

\mathbf{X}_w= \left[ \begin{array}{c} x_w\\ y_w\\ z_w\\ 1 \end{array}\right].
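Multiplying out \mathbf{P} \mathbf{X}_w gives (f x_w, f y_w, z_w)^T, so s_i = z_w, and dividing it out yields x_i = f x_w / z_w and y_i = f y_w / z_w. A minimal Python sketch of that projection, with hypothetical values for f and the world point:

```python
def simple_P(f):
    """Projection matrix for a camera at the origin looking along z."""
    return [[f, 0.0, 0.0, 0.0],
            [0.0, f, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0]]

def project(P, Xw):
    """Apply P to the homogeneous world point and divide out s_i."""
    X = list(Xw) + [1.0]
    x = [sum(P[i][k] * X[k] for k in range(4)) for i in range(3)]
    return (x[0] / x[2], x[1] / x[2])

x_i, y_i = project(simple_P(2.0), (1.0, 0.5, 4.0))
```

Doubling z_w halves both image coordinates, which is the familiar perspective effect of distant objects appearing smaller.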

That is where we will stop for today.  Come back for the next post in this series, which will explore moving the camera center to a different point in the world and moving the principal point to a different point in the image plane.

Edit 8/16/2013:  You can find Part 2 of this series here, and Part 3 of this series here.

May 2, 2013
 

In an attempt to generate additional traffic for my Birdseye College Price Comparison site, I have added a set of PDF files to the site.  There is one PDF file for each state and the District of Columbia, which contains a list of all of the four-year colleges in the state along with my system's estimated four-year price for the average student starting college this year.  These lists are ordered from lowest price to highest price.  My personalized cost estimates are still going to be superior because they will provide a price range closer to what an individual student will actually pay for their education, but I think these PDF files can provide a useful starting point for students or their parents.

My hope is that these side-by-side average (or net) four-year college price comparisons per state are something that my potential users are interested in.  If you want to take a look at it, here is the link: Side-by-Side Four-Year Average Net College Prices by State.

If this is useful for you, or you have any ideas on what might make it more useful, let me know!  I'm always interested in improvement.

Apr 12, 2013
 

On Saturday, April 6, I had the pleasure of going to Minnebar 8.  Minnebar is a free-to-attend, volunteer-run "unconference" put on by the Minne* organization.  It was very good, and I highly recommend attending the next Minnebar if you are in the area.  Minnebar focuses on the intersection of the high-tech, start-up, and social-outreach communities in the Twin Cities.  This was the eighth annual version of the Minnebar gathering and it was held at the Best Buy corporate headquarters in the south Minneapolis metro (Best Buy donated the space).  Over a thousand people were registered for the gathering, and from my view in the middle of the crowd, that seems like a reasonable estimate of how many actually attended.

There were sessions scheduled throughout the day much like for any conference you might go to.  There were probably 10 or so parallel tracks going, so there was always a selection of topics.  The sessions were 40 minutes long, and consisted of presentations about the topic at hand by one or two presenters.  The presenters tried to take a conversational tone with their crowd, making it more of a dialog.  Most sessions were very good.  Here is an outline of the sessions I attended:

  • Teaching Kids to Code.  This session discussed a new Coder Dojo starting up in the Twin Cities (link).  Coder Dojos are distributed volunteer organizations that are currently springing up across the world focused on getting K-12 kids interested in programming, especially kids from groups that do not typically pursue computing careers.  They do this by hosting learn-to-program type events with a bunch of volunteer professionals to take the scariness out of coding, and provide the kids with a great experience.  Outreach efforts are extremely important for the creativity and vitality of the profession, so I really hope it takes off.
  • Agile Financial Modeling.  This session was about putting together a quick and dirty financial model for a fledgling company.  I went because I have a fledgling company, and had never heard of financial modeling before (it is essentially a codified way to sketch out projected estimates of income and expenditures over a year or three).  I found the session informative, and the layout they gave for the models is more intuitive to build and use than the one I had rolled myself for my own operation.
  • Managing Your IT Career.  This was a great session by a local headhunter about the state of the high tech labor market in the Midwest.  In short, the market is good for pretty much any computing professional.  The Twin Cities is a good place to be.
  • Civic Hacking.  This was a session put on by Open Twin Cities, a local group devoted to ultimately getting structured, real-time access to government-collected data so that community members can use it to improve the local community.  This group seems to be focused on software development with local hackathons and the National Day of Civic Hacking.  I think this is an interesting idea, and I can't wait to see what comes of it.
  • Percolating Trep Net.  To be honest, I am still not quite sure what this session was about.  There was talk about the different social networks people have, both online and off.  There was some information about categorizing these networks and people in them, but I never did figure out what the end result was supposed to be.  I guess it was for people with a different educational background than I.
  • This Old Website.  This was a really great session about adding HTML5, CSS3, and Responsive Design to an existing website.  They covered all three topics very quickly, and very well given the time constraint.  Participating in this session was akin to drinking water from a fire hose.  As someone whose web design tends to be pretty ad hoc (just see Birdseye College Price Comparison for an example), a lot of what was presented is very applicable to me.  They even put their slides online here.
  • Technology Behind the Obama 2012 Campaign.  This was a great final session for the day.  It was put on by a developer who worked on the information infrastructure behind the Obama 2012 Campaign.  It is absolutely amazing what they were able to build, deploy, support, and then tear down in under a year and a half.  Political campaigns these days need a massive amount of software providing a variety of different functions to different groups of people.  The Obama Campaign built their software on Amazon Web Services, which was a fantastic choice for this type of operation: they only need the massive data center for a relatively short, fixed time period, cloud services can adjust to exponentially exploding usage as the end of the campaign nears, and cloud services can be readily replicated to deal with parts of the infrastructure going down.  Overall a fascinating look into what it takes to run a modern political campaign.

As can be seen from the list above, there was a very diverse range of topics covered at Minnebar 8, most of which were very exciting.  Food was also provided, and it was delicious.  We got Pizza Luce (a local gourmet pizza chain) for lunch.  Beer was also provided for a social meet-and-greet at the end of the day.  What more could you ask for?

Overall, Minnebar 8 was an excellent experience.  It is astounding and heartening to me that a conference of this quality and magnitude can be organized and delivered in a completely volunteer manner.  It was very cool and very impressive.  I will definitely try to attend next year.

 

Apr 5, 2013
 

I recently finished a cover-to-cover reading of Frederick Brooks' classic book on large software project management, The Mythical Man-Month (TMMM).  Originally published in 1975, TMMM details Brooks' management philosophy, which he developed as manager of the IBM OS/360 operating system from 1964-1965.  I read the 20th anniversary edition (published 1995), in which two additional chapters were added that update the philosophy put forth in the original edition from the beginning of the timesharing era up to the PC era of computing.

My impression of TMMM is very positive overall; it is easy to see why this is still a required text on software project management, nearly 50 years after Brooks finished his stint managing the OS/360 project.  There are many interesting anachronisms in TMMM that would necessitate a history lesson in computing for today's college-age audience, e.g., software development in pencil and paper (since machine time was rare and expensive), separate punch-card readers from computers, development primarily in machine code, a preoccupation with memory size of programs, a view that structured programming might be a good idea that likely goes too far, etc.  Clearly, a very different world technologically than that which we work in now.  But despite all the differences in computing technology from 1965 to today, the fundamental problems in large software projects and the structural solutions to combat them that Brooks presents have not changed all that much.  It is rather fascinating: when I was teaching, there were always a few students who went into conniptions if an assignment required them to use a terminal interface instead of a graphical IDE, even though that shift in development habits is only about 10 years old.  Yet current techniques for building large software systems remain fundamentally quite similar over 50 years of computing history.

In TMMM, Brooks posits, quite rightly in my opinion, that conceptual complexity is the primary hindrance to quick and quality software development.  In addition, conceptual complexity is a problem that can never be truly solved.  Conceptual complexity here means the number of logical components in a software system coupled with the amount of communication required between these components.  Because of the complex interrelationships between the components, a particular component could interact with hundreds or thousands of other components, requiring a huge amount of communication overhead between the people building these components.  This communication eats up a lot of time for the people involved, which in turn requires more people to get the project done in a timely manner, which in turn adds more necessary communication...and so forth.  The cycle is recursive.  The curse of complexity is that at some point, when more people are added to a project, the amount of time the project takes actually gets longer, instead of shorter.  This is why big software is almost always late and over budget.
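Brooks makes this growth concrete in TMMM: if each part of the task must be coordinated with every other part, intercommunication effort grows as n(n-1)/2 for n workers. The Python sketch below is only a toy illustration, not Brooks' own model: it assumes a perfectly divisible amount of work plus a fixed cost per communication channel, and both parameter values are hypothetical.

```python
def channels(n):
    """Pairwise communication channels among n people: n(n-1)/2."""
    return n * (n - 1) // 2

def schedule(n, work=1000.0, cost_per_channel=0.5):
    """Toy model: work split evenly over n people plus a coordination cost
    for every pairwise channel. Parameter values are made up."""
    return work / n + cost_per_channel * channels(n)

# Team size (1..100) that minimizes the toy schedule.
best = min(range(1, 101), key=schedule)
```

With these made-up parameters the shortest schedule comes at 13 people; past that point, each added person contributes more coordination overhead than work capacity, so the project gets longer.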

Brooks presents a number of management techniques in TMMM to tame the overhead brought on by this complexity.  Many of these are standard industry practice today for large projects:  Brooks recommends having a rigidly hierarchical top-down design and development process, a chief architect at whom the buck stops for design decisions, a separation of implementation from design, teams dedicated to testing and finding bugs, a team dedicated to keeping documentation up-to-date, keeping track of the project through document maintenance, starting the design with a user manual/model, source control, and regression testing.  Brooks also presented some ideas that are not really used today, including his small "surgical team" concept, an insistence on the Waterfall design method (which he renounced in the 1995 retrospective chapter), and fairly hands-off upper management (which might exist somewhere?).

While reading TMMM, it seemed to me that Brooks had an idea of future developments that actually did eventually show up.  This includes documentation contained within the source code itself (Javadoc and the like), Git-style version control, a Wiki-like system for managing project documents, and in some sense Agile development (at least for individual components).  While reading through the original chapters, I noticed a few places where Brooks hinted at object-oriented-like abstraction, but in the 1995 retrospective, he claimed to have not seen that coming.  Brooks consistently reads as being ahead of his own time throughout.

As Brooks points out in TMMM, large software projects are the most complex things that humanity has ever built, and this complexity makes them very difficult to build on time and on budget.  Further, we have collectively already solved most of the easy problems in tackling the software project overhead, which means that most of the time spent on large software projects now is simply inherent to the problem being solved.  But despite this, TMMM ended on an optimistic note:  we may not ever solve the tractability problems in building large software, but we continue to build large software, producing ever more wondrous marvels of human ingenuity.