Feb 252013

Background segmentation is a computer vision technique that is routinely used as part of automated video surveillance systems to try to distinguish targets to track within a scene.  It works only in the case of vision systems that incorporate a stationary camera observing a mostly stationary scene.  It works best when the scene is not regularly crowded with many moving targets.  The central concept is that targets are found by modeling not the targets themselves, but the rest of the scene.

Background segmentation relies on a background model of the scene, i.e., image features that do not change or change very slowly with time.  Then, on each frame of video, the background model is compared to what is actually in the image;  the parts of the image that do not match the background are therefore foreground and are worthy of further processing.  The background model can also be updated with every frame so that moving objects that stop can be incorporated into the background model (e.g. a vehicle stopping in a parking lot) or so that slow changes can be incorporated (e.g. the position of shadows as the location of the sun changes).

Variable Definitions

At every point of time t, we have an image frame I^{t}.  The image is broken up into a number of pixels, and we can identify an individual pixel as I^t_{i, j}.  The pixel is represented by a three-dimensional vector, containing a red, green, and blue value.

The background model that we build up will have several main components:  A three-dimensional mean vector \mu_{i, j} and a 3 \times 3 covariance matrix \Sigma_{i,j} for each pixel in the video.  We also need to keep track of the learning rate \alpha that represents how fast new information is incorporated into the background model (a good starting value is 0.01). Finally, we need a threshold value \tau to determine how many standard deviations away from the mean we have to be to be considered foreground (a good starting value is between 1.0 and 3.0).


The first step in the background segmentation process is initialization--defining an initial set of values for our model before we can start processing video frames as they come in.  For this, we will set the initial mean value for each pixel to that pixel's value in the first frame of the video sequence, e.g.,

\mu_{i, j}= I^0_{i, j}.

Similarly, we will initialize the covariance matrix for each pixel to the identity matrix, e.g.,

\Sigma_{i,j}= \left[ \begin{array}{ccc} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{array} \right].


The model is updated on every frame t using an exponential moving average.  This allows the model to incorporate changes into the background. We update the mean value of each pixel based on that pixel's current value in the current frame, e.g.,

\mu_{i, j}= \alpha I^t_{i, j}+ ( 1- \alpha) \mu_{i, j}.

Similarly, the covariance matrix for each pixel is updated based on the that pixel's current value in the current frame, e.g.,

\sigma= [ \mu_{i, j}- I^t_{ i, j}] [ \mu_{i, j}- I^t_{i, j}]^T

\Sigma_{i,j}= \alpha \sigma+ ( 1- \alpha) \Sigma_{i, j}.

Creating a Foreground Mask

A foreground mask is a black-and-white image where a pixel is white if that pixel in the current frame is foreground, and black otherwise.  This requires a way to determine if a given pixel is foreground.  For this, we use a distance measure known as the Mahalanobis distance, which ultimately just tells us how far a sample point is from the mean in terms of standard deviations as defined by the covariance matrix.  The Mahalanobis distance is really handy, because it gives us a distance measure that scales with the uncertainty we have in the system.  Simply, the distance d_{i, j} is

d_{i, j}= \sqrt{ (I^t_{i, j}- \mu_{i, j})^T \Sigma_{i, j}^{-1} (I^t_{i, j}- \mu_{i, j})}

To create our foreground mask M^t, we set M^t_{i, j}= 1 if d_{i, j}> \tau and set M^t_{i, j}= 0 otherwise.


Here is a demonstration of the method presented above.  It shows me walking around a static background.  The current frame I^t is shown on the top, the current background model is shown in the center, and the foreground mask M^t is shown on the bottom.

Notice there are a couple of jumps where almost the whole image is detected as foreground when my camera decided to "helpfully" automatically adjust it's focus; this is expected behavior as the values detected for most individual pixels changed, and this was detected.  Also notice that I get learned into the background when I stop by the table.

Notes on Implementation

This algorithm is really straightforward to code up, but is not readily vectorizable, which means that it runs very slowly when coded up in Matlab.  It runs much faster than real time if you code it up in C++ and use OpenCV to handle the reading and writing of the video files and VNL to handle the linear algebra.

Also note that the method presented here is very bare-bones.  For professional systems, there are additional pieces that can be added on to improve results.

Feb 112013

There are a number of useful tools out there for writing computer vision software.  The three packages that follow are the ones that I have gotten the most use out of over the years.  These are very good tools, which I highly recommend to anyone that needs to create computer vision or image processing software.

Matlab Image Processing Toolbox

The Matlab Image Processing Toolbox is a set of functions in Matlab that support image processing and manipulation.  The toolbox provides the basic functionality to read in and write out images;  in recent years, this has expanded to video files as well.  Matlab treats images as matrices of pixel values, making great use of the first-class-citizen support Matlab gives to vectors and matrices.  Working with images at a pixel level is just like working with any other matrix in Matlab.

Matlab's great strength is it's ease of use for the programmer.  I sometimes joke that Matlab code is "executable pseudocode," and that is precisely the feel it gives while using it.  I can code up and try out more ideas, faster, in Matlab than I can in any other language.  Creating output plots, figures, and images is very straight forward, and makes checking code behavior a joy.  Unfortunately, there are some drawbacks to using Matlab--it's code can only run within the Matlab interpreter; it's dynamic record datatype is nice to work with, but there is no object-oriented support; and code executes very slowly (especially if you use any loops in your code, which is likely with image processing).

I find that my typical development workflow when developing computer vision algorithms will often start with creating prototype code in Matlab.  Once I get all of the mathematical and structural kinks worked out, I will sometimes then move on to implementing the code in C++ based on my completed Matlab functions.  Reimplementation in C++ is only done if I need real-time performance or faster testing.  If performance is no big deal, I find simply using my Matlab code to be acceptable.


The OpenCV C++ library is probably the most widely used computer vision library.  There is a lot in here from the low level to the high level.  The low level functionality to read in and write out video files is exceptionally useful, and the first step for someone just beginning to write vision software in C++.  OpenCV contains a plethora of higher-level vision code, from edge detectors through face detectors through machine learning algorithms.

OpenCV is very usable, and installation is very easy.  There is not much that has to be linked against, and the most widely used functions in it work really well.  It is an open source project though, and components that do not get much use can exhibit some wonky code.  It is pretty big, needing about 4 GB of space.  Overall, OpenCV is a great library and a good place to start for someone beginning to write vision code in C++.


The VXL C++ library is an amazing library, with vast functionality applicable to computer vision projects.  I am especially fond of the VNL and VNL Algos sub-libraries.  Honestly, they are the closest thing I have found to having Matlab's language-level support for matrices and matrix operations in C++.  This isn't to say the VXL library is as intuitive to use as Matlab, but VXL provides a lot of the functionality without having to write all the low-level matrix manipulation functions yourself.  What I find especially remarkable about VXL is that all of the code that I have had to go through in it has been very well written, which is not the case with all open source projects.

The major downside of VXL, in my opinion, is the building and installation process.  It is a serious rite of passage.  If you intend to use VXL for a project of yours, make sure to set aside a day (maybe two) to get to the point where you can link against it.  VXL is a library where only the source is provided, and it is up to you to get it compiled and working.  There are installation instructions on the VXL site, but they are not very detailed.  Here are some detailed VXL installation instructions that I have found to be useful.  It is important to also note that VXL is quite large; make sure to have about 10 GB free before you start with it.

Feb 042013

Birdseye College Price Comparison is the site that I have spent the past few months creating.  The purpose of the site is to provide students first looking at prospective colleges an estimate of what a four-year education at each college they are interested in will cost them personally.  The site provides an estimate of the full cost for four years of tuition, fees, supplies, room, and board.  After just one visit to this site, with as little typing as possible, a student will have a list of the colleges they are interested in ranked by how much a four-year education will cost at each one. Odds of being accepted to each college are provided as well.

Estimating what a student will ultimately be charged by a specific college is challenging.  In the United States, public and private colleges practice widespread price manipulation.  Colleges have a sticker price, the posted tuition amount, but they offer discounts to most students they accept.  The discount offered is often called a scholarship or a grant in the student's acceptance letter, and the discount amount differs from student to student.  For example, students from poor families often get significantly larger discounts than students from rich families.  The discounts can be huge--some students may pay just 10% of the sticker price.  The discounted price, including room, board, fees, and supplies, is often referred to as the "net price."

Over the years colleges have become very good at offering students a net price in their admissions letter that is just under what the student (or their parents) are willing to pay.  Unfortunately, there is no easy way to figure out what a college will charge a particular student without applying to that college, which involves filling out forms, writing essays, and paying application fees.  Birdseye College Price Comparison helps by creating an estimate for each individual student based on what students like them have been charged in the past.

Since the 1992/93 school year, the net price charged to a new student to attend one year at a public four-year college rose an average of 5.9% per year.  Over the same time period, the net price charged to a new student to attend one year at a private non-profit four-year college rose an average of 3.6% per year.  (Source: College Board's Trends in College Pricing 2012 report).  And these are increases just for the cost of the first year of schooling.

After their first year, students currently attending a college end up paying more because colleges raise tuition and fees from year-to-year.  The discount that the college gives a student in their first year does not increase to cover the extra amount being charged, so the amount that tuition and fees are increased must be paid by the student.  Tuition and fee increases are a way for a college to extract more revenue out of its existing student body.  At private non-profit four-year colleges, the sticker price rose an average of 6.1% per year since 1992/93.   And those increases are passed directly to students who are currently attending college.

To create a complete price estimate for a four-year education, initial net price offers as well as tuition increases must be factored in.  Birdseye College Price Comparison takes care of all of that for students, first by estimating net price based on what similar students have been charged in the past, and then extrapolating out expected tuition increases for the years necessary to get a degree.  It does all the arithmetic for students, giving them a list of each college they are interested in with an expected price for a complete degree.  That way, a student can decide if a degree from one college is really worth an extra $100,000 over a degree from another.

Don't waste your time applying to colleges that cost too much.  Give Birdseye College Price Comparison a try, and start narrowing your college search today!