Mar 04, 2013

Sorting.  That perennial problem that forms the backbone of every introductory Algorithms class.  Now, hardly anyone ever has to code up their own sorting algorithm, as great implementations are included in the standard libraries for every language worth naming.  But, ask anyone who passed an algorithms class in college which particular sorting algorithm to use on a large unsorted array of real numbers, and they would probably answer Quicksort or some other O(n \log n) algorithm.  Give this hypothetical computer scientist a half hour and they would most likely be able to produce a reasonable facsimile of Quicksort on the whiteboard, even if they had not thought about it since their college days.

But O( n \log n) sorting algorithms are not the end of sorting.  There is a certain class of sorting problems that can run faster.  Much faster.  In certain circumstances, you can sort an array in O( n).  That's linear time!  The speedup over naively using Quicksort on these problems is immense.

So how does linear sort work?  The central concept goes back to one of my favorite topics in algorithms--techniques that trade memory space for speed.  The restriction for linear sort is that it can only be done when the numbers being sorted are both bounded and countable.  An example of such a set of numbers is integers within some defined range.  Instead of shuffling items around an array as is done in Quicksort, the linear sorting algorithm creates a new array to keep a count of each of the numbers in the source range encountered, and then uses these counts to create the final sorted array.

Let us now add some specificity to this discussion.  We will start with an unsorted array \mathbf{X}= \langle x_1, x_2, ..., x_n \rangle in which every item x_i is an integer in the range x_{min} \leq x_i \leq x_{max}.  We define our new array of counts \mathbf{C} over the range [ x_{min}, x_{max}]. Finally, we return the sorted array \mathbf{Y}.

Here is the pseudocode for this algorithm:

Linear-Sort ( \mathbf{X})
    x_{min}= \min( \mathbf{X})
    x_{max}= \max( \mathbf{X})
    n= \text{size}( \mathbf{X})

    \mathbf{C}= new array( x_{min}, x_{max}), initialized to zeros

    for each x_i \in \mathbf{X}
        \mathbf{C}_{x_i}= \mathbf{C}_{x_i}+ 1
    end

    \mathbf{Y}= new array of size n
    i= x_{min}
    j= 1
    while j \leq n
        if \mathbf{C}_{i}> 0
            \mathbf{Y}_{j}= i
            \mathbf{C}_{i}= \mathbf{C}_{i}- 1
            j= j+ 1
        else
            i= i+ 1
        end
    end

    return \mathbf{Y}
end

The code above starts by finding some information about the unsorted array \mathbf{X}, and then creates the array of counts \mathbf{C}.  After that, we have a loop which goes through all n elements of \mathbf{X} and increments each count in \mathbf{C} as needed.  Finally, we have the last loop which goes through all r= x_{max}- x_{min}+ 1 elements in \mathbf{C} to create the output array \mathbf{Y}.  Overall, the runtime of the entire algorithm is then O( \max( n, r)), which is the same as O( n+ r).

Of course, we are sacrificing space to get this linear runtime.  We can see that we need an r-length array \mathbf{C} to hold the counts.  Therefore, the memory space required is O( \max( n, r)) as well.
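
As a concrete illustration, here is a minimal Python sketch of the pseudocode above (the function name and the zero-based index mapping are my own choices, not part of the original pseudocode):

def linear_sort(xs):
    # Sort a list of integers in O(n + r) time, where r is the size of the value range
    if not xs:
        return []

    x_min, x_max = min(xs), max(xs)

    # counts[k] holds how many times the value (x_min + k) appears in xs
    counts = [0] * (x_max - x_min + 1)
    for x in xs:
        counts[x - x_min] += 1

    # Walk the counts in order, emitting each value as many times as it was seen
    result = []
    for offset, count in enumerate(counts):
        result.extend([x_min + offset] * count)
    return result

print(linear_sort([5, 3, 9, 3, 7, 5, 5]))  # [3, 3, 5, 5, 5, 7, 9]

The subtraction of x_min when indexing is exactly the index mapping mentioned in the notes below.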

A few final thoughts:

  • I glossed over the array indexing for \mathbf{C} to make the algorithm description readable.  It's unlikely that a programming language would support exactly what I have written above--you would instead have to create a mapping for the index from [ x_{min}, x_{max}] to [ 1, r] and use that for indexing the \mathbf{C} array.
  • If the range r is large compared to the number of items in the array n, it probably makes sense to use a hash table instead of an array to keep track of the counts (\mathbf{C} above).  That would save memory space since \mathbf{C} would be sparse.  It would also save time by shortening the r-length final loop that produces the output array \mathbf{Y} (see the sketch after this list).  Note that the exact space and time needed in this case ultimately rely on how the hash table being used is implemented, which is why I have not provided a final Big-O.
  • For another example of trading memory space for speed, check out my post on dynamic programming.
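
Here is a rough Python sketch of that hash-table variant (producing sorted output from a hash table means sorting the k distinct values that actually occur, so this version runs in O( n+ k \log k), which is a win when k is much smaller than r):

from collections import Counter

def sparse_linear_sort(xs):
    # Counting sort with a hash table of counts; useful when the range r is
    # much larger than the number of items n
    counts = Counter(xs)              # O(n) to count
    result = []
    for value in sorted(counts):      # O(k log k) over the k distinct values
        result.extend([value] * counts[value])
    return result

print(sparse_linear_sort([1000000, 3, 3, -7]))  # [-7, 3, 3, 1000000]
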
Feb 25, 2013

Background segmentation is a computer vision technique that is routinely used as part of automated video surveillance systems to try to distinguish targets to track within a scene.  It works only in the case of vision systems that incorporate a stationary camera observing a mostly stationary scene.  It works best when the scene is not regularly crowded with many moving targets.  The central concept is that targets are found by modeling not the targets themselves, but the rest of the scene.

Background segmentation relies on a background model of the scene, i.e., image features that do not change or change very slowly with time.  Then, on each frame of video, the background model is compared to what is actually in the image;  the parts of the image that do not match the background are therefore foreground and are worthy of further processing.  The background model can also be updated with every frame so that moving objects that stop can be incorporated into the background model (e.g. a vehicle stopping in a parking lot) or so that slow changes can be incorporated (e.g. the position of shadows as the location of the sun changes).

Variable Definitions

At every point in time t, we have an image frame I^{t}.  The image is broken up into a number of pixels, and we can identify an individual pixel as I^t_{i, j}.  Each pixel is represented by a three-dimensional vector containing a red, green, and blue value.

The background model that we build up will have several main components:  a three-dimensional mean vector \mu_{i, j} and a 3 \times 3 covariance matrix \Sigma_{i,j} for each pixel in the video.  We also need to keep track of the learning rate \alpha that represents how fast new information is incorporated into the background model (a good starting value is 0.01). Finally, we need a threshold value \tau that determines how many standard deviations away from the mean a pixel value must be before it is considered foreground (a good starting value is between 1.0 and 3.0).

Initialization

The first step in the background segmentation process is initialization--defining an initial set of values for our model before we can start processing video frames as they come in.  For this, we will set the initial mean value for each pixel to that pixel's value in the first frame of the video sequence, i.e.,

\mu_{i, j}= I^0_{i, j}.

Similarly, we will initialize the covariance matrix for each pixel to the identity matrix, i.e.,

\Sigma_{i,j}= \left[ \begin{array}{ccc} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{array} \right].

Update

The model is updated on every frame t using an exponential moving average.  This allows the model to incorporate changes into the background. We update the mean value of each pixel based on that pixel's value in the current frame, i.e.,

\mu_{i, j}= \alpha I^t_{i, j}+ ( 1- \alpha) \mu_{i, j}.

Similarly, the covariance matrix for each pixel is updated based on that pixel's value in the current frame, i.e.,

\sigma= [ \mu_{i, j}- I^t_{ i, j}] [ \mu_{i, j}- I^t_{i, j}]^T

\Sigma_{i,j}= \alpha \sigma+ ( 1- \alpha) \Sigma_{i, j}.
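
For concreteness, here is a minimal NumPy sketch of the initialization and update steps described above (the function names, array shapes, and the use of NumPy are my own choices, not from any particular library):

import numpy as np

def init_background_model(first_frame):
    # first_frame: array of shape (H, W, 3) holding RGB values
    mu = first_frame.astype(np.float64).copy()            # per-pixel mean, (H, W, 3)
    sigma = np.zeros(first_frame.shape[:2] + (3, 3))
    sigma[...] = np.eye(3)                                 # per-pixel covariance, (H, W, 3, 3)
    return mu, sigma

def update_background_model(frame, mu, sigma, alpha=0.01):
    # Exponential moving average update of the per-pixel mean and covariance
    frame = frame.astype(np.float64)
    mu = alpha * frame + (1.0 - alpha) * mu
    diff = mu - frame                                      # (H, W, 3)
    outer = np.einsum('...i,...j->...ij', diff, diff)      # per-pixel outer product, (H, W, 3, 3)
    sigma = alpha * outer + (1.0 - alpha) * sigma
    return mu, sigma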

Creating a Foreground Mask

A foreground mask is a black-and-white image where a pixel is white if that pixel in the current frame is foreground, and black otherwise.  This requires a way to determine if a given pixel is foreground.  For this, we use a distance measure known as the Mahalanobis distance, which ultimately just tells us how far a sample point is from the mean in terms of standard deviations as defined by the covariance matrix.  The Mahalanobis distance is really handy, because it gives us a distance measure that scales with the uncertainty we have in the system.  Put simply, the distance d_{i, j} is

d_{i, j}= \sqrt{ (I^t_{i, j}- \mu_{i, j})^T \Sigma_{i, j}^{-1} (I^t_{i, j}- \mu_{i, j})}

To create our foreground mask M^t, we set M^t_{i, j}= 1 if d_{i, j}> \tau and set M^t_{i, j}= 0 otherwise.
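
Here is a matching NumPy sketch of the distance computation and mask (again, the names are mine; NumPy's batched routines handle the per-pixel 3 \times 3 inverses):

import numpy as np

def foreground_mask(frame, mu, sigma, tau=2.0):
    # Returns a boolean (H, W) mask that is True where the pixel is foreground
    diff = frame.astype(np.float64) - mu                   # (H, W, 3)
    sigma_inv = np.linalg.inv(sigma)                       # batched 3x3 inverses, (H, W, 3, 3)
    d_sq = np.einsum('...i,...ij,...j->...', diff, sigma_inv, diff)
    d = np.sqrt(np.maximum(d_sq, 0.0))                     # guard against tiny negative round-off
    return d > tau

In practice you may also want to add a small regularization term to \Sigma_{i, j} before inverting, so that long-static pixels do not drive the covariance toward singularity.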

Example

Here is a demonstration of the method presented above.  It shows me walking around a static background.  The current frame I^t is shown on the top, the current background model is shown in the center, and the foreground mask M^t is shown on the bottom.

Notice there are a couple of jumps where almost the whole image is detected as foreground when my camera decided to "helpfully" adjust its focus automatically; this is expected behavior, since the values of most individual pixels change when the focus shifts.  Also notice that I get learned into the background when I stop by the table.

Notes on Implementation

This algorithm is really straightforward to code up, but is not readily vectorizable, which means that it runs very slowly when coded up in Matlab.  It runs much faster than real time if you code it up in C++ and use OpenCV to handle the reading and writing of the video files and VNL to handle the linear algebra.

Also note that the method presented here is very bare-bones.  For professional systems, there are additional pieces that can be added on to improve results.

Feb 11, 2013

There are a number of useful tools out there for writing computer vision software.  The three packages that follow are the ones that I have gotten the most use out of over the years.  These are very good tools, which I highly recommend to anyone that needs to create computer vision or image processing software.

Matlab Image Processing Toolbox

The Matlab Image Processing Toolbox is a set of functions in Matlab that support image processing and manipulation.  The toolbox provides the basic functionality to read in and write out images;  in recent years, this has expanded to video files as well.  Matlab treats images as matrices of pixel values, making great use of the first-class-citizen support Matlab gives to vectors and matrices.  Working with images at a pixel level is just like working with any other matrix in Matlab.

Matlab's great strength is its ease of use for the programmer.  I sometimes joke that Matlab code is "executable pseudocode," and that is precisely the feel it gives while using it.  I can code up and try out more ideas, faster, in Matlab than I can in any other language.  Creating output plots, figures, and images is very straightforward, and makes checking code behavior a joy.  Unfortunately, there are some drawbacks to using Matlab--its code can only run within the Matlab interpreter; its dynamic record datatype is nice to work with, but its object-oriented support is clunky; and code executes very slowly (especially if you use any loops in your code, which is likely with image processing).

My typical development workflow for computer vision algorithms often starts with prototype code in Matlab.  Once I get all of the mathematical and structural kinks worked out, I will sometimes then move on to implementing the code in C++ based on my completed Matlab functions.  Reimplementation in C++ is only done if I need real-time performance or faster testing.  If performance is no big deal, I find simply using my Matlab code to be acceptable.

OpenCV

The OpenCV C++ library is probably the most widely used computer vision library.  There is a lot in here, from the low level to the high level.  The low-level functionality to read in and write out video files is exceptionally useful, and the first step for someone just beginning to write vision software in C++.  OpenCV also contains a plethora of higher-level vision code, from edge detectors to face detectors to machine learning algorithms.

OpenCV is very usable, and installation is very easy.  There is not much that has to be linked against, and the most widely used functions in it work really well.  It is an open source project though, and components that do not get much use can exhibit some wonky code.  It is pretty big, needing about 4 GB of space.  Overall, OpenCV is a great library and a good place to start for someone beginning to write vision code in C++.

VXL

The VXL C++ library is an amazing library, with vast functionality applicable to computer vision projects.  I am especially fond of the VNL and VNL Algos sub-libraries.  Honestly, they are the closest thing I have found to having Matlab's language-level support for matrices and matrix operations in C++.  This isn't to say the VXL library is as intuitive to use as Matlab, but VXL provides a lot of the functionality without having to write all the low-level matrix manipulation functions yourself.  What I find especially remarkable about VXL is that all of the code that I have had to go through in it has been very well written, which is not the case with all open source projects.

The major downside of VXL, in my opinion, is the building and installation process.  It is a serious rite of passage.  If you intend to use VXL for a project of yours, make sure to set aside a day (maybe two) to get to the point where you can link against it.  VXL is a library where only the source is provided, and it is up to you to get it compiled and working.  There are installation instructions on the VXL site, but they are not very detailed.  Here are some detailed VXL installation instructions that I have found to be useful.  It is important to also note that VXL is quite large; make sure to have about 10 GB free before you start with it.

Feb 04, 2013

Birdseye College Price Comparison is the site that I have spent the past few months creating.  The purpose of the site is to provide students first looking at prospective colleges an estimate of what a four-year education at each college they are interested in will cost them personally.  The site provides an estimate of the full cost for four years of tuition, fees, supplies, room, and board.  After just one visit to this site, with as little typing as possible, a student will have a list of the colleges they are interested in ranked by how much a four-year education will cost at each one. Odds of being accepted to each college are provided as well.

Estimating what a student will ultimately be charged by a specific college is challenging.  In the United States, public and private colleges practice widespread price manipulation.  Colleges have a sticker price, the posted tuition amount, but they offer discounts to most students they accept.  The discount offered is often called a scholarship or a grant in the student's acceptance letter, and the discount amount differs from student to student.  For example, students from poor families often get significantly larger discounts than students from rich families.  The discounts can be huge--some students may pay just 10% of the sticker price.  The discounted price, including room, board, fees, and supplies, is often referred to as the "net price."

Over the years colleges have become very good at offering students a net price in their admissions letter that is just under what the student (or their parents) are willing to pay.  Unfortunately, there is no easy way to figure out what a college will charge a particular student without applying to that college, which involves filling out forms, writing essays, and paying application fees.  Birdseye College Price Comparison helps by creating an estimate for each individual student based on what students like them have been charged in the past.

Since the 1992/93 school year, the net price charged to a new student to attend one year at a public four-year college rose an average of 5.9% per year.  Over the same time period, the net price charged to a new student to attend one year at a private non-profit four-year college rose an average of 3.6% per year.  (Source: College Board's Trends in College Pricing 2012 report).  And these are increases just for the cost of the first year of schooling.

After their first year, students currently attending a college end up paying more because colleges raise tuition and fees from year-to-year.  The discount that the college gives a student in their first year does not increase to cover the extra amount being charged, so the amount that tuition and fees are increased must be paid by the student.  Tuition and fee increases are a way for a college to extract more revenue out of its existing student body.  At private non-profit four-year colleges, the sticker price rose an average of 6.1% per year since 1992/93.   And those increases are passed directly to students who are currently attending college.

To create a complete price estimate for a four-year education, initial net price offers as well as tuition increases must be factored in.  Birdseye College Price Comparison takes care of all of that for students, first by estimating net price based on what similar students have been charged in the past, and then extrapolating out expected tuition increases for the years necessary to get a degree.  It does all the arithmetic for students, giving them a list of each college they are interested in with an expected price for a complete degree.  That way, a student can decide if a degree from one college is really worth an extra $100,000 over a degree from another.
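
To make the arithmetic concrete, here is a purely illustrative Python sketch (the $30,000 first-year net price and the 6.1% annual increase are example numbers only, not Birdseye's actual model):

def four_year_cost(first_year_net_price, annual_increase=0.061):
    # Sum the net price over four years, assuming the year-to-year increase
    # is passed straight through to the student
    return sum(first_year_net_price * (1 + annual_increase) ** year for year in range(4))

print(round(four_year_cost(30000)))  # about 131,433 -- noticeably more than 4 x 30,000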

Don't waste your time applying to colleges that cost too much.  Give Birdseye College Price Comparison a try, and start narrowing your college search today!

Jan 28, 2013

When I taught Computer Science at university, dynamic programming was a topic that I thought was particularly important to cover in the Algorithms courses. The idea was to instill in the students the understanding that memory (space) can be exchanged for speed (at very favorable exchange rates, in some computational situations). When you have a problem that it can be applied to, dynamic programming is an amazingly powerful technique, capable of reducing the runtime needed to find a solution from absurd exponential runtime (O( 2^n)) down to polynomial runtime (O( n^2)).

Let that example runtime comparison sink in for a moment. If you need help understanding how huge this is, compare it for the case where n=1000. In this case, our absurd exponential runtime from a non-dynamic programming algorithm would be (on the order of) 2^{1000} \approx 1.0715 \times 10^{301}, while our improved polynomial runtime from a dynamic programming solution would be (on the order of) 1000^2= 10^6. The dynamic programming algorithm would therefore take phenomenally less time to generate a solution. And the improvement only gets better as the problem you are addressing gets bigger. Where it works, dynamic programming pays off in a big way.

So, how does dynamic programming work? At its most fundamental, it focuses on problems which can be defined recursively, but which, at different points in program execution, calculate the same piece of information again and again and again... This happens because, for one reason or another, the algorithm must call a function multiple times with the same parameters. In a dynamic programming solution, however, these repeated calculations are instead performed only once, and their solution is stored within an array or hash table. Then, when the solution to those calculations is required, it is merely looked up. Instead of recalculating a repeated value at potentially great cost, a cheap lookup operation is performed instead.

Saving solutions while the program is executing and keeping them until they are needed later takes memory space. And this is where the tradeoff between space and time comes from. By sacrificing space to store answers you will need in the future, you avoid spending time recalculating them every time they are needed.

The Fibonacci sequence is a classic example that demonstrates the power of dynamic programming. For this discussion, we will ignore the fact that a closed-form solution exists, and instead focus on its recursive definition. The nth Fibonacci number is defined as F(n) = F(n−1)+ F(n−2) for all n > 1, and F(0)=F(1)=1. So now, we write a simple program that encapsulates this formula:

F( n)
    if( n== 0|| n== 1)
        return 1
    endif
    return F( n- 1)+ F( n- 2)
end

We can see that if a simple program was written recursively following that definition, many calculations will be repeated. For instance, if we wanted to calculate F( 1000), then F( 999) is calculated once, F( 998) twice, F( 997) three times, and so forth following the Fibonacci sequence until F( 2) is calculated roughly 2.7 \times 10^{208} times. This is a LOT of repeat calculations! We actually need to go through that little program a grand total of roughly 7 \times 10^{208} times to calculate F( 1000).

However, we can take that little program's repetitive nature into account and create a dynamic programming algorithm to take its place. For this, we create an array and store Fibonacci numbers we have already calculated before moving on. The dynamic programming algorithm is then:

Fdp( n)
    fib= array( 0, n);
    fib( 0)= 1;
    fib( 1)= 1;

    for ii= 2 to n
        fib( ii)= fib( ii- 1)+ fib( ii- 2);
    end

    return fib( n);
end

If we were to walk through the dynamic programming algorithm to calculate Fdp( 1000), we can see that we calculate Fdp( 0) once, Fdp( 1) once, Fdp( 2) once, and so forth until Fdp( 1000) is calculated...just once. The time savings are enormous! (If you don't believe me, code it up and give it a spin.) Doing a little computational complexity analysis, we can see that our naive, non-dynamic programming algorithm has a ludicrous runtime of F( n) = O( 2^n). Meanwhile, our dynamic programming algorithm has a runtime of Fdp( n) = O( n). The time savings are immense!
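
If you want to give it a spin yourself, here is a small runnable Python translation of both functions (my own rendering of the pseudocode above):

def fib_naive(n):
    # Direct translation of the recursive definition: O(2^n) time
    if n == 0 or n == 1:
        return 1
    return fib_naive(n - 1) + fib_naive(n - 2)

def fib_dp(n):
    # Dynamic programming version: O(n) time, O(n) space
    if n < 2:
        return 1
    fib = [0] * (n + 1)
    fib[0] = fib[1] = 1
    for i in range(2, n + 1):
        fib[i] = fib[i - 1] + fib[i - 2]
    return fib[n]

assert fib_naive(25) == fib_dp(25)  # identical answers, vastly different runtimes
print(fib_dp(1000))                 # finishes instantly; fib_naive(1000) never would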

But, it is not all shiny. To get these time savings, we are sacrificing memory space. For the example of calculating Fdp( 1000), an array must be created containing 1000 elements. Thus, it has a space complexity that must be accounted for. The space complexity of Fdp( n) = O( n), which really is not that great of a price to pay for the speedup it makes possible.

And that is the power of dynamic programming--the ability to sacrifice memory space to significantly improve runtime. It is my opinion that recognizing when this is the case, and being able to act on it, is one mark of an excellent programmer. So start looking around; problems where dynamic programming can be applied are fairly common (especially when comparing sets and sequences).