Tuesday 4 November 2014

Intuition Behind the Derivative Filters for Edges, Corners and Blobs

This semester I'm TAing the graduate computer vision course CSC2503 at UofT. Last week, while I was giving a tutorial on SIFT, a student asked an interesting question that I could not answer on the spot, and it took some thought afterwards to find a satisfactory answer.

The context of the question was the following: to introduce the idea of an "interest point" that is locally distinctive, I asked the students what criteria one should look at. One student suggested looking for strong edges, so I told them that an edge response does not localize well along the direction of the edge, and that one should look for corners or blobs instead. Later on, when I talked about the scale-normalized Laplacian of Gaussian and the difference of Gaussians, a student asked: given how we motivated interest points, we should be looking for points with large first-order derivative magnitude in both directions, but the Laplacian is a second derivative, so how do the two relate?

One way to answer would be to describe how the Laplacian of Gaussian finds blobs, and how blobs are well localized in both location and scale. However, the student clearly wanted an intuitive answer about how this relates to, and differs from, edge detection.

I think a better way would be the following answer: 

First-order derivatives can reveal edges, but to detect corners one needs to consider how the first derivatives change. (Why not simply require a large first-derivative magnitude? Because a threshold on the magnitude is not robust to contrast variation.) This can be accomplished by looking at the second-moment matrix of the first derivatives in the vicinity of the point (the Harris corner detector), or by looking at the second derivatives (the Laplacian of Gaussian).
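
To make the second-moment idea concrete, here is a minimal sketch in plain Python. The three 9x9 toy patches, the uniform (unweighted) window, and the constant k = 0.04 are my own illustrative assumptions, not something from the tutorial. It sums the outer products of the gradients over each patch and looks at the eigenvalues of the resulting 2x2 matrix.

```python
# Sketch: second-moment matrix of first derivatives on three toy patches.
# The patches, the uniform window, and k = 0.04 are illustrative assumptions.
import math

N = 9  # patch side length

def make_patch(kind):
    """Return an N x N patch: 'flat', a vertical step 'edge', or a 'corner'."""
    def value(x, y):
        if kind == "edge":
            return 1.0 if x >= 4 else 0.0
        if kind == "corner":
            return 1.0 if (x >= 4 and y >= 4) else 0.0
        return 0.5  # flat patch
    return [[value(x, y) for x in range(N)] for y in range(N)]

def second_moment(img):
    """Sum outer products of central-difference gradients over the interior."""
    a = b = c = 0.0
    for y in range(1, N - 1):
        for x in range(1, N - 1):
            ix = (img[y][x + 1] - img[y][x - 1]) / 2.0
            iy = (img[y + 1][x] - img[y - 1][x]) / 2.0
            a += ix * ix
            b += ix * iy
            c += iy * iy
    return a, b, c  # entries of M = [[a, b], [b, c]]

def eigenvalues(a, b, c):
    """Closed-form eigenvalues (largest first) of a symmetric 2x2 matrix."""
    mean = (a + c) / 2.0
    radius = math.sqrt(((a - c) / 2.0) ** 2 + b * b)
    return mean + radius, mean - radius

def harris_response(a, b, c, k=0.04):
    """Harris-style score det(M) - k * trace(M)^2."""
    return (a * c - b * b) - k * (a + c) ** 2

results = {}
for kind in ("flat", "edge", "corner"):
    m = second_moment(make_patch(kind))
    results[kind] = (eigenvalues(*m), harris_response(*m))
    print(kind, results[kind])
```

On these patches the edge gives one large eigenvalue and one that is zero (the gradient only ever points in one direction), while the corner gives two large eigenvalues. That is the sense in which a corner is a place where the first derivatives change direction within the neighbourhood.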

That being said, while corners localize well in location, they do not localize well in scale, unlike blobs. It turns out that the scale-normalized Laplacian of Gaussian responds most strongly to blobs whose size matches the filter scale, and blobs are regions where gradients change a lot. So LoG (and its approximation by DoG) is really more of a blob detector than a corner detector, although in practice it will also find some corners.
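
The scale-selection claim can be checked numerically. The sketch below assumes a synthetic Gaussian-shaped blob of standard deviation 4 pixels and a handful of candidate filter scales (both are my choices for illustration), and evaluates the scale-normalized LoG response at the blob centre only.

```python
# Sketch: scale-normalized LoG response at the centre of a synthetic blob.
# The blob shape, its size, and the list of scales are illustrative assumptions.
import math

def normalized_log_kernel(x, y, sigma):
    """sigma^2 times the Laplacian of a 2D Gaussian with std sigma."""
    r2 = x * x + y * y
    gauss = math.exp(-r2 / (2.0 * sigma**2)) / (2.0 * math.pi * sigma**2)
    laplacian = (r2 - 2.0 * sigma**2) / sigma**4 * gauss
    return sigma**2 * laplacian

def center_response(blob_sigma, sigma, radius=40):
    """Filter response at the blob centre, summed over a (2*radius+1)^2 grid."""
    total = 0.0
    for y in range(-radius, radius + 1):
        for x in range(-radius, radius + 1):
            blob = math.exp(-(x * x + y * y) / (2.0 * blob_sigma**2))
            total += normalized_log_kernel(x, y, sigma) * blob
    return total

blob_sigma = 4.0
sigmas = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 8.0]
responses = {s: center_response(blob_sigma, s) for s in sigmas}
best_sigma = max(sigmas, key=lambda s: abs(responses[s]))

for s in sigmas:
    print(f"sigma={s}: response={responses[s]:.4f}")
print("strongest response at sigma =", best_sigma)
```

For this particular blob the continuous response works out to -2 s0^2 sigma^2 / (s0^2 + sigma^2)^2, where s0 is the blob's standard deviation, so its magnitude peaks at sigma = s0. The response is negative because the blob is bright on a dark background. Without the sigma^2 normalization the response would simply keep growing as sigma shrinks, and there would be no peak to select a scale with.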

A slightly tangential but easily confused issue is the use of second derivatives in edge detection. Because thresholding the norm of the first derivative is not robust to contrast variation, and also does not localize well, a better way is to look at the zero-crossings of the second derivative, which are extrema of the first derivative. This is different (almost opposite in a sense) from the extrema of the Laplacian that we seek with SIFT. From the perspective of image processing, DoG zero-crossings typically form curves in the image plane, and thus individual zero-crossing points are not localizable.
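
A small 1D sketch may help separate the two uses of second derivatives. It assumes a step edge located between samples 10 and 11, already smoothed by a Gaussian of standard deviation 2, and a fixed gradient threshold of 0.1. All of these numbers are illustrative choices of mine.

```python
# Sketch: gradient thresholding vs. second-derivative zero-crossings in 1D.
# Edge position, smoothing scale, and threshold are illustrative assumptions.
import math

SIGMA = 2.0   # std of the Gaussian smoothing
EDGE = 10.5   # true edge position, halfway between samples 10 and 11
N = 21        # number of samples

def smoothed_step(contrast):
    """Step edge of the given contrast after Gaussian smoothing (closed form)."""
    scale = SIGMA * math.sqrt(2.0)
    return [0.5 * contrast * (1.0 + math.erf((i - EDGE) / scale)) for i in range(N)]

def first_diff(s):
    """Forward difference; entry i sits between samples i and i + 1."""
    return [s[i + 1] - s[i] for i in range(len(s) - 1)]

def thresholded_edges(contrast, threshold=0.1):
    """Positions where the first-derivative magnitude exceeds the threshold."""
    d1 = first_diff(smoothed_step(contrast))
    return [i for i, d in enumerate(d1) if abs(d) > threshold]

def zero_crossing_edges(contrast):
    """Indices i such that the second derivative changes sign between i and i + 1."""
    d2 = first_diff(first_diff(smoothed_step(contrast)))  # d2[j] is centred on sample j + 1
    return [j + 1 for j in range(len(d2) - 1) if d2[j] > 0.0 > d2[j + 1]]

for contrast in (1.0, 0.1):
    print("contrast", contrast,
          "| thresholded:", thresholded_edges(contrast),
          "| zero-crossing between samples:", zero_crossing_edges(contrast))
```

With the high-contrast edge the threshold fires on a band of several neighbouring positions, and with the low-contrast edge it fires nowhere, whereas the zero-crossing lands between samples 10 and 11 in both cases. In 2D, though, those zero-crossings trace out a curve along the edge, which is why they make good edge maps but poor keypoints.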


Another intuitive explanation from first principles (due to the course instructor):

What kinds of definitions yield isolated keypoints? You could look at extrema of intensity, but they are clearly not invariant to illumination, etc. You could look at extrema of 1st derivatives, but they correspond to zero-crossings of 2nd derivatives, and these are localizable only when zero-crossing curves intersect, which they almost never do.

So DoG extrema have the advantage that they (a) correspond to neighbourhoods where image intensities vary, (b) are invariant to absolute intensity, (c) can be localized within their neighbourhoods, and (d) form a reasonably dense set in typical images.
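
Property (b) is easy to check in 1D if one reads "absolute intensity" as an additive brightness offset. The sketch below assumes a small Gaussian bump as the signal, and a DoG built from two sampled Gaussians that are each normalized to sum to one. The scale ratio k = 1.6 and all sizes are arbitrary illustrative values.

```python
# Sketch: a DoG response ignores a constant brightness offset.
# Signal, scales, and the ratio k are illustrative assumptions.
import math

def gaussian_kernel(sigma, radius):
    """Sampled Gaussian, normalized so that its weights sum to one."""
    w = [math.exp(-(i * i) / (2.0 * sigma**2)) for i in range(-radius, radius + 1)]
    total = sum(w)
    return [v / total for v in w]

def dog_kernel(sigma, k=1.6, radius=20):
    """Difference of Gaussians: the wider kernel minus the narrower one."""
    wide = gaussian_kernel(k * sigma, radius)
    narrow = gaussian_kernel(sigma, radius)
    return [a - b for a, b in zip(wide, narrow)]

def response_at(signal, kernel, center):
    """Filter response at a single position (the kernel is symmetric)."""
    radius = len(kernel) // 2
    return sum(kernel[j] * signal[center - radius + j] for j in range(len(kernel)))

n, center = 61, 30
bump = [math.exp(-((i - center) ** 2) / (2.0 * 3.0**2)) for i in range(n)]
brighter = [v + 10.0 for v in bump]   # same bump on a brighter background
stronger = [3.0 * v for v in bump]    # same bump with three times the contrast

kernel = dog_kernel(sigma=2.0)
r_bump = response_at(bump, kernel, center)
r_brighter = response_at(brighter, kernel, center)
r_stronger = response_at(stronger, kernel, center)
print(r_bump, r_brighter, r_stronger)
```

Because the two Gaussians carry the same total weight, the DoG kernel sums to (numerically) zero, so the offset drops out and the first two responses agree. Scaling the contrast scales the response by the same factor, which changes its magnitude but not where its extrema are.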