Semantic Robot Vision Challenge 2008

Objective

Detect, locate, and recognize any object in a real environment.

What's given

How do you learn about objects?

Step 1: Download images.
Search the internet for images of the objects, using their names as keywords. You can download the source code here. Please note that the set of images returned by Google or any other search engine contains many outliers, so you may have to come up with ways to weed them out.
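One simple way to weed out outliers, sketched below, is to keep only the images that resemble several other images returned for the same keyword. This is an illustrative assumption, not the method used in the challenge entry: the function name filter_outliers, the parameters radius and min_neighbors, and the toy 2-D descriptors are all hypothetical, and a real system would compute one descriptor per downloaded image with a feature extractor of its choice.

```python
import math


def filter_outliers(descriptors, radius, min_neighbors):
    """Return indices of images with at least `min_neighbors` other
    images whose descriptor lies within `radius` (Euclidean distance).

    `descriptors` holds one fixed-length feature vector per image
    (hypothetical; how it is computed is outside this sketch).
    """
    kept = []
    for i, d_i in enumerate(descriptors):
        neighbors = sum(
            1
            for j, d_j in enumerate(descriptors)
            if j != i and math.dist(d_i, d_j) <= radius
        )
        if neighbors >= min_neighbors:
            kept.append(i)
    return kept


# Toy example: three mutually similar images and one unrelated image.
images = [(0.10, 0.20), (0.12, 0.19), (0.11, 0.22), (0.90, 0.85)]
kept = filter_outliers(images, radius=0.1, min_neighbors=2)
print(kept)  # [0, 1, 2]
```

The idea is that correct images of an object tend to cluster together, while unrelated search results are isolated; the two thresholds would need tuning per descriptor.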

Step 2: Learn the object representation, object = boundary (shape) + interior (keypoints)
An object can be represented by its shape and by what is inside it. Shape is hard to describe but very informative. In fact, the only way to learn category objects such as "frying pan" or "apple" is to encode their shape information, whereas specific objects such as "Pepsi bottle" can be identified by the invariant logo or a similar unique pattern on their interior.

To encode the information on the interior of the object, you can use SIFT, MSER, or any other keypoint detector. Encoding shape, however, is a bit tricky, as you first have to extract the object from the images and then use its boundary as its shape. The fixation-based segmentation algorithm can be employed to obtain regions for different interest points in the image. Each object is then represented by a set of keypoints or a set of shape descriptors extracted from all of its training images.
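The text does not say which shape descriptor is used, so the sketch below assumes one common choice purely for illustration: a centroid-distance signature, which describes a boundary by the distance of each boundary point from the centroid, divided by the mean distance so that the result does not depend on the object's position or scale. The function name and the toy boundary are hypothetical, and the boundary points are assumed to come from a prior segmentation step.

```python
import math


def centroid_distance_signature(boundary):
    """Describe a closed boundary by normalized distances to its centroid.

    `boundary` is a list of (x, y) points sampled along the contour
    (assumed to come from a segmentation step not shown here).
    Dividing by the mean distance makes the signature independent of
    translation and uniform scaling.
    """
    n = len(boundary)
    cx = sum(x for x, _ in boundary) / n
    cy = sum(y for _, y in boundary) / n
    dists = [math.hypot(x - cx, y - cy) for x, y in boundary]
    mean = sum(dists) / n
    if mean == 0:
        raise ValueError("boundary must contain at least two distinct points")
    return [d / mean for d in dists]


# A small rectangle and the same rectangle scaled by 3 and shifted.
rect = [(0, 0), (2, 0), (4, 0), (4, 2), (2, 2), (0, 2)]
big_rect = [(3 * x + 10, 3 * y - 5) for x, y in rect]

sig_small = centroid_distance_signature(rect)
sig_big = centroid_distance_signature(big_rect)
```

Both signatures agree up to floating-point error, which is the property that lets boundaries from training images of different sizes be compared directly. The signature is not rotation invariant as written; that would need an extra alignment step.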

Step 3: How do you detect and locate the objects?
Objects can appear at any scale and location in an image. We use the saliency operator proposed by Kadir et al. to find salient locations together with their associated scales. We then extract regions centered at these locations and compare the features and contour shape of each region with those of the objects learned during the training phase.
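The comparison step can be sketched as nearest-neighbor descriptor matching. The example below assumes a distance-ratio test to reject ambiguous matches and a minimum match count to accept a detection; neither is stated in the text, and the names count_matches, best_object, ratio, and min_matches, as well as the toy 2-D descriptors, are hypothetical. Saliency detection and region extraction are assumed to have already produced the region's descriptors.

```python
import math


def count_matches(region_desc, object_desc, ratio=0.8):
    """Count region descriptors that match an object model unambiguously.

    A descriptor counts as a match when its nearest model descriptor is
    clearly closer than the second nearest (distance-ratio test).
    """
    matches = 0
    for d in region_desc:
        dists = sorted(math.dist(d, m) for m in object_desc)
        if len(dists) >= 2 and dists[0] < ratio * dists[1]:
            matches += 1
    return matches


def best_object(region_desc, models, min_matches=2):
    """Return the best-matching object name, or None if no model
    reaches `min_matches` matches."""
    scores = {name: count_matches(region_desc, desc)
              for name, desc in models.items()}
    name = max(scores, key=scores.get)
    return name if scores[name] >= min_matches else None


# Toy models learned in training (hypothetical 2-D descriptors).
models = {
    "ritz": [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)],
    "doritos": [(5.0, 5.0), (6.0, 5.0), (5.0, 6.0)],
}
# Descriptors extracted from one salient region of a test image.
region = [(0.05, 0.0), (0.95, 0.05), (5.0, 5.1)]

label = best_object(region, models)
print(label)  # ritz
```

Running this over every salient region yields a label and a location (the region center and scale) for each detected object; regions that match no model well enough return None and are ignored.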

Results:


Result images (not reproduced here): Ritz, pan, iRobot, Doritos