Semantic Robot Vision Challenge 2008
Objective
Detect, locate, and recognize any object in a real environment.
What's given
- A list of object names in a text file.
- A set of test images, each containing several objects, a single object, or no object at all.
How do you learn about objects?
Step 1: Download images.
Search the internet for images of each object, using its name as the keyword. You can download the source code here. Note that the image sets returned by Google or any other search engine contain many outliers, so you may have to come up with ways to weed them out.
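One simple way to weed out outliers is to assume that the relevant images form the majority and resemble one another, while irrelevant images lie far from the rest. The sketch below is an illustrative heuristic, not the method used in this project. It assumes each downloaded image has already been turned into a fixed-length feature vector (for example a colour or bag-of-visual-words histogram; that step is not shown). It scores each image by the mean distance to its k nearest neighbours and drops images whose score is far above the median. The names `knn_scores` and `filter_outliers` and the defaults `k=3` and `ratio=2.0` are hypothetical.

```python
import math
import statistics
from typing import List, Sequence


def knn_scores(features: Sequence[Sequence[float]], k: int = 3) -> List[float]:
    """Mean Euclidean distance from each feature vector to its k nearest neighbours.

    Requires at least two feature vectors.
    """
    scores = []
    for i, f in enumerate(features):
        dists = sorted(math.dist(f, g) for j, g in enumerate(features) if j != i)
        nearest = dists[:k]
        scores.append(sum(nearest) / len(nearest))
    return scores


def filter_outliers(features: Sequence[Sequence[float]], k: int = 3,
                    ratio: float = 2.0) -> List[int]:
    """Indices of images whose k-NN score is at most `ratio` times the median score.

    Assumes the inliers are the majority and lie close together in feature space.
    """
    scores = knn_scores(features, k)
    cutoff = ratio * statistics.median(scores)
    return [i for i, s in enumerate(scores) if s <= cutoff]
```

For example, `filter_outliers([[0, 0], [0.1, 0], [0, 0.1], [0.1, 0.1], [5, 5]], k=2)` keeps indices `[0, 1, 2, 3]` and drops the isolated fifth vector. Using the median makes the cutoff robust to a few outliers, but the heuristic breaks down if most of the search results are irrelevant.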
Step 2: Learn the object representation: object = boundary (shape) + interior (keypoints)
An object can be represented by its shape and by what lies inside it. Shape is hard to describe but very
informative. In fact, the only way to learn about object categories such as "frying pan" or "apple" is to encode
their shape, whereas specific objects such as a "pepsi bottle" can be identified by an invariant logo or a similar
unique pattern on their interior.
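For a specific object, recognition can be reduced to matching keypoint descriptors from a test region against those stored for each object. The sketch below assumes descriptors (for example 128-dimensional SIFT vectors) have already been extracted and are plain lists of floats. It shows only the matching step, using Lowe's nearest/second-nearest ratio test, and picks the object with the most surviving matches. The helper names `ratio_test_matches` and `best_object` and the 0.8 ratio are illustrative choices, not taken from this project.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Descriptor = Sequence[float]


def ratio_test_matches(query: Sequence[Descriptor], train: Sequence[Descriptor],
                       ratio: float = 0.8) -> List[Tuple[int, int]]:
    """Return (query index, train index) pairs whose nearest training descriptor
    is clearly closer than the second nearest one (Lowe's ratio test)."""
    matches = []
    for qi, q in enumerate(query):
        ranked = sorted((math.dist(q, t), ti) for ti, t in enumerate(train))
        if len(ranked) >= 2 and ranked[0][0] < ratio * ranked[1][0]:
            matches.append((qi, ranked[0][1]))
    return matches


def best_object(query: Sequence[Descriptor], models: Dict[str, Sequence[Descriptor]],
                ratio: float = 0.8) -> Optional[str]:
    """Name of the model with the most ratio-test matches, or None if nothing matches."""
    scores = {name: len(ratio_test_matches(query, descs, ratio))
              for name, descs in models.items()}
    name = max(scores, key=scores.get, default=None)
    return name if name is not None and scores[name] > 0 else None
```

The ratio test rejects ambiguous matches: a query descriptor that is equally close to two training descriptors is discarded rather than assigned arbitrarily. In practice the descriptors and a fast nearest-neighbour search would come from a library such as OpenCV; the brute-force loop here is only for clarity.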
To encode the information on the interior of the object,
you can use SIFT, MSER, or any other keypoint detector. Encoding shape,
however, is a bit trickier, because you first have to extract the object from
the image and use its boundary as its shape. The fixation-based segmentation
algorithm can be used to obtain regions around different interest points in the image. Each object is then
represented by a set of keypoints or a set of shape descriptors extracted from all of its training images.
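The text does not say which shape descriptor is computed from the extracted boundary. As an illustrative stand-in, the sketch below uses a simple centroid-distance signature: the distance from the mean boundary point to evenly indexed points along the contour, divided by the largest distance so it does not depend on object size. Two signatures are compared by their mean absolute difference, minimised over circular shifts so the starting point of the contour does not matter. It assumes `contour` is an ordered list of (x, y) boundary points produced by a segmentation step (not shown); the function names and `n_samples=32` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def centroid_distance_signature(contour: Sequence[Point], n_samples: int = 32) -> List[float]:
    """Distances from the mean boundary point to n_samples evenly indexed boundary
    points, divided by the largest distance (scale normalisation)."""
    cx = sum(x for x, _ in contour) / len(contour)
    cy = sum(y for _, y in contour) / len(contour)
    step = len(contour) / n_samples
    sig = []
    for i in range(n_samples):
        x, y = contour[int(i * step)]
        sig.append(math.hypot(x - cx, y - cy))
    peak = max(sig)
    return [d / peak for d in sig] if peak > 0 else sig


def signature_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute difference between two equal-length signatures, minimised
    over circular shifts so the contour's starting point does not matter."""
    n = len(a)
    return min(
        sum(abs(a[i] - b[(i + shift) % n]) for i in range(n)) / n
        for shift in range(n)
    )
```

With this signature, a square and the same square scaled by 10 give a distance of essentially zero, while a square and a circle do not. It is a deliberately crude descriptor; it assumes roughly uniform sampling along the boundary and cannot describe shapes whose boundary folds back on itself relative to the centroid.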
Step 3: How do you detect and locate the objects?
Objects can appear at any scale and location in an image. We use the saliency operator proposed by Kadir et al. to find salient locations together with their associated
scales. We then extract regions centered at these locations and compare their features, and the shape of
their contours, with those of the objects learned during the training phase.
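Kadir and Brady's scale saliency measures, at each pixel, the entropy H of the local intensity histogram as the window radius s grows, keeps the scales where H peaks, and weights each peak by how quickly the histogram changes with scale. The sketch below is a simplified, brute-force version under our own assumptions: a grayscale image stored as nested lists (`img[y][x]`, values 0 to 255), circular windows, a 16-bin histogram, and no clustering of neighbouring salient points. The weight s^2/(2s - 1) * sum|p(s) - p(s - 1)| follows our reading of their formulation. The names `scale_saliency`, `s_min`, `s_max`, and `top_n` are hypothetical, and this is not the implementation used in this project.

```python
import math
from typing import List, Tuple

Image = List[List[int]]  # grayscale, img[y][x] in 0..255


def window_histogram(img: Image, x: int, y: int, s: int, bins: int = 16) -> List[float]:
    """Normalised intensity histogram of pixels within radius s of (x, y), clipped at borders."""
    h, w = len(img), len(img[0])
    counts = [0] * bins
    total = 0
    for dy in range(-s, s + 1):
        for dx in range(-s, s + 1):
            yy, xx = y + dy, x + dx
            if dx * dx + dy * dy <= s * s and 0 <= yy < h and 0 <= xx < w:
                counts[img[yy][xx] * bins // 256] += 1
                total += 1
    return [c / total for c in counts]


def entropy(p: List[float]) -> float:
    """Shannon entropy in bits."""
    return -sum(q * math.log2(q) for q in p if q > 0)


def scale_saliency(img: Image, s_min: int = 1, s_max: int = 6, bins: int = 16,
                   top_n: int = 10) -> List[Tuple[float, int, int, int]]:
    """Return up to top_n (saliency, x, y, scale) tuples, most salient first."""
    results = []
    for y in range(len(img)):
        for x in range(len(img[0])):
            hists = {s: window_histogram(img, x, y, s, bins) for s in range(s_min, s_max + 1)}
            ent = {s: entropy(p) for s, p in hists.items()}
            # Only interior scales can be entropy peaks (need both neighbours).
            for s in range(s_min + 1, s_max):
                if ent[s] > ent[s - 1] and ent[s] > ent[s + 1]:
                    change = sum(abs(a - b) for a, b in zip(hists[s], hists[s - 1]))
                    weight = s * s / (2 * s - 1) * change
                    results.append((ent[s] * weight, x, y, s))
    results.sort(reverse=True)
    return results[:top_n]
```

On a uniform image the entropy never peaks, so no locations are returned; on an image containing a small bright disc, the detections cluster around the disc, and each returned scale suggests the size of the region to extract for comparison with the learned objects.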
Results:



