The fixation-based segmentation framework segments objects in a real scene


Using only low-level cues (color, texture and motion) and surface cues, the fixation-based segmentation framework segments objects resting on a surface.

What's the definition of an object?

Since no high-level knowledge is currently used in our framework, an object is defined as a region enclosed by depth boundaries in the image.

A brief overview

The objects are kept on a table. A moving camera looks at the table and captures three consecutive frames. The framework then uses color, texture, motion and surface cues to locate the possible boundaries in the image. It then chooses fixation points close to the detected boundaries and segments a region for each of these fixations. Once all fixations have their corresponding regions, a subset of salient regions is extracted. Finally, to label a region in this set as a possible object, we check whether the region is temporally persistent, that is, whether it appears again when the process is carried out for the next pair of frames.
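A minimal sketch of the fixation-and-persistence part of this loop. The flood-fill segmentation and the IoU threshold here are simplifying stand-ins (not the framework's actual boundary-based segmentation); the frames and fixation points are synthetic:

```python
import numpy as np
from collections import deque

def segment_from_fixation(image, fixation, tol=10):
    """Flood-fill the near-uniform region around a fixation point.
    (A stand-in for the framework's boundary-constrained segmentation.)"""
    h, w = image.shape
    seed = int(image[fixation])
    mask = np.zeros((h, w), dtype=bool)
    mask[fixation] = True
    queue = deque([fixation])
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and not mask[ny, nx] \
               and abs(int(image[ny, nx]) - seed) <= tol:
                mask[ny, nx] = True
                queue.append((ny, nx))
    return mask

def iou(a, b):
    """Overlap between two binary region masks."""
    return (a & b).sum() / max((a | b).sum(), 1)

# Two synthetic frames: a bright square "object" on a dark table,
# shifted by one pixel between frames (camera motion).
f0 = np.zeros((40, 40), dtype=np.uint8); f0[10:25, 10:25] = 200
f1 = np.zeros((40, 40), dtype=np.uint8); f1[11:26, 11:26] = 200

# Fixate inside the object in each frame and segment.
r0 = segment_from_fixation(f0, (15, 15))
r1 = segment_from_fixation(f1, (16, 16))

# Temporal persistence: the region reappears with high overlap,
# so it is kept as a possible object.
persistent = iou(r0, r1) > 0.5
print(persistent)  # True
```

The persistence test filters out spurious regions that arise from noisy boundary evidence in a single frame pair.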

How do we find depth boundaries?

We move the camera while it looks at the table (see the examples below) and extract three consecutive frames. We then calculate the optic flow for the consecutive pairs of frames. The depth boundaries are the edges with a significant change in the flow values across them.
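The idea can be sketched as follows: given a dense optic flow field, pixels where the flow changes abruptly are marked as candidate depth boundaries. The threshold value and the synthetic flow field are illustrative assumptions, not parameters from the framework:

```python
import numpy as np

def flow_boundaries(flow, thresh=0.5):
    """Mark pixels where the optic flow changes abruptly.
    flow: (H, W, 2) array of per-pixel (u, v) displacements."""
    du = np.gradient(flow[..., 0])  # [d/drow, d/dcol] of horizontal flow
    dv = np.gradient(flow[..., 1])  # [d/drow, d/dcol] of vertical flow
    # Magnitude of the spatial change in flow at each pixel.
    mag = np.sqrt(du[0]**2 + du[1]**2 + dv[0]**2 + dv[1]**2)
    return mag > thresh

# Synthetic flow: the background shifts 1 px right, while a nearer
# object (larger parallax under camera motion) shifts 3 px right.
flow = np.zeros((30, 30, 2), dtype=float)
flow[..., 0] = 1.0
flow[8:20, 8:20, 0] = 3.0

edges = flow_boundaries(flow)
# Boundaries appear around the object, not in its interior.
print(edges[8, 14], edges[14, 14])  # True False
```

Because a nearer surface produces larger image motion under the same camera displacement, a sharp jump in flow is evidence of a depth discontinuity rather than a mere texture edge.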

A few examples:

The leftmost column shows a three-frame motion sequence captured using a handheld camcorder. The motion is completely unconstrained. To the right of each motion sequence, the objects found in the scene are shown. The average number of fixations made for each sequence is about 200.

Input: 3-frame image sequence
Output: salient fixation points (Left), Corresponding segmented objects (Right)

Input: 3-frame image sequence
Output: salient fixation points (Left), Corresponding segmented objects (Right)