Segmenting "Simple" Objects
(using fixation-based segmentation framework)

Code to segment simple objects is available now.


An active agent such as a human or a robot continually performs actions that involve interacting either with its own body (e.g. dancing, running, jumping) or with its surroundings (e.g. picking up a glass from a table, riding a bike). For a successful interaction between an agent and its surroundings, the agent relies on two important processes: navigation and object perception. The agent has to identify an object of interest (object perception) and then approach that object (navigation). Without object perception, navigation reduces to a mere obstacle-avoidance task. In this sense, navigation and object perception are complementary capabilities.

The short comic below illustrates the complementary nature of object perception and navigation.


Despite being equally important, object perception, unlike navigation, lacks a set of robust algorithms. While a large number of algorithms have been designed to perceive objects in a scene, the absence of a clear understanding of what object perception is has made it difficult to come up with a robust solution to the problem. In fact, the exact definition of an object and what constitutes perception are perhaps two of the biggest research problems today.

So, what is perception? According to the dictionary, perception is "the ability to see, hear, or become aware of something through the senses." Even if we restrict ourselves to visual perception, it is still an intricate phenomenon. To simplify matters, we start by defining a minimal form of visual perception of an object: determining the boundary of that object. In fact, for many primitive interactions such as picking up, pushing, or reaching for an object, even this minimal perception might suffice. More importantly, this minimal form of perception can be generated in a purely bottom-up fashion, without using any high-level semantics.

To complete our definition of object perception, we still have to answer: what is an object? Once again, the concept of a general object is too broad and complicated, so we define the concept of a "simple" object instead: a compact region enclosed by pixels at depth and/or contact boundaries in the scene, where the border ownership of these boundary pixels points to the interior of the object.
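To make the border-ownership condition concrete, here is a toy sketch (not part of our released code; all names are illustrative). It approximates the object's interior by its centroid and checks that the ownership direction at every boundary pixel points toward it:

```python
def ownership_points_inward(boundary_pts, ownership, interior_pts):
    """Toy check of the 'simple' object condition: at every boundary
    pixel, the border-ownership direction should point toward the
    object's interior (approximated here by the interior centroid)."""
    cy = sum(p[0] for p in interior_pts) / len(interior_pts)
    cx = sum(p[1] for p in interior_pts) / len(interior_pts)
    for (y, x), (oy, ox) in zip(boundary_pts, ownership):
        # Positive dot product => ownership points toward the centroid.
        if (cy - y) * oy + (cx - x) * ox <= 0:
            return False
    return True

# Four boundary pixels of a square region centered at (2, 2).
boundary = [(0, 2), (4, 2), (2, 0), (2, 4)]
inward   = [(1, 0), (-1, 0), (0, 1), (0, -1)]
print(ownership_points_inward(boundary, inward, [(2, 2)]))   # True

# Flipping every ownership vector violates the condition.
outward = [(-oy, -ox) for oy, ox in inward]
print(ownership_points_inward(boundary, outward, [(2, 2)]))  # False
```

In the actual system, border ownership is of course computed from low-level cues rather than given by hand; this snippet only illustrates what the definition demands of it.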

Examples of "simple" objects are given below:
A complex object such as the chair in the image below consists of multiple "simple" objects, marked with numerals. It also has a hole in it.

Our approach to segmenting the simple objects in a scene is based on our fixation-based segmentation strategy, wherein we segment an object given a point anywhere inside it. This step is carried out using only color, texture, and depth information. To select points inside different objects and pick the segments corresponding to those objects, we use border-ownership information at the boundary pixels, which is also computed from the low-level cues and prior knowledge about the camera orientation. Using this border-ownership-based attention algorithm, our segmentation process extracts all objects in the scene automatically. Most importantly, it returns only the regions that are likely to be objects, not background regions as a typical segmentation algorithm would. To learn more about our system, refer to our most recent publication, "Visual Segmentation of Simple Objects for Robots," in RSS 2011.
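As a rough illustration of the fixation-based step, the sketch below grows a region outward from a fixation point until it hits boundary pixels. This is only a toy stand-in on a hand-made binary boundary map; the real system works with boundary evidence derived from color, texture, and depth, as described above.

```python
from collections import deque

def fixation_segment(boundary, fixation):
    """Return the set of pixels 4-connected to the fixation point
    without crossing a boundary pixel (boundary[r][c] == 1)."""
    rows, cols = len(boundary), len(boundary[0])
    r0, c0 = fixation
    assert boundary[r0][c0] == 0, "fixation must lie inside the object"
    seen = {(r0, c0)}
    queue = deque([(r0, c0)])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and (nr, nc) not in seen and boundary[nr][nc] == 0):
                seen.add((nr, nc))
                queue.append((nr, nc))
    return seen

# Toy scene: a closed boundary ring enclosing a 2x2 interior in a 6x6 grid.
B = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
# Fixating inside the ring recovers exactly the enclosed pixels.
print(sorted(fixation_segment(B, (2, 2))))  # [(2, 2), (2, 3), (3, 2), (3, 3)]
```

A fixation in the background instead returns the surrounding region, which is why the border-ownership-based attention step matters: it is what places fixations inside likely objects rather than in the background.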


Following is a set of results using depth information for data made available by the Solution in Perception Challenge. The first row contains the RGB images; the depth maps of the scenes are not shown. The second row contains all the "simple" objects found by our system.

[Images: five RGB scenes (top row) and the corresponding segmented "simple" objects (bottom row)]