An example of a basic segmentation capability

Step 1: A picture of me and the objects on the floor.

This is what I see through my Kinect camera.

Step 2: These are the "simple" objects that I found. I used depth information along with color and texture to get the segmentation results.
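The actual segmentation follows the fixation-based strategy described in the paper referenced at the end of this page. As a rough illustration of how depth information alone can already carve a Kinect frame into candidate regions, here is a minimal sketch; the function name, thresholds, and the depth-only simplification are my own assumptions, not the implementation running on the robot, and the color and texture cues are omitted for brevity.

```python
import numpy as np
from scipy import ndimage

def simple_object_masks(depth, min_pixels=2000, edge_thresh=0.03):
    """Split a Kinect depth image (meters) into candidate object regions.

    Pixels separated by a strong depth discontinuity are assumed to lie on
    different surfaces; connected components of smooth depth are kept as
    candidate "simple" objects.
    """
    # Depth gradients; large jumps mark boundaries between objects/background.
    gy, gx = np.gradient(depth)
    edges = np.hypot(gx, gy) > edge_thresh

    # Drop invalid Kinect returns (zero depth) and boundary pixels.
    valid = (depth > 0) & ~edges
    labels, n = ndimage.label(valid)

    # Keep only reasonably large components; tiny fragments are noise.
    return [labels == k for k in range(1, n + 1)
            if np.count_nonzero(labels == k) >= min_pixels]
```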
An example of an active perception capability
While there are a number of ways to exhibit active behaviour, the following is one example: I move to get a better look at an object of interest in the scene so that I can recognize it better.

Once again, a picture of me and the objects on the floor.

The view from my Kinect.

The extracted "simple" objects.

Now, I select the first "simple" object extracted in the segmentation process, which is the "America" box in the scene. I extract the normal of the dominant surface of this object, as shown in the figure above.
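As a rough sketch of this normal-extraction step, one can fit a plane to the object's 3D points by PCA and take the direction of least variance as the dominant surface normal. The function below is an illustrative assumption of mine, not the exact code running on the robot; a RANSAC plane fit would be more robust if the segment contains points from several faces.

```python
import numpy as np

def dominant_surface_normal(points):
    """Fit a plane to an object's Nx3 point cloud (camera frame) and return
    its centroid and unit normal.

    PCA-based least-squares fit: the singular vector with the smallest
    singular value is the direction of least variance, i.e. the plane normal.
    """
    centroid = points.mean(axis=0)
    _, _, vt = np.linalg.svd(points - centroid, full_matrices=False)
    normal = vt[-1]
    # Orient the normal toward the camera (the Kinect looks along +z, so a
    # face visible to the camera has a normal with negative z).
    if normal[2] > 0:
        normal = -normal
    return centroid, normal / np.linalg.norm(normal)
```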

Based on the normal of the object of interest, I calculate a motion plan so that I end up facing the object from a fixed distance away. The picture above shows such a motion plan.
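Here is a minimal sketch of how such a goal pose could be computed, assuming the object centroid and surface normal have already been transformed into a floor-aligned world frame (z up) and assuming a hypothetical standoff distance; the planner that actually drives me to this pose is not shown.

```python
import numpy as np

def goal_pose(centroid, normal, standoff=0.8):
    """Compute a 2D goal pose (x, y, heading) on the floor that faces the
    object from a fixed standoff distance along its outward surface normal.

    `centroid` and `normal` are assumed to be in a floor-aligned world frame
    with z pointing up; `standoff` (meters) is a hypothetical default.
    """
    # Project the outward normal onto the floor plane and normalize it.
    n_xy = np.array(normal[:2], dtype=float)
    n_xy /= np.linalg.norm(n_xy)

    # Stand `standoff` meters away from the object along that direction.
    goal_xy = np.array(centroid[:2], dtype=float) + standoff * n_xy

    # Heading points back toward the object (opposite of the outward normal).
    heading = np.arctan2(-n_xy[1], -n_xy[0])
    return goal_xy[0], goal_xy[1], heading
```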
The video of me moving according to the calculated motion plan.

I have arrived at the target location, and this is what I see from there. As you can see, the box is now closer to me.
I segment the box again from this new location, as shown in the picture on the left. Comparing the new segmentation with the prior segmentation of the same object (right), you can see how much a simple active step has improved the captured visual data. Any high-level visual processing on this data will be more accurate.
Finally, to learn more about the segmentation strategy, please refer to Ajay Mishra and Yiannis Aloimonos, "Visual Segmentation of 'Simple' Objects for Robots," in RSS 2011.