I am an Active Agent. My primary task is to segment "simple" objects that are placed in front of me. If an object is far away from me or is occluded, I can move to see it up close and capture an image from a better viewpoint. To segment an object, I look (or fixate) at it and use the fixation-based segmentation engine to extract it. To segment multiple objects, I fixate inside each of them individually and carry out the same process. I am currently learning to represent the visual content of the segmented objects.

I am about two feet tall. I move with the help of an iRobot Create, and see using a Microsoft Kinect. I have 3D vision. My torso is just an inexpensive box.

An example of basic segmentation capability


Step 1: The picture of me and the objects on the floor.


This is what I see through my Kinect camera.


Step 2: These are the "simple" objects that I found. I used depth information along with color and texture to get the segmentation results.

An example of an active perception capability

There are a number of ways to exhibit active behaviour. The following is one example, in which I move to get a better look at an object of interest in the scene so that I can recognize it better.


Once again, a picture of me and the objects on the floor.


The view from my Kinect.


The extracted "simple" objects.


Now, I select the first "simple" object extracted in the segmentation process, which is the "America" box in the scene. I extract the normal of the dominant surface of this object, as shown in the figure above.
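The following is a minimal sketch of how the normal of a dominant surface could be estimated from the 3D points of a segmented object. It is an illustration only, not necessarily the method I use: it assumes the points already belong to one roughly planar surface, it assumes the camera sits at the origin, and the function names (`dominant_surface_normal`, `_power_iteration`) are hypothetical. The normal is taken as the direction of least variance of the points, found with power iteration on a shifted covariance matrix.

```python
import math


def _power_iteration(matrix, start, iterations):
    """Return the dominant eigenvector of a symmetric 3x3 matrix."""
    v = start
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(3)) for i in range(3)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            return v
        v = [c / norm for c in w]
    return v


def dominant_surface_normal(points, iterations=200):
    """Estimate the unit normal of the best-fit plane through 3D points.

    Assumption: the camera is at the origin, so the returned normal is
    flipped, if needed, to point from the surface back towards the camera.
    """
    n = len(points)
    if n < 3:
        raise ValueError("need at least three points to fit a plane")
    centroid = [sum(p[i] for p in points) / n for i in range(3)]

    # 3x3 covariance matrix of the centred points.
    cov = [[0.0] * 3 for _ in range(3)]
    for p in points:
        d = [p[i] - centroid[i] for i in range(3)]
        for i in range(3):
            for j in range(3):
                cov[i][j] += d[i] * d[j] / n

    trace = cov[0][0] + cov[1][1] + cov[2][2]
    if trace == 0.0:
        raise ValueError("all points coincide")

    # The plane normal is the eigenvector of cov with the smallest
    # eigenvalue, which is the dominant eigenvector of (trace * I - cov).
    shifted = [[(trace if i == j else 0.0) - cov[i][j] for j in range(3)]
               for i in range(3)]

    # Try each axis as a starting vector and keep the candidate along
    # which the points vary least (smallest Rayleigh quotient).
    best, best_spread = None, None
    for axis in range(3):
        start = [1.0 if i == axis else 0.0 for i in range(3)]
        v = _power_iteration(shifted, start, iterations)
        spread = sum(v[i] * cov[i][j] * v[j]
                     for i in range(3) for j in range(3))
        if best is None or spread < best_spread:
            best, best_spread = v, spread

    # Orient the normal towards the camera (assumed at the origin).
    if sum(best[i] * centroid[i] for i in range(3)) > 0:
        best = [-c for c in best]
    return best
```

For example, a patch of points lying on a wall two metres straight ahead (constant depth along the camera's z axis) would yield a normal pointing back along the negative z axis. Real depth data is noisy, so a robust fit that rejects outliers would likely be needed in practice.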


Using the normal of the object of interest, I calculate a motion plan such that I end up facing the object at a fixed distance from it. The picture above shows such a motion plan.
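A motion plan of this kind can be sketched as a simple turn, drive, turn sequence on the floor plane. This is a hedged illustration under stated assumptions, not my actual planner: it assumes a differential-drive base that can rotate in place, an obstacle-free path, a robot pose given as (x, y, heading) in metres and radians, and a surface normal already projected onto the floor and pointing away from the object. The names `plan_motion`, `wrap_angle`, and `standoff` are hypothetical.

```python
import math


def wrap_angle(angle):
    """Wrap an angle in radians to the interval [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def plan_motion(robot_pose, object_xy, normal_xy, standoff):
    """Plan a turn-drive-turn motion that ends facing the object.

    robot_pose: (x, y, heading) of the robot on the floor plane.
    object_xy:  (x, y) of the object's dominant surface.
    normal_xy:  surface normal projected onto the floor, pointing
                away from the object.
    standoff:   desired final distance from the surface.
    Returns (first_turn, distance, second_turn).
    """
    rx, ry, heading = robot_pose
    nx, ny = normal_xy
    length = math.hypot(nx, ny)
    if length == 0.0:
        raise ValueError("normal has no component in the floor plane")
    nx, ny = nx / length, ny / length

    # Goal position: `standoff` away from the surface along its normal.
    gx = object_xy[0] + standoff * nx
    gy = object_xy[1] + standoff * ny
    # Goal heading: look back along the normal, towards the surface.
    goal_heading = math.atan2(-ny, -nx)

    dx, dy = gx - rx, gy - ry
    distance = math.hypot(dx, dy)
    # If already at the goal position, only the final rotation remains.
    travel_heading = math.atan2(dy, dx) if distance > 1e-9 else heading

    first_turn = wrap_angle(travel_heading - heading)
    second_turn = wrap_angle(goal_heading - travel_heading)
    return first_turn, distance, second_turn
```

As a usage example, `plan_motion((0.0, 0.0, 0.0), (2.0, 1.0), (0.0, -1.0), 1.0)` describes a robot at the origin facing along x, and an object at (2, 1) whose surface faces the negative y direction. The plan is to drive two metres straight ahead and then turn a quarter circle to the left to face the surface from one metre away.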

The video of me moving according to the calculated motion plan.


I have arrived at the target location, and this is what I see from the new location. As you can see, the box is closer to me now.


I segment the box again from this new location, as shown in the picture on the left. If you compare the new segmentation with the prior segmentation of the same object (right), you can see how much a simple active step has improved the captured visual data. Any high-level visual processing that runs on this data should be more accurate as a result.

Finally, to learn more about the segmentation strategy, please refer to: Ajay Mishra and Yiannis Aloimonos, "Visual Segmentation of 'Simple' Objects for Robots," in RSS 2011.