FlowBotHD: History-Aware Diffuser Handling Ambiguities in Articulated Objects Manipulation

FlowBotHD handles multi-modality and occlusions in articulated object manipulation.

Abstract

We introduce a novel approach for manipulating articulated objects under ambiguity. When opening a door, for example, multi-modality and occlusions make it unclear from which side and in which direction the door should be opened.

Multi-modality arises when it is uncertain how a fully closed door opens (push, pull, or slide) or from which side it should be opened. Occlusions compound the problem: from certain viewpoints the door’s shape is hidden, so its state remains ambiguous for as long as it is occluded.

To tackle these challenges, we propose a history-aware diffusion network that models the multi-modal distribution of the articulated object and uses history to disambiguate actions and make stable predictions under occlusion. Experiments and analysis demonstrate the state-of-the-art performance of our method, with particular improvements on failure modes caused by ambiguity.

Real-world Demos

We carry out real-world experiments on general objects and, specifically, on our custom-made multi-modal door to demonstrate that FlowBotHD works in real-world scenarios.

Regular Objects: The videos above show that FlowBotHD can open unambiguous objects.

Multi-modal Door: The videos above show FlowBotHD making multi-modal predictions and opening the door in all four ways: pull left, pull right, push left, and push right, where 'push' is executed as pulling from the back. These are successful trials in which the first prediction falls into the correct mode, i.e. the one that matches the door's current configuration. If the first prediction falls into a different mode, the model adjusts using the switch grasp point policy, as demonstrated below. (The black tape visible on the back of the door only fills a crack in the wood. It does not affect the model's judgement: the tape is invisible to the camera placed at the front, and color has no effect on methods that use only point clouds.)

Multi-modal Door (Switch Grasp Point): The video above shows the switch grasp point policy at work when the first prediction falls into a wrong mode: it lets FlowBotHD move from an incorrect grasp point to a correct one.
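The switch grasp point policy is not spelled out on this page, so the following Python is only a minimal sketch under stated assumptions. It assumes, as in FlowBot3D, that the model outputs one 3D flow vector per point and that the grasp point is the point with the largest predicted flow. `sample_flow` and `try_action` are hypothetical callbacks standing in for drawing one diffusion sample and executing a short robot motion, and `min_motion` is an illustrative threshold. When a grasp fails to move the door, that point is excluded and the next sample is tried.

```python
import numpy as np


def select_grasp(flow, excluded):
    """Pick the not-yet-failed point with the largest predicted flow.

    flow: (N, 3) array, one 3D flow vector per point (assumption, as in FlowBot3D).
    Returns (index, unit direction), or None if every point has been excluded.
    """
    norms = np.linalg.norm(flow, axis=1)
    norms[np.array(list(excluded), dtype=int)] = -np.inf  # never retry a failed grasp point
    idx = int(np.argmax(norms))
    if not np.isfinite(norms[idx]):
        return None
    return idx, flow[idx] / max(norms[idx], 1e-8)


def open_with_switching(sample_flow, try_action, max_attempts=5, min_motion=0.01):
    """Hypothetical switch-grasp-point loop.

    sample_flow() -> (N, 3) flow sample from the model (hypothetical callback).
    try_action(idx, direction) -> measured displacement of the grasped point
    (hypothetical callback; units follow the caller, e.g. metres).
    """
    failed = set()
    for _ in range(max_attempts):
        choice = select_grasp(sample_flow(), failed)
        if choice is None:
            return None
        idx, direction = choice
        if try_action(idx, direction) >= min_motion:
            return idx, direction  # the object moved: keep this grasp and mode
        failed.add(idx)            # no motion: switch to another grasp point
    return None
```

Because a diffusion model returns a fresh sample on every call, a real `sample_flow` may also change the predicted direction between attempts, not just the grasp point.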

Simulation Demos

We provide simulation visualizations for special cases (multi-modality, occlusion) as well as general examples. These visuals show our model performing stably across a variety of objects.

I. Multi-modality Example

The visuals show that FlowBotHD, combined with our proposed policy, can try different possible moving directions and grasp points in the fully closed state, and can therefore open objects despite ambiguity at the early stage.

Case 1 - FlowBot3D: Fails to open the door!

Case 1 - FlowBotHD (Ours): Succeeds in opening the door!

Action Visualization (Grasp Point & Direction)

Action Visualization (Grasp Point & Direction)



Case 2 - FlowBot3D: Fails to open the door!

Case 2 - FlowBotHD (Ours): Succeeds in opening the door!

Action Visualization (Grasp Point & Direction)

Action Visualization (Grasp Point & Direction)

II. Occlusion Examples

With history integrated, FlowBotHD is able to make consistent and stable predictions even at severely occluded angles.

Case 1 - FlowBot3D: Fails to open the fridge!

Case 1 - FlowBotHD (Ours): Succeeds in opening the fridge!

Action Visualization (Grasp Point & Direction)

Action Visualization (Grasp Point & Direction)


Case 2 - FlowBot3D: Fails to open the fridge!

Case 2 - FlowBotHD (Ours): Succeeds in opening the fridge!

Action Visualization (Grasp Point & Direction)

Action Visualization (Grasp Point & Direction)

III. General Examples

We also provide a few more simulation demos to demonstrate the stable performance of our model on various objects.

Diffusion Process

We visualize the denoising process of our model. The two videos show FlowBotHD producing two different modes from the same input observation.
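To make the idea of several modes emerging from one observation concrete, here is a toy one-dimensional sketch using a standard DDPM ancestral sampler. It is not FlowBotHD's network. The clean data is assumed to be an equal mixture of two narrow Gaussians at -1 and +1 (think of them as two opposite opening directions), the noise schedule is an illustrative linear one, and the trained denoiser is replaced by the exact noise prediction for that mixture. Every sample starts from independent noise, and the reverse process carries each one to one of the two modes rather than to their average.

```python
import numpy as np

MODES = np.array([-1.0, 1.0])    # toy stand-ins for two opening directions
WEIGHTS = np.array([0.5, 0.5])   # equal prior probability for each mode
SIGMA0 = 0.05                    # spread of each mode in the clean data


def eps_hat(x, alpha_bar):
    """Exact noise prediction for the 1D Gaussian mixture at noise level alpha_bar.

    Stands in for a trained denoiser network; x has shape (B,).
    """
    means = np.sqrt(alpha_bar) * MODES               # noised mode centres, (K,)
    var = alpha_bar * SIGMA0**2 + (1.0 - alpha_bar)  # shared variance of each component
    # Posterior responsibility of each mode for each sample, shape (B, K).
    logits = np.log(WEIGHTS) - (x[:, None] - means) ** 2 / (2 * var)
    logits -= logits.max(axis=1, keepdims=True)
    resp = np.exp(logits)
    resp /= resp.sum(axis=1, keepdims=True)
    score = (resp * (means - x[:, None])).sum(axis=1) / var  # d/dx log p_t(x)
    return -np.sqrt(1.0 - alpha_bar) * score         # Tweedie: E[eps | x_t]


def ddpm_sample(n, T=1000, seed=0):
    """DDPM ancestral sampling of n independent scalars."""
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.02, T)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    x = rng.standard_normal(n)                       # start from pure noise
    for t in range(T - 1, -1, -1):
        eps = eps_hat(x, alpha_bars[t])
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            x += np.sqrt(betas[t]) * rng.standard_normal(n)  # no noise on the last step
    return x
```

Calling `ddpm_sample(2000)` should place roughly half of the samples near -1 and half near +1, with almost none in between.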

Ambiguity Demos

Predictions under Multimodality

As the diffusion process visualization shows, diffusion allows our model to preserve multiple possible modes in multi-modal cases. We compare our model's predictions with those of the original FlowBot3D on a custom-built ambiguous door. FlowBot3D makes only a single prediction, which is plausible but incorrect, whereas our model predicts both possible configurations.


Predictions under Occlusions

With the help of history, our model produces stable, high-quality predictions even under severe occlusion. The following example shows a door opened 90 degrees: compared with the original FlowBot3D, our model's predictions remain stable and consistent despite the occlusion.
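FlowBotHD's history conditioning is learned, and its details are not given here. As a loose illustration of why remembering past motion helps when the current view is ambiguous, the sketch below uses a hypothetical hand-written rule instead: it stores the last few executed directions and, among candidate directions (for example, several diffusion samples), picks the one most aligned with recent motion. `HistoryBuffer`, `add`, and `pick` are illustrative names, not part of any released code.

```python
from collections import deque

import numpy as np


class HistoryBuffer:
    """Hypothetical helper: remembers the last k executed motion directions."""

    def __init__(self, k=5):
        self.dirs = deque(maxlen=k)  # oldest entries drop out automatically

    def add(self, direction):
        d = np.asarray(direction, dtype=float)
        self.dirs.append(d / np.linalg.norm(d))

    def pick(self, candidates):
        """Return the candidate most aligned with the mean recent direction.

        With an empty history there is nothing to disambiguate with,
        so the first candidate is returned unchanged.
        """
        candidates = np.asarray(candidates, dtype=float)
        if not self.dirs:
            return candidates[0]
        ref = np.mean(list(self.dirs), axis=0)
        units = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
        return candidates[int(np.argmax(units @ ref))]
```

In a rollout, `add` would be called after each executed step and `pick` on the next batch of candidates, so a door that was already moving one way keeps being moved that way even when the occluded view alone supports both directions.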