PARTNR: Pick and place Ambiguity Resolving by Trustworthy iNteractive learning

21 July 2023

Several recent works show impressive results in mapping language-based human commands and image scene observations directly to robot-executable policies (e.g., pick and place poses). However, these approaches do not consider the uncertainty of the trained policy and always execute the action that the current policy rates as most probable. This makes them vulnerable to domain shift and inefficient in the number of required demonstrations.

We extend previous works and present the PARTNR algorithm, which detects ambiguities in the trained policy by analyzing multiple modalities in the pick and place poses using topological analysis. PARTNR is an interactive imitation learning algorithm that asks the human to take over control when it considers the situation to be ambiguous. A situation is ambiguous when the learned policy does not provide a single dominant solution, i.e., there are multiple local maxima with close values in the action space. User demonstrations are aggregated into the dataset 𝒟 and used for subsequent training. At each execution step, the robot observes a human-provided natural language command and the state of the environment (e.g., a top-view image of the table). Based on this observation, the policy outputs a heatmap representing the value of each action. The heatmap is then analyzed to detect multiple local maxima (in TopAnalysis).
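The execution step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `interactive_step` and all of its parameters (`policy`, `find_peaks`, `is_ambiguous`, `ask_teacher`) are hypothetical callables introduced here, and the representation of an action as a pixel location is an assumption. How the ambiguity test is computed is deliberately left to the `is_ambiguous` callable.

```python
from typing import Any, Callable, List, Tuple

# Assumed representation: a peak is (heatmap value, pixel location).
Peak = Tuple[float, Tuple[int, int]]


def interactive_step(
    command: str,
    image: Any,
    policy: Callable[[str, Any], Any],
    find_peaks: Callable[[Any], List[Peak]],
    is_ambiguous: Callable[[List[Peak]], bool],
    ask_teacher: Callable[[str, Any], Tuple[int, int]],
    dataset: list,
) -> Tuple[Tuple[int, int], bool]:
    """Run one step; return (action, whether the teacher was queried)."""
    heatmap = policy(command, image)   # value of each candidate action
    peaks = find_peaks(heatmap)        # local maxima of the heatmap
    if is_ambiguous(peaks):
        # No single dominant solution: hand control to the human and
        # aggregate the demonstration for subsequent training.
        action = ask_teacher(command, image)
        dataset.append((command, image, action))
        return action, True
    # Otherwise execute the action with the highest value.
    best = max(peaks, key=lambda peak: peak[0])
    return best[1], False
```

Passing the components in as callables keeps the sketch independent of any particular policy network or peak detector; a training routine would consume `dataset` between steps.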

In this work, we rely on computational topology methods for finding local maxima; specifically, we use a persistent homology method. Then, in AmbiguityMeasure, the values T of the obtained local maxima are normalized using the softmax function, and the maximum of the normalized values is used to decide whether the situation is ambiguous. If AmbiguityMeasure(T) is smaller than a threshold value, the situation is ambiguous. In that case, the robot does not execute the policy but queries the human teacher. The threshold is updated continuously, at every step, by the function UpdateThreshold to satisfy a user-defined sensitivity value. Whenever there is teacher input, the data is aggregated and the policy is updated using the function Train. We validated PARTNR in a table-top pick and place task.
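The two analysis steps can be illustrated with a small sketch. It makes several assumptions that are not taken from the source: the heatmap is a 2D NumPy array, local maxima are found with a simple 0-dimensional superlevel-set persistence computation (union-find over 4-connected pixels), and peaks are filtered by a hypothetical `min_persistence` parameter. The names `top_analysis`, `ambiguity_measure`, and `is_ambiguous` mirror the text, but their signatures are invented here, and the threshold update is not covered.

```python
import numpy as np


def _find(parent, i):
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def top_analysis(heatmap, min_persistence):
    """Return [(value, (row, col)), ...] for peaks whose persistence
    exceeds min_persistence, sorted from highest to lowest value."""
    h, w = heatmap.shape
    flat = heatmap.ravel()
    order = np.argsort(-flat, kind="stable")  # visit pixels high to low
    parent = [-1] * flat.size                 # -1 marks an unvisited pixel
    persistence = {}                          # peak index -> persistence
    for p in order:
        p = int(p)
        r, c = divmod(p, w)
        roots = set()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            rr, cc = r + dr, c + dc
            if 0 <= rr < h and 0 <= cc < w and parent[rr * w + cc] != -1:
                roots.add(_find(parent, rr * w + cc))
        if not roots:
            parent[p] = p                     # a new component is born here
            continue
        best = max(roots, key=lambda i: flat[i])
        parent[p] = best
        for root in roots:
            if root != best:                  # lower peak dies at this level
                parent[root] = best
                persistence[root] = float(flat[root] - flat[p])
    # The component that survives to the end belongs to the global maximum.
    persistence[_find(parent, int(order[0]))] = float("inf")
    peaks = [(float(flat[i]), divmod(i, w))
             for i, pers in persistence.items() if pers > min_persistence]
    return sorted(peaks, reverse=True)


def ambiguity_measure(values):
    """Largest softmax-normalized value among the local maxima."""
    v = np.asarray(values, dtype=float)
    e = np.exp(v - v.max())                   # shift for numerical stability
    return float((e / e.sum()).max())


def is_ambiguous(values, threshold):
    """Ambiguous when the measure falls below the threshold."""
    return ambiguity_measure(values) < threshold
```

With a single local maximum the measure equals 1, and with two maxima of nearly equal value it approaches 0.5, so a lower measure indicates a less dominant best action.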

Fig. 1. PARTNR framework on an example task.

Fig. 2. Visualization of the ambiguity measure.

Fig. 3. Table-top pick and place task.

Authored by Jelle Luijkx and TU Delft team

Delft University of Technology (TU Delft), Netherlands