Good Time to Ask:
A Learning Framework for Asking for Help in Embodied Visual Navigation
Ubiquitous Robotics 2023



In reality, it is often more efficient to ask for help than to search the entire space to find an object with an unknown location. We present a learning framework that enables an agent to actively ask for help in such embodied visual navigation tasks, where the feedback informs the agent of where the goal is in its view. To emulate the real-world scenario that a teacher may not always be present, we propose a training curriculum where feedback is not always available. We formulate an uncertainty measure of where the goal is and use empirical results to show that through this approach, the agent learns to ask for help effectively while remaining robust when feedback is not available.


We propose a learning framework for asking for help in embodied visual navigation, Good Time to Ask (GTA). We tackle the ObjectNav task. To complete the task, the agent must navigate to within a pre-determined stopping distance of the target object instance. The target object is chosen randomly, and the placement of objects is randomized every episode. In addition to the usual navigation actions, such as moving forward, turning, and looking up or down, the agent is equipped with the action ask. The agent receives RGB-D observations, a text embedding of the target object, and an additional object-in-view observation. The object-in-view observation is provided only upon each ask action and contains the ground-truth semantic segmentation of the target object's location in the agent's view. Pixels corresponding to the target object have a value of 1; all other pixels have a value of 0.
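As a rough illustration of the object-in-view observation described above, the following sketch builds the binary mask from a semantic segmentation frame. The function name, the action string "ask", and the choice to return an all-zero mask on non-ask steps are assumptions made for this example; the text only states that the observation is provided upon each ask action.

```python
import numpy as np

ASK = "ask"  # hypothetical name for the ask action


def object_in_view(semantic_seg, target_id, action):
    """Return a binary mask of the target object's pixels.

    semantic_seg: 2D integer array of per-pixel object ids (ground truth).
    target_id:    id of the target object in semantic_seg.
    action:       the action the agent just took.

    Assumption: on any step that is not an ask, the mask is all zeros.
    """
    mask = np.zeros(semantic_seg.shape, dtype=np.float32)
    if action == ASK:
        # Pixels belonging to the target object are 1, all others stay 0.
        mask[semantic_seg == target_id] = 1.0
    return mask
```

For example, calling `object_in_view(seg, target_id, "ask")` marks only the target's pixels, while any other action yields an empty mask of the same shape.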


Training Curriculum with Semi-Present Teacher. While the agent should learn to make use of external feedback when available, the teacher may not always be present to provide assistance in a real-world setting. When help is unavailable, the desired behavior for the agent in this task would be to navigate the scene autonomously to find the target object, even if it takes more time. Hence, we introduce a semi-present teacher training curriculum, where an η% semi-present teacher is present in only η% of the training episodes.
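One way to realize an η% semi-present teacher is to decide independently at the start of each training episode whether the teacher is available. The sketch below does this with a per-episode random draw; the function names are hypothetical, and the independent-draw scheme is an assumption (a fixed schedule with exactly η% of episodes would also match the description).

```python
import random


def teacher_present(eta, rng):
    """Return True with probability eta/100 (assumed per-episode draw)."""
    return rng.random() < eta / 100.0


def sample_schedule(eta, num_episodes, seed=0):
    """Return a list of booleans, one per training episode.

    True means the teacher answers ask actions in that episode;
    False means ask actions return no feedback.
    """
    rng = random.Random(seed)
    return [teacher_present(eta, rng) for _ in range(num_episodes)]
```

With `eta=100` the teacher is present in every episode, and with `eta=0` it is never present; intermediate values give a mix of assisted and unassisted episodes.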

Quantifying Uncertainty. We propose a novel metric to quantify the agent’s uncertainty about where the target object is. The high-level idea is that, for each possible 3D position of the target object, we estimate the likelihood of the target object being at that position. As the agent explores the environment and gains more information, the likelihood of the target object being in explored areas decreases, causing the estimated uncertainty about the goal to decrease too. Using this uncertainty metric, we quantify three kinds of asks made when it is not the most informative time to ask:

  • Consecutive asks: no new information gained.
  • Vapid asks: when the agent should already have a good idea of where the goal is (i.e., the uncertainty estimate is < 10% of its starting uncertainty).
  • Statistically insignificant asks: little information gain (i.e., the change in uncertainty estimate by taking an ask action is less than a threshold γ = 2.0).
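The three categories above can be counted from a logged episode, given the action sequence and the uncertainty estimate before and after each step. The following is a minimal sketch under stated assumptions: the function name, the trace layout, and the decision to let one ask fall into several categories at once are all choices made for this example, and the uncertainty values are taken as given rather than computed.

```python
def classify_asks(actions, uncertainty, gamma=2.0, vapid_frac=0.10):
    """Count uninformative asks in one episode.

    actions:     list of action names, one per step.
    uncertainty: list with len(actions) + 1 entries; uncertainty[t] is the
                 estimate before step t and uncertainty[t + 1] the estimate
                 after it (assumed layout).
    gamma:       threshold on the change in uncertainty (2.0 in the text).
    vapid_frac:  fraction of the starting uncertainty (10% in the text).
    """
    counts = {"asks": 0, "consecutive": 0, "vapid": 0, "insignificant": 0}
    start = uncertainty[0]
    for t, action in enumerate(actions):
        if action != "ask":
            continue
        counts["asks"] += 1
        # Consecutive ask: the previous action was also an ask.
        if t > 0 and actions[t - 1] == "ask":
            counts["consecutive"] += 1
        # Vapid ask: uncertainty is already below 10% of its starting value.
        if uncertainty[t] < vapid_frac * start:
            counts["vapid"] += 1
        # Statistically insignificant ask: the ask changes uncertainty by less than gamma.
        if abs(uncertainty[t] - uncertainty[t + 1]) < gamma:
            counts["insignificant"] += 1
    return counts
```

Dividing each count by `counts["asks"]` gives per-episode percentages of each kind of ask.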


Experimental results

Feedback increases the agent's performance significantly. The agent trained with a 100% present teacher fails in 95% of all test episodes when the teacher is absent, significantly worse than the baseline. Agents trained with a semi-present teacher achieve performance comparable to the baseline when the teacher is absent, and can still use feedback, when available, to achieve much higher performance.


The agent trained with an always-present teacher has the lowest percentage of consecutive ask actions, vapid ask actions, and statistically insignificant ask actions, followed by Semi-75 and lastly Semi-25. This suggests that while the agent trained with an always-present teacher uses a higher percentage of ask actions, it has learned to use them in a statistically informative manner. While the percentages of vapid asks and statistically insignificant asks seem high, this is partly attributable to the definition of the uncertainty metric. It assumes a perfect memory in which uncertainty only decreases with more observations, which may not be the case for an RL agent.



title={Good Time to Ask: A Learning Framework for Asking for Help in Embodied Visual Navigation},
author={Zhang, Jenny and Yu, Samson and Duan, Jiafei and Tan, Cheston},
journal={arXiv preprint arXiv:2206.10606},
