This research proposal introduces Lang-Explore, a framework for active robot exploration that leverages natural language instructions to guide autonomous navigation in unknown environments. The method integrates Vision-Language Models with information-theoretic exploration through a Language-Guided Information Gain formulation that scores frontiers based on semantic relevance. The system maintains dense semantic maps by fusing CLIP-based visual features with geometric occupancy, enabling open-vocabulary spatial queries without pre-trained environment models.
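The semantic-mapping idea above can be sketched concretely. The code below is a minimal illustration, not the proposal's implementation: the `SemanticMap` class, its running-mean fusion rule, and the cosine-similarity query are all assumptions about how per-voxel CLIP features might be fused with occupancy and queried with an encoded text prompt.

```python
import numpy as np

class SemanticMap:
    """Minimal sketch (assumed interface): fuse per-voxel visual features
    with occupancy, then answer open-vocabulary queries by cosine
    similarity against a text embedding."""

    def __init__(self):
        self.features = {}   # voxel index -> running-mean visual feature
        self.counts = {}     # voxel index -> number of fused observations
        self.occupied = set()

    def fuse(self, voxel, feature, occupied=True):
        # Incremental running mean keeps memory constant per voxel.
        n = self.counts.get(voxel, 0)
        prev = self.features.get(voxel, np.zeros_like(feature))
        self.features[voxel] = (prev * n + feature) / (n + 1)
        self.counts[voxel] = n + 1
        if occupied:
            self.occupied.add(voxel)

    def query(self, text_embedding, top_k=1):
        # Rank voxels by cosine similarity to the encoded language query.
        t = text_embedding / (np.linalg.norm(text_embedding) + 1e-8)
        scored = []
        for v, f in self.features.items():
            f = f / (np.linalg.norm(f) + 1e-8)
            scored.append((float(f @ t), v))
        scored.sort(reverse=True)
        return [v for _, v in scored[:top_k]]
```

In a real system the features would come from a CLIP image encoder and the query vector from the matching text encoder; here both are stand-in vectors to show the data flow.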
Key findings
A Language-Guided Information Gain (LGIG) formulation scores exploration frontiers based on their semantic relevance to natural language instructions.
Integration of Vision-Language Models with information-theoretic exploration enables efficient task-relevant mapping in unknown environments.
Dense semantic mapping fuses CLIP visual features with 3D geometric occupancy to support open-vocabulary spatial queries.
The framework eliminates the dependency on pre-trained environment models and task-specific navigation graphs for goal-directed exploration.
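One plausible reading of the LGIG idea is a frontier score that blends geometric information gain with language-conditioned semantic relevance. The sketch below is an assumption, not the proposal's formula: `lgig_score`, the per-cell `occupancy_entropy` map, and the mixing weight `alpha` are all hypothetical names introduced for illustration.

```python
import numpy as np

def lgig_score(frontier_cells, feature_map, text_embedding,
               occupancy_entropy, alpha=0.5):
    """Score a frontier as a convex combination of geometric information
    gain (mean occupancy entropy of its cells) and semantic relevance
    (cosine similarity between fused visual features and the
    instruction's text embedding)."""
    # Geometric term: average entropy of the unknown cells at the frontier.
    geo_gain = float(np.mean([occupancy_entropy[c] for c in frontier_cells]))

    # Semantic term: cosine similarity between the frontier's mean
    # visual feature and the instruction embedding.
    feats = np.stack([feature_map[c] for c in frontier_cells])
    mean_feat = feats.mean(axis=0)
    mean_feat = mean_feat / (np.linalg.norm(mean_feat) + 1e-8)
    text = text_embedding / (np.linalg.norm(text_embedding) + 1e-8)
    semantic = float(mean_feat @ text)

    return alpha * geo_gain + (1.0 - alpha) * semantic
```

Under this scoring, a frontier whose fused features align with the instruction outranks an equally unexplored but task-irrelevant one, which is the behavior the findings above describe.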
Limitations & open questions
Traditional frontier-based exploration treats all unknown regions equally, potentially wasting effort on task-irrelevant areas.
Existing Vision-Language Navigation methods typically require pre-computed navigation graphs, limiting their deployment in truly unknown environments.
Current approaches often rely on passive perception rather than actively seeking viewpoints that maximize information gain for language understanding.
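The first limitation above can be made concrete: a classic frontier-based planner scores frontiers purely geometrically, so two frontiers with equal uncertainty score identically regardless of task relevance. The function name below is hypothetical, introduced only to illustrate the baseline that language-guided scoring improves on.

```python
import numpy as np

def geometric_frontier_score(frontier_cells, occupancy_entropy):
    """Classic frontier scoring: purely geometric information gain.
    With no notion of the task or the language instruction, all unknown
    regions of equal entropy are treated as equally worth exploring."""
    return float(np.mean([occupancy_entropy[c] for c in frontier_cells]))
```

A frontier next to the instructed target and one in an irrelevant corner of the map receive the same score whenever their occupancy entropies match, which is exactly the wasted effort this proposal targets.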