Abstract: Embodied intelligence requires agents to interact with 3D environments in real time based on language instructions. A foundational task in this domain is ego-centric 3D visual grounding.