Yingdong Hu

I am a third-year Ph.D. student at Institute for Interdisciplinary Information Sciences (IIIS), Tsinghua University, advised by Prof. Yang Gao. Previously, I obtained my bachelor's degree from Beijing University of Posts and Telecommunications (BUPT).

My research primarily revolves around Embodied AI, a multidisciplinary domain intersecting computer vision, machine learning, and robotics. Specifically, I am dedicated to exploring how prior knowledge in foundational models can be leveraged to construct general-purpose robots capable of effective generalization in diverse, real-world environments.

Email / Google Scholar / Github

News

[2023.08] Semantic-Geometric Representation (SGR) is accepted at CoRL 2023.

[2023.04] Our work on pre-trained vision models in motor control is accepted at ICML 2023.

[2022.07] SFC is accepted at ECCV 2022 as oral presentation!

Publicaions (* indicates equal contribution)

	Look Before You Leap: Unveiling the Power of GPT-4V in Robotic Vision-Language Planning Yingdong Hu, Fanqi Lin, Tong Zhang, Li Yi, Yang Gao arXiv, 2023 project page / arXiv We introduce ViLa, a novel approach for long-horizon robotic planning that leverages GPT-4V to generate a sequence of actionable steps. ViLa empowers robots to execute complex tasks with a profound understanding of the visual world.
	A Universal Semantic-Geometric Representation for Robotic Manipulation Tong Zhang, Yingdong Hu, Hanchen Cui, Hang Zhao, Yang Gao CoRL, 2023 project page / arXiv We present Semantic-Geometric Representation (SGR), a universal perception module for robotics that leverages the rich semantic information of large-scale pre-trained 2D models and inherits the merits of 3D spatial reasoning.
	For Pre-Trained Vision Models in Motor Control, Not All Policy Learning Methods are Created Equal Yingdong Hu, Renhao Wang, Li Erran Li, Yang Gao ICML, 2023 project page / arXiv / code We conduct the first thorough evaluation of pre-trained vision model performance across different downstream policy learning methods and environments. We discover that the effectiveness of pre-training is highly dependent on the choice of the downstream policy learning algorithm.
	Semantic-Aware Fine-Grained Correspondence Yingdong Hu, Renhao Wang, Kaifeng Zhang, Yang Gao ECCV, 2022 (Oral Presentation) project page / arXiv / code We show that fine-grained features learned with pixel-level self-supervised learning (SSL) objectives are complementary to semantic features from image-level SSL methods. Fusing these features can significantly improve the performance for visual correspondence tasks.

Modified from Jon Barron