Yingdong Hu

I am a fourth-year Ph.D. student at the Institute for Interdisciplinary Information Sciences (IIIS), Tsinghua University, advised by Prof. Yang Gao. Previously, I obtained my bachelor's degree from Beijing University of Posts and Telecommunications (BUPT).

My research focuses on Embodied AI, a frontier domain at the intersection of machine learning, robotics, and computer vision. I investigate the fundamental challenge of developing general-purpose robotic systems that can adapt and generalize their learned behaviors across diverse, unstructured real-world environments.

Email  /  Google Scholar  /  GitHub  /  WeChat

profile photo
News

  • [2024.10] We release 🚀Data Scaling Laws in Imitation Learning for Robotic Manipulation🚀.

  • [2023.08] Semantic-Geometric Representation (SGR) is accepted at CoRL 2023.

  • [2023.04] Our work on pre-trained vision models in motor control is accepted at ICML 2023.

  • [2022.07] SFC is accepted at ECCV 2022 as an oral presentation!

Publications  (* indicates equal contribution)

    Below are representative papers. For a complete list, please refer to my Google Scholar.

    Data Scaling Laws in Imitation Learning for Robotic Manipulation
    Yingdong Hu*, Fanqi Lin*, Pingyue Sheng, Chuan Wen, Jiacheng You, Yang Gao
    International Conference on Learning Representations (ICLR), 2025   (Oral Presentation)
    Best Paper Award at Workshop on X-Embodiment Robot Learning, CoRL 2024
    project page / arXiv / code / summary

    We demonstrate that the policy’s generalization ability to new objects, new environments, or both scales approximately as a power law with the number of training objects, training environments, or training environment-object pairs, respectively.

    Look Before You Leap: Unveiling the Power of GPT-4V in Robotic Vision-Language Planning
    Yingdong Hu*, Fanqi Lin*, Tong Zhang, Li Yi, Yang Gao
    Workshop on Vision-Language Models for Navigation and Manipulation, ICRA 2024
    project page / arXiv

    We introduce ViLa, a novel approach for long-horizon robotic planning that leverages GPT-4V to generate a sequence of actionable steps. ViLa empowers robots to execute complex tasks with a profound understanding of the visual world.

    A Universal Semantic-Geometric Representation for Robotic Manipulation
    Tong Zhang*, Yingdong Hu*, Hanchen Cui, Hang Zhao, Yang Gao
    Conference on Robot Learning (CoRL), 2023
    project page / arXiv / code

    We present Semantic-Geometric Representation (SGR), a universal perception module for robotics that leverages the rich semantic information of large-scale pre-trained 2D models and inherits the merits of 3D spatial reasoning.

    For Pre-Trained Vision Models in Motor Control, Not All Policy Learning Methods are Created Equal
    Yingdong Hu, Renhao Wang, Li Erran Li, Yang Gao
    International Conference on Machine Learning (ICML), 2023
    project page / arXiv / code

    We conduct the first thorough evaluation of pre-trained vision model performance across different downstream policy learning methods and environments. We discover that the effectiveness of pre-training is highly dependent on the choice of the downstream policy learning algorithm.

    Semantic-Aware Fine-Grained Correspondence
    Yingdong Hu, Renhao Wang, Kaifeng Zhang, Yang Gao
    European Conference on Computer Vision (ECCV), 2022   (Oral Presentation)
    project page / arXiv / code

    We show that fine-grained features learned with pixel-level self-supervised learning (SSL) objectives are complementary to semantic features from image-level SSL methods. Fusing these features can significantly improve the performance for visual correspondence tasks.


    Modified from Jon Barron