Yingdong Hu

I am a fourth-year Ph.D. student at the Institute for Interdisciplinary Information Sciences (IIIS), Tsinghua University, advised by Prof. Yang Gao. Previously, I obtained my bachelor's degree from Beijing University of Posts and Telecommunications (BUPT).

My research focuses on Embodied AI, a field at the intersection of machine learning, robotics, and computer vision. I study the fundamental challenges of building general-purpose robotic systems that can adapt and generalize their learned behaviors across diverse, unstructured real-world environments.

Email  /  Google Scholar  /  GitHub

News

  • [2024.10] We release 🚀Data Scaling Laws in Imitation Learning for Robotic Manipulation🚀.

  • [2023.08] Semantic-Geometric Representation (SGR) is accepted at CoRL 2023.

  • [2023.04] Our work on pre-trained vision models in motor control is accepted at ICML 2023.

  • [2022.07] SFC is accepted at ECCV 2022 as an oral presentation!

Publications (* indicates equal contribution)

    Here are representative papers. For a complete list, please refer to my Google Scholar.

    Data Scaling Laws in Imitation Learning for Robotic Manipulation
    Fanqi Lin*, Yingdong Hu*, Pingyue Sheng, Chuan Wen, Jiacheng You, Yang Gao
    Best Paper Award at the Workshop on X-Embodiment Robot Learning, CoRL 2024
    project page / arXiv / code / summary

    We demonstrate that the policy’s generalization ability to new objects, new environments, or both scales approximately as a power law with the number of training objects, training environments, or training environment-object pairs, respectively.

    Look Before You Leap: Unveiling the Power of GPT-4V in Robotic Vision-Language Planning
    Yingdong Hu*, Fanqi Lin*, Tong Zhang, Li Yi, Yang Gao
    Workshop on Vision-Language Models for Navigation and Manipulation, ICRA 2024
    project page / arXiv

    We introduce ViLa, a novel approach for long-horizon robotic planning that leverages GPT-4V to generate a sequence of actionable steps. ViLa empowers robots to execute complex tasks with a profound understanding of the visual world.

    A Universal Semantic-Geometric Representation for Robotic Manipulation
    Tong Zhang*, Yingdong Hu*, Hanchen Cui, Hang Zhao, Yang Gao
    CoRL, 2023
    project page / arXiv / code

    We present Semantic-Geometric Representation (SGR), a universal perception module for robotics that leverages the rich semantic information of large-scale pre-trained 2D models and inherits the merits of 3D spatial reasoning.

    For Pre-Trained Vision Models in Motor Control, Not All Policy Learning Methods are Created Equal
    Yingdong Hu, Renhao Wang, Li Erran Li, Yang Gao
    ICML, 2023
    project page / arXiv / code

    We conduct the first thorough evaluation of pre-trained vision model performance across different downstream policy learning methods and environments. We discover that the effectiveness of pre-training is highly dependent on the choice of the downstream policy learning algorithm.

    Semantic-Aware Fine-Grained Correspondence
    Yingdong Hu, Renhao Wang, Kaifeng Zhang, Yang Gao
    ECCV, 2022   (Oral Presentation)
    project page / arXiv / code

    We show that fine-grained features learned with pixel-level self-supervised learning (SSL) objectives are complementary to semantic features from image-level SSL methods. Fusing these features can significantly improve the performance for visual correspondence tasks.


Modified from Jon Barron