Yihao LIU

Google Scholar | GitHub

I am a Research Scientist at the Shanghai Artificial Intelligence Laboratory, where I lead a team focusing on multimodal generation and understanding. I earned my Bachelor’s degree in 2018 and my Ph.D. in 2023, both from the University of Chinese Academy of Sciences (UCAS). During my doctoral studies, I was affiliated with the Shenzhen Institutes of Advanced Technology (SIAT), Chinese Academy of Sciences, under the supervision of Prof. Yu Qiao and Prof. Chao Dong. My research lies at the intersection of computer vision, generative modeling, and scientific intelligence, with particular emphasis on multimodal foundation models and image/video enhancement.

During my studies, I received several honors, including the President’s Award of the Chinese Academy of Sciences, the Zhu Li Yue Hua Outstanding Doctoral Student Award, the CAS Excellent Youth League Member Award, the Beijing Outstanding Graduate Award, the SIAT President’s Innovation Award, and the CVMJ 2025 Best Paper Honorable Mention Award.

I have also placed highly in international and national competitions, winning 1st place in the PIRM 2018 Perceptual Image Super-Resolution Challenge, 1st place in the AIM 2020 Video Frame Interpolation Challenge, 2nd place in the NTIRE 2021 HDR Enhancement Challenge, and 3rd place in the UDC 2020 Under-Display Camera Restoration Challenge. I serve as a reviewer for top journals and conferences, including TPAMI, TIP, TCSVT, TMM, CVPR, ICCV, ECCV, and NeurIPS.

Current Research Focus

My current research focuses on pioneering a new generation of multimodal foundation models that integrate generation and understanding within a unified architecture. Specifically:

  • Unified Multimodal Architectures: Designing new-generation frameworks (e.g., discrete diffusion, autoregressive hybrids) that integrate text, image, video, and audio tasks, enabling coherent cross-modal representation, reasoning, and generation.
  • Knowledge-Driven and Causality-Aware Modeling: Embedding structured world knowledge, physical realism, and causal reasoning into multimodal models, moving beyond perceptual fidelity toward scientifically grounded and logically consistent outputs.
  • General Low-Level Vision Models: Consolidating diverse low-level vision tasks (restoration, enhancement, style transfer, and dense prediction) into a robust multimodal framework, advancing detail recovery, fidelity, and generalization for real-world applications.
  • Post-training and Reward Alignment: Developing multimodal alignment and reinforcement learning paradigms that incorporate human preference modeling and expert feedback, ensuring outputs that are not only high-quality and aesthetically appealing but also reliable, interpretable, and scientifically valid.

I am open to collaborations and discussions. Feel free to reach out at liuyihao14@mails.ucas.ac.cn or liuyihao@pjlab.org.cn.

News

Sep 10, 2025 We are excited to announce Lumina-DiMOO, our latest unified multimodal generation and understanding model, built upon an advanced discrete diffusion architecture. Lumina-DiMOO demonstrates the strong potential of multimodal diffusion large language models (dLLMs) to unify diverse tasks within a single, streamlined architecture, delivering state-of-the-art performance that surpasses many existing unified models. Learn more and explore resources: [Homepage] [GitHub] [HuggingFace].
Sep 01, 2025 We introduce ArtiMuse, a multimodal large language model (MLLM) for professional aesthetic understanding, trained on ArtiMuse-10K, a meticulously curated, expert-annotated dataset. ArtiMuse-10K systematically defines eight explainable, fine-grained aesthetic attributes (e.g., Composition & Design, Visual Elements & Structure) and covers diverse visual domains, including graphic design, 3D design, AIGC-generated images, photography, and painting & calligraphy. [Paper] [Homepage] [GitHub] [Online Demo v1.0] Note: ArtiMuse was officially released at WAIC 2025, in the forum “Evolving with AI: The Iteration and Resilience of Artistic Creativity”.
Jun 26, 2025 Our video restoration method DiffVSR was accepted to ICCV 2025. [Paper] [Homepage]
Apr 22, 2025 Our video colorization method TCVC received the CVMJ 2025 Best Paper Honorable Mention Award.
Apr 01, 2025 We present Lumina-OmniLV (abbreviated as OmniLV), a universal multimodal, multi-task framework for low-level vision that addresses over 100 sub-tasks across four major categories: image restoration, image enhancement, weak-semantic dense prediction, and stylization. [Paper] [Homepage]
Jul 18, 2024 GenLV was accepted to ACM MM 2024. GenLV is a follow-up to PromptGIP that further broadens the task coverage and improves performance. The paper can be found here.
Jul 01, 2024 Two papers were accepted to ECCV 2024. By analyzing the relationships between image degradations, GRIDS proposes a grouped learning method for multiple-degradation restoration. X-Restormer is a new general image restoration backbone network that offers strong task generality and achieves competitive performance across a variety of restoration tasks.
May 02, 2024 PromptGIP was accepted to ICML 2024. PromptGIP is a universal model for general image processing, covering image restoration, image enhancement, image feature extraction, and related tasks. Code is available here.

Selected Publications

  1. ECCV
    GRIDS: Grouped Multiple-Degradation Restoration with Image Degradation Similarity
    Shuo Cao*, Yihao Liu*, Wenlong Zhang, Yu Qiao, and Chao Dong
    In European Conference on Computer Vision (ECCV), 2024
  2. ECCV
    A Comparative Study of Image Restoration Networks for General Backbone Network Design
    Xiangyu Chen, Zheyuan Li, Yuandong Pu, Yihao Liu, Jiantao Zhou, Yu Qiao, and Chao Dong
    In European Conference on Computer Vision (ECCV), 2024
  3. ICML
    Unifying Image Processing as Visual Prompting Question Answering
    Yihao Liu*, Xiangyu Chen*, Xianzheng Ma*, Xintao Wang, Jiantao Zhou, Yu Qiao, and Chao Dong
    In Proceedings of the 41st International Conference on Machine Learning (ICML), 2024
  4. ACM MM
    Learning A Low-Level Vision Generalist via Visual Task Prompt
    Xiangyu Chen, Yihao Liu, Yuandong Pu, Wenlong Zhang, Jiantao Zhou, Yu Qiao, and Chao Dong
    In Proceedings of the 32nd ACM International Conference on Multimedia, 2024
  5. CVMJ
    Temporally Consistent Video Colorization with Deep Feature Propagation and Self-Regularization Learning
    Yihao Liu*, Hengyuan Zhao*, Kelvin C.K. Chan, Xintao Wang, Chen Change Loy, Yu Qiao, and Chao Dong
    Computational Visual Media, 2024
  6. CVPR
    DegAE: A New Pretraining Paradigm for Low-Level Vision
    Yihao Liu, Jingwen He, Jinjin Gu, Xiangtao Kong, Yu Qiao, and Chao Dong
    In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
  7. TPAMI
    Evaluating the Generalization Ability of Super-Resolution Networks
    Yihao Liu, Hengyuan Zhao, Jinjin Gu, Yu Qiao, and Chao Dong
    IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023
  8. CVPR
    Masked Image Training for Generalizable Deep Image Denoising
    Haoyu Chen, Jinjin Gu, Yihao Liu, Salma Abdel Magid, Chao Dong, Qiong Wang, Hanspeter Pfister, and Lei Zhu
    In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
  9. TPAMI
    CP3: Unifying Point Cloud Completion by Pretrain-Prompt-Predict Paradigm
    Mingye Xu, Yali Wang, Yihao Liu, Tong He, and Yu Qiao
    IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023
  10. TMM
    Very Lightweight Photo Retouching Network with Conditional Sequential Modulation
    Yihao Liu*, Jingwen He*, Xiangyu Chen, Zhengwen Zhang, Hengyuan Zhao, Chao Dong, and Yu Qiao
    IEEE Transactions on Multimedia, 2022
  11. TPAMI
    Blind Image Super-Resolution: A Survey and Beyond
    Anran Liu, Yihao Liu, Jinjin Gu, Yu Qiao, and Chao Dong
    IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022
  12. TPAMI
    RankSRGAN: Super Resolution Generative Adversarial Networks with Learning to Rank
    Wenlong Zhang, Yihao Liu, Chao Dong, and Yu Qiao
    IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021
  13. TPAMI
    Interactive Multi-Dimension Modulation for Image Restoration
    Jingwen He, Chao Dong, Yihao Liu, and Yu Qiao
    IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021
  14. ICCV
    Learn to Match: Automatic Matching Network Design for Visual Tracking
    Zhipeng Zhang, Yihao Liu, Xiao Wang, Bing Li, and Weiming Hu
    In International Conference on Computer Vision (ICCV), 2021
  15. arXiv
    Discovering" Semantics" in Super-Resolution Networks
    Yihao Liu*, Anran Liu*, Jinjin Gu, Zhipeng Zhang, Wenhao Wu, Yu Qiao, and Chao Dong
    arXiv preprint arXiv:2108.00406, 2021
  16. AAAI
    FD-GAN: Generative Adversarial Networks with Fusion-Discriminator for Single Image Dehazing
    Yu Dong*, Yihao Liu*, He Zhang, Shifeng Chen, and Yu Qiao
    In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2020
  17. ECCV
    Conditional Sequential Modulation for Efficient Global Image Retouching
    Jingwen He*, Yihao Liu*, Yu Qiao, and Chao Dong
    In European Conference on Computer Vision (ECCV), 2020
  18. ECCVW
    Enhanced Quadratic Video Interpolation
    Yihao Liu*, Liangbin Xie*, Li Siyao, Wenxiu Sun, Yu Qiao, and Chao Dong
    In European Conference on Computer Vision (ECCV) Workshops, 2020
  19. ICCV
    RankSRGAN: Generative Adversarial Networks with Ranker for Image Super-Resolution
    Wenlong Zhang, Yihao Liu, Chao Dong, and Yu Qiao
    In International Conference on Computer Vision (ICCV), 2019
  20. ECCVW
    ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks
    Xintao Wang, Ke Yu, Shixiang Wu, Jinjin Gu, Yihao Liu, Chao Dong, Yu Qiao, and Chen Change Loy
    In European Conference on Computer Vision (ECCV) Workshops, 2018