Publications

2025

  1. OSDI’25
    KPerfIR: Towards an Open and Compiler-centric Ecosystem for GPU Kernel Performance Tooling on Modern AI Workloads (To Appear)
    Yue Guan, Yuanwei Fang, Keren Zhou, Corbin Robeck, Manman Ren, Zhongkai Yu, and Yufei Ding
    In USENIX Symposium on Operating Systems Design and Implementation, 2025
  2. MLSys’25
    FastTree: Optimizing Attention Kernel and Runtime for Tree-Structured LLM Inference (To Appear)
    Zaifeng Pan, Yitong Ding, Yue Guan, Zheng Wang, Zhongkai Yu, Xulong Tang, Yida Wang, and Yufei Ding
    In Proceedings of Machine Learning and Systems, 2025

2024

  1. MICRO’24
    Cambricon-LLM: A Chiplet-Based Hybrid Architecture for On-Device Inference of 70B LLM
    Zhongkai Yu, Shengwen Liang, Tianyun Ma, Yunke Cai, Ziyuan Nan, Di Huang, Xinkai Song, Yifan Hao, Jie Zhang, Tian Zhi, Yongwei Zhao, Zidong Du, Xing Hu, Qi Guo, and Tianshi Chen
    In Proceedings of the 57th IEEE/ACM International Symposium on Microarchitecture, 2024
  2. TACO
    Environmental Condition Aware Super-Resolution Acceleration Framework in Server-Client Hierarchies
    Zhuoran Song, Zhongkai Yu, Xinkai Song, Yifan Hao, Li Jiang, Naifeng Jing, and Xiaoyao Liang
    ACM Transactions on Architecture and Code Optimization, 2024
  3. TCAD
    Harmonia: A Unified Architecture for Efficient Deep Symbolic Regression
    Tianyun Ma, Yuanbo Wen, Xinkai Song, Pengwei Jin, Di Huang, Husheng Han, Ziyuan Nan, Zhongkai Yu, Shaohui Peng, Yongwei Zhao, and others
    IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2024

2022

  1. DAC’22
    E2SR: An End-to-End Video Codec Assisted System for Super Resolution Acceleration
    Zhuoran Song, Zhongkai Yu, Naifeng Jing, and Xiaoyao Liang
    In Proceedings of the 59th ACM/IEEE Design Automation Conference, 2022