Publications

2026

  1. ArXiv
    ChipMATE: Multi-Agent Training via Reinforcement Learning for Enhanced RTL Generation
    Zhongkai Yu*, Yichen Lin*, Chenyang Zhou, Yuwei Zhang, Kun Zhou, Junxia Cui, Haotian Ye, Zhengding Hu, Zaifeng Pan, Ruiyi Wang, Yujie Zhao, Hejia Zhang, Jingbo Shang, Jishen Zhao, and Yufei Ding
    arXiv preprint arXiv:2605.12857, 2026
  2. ArXiv
    ChipBench: A Next-Step Benchmark for Evaluating LLM Performance in AI-Aided Chip Design
    Zhongkai Yu*, Chenyang Zhou*, Yichen Lin, Hejia Zhang, Haotian Ye, Junxia Cui, Zaifeng Pan, Jishen Zhao, and Yufei Ding
    arXiv preprint arXiv:2601.21448, 2026
  3. ArXiv
    AMMA: A Multi-Chiplet Memory-Centric Architecture for Low-Latency 1M Context Attention Serving
    Zhongkai Yu, Haotian Ye, Chenyang Zhou, Ohm Rishabh Venkatachalam, Zaifeng Pan, Zhengding Hu, Junsung Kim, Won Woo Ro, Po-An Tsai, Shuyi Pei, Yangwook Kang, and Yufei Ding
    arXiv preprint arXiv:2604.26103, 2026
  4. ISCA’26
    Patterns behind Chaos: Forecasting Data Movement for Efficient Large-Scale MoE LLM Inference
    Zhongkai Yu, Yue Guan, Zihao Yu, Chenyang Zhou, Zhengding Hu, Shuyi Pei, Yangwook Kang, Yufei Ding, and Po-An Tsai
    arXiv preprint arXiv:2510.05497, 2026
    Best Paper Candidate
  5. ArXiv
    ScaleSim: Serving Large-Scale Multi-Agent Simulation with Invocation Distance-Based Memory Management
    Zaifeng Pan, Yipeng Shen, Zhengding Hu, Zhuang Wang, Aninda Manocha, Zheng Wang, Zhongkai Yu, Yue Guan, and Yufei Ding
    arXiv preprint arXiv:2601.21473, 2026
  6. ArXiv
    Pancake: Hierarchical Memory System for Multi-Agent LLM Serving
    Zhengding Hu, Zaifeng Pan, Prabhleen Kaur, Vibha Murthy, Zhongkai Yu, Yue Guan, Zhen Wang, Steven Swanson, and Yufei Ding
    arXiv preprint arXiv:2602.21477, 2026
  7. OpenReview
    TritonGym: A Benchmark for Agentic LLM Workflows in Triton GPU Code Generation
    Yue Guan, Yichen Lin, Xu Zhao, Jianzhu Yao, Xinwei Qiang, Zhongkai Yu, Pramod Viswanath, Yufei Ding, and Adnan Aziz
    OpenReview preprint, 2026
  8. ArXiv
    JigsawRL: Assembling RL Pipelines for Efficient LLM Post-Training
    Zhengding Hu, Hehua Ouyang, Chang Chen, Zaifeng Pan, Yue Guan, Zhongkai Yu, Zhen Wang, Steven Swanson, and Yufei Ding
    arXiv preprint arXiv:2604.23838, 2026

2025

  1. OSDI’25
    KPerfIR: Towards an Open and Compiler-centric Ecosystem for GPU Kernel Performance Tooling on Modern AI Workloads
    Yue Guan, Yuanwei Fang, Keren Zhou, Corbin Robeck, Manman Ren, Zhongkai Yu, Yufei Ding, and Adnan Aziz
    In 19th USENIX Symposium on Operating Systems Design and Implementation, 2025
  2. TCAD
    DomSim: Hardware-Aware Hybrid Fault Simulation with Dominator Tree-guided Partitioning
    Mingjun Wang, Hui Wang, Feng Gu, Zizhen Liu, Jianan Mu, Shengwen Liang, Zhongkai Yu, Zheng Liang, Jun Gao, Jiaping Tang, Jing Ye, Bei Yu, Xiaowei Li, and Huawei Li
    IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2025
  3. MLSys’25
    FastTree: Optimizing Attention Kernel and Runtime for Tree-Structured LLM Inference
    Zaifeng Pan, Yitong Ding, Yue Guan, Zheng Wang, Zhongkai Yu, Xulong Tang, Yida Wang, and Yufei Ding
    In Proceedings of Machine Learning and Systems, 2025

2024

  1. MICRO’24
    Cambricon-LLM: A Chiplet-Based Hybrid Architecture for On-Device Inference of 70B LLM
    Zhongkai Yu*, Shengwen Liang*, Tianyun Ma, Yunke Cai, Ziyuan Nan, Di Huang, Xinkai Song, Yifan Hao, Jie Zhang, Tian Zhi, Yongwei Zhao, Zidong Du, Xing Hu, Qi Guo, and Tianshi Chen
    In Proceedings of the 57th IEEE/ACM International Symposium on Microarchitecture, 2024
  2. TACO
    Environmental Condition Aware Super-Resolution Acceleration Framework in Server-Client Hierarchies
    Zhuoran Song*, Zhongkai Yu*, Xinkai Song, Yifan Hao, Li Jiang, Naifeng Jing, and Xiaoyao Liang
    ACM Transactions on Architecture and Code Optimization, 2024
  3. TCAD
    Harmonia: A Unified Architecture for Efficient Deep Symbolic Regression
    Tianyun Ma, Yuanbo Wen, Xinkai Song, Pengwei Jin, Di Huang, Husheng Han, Ziyuan Nan, Zhongkai Yu, Shaohui Peng, Yongwei Zhao, and others
    IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2024

2022

  1. DAC’22
    E2SR: An End-to-End Video Codec Assisted System for Super Resolution Acceleration
    Zhuoran Song, Zhongkai Yu, Naifeng Jing, and Xiaoyao Liang
    In Proceedings of the 59th ACM/IEEE Design Automation Conference, 2022