Publications

Publications by category in reverse chronological order. Generated by jekyll-scholar.

2024

  1. MoE-Infinity: Offloading-Efficient MoE Model Serving
    Leyang Xue, Yao Fu, Zhan Lu, Luo Mai, and Mahesh K. Marina
    2024
  2. ServerlessLLM: Low-Latency Serverless Inference for Large Language Models
    Yao Fu, Leyang Xue, Yeqi Huang, Andrei-Octavian Brabete, Dmitrii Ustiugov, Yuvraj Patel, and Luo Mai
    In OSDI, 2024

2022

  1. PAINT: Path Aware Iterative Network Tomography for Link Metric Inference
    Leyang Xue, Mahesh K. Marina, Geng Li, and Kai Zheng
    In ICNP, 2022