Publications

Publications by category in reverse chronological order. Generated by jekyll-scholar.

2025

  1. Towards Decentralized and Sustainable Foundation Model Training with the Edge
    Leyang Xue, Meghana Madhyastha, Randal Burns, Myungjin Lee, and Mahesh K. Marina
    SIGENERGY Energy Inform. Rev., 2025
  2. HybridServe: Efficient Serving of Large AI Models with Confidence-Based Cascade Routings
    Leyang Xue, Yao Fu, Luo Mai, and Mahesh Marina
    In ICDCS (In Conjunction Events), 2025
  3. TUBO: A Tailored ML Framework for Reliable Network Traffic Forecasting
    Zhihang Yuan, Leyang Xue, Waleed Ahsan, and Mahesh Marina
    In ICDCS (In Conjunction Events), 2025
  4. MoE-Gen: High-Throughput MoE Inference on a Single GPU with Module-Based Batching
    Tairan Xu, Leyang Xue, Zhan Lu, and Luo Mai
    2025
  5. MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems
    Yao Fu, Yinsicheng Jiang, Yeqi Huang, Ping Nie, Zhan Lu, Leyang Xue, Congjie He, Man-Kit Sit, and 5 more authors
    In NeurIPS Datasets & Benchmarks Track, 2025
  6. Towards Energy Efficient 5G vRAN Servers
    In NSDI, 2025

2024

  1. MoE-Infinity: Offloading-Efficient MoE Model Serving
    Leyang Xue, Yao Fu, Zhan Lu, Luo Mai, and Mahesh K. Marina
    2024
  2. ServerlessLLM: Low-Latency Serverless Inference for Large Language Models
    Yao Fu, Leyang Xue, Yeqi Huang, Andrei-Octavian Brabete, Dmitrii Ustiugov, Yuvraj Patel, and Luo Mai
    In OSDI, 2024

2022

  1. PAINT: Path Aware Iterative Network Tomography for Link Metric Inference
    Leyang Xue, Mahesh K. Marina, Geng Li, and Kai Zheng
    In ICNP, 2022