Jue Wang

Hello, I am currently a senior staff researcher at Together AI, working closely with Prof. Ce Zhang. Before that, I received my Ph.D. from Zhejiang University, advised by Prof. Lidan Shou.

My recent research mainly focuses on efficient and cost-effective algorithms and systems for LLMs:

Efficient Inference for Language Models
- Ladder Residual (ICML25), FloE (ICML25), Self-Spec (ACL24), Compress & Prompt (ICML24), Deja Vu (ICML23, Oral), SkipBERT (ACL22)
Efficient Training Systems at Scale
- LoRAM (ICLR25), CocktailSGD (ICML23), AQ-SGD (NeurIPS22)
Cost-Effective Algorithms for Enhancing LLMs
- MoAA (ICML25), Mixture-of-Agents (ICLR25, Spotlight), Scaling-context (ICLR25), Skill-it! (NeurIPS23, Spotlight)

Updates

Mar 2026: Congrats to the team on the TorchSpec release – excited to see speculative decoding training scaled up effectively!
Jan 2026: We got three papers accepted to MLSys and one paper accepted to ICLR 2026! Congratulations to the collaborators!
Dec 2025: I’m happy to serve as the HPCA 2026 Artifact Evaluation Chair. Have fun in Sydney!
May 2025: We got three papers accepted to ICML 2025! Congratulations to the collaborators!
Jan 2025: We got three papers accepted to ICLR 2025! Congratulations to the collaborators!
Jun 2024: Check out Together MoA! Achieving SoTA results with open-source models only.
May 2024: We had a paper accepted to ACL 2024. Congratulations to the collaborators!
May 2024: We had a paper accepted to ICML 2024. Congratulations to the collaborators!
Sep 2023: We had a paper accepted to NeurIPS.
Aug 2023: LLaMA-7B-32K and LLaMA-7B-32K-Instruct have been released.
Jun 2023: RedPajama-7B-v1 has been released.
Apr 2023: We got two papers accepted to ICML 2023!
Mar 2023: OpenChatKit has been released, cheers!
Nov 2022: Check out our demo of GPT-JT!
Nov 2022: We had a paper accepted to AAAI 2023. Congratulations to the collaborators!
Nov 2022: Check out our benchmark on LLMs!
Sep 2022: We had a paper accepted to NeurIPS 2022. Congratulations and thanks to all the collaborators!
Apr 2022: We got a paper accepted to IJCAI 2022.
Mar 2022: I had a visit to ETH Zurich.
Feb 2022: As the first author, I had a paper accepted to ACL 2022.
Jun 2021: I graduated from CentraleSupélec with a diplôme d’Ingénieur (master's degree), cheers!
Dec 2020: As the first author, I had a paper accepted to AAAI 2021.
Sep 2020: As the first author, I had a paper accepted to EMNLP 2020.
Apr 2020: As the first author, I had a paper accepted to ACL 2020.

Work Experience

Together AI, Senior Staff Researcher, May 2025 - Present
Together AI, Staff Researcher, Jul 2023 - May 2025
Rokid, Research Intern, Jun 2018 - Sep 2018

Education

Zhejiang University, PhD in Computer Science, Sep 2018 - Jun 2023
ETH Zurich, Academic Guest, Mar 2021 - Sep 2021
Université Paris Saclay (CentraleSupélec), Master (Engineer) in General Engineering, Sep 2016 - Jun 2018
Zhejiang University, Bachelor in Electrical Engineering, Sep 2014 - Jun 2018

Publications

Ladder Residual: Redefining Tensor Parallelism in Transformers for Accelerated Inference
Muru Zhang, Mayank Mishra, Zhongzhu Zhou, William Brandon, Jue Wang, Yoon Kim, Jonathan Ragan-Kelley, Shuaiwen Leon Song, Ben Athiwaratkun, Tri Dao
Accepted to ICML 2025.
[Paper]
FloE: On-the-Fly MoE Inference on Memory-constrained GPU
Yuxin Zhou, Zheng Li, Jun Zhang, Jue Wang, Yiping Wang, Zhongle Xie, Ke Chen, Lidan Shou
Accepted to ICML 2025.
Improving Model Alignment Through Collective Intelligence of Open-Source Models
Junlin Wang, Roy Xie, Shang Zhu, Jue Wang, Ben Athiwaratkun, Bhuwan Dhingra, Shuaiwen Leon Song, Ce Zhang, James Zou
Accepted to ICML 2025.
Mixture-of-Agents Enhances Large Language Model Capabilities
Junlin Wang, Jue Wang, Ben Athiwaratkun, Ce Zhang, James Zou
Accepted to ICLR 2025.
[Paper] [Code]
Scaling Instruction-tuned LLMs to Million-token Contexts via Hierarchical Synthetic Data Generation
Linda He, Jue Wang, Maurice Weber, Shang Zhu, Ben Athiwaratkun, Ce Zhang
Accepted to ICLR 2025.
Train Small, Infer Large: Memory-Efficient LoRA Training for Large Language Models
Jun Zhang, Jue Wang, Huan Li, Lidan Shou, Ke Chen, Yang You, Guiming Xie, Xuejian Gong, Kunlong Zhou
Accepted to ICLR 2025.
Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding
Jun Zhang, Jue Wang, Huan Li, Lidan Shou, Ke Chen, Gang Chen, Sharad Mehrotra
In Proc. of ACL 2024.
[Paper] [Code]
Compress, Then Prompt: Improving Accuracy-Efficiency Trade-off of LLM Inference with Transferable Prompt
Zhaozhuo Xu, Zirui Liu, Beidi Chen, Yuxin Tang, Jue Wang, Kaixiong Zhou, Xia Hu, Anshumali Shrivastava
In Proc. of ICML 2024.
[Paper] [Code]
Skill-it! A Data-Driven Skills Framework for Understanding and Training Language Models
Mayee F. Chen, Nicholas Roberts, Kush Bhatia, Jue Wang, Ce Zhang, Frederic Sala, Christopher Ré
In Proc. of NeurIPS 2023.
[Paper]
CocktailSGD: Fine-tuning Foundation Models over 500Mbps Networks
Jue Wang$^{*}$, Yucheng Lu$^{*}$, Binhang Yuan, Beidi Chen, Percy Liang, Christopher De Sa, Christopher Ré, Ce Zhang.
In Proc. of ICML 2023.
[Paper] [Code]
Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time
Zichang Liu, Jue Wang, Tri Dao, Tianyi Zhou, Binhang Yuan, Zhao Song, Anshumali Shrivastava, Ce Zhang, Yuandong Tian, Christopher Ré, Beidi Chen.
In Proc. of ICML 2023.
[Paper] [Code]
Holistic Evaluation of Language Models
Percy Liang, Rishi Bommasani, Tony Lee, Dimitris Tsipras, Dilara Soylu, Michihiro Yasunaga, Yian Zhang, Deepak Narayanan, Yuhuai Wu, Ananya Kumar, Benjamin Newman, Binhang Yuan, Bobby Yan, Ce Zhang, Christian Cosgrove, Christopher D. Manning, Christopher Ré, Diana Acosta-Navas, Drew A. Hudson, Eric Zelikman, Esin Durmus, Faisal Ladhak, Frieda Rong, Hongyu Ren, Huaxiu Yao, Jue Wang, Keshav Santhanam, Laurel Orr, Lucia Zheng, Mert Yuksekgonul, Mirac Suzgun, Nathan Kim, Neel Guha, Niladri Chatterji, Omar Khattab, Peter Henderson, Qian Huang, Ryan Chi, Sang Michael Xie, Shibani Santurkar, Surya Ganguli, Tatsunori Hashimoto, Thomas Icard, Tianyi Zhang, Vishrav Chaudhary, William Wang, Xuechen Li, Yifan Mai, Yuhui Zhang, Yuta Koreeda.
TMLR.
[Paper] [Code]
Effective Continual Learning for Text Classification with Lightweight Snapshots
Jue Wang$^{*}$, Dajie Dong$^{*}$, Lidan Shou, Ke Chen, Gang Chen
In Proc. of AAAI 2023.
[Paper]
Fine-tuning Language Models over Slow Networks using Activation Compression with Guarantees
Jue Wang$^{*}$, Binhang Yuan$^{*}$, Luka Rimanic$^{*}$, Yongjun He, Tri Dao, Beidi Chen, Christopher Ré, Ce Zhang.
In Proc. of NeurIPS 2022.
[Paper] [Code]
SkipBERT: Efficient Inference with Shallow Layer Skipping
Jue Wang, Ke Chen, Gang Chen, Lidan Shou, and Julian McAuley.
In Proc. of ACL 2022.
[Paper] [Code]
Continual Federated Learning Based on Knowledge Distillation
In Proc. of IJCAI 2022.
[Paper]
Effective Slot Filling via Weakly-Supervised Dual-Model Learning
Jue Wang, Ke Chen, Lidan Shou, Sai Wu, and Gang Chen.
In Proc. of AAAI 2021.
[Paper] [Code] [Video]
Two are Better than One: Joint Entity and Relation Extraction with Table-Sequence Encoders
Jue Wang and Lu Wei.
In Proc. of EMNLP 2020.
[Paper] [Code] [Video]
Pyramid: A Layered Model for Nested Named Entity Recognition
Jue Wang, Lidan Shou, Ke Chen, and Gang Chen.
In Proc. of ACL 2020.
[Paper] [Code] [Video]

Contact

251 Rhode Island St,

Together AI, San Francisco, CA 94103

Email: [email protected]

Jue Wang

https://juewang.me/about/index.html

Author

Jue Wang

Posted on

2026-03-20

Updated on

2026-03-28