About me

Professional Experience

Leader, AGI Infra, Effective and Efficient General Foundation Model Reasoning Algorithm, Framework & System Team, JDT, 2025.07-Now
Staff Algorithm Expert, AGI NextEvo, AI Alignment Team, Ant Group, 2023‑2025.07
Staff Algorithm Expert, AI, (Multi-Agent) Reinforcement Learning and Dynamic Decision Intelligence Team, Ant Group, 2017‑2022
Algorithm Expert, Recommendation Algorithm Platform Team, Alibaba, 2014‑2016
Senior Algorithm R&D Engineer, Social Graph Recommendation Algorithm and Engine Team, Renren Applied Research Center and Tsinghua Joint Laboratory, 2011‑2014
R&D Research Intern, Social Search Mining Team, Baidu, 2011
Research Assistant, Video Media Algorithm Group, Future Network Center, City University of Hong Kong, 2008

Academy/Industry Sharing

Topic: Hummer: Towards Limited Competitive Preference Dataset, Conference: Conference on Language Modeling (COLM), poster, paper; ICML 2024 Workshop MHFAIA (Models of Human Feedback for AI Alignment), Oral, 2024.10
Topic: Digital Human Interactive Recommendation Decision‑Making Based on Reinforcement Learning, Conference: NeurIPS 2022 Workshop on Human in the Loop Learning Presentation, poster, demo, paper, 2022.12
Topic: Agent Decision Based on Reinforcement Learning: Research and Application of Decision Making in Dynamic Complex Context, Conference: AI Developer Day ‑ Decision Intelligence Workshop Live, video, lecture, The World Artificial Intelligence Conference (WAIC), 2022.09
Topic: Deep Reinforcement Learning in Intelligence Finance, Conference: Reinforcement Learning Track, The Pacific Rim International Conference on Artificial Intelligence (PRICAI), lecture, 2018.08

Professional Service

Program Committee Reviewer for the Main Track, COLM’26, NeurIPS’25, AAAI’22, AAAI’21
Represented Ant Group at Top Academic Conferences, COLM’24, AAAI’20, AAAI’19, ICML’18, ICML’17
Represented Alibaba and Renren at Top Academic Conferences, ACM SIGKDD, RecSys China, China Computer Federation ‘Subject Frontier Workshop’, China Database Technology Conference, 2011-2016

Professional Affiliations

China Computer Federation Conference on Artificial Intelligence (CCFAI) Multi-agent Systems Group, Member of the 12th/10th Organizing Committee, representative of enterprise members, 2025/2023
China Computer Federation (CCF) Computational Economics Professional Group, First batch of executive members, representative of enterprise members, 2022

Awards and Honors

Bailing MoE Training with Domestic XPU, Ant Group Technology Annual Award T-Star, 2025
The CEO Annual Team Special Contribution Award, Ant Group (Only two teams), 2021
The Company ‘SuperMa’ Award, Ant Group (Eight teams in total), 2020
Certification of Level 1 Trainer Training of American Management Association, Renren, 2013
Senior Excellent Training Instructor, Renren, 2013
Baidu Space Best Team Award, Baidu, 2011
The Second‑class Scholarship of China Aerospace Science and Technology Corporation (CASC) (Top 2%), School of Computer Science, Beihang University, 2009
Outstanding Graduates of Hunan Province’s General Colleges and Universities (Top 2%), 2007
The 7th Hunan Provincial Scholarship for Outstanding College Students in Special Poor (Top 0.2%), Hunan Provincial Government, 2006
Hunan Provincial Government First Prize Scholarship (Top 0.2%), Hunan Provincial Government, 2006

Paper publications

Effective and Efficient Embodied Foundation Model Reasoning

JDT: Thousand-GPU Large-Scale Training and Optimization Recipes for Embodied Intelligence, Technical Report (Camera Ready), AGI Infra, JDT, 202601

Effective and Efficient (Multi-)Agentic Reasoning

Understanding Agentic AI: Algorithms and Infrastructure, Submitted, AGI Infra, JDT, 202512
How Social is It? A Benchmark for LLMs’ Capabilities in Multi-user Multi-turn Social Agent Tasks, Yusen Wu, Junwu Xiong, Xiaotie Deng, paper, Under Review, 202502.
Zehang Deng, Yongjian Guo, Changzhou Han, Wanlun Ma, Junwu Xiong, Sheng Wen, Yang Xiang, AI Agents Under Threat: A Survey of Key Security Challenges and Future Pathways, ACM Computing Surveys, paper, 202501.
T Cui, Y Wang, C Fu, Y Xiao, S Li, X Deng, Y Liu, Q Zhang, Z Qiu, P Li, Z Tan, Junwu Xiong and others, Risk Taxonomy, Mitigation, and Assessment Benchmarks of Large Language Model Systems, paper, submitted to ACM Computing Surveys, 202401.
Chao Qu, Hui Li, Chang Liu, Junwu Xiong, James Zhang, Wei Chu, Weiqiang Wang, Yuan Qi, and Le Song. Variational Policy Propagation for Multi‑agent Reinforcement Learning, paper, 2020.
Chao Qu, Shie Mannor, Huan Xu, Yuan Qi, Le Song, and Junwu Xiong. Value propagation for decentralized networked deep multi‑agent reinforcement learning. Advances in Neural Information Processing Systems, paper, 2019.

Effective and Efficient Omni-Based RLXF/RL Foundation Model Reasoning

Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs, Ring Team, Technical Report, 202506
SHARP: Synthesizing High-quality Aligned Reasoning Problems for Large Reasoning Models Reinforcement Learning, Xiong Jun Wu, Zhenduo Zhang, ZuJie Wen, Zhiqiang Zhang, Wang Ren, Lei Shi, Cai Chen, Deng Zhao, Qing Wang, Xudong Han, Chengfu Tang, Dingnan Jin, Qing Cui, Jun Zhou, paper, NeurIPS, Under Review, 202505.
Yusen Wu, Li Jiang, Junwu Xiong, Jingqing Ruan, Yichuan Ding, Qingpei Guo, Zujie Wen, Jun Zhou, Xiaotie Deng, Hummer: Towards Limited Competitive Preference Dataset, paper, dataset, Conference on Language Modeling (COLM), ICML 2024 Workshop MHFAIA (Models of Human Feedback for AI Alignment), Oral, 2024.
Junwu Xiong, Xiaoyun Feng, YunZhou Shi, James Zhang, Zhongzhou Zhao, and Wei Zhou. Digital human interactive recommendation decision‑making based on reinforcement learning. NeurIPS 2022 Workshop on Human in the Loop Learning, poster, demo, paper, 2022.

Game Theory and Reinforcement Learning

Romain Lopez, Chenchen Li, Xiang Yan, Junwu Xiong, Michael Jordan, Yuan Qi, and Le Song. Cost‑effective incentive allocation via structured counterfactual inference. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 4997–5004, paper, 2020.
Chenchen Li, Xiang Yan, Xiaotie Deng, Yuan Qi, Wei Chu, Le Song, Junlong Qiao, Jianshan He, and Junwu Xiong. Latent Dirichlet Allocation for Internet Price War. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 639–646, paper, 2019.

Deep Reinforcement Learning

Xiaoyu Tan, Chao Qu, Junwu Xiong, James Zhang, Xihe Qiu, and Yaochu Jin. Model-Based Off-Policy Deep Reinforcement Learning With Model-Embedding. IEEE Transactions on Emerging Topics in Computational Intelligence, paper, 202403.
Chenchen Li, Xiang Yan, Xiaotie Deng, Yuan Qi, Wei Chu, Le Song, Junlong Qiao, Jianshan He, and Junwu Xiong. Reinforcement learning for uplift modeling, paper, 2018.

Deep Learning

Huiru Xiao, Caigao Jiang, Yangqiu Song, James Zhang, and Junwu Xiong. Unit ball model for embedding hierarchical structures in the complex hyperbolic space, paper, 2021.
Tong Yin, Xiaotie Deng, Yuan Qi, Wei Chu, Jing Pan, Xiang Yan, and Junwu Xiong. Personalized behavior prediction with encoder‑to‑decoder structure. In 2018 IEEE International Conference on Networking, Architecture and Storage (NAS), pages 1–10. IEEE, paper, 2018.

Scalable Wireless Sensor Networks

Huan Li, Yanlei Liu, Weifeng Chen, Weijia Jia, Bing Li, and Junwu Xiong. COCA: Constructing optimal clustering architecture to maximize sensor network lifetime. Computer Communications, 36(3):256–268, paper, 2013.
Huan Li, Jierui Cao, and Junwu Xiong. Constructing optimal clustering architecture for maximizing lifetime in large scale wireless sensor networks. In 2009 15th International Conference on Parallel and Distributed Systems, pages 182–189. IEEE, paper, 2009.

Innovation patents

Effective and Efficient Embodied Foundation Model Reasoning

An AI Infra Toolchain System for Cloud-Based Embodied AI Training, Inference, and Rendering, Xiong Junwu, Li Yihang, Wang Jianhui, Cao Xuelin, Hao Peng, Lu Lu, Gong Yicheng, Diao Xuefei, Cao Peng
A Retargeting Algorithm for Embodied Agents and Human Bodies, Xiong Junwu, Ma Yunxuan, Li Yihang, Gong Yicheng, Diao Xuefei, Cao Peng
A Fine-Grained FP8 Block Quantization Compression and Accelerated Inference Method for VLM/VLA Models, Xiong Junwu, Long Jing, Di Shuai, Gong Yicheng, Diao Xuefei, Cao Peng
An Intelligent Identification and Removal Method for Invalid Image Tokens in pi0 Model Training, Bai Xiaodong, Xiong Junwu, Zhou Chen, Gong Yicheng, Diao Xuefei, Cao Peng
An Adaptive Adjustment Method for Dynamic Padding Length in VLA Model Training, Bai Xiaodong, Xiong Junwu, Zhou Chen, Gong Yicheng, Diao Xuefei, Cao Peng
A Performance Optimization Method for Training and Inference of Embodied VLA Models Based on Data Packing and Variable-Length Attention, Di Shuai, Xu Wanting, Xiong Junwu, Gong Yicheng, Diao Xuefei, Cao Peng
An Efficient Storage Read/Write Implementation Scheme for Thousand-GPU Training of Embodied AI, Xiong Junwu, Guo Yongjian, Di Shuai, Guo Yucheng, Zhu Yihe, Cao Xuelin, Gong Yicheng, Diao Xuefei, Cao Peng
A System and Method for Unification and Automated Conversion of Multi-Source Robot Datasets Oriented to Embodied AI, Xiong Junwu, Xu Wanting, Huang Wen, Li Yihang, Gong Yicheng, Diao Xuefei, Cao Peng
A Collaborative Accelerated Query Method for Deduplication and Vectorization of Billion-Scale Samples for Embodied AI Training, Di Shuai, Guan Zhong, Xiong Junwu, Gong Yicheng, Diao Xuefei, Cao Peng
An Implementation Method of Multi-Process Parallel Preprocessing Pipeline for Embodied Model Training, Xu Wanting, Sun Haoran, Xiong Junwu, Gong Yicheng, Diao Xuefei, Cao Peng

Effective and Efficient Omni-Based RLXF/RL Foundation Model Reasoning

A Multi-Threaded LLM Data Distillation Tool, Di Shuai, Wu Zhifeng, Xiong Junwu, Guo Yongjian, Tian Zhen, Li Mingyang, Hao Peng, Lin Shunli, Lu Lu, Guo Fangyu, Teng Fang, Wei Wei, Zhang Lianshuai, Shen Yuede, Chen Pengtao, Liu Zhaomin, Wang Shuaiting, Song Bowen, Hao Yanxia, Han Feng, Oct. 14, 2025, Under Review.
Junwu Xiong, Zhenduo Zhang, Zujie Wen, Ziqiang Zhang, Jun Zhang, A self-alignment strategy for large general reasoning model, Application No. CXANT4683202, April 11, 2025, Under Review.
Junwu Xiong, Zujie Wen, Xinyu Kong, Jian Yan, A hierarchical ablation optimization scheme and device for improving large reasoning model based on PPL score, Application No. CXANT4198343, Dec. 27, 2024, Under Review.
Junwu Xiong, Xiaoyu Tan, Hairui Xu, James Zhang, Wei Chu, Yunzhou Shi, Zhongzhou Zhao, Wei Zhou, Xiaolong Li, Digital avatar recommendation method and recommendation system, US Patent, US20240177216A1, App. 18/516,730, 2024.
Junwu Xiong, Xiaoyu Tan, Hairui Xu, James Zhang, Wei Chu, Yunzhou Shi, Zhongzhou Zhao, Wei Zhou, and Xiaolong Li, Interactive recommendation decision‑making of digital Avatar based on reinforcement learning, Application No. CXTA103673, Sep. 2022, Review completed.

Reinforcement Preference Learning

Hairui Xu, Hong Tang, Jingxin Mao, Manhuo Hong, Xiaoyu Tan, Caigao Jiang, Wenpeng Zhang, James Zhang, Chao Qu, Junwu Xiong, and Wei Chu, Model Switch of Attention Capital Intelligent Pricing Algorithm, Application No. CXTA66183, Nov. 2020, Review completed.
Xiaoyu Tan, Chao Qu, Caigao Jiang, Hairui Xu, Junwu Xiong, and James Zhang, A method and system for training a recommendation model, May 2020, Publication No. CN111311384A.
Chenchen Li, Xiang Yan, Junlong Qiao, Chao Qu, Junwu Xiong, and Le Song, A method and device for selecting target users, Nov. 2019, Publication No. CN111027676A.
Junwu Xiong, Zhongyi Liu, and Wei Hu, Recommendation method and device, US Patent, US 10,489,471 B2, Nov. 26, 2019.
Xiaoyu Tan, Caigao Jiang, James Zhang, Chao Qu, and Junwu Xiong, Application of quadratic programming algorithm combined with upper bound of the confidence interval in rebate rate pricing, Application No. 101429255, Nov. 2019, Review completed.
Tong Yin, Jing Pan, and Junwu Xiong, An event prediction method and device, Dec. 2018, Publication No. CN110020882A.
Zhiguo Fan, Junwu Xiong, Guowei Zhang, and Zhongyi Liu, A method and device for sorting commodity objects based on dynamic sliding time window, Nov. 2016, Publication No. CN108090794B.

Thesis

Master’s Thesis: Research of Energy Efficiency Optimization Strategy with Data Aggregation for Wireless Sensor Networks, lecture, demo, paper
Undergraduate Thesis: Computation and Performance Evaluation of Bidirectional Associative Memory Neural Networks Based on Time-Delay Differential Equations, lecture, paper

Xiong Jun Wu
