About me

Professional Experience

  • Staff Algorithm Expert, AGI NextEvo, AI Alignment Team, Ant Group, 2023-Present
  • Staff Algorithm Expert, (Multi-Agent) Reinforcement Learning and Dynamic Decision Intelligence Team, Ant Group, 2017-2022
  • Algorithm Expert, Recommendation Algorithm Platform Team, Alibaba, 2014-2016
  • Senior Algorithm R&D Engineer, Social Graph Recommendation Algorithm and Engine Team, Renren Applied Research Center and Tsinghua Joint Laboratory, 2011-2014
  • R&D Research Intern, Social Search Mining Team, Baidu, 2011
  • Research Assistant, Video Media Algorithm Group, Future Network Center, City University of Hong Kong, 2008

Academia/Industry Sharing

Professional Service

  • Represented Ant Group at Top Academic Conferences, COLM’24, AAAI’20, AAAI’19, ICML’18, ICML’17
  • Program Committee Member for the Main Track, AAAI’22, AAAI’21
  • Represented Alibaba and Renren at Top Academic Conferences, ACM SIGKDD, RecSys China, China Computer Federation ‘Subject Frontier Workshop’, China Database Technology Conference, 2011-2016

Professional Affiliations

Awards and Honors

  • The CEO Annual Team Special Contribution Award, Ant Group (one of only two teams), 2021
  • The Company ‘SuperMa’ Award, Ant Group (one of eight teams), 2020
  • Level 1 Trainer Training Certification, American Management Association, Renren, 2013
  • Senior Excellent Training Instructor, Renren, 2013
  • Baidu Space Best Team Award, Baidu, 2011
  • The Second-Class Scholarship of China Aerospace Science and Technology Corporation (CASC) (Top 2%), School of Computer Science, Beihang University, 2009
  • Outstanding Graduate of Hunan Province’s General Colleges and Universities (Top 2%), 2007
  • The 7th Hunan Provincial Scholarship for Outstanding College Students from Families in Special Financial Hardship (Top 0.2%), Hunan Provincial Government, 2006
  • Hunan Provincial Government First Prize Scholarship (Top 0.2%), Hunan Provincial Government, 2006

Paper Publications

Scalable RLXF/RL/Agentic Alignment

  • Zehang Deng, Yongjian Guo, Changzhou Han, Wanlun Ma, Junwu Xiong, Sheng Wen, and Yang Xiang, AI Agents Under Threat: A Survey of Key Security Challenges and Future Pathways, ACM Computing Surveys, paper, Jan. 2025.
  • Yusen Wu, Li Jiang, Junwu Xiong, Jingqing Ruan, Yichuan Ding, Qingpei Guo, Zujie Wen, Jun Zhou, and Xiaotie Deng, Hummer: Towards Limited Competitive Preference Dataset, paper, dataset, Conference on Language Modeling (COLM), ICML 2024 Workshop MHFAIA (Models of Human Feedback for AI Alignment), Oral, 2024.
  • T. Cui, Y. Wang, C. Fu, Y. Xiao, S. Li, X. Deng, Y. Liu, Q. Zhang, Z. Qiu, P. Li, Z. Tan, Junwu Xiong, and others, Risk Taxonomy, Mitigation, and Assessment Benchmarks of Large Language Model Systems, paper, Jan. 2024. Submitted to ACM Computing Surveys.

Multi-modal Reinforcement Learning

  • Junwu Xiong, Xiaoyun Feng, Yunzhou Shi, James Zhang, Zhongzhou Zhao, and Wei Zhou. Digital human interactive recommendation decision-making based on reinforcement learning. NeurIPS 2022 Workshop on Human in the Loop Learning, poster, demo, paper, 2022.

Multi-agent Reinforcement Learning

  • Chao Qu, Hui Li, Chang Liu, Junwu Xiong, James Zhang, Wei Chu, Weiqiang Wang, Yuan Qi, and Le Song. Variational Policy Propagation for Multi-agent Reinforcement Learning, paper, 2020.
  • Chao Qu, Shie Mannor, Huan Xu, Yuan Qi, Le Song, and Junwu Xiong. Value propagation for decentralized networked deep multi-agent reinforcement learning. Advances in Neural Information Processing Systems, paper, 2019.

Game Theory and Reinforcement Learning

  • Romain Lopez, Chenchen Li, Xiang Yan, Junwu Xiong, Michael Jordan, Yuan Qi, and Le Song. Cost-effective incentive allocation via structured counterfactual inference. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 4997–5004, paper, 2020.
  • Chenchen Li, Xiang Yan, Xiaotie Deng, Yuan Qi, Wei Chu, Le Song, Junlong Qiao, Jianshan He, and Junwu Xiong. Latent Dirichlet Allocation for Internet Price War. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 639–646, paper, 2019.

Deep Reinforcement Learning

  • Xiaoyu Tan, Chao Qu, Junwu Xiong, James Zhang, Xihe Qiu, and Yaochu Jin. Model-Based Off-Policy Deep Reinforcement Learning With Model-Embedding. IEEE Transactions on Emerging Topics in Computational Intelligence, paper, Mar. 2024.
  • Chenchen Li, Xiang Yan, Xiaotie Deng, Yuan Qi, Wei Chu, Le Song, Junlong Qiao, Jianshan He, and Junwu Xiong. Reinforcement learning for uplift modeling, paper, 2018.

Deep Learning

  • Huiru Xiao, Caigao Jiang, Yangqiu Song, James Zhang, and Junwu Xiong. Unit ball model for embedding hierarchical structures in the complex hyperbolic space, paper, 2021.
  • Tong Yin, Xiaotie Deng, Yuan Qi, Wei Chu, Jing Pan, Xiang Yan, and Junwu Xiong. Personalized behavior prediction with encoder-to-decoder structure. In 2018 IEEE International Conference on Networking, Architecture and Storage (NAS), pages 1–10. IEEE, paper, 2018.

Scalable Wireless Sensor Networks

  • Huan Li, Yanlei Liu, Weifeng Chen, Weijia Jia, Bing Li, and Junwu Xiong. COCA: Constructing optimal clustering architecture to maximize sensor network lifetime. Computer Communications, 36(3):256–268, paper, 2013.
  • Huan Li, Jierui Cao, and Junwu Xiong. Constructing optimal clustering architecture for maximizing lifetime in large scale wireless sensor networks. In 2009 15th International Conference on Parallel and Distributed Systems, pages 182–189. IEEE, paper, 2009.

Innovation Patents

Scalable RL x LLM Reasoning

  • Junwu Xiong, Zujie Wen, Xinyu Kong, and Jian Yan, A hierarchical ablation optimization scheme and device for improving pre-trained large language model based on PPL score, Application No. CXANT4198343, Dec. 27, 2024, Under Review.

Multi-modal Reinforcement Learning

  • Junwu Xiong, Xiaoyu Tan, Hairui Xu, James Zhang, Wei Chu, Yunzhou Shi, Zhongzhou Zhao, Wei Zhou, and Xiaolong Li, Digital avatar recommendation method and recommendation system, US Patent Application Publication US20240177216A1, App. No. 18/516,730, 2024.
  • Junwu Xiong, Xiaoyu Tan, Hairui Xu, James Zhang, Wei Chu, Yunzhou Shi, Zhongzhou Zhao, Wei Zhou, and Xiaolong Li, Interactive recommendation decision-making of digital avatar based on reinforcement learning, Application No. CXTA103673, Sep. 2022, Review completed.

Reinforcement Preference Learning

  • Hairui Xu, Hong Tang, Jingxin Mao, Manhuo Hong, Xiaoyu Tan, Caigao Jiang, Wenpeng Zhang, James Zhang, Chao Qu, Junwu Xiong, and Wei Chu, Model Switch of Attention Capital Intelligent Pricing Algorithm, Application No. CXTA66183, Nov. 2020, Review completed.
  • Xiaoyu Tan, Chao Qu, Caigao Jiang, Hairui Xu, Junwu Xiong, and James Zhang, A method and system for training a recommendation model, May 2020, Publication No. CN111311384A.
  • Chenchen Li, Xiang Yan, Junlong Qiao, Chao Qu, Junwu Xiong, and Le Song, A method and device for selecting target users, Nov. 2019, Publication No. CN111027676A.
  • Junwu Xiong, Zhongyi Liu, and Wei Hu, Recommendation method and device, Nov. 26, 2019, US Patent US 10,489,471 B2.
  • Xiaoyu Tan, Caigao Jiang, James Zhang, Chao Qu, and Junwu Xiong, Application of quadratic programming algorithm combined with upper bound of the confidence interval in rebate rate pricing, Application No. 101429255, Nov. 2019, Review completed.
  • Tong Yin, Jing Pan, and Junwu Xiong, An event prediction method and device, Dec. 2018, Publication No. CN110020882A.
  • Zhiguo Fan, Junwu Xiong, Guowei Zhang, and Zhongyi Liu, A method and device for sorting commodity objects based on dynamic sliding time window, Nov. 2016, Publication No. CN108090794B.

Thesis

  • Master’s Thesis: Research on Energy Efficiency Optimization Strategy with Data Aggregation for Wireless Sensor Networks, lecture, demo, paper
  • Undergraduate Thesis: Computation and Performance Evaluation of Bidirectional Associative Memory Neural Networks Based on Time-Delay Differential Equations, lecture, paper