Tianxiang Sun (孙天祥)

I am the founder & CEO of Analemma and an assistant professor at SII. I received my Ph.D. in Computer Science and Technology from Fudan University in 2024, where I was advised by Xipeng Qiu and Xuanjing Huang. I previously interned at Shanghai AI Laboratory (2023), Alibaba DAMO Academy (2022), and Amazon Shanghai AI Lab (2019-2020).

My research focuses on using post-training, especially reinforcement learning, to improve pre-trained large language models across a range of scenarios. Reach out to me by email: txsun1997@gmail.com.

Google Scholar / GitHub / Twitter / OpenMOSS

News

[Apr. 2025] I joined SII as an assistant professor.
[Mar. 2025] We founded Analemma, an LLM startup!
[Mar. 2024] Excited to announce OpenMOSS!
[May 2023] Four papers accepted to ACL 2023!
[Feb. 2023] We are excited to release MOSS, a conversational language model.
[Oct. 2022] Three papers accepted to EMNLP 2022!
[Aug. 2022] I gave a talk on LMaaS and black-box tuning at AI Time.
[Aug. 2022] I am co-organizing a PLM-tuning competition (total prize pool of 1 million RMB) with Zhengfu He. Everyone is welcome to participate!
[July 2022] I gave a talk on derivative-free optimization for pre-trained language models at MLNLP. Slides here.
[July 2022] We have released a paper list on Language-Model-as-a-Service (LMaaS). Feel free to submit pull requests!
[May 2022] One paper accepted to ICML 2022 (21.9% acceptance rate)!

[Apr. 2022] Our paradigm shift survey is accepted to Machine Intelligence Research as an invited paper.

[Apr. 2022] One paper accepted to NAACL 2022!

[Feb. 2022] One paper accepted to ACL 2022 (Findings, 31.4% acceptance rate)!

[Oct. 2021] I gave a talk on efficient NLP at BAAI Big Model Meetup.

[Oct. 2021] I gave a talk on paradigm shift in NLP at BAAI Qinyuan LIVE.

[Sep. 2021] We have released a paper list on early exiting. Feel free to submit pull requests!

[Sep. 2021] I gave a talk on CoLAKE at SFFAI.

[May 2021] One paper accepted to ACL 2021 (21.2% acceptance rate)!

[Mar. 2021] One paper accepted to NAACL 2021 (26% acceptance rate)!

[Dec. 2020] I gave a talk on CoLAKE at CSSNLP.

[Sep. 2020] One paper accepted to COLING 2020 (32.9% acceptance rate)!

[May 2020] Our PTM survey is accepted to SCIENCE CHINA Technological Sciences as an invited paper.

[Nov. 2019] I gave a talk on entity linking at Amazon Shanghai AI Lab. Slides can be downloaded here.

[Nov. 2019] My first paper is accepted to AAAI 2020 as an oral presentation (4.5% acceptance rate for oral presentations)!

[Oct. 2019] I joined Amazon Shanghai AI Lab as a research intern, supervised by Zheng Zhang.

[Sep. 2019] I joined the NLP Lab at Fudan University as a Ph.D. student.

[Jun. 2019] I received my B.Eng. from the School of Computer Science and Technology at Xidian University (GPA: 3.8/4.0, top 0.5%).

Highlighted Papers

The full list of papers can be found on Google Scholar / Semantic Scholar / DBLP / ORCID.

(*: Equal contribution)

Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance
Jiasheng Ye, Peiju Liu, Tianxiang Sun, Yunhua Zhou, Jun Zhan, Xipeng Qiu
ICLR, 2025
pdf / blog on OpenMOSS
We discover that model performance is quantitatively predictable from the data mixture proportions via functional forms, which we refer to as data mixing laws. Fitting such functions on sample mixtures predicts model performance on unseen mixtures before actual runs, thus guiding the selection of an ideal data mixture.

Dictionary Learning Improves Patch-Free Circuit Discovery in Mechanistic Interpretability: A Case Study on Othello-GPT
Zhengfu He, Xuyang Ge, Qiong Tang, Tianxiang Sun, Qinyuan Cheng, Xipeng Qiu
arXiv:2402.12201
pdf / blog on OpenMOSS
Sparse dictionary learning is a rapidly growing technique in mechanistic interpretability for tackling superposition and extracting more human-understandable features from model activations. Building on these more monosemantic features, we ask a further question: how can we identify the circuits connecting the enormous number of dictionary features? We propose a circuit discovery framework that serves as an alternative to activation patching.

Can AI Assistants Know What They Don't Know?
Qinyuan Cheng, Tianxiang Sun*, Xiangyang Liu, Wenwei Zhang, Zhangyue Yin, Shimin Li, Linyang Li, Zhengfu He, Kai Chen, Xipeng Qiu
ICML, 2024
pdf / code / blog on OpenMOSS
We ask whether AI assistants can know what they don't know and express this through natural language. To answer this question, we construct a model-specific "I don't know" (Idk) dataset for an assistant from existing open-domain question answering datasets; it contains the questions the assistant knows and does not know the answers to. We then align the assistant with its corresponding Idk dataset and observe whether, after alignment, it can refuse to answer the questions it does not know.

Black-Box Tuning for Language-Model-as-a-Service
Tianxiang Sun, Yunfan Shao, Hong Qian, Xuanjing Huang, Xipeng Qiu
ICML, 2022 (Spotlight)
pdf / code / slides
We propose a promising and practical scenario, Language-Model-as-a-Service (LMaaS), in which users cannot access model parameters or gradients and can only access the language model's output probabilities. For this scenario, we propose black-box tuning, which optimizes continuous prompts via derivative-free optimization.

Paradigm Shift in Natural Language Processing
Tianxiang Sun, Xiangyang Liu, Xipeng Qiu, Xuanjing Huang
Machine Intelligence Research, 2022 (Invited Paper)
pdf / project / slides
Recent years have witnessed a trend of paradigm shift across a variety of NLP tasks: a task originally solved with one paradigm (e.g., sequence labeling) is instead solved with another (e.g., machine reading comprehension).

Towards Efficient NLP: A Standard Evaluation and A Strong Baseline
Xiangyang Liu, Tianxiang Sun*, Junliang He, Jiawen Wu, Lingling Wu, Xinyu Zhang, Hao Jiang, Zhao Cao, Xuanjing Huang, Xipeng Qiu
NAACL, 2022 (Oral Presentation)
pdf / code / benchmark / slides
We propose ELUE (Efficient Language Understanding Evaluation), a benchmark for efficient NLP models, together with ElasticBERT, a strong pre-trained model that serves as a baseline and backbone.

CoLAKE: Contextualized Language and Knowledge Embedding
Tianxiang Sun, Yunfan Shao, Xipeng Qiu, Qipeng Guo, Yaru Hu, Xuanjing Huang, Zheng Zhang
COLING, 2020
pdf / code / slides
We pre-train a model called CoLAKE to jointly learn language and knowledge representations by unifying language and knowledge into word-knowledge graphs.

Pre-trained Models for Natural Language Processing: A Survey
Xipeng Qiu, Tianxiang Sun, Yige Xu, Yunfan Shao, Ning Dai, Xuanjing Huang
SCIENCE CHINA Technological Sciences, 2020 (Invited Paper, Most Influential Paper of SCTS in 2020)
pdf
We provide a comprehensive survey of pre-trained models (PTMs) for NLP, ranging from non-contextual word embeddings to state-of-the-art language models. The survey is intended as a hands-on guide for understanding, using, and developing PTMs for various NLP tasks.

Learning Sparse Sharing Architectures for Multiple Tasks
Tianxiang Sun, Yunfan Shao, Xiaonan Li, Pengfei Liu, Hang Yan, Xipeng Qiu, Xuanjing Huang
AAAI, 2020 (Oral Presentation)
pdf / code / slides
We propose sparse sharing, a new parameter-sharing mechanism for multi-task learning that allocates a subnet to each task based on the lottery ticket hypothesis. Sparse sharing successfully avoids negative transfer between tasks.

Projects & Resources

MOSS: A Conversational Language Model
project led by Tianxiang Sun

MOSS is a conversational language model like ChatGPT. It can follow users' instructions to perform various natural language tasks, including question answering, text generation, text summarization, and code generation. MOSS can also challenge incorrect premises and reject inappropriate requests. Here is a brief introduction to MOSS.

Paper List on Language-Model-as-a-Service (LMaaS)
maintained by Tianxiang Sun

Pre-trained large language models (LLMs) such as GPT-3 are usually released as a service rather than by open-sourcing their model weights. We call this scenario "Language-Model-as-a-Service (LMaaS)", in which users access powerful LLMs through inference APIs. We maintain a curated list of papers that fit this scenario.

Awards

Outstanding Graduate of Shanghai (2024)
ByteDance Scholarship (13 winners in China, 2023)
National Scholarship (Ministry of Education, China, 2023)
WAIC Yunfan Award - Rising Star (15 winners across the world, 2023)
Fudan Academic Star (10 winners across STEM graduate schools, 2023)
Most Influential Paper Award of Sci. China Tech Sci. (2022)
National Scholarship (Ministry of Education, China, 2020)
Outstanding Graduate (associated with Xidian University, 2019)
First Prize in China High School Biology Olympiad (2014)

Service

Student Seminar Co-Chair

CCL 2023

Reviewer / Program Committee Member

ACL (2021, 2022, 2023)
EMNLP (2021, 2022, 2023)
COLING (2020, 2022)
ICML (2022)
ICLR (2023)
NeurIPS (2022, 2023)
AAAI (2021)
IJCAI (2021)

Design and source code from Jon Barron's website