News
- [Mar. 2024] Excited to announce OpenMOSS!
- [May 2023] Four papers accepted to ACL 2023!
- [Feb. 2023] We are excited to release MOSS, a conversational language model.
- [Oct. 2022] Three papers accepted to EMNLP 2022!
- [Aug. 2022] I gave a talk on LMaaS and black-box tuning at AI Time.
- [Aug. 2022] I am co-organizing a PLM-tuning competition (total prize of 1 million RMB) with Zhengfu He. Welcome to participate!
- [July 2022] I gave a talk on derivative-free optimization for pre-trained language models at MLNLP. Slides here.
- [July 2022] We have released a paper list on Language-Model-as-a-Service (LMaaS). Feel free to submit pull requests!
- [May 2022] One paper accepted to ICML 2022 (21.9% acceptance rate)!
- [Apr. 2022] Our paradigm shift survey is accepted to Machine Intelligence Research as an invited paper.
- [Apr. 2022] One paper accepted to NAACL 2022!
- [Feb. 2022] One paper accepted to ACL 2022 (Findings, 31.4% acceptance rate)!
- [Oct. 2021] I gave a talk on efficient NLP at BAAI Big Model Meetup.
- [Oct. 2021] I gave a talk on paradigm shift in NLP at BAAI Qinyuan LIVE.
- [Sep. 2021] We have released a paper list on early exiting. Feel free to submit pull requests!
- [Sep. 2021] I gave a talk on CoLAKE at SFFAI.
- [May 2021] One paper accepted to ACL 2021 (21.2% acceptance rate)!
- [Mar. 2021] One paper accepted to NAACL 2021 (26% acceptance rate)!
- [Dec. 2020] I gave a talk on CoLAKE at CSSNLP.
- [Sep. 2020] One paper accepted to COLING 2020 (32.9% acceptance rate)!
- [May 2020] Our PTM survey is accepted to SCIENCE CHINA Technological Sciences as an invited paper.
- [Nov. 2019] I gave a talk on entity linking at Amazon Shanghai AI Lab. Slides can be downloaded here.
- [Nov. 2019] My first paper is accepted to AAAI 2020 for oral presentation (4.5% oral presentation acceptance rate)!
- [Oct. 2019] I joined Amazon Shanghai AI Lab as a research intern, supervised by Zheng Zhang.
- [Sep. 2019] I joined the NLP Lab at Fudan University as a Ph.D. student.
- [Jun. 2019] I received my B.Eng. from the School of Computer Science and Technology at Xidian University. GPA: 3.8/4.0 (top 0.5%).
Publications
Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance
Jiasheng Ye, Peiju Liu, Tianxiang Sun, Yunhua Zhou, Jun Zhan, Xipeng Qiu
arXiv, 2403.16952  
pdf / blog on OpenMOSS
We discover that language modeling performance is quantitatively predictable from the mixture proportions of training data via certain functional forms, which we refer to as the data mixing laws. Fitting such functions on sampled mixtures reveals model performance on unseen mixtures before actual training runs, thus guiding the selection of an ideal data mixture.
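For intuition, here is a minimal sketch of the fit-then-predict idea: sample a few mixtures, fit a parametric form to the observed validation losses, and extrapolate to an unseen mixture. The exponential form, the single-domain proportion, and the numbers below are illustrative assumptions, not the exact functional forms or data from the paper.

```python
# Minimal sketch: fit an assumed parametric "mixing law" on a few sampled mixtures,
# then predict validation loss for an unseen mixture before training on it.
import numpy as np
from scipy.optimize import curve_fit

def mixing_law(r, c, k, t):
    # r: proportion of one hypothetical data domain (e.g., code) in the mixture
    return c + k * np.exp(t * r)

# Hypothetical measurements: code-data proportion vs. observed validation loss
code_prop = np.array([0.1, 0.3, 0.5, 0.7])
val_loss  = np.array([2.35, 2.28, 2.23, 2.19])

params, _ = curve_fit(mixing_law, code_prop, val_loss,
                      p0=[2.0, 0.5, -1.0], maxfev=20000)

# Predict loss for an unseen mixture (e.g., 40% code) without a training run
print("predicted loss at r=0.4:", mixing_law(0.4, *params))
```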
Dictionary Learning Improves Patch-Free Circuit Discovery in Mechanistic Interpretability: A Case Study on Othello-GPT
Zhengfu He, Xuyang Ge, Qiong Tang, Tianxiang Sun, Qinyuan Cheng, Xipeng Qiu
arXiv, 2402.12201  
pdf / blog on OpenMOSS
Sparse dictionary learning has become a rapidly growing technique in mechanistic interpretability for tackling superposition and extracting more human-understandable features from model activations. Building on these more monosemantic features, we ask a further question: how do we recognize circuits connecting the enormous number of dictionary features? We propose a circuit discovery framework as an alternative to activation patching.
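As background, sparse dictionary learning on activations is commonly implemented as a sparse autoencoder trained to reconstruct activations under an L1 sparsity penalty. The PyTorch sketch below illustrates that generic recipe; the dimensions, penalty weight, and random stand-in activations are assumptions for illustration, not the paper's actual setup.

```python
# Generic sparse-autoencoder sketch for dictionary learning on model activations.
# Sizes, the L1 coefficient, and the training data are illustrative assumptions.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_dict: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_dict)  # activations -> feature coefficients
        self.decoder = nn.Linear(d_dict, d_model)  # dictionary features -> reconstruction

    def forward(self, x):
        feats = torch.relu(self.encoder(x))        # sparse, non-negative feature activations
        return self.decoder(feats), feats

sae = SparseAutoencoder(d_model=512, d_dict=4096)
opt = torch.optim.Adam(sae.parameters(), lr=1e-3)
l1_coef = 1e-3

acts = torch.randn(1024, 512)                      # stand-in for collected model activations
for _ in range(100):
    recon, feats = sae(acts)
    loss = ((recon - acts) ** 2).mean() + l1_coef * feats.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```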
Can AI Assistants Know What They Don't Know?
Qinyuan Cheng*, Tianxiang Sun*, Xiangyang Liu, Wenwei Zhang, Zhangyue Yin, Shimin Li,
Linyang Li, Zhengfu He, Kai Chen, Xipeng Qiu
ICML, 2024  
pdf / code / blog on OpenMOSS
We ask the question: "Can AI assistants know what they don't know and express this through natural language?" To answer it, we construct a model-specific "I don't know" (Idk) dataset for an assistant, containing its known and unknown questions, based on existing open-domain question answering datasets. We then align the assistant with its corresponding Idk dataset and observe whether it can refuse to answer its unknown questions after alignment.
Black-Box Tuning for Language-Model-as-a-Service
Tianxiang Sun, Yunfan Shao, Hong Qian, Xuanjing Huang, Xipeng Qiu
ICML, 2022   (Spotlight)
pdf / code / slides
We propose a promising and practical scenario, Language-Model-as-a-Service (LMaaS), in which users cannot access model parameters or gradients but only the language model's output probabilities. For this scenario, we propose black-box tuning, which optimizes continuous prompts via derivative-free optimization.
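To make the setting concrete, here is a minimal sketch of derivative-free prompt optimization under the LMaaS constraint: a low-dimensional vector is mapped to a continuous prompt and optimized with CMA-ES using only black-box loss evaluations. The random projection, the stub query_model_with_prompt, and all hyperparameters are illustrative assumptions; see the paper and released code for the actual method.

```python
# Sketch of derivative-free prompt optimization in an LMaaS-style setting.
# Requires the third-party `cma` package. The "model query" is a local stub that
# stands in for a hypothetical inference API returning losses computed from
# output probabilities only (no gradients).
import numpy as np
import cma

d_intrinsic, prompt_len, d_model = 50, 10, 768
rng = np.random.default_rng(0)
# Fixed random projection from a low-dimensional search space to the prompt space
A = rng.normal(size=(d_intrinsic, prompt_len * d_model)) / np.sqrt(d_intrinsic)

def query_model_with_prompt(prompt: np.ndarray) -> float:
    """Stub for a black-box service call; replace with a real API that returns
    a task loss derived from the model's output probabilities."""
    target = np.full(prompt.shape, 0.1)
    return float(((prompt - target) ** 2).mean())

def objective(z: np.ndarray) -> float:
    prompt = (z @ A).reshape(prompt_len, d_model)  # low-dim vector -> continuous prompt
    return query_model_with_prompt(prompt)

es = cma.CMAEvolutionStrategy(np.zeros(d_intrinsic), 0.5,
                              {"maxiter": 50, "verbose": -9})
while not es.stop():
    candidates = es.ask()                          # sample candidate low-dim prompt codes
    es.tell(candidates, [objective(z) for z in candidates])

best_prompt = (es.result.xbest @ A).reshape(prompt_len, d_model)
```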
Paradigm Shift in Natural Language Processing
Tianxiang Sun, Xiangyang Liu, Xipeng Qiu, Xuanjing Huang
Machine Intelligence Research, 2022   (Invited Paper)
pdf / project / slides
Recent years have witnessed a trend of paradigm shift across a variety of NLP tasks: solving a task originally handled by one paradigm (e.g., sequence labeling) with another paradigm (e.g., machine reading comprehension).
Towards Efficient NLP: A Standard Evaluation and A Strong Baseline
Xiangyang Liu*, Tianxiang Sun*, Junliang He, Jiawen Wu, Lingling Wu, Xinyu Zhang, Hao
Jiang, Zhao Cao, Xuanjing Huang, Xipeng Qiu
NAACL, 2022   (Oral Presentation)
pdf / code / benchmark / slides
We propose a benchmark, ELUE (Efficient Language Understanding Evaluation), for efficient NLP models
and a strong baseline/backbone pre-trained model, ElasticBERT.
CoLAKE: Contextualized Language and Knowledge Embedding
Tianxiang Sun, Yunfan Shao, Xipeng Qiu, Qipeng Guo, Yaru Hu, Xuanjing Huang, Zheng
Zhang
COLING, 2020
pdf / code / slides
We pre-train a model called CoLAKE for jointly learning language and knowledge representation by
unifying language and knowledge into word-knowledge graphs.
Pre-trained Models for Natural Language Processing: A Survey
Xipeng Qiu, Tianxiang Sun, Yige Xu, Yunfan Shao, Ning Dai, Xuanjing Huang
SCIENCE CHINA Technological Sciences, 2020   (Invited Paper, Most Influential Paper of SCTS in 2020)
pdf
We provide a comprehensive survey of pre-trained models (PTMs) for NLP, ranging from non-contextual
word embeddings to state-of-the-art language models. This is a hands-on guide for understanding,
using, and developing PTMs for various NLP tasks.
Learning Sparse Sharing Architectures for Multiple Tasks
Tianxiang Sun*, Yunfan Shao*, Xiaonan Li, Pengfei Liu, Hang Yan, Xipeng Qiu, Xuanjing
Huang
AAAI, 2020   (Oral Presentation)
pdf / code / slides
We propose a new parameter sharing mechanism for multi-task learning, sparse sharing, which allocates a subnetwork to each task based on the lottery ticket hypothesis. Sparse sharing successfully avoids negative transfer between tasks.
MOSS: A Conversational Language Model
project led by Tianxiang Sun
MOSS is a conversational language model like ChatGPT. It is capable of following users' instructions to perform various natural language tasks, including question answering, text generation, text summarization, code generation, etc. MOSS is also able to challenge incorrect premises and reject inappropriate requests. Here is a brief introduction to MOSS.
Paper List on Language-Model-as-a-Service (LMaaS)
maintained by Tianxiang Sun
Pre-trained large language models (LLMs) such as GPT-3 are usually released as a service rather than as open-sourced model weights. We call this scenario "Language-Model-as-a-Service" (LMaaS), where users can access powerful LLMs only through their inference APIs. We maintain a curated list of papers that fit this scenario.
Awards
- Outstanding Graduate of Shanghai (2024)
- ByteDance Scholarship (13 winners in China, 2023)
- National Scholarship (Ministry of Education, China, 2023)
- WAIC Yunfan Award - Rising Star (15 winners across the world, 2023)
- Fudan Academic Star (10 winners across STEM graduate schools, 2023)
- Most Influential Paper Award of Sci. China Tech Sci. (2022)
- National Scholarship (Ministry of Education, China, 2020)
- Outstanding Graduate (Xidian University, 2019)
- First Prize in China High School Biology Olympiad (2014)
Service
Student Seminar Co-Chair
Reviewer / Program Committee Member
- ACL (2021, 2022, 2023)
- EMNLP (2021, 2022, 2023)
- COLING (2020, 2022)
- ICML (2022)
- ICLR (2023)
- NeurIPS (2022, 2023)
- AAAI (2021)
- IJCAI (2021)