About Me

I am a first-year Ph.D. student in the CoAI Group, Dept. of Computer Science and Technology, Tsinghua University, advised by Prof. Minlie Huang. My research interests lie in LLM safety and trustworthiness. Recently I have been working on the mechanisms of safety alignment, hallucination, and the knowledge boundaries of large reasoning models (LRMs).

News

🎉 Our papers "LASA: Language-Agnostic Semantic Alignment at the Semantic Bottleneck for LLM Safety" and "How Should We Enhance the Safety of Large Reasoning Models: An Empirical Study" have been accepted to ACL 2026.
🎉 Our papers "BARREL: Boundary-Aware Reasoning for Factual and Reliable LRMs" and "Be Careful When Fine-tuning On Open-Source LLMs: Your Fine-tuning Data Could Be Secretly Stolen!" have been accepted to ICLR 2026.
🎉 We built Open Cowork, an open-source Claude Cowork for Windows and macOS. (code)

Work Experience

Research intern at A*STAR’s Centre for Frontier AI Research (CFAR), from Feb 2025 to May 2025, under the supervision of Prof. Yew-Soon Ong.

Publications

Paper

ACL 2026 Main Conference

LASA: Language-Agnostic Semantic Alignment at the Semantic Bottleneck for LLM Safety

Junxiao Yang, Haoran Liu, Jinzhe Tu, Jiale Cheng, Zhexin Zhang, Shiyao Cui, Jiaqi Weng, Jialing Tao, Hui Xue, Hongning Wang, Han Qiu, Minlie Huang.

Paper

ICLR 2026

BARREL: Boundary-Aware Reasoning for Factual and Reliable LRMs

Junxiao Yang, Jinzhe Tu, Haoran Liu, Xiaoce Wang, Chujie Zheng, Zhexin Zhang, Shiyao Cui, Caishun Chen, Tiantian He, Hongning Wang, Yew-Soon Ong, Minlie Huang.

Paper

ACL 2025 Main Conference

Guiding not Forcing: Enhancing the Transferability of Jailbreaking Attacks on LLMs via Removing Superfluous Constraints

Junxiao Yang*, Zhexin Zhang*, Shiyao Cui, Hongning Wang, Minlie Huang.

Paper

ACL 2024 Main Conference

Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization

Zhexin Zhang*, Junxiao Yang*, Pei Ke, Fei Mi, Hongning Wang, Minlie Huang.

Paper

AAAI 2025 Oral

When Smiley Turns Hostile: Interpreting How Emojis Trigger LLMs' Toxicity

Shiyao Cui, Xijia Feng, Yingkang Wang, Junxiao Yang, Zhexin Zhang, Biplab Sikdar, Hongning Wang, Han Qiu, Minlie Huang.

Paper

ACL 2026 Main Conference

How Should We Enhance the Safety of Large Reasoning Models: An Empirical Study

Zhexin Zhang, Xian Qi Loye, Victor Shea-Jay Huang, Junxiao Yang, Qi Zhu, Shiyao Cui, Fei Mi, Lifeng Shang, Yingkang Wang, Hongning Wang, Minlie Huang.

Paper

ICLR 2026 Main Conference

Be Careful When Fine-tuning On Open-Source LLMs: Your Fine-tuning Data Could Be Secretly Stolen!

Zhexin Zhang, Yuhao Sun, Junxiao Yang, Shiyao Cui, Hongning Wang, Minlie Huang.

Paper

NeurIPS 2025 Workshop

Safe Unlearning: A Surprisingly Effective and Generalizable Solution to Defend Against Jailbreak Attacks

Zhexin Zhang*, Junxiao Yang*, Yida Lu, Pei Ke, Shiyao Cui, Chujie Zheng, Hongning Wang, Minlie Huang.

Paper

AAAI 2025 Workshop

Agent-SafetyBench: Evaluating the Safety of LLM Agents

Zhexin Zhang, Shiyao Cui, Yida Lu, Jingzhuo Zhou, Junxiao Yang, Hongning Wang, Minlie Huang.

Open Source Projects

Open Cowork

Open-source AI agent desktop app for Windows and macOS.

GitHub Core Lead

AISafetyLab

A comprehensive framework for AI safety evaluation and improvement.

GitHub Core Lead

Loong

Synthesize long chain-of-thought data at scale through verifiers.

GitHub Core Lead

Open Codesign

Open-source design agent for turning prompts into prototypes, slides, and PDFs.

GitHub Core Lead

Teaching

I was a TA for the following undergraduate courses:

Artificial Neural Network (2024 Fall, 2025 Fall)
Linear Algebra (2024 Fall)

Honors and Awards

Excellent Graduate, Tsinghua University, 2025
3rd Prize Winner of the Global Challenge for Safe and Secure LLMs (Track 1)
Academic Excellence in Research Award of Tsinghua University, 2023.09-2024.09
Meritorious Winner (Certificate of Achievement), Mathematical Contest in Modeling, 2023
Comprehensive Scholarship of Tsinghua University, 2022.09-2023.09
Comprehensive Scholarship of Tsinghua University, 2021.09-2022.09

Education

2025.09-present, Tsinghua University, Beijing, China. Ph.D. Student.
2021.09-2025.06, Tsinghua University, Beijing, China. Undergraduate Student.
2018.09-2021.06, Urumqi No.1 Senior High School, Xinjiang, China. High School Student.