About Me
I am a first-year Ph.D. student in the CoAI Group, Dept. of Computer Science and Technology, Tsinghua University, advised by Prof. Minlie Huang. My research interests lie in LLM safety and trustworthiness. Recently I have been working on the mechanisms of jailbreaking attacks and defenses, hallucination, and the knowledge boundaries of LLMs.
News
🎉 Our papers BARREL: Boundary-Aware Reasoning for Factual and Reliable LRMs and Be Careful When Fine-tuning On Open-Source LLMs: Your Fine-tuning Data Could Be Secretly Stolen! have been accepted to ICLR 2026.
🎉 We built Open Cowork, an open-source Claude Cowork for Windows & macOS. [code]
🎉 Our paper Guiding not Forcing: Enhancing the Transferability of Jailbreaking Attacks on LLMs via Removing Superfluous Constraints has been accepted to ACL 2025 Main! Please refer to our code and paper for more details.
Working Experiences
- Research intern at A*STAR’s Centre for Frontier AI Research (CFAR), from Feb 2025 to May 2025, under the supervision of Prof. Yew-Soon Ong.
Publications
- [ICLR 2026] BARREL: Boundary-Aware Reasoning for Factual and Reliable LRMs. [paper]
- Junxiao Yang, Jinzhe Tu, Haoran Liu, Xiaoce Wang, Chujie Zheng, Zhexin Zhang, Shiyao Cui, Caishun Chen, Tiantian He, Hongning Wang, Yew-Soon Ong, Minlie Huang.
- [ACL 2025] Guiding not Forcing: Enhancing the Transferability of Jailbreaking Attacks on LLMs via Removing Superfluous Constraints. [paper]
- Junxiao Yang*, Zhexin Zhang*, Shiyao Cui, Hongning Wang, Minlie Huang.
- [ACL 2024] Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization. [paper]
- Zhexin Zhang*, Junxiao Yang*, Pei Ke, Fei Mi, Hongning Wang, Minlie Huang.
- [AAAI 2025 Oral] When Smiley Turns Hostile: Interpreting How Emojis Trigger LLMs’ Toxicity. [paper]
- Shiyao Cui, Xijia Feng, Yingkang Wang, Junxiao Yang, Zhexin Zhang, Biplab Sikdar, Hongning Wang, Han Qiu, Minlie Huang.
- [ICLR 2026] Be Careful When Fine-tuning On Open-Source LLMs: Your Fine-tuning Data Could Be Secretly Stolen! [paper]
- Zhexin Zhang, Yuhao Sun, Junxiao Yang, Shiyao Cui, Hongning Wang, Minlie Huang.
- [NeurIPS 2025 Workshop] Safe Unlearning: A Surprisingly Effective and Generalizable Solution to Defend Against Jailbreak Attacks. [paper]
- Zhexin Zhang*, Junxiao Yang*, Yida Lu, Pei Ke, Shiyao Cui, Chujie Zheng, Hongning Wang, Minlie Huang.
- [AAAI 2025 Workshop] Agent-SafetyBench: Evaluating the Safety of LLM Agents. [paper]
- Zhexin Zhang, Shiyao Cui, Yida Lu, Jingzhuo Zhou, Junxiao Yang, Hongning Wang, Minlie Huang.
- [Preprint] How Should We Enhance the Safety of Large Reasoning Models: An Empirical Study. [paper]
- Zhexin Zhang, Xian Qi Loye, Victor Shea-Jay Huang, Junxiao Yang, Qi Zhu, Shiyao Cui, Fei Mi, Lifeng Shang, Yingkang Wang, Hongning Wang, Minlie Huang.
- [Preprint] AISafetyLab: A Comprehensive Framework for AI Safety Evaluation and Improvement. [paper]
- Zhexin Zhang*, Leqi Lei*, Junxiao Yang*, Xijie Huang, Yida Lu, Shiyao Cui, et al.
- [Preprint] Global Challenge for Safe and Secure LLMs Track 1. [paper]
- Xiaojun Jia, Yihao Huang, Yang Liu, …, Junxiao Yang, Zhexin Zhang, …, Zhe Zhao.
Resources
- AISafetyLab: A comprehensive framework for AI safety evaluation and improvement.
- Open Cowork: An open-source Claude Cowork for Windows & macOS.
Teaching
I was a TA for the following undergraduate courses:
- Artificial Neural Network (2024 Fall, 2025 Fall)
- Linear Algebra (2024 Fall)
Honors and Awards
- Excellent Graduate, Tsinghua University, 2025
- 3rd Prize Winner of the Global Challenge for Safe and Secure LLMs (Track 1)
- Academic Excellence in Research Award of Tsinghua University, 2023.09-2024.09
- Meritorious Winner, Mathematical Contest in Modeling (MCM), 2023
- Comprehensive Scholarship of Tsinghua University, 2022.09-2023.09
- Comprehensive Scholarship of Tsinghua University, 2021.09-2022.09
Education
- 2025.09-now, Tsinghua University, Beijing, China. Ph.D. Student.
- 2021.09-2025.06, Tsinghua University, Beijing, China. Undergraduate Student.
- 2018.09-2021.06, Urumqi No.1 Senior High School, Xinjiang, China. High School Student.
