About Me
I am a fourth-year undergraduate student in the Department of Computer Science and Technology at Tsinghua University, Beijing, China. I am also an incoming Ph.D. student of Prof. Minlie Huang @ Conversational AI Group starting from 2025 Fall. I am currently a research intern at A*STAR's Centre for Frontier AI Research (CFAR), under the supervision of Prof. Yew-Soon Ong. My research interests lie in LLM safety and trustworthiness, and I have recently been working on the mechanisms of jailbreaking attacks and defenses, as well as hallucination and the knowledge boundaries of LRMs.
News
- 🎉 Our paper "Guiding not Forcing: Enhancing the Transferability of Jailbreaking Attacks on LLMs via Removing Superfluous Constraints" has been accepted to ACL 2025 Main! Please refer to our code and paper for more details.
- 🎉 Our paper "BARREL: Boundary-Aware Reasoning for Factual and Reliable LRMs" has been released.
Publications
Conference Papers
- Yang, J.*, Zhang, Z.*, Cui, S., Wang, H., & Huang, M. (2025). Guiding not Forcing: Enhancing the Transferability of Jailbreaking Attacks on LLMs via Removing Superfluous Constraints. ACL 2025 (Long Paper). link
- Zhang, Z.*, Yang, J.*, Ke, P., Mi, F., Wang, H., & Huang, M. (2024). Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization. ACL 2024 (Long Paper). link
Preprints
- Yang, J., Tu, J., Liu, H., Wang, X., Zheng, C., Zhang, Z., … & Huang, M. (2025). BARREL: Boundary-Aware Reasoning for Factual and Reliable LRMs. link
- Zhang, Z., Sun, Y., Yang, J., Cui, S., Wang, H., & Huang, M. (2025). Be Careful When Fine-tuning On Open-Source LLMs: Your Fine-tuning Data Could Be Secretly Stolen! link
- Zhang, Z., Loye, X. Q., Huang, V. S. J., Yang, J., Zhu, Q., Cui, S., … & Huang, M. (2025). How Should We Enhance the Safety of Large Reasoning Models: An Empirical Study. link
- Zhang, Z.*, Lei, L.*, Yang, J.*, … , & Huang, M. (2025). AISafetyLab: A Comprehensive Framework for AI Safety Evaluation and Improvement. link
- Zhang, Z.*, Yang, J.*, Ke, P., Cui, S., Zheng, C., Wang, H., & Huang, M. (2024). Safe Unlearning: A Surprisingly Effective and Generalizable Solution to Defend Against Jailbreak Attacks. link
- Zhang, Z., Cui, S., Lu, Y., Zhou, J., Yang, J., Wang, H., & Huang, M. (2024). Agent-SafetyBench: Evaluating the Safety of LLM Agents. link
- Jia, X., … , Yang, J., … , & Zhao, Z. (2024). Global Challenge for Safe and Secure LLMs Track 1. link
Resources
Teaching
I was a TA for the following undergraduate courses:
- Artificial Neural Network (2024 Fall)
- Linear Algebra (2024 Fall)
Honors and Awards
- Excellent Graduate, Tsinghua University, 2025
- 3rd Prize Winner of the Global Challenge for Safe and Secure LLMs (Track 1)
- Academic Excellence in Research Award of Tsinghua University, 2023.09-2024.09
- Meritorious Winner of the Mathematical Contest in Modeling (Certificate of Achievement), 2023
- Comprehensive Scholarship of Tsinghua University, 2022.09-2023.09
- Comprehensive Scholarship of Tsinghua University, 2021.09-2022.09
Education
- 2021.09-present, Tsinghua University, Beijing, China. Undergraduate Student.
- 2018.09-2021.06, Urumqi No.1 Senior High School, Xinjiang, China. High School Student.