About Me
I am a 4th-year undergraduate student in the Department of Computer Science and Technology at Tsinghua University, Beijing, PRC. I am also an incoming Ph.D. student of Prof. Minlie Huang @ Conversational AI Group starting from 2025 Fall. I'm currently a research intern at A*STAR's Centre for Frontier AI Research (CFAR), under the supervision of Prof. Yew-Soon Ong. My research interests lie in LLM safety and trustworthiness, and I'm currently working on the mechanisms of jailbreaking attacks and defenses, hallucination, and the knowledge boundaries of LRMs.
News
🎉 Our paper "Guiding not Forcing: Enhancing the Transferability of Jailbreaking Attacks on LLMs via Removing Superfluous Constraints" is accepted by ACL 2025 Main! Please refer to our code and paper for more details.
🎉 After months of hard work and collaboration, http://CAMEL-AI.org just released its new project Loong🐉. I'm delighted to be a contributor, mainly leading the safety and security domain team. I believe verifiable synthesis for complex questions will be a highly important issue for future LLM development. Here is our project link.
🎉 Our AISafetyLab GitHub repo is released — please give us a star QwQ.
Publications
Conference Papers
- Yang, J.*, Zhang, Z.*, Cui, S., Wang, H., & Huang, M. (2025). Guiding not Forcing: Enhancing the Transferability of Jailbreaking Attacks on LLMs via Removing Superfluous Constraints. ACL 2025 (Long Paper). link
- Zhang, Z.*, Yang, J.*, Ke, P., Mi, F., Wang, H., & Huang, M. (2024). Defending Large Language Models Against Jailbreaking Attacks Through Goal Prioritization. ACL 2024 (Long Paper). link
Preprints
- Zhang, Z.*, Lei, L.*, Yang, J.*, … , & Huang, M. (2025). AISafetyLab: A Comprehensive Framework for AI Safety Evaluation and Improvement. link
- Zhang, Z.*, Yang, J.*, Ke, P., Cui, S., Zheng, C., Wang, H., & Huang, M. (2024). Safe unlearning: A surprisingly effective and generalizable solution to defend against jailbreak attacks. link
- Zhang, Z., Cui, S., Lu, Y., Zhou, J., Yang, J., Wang, H., & Huang, M. (2024). Agent-SafetyBench: Evaluating the Safety of LLM Agents. link
- Jia, X., … , Yang, J., … , & Zhao, Z. (2024). Global Challenge for Safe and Secure LLMs Track 1. link
Teaching
I was a TA for the following undergraduate courses:
- Artificial Neural Network (2024 Fall)
- Linear Algebra (2024 Fall)
Honors and Awards
- 3rd Prize Winner of the Global Challenge for Safe and Secure LLMs (Track 1)
- Academic Excellence in Research Award of Tsinghua University, 2023.09-2024.09
- Meritorious Winner, Mathematical Contest in Modeling (MCM), 2023
- Comprehensive Scholarship of Tsinghua University, 2022.09-2023.09
- Comprehensive Scholarship of Tsinghua University, 2021.09-2022.09
Education
- 2021.09-now, Tsinghua University, Beijing, China. Undergraduate Student.
- 2018.09-2021.06, Urumqi No.1 Senior High School, Xinjiang, China. High School Student.