About Me

I am a fourth-year undergraduate student in the Department of Computer Science and Technology at Tsinghua University, Beijing, PRC. I am also an incoming Ph.D. student in Prof. Minlie Huang’s Conversational AI Group, starting in Fall 2025. I am currently a research intern at A*STAR’s Centre for Frontier AI Research (CFAR), under the supervision of Prof. Yew-Soon Ong. My research interests lie in LLM safety and trustworthiness; I am currently working on the mechanisms of jailbreak attacks and defenses, hallucination, and the knowledge boundaries of LRMs.

News

  • 🎉 Our paper Guiding not Forcing: Enhancing the Transferability of Jailbreaking Attacks on LLMs via Removing Superfluous Constraints has been accepted to ACL 2025 (Main)! Please refer to our code and paper for more details.

  • 🎉 After months of hard work and collaboration, http://CAMEL-AI.org has just released its new project Loong🐉. I’m delighted to be a contributor, mainly leading the safety and security domain team. I believe verifiable synthesis for complex questions will be a highly important issue for future LLM development. Here is our project link.

  • 🎉 The AISafetyLab GitHub repo has been released. Please give us a star QwQ.

Publications

Conference Papers

  • Yang, J.*, Zhang, Z.*, Cui, S., Wang, H., & Huang, M. (2025). Guiding not Forcing: Enhancing the Transferability of Jailbreaking Attacks on LLMs via Removing Superfluous Constraints. ACL 2025 (Long Paper). link
  • Zhang, Z.*, Yang, J.*, Ke, P., Mi, F., Wang, H., & Huang, M. (2024). Defending large language models against jailbreaking attacks through goal prioritization. ACL 2024 (Long Paper). link

Preprints

  • Zhang, Z.*, Lei, L.*, Yang, J.*, … , & Huang, M. (2025). AISafetyLab: A Comprehensive Framework for AI Safety Evaluation and Improvement. link
  • Zhang, Z.*, Yang, J.*, Ke, P., Cui, S., Zheng, C., Wang, H., & Huang, M. (2024). Safe unlearning: A surprisingly effective and generalizable solution to defend against jailbreak attacks. link
  • Zhang, Z., Cui, S., Lu, Y., Zhou, J., Yang, J., Wang, H., & Huang, M. (2024). Agent-SafetyBench: Evaluating the Safety of LLM Agents. link
  • Jia, X., … , Yang, J., … , & Zhao, Z. (2024). Global Challenge for Safe and Secure LLMs Track 1. link

Teaching

I was a TA for the following undergraduate courses:

  • Artificial Neural Network (2024 Fall)
  • Linear Algebra (2024 Fall)

Honors and Awards

  • 3rd Prize Winner of the Global Challenge for Safe and Secure LLMs (Track 1)
  • Academic Excellence in Research Award of Tsinghua University, 2023.09-2024.09
  • Meritorious Winner of Mathematical Contest In Modeling Certificate of Achievement, 2023
  • Comprehensive Scholarship of Tsinghua University, 2022.09-2023.09
  • Comprehensive Scholarship of Tsinghua University, 2021.09-2022.09

Education

  • 2021.09-now, Tsinghua University, Beijing, China. Undergraduate student.
  • 2018.09-2021.06, Urumqi No.1 Senior High School, Xinjiang, China. High school student.