About me

I am a Ph.D. candidate in the Department of Systems Engineering & Engineering Management at The Chinese University of Hong Kong (CUHK), advised by Prof. Kam-Fai Wong. I received my master’s and bachelor’s degrees from Peking University and Northwestern Polytechnical University, respectively. I had a wonderful time as a visiting researcher at LMU Munich, working with Prof. Hinrich Schütze.

My research centers on developing effective and reliable adaptation methods for large language models (LLMs), aiming to improve their alignment with human preferences and strengthen advanced capabilities such as reasoning. My work spans three directions:
(1) Large-scale reinforcement learning for reasoning and agents [EEPO, BRIDGE]
(2) Post-training reliable LLMs from imperfect supervision [PEARL, VAA]
(3) Evaluating and watermarking LLM outputs [CONNER, WatME]

News

  • [01/2026] Our work on exploration dynamics in RL is accepted at ICLR 2026.
    We diagnose mode collapse as a self-reinforcing loop in trajectory distributions, and mitigate it via distribution-level reshaping to enable systematic exploration.
  • [09/2025] New work on meta-learning for training reasoning models.
    Rather than serving only as a warm-up, SFT learns how to supervise RL by strategically transferring beneficial knowledge.
  • [05/2025] Gave a talk at LMU Munich on robust LLMs.
  • [05/2025] Our paper on safety alignment is accepted at ICML 2025.
    We reveal that some alignment examples are more prone to forgetting, and propose to upweight them to improve safety retention.
  • [02/2025] Our paper on instruction tuning is accepted at ICLR 2025.
    We propose an instruction fine-tuning method that helps LLMs better handle set-structured inputs, making them more robust in tasks like ICL and RAG.
  • [05/2024] Our paper on LLM watermarking is accepted at ACL 2024.
    We introduce a decoding method that embeds watermarks via lexical redundancy, preserving text quality with minimal trade-off.

Publications (Full List)


Talks

  • Beyond Two-Stage Training: Cooperative SFT and RL for Improved LLM Reasoning
    PhD Seminar, LMU Munich – August 2025
    Host: Prof. Hinrich Schütze

  • Vulnerability-Aware Alignment: Protect Open-Source LLMs against Unsafe Fine-tuning
    AI Time, Online Live – June 2025

  • Towards Trustworthy LLMs: Improving Robustness via Post-Training Optimization
    PhD Seminar, LMU Munich – May 2025
    Host: Prof. Hinrich Schütze


Teaching

I have served as a teaching assistant for the following courses:

  • Operations Research II (SEEM3440) – Covers advanced optimization techniques, including non-linear, integer, and dynamic programming.
  • Engineering Innovation and Entrepreneurship (SEEM3450) – A hands-on course focused on identifying engineering opportunities and developing business plans.

Internships

  • Microsoft Research Asia, Systems Research Group
  • Tencent AI Lab, Machine Learning Center

Community Service

  • Reviewer for ICML, ICLR, NeurIPS, AISTATS, ACL, EMNLP, and NAACL.

Honors & Scholarships

  • Postgraduate Studentship, CUHK
  • School Scholarship, PKU
  • First-Class Scholarship, NWPU

Miscellaneous

Outside of research, I enjoy walking in parks, swimming, hiking, and playing table tennis.

During my time at NWPU, I was the runner-up in the Freshmen Cup table tennis singles match and won the team championship three times.


“I don't want to achieve immortality through my work; I want to achieve immortality through not dying.”
— Woody Allen