About me

I am a Ph.D. candidate in the Department of Systems Engineering & Engineering Management at The Chinese University of Hong Kong (CUHK), advised by Prof. Kam-Fai Wong. I received my master’s and bachelor’s degrees from Peking University and Northwestern Polytechnical University, respectively. I had a wonderful time as a visiting researcher at LMU Munich, working with Prof. Hinrich Schütze.

My research interests lie in natural language processing and large language models (LLMs). Currently, I’m focusing on the following topics:

  • Large-scale reinforcement learning for reasoning and agents [EEPO, BRIDGE]
  • Post-training reliable LLMs from imperfect supervision [PEARL, VAA]
  • Evaluating and watermarking LLM outputs [CONNER, WatME]

News

  • [01/2026] Our work on reinforcement learning with verifiable rewards (RLVR) has been accepted at ICLR 2026.
    We diagnose mode collapse in RL as a self-reinforcing loop and break it by reshaping the trajectory distribution to discourage revisits and enable BFS-like exploration.
  • [09/2025] New work on meta-learning for training reasoning models.
    Rather than serving only as a warmup, SFT learns how to supervise RL by strategically transferring beneficial knowledge.
  • [05/2025] Gave a talk at LMU Munich on robust LLMs.
  • [05/2025] Our paper on safety alignment has been accepted at ICML 2025.
    We show that some alignment examples are more prone to being forgotten, and we propose upweighting them to improve safety retention.
  • [02/2025] Our paper on robust fine-tuning has been accepted at ICLR 2025.
    We propose an instruction fine-tuning method that helps LLMs better handle unordered inputs, making them more robust in tasks such as in-context learning (ICL) and retrieval-augmented generation (RAG).
  • [05/2024] Our paper on LLM watermarking has been accepted at ACL 2024.
    We introduce a decoding method that embeds watermarks by exploiting lexical redundancy, with minimal loss in text quality.

Publications (Full List)


Talks

  • Beyond Two-Stage Training: Cooperative SFT and RL for Improved LLM Reasoning
    PhD Seminar, LMU Munich – August 2025
    Host: Prof. Hinrich Schütze

  • Vulnerability-Aware Alignment: Protect Open-Source LLMs against Unsafe Fine-tuning
    AI Time, Online (Livestream) – June 2025

  • Towards Trustworthy LLMs: Improving Robustness via Post-Training Optimization
    PhD Seminar, LMU Munich – May 2025
    Host: Prof. Hinrich Schütze


Teaching

I have served as a teaching assistant for the following courses:

  • Operations Research II (SEEM3440) – Covers advanced optimization techniques, including non-linear, integer, and dynamic programming.
  • Engineering Innovation and Entrepreneurship (SEEM3450) – A hands-on course focused on identifying engineering opportunities and developing business plans.

Internships

  • Microsoft Research Asia, Systems Research Group
  • Tencent AI Lab, Machine Learning Center

Community Service

  • Reviewer for ICML, ICLR, NeurIPS, AISTATS, ACL, EMNLP, and NAACL.

Honors & Scholarships

  • Postgraduate Studentship, CUHK
  • School Scholarship, PKU
  • First-Class Scholarship, NWPU

Miscellaneous

Outside of research, I enjoy walks in the park, swimming, hiking, and table tennis.

During my time at NWPU, I was runner-up in the singles event of the Freshmen Cup table tennis tournament and won the team championship three times.


“I don't want to achieve immortality through my work; I want to achieve immortality through not dying.”
— Woody Allen