Research Portfolio

LLM Trustworthiness

I’m dedicated to enhancing the reliability of LLMs across four dimensions:

  • Robustness to Input: Ensuring LLMs can handle adversarial attacks and distribution shifts (ICLR 2025).
  • Transparency of Decision: Improving interpretability techniques and reasoning models (BRIDGE).
  • Validity of Output: Addressing hallucinations (EMNLP 2023) and inconsistencies in model outputs (Findings of ACL 2023).
  • Resistance to Misuse: Preventing AI-assisted cheating, plagiarism (ACL 2024), and unsafe fine-tuning (ICML 2025).