Research Portfolio

LLM Trustworthiness

The diagram illustrates the four key aspects of LLM trustworthiness:

  • Robustness to Input: Ensuring LLMs can handle adversarial attacks and distribution shifts (ICLR 2025).
  • Transparency of Decision: Improving reasoning models and long chain-of-thought reasoning (coming soon).
  • Validity of Output: Addressing factual errors (EMNLP 2023) and output inconsistencies (Findings of ACL 2023).
  • Resistance to Misuse: Preventing cheating, plagiarism (ACL 2024), and unsafe fine-tuning (ICML 2025).

From a systems perspective, the first three aspects correspond to the system's input, hidden states, and output, respectively, while the final aspect concerns the relationship between the system and its users. Together, these principles guide the design of safer and more trustworthy large language models.

Last updated: June 2025