Reinforcement Learning from Answer Reranking Feedback for Retrieval-Augmented Answer Generation

Publication Date: January 1, 2024

Nguyen, M., Nguyen, T., KC, K., Zhang, Z., & Vu, T. (2024). Reinforcement Learning from Answer Reranking Feedback for Retrieval-Augmented Answer Generation. Proc. Interspeech 2024, 4044-4048. doi: 10.21437/Interspeech.2024-2147.


Retrieval-augmented generation (RAG) is a method for improving the accuracy and reliability of large language models (LLMs) on open-domain question answering (ODQA). Traditional approaches rely on supervised learning, which can result in misalignment between user intent and system output. Reinforcement learning from human feedback (RLHF) addresses this issue by training a reward model on human preference feedback. In this work, we introduce a novel RLHF framework for ODQA that leverages existing large-scale answer reranking datasets to train a reward model. In particular, our reward model for ODQA plays two complementary roles: (i) providing ranking scores as rewards for PPO, and (ii) retrieving relevant facts that enable the ODQA system to formulate a factual answer. Experimental results indicate that our proposed framework is effective for RLHF, leading to near-expert performance on ODQA.
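
To make the two roles of the reward model concrete, the sketch below shows how an answer-reranking model could (i) supply scalar rewards for a PPO update and (ii) rank candidate passages for the generator's prompt. This is a minimal illustration, not the authors' implementation: it stands in an off-the-shelf cross-encoder from the sentence-transformers library for the paper's reward model trained on answer reranking data, and the checkpoint name, `reward_fn`, and `retrieve_facts` are hypothetical choices.

```python
# Minimal sketch (not the paper's code): one reranker serving both
# reward-model roles described in the abstract.
from sentence_transformers import CrossEncoder

# Hypothetical checkpoint; the paper trains its own reward model
# on large-scale answer reranking datasets.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def reward_fn(question: str, generated_answer: str) -> float:
    """Role (i): use the reranker's relevance score for the generated
    answer as the scalar reward for a PPO step on this question."""
    return float(reranker.predict([(question, generated_answer)])[0])

def retrieve_facts(question: str, candidates: list[str], k: int = 3) -> list[str]:
    """Role (ii): score candidate passages with the same reranker and
    return the top-k as supporting facts for the generator's prompt."""
    scores = reranker.predict([(question, c) for c in candidates])
    ranked = sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True)
    return [c for c, _ in ranked[:k]]
```

In a training loop, `retrieve_facts` would build the retrieval-augmented prompt, the policy LLM would generate an answer, and `reward_fn` would score that answer to drive the PPO update.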