THESIS
2023
1 online resource (x, 45 pages) : illustrations (chiefly color)
Abstract
In recent years, large language models have performed promisingly on a variety of natural language processing tasks. However, their ability to tackle reasoning tasks, such as math word problems that require multiple steps to reach the correct answer, is still limited. One of the primary reasons is that large language models cannot rectify errors during generation: once a solution deviates from the correct direction, it quickly becomes irrecoverable. The shortcomings on multi-step reasoning tasks such as math word problems therefore expose a fundamental weakness of large language models. The chain-of-thought method improves their performance by prompting them to write out detailed intermediate steps while solving multi-step reasoning tasks. Even with this method, however, the generated answers can lack self-consistency and sometimes contain hallucinations. This highlights the importance of sifting the correct answers out of the candidate answers a model generates when solving math word problems. To address this issue, recent works have trained an extra ranker to select among multiple model outputs, or trained a reward model under process or outcome supervision with the help of human feedback. All of these approaches, however, incur extra training costs. Instead, we propose a self-verification methodology for checking the answers that large language models generate for math word problems, without any additional training cost. Self-verification takes the answer generated by a large language model as a given condition and constructs reverse problems predicated on the other known conditions; the large language model is then asked to solve the reverse problems, thereby verifying its previously generated answer. In our experiments, we first show that the proposed self-verification procedure effectively identifies incorrect answers and further improves math-solving performance. We then conduct an error analysis and refine our approach to improve its overall performance. Finally, we compare our approach with other similar approaches.
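To make the verification step concrete, the following is a minimal Python sketch of the reverse-problem idea described above. It is an illustration under stated assumptions, not the thesis's implementation: the llm function is a hypothetical stand-in for any language-model completion call, the prompt wording is invented, and the substring check at the end is a deliberately simplified match; a real system would sample multiple candidates and compare the recovered value numerically.

def llm(prompt: str) -> str:
    """Hypothetical stand-in for a large-language-model call.
    Replace with a real model or API client."""
    raise NotImplementedError

def self_verify(problem: str, conditions: list[str], candidate: str) -> bool:
    # Mask one of the original known conditions and add the candidate
    # answer as a new given condition, forming a "reverse" problem whose
    # unknown is the masked condition.
    masked, rest = conditions[0], conditions[1:]
    reverse_prompt = (
        f"{problem}\n"
        f"Suppose the answer is {candidate}. "
        f"Given also that {'; '.join(rest)}, "
        f"recover the missing condition: '{masked}'.\n"
        "Let's think step by step."
    )
    recovered = llm(reverse_prompt)
    # Accept the candidate only if the model's reverse solution recovers
    # the condition that was masked out (simplified substring check).
    return masked in recovered

In use, one would generate several candidate answers for a problem, keep only those whose reverse problems are solved consistently, and select among the verified candidates; because verification reuses the same model at inference time, no extra training is needed.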