THESIS
2023
1 online resource (x, 45 pages) : illustrations (chiefly color)
Abstract
In recent years, large language models have performed promisingly on a variety of natural language processing tasks. However, their ability to tackle reasoning tasks, such as math word problems that require multiple steps to reach the correct answer, is still limited. One of the primary reasons is that large language models cannot rectify errors during generation: once a solution deviates from the correct direction, it quickly becomes irrecoverable. The shortcomings on multi-step reasoning tasks such as math word problems therefore expose a fundamental weakness of large language models. The chain-of-thought method improves their performance by prompting them to write out detailed intermediate steps while solving multi-step reasoning tasks. Even with this method, however, the generated answers can lack self-consistency and sometimes contain hallucinations. This highlights the importance of sifting the correct answers out of the candidate answers a model generates when solving math word problems. To address this issue, recent works have trained an extra ranker to select among multiple model outputs, or trained a reward model under process or outcome supervision with the help of human feedback. All of these approaches, however, incur extra training costs. Instead, we propose a self-verification methodology for checking the answers that large language models generate for math word problems, without any additional training cost. Self-verification takes the answer generated by a large language model as a given condition and constructs reverse problems predicated on the other known conditions; the large language model is then asked to solve the reverse problems, thereby verifying its previously generated answer. In our experiments, we first show that the proposed self-verification procedure effectively identifies incorrect answers and further improves math-solving performance. We then conduct an error analysis and refine our approach to improve its overall performance. Finally, we compare our approach with other similar approaches.
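To make the verification step concrete, the following is a minimal Python sketch of the reverse-problem idea described above. It is an illustration under stated assumptions, not the thesis's implementation: the llm function is a hypothetical stand-in for any language-model completion call, the prompt wording is invented, and the substring check at the end is a deliberately simplified match; a real system would sample multiple candidates and compare the recovered value numerically.

def llm(prompt: str) -> str:
    """Hypothetical stand-in for a large-language-model call.
    Replace with a real model or API client."""
    raise NotImplementedError

def self_verify(problem: str, conditions: list[str], candidate: str) -> bool:
    # Mask one of the original known conditions and add the candidate
    # answer as a new given condition, forming a "reverse" problem whose
    # unknown is the masked condition.
    masked, rest = conditions[0], conditions[1:]
    reverse_prompt = (
        f"{problem}\n"
        f"Suppose the answer is {candidate}. "
        f"Given also that {'; '.join(rest)}, "
        f"recover the missing condition: '{masked}'.\n"
        "Let's think step by step."
    )
    recovered = llm(reverse_prompt)
    # Accept the candidate only if the model's reverse solution recovers
    # the condition that was masked out (simplified substring check).
    return masked in recovered

In use, one would generate several candidate answers for a problem, keep only those whose reverse problems are solved consistently, and select among the verified candidates; because verification reuses the same model at inference time, no extra training is needed.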