THESIS
2023
1 online resource (xiv, 97 pages) : illustrations (chiefly color)
Abstract
Over the past few years, the natural language processing community has witnessed the rapid development of language models. Public interest in these models, particularly generative ones, is high, and language models have begun to appear in user-facing applications. While improving model performance has always been a key objective, the widespread use of language models has also drawn attention to the importance of modeling "reliable" probabilities.
In many scenarios, a model's prediction is coupled with a corresponding probability, commonly referred to as a confidence score, and this score carries meaningful information: it indicates how confident the model is in the prediction, and hence how trustworthy the prediction is. However, this interpretation rests on the assumption that the model is calibrated. When a model is miscalibrated, the confidence score is no longer a meaningful indicator.
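For reference, the standard textbook formulation of this assumption (not a result of this thesis) states that a model with prediction \(\hat{Y}\), true label \(Y\), and confidence \(\hat{P}\) is perfectly calibrated when confidence matches empirical accuracy:

```latex
% Standard definition of perfect calibration (textbook formulation, not from the thesis):
% among all predictions made with confidence p, a fraction p should be correct.
\Pr\left( \hat{Y} = Y \;\middle|\; \hat{P} = p \right) = p, \qquad \forall\, p \in [0, 1]
```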
Model calibration involves refining a model so that it produces accurate probability estimates; specifically, the probability output by the model should reflect the likelihood that the corresponding prediction is correct. Calibration is arguably most significant in the natural language generation domain. In other domains, a probability is merely an indicator of the model's certainty, but for a language model that generates text in an autoregressive manner, calibration directly affects the model's outputs: a language model produces text with a decoding algorithm, and the decoding scheme operates on the probability distributions the model outputs. Because of this distinct property of language generation models, the calibration of language models merits a thorough and extensive study.
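To make this dependence concrete, the following minimal sketch (illustrative names, not code from the thesis) shows a single sampling-based decoding step: the sampler consumes the model's probability distribution directly, so a miscalibrated distribution changes which tokens get generated.

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, rng=None):
    """Sample one token from the model's predictive distribution.

    Illustrative sketch; function and parameter names are assumptions,
    not from the thesis. The decoding step consumes the probabilities
    directly, so any distortion in them distorts the generated text.
    """
    rng = rng or np.random.default_rng()
    z = logits / temperature
    z = z - z.max()                      # numerical stability
    probs = np.exp(z) / np.exp(z).sum()  # softmax over the vocabulary
    return rng.choice(len(probs), p=probs)
```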
In this thesis, we present three novel methods, specifically designed for text generation, that improve model calibration. First, we propose a student-teacher framework that calibrates a language model: given a calibrated teacher model, a student model not only benefits from the distilled knowledge but also learns to match the calibrated scores produced by the teacher. The second method is a novel regularization scheme that both improves model performance and reduces the model's calibration error. The regularizer is a variant of label smoothing, a popular regularization method; it self-regulates the extent of smoothing based on the confidence score the model produces during training, making the language model less likely to produce overconfident predictions, a common form of miscalibration (a sketch of this idea appears below). Lastly, the thesis presents a novel decoding scheme rooted in the concept of model calibration. A longstanding problem with language models is repetition in their outputs; we view this as a consequence of miscalibration, and our decoding scheme therefore applies post-hoc calibration during inference.
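As a hedged illustration of the second method, the sketch below implements one plausible version of confidence-dependent label smoothing. The scaling rule `eps = max_eps * conf` is an assumption made for illustration; the thesis's exact formulation may differ.

```python
import torch
import torch.nn.functional as F

def confidence_adaptive_smoothing_loss(logits, targets, max_eps=0.1):
    """Cross-entropy with a smoothing weight tied to model confidence.

    Illustrative sketch, not the thesis's exact formula: the smoothing
    strength grows with the probability the model assigns to its own
    top prediction, so confident examples are smoothed harder.
    """
    log_probs = F.log_softmax(logits, dim=-1)
    with torch.no_grad():
        conf = log_probs.exp().max(dim=-1).values  # per-example confidence
        eps = max_eps * conf                       # self-regulated smoothing
    nll = -log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    uniform = -log_probs.mean(dim=-1)  # cross-entropy with the uniform dist.
    return ((1.0 - eps) * nll + eps * uniform).mean()
```

In this sketch, examples on which the model is already confident receive more smoothing, pushing the training signal against overconfidence, while uncertain examples are trained almost as with plain cross-entropy.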