THESIS
2023
1 online resource (xiii, 101 pages) : illustrations (chiefly color)
Abstract
The dissemination of erroneous or misleading information, known as misinformation, has become
a major societal concern due to its potential for catastrophic consequences. For instance, false
rumors about medical advice could result in detrimental outcomes, while the spread of biased news
has been linked to political polarization. Yet, given the sheer volume of information generated and
shared, verifying its accuracy in a timely manner is not practically feasible through
human labor alone. Consequently, it is imperative to develop efficient and effective methods for
automatically mitigating misinformation.
One effective means of automatically addressing the issue of misinformation involves utilizing
natural language processing (NLP) for fact-checking. This approach involves retrieving relevant
evidence from a knowledge base and then verifying the accuracy of the information against this
evidence. This approach is considered reliable because its verdicts are grounded in facts and evidence. Many explorations
and advancements have been made in fact-checking methods; however, several critical limitations
remain: (1) A substantial amount of expensive data is required to properly train a fact-checking
model. (2) For rapidly evolving misinformation, there is often no reliable evidence available to do
proper fact-checking; this is especially the case for misinformation related to political and economic events. (3) Fact-checking fails to properly capture misinformation involving framing bias. This is
because framing bias is not always incorrect but instead involves subjective or nuanced wording that
deliberately conveys misleading impressions about what really happened. In this thesis, we aim to
improve and go beyond existing fact-checking solutions to more exhaustively cover misinformation.
Firstly, we introduce novel zero-shot fact-checking techniques that leverage large language models
(LLMs) to verify information in a data-efficient manner. Because LLMs are trained on vast corpora of text, they
acquire rich world knowledge that makes them an effective tool for fact-checking purposes. Our
research demonstrates that the likelihood (and perplexity) of LLMs can serve as a reliable proxy
metric for the factual accuracy of information. We propose techniques to leverage this ability of
LLMs to efficiently fact-check misinformation.
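As a minimal sketch of the perplexity-as-proxy idea: perplexity is the exponential of the negative mean token log-probability an LM assigns to a statement, and a lower value indicates the model finds the statement more plausible. The helper function and the per-token log-probabilities below are illustrative placeholders, not the thesis's actual model or numbers.

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp of the negative mean per-token log-probability."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

# Hypothetical per-token log-probs an LM might assign (illustrative numbers):
# a factual claim tends to receive higher probability than a false variant.
true_claim_lps = [-1.2, -0.8, -0.5, -1.0]    # e.g. "Paris is the capital of France"
false_claim_lps = [-1.2, -0.8, -3.9, -2.7]   # e.g. "Lyon is the capital of France"

ppl_true = perplexity(true_claim_lps)
ppl_false = perplexity(false_claim_lps)

# Lower perplexity -> the model finds the statement more plausible.
verdict = "likely true" if ppl_true < ppl_false else "likely false"
```

In practice the log-probabilities would come from scoring the claim with a pretrained LM; the comparison logic above is the zero-shot proxy signal.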
Secondly, we show that misinformation without evidence can still be effectively tackled by
relying on stylistic features. We propose to learn generalizable and discriminative features,
such as stylistic and linguistic patterns, associated with the misinformation itself, and to leverage
them to detect misinformation. To achieve this, we jointly train an LLM on a variety of different
domains of misinformation and empirically verify the effectiveness of our model in detecting
misinformation in previously unseen events.
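To make the cross-domain setup concrete, the sketch below pools training examples from several misinformation domains and holds one domain out to simulate a previously unseen event. The domain names, toy records, and stylistic features are illustrative placeholders, not the thesis's actual data or feature set.

```python
# Toy (text, is_misinformation) records per domain -- illustrative only.
datasets = {
    "health":   [("miracle cure!!!", 1), ("vaccine trial results published", 0)],
    "politics": [("SHOCKING coverup exposed", 1), ("senate passes budget bill", 0)],
    "finance":  [("stock guaranteed to 10x", 1), ("company reports Q3 earnings", 0)],
}

def stylistic_features(text):
    # Toy stand-ins for the stylistic/linguistic cues a model would learn:
    # exclamation count and uppercase-character ratio.
    return (
        text.count("!"),
        sum(c.isupper() for c in text) / max(len(text), 1),
    )

def leave_one_domain_out(held_out):
    """Pool all other domains for training; evaluate on the held-out domain."""
    train = [ex for d, exs in datasets.items() if d != held_out for ex in exs]
    test = datasets[held_out]
    return train, test

train, test = leave_one_domain_out("finance")
```

Evaluating on the held-out domain measures whether the learned features generalize beyond the events seen during training, which is the property the joint multi-domain training targets.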
Lastly, we propose to mitigate framing bias in misinformation by fine-tuning an LLM to
generate a neutralized alternative. Since articles from conflicting news outlets still share the “same
set of underlying facts” [28], we propose a new task of generating a neutral summary from a
group of polarized and biased news articles. For this new task, we establish a new benchmark by
collecting a dataset, designing a set of evaluation metrics, and proposing strong baseline models.