THESIS
2023
1 online resource (xvii, 175 pages) : illustrations (some color)
Abstract
In the burgeoning realm of artificial intelligence, foundation models stand out as a pivotal advancement. Foundation models consolidate information from broad data across various modalities and adapt to a wide range of downstream tasks. Their general-purpose nature allows impressive performance across diverse tasks, yet domain-aware adaptation can further amplify their efficacy. This thesis focuses on the adaptation phase in the lifecycle of foundation models, which spans pretraining, adaptation, and deployment, and offers an in-depth exploration of taming language and vision-language models for particular domains and tasks. I propose a series of approaches to adapt foundation models to specific domains or tasks efficiently and effectively, enhancing their real-world applicability. First, I explore model-based adaptation, where the full model is trained from scratch to obtain better performance in the Chinese domain (ZEN) and the vision-language domain (DaVinci). Second, I investigate two adapter-based methods that improve efficiency: T-DNA, a domain-aware n-gram adapter that explicitly leverages multi-granularity information from domain-specific n-grams, and MixDA, which combines multiple domain adapters through a mixture mechanism. In the third part of this thesis, I present two prompt-based adaptation methods: BDPL updates a small set of prompts without access to model parameters or gradients, and Active-Prompt judiciously selects the most helpful questions for annotation, improving the performance of black-box foundation models. Together, these techniques, spanning architectural modifications, training strategies, and prompting methods, enhance foundation model performance in specific domains and tasks while keeping adaptation resource-efficient, scalable, and effective.
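To make the mixture-of-adapters idea behind MixDA concrete, here is a minimal PyTorch sketch of the general pattern: several bottleneck adapters whose outputs are blended by a learned gate. The class names, bottleneck size, and per-token softmax gating are illustrative assumptions, not the thesis's actual design.

```python
# Hypothetical sketch of a mixture of domain adapters (illustrative only,
# not the MixDA implementation from the thesis).
import torch
import torch.nn as nn


class BottleneckAdapter(nn.Module):
    """Standard bottleneck adapter: down-project, nonlinearity, up-project, residual."""

    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)
        self.act = nn.GELU()

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return h + self.up(self.act(self.down(h)))


class AdapterMixture(nn.Module):
    """Blends the outputs of several domain adapters with a learned per-token gate."""

    def __init__(self, hidden_dim: int, num_domains: int):
        super().__init__()
        self.adapters = nn.ModuleList(
            BottleneckAdapter(hidden_dim) for _ in range(num_domains)
        )
        self.gate = nn.Linear(hidden_dim, num_domains)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, seq_len, hidden_dim)
        weights = torch.softmax(self.gate(h), dim=-1)                 # (B, T, K)
        outputs = torch.stack([a(h) for a in self.adapters], dim=-1)  # (B, T, H, K)
        return (outputs * weights.unsqueeze(2)).sum(dim=-1)           # (B, T, H)


mixture = AdapterMixture(hidden_dim=768, num_domains=3)
x = torch.randn(2, 16, 768)
print(mixture(x).shape)  # torch.Size([2, 16, 768])
```

Because only the small adapters and the gate are trained while the backbone stays frozen, this pattern captures why adapter-based adaptation is far cheaper than retraining the full model.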