Can Large Language Models Improve SE Active Learning via Warm-Starts?
Senthilkumar, Menzies
When SE data is scarce, "active learners" use models learned from tiny samples of the data to find the next most informative example to label. In this way, effective models can be generated using very little data. For multi-objective software engineering (SE) tasks, active learning can benefit from an effective set of initial guesses (also known as "warm starts"). This paper explores the use of Large Language Models (LLMs) for creating warm-starts. Those results are compared against Gaussian Process Models and Tree of Parzen Estimators. For 49 SE tasks, LLM-generated warm starts significantly improved the performance of low- and medium-dimensional tasks. However, LLM effectiveness diminishes in high-dimensional problems, where Bayesian methods like Gaussian Process Models perform best.
When software engineering (SE) data is scarce, "active learners" use models learned from small data samples to identify the next most informative example to label. In this way, effective models can be built from very little data. For multi-objective SE tasks, active learning can benefit from a good set of initial guesses, also known as "warm-starts." This paper explores using large language models (LLMs) to create warm-starts and compares the results with Gaussian Process Models and Tree of Parzen Estimators. Across 49 SE tasks, LLM-generated warm-starts significantly improve performance on low- and medium-dimensional tasks. However, LLM effectiveness diminishes on high-dimensional problems, where Bayesian methods such as Gaussian Process Models perform best.
This paper proposes using LLMs' background knowledge to generate better initial guesses (warm-starts) to improve active learning performance on SE multi-objective optimization tasks.
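To make the role of a warm-start concrete, the following is a minimal sketch of an active learning loop that begins from a supplied set of initial guesses. It is not the paper's implementation: the names `active_learn`, `label`, and `warm`, the square-root best/rest split, and the centroid-based acquisition rule are all assumptions chosen for illustration, and the hand-picked `warm` list merely stands in for configurations an LLM might suggest.

```python
import math

def centroid(rows):
    """Component-wise mean of a list of equal-length tuples."""
    return [sum(col) / len(col) for col in zip(*rows)]

def active_learn(pool, label, warm_start, budget):
    """Label at most `budget` items from `pool`, starting from `warm_start`.

    `label(x)` is a hypothetical oracle returning a score (lower is better).
    Returns the best labeled item and the number of labels spent.
    """
    labeled = {x: label(x) for x in warm_start}
    while len(labeled) < budget:
        todo = [x for x in pool if x not in labeled]
        if not todo:
            break
        # Split what is labeled so far into a small "best" set and the "rest".
        ranked = sorted(labeled, key=labeled.get)
        cut = max(1, int(math.sqrt(len(ranked))))
        best = centroid(ranked[:cut])
        rest = centroid(ranked[cut:] or ranked[:cut])
        # Assumed acquisition rule: prefer items near "best" and far from "rest".
        nxt = min(todo, key=lambda x: math.dist(x, best) - math.dist(x, rest))
        labeled[nxt] = label(nxt)
    return min(labeled, key=labeled.get), len(labeled)

# Toy problem: 121 two-dimensional candidates, score = largest coordinate.
pool = [(a / 10, b / 10) for a in range(11) for b in range(11)]
label = lambda x: max(x)
# Stand-in for LLM-suggested starting configurations (hypothetical values).
warm = [(2 / 10, 3 / 10), (9 / 10, 8 / 10), (5 / 10, 5 / 10), (1 / 10, 7 / 10)]
best, n_labels = active_learn(pool, label, warm, budget=12)
print(best, n_labels)
```

In this sketch the quality of `warm` determines where the first "best" and "rest" centroids fall, which is why better initial guesses can matter most when the labeling budget is small.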
On the SS-A dataset, for example, LLM/exploit takes the top rank (rank 0) across the budgets reported, with a median Chebyshev distance of 0.07 to 0.08 compared with 0.18 for the baseline (lower is better).
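For readers unfamiliar with the metric, a Chebyshev distance in multi-objective optimization is commonly taken as the largest per-objective gap between a solution and the ideal point after normalization. The sketch below assumes that reading, with min-max normalization and an ideal of 1 for maximized objectives and 0 for minimized ones; the function name and the example objectives are hypothetical and the paper's exact formulation may differ.

```python
def chebyshev_to_ideal(row, lo, hi, maximize):
    """Largest normalized gap between `row` and the ideal point.

    row      : objective values for one solution
    lo, hi   : per-objective minimum and maximum used for normalization
    maximize : per-objective flag, True if larger values are better
    """
    worst = 0.0
    for y, l, h, up in zip(row, lo, hi, maximize):
        norm = (y - l) / (h - l) if h > l else 0.0  # scale to 0..1
        ideal = 1.0 if up else 0.0
        worst = max(worst, abs(norm - ideal))
    return worst

# Hypothetical objectives: throughput (maximize) and latency (minimize).
lo, hi = (0.0, 10.0), (100.0, 50.0)
maximize = (True, False)
d = chebyshev_to_ideal((90.0, 20.0), lo, hi, maximize)
print(d)
```

Under this definition a score of 0 means every objective sits at its ideal value, so smaller medians indicate solutions closer to the ideal on their worst objective.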
The paper cites 87 related references covering multiple domains including active learning, multi-objective optimization, software engineering, and large language models, providing a solid theoretical foundation for the research.
Summary: This is an innovative contribution to software engineering optimization that systematically explores LLM-generated warm-starts for active learning. Despite its limitations, notably weaker results on high-dimensional problems, the large-scale experimental validation across 49 SE tasks and the practical value of the approach make it an important contribution to the field.