Test cases are essential for validating the reliability and quality of software applications. Recent studies have demonstrated the capability of Large Language Models (LLMs) to generate useful test cases for given source code. However, existing work primarily relies on human-written plain prompts, which often leads to suboptimal results since the performance of LLMs can be highly influenced by the prompts. Moreover, these approaches use the same prompt for all LLMs, overlooking the fact that different LLMs might be best suited to different prompts. Given the wide variety of possible prompt formulations, automatically discovering the optimal prompt for each LLM presents a significant challenge. Although there are methods for automated prompt optimization in the natural language processing field, they struggle to produce effective prompts for the test case generation task. First, these methods iteratively optimize prompts by simply combining and mutating existing ones without proper guidance, resulting in prompts that lack diversity and tend to repeat the same errors in the generated test cases. Second, the prompts generally lack domain contextual knowledge, limiting LLMs' performance on the task.
Paper ID: 2501.01329
Title: The Prompt Alchemist: Automated LLM-Tailored Prompt Optimization for Test Case Generation
Authors: Shuzheng Gao, Chaozheng Wang, Cuiyun Gao, Xiaoqian Jiao, Chun Yong Chong, Shan Gao, Michael R. Lyu
Categories: cs.SE cs.AI cs.CL
Publication date/venue: JOURNAL OF LATEX CLASS FILES, VOL. 18, NO. 9, SEPTEMBER 2020
Paper link: https://arxiv.org/abs/2501.01329

Test cases are indispensable for verifying the reliability and quality of software applications. Recent research shows that large language models (LLMs) can generate useful test cases for given source code. However, existing work relies mainly on simple hand-written prompts, which often yields suboptimal results because LLM performance depends heavily on prompt quality. In addition, these methods use the same prompt for every LLM, ignoring the fact that different LLMs may be best suited to different prompts. This paper proposes MAPS, a method that achieves automated prompt optimization for different LLMs through three core modules: diversity-guided prompt generation, failure-driven rule induction, and domain contextual knowledge extraction.

Test case generation is an important task in software engineering. Traditional methods such as Evosuite and Randoop rely on search and constraint techniques. LLM-based methods show promise, but the following problems remain:
- They rely on simple hand-written prompts, so performance is suboptimal
- They use the same prompt for every LLM, ignoring differences between LLMs
- They lack optimization specific to the test case generation task

Writing test cases by hand is time-consuming and difficult, high-quality test cases are essential to software quality assurance, and prompt optimization is needed to fully bring out the strong code understanding and generation abilities of LLMs.

Through preliminary experiments, the authors found that existing automated prompt optimization (APO) methods have three main problems on the test case generation task:
Low diversity: the generated prompts lack diversity and easily fall into local optima
Recurring errors: the optimized prompts still produce the same errors as the original prompts
Lack of domain knowledge: necessary project-level context information, such as inheritance relationships and class invocation information, is missing

First study: to the authors' knowledge, this is the first study of LLM-tailored prompt optimization specific to the test case generation task
Novel method: proposes MAPS, which integrates diversity-guided prompt generation, failure-driven rule induction, and domain contextual knowledge extraction
Significant improvement: experiments on three popular LLMs show that MAPS improves line coverage by 6.19% and branch coverage by 5.03% on average over the strongest baseline method
LLM-tailored: demonstrates the effectiveness of generating customized prompts for different LLMs

Given a black-box model M, a small development set Ddev, a test set Dtest, and a scoring function s(·), APO aims to discover, from the natural-language space and based on Ddev, an optimized prompt p that maximizes the performance of M on the test set Dtest.
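One way to write the task definition above as a formula is sketched below. The symbol $\mathcal{P}$ for the natural-language prompt space, the per-example variable $x$, and the use of an average over the set are notational assumptions introduced here; the paper may formalize the objective differently.

```latex
% Prompt search on the development set (sketch, notation assumed)
p^{*} = \arg\max_{p \in \mathcal{P}} \; \frac{1}{|D_{dev}|} \sum_{x \in D_{dev}} s\bigl(M(p, x)\bigr)

% The selected prompt is then judged on the held-out test set
\mathrm{Perf}(p^{*}) = \frac{1}{|D_{test}|} \sum_{x \in D_{test}} s\bigl(M(p^{*}, x)\bigr)
```

Here $M(p, x)$ stands for the test cases the black-box model produces for input $x$ under prompt $p$, and $s(\cdot)$ could be, for example, a coverage score.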
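MAPS consists of three core modules: domain contextual knowledge extraction, diversity-guided prompt generation, and failure-driven rule induction.

The domain contextual knowledge extraction module provides the LLM with relevant project-level context information:

In-file context knowledge:
- Class signature: the type and name of the class that contains the focal method
- Focal method: the specific method for which test cases are to be generated
- Member method signatures: the function signatures of the other methods in the class

Cross-file context knowledge:
- Class inheritance information: for an abstract or private class, the whole project is scanned to identify its subclasses
- Class invocation information: the parameter types of the focal method are identified, and the definitions and constructors of user-defined types are traced

The diversity-guided prompt generation module creates diversified prompts by exploring different modification paths:

Algorithm 2: PROMPTIMPROVEMENT
1. Select the K best-performing prompts
2. Generate N different modification methods
3. Generate a new prompt based on each modification method
4. Merge the selected prompts with the newly generated prompts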
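The four steps of PROMPTIMPROVEMENT can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function names, the three callables standing in for LLM calls and dev-set scoring, and the choice to generate N modification methods per selected prompt are all assumptions made here.

```python
from typing import Callable, List, Tuple

Scored = Tuple[str, float]  # (prompt text, score on the development set)


def prompt_improvement(
    scored_prompts: List[Scored],
    k: int,
    n: int,
    propose_modifications: Callable[[str, int], List[str]],  # hypothetical LLM call
    apply_modification: Callable[[str, str], str],           # hypothetical LLM call
    score: Callable[[str], float],                           # hypothetical dev-set scorer
) -> List[Scored]:
    """Run one improvement round and return the merged pool, best first."""
    # Step 1: keep the K best-performing prompts.
    top_k = sorted(scored_prompts, key=lambda ps: ps[1], reverse=True)[:k]
    merged = dict(top_k)
    for prompt, _ in top_k:
        # Step 2: request N modification methods, dropping exact duplicates
        # so that each new prompt follows a different modification path.
        methods = list(dict.fromkeys(propose_modifications(prompt, n)))[:n]
        # Step 3: generate one new prompt per modification method.
        for method in methods:
            candidate = apply_modification(prompt, method)
            if candidate not in merged:
                merged[candidate] = score(candidate)
    # Step 4: merge the selected and the newly generated prompts.
    return sorted(merged.items(), key=lambda ps: ps[1], reverse=True)


# Stub callables so the sketch runs without any LLM access.
def fake_propose(prompt: str, n: int) -> List[str]:
    return [f"modification-{i}" for i in range(n)]


def fake_apply(prompt: str, method: str) -> str:
    return f"{prompt} [{method}]"


def fake_score(prompt: str) -> float:
    return 0.5


seeds = [
    ("Write JUnit tests for the focal method.", 0.40),
    ("Generate unit tests.", 0.45),
    ("Test it.", 0.10),
]
pool = prompt_improvement(
    seeds,
    k=2,
    n=2,
    propose_modifications=fake_propose,
    apply_modification=fake_apply,
    score=fake_score,
)
```

With K = 2 and N = 2 the merged pool holds the two selected prompts plus four new candidates; the lowest-scoring seed is dropped in step 1.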
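The failure-driven rule induction module induces rules by analyzing failed test cases, so that recurring errors are avoided:

Failure information selection:
- Collect failed test cases and their error information
- Aggregate the failure information with the DBSCAN clustering algorithm
- Perform weighted sampling based on cluster size and similarity to past failures

Error analysis:
- Select representative failure cases and build an analysis prompt
- Have the LLM provide a detailed explanation and a solution
- Convert the explanation and solution into concise rules

Rule validation:
- Validate the effectiveness of each newly generated rule
- Keep the best-performing rule

Diversity guarantee: forcing the use of different modification methods ensures prompt diversity and avoids local optima
Learning from failures: the optimization process is guided by rules induced from failure cases
Context enhancement: project-level context information helps the LLM generate accurate test cases
Soft integration: the analysis output is converted into concise rules, which avoids the performance drop caused by overly long prompts

The evaluation uses the widely adopted Defects4J benchmark, covering five Java projects:
- Apache Commons CLI (29 bugs)
- Apache Commons CSV (15 bugs)
- Google Gson (17 bugs)
- JFreeChart (26 bugs)
- Apache Commons Lang (60 bugs)
Total: 147 bugs, 85 focal classes, 5,278 focal methods

Line coverage (%): the percentage of code lines executed during testing
Branch coverage (%): the percentage of branches executed during testing

LLM models:
- ChatGPT (gpt-3.5-turbo-0125)
- Llama-3.1-70B-Instruct
- Qwen2-72B-Instruct

Baseline methods:
- Basic: the performance of the best seed prompt
- APE: directly asks the LLM to generate semantics-preserving prompt variants
- OPRO: incorporates performance information to generate new prompts
- EVOPROMPT (GA/DE): a recent prompt optimization method based on evolutionary algorithms

- Number of seed prompts: 5
- Prompts generated per round: 2
- Maximum number of iterations: 5
- Development set: 10 randomly sampled bugs
- Each experiment is repeated three times and the average result is reported

Performance on ChatGPT:
- Line coverage: MAPS reaches 53.80%, versus 46.63% for the strongest baseline, EVOPROMPT (GA), an improvement of 7.17 percentage points
- Branch coverage: MAPS reaches 41.84%, versus 35.88% for the strongest baseline, an improvement of 5.96 percentage points

Performance on Llama-3.1:
- Line coverage: MAPS reaches 50.59%, versus 46.52% for the strongest baseline, an improvement of 4.07 percentage points
- Branch coverage: MAPS reaches 39.50%, versus 35.07% for the strongest baseline, an improvement of 4.43 percentage points

Performance on Qwen2:
- Line coverage: MAPS reaches 45.51%, versus 39.41% for the strongest baseline, an improvement of 6.10 percentage points
- Branch coverage: MAPS reaches 32.71%, versus 28.92% for the strongest baseline, an improvement of 3.79 percentage points

Contribution of each module (ChatGPT as the example):
- Removing domain contextual knowledge extraction: line coverage drops by 9.64%, branch coverage by 8.53%
- Removing diversity-guided prompt generation: line coverage drops by 8.21%, branch coverage by 7.80%
- Removing failure-driven rule induction: line coverage drops by 6.94%, branch coverage by 4.76%

The experiments verify that MAPS can generate customized prompts for different LLMs:
- Each LLM performs best with its own optimized prompt
- On ChatGPT, the final prompt optimized for ChatGPT achieves line coverage 2.45% and 2.66% higher, respectively, than the prompts optimized for the other two LLMs
- All MAPS-optimized prompts outperform the manually designed prompts

Case 1 - Llama-3.1: with the second induced rule, the model correctly generates test cases that include exception handling
Case 2 - ChatGPT: with cross-file context knowledge, the model is able to initialize an abstract class correctly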
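Rules like the one in Case 1 come out of failure-driven rule induction, whose first step selects which failures to analyze. The sketch below illustrates only the cluster-size part of that weighted sampling, using the standard library. It assumes cluster labels have already been produced by a DBSCAN run (with -1 marking noise, as in scikit-learn's convention). The similarity-to-past-failures term is left out because the summary does not say how it enters the weight. The function name and the sample error messages are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def sample_failure_representatives(
    failures: Sequence[str],
    labels: Sequence[int],
    num_samples: int,
    seed: int = 0,
) -> List[str]:
    """Pick one representative failure from each of up to `num_samples`
    clusters, drawing clusters without replacement with probability
    proportional to cluster size."""
    clusters: Dict[int, List[str]] = defaultdict(list)
    for failure, label in zip(failures, labels):
        if label != -1:  # skip points labelled as noise by the clustering
            clusters[label].append(failure)

    rng = random.Random(seed)
    remaining = dict(clusters)
    picked: List[str] = []
    while remaining and len(picked) < num_samples:
        cluster_ids = list(remaining)
        weights = [len(remaining[c]) for c in cluster_ids]
        chosen = rng.choices(cluster_ids, weights=weights, k=1)[0]
        # Assumption: the first member of a cluster serves as its representative.
        picked.append(remaining.pop(chosen)[0])
    return picked


# Hypothetical failure messages with precomputed cluster labels.
failures = [
    "NullPointerException in testParse",
    "NullPointerException in testFormat",
    "NullPointerException in testRead",
    "cannot find symbol: class Foo",
    "cannot find symbol: class Bar",
    "timeout in testLoop",
]
labels = [0, 0, 0, 1, 1, -1]
picked = sample_failure_representatives(failures, labels, num_samples=2)
```

Each sampled representative would then go into the analysis prompt from which the LLM derives an explanation, a solution, and finally a concise rule.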
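- APE: directly asks the LLM to generate semantics-preserving prompt variants
- OPRO: incorporates performance information to guide prompt generation
- EVOPROMPT: a recent method based on evolutionary algorithms

- Traditional methods: Randoop (random fuzzing), Evosuite (search algorithms)
- Deep learning methods: AthenaTest (fine-tuned BART), A3Test (enhanced with assertion knowledge)
- LLM methods: ChatUniTest, ChatTESTER, and others

- MAPS substantially outperforms existing prompt optimization methods on all LLMs
- Different LLMs do need customized prompts
- All three core modules contribute materially to the performance gain, with domain contextual knowledge extraction contributing the most

LLM limitation: the evaluation covers only three representative LLMs
Language limitation: the experiments are limited to Java projects and do not cover other programming languages
Dataset scope: only the Defects4J benchmark is used

- Extension to more LLMs and programming languages
- Integration with existing LLM-based test generation methods
- Exploration of more complex project-level context information

Clear problem definition: the first systematic study of prompt optimization for LLM-based test case generation
Highly novel method: the three-module design is well reasoned and addresses key problems of existing methods
Thorough experiments: a comprehensive evaluation across multiple LLMs and multiple projects
High practical value: the method is general and applies to different LLMs and projects

Computational cost: the iterative optimization process may require a large number of API calls, which is costly
Rule quality: failure-driven rule induction depends on the LLM's analysis ability, so rule quality may be unstable
Context extraction: the completeness and accuracy of cross-file context extraction need further verification

Academic contribution: opens up a research direction on prompt optimization for LLM-based test case generation
Practical value: directly applicable to test case generation in real software development
Reproducibility: a complete replication package is provided, which eases follow-up research

- Software projects that need automatically generated, high-quality test cases
- Teams that use different LLMs for code generation
- Software engineering tasks in which LLM performance needs to be optimized

The paper cites 48 related works, covering important research in several fields, including software testing, prompt engineering, and large language models, which gives the study a solid theoretical foundation.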
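Overall assessment: this is a high-quality software engineering research paper with both theoretical and practical value for LLM-based test case generation. The method design is well reasoned, the experimental evaluation is thorough, and the results are convincing. Some limitations remain, but the overall contribution is significant and gives the field an important push forward.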