Task arithmetic has emerged as a simple yet powerful technique for model merging, enabling the combination of multiple finetuned models into one. Despite its empirical success, a clear theoretical explanation of why and when it works is lacking. This paper provides a rigorous theoretical foundation for task arithmetic by establishing a connection between task vectors and gradients of the task losses. We show that under standard gradient descent, a task vector generated from one epoch of finetuning is exactly equivalent to the negative gradient of the loss, scaled by the learning rate. For the practical multi-epoch setting, we prove that this equivalence holds approximately, with a second-order error term that we explicitly bound for feed-forward networks. Our empirical analysis across seven vision benchmarks corroborates our theory, demonstrating that the first-epoch gradient dominates the finetuning trajectory in both norm and direction. A key implication is that merging models finetuned for only a single epoch often yields performance comparable to merging fully converged models. These findings reframe task arithmetic as a form of approximate multitask learning, providing a clear rationale for its effectiveness and highlighting the critical role of early training dynamics in model merging.
Paper ID: 2508.16082
Title: On Task Vectors and Gradients
Authors: Luca Zhou, Daniele Solombrino, Donato Crisostomi, Maria Sofia Bucarelli, Giuseppe A. D'Inverno, Fabrizio Silvestri, Emanuele Rodolà
Categories: cs.LG, cs.AI
Venue: NeurIPS 2025 Workshop: UniReps
Paper link: https://arxiv.org/abs/2508.16082
Task arithmetic is a simple yet powerful model merging technique that combines multiple fine-tuned models into a single merged model. Although it performs well in experiments, a clear theoretical account of how it works and when it applies has been missing. The paper gives task arithmetic a rigorous theoretical foundation by establishing a connection between task vectors and the gradients of the task losses. It shows that, under standard gradient descent, a task vector produced by one epoch of fine-tuning is exactly equal to the negative gradient of the loss multiplied by the learning rate. In the practical multi-epoch setting this equivalence holds approximately, up to a second-order error term, for which the authors give an explicit bound for feed-forward networks. An empirical analysis on seven vision benchmarks supports the theory and shows that the first-epoch gradient dominates the fine-tuning trajectory in both norm and direction. A key finding is that merging models fine-tuned for only one epoch often matches the performance of merging fully converged models.
The pretrain-then-fine-tune paradigm has become a cornerstone of deep learning, allowing large general-purpose models to be adapted to countless specific tasks. This success comes at a significant cost: storing a separate fine-tuned model for every task incurs a large storage overhead, and the problem worsens as the number of specialized applications grows.
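Storage efficiency: each task requires its own fine-tuned model, so storage cost grows linearly with the number of tasks.
Missing theoretical understanding: task arithmetic performs well in experiments, but a rigorous theoretical explanation is lacking.
Unclear optimal fine-tuning strategy: it is not known how long models should be fine-tuned for merging to be most effective.
Existing approaches leave the following gaps:
Task arithmetic is simple and effective but lacks a theoretical foundation.
Prior work has only observed empirically that task vectors from short fine-tuning runs are well suited to merging, without a rigorous explanation.
A mathematical analysis of the relationship between task vectors and gradients is missing.
The paper aims to fill this theoretical gap and to clarify, through mathematical analysis, how task arithmetic works. In particular, it establishes a connection between task vectors and the gradients of multitask learning.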
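Theoretical foundation: the paper proves rigorously that a task vector obtained from a single epoch of gradient descent is a scaled negative gradient, and shows that in later epochs task arithmetic and joint multitask learning differ only by a second-order term, O(η²).
Error bound: assuming bounded weights and activation functions with bounded derivatives, it derives an explicit uniform 2-norm bound on the second-order error term for feed-forward networks.
Empirical validation: experiments on multiple vision tasks confirm that the first-epoch gradient makes the dominant contribution to the overall fine-tuning trajectory, in both norm and direction.
Practical guidance: the paper gives a theoretical rationale for why short fine-tuning benefits model merging and reframes task arithmetic as an approximation of multitask learning.
Let T be the set of tasks and |T| the number of tasks, and let θ_base denote the weights of the pretrained model. For a task t ∈ T, θ_t^(k) denotes the parameters after k epochs of fine-tuning on task t. The task vector is defined as: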
τ_t^(k) := θ_t^(k) - θ_base
The empirical loss for task t is:
L_t(θ) := (1/n_t) Σ_{i=1}^{n_t} ℓ(x_i, y_i, θ)
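Let θ_TA^(k) = θ_base + α Σ_{t∈T} τ_t^(k) be the model obtained by task arithmetic, where each θ_t^(k), t ∈ T, is produced by k epochs of full-batch gradient descent with step size η. Let θ_MT^(k) be the result of running k epochs of gradient descent on the aggregate loss Σ_{t∈T} L_t with step size αη. Then the following holds:
Exact equivalence in the first epoch: θ_TA^(1) = θ_MT^(1), since τ_t^(1) = -η ∇L_t(θ_base) for every task t.
Approximate equivalence over multiple epochs (k > 1): θ_TA^(k) = θ_MT^(k) + η²C({θ_MT^(j)}_{j=1}^{k-2}) + O(η³)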
Here C is the second-order error term:
C({θ_MT^(j)}_{j=1}^h) = Σ_{t∈T} Σ_{e=0}^h ∇²L_t(θ_MT^(e)) Σ_{m=0}^e r_t(θ_MT^(m))
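The two regimes of the theorem can be checked numerically on a toy problem. The sketch below is an illustration only, not the paper's experimental code: it assumes hypothetical quadratic task losses (the names `hessians`, `centers`, `finetune`, `task_arithmetic`, `multitask_gd`, and `gap` are made up for this example) and treats one epoch of full-batch gradient descent as a single gradient step. It does not evaluate the error term C itself; it only compares the task-arithmetic model with the multitask model.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, num_tasks = 5, 3

# Assumption: toy quadratic losses L_t(theta) = 0.5 (theta - c_t)^T A_t (theta - c_t).
hessians = []
for _ in range(num_tasks):
    m = rng.normal(size=(dim, dim))
    hessians.append(m @ m.T / dim + np.eye(dim))  # symmetric positive definite A_t
centers = [rng.normal(size=dim) for _ in range(num_tasks)]
theta_base = rng.normal(size=dim)


def grad(t, theta):
    """Full-batch gradient of the toy loss L_t at theta."""
    return hessians[t] @ (theta - centers[t])


def finetune(t, lr, epochs):
    """k epochs of full-batch GD on task t; one epoch is one gradient step."""
    theta = theta_base.copy()
    for _ in range(epochs):
        theta = theta - lr * grad(t, theta)
    return theta


def task_arithmetic(lr, alpha, epochs):
    """theta_TA^(k) = theta_base + alpha * sum_t tau_t^(k)."""
    taus = [finetune(t, lr, epochs) - theta_base for t in range(num_tasks)]
    return theta_base + alpha * np.sum(taus, axis=0)


def multitask_gd(lr, alpha, epochs):
    """theta_MT^(k): k epochs of GD on sum_t L_t with step size alpha * lr."""
    theta = theta_base.copy()
    for _ in range(epochs):
        total_grad = sum(grad(t, theta) for t in range(num_tasks))
        theta = theta - alpha * lr * total_grad
    return theta


def gap(lr, alpha, epochs):
    """Euclidean distance between the merged and the multitask parameters."""
    return np.linalg.norm(task_arithmetic(lr, alpha, epochs) - multitask_gd(lr, alpha, epochs))


if __name__ == "__main__":
    print("1 epoch :", gap(0.01, 0.3, 1))
    print("2 epochs:", gap(0.01, 0.3, 2), gap(0.02, 0.3, 2))
```

With one epoch the gap should vanish up to floating-point rounding. With two epochs, doubling η should multiply the gap by about four, which is the O(η²) behaviour the theorem describes. On these quadratic toy losses the two-epoch gap is exactly quadratic in η; for general losses higher-order terms also appear.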
The theory indicates that gradient information from the first epoch dominates the overall fine-tuning trajectory:
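Gradient norm analysis: the first epoch contributes the largest share of the total gradient norm.
Direction consistency: gradients from later epochs keep a high cosine similarity (> 0.8) with the first-epoch gradient.
Performance equivalence: merging models fine-tuned for one epoch performs comparably to merging fully converged models.
For a feed-forward network of depth L, under the assumptions of bounded weights, bounded inputs, and activation functions with bounded derivatives: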
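General activation functions:
||C({θ_MT^(j)}_{j=1}^h)||_2 ≤ T · binom(h+2, 2) · |αT+1| · H_max^σ · G_max^σ
ReLU activation functions:
||C({θ_MT^(j)}_{j=1}^h)||_2 ≤ T · binom(h+2, 2) · |αT+1| · H_max^ReLU · G_max^ReLU
Here H_max and G_max are upper bounds on the Hessian and the gradient, respectively. In these bounds T denotes the number of tasks |T|, and binom(h+2, 2) = (h+1)(h+2)/2 is the number of index pairs (e, m) in the double sum that defines C.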
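The gradient-norm and direction-consistency analyses described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the function name `epoch_gradient_stats` is hypothetical, each epoch is assumed to be summarized by one flattened, nonzero gradient (or update) vector, and the "normalized contribution" is assumed to mean each epoch's norm divided by the sum of all epoch norms.

```python
import numpy as np


def epoch_gradient_stats(epoch_grads):
    """Per-epoch norm share and cosine similarity to the first-epoch gradient.

    epoch_grads: sequence of 1-D arrays of equal length, one per epoch,
    all assumed to be nonzero.
    """
    grads = np.stack(epoch_grads)              # shape (num_epochs, num_params)
    norms = np.linalg.norm(grads, axis=1)      # 2-norm of each epoch's gradient
    norm_share = norms / norms.sum()           # assumed normalization: shares sum to 1
    first_dir = grads[0] / norms[0]            # unit vector along the first-epoch gradient
    cosine_to_first = (grads @ first_dir) / norms
    return norm_share, cosine_to_first


if __name__ == "__main__":
    toy = [np.array([3.0, 4.0]), np.array([3.0, 0.0]), np.array([0.0, 2.0])]
    share, cosine = epoch_gradient_stats(toy)
    print(share, cosine)
```

In practice the vectors would come from flattening the parameter gradients (or per-epoch parameter differences) of the fine-tuned model. A dominant first entry in `norm_share` together with values of `cosine_to_first` close to 1 corresponds to the first-epoch dominance the theory predicts.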
The experiments are run on the following vision benchmark datasets:
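CIFAR-100, SVHN, RESISC45, MNIST, EuroSAT, GTSRB, DTD, SUN397
One epoch versus convergence: compares the performance of merging models fine-tuned for one epoch against merging fully converged models.
Gradient analysis: analyzes the normalized contribution of each epoch's gradient norm.
Direction consistency: computes the cosine similarity between gradients from different epochs.
Parameter-space trajectories: uses PCA to visualize the parameter-space trajectories of different merging strategies.
Merging methods considered: standard task arithmetic (Task Arithmetic), TIES-merging, Model Breadcrumbs, DARE, and iterative task arithmetic (Iterative TA).
Performance equivalence: on all test datasets, merging models fine-tuned for one epoch performs almost as well as merging fully converged models, and in some cases better.
First-epoch dominance: the first epoch accounts for 0.3 to 0.7 of the normalized gradient norm, and the cosine similarity between the gradients of the first five epochs and the first-epoch gradient stays at or above 0.8.
Parameter-space analysis: through updates with small step sizes, iterative task arithmetic can steer the model to a different, lower-loss region.
The experiments validate each aspect of the theoretical predictions: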
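The dominant role of the first-epoch gradient is confirmed.
The second-order error term introduced in later epochs is verified to be relatively small.
Short fine-tuning is confirmed to benefit model merging.
Task proficiency ≠ merging ability: highly specialized models do not necessarily yield better merging results.
Importance of early dynamics: early training dynamics are important for successful model merging.
Quality of the gradient approximation: the quality of a task vector as an approximation of the true multitask gradient degrades as fine-tuning gets longer.
Related work spans model merging and multitask learning:
Work on linear mode connectivity shows that linear paths exist between models that share an initialization.
Permutation-based merging methods resolve symmetry issues through optimal transport or matching.
Task vectors express task-specific updates as increments to a shared model.
Extensions reduce interference through sparsity, prioritization, masking, and similar mechanisms.
Conventional multitask learning improves performance through shared representations and inductive biases.
Methods such as gradient surgery resolve gradient conflicts between tasks.
The main conclusions are:
Theoretical breakthrough: the paper establishes the first rigorous mathematical connection between task vectors and gradients.
Practical guidance: it demonstrates the effectiveness of one-epoch fine-tuning and offers guidance for real applications.
New perspective: it reframes task arithmetic as an approximation of multitask learning.
The analysis has the following limitations:
Theoretical assumptions: the analysis is based on full-batch gradient descent, whereas SGD is what is mostly used in practice.
Network architecture: the explicit bounds cover only feed-forward networks, and modern architectures (CNNs, Transformers) are more complex.
Experimental scope: validation is mainly on vision tasks, and applicability to other domains needs further verification.
Possible future directions include:
Extension to SGD: extend the theory to the stochastic gradient descent setting.
Complex architectures: provide theoretical bounds for CNNs, Transformers, and similar architectures.
Second-order term: study when the second-order error term is negligible or can be approximated.
Unified understanding: explore connections with concepts such as early stopping and flat versus sharp minima.
Strengths of the paper:
Notable theoretical contribution: it fills an important gap in the theoretical understanding of task arithmetic.
Rigorous mathematical analysis: it provides complete proofs and explicit error bounds.
Thorough empirical validation: the theoretical predictions are supported experimentally on multiple datasets.
High practical value: it offers theoretical guidance for model merging strategies.
Weaknesses of the paper:
Strong assumptions: the full-batch GD assumption leaves a gap with real applications.
Architectural restriction: the theoretical results apply mainly to simple feed-forward networks.
Narrow task range: the experiments focus mainly on vision classification tasks.
Expected impact:
Academic value: it provides an important theoretical foundation for the model merging field.
Practical significance: it points toward more efficient model merging strategies.
Inspiration for later work: it gives follow-up research a new theoretical framework.
Applicable scenarios:
Multitask deployment: settings where several specialized models must be merged into one unified model.
Resource-constrained environments: applications with limited storage and compute.
Rapid adaptation: settings where multitask capability must be acquired quickly.
The paper cites important work in areas such as model merging, task vectors, and multitask learning, including: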
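Ilharco et al. (2022) - the original work on task arithmetic
Zhou et al. (2025) - iterative task arithmetic
Ortiz-Jimenez et al. (2024) - task arithmetic in the tangent space
Wortsman et al. (2022) - the model soups method
Through rigorous mathematical analysis, the paper gives task arithmetic a theoretical foundation. It not only explains why the method is effective but also offers useful guidance for practical applications. Despite some limitations in its theoretical assumptions, its contributions matter for understanding and improving model merging techniques.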