CardRewriter: Leveraging Knowledge Cards for Long-Tail Query Rewriting on Short-Video Platforms
Gong, Zhu, Yin et al.
Short-video platforms have rapidly become a new generation of information retrieval systems, where users formulate queries to access desired videos. However, user queries, especially long-tail ones, often suffer from spelling errors, incomplete phrasing, and ambiguous intent, resulting in mismatches between user expectations and retrieved results. While large language models (LLMs) have shown success in long-tail query rewriting within e-commerce, they struggle on short-video platforms, where proprietary content such as short videos, live streams, micro dramas, and user social networks falls outside their training distribution. To address this challenge, we introduce CardRewriter, an LLM-based framework that incorporates domain-specific knowledge to enhance long-tail query rewriting. For each query, our method aggregates multi-source knowledge relevant to the query and summarizes it into an informative and query-relevant knowledge card. This card then guides the LLM to better capture user intent and produce more effective query rewrites. We optimize CardRewriter using a two-stage training pipeline: supervised fine-tuning followed by group relative policy optimization, with a tailored reward system balancing query relevance and retrieval effectiveness. Offline experiments show that CardRewriter substantially improves rewriting quality for queries targeting proprietary content. Online A/B testing further confirms significant gains in long-view rate (LVR) and click-through rate (CTR), along with a notable reduction in initiative query reformulation rate (IQRR). Since September 2025, CardRewriter has been deployed on Kuaishou, one of China's largest short-video platforms, serving hundreds of millions of users daily.
Short-video platforms have rapidly become a new generation of information retrieval systems, where users find desired videos through queries. However, user queries, particularly long-tail queries, frequently suffer from spelling errors, incomplete expressions, and ambiguous intent, resulting in mismatches between user expectations and retrieval results. While large language models (LLMs) have performed well in long-tail query rewriting for e-commerce, they struggle on short-video platforms because platform-specific content (such as short videos, live streams, micro-dramas, and user social networks) lies outside their training distribution. To address this challenge, the paper proposes CardRewriter, an LLM-based framework that enhances long-tail query rewriting by incorporating domain-specific knowledge. For each query, the method aggregates relevant knowledge from multiple sources and summarizes it into an informative, query-relevant knowledge card, which then guides the LLM to better capture user intent and produce more effective rewrites.
Proposes CardRewriter Framework: The first LLM framework specifically designed for long-tail query rewriting on short-video platforms, effectively integrating platform-specific knowledge through knowledge cards
Designs Two-Stage Training Strategy: Combines supervised fine-tuning (SFT) and group relative policy optimization (GRPO), using a customized reward system to balance query relevance and retrieval effectiveness
Validates Practical Effectiveness: Offline experiments show improved rewriting quality for queries targeting proprietary content, and online A/B tests on Kuaishou show gains in LVR and CTR together with a reduction in IQRR
Provides Complete Solution: An end-to-end pipeline covering knowledge collection, card generation, and query rewriting
Given an input query x, CardRewriter's objective is to generate a rewritten query y that can retrieve video content better aligned with user intent. The entire process can be expressed as:
$$ y = G_\theta(x, c), \qquad c = C_\theta(x, M) $$
where c is the knowledge card, M is the multi-source knowledge gathered for the query x, C_θ is the card generation model, and G_θ is the query rewriting model.
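The two-step formulation amounts to function composition: first build the card, then rewrite conditioned on it. The sketch below is an illustration, not the authors' code. The names `card_rewrite`, `toy_card`, and `toy_rewrite` are hypothetical, and the toy functions merely stand in for the LLM calls C_θ and G_θ. Because the paper writes both models with the same subscript θ, they may share one set of parameters; the sketch simply keeps them as two separate callables.

```python
from typing import Callable, Sequence

# Hypothetical signatures: in the paper, C_theta and G_theta are LLM calls.
CardGenerator = Callable[[str, Sequence[str]], str]  # (x, M) -> c
QueryRewriter = Callable[[str, str], str]            # (x, c) -> y


def card_rewrite(query: str, knowledge: Sequence[str],
                 generate_card: CardGenerator,
                 rewrite: QueryRewriter) -> tuple[str, str]:
    """Run c = C(x, M) and then y = G(x, c); return (c, y)."""
    card = generate_card(query, knowledge)  # summarize M into a card
    rewritten = rewrite(query, card)        # rewrite x conditioned on the card
    return card, rewritten


def toy_card(query: str, knowledge: Sequence[str]) -> str:
    # Toy stand-in for C_theta: keep snippets sharing a word with the query.
    words = set(query.lower().split())
    kept = [k for k in knowledge if words & set(k.lower().split())]
    return " | ".join(kept)


def toy_rewrite(query: str, card: str) -> str:
    # Toy stand-in for G_theta: append the first card entry, if any.
    first = card.split(" | ")[0] if card else ""
    return f"{query} {first}".strip()


card, y = card_rewrite(
    "dragon drama ep 3",
    ["Dragon Drama is a micro drama series", "cooking live stream"],
    toy_card, toy_rewrite,
)
print(card)  # Dragon Drama is a micro drama series
print(y)     # dragon drama ep 3 Dragon Drama is a micro drama series
```

The point of the structure is that the rewriter never sees M directly, only the card derived from it; the toy data ("Dragon Drama") is invented for illustration.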
Knowledge Card Design: Compared with injecting raw multi-source knowledge directly into the LLM input, knowledge cards mitigate three problems of that approach: structural inconsistency across sources, excessive noise, and limited relevance to the query
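To make the three problems concrete, here is a hypothetical sketch of turning raw multi-source snippets into a fixed-schema card. In CardRewriter the card is summarized by an LLM; this sketch substitutes simple rules instead: a uniform per-source layout (structural consistency), deduplication (noise), and a word-overlap filter (relevance). The `KnowledgeCard` class, the `build_card` function, the `max_per_source` cap, and the source names are all assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeCard:
    """Hypothetical fixed-schema card: one line per knowledge source."""
    query: str
    entries: dict[str, list[str]] = field(default_factory=dict)

    def render(self) -> str:
        lines = [f"Query: {self.query}"]
        for source, items in self.entries.items():
            lines.append(f"{source}: " + "; ".join(items))
        return "\n".join(lines)


def build_card(query: str, raw: dict[str, list[str]],
               max_per_source: int = 2) -> KnowledgeCard:
    """Rule-based stand-in for LLM card summarization."""
    q_terms = set(query.lower().split())
    card = KnowledgeCard(query)
    for source, snippets in raw.items():
        seen: set[str] = set()
        kept: list[str] = []
        for s in snippets:
            norm = " ".join(s.lower().split())
            if norm in seen:                 # drop near-verbatim duplicates
                continue
            seen.add(norm)
            if q_terms & set(norm.split()):  # crude relevance filter
                kept.append(s.strip())
            if len(kept) == max_per_source:  # cap the length per source
                break
        if kept:                             # omit sources with nothing relevant
            card.entries[source] = kept
    return card


raw = {
    "videos": ["Dragon Drama ep 3 recap", "dragon drama  ep 3 recap", "cat video"],
    "live": ["cooking show"],
}
print(build_card("dragon drama", raw).render())
```

A learned summarizer can also resolve misspellings and synonyms that word overlap misses, which is presumably why the paper uses an LLM for this step.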
Two-Stage Training Strategy:
SFT Stage: Supervised fine-tuning using high-quality data
GRPO Stage: Further optimization with group relative policy optimization (GRPO), a reinforcement learning method, guided by the customized reward system
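The distinctive step of GRPO is estimating each output's advantage relative to a group of outputs sampled for the same prompt, rather than from a learned value function. A minimal sketch of that step for a group of sampled rewrites follows. It assumes the population standard deviation and a small `eps` for numerical stability (implementations differ on both), and it omits the clipped policy-ratio objective and the KL term; `group_relative_advantages` is a hypothetical name.

```python
import statistics


def group_relative_advantages(rewards: list[float],
                              eps: float = 1e-6) -> list[float]:
    """Advantage of each sampled rewrite relative to its own group.

    A_i = (r_i - mean(r)) / (std(r) + eps), where r holds the rewards of
    all rewrites sampled for the same input query.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; a design choice here
    return [(r - mean) / (std + eps) for r in rewards]


# Four rewrites sampled for one query, scored by some reward function.
print(group_relative_advantages([0.8, 0.1, 0.0, 0.3]))
```

Because advantages are centered within each group, a group whose rewrites all receive the same reward contributes no learning signal; only differences between sibling rewrites drive the update.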
Customized Reward System:
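$$
R_{\text{Overall}} =
\begin{cases}
R_{\text{Sys}}, & \text{if } R_{\text{Sys}} > 0,\\
0.1, & \text{if } R_{\text{Sys}} = 0 \text{ and } R_{\text{Rel}} > 0,\\
0, & \text{if } R_{\text{Sys}} = R_{\text{Rel}} = 0.
\end{cases}
$$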
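The piecewise reward translates directly into a small function. This is a sketch under stated assumptions. Judging from the names and the abstract's "query relevance and retrieval effectiveness", R_Rel presumably scores relevance and R_Sys retrieval effectiveness. Both are assumed non-negative, since the rule does not say what happens for negative values. The name `overall_reward` is hypothetical, and the function raises an error for uncovered inputs rather than guessing.

```python
def overall_reward(r_sys: float, r_rel: float) -> float:
    """Piecewise overall reward: R_Sys first, then a relevance floor of 0.1."""
    if r_sys > 0:
        return r_sys                  # retrieval gain dominates
    if r_sys == 0 and r_rel > 0:
        return 0.1                    # relevant but no retrieval gain
    if r_sys == 0 and r_rel == 0:
        return 0.0                    # neither relevant nor effective
    # Negative inputs are not covered by the rule as stated.
    raise ValueError(f"uncovered case: r_sys={r_sys}, r_rel={r_rel}")


print(overall_reward(0.0, 1.0))  # 0.1
```

Under this rule a rewrite that is relevant but brings no retrieval gain earns only the small constant 0.1, which still ranks it above an irrelevant, ineffective rewrite (reward 0) when rewards are compared within a group.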
Retrieval-augmented generation (RAG) improves generation quality by conditioning the model on retrieved information. The paper applies this idea to query rewriting, using knowledge cards to consolidate multi-source information before it reaches the rewriting model.
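The retrieval half of RAG can be illustrated with a deliberately simple retriever: rank candidate snippets by cosine similarity between bag-of-words term counts and keep the top k with a nonzero score. This is a toy stand-in, not the retriever the paper uses over its platform-specific sources. The names `term_counts`, `cosine`, and `retrieve` are hypothetical.

```python
import math
from collections import Counter


def term_counts(text: str) -> Counter:
    # Bag-of-words term frequencies (lowercased, whitespace-tokenized).
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return up to k snippets with positive similarity, best first."""
    q = term_counts(query)
    scored = [(cosine(q, term_counts(doc)), doc) for doc in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable for ties
    return [doc for score, doc in scored[:k] if score > 0]


corpus = ["dragon drama episode list", "cooking live stream", "dragon boat festival"]
print(retrieve("dragon drama", corpus))
# ['dragon drama episode list', 'dragon boat festival']
```

The second hit shares only one word with the query and is topically off, which illustrates why retrieved snippets benefit from a further relevance-focused summarization step such as a knowledge card.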
The paper cites 33 references covering query rewriting, retrieval-augmented generation, large language models, and related research directions.
Summary: CardRewriter addresses long-tail query rewriting on short-video platforms. By condensing platform-specific knowledge into knowledge cards and training the rewriter with SFT followed by GRPO, it improves offline rewriting quality and online metrics (higher LVR and CTR, lower IQRR) in deployment on Kuaishou. The work offers a practical approach to query understanding tasks that involve proprietary content.