With the increasing capabilities of Large Language Models (LLMs), parallel reasoning has emerged as a new inference paradigm that enhances reasoning robustness by concurrently exploring multiple lines of thought before converging on a final answer. Exploring parallel reasoning has become a significant trend for overcoming the fragility of standard sequential methods and improving practical performance. In this paper, we survey and summarize the progress and challenges of parallel reasoning. We first present a formal definition of parallel reasoning and clarify its distinction from related concepts like Chain-of-Thought. Then, we organize and discuss advanced techniques based on a novel taxonomy, including non-interactive reasoning, interactive reasoning, and efficiency-focused decoding strategies. Additionally, we explore various application scenarios, such as solving complex problems and enhancing the reliability of LLM outputs. Finally, we highlight the core challenges of parallel reasoning and suggest potential directions for future research. We hope that our work can provide a useful roadmap for beginners and encourage more research on improving parallel reasoning methods. Related resources are available at https://github.com/PPPP-kaqiu/Awesome-Parallel-Reasoning.
With the continuous advancement of Large Language Models (LLMs), parallel reasoning has emerged as a novel reasoning paradigm that enhances reasoning robustness by simultaneously exploring multiple thought paths and converging to a single answer. This survey aims to investigate and summarize the progress and challenges in parallel reasoning. First, it provides a formal definition of parallel reasoning and clarifies its distinction from related concepts such as Chain-of-Thought (CoT). Subsequently, it organizes and discusses advanced techniques based on a novel taxonomy, including non-interactive reasoning, interactive reasoning, and efficiency-oriented decoding strategies, while exploring various application scenarios.
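The "explore multiple paths, then converge" pattern can be illustrated with a minimal sketch. It assumes the simplest non-interactive setup: several independent samples are drawn concurrently and aggregated by majority vote (in the style of self-consistency). The function `sample_path` is a hypothetical stand-in for an LLM call, and majority voting is only one possible aggregation rule, not necessarily the one the survey formalizes.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def parallel_reason(
    sample_path: Callable[[str, int], str],  # hypothetical: (prompt, seed) -> final answer
    prompt: str,
    num_paths: int = 5,
) -> str:
    """Sample several reasoning paths concurrently and return the majority answer."""
    # Exploration: each path is generated independently of the others.
    with ThreadPoolExecutor(max_workers=num_paths) as pool:
        answers = list(pool.map(lambda seed: sample_path(prompt, seed), range(num_paths)))
    # Aggregation: converge on the most frequent final answer.
    return Counter(answers).most_common(1)[0][0]


def stub_model(prompt: str, seed: int) -> str:
    """Deterministic stand-in for a model; some seeds yield a wrong answer."""
    return "42" if seed % 3 else "41"


final_answer = parallel_reason(stub_model, "What is 6 * 7?", num_paths=5)
```

Here the stub returns "41" for seeds 0 and 3 and "42" for the rest, so the vote settles on "42" even though some individual paths are wrong.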
Traditional sequential reasoning methods are inherently fragile and prone to the "prefix trap": once the model commits to an early reasoning path, it struggles to self-correct and may never reach the optimal solution. This weakness is starkly evident in the gap between single-sample accuracy (Pass@1) and the rate at which at least one of k samples is correct (Pass@k).
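To make the Pass@1 versus Pass@k gap concrete, the sketch below uses the commonly used unbiased estimator 1 - C(n-c, k) / C(n, k), where n samples are drawn for a problem and c of them are correct. The source does not state which estimator it relies on, so this choice and the names `pass_at_k`, `n`, `c`, and `k` are assumptions for illustration.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that at least one of k samples is correct,
    given n total samples of which c are correct."""
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Fewer than k incorrect samples: every size-k subset contains a correct one.
    if n - c < k:
        return 1.0
    # 1 minus the fraction of size-k subsets that contain no correct sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Illustrative numbers only: 3 correct answers out of 10 samples.
p1 = pass_at_k(n=10, c=3, k=1)
p5 = pass_at_k(n=10, c=3, k=5)
```

With these illustrative numbers, Pass@1 is 0.3 while Pass@5 is about 0.917: a correct answer is often reachable by the model even when a single sequential attempt misses it.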
Computational Scaling: Performance improves significantly with additional compute, not only in the generation stage but also in the aggregation stage.
Industrial Applications: OpenAI o1, Gemini DeepThink, and other cutting-edge models.
This survey provides a comprehensive and systematic overview of the technical landscape of parallel reasoning, an emerging field, and offers both academic value and guidance for practical applications. As demand for stronger LLM reasoning capabilities continues to grow, parallel reasoning is poised to become a core technology in next-generation AI systems.